Skip to content

feat: force sync route after sandbox resume#52

Merged
furykerry merged 2 commits intoopenkruise:masterfrom
PersistentJZH:feat/force-sync-route-in-sandbox-manager-after-resuming-some-sandbox
Jan 12, 2026
Merged

feat: force sync route after sandbox resume#52
furykerry merged 2 commits intoopenkruise:masterfrom
PersistentJZH:feat/force-sync-route-in-sandbox-manager-after-resuming-some-sandbox

Conversation

@PersistentJZH
Copy link
Contributor

@PersistentJZH PersistentJZH commented Dec 31, 2025

Ⅰ. Describe what this PR does

Add force route synchronization in sandbox-manager after resuming sandbox to ensure all peer instances are aware of route changes (especially PodIP) without waiting for informer watch events.

Changes:

  • Add PauseSandbox and ResumeSandbox methods to SandboxManager
  • Replace direct sbx.Pause/Resume calls in e2b controller with SandboxManager methods
  • Add InplaceRefresh before route sync to ensure latest PodIP and State

Ⅱ. Does this pull request fix one issue?

#35

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

@AiRanthem
Copy link
Member

@PersistentJZH
I think it's not necessary to make it so complicated. Refer to the ClaimSandbox method of SandboxManager (api.go), add a set of Pause/Resume methods, roughly the logic:

func (m *SandboxManager) PauseSandbox(ctx context.Context, sbx infra.Sandbox) error {
    sbx.Pause(ctx)
    m.proxy.SyncRouteWithPeers(sbx.GetRoute())
}

And just replace the direct calls to sbx.Pause/Resume in the e2b controller.

@PersistentJZH PersistentJZH force-pushed the feat/force-sync-route-in-sandbox-manager-after-resuming-some-sandbox branch from a01bc96 to eef377c Compare January 8, 2026 10:14
@codecov
Copy link

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 67.64706% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.89%. Comparing base (1390314) to head (682ea25).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
pkg/sandbox-manager/api.go 67.74% 5 Missing and 5 partials ⚠️
pkg/servers/e2b/pause_resume.go 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #52      +/-   ##
==========================================
+ Coverage   45.71%   45.89%   +0.17%     
==========================================
  Files          75       75              
  Lines        4379     4410      +31     
==========================================
+ Hits         2002     2024      +22     
- Misses       2188     2193       +5     
- Partials      189      193       +4     
Flag Coverage Δ
unittests 45.89% <67.64%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@PersistentJZH PersistentJZH force-pushed the feat/force-sync-route-in-sandbox-manager-after-resuming-some-sandbox branch from eef377c to 006937a Compare January 8, 2026 10:41
@kruise-bot kruise-bot added size/L and removed size/M labels Jan 8, 2026
Signed-off-by: PersistentJZH <zhihao.kan17@gmail.com>

- Add PauseSandbox and ResumeSandbox methods to SandboxManager
- Replace direct sbx.Pause/Resume calls in e2b controller with SandboxManager methods
- Add InplaceRefresh before route sync to ensure latest PodIP and State
@PersistentJZH PersistentJZH force-pushed the feat/force-sync-route-in-sandbox-manager-after-resuming-some-sandbox branch from 006937a to eb14cb6 Compare January 8, 2026 10:47
@PersistentJZH
Copy link
Contributor Author

@PersistentJZH I think it's not necessary to make it so complicated. Refer to the ClaimSandbox method of SandboxManager (api.go), add a set of Pause/Resume methods, roughly the logic:

func (m *SandboxManager) PauseSandbox(ctx context.Context, sbx infra.Sandbox) error {
    sbx.Pause(ctx)
    m.proxy.SyncRouteWithPeers(sbx.GetRoute())
}

And just replace the direct calls to sbx.Pause/Resume in the e2b controller.

thanks @AiRanthem, pls help to review again

log.Error(err, "failed to refresh sandbox after resume, route sync may use stale IP")
// Continue to sync route even if refresh fails, as the route might still be valid
}
start := time.Now()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The synchronous routing code after pause and resume looks somewhat repetitive. Could we consider extracting it into a common function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

…cessary deepcopy

Signed-off-by: PersistentJZH <zhihao.kan17@gmail.com>
Copy link
Member

@AiRanthem AiRanthem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
@furykerry PTAL

Copy link
Member

@furykerry furykerry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@kruise-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: furykerry

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@furykerry furykerry merged commit 479e3ef into openkruise:master Jan 12, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants