fix: http tunnel in cluster mode #46
Conversation
Walkthrough
Updates node-state propagation and vnode lease/update control flow, changes UpdatePod timeout behavior to start containers, adds an exported NodeState field to node status, simplifies a utility loop, adjusts some import/formatting and warning wording, and adds/updates tests to populate NodeState.
Sequence Diagram(s)
```mermaid
sequenceDiagram
autonumber
participant VC as VNodeController
participant API as K8s API
participant Prov as Provider
rect rgb(235,245,255)
note over VC: onBaseStatusArrived (leader path)
VC->>API: Sync node status
VC->>Prov: UpdateNodeStateOnProviderArrived(data.NodeState)
end
rect rgb(245,235,255)
note over VC: createOrRetryUpdateLease
VC->>API: Get Lease
alt Lease NotFound
VC->>API: Create Lease
VC->>VC: set created=true
VC-->>VC: return early (skip update)
else Lease exists
alt Not leader
VC-->>VC: return early (no update)
else Leader
VC->>API: Update Lease
end
else Other error
VC-->>VC: log and retry
end
end
```
```mermaid
sequenceDiagram
autonumber
participant VP as VPodProvider
participant Timer as Eventually(timeout)
participant C as Containers
VP->>Timer: UpdatePod(...)
Timer-->>VP: timeout callback
note over VP: Changed: call shouldStartContainers on timeout
VP->>C: Start selected containers
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
provider/vpod_provider.go (1)
387-391: Bug: map stores address of loop variable cs (all entries point to the last item)
Using &cs in a range loop yields the same address each iteration. Store pointers to the slice elements instead.
Apply:
```diff
- nameToContainerStatus := make(map[string]*corev1.ContainerStatus)
- for _, cs := range pod.Status.ContainerStatuses {
-   nameToContainerStatus[cs.Name] = &cs
- }
+ nameToContainerStatus := make(map[string]*corev1.ContainerStatus)
+ for i := range pod.Status.ContainerStatuses {
+   cs := &pod.Status.ContainerStatuses[i]
+   nameToContainerStatus[cs.Name] = cs
+ }
```
provider/vnode.go (1)
300-307: Nil guard for HolderIdentity to prevent panic
*vNode.lease.Spec.HolderIdentity may be nil; dereferencing can panic.
Apply:
```diff
- if *vNode.lease.Spec.HolderIdentity == "" {
+ if vNode.lease.Spec.HolderIdentity == nil || *vNode.lease.Spec.HolderIdentity == "" {
    log.L.Warnf("vnode %s does not have lease holder", vNode.name)
    return false
  }
```
vnode_controller/vnode_controller.go (1)
561-606: Lease update flow: good early-returns, but SetLease uses stale object after Patch
- The created flag + early-return is fine.
- After a successful Patch, you call vNode.SetLease(lease) (old object). This leaves RenewTime stale and can cause false “expired lease” checks.
Apply:
```diff
- err = vNodeController.client.Patch(vnCtx, newLease, client.MergeFrom(lease))
+ err = vNodeController.client.Patch(vnCtx, newLease, client.MergeFrom(lease))
  if err == nil {
    log.G(vnCtx).WithField("retries", i).Debugf("Successfully updated lease for %s", vNode.GetNodeName())
-   vNode.SetLease(lease)
+   vNode.SetLease(newLease)
    return
  }
```
Also applies to: 611-615
🧹 Nitpick comments (1)
vnode_controller/vnode_controller.go (1)
383-386: Consider lowering log level for full pod diff
cmp.Diff can be large; using Debug reduces noise in busy clusters.
Apply:
```diff
- log.G(ctx).Infof("try to update pod %s/%s with diff %s in podUpdateHandler.", vNode.GetNodeName(), podKey, diff)
+ log.G(ctx).Debugf("try to update pod %s/%s with diff %s in podUpdateHandler.", vNode.GetNodeName(), podKey, diff)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- common/utils/utils.go (1 hunks)
- provider/vnode.go (1 hunks)
- provider/vpod_provider.go (2 hunks)
- vnode_controller/vnode_controller.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
provider/vnode.go (1)
provider/vnode_store.go (1)
r(121-131)
common/utils/utils.go (1)
common/utils/utils_test.go (1)
TestFillPodKey(270-347)
vnode_controller/vnode_controller.go (1)
vnode_controller/vnode_controller_test.go (1)
TestPodHandler_NoVnodeOrNotLeader(236-285)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-test
- GitHub Check: lint
🔇 Additional comments (3)
common/utils/utils.go (1)
389-399: Idiomatic range use — looks good
Switching to single-value range is correct and clearer. No functional change.
provider/vpod_provider.go (1)
335-338: Fix timeout success action to start new containers — good change
Starting shouldStartContainers on success matches the predicate logic and avoids inadvertently stopping again.
vnode_controller/vnode_controller.go (1)
255-259: Update node state to Activated on status arrival — OK
Gating by leadership and marking provider state Activated is consistent with the flow.
0a7f3e0 to 4326fe1
4326fe1 to 45227cd
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
vnode_controller/vnode_controller.go (1)
610-614: Bug: storing the pre-patch lease object instead of the updated one
After a successful Patch, vNode.SetLease should use newLease, not the stale lease copy.
Apply this diff:
```diff
- log.G(vnCtx).WithField("retries", i).Debugf("Successfully updated lease for %s", vNode.GetNodeName())
- vNode.SetLease(lease)
+ log.G(vnCtx).WithField("retries", i).Debugf("Successfully updated lease for %s", vNode.GetNodeName())
+ vNode.SetLease(newLease)
```
suite/vpod_lifecycle_test.go (5)
92-100: Avoid indexing ContainerStatuses[0]; use PodReady condition to prevent flakiness/panics.
ContainerStatuses may be temporarily empty. Favor PodReady.
Apply this diff:
```diff
- Eventually(func() bool {
-   podFromKubernetes := &v1.Pod{}
-   err := k8sClient.Get(ctx, types.NamespacedName{
-     Namespace: basicPod.Namespace,
-     Name: basicPod.Name,
-   }, podFromKubernetes)
-   return err == nil && podFromKubernetes.Status.Phase == v1.PodRunning && podFromKubernetes.Status.ContainerStatuses[0].Ready == true
- }, time.Second*20, time.Second).Should(BeTrue())
+ Eventually(func() bool {
+   podFromKubernetes := &v1.Pod{}
+   if err := k8sClient.Get(ctx, types.NamespacedName{
+     Namespace: basicPod.Namespace,
+     Name: basicPod.Name,
+   }, podFromKubernetes); err != nil {
+     return false
+   }
+   if podFromKubernetes.Status.Phase != v1.PodRunning {
+     return false
+   }
+   for _, c := range podFromKubernetes.Status.Conditions {
+     if c.Type == v1.PodReady {
+       return c.Status == v1.ConditionTrue
+     }
+   }
+   return false
+ }, time.Second*20, time.Second).Should(BeTrue())
```
113-121: Don’t assume Conditions[0]; search for PodReady.
Indexing [0] is brittle and may be empty/ordered differently.
Apply this diff:
```diff
- Eventually(func() bool {
-   podFromKubernetes := &v1.Pod{}
-   err := k8sClient.Get(ctx, types.NamespacedName{
-     Namespace: basicPod.Namespace,
-     Name: basicPod.Name,
-   }, podFromKubernetes)
-   return err == nil && podFromKubernetes.Status.Phase == v1.PodRunning && podFromKubernetes.Status.Conditions[0].Status == v1.ConditionFalse
- }, time.Second*10, time.Second).Should(BeTrue())
+ Eventually(func() bool {
+   podFromKubernetes := &v1.Pod{}
+   if err := k8sClient.Get(ctx, types.NamespacedName{
+     Namespace: basicPod.Namespace,
+     Name: basicPod.Name,
+   }, podFromKubernetes); err != nil {
+     return false
+   }
+   for _, c := range podFromKubernetes.Status.Conditions {
+     if c.Type == v1.PodReady {
+       return c.Status == v1.ConditionFalse
+     }
+   }
+   return false
+ }, time.Second*10, time.Second).Should(BeTrue())
```
135-143: Same issue: avoid Conditions[0] assumption.
Apply this diff:
```diff
- Eventually(func() bool {
-   podFromKubernetes := &v1.Pod{}
-   err := k8sClient.Get(ctx, types.NamespacedName{
-     Namespace: basicPod.Namespace,
-     Name: basicPod.Name,
-   }, podFromKubernetes)
-   return err == nil && podFromKubernetes.Status.Phase == v1.PodRunning && podFromKubernetes.Status.Conditions[0].Status == v1.ConditionFalse
- }, time.Second*10, time.Second).Should(BeTrue())
+ Eventually(func() bool {
+   podFromKubernetes := &v1.Pod{}
+   if err := k8sClient.Get(ctx, types.NamespacedName{
+     Namespace: basicPod.Namespace,
+     Name: basicPod.Name,
+   }, podFromKubernetes); err != nil {
+     return false
+   }
+   for _, c := range podFromKubernetes.Status.Conditions {
+     if c.Type == v1.PodReady {
+       return c.Status == v1.ConditionFalse
+     }
+   }
+   return false
+ }, time.Second*10, time.Second).Should(BeTrue())
```
157-165: Same issue: avoid Conditions[0]; check PodReady == True.
Apply this diff:
```diff
- Eventually(func() bool {
-   podFromKubernetes := &v1.Pod{}
-   err := k8sClient.Get(ctx, types.NamespacedName{
-     Namespace: basicPod.Namespace,
-     Name: basicPod.Name,
-   }, podFromKubernetes)
-   return err == nil && podFromKubernetes.Status.Phase == v1.PodRunning && podFromKubernetes.Status.Conditions[0].Status == v1.ConditionTrue
- }, time.Second*10, time.Second).Should(BeTrue())
+ Eventually(func() bool {
+   podFromKubernetes := &v1.Pod{}
+   if err := k8sClient.Get(ctx, types.NamespacedName{
+     Namespace: basicPod.Namespace,
+     Name: basicPod.Name,
+   }, podFromKubernetes); err != nil {
+     return false
+   }
+   for _, c := range podFromKubernetes.Status.Conditions {
+     if c.Type == v1.PodReady {
+       return c.Status == v1.ConditionTrue
+     }
+   }
+   return false
+ }, time.Second*10, time.Second).Should(BeTrue())
```
228-236: Repeat: avoid ContainerStatuses[0]; use PodReady.
Apply this diff:
```diff
- Eventually(func() bool {
-   podFromKubernetes := &v1.Pod{}
-   err := k8sClient.Get(ctx, types.NamespacedName{
-     Namespace: basicPod2.Namespace,
-     Name: basicPod2.Name,
-   }, podFromKubernetes)
-
-   return err == nil && podFromKubernetes.Status.Phase == v1.PodRunning && podFromKubernetes.Status.ContainerStatuses[0].Ready == true
- }, time.Second*50, time.Second).Should(BeTrue())
+ Eventually(func() bool {
+   podFromKubernetes := &v1.Pod{}
+   if err := k8sClient.Get(ctx, types.NamespacedName{
+     Namespace: basicPod2.Namespace,
+     Name: basicPod2.Name,
+   }, podFromKubernetes); err != nil {
+     return false
+   }
+   if podFromKubernetes.Status.Phase != v1.PodRunning {
+     return false
+   }
+   for _, c := range podFromKubernetes.Status.Conditions {
+     if c.Type == v1.PodReady {
+       return c.Status == v1.ConditionTrue
+     }
+   }
+   return false
+ }, time.Second*50, time.Second).Should(BeTrue())
```
🧹 Nitpick comments (9)
suite/vnode_lifecycle_test.go (2)
30-32: Keep state assignment consistent (embedded vs explicit)
Good to set NodeState alongside NodeInfo.State. For readability, prefer one style consistently (either nodeInfo.State and nodeInfo.NodeState, or nodeInfo.NodeInfo.State and nodeInfo.NodeState) across the file.
95-98: Mirror state on deactivation as well
Looks correct and mirrors activation path.
Consider a tiny helper to set both fields together to prevent drift in future edits.
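For illustration, a minimal sketch of such a helper; the struct and field names (model.NodeStatusData, the embedded NodeInfo.State) are inferred from the test snippets above and should be adjusted to the actual types:

```go
// setBothStates is a hypothetical test helper that writes the same value to
// the embedded NodeInfo.State and to the new exported NodeState field, so
// tests cannot update one and forget the other.
func setBothStates(status *model.NodeStatusData, state model.NodeState) {
	status.NodeInfo.State = state
	status.NodeState = state
}
```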
suite/suite_test.go (1)
55-60: Optional: make UseExistingCluster togglable
If you often flip this during local runs, consider an env var flag to reduce diffs.
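A possible shape for that toggle, shown as a sketch with a hypothetical environment variable name (USE_EXISTING_CLUSTER) and controller-runtime's envtest:

```go
import (
	"os"

	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

// newTestEnv builds the suite's envtest environment; setting
// USE_EXISTING_CLUSTER=true points the tests at an already-running cluster
// instead of spinning up a local control plane, with no code edits needed.
func newTestEnv() *envtest.Environment {
	useExisting := os.Getenv("USE_EXISTING_CLUSTER") == "true"
	return &envtest.Environment{UseExistingCluster: &useExisting}
}
```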
vnode_controller/vnode_controller.go (1)
382-385: Reduce noisy pod diffs to debug-level (optional)
cmp.Diff on full pod objects can be large and verbose. Prefer debug-level to keep logs clean in production.
Apply this diff:
```diff
-// get the diff of oldPod and new Pod
-diff := cmp.Diff(oldPodFromKubernetes, newPodFromKubernetes)
-log.G(ctx).Infof("try to update pod %s/%s with diff %s in podUpdateHandler.", vNode.GetNodeName(), podKey, diff)
+// get the diff of oldPod and new Pod
+diff := cmp.Diff(oldPodFromKubernetes, newPodFromKubernetes)
+log.G(ctx).Debugf("try to update pod %s/%s with diff %s in podUpdateHandler.", vNode.GetNodeName(), podKey, diff)
```
suite/vpod_lifecycle_test.go (5)
47-48: Keep NodeState and NodeInfo.State consistent on activation.
Some codepaths may read the new NodeState; set it when activating too.
Apply this diff:
```diff
- nodeInfo.NodeInfo.State = model.NodeStateActivated
+ nodeInfo.NodeInfo.State = model.NodeStateActivated
+ nodeInfo.NodeState = model.NodeStateActivated
```
33-43: Typo: “deactived” → “deactivated”.
User-facing test descriptions should be clean.
Apply this diff:
- "when one container's status changes to deactived, pod should become unready.", + "when one container's status changes to deactivated, pod should become unready.",
83-84: Prefer polling over sleeps to reduce flakiness.
Drop fixed sleeps; rely on Eventually’s polling to observe state.
Example (for one spot):
```diff
- time.Sleep(time.Second)
  tl.UpdateBizStatus(nodeName, key, model.BizStatusData{
```
Also applies to: 134-135, 156-157, 219-221
79-101: DRY: extract helpers for pod readiness checks.
Several blocks duplicate the same readiness logic; a small helper improves readability.
Example helper:
```go
func isPodReady(p *v1.Pod) bool {
	if p.Status.Phase != v1.PodRunning {
		return false
	}
	for _, c := range p.Status.Conditions {
		if c.Type == v1.PodReady {
			return c.Status == v1.ConditionTrue
		}
	}
	return false
}
```
Also applies to: 215-236
252-267: Commented-out spec: consider XIt or remove.
Use XIt to mark pending instead of large commented block, or delete to reduce noise.
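For reference, a pending spec in Ginkgo looks roughly like this (the description text here is illustrative, not the suite's actual wording):

```go
// XIt keeps the spec visible in the report as "pending" instead of burying it
// in a comment block; the body is compiled but never executed.
XIt("restores the pod after the vnode becomes leader again", func() {
	// original assertions can stay here until the spec is re-enabled
})
```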
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- common/utils/utils.go (1 hunks)
- model/models.go (1 hunks)
- provider/vnode.go (1 hunks)
- provider/vpod_provider.go (2 hunks)
- suite/suite_test.go (3 hunks)
- suite/vnode_lifecycle_test.go (4 hunks)
- suite/vpod_lifecycle_test.go (4 hunks)
- vnode_controller/vnode_controller.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- provider/vnode.go
- common/utils/utils.go
- provider/vpod_provider.go
🧰 Additional context used
🧬 Code graph analysis (4)
model/models.go (2)
model/consts.go (1)
NodeState(80-80)
tunnel/mock_tunnel.go (1)
NodeInfo(14-17)
suite/vnode_lifecycle_test.go (2)
model/consts.go (3)
NodeState(80-80)
NodeStateActivated(85-85)
NodeStateDeactivated(88-88)
provider/vnode_store.go (1)
r(43-54)
suite/suite_test.go (1)
model/consts.go (2)
NodeState(80-80)
NodeStateActivated(85-85)
suite/vpod_lifecycle_test.go (1)
model/consts.go (2)
NodeState(80-80)
NodeStateDeactivated(88-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-test
🔇 Additional comments (9)
model/models.go (1)
46-47: Add guard for empty NodeState in onBaseStatusArrived
No literal NodeStatusData constructions were found in this repo; external tunnel code may still omit NodeState. Manually verify all tunnel paths to ensure they only propagate NodeState when non-empty or supply a default. Consider updating vnode_controller/vnode_controller.go (around line 255) to ignore empty data.NodeState before calling UpdateNodeStateOnProviderArrived.
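A minimal sketch of the suggested guard; the receiver is abstracted behind a hypothetical interface because the concrete type that exposes UpdateNodeStateOnProviderArrived may differ in the repo:

```go
// stateReceiver abstracts whatever object exposes the provider-arrived hook
// (the concrete receiver in the repo may differ).
type stateReceiver interface {
	UpdateNodeStateOnProviderArrived(state model.NodeState)
}

// applyArrivedNodeState ignores an empty NodeState so older tunnel
// implementations cannot downgrade the recorded provider state.
func applyArrivedNodeState(r stateReceiver, state model.NodeState) {
	if state == "" {
		return
	}
	r.UpdateNodeStateOnProviderArrived(state)
}
```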
suite/vnode_lifecycle_test.go (2)
5-6: Import change is fine
No behavioral impact.
134-138: Deactivation path matches activation updates
No issues spotted.
suite/suite_test.go (2)
5-7: Test imports OK
No functional change.
163-164: Seeding NodeStatusData.NodeState
Setting NodeState to Activated in prepareNode aligns tests with the new field. LGTM.
vnode_controller/vnode_controller.go (1)
560-605: Lease creation path properly initializes HolderIdentity and RenewTime
NewLease sets Spec.HolderIdentity to the provided clientID and Spec.RenewTime to now, so the early return after creation is safe.
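For context on why that early return is safe, a lease constructor with the described behavior would look roughly like this (a sketch against the standard coordination/v1 API, not the repo's actual NewLease):

```go
import (
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

// newLease illustrates the property described above: the creating client is
// the holder and RenewTime starts at "now", so an immediate update right
// after a successful Create would be redundant.
func newLease(name, namespace, clientID string, duration time.Duration) *coordinationv1.Lease {
	return &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       ptr.To(clientID),
			LeaseDurationSeconds: ptr.To(int32(duration.Seconds())),
			RenewTime:            &metav1.MicroTime{Time: time.Now()},
		},
	}
}
```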
suite/vpod_lifecycle_test.go (3)
5-6: Import of time is appropriate.
Needed for Sleep/Now/Eventually durations. LGTM.
172-176: Guard container count before indexing; avoid potential panic.
If prepareBasicPod ever returns <2 containers, this panics.
Apply this diff:
```diff
  Expect(err).To(BeNil())
- podFromKubernetes.Spec.Containers[0].Image = "suite-biz1-updated.jar"
- podFromKubernetes.Spec.Containers[1].Image = "suite-biz2-updated.jar"
+ Expect(len(podFromKubernetes.Spec.Containers)).To(BeNumerically(">=", 2))
+ for i := range podFromKubernetes.Spec.Containers {
+   podFromKubernetes.Spec.Containers[i].Image = "suite-biz-updated.jar"
+ }
```
238-249: LGTM: deactivation sets both fields, aligned with model change.
Setting NodeState alongside State should satisfy new consumers.
e11541b to 020389b
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
vnode_controller/vnode_controller.go (1)
612-616: Set the updated lease object after Patch.
After a successful Patch, vNode.SetLease(lease) keeps the stale pre-patch object in memory. Use newLease to reflect the latest holder/renew time.
Apply:
```diff
- if err == nil {
-   log.G(vnCtx).WithField("retries", i).Debugf("Successfully updated lease for %s", vNode.GetNodeName())
-   vNode.SetLease(lease)
-   return
- }
+ if err == nil {
+   log.G(vnCtx).WithField("retries", i).Debugf("Successfully updated lease for %s", vNode.GetNodeName())
+   vNode.SetLease(newLease)
+   return
+ }
```
♻️ Duplicate comments (1)
vnode_controller/vnode_controller.go (1)
257-259: Good guard: avoid downgrading NodeState when older tunnels send empty.
This resolves the earlier concern. No further action.
🧹 Nitpick comments (5)
vnode_controller/vnode_controller.go (2)
384-387: Log diff at debug level to cut noise and cost.
cmp.Diff on whole Pods is heavy and chatty at info. Prefer debug.
Apply:
```diff
-// get the diff of oldPod and new Pod
-diff := cmp.Diff(oldPodFromKubernetes, newPodFromKubernetes)
-log.G(ctx).Infof("try to update pod %s/%s with diff %s in podUpdateHandler.", vNode.GetNodeName(), podKey, diff)
+// get the diff of oldPod and new Pod
+diff := cmp.Diff(oldPodFromKubernetes, newPodFromKubernetes)
+log.G(ctx).Debugf("try to update pod %s/%s with diff %s in podUpdateHandler.", vNode.GetNodeName(), podKey, diff)
```
562-585: Lease creation flow looks correct; consider minor resiliency polish.
The created flag and Create-on-NotFound path are good. Optionally add small jitter to the fixed 200ms sleep to reduce herd effects.
Example:
```diff
- time.Sleep(time.Millisecond * 200) // TODO: add random sleep time for reduce the client rate
+ // add small jitter to reduce client rate synchronization
+ time.Sleep(time.Duration(150+rand.Intn(150)) * time.Millisecond)
```
(Remember to seed and import math/rand.)
provider/vpod_provider_test.go (3)
14-15: Prefer the builder API with an explicit scheme for the fake client.
Newer controller-runtime recommends fake.NewClientBuilder() with a registered scheme to avoid “no kind registered” issues and reduce flakiness.
Apply:
```diff
- fakeCli := fake.NewFakeClient(oldPod)
+ scheme := runtime.NewScheme()
+ _ = clientgoscheme.AddToScheme(scheme)
+ fakeCli := fake.NewClientBuilder().WithScheme(scheme).WithObjects(oldPod).Build()
```
(Add imports: k8s.io/apimachinery/pkg/runtime and k8s.io/client-go/kubernetes/scheme as clientgoscheme.)
126-129: Avoid reaching into internal store maps from tests.
Use the public API to reduce tight coupling to internals.
Apply:
```diff
- provider.vPodStore.podKeyToPod = map[string]*corev1.Pod{
-   "default/test": oldPod,
- }
+ provider.vPodStore.PutPod(oldPod.DeepCopy())
```
141-143: Add a timeout to the channel receive to prevent hangs.
Prevents a test from blocking indefinitely if notify isn’t delivered.
Apply:
```diff
- revdPod := <-newPodCh
- assert.Len(t, revdPod.Spec.Containers, 2)
+ select {
+ case revdPod := <-newPodCh:
+   assert.Len(t, revdPod.Spec.Containers, 2)
+ case <-time.After(2 * time.Second):
+   t.Fatal("timed out waiting for pod notification")
+ }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- provider/vpod_provider_test.go (2 hunks)
- vnode_controller/vnode_controller.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
provider/vpod_provider_test.go (2)
tunnel/mock_tunnel.go (1)
MockTunnel(19-29)
provider/vpod_provider.go (2)
NewVPodProvider(68-80)
b(252-341)
vnode_controller/vnode_controller.go (3)
model/consts.go (1)
NodeState(80-80)
vnode_controller/vnode_controller_test.go (1)
TestPodHandler_NoVnodeOrNotLeader(236-285)
provider/vnode.go (2)
vNode(244-249)
vNode(252-259)
🔇 Additional comments (4)
vnode_controller/vnode_controller.go (1)
602-607: Early return after create or when not leader is correct.
Prevents redundant immediate Patch and unnecessary write pressure.
provider/vpod_provider_test.go (3)
5-7: Imports added for testing/time are fine.
93-101: Test setup for tunnel callbacks is fine.
118-121: Construct provider with the fake client scheme fix above.
Once you switch to the builder, this remains the same. Just ensuring consistency.
lylingzhen
left a comment
review passed
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #46 +/- ##
==========================================
- Coverage 76.00% 75.83% -0.17%
==========================================
Files 19 19
Lines 3326 3336 +10
==========================================
+ Hits 2528 2530 +2
- Misses 676 683 +7
- Partials 122 123 +1
see koupleless/koupleless#423
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Style