proxy healthz server for dualstack clusters #118146
Conversation
Skipping CI for Draft Pull Request.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the label. The instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
this will partly fix #116486. I have created this draft to get initial response/feedback on the approach for fixing the issue.
(force-pushed from 41f4ab2 to 9afaf97)
Overall I think this is probably right.
This is going to conflict with #116470. I'm not sure if @alexanderConstantinescu is working on that PR again currently?
Regardless of which is going to merge first, it would be good for you to look at that PR and make sure you're not changing things in ways that will need to be reverted or rewritten completely differently for that PR.
I am. I've pinged people out-of-band for a review on it. In case you have some cycles @danwinship: feel free to have a look.
(force-pushed from 9afaf97 to 26e8082)
@alexanderConstantinescu if you are just waiting for final review/approval on #116470, I'll wait for it to merge first.
(force-pushed from c7e1960 to 9838c4d)
/retest
Yeah, I like this better, overall.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
(force-pushed from f3d686a to 9a55712)
```go
// Set oldestPendingQueuedMap only if it's currently zero
if val, ok := hs.oldestPendingQueuedMap[ipFamily]; ok && val == zeroTime {
	hs.oldestPendingQueuedMap[ipFamily] = hs.clock.Now()
}
```
This logic is wrong now, because we don't initialize oldestPendingQueuedMap to `{ v1.IPv4Protocol: zeroTime, v1.IPv6Protocol: zeroTime }`. We initialize it to `{}`. So you want:

```go
// Set oldestPendingQueuedMap[ipFamily] only if it's currently unset
if _, set := hs.oldestPendingQueuedMap[ipFamily]; !set {
	hs.oldestPendingQueuedMap[ipFamily] = hs.clock.Now()
}
```
```go
if _, set := hs.oldestPendingQueuedMap[ipFamily]; !set {
```

This feels wrong to me: this will only allow the first QueuedUpdate() call to set hs.oldestPendingQueuedMap[ipFamily] to the current time. All subsequent calls won't update hs.oldestPendingQueuedMap[ipFamily], as the key will already be part of the map, and the proxy will always respond unhealthy.

I may be wrong here, but in the current code we do not initialize oldestPendingQueued, and since it is of type atomic.Value, its zero value won't be zeroTime, so

```go
hs.oldestPendingQueued.CompareAndSwap(zeroTime, hs.clock.Now())
```

won't work, as the old value is not zeroTime. I tried to play around with it here: https://go.dev/play/p/q2onw6P3aZB. CompareAndSwap(zeroTime, currentTime) only works if we explicitly set oldestPendingQueued to zeroTime, which I guess happens in the first Updated() call:

```go
hs.oldestPendingQueued.Store(zeroTime)
```
> this will only allow the first QueuedUpdate() call to set hs.oldestPendingQueuedMap[ipFamily]

oh, right, Updated needs to be updated too.

Anyway, the stuff with comparing against zeroTime is just because, with the old code, oldestPendingQueued stored a time.Time, not a *time.Time, so we couldn't store nil to mean "nothing is queued", so we recognized the zero time as meaning that instead. Which is kind of weird... we really should have used a pointer.

Anyway, with the map, there's no reason to worry about comparing against the zero time. You should just have a value in the map if there is an update queued, and no value in the map when there is no update queued. So Updated should do delete(hs.oldestPendingQueuedMap, ipFamily).
got it, thanks!
```go
case oldestPendingQueued.IsZero():
	// The proxier is healthy while it's starting up
	// or the proxier is fully synced.
	continue
```
So again, this case would never get hit. The initial state is that there's no entry for that IP family, not a zero entry. But we don't need any special case for this here now; when the proxy is starting up (i.e., oldestPendingQueuedMap is empty), the for loop just gets skipped and we return true below.
(force-pushed from 9a55712 to a21a6a1)
Signed-off-by: Daman Arora <aroradaman@gmail.com>
(force-pushed from a21a6a1 to bfda244)
/retest
/lgtm
/kind bug

@aroradaman I feel like this is worthy of a small release note. Something like "kube-proxy now reports its health more accurately in dual-stack clusters when there are problems with only one IP family." or something like that. Can you update the release note field in the initial comment, and then …
LGTM label has been added. Git tree hash: 375e63aee9ce61600c8c9a0db2bb54d9549a80a2
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aroradaman, danwinship

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/hold cancel
What type of PR is this?
/kind cleanup
/sig network
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: