Fix IPVS proxier to update stale real server after restart #111635
/sig network |
/assign @uablrek @andrewsykim |
pkg/proxy/ipvs/proxier.go
Outdated
proxier.mu.Unlock()

// Sync unconditionally - this is called once per lifetime.
proxier.syncProxyRules()
Remove the `proxier.updateWeights = false` from here and instead do it where it's checked (see below)
pkg/proxy/ipvs/proxier.go
Outdated
proxier.mu.Unlock()

// Sync unconditionally - this is called once per lifetime.
proxier.syncProxyRules()

proxier.mu.Lock()
same as above
// if we want to update weights, loop through all current destinations and
// reset their weight.
if proxier.updateWeights {
	for _, dest := range curDests {
Add `proxier.updateWeights = false` here. The mutex is held so it's safe.
(btw, you may make a comment that the mutex is held and that it's a one-time event)
we can't add this here, as this is inside a loop. adding this outside the loop instead.
Outside the "for _, ep := range newEndpoints.List() {" loop you mean? Seems reasonable.
on second thought, is it completely safe to have `proxier.updateWeights = false` inside `syncProxyRules()` instead of `OnServiceSynced()`/`OnEndpointsSynced()`?
i'm asking because in `OnServiceSynced()` we do something like:
proxier.mu.Lock()
...
proxier.updateWeights = true
proxier.mu.Unlock()
proxier.syncProxyRules()
so even though `proxier.syncProxyRules()` would acquire the mutex first thing, theoretically another goroutine calling `syncProxyRules()` could execute `proxier.updateWeights = false` in the window between `OnServiceSynced()` releasing the lock and `proxier.syncProxyRules()` acquiring it.
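To make the window described above concrete, here is a minimal sketch, assuming a toy `raceProxier` type (hypothetical, not the real kube-proxy struct) that only models the flag and the mutex. In this toy model the interleaving changes which caller performs the one-time reset, but the flag is still consumed exactly once because it is checked and cleared under the same lock.

```go
package main

import (
	"fmt"
	"sync"
)

// raceProxier is a hypothetical stand-in for the IPVS proxier; only the
// flag handling is modeled here.
type raceProxier struct {
	mu            sync.Mutex
	updateWeights bool
	weightResets  int // number of syncs that performed the one-time reset
}

// onServiceSynced mirrors the pattern quoted above: set the flag under
// the lock, release the lock, then trigger a sync.
func (p *raceProxier) onServiceSynced() {
	p.mu.Lock()
	p.updateWeights = true
	p.mu.Unlock()
	// Another goroutine may run syncProxyRules in this window.
	p.syncProxyRules()
}

// syncProxyRules checks and clears the flag while holding the mutex, so
// the check and the reset are atomic with respect to other syncs.
func (p *raceProxier) syncProxyRules() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.updateWeights {
		p.weightResets++
		p.updateWeights = false
	}
}

func main() {
	p := &raceProxier{}
	var wg sync.WaitGroup
	// Competing syncs, standing in for the "other goroutine".
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.syncProxyRules()
		}()
	}
	p.onServiceSynced()
	wg.Wait()
	fmt.Println("weight resets:", p.weightResets)
}
```

Whether the same reasoning holds for the real proxier depends on what else the one-time branch touches, which this sketch does not model.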
Silly mistake, now it works https://go.dev/play/p/CuVZg3B2itE
yes, i noticed `first` needs to be initialized to true instead of false
Seems Go has support for "Once": https://golangcode.com/run-code-once-with-sync/
You may see if you can use it.
I am unsure if it's better since we already have the proxier.mu mutex.
I think it's better to avoid `sync.Once` because if in the future we want to do more things only on startup, the underlying function will get large. Having a flag gives us flexibility about where we want to run the one-time logic during sync.
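The trade-off in this exchange can be sketched side by side. The two types below (`onceProxier` and `flagProxier`) and their log strings are hypothetical and only illustrate the shape of each approach: with `sync.Once` all startup work lives in one closure, while a mutex-guarded flag can be checked at several points within the same sync.

```go
package main

import (
	"fmt"
	"sync"
)

// onceProxier (hypothetical) runs all startup work through sync.Once.
type onceProxier struct {
	mu      sync.Mutex
	startup sync.Once
	log     []string
}

func (p *onceProxier) sync() {
	p.mu.Lock()
	defer p.mu.Unlock()
	// Every startup-only step has to live inside this one closure.
	p.startup.Do(func() { p.log = append(p.log, "startup work") })
	p.log = append(p.log, "regular sync")
}

// flagProxier (hypothetical) uses a bool guarded by the existing mutex.
type flagProxier struct {
	mu          sync.Mutex
	initialSync bool
	log         []string
}

func (p *flagProxier) sync() {
	p.mu.Lock()
	defer p.mu.Unlock()
	// Clear the flag once this sync finishes; the mutex is held.
	defer func() { p.initialSync = false }()

	if p.initialSync {
		p.log = append(p.log, "startup step A")
	}
	p.log = append(p.log, "regular sync")
	// The same flag can gate a later step in the same sync.
	if p.initialSync {
		p.log = append(p.log, "startup step B")
	}
}

func main() {
	o := &onceProxier{}
	o.sync()
	o.sync()
	fmt.Println(o.log)

	f := &flagProxier{initialSync: true}
	f.sync()
	f.sync()
	fmt.Println(f.log)
}
```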
20054a6 to da19d62
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aryan9600, uablrek.
EDIT: this is a bad solution since it checks `proxier.initialized` directly. Please see the updated proposal below.
I was a bit hasty with the approve button 😊 The code looks good but it doesn't solve the problem. It seems there is a need to do the update both for endpoints and endpointSlices. I found my old tests for the issue, and as it is now the weight stays at 0. I altered the code to:
func (proxier *Proxier) syncProxyRules() {
	proxier.mu.Lock()
	defer proxier.mu.Unlock()
	// To ensure complete initialization we can only consider the
	// initial sync done when the proxier is initialized.
	defer func() {
		if proxier.initialized == 1 {
			proxier.initialSync = false
		}
	}()
Then it works.
okay, thanks for catching this. let me confirm this and update 👍 |
/hold just to avoid merging unintentionally. Once you are good, unhold it; it is just a precaution.
Taking another look and I see that
func (proxier *Proxier) syncProxyRules() {
	proxier.mu.Lock()
	defer proxier.mu.Unlock()
	// don't sync rules till we've received services and endpoints
	if !proxier.isInitialized() {
		klog.V(2).InfoS("Not syncing ipvs rules until Services and Endpoints have been received from master")
		return
	}
	defer func() {
		proxier.initialSync = false
	}()
(the proposed fix above is tested and works) |
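A self-contained sketch of why the placement of the deferred reset matters, assuming a toy `initProxier` type whose fields and handler names are hypothetical simplifications of the proxier discussed here. Because the `defer` is registered only after the initialization check, a sync that returns early leaves `initialSync` set, so the startup-only logic still runs on the first sync that actually does work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// initProxier is a hypothetical, stripped-down model of the proxier.
type initProxier struct {
	mu                   sync.Mutex
	servicesSynced       bool
	endpointSlicesSynced bool
	initialized          int32
	initialSync          bool
	startupRuns          int // times the startup-only branch executed
}

func (p *initProxier) isInitialized() bool {
	return atomic.LoadInt32(&p.initialized) > 0
}

func (p *initProxier) setInitialized(v bool) {
	var n int32
	if v {
		n = 1
	}
	atomic.StoreInt32(&p.initialized, n)
}

func (p *initProxier) onServiceSynced() {
	p.mu.Lock()
	p.servicesSynced = true
	p.setInitialized(p.endpointSlicesSynced)
	p.mu.Unlock()
	p.syncProxyRules()
}

func (p *initProxier) onEndpointSlicesSynced() {
	p.mu.Lock()
	p.endpointSlicesSynced = true
	p.setInitialized(p.servicesSynced)
	p.mu.Unlock()
	p.syncProxyRules()
}

func (p *initProxier) syncProxyRules() {
	p.mu.Lock()
	defer p.mu.Unlock()
	// Early return: the reset below is not registered yet, so the
	// flag survives syncs that happen before initialization.
	if !p.isInitialized() {
		return
	}
	defer func() { p.initialSync = false }()

	if p.initialSync {
		p.startupRuns++
	}
}

func main() {
	p := &initProxier{initialSync: true}
	p.onServiceSynced() // not initialized yet: flag stays true
	fmt.Println(p.initialSync, p.startupRuns)
	p.onEndpointSlicesSynced() // first real sync: startup logic runs
	fmt.Println(p.initialSync, p.startupRuns)
	p.syncProxyRules() // later syncs skip the startup logic
	fmt.Println(p.initialSync, p.startupRuns)
}
```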
Update the IPVS proxier to have a bool `initialSync` which is set to true when a new proxier is initialized and then set to false on all syncs. This lets us run startup-only logic, which in turn lets us update the real server only when needed and avoid any expensive operations.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
da19d62 to 8b5f263
Tested and works. /lgtm |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Update the IPVS proxier to have a bool `updateWeights`, which is set to true during the initial syncs performed by `OnEndpointSlicesSynced` and `OnServiceSynced`, to make sure any real servers with stale weights are updated accordingly at startup. This logic is gated behind a bool to avoid doing this during every sync, as it's an expensive operation.
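As a rough illustration of the gating described above, here is a minimal sketch under stated assumptions: `realServer` and `syncEndpoints` are hypothetical names, and the map stands in for the IPVS destination table. It does not use the real IPVS interface; it only shows that an existing destination with a stale weight is rewritten when the startup flag is set and left alone otherwise.

```go
package main

import "fmt"

// realServer is a hypothetical, simplified view of an IPVS destination.
type realServer struct {
	Address string
	Weight  int
}

// syncEndpoints reconciles the current table with the desired servers
// and returns how many writes it performed. Existing entries are only
// compared and rewritten when initialSync is true.
func syncEndpoints(current map[string]realServer, desired []realServer, initialSync bool) int {
	writes := 0
	for _, want := range desired {
		have, exists := current[want.Address]
		switch {
		case !exists:
			// New destination: always added.
			current[want.Address] = want
			writes++
		case initialSync && have.Weight != want.Weight:
			// Stale weight left over from before the restart.
			current[want.Address] = want
			writes++
		}
	}
	return writes
}

func main() {
	desired := []realServer{{Address: "10.0.0.1:80", Weight: 1}}

	table := map[string]realServer{"10.0.0.1:80": {Address: "10.0.0.1:80", Weight: 0}}
	fmt.Println(syncEndpoints(table, desired, false), table["10.0.0.1:80"].Weight)

	fmt.Println(syncEndpoints(table, desired, true), table["10.0.0.1:80"].Weight)
}
```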
Which issue(s) this PR fixes:
Fixes #108319
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: