[net-8411] bug: fix premature token and service instance deletion due to pod fetch errors #3758
One blocking question about the deregister function.
if ok {
	return deregister
}
return false
Don't we want to return true if it's not in the map, like in the case where a service instance no longer appears in the endpoints subset?
Thanks for this. I intended the change to be that you should deregister only if you explicitly add the address to the map for deregistration, but I think there are too many cases where we rely on the implicit deregistration when a service instance no longer appears in the endpoints, and a bunch of tests were failing. So I changed the logic to be specific to the pod-fetch-error case: only if there is a k8s API-related pod fetch error do we explicitly add the address to the map and say "don't deregister this", and the deregister function now returns true if the address is not in the map!
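To make the agreed behavior concrete, here is a minimal sketch of the lookup described above. It is not the code from this PR: the function name `shouldDeregister`, the map name `deregisterByAddr`, and the use of plain string addresses as keys are all assumptions made for illustration.

```go
package main

import "fmt"

// shouldDeregister reports whether the service instance at addr should be
// deregistered. deregisterByAddr is a hypothetical map of explicit decisions:
// in this sketch an address is only added (with the value false) when
// fetching its pod failed with a Kubernetes API error.
func shouldDeregister(deregisterByAddr map[string]bool, addr string) bool {
	if deregister, ok := deregisterByAddr[addr]; ok {
		// An explicit decision was recorded for this address; honor it.
		return deregister
	}
	// Not in the map: the instance no longer appears in the endpoints
	// subsets, so fall back to implicit deregistration.
	return true
}

func main() {
	decisions := map[string]bool{
		"10.0.0.1": false, // pod fetch failed; keep the instance and its token
	}
	fmt.Println(shouldDeregister(decisions, "10.0.0.1")) // false
	fmt.Println(shouldDeregister(decisions, "10.0.0.2")) // true
}
```

Defaulting to true for addresses that are absent from the map preserves the existing implicit-deregistration path, so only the pod-fetch-error case changes behavior.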
control-plane/connect-inject/controllers/endpoints/endpoints_controller_test.go
51e68f0 to acbaead
LGTM! Just one non-blocking note about the changelog.
.changelog/3758.txt
control-plane: fix an issue where ACL token cleanup did not respect a pod's GracefulShutdownPeriodSeconds and
tokens were invalidated immediately on pod entering Terminating state.
I think this might be a copy-paste error from the other PR; we want something specific here, right?
LOOL yes thanks!
6c08b80 to 215dc30
Changes proposed in this PR
How I've tested this PR
Unit tests, plus manual testing: I changed the code to simulate pod fetch errors and confirmed that, with the fix in this PR, the instance is not deregistered and its token is not deleted.
How I expect reviewers to test this PR
Checklist