Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[net-8411] bug: fix premature token and service instance deletion due to pod fetch errors #3758

Merged
merged 5 commits into from Mar 25, 2024

Conversation

ndhanushkodi
Copy link
Contributor

@ndhanushkodi ndhanushkodi commented Mar 15, 2024

Changes proposed in this PR

  • When a pod fetch fails, we used to not add it to the endpointAddressMap resulting in it getting deregistered and its token deleted.
  • This PR makes the cases we deregister in more explicit-- only if the pod does not exist, we will deregister the service instance and delete the token. This PR also switches endpointAddressMap to a map that marks endpoints explicitly for deregistration (or not). And only entries explicitly marked in the map for deregistration are deregistered.

How I've tested this PR

Unit tests, Manually testing by changing the code to simulate pod fetch errors and with the fix in this PR it doesn't deregister the instance and delete the token

How I expect reviewers to test this PR

Checklist

@ndhanushkodi ndhanushkodi added the pr/no-backport signals that a PR will not contain a backport label label Mar 15, 2024
@ndhanushkodi ndhanushkodi requested review from DanStough and hashi-derek and removed request for DanStough and hashi-derek March 15, 2024 06:51
@ndhanushkodi ndhanushkodi marked this pull request as draft March 15, 2024 07:15
Copy link
Member

@DanStough DanStough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One blocking question about the deregister function.

if ok {
return deregister
}
return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to return true if it's not in the map, like in the case an service instance no longer appears in the endpoints subset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this-- I intended the change to be that only if you explicitly add the address to the map for deregistration then you should deregister, but I think there's too many cases where we do rely on the implicit deregistration when a service instance no longer appears in the endpoints, and a bunch of tests were failing. So I just changed the logic to be very specific to the pod-fetch-error case, so only if there is a k8s api related pod fetch error, we explicitly choose to add it to the map and say "don't deregister this", and now the deregister function does return true if it's not in the map!

@ndhanushkodi ndhanushkodi force-pushed the nd/net-8411-premature-deregister branch from 51e68f0 to acbaead Compare March 15, 2024 17:18
@ndhanushkodi ndhanushkodi marked this pull request as ready for review March 15, 2024 19:12
Copy link
Member

@DanStough DanStough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one non-blocking note about the changelog.

Comment on lines 2 to 3
control-plane: fix an issue where ACL token cleanup did not respect a pod's GracefulShutdownPeriodSeconds and
tokens were invalidated immediately on pod entering Terminating state.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be a copy-paste error from the other PR; we want something specific here, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOOL yes thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/no-backport signals that a PR will not contain a backport label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants