Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update health check status based on the endpoints object instead of pod #503

Merged
merged 3 commits into from Apr 26, 2021

Conversation

ishustava
Copy link
Member

@ishustava ishustava commented Apr 23, 2021

Previously, we were updating health check status based on the pod status
of the pod that the endpoints address is pointing to. However, since the
endpoints controller is not watching pod objects, it will not always have
the most updated pod information in the cache. This results in incorrect behavior
where sometimes the pod we get will have stale information and we update the
health check status in Consul incorrectly.

This PR changes this behavior to instead get the health check status from
the endpoints object. The endpoints object already has information about ready
and not ready addresses, and it will always have updated cache since we receive
events based on updates from the Kubernetes API server to the cached Informer.

How I've tested this PR:
Acceptance test here

How I expect reviewers to test this PR:
code review

Checklist:

  • Tests added
  • CHANGELOG entry added (HashiCorp engineers only, community PRs should not add a changelog entry)

@ishustava ishustava added type/bug Something isn't working area/connect Related to Connect service mesh, e.g. injection labels Apr 23, 2021
@ishustava ishustava force-pushed the fix-health-check-status-bug branch 2 times, most recently from 5fbdcc2 to e928fbd Compare April 23, 2021 19:27
@ishustava ishustava requested review from a team, lkysow and ndhanushkodi and removed request for a team April 26, 2021 16:02
Previously, we were updating health check status based on the pod status
of the pod that the endpoints address is pointing to. However, since the
endpoints controller is not watching pod objects, it will not always have
the most updated pod information in the cache. This results in incorrect behavior
where sometimes the pod we get will have stale information and we update the
health check status in Consul incorrectly.

This commit changes this behavior to instead get the health check status from
the endpoints object. The endpoints object already has information about ready
and not ready addresses, and it will always have updated cache since we receive
events based on updates from the Kubernetes API server to the cached Informer.
Copy link
Member

@lkysow lkysow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

connect-inject/endpoints_controller.go Outdated Show resolved Hide resolved
Copy link
Contributor

@ndhanushkodi ndhanushkodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, awesome job tracking this down and I like the decision to look at the Endpoints object. Thanks for all the context in the PR as well. Nice work!!!

connect-inject/endpoints_controller.go Outdated Show resolved Hide resolved
@ishustava ishustava merged commit a3f3656 into master Apr 26, 2021
@ishustava ishustava deleted the fix-health-check-status-bug branch April 26, 2021 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connect Related to Connect service mesh, e.g. injection type/bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants