Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Leases for etcd member health checks #214

Merged
merged 8 commits into from
Aug 10, 2021

Conversation

timuthy
Copy link
Member

@timuthy timuthy commented Aug 3, 2021

How to categorize this PR?

/area scalability
/kind api-change
/kind enhancement

What this PR does / why we need it:
This PR changes Druid's health checks in a way that it considers Lease resources in order to find out the readiness of a etcd cluster member.

Which issue(s) this PR fixes:
Fixes #206
Fixes #219

Special notes for your reviewer:
Please have a look at the separate commits for an easier review process.
The creation/deletion of Lease objects through Druid will follow later in a separate PR.
/cc @amshuman-kr @shreyas-s-rao

Release note:

Druid now fetches `Lease` resources in order to derive the readiness state of an etcd cluster member. This serves as a preparation for the etcd multi-node feature.

@timuthy timuthy requested a review from a team as a code owner August 3, 2021 08:41
@gardener-robot gardener-robot added needs/review Needs review size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else labels Aug 3, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 3, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Aug 3, 2021
Copy link
Collaborator

@amshuman-kr amshuman-kr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timuthy Thanks for the changes to include handling of leases in status! Looks good already. Just a few minor questions below.

Also, one additional question. Is there any explicit configuration required during startup to make sure that the leases are cached? Or is that postponed to #215 ?

pkg/health/condition/check_ready.go Outdated Show resolved Hide resolved
pkg/health/etcdmember/check_ready.go Outdated Show resolved Hide resolved
pkg/health/etcdmember/check_ready.go Outdated Show resolved Hide resolved
pkg/health/etcdmember/builder_test.go Show resolved Hide resolved
@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Aug 3, 2021
@timuthy
Copy link
Member Author

timuthy commented Aug 3, 2021

Thanks for the fast feedback @amshuman-kr.

Also, one additional question. Is there any explicit configuration required during startup to make sure that the leases are cached? Or is that postponed to #215 ?

No, there is no additional configuration required. Leases will be cached automatically. #215 is an optimization, so that we don't cache all Lease objects in the cluster. We can also make caching available as a command line flag, so that one can turn it off in case there are too many Lease objects in a cluster. WDYT?

@amshuman-kr
Copy link
Collaborator

No, there is no additional configuration required. Leases will be cached automatically. #215 is an optimization, so that we don't cache all Lease objects in the cluster. We can also make caching available as a command line flag, so that one can turn it off in case there are too many Lease objects in a cluster. WDYT?

I now remember that controller-runtime now automatically caches if there one list call. Sounds good 👍

Copy link
Collaborator

@amshuman-kr amshuman-kr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @timuthy! LGTM

@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 9, 2021
@timuthy
Copy link
Member Author

timuthy commented Aug 9, 2021

@amshuman-kr as discussed, I added a command line option to disable the lease cache.

@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 9, 2021
@amshuman-kr amshuman-kr added this to the v0.6.0 milestone Aug 9, 2021
Copy link
Collaborator

@amshuman-kr amshuman-kr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for disabling the lease cache! LGTM

@gardener-robot gardener-robot added the area/scalability Scalability related label Aug 10, 2021
@gardener-robot gardener-robot added kind/api-change API change with impact on API users kind/enhancement Enhancement, improvement, extension labels Aug 10, 2021
@amshuman-kr
Copy link
Collaborator

/invite @shreyas-s-rao @abdasgupta @ishan16696 @aaronfern

The changes look good from my side. Shall we merge this?

@@ -4,7 +4,7 @@ apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.5.0
controller-gen.kubebuilder.io/version: v0.4.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some clarification needed.
Is this an intended change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #193 (comment)

Copy link
Contributor

@abdasgupta abdasgupta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@abdasgupta abdasgupta merged commit 1ae26ee into gardener:master Aug 10, 2021
@timuthy timuthy deleted the feature.member-leases branch August 10, 2021 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/scalability Scalability related kind/api-change API change with impact on API users kind/enhancement Enhancement, improvement, extension needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/review Needs review needs/second-opinion Needs second review by someone else size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Disable cache for lease ☂️ [Feature] Add EtcdMember resource
7 participants