Migrate scheduler, controller-manager and cloud-controller-manager to use LeaseLock #94603
@@ -44,7 +44,7 @@ func RecommendedDefaultLeaderElectionConfiguration(obj *LeaderElectionConfigurat
 obj.RetryPeriod = metav1.Duration{Duration: 2 * time.Second}
 }
 if obj.ResourceLock == "" {
-// TODO: Migrate to LeaseLock.
+// TODO(#80289): Migrate to LeaseLock when graduating to v1beta1.
@liggitt - who owns that part now?
I would be happy to promote it to v1beta1 in 1.20 to clean this up, but wanted to check with the owner if there aren't any known blockers for doing that (or other things we want to fix during this migration).
I'm not sure... cc @mtaufen?
I don't think it's a particularly active file, but it's a core dependency so it would be good to stabilize the version. I'm not sure what the blockers would be off the top of my head. I'm happy to help if you want to work on moving it forward.
OK - I will try to get back to it in the next couple of weeks.
Ok, no rush as I'll be OOO for the next couple weeks anyway.
/retest
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project.
@@ -44,7 +44,7 @@ func RecommendedDefaultLeaderElectionConfiguration(obj *LeaderElectionConfigurat
 obj.RetryPeriod = metav1.Duration{Duration: 2 * time.Second}
 }
 if obj.ResourceLock == "" {
-// TODO: Migrate to LeaseLock.
+// TODO(#80289): Migrate to LeaseLock when graduating to v1beta1.
Going straight from Endpoints -> Lease isn't safe, right? We have to go Endpoints -> EndpointsLeases -> Lease over three releases, right?
Separately, I'm not sure we can know when it's safe to migrate this function to LeaseLock. With components we control, we know their rollout cadence and skew support, so we can go endpoints -> endpointsleases -> leases over three versions. With this helper method, we have no idea what the consuming component's release schedule and skew support is.
Yes - going Endpoints->Leases isn't safe.
done
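For illustration, the migration argument in this thread can be sketched as a small model. It assumes that an `endpoints` lock is held on an Endpoints object, a `leases` lock on a Lease object, and an `endpointsleases` lock on both; the `objectsHeld` map and the `SafeToCoexist` helper are hypothetical names for this sketch, not client-go code. Two releases can run side by side only if their lock types contend on at least one common object.

```go
package main

import "fmt"

// objectsHeld maps each lock type to the API objects a leader must hold.
// This is an illustrative model of the discussion, not client-go code.
var objectsHeld = map[string][]string{
	"endpoints":       {"Endpoints"},
	"endpointsleases": {"Endpoints", "Leases"},
	"leases":          {"Leases"},
}

// SafeToCoexist reports whether instances using lock types a and b contend
// on at least one common object, so that only one of them can lead at a time.
// Unknown lock types hold nothing and are therefore never safe.
func SafeToCoexist(a, b string) bool {
	held := make(map[string]bool)
	for _, o := range objectsHeld[a] {
		held[o] = true
	}
	for _, o := range objectsHeld[b] {
		if held[o] {
			return true
		}
	}
	return false
}

func main() {
	// Walk the three-release path, then the direct jump.
	path := []string{"endpoints", "endpointsleases", "leases"}
	for i := 0; i+1 < len(path); i++ {
		fmt.Printf("%s -> %s safe: %v\n", path[i], path[i+1], SafeToCoexist(path[i], path[i+1]))
	}
	fmt.Printf("endpoints -> leases safe: %v\n", SafeToCoexist("endpoints", "leases"))
}
```

In this model each step of the three-release path shares an object with its neighbor, while the direct Endpoints -> Leases jump shares none, which is the split-brain risk raised above.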
5af5011 to f34cb47
f34cb47 to 3ddbb04
@@ -115,6 +116,10 @@ func NewCloudControllerManagerOptions() (*CloudControllerManagerOptions, error)
 // NewDefaultComponentConfig returns cloud-controller manager configuration object.
 func NewDefaultComponentConfig(insecurePort int32) (*ccmconfig.CloudControllerManagerConfiguration, error) {
 versioned := &ccmconfigv1alpha1.CloudControllerManagerConfiguration{}
+// Use lease-based leader election to reduce cost.
+// The default endpoints-leases one has already been used for couple releases.
+versioned.LeaderElection.ResourceLock = resourcelock.LeasesResourceLock
be specific about which release we switched the default to endpointslease. also, why not change the default in SetDefaults_CloudControllerManagerConfiguration instead of here?
Added a comment about it being introduced in 1.17 (same below).
Modifying the defaulting seems backward-incompatible (though technically it's very unlikely that someone will reuse it for something else...).
Should I really switch to modifying the defaulting?
@@ -210,6 +211,10 @@ func NewKubeControllerManagerOptions() (*KubeControllerManagerOptions, error) {
 // NewDefaultComponentConfig returns kube-controller manager configuration object.
 func NewDefaultComponentConfig(insecurePort int32) (kubectrlmgrconfig.KubeControllerManagerConfiguration, error) {
 versioned := kubectrlmgrconfigv1alpha1.KubeControllerManagerConfiguration{}
+// Use lease-based leader election to reduce cost.
+// The default endpoints-leases one has already been used for couple releases.
+versioned.LeaderElection.ResourceLock = resourcelock.LeasesResourceLock
be specific about which release we switched the default to endpointslease. also, why not change the default in SetDefaults_KubeControllerManagerConfiguration instead of here?
@@ -138,6 +138,9 @@ func splitHostIntPort(s string) (string, int, error) {
 func newDefaultComponentConfig() (*kubeschedulerconfig.KubeSchedulerConfiguration, error) {
 versionedCfg := kubeschedulerconfigv1beta1.KubeSchedulerConfiguration{}
 versionedCfg.DebuggingConfiguration = *configv1alpha1.NewRecommendedDebuggingConfiguration()
+// Use lease-based leader election to reduce cost.
+// The default endpoints-leases one has already been used for couple releases.
+versionedCfg.LeaderElection.ResourceLock = resourcelock.LeasesResourceLock
be specific about which release we switched the default to endpointslease. also, why not change the default in SetDefaults_KubeSchedulerConfiguration instead of here?
8940a77 to 0d3216e
@liggitt - PTAL
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: liggitt, wojtek-t
are we making an exception here in the sense that a CC default value is being changed without increasing the version (to v1beta2 in the scheduler case)? Note that we are planning to make some changes to scheduler's CC and introduce v1beta2, so we could make this change in v1beta2 and keep the old behavior for v1beta1.
that would be fine as well. the controller manager configs aren't exposed as config files yet, so the default change is only affecting the CLI flag defaults
/lgtm
@@ -130,7 +130,10 @@ func RecommendedDefaultGenericControllerManagerConfiguration(obj *kubectrlmgrcon
 }

 if len(obj.LeaderElection.ResourceLock) == 0 {
-obj.LeaderElection.ResourceLock = "endpointsleases"
+// Use lease-based leader election to reduce cost.
+// We migrated for EndpointsLease lock in 1.17 and starting in 1.20 we
We migrated to EndpointLease lock in 1.17
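As a rough sketch of the defaulting pattern in the hunk above: the default is applied only when the field is empty, so a lock type set explicitly by an operator is left untouched. `LeaderElectionConfig` and `ApplyDefault` are hypothetical stand-ins for this example, not the real component config types or defaulting functions.

```go
package main

import "fmt"

// LeaderElectionConfig is a hypothetical stand-in for the leader election
// section of a component config; only the field discussed here is modeled.
type LeaderElectionConfig struct {
	ResourceLock string
}

// ApplyDefault sets the lock type only when the caller left it empty,
// so an explicitly configured value always wins over the default.
func ApplyDefault(cfg *LeaderElectionConfig, defaultLock string) {
	if len(cfg.ResourceLock) == 0 {
		cfg.ResourceLock = defaultLock
	}
}

func main() {
	// An unset field picks up the new default.
	unset := LeaderElectionConfig{}
	ApplyDefault(&unset, "leases")

	// An explicitly pinned value is preserved.
	pinned := LeaderElectionConfig{ResourceLock: "endpointsleases"}
	ApplyDefault(&pinned, "leases")

	fmt.Println(unset.ResourceLock, pinned.ResourceLock)
}
```

Under this pattern, changing the default affects only consumers who never set the field, which is why the skew concern above applies to them and not to anyone pinning `endpointsleases` explicitly.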
/lgtm
Cancelling hold based on Abdullah and Walter lgtms above. /hold cancel
Watch events for objects in the kube-system namespace were previously ignored. In certain situations, this would cause the destination service to return invalid (outdated) endpoints for services in kube-system - including unmeshed services.

It was suggested [1] that kube-system events were ignored to avoid handling frequent Endpoint updates - specifically from controllers using Endpoints for leader elections [2]. As of Kubernetes 1.20, these controllers default to using Leases instead of Endpoints for their leader elections [3], obviating the need to exclude (or filter) updates from kube-system. The exclusions have been removed accordingly.

[1]: linkerd#4133 (comment)
[2]: kubernetes/kubernetes#86286
[3]: kubernetes/kubernetes#94603

Signed-off-by: Jacob Henner <code@ventricle.us>
Ref #80289
/kind cleanup
/priority important-longterm