Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimistically update leader lock #122069

Merged
merged 2 commits into from
Dec 14, 2023
Merged

Conversation

linxiulei
Copy link
Contributor

@linxiulei linxiulei commented Nov 27, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Client-go: Optimized leaders renewing leases by updating leader lock optimistically without getting the record from the apiserver first. Also added a new metric `leader_election_slowpath_total` that allow users to monitor how many leader elections are updated non-optimistically.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Nov 27, 2023
@k8s-ci-robot
Copy link
Contributor

Please note that we're already in Test Freeze for the release-1.29 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.29.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Mon Nov 27 09:58:46 UTC 2023.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 27, 2023
@linxiulei
Copy link
Contributor Author

Partly fix #121806

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 27, 2023
@linxiulei
Copy link
Contributor Author

/assign @wojtek-t @aojea

Name: "leader_election_slowpath_total",
StabilityLevel: k8smetrics.ALPHA,
Help: "Total number of slow path exercised in renewing leader leases. 'name' is the string used to identify the lease. Please make sure to group by name.",
}, []string{"name"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has in theory unbounded cardinality - we shouldn't make a name to be a label.

Why do we need a name? If we really need to check the name, we should use logs - for metrics we should focus oncount.

OTOH - I see other metrics above to be labeled with name - isn't that an issue?

@dgrisonnet @logicalhan ^^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What name is this? If it's a per-cluster (or per control-plane node) name it should be practically bounded and we've traditionally been okay with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's "per component taking part in leader election".

So in general yes - it should be bounded [modulo bugs etc.]. So I'm assuming this is fine for you...

// and down.
type SwitchMetric interface {
// LeaderMetric instruments metrics used in leader election.
type LeaderMetric interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building a bit on @aojea comment from another PR - changing public interfaces in our official client library is always problematic.
Can we leave the SwitchMetric interface as it was and simply add a new LeaderMetric interface as:

type LeaderMetric interface {
  SwitchMetric
  SlowPathExercised(name string)
}

I'm also thinking about better naming for the method - but don't have a better suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH - we're changing below MetricsProvider interface so maybe that's an overkill.
I wouldn't expect people to generally overwrite provider factory and we're fixing all implementations.
So maybe that's actually fine as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR, I keep it as it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this: it is problematic and causes another tight coupling on controller-runtime and client-go which provides its own implementation.

If the original name was kept, even adding the new method would have been more compatible, since it would allow a single consumer to support 1.29 and 1.30. By changing the name, it strictly forces 1.30 usage.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks the upgrade to 1.30 for every controller built on top of controller-runtime e.g. Flux fluxcd/pkg#763. Hopefully the controller-runtime maintainers will do a release soon that deals with this breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on #124372 to address this. Will cherry pick to 1.30 after merged.

@jiahuif
Copy link
Member

jiahuif commented Nov 28, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 28, 2023
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 29, 2023
@linxiulei
Copy link
Contributor Author

/cc @pohly

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core code LGTM - I added few more comments about the test.

@linxiulei
Copy link
Contributor Author

Addressed all suggestions and updated release-note

Signed-off-by: Eric Lin <exlin@google.com>
Signed-off-by: Eric Lin <exlin@google.com>
@wojtek-t
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 11, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: c9d315df71af36c8b687a357047a80f42ade1ead

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: linxiulei, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants