Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kube-aggregator/openapi/controller: remove trailing 1 in failure ratelimiter #77979

Merged
merged 1 commit into from Jul 4, 2020

Conversation

s-urbaniak
Copy link
Contributor

@s-urbaniak s-urbaniak commented May 16, 2019

What type of PR is this?

/kind api-change
/kind bug

/kind cleanup

/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

This removes a trailing 1 from the rate limiting name as it was confusing, and also appears in metrics.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

openapi-controller: remove the trailing `1` character literal from the rate limiting metric `APIServiceOpenAPIAggregationControllerQueue1` and rename it to `open_api_aggregation_controller` to adhere to Prometheus best practices.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 16, 2019
@s-urbaniak
Copy link
Contributor Author

/cc @brancz @sttts

@k8s-ci-robot k8s-ci-robot requested review from brancz and sttts May 16, 2019 15:24
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 16, 2019
@brancz
Copy link
Member

brancz commented May 16, 2019

/lgtm

from instrumentation side, let's have one of the approvers confirm that this number isn't there for a reason, but it looks strange at best.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 16, 2019
@fedebongio
Copy link
Contributor

/assign @logicalhan

@fedebongio
Copy link
Contributor

/assign @roycaihw

@roycaihw
Copy link
Member

/hold

before we merge this, just to verify if the name is exposed in metrics and if changing it will cause any breaking change

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 16, 2019
Copy link
Member

@logicalhan logicalhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming a metric is equivalent to deleting a metric and creating a new one. We should probably deprecate the old metric so that if people are using it, they have some window to migrate off of it. Also, if we're going to do this, we should really just completely fix the name of the metric (i.e. no camel-casing at all).

@brancz
Copy link
Member

brancz commented May 17, 2019

This is renaming a single label value, not the metric name, nor the label-key. I don’t know whether deprecation is necessary here as it’s more a time series identity label rather than one you would actively query and for what it’s worth the underlying metrics were broken until @s-urbaniak recently fixed them (the PR may still be open in which case they are still broken).

@logicalhan
Copy link
Member

This is renaming a single label value, not the metric name, nor the label-key. I don’t know whether deprecation is necessary here as it’s more a time series identity label rather than one you would actively query and for what it’s worth the underlying metrics were broken until Sergiusz Urbaniak recently fixed them (the PR may still be open in which case they are still broken).

It is used to construct a number of different metric names:

# HELP APIServiceOpenAPIAggregationControllerQueue1_adds Total number of adds handled by workqueue: APIServiceOpenAPIAggregationControllerQueue1
# TYPE APIServiceOpenAPIAggregationControllerQueue1_adds counter
APIServiceOpenAPIAggregationControllerQueue1_adds 2607
# HELP APIServiceOpenAPIAggregationControllerQueue1_depth Current depth of workqueue: APIServiceOpenAPIAggregationControllerQueue1
# TYPE APIServiceOpenAPIAggregationControllerQueue1_depth gauge
APIServiceOpenAPIAggregationControllerQueue1_depth 0
# HELP APIServiceOpenAPIAggregationControllerQueue1_queue_latency How long an item stays in workqueueAPIServiceOpenAPIAggregationControllerQueue1 before being requested.
# TYPE APIServiceOpenAPIAggregationControllerQueue1_queue_latency summary
APIServiceOpenAPIAggregationControllerQueue1_queue_latency{quantile="0.5"} 13
APIServiceOpenAPIAggregationControllerQueue1_queue_latency{quantile="0.9"} 17
APIServiceOpenAPIAggregationControllerQueue1_queue_latency{quantile="0.99"} 17
APIServiceOpenAPIAggregationControllerQueue1_queue_latency_sum 52987
APIServiceOpenAPIAggregationControllerQueue1_queue_latency_count 2607
# HELP APIServiceOpenAPIAggregationControllerQueue1_retries Total number of retries handled by workqueue: APIServiceOpenAPIAggregationControllerQueue1
# TYPE APIServiceOpenAPIAggregationControllerQueue1_retries counter
APIServiceOpenAPIAggregationControllerQueue1_retries 3478
# HELP APIServiceOpenAPIAggregationControllerQueue1_work_duration How long processing an item from workqueueAPIServiceOpenAPIAggregationControllerQueue1 takes.
# TYPE APIServiceOpenAPIAggregationControllerQueue1_work_duration summary
APIServiceOpenAPIAggregationControllerQueue1_work_duration{quantile="0.5"} 96023
APIServiceOpenAPIAggregationControllerQueue1_work_duration{quantile="0.9"} 140829
APIServiceOpenAPIAggregationControllerQueue1_work_duration{quantile="0.99"} 140829
APIServiceOpenAPIAggregationControllerQueue1_work_duration_sum 3.69094537e+08
APIServiceOpenAPIAggregationControllerQueue1_work_duration_count 2607

@brancz
Copy link
Member

brancz commented May 17, 2019

Thanks, I didn’t see that while browsing the code. That’s unfortunate. In that case I agree this should be changed to snake case and remove the trailing number. In regards to deprecation I think these metrics violate just about every guideline we have and I can’t find any usages in the popular sources. I think just changing would be ok.

@s-urbaniak
Copy link
Contributor Author

@logicalhan @brancz thanks for verifying the bigger picture! It seems we have a consensus to a) convert this to snake case and b) omit deprecation. I am pushing a commit on top of this then.

@s-urbaniak
Copy link
Contributor Author

fwiw I did a quick sweep and verified we have more camel-case occurences:

queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "APIServiceRegistrationController"),

queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "ClusterRoleAggregator"),

queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "DiscoveryController"),

queue: workqueue.NewNamedRateLimitingQueue(
// We want a fairly tight requeue time. The controller listens to the API, but because it relies on the routability of the
// service network, it is possible for an external, non-watchable factor to affect availability. This keeps
// the maximum disruption time to a minimum, but it does prevent hot loops.
workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 30*time.Second),
"AvailableConditionController"),

queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "crdEstablishing"),

Finally the sample controller includes a camel case example which we probably should also fix along with a comment:

workqueue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "Foos"),

I can grep to check if those are in general use and submit another cleanup PR fixing the above.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 17, 2019
@sttts
Copy link
Contributor

sttts commented Jun 24, 2020

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 24, 2020
@sttts
Copy link
Contributor

sttts commented Jun 24, 2020

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, s-urbaniak, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2020
@dims
Copy link
Member

dims commented Jun 29, 2020

/test all

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

21 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 2da917d into kubernetes:master Jul 4, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Jul 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants