Coarse aggregation of request duration metrics #3303

Closed
dippynark opened this issue Mar 12, 2024 · 2 comments · Fixed by #3307
Labels
bug Something isn't working

Comments

Contributor

dippynark commented Mar 12, 2024

What steps did you take and what happened:

The smallest non-zero aggregation boundary for request duration metrics (such as gatekeeper_validation_request_duration_seconds_bucket) is now 5 seconds (up from 1 millisecond). We alert when Gatekeeper's 99th percentile request latency is high, so this coarse aggregation is causing our alerts to fire.

Curling the Gatekeeper metrics endpoint gives the following:

gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="0"} 0
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="5"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="10"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="25"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="50"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="75"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="100"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="250"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="500"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="750"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="1000"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="2500"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="5000"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="7500"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="10000"} 1055
gatekeeper_validation_request_duration_seconds_bucket{admission_status="allow",le="+Inf"} 1055

What did you expect to happen:

Request duration metric aggregation boundaries to be as specified here:

Boundaries: []float64{0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.5, 2, 2.5, 3},

Anything else you would like to add:

I guess this issue was introduced in this PR: #3011

Environment:

  • Gatekeeper version: v3.15.0
  • Kubernetes version: (use kubectl version):
% kubectl version -o yaml
clientVersion:
  buildDate: "2023-07-19T12:20:54Z"
  compiler: gc
  gitCommit: fa3d7990104d7c1f16943a67f11b154b71f6a132
  gitTreeState: clean
  gitVersion: v1.27.4
  goVersion: go1.20.6
  major: "1"
  minor: "27"
  platform: darwin/amd64
kustomizeVersion: v5.0.1
serverVersion:
  buildDate: "2024-01-04T22:48:32Z"
  compiler: gc
  gitCommit: 6f460c12ad45abb234c18ec4f0ea335a1203c415
  gitTreeState: clean
  gitVersion: v1.27.8-gke.1067004
  goVersion: go1.20.11 X:boringcrypto
  major: "1"
  minor: "27"
  platform: linux/amd64

dippynark added the bug label on Mar 12, 2024

@JaydipGabani
Contributor

@dippynark Thanks for raising this. I am investigating what's going on here.


zmedico commented Mar 21, 2024

I tried 7349e22 (current tip of release-3.15) and the gatekeeper_validation_request_duration histogram is totally missing from the metrics (also gatekeeper_validation_request_count_total went away). I think I'm getting better results with c68a029 (v3.16.0-beta.1).

Actually I think I used the wrong image when I tried to test 7349e22 so I think the fix is probably going to work for me. Thanks!

Yeah, 7349e22 is working well for me. Thanks again!
