Kubectl --raw reporting an unknown metric even though it shows up in the list of known metrics #641

Open
evin-bz opened this issue Feb 16, 2024 · 1 comment
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

evin-bz commented Feb 16, 2024

What happened?: The HPA reports <unknown> for one metric, while the other metrics work fine:

  "error_rate_metric" on Ingress/my-ingress (target value):             <unknown> / 1
...
  Warning  FailedGetObjectMetric  83s (x95 over 25m)    horizontal-pod-autoscaler  unable to get metric error_rate_metric: Ingress on my-namespace my-ingress/unable to fetch metrics from custom metrics API: the server could not find the metric error_rate_metric for ingresses.networking.k8s.io my-ingress

What did you expect to happen?: The custom metric should report a value of at least 1, given the query being used.

Please provide the prometheus-adapter config:

The config for this metric is fairly simple and, in theory, should always return some value because of the clamp_min:

  - metricsQuery: clamp_min(round(sum(rate(<<.Series>>{<<.LabelMatchers>>,status=~"^5.."}[1m])) or vector(0.00001) / sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m]) ), 0.01) * 100, 1)
    resources:
      template: <<.Resource>>
    name:
      as: error_rate_metric
    seriesFilters: []
    seriesQuery: '{__name__="nginx_ingress_controller_requests",ingress="my-ingress",namespace!=""}'

Please provide the HPA resource used for autoscaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  maxReplicas: 1
  metrics:
  - object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: my-ingress
      metric:
        name: nginx_ingress_controller_requests_rate_my_ingress_ingress
      target:
        averageValue: "75"
        type: AverageValue
        value: "0"
    type: Object
  - object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: my-ingress
      metric:
        name: nginx_ingress_controller_response_duration_p95_my_ingress_ingress
      target:
        type: Value
        value: "7"
    type: Object
  - object:
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: my-ingress
      metric:
        name: error_rate_metric
      target:
        type: Value
        value: "1"
    type: Object
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment

Please provide the HPA status:

Conditions:
  Type            Status  Reason                 Message
  ----            ------  ------                 -------
  AbleToScale     True    SucceededGetScale      the HPA controller was able to get the target's current scale
  ScalingActive   False   FailedGetObjectMetric  the HPA was unable to compute the replica count: unable to get metric error_rate_metric: Ingress on my-namespace my-ingress/unable to fetch metrics from custom metrics API: the server could not find the metric error_rate_metric for ingresses.networking.k8s.io my-ingress
  ScalingLimited  False   DesiredWithinRange     the desired count is within the acceptable range

Please provide the prometheus-adapter logs with -v=6 around the time the issue happened:

Verbose logging in the adapter shows the following when the HPA requests the data:

I0215 22:22:39.231327       1 httplog.go:132] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/ingresses.networking.k8s.io/my-ingress/error_rate_metric" latency="1.923735ms" userAgent="kube-controller-manager/v1.25.11 (linux/arm64) kubernetes/8cfcba0/system:serviceaccount:kube-system:horizontal-pod-autoscaler" audit-ID="a-b-c-d-e" srcIP="172.1.1.1:47017" resp=404

Other logs were present but were not relevant to the failing error rate metric.

Anything else we need to know?:

Querying Prometheus with what I expect the metricsQuery to translate to returns data. To be clear, though, the result has no labels:

# query: 

clamp_min(
    round(
        sum(
            rate(nginx_ingress_controller_requests{ingress="my-ingress",namespace!="",status=~"^5.."}[1m]) 
            ) or vector(0.00001)
    /
        sum(
            rate(nginx_ingress_controller_requests{ingress="my-ingress",namespace!=""}[1m]) 
            )
    , 0.01) * 100, 
1) 

# result:
{}   - 1
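
The hand-expanded query above can be reproduced mechanically. The following is a minimal sketch, not the adapter's actual rendering code: it assumes the adapter replaces `<<.Series>>` with the series name and `<<.LabelMatchers>>` with matchers for the namespace and the described object. The `assumed_matchers` value is a guess at what that would look like for this HPA, and it differs from the `namespace!=""` matcher used in the manual query.

```python
# Template copied from the metricsQuery in the adapter config above.
METRICS_QUERY = (
    'clamp_min(round(sum(rate(<<.Series>>{<<.LabelMatchers>>,status=~"^5.."}[1m])) '
    'or vector(0.00001) / sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m]) ), 0.01) * 100, 1)'
)


def render(template: str, series: str, label_matchers: str) -> str:
    """Fill in the two placeholders used by this rule with plain string replacement."""
    return (
        template
        .replace("<<.Series>>", series)
        .replace("<<.LabelMatchers>>", label_matchers)
    )


# Assumption: matchers the adapter might generate for Ingress my-ingress
# in namespace my-namespace. Not taken from adapter logs.
assumed_matchers = 'namespace="my-namespace",ingress="my-ingress"'

rendered = render(METRICS_QUERY, "nginx_ingress_controller_requests", assumed_matchers)
print(rendered)
```

Running the printed query against Prometheus would show whether the result changes once the namespace is pinned. In both variants the `sum(...)` calls aggregate without a `by` clause, so the result carries no labels, which matches the `{}` result shown above.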

When querying the raw API paths with kubectl, I can see that this named metric does exist:

❯ kubectl --context=cluster-context get --raw '/apis/custom.metrics.k8s.io/v1beta1' | jq . | grep error_rate_metric
      "name": "jobs.batch/error_rate_metric",
      "name": "prometheuses.monitoring.coreos.com/error_rate_metric",
      "name": "pods/error_rate_metric",
      "name": "services/error_rate_metric",
      "name": "ingresses.networking.k8s.io/error_rate_metric",
      "name": "namespaces/error_rate_metric",
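
The same check can be done without `grep` by parsing the discovery response. This is a minimal sketch: `sample_discovery` is a hand-trimmed stand-in for the real response (only the `name` fields shown above are reproduced), and `resources_exposing` is a hypothetical helper name.

```python
def resources_exposing(discovery: dict, metric: str) -> list:
    """Return the resource types that advertise `metric` in a discovery response."""
    found = []
    for entry in discovery.get("resources", []):
        # Entries are named "<resource>/<metric>"; split on the last slash.
        resource, _, name = entry["name"].rpartition("/")
        if name == metric:
            found.append(resource)
    return found


# Trimmed stand-in for the output of:
#   kubectl get --raw '/apis/custom.metrics.k8s.io/v1beta1'
sample_discovery = {
    "resources": [
        {"name": "jobs.batch/error_rate_metric"},
        {"name": "pods/error_rate_metric"},
        {"name": "services/error_rate_metric"},
        {"name": "ingresses.networking.k8s.io/error_rate_metric"},
        {"name": "namespaces/error_rate_metric"},
    ]
}

print(resources_exposing(sample_discovery, "error_rate_metric"))
```

This listing only shows that the metric name is advertised per resource type. It does not by itself show that a value can be resolved for `my-ingress`.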

However, when I attempt to query it, I get a NotFound:

❯ kubectl --context=cluster-context get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/Ingress/my-ingress/error_rate_metric" | jq .
Error from server (NotFound): the server could not find the metric error_rate_metric for Ingress my-ingress
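
For comparison, the adapter log earlier in this report shows the HPA requesting the metric under the plural, group-qualified resource name `ingresses.networking.k8s.io`, while the manual query above uses the kind `Ingress`. Both requests return NotFound, so the difference does not explain the failure on its own, but it is worth ruling out. A minimal sketch of how the two paths are built (`custom_metric_path` is a hypothetical helper, and the path layout is taken from the logged URI):

```python
def custom_metric_path(namespace: str, resource: str, name: str, metric: str,
                       version: str = "v1beta1") -> str:
    """Build the custom metrics API path for a metric on a namespaced object."""
    return (
        f"/apis/custom.metrics.k8s.io/{version}"
        f"/namespaces/{namespace}/{resource}/{name}/{metric}"
    )


# Path as requested by the HPA controller, per the adapter log.
hpa_path = custom_metric_path(
    "my-namespace", "ingresses.networking.k8s.io", "my-ingress", "error_rate_metric"
)

# Path as used in the manual kubectl query.
manual_path = custom_metric_path(
    "my-namespace", "Ingress", "my-ingress", "error_rate_metric"
)

print(hpa_path)
print(manual_path)
```

The first path can be passed to `kubectl get --raw` to repeat exactly the request the HPA makes.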

I expect this to return at least 1.

This issue may be related to #150; however, the fixes suggested there do not seem to have helped.

Environment:

  • prometheus-adapter version: prometheus-adapter-4.3.0
  • prometheus version: 0.31.0
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.9", GitCommit:"a1a87a0a2bcd605820920c6b0e618a8ab7d117d4", GitTreeState:"clean", BuildDate:"2023-04-12T12:16:51Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.11", GitCommit:"8cfcba0b15c343a8dc48567a74c29ec4844e0b9e", GitTreeState:"clean", BuildDate:"2023-06-14T09:49:38Z", GoVersion:"go1.19.10", Compiler:"gc", Platform:"linux/arm64"}
  • Cloud provider or hardware configuration: EKS
@evin-bz evin-bz added the kind/bug Categorizes issue or PR as related to a bug. label Feb 16, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 16, 2024
@dashpole

/assign @dgrisonnet
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 22, 2024