Keda-metrics-server can't reach keda-operator #1260

Closed
avbelyaev opened this issue Nov 15, 2023 · 1 comment
Labels: bug

Comments

@avbelyaev

I installed the latest KEDA v2.12 with Helm on EKS (and I also tried installing directly from master). I'm using the Kafka scaler, and when I produce messages to Kafka I expect a sample service to scale on consumer lag.

However, that doesn't happen, and in the logs of keda-operator-metrics-apiserver I see this:

W1114 17:22:44.958467       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.a-m.svc.cluster.local:9666", ServerName: "keda-operator.a-m.svc.cluster.local:9666", }. Err: connection error: desc = "transport: Error while dialing: dial tcp: lookup keda-operator.a-m.svc.cluster.local on 10.230.128.2:53: no such host"
I1114 17:22:48.267369       1 client.go:88] keda_metrics_adapter/provider "msg"="Waiting for establishing a gRPC connection to KEDA Metrics Server"
E1114 17:22:48.308944       1 provider.go:91] keda_metrics_adapter/provider "msg"="timeout" "error"="timeout while waiting to establish gRPC connection to KEDA Metrics Service server" "server"="keda-operator.a-m.svc.cluster.local:9666"
I1114 17:22:48.309046       1 trace.go:236] Trace[1260412825]: "List" accept:application/vnd.kubernetes.protobuf, */*,audit-id:899ce6ac-9116-46f2-a6a0-4afcebb5497d,client:172.16.172.182,protocol:HTTP/2.0,resource:s0-kafka-mytopic1,scope:namespace,url:/apis/external.metrics.k8s.io/v1beta1/namespaces/a-m/s0-kafka-mytopic1,user-agent:kube-controller-manager/v1.27.7 (linux/amd64) kubernetes/3719c84/system:serviceaccount:kube-system:horizontal-pod-autoscaler,verb:LIST (14-Nov-2023 17:21:48.252) (total time: 60056ms):
Trace[1260412825]: [1m0.056898777s] [1m0.056898777s] END
E1114 17:22:48.309267       1 timeout.go:142] post-timeout activity - time-elapsed: 63.059454ms, GET "/apis/external.metrics.k8s.io/v1beta1/namespaces/a-m/s0-kafka-mytopic1" result: runtime error: invalid memory address or nil pointer dereference
goroutine 2390 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1()
	/workspace/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:110 +0x9c
panic({0x2834040, 0x4a21350})
	/usr/local/go/src/runtime/panic.go:884 +0x213
sigs.k8s.io/custom-metrics-apiserver/pkg/registry/external_metrics.(*REST).List(0xc000047a10, {0x31dae78, 0xc0011e5890}, 0x0?)
	/workspace/vendor/sigs.k8s.io/custom-metrics-apiserver/pkg/registry/external_metrics/reststorage.go:92 +0x120
k8s.io/apiserver/pkg/endpoints/handlers.ListResource.func1({0x31da0d0, 0xc0007d3060}, 0xc0011e3100)
	/workspace/vendor/k8s.io/apiserver/pkg/endpoints/handlers/get.go:278 +0xf3b
sigs.k8s.io/custom-metrics-apiserver/pkg/apiserver/installer.restfulListResource.func1(0xc0007d2fc0, 0xc0001dc540)
	/workspace/vendor/sigs.k8s.io/custom-metrics-apiserver/pkg/apiserver/installer/installer.go:274 +0x6b
k8s.io/apiserver/pkg/endpoints/metrics.InstrumentRouteFunc.func1(0xc0007d2fc0, 0xc0001dc540)
	/workspace/vendor/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:571 +0x22c
github.com/emicklei/go-restful/v3.(*Container).dispatch(0xc000ef4990, {0x31da0d0, 0xc0007d2820}, 0xc0011e3100)
	/workspace/vendor/github.com/emicklei/go-restful/v3/container.go:299 +0x5db
github.com/emicklei/go-restful/v3.(*Container).Dispatch(...)
	/workspace/vendor/github.com/emicklei/go-restful/v3/container.go:204
k8s.io/apiserver/pkg/server.director.ServeHTTP({{0x2cca0cb?, 0x40bdf4?}, 0xc000ef4990?, 0xc000fe99d0?}, {0x31da0d0, 0xc0007d2820}, 0xc0011e3100)
	/workspace/vendor/k8s.io/apiserver/pkg/server/handler.go:146 +0x4e7
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1({0x31da0d0, 0xc0007d2820}, 0xc0011e3100)
	/workspace/vendor/k8s.io/apiserver/pkg/endpoints/filterlatency/filterlatency.go:110 +0x1ca
net/http.HandlerFunc.ServeHTTP(0x31dae78?, {0x31da0d0?, 0xc0007d2820?}, 0x4?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.withAuthorization.func1({0x31da0d0, 0xc0007d2820}, 0xc0011e3100)
	/workspace/vendor/k8s.io/apiserver/pkg/endpoints/filters/authorization.go:78 +0x654
net/http.HandlerFunc.ServeHTTP(0xc14d090b0eb805ad?, {0x31da0d0?, 0xc0007d2820?}, 0xc0004514b8?)
	/usr/local/go/src/net/http/server.go:2122 +0x2f
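
For reference, a quick way to check whether the metrics-apiserver pod can resolve the operator Service at all (deployment name taken from the logs above; the image may not ship a shell or nslookup, so an ephemeral debug container is shown as a fallback):

$ kubectl exec -n a-m deploy/keda-operator-metrics-apiserver -- nslookup keda-operator.a-m.svc.cluster.local
# fallback if the image has no shell/tools (pod name is illustrative):
$ kubectl debug -n a-m -it <metrics-apiserver-pod> --image=busybox:1.36 -- nslookup keda-operator.a-m.svc.cluster.local

The failing lookup above goes to 10.230.128.2:53 rather than the cluster DNS Service. Since the install in the steps below sets metricsServer.useHostNetwork=true, one possibility worth checking (not verified as the fix for this report) is dnsPolicy ClusterFirstWithHostNet, the policy Kubernetes documents for host-network pods that need cluster DNS:

$ helm upgrade keda kedacore/keda --namespace a-m --set metricsServer.dnsPolicy=ClusterFirstWithHostNet,metricsServer.useHostNetwork=true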

This is the last line of the keda-operator logs:

2023-11-14T17:17:50Z	INFO	grpc_server	Starting Metrics Service gRPC Server	{"address": ":9666"}
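
A basic sanity check (names follow the chart defaults used in this report) is that the keda-operator Service exists in the same namespace and has endpoints backing port 9666:

$ kubectl get svc keda-operator -n a-m
$ kubectl get endpoints keda-operator -n a-m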

The external metrics should have appeared; however:

$ k get scaledobject kafka-consumer-keda-scaled-object -n a-m -o jsonpath={.status.externalMetricNames}
["s0-kafka-mytopic1"]

$ k get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "externalmetrics",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
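
The metric value itself can also be requested through the aggregated API; the query below follows KEDA's documented troubleshooting pattern, with the namespace, metric, and ScaledObject names taken from this report:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/a-m/s0-kafka-mytopic1?labelSelector=scaledobject.keda.sh/name=kafka-consumer-keda-scaled-object" | jq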

Expected Behavior

  • service to scale on Kafka lag

Actual Behavior

  • errors in metrics-server

Steps to Reproduce the Problem

  1. Install KEDA v2.12 on EKS: helm install keda kedacore/keda --namespace a-m --set metricsServer.dnsPolicy=ClusterFirst,metricsServer.useHostNetwork=true
  2. Realise that the v2.12 chart doesn't declare port 9666 on the keda-operator pods. It is exposed on the Service, however:
$ helm install keda kedacore/keda --namespace a-m --dry-run
---
# Source: keda/templates/manager/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: keda-operator
spec:
  ports:
  - name: metricsservice
    port: 9666
    targetPort: 9666
---
# Source: keda/templates/manager/deployment.yaml
kind: Deployment
metadata:
  name: keda-operator
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          ...
          ports:                                       <--- only 8080 here
          - containerPort: 8080
            name: http
            protocol: TCP
          env:
            - name: WATCH_NAMESPACE
              value: ""
  3. Install from master:
$ git clone https://github.com/kedacore/charts.git
$ helm install keda . --values values.yaml --namespace a-m --set metricsServer.dnsPolicy=ClusterFirst,metricsServer.useHostNetwork=true --dry-run
---
# Source: keda/templates/manager/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          ports:                                   <---- now both 8080 and 9666 are exposed
          - containerPort: 8080
            name: http
            protocol: TCP
          - containerPort: 9666
            name: metricsservice
            protocol: TCP

  4. Create the ScaledObject (status checks are sketched after these steps):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-keda-scaled-object
  namespace: a-m
spec:
  scaleTargetRef:
    name: kafka-consumer
  pollingInterval:  10
  cooldownPeriod:   120
  minReplicaCount:  1
  maxReplicaCount:  100
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka-zookeep:9092
      consumerGroup: my-consumer-group
      topic: mytopic1
      lagThreshold: '1'
      offsetResetPolicy: latest
  5. Check the logs again; the same errors persist.
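
As referenced in step 4, the ScaledObject conditions and the HPA that KEDA manages can be inspected as well; a short sketch using the names from this report:

$ kubectl describe scaledobject kafka-consumer-keda-scaled-object -n a-m
$ kubectl get hpa -n a-m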

Specifications

  • KEDA Version: 2.12 and main
  • Platform & Version: EKS, Server Version: v1.27.7-eks-4f4795d
  • Kubernetes Version: 1.27.7
  • Scaler(s): Kafka
avbelyaev added the bug label on Nov 15, 2023
@avbelyaev (Author)

Sorry, I created the issue in the wrong repo. I've opened kedacore/keda#5184 in the main repo instead.

avbelyaev closed this as not planned on Nov 15, 2023.