Prometheus panic: &runtime.TypeAssertionError #2178

Closed
WeiBanjo opened this Issue Nov 9, 2016 · 4 comments

WeiBanjo commented Nov 9, 2016

What did you do?
Running Prometheus inside Kubernetes. When I scale down the number of replicas for a deployment, Prometheus panics.

What did you expect to see?
No panic.

What did you see instead? Under which circumstances?
When I scaled a deployment from 350 replicas to 0, Prometheus crashed.

Environment

  • System information:

      CentOS Linux release 7.2.1511 (Core)
      Linux 3.10.0-327.4.4.el7.x86_64 x86_64

  • Prometheus version:

     Docker image: prom/prometheus:v1.3.1
    
  • Docker version:

     Server:
      Version:      1.11.2
      API version:  1.23
      Go version:   go1.5.4
      Git commit:   b9f10c9
      Built:        Wed Jun  1 21:23:11 2016
      OS/Arch:      linux/amd64
    
  • Kubelet version:

     Kubernetes v1.4.1
    
  • Prometheus configuration file:

global:
  scrape_interval: 600s

scrape_configs:
  - job_name: 'cadvisor-exporter'
    scrape_interval: 60s
    scheme: 'http'
    kubernetes_sd_configs:
    - role: node
      api_server: https://API_SERVER
      tls_config:
        ca_file: CA_CRT_PATH
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: (.*)
        replacement: ${1}:4194
        action: replace
        target_label: __address__
      - source_labels: [instance]
        regex: (.*)\.(.*)\.(.*)
        replacement: ${1}
        action: replace
        target_label: hostname
  - job_name: 'node-exporter'
    scrape_interval: 60s
    scheme: 'http'
    kubernetes_sd_configs:
    - role: node
      api_server: https://API_SERVER
      tls_config:
        ca_file: CA_CRT_PATH
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: (.*)
        replacement: ${1}:9100
        action: replace
        target_label: __address__
      - source_labels: [instance]
        regex: (.*)\.(.*)\.(.*)
        replacement: ${1}
        action: replace
        target_label: hostname
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
  - job_name: 'prod-pod-exporter'
    scrape_interval: 60s
    scheme: 'http'
    kubernetes_sd_configs:
    - role: pod
      api_server: https://API_SERVER
      tls_config:
        ca_file: CA_CRT_PATH
    tls_config:
      ca_file: CA_CRT_PATH
      server_name: '*.k8s.local'
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__meta_kubernetes_pod_label_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_container_port_name]
      action: keep
      regex: prometheus-port
    - source_labels: [__address__, __meta_kubernetes_pod_container_port_map_prometheus_port]
      action: replace
      regex: (.+):(?:\d+);(\d+)
      replacement: ${1}:${2}
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_pod_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
  • Logs:
E1109 20:02:54.793285       1 runtime.go:64] Observed a panic: &runtime.TypeAssertionError{interfaceString:"interface {}", concreteString:"cache.DeletedFinalStateUnknown", assertedString:"*v1.Pod", missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/pkg/util/runtime/runtime.go:70
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/pkg/util/runtime/runtime.go:63
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/pkg/util/runtime/runtime.go:49
/usr/local/go/src/runtime/asm_amd64.s:479
/usr/local/go/src/runtime/panic.go:458
/usr/local/go/src/runtime/iface.go:201
/go/src/github.com/prometheus/prometheus/retrieval/discovery/kubernetes/pod.go:77
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/controller.go:182
<autogenerated>:52
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/shared_informer.go:408
/usr/local/go/src/runtime/asm_amd64.s:2086
panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod [recovered]
panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod

goroutine 419 [running]:
  panic(0x17a4060, 0xc514b803c0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/pkg/util/runtime/runtime.go:56 +0x126
panic(0x17a4060, 0xc514b803c0)
/usr/local/go/src/runtime/panic.go:458 +0x243
github.com/prometheus/prometheus/retrieval/discovery/kubernetes.(*Pod).Run.func3(0x1816540, 0xc502773820)
/go/src/github.com/prometheus/prometheus/retrieval/discovery/kubernetes/pod.go:77 +0x166
github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache.ResourceEventHandlerFuncs.OnDelete(0xc4d5821240, 0xc4d5821280, 0xc5266dcf40, 0x1816540, 0xc502773820)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/controller.go:182 +0x49
github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache.(*ResourceEventHandlerFuncs).OnDelete(0xc4d58212a0, 0x1816540, 0xc502773820)
<autogenerated>:52 +0x78
github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache.(*processorListener).run(0xc4cbe56e80, 0xc49c814ba0)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/shared_informer.go:408 +0x356
created by github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache.(*sharedIndexInformer).AddEventHandler
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/1.5/tools/cache/shared_informer.go:257 +0x24b
brian-brazil commented Nov 10, 2016

Sounds like a bug in the k8s client code.

brancz commented Nov 10, 2016

I think this is a problem in our code. This behavior is documented here. I think we just need to check whether the type assertion was successful; if not, return and wait for the next resync to fix the state of the cache. I'll prepare the fix and we'll see what @fabxc says in the review.
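A minimal sketch of that guard, assuming a client-go DeleteFunc handler and the vendored client-go 1.5 package path from the trace above; the function name and the trailing placeholder are illustrative, not the actual code in retrieval/discovery/kubernetes/pod.go:

```go
package main

import (
	"k8s.io/client-go/1.5/pkg/api/v1"
)

// handlePodDelete is a hypothetical stand-in for the informer's DeleteFunc.
func handlePodDelete(obj interface{}) {
	pod, ok := obj.(*v1.Pod)
	if !ok {
		// When the watch misses the final delete event, the informer delivers
		// a cache.DeletedFinalStateUnknown tombstone instead of a *v1.Pod.
		// Returning here instead of asserting avoids the panic; the next
		// resync fixes the state of the cache.
		return
	}
	_ = pod // emit the updated target group for the deleted pod here
}
```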

fabxc commented Nov 12, 2016

Yeah, sounds like we have to handle it (in the operator, too!). The returned object has a key and a stale object, though, which is generally fine for our use case, I think.

We just have to handle it like here.
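For reference, a sketch of what handling it could look like when the stale object is still wanted: cache.DeletedFinalStateUnknown exposes the last known key and object, so the handler can unwrap it instead of dropping the event. The helper name is hypothetical:

```go
package main

import (
	"k8s.io/client-go/1.5/pkg/api/v1"
	"k8s.io/client-go/1.5/tools/cache"
)

// podFromDeleteEvent is a hypothetical helper that returns the *v1.Pod for a
// delete event, unwrapping a DeletedFinalStateUnknown tombstone if necessary.
func podFromDeleteEvent(obj interface{}) (*v1.Pod, bool) {
	if pod, ok := obj.(*v1.Pod); ok {
		return pod, true
	}
	tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
	if !ok {
		return nil, false
	}
	// The tombstone carries the last known key and stale object.
	pod, ok := tombstone.Obj.(*v1.Pod)
	return pod, ok
}
```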

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
