Panic using Azure service discovery #4779

Closed
mblaschke opened this Issue Oct 24, 2018 · 2 comments


mblaschke commented Oct 24, 2018

Bug Report

Panic using Azure service discovery

Environment

  • System information:
    Linux 4.15.0-1021-azure x86_64
    Docker version 18.06.1-ce, build e68fc7a

  • Prometheus version:
    image: prom/prometheus:v2.4.3

  • Prometheus configuration file:

  - job_name: 'azurerm-vm-node'
    scrape_interval: 1m
    azure_sd_configs:
    - subscription_id: aaaaa
      tenant_id: xxxxxx
      client_id: xxxxxx
      client_secret: xxxxxx
      port: 9100
    - subscription_id: bbbbb
      tenant_id: xxxxxx
      client_id: xxxxxx
      client_secret: xxxxxx
      port: 9100
    - subscription_id: ccccc
      tenant_id: xxxxxx
      client_id: xxxxxx
      client_secret: xxxxxx
      port: 9100
    - subscription_id: ddddd
      tenant_id: xxxxxx
      client_id: xxxxxx
      client_secret: xxxxxx
      port: 9100
    relabel_configs:
    - source_labels: [__meta_azure_machine_location]
      target_label: location
    - source_labels: [__meta_azure_machine_name]
      target_label: nodename
    - source_labels: [__meta_azure_machine_os_type]
      target_label: os
    - source_labels: [__meta_azure_machine_resource_group]
      target_label: resourcegroup
    - source_labels: [__meta_azure_machine_tag_owner]
      target_label: team
    - source_labels: [__meta_azure_machine_id]
      regex: "/subscriptions/aaaaa/.*"
      target_label: "environment"
      replacement: "aaaaa"
    - source_labels: [__meta_azure_machine_id]
      regex: "/subscriptions/bbbbb/.*"
      target_label: "environment"
      replacement: "bbbbb"
    - source_labels: [__meta_azure_machine_id]
      regex: "/subscriptions/ccccc/.*"
      target_label: "environment"
      replacement: "ccccc"
    - source_labels: [__meta_azure_machine_id]
      regex: "/subscriptions/ddddd/.*"
      target_label: "environment"
      replacement: "ddddd"

  - job_name: 'azurerm-vm-wmi'
    scrape_interval: 1m
    azure_sd_configs:
    - subscription_id: aaaaa
      tenant_id: xxxxxx
      client_id: xxxxxx
      client_secret: xxxxxx
      port: 9182
    - subscription_id: bbbbb
      tenant_id: xxxxxx
      client_id: xxxxxx
      client_secret: xxxxxx
      port: 9182
    - subscription_id: ccccc
      tenant_id: xxxxxx
      client_id: xxxxxx
      client_secret: xxxxxx
      port: 9182
    - subscription_id: ddddd
      tenant_id: xxxxxx
      client_id: xxxxxx
      client_secret: xxxxxx
      port: 9182
    relabel_configs:
    - source_labels: [__meta_azure_machine_location]
      target_label: location
    - source_labels: [__meta_azure_machine_name]
      target_label: nodename
    - source_labels: [__meta_azure_machine_os_type]
      target_label: os
    - source_labels: [__meta_azure_machine_resource_group]
      target_label: resourcegroup
    - source_labels: [__meta_azure_machine_tag_owner]
      target_label: team
    - source_labels: [__meta_azure_machine_id]
      regex: "/subscriptions/aaaaa/.*"
      target_label: "environment"
      replacement: "aaaaa"
    - source_labels: [__meta_azure_machine_id]
      regex: "/subscriptions/bbbbb/.*"
      target_label: "environment"
      replacement: "bbbbb"
    - source_labels: [__meta_azure_machine_id]
      regex: "/subscriptions/ccccc/.*"
      target_label: "environment"
      replacement: "ccccc"
    - source_labels: [__meta_azure_machine_id]
      regex: "/subscriptions/ddddd/.*"
      target_label: "environment"
      replacement: "ddddd"

  • Logs:
level=info ts=2018-10-24T06:03:43.686740841Z caller=main.go:564 msg="TSDB started"
level=info ts=2018-10-24T06:03:43.686811045Z caller=main.go:624 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-10-24T06:03:43.693751524Z caller=main.go:650 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-10-24T06:03:43.693789126Z caller=main.go:523 msg="Server is ready to receive web requests."
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x8e80c8]

goroutine 1615 [running]:
github.com/prometheus/prometheus/discovery/azure.(*Discovery).refresh.func2(0xc02106a870, 0xc006b24fc0, 0xc00050a840, 0x3e, 0xc0294336b0, 0xa5, 0xc0206f29f0, 0xc, 0xc022aba340, 0x39, ...)
        /go/src/github.com/prometheus/prometheus/discovery/azure/azure.go:322 +0x588
created by github.com/prometheus/prometheus/discovery/azure.(*Discovery).refresh
        /go/src/github.com/prometheus/prometheus/discovery/azure/azure.go:282 +0x674
Contributor

sylr commented Oct 24, 2018

I've got exactly the same issue (same Prometheus version):

level=error ts=2018-10-24T03:46:32.164842306Z caller=engine.go:526 component="query engine" msg="error expanding series set" err="context canceled"
level=error ts=2018-10-24T04:21:26.967205519Z caller=api.go:1013 component=web msg="error writing response" bytesWritten=0 err="write tcp 10.199.0.149:9090->10.100.3.133:50800: write: broken pipe"
level=error ts=2018-10-24T04:30:03.85725554Z caller=engine.go:526 component="query engine" msg="error expanding series set" err="context canceled"
level=error ts=2018-10-24T04:30:04.864806469Z caller=engine.go:526 component="query engine" msg="error expanding series set" err="context canceled"
level=error ts=2018-10-24T06:01:34.044734663Z caller=engine.go:526 component="query engine" msg="error expanding series set" err="context canceled"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x8e80c8]
goroutine 3621262 [running]:
github.com/prometheus/prometheus/discovery/azure.(*Discovery).refresh.func2(0xc05cc34b70, 0xc07237e960, 0xc0a25a0b00, 0x11, 0xc06766b860, 0x9b, 0xc00ae86aa0, 0x16, 0xc02932f380, 0x21, ...)
	/go/src/github.com/prometheus/prometheus/discovery/azure/azure.go:322 +0x588
created by github.com/prometheus/prometheus/discovery/azure.(*Discovery).refresh
	/go/src/github.com/prometheus/prometheus/discovery/azure/azure.go:282 +0x674
Member

simonpasquier commented Oct 24, 2018

It is crashing here:

if networkInterface.Properties.Primary == nil {
	level.Debug(d.logger).Log("msg", "Skipping deallocated virtual machine", "machine", vm.Name)
	ch <- target{}
	return
}

This part of the code hasn't changed recently so it is likely that the Azure API is returning something different.

@mblaschke and/or @sylr, can you test with the condition changed to: if networkInterface.Properties == nil || networkInterface.Properties.Primary == nil?
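To illustrate why the extra check matters, here is a minimal, self-contained sketch. The types `interfaceProperties` and `networkInterface` and the helper `shouldSkip` are hypothetical stand-ins invented for this example, not the Azure SDK's types or the actual Prometheus code. The sketch assumes the panic comes from `Properties` itself being nil, which is consistent with the proposed condition but has not been confirmed in this thread.

```go
package main

import "fmt"

// Hypothetical stand-ins for the SDK's network interface types.
// Only the fields relevant to the nil check are modelled here.
type interfaceProperties struct {
	Primary *bool
}

type networkInterface struct {
	Properties *interfaceProperties
}

// shouldSkip reports whether an interface lacks the data needed to build a
// target. Properties is tested first, so the second operand of || is never
// evaluated (and never dereferenced) when Properties is nil.
func shouldSkip(nic networkInterface) bool {
	return nic.Properties == nil || nic.Properties.Primary == nil
}

func main() {
	primary := true

	noProps := networkInterface{}
	noPrimary := networkInterface{Properties: &interfaceProperties{}}
	complete := networkInterface{Properties: &interfaceProperties{Primary: &primary}}

	fmt.Println(shouldSkip(noProps))   // true: Properties is nil
	fmt.Println(shouldSkip(noPrimary)) // true: Primary is nil
	fmt.Println(shouldSkip(complete))  // false: both pointers are set
}
```

With only the original `Properties.Primary == nil` check, the first case would dereference a nil pointer and panic instead of being skipped.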

sylr added a commit to sylr/prometheus that referenced this issue Nov 15, 2018

Prevent Azure SD panic (fix prometheus#4779)
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>

sylr added a commit to sylr/prometheus that referenced this issue Dec 15, 2018

Prevent Azure SD panic (fix prometheus#4779) (prometheus#4867)
cherry picked from commit 1fd3b33

Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>