SIGSEGV Segmentation fault while scrape deleted kubernetes endpoints #5172

Closed
ferrandinand opened this Issue Feb 1, 2019 · 4 comments

ferrandinand commented Feb 1, 2019

Bug Report

What did you do?
We had a scrape configuration monitoring Kubernetes endpoints. When files and resources referenced by that configuration are deleted (e.g. certificate files), Prometheus crashes.

What did you expect to see?
Errors during scraping should be caught and returned, so that Prometheus does not crash.

What did you see instead? Under which circumstances?
Prometheus crashed with a segmentation fault when we deleted the Kubernetes cluster and all files associated with that cluster.

Environment

  • System information:
    linux 4.19.9-coreos

  • Prometheus version:
    prometheus version=2.6.0

  • Prometheus configuration file:

- job_name: guest-cluster-2sd23-apiserver
  scheme: https
  kubernetes_sd_configs:
  - api_server: https://master.2sd23
    role: endpoints
    tls_config:
      ca_file: /certs/2sd23-ca.pem
      cert_file: /certs/2sd23-crt.pem
      key_file: /certs/2sd23-key.pem
      insecure_skip_verify: false
  • Logs:
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x6675c2]

goroutine 2529 [running]:
net/http.(*Client).deadline(0x0, 0xc019b57208, 0x40bb2f, 0xc015cd40e0)
        /usr/local/go/src/net/http/client.go:187 +0x22
net/http.(*Client).do(0x0, 0xc04b85ef00, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:527 +0xab
net/http.(*Client).Do(0x0, 0xc04b85ef00, 0x23, 0xc02580e4c0, 0x9)
        /usr/local/go/src/net/http/client.go:509 +0x35
github.com/prometheus/prometheus/scrape.(*targetScraper).scrape(0xc0203c1f50, 0x1eb97e0, 0xc00a3c7a40, 0x1e997a0, 0xc0515e1b20, 0x0, 0x0, 0x0, 0x0)
        /app/scrape/scrape.go:471 +0x111
github.com/prometheus/prometheus/scrape.(*scrapeLoop).run(0xc017ed6e80, 0xdf8475800, 0xdf8475800, 0x0)
        /app/scrape/scrape.go:813 +0x487
created by github.com/prometheus/prometheus/scrape.(*scrapePool).sync
        /app/scrape/scrape.go:336 +0x45d
Member

simonpasquier commented Feb 1, 2019

Tried to reproduce but I couldn't make it crash... Did you reload the Prometheus process?

Member

juliusv commented Feb 4, 2019

Doesn't getting a segfault in Go mean either a bug in Go, memory corruption, or some bug in cgo libraries? But this crash seems to have happened in Go's HTTP client code, judging by the stack trace above?

Member

simonpasquier commented Feb 4, 2019

The stack trace shows that the http.Client pointer is nil (the receiver in the net/http.(*Client) frames is 0x0). Looking at the code, this can happen when the TLS files are unavailable or deleted at the moment Prometheus reloads the scrape configuration. It is very much an edge case, as the TLS files need to be present again when the discovery manager applies the new configuration (edit: the TLS configurations for discovery and scraping are separate, so it is easy to trigger). I'll send a PR...

prometheus/scrape/scrape.go

Lines 144 to 148 in e158c53

client, err := config_util.NewClientFromConfig(cfg.HTTPClientConfig, cfg.JobName)
if err != nil {
	// Any errors that could occur here should be caught during config validation.
	level.Error(logger).Log("msg", "Error creating HTTP client", "err", err)
}

prometheus/scrape/scrape.go

Lines 236 to 240 in e158c53

client, err := config_util.NewClientFromConfig(cfg.HTTPClientConfig, cfg.JobName)
if err != nil {
	// Any errors that could occur here should be caught during config validation.
	level.Error(sp.logger).Log("msg", "Error creating HTTP client", "err", err)
}
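
In both snippets the error is only logged, so the code carries on with a nil client. A minimal sketch of that failure mode and one possible guard follows. It is an illustration under assumptions, not the actual Prometheus code or the eventual fix: newClient is a hypothetical stand-in for config_util.NewClientFromConfig, and scraper is a hypothetical stand-in for targetScraper.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// newClient is a hypothetical stand-in for config_util.NewClientFromConfig.
// It returns a nil client and an error when the TLS files cannot be loaded.
func newClient(tlsFilesPresent bool) (*http.Client, error) {
	if !tlsFilesPresent {
		return nil, errors.New("unable to load TLS files")
	}
	return &http.Client{}, nil
}

// scraper is a hypothetical stand-in for targetScraper: it holds the client
// that every scrape uses.
type scraper struct {
	client *http.Client
}

// scrape returns an error when the client is nil instead of calling a method
// on it, which is what would trigger the nil pointer dereference.
func (s *scraper) scrape(url string) error {
	if s.client == nil {
		return errors.New("scrape skipped: HTTP client was not created")
	}
	resp, err := s.client.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	// Simulate a reload that happens while the TLS files are missing.
	client, err := newClient(false)
	if err != nil {
		// Logging alone, as in the snippets above, leaves client == nil.
		fmt.Println("error creating HTTP client:", err)
	}
	s := &scraper{client: client}
	fmt.Println("scrape result:", s.scrape("https://master.2sd23/metrics"))
}
```

The other option is to stop at the point of creation and propagate the error to the caller, so that a scraper without a client is never built in the first place.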

Member

juliusv commented Feb 4, 2019

Ahh ok, because Go logs panic: runtime error: invalid memory address or nil pointer dereference (which wasn't included in the logs above), and I forgot that it also logs SIGSEGV below in that case (which was included here). Thanks for digging into it.
