
Prometheus Stable (0.14.0) segfault in local/storage.go #895

Closed
mwitkow opened this Issue Jul 16, 2015 · 6 comments


mwitkow (Contributor) commented Jul 16, 2015

We're running:

prometheus, version 0.14.0 (branch: stable, revision: 67e7741)
  build user:       root
  build date:       20150603-06:20:33
  go version:       1.4.2

We're seeing Prometheus segfault periodically on multiple instances. We periodically send SIGHUP to reload configs, and we see warnings like

time="2015-07-15T16:25:49Z" level=warning msg="Error expanding alert template OurMagicalAlert with data '{map[deployment_name:Foobar metric:Load severity:page] 13}': error parsing template __alert_OurMagicalAlert: template: __alert_OurMagicalAlert:1: function \"labels\" not defined" file=manager.go line=201

a couple of times over a window of 2-3 minutes before:

unexpected fault address 0xc2172e87b0
fatal error: fault
[signal 0xb code=0x2 addr=0xc2172e87b0 pc=0xc2172e87b0]

goroutine 81 [running]:
runtime.gothrow(0xb13be0, 0x5)
    /usr/lib/go/src/runtime/panic.go:503 +0x8e fp=0xc2172e85c0 sp=0xc2172e85a8
runtime.sigpanic()
    /usr/lib/go/src/runtime/sigpanic_unix.go:29 +0x265 fp=0xc2172e8610 sp=0xc2172e85c0
created by github.com/prometheus/prometheus/storage/local.(*memorySeriesStorage).Start
    /go/src/github.com/prometheus/prometheus/storage/local/storage.go:240 +0x502

goroutine 1 [select, 19 minutes]:

We have the full set of stack traces available off the record.

Is this a known issue?

LOLz bonus: I posted this by mistake to fleet: coreos/fleet#1309 ;)
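
For context, the 'function "labels" not defined' warning above is what Go's template engine reports when an alert's SUMMARY or DESCRIPTION refers to the label set as a bare labels instead of the $labels variable ($value holds the firing value). A hypothetical rule in the ALERT/WITH/SUMMARY/DESCRIPTION syntax of that era, where the alert name and label names are taken from the log above and the rest is invented, might look like:

ALERT OurMagicalAlert
  IF some_load_metric > 10
  FOR 2m
  WITH { severity = "page" }
  # Broken: the bare "labels" is parsed as an undefined template function.
  SUMMARY "Load is high on {{labels.deployment_name}}"
  # Working: go through the $labels and $value template variables instead.
  DESCRIPTION "Load on {{$labels.deployment_name}} is {{$value}}"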

fabxc (Member) commented Jul 16, 2015

This is not a known issue.

The rule file containing the alert that raises the warning, plus any additional stack traces, would probably help a lot in tracking this down. Does the segfault always happen in storage.go:240?

mwitkow (Contributor) commented Jul 16, 2015

@fabxc, I'm happy to pass them along off the public record.

fabxc (Member) commented Jul 16, 2015

mwitkow (Contributor) commented Jul 17, 2015

So yeah, it seems it was an OS issue. Rolling back CoreOS from 735.0.0 to 723.0.0 made Prometheus stable again.

It seems that 735.0.0 was broken, as per the notes for a newer release:
https://coreos.com/releases/#745.1.0

We experienced similar issues with etcd and other Go programs. Apologies if we've wasted your time.

fabxc (Member) commented Jul 17, 2015

No worries at all. Glad this could be resolved quickly.
Closing here then.

fabxc closed this Jul 17, 2015

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
