Panic "divide by zero" in remote_write loop #2808

Closed
cubranic opened this Issue Jun 5, 2017 · 5 comments

cubranic commented Jun 5, 2017

What did you do?
While Prometheus was using the example remote write receiver as its remote write endpoint, I suspended the receiver with Ctrl-Z. Prometheus crashed with the following message:

Jun 05 09:58:08 node0.vagrant.test prometheus[13105]: time="2017-06-05T09:58:08-07:00" level=warning msg="Error sending 100 samples to remote storage: context deadline exceeded" source="queue_manager.go:500"
Jun 05 09:58:39 node0.vagrant.test prometheus[13105]: time="2017-06-05T09:58:39-07:00" level=warning msg="Error sending 100 samples to remote storage: context deadline exceeded" source="queue_manager.go:500"
Jun 05 09:58:44 node0.vagrant.test prometheus[13105]: time="2017-06-05T09:58:44-07:00" level=info msg="Completed maintenance sweep through 4 archived fingerprints in 40.00907729s." source="storage.go:1423"
Jun 05 09:59:09 node0.vagrant.test prometheus[13105]: time="2017-06-05T09:59:09-07:00" level=warning msg="Error sending 100 samples to remote storage: context deadline exceeded" source="queue_manager.go:500"
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: time="2017-06-05T09:59:24-07:00" level=info msg="Remote storage resharding from 1000 to 0 shards." source="queue_manager.go:351"
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: panic: runtime error: integer divide by zero
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: goroutine 203 [running]:
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/storage/remote.(*shards).enqueue(0xc462252840, 0xc4600ba0a0, 0x0)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: /go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:430 +0xcb
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/storage/remote.(*QueueManager).Append(0xc4200c4540, 0xc457e6efe0, 0xc42033e330, 0x0)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: /go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:239 +0x22e
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/storage/remote.(*Writer).Append(0xc42033e330, 0xc457e6efe0, 0x0, 0x0)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: /go/src/github.com/prometheus/prometheus/storage/remote/write.go:79 +0xa8
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/storage.Fanout.Append(0xc420519180, 0x2, 0x2, 0xc457e6efe0, 0x199eb60, 0xc421da8e00)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: /go/src/github.com/prometheus/prometheus/storage/storage.go:60 +0x66
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/storage.(*Fanout).Append(0xc420519600, 0xc457e6efe0, 0x26dec90, 0xc462b17c20)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: <autogenerated>:3 +0x69
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/retrieval.(*countingAppender).Append(0xc4600ba040, 0xc457e6efe0, 0xc420b1fa90, 0xc462b17cd8)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: /go/src/github.com/prometheus/prometheus/retrieval/target.go:261 +0x48
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/retrieval.ruleLabelsAppender.Append(0x262bb00, 0xc4600ba040, 0xc422cada10, 0xc457e6efe0, 0xc4600ba060, 0x19272e0)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: /go/src/github.com/prometheus/prometheus/retrieval/target.go:201 +0x1af
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/retrieval.(*ruleLabelsAppender).Append(0xc4600ba060, 0xc457e6efe0, 0x262e800, 0xc4600ba060)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: <autogenerated>:35 +0x69
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/retrieval.(*scrapeLoop).append(0xc42044fb20, 0xc4238bf300, 0x275, 0x330, 0x236f1514, 0x26ea4e0, 0xc4238bf300)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:519 +0x2ec
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: github.com/prometheus/prometheus/retrieval.(*scrapeLoop).run(0xc42044fb20, 0x37e11d600, 0x2540be400, 0x0)
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:429 +0x5b8
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: created by github.com/prometheus/prometheus/retrieval.(*scrapePool).sync
Jun 05 09:59:24 node0.vagrant.test prometheus[13105]: /go/src/github.com/prometheus/prometheus/retrieval/scrape.go:258 +0x3a6
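For context on the trace above: the last log line before the panic reports resharding from 1000 down to 0 shards, and the panic originates in (*shards).enqueue at queue_manager.go:430, which routes each sample to one of the shard queues. The following is a minimal sketch of that failure mode, not the actual Prometheus source; the detail that samples are routed by taking a fingerprint modulo the shard count is an assumption here. With zero shards, that routing step is an integer divide by zero:

// Sketch of the failure mode only (hypothetical types and names).
package main

import "fmt"

type shards struct {
	queues []chan float64 // one queue per shard
}

func (s *shards) enqueue(fingerprint uint64, v float64) {
	// With len(s.queues) == 0, this modulo panics with
	// "runtime error: integer divide by zero".
	i := fingerprint % uint64(len(s.queues))
	s.queues[i] <- v
}

func main() {
	s := &shards{} // as if resharded from 1000 down to 0 shards
	defer func() {
		fmt.Println("recovered:", recover())
	}()
	s.enqueue(0x9a3c, 1.0)
}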
  • System information:

Linux 3.10.0-327.el7.x86_64 x86_64

  • Prometheus version:
prometheus, version 1.6.3 (branch: master, revision: c580b60c67f2c5f6b638c3322161bcdf6d68d7fc)
  build user:       root@a6410e65f5c7
  build date:       20170522-09:15:06
  go version:       go1.8.1
  • Prometheus configuration file:
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

remote_write:
  - url: "http://localhost:1234/receive"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets:
        - node0.vagrant.test:9100
brian-brazil (Member) commented Jun 5, 2017

cubranic (Author) commented Jun 5, 2017

The issue is reproducible, but you have to keep the remote storage adapter suspended for at least a couple of minutes. During that time, Prometheus keeps running and logs warnings that sends to the remote write endpoint are exceeding the context deadline. After unsuspending the receiver, it prints one batch of samples it received and then nothing more, and checking on Prometheus shows that it has panicked.

cubranic (Author) commented Jun 5, 2017

Actually, you don't even have to resume the suspended receiver; just leave it suspended for five minutes or so.

tomwilkie (Member) commented Jul 25, 2017

This will be fixed when #2973 is merged.
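(The class of fix for this panic is a guard in the resharding path so the number of shards can never reach zero; whether #2973 takes exactly this approach is not shown here. A minimal sketch, with clampShards as a hypothetical helper name:)

// Hypothetical helper for illustration only.
// Clamping the desired shard count to at least one prevents the
// modulo-by-zero in enqueue when send throughput collapses to zero.
func clampShards(desired, max int) int {
	if desired < 1 {
		desired = 1
	}
	if desired > max {
		desired = max
	}
	return desired
}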

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
