Prometheus 2.0.0-rc.0 stuck compacting blocks #3295

Closed
zemek opened this Issue Oct 13, 2017 · 3 comments

zemek commented Oct 13, 2017

What did you do?
Took a snapshot on one server and copied it to another server

What did you expect to see?
Normal operation

What did you see instead? Under which circumstances?
After a couple of hours, both Prometheus servers maxed out memory usage, and CPU usage settled at a steady 40%.
Logs showed the same repeated message for 12+ hours:

level=info ts=2017-10-13T06:34:10.516142274Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T06:34:16.543112573Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T06:34:22.568097673Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T06:34:28.514276967Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T06:34:34.472478839Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
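
For readers decoding the log lines above: the `mint` and `maxt` values look like Unix timestamps in milliseconds. A minimal sketch under that assumption (the helper name `ms_to_utc` is made up for illustration) converts them to UTC:

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> str:
    """Convert a millisecond Unix timestamp to an ISO 8601 UTC string."""
    seconds = ms // 1000  # drop the millisecond part; it is zero in these logs
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Values copied from the repeated "compact blocks" message.
mint, maxt = 1507852800000, 1507860000000
print(ms_to_utc(mint))             # 2017-10-13T00:00:00Z
print(ms_to_utc(maxt))             # 2017-10-13T02:00:00Z
print((maxt - mint) / 3_600_000)   # 2.0 (hours)
```

If that reading is right, every repeated message refers to the same two-hour window, 00:00 to 02:00 UTC on Oct 13.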

It also looks like it keeps creating new directories inside the data directory:

/opt/prometheus/data$ ls | wc -l
5705

Disk usage also grew from roughly 20GB originally to 600GB+:

/opt/prometheus/data$ du -sh
683G	.
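
To check whether the extra directories are repeated blocks for the same time range, something like the following could help. It is only a sketch: it assumes each block directory holds a `meta.json` with `minTime` and `maxTime` fields, and the function names `blocks_by_range` and `duplicated_ranges` are made up for illustration.

```python
import json
from collections import defaultdict
from pathlib import Path


def blocks_by_range(data_dir):
    """Group block directory names by the (minTime, maxTime) in their meta.json."""
    groups = defaultdict(list)
    for meta_path in Path(data_dir).glob("*/meta.json"):
        try:
            meta = json.loads(meta_path.read_text())
            key = (meta["minTime"], meta["maxTime"])
        except (OSError, ValueError, KeyError, TypeError):
            continue  # skip unreadable or incomplete block directories
        groups[key].append(meta_path.parent.name)
    return dict(groups)


def duplicated_ranges(data_dir):
    """Return only the time ranges that are covered by more than one block."""
    return {
        rng: sorted(names)
        for rng, names in blocks_by_range(data_dir).items()
        if len(names) > 1
    }
```

Running `duplicated_ranges("/opt/prometheus/data")` would list every time range that has more than one block directory, which is what I would expect to see if the same range keeps being compacted into new blocks.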

Environment

  • System information:

    Linux 4.4.0-97-generic x86_64

  • Prometheus version:

    prometheus, version 2.0.0-rc.0 (branch: HEAD, revision: 012e52e)
    build user: root@d94ce23e8b9b
    build date: 20171005-14:42:30
    go version: go1.9.1

  • Alertmanager version:

    N/A

  • Prometheus configuration file:

global:
  scrape_interval:     15s # Scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

rule_files:
  - /etc/prometheus/recording_rules.yml

scrape_configs:
  - job_name: 'default'
    consul_sd_configs:
      - server: '127.0.0.1:8500'
        services: ['node_exporter', 'statsd_exporter', 'prometheus']
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: '.*,host=([^,]+),.*'
        replacement: '${1}'
        target_label: 'host'
      - source_labels: [__meta_consul_tags]
        regex: '.*,role=([^,]+),.*'
        replacement: '${1}'
        target_label: 'role'
  • Logs:
    Prometheus server that was restored from a snapshot:
level=info ts=2017-10-13T01:32:53.419554834Z caller=main.go:213 msg="Starting prometheus" version="(version=2.0.0-rc.0, branch=HEAD, revision=012e52e3f9a0c1741b160498615a29bd5d723028)"
level=info ts=2017-10-13T01:32:53.419733311Z caller=main.go:214 build_context="(go=go1.9.1, user=root@d94ce23e8b9b, date=20171005-14:42:30)"
level=info ts=2017-10-13T01:32:53.419761687Z caller=main.go:215 host_details="(Linux 4.4.0-97-generic #120~14.04.1-Ubuntu SMP Wed Sep 20 15:53:13 UTC 2017 x86_64 ip-10-1-30-236 (none))"
level=info ts=2017-10-13T01:32:53.42156497Z caller=web.go:378 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2017-10-13T01:32:53.421582077Z caller=main.go:307 msg="Starting TSDB"
level=info ts=2017-10-13T01:32:53.421585365Z caller=targetmanager.go:68 component="target manager" msg="Starting target manager..."
level=info ts=2017-10-13T01:32:53.623376842Z caller=main.go:319 msg="TSDB started"
level=info ts=2017-10-13T01:32:53.623431997Z caller=main.go:386 msg="Loading configuration file" filename=/etc/prometheus/config.yml
level=info ts=2017-10-13T01:32:53.624480624Z caller=main.go:363 msg="Server is ready to receive requests."
level=info ts=2017-10-13T04:32:45.74589705Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T04:32:49.727479508Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T04:32:53.780102332Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T04:32:57.920778193Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T04:33:02.055650537Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000

Prometheus server that was used to take a snapshot:

level=info ts=2017-10-13T01:16:39.210476395Z caller=main.go:319 msg="TSDB started"
level=info ts=2017-10-13T01:16:39.210555562Z caller=main.go:386 msg="Loading configuration file" filename=/etc/prometheus/config.yml
level=info ts=2017-10-13T01:16:39.21175497Z caller=main.go:363 msg="Server is ready to receive requests."
level=info ts=2017-10-13T01:31:53.210713496Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW6TZNFXDWJ24PHNQXR8W7X6
level=info ts=2017-10-13T01:31:53.210980147Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW71HQXGGXGRH1BC42703568
level=info ts=2017-10-13T01:31:53.211173725Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW8ZB1A4GBAVD63BH4CZT5JK
level=info ts=2017-10-13T01:31:53.211294574Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9KWHYH42PEARDCMJ6NG797
level=info ts=2017-10-13T01:31:53.211389436Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9KRVA3275HAA0G2V81K75M
level=info ts=2017-10-13T01:31:53.211465433Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9KTE5BEX276P4AVHEX0CR9
level=info ts=2017-10-13T01:31:53.211541616Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9MSYHJR1K6MEZHNYB54T92
level=info ts=2017-10-13T01:31:53.21167316Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9MVGA943JWNG4T4Y04YKXF
level=info ts=2017-10-13T01:31:53.211768436Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507857383727 maxt=1507858313200
level=info ts=2017-10-13T01:32:45.727503864Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW6TZNFXDWJ24PHNQXR8W7X6
level=info ts=2017-10-13T01:32:45.727831081Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW71HQXGGXGRH1BC42703568
level=info ts=2017-10-13T01:32:45.72803462Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW8ZB1A4GBAVD63BH4CZT5JK
level=info ts=2017-10-13T01:32:45.728167841Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9KWHYH42PEARDCMJ6NG797
level=info ts=2017-10-13T01:32:45.728309847Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9KRVA3275HAA0G2V81K75M
level=info ts=2017-10-13T01:32:45.728400728Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9KTE5BEX276P4AVHEX0CR9
level=info ts=2017-10-13T01:32:45.728497724Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9MSYHJR1K6MEZHNYB54T92
level=info ts=2017-10-13T01:32:45.728572796Z caller=db.go:594 component=tsdb msg="snapshotting block" block=01BW9MVGA943JWNG4T4Y04YKXF
level=info ts=2017-10-13T01:32:45.728665202Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507857383727 maxt=1507858365705
level=info ts=2017-10-13T04:16:23.753186578Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T04:16:27.990981021Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
level=info ts=2017-10-13T04:16:32.138246912Z caller=compact.go:360 component=tsdb msg="compact blocks" count=1 mint=1507852800000 maxt=1507860000000
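
To quantify the loop from a log file, a small sketch like this counts how often each `mint`/`maxt` pair shows up in "compact blocks" messages. The regex is written against the log lines pasted above and may not match other Prometheus versions; `compaction_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the tail of the "compact blocks" lines as they appear in this issue.
COMPACT_RE = re.compile(r'msg="compact blocks" count=(\d+) mint=(\d+) maxt=(\d+)')


def compaction_counts(lines):
    """Count "compact blocks" log entries per (mint, maxt) pair."""
    counts = Counter()
    for line in lines:
        match = COMPACT_RE.search(line)
        if match:
            counts[(int(match.group(2)), int(match.group(3)))] += 1
    return counts


# Usage sketch: report ranges that were compacted more than once.
# with open("prometheus.log") as fh:   # hypothetical log file path
#     for rng, n in compaction_counts(fh).most_common():
#         if n > 1:
#             print(rng, n)
```

A healthy server should compact a given range once, so any pair with a high count points at the stuck range.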

@zemek zemek changed the title Prometheus stuck compacting blocks Prometheus 2.0.0-rc.0 stuck compacting blocks Oct 13, 2017


zemek commented Oct 13, 2017

It might also be helpful to note that I did this a couple of times between fresh boxes, i.e.:
snapshot server1 and restore to new server2
destroy server1
snapshot server2 and restore to new server3


zemek commented Oct 17, 2017

I'm going to assume this was me accidentally restoring without wiping the data directory.
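
If that really was the cause (it is only a guess, never confirmed in this thread), a restore step that refuses to copy into a non-empty data directory would have caught it. A minimal sketch, assuming the snapshot is a plain directory of block directories and that Prometheus is stopped while copying; `restore_snapshot` is a made-up helper, not part of Prometheus:

```python
import shutil
from pathlib import Path


def restore_snapshot(snapshot_dir, data_dir):
    """Copy a snapshot into data_dir, but only if data_dir is empty or missing."""
    src, dst = Path(snapshot_dir), Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"snapshot directory not found: {src}")
    if dst.exists() and any(dst.iterdir()):
        # Leftover blocks here would end up mixed with the restored ones.
        raise RuntimeError(f"data directory is not empty: {dst}")
    # dirs_exist_ok needs Python 3.8+; it allows dst to exist as an empty dir.
    shutil.copytree(src, dst, dirs_exist_ok=True)
```

Failing loudly felt safer than wiping the directory automatically, since the old data may still be wanted.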

@zemek zemek closed this Oct 17, 2017


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
