
Setting a ulimit on virtual memory causes compaction failures and subsequent data corruption #5135

Closed
sandyteenan opened this Issue Jan 24, 2019 · 8 comments

sandyteenan commented Jan 24, 2019

Proposal

Use case. Why is this important?

Running out of memory during compaction causes data corruption.

Bug Report

What did you do?

Set a ulimit to restrict the virtual memory available to Prometheus on a shared host. In our main instance, the ulimit is currently 40GB. For repro purposes, the ulimit was set to 230000 bytes.

To recreate the issue quickly, the flags --storage.tsdb.min-block-duration 5m and --storage.tsdb.max-block-duration 5m were used.

The scrape interval was also set low (2s) for repro purposes.
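For reference, the repro can be scripted along these lines in a shell (values and paths are illustrative; note that bash's ulimit -v is expressed in kilobytes rather than bytes):

    # Apply a soft virtual-memory limit to this shell and its children,
    # then start Prometheus with short block durations so compaction
    # runs within minutes.
    ulimit -S -v 230000   # kilobytes
    ./prometheus \
      --config.file=prometheus.yml \
      --storage.tsdb.path=./data \
      --storage.tsdb.min-block-duration=5m \
      --storage.tsdb.max-block-duration=5m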

What did you expect to see?

Clean termination of the prometheus process when an OOM condition is encountered.

What did you see instead? Under which circumstances?

Corruption of data directories, resulting in an inability to restart prometheus.

Rapid creation of new directories in the data directory; see the attached prometheus-2.6.0-datadircreationafteroom.txt.

Environment

  • System information:

Linux 3.10.0-957.1.3.el7.x86_64 x86_64

  • Prometheus version:

prometheus, version 2.6.0 (branch: HEAD, revision: dbd1d58)
build user: root@bf5760470f13
build date: 20181217-15:14:46
go version: go1.11.3

  • Prometheus configuration file:

    global:
      scrape_interval: 2s
    scrape_configs:
      - job_name: example
        static_configs:
          - targets:
              - localhost:8000
      - job_name: node_exporter
        static_configs:
          - targets:
              - localhost:9100
      - job_name: prometheus
        static_configs:
          - targets:
              - localhost:9090
  • Logs:

prometheus-2.6.0-ulimit-compactionfailed.txt

prometheus-2.6.0-datacorruptionaftercompactionerrors.txt

prometheus-2.6.0-pprofheap.txt

cstyan (Contributor) commented Jan 24, 2019

May be somewhat related to #4392.

cc @krasi-georgiev @simonpasquier

sandyteenan (Author) commented Jan 24, 2019

#4392 looks like there just isn’t enough space to mmap all the blocks required?

My virtual memory hard ulimit is unlimited, but a soft ulimit is important as Prometheus isn’t the only thing running on this machine.
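For reference, both limits can be checked directly, either for the shell that will launch Prometheus or for an already-running process (the pgrep pattern is just an example):

    # Soft and hard virtual-memory limits of the current shell, in kilobytes
    ulimit -S -v
    ulimit -H -v

    # Limits actually in force for a running Prometheus process, in bytes
    grep 'Max address space' /proc/$(pgrep -o prometheus)/limits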

Does anyone know what dictates when blocks are munmap’d?

cstyan (Contributor) commented Jan 24, 2019

Any time Prometheus opens a Block: compaction opens Blocks, db.reload() opens Blocks when the TSDB is opened at startup, cleaning tombstones does, and so on.
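A rough way to see those mappings from the outside is to list the file-backed mappings of a running instance; every mapped chunk file consumes address space and therefore counts against a virtual-memory ulimit even before its pages are touched (the pgrep pattern is illustrative):

    # Block chunk files mapped by a running Prometheus process
    grep 'chunks' /proc/$(pgrep -o prometheus)/maps | head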

krasi-georgiev (Member) commented Jan 25, 2019

It is safe to delete all but one of the overlapping blocks with the 1548325500000-1548325800000 range, as these are duplicates. I don't think any data is corrupted; the db just couldn't be reloaded because it ran out of memory.

We have a long-standing PR to add a scan command to the tsdb CLI tool; if it gets merged, recovering from cases like this will be easier.
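In the meantime, a sketch of how the duplicate blocks could be spotted before moving them aside (requires jq; the data path is illustrative, and blocks should only be touched while Prometheus is stopped):

    # Print each block directory with its time range; blocks that share the
    # exact same min/max timestamps are duplicates - keep one, move the rest.
    for meta in data/*/meta.json; do
      printf '%s %s\n' "$(dirname "$meta")" \
        "$(jq -r '"\(.minTime)-\(.maxTime)"' "$meta")"
    done | sort -k2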

brian-brazil (Member) commented Jan 25, 2019

The current code is already meant to handle overlaps gracefully, did that break?

sandyteenan (Author) commented Jan 25, 2019

This one, @krasi-georgiev? prometheus/tsdb#320. Thanks - I'll have a read through.

Are all blocks mmap'd at startup? I'm trying to understand the memory requirements so I can play nicely with others.

brian-brazil (Member) commented Jan 25, 2019

Yes, everything is mmap'd at startup.
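That shows up directly in the process accounting: the virtual size roughly tracks the total size of the blocks on disk, while the resident size stays much smaller because pages are only faulted in on access (pid lookup and data path are illustrative):

    pid=$(pgrep -o prometheus)
    # VmSize counts all mmap'd block files; VmRSS only the pages in memory
    grep -E '^Vm(Size|RSS)' /proc/$pid/status
    # Total on-disk size of the block chunk files, for comparison
    du -ch data/*/chunks/* 2>/dev/null | tail -n 1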

krasi-georgiev (Member) commented Jan 25, 2019

@sandyteenan Yes, sorry, I forgot to link it.

We run benchmarks occasionally, and it seems that compaction needs around 10% more memory.

You can look at the latest bench test here:
http://prombench.prometheus.io/grafana/d/7gmLoNDmz/prombench?orgId=1&var-RuleGroup=All&var-pr-number=5123&from=1548083380645&to=1548209109677

The Prometheus dashboard shows jumps of around 50%, but the node exporter dashboards show jumps of around 10%. I'm not sure why that is, but 10% looks close to what we can expect.
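Given that, one rough sanity check is to compare the virtual memory Prometheus reports about itself against the limit in force, and keep the soft ulimit comfortably (10% or more) above the steady-state virtual size (the port is the default and the pgrep pattern is illustrative):

    # Virtual and resident memory as reported by Prometheus's own metrics
    curl -s http://localhost:9090/metrics | \
      grep -E '^process_(virtual|resident)_memory_bytes'
    # The limit actually applied to the process, in bytes
    grep 'Max address space' /proc/$(pgrep -o prometheus)/limits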
