
Wide range of timestamps leads to running out of file descriptors #2725

Closed
brian-brazil opened this Issue May 16, 2017 · 4 comments

brian-brazil (Member) commented May 16, 2017

I just fired up a Prometheus 2.0 with a fresh data directory. I was manually specifying timestamps and specified a timestamp in 1970 by mistake. This led to 1018 directories being created in the data directory, and did not end well:

INFO[0000] Starting prometheus (version=2.0.0-alpha.0, branch=staleness, revision=ff3fd7e931cf4fb55e0e3177f7395a33eb3485c5)  source=main.go:73
INFO[0000] Build context (go=go1.8, user=bbrazil@kozo, date=20170516-13:04:56)  source=main.go:74
INFO[0000] Loading configuration file prometheus.yml     source=main.go:217
INFO[0000] Starting target manager...                    source=targetmanager.go:65
INFO[0000] Listening on :9090                            source=web.go:260
ts=2017-05-16T13:09:01.296898914Z caller=db.go:234 msg="compaction failed" err="plan compaction: open data: too many open files"
ERRO[0011] append failed                                 err=open data/b-001017/wal: too many open files source=scrape.go:519 target={__address__="kozo:9100", __metrics_path__="/metrics", __scheme__="http", instance="kozo:9100", job="prometheus"}
ts=2017-05-16T13:09:01.297235707Z caller=compact.go:198 msg="compact blocks" blocks="[(1, 01BG8P0A4HF9SCRYEB51CTWCJX)]"
ts=2017-05-16T13:09:01.297510717Z caller=db.go:234 msg="compaction failed" err="persist head block: open chunk writer: open data/b-000001.tmp/chunks: too many open files"
ERRO[0011] append failed                                 err=not found source=scrape.go:519 target={__address__="kozo:9100", __metrics_path__="/metrics", __scheme__="http", instance="kozo:9100", job="prometheus"}

This should be handled more gracefully.
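
For illustration only (this snippet is not from the original report): in the Prometheus text exposition format, a sample with an explicitly set timestamp looks roughly like the line below. The trailing field is milliseconds since the Unix epoch, so a value at or near zero places the sample in 1970. The metric name is made up.

```
# TYPE example_metric gauge
example_metric 42 0
```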

fabxc (Member) commented May 16, 2017

Yeah, the initial invariant was that all blocks must be non-overlapping and have no gaps. I realised pretty early on that this would cause issues, and prometheus/tsdb#80 should kill that idea for good.

I'm not sure, though, that we really want to handle a case like the one described here at all. The data would be deleted right away anyway due to retention, and even if it weren't, our limited append window means it would never really work.
This should probably be caught by the timestamp sanitisation we have discussed before: if a sample's timestamp is more than one minimum block size before or after wall time, reject the sample for appends.
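
A minimal sketch of what such a sanitisation check could look like, assuming a minimum block size of 2h; `validateSampleTimestamp`, `minBlockSize`, and the error value are hypothetical names used for illustration, not existing tsdb APIs.

```go
package sanitize

import (
	"errors"
	"time"
)

// minBlockSize is assumed to match tsdb's default minimum block range of 2h.
const minBlockSize = 2 * time.Hour

var errOutOfBounds = errors.New("sample timestamp too far from wall time")

// validateSampleTimestamp rejects a sample whose timestamp (milliseconds since
// the Unix epoch) lies more than one minimum block size before or after the
// current wall time.
func validateSampleTimestamp(tsMillis int64, now time.Time) error {
	t := time.Unix(0, tsMillis*int64(time.Millisecond))
	if t.Before(now.Add(-minBlockSize)) || t.After(now.Add(minBlockSize)) {
		return errOutOfBounds
	}
	return nil
}
```

With a check along these lines, a sample stamped 1970 would be rejected at append time instead of forcing the creation of blocks covering decades of empty time range.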

brian-brazil (Member, Author) commented May 16, 2017

The issue here is more that we ran out of FDs.

fabxc (Member) commented May 16, 2017

Yes, because it created 1080 blocks, all of which it kept open. In a practical setup that applies the guards I mention above, this can no longer happen.
Beyond that, any realistic setup shouldn't hit that limit. If it turns out one does, we can re-investigate, but for now I wouldn't start closing blocks and reopening them on demand.

The maximum block size is 10% of the retention time range by default, so the older 90% of the window is covered by at most 9 fully compacted blocks. Assume the most recent 10% of the time range holds 20 less-compacted blocks; that puts us at 29 blocks in total, with a handful of open FDs each. That should be fine in practice. Within reasonable limits, FDs are not an expensive resource.
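
A quick sketch of that arithmetic, using the figures above (a maximum block size of 10% of retention and 20 less-compacted recent blocks) plus an assumed handful of FDs per block; `estimateOpenBlocks` is a hypothetical helper for illustration, not tsdb code.

```go
package main

import (
	"fmt"
	"math"
)

// estimateOpenBlocks gives a rough upper bound on simultaneously open blocks:
// the older part of the retention window is covered by fully compacted blocks,
// each spanning maxBlockFraction of the retention time range, while the most
// recent maxBlockFraction of the window holds recentBlocks less-compacted blocks.
func estimateOpenBlocks(maxBlockFraction float64, recentBlocks int) int {
	fullyCompacted := int(math.Round(1/maxBlockFraction)) - 1 // 1/0.10 - 1 = 9
	return fullyCompacted + recentBlocks                      // 9 + 20 = 29
}

func main() {
	blocks := estimateOpenBlocks(0.10, 20)
	fdsPerBlock := 5 // assumed "handful" of file descriptors per block
	fmt.Printf("~%d blocks, ~%d FDs\n", blocks, blocks*fdsPerBlock)
}
```

Even with a generous estimate of FDs per block, this stays far below a typical default limit of 1024 open files.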
