Need overflow check for `--storage.tsdb.retention.time` #5398
Comments
beorn7 added kind/bug and component/local storage labels Mar 22, 2019
pranjaltale16 commented Mar 23, 2019
Hello, I would like to work on this issue. I think we can add a check here. I'm new to Prometheus, so I'm not sure whether this is the best way to fix it or not. Let me know if there is some other way to fix this issue.
There is already a check: prometheus/cmd/prometheus/main.go, lines 288 to 296 in 844af4c. It should have been set to 100y if there was an overflow. Not sure why it showed negative on the UI.
@beorn7 was there any log mentioning that the time retention value is too high?
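(The embedded snippet does not render here. Reconstructed from the version quoted later in the thread, the referenced check looks roughly like the following; treat it as a sketch of that part of main.go rather than the exact lines at 844af4c.)

// Check for overflow: parsing a very large retention time can wrap the
// underlying int64 into a negative duration, so cap it at 100y instead.
if cfg.tsdb.RetentionDuration < 0 {
	y, err := model.ParseDuration("100y")
	if err != nil {
		panic(err)
	}
	cfg.tsdb.RetentionDuration = y
	level.Info(logger).Log("msg", "time retention value is too high. Limiting to: "+y.String())
}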
On another note, when I pass
pranjaltale16 commented Mar 23, 2019
I think the problem is that we are not updating newFlagRetentionDuration as well. Something like:

if cfg.tsdb.RetentionDuration < 0 {
	y, err := model.ParseDuration("100y")
	if err != nil {
		panic(err)
	}
	cfg.tsdb.RetentionDuration = y
	if newFlagRetentionDuration != 0 {
		newFlagRetentionDuration = y
	}
	level.Info(logger).Log("msg", "time retention value is too high. Limiting to: "+y.String())
}
As I said, the fix for this could be to display the string passed in the flag if we encounter overflow. Note that this is just for the UI; the corrected value is already being used.
codesome added component/ui and removed component/local storage labels Mar 23, 2019
That depends on kingpin library internals. I wouldn't go to a lot of effort to adjust this; we already have a log message and the right value on /status.
All good, I saw the logging now, and the correction. I was just confused by the /flags page.
beorn7 closed this Mar 23, 2019
Yeah, warn would be better.
beorn7 commented Mar 22, 2019

Bug Report

What did you do?
Ran
prometheus --storage.tsdb.retention.time=300y

What did you expect to see?
A Prometheus server deleting only samples more than 300y old, or, more realistically, an error that the retention time is too long.

What did you see instead? Under which circumstances?
The retention time, according to the /flags page, was set to -8985944073709ms.

Environment

Context
We are well aware that our retention time is limited to what is expressible with the Go time.Duration type, i.e. ~290y, which should be enough for most purposes. However, with the new --storage.tsdb.retention.size flag, users might be tempted to set storage.tsdb.retention.time to something absurdly high to make sure samples are only deleted once the disk is filled. If they are unlucky, the time.Duration overflow might result in a very short retention time.
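To make the failure mode concrete: time.Duration is an int64 count of nanoseconds, so anything above roughly 292y cannot be represented and the multiplication wraps around to a negative value. A minimal standalone sketch of that wrap-around (illustrative only, not Prometheus code; it just reproduces the arithmetic behind the value seen on /flags):

package main

import (
	"fmt"
	"time"
)

func main() {
	// One "year" as the retention parser counts it: 365 days.
	year := 365 * 24 * time.Hour // ~3.15e16 ns, well within int64

	// 300 of those years is ~9.46e18 ns, which exceeds math.MaxInt64
	// (~9.22e18), so the int64 behind time.Duration wraps to a negative value.
	d := time.Duration(300) * year

	fmt.Println(d)                           // a negative duration
	fmt.Printf("%dms\n", d/time.Millisecond) // -8985944073709ms, as reported on /flags
}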