filesystem fill up time #29
|
Hi Anarcat, thanks for the upgrade proposal, it looks nice. I've been testing both formulas and the first one seems to work better, but note that it only reports a value when the result is greater than 0; otherwise the box is empty. The second formula doesn't report good values: in my testing lab it gives 11 ms on a filesystem without changes. In any case, if you want to test it, the corrected formula is:
Please check the last commit on node-exporter-full.json: it has the new box under "CPU Memory Net Disk", and you can move it to another place without problem. Regards, |
|
that looks okay, but I still find some strange things going on. take this graph for example: This gives the following table:
There are many problems here, the first of which of course is the host isn't continuously available (it's a workstation, and it shuts down once in a while). But then the other filesystems (I'm specifically interested in
Here's the raw unprocessed output from Prometheus doing the query (
Notice how Prom thinks those numbers are negative. I would also point out that it's somewhat unlikely that (for example) So I'm not sure those derivatives are that useful in predicting the future. There might be something fishy going on here... I find it especially strange that the estimates would vary based on the Grafana time range... |
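One possible explanation for both the negative values and the dependence on the time range, assuming the panel derives its slope from a window tied to the dashboard range (an assumption, since the query is not reproduced here): the same series gives different slopes over different windows, and a window in which space was freed gives a positive slope. The sketch below uses made-up numbers and a simple endpoint slope, not the actual dashboard formula.

```python
def endpoint_slope(samples):
    """Average rate of change (bytes/second) between first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def estimate_seconds_left(samples):
    """Current available bytes divided by the rate of decrease.

    The result is negative when available space grew over the window.
    """
    slope = endpoint_slope(samples)
    if slope == 0:
        return float("inf")
    return samples[-1][1] / -slope


# Hypothetical series: flat at 50 GB for 20 hours, then losing 1 GB per hour.
series = [(h * 3600, 50e9) for h in range(21)]
series += [(h * 3600, 50e9 - (h - 20) * 1e9) for h in range(21, 25)]

last_4h = [s for s in series if s[0] >= 20 * 3600]
short_window_hours = estimate_seconds_left(last_4h) / 3600
long_window_hours = estimate_seconds_left(series) / 3600

# Hypothetical window in which 1 GB was freed within one hour.
freed = [(0, 10e9), (3600, 11e9)]
freed_hours = estimate_seconds_left(freed) / 3600

print(short_window_hours, long_window_hours, freed_hours)
```

With these numbers the 4-hour window predicts roughly 46 hours, the 24-hour window roughly 276 hours, and the window with freed space a negative duration, so none of the three can be called wrong without first agreeing on the window.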
|
Another example of the estimate failing, on my home server:
Here's the absolute numbers: And relative: As you can see, In fact, maybe we should use the infinity symbol ( |
|
Maybe I'm just proving how useless those metrics are, sorry for thinking out loud. :) |
|
Well, it's a fact that the formula doesn't work as expected. As the original was made by Robust Perception, maybe @brian-brazil or @Conorbro can say something about it and help us? |
|
Could you check whether the predict_linear function returns results in your case:
|
|
from what i understand, predict_linear tries to find the value at a specific time. we're looking for the opposite: the time for a specific value (namely, "zero space left")... |
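The distinction above can be made concrete with a small sketch outside Prometheus. Assuming a list of hypothetical (timestamp, available bytes) samples, it fits a least-squares line, the same kind of simple linear regression predict_linear is documented to use, and then solves that line for the time at which it crosses zero instead of evaluating it at a fixed time. The function name and the data are illustrative only.

```python
import math


def seconds_until_empty(samples):
    """Estimate seconds from the last sample until available space hits zero.

    samples: list of (unix_time_seconds, avail_bytes) pairs, oldest first.
    Returns math.inf when the fitted trend is flat or growing.
    """
    n = len(samples)
    if n < 2:
        return math.inf
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return math.inf
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t  # bytes per second
    if slope >= 0:
        return math.inf  # space is not shrinking: never fills up
    intercept = mean_v - slope * mean_t
    t_zero = -intercept / slope  # time at which the fitted line reaches 0
    return max(0.0, t_zero - samples[-1][0])


# Hypothetical data: 100 GB available, shrinking by 1 GB per hour.
samples = [(h * 3600, 100e9 - h * 1e9) for h in range(6)]
hours_left = seconds_until_empty(samples) / 3600
print(hours_left)
```

Returning infinity for a flat or growing trend avoids the negative and astronomically large estimates discussed earlier in the thread, at the cost of hiding how close to flat the trend actually is.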
|
Did you eventually find a working solution? If the dashboard isn't reliable, I think it's better to remove it. |
|
i haven't, unfortunately, and i agree. :/ |
|
I am using a similar query to find filesystem usage, but I get an error ("1:2: parse error: unexpected character: '\ufeff'") when I try to filter for just one node/instance. ( 1 - (node_filesystem_free_bytes{device! Below is the original query I am using, but it returns data for all the nodes that are registered to my PMM. ( 1 - (node_filesystem_free_bytes{device! Any idea how to filter specific nodes? |
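The character '\ufeff' in that error is a Unicode byte order mark (BOM), which is invisible and often arrives with text pasted from an editor or a web page. A minimal sketch, assuming the query passes through a script as a string before it reaches Prometheus; the query text and the instance value are placeholders rather than the original query, and the label that identifies a node may differ in a PMM setup.

```python
def strip_bom(query: str) -> str:
    """Remove byte order marks (U+FEFF) anywhere in a query string."""
    return query.replace("\ufeff", "")


# Placeholder query with a BOM as its second character, consistent with
# the "1:2" position reported in the error message.
pasted = '(\ufeff1 - node_filesystem_free_bytes{instance="node1:9100"})'
clean = strip_bom(pasted)
print(repr(clean))
```

When the query is typed directly into a query editor, retyping the first few characters by hand instead of pasting them should have the same effect.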
|
Small brain dump as I looked into this. https://promcon.io/2022-munich/talks/tamland-how-gitlabcom-uses-long-/ seems to be the way to go. The reference that pointed me in this direction: prometheus/prometheus#11705 (reply in thread). About the formulas used here, I reworked them into: Which gives days until full, but only for linear disk usage change. Also note that you probably want "days until 80 % are reached". Still a very poor approach compared to Tamland.
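As a rough illustration of the "days until 80 % are reached" idea, and only under the same linear-growth assumption, the arithmetic can be sketched as follows. The function name, the parameters, and the numbers are hypothetical; this is not the reworked formula, which is not reproduced here.

```python
import math


def days_until_threshold(size_bytes, used_bytes, growth_bytes_per_day,
                         threshold=0.8):
    """Days until used space reaches `threshold` of the filesystem size.

    Assumes usage grows linearly at growth_bytes_per_day.
    Returns 0.0 if the threshold is already reached and math.inf
    if usage is flat or shrinking.
    """
    target = threshold * size_bytes
    if used_bytes >= target:
        return 0.0
    if growth_bytes_per_day <= 0:
        return math.inf
    return (target - used_bytes) / growth_bytes_per_day


# Hypothetical 1000 GB disk with 600 GB used, growing by 5 GB per day.
days = days_until_threshold(1000e9, 600e9, 5e9)
print(days)
```

Targeting a threshold below 100 % leaves time to react, but the estimate is still only as good as the assumption that growth stays linear.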



One reason why I still have the host stats dashboard is because it has this neat little table of "Filesystem Fill Up Time" which (tries to?) compute the time at which the filesystem will fill up.
I don't think it's working very well because the results are just off here. But it got me thinking about how this could be implemented and whether you'd be interested in adding this to the dashboard...
The hosts stats dashboard uses this formula:
This blog post suggests instead just using the derivative as a base:
I would suggest using node_filesystem_avail_bytes in any case, as that is the user-visible metric that will detect actual failures in userspace...
I'm not very familiar with Prometheus formulas, so I'm not sure how it works. I suspect it just doesn't, because it gives me negative numbers here (they don't show up) or absurd estimates (293481462547366 year for a 99% full disk), etc.
Yet this could be an interesting addition.