memory usage stats are incorrect #46
to expand on this, looking at this: it seems the problem is that we're listing only the main PID, which obviously fails for cases like postgresql or apache (which start multiple processes) or cron jobs (which necessarily start a subprocess). so i guess it's separate from #2 in the sense that it could be fixed by implementing the above TODO and just adding up the memory of all the processes in the slice by hand, without having to reimplement everything with cgroups, which seems to be stalled in #10...
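The "add up the memory of all the processes in the slice by hand" idea could be sketched roughly like this. This is a hypothetical standalone script, not the exporter's actual code; the cgroup path at the bottom is an assumption and depends on the hierarchy in use:

```python
def parse_vmrss_kib(status_text):
    """Extract VmRSS (in KiB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # field is reported in kB
    return 0  # kernel threads carry no VmRSS line


def unit_rss_bytes(cgroup_procs_path):
    """Sum RSS over every PID listed in a unit's cgroup.procs file."""
    total_kib = 0
    with open(cgroup_procs_path) as f:
        for pid in f.read().split():
            try:
                with open(f"/proc/{pid}/status") as status:
                    total_kib += parse_vmrss_kib(status.read())
            except FileNotFoundError:
                pass  # process exited between listing and reading
    return total_kib * 1024


# Hypothetical usage (path assumes a unified cgroup v2 layout):
# print(unit_rss_bytes("/sys/fs/cgroup/system.slice/postgresql.service/cgroup.procs"))
```

Note that naively summing per-process RSS double-counts pages shared between the processes (e.g. postgres shared buffers), so this would still not match systemd's own cgroup accounting exactly.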
The README.md file promises "If you've chosen to pack 400 threads and 20 processes inside the mysql.service, we will only export metrics on the service unit, not on the individual tasks.". This is absolutely not true (if I were less charitable I would call it a lie).
I created a merge request #65 to fix the README.md so other people don't rely on information that the exporter does not provide.
We should probably fix this collector so it doesn't work the way it currently does. IMO, we should just delete it until it works the way users expect.
#67 is probably doing what's expected here. |
I've decided that these metrics are not worth maintaining in this exporter. cgroup-based metrics can be gathered using cAdvisor. |
I've opened #87 which exposes systemd's own memory metrics, which are a) accurate, b) cheap for us to obtain :)
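Reading systemd's own accounting can be done without touching /proc at all, e.g. via `systemctl show --property=MemoryCurrent`. A minimal sketch (the helper names here are hypothetical, not part of #87):

```python
import subprocess


def parse_memory_current(show_output):
    """Parse 'MemoryCurrent=<bytes>' from systemctl show output.

    systemd reports the max uint64 value (or '[not set]' on newer
    versions) when memory accounting is unavailable for the unit.
    """
    value = show_output.strip().split("=", 1)[1]
    if value in ("[not set]", str(2**64 - 1)):
        return None
    return int(value)


def unit_memory_current(unit):
    """Ask systemd for its cgroup-level memory usage of a unit, in bytes."""
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=MemoryCurrent"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_current(out)


# Hypothetical usage:
# unit_memory_current("postgresql.service")
```

Because this number comes straight from the unit's cgroup, it covers all child processes of the service, not just the main PID.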
I set up this exporter to diagnose OOM conditions on a server, but the output it gives me is inconsistent with the stats I'm getting through other systems. In particular, the memory numbers just don't add up to the actual memory usage on the machine.
I'm not sure, but I think this might be related to #2 except that I don't think this is just a small adjustment that can be made to switch to cgroups: the current stats just don't work in any meaningful way, so I think they're just buggy.
just to give an example, right now, postgres is taking up 2.3GB of memory according to systemctl:
... but the exporter is only reporting 21MB RSS and 560MB VSS, so it's obviously way off:
i used this tool to track down this issue we're facing but it seems like, unfortunately, i'll have to look elsewhere...
thanks for any clarification.