Lack of process_* metrics #117

Closed
hynek opened this issue Nov 12, 2016 · 7 comments

@hynek
Contributor

hynek commented Nov 12, 2016

Recently I’ve run into the problem that my processes don't have any process_ metrics although the platform (Ubuntu Trusty) hasn’t changed.

I’m not sure how to debug this. Do you have any idea what could cause it? The processes are not using the new multiprocessing feature.

@brian-brazil
Contributor

Are you doing anything that'd make /proc inaccessible?

@hynek
Contributor Author

hynek commented Nov 12, 2016

Not that I’d know of.

They’re run using runit and inside envconsul. The latter is the only recent change I can think of. Other than that, they’re regular processes in an LXC container with a distinct user.

Are you saying that /proc problems are the only possible reason why those metrics might be missing?

@brian-brazil
Contributor

With the current implementation, yes. We read from /proc/self/stat.
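For readers who want to see what reading that file involves, here is a minimal sketch in plain Python. It is not the client library's actual code; the function names `parse_stat` and `read_self_stat` and the choice of fields are assumptions made for illustration. The field positions follow the proc(5) man page.

```python
def parse_stat(raw):
    """Parse the raw bytes of a /proc/<pid>/stat file into a few named fields."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces, so split on the last ')' instead of on whitespace.
    rest = raw.rsplit(b")", 1)[1].split()
    # rest[0] is field 3 (state), so field n of proc(5) lives at rest[n - 3].
    return {
        "utime_ticks": int(rest[11]),      # field 14
        "stime_ticks": int(rest[12]),      # field 15
        "starttime_ticks": int(rest[19]),  # field 22
        "vsize_bytes": int(rest[20]),      # field 23
        "rss_pages": int(rest[21]),        # field 24
    }


def read_self_stat(path="/proc/self/stat"):
    """Return the parsed fields, or None if the file is unreadable or malformed."""
    try:
        with open(path, "rb") as f:
            return parse_stat(f.read())
    except (OSError, IndexError, ValueError):
        # A missing or broken /proc ends up here; a collector that swallows
        # this silently would simply produce no process_* samples.
        return None
```

If `read_self_stat()` returns `None` inside the affected process, /proc is a likely suspect.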

@hynek
Contributor Author

hynek commented Nov 12, 2016

I've added a

with open("/proc/self/stat", "rb") as f:
    print(f.read())

block to the app's startup and before calling generate_latest(core.REGISTRY) and in both cases I get a legit-looking output back.

I suppose it's too late at that point anyway because something goes wrong while registering metrics?
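A complementary check is to look at the exposition output itself rather than at /proc. The sketch below assumes the Prometheus text format (one `name{labels} value` sample per line, with `#` comment lines); the helper name `process_metric_names` is hypothetical and not part of the client library.

```python
def process_metric_names(exposition):
    """Return the sorted names of process_* samples found in text-format output."""
    names = set()
    for line in exposition.decode("utf-8").splitlines():
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is 'name{labels} value' or 'name value'.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith("process_"):
            names.add(name)
    return sorted(names)
```

Passing it the bytes returned by `generate_latest(core.REGISTRY)` gives an empty list when no process metrics are being exported.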

@hynek
Contributor Author

hynek commented Nov 12, 2016

OK, it was /proc, but not the self part (free -m stopped working too). It seems procfs crashed, and I needed to reboot the container.

It might be nice if the docs mentioned this failure case, or if there were a log message à la “could not access XYZ, process metrics won't be collected”?
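An application can also run such a check itself at startup. The following is a rough sketch only: the helper `unreadable_proc_files` and the logger name are hypothetical, and the two default paths are illustrative choices (`/proc/self/stat` from this thread, `/proc/meminfo` because `free` reads it), not an inventory of what the client library touches.

```python
import logging

logger = logging.getLogger("procfs_check")  # hypothetical logger name


def unreadable_proc_files(paths=("/proc/self/stat", "/proc/meminfo")):
    """Return the subset of paths that cannot be opened or that read as empty."""
    missing = []
    for path in paths:
        try:
            with open(path, "rb") as f:
                # An empty read is treated as a failure as well.
                if not f.read(1):
                    missing.append(path)
        except OSError:
            missing.append(path)
    if missing:
        # Logged once per call, so a single startup check will not spam the log.
        logger.warning(
            "could not access %s, process metrics may not be collected",
            ", ".join(missing),
        )
    return missing
```

Calling it once at startup keeps the warning to a single line per process start.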

@brian-brazil
Contributor

I don't think we need to document that things will break if the kernel is hosed. We don't log this because it could get spammy in other circumstances.

@hynek
Contributor Author

hynek commented Nov 13, 2016

OK. Maybe if someone runs into it, they find this ticket. :)

hynek closed this as completed Nov 13, 2016