Add Prometheus/OpenMetrics endpoint with a number of useful metrics #561

Merged
merged 10 commits into from
Oct 11, 2022


LukasKalbertodt
Member

@LukasKalbertodt LukasKalbertodt commented Oct 7, 2022

Closes #80

This PR adds the /~metrics endpoint (for tobira serve). This endpoint exposes several metrics in the OpenMetrics format, tailored to Prometheus and Grafana. Many metrics are collected on the fly when the endpoint is requested; a few others are counted continuously while the server runs.
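To illustrate the two kinds of metrics, here is a minimal sketch, not the code from this PR: one counter that is incremented continuously, and one value gathered only when the endpoint is requested, both rendered as OpenMetrics text. The `Metrics` struct, the `example_*` metric names and the `user_sessions` parameter are assumptions made up for this illustration; the real names are the constants in metrics.rs.

```rust
use std::fmt::Write as _;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical holder for metrics that are counted continuously.
struct Metrics {
    http_requests: AtomicU64,
}

impl Metrics {
    fn new() -> Self {
        Self { http_requests: AtomicU64::new(0) }
    }

    /// Called once per handled request; this is the "constantly counted" part.
    fn count_request(&self) {
        self.http_requests.fetch_add(1, Ordering::Relaxed);
    }

    /// Builds the response body. `user_sessions` stands in for a value that
    /// is gathered on the fly (e.g. via a database query) per request.
    fn render(&self, user_sessions: u64) -> String {
        let mut out = String::new();
        let requests = self.http_requests.load(Ordering::Relaxed);

        // Writing into a `String` cannot fail, so the results are ignored.
        let _ = writeln!(out, "# TYPE example_http_requests counter");
        let _ = writeln!(out, "# HELP example_http_requests HTTP requests handled by this instance.");
        // In OpenMetrics, counter samples carry the `_total` suffix.
        let _ = writeln!(out, "example_http_requests_total {}", requests);

        let _ = writeln!(out, "# TYPE example_user_sessions gauge");
        let _ = writeln!(out, "# HELP example_user_sessions Currently stored user sessions.");
        let _ = writeln!(out, "example_user_sessions {}", user_sessions);

        // An OpenMetrics exposition ends with an explicit EOF marker.
        out.push_str("# EOF\n");
        out
    }
}
```

Keeping the counter in an atomic means counting adds almost nothing to each request, while the more expensive values are only computed when a scrape actually happens.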

To see which metrics are added, take a look at constants at the top of metrics.rs. Of course, there are many more metrics one can expose. I plan to collect ideas for those in this issue: #562

I tried to keep the commits atomic, but I would guess reviewing it all in one go is better for this PR.

Pretty screenshots of the example Dashboard:

[Two screenshots of the example dashboard]

@LukasKalbertodt LukasKalbertodt added the changelog:admin Changes primarily for admins label Oct 7, 2022
@LukasKalbertodt LukasKalbertodt marked this pull request as ready for review October 7, 2022 09:01
@github-actions

github-actions bot commented Oct 7, 2022

🚀 This PR was deployed at https://pr561.tobira.opencast.org. The deployment will be updated whenever someone pushes onto this PR's branch.

@LukasKalbertodt
Member Author

FWIW: I just checked why requests to /~metrics take between 20 and 50ms. In debug mode it's 9ms for database operations and 32ms for gathering the memory info. The latter is spent almost entirely in procfs, as it parses the quite long /proc/self/smaps file and creates a bunch of data structures. We could in theory optimize that, as we don't need to parse everything and are only interested in a few lines. Buuut I'd rather let an external library handle the potentially undocumented format of smaps.
And well the important thing: in release mode the whole endpoint replies in 10ms, which is totally fine. If we need to, we can still optimize a bit in the future.
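For reference, the "only interested in a few lines" idea could look roughly like the sketch below. It is not what this PR does (the PR keeps using the procfs library); the function names are hypothetical, and it assumes the usual `Field:   <number> kB` layout of the per-mapping lines in smaps.

```rust
use std::fs;
use std::io;

/// Sums the values (in kB) of all lines of the form `<field>: <number> kB`,
/// skipping every other line instead of building data structures for them.
fn sum_smaps_field(smaps: &str, field: &str) -> u64 {
    smaps
        .lines()
        .filter_map(|line| {
            // Require the exact field name followed by a colon, so that
            // "Pss" does not also match "Pss_Dirty".
            let rest = line.strip_prefix(field)?.strip_prefix(':')?;
            // The rest looks like "        1234 kB"; take the number.
            rest.split_whitespace().next()?.parse::<u64>().ok()
        })
        .sum()
}

/// Resident set size of the current process in kB (Linux only).
fn own_rss_kb() -> io::Result<u64> {
    let smaps = fs::read_to_string("/proc/self/smaps")?;
    Ok(sum_smaps_field(&smaps, "Rss"))
}
```

The trade-off is the one mentioned above: this is cheaper than a full parse, but it makes the application itself responsible for tracking the file format.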

@JulianKniephoff
Member

For the record: The format of /proc/[pid]/smaps isn't undocumented. It's documented in man 5 proc for example, and besides, the files in /proc are a public kernel API AFAIK. x)

@github-actions github-actions bot added the status:conflicts This PR has conflicts that need to be resolved label Oct 10, 2022

@github-actions github-actions bot removed the status:conflicts This PR has conflicts that need to be resolved label Oct 10, 2022
@lkiesow
Contributor

lkiesow commented Oct 10, 2022

Counting requests probably does not work like this if Tobira is deployed as a cluster, especially if you have a dynamic cluster like in a k8s environment. You only count requests per instance.

@LukasKalbertodt
Member Author

LukasKalbertodt commented Oct 10, 2022

That is correct. Isn't it possible to easily add up the counters of all nodes in Prometheus or Grafana? Like... I don't want to have a counter in the DB for this, as it would add immense overhead to every request. How do other applications do this?

I also tweaked tiny things about two panels.
@JulianKniephoff JulianKniephoff merged commit 72859c9 into elan-ev:master Oct 11, 2022
@LukasKalbertodt LukasKalbertodt deleted the prometheus branch October 11, 2022 14:26
Labels
changelog:admin Changes primarily for admins
Development

Successfully merging this pull request may close these issues.

Prometheus integration
3 participants