Add Prometheus metrics for the number of jobs completed by each agent #477

Closed
pietroalbini opened this issue Oct 11, 2019 · 1 comment · Fixed by #478
Labels
C-new-feature (Category: a new feature to implement)
E-mentor (Call for participation: this issue has instructions how to fix it)
E-needs-help (Call for participation: we need help for this issue)

Comments

@pietroalbini
Member

Before distributed experiments were merged, it was possible to see whether an agent was actually doing work by looking at the number of completed jobs for the experiment it was working on and checking that it increased over time.

Now that distributed experiments are implemented, that's no longer possible: if two agents are working on the same distributed experiment and only one is actually completing jobs, we have no way to tell.

The Rust infrastructure team is using Prometheus for our monitoring and alerting, and if we expose a counter for each agent we'll be able to gather enough data.

The metrics would be implemented on the server at a /metrics endpoint (the standard path for Prometheus) and expose a crater_completed_jobs metric with the labels agent and experiment (containing the agent name and the experiment name, respectively). The counter should only increase over time and doesn't need to be persisted (Prometheus handles counters resetting after a restart just fine).

By the way, on docs.rs I used the Prometheus library to implement the changes. It'd be nice to use the same library across services.
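
To make the shape of the proposed metric concrete, here is a rough sketch of a counter keyed by agent and experiment, rendered in the Prometheus text exposition format that a /metrics endpoint would serve. It deliberately uses only the Rust standard library rather than the Prometheus library mentioned above, so the type `CompletedJobs`, its methods, the `escape` helper, and the sample agent and experiment names are all hypothetical and not taken from crater. A real implementation would more likely register a labelled counter with the library and let it handle the encoding.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory counter keyed by (agent, experiment).
/// Nothing is persisted: after a restart the counts start again from zero.
#[derive(Default)]
struct CompletedJobs {
    counts: BTreeMap<(String, String), u64>,
}

/// Escape a label value as the text exposition format requires
/// (backslash, double quote, and newline).
fn escape(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

impl CompletedJobs {
    /// Record one completed job; the stored value only ever increases.
    fn inc(&mut self, agent: &str, experiment: &str) {
        *self
            .counts
            .entry((agent.to_string(), experiment.to_string()))
            .or_insert(0) += 1;
    }

    /// Render every (agent, experiment) pair as one sample line.
    fn render(&self) -> String {
        let mut out = String::from("# TYPE crater_completed_jobs counter\n");
        for ((agent, experiment), value) in &self.counts {
            out.push_str(&format!(
                "crater_completed_jobs{{agent=\"{}\",experiment=\"{}\"}} {}\n",
                escape(agent),
                escape(experiment),
                value
            ));
        }
        out
    }
}

fn main() {
    let mut jobs = CompletedJobs::default();
    // Sample names below are made up for illustration.
    jobs.inc("agent-1", "example-experiment");
    jobs.inc("agent-1", "example-experiment");
    jobs.inc("agent-2", "example-experiment");
    print!("{}", jobs.render());
}
```

With one sample per agent and experiment, an agent that has stopped completing jobs shows up as a series whose value no longer increases while the others keep growing.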

@pietroalbini added the C-new-feature, E-mentor, and E-needs-help labels on Oct 11, 2019
@zeegomo
Contributor

zeegomo commented Oct 11, 2019

I can work on this
