add metrics.md document #591
docs/dev/metrics.md
```
# HELP mapi_machine_created_timestamp_seconds Timestamp of the mapi managed Machine creation time
# TYPE mapi_machine_created_timestamp_seconds gauge
mapi_machine_created_timestamp_seconds{api_version="machine.openshift.io/v1beta1",name="ocp-cluster-rndpg-master-0",namespace="openshift-machine-api",node="ip-10-0-130-139.us-east-2.compute.internal",phase="Running",spec_provider_id="aws:///us-east-2a/i-08624d119917119d6"} 1.589550152e+09
```
I notice there are some labels on this that might change through the lifecycle of a machine, e.g. phase. Does this really just report when a machine was created, or is there something deeper here that the name doesn't express? I wonder if this needs a bit more explanation.
that's a good question. i produced this output by scraping the mao on a running cluster i had been using for testing, i will look into this a little deeper.
ok, did some checking. the `mapi_machine_created_timestamp_seconds` and `mapi_machine_set_created_timestamp_seconds` metrics will both update their `.status.phase` on every update cycle (not sure on the frequency here). theoretically, the other values could change as well but it seems like `phase` is the only one that will change.
should i add a note about this behavior?
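One consequence worth keeping in mind: Prometheus identifies a time series by its metric name plus its full label set, so a sample whose `phase` label changes belongs to a different series than the earlier one. The Go sketch below illustrates only that identity rule. The `seriesKey` helper and the two label sets are hypothetical, and the sketch does not model how the operator actually registers or removes series.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds a canonical identity string from a metric name and
// its labels. Labels are sorted so the key does not depend on map order.
// Hypothetical helper, for illustration only.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	const metric = "mapi_machine_created_timestamp_seconds"
	// Illustrative label sets: the same machine observed in two phases.
	provisioning := map[string]string{"name": "ocp-cluster-rndpg-master-0", "phase": "Provisioning"}
	running := map[string]string{"name": "ocp-cluster-rndpg-master-0", "phase": "Running"}

	fmt.Println(seriesKey(metric, provisioning))
	fmt.Println(seriesKey(metric, running))
	// The keys differ, so these count as two distinct series.
	fmt.Println("same series:", seriesKey(metric, provisioning) == seriesKey(metric, running))
}
```

Queries that should follow one machine across phases would therefore need to aggregate over `phase` or ignore it when matching.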
I'd be tempted to add some detail if you think this will add understanding for the metrics.
At the moment, this reads to me as just a list of prometheus metrics, and it's still quite hard to interpret. If there's any detail we can work out and add to explain what the metrics are and why they exist, that would help future readers so they don't have to interpret the metrics scrape themselves.
i think that's fair. my thought process here was to put up something with the raw scrape, broken into sections, to help start the ball rolling. i'm a little torn about how much detail to add here, but i'll go back and give it another pass.
added more text around the metrics sections and reorganized things a little.
/lgtm
Thanks for the additional info, I think it adds a lot more value to the doc!
/kind documentation
docs/dev/metrics.md
They can be used to diagnose issues such as increased memory or cpu usage and
other system resource related queries.
could you point me to where these are coming from?
i found these by scraping the running pod, but i will dig into the code a little to figure out what is generating them.
so, apparently these are coming from the prometheus go-client. i am trying to find some documentation that i could link to, i think it will make these cleaner.
@enxebre i cleaned this up considerably and removed all the metrics that come from the prometheus go client in favor of links to those docs and code.
This change adds a document which details all the available metrics for scraping by Prometheus. It has sample dumps along with some text to help guide the reader.
/lgtm
Defer to @enxebre to approve
thanks a lot @elmiko
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: enxebre
/retest
Please review the full test history for this PR and help us cut down flakes.
@elmiko: The following tests failed, say