OADP- 2300 OADP Metric Collection #62710
🤖 Updated build preview is available at: Build log: https://circleci.com/gh/ocpdocs-previewbot/openshift-docs/21729 |
@CarmiWisemon Thanks.
I have reviewed the updated docs; they look fine to me.
One question came to mind while I was looking for them. They are under:
Should monitoring sit outside of Troubleshooting (at the same level as Troubleshooting, so one level up), or inside "Advanced OADP features and functionalities"?
Some of the monitoring content covers receiving alerts when a backup fails, but that isn't logically troubleshooting to me; troubleshooting should be about OADP issues, misconfigurations, or problems that affect OADP itself.
Monitoring can alert when problems may occur, but the metrics also let you track the number of backups, their duration, and their size, so it doesn't really belong under the troubleshooting umbrella.
@mpryc Thank you very much for all of your wonderful feedback! Carmi
/label OADP
Apologies for the number of comments; some of them are duplicates that I thought worth re-highlighting in-line versus just saying "do this everywhere". Please let me know if you have any questions!
@adellape
/label merge-review-needed
I would like to hold the merge for the remaining issues below.
@CarmiWisemon Leaving the `merge-review-in-progress` label on for now so it stays in my view. Please ping me on Slack or here when it's ready for another look. Thanks!
modules/oadp-monitoring-setup.adoc
Outdated
$ oc edit configmap cluster-monitoring-config -n openshift-monitoring
----
+
.Example output
Technically this isn't really "output", since we're showing what we want the user to change. Maybe use `.Example changes` just for this block?
Although I think it would be more in line with the guidelines to break it into two actions (and then you don't need the `.Example output` block title at all):
. Edit the `cluster-monitoring-config` `ConfigMap` object in the `openshift-monitoring` namespace:
+
[source,terminal]
----
$ oc edit configmap cluster-monitoring-config -n openshift-monitoring
----
. Add or enable the `enableUserWorkload` option in the `data` section's `config.yaml` field:
+
[source,yaml]
----
apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true <1>
kind: ConfigMap
metadata:
# ...
----
<1> Add this option or set to `true`
modules/oadp-monitoring-setup.adoc
Outdated
enableUserWorkload: true <1>
kind: ConfigMap
metadata:
# [...]
Replace `# [...]` with `# ...`, per the repo guidelines for YAML.
modules/oadp-monitoring-setup.adoc
Outdated
. Wait a short period of time to verify the User Workload Monitoring Setup by checking if the following components are up and running in the `openshift-user-workload-monitoring` namespace.
+
Colon and blank line:
Current:
. Wait a short period of time to verify the User Workload Monitoring Setup by checking if the following components are up and running in the `openshift-user-workload-monitoring` namespace.
+
Suggested:
. Wait a short period of time to verify the User Workload Monitoring Setup by checking if the following components are up and running in the `openshift-user-workload-monitoring` namespace:
+
----
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-6844b4b99c-b57j9   2/2     Running   0          43s
prometheus-user-workload-0             5/5     Running   0          32s
prometheus-user-workload-1             5/5     Running   0          32s
thanos-ruler-user-workload-0           3/3     Running   0          32s
thanos-ruler-user-workload-1           3/3     Running   0          32s
----
Missing block title and source:
Current:
----
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-6844b4b99c-b57j9   2/2     Running   0          43s
prometheus-user-workload-0             5/5     Running   0          32s
prometheus-user-workload-1             5/5     Running   0          32s
thanos-ruler-user-workload-0           3/3     Running   0          32s
thanos-ruler-user-workload-1           3/3     Running   0          32s
----
Suggested:
.Example output
[source,terminal]
----
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-6844b4b99c-b57j9   2/2     Running   0          43s
prometheus-user-workload-0             5/5     Running   0          32s
prometheus-user-workload-1             5/5     Running   0          32s
thanos-ruler-user-workload-0           3/3     Running   0          32s
thanos-ruler-user-workload-1           3/3     Running   0          32s
----
modules/oadp-monitoring-setup.adoc
Outdated
[source,terminal]
----
$ oc get configmap user-workload-monitoring-config -n openshift-user-workload-monitoring
Error from server (NotFound): configmaps "user-workload-monitoring-config" not found

# We need to create: user-workload-monitoring-config ConfigMap
----
This still needs to be broken into two code blocks:
Current:
[source,terminal]
----
$ oc get configmap user-workload-monitoring-config -n openshift-user-workload-monitoring
Error from server (NotFound): configmaps "user-workload-monitoring-config" not found
# We need to create: user-workload-monitoring-config ConfigMap
----
Suggested:
[source,terminal]
----
$ oc get configmap user-workload-monitoring-config -n openshift-user-workload-monitoring
----
+
.Example output
[source,terminal]
----
Error from server (NotFound): configmaps "user-workload-monitoring-config" not found
----
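As an aside for any follow-up PR, a check along the following lines could flag terminal blocks that still mix a command with its output. This is only a rough sketch in Python, not part of the repo's tooling: the function name `mixed_terminal_blocks` is hypothetical, and it assumes commands start with `$ ` and that blocks use the `[source,terminal]` plus `----` form quoted above.

```python
def mixed_terminal_blocks(adoc: str) -> list[int]:
    """Return 1-based line numbers of [source,terminal] blocks that
    contain both a `$ ` command line and non-command (output) lines."""
    lines = adoc.splitlines()
    flagged = []
    i = 0
    while i < len(lines):
        is_header = lines[i].strip() == "[source,terminal]"
        has_fence = i + 1 < len(lines) and lines[i + 1].strip() == "----"
        if is_header and has_fence:
            # Collect everything up to the closing "----" fence.
            start = i + 2
            end = start
            while end < len(lines) and lines[end].strip() != "----":
                end += 1
            has_command = False
            has_output = False
            prev_continues = False
            for ln in lines[start:end]:
                if not ln.strip():
                    continue  # ignore blank lines inside the block
                # A line is part of a command if it starts with "$ " or
                # follows a command line ending in a backslash.
                is_cmd = ln.lstrip().startswith("$ ") or prev_continues
                has_command = has_command or is_cmd
                has_output = has_output or not is_cmd
                prev_continues = is_cmd and ln.rstrip().endswith("\\")
            if has_command and has_output:
                flagged.append(i + 1)
            i = end + 1
        else:
            i += 1
    return flagged
```

Running it over a module's text returns the line numbers of the `[source,terminal]` markers to revisit; a block that holds only a command, or only output, is left alone.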
+
. Create a `ServiceMonitor` YAML file that matches the existing SVC label, and save the file as `3_create_oadp_service_monitor.yaml`. The service monitor is created in the `openshift-adp` namespace where the `openshift-adp-velero-metrics-svc` service resides.
+
.Example output
Current:
.Example output
Suggested:
.Example `ServiceMonitor` object
----
$ oc apply -f 3_create_oadp_service_monitor.yaml
servicemonitor.monitoring.coreos.com/oadp-service-monitor created
----
Break up into 2 blocks, or drop the output.
servicemonitor.monitoring.coreos.com/oadp-service-monitor created
----
+
. Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console:
You could also turn this step into a `.Verification` section, to match the other module:
Current:
. Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console:
Suggested:
.Verification
* Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console:
Either way, since this is technically a single-step procedure with substeps, change `.` to `*`. The substeps are ordered here though, so they can stay `..`.
modules/oadp-list-of-metrics.adoc
Outdated
|`kopia_content_get_bytes`
|Number of bytes retrieved using GetContent
|Counter

|`kopia_content_get_count`
|Number of times GetContent() was called
|Counter

|`kopia_content_get_error_count`
|Number of times GetContent() was called and the result was an error
|Counter

|`kopia_content_get_not_found_count`
|Number of times GetContent() was called and the result was not found
|Counter

|`kopia_content_write_bytes`
|Number of bytes passed to WriteContent()
|Counter

|`kopia_content_write_count`
|Number of times WriteContent() was called
|Counter
The `GetContent()` and `WriteContent()` instances here still need to be marked up with backticks.
Also note that the first instance of `GetContent` is missing the `()`.
Current:
|`kopia_content_get_bytes`
|Number of bytes retrieved using GetContent
|Counter
|`kopia_content_get_count`
|Number of times GetContent() was called
|Counter
|`kopia_content_get_error_count`
|Number of times GetContent() was called and the result was an error
|Counter
|`kopia_content_get_not_found_count`
|Number of times GetContent() was called and the result was not found
|Counter
|`kopia_content_write_bytes`
|Number of bytes passed to WriteContent()
|Counter
|`kopia_content_write_count`
|Number of times WriteContent() was called
|Counter
Suggested:
|`kopia_content_get_bytes`
|Number of bytes retrieved using `GetContent()`
|Counter
|`kopia_content_get_count`
|Number of times `GetContent()` was called
|Counter
|`kopia_content_get_error_count`
|Number of times `GetContent()` was called and the result was an error
|Counter
|`kopia_content_get_not_found_count`
|Number of times `GetContent()` was called and the result was not found
|Counter
|`kopia_content_write_bytes`
|Number of bytes passed to `WriteContent()`
|Counter
|`kopia_content_write_count`
|Number of times `WriteContent()` was called
|Counter
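If the same markup fix comes up across many table rows, it could be applied mechanically. The following is a hedged Python sketch only: the helper name `mark_up_calls` is hypothetical and not part of any repo tooling, and it assumes the function names to wrap are supplied by hand.

```python
import re


def mark_up_calls(text: str, names: list[str]) -> str:
    """Wrap bare occurrences of the given function names in backticks,
    appending "()" where it is missing. Occurrences that already sit
    inside backticks are left unchanged."""
    for name in names:
        pattern = re.compile(
            r"(?<![`\w])"      # not already preceded by a backtick or word char
            + re.escape(name)
            + r"(?:\(\))?"     # optional existing "()"
            + r"(?![`\w(])"    # not followed by a backtick, word char, or "("
        )
        text = pattern.sub(f"`{name}()`", text)
    return text
```

For example, passing `["GetContent", "WriteContent"]` turns "retrieved using GetContent" into the backticked form with `()` appended, while rows that are already marked up pass through untouched.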
modules/oadp-viewing-metrics-ui.adoc
Outdated
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.
+
.OADP metrics query

image::oadp-metrics-query.png[OADP metrics query]
I still suggest moving the screenshot (which shows the Developer perspective) up to pair with the Developer step:
Current:
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.
+
.OADP metrics query
image::oadp-metrics-query.png[OADP metrics query]
Suggested:
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
+
.OADP metrics query
image::oadp-metrics-query.png[OADP metrics query]
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.
Or just remove the `+` so that it floats after the step is done (and no longer looks attached to the Administrator step):
Current:
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.
+
.OADP metrics query
image::oadp-metrics-query.png[OADP metrics query]
Suggested:
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.
.OADP metrics query
image::oadp-metrics-query.png[OADP metrics query]
Thank you for applying the changes. I am going to merge per the Slack discussion; however, please consider the few remaining items below for a quick follow-up PR.
. Create `user-workload-monitoring-config` ConfigMap for the User Workload Monitoring and save it under `2_configure_user_workload_monitoring.yaml` filename.
+
.Example output
[source,yaml]
+
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
----
+
This one wasn't addressed.

.Procedure

. Ensure the `openshift-adp-velero-metrics-svc` exists. It should contain `app.kubernetes.io/name=velero` label which will be used as selector for the `ServiceMonitor` object.
Current:
. Ensure the `openshift-adp-velero-metrics-svc` exists. It should contain `app.kubernetes.io/name=velero` label which will be used as selector for the `ServiceMonitor` object.
Suggested:
. Ensure the `openshift-adp-velero-metrics-svc` exists. It should contain `app.kubernetes.io/name=velero` label, which will be used as selector for the `ServiceMonitor` object.
----

.Verification
. After the Alert is triggered, you can view it in the following ways:
Current:
. After the Alert is triggered, you can view it in the following ways:
Suggested:
* After the Alert is triggered, you can view it in the following ways:
/cherrypick enterprise-4.14
/cherrypick enterprise-4.13
/cherrypick enterprise-4.12
/cherrypick enterprise-4.11
@adellape: new pull request created: #63444 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@adellape: new pull request created: #63445 In response to this:
@adellape: new pull request created: #63446 In response to this:
@adellape: new pull request created: #63447 In response to this:
OADP - 1.2.1
OCP - 4.11+
Resolves - https://issues.redhat.com/browse/OADP-2300
Preview - https://62710--docspreview.netlify.app/openshift-enterprise/latest/backup_and_restore/application_backup_and_restore/troubleshooting.html#oadp-monitoring_oadp-troubleshooting