Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OADP- 2300 OADP Metric Collection #62710

Merged
merged 1 commit into from
Aug 10, 2023
Merged

Conversation

@openshift-ci openshift-ci bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 23, 2023
@ocpdocs-previewbot
Copy link

ocpdocs-previewbot commented Jul 23, 2023

🤖 Updated build preview is available at:
https://62710--docspreview.netlify.app

Build log: https://circleci.com/gh/ocpdocs-previewbot/openshift-docs/21729

@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 24, 2023
@CarmiWisemon CarmiWisemon force-pushed the oadp2300metric branch 3 times, most recently from 33745d8 to 078924a Compare July 25, 2023 15:04
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
Copy link

@mpryc mpryc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarmiWisemon Thanks.

I have reviewed updated docs, looks fine to me.

One question which came to my mind when I was looking for them. They are under:

https://62710--docspreview.netlify.app/openshift-enterprise/latest/backup_and_restore/application_backup_and_restore/troubleshooting.html#oadp-monitoring_oadp-troubleshooting

Should the monitoring be outside of Troubleshooting ? (same level as Troubleshooting, so one level up) or inside "Advanced OADP features and functionalities"

Some monitoring parts are allowing to receive alerts when backup fails, but this isn't logically troubleshooting to me, which should be more related to OADP issues, misconfigurations or problems that affects the OADP itself.

Monitoring can alert when some problems may happen, but metrics also allows to monitor number of backups, it's duration, size, so not really under troubleshooting umbrella.

@CarmiWisemon
Copy link
Contributor Author

@mpryc
You are correct that it may not be in the ideal location at the moment. We are rearranging the OADP documentation and at present it's in a place holder. Once we rearrange the documentation, the Monitoring section will move somewhere else.

Thank you very much for all of your wonderful feedback!

Carmi

modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
@CarmiWisemon
Copy link
Contributor Author

/label OADP
/label peer-review-needed

@openshift-ci openshift-ci bot added OADP Label for all OADP PRs peer-review-needed Signifies that the peer review team needs to review this PR labels Aug 7, 2023
@adellape adellape added the peer-review-in-progress Signifies that the peer review team is reviewing this PR label Aug 7, 2023
@adellape adellape self-assigned this Aug 7, 2023
Copy link
Contributor

@adellape adellape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the number of comments; some of them are duplicates that I thought worth re-highlighting in-line versus just saying "do this everywhere". Please let me know if you have any questions!

images/oadp-backup-failing-alert.png Outdated Show resolved Hide resolved
modules/oadp-monitoring.adoc Show resolved Hide resolved
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
modules/oadp-monitoring.adoc Outdated Show resolved Hide resolved
@adellape adellape removed peer-review-in-progress Signifies that the peer review team is reviewing this PR peer-review-needed Signifies that the peer review team needs to review this PR labels Aug 8, 2023
@CarmiWisemon CarmiWisemon force-pushed the oadp2300metric branch 4 times, most recently from aac547b to 2e42e01 Compare August 9, 2023 14:50
@CarmiWisemon
Copy link
Contributor Author

@adellape
Thank you for your wonderful review, great suggestions and feedback. I made the changes you requested, except for the images that we can't do at this time.
Thanks for your help! Carmi

@CarmiWisemon
Copy link
Contributor Author

/label merge-review-needed

@openshift-ci openshift-ci bot added the merge-review-needed Signifies that the merge review team needs to review this PR label Aug 9, 2023
@adellape adellape added the merge-review-in-progress Signifies that the merge review team is reviewing this PR label Aug 9, 2023
Copy link
Contributor

@adellape adellape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to hold merge for some of the following remaining issues.

@CarmiWisemon Leaving the merge-review-in-progress label on for now so it stays in my view. Please ping me on Slack or here when it's ready for another look. Thanks!

$ oc edit configmap cluster-monitoring-config -n openshift-monitoring
----
+
.Example output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess technically it's not really "output" in this case, since we're showing what we want them to change. Maybe .Example changes just for this block?

Altho I think more guidelines-appropriate would be to break it into 2 actions (and then you don't need the .Example output block title at all):

. Edit the `cluster-monitoring-config` `ConfigMap` object in the `openshift-monitoring` namespace:
+
[source,terminal]
----
$ oc edit configmap cluster-monitoring-config -n openshift-monitoring
----

. Add or enable the `enableUserWorkload` option in the `data` section's `config.yaml` field:
+
[source,yaml]
----
apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true <1>
kind: ConfigMap
metadata:
# ...
----
<1> Add this option or set to `true`

enableUserWorkload: true <1>
kind: ConfigMap
metadata:
# [...]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# [...]
# ...

Per repo guidelines for YAML.

Comment on lines 41 to 43
. Wait a short period of time to verify the User Workload Monitoring Setup by checking if the following components are up and running in the `openshift-user-workload-monitoring` namespace.
+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Colon and blank line:

Suggested change
. Wait a short period of time to verify the User Workload Monitoring Setup by checking if the following components are up and running in the `openshift-user-workload-monitoring` namespace.
+
. Wait a short period of time to verify the User Workload Monitoring Setup by checking if the following components are up and running in the `openshift-user-workload-monitoring` namespace:
+

Comment on lines +49 to +58
----
NAME READY STATUS RESTARTS AGE
prometheus-operator-6844b4b99c-b57j9 2/2 Running 0 43s
prometheus-user-workload-0 5/5 Running 0 32s
prometheus-user-workload-1 5/5 Running 0 32s
thanos-ruler-user-workload-0 3/3 Running 0 32s
thanos-ruler-user-workload-1 3/3 Running 0 32s
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing block title and source:

Suggested change
----
NAME READY STATUS RESTARTS AGE
prometheus-operator-6844b4b99c-b57j9 2/2 Running 0 43s
prometheus-user-workload-0 5/5 Running 0 32s
prometheus-user-workload-1 5/5 Running 0 32s
thanos-ruler-user-workload-0 3/3 Running 0 32s
thanos-ruler-user-workload-1 3/3 Running 0 32s
----
.Example output
[source,terminal]
----
NAME READY STATUS RESTARTS AGE
prometheus-operator-6844b4b99c-b57j9 2/2 Running 0 43s
prometheus-user-workload-0 5/5 Running 0 32s
prometheus-user-workload-1 5/5 Running 0 32s
thanos-ruler-user-workload-0 3/3 Running 0 32s
thanos-ruler-user-workload-1 3/3 Running 0 32s
----

Comment on lines 60 to 71
[source,terminal]
----
$ oc get configmap user-workload-monitoring-config -n openshift-user-workload-monitoring
Error from server (NotFound): configmaps "user-workload-monitoring-config" not found

# We need to create: user-workload-monitoring-config ConfigMap
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs to be broken into 2 code blocks:

Suggested change
[source,terminal]
----
$ oc get configmap user-workload-monitoring-config -n openshift-user-workload-monitoring
Error from server (NotFound): configmaps "user-workload-monitoring-config" not found
# We need to create: user-workload-monitoring-config ConfigMap
----
[source,terminal]
----
$ oc get configmap user-workload-monitoring-config -n openshift-user-workload-monitoring
----
+
.Example output
[source,terminal]
----
Error from server (NotFound): configmaps "user-workload-monitoring-config" not found
----

+
. Create a `ServiceMonitor` YAML file that matches the existing SVC label, and save the file as `3_create_oadp_service_monitor.yaml`. The service monitor is created in the `openshift-adp` namespace where the `openshift-adp-velero-metrics-svc` service resides.
+
.Example output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.Example output
.Example `ServiceMonitor` object

Comment on lines 55 to 59
----
$ oc apply -f 3_create_oadp_service_monitor.yaml
servicemonitor.monitoring.coreos.com/oadp-service-monitor created
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break up into 2 blocks, or drop the output.

servicemonitor.monitoring.coreos.com/oadp-service-monitor created
----
+
. Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also turn this step into a .Verification section, to match the other module:

Suggested change
. Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console:
.Verification
* Confirm that the new service monitor is in an *Up* state by using the *Administrator* perspective of the {product-title} web console:

^ Either way, since this is also technically a single-step procedure w/ substeps, change . to *. The substeps are ordered here tho, so they can stay ..

Comment on lines 42 to 64
|`kopia_content_get_bytes`
|Number of bytes retrieved using GetContent
|Counter

|`kopia_content_get_count`
|Number of times GetContent() was called
|Counter

|`kopia_content_get_error_count`
|Number of times GetContent() was called and the result was an error
|Counter

|`kopia_content_get_not_found_count`
|Number of times GetContent() was called and the result was not found
|Counter

|`kopia_content_write_bytes`
|Number of bytes passed to WriteContent()
|Counter

|`kopia_content_write_count`
|Number of times WriteContent() was called
|Counter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GetContent() and WriteContent() instances here still need to get marked-up with backticks.

Also note the first instance of GetContent is missing the ()

Suggested change
|`kopia_content_get_bytes`
|Number of bytes retrieved using GetContent
|Counter
|`kopia_content_get_count`
|Number of times GetContent() was called
|Counter
|`kopia_content_get_error_count`
|Number of times GetContent() was called and the result was an error
|Counter
|`kopia_content_get_not_found_count`
|Number of times GetContent() was called and the result was not found
|Counter
|`kopia_content_write_bytes`
|Number of bytes passed to WriteContent()
|Counter
|`kopia_content_write_count`
|Number of times WriteContent() was called
|Counter
|`kopia_content_get_bytes`
|Number of bytes retrieved using `GetContent()`
|Counter
|`kopia_content_get_count`
|Number of times `GetContent()` was called
|Counter
|`kopia_content_get_error_count`
|Number of times `GetContent()` was called and the result was an error
|Counter
|`kopia_content_get_not_found_count`
|Number of times `GetContent()` was called and the result was not found
|Counter
|`kopia_content_write_bytes`
|Number of bytes passed to `WriteContent()`
|Counter
|`kopia_content_write_count`
|Number of times `WriteContent()` was called
|Counter

Comment on lines 13 to 20
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.
+
.OADP metrics query

image::oadp-metrics-query.png[OADP metrics query]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still suggest moving the screenshot (which shows Developer perspective) up to pair with the Developer step:

Suggested change
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.
+
.OADP metrics query
image::oadp-metrics-query.png[OADP metrics query]
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
+
.OADP metrics query
image::oadp-metrics-query.png[OADP metrics query]
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.

or just remove the + so that it floats after the step is done (and doesn't look attached to the Administrator step anymore):

Suggested change
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.
+
.OADP metrics query
image::oadp-metrics-query.png[OADP metrics query]
* Navigate to the *Observe* -> *Metrics* page:
** If you are using the *Developer* perspective, follow these steps:
.. Select *Custom query*, or click on the *Show PromQL* link.
.. Type the query and click *Enter*.
** If you are using the *Administrator* perspective, type the expression in the text field and select *Run Queries*.
.OADP metrics query
image::oadp-metrics-query.png[OADP metrics query]

@CarmiWisemon CarmiWisemon force-pushed the oadp2300metric branch 2 times, most recently from c75b687 to bf88086 Compare August 10, 2023 09:35
Copy link
Contributor

@adellape adellape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the applied changes. I am going to merge per Slack discussion, however please consider the few remaining items below that could be considered for a quick follow-up PR.

Comment on lines +68 to +87
. Create `user-workload-monitoring-config` ConfigMap for the User Workload Monitoring and save it under `2_configure_user_workload_monitoring.yaml` filename.
+
.Example output
[source,yaml]
+
----
apiVersion: v1
kind: ConfigMap
metadata:
name: user-workload-monitoring-config
namespace: openshift-user-workload-monitoring
data:
config.yaml: |
----
+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one wasn't addressed.


.Procedure

. Ensure the `openshift-adp-velero-metrics-svc` exists. It should contain `app.kubernetes.io/name=velero` label which will be used as selector for the `ServiceMonitor` object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
. Ensure the `openshift-adp-velero-metrics-svc` exists. It should contain `app.kubernetes.io/name=velero` label which will be used as selector for the `ServiceMonitor` object.
. Ensure the `openshift-adp-velero-metrics-svc` exists. It should contain `app.kubernetes.io/name=velero` label, which will be used as selector for the `ServiceMonitor` object.

----

.Verification
. After the Alert is triggered, you can view it in the following ways:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
. After the Alert is triggered, you can view it in the following ways:
* After the Alert is triggered, you can view it in the following ways:

@adellape adellape merged commit ab797d3 into openshift:main Aug 10, 2023
1 check passed
@adellape
Copy link
Contributor

/cherrypick enterprise-4.14

@adellape
Copy link
Contributor

/cherrypick enterprise-4.13

@adellape
Copy link
Contributor

/cherrypick enterprise-4.12

@adellape
Copy link
Contributor

/cherrypick enterprise-4.11

@openshift-cherrypick-robot

@adellape: new pull request created: #63444

In response to this:

/cherrypick enterprise-4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@adellape: new pull request created: #63445

In response to this:

/cherrypick enterprise-4.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@adellape: new pull request created: #63446

In response to this:

/cherrypick enterprise-4.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@adellape: new pull request created: #63447

In response to this:

/cherrypick enterprise-4.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@adellape adellape removed merge-review-in-progress Signifies that the merge review team is reviewing this PR merge-review-needed Signifies that the merge review team needs to review this PR labels Aug 10, 2023
@adellape adellape mentioned this pull request Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch/enterprise-4.11 branch/enterprise-4.12 branch/enterprise-4.13 branch/enterprise-4.14 OADP Label for all OADP PRs peer-review-done Signifies that the peer review team has reviewed this PR size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants