docs: add documentation to setup ceph dashboard monitoring stack #8891

jmolmo · 2021-09-30T14:58:57Z

By default, the Rook operator does not include any monitoring capability.
The information and resources provided here, together with the existing
documentation for setting up Prometheus, make it possible to use the full
monitoring features present in the Ceph dashboard.

Resolves: #6519

Checklist:

Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>

subhamkrai

nits:

Documentation/ceph-dashboard-monitoring.md

jmolmo · 2021-10-01T11:28:57Z

@subhamkrai: Thank you very much for your help! I have followed your suggestions, improved the internal links, and added a kind of table of contents at the beginning of the document.
The final look really is better, and it is now the same as in other documents! :-)

subhamkrai · 2021-10-04T06:39:57Z

Documentation/ceph-dashboard-monitoring.md

+You should be able now to connect with the alert manager using the following URL:
+
+```console
+echo "http://$(kubectl get pod -l app=alertmanager -n rook-ceph -o jsonpath={.items[0].status.hostIP}):$(kubectl get service alertmanager -n rook-ceph  -o jsonpath='{.spec.ports[0].nodePort}')"


When I opened the file, two places were not rendering correctly with quotes. I have attached the screenshots.

FYI, there are VS Code extensions for previewing Markdown files.
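
Independent of how the page renders, the first `jsonpath` expression in the quoted command is unquoted, and some shells treat an unquoted `[0]` as a glob pattern. A minimal sketch with both expressions single-quoted could look like this. The helper names `build_url` and `alertmanager_url` are illustrative only and are not part of the PR.

```shell
# Print an http URL from a host and a port.
build_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Look up the Alertmanager node IP and NodePort, then print the URL.
# Both jsonpath expressions are single-quoted so the shell never
# interprets the braces or brackets.
alertmanager_url() {
  host=$(kubectl get pod -l app=alertmanager -n rook-ceph \
    -o jsonpath='{.items[0].status.hostIP}')
  port=$(kubectl get service alertmanager -n rook-ceph \
    -o jsonpath='{.spec.ports[0].nodePort}')
  build_url "$host" "$port"
}
```

Calling `alertmanager_url` against a cluster where the Alertmanager pod and service exist should print the same URL as the original one-liner.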

Documentation/ceph-dashboard-monitoring.md

subhamkrai · 2021-10-04T06:42:10Z

@subhamkrai: Thank you very much for your help! I have followed your suggestions, improved the internal links, and added a kind of table of contents at the beginning of the document. The final look really is better, and it is now the same as in other documents! :-)

Thanks for doing the changes 😀

Documentation/ceph-dashboard-monitoring.md

travisn · 2021-10-05T04:11:47Z

Documentation/ceph-dashboard-monitoring.md

+```console
+cd rook/cluster/examples/kubernetes/ceph/monitoring
+```
+Create and configure the Prometheus instance 


These steps are duplicated from ceph-monitoring.md? How can we consolidate?

jmolmo · 2021-10-05

I wanted to provide a brief summary of the minimum steps and let the user get everything from the same document without "jumping". I will clarify that with a preceding comment.

travisn · 2021-11-04

What if we refer them to the other document as a prerequisite? This still seems like extra duplication.

Documentation/ceph-dashboard-monitoring.md

travisn · 2021-10-05T04:15:08Z

Documentation/ceph-dashboard-monitoring.md

+
+## Grafana
+
+The Grafana image used is a customized image for Ceph. It is Grafana V.6.7.4, and it includes the set of Grafana dashboards used in the Ceph dashboard, and a couple of visualization plugins.


Only this exact version? Or do we need at least v6.7.4?

jmolmo · 2021-10-05

Yes, only v6.7.4. It is even more restrictive: you cannot use a standard v6.7.4 Grafana image; you need to use the Ceph-customized Grafana image.
Is that not clear in the text provided?

travisn · 2021-10-05T04:18:10Z

cluster/examples/kubernetes/ceph/monitoring/grafana.yaml

@@ -0,0 +1,266 @@
+---


How about creating an integration test that will install all these monitoring yamls? Even if we don't test the full end to end gathering of ceph metrics, at least we can verify basic install?

Documentation/ceph-dashboard-monitoring.md

subhamkrai

one more rendering wrong

subhamkrai

nit

Documentation/ceph-dashboard-monitoring.md

cluster/examples/kubernetes/ceph/monitoring/alert-manager-patch.yaml

cluster/examples/kubernetes/ceph/monitoring/alert-manager.yaml

cluster/examples/kubernetes/ceph/monitoring/grafana.yaml

cluster/examples/kubernetes/ceph/monitoring/prometheus-additional.yaml

cluster/examples/kubernetes/ceph/monitoring/scrape-config-patch.yaml

github-actions · 2022-01-02T20:02:00Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

By default Rook operator does not include any monitoring capability. The information and resources provided together with the already created documentation to setup Prometheus allows to use the full monitoring features present in the Ceph dashboard

fixes: rook#6519

Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>

travisn

@jmolmo (edited by @BlaineEXE : was user jmo and not jmolmo)

travisn · 2022-01-05T22:32:41Z

.github/workflows/integration-test-monstack-suite.yaml

@@ -0,0 +1,48 @@
+name: Integration test CephMonStackSuite


There are several CI failures in this test such as:

2022-01-04 12:57:16.223841 E | integrationTest: Get "https://10.1.0.23:31991/api/search?query=%": dial tcp 10.1.0.23:31991: connect: connection refused
    ceph_monstack_test.go:295:
        Error Trace: ceph_monstack_test.go:295
                     ceph_monstack_test.go:345
        Error:       Should be true
        Test:        TestCephMonStackSuite/TestMonitoringStack
    ceph_monstack_test.go:299:
        Error Trace: ceph_monstack_test.go:299
                     ceph_monstack_test.go:345
        Error:       Should be true
        Test:        TestCephMonStackSuite/TestMonitoringStack
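
The log does not say why the connection was refused. If the cause is that the test queries Grafana before the service is ready (an assumption, not something the log confirms), one common mitigation is to poll the endpoint until it answers. The test itself is written in Go, so the shell sketch below only illustrates the idea, and the `retry` helper is hypothetical.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Example, using the address from the log above and Grafana's health endpoint:
# retry 30 5 curl -ksf -o /dev/null "https://10.1.0.23:31991/api/health"
```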

travisn · 2022-01-05T22:34:49Z

Documentation/ceph-dashboard-monitoring.md

@@ -0,0 +1,352 @@
+---
+title: Ceph Dashboard Monitoring
+weight: 2500


This value will put it after the monitoring topic in the TOC.

Suggested change

-weight: 2500
+weight: 2510

travisn · 2022-01-05T22:35:55Z

Documentation/ceph-dashboard-monitoring.md

+Clone the Rook repository to have access to the different resource files used in the next steps:
+```console
+git clone --single-branch --branch {{ branchName }} https://github.com/rook/rook.git
+cd rook/cluster/examples/kubernetes/ceph/monitoring


This folder changed in v1.8

Suggested change

-cd rook/cluster/examples/kubernetes/ceph/monitoring
+cd rook/deploy/examples/monitoring

travisn · 2022-01-05T22:37:27Z

Documentation/ceph-dashboard-monitoring.md

+
+## Prometheus Operator
+
+The Prometheus operator allows you to set up a Prometheus instance in you kubernetes cluster. Follow the instructions provided in [Prometheus Monitoring](https://github.com/rook/rook/blob/master/Documentation/ceph-monitoring.md).


Links to other Rook docs don't need a full URL or path.

Suggested change

-The Prometheus operator allows you to set up a Prometheus instance in you kubernetes cluster. Follow the instructions provided in [Prometheus Monitoring](https://github.com/rook/rook/blob/master/Documentation/ceph-monitoring.md).
+The Prometheus operator allows you to set up a Prometheus instance in you kubernetes cluster. Follow the instructions provided in [Prometheus Monitoring](ceph-monitoring.md).

travisn · 2022-01-05T22:39:21Z

Documentation/ceph-dashboard-monitoring.md

+1. Enable dashboard, prometheus manager module and monitoring in the [Ceph cluster CR](#ceph-cluster).
+2. Use the [Rook toolbox](ceph-toolbox.md) to set the urls used by the different monitoring stack components, and to disable (_set-*-api-ssl-verify settings_) the ssl certificates verification if self signed certificates are used. 
+```console
+ceph dashboard set-grafana-api-url <grafana_url>


Where can the three URLs used in these commands be found? In the later sections of this doc? It seems the specific commands should not be in this section, since readers need the later sections to get the URLs.
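
One way to address this ordering concern is to collect the three URLs first and only then produce the dashboard commands. The sketch below does that. `set-grafana-api-url` appears in the quoted diff, while `set-prometheus-api-host` and `set-alertmanager-api-host` are taken from the Ceph dashboard documentation and should be checked against the Ceph version in use. The function name `dashboard_commands` is hypothetical.

```shell
# Print the dashboard commands for the three monitoring endpoints.
# Arguments: GRAFANA_URL PROMETHEUS_URL ALERTMANAGER_URL
# Nothing is executed here; the output is meant to be reviewed and then
# run inside the Rook toolbox.
dashboard_commands() {
  printf 'ceph dashboard set-grafana-api-url %s\n' "$1"
  printf 'ceph dashboard set-prometheus-api-host %s\n' "$2"
  printf 'ceph dashboard set-alertmanager-api-host %s\n' "$3"
}
```

Because the function only prints, the URLs can be gathered in any order from the later sections and substituted in a single final step.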

travisn · 2022-01-05T22:44:48Z

Documentation/ceph-dashboard-monitoring.md

+
+**2. Grafana storage (mandatory)**
+
+Grafana needs storage for its database and other stuff. The "grafana.yaml" file provided uses a PVC that get the storage from a PV based in a Storage Class with name "ceph-storage". _The user must be sure that a PV using this storage class is available for the grafana PVC_. Or change the storage class used in the grafana PVC to point the right one for your cluster. 


The PVs are expected to be dynamically provisioned by the CSI driver. Why would there be a PV already available? The PVC will just request that one be created.
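
To illustrate the point, a PVC only needs to name a storage class with a dynamic provisioner behind it, and no PV has to exist beforehand. This is a minimal sketch: the claim name `grafana-pvc`, the namespace, and the access mode are assumptions for illustration, not values taken from the PR's `grafana.yaml`.

```shell
# Print a PVC manifest that relies on dynamic provisioning.
# Arguments: STORAGE_CLASS_NAME SIZE (for example: rook-ceph-block 1Gi)
# The claim name and namespace below are hypothetical.
grafana_pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: rook-ceph
spec:
  storageClassName: $1
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $2
EOF
}
```

The output can be piped to `kubectl apply -f -`. The storage class passed as the first argument must already exist in the cluster.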

travisn · 2022-01-05T22:46:18Z

Documentation/ceph-dashboard-monitoring.md

+>```
+
+_**Note about High Availability:**
+If you use a storage class that provides "high availability storage" (for example using a [Ceph shared filesystem](ceph-filesystem-md)), the Grafana installation can be also high available. Just is needed label more than 1 node with the label "app=grafana" and set the number of replicas to the desired number in the grafana.yaml file._


Rook always recommends replica 3 for production pools. Seems like we don't need to mention this for high availability since Rook pools should already be configured as such.

travisn · 2022-01-05T22:51:15Z

deploy/examples/monitoring/grafana.yaml

+    app: grafana
+type: kubernetes.io/tls
+data:
+  tls.crt: |


Let's not ship a secret, even as an example. Can we tell them to generate the cert instead?
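
One way to follow this suggestion is to have the user generate a self-signed certificate locally and create the secret from it. The sketch below uses `openssl req`. The function name, the key size, the validity period, and the secret name `grafana-tls` in the usage note are assumptions for illustration.

```shell
# Generate a self-signed certificate and private key.
# Arguments: COMMON_NAME OUTPUT_DIR
# Writes OUTPUT_DIR/tls.crt and OUTPUT_DIR/tls.key.
generate_grafana_cert() {
  openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -subj "/CN=$1" \
    -keyout "$2/tls.key" -out "$2/tls.crt"
}
```

The resulting files could then be loaded with `kubectl create secret tls grafana-tls --cert=tls.crt --key=tls.key -n rook-ceph`, so that no key material is committed to the repository.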

github-actions · 2022-01-26T20:02:05Z

This pull request has been automatically closed due to inactivity. Please re-open if these changes are still required.

jmolmo added bug docs ceph-dashboard labels Sep 30, 2021

jmolmo requested review from leseb and travisn September 30, 2021 14:58

subhamkrai requested changes Oct 1, 2021


jmolmo force-pushed the ceph-dashboard-monitoring branch from a7393a2 to 50ab107 Compare October 1, 2021 11:23

jmolmo requested a review from subhamkrai October 1, 2021 11:29

subhamkrai reviewed Oct 4, 2021


jmolmo mentioned this pull request Oct 4, 2021

Dashboard performance monitoring #6519

Closed

jmolmo force-pushed the ceph-dashboard-monitoring branch from 50ab107 to 457bc15 Compare October 4, 2021 16:21

jmolmo requested a review from subhamkrai October 4, 2021 16:22

travisn requested changes Oct 5, 2021


jmolmo force-pushed the ceph-dashboard-monitoring branch from 457bc15 to 99e538c Compare October 5, 2021 08:27

subhamkrai requested changes Oct 5, 2021


jmolmo force-pushed the ceph-dashboard-monitoring branch 2 times, most recently from a445061 to ba0c23b Compare October 6, 2021 11:37

jmolmo requested review from travisn and subhamkrai October 6, 2021 11:37

jmolmo force-pushed the ceph-dashboard-monitoring branch 2 times, most recently from 76c0bde to 0d1b13d Compare October 6, 2021 15:53

subhamkrai requested changes Oct 7, 2021


jmolmo force-pushed the ceph-dashboard-monitoring branch from 0d1b13d to a9aadb2 Compare November 3, 2021 13:08

jmolmo added run-mgr-mon-suite skip-ci and removed run-mgr-mon-suite labels Nov 3, 2021

jmolmo force-pushed the ceph-dashboard-monitoring branch from a9aadb2 to ad76cf8 Compare November 3, 2021 16:13

jmolmo force-pushed the ceph-dashboard-monitoring branch 14 times, most recently from f2fb615 to 95dc2fd Compare November 26, 2021 17:20

jmolmo force-pushed the ceph-dashboard-monitoring branch from 95dc2fd to 2083f16 Compare December 3, 2021 16:21

jmolmo mentioned this pull request Dec 10, 2021

Grafana Ceph dashboard not working #9381

Closed

github-actions bot added the stale Labeled by the stale bot label Jan 2, 2022

jmolmo force-pushed the ceph-dashboard-monitoring branch 4 times, most recently from 15a49c5 to f058dd0 Compare January 4, 2022 12:42

BlaineEXE mentioned this pull request Jan 4, 2022

Customize Ceph PrometheusRule CRD #9082

Closed

travisn requested changes Jan 5, 2022


test: testing suite

b242d6c

jmolmo force-pushed the ceph-dashboard-monitoring branch from f058dd0 to b242d6c Compare January 7, 2022 11:10

BlaineEXE mentioned this pull request Jan 12, 2022

monitoring: customize prometheus rule alerts #9503

Closed

10 tasks

github-actions bot closed this Jan 26, 2022




