docs: add documentation to setup ceph dashboard monitoring stack #8891

Closed
wants to merge 2 commits into from

Conversation

@jmolmo jmolmo commented Sep 30, 2021

By default, the Rook operator does not include any monitoring capability.
The information and resources provided here, together with the existing
documentation for setting up Prometheus, make it possible to use the full
monitoring features present in the Ceph dashboard.

Resolves: #6519

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>

@subhamkrai subhamkrai left a comment

nits:

Documentation/ceph-dashboard-monitoring.md Outdated Show resolved Hide resolved

jmolmo commented Oct 1, 2021

@subhamkrai: Thank you very much for your help! I have followed your suggestions, improved the internal links, and added a kind of table of contents at the beginning of the document.
The final look really is better, and it is now the same as in other documents!! :-)

@jmolmo jmolmo requested a review from subhamkrai October 1, 2021 11:29
You should now be able to connect to the Alertmanager using the following URL:

```console
echo "http://$(kubectl get pod -l app=alertmanager -n rook-ceph -o jsonpath='{.items[0].status.hostIP}'):$(kubectl get service alertmanager -n rook-ceph -o jsonpath='{.spec.ports[0].nodePort}')"
```
Contributor

Screenshot from 2021-10-04 12-06-52

Contributor

When I opened the file, two places were not rendering correctly with quotes. I have attached the screenshots.

FYI, VS Code has extensions for previewing Markdown files.

subhamkrai commented Oct 4, 2021

> @subhamkrai: Thank you very much for your help! I have followed your suggestions, improved the internal links, and added a kind of table of contents at the beginning of the document. The final look really is better, and it is now the same as in other documents!! :-)

Thanks for making the changes 😀

Documentation/ceph-dashboard-monitoring.md Show resolved Hide resolved
Documentation/ceph-dashboard-monitoring.md Outdated Show resolved Hide resolved
```console
cd rook/cluster/examples/kubernetes/ceph/monitoring
```
Create and configure the Prometheus instance
Member

These steps are duplicated from ceph-monitoring.md? How can we consolidate?

Contributor Author

I wanted to provide a brief summary of the minimum steps and let the user get everything from the same document without "jumping". I will clarify that with a note placed before these steps.

Member

What if we refer them to the other document as a prerequisite? This still seems like extra duplication.

Documentation/ceph-dashboard-monitoring.md Show resolved Hide resolved

## Grafana

The Grafana image used is a customized image for Ceph. It is based on Grafana v6.7.4 and includes the set of Grafana dashboards used in the Ceph dashboard, plus a couple of visualization plugins.
Member

Only this exact version? Or do we need at least v6.7.4?

Contributor Author

Yes, only v6.7.4. It is even more restrictive than that: you cannot use a standard v6.7.4 Grafana image; you need to use the Ceph-customized Grafana image.
Is that not clear in the text provided?

@@ -0,0 +1,266 @@
---
Member

How about creating an integration test that will install all these monitoring yamls? Even if we don't test the full end to end gathering of ceph metrics, at least we can verify basic install?
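
A basic install check of the kind suggested here could be sketched in shell roughly as follows. This is only an illustration, not the test that was later added to this PR. The manifest directory and the list of `app` label values are assumptions (only `app=alertmanager`, `app=grafana`, and the `rook-ceph` namespace appear in this thread), and `KUBECTL` is a hypothetical override hook that lets the function run against a stub instead of a real cluster.

```shell
# Sketch: apply every manifest in a directory, then wait for the pods of
# each monitoring component to become Ready.
# KUBECTL is a hypothetical override so the function can be exercised
# against a stub instead of a real cluster.
KUBECTL="${KUBECTL:-kubectl}"

verify_basic_install() {
  manifest_dir="$1"   # assumed: directory holding the monitoring YAMLs
  namespace="$2"      # e.g. rook-ceph
  shift 2

  "$KUBECTL" apply -n "$namespace" -f "$manifest_dir" || return 1

  # Remaining arguments are values of the "app" label, one per component.
  for app in "$@"; do
    "$KUBECTL" wait --for=condition=Ready pod \
      -l "app=$app" -n "$namespace" --timeout=300s || return 1
  done
}

# Example (not run here):
#   verify_basic_install ./monitoring rook-ceph alertmanager grafana
```

Note that `kubectl wait` reports an error when no pod matches the selector yet, so a real test would probably need to retry the wait shortly after the manifests are applied.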

Documentation/ceph-dashboard-monitoring.md Show resolved Hide resolved
@subhamkrai subhamkrai left a comment

Screenshot from 2021-10-05 14-02-33
One more place is rendering incorrectly.

@jmolmo jmolmo force-pushed the ceph-dashboard-monitoring branch 2 times, most recently from a445061 to ba0c23b Compare October 6, 2021 11:37
@jmolmo jmolmo force-pushed the ceph-dashboard-monitoring branch 2 times, most recently from 76c0bde to 0d1b13d Compare October 6, 2021 15:53
@subhamkrai subhamkrai left a comment

nit

@jmolmo jmolmo force-pushed the ceph-dashboard-monitoring branch 14 times, most recently from f2fb615 to 95dc2fd Compare November 26, 2021 17:20

github-actions bot commented Jan 2, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Labeled by the stale bot label Jan 2, 2022
By default, the Rook operator does not include any monitoring capability.
The information and resources provided here, together with the existing
documentation for setting up Prometheus, make it possible to use the full
monitoring features present in the Ceph dashboard.

fixes: rook#6519

Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
@jmolmo jmolmo force-pushed the ceph-dashboard-monitoring branch 4 times, most recently from 15a49c5 to f058dd0 Compare January 4, 2022 12:42
@travisn travisn left a comment

@jmolmo (edited by @BlaineEXE : was user jmo and not jmolmo)

@@ -0,0 +1,48 @@
name: Integration test CephMonStackSuite
Member

There are several CI failures in this test such as:

2022-01-04 12:57:16.223841 E | integrationTest: Get "https://10.1.0.23:31991/api/search?query=%": dial tcp 10.1.0.23:31991: connect: connection refused
    ceph_monstack_test.go:295: 
        	Error Trace:	ceph_monstack_test.go:295
        	            				ceph_monstack_test.go:345
        	Error:      	Should be true
        	Test:       	TestCephMonStackSuite/TestMonitoringStack
    ceph_monstack_test.go:299: 
        	Error Trace:	ceph_monstack_test.go:299
        	            				ceph_monstack_test.go:345
        	Error:      	Should be true
        	Test:       	TestCephMonStackSuite/TestMonitoringStack
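
A `connection refused` error right after deployment can simply mean that the service is not listening yet. Whether that is the cause here is not established in this thread. If it is, one possible mitigation is to poll the endpoint before asserting on it. The helper names `retry` and `grafana_reachable` below are hypothetical; the sketch only illustrates the polling idea.

```shell
# Sketch: run a command until it succeeds or the attempts run out.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    n=$((n + 1))
    # Sleep only if another attempt will follow.
    if [ "$n" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical probe: succeeds once the endpoint accepts connections.
# -k skips certificate verification (self-signed certificates).
grafana_reachable() {
  curl -k -s -o /dev/null --max-time 5 "$1"
}

# Example (not run here), using the address from the log above:
#   retry 30 10 grafana_reachable "https://10.1.0.23:31991/api/search?query=%"
```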

@@ -0,0 +1,352 @@
---
title: Ceph Dashboard Monitoring
weight: 2500
Member

This value will put it after the monitoring topic in the TOC.

Suggested change
- weight: 2500
+ weight: 2510

Clone the Rook repository to get access to the resource files used in the next steps:
```console
git clone --single-branch --branch {{ branchName }} https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph/monitoring
```
Member

This folder changed in v1.8

Suggested change
- cd rook/cluster/examples/kubernetes/ceph/monitoring
+ cd rook/deploy/examples/monitoring
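
For a script that has to work against checkouts from before and after v1.8, the two layouts mentioned in this comment could be probed in order. This is a sketch under that assumption only; `monitoring_dir` is a hypothetical helper, and the published docs would normally just name the path for their own branch.

```shell
# Sketch: print the monitoring examples directory for a Rook checkout,
# whichever of the two layouts mentioned in this review is present.
monitoring_dir() {
  repo_root="$1"
  for candidate in \
    "deploy/examples/monitoring" \
    "cluster/examples/kubernetes/ceph/monitoring"
  do
    if [ -d "$repo_root/$candidate" ]; then
      printf '%s\n' "$repo_root/$candidate"
      return 0
    fi
  done
  echo "no monitoring examples directory found under $repo_root" >&2
  return 1
}

# Example (not run here):
#   cd "$(monitoring_dir rook)"
```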


## Prometheus Operator

The Prometheus operator allows you to set up a Prometheus instance in your Kubernetes cluster. Follow the instructions provided in [Prometheus Monitoring](https://github.com/rook/rook/blob/master/Documentation/ceph-monitoring.md).
Member

Links to other Rook docs don't need a full URL or path.

Suggested change
- The Prometheus operator allows you to set up a Prometheus instance in your Kubernetes cluster. Follow the instructions provided in [Prometheus Monitoring](https://github.com/rook/rook/blob/master/Documentation/ceph-monitoring.md).
+ The Prometheus operator allows you to set up a Prometheus instance in your Kubernetes cluster. Follow the instructions provided in [Prometheus Monitoring](ceph-monitoring.md).

1. Enable the dashboard, the Prometheus manager module, and monitoring in the [Ceph cluster CR](#ceph-cluster).
2. Use the [Rook toolbox](ceph-toolbox.md) to set the URLs used by the different monitoring stack components and, if self-signed certificates are used, to disable SSL certificate verification (the _set-*-api-ssl-verify_ settings).
```console
ceph dashboard set-grafana-api-url <grafana_url>
```
Member

Where can the three urls be found to set in these commands? In the later sections of this doc? Seems like the specific commands should not be in this section since they need to read the later section to get the urls.
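
One way the URL could be derived, following the host IP plus NodePort pattern used for the Alertmanager earlier in this thread, is sketched below. The `app=grafana` selector and the service name `grafana` in the commented usage are assumptions, as is the `https` scheme; `node_port_url` is a hypothetical helper.

```shell
# Sketch: build a NodePort URL from its parts.
node_port_url() {
  scheme="$1"
  host_ip="$2"
  node_port="$3"
  printf '%s://%s:%s\n' "$scheme" "$host_ip" "$node_port"
}

# Example (not run here). The label selector and service name are
# assumptions; adjust them to the resources actually deployed.
#   host_ip=$(kubectl get pod -l app=grafana -n rook-ceph \
#     -o jsonpath='{.items[0].status.hostIP}')
#   node_port=$(kubectl get service grafana -n rook-ceph \
#     -o jsonpath='{.spec.ports[0].nodePort}')
#   ceph dashboard set-grafana-api-url "$(node_port_url https "$host_ip" "$node_port")"
```

The `ceph dashboard` command itself has to be run from the Rook toolbox, so the URL would be computed first and then passed in.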


**2. Grafana storage (mandatory)**

Grafana needs storage for its database and other data. The provided "grafana.yaml" file uses a PVC that gets its storage from a PV based on a storage class named "ceph-storage". _Make sure that a PV using this storage class is available for the Grafana PVC_, or change the storage class used in the Grafana PVC to the right one for your cluster.
Member

The PVs are expected to be dynamically provisioned with the csi driver. Why would there be a PV already available? The PVC will just request it to be created.
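
With dynamic provisioning, the claim only has to name a storage class, and no PV is created by hand. A minimal sketch of such a claim is shown below. The PVC name `grafana-pvc`, the access mode, and the size are assumptions for illustration; only the storage class name `ceph-storage` and the `rook-ceph` namespace come from this thread.

```shell
# Sketch: print a PVC manifest that relies on dynamic provisioning.
# No PV is created by hand; the storage class provisioner is expected to
# create one when the claim is made.
grafana_pvc_manifest() {
  storage_class="$1"   # e.g. ceph-storage, the name used in this thread
  size="$2"            # assumed size, e.g. 1Gi
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: rook-ceph
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ${storage_class}
  resources:
    requests:
      storage: ${size}
EOF
}

# Example (not run here):
#   grafana_pvc_manifest ceph-storage 1Gi | kubectl apply -f -
```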

>```

_**Note about High Availability:**
If you use a storage class that provides "high availability storage" (for example, a [Ceph shared filesystem](ceph-filesystem.md)), the Grafana installation can also be highly available. You only need to apply the label "app=grafana" to more than one node and set the desired number of replicas in the grafana.yaml file._
Member

Rook always recommends replica 3 for production pools. Seems like we don't need to mention this for high availability since Rook pools should already be configured as such.

app: grafana
type: kubernetes.io/tls
data:
tls.crt: |
Member

Let's not ship a secret, even as an example. Can we tell them to generate the cert instead?
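
Generating the certificate locally, as suggested here, might look like the sketch below. The common name, key size, validity period, and the secret name `grafana-tls` are assumptions for illustration and are not taken from the PR.

```shell
# Sketch: create a self-signed certificate and key in a directory.
generate_self_signed_cert() {
  out_dir="$1"
  common_name="$2"    # assumed CN, e.g. the Grafana service name
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$out_dir/tls.key" -out "$out_dir/tls.crt" \
    -days 365 -subj "/CN=$common_name"
}

# Example (not run here). "grafana-tls" is a hypothetical secret name:
#   generate_self_signed_cert . grafana
#   kubectl create secret tls grafana-tls --cert=tls.crt --key=tls.key -n rook-ceph
```

Creating the secret with `kubectl create secret tls` keeps the key material out of the repository, which is the point of the request above.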

@github-actions

This pull request has been automatically closed due to inactivity. Please re-open if these changes are still required.

@github-actions github-actions bot closed this Jan 26, 2022
Development

Successfully merging this pull request may close these issues.

Dashboard performance monitoring
3 participants