docs: add documentation to setup ceph dashboard monitoring stack #8891
nits:
a7393a2
to
50ab107
Compare
@subhamkrai: Thank you very much for your help! I have followed your suggestions, improved the internal links, and added a table of contents at the beginning of the document.
You should now be able to connect to the Alertmanager using the following URL:

```console
echo "http://$(kubectl get pod -l app=alertmanager -n rook-ceph -o jsonpath='{.items[0].status.hostIP}'):$(kubectl get service alertmanager -n rook-ceph -o jsonpath='{.spec.ports[0].nodePort}')"
```
When I opened the file, two places were not rendering correctly because of the quotes. I have attached the screenshots.
FYI, there are VS Code extensions for previewing markdown files.
Thanks for making the changes 😀
50ab107
to
457bc15
Compare
```console
cd rook/cluster/examples/kubernetes/ceph/monitoring
```
Create and configure the Prometheus instance
These steps are duplicated from ceph-monitoring.md? How can we consolidate?
I wanted to provide a brief summary of the minimum steps, so that the user can get everything from the same document without "jumping" between pages. I will clarify that with a preceding comment.
What if we refer them to the other document as a prerequisite? This still seems like extra duplication.

## Grafana

The Grafana image used is a customized image for Ceph. It is Grafana v6.7.4, and it includes the set of Grafana dashboards used in the Ceph dashboard, plus a couple of visualization plugins.
Only this exact version? Or do we need at least v6.7.4?
Yes, only v6.7.4. It is even more restrictive than that: you cannot use a standard v6.7.4 Grafana image; you need to use the Grafana image customized for Ceph.
Is that not clear in the text provided?
@@ -0,0 +1,266 @@
---
How about creating an integration test that will install all these monitoring yamls? Even if we don't test the full end to end gathering of ceph metrics, at least we can verify basic install?
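As a rough illustration of what a "basic install" check could cover, the sketch below runs a client-side dry run over every manifest in a directory. It is not the integration test proposed here; the function name `validate_manifests` is hypothetical, and it assumes each `.yaml` file is a complete manifest (patch files would need separate handling) and that any CRDs the manifests rely on are already installed.

```shell
# Validate every YAML manifest in a directory with a client-side dry run.
# $1: directory holding the monitoring manifests
# Assumption: each file is a full manifest, not a patch fragment.
validate_manifests() {
  dir="$1"
  failed=0
  for manifest in "$dir"/*.yaml; do
    [ -e "$manifest" ] || continue   # the directory had no .yaml files
    if ! kubectl apply --dry-run=client -f "$manifest" >/dev/null; then
      echo "invalid manifest: $manifest" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage: validate_manifests rook/cluster/examples/kubernetes/ceph/monitoring
```

A dry run only checks that the API server would accept the objects; it does not confirm that Ceph metrics are actually gathered.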
457bc15
to
99e538c
Compare
a445061
to
ba0c23b
Compare
76c0bde
to
0d1b13d
Compare
nit
cluster/examples/kubernetes/ceph/monitoring/alert-manager-patch.yaml
Outdated
cluster/examples/kubernetes/ceph/monitoring/prometheus-additional.yaml
Outdated
cluster/examples/kubernetes/ceph/monitoring/scrape-config-patch.yaml
Outdated
0d1b13d
to
a9aadb2
Compare
a9aadb2
to
ad76cf8
Compare
f2fb615
to
95dc2fd
Compare
95dc2fd
to
2083f16
Compare
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
By default, the Rook operator does not include any monitoring capability. The information and resources provided, together with the existing documentation for setting up Prometheus, make it possible to use the full monitoring features present in the Ceph dashboard.
fixes: rook#6519
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
15a49c5
to
f058dd0
Compare
@jmolmo (edited by @BlaineEXE: was user jmo and not jmolmo)
@@ -0,0 +1,48 @@
name: Integration test CephMonStackSuite
There are several CI failures in this test such as:
2022-01-04 12:57:16.223841 E | integrationTest: Get "https://10.1.0.23:31991/api/search?query=%": dial tcp 10.1.0.23:31991: connect: connection refused
ceph_monstack_test.go:295:
Error Trace: ceph_monstack_test.go:295
ceph_monstack_test.go:345
Error: Should be true
Test: TestCephMonStackSuite/TestMonitoringStack
ceph_monstack_test.go:299:
Error Trace: ceph_monstack_test.go:299
ceph_monstack_test.go:345
Error: Should be true
Test: TestCephMonStackSuite/TestMonitoringStack
@@ -0,0 +1,352 @@
---
title: Ceph Dashboard Monitoring
weight: 2500
This value will put it after the monitoring topic in the TOC.
weight: 2500
weight: 2510
Clone the Rook repository to have access to the different resource files used in the next steps:
```console
git clone --single-branch --branch {{ branchName }} https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph/monitoring
```
This folder changed in v1.8
cd rook/cluster/examples/kubernetes/ceph/monitoring
cd rook/deploy/examples/monitoring
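For readers working across both layouts, a small helper could pick whichever of the two directories exists in a checkout. This is only a sketch: the function name `rook_monitoring_dir` is hypothetical, and the two paths are the ones quoted in this thread.

```shell
# Print the monitoring examples directory for a Rook checkout.
# $1: path to the cloned repository (defaults to "rook")
rook_monitoring_dir() {
  repo="${1:-rook}"
  if [ -d "$repo/deploy/examples/monitoring" ]; then
    # Layout after the v1.8 folder change
    echo "$repo/deploy/examples/monitoring"
  elif [ -d "$repo/cluster/examples/kubernetes/ceph/monitoring" ]; then
    # Layout before the v1.8 folder change
    echo "$repo/cluster/examples/kubernetes/ceph/monitoring"
  else
    echo "monitoring examples not found under $repo" >&2
    return 1
  fi
}

# Usage: cd "$(rook_monitoring_dir rook)"
```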

## Prometheus Operator

The Prometheus operator allows you to set up a Prometheus instance in your Kubernetes cluster. Follow the instructions provided in [Prometheus Monitoring](https://github.com/rook/rook/blob/master/Documentation/ceph-monitoring.md).
Links to other rook docs don't need a full url or path
The Prometheus operator allows you to set up a Prometheus instance in your Kubernetes cluster. Follow the instructions provided in [Prometheus Monitoring](https://github.com/rook/rook/blob/master/Documentation/ceph-monitoring.md).
The Prometheus operator allows you to set up a Prometheus instance in your Kubernetes cluster. Follow the instructions provided in [Prometheus Monitoring](ceph-monitoring.md).
1. Enable the dashboard, the Prometheus manager module, and monitoring in the [Ceph cluster CR](#ceph-cluster).
2. Use the [Rook toolbox](ceph-toolbox.md) to set the URLs used by the different monitoring stack components and, if self-signed certificates are used, to disable SSL certificate verification (the _set-*-api-ssl-verify_ settings).
```console
ceph dashboard set-grafana-api-url <grafana_url>
```
Where can the three URLs for these commands be found? In the later sections of this doc? It seems the specific commands should not be in this section, since readers need the later sections to get the URLs.
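One way the URLs could be derived, sketched here under assumptions: the helper name `nodeport_url` is hypothetical, and it generalizes the Alertmanager command quoted earlier in this thread to any component exposed through a NodePort service. The `app=alertmanager` selector and `alertmanager` service name come from that command; the selectors and service names for Grafana and Prometheus are not shown in this thread and would need to be filled in from the manifests.

```shell
# Print http://<hostIP>:<nodePort> for a component exposed via a NodePort service.
# $1: pod label selector, $2: service name, $3: namespace (default: rook-ceph)
nodeport_url() {
  selector="$1"; service="$2"; namespace="${3:-rook-ceph}"
  host="$(kubectl get pod -l "$selector" -n "$namespace" -o jsonpath='{.items[0].status.hostIP}')"
  port="$(kubectl get service "$service" -n "$namespace" -o jsonpath='{.spec.ports[0].nodePort}')"
  echo "http://${host}:${port}"
}

# Usage (run where kubectl is available, then paste the result into the toolbox):
#   nodeport_url app=alertmanager alertmanager
# Inside the Rook toolbox:
#   ceph dashboard set-alertmanager-api-host <alertmanager_url>
#   ceph dashboard set-grafana-api-url <grafana_url>
```

The URL is computed outside the toolbox because the helper relies on kubectl, while the `ceph dashboard` commands run inside it.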

**2. Grafana storage (mandatory)**

Grafana needs storage for its database and other data. The provided "grafana.yaml" file uses a PVC that gets its storage from a PV based on a storage class named "ceph-storage". _Make sure that a PV using this storage class is available for the Grafana PVC_, or change the storage class used in the Grafana PVC to the right one for your cluster.
The PVs are expected to be dynamically provisioned with the csi driver. Why would there be a PV already available? The PVC will just request it to be created.
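With dynamic provisioning, the thing to verify is the storage class rather than a pre-existing PV. The sketch below looks up the provisioner behind a storage class and fails if the class does not exist. The function name `storage_class_provisioner` is hypothetical; "ceph-storage" is the class name quoted from grafana.yaml above.

```shell
# Print the provisioner behind a storage class, or fail if the class is missing.
# $1: storage class name (default: "ceph-storage", as used by the Grafana PVC)
storage_class_provisioner() {
  name="${1:-ceph-storage}"
  if ! provisioner="$(kubectl get storageclass "$name" -o jsonpath='{.provisioner}')"; then
    echo "storage class $name not found; adjust the PVC in grafana.yaml" >&2
    return 1
  fi
  echo "$provisioner"
}

# Usage: storage_class_provisioner ceph-storage
```

If the class exists and is backed by a CSI provisioner, the PVC should trigger creation of the volume on its own.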
>``` | ||

_**Note about High Availability:**
If you use a storage class that provides "high availability storage" (for example, a [Ceph shared filesystem](ceph-filesystem.md)), the Grafana installation can also be highly available. Just label more than one node with "app=grafana" and set the desired number of replicas in the grafana.yaml file._
Rook always recommends replica 3 for production pools. Seems like we don't need to mention this for high availability since Rook pools should already be configured as such.
app: grafana
type: kubernetes.io/tls
data:
  tls.crt: |
Let's not ship a secret, even as an example. Can we tell them to generate the cert instead?
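Generating the certificate could look roughly like the sketch below, which creates a self-signed certificate with openssl and then builds the TLS secret from the resulting files. The function names, the secret name `grafana-tls`, and the common name `grafana` are hypothetical placeholders, not values taken from this PR; the namespace defaults to `rook-ceph` as used elsewhere in this thread.

```shell
# Generate a self-signed certificate and key for Grafana.
# $1: output directory, $2: certificate common name (hypothetical default: grafana)
generate_grafana_cert() {
  dir="$1"; cn="${2:-grafana}"
  openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -subj "/CN=${cn}" \
    -keyout "$dir/tls.key" -out "$dir/tls.crt"
}

# Create a kubernetes.io/tls secret from the generated files.
# $1: directory with tls.crt and tls.key, $2: secret name (hypothetical), $3: namespace
create_grafana_tls_secret() {
  dir="$1"; secret="${2:-grafana-tls}"; namespace="${3:-rook-ceph}"
  kubectl create secret tls "$secret" -n "$namespace" \
    --cert="$dir/tls.crt" --key="$dir/tls.key"
}

# Usage:
#   certs="$(mktemp -d)"
#   generate_grafana_cert "$certs" && create_grafana_tls_secret "$certs"
```

This keeps key material out of the repository; each user ends up with their own certificate.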
f058dd0
to
b242d6c
Compare
This pull request has been automatically closed due to inactivity. Please re-open if these changes are still required.
By default, the Rook operator does not include any monitoring capability.
The information and resources provided, together with the existing
documentation for setting up Prometheus, make it possible to use the full monitoring features
present in the Ceph dashboard.
Resolves: #6519
Checklist:
- `make codegen` has been run to update object specifications, if necessary.
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>