Strimzi Monitoring Refresh #877

Merged 5 commits on Sep 21, 2018
132 changes: 87 additions & 45 deletions documentation/book/appendix_metrics.adoc
[id='metrics-{context}']
== Metrics

This section describes how to monitor Strimzi Kafka and ZooKeeper clusters using Grafana dashboards.
To run the example dashboards you must configure a Prometheus server and add the appropriate https://github.com/prometheus/jmx_exporter[Prometheus JMX Exporter] rules to your Kafka cluster resource.

WARNING: The resources referenced in this section serve as a good starting point for setting up monitoring, but they are provided as an example only.
If you require further support on configuring and running Prometheus or Grafana in production, reach out to their respective communities.

ifdef::InstallationAppendix[]
When adding Prometheus and Grafana servers to an Apache Kafka deployment using `minikube` or `minishift`, the memory available to the virtual machine should be increased (to 4 GB of RAM, for example, instead of the default 2 GB). Information on how to increase the default amount of memory can be found in the following section <<installing_kubernetes_and_openshift_cluster>>.
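
For example, the memory can be increased when starting a fresh virtual machine. The exact flags below are illustrative and may differ between `minikube` and `minishift` versions:

[source,shell]
minikube start --memory=4096
minishift start --memory=4GB
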
endif::InstallationAppendix[]

=== Kafka Metrics Configuration

Strimzi uses the https://github.com/prometheus/jmx_exporter[Prometheus JMX Exporter] to export JMX metrics from Kafka and ZooKeeper to a Prometheus HTTP metrics endpoint, which is scraped by the Prometheus server.
The Grafana dashboards rely on the Kafka and ZooKeeper Prometheus JMX Exporter relabeling rules defined in the example `Kafka` resource configuration in https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/kafka/kafka-metrics.yaml[`kafka-metrics.yaml`].
To use the provided Grafana dashboards, copy this configuration into your own `Kafka` resource definition, or deploy the example as-is.
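
The metrics configuration lives under the `metrics` property of the `kafka` and `zookeeper` sections of the `Kafka` resource. The following is a minimal sketch of its shape only: the single rule shown is illustrative, and the complete, authoritative set of relabeling rules used by the dashboards is in `kafka-metrics.yaml`.

[source,yaml]
----
apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ... listeners, storage, and other Kafka configuration ...
    metrics:
      lowercaseOutputName: true
      rules:
      # Illustrative rule only: exports kafka.server MBean values under Prometheus-style names
      - pattern: "kafka.server<type=(.+), name=(.+)><>Value"
        name: kafka_server_$1_$2
  zookeeper:
    # ... replicas, storage, and other ZooKeeper configuration ...
    metrics:
      lowercaseOutputName: true
----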

==== Deploying on {OpenShiftName}

To deploy the example Kafka cluster, run the following command:

[source,shell,subs=attributes+]
oc apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/kafka/kafka-metrics.yaml

ifdef::Kubernetes[]
==== Deploying on {KubernetesName}

To deploy the example Kafka cluster, run the following command:

[source,shell,subs=attributes+]
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/kafka/kafka-metrics.yaml

endif::Kubernetes[]

=== Prometheus

The provided Prometheus https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/prometheus/kubernetes.yaml[`kubernetes.yaml`] file describes all the resources required by Prometheus to effectively monitor a Strimzi Kafka and ZooKeeper cluster.
These resources lack important production configuration to run a healthy and highly available Prometheus server.
They should only be used to demonstrate this Grafana dashboard example.

The following resources are defined:

* A `ClusterRole` that grants permissions to read the health and metrics endpoints exposed by the Kubernetes system, including cAdvisor and the kubelet, for container metrics. The Prometheus server configuration uses the Kubernetes service discovery feature to find the pods in the cluster from which it scrapes metrics. For this feature to work, the service account used to run the Prometheus pod must be able to query the API server for the pod list.
* A `ServiceAccount` for the Prometheus pods to run under.
* A `ClusterRoleBinding` which binds the aforementioned `ClusterRole` to the `ServiceAccount`.
* A `Deployment` to manage the actual Prometheus server pod.
* A `ConfigMap` to manage the configuration of the Prometheus server (a simplified sketch of this scrape configuration follows the list).
* A `Service` to provide an easy-to-reference hostname for other services (such as Grafana) to connect to the Prometheus server.
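
The heart of the `ConfigMap` is the Prometheus scrape configuration. The following is a much simplified sketch of what such a configuration looks like: the job name, port name, and relabeling are illustrative only, and the configuration shipped in `kubernetes.yaml` contains additional jobs and relabeling rules.

[source,yaml]
----
scrape_configs:
- job_name: kafka-pods            # illustrative job name
  kubernetes_sd_configs:
  - role: pod                     # discover every pod through the Kubernetes API
  relabel_configs:
  # Keep only pods exposing a container port named "metrics" (an assumption for this sketch)
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep
    regex: metrics
----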

==== Deploying on {OpenShiftName}

To deploy all of these resources, run the following commands. Note that this file creates a `ClusterRoleBinding` in the `myproject` namespace. If you are not using this namespace, download the resource file locally, update it, and apply your local copy.

[source,shell,subs=attributes+]
oc login -u system:admin
oc apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/prometheus/kubernetes.yaml

ifdef::Kubernetes[]
==== Deploying on {KubernetesName}

To deploy all of these resources, run the following command. Note that this file creates a `ClusterRoleBinding` in the `myproject` namespace. If you are not using this namespace, download the resource file locally, update it, and apply your local copy.

[source,shell,subs=attributes+]
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/prometheus/kubernetes.yaml

endif::Kubernetes[]
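
Once the resources are created, you can check that Prometheus is running and discovering the Kafka and ZooKeeper pods. The commands below are a sketch: the `Service` and label names are assumptions based on the example `kubernetes.yaml` and may need to be adjusted to match your deployment (use `oc` in place of `kubectl` on {OpenShiftName}).

[source,shell]
kubectl get pods -l app=prometheus          # wait for the Prometheus pod to become Ready (label is an assumption)
kubectl port-forward svc/prometheus 9090:9090
# then open http://localhost:9090/targets in a browser to confirm the scrape targets are up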

=== Grafana

A Grafana server is necessary to get a visualisation of the Prometheus metrics. The source for the Grafana docker image used can be found in the `./metrics/examples/grafana/grafana-openshift` directory.

==== Deploying on {OpenShiftName}

To deploy Grafana, run the following command:

[source,shell,subs=attributes+]
oc apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/grafana/kubernetes.yaml

ifdef::Kubernetes[]
==== Deploying on {KubernetesName}

To deploy Grafana, run the following command:

[source,shell,subs=attributes+]
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/grafana/kubernetes.yaml

endif::Kubernetes[]

=== Grafana Dashboards

As an example, and in order to visualize the exported metrics in Grafana, two sample dashboards are provided: https://github.com/strimzi/strimzi-kafka-operator/blob/{ProductVersion}/metrics/examples/grafana/strimzi-kafka.json[`strimzi-kafka.json`] and https://github.com/strimzi/strimzi-kafka-operator/blob/{ProductVersion}/metrics/examples/grafana/strimzi-zookeeper.json[`strimzi-zookeeper.json`].
These dashboards represent a good starting point for key metrics to monitor Kafka and ZooKeeper clusters, but depending on your infrastructure you may need to update or add to them.
Please note that they are not representative of all the metrics available.
No alerting rules are defined.
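
If you want to experiment with alerting, a rule can be added to the Prometheus configuration. The sketch below is purely hypothetical: the metric name depends entirely on the JMX Exporter relabeling rules you configured, and the threshold and duration are illustrative.

[source,yaml]
----
groups:
- name: kafka-example-alerts
  rules:
  - alert: UnderReplicatedPartitions
    # Metric name is an assumption: it must match whatever your JMX Exporter rules produce
    expr: kafka_server_replicamanager_underreplicatedpartitions > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Kafka has under-replicated partitions"
----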

The Prometheus data source, and the above dashboards, can be set up in Grafana by following these steps.

NOTE: To access the dashboards, you can use the `port-forward` command to forward traffic from the Grafana pod to the host. For example, you can access the Grafana UI by running `oc port-forward grafana-1-fbl7s 3000:3000` (or using `kubectl` instead of `oc`) and then pointing a browser to `http://localhost:3000`.

. Access the Grafana UI using the `admin/admin` credentials. On the following view you can choose to skip resetting the admin password, or set it to a password of your choice.
+
image::grafana_login.png[Grafana login]

image::grafana_home.png[Grafana home]
+
image::grafana_prometheus_data_source.png[Add Prometheus data source]

. From the top left menu, click on "Dashboards" and then "Import" to open the "Import Dashboard" window, where the provided https://github.com/strimzi/strimzi-kafka-operator/blob/{ProductVersion}/metrics/examples/grafana/strimzi-kafka.json[`strimzi-kafka.json`] and https://github.com/strimzi/strimzi-kafka-operator/blob/{ProductVersion}/metrics/examples/grafana/strimzi-zookeeper.json[`strimzi-zookeeper.json`] files can be imported or their contents pasted.
+
image::grafana_import_dashboard.png[Add Grafana dashboard]

. After importing the dashboards, the Grafana dashboard homepage will list two dashboards for you to choose from. Once your Prometheus server has been collecting metrics for a Strimzi cluster for some time, you should see populated dashboards such as the examples shown below.

==== Kafka Dashboard

image::grafana_kafka_dashboard.png[Kafka dashboard]

==== ZooKeeper Dashboard

image::grafana_zookeeper_dashboard.png[ZooKeeper dashboard]

==== Metrics References

To learn more about the metrics available for monitoring Kafka, ZooKeeper, and Kubernetes in general, review the following resources.

* http://kafka.apache.org/documentation/#monitoring[Apache Kafka Monitoring] - A list of JMX metrics exposed by Apache Kafka.
It includes a description, the JMX MBean name, and in some cases a suggestion of what a normal returned value looks like.
* https://zookeeper.apache.org/doc/current/zookeeperJMX.html[ZooKeeper JMX] - A list of JMX metrics exposed by Apache ZooKeeper.
* https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/[Prometheus - Monitoring Docker Container Metrics using cAdvisor] - cAdvisor (short for container Advisor) analyzes and exposes resource usage (such as CPU, Memory, and Disk) and performance data from running containers within pods on Kubernetes.
cAdvisor is bundled along with the kubelet binary so that it is automatically available within Kubernetes clusters.
This reference describes how to monitor cAdvisor metrics in various ways using Prometheus.
** https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md[cAdvisor Metrics] - A full list of cAdvisor metrics as exposed through Prometheus.
Binary file modified documentation/images/grafana_import_dashboard.png
Binary file modified documentation/images/grafana_kafka_dashboard.png
Binary file modified documentation/images/grafana_login.png
Binary file modified documentation/images/grafana_prometheus_data_source.png
19 changes: 19 additions & 0 deletions metrics/examples/grafana/grafana-openshift/Dockerfile
FROM centos:7
MAINTAINER Erik Jacobs <erikmjacobs@gmail.com>

USER root
EXPOSE 3000

ENV GRAFANA_VERSION="5.2.4"

ADD root /
RUN yum -y install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-"$GRAFANA_VERSION"-1.x86_64.rpm \
&& yum clean all
COPY run.sh /usr/share/grafana/
RUN /usr/bin/fix-permissions /usr/share/grafana \
&& /usr/bin/fix-permissions /etc/grafana \
&& /usr/bin/fix-permissions /var/lib/grafana \
&& /usr/bin/fix-permissions /var/log/grafana

WORKDIR /usr/share/grafana
ENTRYPOINT ["./run.sh"]
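
To build the image locally you can run the following command from the `metrics/examples/grafana/grafana-openshift` directory, assuming the accompanying `root` directory and `run.sh` script referenced by the Dockerfile are present alongside it (the image tag shown is illustrative):

[source,shell]
docker build -t grafana-openshift:latest .
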
1 change: 1 addition & 0 deletions metrics/examples/grafana/grafana-openshift/README
Sourced from https://github.com/OpenShiftDemos/grafana-openshift
# Basic Kubernetes service & deployment
apiVersion: v1
kind: Service
metadata:
name: grafana-openshift-server
spec:
ports:
- port: 3000
protocol: TCP
targetPort: 3000
nodePort: 30130
selector:
app: grafana-openshift
component: server
type: NodePort
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: grafana-openshift-server
labels:
app: grafana-openshift
component: server
spec:
replicas: 1
template:
metadata:
labels:
app: grafana-openshift
component: server
spec:
containers:
- image: strimzilabs/grafana-openshift:latest
imagePullPolicy: IfNotPresent
name: grafana-openshift-server
resources:
requests:
cpu: 100m
memory: 250Mi
env:
- name: GF_INSTALL_PLUGINS
value: hawkular-datasource
- name: DATAD
value: /usr/share/grafana/data
- name: PLGND
value: /usr/share/grafana/data/plugins
ports:
- containerPort: 3000
readinessProbe:
httpGet:
path: /login
port: 3000
volumeMounts:
- name: grafana-data
mountPath: /usr/share/grafana/data
volumes:
- name: grafana-data
emptyDir: {}
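
The `Service` above is of type `NodePort`, so once the manifest is applied Grafana is reachable on port `30130` of any cluster node, or through a port-forward. The commands below are a sketch using the resource names from the manifest above:

[source,shell]
kubectl port-forward svc/grafana-openshift-server 3000:3000
# Grafana is then available at http://localhost:3000 (or at port 30130 on any node via the NodePort)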