Strimzi Monitoring Refresh #877

Merged 5 commits on Sep 21, 2018
132 changes: 87 additions & 45 deletions documentation/book/appendix_metrics.adoc
[id='metrics-{context}']
== Metrics

This section describes how to monitor Strimzi Kafka and ZooKeeper clusters using Grafana dashboards.
To run the example dashboards you must configure a Prometheus server and add the appropriate https://github.com/prometheus/jmx_exporter[Prometheus JMX Exporter] rules to your Kafka cluster resource.

WARNING: The resources referenced in this section serve as a good starting point for setting up monitoring, but they are provided as an example only.
If you require further support on configuring and running Prometheus or Grafana in production, reach out to their respective communities.

ifdef::InstallationAppendix[]
When adding Prometheus and Grafana servers to an Apache Kafka deployment using `minikube` or `minishift`, the memory available to the virtual machine should be increased (to 4 GB of RAM, for example, instead of the default 2 GB). Information on how to increase the default amount of memory can be found in the following section <<installing_kubernetes_and_openshift_cluster>>.
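
For example, the memory can be increased when starting a fresh virtual machine. The exact flags below are illustrative and may differ between `minikube` and `minishift` versions:

[source,shell]
minikube start --memory=4096
minishift start --memory=4GB
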
endif::InstallationAppendix[]

=== Kafka Metrics Configuration

Strimzi uses the https://github.com/prometheus/jmx_exporter[Prometheus JMX Exporter] to export JMX metrics from Kafka and ZooKeeper to a Prometheus HTTP metrics endpoint, which is scraped by the Prometheus server.
The Grafana dashboards rely on the Kafka and ZooKeeper Prometheus JMX Exporter relabeling rules defined in the example `Kafka` resource configuration in https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/kafka/kafka-metrics.yaml[`kafka-metrics.yaml`].
To use the provided Grafana dashboards, copy this configuration into your own `Kafka` resource definition, or deploy the example as-is.
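
The metrics configuration lives under the `metrics` property of the `kafka` and `zookeeper` sections of the `Kafka` resource. The following is a minimal sketch of its shape only: the single rule shown is illustrative, and the complete, authoritative set of relabeling rules used by the dashboards is in `kafka-metrics.yaml`.

[source,yaml]
----
apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ... listeners, storage, and other Kafka configuration ...
    metrics:
      lowercaseOutputName: true
      rules:
      # Illustrative rule only: exports kafka.server MBean values under Prometheus-style names
      - pattern: "kafka.server<type=(.+), name=(.+)><>Value"
        name: kafka_server_$1_$2
  zookeeper:
    # ... replicas, storage, and other ZooKeeper configuration ...
    metrics:
      lowercaseOutputName: true
----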

==== Deploying on {OpenShiftName}

To deploy the example Kafka cluster, run the following command:

[source,shell,subs=attributes+]
oc apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/kafka/kafka-metrics.yaml

ifdef::Kubernetes[]
==== Deploying on {KubernetesName}

To deploy the example Kafka cluster, run the following command:

[source,shell,subs=attributes+]
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/kafka/kafka-metrics.yaml

endif::Kubernetes[]

=== Prometheus

The provided Prometheus https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/prometheus/kubernetes.yaml[`kubernetes.yaml`] file describes all the resources required by Prometheus to effectively monitor a Strimzi Kafka and ZooKeeper cluster.
These resources lack important production configuration to run a healthy and highly available Prometheus server.
They should only be used to demonstrate this Grafana dashboard example.

The following resources are defined:

* A `ClusterRole` that grants permissions to read the health and metrics endpoints exposed by the Kubernetes system, including cAdvisor and the kubelet, for container metrics. The Prometheus server configuration uses the Kubernetes service discovery feature to find the pods in the cluster from which it scrapes metrics. For this feature to work, the service account used to run the Prometheus pod must be able to query the API server for the pod list.
* A `ServiceAccount` for the Prometheus pods to run under.
* A `ClusterRoleBinding` which binds the aforementioned `ClusterRole` to the `ServiceAccount`.
* A `Deployment` to manage the actual Prometheus server pod.
* A `ConfigMap` to manage the configuration of the Prometheus server (a simplified sketch of this scrape configuration follows the list).
* A `Service` to provide an easy-to-reference hostname for other services (such as Grafana) to connect to the Prometheus server.
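
The heart of the `ConfigMap` is the Prometheus scrape configuration. The following is a much simplified sketch of what such a configuration looks like: the job name, port name, and relabeling are illustrative only, and the configuration shipped in `kubernetes.yaml` contains additional jobs and relabeling rules.

[source,yaml]
----
scrape_configs:
- job_name: kafka-pods            # illustrative job name
  kubernetes_sd_configs:
  - role: pod                     # discover every pod through the Kubernetes API
  relabel_configs:
  # Keep only pods exposing a container port named "metrics" (an assumption for this sketch)
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    action: keep
    regex: metrics
----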

==== Deploying on {OpenShiftName}

To deploy all of these resources, run the following commands. Note that this file creates a `ClusterRoleBinding` in the `myproject` namespace. If you are not using this namespace, download the resource file locally, update it, and apply your local copy.

[source,shell,subs=attributes+]
oc login -u system:admin
oc apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/prometheus/kubernetes.yaml

ifdef::Kubernetes[]
==== Deploying on {KubernetesName}

To deploy all of these resources, run the following command. Note that this file creates a `ClusterRoleBinding` in the `myproject` namespace. If you are not using this namespace, download the resource file locally, update it, and apply your local copy.

[source,shell,subs=attributes+]
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/prometheus/kubernetes.yaml

endif::Kubernetes[]
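
Once the resources are created, you can check that Prometheus is running and discovering the Kafka and ZooKeeper pods. The commands below are a sketch: the `Service` and label names are assumptions based on the example `kubernetes.yaml` and may need to be adjusted to match your deployment (use `oc` in place of `kubectl` on {OpenShiftName}).

[source,shell]
kubectl get pods -l app=prometheus          # wait for the Prometheus pod to become Ready (label is an assumption)
kubectl port-forward svc/prometheus 9090:9090
# then open http://localhost:9090/targets in a browser to confirm the scrape targets are up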

=== Grafana

A Grafana server is necessary to get a visualisation of the Prometheus metrics. The source for the Grafana docker image used can be found in the `./metrics/examples/grafana/grafana-openshift` directory.

==== Deploying on {OpenShiftName}

To deploy Grafana, run the following command:

[source,shell,subs=attributes+]
oc apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/grafana/kubernetes.yaml

ifdef::Kubernetes[]
==== Deploying on {KubernetesName}

To deploy Grafana, run the following command:

[source,shell,subs=attributes+]
kubectl apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/{ProductVersion}/metrics/examples/grafana/kubernetes.yaml

endif::Kubernetes[]

=== Grafana Dashboards

As an example, and in order to visualize the exported metrics in Grafana, two sample dashboards are provided: https://github.com/strimzi/strimzi-kafka-operator/blob/{ProductVersion}/metrics/examples/grafana/strimzi-kafka.json[`strimzi-kafka.json`] and https://github.com/strimzi/strimzi-kafka-operator/blob/{ProductVersion}/metrics/examples/grafana/strimzi-zookeeper.json[`strimzi-zookeeper.json`].
These dashboards represent a good starting point for key metrics to monitor Kafka and ZooKeeper clusters, but depending on your infrastructure you may need to update or add to them.
Please note that they are not representative of all the metrics available.
No alerting rules are defined.
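
If you want to experiment with alerting, a rule can be added to the Prometheus configuration. The sketch below is purely hypothetical: the metric name depends entirely on the JMX Exporter relabeling rules you configured, and the threshold and duration are illustrative.

[source,yaml]
----
groups:
- name: kafka-example-alerts
  rules:
  - alert: UnderReplicatedPartitions
    # Metric name is an assumption: it must match whatever your JMX Exporter rules produce
    expr: kafka_server_replicamanager_underreplicatedpartitions > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Kafka has under-replicated partitions"
----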

The Prometheus data source, and the above dashboards, can be set up in Grafana by following these steps.

NOTE: To access the dashboards, you can use the `port-forward` command to forward traffic from the Grafana pod to the host. For example, you can access the Grafana UI by running `oc port-forward grafana-1-fbl7s 3000:3000` (or using `kubectl` instead of `oc`) and then pointing a browser to `http://localhost:3000`.

. Access the Grafana UI using the `admin/admin` credentials. On the following view you can choose to skip resetting the admin password, or set it to a password of your choice.
+
image::grafana_login.png[Grafana login]

image::grafana_home.png[Grafana home]
+
image::grafana_prometheus_data_source.png[Add Prometheus data source]

. From the top left menu, click on "Dashboards" and then "Import" to open the "Import Dashboard" window, where the provided https://github.com/strimzi/strimzi-kafka-operator/blob/{ProductVersion}/metrics/examples/grafana/strimzi-kafka.json[`strimzi-kafka.json`] and https://github.com/strimzi/strimzi-kafka-operator/blob/{ProductVersion}/metrics/examples/grafana/strimzi-zookeeper.json[`strimzi-zookeeper.json`] files can be imported or their contents pasted.
+
image::grafana_import_dashboard.png[Add Grafana dashboard]

. After importing the dashboards, the Grafana dashboard homepage will list two dashboards for you to choose from. Once your Prometheus server has been collecting metrics for a Strimzi cluster for some time, you should see populated dashboards such as the examples shown below.

==== Kafka Dashboard

image::grafana_kafka_dashboard.png[Kafka dashboard]

==== ZooKeeper Dashboard

image::grafana_zookeeper_dashboard.png[ZooKeeper dashboard]

==== Metrics References

To learn more about the metrics available for monitoring Kafka, ZooKeeper, and Kubernetes in general, review the following resources.

* http://kafka.apache.org/documentation/#monitoring[Apache Kafka Monitoring] - A list of JMX metrics exposed by Apache Kafka.
It includes a description, the JMX MBean name, and in some cases a suggestion of what a normal returned value looks like.
* https://zookeeper.apache.org/doc/current/zookeeperJMX.html[ZooKeeper JMX] - A list of JMX metrics exposed by Apache ZooKeeper.
* https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/[Prometheus - Monitoring Docker Container Metrics using cAdvisor] - cAdvisor (short for container Advisor) analyzes and exposes resource usage (such as CPU, Memory, and Disk) and performance data from running containers within pods on Kubernetes.
cAdvisor is bundled along with the kubelet binary so that it is automatically available within Kubernetes clusters.
This reference describes how to monitor cAdvisor metrics in various ways using Prometheus.
** https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md[cAdvisor Metrics] - A full list of cAdvisor metrics as exposed through Prometheus.
Binary file modified documentation/images/grafana_import_dashboard.png
Binary file modified documentation/images/grafana_kafka_dashboard.png
Binary file modified documentation/images/grafana_login.png
Binary file modified documentation/images/grafana_prometheus_data_source.png
19 changes: 19 additions & 0 deletions metrics/examples/grafana/grafana-openshift/Dockerfile
FROM centos:7
MAINTAINER Erik Jacobs <erikmjacobs@gmail.com>

USER root
EXPOSE 3000

ENV GRAFANA_VERSION="5.2.4"

ADD root /
RUN yum -y install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-"$GRAFANA_VERSION"-1.x86_64.rpm \
&& yum clean all
COPY run.sh /usr/share/grafana/
RUN /usr/bin/fix-permissions /usr/share/grafana \
&& /usr/bin/fix-permissions /etc/grafana \
&& /usr/bin/fix-permissions /var/lib/grafana \
&& /usr/bin/fix-permissions /var/log/grafana

WORKDIR /usr/share/grafana
ENTRYPOINT ["./run.sh"]
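
To build the image locally you can run the following command from the `metrics/examples/grafana/grafana-openshift` directory, assuming the accompanying `root` directory and `run.sh` script referenced by the Dockerfile are present alongside it (the image tag shown is illustrative):

[source,shell]
docker build -t grafana-openshift:latest .
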
1 change: 1 addition & 0 deletions metrics/examples/grafana/grafana-openshift/README
Sourced from https://github.com/OpenShiftDemos/grafana-openshift
# Basic Kubernetes service & deployment
apiVersion: v1
kind: Service
metadata:
name: grafana-openshift-server
spec:
ports:
- port: 3000
protocol: TCP
targetPort: 3000
nodePort: 30130
selector:
app: grafana-openshift
component: server
type: NodePort
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: grafana-openshift-server
labels:
app: grafana-openshift
component: server
spec:
replicas: 1
template:
metadata:
labels:
app: grafana-openshift
component: server
spec:
containers:
- image: strimzilabs/grafana-openshift:latest
imagePullPolicy: IfNotPresent
name: grafana-openshift-server
resources:
requests:
cpu: 100m
memory: 250Mi
env:
- name: GF_INSTALL_PLUGINS
value: hawkular-datasource
- name: DATAD
value: /usr/share/grafana/data
- name: PLGND
value: /usr/share/grafana/data/plugins
ports:
- containerPort: 3000
readinessProbe:
httpGet:
path: /login
port: 3000
volumeMounts:
- name: grafana-data
mountPath: /usr/share/grafana/data
volumes:
- name: grafana-data
emptyDir: {}
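
The `Service` above is of type `NodePort`, so once the manifest is applied Grafana is reachable on port `30130` of any cluster node, or through a port-forward. The commands below are a sketch using the resource names from the manifest above:

[source,shell]
kubectl port-forward svc/grafana-openshift-server 3000:3000
# Grafana is then available at http://localhost:3000 (or at port 30130 on any node via the NodePort)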