# Prometheus Operator

[Prometheus Operator GitHub repo](https://github.com/prometheus-operator/prometheus-operator)

The Prometheus Operator creates, configures, and manages Prometheus clusters on Kubernetes. Its goal is to make running Prometheus on Kubernetes as easy as possible while preserving Kubernetes-native configuration options.

### Objective of this notebook

Install the Prometheus Operator in a Kubernetes cluster and deploy some pods as scrape targets, following the [getting started user guide](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md).

### Links

* [The operator's entry point (`main.go`), written in Go](https://github.com/prometheus-operator/prometheus-operator/blob/main/cmd/operator/main.go)


### Observations

* The [example](https://github.com/prometheus-operator/prometheus-operator/blob/main/bundle.yaml) with all manifests combined has more than 27,000 lines

### Installing the Prometheus Operator in an example configuration

In [3]:
! kubectl get ns

NAME                 STATUS   AGE
default              Active   4h21m
kube-node-lease      Active   4h21m
kube-public          Active   4h21m
kube-system          Active   4h21m
local-path-storage   Active   4h18m


In [8]:
! mkdir -p /tmp/po && cd /tmp/po && wget https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/f6d0b666955a233374667f402e9c8ff09aa4d5ec/bundle.yaml -O bundle.yaml
! kubectl apply --server-side=true -f /tmp/po/bundle.yaml

--2022-06-21 12:36:57--  https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/f6d0b666955a233374667f402e9c8ff09aa4d5ec/bundle.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... xxx.xxx.109.133, xxx.xxx.108.133, xxx.xxx.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|xxx.xxx.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1652484 (1.6M) [text/plain]
Saving to: ‘bundle.yaml’


2022-06-21 12:36:58 (9.35 MB/s) - ‘bundle.yaml’ saved [1652484/1652484]

customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.

In [9]:
! kubectl get ns

NAME                 STATUS   AGE
default              Active   4h26m
kube-node-lease      Active   4h26m
kube-public          Active   4h26m
kube-system          Active   4h26m
local-path-storage   Active   4h24m


In [29]:
! kubectl get all -o wide

NAME                                       READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
pod/prometheus-operator-567cd8b6f6-t45d4   1/1     Running   0          11m   10.233.96.2   node2   <none>           <none>

NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE     SELECTOR
service/kubernetes            ClusterIP   10.233.0.1   <none>        443/TCP    4h38m   <none>
service/prometheus-operator   ClusterIP   None         <none>        8080/TCP   11m     app.kubernetes.io/component=controller,app.kubernetes.io/name=prometheus-operator

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS            IMAGES                                                    SELECTOR
deployment.apps/prometheus-operator   1/1     1            1           11m   prometheus-operator   quay.io/prometheus-operator/prometheus-operator:v0.57.0   app.kubernetes.io/component=controller,app.kubernetes.io

In [28]:
# What is running in the Prometheus Operator container?
! kubectl exec pod/prometheus-operator-567cd8b6f6-t45d4 -- ps aux

PID   USER     TIME  COMMAND
    1 nobody    0:00 /bin/operator --kubelet-service=kube-system/kubelet --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.57.0
   64 nobody    0:00 ps aux


### Using the operator

The Prometheus Operator introduces additional resources in Kubernetes to declare the desired state of a Prometheus and Alertmanager cluster as well as the Prometheus configuration. The resources it introduces are:

* Prometheus
* Alertmanager
* ServiceMonitor

The Prometheus resource declaratively describes the desired state of a Prometheus deployment, while a ServiceMonitor describes the set of targets to be monitored by Prometheus.

...

The Prometheus resource includes a field called `serviceMonitorSelector`, which selects the ServiceMonitors to use by their labels.
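
To make the selection rule concrete, here is a minimal Python sketch of `matchLabels` semantics: a selector matches an object when every key/value pair in the selector is present in the object's labels. It only models `matchLabels` (not `matchExpressions`), and the ServiceMonitor names `billing` and `unlabeled` are made up for illustration.

```python
from typing import Dict, List, Mapping


def matches(match_labels: Mapping[str, str], labels: Mapping[str, str]) -> bool:
    """Return True if every key/value pair of the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select(match_labels: Mapping[str, str],
           objects: Mapping[str, Mapping[str, str]]) -> List[str]:
    """Return the names of all objects whose labels satisfy the selector."""
    return [name for name, labels in objects.items() if matches(match_labels, labels)]


# Labels of some ServiceMonitor objects; only "example-app" appears in this notebook,
# the other two are hypothetical.
service_monitors: Dict[str, Dict[str, str]] = {
    "example-app": {"team": "frontend"},
    "billing": {"team": "backend"},
    "unlabeled": {},
}

# Selector in the shape of spec.serviceMonitorSelector.matchLabels
selected = select({"team": "frontend"}, service_monitors)
print(selected)
```

The same rule applies one level down: a ServiceMonitor's own `selector.matchLabels` is compared against the labels of Service objects.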

In [58]:
"""
If RBAC authorization is activated, you must create RBAC rules for both Prometheus and Prometheus Operator. 
A ClusterRole and a ClusterRoleBinding for the Prometheus Operator were created in the example 
Prometheus Operator manifest above. The same must be done for the Prometheus Pods.
"""

content="""
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default

""".strip("\n")

with open('/tmp/po_auth.yaml', 'w') as f:
    f.write(content)

In [33]:
! kubectl apply --server-side=true -f /tmp/po_auth.yaml

serviceaccount/prometheus serverside-applied
clusterrole.rbac.authorization.k8s.io/prometheus serverside-applied
clusterrolebinding.rbac.authorization.k8s.io/prometheus serverside-applied
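
As a rough illustration of what the ClusterRole above grants, the following sketch evaluates its resource rules in plain Python. It is a deliberate simplification of Kubernetes RBAC (no wildcards, `resourceNames`, or `nonResourceURLs`), and the `allowed` helper is hypothetical, not part of any Kubernetes client.

```python
from typing import Dict, List

# Resource rules copied from the ClusterRole above (the nonResourceURLs rule is omitted).
RULES: List[Dict[str, List[str]]] = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "nodes/metrics", "services", "endpoints", "pods"],
        "verbs": ["get", "list", "watch"],
    },
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get"]},
    {
        "apiGroups": ["networking.k8s.io"],
        "resources": ["ingresses"],
        "verbs": ["get", "list", "watch"],
    },
]


def allowed(rules: List[Dict[str, List[str]]], api_group: str, resource: str, verb: str) -> bool:
    """Return True if at least one rule names the API group, the resource, and the verb."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


checks = [("", "endpoints", "watch"), ("", "configmaps", "list"), ("", "secrets", "get")]
for group, resource, verb in checks:
    print(f"{verb} {resource}: {allowed(RULES, group, resource, verb)}")
```

Against a live cluster, `kubectl auth can-i list endpoints --as=system:serviceaccount:default:prometheus` answers the same question authoritatively.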


In [34]:
content="""
# First, deploy three instances of a simple example application, which listens and exposes metrics on port 8080

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080

---
# This Service selects the example-app Pods and exposes their port 8080 under the name web.

kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080

---
# This ServiceMonitor selects Services labeled app: example-app (and their underlying Endpoints objects) and scrapes the port named web.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web

""".strip("\n")

with open('/tmp/po_app.yaml', 'w') as f:
    f.write(content)

In [35]:
! kubectl apply --server-side=true -f /tmp/po_app.yaml

deployment.apps/example-app serverside-applied
service/example-app serverside-applied
servicemonitor.monitoring.coreos.com/example-app serverside-applied


In [42]:
! kubectl get po -o wide

NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
example-app-56cc7f77dd-4d2qm           1/1     Running   0          98s   10.233.92.3   node3   <none>           <none>
example-app-56cc7f77dd-svsxv           1/1     Running   0          98s   10.233.96.3   node2   <none>           <none>
example-app-56cc7f77dd-wkkhv           1/1     Running   0          98s   10.233.92.2   node3   <none>           <none>
prometheus-operator-567cd8b6f6-t45d4   1/1     Running   0          62m   10.233.96.2   node2   <none>           <none>


In [41]:
! kubectl get svc

NAME                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
example-app           ClusterIP   10.233.0.216   <none>        8080/TCP   78s
kubernetes            ClusterIP   10.233.0.1     <none>        443/TCP    5h28m
prometheus-operator   ClusterIP   None           <none>        8080/TCP   62m


In [40]:
! kubectl get servicemonitors

NAME          AGE
example-app   57s


In [56]:
# Check whether the Service exposes metrics
! ssh -q node1 'curl -s 10.233.0.216:8080/metrics | head'

# HELP codelab_api_http_requests_in_progress The current number of API HTTP requests in progress.
# TYPE codelab_api_http_requests_in_progress gauge
codelab_api_http_requests_in_progress 1
# HELP codelab_api_request_duration_seconds A histogram of the API HTTP request durations in seconds.
# TYPE codelab_api_request_duration_seconds histogram
codelab_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0001"} 0
codelab_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.00015000000000000001"} 0
codelab_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.00022500000000000002"} 0
codelab_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0003375"} 0
codelab_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.00050625"} 0


In [71]:
# Note that at this point no Prometheus server instance is running yet
! kubectl get po

NAME                                   READY   STATUS    RESTARTS   AGE
example-app-56cc7f77dd-4d2qm           1/1     Running   0          33m
example-app-56cc7f77dd-svsxv           1/1     Running   0          33m
example-app-56cc7f77dd-wkkhv           1/1     Running   0          33m
prometheus-operator-567cd8b6f6-t45d4   1/1     Running   0          93m


In [73]:
content="""
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  labels:
    prometheus: prometheus
spec:
  replicas: 1
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager
      port: web
""".strip("\n")

with open('/tmp/po_pm.yaml', 'w') as f:
    f.write(content)

In [74]:
! kubectl apply --server-side=true -f /tmp/po_pm.yaml

prometheus.monitoring.coreos.com/prometheus serverside-applied


In [90]:
# Now there is a Prometheus pod (prometheus-prometheus-0)
! kubectl get po -o wide

NAME                                   READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
example-app-56cc7f77dd-4d2qm           1/1     Running   0          42m     10.233.92.3   node3   <none>           <none>
example-app-56cc7f77dd-svsxv           1/1     Running   0          42m     10.233.96.3   node2   <none>           <none>
example-app-56cc7f77dd-wkkhv           1/1     Running   0          42m     10.233.92.2   node3   <none>           <none>
prometheus-operator-567cd8b6f6-t45d4   1/1     Running   0          103m    10.233.96.2   node2   <none>           <none>
prometheus-prometheus-0                2/2     Running   0          6m50s   10.233.96.4   node2   <none>           <none>


In [94]:
# Get the targets discovered by the Prometheus server
! ssh -q node1 "apt install jq -y"
! ssh -q node1 "curl -s  http://10.233.96.4:9090/api/v1/targets | jq"



Reading package lists...
Building dependency tree...
Reading state information...
jq is already the newest version (1.6-1ubuntu0.20.04.1).
0 upgraded, 0 newly installed, 0 to remove and 24 not upgraded.
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "discoveredLabels": {
          "__address__": "10.233.92.2:8080",
          "__meta_kubernetes_endpoint_address_target_kind": "Pod",
          "__meta_kubernetes_endpoint_address_target_name": "example-app-56cc7f77dd-wkkhv",
          "__meta_kubernetes_endpoint_node_name": "node3",
          "__meta_kubernetes_endpoint_port_name": "web",
          "__meta_kubernetes_endpoint_port_protocol": "TCP",
          "__meta_kubernetes_endpoint_ready": "true",
          "__meta_kubernetes_endpoints_label_app": "example-app",
          "__meta_kubernetes_endpoints_labelpresent_app": "true",
          "__meta_kubernetes_endpoints_name": "example-app",
          "__meta_kubernetes_namespace": "default",
          "__meta

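The JSON returned by `/api/v1/targets` is verbose, so a short per-target summary can be easier to read. The sketch below condenses `activeTargets` into (scrape pool, scrape URL, health) tuples. The sample payload is a hand-written stand-in shaped like the API response, not output captured from this cluster, and `summarize_targets` is a hypothetical helper name.

```python
import json
from typing import List, Tuple


def summarize_targets(payload: dict) -> List[Tuple[str, str, str]]:
    """Reduce a /api/v1/targets response to sorted (scrapePool, scrapeUrl, health) tuples."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return sorted(
        (t.get("scrapePool", ""), t.get("scrapeUrl", ""), t.get("health", ""))
        for t in targets
    )


# Illustrative stand-in for the API response (values are assumptions, not captured output).
SAMPLE = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "scrapePool": "serviceMonitor/default/example-app/0",
        "scrapeUrl": "http://10.233.96.3:8080/metrics",
        "health": "up"
      },
      {
        "scrapePool": "serviceMonitor/default/example-app/0",
        "scrapeUrl": "http://10.233.92.2:8080/metrics",
        "health": "up"
      }
    ]
  }
}
""")

summary = summarize_targets(SAMPLE)
for pool, url, health in summary:
    print(f"{health:8} {url}  ({pool})")
```

To run this on live data, the `curl` output can be saved to a file and loaded with `json.load` instead of using the sample string.
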
In [97]:
# Get the metrics the Prometheus server exposes about itself
# Note that the output is filtered to the metrics originating from the ServiceMonitor
! ssh -q node1 "curl -s  http://10.233.96.4:9090/metrics | grep serviceMonitor/default"

net_conntrack_dialer_conn_attempted_total{dialer_name="serviceMonitor/default/example-app/0"} 3
net_conntrack_dialer_conn_closed_total{dialer_name="serviceMonitor/default/example-app/0"} 0
net_conntrack_dialer_conn_established_total{dialer_name="serviceMonitor/default/example-app/0"} 3
net_conntrack_dialer_conn_failed_total{dialer_name="serviceMonitor/default/example-app/0",reason="refused"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="serviceMonitor/default/example-app/0",reason="resolution"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="serviceMonitor/default/example-app/0",reason="timeout"} 0
net_conntrack_dialer_conn_failed_total{dialer_name="serviceMonitor/default/example-app/0",reason="unknown"} 0
prometheus_sd_discovered_targets{config="serviceMonitor/default/example-app/0",name="scrape"} 7
prometheus_target_metadata_cache_bytes{scrape_job="serviceMonitor/default/example-app/0"} 5565
prometheus_target_metadata_cache_entries{scrape_job="serviceMonitor/default/exa
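
Grepping is fine for a quick look; for anything more structured, the text exposition format is simple enough to parse by hand. The following is a minimal sketch, not a full parser: it handles `name{label="value",...} value` lines, skips comments, and ignores escaped quotes in label values as well as optional timestamps. The sample lines are copied from the output above; `parse_metrics` is a hypothetical helper name.

```python
import re
from typing import Dict, Iterator, Tuple

# metric_name, optional {labels}, then a single value token
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (name, labels, value) for each sample line; skip comments and unparsable lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


TEXT = '''
net_conntrack_dialer_conn_attempted_total{dialer_name="serviceMonitor/default/example-app/0"} 3
net_conntrack_dialer_conn_established_total{dialer_name="serviceMonitor/default/example-app/0"} 3
prometheus_sd_discovered_targets{config="serviceMonitor/default/example-app/0",name="scrape"} 7
prometheus_target_metadata_cache_bytes{scrape_job="serviceMonitor/default/example-app/0"} 5565
'''

samples = list(parse_metrics(TEXT))
for name, labels, value in samples:
    print(name, labels, value)
```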