67 changes: 44 additions & 23 deletions docs/draft/howto/consuming-metrics.md
@@ -6,7 +6,7 @@ The following procedure is provided as an example for testing purposes. Do not d

In OLM v1, you can use the provided metrics with tools such as the [Prometheus Operator][prometheus-operator]. By default, Operator Controller and catalogd export metrics to the `/metrics` endpoint of each service.

You must grant the necessary permissions to access the metrics by using [role-based access control (RBAC) policies][rbac-k8s-docs].
You must grant the necessary permissions to access the metrics by using [role-based access control (RBAC) policies][rbac-k8s-docs]. You will also need to create a `NetworkPolicy` to allow egress traffic from your scraper pod, as the OLM namespace by default allows only `catalogd` and `operator-controller` to send and receive traffic.

Review comment by @camilamacedo86 (Contributor), Jun 12, 2025:

I do not think this is required.
The NetworkPolicies (NPs) that we already have should allow us to scrape the metrics.
Also, note that the e2e test calls the metrics endpoint (https://github.com/operator-framework/operator-controller/blob/main/test/e2e/metrics_test.go) and we do not create any new NP there.

If we broke this, we would no longer be able to get the metrics downstream, and that is why we have a test to ensure it.

Reply from the PR author (Contributor):

I just tried the guide again without the NetworkPolicy and it does not work. After I apply the NetworkPolicy, it works again.

The reason that the e2e test works is that it puts the curl pod into a random namespace, outside of olmv1-system. If you were to create the pod inside olmv1-system, the tests would fail.

Reply from the PR author (Contributor):

Oh, and to your point on downstream metrics, the reason that also works fine is because the metrics scraper pod does not live in the same namespace as catalogd or operator-controller.

Reply (Contributor):

I see. 👍
Thank you for the clarification

Because the metrics are exposed over HTTPS by default, you need valid certificates to use the metrics with services such as Prometheus.
The following sections cover enabling metrics, validating access, and a reference `ServiceMonitor`
that illustrates how you might integrate the metrics with the [Prometheus Operator][prometheus-operator] or other third-party solutions.
@@ -23,6 +23,25 @@ kubectl create clusterrolebinding operator-controller-metrics-binding \
--serviceaccount=olmv1-system:operator-controller-controller-manager
```

2. Next, create a `NetworkPolicy` to allow the scraper pods to send their scrape requests:

```shell
kubectl apply -f - << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: scraper-policy
  namespace: olmv1-system
spec:
  podSelector:
    matchLabels:
      metrics: scraper
  policyTypes:
    - Egress
  egress:
    - {}  # Allows all egress traffic for metrics requests
EOF
```
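
To confirm that the policy exists and to see which pods it selects, a check along the following lines may help. This is a minimal sketch and not part of the guide: `scraper_pods` and `show_scraper_policy` are hypothetical helper names, and the defaults assume the `olmv1-system` namespace, the `scraper-policy` name, and the `metrics: scraper` label used above.

```shell
# Hypothetical helper (assumption, not from the guide): list the pods that
# carry the label selected by the scraper-policy NetworkPolicy.
# Optional argument: namespace (defaults to olmv1-system).
scraper_pods() {
  kubectl get pods -n "${1:-olmv1-system}" -l metrics=scraper -o name
}

# Hypothetical helper: print the policy so that its podSelector and egress
# rules can be inspected.
show_scraper_policy() {
  kubectl get networkpolicy scraper-policy -n "${1:-olmv1-system}" -o yaml
}
```

Run `scraper_pods` once the curl pods exist. A pod that is missing from the output does not carry the label, so the policy does not apply to it.
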
### Validating Access Manually

1. Generate a token for the service account and extract the required certificates:
@@ -41,6 +60,8 @@ kind: Pod
metadata:
  name: curl-metrics
  namespace: olmv1-system
  labels:
    metrics: scraper
spec:
  serviceAccountName: operator-controller-controller-manager
  containers:
@@ -69,28 +90,27 @@ spec:
        secretName: olmv1-cert
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  restartPolicy: Never
EOF
```

3. Access the pod:
3. Run the following command using the `TOKEN` value obtained above to check the metrics:

```shell
kubectl exec -it curl-metrics -n olmv1-system -- sh
```

4. Run the following command using the `TOKEN` value obtained above to check the metrics:

```shell
curl -v -k -H "Authorization: Bearer <TOKEN>" \
kubectl exec -it curl-metrics -n olmv1-system -- \
curl -v -k -H "Authorization: Bearer ${TOKEN}" \
https://operator-controller-service.olmv1-system.svc.cluster.local:8443/metrics
```

5. Run the following command to validate the certificates and token:
4. Run the following command to validate the certificates and token:

```shell
kubectl exec -it curl-metrics -n olmv1-system -- \
curl -v --cacert /tmp/cert/ca.crt --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key \
-H "Authorization: Bearer <TOKEN>" \
-H "Authorization: Bearer ${TOKEN}" \
https://operator-controller-service.olmv1-system.svc.cluster.local:8443/metrics
```
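
When only a pass/fail answer matters, the verbose output can be reduced to the HTTP status code. The following is a sketch rather than part of the guide: `metrics_status` is a hypothetical helper name, and it assumes the `/tmp/cert` mount, the `olmv1-system` namespace, and the `TOKEN` variable from the steps above.

```shell
# Hypothetical helper (assumption, not from the guide): print only the HTTP
# status code returned by a metrics endpoint.
# Arguments: $1 = pod to exec into, $2 = metrics URL.
# Relies on TOKEN being set in the local shell, as in the steps above.
metrics_status() {
  kubectl exec "$1" -n olmv1-system -- \
    curl -s -o /dev/null -w '%{http_code}' \
      --cacert /tmp/cert/ca.crt --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key \
      -H "Authorization: Bearer ${TOKEN}" \
      "$2"
}
```

For example, `metrics_status curl-metrics https://operator-controller-service.olmv1-system.svc.cluster.local:8443/metrics`. A `200` suggests that the token and the certificates were both accepted, while a `401` or `403` would point toward the token or the RBAC binding.
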

@@ -131,6 +151,8 @@ kind: Pod
metadata:
  name: curl-metrics-catalogd
  namespace: olmv1-system
  labels:
    metrics: scraper
Review comment (Contributor):

We do not need this either.

spec:
  serviceAccountName: catalogd-controller-manager
  containers:
@@ -159,27 +181,26 @@ spec:
        secretName: $OLM_SECRET
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  restartPolicy: Never
EOF
```

4. Access the pod:

```shell
kubectl exec -it curl-metrics-catalogd -n olmv1-system -- sh
```

5. Run the following command using the `TOKEN` value obtained above to check the metrics:
4. Run the following command using the `TOKEN` value obtained above to check the metrics:

```shell
curl -v -k -H "Authorization: Bearer <TOKEN>" \
kubectl exec -it curl-metrics-catalogd -n olmv1-system -- \
curl -v -k -H "Authorization: Bearer ${TOKEN}" \
https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics
```

6. Run the following command to validate the certificates and token:
5. Run the following command to validate the certificates and token:
```shell
kubectl exec -it curl-metrics-catalogd -n olmv1-system -- \
curl -v --cacert /tmp/cert/ca.crt --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key \
-H "Authorization: Bearer <TOKEN>" \
-H "Authorization: Bearer ${TOKEN}" \
https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics
```
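
The `/metrics` response uses the Prometheus text exposition format, in which a metric family is typically announced by a `# TYPE` line. As a quick sanity check, those lines can be counted. This is a minimal sketch: `count_metric_families` is a hypothetical helper name and is not part of the guide.

```shell
# Hypothetical helper (assumption, not from the guide): count the metric
# families in Prometheus text-format input read from stdin by counting the
# "# TYPE" lines.
count_metric_families() {
  grep -c '^# TYPE '
}
```

One way to use it is to pipe the output of the `curl -s` variant of the commands above into `count_metric_families`. A count of zero would suggest that the response was an error message rather than metrics.
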

@@ -253,7 +274,7 @@ metadata:
spec:
  endpoints:
    - path: /metrics
      port: https
      port: metrics
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
@@ -272,7 +293,7 @@ spec:
          key: tls.key
  selector:
    matchLabels:
      control-plane: catalogd-controller-manager
      app.kubernetes.io/name: catalogd
EOF
```
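
Because this change touches both the endpoint `port` name and the label selector, it may be worth checking that the Service actually exposes a port with that name and carries that label. The sketch below is not part of the guide: `service_port_names` and `services_for_selector` are hypothetical helper names, and they assume the `olmv1-system` namespace used throughout.

```shell
# Hypothetical helper (assumption, not from the guide): print the port names
# exposed by a Service, to compare against the `port:` field of the
# ServiceMonitor endpoint.
# Argument: $1 = Service name.
service_port_names() {
  kubectl get service "$1" -n olmv1-system -o jsonpath='{.spec.ports[*].name}'
}

# Hypothetical helper: list the Services that match a label selector, to
# compare against the ServiceMonitor `selector.matchLabels`.
# Argument: $1 = label selector in key=value form.
services_for_selector() {
  kubectl get service -n olmv1-system -l "$1" -o name
}
```

For example, run `service_port_names catalogd-service` and `services_for_selector app.kubernetes.io/name=catalogd`. If the first output does not include `metrics`, or the second is empty, the `ServiceMonitor` would likely have no target to scrape.
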
