ADR required for monitoring setup #8
@hemajv, see if the ADR doc for GitHub alerts can be incorporated here.
We have decided (relevant issue) to keep a single Prometheus instance that monitors all the other component namespaces, so we now need to decide where to keep our service monitor / pod monitor resources. For the ODH components, these resources will probably go in the upstream odh-manifests, but we will still need to add the following overlay to the upstream monitors:

```yaml
spec:
  namespaceSelector:
    matchNames:
      - opf-observatorium # The namespace where the component is deployed
```
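For context, with the Prometheus Operator a single Prometheus instance decides which monitors it loads through selectors on its `Prometheus` resource. A minimal sketch, assuming the operator's `monitoring.coreos.com/v1` API; the name `opf-prometheus`, the `opf-monitoring` namespace, the `prometheus` service account, and the `k8s-app: prometheus` labelling convention are illustrative assumptions, not decisions made in this issue:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: opf-prometheus          # hypothetical name
  namespace: opf-monitoring     # hypothetical namespace for the single instance
spec:
  # Hypothetical service account; it needs RBAC to list Services,
  # Endpoints and Pods in every namespace the monitors point at.
  serviceAccountName: prometheus
  # Load only monitors carrying this label (assumed convention).
  serviceMonitorSelector:
    matchLabels:
      k8s-app: prometheus
  podMonitorSelector:
    matchLabels:
      k8s-app: prometheus
  # serviceMonitorNamespaceSelector / podMonitorNamespaceSelector are left
  # unset: a null selector only discovers monitors in this namespace,
  # while an empty selector ({}) would discover them in all namespaces.
```

Leaving the namespace selectors unset fits the option of keeping every monitor next to Prometheus; each monitor then reaches into the component namespaces through its own `namespaceSelector`.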
If the service monitors and pod monitors are to go in the same namespace as Prometheus, then I think the monitoring folder makes sense. I think we can put them in the base since they aren't environment-specific, and then inherit them in all the overlays. WDYT?
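One possible kustomize layout for the base-plus-overlays idea is sketched below. The directory and file names (`monitoring/base`, `monitoring/overlays/opf-stage`, `servicemonitor.yaml`, `podmonitor.yaml`) are hypothetical; the two YAML documents are separate files, shown together for brevity:

```yaml
# monitoring/base/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - servicemonitor.yaml   # environment-independent ServiceMonitor(s)
  - podmonitor.yaml       # environment-independent PodMonitor(s)
---
# monitoring/overlays/opf-stage/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: opf-stage      # place the monitors in the Prometheus namespace
resources:
  - ../../base            # inherit the shared monitors unchanged
```

The overlay's `namespace` field only sets `metadata.namespace` on the inherited monitors; it does not touch their `namespaceSelector`, so the scraped namespaces stay as defined in the base.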
@HumairAK @anishasthana I was thinking of keeping just one ServiceMonitor resource that looks into multiple namespaces. Maybe we can make this change upstream and only overlay the list of namespaces in the operatefirst overlays?
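If the single ServiceMonitor lives upstream, an operatefirst overlay could replace just its namespace list with a JSON 6902 patch. A sketch, assuming a kustomize version that supports the `patches` field with a `target`; the ServiceMonitor name `odh-monitor` and the relative path are hypothetical:

```yaml
# operatefirst overlay kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base              # upstream ServiceMonitor watching multiple namespaces
patches:
  - target:
      group: monitoring.coreos.com
      version: v1
      kind: ServiceMonitor
      name: odh-monitor     # hypothetical name of the upstream ServiceMonitor
    patch: |-
      # Swap out only the list of namespaces to scrape.
      - op: replace
        path: /spec/namespaceSelector/matchNames
        value:
          - opf-observatorium
          - opf-jupyterhub
```

A JSON patch `replace` fails if the upstream resource has no `matchNames` at that path; `add` would create it (or overwrite it) instead.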
I like the idea of having one ServiceMonitor for ODH. It would be nice if we could specify a list of namespaces as a parameter in the KfDef that would be populated in the ServiceMonitor, so we don't need to do an override.
I am not opposed to it, but we should ask the monitoring team what they think. I wonder if it's abusing the idea of service monitors.
I was able to test a ServiceMonitor that monitors different services in multiple namespaces:

```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: multiservice-monitor
  labels:
    k8s-app: prometheus
  namespace: opf-stage
spec:
  endpoints:
    - port: metrics      # named Service ports to scrape
    - port: 8080-tcp
  namespaceSelector:     # namespaces in which to look for Services
    matchNames:
      - opf-observatorium
      - opf-jupyterhub
  selector: {}           # empty selector: no label filtering on Services
```

This works well. I think we should try to get something like this pushed upstream and only set the
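Since the setup also calls for pod monitors, the same multi-namespace pattern should carry over to a `PodMonitor`, which lists `podMetricsEndpoints` instead of `endpoints`. A sketch, assuming the target pods expose a container port named `metrics`; the resource name `multipod-monitor` is hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: multipod-monitor   # hypothetical name
  namespace: opf-stage
  labels:
    k8s-app: prometheus
spec:
  podMetricsEndpoints:
    - port: metrics        # named container port on the pods (assumed)
  namespaceSelector:       # namespaces in which to look for Pods
    matchNames:
      - opf-observatorium
      - opf-jupyterhub
  selector: {}             # empty selector: no label filtering on Pods
```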
We have made some progress with setting up monitoring and have already created issues in the appropriate repos for service monitors, Prometheus/Grafana deployments, etc. What's lacking is a document that gives context for our setup and future plans. For this, we should prepare an ADR that outlines our monitoring architecture.