Application Monitoring in Operate First using Prometheus

  • Status: accepted
  • Deciders: 4n4nd, anishasthana, hemajv, HumairAK, tumido
  • Date: 2021-01-14

Technical Story: issue-1 issue-2

Context and Problem Statement

Multiple services (e.g., JupyterHub, Argo, Superset, Observatorium) will be deployed in the Operate First environment across various namespaces, and we need to be able to monitor them.

To do so, we will use Prometheus, deployed via the ODH Operator (which includes the Prometheus Operator for this purpose).

Normally, the Prometheus Operator could monitor services across namespaces using cluster-wide access. However, recent versions of OpenShift ship with a pre-installed instance of the Prometheus Operator for cluster monitoring, so a cluster-wide Prometheus Operator setup would conflict with the cluster monitoring instance.

We also cannot use Operator Groups for a multi-namespace installation, since selecting multiple target namespaces with Operator Groups might be deprecated in a future release.

One possible solution is User Workload Monitoring (UWM), but it is still a feature preview in OpenShift v4.5, and no documentation is available for UWM in OpenShift v4.6 yet.

So the main question is: how do we structure the Prometheus Operator and Prometheus deployments in Operate First?

Decision Drivers

  • Access to cluster-wide permissions
    • The Prometheus instance might not have cluster-wide resource access
  • Prometheus resource locations
    • Where do we keep monitoring resources such as ServiceMonitors/PodMonitors?
    • Whose responsibility is it to create these monitoring resources?
  • Complexity of Prometheus deployments
    • How many instances of Prometheus do we deploy?
    • Which namespaces should these Prometheus instances be deployed in?

Considered Options

  • Option 1:
    • Single Prometheus instance in a dedicated monitoring namespace
    • ServiceMonitors/PodMonitors can be kept in the monitoring namespace
    • Give the Prometheus service account the Roles needed to access services in other namespaces
  • Option 2:
    • Each namespace with a service has an instance of Prometheus
    • All ServiceMonitors/PodMonitors must be kept in their respective service namespaces
    • One main Prometheus will federate metrics from all the instances of Prometheus
  • Option 3:
    • Single Prometheus instance in a dedicated monitoring namespace
    • Give the Prometheus service account a ClusterRole to access services and other monitoring resources cluster-wide
    • ServiceMonitors/PodMonitors can be kept in any namespace
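
To make the second bullet of Option 1 concrete, the following is a minimal sketch of a ServiceMonitor that lives in the monitoring namespace but selects a Service in another namespace through `namespaceSelector.matchNames`. It builds the manifest as a Python dictionary and prints it as JSON, which `oc apply -f -` should accept. The namespace names (`opf-monitoring`, `opf-jupyterhub`), the `app: jupyterhub` label, and the `metrics` port name are hypothetical placeholders, not values taken from the Operate First repositories.

```python
import json


def service_monitor(name, monitoring_ns, target_ns, match_labels, port):
    """Build a ServiceMonitor kept in monitoring_ns that selects Services in target_ns."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        # The resource itself is stored in the monitoring namespace.
        "metadata": {"name": name, "namespace": monitoring_ns},
        "spec": {
            # Restrict Service discovery to the namespace of the monitored service.
            "namespaceSelector": {"matchNames": [target_ns]},
            # Select Services by label within that namespace.
            "selector": {"matchLabels": dict(match_labels)},
            # Scrape the named Service port.
            "endpoints": [{"port": port}],
        },
    }


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    manifest = service_monitor(
        name="jupyterhub",
        monitoring_ns="opf-monitoring",
        target_ns="opf-jupyterhub",
        match_labels={"app": "jupyterhub"},
        port="metrics",
    )
    print(json.dumps(manifest, indent=2))
```

Under Option 2 the same resource would instead be created in the service's own namespace, and under Option 3 it could be placed in any namespace.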

Decision Outcome

Chosen option: Option 1, because:

  • Setup is less complex than in Option 2, since only a single Prometheus instance is needed
  • Unlike Option 3, no ClusterRoles are required; only Roles granting access to specific namespaces are needed
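
As an illustration of the namespace-scoped access that Option 1 relies on, here is a minimal sketch that generates a Role and RoleBinding pair for each monitored namespace, granting the Prometheus service account read access to Services, Endpoints, and Pods. The role name `prometheus-read`, the service account name `prometheus-k8s`, and the namespace names are assumptions made for this example; the actual names depend on how the ODH Operator deploys Prometheus.

```python
import json


def namespaced_read_access(target_ns, sa_name, sa_namespace, role_name="prometheus-read"):
    """Return a (Role, RoleBinding) pair granting read access inside target_ns only."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": target_ns},
        "rules": [
            {
                # Core API group resources that Prometheus needs for target discovery.
                "apiGroups": [""],
                "resources": ["services", "endpoints", "pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": target_ns},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
        # The subject lives in the monitoring namespace, not in target_ns.
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_namespace}
        ],
    }
    return role, binding


if __name__ == "__main__":
    # Hypothetical namespaces and service account name.
    for ns in ["opf-jupyterhub", "opf-argo", "opf-superset"]:
        for manifest in namespaced_read_access(ns, "prometheus-k8s", "opf-monitoring"):
            print(json.dumps(manifest, indent=2))
```

Because each Role is scoped to a single namespace, onboarding a new service means adding one more Role and RoleBinding pair rather than widening a ClusterRole.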