
Prometheus Cluster Monitoring - custom configuration description improvements #12500

Closed
domino7 opened this issue Oct 16, 2018 · 11 comments
Labels: lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)

domino7 commented Oct 16, 2018

Which section(s) is the issue in?

https://docs.openshift.com/container-platform/3.11/install_config/prometheus_cluster_monitoring.html
https://github.com/openshift/openshift-docs/blob/master/install_config/monitoring/configuring-openshift-cluster-monitoring.adoc
https://github.com/openshift/openshift-docs/blob/master/install_config/monitoring/update-and-compatibility-guarantees.adoc

What needs fixing?

The description of how to pause the resetting of the Monitoring stack to its default state is misleading.

(...) it is possible to pause this behavior, by setting the paused field in the AppVersion called openshift-monitoring.

I've installed the Cluster Monitoring Operator from the openshift-ansible 3.10 and 3.11 branches.
In OpenShift, no AppVersion object is visible. The operators control the state of objects using ControllerRevision objects; see below:

$ oc get AppVersion
the server doesn't have a resource type "AppVersion"
$ oc get ControllerRevision
NAME                          KIND
alertmanager-main-b99b6c78c   ControllerRevision.v1.apps
node-exporter-5fc4d87798      ControllerRevision.v1.apps
prometheus-k8s-6d4cd9b8fb     ControllerRevision.v1.apps
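For anyone checking their own cluster, the revision data can be inspected directly; the object name below is taken from the listing above and will differ per cluster:

$ oc -n openshift-monitoring get controllerrevision alertmanager-main-b99b6c78c -o yaml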

Add an example (or some links) showing how to inject custom configuration

Beyond those explicit configuration options, it is possible to inject additional configuration into the stack. However this is unsupported (...)

For now, I was able to stop Prometheus from being reset to its default state by manually deleting the ClusterMonitoringOperator and PrometheusOperator instances (and then reconfiguring the CRDs with my configuration). This is probably not the most elegant way to provide custom configuration. I understand that a customized setup is not supported per se, but an example of how to inject it without deleting the operators would be greatly appreciated.
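For reference, the rough shape of that workaround looks something like the commands below. The deployment and object names are assumed from a default 3.11 install and may differ; treat this as a sketch, not a recommendation:

$ # stop the cluster-monitoring-operator so it no longer resets the monitoring CRs
$ oc -n openshift-monitoring delete deployment cluster-monitoring-operator
$ # optionally stop the Prometheus Operator too; keep it running if you still
$ # want edits to the Prometheus/Alertmanager CRs to actually be rolled out
$ oc -n openshift-monitoring delete deployment prometheus-operator
$ # then apply custom configuration, e.g. to the Prometheus object (named "k8s" here)
$ oc -n openshift-monitoring edit prometheus k8s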

shah-zobair commented Oct 22, 2018

I was able to change it by following this approach:

Export the CRD object:

oc project openshift-monitoring
oc get prometheusrules prometheus-k8s-rules --export -o yaml > prometheus-k8s-rules.yaml

Change the configuration in the prometheus-k8s-rules.yaml file.

Replace the CRD object:

oc replace -f prometheus-k8s-rules.yaml

Check the ConfigMaps for automatic changes by the operator:

oc get cm
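Another quick check (a sketch, using the resource names above) is to watch the object's resourceVersion after the replace; if it changes again on its own, the operator has rewritten the object:

$ oc -n openshift-monitoring get prometheusrules prometheus-k8s-rules -o jsonpath='{.metadata.resourceVersion}'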

@shah-zobair

Never mind, the above method didn't work; the operator reconciled the object back to its previous state.

@shah-zobair

I was able to disable the prometheusOperator section, and then it worked.

oc -n openshift-monitoring edit cm cluster-monitoring-config

Comment out this section:

    #prometheusOperator:
      #baseImage: registry.redhat.io/openshift3/ose-prometheus-operator
      #prometheusConfigReloaderBaseImage: registry.redhat.io/openshift3/ose-prometheus-config-reloader
      #configReloaderBaseImage: registry.redhat.io/openshift3/ose-configmap-reloader
      #nodeSelector:
        #node-role.kubernetes.io/infra: "true"
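For context, that section sits under the config.yaml key of the cluster-monitoring-config ConfigMap, so after the edit the object ends up looking roughly like this (a sketch; only the relevant keys shown):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    #prometheusOperator:
    #  baseImage: registry.redhat.io/openshift3/ose-prometheus-operator
    #  ... (remaining lines commented out as shown above)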

@jkupferer

I have been exploring this issue and have had no luck getting the approach @shah-zobair mentions to work. It seems that disabling the prometheusOperator section of this configmap only results in some default settings being used by the operator.

So far the only workable solution I have found is to disable the operator entirely with oc scale deployment cluster-monitoring-operator --replicas=0 -n openshift-monitoring.

We really need a mechanism to disable parts of this operator without disabling the whole thing.
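(Scaling it back up with oc scale deployment cluster-monitoring-operator --replicas=1 -n openshift-monitoring re-enables reconciliation, and any manual changes get reverted again, so this is an all-or-nothing switch.)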


domino7 commented Dec 10, 2018

Results of my research:
A default installation of CMO is supported as a read-only setup, with the configuration hard-coded in the CMO image. The operators are responsible for resetting the configuration to its default state, so modifications are not possible (except for the configuration options that are explicitly externalized to the CMO ConfigMap; see the example after this paragraph).
All configuration options that can be overridden are described here:
https://github.com/openshift/cluster-monitoring-operator/blob/master/Documentation/user-guides/configuring-cluster-monitoring.md
The Prometheus Operator in the openshift-monitoring namespace is responsible only for managing this single Prometheus instance.
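A minimal sketch of such a supported override, with option names taken from that document and purely illustrative values, applied via oc -n openshift-monitoring edit configmap cluster-monitoring-config:

config.yaml: |
  prometheusK8s:
    retention: 24h
    nodeSelector:
      node-role.kubernetes.io/infra: "true"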

I've tested CMO as a way to monitor the entire cluster, including application monitoring, self-service capabilities, and high availability. As a result, there are two ways of achieving it:

@zhangchl007

How can I customize the config file /etc/prometheus.yaml? It looks like it lives in the image registry.redhat.io/openshift3/ose-prometheus-config-reloader. How can I customize this image?

cbrdy commented Jan 17, 2019

I'm also looking to customize prometheus.yaml like @zhangchl007 is requesting. We need a way to use remote_write in prometheus.
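For what it's worth, rather than rebuilding the config-reloader image, the Prometheus custom resource managed by the operator exposes a remoteWrite field in the prometheus-operator API. With the cluster-monitoring-operator paused as described above, but the Prometheus Operator still running to render the change, something along these lines could be added. This is a sketch: the object name k8s and the endpoint URL are assumptions/placeholders.

$ oc -n openshift-monitoring edit prometheus k8s

spec:
  remoteWrite:
  - url: https://remote-storage.example.com/api/v1/write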

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Sep 1, 2020
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 1, 2020
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to the /close above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
