
Prometheus Cluster Monitoring - custom configuration description improvements #12500

Closed
domino7 opened this issue Oct 16, 2018 · 11 comments
Labels: lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)

domino7 commented Oct 16, 2018

Which section(s) is the issue in?

https://docs.openshift.com/container-platform/3.11/install_config/prometheus_cluster_monitoring.html
https://github.com/openshift/openshift-docs/blob/master/install_config/monitoring/configuring-openshift-cluster-monitoring.adoc
https://github.com/openshift/openshift-docs/blob/master/install_config/monitoring/update-and-compatibility-guarantees.adoc

What needs fixing?

The description of how to pause the resetting of the Monitoring stack to its default state is misleading.

(...) it is possible to pause this behavior, by setting the paused field in the AppVersion called openshift-monitoring.

I've installed the Cluster Monitoring Operator from the openshift-ansible 3.10 and 3.11 branches.
In OpenShift, no AppVersion object is visible. The operators control the state of objects using ControllerRevision objects; see below:

$ oc get AppVersion
the server doesn't have a resource type "AppVersion"
$ oc get ControllerRevision
NAME                          KIND
alertmanager-main-b99b6c78c   ControllerRevision.v1.apps
node-exporter-5fc4d87798      ControllerRevision.v1.apps
prometheus-k8s-6d4cd9b8fb     ControllerRevision.v1.apps
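For anyone checking their own cluster, the revision data can be inspected directly; the object name below is taken from the listing above and will differ per cluster:

$ oc -n openshift-monitoring get controllerrevision alertmanager-main-b99b6c78c -o yaml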

Add an example (or some links) showing how to inject custom configuration

Beyond those explicit configuration options, it is possible to inject additional configuration into the stack. However this is unsupported (...)

For now, I was able to stop Prometheus from being reset to its default state by manually deleting the ClusterMonitoringOperator and PrometheusOperator instances (and then reconfiguring the CRDs with my configuration). This is probably not the most elegant way to provide custom configuration. I understand that a customized setup is not supported per se, but an example of how to inject it without deleting the operators would be greatly appreciated.
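For reference, the rough shape of that workaround looks something like the commands below. The deployment and object names are assumed from a default 3.11 install and may differ; treat this as a sketch, not a recommendation:

$ # stop the cluster-monitoring-operator so it no longer resets the monitoring CRs
$ oc -n openshift-monitoring delete deployment cluster-monitoring-operator
$ # optionally stop the Prometheus Operator too; keep it running if you still
$ # want edits to the Prometheus/Alertmanager CRs to actually be rolled out
$ oc -n openshift-monitoring delete deployment prometheus-operator
$ # then apply custom configuration, e.g. to the Prometheus object (named "k8s" here)
$ oc -n openshift-monitoring edit prometheus k8s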

shah-zobair commented Oct 22, 2018

I was able to change it by following this approach:

Export the CRD object:

oc project openshift-monitoring
oc get prometheusrules prometheus-k8s-rules --export -o yaml > prometheus-k8s-rules.yaml

Change the configuration in the prometheus-k8s-rules.yaml file.

Replace the CRD object:

oc replace -f prometheus-k8s-rules.yaml

Check the ConfigMaps for automatic changes by the operator:

oc get cm
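Another quick check (a sketch, using the resource names above) is to watch the object's resourceVersion after the replace; if it changes again on its own, the operator has rewritten the object:

$ oc -n openshift-monitoring get prometheusrules prometheus-k8s-rules -o jsonpath='{.metadata.resourceVersion}'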

@shah-zobair

Never mind, the above method didn't work; the operator reconciled the object back to its previous state.

@shah-zobair

I was able to disable the prometheusOperator section, and then it worked.

oc -n openshift-monitoring edit cm cluster-monitoring-config

Comment out this section:

    #prometheusOperator:
      #baseImage: registry.redhat.io/openshift3/ose-prometheus-operator
      #prometheusConfigReloaderBaseImage: registry.redhat.io/openshift3/ose-prometheus-config-reloader
      #configReloaderBaseImage: registry.redhat.io/openshift3/ose-configmap-reloader
      #nodeSelector:
        #node-role.kubernetes.io/infra: "true"
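For context, that section sits under the config.yaml key of the cluster-monitoring-config ConfigMap, so after the edit the object ends up looking roughly like this (a sketch; only the relevant keys shown):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    #prometheusOperator:
    #  baseImage: registry.redhat.io/openshift3/ose-prometheus-operator
    #  ... (remaining lines commented out as shown above)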

@jkupferer

I have been exploring this issue and have had no luck getting the approach @shah-zobair mentions to work. It seems that disabling the prometheusOperator section of this configmap only results in some default settings being used by the operator.

So far the only workable solution I have found is to disable the operator entirely with oc scale deployment cluster-monitoring-operator --replicas=0 -n openshift-monitoring.

We really need a mechanism to disable parts of this operator without disabling the whole thing.
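(Scaling it back up with oc scale deployment cluster-monitoring-operator --replicas=1 -n openshift-monitoring re-enables reconciliation, and any manual changes get reverted again, so this is an all-or-nothing switch.)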


domino7 commented Dec 10, 2018

Results of my research:
A default installation of CMO is supported as a read-only setup, with the configuration hard-coded in the CMO image. The operators are responsible for resetting the configuration to its default state, so modifications are not possible (except for the configuration options that are explicitly externalized to the CMO ConfigMap; see the example after this paragraph).
All configuration options that can be overridden are described here:
https://github.com/openshift/cluster-monitoring-operator/blob/master/Documentation/user-guides/configuring-cluster-monitoring.md
The Prometheus Operator in the openshift-monitoring namespace is responsible only for managing this single Prometheus instance.
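A minimal sketch of such a supported override, with option names taken from that document and purely illustrative values, applied via oc -n openshift-monitoring edit configmap cluster-monitoring-config:

config.yaml: |
  prometheusK8s:
    retention: 24h
    nodeSelector:
      node-role.kubernetes.io/infra: "true"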

I've tested CMO as a way to monitor the entire cluster, including application monitoring, self-service capabilities, and high availability. As a result, there are two ways of achieving it:

@zhangchl007

How can I customize the config file /etc/prometheus.yaml? It looks like it lives in the image registry.redhat.io/openshift3/ose-prometheus-config-reloader. How can I customize this image?

cbrdy commented Jan 17, 2019

I'm also looking to customize prometheus.yaml like @zhangchl007 is requesting. We need a way to use remote_write in prometheus.
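For what it's worth, rather than rebuilding the config-reloader image, the Prometheus custom resource managed by the operator exposes a remoteWrite field in the prometheus-operator API. With the cluster-monitoring-operator paused as described above, but the Prometheus Operator still running to render the change, something along these lines could be added. This is a sketch: the object name k8s and the endpoint URL are assumptions/placeholders.

$ oc -n openshift-monitoring edit prometheus k8s

spec:
  remoteWrite:
  - url: https://remote-storage.example.com/api/v1/write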

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Sep 1, 2020
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Oct 1, 2020
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to the /close above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
