LOG-2732: Fix ES servicemonitor for user-workload-monitoring #903
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: periklis The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed from fa21b26 to 695cc7b
/retest-required
/test e2e-upgrade
Looks good to me.
Works nicely on a 4.10 cluster, but I think the existing code will not work on 4.11 anymore. I'm not putting an lgtm on it yet, so we can clear up this question first.
```go
var tokenSecret string
for _, oref := range sa.Secrets {
```
This code will probably not work correctly on OCP 4.11 as ServiceAccounts do not get a Secret containing the token by default anymore on Kubernetes 1.24.
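A minimal sketch of why this lookup comes back empty on Kubernetes 1.24+, using simplified stand-ins for the Kubernetes API types (not the real `k8s.io/api` structs):

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes API types involved.
type ObjectReference struct{ Name string }
type ServiceAccount struct {
	Name    string
	Secrets []ObjectReference
}

// findTokenSecret mirrors the lookup pattern in the diff above: it scans
// sa.Secrets for a generated token secret reference.
func findTokenSecret(sa ServiceAccount) (string, bool) {
	for _, oref := range sa.Secrets {
		// Pre-1.24 clusters list a generated "<name>-token-<hash>" secret here.
		return oref.Name, true
	}
	return "", false
}

func main() {
	// On Kubernetes 1.24+ the token controller no longer populates Secrets,
	// so the lookup finds nothing for a freshly created ServiceAccount.
	sa := ServiceAccount{Name: "elasticsearch-metrics"}
	if _, ok := findTokenSecret(sa); !ok {
		fmt.Println("no token secret found")
	}
}
```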
Good catch, I have updated the PR to create a ServiceAccountToken Secret manually. PTAL
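For illustration, a manually created token Secret generally has the shape sketched below; the names and namespace are hypothetical, and only the `kubernetes.io/service-account.name` annotation and the `kubernetes.io/service-account-token` type are the load-bearing parts that make the token controller populate the token:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenSecretFor returns the manifest (as a generic map) of a Secret that
// asks the Kubernetes token controller to mint a long-lived token for the
// given ServiceAccount.
func tokenSecretFor(name, namespace, serviceAccount string) map[string]interface{} {
	return map[string]interface{}{
		"apiVersion": "v1",
		"kind":       "Secret",
		"metadata": map[string]interface{}{
			"name":      name,
			"namespace": namespace,
			"annotations": map[string]interface{}{
				// Tells kube-controller-manager which ServiceAccount the
				// generated token should belong to.
				"kubernetes.io/service-account.name": serviceAccount,
			},
		},
		// The special type triggers token population by the controller.
		"type": "kubernetes.io/service-account-token",
	}
}

func main() {
	// Illustrative names, not taken from the PR.
	out, _ := json.MarshalIndent(
		tokenSecretFor("elasticsearch-metrics-token", "openshift-logging", "elasticsearch-metrics"),
		"", "  ")
	fmt.Println(string(out))
}
```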
This works for me on 4.11.
Force-pushed from 695cc7b to d31ca01
/retest
Works fine on 4.10; the 4.11 cluster is still booting...
Force-pushed from d31ca01 to 19f7482
/lgtm
/retest
/hold Investigating e2e failures
/hold cancel
/retest-required
@periklis: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/cherry-pick release-5.4
@periklis: #903 failed to apply on top of branch "release-5.4":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Description

For legacy reasons the elasticsearch-operator assumes that `Elasticsearch` and owned `ServiceMonitor` resources are installed only in `openshift-` namespaces or in namespaces annotated with `openshift.io/cluster-monitoring: true`. In both cases the cluster-monitoring stack takes responsibility for reconciling the `ServiceMonitor` resources for the cluster-monitoring Prometheus. In detail, the ServiceMonitor endpoints used the `prometheus-k8s` serviceaccount's token to scrape metrics from elasticsearch and elasticsearch-proxy. This is a legacy and nowadays not-recommended practice that is still sustained in OCP's cluster-monitoring for compatibility reasons (i.e. the prometheus CR sets `ArbitraryFSAccessThroughSMsConfig.Deny: false`).

Moving forward in time with the addition of User Workload Monitoring in OCP (since 4.8), the monitoring stack is amended by a second instance of prometheus-operator where `ArbitraryFSAccessThroughSMsConfig.Deny: true` is applied by default. In turn this means that the following ServiceMonitor fields are not allowed for use anymore:

- `Spec.Endpoints[].TLSConfig.CAFile`: Certificate Authority file for verifying server-side certificates when scraping metrics.
- `Spec.Endpoints[].BearerTokenFile`: Bearer token file for authorizing against the server side when scraping metrics.

In summary this PR makes `ServiceMonitor` resources compliant with `ArbitraryFSAccessThroughSMsConfig.Deny: true` and in turn extends support for monitoring Elasticsearch from cluster-monitoring only to both cluster-monitoring and user-workload-monitoring. In detail the denied fields are replaced as follows:

- `CAFile`: the endpoints use a local object reference to a configmap annotated with `service.beta.openshift.io/inject-cabundle: true`.
- `BearerTokenFile`: the endpoints use a local object reference to the serviceaccount token secret of the `elasticsearch-metrics` serviceaccount.

Notes for reviewer
To make the above settings work in parallel with cluster-monitoring (i.e. openshift-logging) and user-workload-monitoring (i.e. OpenShift distributed tracing platform), the PR changes the elasticsearch-proxy backend role mapping from a cluster-scoped non-resource-url (i.e. `/metrics`) to a custom virtual namespace-scoped resource (i.e. `elasticsearch.openshift.io/metrics`). This simplifies RBAC by providing for each stack a serviceaccount (`elasticsearch-metrics`) and a pair of Role/RoleBinding.

/cc @xperimental
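The replacement of the two denied fields can be sketched with simplified stand-ins for the prometheus-operator ServiceMonitor types. The struct shapes mirror the `monitoringv1` API but are not the real structs, and the resource names are illustrative, not the exact ones in this PR:

```go
package main

import "fmt"

// Simplified stand-ins for the prometheus-operator ServiceMonitor types.
type ConfigMapKeySelector struct{ Name, Key string }
type SecretKeySelector struct{ Name, Key string }

type TLSConfig struct {
	ServerName  string
	CAConfigMap *ConfigMapKeySelector // local reference, replaces TLSConfig.CAFile
}

type Endpoint struct {
	Port              string
	Scheme            string
	TLSConfig         *TLSConfig
	BearerTokenSecret *SecretKeySelector // local reference, replaces BearerTokenFile
}

// metricsEndpoint builds a scrape endpoint that stays within
// ArbitraryFSAccessThroughSMsConfig.Deny: true by using only local
// object references instead of file paths.
func metricsEndpoint() Endpoint {
	return Endpoint{
		Port:   "elasticsearch",
		Scheme: "https",
		TLSConfig: &TLSConfig{
			ServerName: "elasticsearch-metrics.openshift-logging.svc",
			// ConfigMap annotated with service.beta.openshift.io/inject-cabundle: "true",
			// so the service CA operator injects the CA bundle into it.
			CAConfigMap: &ConfigMapKeySelector{Name: "elasticsearch-metrics-ca", Key: "service-ca.crt"},
		},
		// Token secret created for the elasticsearch-metrics serviceaccount.
		BearerTokenSecret: &SecretKeySelector{Name: "elasticsearch-metrics-token", Key: "token"},
	}
}

func main() {
	ep := metricsEndpoint()
	fmt.Println(ep.TLSConfig.CAConfigMap.Name, ep.BearerTokenSecret.Name)
}
```

The key design point is that both references resolve inside the ServiceMonitor's own namespace, which is exactly what the user-workload-monitoring prometheus-operator permits.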
/cherry-pick release-5.4