MON-3229: Remove the dependency on the apiserver auth #1904

marioferh
Contributor

@marioferh marioferh commented Feb 27, 2023

Description

When the control plane nodes are under pressure or the apiserver is simply unavailable, the monitoring stack emits no telemetry data.

Solution

To remove the dependency on the apiserver, use mTLS communication between telemeter-client and the Prometheus pods.

Add the /federate endpoint to kube-rbac-proxy and allow telemeter-client to authenticate via mTLS to reach the Prometheus metrics.

mTLS auth is added to telemeter-client in openshift/telemeter#455.

Type of change

  • Remove the dependency on the apiserver auth
  • kube-rbac-proxy exposing the /federate endpoint
  • Add tests
  • No user-facing changes, so no CHANGELOG entry was needed.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2023
@simonpasquier
Contributor

I think that we also need to update the telemeter client so it can authenticate against the Prometheus server using a client TLS certificate. Right now, the forwarder package (which reads the metrics from the /federate endpoint) only supports bearer token authentication.

https://github.com/openshift/telemeter/blob/ee1ba4699b82ecb2033f886af12b9e2e451d2296/pkg/forwarder/forwarder.go#L149-L182
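
As a rough illustration of what client TLS certificate authentication could look like on the client side, here is a Go sketch that builds an `http.Client` presenting a certificate instead of a bearer token. It is not the forwarder's actual code. The function name `newMTLSClient` is hypothetical, and the file paths in `main` are assumptions chosen for the example.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
	"time"
)

// newMTLSClient builds an HTTP client that presents the given client
// certificate during the TLS handshake and trusts only the CAs found
// in caFile when verifying the server.
func newMTLSClient(certFile, keyFile, caFile string) (*http.Client, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("loading client key pair: %w", err)
	}
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, fmt.Errorf("reading CA bundle: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in CA bundle")
	}
	return &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				MinVersion:   tls.VersionTLS12,
				Certificates: []tls.Certificate{cert},
				RootCAs:      pool,
			},
		},
	}, nil
}

func main() {
	// Assumed paths, for illustration only.
	client, err := newMTLSClient(
		"/etc/tls/private/tls.crt",
		"/etc/tls/private/tls.key",
		"/etc/serving-certs-ca-bundle/service-ca.crt",
	)
	if err != nil {
		fmt.Println("could not build client:", err)
		return
	}
	fmt.Println("client ready, timeout:", client.Timeout)
}
```

With a client like this, no Authorization header is needed, so the server does not have to validate a token against the apiserver.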

@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch from 14135da to 933033b Compare March 15, 2023 08:58
@marioferh
Contributor Author

Testing the TLS changes in openshift/telemeter#455

@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch 2 times, most recently from 191cf78 to 96085b5 Compare April 11, 2023 12:57
@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch from 96085b5 to 8f96539 Compare April 11, 2023 13:18
@marioferh
Contributor Author

depends on openshift/telemeter#457

@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch 2 times, most recently from 50722bb to 5ecd763 Compare April 17, 2023 14:39
@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch 2 times, most recently from c2041ff to 440d643 Compare April 20, 2023 16:50
@marioferh marioferh changed the title WIP Remove the dependency on the apiserver auth Remove the dependency on the apiserver auth Apr 21, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2023
@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch 2 times, most recently from 76964fd to c0ac7c5 Compare May 9, 2023 14:25
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2023
@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch from c0ac7c5 to 5baef0d Compare May 9, 2023 15:36
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2023
@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch 2 times, most recently from a49143f to 419d5ce Compare May 9, 2023 19:34
@marioferh
Contributor Author

/retest-required

Signed-off-by: Mario Fernandez <mariofer@redhat.com>
@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch from ba85332 to 5c08866 Compare June 12, 2023 10:53
@marioferh
Contributor Author

It seems everything is working with the latest patch of openshift/telemeter#455.
For example, with this command:
curl --connect-timeout 5 -v -s --fail --cert /etc/tls/private/tls.crt --key /etc/tls/private/tls.key --cacert ./etc/serving-certs-ca-bundle/service-ca.crt "https://prometheus-k8s.openshift-monitoring.svc:9092/federate?" -G --data-urlencode 'match[]={job="telemeter-client"}'

# TYPE metricsclient_request_send untyped
metricsclient_request_send{client="federate_to",container="kube-rbac-proxy",endpoint="https",instance="10.131.0.29:8443",job="telemeter-client",namespace="openshift-monitoring",pod="telemeter-client-858d8d9c5b-hwxq2",service="telemeter-client",status_code="200",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-1"} 2 1686586998902

After this PR is merged, I can add some tests on the telemeter side. Also, this test in openshift/origin always checks that CMO and telemeter federate scraping works correctly: https://github.com/openshift/origin/blob/master/test/extended/prometheus/prometheus.go#L276-L305

Signed-off-by: Mario Fernandez <mariofer@redhat.com>
@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch 3 times, most recently from cf6388a to 1ec291a Compare June 14, 2023 12:50
@marioferh
Contributor Author

/retest

@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch from 1ec291a to b38647f Compare June 15, 2023 08:18
@marioferh
Contributor Author

/test e2e-aws-ovn-single-node

@raptorsun
Contributor

raptorsun commented Jun 19, 2023

/retitle MON-3229: Remove the dependency on the apiserver auth

@openshift-ci openshift-ci bot changed the title Remove the dependency on the apiserver auth MON-3229: Remove the dependency on the apiserver auth Jun 19, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 19, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented Jun 19, 2023

@marioferh: This pull request references MON-3229 which is a valid jira issue.

static+: [
  {
    user: {
      name: 'system:serviceaccount:openshift-monitoring:prometheus-k8s',
Contributor

nit: This seems to be repeated in various places. Perhaps we should store it in a variable or config.

Contributor

The other reference to this user is here: https://github.com/openshift/cluster-monitoring-operator/blob/master/jsonnet/utils/generate-secret.libsonnet#L18

I think it is not that repetitive, since all the *rbac-secret.yaml files are generated from that function.

Contributor Author

If it's OK with @sthaha, we can move forward with this PR and maybe do a refactor in the future, together with other variables, configs, or SA.

Contributor

@marioferh, nits can always be ignored 🤗

Signed-off-by: Mario Fernandez <mariofer@redhat.com>
@marioferh marioferh force-pushed the remove_dependency_apiserver_auth branch from b38647f to a280a8a Compare June 20, 2023 09:14
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 20, 2023
@marioferh
Contributor Author

/hold
wait for more reviews

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 20, 2023
@raptorsun
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 20, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jun 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marioferh, raptorsun

Needs approval from an approver in each of these files:
  • OWNERS [marioferh,raptorsun]

@openshift-ci
Contributor

openshift-ci bot commented Jun 20, 2023

@marioferh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name          Commit    Details   Required   Rerun command
ci/prow/versions   a280a8a   link      false      /test versions

@marioferh
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 20, 2023
@openshift-merge-robot openshift-merge-robot merged commit 97f0462 into openshift:master Jun 20, 2023
15 of 16 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
6 participants