Use kube-rbac-proxy for standalone kube-proxy metrics #839

Merged

@danwinship danwinship commented Oct 16, 2020

As with openshift-sdn and ovn-kubernetes, we should run the standalone kube-proxy metrics behind an authenticated proxy.

To avoid possible problems with upgrading old clusters, I made it conditional on using the new metrics port.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 16, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2020
@danwinship danwinship changed the title WIP Use kube-rbac-proxy for standalone kube-proxy metrics Use kube-rbac-proxy for standalone kube-proxy metrics Oct 16, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 16, 2020
@danwinship
Contributor Author

/assign @juanluisvaladas

@squeed
Contributor

squeed commented Oct 19, 2020

Can you think of a cleverer way to roll this out so that all clusters get it? There would be a few scrape failures while the ServiceMonitor and DaemonSet race, but my instinct is that's not much of a problem (kube-proxy monitoring is nice but not stupendously critical).

@danwinship
Contributor Author

I don't really know a lot about monitoring, and I don't have much sense of what is and is not breaking.

@squeed
Contributor

squeed commented Oct 21, 2020

Given that we don't have any alerts defined for kube-proxy (oops, we should, I'll file a ticket), I think it's safe to just roll this out and accept a bit of disruption. I've asked the monitoring folks either way.

@squeed
Contributor

squeed commented Oct 21, 2020

Yeah, TargetDown fires if scrape fails for more than 10 minutes; I think we don't need the affordance for old clusters.

@danwinship
Contributor Author

> Given that we don't have any alerts defined for kube-proxy (oops, we should, I'll file a ticket)

#819 adds them, but I didn't want to merge that until we had CI for standalone kube-proxy (openshift/release#12502), which is blocked by this PR because we fail "Prometheus when installed on the cluster should start and expose a secured proxy and unsecured metrics".

We don't allow the user to change the SDN metrics port, but we
historically allowed them to manually specify the correct port (9101)
explicitly in the kube-proxy config.

However, this got broken when we switched to using the https proxy for
metrics; we need to be telling kube-proxy to use port 29101 now, but
if the user had specified 9101 explicitly in the config, we would have
told it to use that instead and broken everything.

Don't allow the user to explicitly specify the health/metrics port
values. While we have to continue to allow them to specify the old
ones if they were already doing it, for backward compatibility, don't
encourage it.

Also, fix the bug that we were now allowing openshift-sdn users to
choose the standalone-kube-proxy ports, though it wouldn't actually
work right.

Also, revert a change to kubeProxyConfiguration() that just made things
messy and doesn't work well with the next commit.
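
The port handling described in the commit messages above can be sketched roughly as follows for the openshift-sdn case. This is a minimal illustration, not the operator's actual code: the helper name `sdnMetricsPort`, the `host:port` form of the override, and the choice to reject any other port with an error are assumptions. Only the two port numbers (9101 and 29101) come from the commit message.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

const (
	legacyMetricsPort  = 9101  // port users historically set explicitly
	proxiedMetricsPort = 29101 // port kube-proxy must use behind the https proxy
)

// sdnMetricsPort is a hypothetical helper. userAddr is an optional explicit
// metrics bind address in host:port form ("" when the user set nothing).
// It returns the port kube-proxy should actually be told to use.
func sdnMetricsPort(userAddr string) (int, error) {
	if userAddr == "" {
		return proxiedMetricsPort, nil
	}
	_, portStr, err := net.SplitHostPort(userAddr)
	if err != nil {
		return 0, fmt.Errorf("invalid metrics address %q: %w", userAddr, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return 0, fmt.Errorf("invalid metrics port %q: %w", portStr, err)
	}
	if port != legacyMetricsPort {
		// Assumption: any other value is rejected, since the SDN metrics
		// port is not user-configurable.
		return 0, fmt.Errorf("metrics port %d is not allowed", port)
	}
	// An explicit 9101 is tolerated for backward compatibility, but
	// kube-proxy still has to listen on the proxied port.
	return proxiedMetricsPort, nil
}
```

With this shape, an old config that still names 9101 keeps validating, while kube-proxy is always pointed at 29101, which avoids the breakage the first commit message describes.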

@danwinship
Contributor Author

@squeed fixed to unconditionally deploy kube-rbac-proxy. I had to keep the possibility of continuing to use the old port though, since we allow the user to explicitly request that port in the operator config and it seems wrong to not use that port in that case? (But cleaned up some other stuff around specifying explicit ports in the kube-proxy config...)

@squeed
Contributor

squeed commented Oct 21, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 21, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, squeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@danwinship
Contributor Author

/hold
/refresh

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 21, 2020
@openshift-merge-robot
Contributor

@danwinship: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-ovn-step-registry | 9c683fb | link | /test e2e-ovn-step-registry |
| ci/prow/e2e-vsphere-ovn | 9c683fb | link | /test e2e-vsphere-ovn |
| ci/prow/e2e-ovn-hybrid-step-registry | 9c683fb | link | /test e2e-ovn-hybrid-step-registry |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@danwinship
Contributor Author

/override ci/prow/e2e-aws-sdn-multi
/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 21, 2020
@openshift-ci-robot
Contributor

@danwinship: Overrode contexts on behalf of danwinship: ci/prow/e2e-aws-sdn-multi

In response to this:

/override ci/prow/e2e-aws-sdn-multi
/hold cancel

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit f5bb473 into openshift:master Oct 21, 2020
@danwinship danwinship deleted the kube-proxy-rbac-proxy branch February 3, 2022 14:56