WIP: Dynamically reload kube-aggregator certificates #88120

Open: alenkacz wants to merge 1 commit into master

alenkacz commented Feb 13, 2020

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
Dynamically load certificates for kube-aggregator

Which issue(s) this PR fixes:

Fixes #87766

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Kube-aggregator certificates are dynamically loaded on change from disk

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


}

// DynamicRestConfigProvider provides a restconfig and transport backed by dynamically loaded cert/key pair
type DynamicRestConfigProvider struct {

alenkacz Feb 13, 2020

Author Contributor

this probably deserves some tests :) I'll add those after initial review validating the direction of this PR

@alenkacz alenkacz changed the title WIP: Dynamically reload proxy certs WIP: Dynamically reload kube-aggregator certificates Feb 13, 2020

deads2k commented Feb 13, 2020

/assign @p0lyn0mial

alenkacz commented Feb 13, 2020

😂 I'll check out the failures, I have run only unit tests locally :)

p0lyn0mial commented Feb 14, 2020

What happens to the old cert? It is still valid, right?


var _ dynamiccertificates.Listener = &DynamicRestConfigProvider{}

func NewDynamicRestConfigProvider(servingContent dynamiccertificates.CertKeyContentProvider, egressSelector *egressselector.EgressSelector, proxyTransport *http.Transport, insecure bool, serverName string, caBundle []byte) (*DynamicRestConfigProvider, error) {

p0lyn0mial Feb 14, 2020

Contributor

This has to be dynamic: serverName and caBundle are not constant. What if they change?

alenkacz Feb 14, 2020

Author Contributor

based on my understanding, these are provided when kube-aggregator starts so they cannot change without the process being restarted. But I might have missed something...

}

func (c *DynamicRestConfigProvider) syncRestConfig() error {
	cert, key := c.certKeyPair.CurrentCertKeyContent()

p0lyn0mial Feb 14, 2020

Contributor

It will create a new config every second even though nothing has changed. Are there any drawbacks?

alenkacz Feb 14, 2020

Author Contributor

yeah, I should probably add a check for whether the value changed, good point. I'll change that
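
For illustration, the "only rebuild when something changed" check discussed here could look roughly like the sketch below. It is not the code from this PR: `certKeyProvider`, `staticProvider`, and `configCache` are hypothetical names, and the rebuild step is reduced to a counter. The only thing taken from the diff is the shape of `CurrentCertKeyContent()`, which returns the current cert and key bytes.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// certKeyProvider is a hypothetical stand-in for the provider used in the PR.
// It only exposes the current PEM-encoded cert and key.
type certKeyProvider interface {
	CurrentCertKeyContent() (cert, key []byte)
}

// staticProvider is a test double whose content can be swapped by hand.
type staticProvider struct{ cert, key []byte }

func (s *staticProvider) CurrentCertKeyContent() ([]byte, []byte) { return s.cert, s.key }

// configCache remembers the last cert/key it saw and rebuilds its derived
// state only when those bytes differ.
type configCache struct {
	mu       sync.Mutex
	provider certKeyProvider
	lastCert []byte
	lastKey  []byte
	rebuilds int // counts how often the (expensive) rebuild ran
}

// sync reports whether a rebuild happened.
func (c *configCache) sync() bool {
	cert, key := c.provider.CurrentCertKeyContent()

	c.mu.Lock()
	defer c.mu.Unlock()
	if bytes.Equal(cert, c.lastCert) && bytes.Equal(key, c.lastKey) {
		return false // nothing changed, keep the existing config
	}
	// Copy the bytes so later changes to the provider's slices cannot
	// silently alter what we compare against.
	c.lastCert = append([]byte(nil), cert...)
	c.lastKey = append([]byte(nil), key...)
	c.rebuilds++ // placeholder for building the rest config and transport
	return true
}

func main() {
	p := &staticProvider{cert: []byte("cert-1"), key: []byte("key-1")}
	c := &configCache{provider: p}
	fmt.Println(c.sync(), c.sync()) // first call rebuilds, second is a no-op
	p.cert = []byte("cert-2")
	fmt.Println(c.sync(), c.rebuilds)
}
```

The same comparison could also guard any other object derived from the cert/key pair, so that it is reused until the content actually changes.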

@@ -154,13 +138,26 @@ func (r *proxyHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	newReq, cancelFn := newRequestForProxy(location, req)
	defer cancelFn()

	if handlingInfo.proxyRoundTripper == nil {
		roundTripper, err := handlingInfo.dynamicRestConfig.GetTransport()

p0lyn0mial Feb 14, 2020

Contributor

This is a hot path, and it now depends on an external component that can, for example, return an error.

alenkacz Feb 14, 2020

Author Contributor

any suggestions how to resolve this?

}

// DynamicRestConfigProvider provides a restconfig and transport backed by dynamically loaded cert/key pair
type DynamicRestConfigProvider struct {

p0lyn0mial Feb 14, 2020

Contributor

I'm wondering whether we really need a controller. What would an alternative design be?

alenkacz Feb 14, 2020

Author Contributor

yeah, I am open to other ideas. I pretty much kept this in line with how similar problems are solved in the apiserver (i.e. the dynamiccertificates package)

p0lyn0mial Feb 14, 2020

Contributor

How about:

  1. Adding a function to APIAggregator that would return a cert and a key
  2. Adding a new method to APIAggregator that would call updateAPIService for all proxyHandlers and would use a func from 1.
  3. Wire Notify method to call the new function from 2

Alternatively:

  1. Adding a new method to APIAggregator that would call updateAPIService for all proxyHandlers and would accept a cert and a key
  2. Wire Notify method to call the new function from 1 and providing a cert and a key

Doing it this way, we don't modify the hot path and we take potential changes in services into account. It should work if the old cert is still valid. What do you think?

@sttts FYI
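
The second variant proposed above could be sketched roughly as follows. This is a simplified model and not the real APIAggregator: `aggregator`, `proxyHandler`, `updateAllHandlers`, and the `Notify` signature are hypothetical, and the per-handler update just stores the bytes instead of calling updateAPIService.

```go
package main

import (
	"fmt"
	"sync"
)

// proxyHandler is a hypothetical, heavily simplified stand-in for a handler
// that needs the current client cert/key.
type proxyHandler struct {
	mu        sync.RWMutex
	cert, key []byte
}

func (h *proxyHandler) update(cert, key []byte) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.cert, h.key = cert, key
}

func (h *proxyHandler) currentCert() []byte {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.cert
}

// aggregator owns all handlers, keyed by API service name.
type aggregator struct {
	mu       sync.RWMutex
	handlers map[string]*proxyHandler
}

// updateAllHandlers is step 1: push a cert and key to every handler.
func (a *aggregator) updateAllHandlers(cert, key []byte) {
	a.mu.RLock()
	defer a.mu.RUnlock()
	for _, h := range a.handlers {
		h.update(cert, key)
	}
}

// Notify is step 2: the reload notification calls the fan-out method,
// so the request path itself never has to fetch certificates.
func (a *aggregator) Notify(cert, key []byte) {
	a.updateAllHandlers(cert, key)
}

func main() {
	a := &aggregator{handlers: map[string]*proxyHandler{
		"v1beta1.metrics.k8s.io": {},
		"v1.example.com":         {},
	}}
	a.Notify([]byte("cert-2"), []byte("key-2"))
	for name, h := range a.handlers {
		fmt.Println(name, string(h.currentCert()))
	}
}
```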

alenkacz Feb 17, 2020

Author Contributor

@p0lyn0mial that solves it for proxy_handlers, and then available_controller will have to implement the exact same thing (that's why I went the way of extracting that logic outside, so both places can reuse it).

If we go and extract it, then the only difference between your proposal and the current implementation is whether it uses the workqueue or not. I like reusing the workqueue since it's an already proven piece of code that solves the async update for me.

No hard opinion here though, just putting out here why I went the way I went. I am totally fine shifting the direction :-)

p0lyn0mial Feb 17, 2020

Contributor

It seems that available_controller only needs a restConfig - slightly different one than proxy_handlers. I think that before it didn't use MTLS, now it does. I don't know why it was like that before but it should work with MTLS as well, right?

p0lyn0mial Feb 17, 2020

Contributor

alright, let's try with the controller, left some comments.

alenkacz Feb 17, 2020

Author Contributor

it does not use mtls as I pass insecure=true here https://github.com/kubernetes/kubernetes/pull/88120/files#diff-d92af716cdd9489e3f57d61b53adad72R216 ... should I make that more obvious somehow?

alenkacz Feb 17, 2020

Author Contributor

I think I got the confusion around "controller" now, the Run method I think is definitely something we don't need, it was even unused. I removed it, so now it really just asynchronously processes notifications.

	if err != nil {
		return err
	}
	discoveryClient := &http.Client{

p0lyn0mial Feb 17, 2020

Contributor

creating an http client on every sync doesn't seem right, we should reuse it if restConfig hasn't changed.

// start timer that rechecks every minute, just in case. this also serves to prime the controller quickly.
go wait.Until(func() {
	c.Enqueue()
}, 1*time.Minute, stopCh)

p0lyn0mial Feb 17, 2020

Contributor

do we have to call it every minute if it already supports dynamiccertificates.Notifier?

alenkacz Feb 17, 2020

Author Contributor

no :) I actually think we might not need the whole Run method 🤔 removing it...

p0lyn0mial Feb 17, 2020

Contributor

What if the underlying dynamic file reader doesn't support "notification"? :) I think we need to cover both paths.

@@ -301,14 +304,18 @@ func (s *APIAggregator) AddAPIService(apiService *v1.APIService) error {
		proxyPath = "/api"
	}

	restConfigProvider, err := util.NewDynamicRestConfigProvider(s.certKeyContentProvider, s.egressSelector, s.proxyTransport, apiService.Spec.InsecureSkipTLSVerify, apiService.Spec.Service.Name + "." + apiService.Spec.Service.Namespace + ".svc", apiService.Spec.CABundle)

p0lyn0mial Feb 17, 2020

Contributor

we need to stop it when RemoveAPIService is called.

alenkacz Feb 17, 2020

Author Contributor

oh good point... but right now, since I removed the Run method, it's actually just implementing Notifier, so there's nothing to stop anymore, correct?

The only thing that's "running" right now is the dynamic file reader, and that is reused across all restconfig providers, so it should stop when the whole process goes away

p0lyn0mial Feb 17, 2020

Contributor

hmm okay, but we still hold on to memory that we should clean up, wdyt?
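
To make the cleanup concern concrete: if every provider registers itself as a listener and is never unregistered, the notifier keeps a reference to it after its API service is removed. A minimal sketch of a registry that hands back an unregister function is shown below. All names here (`notifier`, `AddListener` returning a `remove` func, `countingListener`) are hypothetical and do not describe the existing dynamiccertificates API.

```go
package main

import (
	"fmt"
	"sync"
)

// listener mirrors the single-method shape used in the diff (Enqueue).
type listener interface{ Enqueue() }

// notifier is a hypothetical registry that supports removing listeners.
type notifier struct {
	mu        sync.Mutex
	listeners map[int]listener
	next      int
}

// AddListener registers l and returns a function that unregisters it again,
// which the owner would call when the API service is removed.
func (n *notifier) AddListener(l listener) (remove func()) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.listeners == nil {
		n.listeners = map[int]listener{}
	}
	id := n.next
	n.next++
	n.listeners[id] = l
	return func() {
		n.mu.Lock()
		defer n.mu.Unlock()
		delete(n.listeners, id) // drops the last reference held by the notifier
	}
}

// notifyAll snapshots the listeners under the lock and calls them outside it.
func (n *notifier) notifyAll() {
	n.mu.Lock()
	ls := make([]listener, 0, len(n.listeners))
	for _, l := range n.listeners {
		ls = append(ls, l)
	}
	n.mu.Unlock()
	for _, l := range ls {
		l.Enqueue()
	}
}

// countingListener records how many notifications it received.
type countingListener struct{ n int }

func (c *countingListener) Enqueue() { c.n++ }

func main() {
	n := &notifier{}
	l := &countingListener{}
	remove := n.AddListener(l)

	n.notifyAll() // delivered
	remove()
	n.notifyAll() // not delivered, the listener is gone

	fmt.Println(l.n) // prints 1
}
```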

@alenkacz alenkacz force-pushed the alenkacz:av/aggregator-cert-reload branch 3 times, most recently from 7ee8963 to 7e57a28 Feb 17, 2020

k8s-ci-robot commented Feb 18, 2020

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alenkacz
To complete the pull request process, please assign deads2k
You can assign the PR to them by writing /assign @deads2k in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fedebongio commented Feb 18, 2020

/assign @deads2k

@alenkacz alenkacz force-pushed the alenkacz:av/aggregator-cert-reload branch 2 times, most recently from cf9c206 to 4f4c506 Feb 19, 2020
@alenkacz alenkacz force-pushed the alenkacz:av/aggregator-cert-reload branch 3 times, most recently from 8cd6074 to 968f9b5 Feb 19, 2020
	} else if c.proxyTransport != nil && c.proxyTransport.DialContext != nil {
		restConfig.Dial = c.proxyTransport.DialContext
	}
	transport, err := restclient.TransportFor(restConfig)

p0lyn0mial Feb 19, 2020

Contributor

I think that it will eventually create a 'transport.TLSConfig' struct. Could you check if we could use the GetCert function to dynamically inject the client certificate? Then DynamicRestConfigProvider could provide that function for 'proxy_handlers' and 'available_controller'. WDYT?

@alenkacz alenkacz force-pushed the alenkacz:av/aggregator-cert-reload branch 2 times, most recently from e410524 to da3d42a Feb 20, 2020
@alenkacz alenkacz force-pushed the alenkacz:av/aggregator-cert-reload branch from da3d42a to c565ce0 Feb 20, 2020
@alenkacz alenkacz force-pushed the alenkacz:av/aggregator-cert-reload branch from c565ce0 to 8a29120 Feb 20, 2020
@alenkacz alenkacz force-pushed the alenkacz:av/aggregator-cert-reload branch from 8a29120 to de12785 Feb 20, 2020
@k8s-ci-robot k8s-ci-robot added size/XL and removed size/L labels Feb 20, 2020

k8s-ci-robot commented Feb 20, 2020

@alenkacz: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-dependencies | de12785 | link | /test pull-kubernetes-dependencies |
| pull-kubernetes-e2e-kind | de12785 | link | /test pull-kubernetes-e2e-kind |
| pull-kubernetes-integration | de12785 | link | /test pull-kubernetes-integration |
| pull-kubernetes-e2e-kind-ipv6 | de12785 | link | /test pull-kubernetes-e2e-kind-ipv6 |
| pull-kubernetes-e2e-gce | de12785 | link | /test pull-kubernetes-e2e-gce |
| pull-kubernetes-verify | de12785 | link | /test pull-kubernetes-verify |

