Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update http.Transport if it already exists in ExecProvider #66395

Merged
merged 1 commit into from Jul 26, 2018

Conversation

@awly
Copy link
Contributor

awly commented Jul 19, 2018

What this PR does / why we need it:
This unbreaks ExecPlugin. Without the change, we hit this error
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/transport/transport.go#L32

Release note:

Fix kubelet startup failure when using ExecPlugin in kubeconfig
@awly

This comment has been minimized.

Copy link
Contributor Author

awly commented Jul 19, 2018

/assign @mikedanese
/cc @liggitt

@liggitt

This comment has been minimized.

Copy link
Member

liggitt commented Jul 19, 2018

/hold
this reverts the fix in #63492 and removes the ability to tear down kubelet connections on heartbeat failure. it's fine if we want to compose these two things differently, but we cannot reintroduce the kubelet TCP heartbeat hang issue.

@awly awly force-pushed the awly:fix-kubelet-exec-plugin-startup branch from 5262e8e to 4471282 Jul 19, 2018

@k8s-ci-robot k8s-ci-robot added size/M and removed size/L labels Jul 19, 2018

@awly

This comment has been minimized.

Copy link
Contributor Author

awly commented Jul 19, 2018

@liggitt good point, not sure how i forgot about that :|

Changed to handle existing Transport in ExecProvider and use stdlib GetCertificate version.
WDYT?

@awly awly changed the title Only call kubeletcertificate.UpdateConfig if cert rotation is enabled Update http.Transport if it already exists in ExecProvider Jul 19, 2018

cert, err := getCert()
if err != nil {
return nil, err
if t, ok := c.Transport.(*http.Transport); ok && t != nil {

This comment has been minimized.

@liggitt

liggitt Jul 20, 2018

Member

what does this do when run with --v=6? might need a func TransportFor(transport http.RoundTripper) (*http.Transport, error) helper in pkg/util/net to unwrap nested debugging transports (see DialerFor as an example)

This comment has been minimized.

@awly

awly Jul 20, 2018

Author Contributor

Just tested, --v=6 works fine.
But I agree with your point, someone can introduce a wrapper eventually, breaking this.
Added a TransportFor helper.

}
getCert := t.TLSClientConfig.GetClientCertificate
t.TLSClientConfig.GetClientCertificate = func(hi *tls.CertificateRequestInfo) (*tls.Certificate, error) {
// If previous GetCert is present and returns a valid non-nil

This comment has been minimized.

@liggitt

liggitt Jul 20, 2018

Member

trying to think through the implications if you use client cert rotation and an exec plugin... the client cert rotation code would always win. does that make sense?

This comment has been minimized.

@liggitt

liggitt Jul 20, 2018

Member

where are we checking if the returned cert is valid?

This comment has been minimized.

@awly

awly Jul 20, 2018

Author Contributor

where are we checking if the returned cert is valid?

refreshCredsLocked does validation. It fetches the actual certificate, validates and caches it. GetClientCertificate returns the cached copy.
I added cert expiry validation too in this PR, just in case.

trying to think through the implications if you use client cert rotation and an exec plugin... the client cert rotation code would always win. does that make sense?

Is cert rotation and exec plugin a valid configuration? Does it mean "fetch initial cert via plugin but then rotate manually"?
Our use case relies on exec plugin to bootstrap a fresh cert whenever needed and cert rotation in kubelet is disabled.

This comment has been minimized.

@liggitt

liggitt Jul 20, 2018

Member

refreshCredsLocked does validation.

I was referring to what the previous GetCert was returning ("If previous GetCert is present and returns a valid non-nil..."), not what the exec plugin returned

Is cert rotation and exec plugin a valid configuration?

I don't know, but this is allowing it :)

This comment has been minimized.

@awly

awly Jul 20, 2018

Author Contributor

I was referring to what the previous GetCert was returning ("If previous GetCert is present and returns a valid non-nil..."), not what the exec plugin returned

Ah, gotcha. I don't think it's this plugin's job to check that. If something returns a certificate here, it better be valid. Otherwise it's a bug.

I'd rather leave it as is, WDYT?

This comment has been minimized.

@mikedanese

mikedanese Jul 24, 2018

Member

For the sanity of the next person who tries to debug this, I think it would be better to return an error if GetClientCertificate is not nil. That would effectively disallow client cert rotation to be used with the exec plugin, right? I'd be happy to avoid this complexity.

This comment has been minimized.

@liggitt

liggitt Jul 24, 2018

Member

I think it would be better to return an error if GetClientCertificate is not nil. That would effectively disallow client cert rotation to be used with the exec plugin, right? I'd be happy to avoid this complexity.

that seems like a reasonable starting point. mixing the two gets weird.

This comment has been minimized.

@awly

awly Jul 24, 2018

Author Contributor

SGTM, made it an error in both cases

if getCert != nil {
cert, err := getCert(hi)
if err != nil {
return nil, err

This comment has been minimized.

@liggitt

liggitt Jul 20, 2018

Member

would we want to check the exec plugin in an error case? (I know this is pre-existing, just trying to understand the implications and whether they are different if we're dealing with an existing transport vs a GetCert func)

This comment has been minimized.

@awly

awly Jul 20, 2018

Author Contributor

Makes sense to me, added.
It's an unlikely configuration, but doesn't hurt.

@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Jul 20, 2018

@awly

This comment has been minimized.

Copy link
Contributor Author

awly commented Jul 20, 2018

@liggitt thanks for the comments, PTAL

@awly

This comment has been minimized.

Copy link
Contributor Author

awly commented Jul 20, 2018

FYI, I'd like to cherry-pick this into 1.11.2.

@mikedanese
Copy link
Member

mikedanese left a comment

The rest client keeps on giving. What if we added a WrapDialer field to the restclient.Config and used that to add the connection tracker? Would that be simpler?


// TransportFor returns the root *http.Transport behind all wrapper
// RoundTrippers. Wrappers must implement RoundTripperWrapper.
func TransportFor(transport http.RoundTripper) (*http.Transport, error) {
if transport == nil {

This comment has been minimized.

@mikedanese

mikedanese Jul 24, 2018

Member

when does this happen?

This comment has been minimized.

@awly

awly Jul 24, 2018

Author Contributor

This shifts the nil check from every caller to here. transport.Config.Transport can be nil with valid config, afaik (just not in kubelet)

}
getCert := t.TLSClientConfig.GetClientCertificate
t.TLSClientConfig.GetClientCertificate = func(hi *tls.CertificateRequestInfo) (*tls.Certificate, error) {
// If previous GetCert is present and returns a valid non-nil

This comment has been minimized.

@mikedanese

mikedanese Jul 24, 2018

Member

For the sanity of the next person who tries to debug this, I think it would be better to return an error if GetClientCertificate is not nil. That would effectively disallow client cert rotation to be used with the exec plugin, right? I'd be happy to avoid this complexity.

if certBlock == nil {
return errors.New("can't parse client certificate returned by exec plugin as PEM block")
}
x509Cert, err := x509.ParseCertificate(certBlock.Bytes)

This comment has been minimized.

@mikedanese

mikedanese Jul 24, 2018

Member

Use the cert on line 402 so you don't have to parse the pem twice.

This comment has been minimized.

@awly

awly Jul 24, 2018

Author Contributor

You mean manually parse ClientKeyData and build tls.Certificate via a literal?
It feels like more work and easy to miss some validations that tls.X509KeyPair does

if err != nil {
return fmt.Errorf("failed parsing x509 client certificate returned by exec plugin: %v", err)
}
if time.Now().After(x509Cert.NotAfter) {

This comment has been minimized.

@liggitt

liggitt Jul 24, 2018

Member

are we already checking time client-side elsewhere? this makes us vulnerable to client clock skew

This comment has been minimized.

@awly

awly Jul 24, 2018

Author Contributor

I don't know if we're checking it elsewhere. It should be kube-apiserver's responsibility to verify cert validity on incoming requests. Could you clarify how this makes us vulnerable?

@awly awly force-pushed the awly:fix-kubelet-exec-plugin-startup branch from 97ab95f to 527b23b Jul 25, 2018

@awly

This comment has been minimized.

Copy link
Contributor Author

awly commented Jul 25, 2018

@liggitt @mikedanese PTAL. Turns out we can just set restclient.Config.Dialer instead of Transport. This fixes the issue in my test cluster.

}
return a.cert()
if c.TLS.GetCert != nil {
return errors.New("can't add TLC certificate callback: transport.Config.TLS.GetCert already set")

This comment has been minimized.

@mikedanese

mikedanese Jul 25, 2018

Member

s/TLC/TLS/?

This comment has been minimized.

@awly

awly Jul 25, 2018

Author Contributor

done

@awly awly force-pushed the awly:fix-kubelet-exec-plugin-startup branch 2 times, most recently from e619672 to 7293fab Jul 25, 2018

Set connrotation dialer via restclient.Config.Dialer
Instead of Transport. This fixes ExecPlugin, which fails if
restclient.Config.Transport is set.

@awly awly force-pushed the awly:fix-kubelet-exec-plugin-startup branch from 7293fab to 3357b5e Jul 25, 2018

@mikedanese

This comment has been minimized.

Copy link
Member

mikedanese commented Jul 25, 2018

I'm reasonably happy with this.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 25, 2018

@awly

This comment has been minimized.

Copy link
Contributor Author

awly commented Jul 25, 2018

Moved cert expiry check out to #66632

@mikedanese

This comment has been minimized.

Copy link
Member

mikedanese commented Jul 26, 2018

/retest

1 similar comment
@mikedanese

This comment has been minimized.

Copy link
Member

mikedanese commented Jul 26, 2018

/retest

@liggitt

This comment has been minimized.

Copy link
Member

liggitt commented Jul 26, 2018

/lgtm
/hold cancel

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Jul 26, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awly, liggitt, mikedanese

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Jul 26, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Jul 26, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit cef2d32 into kubernetes:master Jul 26, 2018

17 of 18 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-kubemark-e2e-gce-big
Details
cla/linuxfoundation awly authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

@awly awly deleted the awly:fix-kubelet-exec-plugin-startup branch Jul 26, 2018

k8s-github-robot pushed a commit that referenced this pull request Jul 31, 2018

Kubernetes Submit Queue
Merge pull request #66683 from awly/automated-cherry-pick-of-#66395-u…
…pstream-release-1.11

Automatic merge from submit-queue.

Automated cherry pick of #66395: Set connrotation dialer via restclient.Config.Dialer

Cherry pick of #66395 on release-1.11.

#66395: Set connrotation dialer via restclient.Config.Dialer
@marpaia

This comment has been minimized.

Copy link
Member

marpaia commented Jul 31, 2018

/sig cli

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.