Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kube-apiserver: healthcheck via sidecar container #9069

Merged
merged 1 commit into from
May 7, 2020

Conversation

justinsb
Copy link
Member

@justinsb justinsb commented May 5, 2020

kube-apiserver doesn't expose the healthcheck via a dedicated
endpoint, instead relying on anonyomous-access being enabled. That
has previously forced us to enable the unauthenticated endpoint on
127.0.0.1:8080.

Instead we now run a small sidecar container, which
proxies /healthz and /readyz requests (only) adding appropriate
authentication using a client certificate.

This will also enable better load balancer checks in future, as these
have previously been hampered by the custom CA certificate.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 5, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/api area/nodeup approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 5, 2020
@justinsb justinsb force-pushed the healthcheck branch 2 times, most recently from 2ebe0c7 to e755771 Compare May 5, 2020 17:55
Makefile Outdated
mkdir -p ${BAZELIMAGES}
DOCKER_REGISTRY="" DOCKER_IMAGE_PREFIX="kope/" KUBE_APISERVER_HEALTHCHECK_TAG=${KUBE_APISERVER_HEALTHCHECK_TAG} bazel build ${BAZEL_CONFIG} --platforms=@io_bazel_rules_go//go/toolchain:linux_amd64 //cmd/kube-apiserver-healthcheck:image-bundle.tar.gz //cmd/kube-apiserver-healthcheck:image-bundle.tar.gz.sha1 //cmd/kube-apiserver-healthcheck:image-bundle.tar.gz.sha256
cp -fp bazel-bin/cmd/kube-apiserver-healthcheck/image-bundle.tar.gz ${BAZELIMAGES}/kube-apiserver-healthcheck.tar.gz
cp -fp bazel-bin/cmd/kube-apiserver-healthcheck/image-bundle.tar.gz.sha1 ${BAZELIMAGES}/kube-apiserver-healthcheck.tar.gz.sha1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to create and ship SHA-1 hashes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's historical and we began migrating to sha256 about a year ago. I'm not sure on the specifics, we can probably talk about a deprecation at some point, however I don't think this PR is the right time to do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - it's a new file, let's try removing the sha1, then we can find out if anyone is still using it (they shouldn't be!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops - our responses crossed. The sha256 support (eca2ac6) should be in kops 1.15, so I am going ahead and removing the sha1. But if it gives us trouble, let's add it back ...

resp, err := httpClient.Do(req)
if err != nil {
klog.Infof("error from %s: %v", target, err)
http.Error(w, "internal error", http.StatusInternalServerError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http.StatusBadGateway would be more appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

func (s *healthCheckServer) handler(w http.ResponseWriter, r *http.Request) {
path := r.URL.Path
path = strings.TrimPrefix(path, "/")
tokens := strings.Split(path, "/")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My druthers would be to just select(r.URL.Path) and leave out any parsing of the path. There's a risk of a discrepancy between the parsing here and the parsing in proxyRequest, leading to request injection vulnerabilities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - generally reworked to avoid all parsing!

func (s *healthCheckServer) httpClient() *http.Client {
transport := &http.Transport{
TLSClientConfig: s.tlsConfig,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems expensive to create a new Transport for each incoming query. Shouldn't this be one per healthCheckServer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fear is that they contaminate each other (e.g. cookies), but given our use case, you're right - simplifying.

signerCertificate = cert
}

privateKey, err := NewPrivateKey(2048)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use pki.GeneratePrivateKey() and don't hardcode the key size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about creating simpler methods, but looking at pki, that's exactly what you've already done!

}

klog.Infof("signing certificate for %q", id)
cert, err := NewSignedClientCert(id, privateKey, signerCertificate.Certificate, signer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pki.SignNewCertificate()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

@@ -164,6 +178,13 @@ func (a *AssetBuilder) RemapImage(image string) (string, error) {
}
}

if strings.HasPrefix(image, "kope/kube-apiserver-healthcheck:") {
override := os.Getenv("KUBE_APISERVER_HEALTHCHECK_IMAGE")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add export KUBE_APISERVER_HEALTHCHECK_IMAGE= to hack/update-expected.sh?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

})
}

clientKey, clientCert, err := b.buildClientKeypair(id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have buildClientKeypair() return a pki.Certificate and then use its AsBytes(). Use the pattern in KubeletBuilder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - done!

Copy link
Contributor

@mikesplain mikesplain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this looks good to me. Holding on the lgtm since John had some comments, however I think some of those can be cleaned up over time as needed.

Makefile Outdated
mkdir -p ${BAZELIMAGES}
DOCKER_REGISTRY="" DOCKER_IMAGE_PREFIX="kope/" KUBE_APISERVER_HEALTHCHECK_TAG=${KUBE_APISERVER_HEALTHCHECK_TAG} bazel build ${BAZEL_CONFIG} --platforms=@io_bazel_rules_go//go/toolchain:linux_amd64 //cmd/kube-apiserver-healthcheck:image-bundle.tar.gz //cmd/kube-apiserver-healthcheck:image-bundle.tar.gz.sha1 //cmd/kube-apiserver-healthcheck:image-bundle.tar.gz.sha256
cp -fp bazel-bin/cmd/kube-apiserver-healthcheck/image-bundle.tar.gz ${BAZELIMAGES}/kube-apiserver-healthcheck.tar.gz
cp -fp bazel-bin/cmd/kube-apiserver-healthcheck/image-bundle.tar.gz.sha1 ${BAZELIMAGES}/kube-apiserver-healthcheck.tar.gz.sha1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's historical and we began migrating to sha256 about a year ago. I'm not sure on the specifics, we can probably talk about a deprecation at some point, however I don't think this PR is the right time to do it.

@justinsb justinsb force-pushed the healthcheck branch 2 times, most recently from 97b3f82 to 8a99603 Compare May 6, 2020 14:32
@k8s-ci-robot k8s-ci-robot added area/documentation area/provider/aws Issues or PRs related to aws provider labels May 6, 2020
Makefile Outdated Show resolved Hide resolved
Scheme: "https",
Host: "127.0.0.1",
Path: r.URL.Path,
RawQuery: r.URL.RawQuery,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to pass through an unvetted RawQuery? It seems risky to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's not trivial.. https://github.com/kubernetes/kubernetes/blob/08e1fd3bb947faf465e8a67d5c7106dbd10840c0/cluster/gce/manifests/kube-apiserver.manifest#L39

I don't know what validation we could / should do here. We could whitelist some queries I guess but I think that'll confuse people more. I imagine two things are coming:

  1. exposing this for use as a load balancer health-check (i.e. publicly)
  2. using the richer readyz / livez queries as in the upstream manifest

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to parse the query and only pass through parameters we recognize as safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at apiserver code, I see at least exclude and verbose, but that http.Request gets passed around all over the place.

So it's a risk assessment as to how likely it is that apiserver would have or later add an important authorization check on a parameter to those paths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we parse and pass through exclude? I worry about information leakage from verbose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea - I implemented that. I also made it more testable and added a test!

// KeyEncipherment allows the cert/key pair to be used to encrypt
// keys, including the symmetric keys negotiated during TLS setup
// and used for data transfer.
template.KeyUsage = template.KeyUsage | x509.KeyUsageKeyEncipherment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment is the default for pki.SignNewCertificate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - cool. I suggest leaving it for now? I think this function and the similar one to generate a kubeconfig end up being utils.

You can also see the beginnings of a thought I'm exploring here, which is putting less logic into nodeup. The only parameter here is really the path and the CN, we could imagine this just being in the NodeUpConfig (or similar), meaning we would have more insight into exactly what nodeup does (because it would do less)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can leave for now. I'll look into extracting a function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - I don't think it's urgent, I just figured that's one way we could go, so it might not matter too much if we only have two of these (kubeconfig & "raw")

@justinsb justinsb force-pushed the healthcheck branch 2 times, most recently from 8a34253 to 6b11d8b Compare May 7, 2020 03:33
@justinsb
Copy link
Member Author

justinsb commented May 7, 2020

Cleaned up the request processing in the proxy & added some tests

t.Fatalf("unexpected method %q", out.Method)
}
actual = out.URL.String()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test wouldn't catch a bug where mapToProxyRequest returns an Request containing a URL of the empty string when it should be returning nil.

kube-apiserver doesn't expose the healthcheck via a dedicated
endpoint, instead relying on anonyomous-access being enabled.  That
has previously forced us to enable the unauthenticated endpoint on
127.0.0.1:8080.

Instead we now run a small sidecar container, which
proxies /healthz and /readyz requests (only) adding appropriate
authentication using a client certificate.

This will also enable better load balancer checks in future, as these
have previously been hampered by the custom CA certificate.

Co-authored-by: John Gardiner Myers <jgmyers@proofpoint.com>
@rifelpet
Copy link
Member

rifelpet commented May 7, 2020

/test pull-kops-e2e-kubernetes-aws

@johngmyers
Copy link
Member

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api area/documentation area/nodeup area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants