Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add e2e regression tests for the kubelet being secure #64140

Merged
merged 2 commits into from
Jun 21, 2018

Conversation

dixudx
Copy link
Member

@dixudx dixudx commented May 22, 2018

What this PR does / why we need it:
This PR does,

  1. The kubelet cAdvisor port (4194) can't be reached, neither via the API server proxy nor directly on the public IP address
  2. The kubelet read-only port (10255) can't be reached, neither via the API server proxy nor directly on the public IP address
  3. The kubelet can delegate ServiceAccount tokens to the API server
  4. The kubelet's main port (10250) has both authentication (should fail with no credentials) and authorization (should fail with insufficient permissions) set-up

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes kubernetes/kubeadm#838

Special notes for your reviewer:
/cc luxas tallclair
Release note:

Add e2e regression tests for the kubelet being secure

@k8s-ci-robot k8s-ci-robot requested a review from luxas May 22, 2018 09:05
@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label May 22, 2018
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 22, 2018
@dixudx
Copy link
Member Author

dixudx commented May 22, 2018

/cc @kubernetes/sig-node-proposals @kubernetes/sig-auth-proposals

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/design Categorizes issue or PR as related to design. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels May 22, 2018
Copy link
Member

@luxas luxas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great overall.
Thanks for this valuable contribution @dixudx 👍!
I left a couple of initial comments.

I'd love a review from @kubernetes/sig-auth-pr-reviews, and a comment from @kubernetes/sig-architecture-pr-reviews regarding eventually graduating It("Should make the kubelet's main port 10250 enforce authentication for client requests") a Conformance test as @tallclair commented in kubernetes/kubeadm#838 (comment)

test/e2e/auth/node_authn.go Outdated Show resolved Hide resolved
test/e2e/auth/node_authn.go Outdated Show resolved Hide resolved
Expect(err).NotTo(HaveOccurred())
Expect(len(nodeList.Items)).NotTo(Equal(0))
nodeName = nodeList.Items[0].Name
sa, err := f.ClientSet.CoreV1().ServiceAccounts(ns).Get("default", metav1.GetOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment why you do this. (I think it's to make sure the ServiceAccount admission controller is enabled etc., so secret generation on SA creation works, right?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Add a comment is better.

test/e2e/auth/node_authn.go Outdated Show resolved Hide resolved
@@ -179,4 +181,18 @@ var _ = SIGDescribe("[Feature:NodeAuthorizer]", func() {
err := c.CoreV1().Nodes().Delete("foo", &metav1.DeleteOptions{})
Expect(apierrors.IsForbidden(err)).Should(Equal(true))
})

framework.ConformanceIt("The kubelet's main port 10250 should fail with the Forbidden error", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add conformance here please. You can add a comment though to maybe do it in the future though.
cc @tallclair

test/e2e/lifecycle/kubelet_security.go Outdated Show resolved Hide resolved

prefix := "/api/v1"
// make sure kubelet readonly (10255) and cadvisor (4194) ports are disabled via API server proxy
It("should not be able to contact the readonly port 10255", func() { portProxyDisabledTest(f, prefix+"/nodes/", ":10255/pods/") })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: should not be able to proxy to the readonly kubelet port 10255 using proxy subresource to be consistent with what you have below

test/e2e/lifecycle/kubelet_security.go Outdated Show resolved Hide resolved
test/e2e/lifecycle/kubelet_security.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. label May 22, 2018
Copy link
Member

@luxas luxas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @dixudx!!
I'm fine for the moment, the high level functionality is there, deferring the rest of the review to SIG Auth reviewers.
I'm gonna try running these locally in a minute to make sure everything actually works 😄

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 23, 2018
@liggitt
Copy link
Member

liggitt commented May 23, 2018

while I like the idea of exercising the described tests, these e2e tests don't actually do that. the only reason CI is green with them added is because they're behind feature tags that don't actually run them (or the API server happens to return the same response we were hoping to see from the kubelet)

@tallclair
Copy link
Member

I recommend proxying the tests through shell commands run in a container and pointed at the kubelets cluster-internal IP address. See the apparmor test for an example of how you might do this:
https://github.com/kubernetes/kubernetes/blob/master/test/e2e/common/apparmor.go#L60

k8s-github-robot pushed a commit that referenced this pull request May 24, 2018
Automatic merge from submit-queue (batch tested with PRs 64174, 64187, 64216, 63265, 64223). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubeadm: Improve the kubelet default configuration security-wise

**What this PR does / why we need it**:
 - Disables the readonly port for the kubelets in the cluster
 - Enables delegated SA token authentication for the secure kubelet port (GCE also did this ref: #58178)
 - Follows up #63912 to move the last flag from the system dropin to the ComponentConfig

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#732
Fixes kubernetes/kubeadm#650
Replaces #57997

**Special notes for your reviewer**:
In order to make sure this actually works, or that clusters actually are secure, we're adding e2e tests for this: kubernetes/kubeadm#838 & #64140
Depends on #63912

**Release note**:

```release-note
[action required] kubeadm: kubelets in kubeadm clusters now disable the readonly port (10255). If you're relying on unauthenticated access to the readonly port, please switch to using the secure port (10250). Instead, you can now use ServiceAccount tokens when talking to the secure port, which will make it easier to get access to e.g. the `/metrics` endpoint of the kubelet securely.
```
@kubernetes/sig-cluster-lifecycle-pr-reviews 
@kubernetes/sig-auth-pr-reviews FYI
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2018
@tallclair tallclair removed this from the v1.11 milestone Jun 1, 2018
@tallclair
Copy link
Member

Yeah, this isn't a conformance test, so I don't think it needs to be tied to a version. We can merge it when it's ready.

@dixudx
Copy link
Member Author

dixudx commented Jun 2, 2018

@jberkus @tallclair Addressed the comments. PTAL. Thanks.

Expect(len(nodeList.Items)).NotTo(BeZero())

pickedNode := nodeList.Items[0]
// Internal IPs are meaningless from the e2e test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment only applies to the test runner. Since you're execing both the tests through a pod, you can test the internal IPs here.

Expect(len(sa.Secrets)).NotTo(BeZero())
})

It("The kubelet's main port 10250 should fail with no credentials", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/fail/reject requests/

"--insecure",
fmt.Sprintf("https://%s:%v/metrics", nodeIP, ports.KubeletPort),
},
"401",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to check for 401 OR 403, that way you're not depending on a specific implementation as much.

"-s",
"-I",
"--insecure",
fmt.Sprintf(`--header "Authorization: Bearer %s"`, "`cat /var/run/secrets/kubernetes.io/serviceaccount/token`"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curl doesn't know how to interpret backticks - you need to run this command in a shell. I.e.:

[]string{"sh", "-c", "curl -sI --insecure --header "Authorization: Bearer `cat /var/run/secrets/kubernetes.io/serviceaccount/token`" + url}

fmt.Sprintf(`--header "Authorization: Bearer %s"`, "`cat /var/run/secrets/kubernetes.io/serviceaccount/token`"),
fmt.Sprintf("https://%s:%v/metrics", nodeIP, ports.KubeletPort),
},
"403",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or 401

Expect(statusCode).NotTo(Equal(http.StatusOK))
})
It("should not be able to proxy to cadvisor port 4194 using proxy subresource", func() {
result, err := framework.NodeProxyRequest(f.ClientSet, nodeName, "proxy/containers/", 4194)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the path should include proxy here?

@tallclair
Copy link
Member

There seems to be some confusion about what internal IPs and external IPs are.

External IPs are available to the entire internet.
Internal IPs are from a private address space, meaning they are only meaningful within the private network. In our case, that means running in the cluster.

The E2E test runner runs outside the cluster being tested, which means that it can only see the external IP addresses (actually, it can see the internal IP addresses, but they are resolved within the test runner's private network, so they probably resolve to different devices, if at all). When you run a pod, it runs in the cluster, meaning the pod can see the internal addresses, this is why an tests that look at the internal IPs should be run through a pod.

Hopefully that clears it up?

pod := createNodeAuthTestPod(f)
for _, nodeIP := range nodeIPs {
// Anonymous authentication is disabled by default
result, err := framework.LookForStringInPodExec(pod.Namespace,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason to use this method if you're not looking for the string. How about just:

result := framework.RunHostCmdOrDie(ns, pod.Name, fmt.Sprintf("curl -sIk -o /dev/null -w '%{http_code}' https://%s:%v/metrics", nodeIP, ports.KubeletPort))
Expect(result).To(Or(Equal("401"), Equal("403")), "the kubelet's main port 10250 should reject requests with no credentials")

pod := createNodeAuthTestPod(f)

for _, nodeIP := range nodeIPs {
result, err := framework.LookForStringInPodExec(pod.Namespace,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above comment.

time.Minute)
Expect(err).NotTo(HaveOccurred())

if !strings.Contains(result, "403") && !strings.Contains(result, "401") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expect(result).To(And(BeNumerically(">=", 200), BeNumerically("<", 300)), ...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly care for the gomega DSL, so feel free to do these checks explicitly, but semantically this is the condition you should check for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to simplify this to just: Expect(result).To(Equal("200"), ...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we're using a new SA token when sending requests to the kubelet. The expected result should be 403, and authentication is okay.

Expect(result).To(Equal("200"), ...)

@tallclair I don't quite understand where 200 comes.


pickedNode := nodeList.Items[0]
// Here we only need to care about Internal IP, since the pods running in the cluster
// can see the internal addresses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still doesn't make sense to me. Why did you drop the external IPs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we add external IPs back again? I was testing both internal IPs and external IPs before.

BeforeEach(func() {
nodes := framework.GetReadySchedulableNodesOrDie(f.ClientSet)
Expect(len(nodes.Items)).NotTo(BeZero())
node = &nodes.Items[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 0th node is usually the master, which might be configured differently. Still a good idea to check it though, but maybe pick a few (or all of them)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually all the nodes here are schedulable and ready. Usually the master node is tainted, which will be filtered due to unschedulable.

So the 0th node is usually not the master. And 0th node is used widely in our e2e test scenarios.

// Pick a node where all pods will run.
nodes := framework.GetReadySchedulableNodesOrDie(f.ClientSet)
Expect(len(nodes.Items)).NotTo(BeZero(), "No available nodes for scheduling")
node := &nodes.Items[0]

@dixudx dixudx force-pushed the add_e2e_kubelet_port branch 2 times, most recently from 27c98bf to f71ad8c Compare June 6, 2018 03:02
@luxas luxas removed the kind/design Categorizes issue or PR as related to design. label Jun 6, 2018
@luxas luxas added this to the v1.11 milestone Jun 6, 2018
@liggitt
Copy link
Member

liggitt commented Jun 6, 2018

per #64140 (comment) this isn't tied to v1.11 or blocking the release

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.11 milestone Jun 6, 2018
@dixudx
Copy link
Member Author

dixudx commented Jun 20, 2018

@luxas @tallclair Needs your lgtm. Thanks.

@tallclair
Copy link
Member

/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dixudx, luxas, tallclair, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dixudx
Copy link
Member Author

dixudx commented Jun 21, 2018

@tallclair @luxas Seems the bot does not want to lgtm.

@luxas luxas added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 21, 2018
@k8s-github-robot
Copy link

/test all

Tests are more than 96 hours old. Re-running tests.

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jun 21, 2018

@dixudx: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-bazel-test 924df8a link /test pull-kubernetes-bazel-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 64140, 64898, 65022, 65037, 65027). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 0b3af19 into kubernetes:master Jun 21, 2018
@dixudx dixudx deleted the add_e2e_kubelet_port branch June 21, 2018 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add e2e regression tests for the kubelet being secure
8 participants