Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restricted Pod E2E tests #109946

Merged
merged 2 commits into from May 25, 2022
Merged

Restricted Pod E2E tests #109946

merged 2 commits into from May 25, 2022

Conversation

tallclair
Copy link
Member

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

  1. Add a new MixinRestrictedPodSecurity and corresponding MustMixinRestrictedPodSecurity utilities which set the required fields to pass the restricted pod security checks, and verify that the given pod does in fact meet the requirements. These are complementary to the existing GetRestricted{Pod,Container}SecurityContext functions, but in most cases require fewer changes.
  2. Update the test/e2e/common/node/pods.go tests to set the restricted namespace label and use the new utilities.

Does this PR introduce a user-facing change?

NONE

/area test
/assign @liggitt @s-urbaniak

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label May 10, 2022
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 10, 2022
@tallclair tallclair changed the title Ps e2e Restricted Pod E2E tests May 10, 2022
@tallclair tallclair marked this pull request as draft May 10, 2022 18:18
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 10, 2022
@tallclair tallclair changed the title Restricted Pod E2E tests WIP: Restricted Pod E2E tests May 10, 2022
@tallclair
Copy link
Member Author

Sorry, looks like I have a little more work to do on this.

@tallclair tallclair marked this pull request as ready for review May 10, 2022 21:45
@tallclair tallclair force-pushed the ps-e2e branch 3 times, most recently from 2b0b97f to 02c1c0d Compare May 12, 2022 17:10
@tallclair tallclair changed the title WIP: Restricted Pod E2E tests Restricted Pod E2E tests May 12, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 12, 2022
@@ -235,12 +235,12 @@ var _ = SIGDescribe("Pods", func() {
Spec: v1.PodSpec{
Containers: []v1.Container{
{
Name: "nginx",
Image: imageutils.GetE2EImage(imageutils.Nginx),
Name: "pause",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does changing from nginx to pause change the lifecycle characteristics of the pod (w.r.t. handling shutdown signals)? same question applies in multiple places

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Both the pause container and nginx handle SIGINT and SIGTERM as a fast shutdown, which I think is what containerd uses. nginx additionally handles SIGQUIT, but I don't see any references to SIGQUIT in k/k, containerd, or cri-o.

Maybe @SergeyKanzhelev or @endocrimes have additional context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't cover any corner cases of shutdown so this seems ok to me 👍

// GetRestrictedPodSecurityContext returns a minimal restricted pod security context.
func GetRestrictedPodSecurityContext() *v1.PodSecurityContext {
return &v1.PodSecurityContext{
RunAsNonRoot: pointer.BoolPtr(true),
RunAsUser: pointer.Int64(nonRootUser),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unsure about this... this isn't really a minimal restricted pod security context, right? shouldn't runAsNonRoot: true be sufficient?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of my goal with these methods was that most e2e test authors can just apply MixinRestrictedPodSecurity and not think about the restricted profile at all. If we don't set RunAsUser here (and in the mixin) then for all cases other than the pause pod the test author will need to manually set a RunAsUser. They'll probably pick something random that gets cargo-culted around.

What's your argument against this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't set RunAsUser here (and in the mixin) then for all cases other than the pause pod the test author will need to manually set a RunAsUser

Would they have to set it, or do you mean because the container image would have to be built to run with a non-root uid, which is uncommon?

What's your argument against this?

It would break images that expect to run with a particular non-zero uid at runtime, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched through our test image manifests, and the only ones that set a user are:

  • pause: user doesn't matter
  • nonroot: all the tests using this override the user anyway
  • windows containers: don't support Restricted for now (added TODO)

I'd prefer that the tests which require a specific user are explicit about it, rather than having all the tests that don't have an opinion set something arbitrary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than having all the tests that don't have an opinion set something arbitrary

but… isn't this setting something arbitrary (1000) on all tests that don't have an opinion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is runAsUser generally needed as boilerplate, though? I thought runAsNonRoot was supposed to be sufficient.

If an e2e wanted to exercise runAsNonRoot functionality with an image that did not run as root by default and didn't want to set a runAsUser, that would mean they couldn't use GetRestrictedPodSecurityContext or MustMixinRestrictedPodSecurity to do so, because those wouldn't actually set the minimal securityContext that is compatible with the restricted PSS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. IMO, if a test is explicitly testing the runAsNonRoot functionality (with or without a runAsUser set), then the test should be explicitly writing & managing the SecurityContext, and not deferring that to a helper.
  2. There should probably be 1-2 E2E tests explicitly checking runAsNonRoot behavior. I'd prefer to optimize these convenience methods for the common path, and force those couple of tests to be write the pod manually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fair enough. Make sure the docs don't claim to produce a minimal valid restricted securityContext and doc the uid chosen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reasoning makes sense to me 👍 - A comment summarizing this would go a long way for future readers of the framework code here tho.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comments.

test/e2e/framework/pod/utils.go Outdated Show resolved Hide resolved
test/e2e/framework/pod/utils.go Show resolved Hide resolved
@endocrimes
Copy link
Member

/assign

Copy link
Member

@endocrimes endocrimes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

looks good from a node pov, just needs the comment/doc updates

@tallclair
Copy link
Member Author

Added comments about the default non-root user. Also squashed commits & rebased.

@tallclair
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2022
@liggitt
Copy link
Member

liggitt commented May 25, 2022

/test pull-kubernetes-e2e-capz-conformance

func GetRestrictedPodSecurityContext() *v1.PodSecurityContext {
return &v1.PodSecurityContext{
RunAsNonRoot: pointer.BoolPtr(true),
RunAsUser: pointer.Int64(DefaultNonRootUser),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do windows nodes do if runAsUser is non-nil? do they ignore it or fail?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ignored by kubelet:

if effectiveSc.RunAsUser != nil {
klog.InfoS("Windows container does not support SecurityContext.RunAsUser, please use SecurityContext.WindowsOptions",
"pod", klog.KObj(pod), "containerName", container.Name)
}

With PodOS feature and OS field is set (in beta in 1.24) the adminsion controller would reject it:

// If the OS field is set to windows, following fields must be unset:
// - spec.hostPID
// - spec.hostIPC
// - spec.securityContext.seLinuxOptions
// - spec.securityContext.seccompProfile
// - spec.securityContext.fsGroup
// - spec.securityContext.fsGroupChangePolicy
// - spec.securityContext.sysctls
// - spec.shareProcessNamespace
// - spec.securityContext.runAsUser
// - spec.securityContext.runAsGroup
// - spec.securityContext.supplementalGroups
// - spec.containers[*].securityContext.seLinuxOptions
// - spec.containers[*].securityContext.seccompProfile
// - spec.containers[*].securityContext.capabilities
// - spec.containers[*].securityContext.readOnlyRootFilesystem
// - spec.containers[*].securityContext.privileged
// - spec.containers[*].securityContext.allowPrivilegeEscalation
// - spec.containers[*].securityContext.procMount
// - spec.containers[*].securityContext.runAsUser
// - spec.containers[*].securityContext.runAsGroup
// +optional

Copy link
Member

@liggitt liggitt May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, there's a TODO to avoid setting RunAsUser in MustMixinRestrictedPodSecurity for explicitly windows pods as part of the PodOS work, I just wanted to make sure this wasn't going to fail in a windows kubelet for pods that didn't explicitly indicate their OS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this did break the tests because of #102849

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-capz-master-containerd-windows/1529769915945324544

May 26 10:47:49.396: INFO: At 2022-05-26 10:44:57 +0000 UTC - event for pod-update-d046cc17-42c7-493c-8c0c-fea0e9ac3164: {kubelet capz-conf-qwm24} FailedMount: (combined from similar events): MountVolume.SetUp failed for volume "kube-api-access-9l7hz" : chown c:\var\lib\kubelet\pods\0a6b5b55-fc8f-42b4-b285-19aedc311882\volumes\kubernetes.io~projected\kube-api-access-9l7hz\..2022_05_26_10_46_59.624505313\token: not supported by windows

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't setting RunAsUserName, but does the windows kubelet have similar problems with RunAsUser?

if so, that seems like a good reason to drop RunAsUser here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the title is slightly misleading but it states in the issue: In addition if a Windows Pod is created with a Projected Volume and RunAsUser set, the Pod will be stuck at ContainerCreating:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. @tallclair, that's a good reason not to set RunAsUser here for pods that could run on Windows. @jsturtevant, do you mind opening a PR to drop the RunAsUser field?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linking here for anyone following the thread #110235

@endocrimes
Copy link
Member

/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 25, 2022
@endocrimes endocrimes moved this from Triage to PRs Waiting on Author in SIG Node CI/Test Board May 25, 2022
@liggitt
Copy link
Member

liggitt commented May 25, 2022

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 25, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: endocrimes, liggitt, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 25, 2022
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 60481c9 into kubernetes:master May 25, 2022
SIG Node CI/Test Board automation moved this from PRs Waiting on Author to Done May 25, 2022
SIG-Auth: PodSecurity automation moved this from In Review to Done (1.25) May 25, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

None yet

8 participants