
TRT-540: Add privileged label to infra namespaces #3328

Merged: 1 commit merged into openshift:master on Sep 14, 2022

Conversation

xueqzhan (Contributor) commented Sep 8, 2022

This has been affecting the kubelet when it creates mirror pods for keepalived, haproxy, etc.

- What I did
Add privileged labels to the infra namespaces (a rough sketch follows the sample errors below).

- How to verify it
Kubelet logs should no longer contain "Failed creating a mirror pod for ... is forbidden: violates PodSecurity" errors.

- Description for the changelog
Since openshift/cluster-kube-apiserver-operator#1369, many pods are affected and fail to be created. The infra namespaces are affecting the kubelet. The following are some sample errors.

Sep 02 14:02:06.914534 hckrs6pg-c805c-7czxl-master-0.novalocal kubenswrapper[1782]: E0902 14:02:06.914490    1782 kubelet.go:1713] "Failed creating a mirror pod for" err="pods \"keepalived-hckrs6pg-c805c-7czxl-master-0\" is forbidden: violates PodSecurity \"restricted:latest\": host namespaces (hostNetwork=true), privileged (containers \"keepalived\", \"keepalived-monitor\" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (containers \"render-config-keepalived\", \"keepalived\", \"keepalived-monitor\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \"render-config-keepalived\", \"keepalived\", \"keepalived-monitor\" must set securityContext.capabilities.drop=[\"ALL\"]), restricted volume types (volumes \"resource-dir\", \"script-dir\", \"kubeconfig\", \"kubeconfigvarlib\", \"conf-dir\", \"chroot-host\" use restricted volume type \"hostPath\"), runAsNonRoot != true (pod or containers \"render-config-keepalived\", \"keepalived\", \"keepalived-monitor\" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers \"render-config-keepalived\", \"keepalived\", \"keepalived-monitor\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")" pod="openshift-openstack-infra/keepalived-hckrs6pg-c805c-7czxl-master-0"

Sep 02 14:02:37.926925 hckrs6pg-c805c-7czxl-master-0.novalocal kubenswrapper[1782]: E0902 14:02:37.925692    1782 kubelet.go:1713] "Failed creating a mirror pod for" err="pods \"coredns-hckrs6pg-c805c-7czxl-master-0\" is forbidden: violates PodSecurity \"restricted:latest\": host namespaces (hostNetwork=true), privileged (containers \"coredns\", \"coredns-monitor\" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (containers \"render-config-coredns\", \"coredns\", \"coredns-monitor\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \"render-config-coredns\", \"coredns\", \"coredns-monitor\" must set securityContext.capabilities.drop=[\"ALL\"]), restricted volume types (volumes \"resource-dir\", \"kubeconfig\", \"conf-dir\", \"nm-resolv\" use restricted volume type \"hostPath\"), runAsNonRoot != true (pod or containers \"render-config-coredns\", \"coredns\", \"coredns-monitor\" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers \"render-config-coredns\", \"coredns\", \"coredns-monitor\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")" pod="openshift-openstack-infra/coredns-hckrs6pg-c805c-7czxl-master-0"

Sep 02 14:03:04.918144 hckrs6pg-c805c-7czxl-master-0.novalocal kubenswrapper[1782]: E0902 14:03:04.918067    1782 kubelet.go:1713] "Failed creating a mirror pod for" err="pods \"haproxy-hckrs6pg-c805c-7czxl-master-0\" is forbidden: violates PodSecurity \"restricted:latest\": host namespaces (hostNetwork=true), privileged (container \"haproxy-monitor\" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (containers \"verify-api-int-resolvable\", \"haproxy\", \"haproxy-monitor\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \"verify-api-int-resolvable\", \"haproxy\", \"haproxy-monitor\" must set securityContext.capabilities.drop=[\"ALL\"]), restricted volume types (volumes \"resource-dir\", \"kubeconfigvarlib\", \"conf-dir\", \"chroot-host\" use restricted volume type \"hostPath\"), runAsNonRoot != true (pod or containers \"verify-api-int-resolvable\", \"haproxy\", \"haproxy-monitor\" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers \"verify-api-int-resolvable\", \"haproxy\", \"haproxy-monitor\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")" pod="openshift-openstack-infra/haproxy-hckrs6pg-c805c-7czxl-master-0"

You can see the error in this job run: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.12-e2e-openstack-serial/1567141456282390528/artifacts/e2e-openstack-serial/gather-extra/artifacts/nodes/c1fh9587-c805c-lqm8z-master-0/journal
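
As a rough illustration of the change being described, here is a sketch of a privileged infra namespace manifest. It is not this PR's actual diff; the namespace name and exact label set are assumptions based on the standard Kubernetes pod-security admission labels:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-openstack-infra
  labels:
    # Pod Security admission labels: pods in this namespace are evaluated
    # against the "privileged" profile instead of the "restricted" default,
    # so the host-network/privileged static pods are no longer rejected.
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged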

@sinnykumari (Contributor)

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere
/test e2e-vsphere-upgrade

@sinnykumari (Contributor)

/cc @jcpowermac @mandre

mandre (Member) commented Sep 9, 2022

Hmm, the installation failed at bootstrap for all platforms, including aws and gcp, which shouldn't be affected by this patch.
/retest

@jcpowermac (Contributor)

Awesome! Thanks for this... I've been seeing these errors in the kubelet while debugging another issue.

This has been affecting kubelet creating mirror pods for keepalived, haproxy etc.
xueqzhan (Contributor, Author) commented Sep 9, 2022

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere
/test e2e-vsphere-upgrade

xueqzhan (Contributor, Author) commented Sep 9, 2022

It turned out the aws and gcp failures were related to the change. I think YAML was interpreting an unquoted false as a boolean, so unmarshalling panicked when it tried to convert a boolean into a string. The panic can be observed in the cluster-version-operator log on the bootstrap node. Here is an example:

I0908 20:12:08.029692 1 sync_worker.go:982] Running sync for namespace "openshift-openstack-infra" (687 of 830)
E0908 20:12:08.029922 1 runtime.go:79] Observed a panic: &json.UnmarshalTypeError{Value:"bool", Type:(*reflect.rtype)(0x1866440), Offset:544, Struct:"ObjectMeta", Field:"metadata.labels"} (json: cannot unmarshal bool into Go struct field ObjectMeta.metadata.labels of type string)
goroutine 212 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x18f8080?, 0xc003280d20})
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x280?})
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x18f8080, 0xc003280d20})
/usr/lib/golang/src/runtime/panic.go:838 +0x207
github.com/openshift/cluster-version-operator/lib/resourceread.ReadOrDie({0xc001f20500?, 0x246?, 0x280?})
/go/src/github.com/openshift/cluster-version-operator/lib/resourceread/resourceread.go:66 +0x65
github.com/openshift/cluster-version-operator/lib/resourcebuilder.(*builder).Do(0xc002919030, {0x1e5eb30, 0xc000ac9c40})
/go/src/github.com/openshift/cluster-version-operator/lib/resourcebuilder/resourcebuilder.go:77 +0x45
github.com/openshift/cluster-version-operator/pkg/cvo.(*resourceBuilder).Apply(0xc001d17d70, {0x1e5eb30, 0xc000ac9c40}, 0x110?, 0x7f2e99cf5f18?)
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:849 +0x93
github.com/openshift/cluster-version-operator/pkg/payload.(*Task).Run.func1()
/go/src/github.com/openshift/cluster-version-operator/pkg/payload/task.go:112 +0x6b
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x19b6340, 0xc003280c01})
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:220 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x1e5eb68?, 0xc000056098?}, 0xc001839a40?)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:233 +0x57
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0xc001d05a80?)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:226 +0x39
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoffWithContext({0x1e5eb30, 0xc000ac9c40}, {0x3b9aca00, 0x4000000000000000, 0x0, 0x4, 0x37e11d600}, 0x4?)
/go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:738 +0xa5
github.com/openshift/cluster-version-operator/pkg/payload.(*Task).Run(0xc000027a40, {0x1e5eb30?, 0xc000ac9c40}, {0xc00120e0c0, 0x38}, {0x1e44120?, 0xc001d17d70}, 0x2)
/go/src/github.com/openshift/cluster-version-operator/pkg/payload/task.go:111 +0x1f2
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).apply.func1({0x1e5eb30, 0xc000ac9c40}, {0xc00047f560, 0x14, 0x20200a226c617564?})
/go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:993 +0x50e
github.com/openshift/cluster-version-operator/pkg/payload.RunGraph.func3({0x1e5eb30, 0xc000ac9c40}, 0x20202020200a0a5b?)
/go/src/github.com/openshift/cluster-version-operator/pkg/payload/task_graph.go:478 +0xf5
created by github.com/openshift/cluster-version-operator/pkg/payload.RunGraph
/go/src/github.com/openshift/cluster-version-operator/pkg/payload/task_graph.go:468 +0x1f3

That said, I am not sure why the cluster-version-operator on aws and gcp is running a sync for namespace "openshift-openstack-infra".
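
For illustration, here is a minimal sketch of the quoting issue described above. The label name is an assumption chosen for the example and not necessarily the one in this PR; the point is that metadata.labels values must be strings, so a boolean-looking value has to be quoted:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-openstack-infra
  labels:
    # Unquoted, YAML parses the value as a boolean, and decoding the manifest
    # into ObjectMeta (labels are map[string]string) fails with the
    # json.UnmarshalTypeError that ReadOrDie turns into the panic above:
    #   security.openshift.io/scc.podSecurityLabelSync: false
    # Quoted, the value stays a string and the manifest decodes cleanly:
    security.openshift.io/scc.podSecurityLabelSync: "false"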

@jcpowermac (Contributor)

vSphere failures are unrelated to this PR. Probably a disk or CPU performance issue in the CI vSphere environment.
Cluster installed successfully.

@jcpowermac (Contributor)

/lgtm

openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Sep 9, 2022
mandre (Member) commented Sep 9, 2022

OpenStack failure also probably unrelated.
/test e2e-openstack

mandre mentioned this pull request on Sep 9, 2022
@cybertron (Member)

/test e2e-metal-ipi-ovn-ipv6

mandre (Member) left a comment

I confirm that the openstack failure is unrelated.
/lgtm

@sinnykumari (Contributor)

/test e2e-metal-ipi-ovn-ipv6

@yuqi-zhang (Contributor)

Are we expecting any of the other tests to pass?

@xueqzhan (Contributor, Author)

Are we expecting any of the other tests to pass?

Honestly I am not sure. The metal-ipi-ovn-ipv6 job ran at least a couple of times. The first time it failed during installation: the job ended up with three master nodes and no worker nodes. The second run passed installation, but acquiring a lease took close to 2 hours, so the job was terminated after timing out. I am going to rerun those after hours and see what I get tomorrow.

@xueqzhan (Contributor, Author)

/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere
/test e2e-vsphere-upgrade
/test e2e-openstack

@sinnykumari (Contributor)

e2e-metal-ipi-ovn-ipv6 test is passing now. Already noted that vsphere and openstack failures are unrelated.

Thanks Ken for the PR!
/approve

openshift-ci bot commented Sep 13, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcpowermac, mandre, sinnykumari, xueqzhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Sep 13, 2022
@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD bb08790 and 2 for PR HEAD 08f620a in total

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 8276d9c and 1 for PR HEAD 08f620a in total

@xueqzhan (Contributor, Author)

/retest-required

1 similar comment
@xueqzhan (Contributor, Author)

/retest-required

openshift-ci bot commented Sep 14, 2022

@xueqzhan: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-openstack 08f620a link false /test e2e-openstack
ci/prow/e2e-vsphere-upgrade 08f620a link false /test e2e-vsphere-upgrade
ci/prow/e2e-vsphere 08f620a link false /test e2e-vsphere

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@xueqzhan (Contributor, Author)

/retest-required

openshift-merge-robot merged commit a985910 into openshift:master on Sep 14, 2022
stlaz (Contributor) commented Sep 22, 2022

/cherry-pick release-4.11

@openshift-cherrypick-robot

@stlaz: new pull request created: #3346

In response to this:

/cherry-pick release-4.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
