
Gatekeeper pods failing to come up on openshift cluster #790

Closed
goyalv opened this issue Aug 18, 2020 · 14 comments · Fixed by #842
Labels
bug Something isn't working production issue

Comments

@goyalv
Contributor

goyalv commented Aug 18, 2020

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

Deployed the gatekeeper release using the prebuilt image with the following command:

kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml

Under the gatekeeper-system namespace, the gatekeeper pods did not get deployed. Describing the gatekeeper-controller-manager deployment showed the following:

kubectl get deployments -n gatekeeper-system gatekeeper-controller-manager -o yaml

status:
  conditions:
  - lastTransitionTime: "2020-08-17T22:52:08Z"
    lastUpdateTime: "2020-08-17T22:52:08Z"
    message: Created new replica set "gatekeeper-controller-manager-549cc48799"
    reason: NewReplicaSetCreated
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-08-17T22:52:08Z"
    lastUpdateTime: "2020-08-17T22:52:08Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-08-17T22:52:08Z"
    lastUpdateTime: "2020-08-17T22:52:08Z"
    message: 'pods "gatekeeper-controller-manager-549cc48799-" is forbidden: unable
      to validate against any security context constraint: [spec.containers[0].securityContext.securityContext.runAsUser:
      Invalid value: 1000: must be in the ranges: [1000220000, 1000229999] pod.metadata.annotations.container.seccomp.security.alpha.kubernetes.io/manager:
      Forbidden: seccomp may not be set]'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure
  observedGeneration: 1
  unavailableReplicas: 3

What did you expect to happen:

Expected the pods to come up and start running

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Workaround used:
Edited the clusterrole "gatekeeper-manager-role" to add a rule:

- apiGroups:
  - security.openshift.io
  resourceNames:
  - nonroot
  resources:
  - securitycontextconstraints
  verbs:
  - use

Edited the scc "nonroot" to add:

seccompProfiles:
- runtime/default

With the above workaround the gatekeeper pods came up and started running.
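
For reference, one way to confirm which scc the pods were admitted under after the workaround is to check the annotation OpenShift stamps on each pod (a sketch; openshift.io/scc is the standard annotation set by the SCC admission plugin):

    kubectl get pods -n gatekeeper-system -o yaml | grep 'openshift.io/scc'
    # expected to show, per the workaround above: openshift.io/scc: nonroot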

Environment:
Openshift cluster

  • Gatekeeper version: openpolicyagent/gatekeeper:v3.1.0-rc.1
  • Kubernetes version: (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T18:55:03Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2020-07-21T11:45:09Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
@maxsmythe
Contributor

maxsmythe commented Aug 18, 2020

This looks like the user needs to apply a security context policy (OpenShift's PSP-like system) or change the default port in the pod.

It looks like OpenShift allows ports in range [1000220000, 1000229999] by default, with other ports requiring specific security context policies.
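
For context, the allowed UID range comes from annotations OpenShift sets on the namespace, which the restricted scc consults (a sketch; openshift.io/sa.scc.uid-range is the standard annotation, actual values vary per cluster):

    kubectl get namespace gatekeeper-system -o yaml | grep sa.scc
    # e.g. openshift.io/sa.scc.uid-range: 1000220000/10000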

@maxsmythe
Contributor

You can get the Gatekeeper version by looking for the image tag used in the deployment.

@goyalv
Contributor Author

goyalv commented Aug 18, 2020

> This looks like the user needs to apply a seccomp policy (OpenShift's PSP-like system) or change the default port in the pod.
>
> It looks like OpenShift allows ports in range [1000220000, 1000229999] by default, with other ports requiring specific seccomp policies.

This range is for the user ID in RunAsUser, not for ports.
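
For reference, the rejected value comes from the manager container's securityContext in the deployment; roughly (a sketch reconstructed from the error above, other fields in the manifest may differ):

    securityContext:
      runAsNonRoot: true
      runAsUser: 1000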

@maxsmythe
Contributor

Oops, thanks! The solutions should be the same?

@goyalv
Contributor Author

goyalv commented Aug 18, 2020

> Oops, thanks! The solutions should be the same?

OpenShift comes with default sccs, but none of them have seccomp set, and only cluster-admin is allowed to make changes to these default sccs.
As a solution, maybe in gatekeeper we can create a new custom scc that has seccomp set and runAsUser either nonroot or a defined range, and then change the gatekeeper clusterrole to add a new rule to use this custom scc.

I was playing with this and, since I am cluster-admin, I modified the nonroot scc to set seccomp and changed the gatekeeper clusterrole to use this scc. This workaround worked and the gatekeeper pods came up; however, after that I encountered #791.
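
A rough sketch of what the custom scc mentioned above could look like (the name gatekeeper-scc and most field values below are hypothetical, not taken from the Gatekeeper manifests):

    apiVersion: security.openshift.io/v1
    kind: SecurityContextConstraints
    metadata:
      name: gatekeeper-scc              # hypothetical name
    allowPrivilegedContainer: false
    allowPrivilegeEscalation: false
    runAsUser:
      type: MustRunAsNonRoot            # allows any non-root UID, e.g. 1000
    seLinuxContext:
      type: MustRunAs
    fsGroup:
      type: RunAsAny
    supplementalGroups:
      type: RunAsAny
    seccompProfiles:
    - runtime/default                   # permits the seccomp annotation on the deployment
    volumes:
    - configMap
    - secret
    - emptyDir

The gatekeeper-manager-role clusterrole would then need a "use" rule for this scc, analogous to the workaround above.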

@maxsmythe
Contributor

My only issue with including a new cloud-specific resource is how it may affect users who don't have that resource? The impact for the YAML manifest would be an apply error that should be benign but may confuse users. I'm unsure what Helm would do, though there it could be gated by a flag.

What do other projects do? Is it standard practice to supply seccomp policies, or is it up to the users to be compliant with the idiosyncrasies of the cloud they are running on?

@goyalv
Contributor Author

goyalv commented Aug 18, 2020

> My only issue with including a new cloud-specific resource is how it may affect users who don't have that resource? The impact for the YAML manifest would be an apply error that should be benign but may confuse users. I'm unsure what Helm would do, though there it could be gated by a flag.
>
> What do other projects do? Is it standard practice to supply seccomp policies, or is it up to the users to be compliant with the idiosyncrasies of the cloud they are running on?

This seccomp issue is happening because gatekeeper has an annotation added to the deployment under spec.template:

annotations:
  container.seccomp.security.alpha.kubernetes.io/manager: runtime/default

If I remove this annotation then I do not fall into this issue.
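
For reference, one way to drop the annotation on an already-applied deployment is a JSON patch (a sketch; the ~1 escapes the / in the annotation key per RFC 6901):

    kubectl patch deployment gatekeeper-controller-manager -n gatekeeper-system --type=json \
      -p='[{"op": "remove", "path": "/spec/template/metadata/annotations/container.seccomp.security.alpha.kubernetes.io~1manager"}]'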

@maxsmythe
Contributor

Interesting. It looks like the seccomp annotation was added as a result of the security audit: #518

Removing the seccomp annotation removes the RunAsUser issue?

@goyalv
Contributor Author

goyalv commented Aug 18, 2020

No, the RunAsUser issue still exists, but as a user, even if I am not cluster-admin, I can change the gatekeeper clusterrole to use the nonroot scc,
and the same edited clusterrole works on all types of clusters (Azure, GCP, AWS, and kind) even though they do not have the concept of an scc.

@maxsmythe
Contributor

Gah, I was getting confused between seccomp and scc.

To use the default nonroot scc, you need to drop the seccomp annotation?

@goyalv
Contributor Author

goyalv commented Aug 18, 2020

So, to be more clear, there are two issues currently:

  1. The gatekeeper pod runs as user ID 1000; this can be resolved by modifying the gatekeeper clusterrole to use the nonroot scc.
  2. The gatekeeper deployment has the seccomp annotation, which is forbidden; this can be resolved by dropping the annotation or by modifying the scc that gatekeeper is using to have a seccomp profile set.

@ritazh
Member

ritazh commented Aug 19, 2020

Given that this is very specific to OpenShift's environment and cluster configuration, I think it would be hard for us to test this as part of our e2e test infra. Perhaps this can be provided as guidance in the README? WDYT? We already have a precedent for it here.

@fedepaol
Contributor

I am not an OpenShift security expert, but:

  • I would use the anyuid scc, which per its description "provides all features of the restricted SCC but allows users to run with any UID and any GID." Restricted is (obviously) the most restrictive scc.

More references about OpenShift sccs can be found here.

As for the seccomp annotation, that would not be allowed by the restricted scc. I'd avoid modifying the current sccs (and creating a new one seems like overkill) and just suggest removing the annotation, given the low-privilege scc the deployments would run with.
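
If going the anyuid route, the grant could also be done directly against the Gatekeeper service account rather than by editing the clusterrole (a sketch, assuming the service account created by the manifest is named gatekeeper-admin; requires cluster-admin):

    oc adm policy add-scc-to-user anyuid -z gatekeeper-admin -n gatekeeper-system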

@maxsmythe
Contributor

+1, and making the changes to the README is reasonable, as it avoids impacting our response to the audit.
