Changes needed in the gloo-ee / gloo helm charts for 1.25 compatibility with a namespace using restricted Pod Security Standards (PSS) #8864

Closed
1 task done
ably77 opened this issue Nov 6, 2023 · 11 comments
Labels: Area: Helm; Committed: 1.18; Prioritized (Indicating issue prioritized to be worked on in RFE stream); release/1.16; Size: S (1 - 3 days); Type: Bug (Something isn't working)

Comments

@ably77

ably77 commented Nov 6, 2023

Gloo Edge Product

Open Source

Gloo Edge Version

latest

Kubernetes Version

1.25

Describe the bug

Summary:
Issues when deploying Gloo Edge on Kubernetes 1.25 with the restricted Pod Security Standard (PSS) profile:

  1. gloo-ee/charts/gloo/templates/19-gloo-mtls-certgen-job.yaml container: certgen does not allow setting a complete podSecurityContext (PSC).
  2. gloo-ee/charts/gloo/templates/3-discovery-deployment.yaml containers have a hardcoded PSC, missing seccompProfile / ability to override it.
  3. gloo-ee/charts/gloo/templates/5-resource-cleanup-job.yaml container kubectl has a hardcoded PSC with no seccompProfile and no dropped capabilities.
  4. gloo-ee/charts/gloo/templates/5-resource-migration-job.yaml same as number 3.
  5. gloo-ee/charts/gloo/templates/5-resource-rollout-job.yaml same as 3 and 4.
  6. gloo-ee/charts/gloo/templates/6.5-gateway-certgen-job.yaml same as 3 and 4.
  7. gloo-ee/templates/70-resource-rollout-job.yaml same as 3 / 4.
  8. gloo-ee/templates/_helpers.tpl gloo.extauthinitcontainers template does not allow setting a PSC.
  9. Several helm-hooks do not set resource request/limits.

IMHO, most of these changes concern "single-shot" pods and amount to adding a default PSC that matches a restricted namespace; the only exception is the _helpers.tpl template helper (item 8).

Expected Behavior

Gloo Edge OSS and Gloo Edge Enterprise should be deployable on Kubernetes 1.25 under the standards set forth by the restricted PSS profile.

Steps to reproduce the bug

Deploy the latest Gloo Edge on Kubernetes 1.25 in a cluster set up with the restricted PSS profile.
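
For anyone trying to reproduce this, one way to get a namespace that enforces the restricted profile is to label it for the built-in Pod Security Admission controller. This is a minimal sketch: the namespace name `gloo-system` is an assumption (use whatever namespace the chart is installed into), and pinning the version to `v1.25` is optional.

```yaml
# Sketch of a namespace that enforces the restricted Pod Security Standard.
# The namespace name is an assumption, not taken from this issue.
apiVersion: v1
kind: Namespace
metadata:
  name: gloo-system
  labels:
    # Reject pods that violate the restricted profile.
    pod-security.kubernetes.io/enforce: restricted
    # Pin the policy version to the cluster version under test.
    pod-security.kubernetes.io/enforce-version: v1.25
    # Optionally surface violations as warnings and audit annotations too.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Installing the chart into a namespace labeled this way should surface pods rejected for missing seccompProfile, capability drops, and similar settings.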

Additional Environment Detail

No response

Additional Context

Additional Context:
link to PSS doc

Related Issues

┆Issue is synchronized with this Asana task by Unito

@ably77 ably77 added the Type: Bug Something isn't working label Nov 6, 2023
@htpvu htpvu added the Prioritized Indicating issue prioritized to be worked on in RFE stream label Mar 6, 2024
@nfuden nfuden added the Size: S 1 - 3 days label Apr 29, 2024
@sheidkamp
Contributor

Note: gloo-ee/templates/70-resource-rollout-job.yaml was removed in https://github.com/solo-io/solo-projects/pull/5491/files

@sheidkamp
Contributor

@ably77 - question on "9 - Several helm-hooks do not set resource request/limits":

I don't see anything about resource/request limits in the Pod Security Standards. Is this specifically needed for meeting PSS/deploying with a restricted profile, or is this more generally part of requested helm updates?

@sheidkamp
Contributor

OSS changes have entered PR.

In addition to adding support for configuring the individual container securityContexts, I have added a flag, global.podSecurityStandards.container.enableRestrictedContainerDefaults. When enabled, it defaults every container securityContext to the following, which applies the minimal changes needed to meet the restricted Pod Security Standards:

securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop:
    - ALL

Template-specific defaults will be applied to this context.
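
As a sketch of how the flag would be turned on, a Helm values override would presumably nest the keys exactly as the dotted path above suggests. The file name is an assumption, and for the enterprise chart the same block may need to sit under a subchart key.

```yaml
# values-restricted.yaml (hypothetical file name)
# Nesting derived from the flag path given in the comment above.
global:
  podSecurityStandards:
    container:
      enableRestrictedContainerDefaults: true
```

This file would then be passed to `helm install` or `helm upgrade` with `-f values-restricted.yaml`.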

@ably77
Author

ably77 commented May 16, 2024

> @ably77 - question on "9 - Several helm-hooks do not set resource request/limits":
>
> I don't see anything about resource/request limits in the Pod Security Standards. Is this specifically needed for meeting PSS/deploying with a restricted profile, or is this more generally part of requested helm updates?

Hey @sheidkamp, sorry I missed this. I don't think it's a hard requirement that is strictly enforced, but having resources be configurable is generally a recommended best practice for most organizations, so this is more "generally part of requested helm updates".

Generally I think we'll see a tool like OPA or Kyverno, or another admission controller, that will block a Pod without defined resources from being deployed.
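
To illustrate the kind of policy described here, below is a sketch modelled on Kyverno's commonly published "require requests and limits" sample. It is not taken from this issue. Field names can differ between Kyverno versions (newer releases move the failure action under the rule's `validate` block), so treat it as an assumption to check against the installed version.

```yaml
# Sketch of a Kyverno policy that rejects Pods whose containers
# do not declare CPU/memory requests and a memory limit.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Enforce   # block instead of only auditing
  background: true
  rules:
    - name: validate-resources
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests and a memory limit are required."
        pattern:
          spec:
            containers:
              # "?*" matches any non-empty value.
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    memory: "?*"
```

With a policy like this in place, a helm-hook Job whose pod template has no `resources` block would be rejected at admission, which is why making those values configurable matters.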

@anessi

anessi commented May 17, 2024

@sheidkamp: great that this got fixed! Does this also cover extauth (it is not visible in the PR)? See #8455 (comment)

@sheidkamp
Contributor

@ably77 - extauth will be covered in the EE PR that relies on the OSS PR.

For resource limits, is that needed at the container level, i.e. basically the same scope as the security contexts?

@nfuden
Contributor

nfuden commented May 17, 2024

Resource limits also seem dangerous to enforce, given that most of these commands are highly dependent on a customer's environment. @ably77 can you move that part to a separate RFE, as it's not cut and dried and is potentially a dangerous update?

@ably77
Author

ably77 commented May 17, 2024

I don't think we need to strictly set requests/limits by default, but we should allow them to be configured by a user who wants to.

@nfuden
Contributor

nfuden commented May 17, 2024

We will consider this. Although everything can already technically be overridden with kustomize, we can check whether there is a cleaner update.
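
For reference, the kustomize route mentioned here could look roughly like the sketch below. It assumes the chart has been rendered to a file (for example with `helm template`) and patches one Job. The file name `rendered.yaml`, the Job name `example-hook-job`, and the resource values are all hypothetical placeholders.

```yaml
# kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - rendered.yaml              # assumption: output of `helm template`
patches:
  - target:
      kind: Job
      name: example-hook-job   # hypothetical Job name
    patch: |-
      # JSON 6902 patch: set resources on the first container of the Job.
      - op: add
        path: /spec/template/spec/containers/0/resources
        value:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            memory: 128Mi
```

This works, but every hook Job has to be targeted by name, which is the kind of friction a chart-level value would avoid.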

@sheidkamp
Contributor

sheidkamp commented May 17, 2024

@ably77 - looking for some additional clarifications, I see we set the resources in the 5-/6.5-/19- jobs (for example with gateway.cleanupJob.resources).

Can you give examples (or a full list) of the hooks that need this configuration?
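
For context, a values override for the existing setting named above might look like this sketch. Only the key path `gateway.cleanupJob.resources` comes from this thread. The body is assumed to follow the standard Kubernetes container `resources` shape, and the numbers are placeholders rather than recommendations.

```yaml
# Sketch: setting resources for the cleanup job via Helm values.
gateway:
  cleanupJob:
    resources:
      requests:
        cpu: 100m        # placeholder value
        memory: 64Mi     # placeholder value
      limits:
        memory: 128Mi    # placeholder value
```

A list of hooks that lack an equivalent key would make the follow-up RFE concrete.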

@sheidkamp
Contributor

The container security changes have been merged into EE/solo-projects main (will be part of the 1.17.0-beta3 release) and the 1.16.x branch (will be part of the 1.16.10 release).

As requested in #8864 (comment), please open another RFE for the resource limits, ideally with clarifications requested in #8864 (comment)
