fix: PSS restricted compliance, missing IAM permissions, Gateway API support#4

Merged
levkk merged 3 commits into pgdogdev:main from reductoai:fix/pss-permissions-gateway-api
May 25, 2026

@piotr-reducto
Contributor

Summary

Three fixes discovered while deploying pgdog-control on EKS (K8s 1.35) with Pod Security Standards enforcement:

1. PSS Restricted compliance

The control deployment's pod securityContext only set seccompProfile: RuntimeDefault. Namespaces with pod-security.kubernetes.io/enforce=restricted reject pods that don't set runAsNonRoot: true. The container-level context already drops ALL capabilities and disables privilege escalation, but the missing pod-level field blocks admission.

Fix: Add runAsNonRoot: true, runAsUser: 65534, runAsGroup: 65534 (nobody) to the pod securityContext. The control binary binds port 8080 (unprivileged), reads config from a ConfigMap mount (world-readable), and has no filesystem write requirements — verified working as non-root on EKS.
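
A minimal sketch of the resulting pod template, assuming the chart renders a standard apps/v1 Deployment. The pod-level fields are the ones this fix adds; the container-level fields are the ones the description says were already set. The container name `control` is hypothetical.

```yaml
# Deployment pod template fragment (sketch, not the chart's actual template).
spec:
  template:
    spec:
      securityContext:              # pod-level
        runAsNonRoot: true          # required by PSS "restricted"; added here
        runAsUser: 65534            # nobody; added here
        runAsGroup: 65534           # nobody; added here
        seccompProfile:
          type: RuntimeDefault      # already present before this PR
      containers:
        - name: control             # hypothetical container name
          securityContext:          # container-level, already present
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
```

Before enforcing, `kubectl label --dry-run=server --overwrite ns pgdog pod-security.kubernetes.io/enforce=restricted` (namespace name `pgdog` assumed) prints a warning for each running pod that would violate the profile, without changing the label.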

2. Missing IAM permissions in README

The documented IRSA permissions policy was missing two actions the control plane actually calls:

  • rds:DescribeDBParameters — called after DescribeDBInstances to display parameter group settings. Without it, the RDS refresh loop fails with AccessDenied and the database panel stays empty.
  • ec2:DescribeInstanceTypes — called to resolve instance class specs (vCPU, memory) for each RDS instance. Without it, the refresh fails with UnauthorizedOperation.

Both are read-only. The DescribeDBInstances and DescribeDBClusters calls succeed, but the refresh is all-or-nothing — one denied action causes the entire refresh to report failure.

Found via CloudTrail after the dashboard showed no RDS instances despite successful STS token exchange and correct IRSA wiring.
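
For reference, a sketch of the read-only statement written as a CloudFormation `AWS::IAM::ManagedPolicy` `PolicyDocument` (the YAML form of the same IAM JSON). Only the four actions named in this PR are listed. The README's full policy has 8 actions, and the others are deliberately not guessed here. The `Sid` and the `Resource: "*"` scope are assumptions.

```yaml
# Partial IRSA policy document (sketch): only actions named in this PR.
PolicyDocument:
  Version: "2012-10-17"
  Statement:
    - Sid: PgdogControlRdsRead        # hypothetical Sid
      Effect: Allow
      Action:
        - rds:DescribeDBInstances
        - rds:DescribeDBClusters
        - rds:DescribeDBParameters    # missing from the README before this PR
        - ec2:DescribeInstanceTypes   # missing from the README before this PR
      Resource: "*"                   # assumed scope
```

`ec2:DescribeInstanceTypes` does not support resource-level restrictions, so it needs `"*"` regardless of how the RDS actions are scoped.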

3. Gateway API (HTTPRoute) support

Added ingress.mode: gateway, which renders an HTTPRoute instead of an Ingress. On clusters that route traffic through Gateway API (Traefik, Envoy Gateway, the AWS ALB controller via gateway.k8s.aws), there's no IngressClass to target, and the existing modes (nginx, aws, default) all render Ingress resources.

The HTTPRoute attaches to an existing Gateway via three new values:

  • ingress.gateway.name — Gateway resource name
  • ingress.gateway.namespace — Gateway namespace
  • ingress.gateway.sectionName — optional listener selector

Example:

```yaml
ingress:
  enabled: true
  mode: gateway
  host: pgdog-dash.staging.example.com
  gateway:
    name: traefik-gw
    namespace: traefik
    sectionName: web
```
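
A rough sketch of the HTTPRoute that `mode: gateway` would be expected to produce for the values above, using the standard `gateway.networking.k8s.io/v1` fields. The route and Service name `pgdog-control` and the backend port 8080 are assumptions (8080 is the port the control binary binds; the chart's Service port may differ).

```yaml
# Expected rendering for the example values (sketch, names assumed).
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: pgdog-control                   # hypothetical name
spec:
  parentRefs:
    - name: traefik-gw                  # ingress.gateway.name
      namespace: traefik                # ingress.gateway.namespace
      sectionName: web                  # ingress.gateway.sectionName (optional)
  hostnames:
    - pgdog-dash.staging.example.com    # ingress.host
  rules:
    - backendRefs:                      # no matches: defaults to PathPrefix "/"
        - name: pgdog-control           # hypothetical Service name
          port: 8080                    # assumed Service port
```

Because the route attaches to a Gateway in another namespace, the target listener must admit it: `allowedRoutes.namespaces.from` defaults to `Same`, so it typically needs `All` or a `Selector` that matches the release namespace. Whether the attachment succeeded is reported in the HTTPRoute's `status.parents[].conditions` (look for `Accepted`).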

Test plan

  • Deployed on EKS staging cluster with enforce=restricted namespace — pods admitted without warnings
  • Verified id inside container returns uid=65534(nobody)
  • IAM policy with all 8 actions: RDS instances discovered (instances=1 in logs)
  • HTTPRoute attached to Traefik Gateway, dashboard accessible at https://pgdog-dash.staging.platform.reducto.ai/
  • All 4 pgdog data-plane pods (blue + green) posting metrics to dashboard via in-cluster control endpoint

🤖 Generated with Claude Code

…support

Three fixes discovered while deploying pgdog-control on EKS with
Pod Security Standards enforcement:

1. **PSS Restricted compliance**: Add `runAsNonRoot: true`,
   `runAsUser: 65534`, `runAsGroup: 65534` (nobody) to the control
   deployment's pod securityContext. The container already sets
   `allowPrivilegeEscalation: false`, drops ALL capabilities, and
   uses RuntimeDefault seccomp — but omitting `runAsNonRoot` causes
   PSS Restricted admission to reject the pod. The control binary
   has no dependency on running as root.

2. **Missing IAM permissions in docs**: The IRSA permissions policy
   in the README was missing `rds:DescribeDBParameters` and
   `ec2:DescribeInstanceTypes`. Without these, the RDS refresh loop
   fails with AccessDenied/UnauthorizedOperation and the database
   panel stays empty, even though DescribeDBInstances and
   DescribeDBClusters succeed.

3. **Gateway API (HTTPRoute) support**: Add `ingress.mode: gateway`
   which renders an HTTPRoute instead of an Ingress. This is needed
   on clusters that route traffic through a Gateway controller
   (Traefik, Envoy Gateway, AWS ALB via gateway.k8s.aws) rather
   than through an IngressClass. The HTTPRoute attaches to an
   existing Gateway via `ingress.gateway.{name,namespace,sectionName}`.

Running as uid 65534 (nobody) sets HOME=/nonexistent, which doesn't
exist and isn't writable. `helm repo add` writes its config under
$HOME and fails with `mkdir /nonexistent: permission denied`.
Setting HOME=/tmp gives helm a writable directory.

The image ships with an `ubuntu` user (uid 1000) whose home
directory, /home/ubuntu, exists. Running as this user instead of
nobody (65534) avoids the HOME=/tmp workaround needed for helm repo
config writes.
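
A sketch of the pod template after the last commit, for comparison with the 65534 version in the PR description. `runAsGroup: 1000` is an assumption (the commit names only uid 1000), the container name is hypothetical, and the commented-out `env` entry is the HOME=/tmp workaround from the previous commit that this change makes unnecessary.

```yaml
# Pod template fragment after the final commit (sketch, not the chart's file).
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000          # `ubuntu` user shipped in the image
    runAsGroup: 1000         # assumption: the commit names only uid 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: control          # hypothetical container name
      # Previous commit's workaround for HOME=/nonexistent under uid 65534,
      # no longer needed when running as `ubuntu`:
      # env:
      #   - name: HOME
      #     value: /tmp
```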