Skip to content

Security audit: network policy & infrastructure#215

Merged
RafaelPo merged 5 commits intomainfrom
audit-network-policy
Feb 25, 2026
Merged

Security audit: network policy & infrastructure#215
RafaelPo merged 5 commits intomainfrom
audit-network-policy

Conversation

@RafaelPo
Copy link
Contributor

Summary

  • Comprehensive defensive security audit of the MCP server deployment infrastructure
  • Covers Kubernetes network policies, container security, Redis TLS, CORS, rate limiting, security headers, and CI/CD pipeline
  • Identifies 2 critical, 4 high, 5 medium, and 4 low severity findings with concrete remediation code

Key Findings

  • C1 — No NetworkPolicy resources: unrestricted pod-to-pod communication in the cluster
  • C2 — No securityContext in K8s deployment template (docker-compose has it, K8s doesn't)
  • H1–H4 — Missing Pod Security Standards, no RBAC/ServiceAccount, no image scanning, static GCP credentials
  • Full report with PoC exploits, file:line references, and copy-paste remediation snippets in NETWORK_POLICY_AUDIT.md

Test plan

  • Review all findings against current deployment
  • Validate NetworkPolicy YAML against cluster CNI
  • Apply securityContext fix to deployment.yaml and verify pod starts
  • Triage and schedule remediation items

🤖 Generated with Claude Code

Comprehensive defensive audit of K8s deployment, container security,
Redis TLS, CORS, rate limiting, security headers, and CI/CD pipeline.
Identifies 2 critical (missing NetworkPolicy, missing securityContext),
4 high, 5 medium, and 4 low findings with remediation code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RafaelPo
Copy link
Contributor Author

@claude code review

@github-actions
Copy link
Contributor

github-actions bot commented Feb 25, 2026

Claude finished @RafaelPo's task in 2m 52s —— View job


Code Review: Network Policy & Infrastructure Audit

  • Read repo guidelines
  • Read the audit document
  • Cross-reference findings against actual codebase files
  • Provide review feedback

Overall: This is a thorough and well-structured security audit. I cross-referenced all 15 findings against the actual codebase and every file path and line number reference is accurate. The severity ratings are reasonable, the remediation priority matrix is practical, and the "Security Strengths" section provides valuable balance. A few issues with the remediation snippets are worth fixing before merge.


Issues in Remediation Code

1. C1 — NetworkPolicy ingress allows from wrong namespace (NETWORK_POLICY_AUDIT.md:76)

The allow-gateway-ingress policy allows traffic from cert-manager namespace, but looking at httproute.yaml, shared-tls-gateway is the Gateway resource in cert-manager — the actual data-plane proxy pods that forward traffic may run in a different namespace (e.g., gke-managed-system, envoy-gateway-system, or similar depending on the GKE Gateway controller). Allowing from cert-manager may not match the real proxy pods.

# Current (may not work):
- namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: cert-manager

# Should be verified against actual proxy namespace:
- namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: <actual-gateway-proxy-namespace>

Suggest adding a note that the namespace selector needs to be validated against the cluster's gateway implementation, e.g.: "Verify which namespace runs the gateway proxy pods: kubectl get pods -A | grep gateway"

Fix this →

2. C2 — runAsUser: 65534 conflicts with Dockerfile user (NETWORK_POLICY_AUDIT.md:170)

The Dockerfile creates a dedicated mcp user (line 19: useradd -r -g mcp ...), and the app directory /app is owned by that user. The remediation suggests runAsUser: 65534 (the nobody user), which would likely cause permission errors when the process tries to read files in /app.

# Current recommendation (will break):
runAsUser: 65534

# Should match the Dockerfile's mcp user UID:
runAsUser: 999  # or whatever UID was assigned to 'mcp' - check with: docker run <image> id mcp

Similarly fsGroup: 65534 should match. Suggest updating the snippet to reference the mcp user's actual UID, or add a note to determine it first.

Fix this →

3. H3 — Trivy action version not pinned to SHA (NETWORK_POLICY_AUDIT.md:257)

The audit recommends aquasecurity/trivy-action@0.28.0 but finding L2 in the same document recommends pinning actions to commit SHAs. The remediation code should follow its own advice.

# Current:
uses: aquasecurity/trivy-action@0.28.0

# Should be SHA-pinned per L2:
uses: aquasecurity/trivy-action@<commit-sha>  # v0.28.0

Minor Suggestions

4. C1 — DNS egress could be tighter (NETWORK_POLICY_AUDIT.md:94-100)

The DNS egress rule uses namespaceSelector: {} (all namespaces). This works but is broader than needed. Consider restricting to kube-system where CoreDNS runs:

- to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns

5. L4 — PDB with replicaCount: 1 blocks node drains (NETWORK_POLICY_AUDIT.md:513-518)

A minAvailable: 1 PDB with only 1 replica will prevent voluntary disruptions entirely (node drains, cluster upgrades). The audit does mention increasing replicaCount to 2, but this nuance should be more prominent — applying the PDB without also increasing replicas will block cluster maintenance.

6. M3 — CORS wildcard is intentionally justified (NETWORK_POLICY_AUDIT.md:376-379)

The audit recommends restricting CORS to settings.mcp_server_url, but the codebase comment at routes.py:22-33 explicitly justifies the wildcard: auth uses Bearer tokens (not cookies), so CORS restrictions don't add security value. The recommendation to restrict isn't wrong, but the mitigating factors already noted in the audit make this closer to L severity than M.


What's Done Well

  • Every file:line reference verified accurate against the codebase
  • PoC exploits are realistic and demonstrate real risk
  • The strengths table (lines 526-551) is excellent — acknowledges the strong application-layer security already in place
  • Remediation priority matrix with effort estimates is immediately actionable
  • Correct identification that docker-compose hardening wasn't ported to K8s (C2) — this is the most impactful finding

Verdict: The audit is high quality and ready to merge after fixing issues #1 and #2 above (incorrect gateway namespace and UID mismatch), as these could lead to broken deployments if the remediation is copy-pasted as-is.


|

RafaelPo and others added 2 commits February 25, 2026 10:18
C1: Add networkpolicy.yaml with default-deny-all, gateway ingress
allowlist, and egress restricted to kube-dns, Redis, and external HTTPS.
Gateway namespace and Redis selector are configurable via values.yaml.

C2: Add pod and container securityContext to deployment.yaml:
runAsNonRoot, runAsUser/Group 10000, readOnlyRootFilesystem,
allowPrivilegeEscalation: false, drop ALL capabilities, Seccomp
RuntimeDefault. Dockerfile updated to pin mcp user to UID/GID 10000.

Also: dedicated ServiceAccount with automountServiceAccountToken: false
(H2), and audit doc updated per code review feedback (gateway namespace
note, UID fix, DNS egress tightened, PDB caveat, CORS severity
downgrade, Trivy SHA-pin note).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The infra fixes (NetworkPolicy, securityContext, ServiceAccount) stand
on their own — the audit document is internal-only and does not belong
in the repository.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RafaelPo
Copy link
Contributor Author

@claude code review

@github-actions
Copy link
Contributor

github-actions bot commented Feb 25, 2026

Claude finished @RafaelPo's task in 1m 50s —— View job


Code Review: Network Policy & Infrastructure Security

  • Read repo guidelines
  • Read all changed files in the PR
  • Cross-reference against existing codebase
  • Provide review feedback

Overall: Solid hardening PR. The previous review's critical issues (UID mismatch, gateway namespace, DNS egress scope) have all been properly addressed. The UID is now pinned to 10000 in both Dockerfile and deployment spec, the gateway namespace is configurable via values, and DNS egress is scoped to kube-system. I found a few remaining issues worth addressing.


Issues

1. readOnlyRootFilesystem will break without PYTHONDONTWRITEBYTECODE in runtime stageDockerfile:17-21, deployment.yaml:37

PYTHONDONTWRITEBYTECODE=1 is set in stage 1 (build) but not in stage 2 (runtime). Since multi-stage builds don't inherit ENVs, Python will try to write .pyc files at runtime into the read-only /app/.venv/ directory. This will cause PermissionError or noisy warnings on every import.

# Stage 2: Slim runtime
FROM python:3.13-slim

+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+
RUN groupadd -r -g 10000 mcp && useradd -r -u 10000 -g mcp -d /app -s /sbin/nologin mcp

Fix this →

2. NetworkPolicy names will collide if multiple releases share a namespacenetworkpolicy.yaml:6,25,46

The policies use static names (default-deny-all, allow-gateway-ingress, allow-mcp-egress) while all other templates use {{ .Release.Name }}. If a second Helm release is deployed into the same namespace, these will conflict.

# Current:
  name: default-deny-all

# Suggested:
  name: {{ .Release.Name }}-default-deny-all

Same for allow-gateway-ingress and allow-mcp-egress. default-deny-all is arguably fine as a shared namespace-wide policy, but the other two are release-specific and should be prefixed.

Fix this →

3. default-deny-all affects all pods in the namespacenetworkpolicy.yaml:7-11

podSelector: {} matches every pod in the namespace, not just the MCP pods. This is correct for defense-in-depth, but if other workloads share this namespace and don't have their own allow-policies, they'll lose all connectivity. Worth a comment in values.yaml noting this is namespace-wide.

4. Redis egress may need a namespaceSelectornetworkpolicy.yaml:69-71, values.yaml:48-51

The Redis selector uses only podSelector with no namespaceSelector, meaning it only matches pods in the same namespace. If Redis is deployed to a different namespace (common with managed Redis or shared infra), this rule won't work. Consider making the namespace configurable:

# values.yaml
redisSelector:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: redis  # or same-namespace if co-located
    podSelector:
      matchLabels:
        app: redis

This may not be an issue if Redis is indeed co-located, but worth verifying.


Minor

5. Redundant automountServiceAccountToken: falseserviceaccount.yaml:5, deployment.yaml:30

Set at both the ServiceAccount level and the pod spec level. Not harmful (belt-and-suspenders), but one location is sufficient. The pod spec takes precedence.

6. ServiceAccount missing standard Helm labelsserviceaccount.yaml

The ServiceAccount doesn't include namespace, labels, or annotations that other templates in this chart might benefit from for identification and lifecycle management. This is a minor consistency issue.


What's Done Well

  • UID 10000 consistently pinned across Dockerfile and K8s spec — the previous review's critical issue is properly resolved
  • Gateway namespace is now configurable via values.yaml with a sensible gke-managed-system default and clear documentation comment
  • DNS egress properly scoped to kube-system/kube-dns (addresses previous review's suggestion Optional response model for Rank #4)
  • Container security is comprehensive: drop ALL caps, no privilege escalation, seccomp RuntimeDefault, read-only root filesystem
  • /tmp emptyDir with 100Mi sizeLimit is a nice detail for the read-only rootfs
  • Egress HTTPS rule correctly excludes RFC1918 + link-local ranges

Verdict: Address issue #1 (PYTHONDONTWRITEBYTECODE in runtime stage) before merge — the read-only filesystem will cause Python import failures without it. Issues #2-4 are lower priority but worth considering.


|

RafaelPo and others added 2 commits February 25, 2026 10:37
Multi-stage builds don't inherit ENVs. Without this, Python tries to
write .pyc files into the read-only /app/.venv/ directory, causing
PermissionError with readOnlyRootFilesystem: true.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Production and staging use Redis Sentinel (REDIS_SENTINEL_ENDPOINTS),
which means Redis is cross-namespace or external — a podSelector in
the same namespace would silently block all Redis traffic.

Switch to ipBlock CIDRs (redisCIDRs in values.yaml). Defaults to
10.0.0.0/8 with a TODO to narrow once Sentinel IPs are confirmed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RafaelPo RafaelPo merged commit 038ff76 into main Feb 25, 2026
5 checks passed
@RafaelPo RafaelPo deleted the audit-network-policy branch February 25, 2026 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant