Security audit: network policy & infrastructure by RafaelPo · Pull Request #215 · futuresearch/futuresearch-python

RafaelPo · 2026-02-25T10:00:12Z

Summary

Comprehensive defensive security audit of the MCP server deployment infrastructure
Covers Kubernetes network policies, container security, Redis TLS, CORS, rate limiting, security headers, and CI/CD pipeline
Identifies 2 critical, 4 high, 5 medium, and 4 low severity findings with concrete remediation code

Key Findings

C1 — No NetworkPolicy resources: unrestricted pod-to-pod communication in the cluster
C2 — No securityContext in K8s deployment template (docker-compose has it, K8s doesn't)
H1–H4 — Missing Pod Security Standards, no RBAC/ServiceAccount, no image scanning, static GCP credentials
Full report with PoC exploits, file:line references, and copy-paste remediation snippets in NETWORK_POLICY_AUDIT.md

Test plan

Review all findings against current deployment
Validate NetworkPolicy YAML against cluster CNI
Apply securityContext fix to deployment.yaml and verify pod starts
Triage and schedule remediation items

🤖 Generated with Claude Code

Comprehensive defensive audit of K8s deployment, container security, Redis TLS, CORS, rate limiting, security headers, and CI/CD pipeline. Identifies 2 critical (missing NetworkPolicy, missing securityContext), 4 high, 5 medium, and 4 low findings with remediation code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

RafaelPo · 2026-02-25T10:00:18Z

@claude code review

github-actions · 2026-02-25T10:00:36Z

Claude finished @RafaelPo's task in 2m 52s —— View job

Code Review: Network Policy & Infrastructure Audit

Read repo guidelines
Read the audit document
Cross-reference findings against actual codebase files
Provide review feedback

Overall: This is a thorough and well-structured security audit. I cross-referenced all 15 findings against the actual codebase and every file path and line number reference is accurate. The severity ratings are reasonable, the remediation priority matrix is practical, and the "Security Strengths" section provides valuable balance. A few issues with the remediation snippets are worth fixing before merge.

Issues in Remediation Code

1. C1 — NetworkPolicy ingress allows from wrong namespace (NETWORK_POLICY_AUDIT.md:76)

The allow-gateway-ingress policy allows traffic from cert-manager namespace, but looking at httproute.yaml, shared-tls-gateway is the Gateway resource in cert-manager — the actual data-plane proxy pods that forward traffic may run in a different namespace (e.g., gke-managed-system, envoy-gateway-system, or similar depending on the GKE Gateway controller). Allowing from cert-manager may not match the real proxy pods.

# Current (may not work):
- namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: cert-manager

# Should be verified against actual proxy namespace:
- namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: <actual-gateway-proxy-namespace>

Suggest adding a note that the namespace selector needs to be validated against the cluster's gateway implementation, e.g.: "Verify which namespace runs the gateway proxy pods: kubectl get pods -A | grep gateway"

Fix this →

2. C2 — runAsUser: 65534 conflicts with Dockerfile user (NETWORK_POLICY_AUDIT.md:170)

The Dockerfile creates a dedicated mcp user (line 19: useradd -r -g mcp ...), and the app directory /app is owned by that user. The remediation suggests runAsUser: 65534 (the nobody user), which would likely cause permission errors when the process tries to read files in /app.

# Current recommendation (will break):
runAsUser: 65534

# Should match the Dockerfile's mcp user UID:
runAsUser: 999  # or whatever UID was assigned to 'mcp' - check with: docker run <image> id mcp

Similarly fsGroup: 65534 should match. Suggest updating the snippet to reference the mcp user's actual UID, or add a note to determine it first.

Fix this →

3. H3 — Trivy action version not pinned to SHA (NETWORK_POLICY_AUDIT.md:257)

The audit recommends aquasecurity/trivy-action@0.28.0 but finding L2 in the same document recommends pinning actions to commit SHAs. The remediation code should follow its own advice.

# Current:
uses: aquasecurity/trivy-action@0.28.0

# Should be SHA-pinned per L2:
uses: aquasecurity/trivy-action@<commit-sha>  # v0.28.0

Minor Suggestions

4. C1 — DNS egress could be tighter (NETWORK_POLICY_AUDIT.md:94-100)

The DNS egress rule uses namespaceSelector: {} (all namespaces). This works but is broader than needed. Consider restricting to kube-system where CoreDNS runs:

- to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns

5. L4 — PDB with replicaCount: 1 blocks node drains (NETWORK_POLICY_AUDIT.md:513-518)

A minAvailable: 1 PDB with only 1 replica will prevent voluntary disruptions entirely (node drains, cluster upgrades). The audit does mention increasing replicaCount to 2, but this nuance should be more prominent — applying the PDB without also increasing replicas will block cluster maintenance.

6. M3 — CORS wildcard is intentionally justified (NETWORK_POLICY_AUDIT.md:376-379)

The audit recommends restricting CORS to settings.mcp_server_url, but the codebase comment at routes.py:22-33 explicitly justifies the wildcard: auth uses Bearer tokens (not cookies), so CORS restrictions don't add security value. The recommendation to restrict isn't wrong, but the mitigating factors already noted in the audit make this closer to L severity than M.

What's Done Well

Every file:line reference verified accurate against the codebase
PoC exploits are realistic and demonstrate real risk
The strengths table (lines 526-551) is excellent — acknowledges the strong application-layer security already in place
Remediation priority matrix with effort estimates is immediately actionable
Correct identification that docker-compose hardening wasn't ported to K8s (C2) — this is the most impactful finding

Verdict: The audit is high quality and ready to merge after fixing issues #1 and #2 above (incorrect gateway namespace and UID mismatch), as these could lead to broken deployments if the remediation is copy-pasted as-is.

|

C1: Add networkpolicy.yaml with default-deny-all, gateway ingress allowlist, and egress restricted to kube-dns, Redis, and external HTTPS. Gateway namespace and Redis selector are configurable via values.yaml. C2: Add pod and container securityContext to deployment.yaml: runAsNonRoot, runAsUser/Group 10000, readOnlyRootFilesystem, allowPrivilegeEscalation: false, drop ALL capabilities, Seccomp RuntimeDefault. Dockerfile updated to pin mcp user to UID/GID 10000. Also: dedicated ServiceAccount with automountServiceAccountToken: false (H2), and audit doc updated per code review feedback (gateway namespace note, UID fix, DNS egress tightened, PDB caveat, CORS severity downgrade, Trivy SHA-pin note). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The infra fixes (NetworkPolicy, securityContext, ServiceAccount) stand on their own — the audit document is internal-only and does not belong in the repository. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

RafaelPo · 2026-02-25T10:33:23Z

@claude code review

github-actions · 2026-02-25T10:33:37Z

Claude finished @RafaelPo's task in 1m 50s —— View job

Code Review: Network Policy & Infrastructure Security

Read repo guidelines
Read all changed files in the PR
Cross-reference against existing codebase
Provide review feedback

Overall: Solid hardening PR. The previous review's critical issues (UID mismatch, gateway namespace, DNS egress scope) have all been properly addressed. The UID is now pinned to 10000 in both Dockerfile and deployment spec, the gateway namespace is configurable via values, and DNS egress is scoped to kube-system. I found a few remaining issues worth addressing.

Issues

1. readOnlyRootFilesystem will break without PYTHONDONTWRITEBYTECODE in runtime stage — Dockerfile:17-21, deployment.yaml:37

PYTHONDONTWRITEBYTECODE=1 is set in stage 1 (build) but not in stage 2 (runtime). Since multi-stage builds don't inherit ENVs, Python will try to write .pyc files at runtime into the read-only /app/.venv/ directory. This will cause PermissionError or noisy warnings on every import.

# Stage 2: Slim runtime
FROM python:3.13-slim

+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+
RUN groupadd -r -g 10000 mcp && useradd -r -u 10000 -g mcp -d /app -s /sbin/nologin mcp

Fix this →

2. NetworkPolicy names will collide if multiple releases share a namespace — networkpolicy.yaml:6,25,46

The policies use static names (default-deny-all, allow-gateway-ingress, allow-mcp-egress) while all other templates use {{ .Release.Name }}. If a second Helm release is deployed into the same namespace, these will conflict.

# Current:
  name: default-deny-all

# Suggested:
  name: {{ .Release.Name }}-default-deny-all

Same for allow-gateway-ingress and allow-mcp-egress. default-deny-all is arguably fine as a shared namespace-wide policy, but the other two are release-specific and should be prefixed.

Fix this →

3. default-deny-all affects all pods in the namespace — networkpolicy.yaml:7-11

podSelector: {} matches every pod in the namespace, not just the MCP pods. This is correct for defense-in-depth, but if other workloads share this namespace and don't have their own allow-policies, they'll lose all connectivity. Worth a comment in values.yaml noting this is namespace-wide.

4. Redis egress may need a namespaceSelector — networkpolicy.yaml:69-71, values.yaml:48-51

The Redis selector uses only podSelector with no namespaceSelector, meaning it only matches pods in the same namespace. If Redis is deployed to a different namespace (common with managed Redis or shared infra), this rule won't work. Consider making the namespace configurable:

# values.yaml
redisSelector:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: redis  # or same-namespace if co-located
    podSelector:
      matchLabels:
        app: redis

This may not be an issue if Redis is indeed co-located, but worth verifying.

Minor

5. Redundant automountServiceAccountToken: false — serviceaccount.yaml:5, deployment.yaml:30

Set at both the ServiceAccount level and the pod spec level. Not harmful (belt-and-suspenders), but one location is sufficient. The pod spec takes precedence.

6. ServiceAccount missing standard Helm labels — serviceaccount.yaml

The ServiceAccount doesn't include namespace, labels, or annotations that other templates in this chart might benefit from for identification and lifecycle management. This is a minor consistency issue.

What's Done Well

UID 10000 consistently pinned across Dockerfile and K8s spec — the previous review's critical issue is properly resolved
Gateway namespace is now configurable via values.yaml with a sensible gke-managed-system default and clear documentation comment
DNS egress properly scoped to kube-system/kube-dns (addresses previous review's suggestion Optional response model for Rank #4)
Container security is comprehensive: drop ALL caps, no privilege escalation, seccomp RuntimeDefault, read-only root filesystem
/tmp emptyDir with 100Mi sizeLimit is a nice detail for the read-only rootfs
Egress HTTPS rule correctly excludes RFC1918 + link-local ranges

Verdict: Address issue #1 (PYTHONDONTWRITEBYTECODE in runtime stage) before merge — the read-only filesystem will cause Python import failures without it. Issues #2-4 are lower priority but worth considering.

|

everyrow-mcp/deploy/chart/templates/networkpolicy.yaml

Multi-stage builds don't inherit ENVs. Without this, Python tries to write .pyc files into the read-only /app/.venv/ directory, causing PermissionError with readOnlyRootFilesystem: true. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Production and staging use Redis Sentinel (REDIS_SENTINEL_ENDPOINTS), which means Redis is cross-namespace or external — a podSelector in the same namespace would silently block all Redis traffic. Switch to ipBlock CIDRs (redisCIDRs in values.yaml). Defaults to 10.0.0.0/8 with a TODO to narrow once Sentinel IPs are confirmed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

RafaelPo and others added 2 commits February 25, 2026 10:18

Remove audit report from PR

a478a1f

The infra fixes (NetworkPolicy, securityContext, ServiceAccount) stand on their own — the audit document is internal-only and does not belong in the repository. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sentry bot reviewed Feb 25, 2026

View reviewed changes

everyrow-mcp/deploy/chart/templates/networkpolicy.yaml Outdated Show resolved Hide resolved

RafaelPo and others added 2 commits February 25, 2026 10:37

RafaelPo merged commit 038ff76 into main Feb 25, 2026
5 checks passed

RafaelPo deleted the audit-network-policy branch February 25, 2026 10:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security audit: network policy & infrastructure#215

Security audit: network policy & infrastructure#215
RafaelPo merged 5 commits intomainfrom
audit-network-policy

RafaelPo commented Feb 25, 2026

Uh oh!

RafaelPo commented Feb 25, 2026

Uh oh!

github-actions bot commented Feb 25, 2026 •

edited

Loading

Uh oh!

RafaelPo commented Feb 25, 2026

Uh oh!

github-actions bot commented Feb 25, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

RafaelPo commented Feb 25, 2026

Summary

Key Findings

Test plan

Uh oh!

RafaelPo commented Feb 25, 2026

Uh oh!

github-actions bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Network Policy & Infrastructure Audit

Issues in Remediation Code

Minor Suggestions

What's Done Well

Uh oh!

RafaelPo commented Feb 25, 2026

Uh oh!

github-actions bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Network Policy & Infrastructure Security

Issues

Minor

What's Done Well

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Feb 25, 2026 •

edited

Loading

github-actions bot commented Feb 25, 2026 •

edited

Loading