Security hardening: SSRF, headers, Redis TLS, container lockdown by RafaelPo · Pull Request #208 · futuresearch/futuresearch-python

RafaelPo · 2026-02-24T19:49:19Z

Summary

Add transport-aware server instructions and SSRF protection (redirect validation, hostname re-check transport)
Security headers middleware, rate limiter fail-open with in-memory fallback, body size limits
Shell injection prevention via shlex.quote, Redis TLS support, container lockdown (non-root, read-only FS)
User isolation: per-user Redis key namespacing, token encryption at rest
Fix Docker build (--frozen removal), Redis healthcheck, and .dockerignore

Depends on PR #207 (unified input API).

Test plan

All 283 unit tests pass (26 new tests for security features)
SSRF protection blocks private IPs, metadata endpoints, unresolvable hosts
Rate limiter falls back to in-memory when Redis unavailable
Body size middleware rejects oversized uploads
Upload handler returns generic error messages (no info leaks)
Manual: Docker Compose build and healthcheck verification

🤖 Generated with Claude Code

everyrow-mcp/src/everyrow_mcp/utils.py

everyrow-mcp/src/everyrow_mcp/uploads.py

Instructions: - Add _INSTRUCTIONS_STDIO and _INSTRUCTIONS_HTTP to app.py - HTTP instructions guide agent to use request_upload_url for local files - server.py sets instructions based on transport mode Security & correctness (from parallel review): - SSRF protection: block internal IPs in URL fetching - __Host- cookie prefix for auth state cookie - Rate limiter: in-memory fallback when Redis unavailable - Upload endpoint: use caller's API token, limit CSV rows - Poll token via Authorization header (not just query param) - Progress URL no longer leaks poll token in URL Also: - Update upload_data docstring and error message for HTTP mode - Sync manifest.json description Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ckdown - Fix shell injection in upload curl command via shlex.quote() - Add SecurityHeadersMiddleware (HSTS, X-Content-Type-Options, X-Frame-Options, Cache-Control, Referrer-Policy) on all HTTP responses - Add Redis TLS support (REDIS_SSL setting) - Stream URL fetch with size limit (max_fetch_size_bytes) to prevent OOM - Validate UPLOAD_SECRET at startup instead of first request - Warn on missing REDIS_PASSWORD in HTTP mode at startup - Enable rate limiting in --no-auth mode; cap in-memory fallback at 50K entries - Container hardening: cap_drop ALL, no-new-privileges, read-only rootfs, CPU limits, REDISCLI_AUTH for healthcheck, pinned Redis image, network isolation - Add --frozen to Dockerfile uv sync to prevent lockfile drift - Sanitize SSRF error (no longer leaks resolved IPs) - Add repr=False on sensitive config fields to prevent accidental logging - Add Vary: Origin, Access-Control-Max-Age on CORS preflight responses - Default to 127.0.0.1 in --no-auth mode Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ockerignore - Fix SSRF DNS-rebinding TOCTOU: add _SSRFSafeTransport that re-validates hostnames at request time; block GKE metadata hostname; cap max_redirects=5 - Add user-scoped data isolation: record task owner (JWT sub) on submission, check ownership in everyrow_results_http to prevent cross-user access - Encrypt tokens at rest in Redis using Fernet (derived from UPLOAD_SECRET): task tokens, poll tokens, auth codes, refresh tokens, upload metadata - Add root .dockerignore with deny-all allowlist to prevent secrets leaking into Docker build context Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Remove --frozen from Dockerfile (incompatible with --no-sources) - Fix Redis healthcheck: pass REDIS_PASSWORD as env var, use -a flag - Remove cap_drop ALL, read_only, tmpfs from redis service (prevents Redis user switching) - Fix test_auth cookie prefix assertion Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

everyrow-mcp/src/everyrow_mcp/middleware.py

everyrow-mcp/src/everyrow_mcp/uploads.py

RafaelPo · 2026-02-24T20:06:42Z

@claude code review

If the api_token field in upload metadata is missing or corrupted, decrypt_value raises InvalidToken (from Fernet) which would cause an unhandled 500. Catch the exception and fall through to the existing empty-token → 403 response. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

everyrow-mcp/src/everyrow_mcp/uploads.py

…n, healthcheck - Fix __Host-mcp_auth_state cookie: handle_start now sets the correct prefixed name, matching handle_callback and delete_cookie - Block IPv4-mapped IPv6 SSRF bypass (::ffff:127.0.0.1): unwrap mapped addresses before checking against blocked networks - Use HKDF instead of raw SHA-256 for Fernet key derivation - Use REDISCLI_AUTH env var instead of -a flag in Redis healthcheck to avoid exposing password in docker inspect output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

pd.read_csv(nrows=N) silently drops rows beyond N, returning a 201 success that misleads users into thinking the full file was ingested. Now: parse the full CSV, then return 413 if it exceeds max_upload_rows. Also lower the default from 100k to 50k rows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The cross-user ownership guard on everyrow_results_http silently continued when verification failed (fail-open). Additionally, everyrow_progress and everyrow_cancel had no ownership checks at all, allowing any authenticated user to poll or cancel another user's tasks. - Extract shared _check_task_ownership helper (fail-closed: denies access when owner is missing or cannot be verified) - Add ownership guard to everyrow_progress and everyrow_cancel - Make store_task_owner in tool_helpers raise on missing auth so tasks are never created without an owner record - Update tests to mock auth context in HTTP mode Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…iption - Replace input_csv/left_csv/right_csv with inline data/left_data/right_data across all 7 integration tests (screen, rank, dedupe, merge, agent, single_agent x2) - Revert manifest.json upload_data description back to "URL or local file" All 7 integration tests pass against the live API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tests the stdio upload flow: write CSV to disk, upload via everyrow_upload_data to get an artifact_id, then pass it to everyrow_screen. Verifies the artifact path label and that results match the inline-data test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add trusted_ip_header setting (default X-Forwarded-For) instead of hardcoded Cloudflare + XFF header checks in middleware - Move get_access_token import to module level in tools.py and tool_helpers.py; update mock patch paths in tests - Fix upload_data docstring to match manifest ("URL or local file") Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…or paths - TestArtifactReuseIntegration: upload once, reuse artifact_id in screen + rank - TestUrlUploadIntegration: upload CSV from public GitHub URL - TestCancelIntegration: cancel running task + cancel nonexistent task - TestErrorPathsIntegration: bad UUID, empty data, both inputs, no input, merge missing side Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Explicit Cloudflare header config so GKE deployments can override with X-Forwarded-For without risk of header spoofing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

RafaelPo · 2026-02-24T20:59:36Z

@claude code review

github-actions · 2026-02-24T20:59:52Z

Claude finished @RafaelPo's task in 2m 17s —— View job

Code Review: Security Hardening PR

This is a well-structured security hardening PR. The defense-in-depth approach is solid, with multiple layers of protection across SSRF, authentication, encryption, and container security. I reviewed all 26 changed files; here is my analysis.

What looks good

SSRF Protection (utils.py) — The multi-layer approach is thorough: pre-flight DNS validation, redirect validation via httpx event hooks, and a custom _SSRFSafeTransport that re-validates hostnames at request time to narrow the TOCTOU window. IPv4-mapped IPv6 unwrapping (::ffff:127.0.0.1 → 127.0.0.1) is correctly handled. Streaming response size enforcement is a nice touch.

Rate Limiter (middleware.py) — Redis-primary with in-memory fallback is the right pattern. Thread-safe in-memory counters with periodic cleanup and a 50k hard cap prevent unbounded memory growth. The BodySizeLimitMiddleware is a clean pure-ASGI implementation that handles chunked transfers correctly.

Token Encryption (redis_store.py) — HKDF → Fernet for at-rest encryption of tokens in Redis is solid. The build_key sanitizer prevents Redis key injection. Sentinel support and TLS options are well-configured.

Auth (auth.py) — Atomic consume via GETDEL, client-ID mismatch re-storage, input length caps (256 chars), __Host- cookie prefix, rate limiting on auth endpoints, and JWKS lock for concurrent access are all well-implemented.

Upload System (uploads.py) — shlex.quote for shell injection prevention, hmac.compare_digest for timing-safe comparison, consume-on-use metadata, and generic error messages (no info leaks) are all correct.

Container Lockdown (Dockerfile/compose) — Non-root user, read-only filesystem, CAP_DROP ALL, no-new-privileges, resource limits, private Redis network, and 127.0.0.1 port binding are all best practices.

Test Coverage — 26 new tests covering SSRF blocking, rate limiter fallback, body size enforcement, upload validation, and generic error messages. Coverage is comprehensive.

Issues found

1. Incomplete cloud metadata hostname blocklist

utils.py:33-38 — _BLOCKED_HOSTNAMES only covers Google Cloud. While _BLOCKED_NETWORKS blocks 169.254.0.0/16 (covering metadata IPs for all providers), adding AWS and Azure metadata hostnames provides defense-in-depth against DNS edge cases:

_BLOCKED_HOSTNAMES = frozenset(
    {
        "metadata.google.internal",
        "metadata.google.internal.",
+       "metadata.aws.internal",
+       "metadata.aws.internal.",
    }
)

Severity: Low (IPs are already blocked by _BLOCKED_NETWORKS; this is defense-in-depth only).

Fix this →

2. `_decode_trusted_server_jwt` should restrict algorithms

auth.py:159-161 — This function disables signature verification (acceptable for trusted server-to-server tokens from Supabase) but hardcodes algorithms=["RS256"]. Since verification is off, PyJWT ignores the algorithms param anyway. Consider adding an "alg" claim check or a comment clarifying that algorithm confusion doesn't apply here because verification is disabled.

def _decode_trusted_server_jwt(token: str) -> dict[str, Any]:
    return pyjwt.decode(
        token, options={"verify_signature": False}, algorithms=["RS256"]
    )

Severity: Low (signature verification is intentionally off; algorithm confusion isn't a risk when you don't verify).

3. Misleading exception type in redirect validation

utils.py:157-160 — _check_redirect raises httpx.TooManyRedirects for SSRF blocks. While this works to abort the request, it could confuse debugging. A comment explaining the choice (or a dedicated exception) would improve clarity:

async def _check_redirect(response: httpx.Response) -> None:
    if response.is_redirect:
        location = response.headers.get("location", "")
        if location:
            try:
                _validate_url_target(location)
            except ValueError:
+               # Use TooManyRedirects to abort the redirect chain — httpx
+               # does not expose a more specific "redirect rejected" error.
                raise httpx.TooManyRedirects(
                    f"Redirect to blocked address: {location}",
                    request=response.request,
                )

Severity: Nit.

4. HKDF derivation uses `salt=None`

redis_store.py:61-66 — The HKDF key derivation uses salt=None. While the info parameter provides domain separation, HKDF's security properties are stronger with a salt. Consider using a static salt:

key = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
-   salt=None,
+   salt=b"everyrow-mcp-fernet-v1",
    info=b"everyrow-mcp-fernet",
).derive(settings.upload_secret.encode())

Severity: Low (current approach is cryptographically acceptable for single-key derivation from a high-entropy secret, but a salt is recommended by RFC 5869).

5. Redis warning should fire for auth-mode HTTP only

http_config.py:78-82 — The Redis password warning fires for all HTTP mode (settings.is_http), but it's arguably only critical for authenticated deployments. In --no-auth dev mode, Redis on localhost without a password may be fine.

-   if settings.is_http and not settings.redis_password:
+   if settings.is_http and not no_auth and not settings.redis_password:

Severity: Nit (cosmetic; a warning in dev mode is noisy but harmless).

Summary

This is a strong security PR with no high-severity issues. The SSRF protection, middleware stack, encryption-at-rest, and container hardening are all well-implemented with proper test coverage. The issues noted above are low-severity improvements that can be addressed in follow-up if desired.
|

- Add comment explaining why TooManyRedirects is used to abort SSRF redirects (httpx has no dedicated redirect-rejected error) - Only warn about missing REDIS_PASSWORD in authenticated HTTP mode, not in --no-auth dev mode where localhost Redis without a password is expected Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Some backend rows can take >6 minutes under load; 30 polls (~6.5 min) was too tight. 60 polls (~13 min) gives enough headroom. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Base compose file is now environment-neutral (proxy headers off by default, no Cloudflare-specific error messages). Local override exposes Redis on localhost and sets MCP_SERVER_URL to http://localhost:8000. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

everyrow-mcp/src/everyrow_mcp/utils.py


    Raises:
-        ValueError: If the response cannot be parsed as CSV.
+        ValueError: If the response cannot be parsed as CSV, or URL targets a blocked network.
        httpx.HTTPStatusError: On non-2xx responses.
    """


_validate_hostname checked direct IP literals against _BLOCKED_NETWORKS without unwrapping IPv4-mapped IPv6 addresses. An attacker could bypass SSRF protection by using ::ffff:127.0.0.1 instead of 127.0.0.1, since Python's ipaddress silently returns False for IPv6-in-IPv4-network checks. The DNS-resolution path (_is_blocked_ip) already had the unwrap; this aligns the direct IP literal path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

RafaelPo mentioned this pull request Feb 24, 2026

Unified input API: artifact_id + data replaces input_csv #199

Closed

7 tasks