Skip to content

fix(gateway): security closeout — wildcard rejection, normalization, healthcheck#638

Merged
marcusrbrown merged 3 commits into
mainfrom
fix/gateway-security-closeout
May 17, 2026
Merged

fix(gateway): security closeout — wildcard rejection, normalization, healthcheck#638
marcusrbrown merged 3 commits into
mainfrom
fix/gateway-security-closeout

Conversation

@marcusrbrown
Copy link
Copy Markdown
Collaborator

Three security findings from the ce:review pass on PR #635 bundled into one focused follow-up. All deploy/-only, no TypeScript touched.

Changes

OBJECT_STORE_HOSTS validation (P1 + P2)

The env var that scopes mitmproxy's S3/R2 allowlist now refuses to start with bad config. Three classes of misconfiguration are blocked at module import time:

  • Wildcards (P1)OBJECT_STORE_HOSTS="*.s3.amazonaws.com" re-introduced the over-broad allowlist hole that PR feat(gateway): Unit 4 — gateway daemon skeleton + Docker Compose stack #635 fixed. Same blast radius, different surface. Now raises ValueError with an actionable message that names the entry and explains the security rationale.
  • PortsOBJECT_STORE_HOSTS="localhost:9000" never matched because mitmproxy's flow.request.host inclusion of port depends on version and proxy mode. Now raises with a clear bare-hostname-only message.
  • Invalid hostnames — non-RFC 1123 hostnames (leading/trailing dots, consecutive dots, invalid chars, oversized) raise with the entry name in the message.

Operator-friendly: surviving entries are lowercased so OBJECT_STORE_HOSTS="FOO.S3.AMAZONAWS.COM" works as expected.

Validation order: empty-skip → wildcard-reject → port-reject → hostname-validate → lowercase-normalize → append.

Gateway healthcheck (P2)

Replaced [CMD, node, -e, process.exit(0)] (unconditionally passing) with [CMD-SHELL, "test -s /etc/ssl/certs/mitmproxy-ca-cert.pem"]. If mitmproxy's cert volume becomes unreadable or zero-byte (corrupted write, mount issue), the gateway now transitions to unhealthy and Docker's restart policy can act. The deeper TCP probe option (nc against mitmproxy:8080) is deferred.

Tests

deploy/mitmproxy/test_allowlist.py: 20 tests → 29 tests. New coverage:

  • Wildcard apex + subdomain rejection (asserts 'Set exact bucket hostnames' guidance)
  • Invalid hostname rejection (asserts entry name in message)
  • Valid bucket host acceptance
  • Uppercase normalization
  • Port rejection
  • Hostname validator edge cases: leading dot, trailing dot, consecutive dots

Verification

  • Python: 29/29 pass (python3 deploy/mitmproxy/test_allowlist.py)
  • Gateway TS: 72/72 pass
  • Workspace: 1463+ pass
  • Lint, types, build: clean
  • docker compose -f deploy/compose.yaml config: validates

Residual

One P3 follow-up tracked locally: deciding whether to reject IP literals in OBJECT_STORE_HOSTS (currently accepted, which preserves the MinIO use case but lets an attacker with deploy-config access reach metadata-service IPs). Deferred to a future reliability batch — OBJECT_STORE_HOSTS is already inside the deploy-config trust boundary.

…healthcheck

3 ready todos from PR #635's ce:review bundled into one focused PR:
todo 016 (P1), todo 015 (P2), todo 010 (P2). All deploy/-only.

Todo 016 (P1) — Reject wildcards in OBJECT_STORE_HOSTS:
The env var that scopes mitmproxy's S3/R2 allowlist accepted wildcard
entries unchecked. An operator (or attacker with deploy-config access)
could set OBJECT_STORE_HOSTS="*.s3.amazonaws.com" and silently re-
introduce the same over-broad-allowlist hole that PR #635 fixed. Same
blast radius, different surface.

Added _is_valid_hostname() RFC 1123 helper. Validation now raises
ValueError at module import time (mitmproxy fails-fast on bad config)
when an entry:
- starts with "*." (wildcard) — message names the entry and explains
  the security rationale
- fails hostname validation (1-253 chars, allowed [a-z 0-9 . -],
  label rules)

Todo 015 (P2) — Normalize OBJECT_STORE_HOSTS entries:
Two real misconfigurations slipped through the previous parser:
1. Uppercase entries never matched because _is_allowed lowercases the
   request host but not the allowlist entries.
2. Entries with ports (e.g. "localhost:9000" for MinIO) never matched
   because flow.request.host inclusion of port depends on mitmproxy
   version and proxy mode — ambiguous.

Hybrid fix: normalize case (operator-friendly, just .lower() the
entry), but reject ports at startup (genuine ambiguity — operator
must know whether mitmproxy will see the port).

Validation order is now: empty-skip → wildcard-reject → port-reject
→ hostname-validate → lowercase-normalize → append.

Todo 010 (P2) — Gateway cert-readability healthcheck:
The Docker healthcheck was [CMD, node, -e, process.exit(0)] which
passes unconditionally. If mitmproxy's cert volume gets corrupted
or unmounted after gateway startup, the gateway has no signal — the
operator's only feedback is failed requests.

Replaced with [CMD-SHELL, "test -r /etc/ssl/certs/mitmproxy-ca-cert.pem"]
which verifies the CA cert is readable. If the cert disappears the
gateway transitions to unhealthy and Docker's restart policy can act.

The deeper TCP-probe option (nc against mitmproxy:8080) is deferred
to a future pass per the todo's scope_decision frontmatter.

Tests:
- deploy/mitmproxy/test_allowlist.py: +6 tests covering all
  validation paths (wildcard apex/subdomain, invalid hostname,
  valid bucket, uppercase normalization, port rejection)

Verification: 26/26 Python tests (was 20, +6), 72/72 gateway TS,
1463+ workspace, lint/types/build clean, compose config validates.
3 P2 findings from a 3-reviewer ce:review pass on the security closeout
(commit f61a463). All bounded, all in deploy/.

Changes:
- allowlist.py _is_valid_hostname: explicitly reject leading-dot,
  trailing-dot, and consecutive-dots BEFORE the per-label loop.
  Previously rstrip('.') was silently normalizing trailing dots and
  the empty-label check missed leading dots — `.example.com`,
  `example.com.`, and `example..com` all incorrectly returned True.
- compose.yaml gateway healthcheck: tightened test -r to test -s so
  a zero-byte CA cert file (mid-write corruption, mount issue) also
  triggers unhealthy state. Catches the case where the file exists
  but isn't a valid cert.
- test_allowlist.py: tightened error-message assertions on the
  wildcard and invalid-hostname rejection tests so they pin the
  operator-actionable guidance text ("Set exact bucket hostnames",
  the entry name), not just the type or a partial word. A future
  refactor that swallows the actionable guidance can't slip through.
- test_allowlist.py: +3 new tests for the hostname-validator edge
  cases (leading dot, trailing dot, consecutive dots).

Residual: todo 017 (P3, IP literal decision) captures the residual
risk that OBJECT_STORE_HOSTS accepts private/loopback/metadata-service
IPs. Deferred to PR B (reliability) since deploy-config is already a
trust boundary and the MinIO use case wants IP support.

Verification: 29/29 Python tests (was 26, +3), 72/72 gateway TS,
lint/types/build clean, compose config validates.
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: CONDITIONAL

All three security fixes are correctly implemented and the test suite passes cleanly (29/29). One blocking issue found.

Blocking issues

** accepts IP literals, and the port-reject error message misleads for IPv6**

is caught by the port-reject step (colons trigger it), so IPv6 literals are blocked — but the error message says "contains a port" when the real problem is that it's an IPv6 address. This is a cosmetic mislead for a security-relevant validator. More importantly, bare IPv4 literals (e.g. ) pass because digits and dots satisfy the RFC 1123 label check:

$ python3 -c "
import sys; from unittest.mock import MagicMock
sys.modules['mitmproxy']=MagicMock(); sys.modules['mitmproxy.http']=MagicMock()
import os, importlib.util
os.environ['OBJECT_STORE_HOSTS']='169.254.169.254'
spec=importlib.util.spec_from_file_location('al','deploy/mitmproxy/allowlist.py')
mod=importlib.util.module_from_spec(spec); spec.loader.exec_module(mod)
print('allowed:', mod._is_allowed('169.254.169.254'))
"

(AWS instance metadata) would load and be allowed. The PR description explicitly defers this as a P3 on the basis of deploy-config trust — that rationale is reasonable, but the deferred item should be tracked in code (a or issue reference) rather than just locally. Without it, the next reviewer won't know the decision was deliberate.

Required fix: Add an inline comment in at the point where is called (or in itself) that says IP literals are intentionally accepted — cite the MinIO use case and P3 tracking reference. This is a one-liner comment, not a code change.


Non-blocking concerns

  1. Healthcheck probe gap — checks that the cert file is non-empty, which is a solid improvement over the unconditional . However, it does not verify that the cert is actually parseable or that mitmproxy is listening on 8080. A zero-byte partial write followed by a complete write between health intervals would pass on the second check. Accepted risk at this scope, but worth a comment noting the deeper probe is deferred (the PR description mentions it — just not in the compose.yaml).

  2. Static ALLOWLIST wildcard entries remain — and in the static list use wildcard semantics. This is intentional and was in-scope before this PR, but the wildcard-reject guard only applies to . A future reader might wonder why wildcards are fine in the static list but rejected in the env var — a brief comment near the definition would pre-empt the question.

  3. Port-reject error message for IPv6 — is technically wrong for . Low impact since IPv6 literals are blocked (desired), but the message is confusing. Non-blocking because IPv4 literals are the real P3 concern.

Missing tests

  • No test covering a bare IPv4 literal in (e.g., or ). Given the explicit P3 deferral, adding a test that documents current behavior ("IPv4 literals are accepted") would make the intentionality explicit and catch an accidental future change.

Risk assessment: LOW

The code changes are deploy/-scoped, additive, and fail-closed. The wildcard-reject and port-reject paths are covered by tests. The healthcheck improvement is strictly better than the previous unconditional pass. The only untracked risk is the IP-literal acceptance, which the author has already reasoned about — it just needs to be visible in the code.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 25998851366
Cache hit
Session ses_1c8d69d7effebdWSGKQpMtcm0O

Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: CONDITIONAL

All three security fixes are correctly implemented and the test suite passes cleanly (29/29). One blocking issue found.

Blocking issues

_is_valid_hostname accepts IPv4 literals — intentional but undocumented in code

Bare IPv4 literals (e.g. 169.254.169.254) pass _is_valid_hostname because digits and dots satisfy the RFC 1123 label check. The AWS instance metadata IP would be accepted as a valid OBJECT_STORE_HOSTS entry. The PR description explicitly defers this as a P3 (MinIO use case, deploy-config trust boundary) — that rationale is sound — but the decision is tracked only locally, not in the code.

IPv6 literals (2001:db8::1) are incidentally blocked by the port-reject step (colons trigger it), but with a misleading error message ("contains a port").

Required fix: Add a one-line comment in allowlist.py near the _is_valid_hostname call in the OBJECT_STORE_HOSTS loop documenting the intentional acceptance of IPv4 literals, citing the MinIO use case and the P3 tracking decision. No code change needed — just a comment so the next reviewer knows it was deliberate.


Non-blocking concerns

  1. Healthcheck probe gaptest -s /etc/ssl/certs/mitmproxy-ca-cert.pem checks that the cert file is non-empty, which is a solid improvement over the unconditional node -e process.exit(0). It does not verify the cert is parseable or that mitmproxy is listening on 8080. Accepted risk; the PR description already notes the deeper nc probe is deferred. A comment in compose.yaml noting the deferral would prevent the next contributor from re-litigating it.

  2. Static ALLOWLIST wildcards vs. env-var wildcard rejection*.discord.com and *.discord.gg exist in the static list, but wildcard-reject only applies to OBJECT_STORE_HOSTS. This is intentional by design, but a brief comment near the ALLOWLIST definition explaining the asymmetry would pre-empt confusion for future contributors.

  3. Port-reject error message for IPv6 — The message "contains a port" is technically wrong for 2001:db8::1. Low impact since IPv6 literals being blocked is the desired outcome, but the message could confuse an operator who genuinely intended to enter an IPv6 address.

Missing tests

No test covering a bare IPv4 literal in OBJECT_STORE_HOSTS (e.g. 192.168.1.1 or 169.254.169.254). Given the explicit P3 deferral, a test that documents current behavior ("IPv4 literals are currently accepted") would make the intentionality explicit and catch an accidental future change to reject them.

Risk assessment (LOW)

The code changes are deploy/-scoped, additive, and fail-closed. The wildcard-reject and port-reject paths are covered by tests. The healthcheck improvement is strictly better than the previous unconditional pass. The only untracked risk is the IPv4-literal acceptance, which the author has already reasoned about — it just needs one comment to be visible in the code.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 25998851366
Cache hit
Session ses_1c8d69d7effebdWSGKQpMtcm0O

… clarity

1 blocking + 3 non-blocking items from Fro Bot's CHANGES_REQUESTED.
All documentation/clarity fixes; no behavior changes to validation logic
beyond a sharper IPv6 error message.

Blocking — IP literal acceptance is now visible in code:
The OBJECT_STORE_HOSTS validation accepts IPv4 literals because RFC 1123
labels permit pure digits. That's a deliberate choice (MinIO/self-hosted
object-store use case), and the residual risk (metadata-service IPs
169.254.169.254 reachable if deploy-config is compromised) is accepted
on deploy-config-trust grounds. Previously this rationale lived only in
the local todo 017 — invisible to anyone reading the code. Added an
inline comment at the validator call site that cites the MinIO case,
names the accepted risk, and points to the todo for the deferred
tightening decision.

Non-blocking 1 — Wildcard asymmetry comment:
The static ALLOWLIST permits *.discord.com / *.discord.gg / etc.
The OBJECT_STORE_HOSTS env-var path rejects ANY wildcard entry.
Future readers wondering about the inconsistency now see the explanation
inline: wildcards in code go through review; wildcards via env var would
let operators (or attackers with deploy-config access) silently
re-introduce over-broad allowlist holes.

Non-blocking 2 — IPv6 port-reject message clarity:
OBJECT_STORE_HOSTS=::1 was rejected by the port-reject branch because it
contains a colon, but the error message said "contains a port" — wrong
for IPv6. Split the port-reject step into two arms: if the entry looks
like IPv6 (multiple colons OR leading [), raise an IPv6-specific
message; otherwise the original port message. IPv6 is still rejected
(not yet supported) but the diagnostic is now accurate.

Non-blocking 3 — Test documenting current IPv4 acceptance:
Added test_object_store_hosts_accepts_ipv4_literal that captures the
deliberate choice to accept 10.0.0.5 etc. and references todo 017. An
accidental future refactor that rejects IPv4 will now break a test with
a clear docstring explaining why the behavior matters.

Also added test_object_store_hosts_rejects_ipv6_literal to confirm the
new branch fires and emits an IPv6-specific error message.

Verification: 31/31 Python tests (was 29, +2), 72/72 gateway TS,
lint/types/build clean, compose config validates.
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: PASS

All blocking issues from my previous review are fully resolved in commit daf9409. 31/31 tests pass.

What changed in the follow-up commit

Blocking issue — IPv4 literal acceptance now documented in code (allowlist.py:133-138): The 7-line comment names the MinIO use case, states the accepted risk (metadata-service IPs), and references the todo tracking the future decision. Exactly what was asked.

Non-blocking — IPv6 error message corrected (allowlist.py:120-125): The port-reject step now distinguishes between host:port (single colon) and IPv6 literals (multiple colons or [ prefix), with a dedicated error message. fe80::1, [::1]:443, and ::1 all raise with "IPv6" in the message; host:80 still says "contains a port".

Non-blocking — static ALLOWLIST wildcard asymmetry documented (allowlist.py:33-38): The comment at the top of ALLOWLIST now explains why wildcards are permitted in the static list (code review gate) but rejected in OBJECT_STORE_HOSTS (operator/attacker surface).

Missing test — IPv4 literal behavior documented (test_allowlist.py:368-377): test_object_store_hosts_accepts_ipv4_literal locks in current behavior with a docstring that cites todo 017.

Missing test — IPv6 literal rejection (test_allowlist.py:357-365): New test asserts "IPv6" appears in the message and "contains a port" does not.

Blocking issues

None.

Non-blocking concerns

None.

Missing tests

None.

Risk assessment (LOW)

No behavior changes since the previous review — only comments, a sharper error message for an already-rejected input class, and two new tests. Blast radius is unchanged.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 25999003235
Cache hit
Session ses_1c8d69d7effebdWSGKQpMtcm0O

@marcusrbrown marcusrbrown merged commit 7234d25 into main May 17, 2026
10 checks passed
@marcusrbrown marcusrbrown deleted the fix/gateway-security-closeout branch May 17, 2026 18:27
@fro-bot fro-bot mentioned this pull request May 18, 2026
46 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants