fix(aws-cf-reverse-proxy): make the duplicate-content UA marker optional #69

Merged: sam-at-luther merged 1 commit into main from fix/cf-reverse-proxy-make-ua-marker-optional on May 27, 2026

Conversation

@sam-at-luther
Member

Summary

aws-cf-reverse-proxy unconditionally injects User-Agent: <duplicate_content_penalty_secret> (default "luthersystems") on every CloudFront → origin request. This is a static-site-era SEO trick: when the same content was reachable through both the CF domain and the origin domain, origin servers detected the marker UA and responded with X-Robots-Tag: noindex so Google wouldn't penalise the duplicate.

For API / JSON origins (our A2A, MCP, gRPC paths) the trick is irrelevant and actively harmful: every inbound peer's real UA is overwritten at CloudFront before it reaches the origin, breaking observability and spam attribution.

Discovered while investigating GCP IPs hitting /insideout-a2a/* on prod. Every A2A session in the admin UI showed User-Agent: luthersystems because that's literally what CF was forwarding, regardless of the actual peer's UA. Pairs with luthersystems/reliable#1805, which adds the deeper probe attribution we needed once we realised UA was a dead end.

What changes

Wrap the custom_header in a dynamic block keyed on var.duplicate_content_penalty_secret != "":

  • Default (preserved): still injects User-Agent: luthersystems — every existing caller of this module gets identical behaviour.
  • Opt-out: callers pass duplicate_content_penalty_secret = "" to disable the injection for API origins (or anywhere the trick is unwanted).

Docstring on the variable now explains both the SEO origin and when to disable.
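
For readers without the diff open, the change described above might look roughly like the sketch below. This is not the module's actual source: the resource name `example`, the `origin_domain` variable, the `origin_id`, the cache and certificate settings, and the wording of the variable description are all assumptions added so the fragment is complete. Only the `dynamic "custom_header"` block, the `User-Agent` header name, and the `"luthersystems"` default come from this PR's description.

```hcl
# Sketch only. Everything except the dynamic block, the header name, and the
# variable default is a hypothetical placeholder, not the module's real code.
variable "duplicate_content_penalty_secret" {
  type        = string
  default     = "luthersystems"
  description = "Marker sent as User-Agent on CloudFront -> origin requests so origins can reply with X-Robots-Tag: noindex. Set to \"\" to disable the injection, e.g. for API origins."
}

variable "origin_domain" {
  type        = string
  description = "Hypothetical origin hostname, for illustration only."
}

resource "aws_cloudfront_distribution" "example" {
  enabled = true

  origin {
    domain_name = var.origin_domain
    origin_id   = "origin"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }

    # Emits exactly one custom_header block when the secret is non-empty,
    # and no custom_header block at all when it is "".
    dynamic "custom_header" {
      for_each = var.duplicate_content_penalty_secret != "" ? [var.duplicate_content_penalty_secret] : []
      content {
        name  = "User-Agent"
        value = custom_header.value
      }
    }
  }

  default_cache_behavior {
    target_origin_id       = "origin"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }
}
```

With the default value, `for_each` iterates over a one-element list, so the rendered origin carries the same single `custom_header` as before. That is why the default-on path is expected to plan with no diff.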

Test plan

  • terraform fmt -check -recursive aws-cf-reverse-proxy/ clean
  • Reviewer: spot-check that the default-on path produces an identical TF plan against an existing prod / test app domain (no diff)
  • After merge + version tag: bump aws-cf-reverse-proxy?ref= in ui-infrastructure/platform-{prod,test}/app_domain.tf to the new tag AND set duplicate_content_penalty_secret = "" on the app domain module — that's the operational fix that restores real UA on app.luthersystems.com-fronted A2A/MCP/gRPC traffic.
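
The follow-up bump in the last bullet could look something like this on the caller side. It is a hedged sketch: the module label `app_domain` and the `REPLACE_WITH_...` placeholders are hypothetical, the real source path and tag are not given in this PR, and the module's other inputs are omitted.

```hcl
# Sketch of the caller-side opt-out; the label and placeholders are hypothetical.
module "app_domain" {
  # Keep the existing source and pin ?ref= to the new tag once it is cut.
  source = "REPLACE_WITH_MODULE_SOURCE/aws-cf-reverse-proxy?ref=REPLACE_WITH_NEW_TAG"

  # ... existing inputs unchanged ...

  # An empty string disables the User-Agent injection, so the origin sees
  # each peer's real UA.
  duplicate_content_penalty_secret = ""
}
```

Callers that leave `duplicate_content_penalty_secret` unset keep the default and need no change.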

Out of scope

  • ui-infrastructure bump (separate PR after this lands and we cut a tag)
  • The other ~10 callers of this module in our repo (metrics-lambda etc.) keep the default and aren't affected

The module unconditionally injects `User-Agent: <duplicate_content_penalty_secret>`
on every CloudFront → origin request. The default value is "luthersystems",
left over from a static-site SEO trick: when the same content was reachable
through both the CF domain and the origin domain, origin servers detected
the marker UA and responded with `X-Robots-Tag: noindex` so Google wouldn't
penalise the duplicate.

For API/JSON origins (A2A, MCP, gRPC) the trick is irrelevant and the
override actively harms observability — origin server logs and downstream
session metadata see "luthersystems" for every request regardless of who
called. We discovered this while investigating an A2A endpoint where
every prober session showed an identical UA; the bytes the prober sent
were silently replaced at CF.

Wrap the `custom_header` in a `dynamic` block so callers can opt out by
passing `duplicate_content_penalty_secret = ""`. The default is preserved,
so existing callers (static-site reverse proxies, metrics lambdas) get
the same behaviour as before.
@sam-at-luther sam-at-luther merged commit 65fbaaf into main May 27, 2026
25 checks passed