fix(aws-cf-reverse-proxy): make the duplicate-content UA marker optional #69
Merged
The module unconditionally injects `User-Agent: <duplicate_content_penalty_secret>` on every CloudFront → origin request. The default value is "luthersystems", left over from a static-site SEO trick: when the same content was reachable through both the CF domain and the origin domain, origin servers detected the marker UA and responded with `X-Robots-Tag: noindex` so Google wouldn't penalise the duplicate.

For API/JSON origins (A2A, MCP, gRPC) the trick is irrelevant and the override actively harms observability: origin server logs and downstream session metadata see "luthersystems" for every request regardless of who called. We discovered this while investigating an A2A endpoint where every prober session showed an identical UA; the bytes the prober sent were silently replaced at CF.

Wrap the `custom_header` in a `dynamic` block so callers can opt out by passing `duplicate_content_penalty_secret = ""`. The default is preserved, so existing callers (static-site reverse proxies, metrics lambdas) get the same behaviour as before.
Summary
`aws-cf-reverse-proxy` unconditionally injects `User-Agent: <duplicate_content_penalty_secret>` (default `"luthersystems"`) on every CloudFront → origin request. This is a static-site-era SEO trick: when the same content was reachable through both the CF domain and the origin domain, origin servers detected the marker UA and responded with `X-Robots-Tag: noindex` so Google wouldn't penalise the duplicate.

For API / JSON origins (our A2A, MCP, gRPC paths) the trick is irrelevant and actively harmful: every inbound peer's real UA is destroyed at CloudFront before reaching origin, breaking observability and spam attribution.
Discovered while investigating GCP IPs hitting `/insideout-a2a/*` on prod. Every A2A session in the admin UI showed `User-Agent: luthersystems` because that's literally what CF was forwarding, regardless of the actual peer's UA. Pairs with luthersystems/reliable#1805, which adds the deeper probe attribution we needed once we realised UA was a dead end.

What changes
Wrap the `custom_header` in a `dynamic` block keyed on `var.duplicate_content_penalty_secret != ""`:

- The default still injects `User-Agent: luthersystems`, so every existing caller of this module gets identical behaviour.
- Callers can pass `duplicate_content_penalty_secret = ""` to disable the injection for API origins (or anywhere the trick is unwanted).
- The docstring on the variable now explains both the SEO origin and when to disable.
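
A minimal sketch of what the module-side change could look like, assuming the header lives inside the `origin` block of an `aws_cloudfront_distribution`. The resource name `proxy`, the `origin_domain_name` variable, the `origin_id` value, and the description wording are illustrative assumptions, not the module's actual code; the other blocks the resource requires are left out.

```hcl
variable "duplicate_content_penalty_secret" {
  type    = string
  default = "luthersystems"

  # Illustrative wording only; the real docstring may differ.
  description = "Marker sent as the User-Agent on CloudFront -> origin requests so static-site origins can answer with X-Robots-Tag: noindex. Set to \"\" to stop overriding the caller's User-Agent (e.g. for API origins)."
}

resource "aws_cloudfront_distribution" "proxy" {
  enabled = true

  origin {
    domain_name = var.origin_domain_name # hypothetical input
    origin_id   = "origin"               # hypothetical ID

    # Emit the custom_header block once when the secret is non-empty,
    # and not at all when the caller passes "".
    dynamic "custom_header" {
      for_each = var.duplicate_content_penalty_secret != "" ? [var.duplicate_content_penalty_secret] : []

      content {
        name  = "User-Agent"
        value = custom_header.value
      }
    }
  }

  # default_cache_behavior, restrictions, viewer_certificate, etc.
  # are required by the resource but omitted from this sketch.
}
```

Because `for_each` receives a one-element list in the default case, a plan against an existing caller should show no change to the origin's custom headers; only callers that pass an empty string lose the block.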
Test plan
- `terraform fmt -check -recursive aws-cf-reverse-proxy/` clean
- Bump `aws-cf-reverse-proxy?ref=` in `ui-infrastructure/platform-{prod,test}/app_domain.tf` to the new tag AND set `duplicate_content_penalty_secret = ""` on the app domain module; that's the operational fix that restores real UA on `app.luthersystems.com`-fronted A2A/MCP/gRPC traffic.

Out of scope
- `ui-infrastructure` bump (separate PR after this lands and we cut a tag)
- Other callers (`metrics-lambda` etc.) keep the default and aren't affected
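
For reference, the caller-side opt-out mentioned in the test plan might look roughly like the following. The module label `app_domain`, the repository URL, and the `<new-tag>` placeholder are hypothetical; the real source path and tag are not stated in this PR.

```hcl
module "app_domain" {
  # Hypothetical source; substitute the real module path and the tag
  # cut after this change lands.
  source = "git::https://example.com/terraform-modules.git//aws-cf-reverse-proxy?ref=<new-tag>"

  # An empty string disables the User-Agent override, so the origin
  # sees each peer's real User-Agent.
  duplicate_content_penalty_secret = ""

  # ... the module's other inputs stay as they are today.
}
```

Callers that omit `duplicate_content_penalty_secret` keep the `"luthersystems"` default and need no changes.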