fix: block LLM/search crawlers on branch deploy subdomains #2122

Open
desimone wants to merge 2 commits into main from bdd/eng-3768-block-llm-crawlers-branch-deploys

desimone (Contributor) commented Mar 19, 2026

Summary

  • make docs robots.txt branch-aware instead of relying on a manual revert when the next release branch is cut
  • keep non-release branch deploys and previews blocked by default
  • automatically write the permissive robots.txt for numbered release branches like 0-33-0

Problem

The *.docs.pomerium.com wildcard DNS record routes branch deploys through Netlify, so branch builds are publicly reachable and get crawled unless we actively block crawlers.

The first draft of this change hard-coded static/robots.txt to Disallow: / on main, which would have required a manual undo when the next stable release branch is cut. That is exactly the kind of thing we forget.

Approach

This PR keeps the checked-in fallback safe (Disallow: /) and adds a small Docusaurus post-build plugin that writes the allow template only when the build is for a numbered release branch.
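
For illustration only, here is a minimal sketch of what such a post-build hook could look like. It is not the code in this PR: `robotsTxtPlugin`, `RobotsPluginOptions`, `allowCrawlers`, `allowTemplate`, and the plugin name are hypothetical, and the local interfaces stand in for the real Docusaurus plugin types. It assumes the checked-in static/robots.txt has already been copied into the build output by the time `postBuild` runs.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Minimal local stand-ins for the Docusaurus plugin types (assumption:
// only `outDir` is needed from the postBuild props).
interface PostBuildProps {
  outDir: string;
}
interface PluginLike {
  name: string;
  postBuild(props: PostBuildProps): void;
}

// Hypothetical options for this sketch.
export interface RobotsPluginOptions {
  allowCrawlers: boolean; // true only for numbered release branch builds
  allowTemplate: string; // contents of the permissive robots.txt
}

export function robotsTxtPlugin(
  _context: unknown,
  options: RobotsPluginOptions,
): PluginLike {
  return {
    name: "robots-txt-by-branch", // hypothetical plugin name
    postBuild({ outDir }) {
      // Leave the safe fallback (Disallow: /) in place unless this build
      // is explicitly allowed to be crawled.
      if (!options.allowCrawlers) return;
      fs.writeFileSync(
        path.join(outDir, "robots.txt"),
        options.allowTemplate,
        "utf8",
      );
    },
  };
}
```

The design point is that the hook only ever overwrites in the permissive direction, so a misconfigured or skipped plugin still leaves the blocking file in the build output.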

Behavior:

  • deploy-preview / branch-deploy on main or any other non-release branch: Disallow: /
  • numbered release branches like 0-33-0: write the normal permissive robots.txt
  • explicit escape hatch: POMERIUM_DOCS_ROBOTS_MODE=allow|disallow

This means we do not need to remember a revert when we cut 0-33-0; the release branch will emit the right file automatically.
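
The decision rules above can be sketched as a small pure function. This is an assumption-laden illustration rather than the PR's implementation: the function name `resolveRobotsMode`, the release-branch pattern (three dash-separated numbers), and the choice to read `BRANCH` with a fallback to `HEAD` are all guesses based on the examples in this description.

```typescript
type RobotsMode = "allow" | "disallow";

// Assumed shape of a numbered release branch such as "0-33-0".
const RELEASE_BRANCH = /^\d+-\d+-\d+$/;

export function resolveRobotsMode(
  env: Record<string, string | undefined>,
): RobotsMode {
  // The explicit escape hatch wins over branch detection.
  const override = env["POMERIUM_DOCS_ROBOTS_MODE"];
  if (override === "allow" || override === "disallow") {
    return override;
  }

  // Assumption: the branch name comes from BRANCH, falling back to HEAD.
  const branch = env["BRANCH"] ?? env["HEAD"] ?? "";

  // Anything that is not a numbered release branch stays blocked,
  // including builds where no branch name is available at all.
  return RELEASE_BRANCH.test(branch) ? "allow" : "disallow";
}
```

In a build script this would be called as `resolveRobotsMode(process.env)`. Defaulting to "disallow" when nothing matches keeps the failure mode safe: an unrecognized branch is blocked rather than indexed.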

Verification

Verified locally with full builds:

  • HEAD=main BRANCH=main CONTEXT=branch-deploy yarn build -> build/robots.txt is Disallow: /
  • HEAD=0-33-0 BRANCH=0-33-0 CONTEXT=branch-deploy yarn build -> build/robots.txt is the normal allow template
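
If this check were ever automated, a helper along these lines could classify the generated file. It is a simplified sketch, not part of this PR: `blocksAllCrawlers` is a hypothetical name, it only looks for `Disallow: /` inside a `User-agent: *` group, and it ignores any `Allow` rules that might refine that group.

```typescript
// Returns true when the robots.txt text contains a "User-agent: *" group
// with a "Disallow: /" rule. Simplification: Allow rules are not considered.
export function blocksAllCrawlers(robotsTxt: string): boolean {
  let inWildcardGroup = false;
  let lastWasUserAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Drop comments and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    if (line === "") continue;

    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; a User-agent line
      // after any rule starts a new group.
      if (!lastWasUserAgent) inWildcardGroup = false;
      if (value === "*") inWildcardGroup = true;
      lastWasUserAgent = true;
      continue;
    }

    lastWasUserAgent = false;
    if (inWildcardGroup && field === "disallow" && value === "/") {
      return true;
    }
  }
  return false;
}
```

Under this sketch, the contents of build/robots.txt from the main build would be expected to return true and the release-branch build false, assuming the allow template has no blanket `Disallow: /` for all user agents.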

Fixes ENG-3768

desimone requested a review from a team as a code owner March 19, 2026 04:44
desimone requested review from nickytonline and removed request for a team March 19, 2026 04:44

netlify bot commented Mar 19, 2026

Deploy Preview for pomerium-docs ready!

Name Link
🔨 Latest commit 5e69bc9
🔍 Latest deploy log https://app.netlify.com/projects/pomerium-docs/deploys/69bc417621893600074038cd
😎 Deploy Preview https://deploy-preview-2122--pomerium-docs.netlify.app

desimone (Contributor, Author) commented

Updated this to be branch-aware instead of a manual-revert hack.

  • non-release branch deploys / previews stay Disallow: /
  • numbered release branches like 0-33-0 automatically write the normal allow template
  • verified locally with both build modes

So when we cut 0-33-0, the release branch emits the right robots.txt without a follow-up cleanup commit.

desimone marked this pull request as draft March 19, 2026 16:52

Add robots.txt to main branch that disallows all crawlers. Branch
deploys (e.g., main.docs.pomerium.com) were returning 404 for
robots.txt, allowing LLM scrapers to freely index stale content.
Production (0-32-0 branch) is unaffected; it has its own permissive
robots.txt.

Fixes ENG-3768

desimone force-pushed the bdd/eng-3768-block-llm-crawlers-branch-deploys branch from 62879b0 to 5e69bc9 March 19, 2026 18:33
desimone marked this pull request as ready for review March 19, 2026 18:33