feat(enrichment): harden Open Graph browser mode against anti-bot pages#2724

Merged
Innei merged 1 commit into master from feat/og-browser-hardening
May 15, 2026

@Innei Innei commented May 14, 2026

Summary

  • Stop the screenshot pipeline from capturing Cloudflare/Akamai challenge pages and from caching HTTP 4xx/5xx bodies as if they were normal HTML.
  • Inject a realistic UA, Accept-Language, and a Chromium stealth arg on every agent-browser invocation.
  • Replace the blind `wait 1500` with `wait --load networkidle` (10s cap), then parse the main document request via `network requests --status 400-599` so failures throw before the row enters the cache.
  • Detect challenge pages by signature (title + first 8KB of HTML), retry once via reload, then throw a typed `ChallengeBlockedError`. Log `challenge-error` at `info`; reserve `warn` for unexpected fetch faults.
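
The signature check in the last bullet could look roughly like the sketch below. The pattern lists, the `looksLikeChallengePage` name, and the constant names are illustrative assumptions, not the signatures shipped in this PR; the 8KB limit is approximated here with string length rather than byte length.

```typescript
// Hypothetical signature lists: the real ones live in the enrichment module
// and are not reproduced in this description.
const CHALLENGE_TITLE_PATTERNS: RegExp[] = [
  /just a moment/i,
  /attention required/i,
  /access denied/i,
];
const CHALLENGE_HTML_MARKERS: string[] = [
  "cf-chl-",
  "challenge-platform",
  "_cf_chl_opt",
];

// Only the head of the document is scanned (assumption: 8 * 1024 UTF-16
// code units stands in for "first 8KB").
const HTML_SCAN_LIMIT = 8 * 1024;

function looksLikeChallengePage(title: string, html: string): boolean {
  // A matching title is enough on its own.
  if (CHALLENGE_TITLE_PATTERNS.some((re) => re.test(title))) {
    return true;
  }
  // Otherwise look for known markers near the top of the markup.
  const head = html.slice(0, HTML_SCAN_LIMIT).toLowerCase();
  return CHALLENGE_HTML_MARKERS.some((marker) => head.includes(marker));
}
```

Capping the scan keeps the check cheap on large pages and avoids false positives from article bodies that merely mention a challenge marker far down the document.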

Spec: docs/superpowers/specs/2026-05-15-og-browser-hardening-design.md

Test plan

  • `pnpm -C apps/core run test test/src/modules/enrichment/` — 237 / 237 pass (40 directly exercise the rewrite)
  • `pnpm exec eslint` on changed files — clean
  • `pnpm exec prettier --check` on changed files — clean
  • Staging smoke against a known CF-protected URL (manual; not gated)

Stops the screenshot pipeline from capturing Cloudflare/Akamai challenge
pages and from caching HTTP 4xx/5xx bodies as if they were normal HTML.

Changes:
- Inject realistic UA, Accept-Language, and a single chromium stealth arg
  (--disable-blink-features=AutomationControlled) on every agent-browser
  invocation.
- Replace the blind 1.5s wait with `wait --load networkidle` (10s cap).
- Parse the main document request via `network requests --status 400-599`
  and throw before the row enters the cache.
- Detect challenge pages by signature (title + first 8KB of html); retry
  once via reload + networkidle, then throw a typed ChallengeBlockedError
  so the failure surfaces distinctly from generic fetch errors.
- Drop the log level for ChallengeBlockedError from warn to info; expected
  anti-bot signals should not flood on-call dashboards.
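
As a rough illustration of the retry-once flow and the log-level split described above: only the `ChallengeBlockedError` name comes from this change. The `PageSession` interface, `snapshotOrThrow`, and `logLevelFor` are hypothetical stand-ins for the real agent-browser wrapper and logger wiring.

```typescript
// Typed error so callers can tell an anti-bot block from a generic fetch fault.
class ChallengeBlockedError extends Error {
  readonly url: string;
  constructor(url: string) {
    super(`Challenge page persisted after reload: ${url}`);
    this.name = "ChallengeBlockedError";
    this.url = url;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface PageSnapshot {
  title: string;
  html: string;
}

// Hypothetical abstraction over the browser session (assumption).
interface PageSession {
  snapshot(): Promise<PageSnapshot>;
  reloadAndWait(): Promise<void>; // reload + wait for network idle
}

async function snapshotOrThrow(
  url: string,
  session: PageSession,
  isChallenge: (snap: PageSnapshot) => boolean,
): Promise<PageSnapshot> {
  let snap = await session.snapshot();
  if (!isChallenge(snap)) return snap;

  // Exactly one retry: some challenges clear themselves after a reload.
  await session.reloadAndWait();
  snap = await session.snapshot();
  if (isChallenge(snap)) throw new ChallengeBlockedError(url);
  return snap;
}

// Expected anti-bot blocks log at info; everything else stays at warn.
function logLevelFor(err: unknown): "info" | "warn" {
  return err instanceof ChallengeBlockedError ? "info" : "warn";
}
```

Because the error is thrown before anything is returned, a caller that caches only successful snapshots never stores the challenge page.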

See docs/superpowers/specs/2026-05-15-og-browser-hardening-design.md for
the full design rationale and rollout notes.
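
For the status check on the main document request, a minimal sketch follows. It assumes the output of `network requests --status 400-599` has already been parsed into records; the `RequestRecord` shape, `HttpStatusError`, and `assertMainDocumentOk` are hypothetical names, and redirects are not handled.

```typescript
// Hypothetical parsed form of one failed request (assumption: the real
// output format of the CLI is not shown in this PR description).
interface RequestRecord {
  url: string;
  status: number;
  resourceType: string; // e.g. "document", "image", "script"
}

class HttpStatusError extends Error {
  readonly status: number;
  constructor(url: string, status: number) {
    super(`Main document returned HTTP ${status}: ${url}`);
    this.name = "HttpStatusError";
    this.status = status;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Compare URLs without their fragment, since fragments never reach the server.
function stripFragment(url: string): string {
  return url.split("#")[0];
}

// Throws if the main document itself failed; failing subresources are ignored.
function assertMainDocumentOk(targetUrl: string, failed: RequestRecord[]): void {
  const target = stripFragment(targetUrl);
  const main = failed.find(
    (r) => r.resourceType === "document" && stripFragment(r.url) === target,
  );
  if (main && main.status >= 400 && main.status <= 599) {
    throw new HttpStatusError(main.url, main.status);
  }
}
```

Running this check before writing to the cache is what keeps 4xx/5xx bodies from being stored as if they were normal HTML.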
safedep Bot commented May 14, 2026

SafeDep Report Summary

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep Github App

@Innei Innei merged commit 383e0e4 into master May 15, 2026
11 checks passed