Context
After #176, the fetch provider chain has four tiers:
| tier | cost | ships in local? |
| --- | --- | --- |
| doi | free, pure Python (Crossref/Unpaywall) | always on |
| httpx | free | always on |
| curl_cffi | hard pip dep, libcurl-impersonate native binaries (~5-10MB), no container | always on after uv sync |
| flaresolverr (Byparr) | headless Chromium container, ~1GB image, ~500MB-1GB RAM idle | not yet wired |
The first three tiers handle the cell.com / Cloudflare case from the original bug report and the large majority of real-world fetches with zero user action. The chain self-degrades cleanly when Byparr is absent.
PR #176 stubs the flaresolverr provider but defers the actual Byparr container to a follow-up. This issue tracks how that follow-up should land so it doesn't make local execution painful.
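As a rough illustration of the self-degrading behaviour described above, the sketch below filters the four tiers down to those usable in the current environment. The `Provider` type and `build_chain` helper are hypothetical names invented for this example, not the code from #176; the only detail taken from this issue is that the flaresolverr tier depends on `FETCH_FLARESOLVERR_URL`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Provider:
    """Hypothetical stand-in for a fetch tier; not the project's real class."""
    name: str
    # Decides whether the tier can run given the environment mapping.
    is_available: Callable[[Mapping[str, str]], bool]


def build_chain(env: Mapping[str, str]) -> list[str]:
    """Return the tier names, in order, that are usable under `env`."""
    tiers = [
        Provider("doi", lambda e: True),
        Provider("httpx", lambda e: True),
        Provider("curl_cffi", lambda e: True),
        # Assumed gate: the tier joins the chain only when the URL is set.
        Provider(
            "flaresolverr",
            lambda e: bool(e.get("FETCH_FLARESOLVERR_URL", "").strip()),
        ),
    ]
    return [p.name for p in tiers if p.is_available(env)]
```

Calling `build_chain(os.environ)` on a machine without Byparr would yield only the first three tiers, which is the degraded-but-working state local users should see by default.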
Problem
The local stack already runs a lot:
- graph-db, write-db, redis, hatchet (+ its own postgres) = 5 infra containers
- api + 6 workers = 7 service processes
- frontend dev server
Auto-starting a headless Chromium container on top makes the "clone and just setup" path materially worse on a laptop, in exchange for unblocking the long tail of WAFs that even curl_cffi can't beat. That trade-off is right for the hosted/SaaS deployment but wrong for local-only users.
Proposal
Ship Byparr in docker-compose.yml under a Docker Compose profile, off by default.
- Compose profile:

      services:
        byparr:
          image: ghcr.io/thephaseless/byparr:latest
          profiles: ["heavy-fetch"]
          ports: ["8191:8191"]

  Opt in via docker compose --profile heavy-fetch up -d.
- just recipe next to just up / just up-all:
  - just up-heavy-fetch: start infra + Byparr
  - Make it discoverable in just --list
- Docs section in README / CLAUDE.md under "Optional services":
  - When to enable it (hitting non-academic WAFs that defeat curl_cffi)
  - Resource cost (~1GB image, ~500MB-1GB RAM idle)
  - The single env var to flip: FETCH_FLARESOLVERR_URL=http://byparr:8191/v1
- Hosted deployment flips the profile on in its own compose override / k8s manifest, so prod gets the full chain without forcing it on local dev. Same shape as any production-only sidecar.
- Audit-trail UX hint (optional): when every tier fails and flaresolverr is unavailable, the source detail page could link to the docs section explaining how to enable it. Cheap, high signal.
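The optional audit-trail hint could be decided by a small pure function along these lines. This is only a sketch: `failure_hint`, `DOCS_HINT`, and the shape of the failed-tier list are assumptions made for illustration, and the real audit UI may record failures differently.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical hint text; the real wording and link depend on where the docs land.
DOCS_HINT = 'See "Optional services" in the README to enable the heavy-fetch profile.'

# The tiers that are always on locally, per the table in Context.
WIRED_TIERS = frozenset({"doi", "httpx", "curl_cffi"})


def failure_hint(failed_tiers: Sequence[str], env: Mapping[str, str]) -> Optional[str]:
    """Return a docs hint only when every wired tier failed and Byparr is not configured."""
    all_failed = WIRED_TIERS.issubset(failed_tiers)
    byparr_configured = bool(env.get("FETCH_FLARESOLVERR_URL", "").strip())
    if all_failed and not byparr_configured:
        return DOCS_HINT
    return None
```

Keeping the decision separate from rendering means the source detail page only has to display the returned string when it is not `None`.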
Non-goals
- […] flaresolverr cleanly when FETCH_FLARESOLVERR_URL is unset.
Acceptance criteria
- byparr service added to docker-compose.yml under profiles: ["heavy-fetch"]
- just up-heavy-fetch recipe added
- […] flaresolverr shows as unavailable in the audit UI; with the profile on, a known WAF-protected URL succeeds via flaresolverr
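For the last criterion, a manual smoke check might look like the sketch below. It assumes Byparr accepts the FlareSolverr v1 request shape (a JSON POST with `cmd`, `url`, and `maxTimeout`) and answers with a JSON body whose `status` is `"ok"` on success; `build_solve_request` and `solved` are hypothetical helper names, and the behaviour should be confirmed against the Byparr image actually deployed.

```python
import json
import urllib.request


def build_solve_request(
    endpoint: str, target_url: str, timeout_ms: int = 60000
) -> urllib.request.Request:
    """Build a FlareSolverr-style POST asking the solver to fetch `target_url`."""
    payload = {"cmd": "request.get", "url": target_url, "maxTimeout": timeout_ms}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def solved(response_body: bytes) -> bool:
    """True when the FlareSolverr-style response reports status "ok"."""
    return json.loads(response_body).get("status") == "ok"
```

With the profile up, passing the request to `urllib.request.urlopen` and feeding the response bytes to `solved` gives a quick yes/no answer for a given WAF-protected URL. From the host, the endpoint would be `http://localhost:8191/v1` given the port mapping in the proposal.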