Local execution: opt-in heavy fetch tier (Byparr/FlareSolverr) via Docker Compose profile #178

@charlie83Gs

Context

After #176, the fetch provider chain has four tiers:

| tier | cost | ships in local? |
|---|---|---|
| doi | free, pure Python (Crossref/Unpaywall) | always on |
| httpx | free | always on |
| curl_cffi | hard pip dep, libcurl-impersonate native binaries (~5-10MB), no container | always on after uv sync |
| flaresolverr (Byparr) | headless Chromium container, ~1GB image, ~500MB-1GB RAM idle | not yet wired |

The first three tiers handle the cell.com / Cloudflare case from the original bug report and the large majority of real-world fetches with zero user action. The chain self-degrades cleanly when Byparr is absent.
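
For readers who have not looked at the chain yet, here is a minimal sketch of what "self-degrades cleanly" could look like. `Provider`, `FetchResult`, `fetch_with_chain`, and the audit tuples are hypothetical names for illustration, not the code from #176.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Provider:
    """One tier in the chain (hypothetical shape, not the real class)."""
    name: str
    fetch: Callable[[str], Optional[str]]  # returns the body, or None on failure
    available: Callable[[], bool]          # False when the tier is not wired up


@dataclass
class FetchResult:
    body: Optional[str]
    provider: Optional[str]
    audit: List[Tuple[str, str]] = field(default_factory=list)


def fetch_with_chain(url: str, providers: List[Provider]) -> FetchResult:
    """Try each tier in order; skip unavailable tiers instead of failing."""
    audit: List[Tuple[str, str]] = []
    for provider in providers:
        if not provider.available():
            # An absent Byparr container lands here: recorded, then skipped.
            audit.append((provider.name, "unavailable"))
            continue
        try:
            body = provider.fetch(url)
        except Exception as exc:  # a broken tier must not break the chain
            audit.append((provider.name, f"error: {exc}"))
            continue
        if body is not None:
            audit.append((provider.name, "ok"))
            return FetchResult(body, provider.name, audit)
        audit.append((provider.name, "failed"))
    return FetchResult(None, None, audit)
```

The audit list is what would let the UI distinguish "flaresolverr failed" from "flaresolverr was never available", which matters for the UX hint proposed below.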

PR #176 stubs the flaresolverr provider but defers the actual Byparr container to a follow-up. This issue tracks how that follow-up should land so it doesn't make local execution painful.

Problem

The local stack already runs a lot:

  • graph-db, write-db, redis, hatchet (+ its own postgres) = 5 infra containers
  • api + 6 workers = 7 service processes
  • frontend dev server

Auto-starting a headless Chromium container on top makes the "clone and just setup" path materially worse on a laptop, in exchange for unblocking the long tail of WAFs that even curl_cffi can't beat. That trade-off is right for the hosted/SaaS deployment but wrong for local-only users.

Proposal

Ship Byparr in docker-compose.yml under a Docker Compose profile, off by default.

  1. Compose profile:

    services:
      byparr:
        image: ghcr.io/thephaseless/byparr:latest
        # Only started when the heavy-fetch profile is active
        profiles: ["heavy-fetch"]
        # FlareSolverr-compatible API port
        ports: ["8191:8191"]

    Opt in via docker compose --profile heavy-fetch up -d.

  2. just recipe next to just up / just up-all:

    • just up-heavy-fetch — start infra + Byparr
    • Make it discoverable in just --list
  3. Docs section in README / CLAUDE.md under "Optional services":

    • When to enable it (hitting non-academic WAFs that defeat curl_cffi)
    • Resource cost (~1GB image, ~500MB-1GB RAM idle)
    • The single env var to flip: FETCH_FLARESOLVERR_URL=http://byparr:8191/v1
  4. Hosted deployment flips the profile on in its own compose override / k8s manifest, so prod gets the full chain without forcing it on local dev. Same shape as any production-only sidecar.

  5. Audit-trail UX hint (optional): when every tier fails and flaresolverr is unavailable, the source detail page could link to the docs section explaining how to enable it. Cheap, high signal.
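
A minimal sketch of how the single env var could gate the tier. It assumes Byparr accepts the FlareSolverr-style v1 JSON request (`cmd`, `url`, `maxTimeout`); the helper names `flaresolverr_url` and `build_request` are hypothetical, and only `FETCH_FLARESOLVERR_URL` comes from this issue.

```python
import json
import os
from typing import Mapping, Optional

ENV_VAR = "FETCH_FLARESOLVERR_URL"


def flaresolverr_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured endpoint, or None when the tier is off.

    An unset or blank variable means "profile not enabled", so the
    provider should report itself as unavailable rather than erroring.
    """
    value = env.get(ENV_VAR, "").strip()
    return value or None


def build_request(target_url: str, max_timeout_ms: int = 60000) -> bytes:
    """Encode a FlareSolverr-style request body for POSTing to the endpoint."""
    payload = {
        "cmd": "request.get",
        "url": target_url,
        "maxTimeout": max_timeout_ms,
    }
    return json.dumps(payload).encode("utf-8")
```

With the profile off the variable stays unset, `flaresolverr_url()` returns `None`, and the provider can mark itself unavailable without any network call.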

Non-goals

Acceptance criteria

  • byparr service added to docker-compose.yml under profiles: ["heavy-fetch"]
  • just up-heavy-fetch recipe added
  • README / CLAUDE.md "Optional services" section documents the trade-off and the env var
  • k8s manifest (or hosted compose override) enables Byparr by default for the SaaS deployment
  • Smoke test: with the profile off, the chain works end-to-end and flaresolverr shows as unavailable in the audit UI; with the profile on, a known WAF-protected URL succeeds via flaresolverr
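
For the smoke test, one cheap way to tell "profile off" from "profile on" is a TCP probe of the configured endpoint before asserting on the audit UI. This is a sketch only; `is_reachable` is a hypothetical helper and the real test may check availability differently.

```python
import socket
from urllib.parse import urlparse


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    if not parsed.hostname:
        return False
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A smoke test could call `is_reachable("http://localhost:8191/v1")` and expect `False` with the profile off and `True` after `docker compose --profile heavy-fetch up -d`.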
