
Fetch Details agent (was: Scrape step) #106

@rafacm

Description

What to build

Migrate the Scrape pipeline step to a Pydantic AI agent as an SDK swap (single LLM call, no tools). This establishes the agent shape so that future PRs can grow it. Fully rename Scrape → Fetch Details (breaking change) across the status enum, file names, env vars, configure wizard, telemetry, docs, and diagrams.
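The rename is a breaking change that touches many files, so a quick scan for leftover old names can help confirm that nothing was missed. Below is a minimal sketch. It assumes a plain-text, line-by-line scan is good enough. RENAMES and find_stale_names are hypothetical names, and the old/new pairs are only the ones listed in this issue. The status literal is flagged only when it is quoted, so prose like "web scraping" passes. Episode.scraped_html is not flagged, because renaming that field is out of scope.

```python
from __future__ import annotations

import re

# Hypothetical leftover check for the Scrape -> Fetch Details rename.
# Keys are old names from this issue; values are replacements (None = removed).
# Episode.scraped_html is intentionally absent: renaming it is out of scope.
RENAMES: dict[str, str | None] = {
    "scraping": "fetching_details",  # Episode.Status value (quoted literal only)
    "scraper.py": "fetch_details_step.py",
    "RAGTIME_SCRAPING_MODEL": "RAGTIME_FETCH_DETAILS_MODEL",
    "RAGTIME_SCRAPING_API_KEY": "RAGTIME_FETCH_DETAILS_API_KEY",
    "RAGTIME_SCRAPING_PROVIDER": None,
    "get_scraping_provider": None,
}


def find_stale_names(text: str) -> list[tuple[int, str, str | None]]:
    """Return (line_number, old_name, replacement) for every leftover old name."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for old, new in RENAMES.items():
            if old == "scraping":
                # Only the quoted status literal, not the English word.
                pattern = r"[\"']scraping[\"']"
            else:
                # Word boundaries keep e.g. RAGTIME_SCRAPING_MODEL_X from matching.
                pattern = rf"\b{re.escape(old)}\b"
            if re.search(pattern, line):
                hits.append((lineno, old, new))
    return hits
```

Pointing this at a repository walk (for example, every *.py, *.md and .env.sample file) would give a rough checklist for the rename commit.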

Detailed plan: doc/plans/2026-04-28-fetch-details-agent.md.

Delivered as one PR on feature/fetch-details-agent with two commits:

  1. Rename (status enum, files, env vars, configure wizard, recovery can_handle literal, docs, diagrams) + status data migration.
  2. Agent migration (introduce episodes/agents/_model.py and episodes/agents/fetch_details.py, refactor fetch_details_step.py to call the agent, delete get_scraping_provider).
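The status data migration in commit 1 amounts to a reversible value rename on two columns. The minimal sketch below keeps the logic on plain dicts so that it runs without Django. rename_rows, forward and backward are hypothetical names. The sketch also assumes that ProcessingStep.step_name stores the same "scraping" value as the status enum. In a real migration, the same mapping would typically sit in forward/reverse functions passed to migrations.RunPython and use a queryset update.

```python
OLD_VALUE = "scraping"
NEW_VALUE = "fetching_details"


def rename_rows(rows: list[dict], field: str, src: str, dst: str) -> int:
    """Set row[field] = dst wherever it equals src; return how many rows changed."""
    changed = 0
    for row in rows:
        if row.get(field) == src:
            row[field] = dst
            changed += 1
    return changed


def forward(episodes: list[dict], steps: list[dict]) -> int:
    # Django equivalent (sketch, inside a RunPython function):
    #   Episode.objects.filter(status=OLD_VALUE).update(status=NEW_VALUE)
    #   ProcessingStep.objects.filter(step_name=OLD_VALUE).update(step_name=NEW_VALUE)
    return rename_rows(episodes, "status", OLD_VALUE, NEW_VALUE) + rename_rows(
        steps, "step_name", OLD_VALUE, NEW_VALUE
    )


def backward(episodes: list[dict], steps: list[dict]) -> int:
    # Reverse mapping, so the migration can be unapplied.
    return rename_rows(episodes, "status", NEW_VALUE, OLD_VALUE) + rename_rows(
        steps, "step_name", NEW_VALUE, OLD_VALUE
    )
```

Because rows that already hold the new value are left alone, running the forward step twice is harmless.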
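Commit 2 moves the LLM call into the agent, while fetch_details_step.py keeps the orchestration. That orchestration includes the fast-path skip and the empty-field-only merge named in the acceptance criteria. The minimal sketch below shows that split under several assumptions:

- EpisodeStub and DETAIL_FIELDS are hypothetical stand-ins for the Django model and for the EpisodeDetails fields, which this issue does not list.
- The fast path is assumed to mean "every detail field is already filled".
- The agent is injected as a callable that returns a plain dict. For a Pydantic EpisodeDetails instance, that dict would come from details.model_dump().

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical detail fields; the real EpisodeDetails schema may differ.
DETAIL_FIELDS = ("title", "description", "image_url")


@dataclass
class EpisodeStub:
    """Stand-in for the Django Episode model, with only the fields used here."""
    scraped_html: str = ""
    title: str = ""
    description: str = ""
    image_url: str = ""


def fetch_episode_details(episode: EpisodeStub, run_agent: Callable[[str], dict]) -> bool:
    """Return True if the agent was called, False if the fast path skipped it."""
    # Fast path (assumed criterion): nothing left to fill, so skip the LLM call.
    if all(getattr(episode, name) for name in DETAIL_FIELDS):
        return False
    details = run_agent(episode.scraped_html)
    # Empty-field-only merge: agent output never overwrites an existing value.
    for name in DETAIL_FIELDS:
        if not getattr(episode, name) and details.get(name):
            setattr(episode, name, details[name])
    return True
```

Keeping these rules outside the agent means the agent module stays free of Django, and the merge policy can change without touching the prompt.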

Acceptance criteria

  • episodes/agents/fetch_details.py exposes EpisodeDetails, get_agent(), run(html) and imports only Pydantic AI / Pydantic / stdlib / agents/_model.py (no Django, no DBOS, no episodes.models).
  • episodes/agents/_model.py provides a pure build_model(model_string, api_key) helper; recovery agent refactored to use it.
  • scraper.py → fetch_details_step.py; agent.py/browser.py/deps.py/tools.py/resume.py → recovery_*.py.
  • Status enum value scraping → fetching_details; a data migration updates the Episode.status and ProcessingStep.step_name rows.
  • Env vars: RAGTIME_SCRAPING_PROVIDER removed; RAGTIME_SCRAPING_MODEL → RAGTIME_FETCH_DETAILS_MODEL (Convention B, e.g. openai:gpt-4o-mini); RAGTIME_SCRAPING_API_KEY → RAGTIME_FETCH_DETAILS_API_KEY. .env.sample and core/management/commands/configure.py updated.
  • STEP_FUNCTIONS[Episode.Status.FETCHING_DETAILS] points at episodes.fetch_details_step.fetch_episode_details; PIPELINE_STEPS updated; get_scraping_provider deleted from episodes/providers/factory.py.
  • Fast-path skip and empty-field-only merge behavior preserved in the orchestrator.
  • Recovery AgentStrategy.can_handle() literal flipped "scraping" → "fetching_details".
  • Excalidraw diagrams updated: ragtime-processing-pipeline, ragtime-processing-pipeline-with-recovery, ragtime-recovery (both .excalidraw and .svg).
  • Docs: README.md pipeline table, doc/README.md step descriptions, doc/features/2026-04-28-fetch-details-agent.md, planning + implementation session transcripts in doc/sessions/, CHANGELOG.md ## 2026-04-28 entry with **BREAKING** marker.
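The env var change can be validated once at startup. The hypothetical read_fetch_details_config helper sketched below reads RAGTIME_FETCH_DETAILS_MODEL in the provider:model form (Convention B, e.g. openai:gpt-4o-mini) plus RAGTIME_FETCH_DETAILS_API_KEY. It returns the unsplit model string and the key, matching the build_model(model_string, api_key) signature. It also refuses to run while any of the old RAGTIME_SCRAPING_* variables are still set. That refusal is an assumed way to surface the breaking change; this issue does not specify it.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Variables this issue removes or renames (breaking change).
LEGACY_VARS = (
    "RAGTIME_SCRAPING_PROVIDER",
    "RAGTIME_SCRAPING_MODEL",
    "RAGTIME_SCRAPING_API_KEY",
)


def read_fetch_details_config(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (model_string, api_key) for build_model, validating Convention B."""
    stale = [name for name in LEGACY_VARS if name in env]
    if stale:
        # Assumed policy: fail loudly instead of silently ignoring old settings.
        raise RuntimeError("Rename or remove legacy env vars: " + ", ".join(stale))
    model_string = env.get("RAGTIME_FETCH_DETAILS_MODEL", "")
    # Convention B: "<provider>:<model>", split on the first colon for validation only.
    provider, sep, model_name = model_string.partition(":")
    if not (sep and provider and model_name):
        raise ValueError(
            f"RAGTIME_FETCH_DETAILS_MODEL must look like 'provider:model', got {model_string!r}"
        )
    return model_string, env.get("RAGTIME_FETCH_DETAILS_API_KEY", "")
```

Passing a plain dict instead of os.environ keeps the helper testable without touching the process environment.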

Out of scope (future work)

  • Giving the agent tools (browser use, etc.)
  • Migrating the Download step to its own agent
  • Deleting the recovery agent
  • Unit/eval tests for the agent
  • Renaming the Episode.scraped_html model field
  • Migrating summarize/extract/resolve/translate/embed steps

Blocked by

None; can start immediately.
