
refactor: drop legacy single-file shape + provider-agnostic / fireflyframework-* docs sweep #12

Merged
ancongui merged 1 commit into main from refactor/drop-legacy-single-file
May 15, 2026

Conversation

@ancongui
Contributor

Summary

  • Drop the legacy document single-file branch. The HTTP surface is now one shape only: documents is a non-empty list (single file = one-element list). Cleaned up across DTOs (ExtractionRequest, SubmitJobRequest, ExtractionResult, DocumentInfo, ExtractedDocument), the orchestrator's is_multi_file heuristic, the submit-job handler's schema_json shape (no more document_content_base64 keys), the async job worker, the bbox-refine worker, and the controller's size-limit check. Every unit + LLM test was updated to construct requests with documents=[DocumentInput(...)].
  • Provider-agnostic + framework-naming docs sweep. Bare pyfly -> fireflyframework-pyfly and bare agentic -> fireflyframework-agentic in prose across the docs (Python module paths like pyfly.eda.* stay as-is, since that is the import name). genai-prices is now described as the multi-provider pricing source (Anthropic / OpenAI / Google / Mistral) and not "Anthropic tariffs". Anthropic prompt caching is explicitly marked Anthropic-specific (no-op on other providers).
  • Full step-by-step quickstart in the README: prerequisites → install → provider-agnostic env setup → db + migrations → API + worker boot → first sync extraction → first async multi-file job with a transformation + webhook → Docker stack alternative → test surface. The "what you get back" table also drops the stale "async is single-file only" caveat.
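
For readers skimming the first bullet, here is a minimal sketch of what "one shape only" means for a request DTO. It uses plain dataclasses rather than whatever model layer the service actually uses, and the `filename` and `content_base64` fields on `DocumentInput` are assumptions made for illustration; only the class names and the `documents=[DocumentInput(...)]` construction come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentInput:
    # Hypothetical fields; the real DTO may use different names.
    filename: str
    content_base64: str


@dataclass(frozen=True)
class ExtractionRequest:
    documents: list[DocumentInput]

    def __post_init__(self) -> None:
        # The only accepted shape is a non-empty list of documents.
        if not self.documents:
            raise ValueError("documents must be a non-empty list")


# A single file is simply a one-element list.
single = ExtractionRequest(
    documents=[DocumentInput(filename="invoice.pdf", content_base64="JVBERi0=")]
)
```

With this shape there is no separate single-file code path to validate, so checks such as the size limit can iterate over `documents` unconditionally.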

Test plan

  • task lint:check — ruff + ruff-format + pyright all clean (0 errors).
  • task test — 237 passed, 1 skipped, < 6 s. Every previously-touched suite (request validator, submit-job handler, bbox-refine worker, judge escalator, real-LLM extraction) was rewritten for the new shape.
  • CI: lint, typecheck, unit, docker-build all green on the PR.
  • Manual: render the new quickstart in GitHub's markdown preview and confirm anchors + code blocks display correctly.

…ramework-* docs sweep

The HTTP surface now exposes a single shape -- `documents` is always a
non-empty list (one entry = single file), removing the `document`
single-file branch from DTOs, the orchestrator, the submit/job worker,
the bbox-refine worker, and the controller's size-limit check. The
`schema_json` written by `SubmitJobHandler` is now a single shape too:
always `{ documents: [...] }`, no more `document_content_base64`
fallback. Every test was updated to construct requests with
`documents=[DocumentInput(...)]`.
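
A rough sketch of the single `schema_json` shape described above, assuming files arrive as name/bytes pairs. The helper `build_schema_json` and the per-document keys (`filename`, `content_base64`) are hypothetical and not taken from `SubmitJobHandler`; only the top-level `{ documents: [...] }` shape and the absence of `document_content_base64` come from this commit.

```python
import base64
import json


def build_schema_json(files: list[tuple[str, bytes]]) -> str:
    """Serialize files into the single `{ documents: [...] }` shape."""
    if not files:
        raise ValueError("at least one document is required")
    documents = [
        {
            # Hypothetical per-document keys, for illustration only.
            "filename": name,
            "content_base64": base64.b64encode(data).decode("ascii"),
        }
        for name, data in files
    ]
    # No legacy top-level `document_content_base64` key is ever written.
    return json.dumps({"documents": documents})


payload = json.loads(build_schema_json([("a.pdf", b"%PDF-")]))
```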

Docs sweep alongside the refactor:

- Rename bare `pyfly` -> `fireflyframework-pyfly` and bare `agentic`
  -> `fireflyframework-agentic` in prose across README, overview,
  architecture, pipeline, deployment, cicd, prompts, troubleshooting,
  and api-reference. Python module paths (`pyfly.eda.*`) are kept
  since they are the actual import names.
- Make provider mentions provider-agnostic. `genai-prices` is now
  described as the multi-provider pricing source (Anthropic / OpenAI
  / Google / Mistral), not "Anthropic tariffs". Prompt caching is
  marked explicitly Anthropic-specific (no-op on other providers).
  `outbound_call` log-line target enumeration now lists all
  providers `fireflyframework-genai` can resolve.
- Rewrite the README quickstart end-to-end: prerequisites, install,
  provider-agnostic env setup, db + migrations, API + worker boot,
  first sync extraction, first async multi-file job with a
  transformation + webhook, docker stack alternative, and the test
  surface. Updates the "what you get back" section to drop the
  "single-file only on async" caveat.
@ancongui ancongui merged commit 6782cc2 into main May 15, 2026
4 checks passed
@ancongui ancongui deleted the refactor/drop-legacy-single-file branch May 15, 2026 23:09
ancongui added a commit that referenced this pull request May 31, 2026
…ramework-* docs sweep (#12)

Co-authored-by: ancongui <andres.contreras@soon.es>