name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: Non-blocking async LLM extraction with streaming, retry, and job orchestration"
labels: enhancement
assignees: ''
📝 Description
The current implementation of POST /forms/fill performs LLM extraction synchronously using `requests.post()` within the FastAPI request lifecycle.
Execution chain:
`forms.py` → `controller.fill_form()` → `llm.get_data()` → `requests.post()`
Because FastAPI runs on an asyncio event loop (via Uvicorn), this synchronous HTTP call blocks the event loop thread for the entire duration of Ollama inference.
Operational impact:
- Each field extraction may take 2–5 seconds on CPU inference.
- With sequential `main_loop()` (N fields), total blocking time scales linearly.
- A 10-field form can hold the event loop for 20–50 seconds.
- Even `main_loop_batch()` still blocks for the full duration of a single inference call.
This results in:
- Event loop starvation
- Inability to serve concurrent requests
- 30+ second client response latency
- No retry mechanism for missed fields
- No per-field confidence visibility
- No observable progress during extraction
- No fault tolerance for partial success scenarios
💡 Rationale
PR #151 (currently open) proposes schema enforcement using Ollama's `format` parameter and dynamically generated Pydantic models.
While that meaningfully improves output structural reliability, it does not address:
- Transport-layer blocking
- Concurrency limitations
- Retry logic for null/missed fields
- Event loop starvation
- Client-side observability
- Job orchestration
In high-impact public-sector deployments, a system must not only return structured output but must also:
- Remain responsive under load
- Provide progressive feedback
- Handle partial failures gracefully
- Support operational monitoring
This issue proposes a transport and orchestration layer redesign to meet those requirements.
🛠️ Proposed Solution
1️⃣ Asynchronous Concurrent Extraction
- Replace `requests` with `httpx.AsyncClient`
- Introduce `async_extract_all_streaming()` in `src/llm.py`
- Launch per-field extraction tasks via `asyncio.create_task()`
- Collect results using `asyncio.as_completed()`
This ensures:
- Wall-clock time is bounded by the slowest field
- Partial results become available immediately
- Event loop remains free to serve other requests
If 9 fields resolve in 3 seconds and 1 resolves in 8 seconds:
- Client receives 9 results at second 3
- Final result at second 8
The current synchronous implementation returns nothing until second 8.
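The flow above can be sketched with plain `asyncio`. This is a minimal illustration, not the proposed `async_extract_all_streaming()` itself: `extract_field()` is a hypothetical stand-in that simulates per-field inference latency where the real code would await an `httpx.AsyncClient` call to Ollama.

```python
import asyncio

async def extract_field(field: str, delay: float) -> tuple[str, str]:
    await asyncio.sleep(delay)          # stands in for one LLM inference call
    return field, f"value-for-{field}"

async def extract_all(fields: dict[str, float]) -> dict[str, str]:
    # one task per field, launched concurrently
    tasks = [asyncio.create_task(extract_field(name, d)) for name, d in fields.items()]
    results: dict[str, str] = {}
    for finished in asyncio.as_completed(tasks):
        field, value = await finished   # fastest fields resolve first
        results[field] = value          # partial results available immediately
    return results

results = asyncio.run(extract_all({"name": 0.01, "dob": 0.03, "address": 0.02}))
```

Because results are consumed via `asyncio.as_completed()`, each field can be forwarded to the client (or persisted) the moment it resolves, rather than after the slowest field.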
2️⃣ Two-Pass Auto-Retry Mechanism
After Pass 1 completes:
- Any field returning `None` enters Pass 2.
- `_build_targeted_prompt()` constructs a focused single-field prompt.
- Explicit instruction: return `-1` if not found.
- Retry tasks are launched concurrently.
Confidence scoring:
| Confidence | Meaning |
|---|---|
| high | Extracted in Pass 1 |
| medium | Recovered in Pass 2 |
| low | Missing after both passes |
This provides deterministic field-level reliability reporting.
3️⃣ Non-Blocking PDF Generation
- Introduce `fill_form_with_data()` in `filler.py`
- Offload pdfrw operations to a `ThreadPoolExecutor` via `loop.run_in_executor()`
- Prevent CPU-bound PDF writes from blocking the event loop
Correctness improvement:
`None` values are written as empty strings instead of the literal `"None"`
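A minimal sketch of the offloading pattern, assuming a stand-in `write_pdf()` where the real `fill_form_with_data()` would run the pdfrw logic; it also shows the `None` → `""` fix.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def write_pdf(data: dict) -> dict:
    # blocking, CPU-bound work runs in a worker thread;
    # None values become empty strings, never the literal "None"
    return {k: ("" if v is None else str(v)) for k, v in data.items()}

async def fill_form_with_data(data: dict) -> dict:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # the event loop stays free to serve other requests meanwhile
        return await loop.run_in_executor(pool, write_pdf, data)

filled = asyncio.run(fill_form_with_data({"name": "Ada", "phone": None}))
```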
4️⃣ New Client-Observable API Surfaces
POST /forms/fill/stream
- Returns `text/event-stream`
- Emits one Server-Sent Event per field as soon as it resolves
- Event payload includes:
- field
- value
- confidence
- phase
- Final `complete` event includes:
  - submission_id
  - output_pdf_path
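The wire format of those events can be sketched as follows. `sse_format()` is a hypothetical helper (not an existing FireForm function), and the example values are illustrative only; the payload keys follow the lists above.

```python
import json

def sse_format(event: str, payload: dict) -> str:
    # one Server-Sent Event: named event, JSON data line, blank-line terminator
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

field_event = sse_format("field", {
    "field": "applicant_name",   # illustrative field name
    "value": "Jane Doe",
    "confidence": "high",
    "phase": "pass1",
})
complete_event = sse_format("complete", {
    "submission_id": 42,
    "output_pdf_path": "/output/42.pdf",   # illustrative path
})
```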
POST /forms/fill/async
- Returns 202 with `job_id`
- Full extraction pipeline runs as a FastAPI `BackgroundTask`
GET /forms/jobs/{job_id}
Returns:
- status (pending, running, complete, failed)
- partial_results
- field_confidence
- output_pdf_path
- error_message
5️⃣ Database Orchestration Layer
Introduce a `FillJob` SQLModel table:
- UUID primary key
- template_id
- input_text
- status
- output_pdf_path
- partial_results (JSON)
- field_confidence (JSON)
- error_message
- created_at
Repository functions:
- create_job
- get_job
- update_job (**kwargs for partial updates)
This enables incremental persistence of extraction progress.
✅ Acceptance Criteria
- Event loop no longer blocked by extraction
- Concurrent per-field extraction implemented
- Two-pass retry operational
- Field-level confidence scoring implemented
- SSE streaming endpoint functional
- Async job queue endpoint functional
- FillJob model implemented with incremental updates
- Original `/forms/fill` endpoint preserved
- Comprehensive test coverage
📌 Additional Context
This redesign is transport- and orchestration-focused.
It is fully compatible with schema enforcement approaches such as PR #151 and does not conflict with model-level validation strategies.