
Add LLM request timeout and retry with exponential backoff#441

Open
Arijit429 wants to merge 5 commits into fireform-core:main from Arijit429:feat/llm-timeout-retry-with-backoff

Conversation


Arijit429 commented Apr 16, 2026

Closes #228
Closes #277
Closes #13

Summary

Adds timeout, retry with exponential backoff, and structured logging to
the LLM extraction pipeline in src/llm.py.

Problem

The current requests.post() call to Ollama has:

  1. No timeout — hangs forever if Ollama is slow or unresponsive,
    blocking the server thread indefinitely
  2. No retry — a single transient failure (network hiccup, Ollama
    momentarily overloaded) kills the entire form fill permanently
  3. No structured logging — uses print() with no log levels or
    field context, making debugging impossible in production

Changes

src/llm.py

  • Added 120-second timeout to all Ollama requests
  • Added retry logic (3 attempts) with exponential backoff (2s → 4s → 8s)
  • Smart retry: retries timeouts/connection errors/5xx, does NOT retry 4xx
  • Extracted _call_ollama() method for clean separation and testability
  • Replaced all print() with logging.getLogger("fireform.llm")
  • Added per-field logging so extraction failures are traceable

Testing

  • Existing test suite passes without modification
  • Verified health check and error handling endpoints still work
  • Timeout triggers cleanly when Ollama is unreachable

Changes Summary

| Change | Why |
| --- | --- |
| 120 s request timeout | Prevents infinite hangs when Ollama is unresponsive |
| 3× retry with exponential backoff | Handles transient LLM failures gracefully |
| Smart retry (skip 4xx) | Client errors are permanent; retrying wastes time |
| _call_ollama() method | Clean, testable, reusable LLM call logic |
| Structured logging | Replaces print(), adds field names and log levels |

Real-world impact

Ensures firefighter extraction requests never hang indefinitely when Ollama is
slow on the local machine — critical for reliable field use.

- Add HTTPException handler for consistent error shape across all routes
- Add RequestValidationError handler with human-readable error messages
- Add catch-all Exception handler to prevent stack trace leakage
- Fix duplicate get_template() call in forms.py (was querying DB twice)
- Wrap Controller errors in AppError for safe client-facing messages
- All errors now return uniform {success, error: {code, message}} envelope
…file

- Add GET /health liveness probe for Docker and container orchestration
- Migrate database init from module-level to FastAPI lifespan context manager
- Fix Dockerfile: start uvicorn server instead of tail -f /dev/null
- Fix Dockerfile: correct PYTHONPATH from /app/src to /app
- Add Docker HEALTHCHECK directive using /health endpoint
- Add EXPOSE 8000 for container port documentation
- Add FastAPI metadata (title, description, version) for API docs
- Enforce 20 MB max upload size (returns 413 if exceeded)
- Validate PDF magic bytes to reject non-PDF files renamed to .pdf
- Reject empty file uploads with clear 400 error
- Add matching client-side size and empty file checks for instant UX feedback
- Server-side validation is the security authority, client checks are UX only
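The server-side checks in this commit amount to something like the sketch below. UploadError and validate_pdf_upload are hypothetical names; the 20 MB limit and status codes follow the bullets above.

```python
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # 20 MB cap from the commit message
PDF_MAGIC = b"%PDF-"                 # magic bytes that open every valid PDF


class UploadError(Exception):
    """Carries the HTTP status the route should return."""
    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status


def validate_pdf_upload(data: bytes) -> None:
    """Raise UploadError if the upload is empty, oversized, or not a PDF."""
    if not data:
        raise UploadError(400, "Empty file upload")
    if len(data) > MAX_UPLOAD_BYTES:
        raise UploadError(413, "File exceeds the 20 MB upload limit")
    if not data.startswith(PDF_MAGIC):
        raise UploadError(400, "File is not a valid PDF")
```

The client-side checks mirror these for instant feedback, but as the commit notes, only this server-side path is the security authority.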
- Add 120s timeout to prevent indefinite request hangs
- Add retry logic (3 attempts) with exponential backoff (2s, 4s, 8s)
- Retry on timeouts, connection errors, and 5xx server errors
- Do not retry on 4xx client errors (permanent failures)
- Extract _call_ollama() method for testability
- Replace print() statements with structured logging
- Add per-field logging for extraction debugging
