Add input validation with length limits, whitespace sanitization, and Pydantic V2 migration #447

Open
Arijit429 wants to merge 6 commits into fireform-core:main from Arijit429:feat/input-text-validation-and-sanitization

Conversation

Arijit429 commented Apr 16, 2026

Closes #376
Closes #137

Summary

Adds input validation and sanitization to prevent empty, whitespace-only,
or oversized text from reaching the LLM pipeline, and migrates schemas
to Pydantic V2 model_config.

Problem

  1. No input length limit: input_text accepts strings of any size.
    A 10 MB string goes directly into the Ollama prompt payload, potentially
    crashing the LLM or causing extreme timeouts.

  2. No whitespace handling: " " (all spaces) passes validation
    but sends an effectively empty prompt to the LLM, returning garbage.

  3. No template name validation: empty or whitespace-only template
    names get saved to the database.

  4. Deprecated Pydantic config: class Config generates deprecation
    warnings on every pytest run (visible in test output).

Changes

api/schemas/forms.py

  • Added min_length=1, max_length=50000 to input_text
  • Added field_validator to reject whitespace-only input and auto-strip
  • Migrated class Config to model_config
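
A minimal sketch of how the three changes above might fit together in a Pydantic V2 schema. The class name FormFillSketch, the is_accepted helper, the error message, and the from_attributes option are assumptions made for illustration; the actual schema in api/schemas/forms.py may differ.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator


class FormFillSketch(BaseModel):
    """Hypothetical stand-in for the schema in api/schemas/forms.py."""

    # V2 replacement for the deprecated inner `class Config`.
    # `from_attributes=True` is an assumed option, shown only as an example.
    model_config = ConfigDict(from_attributes=True)

    # Length limits from the PR: at least 1 and at most 50000 characters.
    input_text: str = Field(..., min_length=1, max_length=50000)

    @field_validator("input_text")
    @classmethod
    def strip_and_reject_blank(cls, value: str) -> str:
        # Runs after the length checks: strip, then reject if nothing is left.
        stripped = value.strip()
        if not stripped:
            raise ValueError("input_text must not be empty or whitespace-only")
        return stripped


def is_accepted(text: str) -> bool:
    """Helper for the examples: True if the text passes validation."""
    try:
        FormFillSketch(input_text=text)
    except ValidationError:
        return False
    return True
```

With this sketch, FormFillSketch(input_text="  hello  ").input_text comes back stripped, while an empty string, a whitespace-only string, and a 50001-character string are all rejected with a ValidationError.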

api/schemas/templates.py

  • Added min_length=1, max_length=200 to template name
  • Added field_validator to reject whitespace-only names and auto-strip
  • Added min_length=1 to pdf_path
  • Migrated class Config to model_config
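
The template changes could look roughly like the sketch below. The class name TemplateCreateSketch, the helper, and the from_attributes option are hypothetical; only the field names and limits come from the list above.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator


class TemplateCreateSketch(BaseModel):
    """Hypothetical stand-in for the schema in api/schemas/templates.py."""

    # Assumed option, shown only to illustrate the model_config style.
    model_config = ConfigDict(from_attributes=True)

    name: str = Field(..., min_length=1, max_length=200)
    pdf_path: str = Field(..., min_length=1)

    @field_validator("name")
    @classmethod
    def strip_and_reject_blank_name(cls, value: str) -> str:
        # Default "after" mode: the Field length limits have already run
        # on the raw, unstripped value by the time this is called.
        stripped = value.strip()
        if not stripped:
            raise ValueError("name must not be empty or whitespace-only")
        return stripped


def template_is_accepted(name: str, pdf_path: str) -> bool:
    """Helper for the examples: True if both fields pass validation."""
    try:
        TemplateCreateSketch(name=name, pdf_path=pdf_path)
    except ValidationError:
        return False
    return True
```

One consequence of this ordering is that a name that only exceeds 200 characters because of surrounding spaces is still rejected, since max_length is checked before stripping. If that matters, a mode="before" validator that strips first would be one alternative.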

Testing

All validation paths and existing endpoints verified:

  • Empty string → rejected with min_length validation
  • Whitespace-only → rejected with field_validator
  • Health check → still returns healthy
  • Existing tests → all pass

Changes Summary

| File | Change | Why |
|---|---|---|
| api/schemas/forms.py | input_text length limits | Prevents oversized LLM payloads |
| api/schemas/forms.py | Whitespace validator | Prevents empty prompts to LLM |
| api/schemas/forms.py | model_config migration | Fixes Pydantic V2 deprecation |
| api/schemas/templates.py | Template name validation | Prevents empty names in DB |
| api/schemas/templates.py | pdf_path min_length | Prevents empty paths to Controller |
| api/schemas/templates.py | model_config migration | Fixes Pydantic V2 deprecation |

- Add HTTPException handler for consistent error shape across all routes
- Add RequestValidationError handler with human-readable error messages
- Add catch-all Exception handler to prevent stack trace leakage
- Fix duplicate get_template() call in forms.py (was querying DB twice)
- Wrap Controller errors in AppError for safe client-facing messages
- All errors now return uniform {success, error: {code, message}} envelope
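
The uniform envelope described above can be sketched without any framework. This is an illustration, not the handlers from the commit: the function names and the "VALIDATION_ERROR" code are hypothetical, and success is assumed to be False for error responses. The error dicts follow the loc/msg shape that Pydantic and FastAPI report for validation failures.

```python
from typing import Any


def error_envelope(code: str, message: str) -> dict[str, Any]:
    """Build the {success, error: {code, message}} shape for any failure."""
    return {"success": False, "error": {"code": code, "message": message}}


def summarize_validation_errors(errors: list[dict[str, Any]]) -> str:
    """Turn validation error dicts into one human-readable message."""
    parts = []
    for err in errors:
        # Drop the leading "body" segment that FastAPI adds to locations.
        loc = [str(part) for part in err.get("loc", ()) if part != "body"]
        field = ".".join(loc) or "request"
        parts.append(f"{field}: {err.get('msg', 'invalid value')}")
    return "; ".join(parts)


# Example: what a validation handler might return for an empty input_text.
sample_errors = [
    {
        "loc": ("body", "input_text"),
        "msg": "String should have at least 1 character",
        "type": "string_too_short",
    }
]
response_body = error_envelope(
    "VALIDATION_ERROR",  # hypothetical error code
    summarize_validation_errors(sample_errors),
)
```

Keeping the envelope in one helper means each exception handler only decides the status code and error code, so every route returns the same shape.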
…file

- Add GET /health liveness probe for Docker and container orchestration
- Migrate database init from module-level to FastAPI lifespan context manager
- Fix Dockerfile: start uvicorn server instead of tail -f /dev/null
- Fix Dockerfile: correct PYTHONPATH from /app/src to /app
- Add Docker HEALTHCHECK directive using /health endpoint
- Add EXPOSE 8000 for container port documentation
- Add FastAPI metadata (title, description, version) for API docs
- Enforce 20 MB max upload size (returns 413 if exceeded)
- Validate PDF magic bytes to reject non-PDF files renamed to .pdf
- Reject empty file uploads with clear 400 error
- Add matching client-side size and empty file checks for instant UX feedback
- Server-side validation is the security authority, client checks are UX only
- Add 120s timeout to prevent indefinite request hangs
- Add retry logic (3 attempts) with exponential backoff (2s, 4s, 8s)
- Retry on timeouts, connection errors, and 5xx server errors
- Do not retry on 4xx client errors (permanent failures)
- Extract _call_ollama() method for testability
- Replace print() statements with structured logging
- Add per-field logging for extraction debugging
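
The retry behaviour in the list above can be sketched with the standard library alone. The exception classes and the call_with_retry name are hypothetical, and the sketch reads "3 attempts" as up to three retries after the initial call so that all three delays (2s, 4s, 8s) are used; the commit may count attempts differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for timeouts, connection errors, and 5xx responses."""


class PermanentError(Exception):
    """Hypothetical marker for 4xx responses, which are never retried."""


def call_with_retry(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == retries:
                raise  # out of retries, surface the last failure
            # Delays double each time: 2s, 4s, 8s with the defaults.
            sleep(base_delay * 2 ** attempt)
        # PermanentError is not caught, so it propagates immediately.
    raise AssertionError("unreachable")
```

Passing sleep as a parameter is a design choice for testability: a test can pass a list's append method to record the delays instead of actually waiting.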
…itization

- Add min_length=1 and max_length=50000 to input_text field
- Add whitespace-only rejection via field_validator
- Auto-strip leading/trailing whitespace from input before LLM
- Add template name validation (min 1, max 200 chars)
- Add pdf_path minimum length validation
- Fix deprecated class Config to model_config in both schema files
- Prevents empty prompts and oversized payloads reaching LLM pipeline

Arijit429 (Author) commented

Context

The input_text field had no validation — any string (empty, whitespace-only, or megabytes long) went directly into the LLM prompt. Added length limits, whitespace rejection, and auto-stripping. Also migrated both schema files from deprecated class Config to Pydantic V2 model_config, eliminating the deprecation warnings visible in pytest output.

Development

Successfully merging this pull request may close these issues.

  • Validate input_text in fill_form endpoint
  • [BUG]: Pydantic V2 Deprecation Warning