Add input validation with length limits, whitespace sanitization, and Pydantic V2 migration #447
Open
Arijit429 wants to merge 6 commits into fireform-core:main from
- Add HTTPException handler for consistent error shape across all routes
- Add RequestValidationError handler with human-readable error messages
- Add catch-all Exception handler to prevent stack trace leakage
- Fix duplicate get_template() call in forms.py (was querying DB twice)
- Wrap Controller errors in AppError for safe client-facing messages
- All errors now return uniform {success, error: {code, message}} envelope
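A minimal, framework-free sketch of the uniform error envelope described above. AppError is named in the commit, but its constructor, the to_response helper, and the INTERNAL_ERROR code are assumptions for illustration. In FastAPI, mappings like these would be registered with @app.exception_handler(...) and returned as a JSONResponse.

```python
from typing import Any


class AppError(Exception):
    """Error whose message is safe to show to API clients (hypothetical signature)."""

    def __init__(self, code: str, message: str, status_code: int = 400) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.status_code = status_code


def error_envelope(code: str, message: str) -> dict[str, Any]:
    # Uniform shape from the PR: {success, error: {code, message}}
    return {"success": False, "error": {"code": code, "message": message}}


def to_response(exc: Exception) -> tuple[int, dict[str, Any]]:
    """Map any exception to (HTTP status, JSON body) without leaking internals."""
    if isinstance(exc, AppError):
        return exc.status_code, error_envelope(exc.code, exc.message)
    # Catch-all: never echo str(exc) or a traceback back to the client.
    return 500, error_envelope("INTERNAL_ERROR", "An unexpected error occurred.")
```

The catch-all branch deliberately discards the exception text, so details such as connection strings or file paths stay in server logs rather than in responses.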
…file
- Add GET /health liveness probe for Docker and container orchestration
- Migrate database init from module-level to FastAPI lifespan context manager
- Fix Dockerfile: start uvicorn server instead of tail -f /dev/null
- Fix Dockerfile: correct PYTHONPATH from /app/src to /app
- Add Docker HEALTHCHECK directive using /health endpoint
- Add EXPOSE 8000 for container port documentation
- Add FastAPI metadata (title, description, version) for API docs
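A sketch of moving database setup from import time into a lifespan context manager, written with only the standard library so it runs without FastAPI installed. init_db, the events log, and the health() response body are placeholders, not the project's code; FastAPI accepts such a context manager via FastAPI(lifespan=lifespan).

```python
import asyncio
from contextlib import asynccontextmanager

# Records the order of lifecycle events for the demo below.
events: list[str] = []


def init_db() -> None:
    # Hypothetical stand-in for the database init that used to run at module level.
    events.append("db_init")


@asynccontextmanager
async def lifespan(app):
    # Runs once at startup, before the first request is served.
    init_db()
    yield
    # Code after `yield` runs once at shutdown (e.g. closing connections).
    events.append("shutdown")


def health() -> dict:
    # Liveness probe body for GET /health; the exact shape is an assumption.
    return {"status": "ok"}


async def simulate_app_run() -> list[str]:
    """Drive the lifespan the way a server would: startup, requests, shutdown."""
    events.clear()
    async with lifespan(app=None):
        events.append(f"request:{health()['status']}")
    return list(events)
```

Keeping initialization out of module import means importing the app (for tests or tooling) no longer touches the database as a side effect.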
- Enforce 20 MB max upload size (returns 413 if exceeded)
- Validate PDF magic bytes to reject non-PDF files renamed to .pdf
- Reject empty file uploads with clear 400 error
- Add matching client-side size and empty-file checks for instant UX feedback
- Server-side validation is the security authority; client checks are UX only
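A minimal sketch of the three server-side upload checks, in the order empty, size, then magic bytes. The UploadRejected exception and the use of 20 × 1024 × 1024 bytes for "20 MB" are assumptions; the PR does not say whether the limit is decimal or binary megabytes. PDF files begin with the header %PDF-, which is what the magic-byte check looks for.

```python
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumed interpretation of "20 MB"
PDF_MAGIC = b"%PDF-"                 # standard PDF file header prefix


class UploadRejected(Exception):
    """Hypothetical error carrying the HTTP status the endpoint should return."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code


def validate_pdf_upload(data: bytes) -> None:
    """Raise UploadRejected if the uploaded bytes should not be processed."""
    if len(data) == 0:
        raise UploadRejected(400, "Uploaded file is empty.")
    if len(data) > MAX_UPLOAD_BYTES:
        raise UploadRejected(413, "File exceeds the 20 MB upload limit.")
    if not data.startswith(PDF_MAGIC):
        # A renamed .docx or .png fails here even with a .pdf extension.
        raise UploadRejected(400, "File is not a valid PDF.")
```

This checks bytes already held in memory and only proves the file starts like a PDF, not that it is well-formed; a production handler would ideally stop reading as soon as the size limit is crossed rather than buffering the whole upload.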
- Add 120s timeout to prevent indefinite request hangs
- Add retry logic (3 attempts) with exponential backoff (2s, 4s, 8s)
- Retry on timeouts, connection errors, and 5xx server errors
- Do not retry on 4xx client errors (permanent failures)
- Extract _call_ollama() method for testability
- Replace print() statements with structured logging
- Add per-field logging for extraction debugging
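A sketch of the retry policy above as a generic wrapper around a single call such as _call_ollama(). The description lists three delays (2s, 4s, 8s) alongside "3 attempts"; this sketch assumes one initial call plus up to three retries so that all three delays occur (set max_retries=2 if three calls in total is meant). HTTPStatusError is hypothetical, and the 120 s timeout would be set on the HTTP call itself (for example timeout=120 if the client is requests), whose exceptions would need mapping onto TimeoutError/ConnectionError or added to is_retryable.

```python
import logging
import time
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical error carrying the HTTP status returned by the LLM server."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(exc: Exception) -> bool:
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    if isinstance(exc, HTTPStatusError):
        return exc.status_code >= 500  # 5xx is transient; 4xx is permanent
    return False


def call_with_retry(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for fast tests
) -> T:
    attempt = 0
    while True:
        try:
            return call()
        except Exception as exc:
            if attempt >= max_retries or not is_retryable(exc):
                raise
            delay = base_delay * (2 ** attempt)  # 2s, 4s, 8s
            logger.warning("LLM call failed (%s); retrying in %.0fs", exc, delay)
            sleep(delay)
            attempt += 1
```

Passing sleep as a parameter lets tests assert the backoff schedule without actually waiting.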
…itization
- Add min_length=1 and max_length=50000 to input_text field
- Add whitespace-only rejection via field_validator
- Auto-strip leading/trailing whitespace from input before it reaches the LLM
- Add template name validation (min 1, max 200 chars)
- Add pdf_path minimum length validation
- Fix deprecated class Config to model_config in both schema files
- Prevent empty prompts and oversized payloads from reaching the LLM pipeline
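A minimal Pydantic V2 sketch of the input_text rules in this commit. The class name FormFillRequest is hypothetical, and the real schema in api/schemas/forms.py likely has other fields. Because field_validator runs in "after" mode by default, the length constraints are checked on the raw string first: an all-space string passes min_length=1 and is then rejected by the validator, and max_length applies before stripping.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class FormFillRequest(BaseModel):  # hypothetical name for the forms request schema
    input_text: str = Field(min_length=1, max_length=50_000)

    @field_validator("input_text")
    @classmethod
    def strip_and_reject_blank(cls, value: str) -> str:
        # Runs after the built-in length checks on the unstripped value.
        stripped = value.strip()
        if not stripped:
            raise ValueError("input_text must not be empty or whitespace-only")
        # Returning the stripped value means the LLM sees the cleaned text.
        return stripped
```

In FastAPI, a validation failure while parsing a request body surfaces as a 422 response by default, which the RequestValidationError handler from the first commit can reshape into the error envelope.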
Closes #376
Closes #137
Summary
Adds input validation and sanitization to prevent empty, whitespace-only,
or oversized text from reaching the LLM pipeline, and migrates schemas
to Pydantic V2 model_config.
Problem
No input length limit: input_text accepts strings of any size. A 10 MB string goes directly into the Ollama prompt payload, potentially crashing the LLM or causing extreme timeouts.
No whitespace handling: "   " (all spaces) passes validation but sends an effectively empty prompt to the LLM, which then returns meaningless output.
No template name validation: empty or whitespace-only template names get saved to the database.
Deprecated Pydantic config: class Config generates deprecation warnings on every pytest run (visible in test output).
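For the template-name and deprecated-config problems above, a Pydantic V2 sketch in the spirit of the api/schemas/templates.py changes. The class name TemplateCreate and the ConfigDict contents are assumptions: the description does not say what the old class Config set, so from_attributes=True (the V2 name for orm_mode) is only a placeholder.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator


class TemplateCreate(BaseModel):  # hypothetical class name
    # Pydantic V2 style: replaces the deprecated inner `class Config:`.
    # Assumption: the old Config set orm_mode = True, renamed from_attributes in V2.
    model_config = ConfigDict(from_attributes=True)

    name: str = Field(min_length=1, max_length=200)
    pdf_path: str = Field(min_length=1)

    @field_validator("name")
    @classmethod
    def strip_and_reject_blank_name(cls, value: str) -> str:
        stripped = value.strip()
        if not stripped:
            raise ValueError("name must not be empty or whitespace-only")
        return stripped
```

Declaring model_config as a class attribute is what silences the class-based-config deprecation warning in Pydantic V2.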
Changes
api/schemas/forms.py
- Add min_length=1, max_length=50000 to input_text
- Add field_validator to reject whitespace-only input and auto-strip
- class Config → model_config
api/schemas/templates.py
- Add min_length=1, max_length=200 to template name
- Add field_validator to reject whitespace-only names and auto-strip
- Add min_length=1 to pdf_path
- class Config → model_config
Testing
All validation paths and existing endpoints verified:
Changes Summary
api/schemas/forms.py
api/schemas/templates.py