Compliance Analysis Platform - Automated legal document and fire safety compliance analysis. Improved version of hackathon-ai-juridique-atos, originally developed during the Hackathon AI & GenAI - Legal & Compliance @Atos.
- Document Processing: OCR extraction from PDF, Word, and PowerPoint via Mistral API
- Dual Domain Analysis:
  - Contract Risk: Identification and categorization of contractual risks (Financial, Liability, Regulatory, etc.)
  - Fire Safety: Inspection findings and regulatory compliance (ERP/IGH)
- AI Summaries: Generated risk/finding summaries and compliance analysis
- Export: Download results in JSON and Markdown formats
- Web Interface: Streamlit UI with pipeline stepper and domain switching
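As a rough illustration of how the two domains could be described in one place, here is a minimal sketch. The `DomainConfig` class, its field names, and the `get_domain` helper are hypothetical and are not taken from `config/domains.py`; only the domain labels and the checklist directories come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    """Hypothetical per-domain settings; field names are illustrative only."""
    key: str
    label: str
    item_name: str      # what the extraction step produces for this domain
    checklist_dir: str  # directory holding the compliance checklist


# Labels and directories follow the README; the dictionary keys are assumptions.
DOMAINS = {
    "contract": DomainConfig("contract", "Contract Risk", "risk", "data/kcp"),
    "fire_safety": DomainConfig("fire_safety", "Fire Safety", "finding", "data/regulatory"),
}


def get_domain(key: str) -> DomainConfig:
    """Look up a domain and fail loudly on an unknown key."""
    try:
        return DOMAINS[key]
    except KeyError:
        raise ValueError(f"Unknown domain {key!r}; expected one of {sorted(DOMAINS)}") from None
```

Keeping both domains behind one lookup would let the UI switch domains by changing a single key, which is one plausible way to support the domain switching described above.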
conformia/
├── main.py                          # Entry point (launches Streamlit)
├── pyproject.toml                   # Project config & dependencies
├── env.example                      # Template for API keys
├── backend/
│   ├── ocr.py                       # OCR via Mistral API
│   ├── baml_access.py               # Accessors for BAML-generated client
│   └── baml/
│       ├── baml_src/                # BAML schemas (.baml files)
│       └── baml_client/             # Generated BAML Python client
├── config/
│   ├── settings.py                  # Env loading, constants
│   └── domains.py                   # Domain configs (Contract / Fire Safety)
├── data/
│   ├── ocr/                         # OCR outputs (markdown)
│   ├── kcp/                         # KCP checklist (contract compliance)
│   └── regulatory/                  # ERP/IGH checklist (fire safety)
└── frontend/
    ├── app.py                       # Main Streamlit application
    ├── styles.py                    # Custom CSS theme
    ├── components/renders/
    │   ├── document_renderer.py     # Upload + preview + OCR trigger
    │   ├── extraction_renderer.py   # Structured extraction display
    │   └── analysis_renderer.py     # Summary + compliance analysis
    └── utils/
        ├── display.py               # UI helpers (PDF/markdown/JSON display)
        └── utils.py                 # Session state, pipeline, save/load
- Python 3.13
- Streamlit - Web UI
- BAML - Structured LLM extraction with generated client
- Mistral API - OCR processing
- OpenRouter - LLM models (DeepSeek Chat, GPT-4.1-mini, GPT-OSS-120b)
- UV - Dependency management
- Python 3.13
- UV (recommended)
- API keys: Mistral (OCR) + OpenRouter (LLM)
# Install dependencies
uv sync
# Configure environment
cp env.example .env
# Then edit .env with your API keys
# Recommended
uv run python main.py
# With hot reload (development)
uv run python main.py -- --server.runOnSave=true
# Direct Streamlit
uv run streamlit run frontend/app.py
After modifying schemas in backend/baml/baml_src/:
cd backend/baml && baml generate
- Open the Streamlit URL printed in the terminal
- Select a domain: Contract Risk or Fire Safety
- Upload a document (PDF, DOCX, PPTX)
- Follow the pipeline:
  - Document: Preview and OCR result
  - Extraction: Structured risks or findings
  - Analysis: Summary and compliance checklist
- Download results (JSON/Markdown)
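The export step can be pictured with a small sketch. The result shape (`title`, `items`, `category`, `description`) and the function names are assumptions made for illustration; the actual structure produced by the BAML client is not documented here.

```python
import json


def to_json(results: dict) -> str:
    """Serialize results as indented JSON, keeping accented characters readable."""
    return json.dumps(results, indent=2, ensure_ascii=False)


def to_markdown(results: dict) -> str:
    """Render results as a title followed by one bullet per item."""
    lines = [f"# {results['title']}", ""]
    for item in results["items"]:
        lines.append(f"- **{item['category']}**: {item['description']}")
    return "\n".join(lines) + "\n"


# Hypothetical sample; real output depends on the document and the domain.
sample = {
    "title": "Contract Risk Summary",
    "items": [
        {"category": "Financial", "description": "Uncapped late-payment penalties"},
    ],
}
```

Calling `to_json(sample)` or `to_markdown(sample)` returns a string that could be handed to a download button. `ensure_ascii=False` is chosen so that non-ASCII text in source documents stays legible in the exported file.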
Environment variables (copy env.example to .env):
MISTRAL_API_KEY=... # Required for OCR
OPENROUTER_API_KEY=... # Required for LLM extraction/analysis
LLM clients are configured in backend/baml/baml_src/clients.baml.
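A startup check for the two required keys might look like the sketch below. The `load_api_keys` function is hypothetical and is not the code in `config/settings.py`; it also assumes the variables are already present in the process environment (how `.env` gets loaded is not stated in this README).

```python
import os

# Variable names come from env.example as described above.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "OPENROUTER_API_KEY")


def load_api_keys(env=None) -> dict:
    """Return the required keys, or raise if any is missing or empty.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED_KEYS}
```

Failing at startup with the names of the missing variables tends to be easier to diagnose than a 401 response later in the pipeline.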
- 401 Unauthorized: Check that your API keys in .env are valid and active
- OCR errors: Verify file type and Mistral quota/connectivity
- BAML client mismatch: Regenerate after changing baml_src schemas (cd backend/baml && baml generate)
- Large PDFs: Very large files may be slow; try smaller samples
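To rule out the file-type cause of OCR errors before uploading, a quick local check along these lines can help. The `is_supported` helper is hypothetical; the extension list mirrors the formats named in this README (PDF, DOCX, PPTX) and says nothing about what the Mistral API itself accepts.

```python
from pathlib import Path

# Formats listed in this README; an assumption about what the app accepts.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx"}


def is_supported(filename: str) -> bool:
    """Check the final file extension, ignoring case."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

This only inspects the file name, so a renamed file with the wrong contents would still pass.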
This project is an improved and refactored version of hackathon-ai-juridique-atos, originally developed during the Hackathon AI & GenAI - Legal & Compliance @Atos.
Key improvements over the original:
- Added Fire Safety domain alongside Contract Risk analysis
- Restructured codebase with clean separation (backend/frontend/config)
- Migrated to BAML for structured LLM extraction
- Modern Streamlit UI with pipeline stepper and domain switching
- UV for fast, reproducible dependency management
By Joseph ASSOUMA