jass228/conformia

ConformIA

Compliance Analysis Platform - Automated legal document and fire safety compliance analysis. Improved version of hackathon-ai-juridique-atos, originally developed during the Hackathon AI & GenAI - Legal & Compliance @Atos.

Features

  • Document Processing: OCR extraction from PDF, Word, and PowerPoint via Mistral API
  • Dual Domain Analysis:
    • Contract Risk: Identification and categorization of contractual risks (Financial, Liability, Regulatory, etc.)
    • Fire Safety: Inspection findings and regulatory compliance (ERP/IGH)
  • AI Summaries: Generated risk/finding summaries and compliance analysis
  • Export: Download results in JSON and Markdown formats
  • Web Interface: Streamlit UI with pipeline stepper and domain switching

Architecture

conformia/
├── main.py                        # Entry point (launches Streamlit)
├── pyproject.toml                 # Project config & dependencies
├── env.example                    # Template for API keys
├── backend/
│   ├── ocr.py                     # OCR via Mistral API
│   ├── baml_access.py             # Accessors for BAML-generated client
│   └── baml/
│       ├── baml_src/              # BAML schemas (.baml files)
│       └── baml_client/           # Generated BAML Python client
├── config/
│   ├── settings.py                # Env loading, constants
│   └── domains.py                 # Domain configs (Contract / Fire Safety)
├── data/
│   ├── ocr/                       # OCR outputs (markdown)
│   ├── kcp/                       # KCP checklist (contract compliance)
│   └── regulatory/                # ERP/IGH checklist (fire safety)
└── frontend/
    ├── app.py                     # Main Streamlit application
    ├── styles.py                  # Custom CSS theme
    ├── components/renders/
    │   ├── document_renderer.py   # Upload + preview + OCR trigger
    │   ├── extraction_renderer.py # Structured extraction display
    │   └── analysis_renderer.py   # Summary + compliance analysis
    └── utils/
        ├── display.py             # UI helpers (PDF/markdown/JSON display)
        └── utils.py               # Session state, pipeline, save/load

Technologies

  • Python 3.13
  • Streamlit - Web UI
  • BAML - Structured LLM extraction with generated client
  • Mistral API - OCR processing
  • OpenRouter - LLM models (DeepSeek Chat, GPT-4.1-mini, GPT-OSS-120b)
  • UV - Dependency management

Quick Start

Prerequisites

  • Python 3.13
  • UV (recommended)
  • API keys: Mistral (OCR) + OpenRouter (LLM)

Installation

# Install dependencies
uv sync

# Configure environment
cp env.example .env
# Then edit .env with your API keys

Run

# Recommended
uv run python main.py

# With hot reload (development)
uv run python main.py -- --server.runOnSave=true

# Direct Streamlit
uv run streamlit run frontend/app.py
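
The three commands above all end up starting the same Streamlit app. As a rough illustration of how an entry point like main.py could forward extra flags such as --server.runOnSave=true, here is a minimal sketch. It is not the project's actual main.py; the function names build_streamlit_command and main are hypothetical, and the only assumption taken from this README is that the app lives at frontend/app.py.

```python
import subprocess
import sys
from pathlib import Path


def build_streamlit_command(extra_args: list[str]) -> list[str]:
    """Build a `streamlit run` command line, forwarding extra CLI flags.

    Hypothetical helper: the real main.py may be organized differently.
    """
    app_path = Path("frontend") / "app.py"
    # Drop a leading "--" separator, as used in
    # `python main.py -- --server.runOnSave=true`.
    if extra_args and extra_args[0] == "--":
        extra_args = extra_args[1:]
    # `python -m streamlit run <script> [flags]` is equivalent to `streamlit run`.
    return [sys.executable, "-m", "streamlit", "run", app_path.as_posix(), *extra_args]


def main() -> int:
    """Launch Streamlit as a child process and return its exit code."""
    return subprocess.call(build_streamlit_command(sys.argv[1:]))
```

Calling main() from a script's `if __name__ == "__main__":` guard would give the behavior of the first two commands; the third command bypasses any such wrapper entirely.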

Regenerate BAML client

After modifying schemas in backend/baml/baml_src/:

cd backend/baml && baml generate

Usage

  1. Open the Streamlit URL printed in the terminal
  2. Select a domain: Contract Risk or Fire Safety
  3. Upload a document (PDF, DOCX, PPTX)
  4. Follow the pipeline:
    • Document: Preview and OCR result
    • Extraction: Structured risks or findings
    • Analysis: Summary and compliance checklist
  5. Download results (JSON/Markdown)
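
To give a feel for what the JSON and Markdown downloads in step 5 might contain, the sketch below serializes a small result dictionary both ways. The result shape (domain, items, category, description) and the function names are assumptions made for illustration; the actual export schema is defined by the project's BAML types and is not documented here.

```python
import json


def to_json(results: dict) -> str:
    """Serialize results as pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(results, indent=2, ensure_ascii=False)


def to_markdown(results: dict) -> str:
    """Render results as a short Markdown report.

    The keys used here ("domain", "items", "category", "description")
    are hypothetical and only illustrate the idea.
    """
    lines = [f"# {results['domain']} analysis", ""]
    for item in results["items"]:
        lines.append(f"- **{item['category']}**: {item['description']}")
    return "\n".join(lines)


# Example with made-up data:
sample = {
    "domain": "Contract Risk",
    "items": [
        {"category": "Financial", "description": "Uncapped late-payment penalties"},
    ],
}
report = to_markdown(sample)
```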

Configuration

Environment variables (copy env.example to .env):

MISTRAL_API_KEY=...       # Required for OCR
OPENROUTER_API_KEY=...    # Required for LLM extraction/analysis

LLM clients are configured in backend/baml/baml_src/clients.baml.
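
Because both keys are required, it can help to fail fast when one is missing rather than hit a 401 later in the pipeline. The following is a minimal sketch of such a check using only the standard library. It assumes the variables have already been loaded into the process environment (for example from .env); the helper name missing_api_keys is hypothetical and is not taken from config/settings.py.

```python
import os
from collections.abc import Mapping

# Variable names as listed in env.example.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "OPENROUTER_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

A startup routine could call missing_api_keys() and show a clear error listing the returned names before any OCR or LLM request is made.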

Troubleshooting

  • 401 Unauthorized: Check that your API keys in .env are valid and active
  • OCR errors: Verify file type and Mistral quota/connectivity
  • BAML client mismatch: Regenerate after changing baml_src schemas (cd backend/baml && baml generate)
  • Large PDFs: Very large files may be slow; try smaller samples
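
For the "verify file type" step, a quick local check against the formats listed under Usage (PDF, DOCX, PPTX) can rule out the most common cause of OCR errors before anything is uploaded. This is a sketch only: it looks at the file extension, not the file contents, and is_supported_document is a hypothetical helper rather than part of the codebase.

```python
from pathlib import Path

# Extensions corresponding to the formats named in this README.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx"}


def is_supported_document(filename: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```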

Origin

This project is an improved and refactored version of hackathon-ai-juridique-atos, originally developed during the Hackathon AI & GenAI - Legal & Compliance @Atos.

Key improvements over the original:

  • Added Fire Safety domain alongside Contract Risk analysis
  • Restructured codebase with clean separation (backend/frontend/config)
  • Migrated to BAML for structured LLM extraction
  • Modern Streamlit UI with pipeline stepper and domain switching
  • UV for fast, reproducible dependency management

By Joseph ASSOUMA
