DataPilot is a modern Business Intelligence platform designed for small and medium-sized businesses (50-2000 employees). It combines the power of AI-driven natural language queries with traditional SQL exploration and visual dashboard building.
Built as a French sovereign alternative to tools like Looker, Metabase, and Omni Analytics, DataPilot prioritizes data sovereignty with OVH cloud hosting, multi-tenant isolation, and a CSV-first approach that eliminates the need for complex database connectors during early adoption.
The platform leverages Claude AI for text-to-SQL conversion, DuckDB for fast CSV/Parquet query execution, and a modern React frontend with drag-and-drop dashboard building capabilities.
Implemented:
- ✅ JWT Authentication with multi-tenant isolation (30min access + 7-day refresh)
- ✅ CSV Upload with automatic schema inference
- ✅ French CSV support (; delimiter + , decimal separator with auto-detection)
- ✅ SQL Explorer with CodeMirror 6 editor and TanStack Table results
- ✅ DuckDB-powered query execution on CSV/Parquet files
- ✅ Semantic model editor (visual ERD with ReactFlow for table joins)
- ✅ Dashboard builder with react-grid-layout (KPI, Bar, Line, Pie, Table widgets)
- ✅ SavedQuery CRUD (reusable queries)
Roadmap:
- 🔲 AI text-to-SQL with Claude API
- 🔲 PostgreSQL/MySQL database connectors
- 🔲 Spreadsheet view with visual aggregations
- 🔲 Global dashboard filters
- 🔲 Theme customization
Prerequisites:
- Docker & Docker Compose installed
Steps:
# 1. Clone the repository
git clone https://github.com/lucas-zubiarrain/DataPilot.git
cd DataPilot
# 2. Configure environment
cp .env.example .env
# Edit .env with your settings (JWT secret, database credentials, etc.)
# 3. Start all services
docker compose up -d
# 4. Run database migrations
docker compose exec backend alembic upgrade head
# 5. Access the platform
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000/docsDefault Admin Credentials (Development):
- Email:
admin@datapilot.fr - Password:
admin
cd backend
# Create virtual environment
python3.12 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run database migrations
alembic upgrade head
# Start development server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Run tests
pytest tests/ -vcd frontend
# Install dependencies
npm install
# Start development server
npm run dev
# Run tests
npm testgraph TB
Browser[Browser] --> Nginx[Nginx Reverse Proxy :80]
Nginx --> Frontend[Next.js Frontend :3000]
Nginx --> Backend[FastAPI Backend :8000]
Backend --> PostgreSQL[(PostgreSQL 16)]
Backend --> DuckDB[(DuckDB In-Process)]
DuckDB --> Storage[CSV/Parquet Files]
PostgreSQL -.stores.- Metadata[Metadata, Users, Dashboards]
DuckDB -.queries.- Storage
Flow:
- User uploads CSV files via frontend
- Backend converts CSV to Parquet (with French format detection)
- User builds semantic model (ERD) to define table relationships
- User explores data with SQL editor or builds dashboards
- (Future) User asks questions in natural language → Claude API → SQL → Results
DataPilot/
├── backend/
│ ├── app/
│ │ ├── core/ # Security, dependencies, settings
│ │ ├── models/ # SQLAlchemy ORM (User, DataSource, SemanticLayer, Dashboard, SavedQuery)
│ │ ├── schemas/ # Pydantic request/response validation
│ │ ├── routers/ # FastAPI endpoints (7 routers)
│ │ ├── services/ # Business logic (6 services)
│ │ └── migrations/ # Alembic database migrations
│ ├── tests/ # Pytest test suites (1469 lines)
│ ├── Dockerfile
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── app/ # Next.js App Router (10 pages)
│ │ ├── components/
│ │ │ ├── ui/ # shadcn/ui components (16 components)
│ │ │ └── features/ # Business components (Explorer, Dashboard, ERD)
│ │ ├── contexts/ # React contexts (AuthContext)
│ │ └── lib/ # API client, utilities
│ ├── Dockerfile
│ └── package.json
├── nginx/ # Reverse proxy configuration
├── docs/ # API documentation + technical specs
├── docker-compose.yml # Development environment
├── docker-compose.prod.yml # Production environment
├── CLAUDE.md # AI context and project documentation
└── .env.example # Environment variables template
| Component | Library | Why |
|---|---|---|
| Backend Framework | FastAPI | High performance, auto-generated OpenAPI docs, async support |
| ORM | SQLAlchemy 2.0 | Industry standard, excellent async support, type-safe |
| Migrations | Alembic | Standard for SQLAlchemy, version control for database schema |
| Database | PostgreSQL 16 | Robust, multi-tenant ready, JSON support |
| Query Engine | DuckDB | In-process analytical database, native CSV/Parquet support, cross-file JOINs |
| Frontend Framework | Next.js 15 | React-based, App Router, server components, excellent DX |
| UI Components | shadcn/ui | Accessible, customizable, Tailwind-based components |
| SQL Editor | CodeMirror 6 | Modern, lightweight (~400KB vs Monaco's 5MB), SQL highlighting |
| Data Tables | TanStack Table | Headless, MIT license, ~50KB, perfect for SQL results |
| ERD Editor | ReactFlow | React-native nodes/edges, ~180KB, visual relationship mapping |
| Dashboard Grid | react-grid-layout | Industry standard (used by Grafana), drag+resize |
| Charts | Recharts | 5 chart types (KPI, Bar, Line, Pie, Table), React-native |
| File Upload | react-dropzone | Multi-file support, lightweight, great UX |
| Authentication | python-jose + passlib | JWT tokens, bcrypt hashing, secure by default |
| AI Provider | Claude API | Best-in-class text-to-SQL capabilities |
| Infrastructure | Docker Compose | Easy local development, production-ready |
Base URL: /api/v1/
| Endpoint | Description | Documentation |
|---|---|---|
/auth |
Authentication (register, login, refresh, me) | docs/api/auth.md |
/data-sources |
CSV upload, list, preview, delete | docs/api/data-sources.md |
/workspaces |
Workspace CRUD | docs/api/workspaces.md |
/semantic-layers |
Semantic model (ERD) CRUD | docs/api/semantic-layers.md |
/queries |
Execute SQL, manage saved queries | docs/api/queries.md |
/dashboards |
Dashboard and widget CRUD | docs/api/dashboards.md |
/ai |
Text-to-SQL conversion (planned) | Coming soon |
Full API documentation available at http://localhost:8000/docs (Swagger UI) when running locally.
Backend:
- Framework: pytest
- Coverage: 1469 lines across 5 test suites
- Suites: auth, csv_upload, queries, semantic_layers, tenant_isolation
Frontend:
- Framework: Jest + React Testing Library
- Coverage: 15 tests (login, register, middleware)
Run tests:
# Backend
docker compose exec backend pytest tests/ -v
# Frontend
cd frontend && npm testWe welcome contributions! Please see CONTRIBUTING.md for guidelines on:
- Code of conduct
- Development workflow
- Pull request process
- Commit message conventions (Conventional Commits)
Quick tips:
- Always work on feature branches (never on
main) - Run linters before committing
- Write tests for new features
- Keep PRs focused and scoped
Please review our SECURITY.md for:
- Reporting security vulnerabilities
- Security best practices
- Multi-tenant isolation guarantees
Critical rule: All database queries MUST filter by tenant_id. No exceptions.
DataPilot is licensed under the Apache License 2.0. See LICENSE for details.
Copyright 2026 Lucas Zubiarrain.
DataPilot is built on the shoulders of giants:
- FastAPI - Modern Python web framework
- Next.js - React framework for production
- DuckDB - In-process analytical database
- shadcn/ui - Beautiful UI components
- ReactFlow - Visual node editor
- TanStack Table - Headless table library
- CodeMirror - Code editor component
- Recharts - Composable charting library