
AI Data Analyst with Multi-Agent System

A system of specialized AI agents that work together to analyze databases and create visualizations. Ask questions in natural language and get SQL-powered insights with charts.

Key Features

  • CLI & Web UI - Typer CLI + Streamlit interface
  • User Permissions - YAML-based access control to specific databases
  • Smart SQL - Automatic error correction with LLM (up to 3 retries)
  • Company Branding - Configurable chart styles, colors, and metadata injection
  • Automated Tests & Evals - Evaluation framework covering infrastructure tests plus qualitative and quantitative agent analysis
  • Type-Safe Agents - Structured JSON outputs with Pydantic validation
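The YAML-based access control could look roughly like the sketch below. The structure and names (`PERMISSIONS`, `check_access`) are hypothetical stand-ins for illustration; a dict stands in for the parsed YAML file, and the repo's actual schema may differ.

```python
# Hypothetical shape of the parsed permissions YAML: user -> allowed databases.
PERMISSIONS = {
    "alice": ["sales_demo"],
    "bob": ["sales_demo", "hr_internal"],
}

def check_access(user: str, database: str) -> bool:
    """Return True if the user is allowed to query the database."""
    return database in PERMISSIONS.get(user, [])

print(check_access("alice", "sales_demo"))   # True
print(check_access("alice", "hr_internal"))  # False
```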

Documentation

  • Extended README - Detailed CLI reference, configuration, troubleshooting
  • Architecture & Design - Implementation details, guardrails, and design decisions, including how to add a new database and its configuration
  • Deliverables - How assignment requirements are met
  • Limitations - Current constraints and future improvements

Quick Setup

```bash
# 1. Install dependencies
uv sync

# 2. Set your API key
export OPENROUTER_API_KEY="your-key-here"
# You can also put it in a .env file; see .env.example for the structure.

# 3. Run tests
uv run pytest tests/ -v

# 4. Try a query
uv run python -m src.cli analyze "What are the top 5 products by revenue?" --database sales_demo
```

Basic Usage

```bash
# List available databases
uv run python -m src.cli list-databases

# Ask a question (auto-generates SQL, runs query, creates viz if appropriate)
uv run python -m src.cli analyze "Show me sales by category" --user alice --database sales_demo

# View database schema
uv run python -m src.cli schema sales_demo

# Create standalone visualization
uv run python -m src.cli visualize examples/sample_analysis_result.json

# Launch web interface
uv run streamlit run app.py
```

Agent Flow

Architecture Diagram

The system follows a deterministic 6-step pipeline:

1. Permission Check → Validate user can access database
2. Schema Retrieval → Get table/column information + company metadata
3. Clarity Check → Structured JSON output (ClarityResult: can we answer?)
4. Complexity Classification → Structured JSON output (ComplexityResult: simple vs complex)
5. Analysis → ReACT-style CodeAgent that generates and executes SQL (with auto-correction) and produces a summary, followed by guardrail checks
6. Visualization → CodeAgent creates charts if appropriate (with company style)
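Step 5's auto-correction loop can be sketched as follows. The names `run_with_autocorrect`, `execute`, and `correct` are hypothetical; in the real system `correct` would be an LLM call that sees the failing query and the error, and the retry limit mirrors the "up to 3 retries" behaviour described above.

```python
MAX_RETRIES = 3  # mirrors the "up to 3 retries" auto-correction

def run_with_autocorrect(sql, execute, correct):
    """Execute SQL; on error, ask `correct` for a fixed query, up to MAX_RETRIES times."""
    last_err = None
    for attempt in range(MAX_RETRIES + 1):
        try:
            return sql, execute(sql)
        except Exception as err:
            last_err = err
            if attempt < MAX_RETRIES:
                # In the real system this would be an LLM call that sees the error.
                sql = correct(sql, err)
    raise RuntimeError(f"SQL still failing after {MAX_RETRIES} retries") from last_err

# Demo with a fake executor that rejects the first query.
def fake_execute(q):
    if "bad" in q:
        raise ValueError("no such column: bad")
    return [("ok",)]

fixed_sql, rows = run_with_autocorrect(
    "SELECT bad FROM t", fake_execute, lambda q, e: q.replace("bad", "good")
)
print(fixed_sql, rows)  # SELECT good FROM t [('ok',)]
```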

Evaluation Framework

The project includes automated testing:

Infrastructure Tests:

  • Permission system validation
  • Database operations
  • Visualization generation

Agent Evaluation:

  • Clarity Agent: 10 test cases → 100% accuracy
  • Complexity Agent: 14 test cases → 100% accuracy
  • SQL Generation: 20 test cases → Flexible/Strict matching
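"Flexible" versus "Strict" matching for generated SQL could be implemented along these lines. This is an assumed interpretation (function names and normalization rules are hypothetical, not taken from the repo's evaluation code): strict matching compares strings exactly, while flexible matching normalizes case, whitespace, and trailing semicolons first.

```python
import re

def normalize_sql(sql: str) -> str:
    """Lowercase, drop a trailing semicolon, and collapse runs of whitespace."""
    sql = sql.strip().rstrip(";")
    sql = re.sub(r"\s+", " ", sql)
    return sql.lower()

def flexible_match(generated: str, expected: str) -> bool:
    return normalize_sql(generated) == normalize_sql(expected)

def strict_match(generated: str, expected: str) -> bool:
    return generated == expected

print(flexible_match("SELECT *  FROM t;", "select * from t"))  # True
print(strict_match("SELECT *  FROM t;", "select * from t"))    # False
```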

Run all tests:

```bash
# Quick tests (infrastructure only)
uv run pytest tests/ -v

# Full evaluation (includes LLM-based agent tests, ~30-60 min)
uv run python scripts/run_tests.py

# Specific agent evaluation
uv run python -m tests.evaluation.evaluate_clarity
uv run python -m tests.evaluation.evaluate_complexity
```

Future Direction

- Example-based prompting (few-shot). The current prompt has no examples, but in the future one could fetch similar queries (or use a more involved Hydra-like mechanism)
- Longer-term memory / personalization. It could be useful to build metadata around each database

Credits

Built with smolagents, LiteLLM, SQLAlchemy, Matplotlib, Typer, Streamlit
