Autonomous BI Analyst is a full-stack prototype that turns natural-language questions into SQL, executes them safely against a PostgreSQL analytics database, and returns a human-readable answer. It combines:
- a LangGraph orchestration workflow (the agent),
- a Model Context Protocol (MCP) database tool server,
- PostgreSQL with seeded analytics data,
- and a Streamlit chatbot frontend.
```
bi-agent/
├── docker-compose.yml
├── backend/
│   ├── agent/                  # LangGraph agent (FastAPI + graph)
│   │   ├── api/
│   │   │   ├── routes.py       # POST /chat endpoint
│   │   │   └── schemas.py      # Request / response Pydantic models
│   │   ├── config/
│   │   │   ├── prompts.py      # Centralised prompt strings
│   │   │   └── state.py        # AgentState TypedDict
│   │   ├── nodes/              # One function per graph node
│   │   │   ├── guardrail.py
│   │   │   ├── handle_tool_result.py
│   │   │   ├── query_database.py
│   │   │   ├── retry.py
│   │   │   ├── security_warning.py
│   │   │   └── summarize.py
│   │   ├── routers/            # LangGraph conditional edge functions
│   │   │   ├── guardrail.py
│   │   │   ├── query_database.py
│   │   │   └── tool_execution.py
│   │   ├── graph.py            # Graph compilation
│   │   ├── app.py              # FastAPI application factory
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   ├── core/
│   │   ├── llm/                # Provider-agnostic LLM abstraction
│   │   │   ├── base.py         # LLMEngine abstract base class
│   │   │   ├── factory.py      # EngineFactory registry
│   │   │   └── engines/
│   │   │       ├── langchain_anthropic.py     # Claude via LangChain
│   │   │       ├── langchain_azure_openai.py  # Azure OpenAI via LangChain
│   │   │       └── langchain_openai.py        # GPT via LangChain
│   │   ├── logger.py
│   │   └── schemas.py          # Shared ToolResponse schema
│   ├── mcp/                    # MCP tool server (streamable HTTP)
│   │   ├── tools/
│   │   │   ├── describe_table.py
│   │   │   ├── execute_readonly_query.py
│   │   │   └── list_tables.py
│   │   ├── client.py           # Tool loader + cache
│   │   ├── server.py           # FastMCP server entrypoint
│   │   ├── utils.py
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   ├── postgres/
│   │   └── init/
│   │       ├── 001_schema.sql
│   │       ├── 002_seed_data.sql
│   │       └── 003_roles.sql
│   └── utils/
│       └── parsing.py
├── frontend/
│   └── chatbot/
│       ├── app.py
│       ├── Dockerfile
│       └── requirements.txt
└── tests/
    ├── integration/
    │   └── llm/
    │       ├── test_anthropic_engine.py       # Live API tests for Anthropic engine
    │       ├── test_azure_openai_engine.py    # Live API tests for Azure OpenAI engine
    │       └── test_openai_engine.py          # Live API tests for OpenAI engine
    └── unit/
        ├── agent/
        │   └── test_graph_retry.py
        └── mcp/
            └── test_mcp_server.py
```
`docker-compose.yml` defines four services:

| Service | Port | Description |
|---|---|---|
| `postgres` | 5432 | Analytics database |
| `mcp` | 8001 | MCP tool server (internal network) |
| `agent` | 8000 | LangGraph agent + FastAPI API |
| `frontend` | 3000 | Streamlit chatbot |
```bash
# Start all services
docker compose up -d --build

# Stop
docker compose down
```

The LangGraph flow in `backend/agent/graph.py`:

- `guardrail` — classifies whether the request is safe to process
  - route → `query_database` or `security_warning`
- `query_database` — generates SQL and calls the MCP tool
  - route → `execute_tools` or finish
- `handle_tool_result` — inspects tool output for Postgres errors
  - route → `retry` (up to 3×) or `summarize`
- `summarize` — turns query results into a natural-language answer
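The branching after tool execution can be sketched in plain Python. This is only a sketch: the state keys `tool_error` and `retry_count`, and the constant `MAX_RETRIES`, are assumptions, not the real `AgentState` fields.

```python
MAX_RETRIES = 3  # "up to 3×" per the flow above

def route_tool_execution(state: dict) -> str:
    """Decide the next node after handle_tool_result (hypothetical state keys)."""
    if state.get("tool_error") is None:
        return "summarize"                        # tool succeeded
    if state.get("retry_count", 0) < MAX_RETRIES:
        return "retry"                            # give the LLM a chance to fix the SQL
    return "summarize"                            # retries exhausted: explain the failure
```

In the real graph a function like this would be registered as a conditional edge; it is shown standalone here so the branching is easy to test in isolation.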
`backend/core/llm/` provides a provider-agnostic abstraction over LLM calls used throughout the agent.

`LLMEngine` (in `base.py`) uses the Template Method pattern. Each engine subclass implements `_build_llm()` to construct its provider-specific `BaseChatModel`; the base class stores it and exposes two concrete async methods covering the two usage patterns in the agent:
```python
# Structured classification (e.g. guardrail node)
result = await engine.generate_structured(MySchema, messages)    # returns MySchema instance

# Tool-augmented generation (e.g. query_database node)
response = await engine.generate_with_tools(tools, messages)     # returns BaseMessage
```

The active engine is selected at runtime via the `LLM_ENGINE` environment variable — no code changes required when switching providers.
`EngineFactory` (in `factory.py`) maps string keys to engine classes:
```python
# Explicit engine
engine = EngineFactory.create("langchain-anthropic", temperature=0)

# Engine from LLM_ENGINE env var (used by all agent nodes)
engine = EngineFactory.create_default()
```

| Key | Class | Default model / deployment |
|---|---|---|
| `langchain-openai` | `LangChainOpenAIEngine` | `OPENAI_MODEL` env var / `gpt-4o` |
| `langchain-anthropic` | `LangChainAnthropicEngine` | `ANTHROPIC_MODEL` env var / `claude-sonnet-4-6` |
| `langchain-azure-openai` | `LangChainAzureOpenAIEngine` | `AZURE_OPENAI_DEPLOYMENT` env var |
```bash
# Select the active engine (default: langchain-openai)
LLM_ENGINE=langchain-openai

# OpenAI
OPENAI_API_KEY=...
OPENAI_MODEL=gpt-4o                           # optional

# Anthropic
ANTHROPIC_API_KEY=...
ANTHROPIC_MODEL=claude-sonnet-4-6             # optional

# Azure OpenAI
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
AZURE_OPENAI_API_VERSION=2024-08-01-preview   # optional
```

```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
```

This installs the full runtime and all dev dependencies (pytest, pytest-asyncio, python-dotenv) in one step.
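The editable install above implies a `pyproject.toml` with a `dev` extra roughly like the following sketch. Only the dev dependencies are named in this README; the runtime dependency list is assumed.

```toml
[project]
name = "bi-agent"
version = "0.1.0"
# runtime dependencies omitted here; each service also ships its own requirements.txt

[project.optional-dependencies]
dev = ["pytest", "pytest-asyncio", "python-dotenv"]
```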
Copy `.env.example` to `.env` and fill in your API keys (the file is gitignored):
```bash
cp .env.example .env
```

```bash
# All tests
pytest

# Unit tests only
pytest tests/unit/

# Integration tests only (make real API calls — requires valid keys in .env)
pytest -m integration

# Skip integration tests
pytest -m "not integration"
```

```
tests/
├── unit/
│   ├── agent/
│   │   ├── graph/                        # Full graph wiring and routing
│   │   │   ├── test_happy_path.py        # Tool succeeds on first attempt
│   │   │   ├── test_retry.py             # Tool fails then recovers
│   │   │   └── test_security.py          # Guardrail blocks unsafe request
│   │   └── nodes/                        # Individual node input/output
│   │       ├── test_guardrail.py
│   │       ├── test_handle_tool_result.py
│   │       ├── test_query_database.py
│   │       ├── test_retry.py
│   │       ├── test_security_warning.py
│   │       └── test_summarize.py
│   └── mcp/
│       └── test_mcp_server.py
└── integration/
    └── llm/                              # Live API calls — requires valid keys
        ├── test_openai_engine.py
        ├── test_anthropic_engine.py
        └── test_azure_openai_engine.py
```
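Selecting tests with `-m integration` assumes the marker is registered with pytest, for example via a config fragment like this sketch (the project may declare it in `pyproject.toml` instead):

```ini
[pytest]
markers =
    integration: tests that make live API calls
```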
Unit tests run fully offline with mocked LLM calls and mocked MCP tools. They are split into two levels:

- Node tests (`tests/unit/agent/nodes/`) — call each node function directly and assert on the partial state dict it returns. Fast and focused on a single node's logic.
- Graph tests (`tests/unit/agent/graph/`) — compile and run the full LangGraph workflow. Verify that nodes are wired correctly, state is merged by the reducers, and conditional edges route as expected.
Integration tests (`tests/integration/llm/`) make real API calls to verify that each engine (`langchain-openai`, `langchain-anthropic`, `langchain-azure-openai`) can produce structured output. Each engine has an easy test (maths question with a definitive answer) and a difficult test (open-ended question that validates structure only).
This codebase is designed to remain understandable and extensible as it grows. The decisions below are deliberate applications of standard software design principles.
Every module has one reason to change. Graph nodes (`nodes/`) only transform agent state. Routers (`routers/`) only decide which edge to take. Prompt text lives exclusively in `config/prompts.py`. If the prompt for a node changes, only that file changes — nothing else.
`EngineFactory` uses a registry dict to map string keys to engine classes. Adding a new LLM provider (e.g. Google Gemini) means:

1. Create `engines/langchain_google.py` implementing `LLMEngine`.
2. Add one line to the registry in `factory.py`.
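A sketch of what such a registry-based factory might look like. Everything beyond the names `EngineFactory.create`, `create_default`, and `LLM_ENGINE` (which appear earlier in this README) is an assumption.

```python
import os


class EngineFactory:
    _registry: dict = {}

    @classmethod
    def register(cls, key: str):
        """Decorator: the 'one line' per provider in the registry."""
        def decorator(engine_cls):
            cls._registry[key] = engine_cls
            return engine_cls
        return decorator

    @classmethod
    def create(cls, key: str, **kwargs):
        try:
            return cls._registry[key](**kwargs)
        except KeyError:
            raise ValueError(f"Unknown engine: {key!r}") from None

    @classmethod
    def create_default(cls, **kwargs):
        # Resolve the engine key from the environment at runtime
        return cls.create(os.getenv("LLM_ENGINE", "langchain-openai"), **kwargs)
```

Registering a hypothetical Gemini engine is then just `@EngineFactory.register("langchain-google")` above its class definition.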
No existing code is modified. Callers that already use `EngineFactory.create(...)` are unaffected.
Nodes and services depend on the `LLMEngine` abstraction, never on `ChatOpenAI` or `ChatAnthropic` directly. The concrete engine is injected at the call site via `EngineFactory`. This means the guardrail node, the summariser, and any future node can be tested with a mock engine that implements the same interface — no real API calls required.
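For example, a node test can substitute a mock for the engine. This is a sketch only: the node body, its signature, and the state keys shown here are hypothetical, not the real `nodes/summarize.py`.

```python
import asyncio
from unittest.mock import AsyncMock


# Hypothetical node: takes state plus an injected engine, returns a partial state update
async def summarize(state: dict, engine) -> dict:
    answer = await engine.generate_structured(str, state["query_result"])
    return {"final_answer": answer}


def test_summarize_with_mock_engine():
    engine = AsyncMock()
    engine.generate_structured.return_value = "Revenue grew 12% in Q3."
    update = asyncio.run(summarize({"query_result": [("Q3", 0.12)]}, engine))
    assert update == {"final_answer": "Revenue grew 12% in Q3."}


test_summarize_with_mock_engine()
```

Because the node only sees the `LLMEngine` interface, the mock is a drop-in replacement and the test runs fully offline.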
`LLMEngine` exposes exactly two methods — `generate_structured` and `generate_with_tools` — covering the two distinct patterns the agent needs. Engines are not forced to implement streaming, fine-tuning configuration, or token counting — concerns that belong to other abstractions. Each layer only knows what it needs.
| Layer | Responsibility |
|---|---|
| `frontend` | User input / output rendering |
| `agent/api` | HTTP boundary — deserialise request, serialise response |
| `agent/nodes` | Business logic — state transitions |
| `core/llm` | LLM I/O — prompt rendering, structured output |
| `mcp` | Database I/O — tool definitions and execution |
| `postgres` | Persistence — schema and seed data |
Each layer depends only on the layer below it, never sideways or upward.
The MCP server (`backend/mcp/`) is deployed as a separate Docker service and communicates over streamable HTTP. The agent loads tools at startup via `client.py` and treats them as opaque LangChain tools. This means the set of database tools can evolve — or be replaced entirely with a different data source — without touching the agent graph.
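One of those tools, `execute_readonly_query`, must never run mutating SQL. A first-line check might look like this sketch; this is an assumption about the implementation, and the read-only Postgres role created in `003_roles.sql` would remain the real enforcement.

```python
import re

# Statement keywords that should never appear in an analyst query (illustrative list)
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b",
    re.IGNORECASE,
)


def is_readonly_sql(sql: str) -> bool:
    """Cheap pre-flight check before handing SQL to the database."""
    stripped = sql.strip().rstrip(";").lower()
    # Allow plain SELECTs and CTEs; reject everything else outright
    if not stripped.startswith(("select", "with")):
        return False
    # Catch mutating keywords hidden inside an otherwise SELECT-shaped query
    return not _FORBIDDEN.search(stripped)
```

Keyword scanning is deliberately conservative (it would also reject a column literally named `delete`), which is an acceptable trade-off for a guardrail that sits in front of a read-only database role.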