FastAPI-based LLM operations platform with:
- Chat and evaluation APIs
- RAG + dataset management
- Batch evaluation orchestration
- Authz (role + scope) with capability-aware UI
- Admin operations (runtime config, maintenance, breaker, SLO, runbooks, audit/state)
- Optional vector backend swap (`in_memory`/`faiss`)
- Optional tracing backend integration (OTLP)
- Two web surfaces:
  - Main console: http://localhost:8000
  - Targeted React surface: http://localhost:8000/react
This project gives you one local service/UI to:
- Test prompts and RAG behavior quickly
- Upload datasets and run repeatable evaluations
- Track quality and operational health in one place
- Operate the service safely via admin controls
- Verify auth scopes/capabilities per API key
- Inference: chat, evaluate, embeddings, RAG
- Data plane: dataset upload/list/get/delete/restore/purge
- Eval plane: batch queue, status, retries, events stream, artifact export
- Ops plane: metrics, readiness/health, maintenance mode, circuit breaker, runtime tuning profiles, SLO incidents, runbooks/templates
- Security: API key auth + role/scope enforcement + capability discovery endpoints
- Stretch features: vector backend status, tracing status/probe, agent tools endpoints, targeted React migration surface
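The role/scope enforcement described above can be sketched in a few lines. This is a hypothetical illustration of how an `API_KEYS_JSON`-style mapping might drive capability checks, not the service's actual implementation; the `viewer-key` entry and scope strings are assumptions.

```python
import json

# Illustrative key registry in the same shape as API_KEYS_JSON;
# "viewer-key" and its scope names are made up for this example.
API_KEYS_JSON = (
    '{"dev-local-key": {"role": "admin", "scopes": ["*"]},'
    ' "viewer-key": {"role": "viewer", "scopes": ["chat", "rag.query"]}}'
)
KEYS = json.loads(API_KEYS_JSON)

def is_allowed(api_key: str, scope: str) -> bool:
    """True if the key exists and grants the scope ('*' grants everything)."""
    entry = KEYS.get(api_key)
    if entry is None:
        return False
    scopes = entry.get("scopes", [])
    return "*" in scopes or scope in scopes

print(is_allowed("dev-local-key", "datasets.upload"))  # True (wildcard)
print(is_allowed("viewer-key", "datasets.upload"))     # False
```

A wildcard entry is why the default `dev-local-key` can reach every tab in the UI, while a narrower key would see most controls disabled.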
- Python 3.11 (for local run)
- Docker Desktop (for container run)
- From repo root, create `.env`:
  ```powershell
  Copy-Item .env.example .env
  ```
- Build and run:
  ```shell
  docker compose up -d --build
  ```
- Open:
  - Main UI: http://localhost:8000
  - React UI: http://localhost:8000/react
- In the UI auth box:
  - API key: `dev-local-key`
  - Click `Use Key`
- Stop:
  ```shell
  docker compose down
  ```

Local run (PowerShell):
```powershell
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
```

Minimal `.env`:
```
API_KEY=dev-local-key
API_KEY_ROLE=admin
API_KEYS_JSON={"dev-local-key":{"role":"admin","scopes":["*"]}}
MODEL_NAME=distilgpt2
STATE_PERSISTENCE_ENABLED=1
STATE_BACKEND=json
```

Common optional config:
```
STATE_BACKEND=sqlite
VECTOR_INDEX_BACKEND=in_memory|faiss
TRACING_ENABLED=1
TRACING_OTLP_ENDPOINT=http://<collector>:4318/v1/traces
```
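A service typically reads these variables at startup with fallbacks. The sketch below shows one plausible way to do that; the variable names match the config above, but the default values and the `load_settings` helper are assumptions, not the project's actual code.

```python
import os

# Hypothetical settings loader; defaults here are assumed, not authoritative.
def load_settings() -> dict:
    return {
        "model_name": os.getenv("MODEL_NAME", "distilgpt2"),
        "state_backend": os.getenv("STATE_BACKEND", "json"),
        "vector_backend": os.getenv("VECTOR_INDEX_BACKEND", "in_memory"),
        "tracing_enabled": os.getenv("TRACING_ENABLED", "0") == "1",
    }

# Overriding an env var changes the resolved setting:
os.environ["STATE_BACKEND"] = "sqlite"
print(load_settings()["state_backend"])  # sqlite
```

This is also why env changes require a container rebuild/restart: the values are captured once at process startup.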
- Playground: run chat inference and inspect latency/tokens.
- RAG Explorer: query a chosen `dataset_id` with retrieval context.
- Dataset Manager: upload `.jsonl`/`.csv`/`.txt`, list datasets.
- Batch Evaluation: run async dataset evals and inspect results.
- Metrics Dashboard: summary + Prometheus text.
- Agent Tools: discover allowed tools and run tool-based plans.
- System Status: health/readiness/model/auth/access matrix/RAG backend/tracing.
- Admin Ops (admin role): runtime config/profiles, maintenance, breaker, SLO incidents, runbooks, audit/state operations.
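The Metrics Dashboard mentioned above exposes Prometheus text. If you want to script against it, a minimal parser for the exposition format looks like this; the sample metric names are illustrative, not the service's actual metric names.

```python
# Minimal parser for Prometheus text exposition: skips comments,
# maps "name value" lines to floats. Labels are not handled here.
def parse_prometheus(text: str) -> dict:
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

# Sample payload with made-up metric names:
sample = """# HELP requests_total Total requests
# TYPE requests_total counter
requests_total 42
chat_latency_seconds_sum 3.5
"""
print(parse_prometheus(sample)["requests_total"])  # 42.0
```

For real use, the official Prometheus client libraries parse labels and metric families properly; this sketch only covers bare `name value` lines.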
Set variables:
```bash
export BASE_URL="http://localhost:8000"
export API_KEY="dev-local-key"
```

Chat:
```bash
curl -s -X POST "$BASE_URL/v1/chat" \
  -H "x-api-key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain transformers in 3 sentences.","max_new_tokens":120}'
```

Upload dataset:
```bash
curl -s -X POST "$BASE_URL/v1/datasets/upload" \
  -H "x-api-key: $API_KEY" \
  -F "name=quick-rag" \
  -F "type=rag_corpus" \
  -F "file=@./quick_rag.jsonl"
```

RAG query:
```bash
curl -s -X POST "$BASE_URL/v1/rag/query" \
  -H "x-api-key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"dataset_id":"ds_xxxxxxxx","query":"How should API keys be rotated?","top_k":5}'
```

Agent tools:
```bash
curl -s "$BASE_URL/v1/agent/tools" -H "x-api-key: $API_KEY"
curl -s -X POST "$BASE_URL/v1/agent/run" \
  -H "x-api-key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"goal":"Investigate datasets and metrics","requested_tools":["datasets.list","metrics.dashboard"]}'
```

Vector/tracing status:
```bash
curl -s "$BASE_URL/v1/rag/vector-backend" -H "x-api-key: $API_KEY"
curl -s "$BASE_URL/v1/tracing/status" -H "x-api-key: $API_KEY"
```
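To give intuition for the `in_memory` vector backend and the `top_k` parameter used in the RAG query above, here is a toy cosine-similarity retrieval sketch. It is not the service's implementation; the document vectors and IDs are made up.

```python
import math

# Cosine similarity between two dense vectors.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Brute-force top_k over (doc_id, vector) pairs, the essence of an
# in-memory index; FAISS replaces this scan with an optimized structure.
def top_k(query_vec, docs, k=5):
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

docs = [("d1", [1.0, 0.0]), ("d2", [0.0, 1.0]), ("d3", [0.7, 0.7])]
print(top_k([1.0, 0.1], docs, k=2))  # d1 ranks first
```

Swapping `VECTOR_INDEX_BACKEND` to `faiss` changes how this nearest-neighbor step is executed, not the shape of the query/response.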
- Tabs/buttons disabled except Status:
  - Check `GET /v1/auth/context` with your key.
  - Ensure `.env` has `API_KEYS_JSON` or role/scopes that allow required capabilities.
  - Rebuild container after env changes: `docker compose up -d --build`.
  - Hard refresh browser (`Ctrl+F5`) after frontend changes.
- `docker compose down` says no configuration file:
  - Run the command from the repo root, where `docker-compose.yml` exists.
- Startup takes time:
  - First run downloads model weights; wait for `Application startup complete.`
Run tests:
```powershell
$env:PYTHONPATH='.'
pytest -q
```

Export OpenAPI:
```bash
python scripts/export_openapi.py --output docs/openapi.v1.json
```