100% Local · Zero-Cost · Enterprise Multi-Agent RAG System Powered by Ollama (llama3) · ChromaDB · LangGraph · FastAPI · React + Tailwind
- Overview
- System Architecture
- Project Structure
- Prerequisites & Installation
- Database Schema
- Running the Application
- Full API Reference
- LangGraph Multi-Agent Pipeline
- Frontend Features
- User Roles & Permissions
- Configuration & Environment
- Utility Scripts
- Troubleshooting
SDF AI Copilot is a fully self-hosted, enterprise-grade AI assistant that lets employees query internal company policy documents through a conversational interface. It runs entirely on your local machine — no cloud, no paid API keys, no data leaving your network.
| Feature | Detail |
|---|---|
| LLM | Ollama running llama3 locally |
| Embeddings | HuggingFace all-MiniLM-L6-v2 (CPU, downloaded once) |
| Vector Store | ChromaDB (persistent local directory) |
| PDF Parsing | PyMuPDF (fast, no dependencies on cloud OCR) |
| Orchestration | LangGraph StateGraph — 5-node multi-agent pipeline |
| Audit Trail | MSSQL (SQL Server Express) |
| API | FastAPI with SSE streaming responses |
| Frontend | React 19 + Vite + TailwindCSS |
| Cost | $0.00 — 100% local |
┌─────────────────────────────────────────────────────────┐
│ React Frontend │
│ (Vite + TailwindCSS · Port 5173) │
│ │
│ ┌─────────────┐ ┌──────────────────────────────┐ │
│ │ Login View │ │ Chat View (SSE Streaming) │ │
│ └─────────────┘ │ • History Sidebar │ │
│ ┌──────────────┐ │ • Library (Document Links) │ │
│ │Admin Dashboard│ │ • Agent Pipeline Progress │ │
│ │• Upload PDF │ └──────────────────────────────┘ │
│ │• Manage Docs │ │
│ │• Manage Users │ │
│ │• Action Logs │ │
│ └──────────────┘ │
└────────────────────────┬────────────────────────────────┘
│ HTTP / SSE (axios + fetch)
▼
┌─────────────────────────────────────────────────────────┐
│ FastAPI Backend (Port 8000) │
│ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ LangGraph Pipeline │ │
│ │ │ │
│ │ [Researcher] ──► [Compliance] ──► [Communicator] │ │
│ │ │ │ │
│ │ [Reviewer] │ │
│ │ / \ │ │
│ │ (PASS) (FAIL) │ │
│ │ │ [Rewrite]│ │
│ │ │ │ │ │
│ │ [Audit Node] ◄───┘ │ │
│ └───────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ ChromaDB │ │ Ollama LLM │ │ MSSQL │ │
│ │ (chroma_db/) │ │ (llama3) │ │ Audit DB │ │
│ └───────────────┘ └──────────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────┘
User sends query
│
▼
FastAPI /chat/stream
│
├─ Checks QueryCache (MSSQL) for exact match → returns instantly if found
│
└─ Runs LangGraph pipeline:
1. RESEARCHER → embeds query, retrieves top-3 chunks from ChromaDB
2. COMPLIANCE → Ollama analyzes if context is sufficient
3. COMMUNICATOR → Ollama drafts professional response with citations
4. REVIEWER → Ollama verifies no hallucinations (JSON verdict)
├─ PASS → AUDIT node (saves to MSSQL, caches result)
└─ FAIL → rewrite (up to 2 retries) → AUDIT node
5. AUDIT → Saves EmployeeID, Query, Response to AuditTrail
│
└─ Streams SSE events to frontend (agent progress + final response)
Local_Agent/
│
├── backend/ # Python FastAPI backend
│ ├── main.py # All routes, LangGraph pipeline, agents
│ ├── requirements.txt # Python dependencies
│ ├── setup_db.sql # MSSQL schema creation script
│ ├── reset_kb.py # Utility: wipe ChromaDB + SQL data
│ ├── .env.example # Environment variable template
│ ├── chroma_db/ # ChromaDB persistent vector store (auto-created)
│ └── uploads/ # Uploaded PDF files (auto-created)
│
├── frontend/ # React + Vite frontend
│ ├── index.html # HTML entry point
│ ├── package.json # Node dependencies
│ ├── vite.config.js # Vite configuration
│ ├── tailwind.config.js # Tailwind custom theme (brand + dark colors)
│ ├── postcss.config.js # PostCSS setup
│ └── src/
│ ├── main.jsx # React entry point
│ ├── App.jsx # Root component (Login, routing, theme)
│ ├── App.css # Minimal app-level styles
│ ├── index.css # Global styles, Tailwind directives, animations
│ ├── api.js # Axios API client (all endpoints)
│ └── components/
│ ├── Sidebar.jsx # Left navigation sidebar
│ ├── ChatView.jsx # Chat interface with SSE streaming
│ ├── AdminDashboard.jsx # Document management & user admin
│ └── MessageBubble.jsx # Individual message component
│
└── README.md # This file
| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Windows 10/11 | Windows 11 |
| RAM | 8 GB | 16 GB |
| Disk | 10 GB free | 20 GB free |
| Python | 3.11 | 3.11 or 3.12 |
| Node.js | 18 LTS | 20 LTS |
| SQL Server | Express 2019 | Express 2022 |
| ODBC Driver | 17 | 17 or 18 |
Ollama runs the LLM locally. You only need to do this once.
Step 1 – Download and install Ollama:
https://ollama.com/download/windows
Step 2 – Pull the llama3 model (run in any terminal):
ollama pull llama3
This downloads ~4.7 GB. Wait for it to finish.
Step 3 – Verify Ollama is running:
ollama list
# Should show: llama3 latest ...
Ollama starts automatically as a background service on Windows after installation. The API runs on http://localhost:11434.
Step 1 – Install SQL Server Express (if not already installed):
https://www.microsoft.com/en-us/sql-server/sql-server-downloads
Choose Express edition. During install, note your instance name (default: SQLEXPRESS).
Step 2 – Install SQL Server Management Studio (SSMS) (optional but recommended):
https://aka.ms/ssmsfullsetup
Step 3 – Create the database and all tables by running the setup script.
Option A — using SSMS:
- Open SSMS and connect to `localhost\SQLEXPRESS`
- File → Open → `backend\setup_db.sql`
- Click Execute (F5)
Option B — using sqlcmd in PowerShell:
sqlcmd -S localhost\SQLEXPRESS -i backend\setup_db.sql
What the script creates:
| Table | Purpose |
|---|---|
| `Accounts` | User/admin accounts with roles |
| `AuditTrail` | All employee queries + AI responses |
| `DocumentLogs` | Upload/delete/rename actions by admins |
| `KnowledgeDocuments` | Tracked uploaded documents with date ranges |
| `QueryCache` | Caches exact-match responses to speed repeat queries |
A default master admin is created automatically:
- Username: `master_admin`
- Password: `admin123`
⚠️ Change this password immediately after first login.
Step 4 – Install ODBC Driver 17 for SQL Server (if not installed):
https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server
Step 5 – Enable Named Pipes and TCP/IP (if connection fails):
- Open SQL Server Configuration Manager
- SQL Server Network Configuration → Protocols for SQLEXPRESS
- Enable Named Pipes and TCP/IP
- Restart the SQL Server service
Step 1 – Create and activate a virtual environment:
cd Local_Agent\backend
python -m venv venv
.\venv\Scripts\activate
Your prompt should show `(venv)` at the start.
Step 2 – Install all dependencies:
pip install -r requirements.txt
Installing the packages and fetching the HuggingFace model on the first backend start downloads ~1.5 GB in total. Subsequent starts are fast.
Step 3 – (Optional) Create a .env file:
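copy .env.example .env
Edit .env to set the Ollama model, database name, or paths. Note that main.py currently hard-codes these values (see Configuration & Environment), so changes in .env take effect only once .env loading is enabled.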
Step 4 – Verify the backend starts:
python main.py
You should see:
INFO | SDF AI Copilot backend starting up…
INFO | Loading HuggingFace embeddings model (all-MiniLM-L6-v2)…
INFO | Initialising ChromaDB at ...\chroma_db…
INFO | Uvicorn running on http://127.0.0.1:8000
Step 1 – Install Node dependencies:
cd Local_Agent\frontend
npm install
Step 2 – Start the development server:
npm run dev
The app opens at http://localhost:5173.
Stores all user credentials and role assignments.
CREATE TABLE Accounts (
ID INT IDENTITY(1,1) PRIMARY KEY,
Username NVARCHAR(100) NOT NULL UNIQUE,
Password NVARCHAR(200) NOT NULL,
Role NVARCHAR(50) NOT NULL, -- 'master' | 'admin' | 'subadmin' | 'user'
Name NVARCHAR(150) NULL,
EmpNum NVARCHAR(100) NULL,
Designation NVARCHAR(150) NULL,
Department NVARCHAR(150) NULL,
CreatedAt DATETIME DEFAULT GETDATE()
);
Every query and AI response is logged here for compliance.
CREATE TABLE AuditTrail (
ID INT IDENTITY(1,1) PRIMARY KEY,
EmployeeID NVARCHAR(100) NOT NULL,
SessionID NVARCHAR(100) NULL, -- Groups messages into sessions
QueryText NVARCHAR(MAX) NOT NULL,
AIResponse NVARCHAR(MAX) NOT NULL,
IsSaved BIT DEFAULT 0, -- User opted to save session
IsPinned BIT DEFAULT 0,
SessionTitle NVARCHAR(255) NULL, -- Custom session name
CreatedAt DATETIME DEFAULT GETDATE()
);
Tracks every admin action on documents.
CREATE TABLE DocumentLogs (
ID INT IDENTITY(1,1) PRIMARY KEY,
AdminID NVARCHAR(100) NULL,
Action NVARCHAR(50) NOT NULL, -- 'UPLOAD' | 'DELETE' | 'RENAME' | 'MODIFY'
Filename NVARCHAR(255) NOT NULL,
ChunksCount INT NOT NULL,
CreatedAt DATETIME DEFAULT GETDATE()
);
Tracks documents with optional validity date ranges.
CREATE TABLE KnowledgeDocuments (
ID INT IDENTITY(1,1) PRIMARY KEY,
Filename NVARCHAR(255) NOT NULL,
StartDate NVARCHAR(100) NULL, -- Policy effective from
ExpireDate NVARCHAR(100) NULL, -- Policy expires on
AdminID NVARCHAR(100) NULL,
CreatedAt DATETIME DEFAULT GETDATE()
);
Exact-match cache to avoid re-running the expensive LLM pipeline.
CREATE TABLE QueryCache (
ID INT IDENTITY(1,1) PRIMARY KEY,
QueryText NVARCHAR(MAX) NOT NULL, -- Stored lowercase
AIResponse NVARCHAR(MAX) NOT NULL,
Accuracy NVARCHAR(50) NOT NULL,
CreatedAt DATETIME DEFAULT GETDATE()
);
Note: The cache is automatically cleared (`TRUNCATE TABLE QueryCache`) whenever a document is uploaded, deleted, or renamed.
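The exact-match behaviour described above can be sketched as follows. This is only an illustration: the class `ExactMatchCache` and its in-memory dict are hypothetical stand-ins for the QueryCache table, and trimming whitespace before lowercasing is an assumption rather than something the schema states.

```python
from typing import Dict, Optional, Tuple


class ExactMatchCache:
    """In-memory stand-in for the QueryCache table (hypothetical helper)."""

    def __init__(self) -> None:
        # Maps normalized query text -> (AIResponse, Accuracy)
        self._rows: Dict[str, Tuple[str, str]] = {}

    @staticmethod
    def normalize(query: str) -> str:
        # QueryText is stored lowercase; stripping whitespace is an assumption.
        return query.strip().lower()

    def get(self, query: str) -> Optional[Tuple[str, str]]:
        # Returns None on a cache miss, so the caller can run the pipeline.
        return self._rows.get(self.normalize(query))

    def put(self, query: str, response: str, accuracy: str) -> None:
        self._rows[self.normalize(query)] = (response, accuracy)

    def clear(self) -> None:
        # Mirrors TRUNCATE TABLE QueryCache after an upload, delete, or rename.
        self._rows.clear()
```

Clearing the whole cache on any document change is coarse but safe: no cached answer can outlive the document it was derived from.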
Run both servers at the same time, each in its own terminal.
Terminal 1 (backend):
cd Local_Agent\backend
.\venv\Scripts\activate
python main.py
Terminal 2 (frontend):
cd Local_Agent\frontend
npm run dev
Open your browser and navigate to:
http://localhost:5173
Log in with the default master admin credentials:
- Username: `master_admin`
- Password: `admin123`
Base URL: http://127.0.0.1:8000
Interactive API docs (Swagger UI): http://127.0.0.1:8000/docs
Returns backend health status.
Response:
{ "status": "ok" }
Authenticates a user and returns their profile.
Request Body:
{
"username": "master_admin",
"password": "admin123"
}
Response (200 OK):
{
"username": "master_admin",
"role": "master",
"name": "Master Admin",
"emp_num": "SYS-0000"
}
Response (401 Unauthorized):
{ "detail": "Invalid username or password." }
Runs the full LangGraph multi-agent pipeline and streams the response as Server-Sent Events (SSE).
Request Body:
{
"query": "What is the leave encashment policy?",
"employee_id": "EMP-001",
"session_id": "1712345678901",
"save_chat": false
}
| Field | Type | Required | Description |
|---|---|---|---|
| `query` | string | ✅ | The employee's question |
| `employee_id` | string | ✅ | Used for audit logging |
| `session_id` | string | ❌ | Groups messages into sessions |
| `save_chat` | boolean | ❌ | If true, session is persisted in history |
SSE Stream Events:
Each event is a data: line with a JSON payload:
data: {"agent": "Researcher", "status": "processing", "response": "", "accuracy_score": "", "hallucination_check": ""}
data: {"agent": "Compliance", "status": "processing", "response": "[COMPLIANCE NOTE]\nContext is sufficient...", "accuracy_score": "", "hallucination_check": ""}
data: {"agent": "Communicator", "status": "processing", "response": "Based on company policy...", "accuracy_score": "", "hallucination_check": ""}
data: {"agent": "Reviewer", "status": "processing", "response": "Based on company policy...", "accuracy_score": "92%", "hallucination_check": "pass"}
data: {"agent": "Done", "status": "processing", "response": "Based on company policy...", "accuracy_score": "92%", "hallucination_check": "pass"}
data: [DONE]
Cache Hit Response (when an identical query was previously answered):
data: {"agent": "CacheHit", "status": "done", "response": "...", "accuracy_score": "92%", "hallucination_check": "pass"}
data: [DONE]
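A client can consume the stream shown above by reading `data:` lines until the `[DONE]` sentinel. The sketch below assumes the lines have already been read from the HTTP response; the function name `iter_sse_events` is hypothetical and the sample payloads are shortened versions of the events listed above.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the JSON payload of each `data:` line until `[DONE]` is seen."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(payload)


# Shortened sample of the stream format documented above.
sample = [
    'data: {"agent": "Researcher", "status": "processing", "response": ""}',
    "",
    'data: {"agent": "Done", "status": "processing", "response": "Based on company policy..."}',
    "data: [DONE]",
]
events = list(iter_sse_events(sample))
```

The last event before `[DONE]` carries the final `response`, so a simple client can keep only the most recent payload and render it once the loop ends.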
Returns the saved chat history for a given employee.
Example: GET /history/EMP-001
Response:
[
{
"id": 1,
"session_id": "1712345678901",
"query": "What is the leave policy?",
"response": "Based on the HR Policy document...",
"created_at": "2026-04-09T09:00:00",
"is_saved": true,
"is_pinned": false,
"session_title": null
}
]
Retroactively marks all messages in a session as saved (persisted to history).
Example: PUT /history/save/1712345678901
Response:
{ "message": "Session saved successfully." }
Renames a saved session with a custom title.
Request Body:
{ "title": "My Leave Policy Questions" }
Response:
{ "message": "Session renamed successfully." }
Uploads a PDF, chunks it, embeds it into ChromaDB, and records it in the database.
Request: multipart/form-data
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | file | ✅ | PDF file to upload |
| `admin_id` | string | ❌ | Who performed the upload (default: "System") |
| `start_date` | string | ❌ | Policy effective date (YYYY-MM-DD) |
| `expire_date` | string | ❌ | Policy expiry date (YYYY-MM-DD) |
Response (200 OK):
{
"message": "PDF ingested successfully.",
"filename": "HR_Policy_2026.pdf",
"chunks_added": 47
}
Chunking Details:
- Chunk size: 800 characters
- Chunk overlap: 100 characters
- Each chunk is prefixed with: `[Source: filename.pdf, Page: X, Paragraph/Chunk: N]`
- Chunks are stored in ChromaDB with IDs like `HR_Policy_2026.pdf_0`, `HR_Policy_2026.pdf_1`, etc.
- QueryCache is cleared after every upload
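The chunking parameters above can be illustrated with a plain sliding window. This is a simplified sketch, not the splitter used in main.py (which may split on paragraph boundaries instead); the function name `chunk_page` and its `start_index` parameter are hypothetical.

```python
from typing import List, Tuple

CHUNK_SIZE = 800     # characters per chunk
CHUNK_OVERLAP = 100  # characters shared between consecutive chunks


def chunk_page(text: str, filename: str, page: int,
               start_index: int = 0) -> List[Tuple[str, str]]:
    """Split one page of text into (chunk_id, prefixed_text) pairs."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    chunks: List[Tuple[str, str]] = []
    start = 0
    while start < len(text):
        body = text[start:start + CHUNK_SIZE]
        index = start_index + len(chunks)
        # IDs are zero-based; the human-readable chunk number is one-based.
        header = f"[Source: {filename}, Page: {page}, Paragraph/Chunk: {index + 1}]"
        chunks.append((f"{filename}_{index}", f"{header}\n{body}"))
        if start + CHUNK_SIZE >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks
```

Embedding the source prefix in the chunk text itself means the citation travels with the chunk into the LLM prompt, which is what lets the Communicator cite filename and page.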
Lists all tracked knowledge documents.
Response:
[
{
"id": 1,
"filename": "HR_Policy_2026.pdf",
"start_date": "2026-01-01",
"expire_date": "2026-12-31",
"admin_id": "master_admin",
"created_at": "2026-04-09T09:00:00"
}
]
Deletes all ChromaDB chunks for the given filename, removes it from the database, and deletes the local file copy.
Example: DELETE /admin/document/HR_Policy_2026.pdf
Response:
{
"message": "Deleted 47 chunks and removed document.",
"filename": "HR_Policy_2026.pdf"
}
Renames a document in the database, updates ChromaDB metadata, and renames the local file.
Request Body:
{
"old_filename": "OLD_HR_Policy.pdf",
"new_filename": "HR_Policy_2026.pdf",
"admin_id": "master_admin"
}
Response:
{ "message": "Document renamed successfully." }
Updates the start/expiry dates for a tracked document.
Request Body:
{
"filename": "HR_Policy_2026.pdf",
"start_date": "2026-01-01",
"expire_date": "2026-12-31",
"admin_id": "master_admin"
}
Response:
{ "message": "Document dates updated." }
Returns all embedded chunks with preview text and metadata.
Response:
{
"total": 47,
"chunks": [
{
"id": "HR_Policy_2026.pdf_0",
"text": "[Source: HR_Policy_2026.pdf, Page: 1, Paragraph/Chunk: 1]\nThis HR policy covers...",
"metadata": {
"source_filename": "HR_Policy_2026.pdf",
"page": 0
}
}
]
}
Returns the document action history (upload, delete, rename, modify).
Response:
[
{
"id": 1,
"action": "UPLOAD",
"filename": "HR_Policy_2026.pdf",
"chunks_count": 47,
"created_at": "2026-04-09T09:00:00",
"admin_id": "master_admin"
}
]
Returns the total chunk count in ChromaDB.
Response:
{ "total_chunks": 47 }
Returns all non-master accounts.
Response:
[
{
"id": 2,
"username": "john_doe",
"role": "user",
"name": "John Doe",
"emp_num": "EMP-001",
"designation": "Software Engineer",
"department": "IT",
"created_at": "2026-04-09T09:00:00"
}
]
Creates a new user or admin account.
Request Body:
{
"username": "john_doe",
"password": "secure_pass_123",
"role": "user",
"name": "John Doe",
"emp_num": "EMP-001",
"designation": "Software Engineer",
"department": "IT"
}
Valid roles: user, subadmin, admin
Response (201):
{ "message": "user added successfully." }
Updates an existing account's role, name, employee number, or password.
Example: PUT /admin/account/john_doe
Request Body:
{
"role": "subadmin",
"name": "John Doe",
"emp_num": "EMP-001",
"password": ""
}
Leave `password` empty to keep the existing password.
Response:
{ "message": "Account updated successfully." }
Deletes an account. Cannot delete master_admin.
Example: DELETE /admin/account/john_doe
Response:
{ "message": "Account deleted successfully." }
Serves the original PDF file inline in the browser.
Example: GET /download/HR_Policy_2026.pdf
Returns the PDF with Content-Disposition: inline.
The pipeline is defined as a StateGraph in main.py. Each node is a plain Python function that receives the full state and returns an updated state.
from typing import List, TypedDict

class AgentState(TypedDict):
    query: str                   # The employee's question
    employee_id: str             # For audit logging
    session_id: str              # Groups messages into sessions
    save_chat: bool              # Should this session be persisted?
    retrieved_chunks: List[str]  # Retrieved document chunks from ChromaDB
    draft_response: str          # Current working response
    final_response: str          # The final response sent to the user
    hallucination_check: str     # 'pass' or 'fail'
    accuracy_score: str          # E.g., '92%' from Reviewer
    rewrite_count: int           # How many times Communicator has rewritten (max 2)
    current_agent: str           # Used for SSE streaming (which agent is active)
| Node | Role | Model Used |
|---|---|---|
| Researcher | Embeds the query using all-MiniLM-L6-v2 and retrieves the top-3 most similar chunks from ChromaDB | None (vector search only) |
| Compliance | Analyzes whether the retrieved context is ethically and legally sufficient to answer the query | Ollama llama3 |
| Communicator | Drafts a professional response citing only the retrieved document context. Refuses to answer if info is not in context. | Ollama llama3 |
| Reviewer | Compares the draft response against the context. Returns a JSON verdict `{"verdict": "PASS", "accuracy": "92%"}` | Ollama llama3 |
| Audit | Saves the final Q&A pair to MSSQL `AuditTrail` and `QueryCache` | None (DB write only) |
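Because the Reviewer's verdict comes back as LLM-generated text, it helps to parse it defensively. The sketch below is one possible approach, not the parser in main.py: the function name `parse_verdict` is hypothetical, and treating an unparseable reply as a `fail` is an assumption.

```python
import json
import re
from typing import Tuple


def parse_verdict(raw: str) -> Tuple[str, str]:
    """Return (hallucination_check, accuracy_score) from the Reviewer's reply.

    Falls back to ("fail", "") when no valid JSON object can be found
    (an assumed policy: unreadable verdicts are treated as failures).
    """
    # Take the span from the first "{" to the last "}", which also skips
    # any markdown fence or chatter the model wrapped around the JSON.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        return "fail", ""
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return "fail", ""
    verdict = str(data.get("verdict", "")).strip().upper()
    accuracy = str(data.get("accuracy", ""))
    return ("pass" if verdict == "PASS" else "fail"), accuracy
```

Failing closed keeps a malformed verdict from being reported to the user as a passed hallucination check.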
researcher ──► compliance ──► communicator ──► reviewer
│
┌──────────── PASS ────────────┘
│
audit ──► END
│
└──────────── FAIL ──► increment_rewrite ──► communicator
(max 2 retries)
- `MAX_REWRITES = 2`: the Communicator gets at most 2 retries if the Reviewer rejects its response.
- After 2 failed rewrites, the pipeline proceeds to audit with whatever the last draft was.
- The `accuracy_score` and `hallucination_check` are sent back to the frontend via SSE.
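The routing rules above amount to a small decision function evaluated after the Reviewer node. The following is a sketch under stated assumptions: the name `route_after_review` and the reduced `ReviewState` type are hypothetical, and the returned strings reuse the node names from the diagram.

```python
from typing import TypedDict

MAX_REWRITES = 2


class ReviewState(TypedDict):
    """Only the two state fields the routing decision needs."""
    hallucination_check: str  # 'pass' or 'fail'
    rewrite_count: int        # rewrites already performed


def route_after_review(state: ReviewState) -> str:
    """Pick the name of the next node after the Reviewer has run."""
    if state["hallucination_check"] == "pass":
        return "audit"
    if state["rewrite_count"] >= MAX_REWRITES:
        return "audit"  # retries exhausted: keep the last draft
    return "increment_rewrite"
```

In LangGraph, a function of this shape is typically registered on the graph with `add_conditional_edges`, mapping each returned string to a node.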
You MUST answer exclusively from the provided DOCUMENT CONTEXT.
If the answer is not clearly contained within the context (including general
knowledge, greetings, maths, or day-to-day questions), YOU MUST explicitly
state: 'I am not authorized to answer this question as it is outside the
scope of internal company policies.'
Do NOT fabricate, hallucinate, or use outside knowledge.
Only if you found the answer strictly inside the context, you MUST include
citations in the format: [Source: filename.pdf, Page: X]
- Username / password form
- Backend health check on load
- Error handling with visible messages
| Feature | Description |
|---|---|
| SSE Streaming | Response streams token-by-token; no waiting for full response |
| Agent Pipeline Progress Bar | Shows which node is currently active (Researcher → Compliance → Communicator → Reviewer → Done) |
| Chat History Sidebar | Left panel showing saved sessions, searchable |
| Restore Past Sessions | Click any saved session to reload its messages |
| Rename Sessions | Inline edit to give sessions a custom title |
| Library Tab | Lists all uploaded PDFs with download links |
| Save Chat Toggle | Toggle to opt-in to saving the session to history |
| Edit & Resend | Edit a previous message and the pipeline reruns from that point |
| New Chat | Starts a fresh session with a new session ID |
| Typing Indicator | Shows animated dots while the pipeline is running |
| Accuracy Badge | Shows the accuracy score reported by the Reviewer node (e.g., 92%) |
| Hallucination Badge | Shows ✅ PASS or ❌ FAIL indicator from Reviewer |
| Section | Feature |
|---|---|
| Upload PDF | Drag-and-drop or click-to-browse. Shows upload progress bar. Accepts start/expiry dates. |
| Document Action Logs | Lists all UPLOAD, DELETE, RENAME, MODIFY actions with timestamps and chunk counts. Exportable to PDF using jsPDF. |
| Knowledge Documents | Expandable document list with inline editing (rename, date change), chunk browser, and delete. |
| User Management | Add/edit/delete user accounts. Assign roles: user, subadmin, admin. |
- Dark mode by default (deep navy/indigo palette)
- Light mode toggle (CSS invert technique)
- Glassmorphism UI cards
- Smooth animated transitions (Tailwind animations)
- Inter font via Google Fonts
| Permission | user | subadmin | admin | master |
|---|---|---|---|---|
| Access Chat View | ✅ | ✅ | ✅ | ✅ |
| View Document Library | ✅ | ✅ | ✅ | ✅ |
| Access Admin Dashboard | ❌ | ✅ | ✅ | ✅ |
| Upload Documents | ❌ | ✅ | ✅ | ✅ |
| Delete / Rename Documents | ❌ | ✅ | ✅ | ✅ |
| View Action Logs | ❌ | ✅ | ✅ | ✅ |
| Manage Users (create/delete) | ❌ | ❌ | ✅ | ✅ |
| Create Admin accounts | ❌ | ❌ | ❌ | ✅ |
| Delete `master_admin` | ❌ | ❌ | ❌ | ❌ (protected) |
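The permission matrix can be expressed as a lookup table for server-side checks. This is an illustrative sketch only: the action labels (`chat`, `manage_users`, and so on) and the helper `can` are hypothetical names, not identifiers from main.py.

```python
from typing import Dict, FrozenSet

# Transcribed from the permission table; the action names are hypothetical labels.
_STAFF: FrozenSet[str] = frozenset({
    "chat", "library", "admin_dashboard",
    "upload_documents", "modify_documents", "view_logs",
})

PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "user": frozenset({"chat", "library"}),
    "subadmin": _STAFF,
    "admin": _STAFF | {"manage_users"},
    "master": _STAFF | {"manage_users", "create_admins"},
}


def can(role: str, action: str) -> bool:
    """Return True when the role is allowed to perform the action."""
    # Unknown roles get an empty permission set, so they are denied everything.
    return action in PERMISSIONS.get(role, frozenset())
```

Deleting `master_admin` appears in no role's set, so `can` denies it for every role, matching the protected row in the table.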
Copy .env.example to .env and adjust:
# Ollama LLM settings
OLLAMA_MODEL=llama3
OLLAMA_BASE_URL=http://localhost:11434
# HuggingFace Embeddings (downloaded automatically on first run)
EMBED_MODEL=all-MiniLM-L6-v2
# ChromaDB local persistence directory (relative to main.py)
CHROMA_PERSIST_DIR=./chroma_db
# MSSQL connection (Windows Authentication)
MSSQL_SERVER=localhost\SQLEXPRESS
MSSQL_DATABASE=SDF_Copilot
MSSQL_USER=
MSSQL_PASSWORD=
# PDF upload directory
UPLOAD_DIR=./uploads
# Maximum hallucination rewrites before forcing audit output
MAX_REWRITES=2
# RAG – number of chunks to retrieve per query
RAG_TOP_K=3
Note: The environment variables in `.env` are currently informational; the values are hard-coded in `main.py` for simplicity. To enable `.env` loading, add python-dotenv loading at the top of `main.py` and read each value with `os.getenv` instead of the hard-coded constant:
from dotenv import load_dotenv
load_dotenv()
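Once the variables are in the process environment, they still have to be read somewhere. The sketch below shows one way to do that with the defaults from the template above; the `Settings` dataclass and `load_settings` function are hypothetical names, and only a subset of the variables is covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    ollama_model: str
    ollama_base_url: str
    chroma_persist_dir: str
    max_rewrites: int
    rag_top_k: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using the documented defaults."""
    return Settings(
        ollama_model=env.get("OLLAMA_MODEL", "llama3"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./chroma_db"),
        # Numeric values arrive as strings and must be converted explicitly.
        max_rewrites=int(env.get("MAX_REWRITES", "2")),
        rag_top_k=int(env.get("RAG_TOP_K", "3")),
    )
```

Passing the environment in as a mapping keeps the function easy to test: `load_settings({})` yields the defaults, while `load_settings({"OLLAMA_MODEL": "llama3.1"})` overrides a single value.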
cd backend
.\venv\Scripts\activate
python reset_kb.py
What it does:
- Deletes the entire `chroma_db/` folder (all embedded vectors)
- Deletes all files in `uploads/` (local PDF copies)
- Clears these SQL tables: `DocumentLogs`, `KnowledgeDocuments`, `AuditTrail`, `QueryCache`
Use this when you want to start the knowledge base completely fresh.
Cause: SQL Server Express is not running or ODBC Driver 17 is not installed.
Fix:
- Open Services (`Win + R` → `services.msc`) → find `SQL Server (SQLEXPRESS)` → Start
- Run `setup_db.sql` in SSMS to create tables
- Install ODBC Driver 17
Fix:
- Verify Ollama is installed: `ollama --version`
- Verify the llama3 model exists: `ollama list`
- Re-pull if needed: `ollama pull llama3`
- Ensure the Ollama service is running (it should auto-start; otherwise run `ollama serve`)
Cause: No PDFs have been uploaded yet, or ChromaDB is empty.
Fix:
- Log in as `master_admin`
- Go to Admin Dashboard → Upload a PDF policy document
- Wait for the success message (e.g., "47 chunks added")
- Try your query again
Fix:
- Make sure the backend is running on `http://127.0.0.1:8000`
- Check that `api.js` has `baseURL: 'http://127.0.0.1:8000'`
- The backend already has `allow_origins=["*"]`, so CORS should not be an issue in local dev
Fix: The all-MiniLM-L6-v2 model is downloaded from HuggingFace Hub on first startup. If it fails:
- Check your internet connection
- Or pre-download manually:
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('sentence-transformers/all-MiniLM-L6-v2')"
Fix: Always use the virtual environment:
cd backend
.\venv\Scripts\activate
pip install -r requirements.txt --upgrade
Cause: llama3 sometimes returns malformed JSON or wraps it in markdown code blocks.
Behavior: The pipeline automatically falls back to audit after MAX_REWRITES=2 retries. The response will still be returned to the user — it will just be marked as hallucination_check: fail.
Fix: This is normal for smaller models. Consider using llama3.1 or mistral for better JSON compliance:
ollama pull llama3.1
# Then change OLLAMA_MODEL=llama3.1 in .env / main.py
Built for internal enterprise use. All data stays 100% on your machine.