Sahara 🏔️

A safety-first, open-source AI framework for mental health support.

Sahara is not just another AI chatbot. It is a harness-engineered system in which every AI response passes through a safety layer before reaching the user: crisis detection, medical claim stripping, tone enforcement, and structured output validation. All of these are enforced in code, not in prompts.

Built for developers, researchers, and organizations who want to deploy AI in sensitive mental health contexts responsibly.

Why Sahara?

Most AI mental health tools are a system prompt and an API call. Sahara is different.

| Feature | Typical AI Chatbot | Sahara |
|---|---|---|
| Crisis detection | Optional, prompt-based | Mandatory, code-enforced |
| Medical diagnosis prevention | "Please don't" in prompt | Stripped from every response |
| Response tone enforcement | Hope for the best | Verified, regenerated if it fails |
| Structured output | Free-form text | JSON schema, every field validated |
| Conversation memory | Usually none | Per-session, TTL-managed |
| Multilingual | Often English-only | 6 languages with prompt fallback |
| Observability | None | Full Langfuse tracing per response |
| Open prompt system | Closed | Community-improvable via PRs |

Architecture

Open in Excalidraw: docs/architecture.excalidraw — editable source for the diagram above.

How It Works

Every message flows through the Sahara Harness — a pipeline that the AI cannot bypass:

User Input
    ↓
Crisis Keyword Check        → if detected: immediate crisis resources, stop
    ↓
AI Emotion Detection        → gpt-4o-mini classifies: anxiety / depression / loneliness / neutral
    ↓
Prompt Manager              → selects versioned prompt template for detected emotion + language
    ↓
Conversation History        → last N turns injected for context
    ↓
OpenAI API (Structured)     → JSON schema enforced response_format
    ↓
Guardrails                  → crisis re-check on AI output
    ↓
Output Validator            → strips medical diagnoses, harmful advice, enforces disclaimer
    ↓
Tone Evaluator              → warm tone verified; regenerates if it fails
    ↓
Quality Scorer              → helpfulness scored via secondary LLM call
    ↓
Session Store               → turn added to conversation history
    ↓
Langfuse Logger             → full trace: model, prompt version, latency, tokens
    ↓
User Response

The AI generates text. The harness decides what the user sees.
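
The control flow above can be illustrated with a minimal sketch. It is not Sahara's actual code: `run_harness`, `HarnessResult`, the keyword list, and the fallback text are hypothetical names chosen for this example, and only three of the stages (input crisis check, output crisis re-check, tone gate with regeneration) are modeled.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword list and canned texts; the real harness may differ.
CRISIS_KEYWORDS = ("suicide", "kill myself", "end my life")
CRISIS_MESSAGE = "You are not alone. Please reach out to a crisis line right now."
FALLBACK_MESSAGE = "I'm here with you. Would you like to tell me more?"


@dataclass
class HarnessResult:
    message: str
    is_crisis: bool
    regenerations: int = 0


def contains_crisis_signal(text: str) -> bool:
    """Case-insensitive substring match against the keyword list."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in CRISIS_KEYWORDS)


def run_harness(
    user_input: str,
    generate: Callable[[str], str],
    tone_ok: Callable[[str], bool],
    max_regenerations: int = 2,
) -> HarnessResult:
    # Stage 1: crisis check on the user input. The model is never called.
    if contains_crisis_signal(user_input):
        return HarnessResult(CRISIS_MESSAGE, is_crisis=True)

    regenerations = 0
    while True:
        draft = generate(user_input)
        # Stage 2: crisis re-check on the model output.
        if contains_crisis_signal(draft):
            return HarnessResult(CRISIS_MESSAGE, True, regenerations)
        # Stage 3: tone gate. A passing draft is released to the user.
        if tone_ok(draft):
            return HarnessResult(draft, False, regenerations)
        # Out of retries: send a safe canned reply instead of a cold draft.
        if regenerations >= max_regenerations:
            return HarnessResult(FALLBACK_MESSAGE, False, regenerations)
        regenerations += 1
```

Because `generate` and `tone_ok` are passed in as plain callables, the decision logic can be unit-tested without any network call, which is one way to reach full coverage on a safety layer.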

Features

Mandatory crisis detection — keyword + AI double-check on both user input and AI response
Structured AI output — every response has message, emotion_acknowledged, recommendation, disclaimer
Medical diagnosis stripping — "you have depression" is removed before it reaches the user
Tone enforcement — cold or dismissive responses are regenerated, not sent
Multi-turn conversation — session history with automatic TTL expiry
SSE streaming — real-time token streaming via /chat/stream
User feedback loop — thumbs up/down per message, tracked alongside AI quality scores
Dual evaluation — AI-scored helpfulness + real user feedback, both in metrics
6 language support — English, Hindi, Spanish, French, Bengali, Telugu
Versioned prompts — community-improvable, benchmarked on merge
Langfuse observability — per-prompt-version metrics tracked in Supabase
Zero PII logging — user IDs are hashed, no conversation content stored in logs
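
As an illustration of the multi-turn item above, here is a rough sketch of a session history with TTL expiry. The `SessionStore` class, its defaults (30-minute TTL, 6 turns), and the in-memory dictionary are assumptions made for this example; the project may store sessions differently.

```python
import time
from typing import Callable


class SessionStore:
    """In-memory conversation history with per-session TTL (illustrative only)."""

    def __init__(
        self,
        ttl_seconds: float = 1800.0,
        max_turns: int = 6,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._max_turns = max_turns
        self._clock = clock  # injectable so tests can control time
        # session_id -> (last activity timestamp, list of (role, text) turns)
        self._sessions: dict[str, tuple[float, list[tuple[str, str]]]] = {}

    def history(self, session_id: str) -> list[tuple[str, str]]:
        entry = self._sessions.get(session_id)
        if entry is None:
            return []
        last_seen, turns = entry
        if self._clock() - last_seen > self._ttl:
            # Expired: drop the session so stale context is never injected.
            del self._sessions[session_id]
            return []
        return list(turns)

    def add_turn(self, session_id: str, role: str, text: str) -> None:
        turns = self.history(session_id)
        turns.append((role, text))
        # Keep only the last N turns and refresh the activity timestamp.
        self._sessions[session_id] = (self._clock(), turns[-self._max_turns:])
```

Expiry is checked lazily on read here, so no background cleanup task is needed; a long-running deployment would probably also want periodic eviction to bound memory.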

Tech Stack

| Layer | Technology |
|---|---|
| Runtime | Python 3.12+ |
| API | FastAPI |
| AI | OpenAI GPT-4o (structured output) |
| Emotion Classification | GPT-4o-mini |
| Observability | Langfuse |
| Database | Supabase (PostgreSQL) |
| Package Manager | uv |
| Testing | pytest (42 tests, 100% harness coverage) |

Quick Start

Prerequisites: Python 3.12+, uv, OpenAI API key, Supabase project, Langfuse account.

# Clone
git clone https://github.com/yourusername/sahara.git
cd sahara

# Install dependencies
uv sync

# Configure
cp .env.example .env
# Fill in your keys in .env

# Run
uv run fastapi dev api/main.py

Server starts at http://localhost:8000. Docs at http://localhost:8000/docs.

API

`POST /api/v1/chat`

Send a message and receive a full structured response.

curl -X POST http://localhost:8000/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "I have been feeling really anxious lately", "session_id": "user-123"}'

Response:

{
  "message_id": "64b74029e6a34f909f24711970875407",
  "message": "I hear you — that kind of persistent anxiety is exhausting to carry...",
  "emotion_acknowledged": "It sounds like anxiety has been weighing on you for a while.",
  "recommendation": "Try a simple grounding exercise: name 5 things you can see right now.",
  "disclaimer": "This is a supportive conversation, not a substitute for professional mental health care.",
  "emotion_state": "anxiety",
  "language": "en",
  "is_crisis": false,
  "prompt_version": "en/anxiety.txt",
  "latency_ms": 1243
}
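
A client may want to check the response shape before using it. The sketch below does that with the standard library only; the field names and types are taken from the example response above, while `EXPECTED_FIELDS` and `parse_chat_response` are hypothetical helpers, not part of Sahara.

```python
import json

# Field names and types as they appear in the example response above.
EXPECTED_FIELDS = {
    "message_id": str,
    "message": str,
    "emotion_acknowledged": str,
    "recommendation": str,
    "disclaimer": str,
    "emotion_state": str,
    "language": str,
    "is_crisis": bool,
    "prompt_version": str,
    "latency_ms": int,
}


def parse_chat_response(raw: str) -> dict:
    """Decode a chat response body, rejecting missing or mistyped fields."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("response body must be a JSON object")
    for name, expected in EXPECTED_FIELDS.items():
        if name not in payload:
            raise ValueError(f"missing field: {name}")
        # Exact type match, so a bool is not accepted where an int is expected.
        if type(payload[name]) is not expected:
            raise ValueError(f"field {name} should be {expected.__name__}")
    return payload
```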

`POST /api/v1/chat/stream`

Same request body. Returns a Server-Sent Events stream.

data: {"type": "meta", "emotion_state": "anxiety", "emotion_acknowledged": "...", ...}
data: {"type": "token", "content": "I hear"}
data: {"type": "token", "content": " you —"}
...
data: [DONE]
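
On the client side, the stream can be reassembled roughly as follows. This sketch assumes the event format shown above (one JSON object per `data:` line, ending with a `[DONE]` sentinel); `iter_sse_events` and `collect_message` are hypothetical helper names, and the input is any iterable of already-decoded text lines.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON events from data: lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_message(lines: Iterable[str]) -> tuple[dict, str]:
    """Return the meta event and the full message joined from token events."""
    meta: dict = {}
    tokens: list[str] = []
    for event in iter_sse_events(lines):
        if event.get("type") == "meta":
            meta = event
        elif event.get("type") == "token":
            tokens.append(event["content"])
    return meta, "".join(tokens)
```

Token contents are concatenated without adding separators, since the example stream already carries leading spaces inside each token.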

`POST /api/v1/chat/feedback`

Submit user feedback (thumbs up/down) for a specific message. Use the message_id from the chat response.

curl -X POST http://localhost:8000/api/v1/chat/feedback \
  -H "Content-Type: application/json" \
  -d '{"message_id": "64b74029e6a34f909f24711970875407", "score": 1}'

score: 1 = helpful, score: 0 = not helpful.

Response:

{
  "message_id": "64b74029e6a34f909f24711970875407",
  "score": 1,
  "updated": true
}

`GET /api/v1/evaluation/metrics/{prompt_version}`

Fetch aggregated performance metrics for a specific prompt version. Includes both AI-scored and user-scored helpfulness.

curl http://localhost:8000/api/v1/evaluation/metrics/en/anxiety.txt

Response:

{
  "prompt_version": "en/anxiety.txt",
  "avg_helpfulness": 0.85,
  "avg_user_score": 1.0,
  "tone_pass_rate": 1.0,
  "crisis_detection_rate": 0.0,
  "regeneration_rate": 0.0,
  "avg_latency_ms": 7362.0,
  "total_sessions": 3
}

avg_user_score is null until at least one user submits feedback.
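
To make the null behavior concrete, here is a small sketch of how such an aggregate could be computed. The per-response record shape (`helpfulness`, `user_score`, `tone_passed`, `latency_ms`) and the function `aggregate_metrics` are assumptions for illustration; the actual Supabase schema and query are not shown in this README, and only a subset of the metrics is modeled.

```python
from statistics import fmean
from typing import Optional


def aggregate_metrics(prompt_version: str, records: list[dict]) -> dict:
    """Summarize per-response records for one prompt version.

    Each record is assumed to hold: helpfulness (float), user_score
    (0, 1, or None when no feedback was given), tone_passed (bool),
    and latency_ms (number).
    """
    if not records:
        raise ValueError("no records for this prompt version")
    # Only responses that actually received feedback count toward the user score.
    user_scores = [r["user_score"] for r in records if r["user_score"] is not None]
    avg_user_score: Optional[float] = fmean(user_scores) if user_scores else None
    return {
        "prompt_version": prompt_version,
        "avg_helpfulness": fmean([r["helpfulness"] for r in records]),
        "avg_user_score": avg_user_score,
        "tone_pass_rate": fmean([1.0 if r["tone_passed"] else 0.0 for r in records]),
        "avg_latency_ms": fmean([r["latency_ms"] for r in records]),
    }
```

Averaging only over rated responses keeps an unrated prompt version from being reported as if users had scored it 0.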

Safety Guarantees

These behaviors are enforced in code. They cannot be disabled by prompt injection or user input:

Crisis resources are always shown when crisis signals are detected — no exceptions
No clinical diagnoses will ever appear in a response
No harmful advice (stopping medication, skipping therapy) will reach the user
Every response has a professional help disclaimer
Cold or dismissive responses are regenerated, not sent
Crisis detection runs twice — on user input before the API call, and on AI output after
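
A code-enforced validator of this kind might look like the sketch below. The regular expressions are hypothetical and deliberately short (a production list would need to be much broader and clinically reviewed), and `validate_output` is an illustrative name rather than the function in `harness/output_validator.py`. The disclaimer text is the one from the example response earlier in this README.

```python
import re

DISCLAIMER = (
    "This is a supportive conversation, not a substitute for "
    "professional mental health care."
)

# Hypothetical patterns for illustration only.
DIAGNOSIS_PATTERN = re.compile(
    r"\byou (?:have|are suffering from|are diagnosed with) "
    r"(?:clinical |major |severe )?"
    r"(?:depression|anxiety disorder|bipolar disorder|ptsd|ocd)\b",
    re.IGNORECASE,
)
HARMFUL_PATTERN = re.compile(
    r"\b(?:stop|quit|skip) (?:taking )?your (?:medication|meds|therapy)\b",
    re.IGNORECASE,
)


def split_sentences(text: str) -> list[str]:
    """Naive sentence split on whitespace that follows ., ! or ?"""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def validate_output(message: str, disclaimer: str = "") -> dict:
    """Drop sentences that diagnose or give harmful advice; force a disclaimer."""
    sentences = split_sentences(message)
    kept = [
        s for s in sentences
        if not DIAGNOSIS_PATTERN.search(s) and not HARMFUL_PATTERN.search(s)
    ]
    return {
        "message": " ".join(kept),
        # The model's disclaimer is used only if it is non-empty.
        "disclaimer": disclaimer.strip() or DISCLAIMER,
        "stripped": len(sentences) - len(kept),
    }
```

Removing whole sentences rather than single phrases avoids leaving grammatical fragments behind. A caller would still need to handle the case where every sentence is removed and the message comes back empty.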

Crisis resources included in every crisis response:

iCall India: 9152987821
Vandrevala Foundation: 1860-2662-345
IASP Crisis Centres: https://www.iasp.info/resources/Crisis_Centres/

Running Tests

uv run pytest                          # all tests
uv run pytest tests/test_guardrails.py # safety layer only
uv run pytest -k "crisis" -v          # crisis-specific tests

Coverage requirements enforced:

harness/guardrails.py → 100%
harness/output_validator.py → 100%
engine/chat.py → 90%

Contributing

Sahara is open source and welcomes contributions — especially improved prompt templates.

See CONTRIBUTING.md to get started.
See PROMPTS.md to contribute or improve prompt templates.
See HARNESS.md for a deep dive into the safety architecture.

License

MIT License. See LICENSE for details.

A Note on Responsibility

Sahara is a support tool, not a replacement for professional care. If you're deploying this in a product, ensure you have reviewed the safety guarantees, have crisis resources localized to your region, and have a process for reviewing prompt changes before deploying them to users.

Mental health is not a space for moving fast and breaking things.
