SoulBot is a real-time, AI-powered mental wellness platform featuring expressive 3D avatars that listen, speak, and respond emotionally, turning conversations into clinically meaningful insights.
## Table of Contents

- Overview
- System Architecture
- Request Lifecycle
- Real-Time Flow Sequence
- Dual LLM Pipeline
- Mental Health Scoring
- Tech Stack
- Getting Started
- Socket Events
- Project Structure
- Environment Variables
## Overview

SoulBot creates immersive, emotionally aware AI therapy sessions through:

- Real-time Speech-to-Text via the Sarvam AI Streaming API
- Dual LLM Processing: one model for response generation, one for avatar emotion
- Text-to-Speech streaming with avatar-specific voice profiles
- Dynamic Avatar Expressions driven by a dedicated emotion LLM
- Automated Clinical Reports scoring PHQ-9 (depression) and GAD-7 (anxiety)
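The dual-LLM step above can be sketched as two parallel async calls fanned out from the orchestrator. This is only a minimal sketch: `generateResponse` and `classifyEmotion` are hypothetical stand-ins for the project's actual LLM invocations through the OpenAI SDK, not names from the codebase.

```typescript
// Sketch of the dual-LLM fan-out: one call produces the spoken reply,
// the other produces the avatar's expression. Both run concurrently.
interface Expression {
  emotion: string;
  intensity: number; // assumed 0..1 range
}

async function generateResponse(transcript: string): Promise<string> {
  // placeholder for the therapeutic-response LLM call
  return `I hear you. Tell me more about "${transcript}".`;
}

async function classifyEmotion(transcript: string): Promise<Expression> {
  // placeholder for the emotion-classification LLM call
  return { emotion: "empathetic", intensity: 0.7 };
}

async function processTurn(transcript: string) {
  // Fan out both inferences in parallel, as in the "par" block of the flow
  const [reply, expression] = await Promise.all([
    generateResponse(transcript),
    classifyEmotion(transcript),
  ]);
  return { transcript: reply, expression };
}
```

Running both calls through `Promise.all` means the slower of the two model calls, not their sum, bounds the per-turn latency.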
## System Architecture

```mermaid
flowchart LR
    A([User Speaks]) --> B["Audio Chunks<br/>via WebSocket"]
    B --> C{"STT<br/>Sarvam AI"}
    C --> D["Transcript<br/>Text"]
    D --> E{"LangGraph<br/>Orchestrator"}
    E --> F["LLM 1<br/>Response Generator"]
    E --> G["LLM 2<br/>Expression Engine"]
    F --> H["Response<br/>Transcript"]
    G --> I["Expression<br/>JSON"]
    H --> J{"TTS<br/>Sarvam AI"}
    J --> K["Audio Stream<br/>by Avatar Voice"]
    K --> L([Avatar Speaks])
    I --> M([Avatar Reacts])
    E --> N[("SQLite<br/>Session Log")]
    N --> O["PHQ-9 / GAD-7<br/>Report"]
    O --> P([Dashboard])
    style A fill:#e94560,color:#fff,stroke:none
    style L fill:#533483,color:#fff,stroke:none
    style M fill:#533483,color:#fff,stroke:none
    style P fill:#0f3460,color:#fff,stroke:none
    style E fill:#1a1a2e,color:#eee,stroke:#e94560
```
## Real-Time Flow Sequence

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant FE as Frontend
    participant WS as Socket.io
    participant BE as NestJS Backend
    participant STT as Sarvam STT
    participant LG as LangGraph
    participant L1 as LLM (Response)
    participant L2 as LLM (Emotion)
    participant TTS as Sarvam TTS
    participant DB as SQLite
    User->>FE: Speaks into microphone
    FE->>WS: emit(audio:chunk, audioBuffer)
    loop Streaming STT
        WS->>BE: audio chunk received
        BE->>STT: stream audio buffer
        STT-->>BE: partial transcript
    end
    STT-->>BE: final transcript
    BE->>DB: save user message to session
    BE->>LG: invoke graph with transcript + history
    par Parallel LLM Inference
        LG->>L1: generate therapeutic response
        L1-->>LG: response transcript
    and
        LG->>L2: analyze emotional context
        L2-->>LG: expression { emotion, intensity }
    end
    LG-->>BE: { transcript, expression }
    BE->>DB: save assistant response
    BE->>WS: emit(avatar:expression, { emotion, intensity })
    WS->>FE: avatar plays expression animation
    BE->>TTS: stream transcript with avatar voice profile
    loop Streaming TTS
        TTS-->>BE: audio chunk
        BE->>WS: emit(audio:response, audioChunk)
        WS->>FE: play audio chunk in realtime
    end
    FE->>User: Avatar speaks with matching expression
    Note over BE,DB: On session end
    BE->>LG: generate clinical report
    LG->>L1: summarize + score PHQ-9 / GAD-7
    L1-->>LG: scores + insights
    LG-->>BE: final report JSON
    BE->>DB: persist report
    BE->>WS: emit(session:report, reportData)
    WS->>FE: display dashboard
```
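The events exchanged above imply a small socket contract. The following is a sketch of the event names taken from the sequence, with payload shapes inferred from the diagram; the field names in the `SessionReportPayload` interface are hypothetical until checked against the codebase.

```typescript
// Socket event names as they appear in the real-time flow.
const SOCKET_EVENTS = {
  AUDIO_CHUNK: "audio:chunk",             // client -> server: raw audio buffer
  AVATAR_EXPRESSION: "avatar:expression", // server -> client: emotion payload
  AUDIO_RESPONSE: "audio:response",       // server -> client: TTS audio chunk
  SESSION_REPORT: "session:report",       // server -> client: end-of-session report
} as const;

// Payload shape shown literally in the diagram: { emotion, intensity }.
interface AvatarExpressionPayload {
  emotion: string;
  intensity: number;
}

// Hypothetical report shape; the diagram only says "reportData".
interface SessionReportPayload {
  phq9Score: number; // 0-27
  gad7Score: number; // 0-21
  summary: string;
}
```

Centralizing event names in one `as const` object keeps the client and server from drifting apart on string literals.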
## Mental Health Scoring

At the end of every session, SoulBot generates a comprehensive clinical report:

```mermaid
graph TD
    LOG["Full Conversation Log"] --> ANALYZER["LangGraph<br/>Report Analyzer"]
    ANALYZER --> PHQ["PHQ-9 Analysis<br/>Patient Health Questionnaire<br/>9-item Depression Scale"]
    ANALYZER --> GAD["GAD-7 Analysis<br/>Generalized Anxiety Disorder<br/>7-item Scale"]
    ANALYZER --> SUM["Session Summary<br/>Key themes & observations"]
    PHQ --> SCORE1["Score: 0–27<br/>Minimal · Mild<br/>Moderate · Severe"]
    GAD --> SCORE2["Score: 0–21<br/>Minimal · Mild<br/>Moderate · Severe"]
    SCORE1 --> REPORT[("Report JSON<br/>SQLite")]
    SCORE2 --> REPORT
    SUM --> REPORT
    REPORT --> DASH["Frontend Dashboard<br/>Visual Analytics"]
    style LOG fill:#1a1a2e,stroke:#e94560,color:#eee
    style ANALYZER fill:#0f3460,stroke:#533483,color:#eee
    style REPORT fill:#533483,stroke:#e94560,color:#eee
    style DASH fill:#16213e,stroke:#0f3460,color:#eee
```
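Both instruments bucket a summed score into severity bands: the published cut-offs are 0–4 minimal, 5–9 mild, and 10–14 moderate for both scales, with 15–21 severe for GAD-7 and, for PHQ-9, 15–19 moderately severe and 20–27 severe. The diagram shows four bands, so the sketch below collapses PHQ-9's "Moderately Severe" band into "Severe"; the function name is hypothetical and how SoulBot labels bands internally is not shown in this README.

```typescript
type Severity = "Minimal" | "Mild" | "Moderate" | "Severe";

// Four-band labeling as shown in the diagram. Note that the published
// PHQ-9 scale also defines a fifth band, "Moderately Severe" (15-19);
// this sketch collapses the top PHQ-9 bands into "Severe".
function severityBand(score: number): Severity {
  if (score <= 4) return "Minimal";
  if (score <= 9) return "Mild";
  if (score <= 14) return "Moderate";
  return "Severe";
}
```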
## Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React + Vite + TailwindCSS | Component-based UI framework |
| 3D Graphics | Three.js | Core WebGL 3D rendering engine |
| Framework | NestJS + TypeScript | Scalable server architecture |
| Real-time | Socket.io | Bidirectional audio & event streaming |
| Speech-to-Text | Sarvam AI Streaming API | Low-latency audio transcription |
| Text-to-Speech | Sarvam AI Streaming API | Avatar voice synthesis |
| LLM Orchestration | LangGraph | Stateful multi-agent graph execution |
| AI Models | OpenAI SDK | Response generation & emotion classification |
| Database | SQLite | Session & report persistence |
## Getting Started

### Prerequisites

- Node.js >= 18.x
- npm >= 9.x
- Sarvam AI API key
- OpenAI API key

Step 1: Clone the repository

```bash
git clone https://github.com/luv29/innovate-you-backend.git
cd innovate-you-backend
```

Step 2: Install dependencies

```bash
npm install
```

Step 3: Configure environment variables

```bash
cp .env.example .env
```

Open `.env` and fill in your API keys and configuration (see Environment Variables below).

Step 4: Start the development server

```bash
npm run start:dev
```

The server will be live at http://localhost:3000.
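The authoritative variable names live in `.env.example` in the repository; the fragment below is only an illustrative sketch of the kind of values the steps above imply, and every name in it is hypothetical until checked against that file.

```env
# Hypothetical example - consult .env.example for the real variable names
SARVAM_API_KEY=your-sarvam-api-key
OPENAI_API_KEY=your-openai-api-key
PORT=3000
DATABASE_PATH=./data/soulbot.sqlite
```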