# InterviewAce

Built for the Vision Possible Hackathon by WeMakeDevs × Stream

InterviewAce is an AI-powered mock interview coach that joins a video call, conducts realistic job interviews, analyzes your body language in real time, and generates a detailed performance scorecard.
## Features
- 🎥 Real-time Video AI — Gemini Realtime analyzes your body language at 2fps during the interview
- 🗣️ Voice Interaction — Natural conversational interview powered by STT/TTS
- 🧠 Smart Questions — Role-specific behavioral & technical questions
- 📊 Live Feedback — Real-time body language tips overlaid during the call
- 📋 Scorecard — Detailed post-interview report with per-question scoring
- ⚡ Low Latency — Sub-30ms audio/video via Stream's edge network
## Tech Stack

| Component | Technology |
|---|---|
| AI Agent | Vision Agents SDK (Python) |
| LLM + Vision | Gemini Realtime (video + voice) |
| Video Streaming | Stream Video (edge network) |
| Speech-to-Text | Deepgram |
| Text-to-Speech | ElevenLabs |
| Frontend | React + Vite + Stream Video React SDK |
| Body Language | Custom VideoProcessor + Gemini Vision |
## Prerequisites
- Python 3.12+ with uv
- Node.js 18+
- API keys for: Stream, Google AI, Deepgram, ElevenLabs
## Quick Start

### Backend

```shell
cd backend
cp .env.example .env
# Fill in your API keys in .env
uv run main.py run
```

### Frontend

```shell
cd frontend
cp .env.example .env
# Fill in your Stream API key and user token
npm install
npm run dev
```

### Run an interview

- Start the backend agent (`uv run main.py run`)
- Start the frontend (`npm run dev`)
- Open http://localhost:5173
- Select your target role and click Start Mock Interview
- The AI interviewer will join the call and begin
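The exact variable names are defined in each `.env.example`; a typical layout might look like the following (all key names here are placeholders, not confirmed from the repo):

```env
# Stream (video + edge network)
STREAM_API_KEY=...
STREAM_API_SECRET=...

# Google AI (Gemini Realtime)
GOOGLE_API_KEY=...

# Speech-to-text / text-to-speech
DEEPGRAM_API_KEY=...
ELEVENLABS_API_KEY=...
```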
## Project Structure

```
├── backend/
│   ├── main.py           # Vision Agents agent (entry point)
│   ├── prompts.py        # System prompt & question banks
│   ├── processors.py     # Custom BodyLanguageProcessor
│   └── .env.example      # API key template
├── frontend/
│   ├── src/
│   │   ├── App.jsx       # Router
│   │   ├── pages/
│   │   │   ├── Landing.jsx        # Role selection
│   │   │   ├── InterviewRoom.jsx  # Video call + feedback
│   │   │   └── Scorecard.jsx      # Results
│   │   └── components/
│   │       └── FeedbackOverlay.jsx
│   └── .env.example
└── README.md
```
## Architecture

```
┌─────────────┐     Stream Edge     ┌──────────────────┐
│  React App  │◄──── Network ──────►│   Vision Agent   │
│ (Frontend)  │   (sub-30ms A/V)    │ (Python Backend) │
└─────────────┘                     └──────────────────┘
      │                                      │
      │ Video + Audio                        │ Gemini Realtime
      │ Feedback Events                      │ (fps=2 for vision)
      │                                      │
      ▼                                      ▼
┌──────────┐                        ┌────────────────┐
│ Scorecard│                        │ Body Language  │
│ Display  │                        │ Analysis       │
└──────────┘                        │ Score Tracking │
                                    │ Report Gen     │
                                    └────────────────┘
```
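The feedback events flowing from the agent to the frontend overlay could carry a payload along these lines (the field names are illustrative; the real `BodyLanguageEvent` schema lives in `processors.py`):

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BodyLanguageEvent:
    """Illustrative shape of a custom event sent to the frontend overlay."""

    type: str           # event name the frontend subscribes to
    posture: str        # e.g. "upright", "slouched"
    eye_contact: float  # 0.0-1.0 estimate from the sampled frames
    tip: str            # short coaching message shown in the overlay


event = BodyLanguageEvent(
    type="body_language",
    posture="upright",
    eye_contact=0.8,
    tip="Good eye contact, keep it up.",
)
# Custom events are typically serialized to JSON before being sent over the call.
payload = json.dumps(asdict(event))
print(payload)
```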
## Hackathon Features Used

- ✅ Gemini Realtime — video frames + voice in one pipeline
- ✅ Function Calling — `save_score()`, `send_feedback()`, `generate_report()`
- ✅ Custom VideoProcessor — `BodyLanguageProcessor` for frame analysis
- ✅ Custom Events — `BodyLanguageEvent` for real-time frontend updates
- ✅ Stream Edge Network — ultra-low latency video/audio
- ✅ Agent Lifecycle — `create_agent` → `join_call` → `finish`
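The function-calling tools above suggest a simple accumulator behind them. A minimal sketch of what `save_score` and `generate_report` might maintain (the 1-10 scale and report fields are assumptions, not the repo's actual logic):

```python
class ScoreTracker:
    """Collects per-question scores and produces a summary report."""

    def __init__(self):
        self.scores: list[dict] = []

    def save_score(self, question: str, score: int, notes: str = "") -> None:
        # Called by the LLM via function calling after each answer
        # (a 1-10 scale is assumed here).
        self.scores.append({"question": question, "score": score, "notes": notes})

    def generate_report(self) -> dict:
        # Aggregate the per-question scores into the scorecard payload.
        if not self.scores:
            return {"average": 0.0, "questions": []}
        avg = sum(s["score"] for s in self.scores) / len(self.scores)
        return {"average": round(avg, 1), "questions": self.scores}


tracker = ScoreTracker()
tracker.save_score("Tell me about a conflict you resolved.", 7, "Good structure")
tracker.save_score("Explain REST vs. GraphQL trade-offs.", 9)
print(tracker.generate_report()["average"])  # → 8.0
```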
## License

MIT — Built with ❤️ for the Vision Possible Hackathon