
Interview Assistant

Real-Time AI Interview Support with Local LLMs

Python 3.9+ · MIT License · Code style: Black · Platform: Windows | macOS | Linux


🎯 What is Interview Assistant?

Interview Assistant is a real-time AI-powered interview support system that:

  • Transcribes live conversations using Whisper (speech-to-text)
  • Detects interview questions using LLM analysis
  • Generates contextual answers powered by Ollama (local LLMs)
  • Displays everything in a live dashboard for at-a-glance monitoring

Perfect for job interviews, customer calls, sales pitches, or any high-stakes conversation where you need intelligent support.

Key Differentiators

| Feature       | Interview Assistant                                    | Alternatives                  |
|---------------|--------------------------------------------------------|-------------------------------|
| Privacy       | 100% local processing (no data leaves your machine)    | Cloud-based = data sharing    |
| Cost          | Free (no API costs)                                    | $5-50+ per month              |
| Latency       | <4s end-to-end (optimized)                             | 1-5s network overhead         |
| Customization | Swap any component (Whisper ↔ Azure, Ollama ↔ OpenAI)  | Locked into a single provider |
| Setup         | Works out of the box with defaults                     | Requires configuration        |

✨ Features

πŸŽ™οΈ Real-Time Transcription

  • Sub-500ms latency audio processing
  • Automatic silence detection and noise gating
  • Supports all common audio formats (via FFmpeg)
  • Platform-specific audio capture (DirectShow/AVFoundation/ALSA)
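
The platform-specific capture above boils down to choosing the right FFmpeg input backend and forcing a 16kHz mono PCM output. A minimal sketch of that idea (device names and exact flags here are illustrative, not the repository's exact code):

```python
import platform

def build_capture_cmd(device: str) -> list[str]:
    """Build an FFmpeg command that captures 16kHz mono s16le PCM to stdout.

    The input backend depends on the platform: DirectShow on Windows,
    AVFoundation on macOS, ALSA on Linux.
    """
    system = platform.system()
    if system == "Windows":
        source = ["-f", "dshow", "-i", f"audio={device}"]
    elif system == "Darwin":
        source = ["-f", "avfoundation", "-i", f":{device}"]
    else:  # Linux and others
        source = ["-f", "alsa", "-i", device]
    return [
        "ffmpeg", *source,
        "-ac", "1",       # mono
        "-ar", "16000",   # 16 kHz sample rate
        "-f", "s16le",    # 16-bit signed little-endian PCM
        "pipe:1",         # stream raw audio to stdout
    ]

# The resulting PCM stream could then be read chunk by chunk,
# e.g. with subprocess.Popen(build_capture_cmd("default"), stdout=PIPE).
```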

🤔 Intelligent Question Detection

  • LLM-powered question extraction from dialogue
  • Filters noise and conversational padding
  • Deduplication to avoid answering the same question twice
  • Configurable aggressiveness (technical vs. casual interviews)
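
The deduplication step can be as simple as hashing a normalized form of each question so trivial rephrasings compare equal. A sketch of that idea (not the repository's exact logic):

```python
import hashlib

def normalize(question: str) -> str:
    """Canonicalize a question so case, whitespace, and a trailing '?' don't matter."""
    return " ".join(question.lower().strip().rstrip("?").split())

class QuestionDeduper:
    """Track normalized question hashes so each question is answered only once."""

    def __init__(self):
        self._seen: set[str] = set()

    def is_new(self, question: str) -> bool:
        key = hashlib.sha256(normalize(question).encode()).hexdigest()
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

dedup = QuestionDeduper()
dedup.is_new("What is a Python decorator?")   # True: first occurrence
dedup.is_new("what is a python decorator")    # False: same question, rephrased
```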

💡 Context-Aware Answer Generation

  • Generates answers using full conversation history
  • Customizable persona (candidate, assistant, or neutral)
  • Multiple context modes for latency/accuracy tradeoff
  • Rate-limited to prevent overwhelming feedback
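
The rate limiting mentioned above can be modeled as a minimum interval between generated answers. A sketch under that assumption (the repository's actual limiter in src/core/security.py may differ):

```python
import time

class AnswerRateLimiter:
    """Allow at most one generated answer per `min_interval` seconds."""

    def __init__(self, min_interval: float = 5.0, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock            # injectable clock makes this testable
        self._last = float("-inf")     # no answer generated yet

    def allow(self) -> bool:
        now = self._clock()
        if now - self._last >= self.min_interval:
            self._last = now
            return True
        return False
```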

🎨 Beautiful Web Dashboard

  • Single HTML file - zero dependencies
  • Real-time transcript with word-by-word updates
  • Expandable answer panel with full context
  • Question Q&A list with timestamps
  • Dark/light mode toggle
  • Keyboard shortcuts for power users
  • One-click session export to Markdown
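
The one-click Markdown export can be pictured as rendering the transcript and the Q&A list into one document. A sketch with hypothetical field names (the UI's actual export format may differ):

```python
def export_session_markdown(transcript: list[str], qa_pairs: list[tuple]) -> str:
    """Render a session as Markdown: transcript first, then timestamped Q&A."""
    lines = ["# Interview Session", "", "## Transcript", ""]
    lines += [f"> {chunk}" for chunk in transcript]
    lines += ["", "## Questions & Answers", ""]
    for timestamp, question, answer in qa_pairs:
        lines += [f"### [{timestamp}] {question}", "", answer, ""]
    return "\n".join(lines)
```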

🔒 Privacy & Security

  • All processing happens locally on your machine
  • No cloud dependencies for core functionality
  • Optional "offline mode" (no internet required)
  • Structured input validation and sanitization
  • Rate limiting to prevent abuse
  • Credential manager for secure API key storage

βš™οΈ Highly Configurable

  • Plugin architecture - swap any component
  • Adjust latency/accuracy tradeoffs
  • Multiple LLM model support
  • Customizable audio preprocessing
  • Session-specific context injection
  • Advanced buffer management
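
A swap-any-component design usually rests on a small registry that maps a capability name to an implementation factory. A minimal sketch of that pattern (the repository's plugin system in src/core/plugins.py is more elaborate):

```python
from typing import Callable, Dict

class PluginRegistry:
    """Map a capability name ("transcription", "llm", ...) to a factory."""

    def __init__(self):
        self._factories: Dict[str, Callable] = {}

    def register(self, capability: str, factory: Callable) -> None:
        self._factories[capability] = factory

    def create(self, capability: str, **kwargs):
        if capability not in self._factories:
            raise LookupError(f"no plugin registered for {capability!r}")
        return self._factories[capability](**kwargs)

# Swapping backends is then a one-line change at registration time:
registry = PluginRegistry()
registry.register("llm", lambda model="base": f"ollama:{model}")
```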

🚀 Quick Start (2 minutes)

Prerequisites

  • Python 3.9+ - Download
  • FFmpeg - For audio capture
    • macOS: brew install ffmpeg
    • Windows: choco install ffmpeg or download
    • Linux: sudo apt-get install ffmpeg
  • Ollama - For local LLMs
    • Download from ollama.ai
    • Start with: ollama serve

Installation & Launch

# 1. Clone the repository
git clone https://github.com/jcmd13/Interview_Assistant.git
cd Interview_Assistant

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # macOS/Linux
# or: .\venv\Scripts\activate  # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Make sure Ollama is running (in another terminal)
ollama serve

# 5. Pull the required model (only needed once)
ollama pull gpt-oss:120b-cloud

# 6. Start everything with the launcher
./launch.sh  # or: python launcher.py

That's it! The launcher will:

  • ✅ Start the WebSocket server
  • ✅ Open the web UI in your browser
  • ✅ Prompt you to select your microphone
  • ✅ Begin audio streaming automatically

📚 Documentation

Getting Started

Using the System

Development

Troubleshooting


πŸ—οΈ System Architecture

┌─────────────────────────────────────────────────────────────┐
│           Interview Assistant System Architecture           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   CLIENT     │  │    SERVER    │  │     UI       │       │
│  ├──────────────┤  ├──────────────┤  ├──────────────┤       │
│  │ Audio Capture│  │ Transcription│  │ Web Dashboard│       │
│  │   (FFmpeg)   │  │  (Whisper)   │  │   (HTML)     │       │
│  │      ↓       │  │      ↓       │  │      ↑       │       │
│  │ PCM Stream   │  │ Text Chunks  │  │ Real-time    │       │
│  │  WebSocket   │  │ LLM Analysis │  │ Updates      │       │
│  │      ↓       │─→│      ↓       │─→│      ↑       │       │
│  │  16kHz Mono  │  │  Question    │  │ Q&A Display  │       │
│  │    s16le     │  │  Detection   │  │              │       │
│  │              │  │      ↓       │  │              │       │
│  │              │  │  Ollama LLM  │  │              │       │
│  │              │  │  Answer Gen  │  │              │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│     Local App        Local Server      Web Browser          │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │         Core Services (Thread-Safe, Local)          │    │
│  ├─────────────────────────────────────────────────────┤    │
│  │  • Plugin System         • Configuration Management │    │
│  │  • Structured Logging    • Error Handling           │    │
│  │  • Metrics Collection    • Security Hardening       │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Data Flow

User Speaking
     ↓
Audio Capture (FFmpeg) → 16kHz PCM buffer
     ↓
WebSocket Stream → Server
     ↓
Whisper Transcription (faster-whisper) → Text chunks
     ↓
Question Detection (LLM) → "Did the user ask a question?"
     ↓
YES → Answer Generation (Ollama) → Contextual response
          ↓
      Broadcast to UI → User sees answer

NO  → Continue listening for the next question

💻 Component Details

Server (optimized_stt_server_v3.py)

  • Port: 8123 (WebSocket)
  • Transcription: faster-whisper (CPU/GPU accelerated)
  • Question Detection: Ollama (local LLM)
  • Answer Generation: Ollama (local LLM, configurable model)
  • Concurrency: 3 simultaneous LLM requests (configurable)

Client (stable_audio_client_multi_os.py)

  • Audio Format: PCM, 16-bit signed, mono, 16kHz
  • Buffer Size: 4096 samples (256ms)
  • Reconnection: Exponential backoff (up to 30s)
  • Backpressure: Automatic flow control
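
Exponential backoff with a cap means the reconnect delay doubles on each failed attempt until it reaches the 30s ceiling. A sketch of that schedule (the client may add jitter to avoid synchronized reconnect storms):

```python
def backoff_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 8):
    """Yield reconnect delays that double each attempt, capped at `cap` seconds."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2

# Delays double (1, 2, 4, 8, 16, ...) and then stay pinned at the 30s cap.
schedule = list(backoff_delays())
```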

Web UI (index.html)

  • Technologies: Vanilla JavaScript, HTML5, CSS3
  • Dependencies: None (zero external libraries)
  • Styling: Responsive grid layout
  • Shortcuts: ? to view keyboard shortcuts

⚑ Performance

All benchmarks on M-series MacBook Pro with 16GB RAM:

| Component            | Latency     | Notes                      |
|----------------------|-------------|----------------------------|
| Audio Capture        | 50-100ms    | Platform dependent         |
| Whisper (base)       | 200-400ms   | GPU: 100-200ms             |
| Question Detection   | 50-150ms    | Cached when repeated       |
| Answer Generation    | 1500-3000ms | Depends on response length |
| WebSocket Round-trip | <50ms       | Local network              |
| UI Update            | 100-200ms   | Browser rendering          |
| End-to-End           | <4 seconds  | Question → Answer display  |

💡 Target: <4s from question spoken to answer displayed (maintained)


🔧 Configuration

Basic Configuration

Edit the top of optimized_stt_server_v3.py:

# Audio
WHISPER_MODEL = "base"  # tiny, base, small, medium, large
SAMPLE_RATE = 16000
WINDOW_SECONDS = 6.0    # Larger = more context but higher latency
HOP_SECONDS = 0.8       # Smaller = more updates but higher CPU

# LLM
OLLAMA_MODEL_CLOUD = "gpt-oss:120b-cloud"
MAX_CONCURRENT_LLM = 3
MAX_OUTTOK = 500        # Max answer length in tokens

# Behavior
TECH_INTERVIEW_MODE = True
PERSONA = "candidate"   # candidate, assistant, or neutral
LLM_CONTEXT_MODE = "full"  # full, window, or headtail
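
The three context modes trade accuracy for latency by choosing how much conversation history the LLM sees. A sketch under assumed semantics of the mode names (the server's exact selection logic may differ):

```python
def build_context(history: list[str], mode: str = "full",
                  head: int = 2, tail: int = 4) -> list[str]:
    """Select conversation history according to LLM_CONTEXT_MODE.

    full     -> entire history (most accurate, slowest)
    window   -> only the most recent `tail` chunks
    headtail -> the opening chunks plus the most recent ones
    """
    if mode == "full" or len(history) <= head + tail:
        return history
    if mode == "window":
        return history[-tail:]
    if mode == "headtail":
        return history[:head] + history[-tail:]
    raise ValueError(f"unknown context mode: {mode}")
```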

Advanced Configuration

See Configuration Guide for:

  • Custom model selection
  • Buffer tuning for different hardware
  • Rate limiting configuration
  • Credential management
  • Custom prompts for different scenarios
  • "Black hole" configurations (offline mode, minimal CPU, etc.)

🧪 Testing

Run All Tests

pytest tests/ -v

Run Specific Test Suite

pytest tests/test_plugins.py -v
pytest tests/test_phase3.py -v
pytest tests/test_logging.py -v

Test Coverage

pytest tests/ --cov=src --cov-report=html

Current Status: ~75% pass rate (see Test Status for details)


🚀 Deployment

Local Development

python optimized_stt_server_v3.py

Docker (Production)

docker build -t interview-assistant .
docker run -p 8123:8123 interview-assistant

See Deployment Guide for:

  • Docker setup with GPU support
  • Kubernetes configuration
  • Load balancing
  • Monitoring and logging
  • Performance tuning

πŸ› οΈ Troubleshooting

Audio Device Not Found

# List available devices
ffmpeg -list_devices true -f dshow -i dummy  # Windows
ffmpeg -f avfoundation -list_devices true -i ""  # macOS
arecord -l  # Linux

See Troubleshooting Guide for more.

High Latency

  1. Reduce WINDOW_SECONDS (trades accuracy for speed)
  2. Use smaller Whisper model (tiny or base)
  3. Enable GPU acceleration
  4. Increase MAX_CONCURRENT_LLM

Ollama Not Responding

# Check Ollama status
curl http://localhost:11434/api/version

# Restart Ollama
ollama serve

More troubleshooting at Performance Tuning.


📦 What's Included

Interview_Assistant/
├── optimized_stt_server_v3.py      # Main WebSocket server
├── stable_audio_client_multi_os.py # Audio streaming client
├── launcher.py                     # Auto-launcher
├── index.html                      # Web UI
├── requirements.txt                # Python dependencies
└── src/
    ├── core/                       # Core infrastructure
    │   ├── logger.py               # Structured logging
    │   ├── config.py               # Configuration management
    │   ├── metrics.py              # Performance metrics
    │   ├── security.py             # Rate limiting & validation
    │   ├── plugins.py              # Plugin system
    │   └── ...
    ├── transcription/              # Speech-to-text plugins
    │   ├── whisper.py              # Whisper implementation
    │   └── ...
    ├── llm/                        # LLM plugins
    │   ├── ollama.py               # Ollama implementation
    │   └── ...
    ├── audio/                      # Audio processing plugins
    │   ├── effects.py              # 5 audio effects
    │   └── ...
    └── plugins/
        └── __init__.py             # Plugin registration

🔄 How It Works (High Level)

  1. User speaks into their microphone
  2. Client captures audio at 16kHz using FFmpeg
  3. Server receives audio stream via WebSocket
  4. Whisper transcribes incoming audio chunks (sub-500ms latency)
  5. New text appears in the transcript
  6. LLM analyzes text to detect questions
  7. Question detected?
    • YES → Generate answer using full context
    • NO → Wait for next audio chunk
  8. Answer appears in the UI (within 4 seconds of question)
  9. User can expand answer to see full reasoning
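
Steps 3-8 above form a single transcribe-detect-answer pass. A sketch of that loop body, where the four collaborators are stand-ins for the real components (Whisper, the detection LLM, Ollama, and the WebSocket broadcaster):

```python
def pipeline_step(audio_chunk, transcriber, detector, generator, ui):
    """One pass of the transcribe -> detect -> answer loop."""
    text = transcriber(audio_chunk)      # step 4: speech to text
    ui.show_transcript(text)             # step 5: live transcript update
    question = detector(text)            # step 6: LLM question check
    if question is None:                 # step 7, NO branch: keep listening
        return None
    answer = generator(question)         # step 7, YES branch: generate answer
    ui.show_answer(question, answer)     # step 8: display in the dashboard
    return answer
```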

See How It Works for detailed architecture.


🎓 For Developers

Setting Up Development Environment

git clone https://github.com/jcmd13/Interview_Assistant.git
cd Interview_Assistant
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .  # Install in editable mode

Running Tests

pytest tests/ -v
pytest tests/ --cov=src  # With coverage

Building Documentation

# Documentation is in Markdown, no build needed
# Edit files in docs/ directory directly

Contributing

See Contributing Guidelines for:

  • Code style (Black, isort)
  • Test requirements
  • Commit message format
  • Pull request process

📄 License

This project is licensed under the MIT License - see LICENSE for details.


🙋 FAQ

Q: Does this work without internet?
A: Yes! All core functionality is local. Optionally, cloud Ollama models can be used (requires internet).

Q: Is my data private?
A: 100% private. Everything runs locally on your machine. No data is sent anywhere.

Q: Can I use this with OpenAI/Claude?
A: Yes! The LLM backend is pluggable. See the Configuration Guide.

Q: How much does it cost?
A: Free. No API costs, no subscriptions. Just your hardware.

Q: What hardware do I need?
A: Minimum: 2GB RAM and a modern CPU. Recommended: 4GB+ RAM, with a GPU preferred for <500ms Whisper latency.

Q: Can I run this on a server?
A: Yes! A Docker setup is available. See the Deployment Guide.

More FAQs at FAQ.


📊 Performance Metrics

Real-world benchmarks on M-series MacBook Pro:

  • Question Detection Accuracy: 95%+ (with proper tuning)
  • Transcription Error Rate: <3% (on clear audio)
  • End-to-End Latency: <4s (p95)
  • Memory Usage: 200-400MB idle, 500-800MB during inference
  • CPU Usage: 5-15% (idle), 30-60% (during transcription)
  • GPU Usage: 20-40% (if available)

See Performance Benchmarks for detailed metrics.


🤝 Support


πŸ™ Acknowledgments

Built with:


Made with ❤️ for job seekers and professionals

Last Updated: November 8, 2025
