NexLearn — AI Voice Transcription System

Real-time speech-to-text for classrooms, powered by Deepgram Nova-3 and Flask.

Overview

NexLearn is a web application, built specifically for the classroom, that converts live speech into text in real time. A teacher speaks, and students see the words appear on screen within seconds. No file uploads, no post-processing delays, no third-party apps to install.

Under the hood, audio captured by the browser's MediaRecorder API is streamed in WebM chunks to a Flask/Socket.IO backend, which forwards it over a persistent WebSocket to Deepgram's Nova-3 model. Transcription results stream back within milliseconds and are broadcast to the client live.

Built as part of an EdTech project exploring how AI can reduce accessibility barriers in education.

Features

Feature	Description
🎙 Live Transcription	Audio streams from browser to Deepgram via WebSocket — words appear within seconds
⚡ Interim + Final Results	Interim results show words as detected; finals lock in with punctuation and smart formatting
🌍 Auto Language Detection	Deepgram detects spoken language automatically and displays it as a live badge
⏸ Pause / Resume	Pause and resume recording mid-session without losing any transcribed text
⏱ Two-Phase Timer	A connecting clock tracks handshake time; a separate recording timer starts from zero once live
📋 Summary Generation	Generate a concise summary of the transcribed text on demand
💾 One-Click Download	Export full transcription or summary as `.txt` directly from the browser
👥 Multi-Client Sessions	Each browser tab is an isolated session — multiple users can record simultaneously
🌌 Animated UI	Starfield background, audio visualizer bars, glowing cards, and smooth CSS transitions

Demo

Video Walkthrough

Watch on YouTube →

I wrote an article about NexLearn

Check it out on Medium →

Screenshot

Tech Stack

Layer	Technology	Purpose
Backend	Python 3.10+, Flask	HTTP server and routing
Real-time	Flask-SocketIO, Gevent	Bidirectional WebSocket events
Transcription	Deepgram Nova-3	Live speech-to-text AI model
Audio Capture	Browser MediaRecorder API	WebM/Opus audio stream from mic
Summarization	Gemini 2.5 Flash	On-demand summaries of the transcribed text
Frontend	Vanilla JS, Socket.IO client	UI logic and socket communication
Styling	CSS3 with custom properties	Animations, theming, responsive layout

Architecture

Browser (Client)
│
│  MediaRecorder → WebM chunks (every 2s)
│  Socket.IO (websocket transport)
│
▼
Flask Server (app.py)
│
│  Per-session store (in-memory dict)
│  Background thread per recording session
│
▼
DeepgramSession (features.py)
│
│  asyncio event loop in dedicated thread
│  Persistent WebSocket (wss://api.deepgram.com)
│  Nova-3 model, WebM/Opus auto-detect
│
▼
Deepgram API
│
│  Streams back interim + final transcripts
│
▼
Flask Server → Socket.IO emit → Browser UI
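
The "asyncio event loop in dedicated thread" step in the diagram can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual contents of features.py: the `LoopThread` class and the `fake_send` coroutine are hypothetical names, and `fake_send` merely stands in for sending a chunk over the Deepgram WebSocket. It also assumes plain OS threads; behaviour under Gevent monkey-patching may differ.

```python
import asyncio
import threading


class LoopThread:
    """Runs an asyncio event loop on a dedicated daemon thread (hypothetical helper)."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Bind the loop to this thread and keep it alive until stop() is called.
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro):
        """Schedule a coroutine from any thread; returns a concurrent.futures.Future."""
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self) -> None:
        # Ask the loop to stop from outside its thread, then clean up.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join(timeout=2)
        self.loop.close()


async def fake_send(chunk: bytes) -> int:
    # Stand-in for an awaitable WebSocket send; returns the number of bytes "sent".
    await asyncio.sleep(0)
    return len(chunk)
```

A synchronous Socket.IO handler could then call `submit(...)` to hand each audio chunk to the loop without blocking on the network itself.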

Key design decision — EBML header prepending: The browser's MediaRecorder only includes the WebM container header in the first chunk. Deepgram requires a valid WebM stream for every chunk it receives. The server saves the first chunk's header and prepends it to all subsequent chunks before forwarding — this is what makes streaming work reliably without ffmpeg.
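
A minimal sketch of the header-prepending idea described above, under stated assumptions: the header is taken to be everything in the first chunk before the first WebM Cluster element, located with a simple byte search for the Cluster ID. The `HeaderPrepender` class is a hypothetical name, and the real server may find the header boundary differently. A byte search can in principle match inside header data, so treat this as an illustration rather than a robust parser.

```python
from typing import Optional

EBML_MAGIC = b"\x1a\x45\xdf\xa3"   # EBML header ID that opens every WebM file
CLUSTER_ID = b"\x1f\x43\xb6\x75"   # Matroska/WebM Cluster element ID


class HeaderPrepender:
    """Remembers the container header from the first chunk and reuses it."""

    def __init__(self) -> None:
        self._header: Optional[bytes] = None

    def process(self, chunk: bytes) -> bytes:
        if self._header is None:
            if not chunk.startswith(EBML_MAGIC):
                raise ValueError("first chunk does not start with an EBML header")
            # Assumption: the header is everything before the first Cluster.
            idx = chunk.find(CLUSTER_ID)
            self._header = chunk if idx == -1 else chunk[:idx]
            return chunk  # the first chunk is already a valid stream
        if chunk.startswith(EBML_MAGIC):
            return chunk  # already carries its own header (e.g. recorder restarted)
        return self._header + chunk
```

One `HeaderPrepender` per recording session keeps headers from different clients from mixing.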

Project Structure

NexLearn/
   •app.py
   •features.py
   •template/
      static/
         css/
            style.css
         js/
            script.js
   •templates/
      index.html
   •test.py
   •requirements.txt
   •.env
   •.gitignore
   •README.md

Working Principle

Getting Started

Prerequisites

Python 3.10 or higher
A Deepgram account — the free tier includes enough credits to get started
A modern browser (Chrome or Edge recommended for best MediaRecorder support)

1. Clone the repository

git clone https://github.com/fachiny17/NexLearn.git
cd NexLearn

2. Create and activate a virtual environment

# macOS / Linux
python3 -m venv venv
source venv/bin/activate

# Windows
python -m venv venv
venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Configure environment variables

cp .env.example .env

Edit .env and add your Deepgram API key:

DEEPGRAM_API_KEY=your_deepgram_api_key_here

Get your key at console.deepgram.com → API Keys → Create a New Key.

5. Run the development server

python3 app.py

Open http://localhost:5000 in your browser. Allow microphone access when prompted and click Start Recording.

6. (Optional) Test transcription without a browser

python3 test.py

Speak directly into your system microphone. Press Ctrl+C to stop. This is useful for verifying your API key and network connection independently of the web UI.

Environment Variables

Variable	Required	Default	Description
`DEEPGRAM_API_KEY`	✅ Yes	—	API key from console.deepgram.com
`PORT`	❌ No	`5000`	Port the server listens on
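
As a rough sketch of how the two variables in the table might be read at startup: the `load_config` function and the returned key names are hypothetical, and the code assumes the values from .env have already been loaded into the process environment by whatever mechanism the app uses.

```python
import os
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read NexLearn settings from the environment (illustrative only)."""
    api_key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not api_key:
        # Required variable: fail early with a clear message.
        raise RuntimeError("DEEPGRAM_API_KEY is not set; add it to .env or the environment")
    # Optional variable: fall back to the documented default of 5000.
    port = int(env.get("PORT", "5000"))
    return {"deepgram_api_key": api_key, "port": port}
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests.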

API & Socket Events

NexLearn communicates entirely over Socket.IO. Here is the full event reference:

Client → Server

Event	Payload	Description
`start_recording`	—	Initiates a Deepgram session for this client
`stop_recording`	—	Closes the Deepgram session and returns final text
`pause_recording`	—	Pauses audio forwarding
`resume_recording`	—	Resumes audio forwarding
`audio_chunk`	`bytes`	Raw WebM audio chunk from `MediaRecorder`
`generate_summary`	—	Triggers summary generation from current transcript
`download_transcription`	—	Requests transcription text for client-side download
`download_summary`	—	Requests summary text for client-side download
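
The recording-related events above imply a small per-client state machine. The sketch below models it with the standard library only; `Session`, `sessions`, and `handle_event` are hypothetical names, and the `forwarded` list stands in for the Deepgram connection. It assumes that chunks arriving while paused or stopped are simply dropped, which the event table suggests but does not state outright.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    recording: bool = False
    paused: bool = False
    forwarded: List[bytes] = field(default_factory=list)  # stand-in for the Deepgram socket


# One entry per connected client, keyed by its socket session ID.
sessions: Dict[str, Session] = {}


def handle_event(sid: str, event: str, payload: bytes = b"") -> None:
    """Apply one client event to that client's session state."""
    session = sessions.setdefault(sid, Session())
    if event == "start_recording":
        session.recording, session.paused = True, False
    elif event == "pause_recording":
        session.paused = True
    elif event == "resume_recording":
        session.paused = False
    elif event == "stop_recording":
        session.recording = False
    elif event == "audio_chunk":
        # Forward audio only while actively recording and not paused.
        if session.recording and not session.paused:
            session.forwarded.append(payload)
```

Because each client has its own `Session`, pausing in one browser tab does not affect another tab's recording.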

Server → Client

Event	Payload	Description
`connected`	`{ session_id }`	Confirms socket connection
`recording_started`	`{ status: 'success' | 'error' }`	Deepgram handshake result
`recording_stopped`	`{ full_text, language }`	Final transcript on stop
`transcription_update`	`{ text, full_text, language, is_final }`	Live transcript update
`recording_paused`	`{ status }`	Pause confirmed
`recording_resumed`	`{ status }`	Resume confirmed
`summary_result`	`{ success, summary?, error? }`	Summary result
`download_data`	`{ success, content, filename }`	File content for download
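
To illustrate how a `transcription_update` payload could be assembled from interim and final results, here is a small standard-library sketch. `TranscriptBuffer` is a hypothetical helper, and it assumes that `full_text` contains only finalized segments while `text` carries the latest interim or final segment; the actual server may combine them differently.

```python
from typing import List, Optional


class TranscriptBuffer:
    """Accumulates final segments and builds transcription_update payloads."""

    def __init__(self) -> None:
        self._finals: List[str] = []
        self.language: Optional[str] = None

    def update(self, text: str, is_final: bool, language: Optional[str] = None) -> dict:
        if language:
            self.language = language  # remember the most recently detected language
        if is_final and text.strip():
            self._finals.append(text.strip())  # interim text is never stored
        return {
            "text": text,
            "full_text": " ".join(self._finals),
            "language": self.language,
            "is_final": is_final,
        }
```

Interim results therefore update the live display without changing the stored transcript, and only finals extend `full_text`.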

Known Limitations & Roadmap

Current Limitations

In-memory sessions — All session data lives in a Python dict. Restarting the server wipes all transcriptions. A database (Redis, PostgreSQL) would be needed for persistence.
Single language per session — The language is set at session start. Switching mid-recording is not currently supported.
Free tier cold starts — Free Render instances spin down after inactivity. The first request after a dormant period can take up to 60 seconds.

Roadmap

Persistent storage — save transcriptions to a database
User accounts and session history
Export to PDF and DOCX
Speaker diarization — identify who is speaking
Real-time collaborative view for students
Keyword highlighting and topic extraction
Translation to other languages

Contributing

Contributions are welcome and appreciated. Here is how to get involved:

Reporting Bugs

Open an issue on GitHub with:

A clear description of the bug
Steps to reproduce it
What you expected vs what actually happened
Your OS, browser, and Python version

Suggesting Features

Open an issue with the enhancement label. Describe the feature and the problem it solves.

Submitting a Pull Request

Fork the repository
Create a feature branch:
```
git checkout -b feat/your-feature-name
```

Make your changes and commit using Conventional Commits:

git commit -m "feat: add PDF export for transcriptions"

Push to your fork:
```
git push origin feat/your-feature-name
```
Open a Pull Request against main

Please keep PRs focused — one feature or fix per PR makes review much faster.

Support

If you run into issues or have questions:

🐛 GitHub Issues — Open an issue for bugs and feature requests
📝 Medium Article — Read the full build walkthrough for a deep dive into how NexLearn was built
📺 YouTube — Watch the demo to see the app in action

If NexLearn helped you or your project, consider giving the repo a ⭐ — it helps others find it.

License

This project is licensed under the MIT License — free to use, modify, and distribute with attribution.

Built by Chisom · Powered by Deepgram
