Speech Processing Service

A web application for audio transcription, text-to-speech conversion, and document processing, with GPU-accelerated transcription via faster-whisper.

Features

🎙️ Audio/Video Transcription - Support for MP3, WAV, MP4, MOV files
📺 YouTube Processing - Extract transcripts from videos and playlists
📄 Document Processing - Extract text from PDF, DOCX, and TXT files
🔊 Text-to-Speech - Generate audio with multiple voice options
🌍 Multi-Language Support - 10 languages including English, Spanish, French, German, and more
📊 Multiple Output Formats - Text, Markdown, Word documents, and PDF
⚡ GPU Acceleration - Powered by faster-whisper for efficient transcription
📈 Real-time Progress - Track processing status with visual indicators
💾 Job History - All processing jobs are saved to the database

Quick Start

Prerequisites

Python 3.11+
PostgreSQL database
FFmpeg
faster-whisper (for transcription)

Local Development

Clone the repository

git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git
cd YOUR_REPO

Install dependencies
```
pip install -r requirements.txt
```

Set up environment variables

cp .env.example .env
# Edit .env with your configuration

Run the application
```
gunicorn --bind 0.0.0.0:5000 main:app
```

Access the app

Open http://localhost:5000 in your browser

Deployment

Deploy to Ubuntu Server

For detailed deployment instructions, see deployment/DEPLOY_TO_UBUAI.md

Quick deployment on Ubuntu:

# On your Ubuntu server
sudo mkdir -p /var/www
sudo git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git /var/www/speech-app
cd /var/www/speech-app
sudo bash deployment/setup_ubuntu.sh

Deployment Features

✅ Nginx reverse proxy
✅ Gunicorn WSGI server
✅ Systemd service management
✅ PostgreSQL database
✅ Automatic service restart
✅ Log rotation
✅ Security hardening

Usage

Audio Transcription

Select the Files tab
Upload your audio/video file (MP3, WAV, MP4, MOV)
Choose the language
Select output format
Click "Process"

YouTube Transcription

Select the YouTube tab
Paste a YouTube video or playlist URL
Choose whether to use the existing transcript or transcribe the audio
Select language and output format
Click "Process"
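
Whether a pasted URL points to a single video or a playlist can often be told from the URL itself. The sketch below is illustrative only: `classify_youtube_url` and `YOUTUBE_HOSTS` are hypothetical names, not part of this repository, and the sketch assumes the common `youtube.com/watch`, `youtube.com/playlist`, and `youtu.be` URL shapes.

```python
from urllib.parse import parse_qs, urlparse

# Assumption: these are the only hosts treated as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_youtube_url(url):
    """Return 'video', 'playlist', or 'unknown' for a pasted URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return "unknown"
    if host == "youtu.be":
        # Short links carry the video id in the path.
        return "video" if parsed.path.strip("/") else "unknown"
    query = parse_qs(parsed.query)
    if parsed.path == "/playlist" and "list" in query:
        return "playlist"
    if parsed.path == "/watch" and "v" in query:
        # A watch URL that also has a list parameter is treated as one video.
        return "video"
    return "unknown"
```

A check like this only routes the request; the actual download and transcript extraction are still handled by the video-processing layer.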

Document Processing

Select the Documents tab
Upload a PDF, DOCX, or TXT file
Select output format
Click "Process"

Text-to-Speech

Select the Text tab
Enter or paste your text
Choose a voice and language
Click "Generate Speech"

Architecture

Backend: Flask + SQLAlchemy
Database: PostgreSQL
Transcription: faster-whisper (GPU-accelerated)
Web Server: Gunicorn + Nginx
TTS: gTTS + pyttsx3
Video Processing: yt-dlp + moviepy

Configuration

Environment Variables

# Database
DATABASE_URL=postgresql://user:password@localhost/dbname

# Whisper Configuration
WHISPER_SERVER=localhost
WHISPER_SCRIPT_PATH=/path/to/faster-whisper/script

# Application
FLASK_ENV=production
SECRET_KEY=your-secret-key
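
One way to read these variables at startup is sketched below. This is a minimal sketch, not the application's actual configuration code: the `Settings` class and `load_settings` function are hypothetical, and treating `DATABASE_URL` and `SECRET_KEY` as required while defaulting the others is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    whisper_server: str
    whisper_script_path: str
    flask_env: str
    secret_key: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: only these two variables are mandatory.
    missing = [name for name in ("DATABASE_URL", "SECRET_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        whisper_server=env.get("WHISPER_SERVER", "localhost"),  # assumed default
        whisper_script_path=env.get("WHISPER_SCRIPT_PATH", ""),  # assumed default
        flask_env=env.get("FLASK_ENV", "production"),  # assumed default
        secret_key=env["SECRET_KEY"],
    )
```

Failing fast on missing values keeps a misconfigured deployment from starting with an empty secret key or no database.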

Supported Languages

English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Russian (ru)
Japanese (ja)
Korean (ko)
Chinese (zh)
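
If a client sends a language value, it can be checked against the list above before a job is queued. The following is a sketch under assumptions: `SUPPORTED_LANGUAGES` and `normalize_language` are hypothetical names, and falling back to English for unknown codes is an illustrative choice, not documented behavior.

```python
# Codes and names copied from the list above.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "ru": "Russian",
    "ja": "Japanese",
    "ko": "Korean",
    "zh": "Chinese",
}


def normalize_language(code, default="en"):
    """Return a supported two-letter code, or the default if unsupported."""
    candidate = (code or "").strip().lower()
    # Keep only the primary subtag so "pt-BR" or "pt_BR" maps to "pt".
    candidate = candidate.replace("_", "-").split("-")[0]
    return candidate if candidate in SUPPORTED_LANGUAGES else default
```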

Output Formats

Plain Text (.txt)
Markdown (.md)
Word Document (.docx)
PDF (.pdf)

API Integration

faster-whisper GPU Server

The application integrates with a GPU-accelerated faster-whisper instance:

Primary: Remote GPU server via SSH
Fallback: Local faster-whisper installation
Auto-detection of best processing method
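
The primary/fallback order described above could be expressed roughly as follows. This is a hedged sketch rather than the logic in `utils/whisper_client.py`: the function names are hypothetical, and probing TCP port 22 to decide whether the SSH server is reachable is an assumption.

```python
import importlib.util
import socket


def remote_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to the host's SSH port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def local_available():
    """Return True if the faster_whisper package can be imported locally."""
    return importlib.util.find_spec("faster_whisper") is not None


def choose_backend(host, remote_check=remote_reachable, local_check=local_available):
    """Prefer the remote GPU server, then a local install, else report none."""
    if host and remote_check(host):
        return "remote"
    if local_check():
        return "local"
    return "none"
```

Passing the checks in as parameters keeps the selection order testable without a real GPU server or a local faster-whisper install.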

Troubleshooting

Common Issues

Service won't start:

sudo journalctl -u speech-app -f

Database connection errors:

sudo systemctl status postgresql

Transcription fails:

Check faster-whisper installation
Verify GPU server connectivity
Review application logs
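
The first of these checks can be scripted. The sketch below is a hypothetical helper, not a script shipped with this project; it only verifies that FFmpeg is on the `PATH` and that the `faster_whisper` package is importable in the current Python environment.

```python
import importlib.util
import shutil


def transcription_preflight(which=shutil.which, find_spec=importlib.util.find_spec):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if find_spec("faster_whisper") is None:
        problems.append("faster-whisper is not installed in this Python environment")
    return problems


if __name__ == "__main__":
    for problem in transcription_preflight():
        print("PROBLEM:", problem)
```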

Logs

Application: /var/log/speech-app/
Nginx: /var/log/nginx/
System: sudo journalctl -u speech-app

Development

Project Structure

.
├── app.py                 # Flask application setup
├── main.py               # Application entry point
├── models.py             # Database models
├── utils/
│   ├── audio_converter.py
│   ├── youtube_processor.py
│   ├── document_processor.py
│   ├── text_to_speech.py
│   ├── output_formatter.py
│   └── whisper_client.py
├── templates/            # HTML templates
├── static/              # CSS, JS, assets
└── deployment/          # Deployment scripts

Adding Features

Create utility modules in utils/
Add routes in app.py
Update models in models.py
Add templates in templates/

Service Management

Commands

# Check status
sudo systemctl status speech-app

# Restart service
sudo systemctl restart speech-app

# View logs
sudo journalctl -u speech-app -f

# Update application
cd /var/www/speech-app
git pull
sudo systemctl restart speech-app

Performance

GPU-accelerated transcription with faster-whisper
Asynchronous job processing
Efficient file handling for large uploads
Database query optimization
Nginx caching for static assets

Security

Environment-based secret management
SQL injection protection via SQLAlchemy
XSS protection headers
CSRF protection
File upload validation
Secure password handling
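
As an illustration of file upload validation, an extension allow-list built from the formats named in this README might look like the sketch below. `ALLOWED_EXTENSIONS` and `is_allowed_upload` are hypothetical names, and the project's real validation may differ. An extension check alone is not sufficient protection and would normally be combined with size limits and content inspection.

```python
from pathlib import Path

# Formats taken from the Usage section; the grouping into kinds is an assumption.
ALLOWED_EXTENSIONS = {
    "media": {".mp3", ".wav", ".mp4", ".mov"},
    "document": {".pdf", ".docx", ".txt"},
}


def is_allowed_upload(filename, kind):
    """Return True if the filename's final extension is allowed for this kind."""
    suffix = Path(filename).suffix.lower()
    return suffix in ALLOWED_EXTENSIONS.get(kind, set())
```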

License

This project is for internal use.

Support

For deployment assistance, see deployment/DEPLOY_TO_UBUAI.md

Credits

Built with:

Flask
faster-whisper
PostgreSQL
Nginx
Gunicorn
