☀️ Tonalli

"Your tonalli for learning" (toh-NAH-lee)

Transform spoken wisdom into written knowledge.

In Aztec/Mexica cosmology, tonalli represents one's vital energy, destiny, and inner fire - the warmth and consciousness housed in one's head that drives learning and spiritual growth. We honor this sacred concept as we help capture and illuminate the knowledge within video and audio.

Features

  • Video & Audio Transcription: Upload video files (.mp4, .mov, .mkv, etc.) or audio files (.mp3, .wav, .m4a, etc.) and get accurate transcriptions using OpenAI's Whisper model
  • Project Management: Organize multiple videos into projects for collective analysis
  • AI-Powered Q&A: Ask questions about your video content using Google Gemini AI
  • Smart Summaries: Generate comprehensive overviews with key topics and suggested questions
  • Modern UI: Clean, Bootstrap-based interface with custom color scheme
  • Persistent Storage: Transcripts are saved permanently while original files are cleaned up automatically
  • Security-First: No API keys stored in code, temporary file cleanup, input validation

Tech Stack

  • Backend: FastAPI, Python 3.11+
  • Transcription: OpenAI Whisper (large model, configurable)
  • AI Integration: Google Gemini API
  • Audio Processing: FFmpeg
  • Database: SQLite with SQLAlchemy
  • Frontend: HTML, JavaScript, Bootstrap 5

Prerequisites

  • Python 3.11 or higher
  • FFmpeg (for video and audio processing)
  • (Optional) Google Gemini API key for AI features

Installing FFmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download from https://ffmpeg.org/download.html

Quick Start

1. Clone or Download

cd tldr-vid

2. Run Setup

./setup.sh

This will:

  • Create a Python virtual environment
  • Install all dependencies
  • Create necessary directories
  • Set up the .env file

3. Configure (Optional)

Edit .env to add your Gemini API key for AI features:

GEMINI_API_KEY=your_api_key_here
WHISPER_MODEL=large
MAX_FILE_SIZE_MB=1024

4. Run the Application

./run.sh

Or manually:

source venv/bin/activate
uvicorn main:app --reload

5. Open Your Browser

Navigate to: http://localhost:8000

Usage Guide

Creating a Project

  1. Click "New Project" in the sidebar
  2. Enter a project name and optional description
  3. Click "Create Project"

Uploading Videos

  1. Select a project from the sidebar
  2. Drag and drop a video/audio file into the upload zone, or click "Select File"
  3. Wait for the transcription to complete (this may take a few minutes depending on file size)

Using AI Features

Generate Overview:

  1. After uploading transcripts, click "Generate Overview"
  2. View the summary, key topics, and suggested questions

Ask Questions:

  1. Type your question in the Q&A section
  2. Press Enter or click the send button
  3. The AI will answer based on all transcripts in the project
  4. Conversation history is maintained for context

Managing Projects

  • Click on a project to view its transcripts and conversations
  • Use the trash icon to delete a project (this will delete all associated data)
  • Download individual transcripts using the download button on each transcript card

Project Structure

tldr-vid/
├── main.py                 # FastAPI application entry point
├── config.py               # Configuration management
├── models.py               # Database models
├── transcription.py        # Whisper & FFmpeg integration
├── ai_integration.py       # Gemini AI integration
├── static/
│   ├── index.html         # Frontend HTML
│   ├── script.js          # Frontend JavaScript
│   └── styles.css         # Custom styles
├── uploads/               # Temporary file storage
├── transcripts/           # Permanent transcript storage
├── setup.sh               # Setup script
├── run.sh                 # Run script
├── requirements.txt       # Python dependencies
├── .env.example           # Environment variables template
├── .env                   # Your configuration (not in git)
├── .gitignore            # Git ignore rules
├── LICENSE               # BSD 3-Clause License
└── README.md             # This file

Configuration

Edit .env to customize:

| Variable | Default | Description |
|---|---|---|
| GEMINI_API_KEY | - | Google Gemini API key (optional) |
| WHISPER_MODEL | large | Whisper model size (tiny/base/small/medium/large) |
| MAX_FILE_SIZE_MB | 1024 | Maximum upload size in MB |
| HOST | 0.0.0.0 | Server host |
| PORT | 8000 | Server port |
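
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a settings loader. It is not the project's actual config.py: the `Settings` class and `load_settings` function are hypothetical names, and the sketch assumes the values from .env have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables documented above."""
    gemini_api_key: Optional[str]
    whisper_model: str
    max_file_size_mb: int
    host: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        # An empty or missing key is treated as "AI features disabled".
        gemini_api_key=env.get("GEMINI_API_KEY") or None,
        whisper_model=env.get("WHISPER_MODEL", "large"),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "1024")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in isolation, for example `load_settings({"PORT": "9000"}).port`.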

Whisper Model Options

  • tiny: Fastest, least accurate (~1GB)
  • base: Fast, basic accuracy (~1GB)
  • small: Balanced (~2GB)
  • medium: Good accuracy (~3GB)
  • large: Best accuracy, slower (~6GB) - Default
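
For orientation, the model size is simply the name passed to Whisper when the model is loaded. The sketch below assumes the `openai-whisper` package and uses its `whisper.load_model` and `model.transcribe` calls; the helper names (`normalize_model_name`, `get_model`, `transcribe`) are hypothetical, and the project's transcription.py may be organized differently.

```python
from functools import lru_cache

# Model sizes listed above.
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def normalize_model_name(name: str) -> str:
    """Validate a WHISPER_MODEL value and return it in canonical lowercase form."""
    cleaned = name.strip().lower()
    if cleaned not in VALID_MODELS:
        raise ValueError(f"unknown Whisper model {name!r}; expected one of {VALID_MODELS}")
    return cleaned


@lru_cache(maxsize=1)
def get_model(name: str):
    """Load a Whisper model once and reuse it for later transcriptions."""
    import whisper  # imported lazily; requires the openai-whisper package

    return whisper.load_model(name)


def transcribe(path: str, model_name: str = "large") -> str:
    """Transcribe an audio file and return the plain text."""
    model = get_model(normalize_model_name(model_name))
    result = model.transcribe(path)  # returns a dict that includes a "text" entry
    return result["text"].strip()
```

Caching the loaded model in memory is one way to avoid reloading it on every request; the weights themselves are downloaded by Whisper on first use.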

API Endpoints

Projects

  • POST /api/projects - Create a new project
  • GET /api/projects - List all projects
  • GET /api/projects/{id} - Get project details
  • DELETE /api/projects/{id} - Delete a project

Transcription

  • POST /api/transcribe - Upload and transcribe a file
  • GET /api/projects/{id}/transcripts - List project transcripts
  • GET /api/transcripts/{id}/download - Download transcript

AI Features

  • POST /api/ai/overview - Generate AI overview
  • POST /api/ai/ask - Ask a question
  • GET /api/projects/{id}/conversation - Get conversation history

System

  • GET /api/health - System health check
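
The read-only endpoints can be exercised from a short script. This is a sketch using only the standard library; it assumes the server runs at http://localhost:8000 and that responses are JSON, which the endpoint list does not state. The helper names are hypothetical. The POST endpoints are left out because their request bodies are not documented here.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed default host and port


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an API path into a full URL."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def project_transcripts_path(project_id) -> str:
    """Path for GET /api/projects/{id}/transcripts."""
    return f"/api/projects/{quote(str(project_id), safe='')}/transcripts"


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint(path, base_url), timeout=timeout) as resp:
        return json.load(resp)


# Example calls, with the application running:
#   get_json("/api/health")
#   get_json("/api/projects")
#   get_json(project_transcripts_path(1))
```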

Security Features

  • No API Keys in Code: API keys live only in .env, which is excluded from git
  • Input Validation: File types, sizes, and MIME types are validated
  • Automatic Cleanup: Uploaded videos/audio are deleted after transcription
  • No Shell Injection: Uses safe subprocess calls
  • CORS Protection: Configurable CORS middleware
  • Error Handling: No internal paths or stack traces exposed
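
To make the validation and subprocess points concrete, here is a minimal sketch, not the project's actual code. `validate_upload` and `ffmpeg_extract_audio_cmd` are hypothetical names, the extension set is taken from the formats named under Features (the real list may be longer), and the 16 kHz mono output is an assumption chosen for speech input.

```python
import subprocess
from pathlib import Path

# Extensions named under Features; the application may accept more.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".mp3", ".wav", ".m4a"}


def validate_upload(filename: str, size_bytes: int, max_mb: int = 1024) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0 or size_bytes > max_mb * 1024 * 1024:
        raise ValueError("file is empty or exceeds the size limit")


def ffmpeg_extract_audio_cmd(src: Path, dst: Path) -> list[str]:
    """Build the FFmpeg command as an argument list, never as a shell string."""
    # -y overwrites dst, -vn drops the video stream, -ac 1 / -ar 16000 give mono 16 kHz audio.
    return ["ffmpeg", "-y", "-i", str(src), "-vn", "-ac", "1", "-ar", "16000", str(dst)]


def extract_audio(src: Path, dst: Path) -> None:
    """Run FFmpeg without a shell, so the file name is never parsed as shell syntax."""
    subprocess.run(ffmpeg_extract_audio_cmd(src, dst), check=True, capture_output=True)
```

Because the command is a list and no shell is involved, a file name containing characters such as `;` or `$( )` is passed to FFmpeg as a single literal argument.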

Troubleshooting

FFmpeg Not Found

If you see "FFmpeg not installed":

  1. Install FFmpeg using the instructions above
  2. Ensure FFmpeg is in your system PATH
  3. Restart the application

AI Features Not Working

If AI features are unavailable:

  1. Check that GEMINI_API_KEY is set in .env
  2. Verify your API key is valid
  3. Check your internet connection
  4. Review the console for error messages

Transcription Fails

  • Ensure the file format is supported
  • Check that the file size is under the MAX_FILE_SIZE_MB limit
  • Verify FFmpeg is working: ffmpeg -version
  • Check disk space for temporary files
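
The FFmpeg and disk-space checks above can also be scripted. A small sketch using only the standard library; the function names are hypothetical and the script is independent of the application.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not usable."""
    path = shutil.which(executable)  # looks the executable up on PATH
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout.splitlines()[0]


def free_disk_mb(path: str = ".") -> int:
    """Free space, in MB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free // (1024 * 1024)


# Example:
#   print(ffmpeg_version() or "FFmpeg not found on PATH")
#   print(free_disk_mb("uploads"), "MB free")
```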

Port Already in Use

If port 8000 is occupied:

  1. Edit .env and change PORT=8000 to another port
  2. Or stop the other application using port 8000
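
To find out whether a port is taken before editing .env, a quick check like the following can help. This is a sketch using the standard `socket` module; `port_is_free` is a hypothetical helper, not part of the application.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True


# Example:
#   print(port_is_free(8000))
```

The result only reflects the moment of the check; another process can still claim the port before the server starts.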

Development

Running in Development Mode

source venv/bin/activate
uvicorn main:app --reload --log-level debug

Installing Additional Dependencies

source venv/bin/activate
pip install package-name
pip freeze > requirements.txt

Performance Tips

  • Use smaller Whisper models (base/small) for faster transcription
  • Keep video files under 500MB for optimal performance
  • The first transcription will be slower as Whisper downloads the model
  • Subsequent transcriptions are faster as the model is cached

Contributing

This tool is designed for personal/educational use. Feel free to fork and modify for your needs.

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Acknowledgments

Support

For issues, questions, or suggestions:

  1. Check the troubleshooting section above
  2. Review the console output for error messages
  3. Ensure all prerequisites are installed correctly

☀️ Tonalli - Built with respect and reverence for learning, inspired by ancient wisdom
