This repository provides a command-line workflow that converts batches of local video files to audio (.m4a) and submits them to Google’s Gemini 2.5 Flash model for automatic transcription. Use it to bulk-process recordings you already possess—no YouTube links, web UI, or manual uploads required.
- Scans a directory for videos with a specified extension (default `mp4`).
- Uses `ffmpeg` to extract audio only when an `.m4a` copy does not already exist.
- Uploads each audio file to Gemini 2.5 Flash and retrieves a verbatim transcript stored beside the source video.
- Cleans up uploaded assets from the Gemini account to avoid orphaned files.
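The scan and skip-if-present steps can be pictured with a minimal sketch. This is not the code in `app.py`; the function names (`find_videos`, `plan_outputs`, `ffmpeg_command`, `pending_extractions`) and the exact `ffmpeg` flags are assumptions chosen for illustration, and the Gemini upload is left out.

```python
from pathlib import Path


def find_videos(directory: Path, extension: str = "mp4") -> list[Path]:
    """Return files in `directory` with the given extension, sorted by name."""
    suffix = "." + extension.lstrip(".").lower()
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


def plan_outputs(video: Path) -> tuple[Path, Path]:
    """Audio and transcript paths that sit beside the source video."""
    return video.with_suffix(".m4a"), video.with_suffix(".txt")


def ffmpeg_command(video: Path, audio: Path) -> list[str]:
    """Hypothetical ffmpeg call: -vn drops video, -c:a aac encodes audio,
    -n refuses to overwrite an existing output file."""
    return ["ffmpeg", "-n", "-i", str(video), "-vn", "-c:a", "aac", str(audio)]


def pending_extractions(directory: Path, extension: str = "mp4") -> list[list[str]]:
    """Commands only for videos that do not yet have an .m4a copy."""
    commands = []
    for video in find_videos(directory, extension):
        audio, _transcript = plan_outputs(video)
        if not audio.exists():
            commands.append(ffmpeg_command(video, audio))
    return commands
```

Each returned command could be passed to `subprocess.run(cmd, check=True)`; returning the commands instead of running them keeps the sketch easy to inspect.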
- Python 3.10+
- `ffmpeg` available on your `PATH`
- Google Generative AI API access and a `GOOGLE_API_KEY`
- Python dependencies from `requirements.txt` (`google-generativeai`, `python-dotenv`)
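If you want to confirm these requirements before a long batch run, a small preflight check along the following lines may help. It is a sketch, not part of the repository: `check_requirements` is a hypothetical helper, and it assumes `GOOGLE_API_KEY` has already been loaded into the process environment (for example from `.env`).

```python
import os
import shutil
import sys
from typing import Mapping, Optional


def check_requirements(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    # shutil.which searches the given PATH string for an executable.
    if shutil.which("ffmpeg", path=env.get("PATH")) is None:
        problems.append("ffmpeg was not found on PATH")
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    return problems
```

Calling `check_requirements()` with no arguments inspects the current environment; passing a mapping makes the check easy to exercise in isolation.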
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
touch .env
echo "GOOGLE_API_KEY=your_key" >> .env

python app.py --dir data/videos --format mp4
- `--dir` points to the folder with your source files.
- `--format` is the input video extension (omit the dot). The default is `mp4`; change to `mov`, `mkv`, etc., as needed.
- Each processed video produces:
  - `<name>.m4a` (audio), skipped if it already exists.
  - `<name>.txt`, a plain-text transcript emitted next to the video.
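For readers curious how the two flags might be parsed, here is a hedged `argparse` sketch. It is an illustration, not the parser in `app.py`: `build_parser` is a hypothetical name, treating `--dir` as required is an assumption, and stripping a leading dot from `--format` is a convenience the real script may not offer.

```python
import argparse
from pathlib import Path


def normalize_extension(value: str) -> str:
    """Accept 'mp4' or '.MP4' and return 'mp4' (assumed convenience)."""
    return value.lstrip(".").lower()


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert local videos to .m4a and transcribe them."
    )
    # Assumption: the folder is mandatory; the README does not say either way.
    parser.add_argument("--dir", type=Path, required=True,
                        help="folder containing the source videos")
    parser.add_argument("--format", type=normalize_extension, default="mp4",
                        help="input video extension without the dot")
    return parser
```

With this sketch, `build_parser().parse_args(["--dir", "data/videos"])` yields `format == "mp4"`, matching the documented default.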
- ffmpeg missing: install via your package manager (e.g., `brew install ffmpeg`, `choco install ffmpeg`) and reopen the shell.
- Authentication errors: ensure `.env` contains a valid `GOOGLE_API_KEY` and that the key has access to Gemini 2.5 Flash.
- Quota limits: the script stops when Gemini rejects an upload; retry after confirming usage limits or switch to a paid tier.
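After a run that stopped on a quota error, it can be useful to see which videos still lack a transcript before retrying. The sketch below relies only on the documented layout (a `<name>.txt` file next to each video); `missing_transcripts` is a hypothetical helper, and whether `app.py` itself skips already-transcribed videos on a rerun is not stated here.

```python
from pathlib import Path


def missing_transcripts(directory: Path, extension: str = "mp4") -> list[Path]:
    """List videos in `directory` that have no sibling .txt transcript yet."""
    suffix = "." + extension.lstrip(".").lower()
    return sorted(
        p for p in directory.iterdir()
        if p.is_file()
        and p.suffix.lower() == suffix
        and not p.with_suffix(".txt").exists()
    )
```

Printing the result of `missing_transcripts(Path("data/videos"))` gives a quick list of what remains to be processed.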
Issues and pull requests are welcome. Please describe the scenario you processed (`--dir`, sample formats), list the commands you ran, and attach relevant logs or transcript snippets to expedite reviews.