Transform audio recordings into professional transcripts and actionable memos using Google's Gemini AI.
- Built-in Recording - Record meetings directly with one-click start/stop
- Smart Audio Processing - Automatic detection of optimal processing method
- Speech-Optimized - 22 kHz mono recording, well suited to meetings
- Efficient MP3 Encoding - Small file sizes without external dependencies
- Organized Storage - Auto-creates recordings folder with timestamped files
- Multi-Language Support - Estonian and English prompts with easy language switching
- Multiple Processing Methods - Inline, cloud upload, or auto-detection
- Configurable Prompts - Customize transcription and memo generation
- Wide Audio Support - MP3, WAV, M4A, OGG, FLAC, AAC formats
- Markdown Output - Professional memo format with timestamps and action items
- File Validation - Comprehensive format, size, and integrity checking
- API Usage Tracking - Real-time token counts and processing statistics
- Real Progress Bar - Step-by-step progress indication during processing
- Built-in API Key Manager - Easy setup and management of Gemini API keys
- Python 3.8+
- Google API Key - Get one from Google AI Studio
- Required packages:

      pip install customtkinter google-generativeai pillow sounddevice scipy numpy lameenc

- Clone the repository:

      git clone https://github.com/priit2000/memomaker.git
      cd memomaker

- Set your Gemini API key:

      export GEMINI_API_KEY="your-api-key-here"

  Or on Windows:

      set GEMINI_API_KEY=your-api-key-here

  Or use the built-in API key manager: the app shows a setup dialog if no API key is found.

- Run the application:

      python memomaker-ui.py
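
The key lookup described in the setup steps can be sketched as follows. This is a minimal illustration, assuming the app reads `GEMINI_API_KEY` from the environment and falls back to its setup dialog when the variable is missing or blank. `load_api_key` is a hypothetical helper name, not taken from the app's source.

```python
import os
from typing import Optional


def load_api_key(env_var: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the API key from the environment, or None if unset or blank.

    Hypothetical helper: the real app may look the key up differently.
    """
    key = os.environ.get(env_var, "").strip()
    return key or None


if __name__ == "__main__":
    if load_api_key() is None:
        print("No API key found; the app would show its setup dialog here.")
    else:
        print("API key found in the environment.")
```

Treating a blank value the same as a missing one avoids sending an empty key to the API after a mistyped `export` or `set`.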
- Launch the app - Run `python memomaker-ui.py`
- Start recording - Click the "Start Recording" button
- Record your meeting - Speak clearly into your microphone
- Stop recording - Click "Stop Recording" when finished
- Auto-processing - The app automatically processes the recording and generates a transcript and a memo
- View results - Files are saved in the `recordings/` folder with timestamped names
- Launch the app - Run `python memomaker-ui.py`
- Select audio file - Click "Browse" and choose your audio file (or click the file path field)
- Choose language - Select Estonian (ET) or English (EN) from the language dropdown
- Choose processing method:
  - Auto - Smart detection based on file size
  - Inline - Fast processing for smaller files (<20MB)
  - Cloud Upload - Better for larger files (>20MB)
- Customize prompts (optional) - Edit transcription and memo prompts in the tabs
- Process - Click "Process Audio" and watch real-time progress
- Manage API key (optional) - Click the "API Key" button to view or edit your Gemini API key
- View results - Files are saved in the `recordings/` folder with organized naming
- Monitor usage - View detailed API usage statistics, including token counts, in the results area
`python memomaker-ui.py audio_file.mp3 [--method auto|inline|upload] [--prompt "custom prompt"]`

    import os

    # API Configuration
    API_KEY = os.environ.get("GEMINI_API_KEY")

    # Model Settings
    MODEL_NAME = 'gemini-flash-latest'

    # File Processing Settings
    INLINE_THRESHOLD = 20 * 1024 * 1024  # 20 MB
    MAX_FILE_SIZE = 100 * 1024 * 1024    # 100 MB max
    MIN_FILE_SIZE = 1024                 # 1 KB min

    # Output File Settings
    TRANSCRIPT_FILENAME = "transcript.txt"
    MEMO_FILENAME = "memo.md"

    # UI Settings
    WINDOW_WIDTH = 1000
    WINDOW_HEIGHT = 800

The app automatically detects and uses language-specific prompt files:
- Estonian: `transcription-prompt-et.md`
- English: `transcription-prompt-en.md`
- Future: `transcription-prompt-fr.md`, `transcription-prompt-de.md`, etc.

Each file contains:
- Transcription rules - Under the `# Transkriptsioon` / `# Transcription` section
- Memo format - Under the `# Memo` section

Language Selection:
- Dropdown menu appears when multiple language files are present
- Single language shows as label
- Missing files show clear error messages in prompt areas
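
The prompt-file detection described above could look roughly like this. It is a sketch under assumptions: files are matched by the `transcription-prompt-<code>.md` pattern with a two-letter code, and each file is split on its top-level `# ` headings. `discover_languages` and `split_prompt` are hypothetical names; the app's own parsing may differ.

```python
import re
from pathlib import Path
from typing import Dict

# Assumed naming pattern: transcription-prompt-<two-letter code>.md
PROMPT_PATTERN = re.compile(r"^transcription-prompt-([a-z]{2})\.md$")


def discover_languages(folder: Path) -> Dict[str, Path]:
    """Map language codes (e.g. 'et', 'en') to the prompt files found in folder."""
    found: Dict[str, Path] = {}
    for path in sorted(folder.iterdir()):
        match = PROMPT_PATTERN.match(path.name)
        if match and path.is_file():
            found[match.group(1)] = path
    return found


def split_prompt(text: str) -> Dict[str, str]:
    """Split a prompt file into sections keyed by their top-level '# ' heading."""
    sections: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}
```

With one language found, a UI would show a label. With several, it would show a dropdown built from the dictionary keys.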

    memomaker/
    ├── memomaker-ui.py                       # Main application
    ├── transcription-prompt-et.md            # Estonian prompts
    ├── transcription-prompt-en.md            # English prompts
    ├── .gitignore                            # Git ignore rules
    ├── README.md                             # This file
    └── recordings/                           # Auto-created folder for all outputs
        ├── 241113-143022-recording.mp3       # Recorded audio
        ├── 241113-143022-transcript.txt      # Generated transcript
        └── 241113-143022-memo.md             # Generated memo

| Method | Best For | File Size | Speed | Quality |
|---|---|---|---|---|
| Auto | Most cases | Any | Smart | Optimal |
| Inline | Quick processing | < 20MB | Fastest | Good |
| Upload | Large files | > 20MB | Slower | Best |
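
The size-based choice in the table can be sketched as a small function. It assumes the 20 MB `INLINE_THRESHOLD` from the configuration and, as a further assumption, sends a file exactly at the threshold to upload, since the README does not say which side that case falls on. `choose_method` is a hypothetical name.

```python
INLINE_THRESHOLD = 20 * 1024 * 1024  # 20 MB, same value as the configuration

VALID_METHODS = ("auto", "inline", "upload")


def choose_method(file_size: int, requested: str = "auto") -> str:
    """Resolve 'auto' to 'inline' or 'upload'; pass explicit choices through."""
    if requested not in VALID_METHODS:
        raise ValueError(f"unknown method: {requested!r}")
    if requested != "auto":
        return requested
    # Assumption: files at or above the threshold go to upload.
    return "inline" if file_size < INLINE_THRESHOLD else "upload"
```

An explicit `inline` or `upload` request is returned unchanged, so the size check only applies when the user leaves the method on Auto.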
[00h:02m:15s] Priit Kallas: Alustame tänase koosoleku. Päevakorras on kolm punkti.
[00h:02m:28s] Henrik Aavik: Tänan. Kas võiksime alustada eelmise nädala tulemustega?
[00h:02m:45s] Priit Kallas: Kindlasti. Numbrid on väga head...
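
Transcript lines in this `[HHh:MMm:SSs] Speaker: text` shape are easy to post-process. The sketch below is an illustration only: the regular expression is inferred from the sample lines, and `parse_line` and `Utterance` are hypothetical names that are not part of the app.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample: [00h:02m:15s] Speaker Name: spoken text
LINE_RE = re.compile(r"^\[(\d{2})h:(\d{2})m:(\d{2})s\]\s+([^:]+):\s*(.*)$")


class Utterance(NamedTuple):
    seconds: int   # offset from the start of the recording
    speaker: str
    text: str


def parse_line(line: str) -> Optional[Utterance]:
    """Parse one transcript line; return None if it does not match the format."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, speaker, text = match.groups()
    total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return Utterance(total, speaker.strip(), text)
```

Converting the timestamp to seconds makes it straightforward to sort utterances or jump to a moment in the audio.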
- Structured sections: Participants, summary, decisions, actions
- Timestamps: References to specific moments in audio
- Action items: Clear responsibilities and deadlines
- Multi-language: Professional business language (Estonian or English)
- Markdown format: Easy to edit and convert to other formats
"No module named 'customtkinter'"

    pip install customtkinter

"Invalid API key"
- Verify your Google API key is correct
- Check that the environment variable is set: `echo $GEMINI_API_KEY` (Linux/Mac) or `echo %GEMINI_API_KEY%` (Windows)
- Use the built-in "API Key" button to set up your key

"File validation failed" errors
- Check file format (supported: MP3, WAV, M4A, OGG, FLAC, AAC)
- Ensure file size is between 1KB and 100MB
- Use "Upload" method for files > 20MB
- Consider compressing large audio files
UI not appearing
- Ensure you're running GUI mode: `python memomaker-ui.py`
- Check whether you passed command-line arguments (this switches the app to CLI mode)
- Verify prompt files exist: `transcription-prompt-et.md` or `transcription-prompt-en.md`
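
The switch between GUI and CLI mode can be pictured with a small `argparse` sketch. It assumes the documented command-line form (an audio file plus optional `--method` and `--prompt`) and that an absent audio file means GUI mode. The default of `auto` for `--method` and the names `build_parser` and `select_mode` are assumptions, not taken from the app.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented command-line form."""
    parser = argparse.ArgumentParser(prog="memomaker-ui.py")
    # The audio file is optional so that a bare invocation can open the GUI.
    parser.add_argument("audio_file", nargs="?",
                        help="audio file to process; omit to open the GUI")
    parser.add_argument("--method", choices=["auto", "inline", "upload"],
                        default="auto")  # assumed default
    parser.add_argument("--prompt", default=None, help="custom prompt text")
    return parser


def select_mode(argv: List[str]) -> str:
    """Return 'cli' when an audio file was passed, otherwise 'gui'."""
    args = build_parser().parse_args(argv)
    return "cli" if args.audio_file else "gui"
```

If the window does not appear, checking what was passed on the command line is a quick first step, since any audio file argument would select the CLI path in a design like this.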
- Use MP3 format for best compatibility
- Compress large files before processing
- Close other applications during processing for better performance
- Python 3.8+
- tkinter (usually included with Python)
- customtkinter
- google-generativeai
- Pillow
- sounddevice (for recording)
- scipy (for audio processing)
- numpy (for audio data)
- lameenc (for MP3 encoding)
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
- Built-in recording - Direct audio capture (completed)
- MP3 optimization - Speech-optimized recording (completed)
- Organized file structure - Timestamped file naming (completed)
- Batch processing - Process multiple files
- Export formats - PDF, Word, plain text
- Audio player - Built-in playback with waveform
- Cloud storage - Direct integration with Google Drive/OneDrive
- Multi-language - Estonian and English support (completed)
- Additional languages - French, German, etc.
- Templates - Custom memo templates
- API keys are stored locally as environment variables
- Audio files are processed according to Google's privacy policy
- No data retention - Files are not stored after processing
- Local processing - Transcripts and memos saved locally
MIT License - see LICENSE file for details.
- One-click recording: Start/stop with visual feedback
- Speech-optimized: 22kHz mono recording perfect for meetings
- Efficient encoding: Direct MP3 encoding with lameenc (no external tools needed)
- Auto-processing: Automatically processes recorded audio when recording stops
- File size optimization: ~1MB per minute vs ~10MB for standard recording
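
The rough 1 MB versus 10 MB per minute figures can be checked with simple arithmetic. The sketch assumes "standard recording" means uncompressed 16-bit stereo PCM at 44.1 kHz and that the MP3 is encoded at a constant 128 kbps. The README states neither, so both numbers are illustrative assumptions.

```python
def pcm_bytes_per_minute(sample_rate_hz: int, channels: int,
                         bits_per_sample: int = 16) -> int:
    """Size of one minute of uncompressed PCM audio, in bytes."""
    return sample_rate_hz * channels * (bits_per_sample // 8) * 60


def mp3_bytes_per_minute(bitrate_kbps: int) -> int:
    """Size of one minute of constant-bitrate MP3 audio, in bytes."""
    return bitrate_kbps * 1000 // 8 * 60


# Assumed "standard" recording: 44.1 kHz, stereo, 16-bit PCM.
print(pcm_bytes_per_minute(44_100, 2))  # 10584000 bytes, about 10.6 MB
# Assumed MP3 bitrate: 128 kbps.
print(mp3_bytes_per_minute(128))        # 960000 bytes, about 1 MB
```

MP3 size depends only on the bitrate, so the 22 kHz mono setting affects how much speech quality that bitrate buys rather than the file size itself.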
- Auto-folder creation: `recordings/` folder created automatically
- Timestamped naming: `yymmdd-hhmmss-[type]` format for easy organization
- Session grouping: All files from the same recording session have matching timestamps
- Example: `241113-143022-recording.mp3`, `241113-143022-transcript.txt`, `241113-143022-memo.md`
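
The naming scheme above maps directly onto `strftime`. A minimal sketch, assuming the timestamp is taken once per session and shared by all three files. `session_paths` is a hypothetical helper name.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict


def session_paths(started: datetime,
                  folder: Path = Path("recordings")) -> Dict[str, Path]:
    """Build the three output paths for one session from a single timestamp."""
    stamp = started.strftime("%y%m%d-%H%M%S")  # yymmdd-hhmmss
    return {
        "recording": folder / f"{stamp}-recording.mp3",
        "transcript": folder / f"{stamp}-transcript.txt",
        "memo": folder / f"{stamp}-memo.md",
    }


paths = session_paths(datetime(2024, 11, 13, 14, 30, 22))
print(paths["memo"].name)  # 241113-143022-memo.md
```

Calling `folder.mkdir(exist_ok=True)` before writing would cover the automatic folder creation. Because the names start with the date and time, an alphabetical listing is also chronological.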
- Real-time statistics displayed in results area
- Token counts: Input, output, and total tokens
- Processing time: Detailed timing for each operation
- No log files: All data shown directly in UI
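
The statistics above could be assembled from the response's usage metadata. The sketch assumes the `google-generativeai` response exposes `usage_metadata` with `prompt_token_count`, `candidates_token_count`, and `total_token_count`. To stay runnable without an API key, it uses a stand-in object with the same attribute names. `format_usage` is a hypothetical helper.

```python
from types import SimpleNamespace


def format_usage(usage, elapsed_seconds: float) -> str:
    """Format token counts and timing for display in a results area."""
    return (
        f"Input tokens: {usage.prompt_token_count}\n"
        f"Output tokens: {usage.candidates_token_count}\n"
        f"Total tokens: {usage.total_token_count}\n"
        f"Processing time: {elapsed_seconds:.1f}s"
    )


# Stand-in for response.usage_metadata; the numbers are made up for illustration.
fake_usage = SimpleNamespace(
    prompt_token_count=1200,
    candidates_token_count=300,
    total_token_count=1500,
)
print(format_usage(fake_usage, 12.34))
```

Timing each operation would typically mean wrapping the call in a pair of `time.perf_counter()` readings and passing the difference as `elapsed_seconds`.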
- Format checking: Validates audio file extensions and MIME types
- Size limits: Enforces minimum (1KB) and maximum (100MB) file sizes
- Integrity checks: Basic corruption detection
- Clear error messages: Specific validation failure details
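
A validation pass along these lines might look as follows. The sketch covers only the extension and size checks, using the limits from the configuration. The MIME-type and corruption checks are left out because the README does not describe how they work. `validation_error` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac"}
MIN_FILE_SIZE = 1024                # 1 KB
MAX_FILE_SIZE = 100 * 1024 * 1024   # 100 MB


def validation_error(path: Path) -> Optional[str]:
    """Return a specific error message, or None if the file passes these checks."""
    if not path.is_file():
        return f"File not found: {path}"
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return f"Unsupported format: {path.suffix or '(no extension)'}"
    size = path.stat().st_size
    if size < MIN_FILE_SIZE:
        return f"File too small: {size} bytes (minimum {MIN_FILE_SIZE})"
    if size > MAX_FILE_SIZE:
        return f"File too large: {size} bytes (maximum {MAX_FILE_SIZE})"
    return None
```

Returning a message rather than a bare boolean is one way to surface the specific failure details mentioned in the list.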
- Step-by-step progress: Shows actual processing stages
- Visual feedback: Progress from 0.1 (start) to 1.0 (complete)
- Stays visible: Progress remains visible for 2 seconds after completion
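
The 0.1 to 1.0 range can be spread evenly over the processing stages. In the sketch below the stage names are hypothetical, since the README does not list the actual stages, and `progress_steps` is an illustrative helper rather than the app's code.

```python
from typing import List, Sequence, Tuple

# Hypothetical stage names; the app's real stages are not documented here.
STAGES = ["validate", "prepare audio", "transcribe", "write memo", "save files"]


def progress_steps(stages: Sequence[str], start: float = 0.1,
                   end: float = 1.0) -> List[Tuple[str, float]]:
    """Assign each stage an evenly spaced progress value from start to end."""
    if len(stages) < 2:
        raise ValueError("need at least two stages")
    step = (end - start) / (len(stages) - 1)
    return [(name, round(start + i * step, 3)) for i, name in enumerate(stages)]


for name, value in progress_steps(STAGES):
    print(f"{value:.3f}  {name}")
```

Each value could be passed to CustomTkinter's `CTkProgressBar.set()`, which takes a number between 0 and 1. Keeping the bar visible for 2 seconds could be done by scheduling the hide call with Tk's `after(2000, ...)`.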
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Google Gemini AI - For powerful audio processing capabilities
- CustomTkinter - For modern UI components
- Estonian language community - For feedback and testing
Made with ❤️ for efficient meeting documentation