🎵 MemoMaker - Audio to Intelligent Memos

Transform audio recordings into professional transcripts and actionable memos using Google's Gemini AI.

✨ Features

  • 🎤 Built-in Recording - Record meetings directly with one-click start/stop
  • 🎯 Smart Audio Processing - Automatic detection of the optimal processing method
  • 📱 Speech-Optimized - 22kHz mono recording suited to meetings
  • 💾 Efficient MP3 Encoding - Small file sizes without external tools
  • 📁 Organized Storage - Auto-creates a recordings folder with timestamped files
  • 🌍 Multi-Language Support - Estonian and English prompts with easy language switching
  • ⚡ Multiple Processing Methods - Inline, cloud upload, or auto-detection
  • 📝 Configurable Prompts - Customize transcription and memo generation
  • 🔊 Wide Audio Support - MP3, WAV, M4A, OGG, FLAC, AAC formats
  • 📋 Markdown Output - Professional memo format with timestamps and action items
  • 🛡️ File Validation - Format, size, and integrity checking
  • 📊 API Usage Tracking - Real-time token counts and processing statistics
  • 🔄 Real Progress Bar - Step-by-step progress indication during processing
  • 🔑 Built-in API Key Manager - Easy setup and management of Gemini API keys

🚀 Quick Start

Prerequisites

  1. Python 3.8+
  2. Google API Key - Get one from Google AI Studio
  3. Required packages:
    pip install customtkinter google-generativeai pillow sounddevice scipy numpy lameenc
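If the install step fails partway, a small check like the one below can report which packages are still missing. This is only a sketch and not part of MemoMaker: the helper name `missing_packages` and the `REQUIRED_MODULES` mapping are hypothetical, and the import names (`PIL` for pillow, `google.generativeai` for google-generativeai) are the ones those packages normally install under.

```python
import importlib.util
from typing import Dict, List

# pip package name -> import name (hypothetical mapping for illustration)
REQUIRED_MODULES: Dict[str, str] = {
    "customtkinter": "customtkinter",
    "google-generativeai": "google.generativeai",
    "pillow": "PIL",
    "sounddevice": "sounddevice",
    "scipy": "scipy",
    "numpy": "numpy",
    "lameenc": "lameenc",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names of packages whose modules cannot be located."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (for example "google") is absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```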

Setup

  1. Clone the repository:

    git clone https://github.com/priit2000/memomaker.git
    cd memomaker
  2. Set your Gemini API key:

    export GEMINI_API_KEY="your-api-key-here"

    Or on Windows:

    set GEMINI_API_KEY=your-api-key-here

    Or use the built-in API key manager: The app will show a setup dialog if no API key is found.

  3. Run the application:

    python memomaker-ui.py
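The key lookup from step 2 can be sketched as follows. This is an illustration under assumptions, not the app's actual code: `resolve_api_key` is a hypothetical helper, and treating a blank value as "not set" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Gemini API key from the environment, or None if unset or blank."""
    key = env.get("GEMINI_API_KEY", "").strip()
    return key or None


if __name__ == "__main__":
    if resolve_api_key() is None:
        print("GEMINI_API_KEY is not set; the app would show its setup dialog.")
    else:
        print("GEMINI_API_KEY found.")
```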

🎯 How to Use

GUI Mode (Recommended)

Recording Mode (New!)

  1. Launch the app - Run python memomaker-ui.py
  2. Start recording - Click the "🎤 Start Recording" button
  3. Record your meeting - Speak clearly into your microphone
  4. Stop recording - Click "🛑 Stop Recording" when finished
  5. Auto-processing - App automatically processes the recording and generates transcript + memo
  6. View results - Files saved in recordings/ folder with timestamp naming

File Mode

  1. Launch the app - Run python memomaker-ui.py
  2. Select audio file - Click "Browse" and choose your audio file (or click the file path field)
  3. Choose language - Select Estonian (ET) or English (EN) from the language dropdown
  4. Choose processing method:
    • 🎯 Auto - Smart detection based on file size
    • ⚡ Inline - Fast processing for smaller files (<20MB)
    • ☁️ Cloud Upload - Better for larger files (>20MB)
  5. Customize prompts (optional) - Edit transcription and memo prompts in the tabs
  6. Process - Click "Process Audio" and watch real-time progress
  7. Manage API key (optional) - Click the "🔑 API Key" button to view or edit your Gemini API key
  8. View results - Files saved in recordings/ folder with organized naming
  9. Monitor usage - View detailed API usage statistics including token counts in the results area

CLI Mode

python memomaker-ui.py audio_file.mp3 [--method auto|inline|upload] [--prompt "custom prompt"]
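For readers who want to see how such a command line might be parsed, here is a minimal `argparse` sketch. The flag names and choices come from the usage line above; the `auto` default and the help texts are assumptions, and the real script may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented CLI usage line."""
    parser = argparse.ArgumentParser(prog="memomaker-ui.py")
    parser.add_argument("audio_file", help="path to the audio file to process")
    parser.add_argument(
        "--method",
        choices=["auto", "inline", "upload"],
        default="auto",  # assumed default; the README does not state one
        help="processing method",
    )
    parser.add_argument(
        "--prompt",
        default=None,
        help="custom prompt to use instead of the prompt file",
    )
    return parser


args = build_parser().parse_args(["audio_file.mp3", "--method", "inline"])
print(args.audio_file, args.method, args.prompt)  # audio_file.mp3 inline None
```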

βš™οΈ Configuration

Settings (Top of memomaker-ui.py)

import os

# API Configuration
API_KEY = os.environ.get("GEMINI_API_KEY")

# Model Settings
MODEL_NAME = 'gemini-flash-latest'

# File Processing Settings
INLINE_THRESHOLD = 20 * 1024 * 1024  # 20 MB
MAX_FILE_SIZE = 100 * 1024 * 1024     # 100 MB max
MIN_FILE_SIZE = 1024                   # 1 KB min

# Output File Settings
TRANSCRIPT_FILENAME = "transcript.txt"
MEMO_FILENAME = "memo.md"

# UI Settings
WINDOW_WIDTH = 1000
WINDOW_HEIGHT = 800

Multi-Language Prompts

The app automatically detects and uses language-specific prompt files:

  • Estonian: transcription-prompt-et.md
  • English: transcription-prompt-en.md
  • Future: transcription-prompt-fr.md, transcription-prompt-de.md, etc.

Each file contains:

  • Transcription rules - Under # Transkriptsioon/# Transcription section
  • Memo format - Under # Memo section

Language Selection:

  • Dropdown menu appears when multiple language files are present
  • Single language shows as label
  • Missing files show clear error messages in prompt areas
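The file discovery and section layout described above can be sketched roughly like this. Both helpers are hypothetical: the two-letter language code pattern and the rule that sections start at top-level `# ` headings are assumptions drawn from the file names and headings listed here, not from the app's source.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumes two-letter language codes, as in transcription-prompt-et.md
PROMPT_FILE_PATTERN = re.compile(r"^transcription-prompt-([a-z]{2})\.md$")


def find_prompt_languages(folder: Path) -> Dict[str, Path]:
    """Map language codes such as 'et' or 'en' to their prompt files."""
    found: Dict[str, Path] = {}
    for path in sorted(folder.iterdir()):
        match = PROMPT_FILE_PATTERN.match(path.name)
        if match and path.is_file():
            found[match.group(1)] = path
    return found


def split_sections(text: str) -> Dict[str, str]:
    """Split a prompt file into sections keyed by their top-level '# ' heading."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


sample = "# Transcription\nUse timestamps.\n\n# Memo\nList action items."
print(split_sections(sample))
# {'Transcription': 'Use timestamps.', 'Memo': 'List action items.'}
```

With one entry in the result of `find_prompt_languages`, a UI could show a label; with several, a dropdown.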

πŸ“ File Structure

memomaker/
β”œβ”€β”€ memomaker-ui.py              # Main application
β”œβ”€β”€ transcription-prompt-et.md   # Estonian prompts
β”œβ”€β”€ transcription-prompt-en.md   # English prompts
β”œβ”€β”€ .gitignore                  # Git ignore rules
β”œβ”€β”€ README.md                   # This file
└── recordings/                 # Auto-created folder for all outputs
    β”œβ”€β”€ 241113-143022-recording.mp3    # Recorded audio
    β”œβ”€β”€ 241113-143022-transcript.txt   # Generated transcript
    └── 241113-143022-memo.md          # Generated memo

🎨 Processing Methods

Method   Best For           File Size   Speed     Quality
Auto     Most cases         Any         Smart     Optimal
Inline   Quick processing   < 20MB      Fastest   Good
Upload   Large files        > 20MB      Slower    Best
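The Auto method's choice can be sketched as a size comparison against `INLINE_THRESHOLD` from the configuration section. The function `choose_method` is hypothetical, and sending a file of exactly 20 MB to upload is an assumption, since the README only covers sizes below and above the threshold.

```python
INLINE_THRESHOLD = 20 * 1024 * 1024  # 20 MB, as in the configuration section

VALID_METHODS = ("auto", "inline", "upload")


def choose_method(file_size: int, requested: str = "auto") -> str:
    """Return 'inline' or 'upload' for a file of the given size in bytes."""
    if requested not in VALID_METHODS:
        raise ValueError(f"Unknown method: {requested!r}")
    if requested != "auto":
        return requested  # an explicit choice wins over size detection
    # Assumption: a file exactly at the threshold goes to upload.
    return "inline" if file_size < INLINE_THRESHOLD else "upload"


print(choose_method(5 * 1024 * 1024))   # inline
print(choose_method(50 * 1024 * 1024))  # upload
```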

πŸ“ Output Examples

Transcript Format

[00h:02m:15s] Priit Kallas: Alustame tänase koosoleku. Päevakorras on kolm punkti.
[00h:02m:28s] Henrik Aavik: Tänan. Kas võiksime alustada eelmise nädala tulemustega?
[00h:02m:45s] Priit Kallas: Kindlasti. Numbrid on väga head...

Memo Format (Markdown)

  • Structured sections: Participants, summary, decisions, actions
  • Timestamps: References to specific moments in audio
  • Action items: Clear responsibilities and deadlines
  • Multi-language: Professional business language (Estonian or English)
  • Markdown format: Easy to edit and convert to other formats

🔧 Troubleshooting

Common Issues

"No module named 'customtkinter'"

pip install customtkinter

"Invalid API key"

  • Verify your Google API key is correct
  • Check environment variable is set: echo $GEMINI_API_KEY (Linux/Mac) or echo %GEMINI_API_KEY% (Windows)
  • Use the built-in "🔑 API Key" button to set up your key

"File validation failed" errors

  • Check file format (supported: MP3, WAV, M4A, OGG, FLAC, AAC)
  • Ensure file size is between 1KB and 100MB
  • Use "Upload" method for files > 20MB
  • Consider compressing large audio files

UI not appearing

  • Ensure you're running GUI mode: python memomaker-ui.py
  • Check if running with command line arguments (switches to CLI mode)
  • Verify prompt files exist: transcription-prompt-et.md or transcription-prompt-en.md

Performance Tips

  • Use MP3 format for best compatibility
  • Compress large files before processing
  • Close other applications during processing for better performance

πŸ› οΈ Development

Requirements

  • Python 3.8+
  • tkinter (usually included with Python)
  • customtkinter
  • google-generativeai
  • Pillow
  • sounddevice (for recording)
  • scipy (for audio processing)
  • numpy (for audio data)
  • lameenc (for MP3 encoding)

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📋 Roadmap

  • Built-in recording - Direct audio capture (completed)
  • MP3 optimization - Speech-optimized recording (completed)
  • Organized file structure - Timestamped file naming (completed)
  • Batch processing - Process multiple files
  • Export formats - PDF, Word, plain text
  • Audio player - Built-in playback with waveform
  • Cloud storage - Direct integration with Google Drive/OneDrive
  • Multi-language - Estonian and English support (completed)
  • Additional languages - French, German, etc.
  • Templates - Custom memo templates

⚠️ Security & Privacy

  • API keys are stored locally as environment variables
  • Audio files are sent to Google's Gemini API for processing and are subject to Google's privacy policy
  • No data retention - Files are not stored after processing
  • Local processing - Transcripts and memos saved locally

📄 License

MIT License - see LICENSE file for details.

📊 Key Features Detailed

🎤 Built-in Audio Recording

  • One-click recording: Start/stop with visual feedback
  • Speech-optimized: 22kHz mono recording perfect for meetings
  • Efficient encoding: Direct MP3 encoding with lameenc (no external tools needed)
  • Auto-processing: Automatically processes recorded audio when recording stops
  • File size optimization: ~1MB per minute vs ~10MB for standard recording
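The size figures above can be checked with simple arithmetic. The sketch below assumes that "standard recording" means uncompressed 16-bit PCM at 44.1 kHz stereo and that the MP3 bitrate is 128 kbps; the README states neither, so treat both as illustrative assumptions.

```python
def pcm_megabytes_per_minute(sample_rate_hz: int, channels: int, bytes_per_sample: int = 2) -> float:
    """Size of one minute of uncompressed PCM audio in MB (10^6 bytes)."""
    return sample_rate_hz * channels * bytes_per_sample * 60 / 1_000_000


def mp3_megabytes_per_minute(bitrate_kbps: int) -> float:
    """Size of one minute of constant-bitrate MP3 audio in MB (10^6 bytes)."""
    return bitrate_kbps * 1000 / 8 * 60 / 1_000_000


print(round(pcm_megabytes_per_minute(44_100, 2), 2))  # 10.58 (assumed "standard" WAV)
print(round(pcm_megabytes_per_minute(22_050, 1), 2))  # 2.65  (22kHz mono, uncompressed)
print(round(mp3_megabytes_per_minute(128), 2))        # 0.96  (assumed 128 kbps MP3)
```

Under these assumptions, the roughly tenfold saving comes mostly from MP3 compression, with the lower sample rate and mono channel reducing the raw data before encoding.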

πŸ“ Organized File Management

  • Auto-folder creation: recordings/ folder created automatically
  • Timestamped naming: yymmdd-hhmmss-[type] format for easy organization
  • Session grouping: All files from same recording session have matching timestamps
  • Example: 241113-143022-recording.mp3, 241113-143022-transcript.txt, 241113-143022-memo.md
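The yymmdd-hhmmss naming scheme maps directly onto `strftime` format codes. The helper `session_paths` below is hypothetical and only illustrates how one timestamp could produce the three matching file names.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict


def session_paths(folder: Path, when: datetime) -> Dict[str, Path]:
    """Build the recording, transcript, and memo paths for one session."""
    stamp = when.strftime("%y%m%d-%H%M%S")  # yymmdd-hhmmss
    return {
        "recording": folder / f"{stamp}-recording.mp3",
        "transcript": folder / f"{stamp}-transcript.txt",
        "memo": folder / f"{stamp}-memo.md",
    }


paths = session_paths(Path("recordings"), datetime(2024, 11, 13, 14, 30, 22))
print(paths["memo"].name)  # 241113-143022-memo.md
```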

📊 API Usage Tracking

  • Real-time statistics displayed in results area
  • Token counts: Input, output, and total tokens
  • Processing time: Detailed timing for each operation
  • No log files: All data shown directly in UI
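As an illustration of how such statistics might be assembled, the sketch below formats the three token counters that a google-generativeai response exposes on its `usage_metadata` attribute. `UsageStandIn` is a hypothetical stand-in so the example runs without an API call, and the output layout is an assumption rather than the app's actual display.

```python
from dataclasses import dataclass


@dataclass
class UsageStandIn:
    """Hypothetical stand-in with the same counter names as usage_metadata."""
    prompt_token_count: int
    candidates_token_count: int
    total_token_count: int


def format_usage(usage, elapsed_seconds: float) -> str:
    """Render token counts and elapsed time as a single status line."""
    return (
        f"Input tokens: {usage.prompt_token_count:,} | "
        f"Output tokens: {usage.candidates_token_count:,} | "
        f"Total tokens: {usage.total_token_count:,} | "
        f"Time: {elapsed_seconds:.1f}s"
    )


print(format_usage(UsageStandIn(12000, 850, 12850), 14.237))
# Input tokens: 12,000 | Output tokens: 850 | Total tokens: 12,850 | Time: 14.2s
```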

πŸ›‘οΈ File Validation

  • Format checking: Validates audio file extensions and MIME types
  • Size limits: Enforces minimum (1KB) and maximum (100MB) file sizes
  • Integrity checks: Basic corruption detection
  • Clear error messages: Specific validation failure details
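The extension and size checks can be sketched as below, using the limits from the configuration section. `validate_audio` is a hypothetical helper that takes a file name and size so it can be tried without real files; the MIME type and integrity checks are left out because the README does not describe how they work.

```python
from pathlib import Path
from typing import List

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac"}
MIN_FILE_SIZE = 1024               # 1 KB
MAX_FILE_SIZE = 100 * 1024 * 1024  # 100 MB


def validate_audio(name: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported format: {suffix or '(no extension)'}")
    if size_bytes < MIN_FILE_SIZE:
        problems.append(f"File too small: {size_bytes} bytes (minimum {MIN_FILE_SIZE})")
    elif size_bytes > MAX_FILE_SIZE:
        problems.append(f"File too large: {size_bytes} bytes (maximum {MAX_FILE_SIZE})")
    return problems


print(validate_audio("meeting.MP3", 5_000_000))  # []
print(validate_audio("notes.txt", 10))
# ['Unsupported format: .txt', 'File too small: 10 bytes (minimum 1024)']
```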

🔄 Real Progress Bar

  • Step-by-step progress: Shows actual processing stages
  • Visual feedback: Progress from 0.1 (start) to 1.0 (complete)
  • Stays visible: Progress remains visible for 2 seconds after completion

🤝 Support

🙏 Acknowledgments

  • Google Gemini AI - For powerful audio processing capabilities
  • CustomTkinter - For modern UI components
  • Estonian language community - For feedback and testing

Made with ❤️ for efficient meeting documentation
