🎵 MemoMaker - Audio to Intelligent Memos

Transform audio recordings into professional transcripts and actionable memos using Google's Gemini AI.

✨ Features

🎤 Built-in Recording - Record meetings directly with one-click start/stop
🎯 Smart Audio Processing - Automatic detection of optimal processing method
📱 Speech-Optimized - 22kHz mono recording perfect for meetings
💾 Efficient MP3 Encoding - Small file sizes without external dependencies
📁 Organized Storage - Auto-creates recordings folder with timestamped files
🌍 Multi-Language Support - Estonian and English prompts with easy language switching
⚡ Multiple Processing Methods - Inline, cloud upload, or auto-detection
📝 Configurable Prompts - Customize transcription and memo generation
🔊 Wide Audio Support - MP3, WAV, M4A, OGG, FLAC, AAC formats
📋 Markdown Output - Professional memo format with timestamps and action items
🛡️ File Validation - Comprehensive format, size, and integrity checking
📊 API Usage Tracking - Real-time token counts and processing statistics
🔄 Real Progress Bar - Step-by-step progress indication during processing
🔑 Built-in API Key Manager - Easy setup and management of Gemini API keys

🚀 Quick Start

Prerequisites

Python 3.8+
Google API Key - Get one from Google AI Studio

Required packages:

pip install customtkinter google-generativeai pillow sounddevice scipy numpy lameenc

Setup

Clone the repository:

git clone https://github.com/priit2000/memomaker.git
cd memomaker

Set your Gemini API key:
```
export GEMINI_API_KEY="your-api-key-here"
```
Or on Windows:
```
set GEMINI_API_KEY=your-api-key-here
```
Or use the built-in API key manager: the app shows a setup dialog if no API key is found.
Run the application:
```
python memomaker-ui.py
```

🎯 How to Use

GUI Mode (Recommended)

Recording Mode (New!)

Launch the app - Run python memomaker-ui.py
Start recording - Click "🎤 Start Recording" button
Record your meeting - Speak clearly into your microphone
Stop recording - Click "🛑 Stop Recording" when finished
Auto-processing - App automatically processes the recording and generates transcript + memo
View results - Files saved in recordings/ folder with timestamp naming

File Mode

Launch the app - Run python memomaker-ui.py
Select audio file - Click "Browse" and choose your audio file (or click the file path field)
Choose language - Select Estonian (ET) or English (EN) from the language dropdown
Choose processing method:
- 🎯 Auto - Smart detection based on file size
- ⚡ Inline - Fast processing for smaller files (<20MB)
- ☁️ Cloud Upload - Better for larger files (>20MB)
Customize prompts (optional) - Edit transcription and memo prompts in the tabs
Process - Click "Process Audio" and watch real-time progress
Manage API key (optional) - Click "🔑 API Key" button to view/edit your Gemini API key
View results - Files saved in recordings/ folder with organized naming
Monitor usage - View detailed API usage statistics including token counts in the results area

CLI Mode

python memomaker-ui.py audio_file.mp3 [--method auto|inline|upload] [--prompt "custom prompt"]
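
The command line above can be modeled with `argparse`. This is a minimal sketch of an equivalent parser, not the app's actual code: the `build_parser` name, the help texts, and the `auto` default for `--method` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented CLI shape (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="memomaker-ui.py",
        description="Transcribe an audio file and generate a memo.",
    )
    # Positional argument: the audio file to process.
    parser.add_argument("audio_file", help="path to the audio file")
    # Processing method; defaulting to 'auto' is an assumption of this sketch.
    parser.add_argument(
        "--method",
        choices=["auto", "inline", "upload"],
        default="auto",
        help="processing method",
    )
    # Optional custom prompt; None means 'use the prompt file'.
    parser.add_argument("--prompt", default=None, help="custom prompt text")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.audio_file, args.method, args.prompt)
```

With this shape, an unknown `--method` value is rejected by `argparse` before any processing starts.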

⚙️ Configuration

Settings (Top of `memomaker-ui.py`)

# API Configuration
API_KEY = os.environ.get("GEMINI_API_KEY")

# Model Settings
MODEL_NAME = 'gemini-flash-latest'

# File Processing Settings
INLINE_THRESHOLD = 20 * 1024 * 1024  # 20 MB
MAX_FILE_SIZE = 100 * 1024 * 1024     # 100 MB max
MIN_FILE_SIZE = 1024                   # 1 KB min

# Output File Settings
TRANSCRIPT_FILENAME = "transcript.txt"
MEMO_FILENAME = "memo.md"

# UI Settings
WINDOW_WIDTH = 1000
WINDOW_HEIGHT = 800

Multi-Language Prompts

The app automatically detects and uses language-specific prompt files:

Estonian: transcription-prompt-et.md
English: transcription-prompt-en.md
Future: transcription-prompt-fr.md, transcription-prompt-de.md, etc.

Each file contains:

Transcription rules - Under the # Transkriptsioon (Estonian) or # Transcription (English) section
Memo format - Under # Memo section

Language Selection:

A dropdown menu appears when multiple language files are present
With a single language file, the language is shown as a label
Missing files produce clear error messages in the prompt areas
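
The detection and section layout described above could look roughly like the following. This is a sketch under assumptions, not the app's implementation: the function names are hypothetical, language codes are assumed to be two lowercase letters, and sections are assumed to start at top-level `# ` headings.

```python
import re
from pathlib import Path
from typing import Dict

# Matches names like transcription-prompt-et.md and captures the language code.
PROMPT_PATTERN = re.compile(r"^transcription-prompt-([a-z]{2})\.md$")


def discover_languages(folder: Path) -> Dict[str, Path]:
    """Map language codes to the prompt files found in `folder`."""
    found = {}
    for path in sorted(folder.glob("transcription-prompt-*.md")):
        match = PROMPT_PATTERN.match(path.name)
        if match:
            found[match.group(1)] = path
    return found


def split_sections(markdown_text: str) -> Dict[str, str]:
    """Split prompt text into {heading: body} at top-level '# ' headings."""
    sections: Dict[str, str] = {}
    current = None
    lines = []
    for line in markdown_text.splitlines():
        if line.startswith("# "):
            if current is not None:
                sections[current] = "\n".join(lines).strip()
            current = line[2:].strip()
            lines = []
        elif current is not None:
            lines.append(line)
    if current is not None:
        sections[current] = "\n".join(lines).strip()
    return sections
```

A UI could then show a dropdown when `discover_languages` returns more than one entry and a plain label when it returns exactly one.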

📁 File Structure

memomaker/
├── memomaker-ui.py              # Main application
├── transcription-prompt-et.md   # Estonian prompts
├── transcription-prompt-en.md   # English prompts
├── .gitignore                  # Git ignore rules
├── README.md                   # This file
└── recordings/                 # Auto-created folder for all outputs
    ├── 241113-143022-recording.mp3    # Recorded audio
    ├── 241113-143022-transcript.txt   # Generated transcript
    └── 241113-143022-memo.md          # Generated memo

🎨 Processing Methods

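| Method | Best For | File Size | Speed | Quality |
|--------|----------|-----------|-------|---------|
| Auto | Most cases | Any | Smart | Optimal |
| Inline | Quick processing | < 20MB | Fastest | Good |
| Upload | Large files | > 20MB | Slower | Best |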
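
The Auto method's size-based choice can be sketched as below. The `choose_method` function is hypothetical; the threshold matches `INLINE_THRESHOLD` from the Configuration section. The README does not say what happens at exactly 20 MB, so sending that case to upload is an assumption of this sketch.

```python
INLINE_THRESHOLD = 20 * 1024 * 1024  # 20 MB, as in the Configuration section


def choose_method(file_size: int, requested: str = "auto") -> str:
    """Resolve the processing method for a file of `file_size` bytes."""
    # An explicit choice is honored as-is.
    if requested in ("inline", "upload"):
        return requested
    if requested != "auto":
        raise ValueError(f"Unknown method: {requested!r}")
    # Auto: small files go inline, everything else is uploaded.
    # Treating exactly 20 MB as 'upload' is an assumption.
    return "inline" if file_size < INLINE_THRESHOLD else "upload"
```

For example, a 5 MB file resolves to `inline` under Auto, while a 50 MB file resolves to `upload`.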

📝 Output Examples

Transcript Format

[00h:02m:15s] Priit Kallas: Let's start today's meeting. There are three items on the agenda.
[00h:02m:28s] Henrik Aavik: Thank you. Could we start with last week's results?
[00h:02m:45s] Priit Kallas: Certainly. The numbers are very good...

Memo Format (Markdown)

Structured sections: Participants, summary, decisions, actions
Timestamps: References to specific moments in audio
Action items: Clear responsibilities and deadlines
Multi-language: Professional business language (Estonian or English)
Markdown format: Easy to edit and convert to other formats

🔧 Troubleshooting

Common Issues

"No module named 'customtkinter'"

pip install customtkinter

"Invalid API key"

Verify your Google API key is correct
Check environment variable is set: echo $GEMINI_API_KEY (Linux/Mac) or echo %GEMINI_API_KEY% (Windows)
Use the built-in "🔑 API Key" button to set up your key
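
The environment-variable check can also be done from Python, which behaves the same on every platform. This is a small illustrative helper, not part of the app: `describe_api_key` and `mask_key` are hypothetical names, and the masking format is an arbitrary choice.

```python
import os
from typing import Mapping, Optional


def mask_key(key: str) -> str:
    """Hide the middle of a key so it can be shown safely."""
    if len(key) <= 8:
        return "*" * len(key)
    return key[:4] + "*" * (len(key) - 8) + key[-4:]


def describe_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether GEMINI_API_KEY is set, without printing it in full."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        return "GEMINI_API_KEY is not set"
    return f"GEMINI_API_KEY is set: {mask_key(key)}"


if __name__ == "__main__":
    print(describe_api_key())
```

This only confirms that the variable is visible to Python; it does not verify that Google accepts the key.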

"File validation failed" errors

Check file format (supported: MP3, WAV, M4A, OGG, FLAC, AAC)
Ensure file size is between 1KB and 100MB
Use "Upload" method for files > 20MB
Consider compressing large audio files

UI not appearing

Ensure you're running GUI mode: python memomaker-ui.py
Check that you are not passing command-line arguments, which switch the app to CLI mode
Verify prompt files exist: transcription-prompt-et.md or transcription-prompt-en.md

Performance Tips

Use MP3 format for best compatibility
Compress large files before processing
Close other applications during processing for better performance

🛠️ Development

Requirements

Python 3.8+
tkinter (usually included with Python)
customtkinter
google-generativeai
Pillow
sounddevice (for recording)
scipy (for audio processing)
numpy (for audio data)
lameenc (for MP3 encoding)
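
A quick way to see which of these requirements are missing is to look up each module without importing it fully. The sketch below is illustrative and not part of the repository. The import names are the usual ones for these packages (`PIL` for Pillow, `google.generativeai` for google-generativeai); the `missing_modules` helper is hypothetical.

```python
import importlib.util
from typing import Iterable, List

# Import names corresponding to the packages listed above.
REQUIRED_MODULES = [
    "tkinter",
    "customtkinter",
    "google.generativeai",
    "PIL",
    "sounddevice",
    "scipy",
    "numpy",
    "lameenc",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. 'google') is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_modules(REQUIRED_MODULES) or "none")
```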

Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

📋 Roadmap

⚠️ Security & Privacy

API keys are stored locally as environment variables
Audio files are sent to Google's Gemini API and processed according to Google's privacy policy
No data retention - Files are not stored after processing
Local storage - Transcripts and memos are saved locally

📄 License

MIT License - see LICENSE file for details.

📊 Key Features Detailed

🎤 Built-in Audio Recording

One-click recording: Start/stop with visual feedback
Speech-optimized: 22kHz mono recording perfect for meetings
Efficient encoding: Direct MP3 encoding with lameenc (no external tools needed)
Auto-processing: Automatically processes recorded audio when recording stops
File size optimization: ~1MB per minute vs ~10MB for standard recording
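
The size figures above can be checked with simple arithmetic. The README states neither the MP3 bitrate nor what "standard recording" means, so this sketch assumes 128 kbps MP3 and uncompressed 16-bit stereo PCM at 44.1 kHz. Both are assumptions chosen because they reproduce the quoted numbers.

```python
def mp3_mb_per_minute(bitrate_kbps: float) -> float:
    """Size of one minute of constant-bitrate MP3, in megabytes (10^6 bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


def pcm_mb_per_minute(sample_rate_hz: int, channels: int,
                      bytes_per_sample: int = 2) -> float:
    """Size of one minute of uncompressed PCM audio, in megabytes."""
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return bytes_per_second * 60 / 1_000_000


# Assumed settings, not taken from the app's source:
print(mp3_mb_per_minute(128))       # about 0.96 MB per minute
print(pcm_mb_per_minute(44100, 2))  # about 10.58 MB per minute
```

Under these assumptions, a one-hour meeting comes to roughly 58 MB as MP3, which is below the 100 MB maximum file size.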

📁 Organized File Management

Auto-folder creation: recordings/ folder created automatically
Timestamped naming: yymmdd-hhmmss-[type] format for easy organization
Session grouping: All files from same recording session have matching timestamps
Example: 241113-143022-recording.mp3, 241113-143022-transcript.txt, 241113-143022-memo.md
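
The naming scheme maps directly onto `strftime`. This is a minimal sketch of building the three matching paths for one session; the `session_paths` function is a hypothetical helper, not the app's code.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict


def session_paths(folder: Path, when: datetime) -> Dict[str, Path]:
    """Build the recording, transcript, and memo paths for one session."""
    # yymmdd-hhmmss, shared by all files from the same session.
    stamp = when.strftime("%y%m%d-%H%M%S")
    return {
        "recording": folder / f"{stamp}-recording.mp3",
        "transcript": folder / f"{stamp}-transcript.txt",
        "memo": folder / f"{stamp}-memo.md",
    }


paths = session_paths(Path("recordings"), datetime(2024, 11, 13, 14, 30, 22))
print(paths["memo"].name)  # 241113-143022-memo.md
```

Because the timestamp leads the file name, sorting the folder by name also sorts it chronologically.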

📊 API Usage Tracking

Real-time statistics displayed in results area
Token counts: Input, output, and total tokens
Processing time: Detailed timing for each operation
No log files: All data shown directly in UI
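
One way to keep such statistics in memory, without log files, is a small accumulator that is updated after each API call. The `UsageStats` class below is a hypothetical sketch. How the counts are obtained is left out; with the google-generativeai package they would presumably come from the response's `usage_metadata`, which is an assumption about the app.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    """Running totals for API usage, kept in memory only."""
    input_tokens: int = 0
    output_tokens: int = 0
    seconds: float = 0.0
    calls: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def add(self, input_tokens: int, output_tokens: int, seconds: float) -> None:
        """Record one API call."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.seconds += seconds
        self.calls += 1

    def summary(self) -> str:
        """One-line summary suitable for a results area."""
        return (
            f"{self.calls} calls, {self.input_tokens} in + "
            f"{self.output_tokens} out = {self.total_tokens} tokens, "
            f"{self.seconds:.1f}s"
        )
```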

🛡️ File Validation

Format checking: Validates audio file extensions and MIME types
Size limits: Enforces minimum (1KB) and maximum (100MB) file sizes
Integrity checks: Basic corruption detection
Clear error messages: Specific validation failure details
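
The extension and size checks can be sketched as follows, using the formats and limits documented in this README. This is an illustration rather than the app's validator: `validate_audio_file` is a hypothetical name, and MIME-type and corruption checks are deliberately left out.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac"}
MIN_FILE_SIZE = 1024               # 1 KB
MAX_FILE_SIZE = 100 * 1024 * 1024  # 100 MB


def validate_audio_file(path: Path) -> Optional[str]:
    """Return None if the file passes, otherwise a specific error message."""
    if not path.is_file():
        return f"File not found: {path}"
    extension = path.suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return f"Unsupported format '{extension}'"
    size = path.stat().st_size
    if size < MIN_FILE_SIZE:
        return f"File too small: {size} bytes (minimum {MIN_FILE_SIZE})"
    if size > MAX_FILE_SIZE:
        return f"File too large: {size} bytes (maximum {MAX_FILE_SIZE})"
    return None
```

Returning a message instead of raising keeps the failure details easy to display in a UI.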

🔄 Real Progress Bar

Step-by-step progress: Shows actual processing stages
Visual feedback: Progress from 0.1 (start) to 1.0 (complete)
Stays visible: Progress remains visible for 2 seconds after completion
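
A step-based bar like this can be driven by mapping each stage to a fraction between 0.1 and 1.0. The stage names and the even spacing below are assumptions for illustration; the README does not list the actual stages or their values.

```python
# Hypothetical stage names; the app's real stages are not documented here.
STAGES = ["validate", "send audio", "transcribe", "generate memo", "save files"]


def progress_for(step_index: int, total_steps: int,
                 start: float = 0.1, end: float = 1.0) -> float:
    """Map a zero-based step index to a progress value in [start, end]."""
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    if not 0 <= step_index < total_steps:
        raise ValueError("step_index out of range")
    if total_steps == 1:
        return end
    fraction = step_index / (total_steps - 1)
    return round(start + (end - start) * fraction, 3)


for index, stage in enumerate(STAGES):
    print(f"{stage}: {progress_for(index, len(STAGES))}")
```

Values in this range can be passed directly to a widget that expects 0 to 1, such as CustomTkinter's `CTkProgressBar.set`.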

🤝 Support

Issues: GitHub Issues
Discussions: GitHub Discussions

🙏 Acknowledgments

Google Gemini AI - For powerful audio processing capabilities
CustomTkinter - For modern UI components
Estonian language community - For feedback and testing

Made with ❤️ for efficient meeting documentation
