monkeyplug-enhanced is an enhanced fork of mmguero/monkeyplug (the original is published on PyPI as monkeyplug). It censors profanity in audio files: speech recognition locates profanity timestamps, and FFmpeg then mutes, beeps over, or splices instrumental audio into those sections.
The CLI command is still monkeyplug — only the package name changed to avoid conflicting with the original.
- Groq API integration (fast, default mode)
- AI instrumental generation via sherpa-onnx source separation
- AI profanity detection via Groq LLM with structured outputs
- Wildcard/batch processing with automatic vocal detection
- Progress bar for non-verbose mode
- Transcript save/reuse for faster reprocessing
- Config file support with sensible defaults
- Automatic metadata tagging via ShazamIO (title, artist, genre, cover art)
- Speech recognition produces word-level timestamps (using Groq, Whisper, or Vosk)
- Each word is checked against a built-in profanity list (or your custom list)
- FFmpeg creates a cleaned audio file by either muting, beeping, or replacing profanity sections with instrumental audio
- Optionally, transcripts can be saved and reused to skip transcription on future runs
If provided a video file, monkeyplug processes the audio stream and remultiplexes it with the original video stream.
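The list-based detection step can be sketched in a few lines of Python (an illustrative sketch; the transcript shape, word set, and function name are assumptions rather than monkeyplug's actual internals):

```python
import re

SWEARS = {"darn", "heck"}  # stand-in for the real built-in list

def find_censor_intervals(words, swears=SWEARS):
    """words: list of {"word": str, "start": float, "end": float} dicts."""
    intervals = []
    for w in words:
        token = re.sub(r"[^a-z']", "", w["word"].lower())  # strip punctuation
        if token in swears:
            intervals.append((w["start"], w["end"]))
    return intervals

transcript = [
    {"word": "Well,", "start": 0.0, "end": 0.4},
    {"word": "darn", "start": 0.4, "end": 0.8},
    {"word": "it!", "start": 0.8, "end": 1.1},
]
print(find_censor_intervals(transcript))  # [(0.4, 0.8)]
```

The resulting intervals are what FFmpeg acts on, regardless of which censorship mode is selected.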
```shell
pip install monkeyplug-enhanced
```

Or install from GitHub:

```shell
pip install 'git+https://github.com/ljbred08/monkeyplug'
```

- FFmpeg — install via your OS package manager or from ffmpeg.org
- Python 3.10+
- Groq API key (for default mode) — see Groq API Setup
- Optional: Whisper or Vosk for offline recognition
The default mode uses Groq's fast Whisper API. Configure your API key using one of these methods (in order of priority):
Command-line parameter:

```shell
monkeyplug -i input.mp3 -o output.mp3 --groq-api-key gsk_...
```

Environment variable:

```shell
export GROQ_API_KEY=gsk_...
```

Config file (`~/.groq/config.json`):

```json
{"api_key": "gsk_..."}
```

Project-local file (add `.groq_key` to `.gitignore`):

```shell
echo 'gsk_...' > .groq_key
```

```shell
# Basic usage — mutes profanity using Groq API and built-in word list
# Shows progress bar automatically in non-verbose mode
monkeyplug -i song.mp3 -o song_clean.mp3

# Verbose output to see what's happening
monkeyplug -i song.mp3 -o song_clean.mp3 -v

# Use local Whisper instead of Groq
monkeyplug -i song.mp3 -o song_clean.mp3 -m whisper
```

Three modes are available. Priority order: `--mute` > `--beep` > `--instrumental`.
Silences profanity sections with short fade transitions.
```shell
monkeyplug -i song.mp3 -o song_clean.mp3 --mute
```

Replaces profanity with a beep tone.

```shell
# Basic beep
monkeyplug -i song.mp3 -o song_clean.mp3 -b

# Customize beep frequency and mix
monkeyplug -i song.mp3 -o song_clean.mp3 -b -z 1000 --beep-mix-normalize
```

Replaces profanity sections with instrumental audio for a professional-sounding clean edit. Supports several sub-modes:
```shell
monkeyplug -i explicit.mp3 -o clean.mp3 --instrumental instrumental.mp3
```

Searches for an instrumental file using fuzzy matching. If not found, falls back to AI generation.

```shell
# Default behavior — searches for matching instrumental, generates if not found
monkeyplug -i song.mp3 -o song_clean.mp3 --instrumental auto

# This is also the default when no --instrumental flag is given
monkeyplug -i song.mp3 -o song_clean.mp3
```

AUTO fuzzy matching searches the same directory for audio files with similar names (30% similarity threshold). Examples:

- `1-satisfied.mp3` → finds `satisfied-inst.mp3`
- `MySong_v2.mp3` → finds `MySong_instrumental.mp3`
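One way a 30% threshold like this could work, sketched with the standard library's difflib (the scoring method is an assumption, not monkeyplug's exact implementation):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Ratio of matching characters to total characters, case-insensitive
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_instrumental(input_stem, candidate_stems, threshold=0.30):
    # Score every sibling file's stem against the input stem and keep
    # only candidates at or above the similarity threshold.
    scored = [(similarity(input_stem, c), c) for c in candidate_stems]
    scored = [(score, c) for score, c in scored if score >= threshold]
    return max(scored)[1] if scored else None

print(best_instrumental("1-satisfied", ["satisfied-inst", "unrelated-track"]))
# satisfied-inst
```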
Searches for instrumental files using a specific prefix/suffix pattern:
```shell
# Searches for: song_inst.mp3, song-inst.mp3, inst_song.mp3, etc.
monkeyplug -i song.mp3 -o song_clean.mp3 --instrumental prefix --instrumental-prefix inst
```

Uses sherpa-onnx to AI-generate instrumental sections for profanity segments. Skips all instrumental file searching.

```shell
monkeyplug -i song.mp3 -o song_clean.mp3 --instrumental generate
```

The AI separation process:
- Extracts profanity segments from the original audio
- Concatenates them with configurable padding (default: 1.0s)
- Separates vocals from instrumental using a Spleeter model
- Splices the AI-generated instrumental back into the original
Separation models are cached at ~/.cache/monkeyplug/separation_models/ (downloaded on first use).
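The padding and concatenation steps above reduce to simple interval arithmetic; a sketch of the assumed behavior (names are illustrative):

```python
def padded_windows(segments, padding=1.0, duration=None):
    # Pad each (start, end) profanity segment with `padding` seconds of
    # surrounding context, clamp to the clip bounds, and merge windows
    # that overlap so each region is separated only once.
    windows = []
    for start, end in sorted(segments):
        s = max(0.0, start - padding)
        e = end + padding if duration is None else min(duration, end + padding)
        if windows and s <= windows[-1][1]:   # overlaps the previous window
            windows[-1] = (windows[-1][0], max(windows[-1][1], e))
        else:
            windows.append((s, e))
    return windows

print(padded_windows([(10.2, 10.6), (11.0, 11.4), (30.0, 30.5)]))
# [(9.2, 12.4), (29.0, 31.5)]
```

Merging keeps nearby profanity segments in one separation pass, so the model sees continuous context instead of choppy fragments.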
Process multiple files at once using * wildcards:
```shell
# Process all MP3s in current directory
monkeyplug -i "*.mp3" -o "*_clean.mp3" --instrumental generate

# With verbose output
monkeyplug -i "*.mp3" -o "*_clean.mp3" -v
```

In wildcard mode, monkeyplug automatically detects which files have vocals by transcribing a 10-second sample from the middle of each file. Instrumental files (no speech detected) are skipped.
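Picking a 10-second sample from the middle of a file is a small clamping calculation (a sketch of the described behavior, not monkeyplug's code):

```python
def sample_window(duration_s, sample_len=10.0):
    # Center a sample_len-second window on the file's midpoint, clamping
    # so short files still produce a valid (start, end) range.
    start = max(0.0, duration_s / 2 - sample_len / 2)
    return start, min(duration_s, start + sample_len)

print(sample_window(200.0))  # (95.0, 105.0)
print(sample_window(6.0))    # (0.0, 6.0)
```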
With --instrumental generate, vocal detection is skipped by default (all files are processed) since you're generating instrumentals anyway. Use --filter-instrumentals to re-enable it:
```shell
# Process all files (default — no vocal detection)
monkeyplug -i "*.mp3" -o "*_clean.mp3" --instrumental generate

# Skip files detected as instrumentals
monkeyplug -i "*.mp3" -o "*_clean.mp3" --instrumental generate --filter-instrumentals
```

Files matching the output pattern are automatically skipped (already processed).
Save and reuse transcripts to avoid redundant API calls (up to 22x faster on repeat runs):
```shell
# Generate and save transcript alongside output
monkeyplug -i song.mp3 -o song_clean.mp3 --save-transcript
# Creates: song_clean.mp3 + song_clean_transcript.json

# Second run: automatically finds and reuses the transcript
monkeyplug -i song.mp3 -o song_clean.mp3 --save-transcript

# Force new transcription (ignore existing transcript)
monkeyplug -i song.mp3 -o song_clean.mp3 --save-transcript --force-retranscribe

# Manually specify a transcript to load
monkeyplug -i song.mp3 -o song_clean_strict.mp3 --input-transcript song_clean_transcript.json --swears strict_swears.txt
```

```shell
# Use a custom text file (one word per line, or word|replacement)
monkeyplug -i podcast.mp3 -o podcast_clean.mp3 --swears custom_swears.txt

# Use a custom JSON file (array of strings)
monkeyplug -i podcast.mp3 -o podcast_clean.mp3 --swears custom_swears.json

# Custom words are merged with the built-in profanity list
```

monkeyplug automatically fetches song metadata from Shazam and embeds it into the output file:
- Title, Artist, Genre - Text tags embedded in the audio file
- Cover Art - Album artwork downloaded and embedded (MP3 only)
```shell
# Metadata is enabled by default
monkeyplug -i song.mp3 -o song_clean.mp3

# Disable metadata fetching
monkeyplug -i song.mp3 -o song_clean.mp3 --disable-metadata
```

What happens:
- The input file is analyzed by Shazam to identify the song
- Metadata (title, artist, genre, cover art URL) is retrieved
- Cover art is downloaded and embedded as ID3/APIC frames
- Text tags are added to the output file
Notes:
- Requires internet connection for Shazam recognition
- Cover art embedding is supported for MP3 files
- If recognition fails, the file is still processed (no error)
- Metadata can be viewed in any music player or with `ffprobe`
Control what's printed about detected profanity in normal (non-verbose) mode:
```shell
# Show count only (default)
monkeyplug -i song.mp3 -o song_clean.mp3 -w clean

# Show full list with timestamps
monkeyplug -i song.mp3 -o song_clean.mp3 -w full

# Silent mode (no profanity output)
monkeyplug -i song.mp3 -o song_clean.mp3 -w none
```

Use Groq's LLM for context-aware profanity detection instead of (or in addition to) the static word list:
```shell
# AI-only detection (replaces static list)
monkeyplug -i song.mp3 -o song_clean.mp3 --detect ai

# Both list + AI (word flagged if either catches it)
monkeyplug -i song.mp3 -o song_clean.mp3 --detect both

# Default: static list only
monkeyplug -i song.mp3 -o song_clean.mp3 --detect list
```

Requires a Groq API key (same setup as Groq STT mode). Works with all speech recognition modes (Groq, Whisper, Vosk).
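How the three `--detect` modes could combine detector results, as a minimal sketch (the either-catches-it semantics for `both` is stated above; the data shapes are illustrative):

```python
def combine_detections(mode, list_hits, ai_hits):
    # list_hits / ai_hits: sets of word indices flagged by each detector.
    if mode == "list":
        return set(list_hits)
    if mode == "ai":
        return set(ai_hits)
    if mode == "both":
        return set(list_hits) | set(ai_hits)  # flagged if either catches it
    raise ValueError(f"unknown detect mode: {mode}")

print(sorted(combine_detections("both", {3, 7}, {7, 12})))  # [3, 7, 12]
```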
Configurable via ~/.cache/monkeyplug/config.json:
```json
{
  "detect_mode": "list",
  "ai_detect_model": "openai/gpt-oss-20b",
  "ai_detect_prompt": "You are a profanity detection assistant..."
}
```

Unify album names and cover art, and assign track numbers, across a folder of songs using AI:
```shell
# Basic AI unification
monkeyplug --unify-album

# With Spotify integration (recommended for best results)
monkeyplug --unify-album --use-spotify

# With direct Spotify URL (skip search)
monkeyplug --unify-album --use-spotify "https://open.spotify.com/album/1kCHru7uhxBUdzkm4gzRQc"

# Combine with normal processing
monkeyplug -i "album/*.mp3" -o "album/*_clean.mp3" --unify-album

# Full workflow with Spotify and smart renaming
monkeyplug -i "album/*.mp3" -o "album/*_clean.mp3" --unify-album --use-spotify --auto-rename
```

The AI analyzes all songs together to determine the correct album name and track order. With `--use-spotify`, it fetches official cover art and track listings from Spotify for accurate results.
Two modes:
- Combined with processing: Runs after normal audio processing completes
- Standalone: Processes existing files without audio processing (requires Groq API key)
Spotify Integration (--use-spotify):
- Provide a direct Spotify URL to skip the search step
- Or let it search automatically for the album
- Downloads official cover art (640x640)
- Gets official track listing for accurate ordering
- Applies consistent cover art to all tracks
Configurable via ~/.cache/monkeyplug/config.json:
```json
{
  "unify_album_model": "openai/gpt-oss-120b",
  "unify_album_prompt": "You are a music metadata expert..."
}
```

Requirements:
- Groq API key (same setup as other AI features)
- Files must have existing metadata (title, album)
- MP3 files get full support (album + track number + cover art via ID3 tags)
monkeyplug looks for a JSON config file in this order (first found wins):
1. `./.monkeyplug.json` (current directory — project-specific)
2. `~/.cache/monkeyplug/config.json` (user-specific)
If neither exists, a default config is auto-created at ~/.cache/monkeyplug/config.json:
```json
{
  "pad_milliseconds": 10,
  "pad_milliseconds_pre": 10,
  "pad_milliseconds_post": 10,
  "separation_padding": 1.0,
  "beep_hertz": 1000,
  "show_words": "clean",
  "detect_mode": "list",
  "ai_detect_model": "openai/gpt-oss-20b",
  "ai_detect_prompt": "You are a profanity detection assistant..."
}
```

Config values provide defaults that can be overridden by CLI arguments.
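The lookup order and CLI override behavior can be sketched as follows (paths are from the docs; the merge logic is an assumption, with a trimmed-down defaults dict for brevity):

```python
import json
from pathlib import Path

DEFAULTS = {"pad_milliseconds": 10, "beep_hertz": 1000, "detect_mode": "list"}

def load_config(cli_args, search_paths=None):
    paths = search_paths or [Path("./.monkeyplug.json"),
                             Path.home() / ".cache/monkeyplug/config.json"]
    cfg = dict(DEFAULTS)
    for path in paths:                 # first existing file wins
        if path.is_file():
            cfg.update(json.loads(path.read_text()))
            break
    # CLI arguments (when actually given) override file values
    cfg.update({k: v for k, v in cli_args.items() if v is not None})
    return cfg

print(load_config({"beep_hertz": 750, "detect_mode": None}))
```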
Clean all caches (models, config) with:
```shell
monkeyplug --clean-cache
```

Add padding around profanity for smoother transitions:
```shell
# Equal padding on both sides
monkeyplug -i song.mp3 -o clean.mp3 --pad-milliseconds 100

# Different pre and post padding
monkeyplug -i song.mp3 -o clean.mp3 --pad-milliseconds-pre 50 --pad-milliseconds-post 100
```

```
usage: monkeyplug <arguments>

Core Options:
  -i, --input <string>                  Input file, URL, or wildcard pattern
  -o, --output <string>                 Output file or pattern
  -v [concise|full], --verbose          Verbose output
  -m [groq|whisper|vosk], --mode        Speech recognition engine (default: groq)

Censorship Modes:
  --mute                                Mute profanity (disables instrumental mode)
  -b, --beep                            Beep instead of silence
  --instrumental <mode|file>            Instrumental mode: auto, generate, prefix, or file path
  --instrumental-prefix <string>        Prefix to search for instrumental file (default: AUTO)
  --instrumental-auto-candidates <int>  Top candidates for AUTO matching (default: 5)

Profanity:
  --swears <file>                       Custom profanity list (text or JSON)
  --detect <list|ai|both>               Profanity detection method (default: list)
  -w, --show-words <clean|full|none>    Show detected profanity (default: clean)
  --pad-milliseconds <int>              Padding around profanity (default: 10)
  --pad-milliseconds-pre <int>          Padding before profanity (default: 10)
  --pad-milliseconds-post <int>         Padding after profanity (default: 10)

Beep Options:
  -z, --beep-hertz <int>                Beep frequency in Hz (default: 1000)
  --beep-mix-normalize                  Normalize audio/beep mix
  --beep-audio-weight <int>             Non-beeped audio weight (default: 1)
  --beep-sine-weight <int>              Beep weight (default: 1)
  --beep-dropout-transition <int>       Dropout transition for beep (default: 0)

Transcript:
  --save-transcript                     Save transcript JSON alongside output
  --input-transcript <file>             Load existing transcript JSON
  --output-json <file>                  Save transcript to specific file
  --force-retranscribe                  Force new transcription

AI Separation:
  --separation-padding <seconds>        Context padding for AI generation (default: 1.0)
  --filter-instrumentals                Filter out instrumental files in wildcard mode with generate

Audio Output:
  -f, --format <string>                 Output format (default: inferred from extension or "MATCH")
  -c, --channels <int>                  Output channels (default: 2)
  -s, --sample-rate <int>               Output sample rate (default: 48000)
  -r, --bitrate <string>                Output bitrate (default: 256K)
  -a, --audio-params <string>           FFmpeg audio parameters
  -q, --vorbis-qscale <int>             qscale for libvorbis (default: 5)

Other:
  --force                               Process file even if already tagged
  --disable-metadata                    Disable automatic metadata fetching via ShazamIO
  --unify-album                         Unify album metadata across all files in the folder using AI
  --clean-cache                         Delete all cached data (models, config) and exit

Groq Options:
  --groq-api-key <string>               Groq API key
  --groq-model <string>                 Groq Whisper model (default: whisper-large-v3)

Whisper Options:
  --whisper-model-dir <string>          Model directory (default: ~/.cache/whisper)
  --whisper-model-name <string>         Model name (default: small.en)
  --torch-threads <int>                 CPU inference threads (default: 0)

VOSK Options:
  --vosk-model-dir <string>             Model directory (default: ~/.cache/vosk)
  --vosk-read-frames-chunk <int>        WAV frame chunk (default: 8000)
```
Pull requests welcome!
- Seth Grover - Initial work - mmguero
- Lincoln Brown - Enhanced fork (Groq API, AI generation, batch mode) - ljbred08
BSD 3-Clause License — see the LICENSE file for details.