A machine-learning model designed to detect "drops" (high-energy payoffs) in musical tracks, with a specific optimization for Electronic Dance Music (EDM).
This project builds on the work of Yadati et al. (ISMIR 2014) on content-based drop detection. Using EDM-specific features and an XGBoost classifier, this implementation significantly improves detection accuracy:
| Model | F1 Score |
|---|---|
| Yadati et al. (2014) | 0.71 |
| Drop Detector (This Repo) | 0.95 |
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/edm-drop-detector.git
  cd edm-drop-detector
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Install FFmpeg: This tool relies on `ffmpeg` for audio processing.
  - Mac: `brew install ffmpeg`
  - Linux: `sudo apt install ffmpeg`
  - Windows: Download binary and add to PATH.
You can use the CLI tool `model_predict.py` to scan folders of audio and detect drops.

```bash
python model_predict.py [OPTIONS]
```

| Flag | Argument | Description |
|---|---|---|
| `-f, --folder` | PATH | Required. Path to the root directory containing audio files to scan. |
| `-m, --model` | PATH | Path to the `.joblib` model file. Defaults to `./model.joblib`. |
| `-T, --threshold` | FLOAT | Manual confidence threshold (0.0 - 1.0). Overrides the model's optimal default. |
| `-k, --topk` | INT | Output only the top K drops per track. If set without a threshold, ignores confidence scores. |
| `-c, --csv` | PATH | Save predictions to a CSV file. Default: `./model_predictions.csv`. |
| `-t, --tag` | N/A | Write drop times (e.g., `DROP_TIME=60.5,124.2`) into the audio file metadata. |
Scan the test folder, keep the top 3 drops per track with confidence above 90%, save the results to CSV, and tag the audio files themselves:

```bash
python model_predict.py \
  -f "/Users/Admin/Music/Download/test" \
  --csv \
  --topk 3 \
  --threshold 0.90 \
  --tag
```
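The predictions CSV can be consumed by any downstream tool. The snippet below is a minimal, hypothetical example that loads it with pandas and groups drops by track; the column names used here (`file`, `drop_time`) are assumptions and may not match the actual schema written by `model_predict.py`.

```python
# Hypothetical post-processing of model_predictions.csv.
# Column names ("file", "drop_time") are assumptions, not the tool's documented schema.
import pandas as pd

preds = pd.read_csv("model_predictions.csv")
for track, rows in preds.groupby("file"):
    times = ", ".join(f"{t:.1f}s" for t in rows["drop_time"])
    print(f"{track}: drops at {times}")
```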
If you wish to retrain the model on your own dataset:

- Place your raw audio files into the `dataset_train/` folder.
- Run the labeling assistant:

  ```bash
  python dataset_build.py
  ```

  This script will iterate through your tracks, propose candidates, and ask you to verify if they are true drops.

- Once labeled, process the data into a clean CSV format for the model:

  ```bash
  python dataset_clean.py
  ```

- Run the training pipeline:

  ```bash
  python model_build.py
  ```

  This will:
  - Extract features for all labeled candidates.
  - Run Bayesian Hyperparameter Optimization (Optuna).
  - Train an XGBoost classifier.
  - Output the final `model.joblib` file.
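After training, the saved artifact can be loaded for quick experiments. The sketch below assumes `model.joblib` holds a scikit-learn-compatible classifier and that each candidate is already expressed as a row of the 29 features in training order; if the artifact bundles anything else (such as the calibrated threshold), adapt accordingly.

```python
# Minimal sketch: load the trained model and score candidate feature rows.
# Assumes model.joblib is a scikit-learn-compatible classifier; the dummy matrix
# below only illustrates the expected shape (one row of 29 features per candidate).
import joblib
import numpy as np

model = joblib.load("model.joblib")
X_new = np.zeros((3, 29))                      # placeholder: 3 candidates x 29 features
drop_confidence = model.predict_proba(X_new)[:, 1]
print(drop_confidence)
```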
The system uses a three-stage pipeline: candidate generation, feature extraction, and classification.

Candidate generation: the code uses fast signal-processing heuristics to identify potential points of interest:
- Bass Boost & Envelope: Applies a low-shelf filter and scans the volume envelope for sharp energy rises.
- Transient Snapping: Mathematically aligns timestamps to the exact "kick" or transient.
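A rough sketch of these heuristics is shown below. It is illustrative only: it uses a simple low-pass filter as a stand-in for the low-shelf "bass boost", librosa's RMS envelope and onset detector, and thresholds that the real candidate generator may choose differently.

```python
# Illustrative candidate generator, not the repository's exact implementation.
# A low-pass filter stands in for the low-shelf "bass boost"; thresholds are guesses.
import numpy as np
import librosa
from scipy.signal import butter, sosfilt

def candidate_drops(path, sr=22050, hop=512, rise_db=6.0):
    y, _ = librosa.load(path, sr=sr, mono=True)

    # Bass boost & envelope: emphasize low frequencies, then track the RMS envelope in dB
    sos = butter(2, 150, btype="lowpass", fs=sr, output="sos")
    rms = librosa.feature.rms(y=sosfilt(sos, y), hop_length=hop)[0]
    db = librosa.amplitude_to_db(rms, ref=np.max)

    # Sharp frame-to-frame energy rises become candidate drop points
    rise = np.diff(db, prepend=db[0])
    frames = np.where(rise > rise_db)[0]
    times = librosa.frames_to_time(frames, sr=sr, hop_length=hop)

    # Transient snapping: move each candidate to the nearest detected onset (kick)
    onsets = librosa.onset.onset_detect(y=y, sr=sr, hop_length=hop, units="time")
    if len(onsets) and len(times):
        times = np.array([onsets[np.argmin(np.abs(onsets - t))] for t in times])
    return np.unique(np.round(times, 2))
```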
Feature extraction: for each candidate, the model extracts 29 context-aware features, comparing the audio after the impact against the build-up before it:
- RMS Energy: Does the volume increase?
- Future Energy: Is there a louder section coming up later?
- Grid Alignment: Does the candidate land on a significant 4, 8, 16, or 32-bar boundary?
- Pulse Clarity: Is there a strong, defined rhythmic pulse, or is the texture messy?
- Transient Dominance: Does the initial "kick" stand out compared to its surroundings?
- Bass Ratio: Is the sound spectrum suddenly dominated by low-frequency energy?
- Bass Continuity: Does the bassline sustain, or does it fade?
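To illustrate the before/after contrast idea, the sketch below computes a couple of these comparisons (RMS ratio and bass ratio) for a single candidate. The window length, cutoff frequency, and feature names are assumptions; the repository's 29 features are richer than this.

```python
# Illustrative before/after contrast features for one candidate; names and values are assumptions.
import numpy as np

def contrast_features(y, sr, drop_time, window_s=4.0):
    """Compare the audio just after a candidate drop with the build-up before it."""
    n, t = int(window_s * sr), int(drop_time * sr)
    before, after = y[max(0, t - n):t], y[t:t + n]

    def rms(x):
        return float(np.sqrt(np.mean(x ** 2) + 1e-12))

    def bass_ratio(x):
        # Share of spectral energy below ~150 Hz
        spec = np.abs(np.fft.rfft(x)) ** 2
        freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
        return float(spec[freqs < 150].sum() / (spec.sum() + 1e-12))

    return {
        "rms_ratio": rms(after) / rms(before),                 # does the volume increase?
        "bass_ratio_after": bass_ratio(after),                 # suddenly bass-dominated?
        "bass_ratio_delta": bass_ratio(after) - bass_ratio(before),
    }
```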
Classification: the core logic is handled by a gradient-boosted decision tree (XGBoost).
- Optuna Tuning: uses Bayesian optimization to search for the best-performing hyperparameter combination.
- Dynamic Thresholding: automatically calibrates the probability cutoff to maximize the F1 score.
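A condensed sketch of this training loop is shown below; the actual search space, validation split, and trial count in `model_build.py` may differ.

```python
# Condensed sketch of Optuna tuning + dynamic thresholding; search space and split are assumptions.
import numpy as np
import optuna
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

def train(X, y, n_trials=50):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

    def objective(trial):
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 100, 600),
            "max_depth": trial.suggest_int("max_depth", 3, 8),
            "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
            "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        }
        model = XGBClassifier(**params, eval_metric="logloss")
        model.fit(X_tr, y_tr)
        return f1_score(y_val, model.predict(X_val))

    study = optuna.create_study(direction="maximize")   # Bayesian (TPE) search
    study.optimize(objective, n_trials=n_trials)

    best = XGBClassifier(**study.best_params, eval_metric="logloss").fit(X_tr, y_tr)

    # Dynamic thresholding: pick the probability cutoff that maximizes F1 on validation data
    probs = best.predict_proba(X_val)[:, 1]
    cutoffs = np.linspace(0.05, 0.95, 91)
    f1s = [f1_score(y_val, (probs >= c).astype(int)) for c in cutoffs]
    return best, float(cutoffs[int(np.argmax(f1s))])
```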
- Tracks with irregular time signatures or a lack of percussion may yield fewer candidates.
- Extreme compression or sparse bass mixing can mask the specific contrast features the model looks for.
- "Fake drops" can trick the model if they are rhythmically similar to a real drop.