A binary classification model that labels 10-second audio segments from full songs as either "good" (1) or "bad" (0) for use in audio-based applications like guessing games, highlight extraction, or clip curation.
This project powers automated clip selection for Bollyguess, improving the quality of daily audio snippets by predicting which segments are likely to be familiar yet challenging.
The model is trained on manually labeled segments. Audio features (MFCCs, chroma, tempo, and pitch) are extracted with Librosa and fed to a feedforward neural network built in PyTorch, which is evaluated with F1-score and a confusion matrix.
Tech Stack
- Python
- PyTorch – model architecture and training
- TensorFlow – experimentation support
- Librosa – feature extraction (MFCC, chroma, tempo, pitch)
- FFmpeg – audio slicing and preprocessing
- Scikit-learn – evaluation (confusion matrix, F1-score)
- Pandas
Model Architecture
- 4 hidden layers with ReLU activation
- Final layer uses Sigmoid for binary classification
- Trained on labeled 10-second audio clips (0 = not suitable, 1 = suitable)
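The architecture described above can be sketched roughly as follows. This is a minimal illustration, not the code in `model_trainer/model.py`: the class name `SegmentClassifier`, the helper `train_step`, the input size of 27 features, the hidden-layer widths, the Adam optimizer, and the learning rate are all assumptions chosen for the example.

```python
import torch
from torch import nn


class SegmentClassifier(nn.Module):
    """Feedforward net: 4 hidden ReLU layers, then a sigmoid output unit."""

    def __init__(self, n_features=27, hidden_sizes=(128, 64, 32, 16)):
        super().__init__()
        layers, in_dim = [], n_features
        for width in hidden_sizes:  # widths are illustrative assumptions
            layers += [nn.Linear(in_dim, width), nn.ReLU()]
            in_dim = width
        layers += [nn.Linear(in_dim, 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # (batch, n_features) -> (batch,) probabilities of label 1
        return self.net(x).squeeze(-1)


def train_step(model, optimizer, features, labels):
    """Run one gradient step with binary cross-entropy; return the loss."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


# Placeholder random tensors stand in for real feature vectors and labels.
torch.manual_seed(0)
model = SegmentClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 27)
y = torch.randint(0, 2, (8,)).float()
print(train_step(model, optimizer, x, y))
```

Because the final layer already applies a sigmoid, the sketch pairs it with plain binary cross-entropy rather than a logits-based loss.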
1. Audio Preprocessing
   Full songs are sliced into 10-second segments using FFmpeg.
2. Feature Extraction
   Each segment is converted to a feature vector using Librosa:
   - MFCCs
   - Chroma
   - Tempo
   - Pitch
3. Model Training
   A binary classifier is trained on the extracted features using PyTorch.
4. Evaluation
   Model performance is analyzed using:
   - F1-score
   - Confusion matrix
5. Deployment (Optional)
   The classifier can be integrated into apps for automated segment selection.
Used in Bollyguess to select ideal audio clips for a daily Bollywood music guessing game.
1. Clone the repo
       git clone https://github.com/omn25/audio-classifier.git
       cd audio-classifier
2. Install dependencies
       pip install -r requirements.txt
3. Run preprocessing
       python model_trainer/build_dataset.py --input songs/ --output clips/
4. Extract features
       python model_trainer/utils.py --input clips/ --output features.csv
5. Train the model
       python model_trainer/train.py --features features.csv
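To give a sense of what the preprocessing step involves, the sketch below builds one FFmpeg command per 10-second segment of a song. It is not taken from `build_dataset.py`: the function name `ffmpeg_slice_commands`, the non-overlapping windows, the output naming scheme, the WAV format, and the mono downmix are assumptions, and the song duration is passed in by the caller rather than probed.

```python
from pathlib import Path


def ffmpeg_slice_commands(song, out_dir, duration_s, clip_len=10):
    """Return one FFmpeg argument list per full, non-overlapping clip."""
    song, out_dir = Path(song), Path(out_dir)
    commands = []
    for start in range(0, int(duration_s) - clip_len + 1, clip_len):
        out = out_dir / f"{song.stem}_{start:04d}.wav"
        commands.append([
            "ffmpeg", "-y",          # overwrite existing output files
            "-ss", str(start),       # seek to the segment start (seconds)
            "-t", str(clip_len),     # read clip_len seconds of input
            "-i", str(song),
            "-ac", "1",              # downmix to mono (an assumption)
            str(out),
        ])
    return commands


# Hypothetical 35-second song: yields clips starting at 0, 10, and 20 s.
cmds = ffmpeg_slice_commands("songs/example.mp3", "clips", duration_s=35)
for cmd in cmds:
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once FFmpeg is installed and the output directory exists. A trailing remainder shorter than 10 seconds is dropped in this sketch.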
- Segment: 00:50–01:00 → Predicted: 1
- Segment: 02:15–02:25 → Predicted: 0
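Predictions like these can be compared against manual labels using the Scikit-learn metrics listed earlier. The sketch below assumes the model outputs a probability per segment and that a threshold of 0.5 maps it to a label. The helper name `evaluate`, the threshold, and the toy numbers are illustrative only and are not results from this project.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score


def evaluate(probs, labels, threshold=0.5):
    """Threshold probabilities, then compute a confusion matrix and F1."""
    preds = (np.asarray(probs) >= threshold).astype(int)
    # Rows are true labels (0, 1); columns are predicted labels (0, 1).
    cm = confusion_matrix(labels, preds, labels=[0, 1])
    return preds, cm, f1_score(labels, preds)


# Toy values for illustration only, not real model output.
probs = [0.91, 0.12, 0.65, 0.40]
labels = [1, 0, 0, 1]
preds, cm, f1 = evaluate(probs, labels)
print(preds)
print(cm)
print(f1)
```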
audio-classifier/
│
├── model_trainer/
│   ├── build_dataset.py
│   ├── model.py
│   ├── test_adaptive_segments.py
│   ├── test_model.py
│   ├── train.py
│   └── utils.py
│
├── songs/                  # Optional: raw input songs (if used)
├── test_songs/             # Sample songs for testing predictions
│   ├── abhi_na_jao_chod_kar.mp3
│   └── ghar_more_pardesiya.mp3
MIT License
Built by Om Nathwani
Email: ornathwa@uwaterloo.ca
GitHub: omn25