This project involves the classification of musical instruments in audio files using a Convolutional Neural Network (CNN) trained on mel spectrograms. The project is organized into several components:
- **Data Loader:** `datav2.py` contains the data loader class responsible for loading audio files.
- **CNN Architecture:** `modelv2.py` defines the CNN architecture used for instrument classification.
- **Training:** `trainv2.py` trains the model with mel spectrograms as inputs and the actual labels as targets. The trained model is saved as `feedforwardnet.pth`.
- **Inference:** `inference.py` evaluates the trained model and generates predictions. The expected and predicted values are written to `inference_results.csv`. An updated annotations file (`updated_metadata.csv`) containing pseudo labels is then created from these predictions.
- **Training with Pseudo Labels:** `train_pseudo.py` trains the model again, this time using the pseudo labels from the previous step as targets. The trained model is saved as `pseudo_model.pth`.
- **Inference with Pseudo Labels:** `inference_pseudo.py` evaluates the model trained with pseudo labels. Its predictions are compared against both the actual labels and the pseudo labels, and the accuracy is recorded for both comparisons.
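
The pseudo-labeling and two-way accuracy comparison described above can be sketched with the standard library alone. This is only an illustrative sketch, not the code in `inference.py` or `inference_pseudo.py`: the column names (`file`, `expected`, `predicted`, `pseudo_label`, `second_pred`), the helper functions, and the toy values are all assumptions made for the example, and the real CSV headers may differ.

```python
import csv
import io

# Hypothetical column names; the project's actual CSV headers may differ.
ACTUAL_COL = "expected"
PRED_COL = "predicted"
PSEUDO_COL = "pseudo_label"


def add_pseudo_labels(results_csv):
    """Read inference results and store each prediction as the row's pseudo label."""
    rows = []
    for row in csv.DictReader(io.StringIO(results_csv)):
        rows.append({**row, PSEUDO_COL: row[PRED_COL]})
    return rows


def accuracy(rows, pred_col, target_col):
    """Return the fraction of rows where the prediction equals the target."""
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if r[pred_col] == r[target_col])
    return hits / len(rows)


# Toy inference results (made-up values, for illustration only).
results = "\n".join([
    "file,expected,predicted",
    "a.wav,flute,flute",
    "b.wav,cello,flute",
    "c.wav,cello,cello",
    "d.wav,flute,flute",
])
rows = add_pseudo_labels(results)

# Hypothetical predictions from a model retrained on the pseudo labels.
second_pass = ["flute", "flute", "cello", "cello"]
for row, pred in zip(rows, second_pass):
    row["second_pred"] = pred

# Compare the second-pass predictions against both label sets.
acc_vs_actual = accuracy(rows, "second_pred", ACTUAL_COL)
acc_vs_pseudo = accuracy(rows, "second_pred", PSEUDO_COL)
print(f"accuracy vs actual labels: {acc_vs_actual:.2f}")
print(f"accuracy vs pseudo labels: {acc_vs_pseudo:.2f}")
```

Keeping the actual and pseudo labels as separate columns in the same rows makes it easy to see how far accuracy against the pseudo labels drifts from accuracy against the ground truth.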
## Dataset
The dataset is available at https://zenodo.org/records/3685367
The project directory structure is as follows:
- datav2.py
- modelv2.py
- trainv2.py
- inference.py
- updated_metadata.csv
- train_pseudo.py
- pseudo_model.pth
- inference_pseudo.py
- README.md