Cricket song classification using mel spectrograms, fine-tuned on AST (Audio Spectrogram Transformer) models

pvbhanuteja/cricket-classification


cricket songs classification

📍 Cricket song classification, fine-tuned on AST transformers



📍Overview

The Cricket Classification GitHub project is an audio classification system that utilizes deep learning techniques to identify and categorize cricket species based on their sound recordings. The project leverages the PyTorch Lightning framework and the ASTForAudioClassification model from Hugging Face's Transformers library to build and train the classifier. The code includes data preprocessing, model training, and evaluation, providing a complete end-to-end solution for cricket sound classification tasks.

Results

| Experiment | Test Accuracy |
|---|---|
| 5 genus classification | 97.00% |
| 8 genus classification | 94.40% |
| 10 genus classification | 89.51% |

These results are obtained on test data using an 80:20 train:test split. The train and test waveforms are split into 10-second segments with a 5-second overlap.
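
The segmentation described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `segment_bounds`, the 16 kHz sample rate, and the choice to drop a trailing piece shorter than one window are all assumptions. A 10-second window with a 5-second overlap corresponds to a hop of 5 seconds between window starts.

```python
def segment_bounds(num_samples, sample_rate, window_s=10.0, hop_s=5.0):
    """Return (start, end) sample indices for fixed-length overlapping windows.

    Hypothetical helper for illustration; not taken from the repository.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must be positive")

    bounds = []
    start = 0
    # Assumption: a trailing piece shorter than one full window is dropped.
    while start + window <= num_samples:
        bounds.append((start, start + window))
        start += hop
    return bounds


# A 30-second clip at an assumed 16 kHz sample rate.
sample_rate = 16_000
bounds = segment_bounds(30 * sample_rate, sample_rate)

# Convert back to seconds for readability.
print([(s // sample_rate, e // sample_rate) for s, e in bounds])
# → [(0, 10), (5, 15), (10, 20), (15, 25), (20, 30)]
```

Because adjacent windows share 5 seconds of audio, one way to keep overlapping segments from appearing in both the training and test sets is to split at the recording level before segmenting; whether the project does this is not stated here.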


⚙️ Project Structure

.
├── config.json
├── data
│ ├──  final_features
│ ├──  raw_all_data
│ └──  vad_processed
├── dataset.py
├── feature_extractor.py
├── helpers
│ ├──  data.txt
│ ├──  make_data.py
│ └──  make_data_dir.sh
├── main.py
├── preprocess.py
├── readme.md
├── requirements.txt
├── run_pipeline.sh
├── utils.py
└── val.py

💻 Modules

| File | Summary |
|---|---|
| run_pipeline.sh | Runs the complete pipeline: preprocessing, feature extraction, and model training. |
| preprocess.py | Processes a set of audio files for machine learning purposes, using the Silero Voice Activity Detector (VAD) model to extract relevant speech segments. |
| dataset.py | Defines a CustomDataset class that inherits from PyTorch's Dataset class, tailored for processing audio data related to cricket sounds. |
| utils.py | Demonstrates how to remove human voice from an audio file using the Silero Voice Activity Detector (VAD) model. |
| feature_extractor.py | Extracts features from audio samples using a pre-trained feature extractor from the transformers library. The process_samples_in_batches function processes audio samples in batches, applying the feature extractor to each sample and storing the extracted features along with the sample's label. |
| main.py | Trains a cricket audio classifier using a pre-trained ASTForAudioClassification model from the transformers library. |
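
The batch-wise feature extraction described for feature_extractor.py might look roughly like the sketch below. It is an illustration of the general pattern only: `batched_features` and `toy_extract` are hypothetical names, the real process_samples_in_batches signature is not shown in this README, and the stand-in extractor replaces the pre-trained transformers feature extractor so the example runs with the standard library alone.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Waveform = List[float]
Features = List[float]


def batched_features(
    samples: Sequence[Tuple[Waveform, int]],
    extract: Callable[[Waveform], Features],
    batch_size: int = 2,
) -> Iterator[List[Tuple[Features, int]]]:
    """Yield batches of (features, label) pairs.

    Hypothetical helper: applies `extract` to each waveform and keeps the
    sample's label next to the extracted features.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        yield [(extract(waveform), label) for waveform, label in batch]


def toy_extract(waveform: Waveform) -> Features:
    """Stand-in extractor (assumption): mean absolute amplitude as one feature."""
    return [sum(abs(x) for x in waveform) / len(waveform)]


# Three tiny labelled "waveforms", processed two at a time.
samples = [([0.5, -0.5], 0), ([1.0, 1.0], 1), ([0.0, 0.25], 2)]
batches = list(batched_features(samples, toy_extract, batch_size=2))
print(batches)
# → [[([0.5], 0), ([1.0], 1)], [([0.125], 2)]]
```

Processing in batches keeps memory use bounded when the dataset is large, since only one batch of features needs to be held at a time before it is stored.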

🚀 Getting Started

💻 Installation

  1. Clone the cricket-classification repository:
git clone https://github.com/pvbhanuteja/cricket-classification
  2. Change to the project directory:
cd cricket-classification
  3. Install the dependencies:
pip install -r requirements.txt

🤖 Training Model

# Update config.json with correct paths then run shell script
sh run_pipeline.sh

🤝 Contributing

Check out CONTRIBUTING.md for best practices and instructions on how to contribute to this project.


License

This project is licensed under the MIT License.

