This repository contains code and research papers on speech and audio classification. It includes a basic classifier for spoken Gujarati digits.
To get started, run all the cells in the MultilingualAudioClassification
notebook to reproduce the results. The trained model is available for inference at huggingface.co/manthan40/wav2vec2-base-finetuned-manthan_base.
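
As a rough illustration of how the hosted model might be queried outside the notebook, the sketch below uses the `audio-classification` pipeline from the `transformers` library. It assumes that `transformers` and an audio backend such as `ffmpeg` are installed and that the checkpoint loads as a standard audio classification model. The helper names `top_label` and `classify` are hypothetical and do not come from this repository.

```python
# Minimal inference sketch; not the notebook's own code.
# Model ID taken from the Hugging Face URL given in this README.
MODEL_ID = "manthan40/wav2vec2-base-finetuned-manthan_base"


def top_label(predictions):
    """Return the (label, score) pair with the highest score.

    `predictions` is a list of dicts with "label" and "score" keys,
    the format returned by the audio-classification pipeline.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify(audio_path, model_id=MODEL_ID):
    """Classify one audio file (hypothetical helper)."""
    # Imported lazily so top_label works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    return top_label(classifier(audio_path))


if __name__ == "__main__":
    import sys

    # Usage: python classify.py path/to/recording.wav
    if len(sys.argv) > 1:
        label, score = classify(sys.argv[1])
        print(f"{label} ({score:.3f})")
```

The first call to `classify` downloads the model weights, so it needs network access. The label names it returns depend on how the checkpoint was configured during fine-tuning.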
You will need to have the following packages installed in order to run the code in this repository:
The code in this repository uses the following research papers as reference:
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli.
- Development of a Novel Database in Gujarati Language for Spoken Digits Classification by Nikunj Dalsaniya, Sapan H. Mankad, Sanjay Garg, and Dhuri Shrivastava.
If you have any questions or suggestions, please open an issue in this repository or contact the repository owner.