Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset

karthikbhamidipati/multi-task-speech-classification

multi-task-speech-classification

Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset. The paper can be found here.

Run instructions

  1. To preprocess the audio data, run

    python main.py preprocess -r <audio_data_path>

  2. To train the model using the preprocessed audio data, run

    python main.py train -r <audio_data_path> -m <model_name>

    Models implemented: simple_cnn, resnet18, resnet34, resnet50, simple_lstm, bi_lstm, lstm_attention, bi_lstm_attention

  3. To test the model on the test data, run

    python main.py test -r <audio_data_path> -m <model_name> -c <saved_model_path>

  4. To perform inference on the audio files directly, run

    python main.py inference -r <audio_files_path> -m <saved_model_path>
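
The four steps above can also be assembled from a small script, which is convenient when trying several models in a row. The following is only a minimal sketch, not part of the repository: the helper name `build_commands` and all example paths are hypothetical, and it assumes that `main.py` is in the current directory and that the saved model path is known to you (this README does not say where training writes it). The subcommands, flags, and model names are copied from the instructions above.

```python
import shlex

# Model names copied from the "Models implemented" list above.
MODELS = (
    "simple_cnn", "resnet18", "resnet34", "resnet50",
    "simple_lstm", "bi_lstm", "lstm_attention", "bi_lstm_attention",
)


def build_commands(audio_data_path, model_name, saved_model_path, audio_files_path):
    """Return the four README commands as argument lists, in run order.

    Hypothetical helper: it only builds the argument lists and runs nothing.
    """
    if model_name not in MODELS:
        raise ValueError(f"unknown model {model_name!r}; expected one of {MODELS}")
    main = ["python", "main.py"]
    return [
        main + ["preprocess", "-r", audio_data_path],
        main + ["train", "-r", audio_data_path, "-m", model_name],
        main + ["test", "-r", audio_data_path, "-m", model_name, "-c", saved_model_path],
        main + ["inference", "-r", audio_files_path, "-m", saved_model_path],
    ]


if __name__ == "__main__":
    # Placeholder paths (assumptions): replace them with real locations.
    commands = build_commands(
        audio_data_path="data/common_voice",
        model_name="resnet18",
        saved_model_path="checkpoints/resnet18.pt",
        audio_files_path="samples",
    )
    for cmd in commands:
        # Dry run: print each command instead of executing it.
        print(shlex.join(cmd))
```

To actually execute a step, each argument list could be passed to `subprocess.run(cmd, check=True)`, which stops the sequence as soon as one step fails.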
