wav2vec2_hugging_face

The objective of this task is to take an audio file as user input and generate text from it. The generated text can then be used for different downstream tasks, depending on the situation.

My repo contains 2 notebooks and 3 sets of audio files. To run them, you’ll need Transformers ≥ 4.3 and Librosa (to manage the audio files).

I’m sticking with the wav2vec2-base-960h model. A larger model can be swapped in for better transcription accuracy, at the cost of more memory and slower inference.
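For orientation, here is a minimal sketch of what the audio-to-text step could look like outside the notebooks. It is not the notebook code itself. It assumes a Transformers release that provides `Wav2Vec2Processor`, that PyTorch is installed, and that the checkpoints are pulled from the Hugging Face Hub under the `facebook/` namespace. The helper names (`model_id`, `transcribe`) and the choice of `facebook/wav2vec2-large-960h` as the larger model are illustrative assumptions.

```python
# Hub IDs for the two model sizes. The "large" entry is an assumption about
# which larger checkpoint to use; only the base model is named in this README.
MODEL_IDS = {
    "base": "facebook/wav2vec2-base-960h",
    "large": "facebook/wav2vec2-large-960h",
}

SAMPLE_RATE = 16_000  # these checkpoints expect 16 kHz mono audio


def model_id(size="base"):
    """Return the Hub ID for a model size (hypothetical helper)."""
    if size not in MODEL_IDS:
        raise ValueError(f"unknown model size: {size!r}")
    return MODEL_IDS[size]


def transcribe(audio_path, size="base"):
    """Transcribe one audio file to text with greedy CTC decoding."""
    # Heavy dependencies are imported here so the module itself stays
    # importable even when they are not installed.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    name = model_id(size)
    processor = Wav2Vec2Processor.from_pretrained(name)
    model = Wav2Vec2ForCTC.from_pretrained(name)
    model.eval()

    # librosa resamples to the requested rate and returns a float32 array.
    speech, _ = librosa.load(audio_path, sr=SAMPLE_RATE)

    inputs = processor(speech, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Pick the most likely token per frame; the processor collapses repeats
    # and blank tokens when decoding.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` (a placeholder file name) would download the base checkpoint on first use and return the transcript as a string. Passing `size="large"` switches to the larger checkpoint without changing anything else.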