The objective of this task is to take an audio file as user input and generate text from it. We can then use the generated text for different tasks, depending on the situation.
My repo contains 2 notebooks and 3 sets of audio files. To run the notebooks, you’ll need Transformers ≥ 4.3 and Librosa (to manage the audio files).
I’m sticking with the wav2vec2-base-960h base model. A larger Wav2Vec2 checkpoint can be used for better transcription accuracy, at the cost of more memory and compute.
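As a rough illustration of how this model can turn an audio file into text, here is a minimal sketch. It is not the code from the notebooks. The function names `chunk_bounds` and `transcribe`, the 30-second chunk length, and the 0.1-second minimum fragment length are my own assumptions, and the Hub id `facebook/wav2vec2-base-960h` is assumed to be the checkpoint meant above. It also uses `Wav2Vec2Processor`, which I believe needs a somewhat newer Transformers release than 4.3 (older releases used `Wav2Vec2Tokenizer` instead).

```python
def chunk_bounds(num_samples, chunk_samples):
    """Return (start, end) sample index pairs that cover the audio in order."""
    if chunk_samples <= 0:
        raise ValueError("chunk_samples must be positive")
    return [
        (start, min(start + chunk_samples, num_samples))
        for start in range(0, num_samples, chunk_samples)
    ]


def transcribe(path, model_name="facebook/wav2vec2-base-960h", chunk_seconds=30):
    """Transcribe an audio file chunk by chunk (hypothetical helper)."""
    # Heavy dependencies are imported here so chunk_bounds works without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)
    model.eval()

    # The 960h checkpoints were trained on 16 kHz audio, so resample on load.
    speech, sr = librosa.load(path, sr=16000)

    # Assumption: skip trailing fragments under 0.1 s, which are too short
    # for the model's convolutional front end to process reliably.
    min_samples = sr // 10

    pieces = []
    for start, end in chunk_bounds(len(speech), int(chunk_seconds * sr)):
        if end - start < min_samples:
            continue
        inputs = processor(speech[start:end], sampling_rate=sr, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        # Greedy CTC decoding: take the most likely token at each frame.
        predicted_ids = torch.argmax(logits, dim=-1)
        pieces.append(processor.batch_decode(predicted_ids)[0])
    return " ".join(pieces)
```

Splitting into fixed-length chunks keeps memory use bounded for long recordings, but a word that straddles a chunk boundary may be transcribed poorly. Splitting on silence would be a reasonable refinement.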
Drive link to a recorded audio file of more than 15 minutes: https://drive.google.com/file/d/1BqdcrslUPP8JC5ym7qy4_6urYjtsYnEG/view?usp=sharing