A real-time Retrieval-Augmented Generation (RAG) based model for question answering on Hindi audio data. Fine-tuning OpenAI's whisper-tiny model reduced the word error rate (WER) to 74.24 on the Hindi dataset.
Follow these steps to run the prototype on your system:
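
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model's transcript, divided by the number of reference words. The sketch below is a minimal pure-Python illustration and is not taken from this repository; the function name `word_error_rate` and the whitespace tokenization are assumptions, and the notebook may compute WER differently (for example with a library and extra text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है"))  # 0.25
```

The function returns a fraction; multiply by 100 to express it as a percentage.
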
- git clone https://github.com/system-reboot/Multilingual-whisper-based-RAG.git
- cd Multilingual-whisper-based-RAG
- jupyter execute fine-tuning-whisper.ipynb
- python3 inference.py

Files in this repository:

- fine-tuning-whisper.ipynb - Fine-tunes the Whisper-tiny model on the Hindi dataset.
- rag.py - QA-BERT model that answers questions about the supplied audio.
- inference.py - Launches the Gradio-based interface and displays inference results.

For best results, use short audio clips.
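
If a recording is long, one option is to cut it into shorter pieces before passing each piece to the interface. The sketch below is an illustration only and is not part of this repository: `split_into_chunks` is a hypothetical helper, it assumes the audio has already been loaded as a flat sequence of samples, and the 30-second default is an assumption based on the window length Whisper processes at a time.

```python
def split_into_chunks(samples, sample_rate, max_seconds=30.0):
    """Split a sequence of audio samples into chunks of at most max_seconds.

    Hypothetical helper for illustration; cuts at fixed positions and does
    not try to avoid splitting in the middle of a word.
    """
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")

    # Number of samples per chunk; at least one so the step is never zero.
    chunk_len = max(1, int(sample_rate * max_seconds))
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# Toy example: 10 samples at 2 samples per second, 2-second chunks.
chunks = split_into_chunks(list(range(10)), sample_rate=2, max_seconds=2)
print([len(c) for c in chunks])  # [4, 4, 2]
```

Fixed-position cuts are the simplest choice; splitting on silence would avoid cutting words in half but needs more logic.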