 # Podcast Q&A using RAG

 An AI-powered podcast Q&A system that lets users ask questions about a podcast episode. It uses a Conformer ASR model for transcription, retrieval-augmented generation (RAG) to find the relevant transcript passages, and an LLM to generate the answer.

🚀 System Pipeline

1.  Audio Input (Podcast Episode 🎙️)
2.  Speech-to-Text (ASR with Conformer) 📜
3.  Chunking & Embedding Storage (Vector DB) 🔎
4.  Question Input (User Query) 🤔
5.  RAG-based Retrieval (Embedding Model + Vector DB) 🔍
6.  LLM Response Generation (Answer) 💡

### 1. Set up the Google Colab environment

In [None]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [None]:
!ls /content/gdrive/MyDrive/LLMs/Podcast-QA/


audio_files	      faiss_index.faiss  __pycache__  transcriptions
audio_transcriber.py  Podcast-QA	 rag_qa.py


In [None]:
!pip install -r /content/gdrive/MyDrive/LLMs/Podcast-QA/requirements.txt

Collecting nemo_toolkit==2.1.0 (from nemo_toolkit[asr]==2.1.0)
  Downloading nemo_toolkit-2.1.0-py3-none-any.whl.metadata (70 kB)
Collecting datasets
  Downloading datasets-3.4.1-py3-none-any.whl.metadata (19 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting onnx>=1.7.0 (from nemo_toolkit==2.1.0->nemo_toolkit[asr]==2.1.0)
  Downloading onnx-1.17.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting ruamel.yaml (from nemo_toolkit==2.1.0->nemo_toolkit[asr]==2.1.0)
  Downloading ruamel.yaml-0.18.10-py3-none-any.whl.metadata (23 kB)
Collecting wget (from nemo_toolkit==2.1.

Found existing installation: numpy 2.0.2
Uninstalling numpy-2.0.2:
  Successfully uninstalled numpy-2.0.2
Collecting numpy==1.26.4
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
Installing collected packages: numpy
ERROR: Operation cancelled by user
^C


In [None]:
import sys
sys.path.append('/content/gdrive/MyDrive/LLMs/Podcast-QA')

### 2. Transcribe the podcast's audio
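Before transcribing, it can help to confirm that each clip matches the audio format the ASR model was trained on. The sketch below is not part of `audio_transcriber.py`. It assumes WAV input and a 16 kHz mono target (the 16 kHz figure matches the `sample_rate: 16000` value NeMo prints in the model's train config; mono is an assumption), and `check_wav_format` is a hypothetical helper name.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return (ok, sample_rate, channels) for a PCM WAV file.

    expected_rate and expected_channels are assumptions about what the
    ASR model expects; adjust them for a different model.
    """
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    ok = (rate == expected_rate) and (channels == expected_channels)
    return ok, rate, channels
```

Clips that fail the check could be converted first, for example with pydub's `set_frame_rate` and `set_channels`, since pydub is already installed by the requirements file.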

In [None]:
from audio_transcriber import load_asr_model, transcribe_audio_folder

Loading ASR model...
[NeMo I 2025-03-20 22:49:36 nemo_logging:393] Tokenizer SentencePieceTokenizer initialized with 1024 tokens


[NeMo W 2025-03-20 22:49:37 nemo_logging:405] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    manifest_filepath: /data/NeMo_ASR_SET/English/v2.0/train/tarred_audio_manifest.json
    sample_rate: 16000
    batch_size: 64
    shuffle: true
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 20.0
    min_duration: 0.1
    shuffle_n: 2048
    is_tarred: true
    tarred_audio_filepaths: /data/NeMo_ASR_SET/English/v2.0/train/audio__OP_0..4095_CL_.tar
    
[NeMo W 2025-03-20 22:49:37 nemo_logging:405] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    manifest_filepath:
    - /data/ASR/LibriSpeech/librisp

[NeMo I 2025-03-20 22:49:37 nemo_logging:393] PADDING: 0
[NeMo I 2025-03-20 22:49:37 nemo_logging:393] Model EncDecCTCModelBPE was successfully restored from /root/.cache/huggingface/hub/models--nvidia--stt_en_conformer_ctc_small/snapshots/e5b9941cc1b0b8a08c29b31a111c674f3040a80f/stt_en_conformer_ctc_small.nemo.
ASR model loaded
Output directory: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions
Transcribing: /content/gdrive/MyDrive/LLMs/Podcast-QA/audio_files/clip2.wav


Transcribing:   0%|          | 0/1 [00:00<?, ?it/s][NeMo W 2025-03-20 22:49:38 nemo_logging:405] Function ``_transcribe_output_processing`` is deprecated. The return type of args will be updated in the upcoming release to ensure a consistent output             format across all decoder types, such that a Hypothesis object is always returned.
Transcribing: 100%|██████████| 1/1 [00:00<00:00,  1.60it/s]


Transcription saved: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions/clip2.txt
Transcribing: /content/gdrive/MyDrive/LLMs/Podcast-QA/audio_files/clip1.wav


Transcribing: 100%|██████████| 1/1 [00:00<00:00,  8.56it/s]

Transcription saved: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions/clip1.txt
🎯 All transcriptions saved in: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions





In [None]:
# Configuration
asr_model_id = "nvidia/stt_en_conformer_ctc_small"
asr_model = load_asr_model(asr_model_id)
audio_files = "/content/gdrive/MyDrive/LLMs/Podcast-QA/audio_files"
asr_output = "/content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions"

In [None]:
# Run each audio file through the ASR model and save the transcripts
transcribe_audio_folder(asr_model, audio_files, asr_output)

Output directory: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions
Transcribing: /content/gdrive/MyDrive/LLMs/Podcast-QA/audio_files/clip2.wav


Transcribing: 100%|██████████| 1/1 [00:00<00:00,  9.35it/s]


Transcription saved: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions/clip2.txt
Transcribing: /content/gdrive/MyDrive/LLMs/Podcast-QA/audio_files/clip1.wav


Transcribing: 100%|██████████| 1/1 [00:00<00:00,  9.47it/s]

Transcription saved: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions/clip1.txt
🎯 All transcriptions saved in: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions





### 3. RAG pipeline

In [None]:
from rag_qa import chunk_transcripts, store_multiple_transcripts_in_faiss, retrieve_relevant_chunks, load_llm_generation_model, generate_answer

#### 3.1 Chunking and Embedding Storage

Chunk the audio transcripts, convert each chunk into an embedding, and store the embeddings in a FAISS index.
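The chunking logic inside `rag_qa.py` is not shown here, so the following is only a minimal sketch of one common approach: fixed-size word windows with a small overlap, so that a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks. The function name `chunk_words` and the default sizes are illustrative assumptions, not the values used by `chunk_transcripts`.

```python
def chunk_words(text, chunk_size=100, overlap=20):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words. Both defaults are
    illustrative assumptions, not values taken from rag_qa.py.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the transcript.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be passed to the embedding model, and the resulting vectors added to the index in the same order as the chunk list so that an index position maps back to its text.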

In [None]:
chunked_transcripts = chunk_transcripts(asr_output)

Processing: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions/clip2.txt
Number of chunks in clip2.txt: 4
clip2.txt
Processing: /content/gdrive/MyDrive/LLMs/Podcast-QA/transcriptions/clip1.txt
Number of chunks in clip1.txt: 4
clip1.txt


In [None]:
from sentence_transformers import SentenceTransformer

print("Storing podcast in FAISS...")
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
faiss_index_path = "/content/gdrive/MyDrive/LLMs/Podcast-QA/faiss_index.faiss"
faiss_index, stored_chunks = store_multiple_transcripts_in_faiss(chunked_transcripts, embedding_model, faiss_index_path)

Storing podcast in FAISS...
FAISS index saved at: /content/gdrive/MyDrive/LLMs/Podcast-QA/faiss_index.faiss


#### 3.2 Retrieve Relevant Chunks based on the input query
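Retrieval embeds the question with the same model used for the chunks and returns the chunks whose vectors are closest to it. The sketch below illustrates that idea with a brute-force cosine-similarity search in plain Python. It is a stand-in for the FAISS search, not the code in `retrieve_relevant_chunks`; the helper names and the toy 3-dimensional vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunk_vectors, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), index)
        for index, vector in enumerate(chunk_vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunks[index] for _, index in scored[:k]]


# Toy data: real embeddings would come from the embedding model.
toy_chunks = [
    {"filename": "clip1.txt", "text": "chunk about a book"},
    {"filename": "clip2.txt", "text": "chunk about something else"},
    {"filename": "clip1.txt", "text": "another chunk about a book"},
]
toy_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
toy_results = top_k_chunks([1.0, 0.0, 0.0], toy_vectors, toy_chunks, k=2)
```

A vector index such as FAISS serves the same purpose but scales to far more vectors than a Python loop can handle comfortably.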

In [None]:
print("Retrieving relevant information...")
user_question = "What is the latest book from Neil?"  # Other queries tried: "Is earth flat?", "What is the shape of earth shadow"
relevant_context = retrieve_relevant_chunks(embedding_model, user_question, faiss_index, stored_chunks)
print(relevant_context)

Retrieving relevant information...
[{'filename': 'clip1.txt', 'text': "for people in a hurry what we just happen to just happens i swear we didn't plant the book i have an entire chapter titled on being round and it's an exploration of how all the laws of physics and the accounting of energy as processes unfold in the universe how that conspire to make things round so it favors the sphere favors a sphere yet livees and if if something's not a sphere it's a little bit flattened you can ask what flalanded and you find out oh it's rotating real fat right so that it"}, {'filename': 'clip1.txt', 'text': "yes columbus predates the era of experimental checking of any idea you might have not a lot of purebde sts that till si in day or in the de back in the day back in the day back in the day not a dislay to say whatever i felt you know what i believe that elephant poot will cure a canc there you go that's right wrote a paper about it like to reading here it is right so it wasn't till till fran

#### 3.3 Generate the answer from the retrieved context using the LLM
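The generation step typically combines the retrieved chunks and the question into a single prompt for the LLM. How `generate_answer` does this is not shown, so the following is only a sketch under assumptions: `build_prompt`, the instruction wording, and the `max_chars` budget are hypothetical. The only detail taken from the notebook is that each retrieved chunk is a dict with `filename` and `text` keys.

```python
def build_prompt(question, context_chunks, max_chars=2000):
    """Assemble a grounded Q&A prompt from retrieved chunks.

    max_chars is a hypothetical budget that keeps the context short;
    chunks are added in ranked order until the budget would be exceeded.
    """
    context_lines = []
    used = 0
    for chunk in context_chunks:
        line = f"[{chunk['filename']}] {chunk['text']}"
        if used + len(line) > max_chars:
            break
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Telling the model to rely only on the supplied context matters here because the ASR transcript is noisy, and an unconstrained model may otherwise answer from its own knowledge instead of the episode.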

In [None]:
# Loading the LLM
llm_model_id = "teknium/OpenHermes-2.5-Mistral-7B"
llm_model = load_llm_generation_model(llm_model_id)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.


Generating AI response...
What is the latest book from Neil?
[{'filename': 'clip1.txt', 'text': "for people in a hurry what we just happen to just happens i swear we didn't plant the book i have an entire chapter titled on being round and it's an exploration of how all the laws of physics and the accounting of energy as processes unfold in the universe how that conspire to make things round so it favors the sphere favors a sphere yet livees and if if something's not a sphere it's a little bit flattened you can ask what flalanded and you find out oh it's rotating real fat right so that it"}, {'filename': 'clip1.txt', 'text': "yes columbus predates the era of experimental checking of any idea you might have not a lot of purebde sts that till si in day or in the de back in the day back in the day back in the day not a dislay to say whatever i felt you know what i believe that elephant poot will cure a canc there you go that's right wrote a paper about it like to reading here it is right s

In [None]:
print("Generating AI response...")
print(user_question)
print(relevant_context)
response = generate_answer(user_question, relevant_context, llm_model)
print(response)