# YouTube Video Search and Transcript-based QA with LLM
This tutorial walks through searching for a YouTube video, fetching its transcript, processing it into chunks, and using an LLM to extract answers with timestamps.

The overall system architecture is shown in the diagram below:

![system_design](images/yt-search-architecture.png)



**Note:** We are implementing a basic version that includes the following steps:

1. YouTube Video Search: Use a library like `yt-dlp` to search for videos on YouTube.
2. Fetch Transcript: Use `youtube-transcript-api` to fetch the transcript of the selected video.
3. Process Transcript: Break down the transcript into manageable chunks.
4. QA with LLM: Use a language model (LLM) to extract answers from the transcript chunks and map them to timestamps.
   
```
+------------------+       +---------------------+       +---------------------+       +---------------------+
|  YouTube Search  | ----> |  Fetch Transcript   | ----> |  Process Chunks     | ----> |  QuestionAnswering  |
|                  |       |                     |       |                     |       |                     |
+------------------+       +---------------------+       +---------------------+       +---------------------+

```


## Step 1: Install Dependencies

```python
!pip install yt-dlp youtube-transcript-api litellm nest_asyncio
```

## Step 2: Search for YouTube Videos

```python
from yt_dlp import YoutubeDL

def search_youtube(query):
    # "ytsearch5" makes yt-dlp treat a plain query as a YouTube search
    # and return the top 5 results.
    ydl_opts = {"quiet": True, "default_search": "ytsearch5"}
    with YoutubeDL(ydl_opts) as ydl:
        # download=False fetches metadata only; no video files are saved.
        result = ydl.extract_info(query, download=False)
        videos = [
            {"title": entry["title"], "videoId": entry["id"]}
            for entry in result["entries"]
        ]
    return videos

videos = search_youtube("twosetai")
print(videos)
```

[{'title': 'Efficient Document Search with ModernBERT (Step-by-step Tutorial)', 'videoId': 'xzd-RtvJvOs'}, {'title': 'Top RAG Expert Shares 4 Powerful Open-Source Tools', 'videoId': 'tCPuvr-5h5o'}, {'title': "Production RAG Secrets the Pros Don't Want You to Know -- Part 2", 'videoId': 'nwDyXwPt2bI'}, {'title': 'RAG for Beginners! Step-by-Step Tutorial Using Jupyter Notebook', 'videoId': 'FKmjT93D50U'}, {'title': "With today's AI, do you REALLY need to learn the machine learning basics?", 'videoId': 'v8WRzlRdPtw'}]
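
Each `videoId` can be turned back into a watch link, which is handy later when an answer points to a specific moment in the video. The helper below is a minimal sketch: `video_url` is a hypothetical name, not part of `yt-dlp`, and it assumes the standard `watch?v=...&t=...s` URL form.

```python
from urllib.parse import urlencode

def video_url(video_id, start_seconds=None):
    """Build a YouTube watch URL, optionally starting at a given offset."""
    params = {"v": video_id}
    if start_seconds is not None:
        # The `t` parameter is given in whole seconds, so fractions are truncated.
        params["t"] = f"{int(start_seconds)}s"
    return "https://www.youtube.com/watch?" + urlencode(params)

print(video_url("xzd-RtvJvOs"))
print(video_url("xzd-RtvJvOs", 1067.28))
```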


## Step 3: Fetch Video Transcript

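```python
from youtube_transcript_api import YouTubeTranscriptApi

def get_transcript(video_id):
    """Return a list of {'text', 'start', 'duration'} entries, or None on failure."""
    try:
        # get_transcript is the static helper from older youtube-transcript-api
        # releases; newer releases may expose a different interface, so check
        # the installed version if this call fails.
        transcript = YouTubeTranscriptApi.get_transcript(video_id)

        # Print the transcript with timestamps
        # for entry in transcript:
        #     print(f"Timestamp: {entry['start']} - {entry['start'] + entry['duration']} seconds")
        #     print(f"Text: {entry['text']}")
        #     print("-" * 50)
        return transcript
    except Exception as e:
        print("Transcript not available:", str(e))
        return None

# Select a video and fetch its transcript
video_id = videos[0]["videoId"]
transcript = get_transcript(video_id)
if transcript:  # get_transcript returns None when no transcript is available
    print(transcript[:5])  # Print first 5 entries
```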

[{'text': 'all right this is M and Angelina today', 'start': 0.08, 'duration': 5.08}, {'text': "we're going to talk about document", 'start': 3.08, 'duration': 5.32}, {'text': 'search using molden BT if you want to', 'start': 5.16, 'duration': 5.519}, {'text': 'search in a document the most important', 'start': 8.4, 'duration': 3.96}, {'text': 'step is to really understand the', 'start': 10.679, 'duration': 4.08}]


## Step 4: Split Transcript into Chunks

```python
def split_transcript_into_chunks(transcript, chunk_duration=30):
    """Group consecutive transcript entries into chunks of about `chunk_duration` seconds."""
    chunks = []
    current_chunk = []
    current_time = 0
    chunk_start = 0

    for entry in transcript:
        current_time = entry['start']
        current_chunk.append(entry['text'])

        # When the chunk reaches the specified duration, store it and start a new chunk.
        # Note: "end" is the start time of the chunk's last entry, and the next
        # chunk reuses that value as its "start", so consecutive ranges are contiguous.
        if current_time - chunk_start >= chunk_duration:
            chunks.append({"text": " ".join(current_chunk), "start": chunk_start, "end": current_time})
            current_chunk = []
            chunk_start = current_time

    # Add the last, possibly shorter, chunk if any text remains
    if current_chunk:
        chunks.append({"text": " ".join(current_chunk), "start": chunk_start, "end": current_time})

    return chunks

# Example usage
chunks = split_transcript_into_chunks(transcript)
print(f"First Chunk: {chunks[0]}")
```

First Chunk: {'text': "all right this is M and Angelina today we're going to talk about document search using molden BT if you want to search in a document the most important step is to really understand the document right that's where this model comes in and we will show you a prototype Search application today using this model and together with how we build it speaking of Bert it's kind of a dinosaur model right in terms of the AI in terms of the AI ears if you're from with the burd model it's released in", 'start': 0, 'end': 30.24}
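
The chunker above records the start time of a chunk's last entry as the chunk's `end`. If tighter boundaries matter, one possible variant (a sketch only, not the approach used in the rest of this tutorial) starts each chunk at its own first entry and ends it at `start + duration` of its last entry. The name `split_transcript_by_duration` and the sample data are made up for illustration.

```python
def split_transcript_by_duration(transcript, chunk_duration=30):
    """Chunk entries so each range spans first entry start to last entry end."""
    chunks = []
    texts = []
    chunk_start = None
    chunk_end = None

    for entry in transcript:
        if chunk_start is None:
            chunk_start = entry["start"]  # a chunk begins at its own first entry
        texts.append(entry["text"])
        chunk_end = entry["start"] + entry["duration"]

        # Close the chunk once it covers at least chunk_duration seconds
        if chunk_end - chunk_start >= chunk_duration:
            chunks.append({"text": " ".join(texts), "start": chunk_start, "end": chunk_end})
            texts, chunk_start = [], None

    # Keep any trailing entries as a final, shorter chunk
    if texts:
        chunks.append({"text": " ".join(texts), "start": chunk_start, "end": chunk_end})
    return chunks

# Hypothetical sample entries in the same shape as the transcript above
sample = [
    {"text": "a", "start": 0.0, "duration": 10.0},
    {"text": "b", "start": 10.0, "duration": 10.0},
    {"text": "c", "start": 20.0, "duration": 10.0},
    {"text": "d", "start": 30.0, "duration": 5.0},
]
print(split_transcript_by_duration(sample))
```

Because caption entries on YouTube often overlap in time, ranges produced this way can overlap slightly between neighbouring chunks.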


## Step 5: Format Transcript for LLM

```python
def format_transcript_chunks(chunks):
    """
    Formats the transcript chunks into a structured text block for LLM processing.

    Args:
        chunks (list): List of transcript chunks, each with 'text', 'start', and 'end' timestamps.

    Returns:
        str: Formatted transcript text.
    """
    formatted_chunks = []
    for chunk in chunks:
        formatted_chunks.append(f"[{chunk['start']}s - {chunk['end']}s] {chunk['text']}")

    return "\n".join(formatted_chunks)

# Example usage
formatted_transcript = format_transcript_chunks(chunks)
print(formatted_transcript[:500])  # Print first 500 characters for preview
```


[0s - 30.24s] all right this is M and Angelina today we're going to talk about document search using molden BT if you want to search in a document the most important step is to really understand the document right that's where this model comes in and we will show you a prototype Search application today using this model and together with how we build it speaking of Bert it's kind of a dinosaur model right in terms of the AI in terms of the AI ears if you're from with the burd model it's released
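
Raw second offsets such as `1067.28s` are easy for the model to copy but awkward for a person to read. For display purposes only, a small helper along these lines could convert them to `m:ss` or `h:mm:ss`. `format_timestamp` is a hypothetical name, and the transcript sent to the LLM is assumed to keep the seconds format shown above.

```python
def format_timestamp(seconds):
    """Convert a second offset into m:ss, or h:mm:ss for offsets of an hour or more."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"

print(format_timestamp(30.24))    # 0:30
print(format_timestamp(1067.28))  # 17:47
print(format_timestamp(3725))     # 1:02:05
```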


## Step 6: Query LLM for Answer with Timestamps

```python
from litellm import acompletion
from IPython.display import display, Markdown
import nest_asyncio

# Allow `await` inside the notebook's already-running event loop
nest_asyncio.apply()


async def call_llm(user_message: str):
    """Stream a completion from a local Ollama model and return the full text."""
    response = await acompletion(
        model="ollama/deepseek-r1:latest",
        messages=[{"role": "user", "content": user_message}],
        stream=True,
        api_base="http://localhost:11434",
    )
    answer_text = ""

    async for chunk in response:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content  # Extract content

        # Some chunks carry no text (for example the final one);
        # skip them rather than stopping the stream early.
        if content is None:
            continue

        print(content, end="", flush=True)
        answer_text += content

    return answer_text
```
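
Reasoning models such as `deepseek-r1` emit their chain of thought between `<think>` and `</think>` tags before the actual answer. If only the final answer should be displayed, the reasoning block can be stripped from the returned text. The sketch below assumes the tags appear exactly in that form; `strip_reasoning` is a hypothetical helper, not part of `litellm`.

```python
import re

# Non-greedy match across newlines so each <think>...</think> block is removed whole
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_reasoning(answer_text):
    """Remove <think>...</think> blocks and surrounding whitespace from a model answer."""
    return THINK_BLOCK.sub("", answer_text).strip()

raw = "<think>\nLooking through the timeline...\n</think>\n\nThe batch size was **50**."
print(strip_reasoning(raw))
```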

## Step 7: Ask a Question about the Video

```python
question = "What was the batch size for inserting documents into vector db?"
# question = "What are the different variations of ModernBERT?"
user_message = (
    "You are an AI assistant that extracts answers from video transcripts and provides timestamps.\n"
    "Your task is to analyze the given transcript and answer the user's question.\n"
    "Make sure to include the most relevant timestamps in your response. Timestamps must be from the transcript and must follow this format [531.88s - 564.6s].\n\n"
    "Here is the transcript of a YouTube video:\n\n"
    f"{formatted_transcript}\n\n"
    f"Question: {question}\n"
    "Answer with timestamps:"
)

async def main():
    # The answer is streamed to stdout first, then rendered once more as Markdown
    answer = await call_llm(user_message)
    if answer:
        print("\n\n\n=====================================")
        display(Markdown(answer))
    else:
        print("No relevant answer found.")

await main()
```

<think>
Alright, so looking at the user's question, they're asking about the batch size used when inserting documents into the vector database. From what I remember in the provided tutorial, there was a mention of using batches to insert data efficiently.

I'll need to find where that information is located. The user included timestamps with each section, so maybe those can help me pinpoint exactly where the batch size was discussed.

Looking through the timeline:

- At 1067.28s to 1099.52s, it says they created multiple batches of size 50 for inserting data into the vector database.
  
So that seems like the relevant section. The user wants the specific batch size, which is 50 in this case. I should make sure my answer directly references that timestamp to show where the information came from.

I think it's important to state clearly that each document was inserted in batches of 50 and mention how many batches there were for 1,000 documents (which would be 20 batches). That way, anyone reading the answer understands both the batch size and the total number of batches used.
</think>

The batch size for inserting documents into the vector database was **50**. This means that each document was inserted in batches of 50, with a total of 20 batches to process all 1,000 documents.

Answer:  
Batch size = 50 (each batch contains 50 documents) and there were 20 batches for inserting 1,000 documents.
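
The prompt asks for timestamps in the form `[531.88s - 564.6s]`, so any ranges in an answer can be extracted and checked against the chunk boundaries to catch invented timestamps. The following is a minimal sketch under that assumption; `extract_timestamp_ranges` and `ranges_in_transcript` are hypothetical names, and the sample answer string is made up. An empty result suggests the model did not follow the requested format.

```python
import re

# Matches ranges such as [531.88s - 564.6s] or [0s - 30.24s]
TIMESTAMP_RANGE = re.compile(r"\[(\d+(?:\.\d+)?)s\s*-\s*(\d+(?:\.\d+)?)s\]")

def extract_timestamp_ranges(answer_text):
    """Return a (start, end) pair for every [Xs - Ys] range found in the answer."""
    return [(float(a), float(b)) for a, b in TIMESTAMP_RANGE.findall(answer_text)]

def ranges_in_transcript(ranges, chunks):
    """Keep only the ranges that exactly match a chunk's start and end."""
    known = {(float(c["start"]), float(c["end"])) for c in chunks}
    return [r for r in ranges if r in known]

# Hypothetical answer and chunk list for illustration
sample_answer = "The batch size was 50 [1067.28s - 1099.52s]. See also [5s - 6s]."
sample_chunks = [{"text": "...", "start": 1067.28, "end": 1099.52}]

found = extract_timestamp_ranges(sample_answer)
print(found)
print(ranges_in_transcript(found, sample_chunks))
```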

```python
formatted_transcript
```

"[0s - 30.24s] all right this is M and Angelina today we're going to talk about document search using molden BT if you want to search in a document the most important step is to really understand the document right that's where this model comes in and we will show you a prototype Search application today using this model and together with how we build it speaking of Bert it's kind of a dinosaur model right in terms of the AI in terms of the AI ears if you're from with the burd model it's released in\n[30.24s - 62.039s] 2018 um but it did Mark the beginning of the AI ERA with the most popular model architecture which is Transformer uh but we're now going to talk about history of AI language models today how about let's that [Music] in awesome as you said today we're going to show how to create a Search application so we are essentially going back to the fundamental and Basics the race and the progress in AI is crazy\n[62.039s - 93.159s] nowadays and every day you would see a lot of diff

## Components to add

1. Transcription service to generate a transcript from audio when none is available
2. RAG (vector database and search engine) to store and retrieve large volumes of transcripts and video metadata
3. Support for multiple languages in transcripts
4. UI for better user interaction
5. Deployment pipeline for scaling the application
6. Monitoring and logging for production use
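
To give a feel for the retrieval component, the sketch below selects the chunks most relevant to a question before they are sent to the LLM, instead of sending the whole transcript. It ranks by plain word overlap as a stand-in for a real vector database and embedding search, which it does not implement. `tokenize`, `top_chunks`, and the sample chunks are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_chunks(question, chunks, k=3):
    """Return up to k chunks sharing the most words with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(c["text"])), c) for c in chunks]
    scored = [pair for pair in scored if pair[0] > 0]  # drop chunks with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps transcript order on ties
    return [chunk for _, chunk in scored[:k]]

# Hypothetical chunks in the same shape as the output of Step 4
sample_chunks = [
    {"text": "we insert documents in batches of size 50", "start": 0, "end": 30},
    {"text": "welcome to the channel", "start": 30, "end": 60},
]
best = top_chunks("What was the batch size for inserting documents?", sample_chunks, k=1)
print(best)
```

The selected chunks could then be passed to `format_transcript_chunks` in place of the full chunk list, which keeps the prompt short for long videos.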