### 🔹 Cell 1: Install Dependencies

- Installs **pocketflow** → lightweight workflow engine (Node + Flow concept); the cells below re-implement its Node and Flow classes by hand rather than importing them
- Installs **yt-dlp** → YouTube subtitle extractor (works in Colab)
- Installs **google-generativeai** → Gemini API client used for summarization
- `!pip install` makes all three packages available in the Colab runtime


In [53]:
!pip install pocketflow yt-dlp google-generativeai





### 🔹 Cell 2: Imports and API Configuration

- Imports core Python utilities:
  - `os` → environment variables & file handling
  - `re` → regex for extracting video ID
  - `time` → retry wait logic
  - `copy` → safe node cloning in PocketFlow
  - `subprocess` → execute yt-dlp shell command
- Imports `google.generativeai` to access Gemini models
- Reads the Gemini API key from **Colab userdata** (secret `RAGAGENTKEY`) and exports it as `GEMINI_API_KEY`, so the key never appears in the notebook
- Stores the model name in the `MODEL` constant for reuse


In [54]:
import os, re, time, copy, subprocess
import google.generativeai as genai
from google.colab import userdata

os.environ["GEMINI_API_KEY"] = userdata.get("RAGAGENTKEY")
genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # hand the key to the SDK explicitly
MODEL = "gemini-2.5-flash"
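
`google.colab` only exists inside Colab, so the import above fails elsewhere. A minimal sketch of a fallback, assuming you want to read the same key from an environment variable when running locally. The helper name `load_api_key` is hypothetical and not part of the notebook:

```python
import os


def load_api_key(secret_name="RAGAGENTKEY", env_var="GEMINI_API_KEY"):
    """Return the Gemini key from Colab userdata, or from env_var outside Colab.

    Hypothetical helper for illustration; the notebook reads userdata directly.
    """
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        key = os.environ.get(env_var)
    else:
        key = userdata.get(secret_name)
    if not key:
        raise RuntimeError(f"No API key found (checked {secret_name} / {env_var})")
    return key
```

Usage would then be `genai.configure(api_key=load_api_key())`.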


### 🔹 Cell 3: BaseNode Class (PocketFlow Core)

- Represents a **single step** in a workflow
- `successors` maps each action label → next node
- `next()` links nodes together using action labels
- `prep()` → prepares input from shared state
- `exec()` → main logic of the node
- `post()` → writes output back to shared state and returns the action label
- `_run()` enforces execution order: prep → exec → post
- `_exec()` is the hook subclasses override (used for retries in the child class)


In [55]:
class BaseNode:
    def __init__(self):
        self.successors = {}

    def next(self, action, node):
        self.successors[action] = node
        return node

    def prep(self, shared): pass
    def exec(self, data): pass
    def post(self, shared, data, out): return out

    def _run(self, shared):
        data = self.prep(shared)
        out = self._exec(data)
        return self.post(shared, data, out)

    def _exec(self, data):
        return self.exec(data)
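
To see the prep → exec → post order in action, here is a small sketch that runs one node by hand. `Uppercase` is a hypothetical node invented for this example, and `BaseNode` is repeated so the snippet runs on its own:

```python
class BaseNode:
    def __init__(self):
        self.successors = {}

    def next(self, action, node):
        self.successors[action] = node
        return node

    def prep(self, shared): pass
    def exec(self, data): pass
    def post(self, shared, data, out): return out

    def _run(self, shared):
        data = self.prep(shared)          # 1. read input from shared state
        out = self._exec(data)            # 2. do the work
        return self.post(shared, data, out)  # 3. store result, return action

    def _exec(self, data):
        return self.exec(data)


class Uppercase(BaseNode):  # hypothetical node, for illustration only
    def prep(self, shared):
        return shared["text"]

    def exec(self, text):
        return text.upper()

    def post(self, shared, _, out):
        shared["result"] = out
        return "ok"


shared = {"text": "hello"}
action = Uppercase()._run(shared)
print(action, shared["result"])
```

The returned action label (`"ok"` here) is what a flow would use to look up the next node in `successors`.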


### 🔹 Cell 4: Node Class with Retry Support

- Extends `BaseNode`
- Adds:
  - `retries` → total number of attempts (the default `1` means no retry)
  - `wait` → delay in seconds between attempts
- `_exec()` wraps `exec()` inside a retry loop
- Retries automatically on failure and re-raises the last exception once all attempts are used
- Useful for unstable operations like:
  - network calls
  - YouTube caption extraction


In [56]:
class Node(BaseNode):
    def __init__(self, retries=1, wait=0):
        super().__init__()
        self.retries, self.wait = retries, wait

    def _exec(self, data):
        for i in range(self.retries):
            try:
                return self.exec(data)
            except Exception:  # a bare except would also swallow KeyboardInterrupt
                if i == self.retries - 1:
                    raise
                time.sleep(self.wait)
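
A quick way to check the retry loop is a node that fails a fixed number of times before succeeding. This is a sketch only: `Flaky` is a hypothetical class, and `RetryingNode` is a stand-alone copy of the loop above so the snippet runs by itself:

```python
import time


class RetryingNode:
    """Stand-alone copy of the retry loop from Node, for this sketch."""
    def __init__(self, retries=1, wait=0):
        self.retries, self.wait = retries, wait

    def exec(self, data):
        raise NotImplementedError

    def _exec(self, data):
        for i in range(self.retries):
            try:
                return self.exec(data)
            except Exception:
                if i == self.retries - 1:
                    raise  # out of attempts: surface the last error
                time.sleep(self.wait)


class Flaky(RetryingNode):  # hypothetical: fails twice, then succeeds
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.calls = 0

    def exec(self, data):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("temporary failure")
        return f"{data} fetched on attempt {self.calls}"


node = Flaky(retries=3, wait=0)
result = node._exec("captions")
print(result)
```

With `retries=2` the same node would raise `RuntimeError`, because the third attempt never happens.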


### 🔹 Cell 5: Flow Class

- Controls execution of connected nodes
- Takes a **starting node**
- Executes nodes sequentially based on returned actions
- Uses `copy.copy()` (a shallow copy) so each run works on a copy of the node instead of the original instance
- Continues until no next node is found
- Shared dictionary flows through all nodes


In [57]:
class Flow:
    def __init__(self, start):
        self.start = start

    def run(self, shared):
        node = copy.copy(self.start)
        while node:
            action = node._run(shared)
            node = copy.copy(node.successors.get(action))
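
Because the next node is looked up by action label, a flow can branch. The sketch below shows this with a stripped-down stand-in for a node. `Step` and the three functions are hypothetical and exist only for this example:

```python
import copy


class Step:
    """Minimal node stand-in: runs a function and returns its action label."""
    def __init__(self, name, fn):
        self.name, self.fn, self.successors = name, fn, {}

    def next(self, action, node):
        self.successors[action] = node
        return node

    def _run(self, shared):
        shared["trace"].append(self.name)
        return self.fn(shared)


class Flow:
    def __init__(self, start):
        self.start = start

    def run(self, shared):
        node = copy.copy(self.start)
        while node:
            action = node._run(shared)
            # No successor for this action -> get() returns None -> loop ends
            node = copy.copy(node.successors.get(action))


def check(shared):
    return "long" if len(shared["text"]) > 10 else "short"

def trim(shared):
    shared["text"] = shared["text"][:10]
    return "ok"

def done(shared):
    return "finished"  # nothing is registered for "finished", so the flow stops


check_step, trim_step, done_step = Step("check", check), Step("trim", trim), Step("done", done)
check_step.next("long", trim_step)
check_step.next("short", done_step)
trim_step.next("ok", done_step)

long_run = {"text": "a fairly long sentence", "trace": []}
short_run = {"text": "hi", "trace": []}
Flow(check_step).run(long_run)
Flow(check_step).run(short_run)
print(long_run["trace"])
print(short_run["trace"])
```

The long input passes through `trim` before finishing, while the short input skips it.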


### 🔹 Cell 6: GetVideoID Node

- Extracts YouTube video ID from URL
- `prep()` reads `video_url` from shared state
- `exec()` uses regex to extract the `v=` parameter
- `post()` stores `video_id` back into shared state
- Returns `"ok"` to trigger next node


In [58]:
class GetVideoID(Node):
    def prep(self, shared):
        return shared["video_url"]

    def exec(self, url):
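        match = re.search(r"[?&]v=([^&#]+)", url)
        if match is None:
            # Fail with a clear message instead of AttributeError on None
            raise ValueError(f"No 'v=' parameter found in URL: {url}")
        return match.group(1)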

    def post(self, shared, _, vid):
        shared["video_id"] = vid
        return "ok"
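
The regex only handles URLs that carry a `v=` parameter. As an alternative sketch, assuming you also want to accept `youtu.be` short links, the standard library's URL parser can do the extraction. `extract_video_id` is a hypothetical helper, not part of the notebook:

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a watch URL or a youtu.be short link."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links keep the ID in the path: https://youtu.be/<id>
        vid = parsed.path.lstrip("/").split("/")[0]
    else:
        # Watch URLs keep it in the query string: ...watch?v=<id>
        vid = parse_qs(parsed.query).get("v", [""])[0]
    if not vid:
        raise ValueError(f"No video ID found in URL: {url!r}")
    return vid


print(extract_video_id("https://www.youtube.com/watch?v=dyUojOVBEcE"))
print(extract_video_id("https://youtu.be/dyUojOVBEcE?t=10"))
```

Other URL shapes (for example embed or shorts links) are not covered by this sketch.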


### 🔹 Cell 7: GetTranscript Node (Using yt-dlp)

- Fetches **auto-generated subtitles** from YouTube
- Uses yt-dlp via `subprocess`
- Flags used:
  - `--skip-download` → no video download
  - `--write-auto-sub` → auto captions
  - `--sub-lang en` → English subtitles
  - `--sub-format vtt` → readable subtitle format
- Reads `.vtt` file and:
  - removes timestamps
  - removes metadata
  - merges subtitle text
- Deletes subtitle file after processing
- Saves transcript into shared state


In [59]:
class GetTranscript(Node):
    def prep(self, shared):
        return shared["video_id"]

    def exec(self, vid):
        cmd = [
            "yt-dlp",
            "--skip-download",
            "--write-auto-sub",
            "--sub-lang", "en",
            "--sub-format", "vtt",
            f"https://www.youtube.com/watch?v={vid}"
        ]
        subprocess.run(cmd, check=True)

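        # Pick this video's subtitle file (yt-dlp's default filename contains the ID)
        vtt = next(f for f in os.listdir() if f.endswith(".vtt") and vid in f)
        with open(vtt, "r", encoding="utf-8") as f:
            text = " ".join(
                line.strip() for line in f
                # Test line.strip(): a raw line always ends in "\n", so it is never falsy
                if line.strip() and not line.startswith(("WEBVTT", "Kind:", "Language:")) and "-->" not in line
            )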
        os.remove(vtt)
        return text

    def post(self, shared, _, text):
        shared["transcript"] = text
        return "ok"
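
Auto-generated YouTube captions usually carry inline timing tags (such as `<00:00:00.500><c> word</c>`) and repeat the same line across consecutive cues. The filter above leaves both in the transcript. A possible cleanup pass is sketched below; `clean_vtt` is a hypothetical helper and the sample text is made up for the example:

```python
import re

TAG = re.compile(r"<[^>]+>")  # inline tags like <c> or <00:00:00.500>


def clean_vtt(raw):
    """Strip headers, cue timings, inline tags and consecutive duplicate lines."""
    lines = []
    for line in raw.splitlines():
        line = TAG.sub("", line).strip()
        if not line or "-->" in line:
            continue
        if line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue
        if lines and lines[-1] == line:
            continue  # same caption repeated in the next cue
        lines.append(line)
    return " ".join(lines)


sample = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000 align:start position:0%
hello<00:00:00.500><c> world</c>

00:00:02.000 --> 00:00:02.010 align:start position:0%
hello world

00:00:02.010 --> 00:00:04.000 align:start position:0%
hello world
this<00:00:02.500><c> is</c><00:00:03.000><c> rag</c>
"""

cleaned = clean_vtt(sample)
print(cleaned)
```

Dropping only consecutive duplicates keeps phrases that are legitimately repeated later in the video.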


### 🔹 Cell 8: Summarize Node (Gemini API)

- Reads full transcript from shared state
- Loads Gemini model (`gemini-2.5-flash`)
- Sends transcript with a clear summarization prompt
- Receives AI-generated summary text
- Stores summary back into shared state
- Returns `"done"` to end the flow


In [60]:
class Summarize(Node):
    def prep(self, shared):
        return shared["transcript"]

    def exec(self, text):
        model = genai.GenerativeModel(MODEL)
        return model.generate_content(
            f"Summarize clearly:\n{text}"
        ).text

    def post(self, shared, _, summary):
        shared["summary"] = summary
        return "done"


### 🧠 Inbuilt Chatbot Assistant (Q&A on Video)

At this stage, the YouTube video has already been:
- converted into a transcript
- summarized once and stored in shared memory

This chatbot does **not regenerate the summary**.

Instead, it:
- takes the existing **summary** and the first 6,000 characters of the **transcript**
- accepts a user question
- uses Gemini to answer questions **only related to this video**

Think of this as an interactive assistant sitting *on top* of the summary.


In [67]:
class ChatAssistant(Node):
    def prep(self, shared):
        return {
            "question": shared["user_question"],
            "summary": shared["summary"],
            "transcript": shared["transcript"]
        }

    def exec(self, data):
        model = genai.GenerativeModel(MODEL)

        prompt = f"""
You are an assistant answering questions about a YouTube video.
Answer only from the summary and transcript below; if the answer is not there, say so.

Summary:
{data['summary']}

Transcript (for reference):
{data['transcript'][:6000]}

User question:
{data['question']}
"""

        return model.generate_content(prompt).text

    def post(self, shared, _, answer):
        shared["chat_answer"] = answer
        return "done"
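
The prompt assembly can be pulled into a plain function so the transcript cut-off is easy to inspect without calling Gemini. This is a sketch only: `build_chat_prompt` and its `max_chars` parameter are hypothetical names, with the 6,000-character limit taken from the cell above:

```python
def build_chat_prompt(question, summary, transcript, max_chars=6000):
    """Assemble the Q&A prompt, keeping only the start of the transcript."""
    excerpt = transcript[:max_chars]
    return (
        "You are an assistant answering questions about a YouTube video.\n\n"
        f"Summary:\n{summary}\n\n"
        f"Transcript (for reference):\n{excerpt}\n\n"
        f"User question:\n{question}\n"
    )


prompt = build_chat_prompt("what is rag", "A video about RAG.", "x" * 10000)
print(prompt[:120])
```

Anything past the cut-off is invisible to the model, so questions about the end of a long video rely on the summary alone.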


### 🔹 Cell 9: Build PocketFlow Graph

- Creates node instances:
  - GetVideoID
  - GetTranscript (with retries)
  - Summarize
- Links nodes using action-based transitions
- Defines execution order:
  Video URL → Video ID → Transcript → Summary
- Initializes `Flow` with the starting node


In [61]:
id_node = GetVideoID()
tx_node = GetTranscript(retries=3, wait=2)
sm_node = Summarize()

id_node.next("ok", tx_node)
tx_node.next("ok", sm_node)

youtube_flow = Flow(id_node)


### 🔹 Cell 10: Shared State Dictionary

- Central data store passed across all nodes
- Contains:
  - `video_url` → input
  - `video_id` → extracted ID
  - `transcript` → captions text
  - `summary` → final output
- Enables clean data flow without global variables
- Starts workflow execution using `youtube_flow.run()`
- Each node reads from and writes to shared state
- Flow ends after the Summarize node, because its `"done"` action has no successor
- Final summary is printed from shared state


In [66]:
shared = {
    "video_url": "https://www.youtube.com/watch?v=dyUojOVBEcE",
    "video_id": None,
    "transcript": None,
    "summary": None
}

youtube_flow.run(shared)
print(shared["summary"])


The video explains how to enhance Large Language Models (LLMs) to answer questions using private or company-specific data, overcoming the limitations of relying solely on general training data.

The presenter contrasts two main approaches:

1.  **Fine-tuning:** Directly retraining an LLM on custom data. This is highlighted as very costly, computationally expensive, time-consuming, and difficult to update frequently.
2.  **Retrieval Augmented Generation (RAG):** Presented as a more efficient and practical solution.

**How RAG works:**

1.  **Data Ingestion:** Relevant information (e.g., website content from educosis.com) is scraped using a `WebBaseLoader`.
2.  **Text Splitting:** The content is broken down into smaller, overlapping "chunks" using a `RecursiveCharacterTextSplitter`. Overlapping chunks ensure that context is maintained even if a key piece of information is split between two chunks.
3.  **Embedding:** These text chunks are converted into numerical "embeddings" (vector repr

### 💬 Interactive Chat with the Video

This section enables a continuous question-and-answer loop.

How it works:
- The summary and transcript are already stored in `shared`
- The user can now ask **multiple questions**
- Each question is answered using the same video context
- Typing `exit` stops the chat

This makes the notebook behave like a mini ChatGPT that only knows about this YouTube video.


In [68]:
chat_node = ChatAssistant()

while True:
    question = input("\nAsk something about the video (type 'exit' to stop): ")
    if question.lower() == "exit":
        break

    shared["user_question"] = question
    chat_node._run(shared)

    print("\nAnswer:\n", shared["chat_answer"])



Ask something about the video (type 'exit' to stop): what is rag

Answer:
 RAG stands for **Retrieval Augmented Generation**.

It is presented in the video as a more efficient and practical solution for enhancing Large Language Models (LLMs) to answer questions using private or company-specific data, overcoming the limitations of relying solely on general training data.

Here's how RAG works:

1.  **Data Ingestion and Processing:**
    *   Relevant information (e.g., website content) is collected.
    *   This content is then split into smaller, overlapping "chunks" to maintain context.
    *   These text chunks are converted into numerical "embeddings" (vector representations).
    *   The embeddings are stored in a vector database.

2.  **Question Answering Process:**
    *   When a user asks a question, the system first retrieves the most relevant embedded chunks from the vector database based on the user's query.
    *   This retrieved context is then "augmented" by being added to