<a href="https://colab.research.google.com/github/pritiyadav888/AI-projects/blob/main/YouTube_Video_Q%26A_with_Gemini_on_Vertex_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# YouTube Video Q&A with Gemini on Vertex AI

This notebook builds a small application that answers questions about a YouTube video. It retrieves the relevant passages from the video's transcript and uses a Gemini model on Vertex AI to generate each answer.

## Key Features:

* **Question Answering:** Users enter a YouTube video URL and ask questions about the video's content. The app retrieves the transcript passages most relevant to each question and passes them to a Gemini model, which generates the answer.
* **Vertex AI Integration:** Both models are accessed through Google Cloud's Vertex AI platform, so the notebook needs a Google Cloud project ID and a region.
* **Gemini Models:**
    * **LLM (Large Language Model):** The application uses `gemini-2.0-flash-001` (with `gemini-pro` as a fallback) to interpret the user's question and write an answer from the retrieved transcript passages.
    * **Embedding Model:** `gemini-embedding-001` (with `textembedding-gecko` as a fallback) converts both the user's question and the transcript chunks into numerical vectors (embeddings). Comparing these vectors is how the app finds the passages relevant to a question.
* **Transcript Processing:** The app fetches video transcripts with LangChain's `YoutubeLoader`. If no public transcript can be loaded, it downloads the audio with `yt-dlp` and transcribes it with Whisper. The transcript is then split into chunks of up to 1,500 characters with a 150-character overlap. Processing results are cached so that repeated questions about the same video do not reprocess it.
* **Semantic Search:** A vector database, Chroma, stores the embeddings of the transcript chunks. This enables semantic search: the app can find parts of the transcript that are conceptually similar to the user's question, even if the exact keywords are not present.
* **Gradio Interface:** Gradio provides a web-based, chatbot-style interface. Users enter a YouTube URL to load the video transcript, ask multiple questions without reprocessing, and end the session by typing "quit" or clicking the "Quit" button. The interface shows a status field and renders answers as Markdown, with timestamp links when the retrieved text contains timestamps.
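
The chunking and retrieval ideas in the list above can be illustrated with a small, dependency-free sketch. This is not the notebook's implementation: `split_with_overlap`, `cosine_similarity`, and `top_k` are hypothetical helpers, the fixed-width splitter is a simplification of what `RecursiveCharacterTextSplitter` does, and the two-dimensional toy vectors stand in for real embeddings.

```python
import math


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query vector."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


chunks = split_with_overlap("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']

# Toy vectors standing in for chunk embeddings and a question embedding
toy_vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k([0.9, 0.1], toy_vectors, k=2))  # [0, 1]
```

In the notebook itself, the embedding model produces the vectors and Chroma performs the similarity search; the sketch only shows why overlapping chunks and vector similarity make retrieval possible.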

## Target Audience:

This application is intended for:

* **General Users:** Anyone seeking quick, direct answers from video content without the need to watch the entire duration.

## Purpose:

The application aims to make it faster to find specific information in a video. Instead of scanning a long video manually, users ask questions and receive answers drawn from the transcript. Transcript processing, caching, and an interactive chatbot interface are combined to keep this interaction quick.

In [15]:
!pip install -U langchain-community

Collecting langchain-core<1.0.0,>=0.3.65 (from langchain-community)
  Using cached langchain_core-0.3.65-py3-none-any.whl.metadata (5.8 kB)
Collecting pydantic<3.0.0,>=2.7.4 (from langchain<1.0.0,>=0.3.25->langchain-community)
  Using cached pydantic-2.11.7-py3-none-any.whl.metadata (67 kB)
Collecting pydantic-core==2.33.2 (from pydantic<3.0.0,>=2.7.4->langchain<1.0.0,>=0.3.25->langchain-community)
  Using cached pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Using cached langchain_core-0.3.65-py3-none-any.whl (438 kB)
Using cached pydantic-2.11.7-py3-none-any.whl (444 kB)
Using cached pydantic_core-2.33.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Installing collected packages: pydantic-core, pydantic, langchain-core
  Attempting uninstall: pydantic-core
    Found existing installation: pydantic_core 2.18.2
    Uninstalling pydantic_core-2.18.2:
      Successfully uninstalled pydantic_core-2.18.2
  Attempting 

In [11]:
!pip install -q google-cloud-aiplatform langchain-google-vertexai pydantic==2.7.1 youtube-transcript-api chromadb gradio

In [1]:
from google.colab import auth
auth.authenticate_user()

In [13]:
!apt update && apt install -y ffmpeg

[33m0% [Working][0m            Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:5 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [79.8 kB]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:7 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Get:11 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:12 https://r2u.stat.illinois.edu/ubuntu jammy/main amd64 Packages [2,747 kB]
Get:13 https://developer.downl

In [2]:
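# Install required packages
# Note: the PyPI package named "whisper" is not OpenAI's Whisper.
# The next cell replaces it with openai-whisper.
!pip install whisper
!pip install yt-dlp
!apt install -y ffmpeg  # Required for audio extraction with yt-dlp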

Collecting whisper
  Using cached whisper-1.1.10-py3-none-any.whl
Installing collected packages: whisper
Successfully installed whisper-1.1.10
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.


In [None]:
!pip uninstall whisper -y
!pip install -U openai-whisper

In [3]:
import vertexai
from vertexai.generative_models import GenerativeModel, Part
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel
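# Community integrations live in langchain_community (installed above);
# the old langchain.document_loaders / langchain.vectorstores paths are deprecated
from langchain_community.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma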
from langchain_core.messages import HumanMessage
import gradio as gr
import os
import re
from tenacity import retry, stop_after_attempt, wait_exponential
from joblib import Memory

In [4]:
# --- Configuration ---
# Replace with your actual Google Cloud Project ID and desired region
PROJECT_ID = "YOUR_PROJECT_ID"
REGION = "us-central1"

# Initialize Vertex AI
print(f"Initializing Vertex AI with project: {PROJECT_ID}, location: {REGION}")
vertexai.init(project=PROJECT_ID, location=REGION)

# --- Initialize Models and Global State ---
llm_model = None
embedding_model = None
embeddings_for_chroma = None
global_retriever = None  # Store the retriever for the loaded video
current_video_url = None  # Track the currently loaded video
memory = Memory("cache_directory", verbose=0)  # Initialize joblib cache

# Attempt to load LLM: gemini-2.0-flash-001, fallback to gemini-pro if not found
try:
    llm_model = GenerativeModel("gemini-2.0-flash-001")
    print("Attempting to load gemini-2.0-flash-001 for LLM.")
    _ = llm_model.generate_content([Part.from_text("Hello")], generation_config={"max_output_tokens": 10})
    print("Successfully loaded and tested gemini-2.0-flash-001 for LLM.")
except Exception as e:
    print(f"Warning: Could not fully load or access gemini-2.0-flash-001. Error: {e}. Falling back to gemini-pro.")
    try:
        llm_model = GenerativeModel("gemini-pro")
        _ = llm_model.generate_content([Part.from_text("Hello")], generation_config={"max_output_tokens": 10})
        print("Successfully loaded and tested gemini-pro for LLM.")
    except Exception as e_fallback:
        print(f"CRITICAL ERROR: Could not load gemini-pro either. Error: {e_fallback}. Please check project permissions and model availability.")
        llm_model = None

Initializing Vertex AI with project: data-read-421406, location: us-central1
Attempting to load gemini-2.0-flash-001 for LLM.
Successfully loaded and tested gemini-2.0-flash-001 for LLM.
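
The nested `try`/`except` blocks in the cell above follow a "first model that works" pattern. A minimal sketch of the same idea as a reusable function is shown below. The names `load_first_available` and `fake_loader`, and the model names in the example, are hypothetical; in the notebook the loader would construct a `GenerativeModel` and run the short test prompt.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def load_first_available(names: Sequence[str], loader: Callable[[str], T]) -> tuple[str, T]:
    """Try each model name in order and return the first one that loads."""
    errors = []
    for name in names:
        try:
            return name, loader(name)
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("No model could be loaded: " + "; ".join(errors))


def fake_loader(name: str) -> str:
    # Hypothetical stand-in for a real constructor plus smoke test
    if name == "primary-model":
        raise ValueError("not available in this region")
    return f"<client for {name}>"


name, client = load_first_available(["primary-model", "fallback-model"], fake_loader)
print(name, client)  # fallback-model <client for fallback-model>
```

Collecting the individual errors and raising once at the end keeps the failure visible, whereas setting the model to `None` defers the problem to the first question a user asks.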


In [5]:
# Attempt to load Embedding Model: gemini-embedding-001, fallback to textembedding-gecko
try:
    embedding_model = TextEmbeddingModel.from_pretrained("gemini-embedding-001")
    print("Successfully loaded gemini-embedding-001 for embeddings.")
except Exception as e:
    print(f"Warning: Could not load gemini-embedding-001. Error: {e}. Falling back to textembedding-gecko.")
    try:
        embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko")
        print("Successfully loaded textembedding-gecko for embeddings.")
    except Exception as e_fallback:
        print(f"CRITICAL ERROR: Could not load textembedding-gecko either. Error: {e_fallback}. Please check project permissions and model availability.")
        embedding_model = None

Successfully loaded gemini-embedding-001 for embeddings.


In [6]:
# Only proceed with creating embedding wrapper if embedding_model loaded successfully
if embedding_model:
    # --- Embedding Function for ChromaDB ---
    class VertexAIGeminiEmbeddings:
        def __init__(self, model_instance):
            self.model = model_instance
            self.output_dimensionality = 768  # Fixed output size: all vectors in one Chroma collection must share the same dimension

        def embed_documents(self, texts: list[str]) -> list[list[float]]:
            # Assumption: gemini-embedding-001 accepts one input text per request.
            # Increase this for embedding models that allow batched requests.
            batch_size = 1
            embeddings = []
            for i in range(0, len(texts), batch_size):
                batch = texts[i:i + batch_size]
                text_inputs = [TextEmbeddingInput(text, "RETRIEVAL_DOCUMENT") for text in batch]
                kwargs = {"output_dimensionality": self.output_dimensionality}
                try:
                    embedding_response = self.model.get_embeddings(text_inputs, **kwargs)
                    embeddings.extend([r.values if r.values else [0.0] * self.output_dimensionality for r in embedding_response])
                except Exception as e:
                    print(f"Error during batch embedding documents: {e}")
                    embeddings.extend([[0.0] * self.output_dimensionality] * len(batch))
            return embeddings

        def embed_query(self, text: str) -> list[float]:
            text_input = TextEmbeddingInput(text, "RETRIEVAL_QUERY")
            kwargs = {"output_dimensionality": self.output_dimensionality}
            try:
                embedding_response = self.model.get_embeddings([text_input], **kwargs)
                if embedding_response and embedding_response[0].values:
                    return embedding_response[0].values
                return [0.0] * self.output_dimensionality
            except Exception as e:
                print(f"Error during embedding query: {e}")
                return [0.0] * self.output_dimensionality

    embeddings_for_chroma = VertexAIGeminiEmbeddings(embedding_model)
else:
    print("Embeddings model not loaded, Q&A functionality will be limited.")
    embeddings_for_chroma = None
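
The wrapper above substitutes zero vectors when a batch fails, which keeps the output aligned with the input but leaves those chunks without a usable embedding. The sketch below reproduces that behavior with a fake embedder so the effect can be inspected without calling Vertex AI. `embed_in_batches` and `fake_embed` are hypothetical names, not part of any library.

```python
from typing import Callable


def embed_in_batches(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int,
    dim: int,
) -> list[list[float]]:
    """Embed texts batch by batch; a failed batch yields zero vectors so indices stay aligned."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        try:
            vectors.extend(embed_batch(batch))
        except Exception as exc:
            print(f"Batch starting at {start} failed: {exc}")
            vectors.extend([[0.0] * dim for _ in batch])
    return vectors


def fake_embed(batch: list[str]) -> list[list[float]]:
    # Hypothetical embedder: fails on any batch containing the word "boom"
    if any("boom" in t for t in batch):
        raise RuntimeError("simulated API error")
    return [[float(len(t)), 1.0] for t in batch]


vectors = embed_in_batches(["a", "bb", "boom", "cccc", "dd"], fake_embed, batch_size=2, dim=2)
failed = [i for i, v in enumerate(vectors) if not any(v)]
print(len(vectors), failed)  # 5 [2, 3]
```

Checking for all-zero vectors after embedding, as the last two lines do, is one way to notice that part of a transcript was not embedded before it is written to the vector store.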

In [7]:
# Function to validate YouTube URL
def validate_youtube_url(url):
    youtube_regex = r'^(https?://)?(www\.)?(youtube\.com|youtu\.be)/.+$'
    return bool(re.match(youtube_regex, url))

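# Function to Process YouTube Video
# Not wrapped in joblib's @memory.cache: on a cache hit the function body is skipped,
# so global_retriever and current_video_url would not be set.
# Reuse across questions is handled by global_retriever instead.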
def process_video(video_url):
    global global_retriever, current_video_url
    if not embeddings_for_chroma:
        raise ValueError("Embedding model not initialized. Cannot process video.")

    if not validate_youtube_url(video_url):
        raise ValueError("Invalid YouTube URL. Please provide a valid YouTube video URL.")

    # Extract video ID from playlist URL if present
    video_id = re.search(r'(?:v=|\/)([0-9A-Za-z_-]{11})', video_url)
    if not video_id:
        raise ValueError("Could not extract video ID from URL. Please provide a valid video URL.")
    video_id = video_id.group(1)
    print(f"Extracted video ID: {video_id}")

    if "list=" in video_url:
        print(f"Playlist URL detected. Processing video ID: {video_id} instead of playlist.")
        video_url = f"https://www.youtube.com/watch?v={video_id}"  # Use the specific video URL

    print(f"Loading YouTube video transcript for: {video_url}...")
    loader = YoutubeLoader.from_youtube_url(video_url)
    try:
        from tenacity import retry, stop_after_attempt, wait_exponential
        @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
        def load_with_retry():
            return loader.load()
        docs = load_with_retry()
    except Exception as e:
        print(f"Failed to load public transcript: {str(e)}. Falling back to Whisper...")
        import whisper
        import yt_dlp
        ydl_opts = {
            'format': 'bestaudio/best',
            'postprocessors': [{
                'key': 'FFmpegExtractAudio',
                'preferredcodec': 'mp3',
                'preferredquality': '192',
            }],
            'outtmpl': f'temp_audio_{video_id}.mp3',  # Unique filename per video
            'quiet': False,  # Enable verbose output for debugging
        }
        try:
            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
                print(f"Downloading audio for video ID: {video_id}...")
                ydl.download([video_url])
            model = whisper.load_model("medium")  # Should work with openai-whisper
            print("Transcribing audio with Whisper...")
            result = model.transcribe(f'temp_audio_{video_id}.mp3')  # Auto-detect language
            print(f"Detected language: {result['language']}")
            print(f"Transcribed text: {result['text'][:200]}...")  # Log first 200 chars for debugging
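            # split_documents() expects Document objects, not raw strings
            from langchain_core.documents import Document
            docs = [Document(page_content=result["text"], metadata={"source": video_id})]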
            os.remove(f'temp_audio_{video_id}.mp3')  # Clean up temporary file
        except Exception as e:
            print(f"Failed to load audio: {str(e)}. This video may have restricted audio or an issue with ffmpeg/yt-dlp. Please try another video.")
            raise ValueError("Audio extraction failed. Please try a different video.")

    if not docs:
        raise ValueError("This video does not have a public transcript, and Whisper transcription failed. Please try another video.")

    print(f"Loaded {len(docs)} document(s) from transcript.")
    print("Splitting documents into chunks...")
    splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
    chunks = splitter.split_documents(docs)
    print(f"Split into {len(chunks)} chunks.")

    print("Creating Chroma vectorstore with embeddings...")
    vectorstore = Chroma.from_documents(chunks, embedding=embeddings_for_chroma)
    print("Chroma vectorstore created.")
    retriever = vectorstore.as_retriever()

    global_retriever = retriever
    current_video_url = video_url
    return retriever
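
The two regular expressions used in the cell above can be tried in isolation. The sketch below copies them unchanged and wraps them in a hypothetical helper, `extract_video_id`, to show which URL shapes they accept.

```python
import re

# Patterns copied from validate_youtube_url() and process_video()
YOUTUBE_URL_RE = re.compile(r'^(https?://)?(www\.)?(youtube\.com|youtu\.be)/.+$')
VIDEO_ID_RE = re.compile(r'(?:v=|\/)([0-9A-Za-z_-]{11})')


def extract_video_id(url: str):
    """Return the 11-character video ID, or None if the URL is not recognised."""
    if not YOUTUBE_URL_RE.match(url):
        return None
    match = VIDEO_ID_RE.search(url)
    return match.group(1) if match else None


examples = [
    "https://www.youtube.com/watch?v=1lm4Wlpy2wU",
    "https://youtu.be/1lm4Wlpy2wU",
    "https://www.youtube.com/watch?v=1lm4Wlpy2wU&list=PL1234567890",
    "https://m.youtube.com/watch?v=1lm4Wlpy2wU",
    "https://example.com/watch?v=1lm4Wlpy2wU",
]
for url in examples:
    print(url, "->", extract_video_id(url))
```

The first three URLs yield `1lm4Wlpy2wU`. The last two yield `None`: the validation pattern only allows an optional `www.` prefix, so mobile (`m.youtube.com`) links are rejected along with non-YouTube hosts.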

In [8]:
# Function to Ask Question
def ask_question(video_url: str, question: str, chat_history: list):
    global global_retriever, current_video_url
    if llm_model is None or embeddings_for_chroma is None:
        chat_history.append([None, "Error: LLM or Embedding model failed to initialize."])
        return chat_history

    # Check for quit command
    if question.lower().strip() == "quit":
        # Each Chatbot history entry is a [user_message, bot_message] pair
        chat_history.append([question, "Conversation ended. You can load a new video or ask more questions."])
        global_retriever = None
        current_video_url = None
        return chat_history

    try:
        # Reset and reprocess if URL changes
        if global_retriever and video_url != current_video_url:
            global_retriever = None
            chat_history.append([None, f"Switching to new video: {video_url}..."])
        # Process video if not yet loaded or URL changed
        if not global_retriever:
            chat_history.append([None, f"Processing video: {video_url}..."])
            retriever = process_video(video_url)
        else:
            retriever = global_retriever

        print(f"Retrieving relevant documents for question: '{question}'...")
        docs = retriever.invoke(question)  # get_relevant_documents() is deprecated in favor of invoke()
        print(f"Found {len(docs)} relevant documents.")

        # Take top 2 for context
        context = ""
        timestamps = []
        if docs:
            for doc in docs[:2]:
                content = doc.page_content
                timestamp_matches = re.findall(r'(\d+:\d+)', content)
                if timestamp_matches:
                    timestamps.extend(timestamp_matches)
                context += content + "\n"
        else:
            chat_history.append([question, "No relevant context found in the video transcript for your question."])
            return chat_history

        print("\nConstructing prompt for LLM...")
        prompt_parts = [
            Part.from_text("You are a helpful AI assistant. Based on the video transcript below, answer the user's question in a clear and concise manner."),
            Part.from_text(f"\nContext: {context}"),
            Part.from_text(f"\nQuestion: {question}"),
            Part.from_text("\nAnswer:")
        ]

        print("Invoking LLM to get response...")
        response = llm_model.generate_content(
            prompt_parts,
            generation_config={"temperature": 0.2, "max_output_tokens": 512}
        )
        print("\n--- LLM Response ---")

        # Format response with markdown
        formatted_response = f"**Answer:**\n{response.text}"
        if timestamps:
            video_id = re.search(r'(?:v=|\/)([0-9A-Za-z_-]{11})', video_url)
            if video_id:
                for ts in timestamps:
                    minutes, seconds = map(int, ts.split(':'))
                    total_seconds = minutes * 60 + seconds
                    formatted_response += f"\n- [Jump to {ts}](https://www.youtube.com/watch?v={video_id.group(1)}&t={total_seconds}s)"

        chat_history.append([question, formatted_response])  # [user_message, bot_message] pair
        return chat_history

    except Exception as e:
        chat_history.append([question, f"Error during Q&A: {str(e)}"])
        return chat_history
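
The timestamp handling in `ask_question` converts `M:SS` strings into a `t=` query parameter. A minimal sketch of that conversion is shown below, assuming the retrieved text actually contains timestamps. `timestamp_to_seconds` and `jump_link` are hypothetical helpers, and support for `H:MM:SS` is an extension the notebook's code does not have.

```python
import re


def timestamp_to_seconds(ts: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' to a number of seconds."""
    seconds = 0
    for part in ts.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def jump_link(video_id: str, ts: str) -> str:
    """Build a Markdown link that opens the video at the given timestamp."""
    return f"[Jump to {ts}](https://www.youtube.com/watch?v={video_id}&t={timestamp_to_seconds(ts)}s)"


text = "The setup starts at 1:30 and the demo at 12:05."
for ts in re.findall(r"\d+:\d{2}", text):
    print(jump_link("1lm4Wlpy2wU", ts))
# [Jump to 1:30](https://www.youtube.com/watch?v=1lm4Wlpy2wU&t=90s)
# [Jump to 12:05](https://www.youtube.com/watch?v=1lm4Wlpy2wU&t=725s)
```

Any `digits:digits` pattern in the transcript is treated as a timestamp, so clock times or ratios mentioned by the speaker would also produce links.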

In [10]:
# --- Gradio Interface ---
if llm_model is None or embeddings_for_chroma is None:
    print("\nSkipping Gradio interface launch due to critical model loading errors.")
else:
    print("\nLaunching Gradio Interface...")

    def process_video_on_change(video_url, chat_history):
        global global_retriever, current_video_url
        try:
            if not validate_youtube_url(video_url):
                # Bot-only messages use None for the user half of the [user, bot] pair
                chat_history.append([None, "Invalid YouTube URL. Please provide a valid YouTube video URL."])
                return chat_history, "Error"
            # Reset if URL changes
            if global_retriever and video_url != current_video_url:
                global_retriever = None
                chat_history.append([None, f"Switching to new video: {video_url}..."])
            if not global_retriever:
                chat_history.append([None, f"Loading video transcript for: {video_url}..."])
                retriever = process_video(video_url)
                chat_history.append([None, f"Video loaded: {video_url}. You can now ask questions. Type 'quit' to end the conversation."])
            return chat_history, "Video loaded"
        except Exception as e:
            chat_history.append([None, f"Error loading video: {str(e)}"])
            return chat_history, "Error"

    def quit_conversation(chat_history):
        global global_retriever, current_video_url
        global_retriever = None
        current_video_url = None
        chat_history.clear()  # Empty the shared history in place so old messages do not reappear
        return chat_history, "Ready"

    with gr.Blocks() as interface:
        gr.Markdown("# YouTube Video Q&A with Gemini on Vertex AI")
        gr.Markdown("Enter a YouTube video URL to load it automatically and ask questions about its content. Type 'quit' or click 'Quit' to end the conversation.")

        with gr.Row():
            with gr.Column(scale=2):
                video_url_input = gr.Textbox(label="YouTube Video URL", placeholder="e.g., https://www.youtube.com/watch?v=1lm4Wlpy2wU")
            with gr.Column(scale=1):
                status_output = gr.Textbox(label="Status", value="Ready", interactive=False)

        chatbot = gr.Chatbot(label="Conversation", height=400)
        question_input = gr.Textbox(label="Ask a Question", placeholder="e.g., How do you get a transcript from a YouTube video?")
        with gr.Row():
            ask_button = gr.Button("Ask")
            quit_button = gr.Button("Quit")

        chat_history = gr.State(value=[])


        # Trigger video processing when URL changes
        video_url_input.change(
            fn=process_video_on_change,
            inputs=[video_url_input, chat_history],
            outputs=[chatbot, status_output]
        )
        ask_button.click(ask_question, inputs=[video_url_input, question_input, chat_history], outputs=[chatbot])
        quit_button.click(quit_conversation, inputs=[chat_history], outputs=[chatbot, status_output])

    interface.launch(share=True)


Launching Gradio Interface...
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://a91035599c93431507.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
