# Building a RAG application from scratch

Here is a high-level overview of the system we will be building.

In [74]:
import os 
from dotenv import load_dotenv
import google.generativeai as genai

load_dotenv()

# genai.configure() returns None, so read the key into its own variable first
api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=api_key)

YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=cdiD-9MMpb0"
yt_link = "https://www.youtube.com/watch?v=cdiD-9MMpb0"
yt2_link = "https://youtu.be/Yq0QkCxoTHM?si=3IzccKClJ-sCQnsJ"
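
`os.getenv` returns `None` when a variable is missing, and that usually surfaces much later as an authentication error. The sketch below shows one way to fail early instead. The helper name `require_env` is hypothetical and not part of any library used here.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value


# Possible usage in this notebook (after load_dotenv()):
# api_key = require_env("GOOGLE_API_KEY")
```

Raising at load time keeps the error next to its cause rather than inside the first model call.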

# Setting up the model

Define the LLM that we'll use throughout the workflow.

In [77]:
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    google_api_key=api_key,
    temperature=0.7,
)



### Testing the model by asking a simple question

In [78]:
resp = model.invoke("when did lakers last win NBA?")
resp

AIMessage(content='The Los Angeles Lakers last won the NBA championship in **2020**.', response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []}, id='run-e5c8e6f3-50c8-4d64-b4a7-0b98f647fd4c-0', usage_metadata={'input_tokens': 9, 'output_tokens': 16, 'total_tokens': 135})

### Extracting the answer by chaining the model with an output parser

The model returns an `AIMessage` object. Chaining the model with a `StrOutputParser` gives us just the text of the answer:

In [79]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser

resp = chain.invoke("when did lakers last win NBA?")
resp

'The Los Angeles Lakers last won the NBA championship in **2020**.'
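
To make the `|` operator less magical, here is a toy sketch of how two steps can be composed with Python's `__or__` method. It only illustrates the idea and is not LangChain's actual implementation. The `Step` class, `fake_model`, and `fake_parser` are hypothetical stand-ins.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins: the "model" returns a message-like dict,
# the "parser" extracts only the text from it.
fake_model = Step(lambda question: {"content": f"echo: {question}", "metadata": {}})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("when did lakers last win NBA?"))
# prints: echo: when did lakers last win NBA?
```

The output of each step becomes the input of the next, which is why the parser receives the whole message and returns only its text.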

# Introducing Prompt Templates

We want to provide the model with some context along with the question.
Prompt templates are a simple way to define and reuse prompts.

In [80]:
from langchain_core.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you can't answer the question, reply "I don't know".
Context: {context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt.format(context="Palash is in Coreografia, Coreografia is a dance club", question="What is Coreografia?")

'Human: \nAnswer the question based on the context below. If you can\'t answer the question, reply "I don\'t know".\nContext: Palash is in Coreografia, Coreografia is a dance club\nQuestion: What is Coreografia?\n'
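
At its core, a prompt template is a string with named placeholders that are filled in at call time. The plain-Python sketch below shows that idea with `str.format` and lists the variables a template expects. It is only an analogy: `ChatPromptTemplate` does more, such as wrapping the text in a human message. The helper `template_variables` is hypothetical.

```python
from string import Formatter

template = (
    "Answer the question based on the context below. "
    'If you can\'t answer the question, reply "I don\'t know".\n'
    "Context: {context}\n"
    "Question: {question}\n"
)


def template_variables(text: str) -> list:
    """Return the placeholder names found in a format string, in order."""
    return [field for _, field, _, _ in Formatter().parse(text) if field]


print(template_variables(template))
# prints: ['context', 'question']

filled = template.format(
    context="Palash is in Coreografia, Coreografia is a dance club",
    question="What is Coreografia?",
)
print(filled)
```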

Now that we have a prompt, we can chain it with the model and the output parser:

In [81]:
chain = prompt | model | parser
answer = chain.invoke({
    "context": "Palash is a member of Coreografia, Coreografia is a dance club in Manipal University Jaipur (MUJ).",
    "question": "Who is Palash?"
})
answer

'Palash is a member of Coreografia, a dance club in Manipal University Jaipur (MUJ).'

# Combining Chains

Now we will combine chains to create a more complex workflow.

For example, let's create a second chain that translates the answer from the first chain into a different language.

We start by creating a new prompt template for the translation chain:

In [37]:
translation_prompt = ChatPromptTemplate.from_template("Translate {answer} to {language}.")

We can now create a new translation chain that combines the result from the first chain with the translation prompt.

Here is what the new workflow looks like:

In [52]:
from operator import itemgetter

# The dict runs `chain` to produce "answer" and uses itemgetter to pass
# "language" through from the input; both values then fill translation_prompt.
translation_chain = (
    {"answer": chain, "language": itemgetter("language")}
    | translation_prompt
    | model
    | parser
)
translation_chain.invoke({
    "context":"Palash is a member in Coreografia, Coreografia is a dance club in Manipal University Jaipur (MUJ).",
    "question":"Who is Palash?",
    "language":"Spanish"
    })

'Aquí tienes la traducción:\n\n**Palash es miembro de Coreografia, un club de baile en Manipal University Jaipur (MUJ).**'
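
The dict at the start of `translation_chain` is the least obvious part: every value in it receives the same input, and the results are collected under the dict's keys. The sketch below mimics only that data flow in plain Python. `run_mapping` and `fake_answer_chain` are hypothetical stand-ins, not LangChain APIs.

```python
from operator import itemgetter


def run_mapping(mapping, inputs):
    """Call every value in `mapping` with the same `inputs` and collect the results."""
    return {key: step(inputs) for key, step in mapping.items()}


def fake_answer_chain(inputs):
    # Hypothetical stand-in for `prompt | model | parser`
    return f"(answer to {inputs['question']!r})"


mapping = {"answer": fake_answer_chain, "language": itemgetter("language")}

result = run_mapping(
    mapping,
    {"context": "some context", "question": "Who is Palash?", "language": "Spanish"},
)
print(result)
# prints: {'answer': "(answer to 'Who is Palash?')", 'language': 'Spanish'}
```

The resulting dict has exactly the keys that `translation_prompt` expects, `answer` and `language`, which is why it can be piped straight into the template.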

# Transcribing the video

Because our source is a video, we first need a transcript to draw answers from.

To get one, we download the audio of the video, save it as an mp3 file, and pass that file to the LLM, which generates the transcript.

The transcript then serves as the context in the prompt, so we can ask the model questions about the video.

In [56]:
from pytube import YouTube

def transcribe_yt(link):
    print("Downloading video...")
    yt = YouTube(link)
    audio_stream = yt.streams.filter(only_audio=True).first()

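    # Download the audio to a temporary file.
    # Note: pytube does not transcode, so the file keeps the stream's
    # original audio format despite the .mp3 name.
    temp_path = "temp_audio.mp3"
    audio_stream.download(filename=temp_path)
    print("Download complete.")

    # Upload the file to Gemini
    print("Uploading to Gemini...")
    # GenerativeModel takes the model name as `model_name` (or positionally), not `model`
    model = genai.GenerativeModel("gemini-2.5-flash")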
    audio_file = genai.upload_file(temp_path)

    # Asking gemini to transcribe it
    response = model.generate_content([
        "Transcribe this audio file clearly and accurately.",
        audio_file
    ])

    print("Transcription complete.")
    return response.text

In [75]:
import os
import yt_dlp
import google.generativeai as genai

# --- Step 1: Make sure ffmpeg is visible to this environment ---
os.environ["PATH"] += os.pathsep + "/opt/homebrew/bin"  # Adjust if ffmpeg is elsewhere

# --- Step 2: Configure Gemini ---
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

def transcribe_yt(link):
    print("🎥 Downloading audio from YouTube...")
    ydl_opts = {
        "format": "bestaudio/best",
        "outtmpl": "temp_audio.%(ext)s",
        "postprocessors": [{
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }],
        "ffmpeg_location": "/opt/homebrew/bin",  # important for macOS ARM users
        "quiet": True
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([link])

    temp_path = "temp_audio.mp3"
    print(f"✅ Audio extracted: {temp_path}")

    # --- Step 3: Upload to Gemini ---
    print("⬆️ Uploading to Gemini...")
    transcribing_model = genai.GenerativeModel("gemini-2.5-flash")  # can change to gemini-1.5-pro if needed
    audio_file = genai.upload_file(temp_path)

    # --- Step 4: Ask Gemini to transcribe ---
    print("🧠 Transcribing...")
    response = transcribing_model.generate_content([
        "Transcribe this audio accurately into English text:",
        audio_file
    ])

    print("✅ Transcription complete!")
    return response.text

# Example usage:
# yt_link = "https://www.youtube.com/watch?v=XXXX"
# transcription = transcribe_yt(yt_link)
# print(transcription[:1000])


In [76]:
transcription = transcribe_yt(yt2_link)
print(transcription[:1000])

🎥 Downloading audio from YouTube...
                                                           

✅ Audio extracted: temp_audio.mp3
⬆️ Uploading to Gemini...


🧠 Transcribing...


✅ Transcription complete!
If you don't have a technical background, but you still want to learn the basics of artificial intelligence, stick around because we're distilling Google's four-hour AI course for beginners into just 10 minutes. I was initially very skeptical because I thought the course would be too conceptual. We're all about practical tips on this channel. And knowing Google, the course might just disappear after one hour. But I found the underlying concepts actually made me better at using tools like Chat GPT and Google Bard, and cleared up a bunch of misconceptions I didn't know I had about AI, machine learning, and large language models.

So, starting with the broadest possible question, what is artificial intelligence? It turns out, and I'm so embarrassed to admit I didn't know this. AI is an entire field of study, like physics. And machine learning is a sub-field of AI, much like how thermodynamics is a sub-field of physics. Going down another level, deep learning is a
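
Each call to `transcribe_yt` repeats both the download and the model request. One possible way to avoid that while experimenting is to cache the transcript on disk, keyed by the link. This is a sketch under assumptions: the helper `cached_transcription`, the `transcripts` directory name, and the fake transcriber used in the demo are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_transcription(link, transcribe_fn, cache_dir="transcripts"):
    """Return the transcript for `link`, calling `transcribe_fn` only on a cache miss."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Hash the link so it is safe to use as a file name
    key = hashlib.sha256(link.encode("utf-8")).hexdigest()[:16]
    path = cache / f"{key}.txt"
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = transcribe_fn(link)
    path.write_text(text, encoding="utf-8")
    return text


# Demo with a fake transcriber so the sketch runs without network access
calls = []


def fake_transcribe(link):
    calls.append(link)
    return "transcript text"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_transcription("https://youtu.be/example", fake_transcribe, cache_dir=tmp)
    second = cached_transcription("https://youtu.be/example", fake_transcribe, cache_dir=tmp)

print(first == second, len(calls))
# prints: True 1
```

In the notebook, the real function would take the place of the fake one, for example `cached_transcription(yt2_link, transcribe_yt)`.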

In [84]:
prompt = ChatPromptTemplate.from_template(
    "You are a helpful assistant. Use the provided video transcript to answer the question.\n\n"
    "Transcript:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer clearly and concisely:"
)

In [86]:
transcribing_chain = prompt | model | parser
transcribing_chain.invoke({
    "context":transcription,
    "question":"What is the video about?"
})

"The video is about distilling Google's four-hour AI course for beginners into a 10-minute summary, covering the basics of artificial intelligence, machine learning, deep learning, generative AI, and large language models."