# Example of RAG with a Youtube video

In this example, we download the trascription of a Youtube video and use an LLM for extracting information from that video.

**Please, complete the example_rag.ipynb first to get more insight.**

Let's go!

Install the dependencies:

In [4]:
!pip3 install youtube_transcript_api

Defaulting to user installation because normal site-packages is not writeable
Collecting youtube_transcript_api
  Downloading youtube_transcript_api-0.6.2-py3-none-any.whl (24 kB)
Installing collected packages: youtube-transcript-api
Successfully installed youtube-transcript-api-0.6.2
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.[0m


# Let's download an example transcript from a YT video

You can change the id of the video to download other video transcriptions.
We save the contect to a file

In [5]:
from youtube_transcript_api import YouTubeTranscriptApi

srt = YouTubeTranscriptApi.get_transcript("pxiP-HJLCx0")

with open("subtitles.txt", "a") as file:
    for i in srt:
        file.write(i['text'])



# We instantiate the model and the embeddings

In [26]:
MODEL = "llama3"
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

# We load the subtitles previously saved using TextLoader

In [27]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("subtitles_reduced.txt")
text_documents = loader.load()
text_documents

[Document(page_content="in this video I'm going to tell you thebest laptops for students now for thisone my team and I went absolutely nutswe got in pretty much every viablestudent laptop think I'm joking I am notwe tested an epic 15 laptops everythingfrom Apple's MacBook Air to MicrosoftSurface laptop from Asus zenbooks toSamsung Galaxy books you name it wetested it oh and if you're wonderingwhat our experience with student laptopsis between the three of us who worked onthis video we have a combined fiveUniversity degrees now I value your timeso I'm going to very first briefly hiton some important points that you mustbe aware of when picking a laptop forschool then I'm going to get straightinto what you're actually here to seewhich is which laptops we recommend forvarious types of students firstly thelaptop should be small and portableunless we get hit by another virus whichwe all hope we don't students frequentlymove from classroom to classroom plusthey often have group projects wher

# We explit the document into chunks

In [17]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
text_splitter.split_documents(text_documents)[:5]

[Document(page_content="in this video I'm going to tell you thebest laptops for students now for thisone my team and I went", metadata={'source': 'subtitles_reduced.txt'}),
 Document(page_content="my team and I went absolutely nutswe got in pretty much every viablestudent laptop think I'm joking", metadata={'source': 'subtitles_reduced.txt'}),
 Document(page_content="think I'm joking I am notwe tested an epic 15 laptops everythingfrom Apple's MacBook Air to", metadata={'source': 'subtitles_reduced.txt'}),
 Document(page_content='MacBook Air to MicrosoftSurface laptop from Asus zenbooks toSamsung Galaxy books you name it', metadata={'source': 'subtitles_reduced.txt'}),
 Document(page_content="books you name it wetested it oh and if you're wonderingwhat our experience with student laptopsis", metadata={'source': 'subtitles_reduced.txt'})]

In [28]:
text_documents

[Document(page_content="in this video I'm going to tell you thebest laptops for students now for thisone my team and I went absolutely nutswe got in pretty much every viablestudent laptop think I'm joking I am notwe tested an epic 15 laptops everythingfrom Apple's MacBook Air to MicrosoftSurface laptop from Asus zenbooks toSamsung Galaxy books you name it wetested it oh and if you're wonderingwhat our experience with student laptopsis between the three of us who worked onthis video we have a combined fiveUniversity degrees now I value your timeso I'm going to very first briefly hiton some important points that you mustbe aware of when picking a laptop forschool then I'm going to get straightinto what you're actually here to seewhich is which laptops we recommend forvarious types of students firstly thelaptop should be small and portableunless we get hit by another virus whichwe all hope we don't students frequentlymove from classroom to classroom plusthey often have group projects wher

# We store the text in a vector space

In [29]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(text_documents, embedding=embeddings)

In [21]:
retriever = vectorstore.as_retriever()

# We instantiate the parser

In [24]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

# Generate the conversation template

In [22]:
from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, answer with "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
prompt.format(context="Here is some context", question="Here is a question")

'\nAnswer the question based on the context below. If you can\'t \nanswer the question, answer with "I don\'t know".\n\nContext: Here is some context\n\nQuestion: Here is a question\n'

# We can now extract the information from the video!

In [25]:
retrieved_context = retriever.invoke("laptop")
questions = [
    "Which is the best laptop for students?",
    "How much is a laptop worth?",
    "Make a summary of the video"
]

for question in questions:
    formatted_prompt = prompt.format(context=retrieved_context, question=question)
    response_from_model = model.invoke(formatted_prompt)
    parsed_response = parser.parse(response_from_model)

    print(f"Question: {question}")
    print(f"Answer: {parsed_response}")
    print()

Question: Which is the best laptop for students?
Answer: I don't know. The context only provides information about what factors to consider when choosing a laptop for students, but it does not specifically recommend a particular laptop model as the "best" one. It mentions testing 15 laptops from various manufacturers and recommends certain features such as a 14-inch screen with high resolution and brightness, but it does not provide a single recommendation.

Question: How much is a laptop worth?
Answer: I don't know. The provided context does not mention the value or price of laptops. It appears to be discussing factors to consider when choosing a laptop for students, such as screen size, resolution, and brightness, but it does not provide pricing information.

Question: Make a summary of th evideo
Answer: The video discusses the best laptops for students, with the presenter sharing their team's experience testing and reviewing multiple options. The presenter emphasizes the importance 