In [None]:
from youtube_transcript_api import YouTubeTranscriptApi
from langchain_text_splitters import RecursiveCharacterTextSplitter

video_id = "tqPQB5sleHY"  # Example YouTube video ID

# Fetch the transcript (a list of timed snippets) and convert it to plain dicts
ytt_api = YouTubeTranscriptApi()
fetched = ytt_api.fetch(video_id)
raw_data = fetched.to_raw_data()

# Concatenate the snippet texts into a single string
full_text = " ".join(item["text"] for item in raw_data)
print(full_text)

# Split the transcript into ~1000-character chunks that overlap by 200 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.create_documents([full_text])

Large language models. They are everywhere. They get some things amazingly right and other things very interestingly wrong. My name is Marina Danilevsky. I am a Senior Research Scientist here at IBM Research. And I want to tell you about a framework to help large language models be more accurate and more up to date: Retrieval-Augmented Generation, or RAG. Let's just talk about the "Generation" part for a minute. So forget the "Retrieval-Augmented". So the generation, this refers to large language models, or LLMs, that generate text in response to a user query, referred to as a prompt. These models can have some undesirable behavior. I want to tell you an anecdote to illustrate this. So my kids, they recently asked me this question: "In our solar system, what planet has the most moons?" And my response was, “Oh, that's really great that you're asking this question. I loved space when I was your age.” Of course, that was like 30 years ago. But I know this! I read an article and the artic
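The fetched snippets are plain strings, and the retrieved chunks further down still contain non-breaking spaces (`\xa0`) carried over from the transcript. A minimal sketch of a join step that also collapses any Unicode whitespace, assuming each item in `raw_data` is a dict with a `"text"` key as in the cell above; `join_snippets` is a hypothetical helper, not part of `youtube_transcript_api`.

```python
def join_snippets(raw_data):
    """Join transcript snippets into one string with normalized whitespace.

    str.split() with no arguments splits on any Unicode whitespace,
    including non-breaking spaces (\\xa0) and newlines, so re-joining
    the words with a single space collapses them.
    """
    return " ".join(
        word
        for item in raw_data
        for word in item["text"].split()
    )


# Hypothetical sample shaped like to_raw_data() output
sample = [
    {"text": "go and retrieve\xa0relevant content.", "start": 0.0, "duration": 2.0},
    {"text": "Combine that with\nthe user's question", "start": 2.0, "duration": 2.5},
]
print(join_snippets(sample))
```

Swapping this in for the list comprehension above would keep the `\xa0` characters out of the chunks, and therefore out of the embeddings and the prompt.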

In [2]:
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}: {chunk.page_content[:100]}...")  # Print first 100 characters of each chunk

Chunk 1: Large language models. They are everywhere. They get some things amazingly right and other things ve...
Chunk 2: And my response was, “Oh, that's really great that you're asking this question. I loved space when I...
Chunk 3: interacting with large language models. They’re LLM challenges. Now, what would have happened if I'd...
Chunk 4: believable. I have not hallucinated or made up an answer. Oh, by the way, I didn't leak personal inf...
Chunk 5: are adding a content store. This could be open like the internet. This can be closed like some colle...
Chunk 6: "No, no, no." "First, go and retrieve relevant content." "Combine that with the user's question and ...
Chunk 7: next time that a user comes and asks the question, we're ready. We just go ahead and retrieve the mo...
Chunk 8: a negative effect as well though, because if the retriever is not sufficiently good to give the larg...
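`chunk_size` and `chunk_overlap` are measured in characters here (the splitter's default length function is `len`). As a rough illustration of what the two numbers mean, the sketch below cuts fixed-size windows that step forward by `chunk_size - chunk_overlap` characters. It is a deliberate simplification, not LangChain's algorithm: `RecursiveCharacterTextSplitter` additionally tries to break on its separators (`"\n\n"`, `"\n"`, `" "`, `""` by default) so that chunks end at natural boundaries. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Simplified sketch: each window starts chunk_size - chunk_overlap
    characters after the previous one, ignoring word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


for chunk in sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each window repeats the last 3 characters of the previous one (`hij`, `opq`, `vwx`). That overlap is what keeps a sentence cut at a chunk boundary visible in both neighbouring chunks.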


In [3]:
from langchain_huggingface import HuggingFaceEmbeddings

# Sentence-embedding model used to vectorize both the chunks and the queries
embeddings = HuggingFaceEmbeddings(
    model_name="intfloat/e5-base-v2",
)
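The model cards for the E5 family (including `intfloat/e5-base-v2`) ask for every input to start with `"query: "` for search queries or `"passage: "` for indexed text. The notebook embeds raw text, which still works but may retrieve less accurately. A minimal stdlib sketch of the convention, assuming the prefixes are applied manually before indexing and before searching; the constant and helper names are hypothetical.

```python
# Prefixes recommended on the intfloat/e5 model cards (assumption: applied manually).
E5_QUERY_PREFIX = "query: "
E5_PASSAGE_PREFIX = "passage: "


def as_e5_passages(texts):
    """Prefix texts that will be stored in the vector index."""
    return [E5_PASSAGE_PREFIX + t for t in texts]


def as_e5_query(question):
    """Prefix the text used to search the index."""
    return E5_QUERY_PREFIX + question


print(as_e5_passages(["Large language models. They are everywhere."]))
print(as_e5_query("What is RAG?"))
```

In the notebook this would mean prefixing each chunk's `page_content` before `FAISS.from_documents` and calling `retriever.invoke(as_e5_query(...))`. The prefix then also appears in the retrieved `page_content`, so strip it before building the prompt.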

In [4]:
from langchain_community.vectorstores import FAISS

# Embed every chunk and store the vectors in an in-memory FAISS index
vector_store = FAISS.from_documents(chunks, embeddings)
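`FAISS.from_documents` embeds every chunk and, by default in recent LangChain versions, stores the vectors in an exact `IndexFlatL2` index, so a search compares the query vector against every stored vector by Euclidean distance. A brute-force sketch of that behaviour with toy 3-dimensional vectors (real e5-base-v2 embeddings have 768 dimensions); `FlatL2Index` is a hypothetical class, not the FAISS API.

```python
import math


class FlatL2Index:
    """Exact nearest-neighbour search by Euclidean distance (brute force)."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vector, payload):
        self.vectors.append(list(vector))
        self.payloads.append(payload)

    def search(self, query, k=5):
        """Return up to k (distance, payload) pairs, closest first."""
        scored = [
            (math.dist(query, v), p)
            for v, p in zip(self.vectors, self.payloads)
        ]
        scored.sort(key=lambda pair: pair[0])  # compare distances only
        return scored[:k]


index = FlatL2Index()
index.add([1.0, 0.0, 0.0], "chunk about retrieval")
index.add([0.0, 1.0, 0.0], "chunk about moons")
index.add([0.9, 0.1, 0.0], "chunk about augmenting prompts")

for distance, text in index.search([1.0, 0.05, 0.0], k=2):
    print(f"{distance:.3f}  {text}")
```

FAISS itself reports squared L2 distances, which produce the same ranking. Exact search is fine at this scale (eight chunks); approximate indexes only start to matter for much larger collections.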

In [5]:
# Return the 5 chunks whose embeddings are closest to the query
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})
result = retriever.invoke("What is RAG?")
result

[Document(id='4bdde9aa-24f7-44a8-8497-000c097c948d', metadata={}, page_content='"No, no, no." "First, go and retrieve\xa0relevant content." "Combine that with the user\'s question and only then generate the\xa0answer." So the prompt now has three parts: the instruction to pay attention to, the retrieved\xa0content, together with the user\'s question. Now give a response. And in fact, now you can give\xa0evidence for why your response was what it was.\xa0\xa0 So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before?\xa0\xa0 So first of all, I\'ll start with the out of\xa0date part. Now, instead of having to retrain your model, if new information comes up, like, hey,\xa0we found some more moons-- now to Jupiter again, maybe it\'ll be Saturn again in the future. All\xa0you have to do is you augment your data store with new information, update information. So now the next time that a user comes and asks the question, we\'re ready. We just go ahead 

In [6]:
# Concatenate the retrieved chunks into one context string
context = "\n\n".join(doc.page_content for doc in result)
context

'"No, no, no." "First, go and retrieve\xa0relevant content." "Combine that with the user\'s question and only then generate the\xa0answer." So the prompt now has three parts: the instruction to pay attention to, the retrieved\xa0content, together with the user\'s question. Now give a response. And in fact, now you can give\xa0evidence for why your response was what it was.\xa0\xa0 So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before?\xa0\xa0 So first of all, I\'ll start with the out of\xa0date part. Now, instead of having to retrain your model, if new information comes up, like, hey,\xa0we found some more moons-- now to Jupiter again, maybe it\'ll be Saturn again in the future. All\xa0you have to do is you augment your data store with new information, update information. So now the next time that a user comes and asks the question, we\'re ready. We just go ahead and retrieve the most up to date information. The second problem, source. Well,

In [7]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import PromptTemplate

In [12]:
prompt = PromptTemplate(
    template='''
    You are a helpful assistant. Use only the context provided to answer the question. if the answer is not present there, say: "I don't know"
    Context: {context}
    Question: {question}
''',
    input_variables=["context", "question"],
)

# Note: `context` was retrieved for "What is RAG?", not for this question
final_prompt = prompt.format(context=context, question="Who is talking in the video?")
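`PromptTemplate` uses Python `str.format`-style placeholders by default (`template_format="f-string"`), so the variables it expects are exactly the `{name}` fields in the template string. A stdlib sketch that lists those fields and fills them, as a quick sanity check on a template before it goes to the model; `TEMPLATE` and `template_variables` are hypothetical names.

```python
from string import Formatter

TEMPLATE = '''
    You are a helpful assistant. Use only the context provided to answer the question. If the answer is not present there, say: "I don't know"
    Context: {context}
    Question: {question}
'''


def template_variables(template):
    """Return the names of the {placeholders} in a str.format template."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # None for plain literal text, "" for a bare "{}"
    }


print(sorted(template_variables(TEMPLATE)))
filled = TEMPLATE.format(context="RAG retrieves content first.", question="What is RAG?")
print(filled)
```

Values substituted into the fields are not parsed again, so a transcript that happens to contain braces is safe. Literal braces inside the template itself would need doubling (`{{`, `}}`).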

In [13]:
final_prompt

'\n    You are a helpful assistant. Use only the context provided to answer the question. if the answer is not present there, say: "I don\'t know"\n    Context: "No, no, no." "First, go and retrieve\xa0relevant content." "Combine that with the user\'s question and only then generate the\xa0answer." So the prompt now has three parts: the instruction to pay attention to, the retrieved\xa0content, together with the user\'s question. Now give a response. And in fact, now you can give\xa0evidence for why your response was what it was.\xa0\xa0 So now hopefully you can see, how does RAG help the two LLM challenges that I had mentioned before?\xa0\xa0 So first of all, I\'ll start with the out of\xa0date part. Now, instead of having to retrain your model, if new information comes up, like, hey,\xa0we found some more moons-- now to Jupiter again, maybe it\'ll be Saturn again in the future. All\xa0you have to do is you augment your data store with new information, update information. So now the n
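The context in `final_prompt` was retrieved for "What is RAG?", while the question put to the model asks who is speaking. In a RAG pipeline the same question usually drives both steps. A minimal sketch of that wiring with stand-in components: `retrieve` is a hypothetical keyword-overlap scorer over plain strings (not the FAISS retriever), and `build_prompt` loosely mirrors the template above. In the notebook, the equivalent is calling `retriever.invoke(question)` with the same `question` later passed to `prompt.format`.

```python
import re


def tokenize(text):
    """Lower-case word tokens (a crude stand-in for embeddings)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(question, chunks, k=2):
    """Hypothetical retriever: rank chunks by words shared with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Retrieve with the SAME question that is then asked of the model."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return (
        "Use only the context provided to answer the question. "
        'If the answer is not present there, say: "I don\'t know"\n'
        f"Context: {context}\n"
        f"Question: {question}\n"
    )


chunks = [
    "My name is Marina Danilevsky. I am a Senior Research Scientist here at IBM Research.",
    "First, go and retrieve relevant content. Combine that with the user's question.",
    "We found some more moons, now to Jupiter again.",
]
print(build_prompt("Which research scientist is speaking?", chunks, k=1))
```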

In [14]:
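# Local Mistral 7B served by Ollama (the model must be pulled first: `ollama pull mistral:7b`)
chat = ChatOllama(
    model="mistral:7b",
    temperature=0.5,
)

answer = chat.invoke(final_prompt)
print(answer)  # full AIMessage; use answer.content for just the text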

content=' The person speaking in the video is Marina Danilevsky, a Senior Research Scientist at IBM Research.' additional_kwargs={} response_metadata={'model': 'mistral:7b', 'created_at': '2025-08-08T14:18:19.1001745Z', 'done': True, 'done_reason': 'stop', 'total_duration': 105552786200, 'load_duration': 6361876700, 'prompt_eval_count': 1087, 'prompt_eval_duration': 95686714000, 'eval_count': 23, 'eval_duration': 3498733700, 'model_name': 'mistral:7b'} id='run--0affe117-ed91-4419-8baf-dcc5db7b374d-0' usage_metadata={'input_tokens': 1087, 'output_tokens': 23, 'total_tokens': 1110}
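The `response_metadata` above reports Ollama's timings in nanoseconds, which shows where the roughly 106 s went: almost all of it was spent evaluating the 1,087-token prompt, not generating the 23-token answer. A small sketch that converts those fields into seconds and tokens per second, using the values copied from the output above; `ollama_timing_summary` is a hypothetical helper.

```python
NS_PER_S = 1_000_000_000

# Values copied from the response_metadata printed above (Ollama durations are in ns).
metadata = {
    "total_duration": 105552786200,
    "load_duration": 6361876700,
    "prompt_eval_count": 1087,
    "prompt_eval_duration": 95686714000,
    "eval_count": 23,
    "eval_duration": 3498733700,
}


def ollama_timing_summary(md):
    """Convert Ollama duration fields to seconds and compute token rates."""
    return {
        "total_s": md["total_duration"] / NS_PER_S,
        "load_s": md["load_duration"] / NS_PER_S,
        "prompt_s": md["prompt_eval_duration"] / NS_PER_S,
        "generate_s": md["eval_duration"] / NS_PER_S,
        "prompt_tok_per_s": md["prompt_eval_count"] / (md["prompt_eval_duration"] / NS_PER_S),
        "generate_tok_per_s": md["eval_count"] / (md["eval_duration"] / NS_PER_S),
    }


for name, value in ollama_timing_summary(metadata).items():
    print(f"{name:>18}: {value:.2f}")
```

With these numbers, prompt evaluation accounts for about 91% of the total time (95.69 s of 105.55 s, roughly 11.4 tokens/s), while generation ran at about 6.6 tokens/s. Trimming the retrieved context (a smaller `k` or `chunk_size`) is therefore likely to speed up this setup more than anything on the generation side.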
