### RAG
* RAG is a way to make a language model smarter by giving it extra information at the time you ask your question.

* In-Context Learning is a core capability of Large Language Models (LLMs) like GPT-3/4, Claude, and Llama, where the model learns to solve a task purely by seeing examples in the prompt, without updating its weights.

* An emergent property is a behaviour or ability that suddenly appears in a system when it reaches a certain scale or complexity—even though it was not explicitly programmed or expected from the individual components.
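
To make in-context learning concrete, here is a minimal sketch of a few-shot prompt builder. The function name `build_few_shot_prompt`, the sentiment task, and the example reviews are all hypothetical and chosen only for illustration; the point is that the "learning" lives entirely in the prompt text, not in any weight update.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: labelled examples first, then the new input."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unlabelled so the model completes the pattern.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical examples, for illustration only.
examples = [
    ("I loved this film.", "positive"),
    ("Dull and far too long.", "negative"),
]
few_shot_prompt = build_few_shot_prompt(examples, "A delightful surprise.")
print(few_shot_prompt)
```

RAG relies on the same mechanism: instead of labelled examples, the prompt carries retrieved context.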

RAG contains 4 important components:
1. Indexing
2. Retrieval
3. Augmentation
4. Generation
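
The four components above can be sketched end to end in plain Python. This is a toy illustration under loud assumptions: word overlap stands in for embedding similarity, the "LLM" is a stub lambda, and every function name here (`tokenize`, `index`, `retrieve`, `augment`, `generate`) is hypothetical rather than part of LangChain.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def index(documents):
    # 1. Indexing: precompute a searchable representation for each document.
    return [(doc, tokenize(doc)) for doc in documents]


def retrieve(indexed, question, k=2):
    # 2. Retrieval: rank documents by word overlap with the question
    #    (a stand-in for embedding similarity).
    q = tokenize(question)
    ranked = sorted(indexed, key=lambda item: len(q & item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def augment(docs, question):
    # 3. Augmentation: merge the retrieved text and the question into one prompt.
    context = "\n\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}"


def generate(prompt, llm):
    # 4. Generation: hand the augmented prompt to the model.
    return llm(prompt)


docs = [
    "FAISS is a library for vector similarity search.",
    "Bananas are rich in potassium.",
    "A retriever returns the chunks most similar to a query.",
]
toy_question = "What does a retriever return?"
toy_index = index(docs)
top_docs = retrieve(toy_index, toy_question, k=1)
toy_prompt = augment(top_docs, toy_question)
# Stub model: a real pipeline would call an actual LLM here.
toy_answer = generate(toy_prompt, llm=lambda p: f"(stub answer from {len(p)} prompt characters)")
print(toy_prompt)
```

The rest of this notebook replaces each toy stage with a real component.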

In [9]:
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_groq import ChatGroq
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv
load_dotenv()


True

### Indexing
* Indexing is the process of preparing your knowledge base so that it can be searched efficiently at query time. This step consists of 4 sub-steps.

### Step 1(a)-Indexing (Document Ingestion)

1. Document Ingestion - You load your source knowledge into memory.

Examples:
* PDF reports, Word documents
* YouTube transcripts, blog pages
* Github repos, internal wikis
* SQL records, scraped webpages

Tools:
* LangChain loaders (PyPDFLoader, YoutubeLoader, WebBaseLoader, GitLoader, etc)
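
As a rough picture of what a loader does, the sketch below reads plain-text files into records that pair the text with source metadata. It uses only the standard library; `load_text_files` and the record layout are assumptions for illustration, not a LangChain API.

```python
from pathlib import Path
import tempfile


def load_text_files(folder):
    """Read every .txt file in `folder` into a record of text plus source metadata."""
    records = []
    for path in sorted(Path(folder).glob("*.txt")):
        records.append({
            "page_content": path.read_text(encoding="utf-8"),
            "metadata": {"source": path.name},
        })
    return records


# Demo on a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text(
        "RAG adds retrieved context to the prompt.", encoding="utf-8"
    )
    loaded = load_text_files(tmp)

print(loaded)
```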

In [10]:
video_id = "Gfr50f6ZBvo" # only the ID, not full URL
try:
    # With no `languages` argument, fetch() requests the English transcript by default
    ytt_api = YouTubeTranscriptApi()
    transcript_list = ytt_api.fetch(video_id)

    # Flatten it to plain text
    transcript = " ".join(chunk.text if hasattr(chunk, "text") else chunk["text"] for chunk in transcript_list)
    print(transcript)

except TranscriptsDisabled:
    print("No captions available for this video.")

the following is a conversation with demus hasabis ceo and co-founder of deepmind a company that has published and builds some of the most incredible artificial intelligence systems in the history of computing including alfred zero that learned all by itself to play the game of gold better than any human in the world and alpha fold two that solved protein folding both tasks considered nearly impossible for a very long time demus is widely considered to be one of the most brilliant and impactful humans in the history of artificial intelligence and science and engineering in general this was truly an honor and a pleasure for me to finally sit down with him for this conversation and i'm sure we will talk many times again in the future this is the lex friedman podcast to support it please check out our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal question am i an ai program you wrote to interview people until i get good enough 

In [11]:
transcript_list

FetchedTranscript(snippets=[FetchedTranscriptSnippet(text='the following is a conversation with', start=0.08, duration=3.44), FetchedTranscriptSnippet(text='demus hasabis', start=1.76, duration=4.96), FetchedTranscriptSnippet(text='ceo and co-founder of deepmind', start=3.52, duration=5.119), FetchedTranscriptSnippet(text='a company that has published and builds', start=6.72, duration=4.48), FetchedTranscriptSnippet(text='some of the most incredible artificial', start=8.639, duration=4.561), FetchedTranscriptSnippet(text='intelligence systems in the history of', start=11.2, duration=4.8), FetchedTranscriptSnippet(text='computing including alfred zero that', start=13.2, duration=3.68), FetchedTranscriptSnippet(text='learned', start=16.0, duration=2.96), FetchedTranscriptSnippet(text='all by itself to play the game of gold', start=16.88, duration=4.559), FetchedTranscriptSnippet(text='better than any human in the world and', start=18.96, duration=5.6), FetchedTranscriptSnippet(text='alph

### Step 1(b)-Indexing (Text Splitting)

2. Text Chunking - Break large documents into small, semantically meaningful chunks.

Why chunk?
* LLMs have finite context windows (e.g., 4K-32K tokens for many earlier models)
* Smaller chunks are more focused -> better semantic search

Tools:
* RecursiveCharacterTextSplitter, MarkdownHeaderTextSplitter, SemanticChunker
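
To show what chunk size and overlap mean, here is a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical helper that only illustrates the sliding window.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


demo_chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

Each chunk begins with the last three characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.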

In [12]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([transcript])

In [13]:
len(chunks)

168

In [14]:
chunks[100]

Document(metadata={}, page_content="and and kind of come up with descriptions of the electron clouds where they're gonna go how they're gonna interact when you put two elements together uh and what we try to do is learn a simulation uh uh learner functional that will describe more chemistry types of chemistry so um until now you know you can run expensive simulations but then you can only simulate very small uh molecules very simple molecules we would like to simulate large materials um and so uh today there's no way of doing that and we're building up towards uh building functionals that approximate schrodinger's equation and then allow you to describe uh what the electrons are doing and all materials sort of science and material properties are governed by the electrons and and how they interact so have a good summarization of the simulation through the functional um but one that is still close to what the actual simulation would come out with so what um how difficult is that to ask w

### Step 1(c)-Indexing (Embedding Generation)

3. Embedding Generation - Convert each chunk into a dense vector (embedding) that captures its meaning.

Why embeddings?
* Similar ideas land close together in vector space
* Allows fast, fuzzy semantic search

Tools:
* OpenAIEmbeddings, SentenceTransformerEmbeddings, InstructorEmbeddings, etc
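
The idea that similar meanings land close together can be illustrated with cosine similarity on hand-made vectors. The three-dimensional vectors below are invented for the example; real embedding models return vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings", for illustration only.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9],
}
sim_cat_kitten = cosine_similarity(toy_vectors["cat"], toy_vectors["kitten"])
sim_cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])
print(sim_cat_kitten > sim_cat_car)
```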

In [15]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

### Step 1(d)-Indexing (Storing in Vector Store)

4. Storage in a Vector Store - Store the vectors along with the original chunk text + metadata in a vector database.

Vector DB options:
* Local: FAISS, Chroma
* Cloud: Pinecone, Weaviate, Milvus, Qdrant

In [16]:
vector_store = FAISS.from_documents(chunks, embeddings)

In [17]:
vector_store.index_to_docstore_id

{0: 'a8cc9069-8a93-4568-a04b-eab4c967c9de',
 1: '80226680-930d-403d-bb6b-42009644b4eb',
 2: '292ebd7c-10c1-4f7b-afb3-15bde1264a08',
 3: '87a06245-6c71-43d3-9375-2f03a84b8342',
 4: '29f7b8d1-7ef3-4f56-967c-ef4474145cfd',
 5: 'aa735a29-a848-4e07-aa24-6113127c4195',
 6: 'bdd7383b-b8b5-4177-81b4-badcfa026edd',
 7: 'f869014c-72f7-4790-9412-afb0bb370641',
 8: 'bc7d0f35-0e65-4966-ba2f-f372e734cfae',
 9: 'd5aed109-d62d-49ae-843a-533a8e853da4',
 10: 'af0a9d64-707c-4a54-86a7-0dd5441c150a',
 11: '42c70165-9a8d-42d5-b3be-b13637c91388',
 12: 'fcf6a53f-aac3-4499-bd1f-866b065bacf6',
 13: '9e2f1bea-38a9-49ae-8525-cdd80920e028',
 14: 'dfeb511f-e86f-4324-ab4e-435e8a1004a4',
 15: 'ec96bff3-55ad-4834-9732-4c96492ad026',
 16: '3708699c-d995-40f9-b60a-d442c63aa1a1',
 17: '55c16d7e-d95d-4aca-9c47-7e8b05dfc098',
 18: '59b54c79-4372-46a2-bb1a-1b1cabbcab19',
 19: '892205a3-2aa5-4ffd-89e1-1d84c6dedaa4',
 20: '2e66b223-42d4-4651-9a02-ba33f96d1b7c',
 21: 'a6684ef0-1a41-4543-b6c3-f5d010d02cd5',
 22: '18f1d48b-7018-

In [18]:
vector_store.get_by_ids(['a8cc9069-8a93-4568-a04b-eab4c967c9de'])

[Document(id='a8cc9069-8a93-4568-a04b-eab4c967c9de', metadata={}, page_content="the following is a conversation with demus hasabis ceo and co-founder of deepmind a company that has published and builds some of the most incredible artificial intelligence systems in the history of computing including alfred zero that learned all by itself to play the game of gold better than any human in the world and alpha fold two that solved protein folding both tasks considered nearly impossible for a very long time demus is widely considered to be one of the most brilliant and impactful humans in the history of artificial intelligence and science and engineering in general this was truly an honor and a pleasure for me to finally sit down with him for this conversation and i'm sure we will talk many times again in the future this is the lex friedman podcast to support it please check out our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal qu
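
The two lookups above (index position to document id, then id to document) can be mimicked with a small in-memory class. This is only a conceptual sketch: `TinyVectorStore` and its methods are hypothetical, and a real FAISS index stores vectors in an optimized structure rather than a Python dict.

```python
import uuid


class TinyVectorStore:
    """Toy store: keeps each vector with its chunk text and metadata."""

    def __init__(self):
        self.index_to_id = {}  # insertion position -> document id
        self.records = {}      # document id -> vector, text, metadata

    def add(self, vector, text, metadata=None):
        doc_id = str(uuid.uuid4())
        self.index_to_id[len(self.index_to_id)] = doc_id
        self.records[doc_id] = {
            "vector": list(vector),
            "text": text,
            "metadata": metadata or {},
        }
        return doc_id

    def get_by_id(self, doc_id):
        return self.records[doc_id]


store = TinyVectorStore()
first_id = store.add([0.1, 0.9], "first chunk", {"source": "demo"})
second_id = store.add([0.8, 0.2], "second chunk")
print(store.index_to_id)
```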

### Step 2-Retrieval

* Retrieval is the real-time process of finding the most relevant pieces of information from a pre-built index (created during indexing) based on the user’s question.

It's like asking:

> "From all the knowledge I have, which 3-5 chunks are most helpful to answer this query?"
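
Conceptually, answering that question is a top-k ranking problem. The sketch below scores toy chunk vectors against a query vector and keeps the best `k`. The dot-product score, the `top_k` helper, and the two-dimensional vectors are assumptions for illustration; the metric a real vector store uses depends on how it is configured.

```python
import heapq


def top_k(query_vector, chunk_vectors, k):
    """Return the ids of the k chunks whose vectors score highest against the query."""
    def score(item):
        _, vector = item
        return sum(q * v for q, v in zip(query_vector, vector))

    best = heapq.nlargest(k, chunk_vectors.items(), key=score)
    return [chunk_id for chunk_id, _ in best]


# Made-up vectors, for illustration only.
toy_chunks = {
    "fusion": [0.9, 0.1],
    "chess": [0.1, 0.9],
    "plasma": [0.8, 0.3],
}
nearest = top_k([1.0, 0.0], toy_chunks, k=2)
print(nearest)
```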

In [19]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [20]:
retriever

VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x00000235F2D34C20>, search_kwargs={'k': 4})

In [21]:
retriever.invoke('What is deepmind')

[Document(id='b2586c82-d30d-4ab1-83a3-0260e8378e98', metadata={}, page_content="and how it works this is tough to uh ask you this question because you probably will say it's everything but let's let's try let's try to think to this because you're in a very interesting position where deepmind is the place of some of the most uh brilliant ideas in the history of ai but it's also a place of brilliant engineering so how much of solving intelligence this big goal for deepmind how much of it is science how much is engineering so how much is the algorithms how much is the data how much is the hardware compute infrastructure how much is it the software computer infrastructure yeah um what else is there how much is the human infrastructure and like just the humans interact in certain kinds of ways in all the space of all those ideas how much does maybe like philosophy how much what's the key if um uh if if you were to sort of look back like if we go forward 200 years look back what was the key 

### Step 3-Augmentation
* Augmentation refers to the step where the retrieved documents (chunks of relevant context) are combined with the user’s query to form a new, enriched prompt for the LLM.
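
Stripped of any framework, augmentation is string assembly. The sketch below does it with `str.format`; `TEMPLATE` and `build_prompt` are hypothetical names, and the wording simply mirrors the kind of instructions used in this notebook.

```python
TEMPLATE = (
    "Answer ONLY from the provided transcript context.\n"
    "If the context is insufficient, just say you don't know.\n\n"
    "{context}\n"
    "Question: {question}"
)


def build_prompt(chunks, question):
    """Join the retrieved chunks and place them, with the question, into the template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


example_prompt = build_prompt(["chunk one", "chunk two"], "What is discussed?")
print(example_prompt)
```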

In [22]:
llm = ChatGroq(model="qwen/qwen3-32b", temperature=0.2)

In [23]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided transcript context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)

In [26]:
question = "is the topic of nuclear fusion discussed in this video? if yes then what was discussed"
retrieved_docs = retriever.invoke(question)

In [27]:
retrieved_docs

[Document(id='29389d67-dd45-4211-aa6f-9a0cba601303', metadata={}, page_content="in this case in fusion we we collaborated with epfl in switzerland the swiss technical institute who are amazing they have a test reactor that they were willing to let us use which you know i double checked with the team we were going to use carefully and safely i was impressed they managed to persuade them to let us use it and um and it's a it's an amazing test reactor they have there and they try all sorts of pretty crazy experiments on it and um the the the what we tend to look at is if we go into a new domain like fusion what are all the bottleneck problems uh like thinking from first principles you know what are all the bottleneck problems that are still stopping fusion working today and then we look at we you know we get a fusion expert to tell us and then we look at those bottlenecks and we look at the ones which ones are amenable to our ai methods today yes right and and and then and would be intere

In [28]:
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
context_text

"in this case in fusion we we collaborated with epfl in switzerland the swiss technical institute who are amazing they have a test reactor that they were willing to let us use which you know i double checked with the team we were going to use carefully and safely i was impressed they managed to persuade them to let us use it and um and it's a it's an amazing test reactor they have there and they try all sorts of pretty crazy experiments on it and um the the the what we tend to look at is if we go into a new domain like fusion what are all the bottleneck problems uh like thinking from first principles you know what are all the bottleneck problems that are still stopping fusion working today and then we look at we you know we get a fusion expert to tell us and then we look at those bottlenecks and we look at the ones which ones are amenable to our ai methods today yes right and and and then and would be interesting from a research perspective from our point of view from an ai point of\n\

In [29]:
final_prompt = prompt.invoke({"context": context_text, "question": question})

In [30]:
final_prompt

StringPromptValue(text="\n      You are a helpful assistant.\n      Answer ONLY from the provided transcript context.\n      If the context is insufficient, just say you don't know.\n\n      in this case in fusion we we collaborated with epfl in switzerland the swiss technical institute who are amazing they have a test reactor that they were willing to let us use which you know i double checked with the team we were going to use carefully and safely i was impressed they managed to persuade them to let us use it and um and it's a it's an amazing test reactor they have there and they try all sorts of pretty crazy experiments on it and um the the the what we tend to look at is if we go into a new domain like fusion what are all the bottleneck problems uh like thinking from first principles you know what are all the bottleneck problems that are still stopping fusion working today and then we look at we you know we get a fusion expert to tell us and then we look at those bottlenecks and we 

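### Step 4-Generation
* Generation is the final step, where the LLM receives the augmented prompt and produces the answer.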

In [31]:
answer = llm.invoke(final_prompt)
print(answer.content)

<think>
Okay, let's see. The user is asking if the topic of nuclear fusion is discussed in the video and what was discussed if it is. The provided transcript has a lot of talk about fusion, so I need to parse through that.

First, the speaker mentions collaborating with EPFL in Switzerland, a technical institute, on a test reactor. They talk about using AI methods to address bottleneck problems in fusion. They mention a Nature paper from last year where they held plasma in specific shapes, like "carving the plasma into different shapes" and controlling it for a record time. They also reference working with fusion startups and discussing problems like the fractional electron problem in density functionals. There's mention of using AI controllers for magnetic fields and simulators for plasma behavior. The speaker refers to solving issues in fusion, such as containing plasma, which is a known challenge. They also touch on using reinforcement learning for prediction problems in fusion. 

S
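
The response above begins with a `<think>` block containing the model's reasoning. If only the final answer is wanted, one option is to strip that block after generation. The helper below is a minimal sketch, assuming the reasoning is always wrapped in a matching `<think>...</think>` pair; `strip_think` is a hypothetical name and the sample string is invented.

```python
import re


def strip_think(text):
    """Remove any <think>...</think> block and surrounding whitespace."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


# Invented sample response, for illustration only.
raw = "<think>\nThe user asks about fusion.\n</think>\n\nYes, fusion is discussed."
clean = strip_think(raw)
print(clean)
```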

### Building a Chain

In [32]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [33]:
def format_docs(retrieved_docs):
    """Join the page content of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [34]:
parallel_chain = RunnableParallel({
    'context': retriever | RunnableLambda(format_docs),
    'question': RunnablePassthrough()
})

In [35]:
parallel_chain.invoke('who is Demis')

{'context': "demas establish to support this podcast please check out our sponsors in the description and now let me leave you with some words from edskar dykstra computer science is no more about computers than astronomy is about telescopes thank you for listening and hope to see you next time\n\nout our sponsors in the description and now dear friends here's demis hassabis let's start with a bit of a personal question am i an ai program you wrote to interview people until i get good enough to interview you well i'll be impressed if if you were i'd be impressed by myself if you were i don't think we're quite up to that yet but uh maybe you're from the future lex if you did would you tell me is that is that a good thing to tell a language model that's tasked with interviewing that it is in fact um ai maybe we're in a kind of meta turing test uh probably probably it would be a good idea not to tell you so it doesn't change your behavior right this is a kind of heisenberg uncertainty pri

In [36]:
parser = StrOutputParser()

In [37]:
main_chain = parallel_chain | prompt | llm | parser

In [38]:
main_chain.invoke('Can you summarize the video')

'<think>\nOkay, let\'s see. The user wants a summary of the video based on the provided transcript. First, I need to parse through the transcript to identify the main points.\n\nThe transcript starts with someone discussing how to explain a complex topic, maybe physics, and mentions the standard model not working but being added to. There\'s a mention of a more fundamental explanation, possibly simpler, and the idea of the universe as information or a simulation. Then there\'s a part about computer science and simulations, referencing Edskar Dykstra\'s quote about computer science and telescopes. Later, there\'s a discussion on language as a tool for generalization across tasks and modalities of understanding.\n\nI need to extract the key themes: the search for a deeper understanding of physics beyond the standard model, the concept of information as fundamental, simulation theories, the role of language in communication and problem-solving, and the connection between computer science 