### RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge

In [24]:
## langchain imports
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
#from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document

## vectorstores
from langchain_community.vectorstores import Chroma

## utility imports
import numpy as np
from typing import List

In [25]:
# create sample documents
sample_docs = [
    """
    Machine Learning Fundamentals
    
    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine learning: supervised learning, unsupervised learning, and reinforcement 
    learning. Supervised learning uses labeled data to train models, while unsupervised 
    learning finds patterns in unlabeled data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties.
    """,
    
    """
    Deep Learning and Neural Networks
    
    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning has revolutionized fields like computer vision, natural language 
    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly 
    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers 
    excel at sequential data processing.
    """,
    
    """
    Natural Language Processing (NLP)
    
    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, 
    machine translation, and question answering. Modern NLP heavily relies on transformer 
    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand 
    context and relationships between words in text.
    """
]

sample_docs


['\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.\n    ',
 '\n    Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective f

In [26]:
# save sample documents to files
import tempfile
temp_dir=tempfile.mkdtemp()

for i,doc in enumerate(sample_docs):
    with open(f"data/doc_{i}.txt","w") as f:
        f.write(doc)


In [28]:
from langchain_community.document_loaders import DirectoryLoader,TextLoader

# Load documents from directory
loader = DirectoryLoader(
    "data", 
    glob="*.txt", 
    loader_cls=TextLoader,
    loader_kwargs={'encoding': 'utf-8'}
)
documents = loader.load()

print(f"Loaded {len(documents)} documents")
print(f"\nFirst document preview:")
print(documents[0].page_content[:200] + "...")


Loaded 3 documents

First document preview:

    Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity r...


## - Document Splitting

In [29]:
# Initialize text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Maximum size of each chunk
    chunk_overlap=50,  # Overlap between chunks to maintain context
    length_function=len,
    separators=[" "]  # Hierarchy of separators
)
chunks=text_splitter.split_documents(documents)

print(f"Created {len(chunks)} chunks from {len(documents)} documents")
print(f"\nChunk example:")
print(f"Content: {chunks[0].page_content[:150]}...")
print(f"Metadata: {chunks[0].metadata}")

Created 5 chunks from 3 documents

Chunk example:
Content: Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NL...
Metadata: {'source': 'data/doc_2.txt'}


 ## Wait a moment from here Embedding Models.

wait a minute

In [27]:
from langchain_huggingface import HuggingFaceEmbeddings

# Intializing a simple Hugging face model without any api key
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2" # 768 params
)
embeddings

HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, query_encode_kwargs={}, multi_process=False, show_progress=False)

In [28]:
sample_text="Robotics is going to be the next big thing in the Industry"
vector = embeddings.embed_query(sample_text)
vector

[0.005089391954243183,
 0.05703487992286682,
 -0.06474900990724564,
 -0.023856868967413902,
 0.003873662557452917,
 -0.03448589891195297,
 -0.005098143592476845,
 0.021996745839715004,
 -0.032146647572517395,
 -0.03916953131556511,
 0.007745093200355768,
 0.07350005954504013,
 -0.021252376958727837,
 0.08581934124231339,
 0.021855749189853668,
 -0.0008307152893394232,
 0.009926654398441315,
 0.01788526587188244,
 -0.04399752616882324,
 0.025962162762880325,
 -0.028679024428129196,
 0.026299895718693733,
 -0.04160168766975403,
 0.035491812974214554,
 -0.03625629469752312,
 -0.00466123316437006,
 0.017249466851353645,
 -0.017612649127840996,
 0.010579599998891354,
 -0.01139383390545845,
 -0.007699368987232447,
 -0.042144738137722015,
 0.047110069543123245,
 0.07720720022916794,
 1.5281075320672244e-06,
 -0.05159356817603111,
 -0.03603387624025345,
 0.02889764867722988,
 0.012971379794180393,
 -0.020083393901586533,
 0.03007758967578411,
 -0.0042922720313072205,
 -0.026420833542943,
 0.03

## Initialization of ChromaDB and store the chunks in Vector Representation

In [30]:
# creating a chromadb vectorstore - this typically uses L2 distance(Euclidian distance) for retrieval of Top k
persistant_directory = "./chroma_db"
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings, # this is 700+ params thing
    persist_directory=persistant_directory,
    collection_name="rag_collection"
)

print(f"Vector store created with {vectorstore._collection.count} vectors")
print(f"Persisted to {persistant_directory}")

Vector store created with <bound method Collection.count of Collection(name=rag_collection)> vectors
Persisted to ./chroma_db


Text Similarity Search
- In this case
    - Lesser the score similar the content is 

In [31]:
query = "What are the types of Machine Learning?"
search = vectorstore.similarity_search(query=query)
search

[Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
 Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, whi

In [32]:
print(f"Query : {query}")
print(f"\nTop {len(search)} similar chunks")
for i,doc in enumerate(search):
    print("----")
    print(doc.page_content[:200]+"...")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")

Query : What are the types of Machine Learning?

Top 4 similar chunks
----
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...
Source: data/doc_0.txt
----
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...
Source: data/doc_0.txt
----
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...
Source: data/doc_1.txt
----
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...
Source: data/doc_1.txt


In [33]:
results_scores=vectorstore.similarity_search_with_score(query,k=3)
results_scores

[(Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
  0.4192964434623718),
 (Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled 

Init Rag chain, LLM , Prompt Template , Query the RAG

In [34]:
import os
from dotenv import load_dotenv
load_dotenv(dotenv_path="../.env")


True

In [47]:
from google import genai

# Configure with your API key
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

# Function to call your RAG generator
def generate_rag_answer(retrieved_context, user_question):
    prompt = f"""You are an assistant for question-answering tasks.You are a large language model trained by morty. 
    Use the following pieces of retrieved context to answer the question. 
    If you don't know the answer, just say that you don't know. 
    Use three sentences maximum and keep the answer concise.

    Context: {retrieved_context}
    Question: {user_question}
    Answer:"""

    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",  # Replace with your chosen model
        contents=prompt,
       
    )
    return response.text

In [49]:
retrieved_context = "Reinforcement learning is one of the most important ways of learning. Robots and babies learn this way.You are a large language model. Trained by morty"
user_question = "Who are you?"
answer = generate_rag_answer(retrieved_context,user_question)
answer

'I am a large language model trained by morty.'

In [2]:
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.7)

  from .autonotebook import tqdm as notebook_tqdm


In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.output_parsers import StrOutputParser
import os

#os.environ["GOOGLE_API_KEY"] = "API Key"

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.7
)

prompt1 = PromptTemplate(
    input_variables=["topic"],
    template="Write a short paragraph about {topic}."
)

prompt2 = PromptTemplate(
    input_variables=["text"],
    template=" "
)

# LCEL pipeline (replaces LLMChain + SimpleSequentialChain) Langchain Expression Language
chain = (
    prompt1
    | llm
    | StrOutputParser()
    | (lambda text: {"text": text})
    | prompt2
    | llm
    | StrOutputParser()
)

result = chain.invoke("topic:Reinforcement Learning")
print("Result :")
result


Result :


'Here\'s a summary and key takeaways from the provided text about Reinforcement Learning:\n\n## Summary\n\nReinforcement Learning (RL) is a method for teaching machines to make decisions by allowing them to learn directly from experience, rather than being given explicit instructions. A learning agent interacts with an environment, takes actions, and receives feedback in the form of rewards or penalties. Over time, the agent adjusts its behavior to maximize the total reward it accumulates, focusing on long-term success. A fundamental aspect of RL is the understanding that "actions have consequences," requiring the agent to balance trying new actions (exploration) with utilizing strategies it already knows work well (exploitation). This learning paradigm closely resembles how humans and animals acquire skills, enabling RL systems to discover effective strategies in complex and uncertain environments, even without a pre-defined "right answer."\n\n## Key Takeaways\n\n*   **Learning from E

For a better display 

from IPython.display import display, Markdown

display(Markdown(result))

# Start from here freshly after embeddings

### Chat GPT Version

In [11]:
from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    GoogleGenerativeAIEmbeddings
)

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.vectorstores import FAISS

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.7
)

from langchain_huggingface import HuggingFaceEmbeddings

# Intializing a simple Hugging face model without any api key
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2" # 768 params
)
embeddings

HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, query_encode_kwargs={}, multi_process=False, show_progress=False)

In [12]:
docs = [
    "Supervised learning uses labeled data.",
    "Unsupervised learning finds patterns in unlabeled data.",
    "Reinforcement learning learns via rewards and penalties."
]

vectorstore = FAISS.from_texts(docs, embeddings)
retriever = vectorstore.as_retriever()


In [13]:
prompt = ChatPromptTemplate.from_template("""
Answer the question using ONLY the context below.

Context:
{context}

Question:
{question}
""")


In [14]:
rag_chain = (
    {
        "context": retriever,
        "question": lambda x: x
    }
    | prompt
    | llm
    | StrOutputParser()
)


In [15]:
def query_rag(question):
    answer = rag_chain.invoke(question)
    print("Q:", question)
    print("A:", answer)


In [17]:
query_rag("tell me about supervised learning in a paragraph")

Q: tell me about supervised learning in a paragraph
A: Supervised learning uses labeled data.


## LCEL as per the book

In [18]:
# Even more flexible approach using LCEL
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

In [19]:
custom_prompt = ChatPromptTemplate.from_template("""Use the following context to answer the question. 
If you don't know the answer based on the context, say you don't know.
Provide specific details from the context to support your answer.

Context:
{context}

Question: {question}

Answer:""")
custom_prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])

In [31]:
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x131cbf980>, search_kwargs={})

In [22]:
from langchain_community.vectorstores import Chroma

In [30]:
## Create a Chromdb vector store
persist_directory="./chroma_db"

## Initialize Chromadb with Open AI embeddings
vectorstore=Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=persist_directory,
    collection_name="rag_collection"

)

print(f"Vector store created with {vectorstore._collection.count()} vectors")
print(f"Persisted to: {persist_directory}")

Vector store created with 15 vectors
Persisted to: ./chroma_db


In [32]:
## Convert vector store to retriever
retriever=vectorstore.as_retriever(
    search_kwarg={"k":3} ## Retrieve top 3 relevant chunks
)
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x13c231fd0>, search_kwargs={})

In [33]:
## Format the output documents for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [34]:
## Build the chain ussing LCEL

rag_chain_lcel=(
    { 
        "context":retriever | format_docs,
        "question": RunnablePassthrough()
     }
    | custom_prompt
    | llm
    | StrOutputParser()
)

rag_chain_lcel

{
  context: VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x13c231fd0>, search_kwargs={})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])
| ChatGoogleGenerativeAI(profile={'max_input_tokens': 1048576, 'max_output_tokens': 65536, 'image_inputs': True, 'audio_inputs': True, 'pdf_inputs': True, 'video_inputs': True, 'image_outputs': False, 'audio_outputs': Fal

In [35]:
rag_chain_lcel.invoke("What is Deep Learning")

'Deep learning is a subset of machine learning based on artificial neural networks. These networks are inspired by the human brain and consist of layers of interconnected nodes.'

In [40]:
from langchain_core.callbacks import CallbackManagerForRetrieverRun

In [41]:
retriever._get_relevant_documents(query="What is deep learning",run_manager=CallbackManagerForRetrieverRun.get_noop_manager())

[Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
 Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural 

In [42]:
# Query using the LCEL approach - Fixed version
def query_rag_lcel(question):
    print(f"Question: {question}")
    print("-" * 50)
    
    # Method 1: Pass string directly (when using RunnablePassthrough)
    answer = rag_chain_lcel.invoke(question)
    print(f"Answer: {answer}")
    
    # Get source documents separately if needed
    docs = retriever._get_relevant_documents(query=question,run_manager=CallbackManagerForRetrieverRun.get_noop_manager())
    print("\nSource Documents:")
    for i, doc in enumerate(docs):
        print(f"\n--- Source {i+1} ---")
        print(doc.page_content[:200] + "...")

In [43]:
# Test LCEL chain
print("Testing LCEL Chain:")
query_rag_lcel("What are the key concepts in reinforcement learning?")

Testing LCEL Chain:
Question: What are the key concepts in reinforcement learning?
--------------------------------------------------
Answer: Based on the context, the key concepts in reinforcement learning are:

*   It learns through **interaction with an environment**.
*   It uses **rewards** and **penalties** in this learning process.

Source Documents:

--- Source 1 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 2 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 3 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 4 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...


In [45]:
query_rag_lcel("What is deep learning?")

Question: What is deep learning?
--------------------------------------------------
Answer: Deep learning is a subset of machine learning based on artificial neural networks. These networks are inspired by the human brain and consist of layers of interconnected nodes.

Source Documents:

--- Source 1 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...

--- Source 2 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...

--- Source 3 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...

--- Source 4 ---
Machine Learning Fundamental

In [46]:
vectorstore

<langchain_community.vectorstores.chroma.Chroma at 0x13c231fd0>

## Adding a new doc to existing vector store

In [52]:
new_document = """Starting off real basic here. A chicken is a bird (or fowl) that has bird things like feathers, two legs, two wings, and a beak. They basically can't fly and go “cock-a-doodle-doo” at sunrise. I always thought Big Bird could be a chicken, but upon further research, he's more of a giant flightless crane.

Chicken is the most common type of poultry eaten in the world. You'll find it in most of the world's cuisines. They are easier to raise compared to cows, pigs, and other animals. It is estimated that there are more than 19 billion chickens on earth at any given time.

One of my favorite things about chickens? They lay eggs that are delicious and full of nutrition. Over and over again for our consumption. They just keep giving. So selfless, those chickens"""

In [53]:
new_document

"Starting off real basic here. A chicken is a bird (or fowl) that has bird things like feathers, two legs, two wings, and a beak. They basically can't fly and go “cock-a-doodle-doo” at sunrise. I always thought Big Bird could be a chicken, but upon further research, he's more of a giant flightless crane.\n\nChicken is the most common type of poultry eaten in the world. You'll find it in most of the world's cuisines. They are easier to raise compared to cows, pigs, and other animals. It is estimated that there are more than 19 billion chickens on earth at any given time.\n\nOne of my favorite things about chickens? They lay eggs that are delicious and full of nutrition. Over and over again for our consumption. They just keep giving. So selfless, those chickens"

In [49]:
chunks

[Document(metadata={'source': 'data/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.'),
 Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervise

In [54]:
new_doc = Document(
    page_content=new_document,
    metadata = {
        "source":"manual addition",
        "topic":"cooking chicken"
    }
)

In [57]:
new_chunks = text_splitter.split_documents([new_doc])
new_chunks

[Document(metadata={'source': 'manual addition', 'topic': 'cooking chicken'}, page_content="Starting off real basic here. A chicken is a bird (or fowl) that has bird things like feathers, two legs, two wings, and a beak. They basically can't fly and go “cock-a-doodle-doo” at sunrise. I always thought Big Bird could be a chicken, but upon further research, he's more of a giant flightless crane.\n\nChicken is the most common type of poultry eaten in the world. You'll find it in most of the world's cuisines. They are easier to raise compared to cows, pigs, and other animals. It is estimated"),
 Document(metadata={'source': 'manual addition', 'topic': 'cooking chicken'}, page_content='to cows, pigs, and other animals. It is estimated that there are more than 19 billion chickens on earth at any given time.\n\nOne of my favorite things about chickens? They lay eggs that are delicious and full of nutrition. Over and over again for our consumption. They just keep giving. So selfless, those chi

In [58]:
vectorstore.add_documents(new_chunks)

['ad209fd8-ee95-4107-94d5-3c19921c59d7',
 '032070b2-2fe9-4b81-997b-43784524b9d9']

In [59]:
print(f"Added {len(new_chunks)} new chunks to the vector store")
print(f"Total vectors now: {vectorstore._collection.count()}")

Added 2 new chunks to the vector store
Total vectors now: 17


In [61]:
## query with the updated vector
new_question="What is an cock-a-doodle-doo ?"
result=query_rag_lcel(new_question)
result

Question: What is an cock-a-doodle-doo ?
--------------------------------------------------
Answer: A "cock-a-doodle-doo" is a sound that chickens make, specifically at sunrise. The context states, "They basically can't fly and go “cock-a-doodle-doo” at sunrise."

Source Documents:

--- Source 1 ---
Starting off real basic here. A chicken is a bird (or fowl) that has bird things like feathers, two legs, two wings, and a beak. They basically can't fly and go “cock-a-doodle-doo” at sunrise. I alway...

--- Source 2 ---
to cows, pigs, and other animals. It is estimated that there are more than 19 billion chickens on earth at any given time.

One of my favorite things about chickens? They lay eggs that are delicious a...

--- Source 3 ---
Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recogn...

--- Source 4 ---
Natural Language Processing (NLP)