Install libraries

RAG Stage: Environment Setup
We install the libraries needed to load documents, split them into chunks, convert the chunks into embeddings (vectors), store those vectors in a vector database, and run an LLM locally for question answering (Q/A).

In [None]:
!pip install langchain langchain-community chromadb sentence-transformers transformers accelerate beautifulsoup4


Load website

RAG Stage: Document Loading (Knowledge Source Collection)
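We load a public webpage from the official Python documentation (the "An Informal Introduction to Python" tutorial). The loader downloads the page, extracts its visible text, and returns a list of Document objects (docs), which will later be chunked and embedded. The `#lists` fragment in the URL is not sent to the server, so the whole page is loaded, not just the Lists section.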

In [None]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(web_paths=["https://docs.python.org/3/tutorial/introduction.html#lists"])
docs = loader.load()

print("Pages loaded:", len(docs))
print("\nPreview:\n", docs[0].page_content[:300])


Split into chunks

RAG Stage: Chunking (Text Preprocessing for Retrieval)
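A whole webpage is hard to embed and search as a single unit: embedding models accept limited input, and one vector for an entire page blurs many topics together. So we split the page text into smaller, overlapping chunks (at most 500 characters each, with 50 characters of overlap). Each chunk then holds a focused, searchable piece of information, while the overlap preserves continuity across chunk boundaries.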

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

splits = splitter.split_documents(docs)

print("Total chunks created:", len(splits))
print("\nExample chunk:\n", splits[0].page_content[:200])
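To illustrate what `chunk_size` and `chunk_overlap` mean, here is a minimal fixed-window character splitter in plain Python. It is a simplified stand-in, not LangChain's algorithm: `RecursiveCharacterTextSplitter` also tries to break on paragraph, line, and word boundaries first, so its chunks vary in length. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text near a
    boundary appears in both neighbouring chunks.
    (Hypothetical helper: a fixed-window sketch, not LangChain's splitter.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


page = "".join(chr(ord("a") + i % 26) for i in range(1000))  # 1,000 dummy characters
pieces = chunk_text(page)
print(len(pieces), [len(p) for p in pieces])
```

With 1,000 characters, the windows start at positions 0, 450, and 900, so the last chunk is shorter than the others.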

Load embedding model

RAG Stage: Embedding Creation (Convert text → meaning vectors)
We load a lightweight open embedding model from Sentence Transformers (all-MiniLM-L6-v2). It converts each text chunk into a 384-dimensional numerical vector that captures the chunk's semantic meaning. During retrieval, the similarity search compares these vectors.

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

print("Embedding model loaded!")
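Similarity search compares embedding vectors with a similarity or distance measure; cosine similarity (the cosine of the angle between two vectors) is a common choice. A minimal pure-Python sketch follows. The three-dimensional vectors are made-up toy values, whereas all-MiniLM-L6-v2 produces 384-dimensional ones.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return cos(angle) between a and b: 1 = same direction, 0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, not real model output).
lists_chunk = [0.9, 0.1, 0.0]
lists_question = [0.8, 0.2, 0.1]
strings_chunk = [0.0, 0.2, 0.9]

print(round(cosine_similarity(lists_question, lists_chunk), 3))
print(round(cosine_similarity(lists_question, strings_chunk), 3))
```

The question vector points in nearly the same direction as the "lists" chunk, so it scores much higher than the unrelated chunk.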


Store vectors in ChromaDB + Initialize Retriever

RAG Stage: Indexing & Retrieval Setup
We embed every text chunk and store the resulting vectors in ChromaDB (in memory here, because no `persist_directory` is passed). We then create a retriever, a semantic search interface that will fetch the most relevant chunks for any user question.

In [None]:
from langchain_community.vectorstores import Chroma

vector_db = Chroma.from_documents(
    documents=splits,
    embedding=embedding_model
)

retriever = vector_db.as_retriever()

print("Vector database created!")
print("Total vectors stored:", vector_db._collection.count())
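Conceptually, `Chroma.from_documents` stores one vector per chunk, and the retriever ranks the stored vectors against the query vector and returns the top k. LangChain's retriever returns 4 results by default; passing `search_kwargs={"k": 3}` to `as_retriever` changes that. The sketch below is a toy brute-force version of the idea. `TinyVectorStore` is hypothetical, and Chroma itself uses an approximate nearest-neighbour (HNSW) index with L2 distance by default rather than a linear cosine scan.

```python
import heapq
import math
from dataclasses import dataclass, field


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class TinyVectorStore:
    """Hypothetical brute-force store: keeps (vector, text) pairs, scans all on search."""
    vectors: list[list[float]] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)

    def add(self, vector: list[float], text: str) -> None:
        self.vectors.append(vector)
        self.texts.append(text)

    def count(self) -> int:
        return len(self.texts)

    def search(self, query: list[float], k: int = 4) -> list[str]:
        """Return the texts of the k stored vectors most similar to query."""
        scored = ((_cosine(query, v), t) for v, t in zip(self.vectors, self.texts))
        return [t for _, t in heapq.nlargest(k, scored, key=lambda pair: pair[0])]


# Toy vectors and texts (made up for illustration).
store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], "Lists are written as comma-separated values in brackets.")
store.add([0.1, 0.9, 0.0], "Strings can be enclosed in single or double quotes.")
store.add([0.0, 0.2, 0.9], "The while loop runs as long as its condition is true.")

print(store.count())
print(store.search([0.8, 0.2, 0.1], k=1))
```

A linear scan is fine for a handful of chunks; index structures such as HNSW exist so that search stays fast as the collection grows.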


Retrieve context for a question

RAG Stage: Retrieval (Search by meaning and fetch relevant context)
We pass in a question. The retriever embeds it with the same embedding model used for the chunks and finds the stored vectors most similar to it. It returns the top matching chunks (4 by default) as Document objects, which we print to check the context the answer will be grounded in.

In [None]:
question = "What are Python lists?"

retrieved_docs = retriever.invoke(question)

print("Retrieved knowledge chunks:\n")
for i, doc in enumerate(retrieved_docs):
    print(i+1, "->", doc.page_content[:150], "\n")
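The key detail in this step is that the question is embedded with the same function as the chunks; vectors from different models are not comparable. The sketch below makes that flow visible using a hypothetical bag-of-words "embedding" (word counts over a tiny fixed vocabulary) as a stand-in for all-MiniLM-L6-v2. Unlike a real embedding model, it only matches exact words, not meaning. `VOCAB`, `toy_embed`, and `retrieve` are illustrative names.

```python
import math
import re

# Hypothetical toy vocabulary; a real model learns its own representation.
VOCAB = ["list", "lists", "items", "brackets", "string", "quotes", "loop", "while"]


def toy_embed(text: str) -> list[float]:
    """Count how often each vocabulary word occurs in text (lower-cased)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat all-zero vectors as unrelated


def retrieve(question: str, chunks: list[str], k: int = 4) -> list[str]:
    """Embed the question with the SAME function as the chunks, then rank chunks."""
    q_vec = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, toy_embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Strings can be enclosed in single quotes or double quotes.",
    "Python lists are written as comma-separated items between square brackets.",
    "The while loop executes as long as the condition remains true.",
]
for i, chunk in enumerate(retrieve("What are Python lists?", chunks, k=2), start=1):
    print(i, "->", chunk)
```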

Load the LLM

RAG Stage: LLM Setup (Load free open Q/A-capable model)
We now load a small open-source model that was instruction-tuned for Q/A and prompt following. flan-t5-base is fully open (not gated), fast, and follows instruction-style prompts better than GPT-2. It will generate answers from our augmented prompt.

In [None]:
from transformers import pipeline

llm = pipeline(
    "text2text-generation",
    model="google/flan-t5-base",
    max_new_tokens=150
)

print("LLM loaded!")


Augment prompt and generate answer

RAG Stage: Augmented Generation (Insert retrieved context into prompt and answer)
We join the retrieved chunks into one text block, build a prompt that contains only that context and the question, and pass it to the LLM. Because the prompt tells the model to answer only from the context, the answer should be grounded in the webpage text rather than in the model's general training data.

In [None]:
context = "\n".join([doc.page_content for doc in retrieved_docs])

prompt = f"""
Answer the question using only the context below.

Context:
{context}

Question: {question}

Answer:
"""

print("Prompt sent to LLM:\n", prompt)

response = llm(prompt)
print("\nFinal Answer:\n", response[0]["generated_text"])
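flan-t5-base's tokenizer reports a maximum input length of 512 tokens, and four retrieved chunks of up to 500 characters each, plus the instructions, can come close to that. One simple safeguard, sketched below, is to add chunks in rank order until a character budget is used up. The `build_prompt` name, the `max_chars=1600` budget, and the rough rule of thumb of about four characters per English token are assumptions to tune; counting tokens with the model's own tokenizer would be more precise.

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 1600) -> str:
    """Build the notebook's prompt, keeping only as many top-ranked chunks as fit.

    chunks must already be sorted best-first (as the retriever returns them).
    max_chars is a hypothetical budget (~400 tokens at ~4 characters per token).
    """
    kept, used = [], 0
    for chunk in chunks:
        extra = len(chunk) + (1 if kept else 0)  # +1 for the joining newline
        if used + extra > max_chars:
            break  # stop at the first chunk that does not fit, preserving rank order
        kept.append(chunk)
        used += extra
    context = "\n".join(kept)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


prompt = build_prompt("What are Python lists?", ["a" * 500] * 4)
print(len(prompt))
```

In the notebook it would be called as `build_prompt(question, [doc.page_content for doc in retrieved_docs])`. Dropping the lowest-ranked chunks first keeps the most relevant context.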


In [None]:
# -------- RAG Stage: Retrieval --------
question = "Explain Python lists in simple words"

retrieved_docs = retriever.invoke(question)

print("Retrieved Chunks:\n")
for i, doc in enumerate(retrieved_docs):
    print(i+1, "->", doc.page_content[:150], "\n")

# -------- RAG Stage: Augmentation --------
context = "\n".join([doc.page_content for doc in retrieved_docs])

prompt = f"""
Answer the question using only the context below.

Context:
{context}

Question: {question}

Answer:
"""

print("Prompt sent to LLM:\n", prompt)

# -------- RAG Stage: Generation --------
response = llm(prompt)

print("\nFinal Answer:\n", response[0]["generated_text"])


In [None]:
# -------- RAG Stage: Retrieval --------
question = "How do Python loops work?"

retrieved_docs = retriever.invoke(question)

print("Retrieved Chunks:\n")
for i, doc in enumerate(retrieved_docs):
    print(i+1, "->", doc.page_content[:150], "\n")

# -------- RAG Stage: Augmentation --------
context = "\n".join([doc.page_content for doc in retrieved_docs])

prompt = f"""
Answer the question using only the context below.

Context:
{context}

Question: {question}

Answer:
"""

print("Prompt sent to LLM:\n", prompt)

# -------- RAG Stage: Generation --------
response = llm(prompt)

print("\nFinal Answer:\n", response[0]["generated_text"])
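The three question cells above repeat the same retrieve, augment, and generate steps. A small helper can remove that duplication. The sketch below assumes, as in this notebook, that `retriever` has an `invoke(question)` method returning objects with a `page_content` attribute and that `llm` is a callable returning `[{"generated_text": ...}]`. The name `answer_question` is hypothetical, and the stub classes exist only so the sketch runs without downloading models.

```python
from dataclasses import dataclass

# Same prompt text as the notebook cells above.
PROMPT_TEMPLATE = """
Answer the question using only the context below.

Context:
{context}

Question: {question}

Answer:
"""


def answer_question(question, retriever, llm, show_prompt=False):
    """Run one retrieve -> augment -> generate round trip; return the answer text."""
    retrieved_docs = retriever.invoke(question)  # Retrieval
    context = "\n".join(doc.page_content for doc in retrieved_docs)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)  # Augmentation
    if show_prompt:
        print("Prompt sent to LLM:\n", prompt)
    return llm(prompt)[0]["generated_text"]  # Generation


# --- Hypothetical stand-ins (not LangChain or Transformers classes) ---
@dataclass
class FakeDoc:
    page_content: str


class FakeRetriever:
    def __init__(self, texts):
        self.docs = [FakeDoc(t) for t in texts]

    def invoke(self, question):
        return self.docs  # ignores the question; a real retriever ranks by similarity


seen_prompts = []


def fake_llm(prompt):
    seen_prompts.append(prompt)  # record what the "model" received
    return [{"generated_text": "Lists hold items."}]


retriever_stub = FakeRetriever(["Lists are mutable.", "Items sit in brackets."])
print(answer_question("What are Python lists?", retriever_stub, fake_llm))
```

With the real objects, the loop `for q in ["What are Python lists?", "How do Python loops work?"]: print(answer_question(q, retriever, llm))` would replace the repeated cells.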
