### Building a RAG System with LangChain and ChromaDB
#### Introduction
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the capabilities of large language models with external knowledge retrieval. This notebook will walk you through building a complete RAG system using:

- LangChain: A framework for developing applications powered by language models
- ChromaDB: An open-source vector database for storing and retrieving embeddings
- Hugging Face / OpenAI-compatible APIs: For embeddings and the language model (this notebook uses Hugging Face embeddings and Perplexity's Sonar models; you can substitute other providers)

In [2]:
## langchain imports
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document

## vectorstores
from langchain_community.vectorstores import Chroma

## utility imports
import numpy as np
from typing import List

In [3]:
# RAG Architecture Overview
print("""
RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge
""")


RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge
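
The eight steps above can be sketched end to end in plain Python before any real library is involved. This is only a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, a Python list stands in for ChromaDB, and step 8 (calling an LLM) is left out. All names here (`embed`, `toy_store`, `toy_retrieve`, `toy_build_prompt`) are hypothetical and do not appear elsewhere in the notebook.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Steps 1-2: documents that are already "loaded" and split into small chunks
toy_chunks = [
    "Supervised learning uses labeled data to train models.",
    "Convolutional neural networks are effective for image processing.",
    "Transformers use attention mechanisms to understand context in text.",
]

# Steps 3-4: embed every chunk and keep (vector, chunk) pairs as the "vector store"
toy_store = [(embed(chunk), chunk) for chunk in toy_chunks]

def toy_retrieve(query: str, k: int = 1) -> list[str]:
    # Steps 5-6: embed the query and rank stored chunks by similarity
    query_vec = embed(query)
    ranked = sorted(toy_store, key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def toy_build_prompt(query: str, k: int = 1) -> str:
    # Step 7: combine the retrieved chunks with the user question
    context = "\n".join(toy_retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

toy_prompt = toy_build_prompt("Which networks work well for image processing?")
print(toy_prompt)
```

The rest of the notebook swaps each toy piece for a real component: `HuggingFaceEmbeddings` for `embed`, a Chroma collection for `toy_store`, and a chat model for the missing generation step.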



### 1. Sample Data

In [4]:
## create sample documents
sample_docs = [
    """
    Machine Learning Fundamentals
    
    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine learning: supervised learning, unsupervised learning, and reinforcement 
    learning. Supervised learning uses labeled data to train models, while unsupervised 
    learning finds patterns in unlabeled data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties.
    """,
    
    """
    Deep Learning and Neural Networks
    
    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning has revolutionized fields like computer vision, natural language 
    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly 
    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers 
    excel at sequential data processing.
    """,
    
    """
    Natural Language Processing (NLP)
    
    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, 
    machine translation, and question answering. Modern NLP heavily relies on transformer 
    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand 
    context and relationships between words in text.
    """
]

sample_docs


['\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.\n    ',
 '\n    Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective f

In [5]:
## save sample documents to files
import tempfile
temp_dir=tempfile.mkdtemp()

for i,doc in enumerate(sample_docs):
    with open(f"{temp_dir}/doc_{i}.txt","w") as f:
        f.write(doc)

print(f"Sample documents created in: {temp_dir}")

Sample documents created in: C:\Users\Ahmed\AppData\Local\Temp\tmpxhw6ydy7


In [6]:
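## save the sample documents to ./data, the directory that DirectoryLoader reads in the next section
import os
import tempfile
temp_dir = tempfile.mkdtemp()  # fresh temp directory (not used by the loader below)

data_dir = "data"
os.makedirs(data_dir, exist_ok=True)

for i, doc in enumerate(sample_docs):
    with open(os.path.join(data_dir, f"doc_{i}.txt"), "w", encoding="utf-8") as f:
        f.write(doc)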


In [7]:
temp_dir

'C:\\Users\\Ahmed\\AppData\\Local\\Temp\\tmpprvst5ar'

### 2. Document Loading

In [8]:
from langchain_community.document_loaders import DirectoryLoader,TextLoader

# Load documents from directory
loader = DirectoryLoader(
    "data", 
    glob="*.txt", 
    loader_cls=TextLoader,
    loader_kwargs={'encoding': 'utf-8'}
)
documents = loader.load()

print(f"Loaded {len(documents)} documents")
print(f"\nFirst document preview:")
print(documents[0].page_content[:200] + "...")


Loaded 3 documents

First document preview:

    Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. Ther...


In [9]:
documents

[Document(metadata={'source': 'data\\doc_0.txt'}, page_content='\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.\n    '),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='\n    Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natur

### Document Splitting

In [10]:
# Initialize text splitter
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=80,
    separators=["\n\n", "\n", ".", " ", ""]
)

chunks = text_splitter.split_documents(documents)
chunks

[Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine Learning Fundamentals'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='interaction with an environment using rewards and penalties.'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    Thes

In [11]:
print(f"Created {len(chunks)} chunks from {len(documents)} documents")
print(f"\nChunk example:")
print(f"Content: {chunks[0].page_content[:150]}...")
print(f"Metadata: {chunks[0].metadata}")

Created 7 chunks from 3 documents

Chunk example:
Content: Machine Learning Fundamentals...
Metadata: {'source': 'data\\doc_0.txt'}
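
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is a sketch of the overlap idea only, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter first tries to cut on the listed separators (`"\n\n"`, `"\n"`, `"."`, and so on), which is likely why the heading "Machine Learning Fundamentals" ends up as its own chunk above. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return pieces

demo_text = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(demo_text, chunk_size=10, chunk_overlap=3)
for piece in demo_chunks:
    print(repr(piece))
```

Each window repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when the overlap is large enough.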


### Embedding Models

In [12]:
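import numpy as np
# langchain_community's HuggingFaceEmbeddings still works but emits a deprecation warning;
# the maintained class lives in the separate langchain-huggingface package
from langchain_community.embeddings import HuggingFaceEmbeddings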

In [13]:
sample_text="Machine Learning is fascinating"
embeddings = HuggingFaceEmbeddings(
    model_name="all-mpnet-base-v2"
)

single_embedding = embeddings.embed_query(sample_text)
print("Vector length:", len(single_embedding))
print("Sample values:", single_embedding[:5])


Vector length: 768
Sample values: [-0.016979286447167397, 0.07911542057991028, -0.05364152416586876, -0.004374057054519653, -0.03569674491882324]


In [14]:
single_embedding

[-0.016979286447167397,
 0.07911542057991028,
 -0.05364152416586876,
 -0.004374057054519653,
 -0.03569674491882324,
 0.03318069502711296,
 -0.008986721746623516,
 -0.018753215670585632,
 -0.03127483278512955,
 0.013532858341932297,
 0.02548784390091896,
 0.06858392804861069,
 -0.03362622484564781,
 0.05780268833041191,
 0.009922562167048454,
 -0.07284905761480331,
 0.0012737080687656999,
 -0.011813920922577381,
 -0.05102641507983208,
 0.002989581087604165,
 -0.04443350061774254,
 -0.035726066678762436,
 -0.002910736482590437,
 -0.0006069067167118192,
 0.012111401185393333,
 -0.025453047826886177,
 0.008542370051145554,
 -0.01896396279335022,
 -0.006859431974589825,
 -0.00702854385599494,
 -0.02433265745639801,
 -0.0280518289655447,
 -0.02249073050916195,
 0.08795147389173508,
 1.4996803656686097e-06,
 -0.050870370119810104,
 -0.004036551807075739,
 0.016153406351804733,
 -0.05827316641807556,
 0.02682614140212536,
 0.06077722832560539,
 0.02938537485897541,
 -0.0013408918166533113,
 -0

### Initialize the ChromaDB Vector Store and Store the Chunks as Vectors

In [15]:
chunks

[Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine Learning Fundamentals'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='interaction with an environment using rewards and penalties.'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    Thes

In [16]:
## Create a ChromaDB vector store
persist_directory="./chroma_db"

## Initialize ChromaDB with the Hugging Face embeddings
vectorstore=Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=persist_directory,
    collection_name="rag_collection"
)

print(f"Vector store created with {vectorstore._collection.count()} vectors")
print(f"Persisted to: {persist_directory}")

Vector store created with 85 vectors
Persisted to: ./chroma_db
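
The store reports 85 vectors although only 7 chunks were created. One plausible explanation is that `./chroma_db` already held vectors from earlier runs of this cell, since writing to an existing persisted collection adds to it rather than replacing it; that would also account for the repeated chunks in the search results. A hedged way to avoid this is to give every chunk a deterministic ID and pass the list through the `ids` argument of `Chroma.from_documents`, so a re-run is expected to overwrite existing entries instead of appending new ones (worth verifying against the installed version). The helper name `stable_chunk_ids` is hypothetical.

```python
import hashlib

def stable_chunk_ids(chunk_records: list[tuple[str, str]]) -> list[str]:
    """Build one deterministic ID per (source, text) pair, using its position as a tie-breaker."""
    ids = []
    for position, (source, text) in enumerate(chunk_records):
        key = f"{source}\n{position}\n{text}".encode("utf-8")
        ids.append(hashlib.sha256(key).hexdigest()[:32])
    return ids

# Stand-in records; in the notebook these would come from `chunks`
records = [
    ("data/doc_0.txt", "Machine Learning Fundamentals"),
    ("data/doc_0.txt", "interaction with an environment using rewards and penalties."),
    ("data/doc_1.txt", "Deep Learning and Neural Networks"),
]

first_run = stable_chunk_ids(records)
second_run = stable_chunk_ids(records)
print(first_run == second_run)                 # same input gives the same IDs
print(len(set(first_run)) == len(first_run))   # IDs are unique within one batch

# Assumed usage in the notebook (not executed here):
# ids = stable_chunk_ids([(c.metadata["source"], c.page_content) for c in chunks])
# vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, ids=ids,
#                                     persist_directory=persist_directory,
#                                     collection_name="rag_collection")
```

Deleting the `./chroma_db` directory before re-running is a blunter alternative that also resets the count.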


### Test Similarity Search

In [17]:
docs = vectorstore.similarity_search("What is NLP", k=10)

seen_sources = set()
diverse_docs = []

for d in docs:
    src = d.metadata["source"]
    if src not in seen_sources:
        diverse_docs.append(d)
        seen_sources.add(src)
    if len(diverse_docs) == 3:
        break

diverse_docs


[Document(metadata={'source': 'data\\doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.')]

In [18]:
query="what is NLP?"

similar_docs=vectorstore.similarity_search(query,k=10)
similar_docs

[Document(metadata={'source': 'data\\doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.'),
 Document(metadata={'source': 'data\\doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mec

In [19]:
query="what is Deep Learning?"

similar_docs=vectorstore.similarity_search(query,k=13)
similar_docs

[Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural net

In [20]:
print(f"Query: {query}")
print(f"\nTop {len(similar_docs)} similar chunks:")
for i, doc in enumerate(similar_docs):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:200] + "...")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")

Query: what is Deep Learning?

Top 13 similar chunks:

--- Chunk 1 ---
Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning...
Source: data\doc_1.txt

--- Chunk 2 ---
Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning...
Source: data\doc_1.txt

--- Chunk 3 ---
Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning...
Source: data\doc_1.txt

--- Chunk 4 ---
Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learn

### Advanced Similarity Search With Scores

In [21]:
results_scores=vectorstore.similarity_search_with_score(query,k=3)
results_scores

[(Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language'),
  0.29455581307411194),
 (Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
  0.3473314046859741),
 (Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset 

#### Understanding Similarity Scores
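The score returned by `similarity_search_with_score` is a distance between the query embedding and a chunk embedding, so its meaning depends on the distance metric the collection uses:

ChromaDB default: Uses squared L2 distance (squared Euclidean distance)

- Lower scores = MORE similar (closer in vector space)
- Score of 0 = identical vectors
- Range for unit-length embeddings: 0 to 4 (relevant chunks usually score well below 2)


Cosine (if configured, for example through the collection's `hnsw:space` setting):

- ChromaDB reports cosine distance (1 - cosine similarity), so lower scores still = MORE similar
- Range: 0 to 2 (0 meaning the vectors point in the same direction)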
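
The relationship between these metrics can be checked with a few lines of arithmetic. The sketch below uses small made-up vectors rather than real embeddings and computes squared L2 distance and cosine distance by hand; for unit-length vectors the two differ only by a factor of 2, since ||a - b||² = 2 - 2·cos(a, b). The vectors and function names are illustrative assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance: lower means closer."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: higher means more aligned."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = normalize([1.0, 2.0, 2.0])
close_vec = normalize([1.0, 2.0, 1.5])   # points in nearly the same direction
far_vec = normalize([-2.0, 0.5, 0.0])    # points in a very different direction

for name, vec in [("close", close_vec), ("far", far_vec)]:
    d2 = squared_l2(query_vec, vec)
    cos_dist = 1.0 - cosine_similarity(query_vec, vec)
    print(f"{name}: squared L2 = {d2:.4f}, cosine distance = {cos_dist:.4f}")
```

Both columns rank the "close" vector ahead of the "far" one, which is why switching the metric on normalized embeddings changes the numbers but not the ordering.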

#### Initialize the LLM, Prompt Template, and RAG Chain, then Query the RAG System

In [22]:
# Load API keys from the .env file
import os
from dotenv import load_dotenv

load_dotenv()

True

In [23]:
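# Initialize the client.
# No api_key is passed, so the OpenAI client falls back to the OPENAI_API_KEY
# environment variable; here that variable must hold the Perplexity key.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.perplexity.ai"
)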

In [24]:
# Standard Sonar Model (General Queries)
completion = client.chat.completions.create(
    model="sonar",
    messages=[
        {"role": "user", "content": "What is quantum computing?"}
    ]
)

print(completion.choices[0].message.content)

**Quantum computing** is a computational approach that leverages quantum mechanics to solve complex problems faster than classical computers[1]. Rather than processing information sequentially like traditional computers, quantum computers exploit quantum phenomena to evaluate multiple solutions simultaneously.

## Core Components

The fundamental unit of quantum computing is the **qubit** (quantum bit)[1]. Unlike classical bits, which exist as either 0 or 1, qubits can exist in a **superposition**—a combination of multiple states at once[5]. This allows quantum computers to process millions of operations simultaneously, giving them inherent parallelism[1].

A second crucial principle is **entanglement**, where qubits become correlated and influence each other as a single system[5]. When qubits are entangled, they scale exponentially in computational power: two qubits can process four bits of information, three can process eight, and so on[2]. This exponential scaling provides quantum c

In [25]:
# Sonar Pro (Complex / High-Quality Answers)
completion = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "Analyze the economic implications of renewable energy adoption"}
    ]
)

print(completion.choices[0].message.content)


Renewable energy adoption has substantial **positive long‑run economic implications**—through cheaper energy, new investment and jobs, and greater energy security—while creating **short‑ to medium‑term adjustment costs** in fossil‑fuel–dependent regions and sectors.

**1. Output, productivity, and growth**

- As wind and solar scale, the **marginal cost of generation falls** because fuel is free and operating costs are low once infrastructure is built.[1][4]  
- A Brookings analysis estimates that a transition to a clean‑energy–dominated U.S. grid could **lower wholesale electricity prices by 20–80% by 2040**, depending on region.[1] Cheaper power raises firm profitability, encourages hiring, and leads to **2–3% higher wages nationwide** in their model.[1]  
- Lower long‑run energy costs free resources previously spent on fossil fuels and energy‑efficiency “defensive” innovation, allowing R&D and capital to shift to **general productivity‑enhancing technologies** and raising the **aggr

In [26]:
# Sonar Reasoning Pro (Step-by-Step Logic)
completion = client.chat.completions.create(
    model="sonar-reasoning-pro",
    messages=[
        {"role": "user", "content": "Solve this complex mathematical problem step by step"}
    ]
)

print(completion.choices[0].message.content)


You haven't provided a specific mathematical problem to solve. To help you work through a problem step by step, please share the exact equation, word problem, or mathematical question you'd like to tackle.

Once you provide the problem, I can guide you through it using appropriate problem-solving strategies, such as breaking it into smaller parts, identifying what you know and what you need to find, or applying relevant mathematical techniques.


In [27]:
# Same request through the OpenAI-compatible client,
# now with an explicit temperature for more control over sampling
from openai import OpenAI

client = OpenAI(
    base_url="https://api.perplexity.ai"
)

completion = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "What is Large Language Models"}
    ],
    temperature=0.2
)

test_response = completion.choices[0].message.content
print(test_response)

A **large language model (LLM)** is an artificial intelligence system that uses deep learning on vast amounts of text to **understand and generate human-like language** for many tasks such as answering questions, writing, summarizing, and translating.[1][2][3][4][5]

Key points:

- **Type of AI:** LLMs are a kind of **deep neural network**, usually based on the **transformer** architecture, which is especially good at handling sequences of words and their context.[1][3][4]  
- **“Large” because of scale:** They have **billions to trillions of parameters** and are trained on huge text datasets from books, websites, articles, and more.[1][2][3][4]  
- **How they work:** At core, a language model **predicts the next token (word or piece of a word)** given previous tokens, learning patterns of grammar, meaning, and style from data.[1][3][6][8]  
- **Capabilities:** They can **generate, summarize, translate, classify, and answer questions about text**, and are the basis of modern chatbots a

In [28]:
# Note: llm.invoke() returns an AIMessage, not a string.
# Use .content to get the text of the answer.
from langchain_community.chat_models import ChatPerplexity

llm = ChatPerplexity(
    model="sonar",
    temperature=0.2
)

test_response = llm.invoke("What is Large Language Models give output without **")
print(test_response)
print(test_response.content)


content='A large language model is an artificial intelligence system trained on vast amounts of text so it can understand and generate human‑like language.[1][2][3]\n\nMore precisely, an LLM:\n- Is a **deep learning** model (usually a transformer neural network) with billions or more parameters.[4][6]  \n- Is trained on extensive text data (books, web pages, articles, etc.) to learn patterns of grammar, meaning, and context.[3][5][8]  \n- Can perform tasks such as answering questions, summarizing text, translation, code generation, and general text generation based on prompts.[2][3][7]' additional_kwargs={'citations': ['https://knowledge-centre-translation-interpretation.ec.europa.eu/en/news/what-large-language-model', 'https://en.wikipedia.org/wiki/Large_language_model', 'https://www.oracle.com/artificial-intelligence/large-language-model/', 'https://aws.amazon.com/what-is/large-language-model/', 'https://uit.stanford.edu/service/techtraining/ai-demystified/llm', 'https://www.nvidia.c

### Modern RAG Chain

In [29]:
# Load environment variables (API keys, secrets)
import os
from dotenv import load_dotenv
load_dotenv()

# Import Perplexity Sonar LLM (OpenAI-compatible)
from langchain_community.chat_models import ChatPerplexity

# Initialize the LLM used for answer generation
llm = ChatPerplexity(
    model="sonar-pro",     # Higher-capability Sonar model, used here for answer quality
    temperature=0.1        # Low temperature = more deterministic, focused answers
)

In [30]:
# Prompt template for chat-based models
from langchain_core.prompts import ChatPromptTemplate

# Used to pass the user query unchanged through the pipeline
from langchain_core.runnables import RunnablePassthrough

# Converts model output into a clean string
from langchain_core.output_parsers import StrOutputParser


In [31]:
# At this point, `vectorstore` is assumed to ALREADY exist
# Example sources:
# - FAISS loaded from disk
# - Chroma persistent DB
# - Pinecone / Weaviate / Qdrant client

vectorstore

<langchain_community.vectorstores.chroma.Chroma at 0x2099f8863c0>

In [32]:
# Convert the vector store into a retriever
# This enables semantic similarity search
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 3}  # Retrieve top 3 relevant chunks
)

retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002099F8863C0>, search_kwargs={'k': 3})

In [33]:
# System prompt that controls LLM behavior
system_prompt = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.

IMPORTANT:
- Do NOT use markdown
- Do NOT use bullet points
- Do NOT use bold or italics
- Do NOT include references like [1], [2]

Context: {context}"""

# Chat-style prompt with system instructions + user input
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),  # Injects retrieved context
    ("human", "{input}")        # User's question
])


In [34]:
# Display the final prompt structure
# Useful for debugging and prompt tuning
prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks.\nUse the following pieces of retrieved context to answer the question.\nIf you don't know the answer, just say that you don't know.\nUse three sentences maximum and keep the answer concise.\n\nIMPORTANT:\n- Do NOT use markdown\n- Do NOT use bullet points\n- Do NOT use bold or italics\n- Do NOT include references like [1], [2]\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

In [35]:
# Build a Runnable-based RAG pipeline
# This replaces create_stuff_documents_chain (deprecated)
document_chain = (
    {
        "context": retriever,           # Retrieve relevant documents
        "input": RunnablePassthrough()  # Pass user question unchanged
    }
    | prompt                            # Stuff docs into the prompt
    | llm                               # Generate answer using Sonar
    | StrOutputParser()                 # Return clean text output
)
document_chain

{
  context: VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002099F8863C0>, search_kwargs={'k': 3}),
  input: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks.\nUse the following pieces of retrieved context to answer the question.\nIf you don't know the answer, just say that you don't know.\nUse three sentences maximum and keep the answer concise.\n\nIMPORTANT:\n- Do NOT use markdown\n- Do NOT use bullet points\n- Do NOT use bold or italics\n- Do NOT include references like [1], [2]\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, 

In [36]:
# Ask a question to the RAG pipeline
document_chain.invoke(
    "What is a Large Language Model?"
)

'A Large Language Model (LLM) is an artificial intelligence system trained on very large amounts of text so it can predict and generate human-like language. Using deep learning neural networks (often transformer-based), it learns patterns in how words and sentences are used, allowing it to answer questions, write text, translate, summarize, and perform many other language-related tasks.'

This chain:

- Takes retrieved documents
- "Stuffs" them into the prompt's {context} placeholder
- Sends the complete prompt to the LLM
- Returns the LLM's response
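
One detail of the chain above is that the retriever's output, a list of `Document` objects, is placed into `{context}` as-is, so the prompt receives the string form of that list, metadata and `\n` indentation included. A common variant joins the chunk texts first. The sketch below shows such a formatter; `SimpleDoc` is a hypothetical stand-in for `langchain_core.documents.Document` so the example runs without LangChain, and `format_docs` is an illustrative name.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs) -> str:
    """Join chunk texts into one context string, labelling each with its source."""
    parts = []
    for doc in docs:
        source = doc.metadata.get("source", "Unknown")
        # Collapse the indentation and line breaks left over from the sample files
        text = " ".join(doc.page_content.split())
        parts.append(f"Source: {source}\n{text}")
    return "\n\n".join(parts)

demo_docs = [
    SimpleDoc(
        "Deep learning is a subset of machine learning \n    based on artificial neural networks.",
        {"source": "data/doc_1.txt"},
    ),
    SimpleDoc("NLP is a field of AI.", {"source": "data/doc_2.txt"}),
]

context_text = format_docs(demo_docs)
print(context_text)
```

In the notebook this would slot in as `"context": retriever | format_docs` inside the chain's input mapping, since LCEL wraps a plain function placed after `|` as a runnable.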

#### What replaces create_retrieval_chain in LangChain 1.2.0?

In LangChain 1.2.0, the role of create_retrieval_chain is filled by Runnable-based (LCEL) pipelines, where retrieval and generation are composed explicitly with the `|` operator.
The Runnable pipeline itself acts as the final RAG chain: a single invoke() call retrieves documents, builds the prompt, calls the LLM, and parses the output.

In [37]:
import re

def clean_llm_output(text: str) -> str:
    # Remove bold/italic markdown (**text**, *text*)
    text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)
    text = re.sub(r"\*(.*?)\*", r"\1", text)

    # Replace multiple newlines with a single space
    text = re.sub(r"\n+", " ", text)

    # Remove extra spaces
    text = re.sub(r"\s+", " ", text).strip()

    return text
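
The system prompt asks the model not to emit references like [1] or [2], yet the raw Sonar responses earlier in the notebook contain them, and `clean_llm_output` does not remove them. A possible companion helper, sketched here under the assumption that citations always look like a bracketed number, is shown below; the name `strip_citation_markers` is hypothetical.

```python
import re

def strip_citation_markers(text: str) -> str:
    """Remove bracketed numeric citations such as [1] or [2][3]."""
    without_markers = re.sub(r"\[\d+\]", "", text)
    # Drop any space left stranded before punctuation, then collapse whitespace
    without_markers = re.sub(r"\s+([.,;:!?])", r"\1", without_markers)
    return re.sub(r"\s+", " ", without_markers).strip()

sample = "LLMs are trained on large text corpora.[1][2] They predict the next token [3]."
cleaned_sample = strip_citation_markers(sample)
print(cleaned_sample)
```

Bracketed text that is not purely numeric, such as `[not a citation]`, is left untouched, so ordinary uses of square brackets in an answer survive.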


In [38]:
from langchain_core.runnables import RunnableLambda
from langchain_core.output_parsers import StrOutputParser

# Runnable that retrieves docs and keeps them
def retrieve_docs(question):
    docs = retriever.invoke(question)
    return {
        "input": question,
        "context": docs,
        "documents": docs   # keep a copy to return later
    }

rag_chain = (
    RunnableLambda(retrieve_docs)
    | {
        "answer": (
            prompt
            | llm
            | StrOutputParser()
        ),
        "documents": lambda x: x["documents"],
        
    }
)
result = rag_chain.invoke("What is Deep Learning?")

# Clean only the answer field
clean_answer = clean_llm_output(result["answer"])

# Put it back if you want
result["answer"] = clean_answer

result

{'answer': 'Deep learning is a subset of machine learning that uses multi-layered artificial neural networks to automatically learn complex patterns and representations from large amounts of data. These networks, inspired by the human brain, consist of layers of interconnected nodes that transform raw input data into increasingly abstract features. Deep learning has driven major advances in areas like computer vision, natural language processing, and speech recognition.',
 'documents': [Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language'),
  Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are i

In [39]:
print("Answer:")
print(result["answer"])

print("\nRetrieved Context:")
for i, doc in enumerate(result["documents"]):
    print(f"\n--- Source {i+1} ---")
    print(doc.page_content[:200] + "...")


Answer:
Deep learning is a subset of machine learning that uses multi-layered artificial neural networks to automatically learn complex patterns and representations from large amounts of data. These networks, inspired by the human brain, consist of layers of interconnected nodes that transform raw input data into increasingly abstract features. Deep learning has driven major advances in areas like computer vision, natural language processing, and speech recognition.

Retrieved Context:

--- Source 1 ---
Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning...

--- Source 2 ---
Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning...

--- Source 3 ---
Deep learning is a subset of machine learning b

### Create RAG Chain Alternative - Using LCEL (LangChain Expression Language)

In [40]:
# Even more flexible approach using LCEL
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

In [41]:
# Create a custom prompt
custom_prompt = ChatPromptTemplate.from_template("""Use the following context to answer the question. 
If you don't know the answer based on the context, say you don't know.
Provide specific details from the context to support your answer.

 IMPORTANT:
- Do NOT use markdown
- Do NOT use bullet points
- Do NOT use bold or italics
- Do NOT include references like [1], [2]

Context:
{context}

Question: {question}

Answer:""")
custom_prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\n IMPORTANT:\n- Do NOT use markdown\n- Do NOT use bullet points\n- Do NOT use bold or italics\n- Do NOT include references like [1], [2]\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])

In [42]:
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002099F8863C0>, search_kwargs={'k': 3})

In [43]:
## Format the output documents for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
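
`format_docs` concatenates every retrieved chunk, so the prompt grows with `k` and with the chunk size. If prompt length becomes a concern, a capped variant is one option. The sketch below is an assumption-laden illustration: `format_docs_capped` and `max_chars` are hypothetical names, character count is only a crude proxy for tokens, and `SimpleNamespace` objects stand in for `Document`.

```python
from types import SimpleNamespace

def format_docs_capped(docs, max_chars=1500, separator="\n\n"):
    """Join page_content values until adding another chunk would exceed max_chars."""
    parts, used = [], 0
    for doc in docs:
        text = doc.page_content.strip()
        cost = len(text) + (len(separator) if parts else 0)
        if used + cost > max_chars:
            break  # drop this chunk and everything ranked after it
        parts.append(text)
        used += cost
    return separator.join(parts)

# Stand-ins for retrieved Document objects (only page_content is needed here).
docs = [SimpleNamespace(page_content=letter * 10) for letter in "abc"]
print(repr(format_docs_capped(docs, max_chars=25)))
```

Since retrievers return results in ranked order, cutting from the end keeps the chunks judged most relevant.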

In [44]:
## Build the chain using LCEL

rag_chain_lcel=(
    { 
        "context":retriever | format_docs,
        "question": RunnablePassthrough()
     }
    | custom_prompt
    | llm
    | StrOutputParser()
)

rag_chain_lcel

{
  context: VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002099F8863C0>, search_kwargs={'k': 3})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\n IMPORTANT:\n- Do NOT use markdown\n- Do NOT use bullet points\n- Do NOT use bold or italics\n- Do NOT include references like [1], [2]\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])
| ChatPerplexity(client=<openai.OpenAI object at 0x0000020A09737BB0>,
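
The printed structure shows the shape of the pipeline: a mapping whose branches each receive the same input, followed by steps applied one after another. The plain-Python sketch below models only that data flow. It is a conceptual illustration, not how LangChain is implemented, and `fan_out`, `pipe`, `fake_retriever`, `fill_prompt`, and `fake_llm` are made-up stand-ins.

```python
from typing import Any, Callable, Dict

def fan_out(branches: Dict[str, Callable[[Any], Any]]) -> Callable[[Any], Dict[str, Any]]:
    """Feed the same input to every branch and collect the results in a dict."""
    return lambda value: {key: branch(value) for key, branch in branches.items()}

def pipe(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Apply steps left to right, like the | operator in the chain above."""
    def run(value: Any) -> Any:
        for step in steps:
            value = step(value)
        return value
    return run

# Stand-ins for the real retriever, prompt, and LLM (assumptions for illustration).
fake_retriever = lambda question: ["chunk A", "chunk B"]
join_chunks = lambda chunks: "\n\n".join(chunks)
fill_prompt = lambda d: f"Context:\n{d['context']}\n\nQuestion: {d['question']}"
fake_llm = lambda prompt_text: f"(answer based on {len(prompt_text)} prompt characters)"

toy_chain = pipe(
    fan_out({
        "context": pipe(fake_retriever, join_chunks),
        "question": lambda question: question,  # plays the role of RunnablePassthrough
    }),
    fill_prompt,
    fake_llm,
)

print(toy_chain("What is Deep Learning"))
```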

In [45]:
response=rag_chain_lcel.invoke("What is Deep Learning")
clean_answer=clean_llm_output(response)
print(clean_answer)

Deep learning is a subset of machine learning that is based on artificial neural networks made up of layers of interconnected nodes, inspired by the human brain. According to the context, these layered networks have revolutionized fields such as computer vision, natural language processing, and speech recognition, with Convolutional Neural Networks (CNNs) being especially effective for image processing and Recurrent Neural Networks (RNNs) and Transformers being widely used for sequence data and language-related tasks.


In [46]:
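# Request up to 13 results for this call only. Older langchain-core releases may ignore
# per-call kwargs and fall back to the retriever's own search_kwargs (k=3 here).
docs = retriever.invoke("What is Deep Learning", k=13)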
docs

[Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural net

In [47]:
# Query using the LCEL approach and return the answer together with its source chunks
def query_rag_lcel(question):
    print(f"Question: {question}")
    print("-" * 50)
    
    # 1. Get answer from RAG chain
    answer = rag_chain_lcel.invoke(question)
    print(f"Answer: {answer}")
    
    # 2. Retrieve the chunks again for display (the chain does not expose the ones it used)
    docs = retriever.invoke(question)
    
    print("\nSource Documents:")
    for i, doc in enumerate(docs):
        print(f"\n--- Source {i+1} ---")
        print(doc.page_content[:200] + "...")
    
    # 3. Return structured result
    return {
        "question": question,
        "answer": answer,
        "documents": docs
    }


In [48]:
print("Testing LCEL Chain:")
query_rag_lcel("What are the key concepts in reinforcement learning?")

Testing LCEL Chain:
Question: What are the key concepts in reinforcement learning?
--------------------------------------------------
Answer: The key concepts in reinforcement learning, based on the given context, are: states, actions, rewards, policies, and value functions.

The context explains that in reinforcement learning an agent learns to make decisions by interacting with an environment, and that it receives rewards or penalties based on its actions and tries to maximize cumulative reward over time. It then explicitly lists the key concepts in RL as: states, actions, rewards, policies, and value functions.

Source Documents:

--- Source 1 ---
Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or penalties 
based on its actions and...

--- Source 2 ---
Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with 

{'question': 'What are the key concepts in reinforcement learning?',
 'answer': 'The key concepts in reinforcement learning, based on the given context, are: states, actions, rewards, policies, and value functions.\n\nThe context explains that in reinforcement learning an agent learns to make decisions by interacting with an environment, and that it receives rewards or penalties based on its actions and tries to maximize cumulative reward over time. It then explicitly lists the key concepts in RL as: states, actions, rewards, policies, and value functions.',
 'documents': [Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='Reinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and v

In [49]:
query_rag_lcel("What is machine learning?")

Question: What is machine learning?
--------------------------------------------------
Answer: Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. According to the context, it allows models to learn from data, and there are three main types: supervised learning, which uses labeled data to train models; unsupervised learning, which finds patterns in unlabeled data; and reinforcement learning, which learns through interactions and feedback.

Source Documents:

--- Source 1 ---
Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine l...

--- Source 2 ---
Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine 

{'question': 'What is machine learning?',
 'answer': 'Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. According to the context, it allows models to learn from data, and there are three main types: supervised learning, which uses labeled data to train models; unsupervised learning, which finds patterns in unlabeled data; and reinforcement learning, which learns through interactions and feedback.',
 'documents': [Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement'),
  Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine learning is a subset of artificial intelligence that enables systems t

In [50]:
query_rag_lcel("What is deep learning?")

Question: What is deep learning?
--------------------------------------------------
Answer: Deep learning is a subset of machine learning that is based on artificial neural networks composed of layers of interconnected nodes inspired by the human brain. According to the context, these layered networks enable deep learning to automatically learn complex patterns from data and have revolutionized fields such as computer vision, natural language processing, and speech recognition.

Source Documents:

--- Source 1 ---
Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning...

--- Source 2 ---
Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning...

--- Source 3 ---
Deep learning is a subset of machine

{'question': 'What is deep learning?',
 'answer': 'Deep learning is a subset of machine learning that is based on artificial neural networks composed of layers of interconnected nodes inspired by the human brain. According to the context, these layered networks enable deep learning to automatically learn complex patterns from data and have revolutionized fields such as computer vision, natural language processing, and speech recognition.',
 'documents': [Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language'),
  Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and co

### Add New Documents To Existing Vector Store

In [51]:
vectorstore

<langchain_community.vectorstores.chroma.Chroma at 0x2099f8863c0>

In [52]:
# Add new documents to the existing vector store
new_document = """
Reinforcement Learning in Detail

Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or penalties 
based on its actions and learns to maximize cumulative reward over time. Key concepts 
in RL include: states, actions, rewards, policies, and value functions. Popular RL 
algorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and 
Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), 
robotics, and autonomous systems.
"""

In [53]:
new_document

'\nReinforcement Learning in Detail\n\nReinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and \nActor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.\n'

In [54]:
chunks

[Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine Learning Fundamentals'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
 Document(metadata={'source': 'data\\doc_0.txt'}, page_content='interaction with an environment using rewards and penalties.'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep Learning and Neural Networks'),
 Document(metadata={'source': 'data\\doc_1.txt'}, page_content='Deep learning is a subset of machine learning based on artificial neural networks. \n    Thes

In [55]:
new_doc=Document(
    page_content=new_document,
    metadata={"source": "manual_addition", "topic": "reinforcement_learning"}
)

In [56]:
new_doc

Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='\nReinforcement Learning in Detail\n\nReinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and \nActor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.\n')

In [57]:
## split the documents
new_chunks=text_splitter.split_documents([new_doc])
new_chunks

[Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='Reinforcement Learning in Detail'),
 Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='Reinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and'),
 Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.')]

In [58]:
# Create unique IDs for each chunk
ids = [
    f"manual_rl_chunk_{i}"
    for i in range(len(new_chunks))
]

# Add documents with explicit, stable IDs so that re-running this cell
# should reuse the same entries instead of appending new duplicates
vectorstore.add_documents(
    documents=new_chunks,
    ids=ids
)

print(f"Added {len(new_chunks)} new chunks to the vector store")
print(f"Total vectors now: {vectorstore._collection.count()}")


Added 3 new chunks to the vector store
Total vectors now: 85
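
The IDs above depend only on the chunk index, so a different document added with the same `manual_rl_chunk_{i}` pattern would reuse them. One alternative is to derive each ID from the chunk's source and text. The following is a sketch under the assumption that the store treats a repeated ID as the same record. `make_chunk_id` is a hypothetical helper, and the sample strings are short stand-ins for real chunks.

```python
import hashlib

def make_chunk_id(source: str, content: str, length: int = 16) -> str:
    """Build a stable ID from the chunk's source label and its exact text."""
    digest = hashlib.sha256(f"{source}\n{content}".encode("utf-8")).hexdigest()
    return f"{source}_{digest[:length]}"

sample_chunks = [
    ("manual_addition", "Reinforcement Learning in Detail"),
    ("manual_addition", "robotics, and autonomous systems."),
]

ids = [make_chunk_id(source, text) for source, text in sample_chunks]
print(ids)

# The same (source, text) pair always maps to the same ID.
print(make_chunk_id(*sample_chunks[0]) == ids[0])
```

A list built this way could be passed as `ids=` to `add_documents` in place of the index-based list.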


In [59]:
results = vectorstore._collection.get(
    include=["documents", "metadatas"]
)

found = False
for meta, doc in zip(results["metadatas"], results["documents"]):
    if meta.get("source") == "manual_addition":
        found = True
        print("\n✅ Found manually added chunk:")
        print("Metadata:", meta)
        print("Content:", doc[:300])

if not found:
    print("❌ No manually added chunks found in vector store")



✅ Found manually added chunk:
Metadata: {'source': 'manual_addition', 'topic': 'reinforcement_learning'}
Content: Reinforcement Learning in Detail

✅ Found manually added chunk:
Metadata: {'source': 'manual_addition', 'topic': 'reinforcement_learning'}
Content: Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or penalties 
based on its actions and learns to maximize cumulative reward over time. Key concepts 
in RL include: states, actions, rewar

✅ Found manually added chunk:
Metadata: {'topic': 'reinforcement_learning', 'source': 'manual_addition'}
Content: Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), 
robotics, and autonomous systems.

✅ Found manually added chunk:
Metadata: {'source': 'manual_addition', 'topic': 'reinforcement_learning'}
Content: Reinforcement Learning in Detail

✅ Found manually added chunk:
Metadata: {'source':
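
The listing repeats the same chunk text, which suggests the collection holds duplicates from earlier runs. A sketch of locating them follows, assuming the result dict carries parallel `ids`, `documents`, and `metadatas` lists (the layout returned by the `get` call above, where `ids` is always included). `find_duplicate_ids` and the sample data are made up for illustration.

```python
from typing import Dict, List

def find_duplicate_ids(results: Dict[str, list]) -> List[str]:
    """Return IDs of entries whose (source, text) pair was already seen."""
    seen = set()
    duplicates = []
    for entry_id, text, meta in zip(results["ids"], results["documents"], results["metadatas"]):
        key = ((meta or {}).get("source"), text)
        if key in seen:
            duplicates.append(entry_id)  # keep the first copy, flag the rest
        else:
            seen.add(key)
    return duplicates

sample = {
    "ids": ["a1", "a2", "b1", "a3"],
    "documents": [
        "Reinforcement Learning in Detail",
        "robotics, and autonomous systems.",
        "Reinforcement Learning in Detail",
        "Reinforcement Learning in Detail",
    ],
    "metadatas": [{"source": "manual_addition"}] * 4,
}

print(find_duplicate_ids(sample))  # ['b1', 'a3']
```

The flagged IDs could then be handed to the collection's `delete(ids=...)` method, after checking that the first copy of each chunk is the one worth keeping.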

In [60]:
## Query with the updated vector store
new_question="What are the key concepts in reinforcement learning"
result=query_rag_lcel(new_question)
result

Question: What are the key concepts in reinforcement learning
--------------------------------------------------
Answer: The key concepts in reinforcement learning, based on the given context, are: states, actions, rewards, policies, and value functions.

The context explains that in reinforcement learning an agent learns to make decisions by interacting with an environment and receiving rewards or penalties for its actions. It explicitly lists the key concepts in RL as: states, actions, rewards, policies, and value functions.

Source Documents:

--- Source 1 ---
Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or penalties 
based on its actions and...

--- Source 2 ---
Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or penalties 
based on its actions and...

---

{'question': 'What are the key concepts in reinforcement learning',
 'answer': 'The key concepts in reinforcement learning, based on the given context, are: states, actions, rewards, policies, and value functions.\n\nThe context explains that in reinforcement learning an agent learns to make decisions by interacting with an environment and receiving rewards or penalties for its actions. It explicitly lists the key concepts in RL as: states, actions, rewards, policies, and value functions.',
 'documents': [Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='Reinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Ne

### Advanced RAG Techniques - Conversational Memory
#### Understanding Conversational Memory in RAG
Conversational memory enables RAG systems to maintain context across multiple interactions. This is crucial for:

- Follow-up questions that reference previous answers
- Pronoun resolution (e.g., "it", "they", "that")
- Context-dependent queries that build on prior discussion
- Natural dialogue flow where users don't repeat context

Key Challenge:
Traditional RAG retrieves documents based only on the current query, missing important context from the conversation. For example:

- User: "Tell me about Python"
- Bot: explains the Python programming language
- User: "What are its main libraries?" ← "its" refers to Python, but the retriever doesn't know this

Solution:
The modern approach uses a two-step process:

1. Query Reformulation: Transform context-dependent questions into standalone queries
2. Context-Aware Retrieval: Use the reformulated query to fetch relevant documents

Key components:

- create_history_aware_retriever: LangChain helper that makes the retriever understand conversation context (the cells below build the same behavior by hand)
- MessagesPlaceholder: Placeholder for chat history in prompts
- HumanMessage/AIMessage: Structured message types for conversation history
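
To make the retrieval problem concrete, here is a toy illustration. It uses word overlap as a crude stand-in for embedding similarity, and the two-entry corpus, `words`, and `overlap_scores` are all invented for this sketch. The follow-up question scores both entries equally, while the reformulated question singles out the right one.

```python
import re

# Tiny made-up corpus: two entries that differ mainly in the language name.
corpus = {
    "python": "Python is a programming language with libraries such as NumPy and pandas.",
    "java": "Java is a programming language with libraries such as Spring and Guava.",
}

def words(text: str) -> set:
    """Lowercased alphabetic tokens of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap_scores(query: str) -> dict:
    """Count shared words between the query and each corpus entry."""
    query_words = words(query)
    return {name: len(query_words & words(doc)) for name, doc in corpus.items()}

follow_up = "What are its main libraries?"
standalone = "What are the main libraries of Python?"

print(overlap_scores(follow_up))   # {'python': 1, 'java': 1}
print(overlap_scores(standalone))  # {'python': 2, 'java': 1}
```

Real embedding models are far more capable than word overlap, but the same gap applies: the word "its" carries no signal about which topic the user means.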

In [61]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, AIMessage

In [62]:
contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system",
     """Given the chat history and the latest user question,
rewrite the question so it can be understood without the chat history.
Do NOT answer the question."""),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])


In [63]:
standalone_question_chain = (
    {
        "chat_history": lambda x: x["chat_history"],
        "input": lambda x: x["input"]
    }
    | contextualize_q_prompt
    | llm
    | StrOutputParser()
)


In [64]:
def retrieve_with_history(inputs):
    # inputs is a dict: {"input": "...", "chat_history": [...]}

    standalone_question = standalone_question_chain.invoke({
        "input": inputs["input"],
        "chat_history": inputs["chat_history"]
    })

    docs = retriever.invoke(standalone_question)

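    return {
        "input": inputs["input"],
        "chat_history": inputs["chat_history"],  # <-- pass ONLY the list
        # Join the chunks into plain text (format_docs is defined earlier) so the
        # prompt receives readable context rather than the repr of Document objects
        "context": format_docs(docs)
    }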

In [65]:
qa_prompt = ChatPromptTemplate.from_messages([
    ("system",
     """You are an assistant for question-answering tasks.
Use the retrieved context to answer the question.
If you don't know the answer, say you don't know.
Use three sentences maximum.

IMPORTANT:
- Do NOT use markdown
- Do NOT use bullet points
- Do NOT use bold or italics
- Do NOT include references like [1], [2]

Return the answer as plain text in paragraph form.

Context: {context}"""),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])


In [66]:
from langchain_core.runnables import RunnableMap

conversational_rag_chain = (
    RunnableLambda(retrieve_with_history)
    | RunnableMap({
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "context": lambda x: x["context"]
    })
    | qa_prompt
    | llm
    | StrOutputParser()
)


In [67]:
chat_history = []

answer1 = conversational_rag_chain.invoke({
    "input": "What is machine learning?",
    "chat_history": chat_history
})

chat_history.extend([
    HumanMessage(content="What is machine learning?"),
    AIMessage(content=answer1)
])


In [68]:
import re

def clean_answer(text: str) -> str:
    # Remove markdown bold/italic
    text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)
    text = re.sub(r"\*(.*?)\*", r"\1", text)

    # Remove citation-like brackets: [1], [2][3], [1, 2]
    text = re.sub(r"\[\d+(?:,\s*\d+)*\]", "", text)

    # Remove bullet markers only at the start of a line,
    # so hyphenated words such as "multi-layered" are preserved
    text = re.sub(r"^[ \t]*-\s+", "", text, flags=re.MULTILINE)

    # Normalize whitespace
    text = re.sub(r"\n+", " ", text)
    text = re.sub(r"\s+", " ", text)

    return text.strip()


In [69]:
result2 = conversational_rag_chain.invoke({
    "chat_history": chat_history,
    "input": "What are its main types?"
})

# clean_result2 = clean_answer(result2)
# print(clean_result2)
result2

'Its main types are supervised learning, unsupervised learning, and reinforcement learning.'
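
Each turn above appends two messages by hand, and the history grows without bound. One way to keep further turns manageable is a small helper that sends only the most recent messages and records the new turn. This is a sketch under stated assumptions: `ask`, `max_messages`, and `fake_invoke` are hypothetical, plain `(role, text)` tuples stand in for `HumanMessage` / `AIMessage`, and `fake_invoke` takes the place of `conversational_rag_chain.invoke`.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text); stand-in for HumanMessage / AIMessage

def ask(invoke: Callable[[Dict], str], history: List[Message],
        question: str, max_messages: int = 4) -> str:
    """Send the question with only the most recent messages, then record the turn."""
    answer = invoke({"input": question, "chat_history": history[-max_messages:]})
    history.extend([("human", question), ("ai", answer)])
    return answer

# Stand-in for the real chain: reports how much history it was given.
def fake_invoke(payload: Dict) -> str:
    return f"saw {len(payload['chat_history'])} earlier messages"

history: List[Message] = []
questions = [
    "What is machine learning?",
    "What are its main types?",
    "Which one uses labeled data?",
    "And the others?",
]
for question in questions:
    print(ask(fake_invoke, history, question))
```

Truncating by message count is the simplest policy; it drops older context entirely, so a question that refers back further than the window would lose its antecedent.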

### Using Groq LLMs
 

In [70]:
llm

ChatPerplexity(client=<openai.OpenAI object at 0x0000020A09737BB0>, model='sonar-pro', temperature=0.1, model_kwargs={})

In [71]:
load_dotenv()

True

In [73]:
from langchain_groq import ChatGroq
from langchain.chat_models import init_chat_model

In [74]:
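# load_dotenv() above already exported GROQ_API_KEY; fail with a clear message if it is missing
assert os.getenv("GROQ_API_KEY"), "GROQ_API_KEY is not set (check your .env file)"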

In [75]:
llm=ChatGroq(model="gemma2-9b-it",api_key=os.getenv("GROQ_API_KEY"))
llm

ChatGroq(profile={'max_input_tokens': 8192, 'max_output_tokens': 8192, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': False, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x0000020A09A8F380>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x0000020A0AD48D70>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [76]:
llm=init_chat_model(model="groq:gemma2-9b-it")
llm

ChatGroq(profile={'max_input_tokens': 8192, 'max_output_tokens': 8192, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': False, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x0000020A09B73390>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x0000020A09B73D90>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))