# LangChain RAG Tutorial

https://www.youtube.com/watch?v=yF9kGESAi3M

![RAG_1](RAG_1.JPG)

![RAG_2](RAG_2.JPG)


In [None]:
# !pip install langchain-core
# !pip install langchain==0.3.20
# !pip install langchain-community==0.3.19
# !pip install langchain-openai==0.3.7
# !pip install langchain-text-splitters==0.3.6
# !pip install langchain-chroma
# !pip install sentence-transformers 

In [None]:
from langchain_openai import AzureChatOpenAI
import os

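# Fail early with a clear message if a required variable is missing.
# (Assigning os.getenv(...) back into os.environ raises TypeError when the value is None.)
required_vars = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_VERSION"]
missing = [name for name in required_vars if not os.getenv(name)]
if missing:
    raise EnvironmentError(f"Missing environment variables: {', '.join(missing)}")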

In [3]:
# Setup azure openai connection
model = AzureChatOpenAI(
    model="gpt-4o",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.environ["AZURE_OPENAI_API_VERSION"])

# 1a. RAG Basics - Setup Vector Store and Embeddings

In [None]:
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
#from langchain_openai import OpenAIEmbeddings
from langchain.embeddings import HuggingFaceBgeEmbeddings

In [14]:
import os
current_dir = os.path.abspath("")
file_path = os.path.join(current_dir, "books", "AI_Engineering.txt")
persistent_directory = os.path.join(current_dir, "db", "chroma_db")

In [None]:
"""
This script initializes a Chroma vector store if it does not already exist.
It reads text from a specified file, splits it into chunks, creates embeddings,
and then creates and persists the vector store.
"""

# Check if Chroma vector store already exists
if not os.path.exists(persistent_directory):
    print("Persistent directory does not exist. Initializing vector store...")

    # Ensure the text file exists
    if not os.path.exists(file_path):
        raise FileNotFoundError(
            f"The file {file_path} does not exist. Please check the path."
        )

    # Read the text content from the file
    loader = TextLoader(file_path)
    documents = loader.load()

    # Split document into chunks
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = text_splitter.split_documents(documents)

    # Display info about split documents
    print("\n----Document Chunk Information-----")
    print(f"Number of document chunks: {len(docs)}")
    print(f"Sample chunk: \n{docs[0].page_content}\n")

    # Create embeddings
    print("\n---Create embeddings---")
    embeddings = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/multi-qa-mpnet-base-dot-v1")  # Update to any valid embedding model
    print("\n-----Finished creating embeddings----")

    # Create the vector store and persist it automatically
    print("\n----Creating vector store----")
    db = Chroma.from_documents(docs, embeddings, persist_directory=persistent_directory)
    print("\n---Finished creating vector store----")

else:
    print("Vector store already exists. No need to initialize.")



Persistent directory does not exist. Initializing vector store...

Created a chunk of size 1384, which is longer than the specified 1000
Created a chunk of size 1145, which is longer than the specified 1000
Created a chunk of size 1024, which is longer than the specified 1000
...

----Document Chunk Information-----
Number of document chunks: 1143
Sample chunk: 
AI Engineering
Building Applications
with Foundation Models

Chip Huyen

“This book offers a comprehensive, well-structured guide to the essential
aspects of building generative AI systems. A must-read for any professional
looking to scale AI across the enterprise.”
Vittorio Cretella, former global CIO at P&G and Mars

“Chip Huyen gets generative AI. She is a remarkable teacher and writer
whose work has been instrumental in helping teams bring AI into production.
Drawing on her deep expertise, AI Engineering is a comprehensive and
holistic guide to building generative AI applications in production.”
		 Luke Metz, cocreator of ChatGPT, former research manager at OpenAI


---Create embeddings---


-----Finished creating embeddings----

----Creating vector store----

---Finished creating vector store----


# 1b. RAG Basics - Query and Retrieve Chunks from the Vector Store

In [18]:
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceBgeEmbeddings


In [19]:
# Define the persistent directory
import os
current_dir = os.path.abspath("")
persistent_directory = os.path.join(current_dir, "db", "chroma_db")

In [None]:
# Define the embedding model
embeddings = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/multi-qa-mpnet-base-dot-v1") 


In [21]:
# Loading existing vector store with the embedding function
db = Chroma(persist_directory=persistent_directory,
            embedding_function=embeddings)


In [53]:
# Define the user's question
query = "What is model distillation?"

In [54]:
# Retrieve the relevant documents based on query
retriever = db.as_retriever(
    search_type = "similarity",
    search_kwargs={"k":3},
)
relevant_docs = retriever.invoke(query)

In [55]:
# Display the relevant results with metadata
print("\n--- Relevant Documents ---")
for i, doc in enumerate(relevant_docs, 1):
    print(f"Document {i}:\n{doc.page_content}\n")
    if doc.metadata:
        print(f"Source: {doc.metadata.get('source', 'Unknown')}\n")



--- Relevant Documents ---
Document 1:
Model Distillation
Model distillation (also called knowledge distillation) is a method in which a small
model (student) is trained to mimic a larger model (teacher) (Hinton et al., 2015).
The knowledge of the big model is distilled into the small model, hence the term dis‐
tillation.
Traditionally, the goal of model distillation is to produce smaller models for deploy‐
ment. Deploying a big model can be resource-intensive. Distillation can produce a
smaller, faster student model that retains performance comparable to the teacher. For
example, DistilBERT, a model distilled from BERT, reduces the size of a BERT model
by 40% while retaining 97% of its language comprehension capabilities and being
60% faster (Sanh et al., 2019).
The student model can be trained from scratch like DistilBERT or finetuned from a
pre-trained model like Alpaca. In 2023, Taori et al. finetuned Llama-7B, the 7-billionparameter version of Llama, on examples generated by text

# 2a. RAG Basics with Metadata and Multiple Files

In [59]:
import os
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceBgeEmbeddings


In [None]:
# Define the directory containing the text files and the persistent directory
current_dir = os.path.abspath("")
books_dir = os.path.join(current_dir, "books")  # Section 1 pointed at a single file; here we use the entire folder.
db_dir = os.path.join(current_dir, "db")
persistent_directory = os.path.join(db_dir, "chroma_db_with_metadata")  # Section 1 used "chroma_db"; this store also keeps source metadata.

In [58]:
print(f"Books directory: {books_dir}")
print(f"Persistent directory: {persistent_directory}")

Books directory: d:\Terragent\GenAI Experiments\Langchain Experiments\books
Persistent directory: d:\Terragent\GenAI Experiments\Langchain Experiments\db\chroma_db_with_metadata


In [62]:
# Check if chroma vector store already exists
if not os.path.exists(persistent_directory):
    print("persistent directory does not exist. Initializing vector store...")

    # Ensure the books directory exists
    if not os.path.exists(books_dir):
        raise FileNotFoundError(
            f"The directory {books_dir} does not exist. Please check the path."
        )
    
    # List all text files in directory (earlier it was just one book)
    book_files = [f for f in os.listdir(books_dir) if f.endswith(".txt")]

    # Read the text content for each file and store it with metadata
    # Several files are loaded here, hence the for loop (chunking happens afterwards)
    documents = []
    for book_file in book_files:
        file_path = os.path.join(books_dir, book_file)
        loader = TextLoader(file_path)
        book_docs = loader.load()
        for doc in book_docs:
            # Add metadata to each document indicating its source
            doc.metadata = {"source": book_file}
            documents.append(doc)

    # Split the document into chunks
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(documents)

    # display information about the split documents
    print("\n---- Document Chunks Information----")
    print(f"Number of document chunks: {len(docs)}")

    # Create embeddings
    print("\n-----Create embeddings------")
    embeddings = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/multi-qa-mpnet-base-dot-v1")
    print("\n--- Finished creating embeddings----")

    # Create the vector store and persist it
    db = Chroma.from_documents(
        docs, embedding=embeddings, persist_directory=persistent_directory
    )

else:
    print("vector store already exists. No need to initialize")


persistent directory does not exist. Initializing vector store...

Created a chunk of size 1384, which is longer than the specified 1000
Created a chunk of size 1145, which is longer than the specified 1000
Created a chunk of size 1024, which is longer than the specified 1000
...

---- Document Chunks Information----
Number of document chunks: 1278

-----Create embeddings------

--- Finished creating embeddings----


# 2b. RAG Basics with Metadata and Multiple Files - Querying

In [63]:
import os
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceBgeEmbeddings


In [64]:
# Define the persistent directory
current_dir = os.path.abspath("")
db_dir = os.path.join(current_dir, "db")
persistent_directory = os.path.join(db_dir, "chroma_db_with_metadata") 

In [65]:
# Define the embeddings
embeddings = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/multi-qa-mpnet-base-dot-v1")

In [66]:
# Load the existing vector store with the embedding function
db = Chroma(persist_directory=persistent_directory,
            embedding_function=embeddings)

In [67]:
# Define the user's question
query = "What is unique market insight?"

In [68]:
# Retrieve the relevant documents based on query
retriever = db.as_retriever(
    search_type = "similarity",
    search_kwargs={"k":3},
)
relevant_docs = retriever.invoke(query)

In [69]:
# Display the relevant results with metadata
print("\n--- Relevant Documents ---")
for i, doc in enumerate(relevant_docs, 1):
    print(f"Document {i}:\n{doc.page_content}\n")
    print(f"Source: {doc.metadata['source']}\n")


--- Relevant Documents ---
Document 1:
Consider the Point of View of
Your Best-Fit Customers
One way of thinking about your unique market insight is to ask this
question: “What do your best-fit prospects need to know to understand why
your unique value is important to them?”

Your unique insight
into the market is what
leads you to build a
product that is different
and better than the
alternatives.

Source: Sales_Pitch_April_Dunford.txt

Document 2:
Insight That Isn’t Unique
Some companies will try to use a common industry trend as insight at the
start of their sales pitch, rather than the company’s unique insight into the
market. For example, I have seen variations of “Companies are generating
more data than ever before,” or “The pace of AI adoption is accelerating,”
or “Companies need to digitally transform their businesses.” All of these are
obvious trends in the market. The key to Step 1, Insight, is to go beyond
these surface-level observations and get down to the insight that ma

# 3. Text Splitting Deep-Dive

(Recursive text splitter seems to be the best)

LangChain provides several types of text splitters, each designed for specific use cases. Here's a breakdown of the most common ones:

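1. **CharacterTextSplitter**:
   - Splits text on a single separator (by default a blank line, `"\n\n"`) and merges the pieces into chunks of up to `chunk_size` characters.
   - Useful for simple chunking, but a piece with no separator inside it is kept whole, so chunks can exceed `chunk_size` (hence the "Created a chunk of size ..." warnings in the earlier output).
   - Example: Targeting chunks of about 1,000 characters each.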

2. **RecursiveCharacterTextSplitter**:
   - Splits text recursively by attempting to break it at natural boundaries (e.g., paragraphs, sentences) while respecting a maximum chunk size.
   - Ideal for preserving semantic meaning and context within chunks.
   - Recommended for most use cases due to its balance of structure and flexibility.

3. **TokenTextSplitter**:
   - Splits text based on the number of tokens, which are units of text used by language models.
   - Ensures that chunks fit within the token limit of a model.
   - Useful when working with models that have strict token constraints.

4. **MarkdownTextSplitter**:
   - Specifically designed for splitting Markdown documents.
   - Preserves the structure of Markdown content, such as headings and lists.
   - Ideal for processing technical documentation or structured text.

5. **HTMLHeaderTextSplitter** / **HTMLSectionSplitter**:
   - Split HTML content while maintaining its structure (for example, by header tags).
   - Useful for web scraping or processing HTML documents.

6. **SemanticChunker** (from `langchain_experimental`):
   - Uses embeddings and similarity measures to split text at semantically meaningful points.
   - Aims to keep related content together.
   - Best for applications requiring high semantic relevance.

Each splitter has its strengths, and the choice depends on your specific requirements, such as the type of text, the model's input constraints, and the importance of preserving semantic meaning.
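
The "Created a chunk of size N, which is longer than the specified 1000" warnings in the earlier cells follow from how separator-based splitting works: text is cut only at the separator, and neighbouring pieces are merged until the size limit would be exceeded. The sketch below is a simplified pure-Python model of that idea, not LangChain's implementation; `split_on_separator` is a hypothetical helper and chunk overlap is ignored.

```python
def split_on_separator(text, separator="\n\n", chunk_size=1000):
    """Cut at `separator`, then greedily merge pieces up to `chunk_size` characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A piece with no separator inside it is never cut further,
            # so it may exceed chunk_size on its own.
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Two short paragraphs followed by one long paragraph without any separator.
sample = "short one\n\nshort two\n\n" + "x" * 50
chunks = split_on_separator(sample, chunk_size=25)
print([len(c) for c in chunks])
```

The two short paragraphs are merged into one chunk, while the 50-character paragraph stays whole even though it is twice the requested size.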

In [70]:
import os
from langchain.text_splitter import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
    SentenceTransformersTokenTextSplitter,
    TextSplitter,
    TokenTextSplitter,
)

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceBgeEmbeddings




---

### **Step 1: Load the Text File**
- Use the `TextLoader` class to read the file from `file_path`.
- Extract the file content into `documents`, which will later be split into smaller chunks.

---

### **Step 2: Select the Embedding Model**
- Initialize an embedding model (here `HuggingFaceBgeEmbeddings`, loading a sentence-transformers model) that will later be used to convert text into numerical embeddings.
- At this step, the model is simply being set up—it does not yet process the text.

---

### **Step 3: Define the Vector Store Function**
- Create a function, `create_vector_store`, that:
  1. Takes in the processed document chunks.
  2. Generates embeddings using the selected model.
  3. Saves these embeddings to a persistent database (a vector store) using the `Chroma` library.

---

### **Step 4: Split Text into Chunks and Store Them**

To prepare the text for embedding, it’s split into smaller, manageable pieces using different techniques. Each technique creates a new set of chunks, which are then saved in their respective vector stores:

1. **Character-based Splitting**
   - Divide text into chunks of 1,000 characters with 100-character overlaps.
   - Save the chunks in a database called `"chroma_db_char"`.

2. **Sentence-based Splitting**
   - Uses `SentenceTransformersTokenTextSplitter`, which splits by token count with a sentence-transformers tokenizer; it does not guarantee that chunks end at sentence boundaries.
   - Store these chunks in `"chroma_db_sent"`.

3. **Token-based Splitting**
   - Divide the text into chunks of 512 tokens (e.g., words or subwords) to match transformer models' input limits.
   - Save the chunks in `"chroma_db_token"`.

4. **Recursive Character-based Splitting**
   - Attempt to split text at natural boundaries (e.g., sentences, paragraphs) while keeping chunks under 1,000 characters.
   - Save the resulting chunks in `"chroma_db_rec_char"`.
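
The recursive strategy in item 4 can be sketched in plain Python: try the coarsest separator first, and only fall back to finer ones for pieces that are still too large. This is a simplified illustration under stated assumptions (no overlap, a fixed separator order), not the `RecursiveCharacterTextSplitter` source; `recursive_split` is a hypothetical name.

```python
def recursive_split(text, chunk_size=1000, separators=("\n\n", "\n", " ", "")):
    """Split on the coarsest separator first; recurse with finer ones for oversize pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at the character limit.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            # Flush what we have, then break the oversize piece with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta\n\ngamma delta epsilon zeta"
print(recursive_split(sample, chunk_size=12))
```

Unlike the single-separator approach, every chunk here respects the limit, because an oversize paragraph is broken again at spaces and, if needed, at raw character positions.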

---

### **Workflow Summary**
1. **Load the text**: Read the file into memory.
2. **Pick the embedding model**: Set up the embeddings tool.
3. **Process the text**: Split it into smaller chunks using various methods.
4. **Store the embeddings**: Create and save a vector store for each splitting technique.

---



In [73]:
# Define the directory containing the text file
current_dir = os.path.abspath("")
file_path = os.path.join(current_dir, "books", "AI_Engineering.txt")
db_dir = os.path.join(current_dir, "db")

In [72]:
# Check if the text file exists
if not os.path.exists(file_path):
    raise FileNotFoundError(
        f"The file {file_path} does not exist. Please check the path."
    )


In [74]:
# Read the text content from the file
loader = TextLoader(file_path)
documents = loader.load()

In [75]:
embeddings = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/multi-qa-mpnet-base-dot-v1")

In [76]:
# Function to create and persist vector store

def create_vector_store(docs, store_name):
    persistent_directory = os.path.join(db_dir, store_name)
    if not os.path.exists(persistent_directory):
        print(f"\n--- Creating vector store {store_name} ---")
        db = Chroma.from_documents(
            docs, embeddings, persist_directory=persistent_directory
        )
        print(f"--- Finished creating vector store {store_name} ---")
    else:
        print(
            f"Vector store {store_name} already exists. No need to initialize.")

In [77]:
# 1. Character-based Splitting
# Splits text into chunks based on a specified number of characters.
# Useful for consistent chunk sizes regardless of content structure.
print("\n--- Using Character-based Splitting ---")
char_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
char_docs = char_splitter.split_documents(documents)
create_vector_store(char_docs, "chroma_db_char")

--- Using Character-based Splitting ---

Created a chunk of size 1384, which is longer than the specified 1000
Created a chunk of size 1145, which is longer than the specified 1000
Created a chunk of size 1024, which is longer than the specified 1000
...

--- Creating vector store chroma_db_char ---
--- Finished creating vector store chroma_db_char ---


In [78]:
# 2. Sentence-based Splitting
# Uses SentenceTransformersTokenTextSplitter, which splits by token count using a
# sentence-transformers tokenizer; it does not guarantee sentence boundaries.
# Note: the token window is set by `tokens_per_chunk`, not by `chunk_size`.
print("\n--- Using Sentence-based Splitting ---")
sent_splitter = SentenceTransformersTokenTextSplitter(chunk_size=1000)
sent_docs = sent_splitter.split_documents(documents)
create_vector_store(sent_docs, "chroma_db_sent")


--- Using Sentence-based Splitting ---


--- Creating vector store chroma_db_sent ---
--- Finished creating vector store chroma_db_sent ---


In [79]:
# 3. Token-based Splitting
# Splits text into chunks based on tokens (words or subwords), using tokenizers like GPT-2.
# Useful for transformer models with strict token limits.
print("\n--- Using Token-based Splitting ---")
token_splitter = TokenTextSplitter(chunk_overlap=0, chunk_size=512)
token_docs = token_splitter.split_documents(documents)
create_vector_store(token_docs, "chroma_db_token")



--- Using Token-based Splitting ---

--- Creating vector store chroma_db_token ---
--- Finished creating vector store chroma_db_token ---


In [80]:
# 4. Recursive Character-based Splitting
# Attempts to split text at natural boundaries (sentences, paragraphs) within character limit.
# Balances between maintaining coherence and adhering to character limits.
print("\n--- Using Recursive Character-based Splitting ---")
rec_char_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100)
rec_char_docs = rec_char_splitter.split_documents(documents)
create_vector_store(rec_char_docs, "chroma_db_rec_char")


--- Using Recursive Character-based Splitting ---

--- Creating vector store chroma_db_rec_char ---
--- Finished creating vector store chroma_db_rec_char ---


In [82]:
# 5. Custom Splitting
# Allows creating custom splitting logic based on specific requirements.
# Useful for documents with unique structure that standard splitters can't handle.
print("\n--- Using Custom Splitting ---")
class CustomTextSplitter(TextSplitter):
    def split_text(self, text):
        # Custom logic for splitting text
        return text.split("\n\n")  # Example: split by paragraphs



--- Using Custom Splitting ---


In [83]:
#Store in custom chroma db
custom_splitter = CustomTextSplitter()
custom_docs = custom_splitter.split_documents(documents)
create_vector_store(custom_docs, "chroma_db_custom")


--- Creating vector store chroma_db_custom ---
--- Finished creating vector store chroma_db_custom ---


In [84]:
# Function to query a vector store
def query_vector_store(store_name, query):
    persistent_directory=os.path.join(db_dir, store_name)
    if os.path.exists(persistent_directory):
        print(f"\n----Querying the vector store {store_name}----")
        db = Chroma(
            persist_directory=persistent_directory, embedding_function=embeddings
        )
        retriever = db.as_retriever(
            search_type = "similarity",
            search_kwargs={"k":3},
            
        )
        relevant_docs = retriever.invoke(query)
        # Display the relevant results with metadata
        print(f"\n-- Relevant Documents for {store_name}---")
        for i, doc in enumerate(relevant_docs, 1):
            print(f"Document {i} \n{doc.page_content}\n")
            if doc.metadata:
                print(f"Source: {doc.metadata.get('source', 'Unknown')}\n")

    else:
        print(f"Vector store {store_name} does not exist.")

In [91]:
# Define the user's question
query = "What is model distillation?"

In [92]:
# Query each vector store
query_vector_store("chroma_db_char", query)
query_vector_store("chroma_db_sent", query)
query_vector_store("chroma_db_token", query)
query_vector_store("chroma_db_rec_char", query)
query_vector_store("chroma_db_custom", query)



----Querying the vector store chroma_db_char----

-- Relevant Documents for chroma_db_char---
Document 1 
Model Distillation
Model distillation (also called knowledge distillation) is a method in which a small
model (student) is trained to mimic a larger model (teacher) (Hinton et al., 2015).
The knowledge of the big model is distilled into the small model, hence the term dis‐
tillation.
Traditionally, the goal of model distillation is to produce smaller models for deploy‐
ment. Deploying a big model can be resource-intensive. Distillation can produce a
smaller, faster student model that retains performance comparable to the teacher. For
example, DistilBERT, a model distilled from BERT, reduces the size of a BERT model
by 40% while retaining 97% of its language comprehension capabilities and being
60% faster (Sanh et al., 2019).
The student model can be trained from scratch like DistilBERT or finetuned from a
pre-trained model like Alpaca. In 2023, Taori et al. finetuned Llama-7B, the 

# 4. RAG Retriever Deep Dive


Types of search available:

---

### **1. Similarity Search**
- **Purpose**: Retrieves documents most similar to the query based on vector embeddings.
- **How it works**: Embeds the query and ranks document vectors by a similarity or distance metric (for example cosine similarity or L2 distance, depending on how the vector store is configured).
- **Example**:
  ```python
  retriever = db.as_retriever()
  results = retriever.invoke(query)
  ```
- **Use case**: General-purpose searches to retrieve the closest matching documents.
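
As a rough illustration of what "most similar" means, the sketch below ranks a few toy vectors by cosine similarity to a query vector. The 2-D vectors and document names are made up for the example (real embeddings have hundreds of dimensions), and the code assumes non-zero vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings, invented for illustration only.
doc_vectors = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.6, 0.8],
    "doc_c": [0.0, 1.0],
}
query_vector = [1.0, 0.0]

ranked = sorted(
    doc_vectors,
    key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
    reverse=True,
)
print(ranked)
```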

---

### **2. Similarity Search with a Score Threshold**
- **Purpose**: Retrieves documents only if their similarity score meets a minimum threshold.
- **How it works**: Filters out less relevant results to ensure only highly similar documents are returned.
- **Example**:
  ```python
  retriever = db.as_retriever(
      search_type="similarity_score_threshold",
      search_kwargs={"score_threshold": 0.5}
  )
  ```
- **Use case**: High-precision searches where only very relevant documents are needed.
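
Conceptually, the threshold acts as a filter on already-scored results. A minimal sketch, assuming relevance scores where higher means more similar; `filter_by_threshold` and the sample scores are hypothetical:

```python
def filter_by_threshold(scored_docs, score_threshold):
    """Keep (doc, score) pairs whose score is at least `score_threshold`, best first."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Invented scores for illustration.
scored = [("chunk_1", 0.82), ("chunk_2", 0.47), ("chunk_3", 0.55)]
print(filter_by_threshold(scored, 0.5))
```

Note that, unlike top-`k` retrieval, this can return an empty list when nothing clears the threshold, so downstream code should handle that case.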

---

### **3. Top-K Similarity Search**
- **Purpose**: Retrieves the top `k` most similar documents to the query.
- **How it works**: Sorts all documents by similarity score and returns the top `k` matches.
- **Example**:
  ```python
  retriever = db.as_retriever(
      search_kwargs={"k": 3}  # Retrieves top 3 most similar documents
  )
  ```
- **Use case**: Focused searches with a limited number of results.
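
Selecting the top `k` does not require sorting every document. A small sketch using the standard library's `heapq.nlargest`; the `top_k` helper and the scores are made up for illustration:

```python
import heapq


def top_k(scored_docs, k=3):
    """Return the k highest-scoring (doc, score) pairs, best first."""
    return heapq.nlargest(k, scored_docs, key=lambda pair: pair[1])


# Invented scores for illustration.
scored = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(top_k(scored, k=3))
```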

---

### **4. MMR (Maximal Marginal Relevance) Search**
- **Purpose**: Balances relevance and diversity of the results.
- **How it works**: Selects results based on both similarity to the query and dissimilarity to other results.
- **Example**:
  ```python
  retriever = db.as_retriever(
      search_type="mmr",
      search_kwargs={"k": 3, "lambda_mult": 0.5}  # `lambda_mult` controls relevance vs diversity
  )
  ```
- **Use case**: When you want a diverse set of results while keeping relevance.
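
The relevance/diversity trade-off can be made concrete with a small greedy selection loop. This is a sketch of the general MMR idea on toy 2-D vectors, not the vector store's implementation; `mmr_select` is a hypothetical helper whose `lambda_mult` parameter is named after the retriever option (1.0 means pure relevance, lower values favour diversity).

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedily pick indices that are relevant to the query but not redundant."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors: documents 0 and 1 are near-duplicates, document 2 points elsewhere.
query = [1.0, 0.0]
docs = [[0.96, 0.28], [0.936, 0.352], [0.8, -0.6]]
print(mmr_select(query, docs, k=2, lambda_mult=0.5))
print(mmr_select(query, docs, k=2, lambda_mult=1.0))
```

With `lambda_mult=0.5` the near-duplicate is skipped in favour of the more distinct document; with `lambda_mult=1.0` the selection reduces to plain similarity ranking.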

---

### **5. Filtered Search**
- **Purpose**: Retrieves documents that meet specific metadata or attribute-based conditions.
- **How it works**: Filters documents based on custom metadata (e.g., source, date, category).
- **Example**:
  ```python
  retriever = db.as_retriever(
      search_type="similarity",  # filtering is passed through search_kwargs; "filter" is not a search type
      search_kwargs={"filter": {"source": "report"}}
  )
  ```
- **Use case**: Strict filtering of results (e.g., documents from a specific source or time period).
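
In spirit, a metadata filter is an exact-match check applied alongside the similarity search. A minimal sketch on plain dictionaries; `filter_by_metadata` is a hypothetical helper, the file names are the ones used as `source` metadata earlier in this notebook, and the page contents are placeholders:

```python
def filter_by_metadata(docs, where):
    """Keep documents whose metadata matches every key/value pair in `where`."""
    return [
        doc for doc in docs
        if all(doc["metadata"].get(key) == value for key, value in where.items())
    ]


# Placeholder documents for illustration.
docs = [
    {"page_content": "chunk about distillation", "metadata": {"source": "AI_Engineering.txt"}},
    {"page_content": "chunk about market insight", "metadata": {"source": "Sales_Pitch_April_Dunford.txt"}},
    {"page_content": "chunk about finetuning", "metadata": {"source": "AI_Engineering.txt"}},
]

matches = filter_by_metadata(docs, {"source": "AI_Engineering.txt"})
print([doc["page_content"] for doc in matches])
```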

---

### **6. Hybrid Search**
- **Purpose**: Combines multiple search methods, such as lexical (keyword) search and vector similarity.
- **How it works**: Merges results from different search techniques for better precision and recall.
- **Use case**: When you want to leverage both embeddings and keyword-based searches.
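
One common way to merge ranked lists is reciprocal rank fusion (RRF). The sketch below is an assumption about how the merge could be done, not something this notebook or the vector store performs. `reciprocal_rank_fusion`, the constant `k=60`, and the document ids are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked high in any list receive a larger contribution
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_ranking = ["doc_b", "doc_a", "doc_d"]  # hypothetical keyword (lexical) results
vector_ranking = ["doc_a", "doc_c", "doc_b"]   # hypothetical embedding results
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that appear in both lists rise to the top, which is the behaviour hybrid search is after.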

---


In [93]:
import os

from dotenv import load_dotenv
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceBgeEmbeddings

In [94]:
# Define the directory containing the text files and the persistent directory
current_dir = os.path.abspath("")
db_dir = os.path.join(current_dir, "db")
persistent_directory = os.path.join(db_dir, "chroma_db_with_metadata")

In [95]:
embeddings = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/multi-qa-mpnet-base-dot-v1")

In [96]:
# Load the existing vector store with the embedding function
db = Chroma(persist_directory=persistent_directory,
            embedding_function=embeddings)

In [98]:
# Function to query a vector store with different search types and parameters
def query_vector_store(
        store_name, query, embedding_function, search_type, search_kwargs
):
    if os.path.exists(persistent_directory):
        print(f"\n--Querying the vector store {store_name}------")
        db = Chroma(
            persist_directory=persistent_directory,
            embedding_function=embedding_function
        )
        retriever = db.as_retriever(
            search_type=search_type,
            search_kwargs=search_kwargs,
        )
        relevant_docs = retriever.invoke(query)
        # Display the relevant results with metadata
        print(f"\n--- Relevant Documents for {store_name} ---")
        for i, doc in enumerate(relevant_docs, 1):
            print(f"Document {i}:\n{doc.page_content}\n")
            if doc.metadata:
                print(f"Source: {doc.metadata.get('source', 'Unknown')}\n")
    else:
        print(f"Vector store {store_name} does not exist.")

In [106]:
# Define user query
query = "What is llm model distillation?"


In [107]:
# Showcase different retrieval methods

# 1. Similarity Search
# This method retrieves documents based on vector similarity.
# It finds the documents whose embeddings are closest to the query embedding
# under the vector store's configured distance metric.
# Use this when you want to retrieve the top k most similar documents.
print("\n--- Using Similarity Search ---")
query_vector_store("chroma_db_with_metadata", query,
                   embeddings, "similarity", {"k": 3})


--- Using Similarity Search ---

--Querying the vector store chroma_db_with_metadata------

--- Relevant Documents for chroma_db_with_metadata ---
Document 1:
You can either provide the model with the necessary context or give it tools to gather
context. The process of gathering necessary context for a given query is called context
construction. Context construction tools include data retrieval, such as in a RAG
pipeline, and web search. These tools are discussed in Chapter 6.

Source: AI_Engineering.txt

Document 2:
Term-based retrieval
Given a query, the most straightforward way to find relevant documents is with key‐
words. Some people call this approach lexical retrieval. For example, given the query
“AI engineering”, the model will retrieve all the documents that contain “AI engi‐
neering”. However, this approach has two problems:
• Many documents might contain the given term, and your model might not have
sufficient context space to include all of them as context. A heuristic is

In [101]:
# 2. Max Marginal Relevance (MMR)
# This method balances between selecting documents that are relevant to the query and diverse among themselves.
# 'fetch_k' specifies the number of documents to initially fetch based on similarity.
# 'lambda_mult' controls the diversity of the results: 1 for minimum diversity, 0 for maximum.
# Use this when you want to avoid redundancy and retrieve diverse yet relevant documents.
# Note: Relevance measures how closely documents match the query.
# Note: Diversity ensures that the retrieved documents are not too similar to each other,
#       providing a broader range of information.
print("\n--- Using Max Marginal Relevance (MMR) ---")
query_vector_store(
    "chroma_db_with_metadata",
    query,
    embeddings,
    "mmr",
    {"k": 3, "fetch_k": 20, "lambda_mult": 0.5},
)


--- Using Max Marginal Relevance (MMR) ---

--Querying the vector store chroma_db_with_metadata------

--- Relevant Documents for chroma_db_with_metadata ---
Document 1:
You can either provide the model with the necessary context or give it tools to gather
context. The process of gathering necessary context for a given query is called context
construction. Context construction tools include data retrieval, such as in a RAG
pipeline, and web search. These tools are discussed in Chapter 6.

Source: AI_Engineering.txt

Document 2:
364

|

Chapter 8: Dataset Engineering

The model-centric and data-centric division helps guide research. In reality, however,
meaningful technological progress often requires investment in both model and data
improvements.

Source: AI_Engineering.txt

Document 3:
430

|

Chapter 9: Inference Optimization

Figure 9-10. Two examples of inference with reference. The text spans that are success‐
fully copied from the input are in red and green. Image from Yang e

In [102]:
# 3. Similarity Score Threshold
# This method retrieves documents that exceed a certain similarity score threshold.
# 'score_threshold' sets the minimum similarity score a document must have to be considered relevant.
# Use this when you want to ensure that only highly relevant documents are retrieved, filtering out less relevant ones.
print("\n--- Using Similarity Score Threshold ---")
query_vector_store(
    "chroma_db_with_metadata",
    query,
    embeddings,
    "similarity_score_threshold",
    {"k": 3, "score_threshold": 0.1},
)


--- Using Similarity Score Threshold ---

--Querying the vector store chroma_db_with_metadata------


  self.vectorstore.similarity_search_with_relevance_scores(
No relevant docs were retrieved using the relevance score threshold 0.1



--- Relevant Documents for chroma_db_with_metadata ---
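
A plausible explanation for the empty result, offered as an assumption rather than a verified diagnosis: when the collection uses L2 distance, each distance is converted to a relevance score with a formula that assumes unit-length embeddings. `multi-qa-mpnet-base-dot-v1` is tuned for dot-product scoring and its embeddings are not normalized, so distances can be large and the resulting scores can fall below zero, under any positive threshold. The sketch below mirrors that conversion. `l2_to_relevance` is a hypothetical name and the distances are illustrative, not measured.

```python
import math


def l2_to_relevance(distance):
    """Map an L2 distance to a relevance score, assuming unit-length embeddings."""
    return 1.0 - distance / math.sqrt(2)


for distance in (0.0, 0.7, 25.0):  # illustrative distances, not measured values
    print(f"distance={distance:5.1f} -> relevance={l2_to_relevance(distance):.3f}")
```

If this is the cause, printing the raw scores with `db.similarity_search_with_relevance_scores(query, k=3)` would confirm it. Building the embeddings with `encode_kwargs={"normalize_embeddings": True}` and re-indexing is one possible remedy.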


# 5. RAG One Off Question

In [108]:
import os
from langchain_community.vectorstores import Chroma
from langchain_core.messages import HumanMessage, SystemMessage
from langchain.embeddings import HuggingFaceBgeEmbeddings

In [109]:
embeddings = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/multi-qa-mpnet-base-dot-v1")

In [110]:
# Define the directory containing the text files and the persistent directory
current_dir = os.path.abspath("")
db_dir = os.path.join(current_dir, "db")
persistent_directory = os.path.join(db_dir, "chroma_db_with_metadata")

In [111]:
db = Chroma(persist_directory=persistent_directory, embedding_function=embeddings)

In [121]:
query = "What is a LLM evaluation?"

In [122]:
retriever = db.as_retriever(
    search_type='similarity',
    search_kwargs={"k": 3}
)

relevant_docs = retriever.invoke(query)

In [123]:
# Display the results
print("\n----Relevant Documents-----")
for i, doc in enumerate(relevant_docs, 1):
    print(f"Document {i}: \n {doc.page_content}\n")


----Relevant Documents-----
Document 1: 
 Chapter 3: Evaluation Methodology

Document 2: 
 3. Evaluation Methodology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Challenges of Evaluating Foundation Models
Understanding Language Modeling Metrics
Entropy
Cross Entropy
Bits-per-Character and Bits-per-Byte
Perplexity
Perplexity Interpretation and Use Cases
Exact Evaluation
Functional Correctness
Similarity Measurements Against Reference Data
Introduction to Embedding
AI as a Judge
Why AI as a Judge?
How to Use AI as a Judge
Limitations of AI as a Judge
What Models Can Act as Judges?
Ranking Models with Comparative Evaluation
Challenges of Comparative Evaluation
The Future of Comparative Evaluation
Summary

vi

|

Table of Contents

114
118
119
120
121
121
122
125
126
127
134
136
137
138
141
145
148
152
155
156

Document 3: 
 4 Textual entailment is also known as natural language inference (NLI).

168

|

Chapter 4: Evaluate AI Sy

In [125]:
# Combine the query and the relevant document contents

combined_input = (
    "Here are some documents that might help answer the question: "
    + query
    + "\n\nRelevant Documents:\n"
    + "\n\n".join([doc.page_content for doc in relevant_docs])
    + "\n\nPlease provide an answer based only on the provided documents. If the answer is not found in the documents, respond with 'I'm not sure'."
)


In [None]:
from langchain_openai import AzureChatOpenAI
import os

# Fail early with a clear message if a required setting is missing
for var in ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_VERSION"):
    if not os.getenv(var):
        raise EnvironmentError(f"Environment variable {var} is not set.")

model = AzureChatOpenAI(
    model="gpt-4o",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.environ["AZURE_OPENAI_API_VERSION"])


In [127]:
# Define the messages for the model
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content=combined_input),
]

# Invoke the model with the combined input
result = model.invoke(messages)

# Display the full result and content only
print("\n--- Generated Response ---")
# print("Full result:")
# print(result)
print("Content only:")
print(result.content)



--- Generated Response ---
Content only:
An LLM (Language Model) evaluation refers to the process of assessing the performance and effectiveness of a language model. The evaluation methodology includes various metrics and approaches such as understanding language modeling metrics, entropy, cross entropy, bits-per-character and bits-per-byte, perplexity and its use cases, exact evaluation, functional correctness, similarity measurements against reference data, and the use of embedding. Additionally, AI can be used as a judge to rank models through comparative evaluation, although there are challenges associated with this process. 

Metrics like perplexity help interpret and understand the model's predictive performance, while exact evaluation and functional correctness ensure the accuracy and reliability of the model's outputs. Comparative evaluation and the use of embedding for similarity measurements are also important aspects of the evaluation methodology.

In summary, LLM evaluatio

# 7. RAG conversation

In [129]:
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.vectorstores import Chroma
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.embeddings import HuggingFaceBgeEmbeddings

In [130]:
embeddings = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/multi-qa-mpnet-base-dot-v1")

In [131]:
# Define the directory containing the text files and the persistent directory
current_dir = os.path.abspath("")
db_dir = os.path.join(current_dir, "db")
persistent_directory = os.path.join(db_dir, "chroma_db_with_metadata")

In [132]:
# Load the existing vector store with the embedding function
db = Chroma(persist_directory=persistent_directory, embedding_function=embeddings)


In [133]:
# Create a retriever for querying the vector store
# `search_type` specifies the type of search (e.g., similarity)
# `search_kwargs` contains additional arguments for the search (e.g., number of results to return)
retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3},
)


In [None]:
# Create the Azure OpenAI chat model
from langchain_openai import AzureChatOpenAI
import os

# Fail early with a clear message if a required setting is missing
for var in ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_VERSION"):
    if not os.getenv(var):
        raise EnvironmentError(f"Environment variable {var} is not set.")

llm = AzureChatOpenAI(
    model="gpt-4o",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.environ["AZURE_OPENAI_API_VERSION"])


In [135]:
# Contextualize question prompt
# This system prompt helps the AI understand that it should reformulate the question
# based on the chat history to make it a standalone question

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question, "
    "which might reference context in the chat history, "
    "formulate a standalone question that can be understood "
    "without the chat history. Do NOT answer the question; just "
    "reformulate it if needed and otherwise return it as is."
)


In [136]:
# Create a prompt template for contextualizing questions
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")
    ]
)

In [137]:
# Create a history-aware retriever
# This uses the LLM to help reformulate the question based on chat history
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
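
Conceptually, the history-aware retriever branches on whether there is any chat history. The sketch below is a simplified stand-in, not LangChain's code: `history_aware_retrieve`, `fake_rewrite`, and `fake_retrieve` are hypothetical names used so the example runs without an LLM or a vector store.

```python
def history_aware_retrieve(inputs, rewrite_question, retrieve):
    """Conceptual stand-in for a history-aware retriever."""
    if not inputs.get("chat_history"):
        # No history: the question is already standalone
        search_query = inputs["input"]
    else:
        # With history: have the LLM rewrite the question before searching
        search_query = rewrite_question(inputs["chat_history"], inputs["input"])
    return retrieve(search_query)


# Hypothetical stand-ins for the LLM rewrite step and the vector store lookup
def fake_rewrite(chat_history, question):
    return f"{question} (about {chat_history[-1]})"


def fake_retrieve(search_query):
    return [f"docs for: {search_query}"]


first = history_aware_retrieve(
    {"input": "What is model distillation?", "chat_history": []},
    fake_rewrite, fake_retrieve,
)
follow_up = history_aware_retrieve(
    {"input": "Why is it useful?", "chat_history": ["model distillation"]},
    fake_rewrite, fake_retrieve,
)
print(first)
print(follow_up)
```

The follow-up question on its own would retrieve poorly, because "it" carries no meaning for the vector search. Rewriting it first gives the retriever a query it can match.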


In [138]:
qa_system_prompt = (
    "You are an assistant for question-answering tasks. Use "
    "the following pieces of retrieved context to answer the "
    "question. If you don't know the answer, just say that you "
    "don't know. Use three sentences maximum and keep the answer "
    "concise."
    "\n\n"
    "{context}"
)

In [None]:
# Create a prompt template for answering questions
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [141]:
# Create a chain to combine documents for question answering
# `create_stuff_documents_chain` feeds all retrieved context into the LLM
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

In [142]:
# Create a retrieval chain that combines the history-aware retriever and the question answering chain
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

In [143]:
from langchain_core.messages import AIMessage

# Function to simulate a continual chat
def continual_chat():
    print("Start chatting with the AI! Type 'exit' to end the conversation.")
    chat_history = []  # Collect chat history here (a sequence of messages)
    while True:
        query = input("You: ")
        if query.lower() == "exit":
            break
        # Process the user's query through the retrieval chain
        result = rag_chain.invoke({"input": query, "chat_history": chat_history})
        # Display the AI's response
        print(f"AI: {result['answer']}")
        # Update the chat history: the user's turn, then the model's reply
        chat_history.append(HumanMessage(content=query))
        chat_history.append(AIMessage(content=result["answer"]))

In [144]:
# Main function to start the continual chat
if __name__ == "__main__":
    continual_chat()

Start chatting with the AI! Type 'exit' to end the conversation.
AI: Model distillation is a process where a smaller, more efficient model is trained to reproduce the behavior of a larger, more complex model. The knowledge from the large model is "distilled" into the smaller one, allowing it to achieve similar performance with reduced computational resources. This technique is often used to make models more suitable for deployment on devices with limited processing power.
AI: April Dunford emphasizes the importance of understanding what best-fit prospects need to know to appreciate the unique value of your product. She suggests that your market insight should guide you in highlighting why your product is different and better than alternatives, which helps in building a strong sales pitch. Understanding your prospects' point of view is key to articulating your product’s unique market insight effectively.
AI: Chip Huyen emphasizes that building AI products with defensibility is crucial f