# **🔷🔷Improving the RAG Architecture🔷🔷**

Discover state-of-the-art techniques for loading, splitting, and retrieving documents, including loading Python files, splitting semantically, and using maximal marginal relevance (MMR) and self-query retrieval methods. Learn to evaluate your RAG architecture using robust metrics and frameworks.

![img_1](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0501.jpeg)

## **⭐01: Loading and Splitting Code Files**

This is useful for integrating codebases into RAG systems, for tasks such as code summarization, documentation generation, or code assistance.

### **⭕Loading Markdown Files**

In [2]:
from langchain_community.document_loaders import UnstructuredMarkdownLoader

PATH = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\README.md"

loader = UnstructuredMarkdownLoader(file_path=PATH)
markdown_content = loader.load()

print(markdown_content[0].page_content)  # Print the content of the first document
print(markdown_content[0].metadata)      # Print the metadata of the first document

GenAI with Langchain and Huggingface 🤗

This repository serves as a comprehensive guide for integrating Langchain with Huggingface models, enabling you to build, deploy, and optimize cutting-edge AI applications through hands-on projects and real-world examples.

GenAI Overview

Overview of Generative AI Pipeline

author

Python 3.9+

Streamlit

Ollama

LangChain

HuggingFace

License: MIT

Table of Contents

GenAI with Langchain and Huggingface 🤗

Table of Contents

Overview

Key Features

Types of Generative AI

Supported Model Types

⭐Builder's Perspective

1. Foundation Model Architecture

2. Model Training Pipeline

3. Data Processing

4. Model Architecture

5. Training Infrastructure

6. Deployment Strategy

⭐User's Perspective

1. Interface Design

2. User Interaction

3. Response Generation

4. System Integration

5. Performance Metrics

Installation

Getting Started

Examples

Contributing

⚖ ➤ License

❤️ Support

🪙Credits and Inspiration

🔗Connect with me

Overview

This rep

### **⭕Loading Python Files**

In [3]:
from langchain_community.document_loaders import PythonLoader

PATH = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\pyfile.py"

loader = PythonLoader(file_path=PATH)
python_data = loader.load()

print(python_data[0])

page_content='from abc import ABC, abstractmethod

# Abstract base class for all LLMs
class LLM(ABC):
    @abstractmethod
    def complete_sentence(self, prompt):
        pass

# Concrete implementations of LLM
class OpenAI(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ... OpenAI end of sentence."

class Anthropic(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ... Anthropic end of sentence."

class GooglePaLM(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ... Google PaLM end of sentence."

class Cohere(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ... Cohere end of sentence."

class Mistral(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ... Mistral end of sentence."

class CustomLLM(LLM):
    def complete_sentence(self, prompt):
        suffix = " ... CustomLLM generated response."
        return prompt + suffix

# Test function to run all LLMs
def test_ll

### **⭕Splitting Code Files**

In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

python_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=10
    )

chunks = python_splitter.split_documents(python_data)
for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i+1}:\n{chunk.page_content}\n")

Chunk 1:
from abc import ABC, abstractmethod

Chunk 2:
# Abstract base class for all LLMs
class LLM(ABC):
    @abstractmethod
    def complete_sentence(self, prompt):
        pass

Chunk 3:
# Concrete implementations of LLM
class OpenAI(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ... OpenAI end of sentence."



### **⭕Language-Specific Splitting**

- Instead of splitting on generic separators, LangChain can split code using language-aware separators such as:

  - `\nclass`, `\ndef`, `\n\tdef`

- The splitter tries these boundaries first, so chunks tend to line up with logical code units such as a whole class or function rather than arbitrary lines. A definition longer than `chunk_size` is still split further.

- This is especially helpful for code analysis or generation, because each chunk keeps more of the code's structure.

In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=150,
    chunk_overlap=10
)

chunks = python_splitter.split_documents(python_data)

for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i+1}:\n{chunk.page_content}\n")

Chunk 1:
from abc import ABC, abstractmethod

# Abstract base class for all LLMs

Chunk 2:
class LLM(ABC):
    @abstractmethod
    def complete_sentence(self, prompt):
        pass

# Concrete implementations of LLM

Chunk 3:
class OpenAI(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ... OpenAI end of sentence."



## **⭐02: Advanced Splitting Methods**

![img_2](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0502.jpeg)

⚠️ **Limitations of Basic Splitting**

- **Lack of Context Awareness:** Simple character-based splitting can break a function or paragraph in unnatural places, so retrieved chunks may be missing the context needed to answer a query.

- **Mismatch with Model Processing:** LLMs process tokens, and a character limit does not map to a fixed token count, so chunks may overflow the model's context window or underuse it.

### **⭕Token-Based Splitting**

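- Chunk sizes are measured in tokens, which matches how LLMs consume input.
- Each chunk holds at most `chunk_size` tokens, so it can be sized to fit the model's context window and avoid truncation.
- Splits fall on token boundaries, but they can still land mid-sentence, so a small `chunk_overlap` helps carry context across chunks.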

In [None]:
import tiktoken
from langchain_text_splitters import TokenTextSplitter

example_string = "Mary had a little lamb, it's fleece was white as snow."

# Get encoding for model
encoding = tiktoken.encoding_for_model('gpt-4o-mini')

# Initialize the TokenTextSplitter
splitter = TokenTextSplitter(
    encoding_name=encoding.name,
    chunk_size=10,
    chunk_overlap=2
)

# Split the text into chunks
chunks = splitter.split_text(example_string)

# Count tokens in each chunk and print them
for i, chunk in enumerate(chunks):
    token_count = len(encoding.encode(chunk))
    print(f"Chunk {i+1}:\nNo. tokens: {token_count}\n{chunk}\n")

Chunk 1:
No. tokens: 10
Mary had a little lamb, it's fleece was white

Chunk 2:
No. tokens: 5
 was white as snow.
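
The chunk sizes above follow from a sliding window over the token list. Below is a minimal sketch of that windowing, assuming the window advances by `chunk_size - chunk_overlap` tokens each step. The `split_tokens` helper is hypothetical, and the placeholder tokens stand in for the 13 tokens the example sentence produces.

```python
def split_tokens(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Slide a window of `chunk_size` tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # the last window reached the end of the input
        start += step
    return chunks


# Placeholder tokens; a real tokenizer such as tiktoken produces subword IDs.
tokens = [f"t{i}" for i in range(13)]

result = split_tokens(tokens, chunk_size=10, chunk_overlap=2)
for i, chunk in enumerate(result, start=1):
    print(f"Chunk {i}: {len(chunk)} tokens -> {chunk}")
```

With 13 tokens, a chunk size of 10, and an overlap of 2, this yields chunks of 10 and 5 tokens, matching the counts printed above. The second chunk starts with the last two tokens of the first.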



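`cl100k_base` is the tokenizer encoding used for models like:

- gpt-4
- gpt-4-32k
- gpt-3.5-turbo
- gpt-3.5-turbo-16k

Newer models such as gpt-4o-mini use the `o200k_base` encoding instead. In tiktoken versions that predate those models, `encoding_for_model` raises a `KeyError`, and `cl100k_base` is a common manual fallback.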

In [None]:
import tiktoken
from langchain_text_splitters import TokenTextSplitter
from langchain_core.documents import Document

example_string = "Mary had a little lamb, its fleece was white as snow."

# Get encoding for the model
# Use the 'cl100k_base' encoding for GPT-3.5 and GPT-4 models
encoding = tiktoken.get_encoding("cl100k_base")

# Set up token-based text splitter
token_splitter = TokenTextSplitter(
    encoding_name=encoding.name,
    chunk_size=100,
    chunk_overlap=10
)

# Wrap the string in a Document object and split into chunks
documents = [Document(page_content=example_string)]
chunks = token_splitter.split_documents(documents)

# Display the token count in each chunk
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\nNo. tokens: {len(encoding.encode(chunk.page_content))}\n{chunk.page_content}\n")

Chunk 1:
No. tokens: 13
Mary had a little lamb, its fleece was white as snow.



### **⭕Semantic Splitting**

- Uses an embedding model to place split points at semantic boundaries (points where the meaning of the text shifts).

- Compares the embeddings of neighboring sentences and, with the `gradient` threshold type, looks for sharp changes in that distance to decide where one idea ends and another begins.

- Produces coherent, context-rich chunks that can improve downstream tasks such as question answering or summarization.

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Instantiate an OpenAI embeddings model
embedding_model = OpenAIEmbeddings(api_key="<OPENAI_API_TOKEN>", model='text-embedding-3-small')

# Create the semantic text splitter with desired parameters
semantic_splitter = SemanticChunker(
    embeddings=embedding_model, breakpoint_threshold_type="gradient", breakpoint_threshold_amount=0.8
)

# Split the documents (python_data is the list of Documents loaded earlier with PythonLoader)
chunks = semantic_splitter.split_documents(python_data)
print(chunks[0])
```

In [None]:
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_experimental.text_splitter import SemanticChunker

load_dotenv()

# Initialize the Google embedding model used to convert text into high-dimensional vectors
# This model helps in understanding the meaning of text for semantic processing
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Create an instance of SemanticChunker to split text where the meaning shifts
# Note: for "gradient", langchain_experimental appears to treat the threshold amount
# as a percentile (0-100), so 0.8 is very low and splits at almost every sentence boundary
semantic_splitter = SemanticChunker(
    embeddings=embeddings,                         # Pass the embedding model
    breakpoint_threshold_type="gradient",          # Detect split points from the gradient of sentence distances
    breakpoint_threshold_amount=0.8                # Sensitivity of chunk splitting (higher = fewer splits)
)

# Split the input documents into semantically coherent chunks
chunks = semantic_splitter.split_documents(python_data)

print(chunks[0])

page_content='from abc import ABC, abstractmethod

# Abstract base class for all LLMs
class LLM(ABC):
    @abstractmethod
    def complete_sentence(self, prompt):
        pass

# Concrete implementations of LLM
class OpenAI(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ...' metadata={'source': 'E:\\01_Github_Repo\\GenAI-with-Langchain-and-Huggingface\\_Developing_LLMs_Applications_with_LangChain\\_data\\pyfile.py'}
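
The core idea behind semantic splitting can be sketched without an embedding API. The example below is a simplification, not `SemanticChunker` itself: it uses a toy bag-of-words vector in place of a real embedding model and a fixed distance threshold in place of the gradient method. The names `embed`, `cosine_distance`, and `semantic_split` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity; 1.0 means the vectors share no words."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def semantic_split(sentences: list[str], threshold: float) -> list[list[str]]:
    """Start a new chunk wherever consecutive sentences are farther apart than `threshold`."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if cosine_distance(embed(prev), embed(curr)) > threshold:
            chunks.append([curr])   # large jump in meaning: open a new chunk
        else:
            chunks[-1].append(curr)  # similar enough: extend the current chunk
    return chunks


sentences = [
    "The splitter loads the python file.",
    "The splitter then chunks the python file.",
    "Bananas are rich in potassium.",
    "Bananas are also rich in fiber.",
]

chunks = semantic_split(sentences, threshold=0.5)
for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk {i}: {' '.join(chunk)}")
```

Here the two splitter sentences land in one chunk and the two banana sentences in another, because the only large distance is between the second and third sentences.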


## **⭐03: Optimizing Document Retrieval**

### 🔍 **Dense vs. Sparse Retrieval in RAG Pipelines**


When building Retrieval-Augmented Generation (RAG) systems—like those in LangChain—you typically choose between **dense** and **sparse** retrieval methods.

- 🧪 **Dense Retrieval**
  - Uses neural networks (e.g., transformers) to encode documents and queries into **dense vectors**: compact numerical representations that capture meaning.
  - Relevance is measured via **vector similarity** (such as cosine similarity or dot product).
  - **✅ Pros:**
    - Captures **semantic meaning**, so it copes well with synonyms, paraphrasing, and abstract queries.
    - Powerful for **open-domain** or fuzzy information retrieval.
  - **⚠️ Cons:**
    - Computationally **expensive**: embedding models are costly to train and usually run faster on a GPU.
    - Harder to **interpret** why a document was retrieved.

- 📚 **Sparse Retrieval**
  - Based on **keyword matching** using traditional IR methods.
  - Works with **bag-of-words** models: each document becomes a vector over the whole vocabulary, with non-zero entries only for the words it contains.
  - **Common Techniques**:
    - **TF-IDF** (*Term Frequency–Inverse Document Frequency*):
      - Measures how important a word is to a document.
      - > If a term appears often in one document but rarely across others, it gets a higher score.
    - **BM25** (*Best Matching 25*):
      - A ranking function from the Okapi family.
      - > It refines TF-IDF by adjusting for **term frequency saturation** and **document length**.
  - **✅ Pros:**
    - **Fast**, resource-efficient, and easy to **interpret**.
    - Great for **rare terms** and exact keyword matches.
  - **⚠️ Cons:**
    - Struggles with **synonyms** or **semantic similarity**.
    - Can miss documents that are relevant but use **different wording**.
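
The trade-off described above can be illustrated with a toy example. This is a sketch under loud assumptions: the two documents are made up, the 2-d "embeddings" are chosen by hand rather than produced by a model, and the smoothed idf is just one common variant. The query "automobile" shares no keyword with either document, so the sparse score is zero for both, while the dense score still ranks the car document first.

```python
import math
from collections import Counter

# Hypothetical two-document corpus.
docs = {
    "doc_car": "the car dealership sells used cars",
    "doc_food": "the bakery sells fresh bread",
}


def tfidf_score(query: str, doc_id: str) -> float:
    """Sum of tf * idf over the query terms, using raw counts and a smoothed idf."""
    doc_terms = Counter(docs[doc_id].split())
    score = 0.0
    for term in query.split():
        df = sum(term in text.split() for text in docs.values())
        idf = math.log((1 + len(docs)) / (1 + df)) + 1  # smoothed idf, one common variant
        score += doc_terms[term] * idf
    return score


def cosine(a: tuple, b: tuple) -> float:
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hand-picked stand-ins for embeddings: axis 0 ~ vehicles, axis 1 ~ food.
doc_vectors = {"doc_car": (0.9, 0.1), "doc_food": (0.1, 0.9)}
query_vector = (0.8, 0.2)  # pretend embedding of the query "automobile"

query = "automobile"
sparse = {doc_id: tfidf_score(query, doc_id) for doc_id in docs}
dense = {doc_id: cosine(query_vector, vec) for doc_id, vec in doc_vectors.items()}

print("sparse:", sparse)
print("dense: ", dense)
```

In practice the vectors come from an embedding model and the sparse scores from a library implementation. The point here is only that exact keyword matching and vector similarity can disagree on the same query.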


### 🧠 **TF-IDF vs. BM25: Quick Comparison**

| Feature           | TF-IDF                                 | BM25                                    |
| ----------------- | -------------------------------------- | --------------------------------------- |
| Scoring Basis     | Term frequency × inverse document freq | Improved term weighting with saturation |
| Handles Long Docs | ❌ No                                   | ✅ Yes                                   |
| Customizable      | Limited                                | ✅ Adjustable with `k1` and `b` params   |
| Used In           | Classic search engines, baseline NLP   | Modern IR, LangChain RAG pipelines      |



- 🛠️ **In LangChain Pipelines**
  - **`BM25` is often preferred** over `TF-IDF` because it:
    - Handles **longer documents** better through length normalization.
    - Avoids over-rewarding **repeated keywords** through term-frequency saturation.
    - Generally provides more **balanced scoring**.
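
For reference, the saturation and length-normalization behavior can be seen in a compact standard-library sketch of Okapi BM25 scoring. The `bm25_scores` helper is hypothetical, the values `k1=1.5` and `b=0.75` are commonly used defaults assumed here, and the idf uses a non-negative variant of the formula. It is not the implementation behind LangChain's BM25 retriever.

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document in `corpus` against `query` with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    scores = []
    for doc in tokenized:
        counts = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            df = sum(term in other for other in tokenized)
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # non-negative idf variant
            tf = counts[term]
            # k1 caps the gain from repeated terms; b scales the document-length penalty.
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical corpus: the second document repeats the query term four times.
corpus = [
    "bm25 ranks documents",
    "bm25 bm25 bm25 bm25 ranks documents",
    "tf idf is a baseline",
]

scores = bm25_scores("bm25", corpus)
for doc, score in zip(corpus, scores):
    print(f"{score:.3f}  {doc}")
```

The second document scores higher than the first, but by far less than four times, which is the saturation effect. The third document contains no query term and scores zero.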

## **⭐04: Introduction to RAG Evaluation**

# 🧩 ***Full code***

In [9]:
# Import necessary modules
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.document_loaders import PythonLoader

# Step 1: Load environment variables (expects GOOGLE_API_KEY in .env)
load_dotenv()

# Step 2: Define the path to the Python file
PATH = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\pyfile.py"

# Step 3: Load the Python file as LangChain Documents
loader = PythonLoader(file_path=PATH)
python_data = loader.load()  # Returns a list of Document objects

# Step 4: Initialize Google embedding model for semantic chunking
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Step 5: Create a SemanticChunker instance
semantic_splitter = SemanticChunker(
    embeddings=embeddings,
    breakpoint_threshold_type="gradient",  # Use gradient-based breakpoints
    breakpoint_threshold_amount=0.8        # Threshold for chunk separation
)

# Step 6: Perform semantic chunking on the loaded documents
chunks = semantic_splitter.split_documents(python_data)

# Step 7: Print out all chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk.page_content}\n{'-'*60}")


Chunk 1:
from abc import ABC, abstractmethod

# Abstract base class for all LLMs
class LLM(ABC):
    @abstractmethod
    def complete_sentence(self, prompt):
        pass

# Concrete implementations of LLM
class OpenAI(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ...
------------------------------------------------------------
Chunk 2:
OpenAI end of sentence."

class Anthropic(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ...
------------------------------------------------------------
Chunk 3:
Anthropic end of sentence."

class GooglePaLM(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ...
------------------------------------------------------------
Chunk 4:
Google PaLM end of sentence."

class Cohere(LLM):
    def complete_sentence(self, prompt):
        return prompt + " ...
------------------------------------------------------------
Chunk 5:
Cohere end of sentence."

class Mistral(LLM):
    def complet