# **🔷🔷Improving the RAG Architecture🔷🔷**

Discover state-of-the-art techniques for loading, splitting, and retrieving documents, including loading Python files, splitting semantically, and using MRR and self-query retrieval methods. Learn to evaluate your RAG architecture using robust metrics and frameworks.

## **⭐01: Loading and Splitting Code Files**

![img_1](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0501.jpeg)

Loading source files as documents lets you integrate codebases into RAG systems for tasks such as code `summarization`, `documentation generation`, or `code assistance`.

### **⭕Loading Markdown Files**

In [None]:
from langchain_community.document_loaders import UnstructuredMarkdownLoader

PATH = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\README.md"

loader = UnstructuredMarkdownLoader(file_path=PATH)
markdown_content = loader.load()

print(markdown_content[0].page_content)  # Print the content of the first document
print(markdown_content[0].metadata)      # Print the metadata of the first document

### **⭕Loading Python Files**

In [None]:
from langchain_community.document_loaders import PythonLoader

PATH = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\pyfile.py"

loader = PythonLoader(file_path=PATH)
python_data = loader.load()

print(python_data[0])

### **⭕Splitting Code Files**

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

python_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,
    chunk_overlap=10
    )

chunks = python_splitter.split_documents(python_data)
for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i+1}:\n{chunk.page_content}\n")

### **⭕Language-Specific Splitting**

- Instead of naive splitting, LangChain can split code using language-aware separators like:

  - `\nclass`, `\ndef` , `\n\tdef` 

- This helps keep each chunk a logical code unit, such as an entire function or class, rather than an arbitrary run of lines (provided the unit fits within `chunk_size`).

- Especially beneficial for code analysis or generation, as it maintains semantic structure.
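The separator-priority idea can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: the function name `split_by_priority` and the sample source are hypothetical, and the real splitter also merges small pieces and applies `chunk_overlap`, which this sketch leaves out.

```python
# Simplified sketch of separator-priority splitting (illustrative only).
# Separators are tried from coarsest (class/def boundaries) to finest (spaces).
PYTHON_SEPARATORS = ["\nclass ", "\ndef ", "\n\tdef ", "\n\n", "\n", " "]


def split_by_priority(text, chunk_size, separators=PYTHON_SEPARATORS):
    """Split text into pieces of at most chunk_size characters where possible."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep in text:
            parts = text.split(sep)
            # Re-attach the separator so each piece still starts with "def"/"class".
            pieces = [parts[0]] + [sep + part for part in parts[1:]]
            chunks = []
            for piece in pieces:
                # Oversized pieces are split again with the finer separators only.
                chunks.extend(split_by_priority(piece, chunk_size, separators[i + 1:]))
            return chunks
    # No separator applies: fall back to a hard cut every chunk_size characters.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


sample = (
    "import math\n"
    "\ndef area(r):\n    return math.pi * r ** 2\n"
    "\ndef perimeter(r):\n    return 2 * math.pi * r\n"
)
chunks = split_by_priority(sample, chunk_size=60)
for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk {i}:\n{chunk}")
```

Because `\ndef ` is tried before newlines or spaces, each function in the sample ends up in its own chunk instead of being cut mid-body.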

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=150,
    chunk_overlap=10
)

chunks = python_splitter.split_documents(python_data)

for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i+1}:\n{chunk.page_content}\n")

## **⭐02: Advanced Splitting Methods**

![img_2](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0502.jpeg)

⚠️ **Limitations of Basic Splitting**

- **Lack of Context Awareness:** Simple character-based splitting might break a function or paragraph in unnatural places, reducing model performance.

- **Mismatch with Model Processing:** Since LLMs process tokens, character limits may not align with model capabilities, leading to token overflow or inefficient use of input space.

### **⭕Token-Based Splitting**

- Splits are calculated by token count, which aligns with how LLMs consume input.
- This helps each chunk fit within the model's token limit and avoids truncation.
- Prevents loss of meaning due to mid-token splits.
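To make the windowing concrete, here is a minimal sketch of splitting a token sequence into overlapping windows. It assumes whitespace-separated words as a stand-in for real model tokens, and the helper name `token_windows` is hypothetical; a real tokenizer such as `tiktoken` would produce different token boundaries.

```python
def token_windows(tokens, chunk_size, chunk_overlap):
    """Return windows of at most chunk_size tokens, each sharing
    chunk_overlap tokens with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return windows


text = "Mary had a little lamb, its fleece was white as snow."
tokens = text.split()  # assumption: words stand in for model tokens
windows = token_windows(tokens, chunk_size=5, chunk_overlap=2)
for i, window in enumerate(windows, start=1):
    print(f"Chunk {i} ({len(window)} tokens): {' '.join(window)}")
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so the last two tokens of one chunk reappear at the start of the next.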

In [None]:
import tiktoken
from langchain_text_splitters import TokenTextSplitter

example_string = "Mary had a little lamb, its fleece was white as snow."

# Get encoding for model
encoding = tiktoken.encoding_for_model('gpt-4o-mini')

# Initialize the TokenTextSplitter
splitter = TokenTextSplitter(
    encoding_name=encoding.name,
    chunk_size=10,
    chunk_overlap=2
)

# Split the text into chunks
chunks = splitter.split_text(example_string)

# Count tokens in each chunk and print them
for i, chunk in enumerate(chunks):
    token_count = len(encoding.encode(chunk))
    print(f"Chunk {i+1}:\nNo. tokens: {token_count}\n{chunk}\n")

`cl100k_base` is the tokenizer encoding used for models like:

- gpt-4
- gpt-4-32k
- gpt-3.5-turbo
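- gpt-3.5-turbo-16k

It is also commonly used as a fallback when the installed `tiktoken` does not recognize a model name such as gpt-4o-mini. Recent `tiktoken` releases map the GPT-4o family to `o200k_base` instead.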

In [None]:
import tiktoken
from langchain_text_splitters import TokenTextSplitter
from langchain_core.documents import Document

example_string = "Mary had a little lamb, its fleece was white as snow."

# Get encoding for the model
# Use the 'cl100k_base' encoding for GPT-3.5 and GPT-4 models
encoding = tiktoken.get_encoding("cl100k_base")

# Set up token-based text splitter
token_splitter = TokenTextSplitter(
    encoding_name=encoding.name,
    chunk_size=100,
    chunk_overlap=10
)

# Wrap the string in a Document object and split into chunks
documents = [Document(page_content=example_string)]
chunks = token_splitter.split_documents(documents)

# Display the token count in each chunk
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\nNo. tokens: {len(encoding.encode(chunk.page_content))}\n{chunk.page_content}\n")

### **⭕Semantic Splitting**

- Uses embedding models to understand the content and split based on semantic boundaries (logical breakpoints in meaning).

- Employs gradient thresholding to decide where one idea ends and another begins.

- Produces coherent, context-rich chunks that enhance downstream task accuracy (like answering or summarizing).
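The breakpoint idea can be illustrated without any embedding API. The sketch below is a simplification under stated assumptions: `toy_embed` uses bag-of-words counts as a stand-in for a real embedding model, and the fixed `max_distance` cut-off is hypothetical. `SemanticChunker` instead derives its threshold from the distribution of distances (for example a percentile or a gradient), so treat this as an analogy rather than its algorithm.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Bag-of-words counts: a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def semantic_chunks(sentences, max_distance=0.5):
    """Start a new chunk wherever consecutive sentences are far apart."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, nxt in zip(sentences, sentences[1:]):
        if cosine_distance(toy_embed(prev), toy_embed(nxt)) > max_distance:
            chunks.append(" ".join(current))  # topic shift: close the chunk
            current = []
        current.append(nxt)
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "Python lists store items in order.",
    "Python lists can also store mixed items.",
    "Bread dough needs time to rise.",
    "Bread dough needs warm water to rise.",
]
chunks = semantic_chunks(sentences)
for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk {i}: {chunk}")
```

The two Python sentences share most of their words and stay together, while the jump to the bread sentences has no overlap at all and triggers a split.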

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Instantiate an OpenAI embeddings model
embedding_model = OpenAIEmbeddings(api_key="<OPENAI_API_TOKEN>", model='text-embedding-3-small')

# Create the semantic text splitter with desired parameters
semantic_splitter = SemanticChunker(
    embeddings=embedding_model, breakpoint_threshold_type="gradient", breakpoint_threshold_amount=0.8
)

# Split the documents (`python_data` is the list of Documents loaded with PythonLoader above)
chunks = semantic_splitter.split_documents(python_data)
print(chunks[0])
```

In [None]:
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI  
from langchain_experimental.text_splitter import SemanticChunker

load_dotenv()

# Initialize the Google embedding model used to convert text into high-dimensional vectors
# This model helps in understanding the meaning of text for semantic processing
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Create an instance of SemanticChunker to split text based on semantic changes (meaningful segments)
semantic_splitter = SemanticChunker(
    embeddings=embeddings,                         # Pass the embedding model
    breakpoint_threshold_type="gradient",          # Method to detect split points based on semantic gradient
    breakpoint_threshold_amount=0.8                # Threshold amount (higher = fewer splits); the "gradient" type reads it as a percentile of the distance gradient
)

# Split the input documents into semantically coherent chunks
chunks = semantic_splitter.split_documents(python_data)

print(chunks[0])

## **⭐03: Optimizing Document Retrieval**

![img_3](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0503.jpeg)

### 🔍 **Dense vs. Sparse Retrieval in RAG Pipelines**


![img_4](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0504.jpeg)

When building Retrieval-Augmented Generation (RAG) systems, such as those built with LangChain, you typically choose between **dense** and **sparse** retrieval methods.

- 🧪 **Dense Retrieval**
  - Uses neural networks (e.g., transformers) to encode documents and queries into **dense vectors**: compact numerical representations that capture meaning.
  - Relevance is measured via **vector similarity** (such as cosine similarity or dot product).
  - **Pros vs. Cons:**
     - **✅ Pros:**
       - Captures **semantic meaning**, so it copes well with synonyms, paraphrasing, and abstract queries.
       - Powerful for **open-domain** or fuzzy information retrieval.
     - **⚠️ Cons:**
       - Relies on an embedding model that is **expensive to train** and often needs GPU-backed inference.
       - Harder to **interpret** why a document was retrieved.


- 📚 **Sparse Retrieval**
  - Based on **keyword matching** using traditional IR methods.
  - Works with **bag-of-words** representations: each document becomes a mostly-zero vector of term weights, and word order is ignored.
  - **Common Techniques:**
     - **TF-IDF** (*Term Frequency–Inverse Document Frequency*):
       - Measures how important a word is to a document.
       - If a term appears often in one document but rarely across others, it gets a higher score.
     - **BM25** (*Best Matching 25*):
       - An advanced ranking function in the Okapi family.
       - It refines TF-IDF by adjusting for **term frequency saturation** and **document length**.
  - **Pros vs. Cons:**
     - **✅ Pros:**
       - **Fast**, resource-efficient, and easy to **interpret**.
       - Great for **rare terms** and exact keyword matches.
     - **⚠️ Cons:**
       - Struggles with **synonyms** or **semantic similarity**.
       - Can miss documents that are relevant but use **different wording**.
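The TF-IDF intuition above (frequent in one document, rare across the rest) can be checked with a few lines of standard-library Python. This is a minimal sketch using one common variant of the formula, `tf = count / document length` and `idf = log(N / df)`; libraries often apply additional smoothing, and the three-document corpus is made up for illustration.

```python
import math

# Hypothetical toy corpus, already lowercased so a plain split() is enough.
docs = [
    "python was created by guido van rossum",
    "python is popular for machine learning",
    "pytorch is a popular python library",
]
corpus = [doc.split() for doc in docs]


def tf_idf(term, doc_tokens, corpus):
    """Term frequency in this document times inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)  # documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


for term in ["python", "popular", "guido"]:
    scores = [round(tf_idf(term, tokens, corpus), 3) for tokens in corpus]
    print(f"{term:>8}: {scores}")
```

`python` appears in every document, so its inverse document frequency is `log(3/3) = 0` and it contributes nothing to ranking, while `guido` appears in only one document and scores highest there.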


### 🧠 **TF-IDF vs. BM25: Quick Comparison**

| Feature           | TF-IDF                                      | BM25                                    |
| ----------------- | ------------------------------------------- | --------------------------------------- |
| Scoring Basis     | Term frequency × inverse document frequency | Improved term weighting with saturation |
| Handles Long Docs | ❌ No                                        | ✅ Yes                                   |
| Customizable      | Limited                                     | ✅ Adjustable with `k1` and `b` params   |
| Used In           | Classic search engines, baseline NLP        | Modern IR, LangChain RAG pipelines      |



- 🛠️ **In LangChain Pipelines**
  - **`BM25` is often preferred** over `TF-IDF` because it:
    - Handles **longer documents** better through length normalization.
    - Avoids over-rewarding **repeated keywords**, since the term-frequency contribution saturates.
    - Generally provides more **balanced scoring**.
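Saturation and length normalization are easier to see numerically. The sketch below computes only the term-frequency part of the standard BM25 formula, `f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))`, leaving out the IDF factor. The values `k1=1.5` and `b=0.75` are commonly quoted defaults and are an assumption here, not settings taken from `BM25Retriever`.

```python
def bm25_tf(freq, doc_len, avg_doc_len, k1=1.5, b=0.75):
    """Term-frequency component of BM25 (the IDF factor is omitted)."""
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return freq * (k1 + 1) / (freq + k1 * length_norm)


# Saturation: each extra occurrence adds less, and the value stays below k1 + 1.
for freq in (1, 2, 5, 20):
    print(f"freq={freq:>2} -> {bm25_tf(freq, doc_len=100, avg_doc_len=100):.3f}")

# Length normalization: the same two occurrences count for less in a longer document.
average_doc = bm25_tf(2, doc_len=100, avg_doc_len=100)
long_doc = bm25_tf(2, doc_len=200, avg_doc_len=100)
print(f"average-length doc: {average_doc:.3f}, double-length doc: {long_doc:.3f}")
```

Raising `k1` delays saturation, and setting `b=0` switches length normalization off entirely, which is what the "adjustable" row in the comparison table refers to.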

In [1]:
# --- Required Imports ---
from langchain_community.retrievers import BM25Retriever    # For keyword-based document retrieval
from langchain_core.runnables import RunnablePassthrough    # Passes question directly through in the chain
from langchain_core.prompts import PromptTemplate           # Used to format input to the LLM
from langchain_core.output_parsers import StrOutputParser   # Extracts string outputs from LLM responses
from langchain_google_genai import ChatGoogleGenerativeAI   # Gemini wrapper for LLM inference
from langchain_core.documents import Document               # Structure for text chunks used in retrieval

from dotenv import load_dotenv  # Loads env variables like API keys
load_dotenv()

# --- Step 1: Input Text Chunks for Retrieval ---
chunks = [
    "Python was created by Guido van Rossum and released in 1991.",
    "Python is a popular language for machine learning (ML).",
    "The PyTorch library is a popular Python library for AI and ML.",
    "Python is also used for web development, data analysis, and automation."
]

# --- Step 2: Create BM25 Retriever from Text Chunks ---
bm25_retriever = BM25Retriever.from_texts(chunks, k=3)  # `k` defines how many top results to return

# --- Step 3: Test Simple Keyword-Based Retrieval ---
results = bm25_retriever.invoke("Who created Python?")
print("Most relevant Documents:")
print(results[0].page_content)

# --- Step 4: Convert Raw Strings to LangChain Document Objects ---
documents = [Document(page_content=text) for text in chunks]

# --- Step 5: Create a More Structured Retriever Using Documents ---
retriever = BM25Retriever.from_documents(documents=documents, k=5)  # More flexible for later RAG chains

# --- Step 6: Configure Gemini Flash Language Model ---
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",          # Lightweight but fast Gemini model
    max_output_tokens=100,            # Limit response length
    temperature=0.3                   # Lower temp = more deterministic response
)

# --- Step 7: Define RAG Prompt Template ---
prompt = PromptTemplate.from_template(
    "You are an expert assistant. Using the following context:\n\n{context}\n\nAnswer the question:\n{question}"
)

# --- Step 8: Build Full Retrieval-Augmented Generation Chain ---
chain = (
    {"context": retriever, "question": RunnablePassthrough()}  # Step 1: Retrieve relevant docs
    | prompt                                                   # Step 2: Format input for LLM
    | llm                                                      # Step 3: Call Gemini LLM
    | StrOutputParser()                                        # Step 4: Clean string output
)

# --- Step 9: Run the Chain with a Sample Query ---
question = "How can LLM hallucination impact a RAG application?"
response = chain.invoke(question)
print("\nResponse from Gemini LLM:")
print(response)

Most relevant Documents:
Python was created by Guido van Rossum and released in 1991.

Response from Gemini LLM:
In a RAG (Retrieval Augmented Generation) application, LLM hallucination, the tendency of LLMs to generate incorrect or nonsensical information, can significantly impact its accuracy and reliability in several ways:

1. **Fabricated Information:**  If the LLM hallucinates facts not present in the retrieved documents, the final answer will be wrong. For example, if asked "What libraries does Python use for AI and ML besides PyTorch?", a hallucinating LLM might invent libraries that don't


## **⭐04: Introduction to RAG Evaluation**

![img_5](https://raw.githubusercontent.com/mohd-faizy/GenAI-with-Langchain-and-Huggingface/refs/heads/main/_Developing_LLMs_Applications_with_LangChain/_img/0505.jpeg)

In [3]:
# ============================
# 1: Imports and Setup
# ============================
from langchain_core.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI
from langsmith.evaluation import LangChainStringEvaluator 
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Access your Gemini API key from the environment
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY not found. Make sure it's set in your .env file.")

# ============================
# 2: Input Data
# ============================
query = "What are the main components of RAG architecture?"
predicted_answer = "Training and encoding"
ref_answer = "Retrieval and Generation"

# ============================
# 3: Prompt Template
# ============================
prompt_template = """You are an expert professor specialized in grading students' answers.
You are grading the following question: {query}
Here is the real answer: {answer}
You are grading the following predicted answer: {result}
Respond with CORRECT or INCORRECT:
Grade:"""

prompt = PromptTemplate(
    input_variables=["query", "answer", "result"],
    template=prompt_template
)

# ============================
# 4: LLM Setup
# ============================
eval_llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    max_output_tokens=100,
    temperature=0.3,
    google_api_key=GOOGLE_API_KEY
)

# ============================
# 5: Evaluator Setup
# ============================
qa_evaluator = LangChainStringEvaluator(
    "qa",  # Evaluator type for question answering; other types include "criteria" and "embedding_distance"
    config={ 
        "llm": eval_llm,
        "prompt": prompt,
    }
)


# ============================
# 6: Run Evaluation
# ============================
score = qa_evaluator.evaluator.evaluate_strings(prediction=predicted_answer,
                                                reference=ref_answer,
                                                input=query
                                                ) 

print(f"Score: {score}")

Score: {'reasoning': 'Grade: INCORRECT', 'value': 'INCORRECT', 'score': 0}


### 🔍 What Is the RAGAS Framework?

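**RAGAS** (Retrieval-Augmented Generation Assessment) is a **framework** that evaluates both **retrieval** and **generation quality** in RAG pipelines. It uses an LLM as the judge, so several metrics (such as faithfulness and answer relevancy) can be computed **automatically** without human-annotated answers, while others (such as context precision in the examples further down) compare against a `ground_truth` answer.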

---

### ✅ RAGAS Evaluation Metrics (From the Image)

#### 🧠 Generation Metrics:

1. **Faithfulness**:

   * Measures if the generated answer **truthfully reflects** the context.
   * Formula:

     $$
     \text{Faithfulness} = \frac{\text{No. of claims made that can be inferred from context}}{\text{Total no. of claims}}
     $$
   * Normalized between **0 and 1** (closer to 1 = more faithful).
   * Detects **hallucination** in LLM outputs.

2. **Answer Relevancy**:

   * How relevant is the generated answer to the original **query**?

#### 🔍 Retrieval Metrics:

3. **Context Precision**:

   * Measures the **signal-to-noise** ratio in the retrieved context.
   * High precision = mostly relevant documents.

4. **Context Recall**:

   * Measures how much of the **necessary context** was retrieved.
   * Can the retriever fetch **all relevant** information?
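As a worked example of the ratios above, the sketch below computes the three count-based scores from hand-assigned labels. The labels and helper names are hypothetical: in RAGAS an LLM extracts the claims and judges them, and the library's own implementations may be more involved (for example, by taking the ranking of retrieved chunks into account), so this only illustrates the arithmetic behind the definitions.

```python
def fraction_true(labels):
    """Share of True values in a list of boolean judgments (0.0 if empty)."""
    return sum(labels) / len(labels) if labels else 0.0


def faithfulness_score(claim_supported):
    """Claims in the answer that can be inferred from the context / all claims."""
    return fraction_true(claim_supported)


def context_precision_simple(chunk_relevant):
    """Retrieved chunks that are relevant / all retrieved chunks."""
    return fraction_true(chunk_relevant)


def context_recall_simple(fact_retrieved):
    """Required facts found in the retrieved context / all required facts."""
    return fraction_true(fact_retrieved)


# Hypothetical judgments for one question-answer pair.
claims = [True, True, False]      # the answer makes 3 claims; 1 is not in the context
chunks = [True, True, False]      # 3 chunks retrieved; 1 is off-topic
facts = [True, False]             # 2 facts were needed; only 1 was retrieved

print(f"Faithfulness:      {faithfulness_score(claims):.2f}")
print(f"Context precision: {context_precision_simple(chunks):.2f}")
print(f"Context recall:    {context_recall_simple(facts):.2f}")
```

An answer whose single claim contradicts the context scores 0 on faithfulness regardless of how relevant it sounds, which is why faithfulness and answer relevancy are reported separately.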

---

### ✅ Code Example Using `ragas` in Python

To compute these metrics in code, you typically combine `ragas` with `LangChain` for the LLM and embedding wrappers, and optionally a retrieval backend such as `Haystack` or `FAISS`.


### 💡 Summary of Each Metric:

| Metric                | Type       | Measures                                       | Goal                |
| --------------------- | ---------- | ---------------------------------------------- | ------------------- |
| **Faithfulness**      | Generation | If answer is supported by retrieved context    | Avoid hallucination |
| **Answer Relevancy**  | Generation | If answer is relevant to the original question | Stay on-topic       |
| **Context Precision** | Retrieval  | % of retrieved context that is relevant        | Reduce noise        |
| **Context Recall**    | Retrieval  | % of required info retrieved from corpus       | Maximize coverage   |




In [6]:
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness, context_precision
import os
from dotenv import load_dotenv

# Load API key
load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY not found. Please check your .env file.")

# Initialize Gemini model and embeddings
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    google_api_key=GOOGLE_API_KEY
)


embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=GOOGLE_API_KEY
)

# Faithfulness Evaluation
faithfulness_chain = EvaluatorChain(
    metric=faithfulness,
    llm=llm,
    embeddings=embeddings
)

eval_result = faithfulness_chain({
    "question": "How does the RAG model improve question answering with LLMs?", 
    "answer": "The RAG model improves question answering by combining the retrieval of documents...",
    "contexts": [
        "The RAG model integrates document retrieval with LLMs by first retrieving relevant passages...", 
        "By incorporating retrieval mechanisms, RAG leverages external knowledge sources, allowing the...",
        ]
    })

print("Faithfulness Score:", eval_result['faithfulness'])

# Context Precision Evaluation
context_precision_chain = EvaluatorChain(
    metric=context_precision,
    llm=llm,
    embeddings=embeddings
)

context_precision_result = context_precision_chain({
    "question": "How does the RAG model improve question answering with large language models?",
    "ground_truth": "The RAG model improves question answering by combining the retrieval of...",
    "contexts": [ 
        "The RAG model integrates document retrieval with LLMs by first retrieving...", 
        "By incorporating retrieval mechanisms, RAG leverages external knowledge sources...",
        ]
    })

print("Context Precision Score:", context_precision_result['context_precision'])

Faithfulness Score: 0.0
Context Precision Score: 0.0


In [18]:
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness, context_precision, answer_relevancy
from ragas.metrics._aspect_critic import AspectCritic
import os
from dotenv import load_dotenv

# ─── Load API Key ───────────────────────────────────────────
load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY not found. Please check your .env file.")

# ─── Initialize Gemini Model & Embeddings ───────────────────
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    google_api_key=GOOGLE_API_KEY,
    temperature=0  # deterministic
)
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=GOOGLE_API_KEY
)

# ─── Define Conciseness Metric ──────────────────────────────
conciseness = AspectCritic(
    name="conciseness",
    definition=(
        "Does the submission convey information or ideas clearly and efficiently, "
        "without unnecessary or redundant details?"
    ),
    strictness=3
)

# ─── 1. Faithfulness Evaluation ─────────────────────────────
print("\n--- Faithfulness Evaluation ---")
faithfulness_chain = EvaluatorChain(
    metric=faithfulness,
    llm=llm,
    embeddings=embeddings
)
faithfulness_eval_data = {
    "question": "How does the RAG model improve question answering with LLMs?",
    "answer":   "The RAG model improves question answering by combining the retrieval of documents, which provides external knowledge, with the generation capabilities of large language models. This allows the LLM to provide more accurate and up-to-date answers.",
    "contexts": [
        "The RAG model integrates document retrieval with LLMs by first retrieving relevant passages from a knowledge base based on the user's query.",
        "By incorporating retrieval mechanisms, RAG leverages external knowledge sources, allowing the language model to ground its responses in factual information beyond its training data.",
        "Traditional LLMs might hallucinate or provide outdated information; RAG mitigates this by providing a current and relevant context for generation."
    ]
}
res_faith = faithfulness_chain(faithfulness_eval_data)
print(f"Faithfulness Score: {res_faith['faithfulness']:.4f}")

# ─── 2. Context Precision Evaluation ────────────────────────
print("\n--- Context Precision Evaluation ---")
context_chain = EvaluatorChain(
    metric=context_precision,
    llm=llm,
    embeddings=embeddings
)
context_precision_eval_data = {
    "question": "How does the RAG model improve question answering with large language models?",
    "ground_truth": "The RAG model improves question answering by dynamically retrieving relevant information from a vast knowledge base and then using this information to inform the large language model's response, leading to more accurate and factual answers.",
    "contexts": [
        "The RAG model integrates document retrieval with LLMs by first retrieving relevant passages from a knowledge base based on the user's query.",
        "By incorporating retrieval mechanisms, RAG leverages external knowledge sources, allowing the language model to ground its responses in factual information beyond its training data.",
        "Traditional LLMs might hallucinate or provide outdated information; RAG mitigates this by providing a current and relevant context for generation.",
        "The RAG approach enhances the ability of LLMs to answer complex questions by accessing information that was not part of their initial training set."
    ]
}
res_ctx = context_chain(context_precision_eval_data)
print(f"Context Precision Score: {res_ctx['context_precision']:.4f}")

# ─── 3. Answer Relevance Evaluation ─────────────────────────
print("\n--- Answer Relevance Evaluation ---")
relevancy_chain = EvaluatorChain(
    metric=answer_relevancy,
    llm=llm,
    embeddings=embeddings
)
answer_relevancy_eval_data = {
    "question": "What is the capital of France?",
    "answer":   "Paris is the capital of France, known for its iconic Eiffel Tower and rich history."
}
res_rel = relevancy_chain(answer_relevancy_eval_data)
print(f"Answer Relevance Score: {res_rel['answer_relevancy']:.4f}")

# ─── 4. Conciseness Evaluation ──────────────────────────────
print("\n--- Conciseness Evaluation ---")
conciseness_chain = EvaluatorChain(
    metric=conciseness,
    llm=llm,
    embeddings=embeddings
)
conciseness_eval_data = {
    "question": "Describe the main function of a CPU.",
    "answer":   "The central processing unit, often abbreviated as CPU, is essentially the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control, and input/output (I/O) operations specified by the instructions. It's often called the 'brain' of the computer."
}
res_conc = conciseness_chain(conciseness_eval_data)
print(f"Conciseness Score: {res_conc['conciseness']:.4f}")


--- Faithfulness Evaluation ---
Faithfulness Score: 0.0000

--- Context Precision Evaluation ---
Context Precision Score: 0.0000

--- Answer Relevance Evaluation ---
Answer Relevance Score: 0.8271

--- Conciseness Evaluation ---
Conciseness Score: 1.0000


In [17]:
import os
from dotenv import load_dotenv

from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness, context_precision, answer_relevancy
from ragas.metrics._aspect_critic import AspectCritic

# ─── Helpers ────────────────────────────────────────────────

def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    load_dotenv()
    key = os.getenv(env_var)
    if not key:
        raise EnvironmentError(f"{env_var} not found in .env file.")
    return key

def initialize_models(api_key: str):
    llm = ChatGoogleGenerativeAI(
        model="gemini-1.5-flash",
        google_api_key=api_key,
        temperature=0
    )
    embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
        google_api_key=api_key
    )
    return llm, embeddings

# Optional: define conciseness if you ever need it
def build_conciseness_metric():
    return AspectCritic(
        name="conciseness",
        definition=(
            "Does the submission convey information or ideas clearly and efficiently, "
            "without unnecessary or redundant details?"
        ),
        strictness=3
    )

def run_evaluation(metric, llm, embeddings, inputs: dict, display_name: str, output_key: str):
    chain = EvaluatorChain(metric=metric, llm=llm, embeddings=embeddings)
    result = chain.invoke(inputs)
    score = result.get(output_key)
    print(f"{display_name} Score:", score if score is not None else "No score returned")
    return score

# ─── Task-Specific Evaluations ──────────────────────────────

def evaluate_faithfulness(llm, embeddings):
    inputs = {
        "question": "What are the causes of climate change?",
        "answer":   "Climate change is caused mainly by increased use of electric vehicles.",
        "contexts": [
            "Climate change is primarily driven by greenhouse gas emissions from burning fossil fuels like coal and oil.",
            "Deforestation and industrial pollution are also major contributors to global warming."
        ]
    }
    return run_evaluation(
        metric=faithfulness,
        llm=llm,
        embeddings=embeddings,
        inputs=inputs,
        display_name="Faithfulness",
        output_key="faithfulness"
    )

def evaluate_context_precision(llm, embeddings):
    inputs = {
        "question": "What are the causes of climate change?",
        "ground_truth": "Climate change is caused by greenhouse gas emissions, deforestation, and industrial activity.",
        "contexts": [
            "Greenhouse gases from fossil fuels trap heat in the atmosphere, causing the planet to warm.",
            "Deforestation reduces the Earth's capacity to absorb CO2, contributing to global warming.",
            "Eating healthy foods can prevent heart disease and diabetes."
        ]
    }
    return run_evaluation(
        metric=context_precision,
        llm=llm,
        embeddings=embeddings,
        inputs=inputs,
        display_name="Context Precision",
        output_key="context_precision"
    )

def evaluate_answer_relevancy(llm, embeddings):
    inputs = {
        "question": "What is the capital of France?",
        "answer":   "Paris is the capital of France, known for its iconic Eiffel Tower and rich history."
    }
    return run_evaluation(
        metric=answer_relevancy,
        llm=llm,
        embeddings=embeddings,
        inputs=inputs,
        display_name="Answer Relevance",
        output_key="answer_relevancy"
    )

def evaluate_conciseness(llm, embeddings):
    conciseness = build_conciseness_metric()
    inputs = {
        "question": "Describe the main function of a CPU.",
        "answer":   (
            "The central processing unit, often abbreviated as CPU, is essentially "
            "the electronic circuitry within a computer that carries out the instructions "
            "of a computer program by performing the basic arithmetic, logical, control, "
            "and input/output (I/O) operations specified by the instructions. It's often "
            "called the 'brain' of the computer."
        )
    }
    return run_evaluation(
        metric=conciseness,
        llm=llm,
        embeddings=embeddings,
        inputs=inputs,
        display_name="Conciseness",
        output_key="conciseness"
    )

# ─── Main Entry Point ───────────────────────────────────────

def main():
    api_key = load_api_key()
    llm, embeddings = initialize_models(api_key)

    evaluate_faithfulness(llm, embeddings)
    evaluate_context_precision(llm, embeddings)
    evaluate_answer_relevancy(llm, embeddings)
    evaluate_conciseness(llm, embeddings)

if __name__ == "__main__":
    main()

Faithfulness Score: 0.0
Context Precision Score: 0.0
Answer Relevance Score: 0.8305495956386126
Conciseness Score: 1


# 🧩 ***Other Code***

In [None]:
# Import necessary modules
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.document_loaders import PythonLoader
from langchain_core.documents import Document

# Step 1: Load environment variables (expects GOOGLE_API_KEY in .env)
load_dotenv()

# Step 2: Define the path to the Python file
PATH = r"E:\01_Github_Repo\GenAI-with-Langchain-and-Huggingface\_Developing_LLMs_Applications_with_LangChain\_data\pyfile.py"

# Step 3: Load the Python file as LangChain Documents
loader = PythonLoader(file_path=PATH)
python_data = loader.load()  # Returns a list of Document objects

# Step 4: Initialize Google embedding model for semantic chunking
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Step 5: Create a SemanticChunker instance
semantic_splitter = SemanticChunker(
    embeddings=embeddings,
    breakpoint_threshold_type="gradient",  # Use gradient-based breakpoints
    breakpoint_threshold_amount=0.8        # Threshold for chunk separation
)

# Step 6: Perform semantic chunking on the loaded documents
chunks = semantic_splitter.split_documents(python_data)

# Step 7: Print out all chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk.page_content}\n{'-'*60}")
