### Testing RAG Applications - (Advanced ⚡️) 📑

#### RAG Application
This application loads an article about the Model Context Protocol (MCP) from the web, splits it into chunks, embeds each chunk, and stores the embeddings in a vector store. At inference time it retrieves the most relevant chunks and passes them to the LLM as context for answering questions about MCP.

<img src="./img/RAG.png" width="500" height="400" style="display: block; margin: auto;">
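
The retrieval step in the flow above can be illustrated without any external services. The following is a minimal sketch, not the notebook's implementation: it substitutes a toy bag-of-words vector for the Ollama embeddings and an in-memory list for Chroma. The helper names (`embed`, `cosine`, `retrieve`) and the three sample chunks are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical chunks standing in for the split web page.
chunks = [
    "MCP is a protocol that connects LLMs to external tools and data sources.",
    "Chroma is a vector store that keeps embeddings for similarity search.",
    "Function calling lets a model invoke predefined functions.",
]

# "Vector store": each chunk is kept next to its embedding.
store = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question: str, k: int = 1) -> List[str]:
    """Return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# The retrieved text is what would be placed in the prompt's context slot.
context = "\n\n".join(retrieve("What is MCP protocol?", k=1))
print(context)
```

A real embedding model captures meaning rather than word overlap, but the ranking logic (embed the question, compare against stored vectors, keep the top k) has the same shape.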

In [1]:
#!pip install -qU langchain-chroma

In [2]:
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from typing import List
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.document import Document
from langchain_ollama import ChatOllama

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [3]:
llm = ChatOllama(
    base_url="http://localhost:11434",
    model = "qwen2.5:latest",
    temperature=0.5,
    num_predict=250,  # ChatOllama's name for the cap on generated tokens
)

In [4]:
# Load data from Web
loader = WebBaseLoader("https://www.descope.com/learn/post/mcp")
data = loader.load()

# Split text into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

# Add text to vector db
embedding = OllamaEmbeddings(model="llama3.2:latest")
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

# Create a retriever
retriever = vectordb.as_retriever()

def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])


template = """Answer the question based only on the following context:

    {context}
    
    Give a summary not the full detail

    Question: {question}
    """
prompt = ChatPromptTemplate.from_template(template)


def retrieve_and_format(question):
    # invoke() replaces the deprecated get_relevant_documents()
    docs = retriever.invoke(question)
    return format_docs(docs)

chain = {"context": retrieve_and_format, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser()
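
The splitter in this cell is configured with `chunk_size=500` and `chunk_overlap=0`, so text that straddles a chunk boundary is cut in two and neither half is repeated in the neighbouring chunk. The sketch below shows the effect of the overlap setting with a simplified fixed-width splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs and spaces), and `split_fixed` is a hypothetical helper.

```python
from typing import List

def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghijklmnopqrst"  # 20 characters

# No overlap: chunks tile the text, so boundary content appears exactly once.
no_overlap = split_fixed(text, chunk_size=8)

# Overlap of 3: each chunk starts with the last 3 characters of the previous one.
with_overlap = split_fixed(text, chunk_size=8, chunk_overlap=3)

print(no_overlap)
print(with_overlap)
```

A non-zero overlap costs some extra storage but gives each chunk a little of its neighbour's context, which can help when a retrieved chunk starts mid-sentence.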


#### Output of the LLM Application

In [6]:
response = chain.invoke("What is MCP")

print(response)

MCP, or Model Context Protocol, is a protocol designed to enable AI assistants to interact with various external APIs and platforms. It supports actions like retrieving channel history from messaging apps and performing Git operations on GitHub. MCP servers, which include reference, official integrations, and community servers, demonstrate how different systems can integrate with this protocol to enhance their functionality with AI assistant capabilities.


### Testing RAG Application with DeepEval
<img src="./img/RAGTesting.png" width="800" height="400" style="display: block; margin: auto;">

In [14]:
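import os

import deepeval

# Read the Confident AI key from the environment rather than hardcoding a secret in the notebook.
# The variable name CONFIDENT_API_KEY is an assumption; use whatever name you export the key under.
deepeval.login_with_confident_api_key(os.environ["CONFIDENT_API_KEY"])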

In [15]:
!deepeval set-ollama deepseek-r1:8b

🙌 Congratulations! You're now using a local Ollama model for all evals that 
require an LLM.


In [17]:
test_data = [
    {
        "input": "What is MCP",
        "expected_output": "The Model Context Protocol (MCP) addresses this challenge by providing a standardized way for LLMs to connect with external data sources and tools—essentially a “universal remote” for AI apps. Released by Anthropic as an open-source protocol, MCP builds on existing function calling by eliminating the need for custom integration between LLMs and other apps."
    },
    {
        "input": "What is Relationship between function calling & Model Context Protocol",
        "expected_output": "The Model Context Protocol (MCP) builds on top of function calling, a well-established feature that allows large language models (LLMs) to invoke predetermined functions based on user requests. MCP simplifies and standardizes the development process by connecting AI applications to context while leveraging function calling to make API interactions more consistent across different applications and model vendors."
    },
    {
        "input": "What are the core components of MCP, just give the heading",
        "expected_output":""" 
                    - MCP Client
                    - MCP Servers
                    - Protocol Handshake
                    - Capability Discovery
                """
    }
]

### Creating Goldens

In [18]:
from deepeval.dataset import Golden, EvaluationDataset

goldens = []

for data in test_data:
    golden = Golden(
        input=data['input'],
        expected_output=data['expected_output']
    )
    
    goldens.append(golden)
    

dataset = EvaluationDataset(goldens=goldens)

In [20]:
dataset.push("test")

In [35]:
dataset

EvaluationDataset(test_cases=[], goldens=[Golden(input='What is MCP', actual_output=None, expected_output='The Model Context Protocol (MCP) addresses this challenge by providing a standardized way for LLMs to connect with external data sources and tools—essentially a “universal remote” for AI apps. Released by Anthropic as an open-source protocol, MCP builds on existing function calling by eliminating the need for custom integration between LLMs and other apps.', context=None, retrieval_context=None, additional_metadata=None, comments=None, tools_called=None, expected_tools=None, source_file=None), Golden(input='What is Relationship between function calling & Model Context Protocol', actual_output=None, expected_output='The Model Context Protocol (MCP) builds on top of function calling, a well-established feature that allows large language models (LLMs) to invoke predetermined functions based on user requests. MCP simplifies and standardizes the development process by connecting AI app

In [36]:
dataset.pull(alias="test")

In [37]:
dataset

EvaluationDataset(test_cases=[LLMTestCase(input='What is MCP', actual_output=None, expected_output='The Model Context Protocol (MCP) addresses this challenge by providing a standardized way for LLMs to connect with external data sources and tools—essentially a “universal remote” for AI apps. Released by Anthropic as an open-source protocol, MCP builds on existing function calling by eliminating the need for custom integration between LLMs and other apps.', context=None, retrieval_context=None, additional_metadata=None, comments=None, tools_called=None, expected_tools=None, reasoning=None, name=None), LLMTestCase(input='What is Relationship between function calling & Model Context Protocol', actual_output=None, expected_output='The Model Context Protocol (MCP) builds on top of function calling, a well-established feature that allows large language models (LLMs) to invoke predetermined functions based on user requests. MCP simplifies and standardizes the development process by connecting

In [None]:
from langchain.chains import RetrievalQA

# Build a RetrievalQA chain: the LLM answers using chunks fetched from the vector store (RAG)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

In [22]:
# Calling the chain object directly is deprecated; invoke() returns a dict with "query" and "result"
response = qa_chain.invoke({"query": "What is MCP"})


In [None]:
# Inspect the chunks the retriever returns from the vector DB for this question
retrieved_document = retrieve_and_format("What is MCP")
print(retrieved_document)

reactions, retrieve channel history, and more. While straightforward, this underscores the potential for MCP to retrieve information from a wide variety of sources, including popular messaging apps.GitHub: Provides support for a wide variety of actions, including creating forks or branches, listing issues, making pull requests, and even searching for code across GItHub repositories. The GitHub MCP server serves as a benchmark for how AI assistants can interact with external APIs.Official MCP

LangChain adapters, and platforms like Superinterface, which helps developers build in-app AI assistants with MCP functionality.Examples of MCP serversThe MCP ecosystem comprises a diverse range of servers including reference servers (created by the protocol maintainers as implementation examples), official integrations (maintained by companies for their platforms), and community servers (developed by independent contributors).Reference serversReference servers demonstrate core MCP

What Is the Mo

In [25]:
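def query_with_context(question):
    # Returns (retrieved context, generated answer), in that order
    retrieved_document = retrieve_and_format(question)
    # invoke() replaces the deprecated run(); the answer is under the "result" key
    response = qa_chain.invoke({"query": question})["result"]
    return retrieved_document, response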

In [26]:
# query_with_context returns the context first, then the answer
context, actual = query_with_context("What is MCP")

context, actual



("reactions, retrieve channel history, and more. While straightforward, this underscores the potential for MCP to retrieve information from a wide variety of sources, including popular messaging apps.GitHub: Provides support for a wide variety of actions, including creating forks or branches, listing issues, making pull requests, and even searching for code across GItHub repositories. The GitHub MCP server serves as a benchmark for how AI assistants can interact with external APIs.Official MCP\n\nLangChain adapters, and platforms like Superinterface, which helps developers build in-app AI assistants with MCP functionality.Examples of MCP serversThe MCP ecosystem comprises a diverse range of servers including reference servers (created by the protocol maintainers as implementation examples), official integrations (maintained by companies for their platforms), and community servers (developed by independent contributors).Reference serversReference servers demonstrate core MCP\n\nWhat Is 

### Creating LLMTestCase with Goldens

In [38]:
from deepeval.dataset import Golden
from deepeval.test_case import LLMTestCase
from typing import List


def convert_goldens_to_test_cases(goldens: List[Golden]) -> List[LLMTestCase]:
    test_cases = []
    for golden in goldens:
        context, rag_response = query_with_context(golden.input)
        test_case = LLMTestCase(
            input=golden.input,
            actual_output=rag_response,
            expected_output=golden.expected_output,
            retrieval_context=[context],
        )
        test_cases.append(test_case)
    return test_cases

data = convert_goldens_to_test_cases(dataset)
        

In [39]:
data

[LLMTestCase(input='What is MCP', actual_output="MCP stands for Model Context Protocol. It's a protocol designed to enable AI assistants to interact with external APIs and retrieve information from various sources, including popular messaging apps and GitHub repositories. The protocol supports a wide range of actions such as creating forks or branches, listing issues, making pull requests, and searching for code across GitHub repositories. MCP servers can be reference, official integrations by companies, or community-developed, demonstrating its versatility within the AI assistant ecosystem.", expected_output='The Model Context Protocol (MCP) addresses this challenge by providing a standardized way for LLMs to connect with external data sources and tools—essentially a “universal remote” for AI apps. Released by Anthropic as an open-source protocol, MCP builds on existing function calling by eliminating the need for custom integration between LLMs and other apps.', context=None, retriev

In [40]:
import deepeval.metrics


deepeval.evaluate(
    data, 
    metrics= [
        deepeval.metrics.AnswerRelevancyMetric(),
        deepeval.metrics.FaithfulnessMetric(),
        deepeval.metrics.ContextualPrecisionMetric(),
        deepeval.metrics.ContextualRelevancyMetric()
    ]
)

Evaluating 3 test case(s) in parallel: |██████████|100% (3/3) [Time Taken: 02:33, 51.11s/test case] 




Metrics Summary

  - ✅ Answer Relevancy (score: 1.0, threshold: 0.5, strict: False, evaluation model: deepseek-r1:8b (Ollama), reason: The score is 1.00 because the answer directly addresses the question by listing the core components with appropriate headings., error: None)
  - ✅ Faithfulness (score: 1.0, threshold: 0.5, strict: False, evaluation model: deepseek-r1:8b (Ollama), reason: The score is 1.00 because there are no contradictions in the context, meaning the actual output aligns perfectly with the retrieval context., error: None)
  - ✅ Contextual Precision (score: 0.9166666666666666, threshold: 0.5, strict: False, evaluation model: deepseek-r1:8b (Ollama), reason: The score is 0.92 because the retrieval contexts provided include four relevant nodes that directly address the core components of MCP with clear explanations, while one node does not contribute to the topic. Although all 'yes' verdicts offer valuable information, their reasons are slightly less detailed in some ca
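
For reference, the DeepEval documentation describes Contextual Precision as a rank-weighted average: for every position k that holds a relevant node, take the precision of the top k nodes, then average those values over the relevant nodes. The sketch below is one reading of that formula, not DeepEval's code. `contextual_precision` is a hypothetical helper, and the relevance lists are made-up illustrations rather than the verdicts from this run.

```python
from typing import List

def contextual_precision(relevance: List[bool]) -> float:
    """Rank-weighted precision over retrieved nodes, ordered from rank 1 downward."""
    relevant_seen = 0
    weighted_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            relevant_seen += 1
            # Precision of the top-k nodes, counted only at relevant positions.
            weighted_sum += relevant_seen / k
    return weighted_sum / relevant_seen if relevant_seen else 0.0

# All relevant nodes ranked ahead of the irrelevant one.
perfect = contextual_precision([True, True, True, False])

# An irrelevant node ranked above a relevant one lowers the score.
late_hit = contextual_precision([True, True, False, True])

print(perfect, late_hit)
```

Because the weight depends on rank, the metric rewards a retriever that places useful chunks first, not merely one that includes them somewhere in the results.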



EvaluationResult(test_results=[TestResult(name='test_case_2', success=False, metrics_data=[MetricData(name='Answer Relevancy', threshold=0.5, success=True, score=1.0, reason='The score is 1.00 because the answer directly addresses the question by listing the core components with appropriate headings.', strict_mode=False, evaluation_model='deepseek-r1:8b (Ollama)', error=None, evaluation_cost=0.0, verbose_logs='Statements:\n[\n    "The MCP ecosystem includes both client and server components."\n] \n \nVerdicts:\n[\n    {\n        "verdict": "yes",\n        "reason": null\n    }\n]'), MetricData(name='Faithfulness', threshold=0.5, success=True, score=1.0, reason='The score is 1.00 because there are no contradictions in the context, meaning the actual output aligns perfectly with the retrieval context.', strict_mode=False, evaluation_model='deepseek-r1:8b (Ollama)', error=None, evaluation_cost=0.0, verbose_logs='Truths (limit=None):\n[\n    "MCP connects AI apps to context while building 
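
Each `MetricData` entry in the result carries a `score`, a `threshold` and a `success` flag, and the enclosing `TestResult` has a `success` flag of its own. The following is a minimal sketch of how these can relate, assuming a metric passes when its score reaches the threshold and a test case passes only when every metric passes. `MetricOutcome` and `case_passes` are hypothetical names, and the scores are invented for illustration, not taken from this run.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MetricOutcome:
    name: str
    score: float
    threshold: float = 0.5  # same default threshold as shown in the summary

    @property
    def success(self) -> bool:
        # Assumption: a metric passes when the score meets or exceeds the threshold.
        return self.score >= self.threshold

def case_passes(metrics: List[MetricOutcome]) -> bool:
    """Assumption: a test case succeeds only if all of its metrics succeed."""
    return all(metric.success for metric in metrics)

# Invented scores: two metrics pass, one falls below the threshold.
outcomes = [
    MetricOutcome("metric_a", 1.0),
    MetricOutcome("metric_b", 0.92),
    MetricOutcome("metric_c", 0.4),
]

failing = [metric.name for metric in outcomes if not metric.success]
print(case_passes(outcomes), failing)
```

Under these assumptions a single metric below its threshold is enough to mark the whole test case as failed, even when the other metrics score 1.0.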