# Ragas Evaluation for Ever Quint RAG System

This notebook evaluates the RAG system with four Ragas metrics: **Context Precision**, **Context Recall**, **Faithfulness**, and **Answer Relevancy**.
The "Ever Quint" content serves as the ground-truth context.

In [15]:
from ragas.run_config import RunConfig
import sys
import os
import pandas as pd

# Ensure backend modules can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../backend')))

from backend.backend.rag_search import initialize_system, answer_query, create_llm, hybrid_retrieve
from datasets import Dataset 
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall
)
# LangChain integrations used for the judge LLM and the evaluation embeddings.
# Note: recent Ragas versions wrap LangChain LLMs automatically when they are passed directly.
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings

## 1. Initialize RAG System and Load Documents
We initialize the same system used in the app, which loads the `about_everquint.txt` file saved earlier. If the vector store already contains documents, ingestion is skipped, as the log below shows.

In [16]:
vector_ret, wiki_ret, prompt, summary_prompt = initialize_system()
rag_llm = create_llm()

[INFO] Initializing system...
[INFO] Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2
[INFO] Vector store loaded with 4800 existing documents. Skipping ingestion.


## 2. Define Test Set (Questions & Ground Truths)
We define a set of questions relevant to the Ever Quint documents to test the system.

In [17]:
test_questions = [
    "What is the People app designed for at Perkins & Will?",
    "What are the key roles needed for the Senior React Developer position?",
    "Where is Ever Quint Technologies located?",
    "How does the Sales Pipeline App help with project management?",
    "What technologies are required for the MERN Stack Developer role?"
]

ground_truths = [
    ["The People app is designed to capture all essential details about an individual within a company, integrating employee data with their projects and pursuits to streamline management and improve efficiency."],
    ["The Senior React Developer role requires developing and implementing UI components, optimizing performance, managing application state, mentoring the team, ensuring code quality, and collaborating with cross-functional teams."],
    ["Ever Quint Technologies is located at Suite #302, Workafella High Street, 431, Anna Salai, Teynampet, Chennai, Tamil Nadu, India – 600018."],
    ["The Sales Pipeline App simplifies lead tracking, streamlines pipeline management, and enhances deal closure by integrating data from Excel, PowerBI, and Deltek. It offers filters for smart searching and real-time revenue tracking."],
    ["The MERN Stack Developer role requires MongoDB, Express.js, React.js, and Node.js, along with database querying (SQL, Redis, Elastic Search) and message queueing with Kafka."]
]

data_samples = {
    'question': [],
    'answer': [],
    'contexts': [],
    'ground_truth': []
}

# Run Inference
print("Running RAG Inference...")
for i, q in enumerate(test_questions):
    # Retrieve
    combined_text, docs = hybrid_retrieve(vector_ret, wiki_ret, q)
    # Ragas expects contexts as a list of strings
    contexts_list = [d.page_content for d in docs]
    
    # Generate Answer
    ans = answer_query(rag_llm, prompt, q, combined_text)
    
    # Store
    data_samples['question'].append(q)
    data_samples['answer'].append(ans)
    data_samples['contexts'].append(contexts_list)
    # Recent Ragas versions expect 'ground_truth' as one string per sample
    # (older versions used a list of strings), so unwrap the one-element list.
    data_samples['ground_truth'].append(ground_truths[i][0])

Running RAG Inference...


[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
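
The inference loop above pairs questions and ground truths by index, so a missing or extra ground truth would either raise an `IndexError` or go unnoticed. A minimal sketch of the same loop with an explicit length check and `zip` is shown here. The names `build_samples`, `fake_retrieve`, and `fake_generate` are hypothetical and not part of the app; the two stubs only stand in for `hybrid_retrieve` and `answer_query` so the block runs on its own.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def build_samples(
    questions: Sequence[str],
    references: Sequence[str],
    retrieve: Callable[[str], Tuple[str, List[str]]],
    generate: Callable[[str, str], str],
) -> Dict[str, list]:
    """Run retrieval and generation per question and collect Ragas-style columns."""
    # Fail early if the test set is misaligned
    if len(questions) != len(references):
        raise ValueError(
            f"{len(questions)} questions but {len(references)} references"
        )
    samples: Dict[str, list] = {
        "question": [], "answer": [], "contexts": [], "ground_truth": [],
    }
    for question, reference in zip(questions, references):
        combined_text, contexts = retrieve(question)
        samples["question"].append(question)
        samples["answer"].append(generate(question, combined_text))
        samples["contexts"].append(list(contexts))  # list of strings per sample
        samples["ground_truth"].append(reference)   # one string per sample
    return samples


# Hypothetical stand-ins for the real retriever and generator
def fake_retrieve(question: str) -> Tuple[str, List[str]]:
    chunks = [f"chunk about: {question}"]
    return "\n".join(chunks), chunks


def fake_generate(question: str, context: str) -> str:
    return f"answer based on {len(context)} characters of context"


demo = build_samples(
    ["Where is the office?"], ["Chennai"], fake_retrieve, fake_generate
)
print(demo["contexts"])
```

In the notebook, the two callables would be thin wrappers: one that calls `hybrid_retrieve(vector_ret, wiki_ret, q)` and converts the returned documents to their `page_content` strings, and one that calls `answer_query(rag_llm, prompt, q, combined_text)`.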


## 3. Configure Ragas with Groq and HuggingFace
We use Groq (Llama 3.3 70B) as the judge LLM and local HuggingFace embeddings for embedding-based metrics such as Answer Relevancy.

In [18]:
# Create the judge LLM with the app's create_llm helper.
# Ragas works with LangChain LLM and embedding interfaces.
evaluator_llm = create_llm(model_name="llama-3.3-70b-versatile")

evaluator_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# Create Dataset
rag_dataset = Dataset.from_dict(data_samples)


[INFO] Use pytorch device_name: mps
[INFO] Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2


## 4. Run Evaluation
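We score the dataset on all four metrics. `RunConfig(max_workers=1, timeout=120)` limits evaluation to one worker and sets a 120-second timeout per operation.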

In [None]:
results = evaluate(
    rag_dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
    llm=evaluator_llm,
    embeddings=evaluator_embeddings,
    run_config=RunConfig(max_workers=1, timeout=120)
)

print("Evaluation Complete!")

Evaluating:   0%|          | 0/20 [00:00<?, ?it/s][INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
Evaluating:   5%|▌         | 1/20 [00:17<05:32, 17.48s/it][INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
Evaluating:  10%|█         | 2/20 [00:23<03:08, 10.45s/it][INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 400 Bad Reques

KeyboardInterrupt: 

[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 400 Bad Request"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 400 Bad Request"
[INFO] HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 400 Bad Request"
[ERROR] Exception raised in Job[6]: TimeoutError()
[ERROR] Exception raised in Job[7]: AssertionError(set LLM before use)
[ERROR] Exception raised in Job[8]: AssertionError(LLM is not set)
[ERROR] Exception raised in Job[9]: AssertionError(LLM is not set)
[ERROR] Exception raised in Job[10]: AssertionError(LLM is not set)
[ERROR] Exception raised in Job[11]: AssertionError(set LLM before use)
[ERROR] Exception raised in Job[12]: AssertionError(LLM is not set)
[ERROR] Exception raised in Job[13]: AssertionError(LLM is not set)
[ERROR] Exception raised in Job[14]: AssertionError(LLM is not set)
[ERROR] Exception raised in Job[15]: AssertionError(set LLM before use)
[ERROR] Exception raised 

In [None]:
df = results.to_pandas()
df.head()

| | user_input | retrieved_contexts | response | reference | context_precision | faithfulness | answer_relevancy | context_recall |
|---|---|---|---|---|---|---|---|---|
| 0 | What is the People app designed for at Perkins... | [The People app is designed to capture all the... | The People app is designed to capture all the ... | The People app is designed to capture all esse... | 1.0 | NaN | 0.952722 | NaN |
| 1 | What are the key roles needed for the Senior R... | [Roles and Responsibilities\nDeveloping and Im... | The key roles needed for the Senior React Deve... | The Senior React Developer role requires devel... | 1.0 | NaN | 0.929983 | NaN |
| 2 | Where is Ever Quint Technologies located? | [Logo\nEver Quint Technologies Private Limited... | Ever Quint Technologies Private Limited is loc... | Ever Quint Technologies is located at Suite #3... | NaN | 1.0 | 0.94697 | 1.0 |
| 3 | How does the Sales Pipeline App help with proj... | [Drive your sales team to success with Sales P... | The Sales Pipeline App helps with project mana... | The Sales Pipeline App simplifies lead trackin... | NaN | 1.0 | 0.837696 | NaN |
| 4 | What technologies are required for the MERN St... | [Identifying new product opportunities and ana... | The MERN Stack Developer role requires the fol... | The MERN Stack Developer role requires MongoDB... | 1.0 | 1.0 | 0.955356 | 1.0 |


In [None]:
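# Display average scores. Each average covers only the rows that were scored;
# missing (NaN) entries from failed jobs are skipped.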
print(results)
df.to_csv("ragas_results.csv", index=False)

{'context_precision': 1.0000, 'faithfulness': 1.0000, 'answer_relevancy': 0.9245, 'context_recall': 1.0000}
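
The averages above are computed only over the rows that received a score, so a value of 1.0000 can rest on as few as two of the five questions. A minimal sketch of a coverage-aware summary follows, using only the standard library. The per-question values are copied from the results table in this notebook, with missing entries written as NaN; the helper name `summarize` is hypothetical.

```python
import math

nan = float("nan")

# Per-question scores copied from the results table (NaN = no score recorded)
scores = {
    "context_precision": [1.0, 1.0, nan, nan, 1.0],
    "faithfulness": [nan, nan, 1.0, 1.0, 1.0],
    "answer_relevancy": [0.952722, 0.929983, 0.94697, 0.837696, 0.955356],
    "context_recall": [nan, nan, 1.0, nan, 1.0],
}


def summarize(values):
    """Return (mean over scored rows, scored row count, total row count)."""
    scored = [v for v in values if not math.isnan(v)]
    mean = sum(scored) / len(scored) if scored else nan
    return mean, len(scored), len(values)


summary = {name: summarize(vals) for name, vals in scores.items()}
for name, (mean, n_scored, n_total) in summary.items():
    print(f"{name}: mean={mean:.4f} over {n_scored}/{n_total} questions")
```

Reporting the scored count next to each mean makes it clear that only Answer Relevancy was measured on all five questions; the other three averages should be read as partial results until the failed jobs are re-run.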
