# 🚀 Comprehensive RAG Pipeline Evaluation

This notebook evaluates our RAG pipeline end to end, combining manual quality scores with automated RAGAs metrics.

**Workflow:**
1.  Loads the `BAAI/bge-base-en-v1.5` embedding model and the `cross-encoder/ms-marco-MiniLM-L-6-v2` re-ranker.
2.  Loops through the evaluation questions listed in a CSV file.
3.  For each question, it generates an answer and displays it with its sources and a sentiment summary of those sources.
4.  It then prompts for a **manual quality score (1-5)** and **qualitative comments**.
5.  It saves all results, including the manual scores, to a CSV file.
6.  Finally, it runs an automated RAGAs evaluation on the saved results and merges those metric scores with the manual ones.
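Step 4's score prompt accepts only whole numbers from 1 to 5. Pulling that rule out of the `input()` loop makes it testable without a keyboard. `parse_quality_score` is a hypothetical helper, not part of the project's `src` code; a minimal sketch:

```python
from typing import Optional


def parse_quality_score(raw: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the score as an int if `raw` is a whole number in [low, high], else None."""
    try:
        score = int(raw.strip())
    except ValueError:
        # Non-numeric input such as "four" or "3.5"
        return None
    return score if low <= score <= high else None


# Simulate a reviewer's attempts and keep the first valid one
attempts = ["abc", "7", " 4 "]
valid = [s for s in (parse_quality_score(a) for a in attempts) if s is not None]
print(valid[0])  # 4
```

Inside the notebook loop, `quality_score = parse_quality_score(input(...))` would then repeat until the result is not `None`.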

In [9]:
import sys
import os

# Go two levels up from the notebook to the project root
project_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))

# Join the path to 'src'
src_path = os.path.join(project_root, "src")

# Add 'src' to Python path
if src_path not in sys.path:
    sys.path.append(src_path)

# Confirm it's added
print("src path added:", src_path)

src path added: c:\Users\sumey\Desktop\10Acadamy\week_6\Intelligent-Complaint-Analysis-for-Financial-Services\src

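Later cells hard-code absolute `C:\Users\...` paths, which only work on one machine. Every location can instead be derived from the notebook's directory. A minimal sketch with `pathlib`, assuming the notebook sits two levels below the project root (the same `../..` used above); `project_paths` is a hypothetical helper and the folder names mirror the paths used in this notebook:

```python
from pathlib import Path


def project_paths(notebook_dir: Path, levels_up: int = 2) -> dict:
    """Derive project-relative paths from the notebook's directory."""
    root = notebook_dir.resolve()
    for _ in range(levels_up):
        root = root.parent  # climb one directory per level
    return {
        "root": root,
        "src": root / "src",
        "vector_store": root / "data" / "vector_store11",
        "evaluation": root / "evaluation",
    }


paths = project_paths(Path.cwd())
print(paths["src"])
```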

In [10]:
import yaml
import pandas as pd
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder
from transformers import pipeline
from typing import Dict, List, Tuple
from RAG_pipeline_eval import RAGPipeline 

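The config below retrieves `retrieval.k = 25` candidates and keeps `reranker.k = 5` after re-ranking. `RAGPipeline` lives in `src/RAG_pipeline_eval.py` and is not shown here, so this is only a sketch of the general two-stage pattern in plain Python: brute-force cosine similarity stands in for the FAISS search, and an arbitrary scoring function stands in for the cross-encoder. The function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    rerank_score: Callable[[int], float],
    k_retrieve: int = 25,
    k_rerank: int = 5,
) -> List[Tuple[int, float]]:
    """Stage 1: keep the k_retrieve docs closest to the query (the vector-index step).
    Stage 2: re-score only those candidates and return the top k_rerank."""
    candidates = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )[:k_retrieve]
    rescored = [(i, rerank_score(i)) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:k_rerank]


# Toy data: doc 2 has the best re-rank score but never survives stage 1
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
rerank_scores = {0: 0.2, 1: 0.9, 2: 0.99, 3: 0.5}
top = retrieve_then_rerank([1.0, 0.0], docs, rerank_scores.__getitem__, k_retrieve=3, k_rerank=2)
print(top)  # [(1, 0.9), (3, 0.5)]
```

The split exists because a cross-encoder scores every (query, chunk) pair jointly and is much slower than an embedding lookup, so it is applied only to the short candidate list.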
In [11]:
# 1. Import the prompt template from the project's `prompts` module
from prompts import PROMPT_TEMPLATE

# 2. Build the pipeline config, using the imported template for the prompt
config = {
    "embedding": { "model_name": "BAAI/bge-base-en-v1.5" },
    "reranker": { "model_name": "cross-encoder/ms-marco-MiniLM-L-6-v2", "k": 5 },
    "llm": { "model_name": "google/flan-t5-base", "max_new_tokens": 256 },
    "retrieval": { "k": 25 },
    "prompt": {
        "template": PROMPT_TEMPLATE # Use the imported variable here
    },
    "data": {
        "index_path": os.path.join(project_root, "data", "vector_store11", "index_bge_base_300_20.faiss"),
        "meta_path": os.path.join(project_root, "data", "vector_store11", "meta_bge_base_300_20.csv")
    }
}
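A few config mistakes only surface deep inside pipeline initialization. A small pre-flight check can catch them first: the re-ranker can only keep candidates the retriever returned, and the index and metadata files must exist. `check_config` is a hypothetical helper sketched against the keys used in this config:

```python
import os
from typing import Any, Dict, List


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = []
    for section in ("embedding", "reranker", "llm", "retrieval", "prompt", "data"):
        if section not in cfg:
            problems.append(f"missing section: {section}")
    if problems:
        return problems
    # The re-ranker can only keep candidates that the retriever returned.
    if cfg["reranker"]["k"] > cfg["retrieval"]["k"]:
        problems.append("reranker.k must not exceed retrieval.k")
    for key in ("index_path", "meta_path"):
        path = cfg["data"].get(key, "")
        if not os.path.isfile(path):
            problems.append(f"data.{key} not found: {path}")
    return problems


# Toy config with a k mismatch and missing files
demo = {
    "embedding": {}, "llm": {}, "prompt": {"template": ""},
    "reranker": {"k": 30}, "retrieval": {"k": 25},
    "data": {"index_path": "missing.faiss", "meta_path": "missing.csv"},
}
problems = check_config(demo)
print(problems)
```

In the notebook, `check_config(config)` can run before `RAGPipeline(config=config)` is built.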

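The run below logs that a prompt reached 532 tokens against a 512-token maximum for this model. One common mitigation is to add re-ranked chunks to the context only while they fit a token budget. This is a sketch under assumptions, not what `RAGPipeline` currently does: whitespace splitting stands in for the real tokenizer, which usually produces more tokens than words, so leave room for the question and the template text.

```python
from typing import Callable, List


def fit_chunks(
    chunks: List[str],
    budget: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> List[str]:
    """Keep chunks in ranked order until adding the next one would exceed `budget` tokens."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # stop rather than skip, so the highest-ranked chunks always win
        kept.append(chunk)
        used += cost
    return kept


print(fit_chunks(["a b c", "d e", "f g h i"], budget=5))  # ['a b c', 'd e']
```

With a Hugging Face tokenizer, `count_tokens=lambda t: len(tokenizer.encode(t))` would give real token counts.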
In [None]:
# 1. Initialize the pipeline
rag_system = RAGPipeline(config=config)

# 2. Load the evaluation questions from the project's csv_files folder
eval_df = pd.read_csv(os.path.join(project_root, "csv_files", "evaluation_dataset.csv"))
questions = eval_df["question"].tolist()
print(f"\nLoaded {len(questions)} evaluation questions.")

# 3. Loop through questions, generate answers, and collect manual feedback
evaluation_results = []

for q in questions:
    print("\n" + "="*80)
    print(f"🔍 Evaluating Question: {q}")
    print("="*80)
    
    # query() returns the answer, the re-ranked sources, and a sentiment summary of those sources
    answer, sources, sentiment = rag_system.query(q)
    
    print(f"\n🧠 Generated Answer:\n{answer}\n")

    # Display the sentiment summary of the retrieved sources
    print("--- Sentiment of Sources ---")
    print(f"📊 {sentiment}")
    print("----------------------------\n")

    print("--- Top Sources Used ---")
    for i, source in enumerate(sources, 1):
        print(f"Source {i}: {source['chunk_text'][:200]}...")
    print("-" * 26)

    # Manual input for scoring
    while True:
        try:
            quality_score = int(input("💯 Enter quality score (1–5): "))
            if 1 <= quality_score <= 5:
                break
            else:
                print("❌ Please enter a number between 1 and 5.")
        except ValueError:
            print("❌ Invalid input. Please enter a number.")
    
    comments = input("📝 Enter your comments/analysis: ")

    # Record the answer, source sentiment, sources, and manual feedback
    evaluation_results.append({
        "Question": q,
        "Generated Answer": answer,
        "Sentiment of Sources": sentiment,
        "Retrieved Sources": " | ".join([s['chunk_text'] for s in sources]),
        "Manual Quality Score (1-5)": quality_score,
        "Comments/Analysis": comments
    })

# 4. Save the results to a CSV file under <project_root>/evaluation
results_dir = os.path.join(project_root, "evaluation")
os.makedirs(results_dir, exist_ok=True)
results_df = pd.DataFrame(evaluation_results)
results_df.to_csv(os.path.join(results_dir, "manual_evaluation_results.csv"), index=False)

print("\n\n✅ Evaluation complete! Results saved to 'evaluation/manual_evaluation_results.csv'")

--- Initializing RAG Pipeline ---


Device set to use cpu


Loading Sentiment Analysis model...


Device set to use cpu


--- RAG Pipeline Initialized Successfully ---

Loaded 30 evaluation questions.

🔍 Evaluating Question: What are the most common complaints about credit cards?
Analyzing sentiment of retrieved sources...

🧠 Generated Answer:
If the context does not contain the answer, state clearly that "Based on the provided documents, there is not enough information to answer this question."

--- Sentiment of Sources ---
📊 Sentiment of Sources: 5 NEGATIVE, 0 POSITIVE (Avg. Confidence: 1.00)
----------------------------

--- Top Sources Used ---
Source 1: the basis of my complaint. incidentally, i am debt free and have a very high credit score. discover canceling id theft protection on false grounds needs to be addressed. again i wonder how many other ...
Source 2: late fees or late payment charges and a lower interest rate. this would have been 58 monthly. i was informed this could not be done so i was left with no option other than to dispute the account closi...
Source 3: i am filing a complaint aga

Token indices sequence length is longer than the specified maximum sequence length for this model (532 > 512). Running this sequence through the model will result in indexing errors



🧠 Generated Answer:
affrim is reporting on my credit report xxxx. their response time is later than usual but never respond. however they are sending payment reminders and i was late xxxx and they called repeatedly demanding payment. they are out of control. i simply want to lower my payment amount to about half until this situation resolves. please help. payment isnt until due xxxx xxxx, then 3 days later i receive a text message saying my payment is past due the 2 seconds later i receive another text message saying please disregard you have a 60 day no pay. then today xx xx 2018 my step daughter receives a call from them saying i am behind on my late fees or late payment charges and a lower interest rate. this would have been 58 monthly. i was informed this could not be done so i was left with no option other than to dispute the account closing until a new hardship payment plan could be reached. that is why i filed this complaint. us bank i am filing a complaint against cash app blo
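The sentiment line above (`5 NEGATIVE, 0 POSITIVE (Avg. Confidence: 1.00)`) aggregates one prediction per source chunk. Assuming the chunk-level model is a Hugging Face `text-classification` pipeline, which returns dicts with `label` and `score` keys, the aggregation could look like this hypothetical `summarize_sentiment`; the pipeline's actual implementation is not shown in this notebook.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_sentiment(predictions: List[Dict[str, Any]]) -> str:
    """Condense per-chunk classifier outputs into a one-line summary."""
    if not predictions:
        return "Sentiment of Sources: no sources"
    counts = Counter(p["label"] for p in predictions)  # missing labels count as 0
    # Average of each prediction's own confidence, not a polarity score
    avg_conf = sum(p["score"] for p in predictions) / len(predictions)
    return (
        f"Sentiment of Sources: {counts['NEGATIVE']} NEGATIVE, "
        f"{counts['POSITIVE']} POSITIVE (Avg. Confidence: {avg_conf:.2f})"
    )


preds = [
    {"label": "NEGATIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "POSITIVE", "score": 0.80},
]
summary = summarize_sentiment(preds)
print(summary)  # Sentiment of Sources: 2 NEGATIVE, 1 POSITIVE (Avg. Confidence: 0.92)
```

Because the average mixes confidences from both labels, a value of 1.00 means the classifier was certain about every chunk, not that every chunk was negative.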

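The manual-results CSV stores all source chunks in one cell joined with `' | '`, and the RAGAs cell splits them back on the same delimiter. That round trip breaks silently if a complaint text itself contains `' | '`. Assuming the CSV format can change, a safer sketch stores the list as JSON; `encode_sources` and `decode_sources` are hypothetical helpers.

```python
import json
from typing import List


def encode_sources(chunks: List[str]) -> str:
    """Serialize chunk texts into one CSV cell without relying on a delimiter."""
    return json.dumps(chunks, ensure_ascii=False)


def decode_sources(cell: str) -> List[str]:
    """Inverse of encode_sources."""
    return json.loads(cell)


chunks = ["late fees | interest rate", "dispute the account closing"]
cell = encode_sources(chunks)
assert decode_sources(cell) == chunks
# The " | " convention would have split the first chunk in two:
print(len(" | ".join(chunks).split(" | ")))  # 3
```

The loop would write `encode_sources([s['chunk_text'] for s in sources])`, and the RAGAs cell would build `contexts` with `decode_sources` instead of `split(' | ')`.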
In [None]:
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

print("\n--- Starting Automated RAGAs Evaluation ---")

# 1. Load the manual evaluation results saved by the previous cell
manual_results_df = pd.read_csv(os.path.join(project_root, "evaluation", "manual_evaluation_results.csv"))

# 2. Format the data for RAGAs (RAGAs needs the sources as a list of strings)
# Note: We are splitting the saved string of sources back into a list
ragas_data = {
    "question": manual_results_df["Question"].tolist(),
    "answer": manual_results_df["Generated Answer"].tolist(),
    "contexts": [s.split(' | ') for s in manual_results_df["Retrieved Sources"]],
}
ragas_dataset = Dataset.from_dict(ragas_data)

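# 3. Run the RAGAs evaluation.
# Only reference-free metrics are used: context_precision and context_recall
# compare the contexts against a ground-truth answer, which this dataset lacks.
# RAGAs scores with an LLM judge and an embedding model (OpenAI by default), so an
# API key, or custom `llm=` / `embeddings=` arguments, must be available.
metrics = [faithfulness, answer_relevancy]
result = evaluate(ragas_dataset, metrics=metrics)

# 4. Combine the manual scores with the automated RAGAs scores.
# Select the metric columns by name so the merge does not depend on how the
# installed RAGAs version names its input columns.
ragas_df = result.to_pandas()
metric_cols = [m.name for m in metrics]
comprehensive_df = pd.concat([manual_results_df, ragas_df[metric_cols]], axis=1)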

# 5. Save the final comprehensive report next to the manual results
comprehensive_path = os.path.join(project_root, "evaluation", "comprehensive_evaluation_results.csv")
comprehensive_df.to_csv(comprehensive_path, index=False)

print("\n✅ Comprehensive evaluation complete!")
print("Final results with both manual and RAGAs scores saved to 'evaluation/comprehensive_evaluation_results.csv'")

display(comprehensive_df)
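With manual and automated scores side by side, a quick agreement check is whether the 1-5 manual rating tracks the RAGAs metrics. A minimal pure-Python Pearson correlation, with made-up toy numbers (not results from this notebook); the NaN filter assumes RAGAs may leave a metric empty for rows it could not score:

```python
import math
from typing import List, Sequence, Tuple


def drop_nan_pairs(xs: Sequence[float], ys: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Drop positions where either score is NaN (e.g. a row the judge failed to score)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists (NaN if one is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


manual = [1, 2, 3, 4, 5]             # toy manual scores
faith = [0.1, 0.2, 0.3, 0.4, 0.5]    # toy faithfulness scores
print(round(pearson(*drop_nan_pairs(manual, faith)), 6))  # 1.0
```

Inside the notebook, `comprehensive_df[["Manual Quality Score (1-5)", "faithfulness", "answer_relevancy"]].corr()` computes the same statistic with pandas and skips missing values pairwise. With 30 questions, treat any correlation as a rough signal rather than a firm result.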