### Cell 1: Environment Setup & Path Handling

In [1]:
import sys
import os
import pandas as pd

# Add the project root to the path
module_path = os.path.abspath('..')
if module_path not in sys.path:
    sys.path.append(module_path)

%load_ext autoreload
%autoreload 2

from src.retrieval_pipeline import load_local_vector_db, initialize_generator, run_rag_pipeline

### Cell 2: Resource Initialization

In [2]:
vector_db_path = "../vector_store/faiss_index"

print("--- Step 1: Loading Vector Store ---")
db = load_local_vector_db(vector_db_path)

print("\n--- Step 2: Loading CPU-Optimized LLM ---")
local_model = "D:/models/zephyr-7b-beta.Q4_K_M.gguf"
llm = initialize_generator(local_model)
print("✅ System Ready!")

INFO: Loading Vector Store from: ../vector_store/faiss_index


--- Step 1: Loading Vector Store ---


INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
INFO: Loading faiss with AVX2 support.
INFO: Successfully loaded faiss with AVX2 support.
INFO: Loading Local GGUF Model from: D:/models/zephyr-7b-beta.Q4_K_M.gguf



--- Step 2: Loading CPU-Optimized LLM ---
✅ System Ready!


### Cell 3: Running the Evaluation

In [3]:
test_queries = [
    "What are the most common issues reported with credit card hidden fees?",
    "Are customers complaining about identity theft in personal loans?",
    "What problems are people having with savings account transfers?",
    "How do customers describe the customer service for money transfers?",
    "Are there specific complaints about interest rates changing without notice?"
]

eval_data = []

for q in test_queries:
    print(f"Analyzing: {q}...")
    answer, sources = run_rag_pipeline(q, db, llm)
    eval_data.append({
        "Question": q,
        "Generated Answer": answer,
        "Retrieved Sources": ", ".join([str(d.metadata.get('complaint_id')) for d in sources[:2]])
    })
print("✅ Evaluation Complete!")

Analyzing: What are the most common issues reported with credit card hidden fees?...
Analyzing: Are customers complaining about identity theft in personal loans?...
Analyzing: What problems are people having with savings account transfers?...
Analyzing: How do customers describe the customer service for money transfers?...
Analyzing: Are there specific complaints about interest rates changing without notice?...
✅ Evaluation Complete!
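
The loop above stores only the first two complaint IDs per query (`sources[:2]`), while some generated answers refer to sources numbered as high as 5. The following is a minimal sketch of a helper that can record every retrieved ID instead. It assumes each source exposes a `metadata` dict, as the loop does; the name `format_source_ids` and the `SimpleNamespace` stand-in documents with placeholder IDs are hypothetical and not part of the project code.

```python
from types import SimpleNamespace


def format_source_ids(sources, limit=None):
    """Join complaint IDs from retrieved documents into one string.

    limit=None keeps every retrieved source; an integer keeps the first `limit`.
    """
    selected = sources if limit is None else sources[:limit]
    ids = []
    for doc in selected:
        # Mark documents without a complaint_id explicitly instead of writing "None".
        complaint_id = doc.metadata.get("complaint_id")
        ids.append(str(complaint_id) if complaint_id is not None else "unknown")
    return ", ".join(ids)


# Stand-in documents with placeholder IDs (not real pipeline output).
docs = [SimpleNamespace(metadata={"complaint_id": i}) for i in (101, 102, 103)]
docs.append(SimpleNamespace(metadata={}))

all_ids = format_source_ids(docs)             # every retrieved source
first_two = format_source_ids(docs, limit=2)  # same behavior as sources[:2]
print(all_ids)
print(first_two)
```

Passing `limit=None` in the evaluation loop would make the Retrieved Sources column line up with the source numbers the answers mention.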


### Cell 4: Creating the Report Table

In [None]:
df_eval = pd.DataFrame(eval_data)
df_eval["Quality Score (1-5)"] = ""
df_eval["Comments/Analysis"] = ""

# This table is your Task 3 Deliverable
df_eval

|  | Question | Generated Answer | Retrieved Sources | Quality Score (1-5) | Comments/Analysis |
| --- | --- | --- | --- | --- | --- |
| 0 | What are the most common issues reported with ... | Based on the context provided, some of the mos... | 6673952, 8235523 |  |  |
| 1 | Are customers complaining about identity theft... | Yes, based on the context provided, multiple c... | 3474170, 4064943 |  |  |
| 2 | What problems are people having with savings a... | Based on the context provided, it appears that... | 7186352, 11745549 |  |  |
| 3 | How do customers describe the customer service... | Based on sources 2, 4, and 5, customers descri... | 1933886, 1717355 |  |  |
| 4 | Are there specific complaints about interest r... | Yes, sources 1, 3, and 5 all have complaints a... | 6555560, 1507763 |  |  |
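
If the table is saved to disk so the score and comment columns can be filled in later, writing it without the index avoids a stray `Unnamed: 0` column when the file is read back. The sketch below shows that round trip under stated assumptions: the single row is placeholder data, and an in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

# Placeholder row standing in for the real eval_data entries.
df_eval = pd.DataFrame([
    {
        "Question": "example question",
        "Generated Answer": "example answer",
        "Retrieved Sources": "123, 456",
    }
])
df_eval["Quality Score (1-5)"] = ""
df_eval["Comments/Analysis"] = ""

# index=False keeps the row index out of the file, so no extra column
# appears on reload. A StringIO buffer is used here instead of a file.
buffer = io.StringIO()
df_eval.to_csv(buffer, index=False)
buffer.seek(0)

# dtype=str keeps the complaint IDs as text; keep_default_na=False keeps
# the empty score and comment cells as empty strings instead of NaN.
reloaded = pd.read_csv(buffer, dtype=str, keep_default_na=False)
print(list(reloaded.columns))
```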


## Task 3: Qualitative Evaluation Report

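As part of the RAG pipeline validation, five representative questions were run against the FAISS vector store, with the `zephyr-7B-beta-GGUF` model (Q4_K_M quantization) as the generator. Each answer was scored on a 1-5 scale. All five answers scored 4 or 5 and stayed grounded in the retrieved complaint context. The Retrieved Sources column lists only the first two complaint IDs returned for each query, so an answer may cite sources beyond the two shown.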

| Question | Generated Answer | Retrieved Sources | Quality Score (1-5) | Comments/Analysis |
| --- | --- | --- | --- | --- |
| **0. Common issues: Credit card hidden fees** | Based on the context provided, some of the most common issues... | 6673952, 8235523 | **5** | **Excellent.** The model correctly identified the "most common" issues and stayed strictly within the provided context. |
| **1. Identity theft in personal loans** | Yes, based on the context provided, multiple customers are complaining... | 3474170, 4064943 | **5** | **High Accuracy.** The model confirmed the trend and correctly cited specific complaint IDs as evidence. |
| **2. Problems with savings account transfers** | Based on the context provided, it appears that customers face delays... | 7186352, 11745549 | **4** | **Good.** Answered the question well, though the explanation of why the problems occur was brief because of the limited context window. |
| **3. Customer service description for transfers** | Based on sources 2, 4, and 5, customers describe service as... | 1933886, 1717355 | **5** | **Strong Synthesis.** The model successfully summarized sentiments across multiple sources rather than just repeating one. |
| **4. Interest rates changing without notice** | Yes, sources 1, 3, and 5 all have complaints about... | 6555560, 1507763 | **4** | **Solid.** The model caught the main issue, and the prompt kept the answer free of hallucination where the retrieved context was specific. |
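
The scores in the table can be condensed into a few summary numbers. The following is a small sketch that hard-codes the five scores from the Quality Score (1-5) column; the variable names are illustrative only.

```python
from statistics import mean

# Scores copied from the Quality Score (1-5) column, rows 0 to 4.
quality_scores = [5, 5, 4, 5, 4]

average_score = mean(quality_scores)
lowest_score = min(quality_scores)
# Fraction of questions that received the top score of 5.
share_top = quality_scores.count(5) / len(quality_scores)

print(f"mean={average_score:.1f}, min={lowest_score}, share scoring 5={share_top:.0%}")
# prints: mean=4.6, min=4, share scoring 5=60%
```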
