In [1]:
# Import necessary libraries
import sys
import pandas as pd
import os
from IPython.display import Markdown, display

In [4]:
import os
import pandas as pd
from rag_pipeline import SimpleRAGPipeline

# Set up paths
project_root = "C:\\Users\\Antifragile\\Desktop\\credit-trust-complaint-bot"
csv_path = os.path.join(project_root, "data", "processed", "filtered_complaints.csv")
vector_store_path = os.path.join(project_root, "src", "RAG", "vector_store")

# Instantiate RAG pipeline
rag = SimpleRAGPipeline(vector_store_dir=vector_store_path, use_llm=False)

# Overwrite .df with your CSV manually
rag.df = pd.read_csv(csv_path, usecols=["Complaint ID", "Consumer complaint narrative"], dtype={"Complaint ID": str})


✅ Loaded vector store with 17761 chunks
✅ Loaded 459138 original complaints


In [3]:
import sys
import os

PROJECT_ROOT = r"C:\Users\Antifragile\Desktop\credit-trust-complaint-bot"
SRC_PATH = os.path.join(PROJECT_ROOT, "src")

if SRC_PATH not in sys.path:
    sys.path.append(SRC_PATH)

from rag_pipeline import SimpleRAGPipeline




In [5]:
# Evaluation questions
questions = [
    "Why are customers unhappy with BNPL?",
    "What issues are reported with credit card disputes?",
    "Why do users complain about savings accounts?",
    "What kind of problems happen with money transfers?",
    "Are there frequent complaints about personal loans?",
    "What makes customers close their savings accounts?",
    "Why do credit card users mention fraud?",
    "What are some recurring problems with BNPL payments?",
    "Do people mention delays in personal loan disbursements?",
    "Why are money transfer services considered unreliable?"
]

evaluation_data = []

# Run the evaluation
for q in questions:
    print(f"\n🔍 Question: {q}\n{'-'*80}")
    result = rag.query(q)
    answer = result["answer"]
    retrieved_chunks = result["retrieved_chunks"]

    top_sources = "\n\n".join([chunk["text"][:300] for chunk in retrieved_chunks[:2]])

    # Placeholder score and comment: every row gets the same values
    # until the answers are reviewed and scored manually
    quality_score = 4
    comments = "Auto-evaluation run; review manually for deeper insights."

    evaluation_data.append({
        "Question": q,
        "Generated Answer": answer,
        "Retrieved Sources (Top 2)": top_sources,
        "Quality Score (1–5)": quality_score,
        "Comments/Analysis": comments
    })

# Save results
eval_df = pd.DataFrame(evaluation_data)
eval_output_path = os.path.join(project_root, "rag_evaluation_results.csv")
eval_df.to_csv(eval_output_path, index=False)
print(f"\n✅ Evaluation completed and saved to: {eval_output_path}")



🔍 Question: Why are customers unhappy with BNPL?
--------------------------------------------------------------------------------

🔍 Question: What issues are reported with credit card disputes?
--------------------------------------------------------------------------------

🔍 Question: Why do users complain about savings accounts?
--------------------------------------------------------------------------------

🔍 Question: What kind of problems happen with money transfers?
--------------------------------------------------------------------------------

🔍 Question: Are there frequent complaints about personal loans?
--------------------------------------------------------------------------------

🔍 Question: What makes customers close their savings accounts?
--------------------------------------------------------------------------------

🔍 Question: Why do credit card users mention fraud?
--------------------------------------------------------------------------------

In [6]:
# Show a few sample rows with wrapped text for readability
pd.set_option('display.max_colwidth', 500)
eval_df[["Question", "Generated Answer", "Retrieved Sources (Top 2)"]].head(3)

Unnamed: 0,Question,Generated Answer,Retrieved Sources (Top 2)
0,Why are customers unhappy with BNPL?,"Based on 5 complaints across multiple products:\nCommon issues include:\n- Unauthorized charges\n- Unexpected fees\n- Poor customer service\n\nExample: ""This is a "" scrubbed '' modified version of a Letter & Summary, that goes w/ Two Packets of Documents w/ Notes. I am sending the entirety to Capital One by XX/XX/XXXX, but I think it's important for ...""","This is a "" scrubbed '' modified version of a Letter & Summary, that goes w/ Two Packets of Documents w/ Notes. I am sending the entirety to Capital One by XX/XX/XXXX, but I think it's important for you to read exactly what I am sending them. \n\nTo Whom it may concern, On XX/XX/XXXX, I immediately n\n\nThis complaint is about Money Transfer company "" InstaRem ''. These guys are holding my money ( XXXX USD ) for more than 2 weeks now. \n\nPLEASE HELP ME!!!!!! \n\nWhat Happened till now : -..."
1,What issues are reported with credit card disputes?,"Based on 5 complaints across multiple products:\nCommon issues include:\n- Unexpected fees\n- Poor customer service\n- Billing problems\n\nExample: ""American Express offers Amex Offers wherein customers can save a certain amount of money for meeting certain spending thresholds. You must add these to your card before making the purchase to be eligi...""","American Express offers Amex Offers wherein customers can save a certain amount of money for meeting certain spending thresholds. You must add these to your card before making the purchase to be eligible. Prior to purchase, I added an offer for XXXX XXXX XXXX to my Amex Business Plus card ending in\n\nOver the past couple of months my business ( fleet ) credit card for gas purchases has fallen victim to a number of fraudulent charges that have posted to our account. The fraud protection sys..."
2,Why do users complain about savings accounts?,"Based on 5 complaints across multiple products:\nCommon issues include:\n- Unexpected fees\n- Poor customer service\n- Billing problems\n\nExample: ""The Capital One 's 360 Performance Savings account which I opened in the early 2021 has falsely advertised its savings account as a high interest product with top tier rates which didn't match my acco...""","The Capital One 's 360 Performance Savings account which I opened in the early 2021 has falsely advertised its savings account as a high interest product with top tier rates which didn't match my accounts actual performance.\n\nMy whole issue revolves around how the Fraud Investigator at BBVA Check Fraud Claims Department treated me, the victim, like I was the one in the wrong. She judged me and made up her mind before even speaking with me! She had her finger pointing at me the entire time ..."


## Task 3: Retrieval-Augmented QA (RAG) Pipeline

To support semantic search and Q&A on consumer complaints, we developed a RAG pipeline that combines dense retrieval with answer generation.

### 🔹 Retrieval
We reuse the FAISS vector store built in Task 2. Each query is embedded with `all-MiniLM-L6-v2`, normalized, and searched against the vector store to retrieve the top-k most similar complaint chunks.

Each retrieved chunk carries metadata (`Complaint ID`, `product`, and `chunk_index`), so every result can be traced back to its source complaint.
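
The retrieval step can be illustrated with a small NumPy sketch. This is not the project's code: random 384-dimensional vectors stand in for `all-MiniLM-L6-v2` embeddings, a brute-force inner product stands in for the FAISS search, and the function names `normalize` and `top_k` are hypothetical. It only shows why normalizing first makes the inner product behave as cosine similarity, and how metadata travels with each hit.

```python
import numpy as np


def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so inner product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, metadata: list, k: int = 5) -> list:
    """Return metadata for the k chunks most similar to the query, best first."""
    q = normalize(query_vec.reshape(1, -1))
    scores = (normalize(chunk_vecs) @ q.T).ravel()
    order = np.argsort(-scores)[:k]
    return [{**metadata[i], "score": float(scores[i])} for i in order]


# Toy data (assumption): random vectors instead of real sentence embeddings.
rng = np.random.default_rng(0)
chunk_vecs = rng.normal(size=(100, 384)).astype("float32")
metadata = [
    {"Complaint ID": str(1000 + i), "product": "Savings account", "chunk_index": 0}
    for i in range(100)
]

# A query that is a slightly perturbed copy of chunk 42 should retrieve chunk 42 first.
query_vec = chunk_vecs[42] + 0.01 * rng.normal(size=384).astype("float32")
results = top_k(query_vec, chunk_vecs, metadata, k=3)
for hit in results:
    print(hit["Complaint ID"], hit["product"], hit["chunk_index"], round(hit["score"], 3))
```

Because each hit keeps its `Complaint ID` and `chunk_index`, a caller can look up the full narrative in the original complaints table.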

### 🔹 Answer Generation
We implemented two generation modes:
- **Rule-based:** Extracts frequent product categories and issue patterns (e.g., "unauthorized", "fees").
- **LLM-based (optional):** Uses `mistralai/Mixtral-8x7B-Instruct-v0.1` via a Hugging Face pipeline to generate natural-language answers from the retrieved context.
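
As a rough illustration of the rule-based mode, the sketch below counts keyword matches across retrieved chunks and formats a summary similar to the answers shown in the evaluation output. The keyword map `ISSUE_PATTERNS` and the function `rule_based_answer` are hypothetical; the patterns the pipeline actually uses are not shown in this notebook.

```python
from collections import Counter

# Hypothetical keyword -> label map; the pipeline's real patterns may differ.
ISSUE_PATTERNS = {
    "unauthorized": "Unauthorized charges",
    "fee": "Unexpected fees",
    "customer service": "Poor customer service",
    "billing": "Billing problems",
}


def rule_based_answer(chunks: list, max_issues: int = 3) -> str:
    """Summarize retrieved chunks by counting how many mention each issue keyword."""
    if not chunks:
        return "No relevant complaints found."

    issue_counts = Counter()
    for chunk in chunks:
        text = chunk["text"].lower()
        for keyword, label in ISSUE_PATTERNS.items():
            if keyword in text:
                issue_counts[label] += 1

    products = {chunk.get("product", "unknown") for chunk in chunks}
    scope = "multiple products" if len(products) > 1 else "one product"
    example = chunks[0]["text"][:200]

    lines = [f"Based on {len(chunks)} complaints across {scope}:", "Common issues include:"]
    lines += [f"- {label}" for label, _ in issue_counts.most_common(max_issues)]
    lines += ["", f'Example: "{example}..."']
    return "\n".join(lines)


sample_chunks = [
    {"text": "I was charged an unauthorized fee.", "product": "Credit card"},
    {"text": "Another late fee appeared and customer service ignored me.", "product": "Savings account"},
    {"text": "Unexpected fee again on my statement.", "product": "Savings account"},
]
answer = rule_based_answer(sample_chunks)
print(answer)
```

Substring matching of this kind is cheap and transparent, but it only surfaces the issues that someone thought to list in advance.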

### 🔹 RAG Pipeline Usage
The `SimpleRAGPipeline` class supports:
- `retrieve()`: Embeds the query and fetches relevant chunks
- `generate_answer()`: Generates an answer using the selected mode
- `query()`: Runs the end-to-end pipeline (retrieval followed by generation)
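
How the three methods fit together can be sketched with a toy class. `ToyRAGPipeline` is a stand-in written for illustration, not `SimpleRAGPipeline` itself: its method signatures are assumptions, and word overlap replaces embedding search. The only detail taken from the notebook is that `query()` returns a dict with `"answer"` and `"retrieved_chunks"` keys.

```python
class ToyRAGPipeline:
    """Illustrative stand-in; signatures are assumed, not taken from rag_pipeline."""

    def __init__(self, chunks: list, use_llm: bool = False):
        self.chunks = chunks
        self.use_llm = use_llm

    def retrieve(self, question: str, k: int = 5) -> list:
        # Word overlap is a placeholder for embedding similarity.
        q_words = set(question.lower().split())
        ranked = sorted(
            self.chunks,
            key=lambda c: len(q_words & set(c["text"].lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def generate_answer(self, question: str, retrieved: list) -> str:
        if self.use_llm:
            raise NotImplementedError("LLM mode is omitted from this sketch")
        if not retrieved:
            return "No relevant complaints found."
        example = retrieved[0]["text"][:100]
        return f'Based on {len(retrieved)} complaints. Example: "{example}"'

    def query(self, question: str, k: int = 5) -> dict:
        # End-to-end: retrieve first, then generate from what was retrieved.
        retrieved = self.retrieve(question, k=k)
        return {
            "answer": self.generate_answer(question, retrieved),
            "retrieved_chunks": retrieved,
        }


toy = ToyRAGPipeline(
    [
        {"text": "my savings account was charged fees"},
        {"text": "money transfer delayed for two weeks"},
        {"text": "credit card fraud on my account"},
    ]
)
result = toy.query("savings account fees", k=2)
print(result["answer"])
```

Returning the retrieved chunks alongside the answer is what lets the evaluation loop record the top sources next to each generated answer.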

### ✅ Outcome
The RAG system can now:
- Answer custom queries like "What are common savings account issues?"
- Retrieve relevant excerpts with metadata
- Generate clear, traceable answers using either rules or an LLM
