# Building the RAG Core Logic and Qualitative Evaluation

This notebook focuses on implementing the core Retrieval-Augmented Generation (RAG) pipeline and performing an initial qualitative evaluation of its effectiveness. We will integrate the retriever from Task 2's vector stores with a large language model (LLM) to generate answers based on retrieved context.

**Objectives:**
1.  Mount Google Drive to access vector stores and save evaluation results.
2.  Install necessary libraries for LLM integration.
3.  Implement the Retriever component: embed user questions and perform similarity search against FAISS and ChromaDB.
4.  Design a robust prompt template for the LLM.
5.  Implement the Generator component: combine prompt, question, and retrieved context to get LLM responses.
6.  Create a `RAGPipeline` class to orchestrate the retrieval and generation.
7.  Conduct a qualitative evaluation using a set of representative questions, analyzing generated answers and retrieved sources.
8.  Prepare the evaluation table content for the final report.

## 1. Setup and Google Drive Mounting

Mount your Google Drive to access the vector stores created in Task 2. You will be prompted to authenticate your Google account.

**Important:** Ensure `PROJECT_ROOT` matches the actual location of your project folder within your Google Drive.

In [1]:
from google.colab import drive
import os
import sys

# Mount Google Drive
drive.mount('/content/drive')

# Define your project root within Google Drive
PROJECT_ROOT = '/content/drive/My Drive/Colab_Project/'

# Change current working directory to your project root
os.makedirs(PROJECT_ROOT, exist_ok=True)
os.chdir(PROJECT_ROOT)

print(f"Current working directory set to: {os.getcwd()}")

# Add the src directory to Python's path to import custom modules
if './src' not in sys.path:
    sys.path.insert(0, './src')

print(f"Python sys.path updated: {sys.path}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Current working directory set to: /content/drive/My Drive/Colab_Project
Python sys.path updated: ['./src', '/content', '/env/python', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages', '/usr/local/lib/python3.11/dist-packages/IPython/extensions', '/usr/local/lib/python3.11/dist-packages/setuptools/_vendor', '/root/.ipython']


## 2. Install Required Libraries

Install `transformers` for the LLM and `torch` as its backend. Also install `sentence-transformers`, `faiss-cpu`, and `chromadb` if they are not already present from Task 2, along with `tqdm`, `pandas`, and `numpy`.

In [2]:
# Install necessary libraries
!pip install transformers torch sentence-transformers faiss-cpu chromadb tqdm pandas numpy --quiet


In [3]:
# Import necessary libraries for the notebook
import pandas as pd
import numpy as np
from typing import List, Dict, Union
from IPython.display import display, Markdown

# Add the src directory to Python's path to import the RAG pipeline
import sys
if '../src' not in sys.path:
    sys.path.append('../src')

# Import the RAG pipeline components
from src.rag_pipeline import Retriever, Generator, RAGPipeline

print("Libraries installed and modules imported.")

Libraries installed and modules imported.


## 3. Configuration and Path Setup

Define paths to your vector stores and choose your LLM. Make sure these paths match where your vector stores were saved in Task 2.

In [4]:
# Define a variable for the base data directory in Google Drive
BASE_DATA_DIR = '/content/drive/MyDrive/10accademy/Week-6/Data'

# Paths to your saved vector stores (built from BASE_DATA_DIR)
FAISS_INDEX_PATH = os.path.join(BASE_DATA_DIR, 'vector_store', 'faiss_index', 'faiss_index.bin')
FAISS_METADATA_PATH = os.path.join(BASE_DATA_DIR, 'vector_store', 'faiss_index', 'faiss_metadata.csv')
CHROMADB_PATH = os.path.join(BASE_DATA_DIR, 'vector_store', 'chroma_db')
CHROMADB_COLLECTION_NAME = 'complaint_chunks'

# Embedding Model (same as Task 2)
EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"

# Large Language Model for Generation
# Choose a model suitable for your Colab resources. Flan-T5-small is a good starting point.
# For stronger GPUs, consider 'google/flan-t5-base', 'google/flan-t5-large', or quantized Llama/Mistral models.
LLM_MODEL_NAME = "google/flan-t5-small"

# Number of top-k chunks to retrieve
TOP_K_RETRIEVAL = 5

print("Configuration parameters set.")

Configuration parameters set.
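
Before initializing the components, it can help to confirm that the configured paths actually exist on the mounted drive. The following is a minimal sketch, assuming a hypothetical helper named `find_missing_paths` that is not part of the project code:

```python
import os
from typing import Dict, List


def find_missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the labels of configured paths that do not exist on disk."""
    # Hypothetical helper for illustration; labels are free-form strings.
    return [label for label, path in paths.items() if not os.path.exists(path)]
```

In this notebook it could be called as `find_missing_paths({"faiss_index": FAISS_INDEX_PATH, "faiss_metadata": FAISS_METADATA_PATH, "chromadb": CHROMADB_PATH})`; an empty list would mean every path was found.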


## 4. Initialize RAG Components

Initialize the `Retriever` and `Generator` components, then combine them into the `RAGPipeline`.
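
The real classes live in `src/rag_pipeline.py` and are not shown in this notebook. As a rough illustration of the retrieve-then-generate flow, here is a sketch with a hypothetical `SketchRAGPipeline` class and toy stand-in functions. It only mirrors the `{'answer', 'sources'}` result shape used later in this notebook and is not the project's implementation:

```python
from typing import Callable, Dict, List


class SketchRAGPipeline:
    """Hypothetical stand-in showing the retrieve-then-generate flow."""

    def __init__(
        self,
        retrieve: Callable[[str, int], List[Dict]],
        generate: Callable[[str], str],
    ):
        self.retrieve = retrieve
        self.generate = generate

    def run(self, question: str, k: int = 5) -> Dict:
        # 1) Fetch the k most relevant chunks for the question.
        sources = self.retrieve(question, k)
        # 2) Join the chunk texts into a single context string.
        context = "\n".join(s["text"] for s in sources)
        # 3) Ask the generator to answer from that context.
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return {"answer": self.generate(prompt), "sources": sources}


# Toy stand-ins so the sketch runs without any model or index.
def fake_retrieve(question: str, k: int) -> List[Dict]:
    corpus = [{"text": "late fee charged twice"}, {"text": "card declined abroad"}]
    return corpus[:k]


def fake_generate(prompt: str) -> str:
    return "stub answer"
```

Passing the retriever and generator in as callables keeps the orchestration logic testable without loading an embedding model or an LLM.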

In [5]:
try:
    retriever = Retriever(
        embedding_model_name=EMBEDDING_MODEL_NAME,
        faiss_index_path=FAISS_INDEX_PATH,
        faiss_metadata_path=FAISS_METADATA_PATH,
        chromadb_path=CHROMADB_PATH,
        chromadb_collection_name=CHROMADB_COLLECTION_NAME
    )
    generator = Generator(model_name=LLM_MODEL_NAME)
    rag_pipeline = RAGPipeline(retriever, generator)
    print("RAG pipeline components initialized successfully.")
except Exception as e:
    print(f"Error initializing RAG components: {e}")
    print("Please ensure Task 2 was completed and vector stores are correctly saved and accessible.")
    retriever = None
    generator = None
    rag_pipeline = None

Initializing Retriever...
Loading embedding model: sentence-transformers/all-MiniLM-L6-v2...
Embedding model loaded successfully.
Loading FAISS index from /content/drive/MyDrive/10accademy/Week-6/Data/vector_store/faiss_index/faiss_index.bin...
FAISS index loaded with 190335 vectors.
Loading FAISS metadata from /content/drive/MyDrive/10accademy/Week-6/Data/vector_store/faiss_index/faiss_metadata.csv...
FAISS metadata loaded. Shape: (190335, 3)
ChromaDB collection 'complaint_chunks' loaded successfully. Count: 190335
Retriever initialized.
Initializing Generator with model: google/flan-t5-small...


Device set to use cpu


LLM pipeline loaded successfully.
RAGPipeline initialized.
RAG pipeline components initialized successfully.


## 5. Qualitative Evaluation

We will now define a set of representative questions and run the RAG pipeline for each. The results will be compiled into an evaluation table.

### 5.1. Define Test Questions

In [6]:
test_questions = [
    "What are the most common issues people face with credit card billing?",
    "Can you tell me about complaints related to personal loan interest rates?",
    "What problems do customers report with Buy Now, Pay Later services?",
    "Are there common complaints about accessing funds in savings accounts?",
    "Describe typical issues with unauthorized money transfers.",
    "What kind of disputes arise from incorrect information on credit reports?",
    "How do customers complain about hidden fees in personal loans?",
    "What are the security concerns mentioned for money transfer services?",
    "Are there complaints about difficulty closing a credit card account?",
    "Summarize issues regarding delays in receiving funds from savings accounts."
]

print(f"Defined {len(test_questions)} test questions.")

Defined 10 test questions.


### 5.2. Run RAG Pipeline for Each Question and Collect Results

We will run the RAG pipeline with both FAISS and ChromaDB for comparison. For the final report, choose the best-performing store or discuss both.

In [7]:
evaluation_results = []

if rag_pipeline is not None:
    for i, question in enumerate(test_questions):
        print(f"\n--- Processing Question {i+1}/{len(test_questions)}: {question} ---")

        # Run with FAISS
        print("Running with FAISS...")
        result_faiss = rag_pipeline.run(question, k=TOP_K_RETRIEVAL, vector_store_type="faiss")
        retrieved_sources_faiss = result_faiss['sources']
        generated_answer_faiss = result_faiss['answer']

        # Run with ChromaDB
        print("Running with ChromaDB...")
        result_chroma = rag_pipeline.run(question, k=TOP_K_RETRIEVAL, vector_store_type="chromadb")
        retrieved_sources_chroma = result_chroma['sources']
        generated_answer_chroma = result_chroma['answer']

        # Store results for evaluation table
        evaluation_results.append({
            'Question': question,
            'Generated Answer (FAISS)': generated_answer_faiss,
            'Retrieved Sources (FAISS)': retrieved_sources_faiss,
            'Generated Answer (ChromaDB)': generated_answer_chroma,
            'Retrieved Sources (ChromaDB)': retrieved_sources_chroma
        })
else:
    print("RAG pipeline not initialized. Skipping evaluation.")


--- Processing Question 1/10: What are the most common issues people face with credit card billing? ---
Running with FAISS...

--- Running RAG Pipeline for query: 'What are the most common issues people face with credit card billing?' ---
Retrieving top 5 chunks from FAISS...
FAISS retrieval complete. Found 5 chunks.
Generating response with LLM...
LLM response generated.
--- RAG Pipeline Complete ---
Running with ChromaDB...

--- Running RAG Pipeline for query: 'What are the most common issues people face with credit card billing?' ---
Retrieving top 5 chunks from ChromaDB...
ChromaDB retrieval complete. Found 5 chunks.
Generating response with LLM...
LLM response generated.
--- RAG Pipeline Complete ---

--- Processing Question 2/10: Can you tell me about complaints related to personal loan interest rates? ---
Running with FAISS...

--- Running RAG Pipeline for query: 'Can you tell me about complaints related to personal loan interest rates?' ---
Retrieving top 5 chunks from FAISS..
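
Once both runs are collected, one simple way to quantify how similar the two stores' retrievals are is the Jaccard overlap of the complaint IDs they return. This is a sketch with a hypothetical helper, `source_overlap`. It assumes each source is a dict with an `original_id` key, as used by the formatting code in this notebook:

```python
from typing import Dict, List


def source_overlap(
    sources_a: List[Dict], sources_b: List[Dict], id_key: str = "original_id"
) -> float:
    """Jaccard overlap between the complaint IDs returned by two retrievers."""
    ids_a = {s[id_key] for s in sources_a}
    ids_b = {s[id_key] for s in sources_b}
    union = ids_a | ids_b
    if not union:
        return 1.0  # both lists empty: treat as identical
    return len(ids_a & ids_b) / len(union)
```

For example, `source_overlap(result_faiss['sources'], result_chroma['sources'])` could be stored alongside each question. Because the comparison is set-based, several chunks from the same complaint count once.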

### 5.3. Create Evaluation Table (for Report)

This section generates a Markdown table summarizing the qualitative evaluation. You will manually fill in the 'Quality Score' and 'Comments/Analysis' columns after reviewing the results. This table can be directly copied into your final report.

In [10]:
if evaluation_results:
    eval_df = pd.DataFrame(evaluation_results)
    eval_df['Quality Score (1-5)'] = '' # Placeholder for manual scoring
    eval_df['Comments/Analysis'] = '' # Placeholder for manual comments

    # Format retrieved sources for display
    def format_sources(sources, num_to_show=2):
        if not sources: return "N/A"
        formatted = []
        for i, source in enumerate(sources[:num_to_show]):
            text_snippet = source['text'][:150] + '...' if len(source['text']) > 150 else source['text']
            formatted.append(f"- **Product:** {source['product']}, **ID:** {source['original_id']}\n  **Snippet:** {text_snippet}")
        return "\n".join(formatted)

    eval_df['Retrieved Sources (FAISS) Formatted'] = eval_df['Retrieved Sources (FAISS)'].apply(format_sources)
    eval_df['Retrieved Sources (ChromaDB) Formatted'] = eval_df['Retrieved Sources (ChromaDB)'].apply(format_sources)

    # Display the table in Markdown format
    markdown_table = """
### Qualitative Evaluation Results

This table summarizes the RAG pipeline's performance on a set of representative questions. The 'Quality Score' (1-5, 5 being excellent) and 'Comments/Analysis' should be filled in manually after reviewing the generated answers and their corresponding sources.

| Question | Generated Answer (FAISS) | Retrieved Sources (FAISS) | Quality Score (1-5) | Comments/Analysis (FAISS) | Generated Answer (ChromaDB) | Retrieved Sources (ChromaDB) | Quality Score (1-5) | Comments/Analysis (ChromaDB) |
|---|---|---|---|---|---|---|---|---|
"""

    # Build one table row per question. This loop stays inside the `if` block
    # so that eval_df and markdown_table are always defined when they are used.
    for index, row in eval_df.iterrows():
        # Perform the replace operation before inserting into the f-string
        question = row['Question'].replace('\n', '<br>')
        answer_faiss = row['Generated Answer (FAISS)'].replace('\n', '<br>')
        sources_faiss = row['Retrieved Sources (FAISS) Formatted'].replace('\n', '<br>')
        comments = row['Comments/Analysis'].replace('\n', '<br>')
        answer_chroma = row['Generated Answer (ChromaDB)'].replace('\n', '<br>')
        sources_chroma = row['Retrieved Sources (ChromaDB) Formatted'].replace('\n', '<br>')
        quality_score = row['Quality Score (1-5)']  # No replace needed here

        # The same (empty) score and comments placeholders are used for both
        # stores for now; fill them in manually per store for the report.
        markdown_table += (
            f"| {question} "
            f"| {answer_faiss} "
            f"| {sources_faiss} "
            f"| {quality_score} "
            f"| {comments} "
            f"| {answer_chroma} "
            f"| {sources_chroma} "
            f"| {quality_score} "
            f"| {comments} |\n"
        )

    display(Markdown(markdown_table))
    print("Qualitative evaluation table generated. Please review and fill in scores/comments manually for your report.")
else:
    print("No evaluation results to display. Run the pipeline in section 5.2 first.")


### Qualitative Evaluation Results

This table summarizes the RAG pipeline's performance on a set of representative questions. The 'Quality Score' (1-5, 5 being excellent) and 'Comments/Analysis' should be filled in manually after reviewing the generated answers and their corresponding sources.

| Question | Generated Answer (FAISS) | Retrieved Sources (FAISS) | Quality Score (1-5) | Comments/Analysis (FAISS) | Generated Answer (ChromaDB) | Retrieved Sources (ChromaDB) | Quality Score (1-5) | Comments/Analysis (ChromaDB) |
|---|---|---|---|---|---|---|---|---|
| What are the most common issues people face with credit card billing? | not providing an adequate credit line | - **Product:** Credit card, **ID:** 8463187<br>  **Snippet:** among the worst credit card companies with the worst support to their customers in resolving cases for service that did not receive issues<br>- **Product:** Credit card, **ID:** 9277339<br>  **Snippet:** implications on my financial affairs i would like to bring to your attention that according to the rules and regulations governing credit card issuers... |  |  | The level of difficulty resolving problems with this credit card is incomparable to any credit card company experience i had and i always pay full every month without miss | - **Product:** Credit card, **ID:** 8463187<br>  **Snippet:** among the worst credit card companies with the worst support to their customers in resolving cases for service that did not receive issues<br>- **Product:** Credit card, **ID:** 9277339<br>  **Snippet:** implications on my financial affairs i would like to bring to your attention that according to the rules and regulations governing credit card issuers... |  |  |
| Can you tell me about complaints related to personal loan interest rates? | What are some complaints related to interest rates? | - **Product:** Credit card, **ID:** 8015887<br>  **Snippet:** interest rate for which i am grateful i am lodging this complaint because i had a vastly different experience with xxxx xxxx xxxx xxxx xxxx xxxx xxxx ...<br>- **Product:** Credit card, **ID:** 8010075<br>  **Snippet:** my security interest this raises serious concerns about the treatment of my personal and financial information i am filing this complaint with the i k... |  |  | What is the name of the complaint that the bank misled customers about interest rates misrepresented time to repay the loan and when interest would start took advantage of an elderly person with poor eye sight and poor hearing charged 269 interest? | - **Product:** Credit card, **ID:** 8015887<br>  **Snippet:** interest rate for which i am grateful i am lodging this complaint because i had a vastly different experience with xxxx xxxx xxxx xxxx xxxx xxxx xxxx ...<br>- **Product:** Credit card, **ID:** 8010075<br>  **Snippet:** my security interest this raises serious concerns about the treatment of my personal and financial information i am filing this complaint with the i k... |  |  |
| What problems do customers report with Buy Now, Pay Later services? | They are being sued for overcharging merchants on consumer purchases | - **Product:** Credit card, **ID:** 8915923<br>  **Snippet:** at their level of money grabbing and placing customers in difficult situations asking them to wait then blaming them for waiting essentially i would l...<br>- **Product:** Credit card, **ID:** 2032020<br>  **Snippet:** about this some of the items are really cool they also had issues with pricing the items alot of my payments i made on due date and they would say it ... |  |  | Customers report late payment remarks on their accounts. | - **Product:** Credit card, **ID:** 9186851<br>  **Snippet:** period i have not been late in 6 years never had an issue until now payments go to a xxxx xxxx xxxx they offer no means to assist just make another pa...<br>- **Product:** Credit card, **ID:** 8887826<br>  **Snippet:** ive always made sure that payments on this account are made promptly never allowing them to become overdue however im uncertain about the reasons behi... |  |  |
| Are there common complaints about accessing funds in savings accounts? | no | - **Product:** Credit card, **ID:** 11974605<br>  **Snippet:** checking account funds leaving me unable to access my own money for context i maintain an excellent financial history with other institutions 1200000 ...<br>- **Product:** Credit card, **ID:** 1575654<br>  **Snippet:** and their behavior suggests that they are holding the funds purposely for their own gain they are collecting interest on my funds |  |  | no | - **Product:** Credit card, **ID:** 11974605<br>  **Snippet:** checking account funds leaving me unable to access my own money for context i maintain an excellent financial history with other institutions 1200000 ...<br>- **Product:** Credit card, **ID:** 1575654<br>  **Snippet:** and their behavior suggests that they are holding the funds purposely for their own gain they are collecting interest on my funds |  |  |
| Describe typical issues with unauthorized money transfers. | No illegal issues or falsified transfers happening. | - **Product:** Credit card, **ID:** 10509550<br>  **Snippet:** there was no illegal issues or falsified transfers happening mind you that the deposit request that they were most concerned about was a 300 deposit r...<br>- **Product:** Credit card, **ID:** 12979328<br>  **Snippet:** unacceptable and has caused me significant frustration and financial inconvenience back in 2022 i maintained a xxxx account and had linked my card to ... |  |  | No illegal issues or falsified transfers happening. | - **Product:** Credit card, **ID:** 10509550<br>  **Snippet:** there was no illegal issues or falsified transfers happening mind you that the deposit request that they were most concerned about was a 300 deposit r...<br>- **Product:** Credit card, **ID:** 12979328<br>  **Snippet:** unacceptable and has caused me significant frustration and financial inconvenience back in 2022 i maintained a xxxx account and had linked my card to ... |  |  |
| What kind of disputes arise from incorrect information on credit reports? | denial of any kind credit and loans report incorrect on my credit report | - **Product:** Credit card, **ID:** 1840495<br>  **Snippet:** they are intentionally reporting inconsistent and inaccurate information to the credit reporting agencies<br>- **Product:** Credit card, **ID:** 10065329<br>  **Snippet:** i have seen inaccurate information on my credit report in |  |  | denial of any kind credit and loans report incorrect on my credit report are operating under the assumption that credit reporting agencies have accurate information and was obtained legally and through good faith which is not true | - **Product:** Credit card, **ID:** 1840495<br>  **Snippet:** they are intentionally reporting inconsistent and inaccurate information to the credit reporting agencies<br>- **Product:** Credit card, **ID:** 10065329<br>  **Snippet:** i have seen inaccurate information on my credit report in |  |  |
| How do customers complain about hidden fees in personal loans? | they report a customer to credit bureau | - **Product:** Credit card, **ID:** 13343696<br>  **Snippet:** fee and interest on the fee amount every month this is a clear case of predatory lending practices designed to trap consumers making purchases in reta...<br>- **Product:** Credit card, **ID:** 9328627<br>  **Snippet:** statements but even more ridiculous to penalize them when they dont receive them my credit score is xxxx and i have literally never missed a payment b... |  |  | they report a customer to credit bureau | - **Product:** Credit card, **ID:** 13343696<br>  **Snippet:** fee and interest on the fee amount every month this is a clear case of predatory lending practices designed to trap consumers making purchases in reta...<br>- **Product:** Credit card, **ID:** 9328627<br>  **Snippet:** statements but even more ridiculous to penalize them when they dont receive them my credit score is xxxx and i have literally never missed a payment b... |  |  |
| What are the security concerns mentioned for money transfer services? | not be able to trust in the security and privacy of that data if capital one can not be trusted to safeguard bank account information of people calling in to make one time payments then they should not be in business | - **Product:** Credit card, **ID:** 8312233<br>  **Snippet:** safeguards and transparency and in general this is very concerning because they have thousands and thousands of customers who store funds with them an...<br>- **Product:** Credit card, **ID:** 12906710<br>  **Snippet:** requires financial institutions to provide consumers with reasonable access to electronic funds transfers and account information i |  |  | if capital one can not be trusted to safeguard bank account information of people calling in to make one time payments then they should not be in business | - **Product:** Credit card, **ID:** 8312233<br>  **Snippet:** safeguards and transparency and in general this is very concerning because they have thousands and thousands of customers who store funds with them an...<br>- **Product:** Credit card, **ID:** 12906710<br>  **Snippet:** requires financial institutions to provide consumers with reasonable access to electronic funds transfers and account information i |  |  |
| Are there complaints about difficulty closing a credit card account? | no | - **Product:** Credit card, **ID:** 8071064<br>  **Snippet:** of our cards previously i had several issues when trying to use the card where the card didnt work or took several times to work and i was never told ...<br>- **Product:** Credit card, **ID:** 1851673<br>  **Snippet:** my credit card accounts was close for no reason i have been a customer for over 20 years and this is absolutely ridiculous |  |  | no | - **Product:** Credit card, **ID:** 8071064<br>  **Snippet:** of our cards previously i had several issues when trying to use the card where the card didnt work or took several times to work and i was never told ...<br>- **Product:** Credit card, **ID:** 1851673<br>  **Snippet:** my credit card accounts was close for no reason i have been a customer for over 20 years and this is absolutely ridiculous |  |  |
| Summarize issues regarding delays in receiving funds from savings accounts. | What are some of the issues that are causing problems in receiving funds from savings accounts? | - **Product:** Credit card, **ID:** 7977516<br>  **Snippet:** issue is highly appreciated and i am confident that with the evidence provided you will be able to facilitate the return of funds to my account withou...<br>- **Product:** Credit card, **ID:** 8391730<br>  **Snippet:** ensuring punctual payments i have never experienced delays on this account im unsure why there are indications of late payment remarks on my accounts |  |  | What are some of the issues that have arisen in the process of receiving funds from savings accounts? | - **Product:** Credit card, **ID:** 7977516<br>  **Snippet:** issue is highly appreciated and i am confident that with the evidence provided you will be able to facilitate the return of funds to my account withou...<br>- **Product:** Credit card, **ID:** 8294839<br>  **Snippet:** ensuring punctual payments i have never experienced delays on this account im unsure why there are indications of late payment remarks on my accounts |  |  |


Qualitative evaluation table generated. Please review and fill in scores/comments manually for your report.
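
Generated answers and complaint snippets are free text, so they may contain characters such as `|` that would break a Markdown table row. A small sketch of cell escaping follows, using hypothetical helpers `md_cell` and `md_row` that are not part of the notebook's code:

```python
def md_cell(value) -> str:
    """Make a value safe to place in a single Markdown table cell."""
    text = str(value)
    text = text.replace("|", "\\|")  # a raw pipe would start a new column
    return text.replace("\r\n", "<br>").replace("\n", "<br>")


def md_row(cells) -> str:
    """Join cell values into one Markdown table row, escaping each cell."""
    return "| " + " | ".join(md_cell(c) for c in cells) + " |"
```

Building each row with `md_row([...])` would also make it easy to pass separate score and comment values for FAISS and ChromaDB instead of reusing one placeholder for both.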


## 6. Report Section Content: RAG Core Logic and Evaluation Analysis

### RAG Core Logic Implementation

The Retrieval-Augmented Generation (RAG) pipeline forms the intelligent core of the complaint analysis chatbot. It combines the power of semantic search with a large language model to provide accurate and contextually relevant answers to user queries.

**Retriever Component:**
The retriever is responsible for fetching the most relevant complaint narratives based on a user's question. It utilizes the same `sentence-transformers/all-MiniLM-L6-v2` embedding model used in the previous task to convert the user's natural language query into a vector representation. This query vector is then used to perform a similarity search against the pre-indexed vector stores (both FAISS and ChromaDB). The retriever is configured to fetch the top-K (defaulting to 5) most similar text chunks. Crucially, it retrieves not only the text content but also associated metadata such as the original complaint ID and product category, which is vital for providing verifiable sources.
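
Conceptually, the similarity search ranks every stored chunk vector by its closeness to the query vector and keeps the best `k`. The sketch below shows that idea in plain Python with toy vectors. In the notebook the search is delegated to FAISS and ChromaDB, and the distance metric configured in Task 2 is not shown here, so cosine similarity is an assumption made for illustration:

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 5
) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k most similar chunks."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The returned indices play the role of row positions in the metadata table, which is how a retriever can attach the complaint ID and product category to each hit.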

**Prompt Engineering:**
A prompt template was designed to guide the Large Language Model (LLM). The template defines the LLM's role as a 'financial analyst assistant for CrediTrust' and instructs it to answer questions *only* from the retrieved complaint excerpts. To discourage hallucination, it also includes the instruction: 'If the context doesn't contain the answer, state that you don't have enough information.' This is intended to keep the chatbot faithful to its knowledge base, although the evaluation results show that a small model does not always follow it.
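
The actual template is defined in `src/rag_pipeline.py` and is not reproduced in this notebook. As an illustration only, a template with the role and fallback instruction described above might look like the following sketch. The exact wording, the layout, and the names `PROMPT_TEMPLATE` and `build_prompt` are assumptions:

```python
# Illustrative wording only; the project's real template may differ.
PROMPT_TEMPLATE = (
    "You are a financial analyst assistant for CrediTrust. "
    "Answer the question using only the complaint excerpts below. "
    "If the context doesn't contain the answer, "
    "state that you don't have enough information.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, context: str) -> str:
    """Fill the template with the retrieved context and the user question."""
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Ending the prompt with `Answer:` is a common way to cue a text-to-text model to produce the answer directly rather than continue the question.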

**Generator Component:**
The generator component takes the user's question, the concatenated text from the retrieved chunks (context), and the engineered prompt as input. This combined input is then fed into a pre-trained Large Language Model. For this implementation, `google/flan-t5-small` was chosen because it is instruction-tuned and small enough to run within Google Colab's resource constraints (the run in this notebook used the CPU). The LLM processes this input and is expected to generate a concise, context-aware answer to the user's question.
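
Small sequence-to-sequence models accept a limited input length, so text beyond the tokenizer's maximum may be truncated before generation. One way to keep the highest-ranked chunks intact is to assemble the context under an explicit budget. The sketch below uses a character budget as a rough, hypothetical stand-in for a token budget; `build_context`, `max_chars`, and the separator are illustrative choices rather than the project's code:

```python
from typing import Dict, List


def build_context(
    chunks: List[Dict], max_chars: int = 1500, sep: str = "\n---\n"
) -> str:
    """Concatenate chunk texts in rank order without exceeding max_chars."""
    parts: List[str] = []
    used = 0
    for chunk in chunks:
        text = chunk["text"].strip()
        # Count the separator only when something already precedes this chunk.
        cost = len(text) + (len(sep) if parts else 0)
        if used + cost > max_chars:
            break  # stop at the first chunk that no longer fits
        parts.append(text)
        used += cost
    return sep.join(parts)
```

Dropping whole low-ranked chunks, rather than cutting the final string mid-sentence, keeps every excerpt the model sees complete.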

### Qualitative Evaluation and Analysis

A qualitative evaluation was performed using a set of 10 representative questions covering various product categories and types of complaints. For each question, the complete RAG pipeline was executed, and the generated answers, along with the top retrieved sources, were recorded. This manual review is critical for understanding the system's real-world performance beyond quantitative metrics.

**Observations (To be filled in after running the notebook):**

* **Retrieval Accuracy:** (Discuss whether the retrieved sources were generally relevant to the question. Did the top 1-2 chunks contain the necessary information? Were there instances of irrelevant retrieval? Compare FAISS vs. ChromaDB if there are noticeable differences.)
    *Example: "For most questions, the retriever successfully identified highly relevant complaint snippets, particularly when the query closely matched keywords or semantic themes within the complaints. FAISS and ChromaDB performed similarly in terms of relevance, indicating both vector stores are well-indexed. However, for more ambiguous queries, sometimes less relevant chunks were retrieved, suggesting potential areas for improvement in chunking or embedding quality."*

* **Generation Quality:** (Assess the LLM's ability to synthesize information from the retrieved context. Was the answer coherent and concise, and did it directly address the question? Did it adhere to the prompt's instructions, especially regarding not hallucinating?)
    *Example: "The `google/flan-t5-small` model generally produced coherent and grammatically correct answers. It was effective at summarizing information from the provided context. Crucially, it largely adhered to the instruction of stating 'I don't have enough information' when the context was insufficient, which is a positive sign for preventing hallucinations. However, for very complex questions or when multiple nuances were present across retrieved chunks, the summarization could sometimes be too brief or miss subtle details."*

* **Impact of `k` (Top-K Retrieval):** (Discuss if `k=5` seemed appropriate. Would more or fewer chunks have been better?)
    *Example: "The initial `k=5` for retrieval seemed to strike a reasonable balance. In most cases, the core information was present within these top 5 chunks. Increasing `k` might add more context but also risks introducing noise, while decreasing it might miss crucial details. Further fine-tuning of `k` could be explored."*

* **Comparison of FAISS vs. ChromaDB (if any noticeable differences in qualitative results):** (Did one perform better than the other in terms of retrieved relevance or ease of use in this RAG setup?)
    *Example: "Qualitatively, both FAISS and ChromaDB yielded very similar retrieval results in terms of the relevance of the top-k chunks. This suggests that the embedding quality is the primary driver of retrieval accuracy at this stage. Both were straightforward to integrate into the RAG pipeline."*

* **Limitations and Areas for Improvement:** (Identify any recurring issues like answers being too generic, missing specific details, or difficulty with certain types of questions. How could these be addressed?)
    *Example: "A limitation observed was the model's tendency to provide somewhat generic answers when the retrieved context itself was broad. Future improvements could include: 1) Experimenting with larger, more capable LLMs (if resources permit) for more nuanced generation. 2) Implementing a re-ranking step after initial retrieval to ensure the absolute most relevant chunks are passed to the LLM. 3) Exploring more advanced prompt engineering techniques or few-shot examples to guide the LLM's response style. 4) Investigating alternative chunking strategies or embedding models if retrieval accuracy remains a concern for specific query types."*
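
Two patterns are visible in the results table: every displayed source is labeled "Credit card", even for questions about personal loans, Buy Now, Pay Later, or savings accounts, and several answers simply restate the question. Simple checks like the ones below could flag both cases before manual scoring. This is a sketch with hypothetical helpers; the expected product label for each question is an assumption the reviewer would supply, since the metadata's category names other than "Credit card" are not shown here:

```python
from typing import Dict, List


def product_match_rate(sources: List[Dict], expected_product: str) -> float:
    """Fraction of retrieved sources whose 'product' equals the expected one."""
    if not sources:
        return 0.0
    expected = expected_product.strip().lower()
    hits = sum(1 for s in sources if s["product"].strip().lower() == expected)
    return hits / len(sources)


def looks_like_echo(answer: str) -> bool:
    """Flag answers that are themselves questions, a sign the query was restated."""
    return answer.strip().endswith("?")
```

A low match rate points at retrieval (or at which products were indexed in Task 2), while an echoed question points at generation, so the two flags help separate where a low quality score comes from.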

The qualitative evaluation provides valuable insights into the current strengths and weaknesses of the RAG pipeline, guiding future iterations and optimizations.

### Qualitative Evaluation Results

This table summarizes the RAG pipeline's performance on a set of representative questions. The 'Quality Score' (1-5, 5 being excellent) and 'Comments/Analysis' have been filled in after reviewing the generated answers and their corresponding sources.

| **Question** | **Generated Answer (FAISS)** | **Retrieved Sources (FAISS)** | **Quality Score (1-5)** | **Comments/Analysis (FAISS)** | **Generated Answer (ChromaDB)** | **Retrieved Sources (ChromaDB)** | **Quality Score (1-5)** | **Comments/Analysis (ChromaDB)** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| What are the most common issues people face with credit card billing? | not providing an adequate credit line | - **Product:** Credit card, **ID:** 8463187<br>  **Snippet:** among the worst credit card companies with the worst support to their customers in resolving cases for service that did not receive issues<br>- **Product:** Credit card, **ID:** 9277339<br>  **Snippet:** implications on my financial affairs i would like to bring to your attention that according to the rules and regulations governing credit card issuers... | 1 | The retrieved sources are too generic and don't specifically address "billing issues" or "credit line." The answer is a plausible complaint but not directly supported by the provided snippets, indicating potential hallucination or very loose interpretation. | The level of difficulty resolving problems with this credit card is incomparable to any credit card company experience i had and i always pay full every month without miss | - **Product:** Credit card, **ID:** 8463187<br>  **Snippet:** among the worst credit card companies with the worst support to their customers in resolving cases for service that did not receive issues<br>- **Product:** Credit card, **ID:** 9277339<br>  **Snippet:** implications on my financial affairs i would like to bring to your attention that according to the rules and regulations governing credit card issuers... | 1 | Similar to FAISS, sources are not specific to billing. The answer is a direct quote from an irrelevant snippet, failing to address the question effectively. |
| Can you tell me about complaints related to personal loan interest rates? | What are some complaints related to interest rates? | - **Product:** Credit card, **ID:** 8015887<br>  **Snippet:** interest rate for which i am grateful i am lodging this complaint because i had a vastly different experience with xxxx xxxx xxxx xxxx xxxx xxxx xxxx ...<br>- **Product:** Credit card, **ID:** 8010075<br>  **Snippet:** my security interest this raises serious concerns about the treatment of my personal and financial information i am filing this complaint with the i k... | 1 | Retrieval failed to get "Personal loan" specific complaints; sources are for "Credit card." The answer is a rephrasing of the question, not an actual answer, indicating the LLM couldn't synthesize from the irrelevant context. | What is the name of the complaint that the bank misled customers about interest rates misrepresented time to repay the loan and when interest would start took advantage of an elderly person with poor eye sight and poor hearing charged 269 interest? | - **Product:** Credit card, **ID:** 8015887<br>  **Snippet:** interest rate for which i am grateful i am lodging this complaint because i had a vastly different experience with xxxx xxxx xxxx xxxx xxxx xxxx xxxx ...<br>- **Product:** Credit card, **ID:** 8010075<br>  **Snippet:** my security interest this raises serious concerns about the treatment of my personal and financial information i am filing this complaint with the i k... | 1 | Same retrieval issue as FAISS (credit card for personal loan). The answer is an expanded rephrasing of the question, not a direct answer. |
| What problems do customers report with Buy Now, Pay Later services? | They are being sued for overcharging merchants on consumer purchases | - **Product:** Credit card, **ID:** 8915923<br>  **Snippet:** at their level of money grabbing and placing customers in difficult situations asking them to wait then blaming them for waiting essentially i would l...<br>- **Product:** Credit card, **ID:** 2032020<br>  **Snippet:** about this some of the items are really cool they also had issues with pricing the items alot of my payments i made on due date and they would say it ... | 2 | Retrieved sources are for "Credit card" product, not BNPL. The answer states a specific legal claim ("being sued") which is not directly present in the provided snippets, suggesting hallucination. Snippets are vaguely related to "pricing issues." | Customers report late payment remarks on their accounts. | - **Product:** Credit card, **ID:** 9186851<br>  **Snippet:** period i have not been late in 6 years never had an issue until now payments go to a xxxx xxxx xxxx they offer no means to assist just make another pa...<br>- **Product:** Credit card, **ID:** 8887826<br>  **Snippet:** ive always made sure that payments on this account are made promptly never allowing them to become overdue however im uncertain about the reasons behi... | 2 | Similar product mismatch in retrieval (Credit card for BNPL). The snippets discuss prompt payments, which contradicts the answer about "late payment remarks." The answer is plausible but not grounded in the provided context. |
| Are there common complaints about accessing funds in savings accounts? | no | - **Product:** Credit card, **ID:** 11974605<br>  **Snippet:** checking account funds leaving me unable to access my own money for context i maintain an excellent financial history with other institutions 1200000 ...<br>- **Product:** Credit card, **ID:** 1575654<br>  **Snippet:** and their behavior suggests that they are holding the funds purposely for their own gain they are collecting interest on my funds | 1 | The answer "no" directly contradicts the retrieved snippets, which clearly describe issues with accessing funds. The sources are also labeled as the "Credit card" product rather than "Savings account." | no | - **Product:** Credit card, **ID:** 11974605<br>  **Snippet:** checking account funds leaving me unable to access my own money for context i maintain an excellent financial history with other institutions 1200000 ...<br>- **Product:** Credit card, **ID:** 1575654<br>  **Snippet:** and their behavior suggests that they are holding the funds purposely for their own gain they are collecting interest on my funds | 1 | Same failure as FAISS. The LLM gave a completely incorrect answer despite contradictory evidence in the retrieved, albeit miscategorized, snippets. |
| Describe typical issues with unauthorized money transfers. | No illegal issues or falsified transfers happening. | - **Product:** Credit card, **ID:** 10509550<br>  **Snippet:** there was no illegal issues or falsified transfers happening mind you that the deposit request that they were most concerned about was a 300 deposit r...<br>- **Product:** Credit card, **ID:** 12979328<br>  **Snippet:** unacceptable and has caused me significant frustration and financial inconvenience back in 2022 i maintained a xxxx account and had linked my card to ... | 1 | The answer is a direct quote from a snippet that *denies* the existence of issues, rather than describing them. Retrieval was poor: it surfaced irrelevant information (Credit card product) that ran counter to the intent of the question. | No illegal issues or falsified transfers happening. | - **Product:** Credit card, **ID:** 10509550<br>  **Snippet:** there was no illegal issues or falsified transfers happening mind you that the deposit request that they were most concerned about was a 300 deposit r...<br>- **Product:** Credit card, **ID:** 12979328<br>  **Snippet:** unacceptable and has caused me significant frustration and financial inconvenience back in 2022 i maintained a xxxx account and had linked my card to ... | 1 | Same failure as FAISS. The LLM quoted a snippet that negates the problem, failing to describe typical issues. |
| What kind of disputes arise from incorrect information on credit reports? | denial of any kind credit and loans report incorrect on my credit report | - **Product:** Credit card, **ID:** 1840495<br>  **Snippet:** they are intentionally reporting inconsistent and inaccurate information to the credit reporting agencies<br>- **Product:** Credit card, **ID:** 10065329<br>  **Snippet:** i have seen inaccurate information on my credit report in | 4 | The answer is relevant and directly supported by the retrieved snippets, which explicitly mention "inaccurate information" and "inconsistent reporting." While the product is "Credit card," the snippet content is highly relevant to the question. | denial of any kind credit and loans report incorrect on my credit report are operating under the assumption that credit reporting agencies have accurate information and was obtained legally and through good faith which is not true | - **Product:** Credit card, **ID:** 1840495<br>  **Snippet:** they are intentionally reporting inconsistent and inaccurate information to the credit reporting agencies<br>- **Product:** Credit card, **ID:** 10065329<br>  **Snippet:** i have seen inaccurate information on my credit report in | 4 | Similar to FAISS, the answer is relevant and grounded in the snippets. ChromaDB's answer is slightly more verbose, incorporating more detail from the source, which is good for completeness. |
| How do customers complain about hidden fees in personal loans? | they report a customer to credit bureau | - **Product:** Credit card, **ID:** 13343696<br>  **Snippet:** fee and interest on the fee amount every month this is a clear case of predatory lending practices designed to trap consumers making purchases in reta...<br>- **Product:** Credit card, **ID:** 9328627<br>  **Snippet:** statements but even more ridiculous to penalize them when they dont receive them my credit score is xxxx and i have literally never missed a payment b... | 2 | Retrieved sources are for "Credit card" product, not "Personal loan." While snippets mention "fee," the answer about "reporting to credit bureau" is a consequence, not a direct description of how customers complain about *hidden fees*. | they report a customer to credit bureau | - **Product:** Credit card, **ID:** 13343696<br>  **Snippet:** fee and interest on the fee amount every month this is a clear case of predatory lending practices designed to trap consumers making purchases in reta...<br>- **Product:** Credit card, **ID:** 9328627<br>  **Snippet:** statements but even more ridiculous to penalize them when they dont receive them my credit score is xxxx and i have literally never missed a payment b... | 2 | Identical performance to FAISS. The answer is a consequence, not a direct complaint about hidden fees, and the sources are product-mismatched. |
| What are the security concerns mentioned for money transfer services? | not be able to trust in the security and privacy of that data if capital one can not be trusted to safeguard bank account information of people calling in to make one time payments then they should not be in business | - **Product:** Credit card, **ID:** 8312233<br>  **Snippet:** safeguards and transparency and in general this is very concerning because they have thousands and thousands of customers who store funds with them an...<br>- **Product:** Credit card, **ID:** 12906710<br>  **Snippet:** requires financial institutions to provide consumers with reasonable access to electronic funds transfers and account information i | 4 | The answer is highly relevant and directly quotes a snippet that discusses security concerns, even though the source product is "Credit card" instead of "Money transfer." The content of the snippet is broadly applicable. | if capital one can not be trusted to safeguard bank account information of people calling in to make one time payments then they should not be in business | - **Product:** Credit card, **ID:** 8312233<br>  **Snippet:** safeguards and transparency and in general this is very concerning because they have thousands and thousands of customers who store funds with them an...<br>- **Product:** Credit card, **ID:** 12906710<br>  **Snippet:** requires financial institutions to provide consumers with reasonable access to electronic funds transfers and account information i | 4 | Similarly strong performance to FAISS, with a shorter answer drawn from the same passage. The answer is well-grounded and relevant to security concerns, despite the product mismatch in the source metadata. |
| Are there complaints about difficulty closing a credit card account? | no | - **Product:** Credit card, **ID:** 8071064<br>  **Snippet:** of our cards previously i had several issues when trying to use the card where the card didnt work or took several times to work and i was never told ...<br>- **Product:** Credit card, **ID:** 1851673<br>  **Snippet:** my credit card accounts was close for no reason i have been a customer for over 20 years and this is absolutely ridiculous | 1 | The answer "no" contradicts one of the retrieved snippets ("my credit card accounts was close for no reason"). The LLM seems to have missed or misinterpreted the relevant information. | no | - **Product:** Credit card, **ID:** 8071064<br>  **Snippet:** of our cards previously i had several issues when trying to use the card where the card didnt work or took several times to work and i was never told ...<br>- **Product:** Credit card, **ID:** 1851673<br>  **Snippet:** my credit card accounts was close for no reason i have been a customer for over 20 years and this is absolutely ridiculous | 1 | Same failure as FAISS. The LLM provided an incorrect answer despite a somewhat relevant snippet. |
| Summarize issues regarding delays in receiving funds from savings accounts. | What are some of the issues that are causing problems in receiving funds from savings accounts? | - **Product:** Credit card, **ID:** 7977516<br>  **Snippet:** issue is highly appreciated and i am confident that with the evidence provided you will be able to facilitate the return of funds to my account withou...<br>- **Product:** Credit card, **ID:** 8391730<br>  **Snippet:** ensuring punctual payments i have never experienced delays on this account im unsure why there are indications of late payment remarks on my accounts | 1 | The answer is a rephrasing of the question, indicating the LLM could not synthesize a direct answer. The retrieved snippets are not strongly relevant to "delays in receiving funds from savings accounts" and come from the "Credit card" product. | What are some of the issues that have arisen in the process of receiving funds from savings accounts? | - **Product:** Credit card, **ID:** 7977516<br>  **Snippet:** issue is highly appreciated and i am confident that with the evidence provided you will be able to facilitate the return of funds to my account withou...<br>- **Product:** Credit card, **ID:** 8294839<br>  **Snippet:** ensuring punctual payments i have never experienced delays on this account im unsure why there are indications of late payment remarks on my accounts | 1 | Same failure mode as FAISS. The LLM rephrased the question, and the retrieved context was not sufficiently relevant or product-specific. |
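
To make the overall picture easier to read, the quality scores can be aggregated per vector store. The snippet below is a minimal sketch rather than part of the pipeline itself: the `scores` lists are transcribed by hand from the Quality Score columns of the table in row order, and the `summarize` helper and its `good_threshold` of 4 are illustrative assumptions.

```python
from statistics import mean

# Quality scores transcribed by hand from the evaluation table, in row order.
scores = {
    "FAISS":    [1, 1, 2, 1, 1, 4, 2, 4, 1, 1],
    "ChromaDB": [1, 1, 2, 1, 1, 4, 2, 4, 1, 1],
}


def summarize(values, good_threshold=4):
    """Return the mean score and how many answers were rated at or above the threshold.

    `good_threshold` is an illustrative cut-off, not something defined by the notebook.
    """
    return {
        "mean": round(mean(values), 2),
        "n_good": sum(v >= good_threshold for v in values),
        "n_total": len(values),
    }


summary = {store: summarize(vals) for store, vals in scores.items()}

for store, stats in summary.items():
    print(f"{store}: mean={stats['mean']}, rated >= 4: {stats['n_good']}/{stats['n_total']}")
```

With the scores above, both stores average 1.8 out of 5, and only 2 of the 10 questions are rated 4 or higher for either store. Every retrieved source in the table is also tagged "Credit card", regardless of the product named in the question.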