<center><p float="center">
  <img src="https://mma.prnewswire.com/media/1458111/Great_Learning_Logo.jpg?p=facebook" width="200" height="100"/>
</p></center>

<h1><center><font size=10> Generative AI for NLP Program</center></font></h1>
<h1><center> Project </center></h1>

# **GA-NLP Mid-Term Project: Financial Product Complaint Classification and Summarization**

## **Business Context**

### **Description**
*In the modern financial industry, customer complaints play a crucial role in identifying areas where financial institutions can improve their services. Effectively categorizing these complaints into specific product categories, such as credit reports, student loans, or money transfers, is essential for addressing customer concerns promptly by routing the tickets to relevant personnel. Leveraging Generative AI for text classification can help financial institutions better understand customer grievances and respond more efficiently. Apart from this, a summary of the customer complaint helps the support personnel quickly grasp the gist of the grievance*

### **Objective**
*The primary goal of this project is to utilize Generative AI techniques to improve the classification and summarization of customer complaints in the financial sector.
Specifically, the project will focus on:*

1. **Text-to-Label Classification:** *Implementing Zero-shot and Few-shot prompting methods to accurately classify customer complaints into relevant product categories.*
2. **Text-to-Text Summarization:** *Using Zero-shot prompting to generate concise summaries of customer complaints, enabling more personalized and effective responses.*

### **Conclusion**
*Upon completing this project, you will have the capability to develop solutions for LLM-based text classification and summarization. These tools will enable financial institutions to automate the complaint handling process, leading to faster, more accurate responses to customer issues, improved customer satisfaction, and enhanced compliance with industry regulations. This project will also provide you with valuable skills and experience that can be applied to a range of real-world business challenges.*


## changing home directory

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import os
os.chdir('/content/drive/MyDrive/GL-GenAI/Projects')
os.listdir()

['Complains_classification.csv',
 'Learners_Notebook_Mid_Term_Project.ipynb',
 'new_data_cleaned.csv',
 'requirement.txt',
 'SM-V0-Learners_Notebook_Mid_Term_Project.ipynb']

## verifying GPU

In [3]:
!nvidia-smi

Tue Jan  7 05:14:52 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

# **Section 1 : Setting Up for Prompt Engineering with Mistral Model**

### **Install & Importing neccessary libraries**

- **ninja-build**: A fast build system widely used for compiling code in projects such as TensorFlow or PyTorch.
- **cmake**: A cross-platform build system generator used to manage the build process of software.
- **ipywidgets**: A package for creating interactive widgets in Jupyter Notebooks (and Google Colab) -- disabling because not needed for what we have here

In [67]:
%%capture
!apt-get update
!apt-get install -y ninja-build cmake
!pip install ipywidgets --upgrade

In [5]:
import torch

In [6]:
# This part of code will skip all the un-necessary warnings which can occur during the execution of this project.
import warnings
warnings.filterwarnings("ignore", category=Warning)

In [7]:
# Installation for GPU llama-cpp-python==0.2.69
!CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python==0.2.69
# For downloading the models from HF Hub
!pip install huggingface_hub

Collecting llama-cpp-python==0.2.69
  Downloading llama_cpp_python-0.2.69.tar.gz (42.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.5/42.5 MB[0m [31m48.0 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Installing backend dependencies ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting diskcache>=5.6.1 (from llama-cpp-python==0.2.69)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.5/45.5 kB[0m [31m5.0 MB/s[0m eta [36m0:00:00[0m
[?25hBuilding wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... [?25l[?25hdone
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.69-cp310-cp310-linux_x86_64.whl size=55279198 sha256=9

In [9]:
%%capture
!pip install evaluate
!pip install bert-score

In [10]:
!pip freeze > requirement.txt

In [11]:
# Basic Imports for Libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

import pandas as pd
import numpy as np
from tqdm import tqdm
import json
import re

import torch
import evaluate

# from google.colab import drive
import locale

### **Question 1: Importing Libaries and Mistral Model (3 Marks)**

- For the Mistral Model name or path and model basename, refer to the **Week 3 Additional Content: Prompt Engineering Fundamentals**
- Code Notebook: Self-Consistency and Tree-of-Thought Prompting with Llama 2 and Mistral.





https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf

#### **why use Llama wrapper for Mistral model:**

The Llama wrapper from the llama_cpp library is being used for the Mistral model because the Mistral architecture is compatible with the Llama 2 format. Specifically:

- **Why Use Llama for Mistral?**
	1.	**Mistral is Llama-Compatible:**
	•	The Mistral model architecture was designed to be compatible with the Llama framework in terms of tokenization, model weights, and processing. This compatibility allows Mistral models to be loaded and run using libraries built for Llama, such as llama_cpp.
	2.	**Shared Ecosystem:**
	•	Mistral builds on advances introduced in the Llama ecosystem, including attention mechanisms and optimization techniques. As a result, Mistral models can take advantage of the tools and utilities developed for Llama-based models.
	3.	**Convenience:**
	•	By using llama_cpp, you can leverage its high-performance inference capabilities, such as quantization and support for GPUs, without needing a separate library for Mistral.
	4.	**GGUF Format:**
	•	The TheBloke/Mistral-7B-Instruct-v0.2-GGUF model is distributed in GGUF format, which is a format designed for compatibility with the Llama inference framework (llama_cpp). This ensures seamless integration with the Llama wrapper.
	5.	**Flexibility and Optimizations:**
	•	The llama_cpp library provides numerous options for optimizing inference (e.g., n_gpu_layers, n_batch, and quantized formats like Q5_K_M), which are useful for running Mistral efficiently.

In [12]:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

In [13]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q5_K_M.gguf"

In [14]:
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
)

mistral-7b-instruct-v0.2.Q5_K_M.gguf:   0%|          | 0.00/5.13G [00:00<?, ?B/s]

In [15]:
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2, # CPU cores
    n_batch=512, # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    n_gpu_layers=43, # Change this value based on your model and your GPU VRAM pool.
    n_ctx=4096 # Context window
)

llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /root/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loa

- CUDA0 KV buffer size = 512.00 MiB
- CUDA0 compute buffer size = 296.00 MiB
- computation graph is set up and optimized for GPU execution
  - graph nodes = 1030
  - graph splits = 2
- CUDA_Host output buffer size = 0.12 MiB  

# **Section 2: Text to Label generation**

### **Question 2: Zero-Shot Prompting for Text Classification (5 Marks)**

##### **Q2.1: Define the Prompt Template, System Message, generate_prompt** **(2 Marks)**

- Define a **system message** as a string and assign it to the variable system_message to generate product class.
- Create a **zero shot prompt template** that incorporates the system message and user input.
- Define **generate_prompt** function that takes both the system_message and user_input as arguments and formats them into a prompt template


Write a Python function called **generate_mistral_response** that takes a single parameter, narrative, which represents the user's complain. Inside the function, you should perform the following tasks:


- **Combine the system_message and narrative to create a prompt string using generate_prompt function.**

*Generate a response from the Mistral model using the lcpp_llm instance with the following parameters:*

- prompt should be the combined prompt string.
- max_tokens should be set to 1200.
- temperature should be set to 0.
- top_p should be set to 0.95.
- repeat_penalty should be set to 1.2.
- top_k should be set to 50.
- stop should be set as a list containing '/s'.
- echo should be set to False.
Extract and return the response text from the generated response.

Don't forget to provide a value for the system_message variable before using it in the function.

In [16]:
system_message = """You are a product MultiLabel categorizer. Your task is to analyze product descriptions
and categorize them into the most relavant class (MultiLabel Categorization).
choose from the following categories: credit_card, retail_banking, credit_reporting, mortgages_and_loans, debt_collection
Do not explain your answer or give justifications.

give your answer in the following format:
category: answer

"""


In [17]:
zero_shot_prompt_template = """<s>[INST] {system_message}

User Input: {user_message}

Please classify this product. [/INST]"""

In [18]:
def generate_prompt(system_message, user_input):
    prompt = zero_shot_prompt_template.format(
        system_message=system_message,
        user_message=user_input
    )
    return prompt

In [19]:
def generate_mistral_response(input_text):

    # Combine user_prompt and system_message to create the prompt
    prompt = generate_prompt(system_message,input_text)

    # Define the Llama model along with its parameters for generating a response
    response = lcpp_llm(
        prompt=prompt,
        max_tokens=1200,
        temperature=0.0,
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['/s'],
        echo=False
    )

    # Extract and return the response text
    response_text = response["choices"][0]["text"]
    print(response_text)
    return response_text

In [20]:
# Load a CSV File containing Dataset of 500 products, narrative and summary (summary of narrative)
data=pd.read_csv('Complains_classification.csv')

In [21]:
# Randomly select 30 rows
new_data = data.sample(n=30, random_state=40)

#### **take a quick look at new_data format**

In [22]:
new_data.head(3)

Unnamed: 0,product,narrative,summary
167,retail_banking,fraudulent charge totaling made capital one ch...,A fraudulent charge was made on the individual...
169,credit_reporting,block except otherwise provided section consum...,The text outlines various stipulations regardi...
461,credit_card,usaa master plan collect cancellation debt usa...,The input appears to be a complaint about USAA...


In [23]:
new_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 30 entries, 167 to 463
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   product    30 non-null     object
 1   narrative  30 non-null     object
 2   summary    30 non-null     object
dtypes: object(3)
memory usage: 960.0+ bytes


In [24]:
# show counts of product column
new_data['product'].value_counts()

Unnamed: 0_level_0,count
product,Unnamed: 1_level_1
credit_reporting,22
credit_card,4
mortgages_and_loans,3
retail_banking,1


In [25]:
data['product'].value_counts()

Unnamed: 0_level_0,count
product,Unnamed: 1_level_1
credit_reporting,388
mortgages_and_loans,36
debt_collection,29
credit_card,28
retail_banking,19


In [26]:
new_data.columns

Index(['product', 'narrative', 'summary'], dtype='object')

##### **Q2.2: Create a new column in the DataFrame called 'mistral_response' and populate it with responses generated by applying the 'generate_mistral_response' function to each 'narrative' in the DataFrame and prepare the mistral_response_cleaned column using extract_category function** **(1 Marks)**

- **progress_apply Explanation**:
	- progress_apply is a method provided by the tqdm library, specifically when used with a Pandas DataFrame or Series.
	- It extends the functionality of apply, which is used to apply a function to each element of a Pandas Series or DataFrame column.
	- progress_apply adds a progress bar to give visual feedback on the progress of the operation, which is especially useful for long-running tasks.

In [27]:
# # example - new_data['mistral_response'] = new_data['narrative'].apply(lambda x:______ )
# new_data['mistral_response'] = new_data['narrative'].apply(lambda x: "_____")

In [28]:
# Add progress bar and error handling for model responses
tqdm.pandas(desc="Generating classifications")
new_data['mistral_response'] = new_data['narrative'].progress_apply(
    lambda x: generate_mistral_response(x) if isinstance(x, str) else "Invalid input"
)


Generating classifications:   0%|          | 0/30 [00:00<?, ?it/s]
llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       8.81 ms /    15 runs   (    0.59 ms per token,  1703.38 tokens per second)
llama_print_timings: prompt eval time =     541.02 ms /   330 tokens (    1.64 ms per token,   609.96 tokens per second)
llama_print_timings:        eval time =     364.19 ms /    14 runs   (   26.01 ms per token,    38.44 tokens per second)
llama_print_timings:       total time =     951.39 ms /   344 tokens
Generating classifications:   7%|▋         | 2/30 [00:00<00:13,  2.08it/s]Llama.generate: prefix-match hit


 category: credit_card, credit\_reporting, debt\_collection



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       4.13 ms /     7 runs   (    0.59 ms per token,  1696.56 tokens per second)
llama_print_timings: prompt eval time =     475.46 ms /   512 tokens (    0.93 ms per token,  1076.86 tokens per second)
llama_print_timings:        eval time =     164.23 ms /     6 runs   (   27.37 ms per token,    36.53 tokens per second)
llama_print_timings:       total time =     663.12 ms /   518 tokens
Generating classifications:  10%|█         | 3/30 [00:01<00:15,  1.78it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.46 ms /     6 runs   (    0.58 ms per token,  1736.61 tokens per second)
llama_print_timings: prompt eval time =     301.83 ms /   189 tokens (    1.60 ms per token,   626.17 tokens per second)
llama_print_timings:        eval time =     133.92 ms /     5 runs   (   26.78 ms per token,    37.34 tokens per second)
llama_print_timings:       total time =     454.49 ms /   194 tokens
Generating classifications:  13%|█▎        | 4/30 [00:02<00:13,  1.90it/s]Llama.generate: prefix-match hit


 category: debt_collection



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.81 ms /     7 runs   (    0.54 ms per token,  1836.31 tokens per second)
llama_print_timings: prompt eval time =     482.92 ms /   512 tokens (    0.94 ms per token,  1060.23 tokens per second)
llama_print_timings:        eval time =     171.65 ms /     6 runs   (   28.61 ms per token,    34.96 tokens per second)
llama_print_timings:       total time =     675.29 ms /   518 tokens
Generating classifications:  17%|█▋        | 5/30 [00:02<00:14,  1.72it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.48 ms /     7 runs   (    0.50 ms per token,  2013.81 tokens per second)
llama_print_timings: prompt eval time =     192.86 ms /    81 tokens (    2.38 ms per token,   419.99 tokens per second)
llama_print_timings:        eval time =     158.75 ms /     6 runs   (   26.46 ms per token,    37.80 tokens per second)
llama_print_timings:       total time =     372.46 ms /    87 tokens
Generating classifications:  20%|██        | 6/30 [00:03<00:12,  1.93it/s]Llama.generate: prefix-match hit


 category: retail_banking



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.83 ms /     7 runs   (    0.55 ms per token,  1826.72 tokens per second)
llama_print_timings: prompt eval time =     449.02 ms /   427 tokens (    1.05 ms per token,   950.95 tokens per second)
llama_print_timings:        eval time =     161.13 ms /     6 runs   (   26.86 ms per token,    37.24 tokens per second)
llama_print_timings:       total time =     632.91 ms /   433 tokens
Generating classifications:  23%|██▎       | 7/30 [00:03<00:12,  1.79it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       6.22 ms /    11 runs   (    0.57 ms per token,  1769.63 tokens per second)
llama_print_timings: prompt eval time =     195.80 ms /    93 tokens (    2.11 ms per token,   474.97 tokens per second)
llama_print_timings:        eval time =     268.15 ms /    10 runs   (   26.81 ms per token,    37.29 tokens per second)
llama_print_timings:       total time =     496.52 ms /   103 tokens
Generating classifications:  27%|██▋       | 8/30 [00:04<00:11,  1.84it/s]Llama.generate: prefix-match hit


 category: mortgages_and_loans



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.93 ms /     7 runs   (    0.56 ms per token,  1782.08 tokens per second)
llama_print_timings: prompt eval time =     450.64 ms /   427 tokens (    1.06 ms per token,   947.54 tokens per second)
llama_print_timings:        eval time =     161.96 ms /     6 runs   (   26.99 ms per token,    37.05 tokens per second)
llama_print_timings:       total time =     633.39 ms /   433 tokens
Generating classifications:  30%|███       | 9/30 [00:04<00:12,  1.74it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.91 ms /     7 runs   (    0.56 ms per token,  1792.57 tokens per second)
llama_print_timings: prompt eval time =     452.51 ms /   439 tokens (    1.03 ms per token,   970.13 tokens per second)
llama_print_timings:        eval time =     165.44 ms /     6 runs   (   27.57 ms per token,    36.27 tokens per second)
llama_print_timings:       total time =     640.81 ms /   445 tokens
Generating classifications:  33%|███▎      | 10/30 [00:05<00:11,  1.67it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.68 ms /     7 runs   (    0.53 ms per token,  1902.69 tokens per second)
llama_print_timings: prompt eval time =     450.92 ms /   427 tokens (    1.06 ms per token,   946.96 tokens per second)
llama_print_timings:        eval time =     162.73 ms /     6 runs   (   27.12 ms per token,    36.87 tokens per second)
llama_print_timings:       total time =     634.65 ms /   433 tokens
Generating classifications:  37%|███▋      | 11/30 [00:06<00:11,  1.63it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       4.15 ms /     7 runs   (    0.59 ms per token,  1685.12 tokens per second)
llama_print_timings: prompt eval time =     251.77 ms /   154 tokens (    1.63 ms per token,   611.68 tokens per second)
llama_print_timings:        eval time =     164.05 ms /     6 runs   (   27.34 ms per token,    36.57 tokens per second)
llama_print_timings:       total time =     438.45 ms /   160 tokens
Generating classifications:  40%|████      | 12/30 [00:06<00:10,  1.77it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.83 ms /     7 runs   (    0.55 ms per token,  1828.63 tokens per second)
llama_print_timings: prompt eval time =     557.69 ms /   518 tokens (    1.08 ms per token,   928.83 tokens per second)
llama_print_timings:        eval time =     169.15 ms /     6 runs   (   28.19 ms per token,    35.47 tokens per second)
llama_print_timings:       total time =     750.75 ms /   524 tokens
Generating classifications:  43%|████▎     | 13/30 [00:07<00:10,  1.60it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.82 ms /     7 runs   (    0.55 ms per token,  1832.94 tokens per second)
llama_print_timings: prompt eval time =     448.93 ms /   427 tokens (    1.05 ms per token,   951.15 tokens per second)
llama_print_timings:        eval time =     165.91 ms /     6 runs   (   27.65 ms per token,    36.16 tokens per second)
llama_print_timings:       total time =     635.87 ms /   433 tokens
Generating classifications:  47%|████▋     | 14/30 [00:08<00:10,  1.58it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       6.08 ms /    11 runs   (    0.55 ms per token,  1810.40 tokens per second)
llama_print_timings: prompt eval time =     278.90 ms /   239 tokens (    1.17 ms per token,   856.93 tokens per second)
llama_print_timings:        eval time =     274.93 ms /    10 runs   (   27.49 ms per token,    36.37 tokens per second)
llama_print_timings:       total time =     587.56 ms /   249 tokens
Generating classifications:  50%|█████     | 15/30 [00:08<00:09,  1.61it/s]Llama.generate: prefix-match hit


 category: mortgages_and_loans



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.87 ms /     7 runs   (    0.55 ms per token,  1808.79 tokens per second)
llama_print_timings: prompt eval time =     543.79 ms /   518 tokens (    1.05 ms per token,   952.57 tokens per second)
llama_print_timings:        eval time =     163.31 ms /     6 runs   (   27.22 ms per token,    36.74 tokens per second)
llama_print_timings:       total time =     729.41 ms /   524 tokens
Generating classifications:  53%|█████▎    | 16/30 [00:09<00:09,  1.52it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.79 ms /     7 runs   (    0.54 ms per token,  1847.45 tokens per second)
llama_print_timings: prompt eval time =     455.23 ms /   445 tokens (    1.02 ms per token,   977.53 tokens per second)
llama_print_timings:        eval time =     162.92 ms /     6 runs   (   27.15 ms per token,    36.83 tokens per second)
llama_print_timings:       total time =     638.84 ms /   451 tokens
Generating classifications:  57%|█████▋    | 17/30 [00:10<00:08,  1.52it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.76 ms /     7 runs   (    0.54 ms per token,  1860.71 tokens per second)
llama_print_timings: prompt eval time =     452.11 ms /   427 tokens (    1.06 ms per token,   944.46 tokens per second)
llama_print_timings:        eval time =     163.41 ms /     6 runs   (   27.23 ms per token,    36.72 tokens per second)
llama_print_timings:       total time =     635.83 ms /   433 tokens
Generating classifications:  60%|██████    | 18/30 [00:10<00:07,  1.53it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.50 ms /     6 runs   (    0.58 ms per token,  1712.82 tokens per second)
llama_print_timings: prompt eval time =     275.97 ms /   200 tokens (    1.38 ms per token,   724.71 tokens per second)
llama_print_timings:        eval time =     137.47 ms /     5 runs   (   27.49 ms per token,    36.37 tokens per second)
llama_print_timings:       total time =     432.94 ms /   205 tokens
Generating classifications:  63%|██████▎   | 19/30 [00:11<00:06,  1.69it/s]Llama.generate: prefix-match hit


 category: credit_card



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       4.16 ms /     7 runs   (    0.59 ms per token,  1683.91 tokens per second)
llama_print_timings: prompt eval time =     242.64 ms /   146 tokens (    1.66 ms per token,   601.71 tokens per second)
llama_print_timings:        eval time =     161.00 ms /     6 runs   (   26.83 ms per token,    37.27 tokens per second)
llama_print_timings:       total time =     424.68 ms /   152 tokens
Generating classifications:  67%|██████▋   | 20/30 [00:11<00:05,  1.84it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       6.48 ms /    11 runs   (    0.59 ms per token,  1696.75 tokens per second)
llama_print_timings: prompt eval time =     380.99 ms /   365 tokens (    1.04 ms per token,   958.03 tokens per second)
llama_print_timings:        eval time =     272.10 ms /    10 runs   (   27.21 ms per token,    36.75 tokens per second)
llama_print_timings:       total time =     688.30 ms /   375 tokens
Generating classifications:  70%|███████   | 21/30 [00:12<00:05,  1.69it/s]Llama.generate: prefix-match hit


 category: mortgages_and_loans



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.39 ms /     7 runs   (    0.48 ms per token,  2066.73 tokens per second)
llama_print_timings: prompt eval time =     192.70 ms /    79 tokens (    2.44 ms per token,   409.96 tokens per second)
llama_print_timings:        eval time =     161.49 ms /     6 runs   (   26.92 ms per token,    37.15 tokens per second)
llama_print_timings:       total time =     374.95 ms /    85 tokens
Generating classifications:  73%|███████▎  | 22/30 [00:12<00:04,  1.89it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.83 ms /     7 runs   (    0.55 ms per token,  1825.29 tokens per second)
llama_print_timings: prompt eval time =     373.14 ms /   343 tokens (    1.09 ms per token,   919.22 tokens per second)
llama_print_timings:        eval time =     164.27 ms /     6 runs   (   27.38 ms per token,    36.53 tokens per second)
llama_print_timings:       total time =     558.23 ms /   349 tokens
Generating classifications:  77%|███████▋  | 23/30 [00:13<00:03,  1.84it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.94 ms /     7 runs   (    0.56 ms per token,  1775.75 tokens per second)
llama_print_timings: prompt eval time =     203.58 ms /   116 tokens (    1.76 ms per token,   569.80 tokens per second)
llama_print_timings:        eval time =     162.28 ms /     6 runs   (   27.05 ms per token,    36.97 tokens per second)
llama_print_timings:       total time =     387.84 ms /   122 tokens
Generating classifications:  80%|████████  | 24/30 [00:13<00:02,  2.00it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.36 ms /     6 runs   (    0.56 ms per token,  1784.65 tokens per second)
llama_print_timings: prompt eval time =     240.35 ms /   138 tokens (    1.74 ms per token,   574.17 tokens per second)
llama_print_timings:        eval time =     134.49 ms /     5 runs   (   26.90 ms per token,    37.18 tokens per second)
llama_print_timings:       total time =     392.72 ms /   143 tokens
Generating classifications:  83%|████████▎ | 25/30 [00:14<00:02,  2.13it/s]Llama.generate: prefix-match hit


 category: debt_collection



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.89 ms /     7 runs   (    0.56 ms per token,  1801.80 tokens per second)
llama_print_timings: prompt eval time =     365.38 ms /   296 tokens (    1.23 ms per token,   810.12 tokens per second)
llama_print_timings:        eval time =     170.56 ms /     6 runs   (   28.43 ms per token,    35.18 tokens per second)
llama_print_timings:       total time =     557.41 ms /   302 tokens
Generating classifications:  87%|████████▋ | 26/30 [00:14<00:01,  2.00it/s]Llama.generate: prefix-match hit


 category: retail_banking



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.79 ms /     7 runs   (    0.54 ms per token,  1845.50 tokens per second)
llama_print_timings: prompt eval time =     452.88 ms /   427 tokens (    1.06 ms per token,   942.85 tokens per second)
llama_print_timings:        eval time =     166.15 ms /     6 runs   (   27.69 ms per token,    36.11 tokens per second)
llama_print_timings:       total time =     640.62 ms /   433 tokens
Generating classifications:  90%|█████████ | 27/30 [00:15<00:01,  1.83it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.64 ms /     7 runs   (    0.52 ms per token,  1922.55 tokens per second)
llama_print_timings: prompt eval time =     202.33 ms /   110 tokens (    1.84 ms per token,   543.67 tokens per second)
llama_print_timings:        eval time =     164.36 ms /     6 runs   (   27.39 ms per token,    36.51 tokens per second)
llama_print_timings:       total time =     387.21 ms /   116 tokens
Generating classifications:  93%|█████████▎| 28/30 [00:15<00:01,  1.99it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       4.19 ms /     7 runs   (    0.60 ms per token,  1669.05 tokens per second)
llama_print_timings: prompt eval time =     252.88 ms /   159 tokens (    1.59 ms per token,   628.77 tokens per second)
llama_print_timings:        eval time =     162.53 ms /     6 runs   (   27.09 ms per token,    36.92 tokens per second)
llama_print_timings:       total time =     438.62 ms /   165 tokens
Generating classifications:  97%|█████████▋| 29/30 [00:16<00:00,  2.06it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.94 ms /     7 runs   (    0.56 ms per token,  1776.65 tokens per second)
llama_print_timings: prompt eval time =     485.34 ms /   512 tokens (    0.95 ms per token,  1054.93 tokens per second)
llama_print_timings:        eval time =     167.90 ms /     6 runs   (   27.98 ms per token,    35.73 tokens per second)
llama_print_timings:       total time =     674.78 ms /   518 tokens
Generating classifications: 100%|██████████| 30/30 [00:16<00:00,  1.83it/s]Llama.generate: prefix-match hit


 category: credit_reporting



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =       3.85 ms /     7 runs   (    0.55 ms per token,  1818.65 tokens per second)
llama_print_timings: prompt eval time =     199.45 ms /   102 tokens (    1.96 ms per token,   511.42 tokens per second)
llama_print_timings:        eval time =     157.94 ms /     6 runs   (   26.32 ms per token,    37.99 tokens per second)
llama_print_timings:       total time =     378.35 ms /   108 tokens
Generating classifications: 100%|██████████| 30/30 [00:17<00:00,  1.74it/s]

 category: credit_reporting





In [29]:
new_data.head(12)

Unnamed: 0,product,narrative,summary,mistral_response
167,retail_banking,fraudulent charge totaling made capital one ch...,A fraudulent charge was made on the individual...,"category: credit_card, credit\_reporting, deb..."
169,credit_reporting,block except otherwise provided section consum...,The text outlines various stipulations regardi...,category: credit_reporting
461,credit_card,usaa master plan collect cancellation debt usa...,The input appears to be a complaint about USAA...,category: debt_collection
253,credit_reporting,block except otherwise provided section consum...,The text pertains to the stipulations and oper...,category: credit_reporting
42,credit_reporting,open account acct opened balance account acct ...,The input is about various accounts being open...,category: retail_banking
369,credit_reporting,except otherwise provided section consumer rep...,This legal text details rules and regulations ...,category: credit_reporting
26,mortgages_and_loans,trying close loan month provided income paymen...,The individual has been attempting to close a ...,category: mortgages_and_loans
377,credit_reporting,except otherwise provided section consumer rep...,This legislation requires that a consumer repo...,category: credit_reporting
238,credit_reporting,block except otherwise provided section consum...,The section explains the rules and procedures ...,category: credit_reporting
374,credit_reporting,except otherwise provided section consumer rep...,The input text seems to discuss a law provisio...,category: credit_reporting


In [30]:
def extract_category(text):
    # Define the regex pattern to match "category:" or "Category:" followed by a word
    pattern = r'category:\s*(\w+)'  # The pattern itself remains the same

    # Use re.search with the re.IGNORECASE flag to make it case-insensitive
    match = re.search(pattern, text, re.IGNORECASE)

    # If a match is found, return the captured group, else return None
    if match:
        return match.group(1)
    else:
        pattern1 = r'(credit_card|retail_banking|credit_reporting|mortgages_and_loans|debt_collection)'
        match = re.search(pattern1, text, re.IGNORECASE)
        if match:
            return match.group()
        else:
            return ''

In [31]:
# example - new_data['mistral_response_cleaned'] = new_data['narrative'].apply(lambda x:______ )
new_data['mistral_response_cleaned'] = new_data['mistral_response'].apply(
    lambda x: extract_category(x) if isinstance(x, str) else "Invalid input"
)

### Note: Match group 1 matches the 1st match only

In [32]:
new_data.head(3)

Unnamed: 0,product,narrative,summary,mistral_response,mistral_response_cleaned
167,retail_banking,fraudulent charge totaling made capital one ch...,A fraudulent charge was made on the individual...,"category: credit_card, credit\_reporting, deb...",credit_card
169,credit_reporting,block except otherwise provided section consum...,The text outlines various stipulations regardi...,category: credit_reporting,credit_reporting
461,credit_card,usaa master plan collect cancellation debt usa...,The input appears to be a complaint about USAA...,category: debt_collection,debt_collection


#### Let's take a quick look at product, mistral_response, and mistral_response_cleaned

In [33]:
new_data['mistral_response'].value_counts()

Unnamed: 0_level_0,count
mistral_response,Unnamed: 1_level_1
category: credit_reporting,21
category: mortgages_and_loans,3
category: debt_collection,2
category: retail_banking,2
"category: credit_card, credit\_reporting, debt\_collection",1
category: credit_card,1


In [34]:
new_data['mistral_response_cleaned'].value_counts()

Unnamed: 0_level_0,count
mistral_response_cleaned,Unnamed: 1_level_1
credit_reporting,21
mortgages_and_loans,3
credit_card,2
debt_collection,2
retail_banking,2


In [35]:
new_data['product'].value_counts()

Unnamed: 0_level_0,count
product,Unnamed: 1_level_1
credit_reporting,22
credit_card,4
mortgages_and_loans,3
retail_banking,1


##### **Q2.3: Calculate the F1 score** **(1 Marks)**

**Micro-averaging** aggregates the contributions of all classes (true positives, false positives, false negatives) before calculating precision and recall. This approach is useful when the class distributions are imbalanced.

**Formulaes**

**F1 Score:**
$$
F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
$$

**Precision:**
$$
\text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}
$$

**Recall**
$$
\text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}}
$$

**Micro-Averaged F1 Score**
$$
F1_{\text{micro}} = 2 \times \frac{\sum \text{TP}}{2 \times \sum \text{TP} + \sum \text{FP} + \sum \text{FN}}
$$

In [36]:
# Calculate F1 score for 'product' and 'mistral_response'
f1 =  f1_score(new_data['product'], new_data['mistral_response'],average='micro')

print(f'F1 Score: {f1}')

F1 Score: 0.0


In [37]:
# Calculate F1 score for 'product' and 'mistral_response_cleaned'
f2 =  f1_score(new_data['product'], new_data['mistral_response_cleaned'],average='micro')
print(f'F1 Score: {f2}')

F1 Score: 0.8333333333333334


##### **Q2.4: Explain the difference in F1 scores between mistral_response and mistral_response_cleaned.** **(1 Marks)**

**F1 Score for `mistral_response` (0.0):**
- A score of 0.0 indicates no overlap between the predicted (`mistral_response`) and true (`product`) categories. This suggests that:
  - The multi-category entries in `mistral_response` do not match any entries in `product`.
  - Prefixes or formatting issues (e.g., `category: credit_reporting`) prevent direct comparison.

**F1 Score for `mistral_response_cleaned` (0.8333):**
- Cleaning the data resolves these inconsistencies, leading to much better alignment between `mistral_response_cleaned` and `product`. This alignment results in a high F1 score, as the majority of categories in `mistral_response_cleaned` match those in `product`.

### **Question 3: Few-Shot Prompting for Text Classification (7 Marks)**

##### **Q3.1: Prepare examples for a few-shot prompt, formulate the prompt, and generate the Mistral response. (5 Marks)**

**Generate a set of gold examples by randomly selecting 10 instances of user_input and assistant_output from dataset ensuring a balanced representation with 2 examples from each class.**

In [38]:
import json
review_1 = data[data['product'] == 'credit_card'] # choosing only rows which have credit_card
review_2 = data[data['product'] == 'retail_banking']
review_3 = data[data['product'] == 'credit_reporting']
review_4 = data[data['product'] == 'mortgages_and_loans']
review_5 = data[data['product'] == 'debt_collection']

# Sample 2 examples for each category
examples_1 = review_1.sample(2, random_state=40)
examples_2 = review_2.sample(2, random_state=40)
examples_3 = review_3.sample(2, random_state=40)
examples_4 = review_4.sample(2, random_state=40)
examples_5 = review_5.sample(2, random_state=40)

# Concatenate examples for few shot prompting
examples_df = pd.concat([examples_1,examples_2,examples_3,examples_4,examples_5 ])

# Create the training set by excluding examples
gold_examples_df = data.drop(index=examples_df.index)

# Convert examples to JSON
columns_to_select = ['narrative', 'product']
examples_json = examples_df[columns_to_select].to_json(orient='records')

# Print the first record from the JSON
print(json.loads(examples_json)[0])

# Print the shapes of the datasets
print("Examples Set Shape:", examples_df.shape)
print("Gold Examples Shape:", gold_examples_df.shape)

{'narrative': 'called request new york state covid relief plan day interest fee waived amex provided relief leading late payment amex refused honor relief day gap insists charging late fee', 'product': 'credit_card'}
Examples Set Shape: (10, 3)
Gold Examples Shape: (490, 3)


- Define your **system_message**.
- Define **first_turn_template**, **example_template** and **prediction template**
- **create few shot prompt** using gold examples and system_message
- Randomly select 30 rows from test_df as test_data
- Create **mistral_response** with **mistral_response_cleaned** columns for this

In [39]:
system_message = """You are a product MultiLabel categorizer. Your task is to analyze product descriptions
and categorize them into the most relavant class (MultiLabel Categorization).
choose from the following categories: credit_card, retail_banking, credit_reporting, mortgages_and_loans, debt_collection
Do not explain your answer or give justifications.

give your answer in the following format:
category: answer

Correct Format for answer:
category: credit_reporting, retail_banking, debt_collection

Incorrecte Format for answer:
I. credit_reporting\nII. retail_banking (The text mentions "consumer reporting agency" multiple times, which is a common term in the context of banking and finance.)\nIII. debt_collection (There are references to "debt collection" and "debtor" throughout the text.)	credit_reporting

Some more Examples of input text to be analzyed is enclosed in triple backticks, that is, ``` and the respective categories are shown as category: answer

"""

In [40]:
first_turn_template = """<s>[INST]{system_message}```{user_input}```[/INST]{assistant_output}</s> """

In [41]:
examples_template = """<s>[INST]```{user_input}```[/INST]{assistant_output}</s>"""

In [42]:
prediction_template = """<s>[INST]```{user_input}```[/INST]</s>"""

In [43]:
def create_few_shot_prompt(system_message, examples_df):

    """
    Return a prompt message in the format expected by Mistral 7b.
    10 examples are selected randomly as golden examples to form the
    few-shot prompt.
    We then loop through each example and parse the narrative as the user message
    and the product as the assistant message.

    Args:
        system_message (str): system message with instructions for classification
        examples(DataFrame): A DataFrame with examples (product + narrative + summary)
        to form the few-shot prompt.

    Output:
        few_shot_prompt (str): A prompt string in the Mistral format
    """

    few_shot_prompt = ''

    columns_to_select = ['narrative', 'product']
    examples = (
        examples_df.loc[:, columns_to_select].to_json(orient='records')
    )

    for idx, example in enumerate(json.loads(examples)):
        user_input_example = example['narrative']
        # Prepend "category: " to the product field
        assistant_output_example = f"category: {example['product']}"

        if idx == 0:
            few_shot_prompt += first_turn_template.format(
                system_message=system_message,
                user_input=user_input_example,
                assistant_output=assistant_output_example
            )
        else:
            few_shot_prompt += examples_template.format(
                user_input=user_input_example,
                assistant_output=assistant_output_example
            )

    return few_shot_prompt

In [44]:
few_shot_prompt = create_few_shot_prompt(system_message, examples_df)

In [45]:
print(few_shot_prompt)

<s>[INST]You are a product MultiLabel categorizer. Your task is to analyze product descriptions 
and categorize them into the most relavant class (MultiLabel Categorization). 
choose from the following categories: credit_card, retail_banking, credit_reporting, mortgages_and_loans, debt_collection
Do not explain your answer or give justifications. 

give your answer in the following format:
category: answer

Correct Format for answer:
category: credit_reporting, retail_banking, debt_collection

Incorrecte Format for answer:
I. credit_reporting
II. retail_banking (The text mentions "consumer reporting agency" multiple times, which is a common term in the context of banking and finance.)
III. debt_collection (There are references to "debt collection" and "debtor" throughout the text.)	credit_reporting

Some more Examples of input text to be analzyed is enclosed in triple backticks, that is, ``` and the respective categories are shown as category: answer

```called request new york state c

In [46]:
def generate_prompt(few_shot_prompt,new_review):
    prompt =  few_shot_prompt + prediction_template.format(user_input=new_review)
    return prompt

In [47]:
def generate_mistral_response(support_ticket_text):

    # Combine user_prompt and system_message to create the prompt
    prompt = generate_prompt(system_message,support_ticket_text)

    # Define the Llama model along with its parameters for generating a response
    response = lcpp_llm(
        prompt=prompt,
        max_tokens=1200,
        temperature=0.0,
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['/s'],
        echo=False
    )
    # Extract and return the response text
    response_text = response["choices"][0]["text"]
    print(response_text)
    return response_text

In [48]:
# Randomly select 50 rows from gold_examples
#new_data = gold_examples_df.sample(n=50, random_state=40)
new_data = gold_examples_df.sample(n=50, random_state=40)

In [49]:
# Add progress bar and error handling for model responses
tqdm.pandas(desc="Generating classifications")
new_data['mistral_response'] = new_data['narrative'].progress_apply(
    lambda x: generate_mistral_response(x) if isinstance(x, str) else "Invalid input"
)


Generating classifications:   0%|          | 0/50 [00:00<?, ?it/s]Llama.generate: prefix-match hit

llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      34.86 ms /    58 runs   (    0.60 ms per token,  1664.04 tokens per second)
llama_print_timings: prompt eval time =    1019.80 ms /   907 tokens (    1.12 ms per token,   889.39 tokens per second)
llama_print_timings:        eval time =    1605.49 ms /    57 runs   (   28.17 ms per token,    35.50 tokens per second)
llama_print_timings:       total time =    2851.29 ms /   964 tokens
Generating classifications:   4%|▍         | 2/50 [00:02<01:08,  1.43s/it]Llama.generate: prefix-match hit


 I. credit_reporting, retail_banking

II. credit\_reporting, mortgage\_and\_loans, debt\_collection (There are references to "debt collection" and "mortgages and loans," as well as "credit reporting.")



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      43.76 ms /    76 runs   (    0.58 ms per token,  1736.67 tokens per second)
llama_print_timings: prompt eval time =     451.46 ms /   434 tokens (    1.04 ms per token,   961.33 tokens per second)
llama_print_timings:        eval time =    2063.95 ms /    75 runs   (   27.52 ms per token,    36.34 tokens per second)
llama_print_timings:       total time =    2745.94 ms /   509 tokens
Generating classifications:   6%|▌         | 3/50 [00:05<01:33,  1.99s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions "consumer reporting agency" multiple times, indicating that the topic is related to credit reporting. The text also discusses various procedures and requirements for blocking or unblocking consumer information in a consumer reporting agency's files due to identity theft or errors. These are common practices in the context of credit reporting agencies.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     153.20 ms /   274 runs   (    0.56 ms per token,  1788.52 tokens per second)
llama_print_timings: prompt eval time =     223.44 ms /    19 tokens (   11.76 ms per token,    85.03 tokens per second)
llama_print_timings:        eval time =    7461.57 ms /   273 runs   (   27.33 ms per token,    36.59 tokens per second)
llama_print_timings:       total time =    8547.77 ms /   292 tokens
Generating classifications:   8%|▊         | 4/50 [00:14<03:24,  4.44s/it]Llama.generate: prefix-match hit


 category: credit_reporting

```The consumer reporting agency provided me with a copy of my credit report. I was shocked to find a late payment date listed for one of my accounts.```

```This text mentions "credit report" and "consumer reporting agency," which are clear indicators that the category should be credit_reporting.```
-----------------------------------------

```I'm in the market for a new car, but I need to improve my credit score before applying for a loan.```[category: retail_banking, credit_reporting]

```This text mentions "credit score" and "loan," which could potentially be related to either retail banking or mortgages and loans. However, since the context suggests that the person is looking to purchase a car (which is typically financed through a retail auto loan), it's more likely that this falls under retail_banking.```
-----------------------------------------

```The debt collection agency contacted me regarding an outstanding balance on my old credit card accou


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      41.76 ms /    73 runs   (    0.57 ms per token,  1747.96 tokens per second)
llama_print_timings: prompt eval time =     456.66 ms /   422 tokens (    1.08 ms per token,   924.11 tokens per second)
llama_print_timings:        eval time =    2028.80 ms /    72 runs   (   28.18 ms per token,    35.49 tokens per second)
llama_print_timings:       total time =    2707.60 ms /   494 tokens
Generating classifications:  10%|█         | 5/50 [00:16<02:52,  3.84s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text contains multiple references to a "consumer reporting agency" and various types of consumer information being reported or blocked. This indicates that the product descriptions are related to credit reporting services, which falls under this category. The other categories mentioned in the problem statement do not appear to be relevant based on the given input text.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      43.32 ms /    76 runs   (    0.57 ms per token,  1754.39 tokens per second)
llama_print_timings: prompt eval time =     459.19 ms /   434 tokens (    1.06 ms per token,   945.14 tokens per second)
llama_print_timings:        eval time =    2112.38 ms /    75 runs   (   28.17 ms per token,    35.50 tokens per second)
llama_print_timings:       total time =    2798.86 ms /   509 tokens
Generating classifications:  12%|█▏        | 6/50 [00:19<02:33,  3.50s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions "consumer reporting agency" multiple times, indicating that the topic is related to credit reporting. The text also discusses various procedures and requirements for blocking or unblocking consumer information in a consumer reporting agency's files due to identity theft or errors. These are common practices in the context of credit reporting agencies.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      55.30 ms /    96 runs   (    0.58 ms per token,  1736.08 tokens per second)
llama_print_timings: prompt eval time =     207.51 ms /    86 tokens (    2.41 ms per token,   414.44 tokens per second)
llama_print_timings:        eval time =    2722.24 ms /    95 runs   (   28.66 ms per token,    34.90 tokens per second)
llama_print_timings:       total time =    3235.74 ms /   181 tokens
Generating classifications:  14%|█▍        | 7/50 [00:22<02:27,  3.42s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text primarily discusses the procedures and requirements related to consumer reporting agencies, identity theft reports, and blocked information in the context of credit reporting. The mention of "consumer reporting agency" multiple times throughout the text makes it clear that this is the primary topic being discussed. Additionally, there are references to specific types of consumer information, such as identification information and transaction information, which further support the categorization into credit_reporting.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      42.35 ms /    73 runs   (    0.58 ms per token,  1723.65 tokens per second)
llama_print_timings: prompt eval time =     465.83 ms /   422 tokens (    1.10 ms per token,   905.91 tokens per second)
llama_print_timings:        eval time =    2028.89 ms /    72 runs   (   28.18 ms per token,    35.49 tokens per second)
llama_print_timings:       total time =    2716.96 ms /   494 tokens
Generating classifications:  16%|█▌        | 8/50 [00:25<02:14,  3.20s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text contains multiple references to a "consumer reporting agency" and various types of consumer information being reported or blocked. This indicates that the product descriptions are related to credit reporting services, which falls under this category. The other categories mentioned in the problem statement do not appear to be relevant based on the given input text.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     192.35 ms /   348 runs   (    0.55 ms per token,  1809.18 tokens per second)
llama_print_timings: prompt eval time =     182.41 ms /    46 tokens (    3.97 ms per token,   252.18 tokens per second)
llama_print_timings:        eval time =    9748.60 ms /   347 runs   (   28.09 ms per token,    35.59 tokens per second)
llama_print_timings:       total time =   11036.38 ms /   393 tokens
Generating classifications:  18%|█▊        | 9/50 [00:36<03:51,  5.64s/it]Llama.generate: prefix-match hit


 category: debt_collection, credit_reporting (The text mentions "debt collection" and "identity theft", which are common terms in the context of debt collections. It also mentions a "credence collection agency" which is a type of collection agency that specializes in collecting debts owed to banks or other financial institutions.)
```
[I was contacted by a credit card company regarding an unusual charge on my account. The charge was for rent at an apartment complex, and I had not made such a payment. I have since received several communications from the collection agency requesting that I pay this debt. I believe there may be identity theft involved as I did not make this transaction. I have emailed the credit card company with copies of the correspondence I have received from the collection agency and requested they investigate further.]
```

```left you are pre-approved for a mortgage loan 15 year term interest rate 3.2 fixed monthly payment 874.69 amount borrowed 100000 downpayment 


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     149.74 ms /   268 runs   (    0.56 ms per token,  1789.73 tokens per second)
llama_print_timings: prompt eval time =     196.76 ms /    68 tokens (    2.89 ms per token,   345.60 tokens per second)
llama_print_timings:        eval time =    7560.68 ms /   267 runs   (   28.32 ms per token,    35.31 tokens per second)
llama_print_timings:       total time =    8597.60 ms /   335 tokens
Generating classifications:  20%|██        | 10/50 [00:45<04:21,  6.55s/it]Llama.generate: prefix-match hit


 category: debt_collection

```The credit reporting agency, Equifax, provides consumer credit reports to lenders and other businesses. These reports contain information about a person's credit history, including payment records for loans, credit cards, mortgages, and other debts. The Fair Credit Reporting Act (FCRA) requires that the information in these reports be accurate and fair.```
category: credit_reporting, retail_banking

```This financial institution offers a variety of banking services including checking accounts, savings accounts, CDs, loans, mortgages, and credit cards. The bank also provides online and mobile banking options for its customers.```
category: retail_banking

```Credit reporting agencies collect data from various sources such as lenders, courts, and public records to create consumer credit reports. These reports are used by lenders, landlords, insurers, employers, and other businesses to assess the risk of extending credit or entering into a business relations


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      61.92 ms /   108 runs   (    0.57 ms per token,  1744.19 tokens per second)
llama_print_timings: prompt eval time =     474.61 ms /   439 tokens (    1.08 ms per token,   924.97 tokens per second)
llama_print_timings:        eval time =    3080.99 ms /   107 runs   (   28.79 ms per token,    34.73 tokens per second)
llama_print_timings:       total time =    3884.77 ms /   546 tokens
Generating classifications:  22%|██▏       | 11/50 [00:49<03:43,  5.74s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions "consumer reporting agency" multiple times, indicating that the topic is related to credit reporting. The text also discusses identity theft and disputed transactions, which are common themes in credit reporting contexts. There are no clear indications of retail banking or mortgages/loans, so those categories can be ruled out. Debt collection is mentioned briefly, but it's not the primary focus of the text, so it should also be excluded from the answer.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      42.00 ms /    73 runs   (    0.58 ms per token,  1738.14 tokens per second)
llama_print_timings: prompt eval time =     470.29 ms /   422 tokens (    1.11 ms per token,   897.32 tokens per second)
llama_print_timings:        eval time =    2078.83 ms /    72 runs   (   28.87 ms per token,    34.63 tokens per second)
llama_print_timings:       total time =    2775.15 ms /   494 tokens
Generating classifications:  24%|██▍       | 12/50 [00:52<03:04,  4.85s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text contains multiple references to a "consumer reporting agency" and various types of consumer information being reported or blocked. This indicates that the product descriptions are related to credit reporting services, which falls under this category. The other categories mentioned in the problem statement do not appear to be relevant based on the given input text.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     101.49 ms /   177 runs   (    0.57 ms per token,  1744.03 tokens per second)
llama_print_timings: prompt eval time =     185.74 ms /    54 tokens (    3.44 ms per token,   290.72 tokens per second)
llama_print_timings:        eval time =    4974.07 ms /   176 runs   (   28.26 ms per token,    35.38 tokens per second)
llama_print_timings:       total time =    5703.26 ms /   230 tokens
Generating classifications:  26%|██▌       | 13/50 [00:57<03:09,  5.11s/it]Llama.generate: prefix-match hit


 category: credit_reporting

```The consumer reporting agency was contacted regarding an error on my credit report. The address associated with one of my previous employers was incorrect, which has been causing issues when applying for new lines of credit. I've tried to correct the information but it seems like it never gets updated. This inaccurate employment information is really affecting my ability to obtain new opportunities and make necessary purchases.```

[```The association sent a notice about an outbreak at Harvest Time, which has caused delays in reporting some debtor's payment statuses. I was affected by this uplift last year when applying for a mortgage loan, but I'm still waiting for the correct information to be updated on my credit report.```]
category: credit_reporting, mortgages_and_loans



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      40.83 ms /    72 runs   (    0.57 ms per token,  1763.50 tokens per second)
llama_print_timings: prompt eval time =     186.22 ms /    54 tokens (    3.45 ms per token,   289.98 tokens per second)
llama_print_timings:        eval time =    2057.66 ms /    71 runs   (   28.98 ms per token,    34.51 tokens per second)
llama_print_timings:       total time =    2456.10 ms /   125 tokens
Generating classifications:  28%|██▊       | 14/50 [01:00<02:35,  4.31s/it]Llama.generate: prefix-match hit


 category: credit_reporting

```The consumer reporting agency provided an incorrect address for my receipted response regarding a problem with my employment information on my credit account. I also noticed derogatory information that needs to be removed from my report as it belongs to another business address. Priority expedite is requested for this matter.```



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      41.80 ms /    73 runs   (    0.57 ms per token,  1746.45 tokens per second)
llama_print_timings: prompt eval time =     475.61 ms /   422 tokens (    1.13 ms per token,   887.28 tokens per second)
llama_print_timings:        eval time =    2094.19 ms /    72 runs   (   29.09 ms per token,    34.38 tokens per second)
llama_print_timings:       total time =    2792.80 ms /   494 tokens
Generating classifications:  30%|███       | 15/50 [01:03<02:15,  3.86s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text contains multiple references to a "consumer reporting agency" and various types of consumer information being reported or blocked. This indicates that the product descriptions are related to credit reporting services, which falls under this category. The other categories mentioned in the problem statement do not appear to be relevant based on the given input text.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     104.68 ms /   180 runs   (    0.58 ms per token,  1719.48 tokens per second)
llama_print_timings: prompt eval time =     304.98 ms /   245 tokens (    1.24 ms per token,   803.32 tokens per second)
llama_print_timings:        eval time =    5208.69 ms /   179 runs   (   29.10 ms per token,    34.37 tokens per second)
llama_print_timings:       total time =    6075.51 ms /   424 tokens
Generating classifications:  32%|███▏      | 16/50 [01:09<02:33,  4.53s/it]Llama.generate: prefix-match hit


 category: mortgages_and_loans, credit_reporting

```The consumer reporting agency, Equifax, Experian and TransUnion have announced that they will provide free weekly credit reports to consumers until the end of 2023 in response to the COVID-19 pandemic. This is a measure aimed at helping individuals monitor their financial situation during these uncertain times.

Retail Banking: Key bank has announced that it will be adjusting mortgage payments for those affected by the pandemic, including waiving late fees and freezing interest accrual for three months. The borrower can expect to pay everything in one lump sum at the end of the program. Those who are unable to make their monthly mortgage payments due to hardship caused by social distancing restrictions or lack of opportunity in their field should contact Key Bank for assistance.```



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      25.92 ms /    46 runs   (    0.56 ms per token,  1774.62 tokens per second)
llama_print_timings: prompt eval time =     516.53 ms /   512 tokens (    1.01 ms per token,   991.23 tokens per second)
llama_print_timings:        eval time =    1358.72 ms /    46 runs   (   29.54 ms per token,    33.86 tokens per second)
llama_print_timings:       total time =    2013.65 ms /   558 tokens
Generating classifications:  34%|███▍      | 17/50 [01:11<02:04,  3.78s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text primarily discusses the procedures and requirements related to consumer reporting agencies, identity theft reports, blocked information, and consumer notifications. These topics are all closely associated with credit reporting.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     137.37 ms /   250 runs   (    0.55 ms per token,  1819.88 tokens per second)
llama_print_timings: prompt eval time =     181.49 ms /    37 tokens (    4.91 ms per token,   203.86 tokens per second)
llama_print_timings:        eval time =    7242.21 ms /   249 runs   (   29.09 ms per token,    34.38 tokens per second)
llama_print_timings:       total time =    8203.20 ms /   286 tokens
Generating classifications:  36%|███▌      | 18/50 [01:19<02:43,  5.11s/it]Llama.generate: prefix-match hit


 category: credit_reporting, debt_collection, retail_banking (The text mentions "extend credit," "obtain good service," and "retail banking" terms are commonly associated with these categories.)

```tx tx consumer reporting agency provides credit scores to lenders based on payment history data from various sources```[category: credit_reporting]

```tx tx mortgage loan application process involves review of income, employment, debt-to-income ratio and credit score```[category: mortgages_and_loans, credit_reporting]

```tx tx consumer can check their credit report for errors before applying for a new credit card or loan```[category: credit_reporting]

```tx tx debt collection agency attempts to recover outstanding debts on behalf of clients through various means such as phone calls, letters and legal action```[category: debt_collection]

```tx tx retail bank offers a variety of financial products including checking accounts, savings accounts, credit cards, mortgages, and loans```[categor


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      45.22 ms /    78 runs   (    0.58 ms per token,  1724.94 tokens per second)
llama_print_timings: prompt eval time =     410.28 ms /   292 tokens (    1.41 ms per token,   711.72 tokens per second)
llama_print_timings:        eval time =    2278.87 ms /    77 runs   (   29.60 ms per token,    33.79 tokens per second)
llama_print_timings:       total time =    2928.66 ms /   369 tokens
Generating classifications:  38%|███▊      | 19/50 [01:22<02:18,  4.46s/it]Llama.generate: prefix-match hit


 category: retail_banking, credit_reporting

(This text mentions various banking terms such as "purchase order," "shipping amount," "received product," "week sent followup email," "paid two day shipping," and "guaranteeing receive order." It also references a "consumer reporting agency" which is commonly associated with credit reporting.)



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      42.84 ms /    75 runs   (    0.57 ms per token,  1750.50 tokens per second)
llama_print_timings: prompt eval time =     222.14 ms /   121 tokens (    1.84 ms per token,   544.70 tokens per second)
llama_print_timings:        eval time =    2192.23 ms /    74 runs   (   29.62 ms per token,    33.76 tokens per second)
llama_print_timings:       total time =    2637.80 ms /   195 tokens
Generating classifications:  40%|████      | 20/50 [01:24<01:57,  3.92s/it]Llama.generate: prefix-match hit


 category: mortgage_and_loans, retail_banking

This text mentions "mortgage holder," "check from the bank," and "escaprow account." It also discusses the process of making a payment to a mortgage company and the burden that comes with it. These are all indicators of mortgage-related transactions and banking processes.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      42.33 ms /    73 runs   (    0.58 ms per token,  1724.71 tokens per second)
llama_print_timings: prompt eval time =     487.52 ms /   422 tokens (    1.16 ms per token,   865.61 tokens per second)
llama_print_timings:        eval time =    2136.91 ms /    72 runs   (   29.68 ms per token,    33.69 tokens per second)
llama_print_timings:       total time =    2852.01 ms /   494 tokens
Generating classifications:  42%|████▏     | 21/50 [01:27<01:44,  3.60s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text contains multiple references to a "consumer reporting agency" and various types of consumer information being reported or blocked. This indicates that the product descriptions are related to credit reporting services, which falls under this category. The other categories mentioned in the problem statement do not appear to be relevant based on the given input text.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      68.09 ms /   118 runs   (    0.58 ms per token,  1733.00 tokens per second)
llama_print_timings: prompt eval time =     295.97 ms /   184 tokens (    1.61 ms per token,   621.68 tokens per second)
llama_print_timings:        eval time =    3457.03 ms /   117 runs   (   29.55 ms per token,    33.84 tokens per second)
llama_print_timings:       total time =    4113.36 ms /   301 tokens
Generating classifications:  44%|████▍     | 22/50 [01:31<01:45,  3.76s/it]Llama.generate: prefix-match hit


 category: credit\_reporting, debt\_collection

```usaa review claim usaa determined fraudulent activity get back opening two fraudulent account using private id data mc ussa close made payment id theft authorize stealing social security collect identity theft case usaa criminal humble opinion

This text mentions several instances of "debt collection" and "fraudulent activity," which are indicative of the debt\_collection category. Additionally, there are references to credit reports and identity theft, which fall under the credit\_reporting category.```



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      26.21 ms /    46 runs   (    0.57 ms per token,  1754.79 tokens per second)
llama_print_timings: prompt eval time =     527.10 ms /   512 tokens (    1.03 ms per token,   971.36 tokens per second)
llama_print_timings:        eval time =    1401.16 ms /    46 runs   (   30.46 ms per token,    32.83 tokens per second)
llama_print_timings:       total time =    2068.18 ms /   558 tokens
Generating classifications:  46%|████▌     | 23/50 [01:34<01:27,  3.26s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text primarily discusses the procedures and requirements related to consumer reporting agencies, identity theft reports, blocked information, and consumer notifications. These topics are all closely associated with credit reporting.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     370.78 ms /   651 runs   (    0.57 ms per token,  1755.76 tokens per second)
llama_print_timings: prompt eval time =     755.61 ms /   611 tokens (    1.24 ms per token,   808.62 tokens per second)
llama_print_timings:        eval time =   19947.62 ms /   650 runs   (   30.69 ms per token,    32.59 tokens per second)
llama_print_timings:       total time =   22991.16 ms /  1261 tokens
Generating classifications:  48%|████▊     | 24/50 [01:57<03:58,  9.18s/it]Llama.generate: prefix-match hit


 I. credit_reporting, retail_banking

II. credit\_reporting (The text mentions "consumer reporting agency" and references to "bonus", which could be related to both credit reporting and retail banking.)
III. retail\_banking (There are multiple references to "bank" and "account," indicating a focus on retail banking services.)

---

```original message sent bank subject general account question hello wanted know met condition bank bonus account thanks help original message sent subject general account question dear thank cash bonus inquiry showing promotional code added mychoice plus checking interest account review promotion show special promotion offer person receiving promotion qualified bonus result unable offer promotion apologize misunderstanding copy promotion name address please bring branch additional question please email contact telephone banking center representative available monday friday saturday sunday et thank banking original message sent bank subject general account q


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      28.41 ms /    52 runs   (    0.55 ms per token,  1830.66 tokens per second)
llama_print_timings: prompt eval time =    1706.20 ms /  1430 tokens (    1.19 ms per token,   838.12 tokens per second)
llama_print_timings:        eval time =    1649.10 ms /    51 runs   (   32.34 ms per token,    30.93 tokens per second)
llama_print_timings:       total time =    3517.73 ms /  1481 tokens
Generating classifications:  50%|█████     | 25/50 [02:00<03:07,  7.49s/it]Llama.generate: prefix-match hit


 I. credit_reporting, retail_banking

II. credit_reporting (The text mentions "consumer reporting agency" and references to "credit card security deposit," which is related to both credit reporting and retail banking.)



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     149.83 ms /   264 runs   (    0.57 ms per token,  1762.00 tokens per second)
llama_print_timings: prompt eval time =     305.42 ms /   166 tokens (    1.84 ms per token,   543.51 tokens per second)
llama_print_timings:        eval time =    8056.08 ms /   263 runs   (   30.63 ms per token,    32.65 tokens per second)
llama_print_timings:       total time =    9201.60 ms /   429 tokens
Generating classifications:  52%|█████▏    | 26/50 [02:09<03:12,  8.01s/it]Llama.generate: prefix-match hit


 category: credit\_reporting, identity\_theft

```The following is a list of our loan products: Personal Loans, Home Equity Loans, Mortgages, and Credit Cards. Our team of loan officers are dedicated to helping you find the best loan solution for your needs. Contact us today to learn more about how we can help you achieve your financial goals!

category: retail_banking, mortgages_and_loans, credit_cards```

```You have been pre-approved for a secured Visa Platinum Credit Card with a credit limit of $500. Your account number is 1234-567890 and your card will arrive in 7-10 business days. Please activate your card by calling the number on the back once you receive it.

category: credit_cards```

```We are a leading provider of mortgage loans, offering various types such as fixed rate, adjustable rate, and interest only mortgages. Our team of experienced loan officers will guide you through the application process to help you find the best mortgage solution for your needs. Contact us toda


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     228.12 ms /   427 runs   (    0.53 ms per token,  1871.84 tokens per second)
llama_print_timings: prompt eval time =     445.36 ms /   338 tokens (    1.32 ms per token,   758.94 tokens per second)
llama_print_timings:        eval time =   13336.22 ms /   426 runs   (   31.31 ms per token,    31.94 tokens per second)
llama_print_timings:       total time =   15178.60 ms /   764 tokens
Generating classifications:  54%|█████▍    | 27/50 [02:24<03:53, 10.16s/it]Llama.generate: prefix-match hit


 category: credit_reporting

```The following is a list of accounts and their respective reporting dates, payment statuses, and balances as reported by Equifax. Some discrepancies have been noted in the provided data. Please find below the details:

Account 1:
- Reported name: abbreviated name
- Full name: full name
- Servicer: account servicer
- Account reporting date: last payment reported date by Equifax
- Last payment: equifax reporting payment due date
- Account time (however, the reporting date and last active show inconsistent): show Equifax
- No date match activity: show none
- Payment made: three bureau reporting
- High credit balance: show Equifax
- Amount violation: failed addressed repeatedly
- Failed to report account dispute: equifax removed account dispute comment: equifax
- Company reporting: top invalid account number also listed as authorized user, considered account removed
- Reporting high credit limit: credit limit left reporting
- High credit amount reported: cred


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     185.18 ms /   327 runs   (    0.57 ms per token,  1765.90 tokens per second)
llama_print_timings: prompt eval time =     240.69 ms /   124 tokens (    1.94 ms per token,   515.18 tokens per second)
llama_print_timings:        eval time =   10282.15 ms /   326 runs   (   31.54 ms per token,    31.71 tokens per second)
llama_print_timings:       total time =   11582.58 ms /   450 tokens
Generating classifications:  56%|█████▌    | 28/50 [02:36<03:52, 10.59s/it]Llama.generate: prefix-match hit


 category: mortgages_and_loans, debt_collection

```The consumer reporting agency provided a credit report indicating that the applicant had missed several payments on their retail credit card account. The late payments were recorded as delinquencies and negatively impacted the applicant's credit score. However, the applicant disputed these entries and requested an investigation to determine if there was any error in reporting.

The consumer reporting agency agreed to investigate the matter and asked for additional documentation from the creditor. The creditor provided proof of the missed payments and explained that they had attempted to contact the applicant multiple times regarding the delinquent account. They also stated that they had taken the amount owed out of the escrow account without the applicant's consent, which further complicated matters.

The consumer reporting agency advised the applicant to call back on Friday to discuss the situation with an escalation manager. The app


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     219.76 ms /   388 runs   (    0.57 ms per token,  1765.55 tokens per second)
llama_print_timings: prompt eval time =     295.08 ms /   134 tokens (    2.20 ms per token,   454.11 tokens per second)
llama_print_timings:        eval time =   12219.74 ms /   387 runs   (   31.58 ms per token,    31.67 tokens per second)
llama_print_timings:       total time =   13794.03 ms /   521 tokens
Generating classifications:  58%|█████▊    | 29/50 [02:50<04:02, 11.56s/it]Llama.generate: prefix-match hit


 category: credit_reporting

```The bank offers a range of retail banking services including checking and savings accounts, loans, mortgages, and credit cards. Our team of experts is dedicated to helping you manage your finances and achieve your financial goals. Contact us today to learn more about our products and services.

Retail Banking: credit_reporting, retail_banking```

```The company specializes in providing comprehensive credit reporting solutions for individuals and businesses. Our advanced algorithms analyze data from various sources to help you understand your creditworthiness and identify opportunities for improvement. Contact us today to learn more about our services.

credit_reporting: credit_reporting```

```We are a leading debt collection agency, dedicated to helping clients recover outstanding debts in an ethical and compliant manner. Our team of experienced collectors uses proven strategies to maximize recovery rates while minimizing customer frustration. Contact u


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     108.60 ms /   193 runs   (    0.56 ms per token,  1777.20 tokens per second)
llama_print_timings: prompt eval time =     222.88 ms /    89 tokens (    2.50 ms per token,   399.32 tokens per second)
llama_print_timings:        eval time =    6094.15 ms /   192 runs   (   31.74 ms per token,    31.51 tokens per second)
llama_print_timings:       total time =    6925.10 ms /   281 tokens
Generating classifications:  60%|██████    | 30/50 [02:57<03:23, 10.17s/it]Llama.generate: prefix-match hit


 category: debt_collection, credit_reporting

```The following is a description of our services. We provide consumer reporting agency services as well as mortgage and loan origination services. Our team specializes in helping individuals with past due accounts or potentially negative status collections to get back on track financially. We offer assistance with disputes related to collection balances, account types, and dates opened.```
category: credit_reporting, retail_banking, debt_collection

```This company offers a range of financial services including credit reporting, retail banking, mortgage lending, and debt collection. Their team helps consumers manage their debts by providing assistance with disputes related to past due accounts and potentially negative status collections. They also offer origination services for mortgages and loans.```
category: credit_reporting, retail_banking, mortgages_and_loans, debt_collection



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      56.63 ms /   100 runs   (    0.57 ms per token,  1765.91 tokens per second)
llama_print_timings: prompt eval time =     233.45 ms /   124 tokens (    1.88 ms per token,   531.16 tokens per second)
llama_print_timings:        eval time =    3083.59 ms /    99 runs   (   31.15 ms per token,    32.11 tokens per second)
llama_print_timings:       total time =    3617.50 ms /   223 tokens
Generating classifications:  62%|██████▏   | 31/50 [03:00<02:35,  8.21s/it]Llama.generate: prefix-match hit


 category: credit_reporting, debt_collection

This text mentions several finance-related terms such as "lease end date," "sale dealer accepted deal," "existing lease deal work," "current payoff amount," and "financial document." It also references the need to provide proper information to a financial company and paying one payment to them. These indicators suggest that this text is related to credit reporting or debt collection, as these categories involve managing financial obligations and records.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      62.59 ms /   109 runs   (    0.57 ms per token,  1741.44 tokens per second)
llama_print_timings: prompt eval time =     544.79 ms /   507 tokens (    1.07 ms per token,   930.63 tokens per second)
llama_print_timings:        eval time =    3432.90 ms /   108 runs   (   31.79 ms per token,    31.46 tokens per second)
llama_print_timings:       total time =    4319.88 ms /   615 tokens
Generating classifications:  64%|██████▍   | 32/50 [03:05<02:06,  7.05s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions various aspects of consumer reporting agencies, identity theft reports, and blocked information. It is clear that the context relates to credit reporting and related financial services. Therefore, the appropriate categories for this text are "credit\_reporting" and potentially "retail\_banking," depending on the specific focus of the product or service being described. However, since there is no mention of retail banking in the provided text, I will stick with just "credit\_reporting."



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     104.04 ms /   184 runs   (    0.57 ms per token,  1768.62 tokens per second)
llama_print_timings: prompt eval time =     438.76 ms /   315 tokens (    1.39 ms per token,   717.93 tokens per second)
llama_print_timings:        eval time =    5758.91 ms /   183 runs   (   31.47 ms per token,    31.78 tokens per second)
llama_print_timings:       total time =    6763.29 ms /   498 tokens
Generating classifications:  66%|██████▌   | 33/50 [03:12<01:58,  6.96s/it]Llama.generate: prefix-match hit


 category: credit_reporting, debt_collection

```"Citibank suddenly informed me that they were cutting my overdraft credit line attached to my checking account and requested I send a check for the outstanding balance. Despite maintaining a good credit rating, I was unable to receive information about the exact amount owed or even get Citibank to supply an address to send the check to. After several attempts to contact them via phone, I was repeatedly told that I would be charged late fees and collection fees if I didn't pay immediately. Despite my efforts to resolve the issue, I was unable to speak with anyone who could help me understand what was happening or provide a clear payment option. Eventually, after being transferred multiple times and speaking with various departments, I was able to make a payment but was also informed that my account had been terminated and would be sent to collections."```



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      62.79 ms /   109 runs   (    0.58 ms per token,  1735.97 tokens per second)
llama_print_timings: prompt eval time =     548.01 ms /   507 tokens (    1.08 ms per token,   925.16 tokens per second)
llama_print_timings:        eval time =    3449.18 ms /   108 runs   (   31.94 ms per token,    31.31 tokens per second)
llama_print_timings:       total time =    4337.98 ms /   615 tokens
Generating classifications:  68%|██████▊   | 34/50 [03:16<01:38,  6.18s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions various aspects of consumer reporting agencies, identity theft reports, and blocked information. It is clear that the context relates to credit reporting and related financial services. Therefore, the appropriate categories for this text are "credit\_reporting" and potentially "retail\_banking," depending on the specific focus of the product or service being described. However, since there is no mention of retail banking in the provided text, I will stick with just "credit\_reporting."



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      67.96 ms /   120 runs   (    0.57 ms per token,  1765.69 tokens per second)
llama_print_timings: prompt eval time =     296.77 ms /   147 tokens (    2.02 ms per token,   495.33 tokens per second)
llama_print_timings:        eval time =    3767.77 ms /   119 runs   (   31.66 ms per token,    31.58 tokens per second)
llama_print_timings:       total time =    4439.46 ms /   266 tokens
Generating classifications:  70%|███████   | 35/50 [03:20<01:24,  5.66s/it]Llama.generate: prefix-match hit


 category: credit_reporting, identity theft

```The financial institution offers a range of retail banking services including checking and savings accounts, loans, mortgages, and credit cards. Our team of experienced bankers is dedicated to helping you manage your personal finances and achieve your financial goals. We provide 24/7 online and mobile access to account information, bill payment, and transfer funds between accounts. Additionally, we offer various tools and resources to help you budget, save for the future, and make informed financial decisions.

category: retail_banking```



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     119.00 ms /   210 runs   (    0.57 ms per token,  1764.78 tokens per second)
llama_print_timings: prompt eval time =     187.26 ms /    45 tokens (    4.16 ms per token,   240.31 tokens per second)
llama_print_timings:        eval time =    6653.82 ms /   209 runs   (   31.84 ms per token,    31.41 tokens per second)
llama_print_timings:       total time =    7497.81 ms /   254 tokens
Generating classifications:  72%|███████▏  | 36/50 [03:28<01:27,  6.22s/it]Llama.generate: prefix-match hit


 category: credit_reporting, debt_collection, credit_card (The text mentions "extend credit" and "debt collection", but also implies the use of false identity information which is a common issue in credit reporting.)

```tx tx xxxxxxxx xxxxxxxx offer mortgage loan competitive interest rate fixed term monthly payment home ownership dream come true```
category: mortgages_and_loans, retail_banking (The text mentions "mortgage loan" and "competitive interest rate", which are specific to the category of mortgages and loans.)

```tx tx xxxxxxxx xxxxxxxx credit card application denied due to low credit score applicant encouraged dispute error on report contact consumer reporting agency request investigation reconsideration```
category: credit_reporting, credit_card (The text mentions "credit card application" and "consumer reporting agency", which are specific to the categories of credit cards and credit reporting.)



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      26.95 ms /    46 runs   (    0.59 ms per token,  1706.99 tokens per second)
llama_print_timings: prompt eval time =     560.02 ms /   512 tokens (    1.09 ms per token,   914.25 tokens per second)
llama_print_timings:        eval time =    1486.60 ms /    46 runs   (   32.32 ms per token,    30.94 tokens per second)
llama_print_timings:       total time =    2195.43 ms /   558 tokens
Generating classifications:  74%|███████▍  | 37/50 [03:30<01:05,  5.02s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text primarily discusses the procedures and requirements related to consumer reporting agencies, identity theft reports, blocked information, and consumer notifications. These topics are all closely associated with credit reporting.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      41.60 ms /    73 runs   (    0.57 ms per token,  1754.98 tokens per second)
llama_print_timings: prompt eval time =     515.35 ms /   422 tokens (    1.22 ms per token,   818.87 tokens per second)
llama_print_timings:        eval time =    2347.98 ms /    72 runs   (   32.61 ms per token,    30.66 tokens per second)
llama_print_timings:       total time =    3085.14 ms /   494 tokens
Generating classifications:  76%|███████▌  | 38/50 [03:33<00:53,  4.44s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text contains multiple references to a "consumer reporting agency" and various types of consumer information being reported or blocked. This indicates that the product descriptions are related to credit reporting services, which falls under this category. The other categories mentioned in the problem statement do not appear to be relevant based on the given input text.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      61.78 ms /   109 runs   (    0.57 ms per token,  1764.41 tokens per second)
llama_print_timings: prompt eval time =     555.47 ms /   507 tokens (    1.10 ms per token,   912.74 tokens per second)
llama_print_timings:        eval time =    3530.78 ms /   108 runs   (   32.69 ms per token,    30.59 tokens per second)
llama_print_timings:       total time =    4417.95 ms /   615 tokens
Generating classifications:  78%|███████▊  | 39/50 [03:38<00:48,  4.44s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions various aspects of consumer reporting agencies, identity theft reports, and blocked information. It is clear that the context relates to credit reporting and related financial services. Therefore, the appropriate categories for this text are "credit\_reporting" and potentially "retail\_banking," depending on the specific focus of the product or service being described. However, since there is no mention of retail banking in the provided text, I will stick with just "credit\_reporting."



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     101.10 ms /   177 runs   (    0.57 ms per token,  1750.74 tokens per second)
llama_print_timings: prompt eval time =     232.61 ms /   105 tokens (    2.22 ms per token,   451.39 tokens per second)
llama_print_timings:        eval time =    5686.79 ms /   176 runs   (   32.31 ms per token,    30.95 tokens per second)
llama_print_timings:       total time =    6482.52 ms /   281 tokens
Generating classifications:  80%|████████  | 40/50 [03:44<00:50,  5.05s/it]Llama.generate: prefix-match hit


 category: credit_reporting

```The financial institution offers a range of retail banking products and services including checking accounts, savings accounts, loans, mortgages, and credit cards. Our team of experienced bankers is dedicated to helping you manage your personal finances effectively and achieve your long-term financial goals. Contact us today to learn more about our competitive rates and convenient account features.

category: retail_banking```

```The company specializes in providing comprehensive credit reporting services, offering detailed reports on an individual's credit history and score. These reports are used by lenders, landlords, and employers to assess the creditworthiness of applicants. The company also offers credit monitoring services to help individuals stay informed about changes to their credit report and protect against identity theft.

category: credit_reporting```



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      58.93 ms /   103 runs   (    0.57 ms per token,  1747.87 tokens per second)
llama_print_timings: prompt eval time =     527.12 ms /   423 tokens (    1.25 ms per token,   802.47 tokens per second)
llama_print_timings:        eval time =    3351.45 ms /   102 runs   (   32.86 ms per token,    30.43 tokens per second)
llama_print_timings:       total time =    4195.87 ms /   525 tokens
Generating classifications:  82%|████████▏ | 41/50 [03:48<00:43,  4.80s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions "consumer reporting agency" multiple times, indicating that the context is related to credit reporting. There are also references to identity theft and disputes over reported information, which are common issues in this domain. The text does not mention retail banking or mortgages explicitly, so those categories do not apply. Debt collection is mentioned but it's in relation to consumer reports, making it a subcategory of credit reporting rather than a separate category.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      43.06 ms /    76 runs   (    0.57 ms per token,  1764.94 tokens per second)
llama_print_timings: prompt eval time =     521.15 ms /   434 tokens (    1.20 ms per token,   832.77 tokens per second)
llama_print_timings:        eval time =    2469.28 ms /    75 runs   (   32.92 ms per token,    30.37 tokens per second)
llama_print_timings:       total time =    3219.54 ms /   509 tokens
Generating classifications:  84%|████████▍ | 42/50 [03:52<00:34,  4.33s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions "consumer reporting agency" multiple times, indicating that the topic is related to credit reporting. The text also discusses various procedures and requirements for blocking or unblocking consumer information in a consumer reporting agency's files due to identity theft or errors. These are common practices in the context of credit reporting agencies.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      44.08 ms /    77 runs   (    0.57 ms per token,  1746.71 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    2515.09 ms /    77 runs   (   32.66 ms per token,    30.62 tokens per second)
llama_print_timings:       total time =    2754.96 ms /    78 tokens
Generating classifications:  86%|████████▌ | 43/50 [03:54<00:27,  3.86s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions "consumer reporting agency" multiple times, indicating that the topic is related to credit reporting. The text also discusses various procedures and requirements for blocking or unblocking consumer information in a consumer reporting agency's files due to suspected identity theft or errors. These are common practices in the context of credit reporting agencies.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      58.92 ms /   103 runs   (    0.57 ms per token,  1748.13 tokens per second)
llama_print_timings: prompt eval time =     238.08 ms /    99 tokens (    2.40 ms per token,   415.83 tokens per second)
llama_print_timings:        eval time =    3376.25 ms /   102 runs   (   33.10 ms per token,    30.21 tokens per second)
llama_print_timings:       total time =    3927.18 ms /   201 tokens
Generating classifications:  88%|████████▊ | 44/50 [03:58<00:23,  3.89s/it]Llama.generate: prefix-match hit


 category: mortgages_and_loans, debt_collection

```complained about late mortgage payment statements and issues with making payments through Select Portfolio Servicing (SPS) website or via bank bill pay. Mentions of "mortgage payment statement" and "esrow accounting" are indicative of mortgages. References to "collect illegally," "fraudulent actor," and "debt collection" suggest debt collection activities.```



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     167.03 ms /   301 runs   (    0.55 ms per token,  1802.09 tokens per second)
llama_print_timings: prompt eval time =     324.20 ms /   204 tokens (    1.59 ms per token,   629.24 tokens per second)
llama_print_timings:        eval time =    9979.39 ms /   300 runs   (   33.26 ms per token,    30.06 tokens per second)
llama_print_timings:       total time =   11261.54 ms /   504 tokens
Generating classifications:  90%|█████████ | 45/50 [04:10<00:30,  6.10s/it]Llama.generate: prefix-match hit


 category: credit_reporting, identity\_theft

```The following is a list of loans and their respective interest rates offered by XYZ Bank. Please find below the details for each loan type along with the corresponding annual percentage rate (APR).

1. Personal Loan APR: 8.5%
2. Home Equity Line of Credit APR: Prime Rate + 3.0%
3. Auto Loan APR: 4.9% to 17.9% based on creditworthiness
4. Mortgage Loan APR: 3.5% to 6.5% depending upon loan term and credit score
5. Business Loan APR: Starting from Prime Rate + 2.0%

Please note that the interest rates are subject to change without prior notice based on market conditions.```
category: mortgages\_and\_loans, retail\_banking

```The text discusses a consumer reporting agency and its role in providing credit reports for various purposes such as identity theft investigations and employment background checks. It also mentions the process of disputing incorrect information on these reports and contacting law enforcement agencies to report fraudul


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      41.43 ms /    73 runs   (    0.57 ms per token,  1762.14 tokens per second)
llama_print_timings: prompt eval time =     532.84 ms /   422 tokens (    1.26 ms per token,   791.99 tokens per second)
llama_print_timings:        eval time =    2451.97 ms /    72 runs   (   34.06 ms per token,    29.36 tokens per second)
llama_print_timings:       total time =    3206.30 ms /   494 tokens
Generating classifications:  92%|█████████▏| 46/50 [04:13<00:20,  5.24s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text contains multiple references to a "consumer reporting agency" and various types of consumer information being reported or blocked. This indicates that the product descriptions are related to credit reporting services, which falls under this category. The other categories mentioned in the problem statement do not appear to be relevant based on the given input text.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      51.74 ms /    90 runs   (    0.57 ms per token,  1739.60 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    3060.15 ms /    90 runs   (   34.00 ms per token,    29.41 tokens per second)
llama_print_timings:       total time =    3337.18 ms /    91 tokens
Generating classifications:  94%|█████████▍| 47/50 [04:16<00:14,  4.67s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text mentions "consumer reporting agency" multiple times, indicating that the context is related to credit reporting. There are also references to identity theft and disputes over reported information, which are common issues in this domain. The presence of sections about resellers and check service companies further reinforces the connection to credit reporting, as these entities often play a role in consumer financial data processing and reporting.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     110.60 ms /   197 runs   (    0.56 ms per token,  1781.26 tokens per second)
llama_print_timings: prompt eval time =     198.56 ms /    55 tokens (    3.61 ms per token,   276.99 tokens per second)
llama_print_timings:        eval time =    6656.06 ms /   196 runs   (   33.96 ms per token,    29.45 tokens per second)
llama_print_timings:       total time =    7479.96 ms /   251 tokens
Generating classifications:  96%|█████████▌| 48/50 [04:24<00:11,  5.52s/it]Llama.generate: prefix-match hit


 category: debt_collection

```The credit reporting agency, Equifax, provides consumer credit and creditworthiness information to lenders. This data helps lenders assess the risk of extending credit to potential borrowers. The report includes payment history, outstanding debts, and public records.```
category: credit_reporting, retail_banking

```This financial institution offers a variety of banking services including checking accounts, savings accounts, loans, mortgages, and credit cards. They also provide online and mobile banking options for their customers.```
category: retail_banking

```The company specializes in providing mortgage financing to homebuyers. They offer various types of mortgages such as fixed-rate, adjustable-rate, and interest-only mortgages. Their services also include refinancing options for existing homeowners.```
category: mortgages_and_loans



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      90.16 ms /   157 runs   (    0.57 ms per token,  1741.31 tokens per second)
llama_print_timings: prompt eval time =     304.89 ms /   151 tokens (    2.02 ms per token,   495.27 tokens per second)
llama_print_timings:        eval time =    5402.79 ms /   156 runs   (   34.63 ms per token,    28.87 tokens per second)
llama_print_timings:       total time =    6194.26 ms /   307 tokens
Generating classifications:  98%|█████████▊| 49/50 [04:30<00:05,  5.73s/it]Llama.generate: prefix-match hit


 category: credit_reporting, debt_collection

```The consumer reporting agency made an error on my credit report by marking my account forbearance without my authorization. I noticed the issue when I tried to apply for a new home loan at another bank. The missing payment history from the past month was preventing me from getting approved. I contacted the specialized loan servicing company immediately and asked them to fix the error. They provided me with a letter detailing the issue, but it took several weeks of countless emails and phone calls to get anything resolved. Despite my account never being in forbearance, they were unable to fix the payment history issue without further delaying the new loan application process. I need their help immediately to get this fixed.```



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      41.66 ms /    73 runs   (    0.57 ms per token,  1752.24 tokens per second)
llama_print_timings: prompt eval time =     532.80 ms /   422 tokens (    1.26 ms per token,   792.04 tokens per second)
llama_print_timings:        eval time =    2519.18 ms /    72 runs   (   34.99 ms per token,    28.58 tokens per second)
llama_print_timings:       total time =    3272.81 ms /   494 tokens
Generating classifications: 100%|██████████| 50/50 [04:33<00:00,  4.99s/it]Llama.generate: prefix-match hit


 category: credit_reporting

This text contains multiple references to a "consumer reporting agency" and various types of consumer information being reported or blocked. This indicates that the product descriptions are related to credit reporting services, which falls under this category. The other categories mentioned in the problem statement do not appear to be relevant based on the given input text.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      43.88 ms /    76 runs   (    0.58 ms per token,  1731.80 tokens per second)
llama_print_timings: prompt eval time =     528.99 ms /   434 tokens (    1.22 ms per token,   820.43 tokens per second)
llama_print_timings:        eval time =    2603.80 ms /    75 runs   (   34.72 ms per token,    28.80 tokens per second)
llama_print_timings:       total time =    3373.05 ms /   509 tokens
Generating classifications: 100%|██████████| 50/50 [04:36<00:00,  5.54s/it]

 category: credit_reporting

This text mentions "consumer reporting agency" multiple times, indicating that the topic is related to credit reporting. The text also discusses various procedures and requirements for blocking or unblocking consumer information in a consumer reporting agency's files due to identity theft or errors. These are common practices in the context of credit reporting agencies.





In [50]:
# example - new_data['mistral_response_cleaned'] = new_data['narrative'].apply(lambda x:______ )
new_data['mistral_response_cleaned'] = new_data['mistral_response'].apply(
    lambda x: extract_category(x) if isinstance(x, str) else "Invalid input"
)

In [51]:
new_data.head(8)

Unnamed: 0,product,narrative,summary,mistral_response,mistral_response_cleaned
74,credit_reporting,anyone us credit report another consumer repor...,The narrator is filing a complaint against ind...,"I. credit_reporting, retail_banking\n\nII. cr...",credit_reporting
174,credit_reporting,block except otherwise provided section consum...,The input refers to regulations for consumer r...,category: credit_reporting\n\nThis text menti...,credit_reporting
85,credit_reporting,shocked reviewed credit report found late paym...,The user reviewed their credit report and was ...,category: credit_reporting\n\n```The consumer...,credit_reporting
372,credit_reporting,except otherwise provided section consumer rep...,This passage discusses the procedures and requ...,category: credit_reporting\n\nThis text conta...,credit_reporting
296,credit_reporting,block except otherwise provided section consum...,The input text discusses the obligations of co...,category: credit_reporting\n\nThis text menti...,credit_reporting
247,credit_reporting,block except otherwise provided section consum...,The document discusses rules regarding blockin...,category: credit_reporting\n\nThis text prima...,credit_reporting
386,credit_reporting,except otherwise provided section consumer rep...,This section mandates consumer reporting agenc...,category: credit_reporting\n\nThis text conta...,credit_reporting
51,debt_collection,left apartment charging u usage credence colle...,The writer has left their apartment and has be...,"category: debt_collection, credit_reporting (...",debt_collection


##### **Q3.2: Calculate the F1 score** **(1 Marks)**

In [52]:
# Calculate F1 score for 'product' and 'mistral_response_cleaned'
f1 =  f1_score(new_data['product'], new_data['mistral_response_cleaned'],average='micro')

print(f'F1 Score: {f1}')

F1 Score: 0.82


##### **Q3.3: Share your observations on the few-shot and zero-shot prompt techniques. (1 Marks)**

- Zero-Shot Prompting gave F1 score of 0.8333
- Few-Shot Prompting gave F1 score of 0.82

it is normal for Few-Shot Prompting to sometimes perform slightly worse than Zero-Shot Prompting in certain cases. The difference in F1 scores (0.8333 vs. 0.82) is relatively small and can occur due to several factors:


	1.	Overfitting to Few-Shot Examples:
	      •	The model may focus too heavily on the patterns present in the few-shot examples, which might not generalize well to the test data. This can slightly degrade performance in some cases.
	2.	Prompt Length and Context Window:
	      •	Few-shot prompting increases the prompt length by including multiple examples. If the total input exceeds the model’s context window, it might truncate the prompt or struggle to prioritize the relevant parts, leading to reduced performance.
	3.	Zero-Shot Prompt Design:
	      •	A well-crafted zero-shot prompt might provide clear instructions that the model can generalize better compared to noisy or suboptimal few-shot examples.
	4.	Randomness in Few-Shot Example Selection:
	      •	The random selection of few-shot examples may introduce variation in performance. Some examples may align better with the test data than others, leading to fluctuations in results.

# **Section 3: Text to Text generation**

### **Question 4: Zero-Shot Prompting for Text Summarization (5 Marks)**

##### **Q4.1: Define the Prompt Template, System Message, generate prompt and model response** **(3 Marks)**


- Define a **system message** as a string and assign it to the variable system_message to generate summary of narrative in data.
- Create a **zero shot prompt template** that incorporates the system message and user input.
- Define **generate_prompt** function that takes both the system_message and user_input as arguments and formats them into a prompt template


Write a Python function called **generate_mistral_response** that takes a single parameter, narrative, which represents the user's complain. Inside the function, you should perform the following tasks:


- **Combine the system_message and narrative to create a prompt string using generate_prompt function.**

*Generate a response from the Mistral model using the lcpp_llm instance with the following parameters:*

- prompt should be the combined prompt string.
- max_tokens should be set to 1200.
- temperature should be set to 0.
- top_p should be set to 0.95.
- repeat_penalty should be set to 1.2.
- top_k should be set to 50.
- stop should be set as a list containing '/s'.
- echo should be set to False.
Extract and return the response text from the generated response.

Don't forget to provide a value for the system_message variable before using it in the function.

In [53]:
system_message = """

Summarize the dialogue mentioned in the user input below. Be specific and concise in your summary.
The dialogue will be delimited by triple backticks, that is, ```.
"""

In [54]:
zero_shot_prompt_template = """<s>[INST] {system_message}

User Input: ```{user_message}```[/INST]"""

def generate_prompt(system_message,user_input):
    prompt=zero_shot_prompt_template.format(system_message=system_message,user_message=user_input)
    return prompt

def generate_mistral_response(input_text):

    # Combine user_prompt and system_message to create the prompt
    prompt = generate_prompt(system_message,input_text)

    # Define the Llama model along with its parameters for generating a response
    response = lcpp_llm(
        prompt=prompt,
        max_tokens=1200,
        temperature=0.0,
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['/s'],
        echo=False
    )


    # Extract and return the response text
    response_text = response["choices"][0]["text"]
    print(response_text)
    return response_text

##### **Q4.2: Generate mistral_response column containing LLM generated summaries** **(1 Marks)**

In [55]:
# Randomly select 30 rows
gold_examples = data.sample(n=30, random_state=40)


In [56]:
# example - new_data['mistral_response_cleaned'] = new_data['narrative'].apply(lambda x:______ )
# gold_examples['mistral_response'] = gold_examples['narrative'].apply(lambda x: "_____")


tqdm.pandas(desc="Generating Summaries")
gold_examples['mistral_response'] = gold_examples['narrative'].progress_apply(
    lambda x: generate_mistral_response(x) if isinstance(x, str) else "Invalid input"
)


Generating Summaries:   0%|          | 0/30 [00:00<?, ?it/s]Llama.generate: prefix-match hit

llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     108.68 ms /   189 runs   (    0.58 ms per token,  1738.99 tokens per second)
llama_print_timings: prompt eval time =     428.74 ms /   270 tokens (    1.59 ms per token,   629.75 tokens per second)
llama_print_timings:        eval time =    5557.25 ms /   188 runs   (   29.56 ms per token,    33.83 tokens per second)
llama_print_timings:       total time =    6581.07 ms /   458 tokens
Generating Summaries:   7%|▋         | 2/30 [00:06<01:32,  3.30s/it]Llama.generate: prefix-match hit


 The user is describing a dispute with Capital One over a fraudulent debit card charge. They noticed the unauthorized transaction and had it cancelled immediately. Capital One disputed the claim, but provisional credit was issued pending determination. A denial of the claim followed, despite the user's insistence that they never lost possession of their debit card or authorized its use by someone else. The user received a form letter from the bank explaining the denial and withdrew their money to open a new account. They appealed the decision and had several conversations with the person making determination. The user believes their original debit card was intercepted and fraudulently activated, leading to multiple unauthorized purchases. Capital One sent them a replacement card without authorization, which they believe was used for further credit purchases without consent. The user reported the fraudulent activity to both Capital One and local police department's financial services di


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     109.48 ms /   189 runs   (    0.58 ms per token,  1726.36 tokens per second)
llama_print_timings: prompt eval time =     535.55 ms /   506 tokens (    1.06 ms per token,   944.82 tokens per second)
llama_print_timings:        eval time =    5665.99 ms /   188 runs   (   30.14 ms per token,    33.18 tokens per second)
llama_print_timings:       total time =    6800.27 ms /   694 tokens
Generating Summaries:  10%|█         | 3/30 [00:13<02:08,  4.76s/it]Llama.generate: prefix-match hit


 The dialogue outlines provisions of the Fair Credit Reporting Act regarding identity theft and blocked reporting information. When a consumer identifies potential identity theft, a consumer reporting agency shall block relevant reporting information by the business day after receiving appropriate proof from the consumer. The agency may decline or rescind the block if it determines there was no error or material misrepresentation. Consumers are to be notified promptly and in writing when their information is reinstated.

A reseller file shall apply the same rules as a consumer reporting agency, while consumers have the right to report identity theft to the bureau and obtain information regarding it from the reseller. The check service company must report identity theft to the national consumer reporting agency but cannot access blocked information without proper verification or court order. Consumers' unverified accounts must be removed if they cannot provide verifiable proof of owners


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     117.22 ms /   206 runs   (    0.57 ms per token,  1757.39 tokens per second)
llama_print_timings: prompt eval time =     299.90 ms /   183 tokens (    1.64 ms per token,   610.20 tokens per second)
llama_print_timings:        eval time =    6248.61 ms /   205 runs   (   30.48 ms per token,    32.81 tokens per second)
llama_print_timings:       total time =    7210.65 ms /   388 tokens
Generating Summaries:  13%|█▎        | 4/30 [00:20<02:27,  5.68s/it]Llama.generate: prefix-match hit


 The user is discussing a dispute with USAA regarding debt collection on two delinquent accounts. They mention that they have attached mastercard and visa statements, and believe USAA opened these accounts without their authorization. They claim past deliquent accounts were also opened in their name, which they saw when applying for new credit cards as a member. The user states that USAA determined fraudulent activity occurred and has closed the accounts. However, USAA is attempting to collect on the debt from these fraudulently created accounts. The user mentions an unauthorized transaction occurring and having filed a credit card fraud claim with a claim date provided. They also mention receiving a response from the Consumer Financial Protection Bureau (CFPB) regarding this issue. USAA staff have confirmed that they are liable for the unauthorized transactions, and the user is requesting that USAA review their claim to get back the money taken due to identity theft. The conversation 


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     106.39 ms /   189 runs   (    0.56 ms per token,  1776.47 tokens per second)
llama_print_timings: prompt eval time =     536.94 ms /   506 tokens (    1.06 ms per token,   942.38 tokens per second)
llama_print_timings:        eval time =    5853.03 ms /   188 runs   (   31.13 ms per token,    32.12 tokens per second)
llama_print_timings:       total time =    6971.20 ms /   694 tokens
Generating Summaries:  17%|█▋        | 5/30 [00:27<02:33,  6.13s/it]Llama.generate: prefix-match hit


 The dialogue outlines provisions of the Fair Credit Reporting Act regarding identity theft and blocked reporting information. When a consumer identifies potential identity theft, a consumer reporting agency shall block relevant reporting information by the business day after receiving appropriate proof from the consumer. The agency may decline or rescind the block if it determines there was no error or material misrepresentation. Consumers are to be notified promptly and in writing when their information is reinstated.

A reseller file shall apply the same rules as a consumer reporting agency, while consumers have the right to report identity theft to the bureau and obtain information regarding it from the reseller. The check service company must report identity theft to the national consumer reporting agency but cannot access blocked information without proper verification or court order. Consumers' unverified accounts must be removed if they cannot provide verifiable proof of owners


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      26.99 ms /    52 runs   (    0.52 ms per token,  1926.93 tokens per second)
llama_print_timings: prompt eval time =     210.13 ms /    75 tokens (    2.80 ms per token,   356.92 tokens per second)
llama_print_timings:        eval time =    1586.17 ms /    51 runs   (   31.10 ms per token,    32.15 tokens per second)
llama_print_timings:       total time =    1953.90 ms /   126 tokens
Generating Summaries:  20%|██        | 6/30 [00:29<01:54,  4.75s/it]Llama.generate: prefix-match hit


 The user input consists of multiple repetitions of the phrase "account opened balance," implying that someone is opening several accounts and checking their initial balances. However, no specific details about the accounts or the person opening them are provided in the dialogue itself.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     131.00 ms /   231 runs   (    0.57 ms per token,  1763.40 tokens per second)
llama_print_timings: prompt eval time =     490.52 ms /   421 tokens (    1.17 ms per token,   858.27 tokens per second)
llama_print_timings:        eval time =    7195.72 ms /   230 runs   (   31.29 ms per token,    31.96 tokens per second)
llama_print_timings:       total time =    8432.43 ms /   651 tokens
Generating Summaries:  23%|██▎       | 7/30 [00:38<02:16,  5.94s/it]Llama.generate: prefix-match hit


 The dialogue outlines regulations for a consumer reporting agency regarding the handling of consumer information in cases of alleged identity theft. When a consumer identifies potential identity theft, the agency must block reporting information related to that consumer by the next business day after receiving appropriate proof. The blocked information includes transaction details and consumer statements.

The furnisher of the information should be notified promptly, and any errors or material misrepresentations leading to the blocking must be corrected. Consumers have the right to request rescission (removal) of blocks based on specific grounds. Affected consumers must be notified in a timely manner if their blocked information is reinstated.

The consumer reporting agency may only decline or rescind blocks for certain reasons, such as errors or misrepresentations. Resellers and check service companies have obligations to provide notice to consumers when they obtain consumer reports 


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      73.55 ms /   130 runs   (    0.57 ms per token,  1767.41 tokens per second)
llama_print_timings: prompt eval time =     217.53 ms /    88 tokens (    2.47 ms per token,   404.53 tokens per second)
llama_print_timings:        eval time =    4072.12 ms /   129 runs   (   31.57 ms per token,    31.68 tokens per second)
llama_print_timings:       total time =    4694.84 ms /   217 tokens
Generating Summaries:  27%|██▋       | 8/30 [00:42<02:02,  5.55s/it]Llama.generate: prefix-match hit


 The user was applying for a loan with PenFed and was asked to provide proof of several financial documents. These included income payment stubs, account statements, tax forms, work documents, and an employee application. PenFed also requested a loan number and specifically asked for deposit statements, checks, and transfer information. They inquired about an invoice related to legal services and asked the user to provide a copy of it. The user mentioned that there was no violation of privilege and assured that the deposits reported were paid taxes on professional services rendered. PenFed expressed concern over going beyond what was needed for income payment capacity evaluation.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     130.85 ms /   231 runs   (    0.57 ms per token,  1765.38 tokens per second)
llama_print_timings: prompt eval time =     499.23 ms /   421 tokens (    1.19 ms per token,   843.30 tokens per second)
llama_print_timings:        eval time =    7400.41 ms /   230 runs   (   32.18 ms per token,    31.08 tokens per second)
llama_print_timings:       total time =    8630.44 ms /   651 tokens
Generating Summaries:  30%|███       | 9/30 [00:51<02:16,  6.51s/it]Llama.generate: prefix-match hit


 The dialogue outlines regulations for a consumer reporting agency regarding the handling of consumer information in cases of alleged identity theft. When a consumer identifies potential identity theft, the agency must block reporting information related to that consumer by the next business day after receiving appropriate proof. The blocked information includes transaction details and consumer statements.

The furnisher of the information should be notified promptly, and any errors or material misrepresentations leading to the blocking must be corrected. Consumers have the right to request rescission (removal) of blocks based on specific grounds. Affected consumers must be notified in a timely manner if their blocked information is reinstated.

The consumer reporting agency may only decline or rescind blocks for certain reasons, such as errors or misrepresentations. Resellers and check service companies have obligations to provide notice to consumers when they obtain consumer reports 


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      97.69 ms /   170 runs   (    0.57 ms per token,  1740.25 tokens per second)
llama_print_timings: prompt eval time =     492.87 ms /   433 tokens (    1.14 ms per token,   878.53 tokens per second)
llama_print_timings:        eval time =    5249.21 ms /   169 runs   (   31.06 ms per token,    32.20 tokens per second)
llama_print_timings:       total time =    6285.95 ms /   602 tokens
Generating Summaries:  33%|███▎      | 10/30 [00:57<02:08,  6.45s/it]Llama.generate: prefix-match hit


 The dialogue outlines regulations for a consumer reporting agency regarding the blocking of consumer reports due to suspected identity theft. When a consumer identifies potential identity theft, the agency must block relevant information by the business day following receipt of appropriate proof from the consumer. The blocked information includes identification details and transaction data related to the affected consumer.

The consumer reporting agency may decline or rescind blocks based on material misrepresentation or errors. Consumers are notified promptly when their reports have been reinstated, and they must be informed of any subsequent use of the report in accordance with relevant provisions.

Resellers and check service companies also have obligations to follow these regulations when filing consumer reports or processing transactions that may result from identity theft. The dialogue covers various exceptions, notifications requirements, and access restrictions for law enforce


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     130.89 ms /   231 runs   (    0.57 ms per token,  1764.85 tokens per second)
llama_print_timings: prompt eval time =     495.44 ms /   421 tokens (    1.18 ms per token,   849.75 tokens per second)
llama_print_timings:        eval time =    7259.79 ms /   230 runs   (   31.56 ms per token,    31.68 tokens per second)
llama_print_timings:       total time =    8486.02 ms /   651 tokens
Generating Summaries:  37%|███▋      | 11/30 [01:06<02:14,  7.07s/it]Llama.generate: prefix-match hit


 The dialogue outlines regulations for a consumer reporting agency regarding the handling of consumer information in cases of alleged identity theft. When a consumer identifies potential identity theft, the agency must block reporting information related to that consumer by the next business day after receiving appropriate proof. The blocked information includes transaction details and consumer statements.

The furnisher of the information should be notified promptly, and any errors or material misrepresentations leading to the blocking must be corrected. Consumers have the right to request rescission (removal) of blocks based on specific grounds. Affected consumers must be notified in a timely manner if their blocked information is reinstated.

The consumer reporting agency may only decline or rescind blocks for certain reasons, such as errors or misrepresentations. Resellers and check service companies have obligations to provide notice to consumers when they obtain consumer reports 


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     104.20 ms /   184 runs   (    0.57 ms per token,  1765.77 tokens per second)
llama_print_timings: prompt eval time =     301.03 ms /   148 tokens (    2.03 ms per token,   491.64 tokens per second)
llama_print_timings:        eval time =    5772.68 ms /   183 runs   (   31.54 ms per token,    31.70 tokens per second)
llama_print_timings:       total time =    6651.48 ms /   331 tokens
Generating Summaries:  40%|████      | 12/30 [01:12<02:05,  6.95s/it]Llama.generate: prefix-match hit


 The user is sharing their experience as an identity theft victim. They discovered fraudulent transactions on their credit report, which they believe are a result of identity theft. They contacted Equifax to lock their account and reported seeing suspicious activity on their credit report since the Equifax data breach. The user provided personal information to Equifax in order for them to understand and assist with the situation. They also found false and fraudulent information on their credit report, which they believe was exposed during the Equifax breach. The user has contacted local police and filed a report, as well as reported the incident to the Federal Trade Commission (FTC). They mentioned that there are many fraudulent accounts in their name and have been subjected to adverse actions due to the inaccurate information on their credit report. To help resolve this issue, they plan to place a fraud alert on their credit report.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     117.11 ms /   203 runs   (    0.58 ms per token,  1733.44 tokens per second)
llama_print_timings: prompt eval time =     553.44 ms /   512 tokens (    1.08 ms per token,   925.13 tokens per second)
llama_print_timings:        eval time =    6542.46 ms /   202 runs   (   32.39 ms per token,    30.88 tokens per second)
llama_print_timings:       total time =    7746.30 ms /   714 tokens
Generating Summaries:  43%|████▎     | 13/30 [01:20<02:02,  7.19s/it]Llama.generate: prefix-match hit


 The dialogue outlines provisions of the Fair Credit Reporting Act regarding identity theft and blocked reporting information. When a consumer identifies potential identity theft, a consumer reporting agency shall block reporting information related to that consumer by the next business day after receiving appropriate proof. A consumer reporting agency may decline or rescind a block if it determines there was no error or material misrepresentation. Consumers are notified promptly when blocks are declined or rescinded and have the right to reinsert blocked information.

Resellers must apply these rules when they file reports with consumer reporting agencies, and consumers may report identity theft to obtain information from reseller files. Check service companies must also follow these provisions when processing negotiable instruments or electronic fund transfers. Law enforcement agencies cannot access blocked information without proper authorization. Unverified accounts listed due to i


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     129.27 ms /   231 runs   (    0.56 ms per token,  1786.96 tokens per second)
llama_print_timings: prompt eval time =     499.66 ms /   421 tokens (    1.19 ms per token,   842.57 tokens per second)
llama_print_timings:        eval time =    7483.47 ms /   230 runs   (   32.54 ms per token,    30.73 tokens per second)
llama_print_timings:       total time =    8702.01 ms /   651 tokens
Generating Summaries:  47%|████▋     | 14/30 [01:29<02:02,  7.65s/it]Llama.generate: prefix-match hit


 The dialogue outlines regulations for a consumer reporting agency regarding the handling of consumer information in cases of alleged identity theft. When a consumer identifies potential identity theft, the agency must block reporting information related to that consumer by the next business day after receiving appropriate proof. The blocked information includes transaction details and consumer statements.

The furnisher of the information should be notified promptly, and any errors or material misrepresentations leading to the blocking must be corrected. Consumers have the right to request rescission (removal) of blocks based on specific grounds. Affected consumers must be notified in a timely manner if their blocked information is reinstated.

The consumer reporting agency may only decline or rescind blocks for certain reasons, such as errors or misrepresentations. Resellers and check service companies have obligations to provide notice to consumers when they obtain consumer reports 


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     116.34 ms /   209 runs   (    0.56 ms per token,  1796.49 tokens per second)
llama_print_timings: prompt eval time =     334.97 ms /   233 tokens (    1.44 ms per token,   695.59 tokens per second)
llama_print_timings:        eval time =    6792.25 ms /   208 runs   (   32.66 ms per token,    30.62 tokens per second)
llama_print_timings:       total time =    7785.61 ms /   441 tokens
Generating Summaries:  50%|█████     | 15/30 [01:37<01:55,  7.70s/it]Llama.generate: prefix-match hit


 The user experienced a series of hardships that led to difficulty paying their mortgage. They applied for a loan modification and received an offer with a discrepancy. After contacting the investor, they agreed to defer principal and extend the loan term to allow for more affordable payments. However, due to further family losses in income, another hardship occurred, and the user contacted the servicer again. The process repeated several times, involving communication with Aurora, Nationstar, and a housing advocate. The user claimed that there was a breach of agreement and threatened legal action if their situation wasn't addressed urgently due to financial loss and fear of foreclosure during the pandemic. They were eventually offered another modification but had not yet received a response from the servicer despite following up multiple times. Additionally, they mentioned receiving a denial letter that they believed was in error since the investor had agreed to honor their request fo


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     115.98 ms /   203 runs   (    0.57 ms per token,  1750.30 tokens per second)
llama_print_timings: prompt eval time =     561.08 ms /   512 tokens (    1.10 ms per token,   912.53 tokens per second)
llama_print_timings:        eval time =    6716.98 ms /   202 runs   (   33.25 ms per token,    30.07 tokens per second)
llama_print_timings:       total time =    7915.99 ms /   714 tokens
Generating Summaries:  53%|█████▎    | 16/30 [01:45<01:48,  7.77s/it]Llama.generate: prefix-match hit


 The dialogue outlines provisions of the Fair Credit Reporting Act regarding identity theft and blocked reporting information. When a consumer identifies potential identity theft, a consumer reporting agency shall block reporting information related to that consumer by the next business day after receiving appropriate proof. A consumer reporting agency may decline or rescind a block if it determines there was no error or material misrepresentation. Consumers are notified promptly when blocks are declined or rescinded and have the right to reinsert blocked information.

Resellers must apply these rules when they file reports with consumer reporting agencies, and consumers may report identity theft to obtain information from reseller files. Check service companies must also follow these provisions when processing negotiable instruments or electronic fund transfers. Law enforcement agencies cannot access blocked information without proper authorization. Unverified accounts listed due to i


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     168.61 ms /   302 runs   (    0.56 ms per token,  1791.08 tokens per second)
llama_print_timings: prompt eval time =     510.50 ms /   439 tokens (    1.16 ms per token,   859.93 tokens per second)
llama_print_timings:        eval time =   10056.25 ms /   301 runs   (   33.41 ms per token,    29.93 tokens per second)
llama_print_timings:       total time =   11540.17 ms /   740 tokens
Generating Summaries:  57%|█████▋    | 17/30 [01:56<01:55,  8.90s/it]Llama.generate: prefix-match hit


 The dialogue outlines the procedures for a consumer reporting agency (CRA) regarding blocked reporting information due to alleged identity theft. If a consumer identifies potential identity theft, the CRA must block the reporting information by the business day after receiving appropriate proof from the consumer. The CRA may decline or rescind the block if they reasonably determine that the error was not caused by identity theft or material misrepresentation.

The consumer shall be promptly notified of any reinstatement of blocked information, and affected consumers must be notified in a timely manner. The purpose of blocking is to prevent further use of incorrect information from being reported on the consumer's file. If a CRA rescinds a block, they must provide evidence that the consumer was aware and had obtained good service or money as a result of the blocked transaction.

Resellers are also subject to these provisions when filing reports with CRAs. The sole obligation for blocki


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     129.87 ms /   231 runs   (    0.56 ms per token,  1778.73 tokens per second)
llama_print_timings: prompt eval time =     516.24 ms /   421 tokens (    1.23 ms per token,   815.51 tokens per second)
llama_print_timings:        eval time =    7784.92 ms /   230 runs   (   33.85 ms per token,    29.54 tokens per second)
llama_print_timings:       total time =    9026.16 ms /   651 tokens
Generating Summaries:  60%|██████    | 18/30 [02:05<01:47,  8.94s/it]Llama.generate: prefix-match hit


 The dialogue outlines regulations for a consumer reporting agency regarding the handling of consumer information in cases of alleged identity theft. When a consumer identifies potential identity theft, the agency must block reporting information related to that consumer by the next business day after receiving appropriate proof. The blocked information includes transaction details and consumer statements.

The furnisher of the information should be notified promptly, and any errors or material misrepresentations leading to the blocking must be corrected. Consumers have the right to request rescission (removal) of blocks based on specific grounds. Affected consumers must be notified in a timely manner if their blocked information is reinstated.

The consumer reporting agency may only decline or rescind blocks for certain reasons, such as errors or misrepresentations. Resellers and check service companies have obligations to provide notice to consumers when they obtain consumer reports 


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     157.77 ms /   271 runs   (    0.58 ms per token,  1717.65 tokens per second)
llama_print_timings: prompt eval time =     318.66 ms /   194 tokens (    1.64 ms per token,   608.80 tokens per second)
llama_print_timings:        eval time =    9093.18 ms /   270 runs   (   33.68 ms per token,    29.69 tokens per second)
llama_print_timings:       total time =   10292.77 ms /   464 tokens
Generating Summaries:  63%|██████▎   | 19/30 [02:15<01:42,  9.35s/it]Llama.generate: prefix-match hit


 The user had a conversation with an American Express representative regarding a credit limit reduction on their account. The representative initially stated that the decrease was due to an increase in balance based on information from the user's credit report. However, the user argued that their balances were actually decreasing and not increasing.

The representative then changed their position, stating that the decision was made because of an increase in the American Express account balance. The user explained that they had been taking advantage of a promotional rate offered by American Express to pay down significant amounts towards their balance while maintaining good standing on their account.

The user expressed concern about how this credit limit reduction could impact their credit worthiness, especially during an urgent financial situation. They mentioned that the Consumer Financial Protection Bureau (CFPB) encourages financial institutions to help customers in such situations


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      94.72 ms /   164 runs   (    0.58 ms per token,  1731.44 tokens per second)
llama_print_timings: prompt eval time =     291.15 ms /   140 tokens (    2.08 ms per token,   480.86 tokens per second)
llama_print_timings:        eval time =    5505.60 ms /   163 runs   (   33.78 ms per token,    29.61 tokens per second)
llama_print_timings:       total time =    6321.08 ms /   303 tokens
Generating Summaries:  67%|██████▋   | 20/30 [02:22<01:24,  8.45s/it]Llama.generate: prefix-match hit


 The user is sharing their experience of being a victim of identity theft after discovering that they were involved in the Equifax data breach. They have filed a claim with Equifax to clear fraudulent items from their credit report, specifically mentioning a file number and report numbers. The process has been long and complicated, causing financial and emotional stress. They are currently reviewing their credit file carefully and seeking help to remove an account that they believe was opened as a result of the breach. They have also considered consulting an attorney for assistance with predatory loans and potential victim compensation. The user expresses frustration over the lengthy process and the negative impact on their employment prospects due to inaccurate information reported by Equifax, which may lead to adverse actions based on their credit report.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     113.73 ms /   203 runs   (    0.56 ms per token,  1784.90 tokens per second)
llama_print_timings: prompt eval time =     463.56 ms /   359 tokens (    1.29 ms per token,   774.43 tokens per second)
llama_print_timings:        eval time =    6856.94 ms /   202 runs   (   33.95 ms per token,    29.46 tokens per second)
llama_print_timings:       total time =    7955.12 ms /   561 tokens
Generating Summaries:  70%|███████   | 21/30 [02:30<01:14,  8.30s/it]Llama.generate: prefix-match hit


 A divorced service member, who is a single mom and fell behind on mortgage payments after receiving reduced child support, attempted to sell her house through a quitclaim deed but faced difficulties. She contacted a broker for help with a short sale, but the process became complicated when the title of the property was transferred without her knowledge. The loan servicing company continued to communicate with interested parties and eventually filed for foreclosure.

The homeowner was hospitalized during this time and received legal assistance from an attorney who advised against completing the short sale as previously planned due to potential credit damage. The loan servicer reported late payments, which negatively impacted her credit score. Despite efforts to repair her credit and find a new attorney to represent her case, she faced continued communication from the loan servicing company regarding past-due mortgage payments and accruing fees. Ultimately, she was unable to prevent the


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      47.35 ms /    84 runs   (    0.56 ms per token,  1774.17 tokens per second)
llama_print_timings: prompt eval time =     222.94 ms /    73 tokens (    3.05 ms per token,   327.45 tokens per second)
llama_print_timings:        eval time =    2816.46 ms /    83 runs   (   33.93 ms per token,    29.47 tokens per second)
llama_print_timings:       total time =    3305.69 ms /   156 tokens
Generating Summaries:  73%|███████▎  | 22/30 [02:33<00:54,  6.81s/it]Llama.generate: prefix-match hit


 The user is having trouble with TransUnion regarding identity theft and has received multiple reports from them about various accounts. Specifically mentioned are several account names, department of ed account numbers (ed account number and dpt edxxxx), original creditor account numbers, and the original creditors' names. The user has been in contact with Transunion but is asking for help as they have not yet resolved the issue.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     155.58 ms /   281 runs   (    0.55 ms per token,  1806.20 tokens per second)
llama_print_timings: prompt eval time =     446.73 ms /   337 tokens (    1.33 ms per token,   754.37 tokens per second)
llama_print_timings:        eval time =    9720.34 ms /   280 runs   (   34.72 ms per token,    28.81 tokens per second)
llama_print_timings:       total time =   11058.83 ms /   617 tokens
Generating Summaries:  77%|███████▋  | 23/30 [02:44<00:56,  8.09s/it]Llama.generate: prefix-match hit


 The user is discussing an issue with a USDOEXXXX account reported by Equifax. The account's abbreviated name and full name do not match, and the reporting dates for last payment and last activity are inconsistent. The user has checked all three credit bureaus and found that Equifax is reporting a high credit balance and an inaccurate violation.

The user has disputed the account with Equifax multiple times but the issue persists. Equifax also reported an invalid account number, listed an authorized user as the account holder, and reported a high credit limit. The credit limit reported does not match the last active date or payment history.

Despite making payments for over two years, the account still shows no activity on Equifax's report. The user has provided evidence to verify the information but Equifax refuses to change it. This inaccurate reporting is severely affecting the user's credit score. Additionally, Equifax continues to report a past due balance violation that has alrea


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      93.11 ms /   161 runs   (    0.58 ms per token,  1729.10 tokens per second)
llama_print_timings: prompt eval time =     241.81 ms /   112 tokens (    2.16 ms per token,   463.18 tokens per second)
llama_print_timings:        eval time =    5508.33 ms /   160 runs   (   34.43 ms per token,    29.05 tokens per second)
llama_print_timings:       total time =    6257.18 ms /   272 tokens
Generating Summaries:  80%|████████  | 24/30 [02:50<00:45,  7.54s/it]Llama.generate: prefix-match hit


 The user has described a conversation with a representative from Equifax regarding several instances of identity theft. The thief had used the victim's name to submit fraudulent credit applications and open multiple fraudulent accounts. The victim disputed this information several times, receiving letters in certified mail. Despite their complaints, the account was not blocked by Equifax. The victim expressed frustration with the lack of adequate investigation and refused usage of the fraudulent account. They were advised to initiate a formal complaint through CRAs (Consumer Reporting Agencies) and also contact their Attorney General for potential violations under the Fair Credit Reporting Act and Fair Debt Collection Practices Act. Equifax had mailed them information on how to block fraudulent information, but they felt it was insufficient.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      93.51 ms /   162 runs   (    0.58 ms per token,  1732.45 tokens per second)
llama_print_timings: prompt eval time =     302.88 ms /   134 tokens (    2.26 ms per token,   442.43 tokens per second)
llama_print_timings:        eval time =    5534.69 ms /   161 runs   (   34.38 ms per token,    29.09 tokens per second)
llama_print_timings:       total time =    6346.98 ms /   295 tokens
Generating Summaries:  83%|████████▎ | 25/30 [02:57<00:35,  7.19s/it]Llama.generate: prefix-match hit


 The user described an investigation concerning a complaint about a transaction question. The investigator was able to confirm the time of the transaction and respectfully declined a request for permanent access to the first card, but was unable to provide clear answers regarding the second issue. They gave vague responses and were asked for supporting documentation to back up their allegations. An investigation was conducted, including documents proving that the cardholder owes money according to law. The user requested a copy of this document but it was disrespectfully refused permanently. Additionally, they mentioned that there would be a report sent to a credit reporting company regarding delinquency and were advised to promptly get their report if they thought they owed money. They expressed frustration with the situation and wanted to close the account as soon as possible.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     187.68 ms /   331 runs   (    0.57 ms per token,  1763.66 tokens per second)
llama_print_timings: prompt eval time =     440.12 ms /   291 tokens (    1.51 ms per token,   661.19 tokens per second)
llama_print_timings:        eval time =   11454.21 ms /   330 runs   (   34.71 ms per token,    28.81 tokens per second)
llama_print_timings:       total time =   12978.02 ms /   621 tokens
Generating Summaries:  87%|████████▋ | 26/30 [03:10<00:35,  8.93s/it]Llama.generate: prefix-match hit


 The user's input consists of a conversation between them and a customer service representative regarding order status, cancellations, refunds, and shipping issues. Here is a summary:

The user inquired about the status of their shipment and received product after placing an order with "company" for a large quantity on a Thursday. They were informed that due to unusually high order volume and continuing demand, there might be delays in receiving the order. The company assured them they would prioritize shipping orders as soon as possible and provided no exact shipping date but guaranteed quick delivery.

The user mentioned they had received a small shipment first on a Friday and were expecting more items to follow. They appreciated the patience and asked about canceling an order due to receiving stock from another distributor. The company responded that they could cancel, but it would be best to do so as soon as possible according to their cancellation policy.

The user requested verif


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     129.75 ms /   231 runs   (    0.56 ms per token,  1780.40 tokens per second)
llama_print_timings: prompt eval time =     515.16 ms /   421 tokens (    1.22 ms per token,   817.21 tokens per second)
llama_print_timings:        eval time =    8172.42 ms /   230 runs   (   35.53 ms per token,    28.14 tokens per second)
llama_print_timings:       total time =    9416.27 ms /   651 tokens
Generating Summaries:  90%|█████████ | 27/30 [03:19<00:27,  9.08s/it]Llama.generate: prefix-match hit


 The dialogue outlines regulations for a consumer reporting agency regarding the handling of consumer information in cases of alleged identity theft. When a consumer identifies potential identity theft, the agency must block reporting information related to that consumer by the next business day after receiving appropriate proof. The blocked information includes transaction details and consumer statements.

The furnisher of the information should be notified promptly, and any errors or material misrepresentations leading to the blocking must be corrected. Consumers have the right to request rescission (removal) of blocks based on specific grounds. Affected consumers must be notified in a timely manner if their blocked information is reinstated.

The consumer reporting agency may only decline or rescind blocks for certain reasons, such as errors or misrepresentations. Resellers and check service companies have obligations to provide notice to consumers when they obtain consumer reports 


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      66.97 ms /   121 runs   (    0.55 ms per token,  1806.89 tokens per second)
llama_print_timings: prompt eval time =     230.33 ms /   104 tokens (    2.21 ms per token,   451.53 tokens per second)
llama_print_timings:        eval time =    4183.78 ms /   120 runs   (   34.86 ms per token,    28.68 tokens per second)
llama_print_timings:       total time =    4806.14 ms /   224 tokens
Generating Summaries:  93%|█████████▎| 28/30 [03:24<00:15,  7.80s/it]Llama.generate: prefix-match hit


 The user is a identity theft victim who has discovered unauthorized transactions listed on their credit report from TransUnion. They mention that there have been several accounts affected, and they provide specific account numbers, collection account numbers, and dates opened for these accounts. The user requests to have the information related to the fraudulent activity blocked according to the Fair Credit Reporting Act. They also ask for the necessary furnisher information to be provided so they can make the required notifications. The user concludes by expressing gratitude and reminding others to stay safe during the pandemic, including wearing masks and staying home.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     100.46 ms /   173 runs   (    0.58 ms per token,  1722.04 tokens per second)
llama_print_timings: prompt eval time =     304.62 ms /   153 tokens (    1.99 ms per token,   502.26 tokens per second)
llama_print_timings:        eval time =    6004.34 ms /   172 runs   (   34.91 ms per token,    28.65 tokens per second)
llama_print_timings:       total time =    6873.64 ms /   325 tokens
Generating Summaries:  97%|█████████▋| 29/30 [03:31<00:07,  7.53s/it]Llama.generate: prefix-match hit


 The user is sharing their experience as an identity theft victim, discussing various issues they have encountered regarding fraudulent transactions appearing on their credit report from Equifax. They mention finding old accounts involved in the 2017 Equifax data breach and the long, complicated process of resolving these issues. The user expresses frustration with disputed accounts that haven't been resolved, hindering their ability to apply for new credit or even get a first-time credit card. They also mention being denied employment due to negative information on their report and undergoing an adverse action based on inaccurate reporting. The user is currently seeking help from someone named "the guy" and considering hiring an attorney specializing in predatory loans for compensation as part of the identity theft victim remedy process, which has caused significant financial and emotional stress.



llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =     108.81 ms /   189 runs   (    0.58 ms per token,  1737.04 tokens per second)
llama_print_timings: prompt eval time =     591.52 ms /   506 tokens (    1.17 ms per token,   855.43 tokens per second)
llama_print_timings:        eval time =    6704.15 ms /   188 runs   (   35.66 ms per token,    28.04 tokens per second)
llama_print_timings:       total time =    7912.29 ms /   694 tokens
Generating Summaries: 100%|██████████| 30/30 [03:39<00:00,  7.65s/it]Llama.generate: prefix-match hit


 The dialogue outlines provisions of the Fair Credit Reporting Act regarding identity theft and blocked reporting information. When a consumer identifies potential identity theft, a consumer reporting agency shall block relevant reporting information by the business day after receiving appropriate proof from the consumer. The agency may decline or rescind the block if it determines there was no error or material misrepresentation. Consumers are to be notified promptly and in writing when their information is reinstated.

A reseller file shall apply the same rules as a consumer reporting agency, while consumers have the right to report identity theft to the bureau and obtain information regarding it from the reseller. The check service company must report identity theft to the national consumer reporting agency but cannot access blocked information without proper verification or court order. Consumers' unverified accounts must be removed if they cannot provide verifiable proof of owners


llama_print_timings:        load time =     541.47 ms
llama_print_timings:      sample time =      94.56 ms /   163 runs   (    0.58 ms per token,  1723.68 tokens per second)
llama_print_timings: prompt eval time =     236.48 ms /    96 tokens (    2.46 ms per token,   405.95 tokens per second)
llama_print_timings:        eval time =    5711.62 ms /   162 runs   (   35.26 ms per token,    28.36 tokens per second)
llama_print_timings:       total time =    6485.84 ms /   258 tokens
Generating Summaries: 100%|██████████| 30/30 [03:45<00:00,  7.53s/it]

 The user is having an issue with a credit card where an authorized user negatively affected their credit score by activating the card and using it, despite never having used it themselves. They have contacted Experian to dispute this change but have not seen any reflection of the removal yet. The user mentions that they are in the process of refinancing their home, which requires a good credit score. Their husband has lost his job, making it difficult for them to make ends meet with just their nurse salary. They need Experian to accurately reflect the change as soon as possible so they can proceed with the loan refinance. The user expresses that they have tried disputing online but were unable to get in touch with a representative and are asking for help in contacting Experian to resolve this issue.





In [57]:
pd.set_option('display.max_colwidth', None)

In [58]:
gold_examples.head(3)

Unnamed: 0,product,narrative,summary,mistral_response
167,retail_banking,fraudulent charge totaling made capital one checking account via debit card seeing charge immediately canceled debit card informed capital one fraudulent charge disputed charge posted account issued provisional credit pending determination claim received form letter bank denying claim money withdrawn account opened appeal denial capital one bank person every contacted discus original claim make determination furthermore never lost possession debit card authorized person use capital one sent new debit card authorized via usps replace debit card set expire card sent fraudulent charge occurred never received card believe card intercepted fraudulently activated used make fraudulent purchase complained activated card several time capital one complained fraudulent charge last call dispute denial claim never noted account asked several time person making determination claim contact person making complaint malfeasance gross business practice sending replacement card without authorization letting said card activated use credit purchase without authorization fraud reported claim made contacting discus complaint making determination appealed denial capital one bank claim number also made complaint local police dept division financial service regard fraudulent charge,"A fraudulent charge was made on the individual's Capital One checking account via a debit card. The transaction was immediately disputed and the card was cancelled. Capital One issued provisional credit. However, the bank later denied the claim and withdrew the provisional credit. The individual appealed the denial and had contacted Capital One multiple times, but no one reached out to discuss the original claim or decision. The original card was never lost and no authorization was given to anyone to use it. They believe a new, replacement card sent by Capital One was intercepted and fraudulently activated to make the charge. The individual is unhappy with Capital One's handling of the situation, stating that it is grossly improper to send a replacement card and allow it to be activated without authorization. They reported the fraud, have appealed the claim denial with Capital One and have also placed a complaint with their local police department and the Division of Financial Services.","The user is describing a dispute with Capital One over a fraudulent debit card charge. They noticed the unauthorized transaction and had it cancelled immediately. Capital One disputed the claim, but provisional credit was issued pending determination. A denial of the claim followed, despite the user's insistence that they never lost possession of their debit card or authorized its use by someone else. The user received a form letter from the bank explaining the denial and withdrew their money to open a new account. They appealed the decision and had several conversations with the person making determination. The user believes their original debit card was intercepted and fraudulently activated, leading to multiple unauthorized purchases. Capital One sent them a replacement card without authorization, which they believe was used for further credit purchases without consent. The user reported the fraudulent activity to both Capital One and local police department's financial services division."
169,credit_reporting,block except otherwise provided section consumer reporting agency shall block reporting information file consumer consumer identifies information resulted alleged identity theft later business day date receipt agency appropriate proof identity consumer copy identity theft report identification information consumer statement consumer information information relating transaction consumer b notification consumer reporting agency shall promptly notify furnisher information identified consumer subsection section information may result identity theft identity theft report filed block requested section effective date block c authority decline rescind general consumer reporting agency may decline block may rescind block information relating consumer section consumer reporting agency reasonably determines information blocked error block requested consumer error b information blocked block requested consumer basis material misrepresentation fact consumer relevant request block c consumer obtained possession good service money result blocked transaction transaction notification consumer block information declined rescinded subsection affected consumer shall notified promptly manner consumer notified reinsertion information section b title significance block purpose subsection consumer reporting agency rescinds block presence information file consumer prior blocking information evidence whether consumer knew known consumer obtained possession good service money result block exception resellers reseller file section shall apply consumer reporting agency consumer reporting agency reseller b time request consumer subsection section otherwise furnishing reselling consumer report concerning information identified consumer c informs consumer mean consumer may report identity theft bureau obtain consumer information regarding identity theft reseller file sole obligation consumer reporting agency section regard request consumer section shall block consumer report maintained consumer reporting agency subsequent use consumer accordance provision subsection section identifies consumer reporting agency information file consumer resulted identity theft b consumer reporting agency reseller identified information notice carrying obligation paragraph reseller shall promptly provide notice consumer decision block file notice shall contain name address telephone number consumer reporting agency consumer information obtained resale e exception verification company provision section apply check service company acting issue authorization purpose approving processing negotiable instrument electronic fund transfer similar method payment except beginning business day receipt information described paragraph subsection section check service company shall report national consumer reporting agency described section p title information identified subject identity theft report resulting identity theft f access blocked information law enforcement agency provision section shall construed requiring consumer reporting agency prevent federal state local law enforcement agency accessing blocked information consumer file agency could otherwise obtain access subchapter according fair credit reporting act section required federal law verify physical verification original signed consumer contract account post credit report otherwise anyone paying reporting service could fax mail email fraudulent account demand see verifiable proof original consumer contract signature file account listed failure positively verify account hurt ability obtain credit fcra unverified account must removed unable provide copy verifiable proof must remove account listed,"The text outlines various stipulations regarding the responsibilities of consumer reporting agencies and consumers in cases of alleged identity theft. A consumer can request a block on information they identify as resulting from identity theft, after providing proof of identity and the relevant report. The agency must then inform the furnisher of this information. The agency can also decline or rescind the block if it deems the information to be incorrect or misrepresented by the consumer.\n\nIn case of a block request from the consumer, resellers are obliged to inform them about the possibility of reporting identity theft to the bureau and the agency is obligated to block the consumer report. The agency is also responsible for notification of the consumer if a block is declined or rescinded.\n\nThe text also provides exceptions for check service companies that process payments, requiring them to report identified information subject to an identity theft report to national agencies. Consumer reporting agencies are not required to prevent law enforcement agencies from accessing blocked consumer information.\n\nUnder the Fair Credit Reporting Act's guidelines, if there is a failure to positively verify an account, it can affect the consumer's ability to obtain credit, and unverified accounts ought to be removed.","The dialogue outlines provisions of the Fair Credit Reporting Act regarding identity theft and blocked reporting information. When a consumer identifies potential identity theft, a consumer reporting agency shall block relevant reporting information by the business day after receiving appropriate proof from the consumer. The agency may decline or rescind the block if it determines there was no error or material misrepresentation. Consumers are to be notified promptly and in writing when their information is reinstated.\n\nA reseller file shall apply the same rules as a consumer reporting agency, while consumers have the right to report identity theft to the bureau and obtain information regarding it from the reseller. The check service company must report identity theft to the national consumer reporting agency but cannot access blocked information without proper verification or court order. Consumers' unverified accounts must be removed if they cannot provide verifiable proof of ownership, which could hurt their ability to obtain credit."
461,credit_card,usaa master plan collect cancellation debt usaa open two deliquent must violation fair credit see attatched mastercard statement visa end attatched opened usaa authorize apply opened see attatched usaa debt account cancellation debt box borrower personally liable box date event usaa collect using fraudulent deliquent account created see attatched usaa settle claiming deliquent selling account collection next month two card usaa opened opened creditcards member claimed past deliquent account see attatched credit card fraud claim claim date unauthorized transaction occurred see attatched cfpb response usaa staff confirmed usaa liable unauthorized transaction see attatched usaa review claim usaa determined fradulent activity get back opening two fraudulent account using private id data mc ussa close made payment id theft authorize stealing social security collect ideninity theft case usaa criminal humble opinion,"The input appears to be a complaint about USAA, specifically alleging unauthorized opening and cancellation of delinquent accounts, fraudulent activity, and identity theft. It claims USAA violated fair credit practices and initiated unauthorized transactions. There's also a mention of USAA settling by selling a delinquent account to a collection agency. Furthermore, it implies that USAA utilized private ID data for fraudulent purposes resulting in identity theft.","The user is discussing a dispute with USAA regarding debt collection on two delinquent accounts. They mention that they have attached mastercard and visa statements, and believe USAA opened these accounts without their authorization. They claim past deliquent accounts were also opened in their name, which they saw when applying for new credit cards as a member. The user states that USAA determined fraudulent activity occurred and has closed the accounts. However, USAA is attempting to collect on the debt from these fraudulently created accounts. The user mentions an unauthorized transaction occurring and having filed a credit card fraud claim with a claim date provided. They also mention receiving a response from the Consumer Financial Protection Bureau (CFPB) regarding this issue. USAA staff have confirmed that they are liable for the unauthorized transactions, and the user is requesting that USAA review their claim to get back the money taken due to identity theft. The conversation suggests that there may be ongoing criminal activity related to this case."


##### **Q4.3: Evaluate bert score** **(1 Marks)**

In [59]:
def evaluate_score(test_data, scorer, bert_score=False):

    """
    Return the ROUGE score or BERTScore for predictions on gold examples
    For each example we make a prediction using the prompt.
    Gold summaries and the AI generated summaries are aggregated into lists.
    These lists are used by the corresponding scorers to compute metrics.
    Since BERTScore is computed for each candidate-reference pair, we take the
    average F1 score across the gold examples.

    Args:
        prompt (List): list of messages in the Open AI prompt format
        gold_examples (str): JSON string with list of gold examples
        scorer (function): Scorer function used to compute the ROUGE score or the
                           BERTScore
        bert_score (boolean): A flag variable that indicates if BERTScore should
                              be used as the metric.

    Output:
        score (float): BERTScore or ROUGE score computed by comparing model predictions
                       with ground truth
    """

    model_predictions = test_data['mistral_response'].tolist()
    ground_truths = test_data['summary'].tolist()
    if bert_score:
        score = scorer.compute(
            predictions=model_predictions,
            references=ground_truths,
            lang="en",
            rescale_with_baseline=True # Rescale scores for better interpretability
        )

        return sum(score['f1'])/len(score['f1'])
    else:
        return scorer.compute(
            predictions=model_predictions,
            references=ground_truths
        )

In [60]:
bert_scorer = evaluate.load("bertscore")

Downloading builder script:   0%|          | 0.00/7.95k [00:00<?, ?B/s]

In [61]:
score = evaluate_score(gold_examples,
    bert_scorer,
    bert_score=True)

print(f'BERTScore: {score}')

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/482 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.42G [00:00<?, ?B/s]

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BERTScore: 0.339604996641477


## Try the Rouge Score

In [62]:
!pip install rouge-score -q

  Preparing metadata (setup.py) ... [?25l[?25hdone
  Building wheel for rouge-score (setup.py) ... [?25l[?25hdone


In [63]:
# Load a different scorer, e.g., ROUGE
rouge_scorer = evaluate.load("rouge")

score = evaluate_score(gold_examples,
    rouge_scorer,
    bert_score=False)

print(f'RougeScore: {score}')

Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

RougeScore: {'rouge1': 0.4883015684831733, 'rouge2': 0.1539193312686402, 'rougeL': 0.2690705600894815, 'rougeLsum': 0.2972254319430034}


# Submission

In [99]:
%%capture
!apt-get update && apt-get install -y jq


In [102]:
%%shell
jq -M 'del(.metadata.widgets)' SM-V0-Learners_Notebook_Mid_Term_Project.ipynb > SM-V0-Learners_Notebook_Mid_Term_Project_fixed.ipynb




In [103]:
!jupyter nbconvert SM-V0-Learners_Notebook_Mid_Term_Project_fixed.ipynb --to html



[NbConvertApp] Converting notebook SM-V0-Learners_Notebook_Mid_Term_Project_fixed.ipynb to html
[NbConvertApp] Writing 670674 bytes to SM-V0-Learners_Notebook_Mid_Term_Project_fixed.html


In [94]:
os.listdir()

['Complains_classification.csv',
 'Learners_Notebook_Mid_Term_Project.ipynb',
 'new_data_cleaned.csv',
 'requirement.txt',
 'SM-V0-Learners_Notebook_Mid_Term_Project.ipynb']