<center><p float="center">
  <img src="https://mma.prnewswire.com/media/1458111/Great_Learning_Logo.jpg?p=facebook" width="200" height="100"/>
</p></center>

<h1><center><font size=10> Generative AI for NLP Program</center></font></h1>
<h1><center> Project </center></h1>

# **GA-NLP Mid-Term Project: Financial Product Complaint Classification and Summarization**

## **Business Context**

In today's financial landscape, customer complaints are pivotal for financial institutions, highlighting areas of dissatisfaction and guiding business improvements. The intricate task of classifying these complaints into specific product categories is crucial for understanding customer issues and enhancing service delivery. By employing Generative AI for text classification, businesses can gain a detailed understanding of customer grievances related to various financial products such as credit reports, student loans, and money transfers.

The integration of machine learning algorithms has revolutionized the automation of customer complaint classification. Utilizing these advanced techniques, financial institutions can swiftly and accurately categorize new complaints based on their content. This automation not only saves time and resources but also ensures timely responses to customer issues, thereby improving customer satisfaction and compliance with regulatory standards.

Additionally, this project will explore the summarization of customer narratives to provide more personalized solutions to complaints. By using Generative AI, businesses can enhance their ability to classify complaints more precisely and generate complaint summaries that facilitate more tailored and effective service responses.

Embarking on this project of Financial Product Complaint Classification and Summarization, with a focus on classification and summarization accuracy, equips you with essential skills applicable to real-world business contexts. Through hands-on experience with code and implementation specifics, you'll gain the proficiency to build such solutions using open-source machine learning algorithms. This experience will serve as a compelling Proof-of-Concept, paving the way for the implementation of these advanced solutions in financial institutions.

## **Project Objective**

Develop a Generative AI application using a Large Language Model to automate product classification and narrative summarization. This application will predict product categories, generate responses based on customer sentiment, and summarize narratives for the mediation team. We have been tackling this task with BERT and prompt engineering with LLMs. We will explore various techniques and select the most effective method.

## **Section 1 BERT Fine Tuning (10 Marks)** (product classification)

### **Question 1: Installing the necessary packages and importing libraries (1 Mark)**

In [1]:
# Import necessary libraries for data manipulation and analysis
import pandas as pd
import numpy as np

# Import visualization libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Import modules from scikit-learn for machine learning tasks
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score, accuracy_score, classification_report

# Import TensorFlow for deep learning tasks
import tensorflow as tf

import re
import json

2024-07-30 13:51:22.567607: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-30 13:51:22.584338: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-30 13:51:22.604517: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-30 13:51:22.610647: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-30 13:51:22.625526: I tensorflow/core/platform/cpu_feature_guar

In [17]:
# Import BertTokenizer, TFBertForSequenceClassification from the Hugging Face transformers library
from transformers import BertTokenizer, TFBertForSequenceClassification

In [18]:
# Set the seed for the TensorFlow random number generator to ensure reproducibility
tf.random.set_seed(42)

### **Question 2: Data preprocessing for Bert Fine Tuning (2 Marks)**

In [13]:
# Load a CSV File containing Dataset of 500 products, narrative and summary (summary of narrative)
data=pd.read_csv("./Complaints_classification.csv")

In [20]:
data.shape

(500, 3)

In [21]:
data.head()

Unnamed: 0,product,narrative,summary
0,credit_card,purchase order day shipping amount receive pro...,The customer made a purchase order with an agr...
1,credit_card,forwarded message date tue subject please inve...,The sender of the email believes they have bee...
2,retail_banking,forwarded message cc sent friday pdt subject f...,The sender of the email alleges that Wells Far...
3,credit_reporting,payment history missing credit report speciali...,The credit report from Specialized Loan Servic...
4,credit_reporting,payment history missing credit report made mis...,The text concerns a person who found an unauth...


#### Observations:
- product column is categorical
- narrative text is already all lower case and needs no cleaning.
- summary is mixed-case, may need cleaning later

In [7]:
#Bert_data=data['product','narrative']

In [8]:
# Creating dependent and independent variables from Bert_data
train_test = data['narrative']
y = data['product']
# Further split the temporary set into train (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(train_test, y, test_size=0.20, stratify=y, random_state=42)

In [9]:
X_train.shape, y_train.shape

((400,), (400,))

In [10]:
X_test.shape, y_test.shape

((100,), (100,))

In [11]:
# Label Encoding
encoder = LabelEncoder()

# fit the encoder to the training labels
y_train_enc = encoder.fit_transform(y_train)

# applying the encoder mapping from training labels to test labels
y_test_enc = encoder.transform(y_test)

### **Question 3: Tokenization (1 Mark)**

In [12]:
# loading and creating an instance of the BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
# specifying the maximum length of the input 512
max_length = 512

In [13]:
X_train_tokenized = tokenizer(
    X_train.values.tolist(),    # passing the data as a list to the tokenizer
    max_length=max_length,    # specifies the maximum length of the tokenized data
    padding='max_length',    # padding the data to the specified maximum length
    truncation=True,    # truncating the input if it is longer than the specified maximum length
    return_attention_mask=True,    # specifying to return attention masks
    return_tensors='tf',    # specifying to return the output as tensorflow tensors
)
X_test_tokenized = tokenizer(
    X_test.values.tolist(),
    max_length=max_length,
    padding='max_length',
    truncation=True,
    return_attention_mask=True,
    return_tensors='tf',
)

I0000 00:00:1722347511.568109   35076 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1722347511.572620   35076 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1722347511.575499   35076 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1722347511.578935   35076 cuda_executor.cc:1015] successful NUMA node read from SysFS ha

In [22]:
type(X_train_tokenized)

transformers.tokenization_utils_base.BatchEncoding

### **Question 4: Creating Tensorflow dataset (1 Mark)**

In [23]:
# defining the size of the batches
batch_size = 8

# convert the tokenized input and the output into a batched tensorflow dataset for training
train_tokenized_tf = tf.data.Dataset.from_tensor_slices((dict(X_train_tokenized), y_train_enc)).batch(batch_size)

# convert the tokenized input and the output into a batched tensorflow dataset for testing
test_tokenized_tf = tf.data.Dataset.from_tensor_slices((dict(X_test_tokenized), y_test_enc)).batch(batch_size)

### **Question 5 Evaluating the base model's performance in product classification.(1 Marks)**

In [24]:
def bert_f1_score(actual_vals, pred_vals):
    micro_f1_score = f1_score(actual_vals, pred_vals, average="micro")
    return micro_f1_score

In [25]:
# Actual product class
actual_val = np.concatenate([y for x, y in test_tokenized_tf], axis=0)

2024-07-30 13:58:54.409215: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


In [26]:
num_classes = y.nunique()
num_classes

5

In [27]:
# Initialize Model using BERT for sequence classification
base_model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_classes)

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [28]:
# print the summary of the model
base_model.summary()

Model: "tf_bert_for_sequence_classification"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bert (TFBertMainLayer)      multiple                  109482240 
                                                                 
 dropout_37 (Dropout)        multiple                  0 (unused)
                                                                 
 classifier (Dense)          multiple                  3845      
                                                                 
Total params: 109486085 (417.66 MB)
Trainable params: 109486085 (417.66 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [29]:
# Make prediction on test_tokenized_tf
preds_raw_test = base_model.predict(test_tokenized_tf)
preds_test_base = np.argmax(np.array(tf.nn.softmax(preds_raw_test.logits)), axis=1)
preds_test_base.shape



(100,)

In [30]:
preds_test_base[0:5]

array([3, 3, 3, 3, 3])

In [31]:
y_test_enc[0:5]

array([1, 1, 1, 1, 1])

In [32]:
y_test.value_counts()

product
credit_reporting       77
mortgages_and_loans     7
debt_collection         6
credit_card             6
retail_banking          4
Name: count, dtype: int64

In [33]:
data['product'].value_counts()

product
credit_reporting       388
mortgages_and_loans     36
debt_collection         29
credit_card             28
retail_banking          19
Name: count, dtype: int64

In [34]:
# Evaluate bert base model
base_f1_score = bert_f1_score(y_test_enc, preds_test_base)
print(base_f1_score)

0.09


#### Observations:
- Untrained model performance is really bad.

### **Question 6 Fine-Tuning Bert Model on training set (2 Marks)**

In [35]:
num_classes = y.nunique()
# Model initialization using BERT for sequence classification
ft_model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_classes)

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [36]:
# setting the learning rate for the optimizer
learning_rate = 1e-5

# Setting the optimizer to Adam
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=1e-08)

# Specify the loss function for the model
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Define evaluation metric(s) for the model
metric = [tf.keras.metrics.SparseCategoricalAccuracy('accuracy')]

# Compile the model with the chosen optimizer, loss function, and metrics
ft_model.compile(optimizer=optimizer, loss=loss, metrics=metric)

In [37]:
# Calculate class weights for imbalanced dataset
cw = (y_train_enc.shape[0]) / np.bincount(y_train_enc)

# Create a dictionary mapping class indices to their respective class weights
cw_dict = {}
for i in range(cw.shape[0]):
    cw_dict[encoder.transform(encoder.classes_)[i]] = cw[i]

In [38]:
# Number of training epochs
n_epochs = 1
#train bert model
bert_base_tuned = ft_model.fit(train_tokenized_tf, epochs=n_epochs, class_weight=cw_dict)

I0000 00:00:1722348003.388648   35149 service.cc:146] XLA service 0x7b38d4b74ac0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1722348003.388680   35149 service.cc:154]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2024-07-30 14:00:03.395264: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-07-30 14:00:03.415715: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 8907
I0000 00:00:1722348003.504914   35149 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.




### **Question 7. Evaluating the trained model performance (1 Mark)**

In [39]:
# Generate raw predictions on the test dataset using the trained model
preds_raw_val_ft = ft_model.predict(test_tokenized_tf)

# Extract predicted labels by finding the index with the highest probability for each example
preds_val_ft = np.argmax(np.array(tf.nn.softmax(preds_raw_val_ft.logits)), axis=1)



In [40]:
preds_val_ft[0:5]

array([3, 1, 3, 1, 1])

In [41]:
# Evaluate bert trained model
ft_f1_score = bert_f1_score(y_test_enc, preds_val_ft)
print(ft_f1_score)

0.6


### **Question 8: Write your observations (1 Mark)**

- Training improved f1 score dramatically, from 0.09 to 0.60

# **Prompt Engineering**

## **Section 2: Install Libraries for Prompt Engineering and Setting up Mistral Model (3 Marks)**

### **Question 9: Install neccessary libraries (1 Mark)**

In [None]:
# Installation for GPU llama_cpp_python==0.2.28
 "__________"
# For downloading the models from HF Hub huggingface-hub==0.23.2
 "__________"
# install evaluate==0.4.2 and bert-score==0.3.13 using pip command
 "__________"
# install numpy==1.25.2
 "__________"

In [1]:
# Basic Imports for Libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

import pandas as pd
import numpy as np
from tqdm import tqdm
import json
import re

import torch
import evaluate

from google.colab import drive
import locale

2024-07-30 14:03:44.509543: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-30 14:03:44.526300: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-30 14:03:44.546795: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-30 14:03:44.553085: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-30 14:03:44.568380: I tensorflow/core/platform/cpu_feature_guar

ModuleNotFoundError: No module named 'google.colab'

### **Question 10: Importing Libaries and Setting up Mistral Model (2 Marks)**

https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q5_K_M.gguf

In [2]:
## Import Hf_hub_download from hugging_face_hub
# using pre-downloaded model

## Import Llama from llama_cpp
from llama_cpp import Llama

In [3]:
# Define the model name or path as a string (You can find this info from hugging face website) Use Mistral

model_name_or_path = "/home/ubuntu/models/"

# Define the model basename as a string, indicating it's in the gguf format

model_basename = "mistral-7b-instruct-v0.2.Q5_K_M.gguf" # the model is in gguf format

In [4]:
model_path = model_name_or_path+model_basename

In [5]:
# Create an instance of the 'Llama' class with specified parameters
# remove the blank spaces and complete the code

lcpp_llm = Llama(
        model_path=model_path,
        n_threads=-1,  # CPU cores
        n_batch=512,  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
        n_gpu_layers=-1,  # Change this value based on your model and your GPU VRAM pool.
        n_ctx=4096,  # Context window
    )

llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /home/ubuntu/models/mistral-7b-instruct-v0.2.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   

## **Section 3: Text to Label (12 Marks)**

# **Zero-Shot Prompting (6 Marks)**

### **Q11: Define the Prompt Template, System Message, generate_prompt** **(3 Marks)**

Define a **system message** as a string and assign it to the variable system_message to generate product class.

Create a **zero shot prompt template** that incorporates the system message and user input.

Define **generate_prompt** function that takes both the system_message and user_input as arguments and formats them into a prompt template


Write a Python function called **generate_mistral_response** that takes a single parameter, narrative, which represents the user's complain. Inside the function, you should perform the following tasks:


- **Combine the system_message and narrative to create a prompt string using generate_prompt function.**

*Generate a response from the Mistral model using the lcpp_llm instance with the following parameters:*

- prompt should be the combined prompt string.
- max_tokens should be set to 1200.
- temperature should be set to 0.
- top_p should be set to 0.95.
- repeat_penalty should be set to 1.2.
- top_k should be set to 50.
- stop should be set as a list containing '/s'.
- echo should be set to False.
Extract and return the response text from the generated response.

Don't forget to provide a value for the system_message variable before using it in the function.

In [15]:
system_message = """You are an AI evaluating input text from which to generate a product classification.
Be concise.
If you cannot determine a classification to at least 80% probability, respond with 'I cannot classify this.'
"""

In [16]:
zero_shot_prompt_template = "{input}"

In [17]:
# Define function that combines user_prompt and system_message to create the prompt
def generate_prompt(_system_message, _user_input):
    _prompt = f"[INST] <<SYS>> {_system_message} <<SYS>> {_user_input} [/INST]"
    return _prompt

In [18]:
generate_prompt(system_message, zero_shot_prompt_template.format(input=data.iloc[0]['narrative']))

"[INST] <<SYS>> You are an AI evaluating input text from which to generate a product classification.\nBe concise.\nIf you cannot determine a classification to at least 80% probability, respond with 'I cannot classify this.'\n <<SYS>> purchase order day shipping amount receive product week sent followup email exact verbiage paid two day shipping received order company responded im sorry inform due unusually high order volume order shipped several week stock since early due high demand although continuing take order guaranteeing receive order place due time mask order exact shipping date right however guarantee ship soon soon delivers product u getting small shipment shipping first come first served basis appreciate patience fulfill order quickly recommend keeping order lose place line cancel distributor stock moment prefer cancel please note ask via email cancel accordance cancellation policy agreed checkout electronic inventory online requested order canceled refund issued canceled ord

In [19]:
def generate_mistral_response(input_text):

    # Combine user_prompt and system_message to create the prompt
    prompt=generate_prompt(system_message, input_text)
    # Generate a response from the LLaMA model
    response = lcpp_llm(
        prompt=prompt,
        max_tokens=1200,
        temperature=0,
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['/s'],
        echo=False
    )

    # Extract and return the response text
    response_text = response["choices"][0]["text"]
    return response_text

In [None]:
generate_mistral_response(zero_shot_prompt_template.format(input=data.iloc[0]['narrative']))

/tmp/pip-install-2shwm50v/llama-cpp-python_54e6ad3a21354de2a272cd641f6dc4df/vendor/llama.cpp/src/llama.cpp:14550: GGML_ASSERT(n_threads > 0) failed


**Due to limited GPU resources, we will test our model with zero prompts on only 50 examples instead of the entire dataset.**

In [None]:
# Randomly select 50 rows
new_data = data.sample(n=50, random_state=40)

### **Q12: Create a new column in the DataFrame called 'mistral_response' and populate it with responses generated by applying the 'generate_mistral_response' function to each 'narrative' in the DataFrame and prepare the mistral_response_cleaned column using extract_category function** **(1 Marks)**

In [None]:
# example - new_data['mistral_response'] = new_data['narrative'].apply(lambda x:______ )
new_data['mistral_response'] = "______ "

In [None]:
new_data['mistral_response'] = "______ "

In [None]:
def extract_category(text):
    # Define the regex pattern to match "category:" or "Category:" followed by a word
    pattern = r'category:\s*(\w+)'  # The pattern itself remains the same

    # Use re.search with the re.IGNORECASE flag to make it case-insensitive
    match = re.search(pattern, text, re.IGNORECASE)

    # If a match is found, return the captured group, else return None
    if match:
        return match.group(1)
    else:
        pattern1 = r'(credit_card|retail_banking|credit_reporting|mortgages_and_loans|debt_collection)'
        match = re.search(pattern1, text, re.IGNORECASE)
        if match:
            return match.group()
        else:
            return ''

In [None]:
# example - new_data['mistral_response_cleaned'] = new_data['narrative'].apply(lambda x:______ )
new_data['mistral_response_cleaned'] = "______ "

### **Q14: Calculate the F1 score** **(1 Marks)**

In [None]:
# Calculate F1 score for 'product' and 'mistral_response'
f1 =  "______ "

print(f'F1 Score: {f1}')

In [None]:
# Calculate F1 score for 'product' and 'mistral_response_cleaned'
f2 =  "______ "
print(f'F1 Score: {f2}')

### **Q15: Explain the difference in F1 scores between mistral_response and mistral_response_cleaned.** **(1 Marks)**

# **Few-Shot Prompting (6 Marks)**

### **Q16: Prepare examples for a few-shot prompt, formulate the prompt, and generate the Mistral response. (4 Marks)**

**Generate a set of gold examples by randomly selecting 10 instances of user_input and assistant_output from dataset ensuring a balanced representation with 2 examples from each class.**

In [None]:
# Separate positive and negative reviews
import json
review_1 = data[data['product'] == 'credit_card']
review_2 = data[data['product'] == 'retail_banking']
review_3 = data[data['product'] == 'credit_reporting']
review_4 = data[data['product'] == 'mortgages_and_loans']
review_5 = data[data['product'] == 'debt_collection']

# Sample 3 positive and 3 negative reviews for gold examples
gold_examples_1 = review_1.sample(2, random_state=40)
gold_examples_2 = review_2.sample(2, random_state=40)
gold_examples_3 = review_3.sample(2, random_state=40)
gold_examples_4 = review_4.sample(2, random_state=40)
gold_examples_5 = review_5.sample(2, random_state=40)

# Concatenate positive and negative gold examples
gold_examples_df = pd.concat([gold_examples_1,gold_examples_2,gold_examples_3,gold_examples_4,gold_examples_5 ])

# Create the training set by excluding gold examples
test_df = data.drop(index=gold_examples_df.index)

# Convert gold examples to JSON
columns_to_select = ['narrative', 'product']
gold_examples_json = gold_examples_df[columns_to_select].to_json(orient='records')

# Print the first record from the JSON
print(json.loads(gold_examples_json)[0])

# Print the shapes of the datasets
print("Test Set Shape:", examples_df.shape)
print("Gold Examples Shape:", gold_examples_df.shape)

Define your **system_message**.

Define **first_turn_template**, **example_template** and **prediction template**

**create few shot prompt** using gold examples and system_message

Randomly select 50 rows from test_df as test_data

Create **mistral_response** and **mistral_response_cleaned** columns

In [None]:
system_message = "______ "

In [None]:
first_turn_template = "______ "
examples_template = "______ "
prediction_template = "______ "

In [None]:
def create_few_shot_prompt(system_message, examples):

    """
    Return a prompt message in the format expected by Mistral 7b.
    10 examples are selected randomly as golden examples to form the
    few-shot prompt.
    We then loop through each example and parse the narrative as the user message
    and the product as the assistant message.

    Args:
        system_message (str): system message with instructions for classification
        examples(DataFrame): A DataFrame with examples (product + narrative + summary)
        to form the few-shot prompt.

    Output:
        few_shot_prompt (str): A prompt string in the Mistral format
    """

    few_shot_prompt = ''

    columns_to_select = "__________"

    examples = (
        examples_df.loc[:, columns_to_select].to_json(orient='records')
    )

    for idx, example in enumerate(json.loads(examples)):
        user_input_example = "__________"
        assistant_output_example = "__________"

        if idx == 0:
            few_shot_prompt += mistral_first_turn_template.format(
                system_message=system_message,
                user_message=user_input_example,
                assistant_message=assistant_output_example
            )
        else:
            few_shot_prompt += mistral_examples_template.format(
                user_message=user_input_example,
                assistant_message=assistant_output_example
            )

    return few_shot_prompt

In [None]:
few_shot_prompt = "______ "

In [None]:
def generate_prompt(few_shot_prompt,new_review):
    prompt =  "______ "
    return prompt

In [None]:
def generate_mistral_response(input_text):

    # Combine user_prompt and system_message to create the prompt
    prompt = "__________"

    # Generate a response from the LLaMA model
    response = lcpp_llm(
    )

    # Extract and return the response text
    response_text = response["choices"][0][______]  ### Fill in the blank
    return response_text

In [None]:
# Randomly select 50 rows
test_data = data.sample(n=50, random_state=40)

In [None]:
test_data['mistral_response'] = "______ "

In [None]:
test_data['mistral_response_cleaned'] = "______ "

### **Calculate F1 score (1 Mark)**

In [None]:
# Calculate F1 score for 'product' and 'mistral_response_cleaned'
f3 =  "______ "
print(f'F1 Score: {f3}')

### **Q17: Share your observations on the few-shot and zero-shot prompt techniques. (1 Marks)**

## **Section 4: Text to Text generation (5 Marks)**

Define a **system message** as a string and assign it to the variable system_message to generate product class.** (1 Mark)**

Create a **zero shot prompt template** that incorporates the system message and user input.

Define **generate_prompt** function that takes both the system_message and user_input as arguments and formats them into a prompt template


Write a Python function called **generate_mistral_response** that takes a single parameter, narrative, which represents the user's complain. Inside the function, you should perform the following tasks:


- **Combine the system_message and narrative to create a prompt string using generate_prompt function.**

*Generate a response from the Mistral model using the lcpp_llm instance with the following parameters:*

- prompt should be the combined prompt string.
- max_tokens should be set to 1200.
- temperature should be set to 0.
- top_p should be set to 0.95.
- repeat_penalty should be set to 1.2.
- top_k should be set to 50.
- stop should be set as a list containing '/s'.
- echo should be set to False.
Extract and return the response text from the generated response.

Don't forget to provide a value for the system_message variable before using it in the function.

In [None]:
system_message = "__________"

In [None]:
zero_shot_prompt_template = "__________"

In [None]:
# Define function that combines user_prompt and system_message to create the prompt
def generate_prompt(system_message,user_input):
    prompt = "__________"
    return prompt

In [None]:
def generate_mistral_response(input_text):

    # Combine user_prompt and system_message to create the prompt
    prompt = "__________"

    # Generate a response from the LLaMA model
    response = lcpp_llm(
    )

    # Extract and return the response text
    response_text = response["choices"][0][______]  ### Fill in the blank
    return response_text

### **Q19: Generate mistral_response column containing LLM generated summaries** **(1 Marks)**

In [None]:
# Randomly select 50 rows
test_data = data.sample(n=50, random_state=40)

In [None]:
test_data['mistral_response'] = "______ "

### **Q20: Evaluate bert score** **(2 Marks)**

In [None]:
def evaluate_score(result, scorer, bert_score=False):

    """
    Return the ROUGE score or BERTScore for predictions on gold examples
    For each example we make a prediction using the prompt.
    Gold summaries and the AI generated summaries are aggregated into lists.
    These lists are used by the corresponding scorers to compute metrics.
    Since BERTScore is computed for each candidate-reference pair, we take the
    average F1 score across the gold examples.

    Args:
        prompt (List): list of messages in the Open AI prompt format
        gold_examples (str): JSON string with list of gold examples
        scorer (function): Scorer function used to compute the ROUGE score or the
                           BERTScore
        bert_score (boolean): A flag variable that indicates if BERTScore should
                              be used as the metric.

    Output:
        score (float): BERTScore or ROUGE score computed by comparing model predictions
                       with ground truth
    """

    model_predictions = result['mistral_response'].tolist
    ground_truths = result['summary'].tolist()
    if bert_score:
        score = scorer.compute(
            predictions=model_predictions,
            references=ground_truths,
            lang="en",
            rescale_with_baseline=True
        )

        return sum(score['f1'])/len(score['f1'])
    else:
        return scorer.compute(
            predictions=model_predictions,
            references=ground_truths
        )

In [None]:
bert_scorer = "__________"

In [None]:
score = "__________"
print(f'BERTScore: {score}')

### **Q21: Write your observation** **(1 Marks)**