## Setup

To complete this guide, you will need to install the following packages:
- fireworks-ai
- pandas
- requests

You will also need:

- Fireworks account (https://fireworks.ai/)
- Fireworks API key
- The firectl command-line interface (https://docs.fireworks.ai/tools-sdks/firectl/firectl)

In [1]:
!pip install pandas requests fireworks-ai --quiet


In [2]:
import json
import os
import time

from fireworks.client import Fireworks
import pandas as pd
import requests

In [29]:
# Sign in to your Fireworks account
!firectl signin


Signed in as: sdkramer10@gmail.com
Account ID: sdkramer10-5e98cb


In [143]:
# Make sure you have the FIREWORKS_API_KEY environment variable set to your account's key!
# os.environ['FIREWORKS_API_KEY'] = 'XXX'

client = Fireworks()

# Replace the line below with your Fireworks account id
account_id = 'XXX'
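Because `Fireworks()` is expected to read `FIREWORKS_API_KEY` from the environment (per the comment above), a missing key may only show up later as an authentication error during inference. A minimal fail-fast sketch follows; `require_env` is a hypothetical helper, not part of the Fireworks SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell or set os.environ['{name}'] "
            "before creating the client."
        )
    return value
```

Calling `require_env("FIREWORKS_API_KEY")` before `client = Fireworks()` surfaces the problem immediately instead of partway through a long classification loop.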

## Problem Definition: Dynamic Insurance Support Ticket Classifier

*Note: The problem definition, data, and labels used in this example were synthetically generated by Claude 3 Opus*

In the insurance industry, customer support plays a crucial role in ensuring client satisfaction and retention. Insurance companies receive a high volume of support tickets daily, covering a wide range of topics such as billing, policy administration, claims assistance, and more. Manually categorizing these tickets can be time-consuming and inefficient, leading to longer response times and potentially impacting customer experience.

### Task
In the week 2 folder, we perform static support ticket classification, where the list of possible categories is injected into the prompt.
In this example, we do not include the categories in the prompt and instead make determining the category a generative task.

There are three primary differences between this notebook and the week 2 notebook, which uses static labels:
- Remove 'General Inquiries' as a category from the training data. Otherwise, the model will learn to assign 'General Inquiries' to every ticket that does not fit one of the other categories.
- Remove the list of possible categories from the prompt.
- Increase the aggressiveness of training (rank, learning rate, and number of epochs). This biases the model toward responding with a category that exists in the training data and only responding with a different category when necessary. Lowering these parameters causes the model to respond more often with categories that were not in the training data.
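To make the second difference concrete, the sketch below contrasts the two prompt styles. `create_dynamic_prompt` approximately mirrors this notebook's `create_prompt`; `create_static_prompt` is a hypothetical static variant written for illustration, not the week 2 notebook's actual wording.

```python
from typing import List


def create_dynamic_prompt(ticket: str) -> str:
    # No category list: the model must generate the label itself.
    return (
        "Classify the customer support ticket.\n\n"
        f"Here is the customer support ticket:\n<ticket>{ticket}</ticket>\n\n"
        "Respond using this format:\n"
        "<category>The category label you chose goes here</category>"
    )


def create_static_prompt(ticket: str, categories: List[str]) -> str:
    # Hypothetical static variant: the allowed labels are injected into the prompt.
    category_list = "\n".join(f"- {c}" for c in categories)
    return (
        "Classify the customer support ticket into one of these categories:\n"
        f"{category_list}\n\n"
        f"Here is the customer support ticket:\n<ticket>{ticket}</ticket>\n\n"
        "Respond using this format:\n"
        "<category>The category label you chose goes here</category>"
    )
```

With the static prompt the label space is fixed at inference time; with the dynamic prompt it is only implied by what the model saw during fine-tuning, which is why the training settings matter more here.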

#### Labeled Data

The data can be found in the `data` folder.

We will use the following datasets:
- `./data/train.tsv`
- `./data/test.tsv`

In [144]:
training_examples = pd.read_csv('data/train.tsv', sep='\t')
test_examples = pd.read_csv('data/test.tsv', sep='\t')

# Don't include General Inquiries in the training data; otherwise the fine-tuned model will label every ticket outside the trained categories as General Inquiries
training_examples = training_examples[training_examples['label'] != 'General Inquiries']
test_examples = test_examples[test_examples['label'] != 'General Inquiries']

training_tickets = training_examples['text'].tolist()
training_labels = training_examples['label'].tolist()

test_tickets = test_examples['text'].tolist()
test_labels = test_examples['label'].tolist()

training_categories = set(training_labels)
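The filter above is a pandas boolean mask. The sketch below applies the same mask to a few toy rows (hypothetical examples, not the real `data/train.tsv`) and prints the remaining label distribution; running `training_examples['label'].value_counts()` on the real data is a quick way to confirm that 'General Inquiries' is gone and to spot very rare categories.

```python
import pandas as pd

# Toy rows standing in for data/train.tsv (hypothetical examples, not the real dataset).
toy = pd.DataFrame({
    "text": [
        "Why did my bill go up?",
        "How do I reach an agent?",
        "My claim was denied unfairly.",
    ],
    "label": ["Billing Inquiries", "General Inquiries", "Claims Disputes"],
})

# Same boolean-mask filter as above: keep every row whose label is not 'General Inquiries'.
filtered = toy[toy["label"] != "General Inquiries"]

# Label distribution after filtering.
print(filtered["label"].value_counts())
```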

In [135]:
print(training_tickets[0])
print(training_labels[0])

I just got my auto policy renewal bill and the cost seems to be more than what I usually pay. Could you explain the reason for the increase?
Billing Inquiries


### Dataset Curation

We must first transform our dataset into the format expected by Fireworks and then upload it. Each example must conform to the schema expected by the Chat Completions API.

See https://docs.fireworks.ai/fine-tuning/fine-tuning-models#conversation for more details
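Concretely, each training example in this notebook becomes one JSON object with a `messages` list holding a user turn and an assistant turn, written as a single line of a `.jsonl` file. A minimal sketch of one such record; `to_chat_record` is a hypothetical helper, and the short prompt stands in for the full `create_prompt` output.

```python
import json


def to_chat_record(prompt: str, label: str) -> dict:
    # One training example: the user turn carries the prompt, the assistant turn the target label.
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": f"<category>{label}</category>"},
        ]
    }


record = to_chat_record("<ticket>Why did my bill go up?</ticket>", "Billing Inquiries")
line = json.dumps(record)  # json.dumps escapes any newlines, so each record stays on one line
print(line)
```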

In [137]:
def create_prompt(ticket):
    return f"""Classify the customer support ticket.

Here is the customer support ticket:    
<ticket>{ticket}</ticket>

Respond using this format:
<category>The category label you chose goes here</category>"""    

In [138]:
# Converts the training examples to the format expected by Fireworks.
def training_examples_to_json(examples):
    json_objs = list()
    for _, example in examples.iterrows():
        # Defensive: skip General Inquiries even if the caller did not filter them out
        if example['label'] == "General Inquiries":
            continue

        user_msg = create_prompt(example['text'])
        asst_msg = f"<category>{example['label']}</category>"

        msg = {"messages": [
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": asst_msg}
        ]}
        json_objs.append(msg)

    return json_objs

training_json = training_examples_to_json(training_examples)

In [79]:
# Writes the data to a file so that it can be uploaded to Fireworks
dataset_file_name = 'ticket-classification_training_data.jsonl'
dataset_id = 'dynamic-ticket-classification-v1'

with open(dataset_file_name, 'w') as f:
    for obj in training_json:
        json.dump(obj, f)
        f.write('\n')
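Before uploading, it can be worth re-reading the file and checking every line. The sketch below is a local sanity check under this notebook's conventions (each example ends with the assistant's label); `validate_chat_jsonl` is a hypothetical helper, and its rules are assumptions rather than Fireworks' official validation.

```python
import json
from typing import List

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(path: str) -> List[str]:
    """Return a list of problems found in a chat-format JSONL file (empty means none found)."""
    problems = []
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                problems.append(f"line {line_no}: blank line")
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append(f"line {line_no}: invalid JSON ({e.msg})")
                continue
            messages = obj.get("messages") if isinstance(obj, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append(f"line {line_no}: missing or empty 'messages' list")
                continue
            for m in messages:
                if (not isinstance(m, dict)
                        or m.get("role") not in ALLOWED_ROLES
                        or not isinstance(m.get("content"), str)):
                    problems.append(f"line {line_no}: malformed message {m!r}")
                    break
            else:
                # Assumption for this notebook: every example ends with the assistant's label.
                if messages[-1]["role"] != "assistant":
                    problems.append(f"line {line_no}: last message is not from the assistant")
    return problems
```

For example, `assert not validate_chat_jsonl(dataset_file_name)` just before the upload cell stops the notebook if a record is malformed.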

In [82]:
# Follow the instructions here to install the firectl CLI first - https://readme.fireworks.ai/docs/fine-tuning-models#installing-firectl
# Then run this command to upload the file to Fireworks
!firectl create dataset {dataset_id} {dataset_file_name}

### Fine-Tuning

We will now fine-tune a model using the Fireworks API. Fireworks implements the QLoRA algorithm through a simple interface. Training parameters can be set via the `--settings-file` argument. In this exercise, we increase the rank, learning rate, and number of epochs so that fine-tuning learns the trained categories more aggressively; otherwise, the fine-tuned model creates new categories too often.

See https://docs.fireworks.ai/fine-tuning/fine-tuning-models#starting-your-tuning-job for more details

In [83]:
# Creates a fine-tuning job using the training settings in ticket_classification.yaml
!firectl create fine-tuning-job --settings-file ticket_classification.yaml --display-name dynamic-ticket-classification --dataset {dataset_id} 

In [84]:
# NOTE THAT THESE IDS WILL CHANGE WHEN YOU RUN THE FINE-TUNING JOB ON YOUR ACCOUNT!!!
# The model id is printed in the stdout of the cell above as Name: accounts/{account_id}/fineTuningJobs/{model_id}
model_id = '0933f6f375534b62996ce23f4a8dfb09'

In [103]:
# Wait until the State of the fine-tuning job is listed as COMPLETED (~10-20 minutes)
!firectl get fine-tuning-job {model_id}
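Instead of re-running the cell above by hand, the wait can be scripted. A minimal polling sketch follows; `wait_for` and `firectl_output_contains` are hypothetical helpers, and the substring check assumes the human-readable `firectl get ...` output contains the state name (e.g. `COMPLETED`), which you should confirm against your firectl version.

```python
import subprocess
import time
from typing import Callable, Sequence


def wait_for(check: Callable[[], bool], poll_seconds: float = 60.0, timeout_seconds: float = 3600.0) -> None:
    """Call `check` every `poll_seconds` until it returns True; raise TimeoutError after `timeout_seconds`."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_seconds} seconds")
        time.sleep(poll_seconds)


def firectl_output_contains(args: Sequence[str], text: str) -> bool:
    # Assumption: the state string appears verbatim in firectl's stdout.
    result = subprocess.run(["firectl", *args], capture_output=True, text=True, check=True)
    return text in result.stdout
```

For example, `wait_for(lambda: firectl_output_contains(["get", "fine-tuning-job", model_id], "COMPLETED"))`. A job that fails never reaches `COMPLETED`, so this loop would only stop at the timeout; checking for a failure state as well is a sensible extension.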

### Evaluate Results

We will now deploy our model and evaluate the results. In addition to measuring the fine-tuned model's accuracy, we'll test a new ticket that does not fall into any of the categories we fine-tuned on. We'll see that the model correctly creates a new category for this ticket called "App Errors".

See https://docs.fireworks.ai/fine-tuning/fine-tuning-models#deploying-the-model-for-inference for more details
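Because the model can invent labels, it helps to separate predictions that match a trained category from newly generated ones. A minimal sketch; `split_known_and_new` is a hypothetical helper that uses exact string matching, so a variant such as "billing inquiries" would be counted as new.

```python
from typing import Iterable, List, Set, Tuple


def split_known_and_new(predictions: Iterable[str], known_categories: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition predicted labels into ones seen during training and newly generated ones."""
    known, new = [], []
    for label in predictions:
        if label in known_categories:
            known.append(label)
        else:
            new.append(label)
    return known, new
```

Once `test_responses` exists, `split_known_and_new(test_responses, training_categories)` shows how often the model leaves the trained label set, which is the behavior the training settings are meant to control.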

In [102]:
# Deploy the model to a Fireworks serverless endpoint
!firectl deploy {model_id}

In [139]:
# Wait until Deployed Model Refs lists the state of the model as "DEPLOYED" (~5-20 minutes).
!firectl get model {model_id}

In [106]:
# Uses an LLM to predict class labels for a list of support tickets
def classify_tickets(tickets, model):
    responses = list()

    for ticket in tickets:
        user_prompt = create_prompt(ticket)
    
        response = client.chat.completions.create(
            model=model,
            messages=[
                { "role": "user", "content": user_prompt}
            ],
            # setting temperature to 0 for this use case, so that responses are as deterministic as possible
            temperature=0, 
            stop=["</category>"],
            max_tokens=2048,
        )
        response = response.choices[0].message.content.split("<category>")[-1].strip()
        responses.append(response)

    return responses


# Calculates the percent of predictions we classified correctly
def evaluate_accuracy(predicted, actual):
    num_correct = sum([predicted[i] == actual[i] for i in range(len(actual))])
    return round(100 * num_correct / len(actual), 2)
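`classify_tickets` relies on `stop=["</category>"]` to cut the response before the closing tag, then keeps whatever follows the last `<category>`. The hypothetical helper below performs the same extraction but also trims a closing tag if one survives (for example, if the stop sequence is not applied), which keeps stray markup out of the accuracy comparison.

```python
def extract_category(text: str) -> str:
    # Keep only what follows the last opening tag (same as the split(...)[-1] above).
    tail = text.split("<category>")[-1]
    # If a closing tag is still present, drop it and anything after it.
    return tail.split("</category>")[0].strip()
```

It could replace the `split("<category>")[-1].strip()` expression inside `classify_tickets` without changing results when the stop sequence works as expected.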

In [107]:
# Determine how the fine-tuned model performs
model_id = f'accounts/{account_id}/models/{model_id}'

training_responses = classify_tickets(
    tickets=training_tickets, 
    model=model_id
)
accuracy = evaluate_accuracy(training_responses, training_labels)
print(f"Training Set Accuracy: {accuracy}%")

test_responses = classify_tickets(
    tickets=test_tickets, 
    model=model_id
)

accuracy = evaluate_accuracy(test_responses, test_labels)
print(f"Test Set Accuracy: {accuracy}%")

Training Set Accuracy: 76.47%
Test Set Accuracy: 77.94%
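Overall accuracy hides which categories the model struggles with. A minimal pandas sketch of per-category accuracy follows; `per_category_accuracy` is a hypothetical helper, and any newly generated label counts as incorrect for the ticket's true category.

```python
import pandas as pd


def per_category_accuracy(predicted, actual) -> pd.DataFrame:
    """Accuracy (%) for each true label plus the number of examples with that label, worst first."""
    df = pd.DataFrame({"predicted": predicted, "actual": actual})
    df["correct"] = df["predicted"] == df["actual"]
    summary = df.groupby("actual")["correct"].agg(["mean", "size"])
    summary["accuracy"] = (100 * summary["mean"]).round(2)
    return summary[["accuracy", "size"]].sort_values("accuracy")
```

For example, `per_category_accuracy(test_responses, test_labels)`; `pd.crosstab(pd.Series(test_labels, name="actual"), pd.Series(test_responses, name="predicted"))` goes one step further and shows which labels each category is confused with.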


In [130]:
# Test a new ticket that does not belong to any of the categories we trained on, and observe how our model labels this new ticket.
app_errors_ticket_description = "I keep getting an error that says '404 not found' when opening the mobile app."

app_errors_classification = classify_tickets(
    tickets=[app_errors_ticket_description],
    model=model_id
)[0]

print('New Ticket Category')
print(app_errors_classification)

print('\nCategories Trained On:')
print('\n'.join(training_categories))

New Ticket Category
App Errors

Categories Trained On:
Coverage Explanations
Quotes and Proposals
Policy Administration
Claims Assistance
Claims Disputes
Billing Inquiries
Billing Disputes
Policy Comparisons
General Inquiries
Account Management


In [142]:
# Undeploy the model (deployment does not cost anything extra, but Fireworks may limit your number of deployed models).
!firectl undeploy {model_id}