## Setup

This guide requires the following Python packages:
- fireworks-ai
- pandas
- requests
- python-dotenv (used below to load the API key from a `.env` file)

You will also need:

- A Fireworks account (https://fireworks.ai/)
- A Fireworks API key
- The `firectl` command-line interface (https://docs.fireworks.ai/tools-sdks/firectl/firectl)

In [2]:
#!pipenv install pandas requests fireworks-ai python-dotenv

In [1]:
import json
import os
import time

from fireworks.client import Fireworks
import pandas as pd
import requests

In [2]:
# Sign in to your Fireworks account
!firectl signin

Signed in as: jayozer@gmail.com
Account ID: jayozer-ce1cd6


In [3]:
!firectl whoami

Signed in as: jayozer@gmail.com
Account ID: jayozer-ce1cd6


In [4]:
# Make sure you have the FIREWORKS_API_KEY environment variable set to your account's key!
# os.environ['FIREWORKS_API_KEY'] = 'XXX'

from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variable
api_key = os.getenv('FIREWORKS_API_KEY')

client = Fireworks(api_key=api_key)

# Your Fireworks account ID, read here from the FIREWORKS_ACCOUNT_ID environment variable (set it in your .env file)
account_id = os.getenv('FIREWORKS_ACCOUNT_ID')
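
Both `FIREWORKS_API_KEY` and `FIREWORKS_ACCOUNT_ID` are read from the environment, and `os.getenv` silently returns `None` when a variable is missing. A minimal sketch of a fail-fast check is shown below. The helper name `require_env` is hypothetical and is not part of the Fireworks SDK.

```python
import os

def require_env(name, environ=None):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    # `environ` is injectable so the helper can be exercised without touching os.environ.
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Possible usage after load_dotenv():
# api_key = require_env('FIREWORKS_API_KEY')
# account_id = require_env('FIREWORKS_ACCOUNT_ID')
```

Failing here gives a clearer error than a later authentication failure or a model path that contains the literal string `None`.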

## Problem Definition: Pediatric dentistry corpus fine-tuning

*Note: The problem definition, data, and labels used in this example were synthetically generated by Claude 3 Opus*

The goal is a pediatric dentistry chatbot that gives short answers to common questions.

### Task
Increase the accuracy of inference on the `test.tsv` dataset through fine-tuning.

#### Labeled Data

We will use the following datasets:
- `./pk_data/train.tsv`
- `./pk_data/test.tsv`

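Both files are generated from the main Q&A dataset, `./pk_data/clean_faq_dataset.txt`, which stores each pair as a `Question:` line followed by an `Answer:` line.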

### Create the train and test data

In [5]:
import random

input_file = '/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/clean_faq_dataset.txt'
train_file = '/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/train.tsv'
test_file = '/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/test.tsv'

def read_qa_pairs(file_path):
    qa_pairs = []
    with open(file_path, 'r', encoding='utf-8') as file:
        current_question = None
        current_answer = None
        for line in file:
            line = line.strip()
            if line.startswith('Question:'):
                if current_question and current_answer:
                    qa_pairs.append((current_question, current_answer))
                current_question = line[9:].strip()
                current_answer = None
            elif line.startswith('Answer:'):
                current_answer = line[7:].strip()
        if current_question and current_answer:
            qa_pairs.append((current_question, current_answer))
    return qa_pairs

def write_tsv(file_path, data):
    with open(file_path, 'w', encoding='utf-8', newline='') as file:
        file.write('question\tanswer\n')  # Header
        for question, answer in data:
            file.write(f'{question}\t{answer}\n')

# Read Q&A pairs
qa_pairs = read_qa_pairs(input_file)

# Shuffle the data
random.shuffle(qa_pairs)

# Calculate split index
split_index = int(len(qa_pairs) * 0.8)

# Split the data
train_data = qa_pairs[:split_index]
test_data = qa_pairs[split_index:]

# Write train and test data
write_tsv(train_file, train_data)
write_tsv(test_file, test_data)

print(f"Train data ({len(train_data)} pairs) written to {train_file}")
print(f"Test data ({len(test_data)} pairs) written to {test_file}")

Train data (675 pairs) written to /Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/train.tsv
Test data (169 pairs) written to /Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/test.tsv
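
The `random.shuffle` call above is unseeded, so each run assigns different pairs to the train and test files. A minimal sketch of a reproducible split follows, assuming a fixed seed is acceptable for this exercise. The function name `split_pairs` and the seed value `42` are illustrative choices, not part of the original notebook.

```python
import random

def split_pairs(pairs, train_fraction=0.8, seed=42):
    """Shuffle a copy of `pairs` with a private seeded RNG, then split it."""
    rng = random.Random(seed)      # private RNG; leaves the global random state untouched
    shuffled = list(pairs)         # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    split_index = int(len(shuffled) * train_fraction)
    return shuffled[:split_index], shuffled[split_index:]

# Possible usage with the pairs read above:
# train_data, test_data = split_pairs(qa_pairs)
```

With the same input and seed, the split is identical on every run, which keeps later accuracy numbers comparable.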


### Fine-tuning training dataset curation

We must first transform our dataset into the format expected by Fireworks and then upload it. Each record must conform to the schema expected by the Chat Completions API.

See https://docs.fireworks.ai/fine-tuning/fine-tuning-models#conversation for more details
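
In this format each record is a JSON object with a `messages` list, and each message carries a `role` and a `content` string. A minimal sketch of a pre-upload check is given below. `validate_record` is a hypothetical helper, and the allowed roles are taken from the `message_roles` list in the Jinja template printed by the fine-tuning job rather than from an authoritative schema.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}  # assumed from the job's Jinja template

def validate_record(record):
    """Return a list of problems found in one training record (empty if none)."""
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["record must contain a non-empty 'messages' list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not an object")
            continue
        role = message.get("role")
        if not isinstance(role, str) or role.lower() not in ALLOWED_ROLES:
            problems.append(f"message {i} has invalid role {role!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i} is missing string 'content'")
    return problems
```

Running this over every line of the JSONL file before uploading can surface malformed rows locally instead of at job creation time.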

In [6]:
import json
import csv

input_file = '/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/train.tsv'
output_file = '/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/pk_faq_training_data.jsonl'

def process_qa_pair(question, answer):
    return {
        "messages": [
            {"role": "system", "content": "You are Poppy, a helpful assistant for Poppy Kids Pediatric Dentistry."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer}
        ]
    }

# Read TSV and write JSONL
with open(input_file, 'r', encoding='utf-8') as infile, open(output_file, 'w', encoding='utf-8') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    next(reader)  # Skip header row
    
    for row in reader:
        if len(row) == 2:
            question, answer = row
            formatted_data = process_qa_pair(question, answer)
            json.dump(formatted_data, outfile, ensure_ascii=False)
            outfile.write('\n')

print(f"Formatted data has been written to {output_file}")

Formatted data has been written to /Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/pk_faq_training_data.jsonl


### Test dataset curation

In [41]:
import json
import csv

input_file = '/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/test.tsv'
output_file = '/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/pk_faq_test_data.jsonl'

def process_qa_pair(question, answer):
    return {
        "messages": [
            {"role": "system", "content": "You are Poppy, a helpful assistant for Poppy Kids Pediatric Dentistry."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer}
        ]
    }

# Read TSV and write JSONL
with open(input_file, 'r', encoding='utf-8') as infile, open(output_file, 'w', encoding='utf-8') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    next(reader)  # Skip header row
    
    for row in reader:
        if len(row) == 2:
            question, answer = row
            formatted_data = process_qa_pair(question, answer)
            json.dump(formatted_data, outfile, ensure_ascii=False)
            outfile.write('\n')

print(f"Formatted data has been written to {output_file}")

Formatted data has been written to /Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/pk_faq_test_data.jsonl
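
The training and test conversion cells differ only in their file paths, so the logic can be written once. A sketch under that assumption is shown below. `tsv_to_jsonl` is a hypothetical name, and unlike the cells above it reports how many rows were skipped instead of dropping them silently.

```python
import csv
import json

SYSTEM_PROMPT = "You are Poppy, a helpful assistant for Poppy Kids Pediatric Dentistry."

def tsv_to_jsonl(input_path, output_path, system_prompt=SYSTEM_PROMPT):
    """Convert a question/answer TSV into chat-format JSONL; return (written, skipped)."""
    written = skipped = 0
    with open(input_path, 'r', encoding='utf-8', newline='') as infile, \
         open(output_path, 'w', encoding='utf-8') as outfile:
        reader = csv.reader(infile, delimiter='\t')
        next(reader, None)  # Skip header row (tolerates an empty file)
        for row in reader:
            if len(row) != 2:
                skipped += 1  # Malformed row: wrong number of columns
                continue
            question, answer = row
            record = {"messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]}
            outfile.write(json.dumps(record, ensure_ascii=False) + '\n')
            written += 1
    return written, skipped
```

Calling it once for `train.tsv` and once for `test.tsv` would replace both cells, and a nonzero skipped count is a hint that some question or answer contains a stray tab.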


In [13]:
# Path to the JSONL file that will be uploaded to Fireworks
dataset_file_name = '/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/pk_faq_training_data.jsonl'
dataset_id = 'pk-faq-v1'  # Dataset IDs must use dashes; underscores do not work.

# Read the existing file to verify its contents
with open(dataset_file_name, 'r') as f:
    training_json = [json.loads(line) for line in f]

print(f"Number of training examples: {len(training_json)}")
print(f"First example: {training_json[0]}")


Number of training examples: 675
First example: {'messages': [{'role': 'system', 'content': 'You are Poppy, a helpful assistant for Poppy Kids Pediatric Dentistry.'}, {'role': 'user', 'content': 'Why are fillings necessary for baby teeth?'}, {'role': 'assistant', 'content': "Fillings in baby teeth are necessary to treat cavities, prevent discomfort, pain, and further dental issues such as infections, and to ensure the overall health and functionality of a child's teeth."}]}


In [14]:
# Follow the instructions here to install the firectl CLI first - https://readme.fireworks.ai/docs/fine-tuning-models#installing-firectl
# Then run this command to upload the file to Fireworks
!firectl create dataset {dataset_id} {dataset_file_name}

301.87 KiB / 301.87 KiB [---------------------------] 100.00% 2.07 MiB p/s 300ms


### Fine-Tuning

We will now fine-tune models on Fireworks using `firectl`. Fireworks implements the QLoRA algorithm behind a simple interface, and training parameters can be set via the `--settings-file` argument. In this exercise, we will fine-tune two models:
- The first model uses the Fireworks default training parameters
- The second model increases the rank, learning rate, and number of epochs to make fine-tuning more aggressive

See https://docs.fireworks.ai/fine-tuning/fine-tuning-models#starting-your-tuning-job for more details
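
To get a feel for what raising the rank does, recall that LoRA adds two low-rank matrices per adapted weight, so a rank-r adapter on a projection of shape d_in x d_out contributes r * (d_in + d_out) trainable parameters. The sketch below applies that formula. The layer shapes are assumptions taken from the public Llama 3 8B configuration (hidden size 4096, MLP size 14336, 1024-wide key/value projections, 32 layers), and the module list follows the `Target Modules` line that `firectl get model` prints for the tuned model. Fireworks' internal accounting may differ.

```python
# Assumed (d_in, d_out) shapes for one Llama 3 8B transformer layer.
LAYER_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # assumed number of transformer layers

def lora_trainable_params(rank, shapes=LAYER_SHAPES, num_layers=NUM_LAYERS):
    """Count LoRA parameters: rank * (d_in + d_out) per adapted matrix, per layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

for rank in (8, 16, 32):
    print(rank, lora_trainable_params(rank))
```

Under these assumptions rank 8 gives 20,971,520 trainable parameters, and the count grows linearly with the rank, so doubling the rank doubles the adapter's capacity to fit (or overfit) a small dataset.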

In [16]:
# Creates a training job with the default hyperparameters
!firectl create fine-tuning-job --settings-file /Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/pk-faq-v1.yaml --display-name pk-faq-v1 --dataset {dataset_id} 

Name: accounts/jayozer-ce1cd6/fineTuningJobs/f0ddb99646244b33a86f9df7edc0faa5
Display Name: pk-faq-v1
Create Time: 2024-09-17 06:08:06
State: CREATING
Dataset: accounts/jayozer-ce1cd6/datasets/pk-faq-v1
Datasets: []
Status: OK
Created By: jayozer@gmail.com
Container Version: 
Model Id: 
Conversation:
  Jinja Template: {%- set _mode = mode | default('generate', true) -%}
{%- set stop_token = '<|eot_id|>' -%}
{%- set message_roles = ['SYSTEM', 'USER', 'ASSISTANT'] -%}
{%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
{%- for message in ns.messages -%}
    {%- if not message.get('role') -%}
        {{ raise_exception('Key [role] is missing. Original input: ' +  message|tojson) }}
    {%- endif -%}
    {%- if message['role'] | upper not in message_roles -%}
        {{ raise_exception('Invalid role ' + message['role']|tojson + '. Only ' + message_roles|tojson + ' are supported.') }}
    {%- endif -%}
    {%- if 'content' not

In [17]:
# Creates a training job with the increased rank, learning rate, and epochs
!firectl create fine-tuning-job --settings-file /Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/pk-faq-v2.yaml --display-name pk-faq-v2 --dataset {dataset_id} 

Name: accounts/jayozer-ce1cd6/fineTuningJobs/427f1b67937e40a5bb3ef06a0c3770d5
Display Name: pk-faq-v2
Create Time: 2024-09-17 06:08:29
State: CREATING
Dataset: accounts/jayozer-ce1cd6/datasets/pk-faq-v1
Datasets: []
Status: OK
Created By: jayozer@gmail.com
Container Version: 
Model Id: 
Conversation:
  Jinja Template: {%- set _mode = mode | default('generate', true) -%}
{%- set stop_token = '<|eot_id|>' -%}
{%- set message_roles = ['SYSTEM', 'USER', 'ASSISTANT'] -%}
{%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
{%- for message in ns.messages -%}
    {%- if not message.get('role') -%}
        {{ raise_exception('Key [role] is missing. Original input: ' +  message|tojson) }}
    {%- endif -%}
    {%- if message['role'] | upper not in message_roles -%}
        {{ raise_exception('Invalid role ' + message['role']|tojson + '. Only ' + message_roles|tojson + ' are supported.') }}
    {%- endif -%}
    {%- if 'content' not

In [20]:
# model_v1_id is the ID of the training job with default hyperparameters; model_v2_id is the one with the increased settings
# NOTE: THESE IDS WILL CHANGE WHEN YOU RUN THE FINE-TUNING JOBS ON YOUR ACCOUNT!
# Each ID is the last segment of the Name line printed by the cells above: accounts/{account_id}/fineTuningJobs/{model_id}
model_v1_id = 'f0ddb99646244b33a86f9df7edc0faa5'
model_v2_id = '427f1b67937e40a5bb3ef06a0c3770d5'

In [21]:
# Wait until the State of both fine-tuning jobs is listed as COMPLETED (~10-20 minutes)
!firectl get fine-tuning-job {model_v1_id}

Name: accounts/jayozer-ce1cd6/fineTuningJobs/f0ddb99646244b33a86f9df7edc0faa5
Display Name: pk-faq-v1
Create Time: 2024-09-17 06:08:06
State: COMPLETED
Dataset: accounts/jayozer-ce1cd6/datasets/pk-faq-v1
Datasets: []
Status:
  Code: OK
  Message: {'train_runtime': 56.6165, 'train_samples_per_second': 11.922, 'train_steps_per_second': 0.742, 'total_flos': 3057560957485056.0, 'train_loss': 1.6841438100451516, 'epoch': 0.9882352941176471}
Created By: jayozer@gmail.com
Container Version: 
Model Id: f0ddb99646244b33a86f9df7edc0faa5
Conversation:
  Jinja Template: {%- set _mode = mode | default('generate', true) -%}
{%- set stop_token = '<|eot_id|>' -%}
{%- set message_roles = ['SYSTEM', 'USER', 'ASSISTANT'] -%}
{%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
{%- for message in ns.messages -%}
    {%- if not message.get('role') -%}
        {{ raise_exception('Key [role] is missing. Original input: ' +  message|tojson) }}
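
Rather than re-running the cell by hand, the `State:` line can be polled. A rough sketch follows, assuming `firectl` is on the `PATH` and that its output keeps the `State: <VALUE>` line shown above. `parse_state` and `wait_for_job` are hypothetical helpers, and `FAILED` as a terminal state is a guess, since only `CREATING` and `COMPLETED` appear in this notebook.

```python
import re
import subprocess
import time

def parse_state(firectl_output):
    """Extract the value of the first unindented 'State:' line, or None if absent."""
    match = re.search(r"^State:\s*(\S+)", firectl_output, flags=re.MULTILINE)
    return match.group(1) if match else None

def wait_for_job(job_id, poll_seconds=60, timeout_seconds=3600):
    """Poll `firectl get fine-tuning-job` until it reports COMPLETED (or the assumed FAILED)."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["firectl", "get", "fine-tuning-job", job_id],
            capture_output=True, text=True, check=True,
        )
        state = parse_state(result.stdout)
        if state in {"COMPLETED", "FAILED"}:  # "FAILED" is an assumed terminal state
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish within {timeout_seconds}s")

# Possible usage:
# wait_for_job(model_v1_id)
# wait_for_job(model_v2_id)
```

The parser is kept separate from the subprocess call so it can be checked against saved output without contacting Fireworks.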
  

### Evaluate Results

We will now deploy our models and evaluate the results. We will calculate the accuracy of three different models:

- The base model without any fine-tuning
- Our first fine-tuned model, with the default hyperparameters
- Our second fine-tuned model, with the more aggressive hyperparameters

See https://docs.fireworks.ai/fine-tuning/fine-tuning-models#deploying-the-model-for-inference for more details

In [22]:
# Deploy the first model to a Fireworks serverless endpoint
!firectl deploy {model_v1_id}

In [23]:
# Deploy the second model to a Fireworks serverless endpoint
!firectl deploy {model_v2_id}

In [31]:
# Wait until Deployed Model Refs lists the state of the model as "DEPLOYED" (~5-20 minutes).
!firectl get model {model_v1_id}

Name: accounts/jayozer-ce1cd6/models/f0ddb99646244b33a86f9df7edc0faa5
Display Name: 
Description: 
Create Time: 2024-09-17 06:13:42
Created By: 
State: READY
Status: OK
Kind: HF_PEFT_ADDON
Github Url: 
Hugging Face Url: 
Base Model Details:
  World Size: 0
  Checkpoint Format: CHECKPOINT_FORMAT_UNSPECIFIED
  Parameter Count: 0
  Moe: false
Peft Details:
  Base Model: accounts/fireworks/models/llama-v3-8b-instruct-hf
  R: 8
  Target Modules: [q_proj, v_proj, gate_proj, o_proj, k_proj, down_proj, up_proj]
Public: false
Conversation Config:
  Style: jinja
  System: 
  Template: 
Context Length: 8192
Supports Image Input: false
Supports Tools: false
Imported From: 
Fine Tuning Job: accounts/jayozer-ce1cd6/fineTuningJobs/f0ddb99646244b33a86f9df7edc0faa5
Default Draft Model: 
Default Draft Token Count: 0
Precisions: []
Deployed Model Refs: 
  [{
    Name: accounts/jayozer-ce1cd6/deployedModels/f0ddb99646244b33a86f9df7edc0faa5-d6de0bd7
    Deployment: accounts/fireworks/deployments/21c38bed
 

In [32]:
from fireworks.client import Fireworks
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variable
api_key = os.getenv('FIREWORKS_API_KEY')

In [36]:
! firectl get model accounts/fireworks/models/llama-v3-8b-instruct

Name: accounts/fireworks/models/llama-v3-8b-instruct
Display Name: Llama 3 8B Instruct
Description: Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes. The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.
Create Time: 2024-04-18 13:29:41
Created By: yingliu@fireworks.ai
State: READY
Status: OK
Kind: HF_BASE_MODEL
Github Url: https://github.com/meta-llama/llama3
Hugging Face Url: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
Base Model Details:
  World Size: 1
  Checkpoint Format: NATIVE
  Parameter Count: 0
  Moe: false
Public: true
Conversation Config:
  Style: jinja
  System: 
  Template: 
Context Length: 8192
Supports Image Input: false
Supports Tools: false
Imported From: 
Fine Tuning Job: 
Default Draft Model: 
Default Draft Token Cou

In [38]:
from fireworks.client import Fireworks
import os

# Make sure you have the FIREWORKS_API_KEY environment variable set
api_key = os.getenv('FIREWORKS_API_KEY')

client = Fireworks(api_key=api_key)

model_id = 'accounts/fireworks/models/llama-v3-8b-instruct'

try:
    # Attempt to create a chat completion with the model
    response = client.chat.completions.create(
        model=model_id,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, world!"}
        ],
        max_tokens=10
    )
    print("You have access to the model.")
    print("Model response:", response.choices[0].message.content)
except Exception as e:
    print("You might not have access to this model.")
    print("Error:", str(e))

You have access to the model.
Model response: Hello there! Nice to meet you! Is there


In [40]:
def get_model_response(question, model):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are Poppy, a helpful assistant for Poppy Kids Pediatric Dentistry."},
            {"role": "user", "content": question}
        ],
        temperature=0,
        max_tokens=2048,
    )
    return response.choices[0].message.content.strip()

def evaluate_accuracy(predicted_answers, actual_answers):
    # Counts a prediction as correct only on an exact, case-insensitive string match
    correct = sum(1 for pred, actual in zip(predicted_answers, actual_answers) if pred.lower() == actual.lower())
    return round(100 * correct / len(actual_answers), 2)

# Load your train and test data
train_data = pd.read_csv('/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/train.tsv', sep='\t')
test_data = pd.read_csv('/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/test.tsv', sep='\t')

# Base model ID
base_model_id = 'accounts/fireworks/models/llama-v3-8b-instruct'

print("Evaluating Base Model...")

# Evaluate on training set
train_responses = [get_model_response(question, base_model_id) for question in train_data['question']]
train_accuracy = evaluate_accuracy(train_responses, train_data['answer'])
print(f"Training Set Accuracy: {train_accuracy}%")

# Evaluate on test set
test_responses = [get_model_response(question, base_model_id) for question in test_data['question']]
test_accuracy = evaluate_accuracy(test_responses, test_data['answer'])
print(f"Test Set Accuracy: {test_accuracy}%")

Evaluating Base Model...
Training Set Accuracy: 0.0%
Test Set Accuracy: 0.0%
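
Exact string matching likely explains the 0.0% figures: a free-form answer almost never reproduces the reference word for word. A softer score can make the comparison more informative. The sketch below uses token-overlap F1, a common metric for short-answer QA. It is an alternative to, not a description of, the `evaluate_accuracy` function in this notebook, and `token_f1` and `mean_token_f1` are hypothetical names.

```python
import re
from collections import Counter

def token_f1(predicted, reference):
    """Token-overlap F1 between two strings, ignoring case and punctuation."""
    pred_tokens = re.findall(r"\w+", predicted.lower())
    ref_tokens = re.findall(r"\w+", reference.lower())
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty answer scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def mean_token_f1(predictions, references):
    """Average token F1 over paired answers, as a percentage rounded to 2 places."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return round(100 * sum(scores) / len(scores), 2) if scores else 0.0

# Possible usage with the responses computed above:
# print(mean_token_f1(test_responses, test_data['answer']))
```

Token overlap rewards shared wording rather than correctness, so it is best read as a rough similarity signal alongside manual spot checks.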


#### Check fine-tuned model v1

In [27]:
# Determine how the fine-tuned model performs with the default fine-tuning params
model_id = f'accounts/{account_id}/models/{model_v1_id}'
print(f"Model ID: {model_id}")



Model ID: accounts/jayozer-ce1cd6/models/f0ddb99646244b33a86f9df7edc0faa5


In [None]:
def get_model_response(question, model):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are Poppy, a helpful assistant for Poppy Kids Pediatric Dentistry."},
            {"role": "user", "content": question}
        ],
        temperature=0,
        max_tokens=2048,
    )
    return response.choices[0].message.content.strip()

def evaluate_accuracy(predicted_answers, actual_answers):
    correct = sum(1 for pred, actual in zip(predicted_answers, actual_answers) if pred.lower() == actual.lower())
    return round(100 * correct / len(actual_answers), 2)

# Load your train and test data
train_data = pd.read_csv('/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/train.tsv', sep='\t')
test_data = pd.read_csv('/Users/acrobat/Documents/GitHub/fine-tuning-workshop/poppykids/pk_data/test.tsv', sep='\t')

# Your fine-tuned model ID
account_id = os.getenv('FIREWORKS_ACCOUNT_ID')  # Make sure this environment variable is set
fine_tuned_model_id = 'f0ddb99646244b33a86f9df7edc0faa5'  # Replace with your actual fine-tuned model ID
full_fine_tuned_model_id = f'accounts/{account_id}/models/{fine_tuned_model_id}'

print("Evaluating Fine-tuned Model...")

# Evaluate on training set
train_responses = [get_model_response(question, full_fine_tuned_model_id) for question in train_data['question']]
train_accuracy = evaluate_accuracy(train_responses, train_data['answer'])
print(f"Training Set Accuracy: {train_accuracy}%")

# Evaluate on test set
test_responses = [get_model_response(question, full_fine_tuned_model_id) for question in test_data['question']]
test_accuracy = evaluate_accuracy(test_responses, test_data['answer'])
print(f"Test Set Accuracy: {test_accuracy}%")

In [54]:
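# Determine how the fine-tuned model performs with the default fine-tuning params
# NOTE: classify_tickets, training_tickets, training_labels, test_tickets, and test_labels
# are not defined in this notebook; they must be defined before this cell will run.
model_id = f'accounts/{account_id}/models/{model_v1_id}'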

training_responses = classify_tickets(
    tickets=training_tickets, 
    model=model_id
)
accuracy = evaluate_accuracy(training_responses, training_labels)
print(f"Training Set Accuracy: {accuracy}%")

test_responses = classify_tickets(
    tickets=test_tickets, 
    model=model_id
)

accuracy = evaluate_accuracy(test_responses, test_labels)
print(f"Test Set Accuracy: {accuracy}%")

Training Set Accuracy: 69.12%
Test Set Accuracy: 67.65%


In [55]:
# Determine how the fine-tuned model performs with the increased rank, epochs, and learning rate
model_id = f'accounts/{account_id}/models/{model_v2_id}'

training_responses = classify_tickets(
    tickets=training_tickets, 
    model=model_id
)
accuracy = evaluate_accuracy(training_responses, training_labels)
print(f"Training Set Accuracy: {accuracy}%")

test_responses = classify_tickets(
    tickets=test_tickets, 
    model=model_id
)

accuracy = evaluate_accuracy(test_responses, test_labels)
print(f"Test Set Accuracy: {accuracy}%")

Training Set Accuracy: 61.76%
Test Set Accuracy: 55.88%


In [58]:
# Undeploy the first model (does not cost anything extra, but Fireworks may limit your number of deployed models).
!firectl undeploy {model_v1_id}

In [59]:
# Undeploy the second model (does not cost anything extra, but Fireworks may limit your number of deployed models).
!firectl undeploy {model_v2_id}