In [1]:
import json
import requests
import os
import time
import getpass
from huggingface_hub import notebook_login

from datasets import load_dataset
from fireworks.client import Fireworks
from pydantic import BaseModel, Field
from transformers import AutoTokenizer

In [2]:
os.environ["FIREWORKS_API_KEY"] = getpass.getpass("fireworks api:")

In [3]:
client = Fireworks(api_key=os.environ["FIREWORKS_API_KEY"])

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [4]:
notebook_login()


In [5]:
# This is a dataset of human preferences from Chatbot Arena. When a user types a message, they receive responses from two
# different chatbots and then vote for the response they prefer. Throughout this course, I am going to fine-tune a model to predict
# these human preferences. For this week's assignment, I use prompt engineering to get a baseline of how well llama3-8b-instruct
# performs at this task before fine-tuning it.
# For more details on the dataset: https://huggingface.co/datasets/lmsys/chatbot_arena_conversations
dataset = load_dataset("lmsys/chatbot_arena_conversations")['train']
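The filters in the next cell rely on a few fields of each record: `turn`, `conversation_a`/`conversation_b` (lists of `{"role", "content"}` messages whose first entry is the user query), and `winner`. A minimal sketch of those two filters applied to a hypothetical toy record; the field values below are made up for illustration and are not taken from the dataset:

```python
# Hypothetical toy record mimicking the fields this notebook reads from lmsys/chatbot_arena_conversations
toy_example = {
    "turn": 1,
    "winner": "model_a",  # one of "model_a", "model_b", "tie", "tie (bothbad)"
    "conversation_a": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
    "conversation_b": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "5"},
    ],
}

def is_usable(example):
    """Keep single-turn chats where both bots received the identical user query."""
    same_query = example["conversation_a"][0]["content"] == example["conversation_b"][0]["content"]
    return example["turn"] == 1 and same_query

multi_turn = dict(toy_example, turn=2)  # same record, but marked as a two-turn chat
print(is_usable(toy_example), is_usable(multi_turn))  # True False
```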

In [6]:
# For simplicity, I am only going to look at single-turn chats, where the user declared a winner after a single response from the bot.
examples = [example for example in dataset if example['turn'] == 1]

# The query the user sent to both bots should be exactly the same, so that we are judging the responses fairly. This should always be
# the case for this dataset; this line just acts as a sanity check.
examples = [example for example in examples if example['conversation_a'][0]['content'] == example['conversation_b'][0]['content']]

# We take different examples for the train/validation/test sets
training_examples = examples[:2000]
validation_examples = examples[-1000:]
test_examples = examples[-2000:-1000]
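The three slices are only disjoint when there are at least 4,000 filtered examples; with fewer, `[:2000]` overlaps `[-2000:-1000]`. A quick sketch that applies the same slicing to indices so the condition can be checked for any count (the counts below are illustrative, not the real dataset size):

```python
def splits_are_disjoint(n):
    """Apply the notebook's slicing to indices 0..n-1 and check that no index lands in two splits."""
    idx = list(range(n))
    train = set(idx[:2000])
    validation = set(idx[-1000:])
    test = set(idx[-2000:-1000])
    return train.isdisjoint(validation) and train.isdisjoint(test) and validation.isdisjoint(test)

print(splits_are_disjoint(5000))  # True
print(splits_are_disjoint(3500))  # False: indices 1500-1999 fall in both train and test
```

In the notebook itself, `splits_are_disjoint(len(examples))` would confirm this for the actual filtered data.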

In [7]:
sys_msg = f'''Choose the better chatbot response between model_a and model_b.

Your response MUST be ONLY the JSON object {{"winner": XXX}}. XXX can only equal "model_a", "model_b", "tie", or "tie (bothbad)".'''

def get_user_msg(example):
    user_query = example['conversation_a'][0]['content']
    model_a_response = example['conversation_a'][1]['content']
    model_b_response = example['conversation_b'][1]['content']
    user_msg = f"""user query: {user_query}

model_a response: {model_a_response}

model_b response: {model_b_response}"""
    return user_msg

# Even though this is a classification task, I use the Fireworks chat completions API: it is the most general and
# best-supported interface, and it performed best for this task.
def create_messages(example):
    user_msg = get_user_msg(example)
    asst_msg = json.dumps({"winner": example['winner']})

    return {"messages": [
        {"role": "system", "content": sys_msg}, 
        {"role": "user", "content": user_msg}, 
        {"role": "assistant", "content": asst_msg}
    ]}

In [8]:
# Converts the training examples to the format expected by Fireworks. See https://readme.fireworks.ai/docs/fine-tuning-models#conversation
def training_examples_to_json(examples):
    json_objs = list()
    for example in examples:  
        msg = create_messages(example)
        json_objs.append(msg)
    
    print(f'Total tokens: {sum([len(tokenizer.tokenize(json.dumps(obj))) for obj in json_objs])}')
    return json_objs

training_json = training_examples_to_json(training_examples)

training_json

Total tokens: 1052349


[{'messages': [{'role': 'system',
    'content': 'Choose the better chatbot response between model_a and model_b.\n\nYour response MUST be ONLY the JSON object {"winner": XXX}. XXX can only equal "model_a", "model_b", "tie", or "tie (bothbad)".'},
   {'role': 'user',
    'content': 'user query: What is the difference between OpenCL and CUDA?\n\nmodel_a response: OpenCL and CUDA are two different programming models that are used for parallel computing.OpenCL is a general-purpose并行编程接口 that allows developers to write parallel code that can run on any platform that supportsCL, which includes most modern operating systems and computer systems, including Windows, Linux, and macOS. It provides a lower-level, more flexible API that is more suitable for building large-scale distributed computing systems.CUDA is a specific implementation ofOpenCL that is designed for performance and scalability in devices with multiple GPU(s). It was developed by Nvidia and is widely used for scientific computi
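The token count above is measured on each example's raw JSON string, so it approximates (rather than exactly matches) what the model sees once the chat template is applied. As a rough budget check, the per-example average follows directly from the printed total; the epoch count below is a hypothetical placeholder, since the settings files' contents are not shown:

```python
total_tokens = 1_052_349  # printed by training_examples_to_json above
num_examples = 2000       # len(training_examples)
num_epochs = 3            # hypothetical; substitute the value from your settings file

avg_tokens = total_tokens / num_examples
tokens_processed = total_tokens * num_epochs

print(f"~{avg_tokens:.0f} tokens per example")  # ~526 tokens per example
print(f"~{tokens_processed:,} tokens over {num_epochs} epochs")
```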

In [32]:
# Writes the data to a file so that it can be uploaded to Fireworks
dataset_file_name = 'chatbot_arena_training_data.jsonl'
dataset_id = 'chatbot-arena-v4'

with open(dataset_file_name, 'w') as f:
    for obj in training_json:
        json.dump(obj, f)
        f.write('\n')
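Before uploading, it can be worth checking that every line of the file has the system/user/assistant structure and that the assistant turn parses to one of the four labels the system prompt allows. A minimal standard-library sketch; these validation rules are my own assumptions about what the prompt requires, not Fireworks' upload checks:

```python
import json

ALLOWED_WINNERS = {"model_a", "model_b", "tie", "tie (bothbad)"}

def validate_line(line):
    """Return a list of problems in one JSONL training line (an empty list means it looks fine)."""
    messages = json.loads(line).get("messages", [])
    roles = [m.get("role") for m in messages]
    if roles != ["system", "user", "assistant"]:
        return [f"unexpected roles: {roles}"]
    try:
        target = json.loads(messages[-1]["content"])
    except json.JSONDecodeError:
        return ["assistant content is not valid JSON"]
    winner = target.get("winner") if isinstance(target, dict) else None
    if winner not in ALLOWED_WINNERS:
        return [f"bad winner label: {winner!r}"]
    return []

good = json.dumps({"messages": [
    {"role": "system", "content": "Choose the better chatbot response..."},
    {"role": "user", "content": "user query: ..."},
    {"role": "assistant", "content": json.dumps({"winner": "tie"})},
]})
print(validate_line(good))  # []
```

For the real file, loop over the lines of `dataset_file_name` and report any line whose result is non-empty.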

In [33]:
!firectl create dataset {dataset_id} {dataset_file_name}


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


3.98 MiB / 3.98 MiB [--------------------------------] 100.00% 3.08 MiB p/s 1.5s
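The `huggingface/tokenizers` warning above appears because the notebook uses the tokenizer (for the token count) and then forks a subprocess for each `!firectl` command, so it recurs on every call. As the warning itself suggests, setting `TOKENIZERS_PARALLELISM` explicitly silences it; this should run before the tokenizer is first used, for example at the top of the notebook:

```python
import os

# Run before AutoTokenizer is used. "false" disables the tokenizer's internal
# parallelism, so forking for `!firectl` no longer triggers the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```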


In [34]:
!firectl get dataset {dataset_id}

Name: accounts/m44rkt1-d483d1/datasets/chatbot-arena-v4
Display Name: 
Create Time: 2024-06-27 20:55:14
State: READY
Status: OK
Example Count: 2000
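`firectl get` prints human-readable `Key: Value` lines. To check fields programmatically (for example, that `Example Count` matches `len(training_json)`), a small parser over that text is enough. This relies on the output layout shown above, which I am treating as an assumption rather than a documented, stable interface:

```python
def parse_firectl_fields(output):
    """Parse 'Key: Value' lines from firectl's text output into a dict (first occurrence wins)."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a colon and indented continuation lines (e.g. the Jinja template)
        if sep and key and not key[0].isspace():
            fields.setdefault(key.strip(), value.strip())
    return fields

sample = """Name: accounts/m44rkt1-d483d1/datasets/chatbot-arena-v4
State: READY
Example Count: 2000"""
fields = parse_firectl_fields(sample)
print(fields["State"], int(fields["Example Count"]))  # READY 2000
```

Against the real CLI, the text could come from `subprocess.run(["firectl", "get", "dataset", dataset_id], capture_output=True, text=True).stdout`.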


In [40]:
# Creates a training job with the default hyperparameters (chatbot_arena_training_v1.yaml).
# Note: this command prints my API key to stdout, so comment it out before sharing the notebook.
!firectl create fine-tuning-job --settings-file chatbot_arena_training_v1.yaml --display-name chatbot-arena-v4 --dataset {dataset_id}

Name: accounts/m44rkt1-d483d1/fineTuningJobs/c2c7013101774cd19f0a18cc2f109a29
Display Name: chatbot-arena-v4
Create Time: 2024-06-27 21:01:38
State: CREATING
Dataset: accounts/m44rkt1-d483d1/datasets/chatbot-arena-v4
Created By: m44rkt1@gmail.com
Container Version: 
Model Id: 
Wandb Url: https://wandb.ai/markat1/fine-tuning-workshop-1/groups/group-c2c7013101774cd19f0a18cc2f109a29/workspace
Conversation:
  Jinja Template: {%- set _mode = mode | default('generate', true) -%}
{%- set stop_token = '<|eot_id|>' -%}
{%- set message_roles = ['SYSTEM', 'USER', 'ASSISTANT'] -%}
{%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
{%- for message in ns.messages -%}
    {%- if not message.get('role') -%}
        {{ raise_exception('Key [role] is missing. Original input: ' +  message|tojson) }}
    {%- endif -%}
    {%- if message['role'] | upper not in message_roles -%}
        {{ raise_exception('Invalid role ' + message['role']|toj

In [51]:

model_v4_id = 'c2c7013101774cd19f0a18cc2f109a29'

!firectl get fine-tuning-job {model_v4_id}

Name: accounts/m44rkt1-d483d1/fineTuningJobs/c2c7013101774cd19f0a18cc2f109a29
Display Name: chatbot-arena-v4
Create Time: 2024-06-27 21:01:38
State: COMPLETED
Dataset: accounts/m44rkt1-d483d1/datasets/chatbot-arena-v4
Status: OK
Created By: m44rkt1@gmail.com
Container Version: 
Model Id: 
Wandb Url: https://wandb.ai/markat1/fine-tuning-workshop-1/groups/group-c2c7013101774cd19f0a18cc2f109a29/workspace
Conversation:
  Jinja Template: {%- set _mode = mode | default('generate', true) -%}
{%- set stop_token = '<|eot_id|>' -%}
{%- set message_roles = ['SYSTEM', 'USER', 'ASSISTANT'] -%}
{%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
{%- for message in ns.messages -%}
    {%- if not message.get('role') -%}
        {{ raise_exception('Key [role] is missing. Original input: ' +  message|tojson) }}
    {%- endif -%}
    {%- if message['role'] | upper not in message_roles -%}
        {{ raise_exception('Invalid role ' + message
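Rather than re-running `firectl get fine-tuning-job` by hand until `State` changes from `CREATING` to `COMPLETED`, the check can be wrapped in a polling loop. A sketch that takes the state lookup as a function, so it can be exercised without the CLI; the failure state names (`FAILED`, `CANCELLED`) and the timeout are assumptions:

```python
import time

def wait_for_state(get_state, target="COMPLETED", failure_states=("FAILED", "CANCELLED"),
                   poll_seconds=60, timeout_seconds=3 * 60 * 60, sleep=time.sleep):
    """Call get_state() until it returns `target`; raise on an assumed failure state or on timeout."""
    waited = 0
    while True:
        state = get_state()
        if state == target:
            return state
        if state in failure_states:
            raise RuntimeError(f"job ended in state {state}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {state} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds

# Demo with a fake job that completes on the third check (sleep is stubbed out, so no real waiting)
states = iter(["CREATING", "CREATING", "COMPLETED"])
print(wait_for_state(lambda: next(states), sleep=lambda s: None))  # COMPLETED
```

With the real CLI, `get_state` could run `firectl get fine-tuning-job {model_v4_id}` through `subprocess.run(..., capture_output=True, text=True)` and pick out the `State:` line.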

In [102]:
# Creates a training job with increased rank, learning rate, and number of epochs (chatbot_arena_training_v2.yaml).
# Note: this command prints my API key to stdout, so comment it out before sharing the notebook.
!firectl create fine-tuning-job --settings-file chatbot_arena_training_v2.yaml --display-name chatbot-arena-v7 --dataset {dataset_id}

Name: accounts/m44rkt1-d483d1/fineTuningJobs/895e816bec3c4ed183cc562a1fb0d92e
Display Name: chatbot-arena-v7
Create Time: 2024-06-27 21:38:10
State: CREATING
Dataset: accounts/m44rkt1-d483d1/datasets/chatbot-arena-v4
Created By: m44rkt1@gmail.com
Container Version: 
Model Id: 
Wandb Url: https://wandb.ai/markat1/fine-tuning-workshop/groups/group-895e816bec3c4ed183cc562a1fb0d92e/workspace
Conversation:
  Jinja Template: {%- set _mode = mode | default('generate', true) -%}
{%- set stop_token = '<|eot_id|>' -%}
{%- set message_roles = ['SYSTEM', 'USER', 'ASSISTANT'] -%}
{%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
{%- for message in ns.messages -%}
    {%- if not message.get('role') -%}
        {{ raise_exception('Key [role] is missing. Original input: ' +  message|tojson) }}
    {%- endif -%}
    {%- if message['role'] | upper not in message_roles -%}
        {{ raise_exception('Invalid role ' + message['role']|tojso
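`firectl list models` reports these fine-tunes as `HF_PEFT_ADDON` (parameter-efficient adapters), so "rank" here is presumably the LoRA rank r. For a frozen weight of shape d_out × d_in, LoRA trains two low-rank factors with r·(d_in + d_out) parameters, so doubling the rank doubles the trainable parameters. A sketch of that arithmetic for Llama-3-8B's attention projections; which modules Fireworks actually adapts, and the ranks in the two settings files, are assumptions here:

```python
# Llama-3-8B: hidden size 4096, 32 layers, 8 key/value heads of dim 128 (grouped-query attention)
HIDDEN, KV_DIM, NUM_LAYERS = 4096, 8 * 128, 32

# (d_out, d_in) of the q, k, v, o projections in one layer; assumes LoRA is applied to all four
ATTENTION_SHAPES = [(HIDDEN, HIDDEN), (KV_DIM, HIDDEN), (KV_DIM, HIDDEN), (HIDDEN, HIDDEN)]

def lora_param_count(shapes, rank, num_layers):
    """Trainable LoRA parameters: each d_out x d_in weight gets A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_out, d_in in shapes)
    return per_layer * num_layers

for rank in (8, 16):  # hypothetical ranks for illustration
    print(rank, lora_param_count(ATTENTION_SHAPES, rank, NUM_LAYERS))
# 8 6815744
# 16 13631488
```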

In [104]:
model_v5_id = 'bfa866cd4b4841669796a0deee36d771'

!firectl get fine-tuning-job {model_v5_id}

Name: accounts/m44rkt1-d483d1/fineTuningJobs/bfa866cd4b4841669796a0deee36d771
Display Name: chatbot-arena-v6
Create Time: 2024-06-27 21:24:38
State: COMPLETED
Dataset: accounts/m44rkt1-d483d1/datasets/chatbot-arena-v4
Status: OK
Created By: m44rkt1@gmail.com
Container Version: 
Model Id: 
Wandb Url: https://wandb.ai/markat1/fine-tuning-workshop/groups/group-bfa866cd4b4841669796a0deee36d771/workspace
Conversation:
  Jinja Template: {%- set _mode = mode | default('generate', true) -%}
{%- set stop_token = '<|eot_id|>' -%}
{%- set message_roles = ['SYSTEM', 'USER', 'ASSISTANT'] -%}
{%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
{%- for message in ns.messages -%}
    {%- if not message.get('role') -%}
        {{ raise_exception('Key [role] is missing. Original input: ' +  message|tojson) }}
    {%- endif -%}
    {%- if message['role'] | upper not in message_roles -%}
        {{ raise_exception('Invalid role ' + message['

In [64]:
!firectl deploy {model_v4_id}


2024/06/27 21:17:53 Failed to execute: error deploying model: rpc error: code = FailedPrecondition desc = cannot deploy model in state: DEPLOYED


In [105]:
!firectl deploy {model_v5_id}


In [106]:
!firectl list models


NAME                              CREATE TIME          KIND           CHAT  PUBLIC  STATE       STATUS MESSAGE
bfa866cd4b4841669796a0deee36d771  2024-06-27 21:36:41  HF_PEFT_ADDON  true  false   DEPLOYING   
c2c7013101774cd19f0a18cc2f109a29  2024-06-27 21:08:19  HF_PEFT_ADDON  true  false   DEPLOYED    
f863ef7eda2c46528b30e494548a4249  2024-06-27 21:33:52  HF_PEFT_ADDON  true  false   UNDEPLOYED  

Total size: 3
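The error from `firectl deploy {model_v4_id}` (`cannot deploy model in state: DEPLOYED`) just means that model was already deployed. One way to avoid it is to read the `STATE` column from `firectl list models` and deploy only the models that are not already up. A sketch that parses the table layout shown above, which is an assumption about the text format rather than a documented API:

```python
def model_states(list_output):
    """Map model NAME -> STATE from `firectl list models` text output."""
    states = {}
    for line in list_output.splitlines():
        tokens = line.split()
        # Data rows: NAME, CREATE TIME (date + time), KIND, CHAT, PUBLIC, STATE, [STATUS MESSAGE]
        if len(tokens) >= 7 and tokens[0] != "NAME":
            states[tokens[0]] = tokens[6]
    return states

def needs_deploy(state):
    return state not in ("DEPLOYED", "DEPLOYING")

sample = """NAME                              CREATE TIME          KIND           CHAT  PUBLIC  STATE       STATUS MESSAGE
bfa866cd4b4841669796a0deee36d771  2024-06-27 21:36:41  HF_PEFT_ADDON  true  false   DEPLOYING
c2c7013101774cd19f0a18cc2f109a29  2024-06-27 21:08:19  HF_PEFT_ADDON  true  false   DEPLOYED
f863ef7eda2c46528b30e494548a4249  2024-06-27 21:33:52  HF_PEFT_ADDON  true  false   UNDEPLOYED

Total size: 3"""

print([name for name, state in model_states(sample).items() if needs_deploy(state)])
# ['f863ef7eda2c46528b30e494548a4249']
```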


In [107]:
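def get_results(examples, model_id):
    """Ask `model_id` to pick the winner for each example.

    Returns (winners, num_correct), where `winners` holds (index, predicted_winner) pairs for the
    responses that parsed successfully. Unparseable responses are skipped, so they count as incorrect.
    """
    winners = list()

    for i, example in enumerate(examples):
        user_msg = get_user_msg(example)

        response = client.chat.completions.create(
            model=model_id,
            messages=[
                {"role": "system", "content": sys_msg},
                {"role": "user", "content": user_msg},
            ],
            # Temperature 0 keeps the responses as deterministic as possible for this use case
            temperature=0,
        )
        content = response.choices[0].message.content

        try:
            # The JSON object is expected on the last line; strip first so a trailing newline doesn't leave an empty line
            winner = json.loads(content.strip().split('\n')[-1])["winner"]
            winners.append((i, winner))
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            print(f"Failed to parse JSON for example {i}.")

    num_correct = sum(1 for i, winner in winners if winner == examples[i]['winner'])
    return winners, num_correct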
        
num_to_eval = 500
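`get_results` assumes the JSON object is on the last line of the reply, which fails if the model adds trailing text after it. A more forgiving alternative (a sketch, not what produced the numbers below) searches the whole reply for a flat `{...}` object and accepts it only if the label is one of the four allowed values:

```python
import json
import re

ALLOWED_WINNERS = {"model_a", "model_b", "tie", "tie (bothbad)"}

def parse_winner(content):
    """Return the first valid winner label found in a flat JSON object inside `content`, else None."""
    for match in re.finditer(r"\{[^{}]*\}", content or ""):
        try:
            winner = json.loads(match.group(0)).get("winner")
        except json.JSONDecodeError:
            continue
        if winner in ALLOWED_WINNERS:
            return winner
    return None

print(parse_winner('Sure! {"winner": "model_b"}\nHope this helps.'))  # model_b
print(parse_winner('{"winner": "model_c"}'))  # None
```

Returning `None` instead of raising lets the caller count unparseable replies explicitly rather than silently dropping them.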

In [108]:
!firectl signin

Signed in as: m44rkt1@gmail.com
Account ID: m44rkt1-d483d1


In [109]:
# Determine how the base model without any fine-tuning performs
model_id = 'accounts/fireworks/models/llama-v3-8b-instruct'

train_results, train_num_correct = get_results(training_examples[:num_to_eval], model_id)
print(f'Training Set Correct: {train_num_correct}')

validation_results, validation_num_correct = get_results(validation_examples[:num_to_eval], model_id)
print(f'Validation Set Correct: {validation_num_correct}')

Training Set Correct: 303
Validation Set Correct: 251


In [110]:
# Determine how the fine-tuned model performs with the default fine-tuning params
model_id = f'accounts/m44rkt1-d483d1/models/{model_v4_id}'

train_results, train_num_correct = get_results(training_examples[:num_to_eval], model_id)
print(f'Training Set Correct: {train_num_correct}')

validation_results, validation_num_correct = get_results(validation_examples[:num_to_eval], model_id)
print(f'Validation Set Correct: {validation_num_correct}')

Failed to parse JSON for example 127.
Training Set Correct: 336
Validation Set Correct: 273


In [111]:
# Determine how the model fine-tuned with increased rank, epochs, and learning rate performs
model_id = f'accounts/m44rkt1-d483d1/models/{model_v5_id}'

train_results, train_num_correct = get_results(training_examples[:num_to_eval], model_id)
print(f'Training Set Correct: {train_num_correct}')

validation_results, validation_num_correct = get_results(validation_examples[:num_to_eval], model_id)
print(f'Validation Set Correct: {validation_num_correct}')

Training Set Correct: 364
Validation Set Correct: 286
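Since each evaluation uses 500 examples, it helps to turn the counts into accuracies with a rough uncertainty. A sketch using the normal-approximation 95% interval, p ± 1.96·sqrt(p(1 − p)/n); the counts are the ones printed above, while the interval choice is my own addition rather than part of the original analysis:

```python
import math

N = 500  # num_to_eval
results = {  # (training correct, validation correct), copied from the outputs above
    "base llama-v3-8b-instruct": (303, 251),
    "fine-tuned, v1 settings": (336, 273),
    "fine-tuned, v2 settings": (364, 286),
}

def accuracy_with_ci(correct, n=N, z=1.96):
    """Accuracy and the half-width of a normal-approximation confidence interval."""
    p = correct / n
    return p, z * math.sqrt(p * (1 - p) / n)

for name, (train_correct, val_correct) in results.items():
    train_acc, _ = accuracy_with_ci(train_correct)
    val_acc, half_width = accuracy_with_ci(val_correct)
    print(f"{name:28s} train {train_acc:.1%}  validation {val_acc:.1%} ± {half_width:.1%}")
```

Validation accuracy goes from 50.2% to 54.6% to 57.2%, each with a half-width of roughly 4.4 points, so the 2.6-point gap between the two fine-tuned runs is within noise, while the 7-point gain over the base model is more likely real. The train/validation gap also widens (10.4, 12.6, then 15.6 points), which may hint at overfitting with the v2 settings. Because all three models are scored on the same examples, a paired comparison would give a sharper answer than these independent intervals.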


In [113]:
!firectl undeploy {model_v4_id}


In [114]:
!firectl undeploy {model_v5_id}


In [115]:
!firectl list models


NAME                              CREATE TIME          KIND           CHAT  PUBLIC  STATE        STATUS MESSAGE
895e816bec3c4ed183cc562a1fb0d92e  2024-06-27 21:52:06  HF_PEFT_ADDON  true  false   UNDEPLOYED   
bfa866cd4b4841669796a0deee36d771  2024-06-27 21:36:41  HF_PEFT_ADDON  true  false   UNDEPLOYING  
c2c7013101774cd19f0a18cc2f109a29  2024-06-27 21:08:19  HF_PEFT_ADDON  true  false   UNDEPLOYING  
f863ef7eda2c46528b30e494548a4249  2024-06-27 21:33:52  HF_PEFT_ADDON  true  false   UNDEPLOYED   

Total size: 4


In [118]:
!firectl list models


NAME                              CREATE TIME          KIND           CHAT  PUBLIC  STATE       STATUS MESSAGE
bfa866cd4b4841669796a0deee36d771  2024-06-27 21:36:41  HF_PEFT_ADDON  true  false   UNDEPLOYED  
c2c7013101774cd19f0a18cc2f109a29  2024-06-27 21:08:19  HF_PEFT_ADDON  true  false   UNDEPLOYED  

Total size: 2
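Deployed models are easy to forget about, so one pattern is to wrap each evaluation in a context manager that always undeploys afterwards, even if the evaluation raises. A sketch with the deploy and undeploy steps passed in as functions (in this notebook they would shell out to `firectl deploy` and `firectl undeploy`); `deployed` is a hypothetical helper name:

```python
from contextlib import contextmanager

@contextmanager
def deployed(model_id, deploy, undeploy):
    """Deploy model_id on entry and always undeploy it on exit, even if the body raises."""
    deploy(model_id)
    try:
        yield model_id
    finally:
        undeploy(model_id)

# Demo with stand-in functions that just record the calls
log = []
try:
    with deployed("c2c7013101774cd19f0a18cc2f109a29",
                  lambda m: log.append(("deploy", m)),
                  lambda m: log.append(("undeploy", m))):
        raise RuntimeError("evaluation failed")
except RuntimeError:
    pass
print([action for action, _ in log])  # ['deploy', 'undeploy']
```

Deployment is asynchronous (a model shows `DEPLOYING` before `DEPLOYED` in the listings above), so a real `deploy` function would also need to wait until the model is ready before evaluation starts.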
