## Setup

To complete this guide, you will need to install the following packages:
- fireworks-ai
- pandas
- requests

You will also need:

- A Fireworks account (https://fireworks.ai/)
- A Fireworks API key
- The firectl command-line interface (https://docs.fireworks.ai/tools-sdks/firectl/firectl)
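
Before running the cells below, it can help to confirm that these prerequisites are in place. The following is a minimal sketch, assuming the Fireworks client reads the `FIREWORKS_API_KEY` environment variable (as a later cell's comment says) and that `firectl` is installed somewhere on your `PATH`. The helper name `check_prerequisites` is hypothetical, not part of any Fireworks tool.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper: it only looks for the API key variable and a
    firectl executable on PATH. It does not validate the key itself.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("FIREWORKS_API_KEY"):
        problems.append("FIREWORKS_API_KEY is not set")
    if which("firectl") is None:
        problems.append("firectl was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching your real environment.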

In [1]:
!pip install pandas requests fireworks-ai --quiet


In [2]:
import json
import os
import time

from fireworks.client import Fireworks
import pandas as pd
import requests

In [3]:
# Sign-in to your Fireworks account
!firectl signin

2024/11/18 17:40:06 There are updates available.
Current version: 1.2.0
Latest version: 1.4.2

To upgrade to the latest version, run
  $ sudo firectl upgrade

Signed in as: sdkramer10@gmail.com
Account ID: sdkramer10-5e98cb


In [4]:
# Make sure you have the FIREWORKS_API_KEY environment variable set to your account's key!
# os.environ['FIREWORKS_API_KEY'] = 'XXX'

client = Fireworks()

# Replace the line below with your Fireworks account id
account_id = 'XXX'

In [9]:
system_msg = "You are an intelligent Solana chat support agent. Use your own knowledge to answer each question with example code or useful links. Avoid any sales-speak and avoid explaining acronyms. When providing code examples, specify what language is used, e.g. ```javascript or ```rust \n\n"
user_msg = "Solana Program Upgrade Best Practices\n\nAre there any rules / best practices when upgrading programs in Solana? Coming from the Ethereum development space, it's really important in the EVM to make sure existing storage isn't reallocated to new variables as a best practice (of course there are edge cases with bitpacking etc), but rather to continue to add new variables.\r\n\r\nI am curious if there are similar restrictions / edge cases to watch out for when thinking about upgrading programs in Solana?"
assistant_msg = "As long as a solana progamm is not frozen you can update it any time. If it becomes bigger after the upgrade you need to check if you payed enough rent for the new program size. \r\n\r\nAs of now all programs deployed will be deployed with double the size needed. This will change soon since it is now possible to extend the size of programs. \r\n\r\nExtend program was activated on mainnet in epoch 532:\r\nhttps://github.com/solana-labs/solana/issues/26385\r\n\r\nFor accounts you can also allocate more data and then use the bytes after you extended the accounts. \r\nTo increase an account size using anchor you can for example do it like this: \r\n\r\n #[derive(Accounts)]\r\n #[instruction(len: u16)]\r\n pub struct IncreaseZeroCopy<'info> {\r\n #[account(mut, \r\n realloc = len as usize, \r\n realloc::zero = true, \r\n realloc::payer=signer)]\r\n pub data_holder: AccountLoader<'info, DataHolder>,\r\n #[account(mut)]\r\n pub signer: Signer<'info>,\r\n #[account(address = system_program::ID)]\r\n pub system_program: Program<'info, System>,\r\n }\r\n\r\nIn a progamm you can also use: \r\n\r\n pda_account.realloc(new_size, false)?;\r\n\r\nIn this repo you can see some more info about how to create big accounts and realloc: \r\nhttps://github.com/solana-developers/anchor-zero-copy-example\r\n\r\nThe max size for accounts created via CPI (for example anchor init) is 10K max account size in general 10mb. \r\n\r\nComplicated it becomes when you need to migrate data. Then you probably want to have a migration instruction in your program and V1 and V2 accounts or smth like that. \r\n\r\n\n"

training_example = {
    "system": system_msg,
    "user": user_msg,
    "assistant": assistant_msg
}

In [12]:
# Converts the training examples to the chat format expected by Fireworks:
# one {"messages": [...]} object per example, with system, user, and assistant turns.
def training_examples_to_json(examples):
    json_objs = []
    for example in examples:
        msg = {"messages": [
            {"role": "system", "content": example['system']},
            {"role": "user", "content": example['user']},
            {"role": "assistant", "content": example['assistant']},
        ]}
        json_objs.append(msg)
    return json_objs

training_json = training_examples_to_json([training_example])
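
The conversion above produces one `{"messages": [...]}` object per example. A quick structural check before writing the file can catch mistakes early. This is a minimal sketch: the allowed roles are taken from the `message_roles` list in the chat template that `firectl` prints further down (SYSTEM, USER, ASSISTANT), the rule that the last turn comes from the assistant is an assumption about supervised fine-tuning data, and `validate_record` is a hypothetical helper rather than part of the Fireworks SDK.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Raise ValueError if a training record does not look like {"messages": [...]}."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("record must contain a non-empty 'messages' list")
    for i, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has unexpected role {role!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {i} is missing string 'content'")
    # Assumption: the final turn is the assistant answer the model should learn.
    if messages[-1]["role"] != "assistant":
        raise ValueError("last message should come from the assistant")


# In the notebook: for record in training_json: validate_record(record)
```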

In [13]:
# Writes the data to a file so that it can be uploaded to Fireworks
dataset_file_name = 'fine-tune-examples.jsonl'
dataset_id = 'solana-stackexchange-07-20'

with open(dataset_file_name, 'w') as f:
    for obj in training_json:
        json.dump(obj, f)
        f.write('\n')
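
Because the upload expects JSON Lines (one JSON object per line), it can be worth reading the file back before uploading it. The following is a minimal sketch using only the standard library. `load_jsonl` is a hypothetical helper name.

```python
import json


def load_jsonl(path):
    """Parse a JSON Lines file into a list of objects, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
    return records


# In the notebook: assert len(load_jsonl(dataset_file_name)) == len(training_json)
```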

In [14]:
# If you haven't already, install the firectl CLI by following the instructions here: https://readme.fireworks.ai/docs/fine-tuning-models#installing-firectl
# Then run this command to upload the dataset file to Fireworks
!firectl create dataset {dataset_id} {dataset_file_name}

2.45 KiB / 2.45 KiB [-----------------------------] 100.00% 120.54 KiB p/s 200ms
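
The `!firectl ...` lines rely on IPython's shell escape, so they only work inside a notebook. If you move these steps into a plain Python script, one option is to call the same CLI through `subprocess`. This is a sketch, assuming `firectl` is on your `PATH`; it passes exactly the arguments used in the cell above and adds no new flags. `run_firectl` is a hypothetical helper.

```python
import subprocess


def run_firectl(*args, runner=subprocess.run):
    """Run `firectl <args>` and return its stdout.

    Raises subprocess.CalledProcessError if firectl exits with a non-zero status.
    """
    command = ["firectl", *args]
    result = runner(command, check=True, capture_output=True, text=True)
    return result.stdout


# Equivalent of: !firectl create dataset {dataset_id} {dataset_file_name}
# output = run_firectl("create", "dataset", dataset_id, dataset_file_name)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with values such as the job display name.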


In [15]:
# Creates a fine-tuning job using the settings in fireworks-fine-tune-config.yaml
!firectl create fine-tuning-job --settings-file fireworks-fine-tune-config.yaml --display-name "Solana stack exchange fine tune Job test" --dataset {dataset_id} 

Name: accounts/sdkramer10-5e98cb/fineTuningJobs/1ec7d39444864d78b60376c07141c4e3
Display Name: Solana stack exchange fine tune Job test
Create Time: 2024-11-18 17:55:49
State: CREATING
Dataset: accounts/sdkramer10-5e98cb/datasets/solana-stackexchange-07-20
Status: OK
Created By: sdkramer10@gmail.com
Container Version: 
Model Id: 
Wandb Url: 
Conversation:
  Jinja Template: 
{%- set _mode = mode | default('generate', true) -%}
{%- set stop_token = '<|eot_id|>' -%}
{%- set message_roles = ['SYSTEM', 'USER', 'ASSISTANT'] -%}
{%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
{%- for message in ns.messages -%}
    {%- if not message.get('role') -%}
        {{ raise_exception('Key [role] is missing. Original input: ' +  message|tojson) }}
    {%- endif -%}
    {%- if me

In [16]:
# NOTE: This ID will be different when you run the fine-tuning job on your account!
# Copy it from the output of the cell above, which prints it as Name: accounts/{account_id}/fineTuningJobs/{model_id}
model_id = '1ec7d39444864d78b60376c07141c4e3'
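
Rather than copying the ID by hand, you could parse it from the `Name:` line that `firectl create fine-tuning-job` prints, which has the form `accounts/{account_id}/fineTuningJobs/{job_id}`. The following is a minimal sketch, assuming that output format stays as shown above. `parse_job_id` is a hypothetical helper.

```python
import re

# Matches lines such as "Name: accounts/<account>/fineTuningJobs/<job_id>".
# "Display Name:" lines are not matched because the pattern is anchored at line start.
_JOB_NAME_PATTERN = re.compile(
    r"^Name:\s*accounts/(?P<account>[^/\s]+)/fineTuningJobs/(?P<job_id>\S+)\s*$",
    re.MULTILINE,
)


def parse_job_id(firectl_output):
    """Return the fine-tuning job ID from firectl output, or None if no Name line matches."""
    match = _JOB_NAME_PATTERN.search(firectl_output)
    return match.group("job_id") if match else None
```

If you capture the command's output in IPython with `out = !firectl create fine-tuning-job ...`, `out` is a list of lines, so pass `"\n".join(out)` to the parser.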

In [20]:
# Wait until the job's State is listed as COMPLETED (~10-20 minutes)
!firectl get fine-tuning-job {model_id}

Name: accounts/sdkramer10-5e98cb/fineTuningJobs/1ec7d39444864d78b60376c07141c4e3
Display Name: Solana stack exchange fine tune Job test
Create Time: 2024-11-18 17:55:49
State: COMPLETED
Dataset: accounts/sdkramer10-5e98cb/datasets/solana-stackexchange-07-20
Status: OK
Created By: sdkramer10@gmail.com
Container Version: 
Model Id: 1ec7d39444864d78b60376c07141c4e3
Wandb Url: 
Conversation:
  Jinja Template: 
{%- set _mode = mode | default('generate', true) -%}
{%- set stop_token = '<|eot_id|>' -%}
{%- set message_roles = ['SYSTEM', 'USER', 'ASSISTANT'] -%}
{%- set ns = namespace(initial_system_message_handled=false, last_assistant_index_for_eos=-1, messages=messages) -%}
{%- for message in ns.messages -%}
    {%- if not message.get('role') -%}
        {{ raise_exception('Key [role] is missing. Original input: ' +  message|tojson) }
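
Instead of re-running the status cell by hand, you could poll the job until its `State` becomes `COMPLETED`. The sketch below reads the `State:` line from the `firectl get fine-tuning-job` output shown above. The failure state names in `TERMINAL_FAILURES`, the 60-second interval, the one-hour timeout, and all helper names are assumptions, not documented values. The fetch and sleep functions are injected so the loop can be exercised without a Fireworks account.

```python
import re
import subprocess
import time

_STATE_PATTERN = re.compile(r"^State:\s*(\S+)", re.MULTILINE)
# Assumed failure state names; check the firectl docs for the authoritative list.
TERMINAL_FAILURES = {"FAILED", "CANCELLED"}


def parse_state(firectl_output):
    """Return the value of the first 'State:' line, or None if there is none."""
    match = _STATE_PATTERN.search(firectl_output)
    return match.group(1) if match else None


def get_job_output(job_id):
    """Run `firectl get fine-tuning-job <job_id>` and return its stdout."""
    result = subprocess.run(
        ["firectl", "get", "fine-tuning-job", job_id],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def wait_for_job(job_id, fetch=get_job_output, interval_s=60, timeout_s=3600, sleep=time.sleep):
    """Poll until the job reports COMPLETED; raise on an assumed failure state or timeout."""
    waited = 0
    while True:
        state = parse_state(fetch(job_id))
        if state == "COMPLETED":
            return state
        if state in TERMINAL_FAILURES:
            raise RuntimeError(f"job {job_id} ended in state {state}")
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still {state} after {waited} seconds")
        sleep(interval_s)
        waited += interval_s


# In the notebook: wait_for_job(model_id)
```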