In [347]:
account_id = "ioannidu-0dd70b"
api_key="fw_3ZcAXU8JWqvW1WNhSPEHJBa7"

# Fine Tunning Models using APIs

We will present an example of fine tunning a model using [APIs](https://docs.fireworks.ai/api-reference/introduction). Following API calls can be used for setting up automated fine tuning and inference. A similar example is presented using `firect` in [Fine-tuning models Documentation](https://docs.fireworks.ai/fine-tuning/fine-tuning-models) for a more interactive approach.  

In this noteboook we will show how to:

- Prepare a dataset.
- Initiate and run a tuning job given the Dataset prepared. We present a text completion example.
- Deploy fine tuned model.
- Use fine tuned model for inference.
- How to troubleshoot in case of errors.
- Clean up all resources produced.

## General Setup

Please instantiate variables `account_id` and `api_key` that match your credentials.


In [None]:
account_id = "<ACCOUNT_ID>"
api_key = "<API_KEY>"

In [348]:
import requests
import json
import os
import time

## Prepare Dataset

In this Notebook we will use a sample Dataset consisting of the following information:


In [349]:
"""
{"instruction": "When did Virgin Australia start operating?", "context": "Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney.", "response": "Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.", "category": "closed_qa"}
{"instruction": "Which is a species of fish? Tope or Rope", "context": "", "response": "Tope", "category": "classification"}
{"instruction": "Why can camels survive for long without water?", "context": "", "response": "Camels use the fat in their humps to keep them filled with energy and hydration for long periods of time.", "category": "open_qa"}
{"instruction": "Alice's parents have three daughters: Amy, Jessy, and what\u2019s the name of the third daughter?", "context": "", "response": "The name of the third daughter is Alice", "category": "open_qa"}
{"instruction": "When was Tomoaki Komorida born?", "context": "Komorida was born in Kumamoto Prefecture on July 10, 1981. After graduating from high school, he joined the J1 League club Avispa Fukuoka in 2000. Although he debuted as a midfielder in 2001, he did not play much and the club was relegated to the J2 League at the end of the 2001 season. In 2002, he moved to the J2 club Oita Trinita. He became a regular player as a defensive midfielder and the club won the championship in 2002 and was promoted in 2003. He played many matches until 2005. In September 2005, he moved to the J2 club Montedio Yamagata. In 2006, he moved to the J2 club Vissel Kobe. Although he became a regular player as a defensive midfielder, his gradually was played less during the summer. In 2007, he moved to the Japan Football League club Rosso Kumamoto (later Roasso Kumamoto) based in his local region. He played as a regular player and the club was promoted to J2 in 2008. Although he did not play as much, he still played in many matches. In 2010, he moved to Indonesia and joined Persela Lamongan. In July 2010, he returned to Japan and joined the J2 club Giravanz Kitakyushu. He played often as a defensive midfielder and center back until 2012 when he retired.", "response": "Tomoaki Komorida was born on July 10,1981.", "category": "closed_qa"}
""";

The Dataset above is stored in a file called `sampleDataset.jsonl`. 


In [350]:
file_path = "./sampleDataset.jsonl"  
file_name = "sampleDataset.jsonl"
file_size_in_bytes = os.stat(file_path).st_size
print('Size of file in', file_path, '=',  file_size_in_bytes, 'bytes')

Size of file in ./sampleDataset.jsonl = 2633 bytes


To prepare our Dataset for fine-tuning:

- Create a Dataset record.
- Upload the data from the file to the Dataset record.
- Validate the upload. Note that this is a necessary step to complete the process.


### Create Dataset Record

If `datasetId` is not provided a random id wll be set up by the system.  For conveninence, it is best to provide an `id` for the dataset.

In [351]:
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/datasets"

dataset_id =  "my-sample-dataset"

headers = {
    "Authorization": "Bearer " + api_key,
    "Content-Type": "application/json"
}

payload = {
    "dataset": {
        "displayName": "mySampleDataset",
        "format": "COMPLETION",
        "exampleCount": "5"
    },
    "datasetId": dataset_id
}

response = requests.request("POST", url, json=payload, headers=headers)


In [354]:
# The request to create a Dataset should change the state of the Dataset to "UPLOADING".

# Get state
dataset_create_dict = json.loads(response.text)
state = dataset_create_dict["state"]
print("Current state of Dataset Create = ", state)
print(json.dumps(dataset_create_dict, indent=4))

# Wait until state is "READY"  
headers = {"Authorization": "Bearer " + api_key}
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}"
response = requests.request("GET", url, headers=headers)
dataset = json.loads(response.text)
state = dataset["state"]
# In the following loop we will wait for dataset create to be "READY".
# We could optinally add a time out for the case of the state being stuck at "UPLOADING" state.
#while state != "READY":
#    # Update state of the dataset
#    time.sleep(0.1)
#    response = requests.request("GET", url, headers=headers)
#    dataset = json.loads(response.text)
#    state = dataset["state"]

#print("Dataset creation terminated with final state:", state)


Current state of Dataset Create =  UPLOADING
{
    "createTime": "2024-10-28T04:51:26.185571Z",
    "displayName": "mySampleDataset",
    "exampleCount": "5",
    "format": "COMPLETION",
    "name": "accounts/ioannidu-0dd70b/datasets/my-sample-dataset",
    "state": "UPLOADING",
    "status": {
        "code": "OK",
        "message": ""
    },
    "userUploaded": {}
}
Dataset creation terminated with final state: UPLOADING


### Upload Dataset

First, upload the Dataset created. This step will create a signed ulr to later upload the data fromm local file.

In [355]:
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}:getUploadEndpoint"

headers = {
    "Authorization": "Bearer " + api_key,
    "Content-Type": "application/json"
}
payload = {"filenameToSize": {"sampleDataset.jsonl": file_size_in_bytes}}

response_dataset_create = requests.request("POST", url, json=payload, headers=headers)
dataset_ulopad_dict = json.loads(response_dataset_create.text)


In [356]:
signed_url = dataset_ulopad_dict["filenameToSignedUrls"][file_name]
print("Signed url:", signed_url)

Signed url: https://storage.googleapis.com/fireworks-artifacts-ioannidu-0dd70b-44c3f6/dataset/my-sample-dataset-be5f28/sampleDataset.jsonl?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=fireworks-control-plane%40fw-ai-cp-prod.iam.gserviceaccount.com%2F20241028%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20241028T051225Z&X-Goog-Expires=21599&X-Goog-Signature=632c2367b62bd3e07d44a0e624b49500839f28b53237877341cab0aa18095197110830c7c9388d373879496117ff320188d37f259bd7358f9863bf7c0c995079915973e88d51139415c6011d6ad32209866675b3c5350ab336383bb7e0ebe5adc928efc600418e7a2beed8ae5a5162bfa1924a79e9eb414710cf45ac0f8c826d7527d3e21dd8408f7efb3326d14e2230cb3c9dbfa5c0ccfd9db89773ede5ba9cdff2a5a2d25af433b4af644f6fc6280dffda071059d0ea67e82704629c899877a241a2cdf4189ec00bcd40135114bdb00f7d8d758dddf59af6b453a816275e446b706b91313d8ece9782e96f21ca6c5a6bd3c8239a52a5d791d7a14c68c56794&X-Goog-SignedHeaders=content-type%3Bhost%3Bx-goog-content-length-range


Next, upload data from local file into the signed url provided.

In [373]:
headers = {
    "x-goog-content-length-range": f"{file_size_in_bytes}, {file_size_in_bytes}",
    "Content-Type": "application/octet-stream"
}

with open(file_path, 'rb') as file:
    data = file.read()

response_file = requests.request("PUT", signed_url, data=data, headers=headers)


In [374]:
# Check file upload
if response_file.status_code == 200:
    print("File upload was successful!")
else:
    print("File upload failed:", response_file.status_code, response_file.text)
    

File upload failed: 403 <?xml version='1.0' encoding='UTF-8'?><Error><Code>SignatureDoesNotMatch</Code><Message>Access denied.</Message><Details>The request signature we calculated does not match the signature you provided. Check your Google secret key and signing method.</Details><StringToSign>GOOG4-RSA-SHA256
20241028T051225Z
20241028/auto/storage/goog4_request
756ae1df7812b89c7d4c850892500019da1c619f38ebde7e5062e55bfed1ef79</StringToSign><CanonicalRequest>PUT
/fireworks-artifacts-ioannidu-0dd70b-44c3f6/dataset/my-sample-dataset-be5f28/sampleDataset.jsonl
X-Goog-Algorithm=GOOG4-RSA-SHA256&amp;X-Goog-Credential=fireworks-control-plane%40fw-ai-cp-prod.iam.gserviceaccount.com%2F20241028%2Fauto%2Fstorage%2Fgoog4_request&amp;X-Goog-Date=20241028T051225Z&amp;X-Goog-Expires=21599&amp;X-Goog-SignedHeaders=content-type%3Bhost%3Bx-goog-content-length-range
content-type:application/octet-stream
host:storage.googleapis.com
x-goog-content-length-range:2633, 2633

content-type;host;x-goog-content-

### Validate Dataset Upload

This is a necessary step to complete the process.

In [47]:
headers = {
    "Authorization": "Bearer "+api_key,
    "Content-Type": "application/json"
}

url = f"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}:validateUpload"

payload = {}

response_dataset_validate_upload = requests.request("POST", url, json=payload, headers=headers)

In [None]:
# Check that response is {}
dataset_validate_upload_dict = json.loads(response_dataset_validate_upload.text)
print("Response should be {}. Response =", dataset_validate_upload_dict)

# Check state of the dataset and ensure that its state is "READY"
# Get Dataset
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}"
headers = {"Authorization": "Bearer "+api_key}
response = requests.request("GET", url, headers=headers)
# Get state
dataset_dict = json.loads(response.text)
state = dataset_dict["state"]
print("Current state of Dataset Upload = ", state)

print(json.dumps(dataset_create_dict, indent=4))



## Prepare Fine-tuning Job

Using the Dataset create and uploaded next we will create a fine tuning job to train a model. In this example will use as base model `llama-v3p1-8b-instruct` to which the fine-tuning job will add upon given our Dataset.

### Create a Fine-tuning Job

As a result of a successful creation of a fine-tining job, a (fine tuned) trained model will be automatically created. It is recommended to provide a `modelId` during the creation of the fine tuning job, otherwise a random one will be provided for the model created.  This example is a text Completion case and we will train based on `context` and `instruction` provided by the Dataset.   

In [195]:
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs"

model_id = "my-model-id"

payload = {
    "displayName": "mySampleDatasetFinetuningJob",
    "dataset": f"{dataset_id}",
    "modelId": model_id,
    "textCompletion": {
        # How the fields of the JSON dataset should be formatted into the input text.
        "inputTemplate": "### GIVEN THE CONTEXT: {context}  ### INSTRUCTION: {instruction}  ### RESPONSE IS: ",
        # How the fields of the JSON dataset should be formatted into the output text.
        "outputTemplate": "ANSWER: {response}"
    },
    "baseModel": "accounts/fireworks/models/llama-v3p1-8b-instruct",
}

response_finetuning_job = requests.request("POST", url, json=payload, headers=headers)


Upon creation of a fine tunning job its state will be set to "CREATED". A successfull run of the job will cause its state to change to "PENDING" (meaning waiting for resource allocation), "RUNNING", and "COMPLETED".  If the job fails to run its state will change to "FAILED".


In [None]:
# To wait for the job to complete first extract the job's id from its name
headers = {"Authorization": "Bearer "+api_key}

url = f"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs"
response = requests.request("GET", url, headers=headers)
# Get all fine tuning jobs
finetuning_jobs = json.loads(response.text)

# In this particular case we are getting the first of the jobs in the list as we only created a single job.
# In an automated sustem we could extract the index of the job in the list of jobs based on name or a given id.
# Note currently the APIs don't support providing an id for a fine tuned job hence the workaround to find the randomly assigned id.

# The last part of the name of the job is it's randomly assigned id.
print("Fine tuning job's full name =", finetuning_jobs['fineTuningJobs'][0]['name'])
fine_tuning_job_id = finetuning_jobs['fineTuningJobs'][0]['name'].split("/")[-1]
print("Fine tuning job's id =", fine_tuning_job_id)

# Given the job's id now we can wait for the job to complete or fail
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs/{fine_tuning_job_id}"
response = requests.request("GET", url, headers=headers)
fine_tuning_job = json.loads(response.text)
state = fine_tuning_job["state"]
# In the following loop we will wait for the job to either complete and fail.
# We could optinally add a time out for the case of the state being stuck at PENDING or RUNNING state.
while state != "COMPLETED" and state != "FAILED":
    # Update state of the finetuning
    time.sleep(0.1)
    response = requests.request("GET", url, headers=headers)
    fine_tuning_job = json.loads(response.text)
    state = fine_tuning_job["state"]

print("Job run with final state:", state)
#print(json.dumps(fine_tuning_job, indent=4))

## Deploy and Use Fine-tuned Model for Inference

### Deploy Model

Deploying the model is a necessary step before it can be used for inference. 

For deploying an on-demand model  please refer to [Deployement APIs](https://docs.fireworks.ai/api-reference/create-deployment). In our case, the base model used is serverless and to upload it we would need to call:
                        
```firectl deploy my-model-id```  
 
 Deploying a serverless model (with addons) will not create a `Deployment` but instead a `deployed_model` that can be verified by calling: 
 
 ```firect list deployed-models```

Note: `API`s for deploying serverless fine tuned models that will be listed as `deployed-models` will be added shortly. For now please use the `firectl` command above.

### Inference

Setting the prompt to the instructions from the Dataset we trained with, will provide the expected response.

The trained model can be identified in a number of ways as explained in the [document on model identifiers](https://docs.fireworks.ai/models/deploying#model-identifier).


To extract the deployed model id that is needed to identify the newly trained model, please use the following command:

`firectl list deployed-models`

Wait until the state of the deployed model is "DEPLOYED".  The name of the deployed model will be used for inference.

Note: An API will be added shortly for this functionality.



In [198]:

deployed_model_id = "my-model-id-d3276e03" # Updated manually for now, will be updated after API is added.

#kleoniioannidou@Kleonis-MacBook-Pro ~ % firectl get deployed-model my-model-id-d3276e03 
#Name: accounts/ioannidu-0dd70b/deployedModels/my-model-id-d3276e03
#Create Time: 2024-10-27 18:54:15
#Created By: ioannidu@fireworks.ai
#Model: accounts/ioannidu-0dd70b/models/my-model-id
#Deployment: accounts/fireworks/deployments/ee744c5f
#Default: true
#State: DEPLOYED
#Serverless: true
#Status: OK



In [199]:
model = f'accounts/{account_id}/deployedModels/{deployed_model_id}'

url = "https://api.fireworks.ai/inference/v1/completions"

headers = {
    "Authorization": "Bearer " + api_key,
    "Content-Type": "application/json"
}

#### Examples of text completion using our trained model

In [None]:
instruction = "Which is a species of fish? Tope or Rope"
context = ""

payload = {
    "model": model,
    "prompt": f"### GIVEN THE CONTEXT: {context}  ### INSTRUCTION: {instruction}  ### RESPONSE IS: ",
    "max_tokens": 300,
    "temperature": 0,
}

response = requests.request("POST", url, json=payload, headers=headers)

output = json.loads(response.text)

#print(response.choices[0].message.content)
print(output["choices"][0]["text"])

In [None]:
instruction = "Why can camels survive for long without water?"
context = ""


payload = {
    "model": model,
    "prompt": f"### GIVEN THE CONTEXT: {context}  ### INSTRUCTION: {instruction}  ### RESPONSE IS: ",
    "max_tokens": 300,
    "temperature": 0,
    #"context_length_exceeded_behavior": "truncate",
}

response = requests.request("POST", url, json=payload, headers=headers)

output = json.loads(response.text)

print(output['choices'][0]['text'])
#print(response.choices[0].message.content)
#print(output['choices'][0]['text'])

# Troubleshooting

Any of the resources created above can be listed with the APIs shown below.  For example, you could list all datasets uploaded and from the list choose any particular dataset to check its state.

In [None]:
headers = {"Authorization": "Bearer " + api_key}

# List Datasets
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/datasets"
response = requests.request("GET", url, headers=headers)
datasets = json.loads(response.text)
print(json.dumps(datasets, indent=4))

# List fine tuning jobs
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs"
response = requests.request("GET", url, headers=headers)
finetuning_jobs = json.loads(response.text)
print(json.dumps(finetuning_jobs, indent=4))

# List Models
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/models"
response = requests.request("GET", url, headers=headers)
models = json.loads(response.text)
print(json.dumps(models, indent=4))

# List Deployed Models
# API will be added shortly


For each of the resources of the listed provided above you can access it state and status fields to check for potential errors. 

In [None]:
# Print state of model of the first model in the list
print("State = ", json.dumps(models["models"][0]["state"], indent=4))

# Print status of the first model in the list
print("Status = ", json.dumps(models["models"][0]["status"], indent=4))

# Clean up

You can delete all resources you have created. The order of deletion does not matter with the exception of deployed modelts that first need to be undeployed before they get deleted.

In [None]:
headers = {"Authorization": "Bearer " + api_key}

# Delete Dataset
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/datasets/{dataset_id}"
response = requests.request("DELETE", url, headers=headers)
print(response.text)

# Delete Fine-Tunning Job
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/fineTuningJobs/{fine_tuning_job_id}"
response = requests.request("DELETE", url, headers=headers)
print(response.text)


# Delete Model
# First, undeploy the model
# Currently we don't have an API for this so please use the following command:  `firectl undeploy my-model-id`

# State of deploying model will change to "UNDEPLOYING" until the process is completed.

# Second, delete the undeployed model
url = f"https://api.fireworks.ai/v1/accounts/{account_id}/models/{model_id}"
response = requests.request("DELETE", url, headers=headers)
print(response.text)


In [None]:
signed_url