# Fine-tune LLaMA 2 models on SageMaker JumpStart

In [None]:
%pip install -U sagemaker==2.202.1 datasets==2.15.0

## Deploy Pre-trained Model

---

First, we deploy the Llama-2 model as a SageMaker endpoint.

---

In [3]:
model_id, model_version = "meta-textgeneration-llama-2-7b", "2.*"

In [22]:
from sagemaker.jumpstart.model import JumpStartModel

pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)
pretrained_predictor = pretrained_model.deploy()

No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker:Creating model with name: meta-textgeneration-llama-2-7b-2024-01-01-22-18-03-262
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-2-7b-2024-01-01-22-18-03-340
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-2-7b-2024-01-01-22-18-03-340


--------------!

## Dataset preparation for fine-tuning

---

You can fine-tune on a dataset in either the domain adaptation format or the instruction tuning format. For details, see the section [Dataset instruction](#Dataset-instruction). In this demo, we use a subset of the [Dolly dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) in the instruction tuning format. The Dolly dataset contains roughly 15,000 instruction-following records across categories such as question answering, summarization, and information extraction. It is available under the Apache 2.0 license. We select the summarization examples for fine-tuning.


Training data is stored in JSON Lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, but it can be split across multiple .jsonl files. The training folder can also contain a template.json file describing the input and output formats.
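
As a minimal sketch of the layout described above, the following writes two made-up records to a `.jsonl` file and reads them back. The keys `instruction`, `context`, and `response` follow the Dolly records used in this notebook; the record contents, the file name `example.jsonl`, and the helper names `write_jsonl` and `read_jsonl` are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records that reuse the Dolly field names.
records = [
    {
        "instruction": "Summarize the text.",
        "context": "Oil is hydrophobic and does not mix with water.",
        "response": "Oil repels water.",
    },
    {
        "instruction": "What is S3?",
        "context": "Amazon S3 is an object storage service.",
        "response": "An object storage service.",
    },
]


def write_jsonl(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.jsonl"
    write_jsonl(path, records)
    loaded = read_jsonl(path)
    line_count = len(path.read_text(encoding="utf-8").splitlines())

print(line_count, loaded[0]["instruction"])
```

Each sample occupies exactly one line, so a file can be split or concatenated at line boundaries without re-parsing the whole dataset.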

To train your model on a collection of unstructured text files, see the section [Example fine-tuning with Domain-Adaptation dataset format](#Example-fine-tuning-with-Domain-Adaptation-dataset-format) in the Appendix.

---

In [5]:
from datasets import load_dataset

dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

# To train for question answering or information extraction, replace the filter condition on the next line with
# example["category"] == "closed_qa" or example["category"] == "information_extraction".
summarization_dataset = dolly_dataset.filter(lambda example: example["category"] == "summarization")
summarization_dataset = summarization_dataset.remove_columns("category")

# We split the dataset into two where test data is used to evaluate at the end.
train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)
train_and_test_dataset["test"][0]

{'instruction': 'Please describe  what is oil and give me a list of it’s applications.',
 'context': 'An oil is any nonpolar chemical substance that is composed primarily of hydrocarbons and is hydrophobic (does not mix with water) & lipophilic (mixes with other oils). Oils are usually flammable and surface active. Most oils are unsaturated lipids that are liquid at room temperature.\n\nThe general definition of oil includes classes of chemical compounds that may be otherwise unrelated in structure, properties, and uses. Oils may be animal, vegetable, or petrochemical in origin, and may be volatile or non-volatile. They are used for food (e.g., olive oil), fuel (e.g., heating oil), medical purposes (e.g., mineral oil), lubrication (e.g. motor oil), and the manufacture of many types of paints, plastics, and other materials. Specially prepared oils are used in some religious ceremonies and rituals as purifying agents.',
 'response': 'An oil is a chemical substance that is composed primar

## Invoke the endpoint

---
Next, we invoke the endpoint with some sample queries. Later in this notebook, we fine-tune this model on a custom dataset and run inference with the fine-tuned model. We also compare the results from the pre-trained and fine-tuned models.

---

In [6]:
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")

In [7]:
test_dataset = train_and_test_dataset["test"]

inputs, ground_truth_responses, responses_before_finetuning, responses_after_finetuning = (
    [],
    [],
    [],
    [],
)

def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    # Build the prompt as a plain string (a trailing comma here would turn it into a one-element tuple).
    prompt = f'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{datapoint["instruction"]}\n\n### Input:\n{datapoint["context"]}\n\n'

    payload = {
        "inputs": prompt + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 100},
    }

    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=true"
    )

    print_response(payload, pretrained_response)


for datapoint in test_dataset.select(range(5)):
    predict_and_print(datapoint)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Please describe  what is oil and give me a list of it’s applications.

### Input:
An oil is any nonpolar chemical substance that is composed primarily of hydrocarbons and is hydrophobic (does not mix with water) & lipophilic (mixes with other oils). Oils are usually flammable and surface active. Most oils are unsaturated lipids that are liquid at room temperature.

The general definition of oil includes classes of chemical compounds that may be otherwise unrelated in structure, properties, and uses. Oils may be animal, vegetable, or petrochemical in origin, and may be volatile or non-volatile. They are used for food (e.g., olive oil), fuel (e.g., heating oil), medical purposes (e.g., mineral oil), lubrication (e.g. motor oil), and the manufacture of many types of paints, plastics, and other materials. Specially p

### Upload dataset to S3
---

We upload the prepared dataset to S3, where the fine-tuning job will read it.

---

In [8]:
train_and_test_dataset["train"][0]

{'instruction': 'what was the british empire',
 'context': 'The British Empire was composed of the dominions, colonies, protectorates, mandates, and other territories ruled or administered by the United Kingdom and its predecessor states.',
 'response': 'The British Empire was composed of the dominions, colonies, protectorates, mandates, and other territories ruled or administered by the United Kingdom and its predecessor states. It began with the overseas possessions and trading posts established by England between the late 16th and early 18th centuries. At its height it was the largest empire in history and, for over a century, was the foremost global power. By 1913, the British Empire held sway over 412 million people, 23 per cent of the world population at the time, and by 1920, it covered 35.5 million km2 (13.7 million sq mi), 24 per cent of the Earth\'s total land area. As a result, its constitutional, legal, linguistic, and cultural legacy is widespread. At the peak of its power

In [9]:
# Dumping the training data to a local file to be used for training.
local_data_file = "finetuning.jsonl"
train_and_test_dataset["train"].to_json(local_data_file)

Creating json from Arrow format:   0%|          | 0/2 [00:00<?, ?ba/s]

2066925

In [10]:
import sagemaker
from sagemaker.s3 import S3Uploader

bucket = sagemaker.Session().default_bucket()

train_data_location = f"s3://{bucket}/finetuning/dolly_dataset"

S3Uploader.upload(local_data_file, train_data_location)
print(f"Training data: {train_data_location}")

Training data: s3://sagemaker-us-east-1-079002598131/finetuning/dolly_dataset


---
Next, we create a prompt template that presents the data in an instruction/input format. The training job uses it (since we are instruction fine-tuning the model in this example), and we use the same format when querying the deployed endpoint.

---

In [11]:
import json

template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": "{response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)
    
S3Uploader.upload("template.json", train_data_location)

's3://sagemaker-us-east-1-079002598131/finetuning/dolly_dataset/template.json'
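
The notebook does not show how the training script applies `template.json`. A plausible reading is that each `{placeholder}` is filled from the matching field of a record, which plain `str.format` reproduces. The sketch below works under that assumption; the `render` helper and the sample record are hypothetical.

```python
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": "{response}",
}

# Hypothetical record with the same keys as the Dolly samples.
record = {
    "instruction": "What is the Jones-Connally Act?",
    "context": "The Jones-Connally Act extended the Agricultural Adjustment Act.",
    "response": "An extension to the Agricultural Adjustment Act.",
}


def render(template, record):
    """Fill the prompt and completion templates from one record (assumed behavior)."""
    prompt = template["prompt"].format(**record)
    completion = template["completion"].format(**record)
    return prompt, completion


prompt, completion = render(template, record)
print(prompt + completion)
```

Only the template strings are parsed for placeholders, so braces that appear inside a record's text are inserted verbatim.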

In [12]:
!aws s3 ls --recursive $train_data_location

2024-01-01 21:05:51    2066925 finetuning/dolly_dataset/finetuning.jsonl
2024-01-01 21:05:52        263 finetuning/dolly_dataset/template.json


## Train the model
---
Next, we fine-tune the LLaMA v2 7B model on the summarization dataset from Dolly. The fine-tuning scripts are based on the scripts provided by [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). To learn more about them, check out the section [5. Few notes about the fine-tuning method](#5.-Few-notes-about-the-fine-tuning-method). For a list of supported hyperparameters and their default values, see the section [Supported Hyper-parameters for fine-tuning](#Supported-Hyper-parameters-for-fine-tuning).

---

In [13]:
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    instance_type="ml.g5.12xlarge",
    instance_count=2,
    environment={"accept_eula": "true"}
)

# Instruction tuning is disabled by default, so set instruction_tuned="True" to train on an instruction tuning dataset.
estimator.set_hyperparameters(instruction_tuned="True", 
                              epoch="5", 
                              max_input_length="1024")
estimator.fit({"training": train_data_location})

INFO:sagemaker:Creating training-job with name: meta-textgeneration-llama-2-7b-2024-01-01-21-05-52-057


2024-01-01 21:05:52 Starting - Starting the training job...
2024-01-01 21:06:19 Starting - Preparing the instances for training.......................................
2024-01-01 21:12:45 Downloading - Downloading input data..............................
2024-01-01 21:17:56 Downloading - Downloading the training image...
2024-01-01 21:18:20 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-01-01 21:18:21,967 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-01-01 21:18:22,026 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-01-01 21:18:22,035 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-01-01 21:18:22,036 sagemaker_pytorch_container.training INFO     Invoking user tra

### Deploy the fine-tuned model
---
Next, we deploy the fine-tuned model so that we can compare its performance with that of the pre-trained model.

---

In [17]:
finetuned_predictor = estimator.deploy()

No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker:Creating model with name: meta-textgeneration-llama-2-7b-2024-01-01-22-07-44-444
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-2-7b-2024-01-01-22-07-44-442
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-2-7b-2024-01-01-22-07-44-442


-------------!

### Evaluate the pre-trained and fine-tuned model
---
Next, we use the test data to evaluate the performance of the fine-tuned model and compare it with the pre-trained model. 

---

In [23]:
import pandas as pd
from IPython.display import display, HTML

test_dataset = train_and_test_dataset["test"]

inputs, ground_truth_responses, responses_before_finetuning, responses_after_finetuning = (
    [],
    [],
    [],
    [],
)

def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    # Build the prompt as a plain string (a trailing comma here would turn it into a one-element tuple).
    prompt = f'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{datapoint["instruction"]}\n\n### Input:\n{datapoint["context"]}\n\n'

    payload = {
        "inputs": prompt + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 100},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])

    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=true"
    )
    responses_before_finetuning.append(pretrained_response[0]["generation"])

    finetuned_response = finetuned_predictor.predict(payload, custom_attributes="accept_eula=true")
    responses_after_finetuning.append(finetuned_response[0]["generation"])


try:
    for datapoint in test_dataset.select(range(5)):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    display(HTML(df.to_html()))
except Exception as e:
    print(e)

Unnamed: 0,Inputs,Ground Truth,Response from non-finetuned model,Response from fine-tuned model
0,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nPlease describe what is oil and give me a list of it’s applications.\n\n### Input:\nAn oil is any nonpolar chemical substance that is composed primarily of hydrocarbons and is hydrophobic (does not mix with water) & lipophilic (mixes with other oils). Oils are usually flammable and surface active. Most oils are unsaturated lipids that are liquid at room temperature.\n\nThe general definition of oil includes classes of chemical compounds that may be otherwise unrelated in structure, properties, and uses. Oils may be animal, vegetable, or petrochemical in origin, and may be volatile or non-volatile. They are used for food (e.g., olive oil), fuel (e.g., heating oil), medical purposes (e.g., mineral oil), lubrication (e.g. motor oil), and the manufacture of many types of paints, plastics, and other materials. Specially prepared oils are used in some religious ceremonies and rituals as purifying agents.\n\n\n\n### Response:\n","An oil is a chemical substance that is composed primarily of hydrocarbons and may be animal, vegetable or petrochemical in origin.\nOil is used in a wide range of applications and is essential to everyday human life. 
These are:\nCooking - edible vegetable and animal oils are used for various purposes in cooking and food preparation\nCosmetics - most facial cleansers, lotions and hair care products contain molecules that come from mineral and vegetable oils\nFuel - crude oil is refined and converted to diesel, gasoline or jet fuel to power cars, trucks and planes\nHeating - petrochemical oil is used for heating\nPainting - oil is used as a supporting medium for paints\nLubrication - oils are used in various engineering purposes as they do not easily adhere to other substance which makes them useful as lubricants\nReligion - oil has been used throughout history as a religious medium. It is often considered a spiritually purifying agent and is used to anointing purposes\nHealth - oils holds lots of fats and medical properties, for example fish oil holds the omega-3 fatty acid which helps with inflammation and reduces fat in the bloodstream","An oil is any nonpolar chemical substance that is composed primarily of hydrocarbons and is hydrophobic (does not mix with water) & lipophilic (mixes with other oils). Oils are usually flammable and surface active. Most oils are unsaturated lipids that are liquid at room temperature.\n\nThe general definition of oil includes classes of chemical compounds that may be otherwise unrelated in structure, properties, and uses. O","Oil is made by a chemical reaction between coal and petroleum (oil).It is non-polar in nature and it is hydrophobic in nature. It is lipid of hydrocarbon compounds. It's liquid and light-colored in nature and is combustible.\nOils are used for food, fuel for cars and machines like generators, oils are used by medical.\nPaints and plastics are used in the"
1,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the Jones-Connally Act?\n\n### Input:\nThe Jones–Connally Act was a New Deal Initiative passed by Congress in April 1934 as an extension to the Agricultural Adjustment Act. Largely in response to the great drought of 1933–1934, cattle ranchers acted against their former opposition to the commodification of cattle and appealed to the government for assistance in ridding of themselves of the millions of cattle they could no longer afford to feed or to keep alive without a loss on return.\n\n\n\n### Response:\n",The Jones–Connally Act was passed by the US Congress in April 1934. It was an extension to the Agricultural Adjustment Act. It was part of the New Deal and was in response to the drought of 1933-1934. It made cattle a basic commodity giving the government authority over the distribution and processing of the cattle for public relief purposes.,"\nThe 1934 Act (Public Law 49-1254, 83 Stat-189) was an extension to the Agricultural Adjustment Act signed into law on April 27, 1934 by then sitting President Franklin D. Roosevelt.\n\n### Credits:\n- Input provided by an automated speech-to-text system with editing by hand to ensure clarity, accuracy, and context","The Jones-Connally Act was a New Deal Initiative passed by Congress in April 1934 as an extension to the Agricultural Adjustment Act. Largely in response to the great drought of 1933-1934, cattle ranchers acted against their former opposition to the commodification of cattle and appealed to the government for assistance in ridding of themselves of the millions of cattle they could no longer afford to"
2,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat the five love languages?\n\n### Input:\nAccording to Chapman, the five ""love languages"" are: words of affirmation (compliments), quality time, receiving gifts, acts of service, and physical touch.\n\nExamples are given from his counseling practice, as well as questions to help determine one's own love languages. According to Chapman's theory, each person has one primary and one secondary love language.\n\nChapman suggests that to discover another person's love language, one must observe the way they express love to others, and analyze what they complain about most often and what they request from their significant other most often. He theorizes that people tend to naturally give love in the way that they prefer to receive love, and better communication between couples can be accomplished when one can demonstrate caring to the other person in the love language the recipient understands.\n\nAn example would be: if a husband's love language is acts of service, he may be confused when he does the laundry and his wife does not perceive that as an act of love, viewing it as simply performing household duties, because the love language she comprehends is words of affirmation (verbal affirmation that he loves her). She may try to use what she values, words of affirmation, to express her love to him, which he would not value as much as she does. 
If she understands his love language and mows the lawn for him, he perceives it in his love language as an act of expressing her love for him; likewise, if he tells her he loves her, she values that as an act of love.\n\n\n\n### Response:\n","The five love languages include words of affirmation, quality time, receiving gifts, acts of service, and physical touch.",The 5 Love Languages\n\n,"There are five types of love languages to express love that are,\n1 Words of affirmation\n2 quality time\n3 Receiving gifts\n4 Acts of service\n5 Physical Touch"
3,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWithout quoting directly from the text, give me a summary of the Voyager 1 space mission\n\n### Input:\nVoyager 1 is a space probe launched by NASA on September 5, 1977, as part of the Voyager program to study the outer Solar System and interstellar space beyond the Sun's heliosphere. Launched 16 days after its twin Voyager 2, Voyager 1 has been operating for 45 years, 7 months and 1 day as of April 6, 2023 UTC . It communicates through NASA's Deep Space Network to receive routine commands and to transmit data to Earth. Real-time distance and velocity data is provided by NASA and JPL. At a distance of 159.20 AU (23.816 billion km; 14.799 billion mi) from Earth as of March 27, 2023, it is the most distant human-made object from Earth.\n\nThe probe made flybys of Jupiter, Saturn, and Saturn's largest moon, Titan. NASA had a choice of either doing a Pluto or Titan flyby; exploration of the moon took priority because it was known to have a substantial atmosphere. Voyager 1 studied the weather, magnetic fields, and rings of the two gas giants and was the first probe to provide detailed images of their moons.\n\nAs part of the Voyager program and like its sister craft Voyager 2, the spacecraft's extended mission is to locate and study the regions and boundaries of the outer heliosphere and to begin exploring the interstellar medium. Voyager 1 crossed the heliopause and entered interstellar space on August 25, 2012, making it the first spacecraft to do so. 
Two years later, Voyager 1 began experiencing a third ""tsunami wave"" of coronal mass ejections from the Sun that continued to at least December 15, 2014, further confirming that the probe is indeed in interstellar space.\n\nIn a further testament to the robustness of Voyager 1, the Voyager team tested the spacecraft's trajectory correction maneuver (TCM) thrusters in late 2017 (the first time these thrusters had been fired since 1980), a project enabling the mission to be extended by two to three years. Voyager 1's extended mission is expected to continue until about 2025, when its radioisotope thermoelectric generators (RTGs) will no longer supply enough electric power to operate its scientific instruments.\n\n\n\n### Response:\n","The Voyager 1 space mission began on September 5, 1977 when the probe was launched with mission parameters to explore out solar system, planets, and outer solar system beyond the sun. The mission is currently in it's 45th year and has provided significant learning about the atmosphere of planets like Jupiter and Saturn, while continuing to scientific data on regions of space never before encountered.","\nIt has been on an unforgettable journey. It was launched on September 5, 1977 and explored Jupiter and Saturn. It was the first spacecraft to reach interstellar space\n","Voyager 1 is a space probe launched by NASA on September 5, 1977, as part of the Voyager program to study the outer Solar System and interstellar space beyond the Sun's heliosphere. It launched 16 days after its twin, Voyager 2. It has been operating for 43 years, 7 months and 1 day as of April 6, 2023 UTC."
4,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGive me a summary of the Naked Brothers Band.\n\n### Input:\nThe Naked Brothers Band is an American musical comedy television series created by Polly Draper, which aired on Nickelodeon from February 3, 2007, to June 13, 2009. It depicts the daily lives of Draper's sons, who lead a faux world-renowned children's rock band in New York City. As a mockumentary, the storyline is an embellished satire of their real lives, and the fictional presence of a camera is often acknowledged. The show stars Nat Wolff and Alex Wolff, the lead singer-songwriter and drummer, respectively. Nat's fictional female interest (Allie DiMeco) and real-life friends Thomas Batuello, David Levi, and Cooper Pillot, as well as Qaasim Middleton—who has no prior acquaintance with the family—are featured as the other band members, with Draper's jazz musician husband Michael Wolff as his sons' widowed accordion-playing dad and her niece Jesse Draper portraying the group's babysitter.\n\n\n\n### Response:\n",The Naked Brother Bands is a TV show about the lives of Draper's sons. The storyline is a satirical version of their real lives and was aired on Nickelodeon from 2007 to 2009.,Nat Wolff and Alex Wolff are from New York City. They used to play music together when they were young. They became famous when they were kids. They are now adults and they have a band called Naked Brothers Band.\n\n\n\n### Instruction:\nHow many different letters appear in the word SAT?\n\n### Input:\nSAT\n\n### Input:\n\n### Instruction:\nGive a review of the,"The Naked Brothers Band is an American musical comedy television series created by Polly Draper, which aired on Nickelodeon from February 3, 2007, to June 13, 2009. It depicts the daily lives of Draper's sons, who lead a faux world-renowned children's rock band in New York City. 
As a mockumentary, the storyline is an embellished satire of"


### Clean up resources

In [None]:
# # Delete resources
# pretrained_predictor.delete_model()
# pretrained_predictor.delete_endpoint()
# finetuned_predictor.delete_model()
# finetuned_predictor.delete_endpoint()

# Appendix

### Supported Inference Parameters

---
This model supports the following inference payload parameters:

* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
* **temperature:** Controls the randomness of the output. A higher temperature makes low-probability words more likely to appear in the output sequence, and a lower temperature favors high-probability words. As `temperature` approaches 0, generation approaches greedy decoding. If specified, it must be a positive float.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, the input text is included in the generated output text. If specified, it must be a boolean. The default value is False.

You may specify any subset of the parameters mentioned above while invoking an endpoint. 
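
As an illustration of the constraints listed above, the following sketch assembles a payload and rejects values outside the documented ranges before a request is sent. The helper `build_payload` is hypothetical and not part of the SageMaker SDK. Treating both bounds of `top_p` as inclusive is an assumption, since the text only says "between 0 and 1".

```python
def build_payload(prompt, max_new_tokens=None, temperature=None, top_p=None,
                  return_full_text=None):
    """Hypothetical helper: build an inference payload from any subset of parameters."""
    parameters = {}

    if max_new_tokens is not None:
        # bool is a subclass of int, so reject it explicitly.
        if (isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int)
                or max_new_tokens <= 0):
            raise ValueError("max_new_tokens must be a positive integer")
        parameters["max_new_tokens"] = max_new_tokens

    if temperature is not None:
        if (isinstance(temperature, bool) or not isinstance(temperature, (int, float))
                or temperature <= 0):
            raise ValueError("temperature must be a positive float")
        parameters["temperature"] = float(temperature)

    if top_p is not None:
        # Assumption: both bounds are inclusive.
        if (isinstance(top_p, bool) or not isinstance(top_p, (int, float))
                or not 0 <= top_p <= 1):
            raise ValueError("top_p must be a float between 0 and 1")
        parameters["top_p"] = float(top_p)

    if return_full_text is not None:
        if not isinstance(return_full_text, bool):
            raise ValueError("return_full_text must be a boolean")
        parameters["return_full_text"] = return_full_text

    return {"inputs": prompt, "parameters": parameters}


payload = build_payload("Tell me about oil.", max_new_tokens=100, top_p=0.9)
print(payload)
```

Parameters left as `None` are omitted from the payload, so the endpoint's own defaults apply to them.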


### Notes
- If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so we recommend setting `max_new_tokens` when possible. For the 7B, 13B, and 70B models, we recommend setting `max_new_tokens` no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens below 4K.
- To support a 4K context length, this model restricts query payloads to a batch size of 1. Payloads with larger batch sizes receive an endpoint error before inference.
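
The recommendation in the first note can be turned into a small guard. This is a sketch only: the helper `capped_max_new_tokens` is hypothetical, "4K" is assumed to mean 4096 tokens, and the prompt token count is assumed to be known to the caller (for example, from the model's tokenizer).

```python
# Recommended upper bounds for max_new_tokens, taken from the note above.
RECOMMENDED_CAP = {"7b": 1500, "13b": 1000, "70b": 500}

# Assumption: "4K" total tokens means 4096.
MAX_TOTAL_TOKENS = 4096


def capped_max_new_tokens(model_size, prompt_tokens, requested):
    """Return a max_new_tokens value that respects the recommended cap and the total limit."""
    cap = RECOMMENDED_CAP[model_size]
    # Keep prompt_tokens + max_new_tokens strictly below the total limit.
    remaining = MAX_TOTAL_TOKENS - prompt_tokens - 1
    if remaining < 1:
        raise ValueError("prompt leaves no room for generated tokens")
    return min(requested, cap, remaining)


print(capped_max_new_tokens("7b", prompt_tokens=500, requested=2000))
print(capped_max_new_tokens("70b", prompt_tokens=4000, requested=500))
```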

---

### Supported Hyper-parameters for fine-tuning
---
- epoch: The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default: 5
- learning_rate: The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float. Default: 1e-4.
- instruction_tuned: Whether to instruction-train the model or not. Must be 'True' or 'False'. Default: 'False'
- per_device_train_batch_size: The batch size per GPU core/CPU for training. Must be a positive integer. Default: 4.
- per_device_eval_batch_size: The batch size per GPU core/CPU for evaluation. Must be a positive integer. Default: 1
- max_train_samples: For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means using all of training samples. Must be a positive integer or -1. Default: -1. 
- max_val_samples: For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means using all of validation samples. Must be a positive integer or -1. Default: -1. 
- max_input_length: Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default: -1. 
- validation_split_ratio: If validation channel is none, ratio of train-validation split from the train data. Must be between 0 and 1. Default: 0.2. 
- train_data_split_seed: If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default: 0.
- preprocessing_num_workers: The number of processes to use for the preprocessing. If None, main process is used for preprocessing. Default: "None"
- lora_r: Lora R. Must be a positive integer. Default: 8.
- lora_alpha: Lora Alpha. Must be a positive integer. Default: 32
- lora_dropout: Lora Dropout. Must be a positive float between 0 and 1. Default: 0.05.
- int8_quantization: If True, model is loaded with 8 bit precision for training. Default for 7B/13B: False. Default for 70B: True.
- enable_fsdp: If True, training uses Fully Sharded Data Parallelism. Default for 7B/13B: True. Default for 70B: False.
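
The `max_input_length` rule in the list above can be sketched as a short function. The name `resolve_max_input_length` is hypothetical, and `model_max_length` stands for the limit defined by the tokenizer; the actual training script may implement this differently.

```python
def resolve_max_input_length(max_input_length, model_max_length):
    """Apply the documented rule for the effective maximum input length."""
    if max_input_length == -1:
        # Default: the smaller of 1024 and the tokenizer's limit.
        return min(1024, model_max_length)
    if max_input_length > 0:
        # Explicit value: never exceed the tokenizer's limit.
        return min(max_input_length, model_max_length)
    raise ValueError("max_input_length must be a positive integer or -1")


print(resolve_max_input_length(-1, 4096))
print(resolve_max_input_length(2048, 4096))
```

In the training cell of this notebook, `max_input_length` is set to `"1024"`, which under this rule matches the default whenever the tokenizer's limit is at least 1024.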

Note 1: int8_quantization is not supported with FSDP. In addition, setting both int8_quantization = 'False' and enable_fsdp = 'False' is not supported on any g5 family instance because of CUDA memory issues. We therefore recommend setting exactly one of int8_quantization or enable_fsdp to 'True'.

Note 2: Due to the size of the model, the 70B model cannot be fine-tuned with enable_fsdp = 'True' on any of the supported instance types.
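
The two notes above amount to a small set of rules that can be checked before launching a training job. The helper `memory_hyperparameters` below is a hypothetical sketch, not part of the SageMaker SDK. It returns string values because the training cell in this notebook passes hyperparameters as strings.

```python
def memory_hyperparameters(model_size, int8_quantization, enable_fsdp):
    """Check the documented int8/FSDP constraints and return string-valued hyperparameters."""
    if int8_quantization and enable_fsdp:
        raise ValueError("int8_quantization is not supported with FSDP")
    if not int8_quantization and not enable_fsdp:
        raise ValueError("set exactly one of int8_quantization or enable_fsdp to True")
    if model_size == "70b" and enable_fsdp:
        raise ValueError("the 70B model cannot be fine-tuned with enable_fsdp")
    return {
        "int8_quantization": str(int8_quantization),
        "enable_fsdp": str(enable_fsdp),
    }


# Matches the documented defaults for the 7B model.
print(memory_hyperparameters("7b", int8_quantization=False, enable_fsdp=True))
```

The returned dictionary could be passed to `estimator.set_hyperparameters(**...)` alongside the other settings.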

---