# Fine-tune Llama 3 models on SageMaker JumpStart


---
In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy a pre-trained Llama 3 model and fine-tune it on your dataset in either domain adaptation or instruction tuning format.

---

### Model License information
---
To perform inference on these models, you need to pass custom_attributes='accept_eula=true' as part of the request header. Doing so indicates that you have read and accepted the end-user license agreement (EULA) of the model. The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. By default, this notebook sets custom_attributes='accept_eula=false', so all inference requests will fail until you explicitly change this custom attribute.

Note: The custom_attributes used to pass the EULA are key/value pairs. The key and value are separated by '=' and pairs are separated by ';'. If the user passes the same key more than once, the last value is kept and passed to the script handler (in this case, where it is used for conditional logic). For example, if 'accept_eula=false; accept_eula=true' is passed to the server, then 'accept_eula=true' is kept and passed to the script handler.
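
The parsing rule described in the note can be sketched in a few lines of Python. This is only an illustration of the "last value wins" behavior, not the actual script handler; `parse_custom_attributes` is a hypothetical helper name.

```python
def parse_custom_attributes(raw: str) -> dict:
    """Parse 'k1=v1;k2=v2' into a dict; a repeated key keeps its last value."""
    attributes = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing ';'
        key, _, value = pair.partition("=")
        # Later assignments overwrite earlier ones, so the last value is kept
        attributes[key.strip()] = value.strip()
    return attributes


print(parse_custom_attributes("accept_eula=false; accept_eula=true"))
# {'accept_eula': 'true'}
```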

---

### Set up

---
We begin by installing and upgrading necessary packages. Restart the kernel after executing the cell below for the first time.

---

In [2]:
!pip install --upgrade sagemaker datasets


## Deploy Pre-trained Model

---

First, we deploy the Llama 3 model as a SageMaker endpoint. To train/deploy the 8B and 70B models, please change model_id to "meta-textgeneration-llama-3-8b" and "meta-textgeneration-llama-3-70b" respectively.

---

In [3]:
model_id, model_version = "meta-textgeneration-llama-3-8b", "2.*"

In [5]:
from sagemaker.jumpstart.model import JumpStartModel

pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)
# Deployment requires accept_eula=True; keep this value only if you have read and accepted the EULA
pretrained_predictor = pretrained_model.deploy(accept_eula=True)

-----------!

## Invoke the endpoint

---
Next, we invoke the endpoint with some sample queries. Later in this notebook, we fine-tune this model with a custom dataset and carry out inference using the fine-tuned model. We also compare the results obtained from the pre-trained and the fine-tuned models.

---

In [7]:
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response.get('generated_text')}")
    print("\n==================================\n")

In [10]:
payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False,
    },
}
try:
    response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=true"
    )
    print_response(payload, response)
except Exception as e:
    print(e)

I believe the meaning of life is
>  to be happy. I believe that happiness is a choice. I believe that happiness is a choice. I believe that happiness is a choice. I believe that happiness is a choice. I believe that happiness is a choice. I believe that happiness is a choice. I believe that happiness is a choice. I believe that happiness




---
To learn about additional use cases of the pre-trained model, please check out the notebook [Text completion: Run Llama 3 models in SageMaker JumpStart](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/llama-3-text-completion.ipynb).

---

## Dataset preparation for fine-tuning

---

You can fine-tune on a dataset in either domain adaptation format or instruction tuning format. Please find more details in the section [Dataset instruction](#Dataset-instruction). In this demo, we use a subset of the [Dolly dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) in an instruction tuning format. The Dolly dataset contains roughly 15,000 instruction-following records for various categories such as question answering, summarization, and information extraction. It is available under the Apache 2.0 license. We select the summarization examples for fine-tuning.


Training data is formatted in JSON lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder; however, it can be saved in multiple .jsonl files. The training folder can also contain a template.json file describing the input and output formats.
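
As a small illustration of the JSON lines layout described above, the sketch below writes two made-up samples to a `.jsonl` file in a temporary folder and reads them back. The sample records and the file name `sample.jsonl` are assumptions for illustration only; they are not part of the Dolly dataset.

```python
import json
import tempfile
from pathlib import Path

# Made-up records using the same field names as the Dolly dataset
samples = [
    {"instruction": "Summarize the passage.", "context": "Some passage.", "response": "A summary."},
    {"instruction": "Summarize the note.", "context": "Some note.", "response": "Another summary."},
]

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "sample.jsonl"
    # One JSON object per line, each line a single data sample
    with path.open("w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample) + "\n")
    # Reading back: parse every non-empty line independently
    with path.open(encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded), sorted(loaded[0]))
```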

To train your model on a collection of unstructured dataset (text files), please see the section [Example fine-tuning with Domain-Adaptation dataset format](#Example-fine-tuning-with-Domain-Adaptation-dataset-format) in the Appendix.

---

In [11]:
from datasets import load_dataset

dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

# To train for question answering/information extraction, change the filter condition on the next line to example["category"] == "closed_qa"/"information_extraction".
summarization_dataset = dolly_dataset.filter(
    lambda example: example["category"] == "summarization"
)
summarization_dataset = summarization_dataset.remove_columns("category")

# We split the dataset into two where test data is used to evaluate at the end.
train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)

# Dumping the training data to a local file to be used for training.
train_and_test_dataset["train"].to_json("train.jsonl")

2115840

In [12]:
train_and_test_dataset["train"][0]

{'instruction': 'Using the passage below, in Machine Learning, what is unsupervised learning and is it different from supervised learning?',
 'context': 'Unsupervised learning is a type of algorithm that learns patterns from untagged data. The goal is that through mimicry, which is an important mode of learning in people, the machine is forced to build a concise representation of its world and then generate imaginative content from it.\n\nIn contrast to supervised learning where data is tagged by an expert, e.g. tagged as a "ball" or "fish", unsupervised methods exhibit self-organization that captures patterns as probability densities or a combination of neural feature preferences encoded in the machine\'s weights and activations. The other levels in the supervision spectrum are reinforcement learning where the machine is given only a numerical performance score as guidance, and semi-supervised learning where a small portion of the data is tagged.',
 'response': 'Unsupervised learning 

---
Next, we create a prompt template that presents the data in an instruction / input format. The template is used for the training job (since we are instruction fine-tuning the model in this example) and also for running inference against the deployed endpoint.

---

In [15]:
import json

template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)
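
To see what the template produces, the sketch below fills it with one made-up record using `str.format`. The record is an assumption for illustration, and the template is repeated so the snippet runs on its own. How the training script joins the prompt and completion is not shown in this notebook, so the two strings are rendered separately here.

```python
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}

# Made-up record with the three fields the template refers to
record = {
    "instruction": "What color is the sky?",
    "context": "The sky is blue.",
    "response": "Blue.",
}

# str.format ignores keyword arguments that the string does not reference
prompt = template["prompt"].format(**record)
completion = template["completion"].format(**record)

print(prompt)
print(repr(completion))
```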

### Upload dataset to S3
---

We will upload the prepared dataset to S3 which will be used for fine-tuning.

---

In [16]:
from sagemaker.s3 import S3Uploader
import sagemaker

output_bucket = sagemaker.Session().default_bucket()
local_data_file = "train.jsonl"
train_data_location = f"s3://{output_bucket}/dolly_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"Training data: {train_data_location}")

Training data: s3://sagemaker-us-east-1-094784590684/dolly_dataset


## Train the model
---
Next, we fine-tune the Llama 3 8B model on the summarization dataset from Dolly. The fine-tuning scripts are based on scripts provided by [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). To learn more about the fine-tuning scripts, please see section [5. Few notes about the fine-tuning method](#5.-Few-notes-about-the-fine-tuning-method). For a list of supported hyper-parameters and their default values, please see section [3. Supported Hyper-parameters for fine-tuning](#3.-Supported-Hyper-parameters-for-fine-tuning).

---

In [17]:
from sagemaker.jumpstart.estimator import JumpStartEstimator


estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},  # Keep "true" only if you have read and accepted the EULA
    disable_output_compression=True,
    instance_type="ml.g5.12xlarge",  # For Llama-3-70b, use instance_type="ml.g5.48xlarge"
)
# By default, instruction tuning is set to false. To train on an instruction tuning dataset, set instruction_tuned="True".
estimator.set_hyperparameters(
    instruction_tuned="True", epoch="5", max_input_length="1024"
)
estimator.fit({"training": train_data_location})

INFO:sagemaker:Creating training-job with name: meta-textgeneration-llama-3-8b-2024-06-30-03-42-05-187


2024-06-30 03:42:05 Starting - Starting the training job...
2024-06-30 03:42:33 Starting - Preparing the instances for training......
2024-06-30 03:43:08 Downloading - Downloading input data...........................
2024-06-30 03:47:58 Training - Training image download completed. Training in progress..bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-06-30 03:48:09,300 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-06-30 03:48:09,335 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-06-30 03:48:09,345 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-06-30 03:48:09,347 sagemaker_pytorch_container.training INFO     Invoking user training script.
2024-06-30 03:48:18,451 sagemaker-training-toolkit INFO     Installing dependenci

Studio Kernel Dying issue:  If your studio kernel dies and you lose reference to the estimator object, please see section [6. Studio Kernel Dead/Creating JumpStart Model from the training Job](#6.-Studio-Kernel-Dead/Creating-JumpStart-Model-from-the-training-Job) on how to deploy endpoint using the training job name and the model id. 


### Deploy the fine-tuned model
---
Next, we deploy the fine-tuned model. We will compare the performance of the fine-tuned and pre-trained models.

---

In [19]:
finetuned_predictor = estimator.deploy()

No instance type selected for inference hosting endpoint. Defaulting to ml.g5.12xlarge.
INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.12xlarge.
INFO:sagemaker:Creating model with name: meta-textgeneration-llama-3-8b-2024-06-30-04-35-09-676
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-3-8b-2024-06-30-04-35-09-667
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-3-8b-2024-06-30-04-35-09-667


-----------!

### Evaluate the pre-trained and fine-tuned model
---
Next, we use the test data to evaluate the performance of the fine-tuned model and compare it with the pre-trained model. 

---

In [24]:
import pandas as pd
from IPython.display import display, HTML

test_dataset = train_and_test_dataset["test"]

(
    inputs,
    ground_truth_responses,
    responses_before_finetuning,
    responses_after_finetuning,
) = (
    [],
    [],
    [],
    [],
)


def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarcation_key = "\n\n### Response:\n"

    payload = {
        "inputs": template["prompt"].format(
            instruction=datapoint["instruction"], context=datapoint["context"]
        )
        + input_output_demarcation_key,
        "parameters": {"max_new_tokens": 100},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])
    # Requests to the pre-trained model fail until you change the next call to use "accept_eula=true"
    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=false"
    )
    responses_before_finetuning.append(pretrained_response.get("generated_text"))
    # Fine-tuned Llama 3 models do not require "accept_eula=true"
    finetuned_response = finetuned_predictor.predict(payload)
    responses_after_finetuning.append(finetuned_response.get("generated_text"))


try:
    for i, datapoint in enumerate(test_dataset.select(range(5))):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    display(HTML(df.to_html()))
except Exception as e:
    print(e)

Unnamed: 0,Inputs,Ground Truth,Response from non-finetuned model,Response from fine-tuned model
0,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWho is the finest Indian cricketer right now?\n\n### Input:\nVirat Kohli (Hindi pronunciation: born 5 November 1988) is an Indian international cricketer and former captain of the Indian national cricket team. He now plays as a right-handed batter for Royal Challengers Bangalore in the IPL and for Delhi in Indian domestic cricket. He is widely recognised as one of the best batsman in cricket history. He is the second most prolific international century batsman in cricket history. The International Cricket Council elected him the male cricketer of the decade despite holding the record for most runs in T20 internationals and the IPL. Kohli has also contributed to a number of India's victories, including the 2011 World Cup and the 2013 Champions Trophy.\n\n\n\n### Response:\n","Virat Kohli (Hindi pronunciation: born 5 November 1988) is an Indian international cricketer and former captain of the Indian national cricket team. He now plays as a right-handed batter for Royal Challengers Bangalore in the IPL and for Delhi in Indian domestic cricket. He is widely recognised as one of the best batsman in cricket history. He is the second most prolific international century batsman in cricket history. The International Cricket Council elected him the male cricketer of the decade despite holding the record for most runs in T20 internationals and the IPL. Kohli has also contributed to a number of India's victories, including the 2011 World Cup and the 2013 Champions Trophy.\n\nKohli was born and raised in New Delhi, where he attended the West Delhi Cricket Academy and began his junior career with the Delhi Under-15 team. He made his international debut in 2008 and soon established himself as a prominent member of the ODI team, eventually making his Test debut in 2011. 
For the first time, Kohli topped the ICC ODI batting rankings in 2013. During the 2014 T20 World Cup, he set a competition record for the most runs scored. In 2018, he became the world's top-ranked Test batsman, making him the only Indian cricketer to hold the number one place in all three versions of the game. \nIn 2019, he became the first player to score 20,000 international runs in a single decade. Following the T20 World Cup in 2021, Kohli decided to step down as captain of the Indian national team for T20Is, and he stood down as captain of the Test team in early 2022.\n\n\nHe has garnered numerous awards for his achievements on the cricket pitch. He was named the ICC One-Day International Player of the Year in 2012 and has twice won the Sir Garfield Sobers Trophy, which is awarded to the ICC Cricketer of the Year, in 2017 and 2018. Kohli was the best run scorer in the 2012 Asia Cup, scoring 357 runs. Kohli was also named ICC Test Player of the Year and ICC ODI Player of the Year in 2018, making him the first player to get both honours in the same year. In addition, from 2016 to 2018, he was crowned the Wisden Top Cricketer in the World for three years in a row. Kohli received the Arjuna Award in 2013, the Padma Shri in the sports category in 2017, and the Rajiv Gandhi Khel Ratna medal, India's highest sporting honour, in 2018.","Virat Kohli is the finest Indian cricketer right now. He is a right-handed batsman and has been playing for the Indian national cricket team since 2008. He has scored over 10,000 runs in international cricket and has been a part of many successful teams. He is known for his aggressive batting style and has been a key player in India's success in recent years. He is also the captain of the Indian team and has led them to many victories. He is considered",Virat Kohli is the finest Indian cricketer right now.
1,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhere did the 1951 French legislative election take place\n\n### Input:\nElections to the French National Assembly were held in French Somaliland on 17 June 1951 as part of the wider French parliamentary elections. Edmond Magendie was elected as the territory's MP, defeating the incumbent Jean-Carles Martine.\n\n\n\n### Response:\n","The elections to the French National Assembly were held in French Somaliland on 17 June 1951 as part of the wider French parliamentary elections. Edmond Magendie was elected as the territory's MP, defeating the incumbent Jean-Carles Martine.",The 1951 French legislative election took place in French Somaliland.,The 1951 French legislative election took place in French Somaliland.
2,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the difference between a clan and a tribe\n\n### Input:\nIn different cultures and situations, a clan usually has different meaning than other kin-based groups, such as tribes and bands. Often, the distinguishing factor is that a clan is a smaller, integral part of a larger society such as a tribe, chiefdom, or a state. In some societies, clans may have an official leader such as a chief, matriarch or patriarch; or such leadership role is performed by elders. In others, leadership positions may have to be achieved.\n\nThe term tribe is used in many different contexts to refer to a category of human social group. The predominant worldwide usage of the term in English is in the discipline of anthropology. Its definition is contested, in part due to conflicting theoretical understandings of social and kinship structures, and also reflecting the problematic application of this concept to extremely diverse human societies. The concept is often contrasted by anthropologists with other social and kinship groups, being hierarchically larger than a lineage or clan, but smaller than a chiefdom, nation or state. These terms are equally disputed. In some cases tribes have legal recognition and some degree of political autonomy from national or federal government, but this legalistic usage of the term may conflict with anthropological definitions.\n\nIn the United States, Native American tribes are legally considered to have ""domestic dependent nation"" status within the territorial United States, with a government-to-government relationship with the federal government.\n\n\n\n### Response:\n","In different cultures and situations, a clan usually has different meaning than other kin-based groups, such as tribes and bands. 
Often, the distinguishing factor is that a clan is a smaller, integral part of a larger society such as a tribe, chiefdom, or a state. In some societies, clans may have an official leader such as a chief, matriarch or patriarch; or such leadership role is performed by elders. In others, leadership positions may have to be achieved.\n\nThe term tribe is used in many different contexts to refer to a category of human social group. The predominant worldwide usage of the term in English is in the discipline of anthropology. Its definition is contested, in part due to conflicting theoretical understandings of social and kinship structures, and also reflecting the problematic application of this concept to extremely diverse human societies. The concept is often contrasted by anthropologists with other social and kinship groups, being hierarchically larger than a lineage or clan, but smaller than a chiefdom, nation or state. These terms are equally disputed. In some cases tribes have legal recognition and some degree of political autonomy from national or federal government, but this legalistic usage of the term may conflict with anthropological definitions.\n\nIn the United States, Native American tribes are legally considered to have ""domestic dependent nation"" status within the territorial United States, with a government-to-government relationship with the federal government.","A clan is a group of people who are related by blood or marriage. A tribe is a group of people who are related by blood or marriage, and who live in the same area.","A clan is a smaller, integral part of a larger society such as a tribe, chiefdom, or a state. A tribe is a category of human social group."
3,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nCan you summarize the main points about the first edition of the Dota 2 tournament, The International?\n\n### Input:\nValve announced the first edition of The International on August 1, 2011. 16 teams were invited to compete in the tournament, which would also serve as the first public viewing of Dota 2. The tournament was funded by Valve, including the US$1 million grand prize, with Nvidia supplying the hardware. It took place at Gamescom in Cologne from August 17–21 the same year. The tournament started with a group stage in which the winners of each of the four groups were entered into a winner's bracket, and the other teams entered the loser's bracket. The rest of the tournament was then played as a double-elimination tournament. The final of this inaugural tournament was between Ukrainian-based Natus Vincere and Chinese-based EHOME, with Natus Vincere winning the series 3–1. EHOME won US$250,000, with the rest of the 14 teams splitting the remaining $350,000.\n\n\n\n### Response:\n","The first edition of The International was held at Gamescom in Cologne, Germany August 17-21 2011. Sixteen teams were directly invited by Valve to complete for $1 Million grand prize. The tournament was played in a round robin group stage, followed by a double elimination playoff. The Ukrainian based team, Natus Vincere defeated the Chinese based EHOME 3-1 in the grand finals.","The first edition of The International was held in Cologne, Germany, from August 17–21, 2011. It was the first public viewing of Dota 2, and it was funded by Valve, including the US$1 million grand prize. The tournament started with a group stage in which the winners of each of the four groups were entered into a winner's bracket, and the other teams entered the loser's bracket. 
The rest of the tournament was then played as a double-elimination","The first edition of the Dota 2 tournament, The International, was announced on August 1, 2011. 16 teams were invited to compete in the tournament, which would also serve as the first public viewing of Dota 2. The tournament was funded by Valve, including the US$1 million grand prize, with Nvidia supplying the hardware. It took place at Gamescom in Cologne from August 17–21 the same year. The tournament started with a group stage in which the winners"
4,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the oldest higher learning institution in the United States\n\n### Input:\nHarvard University is a private Ivy League research university in Cambridge, Massachusetts. Founded in 1636 as Harvard College and named for its first benefactor, the Puritan clergyman John Harvard, it is the oldest institution of higher learning in the United States and is widely considered to be one of the most prestigious universities in the world.\n\n\n\n### Response:\n","The oldest higher learning institution in the United States is Harvard University, which is widely considered the most prestigious university in the world.","Harvard University is the oldest higher learning institution in the United States. It was founded in 1636 as Harvard College and named for its first benefactor, the Puritan clergyman John Harvard. The university is widely considered to be one of the most prestigious universities in the world.","Harvard University is the oldest higher learning institution in the United States. It was founded in 1636 as Harvard College and named for its first benefactor, the Puritan clergyman John Harvard."


### Clean up resources

In [25]:
# Delete resources
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()

INFO:sagemaker:Deleting model with name: meta-textgeneration-llama-3-8b-2024-06-30-03-27-41-682
INFO:sagemaker:Deleting endpoint configuration with name: meta-textgeneration-llama-3-8b-2024-06-30-03-27-41-859
INFO:sagemaker:Deleting endpoint with name: meta-textgeneration-llama-3-8b-2024-06-30-03-27-41-859
INFO:sagemaker:Deleting model with name: meta-textgeneration-llama-3-8b-2024-06-30-04-35-09-676
INFO:sagemaker:Deleting endpoint configuration with name: meta-textgeneration-llama-3-8b-2024-06-30-04-35-09-667
INFO:sagemaker:Deleting endpoint with name: meta-textgeneration-llama-3-8b-2024-06-30-04-35-09-667


# Appendix

### 1. Supported Inference Parameters

---
This model supports the following inference payload parameters:

* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.

You may specify any subset of the parameters mentioned above while invoking an endpoint. 
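
To make the roles of `temperature` and `top_p` concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy three-token vocabulary. It illustrates the general technique only; the function names and logits are made up, and the endpoint's actual decoding implementation may differ.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_candidates(probs, top_p):
    """Return indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
warm = softmax_with_temperature(logits, temperature=1.0)
cold = softmax_with_temperature(logits, temperature=0.1)

print(top_p_candidates(warm, top_p=0.9))  # more than one token remains a candidate
print(top_p_candidates(cold, top_p=0.9))  # near-greedy: only the top token remains
```

The next token would then be sampled from the kept candidates after renormalizing their probabilities.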


### Notes
- If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 8K for these models. This may result in endpoint query timeout errors, so it is recommended to set `max_new_tokens` when possible. For the 8B and 70B models, we recommend setting `max_new_tokens` to no greater than 1500 and 500 respectively, while keeping the total number of tokens below 8K.
- To support an 8K context length, this model restricts query payloads to a batch size of 1. Payloads with larger batch sizes will receive an endpoint error prior to inference.
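
The parameter constraints listed in this section can be checked on the client side before a request is sent. The sketch below is one way to do that; `build_payload` is a hypothetical helper, not part of the SageMaker SDK, and treating the `top_p` bounds as inclusive is an assumption.

```python
def build_payload(inputs, max_new_tokens=None, temperature=None, top_p=None, return_full_text=None):
    """Assemble an inference payload, rejecting values outside the documented ranges."""
    parameters = {}
    if max_new_tokens is not None:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int) or max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be a positive integer")
        parameters["max_new_tokens"] = max_new_tokens
    if temperature is not None:
        if isinstance(temperature, bool) or not isinstance(temperature, (int, float)) or temperature <= 0:
            raise ValueError("temperature must be a positive float")
        parameters["temperature"] = float(temperature)
    if top_p is not None:
        # Assumption: both 0 and 1 are accepted as bounds
        if isinstance(top_p, bool) or not isinstance(top_p, (int, float)) or not 0 <= top_p <= 1:
            raise ValueError("top_p must be a float between 0 and 1")
        parameters["top_p"] = float(top_p)
    if return_full_text is not None:
        if not isinstance(return_full_text, bool):
            raise ValueError("return_full_text must be a boolean")
        parameters["return_full_text"] = return_full_text
    # Only the parameters that were explicitly provided are included
    return {"inputs": inputs, "parameters": parameters}


payload = build_payload("I believe the meaning of life is", max_new_tokens=64, top_p=0.9, temperature=0.6)
print(payload)
```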

---

### 2. Dataset formatting instruction for training

---

####  Fine-tune the Model on a New Dataset
We currently offer two types of fine-tuning: instruction fine-tuning and domain adaptation fine-tuning. You can switch between the two training
methods by setting the `instruction_tuned` parameter to 'True' or 'False'.


#### 2.1. Domain adaptation fine-tuning
The Text Generation model can be fine-tuned on any domain-specific dataset. After being fine-tuned on the domain-specific dataset, the model
is expected to generate domain-specific text and solve various NLP tasks in that specific domain with **few-shot prompting**.

Below are the instructions for how the training data should be formatted for input to the model.

- **Input:** A train and an optional validation directory. Each directory contains a CSV/JSON/TXT file. 
  - For CSV/JSON files, the train or validation data is used from the column called 'text' or the first column if no column called 'text' is found.
  - The train directory and the validation directory (if provided) should each contain exactly one file.
- **Output:** A trained model that can be deployed for inference. 
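
The column-selection rule for CSV input can be sketched as follows. This is an illustration of the rule as stated above, not the training script's code; `select_text_column` is a hypothetical helper and the CSV content is made up.

```python
import csv
import io


def select_text_column(csv_text: str) -> list:
    """Return values from the 'text' column, or from the first column if 'text' is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    if not columns:
        return []
    column = "text" if "text" in columns else columns[0]
    return [row[column] for row in reader]


print(select_text_column("id,text\n1,hello\n2,world\n"))  # uses the 'text' column
print(select_text_column("body,id\nfoo,1\n"))  # falls back to the first column
```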

Below is an example of a TXT file for fine-tuning the Text Generation model. The TXT file contains SEC filings of Amazon from 2021 to 2022.

```
Note About Forward-Looking Statements
This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions. Forward-looking
statements are based on current expectations and assumptions that are subject
to risks and uncertainties that may cause actual results to differ materially.
We describe risks and uncertainties that could cause actual results and events
to differ materially in “Risk Factors,” “Management’s Discussion and Analysis
of Financial Condition and Results of Operations,” and “Quantitative and
Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form
10-K). Readers are cautioned not to place undue reliance on forward-looking
statements, which speak only as of the date they are made. We undertake no
obligation to update or revise publicly any forward-looking statements,
whether because of new information, future events, or otherwise.
GENERAL
Embracing Our Future ...
```


#### 2.2. Instruction fine-tuning
The Text Generation model can be instruction-tuned on any text data, provided that the data
is in the expected format. The instruction-tuned model can then be deployed for inference.

Below are the instructions for how the training data should be formatted for input to the model.

- **Input:** A train and an optional validation directory. Train and validation directories should contain one or multiple JSON lines (`.jsonl`) formatted files. In particular, train directory can also contain an optional `*.json` file describing the input and output formats. 
  - The best model is selected according to the validation loss, calculated at the end of each epoch.
  If a validation set is not given, an (adjustable) percentage of the training data is
  automatically split and used for validation.
  - The training data must be in JSON lines (`.jsonl`) format, where each line is a dictionary
  representing a single data sample. All training data must be in a single folder, although
  it can be split across multiple `.jsonl` files. The `.jsonl` file extension is mandatory. The training
  folder can also contain a `template.json` file describing the input and output formats. If no
  template file is given, the following template will be used:
  ```json
  {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}",
    "completion": "{response}"
  }
  ```
  - In this case, the data in the JSON lines entries must include `instruction`, `context`, and `response` fields. If a custom template is provided, it must also use `prompt` and `completion` keys to define
  the input and output templates.
  Below is a sample custom template:

  ```json
  {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
  }
  ```
  Here, the data in the JSON lines entries must include `question`, `context`, and `answer` fields. 
- **Output:** A trained model that can be deployed for inference. 
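
As a rough sketch of how a template maps onto a single data sample, the following fills the default template with one record using Python's `str.format`. The helper name `render_example` and the sample record are hypothetical, and the actual fine-tuning script may apply the template differently.

```python
import json

# Default template, copied from the format description above.
DEFAULT_TEMPLATE = {
    "prompt": (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request.\n\n### Instruction:\n{instruction}\n\n"
        "### Input:\n{context}"
    ),
    "completion": "{response}",
}


def render_example(template, record):
    """Fill the prompt and completion templates with fields from one record.

    Raises KeyError if the record lacks a field that the template references.
    """
    return {
        "prompt": template["prompt"].format(**record),
        "completion": template["completion"].format(**record),
    }


# One line of a hypothetical train.jsonl file.
line = json.dumps(
    {
        "instruction": "Summarize the text.",
        "context": "SageMaker JumpStart provides pre-trained models.",
        "response": "JumpStart offers pre-trained models.",
    }
)

rendered = render_example(DEFAULT_TEMPLATE, json.loads(line))
print(rendered["prompt"])
print(rendered["completion"])
```

A record that is missing one of the fields named in the template raises a `KeyError` here, which is a convenient way to check a dataset against a template before starting a training job.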

---

#### 2.3. Example fine-tuning with Domain-Adaptation dataset format
---
We provide a subset of Amazon's SEC filings in domain adaptation dataset format. The data is downloaded from the publicly available [EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch) system. Instructions for accessing the data are available [here](https://www.sec.gov/os/accessing-edgar-data).

License: [Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/legalcode).

Run the following code to fine-tune the model on a dataset in domain adaptation format.

---

In [None]:
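import boto3
from sagemaker.jumpstart.estimator import JumpStartEstimator

model_id = "meta-textgeneration-llama-3-8b"

# Setting accept_eula to "true" confirms that you have read and accepted the model's EULA.
estimator = JumpStartEstimator(
    model_id=model_id,
    environment={"accept_eula": "true"},
    instance_type="ml.g5.24xlarge",
)
# instruction_tuned="False" trains on the domain adaptation dataset format.
estimator.set_hyperparameters(instruction_tuned="False", epoch="5")
# Train on the SEC filings sample stored in the JumpStart bucket of the current region.
estimator.fit(
    {
        "training": f"s3://jumpstart-cache-prod-{boto3.Session().region_name}/training-datasets/sec_amazon"
    }
)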

### 3. Supported Hyper-parameters for fine-tuning
---
- epoch: The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default: 5.
- learning_rate: The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float. Default: 1e-4.
- instruction_tuned: Whether to instruction-tune the model. Must be 'True' or 'False'. Default: 'False'.
- per_device_train_batch_size: The batch size per GPU core/CPU for training. Must be a positive integer. Default: 4.
- per_device_eval_batch_size: The batch size per GPU core/CPU for evaluation. Must be a positive integer. Default: 1.
- max_train_samples: For debugging purposes or quicker training, truncate the number of training examples to this value. A value of -1 means that all training samples are used. Must be a positive integer or -1. Default: -1.
- max_val_samples: For debugging purposes or quicker training, truncate the number of validation examples to this value. A value of -1 means that all validation samples are used. Must be a positive integer or -1. Default: -1.
- max_input_length: Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default: -1.
- validation_split_ratio: If no validation channel is provided, the fraction of the training data that is split off for validation. Must be between 0 and 1. Default: 0.2.
- train_data_split_seed: If validation data is not present, this fixes the random splitting of the input training data into the training and validation data used by the algorithm. Must be an integer. Default: 0.
- preprocessing_num_workers: The number of processes to use for preprocessing. If None, the main process is used for preprocessing. Default: "None".
- lora_r: The LoRA rank, that is, the inner dimension of the low-rank update matrices. Must be a positive integer. Default: 8.
- lora_alpha: The LoRA scaling factor. Must be a positive integer. Default: 32.
- lora_dropout: The dropout probability applied in the LoRA layers. Must be a positive float between 0 and 1. Default: 0.05.
- int8_quantization: If True, the model is loaded with 8-bit precision for training. Default for 8B: False. Default for 70B: True.
- enable_fsdp: If True, training uses Fully Sharded Data Parallel (FSDP). Default for 8B: True. Default for 70B: False.
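
The rule for `max_input_length` described above can be written out as a small function. This is only a sketch of the stated rule, not the fine-tuning script itself. The function name `resolve_max_input_length` and the example `model_max_length` values are hypothetical.

```python
def resolve_max_input_length(max_input_length, model_max_length, default_cap=1024):
    """Return the effective maximum input length after tokenization.

    -1 means "use the default": the smaller of default_cap (1024) and the
    tokenizer's model_max_length. A positive value is capped at model_max_length.
    """
    if max_input_length == -1:
        return min(default_cap, model_max_length)
    if max_input_length > 0:
        return min(max_input_length, model_max_length)
    raise ValueError("max_input_length must be a positive integer or -1")


# Hypothetical tokenizer limit of 8192 tokens, used only for illustration.
print(resolve_max_input_length(-1, 8192))
print(resolve_max_input_length(2048, 8192))
print(resolve_max_input_length(16384, 8192))
```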

Note 1: int8_quantization is not supported with FSDP. In addition, setting both int8_quantization = 'False' and enable_fsdp = 'False' is not supported on any of the g5 family instances because of CUDA memory issues. We therefore recommend setting exactly one of int8_quantization or enable_fsdp to 'True'.

Note 2: Due to the size of the model, the 70B model cannot be fine-tuned with enable_fsdp = 'True' on any of the supported instance types.
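
The two notes can be turned into a quick pre-flight check before launching a training job. The sketch below assumes the hyperparameters are passed as the strings 'True' and 'False', as in this notebook. The function name `check_memory_settings` and the `model_size` argument are hypothetical and not part of the SageMaker SDK.

```python
def check_memory_settings(int8_quantization, enable_fsdp, model_size="8b"):
    """Return a list of problems with an int8_quantization / enable_fsdp pair.

    An empty list means the combination is consistent with Note 1 and Note 2.
    """
    use_int8 = int8_quantization == "True"
    use_fsdp = enable_fsdp == "True"
    problems = []
    if use_int8 and use_fsdp:
        problems.append("int8_quantization is not supported with FSDP")
    if not use_int8 and not use_fsdp:
        problems.append("set exactly one of int8_quantization or enable_fsdp to 'True'")
    if model_size == "70b" and use_fsdp:
        problems.append("the 70B model cannot be fine-tuned with enable_fsdp = 'True'")
    return problems


# Defaults listed above: 8B uses FSDP, 70B uses int8 quantization.
print(check_memory_settings("False", "True", model_size="8b"))
print(check_memory_settings("True", "False", model_size="70b"))
print(check_memory_settings("True", "True", model_size="8b"))
```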

---

### 4. Supported Instance types for fine-tuning Llama 3

---
We have tested our scripts on the following instance types for fine-tuning Llama 3:

| Model | Model ID | Supported instance types for fine-tuning |
| - | - | - |
| Llama 3 8B | meta-textgeneration-llama-3-8b | ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge |
| Llama 3 8B Instruct | meta-textgeneration-llama-3-8b-instruct | ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge  |
| Llama 3 70B | meta-textgeneration-llama-3-70b | ml.g5.48xlarge, ml.p4d.24xlarge |
| Llama 3 70B Instruct | meta-textgeneration-llama-3-70b-instruct | ml.g5.48xlarge, ml.p4d.24xlarge |

Other instance types may also work for fine-tuning. Note: On p3 instances, training is done with 32-bit precision because bfloat16 is not supported on these instances. As a result, a training job consumes double the amount of CUDA memory on p3 instances compared to g5 instances.
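
The factor of two follows from the bytes needed per value: 4 bytes for 32-bit floats versus 2 bytes for bfloat16. The back-of-the-envelope sketch below counts model weights only and assumes a nominal 8 billion parameters. Activations, gradients, and optimizer state are not included, so actual CUDA memory usage will differ.

```python
# Bytes needed to store one value in each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gb(num_params, dtype):
    """Memory needed to hold the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


num_params = 8_000_000_000  # nominal parameter count, an assumption for illustration

bf16_gb = weight_memory_gb(num_params, "bfloat16")
fp32_gb = weight_memory_gb(num_params, "float32")
print(f"bfloat16: {bf16_gb:.1f} GB, float32: {fp32_gb:.1f} GB")
```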

---

### 5. A few notes about the fine-tuning method

---
- Fine-tuning scripts are based on [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). 
- The instruction tuning dataset is first converted into the domain adaptation dataset format before fine-tuning. 
- Fine-tuning scripts use Fully Sharded Data Parallel (FSDP) as well as the Low Rank Adaptation (LoRA) method to fine-tune the models.
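
To give a sense of why LoRA reduces the training footprint: instead of updating a full weight matrix of shape `d_out x d_in`, LoRA trains two low-rank matrices of shapes `d_out x r` and `r x d_in`, and scales their product by `lora_alpha / lora_r`. The sketch below counts trainable parameters for one matrix using the default `lora_r` and `lora_alpha` values listed earlier. The 4096 x 4096 layer shape is an assumption for illustration, and which layers the fine-tuning script actually adapts is not stated here.

```python
def lora_trainable_params(d_in, d_out, r):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    # Matrix A has shape (r, d_in) and matrix B has shape (d_out, r).
    return r * d_in + d_out * r


def lora_scaling(lora_alpha, lora_r):
    """Factor applied to the low-rank update B @ A."""
    return lora_alpha / lora_r


d_in = d_out = 4096  # hypothetical layer shape, for illustration only
lora_r, lora_alpha = 8, 32  # defaults from the hyper-parameter list

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, lora_r)
print(f"full matrix: {full} parameters, LoRA: {lora} parameters")
print(f"trainable fraction: {lora / full:.4%}")
print(f"scaling factor: {lora_scaling(lora_alpha, lora_r)}")
```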

---

### 6. Studio Kernel Dead/Creating JumpStart Model from the training Job
---
Due to the size of the Llama 70B model, the training job may take several hours, and the Studio kernel may die during the training phase. However, the training job keeps running in SageMaker during this time. If this happens, you can still deploy the endpoint using the training job name with the following code.

To find the training job name, go to Console -> SageMaker -> Training -> Training Jobs, identify the training job name, and substitute it in the following cell. 

---

In [None]:
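from sagemaker.jumpstart.estimator import JumpStartEstimator

training_job_name = "<<Replace this with Training Job Name>>"
# After a kernel restart model_id is no longer defined, so set it again.
# It must match the model ID that was used for the training job.
model_id = "meta-textgeneration-llama-3-8b"

# Re-attach to the training job, stream its logs, and deploy the fine-tuned model.
attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
attached_estimator.logs()
attached_estimator.deploy()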

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/introduction_to_amazon_algorithms|jumpstart-foundation-models|llama-3-finetuning.ipynb)