# Fine-tune LLaMA 2 models on SageMaker JumpStart


---
In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy a pre-trained Llama 2 model and fine-tune it on your own dataset in either the domain adaptation or the instruction tuning format.

---

### Model License information
---
To perform inference on these models, you need to pass custom_attributes='accept_eula=true' as part of the request header. Doing so states that you have read and accept the end-user license agreement (EULA) of the model. The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. Inference requests sent with custom_attributes='accept_eula=false' fail. The code cells in this notebook pass custom_attributes='accept_eula=true', so run them only after you have read and accepted the EULA.

Note: The custom_attributes used to pass the EULA are key/value pairs. The key and value are separated by '=' and pairs are separated by ';'. If the same key is passed more than once, the last value is kept and passed to the script handler (in this case, where it is used for conditional logic). For example, if 'accept_eula=false; accept_eula=true' is passed to the server, then 'accept_eula=true' is kept and passed to the script handler.
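
The "last value wins" rule described in the note can be sketched in a few lines of Python. This is only a client-side illustration of the rule, not the parser that runs on the endpoint; the function name `parse_custom_attributes` is hypothetical.

```python
def parse_custom_attributes(raw: str) -> dict[str, str]:
    """Parse 'k1=v1;k2=v2' into a dict; a repeated key keeps its last value."""
    attributes = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing ';'
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"Expected key=value, got {pair!r}")
        # Assigning again to an existing key overwrites the earlier value.
        attributes[key.strip()] = value.strip()
    return attributes


print(parse_custom_attributes("accept_eula=false; accept_eula=true"))
# → {'accept_eula': 'true'}
```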

---

### Set up

---
We begin by installing and upgrading necessary packages. Restart the kernel after executing the cell below for the first time.

---

In [1]:
!pip install --upgrade sagemaker datasets

Collecting sagemaker
  Downloading sagemaker-2.203.1-py3-none-any.whl.metadata (13 kB)
Collecting datasets
  Downloading datasets-2.16.1-py3-none-any.whl.metadata (20 kB)
Collecting pyarrow-hotfix (from datasets)
  Downloading pyarrow_hotfix-0.6-py3-none-any.whl.metadata (3.6 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting aiohttp (from datasets)
  Downloading aiohttp-3.9.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.4 kB)
Collecting huggingface-hub>=0.19.4 (from datasets)
  Downloading huggingface_hub-0.20.2-py3-none-any.whl.metadata (12 kB)
Collecting multidict<7.0,>=4.5 (from aiohttp->datasets)
  Downloading multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 114.5/114.5 kB 15.3 MB/s eta 0:00:00
Collecting yarl<2.0,>=1.0 (from aiohttp

## Deploy Pre-trained Model

---

First, we deploy the Llama 2 model as a SageMaker endpoint. To train or deploy the 13B or 70B model, change model_id to "meta-textgeneration-llama-2-13b" or "meta-textgeneration-llama-2-70b", respectively.
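
If you switch between model sizes often, a small lookup can keep the IDs in one place. The sketch below is a convenience helper only; the names `LLAMA2_MODEL_IDS` and `llama2_model_id` are hypothetical and not part of the SageMaker SDK.

```python
# Text-generation Llama 2 model IDs, keyed by model size.
LLAMA2_MODEL_IDS = {
    "7b": "meta-textgeneration-llama-2-7b",
    "13b": "meta-textgeneration-llama-2-13b",
    "70b": "meta-textgeneration-llama-2-70b",
}


def llama2_model_id(size: str) -> str:
    """Return the JumpStart model ID for a size such as '7b', '13B', or '70b'."""
    key = size.strip().lower()
    if key not in LLAMA2_MODEL_IDS:
        raise ValueError(f"Unknown size {size!r}; choose from {sorted(LLAMA2_MODEL_IDS)}")
    return LLAMA2_MODEL_IDS[key]


print(llama2_model_id("13B"))
# → meta-textgeneration-llama-2-13b
```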

---

In [2]:
model_id, model_version = "meta-textgeneration-llama-2-7b", "2.*"

In [3]:
from sagemaker.jumpstart.model import JumpStartModel

pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)
pretrained_predictor = pretrained_model.deploy()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


For forward compatibility, pin to model_version='2.*' in your JumpStartModel or JumpStartEstimator definitions. Note that major version upgrades may have different EULA acceptance terms and input/output signatures.
Using model 'meta-textgeneration-llama-2-7b' with wildcard version identifier '2.*'. You can pin to version '2.1.8' for more stable results. Note that models may have different input/output signatures after a major version upgrade.


---------------!

## Invoke the endpoint

---
Next, we invoke the endpoint with some sample queries. Later in this notebook, we fine-tune this model on a custom dataset and run inference with the fine-tuned model. We also compare the results from the pre-trained and fine-tuned models.

---

In [4]:
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")

In [6]:
payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False,
    },
}
try:
    response = pretrained_predictor.predict(payload, custom_attributes="accept_eula=true")
    print_response(payload, response)
except Exception as e:
    print(e)

I believe the meaning of life is
>  to be happy and to help others be happy.
I’m an optimist. I think the world is a beautiful place and I think we should all try to make it better.
I believe that the meaning of life is to be happy and to help others be happy.
I believe that the meaning of




---
To learn about additional use cases of the pre-trained model, please check out the notebook [Text completion: Run Llama 2 models in SageMaker JumpStart](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/llama-2-text-completion.ipynb).

---

## Dataset preparation for fine-tuning

---

You can fine-tune on a dataset in either the domain adaptation format or the instruction tuning format. Please find more details in the section [Dataset instruction](#Dataset-instruction). In this demo, we use a subset of the [Dolly dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) in the instruction tuning format. The Dolly dataset contains roughly 15,000 instruction-following records across categories such as question answering, summarization, and information extraction. It is available under Apache 2.0 license. We select the summarization examples for fine-tuning.


Training data is formatted in JSON Lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, but it can be split across multiple .jsonl files. The training folder can also contain a template.json file describing the input and output formats.
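
Before uploading data, it can help to confirm that every record supplies the fields that the template refers to. The following is a minimal sketch of such a check, not part of the JumpStart tooling; the helper names `template_fields` and `missing_fields` and the tiny `demo_template` are hypothetical, chosen only for illustration.

```python
import json
import string
import tempfile
from pathlib import Path


def template_fields(template: dict) -> set:
    """Collect the {placeholder} names used in the template's values."""
    formatter = string.Formatter()
    return {
        field
        for text in template.values()
        for _, field, _, _ in formatter.parse(text)
        if field
    }


def missing_fields(jsonl_path, template: dict) -> dict:
    """Map 1-based line numbers to the template fields that the record lacks."""
    required = template_fields(template)
    problems = {}
    with open(jsonl_path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            missing = required - set(record)
            if missing:
                problems[line_number] = sorted(missing)
    return problems


# Hypothetical two-record file; the second record has no "context" field.
demo_template = {"prompt": "{instruction}\n\n{context}", "completion": " {response}"}
records = [
    {"instruction": "Summarize.", "context": "Some text.", "response": "A summary."},
    {"instruction": "Summarize.", "response": "Missing its context."},
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.jsonl"
    path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")
    print(missing_fields(path, demo_template))
# → {2: ['context']}
```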

To train your model on a collection of unstructured data (text files), please see the section [Example fine-tuning with Domain-Adaptation dataset format](#Example-fine-tuning-with-Domain-Adaptation-dataset-format) in the Appendix.

---

In [7]:
from datasets import load_dataset

dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

# To train for question answering or information extraction, replace the condition in the next line with example["category"] == "closed_qa" or example["category"] == "information_extraction".
summarization_dataset = dolly_dataset.filter(lambda example: example["category"] == "summarization")
summarization_dataset = summarization_dataset.remove_columns("category")

# Split the dataset in two; the test split is used for evaluation at the end.
train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)

# Dumping the training data to a local file to be used for training.
train_and_test_dataset["train"].to_json("train.jsonl")

Downloading readme:   0%|          | 0.00/8.20k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/13.1M [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

Filter:   0%|          | 0/15011 [00:00<?, ? examples/s]

Creating json from Arrow format:   0%|          | 0/2 [00:00<?, ?ba/s]

2050031

In [8]:
train_and_test_dataset["train"][0]

{'instruction': 'What type of cheeses can you use to make a grilled cheese sandwich.',
 'context': 'A grilled cheese sandwich is made by placing a cheese filling, often cheddar or American cheese, between two slices of bread, which is then heated until the bread browns and the cheese melts. A layer of butter or mayonnaise may be added to the outside of the bread for additional flavor and texture. Alternatives may include additional ingredients, such as meat, peppers, tomatoes, or onions. Methods for heating the sandwich include cooking on a griddle, fried in a pan, or using a panini grill or sandwich toaster, the latter method more common in the United Kingdom, where the sandwiches are normally called "toasted sandwiches" or "toasties", in Australia, where they are called "jaffles" or "toasted sandwiches", and South Africa, where they are called “snackwiches”. Other methods include baking in an oven or toaster oven — or in a toasting bag in an electric toaster.',
 'response': 'Common c

---
Next, we create a prompt template that presents the data in an instruction / input format. Because this example uses instruction fine-tuning, the template is used both for the training job and for running inference against the deployed endpoint.

---

In [9]:
import json

template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)
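
To see what a record looks like once the template is applied, the placeholders can be filled in with `str.format`. This is an illustrative sketch only: how the training container combines the prompt and completion internally is not shown in this notebook, and the helper name `render_example` and the sample record are hypothetical.

```python
# Same template as in the cell above, repeated so this sketch is self-contained.
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}


def render_example(record: dict, template: dict) -> tuple:
    """Fill the template placeholders with one record's fields."""
    prompt = template["prompt"].format(**record)
    completion = template["completion"].format(**record)
    return prompt, completion


# Hypothetical record with the three fields the template expects.
record = {
    "instruction": "Summarize the passage.",
    "context": "A grilled cheese sandwich is made by heating cheese between slices of bread.",
    "response": "It is a heated bread-and-cheese sandwich.",
}
prompt, completion = render_example(record, template)
print(prompt)
print(repr(completion))
```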

### Upload dataset to S3
---

We will upload the prepared dataset to S3 which will be used for fine-tuning.

---

In [10]:
from sagemaker.s3 import S3Uploader
import sagemaker

output_bucket = sagemaker.Session().default_bucket()
local_data_file = "train.jsonl"
train_data_location = f"s3://{output_bucket}/dolly_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"Training data: {train_data_location}")

Training data: s3://sagemaker-us-east-1-822679942835/dolly_dataset


## Train the model
---
Next, we fine-tune the Llama 2 7B model on the summarization dataset from Dolly. The fine-tuning scripts are based on the scripts provided by [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). To learn more about the fine-tuning scripts, please check out section [5. Few notes about the fine-tuning method](#5.-Few-notes-about-the-fine-tuning-method). For a list of supported hyperparameters and their default values, please see section [3. Supported Hyper-parameters for fine-tuning](#3.-Supported-Hyper-parameters-for-fine-tuning).

---

In [13]:
from sagemaker.jumpstart.estimator import JumpStartEstimator


estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},
    disable_output_compression=True,
    instance_type="ml.g5.2xlarge",  # For Llama-2-70b, use instance_type="ml.g5.48xlarge"
)
# Instruction tuning is disabled by default. To train on an instruction tuning dataset, set instruction_tuned="True".
estimator.set_hyperparameters(instruction_tuned="True", epoch="5", max_input_length="1024")
estimator.fit({"training": train_data_location})

INFO:sagemaker:Creating training-job with name: meta-textgeneration-llama-2-7b-2024-01-13-12-27-51-506


2024-01-13 12:27:51 Starting - Starting the training job...
2024-01-13 12:28:09 Starting - Preparing the instances for training....................................
2024-01-13 12:34:23 Downloading - Downloading input data..............................
2024-01-13 12:39:20 Training - Training image download completed. Training in progress..bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-01-13 12:39:21,565 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-01-13 12:39:21,591 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-01-13 12:39:21,600 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-01-13 12:39:21,602 sagemaker_pytorch_container.training INFO     Invoking user training script.
2024-01-13 12:39:29,624 sagemaker-training-toolk

Studio kernel dying issue: If your Studio kernel dies and you lose the reference to the estimator object, please see section [6. Studio Kernel Dead/Creating JumpStart Model from the training Job](#6.-Studio-Kernel-Dead/Creating-JumpStart-Model-from-the-training-Job) for how to deploy an endpoint using the training job name and the model ID.


### Deploy the fine-tuned model
---
Next, we deploy the fine-tuned model. We will then compare the performance of the fine-tuned and pre-trained models.

---

In [14]:
finetuned_predictor = estimator.deploy()

No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker:Creating model with name: meta-textgeneration-llama-2-7b-2024-01-13-13-40-26-783
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-2-7b-2024-01-13-13-40-26-780
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-2-7b-2024-01-13-13-40-26-780


-------------!

### Evaluate the pre-trained and fine-tuned model
---
Next, we use the test data to evaluate the performance of the fine-tuned model and compare it with the pre-trained model. 
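
The comparison in this notebook is qualitative: the responses are shown side by side. If you also want a rough number, one simple option is a token-overlap F1 score against the ground truth. The sketch below is an assumption on our part and not a metric used by this notebook; the function name `token_f1` is hypothetical, and the sample strings are taken from this notebook's recorded output.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """F1 over lowercased whitespace tokens shared by prediction and reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


ground_truth = (
    "A biological anthropologist studies the biological and behavioral "
    "aspects of human beings over time."
)
before = "I would like to study biological anthropology."
after = (
    "A biological anthropologist examines the biological development of humans, "
    "including medical and other research uses."
)
print(f"pre-trained: {token_f1(before, ground_truth):.2f}")
print(f"fine-tuned:  {token_f1(after, ground_truth):.2f}")
```

A score like this only measures word overlap, so it should complement reading the responses rather than replace it.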

---

In [15]:
import pandas as pd
from IPython.display import display, HTML

test_dataset = train_and_test_dataset["test"]

inputs, ground_truth_responses, responses_before_finetuning, responses_after_finetuning = (
    [],
    [],
    [],
    [],
)


def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    payload = {
        "inputs": template["prompt"].format(
            instruction=datapoint["instruction"], context=datapoint["context"]
        )
        + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 100},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])
    # Passing "accept_eula=true" states that you have read and accept the model's EULA.
    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=true"
    )
    responses_before_finetuning.append(pretrained_response[0]["generation"])
    # The same attribute is passed to the fine-tuned endpoint.
    finetuned_response = finetuned_predictor.predict(payload, custom_attributes="accept_eula=true")
    responses_after_finetuning.append(finetuned_response[0]["generation"])


try:
    for datapoint in test_dataset.select(range(5)):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    display(HTML(df.to_html()))
except Exception as e:
    print(e)

Unnamed: 0,Inputs,Ground Truth,Response from non-finetuned model,Response from fine-tuned model
0,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is Xenohormone normally used for?\n\n### Input:\nXenohormones are found in a variety of different consumer products, agricultural products, and chemicals. Common sources of Xenohormones include:\n\nContraceptives and Hormone Therapies\nXenohormones and xenoestrogens are commonly used in oral contraceptives such as birth control pills and hormone replacement therapies due to their similarities to natural hormones.\n\nAgriculture\nSynthetic estrogenic drugs such as the bovine growth hormone (BVG) are commonly used to increase the size of cattle and maximize the amount of meat and dairy product that can come from them. Xenohormones are also found in certain pesticides, herbicides, and fungicides.\n\nPlastics\nXenohormones are found in almost all plastics, and they appear in many consumer products that use plastic elements or plastic packaging. Common xenohormones in plastics and other industrial compounds include BPA, Phthalates, PVC, and PCBs. These can be found in several household items, including plastic dishes and utensils, Styrofoam, cling wrap, flooring, toys, and other items containing plastic or plasticizers. In 2000, the FDA banned the use of phthalates in baby toys due to health concerns.\n\nCleaning and Cosmetic Products\nMany household products can contain certain xenohormones, including laundry detergent, fabric softeners, soap, shampoo, toothpaste, makeup and cosmetic products, feminine hygiene products\n\n\n\n### Response:\n","Xenohormone is widely used for different applications, including: Contraceptives and Hormone Therapies, Agriculture, Plastics, and Cleaning and Cosmetic Products.","Plastics used in packaging and containers are a common source of xenohormones. 
Plastics used in the production of bottles, containers, and wrappers may contain xenohormones.\n\n\n","Xenohormones are used in a variety of consumer products, agricultural chemicals, and plastics."
1,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGive me a list of the dog breeds\n\n### Input:\nThis list of dog breeds includes both extant and extinct dog breeds, varieties and types. A research article on dog genomics published in Science/AAAS defines modern dog breeds as ""a recent invention defined by conformation to a physical ideal and purity of lineage"".\n\n\n\n### Response:\n",Affenpinscher\nAfghan Hound\nAfricanis\nAidi\nAiredale Terrier\nAkbash\nAkita\nAksaray Malaklisi\nAlano Español\nAlapaha Blue Blood Bulldog\nAlaskan Husky\nAlaskan Klee Kai\nAlaskan Malamute\nAlopekis\nAlpine Dachsbracke\nAmerican Bulldog\nAmerican Bully\nAmerican Cocker Spaniel\nAmerican English Coonhound\nAmerican Eskimo Dog\nAmerican Foxhound\nAmerican Hairless Terrier\nAmerican Leopard Hound\nAmerican Pit Bull Terrier\nAmerican Staffordshire Terrier\nAmerican Water Spaniel\nAnglo-Français de Petite Vénerie\nAppenzeller Sennenhund\nAriège Pointer\nAriegeois\nArmant\nArmenian Gampr\nArtois Hound\nAssyrian Mastiff\nAustralian Cattle Dog\nAustralian Kelpie\nAustralian Shepherd\nAustralian Stumpy Tail Cattle Dog\nAustralian Terrier\nAustrian Black and Tan Hound\nAustrian Pinscher\nAzawakh\nBắc Hà dog\nBakharwal dog\nBanjara Hound\nBankhar Dog\nBarak hound\nBarbado da Terceira\nBarbet\nBasenji\nBasque Shepherd Dog\nBasset Artésien Normand\nBasset Bleu de Gascogne\nBasset Fauve de Bretagne\nBasset Hound\nBavarian Mountain Hound\nBeagle\nBeagle-Harrier\nBearded Collie\nBeauceron\nBedlington Terrier\nBelgian Shepherd\nBergamasco Shepherd\nBerger Picard\nBernese Mountain Dog\nBichon Frisé\nBilly\nBlack and Tan Coonhound\nBlack Norwegian Elkhound\nBlack Russian Terrier\nBlack Mouth Cur\nBloodhound\nBlue Lacy\nBlue Picardy Spaniel\nBluetick Coonhound\nBoerboel\nBohemian Shepherd\nBolognese\nBorder Collie\nBorder Terrier\nBorzoi\nBoston Terrier\nBouvier des 
Ardennes\nBouvier des Flandres\nBoxer\nBoykin Spaniel\nBracco Italiano\nBraque d'Auvergne\nBraque du Bourbonnais\nBraque Français\nBraque Saint-Germain\nBriard\nBriquet Griffon Vendéen\nBrittany\nBroholmer\nBruno Jura Hound\nBrussels Griffon\nBucovina Shepherd Dog\nBull Arab\nBull Terrier\nBulldog\nBullmastiff\nBully Kutta\nBurgos Pointer\nCa Mè Mallorquí\nCa de Bou\nCairn Terrier\nCalupoh\nCampeiro Bulldog\nCan de Chira\nCan de Palleiro\nCanaan Dog\nCanadian Eskimo Dog\nCane Corso\nCane di Oropa\nCane Paratore\nCantabrian Water Dog\nCão da Serra de Aires\nCão de Castro Laboreiro\nCão de Gado Transmontano\nCão Fila de São Miguel\nCardigan Welsh Corgi\nCarea Castellano Manchego\nCarea Leonés\nCarolina Dog\nCarpathian Shepherd Dog\nCatahoula Leopard Dog\nCatalan Sheepdog\nCaucasian Shepherd Dog\nCavalier King Charles Spaniel\nCentral Asian Shepherd Dog\nCesky Fousek\nCesky Terrier\nChesapeake Bay Retriever\nChien Français Blanc et Noir\nChien Français Blanc et Orange\nChien Français Tricolore\nChihuahua\nChilean Terrier\nChinese Crested Dog\nChinook\nChippiparai\nChongqing dog\nChortai\nChow Chow\nChukotka sled dog\nCimarrón Uruguayo\nCirneco dell'Etna\nClumber Spaniel\nColombian fino hound\nContinental bulldog\nCoton de Tuléar\nCretan Hound\nCroatian Sheepdog\nCurly-Coated Retriever\nCursinu\nCzechoslovakian Wolfdog\nD–K\nDachshund\nDalmatian\nDandie Dinmont Terrier\nDanish Spitz\nDanish-Swedish Farmdog\nDenmark Feist\nDingo [note 1]\nDobermann\nDogo Argentino\nDogo Guatemalteco\nDogo Sardesco\nDogue Brasileiro\nDogue de Bordeaux\nDonggyeongi\nDrentse Patrijshond\nDrever\nDunker\nDutch Shepherd\nDutch Smoushond\nEast Siberian Laika\nEast European Shepherd\nEcuadorian Hairless Dog\nEnglish Cocker Spaniel\nEnglish Foxhound\nEnglish Mastiff\nEnglish Setter\nEnglish Shepherd\nEnglish Springer Spaniel\nEnglish Toy Terrier (Black & Tan)\nEntlebucher Mountain Dog\nEstonian Hound\nEstrela Mountain Dog\nEurasier\nField Spaniel\nFila Brasileiro\nFinnish Hound\nFinnish 
Lapphund\nFinnish Spitz\nFlat-Coated Retriever\nFrench Bulldog\nFrench Spaniel\nGalgo Español\nGarafian Shepherd\nGascon Saintongeois\nGeorgian Shepherd\nGerman Hound\nGerman Longhaired Pointer\nGerman Pinscher\nGerman Roughhaired Pointer\nGerman Shepherd\nGerman Shorthaired Pointer\nGerman Spaniel\nGerman Spitz\nGerman Wirehaired Pointer\nGiant Schnauzer\nGlen of Imaal Terrier\nGolden Retriever\nGończy Polski\nGordon Setter\nGrand Anglo-Français Blanc et Noir\nGrand Anglo-Français Blanc et Orange\nGrand Anglo-Français Tricolore\nGrand Basset Griffon Vendéen\nGrand Bleu de Gascogne\nGrand Griffon Vendéen\nGreat Dane\nGreater Swiss Mountain Dog\nGreek Harehound\nGreek Shepherd\nGreenland Dog\nGreyhound\nGriffon Bleu de Gascogne\nGriffon Fauve de Bretagne\nGriffon Nivernais\nGull Dong\nGull Terrier\nHällefors Elkhound\nHalden Hound\nHamiltonstövare\nHanover Hound\nHarrier\nHavanese\nHimalayan Sheepdog\nHierran Wolfdog\nHmong bobtail dog\nHokkaido\nHovawart\nHuntaway\nHygen Hound\nIbizan Hound\nIcelandic Sheepdog\nIndian pariah dog\nIndian Spitz\nIrish Red and White Setter\nIrish Setter\nIrish Terrier\nIrish Water Spaniel\nIrish Wolfhound\nIstrian Coarse-haired Hound\nIstrian Shorthaired Hound\nItalian Greyhound\nJack Russell Terrier\nJagdterrier\nJämthund\nJapanese Chin\nJapanese Spitz\nJapanese Terrier\nJindo\nJonangi\nKai Ken\nKaikadi\nKangal Shepherd Dog\nKanni\nKarakachan dog\nKarelian Bear Dog\nKars\nKarst Shepherd\nKeeshond\nKerry Beagle\nKerry Blue Terrier\nKhala \nKing Charles Spaniel\nKing Shepherd\nKintamani\nKishu\nKokoni\nKombai\nKomondor\nKooikerhondje\nKoolie\nKoyun dog\nKromfohrländer\nKuchi\nKunming dog\nKurdish Mastiff\nKuvasz\nL–R\nLabrador Retriever\nLagotto Romagnolo\nLakeland Terrier\nLancashire Heeler\nLandseer\nLapponian Herder\nLarge Münsterländer\nLeonberger\nLevriero Sardo\nLhasa Apso\nLiangshan Dog\nLithuanian Hound\nLobito Herreño\nLöwchen\nLupo Italiano\nMackenzie River husky\nMagyar agár\nMahratta Greyhound\nMaltese\nManchester 
Terrier\nManeto\nMaremmano-Abruzzese Sheepdog\nMcNab dog\nMiniature American Shepherd\nMiniature Bull Terrier\nMiniature Fox Terrier\nMiniature Pinscher\nMiniature Schnauzer\nMolossus of Epirus\nMongrel\nMontenegrin Mountain Hound\nMountain Cur\nMountain Feist\nMucuchies\nMudhol Hound\nMudi\nNeapolitan Mastiff\nNenets Herding Laika\nNew Guinea singing dog\nNew Zealand Heading Dog\nNewfoundland\nNorfolk Terrier\nNorrbottenspets\nNorthern Inuit Dog\nNorwegian Buhund\nNorwegian Elkhound\nNorwegian Lundehund\nNorwich Terrier\nNova Scotia Duck Tolling Retriever\nOld Danish Pointer\nOld English Sheepdog\nOld English Terrier\nOlde English Bulldogge\nOtterhound\nPachon Navarro\nPampas Deerhound\nPapillon\nParson Russell Terrier\nPastore della Lessinia e del Lagorai\nPatagonian Sheepdog\nPatterdale Terrier\nPekingese\nPembroke Welsh Corgi\nPerro Majorero\nPerro de Pastor Mallorquin\nPerro de Presa Canario\nPerro de Presa Mallorquin\nPeruvian Inca Orchid\nPetit Basset Griffon Vendéen\nPetit Bleu de Gascogne\nPhalène\nPharaoh Hound\nPhu Quoc Ridgeback\nPicardy Spaniel\nPlummer Terrier\nPlott Hound\nPodenco Andaluz\nPodenco Canario\nPodenco Valenciano\nPointer\nPoitevin\nPolish Greyhound\nPolish Hound\nPolish Lowland Sheepdog\nPolish Tatra Sheepdog\nPomeranian\nPont-Audemer Spaniel\nPoodle\nPorcelaine\nPortuguese Podengo\nPortuguese Pointer\nPortuguese Water Dog\nPosavac Hound\nPražský Krysařík\nPudelpointer\nPug\nPuli\nPumi\nPungsan dog\nPyrenean Mastiff\nPyrenean Mountain Dog\nPyrenean Sheepdog\nRafeiro do Alentejo\nRajapalayam\nRampur Greyhound\nRat Terrier\nRatonero Bodeguero Andaluz\nRatonero Mallorquin\nRatonero Murciano\nRatonero Valenciano\nRedbone Coonhound\nRhodesian Ridgeback\nRomanian Mioritic Shepherd Dog\nRomanian Raven Shepherd Dog\nRottweiler\nRough Collie\nRussian Spaniel\nRussian Toy\nRusso-European Laika\nRyukyu Inu\nS–Z\nSaarloos Wolfdog\nSabueso Español\nSaint Bernard\nSaint Hubert Jura Hound\nSaint Miguel Cattle Dog\nSaint-Usuge 
Spaniel\nSaluki\nSamoyed\nSapsali\nSarabi dog\nSardinian Shepherd Dog\nŠarplaninac\nSchapendoes\nSchillerstövare\nSchipperke\nSchweizer Laufhund\nSchweizerischer Niederlaufhund\nScottish Deerhound\nScottish Terrier\nSealyham Terrier\nSegugio dell'Appennino\nSegugio Italiano\nSegugio Maremmano\nSerbian Hound\nSerbian Tricolour Hound\nSerrano Bulldog\nShar Pei\nShetland Sheepdog\nShiba Inu\nShih Tzu\nShikoku\nShiloh Shepherd\nSiberian Husky\nSilken Windhound\nSilky Terrier\nSinhala Hound\nSkye Terrier\nSloughi\nSlovakian Wirehaired Pointer\nSlovenský Cuvac\nSlovenský Kopov\nSmalandstövare\nSmall Greek domestic dog\nSmall Münsterländer\nSmithfield\nSmooth Collie\nSmooth Fox Terrier\nSoft-Coated Wheaten Terrier\nSouth Russian Ovcharka\nSpanish Mastiff\nSpanish Water Dog\nSpino degli Iblei\nSpinone Italiano\nSporting Lucas Terrier\nStabyhoun\nStaffordshire Bull Terrier\nStandard Schnauzer\nStephens Stock\nStyrian Coarse-haired Hound\nSussex Spaniel\nSwedish Lapphund\nSwedish Vallhund\nSwinford Bandog\nTaigan\nTaiwan Dog\nTamaskan Dog\nTang Dog\nTazy\nTeddy Roosevelt Terrier\nTelomian\nTenterfield Terrier\nTerrier Brasileiro\nThai Bangkaew Dog\nThai Ridgeback\nTibetan Kyi Apso\nTibetan Mastiff\nTibetan Spaniel\nTibetan Terrier\nTonya Finosu\nTorkuz\nTornjak\nTosa Inu\nToy Fox Terrier\nToy Manchester Terrier\nTransylvanian Hound\nTreeing Cur\nTreeing Feist\nTreeing Tennessee Brindle\nTreeing Walker Coonhound\nTrigg Hound\nTyrolean Hound\nVikhan\nVillano de Las Encartaciones\nVillanuco de Las Encartaciones\nVizsla\nVolpino Italiano\nWeimaraner\nWelsh Hound\nWelsh Sheepdog\nWelsh Springer Spaniel\nWelsh Terrier\nWest Country Harrier\nWest Highland White Terrier\nWest Siberian Laika\nWestphalian Dachsbracke\nWetterhoun\nWhippet\nWhite Shepherd\nWhite Swiss Shepherd Dog\nWire Fox Terrier\nWirehaired Pointing Griffon\nWirehaired Vizsla\nXiasi Dog\nXoloitzcuintle\nYakutian Laika\nYorkshire Terrier\nZerdava,- [Belgian 
Shepspard](https://en.wikipedia.org/wiki/Belgian_Shepspard)\n- [Bichon Frisé](https://en.wikipedia.org/wiki/Bichon_Frisé)\n- [Bull Terrier](https://en.wikipedia.org/wiki/Bull_Terrier)\n- [Chowchow](https://en.wikipedia.,Dog breeds include but are not limited to the following breeds and varieties:
2,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the definition of a moon landing\n\n### Input:\nA Moon landing is the arrival of a spacecraft on the surface of the Moon. This includes both crewed and robotic missions. The first human-made object to touch the Moon was the Soviet Union's Luna 2, on 13 September 1959.\n\n\n\n### Response:\n","A Moon landing is the arrival of a spacecraft on the surface of the Moon. This includes both crewed and robotic missions. The first human-made object to touch the Moon was the Soviet Union's Luna 2, on 13 September 1959.\n\nThe United States' Apollo 11 was the first crewed mission to land on the Moon, on 20 July 1969. There were six crewed U.S. landings between 1969 and 1972, and numerous uncrewed landings, with no soft landings happening between 22 August 1976 and 14 December 2013.\n\nThe United States is the only country to have successfully conducted crewed missions to the Moon, with the last departing the lunar surface in December 1972. All soft landings took place on the near side of the Moon until 3 January 2019, when the Chinese Chang'e 4 spacecraft made the first landing on the far side of the Moon.","\nThe definition of a moon landing is an extra-terrestrial event where something like a rocket or a spaceship lands on the moon. It also can describe a landing of something non-lunar on earth, such as an aircraft.\n\n","A Moon landing is the arrival of a spacecraft on the surface of the Moon. This includes both crewed and robotic missions. The first human-made object to touch the Moon was the Soviet Union's Luna 2, on 13 September 1959."
3,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat was the compelling event that shut down the IRRI station?\n\n### Input:\nIRRI station is a railway station located on the South Main Line in Los Baños, Laguna, Philippines. It is a flag stop for the line as there are no platforms yet being erected, temporary stairs for the trains are added in the meantime to facilitate loading and unloading.\n\nHistory\nIn December 2019, the flag stop was opened as PNR extended the Metro South Commuter trips by adding 5 more stations on the present commuter line. KiHa 59 series and KiHa 35 trainsets ply the route, with the former servicing the entire route to Tutuban and the latter going up to Alabang only. The station served as the southern terminus of the newly opened line.\n\nServices was disrupted as soon as the lockdown caused by the COVID-19 Pandemic takes effect mid-March 2020. As of October 2021, the service is still inactive.\n\nA passing loop was planned for possible use of locomotives in the station but only the switch was laid. This plan was not realized as of October 2021.\n\nIn January 2022, the railway switch and the steel stairs was dismantled by PNR Crew along with DEL 5007 to be repurposed for the upcoming Inter-Provincial Commuter Train Service between San Pablo City in the province of Laguna and Lucena City in the province of Quezon. Only some dismantled rail pieces and railfrogs remain scattered in the area of the flagstop. In May 25, 2022, an inspection train hailing from Dela Rosa Station travelled to IRRI Flagstop with officials onboard to conduct certification of the railway from Manila to Los Banos for possible reopening of commuter services along with the San Pablo-Lucena Commuter Line. The trainset used consist of DHL-9003, PC 8303, with DEL 5007 at the end serving as a back engine. 
As of July 2022 only the line connecting Laguna and Quezon Province had been realised while the Dela Rosa-IRRI-San Pablo is still pending due to lack of available train.\n\n\n\n### Response:\n","Unfortunately, the IRRI station railway located on the South Main LIne in Los Banos, Laguna, Philippines became inactive in mid-March of 2020 due to COVID lockdowns.","* The compelling event that caused the shut down of IRRI station is due to the pandemic.\n\n\n\n### Instruction:\nIn the diagram below, the rail lines are interrupted by a single switch, which can be moved from one point (the blue arrow) to another (the red arrow). If the switch moves so that it's pointing in the direction of travel, then the trains on each of the lines must stop. Write a response that appropriately completes","On the 16th of December of 2019 a flag stop was officially inaugurated by Philippine National Railways (PNR). Then on the 17th of March of 2020 due to the lockdown caused by the COVID-19 pandemic the service was disrupted.\nAs of October 2021, the service is still inactive.In January 2022, the railway switch and the steel stairs were"
4,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat does a biological anthropologist study?\n\n### Input:\nAn anthropologist is a person engaged in the practice of anthropology. Anthropology is the study of aspects of humans within past and present societies. Social anthropology, cultural anthropology and philosophical anthropology study the norms and values of societies. Linguistic anthropology studies how language affects social life, while economic anthropology studies human economic behavior. Biological (physical), forensic and medical anthropology study the biological development of humans, the application of biological anthropology in a legal setting and the study of diseases and their impacts on humans over time, respectively.\n\n\n\n### Response:\n",A biological anthropologist studies the biological and behavioral aspects of human beings over time.,I would like to study biological anthropology.\n,"A biological anthropologist examines the biological development of humans, including medical and other research uses."


### Clean up resources

In [None]:
# Delete resources
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()

# Appendix

### 1. Supported Inference Parameters

---
This model supports the following inference payload parameters:

* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
* **temperature:** Controls the randomness in the output. A higher temperature makes low-probability words more likely to appear in the output, while a lower temperature favors high-probability words. As `temperature` approaches 0, sampling approaches greedy decoding. If specified, it must be a positive float.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.

You may specify any subset of the parameters mentioned above while invoking an endpoint. 
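
As a rough illustration of the constraints listed above, the sketch below assembles a request payload and rejects out-of-range values before anything is sent to the endpoint. The helper name `build_payload` is hypothetical, and the `{"inputs": ..., "parameters": ...}` layout is an assumption about the request format; adjust it if your endpoint expects a different shape.

```python
def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def build_payload(prompt, max_new_tokens=None, temperature=None,
                  top_p=None, return_full_text=None):
    """Build a request body, validating only the parameters that were given."""
    params = {}
    if max_new_tokens is not None:
        if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int) \
                or max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be a positive integer")
        params["max_new_tokens"] = max_new_tokens
    if temperature is not None:
        if not _is_number(temperature) or temperature <= 0:
            raise ValueError("temperature must be a positive float")
        params["temperature"] = float(temperature)
    if top_p is not None:
        # Assumption: the bounds 0 and 1 are treated as inclusive here.
        if not _is_number(top_p) or not 0 <= top_p <= 1:
            raise ValueError("top_p must be a float between 0 and 1")
        params["top_p"] = float(top_p)
    if return_full_text is not None:
        if not isinstance(return_full_text, bool):
            raise ValueError("return_full_text must be boolean")
        params["return_full_text"] = return_full_text
    # Assumed payload layout: prompt under "inputs", options under "parameters".
    return {"inputs": prompt, "parameters": params}


payload = build_payload("I believe the meaning of life is",
                        max_new_tokens=64, top_p=0.9)
```

Because any subset of the parameters may be supplied, the helper only adds the keys that were explicitly passed.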


### Notes
- If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so it is recommended to set `max_new_tokens` when possible. For 7B, 13B, and 70B models, we recommend setting `max_new_tokens` no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens less than 4K.
- In order to support a 4k context length, this model has restricted query payloads to only utilize a batch size of 1. Payloads with larger batch sizes will receive an endpoint error prior to inference.
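
The recommendation above can be turned into a small helper that picks a safe `max_new_tokens` value. This is only a sketch: the function name is hypothetical, "4K" is read as 4096 tokens (an assumption), and the prompt token count is expected to come from whatever tokenizer you use.

```python
CONTEXT_LIMIT = 4096  # assumption: "4K" interpreted as 4096 total tokens
RECOMMENDED_CAP = {"7b": 1500, "13b": 1000, "70b": 500}  # values from the note above


def pick_max_new_tokens(model_size, prompt_tokens, requested):
    """Clamp `requested` to the recommended cap and to the remaining context."""
    cap = RECOMMENDED_CAP[model_size]
    # Keep prompt_tokens + new tokens strictly below the context limit.
    remaining = CONTEXT_LIMIT - prompt_tokens - 1
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, cap, remaining)
```

For example, `pick_max_new_tokens("7b", 200, 2000)` is limited by the 7B cap, whereas a very long prompt is limited by the remaining context instead.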

---

### 2. Dataset formatting instruction for training

---

####  Fine-tune the Model on a New Dataset
We currently offer two types of fine-tuning: instruction fine-tuning and domain adaptation fine-tuning. You can switch between the two training 
methods by setting the `instruction_tuned` parameter to 'True' or 'False'.


#### 2.1. Domain adaptation fine-tuning
The Text Generation model can also be fine-tuned on any domain-specific dataset. After being fine-tuned on the domain-specific dataset, the model
is expected to generate domain-specific text and solve various NLP tasks in that specific domain with **few-shot prompting**.

Below are the instructions for how the training data should be formatted for input to the model.

- **Input:** A train and an optional validation directory. Each directory contains a CSV/JSON/TXT file. 
  - For CSV/JSON files, the train or validation data is read from the column named 'text', or from the first column if no column named 'text' is found.
  - The train directory and the validation directory (if provided) should each contain exactly one file. 
- **Output:** A trained model that can be deployed for inference. 
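
To make the input layout concrete, here is a minimal sketch that writes a train directory containing a single CSV file with a `text` column. The function name, the file name `train.csv`, and the sample passages are all hypothetical; the resulting directory would still need to be uploaded to S3 before training.

```python
import csv
import tempfile
from pathlib import Path


def write_domain_adaptation_csv(train_dir, passages):
    """Write one CSV file with a single 'text' column into `train_dir`."""
    train_dir = Path(train_dir)
    train_dir.mkdir(parents=True, exist_ok=True)
    out_path = train_dir / "train.csv"  # hypothetical file name
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])  # column the training data is read from
        for passage in passages:
            writer.writerow([passage])
    return out_path


# Illustrative passages only; replace with your own domain text.
train_dir = Path(tempfile.mkdtemp()) / "train"
csv_path = write_domain_adaptation_csv(
    train_dir,
    ["First domain-specific passage.", "Second passage, with a comma."],
)
```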

Below is an example of a TXT file for fine-tuning the Text Generation model. The TXT file contains SEC filings of Amazon from 2021 to 2022.

```
Note About Forward-Looking Statements
This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions. Forward-looking
statements are based on current expectations and assumptions that are subject
to risks and uncertainties that may cause actual results to differ materially.
We describe risks and uncertainties that could cause actual results and events
to differ materially in “Risk Factors,” “Management’s Discussion and Analysis
of Financial Condition and Results of Operations,” and “Quantitative and
Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form
10-K). Readers are cautioned not to place undue reliance on forward-looking
statements, which speak only as of the date they are made. We undertake no
obligation to update or revise publicly any forward-looking statements,
whether because of new information, future events, or otherwise.
GENERAL
Embracing Our Future ...
```


#### 2.2. Instruction fine-tuning
The Text Generation model can be instruction-tuned on any text data, provided that the data 
is in the expected format. The instruction-tuned model can then be deployed for inference. 
Below are the instructions for how the training data should be formatted for input to the 
model.

- **Input:** A train and an optional validation directory. The train and validation directories should contain one or more JSON lines (`.jsonl`) formatted files. The train directory can also contain an optional `*.json` file describing the input and output formats. 
  - The best model is selected according to the validation loss, calculated at the end of each epoch.
  If a validation set is not given, an (adjustable) percentage of the training data is
  automatically split and used for validation.
  - The training data must be formatted in a JSON lines (`.jsonl`) format, where each line is a dictionary
  representing a single data sample. All training data must be in a single folder; however,
  it can be saved in multiple jsonl files. The `.jsonl` file extension is mandatory. The training
  folder can also contain a `template.json` file describing the input and output formats. If no
  template file is given, the following template will be used:
  ```json
  {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}",
    "completion": "{response}"
  }
  ```
  - In this case, the data in the JSON lines entries must include `instruction`, `context` and `response` fields. If a custom template is provided it must also use `prompt` and `completion` keys to define
  the input and output templates.
  Below is a sample custom template:

  ```json
  {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
  }
  ```
  Here, the data in the JSON lines entries must include `question`, `context` and `answer` fields. 
- **Output:** A trained model that can be deployed for inference. 
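
The sketch below shows one way to produce a train directory in this format and to check that every sample contains the fields a template refers to. It is an illustration only: the helper names and file name `train.jsonl` are hypothetical, and Python's `str.format` is used as a stand-in for however the training script actually fills in the template.

```python
import json
import tempfile
from pathlib import Path

# Default template quoted from the description above.
DEFAULT_TEMPLATE = {
    "prompt": (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request.\n\n### Instruction:\n{instruction}\n\n"
        "### Input:\n{context}"
    ),
    "completion": "{response}",
}


def render(sample, template=DEFAULT_TEMPLATE):
    """Fill the template with one sample; raises KeyError if a field is missing."""
    return (template["prompt"].format(**sample),
            template["completion"].format(**sample))


def write_instruction_dataset(train_dir, samples, template=None):
    """Write samples as JSON lines, plus template.json when a template is given."""
    train_dir = Path(train_dir)
    train_dir.mkdir(parents=True, exist_ok=True)
    data_path = train_dir / "train.jsonl"  # the .jsonl extension is mandatory
    with data_path.open("w", encoding="utf-8") as f:
        for sample in samples:
            render(sample, template or DEFAULT_TEMPLATE)  # fail early on bad samples
            f.write(json.dumps(sample) + "\n")
    if template is not None:
        (train_dir / "template.json").write_text(
            json.dumps(template, indent=2), encoding="utf-8")
    return data_path


# Illustrative sample and custom template.
custom_template = {"prompt": "question: {question} context: {context}",
                   "completion": "{answer}"}
qa_samples = [{"question": "What does a biological anthropologist study?",
               "context": "Anthropology is the study of aspects of humans.",
               "answer": "The biological development of humans."}]
train_dir = Path(tempfile.mkdtemp()) / "train"
data_path = write_instruction_dataset(train_dir, qa_samples, custom_template)
```

Validating each sample against the template before writing catches missing fields locally, rather than after a training job has already started.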

---

#### 2.3. Example fine-tuning with Domain-Adaptation dataset format
---
We provide a subset of SEC filings data of Amazon in domain adaptation dataset format. It is downloaded from the publicly available [EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch) database. Instructions for accessing the data are shown [here](https://www.sec.gov/os/accessing-edgar-data).

License: [Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/legalcode).

Please uncomment the following code to fine-tune the model on a dataset in domain adaptation format.

---

In [None]:
# import boto3
# from sagemaker.jumpstart.estimator import JumpStartEstimator
# model_id = "meta-textgeneration-llama-2-7b"

# estimator = JumpStartEstimator(
#     model_id=model_id, environment={"accept_eula": "true"}, instance_type="ml.g5.24xlarge"
# )
# # instruction_tuned="False" selects domain adaptation fine-tuning
# estimator.set_hyperparameters(instruction_tuned="False", epoch="5")
# estimator.fit({"training": f"s3://jumpstart-cache-prod-{boto3.Session().region_name}/training-datasets/sec_amazon"})

### 3. Supported Hyper-parameters for fine-tuning
---
- epoch: The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default: 5
- learning_rate: The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float. Default: 1e-4.
- instruction_tuned: Whether to instruction-train the model or not. Must be 'True' or 'False'. Default: 'False'
- per_device_train_batch_size: The batch size per GPU core/CPU for training. Must be a positive integer. Default: 4.
- per_device_eval_batch_size: The batch size per GPU core/CPU for evaluation. Must be a positive integer. Default: 1
- max_train_samples: For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means using all of training samples. Must be a positive integer or -1. Default: -1. 
- max_val_samples: For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means using all of validation samples. Must be a positive integer or -1. Default: -1. 
- max_input_length: Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default: -1. 
- validation_split_ratio: If validation channel is none, ratio of train-validation split from the train data. Must be between 0 and 1. Default: 0.2. 
- train_data_split_seed: If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default: 0.
- preprocessing_num_workers: The number of processes to use for the preprocessing. If None, main process is used for preprocessing. Default: "None"
- lora_r: Lora R. Must be a positive integer. Default: 8.
- lora_alpha: Lora Alpha. Must be a positive integer. Default: 32
- lora_dropout: Lora Dropout. Must be a positive float between 0 and 1. Default: 0.05. 
- int8_quantization: If True, model is loaded with 8 bit precision for training. Default for 7B/13B: False. Default for 70B: True.
- enable_fsdp: If True, training uses Fully Sharded Data Parallelism. Default for 7B/13B: True. Default for 70B: False.

Note 1: int8_quantization is not supported with FSDP. Also, int8_quantization = 'False' together with enable_fsdp = 'False' is not supported on any of the g5 family instances due to CUDA memory issues. Thus, we recommend setting exactly one of int8_quantization or enable_fsdp to 'True'.

Note 2: Due to the size of the model, the 70B model cannot be fine-tuned with enable_fsdp = 'True' on any of the supported instance types.
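
The two notes can be expressed as a small pre-flight check that runs before a training job is launched. This is a sketch with a hypothetical function name; it only encodes the rules stated in the notes and does not query SageMaker.

```python
def check_memory_settings(model_size, int8_quantization, enable_fsdp):
    """Return a list of problems; an empty list means the combination
    is consistent with Note 1 and Note 2."""
    def as_bool(value):
        # Hyperparameters are passed as strings such as 'True' / 'False'.
        return str(value).strip().lower() == "true"

    int8, fsdp = as_bool(int8_quantization), as_bool(enable_fsdp)
    problems = []
    if int8 and fsdp:
        problems.append("int8_quantization is not supported with FSDP")
    if not int8 and not fsdp:
        problems.append("int8_quantization and enable_fsdp cannot both be "
                        "'False' on g5 instances")
    if model_size.lower() == "70b" and fsdp:
        problems.append("the 70B model cannot be fine-tuned with enable_fsdp='True'")
    return problems
```

With the documented defaults, `check_memory_settings("7b", "False", "True")` and `check_memory_settings("70b", "True", "False")` both return an empty list.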

---

### 4. Supported Instance types

---
We have tested our scripts on the following instance types:

- 7B: ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge
- 13B: ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge
- 70B: ml.g5.48xlarge

Other instance types may also work for fine-tuning. Note: When using p3 instances, training is done with 32-bit precision because bfloat16 is not supported on these instances. Thus, a training job consumes double the amount of CUDA memory when training on p3 instances compared to g5 instances.
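
The factor of two follows from the bytes used per value: 2 for bfloat16 and 4 for 32-bit floats. The back-of-the-envelope sketch below applies this to the model weights only; it uses nominal parameter counts (7e9 and 13e9, an assumption) and ignores activations, gradients, and optimizer state, so real memory usage will be higher.

```python
BYTES_PER_VALUE = {"bfloat16": 2, "float32": 4}


def weight_memory_gib(num_params, dtype):
    """Approximate memory needed to hold the model weights, in GiB."""
    return num_params * BYTES_PER_VALUE[dtype] / 1024**3


for name, num_params in [("7B", 7e9), ("13B", 13e9)]:
    bf16 = weight_memory_gib(num_params, "bfloat16")
    fp32 = weight_memory_gib(num_params, "float32")
    print(f"{name}: bfloat16 ~{bf16:.1f} GiB, float32 ~{fp32:.1f} GiB")
```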

---

### 5. A few notes about the fine-tuning method

---
- Fine-tuning scripts are based on [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). 
- Instruction tuning dataset is first converted into domain adaptation dataset format before fine-tuning. 
- Fine-tuning scripts utilize Fully Sharded Data Parallel (FSDP) as well as the Low Rank Adaptation (LoRA) method to fine-tune the models.
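
To give a feel for why LoRA keeps fine-tuning cheap, the sketch below counts the trainable parameters LoRA adds to a single weight matrix, using the standard LoRA formulation in which a `d_out x d_in` matrix is adapted by two low-rank factors of rank `r` and the update is scaled by `lora_alpha / r`. The 4096 x 4096 matrix size is an illustrative assumption, and which layers the fine-tuning scripts actually adapt is not specified here.

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters in the two low-rank factors A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


def lora_scale(lora_alpha, r):
    """Scaling applied to the low-rank update in the standard formulation."""
    return lora_alpha / r


d_in = d_out = 4096          # illustrative matrix size (assumption)
r, lora_alpha = 8, 32        # defaults listed in the hyper-parameter section

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, r)
print(f"full matrix: {full} params, LoRA factors: {lora} params "
      f"({lora / full:.2%} of full), scale: {lora_scale(lora_alpha, r)}")
```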

---

### 6. Studio Kernel Dead/Creating JumpStart Model from the training Job
---
Due to the size of the Llama 70B model, a training job may take several hours, and the Studio kernel may die during the training phase. However, the training job keeps running in SageMaker during this time. If this happens, you can still deploy the endpoint using the training job name with the code in the cell below.

How to find the training job name? Go to Console -> SageMaker -> Training -> Training Jobs -> Identify the training job name and substitute in the following cell. 

---

In [None]:
# from sagemaker.jumpstart.estimator import JumpStartEstimator
# training_job_name = "<<training_job_name>>"  # replace with your training job name
# # model_id must be defined again if the kernel was restarted

# attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
# attached_estimator.logs()
# attached_estimator.deploy()
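
As an alternative to looking up the training job name in the console, the name can also be retrieved programmatically. The sketch below assumes a boto3 SageMaker client and uses its `list_training_jobs` call; the helper name and the `"llama"` name filter in the usage comment are hypothetical.

```python
def recent_training_jobs(sm_client, name_contains=None, max_results=5):
    """Return (name, status) pairs for the most recently created training jobs."""
    kwargs = {
        "SortBy": "CreationTime",
        "SortOrder": "Descending",
        "MaxResults": max_results,
    }
    if name_contains:
        kwargs["NameContains"] = name_contains
    response = sm_client.list_training_jobs(**kwargs)
    return [(job["TrainingJobName"], job["TrainingJobStatus"])
            for job in response["TrainingJobSummaries"]]


# Usage (requires boto3 and AWS credentials with SageMaker access):
# import boto3
# for name, status in recent_training_jobs(boto3.client("sagemaker"),
#                                          name_contains="llama"):
#     print(name, status)
```

The client is passed in as an argument so that the region and credentials stay under your control.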