# Fine-tune LLaMA 2 models on SageMaker JumpStart

In [3]:
!pip install --upgrade sagemaker datasets

Collecting sagemaker
  Downloading sagemaker-2.187.0.tar.gz (886 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m886.2/886.2 kB[0m [31m5.6 MB/s[0m eta [36m0:00:00[0m00:01[0m
[?25h  Preparing metadata (setup.py) ... [?25ldone
[?25hCollecting datasets
  Obtaining dependency information for datasets from https://files.pythonhosted.org/packages/09/7e/fd4d6441a541dba61d0acb3c1fd5df53214c2e9033854e837a99dd9e0793/datasets-2.14.5-py3-none-any.whl.metadata
  Downloading datasets-2.14.5-py3-none-any.whl.metadata (19 kB)
Collecting xxhash (from datasets)
  Obtaining dependency information for xxhash from https://files.pythonhosted.org/packages/13/c3/e942893f4864a424514c81640f114980cfd5aff7e7414d1e0255f4571111/xxhash-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Downloading xxhash-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting fsspec[http]<2023.9.0,>=2023.1.0 (from datasets)
  Obtaining dependen

In [4]:
from datasets import load_dataset


dataset = load_dataset("m3hrdadfi/recipe_nlg_lite")
newset = dataset.map(lambda x: {'instruction': 'Generate a recipe for the dish '+x['name']})
newset = newset.map(lambda x: {'context': x['description']})
newset = newset.map(lambda x: {'response': 'Ingredients :\n '+ x['ingredients']+'\nSteps:\n'+x['steps']})
newset = newset.select_columns(['instruction','context','response'])

Downloading builder script:   0%|          | 0.00/3.46k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/1.88k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/5.39k [00:00<?, ?B/s]

Repo card metadata block was not found. Setting CardData to empty.


Downloading data:   0%|          | 0.00/6.71M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/6118 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1080 [00:00<?, ? examples/s]

Map:   0%|          | 0/6118 [00:00<?, ? examples/s]

Map:   0%|          | 0/1080 [00:00<?, ? examples/s]

Map:   0%|          | 0/6118 [00:00<?, ? examples/s]

Map:   0%|          | 0/1080 [00:00<?, ? examples/s]

Map:   0%|          | 0/6118 [00:00<?, ? examples/s]

Map:   0%|          | 0/1080 [00:00<?, ? examples/s]

In [24]:
smallset = newset['train'].shard(num_shards=2, index=0)

In [25]:
len(smallset['instruction'])

3059

In [26]:
smallset.to_json('train.jsonl')

Creating json from Arrow format:   0%|          | 0/4 [00:00<?, ?ba/s]

4218922

In [29]:
smallset["response"][200]

"Ingredients :\n 2 oz dried mushrooms, 4 cups water, 2 cups green or brown lentils, , 6 cups low sodium vegetable broth, 2 tbsp olive oil, 1 whole large onion, diced, 2 tbsp tomato paste, 2 tbsp tamari sauce, 1 1/2 tsp crushed garlic, 28 oz crushed fire roasted tomatoes, 1/3 cup fresh chopped parsley, , 1 tsp smoked paprika, 3/4 tsp salt, or more to taste, 1 tsp basil, 1 tsp oregano, 1/2 tsp turmeric, 1/2 tsp thyme, 1/2 tsp black pepper, 1/4 tsp cayenne pepper, cooked pasta for serving\nSteps:\nboil mushrooms in 4 cups water for 5 minutes . let them soak in the hot water for a few minutes while you cook the lentils, until the liquid is dark brown and has cooled off a bit.meanwhile, bring lentils to a boil in vegetable broth . reduce to a simmer, then cook for about 15 minutes or until just tender . drain . remove the soaked mushrooms with a slotted spoon and rinse them under cool running water to remove any excess residue . chop up the soaked mushrooms, then rinse them once again there

In [9]:
import json

template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)

---
Next, we create a prompt template for using the data in an instruction / input format for the training job (since we are instruction fine-tuning the model in this example), and also for inferencing the deployed endpoint.

---

### Upload dataset to S3
---

We will upload the prepared dataset to S3 which will be used for fine-tuning.

---

In [30]:
from sagemaker.s3 import S3Uploader
import sagemaker
import random

output_bucket = sagemaker.Session().default_bucket()
local_data_file = "train.jsonl"
train_data_location = f"s3://{output_bucket}/recipe_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"Training data: {train_data_location}")

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
Training data: s3://sagemaker-us-east-1-879412751908/recipe_dataset


In [31]:
model_id, model_version = "meta-textgeneration-llama-2-7b", "*"

## Train the model
---
Next, we fine-tune the LLaMA v2 7B model on the summarization dataset from Dolly. Finetuning scripts are based on scripts provided by [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). To learn more about the fine-tuning scripts, please checkout section [5. Few notes about the fine-tuning method](#5.-Few-notes-about-the-fine-tuning-method). For a list of supported hyper-parameters and their default values, please see section [3. Supported Hyper-parameters for fine-tuning](#3.-Supported-Hyper-parameters-for-fine-tuning).

---

In [33]:
from sagemaker.jumpstart.estimator import JumpStartEstimator


estimator = JumpStartEstimator(
    model_id=model_id,
    environment={"accept_eula": "true"},
    disable_output_compression=True,  # For Llama-2-70b, add instance_type = "ml.g5.48xlarge"
    instance_type = "ml.g5.2xlarge"
)
# By default, instruction tuning is set to false. Thus, to use instruction tuning dataset you use
estimator.set_hyperparameters(instruction_tuned="True", epoch="3", max_input_length="1024")
estimator.fit({"training": train_data_location})

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: meta-textgeneration-llama-2-7b-2023-09-24-03-45-50-812


2023-09-24 03:45:50 Starting - Starting the training job...
2023-09-24 03:46:06 Starting - Preparing the instances for training.........
2023-09-24 03:47:25 Downloading - Downloading input data............
2023-09-24 03:49:26 Training - Downloading the training image..................
2023-09-24 03:52:47 Training - Training image download completed. Training in progress..[34mbash: cannot set terminal process group (-1): Inappropriate ioctl for device[0m
[34mbash: no job control in this shell[0m
[34m2023-09-24 03:52:48,801 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training[0m
[34m2023-09-24 03:52:48,815 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)[0m
[34m2023-09-24 03:52:48,823 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.[0m
[34m2023-09-24 03:52:48,825 sagemaker_pytorch_container.training INFO     Invoking user training script.[0m
[34m2023-09-24 03

Studio Kernel Dying issue:  If your studio kernel dies and you lose reference to the estimator object, please see section [6. Studio Kernel Dead/Creating JumpStart Model from the training Job](#6.-Studio-Kernel-Dead/Creating-JumpStart-Model-from-the-training-Job) on how to deploy endpoint using the training job name and the model id. 


### Deploy the fine-tuned model
---
Next, we deploy fine-tuned model. We will compare the performance of fine-tuned and pre-trained model.

---

In [34]:
finetuned_predictor = estimator.deploy()

INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker.image_uris:Ignoring unnecessary Python version: py39.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: ml.g5.2xlarge.
INFO:sagemaker:Creating model with name: meta-textgeneration-llama-2-7b-2023-09-24-05-11-52-442
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-2-7b-2023-09-24-05-11-52-431
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-2-7b-2023-09-24-05-11-52-431


--------------!

### Evaluate the pre-trained and fine-tuned model
---
Next, we use the test data to evaluate the performance of the fine-tuned model and compare it with the pre-trained model. 

---

In [37]:
import pandas as pd
from IPython.display import display, HTML

test_dataset = newset["test"]

inputs, ground_truth_responses, responses_before_finetuning, responses_after_finetuning = (
    [],
    [],
    [],
    [],
)


def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    payload = {
        "inputs": template["prompt"].format(
            instruction=datapoint["instruction"], context=datapoint["context"]
        )
        + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 1000},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])

    # Please change the following line to "accept_eula=True"
    finetuned_response = finetuned_predictor.predict(payload, custom_attributes="accept_eula=true")
    responses_after_finetuning.append(finetuned_response[0]["generation"])


try:
    for i, datapoint in enumerate(test_dataset.select(range(5))):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    display(HTML(df.to_html()))
except Exception as e:
    print(e)

Unnamed: 0,Inputs,Ground Truth,Response from fine-tuned model
0,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGenerate a recipe for the dish spaghetti met kerstomaten en basilicum\n\n### Input:\nspaghetti met kerstomaten en basilicum with spaghetti, kipfilet, kerstomaten, basilicum, margarine, knorr kruidenpasta spaghetti bolognese\n\n\n\n### Response:\n","Ingredients :\n 300.0 gram spaghetti, 300.0 gram kipfilet, 1.0 container kerstomaten, 0.5 bunch basilicum, 25.0 gram margarine, 1.0 container knorr kruidenpasta spaghetti bolognese\nSteps:\nkook de spaghetti beetgaar volgens de aanwijzingen op de verpakking . snijd de kipfilet in reepjes en halveer de kerstomaten . snijd de basilicum in reepjes . verhit de margarine en bak de kipreepjes goudbruin . bak de gehalveerde kerstomaten even mee . voeg 400 ml water, de kruidenpasta en de basilicum toe . breng de saus al roerend aan de kook en laat alles 2 3 minuten zachtjes doorkoken . serveer de spaghetti met de saus.","Ingredients :\n 300.0 gram spaghetti, 2.0 kipfilet, 2.0 kerstomaten, 6.0 bunch basilicum, 1.0 tablespoon margarine, 1.0 bouillonknorr kruidenpasta spaghetti bolognese\nSteps:\nkook de spaghetti volgens de bereidingswijze op de verpakking. verwarm de margarine in een hapje en bak hierin de kipfilets en de tomaten. bak ze in ca. 15 20 min. gaar en gaar. hak de basilicum fijn en snijd de spaghetti in dikke ringen. schep de spaghetti uiteen op vier eetvouwsallades, bak de pasta in ca. 6 8 min. gaar, bestrooi er dan de kipfilets, de kerstomaten en de basilicum doorheen en strooi ruim tussendoor wat peper en zout. giet ondertussen de pasta bolognese erruild door en eet heerlijke spaghetti met kerstomaten.\n\n"
1,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGenerate a recipe for the dish honey herb roasted chicken\n\n### Input:\nhoney herb roasted chicken juicy roast chicken with herb honey and white wine sauce . kosher, meat, rosh hashanah, passover, gluten free.\n\n\n\n### Response:\n","Ingredients :\n 3 1/2 4 lbs whole chicken without giblets, 1 small handful of fresh rosemary sprigs, 1 small handful of fresh thyme sprigs, peel from one small lemon, sliced, 2 garlic cloves, 2 large onions, peeled and sliced, salt and pepper, 1/4 cup olive oil, 3 tbsp honey, divided, 1/2 cup white wine, 1/4 cup chicken broth, fresh rosemary and thyme sprigs for garnish optional\nSteps:\npreheat oven to 400 degrees . whisk together 1/4 cup of olive oil, 1 tbsp of honey, and 1 tbsp fresh lemon juice . this is your basting mixture . assemble your chicken, herbs, lemon peel, garlic, and a few small slices of onion . season the cavity of the chicken with salt and pepper and brush some of the basting mixture inside the chicken go light on the salt if you're using a kosher chicken, which will already be salted . stuff the cavity with half of the fresh rosemary and thyme sprigs, sliced lemon peel, garlic cloves, and a few pieces of the sliced onion . don't overstuff the cavity pack it loosely with room to breathe . truss the chicken click here for instructions . season the chicken with salt and pepper . line the bottom of your roasting pan with foil . take remaining sliced onions and place them in an even layer on the bottom of your roasting pan . drizzle 2 tbsp of olive oil over the onions . remove leaves from the rest of the rosemary and thyme sprigs, then put the leaves into the roasting pan and discard stems . use a large spoon to toss and coat the onion slices with herbs and olive oil . place the chicken breast side down onto the bed of onions . pour half of the basting mixture over the chicken, using a brush to coat it evenly . cover the roasting pan with foil nonstick foil is best . pierce a few small vents into the outer edges of the foil with a knife . place covered roasting pan into the preheated oven and cook for 45 minutes . take roasting pan out of the oven, remove the foil reserve, and flip the chicken to breast side up . brush off any onions or herbs that cling to the top of the chicken . brush the rest of the basting mixture evenly onto the top of the chicken . cover the roasting pan again with the vented foil . place back into the oven for 45 minutes longer.remove foil from the top of the roasting pan and reduce oven heat to 375 degrees . let the chicken continue to roast another 20 30 minutes until the skin is brown and crisp . the skin may get quite dark in places this is from the honey in the basting liquid . watch the chicken carefully, as it can go from brown to black very quickly . blackened parts of the skin will not affect the flavor in a negative way . chicken is done when the skin is nicely browned, and the internal temperature reaches at least 170 degrees f as measured on a food thermometer at the thickest part of the thigh.take the chicken out of the roasting pan and place it on a carving board . cover it with foil to keep the heat in and let it rest for 15 minutes before carving . meanwhile, use a mesh strainer to strain the pan drippings from the roasting pan into a small saucepan . let the drippings settle for a moment you will see brown bits settle to the bottom of the pan and lighter fat and oil rise to the top . skim about half of the fat/oil from the top of the liquid . add 1/2 cup of white wine, 1/4 cup of chicken broth, and 2 tbsp of honey to the pan . whisk to blend . heat the sauce over medium to a light boil and let it simmer for about 2 minutes, whisking constantly.remove the honey sauce from heat and season with salt and pepper to taste . carve the chicken . drizzled each serving of meat with some of the honey sauce . garnish with fresh thyme or rosemary, if desired, for a pretty and aromatic presentation.","Ingredients :\n 4 lbs boneless chicken thighs, 2 tbsp butter, 1 tbsp olive oil, 1 whole garlic clove, 2 tsp salt, 1/4 tsp black pepper, 1/4 tsp ground coriander, 1/4 tsp dried thyme, 1/2 tsp poultry seasoning, 1 tbsp honey, 2 tsbsp white wine, 1 tbsp lemon juice, 1 tbsp fresh tarragon chopped, 4 sprigs fresh rosemary, 1 tbsp parsley, cooked rice or pasta\nSteps:\nin a large bowl, place the entire chicken thighs. in another bowl, add all ingredients. mix together until combined. pour into your bowl and cover. refrigerate for 20 24 hours. at least an hour before cooking, remove the herb mixture from the fridge. discard the marinade. set aside. preheat your oven to 400degf. roast chicken for about 45 60 minutes on 400 degf. depending on the size it will cook longer check halfway through and add an additional time. check internal temperature of the thighs with a meat thermometer for doneness. if you are serving skin on, cook until skin is golden brown. note if you like it slightly more cooked, increase oven to 425degf. take out of the oven and let rest for about 10 15 minutes before slicing. serve over rice, pasta or on its own. the leftovers will keep in the fridge for up to 7 days and in the freezer for up to 3 months.\n\n"
2,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGenerate a recipe for the dish fig and honey cocktail\n\n### Input:\nfig and honey cocktail recipe, perfect for serving at rosh hashanah, sukkot, or just because . includes recipe for fresh fig puree.\n\n\n\n### Response:\n","Ingredients :\n 2 cups fresh figs, rinsed and halved, 1 tbsp sugar, 2 tsp filtered water, 1 tsp freshly squeezed lemon juice, 1 1/2 tbsp fig puree, 1/2 tsp honey, 2 oz ginger ale, 1 1/2 oz vodka, 2 tsp freshly squeezed lemon juice, ice, 1 1/3 cups fig puree, 1/4 cup honey, 28 oz ginger ale, 21 oz vodka, 1/2 cup freshly squeezed lemon juice\nSteps:\ncombine ingredients in a blender and pulse until smooth . transfer to a container and refrigerate for 2 3 days, or use immediately to make cocktails . this recipe makes about 1 cup 8 oz . puree, which will make about 10 cocktails.","Ingredients :\n 2.0 fresh figs, 2.0 honey, 2.0 cup freshly squeezed orange juice, 2.0 cup club soda\nSteps:\nif using a store bought fig purée, skip this step. if not, see here for recipe for fresh fig purée. to make fresh fig purée, combine the fresh figs, 1 1/2 teaspoons honey, orange juice, and water. purée in a blender. in another large bowl, mix the freshly puréed figs with the club soda. place in your pitcher and serve chilled. note if you're only making a pitcher or two, feel free to purée just enough figs to yield the amount of fig purée you need. the recipe above is designed for a large batch. you can also make this cocktail with pre made fig purée. simply substitute a store bought fig purée for the fresh variety\n\n"
3,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGenerate a recipe for the dish spicy breakfast fajitas with eggs and guacamole\n\n### Input:\nspicy breakfast fajitas with eggs and guacamole is my not just delicious for breakfast, you can also serve them for lunch or dinner\n\n\n\n### Response:\n","Ingredients :\n quick guacamole, 6 ounces avocado, 1 tablespoon lime juice, kosher salt, veggies, 1 tablespoon extra virgin olive oil, 2 red, 1 /2 medium white onion, 1 teaspoon ground cumin, 1 /2 teaspoon chili powder, pinch of red pepper flakes, kosher salt, 2 cloves garlic, 2 teaspoons fresh lime juice, fajitas and garnishes, 6 corn tortillas, olive oil spray, 6 large eggs, 1 ounce crumbled feta or queso fresco cheese, handful of fresh cilantro, freshly ground black pepper, hot sauce and/or your favorite salsa\nSteps:\nusing a spoon, scoop the flesh of the avocados into a small bowl . add the lime juice and 1/4 teaspoon salt and mash with a pastry cutter, potato masher, or fork until the mixture is blended and no longer chunky . taste and add additional lime juice and/or salt, if necessary.","Ingredients :\n 6.0 eggs, 1.0 tablespoon olive oil, 0.2 teaspoon garlic powder, 1.0 bell pepper, 2.0 ounce olive oil spray, 0.2 teaspoon cumin, 2.0 ounce olive oil spray, 0.1 teaspoon garlic powder, 7.2 ounce cooked beef, 0.2 cup cheddar cheese, 1.0 tablespoon chopped cilantro, 3.0 tablespoon chopped onion, 0.5 cup guacamole\nSteps:\nheat olive oil in a large nonstick skillet over medium heat and cook eggs until cooked to your liking. top with chopped onion and cilantro before serving. spray a second large skillet with olive oil and cook the peppers and chorizo on medium high heat until browned and cooked through. spray the beans with olive oil while cooking to prevent sticking and cook with the other ingredients until heated through. top eggs with red peppers, chorizo and cheese. top with guacamole.\n\n"
4,"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGenerate a recipe for the dish tandoori met tilapia, garnalen en kokosrijst\n\n### Input:\ntandoori met tilapia, garnalen en kokosrijst with tilapiafilet, paprika, paprika, bladselderij, rijst, zout, conimex romige kokosmelk, knorr chicken tonight milde tandoori, zonnebloemolie, garnalen\n\n\n\n### Response:\n","Ingredients :\n 350.0 gram tilapiafilet, 1.0 paprika, 1.0 paprika, 1.0 bunch bladselderij, 400.0 gram rijst, zout, 1.0 packet conimex romige kokosmelk, 1.0 jar knorr chicken tonight milde tandoori, 2.0 tablespoon zonnebloemolie, 125.0 gram garnalen\nSteps:\n1. snij de tilapiafilet in stukken . maak de paprika's schoon en snijd deze in repen . hak de bladselderij fijn . kook de rijst gaar volgens de gebruiksaanwijzing in 0, 5 liter water met zout en de kokosmelk . 2. verhit de olie in een ruime koekepan en bak de tilapiafilet ongeveer 2 minuten op hoog vuur . voeg de paprika toe en roerbak nog 3 minuten . voeg de tandoori saus toe en verwarm het geheel 3 minuten op middelhoog vuur . meng de garnalen door de saus . 3. meng de bladselderij door de kokosrijst en serveer bij de tandoori.","Ingredients :\n 1.0 tilapiafilet, 1.0 paprika, 1.0 paprika, 2.0 tablespoon bladselderij, 1.0 package rijst, zout, 1.0 containers conimex romige kokosmelk, 1.0 bag knorr chicken tonight milde tandoori, zonnebloemolie, 350.0 gram garnalen\nSteps:\nschil de paprika, snijd het dekseltje af en de paprikapoot in boven de paprika in enkele ringen uitlekken. kook de 2 teugen rijst, zout en de kokosmelk goudblauw op. kook de tilapia in 3 4 minute goudblauw gaar. meng de tandoori en de knorr chicken tonight mix goed door elkaar. bestrooi de bodem van de hapjesbowl met zand. snijd de paprika en de bladselderij in reepjes. verdeel de paprika rondom de plaat. rijk afmeten de rijstmengsel. verdeel de tilapia over de rijst. bestrooi vervolgens de paprika en de bladselderij over het vlees. schep tandoori, helpt met het smaken. bak de tilapia 5 a 8 minuten goudblauw, met de koken in zicht. giet de inhoud van de canje af en strooi er in het vlees. serveer de tilapia direct uit de ovenschaal. laat er een vuur in het vuur komen. doe de garnalen in een diepe pan en zet het vuur hoog. kook nog even de garnalen aan gemaakt met 4 5 minuten. verwijder het uiteinde van een knorr kruidentje of cumin en bestrooi de garnalen er met. bak de garnalen nog even op. meng het afgegoten rijsttegel met de helft van de conimex romig kokosmelk, peper en zout, ruikt naar de garnalen en voeg er wat zonnemolentje bij. verdeel de garnalen over de rijstmix en voeg de rest van het kokosmelk er aan toe.\n\n"


In [52]:
from sagemaker.jumpstart.model import JumpStartModel

pretrained_model = JumpStartModel(model_id=model_id)
pretrained_predictor = pretrained_model.deploy()

INFO:sagemaker.jumpstart:No instance type selected for inference hosting endpoint. Defaulting to ml.g5.2xlarge.
INFO:sagemaker.image_uris:Ignoring unnecessary Python version: py39.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: ml.g5.2xlarge.
INFO:sagemaker:Creating model with name: meta-textgeneration-llama-2-7b-2023-09-24-05-48-50-461
INFO:sagemaker:Creating endpoint-config with name meta-textgeneration-llama-2-7b-2023-09-24-05-48-50-555
INFO:sagemaker:Creating endpoint with name meta-textgeneration-llama-2-7b-2023-09-24-05-48-50-555


----------------!

In [59]:
def p2new(recipe):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    payload = {
        "inputs": template["prompt"].format(
            instruction="Below is an instruction that describes a task, "+
            "paired with an input that provides further context. "+
            "Write a response that appropriately completes the request.\n"+
            "### Instruction:\nGenerate a recipe for the dish "+recipe,
            context=""
        )
        + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 1000},
    }
    # Please change the following line to "accept_eula=True"
    finetuned_response = finetuned_predictor.predict(payload, custom_attributes="accept_eula=true")
    
    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=true"
    )
    print("finetuned response -> "+finetuned_response[0]['generation'])
    print("\n==================================\n")
    print("Pretrained response -> "+pretrained_response[0]['generation'])
    return finetuned_response
    

In [None]:

def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")
    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=true"
    )

In [62]:
retval = p2new("Italian veg salad with extra hot spice")

finetuned response ->  Ingredients :
 2 tbsp extra virgin olive oil, 1 tsp balsamic, 1/4 cup white balsamic, 3/4 cup baby spinach, 1/4 cup red pepper strips, 1/4 cup cilantro
Steps:
toss in a large bowl, drizzle dressing and add salt and pepper to taste. garnish with crumbled blue cheese or toasted pine nuts




Pretrained response -> 



### Instruction:

Create a recipe for your favourite dish of the week. You can either write the recipe out at the end of this file, or by following the instructions at [Write Recipes](https://github.com/vant/vant-ant-design-starter/blob/master/recipe/recipe.md).

### Input:


### Response:

```jsx
---
author: 'Ant Design'
cover: ''
description: Write a recipe
instructions:
  - Include steps to follow to make the dish
  - Include one image per step of the dish if it calls for complex steps
  - The recipe should end with a call to action, encouraging your reader to go and try it! Don’t forget to link to your project.
keywords:
  - Recipes
learn:
  - Rec

In [65]:
pretrained_predictor.delete_endpoint()

INFO:sagemaker:Deleting endpoint configuration with name: meta-textgeneration-llama-2-7b-2023-09-24-05-48-50-555
INFO:sagemaker:Deleting endpoint with name: meta-textgeneration-llama-2-7b-2023-09-24-05-48-50-555


In [66]:
finetuned_predictor.delete_endpoint()

INFO:sagemaker:Deleting endpoint configuration with name: meta-textgeneration-llama-2-7b-2023-09-24-05-11-52-431
INFO:sagemaker:Deleting endpoint with name: meta-textgeneration-llama-2-7b-2023-09-24-05-11-52-431


### Clean up resources

In [None]:
# Delete resources
#pretrained_predictor.delete_model()
#pretrained_predictor.delete_endpoint()
#finetuned_predictor.delete_model()
#finetuned_predictor.delete_endpoint()

# Appendix

### 1. Supported Inference Parameters

---
This model supports the following inference payload parameters:

* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.

You may specify any subset of the parameters mentioned above while invoking an endpoint. 


### Notes
- If `max_new_tokens` is not defined, the model may generate up to the maximum total tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so it is recommended to set `max_new_tokens` when possible. For 7B, 13B, and 70B models, we recommend to set `max_new_tokens` no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens less than 4K.
- In order to support a 4k context length, this model has restricted query payloads to only utilize a batch size of 1. Payloads with larger batch sizes will receive an endpoint error prior to inference.

---

### 2. Dataset formatting instruction for training

---

####  Fine-tune the Model on a New Dataset
We currently offer two types of fine-tuning: instruction fine-tuning and domain adaption fine-tuning. You can easily switch to one of the training 
methods by specifying parameter `instruction_tuned` being 'True' or 'False'.


#### 2.1. Domain adaptation fine-tuning
The Text Generation model can also be fine-tuned on any domain specific dataset. After being fine-tuned on the domain specific dataset, the model
is expected to generate domain specific text and solve various NLP tasks in that specific domain with **few shot prompting**.

Below are the instructions for how the training data should be formatted for input to the model.

- **Input:** A train and an optional validation directory. Each directory contains a CSV/JSON/TXT file. 
  - For CSV/JSON files, the train or validation data is used from the column called 'text' or the first column if no column called 'text' is found.
  - The number of files under train and validation (if provided) should equal to one, respectively. 
- **Output:** A trained model that can be deployed for inference. 

Below is an example of a TXT file for fine-tuning the Text Generation model. The TXT file is SEC filings of Amazon from year 2021 to 2022.

```Note About Forward-Looking Statements
This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions. Forward-looking
statements are based on current expectations and assumptions that are subject
to risks and uncertainties that may cause actual results to differ materially.
We describe risks and uncertainties that could cause actual results and events
to differ materially in “Risk Factors,” “Management’s Discussion and Analysis
of Financial Condition and Results of Operations,” and “Quantitative and
Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form
10-K). Readers are cautioned not to place undue reliance on forward-looking
statements, which speak only as of the date they are made. We undertake no
obligation to update or revise publicly any forward-looking statements,
whether because of new information, future events, or otherwise.
GENERAL
Embracing Our Future ...
```


#### 2.2. Instruction fine-tuning
The Text generation model can be instruction-tuned on any text data provided that the data 
is in the expected format. The instruction-tuned model can be further deployed for inference. 
Below are the instructions for how the training data should be formatted for input to the 
model.

Below are the instructions for how the training data should be formatted for input to the model.

- **Input:** A train and an optional validation directory. Train and validation directories should contain one or multiple JSON lines (`.jsonl`) formatted files. In particular, train directory can also contain an optional `*.json` file describing the input and output formats. 
  - The best model is selected according to the validation loss, calculated at the end of each epoch.
  If a validation set is not given, an (adjustable) percentage of the training data is
  automatically split and used for validation.
  - The training data must be formatted in a JSON lines (`.jsonl`) format, where each line is a dictionary
representing a single data sample. All training data must be in a single folder, however
it can be saved in multiple jsonl files. The `.jsonl` file extension is mandatory. The training
folder can also contain a `template.json` file describing the input and output formats. If no
template file is given, the following template will be used:
  ```json
  {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}",
    "completion": "{response}"
  }
  ```
  - In this case, the data in the JSON lines entries must include `instruction`, `context` and `response` fields. If a custom template is provided it must also use `prompt` and `completion` keys to define
  the input and output templates.
  Below is a sample custom template:

  ```json
  {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
  }
  ```
Here, the data in the JSON lines entries must include `question`, `context` and `answer` fields. 
- **Output:** A trained model that can be deployed for inference. 

---

#### 2.3. Example fine-tuning with Domain-Adaptation dataset format
---
We provide a subset of SEC filings data of Amazon in domain adaptation dataset format. It is downloaded from publicly available [EDGAR](https://www.sec.gov/edgar/searchedgar/companysearch). Instruction of accessing the data is shown [here](https://www.sec.gov/os/accessing-edgar-data).

License: [Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/legalcode).

Please uncomment the following code to fine-tune the model on dataset in domain adaptation format.

---

In [None]:
# import boto3
# model_id = "meta-textgeneration-llama-2-7b"

# estimator = JumpStartEstimator(model_id=model_id,  environment={"accept_eula": "true"},instance_type = "ml.g5.24xlarge")
# estimator.set_hyperparameters(instruction_tuned="False", epoch="5")
# estimator.fit({"training": f"s3://jumpstart-cache-prod-{boto3.Session().region_name}/training-datasets/sec_amazon"})

### 3. Supported Hyper-parameters for fine-tuning
---
- epoch: The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default: 5
- learning_rate: The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float greater than 0. Default: 1e-4.
- instruction_tuned: Whether to instruction-train the model or not. Must be 'True' or 'False'. Default: 'False'
- per_device_train_batch_size: The batch size per GPU core/CPU for training. Must be a positive integer. Default: 4.
- per_device_eval_batch_size: The batch size per GPU core/CPU for evaluation. Must be a positive integer. Default: 1
- max_train_samples: For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means using all of training samples. Must be a positive integer or -1. Default: -1. 
- max_val_samples: For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means using all of validation samples. Must be a positive integer or -1. Default: -1. 
- max_input_length: Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default: -1. 
- validation_split_ratio: If validation channel is none, ratio of train-validation split from the train data. Must be between 0 and 1. Default: 0.2. 
- train_data_split_seed: If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default: 0.
- preprocessing_num_workers: The number of processes to use for the preprocessing. If None, main process is used for preprocessing. Default: "None"
- lora_r: Lora R. Must be a positive integer. Default: 8.
- lora_alpha: Lora Alpha. Must be a positive integer. Default: 32
- lora_dropout: Lora Dropout. must be a positive float between 0 and 1. Default: 0.05. 
- int8_quantization: If True, model is loaded with 8 bit precision for training. Default for 7B/13B: False. Default for 70B: True.
- enable_fsdp: If True, training uses Fully Sharded Data Parallelism. Default for 7B/13B: True. Default for 70B: False.

Note 1: int8_quantization is not supported with FSDP. Also, int8_quantization = 'False' and enable_fsdp = 'False' is not supported due to CUDA memory issues for any of the g5 family instances. Thus, we recommend setting exactly one of int8_quantization or enable_fsdp to be 'True'
Note 2: Due to the size of the model, 70B model can not be fine-tuned with enable_fsdp = 'True' for any of the supported instance types.

---

### 4. Supported Instance types

---
We have tested our scripts on the following instances types:

- 7B: ml.g5.12xlarge, nl.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge
- 13B: ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge
- 70B: ml.g5.48xlarge

Other instance types may also work to fine-tune. Note: When using p3 instances, training will be done with 32 bit precision as bfloat16 is not supported on these instances. Thus, training job would consume double the amount of CUDA memory when training on p3 instances compared to g5 instances.

---

### 5. Few notes about the fine-tuning method

---
- Fine-tuning scripts are based on [this repo](https://github.com/facebookresearch/llama-recipes/tree/main). 
- Instruction tuning dataset is first converted into domain adaptation dataset format before fine-tuning. 
- Fine-tuning scripts utilize Fully Sharded Data Parallel (FSDP) as well as Low Rank Adaptation (LoRA) method fine-tuning the models

---

### 6. Studio Kernel Dead/Creating JumpStart Model from the training Job
---
Due to the size of the Llama 70B model, training job may take several hours and the studio kernel may die during the training phase. However, during this time, training is still running in SageMaker. If this happens, you can still deploy the endpoint using the training job name with the following code:

How to find the training job name? Go to Console -> SageMaker -> Training -> Training Jobs -> Identify the training job name and substitute in the following cell. 

---

In [None]:
# from sagemaker.jumpstart.estimator import JumpStartEstimator
# training_job_name = <<training_job_name>>

# attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
# attached_estimator.logs()
# attached_estimator.deploy()