Copyright (c) Meta Platforms, Inc. and affiliates.
This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.

<a href="https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/finetuning/quickstart_peft_finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## PEFT Finetuning Quick Start Notebook

This notebook shows how to train a Meta Llama 3 model on a single GPU (e.g. A10 with 24GB) using int8 quantization and LoRA finetuning.

**_Note:_** To run this notebook on a machine with less than 24GB of VRAM (e.g. a T4 with 16GB), the context length of the training dataset needs to be reduced.
The notebook does this at run time based on the available VRAM.
If you still run into out-of-memory (OOM) errors, lower the value of `train_config.context_length` further.
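
A rough back-of-the-envelope estimate shows why int8 quantization matters on a 24GB card. The sketch below assumes a parameter count of about 8 billion (an approximation for an 8B model) and counts only the weights. Activations, LoRA gradients, and optimizer state come on top, which is why the context length still has to shrink on smaller GPUs.

```python
# Rough weight-memory estimate; ignores activations, gradients and optimizer state.
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


NUM_PARAMS = 8e9  # assumption: approximate parameter count of an 8B model

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # 2 bytes per parameter
int8_gib = weight_memory_gib(NUM_PARAMS, 1)  # 1 byte per parameter

print(f"fp16 weights: {fp16_gib:.1f} GiB")
print(f"int8 weights: {int8_gib:.1f} GiB")
```

Under these assumptions, halving the bytes per parameter halves the weight footprint, leaving more of the card free for training state.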

### Step 0: Install pre-requirements and convert checkpoint

This notebook requires llama-recipes and its dependencies. We also need to log in with `huggingface-cli` and make sure the account has access to the Meta Llama weights.

In [1]:
# uncomment if running from Colab T4
# ! pip install llama-recipes ipywidgets

# import huggingface_hub
# huggingface_hub.login()

In [2]:
# from google.colab import userdata

In [3]:
%pip install python-dotenv

Collecting python-dotenv
  Using cached python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Using cached python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1
Note: you may need to restart the kernel to use updated packages.


In [4]:
import os
from dotenv import load_dotenv

In [5]:
load_dotenv()

True

In [6]:
HF_TOKEN = os.getenv("HF_TOKEN")
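
`os.getenv` returns `None` when the variable is missing, and the login cell would then fail with a less obvious message. A minimal sketch of failing early instead; the helper name `require_token` and the error text are illustrative, not part of any library:

```python
import os


def require_token(env=os.environ, name="HF_TOKEN"):
    """Return the token stored under `name`, or raise with a helpful message."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it before logging in."
        )
    return token


# Example with an explicit mapping instead of the real environment.
# "hf_example" is a placeholder, not a real token.
print(len(require_token({"HF_TOKEN": "hf_example"})))
```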

In [7]:
# Google Colab Huggingface login
# !huggingface-cli login --token {userdata.get('HF_TOKEN')}
!huggingface-cli login --token {HF_TOKEN}

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
The token `NLP` has been saved to /Users/arnav/.cache/huggingface/stored_tokens
Your token has been saved to /Users/arnav/.cache/huggingface/token
Login successful.
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


In [8]:
%pip install -r requirements_peft.txt

Collecting llama-recipes (from -r requirements_peft.txt (line 1))
  Using cached llama_recipes-0.0.4.post1-py3-none-any.whl.metadata (12 kB)
Collecting torch (from -r requirements_peft.txt (line 3))
  Downloading torch-2.2.2-cp311-none-macosx_10_9_x86_64.whl.metadata (25 kB)
Collecting accelerate (from llama-recipes->-r requirements_peft.txt (line 1))
  Using cached accelerate-1.1.1-py3-none-any.whl.metadata (19 kB)
Collecting appdirs (from llama-recipes->-r requirements_peft.txt (line 1))
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting bitsandbytes (from llama-recipes->-r requirements_peft.txt (line 1))
  Using cached bitsandbytes-0.42.0-py3-none-any.whl.metadata (9.9 kB)
Collecting black (from llama-recipes->-r requirements_peft.txt (line 1))
  Downloading black-24.10.0-cp311-cp311-macosx_10_9_x86_64.whl.metadata (79 kB)
Collecting codeshield (from llama-recipes->-r requirements_peft.txt (line 1))
  Using cached codeshield-1.0.1-py3-none-any.whl.metadata

### Step 1: Load the model

Set up the training configuration and load the model and tokenizer.

In [None]:
import torch
from transformers import LlamaForCausalLM, AutoTokenizer
from llama_recipes.configs import train_config as TRAIN_CONFIG

train_config = TRAIN_CONFIG()
train_config.model_name = "meta-llama/Meta-Llama-3.1-8B"
train_config.num_epochs = 1
train_config.run_validation = False
train_config.gradient_accumulation_steps = 4
train_config.batch_size_training = 1
train_config.lr = 3e-4
train_config.use_fast_kernels = True
train_config.use_fp16 = True
train_config.context_length = 1024 if torch.cuda.is_available() and torch.cuda.get_device_properties(0).total_memory < 16e9 else 2048 # T4 16GB or A10 24GB
train_config.batching_strategy = "packing"
train_config.output_dir = "llama-8b-peft-nofuneval"
train_config.use_peft = True

from transformers import BitsAndBytesConfig
config = BitsAndBytesConfig(
    load_in_8bit=True,
)

model = LlamaForCausalLM.from_pretrained(
            train_config.model_name,
            device_map="auto",
            quantization_config=config,
            use_cache=False,
            attn_implementation="sdpa" if train_config.use_fast_kernels else None,
            torch_dtype=torch.float16,
        )

tokenizer = AutoTokenizer.from_pretrained(train_config.model_name)
tokenizer.pad_token = tokenizer.eos_token

OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3.1-8B.
403 Client Error. (Request ID: Root=1-6740f8cd-011828e33829a21146ad1cbf;5e9700e5-c90f-404f-95ad-22802627afa7)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/resolve/main/config.json.
Your request to access model meta-llama/Llama-3.1-8B has been rejected by the repo's authors.

### Step 2: Check base model

Run the base model on an example input:

In [None]:
eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) 
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.inference_mode():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) 
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a p

We can see that the base model only repeats the conversation.
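
Part of what is printed is simply the prompt itself: `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of slicing off the prompt at the token level, shown here on plain lists with made-up token IDs:

```python
def new_tokens_only(output_ids, prompt_length):
    """Drop the first `prompt_length` IDs, which echo the prompt."""
    return output_ids[prompt_length:]


prompt_ids = [101, 7, 42]             # hypothetical prompt token IDs
output_ids = prompt_ids + [9, 13, 5]  # generate() output: prompt + continuation

print(new_tokens_only(output_ids, len(prompt_ids)))
```

In the notebook, the equivalent would be slicing `model.generate(...)[0]` at `model_input["input_ids"].shape[1]` before decoding, so that only the model's continuation is shown.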

### Step 3: Load the preprocessed dataset

We load and preprocess the samsum dataset, which consists of curated pairs of dialogs and their summaries:

In [None]:
# !pip install --upgrade llama_recipes

Collecting llama_recipes
  Downloading llama_recipes-0.0.4.post1-py3-none-any.whl (1.0 MB)
Collecting sentencepiece
  Downloading sentencepiece-0.2.0-cp39-cp39-macosx_10_9_x86_64.whl (1.2 MB)
Collecting sentence-transformers
  Downloading sentence_transformers-3.3.1-py3-none-any.whl (268 kB)
Collecting openai
  Downloading openai-1.55.0-py3-none-any.whl (389 kB)
Collecting pyyaml==6.0.1
  Downloading PyYAML-6.0.1-cp39-cp39-mac

In [None]:
!python3.11 -m pip install --upgrade pip

Collecting pip
  Using cached pip-24.3.1-py3-none-any.whl.metadata (3.7 kB)
Using cached pip-24.3.1-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.2
    Uninstalling pip-24.2:
      Successfully uninstalled pip-24.2
Successfully installed pip-24.3.1


In [None]:
# %/opt/local/bin/python3.12 -m pip install ipykernel -U --user --force-reinstall


In [None]:
# %pip install faiss-gpu


In [None]:
# !yes | git clone https://github.com/meta-llama/llama-recipes.git
# !cd llama-recipes
# # uninstall llama-recipes and reinstall it
# !pip uninstall -y llama-recipes
# !cd llama-recipes && ls && pip install -U pip setuptools && pip install -e .

CODE_OF_CONDUCT.md   dev_requirements.txt requirements.txt
CONTRIBUTING.md      docs                 src
README.md            pyproject.toml       tools
UPDATES.md           recipes
Obtaining file:///Users/arnav/Documents/Cornell/CS6158/6158_final_project/llama-recipes
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting evaluate (from llama-recipes==0.0.4.post1)
  Using cached evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
INFO: pip is looking at multiple versions of llama-recipes to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement faiss-gpu; python_version < "3.11" (from

In [None]:
# !pip install --upgrade llama_recipes

In [10]:
from datasets import load_dataset, DatasetDict, Dataset
from llama_recipes.utils.dataset_utils import get_dataloader
import json

In [None]:
!curl -X GET \
     "https://datasets-server.huggingface.co/first-rows?dataset=ManavSinghal157%2FNoFunEval&config=default&split=resource_util" > resource_util.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  181k  100  181k    0     0   654k      0 --:--:-- --:--:-- --:--:--  656k


In [None]:
# print(raw_data)

['  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current', '                                 Dload  Upload   Total   Spent    Left  Speed', '', '  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{"dataset":"ManavSinghal157/NoFunEval","config":"default","split":"resource_util","features":[{"feature_idx":0,"name":"non_functional_requirement","type":{"dtype":"string","_type":"Value"}},{"feature_idx":1,"name":"commit","type":{"dtype":"string","_type":"Value"}},{"feature_idx":2,"name":"commit_message","type":{"dtype":"string","_type":"Value"}},{"feature_idx":3,"name":"source_code","type":{"dtype":"string","_type":"Value"}},{"feature_idx":4,"name":"target_code","type":{"dtype":"string","_type":"Value"}},{"feature_idx":5,"name":"pl","type":{"dtype":"string","_type":"Value"}},{"feature_idx":6,"name":"chain_of_thought","type":{"dtype":"string","_type":"Value"}},{"feature_idx":7,"name":"one_shot","type":{"dtype":"string","_type":"Value"}},{"fe

In [None]:
# Load every non-empty line of the JSONL file "resource_util.jsonl" into a list of JSON objects
with open("resource_util.jsonl") as f:
    raw_data = [json.loads(line) for line in f if line.strip()]

print(raw_data[:5])
# Keep only the examples whose non-functional requirement is memory usage
memory_data = [data for data in raw_data if data["non_functional_requirement"] == "memory"]
print(len(memory_data))
print(memory_data[:5])

# Pair each prompt with its target code
memory_input_output = [{"input_text": data["base_prompt"], "output_text": data["target_code"]} for data in memory_data]

print(memory_input_output)


[{'non_functional_requirement': 'memory', 'commit': 'https://github.com/aaronjwood/PortAuthority/commit/b311d3140b4845eb56d20a40c4577f14c05d2404', 'commit_message': '\'\\\\"Avoid potential resource leak\\\\n\\\\"\'', 'source_code': 'package com.aaronjwood.portauthority.network;\n\nimport android.app.Activity;\nimport android.database.Cursor;\n\nimport com.aaronjwood.portauthority.async.ScanPortsAsyncTask;\nimport com.aaronjwood.portauthority.db.Database;\nimport com.aaronjwood.portauthority.response.HostAsyncResponse;\n\npublic class Host {\n\n    /**\n     * Starts a port scan\n     *\n     * @param ip        IP address\n     * @param startPort The port to start scanning at\n     * @param stopPort  The port to stop scanning at\n     * @param delegate  Delegate to be called when the port scan has finished\n     */\n    public void scanPorts(String ip, int startPort, int stopPort, HostAsyncResponse delegate) {\n        new ScanPortsAsyncTask(delegate).execute(ip, startPort, stopPort);\n

In [None]:
# Use a context manager so the file handle is closed after reading
with open("resource_util.json") as f:
    data = json.load(f)

In [None]:
rows = [row["row"] for row in data["rows"]]

In [None]:
# pretty print the first row
print(json.dumps(rows[0], indent=4))

{
    "non_functional_requirement": "memory",
    "commit": "https://github.com/aaronjwood/PortAuthority/commit/b311d3140b4845eb56d20a40c4577f14c05d2404",
    "commit_message": "'\\\\\"Avoid potential resource leak\\\\n\\\\\"'",
    "source_code": "package com.aaronjwood.portauthority.network;\n\nimport android.app.Activity;\nimport android.database.Cursor;\n\nimport com.aaronjwood.portauthority.async.ScanPortsAsyncTask;\nimport com.aaronjwood.portauthority.db.Database;\nimport com.aaronjwood.portauthority.response.HostAsyncResponse;\n\npublic class Host {\n\n    /**\n     * Starts a port scan\n     *\n     * @param ip        IP address\n     * @param startPort The port to start scanning at\n     * @param stopPort  The port to stop scanning at\n     * @param delegate  Delegate to be called when the port scan has finished\n     */\n    public void scanPorts(String ip, int startPort, int stopPort, HostAsyncResponse delegate) {\n        new ScanPortsAsyncTask(delegate).execute(ip, startPo

In [None]:
print(len(rows))

10


In [None]:
# load as an appropriate format where "base_prompt" is the input and "target_code" is the output
dataset = Dataset.from_dict({"input_text": [row['base_prompt'] for row in rows], "output_text": [row['target_code'] for row in rows]})

In [None]:
print(dataset)

Dataset({
    features: ['input_text', 'output_text'],
    num_rows: 10
})
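
`Dataset.from_dict` expects a mapping from column names to equally long lists. The sketch below builds that mapping in plain Python and skips rows that lack either field instead of raising `KeyError`. The helper name and the skipping behavior are illustrative choices, not part of the `datasets` API, and the example rows are toy data.

```python
def rows_to_columns(rows, input_key="base_prompt", output_key="target_code"):
    """Turn a list of row dicts into {"input_text": [...], "output_text": [...]}."""
    columns = {"input_text": [], "output_text": []}
    for row in rows:
        # Only keep rows that provide both the prompt and the target
        if input_key in row and output_key in row:
            columns["input_text"].append(row[input_key])
            columns["output_text"].append(row[output_key])
    return columns


# Toy rows standing in for the downloaded data
example_rows = [
    {"base_prompt": "Optimize this loop", "target_code": "for x in xs: pass"},
    {"base_prompt": "No target here"},  # skipped: missing target_code
]
print(rows_to_columns(example_rows))
```

The resulting dictionary could then be passed to `Dataset.from_dict` in the same way as the inline comprehension in the cell above.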


In [None]:
data = load_dataset("json", data_files="resource_util.json")

NameError: name 'load_dataset' is not defined

In [None]:
%pip install --upgrade llama_recipes

Collecting llama_recipes
  Using cached llama_recipes-0.0.4.post1-py3-none-any.whl.metadata (12 kB)
INFO: pip is looking at multiple versions of llama-recipes to determine which version is compatible with other requirements. This could take a while.
  Using cached llama_recipes-0.0.4-py3-none-any.whl.metadata (12 kB)
Note: you may need to restart the kernel to use updated packages.


In [11]:
from llama_recipes.configs.datasets import samsum_dataset
from llama_recipes.utils.dataset_utils import get_dataloader

samsum_dataset.trust_remote_code = True

train_dataloader = get_dataloader(tokenizer, samsum_dataset, train_config)
eval_dataloader = get_dataloader(tokenizer, samsum_dataset, train_config, "val")

NameError: name 'tokenizer' is not defined

### Step 4: Prepare model for PEFT

Let's prepare the model for Parameter-Efficient Fine-Tuning (PEFT):

In [None]:
from peft import get_peft_model, prepare_model_for_kbit_training, LoraConfig
from dataclasses import asdict
from llama_recipes.configs import lora_config as LORA_CONFIG

lora_config = LORA_CONFIG()
lora_config.r = 8
lora_config.lora_alpha = 32
lora_config.lora_dropout = 0.01

peft_config = LoraConfig(**asdict(lora_config))

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
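
For intuition about what `r` and `lora_alpha` control: LoRA leaves a weight matrix of shape `d_out × d_in` frozen and trains two small matrices of shapes `d_out × r` and `r × d_in`, scaling their product by `lora_alpha / r`. The arithmetic below uses a 4096 × 4096 projection purely as an illustrative size; which layers are actually adapted depends on the LoRA configuration.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Factor applied to the low-rank update before it is added to the frozen weight."""
    return lora_alpha / r


d_in = d_out = 4096    # illustrative size, an assumption for this example
r, lora_alpha = 8, 32  # the values used in this notebook

added = lora_param_count(d_in, d_out, r)
full = d_in * d_out
print(f"LoRA adds {added:,} parameters vs {full:,} in the full matrix")
print(f"scaling factor: {lora_scaling(lora_alpha, r)}")
```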

### Step 5: Fine-tune the model

Here, we fine-tune the model for a single epoch.

In [None]:
import torch.optim as optim
from llama_recipes.utils.train_utils import train
from torch.optim.lr_scheduler import StepLR

model.train()

optimizer = optim.AdamW(
            model.parameters(),
            lr=train_config.lr,
            weight_decay=train_config.weight_decay,
        )
scheduler = StepLR(optimizer, step_size=1, gamma=train_config.gamma)

# Start the training process
results = train(
    model,
    train_dataloader,
    eval_dataloader,
    tokenizer,
    optimizer,
    scheduler,
    train_config.gradient_accumulation_steps,
    train_config,
    None,
    None,
    None,
    wandb_run=None,
)

  scaler = torch.cuda.amp.GradScaler()
Training Epoch: 1:   0%|          | 0/319 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
  with autocast():
  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]
Training Epoch: 1/1, step 1278/1279 completed (loss: 0.28094857931137085): : 320it [2:08:50, 24.16s/it]


Max CUDA memory allocated was 15 GB
Max CUDA memory reserved was 16 GB
Peak active CUDA memory was 15 GB
CUDA Malloc retries : 0
CPU Total Peak Memory consumed during the train (max): 2 GB
Epoch 1: train_perplexity=1.3404, train_epoch_loss=0.2930, epoch time 7730.981359725998s
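
The two metrics in the last log line are consistent with perplexity being the exponential of the mean epoch loss. A quick check using the values copied from the log:

```python
import math

train_epoch_loss = 0.2930     # from the log line above
reported_perplexity = 1.3404  # from the log line above

computed = math.exp(train_epoch_loss)
print(f"exp({train_epoch_loss}) = {computed:.4f}")
print(f"difference from reported value: {abs(computed - reported_perplexity):.5f}")
```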


### Step 6: Save the model checkpoint

In [None]:
model.save_pretrained(train_config.output_dir)
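
Since `model` is a PEFT-wrapped model at this point, `save_pretrained` writes the LoRA adapter (a config file plus the adapter weights) rather than a full copy of the base model. Below is a small sketch for checking what landed in the output directory. The file names are an assumption based on what recent PEFT versions typically write and may differ in your environment.

```python
from pathlib import Path

# Assumed adapter file names; older PEFT versions may write adapter_model.bin instead.
EXPECTED = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(output_dir, expected=EXPECTED):
    """Return the expected adapter files that are not present in output_dir."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).is_file()]


# Directory name taken from train_config.output_dir in this notebook
print(missing_adapter_files("llama-8b-peft-nofuneval"))
```

An empty list means both files were found; anything else names the files that are absent.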

### Step 7: Try the fine-tuned model

Run the fine-tuned model on the same example again to see the learning progress:

In [None]:
model.eval()
with torch.inference_mode():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) 
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
A wants to get a puppy for his son. A took him to the animal shelter last Monday and he showed A one he really liked. A wants to get him one of those little dogs. A and B agre