Copyright (c) Meta Platforms, Inc. and affiliates.
This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.

## Quick Start Notebook

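This notebook shows how to fine-tune a Llama 2 model on a single GPU (e.g. an A10 with 24GB) using quantized weights and LoRA. The text refers to int8 quantization; the loading cell in Step 1 passes `load_in_4bit=True`.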
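To see why quantization matters on a 24GB card, here is a rough back-of-the-envelope sketch of the memory taken by the weights alone. It uses the parameter count that `print_trainable_parameters()` reports later in this notebook and deliberately ignores activations, optimizer state, and any quantization overhead, so treat the numbers as lower bounds rather than measurements.

```python
# Parameter count as reported by print_trainable_parameters() in Step 4
# (it includes the small LoRA adapters).
TOTAL_PARAMS = 6_742_609_920


def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


for name, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{name:>5}: {weight_memory_gib(TOTAL_PARAMS, bits):.2f} GiB")
```

Under these assumptions the fp16 weights already take more than half of a 24GB card, which leaves little room for training state.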

### Step 0: Install prerequisites and convert checkpoint

The example uses the Hugging Face trainer and model, which means that the checkpoint has to be converted from its original format into the dedicated Hugging Face format.
The conversion can be done by running the `convert_llama_weights_to_hf.py` script provided with the `transformers` package.
Given that the original checkpoint resides under `models/7B`, we can install all requirements and convert the checkpoint with the cell below (the conversion commands are commented out; uncomment them if the weights have not been converted yet):

In [None]:
!ls

FinalDataset.csv  llama_recipes      __MACOSX	models_hf.zip  tmp
folder.zip	  llama_recipes.zip  models_hf	sample_data


In [1]:
%%bash
pip install transformers datasets accelerate sentencepiece protobuf==3.20 py7zr scipy peft bitsandbytes fire torch_tb_profiler ipywidgets
#TRANSFORM=`python -c "import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')"`
#python ${TRANSFORM} --input_dir models --model_size 7B --output_dir models_hf/7B

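The commented-out shell lines in the install cell derive the conversion script's location from `transformers.__file__`. A minimal Python sketch of the same lookup is shown here; `script_in_package` is a hypothetical helper, and the relative path is copied from the commented command, so it may not match every `transformers` release.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional


def script_in_package(package: str, relative_path: str) -> Optional[Path]:
    """Return the path of a file shipped inside an installed package, or None."""
    spec = find_spec(package)  # locates the package without importing it
    if spec is None or spec.origin is None:
        return None  # package not installed (or a namespace package)
    candidate = Path(spec.origin).parent / relative_path
    return candidate if candidate.is_file() else None


script = script_in_package("transformers", "models/llama/convert_llama_weights_to_hf.py")
print(script or "conversion script not found in this environment")
```

If a path is printed, it can be passed to `python` with the `--input_dir`, `--model_size`, and `--output_dir` arguments shown in the commented command.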

### Step 1: Load the model

Point `model_id` to the folder that contains the converted model weights.

In [None]:
# !unzip models_hf.zip

In [None]:
# Use a GCP bucket to load Llama model weights if needed
# !curl https://sdk.cloud.google.com | bash
# !gcloud init
# !gsutil cp gs://adv-llama/models_hf.zip  .
# !unzip models_hf.zip

In [2]:
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id="./models_hf/7B"

tokenizer = LlamaTokenizer.from_pretrained(model_id)

  warn("The installed version of bitsandbytes was compiled without GPU support. "


'NoneType' object has no attribute 'cadam32bit_grad_fp32'


HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './models_hf/7B'. Use `repo_type` argument if needed.
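An `HFValidationError` like the one above typically appears when the given path does not exist locally, so the string is treated as a Hub repo id instead of a folder. A small pre-flight check can make that failure easier to read. This is a sketch only: `missing_checkpoint_files` is a hypothetical helper, and the list of expected files is an assumption about what a converted checkpoint folder contains.

```python
from pathlib import Path

# Assumption: files that the tokenizer and model loaders look for;
# adjust the tuple to match your converted checkpoint.
EXPECTED_FILES = ("config.json", "tokenizer.model")


def missing_checkpoint_files(model_dir: str, expected=EXPECTED_FILES) -> list:
    """Return the expected files that are absent (all of them if the folder is missing)."""
    root = Path(model_dir)
    if not root.is_dir():
        return list(expected)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_checkpoint_files("./models_hf/7B")
if missing:
    print(f"./models_hf/7B is not a usable checkpoint folder, missing: {missing}")
```

Running this before `from_pretrained` shows whether the weights were actually unzipped or converted into `./models_hf/7B`.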

In [None]:
model = LlamaForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map='auto', torch_dtype=torch.float16)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

### Step 2: Load the preprocessed dataset

We load and preprocess the dataset selected by the `whatsapp_dataset` config. It takes the place of the samsum dataset (curated pairs of dialogs and their summaries) that this step was originally written for:

In [None]:
# !unzip folder.zip

In [None]:
#!zip llama_recipes.zip -r llama_recipes


In [None]:
from pathlib import Path
import os
import sys
from utils.dataset_utils import get_preprocessed_dataset
from configs.datasets import whatsapp_dataset

train_dataset = get_preprocessed_dataset(tokenizer, whatsapp_dataset, 'train')


### Step 3: Check base model

Run the base model on an example input:

In [None]:
eval_prompt = """
Reply to the following messages as the user Advaith. Provide just one reply, do not continue the conversation
User (Ritu): Hello.
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    output_ids = model.generate(**model_input, max_new_tokens=100, do_sample=True)[0]
    print(tokenizer.decode(output_ids, skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



Reply to the following messages as the user Advaith. Provide just one reply, do not continue the conversation
User (Ritu): Hello.

Advaith: Hey Ritu! What's up?


We can see that the base model echoes the prompt and then appends a short, generic reply.
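The evaluation prompts in this notebook follow a fixed template: an instruction line, one `User (name): message` line, and (in Step 8) a trailing `Advaith:` cue. The following sketch builds that template programmatically; `build_prompt` and its parameters are hypothetical names introduced for illustration, not part of the recipe.

```python
INSTRUCTION = (
    "Reply to the following messages as the user {persona}. "
    "Provide just one reply, do not continue the conversation"
)


def build_prompt(sender: str, message: str, persona: str = "Advaith", cue: bool = True) -> str:
    """Assemble an evaluation prompt in the layout used by the notebook cells."""
    lines = ["", INSTRUCTION.format(persona=persona), f"User ({sender}): {message}"]
    if cue:
        # Ending with the persona's name nudges the model to answer as that user.
        lines.append(f"{persona}:")
    return "\n".join(lines) + "\n"


print(build_prompt("Ritu", "Hello.", cue=False))
```

With `cue=False` this reproduces the prompt used in this step; with the default `cue=True` it matches the layout of the Step 8 prompt.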

### Step 4: Prepare model for PEFT

Let's prepare the model for Parameter-Efficient Fine-Tuning (PEFT):

In [None]:
model.train()

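def create_peft_config(model):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_kbit_training,
    )

    # LoRA adapters are attached to the attention query and value projections only
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
    )

    # Prepare the quantized model for training. Newer peft releases deprecate
    # prepare_model_for_int8_training in favor of prepare_model_for_kbit_training,
    # which also covers the 4-bit weights loaded in Step 1.
    model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model, peft_config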

# create peft config
model, lora_config = create_peft_config(model)





trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199
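The trainable parameter count printed above can be reproduced by hand. Each adapted linear layer gains two low-rank matrices, A (r × in_features) and B (out_features × r). The sketch below assumes the Llama 2 7B architecture values of 32 decoder layers and a hidden size of 4096, which are not read from the checkpoint here.

```python
# Llama 2 7B architecture values (assumed, not read from the checkpoint).
NUM_LAYERS = 32
HIDDEN_SIZE = 4096
LORA_RANK = 8  # r in LoraConfig
TARGET_MODULES = ["q_proj", "v_proj"]


def lora_param_count(num_layers, in_features, out_features, rank, num_targets):
    """Count LoRA parameters: A (rank x in) plus B (out x rank) per adapted layer."""
    per_module = rank * in_features + out_features * rank
    return num_layers * num_targets * per_module


trainable = lora_param_count(NUM_LAYERS, HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK, len(TARGET_MODULES))
total = 6_742_609_920  # "all params" from the output above
print(f"trainable params: {trainable:,} || trainable%: {100 * trainable / total}")
```

Doubling `r` or adding more entries to `target_modules` scales the trainable count linearly, which is a quick way to estimate the cost of a larger adapter.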


### Step 5: Define an optional profiler

In [None]:
from transformers import TrainerCallback
from contextlib import nullcontext
enable_profiler = False
output_dir = "tmp/llama-output"

config = {
    'lora_config': lora_config,
    'learning_rate': 3e-5,
    'num_train_epochs': 1,
    'gradient_accumulation_steps': 2,
    'per_device_train_batch_size': 2,
    'gradient_checkpointing': False,
}

# Set up profiler
if enable_profiler:
    wait, warmup, active, repeat = 1, 1, 2, 1
    total_steps = (wait + warmup + active) * (1 + repeat)
    schedule =  torch.profiler.schedule(wait=wait, warmup=warmup, active=active, repeat=repeat)
    profiler = torch.profiler.profile(
        schedule=schedule,
        on_trace_ready=torch.profiler.tensorboard_trace_handler(f"{output_dir}/logs/tensorboard"),
        record_shapes=True,
        profile_memory=True,
        with_stack=True)

    class ProfilerCallback(TrainerCallback):
        def __init__(self, profiler):
            self.profiler = profiler

        def on_step_end(self, *args, **kwargs):
            self.profiler.step()

    profiler_callback = ProfilerCallback(profiler)
else:
    profiler = nullcontext()
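To make the schedule values concrete, here is a simplified pure-Python mirror of how `wait`, `warmup`, `active`, and `repeat` map training steps to profiler phases. It is a sketch of the scheduling idea only: the real `torch.profiler.schedule` returns `ProfilerAction` values, and `phase_at` is a hypothetical helper.

```python
def phase_at(step: int, wait: int, warmup: int, active: int, repeat: int) -> str:
    """Return the phase a 0-indexed training step falls into."""
    cycle = wait + warmup + active
    if repeat > 0 and step >= cycle * repeat:
        return "idle"  # all requested cycles are done, nothing is recorded
    pos = step % cycle
    if pos < wait:
        return "wait"
    if pos < wait + warmup:
        return "warmup"
    return "active"


wait, warmup, active, repeat = 1, 1, 2, 1
total_steps = (wait + warmup + active) * (1 + repeat)
phases = [phase_at(s, wait, warmup, active, repeat) for s in range(total_steps)]
print(total_steps, phases)
```

With the values used in this notebook, only two of the eight steps are recorded; the remaining steps after the single cycle run without profiling.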

### Step 6: Fine tune the model

Here, we fine-tune the model for a single epoch, which takes a bit more than an hour on an A100.
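With the `config` from Step 5, each optimizer update sees `per_device_train_batch_size × gradient_accumulation_steps` samples. The sketch below works through that arithmetic; the step estimate is an approximation (the exact rounding inside `Trainer` may differ between versions), and the 1000-sample dataset size is purely hypothetical.

```python
import math


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples that contribute to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def approx_optimizer_steps(num_samples, per_device_batch_size, gradient_accumulation_steps, num_epochs=1):
    """Rough number of optimizer updates on a single device."""
    batches = math.ceil(num_samples / per_device_batch_size)
    return max(batches // gradient_accumulation_steps, 1) * num_epochs


print(effective_batch_size(2, 2))          # values from the config in Step 5
print(approx_optimizer_steps(1000, 2, 2))  # hypothetical 1000-sample dataset
```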

In [None]:
from transformers import default_data_collator, Trainer, TrainingArguments



# Define training args
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    bf16=False,  # Set to True to train in BF16 on GPUs that support it
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="no",
    optim="adamw_torch_fused",
    max_steps=total_steps if enable_profiler else -1,
    **{k:v for k,v in config.items() if k != 'lora_config'}
)

with profiler:
    # Create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        data_collator=default_data_collator,
        callbacks=[profiler_callback] if enable_profiler else [],
    )

    # Start training
    trainer.train()

Step,Training Loss
10,0.7903
20,0.8164
30,0.8188
40,0.8063
50,0.8143
60,0.7829
70,0.7991
80,0.8033
90,0.7944
100,0.8122
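The logged losses stay close to 0.8 over these 100 steps. A quick way to check for a trend is to compare the mean of the first and second half of the log. The sketch below parses the table exactly as printed above, using only the standard library.

```python
import csv
import io
from statistics import mean

# Loss log copied from the training output above.
LOSS_LOG = """Step,Training Loss
10,0.7903
20,0.8164
30,0.8188
40,0.8063
50,0.8143
60,0.7829
70,0.7991
80,0.8033
90,0.7944
100,0.8122
"""

rows = list(csv.DictReader(io.StringIO(LOSS_LOG)))
losses = [float(row["Training Loss"]) for row in rows]

half = len(losses) // 2
first_half, second_half = mean(losses[:half]), mean(losses[half:])
print(f"mean={mean(losses):.4f} first_half={first_half:.4f} second_half={second_half:.4f}")
```

The second-half mean is only slightly lower than the first, so this excerpt alone does not show a clear downward trend.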


### Step 7: Save model checkpoint

In [None]:
output_dir

'tmp/llama-output'

In [None]:
model.save_pretrained(output_dir)

### Step 8: Try the fine-tuned model

Try the fine-tuned model on the same example again to see the learning progress:

In [None]:
🔊🔊🔊 Welcome to the Turing test! 📜

Can you distinguish your friend Advaith 👨🏽 from an AI Model 🤖? Take this test to find out! 🧪

You can ask me 3️⃣ questions! Both Advaith and the AI bot will answer each question. 🤔

🪧 Example question: 
You: What's your favourite dish?
Candidate A: Curd rice
Candidate B: Pasta

After 3 such questions, you need to guess which candidate was Advaith 👨🏽, and which one was the AI bot🦙!

⚠️ Other rules: 
1. No questions about events after training cut-off date (Sept 2022) ❌
2. No factual questions about yourself or Advaith (ex. What is Advaith's last name?) ❌

Ask your first question below. 👇🏽 Best of luck!
-------------------------------


In [104]:
!zip -r all_docs.zip . -x "models_hf/*" -x "*.zip"


In [3]:
eval_prompt = """
Reply to the following messages as the user Advaith. Provide just one reply, do not continue the conversation
User (John): Who are you? :P
Advaith:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    output_ids = model.generate(**model_input, max_new_tokens=250, do_sample=True)[0]
    print(tokenizer.decode(output_ids, skip_special_tokens=True))


NameError: name 'tokenizer' is not defined
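The `NameError` above is consistent with running this cell in a session where the tokenizer cell in Step 1 did not complete, so `tokenizer` (and `model`) were never created. One way to get back to a working state in a fresh session is to reload the tokenizer, the quantized base model, and the adapter saved in Step 7. The sketch below assumes the paths used earlier in this notebook; `load_finetuned` is a hypothetical helper, and it needs the converted weights and a GPU to actually run.

```python
def load_finetuned(base_dir: str = "./models_hf/7B", adapter_dir: str = "tmp/llama-output"):
    """Reload the tokenizer, the quantized base model, and the saved LoRA adapter."""
    # Heavy imports stay inside the function so that defining it is cheap.
    import torch
    from peft import PeftModel
    from transformers import LlamaForCausalLM, LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained(base_dir)
    base = LlamaForCausalLM.from_pretrained(
        base_dir, load_in_4bit=True, device_map="auto", torch_dtype=torch.float16
    )
    # Attach the adapter weights written by model.save_pretrained(output_dir).
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    return tokenizer, model


# Usage (requires the converted weights and a GPU):
# tokenizer, model = load_finetuned()
```

After this call, the evaluation cell above can be re-run with the returned `tokenizer` and `model`.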