<a href="https://colab.research.google.com/github/wandb/edu/blob/main/llm-training-course/colab/finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<!--- @wandbcode{llmtraining-colab} -->

# Fine-tuning a 3B Llama on an instruction dataset with Weights & Biases, Hugging Face, LoRA, and Quantization

Tested on a Google Colab V100 GPU. Check out the [W&B Hugging Face documentation](https://docs.wandb.ai/guides/integrations/huggingface) for more details.

In [1]:
!python -m pip install -U wandb transformers trl accelerate peft bitsandbytes datasets evaluate sentencepiece "protobuf==3.20.3"



In [2]:
!wget -O utils.py https://github.com/wandb/edu/raw/main/llm-training-course/colab/utils.py




Let's start a W&B run and grab the Alpaca-GPT4 dataset (instructions paired with GPT-4-generated outputs), stored as a W&B Artifact:

In [3]:
import wandb

wandb.init(
    project="alpaca_ft",  # the W&B project this run is logged to
    job_type="train",  # the run's role, useful for filtering runs in the workspace
    tags=["hf_sft_lora", "3b"],  # labels for grouping and comparing runs
)
# Fetch the pre-split dataset artifact; use_artifact also records it as an input of this run
artifact = wandb.use_artifact("capecape/alpaca_ft/alpaca_gpt4_splitted:v4", type="dataset")
artifact_dir = artifact.download()


In [4]:
from datasets import load_dataset

alpaca_ds = load_dataset("json", data_dir=artifact_dir)

In [5]:
# Subsample the train and test splits to keep this demo fast; switch to the full dataset for your own experiments
alpaca_ds["train"] = alpaca_ds["train"].select(range(512))
alpaca_ds["test"] = alpaca_ds["test"].select(range(10))

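Next, we define how each row becomes a single training string, using the Alpaca prompt templates: one for rows with an empty `input` field and one for rows whose `input` provides further context.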

In [6]:
def prompt_no_input(row):
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:\n{output}"
    ).format_map(row)


def prompt_input(row):
    return (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"
    ).format_map(row)


def create_prompt(row):
    return prompt_no_input(row) if row["input"] == "" else prompt_input(row)
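To see what the model actually trains on, here is a quick check of `create_prompt` on two hand-made rows. The rows are hypothetical examples with the same keys as the Alpaca data, not taken from the dataset, and the templates are copied from the cell above so the snippet runs on its own.

```python
def prompt_no_input(row):
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:\n{output}"
    ).format_map(row)


def prompt_input(row):
    return (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"
    ).format_map(row)


def create_prompt(row):
    # Rows with an empty "input" field use the shorter template.
    return prompt_no_input(row) if row["input"] == "" else prompt_input(row)


# Hypothetical rows with the same keys as the Alpaca dataset.
row_plain = {"instruction": "Name a primary color.", "input": "", "output": "Red."}
row_ctx = {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}

print(create_prompt(row_plain))
print("-" * 40)
print(create_prompt(row_ctx))
```

The instruction, optional input, and answer all end up in one string, so the model learns to produce the text after `### Response:`.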

In [7]:
train_dataset = alpaca_ds["train"]
eval_dataset = alpaca_ds["test"]

In [8]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

In [9]:
model_id = "openlm-research/open_llama_3b_v2"

Next, we define the configurations for LoRA, quantization, and model training so that fine-tuning fits in the memory of a single GPU.

In [10]:
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=64,  # rank of the low-rank update matrices
    lora_alpha=16,  # scaling factor: the LoRA update is multiplied by lora_alpha / r
    lora_dropout=0.1,  # dropout applied to the input of the LoRA layers
    bias="none",  # don't train any bias parameters
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
    ],  # the attention projection layers that get LoRA adapters
)
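A rough count of what `r=64` adds, as a sketch. It assumes the OpenLLaMA 3B shapes (hidden size 3200, 26 decoder layers, square q/k/v/o projections); check `model.config` if you switch models. A LoRA adapter on a `d_in × d_out` linear layer trains `A` (`r × d_in`) and `B` (`d_out × r`), so it adds `r · (d_in + d_out)` parameters, and PEFT scales the update by `lora_alpha / r`.

```python
# Assumed OpenLLaMA-3B shapes; verify against model.config for your checkpoint.
hidden_size = 3200
num_layers = 26
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"]  # all hidden_size x hidden_size here

r = 64
lora_alpha = 16


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


per_layer = len(target_modules) * lora_params(hidden_size, hidden_size, r)
total = num_layers * per_layer
scaling = lora_alpha / r  # PEFT multiplies the low-rank update B @ A by this factor

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")
print(f"scaling alpha/r:       {scaling}")
```

That is roughly 42.6M trainable parameters, a small fraction of the frozen 3B base. Once the trainer is built, `trainer.model.print_trainable_parameters()` reports the actual count, which should be close to this if the assumed shapes are right.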

In [11]:
from transformers import BitsAndBytesConfig

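bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # load the frozen base-model weights in 4-bit
    bnb_4bit_quant_type="nf4",  # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants to save memory
    bnb_4bit_compute_dtype=torch.float16,  # dtype for matmuls; the V100 has no bfloat16 support
)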
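A back-of-the-envelope look at why 4-bit loading matters, assuming a round 3 billion parameters (the exact count differs). NF4 stores each weight in 4 bits, i.e. half a byte, versus 2 bytes in float16. The sketch counts weights only: it ignores quantization constants (which double quantization shrinks further), layers kept in higher precision, activations, and the LoRA weights with their optimizer state, so real usage is higher.

```python
num_params = 3_000_000_000  # assumed round figure for a "3B" model

BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "nf4 (4-bit)": 0.5,  # 4 bits per weight, excluding quantization constants
}

GIB = 1024**3
for name, nbytes in BYTES_PER_PARAM.items():
    # Weight storage only, in GiB
    print(f"{name:>12}: {num_params * nbytes / GIB:5.2f} GiB")
```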

In [12]:
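model_kwargs = dict(
    device_map={"": 0},  # put the whole model on GPU 0
    trust_remote_code=True,
    # low_cpu_mem_usage=True,
    torch_dtype=torch.float16,  # dtype for the layers that are not quantized
    # use_flash_attention_2=True,  # FlashAttention-2 needs an Ampere or newer GPU, so not a V100
    use_cache=False,  # the KV cache is not needed for training and is incompatible with gradient checkpointing
    quantization_config=bnb_config,
)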

In [13]:
batch_size = 2  # per-device micro-batch size
gradient_accumulation_steps = 16  # micro-batches accumulated before each optimizer step
num_train_epochs = 1
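With gradient accumulation, the optimizer steps once every `gradient_accumulation_steps` micro-batches, so on a single GPU the effective batch size is `batch_size × gradient_accumulation_steps`. A quick sketch for the 512-example subset used here, assuming no packing (one example per sequence, which is the `SFTTrainer` default):

```python
import math

batch_size = 2
gradient_accumulation_steps = 16
num_train_epochs = 1
num_train_examples = 512  # size of the subsampled training split

effective_batch_size = batch_size * gradient_accumulation_steps
micro_batches_per_epoch = math.ceil(num_train_examples / batch_size)
# 256 micro-batches divide evenly into 16 steps here; how a remainder is
# handled depends on the transformers version.
optimizer_steps = (micro_batches_per_epoch // gradient_accumulation_steps) * num_train_epochs

print(f"effective batch size: {effective_batch_size}")
print(f"optimizer steps:      {optimizer_steps}")
```

This is consistent with the `checkpoint-16` directory that appears in the training output.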

We set `report_to="wandb"` so the Trainer logs training and evaluation metrics to W&B through the [W&B Hugging Face integration](https://docs.wandb.ai/guides/integrations/huggingface).

In [14]:
import os
from transformers import TrainingArguments
from trl import SFTTrainer

# Don't upload model weights to W&B from the Trainer integration.
# Use "end" to log the final model or "checkpoint" to log every saved checkpoint.
os.environ["WANDB_LOG_MODEL"] = "false"

output_dir = "./output/"

training_args = TrainingArguments(
    run_name="alpaca-3b-finetune-run",  # set explicitly so it doesn't default to output_dir
    num_train_epochs=num_train_epochs,
    output_dir=output_dir,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs=dict(
        use_reentrant=False
    ),  # non-reentrant checkpointing, the variant PyTorch recommends
    eval_strategy="epoch",
    logging_strategy="steps",
    logging_steps=1,
    save_strategy="epoch",
    report_to="wandb",
)
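`lr_scheduler_type="cosine"` with `warmup_ratio=0.1` ramps the learning rate linearly from 0 up to `learning_rate` over the first 10% of optimizer steps, then follows a half cosine down to 0. Below is a small pure-Python sketch of that shape. It assumes 16 optimizer steps (512 examples divided by an effective batch of 32) and a warmup length of `ceil(warmup_ratio × total_steps)`, which is how recent `transformers` versions round it.

```python
import math


def cosine_with_warmup(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Learning rate at a given optimizer step: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 16  # assumed: 512 examples / effective batch size 32
warmup_steps = math.ceil(0.1 * total_steps)  # assumed rounding of warmup_ratio=0.1
peak_lr = 2e-4

schedule = [cosine_with_warmup(s, total_steps, warmup_steps, peak_lr) for s in range(total_steps + 1)]
for s, lr in enumerate(schedule):
    print(f"step {s:2d}: lr = {lr:.2e}")
```

With only 16 steps, warmup lasts 2 steps, so most of this short run is spent in the decay phase. Because `logging_steps=1`, you can compare this curve with the learning-rate chart the Trainer logs to W&B.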

In [15]:
from utils import LLMSampleCB

# Load the 4-bit base model; SFTTrainer wraps it with the LoRA adapters from peft_config
trainer = SFTTrainer(
    model=AutoModelForCausalLM.from_pretrained(model_id, **model_kwargs),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=training_args,
    formatting_func=create_prompt,  # turns each row into a single prompt string
    peft_config=peft_config,
)

# Cap the tokenizer's maximum sequence length at 1024 tokens.
# In recent TRL versions the datasets are tokenized inside the SFTTrainer constructor, so
# this may not change how the training data is truncated; the sequence-length limit is
# usually set through SFTConfig instead (the argument name depends on your TRL version).
trainer.processing_class.model_max_length = 1024


In [16]:
# Build prompts with an empty response so the model has to generate the answer
def create_prompt_no_answer(row):
    row["output"] = ""
    return {"text": create_prompt(row)}


test_dataset = eval_dataset.map(create_prompt_no_answer)
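Blanking `output` leaves a prompt that ends right after `### Response:\n`, which is exactly the prefix the model saw before each answer during training, so sampling continues from the same point. A minimal check with a hypothetical row, using only the no-input template from earlier (copied here so the snippet is self-contained):

```python
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

row = {"instruction": "Name a primary color.", "input": "", "output": "Red."}  # hypothetical row

train_text = TEMPLATE.format_map(row)                    # what the model is trained on
gen_prompt = TEMPLATE.format_map({**row, "output": ""})  # same prompt with the answer removed

# The sampling prompt is a strict prefix of the training text; the model must produce the rest.
print(repr(gen_prompt[-30:]))
print(train_text[len(gen_prompt):])
```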

We'll add a custom W&B callback to the trainer so that it samples generations from the model and logs them to the W&B dashboard. Review the [W&B Hugging Face documentation](https://docs.wandb.ai/guides/integrations/huggingface) for the most up-to-date best practices.

In [17]:
wandb_callback = LLMSampleCB(trainer, test_dataset, num_samples=10, max_new_tokens=256)



In [18]:
trainer.add_callback(wandb_callback)

It's time to train!


In [22]:
trainer.train()
wandb.finish()

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.3598 | 1.411796 |


wandb: Adding directory to artifact (./output/checkpoint-16)... Done. 1.0s


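AttributeError: 'str' object has no attribute 'is_enabled'

This `AttributeError` is raised at the end of training, after the checkpoint has been saved. It most likely points to a version mismatch between the callback in `utils.py` and the installed `transformers`: recent `transformers` releases expect `WandbCallback._log_model` to be a `WandbLogModel` value with an `is_enabled` property, so a `WandbCallback` subclass that stores a plain string there fails. Because the exception interrupts the cell, `wandb.finish()` is never reached; run it in a separate cell to close the W&B run.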

Check out the sample generations in your W&B workspace. We trained on a very small sample of the dataset, so they likely won't be good. Try to improve on this result: train on a larger dataset and experiment with different hyperparameters and settings. Then share a W&B report with your results. Good luck!
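When comparing runs, it can help to turn the loss into perplexity. Assuming the reported loss is the mean per-token cross-entropy in nats (the usual causal-LM loss in `transformers`), perplexity is `exp(loss)`. For the validation loss from this run:

```python
import math

val_loss = 1.411796  # validation loss reported after epoch 1
perplexity = math.exp(val_loss)
print(f"validation perplexity ≈ {perplexity:.2f}")
```

Lower is better; a perplexity of about 4.1 means the model is, on average, about as uncertain as a uniform choice among roughly four tokens.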