<a href="https://colab.research.google.com/github/omkarwazulkar/GoogleColab/blob/main/FineTune-Llama3-1B-Chat-LoRA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -U trl peft bitsandbytes transformers accelerate datasets

## **Dataset**

In [2]:
system_message = """You are Llama, an AI assistant created by Omkar to be helpful and honest. Your knowledge spans a wide range of topics, allowing you to engage in substantive conversations and provide analysis on complex subjects."""

from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("HuggingFaceH4/no_robots")

columns_to_remove = [
    c for c in dataset["train"].column_names if c != "messages"
]

def create_conversation(sample):
    if sample["messages"][0]["role"] == "system":
        return {"messages": sample["messages"]}
    else:
        return {
            "messages": [{"role": "system", "content": system_message}]
            + sample["messages"]
        }

dataset = dataset.map(
    create_conversation,
    remove_columns=columns_to_remove,
)

dataset["train"] = dataset["train"].filter(
    lambda x: len(x["messages"][1:]) % 2 == 0
)
dataset["test"] = dataset["test"].filter(
    lambda x: len(x["messages"][1:]) % 2 == 0
)

train_dataset = dataset["train"]
test_dataset = dataset["test"]

LLAMA_3_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}"
    "{{ message['content'] }}"
    "{% elif message['role'] == 'user' %}"
    "{{ '\\n\\nHuman: ' + message['content'] + eos_token }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ '\\n\\nAssistant: ' + message['content'] + eos_token }}"
    "{% endif %}"
    "{% endfor %}"
)

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    use_fast=True,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.chat_template = LLAMA_3_CHAT_TEMPLATE

def template_dataset(example):
    return {
        "text": tokenizer.apply_chat_template(
            example["messages"],
            tokenize=False,
            add_generation_prompt=False,
        )
    }

train_dataset = train_dataset.map(
    template_dataset, remove_columns=["messages"]
)
test_dataset = test_dataset.map(
    template_dataset, remove_columns=["messages"]
)

Generating train split:   0%|          | 0/9500 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/500 [00:00<?, ? examples/s]

Map:   0%|          | 0/9500 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Filter:   0%|          | 0/9500 [00:00<?, ? examples/s]

Filter:   0%|          | 0/500 [00:00<?, ? examples/s]

Map:   0%|          | 0/9485 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]
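
To see what the two preprocessing steps above do, here is a minimal sketch on hand-made conversations. The toy samples, the shortened `SYSTEM_MESSAGE`, and the `keeps_sample` helper are hypothetical and not taken from `no_robots`. It prepends a system turn when one is missing, then keeps only samples whose non-system turns form complete user/assistant pairs.

```python
SYSTEM_MESSAGE = "You are a helpful assistant."  # placeholder, shorter than the notebook's prompt


def create_conversation(sample):
    """Prepend a system turn unless the conversation already starts with one."""
    if sample["messages"][0]["role"] == "system":
        return {"messages": sample["messages"]}
    return {
        "messages": [{"role": "system", "content": SYSTEM_MESSAGE}]
        + sample["messages"]
    }


def keeps_sample(sample):
    """Same predicate as the notebook's filter: an even number of non-system turns."""
    return len(sample["messages"][1:]) % 2 == 0


toy_samples = [
    # Complete pair: survives the filter.
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello!"}]},
    # Dangling user turn: dropped by the filter.
    {"messages": [{"role": "user", "content": "Hi"}]},
]

prepared = [create_conversation(s) for s in toy_samples]
kept = [s for s in prepared if keeps_sample(s)]
print(len(prepared), len(kept))
```

In the real run the same filter reduces the training split from 9500 to 9485 rows, as the progress output shows.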

In [3]:
print(train_dataset[1]["text"])

You are Llama, an AI assistant created by Omkar to be helpful and honest. Your knowledge spans a wide range of topics, allowing you to engage in substantive conversations and provide analysis on complex subjects.

Human: Help write a letter of 100 -200 words to my future self for Kyra, reflecting on her goals and aspirations.<|end_of_text|>

Assistant: Dear Future Self,

I hope you're happy and proud of what you've achieved. As I write this, I'm excited to think about our goals and how far you've come. One goal was to be a machine learning engineer. I hope you've worked hard and become skilled in this field. Keep learning and innovating. Traveling was important to us. I hope you've seen different places and enjoyed the beauty of our world. Remember the memories and lessons. Starting a family mattered to us. If you have kids, treasure every moment. Be patient, loving, and grateful for your family.

Take care of yourself. Rest, reflect, and cherish the time you spend with loved ones. Rem
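
The layout of the printed sample can be reproduced without Jinja. The sketch below is a plain-Python equivalent of the chat template, written only for illustration: `render_chat` is a hypothetical helper, the EOS string is copied from the printed sample, and the `add_generation_prompt` branch mirrors the inference-time template used later in this notebook.

```python
EOS_TOKEN = "<|end_of_text|>"  # EOS string as it appears in the printed sample


def render_chat(messages, eos_token=EOS_TOKEN, add_generation_prompt=False):
    """Concatenate turns the same way the notebook's Jinja template does."""
    parts = []
    for message in messages:
        if message["role"] == "system":
            # The system prompt is emitted verbatim, with no prefix and no EOS.
            parts.append(message["content"])
        elif message["role"] == "user":
            parts.append("\n\nHuman: " + message["content"] + eos_token)
        elif message["role"] == "assistant":
            parts.append("\n\nAssistant: " + message["content"] + eos_token)
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("\n\nAssistant: ")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Name a prime."},
    {"role": "assistant", "content": "7"},
]
text = render_chat(messages)
prompt_only = render_chat(messages[:2], add_generation_prompt=True)
print(repr(text))
print(repr(prompt_only))
```

Note that every user and assistant turn ends with the EOS token, so the model sees an end-of-text marker after each human message as well as after each reply.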

## **Model**

In [4]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token


In [5]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,
    use_cache=False,
)

model.gradient_checkpointing_enable()


In [6]:
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)


In [8]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/content/llama3.2-1b-lora",
    num_train_epochs=1,
    per_device_train_batch_size=2,   # lower since no quantization
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,   # effective batch = 8
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    logging_steps=10,
    save_strategy="epoch",
    max_grad_norm=0.3,
    warmup_ratio=0.03,  # ignored by the "constant" scheduler; "constant_with_warmup" would apply it

    # 🔒 T4-safe
    fp16=True,
    bf16=False,
    tf32=False,

    gradient_checkpointing=True,
    optim="adamw_torch",
    report_to="tensorboard",
)
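
One detail in these arguments is worth checking: to the best of my knowledge, the `"constant"` schedule in `transformers` applies no warmup, so `warmup_ratio=0.03` only takes effect with `"constant_with_warmup"`. The sketch below compares the two learning-rate multipliers in plain Python. `lr_multiplier` is a hypothetical helper that mirrors the library behaviour as I understand it rather than calling it, and the step count of 1186 is taken from the training output further down.

```python
import math

TOTAL_STEPS = 1186   # global_step reported by trainer.train() in this notebook
WARMUP_RATIO = 0.03
BASE_LR = 2e-4


def lr_multiplier(step, schedule, warmup_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if schedule == "constant_with_warmup" and step < warmup_steps:
        return step / max(1, warmup_steps)  # linear ramp from 0 to 1
    return 1.0  # plain "constant" never ramps


warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)
for step in (0, 18, 36, 100):
    const = BASE_LR * lr_multiplier(step, "constant", warmup_steps)
    warm = BASE_LR * lr_multiplier(step, "constant_with_warmup", warmup_steps)
    print(f"step {step:>3}: constant={const:.1e}  constant_with_warmup={warm:.1e}")
```

Under these assumptions the warmup would span the first 36 optimizer steps, which is where a freshly initialised adapter is most sensitive to a large learning rate.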

In [12]:
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)

trainer.model.print_trainable_parameters()

The model is already on multiple devices. Skipping the move to device specified in `args`.


trainable params: 11,272,192 || all params: 1,247,086,592 || trainable%: 0.9039
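
The trainable-parameter count can be reproduced by hand. The sketch below assumes the public Llama-3.2-1B dimensions (hidden size 2048, MLP size 8192, 16 layers, 8 key/value heads of dimension 64), none of which are stated in this notebook, and assumes that `target_modules="all-linear"` wraps the seven projection layers in each decoder block while leaving the output head alone.

```python
# Assumed Llama-3.2-1B dimensions (from the public model config, not from this notebook).
HIDDEN = 2048
INTERMEDIATE = 8192
NUM_LAYERS = 16
KV_DIM = 8 * 64  # 8 key/value heads x head_dim 64 (grouped-query attention)
RANK = 16        # r in the LoraConfig above

# (in_features, out_features) of each linear layer inside one decoder block.
linear_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r) per wrapped layer: r * (in + out) parameters.
per_layer = sum(RANK * (fan_in + fan_out) for fan_in, fan_out in linear_shapes.values())
trainable = per_layer * NUM_LAYERS
all_params = 1_247_086_592  # "all params" from the output above

print(f"{trainable:,} trainable ({100 * trainable / all_params:.4f}%)")
# → 11,272,192 trainable (0.9039%)
```

The result matches the trainer's report, which suggests the adapter covers exactly these seven projections per block.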


In [13]:
trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 128001}.


| Step | Training Loss |
|------|---------------|
| 10 | 2.3182 |
| 20 | 1.9615 |
| 30 | 2.0003 |
| 40 | 1.9343 |
| 50 | 1.9616 |
| 60 | 1.9142 |
| 70 | 1.7603 |
| 80 | 1.9418 |
| 90 | 1.8606 |
| 100 | 1.9958 |


TrainOutput(global_step=1186, training_loss=1.864907952223419, metrics={'train_runtime': 2582.1637, 'train_samples_per_second': 3.673, 'train_steps_per_second': 0.459, 'total_flos': 2.3326142166196224e+16, 'train_loss': 1.864907952223419, 'entropy': 1.879511190497357, 'num_tokens': 2965531.0, 'mean_token_accuracy': 0.5793458041937455, 'epoch': 1.0})
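
The numbers in `TrainOutput` can be cross-checked against the training arguments. This is a back-of-envelope sketch that assumes a single GPU and assumes the final, partially filled accumulation group still counts as one optimizer step.

```python
import math

NUM_EXAMPLES = 9485     # training rows after filtering
PER_DEVICE_BATCH = 2    # per_device_train_batch_size
GRAD_ACCUM = 4          # gradient_accumulation_steps
RUNTIME_S = 2582.1637   # train_runtime from TrainOutput

micro_batches = math.ceil(NUM_EXAMPLES / PER_DEVICE_BATCH)
optimizer_steps = math.ceil(micro_batches / GRAD_ACCUM)
samples_per_second = round(NUM_EXAMPLES / RUNTIME_S, 3)
steps_per_second = round(optimizer_steps / RUNTIME_S, 3)

print(optimizer_steps)       # → 1186
print(samples_per_second)    # → 3.673
print(steps_per_second)      # → 0.459
```

All three values agree with `global_step`, `train_samples_per_second`, and `train_steps_per_second` above, so the run covered the full epoch of about 43 minutes.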

In [15]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

model_id = "meta-llama/Llama-3.2-1B"
lora_path = "/content/llama3.2-1b-lora/checkpoint-1186"  # final checkpoint written by save_strategy="epoch"

LLAMA_3_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}"
    "{{ message['content'] }}"
    "{% elif message['role'] == 'user' %}"
    "{{ '\\n\\nHuman: ' + message['content'] + eos_token }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ '\\n\\nAssistant: ' + message['content'] + eos_token }}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '\\n\\nAssistant: ' }}"
    "{% endif %}"
)

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.chat_template = LLAMA_3_CHAT_TEMPLATE


In [None]:
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,
    device_map="auto",
)

model = PeftModel.from_pretrained(
    base_model,
    lora_path,
)

model.eval()


In [17]:
messages = [
    {
        "role": "system",
        "content": "You are Llama, an AI assistant created by Omkar to be helpful and honest."
    },
    {
        "role": "user",
        "content": "Explain gradient checkpointing in simple terms."
    }
]


In [18]:
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)


In [19]:
prompt

'You are Llama, an AI assistant created by Omkar to be helpful and honest.\n\nHuman: Explain gradient checkpointing in simple terms.<|end_of_text|>\n\nAssistant: '

In [20]:
inputs = tokenizer(
    prompt,
    return_tensors="pt",
).to(model.device)


In [21]:
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
    )


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
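
For intuition about `temperature=0.7` and `top_p=0.9`, the following is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) sampling. It is not the `transformers` implementation: `sample_top_p` and the toy logits are hypothetical, and `repetition_penalty` is left out entirely.

```python
import math
import random


def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=random):
    """Temperature-scale the logits, keep the smallest set of tokens whose
    cumulative probability reaches top_p, then sample from that set."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the nucleus is full.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for idx in ranked:
        nucleus.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break

    weights = [probs[i] for i in nucleus]  # random.choices renormalises these
    return rng.choices(nucleus, weights=weights, k=1)[0], nucleus


toy_logits = [4.0, 3.0, 1.0, -2.0]  # hypothetical scores for a 4-token vocabulary
token_id, nucleus = sample_top_p(toy_logits, rng=random.Random(0))
print(token_id, nucleus)
```

With these toy scores only the two most likely tokens survive the cut, which is how top-p prevents low-probability tokens from ever being drawn.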


In [22]:
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)

print(response)

1) The term "gradient checkpointing" refers to a method of training a neural network where the model is first trained on a large dataset (e.g., millions of images), then given a new image it has never seen before, and its predictions for that image are made with the model's weights from the previous training phase still intact.
2) This allows the network to quickly adapt to new data without having to retrain the whole thing again. 
3) Gradient checkpointing can also reduce overfitting by only updating parameters during training, while leaving other components unchanged. 

In practice, this means that when you train your model using gradient checkpointing, you'll have a smaller final model after training because some of the older information will not be used anymore. However, if you use gradient checkpointing early on in your training process, your model should be more robust against unseen inputs later down the line. 



In [23]:
merged = model.merge_and_unload()
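
Conceptually, merging folds the scaled low-rank update into the frozen weight, W' = W + (alpha / r) · B · A, so inference needs one matrix multiply instead of two paths. The sketch below checks that identity on tiny hand-written matrices. All values and the helpers `matmul` and `add_scaled` are hypothetical; it illustrates the arithmetic and is not how PEFT implements `merge_and_unload()`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


R, ALPHA = 2, 1     # toy rank and alpha; the notebook uses r=16, lora_alpha=8
SCALE = ALPHA / R   # 0.5, the same ratio as 8 / 16

W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # frozen base weight, shape (out=2, in=3)
A = [[0.5, -1.0, 0.0], [1.0, 0.0, 2.0]]   # LoRA down-projection, shape (r, in)
B = [[2.0, 0.0], [-1.0, 1.0]]             # LoRA up-projection, shape (out, r)
x = [[1.0], [2.0], [3.0]]                 # one input column vector

# Adapter path: base output plus the scaled low-rank correction.
adapter_y = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), SCALE)

# Merged path: fold the correction into the weight once, then do a single matmul.
W_merged = add_scaled(W, matmul(B, A), SCALE)
merged_y = matmul(W_merged, x)

print(adapter_y, merged_y)  # → [[5.5], [3.25]] [[5.5], [3.25]]
```

Note that `merged` is not used again in this notebook: the upload cell below reloads the base model and pushes only the adapter weights.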

In [24]:
!huggingface-cli login



    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Token is valid (permission: fineGrained).
The token `HFT` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenti

In [25]:
hf_repo = "omkarwazulkar/Llama-3.2-1B-LoRA-HuggingFaceH4"

In [26]:
from peft import PeftModel
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B"
lora_path = "/content/llama3.2-1b-lora/checkpoint-1186"

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype="auto",
)

model = PeftModel.from_pretrained(base_model, lora_path)

# Only the LoRA adapter weights and the tokenizer are uploaded, not the merged model.
model.push_to_hub(hf_repo)
tokenizer.push_to_hub(hf_repo)

CommitInfo(commit_url='https://huggingface.co/omkarwazulkar/Llama-3.2-1B-LoRA-HuggingFaceH4/commit/8be620e411a2343e413b3dcd02f641636b6dffc3', commit_message='Upload tokenizer', commit_description='', oid='8be620e411a2343e413b3dcd02f641636b6dffc3', pr_url=None, repo_url=RepoUrl('https://huggingface.co/omkarwazulkar/Llama-3.2-1B-LoRA-HuggingFaceH4', endpoint='https://huggingface.co', repo_type='model', repo_id='omkarwazulkar/Llama-3.2-1B-LoRA-HuggingFaceH4'), pr_revision=None, pr_num=None)