### Fine-tuning the Llama 3.2 1B base model for machine translation

In this notebook, we fine-tune an open-source large language model, specifically the Llama 3.2 1B base model, using the QLoRA technique. The downstream task is machine translation from a source language to a target language (English to Spanish here). The code uses Hugging Face libraries such as `transformers`, `peft`, `datasets`, and `bitsandbytes`. I hope you enjoy!

#### Environment Setup

In [1]:
!pip install peft datasets bitsandbytes

Collecting peft
  Downloading peft-0.13.1-py3-none-any.whl.metadata (13 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Downloading peft-0.13.1-py3-none-any.whl (320 kB)
Downloading bitsandbytes-0.44.1-py3-none-manylinux_2_24_x86_64.whl (122.4 MB)
Installing collected packages: bitsandbytes, peft
Successfully installed bitsandbytes-0.44.1 peft-0.13.1


In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model
from huggingface_hub import login
from datasets import Dataset
import torch
import os

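device = "cuda" if torch.cuda.is_available() else "cpu"
# read the access token from the environment instead of hardcoding it in the notebook
# (assumes it was exported beforehand as HF_TOKEN)
token = os.environ["HF_TOKEN"]
login(token)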

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


#### Initialize model and tokenizer

In [4]:
checkpoint = "meta-llama/Llama-3.2-1B"

# define qlora config using bitsandbytes
qlora_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4'
)

# initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side='left')

# initialize model with qlora config
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    quantization_config=qlora_config
).to(device)

# add pad token to tokenizer if not already present
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    model.resize_token_embeddings(len(tokenizer))
    model.config.pad_token_id = tokenizer.pad_token_id
    model.generation_config.pad_token_id = tokenizer.pad_token_id

tokenizer_config.json:   0%|          | 0.00/50.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/301 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/843 [00:00<?, ?B/s]

`low_cpu_mem_usage` was None, now set to True since model is quantized.


model.safetensors:   0%|          | 0.00/2.47G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/185 [00:00<?, ?B/s]

In [5]:
# check model parameters data type

for param in model.parameters():
    print(param.dtype)
    break

torch.bfloat16
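
To get a feel for what 4-bit loading buys us, here is a rough back-of-envelope estimate. It assumes that the 2.47 GB `model.safetensors` file from the download log stores every weight in bfloat16 (2 bytes each), and it pretends that all weights end up in 4-bit. That is an upper bound on the savings, not a measurement: the first parameter printed above is still `torch.bfloat16`, so at least some tensors stay unquantized, and the quantization constants add some overhead.

```python
# Back-of-envelope weight-memory estimate (a sketch, not a measurement).
# Assumption: the 2.47 GB checkpoint holds every weight in bfloat16.
CHECKPOINT_BYTES = 2.47e9
BYTES_PER_BF16 = 2
BITS_PER_NF4 = 4


def estimate_param_count(checkpoint_bytes: float, bytes_per_param: int) -> float:
    """Infer the number of parameters from the checkpoint size."""
    return checkpoint_bytes / bytes_per_param


def estimate_weight_bytes(param_count: float, bits_per_param: int) -> float:
    """Memory needed to store `param_count` weights at the given bit width."""
    return param_count * bits_per_param / 8


n_params = estimate_param_count(CHECKPOINT_BYTES, BYTES_PER_BF16)
nf4_bytes = estimate_weight_bytes(n_params, BITS_PER_NF4)

print(f"approx. parameters: {n_params / 1e9:.2f}B")
print(f"approx. weight memory if everything were 4-bit: {nf4_bytes / 1e9:.2f} GB")
```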


#### Run inference before fine tuning

In [4]:
# inference on base model

inputs = ["hello, my name is nitish pandey", "hello"]
input_tok = tokenizer(inputs, padding=True, truncation=True, return_tensors="pt").to(device)
output = model.generate(**input_tok, max_length=300)
print(tokenizer.decode(output[1]))

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


KeyboardInterrupt: 

#### Create train and test dataset

In [5]:
source_lang = "English"
target_lang = "Spanish"
source_file_path = "/kaggle/input/en-es-machine-translation/Total_Data_en-es 5/Total_Data_en-es.en"
target_file_path = "/kaggle/input/en-es-machine-translation/Total_Data_en-es 5/Total_Data_en-es.es"

# processing the input sentences and tokenization
def process(examples):
    prompt_template = """{src_lang}:
{source_sentence}
{tgt_lang}:
{target_sentence}{end_of_seq_token}"""
    prompt_examples = [
        prompt_template.format(
            src_lang=source_lang,
            source_sentence=source_sentence.strip(),
            tgt_lang=target_lang,
            target_sentence=examples['target'][idx].strip(),
            end_of_seq_token=tokenizer.eos_token
        ) for idx, source_sentence in enumerate(examples['source'])
    ]
    tokenized_input = tokenizer(prompt_examples, padding=False, truncation=True)
    return tokenized_input

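# input data (context managers close the files; UTF-8 is assumed for the accented Spanish text)
with open(source_file_path, encoding="utf-8") as source_file:
    source_list = source_file.readlines()
with open(target_file_path, encoding="utf-8") as target_file:
    target_list = target_file.readlines()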

# dataset initialization
dataset = Dataset.from_dict({"source": source_list, "target": target_list})
dataset = dataset.map(process, batched=True)
dataset = dataset.remove_columns(["source", "target"])

# train test split
dataset = dataset.train_test_split(test_size=0.1, shuffle=True)
train_dataset = dataset['train']
test_dataset = dataset['test']

Map:   0%|          | 0/5480 [00:00<?, ? examples/s]
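
To make the prompt layout concrete, here is a tokenizer-free sketch of what `process` builds for one sentence pair. The sentence pair is made up for illustration (it is not from the dataset), and the end-of-sequence string is hard-coded as an assumption, whereas the notebook reads it from `tokenizer.eos_token`.

```python
# Standalone sketch of the prompt layout used in `process`.
# Assumption: "<|end_of_text|>" stands in for tokenizer.eos_token.
EOS_TOKEN = "<|end_of_text|>"

PROMPT_TEMPLATE = (
    "{src_lang}:\n"
    "{source_sentence}\n"
    "{tgt_lang}:\n"
    "{target_sentence}{end_of_seq_token}"
)


def build_prompt(source_sentence: str, target_sentence: str,
                 src_lang: str = "English", tgt_lang: str = "Spanish") -> str:
    """Format one training example; strip() drops the trailing newline left by readlines()."""
    return PROMPT_TEMPLATE.format(
        src_lang=src_lang,
        source_sentence=source_sentence.strip(),
        tgt_lang=tgt_lang,
        target_sentence=target_sentence.strip(),
        end_of_seq_token=EOS_TOKEN,
    )


# Hypothetical sentence pair, only for illustration.
example = build_prompt("Good morning.\n", "Buenos días.\n")
print(example)
```

Ending every example with the end-of-sequence token is what teaches the model to stop after the translation instead of continuing to generate text.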

#### Initialize LoRA adapter and create peft model

In [6]:
# configure lora
config = LoraConfig(
    r=4,
    lora_alpha=8,
    lora_dropout=0.2
)

peft_model = get_peft_model(model, config, adapter_name="en-es-adapter-1")

#### Define training arguments and initialize trainer object

In [None]:
# configure training arguments
trainer_config = TrainingArguments(
    output_dir="/kaggle/working/",
    per_device_train_batch_size=4,
    torch_empty_cache_steps=2,
    num_train_epochs=3,
    optim="paged_adamw_32bit",
    eval_strategy="steps",
    eval_steps=200,
    eval_accumulation_steps=1,
    warmup_ratio=0.1,
    logging_steps=10
)

# initialize trainer
trainer = Trainer(
    model=peft_model,
    args=trainer_config,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    train_dataset=train_dataset,
    eval_dataset=test_dataset
)

In [None]:
peft_model.print_trainable_parameters()
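
The cell above has no saved output, so here is a rough estimate of the trainable parameter count to compare against. It rests on assumptions that are not shown in this notebook: that `peft` adapts only `q_proj` and `v_proj` when `target_modules` is not set, and that Llama 3.2 1B has 16 layers, a hidden size of 2048, and a key/value projection width of 512. Treat the result as a sanity check, not as the authoritative number.

```python
# Rough LoRA parameter count for r=4 (a sketch under the stated assumptions).
# Assumed model shapes, not read from the actual config:
HIDDEN_SIZE = 2048   # input width of q_proj / v_proj
NUM_LAYERS = 16      # transformer blocks
KV_DIM = 512         # output width of v_proj (8 KV heads x head_dim 64)

# Values taken from the LoraConfig in this notebook:
LORA_RANK = 4
LORA_ALPHA = 8


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer."""
    return rank * (in_features + out_features)


q_proj_params = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
v_proj_params = lora_params(HIDDEN_SIZE, KV_DIM, LORA_RANK)
total_trainable = (q_proj_params + v_proj_params) * NUM_LAYERS

# The low-rank update is multiplied by alpha / r before being added to the frozen weight.
scaling = LORA_ALPHA / LORA_RANK

print(f"estimated trainable parameters: {total_trainable:,}")
print(f"LoRA scaling factor (alpha / r): {scaling}")
```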

When we run `trainer.train()`, the Trainer asks for a Weights & Biases (W&B) API key for logging purposes. W&B lets you watch the training logs in real time and manage your training runs through a well-designed UI. I like using it.

In [9]:
# start fine tuning

trainer.train()
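
Before launching a long run, it helps to know roughly how many optimizer steps to expect. The sketch below derives that from the values used in this notebook (5480 examples, a 10% test split, batch size 4, 3 epochs, `warmup_ratio=0.1`, `eval_steps=200`). It assumes a single GPU, no gradient accumulation, and that the split sizes and warmup steps are rounded up, so the real progress bar may differ slightly.

```python
import math

# Values from this notebook
NUM_EXAMPLES = 5480
TEST_PERCENT = 10        # test_size=0.1
BATCH_SIZE = 4           # per_device_train_batch_size
NUM_EPOCHS = 3
WARMUP_RATIO = 0.1
EVAL_STEPS = 200

# Assumptions: one device, gradient_accumulation_steps=1, rounding up where needed.
n_test = math.ceil(NUM_EXAMPLES * TEST_PERCENT / 100)
n_train = NUM_EXAMPLES - n_test
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
num_evals = total_steps // EVAL_STEPS

print(f"train / test examples: {n_train} / {n_test}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"warmup steps: {warmup_steps}, evaluations during training: {num_evals}")
```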

#### Run inference after fine tuning

In [20]:
inputs = ["""English:
Hello, my name is Nitish Pandey
Spanish:
""", """English:
There's a bird sitting on the branch of the tree
Spanish:
"""]
input_tok = tokenizer(inputs, padding=True, truncation=True, return_tensors="pt").to(device)
output = model.generate(**input_tok, max_length=300)
print(tokenizer.decode(output[0]))

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


[PAD][PAD]<|begin_of_text|>English:
Hello, my name is Nitish Pandey
Spanish:
Hola, mi nombre es Nitish Pandey<|end_of_text|><|end_of_text|><|end_of_text|><|end_of_text|><|end_of_text|>


In [14]:
print(tokenizer.decode(output[1]))

<|begin_of_text|>English:
There's a bird sitting on the branch of the tree
Spanish:
Hay una paloma en el árbol<|end_of_text|>
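
The decoded outputs above still contain padding and special tokens around the translation. Here is a minimal sketch for pulling out just the Spanish text with plain string handling. The helper name `extract_translation` and the token list are my own choices for illustration; passing `skip_special_tokens=True` to `tokenizer.decode` is another way to drop the special tokens.

```python
# Minimal post-processing sketch: keep only the text after the target-language header.
# The token strings below are copied from the decoded output shown above.
SPECIAL_TOKENS = ("[PAD]", "<|begin_of_text|>", "<|end_of_text|>")


def extract_translation(decoded: str, tgt_lang: str = "Spanish") -> str:
    """Remove special tokens, then return whatever follows '<tgt_lang>:' on its own line."""
    for token in SPECIAL_TOKENS:
        decoded = decoded.replace(token, "")
    # partition() yields an empty translation if the marker is missing
    _, _, translation = decoded.partition(f"{tgt_lang}:\n")
    return translation.strip()


decoded = (
    "[PAD][PAD]<|begin_of_text|>English:\n"
    "Hello, my name is Nitish Pandey\n"
    "Spanish:\n"
    "Hola, mi nombre es Nitish Pandey<|end_of_text|><|end_of_text|>"
)
translation = extract_translation(decoded)
print(translation)
```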

