### Gemma 2 QLoRA Adapter Training Script

This script uses Hugging Face (HF) Transformers and PEFT to train QLoRA adapters for Gemma 2.

Script Overview:
- Loads the base Gemma 2 model (`google/gemma-2-9b-it`) in 4-bit with double quantisation
- Loads the finetuning datasets from the HF Hub
- Initialises a new PEFT model for training with QLoRA adapters
- Configures the Supervised Finetuning Trainer (`SFTTrainer`)
- Trains the QLoRA adapter and pushes it to the HF Hub

The adapters produced by this script are held on the HF Hub and downloadable at https://huggingface.co/sbhikha

List of trained adapters:
- Working adapters:
    - ENG Adapter: "sbhikha/Gemma9B_Inkuba_LoRa_eng_v1"
    - XHO Adapter: "sbhikha/Gemma9B_Inkuba_LoRa_xho_v1"
    - ZUL Adapter: "sbhikha/Gemma9B_Inkuba_LoRa_zul_v1"
    - COMBINED Adapter: "sbhikha/Gemma9B_Inkuba_LoRa_combined_v1"
- Failed:
    - sbhikha/Gemma9B_Inkuba_LoRA_adapter_v1 (Trained on Sliced Datasets)
    - sbhikha/Gemma9B_Inkuba_LoRA_eng_v2
    - sbhikha/Gemma9B_Inkuba_LoRA_eng_v3
    - sbhikha/Gemma9B_Inkuba_LoRA_zul_v3
    - sbhikha/Gemma9B_Inkuba_LoRA_xho_v4
    - sbhikha/Gemma9B_Inkuba_LoRA_combined_v5
    - sbhikha/Gemma9B_Inkuba_LoRA_combined_v4

### Configuration

In [1]:
# Check GPU
! nvidia-smi

Thu Feb 27 07:34:01 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.94                 Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3080      WDDM  |   00000000:01:00.0  On |                  N/A |
| 58%   44C    P8             55W /  370W |    1462MiB /  10240MiB |     10%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [None]:
# # Install required packages
# ! pip install -U bitsandbytes
# ! pip install huggingface_hub
# ! pip install transformers
# ! pip install pandas
# ! pip install accelerate
# ! pip install --upgrade torch torchvision torchaudio
# ! pip install ipython
# ! pip install peft
# ! pip install trl
# ! pip install datasets
# ! pip install flash-attn --no-build-isolation
# ! pip install -U wandb

In [5]:
# Log into HF Hub
from huggingface_hub import login
import os

HF_TOKEN = os.getenv("HF_TOKEN")

login(token=HF_TOKEN)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to C:\Users\Sandil\.cache\huggingface\token
Login successful


In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

# print(torch.cuda.device_count())
# print(torch.cuda.get_device_name(0))

### Load Gemma 2 Base

In [4]:
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")

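# 8-bit alternative, kept for reference (not used below)
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# 4-bit loading with double quantisation; matrix multiplications run in bfloat16
double_quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)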

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    quantization_config=double_quant_config,
    device_map="auto",
    attn_implementation='eager'
)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
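
To get a feel for what `bnb_4bit_use_double_quant=True` saves, the sketch below estimates the storage spent on quantisation constants. The block sizes (64 weights per constant, 256 constants per second-level constant) are the values described in the QLoRA paper and are assumed here, not read from the loaded model; the function names are illustrative. The parameter count is the total minus trainable figure printed in the adapter cell of this notebook. The estimate treats every weight as 4-bit, so it is only a rough figure: embeddings and norm layers stay in higher precision.

```python
def constant_overhead_bits(block_size=64, double_quant=False, dq_block_size=256):
    """Bits per weight spent on quantisation constants (assumed QLoRA-paper block sizes)."""
    if not double_quant:
        # One 32-bit constant per block of weights
        return 32 / block_size
    # 8-bit constants, plus one 32-bit second-level constant per dq_block_size constants
    return 8 / block_size + 32 / (block_size * dq_block_size)


def weight_memory_gib(n_params, double_quant):
    """Rough weight memory in GiB if every parameter were stored in 4-bit."""
    bits_per_weight = 4 + constant_overhead_bits(double_quant=double_quant)
    return n_params * bits_per_weight / 8 / 2**30


# Base-model parameters: total minus trainable, as printed by this notebook
n_params = 9_457_778_176 - 216_072_192

plain = weight_memory_gib(n_params, double_quant=False)
double = weight_memory_gib(n_params, double_quant=True)
print(f"4-bit only:           {plain:.2f} GiB")
print(f"4-bit + double quant: {double:.2f} GiB (saves {plain - double:.2f} GiB)")
```

Under these assumptions the constants cost 0.5 bits per weight without double quantisation and about 0.127 bits per weight with it.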

In [5]:
prompt = "Translate the following [SENTENCE] from {SRC_LANG} to {TGT_LANG}. Return only the [TRANSLATION] in {TGT_LANG}. [SENTENCE] {SRC_LANG}: {SRC_SENT}. [TRANSLATION] {TGT_LANG}:"
prompt_2 = "<|system|> You are a machine translation assistant. Your task is to translate the user's input sentence from {SRC_LANG} into {TGT_LANG}.<|end|> <|user|> {SRC_SENT} <|end|> <|assistant|>"
tgt_lang = "English"
src_lang = "isiZulu"
src_sent = "Izibalo zangaphakathi: Isidingo sezibalo sizokwanda njengoba kugxila intandoyeningi"

input_ids = tokenizer(prompt_2.format(SRC_LANG=src_lang, TGT_LANG=tgt_lang, SRC_SENT=src_sent), return_tensors="pt").to("cuda:0")

outputs = model.generate(**input_ids, max_new_tokens=200)
translation = tokenizer.decode(outputs[0])
print(translation)



<bos><|system|> You are a machine translation assistant. Your task is to translate the user's input sentence from isiZulu into English.<|end|> <|user|> Izibalo zangaphakathi: Isidingo sezibalo sizokwanda njengoba kugxila intandoyeningi <|end|> <|assistant|> Internal calculations: The calculation process will increase as the complexity grows. <|end|><end_of_turn><eos>
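
The decoded string above echoes the whole prompt and keeps the special tokens. A minimal sketch for pulling out just the translation follows. `extract_translation` is a hypothetical helper, not part of Transformers, and the regular expression assumes special tokens look like `<eos>`, `<end_of_turn>` or `<|end|>`, so it would also strip any similar-looking text in a real translation.

```python
import re


def extract_translation(decoded, marker="<|assistant|>"):
    """Return the text generated after the last assistant marker, without special tokens."""
    # Keep only what follows the final marker; if the marker is absent, keep everything
    _, _, tail = decoded.rpartition(marker)
    # Assumption: special tokens are lowercase names wrapped in <...> or <|...|>
    tail = re.sub(r"<\|?[a-z_]+\|?>", " ", tail)
    # Collapse the whitespace left behind by the removed tokens
    return " ".join(tail.split())


# The decoded output printed above
decoded = (
    "<bos><|system|> You are a machine translation assistant. Your task is to "
    "translate the user's input sentence from isiZulu into English.<|end|> "
    "<|user|> Izibalo zangaphakathi: Isidingo sezibalo sizokwanda njengoba "
    "kugxila intandoyeningi <|end|> <|assistant|> Internal calculations: "
    "The calculation process will increase as the complexity grows. "
    "<|end|><end_of_turn><eos>"
)
print(extract_translation(decoded))
```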


### Initialise Weights & Biases
- Used for keeping track of training statistics

In [6]:
import os
import wandb

# Read the W&B API key from the environment; never hard-code secrets in a notebook
wb_token = os.getenv("WANDB_API_KEY")

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune Gemma-2-9b-it on Inkuba Eng (V2)', 
    job_type="training", 
    anonymous="allow"
)

[34m[1mwandb[0m: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33msandil-bhikha[0m ([33msandil-bhikha-university-of-liverpool[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/ubuntu/.netrc


### Load Finetuning Data

In [13]:
from datasets import load_dataset, concatenate_datasets

DATA = "sbhikha/Inkuba_{lang}_{set}_instruction_tuning"

eng_train = load_dataset(DATA.format(lang="english", set="train"))
eng_dev = load_dataset(DATA.format(lang="english", set="dev"))

zul_train = load_dataset(DATA.format(lang="isizulu", set="train"))
zul_dev = load_dataset(DATA.format(lang="isizulu", set="dev"))

xho_train = load_dataset(DATA.format(lang="xhosa", set="train"))
xho_dev = load_dataset(DATA.format(lang="xhosa", set="dev"))

train_data = concatenate_datasets([eng_train["train"], zul_train["train"], xho_train["train"]])
dev_data = concatenate_datasets([eng_dev["train"], zul_dev["train"], xho_dev["train"]])

In [21]:
train_data
# dev_data

Dataset({
    features: ['text_input', 'output', '__index_level_0__'],
    num_rows: 58851
})

### PEFT (QLoRA) Config

In [22]:
# Load PEFT and prepare model
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

import bitsandbytes as bnb

# Find trainable modules
def find_all_linear_names(model):
    """Return the leaf names of all 4-bit linear layers, for use as LoRA target modules."""
    cls = bnb.nn.Linear4bit
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            # Keep only the last path component, e.g. "q_proj"
            lora_module_names.add(name.split('.')[-1])
    # needed for 16-bit: never target the output head
    lora_module_names.discard('lm_head')
    return list(lora_module_names)

In [23]:
# Print list of trainable modules
modules = find_all_linear_names(model)
print(modules)

['v_proj', 'k_proj', 'gate_proj', 'down_proj', 'q_proj', 'o_proj', 'up_proj']
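
The name handling in `find_all_linear_names` can be tried without loading a model. The sketch below applies the same leaf-name logic to a hand-written list of module paths. The paths are assumptions that follow the usual Transformers layout for decoder layers, and `leaf_names` is an illustrative helper, not a PEFT or bitsandbytes function.

```python
def leaf_names(module_names, exclude=("lm_head",)):
    """Collect the last path component of each module name, minus excluded names."""
    leaves = {name.split(".")[-1] for name in module_names}
    return sorted(leaves - set(exclude))


# Hypothetical module paths for one decoder layer plus the output head
example_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]
print(leaf_names(example_names))
```

Sorting the result makes the target list reproducible between runs, which a plain `set` does not guarantee.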


In [25]:
from peft import LoraConfig, get_peft_model

# Configure LoRA hyperparameters
lora_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=modules,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Wrap the base model with the LoRA adapters
model = get_peft_model(model, lora_config)

In [26]:
# Print adapter % of model (trainable parameters)
trainable, total = model.get_nb_trainable_parameters()
print(f"Trainable: {trainable} | total: {total} | Percentage: {trainable/total*100:.4f}%")

Trainable: 216072192 | total: 9457778176 | Percentage: 2.2846%
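
The trainable count can be checked by hand: a LoRA adapter on a linear layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` parameters. The sketch below sums this over the seven target modules. The layer dimensions are assumptions taken from the published Gemma 2 9B configuration (42 layers, hidden size 3584, MLP size 14336, 16 query heads and 8 key/value heads of dimension 256), not values read from the loaded model. With those dimensions the sum comes to 216,072,192, matching the count printed above.

```python
RANK = 64         # r in lora_config
LORA_ALPHA = 32   # lora_alpha in lora_config

# Assumed Gemma 2 9B dimensions (not read from the loaded model)
NUM_LAYERS = 42
HIDDEN = 3584
INTERMEDIATE = 14336
Q_DIM = 16 * 256    # query heads * head dimension
KV_DIM = 8 * 256    # key/value heads * head dimension

# (in_features, out_features) for each targeted projection
shapes = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """Parameters in the A (r x in) and B (out x r) matrices of one adapter."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
trainable = per_layer * NUM_LAYERS
total = 9_457_778_176  # total reported by get_nb_trainable_parameters()

print(f"Per layer: {per_layer:,} | trainable: {trainable:,}")
print(f"Percentage: {trainable / total * 100:.4f}%")
print(f"LoRA scaling (lora_alpha / r): {LORA_ALPHA / RANK}")
```

The last line prints the factor by which PEFT scales the adapter update under its default (non-rank-stabilised) setting; with `r=64` and `lora_alpha=32` the update is halved.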


### Configure Trainer

In [27]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

folder = "Gemma9B_QloRA_Adapter"

from trl import SFTTrainer

tokenizer.pad_token = tokenizer.eos_token
torch.cuda.empty_cache()

# Create Supervised Fine Tuning Job
trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=dev_data,
    dataset_text_field="text_input",
    peft_config=lora_config,
    args=TrainingArguments(             # Hyperparameters
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=3,
        max_steps=100,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="Gemma9B_QLoRA Adapter",
        optim="paged_adamw_8bit",
        save_strategy="epoch",
        eval_strategy="epoch",
        report_to="wandb"
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Map:   0%|          | 0/58851 [00:00<?, ? examples/s]

Map:   0%|          | 0/50475 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
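
Because `max_steps=100` overrides the epoch count, it helps to work out how much of the training set the run actually sees. The sketch below does the arithmetic from the hyperparameters above, assuming a single GPU (`num_devices = 1` is an assumption, not something the trainer reports here).

```python
per_device_batch = 4     # per_device_train_batch_size
grad_accum = 4           # gradient_accumulation_steps
max_steps = 100          # max_steps
num_devices = 1          # assumption: single-GPU run
train_rows = 58_851      # num_rows of train_data

# Samples contributing to each optimiser step
effective_batch = per_device_batch * grad_accum * num_devices
# Samples consumed over the whole run
samples_seen = effective_batch * max_steps
# Share of one pass over the training data
epoch_fraction = samples_seen / train_rows

print(f"Effective batch size: {effective_batch}")
print(f"Samples seen: {samples_seen} of {train_rows}")
print(f"Fraction of one epoch: {epoch_fraction:.5f}")
```

The run therefore covers under 3% of one epoch, which is consistent with the `train/epoch` value logged at the end of training.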


### Train Model

In [29]:
train_result = trainer.train()

Epoch,Training Loss,Validation Loss
0,0.8516,0.943896


In [30]:
model.push_to_hub("sbhikha/Gemma9B_Inkuba_LoRa_new")
wandb.finish()
model.config.use_cache = True

adapter_model.safetensors:   0%|          | 0.00/864M [00:00<?, ?B/s]


Run history:
eval/loss,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁▁▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇████
train/global_step,▁▁▁▁▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇█████
train/grad_norm,▇█▃▄▂▁▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/learning_rate,▃████▇▇▇▇▆▆▆▆▆▆▅▅▅▄▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁▁
train/loss,███▄▂▂▂▂▁▁▁▂▁▂▁▁▁▂▂▁▂▂▁▁▁▂▂▁▁▁▁▂▁▁▁▁▁▁▁▁
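
The `train/learning_rate` trace rises for a few steps and then decays to zero. That shape is what a linear warmup followed by linear decay produces, which is the default scheduler type in `TrainingArguments`. The sketch below reimplements that schedule in plain Python for the values used here. `linear_schedule_lr` is an illustrative function, not the Transformers implementation.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=3, max_steps=100):
    """Learning rate at a given optimiser step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at max_steps
    remaining = max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))
    return base_lr * remaining


for step in (0, 1, 3, 50, 100):
    print(f"step {step:>3}: lr = {linear_schedule_lr(step):.6f}")
```

With only 3 warmup steps the peak rate of `2e-4` is reached almost immediately, so most of the 100 steps are spent on the decay.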

Run summary:
eval/loss,0.9439
eval/runtime,4397.3963
eval/samples_per_second,11.478
eval/steps_per_second,1.435
total_flos,5950794723643392.0
train/epoch,0.02719
train/global_step,100.0
train/grad_norm,0.40107
train/learning_rate,0.0
train/loss,0.8516
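
The evaluation figures can be cross-checked against each other. The sketch below assumes the 50,475 rows shown in the second `Map` progress bar are the full dev set, and derives the throughput and the implied evaluation batch size from the logged runtime.

```python
eval_rows = 50_475          # assumption: size of dev_data, from the second Map progress bar
eval_runtime_s = 4397.3963  # eval/runtime
steps_per_second = 1.435    # eval/steps_per_second

# Throughput implied by the row count and runtime
samples_per_second = eval_rows / eval_runtime_s
# Samples per evaluation step implied by the two throughput figures
implied_eval_batch = samples_per_second / steps_per_second

print(f"Samples per second: {samples_per_second:.3f}")
print(f"Implied eval batch size: {implied_eval_batch:.2f}")
print(f"Eval runtime: {eval_runtime_s / 60:.1f} minutes")
```

The implied batch size of about 8 agrees with the `TrainingArguments` default for `per_device_eval_batch_size`. A single evaluation pass takes over an hour, so evaluating on a subset of the dev data may be worth considering for short runs like this one.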
