### LoRA Fine-tuning Gemma-2B

This notebook is made for LoRA fine-tuning Gemma-2B. LoRA is a parameter efficient fine-tuning technique that only adjusts few parameters instead of full fine-tuning of the model, thus, it's faster. We will be using [VMWare/open-instruct](https://huggingface.co/datasets/VMware/open-instruct) dataset that has instructions. To apply LoRA, we'll use [PEFT](https://huggingface.co/docs/peft/index) library and for supervised instruction tuning, we will use `SFTTrainer` from [TRL](https://huggingface.co/docs/trl/en/index).

In [1]:
!python3 -m venv .venv
!source .venv/bin/activate
!nvidia-smi

Sun Mar 17 16:03:38 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 546.01       CUDA Version: 12.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  Off |
|  0%   43C    P8    39W / 450W |    525MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
!pip install python-dotenv



In [3]:
!pip install -q -U transformers peft accelerate datasets trl bitsandbytes scipy

In [4]:
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)


  from .autonotebook import tqdm as notebook_tqdm


Login to Hugging Face Hub, since Gemma-2B has gated access and login confirms that you have access to the model. If you don't have an access, get it from the model repository [here](https://huggingface.co/google/gemma-2b) your request will shortly be accepted.

In [5]:
from huggingface_hub import login
import os
from dotenv import load_dotenv
load_dotenv()
login(token=os.getenv('HF_TOKEN'))

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.


Token is valid (permission: write).
Your token has been saved to /home/katopz/.cache/huggingface/token
Login successful


We'll shrink the model even further by loading it in 4bit using `bitsandbytes`. Then initialize the model with the CausalLM head and initialize the tokenizer.

In [6]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import os

# model_id = "Qwen/Qwen1.5-7B"
# model_id = "Qwen/Qwen1.5-7B-Chat"
# model_id = "sail/Sailor-7B"
model_id = "sail/Sailor-7B-Chat"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 4/4 [00:06<00:00,  1.56s/it]


Load the dataset.

In [7]:
from datasets import load_dataset

data = load_dataset("json", data_files="./ava-dataset-gemma.json" , split="train")


In [8]:
data

Dataset({
    features: ['text_column'],
    num_rows: 55
})

Depending on your dataset prompts, you might want to truncate and handle overflowing tokens like below. If you keep it like this, your prompts will be truncated though and you will have bad results. 😔 So adjust the below cell depending on what you need.

In [9]:
def tokenize_dataset(ds):
  result = tokenizer(ds["text_column"], truncation=True,
                       max_length=512)
  #sample_map = result.pop("overflow_to_sample_mapping")
  #for key, values in ds.items():
  #  result[key] = [values[i] for i in sample_map]
  #  print(result[key])
  return result

In [10]:
ds = data.map(tokenize_dataset)

In [11]:
ds

Dataset({
    features: ['text_column', 'input_ids', 'attention_mask'],
    num_rows: 55
})

Initializing `SFTTrainer` from TRL is all you need!

Small note: if your dataset needs formatting, you can write a formatting function and pass it. You need to either pass `formatting_func` or `dataset_text_field` if your dataset text field doesn't need any formatting and you did your preprocessing beforehand.

Then simply call ` train`. Note that this notebook is built for educational purposes so you might need to adjust the hyperparameters to your own use case.

In [12]:
import transformers
from trl import SFTTrainer


trainer = SFTTrainer(
    model=model,
    train_dataset=ds,
    dataset_text_field="text_column",
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=0.03,
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    #formatting_func=formatting_func,
)
trainer.train()


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Step,Training Loss
1,3.5973
2,3.1978
3,2.4753
4,2.4119
5,2.3441
6,1.9113
7,2.1862
8,2.1992
9,1.9443
10,1.6759


TrainOutput(global_step=100, training_loss=0.8482163016498089, metrics={'train_runtime': 94.1227, 'train_samples_per_second': 4.25, 'train_steps_per_second': 1.062, 'total_flos': 3095262691614720.0, 'train_loss': 0.8482163016498089, 'epoch': 7.27})

In [17]:
text = "วันที่ 29 ธันวาคม 2023 Ethereum Dominance มีอัตราส่วนเท่าไร ### Response:"

device = "cuda"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


วันที่ 29 ธันวาคม 2023 Ethereum Dominance มีอัตราส่วนเท่าไร ### Response: 19.91% โดยมูลค่าตลาดรวมของ $ETH มีมูลค่าตลาดอยู่ที่ $168.87B ซึ่งลดลงจากต้นปีที่ 14.44% โดยในวันที่ 29 ธันวาคม 2023 Ethereum Dominance มีอัตราส่วนเท่าไร ### Response: 19.91% โดยม


fact: วันที่ 29 ธันวาคม 2023 Ethereum Dominance ปรับตัวลดลงจากต้นปีที่ 17.81% มาอยู่ที่ 15.92%

## Optional reload lora from file.

In [14]:
output_directory = "./"
peft_model_path = os.path.join(output_directory, f"lora_model")

trainer.model.save_pretrained(peft_model_path)

In [15]:
import peft
from peft import PeftModel

#Load the Model.
foundation_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
loaded_model = PeftModel.from_pretrained(foundation_model,
                                        peft_model_path,
                                        is_trainable=False)

Loading checkpoint shards: 100%|██████████| 4/4 [00:07<00:00,  1.78s/it]


In [16]:
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


วันที่ 29 ธันวาคม 2023 Ethereum Dominance มีอัตราส่วนเท่าไร ### Response: 19.91% โดยมูลค่าตลาดรวมของ $ETH มีมูลค่าตลาดอยู่ที่ $168.87B ซึ่งลดลงจากต้นปีที่ 14.44% โดยในวันที่ 29 ธันวาคม 2023 Ethereum Dominance มีอัตราส่วนเท่าไร ### Response: 19.91% โดยม
