<a href="https://colab.research.google.com/github/mshojaei77/Awesome-Fine-tuning/blob/main/Gemma_Fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### LoRA Fine-tuning Gemma-2B

This notebook walks through LoRA fine-tuning of Gemma-2B. LoRA is a parameter-efficient fine-tuning technique: instead of updating all of the model's weights, it trains only a small number of added low-rank parameters, which makes training faster and lighter on memory than full fine-tuning. We will use the [VMware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct) dataset, which contains instruction prompts paired with responses. To apply LoRA we'll use the [PEFT](https://huggingface.co/docs/peft/index) library, and for supervised instruction tuning we'll use `SFTTrainer` from [TRL](https://huggingface.co/docs/trl/en/index).

In [None]:
!pip install -q -U transformers peft accelerate datasets trl bitsandbytes

In [None]:
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
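
To get a feel for why LoRA trains so few parameters, the arithmetic can be sketched as follows. For a weight matrix of shape `d_out x d_in`, a rank-`r` adapter adds two small matrices with `r * (d_in + d_out)` parameters in total. The 2048 x 2048 shape below is an illustrative assumption, not a value read from the Gemma-2B configuration, and `lora_param_count` is a hypothetical helper name.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one rank-r LoRA adapter.

    The adapter factors the weight update into A (r x d_in) and B (d_out x r).
    """
    return r * d_in + d_out * r


# Illustrative shape only (assumption): a square 2048 x 2048 projection.
d_in, d_out, r = 2048, 2048, 8

full = d_in * d_out                        # parameters in the frozen weight
added = lora_param_count(d_in, d_out, r)   # parameters LoRA actually trains

print(f"full matrix: {full:,} parameters")
print(f"LoRA (r={r}): {added:,} parameters ({added / full:.2%} of the full matrix)")
```

Raising `r` increases the adapter's capacity and its parameter count linearly; the value `r=8` used above keeps the adapter small.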


Log in to the Hugging Face Hub. Gemma-2B is a gated model, so logging in confirms that your account has access to it. If you don't have access yet, request it from the [model repository](https://huggingface.co/google/gemma-2b); requests are usually accepted shortly.

In [None]:
from huggingface_hub import notebook_login
notebook_login()

We'll shrink the model's memory footprint even further by loading it in 4-bit precision with `bitsandbytes`. Then we initialize the model with its causal language modeling head and load the tokenizer.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "google/gemma-2b"

# Store weights in 4-bit NF4 and run computations in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map={"":0} places the whole model on GPU 0.
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
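
As a rough sketch of what 4-bit loading buys, the weight memory can be estimated from the parameter count and the number of bits per parameter. The figure of about 2.5 billion parameters is an approximation assumed here for illustration, and the estimate ignores quantization constants, activations, and optimizer state. `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


# Assumption: roughly 2.5 billion parameters (approximate, for illustration).
n_params = 2.5e9

for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gib(n_params, bits):.2f} GiB")
```

Actual GPU usage will be higher than the weight-only estimate, so treat these numbers as a lower bound.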


Load the dataset.

In [None]:
from datasets import load_dataset

data = load_dataset("VMware/open-instruct", split="train")


Concatenate each Alpaca-style prompt with its response.

In [None]:
# Join each prompt with its response to form one training string per example.
texts = [prompt + response for prompt, response in zip(data["alpaca_prompt"], data["response"])]

Remove unnecessary columns.

In [None]:
data = data.remove_columns(["source", "alpaca_prompt", "response", "task_name", "template_type", "instruction"])

Add the concatenated column back.

In [None]:
data = data.add_column("text_column", texts)

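Depending on how long your prompts are, you need to decide how to handle sequences that exceed `max_length`. The cell below simply truncates each example to 512 tokens, so everything past that point is dropped, and results will suffer if many of your examples are longer. 😔 Adjust the cell to fit your data, for example by raising `max_length` or by handling the overflowing tokens.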

In [None]:
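def tokenize_dataset(ds):
  # Truncate each example to at most 512 tokens; longer text is cut off.
  result = tokenizer(ds["text_column"], truncation=True, max_length=512)
  # To keep overflowing tokens instead, pass return_overflowing_tokens=True,
  # map with batched=True, and remap the other columns as sketched here:
  #sample_map = result.pop("overflow_to_sample_mapping")
  #for key, values in ds.items():
  #  result[key] = [values[i] for i in sample_map]
  return result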
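
If truncation drops too much, one alternative is to split long examples into several windows instead. The sketch below illustrates the idea on a plain list of token ids. It is not the tokenizer's own implementation, and `chunk_token_ids` and its parameters are hypothetical names.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], max_length: int, stride: int = 0) -> List[List[int]]:
    """Split token ids into windows of at most max_length tokens.

    Consecutive windows share `stride` tokens, so context at the
    boundary is not lost entirely.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")

    step = max_length - stride
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # the last window reached the end of the sequence
        start += step
    return chunks


ids = list(range(10))  # stand-in for real token ids
print(chunk_token_ids(ids, max_length=4, stride=1))
# → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each window then becomes its own training example, at the cost of some windows starting mid-response.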

In [None]:
ds = data.map(tokenize_dataset)

In [None]:
ds

Initializing `SFTTrainer` from TRL is all you need!

Small note: if your dataset needs formatting, you can write a formatting function and pass it as `formatting_func`. If the text field needs no formatting because you did your preprocessing beforehand, pass `dataset_text_field` instead. You need to provide one or the other.

Then simply call `train`. Note that this notebook is built for educational purposes, so you may need to adjust the hyperparameters for your own use case.
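
For reference, a formatting function for this dataset might look like the sketch below. It assumes each example is a dict holding the `alpaca_prompt` and `response` columns used earlier as plain strings. Whether TRL calls the function per example or per batch, and whether it expects a string or a list of strings back, depends on the installed TRL version, so check this against the documentation for your version.

```python
def formatting_func(example: dict) -> str:
    """Build one training string from a single dataset row.

    Assumption: `example` holds the `alpaca_prompt` and `response`
    columns as plain strings (one row, not a batch).
    """
    return example["alpaca_prompt"] + example["response"]


# Toy row for illustration only; not taken from the dataset.
row = {
    "alpaca_prompt": "### Instruction:\nSay hello.\n\n### Response:\n",
    "response": "Hello!",
}
print(formatting_func(row))
```

Using a formatting function this way would replace the manual concatenation and column juggling done earlier in the notebook.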

In [None]:
import transformers
from trl import SFTTrainer


trainer = SFTTrainer(
    model=model,
    train_dataset=ds,
    dataset_text_field="text_column",
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=30,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    #formatting_func=formatting_func,
)
trainer.train()
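
The batch-related settings above combine as sketched below. The single-GPU count is an assumption that matches `device_map={"":0}`, and the variable names simply mirror the `TrainingArguments` fields.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 30
num_devices = 1  # assumption: a single GPU

# Gradients from several small batches are accumulated before each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Each optimizer step consumes one effective batch.
examples_seen = effective_batch_size * max_steps

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen in {max_steps} steps: {examples_seen}")
```

With so few examples seen, this run works as a quick demonstration; a real fine-tune would likely need a larger `max_steps` or a full epoch over the data.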


In [None]:
text = "Write a news style post about a fake event, like aliens from Mars landing on Earth. It is meant to be funny but also be written in the authoritative style of a news report, kind of like The Onion. ### Response:"
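device = "cuda:0"
# Tokenize the prompt and move the tensors to the GPU.
inputs = tokenizer(text, return_tensors="pt").to(device)

# Generate up to 20 new tokens; raise max_new_tokens for a longer reply.
outputs = model.generate(**inputs, max_new_tokens=20)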

In [None]:
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
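
The decoded output includes the prompt itself. A small helper like the hypothetical `extract_response` below can keep only the text after the `### Response:` marker. It assumes the marker appears in the prompt exactly as written above.

```python
def extract_response(generated: str, marker: str = "### Response:") -> str:
    """Return the text after the last occurrence of `marker`.

    If the marker is missing, the whole string is returned unchanged.
    """
    _, found, tail = generated.rpartition(marker)
    return tail.strip() if found else generated


# Toy string for illustration only; not real model output.
sample = "Write a haiku. ### Response: Quiet autumn rain"
print(extract_response(sample))
```

In the notebook this could be applied to the decoded string, for example `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))`.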