# Fine-Tuning Gemma with PEFT and QLoRA on Your Laptop

## Overview

This tutorial runs on a laptop with an NVIDIA GPU.
Install CUDA 12.3, PyCharm or VS Code, and PyTorch 2.1.2 beforehand.

Dataset: argilla/databricks-dolly-15k-curated-en

## Setup
### Download the gemma-2b model from Hugging Face
[https://huggingface.co/google/gemma-2b/tree/main](https://huggingface.co/google/gemma-2b/tree/main)
Note: I prefer not to rely on Hugging Face's model cache, so I download the model files to a local directory and load them from there.

### Configure your wandb key

To monitor training with wandb, you must provide a wandb API key. You can get one from [https://wandb.ai](https://wandb.ai).

### Set environment variables

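Store the key in an environment variable named `wandb` (for example, in a `.env` file); the training cell reads it with `os.environ['wandb']`.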
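
As a minimal sketch, the lookup of the `wandb` variable can be made to fail with a readable message when the variable is missing. The helper name `get_wandb_key` is hypothetical and is not part of wandb or python-dotenv; the key value below is a placeholder.

```python
import os


def get_wandb_key(env=None, name="wandb"):
    """Return the API key stored under `name`, or raise a readable error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable '{name}' is not set; add it to your .env file."
        )
    return key


# Example with an explicit mapping (placeholder value, not a real key).
print(get_wandb_key({"wandb": "  my-placeholder-key  "}))
```

In the notebook, a helper like this would be called after `load_dotenv(find_dotenv())`, and its result passed to `wandb.login(key=...)`.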

In [None]:
# Install the required dependencies
!pip3 install -q -U python-dotenv
# Prebuilt bitsandbytes wheel for Windows and Python 3.11 (cp311, win_amd64)
!pip3 install -q -U https://github.com/jakaline-dev/bitsandbytes_win/releases/download/0.42.0/bitsandbytes-0.42.0-cp311-cp311-win_amd64.whl
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
# 4.39.0.dev0 is a development build; if pip cannot resolve it, install transformers from source
!pip3 install -q -U transformers==4.39.0.dev0



In [3]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, GemmaTokenizer

# I prefer not to use the Hugging Face cache, so the Gemma model is downloaded
# to a local directory and loaded from that path.
# To load the model directly from the Hugging Face Hub, use this line instead:
# base_model_path = "google/gemma-2b"
base_model_path = "c:/ai/models/gemma"


# Load tokenizer and model with QLoRA configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

tokenizer = GemmaTokenizer.from_pretrained(base_model_path)

# low_cpu_mem_usage=True reduces peak CPU RAM while the checkpoint is loaded
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
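
To see why 4-bit loading matters on a laptop GPU, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. The parameter count of about 2.5 billion is an approximate assumption, not a value taken from this notebook, and the estimate ignores quantization constants, activations, gradients, and optimizer state.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumed, approximate parameter count for gemma-2b (not taken from this notebook).
approx_params = 2.5e9

for label, bits in [("float16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gib(approx_params, bits):.2f} GiB")
```

The 4-bit figure is a quarter of the float16 figure, which is what leaves room for the LoRA adapters and activations during training.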

In [4]:
# Quick check of the base model's response before fine-tuning

text = "Instruction: Can you explain contrastive learning in machine learning in simple terms for someone new to the field of ML?\n Response:"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

### USER: Can you explain contrastive learning in machine learning in simple terms for someone new to the field of ML?### Assistant: Sure, here's a simple explanation of contrastive learning in machine learning:

**Contrastive learning** is a type of learning algorithm that focuses on learning relationships between data points rather than learning the features of individual points. It's like learning the similarities and differences between different items.

Here's an analogy: Imagine you have two similar pictures of the same object. You want to teach a machine to recognize that they are similar. Instead of focusing on the specific features of each picture, contrastive learning would learn the similarities between the pictures as a whole.

**Here are the key concepts of contrastive learning:**

* **Positive and negative pairs:** The algorithm learns from pairs of data points, where one pair is similar and the other pair is dissimilar.
* **Contrastive loss:** This loss function measures

In [6]:
# Configure LoRA

from peft import LoraConfig

# LoRA rank (dimension of the low-rank update matrices)
lora_r = 8

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

lora_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    bias="none",
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM"
)
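
The effect of `r` and `lora_alpha` can be sketched with a little arithmetic. With standard LoRA scaling, the low-rank update is multiplied by `lora_alpha / r`, and each adapted linear layer gains `r * (d_in + d_out)` trainable parameters. The layer shape used below is a hypothetical example, not a value read from the Gemma configuration.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


lora_r = 8
lora_alpha = 16

# Standard LoRA scaling factor applied to the low-rank update B @ A.
scaling = lora_alpha / lora_r

# Hypothetical layer shape, chosen only for illustration.
d_in, d_out = 2048, 2048
full = d_in * d_out
added = lora_param_count(d_in, d_out, lora_r)

print(f"scaling = {scaling}")
print(f"full layer: {full:,} weights, LoRA adapter: {added:,} ({added / full:.2%})")
```

Raising `r` adds capacity linearly in parameter count; changing `lora_alpha` at fixed `r` only rescales the update.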

In [57]:
# Load the argilla/databricks-dolly-15k-curated-en dataset
from datasets import load_dataset

dataset = load_dataset("argilla/databricks-dolly-15k-curated-en")

dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'category', 'original-instruction', 'original-context', 'original-response', 'new-instruction', 'new-context', 'new-response', 'external_id'],
        num_rows: 15015
    })
})

In [90]:
# Build the training set from the examples that have no context
from datasets import Dataset

data = []
for d in dataset['train']:
    # Filter out examples with context, to keep it simple.
    if d["original-context"]:
        continue
    # Format the entire example as a single string.
    template = "Instruction:\n{instruction}\n{context}\n\nResponse:\n{response}".format(
        instruction=d['new-instruction']['value'],
        context=d['new-context']['value'],
        response=d['new-response']['value'],
    )
    data.append({"text": template})

# Use every remaining example; no further subsampling is applied.
train_dataset = Dataset.from_list(data)
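
The prompt template in the cell above can be tried in isolation. The sketch below reuses the same template string; the helper name `format_example` and the sample instruction and response are made up for illustration and do not come from the dataset.

```python
PROMPT_TEMPLATE = "Instruction:\n{instruction}\n{context}\n\nResponse:\n{response}"


def format_example(instruction, response, context=""):
    """Render one training example as a single prompt string."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction, context=context, response=response
    )


# Made-up sample record, not taken from the dataset.
sample = format_example("Name a primary color.", "Red is a primary color.")
print(sample)
```

Note that with an empty context the template leaves an extra blank line between the instruction and the `Response:` marker.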


In [91]:
train_dataset

Dataset({
    features: ['text'],
    num_rows: 10417
})

In [92]:
import os

import transformers
import wandb
from dotenv import find_dotenv, load_dotenv
from trl import SFTTrainer

# Load variables from the nearest .env file, then log in to wandb with the stored key
load_dotenv(find_dotenv())

wandb.login(key=os.environ['wandb'])

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    dataset_text_field="text",
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=2000,
        learning_rate=5e-5,
        fp16=True,
        max_grad_norm=0.3,
        gradient_checkpointing=True,
        group_by_length=True,
        optim="paged_adamw_32bit",
        lr_scheduler_type="cosine",
        logging_steps=10,
        output_dir="outputs",
        report_to="wandb"
    ),
    peft_config=lora_config,
)
trainer.train()

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: C:\Users\kevon\.netrc


Map:   0%|          | 0/10417 [00:00<?, ? examples/s]




`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.






TrainOutput(global_step=2000, training_loss=1.7383782362937927, metrics={'train_runtime': 5357.9932, 'train_samples_per_second': 1.493, 'train_steps_per_second': 0.373, 'total_flos': 4.18592474370048e+16, 'train_loss': 1.7383782362937927, 'epoch': 0.77})
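
The `epoch` and `train_samples_per_second` values in the training output follow from the training arguments. A short sketch of the arithmetic, assuming a single GPU:

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 2000
num_examples = 10417        # rows in train_dataset
train_runtime = 5357.9932   # seconds, from the TrainOutput above

# One optimizer step consumes batch_size * accumulation_steps examples (single GPU assumed).
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
examples_seen = max_steps * effective_batch_size
epochs = examples_seen / num_examples
samples_per_second = examples_seen / train_runtime

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen: {examples_seen}")
print(f"epochs: {epochs:.2f}")
print(f"samples per second: {samples_per_second:.3f}")
```

In other words, 2000 steps cover a little over three quarters of the filtered dataset; a full pass would need roughly 2600 steps at this batch size.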

In [94]:
text = "Instruction: Can you explain contrastive learning in machine learning in simple terms for someone new to the field of ML?\n\n Response:"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Instruction: Can you explain contrastive learning in machine learning in simple terms for someone new to the field of ML?

 Response: Contrastive learning is a type of machine learning that is used to learn from unlabeled data. It is a type of learning that is used to learn from unlabeled data. The idea is to learn from the similarities and differences between pairs of data points.


In [95]:
trainer.model.save_pretrained("gemma-a")
tokenizer.save_pretrained("gemma-a")



('gemma-a\\tokenizer_config.json',
 'gemma-a\\special_tokens_map.json',
 'gemma-a\\tokenizer.model',
 'gemma-a\\added_tokens.json')

In [96]:
from peft import PeftModel

# Reload the base model in float16 (not quantized) so the adapter can be merged into it
base_model_path = "c:/ai/models/gemma"
new_model = "lion-gemma"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    return_dict=True,
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)

# Merge base model with the adapter
model = PeftModel.from_pretrained(base_model, "gemma-a")
model = model.merge_and_unload()

# Save model and tokenizer
model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

('lion-gemma\\tokenizer_config.json',
 'lion-gemma\\special_tokens_map.json',
 'lion-gemma\\tokenizer.model',
 'lion-gemma\\added_tokens.json',
 'lion-gemma\\tokenizer.json')
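
Conceptually, merging folds the low-rank update into the base weight, `W_merged = W + scale * (B @ A)`, so the merged layer gives the same output as the base layer plus the adapter. The toy sketch below checks this with plain Python lists; it illustrates the idea only and is not PEFT's implementation. The matrices are made up, and the rank is 1 just to keep them small.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element by element."""
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]


# Toy shapes: d_out=2, d_in=3, rank=1. All values are made up.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # base weight, d_out x d_in
B = [[0.5], [-1.0]]                        # LoRA B, d_out x r
A = [[1.0, 2.0, 3.0]]                      # LoRA A, r x d_in
scale = 2.0                                # same value as lora_alpha / r = 16 / 8
x = [[1.0], [1.0], [1.0]]                  # input column vector

# Merged path: fold the update into the weight once, then apply it.
W_merged = add_scaled(W, matmul(B, A), scale)
y_merged = matmul(W_merged, x)

# Adapter path: base output plus the scaled low-rank correction.
y_adapter = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

print(y_merged, y_adapter)
```

Because the two paths agree, the merged model can be saved and loaded like an ordinary checkpoint, with no PEFT dependency at inference time.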

In [98]:
# Reload the merged model in 4-bit; the tokenizer runs on the CPU and needs no device_map
n_tokenizer = AutoTokenizer.from_pretrained(new_model)
n_model = AutoModelForCausalLM.from_pretrained(new_model, quantization_config=bnb_config, device_map="cuda")
text = "Instruction: Can you explain contrastive learning in machine learning in simple terms for someone new to the field of ML?\n\n Response:"
device = "cuda"
inputs = n_tokenizer(text, return_tensors="pt").to(device)
outputs = n_model.generate(**inputs, max_new_tokens=512)
print(n_tokenizer.decode(outputs[0], skip_special_tokens=True))


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Instruction: Can you explain contrastive learning in machine learning in simple terms for someone new to the field of ML?

 Response: In simple terms, contrastive learning is a type of machine learning that is used to learn the relationships between data points. It is a type of learning that is used to learn the relationships between data points. The goal of contrastive learning is to learn the relationships between data points so that the model can be used to make predictions about new data points.
