<a href="https://colab.research.google.com/github/ritikjain51/llm-finetuning/blob/main/Fine_Tune_Google_Gemma.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!nvidia-smi

Fri May 24 12:21:05 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   67C    P8              11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
%pip install -q bitsandbytes transformers trl peft datasets accelerate



## Importing the Packages

In [3]:
import os
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig, GemmaTokenizer, TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset
from peft import LoraConfig
import torch

# For reading the secrets
from google.colab import userdata

In [4]:
os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")

In [19]:
model_id = "google/gemma-2b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype = torch.bfloat16
)
bnb_config

BitsAndBytesConfig {
  "_load_in_4bit": true,
  "_load_in_8bit": false,
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_quant_storage": "uint8",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": false,
  "llm_int8_enable_fp32_cpu_offload": false,
  "llm_int8_has_fp16_weight": false,
  "llm_int8_skip_modules": null,
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes"
}

In [20]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map = {"": 0}
)

tokenizer = AutoTokenizer.from_pretrained(model_id)


## Inference from Base Model

In [21]:
prompt = "How LLM can help humanity?"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # keep inputs on the same device as the model

output = model.generate(**inputs, max_new_tokens=100)

result = tokenizer.decode(output[0], skip_special_tokens=True)
print(result)



How LLM can help humanity?

The LLM degree is a postgraduate degree in law. It is a two-year course that is offered by many universities around the world. The LLM degree is a great way to improve your legal knowledge and skills. It can also help you to get a job in the legal field.

The LLM degree is a great way to improve your legal knowledge and skills. It can also help you to get a job in the legal field.

The LLM degree is a great way to


## Fine-Tuning the Model

In [22]:
lora_config = LoraConfig(
    r = 8,
    # target_modules is left unset, so PEFT picks its defaults for the model architecture
    task_type = "CAUSAL_LM"
)

lora_config

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type='CAUSAL_LM', inference_mode=False, r=8, target_modules=None, lora_alpha=8, lora_dropout=0.0, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, use_dora=False, layer_replication=None)
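
To get a feel for how small the adapter is, the sketch below estimates the number of trainable LoRA parameters for the configuration shown above (`r=8`, `lora_alpha=8`). The layer dimensions, the layer count, and the choice of `q_proj` and `v_proj` as target modules are assumptions about Gemma-2B and about PEFT's defaults; none of them is stated in this notebook, so treat the result as a rough estimate rather than a measured value.

```python
# Rough count of trainable LoRA parameters (illustrative sketch).
# ASSUMPTIONS (not stated in this notebook): hidden size 2048, 18 decoder
# layers, q_proj maps 2048 -> 2048, v_proj maps 2048 -> 256, and the default
# LoRA targets for this architecture are q_proj and v_proj.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight."""
    return r * (d_in + d_out)

R = 8            # from LoraConfig(r=8)
LORA_ALPHA = 8   # default value shown in the config repr
NUM_LAYERS = 18  # assumed
HIDDEN = 2048    # assumed
V_OUT = 256      # assumed (single key/value head)

per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, V_OUT, R)
total = NUM_LAYERS * per_layer
scaling = LORA_ALPHA / R  # factor applied to the low-rank update B @ A

print(f"per layer: {per_layer:,}  total: {total:,}  scaling: {scaling}")
```

Under these assumptions the adapter has under a million trainable parameters, which is why only the adapter weights need to be saved after training.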

### Loading Dataset

In [23]:
dataset_name = "Abirate/english_quotes"

data = load_dataset(dataset_name)

data = data.map(lambda x: tokenizer(x["quote"]), batched=True)

Map:   0%|          | 0/2508 [00:00<?, ? examples/s]

In [24]:
data["train"]["quote"][0]

'“Be yourself; everyone else is already taken.”'

In [25]:
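def formatting_func(example):
  # Assumes SFTTrainer passes a batch (dict of lists), so build one string per row
  return [f"Quote: {q}\nAuthor: {a}" for q, a in zip(example["quote"], example["author"])]

formatting_func(data["train"][:1])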

['Quote: “Be yourself; everyone else is already taken.”\nAuthor: Oscar Wilde']

In [26]:
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps = 4,
    warmup_steps = 2,
    max_steps = 50,
    learning_rate = 2e-4,
    fp16=True,
    logging_steps = 5,
    output_dir = "./results",
    optim = "paged_adamw_8bit"
)
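
With `warmup_steps = 2` and `max_steps = 50`, the learning rate is at its peak of `2e-4` only briefly. The sketch below reproduces the shape of the schedule in plain Python, assuming `lr_scheduler_type` is left at its default of `"linear"` (the notebook does not set it). The function name `linear_warmup_lr` is made up for this illustration.

```python
# Learning-rate schedule implied by the arguments above (illustrative sketch).
# ASSUMPTION: lr_scheduler_type keeps its default value, "linear".

def linear_warmup_lr(step, base_lr=2e-4, warmup_steps=2, max_steps=50):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))
    return base_lr * remaining

for step in (0, 1, 2, 26, 50):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

The rate reaches `2e-4` at step 2, is back to half of that by step 26, and hits zero at step 50.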

In [27]:
trainer = SFTTrainer(
    model=model,
    train_dataset = data["train"],
    args = training_args,
    peft_config = lora_config,
    formatting_func = formatting_func
)

max_steps is given, it will override any value given in num_train_epochs


In [28]:
trainer.train()

Step,Training Loss
5,2.4734
10,2.8161
15,2.3043
20,2.3697
25,2.5615
30,2.1239
35,2.7139
40,2.8854
45,2.3193
50,2.3211


TrainOutput(global_step=50, training_loss=2.488853759765625, metrics={'train_runtime': 56.7386, 'train_samples_per_second': 3.525, 'train_steps_per_second': 0.881, 'total_flos': 100433090322432.0, 'train_loss': 2.488853759765625, 'epoch': 0.07974481658692185})
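
The `epoch` value of about 0.08 in the metrics means this run saw only a small slice of the dataset. A quick check, using the training arguments and the 2508 training examples reported in the `Map` output, reproduces that number; the single-device count is taken from the `nvidia-smi` output at the top.

```python
# Check the reported epoch fraction against the training arguments.
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 50
num_train_examples = 2508  # size of the train split, from the Map output
num_devices = 1            # single T4, per nvidia-smi

examples_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_devices
examples_seen = examples_per_step * max_steps
epoch_fraction = examples_seen / num_train_examples

print(examples_seen, round(epoch_fraction, 4))
```

So 50 optimizer steps cover 200 examples, roughly 8% of one pass over the data, which is consistent with the training loss not showing a clear downward trend.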

In [29]:
trained_model = "gemma-2b-quotes-finetuned"
trainer.model.save_pretrained(trained_model)



## Inference Pipeline

In [30]:
from transformers import pipeline

In [31]:
pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer,
    max_length = 30,
    do_sample = False,  # greedy decoding; temperature = 0 is not a valid sampling temperature
    truncation=True
)

In [32]:
quotes_template = "Quote: {}"

result = pipe(quotes_template.format("Listen to the mustn'ts"))
print(result[0]["generated_text"])



Quote: Listen to the mustn'ts of of of of of of of of of of of of of of of of of of of of of of
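
The repeated "of" is a typical greedy-decoding loop: once a token has the highest score, nothing stops it from winning again. One common mitigation is a repetition penalty, which `generate` exposes as the `repetition_penalty` argument. The sketch below shows the rescaling rule in plain Python on made-up scores; the function name, the 4-token vocabulary, and the penalty value are illustrative only, and whether a penalty actually improves this adapter's output has not been tested here.

```python
# How a repetition penalty rescales next-token scores (illustrative sketch).
# The token ids and scores below are made up for the example.

def apply_repetition_penalty(scores, generated_ids, penalty=1.3):
    """Make already-generated tokens less likely:
    divide positive scores by the penalty, multiply negative ones by it."""
    adjusted = list(scores)
    for token_id in set(generated_ids):
        s = adjusted[token_id]
        adjusted[token_id] = s * penalty if s < 0 else s / penalty
    return adjusted

scores = [2.6, 2.5, -1.0, 0.3]  # hypothetical logits for a 4-token vocabulary
generated_ids = [0, 0, 0]       # token 0 has been emitted three times in a row

greedy_before = max(range(len(scores)), key=scores.__getitem__)
penalized = apply_repetition_penalty(scores, generated_ids)
greedy_after = max(range(len(penalized)), key=penalized.__getitem__)

print(greedy_before, greedy_after)
```

In this toy case the penalty pushes the repeated token's score below its nearest competitor, so greedy decoding picks a different token next.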
