**Fine-tuning the Gemma model**

In [41]:
!pip install -q -U transformers accelerate peft bitsandbytes trl datasets



In [42]:
import os
import torch
import transformers
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig
from peft import LoraConfig
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    GemmaTokenizer
)


In [69]:
from kaggle_secrets import UserSecretsClient
import os

user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("HF_TOKEN")

os.environ["HF_TOKEN"] = hf_token




In [45]:
model_id = "google/gemma-2b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
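The config above stores the quantized weights in 4-bit NF4 (half a byte per parameter) while matrix multiplications run in bfloat16. As a rough back-of-envelope sketch of what that saves, the helper below estimates weight-only memory per storage type. It ignores quantization constants, layers that stay unquantized (such as embeddings and norms), activations and the KV cache, and the parameter count is a placeholder assumption, not Gemma's exact size; `model.num_parameters()` or `model.get_memory_footprint()` in transformers gives real numbers once the model is loaded.

```python
# Approximate storage per parameter, in bytes (weights only).
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "nf4": 0.5}


def weight_memory_gb(num_params, dtype):
    """Rough weight-only memory in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: placeholder parameter count, not the model's exact size.
num_params = 2.5e9
for dtype in ("float32", "bfloat16", "nf4"):
    print(dtype, round(weight_memory_gb(num_params, dtype), 2))
# float32 10.0, bfloat16 5.0, nf4 1.25
```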

In [46]:
tokenizer = AutoTokenizer.from_pretrained(model_id,
                                         token=os.environ["HF_TOKEN"])
model = AutoModelForCausalLM.from_pretrained(model_id,
                                            quantization_config=bnb_config,
                                            device_map={"":0},
                                            token=os.environ["HF_TOKEN"])



In [47]:
text = "Quote: Be yourself; everyone else is already taken."
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: Be yourself; everyone else is already taken.

The quote is attributed to Oscar Wilde, but it is not clear who actually said it.

The quote is often used as a way to encourage people to be themselves and not to conform to the expectations of others.

The quote is often used


In [48]:
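# Note: "false" leaves Weights & Biases logging enabled; to turn logging off,
# pass report_to="none" to SFTConfig instead.
os.environ["WANDB_DISABLED"] = "false"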

In [49]:
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
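With `r=8`, each targeted projection keeps its frozen weight W and gains two small trainable matrices, A (r × d_in) and B (d_out × r), whose product is added to the layer's output scaled by `lora_alpha / r`. The pure-Python sketch below shows the parameter count and the forward pass for a single layer; the 2048 × 2048 size is a hypothetical example, not a claim about Gemma's layer shapes, and the list-based matrices stand in for the real tensors PEFT uses.

```python
def lora_extra_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    A has shape (r, d_in) and B has shape (d_out, r), so the count is
    r * (d_in + d_out) instead of d_in * d_out for full fine-tuning.
    """
    return r * (d_in + d_out)


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


# Hypothetical 2048 x 2048 projection with r=8: LoRA vs. full update size.
print(lora_extra_params(2048, 2048, 8), 2048 * 2048)  # 32768 4194304

# Tiny numeric check: 2x2 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]       # shape (r=1, d_in=2)
B = [[1.0], [0.0]]     # shape (d_out=2, r=1)
print(lora_forward(W, A, B, [1.0, 2.0], alpha=1, r=1))  # [4.0, 2.0]
```

PEFT initializes B to zeros by default, so the adapter starts as a no-op and the model initially behaves exactly like the base model; in the sketch, passing an all-zero B returns `matvec(W, x)` unchanged.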

In [50]:
from datasets import load_dataset

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)

In [51]:
data["train"]["quote"]

Column(['“Be yourself; everyone else is already taken.”', "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”", "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”", '“So many books, so little time.”', '“A room without books is like a body without a soul.”', ...])

In [52]:
data["train"]

Dataset({
    features: ['quote', 'author', 'tags', 'input_ids', 'attention_mask'],
    num_rows: 2508
})

In [53]:
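def formatting_func(example):
    # Recent TRL versions call formatting_func on one row at a time (strings);
    # older versions passed a batch (lists). Handle both cases.
    if isinstance(example["quote"], list):
        return [f'Quote: {quote}\nAuthor: {author}'
                for quote, author in zip(example["quote"], example["author"])]
    return f'Quote: {example["quote"]}\nAuthor: {example["author"]}'

In [54]:
trainer = SFTTrainer(
    model=model,
    # Drop the pre-tokenized columns: recent TRL versions treat a dataset that
    # already contains input_ids as processed and then ignore formatting_func.
    train_dataset=data["train"].remove_columns(["input_ids", "attention_mask"]),
    args=SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=100,
        learning_rate=2e-4,
        fp16=False,
        logging_steps=10,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
    processing_class=tokenizer
)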
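Whether `formatting_func` receives a single row or a batch matters. A batch-style function that zips `example["quote"]` with `example["author"]` works on lists, but on a single row those fields are plain strings, so `zip()` pairs them character by character without raising an error. The self-contained check below reproduces that; the sample row is hypothetical and its author value is a placeholder.

```python
def formatting_func_batched(example):
    """Batch-style formatter: expects lists under "quote" and "author"."""
    texts = []
    for quote, author in zip(example["quote"], example["author"]):
        texts.append(f"Quote: {quote}\nAuthor: {author}")
    return texts


def formatting_func_per_row(example):
    """Row-style formatter: expects plain strings under "quote" and "author"."""
    return f"Quote: {example['quote']}\nAuthor: {example['author']}"


# Hypothetical sample row; "Example Author" is a placeholder.
row = {"quote": "So many books, so little time.", "author": "Example Author"}
batch = {key: [value] for key, value in row.items()}

print(formatting_func_batched(batch))
# ['Quote: So many books, so little time.\nAuthor: Example Author']
print(formatting_func_batched(row)[:2])
# ['Quote: S\nAuthor: E', 'Quote: o\nAuthor: x']  (zip walks the characters)
print(formatting_func_per_row(row) == formatting_func_batched(batch)[0])  # True
```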



In [55]:
trainer.train()

Step   Training Loss
10     2.479064
20     1.86997
30     2.00776
40     2.229466
50     1.845597
60     2.046134
70     1.892348
80     1.80022
90     2.23351
100    2.18098


TrainOutput(global_step=100, training_loss=2.0585047149658204, metrics={'train_runtime': 334.129, 'train_samples_per_second': 1.197, 'train_steps_per_second': 0.299, 'total_flos': 189744345784320.0, 'train_loss': 2.0585047149658204})
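It helps to check how much data those 100 steps actually covered. Each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps × number of devices` examples, which here is 1 × 4 × 1 = 4 (assuming a single GPU, as `device_map={"": 0}` suggests). The sketch below turns that into samples seen and the fraction of an epoch over the 2508 training rows, assuming one dataset row per sample (no packing).

```python
def training_budget(per_device_batch, grad_accum, num_devices, max_steps, dataset_rows):
    """Return (effective batch size, samples seen, epochs covered)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    samples_seen = effective_batch * max_steps
    epochs = samples_seen / dataset_rows
    return effective_batch, samples_seen, epochs


eff, seen, epochs = training_budget(1, 4, 1, max_steps=100, dataset_rows=2508)
print(eff, seen, round(epochs, 3))  # 4 400 0.159
```

The 400 samples agree with the log: `train_samples_per_second × train_runtime` ≈ 1.197 × 334.1 ≈ 400. So this run covers only about 16% of one pass over the quotes, which may partly explain why the loss stays noisy around 2.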

In [56]:
new_model = "gemma_quotes_finetuned"
trainer.model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)

('gemma_quotes_finetuned/tokenizer_config.json',
 'gemma_quotes_finetuned/tokenizer.json')

In [57]:
from peft import AutoPeftModelForCausalLM

# Load the saved LoRA adapter together with its base model
finetuned_model = AutoPeftModelForCausalLM.from_pretrained(
    new_model,  # Load from this path
    device_map={"": 0},
    torch_dtype=torch.bfloat16
)

# Merge the LoRA weights into the base model and drop the PEFT wrapper
finetuned_model = finetuned_model.merge_and_unload()

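`merge_and_unload()` folds each adapter into its frozen weight, W' = W + (alpha / r) · B A, so inference afterwards is a plain matrix multiply with no extra adapter path. Here the base model is reloaded in bfloat16 rather than 4-bit, since no quantization config is passed. The pure-Python sketch below shows on a toy 2 × 2 layer (same hypothetical numbers as a rank-1 example) that the merged weight gives the same output as running the base weight and the adapter separately.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def matmul(X, Y):
    """Multiply matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[1.0, 1.0]]               # shape (r=1, d_in=2)
B = [[1.0], [0.0]]             # shape (d_out=2, r=1)
x = [1.0, 2.0]

merged = merge_lora(W, A, B, alpha=1, r=1)
print(merged)  # [[2.0, 1.0], [0.0, 1.0]]

# Unmerged path: base output plus adapter output (scale = alpha / r = 1).
unmerged = [b + d for b, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]
print(matvec(merged, x), unmerged)  # [4.0, 2.0] [4.0, 2.0]
```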

In [58]:
# Generate with the merged fine-tuned model
text = "Quote: Logic will get you from A to Z; imagination will get you everywhere.\nAuthor:"
inputs = tokenizer(text, return_tensors="pt").to(finetuned_model.device)

outputs = finetuned_model.generate(
    **inputs,
    max_new_tokens=20,  
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: Logic will get you from A to Z; imagination will get you everywhere.
Author: Albert Einstein

I don't know if it was a typo, but I thought the quote said
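Unlike the earlier base-model cell, this call samples: `temperature=0.7` sharpens the next-token distribution and `top_p=0.9` then restricts sampling to the most likely tokens whose probabilities add up to at least 0.9. The pure-Python sketch below shows nucleus (top-p) sampling as it is commonly described; it is an illustration, not Hugging Face's implementation, whose logits processors differ in details such as tie handling and the minimum number of tokens kept.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens with cumulative prob >= top_p,
    then renormalize. Returns {token_id: probability}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Temperature scaling, then nucleus filtering, then a weighted draw."""
    nucleus = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]
print(sorted(top_p_filter(probs, 0.9)))  # [0, 1, 2]; token 3 is never sampled
logits = [math.log(p) for p in probs]
print(sample_next_token(logits, temperature=0.7, top_p=0.9, rng=random.Random(0)))
```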


In [68]:
# Generate with the merged fine-tuned model
text = "Quote: Of course it is happening inside your head, Harry."
inputs = tokenizer(text, return_tensors="pt").to(finetuned_model.device)

outputs = finetuned_model.generate(
    **inputs,
    max_new_tokens=20,  
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: Of course it is happening inside your head, Harry. But why on earth should that mean that it's not real?

Harry Potter and the Sorcerer
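The other two generation settings work on tokens already in the sequence. With `repetition_penalty=1.2`, Hugging Face divides the logit of every previously seen token by 1.2 when it is positive and multiplies it by 1.2 when it is negative, so repeats always become less likely. With `no_repeat_ngram_size=3`, any token that would complete a 3-gram already present is banned outright. The pure-Python sketch below mirrors that behavior on plain lists of integer token ids; it is an illustration, not the library's code.

```python
def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Penalize tokens that already appear in the sequence.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so penalty > 1 always lowers a repeated token's score.
    """
    out = list(logits)
    for token in set(previous_tokens):
        out[token] = out[token] * penalty if out[token] < 0 else out[token] / penalty
    return out


def banned_next_tokens(tokens, n):
    """Tokens that would complete an n-gram already present in `tokens`."""
    if len(tokens) < n - 1:
        return set()
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # the last n-1 tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


print(apply_repetition_penalty([2.0, -1.0, 0.5], previous_tokens=[0, 1], penalty=1.2))
# roughly [1.667, -1.2, 0.5]: token 2 was not seen, so it is unchanged
print(banned_next_tokens([5, 7, 9, 5, 7], n=3))  # {9}: "5 7 9" already occurred
```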
