# Instruction Tuning (Fine-tuning) GPT-2 on a Custom Dataset

## 00. Set up packages and import all required settings

In [59]:
import os
import getpass
#os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"  # Number GPUs in PCI bus order, starting from 0
#os.environ["CUDA_VISIBLE_DEVICES"]= "1"  # Expose only GPU 1 to this process
#os.environ["HUGGING_FACE_HUB_TOKEN"] = getpass.getpass("Token:") 
#assert os.environ["HUGGING_FACE_HUB_TOKEN"]

In [6]:
!pip install transformers datasets accelerate peft bitsandbytes -qqq

In [60]:
from transformers import AutoTokenizer, AutoModelForCausalLM # GPT2TokenizerFast, GPT2LMHeadModel
from datasets import load_dataset

## 01. Load the Data

In [61]:
train_dataset = load_dataset('Aeala/ShareGPT_Vicuna_unfiltered', split="train[:5000]")

In [62]:
train_dataset[0]["conversations"]

[{'from': 'human',
  'markdown': None,
  'text': None,
  'value': "Summarize the main ideas of Jeff Walker's Product Launch Formula into bullet points as it pertains to a growth marketing agency implementing these strategies and tactics for their clients..."},
 {'from': 'gpt',
  'markdown': None,
  'text': None,
  'value': "Here are the main ideas of Jeff Walker's Product Launch Formula that can be applied by a growth marketing agency for their clients:\n\n1. Identify the target audience and their needs: Understand the ideal customer for the product or service, and create a messaging that resonates with them.\n2. Pre-launch: Build anticipation and excitement for the launch by creating buzz, gathering testimonials and case studies, and using social media to create awareness.\n3. Launch: Use a well-crafted launch sequence to maximize sales and conversions. This can include offering bonuses, creating scarcity, and using a deadline to create urgency.\n4. Post-launch: Follow up with custome

## 02. Load and Set Up the Tokenizer

In [63]:
tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token, so reuse <|endoftext|> (id 50256)

## 03. Preprocessing (Tokenization and Data Preparation for Causal Language Modeling)

### 03-1. Tokenize for all samples

In [64]:
def tokenize_function(example):
    # Flatten one multi-turn conversation into a single "USER: ... / CHATBOT: ..." transcript
    result = ""
    for turn in example["conversations"]:
        speaker = "USER: " if turn["from"] == "human" else "CHATBOT: "
        result += speaker + turn["value"] + " \n"
    # Pad or truncate every transcript to exactly 256 tokens
    return tokenizer(result, padding="max_length", truncation=True, max_length=256)

In [65]:
tokenized_dataset = train_dataset.map(
    tokenize_function,
    remove_columns=["conversations", "id"]
)
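With `padding="max_length"` and `truncation=True`, every conversation becomes exactly 256 token ids. Because GPT-2 ships without a padding token, the tokenizer setup reuses EOS (id 50256) for padding, and GPT-2's tokenizer pads on the right by default. A minimal pure-Python sketch of that step, assuming right-side padding; `pad_and_truncate` is a hypothetical helper (not a transformers function) and the token ids are illustrative:

```python
def pad_and_truncate(token_ids, max_length, pad_token_id):
    """Hypothetical helper mimicking padding="max_length", truncation=True with right-side padding."""
    ids = list(token_ids[:max_length])             # truncation: keep only the first max_length tokens
    n_pad = max_length - len(ids)
    input_ids = ids + [pad_token_id] * n_pad       # right padding with the pad id
    attention_mask = [1] * len(ids) + [0] * n_pad  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


EOS_ID = 50256  # GPT-2's <|endoftext|> id, reused as the pad token
padded = pad_and_truncate([101, 102, 103], max_length=6, pad_token_id=EOS_ID)
truncated = pad_and_truncate(list(range(10)), max_length=4, pad_token_id=EOS_ID)
print(padded)
print(truncated)
```

One consequence of this choice: a conversation longer than 256 tokens is simply cut off, so the later turns of long ShareGPT dialogues never reach the model.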

### 03-2. Data Preparation for Causal Language Modeling (Next-Token Prediction)

In [66]:
from transformers import DataCollatorForLanguageModeling
# mlm=False -> causal (next-token) language modeling instead of masked language modeling
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
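With `mlm=False`, `DataCollatorForLanguageModeling` builds the labels by copying `input_ids` and replacing every pad-token position with -100, the index PyTorch's cross-entropy ignores; the one-token shift for next-token prediction happens inside the model's loss computation. A pure-Python sketch of that idea (helper names are hypothetical, token ids illustrative). Because the pad token here is EOS, a genuine EOS token in the text would be masked too; the transcripts built during tokenization do not append one, so only padding is affected.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids and mask padding positions, like the causal-LM collator (sketch)."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


def next_token_pairs(input_ids, labels):
    """(context token, target token) pairs the loss sees after the model's internal one-step shift."""
    return [
        (input_ids[i], labels[i + 1])
        for i in range(len(input_ids) - 1)
        if labels[i + 1] != IGNORE_INDEX
    ]


PAD = 50256  # EOS reused as pad
ids = [11, 22, 33, PAD, PAD]
labels = causal_lm_labels(ids, PAD)
print(labels)                         # [11, 22, 33, -100, -100]
print(next_token_pairs(ids, labels))  # [(11, 22), (22, 33)]
```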

## 04. Load the Pretrained Model and Generate Text Before Fine-tuning

In [67]:
model = AutoModelForCausalLM.from_pretrained('gpt2')
model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [68]:
model.get_memory_footprint()/1000000000

0.510342192
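The reported ~0.51 GB is consistent with storing GPT-2's 124,439,808 parameters in float32 (4 bytes each). A back-of-the-envelope check, assuming fp32 weights; the remaining ~12.6 MB most likely comes from non-parameter buffers (such as the causal attention mask), which `get_memory_footprint` counts by default:

```python
GPT2_PARAMS = 124_439_808  # parameter count of the "gpt2" (124M) checkpoint


def weight_gb(n_params, bits_per_param):
    """Storage for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


fp32_gb = weight_gb(GPT2_PARAMS, 32)
leftover_gb = 0.510342192 - fp32_gb   # reported footprint minus weights
print(f"{fp32_gb:.3f} GB")            # 0.498 GB
print(f"{leftover_gb:.4f} GB")        # 0.0126 GB
```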

In [69]:
def gen_function(prompt, model, tokenizer):

    # 1) Prompt
    input_text = prompt
    # 2) Tokenizing and Tensor transformation
    input_ids = tokenizer.encode(input_text, return_tensors="pt")
    input_ids = input_ids.to('cuda')
    # 3) Generate texts
    max_length = 100
    model = model.to("cuda")
    sample_outputs = model.generate(input_ids, do_sample=True, max_length=max_length, temperature=0.7)
    # 4) Decoding texts
    return tokenizer.decode(sample_outputs[0], skip_special_tokens=True)
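`do_sample=True` with `temperature=0.7` samples each next token from a softmax over the logits divided by the temperature; values below 1 sharpen the distribution toward the most likely tokens. A minimal pure-Python sketch of that single step, with made-up logits; it ignores the top-k filter (k=50) that `generate` applies by default when sampling. Note also that `max_length=100` counts the prompt tokens, so the continuation itself is shorter than 100 tokens.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]               # made-up scores for a 4-token vocabulary
p_default = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.7)
print(round(p_default[0], 3), round(p_sharp[0], 3))
token = sample_next_token(logits, 0.7, random.Random(0))
```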


In [70]:
output = gen_function("Could you give me some examples of Numpy array?", model, tokenizer)
print(output)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Could you give me some examples of Numpy array?

I can't remember what Numpy had to do with arrays except it was not really a problem at all. (One of the coolest things about Numpy is that it is not only a great programming language but also an awesome library!)

I can't remember what Numpy had to do with arrays except it was not really a problem at all. (One of the coolest things about Numpy is that it is not only a


## 05. Train with Trainer and TrainingArguments

In [71]:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./gpt2_instruction_tuning",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=8,
    save_steps=1000,
    save_total_limit=2,
)

In [72]:
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collator,
    train_dataset=tokenized_dataset,
)

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [73]:
trainer.train()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 500  | 2.4525        |


TrainOutput(global_step=625, training_loss=2.439342578125, metrics={'train_runtime': 94.2928, 'train_samples_per_second': 53.026, 'train_steps_per_second': 6.628, 'total_flos': 653230080000000.0, 'train_loss': 2.439342578125, 'epoch': 1.0})
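The step count follows from the arguments: 5,000 samples at `per_device_train_batch_size=8` on one GPU give 625 optimizer steps per epoch, and the loss table has a single row because `logging_steps` defaults to 500. A small helper reproducing the arithmetic, assuming a single device; it is hypothetical and mirrors how we understand `Trainer` to count steps (batches per epoch divided by accumulation steps, rounded down), so exact rounding may differ between transformers versions:

```python
import math


def optimizer_steps_per_epoch(num_samples, per_device_batch_size,
                              grad_accum_steps=1, num_devices=1):
    """Hypothetical helper: number of optimizer updates in one epoch."""
    batches = math.ceil(num_samples / (per_device_batch_size * num_devices))
    return max(batches // grad_accum_steps, 1)


print(optimizer_steps_per_epoch(5000, 8))                      # 625
print(optimizer_steps_per_epoch(5000, 4, grad_accum_steps=4))  # 312
```

The second call corresponds to a batch size of 4 with 4-step gradient accumulation (an effective batch of 16), the setup intended for the LoRA run later on.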

In [74]:
output = gen_function("Could you give me some examples of Numpy array?", model, tokenizer)
print(output)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Could you give me some examples of Numpy array? You can make some of them (or use Numpy arrays) and you can use the `array` keyword.

Let's first create a Numpy array. Here is an example of how it could work:
```
import numpy as np
import matplotlib.pyplot as plt


# Make a Numpy array of 4 elements
# Create an array
array = np.array([1,


## 06. Quantization

In [75]:
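from transformers import BitsAndBytesConfig

model_name = "gpt2"
model_quant = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    # 4-bit loading with double quantization (the quantization constants are themselves quantized)
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_use_double_quant=True),
    trust_remote_code=True,
)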

In [76]:
model_quant.get_memory_footprint()/1000000000

0.134060568
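The drop from ~0.51 GB to ~0.13 GB comes mainly from the linear (Conv1D) projections inside the 12 transformer blocks, which bitsandbytes stores at 4 bits per weight; embedding layers are not quantized (they appear to be kept in float16). A rough estimate of the projection weights alone, assuming GPT-2 small's shapes (hidden size 768, MLP size 3072) and ignoring the per-block quantization constants:

```python
BLOCKS = 12
# (in_features, out_features) of the four projections in every GPT-2 block
PROJECTION_SHAPES = {
    "attn.c_attn": (768, 2304),
    "attn.c_proj": (768, 768),
    "mlp.c_fc": (768, 3072),
    "mlp.c_proj": (3072, 768),
}

n_proj = BLOCKS * sum(d_in * d_out for d_in, d_out in PROJECTION_SHAPES.values())
fp32_mb = n_proj * 4 / 1e6    # 4 bytes per weight
int4_mb = n_proj * 0.5 / 1e6  # 4 bits = 0.5 bytes per weight
print(f"{n_proj:,} projection weights: {fp32_mb:.1f} MB in fp32 vs {int4_mb:.1f} MB in 4-bit")
```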

In [77]:
max_length = 200
input_ids = tokenizer("Give me some Numpy codes?", return_tensors="pt").input_ids
sample_outputs = model_quant.generate(input_ids.to("cuda"), do_sample=True, max_length=max_length, temperature=0.75)
print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Give me some Numpy codes? I wanted to learn them.

I'd like to put my own to be a great example.

I want you to enjoy this. I want you to know that you want your own and not the other.

I want you to be able to use this code to build your own.

When I came up with the idea of a class with a public class I wanted it to be a nice structure that could be used by people.

I looked at a bunch of code, and some that came up with public classes, and the other ones I tried to use as well.

And finally I built it with classes and then I defined classes, and then I used a couple of others.

And I wrote a class on what I wanted to do with the code, and I got it just right.

And now here's what I've decided:

All I've changed is that I have a


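 - A purely quantized model cannot be fine-tuned directly: its 4-bit weights are stored as packed integers that cannot receive gradient updates, so `Trainer` refuses to train it and asks for trainable adapters instead (see the error below).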

In [78]:
trainer = Trainer(
    model=model_quant,
    args=training_args,
    data_collator=collator,
    train_dataset=tokenized_dataset,
)

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft for more details

## 07. Training with a LoRA Adapter
 - Load the pretrained GPT-2 model in 4-bit and freeze all of its parameters.
 - Attach a LoRA adapter: small, trainable low-rank matrices added alongside selected layers.
 - Fine-tune by training only the LoRA adapter; the same recipe lets much larger LLMs be tuned on limited hardware.
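Concretely, for a frozen weight matrix W, LoRA adds a rank-r update so the layer computes W·x + (lora_alpha / r)·B·(A·x), where A has shape r × d_in and B has shape d_out × r. B starts at zero, so the adapted model initially behaves exactly like the base model. A pure-Python sketch with tiny made-up matrices (LoRA dropout omitted):

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, lora_alpha, r):
    """y = W x + (lora_alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scaling = lora_alpha / r
    return [b + scaling * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]   # frozen weight: d_out=2, d_in=3
A = [[0.5, 0.5, 0.5]]    # r=1 x d_in=3
B_init = [[0.0], [0.0]]  # d_out=2 x r=1, zero at initialization
B_trained = [[1.0], [2.0]]
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_init, x, lora_alpha=1, r=1))     # [7.0, -1.0], same as W x
print(lora_forward(W, A, B_trained, x, lora_alpha=1, r=1))  # [10.0, 5.0]
```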

In [81]:
from peft import LoraConfig, PeftModel, PeftConfig, TaskType, get_peft_model
import torch

### 07-1. Quantization using BitsAndBytes

In [82]:

from transformers import BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
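`load_in_4bit=True` stores each linear weight as 4-bit codes plus a scaling constant per small block of weights; at matmul time the block is dequantized to `bnb_4bit_compute_dtype` (float16 here). NF4 maps weights onto 16 non-uniform levels derived from a normal distribution. The sketch below substitutes a simpler uniform, symmetric absmax 4-bit scheme to show the quantize/dequantize round trip, so treat it as an illustration of the idea, not of bitsandbytes' actual NF4 codebook:

```python
def quantize_absmax_int4(block):
    """Uniform symmetric 4-bit quantization of one block (NOT the real NF4 codebook)."""
    absmax = max(abs(v) for v in block) or 1.0  # avoid division by zero for an all-zero block
    scale = absmax / 7                          # map [-absmax, absmax] onto integers -7..7
    codes = [max(-8, min(7, round(v / scale))) for v in block]  # signed 4-bit range is -8..7
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate weights from the 4-bit codes and the block's scale."""
    return [c * scale for c in codes]


weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, -0.02, 0.0]  # made-up weight block
codes, scale = quantize_absmax_int4(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, round(scale, 4), round(max_err, 4))
```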

### 07-2. Load the model in 4-bit using BitsAndBytes

In [87]:
model_quant_lora = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)

### 07-3. Set LoRA config

In [89]:
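# LoRA hyperparameters
lora_alpha = 6      # the adapter update is scaled by lora_alpha / r (= 1.0 here)
lora_dropout = 0.2  # dropout applied to the adapter's input
r = 6               # rank of the low-rank matrices A and B

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=r,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    # target_modules is left unset, so PEFT uses its default for GPT-2 (the fused c_attn projection).
    # Other layers can be adapted with e.g. target_modules=["c_attn", "c_proj"];
    # modules_to_save=["lm_head"] would instead train a full copy of the output head.
)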

In [90]:
peft_config

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>, inference_mode=False, r=6, target_modules=None, lora_alpha=6, lora_dropout=0.2, fan_in_fan_out=False, bias='none', modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={})

### 07-4. Attach the LoRA adapter to the model

In [91]:
model_quant_lora = get_peft_model(model=model_quant_lora, peft_config=peft_config)

In [92]:
model_quant_lora.print_trainable_parameters()

trainable params: 221,184 || all params: 124,660,992 || trainable%: 0.1774283971685385
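The trainable-parameter count can be checked by hand. With `target_modules` left unset, PEFT attached the adapter only to each block's fused `c_attn` projection (768 → 2304, as the printed model below shows), so each of the 12 blocks contributes r·768 parameters for A plus 2304·r for B:

```python
def lora_param_count(d_in, d_out, r):
    """A is r x d_in and B is d_out x r; LoRA's A and B have no bias terms."""
    return r * d_in + d_out * r


per_block = lora_param_count(d_in=768, d_out=2304, r=6)
trainable = 12 * per_block
all_params = 124_660_992  # "all params" as reported by print_trainable_parameters()
print(per_block, trainable, 100 * trainable / all_params)
```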


In [93]:
model_quant_lora

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.2, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=6, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=6, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=768, out_feat

In [94]:
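from transformers import Trainer, TrainingArguments

per_device_train_batch_size = 4
gradient_accumulation_steps = 4  # effective batch size = 4 x 4 = 16 per device
optim = "paged_adamw_32bit"      # bitsandbytes paged AdamW: pages optimizer state to CPU under GPU memory pressure
save_steps = 10                  # write a checkpoint every 10 optimizer steps
out_dir = "./my-gpt2-output"

train_arg = TrainingArguments(
    output_dir=out_dir,
    num_train_epochs=1,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    fp16=True,
    group_by_length=True,  # little effect here, since every sample is padded to 256 tokens
)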
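With `per_device_train_batch_size=4` and `gradient_accumulation_steps=4`, gradients from four micro-batches of 4 are summed before each optimizer step, giving an effective batch of 16 while only holding activations for a batch of 4. When every micro-batch has the same size and each micro-batch loss is divided by the number of accumulation steps, the accumulated gradient equals the full-batch gradient. A toy pure-Python check with a one-parameter model and mean-squared-error loss (all data made up):

```python
def grad_mse(w, batch):
    """d/dw of mean((w*x - y)^2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size, accum_steps):
    """Sum micro-batch gradients, each scaled by 1/accum_steps, as gradient accumulation does."""
    g = 0.0
    for k in range(accum_steps):
        micro = data[k * micro_batch_size:(k + 1) * micro_batch_size]
        g += grad_mse(w, micro) / accum_steps
    return g


data = [(x / 4, 3 * x / 4 + 1) for x in range(16)]  # 16 toy samples
w = 0.5
full = grad_mse(w, data)
accum = accumulated_grad(w, data, micro_batch_size=4, accum_steps=4)
print(full, accum)
```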

In [97]:
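trainer = Trainer(
    model=model_quant_lora,
    args=train_arg,  # QLoRA settings defined in the previous cell (gradient accumulation, paged AdamW, fp16)
    data_collator=collator,
    train_dataset=tokenized_dataset,
)
# The training log below was recorded with args=training_args (batch size 8, no accumulation),
# which is why it shows 625 steps (5000 / 8).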

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [110]:
trainer.train()

| Step | Training Loss |
|------|---------------|
| 500  | 2.7706        |


TrainOutput(global_step=625, training_loss=2.7597544921875, metrics={'train_runtime': 50.8508, 'train_samples_per_second': 98.327, 'train_steps_per_second': 12.291, 'total_flos': 654928773120000.0, 'train_loss': 2.7597544921875, 'epoch': 1.0})

In [115]:
max_length = 200
inputs = tokenizer("Who is Brack Obama?", return_tensors="pt")  # BatchEncoding with input_ids and attention_mask
sample_outputs = model_quant_lora.generate(**inputs.to("cuda"), do_sample=True, max_length=max_length, temperature=0.75)
print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Who is Brack Obama?

Well, Brack Obama is obviously the most popular president in recent history. It's the first time in history that a president has been elected president with a national popular vote.

The most popular president was Lyndon Johnson, who was the first African-American president. Since that time, President Obama has been the most popular president in history. As an African-American, Obama has had an extremely negative campaign environment. To put it in perspective, if you're a Republican and you're in the Democrat, you can be a Republican and a Democrat.

This isn't the first time that the Democratic Party has had a negative campaign environment. A lot of people are saying that Democrats are bad, especially when it comes to foreign policy. But that's because we're not in the Republican Party anymore.

The other major political parties in the United States have had much less negative campaigns than the Republicans, but the Democratic Party is still in the Republican


In [None]:
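# save_pretrained writes only the small LoRA adapter (adapter config + weights), not the 4-bit base model
model_quant_lora.save_pretrained(out_dir)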