
# Load the Base Model

In this section, we set up the base model for use.

# Install dependencies

To get started, we need to install the required dependencies.

In [1]:
!pip install requests==2.31.0



In [None]:
# install dependencies
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets


In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

# Download the base model

The following cell downloads the base model, in this case `EleutherAI/gpt-neox-20b`, and loads it in 4-bit NF4 precision with double quantization via bitsandbytes.

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "EleutherAI/gpt-neox-20b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
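
To see why 4-bit loading matters for a model of this size, the following back-of-the-envelope sketch estimates weight memory at different precisions. It assumes a nominal 20 × 10^9 parameters (taken from the model name) and counts the weights only. Quantization constants, layers kept in higher precision, activations, and optimizer state are ignored, so real memory use will be higher.

```python
# Rough weight-only memory estimate for a nominal 20B-parameter model.
# Assumption: 20e9 parameters; ignores quantization constants, layers that
# stay in higher precision, activations, and optimizer state.
NUM_PARAMS = 20e9

BITS_PER_PARAM = {
    "fp32": 32,
    "fp16/bf16": 16,
    "int8": 8,
    "nf4 (4-bit)": 4,
}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Return the memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits / 8 / 1e9


estimates = {name: weight_memory_gb(NUM_PARAMS, bits) for name, bits in BITS_PER_PARAM.items()}

for name, gb in estimates.items():
    print(f"{name:>12}: ~{gb:.0f} GB")
```

At roughly 10 GB for the weights, the 4-bit model should fit on a single 16 GB GPU, whereas the 16-bit weights alone (about 40 GB) would not.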



# **Fine-Tune with QLoRA**

This part of the notebook fine-tunes the base model on a custom dataset using QLoRA, that is, LoRA adapters trained on top of a 4-bit quantized model.

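# **Prepare the Model for Training**

The following cell enables gradient checkpointing, prepares the 4-bit model for training with `prepare_model_for_kbit_training`, and wraps it with LoRA adapters on the `query_key_value` attention projections, so that only a small fraction of the parameters is trained.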
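
For reference, LoRA keeps each targeted pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ frozen and learns a low-rank update $BA$, with $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$. In PEFT the update is scaled by `lora_alpha / r` (the default scaling, when rsLoRA is not enabled), so with `r=8` and `lora_alpha=32` in the config below, the scale is 4:

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad \frac{\alpha}{r} = \frac{32}{8} = 4
$$

In QLoRA, $W_0$ is stored in 4-bit NF4 and dequantized to the compute dtype (here `bfloat16`) for the matrix multiply, while $A$ and $B$ are the only weights that receive gradients.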



In [3]:
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

def print_trainable_parameters(model):
  """
  Prints the number of trainable parameters in the model.
  """
  trainable_params = 0
  all_param = 0
  for _, param in model.named_parameters():
    all_param += param.numel()
    if param.requires_grad:
      trainable_params += param.numel()
  print(
      f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
  )

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)


trainable params: 8650752 || all params: 10597552128 || trainable%: 0.08162971878329976
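
The trainable-parameter count above can be checked by hand: a LoRA pair on a `d_in -> d_out` linear layer adds `r * (d_in + d_out)` parameters. The sketch below assumes the published GPT-NeoX-20B configuration values (`hidden_size=6144`, `num_hidden_layers=44`) and that `query_key_value` is a fused projection from `hidden_size` to `3 * hidden_size`. These numbers are not printed anywhere in this notebook.

```python
# Back-of-the-envelope check of the LoRA trainable-parameter count.
# Assumption: GPT-NeoX-20B config values, not read from the notebook.
HIDDEN_SIZE = 6144
NUM_LAYERS = 44
RANK = 8  # r in LoraConfig


def lora_params_for_linear(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r).

    With bias="none", no existing bias terms are made trainable,
    so nothing else is added.
    """
    return r * d_in + d_out * r


# query_key_value fuses Q, K and V: hidden_size -> 3 * hidden_size.
per_layer = lora_params_for_linear(HIDDEN_SIZE, 3 * HIDDEN_SIZE, RANK)
total = per_layer * NUM_LAYERS

print(per_layer)  # 196608
print(total)      # 8650752
```

This matches the 8,650,752 trainable parameters reported above. The "all params" figure (about 10.6B rather than 20B) is likely lower than the model size because bitsandbytes stores 4-bit weights packed two per byte, so `numel()` counts each packed element once.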


# **Download the Fine-Tuning Dataset**

Here we download the English quotes dataset (`Abirate/english_quotes`) and tokenize its `quote` column. Later, we'll replace it with our own custom dataset.



In [None]:
from datasets import load_dataset

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
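
When the quotes dataset is later replaced with custom data, one simple option is to write the examples as JSON Lines with a `quote` field, so the tokenization `map` call above keeps working unchanged. This is a sketch under that assumption; the helper name `write_quotes_jsonl` and the example records are hypothetical.

```python
import json
import os
import tempfile
from typing import Iterable


def write_quotes_jsonl(texts: Iterable[str], path: str) -> int:
    """Write one {"quote": ...} object per line and return the number of rows written.

    The "quote" key mirrors the column used by the tokenization step,
    so the same samples["quote"] lookup works on the custom data.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            text = text.strip()
            if not text:  # skip empty or whitespace-only examples
                continue
            f.write(json.dumps({"quote": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical example records; the blank one is skipped.
examples = ["Example quote one.", "   ", "Example quote two."]
out_path = os.path.join(tempfile.gettempdir(), "custom_quotes.jsonl")
n_rows = write_quotes_jsonl(examples, out_path)
print(n_rows)  # 2
```

The file can then be loaded with the `datasets` JSON loader, for example `load_dataset("json", data_files=out_path)`, which returns a `DatasetDict` with a `train` split, so `data["train"]` in the training cell still applies.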

# **Train the Model**

The following cell fine-tunes the model on the dataset. With `max_steps=10` this is only a short demonstration run; here it took about 137 seconds (see `train_runtime` in the output below).

In [5]:
import transformers

# the GPT-NeoX tokenizer has no pad token by default; reuse EOS so the data collator can pad
tokenizer.pad_token = tokenizer.eos_token

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=10,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()


max_steps is given, it will override any value given in num_train_epochs


| Step | Training Loss |
|-----:|--------------:|
| 1 | 2.7021 |
| 2 | 2.1375 |
| 3 | 2.5214 |
| 4 | 1.6753 |
| 5 | 1.8091 |
| 6 | 2.6741 |
| 7 | 2.4932 |
| 8 | 1.6813 |
| 9 | 2.6276 |
| 10 | 2.0986 |
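
The `training_loss` in the `TrainOutput` below is, in effect, the mean of the per-step losses. Averaging the rounded values from the table reproduces it to within rounding:

```python
# Per-step training losses copied from the table above (rounded to 4 decimals).
step_losses = [2.7021, 2.1375, 2.5214, 1.6753, 1.8091,
               2.6741, 2.4932, 1.6813, 2.6276, 2.0986]

mean_loss = sum(step_losses) / len(step_losses)
reported_training_loss = 2.242027056217194  # from TrainOutput

print(f"mean of logged losses:  {mean_loss:.5f}")
print(f"reported training_loss: {reported_training_loss:.5f}")
```

The individual step losses are noisy because each optimizer step sees only four examples.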


TrainOutput(global_step=10, training_loss=2.242027056217194, metrics={'train_runtime': 136.8158, 'train_samples_per_second': 0.292, 'train_steps_per_second': 0.073, 'total_flos': 167211775033344.0, 'train_loss': 2.242027056217194, 'epoch': 0.01594896331738437})
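
The `epoch` and throughput values in this `TrainOutput` follow from the training arguments. With `per_device_train_batch_size=1` and `gradient_accumulation_steps=4` on one GPU, each optimizer step uses 4 examples, so 10 steps see 40 examples. The dataset size of 2,508 rows used below is not printed in the notebook; it is inferred from the logged `epoch` value and should be treated as an assumption.

```python
# Reconstruct the TrainOutput metrics from the TrainingArguments.
# Assumptions: a single GPU, and 2,508 training rows (inferred from the logged epoch).
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 1
max_steps = 10
num_train_rows = 2508
train_runtime_s = 136.8158  # from TrainOutput

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
samples_seen = effective_batch_size * max_steps

epoch_fraction = samples_seen / num_train_rows
samples_per_second = samples_seen / train_runtime_s
steps_per_second = max_steps / train_runtime_s

print(effective_batch_size, samples_seen)       # 4 40
print(f"epoch: {epoch_fraction:.6f}")           # epoch: 0.015949
print(f"samples/s: {samples_per_second:.3f}")   # samples/s: 0.292
print(f"steps/s: {steps_per_second:.3f}")       # steps/s: 0.073
```

In other words, this demo run covers under 2% of one epoch, so it demonstrates the workflow rather than producing a fully trained model.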

# **Save the model**

**Save the model locally**

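The following saves the trained LoRA adapter (its weights and `adapter_config.json`) to the local `Quotes Model` directory. Only the small adapter is written, not the full 20B base model.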

In [6]:
#trainer.save_model("./Quotes Model")
model.save_pretrained("Quotes Model")

# **Mount GDrive**

The following mounts Google Drive so that we can store our model permanently in Drive.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive',force_remount=True)
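
Mounting Drive by itself does not copy anything into it. Below is a minimal sketch of copying the saved adapter directory into Drive; the helper name `copy_to_drive` is hypothetical, and the destination assumes Colab's usual layout, in which `drive.mount('/content/gdrive')` exposes your files under `/content/gdrive/MyDrive`.

```python
import os
import shutil


def copy_to_drive(src_dir: str, dst_root: str) -> str:
    """Copy the directory src_dir into dst_root and return the destination path.

    Existing files at the destination are overwritten (dirs_exist_ok=True,
    Python 3.8+), so re-running after retraining refreshes the copy.
    """
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(f"nothing to copy: {src_dir!r} is not a directory")
    dst_dir = os.path.join(dst_root, os.path.basename(os.path.normpath(src_dir)))
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
    return dst_dir


# Usage in Colab (assumed paths):
# saved = copy_to_drive("Quotes Model", "/content/gdrive/MyDrive")
# print(saved)  # /content/gdrive/MyDrive/Quotes Model
```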

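# **Run the QLoRA Model**

The following cells generate text with the fine-tuned model. Because training left gradient checkpointing enabled and set `model.config.use_cache = False`, `generate()` logs a `use_cache` warning repeatedly, as in the output below. Following the training cell's note to re-enable the cache for inference, running `model.eval()` and `model.config.use_cache = True` before generating should avoid these warnings.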

**Run the Model: Tom Brady**

In [8]:
text = "Tom Brady is"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...

Tom Brady is a great quarterback, but he's not a great leader. He's a great quarterback who happens to



**Run the Model: Who Are You?**

In [None]:
text = "Who are you? "
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

**Run the Model: Quotes**

In [None]:
text = "Twenty years from now"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
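
The three generation cells above repeat the same steps. A small helper can remove the duplication; `generate_text` is a hypothetical name, and the sketch only wraps the tokenizer, `model.generate`, and `tokenizer.decode` calls already used above, assuming a Hugging Face-style model and tokenizer.

```python
def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 20, device: str = "cuda:0") -> str:
    """Tokenize the prompt, generate up to max_new_tokens tokens, and decode the first sequence."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Usage with the model and tokenizer loaded earlier in this notebook:
# for prompt in ["Tom Brady is", "Who are you? ", "Twenty years from now"]:
#     print(generate_text(model, tokenizer, prompt))
```

Keeping the prompts in one list makes it easy to compare outputs before and after further fine-tuning.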