# Supervised Fine-Tuning (SFT) Template





Supervised fine-tuning (SFT) adapts a pre-trained Large Language Model (LLM) to a specific downstream task using labeled data. Training on task examples lets the model learn task-specific patterns and nuances by updating its parameters toward the data distribution and requirements of that task.
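In the causal-LM setting this template uses, SFT keeps the pre-training objective (next-token cross-entropy) and computes it on the labeled examples. The following is a minimal pure-Python sketch of that loss on toy, made-up logits. It assumes the usual Hugging Face conventions: labels are shifted by one position, and positions labeled -100 are skipped.

```python
import math
from typing import Sequence

IGNORE_INDEX = -100  # label value that Hugging Face losses skip


def next_token_loss(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean next-token cross-entropy over the non-masked positions."""
    total, count = 0.0, 0
    # Shift by one: the scores at position t are judged against the token at t + 1.
    for row, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue  # masked positions (e.g. padding) contribute nothing
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))  # stable log-sum-exp
        total += log_norm - row[target]  # equals -log softmax(row)[target]
        count += 1
    return total / count if count else 0.0


# Toy example: 3 positions over a 4-token vocabulary (made-up numbers).
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
print(next_token_loss(uniform, [1, 2, 3]))  # log(4), about 1.386: a uniform guess
confident = [[0.0, 0.0, 8.0, 0.0], [0.0, 0.0, 0.0, 8.0], [0.0] * 4]
print(next_token_loss(confident, [1, 2, 3]))  # much lower: the right next tokens score highest
```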

`I prepared this Supervised Fine-Tuning (SFT) template for my use case, but you could change it to suit your requirements.`



My accounts:

* [Hugging Face](https://huggingface.co/santhoshmlops)

* [GitHub](https://github.com/santhoshmlops)

Other fine-tuning templates:

* [Fine-Tuning Template](https://github.com/santhoshmlops/MyHF_LLM_FineTuning/tree/main/FineTuningTemplate)


My model fine-tuning notebooks:

* [MY HF LLM Fine-Tuning](https://github.com/santhoshmlops/MyHF_LLM_FineTuning)



## Setting Up on Google Colab
Google Colab provides a convenient, cloud-based environment with access to powerful GPUs like the `T4`. If you choose Colab for this tutorial, make sure to select a GPU runtime by going to `Runtime > Change runtime type > T4 GPU`. This ensures that your notebook has access to the necessary computational resources.
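The following is a back-of-the-envelope sketch of why one T4 (16 GB) is enough for this template. It estimates the memory taken by the model weights alone at different precisions. The parameter count is an assumption: roughly 2.5 billion for a "2B" model such as `gemma-2b-it`. After loading, check the real figure with `model.num_parameters()`.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal GB.

    Ignores activations, optimizer state, LoRA weights and quantization constants.
    """
    return num_params * bits_per_param / 8 / 1e9


# Assumption: about 2.5 billion parameters for a "2B"-class model.
APPROX_PARAMS = 2.5e9

for name, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    print(f"{name:>5}: ~{weight_memory_gb(APPROX_PARAMS, bits):.2f} GB")
```

Under that assumption, the weights need about 5 GB in fp16 and about 1.25 GB in 4-bit. bitsandbytes quantizes only the linear layers, so the embeddings stay in 16-bit and the real 4-bit footprint is somewhat larger. Even so, it leaves most of the T4's memory free for activations, LoRA weights and optimizer state.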

## Setting Up Hugging Face Authentication

On Google Colab, you can safely store your Hugging Face token by using Colab's "Secrets" feature. This can be done by clicking on the "Key" icon in the sidebar, selecting "`Secrets`", and adding a new secret with the name `HF_TOKEN` and your Hugging Face token as the value. This method ensures that your token remains secure and is not exposed in your notebook's code.
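Colab exposes stored secrets through `google.colab.userdata`, and `huggingface_hub` also reads an `HF_TOKEN` environment variable. The following is a small helper sketch. It assumes the environment variable should take precedence and falls back to the Colab secret. Outside Colab the import fails and the helper returns `None`.

```python
import os
from typing import Optional


def get_hf_token() -> Optional[str]:
    """Return a Hugging Face token from $HF_TOKEN, else from Colab Secrets, else None."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    try:
        from google.colab import userdata  # only importable inside Google Colab
        return userdata.get("HF_TOKEN")
    except Exception:
        # Not running on Colab, the secret does not exist, or notebook access was not granted.
        return None


token = get_hf_token()
print("token found" if token else "no token found")
```

You can pass the result to `huggingface_hub.login(token=...)`. If it returns `None`, fall back to the interactive `notebook_login()` used in Step 2.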

# Step 1 - Install the required Python packages

```python
!pip install -q -U transformers
!pip install -q -U peft
!pip install -q -U bitsandbytes
!pip install -q -U trl
!pip install -q -U accelerate
!pip install -q -U datasets
```

# Step 2 - Logging into Hugging Face Hub
When prompted, paste a Hugging Face Hub access token with write permission. Write access is needed to push the model to the Hub later.

```python
from huggingface_hub import notebook_login
notebook_login()
```

# Step 3 - Loading Required Libraries

```python
import os
import torch
from datasets import load_dataset, Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, DataCollatorForLanguageModeling
from peft import LoraConfig, PeftModel, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer
from accelerate import Accelerator
```

# Step 4 - Setting Model Parameters for SFT
`Note:` Adjust these parameters for your own fine-tuning run, or keep the values shown here. At a minimum, set `hf_user_name` to your own Hugging Face username.

```python
# Model to fine-tune
model_ckpt = "google/gemma-2b-it"  # Change to any causal LM checkpoint, e.g. "microsoft/phi-1_5"
hf_user_name = "santhoshmlops"
hub_model_ckpt = hf_user_name + "/" + model_ckpt.split("/")[-1] + "-SFT"  # Hub repo for the result; rename as you like, e.g. "santhoshmlops/microsoft_phi-1_5_merged-SFT"
dataset_name = "santhoshmlops/Skai_Gemma_Instruct_ChatTemplate"

# LoRA parameters
r = 16
lora_alpha = 32
lora_dropout = 0.05
bias = "none"
task_type = "CAUSAL_LM"
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]  # Module names depend on the model architecture, e.g. ["q_proj", "k_proj"]

# bitsandbytes (4-bit quantization) parameters
load_in_4bit = True
bnb_4bit_quant_type = "nf4"
bnb_4bit_compute_dtype = torch.float16
bnb_4bit_use_double_quant = True

# AutoModel parameters
device_map = {"": Accelerator().local_process_index}  # Place the whole model on this process's GPU
torch_dtype = torch.float16

# Tokenizer parameters
trust_remote_code = True

# Training parameters
output_dir = model_ckpt.split("/")[-1] + "-SFT"  # Local output directory, also used as the Hub repo name, e.g. "microsoft_phi-1_5_merged-SFT"
num_train_epochs = 1
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.003
optim = "paged_adamw_8bit"
lr_scheduler_type = "cosine"
max_steps = 750  # When positive, this overrides num_train_epochs
warmup_ratio = 0.03
group_by_length = True
save_steps = 100  # Only used when save_strategy = "steps"
save_strategy = "epoch"
logging_steps = 100
logging_dir = "./logs"
fp16 = True  # T4 GPUs lack native bf16 support, so train in fp16
bf16 = False
push_to_hub = True
neftune_noise_alpha = 5  # NEFTune: adds random noise to the embeddings during training
report_to = "tensorboard"

# SFT training parameters
train_cln_name = "text"  # Dataset column that holds the formatted training text
packing = False
max_seq_length = 1024

# Merge and push the model to the Hub
low_cpu_mem_usage = True
return_dict = True
```
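The LoRA settings above add two small matrices, A (r x in) and B (out x r), beside each targeted projection. PEFT scales their product by `lora_alpha / r`, which here is 32 / 16 = 2. The following sketch counts how many parameters that trains. It relies on assumed Gemma-2B layer shapes: hidden_size=2048, intermediate_size=16384, 8 query heads and 1 key/value head of size 256, and 18 layers. Check these values against `model.config` before relying on the count.

```python
# Assumed Gemma-2B shapes (verify with model.config before relying on the numbers).
HIDDEN = 2048         # hidden_size
INTERMEDIATE = 16384  # intermediate_size (MLP width)
HEAD_DIM = 256
N_HEADS = 8           # query heads
N_KV_HEADS = 1        # key/value heads (multi-query attention)
N_LAYERS = 18

# (in_features, out_features) of each targeted projection in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) beside a frozen W."""
    return r * (in_features + out_features)


def total_lora_params(r: int, target_modules: list) -> int:
    """Trainable LoRA parameters across all decoder layers (bias="none" adds no biases)."""
    per_layer = sum(lora_param_count(*SHAPES[name], r) for name in target_modules)
    return per_layer * N_LAYERS


R, ALPHA = 16, 32  # r and lora_alpha from the cell above
TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
print("LoRA scaling (lora_alpha / r):", ALPHA / R)
print(f"trainable LoRA parameters: {total_lora_params(R, TARGETS):,}")
```

With these assumed shapes, the count comes to 19,611,648. That is under 1% of a model with roughly 2.5B parameters. `model.print_trainable_parameters()` in Step 6 reports the real figure.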

# Step 5 - Loading and Formatting the Dataset
`Note:` Define and format your dataset for your own use case before fine-tuning. The `create_data()` function is a minimal example. It loads the `train` split of a dataset whose `text` column is already formatted with Gemma's chat turn markers.

```python
def create_data():
    data = load_dataset(dataset_name, split="train")
    return data

data = create_data()
print(data[0])
```

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


The `train` split has 15,011 examples.

{'text': "<bos><start_of_turn>user Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n When did Virgin Australia start operating?. \n Here are the context: Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia's domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly serve 32 cities in Australia, from hubs in Brisbane, Melbourne and Sydney. <end_of_turn>\n<start_of_turn>model \n Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. <end_of_turn>"}
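The printed record shows the layout of the `text` column. It consists of a fixed instruction preamble, then the question with a trailing period, then an optional "Here are the context:" section, and finally the model turn. The following sketch rebuilds that layout from separate fields. The layout is inferred from this single sample. How rows without context are formatted is an assumption.

```python
from typing import Optional

PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def format_example(instruction: str, response: str, context: Optional[str] = None) -> str:
    """Rebuild one `text` row in the layout of the sample above (layout inferred, not official)."""
    user_turn = f"{PREAMBLE}\n\n {instruction}. "
    if context:  # assumption: rows without context simply omit this section
        user_turn += f"\n Here are the context: {context} "
    return (
        f"<bos><start_of_turn>user {user_turn}<end_of_turn>\n"
        f"<start_of_turn>model \n {response} <end_of_turn>"
    )


# Shortened from the printed sample.
print(format_example(
    "When did Virgin Australia start operating?",
    "Virgin Australia commenced services on 31 August 2000 as Virgin Blue.",
    "Virgin Australia is an Australian-based airline.",
))
```

Suppose your raw data has separate columns. Then a call such as `data.map(lambda ex: {"text": format_example(ex["instruction"], ex["response"], ex["context"])})` would add the `text` column that `train_cln_name` points to. The column names in that call are hypothetical.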


# Step 6 - Fine-Tuning with LoRA and Supervised Fine-Tuning

```python
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_ckpt,
    trust_remote_code=trust_remote_code
)
tokenizer.pad_token = tokenizer.eos_token  # Reuse EOS as the padding token
tokenizer.padding_side = "right"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=bnb_4bit_compute_dtype,
    bnb_4bit_use_double_quant=bnb_4bit_use_double_quant
)

# Load the quantized base model
model = AutoModelForCausalLM.from_pretrained(
    model_ckpt,
    quantization_config=bnb_config,
    device_map=device_map,
    trust_remote_code=trust_remote_code,
    torch_dtype=torch_dtype
)
model.config.use_cache = False  # The KV cache is not needed for training and is incompatible with gradient checkpointing
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)  # Make the quantized model ready for adapter training

# Training arguments
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=gradient_checkpointing,
    max_grad_norm=max_grad_norm,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    optim=optim,
    lr_scheduler_type=lr_scheduler_type,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    save_steps=save_steps,
    save_strategy=save_strategy,
    logging_steps=logging_steps,
    logging_dir=logging_dir,
    fp16=fp16,
    bf16=bf16,
    push_to_hub=push_to_hub,
    neftune_noise_alpha=neftune_noise_alpha,
    report_to=report_to
)

# Wrap the model with the LoRA (Low-Rank Adaptation) configuration
lora_config = LoraConfig(
    r=r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    bias=bias,
    task_type=task_type,
    target_modules=target_modules
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Create the SFT trainer.
# These keyword arguments follow the trl SFTTrainer API this notebook was written against;
# newer trl releases expect dataset_text_field, packing and max_seq_length in an SFTConfig instead.
trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    peft_config=lora_config,
    dataset_text_field=train_cln_name,
    args=training_args,
    tokenizer=tokenizer,
    packing=packing,
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
```
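This step sets `tokenizer.pad_token = tokenizer.eos_token` and uses `DataCollatorForLanguageModeling(tokenizer, mlm=False)`. That collator builds labels by copying `input_ids` and replacing every position equal to `pad_token_id` with -100. When padding reuses EOS, any genuine EOS token in a sequence is therefore excluded from the loss as well. The following pure-Python sketch shows that masking rule. The token ids are hypothetical.

```python
from typing import List

IGNORE_INDEX = -100


def causal_lm_labels(input_ids: List[int], pad_token_id: int) -> List[int]:
    """Mirror DataCollatorForLanguageModeling(mlm=False): copy the ids, mask pad ids with -100."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


# Hypothetical ids for illustration: BOS=2, EOS=1, PAD=0, text tokens 10 and 11.
BOS, EOS, PAD = 2, 1, 0
sequence = [BOS, 10, 11, EOS]

# pad_token = eos_token: padding reuses EOS, so the real EOS label is masked too.
print(causal_lm_labels(sequence + [EOS, EOS], pad_token_id=EOS))  # [2, 10, 11, -100, -100, -100]
# A dedicated pad token keeps the EOS label.
print(causal_lm_labels(sequence + [PAD, PAD], pad_token_id=PAD))  # [2, 10, 11, 1, -100, -100]
```

This only matters when your formatted text actually contains EOS. The Gemma tokenizer already appears to define its own `<pad>` token (check `tokenizer.pad_token`), so the reassignment may not be needed for this model.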



# Step 7 - Let's start the training process

```python
# Train the model, then push the trained adapter to the Hub
trainer.train()
trainer.push_to_hub()
```



| Step | Training Loss |
|---:|---:|
| 100 | 2.0727 |
| 200 | 1.6891 |
| 300 | 1.6909 |
| 400 | 1.6070 |
| 500 | 1.6071 |
| 600 | 1.5934 |
| 700 | 1.5852 |


CommitInfo(commit_url='https://huggingface.co/santhoshmlops/gemma-2b-it-SFT/commit/c2bad07bb53fc62308d92d9825e7ce7a39dd36b8', commit_message='End of training', commit_description='', oid='c2bad07bb53fc62308d92d9825e7ce7a39dd36b8', pr_url=None, pr_revision=None, pr_num=None)
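Because `max_steps` is positive, it overrides `num_train_epochs`, so this run probably did not see the whole 15,011-example train split. The following back-of-the-envelope sketch assumes a single GPU and that no rows are dropped during tokenization. `cosine_lr` re-implements the shape of the cosine-with-warmup schedule in `transformers` for illustration; it does not call the library.

```python
import math


def training_plan(num_examples, per_device_batch, grad_accum, max_steps, warmup_ratio, num_devices=1):
    """Rough step and coverage arithmetic for a Trainer run that sets max_steps."""
    effective_batch = per_device_batch * grad_accum * num_devices
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": batches_per_epoch // grad_accum,
        "examples_seen": max_steps * effective_batch,
        "epochs_covered": max_steps * effective_batch / num_examples,
        "warmup_steps": math.ceil(max_steps * warmup_ratio),
    }


def cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0, then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from Step 4.
plan = training_plan(num_examples=15011, per_device_batch=2, grad_accum=2,
                     max_steps=750, warmup_ratio=0.03)
print(plan)
for step in (0, plan["warmup_steps"], 375, 750):
    print(step, cosine_lr(step, 2e-4, plan["warmup_steps"], 750))
```

Under these assumptions, the 750 steps use 3,000 examples, about 20% of one epoch. One full epoch would be roughly 3,753 optimizer steps, and the warmup lasts 23 steps.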

# Step 8 - Merge the LoRA weights into the base model

```python
# Free the GPU memory held by the training objects
del model, trainer
torch.cuda.empty_cache()

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_ckpt,
                                          trust_remote_code=trust_remote_code)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Load the base model in fp16 (not quantized) so the adapter can be merged into its weights
base_model = AutoModelForCausalLM.from_pretrained(model_ckpt,
                                                  low_cpu_mem_usage=low_cpu_mem_usage,
                                                  return_dict=return_dict,
                                                  torch_dtype=torch_dtype,
                                                  device_map=device_map,
                                                  trust_remote_code=trust_remote_code)

# Load the trained adapter from the Hub and merge it into the base weights
merged_model = PeftModel.from_pretrained(base_model, hub_model_ckpt)
merged_model = merged_model.merge_and_unload()

# Push the merged model and tokenizer to the Hugging Face Hub
merged_model.push_to_hub(hub_model_ckpt, use_temp_dir=False)
tokenizer.push_to_hub(hub_model_ckpt, use_temp_dir=False)
```
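`merge_and_unload()` folds each adapter into its frozen weight. The merged model then runs at the base model's cost, with no separate adapter path. For standard LoRA the merge is exact: W x + (alpha/r) B(A x) = (W + (alpha/r) B A) x. The following toy pure-Python check uses made-up 2x3 weights and a rank-1 adapter. It sets alpha=2 and r=1, so the scaling is 2, the same ratio as `lora_alpha=32`, `r=16` in Step 4.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scale(a: Matrix, s: float) -> Matrix:
    return [[x * s for x in row] for row in a]


def lora_forward(W, A, B, x, alpha, r):
    """Unmerged path used during training: W x + (alpha / r) * B (A x)."""
    return add(matmul(W, x), scale(matmul(B, matmul(A, x)), alpha / r))


def merge_weights(W, A, B, alpha, r):
    """What merging does to one layer: W + (alpha / r) * B A."""
    return add(W, scale(matmul(B, A), alpha / r))


# Made-up 2x3 layer with a rank-1 adapter.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -1.0, 0.0]]     # r x in  = 1 x 3
B = [[2.0], [1.0]]         # out x r = 2 x 1
x = [[1.0], [2.0], [3.0]]  # input column vector

print(lora_forward(W, A, B, x, alpha=2.0, r=1))           # [[1.0], [-4.0]]
print(matmul(merge_weights(W, A, B, alpha=2.0, r=1), x))  # [[1.0], [-4.0]]
```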

# Step 9 - Inference with the fine-tuned model

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the fine-tuned model from the Hub in 4-bit for inference
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("santhoshmlops/gemma-2b-it-SFT")
model = AutoModelForCausalLM.from_pretrained("santhoshmlops/gemma-2b-it-SFT", quantization_config=quantization_config)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# Without max_new_tokens, generate() falls back to a short default length, so the output is cut off
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```





<bos>Write me a poem about Machine Learning. 
Here are the context: Machine learning (ML
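Each training sample wraps the instruction in `<start_of_turn>user` ... `<end_of_turn>` and `<start_of_turn>model` markers (see the record printed in Step 5). The prompts here, by contrast, are plain text. That mismatch is one plausible reason the model continues with "Here are the context:" instead of answering. The following sketch builds a prompt in the same inferred layout and trims the decoded reply. Stopping on `<end_of_turn>` is an assumption based on how the training turns end.

```python
from typing import Optional

PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Prompt in the inferred training layout, ending where the model's reply should start."""
    user_turn = f"{PREAMBLE}\n\n {instruction}. "
    if context:
        user_turn += f"\n Here are the context: {context} "
    return f"<bos><start_of_turn>user {user_turn}<end_of_turn>\n<start_of_turn>model \n"


def extract_reply(decoded: str) -> str:
    """Text after the last model-turn marker, cut at <end_of_turn> if present."""
    reply = decoded.rsplit("<start_of_turn>model", 1)[-1]
    return reply.split("<end_of_turn>", 1)[0].strip()


print(build_prompt("Write me a poem about Machine Learning"))

# Usage with the tokenizer/model loaded above (not executed here). The prompt already
# contains <bos>, so the tokenizer should not add another one:
# inputs = tokenizer(build_prompt("Write me a poem about Machine Learning"),
#                    add_special_tokens=False, return_tensors="pt").to("cuda")
# end_of_turn_id = tokenizer.convert_tokens_to_ids("<end_of_turn>")
# outputs = model.generate(**inputs, max_new_tokens=150,
#                          eos_token_id=[tokenizer.eos_token_id, end_of_turn_id])
# print(extract_reply(tokenizer.decode(outputs[0])))
```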


```python
input_text = "Write me a poem about deep Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```

<bos>Write me a poem about deep Learning. 
Here are the context: Deep learning is a subfield of machine learning that uses artificial neural networks to automatically discover patterns in data. Deep learning algorithms are typically trained on large datasets of labeled data, and learn to make accurate predictions on new, unlabeled data. Deep learning algorithms are typically trained on large datasets of labeled data, and learn to make accurate predictions on new, unlabeled data. Deep learning algorithms are typically trained on large datasets of labeled data, and learn to make accurate predictions on new, unlabeled data. Deep learning algorithms are typically trained on large datasets of labeled data, and learn to make accurate predictions on new, unlabeled data. Deep learning algorithms are typically trained on large datasets of labeled data, and learn to make accurate


```python
input_text = "Write me a poem about gen ai."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```

<bos>Write me a poem about gen ai. 
Here are the context: Gen AI is a term used to describe a new wave of AI that is more focused on the use of AI in the real world. Gen AI is a term used to describe a new wave of AI that is more focused on the use of AI in the real world. Gen AI is a term used to describe a new wave of AI that is more focused on the use of AI in the real world. Gen AI is a term used to describe a new wave of AI that is more focused on the use of AI in the real world. Gen AI is a term used to describe a new wave of AI that is more focused on the use of AI in the real world. Gen AI is a term used to describe


```python
import time
st_time = time.time()
input_text = "Write me a poem about python."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
print(time.time() - st_time)
```

<bos>Write me a poem about python.
Python is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
It is used for programming
It is a programming language
20.961308240890503
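The python-poem output above loops on two lines until `max_new_tokens` runs out. One simple way to quantify that kind of repetition is the share of distinct word n-grams, often called distinct-n. The following helper is a sketch and not part of the original notebook.

```python
from typing import List, Tuple


def distinct_n(text: str, n: int = 2) -> float:
    """Share of word n-grams that are unique: near 1.0 means varied, near 0.0 means looping."""
    words = text.split()
    ngrams: List[Tuple[str, ...]] = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# Abbreviated version of the python-poem output above: two lines repeating.
looping = "It is a programming language\nIt is used for programming\n" * 6
# A made-up, non-repeating line for comparison.
varied = "Gradients flow through layered nets, each weight a quiet vote"
print(f"looping: {distinct_n(looping):.2f}  varied: {distinct_n(varied):.2f}")
```

`model.generate` also accepts `repetition_penalty` and `no_repeat_ngram_size`, which target this behavior directly. Values such as `repetition_penalty=1.2` or `no_repeat_ngram_size=3` are only starting points to try.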


```python
import time
st_time = time.time()
input_text = "What do you know about elon musk."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
print(time.time() - st_time)
```

<bos>What do you know about elon musk. 
Here are the context: Elon Musk (/ˈmɛsk/; born June 28, 1971) is an American entrepreneur, engineer, and investor. He is the co-founder, chairman, and CEO of SpaceX, a private space transportation company, and the co-founder and product architect of Tesla, Inc., an electric vehicle and clean energy company. Musk is the co-founder of Neuralink, a neurotechnology company, and the co-founder of The Boring Company, a construction company. He is the co-founder of The Terra Firma Constellation, a constellation of artificial satellites. Musk is the co-founder of The Zip2, a web services company, and co-founder of Zip2. Musk
15.859235286712646


```python
import time
st_time = time.time()
input_text = "tell me about you."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
print(time.time() - st_time)
```

<bos>tell me about you. 
Here are the context: I am a large language model, trained by Google. I am a conversational AI that can be used to generate human-quality text, translate languages, write different kinds of creative content, and more. I am trained on a massive dataset of text and code, and I am able to communicate and generate human-like text in response to a wide range of prompts. I am trained on a massive dataset of text and code, and I am able to communicate and generate human-like text in response to a wide range of prompts. I am trained on a massive dataset of text and code, and I am able to communicate and generate human-like text in response to a wide range of prompts. I am trained on
15.518263816833496
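The timing cells above wrap each call in `time.time()`. The following small helper is a sketch that measures the same thing slightly more carefully. It uses `time.perf_counter()`, the clock Python provides for measuring short durations. It also divides by the number of newly generated tokens, so runs that stop at different lengths can be compared as tokens per second.

```python
import time
from typing import Callable, Tuple


def time_generation(generate_fn: Callable[[], int]) -> Tuple[float, float]:
    """Call generate_fn (which returns how many new tokens it produced); return (seconds, tokens/sec)."""
    start = time.perf_counter()  # monotonic, high-resolution clock meant for measuring durations
    new_tokens = generate_fn()
    elapsed = time.perf_counter() - start
    return elapsed, (new_tokens / elapsed if elapsed > 0 else float("inf"))


# Usage with the tokenizer/model above (not executed here):
# def run():
#     out = model.generate(**input_ids, max_new_tokens=150)
#     return out.shape[-1] - input_ids["input_ids"].shape[-1]  # new tokens only
# seconds, tokens_per_sec = time_generation(run)
# print(f"{seconds:.1f} s, {tokens_per_sec:.1f} tokens/s")
```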
