# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (or SFT for short) is a crucial step in RLHF. In TRL we provide an easy-to-use API to create your SFT models and train them with few lines of code on your dataset.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [26]:
# !pip install peft==0.7.1
!pip install trl
# !pip install transformer==4.36.2
# !pip install torch==1.13.1
!pip install torchvision
!pip -qqq install bitsandbytes accelerate

Collecting torch==2.2.1 (from torchvision)
  Using cached torch-2.2.1-cp310-cp310-manylinux1_x86_64.whl (755.5 MB)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.13.1
    Uninstalling torch-1.13.1:
      Successfully uninstalled torch-1.13.1
Successfully installed torch-2.2.1


[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m102.2/102.2 MB[0m [31m9.8 MB/s[0m eta [36m0:00:00[0m
[?25h

In [1]:
import transformers
transformers.__version__

'4.38.2'

In [2]:
import trl
trl.__version__



'0.7.4'

In [3]:
import os
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Basic SFT

In [4]:
# Step 1: Load the dataset
from datasets import load_dataset

dataset_train = load_dataset('json', data_files="./alpaca_data.json", split="train")
dataset_train

Dataset({
    features: ['output', 'instruction', 'input'],
    num_rows: 52002
})

In [5]:
dataset_train[100]

{'output': "The database should contain fields for employee name, position, salary, and date. It should also include a field for the employee's manager, so that the salaries can be properly allocated across departments. The database should also be able to generate reports on salary expenses for departments or individuals.",
 'instruction': 'Design a database to record employee salaries.',
 'input': ''}

In [6]:
dataset_eval = load_dataset("tatsu-lab/alpaca_eval", split='eval', trust_remote_code=True)
dataset_eval = dataset_eval.remove_columns(["generator", "dataset"])
dataset_eval

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Dataset({
    features: ['instruction', 'output'],
    num_rows: 805
})

In [7]:
dataset_eval[100]

{'instruction': 'I like to host guests at my home from time to time, and I am gathering  recipes of different dishes and drinks to keep things interesting. I am interested in trying some Indonesian dishes. Can you give me a recipe for Tahu Gejrot Cirebon?',
 'output': 'Ingredients: \n- 2 tablespoons of sweet soy sauce \n- 2 tablespoons of chili sauce \n- 2 tablespoons of vinegar \n- 2 tablespoons of sugar \n- 1 tablespoon of ground ginger \n- 2 tablespoons of vegetable oil \n- 2 cloves of garlic, minced \n- 1/4 teaspoon of ground pepper \n- 1/2 teaspoon of ground cumin \n- 1/4 teaspoon of ground nutmeg \n- 2 tablespoons of tomato paste \n- 2 packages of firm tofu, cut into cubes \n- 2 tablespoons of chopped shallots\n- 2 tablespoons of chopped scallions\n- 2 tablespoons of chopped celery\n- 2 tablespoons of chopped chilies\n\nInstructions:\n1. In a medium bowl, mix together the sweet soy sauce, chili sauce, vinegar, sugar, ground ginger, vegetable oil, garlic, ground pepper, ground cum

In [8]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map = 'auto'
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)

tokenizer.pad_token = tokenizer.eos_token

# Make sure to pass a correct value for max_seq_length as the default value will be set to min(tokenizer.model_max_length, 1024).
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length

1024

## Format your input prompts

In [9]:
def formatting_prompts_func(examples):
	output_texts = []

	for i in range(len(examples['instruction'])):
		instruction = examples["instruction"][i]
		input_text = examples["input"][i] if 'input' in examples.keys() else ""
		response = examples["output"][i]

		if len(input_text) > 1:
			text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_text}

### Response:
{response}
""".strip()

		else:
			text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}
""".strip()

		output_texts.append(text)

	return output_texts

In [10]:
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

response_template = "### Response:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')

In [13]:
# Step 3: Define the Trainer
from transformers import TrainingArguments
from trl import SFTTrainer
training_args = TrainingArguments(
    output_dir = 'tmp_trainer', #default = 'tmp_trainer'
    save_strategy = 'epoch',
    evaluation_strategy = 'epoch',
    gradient_checkpointing = True,
    per_device_train_batch_size = 2,
    per_device_eval_batch_size = 2,
    num_train_epochs = 1
)

trainer = SFTTrainer(
    model = model,
    args = training_args,
    train_dataset = dataset_train,
    eval_dataset = dataset_eval,
    formatting_func = formatting_prompts_func,
    data_collator = collator,
    max_seq_length = max_seq_length
)

Map:   0%|          | 0/52002 [00:00<?, ? examples/s]

Map:   0%|          | 0/805 [00:00<?, ? examples/s]

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


In [14]:
trainer.train()



Epoch,Training Loss,Validation Loss
1,2.2778,2.177402


TrainOutput(global_step=26001, training_loss=2.328003119718139, metrics={'train_runtime': 2470.7887, 'train_samples_per_second': 21.047, 'train_steps_per_second': 10.523, 'total_flos': 1932177194680320.0, 'train_loss': 2.328003119718139, 'epoch': 1.0})

In [16]:
trainer.save_model('instruction_tuning')

### Inference and Comparision with Gold Labels

In [1]:
from transformers import AutoModelForCausalLM,AutoTokenizer,pipeline

  from .autonotebook import tqdm as notebook_tqdm
  _torch_pytree._register_pytree_node(
  _torch_pytree._register_pytree_node(
  _torch_pytree._register_pytree_node(


In [2]:
model_name_or_path = "./app/model"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map = 'auto'
)

In [3]:
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=500
)

In [4]:
def instruction_prompt(instruction, prompt_input=None):
    if prompt_input:
        return f"""
        Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

        ### Instruction:
        {instruction}

        ### Input:
        {prompt_input}

        ### Response:
        """.strip()
    else:
        return f"""
        Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

        ### Instruction:
        {instruction}

        ### Response:
        """.strip()

In [15]:
sample = dataset_eval[40]
sample

{'instruction': 'When was Canada colonized?',
 'output': 'Canada was colonized in the late 15th century by the French and the British.'}

In [16]:
output = text_generator(instruction_prompt(sample['instruction'], "Tell me about Canada"))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [20]:
result = output[0]['generated_text'].split("### Response:\n")[-1]
result

"Canada colonized Canada in the 17th century. Canada was largely known for its extensive agriculture of potatoes and potatoes. It was a major source of the United States' agricultural advances over the three centuries, resulting in advancements in technology such as electric vehicles and the creation of digitalized agriculture. It was a significant force in the development of modern agriculture systems and a major source of peace and prosperity. Canada is now a large industrial country, symbolizing the influence of modern technology on the world.     \n         ### Instruction:\nThe 17th century revolutionized Canada, and is one of the fastest growing countries worldwide, with an estimated population of Canada estimated to be around 4,000,000 people. The Revolution of 1799-1903 also shaped Canadian politics, with the establishment of the National Assembly and the formation of a parliamentary system. \n               Pressions to the government of Canada were echoed throughout the world

The table below shows the generated response and the actual gold label for the same instructions "When was Canada colonized?"

| Actual Output Label | Generated Response |
|-|-|
|Canada was colonized in the late 15th century by the French and the British|"Canada colonized Canada in the 17th century. Canada was largely known for its extensive agriculture of potatoes and potatoes. It was a major source of the United States' agricultural advances over the three centuries, resulting in advancements in technology such as electric vehicles and the creation of digitalized agriculture. It was a significant force in the development of modern agriculture systems and a major source of peace and prosperity. Canada is now a large industrial country, symbolizing the influence of modern technology on the world.     \n         ### Instruction:\nThe 17th century revolutionized Canada, and is one of the fastest growing countries worldwide, with an estimated population of Canada estimated to be around 4,000,000 people. The Revolution of 1799-1903 also shaped Canadian politics, with the establishment of the National Assembly and the formation of a parliamentary system. \n               Pressions to the government of Canada were echoed throughout the world. Canada was the first country to be colonized by the United States, and many were vocal during the United States' Revolutionary War. \n             ###      ---        ---      < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < << < < > < < < < < < < < < < < < < << < < < < << < < < < < < < < < < < < < < < < < < << < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < << < <"
|

Our model is currently generates more than required text than the instruction asks. Since, the model is trained under one epoch and relatively small dataset, the model could perform better under better training and bigger dataset. 