# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API to create SFT models and train them on your dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [10]:
# !pip3 install peft==0.7.1
# !pip3 install trl==0.7.4
# !pip3 install transformers==4.36.2

In [11]:
import transformers
transformers.__version__

'4.36.2'

In [12]:
import trl
trl.__version__

'0.7.4'

In [13]:
import os
import torch
# Set GPU device
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# os.environ['http_proxy']  = 'http://192.41.170.23:3128'
# os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Instruction-Tuning
Train on completions only
- Use the DataCollatorForCompletionOnlyLM to compute the loss on the generated completions only, not on the prompt tokens.
- Note that this works only when packing=False.
- To instantiate the collator for instruction data, pass a response template and the tokenizer.
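The idea behind completion-only training can be sketched without any library: find the response template in the token sequence and set every label up to and including it to the ignore index, so only the completion contributes to the loss. This is a minimal illustration, not TRL's implementation; the helper name `mask_prompt_tokens` and the token ids are made up for the example.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_prompt_tokens(input_ids, response_template_ids):
    """Return labels where the prompt and the response template are ignored."""
    labels = list(input_ids)
    n = len(response_template_ids)
    # Scan for the first position where the template tokens appear
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == response_template_ids:
            end = start + n
            labels[:end] = [IGNORE_INDEX] * end
            return labels
    # Template not found: assume the whole example is excluded from the loss
    return [IGNORE_INDEX] * len(input_ids)


# Toy token ids (hypothetical): prompt = 11 12 13, template = 90 91, completion = 21 22 23
input_ids = [11, 12, 13, 90, 91, 21, 22, 23]
template_ids = [90, 91]
print(mask_prompt_tokens(input_ids, template_ids))
# [-100, -100, -100, -100, -100, 21, 22, 23]
```

Because the template is matched on token ids, the response template string has to tokenize the same way on its own as it does inside the full prompt.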

In [14]:
# Step 1: Load the dataset
from datasets import load_dataset
dataset_train = load_dataset('json', data_files='dataset/alpaca_data.json', split='train')
dataset_train

Dataset({
    features: ['input', 'output', 'instruction'],
    num_rows: 52002
})

In [15]:
dataset_train[20000]

{'input': '(A musical note)',
 'output': 'The musical note is an F sharp.',
 'instruction': 'Name the given musical note.'}

In [16]:
dataset_eval = load_dataset("tatsu-lab/alpaca_eval", split='eval', trust_remote_code=True)
dataset_eval = dataset_eval.remove_columns(["generator", "dataset"])
dataset_eval

Dataset({
    features: ['instruction', 'output'],
    num_rows: 805
})

In [17]:
dataset_eval[20000]

{'instruction': 'what are five important topics for game design',
 'output': '1. Storytelling\n2. Player Mechanics\n3. Art Direction\n4. Level Design\n5. User Interface Design'}

In [18]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length

1024
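The `min(...)` cap matters because some tokenizers do not declare a maximum length and report a very large placeholder value instead. A small sketch of the same guard, using plain integers rather than a real tokenizer (the helper name `capped_max_length` is an assumption for illustration):

```python
# Placeholder of the kind a tokenizer may report when no limit is configured (assumed value)
VERY_LARGE = int(1e30)


def capped_max_length(model_max_length, cap=1024):
    """Keep the tokenizer's limit unless it exceeds the cap."""
    return min(model_max_length, cap)


print(capped_max_length(1024))        # distilgpt2 reports 1024, so the cap changes nothing
print(capped_max_length(VERY_LARGE))  # an unset limit is cut down to the cap
print(capped_max_length(512))         # a smaller limit is kept as-is
```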

In [None]:
# def formatting_prompts_func(example):
#     output_texts = []
#     for i in range(len(example['instruction'])):
#         text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
#         output_texts.append(text)
#     return output_texts

# #check instruction-prompt
# formatting_prompts_func(dataset[:2])

In [None]:
# # Step 3: Define the Trainer
# trainer = SFTTrainer(
#     model,
#     train_dataset=dataset.select(range(1000)),
#     formatting_func=formatting_prompts_func,
#     data_collator=collator,
# )

# trainer.train() 

In [19]:
dataset_eval[0].keys()

dict_keys(['instruction', 'output'])

In [20]:
dataset_train[:2]

{'input': ['', ''],
 'output': ['1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
  'The three primary colors are red, blue, and yellow.'],
 'instruction': ['Give three tips for staying healthy.',
  'What are the three primary colors?']}
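Note that slicing the dataset returns a dict of columns (one list per feature), not a list of records. That is why a batched formatting function loops over an index and reads `examples[column][i]`. A minimal sketch with a plain dict standing in for the sliced dataset (outputs abbreviated, `batch_to_rows` is a hypothetical helper):

```python
# Columnar batch, shaped like the output of dataset_train[:2] (outputs shortened)
batch = {
    "input": ["", ""],
    "output": ["1.Eat a balanced diet ...", "The three primary colors are red, blue, and yellow."],
    "instruction": ["Give three tips for staying healthy.", "What are the three primary colors?"],
}


def batch_to_rows(batch):
    """Turn a dict of equal-length columns into a list of per-example dicts."""
    keys = list(batch.keys())
    num_rows = len(batch[keys[0]])
    return [{key: batch[key][i] for key in keys} for i in range(num_rows)]


rows = batch_to_rows(batch)
print(len(rows))               # 2
print(rows[1]["instruction"])  # What are the three primary colors?
```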

### Stanford-Alpaca : Format your input prompts
For instruction fine-tuning, it is quite common to have two columns inside the dataset: one for the prompt & the other for the response.

This allows people to format examples like Stanford-Alpaca did as follows:

In [None]:
# test = '''
# Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

# ### Instruction:
# {instruction}

# ### Response:
# {response}
# '''

In [21]:
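def formatting_prompts_func(examples):
    # Shared preamble, matching the Stanford-Alpaca style template above
    preamble = (
        "Below is an instruction that describes a task, paired with an input that "
        "provides further context. Write a response that appropriately completes the request."
    )
    output_texts = []

    for i in range(len(examples['instruction'])):
        instruction = examples["instruction"][i]
        # dataset_eval has no 'input' column, so fall back to an empty string
        input_text = examples["input"][i] if 'input' in examples.keys() else ""
        response = examples["output"][i]

        if len(input_text) > 1:
            # Include the optional input block
            text = (
                f"{preamble}\n\n### Instruction:\n{instruction}"
                f"\n\n### Input:\n{input_text}\n\n### Response:\n{response}"
            )
        else:
            text = f"{preamble}\n\n### Instruction:\n{instruction}\n\n### Response:\n{response}"
        # Append every example, with or without an input
        output_texts.append(text)

    return output_texts

In [22]:
# Check the formatted prompts
formatting_prompts_func(dataset_train[:2])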


['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGive three tips for staying healthy.\n\n### Response:\n1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the three primary colors?\n\n### Response:\nThe three primary colors are red, blue, and yellow.']

In [23]:
# Use the DataCollatorForCompletionOnlyLM to compute the loss on the completions only
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

response_template = "### Response:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')

## Model Training

In [24]:
from transformers import TrainingArguments

# Define training arguments
training_args = TrainingArguments(
    output_dir='./training_results',  # Specify the output directory
    save_strategy='epoch',  # Save model checkpoints every epoch
    evaluation_strategy='epoch',  # Evaluate every epoch
    gradient_checkpointing=True,  # Enable gradient checkpointing for memory efficiency
    per_device_train_batch_size=2,  # Batch size for training
    per_device_eval_batch_size=2,  # Batch size for evaluation
    num_train_epochs=3,  # Number of training epochs
)

# Define the Trainer
trainer = SFTTrainer(
    model,  # Pass the model
    args=training_args,  # Pass the training arguments
    train_dataset=dataset_train.select(range(10000)),  # Train dataset
    eval_dataset=dataset_eval,  # Evaluation dataset
    formatting_func=formatting_prompts_func,  # Custom formatting function
    data_collator=collator,  # Data collator
    max_seq_length=max_seq_length,  # Maximum sequence length
)

# Start training
trainer.train()
trainer.save_model('model/instruction_tuning')

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...

{'loss': 2.6654, 'learning_rate': 4.8333333333333334e-05, 'epoch': 0.1}
{'loss': 2.6308, 'learning_rate': 4.666666666666667e-05, 'epoch': 0.2}
{'loss': 2.5518, 'learning_rate': 4.5e-05, 'epoch': 0.3}
{'loss': 2.5176, 'learning_rate': 4.3333333333333334e-05, 'epoch': 0.4}
{'loss': 2.4691, 'learning_rate': 4.166666666666667e-05, 'epoch': 0.5}
{'loss': 2.5089, 'learning_rate': 4e-05, 'epoch': 0.6}
{'loss': 2.4718, 'learning_rate': 3.8333333333333334e-05, 'epoch': 0.7}
{'loss': 2.4342, 'learning_rate': 3.6666666666666666e-05, 'epoch': 0.8}
{'loss': 2.4213, 'learning_rate': 3.5e-05, 'epoch': 0.9}
{'loss': 2.4404, 'learning_rate': 3.3333333333333335e-05, 'epoch': 1.0}
{'eval_loss': 2.266629457473755, 'eval_runtime': 63.4956, 'eval_samples_per_second': 12.678, 'eval_steps_per_second': 6.347, 'epoch': 1.0}
{'loss': 2.1132, 'learning_rate': 3.1666666666666666e-05, 'epoch': 1.1}
{'loss': 2.1395, 'learning_rate': 3e-05, 'epoch': 1.2}
{'loss': 2.1145, 'learning_rate': 2.8333333333333335e-05, 'epoch': 1.3}
{'loss': 2.114, 'learning_rate': 2.6666666666666667e-05, 'epoch': 1.4}
{'loss': 2.1527, 'learning_rate': 2.5e-05, 'epoch': 1.5}
{'loss': 2.072, 'learning_rate': 2.3333333333333336e-05, 'epoch': 1.6}
{'loss': 2.1227, 'learning_rate': 2.1666666666666667e-05, 'epoch': 1.7}
{'loss': 2.1047, 'learning_rate': 2e-05, 'epoch': 1.8}
{'loss': 2.1137, 'learning_rate': 1.8333333333333333e-05, 'epoch': 1.9}
{'loss': 2.1251, 'learning_rate': 1.6666666666666667e-05, 'epoch': 2.0}
{'eval_loss': 2.264587879180908, 'eval_runtime': 60.7449, 'eval_samples_per_second': 13.252, 'eval_steps_per_second': 6.634, 'epoch': 2.0}
{'loss': 1.9179, 'learning_rate': 1.5e-05, 'epoch': 2.1}
{'loss': 1.9209, 'learning_rate': 1.3333333333333333e-05, 'epoch': 2.2}
{'loss': 1.9093, 'learning_rate': 1.1666666666666668e-05, 'epoch': 2.3}
{'loss': 1.8931, 'learning_rate': 1e-05, 'epoch': 2.4}
{'loss': 1.954, 'learning_rate': 8.333333333333334e-06, 'epoch': 2.5}
{'loss': 1.9601, 'learning_rate': 6.666666666666667e-06, 'epoch': 2.6}
{'loss': 1.9034, 'learning_rate': 5e-06, 'epoch': 2.7}
{'loss': 1.8551, 'learning_rate': 3.3333333333333333e-06, 'epoch': 2.8}
{'loss': 1.9018, 'learning_rate': 1.6666666666666667e-06, 'epoch': 2.9}
{'loss': 1.9002, 'learning_rate': 0.0, 'epoch': 3.0}
{'eval_loss': 2.279604911804199, 'eval_runtime': 60.2418, 'eval_samples_per_second': 13.363, 'eval_steps_per_second': 6.69, 'epoch': 3.0}
{'train_runtime': 4005.1725, 'train_samples_per_second': 7.49, 'train_steps_per_second': 3.745, 'train_loss': 2.179969942220052, 'epoch': 3.0}

TrainOutput(global_step=15000, training_loss=2.179969942220052, metrics={'train_runtime': 4005.1725, 'train_samples_per_second': 7.49, 'train_steps_per_second': 3.745, 'train_loss': 2.179969942220052, 'epoch': 3.0})
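The 15000 optimization steps and the logged learning rates can be reproduced by hand. The sketch below assumes a single device, no gradient accumulation, and the `TrainingArguments` defaults of a 5e-5 base learning rate with linear decay and no warmup; the helper names `total_steps` and `linear_lr` are made up for illustration and simplify what the Trainer actually does.

```python
import math


def total_steps(num_examples, per_device_batch_size, num_epochs, num_devices=1, grad_accum=1):
    """Approximate number of optimizer steps for a full training run."""
    effective_batch = per_device_batch_size * num_devices * grad_accum
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * num_epochs


def linear_lr(step, total, base_lr=5e-5):
    """Linear decay from base_lr to zero, with no warmup (assumed defaults)."""
    return base_lr * max(0.0, 1 - step / total)


steps = total_steps(num_examples=10000, per_device_batch_size=2, num_epochs=3)
print(steps)                   # 15000, as in the global_step above
print(linear_lr(500, steps))   # close to the 4.8333e-05 logged at epoch 0.1
print(linear_lr(steps, steps)) # 0.0 at the final step
```

With these numbers each epoch is 5000 steps, which is where the three evaluation passes in the log occur.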

## Inference

In [26]:
# Path of the fine-tuned model saved by trainer.save_model above
model_path = "model/instruction_tuning"

# Load the fine-tuned model (the tokenizer loaded earlier is reused)
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map='auto'
)

# Import the pipeline for text generation
from transformers import pipeline

# Define a text generation pipeline with the loaded model and tokenizer
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=500  # Maximum number of tokens to generate
)


In [28]:
def instruction_prompt(instruction, prompt_input=None):
    # Same preamble and section markers as the training prompts
    preamble = (
        "Below is an instruction that describes a task, paired with an input that "
        "provides further context. Write a response that appropriately completes the request."
    )
    if prompt_input:
        # Prompt with an input block
        return (
            f"{preamble}\n\n### Instruction:\n{instruction}"
            f"\n\n### Input:\n{prompt_input}\n\n### Response:\n"
        )
    # Prompt without an input block; it ends at the response marker,
    # so the model generates only the completion
    return f"{preamble}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


In [4]:
sample = dataset_eval[189]
sample

{'instruction': 'What do alpacas eat?',
 'output': 'Alpacas primarily eat grass and hay, as well as grains and supplements in captivity.'}


In [5]:
output = text_generator(instruction_prompt("Tell me about alpacas.", sample.get('input', None)))


Alpaca dataset is a collection of information and data related to alpacas, a species of domesticated South American camelids. It contains various attributes such as alpaca behavior, habitat, diet, and farming practices, providing valuable insights for researchers and enthusiasts interested in these animals
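A text-generation pipeline returns a list of dicts with a `generated_text` field, which by default contains the prompt followed by the completion. One way to keep only the model's answer is to split on the response marker. The sketch below uses a hand-written stand-in for the pipeline output, so the text in `fake_output` is illustrative and `extract_response` is a hypothetical helper.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated_text, marker=RESPONSE_MARKER):
    """Return the text after the last response marker, or the full text if it is absent."""
    _, found, tail = generated_text.rpartition(marker)
    return tail.strip() if found else generated_text.strip()


# Stand-in for the pipeline output (not real model output)
fake_output = [
    {
        "generated_text": (
            "### Instruction:\nWhat do alpacas eat?\n\n"
            "### Response:\nAlpacas primarily eat grass and hay."
        )
    }
]

print(extract_response(fake_output[0]["generated_text"]))
# Alpacas primarily eat grass and hay.
```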
