# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API for creating SFT models and training them on your own dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [1]:
!pip3 install peft==0.7.1
!pip3 install trl==0.7.4
!pip3 install transformers==4.36.2

In [2]:
import transformers
transformers.__version__


'4.36.2'

In [3]:
import trl
trl.__version__



'0.7.4'

In [4]:
import os
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Instruction-Tuning
Train on completions only
- Use the DataCollatorForCompletionOnlyLM to compute the training loss on the completions (the answers) only, not on the prompt tokens.
- Note that this works only when packing=False.
- To instantiate the collator for instruction data, pass a response template and the tokenizer.
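
To make the idea concrete, here is a minimal pure-Python sketch of completion-only label masking. It is a simplified illustration, not TRL's actual implementation: the helper names `find_subsequence` and `mask_prompt_labels` and the toy token ids are hypothetical. The only fixed convention it relies on is that a label of -100 is ignored by PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def find_subsequence(seq, sub):
    """Return the start index of the first occurrence of `sub` in `seq`, or -1."""
    for start in range(len(seq) - len(sub) + 1):
        if seq[start:start + len(sub)] == sub:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids into labels, ignoring everything up to and including the template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: the whole example contributes nothing to the loss.
        return [IGNORE_INDEX] * len(labels)
    response_start = start + len(response_template_ids)
    labels[:response_start] = [IGNORE_INDEX] * response_start
    return labels


# Toy token ids (hypothetical): 7, 8 stand in for the tokens of the response template.
input_ids = [1, 2, 3, 7, 8, 4, 5, 6]
labels = mask_prompt_labels(input_ids, [7, 8])
print(labels)  # [-100, -100, -100, -100, -100, 4, 5, 6]
```

Only the tokens after the response template keep their labels, so the model is trained to produce the answer and is never penalized for the question text.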

In [5]:
# Step 1: Load the dataset
from datasets import Dataset

# Path to your JSON file
json_file_path = "data/alpaca_data.json"

# Load the JSON file into a dataset
dataset = Dataset.from_json(json_file_path)

# Optionally, you can inspect the dataset
print(dataset)


Dataset({
    features: ['output', 'instruction', 'input'],
    num_rows: 52002
})


In [6]:
dataset[0]

{'output': '1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 'instruction': 'Give three tips for staying healthy.',
 'input': ''}

In [7]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

In [8]:
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

# Check the formatted prompts for the first two examples
formatting_prompts_func(dataset[:2])

['### Question: Give three tips for staying healthy.\n ### Answer: 1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 '### Question: What are the three primary colors?\n ### Answer: The three primary colors are red, blue, and yellow.']

In [9]:
# Use DataCollatorForCompletionOnlyLM so the loss is computed on the completions only
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')

In [10]:
# Load the evaluation dataset
from datasets import load_dataset
eval_dataset = load_dataset("tatsu-lab/alpaca_eval", split='eval')
eval_dataset


Dataset({
    features: ['instruction', 'output', 'generator', 'dataset'],
    num_rows: 805
})

In [11]:
# Step 3: Define the Trainer
trainer = SFTTrainer(
    model,
    train_dataset=dataset.select(range(1000)),
    eval_dataset=eval_dataset.select(range(50)),
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)

trainer.train() 


{'train_runtime': 113.1259, 'train_samples_per_second': 26.519, 'train_steps_per_second': 3.315, 'train_loss': 2.6020519205729165, 'epoch': 3.0}





TrainOutput(global_step=375, training_loss=2.6020519205729165, metrics={'train_runtime': 113.1259, 'train_samples_per_second': 26.519, 'train_steps_per_second': 3.315, 'train_loss': 2.6020519205729165, 'epoch': 3.0})
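
The step count in the training output can be reproduced with a little arithmetic. The sketch below assumes the trainer ran with the default `TrainingArguments` values (a per-device batch size of 8 and 3 epochs), on a single device, without gradient accumulation. The helper name `total_steps` is hypothetical.

```python
import math


def total_steps(num_samples, batch_size, num_epochs):
    """Number of optimizer steps when each step consumes one batch."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * num_epochs


# 1000 training samples, assumed default batch size 8 and 3 epochs
train_steps = total_steps(num_samples=1000, batch_size=8, num_epochs=3)
print(train_steps)  # 375
```

This matches `global_step=375` and `'epoch': 3.0` in the output above.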

## Evaluation

In [12]:
trainer.evaluate()


{'eval_loss': 2.4145238399505615,
 'eval_runtime': 5.1538,
 'eval_samples_per_second': 9.702,
 'eval_steps_per_second': 1.358,
 'epoch': 3.0}
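
The evaluation loss is easier to interpret as a perplexity, which is the exponential of the mean cross-entropy loss. The short sketch below uses the `eval_loss` reported above. It also checks the number of evaluation batches, assuming the default evaluation batch size of 8.

```python
import math

eval_loss = 2.4145238399505615  # value reported by trainer.evaluate() above

# Perplexity is exp(mean cross-entropy loss)
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")

# 50 evaluation samples with an assumed batch size of 8 give 7 batches
eval_steps = math.ceil(50 / 8)
print(eval_steps)  # 7
```

The perplexity comes out at roughly 11.2. Because the collator masks the prompt tokens, this figure reflects the answer tokens only.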

In [13]:
# Tokenize the prompt; calling the tokenizer also returns the attention mask
input_text = "Give me three ways to eat healthy"
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Generate with beam search. Sampling options (top_k, top_p, temperature)
# only take effect with do_sample=True, so they are left out here.
output = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    no_repeat_ngram_size=2,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated text:\n", generated_text)

Generated text:
 Give me three ways to eat healthy:
1. Eat fruits and vegetables
2. Cook vegetables and fruits
3. Drink water
4. Take a bite out of the fridge
5. Have a snack or snack
6. Make sure to bring your own snacks and snacks
7. Serve with a salad or salad
8. Bring to a close
9. Enjoy yourself and enjoy yourself!
10. Share your favorite foods with friends and family
11. Use social media to spread the word about healthy eating.
12. Follow the hashtag #healthyeating
13. Write a blog post to share your thoughts on the health benefits of eating healthy foods. You can also reach out to us on Facebook, Twitter, Instagram, and Instagram to let us know what you think.
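
The prompt above is raw text, while the model was fine-tuned on examples that follow the `### Question: ... ### Answer:` template. A fine-tuned model typically responds better when the inference prompt uses the same template and stops right after the response marker. The helper below is a minimal sketch of that. The name `build_prompt` is hypothetical.

```python
def build_prompt(instruction):
    """Format an instruction with the same template used during fine-tuning."""
    return f"### Question: {instruction}\n ### Answer:"


prompt = build_prompt("Give me three ways to eat healthy")
print(repr(prompt))
```

The resulting string can be passed to the tokenizer in place of the raw input text, and the model then continues from the answer marker.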


In [14]:
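# Save the whole model object with torch.save; create the target directory first
save_path = 'app/models/model.pt'
os.makedirs(os.path.dirname(save_path), exist_ok=True)
torch.save(model, save_path)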

In [15]:
import pickle

# Pickle the tokenizer; the context manager closes the file handle
save_path = 'app/models/tokenizer.pkl'
with open(save_path, 'wb') as f:
    pickle.dump(tokenizer, f)
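
Pickled objects depend on the exact library versions and class definitions present when they were saved. A more portable alternative is the `save_pretrained` method that both Hugging Face models and tokenizers provide. The sketch below wraps the two calls in one helper. The name `save_for_serving` and any directory passed to it are hypothetical.

```python
import os


def save_for_serving(model, tokenizer, output_dir):
    """Write the model weights/config and the tokenizer files into one directory."""
    os.makedirs(output_dir, exist_ok=True)
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

For example, `save_for_serving(model, tokenizer, "app/models/distilgpt2-sft")` would write both sets of files to that (hypothetical) directory. They can be reloaded later with `AutoModelForCausalLM.from_pretrained` and `AutoTokenizer.from_pretrained` pointed at the same path.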