### Objective

In this notebook, we look at the `trl` package, whose name stands for **Transformer Reinforcement Learning**. RL is a reward-based training process, and we will be looking at a very straightforward way of fine-tuning …

In [1]:
import trl

In [2]:
trl.__version__

'0.9.6'

#### Dataset

We will use a small subset of the **IMDB** dataset for this experiment: 5,000 reviews, split into 4,500 for training and 500 for evaluation.

In [3]:
# import the function that loads datasets from the Hugging Face Hub
from datasets import load_dataset

In [4]:
dataset = load_dataset("imdb", split="train")
dataset = dataset.train_test_split(test_size=0.2)['test'].train_test_split(test_size=0.1)

In [5]:
dataset.shape

{'train': (4500, 2), 'test': (500, 2)}
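
The nested `train_test_split` calls above first keep a 20% slice of the 25,000-review IMDB training split and then divide that slice 90/10. The sketch below reproduces the printed sizes with plain arithmetic. `split_sizes` is a hypothetical helper, and it assumes that a float `test_size` is rounded up while the train split takes the remainder, which is our reading of how `datasets` behaves and is consistent with the shapes printed here.

```python
import math


def split_sizes(n_rows, test_size):
    """Return (n_train, n_test) for a float test_size.

    Assumption: the test share is rounded up and the train split gets the rest.
    """
    n_test = math.ceil(test_size * n_rows)
    return n_rows - n_test, n_test


# IMDB's train split has 25,000 reviews; we keep the 20% "test" part as our subset.
_, subset = split_sizes(25_000, 0.2)
# Then split that subset 90/10 into train and test.
print(split_sizes(subset, 0.1))
```

Under the same assumption, the arithmetic also matches the 4805/1202 and 2363/591 splits used later in this notebook, if the full CodeAlpaca-20k and openassistant-guanaco training splits have 20,022 and 9,846 rows respectively.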

#### Define Training Arguments

In [6]:
# these batch sizes were chosen for a GPU with 12 GB of VRAM
# unfortunately, passing arguments this way has since been deprecated;
# the training cells below configure the same settings through SFTConfig instead

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir='/home/ubuntu/dailyResearch/trainers/output',
    push_to_hub=False,
    report_to="none",
    per_device_eval_batch_size=3,
    per_device_train_batch_size=4,
    eval_strategy='steps',
    eval_steps=200,
    save_strategy='epoch',
    num_train_epochs=1
)

In [7]:
# training dataset
dataset['train'][0]

{'text': 'There was a Bugs Bunny cartoon titled "Baby Buggy Bunny" that was EXACTLY this plot. Baby-faced Finster robbed a bank and the money in the carriage rolled away and fell into Bug\'s rabbit hole. He dressed up as a baby to get into Bugg\'s hole to retrieve the money. The scene in "Little Man" where he\'s looking in the bathroom mirror shaving with a cigar in his mouth is straight from the cartoon. This was a hilarious 5-minute cartoon; not so much an entire movie. If you are really interested in this, buy the Bugs Bunny DVD. It\'s was much more original the first time (1954). Plus you\'ll get a lot more classic Bugs Bunny cartoons to boot!',
 'label': 0}

#### Create Trainer

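Here we look at the first trainer, **SFTTrainer** (supervised fine-tuning). Strictly speaking, SFT is not reinforcement learning: there is no reward, and the model is trained with the ordinary next-token prediction loss on the provided text. In RLHF pipelines it is usually the first stage, before any reward-based training. SFTTrainer is streamlined and easy to work with, but it offers little support for customized workflows, so it trades flexibility for convenience.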
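
To make the next-token loss concrete: for a causal language model the labels are the input IDs themselves, shifted by one position, so each position is scored on how well it predicts the following token. The toy sketch below computes that loss by hand; `causal_lm_loss` and the probability tables are hypothetical stand-ins for a real model's softmax outputs, not part of `trl`.

```python
import math


def causal_lm_loss(token_ids, next_token_probs):
    """Mean negative log-likelihood of each next token in token_ids.

    next_token_probs[i] is a dict {token_id: probability} standing in for the
    model's prediction after reading token_ids[: i + 1].
    """
    # Shift by one: position i must predict token i + 1.
    targets = token_ids[1:]
    nlls = [-math.log(probs[target]) for probs, target in zip(next_token_probs, targets)]
    return sum(nlls) / len(nlls)


tokens = [5, 7, 9]
probs = [
    {7: 0.5, 9: 0.5},    # after [5]: the correct next token 7 gets probability 0.5
    {7: 0.75, 9: 0.25},  # after [5, 7]: the correct next token 9 gets probability 0.25
]
loss = causal_lm_loss(tokens, probs)
print(round(loss, 4))
```

A real training run averages this quantity over all non-masked tokens in a batch; the "Training Loss" and "Validation Loss" columns in the logs are values of this kind.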

In [8]:
# create the trainer instance
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, SFTConfig

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m",  clean_up_tokenization_spaces=True)
tokenizer.clean_up_tokenization_spaces = True

sft_config = SFTConfig(output_dir="/tmp")

In [9]:
# report_to is left unset: assigning "none" to it after construction causes an error
sft_config.output_dir="/home/ubuntu/dailyResearch/trainers/output"
sft_config.push_to_hub=False
sft_config.per_device_train_batch_size=4
sft_config.per_device_eval_batch_size=3
sft_config.eval_strategy='steps'
sft_config.eval_steps=200
sft_config.save_strategy='epoch'
sft_config.num_train_epochs=1
sft_config.dataset_text_field="text"
sft_config.max_seq_length=512
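
The cell above sets attributes after the config object has been created. That skips the validation and normalization done in `TrainingArguments.__post_init__`, which, for example, turns `report_to="none"` into an empty list; this is the likely reason assigning `"none"` afterwards fails. A sketch of the alternative, assuming the field names of trl 0.9.x's `SFTConfig` (a `TrainingArguments` subclass) and a transformers version that accepts `eval_strategy`, passes everything to the constructor:

```python
from trl import SFTConfig

# Every option goes through the constructor, so __post_init__ validates and normalizes it.
sft_config = SFTConfig(
    output_dir="/home/ubuntu/dailyResearch/trainers/output",
    push_to_hub=False,
    report_to="none",              # normalized to [] during __post_init__
    per_device_train_batch_size=4,
    per_device_eval_batch_size=3,
    eval_strategy="steps",
    eval_steps=200,
    save_strategy="epoch",
    num_train_epochs=1,
    dataset_text_field="text",     # SFT-specific fields of SFTConfig
    max_seq_length=512,
)
```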

In [10]:
trainer = SFTTrainer(
    model,
    tokenizer=tokenizer,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    args=sft_config,
)

Map:   0%|          | 0/4500 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

In [11]:
trainer.train()

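| Step | Training Loss | Validation Loss |
| --- | --- | --- |
| 200 | No log | 3.552261 |
| 400 | No log | 3.534817 |
| 600 | 3.595800 | 3.498288 |
| 800 | 3.595800 | 3.461796 |
| 1000 | 3.494500 | 3.430421 |

The training-loss column shows "No log" until the first logging step and then repeats the most recently logged value; with the default `logging_steps=500`, new values are logged at steps 500 and 1000.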


TrainOutput(global_step=1125, training_loss=3.535986843532986, metrics={'train_runtime': 511.0163, 'train_samples_per_second': 8.806, 'train_steps_per_second': 2.201, 'total_flos': 3434917236768768.0, 'train_loss': 3.535986843532986, 'epoch': 1.0})
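
Both loss columns are mean per-token cross-entropies (in nats), so a common way to read them is as perplexity, `exp(loss)`. The sketch below applies this to two validation losses from the run above. `perplexity` is a hypothetical helper, and because the Trainer's evaluation loss is an average of per-batch means, the result is approximate.

```python
import math


def perplexity(mean_cross_entropy):
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


# Validation losses logged at steps 200 and 1000 of the first run.
for step, loss in [(200, 3.552261), (1000, 3.430421)]:
    print(step, round(perplexity(loss), 1))
```

Read this way, one epoch of SFT lowered the validation perplexity on the IMDB reviews from roughly 35 to roughly 31.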

With the following configuration, training used about 13 GB of VRAM:
- sft_config.per_device_train_batch_size=4
- sft_config.per_device_eval_batch_size=3

In [12]:
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m",  clean_up_tokenization_spaces=True)
tokenizer.clean_up_tokenization_spaces = True

sft_config = SFTConfig(output_dir="/tmp")

In [9]:
# report_to is left unset: assigning "none" to it after construction causes an error
sft_config.output_dir="/home/ubuntu/dailyResearch/trainers/output"
sft_config.push_to_hub=False
sft_config.per_device_train_batch_size=6
sft_config.per_device_eval_batch_size=6
sft_config.eval_strategy='steps'
sft_config.eval_steps=200
sft_config.save_strategy='epoch'
sft_config.num_train_epochs=1
sft_config.dataset_text_field="text"
sft_config.max_seq_length=512

In [10]:
trainer = SFTTrainer(
    model,
    tokenizer=tokenizer,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    args=sft_config,
)

Map:   0%|          | 0/4500 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

In [11]:
trainer.train()

| Step | Training Loss | Validation Loss |
| --- | --- | --- |
| 200 | No log | 3.499239 |
| 400 | No log | 3.45816 |
| 600 | 3.511800 | 3.413438 |


TrainOutput(global_step=750, training_loss=3.4898204752604167, metrics={'train_runtime': 487.9496, 'train_samples_per_second': 9.222, 'train_steps_per_second': 1.537, 'total_flos': 3735532832882688.0, 'train_loss': 3.4898204752604167, 'epoch': 1.0})

With the following configuration, training used about 15.7 GB of VRAM:
- sft_config.per_device_train_batch_size=6
- sft_config.per_device_eval_batch_size=6

In [9]:
# report_to is left unset: assigning "none" to it after construction causes an error
sft_config.output_dir="/home/ubuntu/dailyResearch/trainers/output"
sft_config.push_to_hub=False
sft_config.per_device_train_batch_size=8
sft_config.per_device_eval_batch_size=8
sft_config.eval_strategy='steps'
sft_config.eval_steps=200
sft_config.save_strategy='epoch'
sft_config.num_train_epochs=1
sft_config.dataset_text_field="text"
sft_config.max_seq_length=512

In [10]:
trainer = SFTTrainer(
    model,
    tokenizer=tokenizer,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    args=sft_config,
)

Map:   0%|          | 0/4500 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

In [11]:
trainer.train()

| Step | Training Loss | Validation Loss |
| --- | --- | --- |
| 200 | No log | 3.442333 |
| 400 | No log | 3.395725 |


TrainOutput(global_step=563, training_loss=3.4840845138522702, metrics={'train_runtime': 473.1567, 'train_samples_per_second': 9.511, 'train_steps_per_second': 1.19, 'total_flos': 3907172762124288.0, 'train_loss': 3.4840845138522702, 'epoch': 1.0})

With the following configuration, training used about 19.01 GB of VRAM:
- sft_config.per_device_train_batch_size=8
- sft_config.per_device_eval_batch_size=8
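
The three runs differ mainly in batch size, and the reported `global_step` values follow directly from it: with one GPU and the default `gradient_accumulation_steps=1`, an epoch takes one optimizer step per batch, and the final, partially filled batch counts as a step too. A minimal sketch (the helper name is hypothetical):

```python
import math


def steps_per_epoch(n_examples, per_device_batch_size):
    # One GPU, no gradient accumulation: one optimizer step per batch,
    # and the last, partially filled batch is still a step.
    return math.ceil(n_examples / per_device_batch_size)


# (batch size, global_step reported by trainer.train()) on the 4,500-example IMDB subset
runs = [(4, 1125), (6, 750), (8, 563)]
for batch_size, reported in runs:
    print(batch_size, steps_per_epoch(4500, batch_size), reported)
```

Across these runs, larger batches cut the number of steps and shortened the runtime somewhat (about 511 s, 488 s and 473 s), while the reported VRAM usage grew (about 13, 15.7 and 19 GB).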


### Training for Completion Tasks

Next, we look at `DataCollatorForCompletionOnlyLM`, a data collator for completion-based language modeling. It pads and formats batches like a regular language-modeling collator, and in addition it masks the prompt so that the model learns only from the completion.
- Completion tasks: this collator is designed for scenarios where the model is trained to generate or complete text, for example, to produce the answer that follows a given prompt.

- Padding and formatting: sequences in a batch are padded to a common length, and padded positions are excluded from the loss so that padding doesn't interfere with learning from the actual content.

- Special handling of labels: for causal language modeling, the labels are a copy of the input IDs, and the model shifts them by one position internally when computing the next-token loss. On top of that, this collator sets the label of every token up to and including the response template to -100, the index that the loss ignores, so only the completion tokens are trained on.
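
The masking rule from the last point can be sketched in plain Python. This is a simplified illustration of the idea, not TRL's implementation: it works on lists instead of tensors, the helper names are hypothetical, and the token IDs are made up.

```python
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss


def template_end(input_ids, template_ids):
    """Index just past the first occurrence of template_ids, or -1 if absent."""
    n = len(template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == template_ids:
            return start + n
    return -1


def completion_only_labels(input_ids, response_template_ids, pad_token_id):
    """Labels equal the inputs, except that prompt, template and padding are masked."""
    end = template_end(input_ids, response_template_ids)
    if end == -1:
        # No response template found: mask the whole example so it adds no loss.
        return [IGNORE_INDEX] * len(input_ids)
    labels = []
    for i, tok in enumerate(input_ids):
        if i < end or tok == pad_token_id:
            labels.append(IGNORE_INDEX)
        else:
            labels.append(tok)
    return labels


# Hypothetical IDs: 11, 12 = prompt, 50, 51 = response template, 21, 22 = answer, 1 = padding.
print(completion_only_labels([11, 12, 50, 51, 21, 22, 1, 1], [50, 51], pad_token_id=1))
```

Because the search happens on token IDs, the response template has to tokenize the same way inside the formatted text as it does on its own; in this notebook the template `" ### Answer:"` includes the leading space that precedes it in the formatted prompt.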


#### Dataset for Completion Task

In [1]:
from datasets import load_dataset

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")
dataset = dataset.train_test_split(test_size=0.3)['test'].train_test_split(test_size=0.2)

In [2]:
dataset['train'][0]

{'instruction': 'Design and implement a function that takes two lists of integers as parameters and returns the minimum difference between two elements of the two lists. Write corresponding code in Python.',
 'input': 'list1 = [2, 4, 7, 13] \nlist2 = [3, 8, 9, 14]',
 'output': "def min_difference(list1, list2):\n    # Initialize the minimum difference to a large number\n    min_diff = float('inf')\n\n    for a in list1:\n        for b in list2:\n            # Update min_diff only if the difference \n            # between a and b is smaller than min_diff \n            min_diff = min(min_diff, abs(a - b))\n    \n    return min_diff"}

#### Import the model and tokenizer

In [3]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer
)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m", clean_up_tokenization_spaces=True)
tokenizer.clean_up_tokenization_spaces = True

#### Create configuration

In [4]:
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM, SFTConfig

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
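
As we understand trl 0.9.x, `SFTTrainer` calls `formatting_func` on a batch, i.e. a dict mapping each column to a list of values, which is why the function loops over indices and returns a list of strings. The sketch below runs the same function on a hypothetical two-row batch and checks that every prompt contains the response template the collator searches for. Note that the function ignores the `input` column (for example `list1 = ...` in the sample above); `formatting_with_input` is a hypothetical variant that appends it, and the results in this notebook were produced without it.

```python
RESPONSE_TEMPLATE = " ### Answer:"


def formatting_prompts_func(example):
    # Same logic as the notebook cell: `example` is a batch (dict of column lists).
    output_texts = []
    for i in range(len(example["instruction"])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts


def formatting_with_input(example):
    # Hypothetical variant: append the optional `input` column to the question when present.
    texts = []
    for instruction, inp, output in zip(example["instruction"], example["input"], example["output"]):
        question = f"{instruction}\n{inp}" if inp else instruction
        texts.append(f"### Question: {question}\n ### Answer: {output}")
    return texts


# A hypothetical two-row batch with the same columns as CodeAlpaca.
batch = {
    "instruction": ["Return the smaller of two numbers.", "Sum a list."],
    "input": ["a = 3, b = 5", ""],
    "output": ["min(a, b)", "sum(xs)"],
}

for text in formatting_prompts_func(batch):
    assert RESPONSE_TEMPLATE in text  # the collator must be able to find this marker
print(formatting_with_input(batch)[0])
```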

In [5]:
sft_config = SFTConfig(output_dir="/tmp")

sft_config.output_dir="/home/ubuntu/dailyResearch/trainers/output"
sft_config.push_to_hub=False
sft_config.per_device_train_batch_size=2
sft_config.per_device_eval_batch_size=2
sft_config.eval_strategy='steps'
sft_config.eval_steps=200
sft_config.save_strategy='steps'
sft_config.save_steps=200
sft_config.num_train_epochs=1
sft_config.max_seq_length=1024

In [6]:
trainer = SFTTrainer(
    model,
    tokenizer=tokenizer,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    formatting_func=formatting_prompts_func,
    data_collator=collator,
    args=sft_config,
)

Map:   0%|          | 0/4805 [00:00<?, ? examples/s]

Map:   0%|          | 0/1202 [00:00<?, ? examples/s]

In [7]:
trainer.train()

| Step | Training Loss | Validation Loss |
| --- | --- | --- |
| 200 | No log | 2.040772 |
| 400 | No log | 1.867631 |
| 600 | 2.204000 | 1.809298 |
| 800 | 2.204000 | 1.706257 |
| 1000 | 1.770300 | 1.629065 |
| 1200 | 1.770300 | 1.563001 |
| 1400 | 1.770300 | 1.488702 |
| 1600 | 1.637200 | 1.428524 |
| 1800 | 1.637200 | 1.38855 |
| 2000 | 1.469000 | 1.3442 |


TrainOutput(global_step=2403, training_loss=1.7124031819356664, metrics={'train_runtime': 644.2829, 'train_samples_per_second': 7.458, 'train_steps_per_second': 3.73, 'total_flos': 1306487466491904.0, 'train_loss': 1.7124031819356664, 'epoch': 1.0})

With the following configuration, we saw about 16.7 GB of VRAM usage:
- sft_config.per_device_train_batch_size=2
- sft_config.per_device_eval_batch_size=2

In [5]:
sft_config = SFTConfig(output_dir="/tmp")

sft_config.output_dir="/home/ubuntu/dailyResearch/trainers/output"
sft_config.push_to_hub=False
sft_config.per_device_train_batch_size=3
sft_config.per_device_eval_batch_size=3
sft_config.eval_strategy='steps'
sft_config.eval_steps=200
sft_config.save_strategy='steps'
sft_config.save_steps=200
sft_config.num_train_epochs=1
sft_config.max_seq_length=1024

In [6]:
trainer = SFTTrainer(
    model,
    tokenizer=tokenizer,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    formatting_func=formatting_prompts_func,
    data_collator=collator,
    args=sft_config,
)

Map:   0%|          | 0/4805 [00:00<?, ? examples/s]

Map:   0%|          | 0/1202 [00:00<?, ? examples/s]

In [7]:
trainer.train()

| Step | Training Loss | Validation Loss |
| --- | --- | --- |
| 200 | No log | 1.887257 |
| 400 | No log | 1.734314 |
| 600 | 1.927300 | 1.622077 |
| 800 | 1.927300 | 1.506015 |
| 1000 | 1.641000 | 1.428209 |
| 1200 | 1.641000 | 1.366779 |
| 1400 | 1.641000 | 1.312488 |
| 1600 | 1.424700 | 1.279824 |


TrainOutput(global_step=1602, training_loss=1.648401543739881, metrics={'train_runtime': 653.0799, 'train_samples_per_second': 7.357, 'train_steps_per_second': 2.453, 'total_flos': 1533678303903744.0, 'train_loss': 1.648401543739881, 'epoch': 1.0})

With the following configuration, we saw about 22.3 GB of VRAM usage:
- sft_config.per_device_train_batch_size=3
- sft_config.per_device_eval_batch_size=3

#### Special type of Completion task

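Here we use a conversational dataset, in which a single example can contain several `### Human:` / `### Assistant:` turns. For such datasets, `DataCollatorForCompletionOnlyLM` accepts both an `instruction_template` (marking where each human turn starts) and a `response_template` (marking where each assistant turn starts), and only the assistant responses contribute to the loss.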
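
A simplified sketch of the multi-turn rule in plain Python, assuming made-up token IDs and hypothetical helper names (TRL's collator applies the same idea to tensors): tokens after each response template are kept until the next instruction template, and everything else, including the templates themselves, is masked.

```python
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss


def find_all(seq, pattern):
    """Start indices of every occurrence of pattern in seq."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]


def assistant_only_labels(input_ids, instruction_ids, response_ids):
    """Keep labels only for assistant turns; mask human turns and both templates."""
    labels = [IGNORE_INDEX] * len(input_ids)
    instruction_starts = find_all(input_ids, instruction_ids)
    for response_start in find_all(input_ids, response_ids):
        start = response_start + len(response_ids)
        # The assistant turn runs until the next human turn, or to the end of the sequence.
        stop = next((j for j in instruction_starts if j >= start), len(input_ids))
        for k in range(start, stop):
            labels[k] = input_ids[k]
    return labels


# Hypothetical IDs: [90, 91] = "### Human:", [92, 93] = "### Assistant:".
ids = [90, 91, 1, 2, 92, 93, 3, 4, 90, 91, 5, 92, 93, 6]
print(assistant_only_labels(ids, [90, 91], [92, 93]))
```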

#### Dataset for this task

In [1]:
from datasets import load_dataset

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
dataset = dataset.train_test_split(test_size=0.3)['test'].train_test_split(test_size=0.2)

Repo card metadata block was not found. Setting CardData to empty.


In [2]:
dataset['train'][0]

{'text': '### Human: Explain the "prosecutor\'s fallacy" like I am five.### Assistant: Imagine you have a bag of 100 marbles. You know that 90 of the marbles are blue and 10 are red.\n\nYou pick a marble out of the bag without looking, and it\'s blue.\n\nThe "prosecutor\'s fallacy" is when someone argues that the probability you picked a blue marble is 90% or very high. But that\'s wrong.\n\nOnce you\'ve picked a marble, either it was blue (which it was) or red. The probabilities reset to either 100% or 0%.\n\nThe correct way to think about it is that before picking, there was a 90% chance of picking blue. But after picking, there is a 100% chance that you picked the marble you picked.\n\nThe "prosecutor\'s fallacy" gets its name because prosecutors sometimes commit this fallacy, arguing that the probability of someone\'s guilt is very high based on evidence, without taking into account that either they are guilty or not guilty, 100% or 0%, once the crime has occurred.### Human: Why wo

#### Configure model and tokenizer

In [3]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer
)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m", clean_up_tokenization_spaces=True)
tokenizer.clean_up_tokenization_spaces = True

#### Configure Trainer

In [4]:
from trl import SFTConfig, SFTTrainer, DataCollatorForCompletionOnlyLM

instruction_template = "### Human:"
response_template = "### Assistant:"
collator = DataCollatorForCompletionOnlyLM(instruction_template=instruction_template, response_template=response_template, tokenizer=tokenizer, mlm=False)

In [5]:
sft_config = SFTConfig(output_dir="/tmp")

sft_config.output_dir="/home/ubuntu/dailyResearch/trainers/output"
sft_config.push_to_hub=False
sft_config.per_device_train_batch_size=1
sft_config.per_device_eval_batch_size=1
sft_config.eval_strategy='steps'
sft_config.eval_steps=200
sft_config.save_strategy='steps'
sft_config.save_steps=200
sft_config.num_train_epochs=1
sft_config.dataset_text_field='text'
sft_config.max_seq_length=1024

In [6]:
trainer = SFTTrainer(
    model,
    tokenizer=tokenizer,
    args=sft_config,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    data_collator=collator,
)

Map:   0%|          | 0/2363 [00:00<?, ? examples/s]

Map:   0%|          | 0/591 [00:00<?, ? examples/s]

In [None]:
trainer.train()

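| Step | Training Loss | Validation Loss |
| --- | --- | --- |
| 200 | No log | |
| 400 | No log | |

The cell had not finished running (`In [None]`) when the notebook was saved, so this log is incomplete.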



