<a href="https://colab.research.google.com/github/anshradh/trl_custom/blob/test/04_writing_prompt_supervised_baseline_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Writing Prompt Response Supervised Learning Baseline
We fine-tune a language model to respond to Reddit writing prompts using standard supervised learning. In RL terms, this is behavioral cloning, a form of imitation learning.
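
As a sketch of the objective, assuming the standard causal language modeling loss (which, to our understanding, is what Hugging Face causal LM heads compute when `labels` are supplied), supervised fine-tuning minimizes the average negative log-likelihood of each token given the tokens before it. The symbols $T$ (block length), $x_t$ (token at position $t$), and $\theta$ (model parameters) are notation introduced here for illustration:

$$
\mathcal{L}(\theta) = -\frac{1}{T-1} \sum_{t=2}^{T} \log p_\theta(x_t \mid x_{<t})
$$

In the RL framing, each prefix $x_{<t}$ plays the role of a state and the next token $x_t$ is the action taken by the human demonstrator, which is why the same loss is called behavioral cloning.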

## Prerequisites

In [1]:
## Install needed libraries and log into huggingface
!pip install datasets
!pip install transformers
!pip install accelerate
!pip install huggingface_hub
!apt install git-lfs
from huggingface_hub import notebook_login
notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token
[1m[31mAuthenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store[0m


In [2]:
import torch
from torch.utils.data import DataLoader
from tqdm.auto import tqdm

from datasets import load_dataset

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM, DataCollatorWithPadding, AdamW, get_scheduler

from accelerate import Accelerator

## Load and preprocess dataset

In [3]:
## Load dataset from huggingface
prompt_response_dataset = load_dataset("rewardsignal/reddit_writing_prompts", data_files="prompt_responses_full.csv", split='train[:80%]')

Using custom data configuration rewardsignal--reddit_writing_prompts-dd5d2a64487ab606


Downloading and preparing dataset csv/rewardsignal--reddit_writing_prompts to /root/.cache/huggingface/datasets/csv/rewardsignal--reddit_writing_prompts-dd5d2a64487ab606/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519...


Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/rewardsignal--reddit_writing_prompts-dd5d2a64487ab606/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519. Subsequent calls will reuse this data.


In [4]:
## We tokenize the dataset and standardize the prompts and responses
# tokenizer_name = input()
tokenizer_name = 'distilgpt2'
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, use_fast=True)
prompt_prefix = "Writing Prompt: "
response_prefix = "Response: "

def preprocess_text_function(examples):
    # Swap the subreddit's "[WP] " tag for a plain-text prefix and label each response.
    examples["prompt"] = [prompt.replace('[WP] ', prompt_prefix) for prompt in examples["prompt"]]
    examples["response"] = [response_prefix + response for response in examples["response"]]
    # Tokenize each prompt/response pair as one sequence, truncated to the tokenizer's maximum length.
    return tokenizer(examples['prompt'], examples['response'], truncation=True)

tokenized_prompt_response_dataset = prompt_response_dataset.map(
    preprocess_text_function,
    batched=True,
    remove_columns=[
        'Unnamed: 0', 'prompt_id', 'prompt', 'prompt_score', 'prompt_created_utc',
        'response_id', 'response', 'response_score', 'response_created_utc',
        'num_responses', 'response_children', 'score_bin', 'response_rank',
    ],
    num_proc=4,
)
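
To make the string handling in the cell above concrete, here is a minimal sketch that applies the same two edits to a made-up prompt and response, leaving tokenization out. The helper name `format_pair` and the sample strings (including the `[EU]` tag) are hypothetical and only serve as illustration.

```python
# Toy illustration of the prefix rewriting; the sample strings are made up.
prompt_prefix = "Writing Prompt: "
response_prefix = "Response: "

def format_pair(prompt: str, response: str) -> tuple[str, str]:
    """Apply the same string edits as preprocess_text_function, minus tokenization."""
    return prompt.replace("[WP] ", prompt_prefix), response_prefix + response

formatted_prompt, formatted_response = format_pair(
    "[WP] A dragon opens a bakery.", "The ovens were never cold."
)
print(formatted_prompt)    # Writing Prompt: A dragon opens a bakery.
print(formatted_response)  # Response: The ovens were never cold.

# Prompts without the "[WP] " tag are left unchanged by str.replace.
untagged_prompt, _ = format_pair("[EU] A wizard retires.", "He bought a boat.")
print(untagged_prompt)     # [EU] A wizard retires.
```

One consequence worth keeping in mind: any prompt that does not contain the exact `[WP] ` tag passes through without the `Writing Prompt: ` prefix.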



In [5]:
## We group the prompts and responses together into continuous text blocks to simplify training
block_size = 512
def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # Drop the small remainder that does not fill a whole block. We could pad it instead
    # if the model supported padding; customize this part to your needs.
    total_length = (total_length // block_size) * block_size
    # Split into chunks of block_size tokens.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result
tokenized_prompt_response_dataset = tokenized_prompt_response_dataset.map(group_texts, batched=True, batch_size=1000, num_proc=4)
tokenized_prompt_response_dataset.set_format("torch")
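
The grouping step is easier to follow on a tiny input. The sketch below is a pure-Python copy of the same logic run on made-up token ids with a block size of 4 instead of 512; `group_texts_toy` and the toy batch are illustrative names and values, not part of the notebook.

```python
# Pure-Python copy of the grouping logic, run on made-up token ids with a tiny block size.
def group_texts_toy(examples: dict, block_size: int) -> dict:
    # Flatten every column into one long list of tokens.
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_length = len(concatenated["input_ids"])
    # Drop the remainder that does not fill a whole block.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # For causal LM training the labels are the inputs themselves.
    result["labels"] = result["input_ids"].copy()
    return result

toy_batch = {
    "input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]],
    "attention_mask": [[1, 1, 1], [1, 1], [1, 1, 1, 1, 1]],
}
grouped = group_texts_toy(toy_batch, block_size=4)
print(grouped["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(grouped["labels"])     # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Note that the first block mixes tokens from the first and second examples, and tokens 9 and 10 are discarded. The same effect explains why the decoded evaluation blocks later in this notebook begin mid-story and run straight into the next `Writing Prompt:`.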

        


## Prepare for training

In [6]:
## Split into training and evaluation datasets
# Shuffle once with a fixed seed, then use the first 4/5 for training and the rest for evaluation
shuffled_dataset = tokenized_prompt_response_dataset.shuffle(seed=42)
split_index = 4 * len(shuffled_dataset) // 5
supervised_train_dataset = shuffled_dataset.select(range(split_index))
supervised_eval_dataset = shuffled_dataset.select(range(split_index, len(shuffled_dataset)))

Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/csv/rewardsignal--reddit_writing_prompts-dd5d2a64487ab606/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519/cache-fd487d54ba07f17d.arrow
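
The split relies on a seeded shuffle being deterministic, so that the first four fifths and the last fifth of the shuffled order never overlap. A minimal sketch of that arithmetic follows, using `random.Random` as a stand-in for `Dataset.shuffle(seed=42)`; it will not reproduce the actual permutation the `datasets` library generates, and `split_indices` is a hypothetical helper.

```python
import random

def split_indices(n_rows: int, seed: int = 42) -> tuple[list[int], list[int]]:
    """Shuffle row indices once, then cut at the 4/5 mark (same arithmetic as the cell above)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # stand-in for Dataset.shuffle(seed=42)
    cut = 4 * n_rows // 5
    return indices[:cut], indices[cut:]

train_idx, eval_idx = split_indices(10)
print(len(train_idx), len(eval_idx))   # 8 2
print(set(train_idx) & set(eval_idx))  # set()
```

Because the dataset was loaded with `split='train[:80%]'`, this 80/20 split is taken within that 80% slice, so the final fifth of the CSV rows is not touched by either set.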


In [7]:
## We prepare for supervised fine-tuning on the dataset
# supervised_model_name = input()
supervised_model_name = 'distilgpt2'
tokenizer.pad_token = tokenizer.eos_token
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
supervised_model = AutoModelForCausalLM.from_pretrained(supervised_model_name)

train_dataloader = DataLoader(
    supervised_train_dataset, shuffle=True, batch_size=4, collate_fn=data_collator
)
eval_dataloader = DataLoader(
    supervised_eval_dataset, batch_size=4, collate_fn=data_collator
)

optimizer = AdamW(supervised_model.parameters(), lr=3e-5)
accelerator = Accelerator()
train_dataloader, eval_dataloader, supervised_model, optimizer = accelerator.prepare(train_dataloader, eval_dataloader, supervised_model, optimizer)
num_epochs = 1
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)

progress_bar = tqdm(range(num_training_steps))




  0%|          | 0/42512 [00:00<?, ?it/s]
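
For intuition about the `"linear"` schedule configured above, the sketch below reimplements the learning-rate rule as we understand it: linear warmup (none here) followed by linear decay to zero. The function `linear_lr` is a hypothetical stand-in, not the library's code, and its defaults are taken from this notebook (`lr=3e-5`, 42512 training steps, zero warmup steps).

```python
def linear_lr(step: int, base_lr: float = 3e-5, num_training_steps: int = 42512,
              num_warmup_steps: int = 0) -> float:
    """Learning rate after `step` scheduler steps under linear warmup then linear decay."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return base_lr * max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

print(linear_lr(0))      # 3e-05
print(linear_lr(21256))  # 1.5e-05
print(linear_lr(42512))  # 0.0
```

With a single epoch, the learning rate therefore falls steadily over the whole run and reaches zero on the final step.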

## Training Loop

In [8]:
## Training loop for fine-tuning
supervised_model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        outputs = supervised_model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
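
As a rough sense of scale for this loop, the arithmetic below estimates how many blocks and tokens one epoch covers. It assumes a single process (so the dataloader is not sharded across devices) and treats the result as an upper bound, since the last batch may hold fewer than 4 blocks.

```python
num_training_steps = 42512  # total shown by the progress bar above
batch_size = 4
block_size = 512

max_blocks = num_training_steps * batch_size
max_tokens = max_blocks * block_size
print(max_blocks)  # 170048
print(max_tokens)  # 87064576
```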

## Evaluate Outputs

In [9]:
## Look at 10 batch outputs from the evaluation dataset
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
supervised_model.to(device)
supervised_model.eval()  # disable dropout while generating
count = 0
for batch in eval_dataloader:
    count += 1
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        # Each evaluation block is already 512 tokens long, so max_length=512 leaves
        # essentially no room for new tokens (see the warnings in the output below).
        outputs = supervised_model.generate(**batch, max_length=512, min_length=200)
        print(tokenizer.batch_decode(outputs))
    if count == 10:
        break

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Input length of input_ids is 512, but ``max_length`` is set to 512. This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


[' on tactics and me on strategy, the two sides of the same coin. We\'d talk about the threats of global warming and the ineptitudes of our politicians, and the helpless circumstances that we surround ourselves within this world. We talked about how one day we would use the connections and skills we get from this job to assemble the perfect team: the tech support, the brawn, and of course us, the brains. \n\nThen, of course, my family died and I moved to Boston. It had been too painful. I had not kept up with any of my connections, and dove myself into work. Day in, day out. Helping the biggest companies in the world maximize their profits with my expertise and recommendations. Losing sight of what had been important to me.\n\n"Are you in?"\n\nIt\'s about time to make a change. \n\n"Yes. -S" I typed. From now on, John is Proxima and I am Sirius, and we are going to pull the biggest heists the world has ever seen.Writing Prompt: A old coworker you haven\'t heard from in years contacts y



['\n\n\'What?!\' He didn\'t know he could really feel shock until 3 days ago, yet here again he was shocked.\n\n"No... No, no, nonono! Napoleon Boneparty you can\'t leave me too!!"\n\nWhen Mary died, it was like the sun disappeared.\n\n"They were all killed, you can\'t go too! What am I supposed to do?!"\n\nNo answer came.\n\nOnly the sound of rain colliding with green leaves.\n\n"...Sir Boneparty, I\'ll place you at her right, I\'m sure she would\'ve loved that... You would too."\n\n...And now the stars had faded as well.\n\nL0435 looked at his right hand.\n\nNormally it would be shining, polished metal! Clean and pristine, so much that Mary\'s finger prints would leave a mark with even a simple touch.\n\nYet for the last 3 days, it hadn\'t been polished. A bit of dirt, and now fur. Surely her finger prints still lingered under all that.\n\nHe thought of Mary\'s father, who had collapsed right after her and her mother, mutering his last words \'Damn... them\'.\n\nWithout the sun and t
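
The warnings above point at a limitation of this evaluation: `max_length` caps the prompt and the continuation together, and each evaluation block already fills all 512 positions, so the printed text is essentially the input block rather than a model-written response. A minimal sketch of the bookkeeping follows. The helpers `new_token_budget` and `truncate_prompt` are hypothetical, and the prompt length of 64 is an arbitrary illustrative choice.

```python
def new_token_budget(input_length: int, max_length: int) -> int:
    """Tokens still available when max_length caps prompt plus continuation."""
    return max(0, max_length - input_length)

def truncate_prompt(input_ids: list[int], prompt_length: int) -> list[int]:
    """Keep only the first prompt_length tokens of a block to use as the generation prompt."""
    return input_ids[:prompt_length]

block = list(range(512))  # stand-in for one 512-token evaluation block
print(new_token_budget(len(block), max_length=512))   # 0

prompt = truncate_prompt(block, prompt_length=64)     # 64 is an arbitrary illustrative choice
print(new_token_budget(len(prompt), max_length=512))  # 448
```

In practice, one option would be to slice `input_ids` and `attention_mask` to a short prefix before calling `generate`, or to pass `max_new_tokens`, which counts only the continuation. Either change would need to be checked against the installed `transformers` version.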

In [10]:
## Upload model to huggingface hub
supervised_model.push_to_hub(tokenizer_name + "_supervised_model_final", use_temp_dir=True)

Cloning https://huggingface.co/anshr/distilgpt2_supervised_model_final into local empty directory.



To https://huggingface.co/anshr/distilgpt2_supervised_model_final
   f28a151..d6bb395  main -> main



'https://huggingface.co/anshr/distilgpt2_supervised_model_final/commit/d6bb395ebca39b1f9ae02a25778c5983f5a73089'

## Results and Discussion
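Overall, this step was straightforward: a single epoch of supervised fine-tuning of `distilgpt2` on 512-token blocks of prompts and responses, with the result pushed to the Hugging Face Hub. The resulting model should give us a reasonable supervised baseline to apply RL on top of.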