<a href="https://colab.research.google.com/github/vanderbilt-data-science/ai_summer/blob/main/4_2_ift_rlhf_solns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF)
> What happens after pre-training?

**Lesson objectives:** In this lesson, you will:
* Gain a high-level understanding of the full (publicly known) sequence of steps used to train large language models (LLMs) such as GPT-3, Bard, etc.
* Gain conceptual understanding of SFT, Instruction Fine-Tuning (IFT), and RLHF
* Understand the requirements of SFT, IFT, and RLHF, including:
  * Data
  * Human effort
  * Code
  * Compute
* Implement components of SFT, IFT, and RLHF using open-source models from Hugging Face (HF)
* Gain a high-level view of optimizations that can help with training LLMs

Please feel free to contact me (Charreau Bell) or any VU DSI Team members for questions or updates/changes to the notebook!



In [None]:
! pip install -q transformers datasets trl xformers

# Training of LLMs: Conceptual

## Recalling past conversations
[What makes a dialog agent useful?](https://huggingface.co/blog/dialog-agents)

Well, what does it take to make a dialog agent useful?
* Pre-training and pre-training data
* Human intervention to describe/annotate preferences
* Supervised Fine-Tuning and Instruction Fine-Tuning
* Reinforcement Learning from Human Feedback

## Steps of LLM Training

0. Pre-training
1. Supervised Fine-Tuning (SFT)  
  a) Instruction Fine-Tuning (IFT)  
  b) Supervised Fine-Tuning (Safety)  
2. Training Reward/Preference Model
3. Reinforcement Learning from Human Feedback (RLHF)

<center>
<img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/blog/133_trl_peft/openai-diagram.png" alt="OpenAI diagram of the SFT, reward model, and RLHF training steps" width="800">
</center>

<a href="https://arxiv.org/pdf/2203.02155.pdf">Source: Training language models to follow instructions with human feedback, OpenAI</a>


### Pre-training
<center>
<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter2/transformer_and_head.svg" alt="Pre-training" width="800">
</center>

<p>Source: <a href="https://huggingface.co/learn/nlp-course/chapter2/2?fw=pt#model-heads-making-sense-out-of-numbers">Huggingface NLP Course, Chapter 2.2: Behind the Pipeline</a></p>

Also see: [Improving Language Understanding by Generative Pre-Training, OpenAI](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf)
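Pre-training teaches the model to predict the next token. As a rough illustration of that objective (not the library's implementation: `GPT2LMHeadModel` computes an equivalent loss in PyTorch when `labels` are passed), the loss is the average negative log-probability the model assigns to each actual next token. The helper names and toy numbers below are made up for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def causal_lm_loss(logits_per_position, token_ids):
    """Average next-token cross-entropy.

    logits_per_position[t] holds one score per vocabulary entry, produced
    after reading token_ids[: t + 1]; it is scored against token_ids[t + 1].
    """
    losses = []
    for t in range(len(token_ids) - 1):
        log_probs = log_softmax(logits_per_position[t])
        losses.append(-log_probs[token_ids[t + 1]])
    return sum(losses) / len(losses)

# Toy example: a vocabulary of 4 tokens and a 3-token sequence.
vocab_size = 4
tokens = [2, 0, 3]
uniform = [[0.0] * vocab_size for _ in tokens]  # a model with no preference
print(causal_lm_loss(uniform, tokens))  # equals math.log(4), about 1.386
```

A model that guesses uniformly scores log(vocabulary size); training pushes the loss below that by making the true next tokens more probable.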

***


### Supervised and Instruction Fine-Tuning
* [Self-Instruct](https://arxiv.org/abs/2212.10560) (the method Alpaca used to generate its instruction data)  
* [Unnatural Instructions](https://huggingface.co/datasets/mrm8488/unnatural-instructions)
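In SFT/IFT the model sees a prompt followed by the desired response. The training code later in this notebook simply computes the loss on the whole formatted text; a common variant masks the prompt so that only response tokens contribute to the loss. Below is a minimal sketch of that label construction, assuming hypothetical token ids and the `-100` label value that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss skips by default

def build_sft_labels(prompt_ids, response_ids):
    """Concatenate prompt and response token ids, masking the prompt in the
    labels so the loss is computed only on the response tokens."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return {"input_ids": input_ids, "labels": labels}

# Hypothetical token ids for an instruction prompt and its reference answer.
example = build_sft_labels(prompt_ids=[101, 102, 103], response_ids=[7, 8])
print(example)
# {'input_ids': [101, 102, 103, 7, 8], 'labels': [-100, -100, -100, 7, 8]}
```

Whether masking helps depends on the data; training on the full text (as done below) also teaches the model to reproduce the prompt format.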

***


### RLHF
<center>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/rlhf/rlhf.png" alt="RLHF" width="600">
</center>

* [Huggingface: Illustrated RLHF](https://huggingface.co/blog/rlhf)
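During the RL step, the InstructGPT paper optimizes the reward model's score minus a penalty, scaled by a coefficient beta, on how far the new policy drifts from the SFT model, measured by log(pi_RL(y|x) / pi_SFT(y|x)). The sketch below computes that penalized reward for one sampled response from per-token log-probabilities. It omits the paper's optional pre-training mix term, and the function name and `beta` default are illustrative assumptions.

```python
def kl_penalized_reward(reward, policy_logprobs, ref_logprobs, beta=0.1):
    """Reward used to update the policy in KL-regularized RLHF.

    reward          -- scalar reward-model score for the whole response
    policy_logprobs -- log pi_RL(token) for each generated token
    ref_logprobs    -- log pi_SFT(token) for the same tokens (frozen SFT model)
    beta            -- strength of the KL penalty (hypothetical default)
    """
    # Sum over tokens of log(pi_RL / pi_SFT): a sample estimate of the KL divergence.
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return reward - beta * kl_estimate

# A policy that has become more confident than the SFT model pays a small penalty.
print(kl_penalized_reward(1.0, [-1.0, -0.5], [-1.2, -0.9], beta=0.1))  # about 0.94
```

The penalty keeps the policy from drifting into text that exploits the reward model but no longer resembles fluent language.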

***


## Putting it all together: StackLlama
[A demo from Huggingface](https://huggingface.co/blog/stackllama)
***


# Training of LLMs: Code

## Pre-training
We can start from an existing (off-the-shelf) pre-trained model as-is, or continue pre-training it on our own data.

**Data**:
* Your own
* Huggingface Datasets
* [RedPajama, an open-source reproduction of the LLaMA training dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T)

**Model**:
* Your own or a custom model architecture (may not work smoothly with the HF API)
* Huggingface Models

**Tooling**:
* Framework (Huggingface, PyTorch, TF)

In [None]:
#imports for loading data
from datasets import Dataset, load_dataset
import os

#imports for training
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling, TrainingArguments, Trainer

#imports for evaluation(ish)
from transformers import pipeline

In [None]:
#create/download pre-trained models
gpt2_tokenizer = AutoTokenizer.from_pretrained('gpt2')
gpt2 = AutoModelForCausalLM.from_pretrained('gpt2')

# GPT-2 has no padding token; reuse the end-of-sequence token for padding so batching works
gpt2_tokenizer.pad_token = gpt2_tokenizer.eos_token
gpt2.config.pad_token_id = gpt2.config.eos_token_id


### Load custom data

In [None]:
# download the data using command line tools
!curl -Os http://qwone.com/~jason/20Newsgroups/20news-18828.tar.gz
!tar -zxf 20news-18828.tar.gz

In [None]:
# make dataset of interest: every post in the sci.electronics newsgroup
base_path = '20news-18828/sci.electronics/'
data_files = os.listdir(base_path)
text = []

# read all texts
for textfile in data_files:
  with open(os.path.join(base_path, textfile), 'r', encoding='cp1252') as f:
    text.append(f.read())

# make dataset
hf_ds = Dataset.from_dict({'files':data_files, 'text':text})

# split data
hf_ds = hf_ds.train_test_split(train_size=0.8)
hf_ds

DatasetDict({
    train: Dataset({
        features: ['files', 'text'],
        num_rows: 784
    })
    test: Dataset({
        features: ['files', 'text'],
        num_rows: 197
    })
})

In [None]:
# check
hf_ds['train'][3]

{'files': '54163',
 'text': "From: mmoss@ic.sunysb.edu (Matthew D Moss)\nSubject: How do you build neural networks?\n\n\nSubject says it all, though I should specify that I'm looking for solutions\nthat DON'T require me purchasing specific chips, etc....\n\nIn other words, is there some sort of neural network circuit I could build\nafter a visit to a local R-Shack?\n-- \n+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\n| Matthew David Moss\t\t| Blessed are the pure in heart, for they     |\n| INTERNET: mmoss@ic.sunysb.edu\t| will see God.                               |\n| BITNET  : mmoss@sbccmail\t|                                 Matthew 5:8 |\n"}

### Prepare for training

In [None]:
# tokenize
def prepare_inputs(example):
  return gpt2_tokenizer(example["text"], truncation=True)

token_ds = hf_ds.map(prepare_inputs, batched=True, remove_columns='files')

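Note that `truncation=True` keeps only the first 1,024 tokens (GPT-2's context length) of each post and drops the rest. An alternative, used in Hugging Face's causal-LM examples, is to concatenate all token ids and cut them into fixed-size blocks so no text is wasted. The sketch below is a plain-Python version of that idea; `group_texts` and `block_size=128` are illustrative choices, not part of this notebook.

```python
from itertools import chain

def group_texts(examples, block_size=128):
    """Concatenate tokenized texts and split them into fixed-size blocks.

    examples: dict mapping column name -> list of token-id lists
    (the shape a batched `Dataset.map` call passes in).
    Trailing tokens that do not fill a whole block are dropped.
    """
    concatenated = {k: list(chain.from_iterable(examples[k])) for k in examples}
    total_length = len(concatenated["input_ids"])
    total_length = (total_length // block_size) * block_size  # whole blocks only
    return {
        k: [seq[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, seq in concatenated.items()
    }

toy = {"input_ids": [[1, 2, 3], [4, 5, 6, 7, 8]],
       "attention_mask": [[1, 1, 1], [1, 1, 1, 1, 1]]}
print(group_texts(toy, block_size=3)["input_ids"])  # [[1, 2, 3], [4, 5, 6]]
```

To use it here (untested sketch), tokenize without truncation and drop the raw columns first, e.g. `hf_ds.map(lambda ex: gpt2_tokenizer(ex["text"]), batched=True, remove_columns=["files", "text"]).map(group_texts, batched=True)`; the collator below then builds the labels.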

In [None]:
# make collator
data_collator = DataCollatorForLanguageModeling(tokenizer=gpt2_tokenizer, mlm=False)

# training arguments
training_args = TrainingArguments(output_dir = "gpt2-email", #where the model should be saved
                                  logging_strategy = "steps",
                                  evaluation_strategy="steps", #how often to evaluate performance
                                  save_strategy='steps',
                                  logging_steps = 100,
                                  eval_steps = 100, 
                                  save_steps = 100,
                                  load_best_model_at_end = True,
                                  per_device_train_batch_size = 4,
                                  per_device_eval_batch_size = 4,
                                  num_train_epochs=3,
                                  push_to_hub=False, #whether or not to push to hub
                                  hub_strategy='checkpoint',
                                  report_to='all')
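With `mlm=False`, `DataCollatorForLanguageModeling` pads each batch and sets the labels to a copy of the input ids, replacing padding positions with `-100` so they are ignored by the loss (the model shifts labels by one position internally). The plain-Python sketch below mimics that behavior to show what the Trainer receives; `causal_lm_collate` is a hypothetical stand-in, not the library class.

```python
PAD_ID = 50256        # GPT-2's EOS id, reused as the pad token earlier in this notebook
IGNORE_INDEX = -100   # label value skipped by the cross-entropy loss

def causal_lm_collate(batch, pad_id=PAD_ID):
    """Right-pad a batch of token-id lists and build causal-LM labels."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        padded = list(ids) + [pad_id] * n_pad
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; every position holding the pad id is ignored.
        labels.append([IGNORE_INDEX if t == pad_id else t for t in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

print(causal_lm_collate([[5, 6, 7], [8]]))
```

One side effect of reusing EOS as the pad token: genuine EOS tokens in the text are also masked, so the model gets no training signal for when to stop generating.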

### Train

In [None]:
# (re)load GPT-2 and set its pad token id before building the Trainer
gpt2 = AutoModelForCausalLM.from_pretrained('gpt2')
gpt2.config.pad_token_id = gpt2.config.eos_token_id

trainer = Trainer(model=gpt2,
                  args=training_args,
                  tokenizer=gpt2_tokenizer,
                  data_collator=data_collator,
                  train_dataset=token_ds['train'],
                  eval_dataset=token_ds['test']
                  )

trainer.train()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss | Validation Loss |
|-----:|--------------:|----------------:|
| 100 | 3.5145 | 3.11046 |
| 200 | 3.2019 | 2.97894 |
| 300 | 2.9968 | 2.912373 |
| 400 | 2.9391 | 2.876891 |
| 500 | 2.8537 | 2.856033 |


TrainOutput(global_step=588, training_loss=3.057577016402264, metrics={'train_runtime': 138.1735, 'train_samples_per_second': 17.022, 'train_steps_per_second': 4.256, 'total_flos': 731881022976000.0, 'train_loss': 3.057577016402264, 'epoch': 3.0})
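The losses above are mean next-token cross-entropies (in nats), which are easier to interpret as perplexity, `exp(loss)`: roughly, how many equally likely tokens the model is choosing between at each step. The snippet converts the validation losses from the table above; lower is better.

```python
import math

# Validation losses copied from the training log above.
val_losses = {100: 3.11046, 200: 2.97894, 300: 2.912373, 400: 2.876891, 500: 2.856033}

for step, loss in val_losses.items():
    print(f"step {step}: val loss {loss:.3f} -> perplexity {math.exp(loss):.1f}")
```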

## Supervised Fine-Tuning
What tools and materials do we need for SFT and IFT?

**Data**:
* Instruction, safety, or other task-specific fine-tuning data
* [Instruction fine-tuning datasets (Alpaca)](https://huggingface.co/datasets/tatsu-lab/alpaca)

**Pre-trained model**

**Tooling**:
* The standard Hugging Face `transformers` API (used below), or
* [TRL's `SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer#trl.SFTTrainer)
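Each Alpaca record has `instruction`, `input`, and `output` fields plus a `text` field that appears to contain them already joined into a single prompt-plus-response string; that is why the `prepare_inputs` function defined earlier can tokenize `text` unchanged. The sketch below shows how such a string can be built. The template wording is modeled on the Stanford Alpaca format and reproduced from memory, so check the Alpaca repository before relying on the exact text; `format_example` is a hypothetical helper.

```python
# Templates modeled on the Stanford Alpaca prompt format (wording reproduced from memory).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_example(instruction, input_text="", output=""):
    """Build one training string: the prompt followed by the target response."""
    template = PROMPT_WITH_INPUT if input_text else PROMPT_NO_INPUT
    return template.format(instruction=instruction, input=input_text) + output

print(format_example("Name three primary colors.", output="Red, blue, and yellow."))
```

At inference time the same template (with the response left empty) is usually the best way to prompt a model fine-tuned on this format.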


In [None]:
#load dataset and split for ease
ift_ds = load_dataset('tatsu-lab/alpaca')['train']
ift_ds = ift_ds.train_test_split(train_size=0.8)

#tokenize
ift_token = ift_ds.map(prepare_inputs, batched=True, remove_columns=['instruction', 'input', 'output'])



  0%|          | 0/1 [00:00<?, ?it/s]

Map:   0%|          | 0/41601 [00:00<?, ? examples/s]

Map:   0%|          | 0/10401 [00:00<?, ? examples/s]

In [None]:
# training arguments (note: `gpt2` here is the model already trained on the newsgroup data above)
training_args = TrainingArguments(output_dir = "gpt2-email-sft", #where the model should be saved
                                  logging_strategy = "steps",
                                  evaluation_strategy="steps", #how often to evaluate performance
                                  save_strategy='steps',
                                  logging_steps = 500,
                                  eval_steps = 500, 
                                  save_steps = 500,
                                  load_best_model_at_end = True,
                                  per_device_train_batch_size = 10,
                                  per_device_eval_batch_size = 10,
                                  num_train_epochs=1,
                                  push_to_hub=False, #whether or not to push to hub
                                  hub_strategy='checkpoint',
                                  report_to='all')

trainer = Trainer(model=gpt2,
                  args=training_args,
                  tokenizer=gpt2_tokenizer,
                  data_collator=data_collator,
                  train_dataset=ift_token['train'],
                  eval_dataset=ift_token['test']
                  )

trainer.train()



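| Step | Training Loss | Validation Loss |
|-----:|--------------:|----------------:|
| 500 | 1.7413 | 1.567494 |
| 1000 | 1.625 | 1.530314 |
| 1500 | 1.5943 | 1.50923 |
| 2000 | 1.5824 | 1.492622 |
| 2500 | 1.5645 | 1.483172 |
| 3000 | 1.5535 | 1.472779 |
| 3500 | 1.5403 | 1.466491 |
| 4000 | 1.5276 | 1.462877 |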


TrainOutput(global_step=4161, training_loss=1.588906726135826, metrics={'train_runtime': 1024.0112, 'train_samples_per_second': 40.626, 'train_steps_per_second': 4.063, 'total_flos': 4684870190592000.0, 'train_loss': 1.588906726135826, 'epoch': 1.0})
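The `global_step` values in both training runs follow directly from dataset size, batch size, and epochs. The helper below reproduces them, assuming a single device, no gradient accumulation, and that the last partial batch is kept (the `Trainer` default); `total_steps` is an illustrative name.

```python
import math

def total_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps for one device with no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # last partial batch still counts
    return steps_per_epoch * num_epochs

print(total_steps(784, 4, 3))     # newsgroup run: 196 steps/epoch x 3 epochs = 588
print(total_steps(41601, 10, 1))  # Alpaca run: 4161
```

This arithmetic is handy for choosing `logging_steps`, `eval_steps`, and `save_steps` before launching a run.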

## Quick side quest: evaluating and comparing models
* Is the fine-tuned model actually better in practice?

In [None]:
# Create test phrase
test_phrase = """I am writing a college paper on the history of world exploration.
Please write a 100-word introductory paragraph on this topic."""

In [None]:
# Create pipelines to try for ease of use
init_gpt = pipeline('text-generation', 'gpt2', device='cuda:0')
ft_gpt = pipeline('text-generation', 'gpt2-email/checkpoint-500', device='cuda:0')
sft_gpt = pipeline('text-generation', 'gpt2-email-sft/checkpoint-4000', device='cuda:0')

#get responses
responses = [mdl(test_phrase, max_new_tokens=150) for mdl in [init_gpt, ft_gpt, sft_gpt]]


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
for ind, model_name in enumerate(['Init gpt', 'Fine-tuned gpt', 'Instruction Fine-tuned gpt']):
  print(model_name, ':\n', responses[ind][0]['generated_text'], '\n\n*************\n')


Init gpt :
 I am writing a college paper on the history of world exploration.
Please write a 100-word introductory paragraph on this topic.
The paper you received may inspire others at the same time
(and may also, if applicable, help you understand why)
If you like it please write (at your very first thought) a letter as this is so appreciated by the university (and everyone else at your school), so that many of you will see this as a really, really important thing.
It is important to remember this is not a course, it is a conversation. If it gets in your way, you do not have the time or interest to respond fully.
No matter what your point of view is, it doesn't matter how old or old your teacher really is, because if you can't defend a position or offer support, then you cannot defend your 

*************

Fine-tuned gpt :
 I am writing a college paper on the history of world exploration.
Please write a 100-word introductory paragraph on this topic.
Please share the experience of sci-

## RLHF
What tools and materials do we need to have or need to create to implement RLHF?

**Data**:
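* Prompts paired with model responses that humans have ranked or scored; these comparisons train the reward model, which then supplies a numeric reward for each response during RL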

**Models**:
* To create or obtain: a preference model (PM) / reward model (RM)
* SFT/IFT model

**Tooling**:
* [Huggingface TRL library](https://huggingface.co/docs/trl/index)
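The reward model is typically trained on comparisons: for a given prompt, labelers prefer one response over another, and the model learns to score the preferred ("chosen") response higher. The InstructGPT paper linked above uses a pairwise loss of the form -log(sigmoid(r_chosen - r_rejected)). The sketch below computes it for a single pair of scalar scores in a numerically stable way; the function name is an illustrative assumption.

```python
import math

def pairwise_reward_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), computed stably as softplus(-diff)."""
    diff = reward_chosen - reward_rejected
    # softplus(x) = max(x, 0) + log(1 + exp(-|x|)), with x = -diff
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

print(pairwise_reward_loss(0.0, 0.0))   # log(2): the model cannot tell the responses apart
print(pairwise_reward_loss(5.0, -5.0))  # near 0: chosen response clearly scored higher
```

Only the score difference matters, so the trained reward model's outputs are relative; they are often normalized before being used as RL rewards.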

### Demo notebook for RLHF
* [Demo code](https://huggingface.co/docs/trl/sentiment_tuning)
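The TRL demo above updates the policy with PPO. At its core is a clipped surrogate objective: the probability ratio between the new and old policy is clipped to [1 - eps, 1 + eps], so a single update cannot move the policy too far. The per-token sketch below is a simplified illustration; the library additionally uses a value head, advantage estimation (GAE), batching, and other details not shown here.

```python
import math

def ppo_clipped_objective(logprob_new, logprob_old, advantage, clip_eps=0.2):
    """Per-token PPO clipped surrogate objective (to be maximized)."""
    ratio = math.exp(logprob_new - logprob_old)  # pi_new(token) / pi_old(token)
    clipped_ratio = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    # Taking the minimum makes the objective a pessimistic bound on the improvement.
    return min(ratio * advantage, clipped_ratio * advantage)

# A token whose probability doubled, with a positive advantage: the gain is capped at 1.2.
print(ppo_clipped_objective(math.log(2.0), 0.0, advantage=1.0))
```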

***


# Congratulations!!
You should now have a high-level view of the world of training LLMs and some resources to help you do so. You should be able to answer:
1. What are the common steps of training LLMs currently?
2. What happens during each of the common steps of training LLMs?
3. What tools are available to me to help me train my own model?
4. About how much data do I need for each step?
5. What human resources are needed to make this possible?

***



# Homework
You should read through:
* [Training language models to follow instructions with human feedback, OpenAI](https://arxiv.org/pdf/2203.02155.pdf), as well as the accompanying [OpenAI blog post](https://openai.com/research/instruction-following). You should look for information including:
  - Exactly how much data was used/how many examples?
  - To what extent were human resources used in the training and how?
  - In addition to the _What Makes a Dialog Agent Useful_ blog above, how is safety implemented in OpenAI models? What about other models? What kinds of safety will you need for your application?
  - What types of RLHF preference models can I use?

Read through the other resources provided, specifically:
* The OpenAI pre-training paper
* The RLHF blog post
* Either the Alpaca (Instruction Fine-Tuning) paper linked above or [The Alpaca Blog post](https://crfm.stanford.edu/2023/03/13/alpaca.html)

Especially consider:
* Which steps do I or do I not need in this process for my application?
* How should I evaluate my model?
* What kind of data will I need?