# Fine-Tuning GPT2 on Colab GPU… For Free!

This is a Colab notebook for the [associated Medium article](https://medium.com/p/340468c92ed).

## Installing Dependencies
Normally we would run `pip3 install transformers` in Bash. In Colab, shell commands have to be prefixed with `!`. Here we install the library directly from its GitHub repository.

In [1]:
!pip3 install git+https://github.com/huggingface/transformers

Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-pq6ze5jv
  Running command git clone -q https://github.com/huggingface/transformers /tmp/pip-req-build-pq6ze5jv
Collecting tokenizers==0.8.1.rc2
  Downloading https://files.pythonhosted.org/packages/80/83/8b9fccb9e48eeb575ee19179e2bdde0ee9a1904f97de5f02d19016b8804f/tokenizers-0.8.1rc2-cp36-cp36m-manylinux1_x86_64.whl (3.0MB)
Collecting sentencepiece!=0.1.92
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)

## Getting WikiText Data

You can read more about WikiText data here. There are two versions, WikiText-2 and WikiText-103. We're going to use WikiText-2 because it's smaller, and Colab limits both how long we can run on a GPU and how much data we can load into memory. To download and unzip it, run:

In [2]:
%%bash
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip

Archive:  wikitext-2-raw-v1.zip
   creating: wikitext-2-raw/
  inflating: wikitext-2-raw/wiki.test.raw  
  inflating: wikitext-2-raw/wiki.valid.raw  
  inflating: wikitext-2-raw/wiki.train.raw  


--2020-09-16 10:19:57--  https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.139.21
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.139.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4721645 (4.5M) [application/zip]
Saving to: ‘wikitext-2-raw-v1.zip’

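
WikiText marks headings with spaced equals signs, which is why the prompt used later in this notebook looks like ` = Toronto Raptors = `. The sketch below shows one way to recognize such lines. The regular expression, the helper name `parse_heading`, and the sample lines are assumptions made for illustration, not part of the dataset tooling.

```python
import re

# Matches WikiText-style headings such as " = Title = " or " = = Section = = ".
HEADING = re.compile(r"^\s*((?:= )+)(.+?)((?: =)+)\s*$")

def parse_heading(line):
    """Return (level, title) for a heading line, or None for ordinary text."""
    match = HEADING.match(line)
    if match is None:
        return None
    opening, title, closing = match.groups()
    level = opening.count("=")
    if closing.count("=") != level:
        return None  # unbalanced markers, treat the line as ordinary text
    return level, title.strip()

# Hypothetical sample lines in the same markup, not copied from the dataset.
sample = [
    " = Toronto Raptors = ",
    " = = History = = ",
    " The team plays its home games in Toronto . ",
]
for line in sample:
    print(parse_heading(line))
```

To try it on the real data, iterate over the lines of `wikitext-2-raw/wiki.train.raw` and keep the ones where `parse_heading` returns a level of 1.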

## Fine-Tuning GPT2

Hugging Face provides a script to help fine-tune models here. We can download it by running:

In [3]:
! wget https://raw.githubusercontent.com/huggingface/transformers/master/examples/language-modeling/run_language_modeling.py

--2020-09-16 10:19:58--  https://raw.githubusercontent.com/huggingface/transformers/master/examples/language-modeling/run_language_modeling.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11434 (11K) [text/plain]
Saving to: ‘run_language_modeling.py’


2020-09-16 10:19:58 (83.9 MB/s) - ‘run_language_modeling.py’ saved [11434/11434]

Now we are ready to fine-tune.

There are many parameters to the script, and you can understand them by reading the manual. I'm just going to go over the important ones for basic training.

- `output_dir` is the directory where the trained model is written
- `model_type` is the kind of model you want to use. In our case, it's gpt2
- `model_name_or_path` is the name of, or path to, the pretrained model to start from. If you want to train from scratch, you can leave this blank. In our case, it's also gpt2
- `do_train` tells the script to train
- `train_data_file` points to the training file
- `do_eval` tells the script to evaluate after training. Not always required, but good to have
- `eval_data_file` points to the evaluation file

Some extra parameters you may care about, though you can also skip them:
- `save_steps` controls how often checkpoints are saved. Checkpoints take up disk space, so if space is limited you can set this to -1 to skip saving until the end
- `per_gpu_train_batch_size` is the batch size per GPU. To be safe, start with 1 and increase it while your GPU still has free memory
- `num_train_epochs` is the number of epochs to train. Since we're fine-tuning, I'm going to set this to 2
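
The parameters above can also be assembled from Python, which is handy if you want to vary them in a loop. This is only a sketch: `build_command` is a hypothetical helper, and the values simply mirror the ones used in this notebook.

```python
import shlex

def build_command(script, options, flags=()):
    """Build an argv list: bare --flag for switches, --key=value for options."""
    argv = ["python3", script]
    argv += [f"--{flag}" for flag in flags]
    argv += [f"--{key}={value}" for key, value in options.items()]
    return argv

command = build_command(
    "run_language_modeling.py",
    options={
        "output_dir": "output",
        "model_type": "gpt2",
        "model_name_or_path": "gpt2",
        "train_data_file": "wikitext-2-raw/wiki.train.raw",
        "eval_data_file": "wikitext-2-raw/wiki.test.raw",
        "per_gpu_train_batch_size": 1,
        "save_steps": -1,
        "num_train_epochs": 2,
    },
    flags=("do_train", "do_eval"),
)

# Print the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in command))

# To launch it from Python instead of a %%bash cell (not executed here):
# import subprocess
# subprocess.run(command, check=True)
```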


In [4]:
%%bash
export TRAIN_FILE=wikitext-2-raw/wiki.train.raw
export TEST_FILE=wikitext-2-raw/wiki.test.raw
export MODEL_NAME=gpt2
export OUTPUT_DIR=output

python3 run_language_modeling.py \
    --output_dir=$OUTPUT_DIR \
    --model_type=$MODEL_NAME \
    --model_name_or_path=$MODEL_NAME \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE \
    --per_gpu_train_batch_size=1 \
    --save_steps=-1 \
    --num_train_epochs=2

{'loss': 3.30689208984375, 'learning_rate': 4.470563320626853e-05, 'epoch': 0.2117746717492588, 'total_flos': 382279090176000, 'step': 500}
{'loss': 3.1671357421875, 'learning_rate': 3.9411266412537063e-05, 'epoch': 0.4235493434985176, 'total_flos': 764558180352000, 'step': 1000}
{'loss': 3.14716064453125, 'learning_rate': 3.4116899618805594e-05, 'epoch': 0.6353240152477764, 'total_flos': 1146837270528000, 'step': 1500}
{'loss': 3.1381865234375, 'learning_rate': 2.882253282507412e-05, 'epoch': 0.8470986869970352, 'total_flos': 1529116360704000, 'step': 2000}
{'loss': 3.085134765625, 'learning_rate': 2.352816603134265e-05, 'epoch': 1.058873358746294, 'total_flos': 1911395450880000, 'step': 2500}
{'loss': 2.962466796875, 'learning_rate': 1.8233799237611182e-05, 'epoch': 1.2706480304955527, 'total_flos': 2293674541056000, 'step': 3000}
{'loss': 2.940912109375, 'learning_rate': 1.2939432443879713e-05, 'epoch': 1.4824227022448115, 'total_flos': 2675953631232000, 'step': 3500}
{'loss': 2.937

2020-09-16 10:20:03.087803: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
09/16/2020 10:20:05 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='output', overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluate_during_training=False, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, per_gpu_train_batch_size=1, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=2.0, max_steps=-1, warmup_steps=0, logging_dir='runs/Sep16_10-20-05_0a51eeb406a9', logging_first_step=False, logging_steps=500, save_steps=-1, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval
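
The logged `epoch` and `learning_rate` values above allow a quick sanity check of the run. The sketch below recovers the number of steps per epoch from one log line and then compares the logged learning rates against a linear decay from 5e-05 to zero with no warmup. That schedule is an assumption inferred from the numbers and from `warmup_steps=0` in the log, not something the notebook states.

```python
import math

# Values read from the training log above: step -> (epoch, learning_rate).
LOGGED = {
    500: (0.2117746717492588, 4.470563320626853e-05),
    1000: (0.4235493434985176, 3.9411266412537063e-05),
}
BASE_LR = 5e-05   # learning_rate in the logged TrainingArguments
NUM_EPOCHS = 2    # num_train_epochs passed to the script

# epoch = step / steps_per_epoch, so a single log line gives steps per epoch.
steps_per_epoch = round(500 / LOGGED[500][0])
total_steps = steps_per_epoch * NUM_EPOCHS

def linear_decay_lr(step, base_lr=BASE_LR, total=total_steps):
    """Assumed schedule: linear decay from base_lr to 0, no warmup."""
    return base_lr * max(0.0, (total - step) / total)

print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
for step, (_, logged_lr) in LOGGED.items():
    matches = math.isclose(linear_decay_lr(step), logged_lr, rel_tol=1e-9)
    print(step, linear_decay_lr(step), "matches log:", matches)
```

With a batch size of 1, the steps-per-epoch figure is also the number of training blocks the script built from `wiki.train.raw`.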

## Results

To use the fine-tuned model, you can run something like:

In [6]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
import numpy as np

OUTPUT_DIR = "./output"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the fine-tuned tokenizer and model from the training output directory
tokenizer = GPT2Tokenizer.from_pretrained(OUTPUT_DIR)
model = GPT2LMHeadModel.from_pretrained(OUTPUT_DIR)
model = model.to(device)

def choose_from_top(probs, n=5):
    # Sample one token id from the n most probable tokens
    ind = np.argpartition(probs, -n)[-n:]
    top_prob = probs[ind]
    top_prob = top_prob / np.sum(top_prob)  # Normalize
    choice = np.random.choice(n, 1, p=top_prob)
    token_id = ind[choice][0]
    return int(token_id)

def generate(input_str, length=250, n=5):
    cur_ids = torch.tensor(tokenizer.encode(input_str)).unsqueeze(0).long().to(device)
    model.eval()
    with torch.no_grad():
        for i in range(length):
            # Feed at most the last 1024 tokens, GPT-2's maximum context length
            outputs = model(cur_ids[:, -1024:])
            logits = outputs[0]  # No labels are passed, so the first output is the logits
            softmax_logits = torch.softmax(logits[0, -1], dim=0)
            next_token_id = choose_from_top(softmax_logits.to('cpu').numpy(), n=n)
            cur_ids = torch.cat([cur_ids, torch.ones((1, 1)).long().to(device) * next_token_id], dim=1)
    output_list = list(cur_ids.squeeze().to('cpu').numpy())
    output_text = tokenizer.decode(output_list)
    return output_text

generated_text = generate(" = Toronto Raptors = \n")
print(generated_text)

 = Toronto Raptors = 
 
 
 = = Background = = = 
 
 Toronto's first team was formed on July 1, 1993, by the Toronto Maple Leafs, who had won the 1995 – 96 season championship. The team was coached by John MacLeod, who had previously worked at the New York Rangers. 
 
 
 = John MacLeod = 
 
 John MacLeod was born on May 16, 1921, in Bury, Surrey, Canada. He was the youngest of four children. His father, John MacLeod, served in World War I as a soldier in the Canadian Pacific Railway. John was a member of the Royal Newfoundland Regiment, and the Canadian Light Horse Regiment, and a member of the Royal Canadian Mounted Police. He graduated from the Royal Military College of Canada, and was awarded a Bachelor of Letters, a Bachelor of Letters, and a Bachelor of Fine Arts, and a Master of Fine Arts degree. 
 
 = = History = = 
 
 In 1918, John attended the Royal Military College of Canada, where he studied for his bachelor of letters degree, which he earned in 1919, and then earned a Bachel

output			  runs	       wikitext-2-raw
run_language_modeling.py  sample_data  wikitext-2-raw-v1.zip
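
To see what `choose_from_top` does in isolation, here is a toy sketch using only NumPy. The 8-entry probability vector and the helper name `top_n_sample` are made up for illustration, and no model is involved. It restricts sampling to the `n` most probable entries and renormalizes them, so indices outside the top `n` are never drawn.

```python
import numpy as np

def top_n_sample(probs, n, rng):
    """Sample an index from the n largest entries of probs, renormalized."""
    top = np.argpartition(probs, -n)[-n:]        # indices of the n largest values, unordered
    weights = probs[top] / probs[top].sum()      # renormalize so the weights sum to 1
    return int(rng.choice(top, p=weights))

rng = np.random.default_rng(0)

# Hypothetical 8-token vocabulary with made-up probabilities.
probs = np.array([0.02, 0.30, 0.05, 0.25, 0.01, 0.20, 0.15, 0.02])

draws = [top_n_sample(probs, n=3, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # only indices among the three most probable can appear
```

A larger `n` gives more varied text, while `n=1` reduces to always picking the most probable token.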


## Compressing/Zipping Model

To preserve this model, we should compress it and save it somewhere. This can be done with

In [None]:
! tar -czf gpt2-tuned.tar.gz output/

which creates a file called `gpt2-tuned.tar.gz`.
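
Before copying the archive anywhere, it can be worth checking what it contains. The sketch below uses Python's standard `tarfile` module. `archive_summary` is a hypothetical helper, and the demo runs on a throwaway directory with a placeholder file rather than on the real model.

```python
import os
import tarfile
import tempfile

def archive_summary(path):
    """Return (member_names, total_uncompressed_bytes) for a tar archive."""
    with tarfile.open(path, "r:*") as tar:
        members = tar.getmembers()
    return [m.name for m in members], sum(m.size for m in members)

# Self-contained demo: build a tiny stand-in for the output/ directory.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "output")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "config.json"), "w") as f:
        f.write("{}")  # placeholder contents, not a real model config

    archive = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(model_dir, arcname="output")

    names, size = archive_summary(archive)
    print(names, size)
```

In the notebook itself you would call `archive_summary("gpt2-tuned.tar.gz")` and confirm that the files under `output/` are listed.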

## Saving it to Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Now you can copy the compressed model to your Google Drive by running

In [None]:
!cp gpt2-tuned.tar.gz /content/drive/My\ Drive/
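
In a later session, the archive can be unpacked again before loading the model. This is a minimal sketch using the standard `tarfile` module. `restore_archive` is a hypothetical helper, and the demo below round-trips a placeholder directory instead of the real archive on Drive.

```python
import os
import tarfile
import tempfile

def restore_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its top-level entries."""
    os.makedirs(dest_dir, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir, **kwargs)
    return sorted(os.listdir(dest_dir))

# Self-contained demo with a stand-in for the real model directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "output")
    os.makedirs(src)
    with open(os.path.join(src, "config.json"), "w") as f:
        f.write("{}")  # placeholder contents, not a real model config

    archive = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="output")

    restored = restore_archive(archive, os.path.join(tmp, "restored"))
    print(restored)
```

In Colab, after mounting Drive, the call would look something like `restore_archive("/content/drive/My Drive/gpt2-tuned.tar.gz", "/content")`, after which the model can be loaded from the extracted `output` directory with `from_pretrained` as in the Results section.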