# Fine-Tuning the GPT-2 Model 

This notebook explores the GPT-2 model released by OpenAI on February 14th, 2019. The blog post ['Better Language Models and Their Implications'](https://openai.com/blog/better-language-models/) discusses the project.

Citing concerns about how effectively the model could be misused (this reasoning bears questioning), OpenAI has released only a smaller version (117M parameters) of the full 1.5B-parameter model. The smaller model is listed as 124M in later releases and in the code in this notebook. See their reasoning [here](https://openai.com/blog/better-language-models/#releasestrategy). They also did not release any of the training software they used to produce the full GPT-2 model.

Data: The GPT-2 model was trained on a corpus of 8 million webpages (40GB of text) scraped from outbound Reddit links that received at least 3 karma. This is a form of human/collaborative filtering and a supposed indicator of the "quality" of the linked pages. See their data curation approach [here](https://openai.com/blog/better-language-models/#fn1).

Paper: [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)

Code: https://github.com/openai/gpt-2

## Fine-Tuning vs. Retraining

Training a 1.5B parameter model requires an enormous corpus of text (8M webpages yielding 40GB of plain text). Even with the reduced 117M-parameter GPT-2, training this architecture from scratch on a small dataset (one smaller than the number of parameters) reduces the network's ability to generalize and likely results in overfitting. Fine-tuning means taking the large pretrained network and continuing training (via backpropagation) on our new, smaller dataset. The caveat is that the new dataset needs to be similar in kind to the original training data.
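The size comparison described above can be made concrete with a rough sketch. The snippet below counts the characters in a stand-in text and divides by the parameter counts implied by the model names used in this notebook. The helper `chars_per_parameter` and the sample text are hypothetical, introduced here only for illustration. Characters are not the same unit as training tokens, so treat the ratio as an order-of-magnitude guide rather than a rule.

```python
# Approximate parameter counts, read off the model names used in this notebook.
MODEL_PARAMS = {"124M": 124_000_000, "355M": 355_000_000}


def chars_per_parameter(text: str, model_name: str) -> float:
    """Return the number of characters of training text per model parameter."""
    return len(text) / MODEL_PARAMS[model_name]


# Hypothetical stand-in for the contents of a fine-tuning file such as script.txt.
sample_text = "To be, or not to be, that is the question.\n" * 1000

for name in MODEL_PARAMS:
    ratio = chars_per_parameter(sample_text, name)
    print(f"{name}: {len(sample_text):,} characters, "
          f"{ratio:.6f} characters per parameter")
```

A ratio far below 1 means the text is much smaller than the model, which is the regime where training from scratch would overfit and fine-tuning is the practical option.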

Even if we wanted to train the GPT-2 architecture from scratch, OpenAI has not provided access to the training data or training code.

The example below wraps the GPT-2 fine-tuning process in a very simple interface, `gpt-2-simple`, installed via pip.

In [None]:
!pip install gpt-2-simple --user

In [None]:
import gpt_2_simple as gpt2

model_name = "124M"
gpt2.download_gpt2(model_name=model_name)   # model is saved into current directory under /models/124M/

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              'script.txt',
              model_name=model_name,
              steps=300)   # steps is the max number of training steps; try 1000 for a longer run

gpt2.generate(sess)

## Activities

- Explore different fine-tuning texts (of your choice) and their results.
  - Observe how the size of the text (number of characters) relates to the number of parameters of the GPT-2 model you are working with.
  - Observe how the loss changes over the training steps.
  - Look for overfitting (loss at or near 0).
- Try the other sizes of published models. How do they change the complexity/intelligibility of the generated text? gpt_2_simple wraps the "small" 124M and "medium" 355M models.
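One way to approach the overfitting check is to note down the loss values reported during fine-tuning and look at their recent average. The sketch below is a minimal illustration under stated assumptions: the loss lists are made up, the helper names are hypothetical, and the threshold of 0.1 is an arbitrary choice for "near 0", not a value taken from the library.

```python
def moving_average(values, window):
    """Average of the last `window` values (or of all values if fewer exist)."""
    recent = values[-window:]
    return sum(recent) / len(recent)


def looks_overfit(losses, window=5, threshold=0.1):
    """Flag a run whose recent average loss has fallen below `threshold`."""
    if not losses:
        return False
    return moving_average(losses, window) < threshold


# Hypothetical loss values, made up for illustration only.
healthy_run = [3.4, 3.1, 2.9, 2.8, 2.7, 2.7]
collapsed_run = [3.4, 2.0, 0.9, 0.2, 0.05, 0.04, 0.03, 0.02, 0.02]

print("healthy run flagged:", looks_overfit(healthy_run))
print("collapsed run flagged:", looks_overfit(collapsed_run))
```

A run flagged this way is worth inspecting by hand: if the generated samples reproduce the training text nearly verbatim, the model has likely memorized it.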

## Reference
- The example is from [https://github.com/minimaxir/gpt-2-simple](https://github.com/minimaxir/gpt-2-simple).