# Notebook found in this link: https://minimaxir.com/2019/09/howto-gpt2/
# How To Make Custom AI-Generated Text With GPT-2

#  Train a GPT-2 Text-Generating Model w/ GPU For Free 

by [Max Woolf](http://minimaxir.com)

*Last updated: November 10th, 2019*

Retrain an advanced text-generating neural network on any text dataset **for free on a GPU using Colaboratory** with `gpt-2-simple`!

For more about `gpt-2-simple`, you can visit [this GitHub repository](https://github.com/minimaxir/gpt-2-simple). You can also read my [blog post](https://minimaxir.com/2019/09/howto-gpt2/) for more information on how to use this notebook!


In [1]:
%tensorflow_version 1.x
!pip install -q gpt-2-simple
import gpt_2_simple as gpt2
from datetime import datetime

TensorFlow 1.x selected.
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



In [2]:
import os
import pickle
import shutil
from google.colab import drive
drive.flush_and_unmount()
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
WORK_DIR = r'/content/drive/My Drive/ThesisStoryGen/Data/StoryGen/GPT2_1/'
DATA_IN_DIR = r'/content/drive/My Drive/ThesisStoryGen/Data/StoryGen/CBT/'

In [4]:
!pwd

/content


In [5]:
!ls

cbtest_CN_test_2500ex_OUT.txt	cbt_train_OUT.txt
cbtest_CN_valid_2000ex_OUT.txt	cbt_valid_OUT.txt
cbtest_NE_test_2500ex_OUT.txt	checkpoint
cbtest_NE_valid_2000ex_OUT.txt	checkpoint_run2.tar
cbtest_P_test_2500ex_OUT.txt	drive
cbtest_P_valid_2000ex_OUT.txt	models
cbtest_V_test_2500ex_OUT.txt	sample_data
cbtest_V_valid_2000ex_OUT.txt	samples
cbt_test_OUT.txt


In [6]:
os.listdir(DATA_IN_DIR)

['cbtest_CN_test_2500ex_OUT.txt',
 'cbtest_CN_train_OUT.txt',
 'cbtest_CN_valid_2000ex_OUT.txt',
 'cbtest_NE_test_2500ex_OUT.txt',
 'cbtest_NE_train_OUT.txt',
 'cbtest_NE_valid_2000ex_OUT.txt',
 'cbtest_P_test_2500ex_OUT.txt',
 'cbtest_P_train_OUT.txt',
 'cbtest_P_valid_2000ex_OUT.txt',
 'cbtest_V_test_2500ex_OUT.txt',
 'cbtest_V_train_OUT.txt',
 'cbtest_V_valid_2000ex_OUT.txt',
 'cbt_test_OUT.txt',
 'cbt_train_OUT.txt',
 'cbt_valid_OUT.txt']

## GPU

Colaboratory uses either an Nvidia T4 GPU or an Nvidia K80 GPU. The T4 is slightly faster than the old K80 for training GPT-2 and has more memory, which lets you train the larger GPT-2 models and generate more text.

You can verify which GPU is active by running the cell below.

In [7]:
!nvidia-smi

Mon Oct 26 12:47:16 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   73C    P8    12W /  70W |      0MiB / 15079MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Downloading GPT-2 - FRESH

If you're retraining a model on new text, you need to download the GPT-2 model first. 

There are four released sizes of GPT-2:

* `124M` (default): the "small" model, 500MB on disk.
* `355M`: the "medium" model, 1.5GB on disk.
* `774M`: the "large" model. It cannot currently be finetuned with Colaboratory, but it can be used to generate text from the pretrained model (see later in the notebook).
* `1558M`: the "extra large", true model. It will not work if a K80 GPU is attached to the notebook and, like `774M`, it cannot be finetuned.

Larger models have more knowledge, but take longer to finetune and longer to generate text. You can specify which base model to use by changing `model_name` in the cells below.
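
The constraints in the list above can be written down as a small lookup, which makes it easier to catch an unworkable model/GPU combination before downloading anything. This is only a sketch: `MODEL_NOTES` and `check_model_choice` are hypothetical names, not part of `gpt-2-simple`, and the flags simply restate the notes above.

```python
# Restates the notes from the size list; names here are illustrative only.
MODEL_NOTES = {
    "124M": {"finetune_on_colab": True, "fails_on_k80": False},
    "355M": {"finetune_on_colab": True, "fails_on_k80": False},
    "774M": {"finetune_on_colab": False, "fails_on_k80": False},
    "1558M": {"finetune_on_colab": False, "fails_on_k80": True},
}


def check_model_choice(model_name, gpu_name, finetune=True):
    """Return a list of warnings for the chosen model size and GPU."""
    if model_name not in MODEL_NOTES:
        raise ValueError(f"Unknown GPT-2 size: {model_name}")
    notes = MODEL_NOTES[model_name]
    warnings = []
    if finetune and not notes["finetune_on_colab"]:
        warnings.append(f"{model_name} cannot be finetuned in Colaboratory.")
    if notes["fails_on_k80"] and "K80" in gpu_name:
        warnings.append(f"{model_name} will not work on a K80 GPU.")
    return warnings


print(check_model_choice("355M", "Tesla T4"))
print(check_model_choice("1558M", "Tesla K80"))
```

The GPU name string is assumed to be whatever `nvidia-smi` reports, for example `Tesla T4`.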

The next cell downloads it from Google Cloud Storage and saves it in the Colaboratory VM at `/models/<model_name>`.

This model isn't permanently saved in the Colaboratory VM; you'll have to redownload it if you want to retrain it at a later time.
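
Because the model has to be re-downloaded whenever the VM is recycled, it can help to check whether it is already on disk before fetching it again. The sketch below assumes the model lands in a `models/<model_name>` folder relative to the working directory and treats the presence of `model.ckpt.index` (one of the files listed in the download output) as the signal; `model_is_downloaded` is a hypothetical helper, not a `gpt-2-simple` function.

```python
import os


def model_is_downloaded(model_name, models_dir="models"):
    """True if models_dir/model_name contains a checkpoint index file."""
    model_path = os.path.join(models_dir, model_name)
    return os.path.isfile(os.path.join(model_path, "model.ckpt.index"))


model_name = "355M"
if model_is_downloaded(model_name):
    print(f"{model_name} already present, skipping download")
else:
    # In the notebook, this is where the download call would go:
    # gpt2.download_gpt2(model_name=model_name)
    print(f"{model_name} not found, download needed")
```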

In [7]:
gpt2.download_gpt2(model_name="355M")

Fetching checkpoint: 1.05Mit [00:00, 457Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 112Mit/s]                                                    
Fetching hparams.json: 1.05Mit [00:00, 561Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 1.42Git [00:08, 163Mit/s]                                  
Fetching model.ckpt.index: 1.05Mit [00:00, 368Mit/s]                                                
Fetching model.ckpt.meta: 1.05Mit [00:00, 101Mit/s]                                                 
Fetching vocab.bpe: 1.05Mit [00:00, 161Mit/s]                                                       


## Re-Load a Trained Model Checkpoint

Running the next cell will copy the `.tar` checkpoint file from your Google Drive into the Colaboratory VM.

In [None]:
gpt2.copy_checkpoint_from_gdrive(run_name='run2')

In [None]:
!ls ./checkpoint/

run1


In [None]:
!ls ./checkpoint/run2/

checkpoint				     hparams.json
counter					     model-10000.data-00000-of-00001
encoder.json				     model-10000.index
events.out.tfevents.1603544264.838e6a004e1d  model-10000.meta
events.out.tfevents.1603545553.838e6a004e1d  vocab.bpe
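
The listing above shows checkpoint files named `model-<step>.index`, so the most recent saved step can be read off the file names. A minimal sketch, assuming that naming pattern holds for every checkpoint in the run folder; `latest_checkpoint_step` is a hypothetical helper, not part of `gpt-2-simple`.

```python
import os
import re


def latest_checkpoint_step(run_dir):
    """Return the highest N among files named model-N.index, or None."""
    steps = []
    for name in os.listdir(run_dir):
        match = re.fullmatch(r"model-(\d+)\.index", name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


run_dir = os.path.join("checkpoint", "run2")
if os.path.isdir(run_dir):
    print("Latest saved step:", latest_checkpoint_step(run_dir))
```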


Delete the checkpoint `.tar` file from the VM.

In [None]:
!ls

checkpoint  checkpoint_run1.tar  drive	sample_data


In [None]:
!rm ./checkpoint_run1.tar

In [None]:
!ls

checkpoint  drive  sample_data


The next cell will allow you to load the retrained model checkpoint + metadata necessary to generate text.

**IMPORTANT NOTE:** If you want to rerun this cell, **restart the VM first** (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

In [None]:
#sess = gpt2.start_tf_sess()
#gpt2.load_gpt2(sess, run_name='run1')

Loading checkpoint checkpoint/run1/model-10000
INFO:tensorflow:Restoring parameters from checkpoint/run1/model-10000


## Setup input file

In [8]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbt_valid_OUT.txt"  ## File 1 , 1.2mb
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbt_test_OUT.txt"  ## File 2 , 1.5mb
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_V_valid_2000ex_OUT.txt"  ## File 3 , 4.3mb
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_V_test_2500ex_OUT.txt"  ## File 4 , 5.4mb
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_P_valid_2000ex_OUT.txt"  ## File 5 , 4.5mb
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_P_test_2500ex_OUT.txt"  ## File 6 , 5.7mb
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_NE_valid_2000ex_OUT.txt"  ## File 7 , 4.1mb
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_NE_test_2500ex_OUT.txt"  ## File 8 , 5.3mb
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_CN_valid_2000ex_OUT.txt"  ## File 9 , 4.4mb
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_CN_test_2500ex_OUT.txt"  ## File 10 , 5.8mb
FILE_FOR_TRAINING = DATA_IN_DIR + r"cbt_train_OUT.txt"  ## File 11 , 25.2mb

print(os.path.exists(FILE_FOR_TRAINING))

True


In [9]:
!ls

cbtest_CN_test_2500ex_OUT.txt	cbtest_V_test_2500ex_OUT.txt   drive
cbtest_CN_valid_2000ex_OUT.txt	cbtest_V_valid_2000ex_OUT.txt  models
cbtest_NE_test_2500ex_OUT.txt	cbt_test_OUT.txt	       sample_data
cbtest_NE_valid_2000ex_OUT.txt	cbt_valid_OUT.txt	       samples
cbtest_P_test_2500ex_OUT.txt	checkpoint
cbtest_P_valid_2000ex_OUT.txt	checkpoint_run2.tar


If your text file is larger than 10MB, the author recommends uploading it to Google Drive first, then copying it from Google Drive to the Colaboratory VM.
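
To decide whether a file crosses that 10MB mark, its size can be checked programmatically. This is a sketch under the assumption that 1MB means 1024 × 1024 bytes; `file_size_mb` and `should_stage_via_drive` are hypothetical helper names.

```python
import os

SIZE_THRESHOLD_MB = 10  # threshold quoted in the recommendation above


def file_size_mb(path):
    """File size in megabytes (1 MB taken as 1024 * 1024 bytes)."""
    return os.path.getsize(path) / (1024 * 1024)


def should_stage_via_drive(path, threshold_mb=SIZE_THRESHOLD_MB):
    """True if the file is larger than the threshold."""
    return file_size_mb(path) > threshold_mb
```

In this notebook it could be called as `should_stage_via_drive(FILE_FOR_TRAINING)` before the copy cell.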

In [10]:
try:
  shutil.copyfile(FILE_FOR_TRAINING, r'./' + os.path.basename(FILE_FOR_TRAINING))
  print(f"Copied {FILE_FOR_TRAINING}")
except Exception as e:
  print(f"ERROR: Message = {e}")

Copied /content/drive/My Drive/ThesisStoryGen/Data/StoryGen/CBT/cbt_train_OUT.txt


In [11]:
!ls

cbtest_CN_test_2500ex_OUT.txt	cbt_train_OUT.txt
cbtest_CN_valid_2000ex_OUT.txt	cbt_valid_OUT.txt
cbtest_NE_test_2500ex_OUT.txt	checkpoint
cbtest_NE_valid_2000ex_OUT.txt	checkpoint_run2.tar
cbtest_P_test_2500ex_OUT.txt	drive
cbtest_P_valid_2000ex_OUT.txt	models
cbtest_V_test_2500ex_OUT.txt	sample_data
cbtest_V_valid_2000ex_OUT.txt	samples
cbt_test_OUT.txt


## Finetune GPT-2 - SUBSEQUENT TRAINING RUNS

The next cell will start the actual finetuning of GPT-2. It creates a persistent TensorFlow session which stores the training config, then runs the training for the specified number of `steps`. To have the finetuning run indefinitely, set `steps = -1`.

The model checkpoints will be saved in `/checkpoint/run1` by default. The checkpoints are saved every 500 steps (can be changed) and when the cell is stopped.

The training might time out after about 4 hours; make sure you end training and save the results so you don't lose them!
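
A rough way to see how many steps fit inside that window is to measure the seconds per step from two progress lines. The sketch below parses lines of the form `[step | seconds] loss=… avg=…`, as printed in the training outputs further down in this notebook, and uses two lines taken from the first run there. It ignores time spent loading the dataset, sampling, and saving, so treat the result as an upper bound; the helper names are hypothetical.

```python
import re

LOG_LINE = re.compile(r"\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)")


def parse_progress(line):
    """Parse '[step | seconds] loss=x avg=y' into (step, seconds, loss, avg)."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    step, seconds, loss, avg = match.groups()
    return int(step), float(seconds), float(loss), float(avg)


def seconds_per_step(first_line, later_line):
    """Average seconds per training step between two progress lines."""
    step1, time1, _, _ = parse_progress(first_line)
    step2, time2, _, _ = parse_progress(later_line)
    return (time2 - time1) / (step2 - step1)


rate = seconds_per_step("[1550 | 127.61] loss=2.25 avg=2.25",
                        "[1600 | 209.89] loss=2.51 avg=2.38")
budget_hours = 4  # the approximate timeout mentioned above
max_steps = int(budget_hours * 3600 / rate)
print(f"{rate:.2f} s/step, roughly {max_steps} steps in {budget_hours} hours")
```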

**IMPORTANT NOTE:** If you want to rerun this cell, **restart the VM first** (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

Other optional-but-helpful parameters for `gpt2.finetune`:


* **`restore_from`**: Set to `fresh` to start training from the base GPT-2, or set to `latest` to resume training from an existing checkpoint.
* **`sample_every`**: Number of steps between printed example outputs.
* **`print_every`**: Number of steps between printed training progress lines.
* **`learning_rate`**: Learning rate for the training (default `1e-4`; can be lowered to `1e-5` if you have <1MB of input data).
* **`run_name`**: Subfolder within `checkpoint` in which to save the model. This is useful if you want to work with multiple models (you will also need to specify `run_name` when loading the model).
* **`overwrite`**: Set to `True` if you want to continue finetuning an existing model (w/ `restore_from='latest'`) without creating duplicate copies.
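
Since the training cells in this notebook repeat the same `gpt2.finetune` call with only the dataset changing, the shared parameters can be collected in one place. A minimal sketch: `finetune_kwargs` is a hypothetical helper, the defaults mirror the values used in the cells below, and pairing `overwrite` with `restore_from='latest'` is an assumption based on the parameter notes above.

```python
def finetune_kwargs(dataset, steps=1500, run_name="run2", fresh=False):
    """Build keyword arguments mirroring the finetune calls in this notebook."""
    return {
        "dataset": dataset,
        "model_name": "355M",
        "steps": steps,
        # 'fresh' starts from the base GPT-2; 'latest' resumes a checkpoint
        "restore_from": "fresh" if fresh else "latest",
        # only overwrite when continuing an existing run
        "overwrite": not fresh,
        "run_name": run_name,
        "print_every": 50,
        "sample_every": 1400,
        "save_every": 1500,
    }


kwargs = finetune_kwargs("cbt_test_OUT.txt")
print(kwargs)
# In the notebook, with an open session:
# gpt2.finetune(sess, **kwargs)
```

Passing the dataset explicitly also avoids relying on a `file_name` variable that was set in an earlier cell.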

In [9]:
file_name = os.path.basename(FILE_FOR_TRAINING)
file_name

'cbt_train_OUT.txt'

In [10]:
sess = gpt2.start_tf_sess()

## RUN 2 - FILE 2 - r"cbt_test_OUT.txt"  - File 2 , 1.5mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbt_test_OUT.txt"  ## File 2 , 1.5mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-1500
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-1500


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:01<00:00,  1.95s/it]


dataset has 373907 tokens
Training...
Saving checkpoint/run2/model-1500
Saving checkpoint/run2/model-1500
[1550 | 127.61] loss=2.25 avg=2.25
[1600 | 209.89] loss=2.51 avg=2.38
[1650 | 292.38] loss=2.92 avg=2.56
[1700 | 374.78] loss=2.95 avg=2.66
[1750 | 457.38] loss=2.80 avg=2.69
[1800 | 540.06] loss=2.25 avg=2.61
[1850 | 622.65] loss=1.98 avg=2.52
[1900 | 705.15] loss=1.09 avg=2.33
[1950 | 787.62] loss=3.28 avg=2.44
[2000 | 870.30] loss=1.30 avg=2.32
[2050 | 952.77] loss=1.02 avg=2.20
[2100 | 1035.39] loss=1.48 avg=2.14
[2150 | 1117.91] loss=2.30 avg=2.15
[2200 | 1200.56] loss=1.55 avg=2.10
[2250 | 1283.26] loss=2.42 avg=2.13
[2300 | 1365.81] loss=1.79 avg=2.10
[2350 | 1448.41] loss=1.10 avg=2.04
[2400 | 1530.95] loss=1.89 avg=2.03
[2450 | 1613.60] loss=1.30 avg=1.99
[2500 | 1696.25] loss=1.09 avg=1.94
[2550 | 1778.85] loss=0.88 avg=1.88
[2600 | 1861.46] loss=0.97 avg=1.84
[2650 | 1944.07] loss=0.93 avg=1.79
[2700 | 2026.60] loss=0.37 avg=1.73
[2750 | 2109.16] loss=0.41 avg=1.67
[2800

## RUN 2 - FILE 3 - r"cbtest_V_valid_2000ex_OUT.txt"  - File 3 , 4.3mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_V_valid_2000ex_OUT.txt"  ## File 3 , 4.3mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-3000
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-3000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:04<00:00,  4.62s/it]


dataset has 1081880 tokens
Training...
Saving checkpoint/run2/model-3000
Saving checkpoint/run2/model-3000
[3050 | 127.32] loss=1.31 avg=1.31
[3100 | 209.89] loss=1.30 avg=1.31
[3150 | 292.69] loss=1.01 avg=1.21
[3200 | 375.57] loss=1.11 avg=1.18
[3250 | 458.33] loss=0.61 avg=1.07
[3300 | 541.16] loss=1.01 avg=1.06
[3350 | 623.82] loss=0.35 avg=0.95
[3400 | 706.60] loss=0.21 avg=0.86
[3450 | 789.32] loss=0.25 avg=0.79
[3500 | 872.14] loss=0.87 avg=0.80
[3550 | 954.86] loss=0.50 avg=0.77
[3600 | 1037.74] loss=0.33 avg=0.73
[3650 | 1120.51] loss=0.51 avg=0.71
[3700 | 1203.39] loss=0.68 avg=0.71
[3750 | 1286.19] loss=0.52 avg=0.70
[3800 | 1368.84] loss=0.74 avg=0.70
[3850 | 1451.72] loss=0.47 avg=0.68
[3900 | 1534.41] loss=0.19 avg=0.65
[3950 | 1617.22] loss=0.22 avg=0.63
[4000 | 1699.94] loss=0.29 avg=0.61
[4050 | 1782.77] loss=0.25 avg=0.59
[4100 | 1865.57] loss=0.26 avg=0.57
[4150 | 1948.19] loss=0.39 avg=0.57
[4200 | 2030.92] loss=0.27 avg=0.55
 I said ? ''
The King started for the st

## RUN 2 - FILE 4 - r"cbtest_V_test_2500ex_OUT.txt"  - File 4 , 5.4mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_V_test_2500ex_OUT.txt"  ## File 4 , 5.4mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-4500
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-4500


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:06<00:00,  6.25s/it]


dataset has 1376928 tokens
Training...
Saving checkpoint/run2/model-4500
Saving checkpoint/run2/model-4500
[4550 | 127.29] loss=1.54 avg=1.54
[4600 | 209.78] loss=1.17 avg=1.36
[4650 | 292.43] loss=0.44 avg=1.05
[4700 | 375.02] loss=0.86 avg=1.00
[4750 | 457.78] loss=0.50 avg=0.90
[4800 | 540.48] loss=0.53 avg=0.83
[4850 | 623.12] loss=0.71 avg=0.82
[4900 | 705.86] loss=0.90 avg=0.83
[4950 | 788.48] loss=0.52 avg=0.79
[5000 | 871.29] loss=0.76 avg=0.79
[5050 | 954.00] loss=0.56 avg=0.77
[5100 | 1036.65] loss=0.40 avg=0.73
[5150 | 1119.29] loss=0.62 avg=0.72
[5200 | 1201.91] loss=0.49 avg=0.71
[5250 | 1284.67] loss=0.39 avg=0.68
[5300 | 1367.36] loss=0.83 avg=0.69
[5350 | 1450.12] loss=0.33 avg=0.67
[5400 | 1532.82] loss=0.34 avg=0.65
[5450 | 1615.49] loss=0.63 avg=0.65
[5500 | 1698.16] loss=0.24 avg=0.63
[5550 | 1780.87] loss=0.51 avg=0.62
[5600 | 1863.60] loss=0.73 avg=0.63
 had the most difficult matter quite easily managed.
` Well, I hope that, ' said he, ` we have all got so far th

## RUN 2 - FILE 5 - r"cbtest_P_valid_2000ex_OUT.txt"  - File 5 , 4.5mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_P_valid_2000ex_OUT.txt"  ## File 5 , 4.5mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-6000
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-6000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:05<00:00,  5.13s/it]


dataset has 1121806 tokens
Training...
Saving checkpoint/run2/model-6000
Saving checkpoint/run2/model-6000
[6050 | 124.86] loss=0.55 avg=0.55
[6100 | 207.37] loss=0.45 avg=0.50
[6150 | 290.05] loss=0.78 avg=0.59
[6200 | 372.61] loss=0.37 avg=0.54
[6250 | 455.21] loss=0.18 avg=0.47
[6300 | 537.83] loss=0.30 avg=0.44
[6350 | 620.57] loss=0.70 avg=0.48
[6400 | 703.10] loss=0.33 avg=0.46
[6450 | 785.61] loss=0.14 avg=0.42
[6500 | 868.22] loss=0.24 avg=0.40
[6550 | 950.90] loss=0.35 avg=0.40
[6600 | 1033.38] loss=0.15 avg=0.38
[6650 | 1115.90] loss=0.13 avg=0.35
[6700 | 1198.61] loss=0.24 avg=0.35
[6750 | 1280.99] loss=0.13 avg=0.33
[6800 | 1363.46] loss=0.11 avg=0.32
[6850 | 1446.10] loss=0.19 avg=0.31
[6900 | 1528.75] loss=0.18 avg=0.30
[6950 | 1611.48] loss=0.15 avg=0.29
[7000 | 1694.13] loss=0.17 avg=0.28
 swinto, it must be in the care of some fairy, who will reward us with the most precious things.
I am very thankful to the lady who took me in, for she knows me quite well, and has not

## RUN 2 - FILE 6 - r"cbtest_P_test_2500ex_OUT.txt"  - File 6 , 5.7mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_P_test_2500ex_OUT.txt"  ## File 6 , 5.7mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-7500
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-7500


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:06<00:00,  6.31s/it]


dataset has 1424360 tokens
Training...
Saving checkpoint/run2/model-7500
Saving checkpoint/run2/model-7500
[7550 | 125.45] loss=0.61 avg=0.61
[7600 | 208.00] loss=0.29 avg=0.45
[7650 | 290.59] loss=0.37 avg=0.42
[7700 | 373.17] loss=0.32 avg=0.39
[7750 | 455.87] loss=0.15 avg=0.34
[7800 | 538.45] loss=0.48 avg=0.37
[7850 | 621.10] loss=0.22 avg=0.35
[7900 | 703.55] loss=0.25 avg=0.33
[7950 | 786.26] loss=0.12 avg=0.31
[8000 | 868.89] loss=0.31 avg=0.31
[8050 | 951.61] loss=0.43 avg=0.32
[8100 | 1034.15] loss=0.24 avg=0.31
[8150 | 1116.71] loss=0.33 avg=0.32
[8200 | 1199.38] loss=0.20 avg=0.31
[8250 | 1281.98] loss=0.15 avg=0.30
[8300 | 1364.68] loss=0.85 avg=0.33
[8350 | 1447.31] loss=0.35 avg=0.33
[8400 | 1530.04] loss=0.30 avg=0.33
 own that great and terrible question.
`` Why, '' she said, a little flustered, `` I have never heard it mentioned in a romance before . ''
`` If anyone ever wrote a romance, '' said Uncle Abimelech, `` it must be me.
Why, it is almost a mystery and a roma

## RUN 2 - FILE 7 - r"cbtest_NE_valid_2000ex_OUT.txt"  - File 7 , 4.1mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_NE_valid_2000ex_OUT.txt"  ## File 7 , 4.1mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-9000
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-9000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:04<00:00,  4.77s/it]


dataset has 1050855 tokens
Training...
Saving checkpoint/run2/model-9000
Saving checkpoint/run2/model-9000
[9050 | 127.27] loss=0.43 avg=0.43
[9100 | 209.89] loss=0.31 avg=0.37
[9150 | 292.65] loss=0.30 avg=0.35
[9200 | 375.49] loss=0.39 avg=0.36
[9250 | 458.31] loss=0.26 avg=0.34
[9300 | 541.13] loss=0.27 avg=0.33
[9350 | 623.93] loss=0.43 avg=0.34
[9400 | 706.74] loss=0.14 avg=0.32
[9450 | 789.56] loss=0.27 avg=0.31
[9500 | 872.31] loss=0.31 avg=0.31
[9550 | 955.02] loss=0.21 avg=0.30
[9600 | 1037.70] loss=0.26 avg=0.30
[9650 | 1120.43] loss=0.22 avg=0.29
[9700 | 1203.18] loss=0.19 avg=0.28
[9750 | 1285.85] loss=0.26 avg=0.28
[9800 | 1368.56] loss=0.10 avg=0.27
 of the old women he had so snubbed.
`` And did you take my bones in exchange for your black child ? ''
`` I did, '' said XXXXX .	Scarlet		Creamers|Dee|Ellis|Frewocks|Marr|Scarlet|XII|morning|right
But when they arrived at the Horn, the dance was already underway, and before the prince had lost sight of his prize, the snake wa

## RUN 2 - FILE 8 - r"cbtest_NE_test_2500ex_OUT.txt"  - File 8 , 5.3mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_NE_test_2500ex_OUT.txt"  ## File 8 , 5.3mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-10500
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-10500


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:06<00:00,  6.31s/it]


dataset has 1360345 tokens
Training...
Saving checkpoint/run2/model-10500
Saving checkpoint/run2/model-10500
[10550 | 125.17] loss=0.20 avg=0.20
[10600 | 207.73] loss=0.42 avg=0.31
[10650 | 290.35] loss=0.70 avg=0.44
[10700 | 373.15] loss=0.25 avg=0.40
[10750 | 456.00] loss=0.33 avg=0.38
[10800 | 538.80] loss=0.46 avg=0.40
[10850 | 621.63] loss=0.24 avg=0.37
[10900 | 704.32] loss=0.30 avg=0.36
[10950 | 787.09] loss=0.27 avg=0.35
[11000 | 869.87] loss=0.14 avg=0.33
[11050 | 952.70] loss=0.56 avg=0.35
[11100 | 1035.45] loss=0.33 avg=0.35
[11150 | 1118.24] loss=0.28 avg=0.35
[11200 | 1200.97] loss=0.17 avg=0.33
 boy-girl, as they say of the Virgin Mary . ''
`` That is hardly fair to put Dee here, '' said Peter.
`` Ef the Chipmunk says those women have no business to meddle in church matters at all . ''
`` Who is Ef the Chipmunk ? ''
demanded Old Mr. Toad, as if he had been walking uncertainly along a street.
`` Who is Ef the XXXXX ?	Chipmunk		Akela|Peter|Toad|XVI|Toads|house|man-woman|str

## RUN 2 - FILE 9 - r"cbtest_CN_valid_2000ex_OUT.txt"  - File 9 , 4.4mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_CN_valid_2000ex_OUT.txt"  ## File 9 , 4.4mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-12000
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-12000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:04<00:00,  4.73s/it]


dataset has 1114512 tokens
Training...
Saving checkpoint/run2/model-12000
Saving checkpoint/run2/model-12000
[12050 | 125.30] loss=0.17 avg=0.17
[12100 | 208.11] loss=0.29 avg=0.23
[12150 | 290.92] loss=0.44 avg=0.30
[12200 | 373.60] loss=0.27 avg=0.29
[12250 | 456.18] loss=0.16 avg=0.27
[12300 | 538.88] loss=0.30 avg=0.27
[12350 | 621.73] loss=0.19 avg=0.26
[12400 | 704.38] loss=0.36 avg=0.27
[12450 | 787.09] loss=0.22 avg=0.27
[12500 | 869.78] loss=0.26 avg=0.27
[12550 | 952.46] loss=0.16 avg=0.26
[12600 | 1035.06] loss=0.28 avg=0.26
 seven bells ringing, `` Knock, knock!
Will you come with me?
You may sleep in a fir-tree out back, and I will take the sofy.
Good-bye, and do not forget me . ''
As the last words were sung, a tiny voice inside whispered, `` Perhaps he has some sort of XXXXX in his head . ''	thing		Good-bye|adventures|bottom|doorway|heart|king|nose|something|thought
One of the girls told him that it was the lamb which her uncle had caught, but when he came to look at the

## RUN 2 - FILE 10 - r"cbtest_CN_test_2500ex_OUT.txt"  - File 10 , 5.8mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbtest_CN_test_2500ex_OUT.txt"  ## File 10 , 5.8mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-13500
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-13500


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:06<00:00,  6.33s/it]


dataset has 1438608 tokens
Training...
Saving checkpoint/run2/model-13500
Saving checkpoint/run2/model-13500
[13550 | 124.47] loss=0.33 avg=0.33
[13600 | 206.99] loss=0.47 avg=0.40
[13650 | 289.62] loss=0.26 avg=0.35
[13700 | 372.21] loss=0.29 avg=0.34
[13750 | 454.83] loss=0.18 avg=0.30
[13800 | 537.40] loss=0.21 avg=0.29
[13850 | 620.12] loss=0.10 avg=0.26
[13900 | 702.56] loss=0.51 avg=0.29
[13950 | 784.92] loss=0.25 avg=0.29
[14000 | 867.52] loss=0.20 avg=0.28
 themselves was a great deal larger than mine.
The Moon kept growing larger and larger, till at last it fell out of the sky, and thereupon we both arose and walked out of the cave into the open air.
` The Sun was hot, and I could not keep it in the room, so I rose and walked out, and after walking some time I found my way to a fountain.
I drank, and felt very strange, and before long I heard the Sun asking for my hand again.
I drank another drop, and then I felt that my body was sound and my limbs strong.
I returned, lowered 

## RUN 2 - FILE 11 - first time - r"cbt_train_OUT.txt"  - File 11 , 25.2mb

In [14]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbt_train_OUT.txt"  ## File 11 , 25.2mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1000,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1000,
              save_every=500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-15000
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-15000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:27<00:00, 27.90s/it]


dataset has 6257758 tokens
Training...
Saving checkpoint/run2/model-15000
Saving checkpoint/run2/model-15000
ves, which I shall call Aunt Esther, '' he said.
Cousin Esther smiled broadly.
`` New brooms come in here, '' she said.
Rikki-tikki jumped as high up in the air as he could go, his head just out of the window, his eyes nearly to his ears.
Aunt Esther put her long hair up in bboy, and it was so luscious that people thought it was waxing and blowing.
There was a fresh breeze through the green pines and a rosy rose was visible around every bend of the little schooner.
It was ten o'clock when Rikki-tikki went down to the village after business, and they waited for him half an night and a half an XXXXX for the two tattered husks in the wood to settle .	day		Aunt|boys|corn|dress|family|girl|head|idea|manner|people
Then the Hunter did as the old woman had told him : he cut open the bird, found its heart, swallowed it, and took the cloak home with him.
The next morning when he awoke he 

## RUN 2 - FILE 11 - second time - r"cbt_train_OUT.txt"  - File 11 , 25.2mb

In [11]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbt_train_OUT.txt"  ## File 11 , 25.2mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1000,
              restore_from='latest', # 'fresh' starts from the base GPT-2; 'latest' resumes training from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1000,
              save_every=500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-16000
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-16000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:27<00:00, 27.25s/it]


dataset has 6257758 tokens
Training...
Saving checkpoint/run2/model-16000
Saving checkpoint/run2/model-16000
. He had an iron pan, and in it hung a basket full of gold and silver.
` So it shall be .'
The next morning the prince set his foot on board a ship, and was soon as good as well sailing on the water ; but he never thought of land till his return, when, taking advantage of his journey in the meantime, he happened to pick up the golden ball, and bring it back to the country of the great king his father.
But he found that he was a long way off, and had a great time to get his wife and a good wife for all, so as to prevent her being separated from him ; and she told him of her sorrow and longing.
` I shall have to bear the burden!
I shall be a poor lad at home .'
And her father told her how he had loved and loved her as well as a lion ; and she said, ` We must work together as men, and each of us will bring in his own way .'
So they worked together, and when the king asked the eldes

## RUN 2 - FILE 11 - third time - r"cbt_train_OUT.txt"  - File 11 , 25.2mb

In [None]:
#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbt_train_OUT.txt"  ## File 11 , 25.2mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1000,
              restore_from='latest',  # 'fresh' starts from the base GPT-2 model; 'latest' resumes from the existing checkpoint
              overwrite=True,
              run_name='run2',
              print_every=50,
              sample_every=1000,
              save_every=500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint checkpoint/run2/model-17000
INFO:tensorflow:Restoring parameters from checkpoint/run2/model-17000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:28<00:00, 28.48s/it]


dataset has 6257758 tokens
Training...
Saving checkpoint/run2/model-17000
Saving checkpoint/run2/model-17000
 foolishness about it ! ''
She was so anxious that it really did seem as if she would not be able to speak as she spoke.
This frightened Alice.
How could she hope that it would really do her good?
And how could it possibly do justice, as she sat listening with a feeling which would be hard to bear at a moment of deep thought?
The answer, however the matter came out, was to try and see if it was possible.
It was quite a relief to think that she would in time see and know and care for that child.
It seemed to Alice that she had to be very careful not to make a mistake and to try hard not to make too many.
She was very glad of it in a moment when she tried to look at her very closely -LRB- as she felt all of her fancy and curiosity betraying itself for a moment -RRB- and looked at her with some anxiety.
And how could she possibly be sure that there was nobody here?
And why, her ver

## Finetune GPT-2 - FIRST TIME TRAINING RUN

The next cell will start the actual finetuning of GPT-2. It creates a persistent TensorFlow session which stores the training config, then runs the training for the specified number of `steps`. To have the finetuning run indefinitely, set `steps = -1`.

The model checkpoints will be saved in `/checkpoint/run1` by default. The checkpoints are saved every 500 steps (can be changed) and when the cell is stopped.

The training might time out after 4ish hours; make sure you end training and save the results so you don't lose them!

**IMPORTANT NOTE:** If you want to rerun this cell, **restart the VM first** (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

Other optional-but-helpful parameters for `gpt2.finetune`:


*  **`restore_from`**: Set to `fresh` to start training from the base GPT-2, or set to `latest` to restart training from an existing checkpoint.
* **`sample_every`**: Number of steps between printed example outputs.
* **`print_every`**: Number of steps between printed training progress lines.
* **`learning_rate`**:  Learning rate for the training. (default `1e-4`, can lower to `1e-5` if you have <1MB input data)
*  **`run_name`**: subfolder within `checkpoint` to save the model. This is useful if you want to work with multiple models (will also need to specify  `run_name` when loading the model)
* **`overwrite`**: Set to `True` if you want to continue finetuning an existing model (w/ `restore_from='latest'`) without creating duplicate copies. 
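
As a rough illustration of how `restore_from` and `run_name` fit together, the sketch below picks `'latest'` only when a checkpoint for the run already exists, and `'fresh'` otherwise. The helper names `latest_checkpoint_step` and `choose_restore_from` are hypothetical and not part of `gpt-2-simple`; the `model-<step>` file naming is an assumption based on the log lines printed in this notebook (for example `checkpoint/run2/model-15000`).

```python
import os
import re

def latest_checkpoint_step(run_name, checkpoint_dir="checkpoint"):
    """Return the highest step found in checkpoint_dir/run_name, or None."""
    run_dir = os.path.join(checkpoint_dir, run_name)
    if not os.path.isdir(run_dir):
        return None
    steps = []
    for name in os.listdir(run_dir):
        # Assumed naming: model-<step> plus an optional suffix such as .index
        match = re.match(r"model-(\d+)", name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None

def choose_restore_from(run_name, checkpoint_dir="checkpoint"):
    """Return 'latest' if the run has a saved checkpoint, else 'fresh'."""
    if latest_checkpoint_step(run_name, checkpoint_dir) is None:
        return "fresh"
    return "latest"
```

The result could then be passed along as `restore_from=choose_restore_from('run2')` in the `gpt2.finetune` call.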

In [12]:
file_name = os.path.basename(FILE_FOR_TRAINING)
file_name

'cbt_valid_OUT.txt'

## RUN 2 - FILE 1 - r"cbt_valid_OUT.txt" - File 1 , 1.2mb


In [13]:
sess = gpt2.start_tf_sess()

#FILE_FOR_TRAINING = DATA_IN_DIR + r"cbt_valid_OUT.txt"  ## File 1 , 1.2mb

gpt2.finetune(sess,
              dataset=file_name,
              model_name='355M',
              steps=1500,
              restore_from='fresh',  # 'fresh' starts from the base GPT-2 model; 'latest' resumes from the existing checkpoint
              run_name='run2',
              print_every=50,
              sample_every=1400,
              save_every=1500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
Please use tensorflow.python.ops.op_selector.get_backward_walk_ops.
Loading checkpoint models/355M/model.ckpt
INFO:tensorflow:Restoring parameters from models/355M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:01<00:00,  1.55s/it]


dataset has 288319 tokens
Training...
[50 | 87.98] loss=3.43 avg=3.43
[100 | 170.48] loss=2.40 avg=2.91
[150 | 253.02] loss=2.65 avg=2.83
[200 | 335.54] loss=2.57 avg=2.76
[250 | 418.12] loss=2.27 avg=2.66
[300 | 500.83] loss=1.26 avg=2.42
[350 | 583.52] loss=1.96 avg=2.35
[400 | 666.16] loss=1.50 avg=2.24
[450 | 748.72] loss=1.20 avg=2.12
[500 | 831.37] loss=2.64 avg=2.18
[550 | 914.03] loss=2.68 avg=2.22
[600 | 996.67] loss=1.80 avg=2.19
[650 | 1079.42] loss=0.95 avg=2.09
[700 | 1161.99] loss=1.22 avg=2.02
[750 | 1244.58] loss=1.68 avg=2.00
[800 | 1327.20] loss=1.89 avg=1.99
[850 | 1409.87] loss=0.59 avg=1.90
[900 | 1492.41] loss=0.55 avg=1.82
[950 | 1575.07] loss=1.68 avg=1.81
[1000 | 1657.75] loss=0.25 avg=1.72
[1050 | 1740.49] loss=0.37 avg=1.65
[1100 | 1823.20] loss=0.87 avg=1.61
[1150 | 1905.86] loss=0.31 avg=1.55
[1200 | 1988.39] loss=0.53 avg=1.50
[1250 | 2071.07] loss=0.98 avg=1.48
[1300 | 2153.80] loss=0.12 avg=1.42
[1350 | 2236.44] loss=0.13 avg=1.37
[1400 | 2318.98] loss=0
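
The progress lines above follow the pattern `[step | time] loss=... avg=...`, where the second field appears to be the elapsed time in seconds. As a small sketch for estimating how long further training might take, the hypothetical helpers below parse such lines and compute the average time per step; the two sample lines are copied from the output above.

```python
import re

# Matches lines such as "[100 | 170.48] loss=2.40 avg=2.91"
LOG_LINE = re.compile(r"\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)")

def parse_progress(line):
    """Parse one progress line into (step, elapsed_seconds, loss, avg_loss)."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # incomplete or unrelated line
    step, elapsed, loss, avg = match.groups()
    return int(step), float(elapsed), float(loss), float(avg)

def seconds_per_step(first_line, last_line):
    """Average wall-clock seconds per step between two progress lines."""
    first = parse_progress(first_line)
    last = parse_progress(last_line)
    return (last[1] - first[1]) / (last[0] - first[0])

rate = seconds_per_step("[50 | 87.98] loss=3.43 avg=3.43",
                        "[1350 | 2236.44] loss=0.13 avg=1.37")
print(f"{rate:.2f} s/step, about {rate * 1000 / 60:.0f} min per 1000 steps")
```

With an estimate like this you can judge how many steps fit into one Colaboratory session before it times out.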

## Save checkpoint data to google drive from non-persistent VM

After the model is trained, you can copy the checkpoint folder to your own Google Drive.

If you want to download it to your personal computer, it's strongly recommended that you copy it to Google Drive first, then download it from there. The checkpoint folder is copied as a `.tar` archive; you can download it and extract it locally.

In [12]:
gpt2.copy_checkpoint_to_gdrive(run_name='run2')
print("Saved")

Saved
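
If you later extract the archive on your own machine, here is a minimal sketch using Python's standard `tarfile` module. It assumes the archive is an ordinary uncompressed tar file, as the `checkpoint_run2.tar` entry in the `!ls` listing earlier in this notebook suggests. The function names are illustrative, and the internal layout of the real archive may differ.

```python
import os
import tarfile

def pack_checkpoint(run_dir, archive_path):
    """Write run_dir (e.g. checkpoint/run2) into an uncompressed tar archive."""
    folder_name = os.path.basename(os.path.normpath(run_dir))
    with tarfile.open(archive_path, "w") as tar:
        tar.add(run_dir, arcname=folder_name)
    return archive_path

def unpack_checkpoint(archive_path, dest_dir="."):
    """Extract a checkpoint archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)  # only do this with archives you created yourself
    return names
```

For example, `unpack_checkpoint('checkpoint_run2.tar')` would extract into the current directory and list what was inside.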


You're done! Feel free to go to the **Generate Text From The Trained Model** section to generate text based on your retrained model.

## Load a Trained Model Checkpoint

Running the next cell will copy the `.tar` checkpoint file from your Google Drive into the Colaboratory VM.

In [None]:
gpt2.copy_checkpoint_from_gdrive(run_name='run2')

The next cell will allow you to load the retrained model checkpoint + metadata necessary to generate text.

**IMPORTANT NOTE:** If you want to rerun this cell, **restart the VM first** (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

In [None]:
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run2')

Loading checkpoint checkpoint/run1/model-10000
INFO:tensorflow:Restoring parameters from checkpoint/run1/model-10000


## Generate Text From The Trained Model

After you've trained the model or loaded a retrained model from checkpoint, you can now generate text. `generate` generates a single text from the loaded model.

In [None]:
gpt2.generate(sess, run_name='run2')

The most interesting part of what we are going to do is to get into the way of it, and then we can illustrate it better with pictures and figures, and then we can make it a reality.
So, if you want to know more about the way of life in the forest, or any other place, I think you will be interested in the following things.
First, there is a big living tree in the forest, and it is the ancient tree of which the story is told.
What does it do ? ''
` As the water runs, so the tree does, ' said the old tree, as loud as it could, ` and when it is wet it needs a root-stub, for it has not quite dry enough to hold on to .'
` And when the water is dry and the root-stub is firm, ' cried the boy.
` Then the tree will hold on to it, and you can dig it up and put it in the water and roll it in the mud.
If the water is cold you will feel cold inside, and that is not good.
The roots grow wetter the longer you are holding on, so do not lose the tree, but dig it up and lay it in the water and roll it in

In [None]:
gpt2.generate(sess,
              length=250,
              temperature=0.7,
              prefix="man cycling down road in park",
              nsamples=5,
              batch_size=5
              )

man cycling down road in park-like fashion.
The old man had gone to the enclosure rather for a stroll, and he wore a pair of spectacles and a long white frock, for he had come to look on the world through his spectacles.
The girls were just going to get their first love-letter when a voice said to them from the window : `` My deares, I am going to tell you at once that I have a great deal to tell you.
It is a rare thing for me to do, and not often I find it so agreeable.
I do not want to talk too long, for I shall be troubled if you do not tell me at once, for I shall never be able to trust you, and then if the world does not seem quite fair to me, and I am so lonely, I shall just be miserable and refuse to be happy at all -- '' He stopped suddenly, and her eyes filled with tears as she gazed sadly into the old man 's tall face.
`` Oh, father!
oh, father!
I am so sorry.
Is there really no way for me to get out of this prison ? ''
said the girl, who never had any heart, and was really
m

For bulk generation, you can generate a large amount of text to a file and sort out the samples locally on your computer. The next cell will write the generated text to a file with a unique timestamp in its name.

You can rerun the cells as many times as you want for even more generated texts!
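
To make the file naming and the later sorting step concrete, the sketch below builds the timestamped file name the same way the next cell does and splits a generated file back into individual samples. The delimiter, a line of 20 `=` characters, is an assumption about what `generate_to_file` writes between samples by default; check your output file and adjust `delimiter` if it differs. `timestamped_name` and `split_samples` are hypothetical helpers.

```python
from datetime import datetime

def timestamped_name(moment):
    """Build a file name like gpt2_gentext_20191110_123005.txt."""
    return "gpt2_gentext_{:%Y%m%d_%H%M%S}.txt".format(moment)

def split_samples(text, delimiter="=" * 20 + "\n"):
    """Split the contents of a generated file into non-empty samples."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]

print(timestamped_name(datetime(2019, 11, 10, 12, 30, 5)))
```

Because the timestamp goes down to the second, rerunning the cell should give each batch its own file rather than overwriting the previous one.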

In [None]:
gen_file = 'gpt2_gentext_{:%Y%m%d_%H%M%S}.txt'.format(datetime.utcnow())

gpt2.generate_to_file(sess,
                      destination_path=gen_file,
                      length=500,
                      temperature=0.7,
                      nsamples=100,
                      batch_size=20
                      )

In [None]:
from google.colab import files  # needed for files.download; not imported above

# may have to run twice to get file to download
files.download(gen_file)

## Next section

## Generate Text From The Pretrained Model

If you want to generate text from the pretrained model, not a finetuned model, pass `model_name` to `gpt2.load_gpt2()` and `gpt2.generate()`.

This is currently the only way to generate text from the 774M or 1558M models with this notebook.

If you're creating an API based on your model and need to pass the generated text elsewhere, you can use `text = gpt2.generate(sess, return_as_list=True)[0]`.

You can also pass in a `prefix` to the generate function to force the text to start with a given character sequence and generate text from there (good if you add an indicator when the text starts).

You can also generate multiple texts at a time by specifying `nsamples`. Unique to GPT-2, you can pass a `batch_size` to generate multiple samples in parallel, giving a massive speedup (in Colaboratory, set a maximum of 20 for `batch_size`).
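
The sketch below assumes that `nsamples` has to be a multiple of `batch_size`; that matches every call in this notebook, but treat it as an assumption and check the library if a call fails. Under that assumption, a small hypothetical helper can pick the largest usable batch size within the limit of 20 suggested above.

```python
def largest_batch_size(nsamples, max_batch_size=20):
    """Largest divisor of nsamples that does not exceed max_batch_size."""
    if nsamples < 1:
        raise ValueError("nsamples must be at least 1")
    for candidate in range(min(nsamples, max_batch_size), 0, -1):
        if nsamples % candidate == 0:
            return candidate

print(largest_batch_size(100))  # 20
print(largest_batch_size(15))   # 15
```

A prime `nsamples` larger than the limit, such as 23, falls back to a batch size of 1, so choosing a round number of samples keeps generation fast.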

Other optional-but-helpful parameters for `gpt2.generate` and friends:

*  **`length`**: Number of tokens to generate (default 1023, the maximum)
* **`temperature`**: The higher the temperature, the crazier the text (default 0.7, recommended to keep between 0.7 and 1.0)
* **`top_k`**: Limits the generated guesses to the top *k* guesses (default 0 which disables the behavior; if the generated output is super crazy, you may want to set `top_k=40`)
* **`top_p`**: Nucleus sampling: limits the generated guesses to a cumulative probability. (gets good results on a dataset with `top_p=0.9`)
* **`truncate`**: Truncates the input text until a given sequence, excluding that sequence (e.g. if `truncate='<|endoftext|>'`, the returned text will include everything before the first `<|endoftext|>`). It may be useful to combine this with a smaller `length` if the input texts are short.
*  **`include_prefix`**: If using `truncate` and `include_prefix=False`, the specified `prefix` will not be included in the returned text.
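
To give a feel for what `temperature`, `top_k` and `top_p` do, here is a small pure-Python sketch of the three ideas applied to a toy list of scores. It is an illustration of the general techniques only, not the code that `gpt-2-simple` or TensorFlow actually runs, and all function names are made up for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero the rest, and renormalise."""
    if k <= 0:  # 0 disables the filter, like the default described above
        return list(probs)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

logits = [2.0, 1.0, 0.5, -1.0]
print(softmax(logits, temperature=0.7))
print(top_k_filter(softmax(logits), k=2))
print(top_p_filter(softmax(logits), p=0.9))
```

Lowering the temperature concentrates probability on the top token, while the two filters remove unlikely tokens before sampling, which is why they help when the output is too erratic.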

In [None]:
model_name = "774M"

gpt2.download_gpt2(model_name=model_name)

Fetching checkpoint: 1.05Mit [00:00, 325Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 111Mit/s]                                                    
Fetching hparams.json: 1.05Mit [00:00, 243Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 3.10Git [00:46, 67.1Mit/s]                                 
Fetching model.ckpt.index: 1.05Mit [00:00, 187Mit/s]                                                
Fetching model.ckpt.meta: 2.10Mit [00:00, 119Mit/s]                                                 
Fetching vocab.bpe: 1.05Mit [00:00, 167Mit/s]                                                       


In [None]:
sess = gpt2.start_tf_sess()

gpt2.load_gpt2(sess, model_name=model_name)

Loading pretrained model models/774M/model.ckpt
INFO:tensorflow:Restoring parameters from models/774M/model.ckpt


In [None]:
seed_sents = [
              'man walking past tv in his house',
              'man riding bike in a park',
              'woman holding her hand bag']
#seed_sents = r'. '.join(seed_sents) + r'.'
#seed_sents

'man walking past tv in his house. man riding bike in a park. woman holding her hand bag.'

In [None]:
seed_sents = [
              'man riding bike in a park',
              'two dogs walking with owner',
              'children playing hop scotch']
seed_sents = [val + r'.' for val in seed_sents]
for seed in seed_sents:
  print(f"*****************************************************************\n********** Sent = <<{seed}>> **********\n")
  gpt2.generate(sess,
                model_name=model_name,
                prefix=seed,
                length=150,
                temperature=0.7,
                top_p=0.9,
                nsamples=20,
                batch_size=5
                )

*****************************************************************
********** Sent = <<man riding bike in a park.>> **********

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
man riding bike in a park.

In another case, an employee of a local school who was a suspect in a burglary was arrested on a charge of assault with a deadly weapon.

The alleged victim of the burglary was also arrested on a charge of assault with a deadly weapon.

The accused had been on the run since the crime.

In a third case, a man was arrested in connection with the robbery of a woman at gunpoint.

The man was also arrested on a charge of assault with a deadly weapon.<|endoftext|>THE WORLD'S LARGEST COMPUTER HACKING CONFERENCE

The World's Largest Computer Hacking Conference is a place where people from all over the world come to learn and share their
man riding bike in a park. The next morning, he was arrested, and the police were called to his house. There, the 
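
The first sample above runs past an `<|endoftext|>` marker into unrelated text. As a sketch of the post-processing that the `truncate` parameter is described as performing, the hypothetical helper below keeps only the text before the first marker. It is plain string handling, not the library's implementation.

```python
END_OF_TEXT = "<|endoftext|>"

def truncate_at(text, marker=END_OF_TEXT):
    """Return the text before the first marker, or the whole text if absent."""
    return text.split(marker, 1)[0]

sample = ("The man was also arrested on a charge of assault with a deadly weapon."
          "<|endoftext|>THE WORLD'S LARGEST COMPUTER HACKING CONFERENCE")
print(truncate_at(sample))
```

Applied to texts returned with `return_as_list=True`, a helper like this would drop everything the model generated after it considered the document finished.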

In [None]:
seed_sents = [
              'man walking past tv in his house',
              'man riding bike in a park',
              'woman holding her hand bag']

seed_sents = r'. '.join(seed_sents) + r'.'

gpt2.generate(sess,
              model_name=model_name,
              prefix=seed_sents,
              length=150,
              temperature=0.5,
              top_p=0.95,
              nsamples=15,
              batch_size=5
              )

man walking past tv in his house. man riding bike in a park. woman holding her hand bag. man with a dog in a park. man walking in a park. man in a park. man walking in a park. man walking in a park. man in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park. man walking in a park.
man walking past tv in his house. man riding bike in a park. woman holding her hand bag.

(The first is a picture of a woman and the second is a picture of a man, who is walking by tv in his house. man riding bike in a park. woman holding her hand bag. The first is a picture of a woman and the second is a picture of a man, who is walking by tv in 

In [None]:
seed_sents = [
              'man walking past tv in his house',
              'man riding bike in a park',
              'woman holding her hand bag']
seed_sents = r'. '.join(seed_sents) + r'.'
gpt2.generate(sess,
              model_name=model_name,
              prefix=seed_sents,
              length=150,
              temperature=0.7,
              top_p=0.9,
              nsamples=20,
              batch_size=5
              )

man walking past tv in his house. man riding bike in a park. woman holding her hand bag. man walking home.

A man in a pajamas walking home.

A woman walking home.

A man walking home.

A woman walking home.

A man walking home.

A woman walking home.

A man walking home.

A woman walking home.

A man walking home.

A woman walking home.

A man walking home.

A woman walking home.

A man walking home.

A man walking home.

A woman walking home.

A man walking home.

A woman walking home.

A man walking home.

A woman walking home.

A man walking home.

man walking past tv in his house. man riding bike in a park. woman holding her hand bag. woman walking down the street holding her purse. woman walking down the street holding her purse. woman walking down the street holding her purse. woman walking down the street holding her purse. woman walking down the street holding her purse. woman walking down the street holding her purse. woman walking down the street holding her purse. man walki

In [None]:
#gpt2.generate(sess,
#              model_name=model_name,
#              prefix="The secret of life is",
#              length=100,
#              temperature=0.7,
#              top_p=0.9,
#              nsamples=5,
#              batch_size=5
#              )

The secret of life is that it's really easy to make it complicated," said Bill Nye, the host of the popular science show "Bill Nye the Science Guy." "And this is one of the reasons why we all need to be smarter about science, because we can't keep up with the amazing things that are going on all the time."

While Nye is correct that "everything that's going on all the time" is making the world a better place, he misses the point. This is not
The secret of life is in the rhythm of the universe. It's not a mystery. It's not a mystery to me. It's the nature of the universe. It's the beauty of the universe. It's the way the universe works. It's the way the universe is. It's the way the universe is going to work. It's the way the universe is. It's the way the universe is. It's the way the universe is. It's the way the universe is. It's the way
The secret of life is in the universe.


-

The Red Devil

It's the end of the world as we know it, and the only thing that can save us is a band of 

# Etcetera

If the notebook has errors (e.g. GPU Sync Fail), force-kill the Colaboratory virtual machine and restart it with the command below:

In [None]:
!kill -9 -1

# LICENSE

MIT License

Copyright (c) 2019 Max Woolf

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.