<a href="https://colab.research.google.com/github/mitsuoxv/erp/blob/master/gpt_2_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Let President Reagan speak

I try out OpenAI's GPT-2 using [gpt-2-simple, a simplified wrapper](https://github.com/minimaxir/gpt-2-simple). Its license is:

MIT License

Copyright (c) 2019 Max Woolf

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

## Prepare modules

In [2]:
%tensorflow_version 1.x

In [5]:
import tensorflow as tf
print(tf.__version__)

1.15.2


In [6]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [7]:
!pip install gpt-2-simple

Collecting gpt-2-simple
  Downloading https://files.pythonhosted.org/packages/6f/e4/a90add0c3328eed38a46c3ed137f2363b5d6a07bf13ee5d5d4d1e480b8c3/gpt_2_simple-0.7.1.tar.gz
Collecting toposort
  Downloading https://files.pythonhosted.org/packages/e9/8a/321cd8ea5f4a22a06e3ba30ef31ec33bea11a3443eeb1d89807640ee6ed4/toposort-1.5-py2.py3-none-any.whl
Building wheels for collected packages: gpt-2-simple
  Building wheel for gpt-2-simple (setup.py) ... done
  Created wheel for gpt-2-simple: filename=gpt_2_simple-0.7.1-cp36-none-any.whl size=23581 sha256=037cb291d7a41b9427bdbeb5feb329af5f8f78ad8f810225cadd253b0a5533e2
  Stored in directory: /root/.cache/pip/wheels/0c/f8/23/b53ce437504597edff76bf9c3b8de08ad716f74f6c6baaa91a
Successfully built gpt-2-simple
Installing collected packages: toposort, gpt-2-simple
Successfully installed gpt-2-simple-0.7.1 toposort-1.5


## Download gpt-2 124M model

In [9]:
import gpt_2_simple as gpt2

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



In [10]:
import os

model_name = "124M"
# download_gpt2 saves into ./models/<model_name>/, so check that directory
if not os.path.isdir(os.path.join("models", model_name)):
    print(f"Downloading {model_name} model...")
    gpt2.download_gpt2(model_name=model_name)   # model is saved into current directory under models/124M/

Fetching checkpoint: 1.05Mit [00:00, 405Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 63.9Mit/s]                                                   
Fetching hparams.json: 1.05Mit [00:00, 543Mit/s]                                                    

Downloading 124M model...



Fetching model.ckpt.data-00000-of-00001: 498Mit [00:02, 189Mit/s]                                   
Fetching model.ckpt.index: 1.05Mit [00:00, 202Mit/s]                                                
Fetching model.ckpt.meta: 1.05Mit [00:00, 116Mit/s]                                                 
Fetching vocab.bpe: 1.05Mit [00:00, 148Mit/s]                                                       


## Download texts, and create reagan.txt

In [11]:
!wget https://raw.githubusercontent.com/mitsuoxv/erp/master/texts/presidents/{1982..1989}_pres.txt --directory-prefix=/tmp/

--2020-07-26 11:19:56--  https://raw.githubusercontent.com/mitsuoxv/erp/master/texts/presidents/1982_pres.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19163 (19K) [text/plain]
Saving to: ‘/tmp/1982_pres.txt’


2020-07-26 11:19:56 (1.54 MB/s) - ‘/tmp/1982_pres.txt’ saved [19163/19163]

--2020-07-26 11:19:56--  https://raw.githubusercontent.com/mitsuoxv/erp/master/texts/presidents/1983_pres.txt
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 13924 (14K) [text/plain]
Saving to: ‘/tmp/1983_pres.txt’


2020-07-26 11:19:56 (7.89 MB/s) - ‘/tmp/1983_pres.txt’ saved [13924/13924]

--2020-07-26 11:19:56--  https://raw.githubusercontent.com/mitsuoxv/erp/master/texts/presidents/1984_pres.txt
Reusing 

In [12]:
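pres_texts = ""

# NOTE: range(1982, 1989) stops at 1988, so the downloaded 1989 file
# is not included; use range(1982, 1990) to include it.
for year in range(1982, 1989):
    input_filename = '/tmp/{}_pres.txt'.format(year)

    # read one year's text
    with open(input_filename, 'r') as f:
        text = f.read()

    # append it to the combined text, separated by a newline
    pres_texts = pres_texts + '\n' + text

# write the combined text used for fine-tuning
output_filename = '/tmp/reagan.txt'
with open(output_filename, 'w') as f:
    f.write(pres_texts)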

## Finetune

Fine-tuning takes about an hour and a half.

In [13]:
sess = gpt2.start_tf_sess()

In [14]:
gpt2.finetune(sess,
              output_filename,
              model_name=model_name,
              steps=1000)   # steps is max number of training steps

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  3.61it/s]


dataset has 24101 tokens
Training...
[1 | 13.61] loss=2.92 avg=2.92
[2 | 18.13] loss=2.89 avg=2.91
[3 | 22.67] loss=2.82 avg=2.88
[4 | 27.20] loss=2.64 avg=2.82
[5 | 31.74] loss=2.67 avg=2.79
[6 | 36.28] loss=2.53 avg=2.74
[7 | 40.80] loss=2.54 avg=2.71
[8 | 45.32] loss=2.50 avg=2.68
[9 | 49.85] loss=2.32 avg=2.64
[10 | 54.36] loss=2.54 avg=2.63
[11 | 58.89] loss=2.31 avg=2.60
[12 | 63.42] loss=2.29 avg=2.57
[13 | 67.95] loss=2.11 avg=2.54
[14 | 72.48] loss=2.30 avg=2.52
[15 | 76.99] loss=2.11 avg=2.49
[16 | 81.51] loss=2.24 avg=2.47
[17 | 86.02] loss=2.06 avg=2.45
[18 | 90.54] loss=1.93 avg=2.42
[19 | 95.06] loss=2.06 avg=2.40
[20 | 99.59] loss=2.00 avg=2.37
[21 | 104.11] loss=1.89 avg=2.35
[22 | 108.62] loss=2.09 avg=2.33
[23 | 113.14] loss=1.93 avg=2.32
[24 | 117.67] loss=1.77 avg=2.29
[25 | 122.19] loss=1.53 avg=2.26
[26 | 126.71] loss=1.56 avg=2.23
[27 | 131.23] loss=2.09 avg=2.22
[28 | 135.75] loss=1.58 avg=2.19
[29 | 140.26] loss=1.41 avg=2.16
[30 | 144.79] loss=1.50 avg=2.14
[3
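
The timestamps in the training log above allow a rough estimate of the cost per step. The following is a minimal sketch, not part of the original notebook: `seconds_per_step` and `projected_minutes` are hypothetical helper names, and the sample uses only the first and last complete log lines shown above. It assumes each log line has the form `[step | elapsed seconds] loss=... avg=...`.

```python
import re

# Matches the "[step | elapsed_seconds]" prefix of each training log line.
LOG_PREFIX = re.compile(r"\[(\d+) \| ([\d.]+)\]")


def seconds_per_step(log_text):
    """Average seconds per step between the first and last logged steps."""
    rows = [(int(m.group(1)), float(m.group(2)))
            for m in LOG_PREFIX.finditer(log_text)]
    if len(rows) < 2:
        raise ValueError("need at least two complete log lines")
    (first_step, first_time), (last_step, last_time) = rows[0], rows[-1]
    return (last_time - first_time) / (last_step - first_step)


def projected_minutes(log_text, total_steps):
    """Projected training time in minutes, counting training steps only."""
    return seconds_per_step(log_text) * total_steps / 60


# First and last complete lines from the log above.
sample_log = """[1 | 13.61] loss=2.92 avg=2.92
[30 | 144.79] loss=1.50 avg=2.14"""

print(round(seconds_per_step(sample_log), 2))      # 4.52
print(round(projected_minutes(sample_log, 1000)))  # 75
```

The projection covers the training steps alone. Any periodic sample generation or checkpoint saving during the run would add to it, which may account for part of the gap to the observed hour and a half.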

## Generate texts

In [15]:
gpt2.generate(sess,
              prefix='I have some proposals to the Congress.')

I have some proposals to the Congress.
One way or the other, we will pursue the issues that matter most to us—jobs, growth, and economic opportunity—that will lead to sustained economic growth and to free trade and international economic cooperation.
Trade in goods and services is only one aspect of our economic relations with the rest of the world.
The international flow of capital into the United States and from the United States to other countries is also of great importance.
The United States should play a primary role in preserving the vitality of the international capital market.
Severe strains on that market developed in 1982 as several nations found it difficult to service their overseas debt obligations.
In 1982, the Federal Government worked closely with debtor and creditor nations and the major international lending agencies to prevent a disruption in the functioning of world capital markets.
Now, with the cooperation of a wide variety of creditors, countries with especially

The third line, "Trade in goods ...", and everything below it are exact copies of lines 240 to 284 of reagan.txt, the text I used for fine-tuning. The only line I could not find in reagan.txt is the second line:

One way or the other, we will pursue the issues that matter most to us—jobs, growth, and economic opportunity—that will lead to sustained economic growth and to free trade and international economic cooperation.
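
The comparison above was done by hand; it could also be automated. The sketch below is an illustration under stated assumptions, not the method used in this notebook: `normalize` and `lines_not_in_corpus` are hypothetical helpers, and the toy strings stand in for the generated text and for reagan.txt. Each generated line is looked up as a substring of the whitespace-normalized corpus, so a line still counts as copied when it is cut off at the end or wrapped differently in the corpus.

```python
def normalize(text):
    """Collapse all runs of whitespace (including newlines) to single spaces."""
    return " ".join(text.split())


def lines_not_in_corpus(generated, corpus):
    """Return the generated lines that do not occur verbatim in the corpus."""
    corpus_norm = normalize(corpus)
    novel = []
    for line in generated.splitlines():
        line_norm = normalize(line)
        # Skip blank lines; keep lines that are not a substring of the corpus.
        if line_norm and line_norm not in corpus_norm:
            novel.append(line_norm)
    return novel


# Toy stand-ins (assumptions) for reagan.txt and for the generated text.
toy_corpus = (
    "Trade in goods and services is only one aspect.\n"
    "The flow of capital is also\n"
    "of great importance."
)
toy_generated = (
    "We will pursue jobs and growth.\n"
    "Trade in goods and services is only one aspect.\n"
    "The flow of capital is also of great"
)

print(lines_not_in_corpus(toy_generated, toy_corpus))
# ['We will pursue jobs and growth.']
```

With the real data, `corpus` would be the contents of `/tmp/reagan.txt`, and `generated` could come from `gpt2.generate(sess, prefix=..., return_as_list=True)[0]`. The prefix itself would be reported as novel unless it also appears in the corpus.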