# Phase 4b: Using a Transformer (GPT-2) for natural language generation (NLG)

In [1]:
from google.colab import drive
from google.colab import files
drive.mount("/content/drive")

Mounted at /content/drive


In [2]:
# gpt-2-simple 0.7.1 relies on TensorFlow 1.x APIs, so select TF 1.x in Colab
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [3]:
import tensorflow as tf

In [4]:
tf.__version__

'1.15.2'

## Declaring constants

In [5]:
# dataset name(s) and the Google Drive folder that holds the text files
DATA = ["SgCorpus"]
COLAB_FILEPATH = './drive/My Drive/next-sentence-predictor/finalData/'

## Declare final filepath

In [6]:
# full path of the text file used for fine-tuning
filename = COLAB_FILEPATH + DATA[0] + '.txt'
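Before a long fine-tuning run it can help to confirm that the path resolves to a real, non-empty file. A minimal sketch, assuming the same folder layout as above; `dataset_path` and `check_dataset` are hypothetical helper names and are not part of gpt-2-simple.

```python
import os

def dataset_path(folder, name, ext=".txt"):
    """Join folder, dataset name and extension into one path."""
    return os.path.join(folder, name + ext)

def check_dataset(path):
    """Return the file size in bytes, or raise if the file is missing or empty."""
    if not os.path.isfile(path):
        raise FileNotFoundError("dataset not found: " + path)
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("dataset is empty: " + path)
    return size
```

In this notebook it could be called as `check_dataset(dataset_path(COLAB_FILEPATH, DATA[0]))` once Google Drive is mounted.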

## Install and import the library for fine-tuning GPT-2

In [7]:
!pip install gpt-2-simple
import gpt_2_simple as gpt2
from datetime import datetime

Collecting gpt-2-simple
  Downloading https://files.pythonhosted.org/packages/6f/e4/a90add0c3328eed38a46c3ed137f2363b5d6a07bf13ee5d5d4d1e480b8c3/gpt_2_simple-0.7.1.tar.gz
Collecting toposort
  Downloading https://files.pythonhosted.org/packages/e9/8a/321cd8ea5f4a22a06e3ba30ef31ec33bea11a3443eeb1d89807640ee6ed4/toposort-1.5-py2.py3-none-any.whl
Building wheels for collected packages: gpt-2-simple
  Building wheel for gpt-2-simple (setup.py) ... done
  Created wheel for gpt-2-simple: filename=gpt_2_simple-0.7.1-cp36-none-any.whl size=23581 sha256=458b835b554342c4d2a6f6894b3ea9961419acb7f8dcacdde60d8305668ce85a
  Stored in directory: /root/.cache/pip/wheels/0c/f8/23/b53ce437504597edff76bf9c3b8de08ad716f74f6c6baaa91a
Successfully built gpt-2-simple
Installing collected packages: toposort, gpt-2-simple
Successfully installed gpt-2-simple-0.7.1 toposort-1.5
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.c

In [8]:
# download the smallest GPT-2 model (124M parameters)
gpt2.download_gpt2(model_name="124M")

Fetching checkpoint: 1.05Mit [00:00, 341Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 89.3Mit/s]                                                   
Fetching hparams.json: 1.05Mit [00:00, 899Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:02, 196Mit/s]                                   
Fetching model.ckpt.index: 1.05Mit [00:00, 287Mit/s]                                                
Fetching model.ckpt.meta: 1.05Mit [00:00, 176Mit/s]                                                 
Fetching vocab.bpe: 1.05Mit [00:00, 154Mit/s]                                                       


## Fine-tuning the GPT-2 model

In [9]:
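# fine-tune the GPT-2 model on the dataset
tf.reset_default_graph()  # clear any existing graph so this cell can be re-run

sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              dataset=filename,
              model_name='124M',
              steps=100,             # total number of training steps
              restore_from='fresh',  # start from the base 124M weights, not an earlier checkpoint
              run_name='run1',       # checkpoints are saved under checkpoint/run1
              print_every=25,        # print training progress every 25 steps
              sample_every=20,       # print a generated sample every 20 steps
              save_every=10)         # save a checkpoint every 10 steps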

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt
Loading dataset...


100%|██████████| 1/1 [00:24<00:00, 24.08s/it]


dataset has 5476259 tokens
Training...
Saving checkpoint/run1/model-10
Saving checkpoint/run1/model-20
Instructions for updating:
Use standard file APIs to delete files with this prefix.
 i, i cannot fathom being able to write a book about. i think i am just doing my best to try, but my best guess is, that i did not like the ending of the first two chapters. i think that's just the feeling of a disappointment that i did not deserve to finish my story even though it was already done. it still feels good after but i do not think i have fully decided if i want to continue reading. i am sad this is an issue that has not happened in this series i have written thus far... not only that i do not know the outcome at all, but i will not even know my own story that i am going to.

so, it has been more than a week and i have no sleep yet. i am tired of living with the stress of daily tasks. i went home from work and had a great start in my work with this project, i was really glad that i found ou
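The three interval arguments interact with `steps`. The sketch below lists which steps would trigger each event, under the simplifying assumption that an event fires whenever the step count is a multiple of its interval. That assumption is consistent with the `model-10` and `model-20` checkpoints and the sample printed after step 20 in the log above, but the exact trigger logic inside gpt-2-simple may differ. `event_schedule` is a hypothetical helper.

```python
def event_schedule(steps, print_every, sample_every, save_every):
    """Map each event name to the list of steps at which it would fire."""
    intervals = {"print": print_every, "sample": sample_every, "save": save_every}
    return {name: [s for s in range(1, steps + 1) if s % every == 0]
            for name, every in intervals.items()}

schedule = event_schedule(steps=100, print_every=25, sample_every=20, save_every=10)
for name, at in schedule.items():
    print(name, at)
```

With these settings a 100-step run would write ten checkpoints and print five samples, so raising `save_every` is one way to reduce disk writes on short runs.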

In [10]:
# save checkpoint to google drive
gpt2.copy_checkpoint_to_gdrive(run_name='run1')
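In a later Colab session the saved run can be restored instead of retrained. A sketch, assuming Google Drive is already mounted and gpt-2-simple is installed in the runtime; `restore_run` is a hypothetical wrapper around the library's `copy_checkpoint_from_gdrive` and `load_gpt2` calls.

```python
def restore_run(run_name="run1"):
    """Copy a saved run back from Google Drive and load it into a new session."""
    # imported lazily so the function can be defined before the package is installed
    import gpt_2_simple as gpt2

    gpt2.copy_checkpoint_from_gdrive(run_name=run_name)  # restores checkpoint/<run_name>
    sess = gpt2.start_tf_sess()
    gpt2.load_gpt2(sess, run_name=run_name)
    return sess
```

The returned session can then be passed to `gpt2.generate` in place of the `sess` created during fine-tuning.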

## Test the fine-tuned GPT-2 model on the 3 test sentences

## Text starter 1: here i am testing my nlg project…

In [None]:
# generate text with the fine-tuned model
gpt2.generate(sess,
              length=30,  # number of tokens to generate
              temperature=0.7, # sampling temperature; lower values give less random output
              prefix="here i am testing my nlg project", # text each sample starts with
              nsamples=3, # number of samples to generate
              batch_size=1, # samples generated in parallel; nsamples must be divisible by it
              top_k=50, # sample only from the 50 most likely next tokens
              run_name='run1')

here i am testing my nlg project. this is the first time i have tested an object from nlg. the results are encouraging. but i can not believe what i am doing
here i am testing my nlg project. i am in a ctu, and i am really into studying nlg. but it is still hard work to get a nlg
here i am testing my nlg project from this week and i am so happy to be back in nlg. i am so happy to be back in nlg and thank you
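To make `temperature` and `top_k` concrete, here is a pure-Python sketch of temperature-scaled top-k sampling over a toy set of next-token scores. It illustrates the general technique, not gpt-2-simple's internal implementation; `top_k_probs`, `sample_token`, and the example scores are made up for illustration.

```python
import math
import random

def top_k_probs(logits, temperature=0.7, top_k=50):
    """Return {token: probability} over the top_k highest-scoring tokens."""
    # keep only the k highest-scoring tokens
    kept = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # divide by temperature: values below 1 sharpen the distribution
    scaled = [(tok, score / temperature) for tok, score in kept]
    # softmax, subtracting the maximum for numerical stability
    peak = max(s for _, s in scaled)
    weights = [(tok, math.exp(s - peak)) for tok, s in scaled]
    total = sum(w for _, w in weights)
    return {tok: w / total for tok, w in weights}

def sample_token(logits, temperature=0.7, top_k=50, rng=random):
    """Draw one token from the temperature-scaled top-k distribution."""
    probs = top_k_probs(logits, temperature, top_k)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# toy scores for four candidate next tokens (illustrative values only)
toy_logits = {"project": 2.0, "model": 1.5, "code": 0.5, "banana": -3.0}
print(top_k_probs(toy_logits, temperature=0.7, top_k=3))
print(sample_token(toy_logits, temperature=0.7, top_k=3))
```

With `top_k=3` the lowest-scoring token can never be drawn, and lowering `temperature` shifts more probability onto the highest-scoring token.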


## Text starter 2: never did i have…

In [11]:
# generate text with the fine-tuned model
gpt2.generate(sess,
              length=30,  # number of tokens to generate
              temperature=0.7, # sampling temperature; lower values give less random output
              prefix="never did i have", # text each sample starts with
              nsamples=3, # number of samples to generate
              batch_size=1, # samples generated in parallel; nsamples must be divisible by it
              top_k=50, # sample only from the 50 most likely next tokens
              run_name='run1')

never did i have to talk to her. i was worried i would get mad at her and i am glad she is fine. i do not know who the girl is
never did i have time to think about it .i am sorry i thought of that but i just can not look at it. if i did, i would not have
never did i have to put in the effort to stay in the school.i really feel that i will never get to stay in a school that is so full of people


## Text starter 3: someday i will…

In [None]:
# generate text with the fine-tuned model
gpt2.generate(sess,
              length=30,  # number of tokens to generate
              temperature=0.7, # sampling temperature; lower values give less random output
              prefix="someday i will", # text each sample starts with
              nsamples=3, # number of samples to generate
              batch_size=1, # samples generated in parallel; nsamples must be divisible by it
              top_k=50, # sample only from the 50 most likely next tokens
              run_name='run1')

someday i will be leaving you and the rest of the world with my own little girl. but i need some help with my feelings for you. you seem to be
someday i will be back and i have been thinking of you for some time.
is there any way to stop seeing you?.i am in a similar situation
someday i will be attending the first day of the new year celebrations at the local library to celebrate my new year. please feel free to come out to see me and


## Long text generation example

## Text starter 4: i'm very interested in the field of artificial intelligence…

In [12]:
# generate longer text with the fine-tuned model
gpt2.generate(sess,
              length=100,  # number of tokens to generate
              temperature=0.7, # sampling temperature; lower values give less random output
              prefix="""i'm very interested in the field of artificial intelligence, it gives me ample opportunities to explore what i want to know""", # text each sample starts with
              nsamples=3, # number of samples to generate
              batch_size=1, # samples generated in parallel; nsamples must be divisible by it
              top_k=50, # sample only from the 50 most likely next tokens
              run_name='run1')

i'm very interested in the field of artificial intelligence, it gives me ample opportunities to explore what i want to know and what i am not. i am also much more interested in what i am doing and what i am not doing.i have also found that i am not as motivated as i used to be. i have been doing it for years, but the pace of it has slowly changed. i am now on my own and i am being forced to do things that i am not prepared to do. i am not sure if i am really ready yet, but i am feeling a little guilty about it.
i'm very interested in the field of artificial intelligence, it gives me ample opportunities to explore what i want to know. i have no idea where i am going to go this year, but i am hoping to find someone who knows me.
i have been doing this for a few years now, and i have never been afraid to post this at least twice. it is so good to be able to see my relatives for the first time, and i do not feel bad because i am very happy to have them around. but no matter how much i try,
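Because `length` counts tokens rather than sentences, samples often stop mid-sentence, as in the outputs above. One possible post-processing step, sketched here with a hypothetical `trim_to_sentence` helper, is to cut each sample at its last sentence-ending punctuation mark.

```python
def trim_to_sentence(text, terminators=".!?"):
    """Cut text after its last sentence terminator; return it unchanged if there is none."""
    last = max(text.rfind(ch) for ch in terminators)
    return text if last == -1 else text[:last + 1]

print(trim_to_sentence("someday i will be back. please feel free to come out to see me and"))
# prints: someday i will be back.
```

To apply it, the samples would first need to be collected as strings, for example by passing `return_as_list=True` to `gpt2.generate`, and then mapped through the helper.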