# Current.cam workshop - Volumetric Personas
## GPT-2 Text Generation 

Made by [Artem Konevskikh](http://aiculedssul.net/) for [Current.cam](https://current.cam/)

Based on notebook by [Max Woolf](http://minimaxir.com). For more about `gpt-2-simple`, you can visit [this GitHub repository](https://github.com/minimaxir/gpt-2-simple).

## Installation

In [1]:
#@title Imports
#@markdown Running this cell installs and imports the libraries needed to work with GPT-2
!pip install git+https://github.com/minimaxir/gpt-2-simple
import gpt_2_simple as gpt2
from datetime import datetime


  Running command git clone -q https://github.com/minimaxir/gpt-2-simple 'C:\Users\ferna\AppData\Local\Temp\pip-req-build-wq9fmx1f'


Collecting git+https://github.com/minimaxir/gpt-2-simple
  Cloning https://github.com/minimaxir/gpt-2-simple to c:\users\ferna\appdata\local\temp\pip-req-build-wq9fmx1f
  Resolved https://github.com/minimaxir/gpt-2-simple to commit d1e97f580cfbd53eee95066c7efed8d4476de943


## Text Generation

There are two ways to generate text with GPT-2: with a standard pretrained model, or with a model you have finetuned on your own texts.

In this notebook we will use standard models, but we will play with finetuning later (in another workshop).

In [None]:
#@title Load Pretrained Model

#@markdown There are four released sizes of GPT-2 that we can use to generate text in Colab:

#@markdown * `124M` (default): the "small" model, 500MB on disk.
#@markdown * `355M`: the "medium" model, 1.5GB on disk.
#@markdown * `774M`: the "large" model, cannot currently be finetuned with Colaboratory but can be used to generate text from the pretrained model.
#@markdown * `1558M`: the "extra large", true model. Will not work if a K80/P4 GPU is attached to the notebook. Also, like `774M`, it cannot be finetuned in Colab.

#@markdown Larger models have more knowledge, but take longer to generate text. You can specify which base model to use by changing `model_name`.

model_name = "1558M" #@param ["124M", "355M", "774M", "1558M"] {allow-input: true}

#@markdown This cell downloads the model from Google Cloud Storage and saves it in the Colaboratory VM at `models/<model_name>`, relative to the working directory.

#@markdown The model isn't permanently saved in the Colaboratory VM; you'll have to download it again if you want to use it at a later time.

gpt2.download_gpt2(model_name=model_name)
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, model_name=model_name)

#@markdown *__Select the model and run the cell to load it__*

Fetching checkpoint: 1.05Mit [00:00, 525Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:01, 585kit/s]                                                    
Fetching hparams.json: 1.05Mit [00:00, 523Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001:  68%|██████████▏    | 4.21G/6.23G [1:47:14<44:15, 760kit/s]
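
The largest checkpoint is several gigabytes, so it can be worth skipping the download when the model folder is already present. Below is a minimal sketch of such a check. The helper name `model_is_downloaded` is hypothetical (it is not part of `gpt-2-simple`), and it assumes the library's default `models` folder. It only tests that the folder exists, so an interrupted download would still count as "present" and would need to be deleted by hand.

```python
import os

def model_is_downloaded(model_name, model_dir="models"):
    """Return True if a folder for this model already exists under model_dir.

    Hypothetical helper: it checks for the folder only, not for complete files.
    """
    return os.path.isdir(os.path.join(model_dir, model_name))

# Intended use in the loading cell (assumes gpt_2_simple is imported as gpt2):
# if not model_is_downloaded(model_name):
#     gpt2.download_gpt2(model_name=model_name)
```

With this guard in place, re-running the loading cell in the same session would not fetch the files again.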

In [None]:
#@title Generation with the pretrained models
#@markdown **Generation parameters**

#@markdown You can pass a `prefix` to the generate function to force the text to start with a given character sequence and continue from there.
prefix = 'The form of a soul is ' #@param {type: "string"}
#@markdown Number of tokens to generate (default 1023, the maximum)
length = 200  #@param {type:"slider", min:1, max:1023, step:1}
#@markdown The higher the temperature, the crazier the text (default 0.7, recommended to keep between 0.7 and 1.0)
temperature=0.8  #@param {type:"slider", min:0.1, max:1, step:0.1}
#@markdown Limits the generated guesses to the top *k* guesses (default 0 which disables the behavior; if the generated output is super crazy, you may want to set `top_k=40`)
top_k=0  #@param {type: "number"}
#@markdown Nucleus sampling: limits the generated guesses to the smallest set of tokens whose cumulative probability reaches `top_p` (`top_p=0.9` often gives good results)
top_p=0.9  #@param {type:"slider", min:0, max:1, step:0.1}
#@markdown Number of samples to generate
nsamples=10  #@param {type: "number"}
#@markdown Number of samples to generate in parallel to speed up the process (`nsamples` must be divisible by `batch_size`)
batch_size=5  #@param {type:"slider", min:1, max:20, step:1}
#@markdown Save samples to text file
save_to_file = False #@param {type:"boolean"}


#@markdown *__Set parameters and run the cell to generate samples__*
gen_file = 'gpt2_gentext_{:%Y%m%d_%H%M%S}.txt'.format(datetime.utcnow()) if save_to_file else None
gpt2.generate(sess,
              model_name=model_name,
              destination_path=gen_file,
              prefix=None if prefix=='' else prefix,
              length=length,
              temperature=temperature,
              top_k=int(top_k),
              top_p=top_p,
              nsamples=int(nsamples),
              batch_size=batch_size
              )
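
To get a feel for what `temperature`, `top_k` and `top_p` do, here is a toy sketch in plain Python that applies them to a made-up four-word vocabulary. It is not how `gpt-2-simple` implements sampling internally. The words, the scores and the function names are invented for illustration, and treating `top_p` values of 0 or 1 as "no filtering" is an assumption of this sketch.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely tokens (k=0 disables the filter), then renormalize."""
    if k <= 0 or k >= len(probs):
        return list(probs)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    if p <= 0.0 or p >= 1.0:  # assumption of this sketch: treat as "no filtering"
        return list(probs)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

# Made-up next-word scores after the prefix 'The form of a soul is '
words = ["light", "shadow", "glass", "noise"]
scores = [2.0, 1.0, 0.5, -1.0]

for t in (0.5, 0.8, 1.0):
    probs = softmax(scores, temperature=t)
    print(t, [round(q, 3) for q in probs])

probs = softmax(scores, temperature=0.8)
print("top_k=2:", dict(zip(words, [round(q, 3) for q in top_k_filter(probs, 2)])))
print("top_p=0.9:", dict(zip(words, [round(q, 3) for q in top_p_filter(probs, 0.9)])))
```

Lower temperatures push most of the probability onto the top word, while the two filters remove unlikely words entirely before a word is drawn.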