# 0. Install Dependencies

Run the cell below to install and import Python packages on the Google Colaboratory virtual machine (VM).

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [1]:
%%capture
!pip3 install happytransformer
from happytransformer import HappyGeneration, GENSettings

# 1. Fix Overflow

Run the cell below to apply a workaround that makes the output window wrap long lines, so that all generated text is visible. Run it only once.

In [2]:
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

# 2. Download the GPT-Neo Trained Models

* `gpt-neo-125M`: the "small" model.
* `gpt-neo-1.3B`: the "medium" model.
* `gpt-neo-2.7B`: the "large" model.

Larger models have more knowledge, but take longer to download and to generate text. I recommend either the 125M or the 1.3B model. To choose the base model, change the value of the `model` variable below.

In [3]:
#@title Model Selection fields

model = 'gpt-neo-125M' #@param ["gpt-neo-125M", "gpt-neo-1.3B", "gpt-neo-2.7B"]

In [4]:
%%capture
happy_gen = HappyGeneration("GPT-NEO", f"EleutherAI/{model}")
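
The selection above boils down to building a Hugging Face model ID of the form `EleutherAI/<model>`. As a minimal sketch, the helper below guards against typos before any weights are downloaded. `ALLOWED_MODELS` and `build_model_id` are hypothetical names introduced here for illustration; they are not part of Happy Transformer.

```python
# Hypothetical helper: validate the short model name before downloading weights.
ALLOWED_MODELS = ("gpt-neo-125M", "gpt-neo-1.3B", "gpt-neo-2.7B")


def build_model_id(model: str, org: str = "EleutherAI") -> str:
    """Return the '<org>/<model>' ID, rejecting names outside ALLOWED_MODELS."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"Unknown model {model!r}; expected one of {ALLOWED_MODELS}")
    return f"{org}/{model}"


print(build_model_id("gpt-neo-125M"))
```

The returned string could then be passed as the second argument to `HappyGeneration`, in place of the f-string used in the cell above.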

# 3. Generate Text!

**Change the prompt.** Think of it as the writing prompt for the GPT-Neo model; in the cell below, each prompt is a random line from `alllines.txt`. If you want, you can exclude the prompt text from the final output.

|**Parameter**|**Default**|**Definition**|
| :- | :- | :- |
|`min_length`|10|Minimum number of generated tokens|
|`max_length`|50|Maximum number of generated tokens|
|`do_sample`|False|When True, picks each token by sampling from its conditional probability distribution|
|`early_stopping`|False|When True, generation finishes if the EOS token is reached|
|`num_beams`|1|Number of candidate sequences (beams) tracked during beam search; 1 means no beam search|
|`temperature`|1.0|How sensitive the algorithm is to selecting low-probability options (higher temperatures generate more random text)|
|`top_k`|50|How many of the most probable tokens are considered at each sampling step|
|`top_p`|1.0|Sampling is restricted to the smallest set of tokens whose probabilities add up to at least `top_p`|
|`no_repeat_ngram_size`|0|The size of n-grams that cannot occur more than once (0 disables the restriction)|
|`bad_words`|None|List of words or phrases that cannot be generated|
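
To make `temperature`, `top_k` and `top_p` from the table above more concrete, here is a small, self-contained sketch of how these three settings can reshape a next-token distribution. It is a simplified illustration on a toy vocabulary, not the code that Happy Transformer or the underlying Transformers library runs; `filter_distribution`, `sample` and `toy_logits` are names made up for this example.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    # Temperature: divide the logits before the softmax (<1 sharpens, >1 flattens).
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    ranked = sorted(weights.items(), key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most probable tokens (0 keeps everything).
    if top_k > 0:
        ranked = ranked[:top_k]
    total = sum(w for _, w in ranked)
    ranked = [(tok, w / total) for tok, w in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


def sample(dist, rng=random):
    """Draw one token according to the filtered probabilities."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Toy logits for four candidate next tokens (made-up numbers).
toy_logits = {"the": 2.0, "a": 1.5, "cat": 0.5, "zebra": -1.0}
dist = filter_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9)
print(dist)
print(sample(dist))
```

Lowering `temperature` concentrates probability on the top token, while `top_k` and `top_p` cut off the unlikely tail before sampling.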



In [None]:
import random

gen_settings = GENSettings(no_repeat_ngram_size=2, do_sample=True, early_stopping=True, top_k=40, temperature=0.7, min_length=20, max_length=500)

# Read the prompt file once instead of reopening it on every iteration.
with open('alllines.txt') as f:
  lines = f.readlines()

new_text = []
for i in range(500):
  # Pick a random line and use it as the prefix for the generated text.
  line = random.choice(lines)
  line = line.strip('\n')
  line = line.strip('"')
  result = happy_gen.generate_text(line, args=gen_settings)
  print(result.text)
  new_text.append(result.text)

Streaming output truncated to the last 5000 lines.
CHAPTER 7
HARMONT
I'm a fool, but I'll tell. If I had told you the truth, why would I be so mad? You must have heard the stories about the cat and the fish. They're all true, he says, even if they're not true. Why, do you suppose I'd be a foolish old fool? But that's what I mean, isn't it? What do _you_ think's wrong with me? Why are you so absurd? Are you afraid of me, or of _your_ troubles? If you were afraid, there would be no need to fear me; you would just be frightened. Then, for the sake of your story I might not be afraid; but what do I know of any human being, who is a human, when he's afraid? They are not human; they are neither human nor mad. There is no truth in this; there is only the one truth. What's the difference? Is there anything more good in a story than the other? Only if it is told to me in such a manner that it makes me angry, would that be acceptable? The reader will believe it, too, that
 We are g
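
The loop above sets `no_repeat_ngram_size=2`, which asks the generator not to repeat any 2-gram. A rough way to spot-check generated samples is to count repeated word-level n-grams, as sketched below. This is only an approximation: the model applies the constraint to its own subword tokens, while this hypothetical `repeated_ngrams` helper splits on whitespace.

```python
from collections import Counter


def repeated_ngrams(text, n=2):
    """Return the word-level n-grams that occur more than once in text."""
    tokens = text.split()
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: count for gram, count in grams.items() if count > 1}


print(repeated_ngrams("the cat saw the cat run", n=2))
```

Running it over each entry of `new_text` would give a quick indication of how repetitive the samples are at the word level.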

In [None]:
# Label every generated sample as GPTNeo
import pandas as pd
df = pd.DataFrame(new_text, columns=['text'])
df['Label'] = 'GPTNeo'
df

|Unnamed: 0|text|Label|
| :- | :- | :- |
|0|but I ask you, how shall I ever know?\n\nHow ...|GPTNeo|
|1|to the scene by a team of experts. With their...|GPTNeo|
|2|never go to war,” she said.\n\n“I don’t think...|GPTNeo|
|3|a 'honest man'. I'm here to tell you that for...|GPTNeo|
|4|\n That they could not stand in their way, w...|GPTNeo|
|...|...|...|
|495|, we should do the same. In the first place, h...|GPTNeo|
|496|, I can't sleep.\n\n– I'm going to see the gir...|GPTNeo|
|497|\n\n"I can't tell you how much I want to drink...|GPTNeo|
|498|love for the moment was made to be. The searc...|GPTNeo|


In [None]:
# Save the labeled samples to a JSON file
df.to_json('./GPT_neo_jokes.json')

In [None]:
import pandas as pd
test = pd.read_json('/content/GPT_neo_jokes.json')
test

|Unnamed: 0|text|Label|
| :- | :- | :- |
|0|but I ask you, how shall I ever know?\n\nHow ...|GPTNeo|
|1|to the scene by a team of experts. With their...|GPTNeo|
|2|never go to war,” she said.\n\n“I don’t think...|GPTNeo|
|3|a 'honest man'. I'm here to tell you that for...|GPTNeo|
|4|\n That they could not stand in their way, w...|GPTNeo|
|...|...|...|
|495|, we should do the same. In the first place, h...|GPTNeo|
|496|, I can't sleep.\n\n– I'm going to see the gir...|GPTNeo|
|497|\n\n"I can't tell you how much I want to drink...|GPTNeo|
|498|love for the moment was made to be. The searc...|GPTNeo|


# Generate a Single Sample

In [7]:
gen_settings = GENSettings(no_repeat_ngram_size=2, do_sample=True, early_stopping=True, top_k=40, temperature=0.7, min_length=6, max_length=15)
prompt = "Well, God give thee the spirit of persuasion and him the ears of profiting"
result = happy_gen.generate_text(prompt, args=gen_settings)
print(result.text)

 by thy words, but he be not a man for the world." "
