<a href="https://colab.research.google.com/github/hochthom/OG-AI4Artists-2022/blob/main/Text_Generation/GPT_Neo_Text_Generation_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Generation with GPT

GPT stands for Generative Pre-trained Transformer.

Transformers are a family of neural network models that are very well suited to text generation. To use them, we first install the Hugging Face `transformers` Python package.

In [1]:
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-433963e8-385e-1f2f-eb98-1b2712bffefb)


In [2]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.19.0-py3-none-any.whl (4.2 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.6.0-py3-none-any.whl (84 kB)
Installing collected packages: pyyaml, tokenizers, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Uninstalling PyYAML-3.13:
      Successfully uninstalled PyYAML-3.13
Successfully installed huggingface-hub-0.

Next, we initialise a transformer model for text generation. This means downloading a pretrained model, in our case GPT-Neo.

In [3]:
import tensorflow as tf

## Loading the GPT-Neo model

In [13]:
from transformers import GPTNeoForCausalLM, GPT2TokenizerFast

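model_path = 'EleutherAI/gpt-neo-1.3B'
#model_path = 'EleutherAI/gpt-neo-125M'  # smaller checkpoint, quicker to download and run

tokenizer = GPT2TokenizerFast.from_pretrained(model_path)
# reuse the end-of-sequence token as the padding token for open-ended generation
model = GPTNeoForCausalLM.from_pretrained(model_path, pad_token_id=tokenizer.eos_token_id)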

## Simple text generation

First we provide a context (the prompt); the model then tries to imagine how to extend the text from there.

In [14]:
context = 'The flower is blooming. The'

In [15]:
input_ids = tokenizer(context, return_tensors="pt").input_ids

In [16]:
gen_tokens = model.generate(input_ids, max_length=25)
output = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)

In [17]:
output

['The flower is blooming. The sun is shining. The birds are singing. The world is a beautiful place.\n\n']
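
Called without sampling options, `generate` most likely falls back to greedy decoding here: at every step it appends the single most probable next token until `max_length` is reached. The sketch below illustrates that idea on a hand-written lookup table. `TOY_MODEL`, its probabilities, and `greedy_generate` are made up for illustration, and the sketch works on whole words rather than on GPT-Neo's tokens.

```python
# Toy stand-in for a language model: maps the last word to candidate next
# words with invented probabilities (illustrative values, not from GPT-Neo).
TOY_MODEL = {
    "The": {"sun": 0.6, "birds": 0.3, "world": 0.1},
    "sun": {"is": 0.9, "was": 0.1},
    "is": {"shining.": 0.7, "blooming.": 0.3},
    "shining.": {"The": 1.0},
}


def greedy_generate(context, max_length):
    """Extend `context` word by word, always taking the most likely next word.

    As with `max_length` in `generate`, the limit counts the context too.
    """
    words = context.split()
    while len(words) < max_length:
        candidates = TOY_MODEL.get(words[-1])
        if not candidates:  # no known continuation: stop early
            break
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)


print(greedy_generate("The flower is blooming. The", max_length=9))
# prints: The flower is blooming. The sun is shining. The
```

Because every choice is deterministic, rerunning the call always gives the same continuation.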

## Text generation with sampling
Sampling adds variability to the generated text. Instead of always taking the single most likely continuation, the model draws the next token at random from a list of likely extensions of our context.
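
To make the roles of `temperature`, `top_k`, and `top_p` more concrete, here is a simplified sketch of how such filters can shape a next-token distribution before one token is drawn. The function `filter_and_sample` and the toy scores are hypothetical, and the code only approximates what the library does internally.

```python
import math
import random


def filter_and_sample(scores, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Draw one token from `scores` (token -> raw score) after filtering.

    Returns the sampled token and the list of tokens that survived the filters.
    """
    # Temperature rescales the raw scores: values below 1 sharpen the
    # distribution, values above 1 flatten it.
    scaled = {tok: s / temperature for tok, s in scores.items()}
    # Softmax weights (shifted by the maximum for numerical stability).
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    # top_k: keep only the k highest-scoring tokens (0 disables it in this sketch).
    if top_k > 0:
        ranked = ranked[:top_k]
    # Renormalise so the remaining weights sum to 1.
    total = sum(w for _, w in ranked)
    ranked = [(tok, w / total) for tok, w in ranked]
    # top_p: keep the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    tokens = [tok for tok, _ in kept]
    probs = [prob for _, prob in kept]
    return rng.choices(tokens, weights=probs, k=1)[0], tokens


toy_scores = {"sun": 2.0, "rain": 1.0, "year": 0.5, "time": 0.0}
token, candidates = filter_and_sample(toy_scores, top_k=3, top_p=0.8,
                                      rng=random.Random(3))
print(candidates)  # tokens that survived both filters
print(token)       # one of them, drawn at random
```

With `top_k=200` and `top_p=1.0`, as in the call in this section, the nucleus filter is effectively switched off and only the 200 most likely tokens remain as candidates at each step.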

In [28]:
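# set the seed to reproduce results; feel free to change it to get different samples
# (generation runs on PyTorch here, so seeding TensorFlow alone would not affect it)
from transformers import set_seed
set_seed(3)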

In [9]:
gen_tokens = model.generate(input_ids, 
                            do_sample=True, 
                            max_length=25,
                            temperature=1.0,
                            top_k=200,
                            top_p=1.0,
                            num_return_sequences=5)
output = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)

In [7]:
def pretty_print(output):
  print("Output:")
  for i, sample in enumerate(output):
    print(100 * '-')
    print("{}: {}".format(i, sample))

In [10]:
pretty_print(output)

Output:
----------------------------------------------------------------------------------------------------
0: The flower is blooming. The time is past when one can sleep peacefully, while you take care of your friends, own
----------------------------------------------------------------------------------------------------
1: The flower is blooming. The fields are green, and the sun keeps shining. A little snow just melts on the roof
----------------------------------------------------------------------------------------------------
2: The flower is blooming. The rains are falling on me now. The maddening, dreariness of the
----------------------------------------------------------------------------------------------------
3: The flower is blooming. The spring is in full flow. It's time to ripen your fruit into the light,
----------------------------------------------------------------------------------------------------
4: The flower is blooming. The year is 1917 and on a May afterno

Something more complex:

In [21]:
import numpy as np
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [22]:
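def gen_text(context, model, tokenizer, temperature=0.9, top_k=50, top_p=0.95, max_length=100, num_return_sequences=1):
  """Sample `num_return_sequences` continuations of `context` and return them as strings."""
  input_ids = tokenizer(context, return_tensors="pt").input_ids
  # max_length counts the prompt tokens as well as the newly generated ones
  gen_tokens = model.generate(input_ids, 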
                              do_sample=True, 
                              max_length=max_length,
                              temperature=temperature,
                              top_k=top_k,
                              top_p=top_p,
                              num_return_sequences=num_return_sequences)
  return tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)

In [15]:
context = "AI art has greatly developed since 2018. But can the machine be truly creative?"

In [16]:
output = gen_text(context, model, tokenizer, max_length=80, num_return_sequences=3)

pretty_print(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
0: AI art has greatly developed since 2018. But can the machine be truly creative?

We’re a little late on this one, but I’m going to make a confession. I’m not the most creative person in the world. In fact, I’m not an artist at all. I mean, I’m a computer programmer. And not to
----------------------------------------------------------------------------------------------------
1: AI art has greatly developed since 2018. But can the machine be truly creative? We find out!

The question of whether humans have the ability to create art is one that has been answered a few times before. But the latest study on the subject gives a fresh outlook on the topic.

The study, published in Psychological Science in 2017, found that creative people tend to show creative tendencies
----------------------------------------------------------------------------------------------------
2: AI art h

In [23]:
sent_text = nltk.sent_tokenize(output[0]) # this gives us a list of sentences
len(sent_text)

4

In [18]:
pretty_print(sent_text)

Output:
----------------------------------------------------------------------------------------------------
0: AI art has greatly developed since 2018.
----------------------------------------------------------------------------------------------------
1: But can the machine be truly creative?
----------------------------------------------------------------------------------------------------
2: We’re a little late on this one, but I’m going to make a confession.
----------------------------------------------------------------------------------------------------
3: I’m not the most creative person in the world.
----------------------------------------------------------------------------------------------------
4: In fact, I’m not an artist at all.
----------------------------------------------------------------------------------------------------
5: I mean, I’m a computer programmer.
----------------------------------------------------------------------------------------------------
6

Now we build a new context for the next iteration.

In [24]:
# shuffle all sentences except the last one, which may be cut off by max_length
idx = np.random.permutation(len(sent_text) - 1)
context = ' '.join([sent_text[i] for i in idx])

In [None]:
context

'AI art has greatly developed since 2018. But can the machine be truly creative? The concept of Machine Learning (ML) has gained popularity in recent years. A vast range of new ML techniques are being introduced in the market at an ever-increasing rate. So, what does all of this mean for the creators of art? How does the new technology apply to their artistic works?'
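
The context-building step can be tried without the model or NumPy. The sketch below uses only the standard library; `reshuffle_context` and its parameters are hypothetical names, and dropping the last sentence rests on the assumption that generation often stops mid-sentence when `max_length` is reached.

```python
import random


def reshuffle_context(sentences, n_keep=None, seed=None):
    """Drop the last sentence, shuffle the rest, and join up to `n_keep` of them."""
    rng = random.Random(seed)
    complete = sentences[:-1]  # assumption: the last sentence may be truncated
    shuffled = rng.sample(complete, k=len(complete))  # shuffled copy
    if n_keep is not None:
        shuffled = shuffled[:n_keep]
    return " ".join(shuffled)


sentences = [
    "AI art has greatly developed since 2018.",
    "But can the machine be truly creative?",
    "I'm not the most creative person in the world.",
    "And not to",  # cut-off fragment, never reused
]
print(reshuffle_context(sentences, seed=1))
```

Passing `n_keep=2` keeps only two of the shuffled sentences, which is what the loop further down does with `idx[:2]`.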

Iterative loop

In [30]:
context = 'The flower is blooming. The'

In [31]:
for i in range(5):
  output = gen_text(context, model, tokenizer, max_length=70, num_return_sequences=1)
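  # split the generated text into a list of sentences
  sent_text = nltk.sent_tokenize(output[0])
  # build the next context from two randomly chosen sentences, skipping the last (possibly incomplete) one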
  idx = np.random.permutation(range(len(sent_text)-1))
  context = ' '.join([sent_text[i] for i in idx[:2]])
  print('It: %i' % (i+1))
  print(context)



It: 1
All the trees are blooming. The leaves are budding.
It: 2
The sun is shining. I feel so alive.
It: 3
I am so grateful. The sun is shining.
It: 4
I am so grateful. The trees are green.
It: 5
The house is ours. I'm on my phone.
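
One edge case in such a feedback loop: if a generation contains fewer than two sentences, nothing is left after the last one is dropped and the next context would be empty. The sketch below shows one possible guard. It runs without the model: `fake_generate` is a hypothetical stand-in for `gen_text`, and `split_sentences` is a naive regex splitter, not NLTK's Punkt tokenizer.

```python
import random
import re


def fake_generate(context):
    """Hypothetical stand-in for gen_text(): appends canned text to the context."""
    return context + " The sun is shining. The birds are singing. I feel so alive. The wor"


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def iterate_context(context, generate, n_iter=5, n_keep=2, seed=0):
    """Feed a shuffled selection of generated sentences back in as the next context."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_iter):
        sentences = split_sentences(generate(context))[:-1]  # drop the last sentence
        if sentences:  # guard: keep the old context if nothing complete came back
            rng.shuffle(sentences)
            context = " ".join(sentences[:n_keep])
        history.append(context)
    return history


for step, ctx in enumerate(iterate_context("The flower is blooming.", fake_generate), 1):
    print("It: %i" % step)
    print(ctx)
```

Keeping the previous context is only one option; restarting from the original prompt would be another.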
