<a href="https://colab.research.google.com/github/hochthom/OG-AI4Artists-2022/blob/main/Text_Generation/GPT_Neo_Text_Generation_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Generation with GPT-Neo

First we have to install the `transformers` Python package. Transformers are a family of neural network models that are very well suited for text generation!

In [None]:
!pip install transformers


Next we initialise a transformer model for text generation. This means we have to download a pretrained model, which in our case is GPT-Neo (the 1.3B checkpoint).

In [None]:
import tensorflow as tf
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

In [None]:
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")


## Simple text generation

First we have to provide a context; the model then tries to imagine how to extend the text from there.

In [None]:
context = 'The flower is blooming. The'

In [None]:
input_ids = tokenizer(context, return_tensors="pt").input_ids
gen_tokens = model.generate(input_ids, 
                            max_length=25)
output = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
def pretty_print(output):
  print("Output:")
  for i, sample in enumerate(output):
    print(100 * '-')
    print("{}: {}".format(i, sample))

In [None]:
pretty_print(output)

Output:
----------------------------------------------------------------------------------------------------
0: The flower is blooming. The sun is shining. The birds are singing. The world is a beautiful place.
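
When `do_sample` is not set, `generate` uses greedy decoding by default: at every step it appends the single most likely next token. The idea can be sketched on a toy next-word table. The table and the `greedy_decode` helper are hypothetical illustrations, not GPT-Neo internals.

```python
# Toy next-word probabilities (made up for illustration, not taken from GPT-Neo).
NEXT_WORD_PROBS = {
    "The": {"sun": 0.5, "birds": 0.3, "world": 0.2},
    "sun": {"is": 0.9, "was": 0.1},
    "is": {"shining.": 0.6, "setting.": 0.4},
}

def greedy_decode(start, max_words):
    """Repeatedly append the most likely next word until none is known."""
    words = [start]
    while len(words) < max_words and words[-1] in NEXT_WORD_PROBS:
        candidates = NEXT_WORD_PROBS[words[-1]]
        # Greedy choice: always take the highest-probability continuation.
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)

print(greedy_decode("The", max_words=10))
```

Because the choice is deterministic, repeated runs give the same text, which is why the greedy cell produces only one continuation.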




## Text Generation with Sampling
Sampling adds variability to the generated text. Instead of always taking the most likely continuation, the model draws each next token at random from a list of likely extensions of our context.

In [None]:
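# set seed to reproduce results. Feel free to change the seed though to get different results
# the model runs on PyTorch (return_tensors="pt"), so tf.random.set_seed would not affect sampling;
# transformers.set_seed seeds Python's random, NumPy and PyTorch
from transformers import set_seed
set_seed(3)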

In [None]:
gen_tokens = model.generate(input_ids, 
                            do_sample=True, 
                            max_length=25,
                            temperature=1.0,
                            top_k=200,
                            top_p=1.0,
                            num_return_sequences=5)
output = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
pretty_print(output)

Output:
----------------------------------------------------------------------------------------------------
0: The flower is blooming. The fire is crackling. The
gods spoke to Adam about the future. The serpent
----------------------------------------------------------------------------------------------------
1: The flower is blooming. The sun
towers in the sky they rise early, they call in the silence, they
----------------------------------------------------------------------------------------------------
2: The flower is blooming. The night is as deep as ever I see it, and the sky makes a deep night:
----------------------------------------------------------------------------------------------------
3: The flower is blooming. The bird is singing. A strange sense of peace engulfs the tiny creature lying on the soft
----------------------------------------------------------------------------------------------------
4: The flower is blooming. The smell of jasmine and narcissus sweete
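
To make the roles of `temperature`, `top_k` and `top_p` more concrete, here is a rough sketch of how such filters are commonly applied to next-token scores before sampling. The toy scores and the helper `filtered_probs` are assumptions for illustration; the actual implementation inside `transformers` differs in detail.

```python
import math
import random

def filtered_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token: prob} after temperature scaling, top-k and top-p filtering."""
    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Softmax over the scaled scores (subtract the max for numerical stability).
    biggest = max(scaled.values())
    exps = {tok: math.exp(s - biggest) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)
    # Top-k: keep only the k most likely tokens (0 means no limit).
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalise so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

# Hypothetical scores for the word following "The flower is blooming. The".
toy_logits = {"sun": 2.0, "bird": 1.5, "night": 1.0, "fire": 0.2, "serpent": -1.0}
probs = filtered_probs(toy_logits, temperature=1.0, top_k=3, top_p=0.95)
rng = random.Random(3)
choice = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs)
print("sampled:", choice)
```

With `top_k=200` and `top_p=1.0`, as in the cell above, very little is filtered out, so unusual continuations remain possible. Smaller values restrict sampling to the most likely tokens.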

Something more complex:

In [None]:
import numpy as np
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
def gen_text(context, model, tokenizer, temperature=0.9, top_k=50, top_p=0.95, max_length=100, num_return_sequences=1):
  """Sample continuations of `context` and return them as a list of strings."""
  # encode the context as PyTorch tensors
  input_ids = tokenizer(context, return_tensors="pt").input_ids
  # note: max_length counts the context tokens as well as the newly generated ones
  gen_tokens = model.generate(input_ids, 
                              do_sample=True, 
                              max_length=max_length,
                              temperature=temperature,
                              top_k=top_k,
                              top_p=top_p,
                              num_return_sequences=num_return_sequences)
  return tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)

In [None]:
context = "AI art has greatly developed since 2018. But can the machine be truly creative?"

In [None]:
output = gen_text(context, model, tokenizer, max_length=80, num_return_sequences=3)

pretty_print(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
0: AI art has greatly developed since 2018. But can the machine be truly creative?

We have come to expect it for many artists. Artists are taught to express themselves through different mediums. They might be taught to paint, or sculpt, or make sculptures, or create works of art. But they are not taught how to create.

I have written about machine learning and its ability to
----------------------------------------------------------------------------------------------------
1: AI art has greatly developed since 2018. But can the machine be truly creative? This post gives some insight into the human creative process and a few questions that artists and designers are often asked.

Creativity is the most powerful human trait. It’s what drives us to create something that hasn’t been done before. It’s what keeps us working at all times,
----------------------------------------------

In [None]:
sent_text = nltk.sent_tokenize(output[0]) # this gives us a list of sentences
len(sent_text)

12

In [None]:
pretty_print(sent_text)

Output:
----------------------------------------------------------------------------------------------------
0: AI art has greatly developed since 2018.
----------------------------------------------------------------------------------------------------
1: But can the machine be truly creative?
----------------------------------------------------------------------------------------------------
2: The concept of Machine Learning (ML) has gained popularity in recent years.
----------------------------------------------------------------------------------------------------
3: A vast range of new ML techniques are being introduced in the market at an ever-increasing rate.
----------------------------------------------------------------------------------------------------
4: So, what does all of this mean for the creators of art?
----------------------------------------------------------------------------------------------------
5: How does the new technology apply to their artistic works?


Now we build a new context for the next iteration by shuffling the generated sentences.

In [None]:
# shuffle all sentences except the last one, which may be cut off by max_length
idx = np.random.permutation(range(len(sent_text)-1))
context = ' '.join([sent_text[i] for i in idx])

In [None]:
context

'AI art has greatly developed since 2018. But can the machine be truly creative? The concept of Machine Learning (ML) has gained popularity in recent years. A vast range of new ML techniques are being introduced in the market at an ever-increasing rate. So, what does all of this mean for the creators of art? How does the new technology apply to their artistic works?'
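
The context-building step can also be tried without the model. The sketch below uses a naive regular-expression splitter as a stand-in for `nltk.sent_tokenize` and Python's `random` module instead of NumPy. `split_sentences` and `shuffled_context` are hypothetical helpers, and dropping the last sentence mirrors the `len(sent_text)-1` above, presumably because generation often stops mid-sentence.

```python
import random
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def shuffled_context(text, rng, keep=None):
    """Drop the (possibly truncated) last sentence, shuffle the rest, join them."""
    sentences = split_sentences(text)[:-1]
    rng.shuffle(sentences)
    # keep=None keeps every shuffled sentence; keep=2 keeps only the first two.
    return " ".join(sentences[:keep])

text = ("AI art has greatly developed since 2018. But can the machine be truly "
        "creative? So, what does all of this mean for the creators of art? "
        "I have written about machine learning and its ability to")
print(shuffled_context(text, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the shuffle reproducible without touching global random state.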

Iterative loop

In [None]:
context = 'The flower is blooming. The'

In [None]:
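for i in range(10):
  output = gen_text(context, model, tokenizer, max_length=50, num_return_sequences=1)
  # split the generated text into a list of sentences
  sent_text = nltk.sent_tokenize(output[0])
  # shuffle all sentences except the last one, which may be cut off by max_length
  idx = np.random.permutation(range(len(sent_text)-1))
  # use two randomly chosen sentences as the context for the next round
  context = ' '.join([sent_text[i] for i in idx[:2]])
  print(context)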



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

There is a time when I felt quite a deep sadness and loss of purpose. The garden is thriving.
I am happy to continue to tend the flowers that fill my house with their glorious scent. There is a time when I felt quite a deep sadness and loss of purpose.
It was during one of my more trying times in my life. There is a time when I felt quite a deep sadness and loss of purpose.
There is a time when I felt quite a deep sadness and loss of purpose. It was during one of my more trying times in my life.
It was during one of my more trying times in my life. There is a time when I felt quite a deep sadness and loss of purpose.
It was during one of my more trying times in my life. There is a time when I felt quite a deep sadness and loss of purpose.
It was during one of my more trying times in my life. There is a time when I felt quite a deep sadness and loss of purpose.
There is a time when I felt quite a deep sadness and loss of purpose. It was during one of my more trying times in my life.
There is a time when I felt quite a deep sadness and loss of purpose. It was during one of my more trying times in my life.
It was during one of my more trying times in my life. It was during the time I was learning to cook.
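
The printed contexts show the loop settling into the same two sentences after a few rounds, since each new context is drawn only from the previous output. The feedback structure can be sketched with a stand-in generator. `fake_generate` is a hypothetical placeholder for `gen_text`, and the fallback for outputs with fewer than two complete sentences is an added assumption, not part of the original loop.

```python
import random
import re

def fake_generate(context, rng):
    """Stand-in for gen_text: echo the context and append two canned sentences."""
    extras = ["The garden is thriving.", "The sun is shining.",
              "The birds are singing.", "The night is deep."]
    # End with an unfinished fragment, as a length-limited model often does.
    return context + " " + " ".join(rng.sample(extras, 2)) + " And then the"

def next_context(generated, rng, keep=2):
    """Pick `keep` complete sentences at random from the generated text."""
    sentences = re.split(r"(?<=[.!?])\s+", generated.strip())[:-1]
    if len(sentences) < keep:
        # Too little complete text: reuse the whole output as the next context.
        return generated
    return " ".join(rng.sample(sentences, keep))

rng = random.Random(0)
context = "The flower is blooming."
history = []
for _ in range(5):
    context = next_context(fake_generate(context, rng), rng)
    history.append(context)
    print(context)
```

Guarding against very short outputs avoids handing an empty context to the generator in the next round.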
