<a href="https://colab.research.google.com/github/hochthom/OG-AI4Artists-2023/blob/main/Text-Generation/ai4artists_Text_Generation_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Generation with GPT

GPT = Generative Pre-trained Transformer

Transformers are a type of neural network that is very well suited to text generation! After checking which GPU the Colab runtime has assigned us, we first install the Hugging Face `transformers` Python package.

In [None]:
!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-40021917-817c-ea14-17fb-4ce3a374280a)


In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Next we initialise a transformer model for text generation. This means downloading a pretrained model, in our case the deduplicated Pythia-410m (`EleutherAI/pythia-410m-deduped`); the larger 1b and 1.4b variants are commented out in the code below. For other variants see here: https://github.com/EleutherAI/pythia

In [None]:
import tensorflow as tf

In [None]:
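from transformers import GPTNeoXForCausalLM, AutoTokenizer

# Pythia checkpoints of different sizes: uncomment a larger one for better text
# (it needs more memory and a longer download)
#MODEL = "EleutherAI/pythia-1.4b-deduped"
#MODEL = "EleutherAI/pythia-1b-deduped"
MODEL = "EleutherAI/pythia-410m-deduped"

# download (or load from cache) the pretrained weights and the matching tokenizer
model = GPTNeoXForCausalLM.from_pretrained(MODEL)
tokenizer = AutoTokenizer.from_pretrained(MODEL)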


# Text Generation with Sampling
Sampling makes the generated text more varied. Instead of always taking the single most likely next token (greedy decoding), the model draws the next token at random from its list of likely continuations of our context, weighted by their probabilities.
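The difference between the two strategies can be seen on a single decoding step. This is a minimal pure-Python sketch, assuming a made-up next-token distribution for the context "Hello, I am" (the tokens and numbers are hypothetical, not real Pythia output); `greedy_pick` and `sample_pick` are illustrative helper names.

```python
import random

# Hypothetical next-token distribution for the context "Hello, I am"
# (made-up numbers for illustration, not real model output)
next_token_probs = {"a": 0.40, "not": 0.25, "new": 0.20, "very": 0.10, "gonna": 0.05}

def greedy_pick(probs):
    """Greedy decoding: always return the single most likely token."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Sampling: draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)
print("greedy :", [greedy_pick(next_token_probs) for _ in range(5)])   # ['a', 'a', 'a', 'a', 'a']
print("sampled:", [sample_pick(next_token_probs, rng) for _ in range(5)])
```

Greedy decoding repeats the same choice every time, while sampling mostly picks likely tokens but occasionally picks a less likely one, which is where the variability comes from.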

In [None]:
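# set seed to reproduce results. Feel free to change the seed though to get different results.
# The Pythia model runs on PyTorch, so PyTorch's random generator must be seeded;
# transformers.set_seed seeds Python's random, NumPy and PyTorch in one call.
from transformers import set_seed
set_seed(635)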
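Seeding works because sampling only looks random: the draws come from a pseudo-random number generator, and the same seed replays the same sequence of draws. A minimal illustration, using Python's `random` module as a stand-in (the Pythia model itself samples with PyTorch's generator, which `torch.manual_seed` or `transformers.set_seed` control); the vocabulary, weights and the helper `sample_sequence` are hypothetical.

```python
import random

def sample_sequence(seed, vocab, weights, length=5):
    """Draw `length` tokens from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.choices(vocab, weights=weights, k=1)[0] for _ in range(length)]

vocab = ["a", "not", "new", "very", "gonna"]   # toy vocabulary (made up)
weights = [0.40, 0.25, 0.20, 0.10, 0.05]

run1 = sample_sequence(635, vocab, weights)
run2 = sample_sequence(635, vocab, weights)
print(run1 == run2)  # True: same seed, identical draws
```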

In [None]:
prompt = "Hello, I am"

In [None]:
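inputs = tokenizer(prompt, return_tensors="pt")  # text -> PyTorch tensor of token IDs
tokens = model.generate(**inputs)                # defaults: greedy decoding, short max_length
tokenizer.decode(tokens[0])                      # token IDs -> text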

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


'Hello, I am a newbie to the world of programming. I am trying to create a program'
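Called with its defaults, `generate` decodes greedily: it repeatedly appends the most likely next token and feeds the longer sequence back into the model, until an end-of-sequence token or the length limit is reached. A toy sketch of that loop, assuming a hypothetical bigram table as a stand-in for the model (a real model like Pythia conditions on the whole context, not just the last token); `BIGRAMS` and `greedy_generate` are illustrative names.

```python
# Hypothetical stand-in for a language model: made-up probabilities of the
# next token given only the last token.
BIGRAMS = {
    "Hello,": {"I": 0.9, "world": 0.1},
    "I":      {"am": 0.8, "was": 0.2},
    "am":     {"a": 0.6, "not": 0.4},
    "a":      {"newbie": 0.5, "student": 0.3, "<eos>": 0.2},
    "newbie": {"<eos>": 1.0},
}

def greedy_generate(prompt_tokens, max_length=20, eos="<eos>"):
    """Append the most likely next token until EOS or max_length tokens in total."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        probs = BIGRAMS.get(tokens[-1], {eos: 1.0})  # unknown token: stop
        next_token = max(probs, key=probs.get)       # greedy choice
        if next_token == eos:
            break
        tokens.append(next_token)
    return tokens

print(" ".join(greedy_generate(["Hello,", "I", "am"])))  # Hello, I am a newbie
```

Because every step is deterministic, running greedy decoding twice on the same prompt always gives the same text, which is why the sampling options below are needed for variety.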

In [None]:
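tokens = model.generate(**inputs,
                        do_sample=True,          # sample instead of greedy decoding
                        max_length=25,           # maximum total length in tokens, prompt included
                        temperature=1.0,         # <1 sharpens, >1 flattens the distribution
                        top_k=200,               # only the 200 most likely tokens are candidates
                        top_p=1.0,               # nucleus sampling threshold; 1.0 keeps all tokens
                        num_return_sequences=5)  # generate 5 different continuations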

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
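To see what `temperature`, `top_k` and `top_p` do, here is a minimal pure-Python sketch that applies them to a made-up set of next-token scores. It follows the idea behind these parameters (scale the scores by the temperature, keep the top-k tokens, then keep the smallest set of top tokens whose probability reaches `top_p`), but it is a simplified illustration, not the Hugging Face implementation; the function name `filter_distribution` and the scores are hypothetical.

```python
import math

def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token: probability} after temperature scaling, top-k and
    top-p (nucleus) filtering, renormalised to sum to 1."""
    def normalise(pairs):
        z = sum(p for _, p in pairs)
        return [(t, p / z) for t, p in pairs]

    # 1) temperature: divide scores before the softmax (<1 sharper, >1 flatter)
    scaled = {t: s / temperature for t, s in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    probs = normalise([(t, math.exp(s - m)) for t, s in scaled.items()])
    probs.sort(key=lambda tp: tp[1], reverse=True)  # most likely first

    # 2) top-k: keep only the k most likely tokens (0 means no limit)
    if top_k > 0:
        probs = normalise(probs[:top_k])

    # 3) top-p: keep the smallest set of top tokens whose total prob >= top_p
    kept, cumulative = [], 0.0
    for t, p in probs:
        kept.append((t, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return dict(normalise(kept))

# made-up scores for five candidate next tokens
logits = {"a": 2.0, "not": 1.5, "new": 1.0, "very": 0.0, "gonna": -1.0}
print(sorted(filter_distribution(logits, top_k=2)))   # ['a', 'not']
print(sorted(filter_distribution(logits, top_p=0.5)))  # ['a', 'not']
```

With the settings used in the cell above (`temperature=1.0`, `top_p=1.0`), only `top_k=200` actually restricts the candidates; lowering `temperature` or `top_p` would make the five outputs more conservative.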


In [None]:
output = [tokenizer.decode(itm) for itm in tokens]
output

["Hello, I am Jiri Gratzov, your new customer.\n\nI've just requested confirmation of some invoice transactions",
 "Hello, I am new to coding and I've started a java project and I've got java code whichINDOOR:",
 'Hello, I am having an event this Wednesday (Feb. 9) at 7 PM in Mountain House, Wyoming. That way',
 'Hello, I am very new to vj and do not know what to do.\n\n> What does the ODD',
 'Hello, I am Jeff Barnes, I am also a pastor, and when you came by I explained to the children that God']

In [None]:
def pretty_print(output):
  print("Output:")
  for i, sample in enumerate(output):
    print(100 * '-')
    print("{}: {}".format(i, sample))

In [None]:
pretty_print(output)

Output:
----------------------------------------------------------------------------------------------------
0: Hello, I am not sure what happened

I am sorry I had to use the the old names sometimes, and I
----------------------------------------------------------------------------------------------------
1: Hello, I am asking if you have your permission for my reply or don't
<savioli> well, as
----------------------------------------------------------------------------------------------------
2: Hello, I am trying to do it simple with a single click in the toolbar.  I have something similar to the one
----------------------------------------------------------------------------------------------------
3: Hello, I am gonna kill you! what is going on here? Won't you tell me anything other then get out of
----------------------------------------------------------------------------------------------------
4: Hello, I am going to fix, I have changed my background text, but only the text wh