# Text Generation Using GPT

There are two types of language modeling: causal and masked. This guide illustrates causal language modeling. Causal language models are frequently used for text generation. You can use them for creative applications such as a choose-your-own text adventure, or for an intelligent coding assistant like Copilot or CodeParrot.

Causal language modeling predicts the next token in a sequence of tokens, and the model can only attend to tokens on the left. This means the model cannot see future tokens. GPT-2 is an example of a causal language model.
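
The "attend only to the left" rule can be pictured as a lower-triangular mask. The snippet below is a minimal sketch in plain Python, not GPT-2's actual implementation; the helper name `causal_mask` is an assumption made for illustration.

```python
# Minimal sketch of a causal (left-to-right) attention mask.
# mask[i][j] == 1 means position i is allowed to attend to position j.

def causal_mask(seq_len):
    """Return a lower-triangular 0/1 matrix of size seq_len x seq_len."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

mask = causal_mask(4)
for row in mask:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

Each row is one position in the sequence: the first token sees only itself, while the last token sees the whole prefix but nothing after it.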

In [None]:
pip install transformers datasets evaluate


In [None]:
# Load the tokenizer that matches the GPT-2 XL checkpoint
from transformers import AutoTokenizer
model='gpt2-xl'
tokenizer = AutoTokenizer.from_pretrained(model)


In [None]:
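from transformers import TFAutoModelForCausalLM

# `model` held the checkpoint name ('gpt2-xl') until now; from here on it refers to the loaded TensorFlow model
model = TFAutoModelForCausalLM.from_pretrained(model)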


All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2-xl.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.



In [None]:
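from tensorflow.keras.optimizers import Adam
# Compiling is only needed for fine-tuning; the generation cells below work without it.
# Lower learning rates are often better for fine-tuning transformers.
model.compile(optimizer=Adam(3e-5))  # No loss argument: the model falls back to its built-in loss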

In [None]:
import tensorflow as tf
import pandas as pd
txt="Transformers are the"
tokenizer(txt,return_tensors='np')

{'input_ids': array([[41762,   364,   389,   262]]), 'attention_mask': array([[1, 1, 1, 1]])}

In [None]:
ids=tokenizer(txt,return_tensors='np')['input_ids']

In [None]:
ids

array([[41762,   364,   389,   262]])

In [None]:
ids[0]

array([41762,   364,   389,   262])

In [None]:
tokenizer.decode(ids[0])

'Transformers are the'

In [None]:
model(ids)

TFCausalLMOutputWithCrossAttentions(loss=None, logits=<tf.Tensor: shape=(1, 4, 50257), dtype=float32, numpy=
array([[[ 2.7387857 ,  4.821137  ,  1.7155694 , ..., -6.995636  ,
         -4.9382052 , -0.30557126],
        [ 4.6023574 ,  7.025563  ,  2.0684116 , ..., -8.223229  ,
         -4.8385983 ,  2.658939  ],
        [ 1.9228    ,  0.9972217 , -3.3385165 , ..., -3.4645438 ,
         -3.6464558 , -0.43070713],
        [ 0.3186251 ,  0.81004375, -2.997366  , ..., -3.2711644 ,
         -4.383729  ,  0.8114318 ]]], dtype=float32)>, past_key_values=(<tf.Tensor: shape=(2, 1, 25, 4, 64), dtype=float32, numpy=
array([[[[[ 1.78889379e-01,  7.50041246e-01, -4.72855777e-01, ...,
           -5.11268318e-01,  2.69862145e-01, -3.13556105e-01],
          [ 1.01758093e-02, -1.37737721e-01,  5.72274216e-02, ...,
            7.02178180e-01, -2.30362669e-01,  9.19960380e-01],
          [-2.18054697e-01, -6.52768373e-01,  3.11026633e-01, ...,
            2.05231890e-01, -4.28684860e-01,  4.89852726e-01]

In [None]:
model(ids)[0]
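# GPT-2's vocabulary has 50257 tokens, so the logits hold one score per vocabulary token
# for each of the 4 input positions: shape (batch=1, sequence=4, vocab=50257)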

<tf.Tensor: shape=(1, 4, 50257), dtype=float32, numpy=
array([[[ 2.7387857 ,  4.821137  ,  1.7155694 , ..., -6.995636  ,
         -4.9382052 , -0.30557126],
        [ 4.6023574 ,  7.025563  ,  2.0684116 , ..., -8.223229  ,
         -4.8385983 ,  2.658939  ],
        [ 1.9228    ,  0.9972217 , -3.3385165 , ..., -3.4645438 ,
         -3.6464558 , -0.43070713],
        [ 0.3186251 ,  0.81004375, -2.997366  , ..., -3.2711644 ,
         -4.383729  ,  0.8114318 ]]], dtype=float32)>

In [None]:
model(ids).logits[0,-1,:]

<tf.Tensor: shape=(50257,), dtype=float32, numpy=
array([ 0.3186251 ,  0.81004375, -2.997366  , ..., -3.2711644 ,
       -4.383729  ,  0.8114318 ], dtype=float32)>

In [None]:
import numpy as np
tf.nn.softmax(model(ids).logits[0,-1,:],axis=-1)

<tf.Tensor: shape=(50257,), dtype=float32, numpy=
array([1.5102715e-05, 2.4687388e-05, 5.4819901e-07, ..., 4.1689751e-07,
       1.3704035e-07, 2.4721690e-05], dtype=float32)>
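
To make the logits-to-probabilities step concrete, here is a small sketch of what a softmax computes, written in plain Python. The function and the three toy logits are illustrative assumptions; the cell above uses `tf.nn.softmax` on all 50257 logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                           # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]                  # made-up scores for a 3-token vocabulary
probs = softmax(toy_logits)
print(probs)
print(sum(probs))                             # probabilities sum to 1 (up to rounding)
```

Larger logits map to larger probabilities, and the ordering of the tokens is preserved, which is why sorting the probabilities and sorting the logits give the same ranking.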

In [None]:
tf.argsort(
    tf.nn.softmax(model(ids).logits[0,-1,:],axis=-1), axis=-1, direction='DESCENDING')[0]

<tf.Tensor: shape=(), dtype=int32, numpy=749>

In [None]:
tf.argsort(
    tf.nn.softmax(model(ids).logits[0,-1,:],axis=-1), axis=-1, direction='DESCENDING')[1]

<tf.Tensor: shape=(), dtype=int32, numpy=691>

In [None]:
tf.argsort(
    tf.nn.softmax(model(ids).logits[0,-1,:],axis=-1), axis=-1, direction='DESCENDING')[3]

<tf.Tensor: shape=(), dtype=int32, numpy=39185>

In [None]:
tf.nn.softmax(model(ids).logits[0,-1,:],axis=-1)[749]

<tf.Tensor: shape=(), dtype=float32, numpy=0.085346304>

In [None]:
tf.argsort(
    tf.nn.softmax(model(ids).logits[0,-1,:],axis=-1), axis=-1, direction='DESCENDING')[None,0,None]

<tf.Tensor: shape=(1, 1), dtype=int32, numpy=array([[749]], dtype=int32)>

In [None]:
# Step-by-step text generation using greedy search
n_steps=8
choice_per_step=5
res=[]
for i in range(n_steps):
  result=dict()
  result['input']=tokenizer.decode(ids[0])

  # Run the model once per step and reuse the logits of the last position
  next_logits=model(ids).logits[0,-1,:]
  next_prob=tf.nn.softmax(next_logits,axis=-1)
  sort_id=tf.argsort(next_prob, axis=-1, direction='DESCENDING')
  # Record the most likely candidates with their probabilities as percentages
  for j in range(choice_per_step):
    t_id=sort_id[j]
    t_choice=next_prob[t_id].numpy()*100
    result[f"choice {j}"]=(f"{tokenizer.decode(t_id)} ({t_choice:.2f}%)")
  # Greedy step: append the single most probable token to the input
  ids=tf.concat([ids,sort_id[None,0,None]],axis=-1)
  res.append(result)

In [None]:
pd.DataFrame(res)

,input,choice 0,choice 1,choice 2,choice 3,choice 4
0,Transformers are the,most (8.53%),only (4.96%),best (4.65%),Transformers (4.37%),ultimate (2.16%)
1,Transformers are the most,popular (16.78%),powerful (5.37%),common (4.96%),famous (3.72%),successful (3.20%)
2,Transformers are the most popular,toy (10.63%),toys (7.23%),Transformers (6.60%),of (5.46%),and (3.76%)
3,Transformers are the most popular toy,line (34.38%),in (18.20%),of (11.71%),brand (6.10%),line (2.69%)
4,Transformers are the most popular toy line,in (46.28%),of (15.09%),", (4.94%)",on (4.40%),ever (2.72%)
5,Transformers are the most popular toy line in,the (65.99%),history (12.42%),America (6.91%),Japan (2.44%),North (1.40%)
6,Transformers are the most popular toy line in the,world (69.26%),United (4.55%),history (4.29%),US (4.23%),U (2.30%)
7,Transformers are the most popular toy line in ...,", (39.73%)",. (30.64%),and (9.87%),with (2.32%),today (1.74%)
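
The same greedy procedure can be sketched without downloading a model. The example below is a toy under stated assumptions: `TOY_SCORES` is a hypothetical lookup table that scores the next word from the last word only, whereas GPT-2 conditions on the whole prefix.

```python
# Hypothetical toy "model": scores for the next token given only the last token.
TOY_SCORES = {
    "Transformers": {"are": 3.0, "is": 1.0},
    "are": {"the": 2.5, "a": 1.5},
    "the": {"most": 2.0, "best": 1.8},
    "most": {"popular": 2.2, "powerful": 1.1},
}

def greedy_decode(prompt, n_steps):
    """Append the highest-scoring next token at every step."""
    tokens = list(prompt)
    for _ in range(n_steps):
        scores = TOY_SCORES.get(tokens[-1])
        if scores is None:                    # no known continuation: stop early
            break
        tokens.append(max(scores, key=scores.get))
    return tokens

print(greedy_decode(["Transformers"], 4))
# ['Transformers', 'are', 'the', 'most', 'popular']
```

Greedy search never revisits a choice, so a locally best token can lead to a weaker overall sequence.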


In [None]:
# Achieving the same greedy search with the model's generate function
ids

<tf.Tensor: shape=(1, 12), dtype=int32, numpy=
array([[41762,   364,   389,   262,   749,  2968, 13373,  1627,   287,
          262,   995,    11]], dtype=int32)>

In [None]:
ids=tokenizer(txt,return_tensors='np')['input_ids']

In [None]:
ids

array([[41762,   364,   389,   262]])

In [None]:
output=model.generate(ids,max_new_tokens=n_steps,do_sample=False)
output

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


<tf.Tensor: shape=(1, 12), dtype=int32, numpy=
array([[41762,   364,   389,   262,   749,  2968, 13373,  1627,   287,
          262,   995,    11]], dtype=int32)>

In [None]:
tokenizer.decode(output[0])

'Transformers are the most popular toy line in the world,'

In [None]:
# Generate a longer continuation from a story prompt
max_length=128
text=""" In a shocking finding, scientist discovered a herd of unicorns living in a remote , previously unexplored valley,in the Andres Mountains. Even more surprising to the\
 researchers was that unicorn spoke perfect English. \n\n """


In [None]:
ids=tokenizer(text,return_tensors='np')['input_ids']

In [None]:
ids

array([[  554,   257, 14702,  4917,    11, 11444,  5071,   257, 27638,
          286, 28000, 19942,  2877,   287,   257,  6569,   837,  4271,
        31286,  1850, 19272,    11,   259,   262,   843,   411, 21124,
           13,  3412,   517,  6452,   284,   612,   325,   283,  3533,
          373,   326, 44986,  5158,  2818,  3594,    13,   220,   628,
          220]])

In [None]:
output=model.generate(ids,max_length=max_length,do_sample=False)
output



<tf.Tensor: shape=(1, 128), dtype=int32, numpy=
array([[  554,   257, 14702,  4917,    11, 11444,  5071,   257, 27638,
          286, 28000, 19942,  2877,   287,   257,  6569,   837,  4271,
        31286,  1850, 19272,    11,   259,   262,   843,   411, 21124,
           13,  3412,   517,  6452,   284,   612,   325,   283,  3533,
          373,   326, 44986,  5158,  2818,  3594,    13,   628,   220,
          198,   198,   464, 44986, 27638,   373,  5071,   416,   257,
         1074,   286,  5519,   422,   262,  2059,   286,  3442,    11,
         7802,    11,   508,   547, 14523,   257,  2050,   319,   262,
          843,   274, 21124,    13,   383,  1074,   373, 10342,   329,
          262,  4387, 27638,   286,  4295, 14260,   287,   262,   995,
           13,   383,  1074,   373,  2957,   416,  1583,    13,  3271,
          367,    13,  4176,    11,   257,  6240,   286, 17219,   379,
        14417,  7802,    13,   383,  1074,   373, 10342,   329,   262,
         4387, 27638,   286, 

In [None]:
tokenizer.decode(output[0])

' In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley,in the Andres Mountains. Even more surprising to theresearchers was that unicorn spoke perfect English.\n\n \n\nThe unicorn herd was discovered by a team of scientists from the University of California, Davis, who were conducting a study on the Andes Mountains. The team was searching for the largest herd of wild horses in the world. The team was led by Dr. David H. Smith, a professor of biology at UC Davis. The team was searching for the largest herd of wild horses in the world. The team'

In [None]:
# Beam search decoding: keep the 5 most likely partial sequences at each step
output=model.generate(ids,max_length=max_length,do_sample=False,num_beams=5)
print(output)




tf.Tensor(
[[  554   257 14702  4917    11 11444  5071   257 27638   286 28000 19942
   2877   287   257  6569   837  4271 31286  1850 19272    11   259   262
    843   411 21124    13  3412   517  6452   284   612   325   283  3533
    373   326 44986  5158  2818  3594    13   220   628   220   220   220
    220   220   220   220   220   220   220   220   220   220   220   220
    220   220   220   220   220   220   220   220   220   220   220   220
    220   220   220   220   220   220   220   220   220   220   220   220
    220   220   220   220   220   220   220   220   220   220   220   220
    220   220   220   220   220   220   220   220   220   220   220   220
    220   220   220   220   220   220   220   220   220   220   220   220
    220   220   220   220   220   220   220 50256]], shape=(1, 128), dtype=int32)


In [None]:
print(tokenizer.decode(output[0]))

 In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley,in the Andres Mountains. Even more surprising to theresearchers was that unicorn spoke perfect English. 

                                                                                  <|endoftext|>
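
As a rough illustration of what `num_beams` changes, the sketch below runs beam search on a toy problem. Everything in it is an assumption made for the example: `TOY_PROBS` is a hypothetical next-token table, and details that `generate` handles (length penalty, end-of-sequence handling, batching) are left out.

```python
import math

# Hypothetical toy model: next-token probabilities given only the last token.
TOY_PROBS = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
}

def beam_search(start, n_steps, num_beams):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [([start], 0.0)]                  # (tokens, cumulative log-probability)
    for _ in range(n_steps):
        candidates = []
        for tokens, score in beams:
            next_probs = TOY_PROBS.get(tokens[-1])
            if next_probs is None:            # finished sequence: carry it over unchanged
                candidates.append((tokens, score))
                continue
            for tok, p in next_probs.items():
                candidates.append((tokens + [tok], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams

best_tokens, best_score = beam_search("<s>", n_steps=2, num_beams=2)[0]
print(best_tokens, round(math.exp(best_score), 2))   # ['<s>', 'b', 'z'] 0.36
print(beam_search("<s>", n_steps=2, num_beams=1)[0][0])  # ['<s>', 'a', 'x']
```

With a single beam the search is greedy and commits to `a` (probability 0.6), ending with a sequence of probability 0.30. With two beams it keeps `b` alive and finds the sequence with probability 0.36.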


In [None]:
# Reduce repetition: no_repeat_ngram_size=2 forbids any 2-gram from occurring twice
output=model.generate(ids,max_length=max_length,do_sample=False,num_beams=5,no_repeat_ngram_size=2)
print(output)
print(tokenizer.decode(output[0]))



tf.Tensor(
[[  554   257 14702  4917    11 11444  5071   257 27638   286 28000 19942
   2877   287   257  6569   837  4271 31286  1850 19272    11   259   262
    843   411 21124    13  3412   517  6452   284   612   325   283  3533
    373   326 44986  5158  2818  3594    13   220   628   220   198   198
    464 44986 27638   373  5071   416   257  1074   286  5519   422   262
   2059   286  3442    11  8909  8742    11   290   262  2351  3250  4809
     13   383  5519   547 14523   257  2050   319   262  3048   286  4258
   1487   319   843 11025 15599    11   618   484  1625  1973   262 27638
     13  1119   547  6655   284  1064   326   262  4695   547  1498   284
  10996   351  1123   584    11   772   996   484   547 11266   416  4138
    286  4608    13   628   198  4821   284   262]], shape=(1, 128), dtype=int32)
 In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley,in the Andres Mountains. Even more surprising to there
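
The idea behind `no_repeat_ngram_size` can be sketched as follows: before choosing the next token, find every token that would complete an n-gram already present in the text and exclude it. This is a simplified illustration in plain Python with a hypothetical helper name, not the library's implementation.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return tokens that would complete an n-gram already present in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size - 1:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:              # same prefix seen before
            banned.add(ngram[-1])             # so its old continuation is forbidden
    return banned

history = ["the", "team", "was", "searching", "for", "the"]
print(banned_next_tokens(history, 2))         # {'team'}
```

Here the text already contains the 2-gram "the team", so after the second "the" the token "team" is ruled out and the decoder must pick something else.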

In [None]:
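# Temperature scaling (temperature=2.0). With do_sample=False the call stays greedy,
# so temperature has no effect here; set do_sample=True to sample from the rescaled distribution.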
output=model.generate(ids,max_length=max_length,do_sample=False,temperature=2.0,top_k=0)
print(output)
print(tokenizer.decode(output[0]))



tf.Tensor(
[[  554   257 14702  4917    11 11444  5071   257 27638   286 28000 19942
   2877   287   257  6569   837  4271 31286  1850 19272    11   259   262
    843   411 21124    13  3412   517  6452   284   262  4837   373   326
  44986  5158  2818  3594    13   220   628   220   198   198   464  4837
    547  6655   284  1064   326   262 28000 19942   547   407   691  1498
    284 10996   351  5384    11   475   635   351  1123   584    13   383
   4837   547  6655   284  1064   326   262 28000 19942   547   407   691
   1498   284 10996   351  5384    11   475   635   351  1123   584    13
    198   198   464  4837   547  1498   284 10996   351   262 28000 19942
    416  1262   257  2041  3335   326  3142   606   284 10996   351   262
   4695    13   383  4837   547  1498   284 10996]], shape=(1, 128), dtype=int32)
 In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley,in the Andres Mountains. Even more surprising to the r
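
Temperature divides the logits by a positive constant before the softmax. That never changes which token has the highest score, so it only matters when tokens are sampled rather than picked by argmax. The sketch below shows the effect on made-up logits; the function name and values are assumptions for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (temperature must be positive)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.0]                  # made-up scores for a 3-token vocabulary
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

A temperature below 1 sharpens the distribution toward the top token, while a temperature above 1 flattens it and makes unlikely tokens more probable.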

In [None]:
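# Lower temperature (0.5). The call is still greedy because do_sample=False,
# so the output is identical to the temperature=2.0 run.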
output=model.generate(ids,max_length=max_length,do_sample=False,temperature=0.5,top_k=0)
print(output)
print(tokenizer.decode(output[0]))



tf.Tensor(
[[  554   257 14702  4917    11 11444  5071   257 27638   286 28000 19942
   2877   287   257  6569   837  4271 31286  1850 19272    11   259   262
    843   411 21124    13  3412   517  6452   284   262  4837   373   326
  44986  5158  2818  3594    13   220   628   220   198   198   464  4837
    547  6655   284  1064   326   262 28000 19942   547   407   691  1498
    284 10996   351  5384    11   475   635   351  1123   584    13   383
   4837   547  6655   284  1064   326   262 28000 19942   547   407   691
   1498   284 10996   351  5384    11   475   635   351  1123   584    13
    198   198   464  4837   547  1498   284 10996   351   262 28000 19942
    416  1262   257  2041  3335   326  3142   606   284 10996   351   262
   4695    13   383  4837   547  1498   284 10996]], shape=(1, 128), dtype=int32)
 In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley,in the Andres Mountains. Even more surprising to the r

In [None]:
# Top-k sampling (top_k=50). top_k only takes effect with do_sample=True, so this run is still greedy.
output=model.generate(ids,max_length=max_length,do_sample=False,top_k=50)
print(output)
print(tokenizer.decode(output[0]))



tf.Tensor(
[[  554   257 14702  4917    11 11444  5071   257 27638   286 28000 19942
   2877   287   257  6569   837  4271 31286  1850 19272    11   259   262
    843   411 21124    13  3412   517  6452   284   262  4837   373   326
  44986  5158  2818  3594    13   220   628   220   198   198   464  4837
    547  6655   284  1064   326   262 28000 19942   547   407   691  1498
    284 10996   351  5384    11   475   635   351  1123   584    13   383
   4837   547  6655   284  1064   326   262 28000 19942   547   407   691
   1498   284 10996   351  5384    11   475   635   351  1123   584    13
    198   198   464  4837   547  1498   284 10996   351   262 28000 19942
    416  1262   257  2041  3335   326  3142   606   284 10996   351   262
   4695    13   383  4837   547  1498   284 10996]], shape=(1, 128), dtype=int32)
 In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley,in the Andres Mountains. Even more surprising to the r
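
Top-k sampling restricts the random draw to the k highest-scoring tokens. The following is a minimal sketch on made-up logits using only the standard library; `top_k_sample` is a hypothetical helper, and the real `generate` call works on the model's full vocabulary.

```python
import math
import random

def top_k_sample(logits, k, rng):
    """Sample a token index from the k highest-scoring logits only."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    # Unnormalized softmax weights over the kept tokens; choices() normalizes them.
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

toy_logits = [3.0, 2.5, 0.1, -1.0, -4.0]      # made-up scores for a 5-token vocabulary
rng = random.Random(0)                        # fixed seed so the run is repeatable
samples = [top_k_sample(toy_logits, k=2, rng=rng) for _ in range(10)]
print(samples)                                # only indices 0 and 1 can appear
```

Setting `k=1` reduces this to greedy search, and setting `k` to the vocabulary size gives plain sampling from the full distribution.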

In [None]:
# Top-p (nucleus) sampling (top_p=0.90). top_p only takes effect with do_sample=True, so this run is still greedy.
output=model.generate(ids,max_length=max_length,do_sample=False,top_p=0.90)
print(output)
print(tokenizer.decode(output[0]))



tf.Tensor(
[[  554   257 14702  4917    11 11444  5071   257 27638   286 28000 19942
   2877   287   257  6569   837  4271 31286  1850 19272    11   259   262
    843   411 21124    13  3412   517  6452   284   262  4837   373   326
  44986  5158  2818  3594    13   220   628   220   198   198   464  4837
    547  6655   284  1064   326   262 28000 19942   547   407   691  1498
    284 10996   351  5384    11   475   635   351  1123   584    13   383
   4837   547  6655   284  1064   326   262 28000 19942   547   407   691
   1498   284 10996   351  5384    11   475   635   351  1123   584    13
    198   198   464  4837   547  1498   284 10996   351   262 28000 19942
    416  1262   257  2041  3335   326  3142   606   284 10996   351   262
   4695    13   383  4837   547  1498   284 10996]], shape=(1, 128), dtype=int32)
 In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley,in the Andres Mountains. Even more surprising to the r
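
Top-p (nucleus) sampling keeps the smallest set of tokens whose cumulative probability reaches `top_p` and renormalizes over that set before sampling. The sketch below shows only the filtering step on made-up logits; `top_p_filter` is a hypothetical helper, not the library's code.

```python
import math

def top_p_filter(logits, top_p):
    """Return {token_index: renormalized probability} for the nucleus."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:               # smallest set whose mass reaches top_p
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

toy_logits = [3.0, 2.5, 0.1, -1.0, -4.0]      # made-up scores for a 5-token vocabulary
nucleus = top_p_filter(toy_logits, top_p=0.90)
print(sorted(nucleus))                        # [0, 1]
```

Unlike top-k, the number of kept tokens adapts to the distribution: a confident prediction keeps very few tokens, while a flat one keeps many.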