# Text generation using Transformers and Hugging Face 

## Credits: https://github.com/raghavbali/text_generation/blob/master/notebooks/text_generation_03.ipynb

### Install Transformers

In [1]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.2.0-py3-none-any.whl (1.8 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.43.tar.gz (883 kB)
Collecting tokenizers==0.9.4
  Downloading tokenizers-0.9.4-cp37-cp37m-macosx_10_11_x86_64.whl (2.0 MB)
Collecting regex!=2019.12.17
  Downloading regex-2020.11.13-cp37-cp37m-macosx_10_9_x86_64.whl (284 kB)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.43-py3-none-any.whl size=893258 sha256=f84064dfd23377380bb15d56152bda7f37f1729bd7a12810828ed3b932f9b388
  Stored in directory: /Users/sumedha/Library/Caches/pip/wheels/69/09/d1/

### Import Libraries

In [2]:
import tensorflow as tf
import transformers
from numpy import random
from transformers import (TFGPT2LMHeadModel,
                          GPT2Tokenizer,
                          GPT2Config)

In [3]:
print("tf version={}".format(tf.__version__))
print("huggingface/transformer version={}".format(transformers.__version__))

tf version=2.2.0
huggingface/transformer version=4.2.0


### Model Setup

In [4]:
model_name = "gpt2-medium"
config = GPT2Config.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = TFGPT2LMHeadModel.from_pretrained(model_name, config=config)

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2-medium.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


### Generate Text

In [5]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode('Watson you are', return_tensors='tf')
input_ids

<tf.Tensor: shape=(1, 4), dtype=int32, numpy=array([[   54, 13506,   345,   389]], dtype=int32)>

#### Greedy Decoding

In [9]:
# generate text until the output length (which includes the context length) reaches 30
greedy_output = model.generate(input_ids, max_length=30)

print("Output:\n" + 110 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


Output:
--------------------------------------------------------------------------------------------------------------
Watson you are a great guy and I hope you are doing well. I am sorry to hear that you are not able to attend the upcoming event
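
To make the idea behind greedy decoding concrete, here is a minimal pure-Python sketch. It is not the code that `model.generate` runs; `greedy_decode` and `toy_logits` are hypothetical names introduced only for illustration, and the toy "model" stands in for GPT-2. At each step the loop appends the single highest-scoring token and stops once the total length (context included) reaches `max_length` or an end-of-sequence token appears.

```python
from typing import Callable, List, Sequence


def greedy_decode(next_token_logits: Callable[[Sequence[int]], Sequence[float]],
                  input_ids: Sequence[int],
                  max_length: int,
                  eos_token_id: int) -> List[int]:
    """Repeatedly append the highest-scoring token (illustrative sketch only)."""
    ids = list(input_ids)
    # max_length counts the context tokens as well as the generated ones
    while len(ids) < max_length:
        logits = next_token_logits(ids)
        # greedy step: take the argmax over the vocabulary
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
        if next_id == eos_token_id:
            break
    return ids


def toy_logits(ids: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a language model over a 5-token vocabulary.

    It always scores (last token + 1) % 5 highest.
    """
    target = (ids[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_decode(toy_logits, [0], max_length=4, eos_token_id=4))
# [0, 1, 2, 3]
```

Because the argmax is deterministic, the same prompt always yields the same continuation, which is one reason greedy output tends to be bland and repetitive.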


#### Sampled Decoding
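
The cell below passes `top_k=50` and `top_p=0.95` together. As a rough sketch of what that combination means for a single decoding step, the following pure-Python example first keeps the `top_k` highest-scoring tokens, then keeps the smallest prefix of them whose cumulative probability reaches `top_p`, and finally samples from what is left. The function names are hypothetical and the boundary handling is an assumption, so treat it as an illustration of the idea and not as the library's implementation.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(logits: Sequence[float],
                       top_k: int,
                       top_p: float) -> Dict[int, float]:
    """Return {token_id: renormalised probability} for the surviving tokens."""
    # top-k: keep only the k highest-scoring token ids
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    probs = softmax([logits[i] for i in ranked])

    # top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p
    kept = []
    cumulative = 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # renormalise so the kept probabilities sum to 1
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample_next_token(logits: Sequence[float],
                      top_k: int = 50,
                      top_p: float = 0.95,
                      rng: Optional[random.Random] = None) -> int:
    """Draw one token id from the filtered distribution."""
    rng = rng or random.Random()
    dist = top_k_top_p_filter(logits, top_k, top_p)
    token_ids = list(dist)
    weights = [dist[t] for t in token_ids]
    return rng.choices(token_ids, weights=weights, k=1)[0]


# Toy logits over a 4-token vocabulary (made-up numbers, not GPT-2 output)
example_logits = [2.0, 1.0, 0.0, -1.0]
print(sorted(top_k_top_p_filter(example_logits, top_k=3, top_p=0.9)))
# [0, 1]
```

With these toy logits, top-k drops token 3 and top-p then drops token 2, so only tokens 0 and 1 can be sampled. Lower values of either parameter make the output more conservative; higher values admit less likely tokens.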

In [12]:
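# fix the random seed so the sampled outputs are reproducible
tf.random.set_seed(0)

# sample 5 sequences, combining top-k (top_k=50) and nucleus/top-p (top_p=0.95) filtering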
sample_outputs = model.generate(
    input_ids,
    do_sample=True, 
    max_length=100, 
    top_k=50, 
    top_p=0.95, 
    num_return_sequences=5
)

#print("Output:\n" + 110 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
    print("-"*110)

Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


0: Watson you are in an accident?

Pam: Oh yes, and I would tell you, but I'm the only one around who can see me. So you know, it's not an accident, I am at the place where I am, so you know I'm still in an accident." Watson: "Are you sure that's what you're saying?"

Pam: "Yes sir."

Pam Watson: "I don't think you're saying all
--------------------------------------------------------------------------------------------------------------
1: Watson you are the greatest football fan ever, what can you say in your defence?

"I could probably give you one reason why I love being a fan of rugby union so much.

"I'm not going to pretend to agree with most of their players, and some players are a bit weird when it comes to the whole thing but they still love their sport so I'm sure this is just a silly way of saying they love football to some degree but I'm going to
--------------------------------------------------------------------------------------------------------------
2: Watson you are