# DEEP LEARNING: TRANSFORMERS

In [1]:
# Import packages

import tensorflow as tf
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

In [2]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Use the EOS token as the padding token to avoid warnings
model = TFGPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [3]:
# The prompt text

text = "I am Amelie Poulain.  I was born in June 1974.   I lived alone with my father when I was a child.  Now I live in Montmartre. I work in a small café whose name is Les Deux Moulins. I am single, and I used to feel very lonely. I like dipping my hand into grain sacks and throwing stones on the Saint-Martin canal. One day, I dropped a plastic perfume-stopper, which dislodged a wall tile. I discovered an old metal box of childhood memorabilia. This box was hidden by a boy who lived in my apartment decades earlier.  I decide to track down the boy and return the box to him. If you know this boy, you need to come to see me in Montmartre."

In [4]:
input_ids = tokenizer.encode(text, return_tensors='tf')

Step 1: beam search

In [52]:
# Generate text with beam search (num_beams parameter)

beam_output = model.generate(
    input_ids,  
    max_length=300, 
    num_beams=5, 
    early_stopping=True
)

print("Output:\n" + 200 * '-')
# Slice off the prompt so that only the generated continuation is printed
print(tokenizer.decode(beam_output[0], skip_special_tokens=True)[len(text):])

Output:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 If you don't know this boy, you need to come to see me in Montmartre. If you don't know this boy, you need to come to see me in Montmartre. If you don't know this boy, you need to come to see me in Montmartre. If you don't know this boy, you need to come to see me in Montmartre. If you don't know this boy, you need to come to see me in Montmartre. If you don't know this boy, you need to come to see me in Montmartre. If you don't know this boy, you need to come to see
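
The looping sentence above is a common failure mode of beam search on open-ended text. As a reminder of what `num_beams` controls, here is a minimal sketch of beam search over a toy next-token table. `TOY_MODEL` and `beam_search` are hypothetical names invented for this illustration; the real `generate` method works on model logits and handles details such as early stopping that are left out here.

```python
import math

# Hypothetical toy "language model": next-token probabilities given the last token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.3, "dog": 0.3, "end": 0.4},
    "a": {"cat": 0.9, "end": 0.1},
    "cat": {"end": 1.0},
    "dog": {"end": 1.0},
    "end": {"end": 1.0},
}


def beam_search(model, start, num_beams, steps):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            # Extend each surviving beam with every possible next token.
            for token, prob in model[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Rank all extensions and keep only the best num_beams of them.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


for width in (1, 2):
    tokens, score = beam_search(TOY_MODEL, "<s>", num_beams=width, steps=2)[0]
    print(width, tokens, round(math.exp(score), 2))
```

With a single beam the search is greedy: it commits to "the" and ends with a sequence of probability 0.24. With two beams it keeps "a" alive for one more step and finds "a cat", whose probability is 0.36.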


Step 2: beam search without repeated n-grams

In [53]:
# Stop repeating phrases: no 2-gram may appear twice in the output

beam_output = model.generate(
    input_ids, 
    max_length=300, 
    num_beams=5, 
    no_repeat_ngram_size=2, 
    early_stopping=True
)

print("Output:\n" + 200 * '-')
# Slice off the prompt so that only the generated continuation is printed
print(tokenizer.decode(beam_output[0], skip_special_tokens=True)[len(text):])

Output:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 It is a beautiful place, but I don't know if I will be able to find him again.

I have been living here for a long time. My father, who lives in Paris, has been here since he was 14 years old. He is the only person I know who has lived here with me for more than 20 years. When I first moved here, my parents were very poor, so I had no money to pay my rent. The only way I could afford to live here was to buy a house in the suburbs of the city. In the summer, when the weather was good, we would go to the market and buy clothes for the
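
To make the effect of `no_repeat_ngram_size=2` concrete, here is a minimal pure-Python sketch of the rule: before the next token is chosen, every token that would complete an n-gram already present in the sequence is banned. The helper name `banned_next_tokens` is hypothetical, and this is an illustration of the idea rather than the `transformers` implementation.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would repeat an existing n-gram if appended."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (n - 1) tokens are the prefix of the n-gram about to be completed.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    # Scan every n-gram already in the sequence; if it starts with the same
    # prefix, its final token must not be generated again.
    for start in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[start:start + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


history = ["you", "need", "to", "come", "to"]
print(banned_next_tokens(history, 2))  # {'come'}
```

Because "to come" already occurs in the history, "come" cannot follow the final "to". The rule is blunt: with a size of 2, a proper noun such as "Les Deux" could only ever appear once in the whole output.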


Step 3: sampling with temperature

In [56]:
tf.random.set_seed(0)

# Use a temperature below 1 to reduce sensitivity to low-probability candidates
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=300, 
    top_k=0, 
    temperature=0.7
)

print("Output:\n" + 200 * '-')
# Slice off the prompt so that only the generated continuation is printed
print(tokenizer.decode(sample_output[0], skip_special_tokens=True)[len(text):])

Output:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  I am at the great university of Montmartre.  I am here to see about the memory of my father as well as the memory of our mother.  I am not a person of the state.  I am a person of the country.  I am a person of the Peace family. I am a person of the Church.  I am a person of the Church of Lebanon.  I am a national of the government of Lebanon.  I am a national of the government of Lebanon.  I am a national of the government of the United States.  I am a national of the government of the United States.  I am a national
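
The role of `temperature` can be sketched with a few lines of standard-library Python: the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and push probability mass toward the most likely tokens. The function name and the three logits below are made up for the illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(scaled)
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At `temperature=0.7` the top candidate receives a larger share than at 1.0 and the weakest candidate a smaller one, which is why the sampled text becomes less erratic but also more prone to loops like the one above.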


Step 4: top-p (nucleus) sampling

In [5]:
tf.random.set_seed(0)

# Disable top_k sampling (top_k=0) and sample only from the smallest set of
# most probable tokens whose cumulative probability reaches 92% (top_p=0.92)
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=300, 
    top_p=0.92, 
    top_k=0
)

print("Output:\n" + 200 * '-')
# Slice off the prompt so that only the generated continuation is printed
print(tokenizer.decode(sample_output[0], skip_special_tokens=True)[len(text):])

Output:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 I am already well-off.  I enjoy eating at the inn. On the other hand, one of my son has been working in the bank since June 2, 1989.   I have a few friends.  They are very good friends. The bank owner told me that I needed money that had arrived from North America. I received it from an old friend that also worked for that organization. I looked at the remnants of that work. The note from his address read: � A few efects in myself� and the laughter of this poor Tottnest arrived from North America.  I call him R. � "R� for Saint
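
As a rough illustration of what `top_p=0.92` does, the sketch below keeps the most probable tokens until their cumulative probability reaches the threshold, drops the rest, and renormalizes. The function `top_p_filter` and the four-token distribution are hypothetical; the library applies the same idea to the full vocabulary, and its exact tie-breaking and boundary handling may differ.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 again.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


probs = {"Paris": 0.5, "Lyon": 0.3, "Nice": 0.15, "Lille": 0.05}
print(top_p_filter(probs, 0.92))
```

Here "Paris", "Lyon" and "Nice" together cover 95% of the mass, so "Lille" is removed. Unlike a fixed `top_k`, the number of surviving tokens adapts to how peaked the distribution is.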


Step 5: top-k and top-p sampling with several returned sequences

In [68]:
tf.random.set_seed(1)

# Set top_k=20, top_p=0.95, and num_return_sequences=3
sample_outputs = model.generate(
    input_ids,
    do_sample=True, 
    max_length=285, 
    top_k=20, 
    top_p=0.95, 
    num_return_sequences=3
)

print("Output:\n" + 200 * '-')
# Print only the continuation of each returned sequence (the prompt is sliced off)
for sample_output in sample_outputs:
    print(tokenizer.decode(sample_output, skip_special_tokens=True)[len(text):])

Output:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   In order to see him, you have to make sure that he doesn't see me.

I don't want to live alone in my apartment in Montmartre, my home town, but I will try to live with my family in my apartment.   The reason I want to live in this little home town, which is a suburb from Lyon, is so that I might be able to get out of the town for a few years.  I will stay in the apartment and keep things together.   I like living alone in this little apartment, where I can
  I am living in the village of Montmartre in the northern French countryside. I have two children: a daughter, a son, a niece and a niece-in-law. My sister-in-law is a nurse.  I work in the local restaurant with the owner of Café Montmartre. In November 2015, my cousin-in-law died after going into hospital.  He had just left
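
The `top_k` filter used in this step can be sketched in the same spirit: only the k most probable tokens are kept, their probabilities are renormalized, and one token is drawn at random. `top_k_filter`, `sample_token` and the toy distribution are hypothetical names for this illustration, with `top_k <= 0` treated as "filter disabled" to mirror the `top_k=0` setting of the earlier cells.

```python
import random


def top_k_filter(probs, top_k):
    """Keep the top_k most probable tokens and renormalize; top_k <= 0 disables the filter."""
    if top_k <= 0:
        return dict(probs)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


def sample_token(probs, rng):
    """Draw one token according to its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


probs = {"Paris": 0.5, "Lyon": 0.3, "Nice": 0.15, "Lille": 0.05}
filtered = top_k_filter(probs, 2)
rng = random.Random(1)  # seeded generator, playing the role of tf.random.set_seed
print(filtered)
print([sample_token(filtered, rng) for _ in range(5)])
```

Drawing several tokens from the same filtered distribution is also what makes `num_return_sequences=3` useful: each returned sequence follows a different random path through the same restricted candidates.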

## GPT-2 with PyTorch

In [17]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Import textwrap library to display context
import textwrap
wrapper = textwrap.TextWrapper(width=80) 

MODEL_NAME = 'gpt2-large'

## Tokenizer


tokenizer = GPT2Tokenizer.from_pretrained(MODEL_NAME)

text = "I am Amelie Poulain.  I was born in June 1974.   I lived alone with my father when I was a child.  Now I live in Montmartre. I work in a small café whose name is Les Deux Moulins. I am single, and I used to feel very lonely. I like dipping my hand into grain sacks and throwing stones on the Saint-Martin canal. One day, I dropped a plastic perfume-stopper, which dislodged a wall tile. I discovered an old metal box of childhood memorabilia. This box was hidden by a boy who lived in my apartment decades earlier.  I decide to track down the boy and return the box to him. If you know this boy, you need to come to see me in Montmartre."

sample_text = text
tokenizer.encode(sample_text, return_tensors='pt')

## Model

model = GPT2LMHeadModel.from_pretrained(MODEL_NAME)

## Text generation
# Uncomment for reproducible sampling (requires `import torch`):
# torch.manual_seed(0)
def generate_text(initial_text, model, tokenizer, display=False):
    # Generate text
    encoded_input = tokenizer.encode(initial_text, return_tensors='pt')
    outputs = model.generate(
        encoded_input,
        do_sample=True,
        max_length=300,
        top_k=20,
        top_p=0.95,
        temperature=1,
        num_return_sequences=1)
    
    # Decode each returned sequence of token ids back into text
    generated_text = []
    for sequence in outputs:
        generated_text.append(tokenizer.decode(sequence, skip_special_tokens=True))

    generated_text = ''.join(generated_text)

    # Display
    if display:
        print('='*21)
        print('='*6, 'INITIAL', '='*6)
        print(initial_text)

        print('='*21)
        print('='*8, 'TEXT', '='*7)
        # Slice off the prompt so that only the continuation is shown
        print(wrapper.fill(generated_text)[len(initial_text):])
    else:
        return generated_text


generate_text(text, model, tokenizer, display=True)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


I am Amelie Poulain.  I was born in June 1974.   I lived alone with my father when I was a child.  Now I live in Montmartre. I work in a small café whose name is Les Deux Moulins. I am single, and I used to feel very lonely. I like dipping my hand into grain sacks and throwing stones on the Saint-Martin canal. One day, I dropped a plastic perfume-stopper, which dislodged a wall tile. I discovered an old metal box of childhood memorabilia. This box was hidden by a boy who lived in my apartment decades earlier.  I decide to track down the boy and return the box to him. If you know this boy, you need to come to see me in Montmartre.
  I will send you a photo of me from my old age so your curiosity
won't go out of control.  You see, it's not only the boy's face. If you are able
to locate him, I promise you an unforgettable experience.  The boy's name is
Louis-Philippe de Castiglioni.  You can call me "L.P.", since everyone knows my
family name. I will introduce myself as "Poulain" to you, 