In [8]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the pretrained model and its matching tokenizer
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)



In [30]:
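import torch

sequence1 = "Testing how a transformer model works with hugging face."
seq_batch = [sequence1, sequence1]
# Manual path: tokenize() and convert_tokens_to_ids() do not add the special
# tokens ([CLS]/[SEP]) that calling tokenizer(...) inserts, so treat these
# logits as a rough check only.
tokens = [tokenizer.tokenize(sequence) for sequence in seq_batch]
ids = [tokenizer.convert_tokens_to_ids(token_list) for token_list in tokens]
input_ids = torch.tensor(ids)
# Both rows have the same length, so every position is attended to.
model_out = model(input_ids, attention_mask=torch.ones_like(input_ids))
print("Tokens:", tokens)
print("IDs:", input_ids)
print("Output:", model_out.logits)
print(model_out.logits.shape)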


Tokens: [['testing', 'how', 'a', 'transform', '##er', 'model', 'works', 'with', 'hugging', 'face', '.'], ['testing', 'how', 'a', 'transform', '##er', 'model', 'works', 'with', 'hugging', 'face', '.']]
IDs: tensor([[ 5604,  2129,  1037, 10938,  2121,  2944,  2573,  2007, 17662,  2227,
          1012],
        [ 5604,  2129,  1037, 10938,  2121,  2944,  2573,  2007, 17662,  2227,
          1012]])
Output: tensor([[ 1.7090, -1.3880],
        [ 1.7090, -1.3880]], grad_fn=<AddmmBackward0>)
torch.Size([2, 2])
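
The manual `tokenize()` / `convert_tokens_to_ids()` path above yields IDs without the `[CLS]` and `[SEP]` markers that calling `tokenizer(...)` directly adds. A minimal sketch of the difference, assuming the BERT-uncased special-token IDs 101 (`[CLS]`) and 102 (`[SEP]`); `add_special_tokens` is a hypothetical helper written for illustration, not a library function:

```python
# IDs printed by the cell above (no special tokens).
manual_ids = [5604, 2129, 1037, 10938, 2121, 2944, 2573, 2007, 17662, 2227, 1012]

# Assumed special-token IDs for the BERT-uncased vocabulary.
CLS_ID, SEP_ID = 101, 102

def add_special_tokens(ids, cls_id=CLS_ID, sep_id=SEP_ID):
    """Hypothetical helper: wrap a single sequence as [CLS] ... [SEP]."""
    return [cls_id] + list(ids) + [sep_id]

wrapped = add_special_tokens(manual_ids)
print(wrapped)
print(len(manual_ids), "->", len(wrapped))
```

In practice `tokenizer(sequence1, return_tensors="pt")` handles this in one call, and `tokenizer.build_inputs_with_special_tokens(ids)` should do the same for an existing ID list.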


In [35]:
sequences = ['Testing how transformers work.', 'This sentence comes after the first one.']
model_inputs = tokenizer(sequences, padding='longest', truncation=True, return_tensors="pt")
model_out = model(**model_inputs)
print('model_inputs:', model_inputs)
print('logits:', model_out.logits)
scores = torch.nn.functional.softmax(model_out.logits, dim=1)
print('classification:', scores)

model_inputs: {'input_ids': tensor([[  101,  5604,  2129, 19081,  2147,  1012,   102,     0,     0,     0],
        [  101,  2023,  6251,  3310,  2044,  1996,  2034,  2028,  1012,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
logits: tensor([[-0.1780,  0.3090],
        [ 1.9908, -1.6774]], grad_fn=<AddmmBackward0>)
classification: tensor([[0.3806, 0.6194],
        [0.9751, 0.0249]], grad_fn=<SoftmaxBackward0>)
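
To make the padded batch and the `classification` scores above concrete, here is a plain-Python sketch that rebuilds the `input_ids`, the `attention_mask`, and the softmax probabilities from the printed values. It assumes pad ID 0 (as seen in the output) and that index 0 is NEGATIVE and index 1 is POSITIVE for this checkpoint, which is worth confirming with `model.config.id2label`. `pad_longest` and `softmax` are illustrative helpers, not library calls.

```python
import math

def pad_longest(batch, pad_id=0):
    """Right-pad every sequence to the longest one and build the attention mask."""
    width = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in batch]
    return input_ids, attention_mask

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Unpadded token IDs taken from the output above.
batch = [
    [101, 5604, 2129, 19081, 2147, 1012, 102],
    [101, 2023, 6251, 3310, 2044, 1996, 2034, 2028, 1012, 102],
]
input_ids, attention_mask = pad_longest(batch)
print(input_ids)
print(attention_mask)

# Logits taken from the output above.
logits = [[-0.1780, 0.3090], [1.9908, -1.6774]]
labels = ["NEGATIVE", "POSITIVE"]  # assumed label order, see lead-in
for row in logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(labels[best], [round(p, 4) for p in probs])
```

Under the assumed label order, the first sentence leans POSITIVE at roughly 0.62 and the second is NEGATIVE at roughly 0.98.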


GPT-2 Sample:

In [36]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

Simple pipeline version:

In [43]:
from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='gpt2')
set_seed(42)
generator("The first thing to learn about dodgeball is", max_length=50, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'The first thing to learn about dodgeball is how simple it is. A block is either a vertical or a horizontal block with a horizontal, vertical or horizontal defender. Your opponent should be well past this mark, with the ability to dodge this pass,'},
 {'generated_text': 'The first thing to learn about dodgeball is to look at the angles and how long the ball must travel before you can hit the ground. It is important to know which angles to use.\n\nIn this page we teach you how to set it'},
 {'generated_text': "The first thing to learn about dodgeball is that you'll have to do some reading before you can spot the perfect spot to use your feet. In order to find a perfect spot, you usually want to take breaks from the pool and the practice game"},
 {'generated_text': "The first thing to learn about dodgeball is you can do a simple block of time, which, frankly, doesn't look anything like the skill you learn in school. Take the long approach to dodgeball. This is somethi

More complete version. Experimenting with different text generation strategies:

In [57]:
sequence = "Post Malone has postponed the call on the phone with Al Capone about his metronome."

tokenizer.pad_token_id = tokenizer.eos_token_id
input_ids = tokenizer.encode(sequence, return_tensors="pt")
attention_mask = torch.ones_like(input_ids)
print('Input:', input_ids)

# Label -> generate() keyword arguments for each decoding strategy.
strategies = {
    'Greedy': dict(do_sample=False, num_beams=1),
    'Beam Search': dict(do_sample=False, num_beams=5),
    'Sampling': dict(do_sample=True, num_beams=1),
    'Beam+Sampling': dict(do_sample=True, num_beams=5),
    'ContrastiveSearch': dict(penalty_alpha=0.6, top_k=4),
    # num_beam_groups == num_beams and no diversity_penalty is set here.
    'DiverseBeam': dict(num_beam_groups=5, num_beams=5),
    'SamplingHiTemp': dict(do_sample=True, temperature=1.5),
}
for name, kwargs in strategies.items():
    generated = model.generate(input_ids, max_length=50, attention_mask=attention_mask, **kwargs)
    print(name + ':', tokenizer.decode(generated[0], skip_special_tokens=True))

Input: tensor([[ 6307, 40735,   468, 33922,   262,   869,   319,   262,  3072,   351,
           978,  4476,   505,   546,   465,  1138,  1313,   462,    13]])


Greedy: Post Malone has postponed the call on the phone with Al Capone about his metronome.

"I'm not going to talk to him about it," Malone said. "I'm not going to talk to him about it. I'm


Beam Search: Post Malone has postponed the call on the phone with Al Capone about his metronome.

"I don't want to talk about it," Malone said. "I don't want to talk about it. I don't want to talk


Sampling: Post Malone has postponed the call on the phone with Al Capone about his metronome. Al's spokesman said the actor declined to talk to Malone about the situation.

ESPN Radio has been conducting an investigation and interviewed Malone about the story,


Beam+Sampling: Post Malone has postponed the call on the phone with Al Capone about his metronome.

"I don't know what he's going to do," Malone said. "I don't know what he's going to do."




ContrastiveSearch: Post Malone has postponed the call on the phone with Al Capone about his metronome.

"I'm going to have to get back to New York," Malone said. "I'm going to have to get back to New York.


DiverseBeam: Post Malone has postponed the call on the phone with Al Capone about his metronome.

"I'm not going to talk to him about it," Malone said. "I'm not going to talk to him about it. I'm
SamplingHiTemp: Post Malone has postponed the call on the phone with Al Capone about his metronome. After all, who was the first one ever born on the outside of the dome? That would certainly mean that the Al-Wasl, the "Old


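Sampling and SamplingHiTemp seem to give the best results. Every other variant repeats a quoted phrase almost verbatim, and DiverseBeam matches Greedy exactly, likely because num_beam_groups equals num_beams and no diversity_penalty is set, so each group reduces to a single greedy beam.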
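
One rough way to put a number on "repetition" is the share of word trigrams that repeat an earlier trigram. The sketch below applies that to two of the gpt2 continuations printed above (prompt removed). `repeated_ngram_fraction` is a hypothetical helper and whitespace splitting is a simplifying assumption, not the tokenizer the model uses.

```python
def repeated_ngram_fraction(text, n=3):
    """Fraction of whitespace-token n-grams that repeat an earlier n-gram."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

# Continuations copied from the gpt2 output above.
greedy = ('"I\'m not going to talk to him about it," Malone said. '
          '"I\'m not going to talk to him about it. I\'m')
sampling = ("Al's spokesman said the actor declined to talk to Malone about the situation. "
            "ESPN Radio has been conducting an investigation and interviewed Malone about the story,")

for name, text in [("Greedy", greedy), ("Sampling", sampling)]:
    print(f"{name}: {repeated_ngram_fraction(text):.3f}")
```

A higher value means more of the text is a repeat of something already generated, so Greedy should score well above Sampling here.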

Repeating the experiment with the larger gpt2-large checkpoint to see whether the trend holds:

In [63]:
checkpoint = "gpt2-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, cache_dir='E:\\cache')
model = AutoModelForCausalLM.from_pretrained(checkpoint, cache_dir='E:\\cache')
tokenizer.pad_token_id = tokenizer.eos_token_id
input_ids = tokenizer.encode(sequence, return_tensors="pt")
attention_mask = torch.ones_like(input_ids)
print('Input:', input_ids)

# Label -> generate() keyword arguments for each decoding strategy.
strategies = {
    'Greedy': dict(do_sample=False, num_beams=1),
    'Beam Search': dict(do_sample=False, num_beams=5),
    'Sampling': dict(do_sample=True, num_beams=1),
    'Beam+Sampling': dict(do_sample=True, num_beams=5),
    'ContrastiveSearch': dict(penalty_alpha=0.6, top_k=4),
    # num_beam_groups == num_beams and no diversity_penalty is set here.
    'DiverseBeam': dict(num_beam_groups=5, num_beams=5),
    'SamplingHiTemp': dict(do_sample=True, temperature=1.5),
}
for name, kwargs in strategies.items():
    generated = model.generate(input_ids, max_length=50, attention_mask=attention_mask, **kwargs)
    print(name + ':', tokenizer.decode(generated[0], skip_special_tokens=True))



Input: tensor([[ 6307, 40735,   468, 33922,   262,   869,   319,   262,  3072,   351,
           978,  4476,   505,   546,   465,  1138,  1313,   462,    13]])


Greedy: Post Malone has postponed the call on the phone with Al Capone about his metronome.

"I'm not going to call him," Malone said. "I'm not going to call him. I'm not going to call him.


Beam Search: Post Malone has postponed the call on the phone with Al Capone about his metronome.


Sampling: Post Malone has postponed the call on the phone with Al Capone about his metronome. The legendary music mogul has never heard from the New York City rap star since a few days after a New York Newsday article on the New York Post's


Beam+Sampling: Post Malone has postponed the call on the phone with Al Capone about his metronome.

"I don't want to get into it," Malone said. "I don't want to get into it. I don't want to get


ContrastiveSearch: Post Malone has postponed the call on the phone with Al Capone about his metronome.

Capone is in the middle of a prison sentence for racketeering in New York, according to TMZ. The rap legend's rep says he's


DiverseBeam: Post Malone has postponed the call on the phone with Al Capone about his metronome.

"I'm not going to call him," Malone said. "I'm not going to call him. I'm not going to call him.
SamplingHiTemp: Post Malone has postponed the call on the phone with Al Capone about his metronome. A few hours later, I've called to book a taxi for someone who can deliver our gear. Capone knows. Or at least, he did.


Sampling provides the best result in this case. Many of the issues from the smaller GPT-2 model persist in the larger one: Greedy, Beam+Sampling, and DiverseBeam still loop on a single phrase, and the Beam Search output shows no continuation after the prompt. This time ContrastiveSearch adds useful context, while SamplingHiTemp drifts into an odd, off-topic continuation.
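
One way to read the SamplingHiTemp result: `temperature` divides the logits before the softmax, so values above 1 move probability mass toward tokens the model considers unlikely. A toy sketch with made-up logits (hypothetical values, not taken from GPT-2):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    shift = max(scaled)
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for four candidate tokens.
toy_logits = [4.0, 2.0, 1.0, 0.5]

base = softmax_with_temperature(toy_logits, temperature=1.0)
hot = softmax_with_temperature(toy_logits, temperature=1.5)

# Greedy decoding always takes the highest-scoring token, whatever the temperature.
greedy_choice = max(range(len(toy_logits)), key=toy_logits.__getitem__)
print("greedy index:", greedy_choice)
print("T=1.0:", [round(p, 3) for p in base])
print("T=1.5:", [round(p, 3) for p in hot])
```

At T=1.5 the top token loses probability and the tail gains it, which fits the less repetitive but stranger text seen with `temperature=1.5`.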