<a href="https://colab.research.google.com/github/jonas-jun/DL_scratch2_study/blob/master/transformers_tasks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A short practice with transformers
by Junmai, Nov. 2020.
***

[reference: Huggingface Transformers task summary](https://huggingface.co/transformers/task_summary.html)
  
  1. Sentiment Analysis
  2. Extractive Question Answering
  3. Language Modeling
    - paraphrase
    - masked token
  4. Translation
  5. Summarization

In [None]:
!pip install transformers

In [2]:
from transformers import pipeline

## Sentiment Analysis

In [3]:
nlp = pipeline('sentiment-analysis')
result = nlp(['I hate you', 'I love you'])

In [4]:
result  # data structure

[{'label': 'NEGATIVE', 'score': 0.9991129040718079},
 {'label': 'POSITIVE', 'score': 0.9998656511306763}]

In [5]:
for i in result:
    print('label: {}, with score: {:0.4}%'.format(i['label'], i['score']*100))

label: NEGATIVE, with score: 99.91%
label: POSITIVE, with score: 99.99%


## Extractive Question Answering

In [6]:
nlp = pipeline('question-answering')
context = r'''
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
'''

In [7]:
result1 = nlp(question='What is extractive question answering?', context=context)
result2 = nlp(question='What is a good example of a question answering dataset?', context=context)



In [8]:
context2 = """
Hayuen's nickname is Bbogui. She was born in 1994. Now she is writing a post on her blog which name is 'B_blog' in the room. She is extremely cute.
"""
result3 = nlp(question='What is Hayuen doing?', context=context2)
result4 = nlp(question="What is her blog's name?", context=context2)
result5 = nlp(question='Where is she?', context=context2)
result6 = nlp(question='what is her nickname?', context=context2)



In [9]:
print(result3, '\n', result4, '\n', result5, '\n', result6)

{'score': 0.6958386301994324, 'start': 63, 'end': 89, 'answer': 'writing a post on her blog'} 
 {'score': 0.6217727065086365, 'start': 104, 'end': 112, 'answer': "'B_blog'"} 
 {'score': 0.30579933524131775, 'start': 81, 'end': 89, 'answer': 'her blog'} 
 {'score': 0.9929165244102478, 'start': 22, 'end': 29, 'answer': 'Bbogui.'}
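
The `start` and `end` fields in each result appear to be character offsets into the context string, so slicing the context should give back the `answer`. The sketch below checks this with the context and the results copied from the cells above; no model is loaded, and `span_text` is a hypothetical helper added only for illustration.

```python
# Context and pipeline results copied verbatim from the cells above.
context2 = """
Hayuen's nickname is Bbogui. She was born in 1994. Now she is writing a post on her blog which name is 'B_blog' in the room. She is extremely cute.
"""

results = [
    {'start': 63, 'end': 89, 'answer': 'writing a post on her blog'},
    {'start': 104, 'end': 112, 'answer': "'B_blog'"},
    {'start': 81, 'end': 89, 'answer': 'her blog'},
    {'start': 22, 'end': 29, 'answer': 'Bbogui.'},
]


def span_text(context, result):
    """Return the slice of the context addressed by the result's offsets."""
    return context[result['start']:result['end']]


for r in results:
    # Print the recovered span and whether it equals the reported answer.
    print(repr(span_text(context2, r)), span_text(context2, r) == r['answer'])
```

Note that the leading newline of the triple-quoted string counts as character 0, which is why the first word starts at offset 1.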


## Language Modeling
1. masked token prediction
2. paraphrase classification (whether two sentences are paraphrases of each other)

In [10]:
nlp = pipeline('fill-mask')

Some weights of RobertaForMaskedLM were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['lm_head.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [11]:
from pprint import pprint
pprint(nlp(f"HuggingFace is creating a {nlp.tokenizer.mask_token} that the community uses to solve NLP tasks."))

[{'score': 0.17927460372447968,
  'sequence': '<s>HuggingFace is creating a tool that the community uses to '
              'solve NLP tasks.</s>',
  'token': 3944,
  'token_str': 'Ġtool'},
 {'score': 0.1134939044713974,
  'sequence': '<s>HuggingFace is creating a framework that the community uses '
              'to solve NLP tasks.</s>',
  'token': 7208,
  'token_str': 'Ġframework'},
 {'score': 0.05243545398116112,
  'sequence': '<s>HuggingFace is creating a library that the community uses to '
              'solve NLP tasks.</s>',
  'token': 5560,
  'token_str': 'Ġlibrary'},
 {'score': 0.03493543714284897,
  'sequence': '<s>HuggingFace is creating a database that the community uses '
              'to solve NLP tasks.</s>',
  'token': 8503,
  'token_str': 'Ġdatabase'},
 {'score': 0.02860247902572155,
  'sequence': '<s>HuggingFace is creating a prototype that the community uses '
              'to solve NLP tasks.</s>',
  'token': 17715,
  'token_str': 'Ġprototype'}]
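
The `Ġ` at the start of each `token_str` is the marker that the byte-level BPE vocabulary used by RoBERTa puts on tokens preceded by a space. A minimal sketch for turning the output above into a readable ranking follows; the predictions are copied from that output with scores rounded, and `clean_token` is a hypothetical helper, not part of the library.

```python
# Top predictions copied from the fill-mask output above (scores rounded).
predictions = [
    {'score': 0.1793, 'token_str': 'Ġtool'},
    {'score': 0.1135, 'token_str': 'Ġframework'},
    {'score': 0.0524, 'token_str': 'Ġlibrary'},
    {'score': 0.0349, 'token_str': 'Ġdatabase'},
    {'score': 0.0286, 'token_str': 'Ġprototype'},
]


def clean_token(token_str):
    """Map the space marker 'Ġ' back to a plain space, then trim it."""
    return token_str.replace('Ġ', ' ').strip()


for p in predictions:
    # Left-align the word and show the score as a percentage.
    print('{:<10} {:.1%}'.format(clean_token(p['token_str']), p['score']))
```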


In [12]:
text1 = "Hayuen's nickname is Bbogui. She was born in 1994. Now she is {} a post on her blog which name is 'B_blog' in the room. She is extremely cute."\
    .format(nlp.tokenizer.mask_token)
pprint(nlp(text1), width=100)

[{'score': 0.6959057450294495,
  'sequence': "<s>Hayuen's nickname is Bbogui. She was born in 1994. Now she is writing a post on "
              "her blog which name is 'B_blog' in the room. She is extremely cute.</s>",
  'token': 2410,
  'token_str': 'Ġwriting'},
 {'score': 0.08015522360801697,
  'sequence': "<s>Hayuen's nickname is Bbogui. She was born in 1994. Now she is posting a post on "
              "her blog which name is 'B_blog' in the room. She is extremely cute.</s>",
  'token': 6016,
  'token_str': 'Ġposting'},
 {'score': 0.030806194990873337,
  'sequence': "<s>Hayuen's nickname is Bbogui. She was born in 1994. Now she is making a post on "
              "her blog which name is 'B_blog' in the room. She is extremely cute.</s>",
  'token': 442,
  'token_str': 'Ġmaking'},
 {'score': 0.021556127816438675,
  'sequence': "<s>Hayuen's nickname is Bbogui. She was born in 1994. Now she is typing a post on "
              "her blog which name is 'B_blog' in the room. She is extrem

In [13]:
text2 = "Hayuen's nickname is {}. She was born in 1994. Now she is writing a post on her blog which name is 'B_blog' in the room. She is extremely cute."\
    .format(nlp.tokenizer.mask_token)
pprint(nlp(text2), width=100)

[{'score': 0.0527629591524601,
  'sequence': "<s>Hayuen's nickname is B. She was born in 1994. Now she is writing a post on her "
              "blog which name is 'B_blog' in the room. She is extremely cute.</s>",
  'token': 163,
  'token_str': 'ĠB'},
 {'score': 0.017350677400827408,
  'sequence': "<s>Hayuen's nickname is Bella. She was born in 1994. Now she is writing a post on "
              "her blog which name is 'B_blog' in the room. She is extremely cute.</s>",
  'token': 13172,
  'token_str': 'ĠBella'},
 {'score': 0.015132908709347248,
  'sequence': "<s>Hayuen's nickname is Barbie. She was born in 1994. Now she is writing a post on "
              "her blog which name is 'B_blog' in the room. She is extremely cute.</s>",
  'token': 31304,
  'token_str': 'ĠBarbie'},
 {'score': 0.010320466011762619,
  'sequence': "<s>Hayuen's nickname is Barbara. She was born in 1994. Now she is writing a post on "
              "her blog which name is 'B_blog' in the room. She is extremely cute

2. Paraphrase classification: decide whether two sentences are paraphrases of each other

In [14]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

In [15]:
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased-finetuned-mrpc')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-cased-finetuned-mrpc', return_dict=True)

In [16]:
classes = ['Not paraphrase', 'is paraphrase']

sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

paraphrase = tokenizer(sequence_0, sequence_2, return_tensors='pt')
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors='pt')
print(paraphrase)  # Q: what does attention_mask in the result mean? Does return_tensors='pt' ask for PyTorch tensors?
print(not_paraphrase)

{'input_ids': tensor([[  101,  1109,  1419, 20164, 10932,  2271,  7954,  1110,  1359,  1107,
          1203,  1365,  1392,   102, 20164, 10932,  2271,  7954,   112,   188,
          3834,  1132,  3629,  1107,  6545,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1]])}
{'input_ids': tensor([[  101,  1109,  1419, 20164, 10932,  2271,  7954,  1110,  1359,  1107,
          1203,  1365,  1392,   102,  7302,  1116,  1132,  2108,  2213,  1111,
          1240,  2332,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}


In [17]:
paraphrase_classification_logits = model(**paraphrase, return_dict=True).logits
not_paraphrase_classification_logits = model(**not_paraphrase, return_dict=True).logits

In [18]:
print(paraphrase_classification_logits)
print(not_paraphrase_classification_logits)

tensor([[-0.3495,  1.9004]], grad_fn=<AddmmBackward>)
tensor([[ 0.5386, -2.2197]], grad_fn=<AddmmBackward>)


In [19]:
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=1).tolist()[0]

In [20]:
for i in range(len(classes)):
    print('{}: {}%'.format(classes[i], int(round(100*paraphrase_results[i]))))
for i in range(len(classes)):
    print('{}: {}%'.format(classes[i], int(round(100*not_paraphrase_results[i]))))

Not paraphrase: 10%
is paraphrase: 90%
Not paraphrase: 94%
is paraphrase: 6%
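
The percentages above come from applying softmax to the two logits of each pair. As a sanity check, here is a pure-Python sketch of the same arithmetic, using the logit values printed earlier; `softmax` and `to_percent` are illustrative helpers and stand in for `torch.softmax` only for this small example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_percent(logits):
    """Convert logits to rounded integer percentages."""
    return [int(round(100 * p)) for p in softmax(logits)]


# Logits copied from the printed tensors above.
print(to_percent([-0.3495, 1.9004]))  # paraphrase pair
print(to_percent([0.5386, -2.2197]))  # non-paraphrase pair
```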


## Masked Language Modeling (token)

Here is an example of doing masked language modeling using a model and a tokenizer. The process is the following:

1. Instantiate a tokenizer and a model from the checkpoint name. The model is
   identified as a DistilBERT model and is loaded with the weights stored in the
   checkpoint.
2. Define a sequence with a masked token, placing the `tokenizer.mask_token` instead of a word.
3. Encode that sequence into a list of IDs and find the position of the masked token in that list.
4. Retrieve the predictions at the index of the mask token: this tensor has the
   same size as the vocabulary, and the values are the scores attributed to each
   token. The model gives higher score to tokens it deems probable in that
   context.
5. Retrieve the top 5 tokens using the PyTorch `topk` or TensorFlow `top_k` methods.
6. Replace the mask token with each of the top tokens and print the results.

In [21]:
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-cased')
model = AutoModelForMaskedLM.from_pretrained('distilbert-base-cased')
sequence = 'Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {} our carbon footprint.'\
            .format(tokenizer.mask_token)
# Use input_ids rather than input so the Python builtin is not shadowed.
input_ids = tokenizer.encode(sequence, return_tensors='pt')
mask_token_index = torch.where(input_ids == tokenizer.mask_token_id)[1]

token_logits = model(input_ids, return_dict=True).logits
mask_token_logits = token_logits[0, mask_token_index, :]  # same as token_logits[0][23][:], one score per vocabulary entry (len: 28,996)
top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()  # [4851, 2773, 9711, 18134, 4507]


In [22]:
for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.


## Causal Language Modeling
Causal language modeling is the task of predicting the token that follows a sequence of tokens. The model attends only to the left context.
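
One common way to picture "attends only to the left context" is a lower-triangular attention mask, where position `i` may look at positions `0..i` and nothing after. The sketch below is illustrative pure Python, not GPT-2's actual implementation; `causal_mask` is a hypothetical helper.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


# Draw the mask: 'x' marks an allowed position, '.' a blocked one.
for row in causal_mask(4):
    print(''.join('x' if allowed else '.' for allowed in row))
```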

In [23]:
from transformers import AutoModelForCausalLM, AutoTokenizer, top_k_top_p_filtering
from torch.nn import functional as F

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForCausalLM.from_pretrained('gpt2')

sequence = "Hugging Face is based in DUMBO, New York City, and"
input_ids = tokenizer.encode(sequence, return_tensors='pt')
next_token_logits = model(input_ids)[0][:, -1, :]  # logits for the last position, shape [1, 50257]

In [24]:
print(input_ids)
print(next_token_logits)
print(next_token_logits.shape)

tensor([[48098,  2667, 15399,   318,  1912,   287,   360,  5883,  8202,    11,
           968,  1971,  2254,    11,   290]])
tensor([[-110.4283, -110.4649, -112.1735,  ..., -115.2661, -113.5502,
         -107.8964]], grad_fn=<SliceBackward>)
torch.Size([1, 50257])


In [25]:
print('next_token_logits: {}'.format(next_token_logits))
# filter
filtered_next_token_logits = top_k_top_p_filtering(next_token_logits.clone().detach(), 
                                                   top_k=50, top_p=1.0)
print('filtered_next_token_logits: {}'.format(filtered_next_token_logits))
# sample
probs = F.softmax(filtered_next_token_logits, dim=-1)
print('probs: {}'.format(probs))
next_token = torch.multinomial(probs, num_samples=1, generator=torch.Generator())
print('next_token: {}'.format(next_token))
generated = torch.cat([input_ids, next_token], dim=-1)
resulting_string = tokenizer.decode(generated.tolist()[0])

next_token_logits: tensor([[-110.4283, -110.4649, -112.1735,  ..., -115.2661, -113.5502,
         -107.8964]], grad_fn=<SliceBackward>)
filtered_next_token_logits: tensor([[-inf, -inf, -inf,  ..., -inf, -inf, -inf]])
probs: tensor([[0., 0., 0.,  ..., 0., 0., 0.]])
next_token: tensor([[318]])
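
The printed tensors show only `-inf` and `0.` because the visible entries all fall outside the top 50; the surviving logits sit in the elided middle. To make the filter-then-sample step concrete, here is a pure-Python sketch of top-k filtering on a made-up four-entry logit list. It is an illustration of the idea under simplified assumptions, not the library's `top_k_top_p_filtering`; `top_k_filter` and `toy_logits` are hypothetical names.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k largest logits and set every other entry to -inf.

    Ties at the threshold are all kept, so more than k entries can survive.
    """
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float('-inf') for x in logits]


def softmax(logits):
    """Softmax over a list of floats; exp(-inf) evaluates to 0.0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [1.0, 3.0, 2.0, 0.5]        # made-up values for illustration
filtered = top_k_filter(toy_logits, k=2)  # only indices 1 and 2 survive
probs = softmax(filtered)                 # filtered-out entries get probability 0

rng = random.Random(0)                    # fixed seed, only for reproducibility
next_index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
print(filtered)
print(probs)
print(next_index)
```

Because the filtered-out entries have probability zero, sampling can only return one of the surviving indices, which is the same reason the sampled GPT-2 tokens always come from the 50 most likely candidates.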


In [26]:
print(resulting_string)

Hugging Face is based in DUMBO, New York City, and is


In [27]:
target_idx = torch.multinomial(probs, num_samples=1)
print(target_idx.item())
print(tokenizer.decode([target_idx.item()]))

3033
 features


In [28]:
max_idx = torch.argmax(probs)
print(max_idx.item())
print(tokenizer.decode([max_idx.item()]))

318
 is


In [29]:
idxs = torch.multinomial(probs, num_samples=30, replacement=True)
idxs

tensor([[  262,   460,   468,   318, 44119,   468,   318,  4635,   340,  2716,
           318,   373,  4539,   262,  3199,  8096,  3033,  3033,  4539,   318,
          4539,   318,   318, 10874,   318,  4635,   318,   318,  3033,   318]])

## Translation
- An example of a translation dataset is the WMT English to German dataset.

In [30]:
from transformers import pipeline
translator = pipeline('translation_en_to_de')
print(translator('Hugging Face is a technology company based in New York and Paris', max_length=40))

Some weights of T5Model were not initialized from the model checkpoint at t5-base and are newly initialized: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]


Because the translation pipeline depends on the `PreTrainedModel.generate()` method, we can override the default arguments
of `PreTrainedModel.generate()` directly in the pipeline, as shown for `max_length` above.

Here is an example of doing translation using a model and a tokenizer. The process is the following:

1. Instantiate a tokenizer and a model from the checkpoint name. Translation is usually done using an encoder-decoder model, such as `Bart` or `T5`.
2. Define the article that should be translated.
3. Add the T5 specific prefix "translate English to German: "
4. Use the `PreTrainedModel.generate()` method to perform the translation.

In [35]:
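# AutoModelWithLMHead is deprecated; AutoModelForSeq2SeqLM is the class for encoder-decoder models such as T5.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained('t5-base')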
tokenizer = AutoTokenizer.from_pretrained('t5-base')
sentence = 'translate English to German: Hugging Face is a technology company based in New York and Paris'
inputs = tokenizer.encode(sentence, return_tensors='pt')
outputs = model.generate(input_ids=inputs, max_length=40, num_beams=4, early_stopping=True)



In [38]:
print('output tensors: {}'.format(outputs))
print('translated: {}'.format(tokenizer.decode(outputs[0])))

output tensors: tensor([[    0, 11560,  3896,  8881,   229,   236,     3, 14366, 15377,   181,
         11216,    16,   368,  1060,    64,  1919,     5,     1]])
translated: Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.


In [33]:
inputs

tensor([[13959,  1566,    12,  2968,    10, 11560,  3896,  8881,    19,     3,
             9,   748,   349,     3,   390,    16,   368,  1060,    11,  1919,
             1]])

## Summarization
- an example of a summarization dataset is the CNN / Daily Mail dataset.
- pipeline: It leverages a Bart model that was fine-tuned on the CNN / Daily Mail dataset.

In [39]:
from transformers import pipeline
summarizer = pipeline("summarization")
ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""

In [42]:
summed = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)
len(summed)

1

In [49]:
summed[0]['summary_text']

' Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002 . At one time, she was married to eight men at once, prosecutors say .'

In [66]:
# AutoModelWithLMHead is deprecated; AutoModelForSeq2SeqLM is the class for encoder-decoder models such as T5.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained('t5-base')
tokenizer = AutoTokenizer.from_pretrained('t5-base')
# T5 uses a max_length of 512, so we truncate the article to 512 tokens explicitly.
# Note: T5 is normally given a task prefix ('summarize: ' + ARTICLE); this cell does not add one.
inputs = tokenizer.encode(ARTICLE, return_tensors='pt', max_length=512, truncation=True)  # returns a plain list if return_tensors is omitted
outputs = model.generate(input_ids=inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)


In [69]:
print('type of input: {}, length of input: {}'.format(type(inputs), len(inputs[0])))
print(inputs)

type of input: <class 'torch.Tensor'>, length of input: 512
tensor([[  368,  1060,    41,   254, 17235,    61, 10555,   301, 13662,  1386,
          3483,   235,     7,    47,  1902,   203,   625,     6,   255,   530,
          4464,    16,  1244, 13263,  1334,     6,   368,  1060,     5,    71,
           215,   865,     6,   255,   530,  4464,   541,    16,  1244, 13263,
          1334,     6,    68,    12,     3,     9,   315,   388,    11,   406,
          1227,  1967,    75,    53,   160,   166,  2553,     5,  3462,   507,
           477,   227,    24,  5281,     6,   255,   530,  1560,  4513,   780,
           541,     5,    37,    29,     6,  1386,  3483,   235,     7, 10126,
            96,   196,   103,   121,   874,    72,   648,     6,  1664,   163,
           441,   192,  1274,    13,   284,   119,     5,    86,  8693,   255,
          4464,   728,    72,     6,    48,    97,    16,     8,  4027,    29,
           226,     5,    86,    46,   917,    21,     3,     9,  5281,

In [70]:
print(outputs[0])
print(tokenizer.decode(outputs[0]))

tensor([    0, 32099,   874,    72,   648,     6,  1664,   441,   192,  1274,
           13,   284,   119,     5,    37,    29,     6,   507,   477,   865,
            6,   255,   530,  1560,  4513,   541,    16,     8,  4027,    29,
          226,     3,     5,    86,  8693,   255,  4464,   541,     6,    68,
          406,  1227,  1967,    75,    53,   160,   166,  2553,     3,     5,
           37,  5281,     7,   130,   294,    13,    46, 10653, 13236,     6,
            3, 29905,   497,     3,     5,  1386,  3483,   235,     7,     3,
        30827,    26,    59, 10945,    44,  1015,  8531,  2243,    16,     8,
         4027,    29,   226,    30,  1701,     3,     5,     1])
<extra_id_0> five more times, sometimes within two weeks of each other. Then, 18 days later, she got hitched again in the Bronx. In 2010, she married again, but without divorcing her first husband. The marriages were part of an immigration scam, prosecutors say. Barrientos pleaded not guilty at State Supreme C