# Examples with Transformers

## Dependencies

In [None]:
%pip install transformers
%pip install sentencepiece
%pip install torch

## Transformer encoder: BERT

In [None]:
from transformers import BertForTokenClassification, BertTokenizer, pipeline

# Portuguese token-classification (NER) model, paired with the tokenizer
# of the Portuguese BERT base checkpoint.
model = BertForTokenClassification.from_pretrained('monilouise/ner_news_portuguese')
tokenizer = BertTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased',
                                          model_max_length=512,
                                          do_lower_case=False)  # cased model

nlp = pipeline('ner', model=model, tokenizer=tokenizer)
# ignore_labels=[] keeps the 'O' (outside) tokens, which are dropped by default.
result = nlp("O Tribunal de Contas da União é localizado em Brasília.",
             ignore_labels=[])

for token in result:
    print(token['entity'], "\t", token['word'])

O 	 O
B-PUB 	 Tribunal
I-PUB 	 de
I-PUB 	 Contas
I-PUB 	 da
L-PUB 	 União
O 	 é
O 	 localizado
O 	 em
B-LOC 	 Brasília
O 	 .
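The tags above follow a B-/I-/L- (begin/inside/last) pattern, so the five `PUB` tokens form a single entity. A minimal, hand-written sketch for collapsing such per-token tags into entity spans is shown below; `group_entities` is a hypothetical helper, not part of `transformers`. It assumes a BILOU-style scheme (including `U-` for single-token entities, which this output does not show) and that every item is a whole word, as in the output above; word-piece fragments such as `##...` would have to be merged first.

```python
def group_entities(tags, words):
    """Group BILOU-style tags (B-X, I-X, L-X, U-X, O) into (label, text) spans.

    Assumption: a span starts at B-X or U-X, continues through I-X and closes
    at L-X; an open span is also closed by O, a new label or the end of input.
    """
    spans = []
    current_label, current_words = None, []

    def flush():
        # Emit the open span (if any) and reset the state.
        nonlocal current_label, current_words
        if current_label is not None:
            spans.append((current_label, " ".join(current_words)))
        current_label, current_words = None, []

    for tag, word in zip(tags, words):
        if tag == "O":
            flush()
            continue
        prefix, label = tag.split("-", 1)
        if prefix in ("B", "U") or label != current_label:
            flush()
            current_label, current_words = label, [word]
        else:  # I- or L- continuing the same entity
            current_words.append(word)
        if prefix in ("L", "U"):
            flush()
    flush()
    return spans


tags = ["O", "B-PUB", "I-PUB", "I-PUB", "I-PUB", "L-PUB", "O", "O", "O", "B-LOC", "O"]
words = ["O", "Tribunal", "de", "Contas", "da", "União", "é", "localizado", "em", "Brasília", "."]
print(group_entities(tags, words))
# [('PUB', 'Tribunal de Contas da União'), ('LOC', 'Brasília')]
```

In the notebook, `tags` and `words` would come from `[t['entity'] for t in result]` and `[t['word'] for t in result]`. Closing an open span at `O` is a design choice that lets `B-LOC Brasília`, which has no matching `L-` tag, still come out as one entity.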


In [None]:
from transformers import BertTokenizer, BertForTokenClassification
import torch

model = BertForTokenClassification.from_pretrained("monilouise/ner_news_portuguese")
tokenizer = BertTokenizer.from_pretrained("neuralmind/bert-base-portuguese-cased")

# Without [CLS]/[SEP] there is exactly one logit row per word-piece token.
inputs = tokenizer(
    "O Tribunal de Contas da União é localizado em Brasília",
    add_special_tokens=False,
    return_tensors="pt"
)

# Inference only: no gradients needed.
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, tokens, num_labels)

# Highest-scoring class per token, mapped to its label name.
predicted_token_class_ids = logits.argmax(-1)
predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
print(predicted_tokens_classes)

['O', 'B-PUB', 'I-PUB', 'I-PUB', 'I-PUB', 'L-PUB', 'O', 'O', 'O', 'B-LOC']
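The cell above keeps only the arg-max of the raw logits. Because softmax preserves order, applying it first would not change the predicted labels, but it turns each row into probabilities that can serve as a confidence, similar in spirit to the `score` field returned by the `pipeline`. The sketch below shows the per-token computation in plain Python; the logits and the three-label `id2label` mapping are hypothetical toy values. On the real tensor, `torch.softmax(logits, dim=-1)` would do the same along the class dimension.

```python
import math


def softmax(row):
    """Turn one row of logits into probabilities that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode_token_classes(logits, id2label):
    """Return (label, probability) of the most likely class for each token."""
    results = []
    for row in logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)  # arg-max
        results.append((id2label[best], probs[best]))
    return results


# Hypothetical toy values: 2 tokens x 3 classes.
id2label = {0: "O", 1: "B-LOC", 2: "I-LOC"}
logits = [[2.0, 0.5, 0.1], [0.0, 3.0, 1.0]]
for label, prob in decode_token_classes(logits, id2label):
    print(f"{label}\t{prob:.3f}")
```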


## Decoder: GPT-2

In [None]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2", model_max_length=256)

prompt = "Tell me where is Brazil and what is it?"
inputs = tokenizer(prompt, return_tensors="pt")

# Passing **inputs also forwards the attention_mask; GPT-2 has no pad token,
# so the EOS token id is used as pad_token_id for open-ended generation.
outputs = model.generate(**inputs, max_length=40, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))


Tell me where is Brazil and what is it?

Brazil is a country that has been in the forefront of the development of the world's largest economy. It is a country that has been in
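With default settings, `generate` appears to decode greedily here: at each step it appends the single highest-scoring next token, which is consistent with the repetitive continuation above. `max_length=40` also counts the prompt tokens, so the text stops mid-sentence once 40 tokens in total are reached. The sketch below reproduces both effects with a hypothetical toy bigram table standing in for GPT-2; it illustrates greedy decoding, not the `transformers` implementation.

```python
# Hypothetical toy "model": next-token scores conditioned only on the last token.
TOY_MODEL = {
    "Brazil": {"is": 0.9, "was": 0.1},
    "is": {"a": 0.8, "big": 0.2},
    "a": {"country": 0.7, "place": 0.3},
    "country": {"that": 0.6, "in": 0.4},
    "that": {"is": 0.7, "has": 0.3},
}


def greedy_generate(prompt_tokens, max_length):
    """Greedy decoding: always append the highest-scoring next token."""
    tokens = list(prompt_tokens)
    # As with max_length in transformers, the budget includes the prompt.
    while len(tokens) < max_length:
        scores = TOY_MODEL.get(tokens[-1])
        if not scores:  # the toy table has no continuation for this token
            break
        tokens.append(max(scores, key=scores.get))
    return tokens


print(" ".join(greedy_generate(["Brazil"], max_length=10)))
# Brazil is a country that is a country that is
```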


## Decoder: OPT-1.3b

In [None]:
from transformers import GPT2Tokenizer, OPTForCausalLM

model = OPTForCausalLM.from_pretrained("facebook/opt-1.3b")
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-1.3b")

prompt = "Tell me where is Brazil and what is it?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
output = model.generate(inputs.input_ids, max_length=40)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Tell me where is Brazil and what is it?
It's a country in South America.
I know, but what is it?
It's a country in South America.
I
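OPT also falls into a loop, repeating "It's a country in South America." One common remedy is to forbid any n-gram from appearing twice; `generate` exposes this as the `no_repeat_ngram_size` argument. The sketch below implements the idea by hand on top of greedy decoding, again with a hypothetical toy bigram table instead of a real model, so the exact tokens are illustrative only.

```python
# Hypothetical toy "model": next-token scores conditioned only on the last token.
TOY_MODEL = {
    "Brazil": {"is": 0.9, "was": 0.1},
    "is": {"a": 0.8, "big": 0.2},
    "a": {"country": 0.7, "place": 0.3},
    "country": {"that": 0.6, "in": 0.4},
    "that": {"is": 0.7, "has": 0.3},
}


def banned_next_tokens(tokens, n):
    """Tokens that would complete an n-gram already present in `tokens`."""
    if n <= 0:
        return set()
    prefix = tuple(tokens[len(tokens) - n + 1:])  # the last n-1 tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def greedy_generate(prompt_tokens, max_length, no_repeat_ngram_size=0):
    """Greedy decoding that skips tokens which would repeat an n-gram."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        banned = banned_next_tokens(tokens, no_repeat_ngram_size)
        allowed = {tok: s for tok, s in TOY_MODEL.get(tokens[-1], {}).items()
                   if tok not in banned}
        if not allowed:  # nothing left to choose from: stop early
            break
        tokens.append(max(allowed, key=allowed.get))
    return tokens


print(" ".join(greedy_generate(["Brazil"], max_length=10)))
# Brazil is a country that is a country that is
print(" ".join(greedy_generate(["Brazil"], max_length=10, no_repeat_ngram_size=2)))
# Brazil is a country that is big
```

Blocking repeats does not make the text more factual; it only removes one failure mode, and a too-small `n` can also forbid legitimate repetitions.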


## Transformer encoder-decoder: T5

T5's training includes English-to-French translation.

From Google Translate, for reference:

**English**: I'm a medical student because I want to help people.

**French**: Je suis étudiant en médecine parce que je veux aider les gens

In [None]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small", model_max_length=256)

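# Adjacent string literals are concatenated without picking up the
# indentation that a backslash line continuation would embed in the text.
input_ids = tokenizer(
    "translate English to French: "
    "I'm a medical student because I want to help people.",
    return_tensors="pt",
).input_ids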

outputs = model.generate(input_ids)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Je suis étudiant en médecine parce que je veux aider les gens.
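T5 casts every task as text-to-text, and the task is selected by a plain-text prefix such as `translate English to French: `. A small helper for building these inputs is sketched below. The prefix strings are, to our knowledge, the ones listed under `task_specific_params` in the `t5-small` configuration; the dictionary keys and the `t5_input` name are hypothetical labels of our own. Collapsing whitespace guards against stray indentation coming from multi-line string literals.

```python
# Prefixes as (to our knowledge) listed in the t5-small config;
# the keys are our own hypothetical names.
T5_PREFIXES = {
    "summarization": "summarize: ",
    "en_to_de": "translate English to German: ",
    "en_to_fr": "translate English to French: ",
    "en_to_ro": "translate English to Romanian: ",
}


def t5_input(task, text):
    """Build the text-to-text input string for a given T5 task."""
    try:
        prefix = T5_PREFIXES[task]
    except KeyError:
        raise ValueError(f"unknown task {task!r}; known: {sorted(T5_PREFIXES)}") from None
    # Collapse runs of whitespace (e.g. from multi-line string literals).
    return prefix + " ".join(text.split())


print(t5_input("en_to_fr", "I'm a medical student because I want to help people."))
# translate English to French: I'm a medical student because I want to help people.
```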


## Tests with OPT-1.3b

In [None]:
from transformers import GPT2Tokenizer, OPTForCausalLM

model = OPTForCausalLM.from_pretrained("facebook/opt-1.3b")
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-1.3b")

In [None]:
prompt = "How many legs does have a car?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
output = model.generate(inputs.input_ids, max_length=40)

print(tokenizer.decode(output[0], skip_special_tokens=True))

How many legs does have a car?
I think it's a car with legs.


In [None]:
prompt = "Do you have a soul?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
output = model.generate(inputs.input_ids, max_length=40)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Do you have a soul?
I do not. I am a ghost. I am a ghost of a ghost. I am a ghost of a ghost. I am a ghost of a ghost
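The "ghost of a ghost" loop is another symptom of greedy decoding. Sampling from the model's distribution, optionally restricted to the `k` most likely tokens and sharpened or flattened by a temperature, is the usual alternative; `generate` accepts `do_sample=True`, `top_k` and `temperature` for this. The sketch below shows a single top-k sampling step in plain Python; the logits are hypothetical toy values, not OPT scores.

```python
import math
import random


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample one token from the k highest-scoring candidates.

    `logits` maps candidate tokens to raw scores. Dividing by `temperature`
    before the softmax sharpens (< 1) or flattens (> 1) the distribution.
    """
    if k < 1 or temperature <= 0:
        raise ValueError("k must be >= 1 and temperature must be > 0")
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    scaled = [score / temperature for _, score in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]  # unnormalised softmax
    tokens = [tok for tok, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token logits after "I am a".
toy_logits = {"ghost": 2.0, "person": 1.5, "robot": 0.5, "soul": 0.1}
rng = random.Random(0)  # fixed seed for reproducibility
print([sample_top_k(toy_logits, k=3, temperature=0.8, rng=rng) for _ in range(5)])
```

With `k=1` this reduces to greedy decoding; larger `k` and higher temperatures trade repetition for more varied, and less predictable, text.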


In [None]:
prompt = "What is a neural network?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
output = model.generate(inputs.input_ids, max_length=40)

print(tokenizer.decode(output[0], skip_special_tokens=True))

What is a neural network?

A neural network is a type of artificial intelligence that is trained to learn from data. It is a type of artificial intelligence that is trained to learn from data
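The answer above repeats its first sentence almost verbatim. One simple way to quantify such repetition is distinct-n, the fraction of word n-grams in a text that are unique (1.0 means no n-gram repeats). A minimal sketch follows; splitting on whitespace is a simplifying assumption, not the model's tokenizer.

```python
def distinct_n(text, n):
    """Fraction of word n-grams in `text` that are unique."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


answer = ("A neural network is a type of artificial intelligence that is trained "
          "to learn from data. It is a type of artificial intelligence that is "
          "trained to learn from data")
print(round(distinct_n(answer, 2), 2))
```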


In [None]:
prompt = "How many eyes have a house?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
output = model.generate(inputs.input_ids, max_length=40)

print(tokenizer.decode(output[0], skip_special_tokens=True))

How many eyes have a house?

The answer is: none.

The eyes of a house are the windows.

The windows of a house are the doors.




In [None]:
prompt = "Who is your father?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
output = model.generate(inputs.input_ids, max_length=40)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Who is your father?
I don't know.
I don't know who he is.
I don't know anything about him.
I don't know anything about him.



In [None]:
prompt = "What is the size of the Earth?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate
output = model.generate(inputs.input_ids, max_length=40)

print(tokenizer.decode(output[0], skip_special_tokens=True))

What is the size of the Earth?

The Earth is a sphere, about the size of a football field. It is the only planet in the solar system that is round.


