# Chapter 1: Hello Transformers

In [1]:
import pandas as pd

from transformers import pipeline




## A Tour of Transformers Applications

In [2]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

### Text Classification

In [3]:
classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.





All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.
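
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the checkpoint name and revision hash reported in the log. The `PINNED` dict and the `build_classifier` helper are illustrative names that do not appear in the notebook, and calling the helper downloads the weights on first use:

```python
# Checkpoint and revision copied from the log message above; pinning both
# keeps the pipeline from silently changing if the library default changes.
PINNED = {
    "task": "text-classification",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier():
    """Create the classifier with an explicit model and revision (hypothetical helper)."""
    # Imported lazily so that defining the configuration does not require the library.
    from transformers import pipeline

    return pipeline(**PINNED)
```

With this in place, `classifier = build_classifier()` would stand in for the bare `pipeline("text-classification")` call.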


In [4]:
outputs = classifier(text)

In [5]:
outputs

[{'label': 'NEGATIVE', 'score': 0.9015461206436157}]

In [6]:
pd.DataFrame(outputs)

| | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.901546 |


### Named Entity Recognition

In [7]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFBertForTokenClassification.

All the weights of TFBertForTokenClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForTokenClassification for predictions without further training.


In [8]:
outputs = ner_tagger(text)

In [9]:
outputs

[{'entity_group': 'ORG',
  'score': 0.87901014,
  'word': 'Amazon',
  'start': 5,
  'end': 11},
 {'entity_group': 'MISC',
  'score': 0.9908588,
  'word': 'Optimus Prime',
  'start': 36,
  'end': 49},
 {'entity_group': 'LOC',
  'score': 0.9997547,
  'word': 'Germany',
  'start': 90,
  'end': 97},
 {'entity_group': 'MISC',
  'score': 0.556568,
  'word': 'Mega',
  'start': 208,
  'end': 212},
 {'entity_group': 'PER',
  'score': 0.590257,
  'word': '##tron',
  'start': 212,
  'end': 216},
 {'entity_group': 'ORG',
  'score': 0.66969234,
  'word': 'Decept',
  'start': 253,
  'end': 259},
 {'entity_group': 'MISC',
  'score': 0.49834955,
  'word': '##icons',
  'start': 259,
  'end': 264},
 {'entity_group': 'MISC',
  'score': 0.7753613,
  'word': 'Megatron',
  'start': 350,
  'end': 358},
 {'entity_group': 'MISC',
  'score': 0.98785394,
  'word': 'Optimus Prime',
  'start': 367,
  'end': 380},
 {'entity_group': 'PER',
  'score': 0.81209606,
  'word': 'Bumblebee',
  'start': 502,
  'end': 511}]

In [10]:
pd.DataFrame(outputs)

| | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.87901 | Amazon | 5 | 11 |
| 1 | MISC | 0.990859 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999755 | Germany | 90 | 97 |
| 3 | MISC | 0.556568 | Mega | 208 | 212 |
| 4 | PER | 0.590257 | ##tron | 212 | 216 |
| 5 | ORG | 0.669692 | Decept | 253 | 259 |
| 6 | MISC | 0.49835 | ##icons | 259 | 264 |
| 7 | MISC | 0.775361 | Megatron | 350 | 358 |
| 8 | MISC | 0.987854 | Optimus Prime | 367 | 380 |
| 9 | PER | 0.812096 | Bumblebee | 502 | 511 |
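
In the output above, "Megatron" and "Decepticons" are each split into two fragments with different labels ("Mega" / "##tron" and "Decept" / "##icons"). A minimal post-processing sketch, assuming that a fragment starting with `##` whose `start` equals the previous `end` continues the same word. The `merge_subword_entities` helper is hypothetical, and its choices (label of the highest-scoring fragment, minimum score for the merged entity) are heuristics rather than anything the pipeline does itself:

```python
def merge_subword_entities(entities):
    """Merge '##' continuation fragments into the entity that precedes them."""
    groups = []
    for ent in entities:
        continues_previous = (
            groups
            and ent["word"].startswith("##")
            and ent["start"] == groups[-1][-1]["end"]
        )
        if continues_previous:
            groups[-1].append(ent)
        else:
            groups.append([ent])

    merged = []
    for group in groups:
        # Heuristic: trust the label of the most confident fragment.
        best = max(group, key=lambda e: e["score"])
        merged.append({
            "entity_group": best["entity_group"],
            # Heuristic: the merged entity is only as certain as its weakest fragment.
            "score": min(e["score"] for e in group),
            # Drop the leading '##' from each continuation fragment.
            "word": group[0]["word"] + "".join(e["word"][2:] for e in group[1:]),
            "start": group[0]["start"],
            "end": group[-1]["end"],
        })
    return merged


# A few rows copied from the NER output above.
sample = [
    {"entity_group": "MISC", "score": 0.556568, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.590257, "word": "##tron", "start": 212, "end": 216},
    {"entity_group": "ORG", "score": 0.66969234, "word": "Decept", "start": 253, "end": 259},
    {"entity_group": "MISC", "score": 0.49834955, "word": "##icons", "start": 259, "end": 264},
    {"entity_group": "PER", "score": 0.81209606, "word": "Bumblebee", "start": 502, "end": 511},
]
result = merge_subword_entities(sample)
print([ent["word"] for ent in result])
```

Note that the label heuristic is not always right: here it would tag the merged "Megatron" as `PER`, so low-scoring merged entities are still worth reviewing.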


### Question Answering

In [11]:
reader = pipeline("question-answering")

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForQuestionAnswering.

All the weights of TFDistilBertForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForQuestionAnswering for predictions without further training.


In [12]:
question = "What does the customer want?"

In [13]:
outputs = reader(question=question, context=text)

In [14]:
outputs

{'score': 0.6312922239303589,
 'start': 335,
 'end': 358,
 'answer': 'an exchange of Megatron'}

In [15]:
pd.DataFrame([outputs])

| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.631292 | 335 | 358 | an exchange of Megatron |
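
The `start` and `end` values in the answer are character offsets into the context string, so the answer can be recovered by slicing. A small sketch that checks this against the values above and shows the answer in its surrounding text. The `show_in_context` helper and its `width` parameter are illustrative and not part of the pipeline API:

```python
text = """Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

# Values copied from the question-answering output above.
answer = {"score": 0.6312922239303589, "start": 335, "end": 358,
          "answer": "an exchange of Megatron"}

# Slicing the context with the offsets reproduces the answer string.
span = text[answer["start"]:answer["end"]]
assert span == answer["answer"]


def show_in_context(context, start, end, width=30):
    """Return the answer in [brackets] with up to `width` characters on each side."""
    left = context[max(0, start - width):start]
    right = context[end:end + width]
    return f"{left}[{context[start:end]}]{right}"


print(show_in_context(text, answer["start"], answer["end"]))
```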


### Summarization

In [16]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to google-t5/t5-small and revision d769bba (https://huggingface.co/google-t5/t5-small).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


In [17]:
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)

In [18]:
outputs

[{'summary_text': 'last week, I ordered an Optimus Prime action figure from your online store in germany. when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead'}]

In [19]:
print(outputs[0]["summary_text"])

last week, I ordered an Optimus Prime action figure from your online store in germany. when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead


### Translation

In [20]:
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-en-de.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.



In [21]:
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)

In [None]:
outputs

In [22]:
print(outputs[0]["translation_text"])

Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur bestellte ich. Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, von Ihnen bald zu hören. Aufrichtig, Bumblebee.


### Text Generation

In [27]:
generator = pipeline("text-generation")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [28]:
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."

In [29]:
prompt = text + "\n\nCustomer service response:\n" + response

In [32]:
outputs = generator(prompt, max_length=200, truncation=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [33]:
print(outputs[0]["generated_text"])

Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. The packaging itself
supports a lot, but the action figure I was ordered for had not been delivered by 9/03. The

action figure was sent to me in a box instead. I sent your service in-line and will send

this figure back in-line
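
As the printed result shows, `generated_text` repeats the whole prompt before the model's continuation. A minimal sketch of keeping only the newly generated part. The `strip_prompt` helper is hypothetical, and the strings below are shortened stand-ins for `prompt` and `outputs[0]["generated_text"]`:

```python
def strip_prompt(generated_text, prompt):
    """Return only the text that follows the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Fall back to the full text if the prompt was altered (for example, truncated).
    return generated_text


# Shortened stand-ins for the notebook's `prompt` and generated output.
prompt = ("Customer service response:\n"
          "Dear Bumblebee, I am sorry to hear that your order was mixed up.")
generated = prompt + " The packaging itself supports a lot"

continuation = strip_prompt(generated, prompt)
print(continuation)
```

As far as I know, the text-generation pipeline also accepts `return_full_text=False`, which should make this step unnecessary.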


### ~~Tests~~

In [1]:
import tensorflow as tf
from tensorflow.python.client import device_lib

# Count GPUs only; calling list_physical_devices() without a device type also counts CPUs.
print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))


Num GPUs Available:  0


In [2]:
tf.test.gpu_device_name()

''

In [3]:
tf.config.list_physical_devices("GPU")

[]

In [4]:
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 12919535774274552829
 xla_global_id: -1]