# Chapter 1. Hello Transformers

This notebook follows Chapter 1 of the book Natural Language Processing with Transformers. I work through the chapter's source code while reading the book. The code for all the main chapters is available in the book's original GitHub repository: https://github.com/nlp-with-transformers/notebooks

## Text Classification

In [1]:
from transformers import pipeline

import torch
import pandas as pd

In [2]:
classifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


In [3]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

In [4]:
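outputs = classifier(text)
# The pipeline returns only the top label (NEGATIVE here). With two classes,
# the POSITIVE score is the complement of that score.
positive = {'label': 'POSITIVE', 'score': 1 - outputs[0]['score']}
outputs.append(positive)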

pd.DataFrame(outputs)

|   | label | score |
|---|-------|-------|
| 0 | NEGATIVE | 0.901546 |
| 1 | POSITIVE | 0.098454 |
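
Cell 4 derives the POSITIVE score as one minus the NEGATIVE score. This works because a two-class classifier typically converts its two raw scores (logits) into probabilities with a softmax, so the two probabilities sum to 1. The following is a minimal sketch in plain Python. The logit values are made up for illustration and are not the model's real outputs.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits)
    # Subtract the largest logit first for numerical stability.
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (NEGATIVE, POSITIVE); not taken from the real model.
negative, positive = softmax([1.2, -1.0])

print(f"NEGATIVE={negative:.4f} POSITIVE={positive:.4f}")
print(f"sum={negative + positive:.4f}")
```

Because the probabilities sum to 1, knowing one of them is enough to recover the other, which is what the `1 - outputs[0]['score']` expression relies on.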


## Named Entity Recognition

In [5]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


In [6]:
outputs = ner_tagger(text)
pd.DataFrame(outputs).sort_values('score', ascending=False)

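|   | entity_group | score | word | start | end |
|---|--------------|-------|------|-------|-----|
| 2 | LOC | 0.999755 | Germany | 90 | 97 |
| 1 | MISC | 0.990859 | Optimus Prime | 36 | 49 |
| 8 | MISC | 0.987854 | Optimus Prime | 367 | 380 |
| 0 | ORG | 0.87901 | Amazon | 5 | 11 |
| 9 | PER | 0.812096 | Bumblebee | 502 | 511 |
| 7 | MISC | 0.775361 | Megatron | 350 | 358 |
| 5 | ORG | 0.669692 | Decept | 253 | 259 |
| 4 | PER | 0.590258 | ##tron | 212 | 216 |
| 3 | MISC | 0.556568 | Mega | 208 | 212 |
| 6 | MISC | 0.49835 | ##icons | 259 | 264 |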
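
Some entities in the output are split into word pieces: "Mega" and "##tron", "Decept" and "##icons". The `##` prefix marks a piece that continues the previous token, and the `start`/`end` offsets of the pieces are contiguous. The following is a hedged sketch of one way to glue such pieces back together after the fact. The helper `merge_word_pieces` is hypothetical (it is not part of the transformers library), and keeping the label and score of the first piece is a simplifying assumption.

```python
def merge_word_pieces(entities):
    """Glue '##' continuation pieces onto the entity that directly precedes them."""
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        prev = merged[-1] if merged else None
        if prev and ent["word"].startswith("##") and ent["start"] == prev["end"]:
            # Continuation piece: extend the previous entity's text and span.
            # Assumption: the label and score of the first piece are kept.
            prev["word"] += ent["word"][2:]
            prev["end"] = ent["end"]
        else:
            merged.append(dict(ent))  # copy so the input rows stay untouched
    return merged

# Rows copied from the NER output above.
fragments = [
    {"entity_group": "MISC", "score": 0.556568, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.590258, "word": "##tron", "start": 212, "end": 216},
    {"entity_group": "ORG", "score": 0.669692, "word": "Decept", "start": 253, "end": 259},
    {"entity_group": "MISC", "score": 0.498350, "word": "##icons", "start": 259, "end": 264},
]

merged_entities = merge_word_pieces(fragments)
for ent in merged_entities:
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"])
```

The pieces are split in the first place because the model assigned them different entity labels, so any merge rule has to pick one label. Which rule is appropriate depends on the application.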


## Question Answering

In [7]:
reader = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


In [8]:
question = "What does the customer want?"

outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

|   | score | start | end | answer |
|---|-------|-------|-----|--------|
| 0 | 0.631292 | 335 | 358 | an exchange of Megatron |
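
The `start` and `end` values in this output are character offsets into the context string, so slicing the original text with them recovers the answer span. A small self-contained check, using the values copied from the output above:

```python
text = """Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

# Values copied from the question-answering output above.
answer = {"start": 335, "end": 358, "answer": "an exchange of Megatron"}

# Slice the context with the character offsets returned by the pipeline.
span = text[answer["start"]:answer["end"]]
print(repr(span))
```

Working with offsets rather than the answer string avoids ambiguity when the same phrase occurs more than once in the context.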


In [9]:
BOLD = '\033[1m'
END = '\033[0m'

print(text.replace(outputs['answer'], BOLD + '_' + outputs['answer'] + '_' + END))

Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand [1m_an exchange of Megatron_[0m for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee.


## Summarization

In [10]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


In [11]:
outputs = summarizer(text, max_length=57, clean_up_tokenization_spaces=True)

print(outputs[0]['summary_text'])

 Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead. As a lifelong enemy of the Decepticons, I hope
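
The summary above stops mid-sentence ("I hope") because generation is cut off once `max_length` tokens are reached. One possible post-processing step is to drop the trailing fragment. The following is a minimal sketch; `trim_to_last_sentence` is a hypothetical helper, not something the pipeline provides, and it assumes sentences end in `.`, `!` or `?`.

```python
import re

def trim_to_last_sentence(summary):
    """Drop a trailing fragment that does not end in '.', '!' or '?'."""
    summary = summary.strip()
    # Greedy match up to the last sentence terminator in the string.
    match = re.search(r"^(.*[.!?])", summary, flags=re.DOTALL)
    # If there is no terminator at all, return the text unchanged.
    return match.group(1) if match else summary

# Summary text copied from the output above.
summary_text = (
    " Bumblebee ordered an Optimus Prime action figure from your online store "
    "in Germany. Unfortunately, when I opened the package, I discovered to my "
    "horror that I had been sent an action figure of Megatron instead. As a "
    "lifelong enemy of the Decepticons, I hope"
)

trimmed = trim_to_last_sentence(summary_text)
print(trimmed)
```

This only tidies the text. It does not recover the content that was cut off, so raising `max_length` may be the better choice when the full summary matters.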


## Translation

In [12]:
translator = pipeline(
    "translation_en_to_de",
    model="Helsinki-NLP/opus-mt-en-de"
)

In [13]:
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)

print(outputs[0]['translation_text'])

Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt. Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, bald von Ihnen zu hören. Aufrichtig, Bumblebee.


## Text Generation

In [14]:
generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)


In [15]:
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response

print(prompt)

Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up.


In [16]:
prompt_encode_ids = generator.tokenizer.encode(prompt)

In [17]:
generator.tokenizer.decode(prompt_encode_ids)

'Dear Amazon, last week I ordered an Optimus Prime action figure\nfrom your online store in Germany. Unfortunately, when I opened the package,\nI discovered to my horror that I had been sent an action figure of Megatron\ninstead! As a lifelong enemy of the Decepticons, I hope you can understand my\ndilemma. To resolve the issue, I demand an exchange of Megatron for the\nOptimus Prime figure I ordered. Enclosed are copies of my records concerning\nthis purchase. I expect to hear from you soon. Sincerely, Bumblebee.\n\nCustomer service response:\nDear Bumblebee, I am sorry to hear that your order was mixed up.'

In [18]:
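# Indexing with [None] adds a batch dimension: shape (1, sequence_length).
torch_ids = torch.LongTensor(prompt_encode_ids)[None]
# max_length counts the prompt tokens plus the newly generated tokens.
generated_preds = generator.model.generate(torch_ids, max_length=200)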

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [19]:
generated_preds.shape, generated_preds

(torch.Size([1, 200]),
 tensor([[20266,  6186,    11,   938,  1285,   314,  6149,   281, 44863,  5537,
           2223,  3785,   198,  6738,   534,  2691,  3650,   287,  4486,    13,
           8989,    11,   618,   314,  4721,   262,  5301,    11,   198,    40,
           5071,   284,   616,  9961,   326,   314,   550,   587,  1908,   281,
           2223,  3785,   286,  8336, 23484,   198, 38070,     0,  1081,   257,
          25837,  4472,   286,   262,  1024,   984, 34280,    11,   314,  2911,
            345,   460,  1833,   616,   198,    67,   576, 21672,    13,  1675,
          10568,   262,  2071,    11,   314,  3512,   281,  5163,   286,  8336,
          23484,   329,   262,   198, 27871, 20704,  5537,  3785,   314,  6149,
             13,  2039, 20225,   389,  9088,   286,   616,  4406,  9305,   198,
           5661,  5001,    13,   314,  1607,   284,  3285,   422,   345,  2582,
             13,  4619, 38015,    11,   347, 10344, 20963,    13,   198,   198,
          44939, 

In [20]:
print(generator.tokenizer.decode(generated_preds[0].numpy()))

Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. However, you have received the following response: "Thank you." You can view the original order here:

www.mybidders.etsy.com/shop/Megatron/

Note: If the order shipped and the packaging arrived in the other order's shipping addresses
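
The decoded output repeats the whole prompt before the model's continuation, because a decoder-only model such as GPT-2 returns the prompt ids followed by the newly generated ids. To keep only the continuation, the sequence can be sliced at the prompt length. The following sketch uses plain Python lists, and the continuation ids are made up for illustration.

```python
# The first three prompt ids from the tensor above ("Dear Amazon,").
prompt_ids = [20266, 6186, 11]

# Hypothetical continuation ids; a real run would produce different values.
continuation = [2102, 11, 345]

# For a decoder-only model, the generated sequence starts with the prompt ids.
generated_ids = prompt_ids + continuation

# Keep only what comes after the prompt.
new_ids = generated_ids[len(prompt_ids):]
print(new_ids)
```

With the notebook's objects, the equivalent slice would be `generated_preds[0][len(prompt_encode_ids):]`, which can then be passed to `generator.tokenizer.decode`.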
