# 1 Hello Transformers

Before transformers, recurrent neural networks (RNNs) and long short-term memory (LSTM) networks were the standard models for sequence-to-sequence tasks such as machine translation. These models consist of an encoder and a decoder, and the encoder squeezes the whole input context into its final hidden state, which becomes an information bottleneck for long sequences. To give the decoder access to all of the encoder's hidden states, the **attention** mechanism was introduced (Bahdanau et al., 2014).

<img src="assets/ch1/1.png" width=750>
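
The idea of attending over all encoder hidden states can be sketched in a few lines of NumPy. This is a minimal sketch of the additive scoring used by Bahdanau et al. for a single decoder step, under stated assumptions: the random matrices stand in for learned weights, the random vectors stand in for real RNN hidden states, and the names `additive_attention`, `W_s`, `W_h`, and `v` are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s_prev, H, W_s, W_h, v):
    """Additive (Bahdanau-style) attention for one decoder step.

    s_prev: previous decoder hidden state, shape (d_dec,)
    H:      all encoder hidden states, shape (T, d_enc)
    W_s: (d_att, d_dec), W_h: (d_att, d_enc), v: (d_att,)
    """
    # Score every encoder state: e_j = v^T tanh(W_s s + W_h h_j), shape (T,)
    scores = np.tanh(H @ W_h.T + W_s @ s_prev) @ v
    weights = softmax(scores)   # attention weights over the T encoder states, sum to 1
    context = weights @ H       # weighted sum of ALL encoder states, shape (d_enc,)
    return context, weights

# Assumption: random values replace trained parameters and real RNN outputs.
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 6, 4
H = rng.normal(size=(T, d_enc))
s_prev = rng.normal(size=d_dec)
W_s = rng.normal(size=(d_att, d_dec))
W_h = rng.normal(size=(d_att, d_enc))
v = rng.normal(size=d_att)

context, weights = additive_attention(s_prev, H, W_s, W_h, v)
print(weights.round(3), context.shape)
```

Instead of relying only on the final hidden state `H[-1]`, the decoder receives a fresh context vector at every step, built from every encoder state in proportion to its attention weight.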


The **Transformer** architecture (Vaswani et al., 2017) removed recurrence altogether and replaced it with the **self-attention** mechanism, which lets every position in a sequence attend to every other position in parallel. The name is literal: the model transforms an input sequence into another sequence.
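
The core of self-attention is scaled dot-product attention, softmax(QKᵀ/√d_k)V, computed for all positions at once with matrix products. The sketch below is a single-head simplification, assuming random projection matrices in place of learned ones and omitting multi-head attention, positional encodings, and masking; the function name `self_attention` is hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: token embeddings, shape (T, d_model)
    W_q, W_k: (d_model, d_k); W_v: (d_model, d_v)
    """
    # Queries, keys and values are all projections of the SAME sequence X.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Every token scores every other token in one matrix product: no recurrence.
    scores = Q @ K.T / np.sqrt(d_k)      # shape (T, T)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # output shape (T, d_v)

# Assumption: random embeddings and projections stand in for trained ones.
rng = np.random.default_rng(0)
T, d_model, d_k = 4, 8, 8
X = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

Because the whole (T, T) score matrix comes from one product, all positions are processed in parallel, which is what removing the sequential nature of RNNs buys in practice.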

Inspired by pretraining and transfer learning in computer vision, NLP researchers also developed task-independent approaches to language modeling. Radford et al. (2017) showed that unsupervised generative pretraining can produce a useful base model. The ULMFiT paper (Howard and Ruder, 2018) proposes a three-step recipe: pretraining a language model => domain adaptation => task-specific fine-tuning. With this approach, instead of training a model on a large amount of labeled data for each task, a pretrained model serves as the base and only a small amount of labeled data is needed for task-specific fine-tuning.

Finally, the two workhorses of modern language models were introduced: the encoder-only Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) and the decoder-only Generative Pretrained Transformer (GPT) (Radford et al., 2018).
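
One concrete difference between the two families is how self-attention is masked. An encoder-only model such as BERT lets every token attend to the whole sequence (it is trained with masked language modeling), while a decoder-only model such as GPT uses a causal mask so that each token only sees the tokens before it (it is trained to predict the next token). The sketch below shows only this masking step, under the simplifying assumption of precomputed scores; `masked_attention_weights` is a hypothetical helper, not part of either model's codebase.

```python
import numpy as np

def masked_attention_weights(scores: np.ndarray, causal: bool) -> np.ndarray:
    """Turn raw (T, T) attention scores into weights, optionally with a causal mask."""
    T = scores.shape[0]
    if causal:
        # GPT-style: position i may only attend to positions j <= i.
        mask = np.tril(np.ones((T, T), dtype=bool))
        scores = np.where(mask, scores, -np.inf)
    # BERT-style (causal=False): every position attends to the whole sequence.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))  # equal scores make the effect of the mask easy to read
print(masked_attention_weights(scores, causal=False).round(2))
print(masked_attention_weights(scores, causal=True).round(2))
```

With equal scores, every row of the bidirectional matrix is uniform, whereas the causal matrix is lower-triangular: the first token can only attend to itself, and the last token spreads its weight over the whole prefix.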

# References

- D. Bahdanau et al., "Neural machine translation by jointly learning to align and translate", 2014.
- A. Vaswani et al., "Attention is all you need", 2017.
- A. Radford et al., "Learning to Generate Reviews and Discovering Sentiment", 2017.
- J. Howard and S. Ruder, "Universal language model fine-tuning for text classification", 2018.
- J. Devlin et al., "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding", 2018.
- A. Radford et al., "Improving Language Understanding by Generative Pre-Training", 2018.

```python
from transformers import pipeline
import pandas as pd

text = """Dear Amazon, last week I ordered an Optimus Prime action figure
from your online store in Germany. Unfortunately, when I opened the package,
I discovered to my horror that I had been sent an action figure of Megatron
instead! As a lifelong enemy of the Decepticons, I hope you can understand my
dilemma. To resolve the issue, I demand an exchange of Megatron for the
Optimus Prime figure I ordered. Enclosed are copies of my records concerning
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

# Sentiment analysis with a DistilBERT checkpoint fine-tuned on SST-2,
# pinned to a fixed model revision for reproducibility
classifier = pipeline(
    task="text-classification",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    revision="714eb0f",
)
pd.DataFrame(classifier(text))
```

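```python
# Named entity recognition; aggregation_strategy="simple" groups
# sub-word tokens that belong to the same entity into one row
ner_tagger = pipeline(
    task="ner",
    aggregation_strategy="simple",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    revision="4c53496",
)
outputs = ner_tagger(text)
pd.DataFrame(outputs)
```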

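```python
# Extractive question answering: the answer is a span copied from `context`
reader = pipeline(
    task="question-answering",
    model="distilbert/distilbert-base-cased-distilled-squad",
    revision="564e9b5",
)
question = "What does the customer want?"
outputs = reader(question=question, context=text)
# The pipeline returns a single dict, so wrap it in a list to get a one-row DataFrame
pd.DataFrame([outputs])
```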

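```python
# Abstractive summarization with a distilled BART model fine-tuned on CNN/DailyMail
summarizer = pipeline(
    task="summarization",
    model="sshleifer/distilbart-cnn-12-6",
    revision="a4f8f3e",
)
# min_length and max_length are measured in tokens, not words
outputs = summarizer(text, max_length=45, min_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]["summary_text"])
```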

```python
# English-to-German translation with T5-base
translator = pipeline(
    task="translation_en_to_de",
    model="google-t5/t5-base",
    revision="a9723ea",
)
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]["translation_text"])
```