In [None]:
# Uncomment and run this cell if you're on Colab or Kaggle
# !git clone https://github.com/nlp-with-transformers/notebooks.git
# %cd notebooks
# from install import *
# install_requirements()

In [1]:
#hide
from utils import *
setup_chapter()

No GPU was detected! This notebook can be *very* slow without a GPU 🐢
Using transformers v4.16.2
Using datasets v1.16.1


# Hello Transformers

Transformers outperformed recurrent neural networks (RNNs) on machine translation tasks from their inception.  
Another notable advance was Universal Language Model Fine-Tuning, or [ULMFiT](https://arxiv.org/abs/1801.06146), which pretrains a long short-term memory (LSTM) network on "a very large and diverse corpus" so that it can then be fine-tuned on datasets with few labels. Together, transformers and transfer learning led to the next big breakthroughs in NLP: the Generative Pretrained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT). Below is a timeline of major NLP breakthroughs that followed the publication of the original Transformer architecture.

The goal of this chapter is to understand the three concepts that make transformers so well suited to NLP tasks:
* Encoder-decoder framework
* Attention mechanisms
* Transfer learning

<img alt="transformer-timeline" caption="The transformers timeline" src="images/chapter01_timeline.png" id="transformer-timeline"/>

## The Encoder-Decoder Framework

**Helpful Resources**
* [RNNs Clearly Explained](https://www.youtube.com/watch?v=AsNTP8Kwu80)

### [Review] Before transformers, RNNs were King

**How do RNNs Work?**

Essentially, an RNN is a neural network with a feedback loop: the output of its hidden layer at one time step is fed back into that same layer at the next time step.


**Recurrent versus Feed-Forward NNs** 

An RNN's hidden layer feeds its output, the hidden state, back into itself in a feedback loop. Running the input for the first time step (State $t_0$) through the layers produces a prediction for that time step, just as in a feed-forward NN, which can only pass the outputs of each layer's activation functions forward to the next layer. In an RNN, however, the hidden state computed at $t_0$ is passed along together with the next input (State $t_1$), so the output $y_{t_1}$ depends on both the current input and the preceding ones. The same weights and biases are reused at every time step, so information about earlier inputs is carried forward rather than discarded. This is what gives RNNs the ability to handle sequential data.

<img alt="rnn-feedback-loop1" caption="Feeding the hidden-layer output for input 1 into the layer that processes input 2 (from RNNs Clearly Explained on YouTube)." src="images/RNNs_feedback_loop_1.png" id="rnn-feedback-loop1" width="800px"/>
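The feedback loop described above can be sketched in a few lines of NumPy. This is a minimal, untrained sketch, assuming a plain (Elman-style) RNN cell with a tanh activation; the sizes, the random weights, and the names `rnn_step`, `W_xh`, and `W_hh` are hypothetical choices for illustration, not code from any library.

```python
import numpy as np

# Hypothetical toy sizes and random (untrained) weights, for illustration only
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)

# A single set of weights and biases, shared by every time step
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden -> hidden (the feedback loop)
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Combine the current input with the hidden state from the previous time step."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# A sequence of three inputs, i.e. time steps t0, t1, t2
xs = [rng.normal(size=input_size) for _ in range(3)]

h = np.zeros(hidden_size)  # there is no history before t0
hidden_states = []
for x_t in xs:
    h = rnn_step(x_t, h)   # h now summarizes x_t and every input before it
    hidden_states.append(h)
print(np.round(hidden_states[-1], 3))

# Perturbing only the first input changes the final hidden state,
# because information from t0 is carried forward through the loop
h_alt = np.zeros(hidden_size)
for x_t in [xs[0] + 1.0] + xs[1:]:
    h_alt = rnn_step(x_t, h_alt)
print(np.allclose(h_alt, hidden_states[-1]))
```

The loop calls the same `rnn_step` with the same weights at every step; only the hidden state `h` changes from one time step to the next.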


RNNs can also accommodate variable-length inputs, whereas feed-forward NNs accept only a fixed number of inputs. This is useful for applications such as language translation, where input texts vary in length.

<img alt="rnn" caption="Unrolling an RNN in time." src="images/chapter01_rnn.png" id="rnn"/>
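To make the difference concrete, here is a sketch, again assuming a toy tanh RNN with hypothetical random weights, in which the same cell processes a 3-step and a 7-step sequence and always returns a hidden state of the same size, while a feed-forward layer sized for 3 inputs cannot accept the longer sequence. The name `encode` and all sizes are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy sizes and random (untrained) weights
rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def encode(sequence):
    """Run the same RNN cell over a sequence of any length; return the final hidden state."""
    h = np.zeros(hidden_size)
    for x_t in sequence:  # iterating over a 2D array yields one row (time step) at a time
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h

short_sentence = rng.normal(size=(3, input_size))  # e.g. 3 word vectors
long_sentence = rng.normal(size=(7, input_size))   # e.g. 7 word vectors

print(encode(short_sentence).shape)  # (4,)
print(encode(long_sentence).shape)   # (4,)

# A feed-forward layer, by contrast, is built for one fixed input size
W_ff = rng.normal(size=(hidden_size, 3 * input_size))  # expects exactly 3 word vectors
print((W_ff @ short_sentence.reshape(-1)).shape)       # (4,)
# W_ff @ long_sentence.reshape(-1) would raise a ValueError (21 features vs. 9 expected)
```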

For tasks such as translation, two RNNs can be paired in an encoder-decoder architecture: the first RNN, the encoder, converts the words of the source language into a numerical representation, its final hidden state, and a second RNN, the decoder, turns that representation into words in the target language. A weakness of this architecture is that the encoder's final hidden state becomes an information bottleneck: the whole input text is compressed into a single fixed-size representation before it is passed to the decoder, so information is often lost, especially for long inputs.  


<img alt="enc-dec" caption="Encoder-decoder architecture with a pair of RNNs. In general, there are many more recurrent layers than those shown." src="images/chapter01_enc-dec.png" id="enc-dec"/>
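The bottleneck is easy to see in a toy encoder-decoder. This sketch assumes simple tanh RNN cells, untrained random weights, greedy decoding over a hypothetical 5-token target vocabulary, and illustrative names such as `encoder` and `decoder`. A 3-step and a 30-step source sequence are both squeezed into a context vector of the same size before the decoder ever sees them; the generated token ids are meaningless because nothing has been trained.

```python
import numpy as np

rng = np.random.default_rng(2)
emb_size, hidden_size, tgt_vocab = 3, 4, 5  # hypothetical toy sizes

# Separate (untrained) weights for the encoder RNN and the decoder RNN
enc_W_xh = rng.normal(scale=0.5, size=(hidden_size, emb_size))
enc_W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
dec_W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
dec_W_yh = rng.normal(scale=0.5, size=(hidden_size, tgt_vocab))  # previous token -> hidden
W_out = rng.normal(scale=0.5, size=(tgt_vocab, hidden_size))     # hidden -> token scores

def encoder(source):
    h = np.zeros(hidden_size)
    for x_t in source:
        h = np.tanh(enc_W_xh @ x_t + enc_W_hh @ h)
    return h  # this single vector is the ONLY thing the decoder receives

def decoder(context, n_steps):
    h, prev = context, np.zeros(tgt_vocab)
    tokens = []
    for _ in range(n_steps):
        h = np.tanh(dec_W_hh @ h + dec_W_yh @ prev)
        token = int(np.argmax(W_out @ h))  # greedy choice of the next token id
        tokens.append(token)
        prev = np.eye(tgt_vocab)[token]    # feed the chosen token back in as a one-hot vector
    return tokens

contexts = {}
for length in (3, 30):
    source = rng.normal(size=(length, emb_size))
    contexts[length] = encoder(source)
    print(length, contexts[length].shape, decoder(contexts[length], n_steps=4))
```

However long the source is, everything the decoder knows about it has to fit into `hidden_size` numbers.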

## Attention Mechanisms

Attention mechanisms get around this bottleneck by allowing the decoder to access all of the encoder's hidden states, not just the final one. At each decoding step, the attention mechanism assigns a different weight to every encoder hidden state, so the decoder can prioritize the input states that are most relevant to the output it is currently generating.

<img alt="enc-dec-attn" caption="Encoder-decoder architecture with an attention mechanism for a pair of RNNs." src="images/chapter01_enc-dec-attn.png" id="enc-dec-attn"/> 
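The weighting step can be sketched as follows. This is a minimal sketch that assumes dot-product scoring, the simpler multiplicative variant (the original RNN attention of Bahdanau et al. instead scores states with a small feed-forward network), and random stand-in hidden states; `attend` is a hypothetical helper name.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Dot-product attention: score every encoder hidden state against the current decoder state."""
    scores = encoder_states @ decoder_state  # one score per source position
    weights = softmax(scores)                # non-negative and summing to 1
    context = weights @ encoder_states       # weighted average of ALL encoder states
    return context, weights

rng = np.random.default_rng(3)
encoder_states = rng.normal(size=(6, 4))  # stand-ins: 6 source positions, hidden size 4
decoder_state = rng.normal(size=4)        # stand-in for the current decoder state

context, weights = attend(decoder_state, encoder_states)
print(np.round(weights, 3), weights.sum())
print(context.shape)  # (4,)
```

Because `attend` is called again with a new `decoder_state` at every decoding step, each output word gets its own set of weights over the same encoder states.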


Prioritizing encoder states helps the decoder learn non-trivial alignments between the words of the original text and those of the generated translation. For example, in the figure below, the heatmap shows the strongest attention weights (lighter pixels) mapping English words to their French translations, regardless of the order in which the words appear. This matters for many English-to-X translations, because the word order dictated by the target language's grammar often differs from English sentence structure (notice that "zone" in French and its English counterpart "Area" appear at different positions in their sentences).

<img alt="attention-alignment" width="500" caption="RNN encoder-decoder alignment of words in English and the generated translation in French (courtesy of Dzmitry Bahdanau)." src="images/chapter02_attention-alignment.png" id="attention-alignment"/> 
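A heatmap like this is simply the matrix of attention weights, with one row per generated word and one column per source word. The sketch below uses hand-made, hypothetical hidden states (a trained model would learn them) and the phrases "the European Economic Area" / "la zone économique européenne" purely as labels, to show how taking the largest weight in each row recovers an alignment that reorders the words.

```python
import numpy as np

source = ["the", "European", "Economic", "Area"]
target = ["la", "zone", "économique", "européenne"]

# Hypothetical hand-made hidden states: each encoder state is a scaled one-hot vector,
# and each decoder state copies the encoder state of the source word it translates
encoder_states = 2.0 * np.eye(len(source))
alignment = [0, 3, 2, 1]                    # target position -> source position
decoder_states = encoder_states[alignment]

# Dot-product scores, then a softmax over source positions for every target word
scores = decoder_states @ encoder_states.T  # shape (target_len, source_len)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

print(np.round(weights, 2))                 # the "heatmap": large values = bright pixels
for tgt_word, row in zip(target, weights):
    print(f"{tgt_word:>11} -> {source[row.argmax()]}")
```

Reading off the largest weight in each row maps "zone" to "Area" and "européenne" to "European", even though these pairs sit at different positions in the two phrases.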

<img alt="transformer-self-attn" caption="Encoder-decoder architecture of the original Transformer." src="images/chapter01_self-attention.png" id="transformer-self-attn"/> 

## Transfer Learning in NLP

<img alt="transfer-learning" caption="Comparison of traditional supervised learning (left) and transfer learning (right)." src="images/chapter01_transfer-learning.png" id="transfer-learning"/>  

<img alt="ulmfit" width="500" caption="The ULMFiT process (courtesy of Jeremy Howard)." src="images/chapter01_ulmfit.png" id="ulmfit"/>

## Hugging Face Transformers: Bridging the Gap

## A Tour of Transformer Applications

In [None]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

### Text Classification

In [None]:
#hide_output
from transformers import pipeline

classifier = pipeline("text-classification")

In [None]:
import pandas as pd

outputs = classifier(text)
pd.DataFrame(outputs)    

      label     score
0  NEGATIVE  0.901546


### Named Entity Recognition

In [None]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)    

   entity_group     score           word  start  end
0           ORG   0.87901         Amazon      5   11
1          MISC  0.990859  Optimus Prime     36   49
2           LOC  0.999755        Germany     90   97
3          MISC  0.556569           Mega    208  212
4           PER  0.590256         ##tron    212  216
5           ORG  0.669692         Decept    253  259
6          MISC   0.49835        ##icons    259  264
7          MISC  0.775361       Megatron    350  358
8          MISC  0.987854  Optimus Prime    367  380
9           PER  0.812096      Bumblebee    502  511


### Question Answering 

In [None]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])    

      score  start  end                   answer
0  0.631291    335  358  an exchange of Megatron


### Summarization

In [None]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

 Bumblebee ordered an Optimus Prime action figure from your online store in
Germany. Unfortunately, when I opened the package, I discovered to my horror
that I had been sent an action figure of Megatron instead.


### Translation

In [None]:
translator = pipeline("translation_en_to_de", 
                      model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus
Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete,
entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von
Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich
hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere
einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt.
Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, bald von
Ihnen zu hören. Aufrichtig, Bumblebee.


### Text Generation

In [None]:
#hide
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results

In [None]:
generator = pipeline("text-generation")
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

Dear Amazon, last week I ordered an Optimus Prime action figure from your online
store in Germany. Unfortunately, when I opened the package, I discovered to my
horror that I had been sent an action figure of Megatron instead! As a lifelong
enemy of the Decepticons, I hope you can understand my dilemma. To resolve the
issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase. I expect to hear
from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. The order was
completely mislabeled, which is very common in our online store, but I can
appreciate it because it was my understanding from this site and our customer
service of the previous day that your order was not made correct in our mind and
that we are in a process of resolving this matter. We can assure you that your
order


## The Hugging Face Ecosystem

<img alt="ecosystem" width="500" caption="An overview of the Hugging Face ecosystem of libraries and the Hub." src="images/chapter01_hf-ecosystem.png" id="ecosystem"/>

### The Hugging Face Hub

<img alt="hub-overview" width="1000" caption="The models page of the Hugging Face Hub, showing filters on the left and a list of models on the right." src="images/chapter01_hub-overview.png" id="hub-overview"/> 

<img alt="hub-model-card" width="1000" caption="An example model card from the Hugging Face Hub. The inference widget is shown on the right, where you can interact with the model." src="images/chapter01_hub-model-card.png" id="hub-model-card"/> 

### Hugging Face Tokenizers

### Hugging Face Datasets

### Hugging Face Accelerate

## Main Challenges with Transformers

## Conclusion