# __Week 3__: Attention & Transformers
***
### Preliminaries:
##### Sequence-to-Sequence Models
* A sequence-to-sequence model takes a sequence of items (such as words or images) and outputs another sequence of items
    * For example, a __translation__ task: "I am sad." --> "Yo estoy triste."

* Sequence-to-sequence models consist of an __encoder__ and a __decoder__ (both of which are neural networks).
    * The encoder processes each item in the input, compiles what it captures into a __context__ vector, then sends it to the decoder.
        * The encoder is typically a recurrent neural network (RNN), which at each step takes an input item and the previous __hidden state__.
        * The last hidden state of the encoder is passed to the decoder as the context.
    * The decoder then begins producing the output item by item.
        * The decoder also maintains a hidden state that it updates at each step.
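
The encoder/decoder flow above can be sketched in a few lines of NumPy. This is only a toy illustration, not a trained model: the weight matrices (`W_xh`, `W_hh`, `W_dec`), the sizes, and the random "embeddings" are all assumptions made up for the example. It shows the encoder updating its hidden state item by item, the last hidden state being handed over as the context, and the decoder updating its own hidden state at each output step.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, embed_size = 4, 3

# Hypothetical, randomly initialised weights (a real model would learn these).
W_xh = rng.normal(size=(embed_size, hidden_size))    # input -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))   # hidden -> hidden (encoder)
W_dec = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden (decoder)

def encode(inputs):
    """Run the encoder RNN over every input item and return all hidden states."""
    h = np.zeros(hidden_size)
    states = []
    for x in inputs:
        # Each step combines the current input with the previous hidden state.
        h = np.tanh(x @ W_xh + h @ W_hh)
        states.append(h)
    return np.stack(states)

def decode(context, steps):
    """Start from the context vector and update the decoder state at each step."""
    h = context
    states = []
    for _ in range(steps):
        h = np.tanh(h @ W_dec)
        states.append(h)
    return np.stack(states)

inputs = rng.normal(size=(5, embed_size))  # five input items as toy embeddings
encoder_states = encode(inputs)
context = encoder_states[-1]               # only the LAST hidden state is passed on
decoder_states = decode(context, steps=3)

print(encoder_states.shape, context.shape, decoder_states.shape)
```

A real decoder would also map each hidden state to an output word; that projection is left out to keep the sketch short.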

In [1]:
from googletrans import Translator, constants

In [2]:
translator = Translator()

In [6]:
# Spanish, German, Tagalog, Farsi, Urdu, Japanese
langs = ['es', 'de', 'tl', 'fa', 'ur', 'ja']

In [5]:
string = "I am very excited to learn about mathematics and natural language."

In [12]:
for lang in langs:
    translation = translator.translate(string, dest=lang)
    print(translation.text)
    print("")

Estoy muy emocionado de aprender sobre matemáticas y lenguaje natural.

Ich freue mich sehr, etwas über Mathematik und natürliche Sprache zu lernen.

Labis akong nasasabik na malaman ang tungkol sa matematika at likas na wika.

من بسیار مشتاق هستم که ریاضیات و زبان طبیعی را بیاموزم.

میں ریاضی اور قدرتی زبان کے بارے میں جان کر بہت پرجوش ہوں۔

私は数学と自然言語について学ぶことにとても興奮しています。



***

## Attention
* Sequence-to-sequence models tend to struggle with longer sentences, because a single context vector has to summarize the entire input.
* Attention works with RNNs to let the model *give more attention* to the relevant parts of the input sequence as needed.
    * Attention models pass all of the encoder's hidden states to the decoder, rather than only the last hidden state as in plain sequence-to-sequence models.
    * Each hidden state is most associated with a specific item in the input. The decoder scores every hidden state and weights it by that score, amplifying hidden states with high scores and drowning out those with low scores.
    * Attention can be thought of as mapping a query and a set of key-value pairs to an output.
    
<br>
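
The scoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions: dot-product scoring is one common choice (not the only one), and the hidden states here are hand-picked toy vectors. In query/key-value terms, the decoder state acts as the query, and the encoder hidden states act as both the keys and the values.

```python
import numpy as np

def softmax(x):
    """Turn raw scores into positive weights that sum to 1."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Score every encoder hidden state against the decoder state (the query)."""
    scores = encoder_states @ decoder_state  # one dot-product score per input item
    weights = softmax(scores)                # high scores amplified, low ones drowned out
    context = weights @ encoder_states       # weighted sum of the hidden states
    return weights, context

# Toy hidden states for a three-item input (hypothetical values).
encoder_states = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 1.0]])
decoder_state = np.array([2.0, 0.0])

weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

The first and third hidden states point in the same direction as the decoder state, so they receive equal, larger weights than the second one.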

* __Multi-head Attention__ - allows the model to jointly attend to information from different representation subspaces at different positions

* __Self-attention__ - relates each position in a sequence to every other position in the same sequence; used in place of RNNs to compute encodings
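
To make the two definitions above concrete, here is a rough NumPy sketch of multi-head self-attention using scaled dot-product attention. The projection matrices (`W_q`, `W_k`, `W_v`, `W_o`) are random stand-ins for learned weights, and the sizes are arbitrary assumptions. Each head works in its own smaller subspace, and because the queries, keys, and values all come from the same sequence `X`, this is self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Map queries and key-value pairs to outputs; returns outputs and weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    def split(M):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    # Queries, keys and values are all projections of the SAME sequence X.
    Q, K, V = split(X @ W_q), split(X @ W_k), split(X @ W_v)
    heads, weights = scaled_dot_product_attention(Q, K, V)
    # Concatenate the heads back to (seq_len, d_model) and mix them with W_o.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o, weights

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 6, 8, 2
X = rng.normal(size=(seq_len, d_model))  # toy token representations
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

output, weights = multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads)
print(output.shape)   # one new representation per token
print(weights.shape)  # one (seq_len x seq_len) attention map per head
```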

<img src="figures/hydra.png" alt="multihead" title="multihead" width="900"/>

In [19]:
from transformers import AutoTokenizer, AutoModel
from bertviz import model_view

In [21]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased", output_attentions=True)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_layer_norm.bias', 'vocab_projector.bias', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_transform.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [26]:
inputs = tokenizer.encode("I am very excited to learn about mathematics and natural language.", return_tensors='pt')
outputs = model(inputs)

In [27]:
attention = outputs[-1]  # Output includes attention weights when output_attentions=True
tokens = tokenizer.convert_ids_to_tokens(inputs[0]) 

In [28]:
model_view(attention, tokens)

<IPython.core.display.Javascript object>

***

## Transformers
* Transformers speed up the training of attention-based models and lend themselves well to parallelization
* They rely solely on self-attention, with no recurrence, to compute representations of their inputs and outputs
<br>
* A Transformer starts by creating an embedding for each word; self-attention then aggregates information from all the other words, generating a new representation per word informed by the entire context
    * This step is repeated multiple times, in parallel for all words, successively generating new representations
    * The decoder works similarly, but generates the output sequentially, one word at a time
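
The steps above can be sketched roughly as follows. This is a heavily simplified illustration, not a full Transformer: it omits learned projections, multiple heads, feed-forward layers, residual connections, and positional encodings, and the embeddings are random. It shows the encoder repeatedly refining every word's representation at once, and uses a causal mask as one way to model a decoder that may only look at earlier positions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, mask=None):
    """Simplified self-attention: X serves as queries, keys and values."""
    scores = X @ X.T / np.sqrt(X.shape[-1])
    if mask is not None:
        # Disallowed positions get -inf, so their softmax weight is exactly 0.
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores)
    return weights @ X, weights

rng = np.random.default_rng(0)
num_words, d_model = 4, 8
X = rng.normal(size=(num_words, d_model))  # one toy embedding per word

# Encoder: every word attends to every word, all rows computed at once,
# and the step is repeated to build successively refined representations.
H = X
for _ in range(3):
    H, enc_weights = self_attention(H)

# Decoder: position i may only attend to positions <= i (sequential generation).
causal_mask = np.tril(np.ones((num_words, num_words), dtype=bool))
_, dec_weights = self_attention(X, mask=causal_mask)

print(np.round(dec_weights, 2))  # upper triangle is all zeros
```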

<img src="figures/transformer.png" alt="transformer example" title="weights" width="900"/>

### Transformer Tasks

In [14]:
from transformers import pipeline

##### Question Answering

In [7]:
question_answerer = pipeline('question-answering')

In [36]:
context = "Spongebob is Patrick's very best friend. Squidward hates Patrick. \
Spongebob is also Sally's best friend. Sally is not Patrick's best friend. \
Mr. Krabs loves both his girlfriend Ms. Puff and his daughter Pearl."

questions = ["Who is Patrick's best friend?", "Does anyone hate Patrick?", "Who is Mr. Krabs daughter?"]

In [37]:
for q in questions:
    result = question_answerer(question=q, context=context)
    print(q)
    print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}")
    print("")

Who is Patrick's best friend?
Answer: 'Spongebob', score: 0.889

Does anyone hate Patrick?
Answer: 'Squidward', score: 0.725

Who is Mr. Krabs daughter?
Answer: 'Pearl', score: 0.9661



##### Sentiment Analysis

In [15]:
classifier = pipeline("sentiment-analysis")

In [16]:
result = classifier("I never smile when I'm around you.")[0]

In [17]:
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: NEGATIVE, with score: 0.9373


In [44]:
sentences = ["I hate your guts.", "I love machine learning.",
             "You are the absolute worst.", "The sun is shining and the birds are singing."]

In [45]:
for s in sentences:
    result = classifier(s)[0]
    print(s)
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
    print("")

I hate your guts.
label: NEGATIVE, with score: 0.9993

I love machine learning.
label: POSITIVE, with score: 0.9998

You are the absolute worst.
label: NEGATIVE, with score: 0.9998

The sun is shining and the birds are singing.
label: POSITIVE, with score: 0.9998



##### Text Generation

In [19]:
text_generator = pipeline("text-generation")

In [20]:
starts = ["If you go to New York", "I believe the next president will be", "Tomorrow brings", "My wife is"]

In [23]:
gen_texts = []
for s in starts:
    # do_sample=False makes generation deterministic (no random sampling)
    x = text_generator(s, max_length=35, do_sample=False)
    gen_texts.append(x[0]['generated_text'])
for gt in gen_texts:
    print(gt)
    print("")
    print("")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


If you go to New York City, you'll see that the city is a very different place. It's a very different place. It's a very different place. It


I believe the next president will be a man of action, not a man of fear."

The president's first major speech to Congress was on the economy, which he


Tomorrow brings us to the final chapter of the story.

The story begins with the arrival of the first of the new heroes, the Dark Lord, who is the son


My wife is a very good cook and I love to cook. I have a lot of friends who are very good at cooking and I love to cook. I have a lot
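
Several of the outputs above fall into repetitive loops. With `do_sample=False` (and assuming the default single beam), generation is greedy: the most likely next token is always chosen. The toy sketch below illustrates how that can loop. The probability table is entirely made up, and unlike a real language model it conditions only on the previous token rather than the whole prefix.

```python
# Hypothetical next-token probabilities, invented for illustration only.
next_token_probs = {
    "it's": {"a": 0.9, "very": 0.1},
    "a": {"very": 0.6, "different": 0.4},
    "very": {"different": 0.8, "good": 0.2},
    "different": {"place.": 0.7, "city.": 0.3},
    "place.": {"it's": 0.6, "the": 0.4},
}

def greedy_decode(start, max_tokens):
    """Always append the single most probable next token (no sampling)."""
    tokens = [start]
    while len(tokens) < max_tokens:
        candidates = next_token_probs.get(tokens[-1])
        if not candidates:
            break  # no known continuation for this token
        tokens.append(max(candidates, key=candidates.get))
    return tokens

tokens = greedy_decode("it's", max_tokens=11)
print(" ".join(tokens))
```

Because the choice is deterministic, the sequence cycles as soon as it returns to a token it has already visited. Sampling (`do_sample=True`) is one common way to break such loops.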




## Sources:
[1] https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/ <br>
[2] https://jalammar.github.io/illustrated-transformer/ <br>
[3] https://nlp.seas.harvard.edu/2018/04/03/attention.html <br>
[4] https://github.com/terryyin/translate-python <br>
[5] https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html <br>
[6] https://huggingface.co/transformers/task_summary.html <br>
[7] https://www.thepythoncode.com/article/translate-text-in-python