## Deep Learning for Natural Language Processing
# Attention
- https://towardsdatascience.com/attn-illustrated-attention-5ec4ad276ee3
<img src='https://miro.medium.com/max/700/1*qN2Pj5J4VqAFf7dsA2dHpA.png' />

<img src='https://1.bp.blogspot.com/-AVGK0ApREtk/WaiAuzddKVI/AAAAAAAAB_A/WPV5ropBU-cxrcMpqJBFHg73K9NX4vywwCLcBGAs/s640/image2.png' />

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import tensorflow as tf
from tensorflow import keras

### Applying attention to a seq2seq model

In [21]:
X_enc = np.random.randn(2,5,3).astype(np.float32) # encoder input: (batch, enc_len, features), cast to float32
X_dec = np.random.randn(2,4,3).astype(np.float32) # decoder input: (batch, dec_len, features), cast to float32

In [22]:
lstm_enc = keras.layers.LSTM(3, return_sequences=True, return_state=True)
lstm_dec = keras.layers.LSTM(3)

In [23]:
y, h, c = lstm_enc(X_enc)

y.shape, h.shape, c.shape

(TensorShape([2, 5, 3]), TensorShape([2, 3]), TensorShape([2, 3]))
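To make the three return values concrete, here is a minimal NumPy sketch of an LSTM forward pass. The helper `lstm_forward` and the random weights `W`, `U`, `b` are hypothetical, and the gate ordering is chosen for illustration only; it is not meant to reproduce the Keras weight layout. It shows that `y` stacks the hidden state of every time step, `h` is the last of those hidden states, and `c` is the last cell state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b):
    """Run a single-layer LSTM over x of shape (batch, time, features).

    W: (features, 4*units), U: (units, 4*units), b: (4*units,).
    Returns (all hidden states, last hidden state, last cell state).
    """
    batch, time, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))
    c = np.zeros((batch, units))
    outputs = []
    for t in range(time):
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=-1)   # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    y = np.stack(outputs, axis=1)              # (batch, time, units)
    return y, h, c

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 3))
W = rng.standard_normal((3, 12)) * 0.1         # hypothetical weights
U = rng.standard_normal((3, 12)) * 0.1
b = np.zeros(12)

y, h, c = lstm_forward(x, W, U, b)
print(y.shape, h.shape, c.shape)
print(np.allclose(y[:, -1, :], h))             # h is the last row of y
```

In the notebook, `return_sequences=True` is what exposes every step of `y`, and attention needs all of them, not only the final state.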

- Computing attention

In [24]:
W = keras.layers.Dense(3) # applied to the encoder outputs (y)

In [26]:
score = tf.matmul(X_dec, W(y), transpose_b=True)

score.shape

TensorShape([2, 4, 5])
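The score tensor holds one dot product per (decoder step, encoder step) pair. A small NumPy sketch, using made-up `query` and `key` arrays in place of `X_dec` and `W(y)`, spells out what the batched matrix product with a transposed second argument computes:

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.standard_normal((2, 4, 3))   # (batch, dec_len, dim), stand-in for X_dec
key = rng.standard_normal((2, 5, 3))     # (batch, enc_len, dim), stand-in for W(y)

# Batched matrix product with the key transposed on its last two axes.
score = query @ key.transpose(0, 2, 1)   # (batch, dec_len, enc_len)

# The same computation written index by index.
score_einsum = np.einsum("bid,bjd->bij", query, key)

# A single entry is a plain dot product between one decoder vector
# and one encoder vector.
single = np.dot(query[0, 1], key[0, 3])

print(score.shape)
print(np.allclose(score, score_einsum), np.isclose(score[0, 1, 3], single))
```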

In [28]:
alignment = tf.math.softmax(score, axis=-1) # normalize over the encoder steps (the last axis is also the default)
alignment.shape

TensorShape([2, 4, 5])
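The softmax turns each row of scores into weights that sum to 1 over the encoder steps. The following is a minimal NumPy sketch of that normalization; the `softmax` helper and the toy `score_demo` values are illustrative and not part of the notebook.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the row maximum keeps exp() from overflowing
    # and does not change the result.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

# One batch item, two decoder steps, three encoder steps.
score_demo = np.array([[[1.0, 2.0, 3.0],
                        [0.0, 0.0, 0.0]]])
alignment_demo = softmax(score_demo)

print(alignment_demo.sum(axis=-1))   # every row sums to 1
print(alignment_demo[0, 1])          # equal scores give equal weights
```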

In [29]:
context = tf.matmul(alignment, y)
context.shape

TensorShape([2, 4, 3])
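The context is a weighted average of the encoder outputs, with the alignment supplying the weights. A short NumPy sketch with hand-made alignments (the arrays `enc_out`, `one_hot` and `uniform` are invented for illustration) shows the two extreme cases:

```python
import numpy as np

# Stand-in for the encoder outputs y: (batch, enc_len, units).
enc_out = np.arange(2 * 5 * 3, dtype=np.float64).reshape(2, 5, 3)

# Case 1: every decoder step attends only to encoder step 2.
one_hot = np.zeros((2, 4, 5))
one_hot[:, :, 2] = 1.0
context_one_hot = one_hot @ enc_out          # (batch, dec_len, units)

# Case 2: every decoder step attends equally to all encoder steps.
uniform = np.full((2, 4, 5), 1.0 / 5.0)
context_uniform = uniform @ enc_out

# One-hot weights copy a single encoder output ...
print(np.allclose(context_one_hot[0, 0], enc_out[0, 2]))
# ... and uniform weights give the mean over encoder steps.
print(np.allclose(context_uniform[:, 0], enc_out.mean(axis=1)))
```

A trained alignment sits between these extremes and can differ for each decoder step.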

- Concatenate the context onto the decoder input and feed the result to the decoder

In [49]:
x = tf.concat([X_dec, context], axis=-1)
x.shape

TensorShape([2, 4, 6])
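Concatenating along the last axis keeps the batch and time axes and doubles the feature size, so each decoder step sees its own input followed by its context vector. A minimal NumPy sketch with made-up arrays `dec_in` and `ctx`:

```python
import numpy as np

rng = np.random.default_rng(0)
dec_in = rng.standard_normal((2, 4, 3))   # stand-in for X_dec
ctx = rng.standard_normal((2, 4, 3))      # stand-in for context

dec_with_ctx = np.concatenate([dec_in, ctx], axis=-1)

print(dec_with_ctx.shape)                 # feature axis grows from 3 to 6
# The first three features are the decoder input, the last three the context.
print(np.array_equal(dec_with_ctx[..., :3], dec_in),
      np.array_equal(dec_with_ctx[..., 3:], ctx))
```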

In [51]:
result = lstm_dec(x, initial_state=[h,c])
result.shape

TensorShape([2, 3])

### keras.layers.Attention
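- query : the decoder input
- key : the encoder states (used to compute the score)
- value : usually the same as key (used to compute the context); the layer takes `[query, value]` or `[query, value, key]` and reuses value as key when key is omitted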
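The three roles can be sketched in NumPy as follows. The function `dot_product_attention` is a hypothetical helper that mimics plain, unscaled dot-product attention; it is not the Keras implementation, and it ignores options such as masking and dropout.

```python
import numpy as np

def dot_product_attention(query, value, key=None):
    """query: (batch, Tq, dim), value: (batch, Tv, dim_v), key: (batch, Tv, dim)."""
    if key is None:
        key = value                                    # value doubles as key
    score = query @ np.swapaxes(key, -1, -2)           # (batch, Tq, Tv)
    score = score - score.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(score)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over Tv
    return weights @ value, weights                    # context, alignment

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 3))
v = rng.standard_normal((2, 5, 3))

# key omitted: value is used for both the score and the context.
ctx_a, w_a = dot_product_attention(q, v)

# Separate key: the score uses k, the context is built from v_wide.
k = rng.standard_normal((2, 5, 3))
v_wide = rng.standard_normal((2, 5, 7))
ctx_b, w_b = dot_product_attention(q, v_wide, key=k)

print(ctx_a.shape, w_a.shape)
print(ctx_b.shape, w_b.shape)
```

With a separate key, the context takes its feature size from value, while the score only requires query and key to share a feature size.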

In [33]:
att = keras.layers.Attention()

In [35]:
context2 = att([X_dec, y])
context2.shape

TensorShape([2, 4, 3])

- Compare with a direct computation (without applying W)

In [38]:
tf.matmul(tf.nn.softmax(tf.matmul(X_dec, y, transpose_b=True)), y)

<tf.Tensor: shape=(2, 4, 3), dtype=float32, numpy=
array([[[-0.15611765,  0.1217734 ,  0.16531764],
        [-0.15917274,  0.12554456,  0.16847825],
        [-0.15341723,  0.11986105,  0.16378854],
        [-0.15732676,  0.12845924,  0.1706277 ]],

       [[ 0.00454625,  0.03929135, -0.07950398],
        [-0.00950378,  0.03603135, -0.03380698],
        [ 0.00108905,  0.03290253, -0.08777024],
        [-0.01207615,  0.03576973, -0.01779978]]], dtype=float32)>

In [37]:
context2

<tf.Tensor: shape=(2, 4, 3), dtype=float32, numpy=
array([[[-0.15611765,  0.1217734 ,  0.16531764],
        [-0.15917274,  0.12554456,  0.16847825],
        [-0.15341723,  0.11986105,  0.16378854],
        [-0.15732676,  0.12845924,  0.1706277 ]],

       [[ 0.00454625,  0.03929135, -0.07950398],
        [-0.00950378,  0.03603135, -0.03380698],
        [ 0.00108905,  0.03290253, -0.08777024],
        [-0.01207615,  0.03576973, -0.01779978]]], dtype=float32)>

### keras.layers.MultiHeadAttention
- From the paper "Attention Is All You Need" (Transformer)
<img src='http://jalammar.github.io/images/t/transformer_resideual_layer_norm_3.png' />

In [43]:
mhead = keras.layers.MultiHeadAttention(num_heads=2, key_dim=2)

In [45]:
context3, scores = mhead(X_dec, y, return_attention_scores=True)
context3.shape, scores.shape # scores -> (batch, num_heads, dec_len, enc_len)

(TensorShape([2, 4, 3]), TensorShape([2, 2, 4, 5]))
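The extra axis in `scores` comes from running attention once per head. Below is a minimal NumPy sketch of multi-head attention under simplifying assumptions: the projection weights `Wq`, `Wk`, `Wv`, `Wo` are random and hypothetical, biases, masking and dropout are left out, and the scores are scaled by the square root of the per-head size as in the Transformer paper. It is meant to illustrate the shapes, not to reproduce the Keras layer.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(query, value, Wq, Wk, Wv, Wo):
    """query: (B, T, D), value: (B, S, D).

    Wq, Wk, Wv: (D, H, K) per-head projections; Wo: (H, K, D_out).
    """
    q = np.einsum("btd,dhk->bhtk", query, Wq)    # (B, H, T, K)
    k = np.einsum("bsd,dhk->bhsk", value, Wk)    # (B, H, S, K)
    v = np.einsum("bsd,dhk->bhsk", value, Wv)    # (B, H, S, K)
    scores = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1]))  # (B, H, T, S)
    heads = scores @ v                           # (B, H, T, K)
    out = np.einsum("bhtk,hkd->btd", heads, Wo)  # merge heads: (B, T, D_out)
    return out, scores

rng = np.random.default_rng(0)
B, T, S, D, H, K = 2, 4, 5, 3, 2, 2              # same sizes as the cell above
dec = rng.standard_normal((B, T, D))
enc = rng.standard_normal((B, S, D))
Wq, Wk, Wv = (rng.standard_normal((D, H, K)) for _ in range(3))
Wo = rng.standard_normal((H, K, D))

out, att = multi_head_attention(dec, enc, Wq, Wk, Wv, Wo)
print(out.shape, att.shape)
```

Passing the same array as both `query` and `value` turns this into self-attention, and the last two axes of the scores then have equal length.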

- self-attention

In [46]:
context4, scores = mhead(X_dec, X_dec, return_attention_scores=True)
context4.shape, scores.shape # scores -> (batch, num_heads, dec_len, dec_len)

(TensorShape([2, 4, 3]), TensorShape([2, 2, 4, 4]))

### Recent deep learning language models
- transformer : http://jalammar.github.io/illustrated-transformer/
- BERT : http://jalammar.github.io/illustrated-bert/
- GPT-3 : http://jalammar.github.io/how-gpt3-works-visualizations-animations/