<a href="https://colab.research.google.com/github/risker93/Hello_World/blob/main/daily/2021_07_14_%E1%84%8C%E1%85%A1%E1%84%8B%E1%85%A7%E1%86%AB%E1%84%8B%E1%85%A5%E1%84%89%E1%85%B5%E1%86%AF%E1%84%89%E1%85%B3%E1%86%B8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

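Problems with RNNs
- High computational cost.
- Vanishing gradients (information loss).
- Hard to use for translation.  
Example: 나는 점심을 먹는다. (Korean; word by word, "I lunch eat")  
I eat lunch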

## seq2seq

### Encoder

In [None]:
import tensorflow as tf

In [None]:
class Encoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, enc_units):
    super(Encoder, self).__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.lstm = tf.keras.layers.LSTM(enc_units) # return_sequences defaults to False, so only the last hidden state is returned

  def call(self, x):
    print("Input shape:", x.shape)

    x = self.embedding(x)
    print("Shape after the Embedding layer:", x.shape)

    output = self.lstm(x)
    print("LSTM output shape:", output.shape)

    return output

In [None]:
vocab_size = 30000
emb_size = 256
lstm_size = 512
batch_size = 1
sample_seq_len = 3

print("Vocab Size : {0}".format(vocab_size))
print("Embedding Size : {0}".format(emb_size))
print("LSTM Size : {0}".format(lstm_size))
print("Batch Size : {0}".format(batch_size))
print("Sample Sequence Length : {0}\n".format(sample_seq_len))

Vocab Size : 30000
Embedding Size : 256
LSTM Size : 512
Batch Size : 1
Sample Sequence Length : 3



In [None]:
encoder = Encoder(vocab_size, emb_size, lstm_size)
sample_input = tf.zeros((batch_size, sample_seq_len)) # (1,3)

sample_output = encoder(sample_input)

Input shape: (1, 3)
Shape after the Embedding layer: (1, 3, 256)
LSTM output shape: (1, 512)


![](https://aiffelstaticprd.blob.core.windows.net/media/images/GN-4-L-6.max-800x600.jpg)

### LSTM Decoder

In [None]:
class Decoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, dec_units):
    super(Decoder, self).__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.lstm = tf.keras.layers.LSTM(dec_units, return_sequences=True)
    self.fc = tf.keras.layers.Dense(vocab_size)
    self.softmax = tf.keras.layers.Softmax(axis=-1)
  
  def call(self, x, context_v):
    print("Input shape:", x.shape)

    x = self.embedding(x)
    print("Shape after the Embedding layer:", x.shape)

    # Copy the context vector to every time step, then append it to each embedding.
    context_v = tf.repeat(tf.expand_dims(context_v, axis=1), repeats=x.shape[1], axis=1)
    x = tf.concat([x, context_v], axis= -1)
    print("Shape after concatenating the context vector:", x.shape)

    x = self.lstm(x)
    print("LSTM layer output shape:", x.shape)

    output = self.fc(x)
    print("Final decoder output shape:", output.shape)
    
    return self.softmax(output)

In [None]:
print("vocab size : {0}".format(vocab_size))
print("Embedding Size : {0}".format(emb_size))
print("LSTM size : {0}".format(lstm_size))
print("Batch size : {0}".format(batch_size))
print("Sample Sequence Length : {0}".format(sample_seq_len))

vocab size : 30000
Embedding Size : 256
LSTM size : 512
Batch size : 1
Sample Sequence Length : 3


In [None]:
decoder = Decoder(vocab_size, emb_size, lstm_size)
sample_input = tf.zeros((batch_size, sample_seq_len))

dec_output = decoder(sample_input, sample_output)

Input shape: (1, 3)
Shape after the Embedding layer: (1, 3, 256)
Shape after concatenating the context vector: (1, 3, 768)
LSTM layer output shape: (1, 3, 512)
Final decoder output shape: (1, 3, 30000)


![](https://aiffelstaticprd.blob.core.windows.net/media/images/GN-4-L-7.max-800x600.jpg)
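
The shape bookkeeping in `Decoder.call` (repeat the context vector along the time axis, then concatenate) can be checked without TensorFlow. The following is a minimal NumPy sketch, assuming the same sizes as the sample above (batch 1, sequence length 3, embedding size 256, context size 512); the array names are illustrative and not part of the notebook.

```python
import numpy as np

batch, seq_len, emb_dim, ctx_dim = 1, 3, 256, 512

embedded = np.zeros((batch, seq_len, emb_dim))  # stands in for the embedding output
context = np.ones((batch, ctx_dim))             # stands in for the encoder's final state

# Add a time axis, then copy the context vector once per decoder time step.
context_tiled = np.repeat(context[:, np.newaxis, :], repeats=seq_len, axis=1)

# Append the context vector to every time step's embedding.
decoder_input = np.concatenate([embedded, context_tiled], axis=-1)

print(context_tiled.shape)  # (1, 3, 512)
print(decoder_input.shape)  # (1, 3, 768)
```

Every time step receives an identical copy of the context vector, which is why 256 + 512 = 768 features reach the LSTM.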

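Two problems with RNN-based seq2seq models
1. Vanishing gradients
2. Information is lost because the whole input must be compressed into a single fixed-size vector

## Attention Mechanism

- The core idea of attention is that, at every time step at which the decoder predicts an output word, it consults the entire input sentence in the encoder once more.
- Rather than weighting every part of the input sentence equally, it focuses more on the input words that are relevant to the word being predicted at that time step.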

In [None]:
year_to_model = {"2017" : "Transformer", "2018": "BERT"}  # named so that the built-in dict is not shadowed

In [None]:
print(year_to_model["2017"])

Transformer


![](https://wikidocs.net/images/page/22893/%EC%BF%BC%EB%A6%AC.PNG)

Attention(Q, K, V) = Attention Value

```
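Q = Query : the hidden state of the decoder cell at time step t
K = Keys : the hidden states of the encoder cells at all time steps
V = Values : the hidden states of the encoder cells at all time steps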
```

### Dot-Product Attention

![](https://wikidocs.net/images/page/22893/dotproductattention1_final.PNG)

### Computing the attention scores

![](https://wikidocs.net/images/page/22893/dotproductattention2_final.PNG)

$$score(s_t, h_i) = s_t^T h_i $$

$$e^t = [s_t^T h_1, ..., s_t^T h_N]$$

### Obtain the attention distribution with the softmax function

![](https://wikidocs.net/images/page/22893/dotproductattention3_final.PNG)

$${\alpha}^t = softmax(e^t)$$

### Compute the attention value as the sum of the encoder hidden states weighted by the attention weights

![](https://wikidocs.net/images/page/22893/dotproductattention4_final.PNG)

$$a_t = \sum_{i=1}^{N}{{\alpha}_i^th_i}$$ 
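
The three steps so far (scores, softmax, weighted sum) fit in a few lines. This is a minimal NumPy sketch under assumed toy sizes (4 encoder steps, hidden size 8) with random values in place of real hidden states; the `softmax` helper is defined here and is not from the notebook.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(x - np.max(x))
    return exp / exp.sum()

rng = np.random.default_rng(0)
N, d = 4, 8                    # assumed: 4 encoder steps, hidden size 8
h = rng.normal(size=(N, d))    # encoder hidden states h_1..h_N
s_t = rng.normal(size=(d,))    # decoder hidden state at step t

e_t = h @ s_t                  # scores: e^t_i = s_t^T h_i, shape (N,)
alpha_t = softmax(e_t)         # attention distribution over the N inputs
a_t = alpha_t @ h              # attention value: sum_i alpha^t_i h_i, shape (d,)

print(e_t.shape, alpha_t.shape, a_t.shape)
```

Because `alpha_t` sums to 1, `a_t` is a convex combination of the encoder states and has the same dimension as a single hidden state.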

### Concatenate the attention value with the decoder hidden state at time step t

![](https://wikidocs.net/images/page/22893/dotproductattention5_final_final.PNG)

### Compute $\tilde{s_t}$, the input to the output layer

![](https://wikidocs.net/images/page/22893/st.PNG)

$$ \tilde{s_{t}}=tanh(W_c[a_t ; s_t] + b_c)$$

### Use $\tilde{s_t}$ as the input to the output layer

$$ \hat{y_t}=Softmax(W_y\tilde{s_t}+b_y)$$
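
The last two formulas can be traced numerically as well. This is a sketch only: the sizes (hidden size 8, vocabulary size 20) are assumptions, and `W_c`, `b_c`, `W_y`, `b_y` are random stand-ins for parameters that would normally be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 8, 20                          # assumed hidden size and vocabulary size

a_t = rng.normal(size=(d,))               # attention value (placeholder values)
s_t = rng.normal(size=(d,))               # decoder hidden state (placeholder values)

W_c = rng.normal(size=(d, 2 * d)) * 0.1   # stand-ins for trainable parameters
b_c = np.zeros(d)
W_y = rng.normal(size=(vocab, d)) * 0.1
b_y = np.zeros(vocab)

v_t = np.concatenate([a_t, s_t])          # [a_t ; s_t], shape (2d,)
s_tilde = np.tanh(W_c @ v_t + b_c)        # shape (d,)

logits = W_y @ s_tilde + b_y
y_hat = np.exp(logits - logits.max())
y_hat /= y_hat.sum()                      # softmax over the vocabulary

print(v_t.shape, s_tilde.shape, y_hat.shape)
```

`W_c` maps the concatenated 2d-dimensional vector back to d dimensions, so the output layer sees a vector of the same size as a plain decoder state.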

## Bahdanau Attention

- Bahdanau Attention
$$ Score_{alignment} = W * tanh(W_{decoder} * H_{decoder} + W_{encoder} * H_{encoder}) $$

In [None]:
class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W_decoder = tf.keras.layers.Dense(units)
    self.W_encoder = tf.keras.layers.Dense(units)
    self.W_combine = tf.keras.layers.Dense(1)
  
  def call(self, H_encoder, H_decoder):
    print("[H_encoder] shape :", H_encoder.shape)

    # Keep the raw encoder states; the context vector is built from them below.
    enc_proj = self.W_encoder(H_encoder)
    print("[W_encoder X H_encoder] shape:", enc_proj.shape)

    print("\n[H_decoder] shape: ", H_decoder.shape)
    dec_proj = self.W_decoder(tf.expand_dims(H_decoder, 1))

    print("[W_decoder X H_decoder] shape:", dec_proj.shape)

    score = self.W_combine(tf.nn.tanh(dec_proj + enc_proj))
    print("[Score Alignment] shape :", score.shape)

    attention_weights = tf.nn.softmax(score, axis = 1)
    print("\n Final weights : \n", attention_weights.numpy())

    # Weighted sum of the encoder hidden states (not of the decoder projection).
    context_vector = attention_weights * H_encoder
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights

In [None]:
W_size = 100

print("Mapping the hidden state to {0} dimensions \n".format(W_size))

attention = BahdanauAttention(W_size)

enc_state = tf.random.uniform((1, 10, 512))
dec_state = tf.random.uniform((1, 512))

_ = attention(enc_state, dec_state)

Mapping the hidden state to 100 dimensions 

[H_encoder] shape : (1, 10, 512)
[W_encoder X H_encoder] shape: (1, 10, 100)

[H_decoder] shape:  (1, 512)
[W_decoder X H_decoder] shape: (1, 1, 100)
[Score Alignment] shape : (1, 10, 1)

 Final weights : 
 [[[0.08770792]
  [0.16786946]
  [0.09227852]
  [0.09054877]
  [0.07592397]
  [0.16650593]
  [0.06394071]
  [0.0899744 ]
  [0.11246318]
  [0.05278726]]]


![](https://aiffelstaticprd.blob.core.windows.net/media/original_images/GN-4-L-9.jpg)

## Luong Attention

$$ Score(H_{decoder}, H_{encoder}) = H_{decoder}^T*W_{combine}*H_{encoder}$$

In [None]:
class LuongAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(LuongAttention, self).__init__()
    self.W_combine = tf.keras.layers.Dense(units)

  def call(self, H_encoder, H_decoder):
    print("[H_encoder] shape: ",H_encoder.shape)

    WH = self.W_combine(H_encoder)
    print("[W_encoder X H_encoder] shape: ", WH.shape)

    H_decoder = tf.expand_dims(H_decoder, 1)
    alignment = tf.matmul(WH, tf.transpose(H_decoder, [0, 2, 1]))
    print("[Score_alignment] shape :", alignment.shape)

    attention_weights = tf.nn.softmax(alignment, axis = 1)
    print("\nFinal weights : \n", attention_weights.numpy())

    # Weighted sum over the time axis; unlike a rank-2 x rank-3 matmul, this stays correct for batch sizes above 1.
    context_vector = tf.reduce_sum(attention_weights * H_encoder, axis=1)
    attention_weights = tf.squeeze(attention_weights, axis= -1)

    return context_vector, attention_weights

In [None]:
emb_size = 512
attention = LuongAttention(emb_size)

enc_state = tf.random.uniform((1, 10, emb_size))
dec_state = tf.random.uniform((1, emb_size))

_ = attention(enc_state, dec_state)

[H_encoder] shape:  (1, 10, 512)
[W_encoder X H_encoder] shape:  (1, 10, 512)
[Score_alignment] shape : (1, 10, 1)

Final weights : 
 [[[3.3648440e-03]
  [2.5860232e-01]
  [2.2048462e-04]
  [1.4769311e-02]
  [1.8158993e-02]
  [1.1614115e-04]
  [6.6085684e-01]
  [1.6638229e-03]
  [4.2246096e-02]
  [1.0583559e-06]]]


## Bidirectional LSTM and the Attention Mechanism (IMDB Review Data)


### Preprocessing the IMDB review data

In [1]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.utils import to_categorical 
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [2]:
vocab_size = 10000
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz




In [3]:
print('Maximum review length : {}'.format(max(len(l) for l in x_train)))
print('Average review length : {}'.format(sum(map(len, x_train))/len(x_train)))

Maximum review length : 2494
Average review length : 238.71364


In [4]:
max_len = 500
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen = max_len)
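
With its default arguments, `pad_sequences` pads at the front and also truncates from the front, so a review longer than `max_len` keeps only its last 500 tokens. The pure-Python sketch below mimics that default behaviour for a single sequence; `pad_pre` is a hypothetical helper written for illustration, not a Keras function.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence: keep the last
    `maxlen` items, then left-pad with `value` up to `maxlen`."""
    truncated = seq[-maxlen:]
    return [value] * (maxlen - len(truncated)) + truncated

print(pad_pre([5, 6, 7], maxlen=5))           # [0, 0, 5, 6, 7]
print(pad_pre([1, 2, 3, 4, 5, 6], maxlen=4))  # [3, 4, 5, 6]
```

Front padding puts the real tokens at the end of the sequence, next to the last time step the forward LSTM processes.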

### Bahdanau Attention

$$score(query, key)=V^T tanh(W_1 key + W_2 query) $$

In [5]:
import tensorflow as tf

In [6]:
class BahdanauAttention(tf.keras.Model):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
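    # Use the fully qualified name: Dense is only imported in a later cell.
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)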

  def call(self, values, query):
    hidden_with_time_axis = tf.expand_dims(query, 1)
    ## query size (batch_size, hidden size)
    ## hidden_with_time_axis (batch, 1 , hidden size)

    score = self.V(tf.nn.tanh(self.W1(values) + self.W2(hidden_with_time_axis)))
    # score == (batch size, max_length, 1)

    attention_weights = tf.nn.softmax(score,axis=1)
    # attention weights == (batch size, max_length, 1)

    # context_vector after sum == (batch size, hidden size)
    context_vector = attention_weights * values
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights

### Bidirectional LSTM + attention mechanism

In [8]:
from tensorflow.keras.layers import Dense, Embedding, Bidirectional, LSTM, Concatenate, Dropout
from tensorflow.keras import Input, Model
from tensorflow.keras import optimizers
import os

In [9]:
sequence_input = Input(shape=(max_len,), dtype='int32')
embedded_sequences = Embedding(vocab_size, 128, input_length=max_len, mask_zero=True)(sequence_input)

In [11]:
lstm = Bidirectional(LSTM(64, dropout=0.5, return_sequences=True))(embedded_sequences) # 64


In [13]:
lstm, forward_h, forward_c, backward_h, backward_c = Bidirectional(LSTM(64, dropout=0.5, return_sequences=True, return_state=True))(lstm)
# 128 ([forward 64: backward 64])

In [14]:
print(lstm.shape, forward_h.shape, forward_c.shape, backward_h.shape, backward_c.shape)

(None, 500, 128) (None, 64) (None, 64) (None, 64) (None, 64)


In [15]:
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])

In [16]:
attention = BahdanauAttention(64)
context_vector, attention_weight = attention(lstm, state_h)

In [18]:
dense1 = Dense(20, activation='relu')(context_vector)
dropout = Dropout(0.5)(dense1)
output = Dense(1, activation="sigmoid")(dropout)
model = Model(inputs = sequence_input, outputs=output)

In [23]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 500)]        0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 500, 128)     1280000     input_1[0][0]                    
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, 500, 128)     98816       embedding[0][0]                  
__________________________________________________________________________________________________
bidirectional_4 (Bidirectional) [(None, 500, 128), ( 98816       bidirectional_2[0][0]            
______________________________________________________________________________________________

In [19]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [24]:
history = model.fit(x_train, y_train, epochs=1, batch_size=256, validation_data = (x_test, y_test), verbose=1)



In [25]:
print('\n Test accuracy : %.4f'%(model.evaluate(x_test, y_test)[1]))


 Test accuracy : 0.8722


## seq2seq with attention: Spanish-English translator




### Preparing the data

In [26]:
import tensorflow as tf
import numpy as np

from sklearn.model_selection import train_test_split

import matplotlib.ticker as ticker
import matplotlib.pyplot as plt

import time
import re
import os
import io

In [27]:
path_to_zip = tf.keras.utils.get_file('spa-eng.zip', origin='http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip', extract=True)

Downloading data from http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip


In [28]:
path_to_file = os.path.dirname(path_to_zip)+"/spa-eng/spa.txt"

In [29]:
with open(path_to_file, 'r', encoding='utf-8') as f:  # the file contains accented Spanish characters
  raw = f.read().splitlines()

print("Data Size: ", len(raw))
print("Example :")

for sen in raw[0:100][::20]:
  print(">>", sen)

Data Size:  118964
Example :
>> Go.	Ve.
>> Wait.	Esperen.
>> Hug me.	Abrázame.
>> No way!	¡Ni cagando!
>> Call me.	Llamame.


### Cleaning the data

In [34]:
def preprocess_sentence(sentence, s_token=False, e_token=False):
  sentence = sentence.lower().strip()

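  sentence = re.sub(r"([?.!,])", r"\1 ",sentence)     # add a space after each punctuation mark
  sentence = re.sub(r'[" "]+', " ",sentence)          # collapse runs of spaces (and double quotes) into one space
  sentence = re.sub(r"[^a-zA-Z?.!,]+"," ",sentence)   # replace everything else, including accented letters such as í, with a space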

  sentence = sentence.strip()

  if s_token:
    sentence = '<start> ' + sentence

  if e_token:
    sentence += ' <end>'

  return sentence

In [35]:
enc_corpus = []
dec_corpus = []

num_examples = 30000

for pair in raw[:num_examples]:
  eng, spa = pair.split("\t")

  enc_corpus.append(preprocess_sentence(eng))
  dec_corpus.append(preprocess_sentence(spa, s_token=True, e_token=True))

print("English :", enc_corpus[100])
print("Spanish :", dec_corpus[100])

English : go away!
Spanish : <start> salga de aqu ! <end>
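
The output above shows "aqu !" because the last regular expression replaces accented letters with a space. One common way to keep such words intact is to strip the diacritics before filtering. The sketch below is a hypothetical variant, not the notebook's `preprocess_sentence`: the function names and the example sentence are assumptions, and it also puts spaces on both sides of punctuation so that punctuation becomes its own token.

```python
import re
import unicodedata

def strip_accents(text):
    """Decompose characters (NFD) and drop the combining marks, e.g. 'í' -> 'i'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def preprocess_keep_letters(sentence):
    # Hypothetical variant for illustration only.
    sentence = strip_accents(sentence.lower().strip())
    sentence = re.sub(r"([?.!,])", r" \1 ", sentence)  # isolate punctuation
    sentence = re.sub(r"[^a-z?.!,]+", " ", sentence)    # replace everything else with a space
    return re.sub(r"\s+", " ", sentence).strip()

print(preprocess_keep_letters("¡Salga de aquí!"))  # salga de aqui !
```

The trade-off is that distinct letters are merged (for example, "ñ" becomes "n"). Switching to this variant would also change the vocabulary sizes reported later.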


### Preprocessing: tokenization

In [39]:
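def tokenize(corpus):
  # filters='' keeps punctuation and the <start>/<end> markers as tokens
  tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
  tokenizer.fit_on_texts(corpus)

  tensor = tokenizer.texts_to_sequences(corpus)

  # pad with zeros at the end so every sequence has the same length
  tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor, padding='post')

  return tensor, tokenizer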

In [40]:
enc_tensor, enc_tokenizer = tokenize(enc_corpus)
dec_tensor, dec_tokenizer = tokenize(dec_corpus)
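
To make the result of `tokenize` concrete, here is a rough pure-Python sketch of the same idea: build a frequency-ordered word index starting at 1, then right-pad with 0. It is an approximation for illustration, not the Keras implementation; the helper names and the toy corpus are assumptions.

```python
from collections import Counter

def build_word_index(corpus):
    """Map each whitespace-separated token to an integer, most frequent first.
    Index 0 is left unused so it can serve as the padding value."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    ordered = sorted(counts.items(), key=lambda kv: -kv[1])  # stable sort: ties keep first-seen order
    return {word: i + 1 for i, (word, _) in enumerate(ordered)}

def encode_post_padded(corpus, word_index):
    """Convert sentences to index lists and right-pad with 0 to the longest length."""
    seqs = [[word_index[tok] for tok in sent.split()] for sent in corpus]
    max_len = max(len(s) for s in seqs)
    return [s + [0] * (max_len - len(s)) for s in seqs]

toy_corpus = ["go away !", "go .", "call me ."]
word_index = build_word_index(toy_corpus)
print(word_index)
print(encode_post_padded(toy_corpus, word_index))
```

Since index 0 is reserved for padding, an embedding layer needs one more row than there are words, which is why the vocabulary size passed to the model later is `len(index_word) + 1`.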

### Splitting into training and validation data

In [41]:
enc_train, enc_val, dec_train, dec_val = train_test_split(enc_tensor, dec_tensor, test_size=0.2)

In [42]:
print('English Vocab Size : ', len(enc_tokenizer.index_word))
print('Spanish Vocab Size : ', len(dec_tokenizer.index_word))

English Vocab Size :  7577
Spanish Vocab Size :  12352


In [43]:
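class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.w_dec = tf.keras.layers.Dense(units)
    self.w_enc = tf.keras.layers.Dense(units)
    self.w_com = tf.keras.layers.Dense(1)

  def call(self, h_enc, h_dec):
    # h_enc == batch x length x units
    # h_dec == batch x units

    enc_proj = self.w_enc(h_enc)
    dec_proj = self.w_dec(tf.expand_dims(h_dec, 1))

    score = self.w_com(tf.nn.tanh(dec_proj + enc_proj))

    attn = tf.nn.softmax(score, axis=1)

    # Weighted sum of the unprojected encoder states.
    context_vec = attn * h_enc
    context_vec = tf.reduce_sum(context_vec, axis=1)
    return context_vec, attn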

![](https://aiffelstaticprd.blob.core.windows.net/media/images/GN-4-P-2.max-800x600.jpg)

In [44]:
class Encoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, enc_units):
    super(Encoder, self).__init__()
    # todo 
    self.enc_units = enc_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(enc_units, return_sequences=True)

  def call(self, x):
    # todo
    out = self.embedding(x)
    out = self.gru(out)

    return out

In [46]:
BATCH_SIZE = 64
src_vocab_size = len(enc_tokenizer.index_word) +1

units = 1024
embedding_dim = 512

encoder = Encoder(src_vocab_size, embedding_dim, units)

# sample input
sequence_len = 30
sample_enc = tf.random.uniform((BATCH_SIZE, sequence_len))
sample_output = encoder(sample_enc)

print('Encoder Output :', sample_output.shape)

Encoder Output : (64, 30, 1024)


In [47]:
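class Decoder(tf.keras.Model):
  # One possible completion of the exercise skeleton, not necessarily the intended solution.
  def __init__(self, vocab_size, embedding_dim, dec_units):
    super(Decoder, self).__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(dec_units, return_sequences=True, return_state=True)
    self.fc = tf.keras.layers.Dense(vocab_size)
    self.attention = BahdanauAttention(dec_units)

  def call(self, x, h_dec, enc_out):
    context_vec, attn = self.attention(enc_out, h_dec)   # attend over the encoder outputs
    out = self.embedding(x)
    out = tf.concat([tf.expand_dims(context_vec, 1), out], axis=-1)
    out, h_dec = self.gru(out)
    out = self.fc(tf.reshape(out, (-1, out.shape[-1])))  # logits: batch x vocab_size

    return out, h_dec, attn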

In [None]:
BATCH_SIZE = 64
tgt_vocab_size = len(dec_tokenizer.index_word) +1

units = 1024
embedding_dim = 512

decoder = Decoder(tgt_vocab_size, embedding_dim, units)

# sample input
sample_state = tf.random.uniform((BATCH_SIZE, units))
sample_logits, h_dec, attn = decoder(tf.random.uniform((BATCH_SIZE, 1)), sample_state, sample_output)

print('Decoder Output :', sample_logits.shape)
print('Decoder Hidden State :', h_dec.shape)
print('Attention :', attn.shape)