# Lab 3. Training a 😀Sentiment Analysis😑 Model with RNNs



<b>Learning objectives:    
- Try out various RNN-family cells such as LSTM and GRU.
- Build a bidirectional RNN, a multi-layer RNN, and a model ensemble.
</b>








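## #0. Getting Ready
In the previous lab we built a sentiment analysis model with SimpleRNN.    
This time we will build models with the various cell structures and model architectures covered in the theory session.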

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
## Load the train, validation, and test data
!cp "/content/gdrive/My Drive/NLP/utils.py" "/content"

import pickle
import numpy as np
with open("/content/gdrive/My Drive/NLP/Sentiment_prepro_data.pkl", "rb") as f:
  prepro_data = pickle.load(f)
train_ids = prepro_data["train_ids"]
train_labels = prepro_data["train_labels"]
val_ids = prepro_data["val_ids"]
val_labels = prepro_data["val_labels"]
test_ids = prepro_data["test_ids"]
test_labels = prepro_data["test_labels"]
label_map = prepro_data["label_map"]
print(len(train_ids), len(train_labels), len(val_ids), len(val_labels), len(test_ids), len(test_labels))

49999 49999 9999 9999 10000 10000


In [None]:
## Load the vocabulary & text_encoder
from utils import TextEncoder
import json
with open("/content/gdrive/My Drive/NLP/Sentiment_vocab.json", "r") as f:
  new_vocab_list = json.loads(f.read())

text_encoder = TextEncoder(new_vocab_list)

In [None]:
""" Load the CBOW word vectors """

## final_embeddings: word-vector matrix for 70002 tokens, shape=(70002, 128)

with open("/content/gdrive/My Drive/NLP/vecs.tsv") as f:
  vecs = [v.strip() for v in f.readlines()]
  final_embeddings = [v.split("\t") for v in vecs]
  final_embeddings = np.array(final_embeddings, dtype="float32")

## #1. 모델링 실습

### MODEL1: Using the LSTM Cell

In [None]:
import tensorflow as tf
tf.keras.backend.clear_session()

To use an LSTM cell, use the LSTM layer in tensorflow.keras.layers.   
It is used the same way as SimpleRNN.

In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense

vocab_size = text_encoder.vocab_size # vocabulary size
embedding_dim = final_embeddings.shape[1] # embedding dimension
rnn_hidden_dim = 50 # LSTM hidden size
final_dim = len(label_map)

""" MAKE MODEL """
model1 = Sequential(
    [Embedding(vocab_size, embedding_dim, mask_zero=True),
     LSTM(rnn_hidden_dim),
     Dense(rnn_hidden_dim, activation= "relu"),
     Dense(2, activation="softmax")]
)

In [None]:
model1.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 128)         10998912  
_________________________________________________________________
lstm (LSTM)                  (None, 50)                35800     
_________________________________________________________________
dense (Dense)                (None, 50)                2550      
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 102       
Total params: 11,037,364
Trainable params: 11,037,364
Non-trainable params: 0
_________________________________________________________________
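The parameter counts in the summary above can be checked by hand. The sketch below assumes the standard LSTM layout (four gates, each with an input kernel, a recurrent kernel, and one bias vector); the helper names are made up for illustration, and the vocabulary size 85929 is read from the embedding shape printed in this notebook.

```python
def lstm_param_count(input_dim, hidden_dim):
    """Number of trainable weights in a standard LSTM layer.

    Each of the 4 gates (input, forget, cell candidate, output) has an
    input kernel, a recurrent kernel, and a bias vector.
    """
    per_gate = input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim
    return 4 * per_gate


def dense_param_count(input_dim, output_dim):
    """Weights plus biases of a fully connected layer."""
    return input_dim * output_dim + output_dim


embedding_params = 85929 * 128            # vocab_size x embedding_dim
lstm_params = lstm_param_count(128, 50)   # 4 * (128*50 + 50*50 + 50)
dense_params = dense_param_count(50, 50)
output_params = dense_param_count(50, 2)
total_params = embedding_params + lstm_params + dense_params + output_params

print(embedding_params, lstm_params, dense_params, output_params)
print(total_params)
```

Almost all of the roughly 11 million parameters sit in the embedding table; the LSTM itself contributes only 35,800.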


In [None]:
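"""Initialize the embedding layer with the CBOW-trained word embeddings"""
org_vocab_size = final_embeddings.shape[0]
# Tokens without a CBOW vector get uniformly random rows
rand_initial = np.random.uniform(-1,1,size=[vocab_size-org_vocab_size,embedding_dim])
# Assign the CBOW-trained embeddings + the randomly initialized weights to the model's weights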
initial_weight = np.append(final_embeddings, rand_initial, axis = 0)
model1.weights[0].assign(initial_weight)

<tf.Variable 'UnreadVariable' shape=(85929, 128) dtype=float32, numpy=
array([[-1.2835134e-02,  3.8169596e-02,  1.2824427e-02, ...,
        -4.1749455e-02, -6.7193434e-04, -2.5152588e-02],
       [-4.3288276e-02, -2.2840855e-01, -3.3235773e-01, ...,
        -6.2215126e-01, -2.1829844e-01,  5.5536860e-01],
       [ 1.4566300e+00, -6.7591065e-01,  2.8122848e-01, ...,
         5.9197694e-01, -2.6638773e-01, -5.2011847e-01],
       ...,
       [ 3.6430800e-01,  1.3824491e-01,  5.7283497e-01, ...,
        -4.5271674e-01, -6.3190216e-01, -8.7267727e-01],
       [ 8.0739635e-01,  6.2565893e-01, -2.1986499e-01, ...,
        -7.4504662e-01,  1.0706705e-01,  8.4927452e-01],
       [-4.1984853e-01, -2.4882808e-01, -4.7447753e-01, ...,
         3.4534696e-01,  8.9349687e-01, -6.5144444e-01]], dtype=float32)>
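The initialization step above can be illustrated on a toy scale. This is a minimal sketch with made-up sizes (5 pretrained rows, a vocabulary of 8) rather than the real 70002 and 85929; it also assumes, as the notebook's code does, that the first rows of the model's vocabulary line up with the pretrained table in the same order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical sizes): 5 pretrained rows, 8 rows needed by the model.
pretrained = rng.normal(size=(5, 4)).astype("float32")
toy_vocab_size, toy_dim = 8, pretrained.shape[1]

# Rows for tokens missing from the pretrained table get uniform random values in [-1, 1).
extra = rng.uniform(-1, 1, size=(toy_vocab_size - pretrained.shape[0], toy_dim))
extra = extra.astype("float32")

# Stack along axis 0 so that row i is still the vector for token id i.
toy_initial_weight = np.concatenate([pretrained, extra], axis=0)

print(toy_initial_weight.shape)
```

The resulting matrix has one row per token id, which is the shape the `Embedding` layer's weight expects before `assign` is called.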

In [None]:
## Compile the model
model1.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
## Train the model
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=1)

num_epochs = 5
history = model1.fit(train_ids, train_labels, epochs=num_epochs, batch_size=200,
                    validation_data=(val_ids, val_labels), callbacks=[callback])

Epoch 1/5
Epoch 2/5
Epoch 3/5
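
Only three of the five requested epochs appear in the log above, which is consistent with the `EarlyStopping` callback ending training once `val_loss` stopped improving. The following is a simplified sketch of patience-based stopping, not the Keras implementation itself; the function name and the loss values are made up for illustration.

```python
def epochs_run(val_losses, patience=1, min_delta=0.0):
    """Return how many epochs run before a patience-based stop.

    Simplified rule: stop once `patience` consecutive epochs fail to
    lower the best validation loss by more than `min_delta`.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses, not taken from the run above.
made_up_losses = [0.45, 0.40, 0.42, 0.39, 0.38]
print(epochs_run(made_up_losses, patience=1))
```

With `patience=1`, training ends at the first epoch whose validation loss is not better than the best seen so far, even if later epochs would have improved again.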


In [None]:
## Evaluate performance on the test data
model1.evaluate(test_ids, test_labels)



[0.40304747223854065, 0.8267999887466431]

### MODEL2: Building a Bi-LSTM Model

A Bi-RNN model can be coded by wrapping an RNN-family layer in the Bidirectional layer from keras.layers. 

In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense, Bidirectional

vocab_size = text_encoder.vocab_size # vocabulary size
embedding_dim = final_embeddings.shape[1] # embedding dimension
rnn_hidden_dim = 50 # LSTM hidden size (per direction)
final_dim = len(label_map)

""" MAKE MODEL """
model2 = Sequential(
    [Embedding(vocab_size, embedding_dim, mask_zero=True),
     Bidirectional(LSTM(rnn_hidden_dim)),
     Dense(rnn_hidden_dim, activation= "relu"),
     Dense(2, activation="softmax")]
)

In [None]:
model2.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 128)         10998912  
_________________________________________________________________
bidirectional (Bidirectional (None, 100)               71600     
_________________________________________________________________
dense_2 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 102       
Total params: 11,075,664
Trainable params: 11,075,664
Non-trainable params: 0
_________________________________________________________________


👉 Note that the hidden vector coming out of the bidirectional layer is 100-dimensional.   
This is because the 50-dimensional vector from the forward LSTM and the 50-dimensional vector from the backward LSTM are concatenated.
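
The concatenation can be reproduced in plain NumPy. The sketch below swaps in a plain tanh RNN for the LSTM and ignores masking, so it only illustrates the shapes; the function name, the random weights, and the batch and sequence sizes are assumptions made for the example.

```python
import numpy as np


def simple_rnn_last_state(x, w_in, w_rec):
    """Run a tanh RNN over x of shape (batch, time, features); return the last hidden state."""
    batch, time_steps, _ = x.shape
    h = np.zeros((batch, w_rec.shape[0]))
    for t in range(time_steps):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
    return h


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 7, 128))  # (batch, time, embedding_dim)

# Each direction has its own weights, as in a bidirectional layer.
fw_in, fw_rec = rng.normal(size=(128, 50)) * 0.1, rng.normal(size=(50, 50)) * 0.1
bw_in, bw_rec = rng.normal(size=(128, 50)) * 0.1, rng.normal(size=(50, 50)) * 0.1

h_forward = simple_rnn_last_state(x, fw_in, fw_rec)
h_backward = simple_rnn_last_state(x[:, ::-1, :], bw_in, bw_rec)  # reversed time axis

# Concatenate along the feature axis: 50 + 50 = 100.
h_bi = np.concatenate([h_forward, h_backward], axis=-1)
print(h_forward.shape, h_backward.shape, h_bi.shape)
```

The same doubling shows up in the summary: the bidirectional layer has 71,600 parameters, twice the 35,800 of a single LSTM.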

In [None]:
model2.weights[0].assign(initial_weight)

<tf.Variable 'UnreadVariable' shape=(85929, 128) dtype=float32, numpy=
array([[-1.2835134e-02,  3.8169596e-02,  1.2824427e-02, ...,
        -4.1749455e-02, -6.7193434e-04, -2.5152588e-02],
       [-4.3288276e-02, -2.2840855e-01, -3.3235773e-01, ...,
        -6.2215126e-01, -2.1829844e-01,  5.5536860e-01],
       [ 1.4566300e+00, -6.7591065e-01,  2.8122848e-01, ...,
         5.9197694e-01, -2.6638773e-01, -5.2011847e-01],
       ...,
       [ 3.6430800e-01,  1.3824491e-01,  5.7283497e-01, ...,
        -4.5271674e-01, -6.3190216e-01, -8.7267727e-01],
       [ 8.0739635e-01,  6.2565893e-01, -2.1986499e-01, ...,
        -7.4504662e-01,  1.0706705e-01,  8.4927452e-01],
       [-4.1984853e-01, -2.4882808e-01, -4.7447753e-01, ...,
         3.4534696e-01,  8.9349687e-01, -6.5144444e-01]], dtype=float32)>

In [None]:
## Compile the model
model2.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
## Train the model
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=1)

num_epochs = 5
history = model2.fit(train_ids, train_labels, epochs=num_epochs, batch_size=200,
                    validation_data=(val_ids, val_labels), callbacks=[callback])

Epoch 1/5
Epoch 2/5
Epoch 3/5


In [None]:
## Evaluate performance on the test data
model2.evaluate(test_ids, test_labels)



[0.40331748127937317, 0.8224999904632568]

### MODEL3: Building a Multi-layer LSTM Model

To build a multi-layer RNN model, the return_sequences option must be set to True on the lower RNN layer.   
This is because the next layer takes as input the sequence of hidden states returned by the previous layer.   
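
A shape-only NumPy sketch of this stacking follows. It uses a plain tanh RNN in place of the GRU and LSTM layers and made-up random weights, so it is an illustration of how `return_sequences` changes the output shape rather than a reimplementation of the Keras layers.

```python
import numpy as np


def simple_rnn(x, w_in, w_rec, return_sequences=False):
    """Tanh RNN over x of shape (batch, time, features)."""
    batch, time_steps, _ = x.shape
    h = np.zeros((batch, w_rec.shape[0]))
    states = []
    for t in range(time_steps):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        states.append(h)
    if return_sequences:
        return np.stack(states, axis=1)  # (batch, time, hidden): one state per step
    return h                             # (batch, hidden): last state only


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 7, 128))  # (batch, time, embedding_dim)
w1_in, w1_rec = rng.normal(size=(128, 50)) * 0.1, rng.normal(size=(50, 50)) * 0.1
w2_in, w2_rec = rng.normal(size=(50, 50)) * 0.1, rng.normal(size=(50, 50)) * 0.1

# The lower layer must emit the whole sequence so the upper layer has time steps to read.
layer1_out = simple_rnn(x, w1_in, w1_rec, return_sequences=True)
layer2_out = simple_rnn(layer1_out, w2_in, w2_rec, return_sequences=False)
print(layer1_out.shape, layer2_out.shape)
```

Only the top layer returns a single vector per example, which is what the final `Dense` classifier expects.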

In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense, Dropout

vocab_size = text_encoder.vocab_size # vocabulary size
embedding_dim = final_embeddings.shape[1] # embedding dimension
rnn_hidden_dim = 50 # RNN hidden size
final_dim = len(label_map)

""" MAKE MODEL """
model3 = Sequential(
    [Embedding(vocab_size, embedding_dim, mask_zero=True),
     GRU(rnn_hidden_dim, return_sequences = True),
     Dropout(0.2),
     LSTM(rnn_hidden_dim, return_sequences = False),
     Dense(2, activation="softmax")]
)

In [None]:
model3.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, None, 128)         10998912  
_________________________________________________________________
gru (GRU)                    (None, None, 50)          27000     
_________________________________________________________________
dropout (Dropout)            (None, None, 50)          0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 50)                20200     
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 102       
Total params: 11,046,214
Trainable params: 11,046,214
Non-trainable params: 0
_________________________________________________________________
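The GRU and stacked-LSTM counts in this summary can also be verified by hand. The sketch assumes the TensorFlow 2 Keras default `reset_after=True` for GRU, under which each gate carries two bias vectors; the helper names are hypothetical.

```python
def gru_param_count(input_dim, hidden_dim, reset_after=True):
    """Weights in a GRU layer with 3 gates (update, reset, candidate).

    With reset_after=True each gate keeps two bias vectors, one for the
    input kernel and one for the recurrent kernel.
    """
    biases_per_gate = 2 * hidden_dim if reset_after else hidden_dim
    per_gate = input_dim * hidden_dim + hidden_dim * hidden_dim + biases_per_gate
    return 3 * per_gate


def lstm_param_count(input_dim, hidden_dim):
    """Weights in a standard LSTM layer with 4 gates and one bias per gate."""
    return 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim)


gru_params = gru_param_count(128, 50)           # reads the 128-dim embeddings
stacked_lstm_params = lstm_param_count(50, 50)  # reads the 50-dim GRU outputs
print(gru_params, stacked_lstm_params)
```

The second layer reads 50-dimensional GRU states instead of 128-dimensional embeddings, which is why `lstm_2` has 20,200 parameters rather than the 35,800 of the LSTM in MODEL1.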


In [None]:
model3.weights[0].assign(initial_weight)

<tf.Variable 'UnreadVariable' shape=(85929, 128) dtype=float32, numpy=
array([[-1.2835134e-02,  3.8169596e-02,  1.2824427e-02, ...,
        -4.1749455e-02, -6.7193434e-04, -2.5152588e-02],
       [-4.3288276e-02, -2.2840855e-01, -3.3235773e-01, ...,
        -6.2215126e-01, -2.1829844e-01,  5.5536860e-01],
       [ 1.4566300e+00, -6.7591065e-01,  2.8122848e-01, ...,
         5.9197694e-01, -2.6638773e-01, -5.2011847e-01],
       ...,
       [ 3.6430800e-01,  1.3824491e-01,  5.7283497e-01, ...,
        -4.5271674e-01, -6.3190216e-01, -8.7267727e-01],
       [ 8.0739635e-01,  6.2565893e-01, -2.1986499e-01, ...,
        -7.4504662e-01,  1.0706705e-01,  8.4927452e-01],
       [-4.1984853e-01, -2.4882808e-01, -4.7447753e-01, ...,
         3.4534696e-01,  8.9349687e-01, -6.5144444e-01]], dtype=float32)>

In [None]:
## Compile the model
model3.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
## Train the model
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=1)

num_epochs = 5
history = model3.fit(train_ids, train_labels, epochs=num_epochs, batch_size=200,
                    validation_data=(val_ids, val_labels), callbacks=[callback])

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5


In [None]:
## Evaluate performance on the test data
model3.evaluate(test_ids, test_labels)



[0.3978496193885803, 0.8305000066757202]

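### MODEL4: Ensembling the Results of the Three Models
Finally, here is the code that ensembles the three models trained above.   
Training the three models independently and then ensembling their results can improve accuracy. 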

In [None]:
def predict(test_ids):
  res1 = model1.predict(test_ids)
  res2 = model2.predict(test_ids)
  res3 = model3.predict(test_ids)
  result = (res1 + res2 + res3) / 3
  return result

In [None]:
prediction = predict(test_ids)
prediction

array([[0.6829951 , 0.31700495],
       [0.68486124, 0.3151388 ],
       [0.36904716, 0.63095284],
       ...,
       [0.9821079 , 0.01789213],
       [0.9926901 , 0.00730986],
       [0.06264149, 0.93735856]], dtype=float32)

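👉 The predict function returns the probabilities obtained by averaging the three models' predictions.    

To make a final prediction, these probabilities need to be converted into categories, right?   
The np.argmax function finds the position of the maximum value along a given axis.   
With it, we can use the category with the highest probability as the model's prediction.   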
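
The averaging and `np.argmax` steps can be tried on a tiny example. The probabilities and labels below are made up for illustration and are not outputs of the models in this notebook.

```python
import numpy as np

# Made-up class probabilities from three models for 3 samples (each row sums to 1).
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]])
p2 = np.array([[0.8, 0.2], [0.3, 0.7], [0.40, 0.60]])
p3 = np.array([[0.7, 0.3], [0.6, 0.4], [0.35, 0.65]])

# Soft voting: average the probabilities, then pick the most likely class per row.
avg = (p1 + p2 + p3) / 3
pred = np.argmax(avg, axis=1)  # axis=1 searches across the class columns

# Made-up ground-truth labels for the accuracy check.
toy_labels = np.array([0, 1, 0])
accuracy = np.mean(pred == toy_labels)

print(avg)
print(pred, accuracy)
```

Averaging keeps each row a valid probability distribution, so the ensemble output can be read the same way as a single model's softmax output.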


In [None]:
""" Convert to category """
prediction = np.argmax(prediction, axis = 1)

In [None]:
print("TEST ACCURACY:")
sum(prediction == test_labels) / len(test_labels)

TEST ACCURACY:


0.8319

---

## #2. DAILY MISSION 🙌

The three models below are sentiment analysis models built with RNNs.   
For some reason, however, they are not training properly.   
Inspect each model, find the error, debug it, and submit the file!

#### model_1

In [None]:
tf.keras.backend.clear_session()

vocab_size = text_encoder.vocab_size # vocabulary size
embedding_dim = final_embeddings.shape[1] # embedding dimension
rnn_hidden_dim = 50 # GRU hidden size

""" MAKE MODEL """
model_1 = Sequential(
    [Embedding(vocab_size, embedding_dim, mask_zero = True),
     GRU(rnn_hidden_dim),
     Dense(2, activation = "softmax")]
)
model_1.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 128)         10998912  
_________________________________________________________________
gru (GRU)                    (None, 50)                27000     
_________________________________________________________________
dense (Dense)                (None, 2)                 102       
Total params: 11,026,014
Trainable params: 11,026,014
Non-trainable params: 0
_________________________________________________________________


- Where the error is:

In [None]:
model_1.weights[0].assign(initial_weight)
## Compile the model
model_1.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
## Train the model
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=1)

num_epochs = 5
history = model_1.fit(train_ids, train_labels, epochs=num_epochs, batch_size=200,
                    validation_data=(val_ids, val_labels), callbacks=[callback])

Epoch 1/5
Epoch 2/5
Epoch 3/5


#### model_2

In [None]:
tf.keras.backend.clear_session()

vocab_size = text_encoder.vocab_size # vocabulary size
embedding_dim = final_embeddings.shape[1] # embedding dimension
rnn_hidden_dim = 50 # GRU hidden size

""" MAKE MODEL """
model_2 = Sequential(
    [Embedding(vocab_size, embedding_dim, mask_zero = True),
     GRU(rnn_hidden_dim, return_sequences=True),
     GRU(rnn_hidden_dim, return_sequences=False),
     Dense(2, activation="softmax")]
)
model_2.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 128)         10998912  
_________________________________________________________________
gru (GRU)                    (None, None, 50)          27000     
_________________________________________________________________
gru_1 (GRU)                  (None, None, 50)          15300     
_________________________________________________________________
dense (Dense)                (None, None, 2)           102       
Total params: 11,041,314
Trainable params: 11,041,314
Non-trainable params: 0
_________________________________________________________________


- Where the error is:

In [None]:
model_2.weights[0].assign(initial_weight)
## Compile the model
model_2.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
## Train the model
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=1)

num_epochs = 5
history = model_2.fit(train_ids, train_labels, epochs=num_epochs, batch_size=200,
                    validation_data=(val_ids, val_labels), callbacks=[callback])

Epoch 1/5


InvalidArgumentError: ignored

#### model_3

In [None]:
tf.keras.backend.clear_session()

vocab_size = text_encoder.vocab_size # vocabulary size
embedding_dim = final_embeddings.shape[1] # embedding dimension
rnn_hidden_dim = 50 # LSTM hidden size

""" MAKE MODEL """
model_3 = Sequential(
    [Embedding(vocab_size, embedding_dim, mask_zero = True),
     LSTM(rnn_hidden_dim),
     Dense(2, activation="softmax")]
)
model_3.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 128)         10998912  
_________________________________________________________________
lstm (LSTM)                  (None, 50)                35800     
_________________________________________________________________
dense (Dense)                (None, 1)                 51        
Total params: 11,034,763
Trainable params: 11,034,763
Non-trainable params: 0
_________________________________________________________________


- Where the error is:

In [None]:
model_3.weights[0].assign(initial_weight)
## Compile the model
model_3.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
## Train the model
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=1)

num_epochs = 5
history = model_3.fit(train_ids, train_labels, epochs=num_epochs, batch_size=200,
                    validation_data=(val_ids, val_labels), callbacks=[callback])

Epoch 1/5
