## Sentiment Analysis of Movie Reviews with TensorFlow + RNN


#### (1) Load the preprocessed data

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
import pickle

with open("/content/drive/My Drive/Colab Notebooks/preprocessed_data.pkl", "rb") as f:
  saved_data = pickle.load(f)
  
word2idx = saved_data["word2idx"]
embedding_matrix = saved_data["embedding_matrix"]

test_sents = saved_data["test_sents"]
test_labels = saved_data["test_labels"]
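Before using the loaded dictionary, it can help to confirm that it has the expected keys and that sentences and labels line up. The following is a minimal sketch on toy data, not the real file: the value layouts (token lists, 0/1 labels, a nested-list embedding matrix) are assumptions inferred from how the notebook uses them, and `required_keys` is a hypothetical helper name.

```python
import io
import pickle

# Toy stand-in for preprocessed_data.pkl (contents are illustrative assumptions).
toy_data = {
    "word2idx": {"<UNK>": 1, "good": 2},
    "embedding_matrix": [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4]],
    "test_sents": [["good"], ["good", "bad"]],
    "test_labels": [1, 0],
}

# Round-trip through an in-memory buffer instead of a file on Drive.
buffer = io.BytesIO()
pickle.dump(toy_data, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

# Check that every key the notebook reads is present.
required_keys = {"word2idx", "embedding_matrix", "test_sents", "test_labels"}
missing = required_keys - set(loaded)

# Each sentence should have exactly one label.
aligned = len(loaded["test_sents"]) == len(loaded["test_labels"])
print(missing, aligned)
```

Only unpickle files from a source you trust, since `pickle.load` can execute arbitrary code.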

#### (2) Shape the input data

In [3]:
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import os

tf.set_random_seed(1109)

In [0]:
max_length = max([len(sent) for sent in test_sents])  # length of the longest review
test_seqs = []
for sent in test_sents:
  tmp = np.zeros(max_length, dtype="int32")  # positions past the end of the sentence stay 0 (padding)
  for i, word in enumerate(sent):
    idx = word2idx.get(word)
    if idx is not None:
      tmp[i] = idx
    else:
      tmp[i] = word2idx.get("<UNK>")  # handle out-of-vocabulary words
  test_seqs.append(tmp)

test_inputs = np.stack(test_seqs)
test_targets = np.array(test_labels, dtype="int32")
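The cell above does two things at once: it maps each token to its index (falling back to `<UNK>`) and right-pads every sentence with zeros up to the longest length. The same idea can be sketched in plain Python on a toy vocabulary. The function name `encode_and_pad`, the `<PAD>` entry, and the toy sentences are hypothetical; treating index 0 as padding follows from the `np.zeros` initialization above.

```python
def encode_and_pad(sents, word2idx, pad_id=0, unk_token="<UNK>"):
    """Map tokens to ids (unknown tokens get the <UNK> id) and right-pad to the longest sentence."""
    unk_id = word2idx[unk_token]
    max_length = max(len(sent) for sent in sents)
    padded = []
    for sent in sents:
        ids = [word2idx.get(word, unk_id) for word in sent]
        padded.append(ids + [pad_id] * (max_length - len(ids)))
    return padded


# Toy vocabulary and sentences (illustrative only).
toy_word2idx = {"<PAD>": 0, "<UNK>": 1, "good": 2, "movie": 3}
toy_sents = [["good", "movie"], ["boring"], ["good", "good", "movie"]]

encoded = encode_and_pad(toy_sents, toy_word2idx)
print(encoded)  # [[2, 3, 0], [1, 0, 0], [2, 2, 3]]
```

Passing the `<UNK>` id as the default of `dict.get` removes the explicit `None` check while producing the same indices.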

In [5]:
test_inputs.shape

(49997, 105)

In [6]:
test_inputs[0]

array([1807,  507,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0], dtype=int32)

In [7]:
test_targets

array([1, 0, 0, ..., 0, 0, 0], dtype=int32)

#### (3) Load the trained model

In [9]:
model = tf.keras.models.load_model("/content/drive/My Drive/Colab Notebooks/best_model.h5")
model.summary()

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 100)         6125100   
_________________________________________________________________
bidirectional (Bidirectional (None, None, 128)         84992     
___

#### (4) Evaluate on the test data

In [10]:
test_batch_size = 512
model.evaluate(test_inputs, test_targets, batch_size=test_batch_size)



[0.31378243993260296, 0.8625318]
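`model.evaluate` returns the loss first, followed by the compiled metrics, so the output above reads as a loss of about 0.314 and a second value of about 0.863. The compile step is not shown in this notebook; the sketch below assumes the common setup for binary sentiment classification (binary cross-entropy loss with an accuracy metric) and shows how those two quantities would be computed from predicted probabilities. The toy targets and probabilities are made up for illustration.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(int((p > threshold) == bool(y)) for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


# Toy labels and predicted probabilities (illustrative only).
toy_targets = [1, 0, 0, 1]
toy_probs = [0.9, 0.2, 0.6, 0.7]

loss = binary_cross_entropy(toy_targets, toy_probs)
acc = accuracy(toy_targets, toy_probs)
print(round(loss, 4), acc)
```

In the notebook itself, `model.metrics_names` lists which name belongs to each returned value, which is a safer check than assuming the order.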