# [BAT512] Advanced Data Mining with AI <br/><br/> 11주차(9강) 실습자료

- 라이브러리 임포트

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset 
from torch.utils.data import DataLoader
from sklearn.model_selection import train_test_split

## 데이터

- 데이터 로드
IMDB 영화 리뷰 데이터
    - review: 영화 리뷰 텍스트
    - sentiment: 해당 리뷰의 감정 분류(positive/negative)

In [2]:
rawdata = pd.read_csv("data/IMDB Dataset.csv", nrows=100)

In [3]:
rawdata

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive
...,...,...
995,Nothing is sacred. Just ask Ernie Fosselius. T...,positive
996,I hated it. I hate self-aware pretentious inan...,negative
997,I usually try to be professional and construct...,negative
998,If you like me is going to see this in a film ...,negative


In [4]:
X = rawdata["review"]
y = rawdata["sentiment"]

- 텍스트 전처리

html 태그 제거

In [5]:
print(X[49])

Average (and surprisingly tame) Fulci giallo which means it's still quite bad by normal standards, but redeemed by its solid build-up and some nice touches such as a neat time twist on the issues of visions and clairvoyance.<br /><br />The genre's well-known weaknesses are in full gear: banal dialogue, wooden acting, illogical plot points. And the finale goes on much too long, while the denouement proves to be a rather lame or shall I say: limp affair.<br /><br />Fulci's ironic handling of giallo norms is amusing, though. Yellow clues wherever you look.<br /><br />3 out of 10 limping killers


In [6]:
import re
html_cleaner = re.compile("<.*?>")
print(re.sub(html_cleaner, "", X[49]))

Average (and surprisingly tame) Fulci giallo which means it's still quite bad by normal standards, but redeemed by its solid build-up and some nice touches such as a neat time twist on the issues of visions and clairvoyance.The genre's well-known weaknesses are in full gear: banal dialogue, wooden acting, illogical plot points. And the finale goes on much too long, while the denouement proves to be a rather lame or shall I say: limp affair.Fulci's ironic handling of giallo norms is amusing, though. Yellow clues wherever you look.3 out of 10 limping killers


In [7]:
X_html_cleaned = X.apply(lambda x: re.sub(html_cleaner, "", x))

소문자 변환

In [8]:
print(X_html_cleaned[49])

Average (and surprisingly tame) Fulci giallo which means it's still quite bad by normal standards, but redeemed by its solid build-up and some nice touches such as a neat time twist on the issues of visions and clairvoyance.The genre's well-known weaknesses are in full gear: banal dialogue, wooden acting, illogical plot points. And the finale goes on much too long, while the denouement proves to be a rather lame or shall I say: limp affair.Fulci's ironic handling of giallo norms is amusing, though. Yellow clues wherever you look.3 out of 10 limping killers


In [9]:
print(X_html_cleaned[49].lower())

average (and surprisingly tame) fulci giallo which means it's still quite bad by normal standards, but redeemed by its solid build-up and some nice touches such as a neat time twist on the issues of visions and clairvoyance.the genre's well-known weaknesses are in full gear: banal dialogue, wooden acting, illogical plot points. and the finale goes on much too long, while the denouement proves to be a rather lame or shall i say: limp affair.fulci's ironic handling of giallo norms is amusing, though. yellow clues wherever you look.3 out of 10 limping killers


In [10]:
X_lowered = X_html_cleaned.apply(lambda x: x.lower())

구두점(punctuation) 및 숫자 제거

In [11]:
print(X_lowered[49])

average (and surprisingly tame) fulci giallo which means it's still quite bad by normal standards, but redeemed by its solid build-up and some nice touches such as a neat time twist on the issues of visions and clairvoyance.the genre's well-known weaknesses are in full gear: banal dialogue, wooden acting, illogical plot points. and the finale goes on much too long, while the denouement proves to be a rather lame or shall i say: limp affair.fulci's ironic handling of giallo norms is amusing, though. yellow clues wherever you look.3 out of 10 limping killers


In [12]:
punctuation_number_cleaner = re.compile("[^a-zA-Z\ ]")
print(re.sub(punctuation_number_cleaner, "", X_lowered[49]))

average and surprisingly tame fulci giallo which means its still quite bad by normal standards but redeemed by its solid buildup and some nice touches such as a neat time twist on the issues of visions and clairvoyancethe genres wellknown weaknesses are in full gear banal dialogue wooden acting illogical plot points and the finale goes on much too long while the denouement proves to be a rather lame or shall i say limp affairfulcis ironic handling of giallo norms is amusing though yellow clues wherever you look out of  limping killers


In [13]:
X_punctuation_number_cleaned = X_lowered.apply(lambda x: re.sub(punctuation_number_cleaner, "", x))

토크나이즈

In [14]:
import nltk
from nltk.tokenize import word_tokenize
nltk.download("punkt")

[nltk_data] Downloading package punkt to /home2/glee/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [15]:
print(X_punctuation_number_cleaned[49])

average and surprisingly tame fulci giallo which means its still quite bad by normal standards but redeemed by its solid buildup and some nice touches such as a neat time twist on the issues of visions and clairvoyancethe genres wellknown weaknesses are in full gear banal dialogue wooden acting illogical plot points and the finale goes on much too long while the denouement proves to be a rather lame or shall i say limp affairfulcis ironic handling of giallo norms is amusing though yellow clues wherever you look out of  limping killers


In [16]:
print(word_tokenize(X_punctuation_number_cleaned[49]))

['average', 'and', 'surprisingly', 'tame', 'fulci', 'giallo', 'which', 'means', 'its', 'still', 'quite', 'bad', 'by', 'normal', 'standards', 'but', 'redeemed', 'by', 'its', 'solid', 'buildup', 'and', 'some', 'nice', 'touches', 'such', 'as', 'a', 'neat', 'time', 'twist', 'on', 'the', 'issues', 'of', 'visions', 'and', 'clairvoyancethe', 'genres', 'wellknown', 'weaknesses', 'are', 'in', 'full', 'gear', 'banal', 'dialogue', 'wooden', 'acting', 'illogical', 'plot', 'points', 'and', 'the', 'finale', 'goes', 'on', 'much', 'too', 'long', 'while', 'the', 'denouement', 'proves', 'to', 'be', 'a', 'rather', 'lame', 'or', 'shall', 'i', 'say', 'limp', 'affairfulcis', 'ironic', 'handling', 'of', 'giallo', 'norms', 'is', 'amusing', 'though', 'yellow', 'clues', 'wherever', 'you', 'look', 'out', 'of', 'limping', 'killers']


In [17]:
X_tokenized = X_punctuation_number_cleaned.apply(lambda x: word_tokenize(x))

In [18]:
X_tokenized.head()

0    [one, of, the, other, reviewers, has, mentione...
1    [a, wonderful, little, production, the, filmin...
2    [i, thought, this, was, a, wonderful, way, to,...
3    [basically, theres, a, family, where, a, littl...
4    [petter, matteis, love, in, the, time, of, mon...
Name: review, dtype: object

불용어(stopwords) 제거

In [19]:
from nltk.corpus import stopwords
stopwords_eng = stopwords.words("english")

In [20]:
print(X_tokenized[49])

['average', 'and', 'surprisingly', 'tame', 'fulci', 'giallo', 'which', 'means', 'its', 'still', 'quite', 'bad', 'by', 'normal', 'standards', 'but', 'redeemed', 'by', 'its', 'solid', 'buildup', 'and', 'some', 'nice', 'touches', 'such', 'as', 'a', 'neat', 'time', 'twist', 'on', 'the', 'issues', 'of', 'visions', 'and', 'clairvoyancethe', 'genres', 'wellknown', 'weaknesses', 'are', 'in', 'full', 'gear', 'banal', 'dialogue', 'wooden', 'acting', 'illogical', 'plot', 'points', 'and', 'the', 'finale', 'goes', 'on', 'much', 'too', 'long', 'while', 'the', 'denouement', 'proves', 'to', 'be', 'a', 'rather', 'lame', 'or', 'shall', 'i', 'say', 'limp', 'affairfulcis', 'ironic', 'handling', 'of', 'giallo', 'norms', 'is', 'amusing', 'though', 'yellow', 'clues', 'wherever', 'you', 'look', 'out', 'of', 'limping', 'killers']


In [21]:
print([word for word in X_tokenized[9] if word not in stopwords_eng])

['like', 'original', 'gut', 'wrenching', 'laughter', 'like', 'movie', 'young', 'old', 'love', 'movie', 'hell', 'even', 'mom', 'liked', 'itgreat', 'camp']


In [22]:
X_stopwords_cleaned = X_tokenized.apply(lambda x: [word for word in x if word not in stopwords_eng])

패딩: 시퀀스 길이 통일

In [23]:
X_stopwords_cleaned.apply(lambda x: len(x))

0      167
1       84
2       85
3       66
4      125
      ... 
995    107
996     39
997    152
998    106
999    312
Name: review, Length: 1000, dtype: int64

In [24]:
X_stopwords_cleaned.apply(lambda x: len(x)).describe()

count    1000.000000
mean      118.777000
std        87.078838
min         8.000000
25%        64.000000
50%        89.500000
75%       146.250000
max       621.000000
Name: review, dtype: float64

In [25]:
MAX_LEN = 100

In [26]:
X_stopwords_cleaned[49]+["<PAD>"]*(MAX_LEN-len(X_stopwords_cleaned[49]))

['average',
 'surprisingly',
 'tame',
 'fulci',
 'giallo',
 'means',
 'still',
 'quite',
 'bad',
 'normal',
 'standards',
 'redeemed',
 'solid',
 'buildup',
 'nice',
 'touches',
 'neat',
 'time',
 'twist',
 'issues',
 'visions',
 'clairvoyancethe',
 'genres',
 'wellknown',
 'weaknesses',
 'full',
 'gear',
 'banal',
 'dialogue',
 'wooden',
 'acting',
 'illogical',
 'plot',
 'points',
 'finale',
 'goes',
 'much',
 'long',
 'denouement',
 'proves',
 'rather',
 'lame',
 'shall',
 'say',
 'limp',
 'affairfulcis',
 'ironic',
 'handling',
 'giallo',
 'norms',
 'amusing',
 'though',
 'yellow',
 'clues',
 'wherever',
 'look',
 'limping',
 'killers',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 '<PAD>',
 

In [27]:
X_padded = X_stopwords_cleaned.apply(lambda x: x+["<PAD>"]*(MAX_LEN-len(x)) if len(x)<MAX_LEN else x[:MAX_LEN])

In [28]:
X_padded.apply(lambda x: len(x)).describe()

count    1000.0
mean      100.0
std         0.0
min       100.0
25%       100.0
50%       100.0
75%       100.0
max       100.0
Name: review, dtype: float64

단어 집합(Vocabulary) 생성

In [29]:
unique_words = np.unique(np.concatenate(X_padded.values))

In [30]:
unique_words

array(['<PAD>', 'aaargh', 'aamir', ..., 'zulu', 'zwick',
       'zzzzzzzzzzzzzzzzzz'], dtype='<U56')

In [31]:
print(len(unique_words))

16277


In [32]:
vocab = {}
for i, word in enumerate(unique_words):
    vocab[word] = i

In [33]:
for i, (k, v) in enumerate(vocab.items()):
    print(k, vocab[k])
    if i == 10: break

<PAD> 0
aaargh 1
aamir 2
aaron 3
ab 4
abandon 5
abandoned 6
abandons 7
abba 8
abbey 9
abbot 10


In [34]:
print(X_padded[49])

['average', 'surprisingly', 'tame', 'fulci', 'giallo', 'means', 'still', 'quite', 'bad', 'normal', 'standards', 'redeemed', 'solid', 'buildup', 'nice', 'touches', 'neat', 'time', 'twist', 'issues', 'visions', 'clairvoyancethe', 'genres', 'wellknown', 'weaknesses', 'full', 'gear', 'banal', 'dialogue', 'wooden', 'acting', 'illogical', 'plot', 'points', 'finale', 'goes', 'much', 'long', 'denouement', 'proves', 'rather', 'lame', 'shall', 'say', 'limp', 'affairfulcis', 'ironic', 'handling', 'giallo', 'norms', 'amusing', 'though', 'yellow', 'clues', 'wherever', 'look', 'limping', 'killers', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>']


In [35]:
print([vocab[word] for word in X_padded[49]])

[971, 14078, 14229, 5682, 5903, 8926, 13715, 11397, 1049, 9823, 13598, 11637, 13326, 1863, 9727, 14770, 9638, 14617, 15018, 7504, 15538, 2517, 5858, 15793, 15744, 5687, 5822, 1101, 3823, 16018, 104, 7042, 10810, 10851, 5313, 6003, 9475, 8443, 3687, 11243, 11504, 8026, 12812, 12433, 8330, 212, 7471, 6358, 5903, 9828, 496, 14529, 16179, 2625, 15842, 8453, 8332, 7881, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


In [36]:
X_indexed = X_padded.apply(lambda x: [vocab[word] for word in x])

In [37]:
X_indexed.head()

0    [10050, 11997, 9006, 15708, 10293, 4701, 16209...
1    [16006, 8390, 11151, 5290, 14308, 15081, 10027...
2    [14532, 16006, 15725, 13472, 14617, 6859, 1400...
3    [1167, 14466, 5087, 8390, 1670, 7575, 14498, 1...
4    [10622, 8856, 8513, 14617, 9299, 15547, 13882,...
Name: review, dtype: object

In [38]:
X_preprocessed = torch.tensor(X_indexed)

레이블 인코딩

In [40]:
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
y_preprocessed = torch.tensor(label_encoder.fit_transform(y), dtype=torch.float)
y_preprocessed = y_preprocessed.unsqueeze(1)
print(label_encoder.classes_)

['negative' 'positive']


In [41]:
print(y_preprocessed)

tensor([[1.],
        [1.],
        [1.],
        [0.],
        [1.],
        [1.],
        [1.],
        [0.],
        [0.],
        [1.],
        [0.],
        [0.],
        [0.],
        [0.],
        [1.],
        [0.],
        [1.],
        [0.],
        [1.],
        [0.],
        [1.],
        [0.],
        [1.],
        [0.],
        [0.],
        [1.],
        [1.],
        [0.],
        [0.],
        [1.],
        [1.],
        [1.],
        [0.],
        [1.],
        [0.],
        [0.],
        [0.],
        [0.],
        [1.],
        [0.],
        [0.],
        [1.],
        [0.],
        [0.],
        [1.],
        [1.],
        [0.],
        [0.],
        [1.],
        [0.],
        [1.],
        [1.],
        [1.],
        [1.],
        [0.],
        [0.],
        [0.],
        [0.],
        [1.],
        [1.],
        [0.],
        [0.],
        [1.],
        [0.],
        [0.],
        [1.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
      

학습/테스트 데이터셋 분할

In [42]:
X_train_valid, X_test, y_train_valid, y_test = train_test_split(X_preprocessed, y_preprocessed, test_size=0.2, shuffle=True)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_valid, y_train_valid, test_size=0.2, shuffle=False)

In [43]:
print(f"Train size: {len(X_train)}, Validation size: {len(X_valid)}, Test size: {len(X_test)}")

Train size: 640, Validation size: 160, Test size: 200


커스텀 데이터셋 구축

In [44]:
class TextDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y
    
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        X_out = self.X[idx]
        y_out = self.y[idx]
        
        return X_out, y_out

In [45]:
batch_size = 16

train_set = TextDataset(X_train, y_train)
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, drop_last=True)
valid_set = TextDataset(X_valid, y_valid)
valid_loader = DataLoader(valid_set, batch_size=batch_size)
test_set = TextDataset(X_test, y_test)
test_loader = DataLoader(test_set, batch_size=len(test_set), shuffle=True)

In [46]:
len(train_set)

640

In [47]:
train_set.__getitem__(0)

(tensor([ 1370,  5094, 10018,  6834,  9446, 13045,  9087,   261, 10018,  9420,
          8623,  8623, 10577,  1370,  5094,  9271,  6834,  9446, 10050,  4883,
          3229, 14532,  6138,  3112, 11797, 13766, 11563, 11563,  4643,  9320,
          1665, 13770,  6301,  6732,   420, 10050, 11563,  2557,  6255,  6138,
          4599, 13766,  6138, 14529,  8309,  9475,  2318,  3569,  3801,  8342,
         10758,  6029,  1049,  6834,  1424,  6543,  2600,  5029,   274,  5173,
          5709,  1424, 11563,  5709, 12721,  4466,  6146,  8007,  4492, 14766,
            37, 16112, 15702, 10405,  9420,  2820, 15695, 14617, 14986, 14491,
          8309, 16107,  9420, 11999,  7065,  8431,  5873,  7825, 10018,  4157,
          8028,  5889, 14739,  1390, 14337, 13342,   420, 12185,  4953,  6391]),
 tensor([0.]))

In [48]:
next(iter(train_loader))[0].shape

torch.Size([16, 100])

## RNN 모델 구축

### RNN 모델에 대한 PyTorch 클래스 선언

In [49]:
class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super(RNN, self).__init__()
        
        # Embedding 계층
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        
        # RNN 계층
        self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
        
        # 완전연결층
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, text):
        # text -> (batch_size, MAX_LEN)
        
        embedded = self.embedding(text) 
        # embedded -> (batch_size, MAX_LEN, embedding_dim)
        
        output, hidden_state = self.rnn(embedded)
        # output -> (batch_size, MAX_LEN, hidden_dim) 전체 시점의 hidden state
        # hidden_state -> (batch_size, 1, hidden_dim) 마지막 시점의 hidden state
        
        output = torch.sigmoid(self.fc(hidden_state.squeeze()))
        # output -> (batch_size, output_dim)
        
        return output

### RNN 모델 구축

In [50]:
INPUT_DIM = len(vocab)
EMBEDDING_DIM = 16
HIDDEN_DIM = 16
OUTPUT_DIM = 1

In [51]:
model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)

In [52]:
print(model)

RNN(
  (embedding): Embedding(16277, 16)
  (rnn): RNN(16, 16, batch_first=True)
  (fc): Linear(in_features=16, out_features=1, bias=True)
)


In [53]:
print("총 파라미터 수: ",sum(p.numel() for p in model.parameters()))
# 

총 파라미터 수:  260993


## 모델 학습

### 모델 학습 수행 함수 선언

In [54]:
def run_epoch(model, data_loader, criterion, optimizer, train=False):
    loss_ep = 0
    if train:
        model.train()
    else:
        model.eval()

    for x, y in data_loader:
        if train:
            y_predicted = model(x)
            loss = criterion(y_predicted, y)
            loss_ep += loss.item()
            loss.backward()
            optimizer.step()
        else:
            with torch.autograd.no_grad():
                y_predicted = model(x)
                loss = criterion(y_predicted, y)
                loss_ep += loss.item()

        return loss_ep

### 하이퍼파라미터 및 학습 방식 정의

In [60]:
learning_rate = 1e-5
max_epochs = 2000

criterion = torch.nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

### 모델 학습 진행

In [61]:
for epoch in range(1, max_epochs+1):
    train_loss = run_epoch(model, train_loader, criterion, optimizer, train=True)
    valid_loss = run_epoch(model, valid_loader, criterion, optimizer, train=False)

    if epoch % 100 == 0:
        print("epoch:{}, train_loss: {:.4f}, valid_loss: {:.4f}".format(epoch, train_loss, valid_loss))

epoch:100, train_loss: 0.7183, valid_loss: 0.7172
epoch:200, train_loss: 0.6816, valid_loss: 0.7171
epoch:300, train_loss: 0.7106, valid_loss: 0.7172
epoch:400, train_loss: 0.6900, valid_loss: 0.7173
epoch:500, train_loss: 0.6793, valid_loss: 0.7173
epoch:600, train_loss: 0.6938, valid_loss: 0.7175
epoch:700, train_loss: 0.6817, valid_loss: 0.7176
epoch:800, train_loss: 0.6882, valid_loss: 0.7176
epoch:900, train_loss: 0.6965, valid_loss: 0.7176
epoch:1000, train_loss: 0.6720, valid_loss: 0.7177
epoch:1100, train_loss: 0.6618, valid_loss: 0.7179
epoch:1200, train_loss: 0.6948, valid_loss: 0.7181
epoch:1300, train_loss: 0.6804, valid_loss: 0.7183
epoch:1400, train_loss: 0.6628, valid_loss: 0.7185
epoch:1500, train_loss: 0.6682, valid_loss: 0.7187
epoch:1600, train_loss: 0.6382, valid_loss: 0.7188
epoch:1700, train_loss: 0.6733, valid_loss: 0.7190
epoch:1800, train_loss: 0.6785, valid_loss: 0.7192
epoch:1900, train_loss: 0.6712, valid_loss: 0.7194
epoch:2000, train_loss: 0.7210, valid_lo

## 모델 검증

In [62]:
from sklearn.metrics import classification_report

In [63]:
with torch.no_grad(): # 학습(가중치 업데이트)을 진행하지 않음
    X_test, y_test = next(iter(test_loader))
    
    y_test_pred = model(X_test)
    y_test_pred = torch.round(y_test_pred)

In [64]:
y_test = y_test.detach().numpy()
y_test_pred = y_test_pred.detach().numpy()

In [65]:
report = classification_report(y_test, y_test_pred, target_names=label_encoder.classes_)

In [66]:
print(report)

              precision    recall  f1-score   support

    negative       0.52      0.15      0.23       108
    positive       0.46      0.84      0.59        92

    accuracy                           0.47       200
   macro avg       0.49      0.49      0.41       200
weighted avg       0.49      0.47      0.40       200



In [67]:
sample_index = 52
print("REVIEW text:", X[sample_index], "\n\nSENTIMENT:", y[sample_index])

REVIEW text: Bela Lugosi appeared in several of these low budget chillers for Monogram Studios in the 1940's and The Corpse Vanishes is one of the better ones.<br /><br />Bela plays a mad scientist who kidnaps young brides and kills them and then extracts fluid from their bodies so he can keep his ageing wife looking young. After a reporter and a doctor stay the night at his home and discover he is responsible for the brides' deaths, the following morning they report these murders to the police and the mad scientist is shot and drops dead shortly afterwards.<br /><br />You have got almost everything in this movie: the scientist's assistants consist of an old hag, a hunchback and dwarf (her sons), a thunderstorm and spooky passages in Bela's house. Bela and his wife find they sleep better in coffins rather than beds in the movie.<br /><br />The Corpse Vanishes is worth a look, especially for Bela Lugosi fans. Great fun.<br /><br />Rating: 3 stars out of 5. 

SENTIMENT: positive


In [68]:
y_sample_pred = int(torch.round(model(X_preprocessed[sample_index].unsqueeze(0))).item())

In [69]:
print("Predicted SENTIMENT:", label_encoder.classes_[y_sample_pred])

Predicted SENTIMENT: positive
