
# Task 2 RNN for Text Classification

# Introduction

This notebook covers Task 2 of HW3: text classification with an RNN. We build an RNN model, specifically a bidirectional LSTM, to classify movie reviews as positive or negative. The last part of the task compares the RNN with a CNN on the same data.

In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [7]:
from sklearn.utils import shuffle

In [8]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.optimizers import SGD
from keras.layers import Embedding, Conv1D, Dropout, MaxPooling1D, Flatten, Dense
from keras import regularizers

In [9]:
from keras.models import Model
from keras.optimizers import SGD
from keras.layers import Input, Embedding, LSTM, Dropout, Bidirectional, Dense, Flatten
from keras import regularizers

In [10]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from keras.callbacks import EarlyStopping
from sklearn.metrics import accuracy_score

## Data Loading and Preprocessing

In [11]:
df_n = pd.read_table('/Users/jianwenliu/nlp/rt-polaritydata/rt-polaritydata/rt-polarity.neg', names=['review'],  header=None, encoding='latin-1')
df_p = pd.read_table('/Users/jianwenliu/nlp/rt-polaritydata/rt-polaritydata/rt-polarity.pos', names=['review'],  header=None, encoding='latin-1')

In [12]:
df_n.head(10)

|   | review |
|---|---|
| 0 | simplistic , silly and tedious . |
| 1 | it's so laddish and juvenile , only teenage bo... |
| 2 | exploitative and largely devoid of the depth o... |
| 3 | [garbus] discards the potential for pathologic... |
| 4 | a visually flashy but narratively opaque and e... |
| 5 | the story is also as unoriginal as they come ,... |
| 6 | about the only thing to give the movie points ... |
| 7 | not so much farcical as sour . |
| 8 | unfortunately the story and the actors are ser... |
| 9 | all the more disquieting for its relatively go... |


In [13]:
df_n['lable'] = 0
df_p['lable'] = 1

Concatenate the negative and positive reviews and shuffle them. The dataset is balanced (half positive, half negative), which makes accuracy a meaningful final metric.

In [14]:
df = shuffle(pd.concat((df_n, df_p), axis=0), random_state=6)

In [15]:
df

|   | review | lable |
|---|---|---|
| 3762 | it gives devastating testimony to both people'... | 1 |
| 3932 | wanders all over the map thematically and styl... | 0 |
| 1802 | it cannot be enjoyed , even on the level that ... | 0 |
| 3211 | deliberately and devotedly constructed , far f... | 0 |
| 1172 | if the count of monte cristo doesn't transform... | 1 |
| 2559 | it's a tour de force , written and directed so... | 1 |
| 519 | brady achieves the remarkable feat of squander... | 0 |
| 5182 | soulless and -- even more damning -- virtually... | 0 |
| 1067 | personal velocity ought to be exploring these ... | 0 |
| 728 | [fessenden] is much more into ambiguity and cr... | 1 |


In [16]:
df.shape

(10662, 2)
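
The balance assumption is easy to check before training. The sketch below counts labels in a small hypothetical list (`toy_labels`, not the real data); on the notebook's DataFrame, `df.lable.value_counts()` should give the equivalent counts.

```python
from collections import Counter

# Hypothetical stand-in for the notebook's label column (df.lable).
toy_labels = [1, 0, 0, 1, 1, 0]

counts = Counter(toy_labels)
# Fraction of samples in each class; 0.5 / 0.5 means the set is balanced.
fractions = {label: n / len(toy_labels) for label, n in sorted(counts.items())}
print(fractions)  # {0: 0.5, 1: 0.5}
```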

Vectorize the reviews: fit a tokenizer on the corpus and map each review to a sequence of integer word indices.

In [17]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer()
tokenizer.fit_on_texts(df.review)

In [18]:
sequences = tokenizer.texts_to_sequences(df.review)

Find the length of the longest review (`max_len`); every sequence will be padded to this length.

In [19]:
max_len = max([len(seq) for seq in sequences])
max_len

51

Pad the sequences: shorter reviews are left-padded with zeros up to `max_len`.

In [20]:
sequences = pad_sequences(sequences, maxlen=max_len, dtype='int32', padding='pre', truncating='pre', value=0)

In [21]:
sequences

array([[    0,     0,     0, ...,  2940,    14,    47],
       [    0,     0,     0, ...,     4,    16,   106],
       [    0,     0,     0, ...,   275,   848,   275],
       ...,
       [    0,     0,     0, ..., 10267,     4,  3092],
       [    0,     0,     0, ...,    90,  1169,   225],
       [    0,     0,     0, ...,   718, 19498,  1899]], dtype=int32)
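
To make the leading zeros in the array above concrete, here is a minimal pure-Python sketch of what `padding='pre'` and `truncating='pre'` do. The helper `pad_pre` and the toy sequences are illustrative assumptions, not part of Keras.

```python
def pad_pre(seqs, maxlen, value=0):
    """Left-pad (and left-truncate) integer sequences to a fixed length."""
    padded = []
    for seq in seqs:
        kept = list(seq[-maxlen:])  # 'pre' truncation keeps the last maxlen tokens
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded

# Hypothetical token sequences of different lengths.
toy_sequences = [[5, 2], [7, 1, 4, 9, 3], [8]]
toy_padded = pad_pre(toy_sequences, maxlen=4)
print(toy_padded)  # [[0, 0, 5, 2], [1, 4, 9, 3], [0, 0, 0, 8]]
```

With pre-padding the real tokens sit at the end of each row, so the last time steps the LSTM reads are words rather than padding.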

In [22]:
tokenizer.word_index

{'the': 1,
 'a': 2,
 'and': 3,
 'of': 4,
 'to': 5,
 'is': 6,
 'in': 7,
 'that': 8,
 'it': 9,
 'as': 10,
 'but': 11,
 'with': 12,
 'film': 13,
 'for': 14,
 'this': 15,
 'its': 16,
 'an': 17,
 'movie': 18,
 "it's": 19,
 'be': 20,
 'on': 21,
 'you': 22,
 'not': 23,
 'by': 24,
 'one': 25,
 'like': 26,
 'about': 27,
 'more': 28,
 'has': 29,
 'are': 30,
 'at': 31,
 'than': 32,
 'from': 33,
 'all': 34,
 'his': 35,
 'have': 36,
 'so': 37,
 'if': 38,
 'or': 39,
 'story': 40,
 'too': 41,
 'i': 42,
 'out': 43,
 'just': 44,
 'who': 45,
 'up': 46,
 'good': 47,
 'into': 48,
 'what': 49,
 'most': 50,
 'no': 51,
 'much': 52,
 'even': 53,
 'comedy': 54,
 'time': 55,
 'will': 56,
 'can': 57,
 'some': 58,
 'well': 59,
 'characters': 60,
 'only': 61,
 'little': 62,
 'way': 63,
 'funny': 64,
 'their': 65,
 'director': 66,
 'make': 67,
 'been': 68,
 'your': 69,
 'enough': 70,
 'very': 71,
 'never': 72,
 'when': 73,
 'there': 74,
 'makes': 75,
 'life': 76,
 'bad': 77,
 'may': 78,
 'which': 79,
 'best': 80,
 

In [23]:
voc_size=len(tokenizer.word_index)

In [24]:
voc_size

19498
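
The word index above ranks words by frequency, starting at 1, which leaves index 0 free for padding. That is why the embedding layer later needs `voc_size + 1` rows. The sketch below imitates this ranking in a simplified way: `build_word_index` is a hypothetical helper that only lowercases and splits on whitespace, and it breaks frequency ties alphabetically for determinism, which may differ from the real `Tokenizer`.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to a 1-based rank, most frequent word first."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties: alphabetical (assumption)
    return {word: rank for rank, word in enumerate(ranked, start=1)}

toy_texts = ["the film is good", "the story is the film"]
toy_index = build_word_index(toy_texts)
print(toy_index)  # {'the': 1, 'film': 2, 'is': 3, 'good': 4, 'story': 5}

# Index 0 is reserved for padding, so an embedding needs one extra row.
embedding_rows = len(toy_index) + 1
print(embedding_rows)  # 6
```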

# RNN (LSTM) model for Text Classification

Build the bidirectional LSTM model.

In [25]:

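# model where N=1, M=1 and K=1
def bilstm_model(output_dim=64, dense_dim=16):

    # Input of padded word indices, followed by a trainable embedding
    input = Input(shape=(max_len,), dtype='int32')
    embed = Embedding(voc_size+1, output_dim, input_length=max_len, embeddings_initializer='random_uniform')(input)

    # Bi-LSTM layer: 100 units per direction, one output vector per time step
    lstm = Bidirectional(LSTM(units=100, return_sequences=True))(embed)

    # Flatten the per-step outputs into a single vector
    lstm = Flatten()(lstm)

    # Dense + dropout branch. NOTE: the output layer below is applied to `lstm`,
    # not to `dense`, so this branch is not part of the compiled model
    # (the summary accordingly lists no hidden Dense or Dropout layer).
    dense = Dense(dense_dim, activation='relu')(lstm)
    dense = Dropout(0.5)(dense)

    # Sigmoid output with L2 kernel and L1 activity regularization
    output = Dense(1, kernel_regularizer=regularizers.l2(0.03),activity_regularizer=regularizers.l1(0.01),activation='sigmoid')(lstm)

    model = Model(input, output)

    # Binary cross-entropy loss with the RMSprop optimizer
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

    return model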
    
bilstm_model().summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 51)                0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 51, 64)            1247936   
_________________________________________________________________
bidirectional_1 (Bidirection (None, 51, 200)           132000    
_________________________________________________________________
flatten_1 (Flatten)          (None, 10200)             0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 10201     
Total params: 1,390,137
Trainable params: 1,390,137
Non-trainable params: 0
_________________________________________________________________
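
The parameter counts in the summary can be reproduced by hand. The arithmetic below uses only numbers that appear in this notebook (vocabulary size 19498, `max_len` 51, embedding size 64, 100 LSTM units) together with the standard LSTM parameter formula; the variable names are illustrative.

```python
voc_size, max_len = 19498, 51
embed_dim, lstm_units = 64, 100

# Embedding: one vector per word index, plus one row for the padding index 0.
embedding = (voc_size + 1) * embed_dim

# One LSTM direction: 4 gates, each with input weights, recurrent weights and a bias.
lstm_one_direction = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)
bilstm = 2 * lstm_one_direction

# Flatten concatenates the 2 * 100 outputs of all 51 time steps.
flat = max_len * 2 * lstm_units

# Final Dense(1): one weight per flattened feature, plus a bias.
dense_out = flat + 1

total = embedding + bilstm + dense_out
print(embedding, bilstm, flat, dense_out, total)
# 1247936 132000 10200 10201 1390137
```

Most of the parameters sit in the embedding table, so the vocabulary size dominates the model size here.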


In [29]:
EPOCHS = 10
BATCH_SIZE = 128
fold = 8

model = bilstm_model()
X = sequences
y = np.array(df.lable.tolist())

# Model performance: prediction accuracy

In [30]:
kfold = StratifiedKFold(n_splits=fold, shuffle=True, random_state=7)

In [31]:
acr = list()  # validation accuracy of each fold
for train_index, valid_index in kfold.split(X, y):
    X_train, X_valid = X[train_index], X[valid_index]
    y_train, y_valid = y[train_index], y[valid_index]
    # Build a fresh model for every fold so no weights carry over
    model = bilstm_model()
    # Stop as soon as val_loss fails to improve for one epoch
    model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS, verbose=2,
              validation_data=(X_valid, y_valid),
              callbacks=[EarlyStopping(patience=1, monitor='val_loss')])
    # Round the sigmoid outputs to 0/1 class labels and score the fold
    y_hat = model.predict(X_valid)
    y_pred = [round(pred) for pred in y_hat.reshape(-1)]
    acr.append(accuracy_score(y_valid, y_pred))

Train on 9328 samples, validate on 1334 samples
Epoch 1/10
 - 39s - loss: 1.1655 - acc: 0.5123 - val_loss: 1.1169 - val_acc: 0.6319
Epoch 2/10
 - 30s - loss: 0.9600 - acc: 0.7363 - val_loss: 1.0558 - val_acc: 0.7609
Epoch 3/10
 - 30s - loss: 0.8617 - acc: 0.8310 - val_loss: 1.0812 - val_acc: 0.7346
Train on 9328 samples, validate on 1334 samples
Epoch 1/10
 - 39s - loss: 1.1629 - acc: 0.5163 - val_loss: 1.1038 - val_acc: 0.5585
Epoch 2/10
 - 33s - loss: 0.9685 - acc: 0.7347 - val_loss: 1.0089 - val_acc: 0.7159
Epoch 3/10
 - 43s - loss: 0.8597 - acc: 0.8336 - val_loss: 1.0832 - val_acc: 0.7436
Train on 9328 samples, validate on 1334 samples
Epoch 1/10
 - 37s - loss: 1.1662 - acc: 0.5090 - val_loss: 1.0634 - val_acc: 0.5210
Epoch 2/10
 - 28s - loss: 0.9703 - acc: 0.7230 - val_loss: 0.9933 - val_acc: 0.6582
Epoch 3/10
 - 28s - loss: 0.8643 - acc: 0.8243 - val_loss: 1.0483 - val_acc: 0.7054
Train on 9330 samples, validate on 1332 samples
Epoch 1/10
 - 36s - loss: 1.1636 - acc: 0.5096 - val
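
Each fold in the log stops after epoch 3 even though `EPOCHS = 10`, which follows from `patience=1`. The sketch below is a simplified model of that stopping rule (it ignores options such as a minimum delta); `stopping_epoch` is a hypothetical helper, and the loss values are the `val_loss` figures of the first fold above.

```python
def stopping_epoch(val_losses, patience=1):
    """Return the 1-based epoch at which patience-based early stopping halts."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0   # improvement: remember it and reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)         # never triggered: all epochs ran

fold1_val_loss = [1.1169, 1.0558, 1.0812]
print(stopping_epoch(fold1_val_loss))  # 3
```

Note that the model scored after training is the one from the last epoch run, not the epoch with the lowest `val_loss`; in the first fold that is epoch 3 (val_acc 0.7346) rather than epoch 2 (val_acc 0.7609).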

In [32]:
print('Average accuracy ', np.mean(np.array(acr)))

Average accuracy  0.727818382475429
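
For reference, the per-fold score and the final average boil down to the following steps. This is a dependency-free sketch with made-up probabilities and fold accuracies (all values here are hypothetical), showing the thresholding at 0.5 and the averaging, plus a standard deviation that could be reported alongside the mean.

```python
from statistics import mean, stdev

def to_labels(probabilities, threshold=0.5):
    """Turn sigmoid outputs into 0/1 class labels."""
    return [1 if p > threshold else 0 for p in probabilities]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical model outputs and labels for one validation fold.
toy_y_hat = [0.91, 0.12, 0.55, 0.40]
toy_y_true = [1, 0, 0, 0]
toy_y_pred = to_labels(toy_y_hat)
fold_accuracy = accuracy(toy_y_true, toy_y_pred)
print(toy_y_pred, fold_accuracy)  # [1, 0, 1, 0] 0.75

# Hypothetical per-fold accuracies; the notebook collects the real ones in `acr`.
toy_fold_scores = [0.75, 0.70, 0.74]
print(mean(toy_fold_scores), stdev(toy_fold_scores))
```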


# Conclusion

Comparing the RNN and CNN models leads to at least two conclusions.

First, in our runs the RNN was much cheaper to train than the CNN: the average epoch time was about 35 s for the RNN, versus about 110 s for the CNN.

Second, the CNN reached slightly higher accuracy than the RNN (0.74 versus 0.73). This indicates that an RNN may be less accurate than a CNN on some datasets. Even though a CNN is essentially an n-gram based model, it still works for NLP tasks.

Therefore, for NLP tasks like this one, the RNN may be preferable to the CNN for its efficiency, although the CNN also works. Another good application of a CNN is aggregating character-level information into word representations.
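
The RNN epoch time quoted above can be cross-checked against the training log. The sketch below parses the per-epoch durations from the log lines shown earlier in this notebook (the last line is truncated in the output, but its duration is still visible); the regular expression is an assumption about the `verbose=2` log format.

```python
import re

log_lines = [
    " - 39s - loss: 1.1655 - acc: 0.5123 - val_loss: 1.1169 - val_acc: 0.6319",
    " - 30s - loss: 0.9600 - acc: 0.7363 - val_loss: 1.0558 - val_acc: 0.7609",
    " - 30s - loss: 0.8617 - acc: 0.8310 - val_loss: 1.0812 - val_acc: 0.7346",
    " - 39s - loss: 1.1629 - acc: 0.5163 - val_loss: 1.1038 - val_acc: 0.5585",
    " - 33s - loss: 0.9685 - acc: 0.7347 - val_loss: 1.0089 - val_acc: 0.7159",
    " - 43s - loss: 0.8597 - acc: 0.8336 - val_loss: 1.0832 - val_acc: 0.7436",
    " - 37s - loss: 1.1662 - acc: 0.5090 - val_loss: 1.0634 - val_acc: 0.5210",
    " - 28s - loss: 0.9703 - acc: 0.7230 - val_loss: 0.9933 - val_acc: 0.6582",
    " - 28s - loss: 0.8643 - acc: 0.8243 - val_loss: 1.0483 - val_acc: 0.7054",
    " - 36s - loss: 1.1636 - acc: 0.5096 - val",
]

# Each line starts with "- <seconds>s -"; capture the integer number of seconds.
duration = re.compile(r"-\s*(\d+)s\s*-")
epoch_seconds = [int(duration.search(line).group(1)) for line in log_lines]

mean_seconds = sum(epoch_seconds) / len(epoch_seconds)
print(epoch_seconds)
print(mean_seconds)  # 34.3
```

The visible epochs average roughly 34 s, in line with the figure of about 35 s used in the comparison.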