# <center> CNN for text classification & sentiment analysis</center>

## <center> Hartford Insurance Group at Stevens Inst of Tech </center>

<center> Mo Badawy, Sept 25, 2017</center>

### Using Convolutional Neural Network for classification of movie reviews

In this tutorial we will implement a Convolutional Neural Network (CNN), a neural network architecture commonly used for image classification problems, to classify movie reviews from http://rottentomatoes.com. These reviews, written either by critics or by the general audience, are labeled either "Fresh", conveying a positive overall review, or "Rotten", conveying a negative one. We will train and evaluate a CNN model that classifies one-sentence movie reviews from Rottentomatoes.com.

### 1. Dataset

The dataset consists of a bit over 10,000 one-sentence movie reviews from the Rotten Tomatoes website. Each review represents an opinion of the movie as either "Fresh" or "Rotten". The dataset can be downloaded directly from Cornell, here: http://www.cs.cornell.edu/people/pabo/movie-review-data/rt-polaritydata.tar.gz

We will come back later to the dataset as we will need to preprocess the text in order to be able to use it with our model.

### 2. Convolutional Neural Networks

A CNN is a multi-layered neural network architecture. Several CNN variants (e.g. LeNet, AlexNet) have yielded very good results in image classification problems. Advances in GPU computing over the last decade have made it much easier and faster to train CNNs on large image datasets. Although the current accuracy records for image classification (e.g. the [ImageNet ILSVRC challenge](http://www.image-net.org/challenges/LSVRC/)) are held by much deeper and more complicated networks, such as the 152-layer ResNet (2015), which was the first network to surpass human-level accuracy, and Inception v3 (2016), convolution is still at the core of these advanced networks.

### What is convolution?

Convolution is a mathematical operation that combines two functions (signals) to generate a new signal. Historically, convolution has been an essential tool in DSP (Digital Signal Processing): for example, an impulse response signal can be convolved with a base signal to render a certain effect. In audio processing, impulse response samples have been used to add the hallmark 'reverb' effect of famous concert halls to studio recordings; see this [Wiki article](https://en.wikipedia.org/wiki/Convolution_reverb).

The image below illustrates how a pulse (square) signal (in red) is convolved with the base signal (blue), producing the new signal (black).

![Cont. Convolution](./images/Convolution_of_spiky_function_with_box2.gif)
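
The same idea can be tried on a short discrete signal. The snippet below is a minimal sketch using NumPy's `np.convolve`; the spike signal and the width-3 box pulse are made-up values chosen only for illustration, not data from this tutorial.

```python
import numpy as np

# A short "base" signal with two spikes, and a box (pulse) kernel.
signal = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0])
box = np.ones(3) / 3.0  # averaging pulse of width 3

# mode="same" keeps the output the same length as the input signal.
smoothed = np.convolve(signal, box, mode="same")
print(smoothed)
```

Each spike is smeared over three neighbouring samples, which is the discrete counterpart of the box sliding across the spiky function in the animation.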

The illustration above shows convolution of continuous signals. In image processing, convolution of discrete signals, represented by matrices, yields a class of filters that allow us to manipulate images in several different ways. Convolution filters can be configured to sharpen, blur, or even detect edges in a given image. A greyscale image can be represented as a large matrix with one entry per pixel; a kernel (a much smaller matrix) can then be convolved with the image matrix, as illustrated below. Color images can be processed as separate channels (RGB), so a color image is represented by 3 (greyscale) matrices.

Take a look at the following figures and notice the dimension reduction due to the convolution process.
![convolution](./images/RiverTrain-ImageConvDiagram.png)
<img src="./images/Convolution_schematic.gif" width="400">

Example of a blurring effect
<img src="./images/gimp_blur.png" width="400">

Example of edge-detection
<img src="./images/gimp_edge.png" width="400">
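
To make the blur and edge-detection filters concrete, here is a small sketch of a 2D convolution without padding. The helper `convolve2d_valid`, the 5x5 test image, and the two 3x3 kernels are illustrative assumptions, not the filters used to produce the figures above.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding ("valid" mode)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# 5x5 image: dark left part, bright right part, i.e. one vertical edge.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Laplacian-style edge kernel and a box blur (both symmetric).
edge_kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
blur_kernel = np.ones((3, 3)) / 9.0

edges = convolve2d_valid(image, edge_kernel)
blurred = convolve2d_valid(image, blur_kernel)
print(edges.shape)  # (3, 3): a 3x3 kernel trims one pixel from each border
```

The edge kernel responds only around the column where the intensity changes, while the blur kernel turns the sharp step into a gradual ramp. Deep learning libraries usually skip the kernel flip (cross-correlation), which makes no difference for learned or symmetric kernels.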

### How do convolution networks work for image classification?

One major challenge for classical classification algorithms when applied to image datasets is the inherent high-dimensionality of the data. If each row of the dataset is a 300x300 image, this actually yields a flattened 90,000-dimensional dataset! 

CNNs, on the other hand, stack several convolution and pooling layers (e.g. max pooling, a moving window that extracts the highest pixel value) that perform both dimension reduction and feature selection. The weights of the convolution kernels (along with the bias terms that feed activation functions such as ReLU) are iteratively optimized during training. Optimization minimizes a "loss" function, where a low loss corresponds to a high classification accuracy. "Stochastic Gradient Descent" is a common algorithm for minimizing the loss function. In each iteration, back-propagation (mathematically, the chain rule) computes the gradient of the loss with respect to the weights, and the weights are then updated to reduce the classification error observed in that iteration.
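
The ReLU activation and max pooling steps mentioned above can be sketched in a few lines of NumPy. This is a simplified illustration: the 4x4 feature map is invented, and `max_pool2d` assumes non-overlapping windows whose size divides the input dimensions.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0.0)

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; assumes both dimensions divide by size."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[ 1., -2.,  3.,  0.],
                        [ 4.,  5., -6.,  2.],
                        [-1.,  0.,  7.,  8.],
                        [ 2., -3.,  1., -9.]])

pooled = max_pool2d(relu(feature_map), size=2)
print(pooled)
```

Pooling halves each dimension here while keeping the strongest response in every 2x2 window, which is where the dimension reduction comes from.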

<img src="./images/cnn.png" width="700">
<img src="./images/cnn.jpg" width="630">
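
The optimization loop itself (stochastic gradient descent driven by chain-rule gradients) is easier to see on a much smaller model than a CNN. The sketch below trains a single logistic unit on synthetic data; the dataset, learning rate, and number of epochs are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy, linearly separable data: label is 1 when the two features sum to > 0.
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float)

w = np.zeros(2)
b = 0.0
learning_rate = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(20):
    # "Stochastic": visit the examples one at a time in random order.
    for i in rng.permutation(len(X)):
        p = sigmoid(X[i] @ w + b)
        # Chain rule for the log loss: dL/dw = (p - y) * x, dL/db = (p - y)
        error = p - y[i]
        w -= learning_rate * error * X[i]
        b -= learning_rate * error

accuracy = float(np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1)))
print(accuracy)
```

A CNN follows the same loop, except that back-propagation applies the chain rule through every convolution, pooling, and output layer instead of a single unit.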

### How would they work for text classification!?

We can think of each word (token) of a sentence as a row in a matrix. Technically, each row is a numeric vector representation of the given token, usually obtained via a word embedding algorithm like Word2Vec or GloVe, or even a simple one-hot encoding that indexes each word with respect to the corpus of words/tokens in our dataset. So, for example, a 10-word sentence (with vector embedding in a 100-dimensional space) can be represented by a 10x100 matrix (an image!).
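
As a rough sketch of this representation, the snippet below turns a 10-word sentence into a 10x100 matrix by looking up each token in an embedding table. The sentence, the tiny vocabulary, and the random embedding values are placeholders; a real model would use trained or learned embeddings.

```python
import numpy as np

sentence = "the movie was surprisingly good and the cast was great"
tokens = sentence.split()  # 10 tokens

# Index every distinct word (a toy vocabulary built from this one sentence).
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}

embedding_dim = 100
rng = np.random.default_rng(0)
# One row per vocabulary word; random values stand in for trained vectors.
embedding_matrix = rng.uniform(-1.0, 1.0, size=(len(vocab), embedding_dim))

token_ids = np.array([vocab[t] for t in tokens])
sentence_matrix = embedding_matrix[token_ids]  # row lookup per token
print(sentence_matrix.shape)  # (10, 100)
```

Repeated words such as "the" map to identical rows, so the matrix encodes both word identity and word order.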

The convolution filters in this case span the full width of the matrix (100 in the example above), while their height varies with the use case; covering a span of 2-5 words is typical.
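
A filter of this kind can only slide vertically, over word positions. The following sketch applies one filter of height 3 to a random 10x100 sentence matrix; all values are random placeholders, and the final max is the "max-over-time" pooling used for text CNNs.

```python
import numpy as np

rng = np.random.default_rng(1)
sentence_matrix = rng.normal(size=(10, 100))   # 10 words x 100-dim embeddings

filter_height = 3                              # filter covers 3 consecutive words
text_filter = rng.normal(size=(filter_height, 100))  # spans the full width

# One response per window of 3 consecutive words: 10 - 3 + 1 = 8 positions.
n_positions = sentence_matrix.shape[0] - filter_height + 1
feature_map = np.array([
    np.sum(sentence_matrix[i:i + filter_height] * text_filter)
    for i in range(n_positions)
])

# Max-over-time pooling keeps the single strongest response.
pooled_feature = feature_map.max()
print(feature_map.shape, pooled_feature)
```

Each filter therefore contributes exactly one number per sentence, regardless of where in the sentence its preferred word pattern occurs.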

### But still, sentences are not very similar to images!

Well, that is in fact true: adjacent words in a sentence (per grammar and syntax rules) are not really similar to adjacent pixels in an image. Besides, certain intuitive image transformations do not yield meaningful results for language applications. So why should we expect CNNs to yield good results?

Shouldn't we use better suited architectures (e.g. RNNs, LSTMs, Sequence-to-sequence models) for NLP applications? That is also true, but from practical experience, CNNs still deliver good accuracy for simple language tasks, and the real gain here is that they are much easier and faster to train compared to other more advanced architectures. For more complicated language tasks, e.g. question answering, more specialized & elaborate architectures may be required, see [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) for more on the latest developments.

<img src="./images/cnn_text1.png" width="700">
<img src="./images/cnn_text2.png" width="700">

### 3. The Network

For this particular use-case the network used is based on the one introduced in the paper ["Convolutional Neural Networks for Sentence Classification"](https://arxiv.org/abs/1408.5882) (see above image) with minor changes, as follows: 

* The paper uses 2 channels for the input layer (see image above): a static channel (kept constant through training) and a dynamic channel that is fine-tuned via back-propagation. We'll use only one channel.

* We will not use a pre-built embedding model (e.g. Word2vec or GloVe); instead, an embedding layer will "learn" the embedding during the training of the model.

* We pad the sentences so that they all have the same length, 59 words/tokens per sentence for our dataset.

* Filter sizes are the number of words we want our convolutional filters to cover. For example, [3, 4, 5] means that we will have filters that slide over 3, 4 and 5 words respectively, for a total of 3 * num_filters filters.
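
The padding step and the filter count from the list above can be sketched as follows. The helper `pad_sentences` and the `<PAD/>` token are hypothetical names used for illustration (the actual preprocessing lives in `data_helpers`, which is not shown here); `num_filters = 128` matches the `NUM_FILTERS` value printed by the training run.

```python
def pad_sentences(tokenized, pad_token="<PAD/>", length=None):
    """Right-pad every token list to a common length."""
    if length is None:
        length = max(len(tokens) for tokens in tokenized)
    return [tokens + [pad_token] * (length - len(tokens)) for tokens in tokenized]

reviews = [["a", "masterpiece"], ["everything", "is", "off", "."]]
padded = pad_sentences(reviews)
print(padded)

filter_sizes = [3, 4, 5]
num_filters = 128  # filters per filter size
total_features = num_filters * len(filter_sizes)
print(total_features)  # 3 * 128 = 384 pooled features per sentence
```

Padding makes every sentence matrix the same shape, so a whole batch can be processed as one tensor.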

#### Regularization

Neural network models tend to overfit the training data, so a regularization mechanism helps in this regard. Standard regularization techniques (e.g. L1 (LASSO), L2 (ridge)) have been found not to be very effective for these models; see this [paper](https://arxiv.org/abs/1510.03820) for more detailed information. Instead, the "Dropout" technique is used. The idea behind dropout is to randomly disable a number of neurons in the network during training, so as to prevent too many neurons from co-adapting, and hence force the neurons to individually learn more relevant features.
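
A minimal NumPy sketch of the idea is shown below. It implements "inverted" dropout, where surviving activations are rescaled by `1 / keep_prob` so that their expected value is unchanged; the function name and the all-ones input are illustrative assumptions.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero units at random, rescale the survivors."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 8))

# Training: roughly half of the units are disabled when keep_prob = 0.5.
dropped = dropout(activations, keep_prob=0.5, rng=rng)
print(dropped)

# Evaluation: keep_prob = 1.0 leaves the activations untouched.
unchanged = dropout(activations, keep_prob=1.0, rng=rng)
```

This mirrors how the model below is used: a keep probability of 0.5 during training and 1.0 when evaluating.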

### 4. Training code

In [1]:
# Import the Python libraries/modules we'll need
import tensorflow as tf
import numpy as np

# class TextCNN(object):
#     """
#     A CNN for text classification.
#     Uses an embedding layer, followed by a convolutional, max-pooling and softmax layer.
#     """
#     def __init__(
#       self, sequence_length, num_classes, vocab_size,
#       embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0):

#         # Placeholders for input, output and dropout
#         self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
#         self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
#         self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

#         # Keeping track of l2 regularization loss (optional)
#         l2_loss = tf.constant(0.0)

#         # Embedding layer
#         with tf.device('/cpu:0'), tf.name_scope("embedding"):
#             self.W = tf.Variable(
#                 tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
#                 name="W")
#             self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
#             self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

#         # Create a convolution + maxpool layer for each filter size
#         pooled_outputs = []
#         for i, filter_size in enumerate(filter_sizes):
#             with tf.name_scope("conv-maxpool-%s" % filter_size):
#                 # Convolution Layer
#                 filter_shape = [filter_size, embedding_size, 1, num_filters]
#                 W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
#                 b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
#                 conv = tf.nn.conv2d(
#                     self.embedded_chars_expanded,
#                     W,
#                     strides=[1, 1, 1, 1],
#                     padding="VALID",
#                     name="conv")
#                 # Apply nonlinearity
#                 h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
#                 # Maxpooling over the outputs
#                 pooled = tf.nn.max_pool(
#                     h,
#                     ksize=[1, sequence_length - filter_size + 1, 1, 1],
#                     strides=[1, 1, 1, 1],
#                     padding='VALID',
#                     name="pool")
#                 pooled_outputs.append(pooled)

#         # Combine all the pooled features
#         num_filters_total = num_filters * len(filter_sizes)
#         self.h_pool = tf.concat(pooled_outputs, 3)
#         self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])

#         # Add dropout
#         with tf.name_scope("dropout"):
#             self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)

#         # Final (unnormalized) scores and predictions
#         with tf.name_scope("output"):
#             W = tf.get_variable(
#                 "W",
#                 shape=[num_filters_total, num_classes],
#                 initializer=tf.contrib.layers.xavier_initializer())
#             b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
#             l2_loss += tf.nn.l2_loss(W)
#             l2_loss += tf.nn.l2_loss(b)
#             self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
#             self.predictions = tf.argmax(self.scores, 1, name="predictions")

#         # Calculate mean cross-entropy loss
#         with tf.name_scope("loss"):
#             losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
#             self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss

#         # Accuracy
#         with tf.name_scope("accuracy"):
#             correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
#             self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
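
A quick way to check the layer arithmetic in the class above is to trace the tensor shapes by hand. The helper below is a hypothetical, TensorFlow-free sketch (`text_cnn_shapes` is not part of the original code). It assumes the padded sentence length of 59 mentioned earlier together with the batch size, embedding dimension, filter sizes, and filter count printed by the training run.

```python
def text_cnn_shapes(batch, sequence_length, embedding_size, filter_sizes, num_filters):
    """Return the tensor shapes produced at each stage of the TextCNN graph."""
    shapes = {"embedded_expanded": (batch, sequence_length, embedding_size, 1)}
    for size in filter_sizes:
        # "VALID" padding with stride 1 leaves sequence_length - size + 1 rows,
        # and a filter as wide as the embedding collapses the width to 1.
        conv_len = sequence_length - size + 1
        shapes["conv-%d" % size] = (batch, conv_len, 1, num_filters)
        # Max pooling over all conv_len positions leaves one value per filter.
        shapes["pool-%d" % size] = (batch, 1, 1, num_filters)
    total = num_filters * len(filter_sizes)
    shapes["h_pool"] = (batch, 1, 1, total)    # concatenated on the last axis
    shapes["h_pool_flat"] = (batch, total)     # input to dropout and output layer
    return shapes

shapes = text_cnn_shapes(64, 59, 128, [3, 4, 5], 128)
for name, shape in shapes.items():
    print(name, shape)
```

With these settings every sentence is reduced to a 384-dimensional feature vector before the final dropout and output layers.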


#### Training the model
To train the model on the processed dataset, we need to execute the `train.py` script with Python (version 3.x).

In [2]:
%run -i './train.py'


Parameters:
ALLOW_SOFT_PLACEMENT=True
BATCH_SIZE=64
CHECKPOINT_EVERY=100
DEV_SAMPLE_PERCENTAGE=0.1
DROPOUT_KEEP_PROB=0.5
EMBEDDING_DIM=128
EVALUATE_EVERY=100
FILTER_SIZES=3,4,5
L2_REG_LAMBDA=0.0
LOG_DEVICE_PLACEMENT=False
NEGATIVE_DATA_FILE=./data/rt-polaritydata/rt-polarity.neg
NUM_CHECKPOINTS=5
NUM_EPOCHS=200
NUM_FILTERS=128
POSITIVE_DATA_FILE=./data/rt-polaritydata/rt-polarity.pos

Loading data...
Vocabulary Size: 18758
Train/Dev split: 9596/1066
INFO:tensorflow:Summary name embedding/W:0/grad/hist is illegal; using embedding/W_0/grad/hist instead.
INFO:tensorflow:Summary name embedding/W:0/grad/sparsity is illegal; using embedding/W_0/grad/sparsity instead.
INFO:tensorflow:Summary name conv-maxpool-3/W:0/grad/hist is illegal; using conv-maxpool-3/W_0/grad/hist instead.
INFO:tensorflow:Summary name conv-maxpool-3/W:0/grad/sparsity is illegal; using conv-maxpool-3/W_0/grad/sparsity instead.
INFO:tensorflow:Summary name conv-maxpool-3/b:0/grad/hist is illegal; using conv-maxpool-3/b_

2018-06-26T11:08:11.077044: step 92, loss 1.64475, acc 0.484375
2018-06-26T11:08:11.232114: step 93, loss 1.05245, acc 0.59375
2018-06-26T11:08:11.402689: step 94, loss 1.75902, acc 0.515625
2018-06-26T11:08:11.561742: step 95, loss 2.27515, acc 0.421875
2018-06-26T11:08:11.737501: step 96, loss 1.15786, acc 0.65625
2018-06-26T11:08:11.910717: step 97, loss 1.60332, acc 0.421875
2018-06-26T11:08:12.106398: step 98, loss 1.29083, acc 0.5
2018-06-26T11:08:12.267148: step 99, loss 1.36707, acc 0.609375
2018-06-26T11:08:12.439704: step 100, loss 2.03223, acc 0.390625

Evaluation:
2018-06-26T11:08:13.074169: step 100, loss 0.909898, acc 0.547842

Saved model checkpoint to /Users/jostine.ho/Desktop/cnn4text/runs/1530029275/checkpoints/model-100

2018-06-26T11:08:13.437976: step 101, loss 1.17966, acc 0.5625
2018-06-26T11:08:13.609421: step 102, loss 1.54089, acc 0.453125
2018-06-26T11:08:13.781580: step 103, loss 1.21535, acc 0.609375
2018-06-26T11:08:13.931556: step 104, loss 1.23402, acc 0

2018-06-26T11:08:31.758234: step 216, loss 0.848402, acc 0.671875
2018-06-26T11:08:31.907153: step 217, loss 0.780934, acc 0.6875
2018-06-26T11:08:32.057489: step 218, loss 1.10443, acc 0.515625
2018-06-26T11:08:32.200416: step 219, loss 0.950732, acc 0.53125
2018-06-26T11:08:32.358843: step 220, loss 0.836175, acc 0.671875
2018-06-26T11:08:32.518988: step 221, loss 0.731532, acc 0.65625
2018-06-26T11:08:32.691636: step 222, loss 0.977435, acc 0.59375
2018-06-26T11:08:32.846942: step 223, loss 1.07349, acc 0.53125
2018-06-26T11:08:33.003904: step 224, loss 0.829297, acc 0.6875
2018-06-26T11:08:33.162594: step 225, loss 0.86034, acc 0.6875
2018-06-26T11:08:33.325182: step 226, loss 0.996438, acc 0.546875
2018-06-26T11:08:33.473645: step 227, loss 0.745788, acc 0.75
2018-06-26T11:08:33.624136: step 228, loss 0.979236, acc 0.5625
2018-06-26T11:08:33.769790: step 229, loss 1.0926, acc 0.4375
2018-06-26T11:08:33.974387: step 230, loss 1.02629, acc 0.59375
2018-06-26T11:08:34.164026: step 23

2018-06-26T11:08:52.463996: step 341, loss 0.47318, acc 0.734375
2018-06-26T11:08:52.629495: step 342, loss 0.719815, acc 0.6875
2018-06-26T11:08:52.778647: step 343, loss 0.817055, acc 0.515625
2018-06-26T11:08:52.925911: step 344, loss 0.64046, acc 0.6875
2018-06-26T11:08:53.076100: step 345, loss 0.504705, acc 0.734375
2018-06-26T11:08:53.219376: step 346, loss 0.794677, acc 0.671875
2018-06-26T11:08:53.369793: step 347, loss 0.779743, acc 0.59375
2018-06-26T11:08:53.510277: step 348, loss 0.632305, acc 0.703125
2018-06-26T11:08:53.667666: step 349, loss 0.592164, acc 0.703125
2018-06-26T11:08:53.834830: step 350, loss 0.669185, acc 0.71875
2018-06-26T11:08:53.994629: step 351, loss 0.476667, acc 0.78125
2018-06-26T11:08:54.142032: step 352, loss 0.640983, acc 0.671875
2018-06-26T11:08:54.286029: step 353, loss 0.560041, acc 0.71875
2018-06-26T11:08:54.429674: step 354, loss 0.627943, acc 0.625
2018-06-26T11:08:54.583990: step 355, loss 0.529512, acc 0.703125
2018-06-26T11:08:54.739

2018-06-26T11:09:12.133994: step 465, loss 0.657694, acc 0.6875
2018-06-26T11:09:12.275917: step 466, loss 0.573305, acc 0.703125
2018-06-26T11:09:12.428964: step 467, loss 0.737348, acc 0.59375
2018-06-26T11:09:12.571895: step 468, loss 0.60077, acc 0.671875
2018-06-26T11:09:12.719560: step 469, loss 0.572413, acc 0.71875
2018-06-26T11:09:12.875568: step 470, loss 0.576174, acc 0.734375
2018-06-26T11:09:13.030140: step 471, loss 0.543665, acc 0.703125
2018-06-26T11:09:13.228525: step 472, loss 0.555898, acc 0.703125
2018-06-26T11:09:13.460237: step 473, loss 0.594043, acc 0.6875
2018-06-26T11:09:13.627648: step 474, loss 0.551474, acc 0.671875
2018-06-26T11:09:13.787383: step 475, loss 0.540475, acc 0.75
2018-06-26T11:09:13.941680: step 476, loss 0.554333, acc 0.734375
2018-06-26T11:09:14.111014: step 477, loss 0.641955, acc 0.65625
2018-06-26T11:09:14.260982: step 478, loss 0.598753, acc 0.71875
2018-06-26T11:09:14.418292: step 479, loss 0.604655, acc 0.75
2018-06-26T11:09:14.582075:

2018-06-26T11:09:32.264159: step 589, loss 0.592051, acc 0.75
2018-06-26T11:09:32.413450: step 590, loss 0.677917, acc 0.625
2018-06-26T11:09:32.568944: step 591, loss 0.566906, acc 0.71875
2018-06-26T11:09:32.722441: step 592, loss 0.514294, acc 0.828125
2018-06-26T11:09:32.893632: step 593, loss 0.496534, acc 0.765625
2018-06-26T11:09:33.045890: step 594, loss 0.455058, acc 0.765625
2018-06-26T11:09:33.195310: step 595, loss 0.542938, acc 0.703125
2018-06-26T11:09:33.342873: step 596, loss 0.711814, acc 0.59375
2018-06-26T11:09:33.500281: step 597, loss 0.496486, acc 0.765625
2018-06-26T11:09:33.647205: step 598, loss 0.509187, acc 0.71875
2018-06-26T11:09:33.796197: step 599, loss 0.52936, acc 0.78125
2018-06-26T11:09:33.941311: step 600, loss 0.598296, acc 0.7

Evaluation:
2018-06-26T11:09:34.458230: step 600, loss 0.662734, acc 0.615385

Saved model checkpoint to /Users/jostine.ho/Desktop/cnn4text/runs/1530029275/checkpoints/model-600

2018-06-26T11:09:34.780022: step 601, loss 0.

2018-06-26T11:09:53.604388: step 711, loss 0.496462, acc 0.765625
2018-06-26T11:09:53.748139: step 712, loss 0.443644, acc 0.796875
2018-06-26T11:09:53.922622: step 713, loss 0.397637, acc 0.828125
2018-06-26T11:09:54.111181: step 714, loss 0.414376, acc 0.828125
2018-06-26T11:09:54.281013: step 715, loss 0.514129, acc 0.71875
2018-06-26T11:09:54.433946: step 716, loss 0.406706, acc 0.78125
2018-06-26T11:09:54.587665: step 717, loss 0.381912, acc 0.859375
2018-06-26T11:09:54.751128: step 718, loss 0.421749, acc 0.796875
2018-06-26T11:09:54.898635: step 719, loss 0.524465, acc 0.671875
2018-06-26T11:09:55.068804: step 720, loss 0.507901, acc 0.765625
2018-06-26T11:09:55.262112: step 721, loss 0.496809, acc 0.671875
2018-06-26T11:09:55.437939: step 722, loss 0.468804, acc 0.765625
2018-06-26T11:09:55.588843: step 723, loss 0.483107, acc 0.765625
2018-06-26T11:09:55.745254: step 724, loss 0.468673, acc 0.765625
2018-06-26T11:09:55.910984: step 725, loss 0.579229, acc 0.6875
2018-06-26T11:

KeyboardInterrupt: 
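
The numbers in the log can be cross-checked with a little arithmetic. `train.py` is not shown in this notebook, so the sketch below is an assumption about how the split is derived: it holds out `DEV_SAMPLE_PERCENTAGE` of the examples and feeds the rest in batches of `BATCH_SIZE`. The helper name `split_sizes` is hypothetical.

```python
import math

def split_sizes(n_examples, dev_fraction):
    """Sizes of a train/dev split that holds out `dev_fraction` of the data."""
    n_dev = int(dev_fraction * n_examples)
    return n_examples - n_dev, n_dev

# Total number of reviews, taken from the logged "Train/Dev split: 9596/1066".
n_total = 9596 + 1066
n_train, n_dev = split_sizes(n_total, 0.1)
print(n_train, n_dev)

# Number of training steps needed to see every training example once.
steps_per_epoch = math.ceil(n_train / 64)
print(steps_per_epoch)
```

Under these assumptions one epoch takes about 150 steps, so the run above was interrupted during its fifth epoch, far short of the configured `NUM_EPOCHS=200`.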

### 5. Evaluation code


In [None]:
#! /usr/bin/env python

import tensorflow as tf
import numpy as np
import os
import time
import datetime
import data_helpers
from text_cnn import TextCNN
from tensorflow.contrib import learn
import csv

# Parameters
# ==================================================

# Data Parameters
tf.flags.DEFINE_string("positive_data_file", "./data/rt-polaritydata/rt-polarity.pos", "Data source for the positive data.")
tf.flags.DEFINE_string("negative_data_file", "./data/rt-polaritydata/rt-polarity.neg", "Data source for the negative data.")

# Eval Parameters
tf.flags.DEFINE_integer("batch_size", 64, "Batch Size (default: 64)")
tf.flags.DEFINE_string("checkpoint_dir", "", "Checkpoint directory from training run")
tf.flags.DEFINE_boolean("eval_train", False, "Evaluate on all training data")

# Misc Parameters
tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement")
tf.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices")


FLAGS = tf.flags.FLAGS
FLAGS._parse_flags()
print("\nParameters:")
for attr, value in sorted(FLAGS.__flags.items()):
    print("{}={}".format(attr.upper(), value))
print("")

# CHANGE THIS: Load data. Load your own data here
if FLAGS.eval_train:
    x_raw, y_test = data_helpers.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)
    y_test = np.argmax(y_test, axis=1)
else:
    x_raw = ["a masterpiece four years in the making", "everything is off."]
    y_test = [1, 0]

# # Map data into vocabulary
# vocab_path = os.path.join(FLAGS.checkpoint_dir, "..", "vocab")
# vocab_processor = learn.preprocessing.VocabularyProcessor.restore(vocab_path)
# x_test = np.array(list(vocab_processor.transform(x_raw)))

# print("\nEvaluating...\n")

# # Evaluation
# # ==================================================
# checkpoint_file = tf.train.latest_checkpoint(FLAGS.checkpoint_dir)
# graph = tf.Graph()
# with graph.as_default():
#     session_conf = tf.ConfigProto(
#       allow_soft_placement=FLAGS.allow_soft_placement,
#       log_device_placement=FLAGS.log_device_placement)
#     sess = tf.Session(config=session_conf)
#     with sess.as_default():
#         # Load the saved meta graph and restore variables
#         saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
#         saver.restore(sess, checkpoint_file)

#         # Get the placeholders from the graph by name
#         input_x = graph.get_operation_by_name("input_x").outputs[0]
#         # input_y = graph.get_operation_by_name("input_y").outputs[0]
#         dropout_keep_prob = graph.get_operation_by_name("dropout_keep_prob").outputs[0]

#         # Tensors we want to evaluate
#         predictions = graph.get_operation_by_name("output/predictions").outputs[0]

#         # Generate batches for one epoch
#         batches = data_helpers.batch_iter(list(x_test), FLAGS.batch_size, 1, shuffle=False)

#         # Collect the predictions here
#         all_predictions = []

#         for x_test_batch in batches:
#             batch_predictions = sess.run(predictions, {input_x: x_test_batch, dropout_keep_prob: 1.0})
#             all_predictions = np.concatenate([all_predictions, batch_predictions])

# # Print accuracy if y_test is defined
# if y_test is not None:
#     correct_predictions = float(sum(all_predictions == y_test))
#     print("Total number of test examples: {}".format(len(y_test)))
#     print("Accuracy: {:g}".format(correct_predictions/float(len(y_test))))

# # Save the evaluation to a csv
# predictions_human_readable = np.column_stack((np.array(x_raw), all_predictions))
# out_path = os.path.join(FLAGS.checkpoint_dir, "..", "prediction.csv")
# print("Saving evaluation to {0}".format(out_path))
# with open(out_path, 'w') as f:
#     csv.writer(f).writerows(predictions_human_readable)

In [13]:
%pwd 

'/Users/jostine.ho/Desktop/cnn4text'

In [3]:
# Function to evaluate movie reviews
import tensorflow as tf
import numpy as np
import os
import time
import datetime
import data_helpers
from text_cnn import TextCNN
from tensorflow.contrib import learn
# Make sure to change this to your local path
path = "/Users/jostine.ho/Desktop/cnn4text/runs/1530029275"

def evaluate(x_raw, checkpoint_dir=path + "/checkpoints"):
    batch_size = 64
    vocab_path = os.path.join(path, "vocab")
    vocab_processor = learn.preprocessing.VocabularyProcessor.restore(vocab_path)
    x_test = np.array(list(vocab_processor.transform(x_raw)))

    print("\nEvaluating...\n")
    print(checkpoint_dir)
    checkpoint_file = tf.train.latest_checkpoint(checkpoint_dir)
    
    graph = tf.Graph()
    with graph.as_default():
        session_conf = tf.ConfigProto(
          allow_soft_placement=True,
          log_device_placement=False)
        sess = tf.Session(config=session_conf)
        with sess.as_default():
            # Load the saved meta graph and restore variables
            saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
            saver.restore(sess, checkpoint_file)

            # Get the placeholders from the graph by name
            input_x = graph.get_operation_by_name("input_x").outputs[0]
            # input_y = graph.get_operation_by_name("input_y").outputs[0]
            dropout_keep_prob = graph.get_operation_by_name("dropout_keep_prob").outputs[0]

            # Tensors we want to evaluate
            predictions = graph.get_operation_by_name("output/predictions").outputs[0]

            # Generate batches for one epoch
            batches = data_helpers.batch_iter(list(x_test), batch_size, 1, shuffle=False)

            # Collect the predictions here
            all_predictions = []

            for x_test_batch in batches:
                batch_predictions = sess.run(predictions, {input_x: x_test_batch, dropout_keep_prob: 1.0})
                all_predictions = np.concatenate([all_predictions, batch_predictions])
    print(all_predictions)

In [4]:
evaluate(["Movie is awesome", "Not too bad", "Well, could have been much better", "It stinks!"])


Evaluating...

/Users/jostine.ho/Desktop/cnn4text/runs/1530029275/checkpoints
[1. 0. 0. 0.]


In [10]:
evaluate(["Movie is terrible", "I've seen better", "This movie is sick", "It stinks!"])


Evaluating...

INFO:tensorflow:Restoring parameters from /Users/mohamed.badawy/Documents/Projects/CNN4text_github/code/runs/1506109574/checkpoints/model-4800
[ 0.  0.  1.  0.]


In [7]:
# Reviews for "IT (2017)"
evaluate(["Doesn't cut very deep and isn't very scary", 
          "Creepy and yet emotionally resonant, It is a disturbing reboot of a scary '80s film character",
          "A spooky horror film and a stellar coming of age drama",
          "It only seems like fun if you fall for King's obvious, contemptuous treatment of American innocence",
          "I didn't get it, mediocre at best"])


Evaluating...

INFO:tensorflow:Restoring parameters from /Users/mohamed.badawy/Documents/Projects/CNN4text_github/code/runs/1506109574/checkpoints/model-4800
[ 1.  1.  1.  0.  0.]
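
The raw output is easier to read once the class ids are mapped back to review labels. The helper below is a small hypothetical addition; it assumes that 1 means "Fresh" and 0 means "Rotten", which is consistent with the `y_test = [1, 0]` example in the evaluation script above.

```python
def to_labels(predictions):
    """Map numeric class ids to review labels (assumes 1 = Fresh, 0 = Rotten)."""
    names = {0: "Rotten", 1: "Fresh"}
    return [names[int(p)] for p in predictions]

# Predictions printed for the five "IT (2017)" reviews above.
labels = to_labels([1.0, 1.0, 1.0, 0.0, 0.0])
print(labels)
```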


#### References

1. Code source, tutorial, and more details: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
2. Code repository: https://github.com/dennybritz/cnn-text-classification-tf
3. [Convolutional Neural Networks for Sentence Classification (arXiv)](https://arxiv.org/abs/1408.5882)
4. Background on CNNs for NLP: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/


#### Image Sources:
Source: http://colah.github.io/posts/2014-07-Understanding-Convolutions/
