# Introduction to Natural Language Processing (NLP) in TensorFlow

### Word Embeddings

Word embeddings, or word vectors, provide a way of mapping words from a vocabulary into a low-dimensional space, where words with similar meanings are close together. Let's play around with a set of pretrained word vectors to get used to their properties. There exist many sets of pretrained word embeddings; here, we use ConceptNet Numberbatch, which provides a relatively small download in an easy-to-work-with format (HDF5, with the `.h5` extension).

In [1]:
# Download word vectors
from urllib.request import urlretrieve
import os
if not os.path.isfile('mini.h5'):
    print("Downloading Conceptnet Numberbatch word embeddings...")
    conceptnet_url = 'http://conceptnet.s3.amazonaws.com/precomputed-data/2016/numberbatch/17.06/mini.h5'
    urlretrieve(conceptnet_url, 'mini.h5')

To read an `h5` file, we'll need to use the `h5py` package. Below, we use the package to open the `mini.h5` file we just downloaded. We extract from the file a list of utf-8-encoded words, as well as their $300$-dimensional vectors.

In [2]:
# Load the file and pull out words and embeddings
import h5py
with h5py.File('mini.h5', 'r') as f:
    all_words = [word.decode('utf-8') for word in f['mat']['axis1'][:]]
    all_embeddings = f['mat']['block0_values'][:]
    
print("all_words dimensions: {0}".format(len(all_words)))
print("all_embeddings dimensions: {0}".format(all_embeddings.shape))

print(all_words[10000])

all_words dimensions: 362891
all_embeddings dimensions: (362891, 300)
/c/de/lande


Now, `all_words` is a list of $V$ strings (what we call our *vocabulary*), and `all_embeddings` is a $V \times 300$ matrix. The strings are of the form `/c/language_code/word`—for example, `/c/en/cat` and `/c/es/gato`.

We are interested only in the English words. We use Python list comprehensions to pull out the indices of the English words, then extract just the English words (stripping the six-character `/c/en/` prefix) and their embeddings.

In [3]:
# Restrict our vocabulary to just the English words
english_words = [word[6:] for word in all_words if word.startswith('/c/en/')]
english_word_indices = [i for i, word in enumerate(all_words) if word.startswith('/c/en/')]
english_embeddings = all_embeddings[english_word_indices]

print("english_words dimensions: {0}".format(len(english_words)))
print("english_embeddings dimensions: {0}".format(english_embeddings.shape))

print(english_words[10000])

english_words dimensions: 150875
english_embeddings dimensions: (150875, 300)
bajillion


We want to look up words easily, so we create a dictionary that maps us from a word to its index in the word embeddings matrix.

In [4]:
index = {word: i for i, word in enumerate(english_words)}

Each word has an associated 300-dimensional embedding, which we can look up by indexing the embedding matrix with the word's index from that dictionary.

In [5]:
our_word = 'cat'
our_word_index = index[our_word]
our_embedding = english_embeddings[our_word_index]
print(our_word, 'has word index', our_word_index)
print(our_word, 'has word embedding', our_embedding)

cat has word index 21398
cat has word embedding [  0   0   2  -5   1   0  -1  -3   3  -2   2   1   4   0  -3   0  -1  -4
   0   1   0  -2   0  -2  -2   3   1  -2  -2   3  -2   1   0   0  -3   1
  -4   6  -1   2  -4   4  -3   0  -2  -3  -1  -5   5  -6   2   2  -3   0
  -1   4  -1   2   2   1   3   1   3  -1  -8  -3   2   0   1   4  -1  -5
   2   0  -5   1   0  -7  -4   2   3   4   3   3  -4   8   0   4  -2   0
   3   1  -4   0   3   0  -2   0  -1  -8   0   0   2   0   2   4   7   3
  -2   3  -3   3   2  -3  -5   0  -2   2   0  -3  10  -2   0  -3  -2  -5
   4   0   0   3   2  -1  -1   5  -4  -2   0  -2  -8   0  -4   4   9  -6
   5   4  -7  -1  -2  -3   2   4   0  -1  -6  -2  -4  -4  -2   0   4  -5
   0   2   0   4   1   3   7  -3   4   0   6   0 -12  -4   0  -2  -6   1
   6  -5   4   0   2  -6   2  -2   2   3  10   0   0   4   4  -2   0  -2
   0   3   1   0  -9   3   3  -6   0   0   0  -8   1   5   2  -2   0   1
  -3   4  -5  -5   3   1   4  -4   0   2  -2  -1  -4  -1   6  -2   3   3
  -

The magnitude of a word vector is less important than its direction; the magnitude can be thought of as representing frequency of use, independent of the semantics of the word.
Here, we will be interested in semantics, so we *normalize* our vectors, dividing each by its length.
The result is that all of our word vectors have length 1 and, as such, lie on the unit hypersphere (the 300-dimensional analogue of the unit circle).
For unit-length vectors, the dot product equals the cosine of the angle between them, and provides a measure of similarity (the bigger the cosine, the smaller the angle).

<img src="Figures/cosine_similarity.png" alt="cosine" style="width: 500px;"/>
<center>Figure adapted from *[Mastering Machine Learning with Spark 2.x](https://www.safaribooksonline.com/library/view/mastering-machine-learning/9781785283451/ba8bef27-953e-42a4-8180-cea152af8118.xhtml)*</center>
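As a quick sanity check of the claim above, here is a minimal sketch using made-up 2-D vectors (`u`, `v`, `w` and the helper `normalize` are illustrative names, not part of the Numberbatch data). After normalization, vectors pointing the same way have a dot product of 1, and perpendicular vectors have a dot product of 0, regardless of their original lengths.

```python
import numpy as np

# Toy 2-D vectors, invented for illustration (not real word embeddings)
u = np.array([3.0, 4.0])
v = np.array([6.0, 8.0])    # same direction as u, twice the length
w = np.array([-4.0, 3.0])   # perpendicular to u

def normalize(x):
    # Divide a vector by its Euclidean length so that it has length 1
    return x / np.linalg.norm(x)

# For unit vectors, the dot product is the cosine of the angle between them
cos_uv = float(np.dot(normalize(u), normalize(v)))
cos_uw = float(np.dot(normalize(u), normalize(w)))

print("cos(u, v):", cos_uv)  # close to 1: same direction
print("cos(u, w):", cos_uw)  # close to 0: perpendicular
```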

In [6]:
import numpy as np

norms = np.linalg.norm(english_embeddings, axis=1)
normalized_embeddings = english_embeddings.astype('float32') / norms.astype('float32').reshape([-1, 1])

Now we are ready to measure the similarity between pairs of words. We use numpy to take dot products.

In [7]:
def similarity_score(w1, w2):
    score = np.dot(normalized_embeddings[index[w1], :], normalized_embeddings[index[w2], :])
    return score

In [8]:
# A word is as similar to itself as possible:
print('cat\tcat\t', similarity_score('cat', 'cat'))

# Closely related words still get high scores:
print('cat\tfeline\t', similarity_score('cat', 'feline'))
print('cat\tdog\t', similarity_score('cat', 'dog'))

# Unrelated words, not so much
print('cat\tmoo\t', similarity_score('cat', 'moo'))
print('cat\tfreeze\t', similarity_score('cat', 'freeze'))

# Antonyms are still considered related, sometimes more so than synonyms
print('antonyms\topposites\t', similarity_score('antonym', 'opposite'))
print('antonyms\tsynonyms\t', similarity_score('antonym', 'synonym'))

cat	cat	 1.0
cat	feline	 0.8199548
cat	dog	 0.590724
cat	moo	 0.0039538248
cat	freeze	 -0.030225184
antonyms	opposites	 0.3941065
antonyms	synonyms	 0.46883982


We can also find, for instance, the most similar words to a given word.

In [9]:
def closest_to_vector(v, n):
    all_scores = np.dot(normalized_embeddings, v)
    best_words = map(lambda i: english_words[i], reversed(np.argsort(all_scores)))
    return [next(best_words) for _ in range(n)]

def most_similar(w, n):
    return closest_to_vector(normalized_embeddings[index[w], :], n)

In [10]:
print(most_similar('cat', 10))
print(most_similar('dog', 10))
print(most_similar('duke', 10))

['cat', 'humane_society', 'kitten', 'feline', 'colocolo', 'cats', 'kitty', 'maine_coon', 'housecat', 'sharp_teeth']
['dog', 'dogs', 'wire_haired_dachshund', 'doggy_paddle', 'lhasa_apso', 'good_friend', 'puppy_dog', 'bichon_frise', 'woof_woof', 'golden_retrievers']
['duke', 'dukes', 'duchess', 'duchesses', 'ducal', 'dukedom', 'duchy', 'voivode', 'princes', 'prince']


**Can you explain the following similarity scores?**

In [11]:
similarity_score("sit", "sits")

0.8478777

In [12]:
similarity_score("want", "wants")

0.858501

In [13]:
similarity_score("sleep", "sleeps")

0.8664926

In [14]:
similarity_score("leave", "leaves")

0.42647985

**Can you find a polysemous word — a word with multiple meanings — so that the list of the top 10 most related words contains words that aren't themselves related to one another?**

In [15]:
print(most_similar('leave', 10))

['leave', 'leaving', 'come_away', 'depart', 'beleave', 'go_forth', 'quit', 'go_away', 'vacate', 'departing']


In [16]:
print(most_similar('leaves', 10))

['leaves', 'leaf', 'foliage', 'banana_leaf', 'leaflike', 'leaved', 'betel_leaf', 'thai_basil', 'oak_trees', 'aspidistra']


In [17]:
print(most_similar('solution', 10))

['solution', 'solutions', 'aqueous_solution', 'subproblem', 'solve', 'aqueous_phase', 'virtuous_circle', 'solves', 'resolvent', 'solving']


We can also use `closest_to_vector` to find words "nearby" vectors that we create ourselves. This allows us to solve analogies. For example, in order to solve the analogy "man : brother :: woman : ?", we can compute a new vector `brother - man + woman`: the meaning of brother, minus the meaning of man, plus the meaning of woman. We can then ask which words are closest, in the embedding space, to that new vector.

In [18]:
def solve_analogy(a1, b1, a2):
    b2 = normalized_embeddings[index[b1], :] - normalized_embeddings[index[a1], :] + normalized_embeddings[index[a2], :]
    return closest_to_vector(b2, 5)
def print_analogy(a1, b1, a2):
    closest_words = solve_analogy(a1, b1, a2)
    print("{0}:{1} as {2}:?".format(a1, b1, a2))
    print("Best guesses are: {}".format(closest_words))

In [19]:
print_analogy("man", "brother", "woman")
print_analogy("man", "husband", "woman")
print_analogy("spain", "madrid", "france")
print_analogy("dog", "golden_retriever", "cat")

man:brother as woman:?
Best guesses are: ['sister', 'brother', 'sisters', 'kid_sister', 'younger_brother']
man:husband as woman:?
Best guesses are: ['wife', 'husband', 'husbands', 'spouse', 'wifes']
spain:madrid as france:?
Best guesses are: ['paris', 'france', 'le_havre', 'in_france', 'montmartre']
dog:golden_retriever as cat:?
Best guesses are: ['cat', 'maine_coon', 'kitten', 'tabby', 'kitty']


**Note that the new vector, `b2`, is not normalized. Does this matter?  Why or why not?**

The first three results are quite good, but in general, the results of these analogies can be disappointing (in the last example above, the top guess is simply the input word `cat`). Try experimenting with other analogies, and see if you can think of ways to get around the problems you notice (e.g., modifications to the `solve_analogy` algorithm).
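One possible modification, sketched here as an assumption rather than a recommended fix, is to drop the three input words from the candidate list before taking the top results. The example below runs on a made-up five-word vocabulary with hand-picked unit vectors (`toy_words`, `toy_vectors`, and `solve_analogy_toy` are hypothetical names), so it does not need the downloaded embeddings.

```python
import numpy as np

# Hypothetical toy vocabulary with hand-picked unit-length 3-D vectors
toy_words = ['dog', 'cat', 'golden_retriever', 'maine_coon', 'puppy']
toy_vectors = np.array([
    [1.00, 0.00, 0.00],   # dog
    [0.00, 1.00, 0.00],   # cat
    [0.96, 0.00, 0.28],   # golden_retriever
    [0.00, 0.60, 0.80],   # maine_coon
    [0.96, 0.28, 0.00],   # puppy
])
toy_index = {w: i for i, w in enumerate(toy_words)}

def solve_analogy_toy(a1, b1, a2, n, exclude_inputs):
    # Same arithmetic as solve_analogy: b1 - a1 + a2
    target = (toy_vectors[toy_index[b1]]
              - toy_vectors[toy_index[a1]]
              + toy_vectors[toy_index[a2]])
    scores = toy_vectors @ target
    # Rank all words from highest to lowest score
    ranked = [toy_words[i] for i in np.argsort(scores)[::-1]]
    if exclude_inputs:
        # Remove the words that were part of the question itself
        ranked = [w for w in ranked if w not in (a1, b1, a2)]
    return ranked[:n]

naive = solve_analogy_toy('dog', 'golden_retriever', 'cat', 2, exclude_inputs=False)
filtered = solve_analogy_toy('dog', 'golden_retriever', 'cat', 2, exclude_inputs=True)
print("Without filtering:", naive)
print("With filtering:   ", filtered)
```

In this toy setup the unfiltered top guess is the input word `cat`, mirroring the last example above; filtering only removes that one failure mode and does not make the underlying vector arithmetic any more accurate.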

### Using word embeddings in deep models
Word embeddings are fun to play around with, but their primary use is that they allow us to think of words as existing in a continuous, Euclidean space; we can then use an existing arsenal of techniques for machine learning with continuous numerical data (like logistic regression or neural networks) to process text.

Let's take a look at an especially simple version of this. We'll perform *sentiment analysis* on a set of movie reviews: in particular, we will attempt to classify a movie review as positive or negative based on its text.

We will use a simplified version of the [Simple Word Embedding Model](http://people.ee.duke.edu/~lcarin/acl2018_swem.pdf) (SWEM, Shen et al. 2018) to do so. We will represent a review as the *mean* of the embeddings of the words in the review (the full SWEM would also update the word embeddings during training). Then we'll train a three-layer MLP (a neural network) to classify the review as positive or negative.

Download the `movie-simple.txt` file from Google Classroom and place it in the `Resources` directory (the code below reads `Resources/movie-simple.txt`). Each line of that file contains

1. the numeral 0 (for negative) or the numeral 1 (for positive), followed by
2. a tab (the whitespace character), and then
3. the review itself.
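To make the format concrete, here is a small sketch that parses one line of this shape. The line itself (`sample_line`) is invented for illustration and is not taken from the data file. It also checks that slicing with `line[0]` and `line[2:]`, as the notebook code does, gives the same result as splitting on the tab when the label is a single digit.

```python
# A hypothetical line in the described format: label, tab, review text
sample_line = "1\tI loved this movie!\n"

# Split on the first tab only, so tabs inside the review would be preserved
label_text, review = sample_line.rstrip("\n").split("\t", 1)
label = int(label_text)

print(label)   # the 0/1 sentiment label
print(review)  # the review text

# Equivalent slicing for a one-character label followed by one tab
label_by_slice = int(sample_line[0])
review_by_slice = sample_line[2:].rstrip("\n")
```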

In [20]:
import string
remove_punct=str.maketrans('','',string.punctuation)

# This function converts a line of our data file into
# a dict {'x': x, 'y': y}, where x is a 300-dimensional representation
# of the words in a review, and y is its label.
def convert_line_to_example(line):
    # Pull out the first character: that's our label (0 or 1)
    y = int(line[0])
    
    # Split the line into words using Python's split() function
    words = line[2:].translate(remove_punct).lower().split()
    
    # Look up the embeddings of each word, ignoring words not
    # in our pretrained vocabulary.
    embeddings = [normalized_embeddings[index[w]] for w in words
                  if w in index]
    
    # Take the mean of the embeddings
    x = np.mean(np.vstack(embeddings), axis=0)
    return {'x': x, 'y': y}

# Apply the function to each line in the file.
with open("Resources/movie-simple.txt", "r", encoding='utf-8', errors='ignore') as f:
    dataset = [convert_line_to_example(l) for l in f.readlines()]

In [21]:
len(dataset)

1411

Now that we have a dataset, let's shuffle it and do a train/test split. We use a quarter of the dataset for testing, 3/4 for training (but also ensure that we have a whole number of batches in our training set, to make the code nicer later).
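The integer division in this split is easy to misjudge, so here is the arithmetic traced for the 1411 examples reported above and a batch size of 100 (the variable names `n_examples`, `n_train`, and `n_test` are introduced only for this sketch). Because the training set is rounded down to a whole number of batches, the test set ends up somewhat larger than a quarter of the data.

```python
n_examples = 1411   # size of the dataset loaded above
batch_size = 100

total_batches = n_examples // batch_size   # 1411 // 100 = 14
train_batches = 3 * total_batches // 4     # (3 * 14) // 4 = 42 // 4 = 10

n_train = train_batches * batch_size       # 10 * 100 = 1000
n_test = n_examples - n_train              # 1411 - 1000 = 411

print("train:", n_train, "test:", n_test)
print("test fraction: {:.3f}".format(n_test / n_examples))
```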

In [22]:
import random
random.shuffle(dataset)

batch_size = 100
total_batches = len(dataset) // batch_size
train_batches = 3*total_batches // 4 
train, test = dataset[:train_batches*batch_size], dataset[train_batches*batch_size:]

Time to build our MLP in TensorFlow. The code in this notebook uses the TensorFlow 1.x graph-and-session API, so we'll use placeholders for `X` and `y` as usual.

In [23]:
import tensorflow as tf

# Placeholders for input
X = tf.placeholder(tf.float32, [None, 300])
y = tf.placeholder(tf.float32, [None, 1])

# Three-layer MLP
h1 = tf.keras.layers.Dense(100, activation='relu')(X)
h2 = tf.keras.layers.Dense(20, activation='relu')(h1)
logits = tf.keras.layers.Dense(1)(h2)
probabilities = tf.sigmoid(logits)

# Loss and metrics
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=y))
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(probabilities), y), tf.float32))

# Training
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

# Initialization of variables
init_op = tf.global_variables_initializer()

W0707 14:08:54.968827 4324869568 deprecation.py:506] From /anaconda3/envs/tf1/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0707 14:08:55.023981 4324869568 deprecation.py:323] From /anaconda3/envs/tf1/lib/python3.7/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
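To clarify what the loss and accuracy nodes compute, here is a NumPy-only sketch on three made-up logits. It uses the numerically stable expression documented for `tf.nn.sigmoid_cross_entropy_with_logits`, namely `max(x, 0) - x*z + log(1 + exp(-|x|))`, and compares it with the textbook cross-entropy formula. The helper names and toy values are assumptions for illustration, not part of the notebook's graph.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_cross_entropy(logits, labels):
    # Stable form: max(x, 0) - x*z + log(1 + exp(-|x|))
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# Toy logits and labels, shaped [batch, 1] like the placeholders above
toy_logits = np.array([[2.0], [-1.0], [-0.5]])
toy_labels = np.array([[1.0], [0.0], [1.0]])

# Mean loss over the batch, as tf.reduce_mean does
toy_loss = sigmoid_cross_entropy(toy_logits, toy_labels).mean()

# Textbook formula, for comparison: -[z log p + (1 - z) log(1 - p)]
p = sigmoid(toy_logits)
naive_loss = -(toy_labels * np.log(p) + (1 - toy_labels) * np.log(1 - p)).mean()

# Accuracy: round the probability to 0 or 1 and compare with the label
toy_accuracy = np.mean(np.round(p) == toy_labels)

print("loss:", toy_loss, "naive loss:", naive_loss, "accuracy:", toy_accuracy)
```

Here the third example has a positive label but a negative logit, so it is misclassified and the accuracy is 2/3.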


We can now begin a session and train our model. We'll train for 250 epochs. When we're finished, we'll evaluate our accuracy on all the test data.

In [24]:
# Train
sess = tf.Session()
sess.run(init_op)

for epoch in range(250):
    for batch in range(train_batches):
        data = train[batch*batch_size:(batch+1)*batch_size]
        reviews = [sample['x'] for sample in data]
        labels  = [sample['y'] for sample in data]
        labels = np.array(labels).reshape([-1,1])
        
        _, l, acc = sess.run([train_step, loss, accuracy], feed_dict={X: reviews, y: labels})
        
    if epoch % 10 == 0:
        print("Epoch: {0} \t Loss: {1} \t Acc: {2}".format(epoch, l, acc))
    
    random.shuffle(train)
        
# Evaluate on test set
test_reviews = [sample['x'] for sample in test]
test_labels  = [sample['y'] for sample in test]
test_labels  = np.array(test_labels).reshape([-1, 1])

acc = sess.run(accuracy, feed_dict={X: test_reviews, y: test_labels})
print("Final accuracy: {0}".format(acc))

Epoch: 0 	 Loss: 0.694661021232605 	 Acc: 0.46000000834465027
Epoch: 10 	 Loss: 0.6678881645202637 	 Acc: 0.6000000238418579
Epoch: 20 	 Loss: 0.667625904083252 	 Acc: 0.5899999737739563
Epoch: 30 	 Loss: 0.6497185230255127 	 Acc: 0.6000000238418579
Epoch: 40 	 Loss: 0.6203657388687134 	 Acc: 0.699999988079071
Epoch: 50 	 Loss: 0.6083799600601196 	 Acc: 0.6700000166893005
Epoch: 60 	 Loss: 0.5409393906593323 	 Acc: 0.800000011920929
Epoch: 70 	 Loss: 0.4703376889228821 	 Acc: 0.8799999952316284
Epoch: 80 	 Loss: 0.4446973502635956 	 Acc: 0.8899999856948853
Epoch: 90 	 Loss: 0.3350175619125366 	 Acc: 0.9300000071525574
Epoch: 100 	 Loss: 0.2757123112678528 	 Acc: 0.9100000262260437
Epoch: 110 	 Loss: 0.2119666039943695 	 Acc: 0.9599999785423279
Epoch: 120 	 Loss: 0.22098800539970398 	 Acc: 0.9200000166893005
Epoch: 130 	 Loss: 0.20852069556713104 	 Acc: 0.9200000166893005
Epoch: 140 	 Loss: 0.18307356536388397 	 Acc: 0.9399999976158142
Epoch: 150 	 Loss: 0.13907337188720703 	 Acc: 0.980

We can now examine what our model has learned, seeing how it responds to word vectors for different words:

In [25]:
# Check some words
words_to_test = ["exciting", "hated", "boring", "loved", "extremely", "rather", "quite"]

for word in words_to_test:
    print(word, sess.run(probabilities, feed_dict={X: normalized_embeddings[index[word]].reshape(1, 300)}))

exciting [[0.99996054]]
hated [[0.]]
boring [[9.049776e-07]]
loved [[0.99999857]]
extremely [[0.5834872]]
rather [[0.04839796]]
quite [[0.98400116]]


Try some words of your own!

In [26]:
sess.close()

This model works great for such a simple dataset, but does a little less well on something more complex. `movie-pang02.txt`, for instance, has 2000 longer, more complex movie reviews. It's in the same format as our simple dataset. On those longer reviews, this model achieves only 60-80% accuracy. (Increasing the number of epochs to, say, 1000, does help.)

To show this in practice, here is a condensed version of the code above, applied to this second dataset.

In [27]:
# Apply the function to each line in the file.
with open("Resources/movie-pang02.txt", "r",encoding='utf-8') as f:
    dataset = [convert_line_to_example(l) for l in f.readlines()]
import random
random.shuffle(dataset)
batch_size = 100
total_batches = len(dataset) // batch_size
train_batches = 3 * total_batches // 4
train, test = dataset[:train_batches*batch_size], dataset[train_batches*batch_size:]
sess = tf.Session()
initialize_all = tf.global_variables_initializer() 
sess.run(initialize_all)
for epoch in range(250):
    for batch in range(train_batches):
        data = train[batch*batch_size:(batch+1)*batch_size]
        reviews = [sample['x'] for sample in data]
        labels  = [sample['y'] for sample in data]
        labels = np.array(labels).reshape([-1, 1])
        _, l, acc = sess.run([train_step, loss, accuracy], feed_dict={X: reviews, y: labels})
    if epoch % 10 == 0:
        print("Epoch", epoch, "Loss", l, "Acc", acc)
    random.shuffle(train)

# Evaluate on test set
test_reviews = [sample['x'] for sample in test]
test_labels  = [sample['y'] for sample in test]
test_labels = np.array(test_labels).reshape([-1, 1])
acc = sess.run(accuracy, feed_dict={X: test_reviews, y: test_labels})
print("Final accuracy:", acc)
sess.close()

Epoch 0 Loss 0.69329107 Acc 0.5
Epoch 10 Loss 0.69512314 Acc 0.43
Epoch 20 Loss 0.6928453 Acc 0.5
Epoch 30 Loss 0.6915968 Acc 0.64
Epoch 40 Loss 0.6911556 Acc 0.58
Epoch 50 Loss 0.69427454 Acc 0.45
Epoch 60 Loss 0.6915856 Acc 0.54
Epoch 70 Loss 0.6909847 Acc 0.56
Epoch 80 Loss 0.6919686 Acc 0.5
Epoch 90 Loss 0.69048 Acc 0.72
Epoch 100 Loss 0.69125223 Acc 0.58
Epoch 110 Loss 0.69396216 Acc 0.41
Epoch 120 Loss 0.689751 Acc 0.55
Epoch 130 Loss 0.6901834 Acc 0.63
Epoch 140 Loss 0.68938124 Acc 0.65
Epoch 150 Loss 0.6890285 Acc 0.7
Epoch 160 Loss 0.6888389 Acc 0.56
Epoch 170 Loss 0.6890057 Acc 0.6
Epoch 180 Loss 0.68678284 Acc 0.72
Epoch 190 Loss 0.6907995 Acc 0.56
Epoch 200 Loss 0.6856157 Acc 0.62
Epoch 210 Loss 0.6854085 Acc 0.74
Epoch 220 Loss 0.6849405 Acc 0.73
Epoch 230 Loss 0.689507 Acc 0.57
Epoch 240 Loss 0.6877799 Acc 0.56
Final accuracy: 0.686


We do not plan to cover Recurrent Neural Networks in detail, but the example code below provides some guidance on RNNs.

### Recurrent Neural Networks (RNNs)

In the context of deep learning, natural language is commonly modeled with Recurrent Neural Networks (RNNs).
RNNs pass the output of a neuron back to the input of the next time step of the same neuron.
These directed cycles in the RNN architecture give RNNs the ability to model temporal dynamics, making them particularly suited for modeling sequences (e.g. text).
We can visualize an RNN layer as follows:

<img src="Figures/basic_RNN.PNG" alt="basic_RNN" style="width: 80px;"/>
<center>Figure from *Understanding LSTMs*. https://colah.github.io/posts/2015-08-Understanding-LSTMs/</center>

We can unroll an RNN through time, making the sequence aspect of them more obvious:

<img src="Figures/unrolled_RNN.PNG" alt="basic_RNN" style="width: 400px;"/>
<center>Figure from *Understanding LSTMs*. https://colah.github.io/posts/2015-08-Understanding-LSTMs/</center>
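The unrolled picture corresponds to a simple loop. The sketch below is a NumPy illustration of a basic tanh recurrence, `h_t = tanh(x_t W_x + h_{t-1} W_h + b)`, and not the TensorFlow implementation. The weight names, the random initialization, and the sizes are assumptions chosen for illustration, with random numbers standing in for word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes chosen for illustration
n_inputs, n_neurons, n_steps = 300, 5, 9

# Hypothetical randomly initialized parameters, shared across all time steps
W_x = rng.normal(scale=0.1, size=(n_inputs, n_neurons))
W_h = rng.normal(scale=0.1, size=(n_neurons, n_neurons))
b = np.zeros(n_neurons)

# Stand-in for a sequence of word embeddings: one row per time step
sequence = rng.normal(size=(n_steps, n_inputs))

h = np.zeros(n_neurons)   # initial hidden state
outputs = []
for x_t in sequence:
    # The new state depends on the current input and the previous state
    h = np.tanh(x_t @ W_x + h @ W_h + b)
    outputs.append(h)

outputs = np.stack(outputs)   # shape: (n_steps, n_neurons)
last_output = outputs[-1]     # a summary of the whole sequence

print(outputs.shape, last_output.shape)
```

Only the output at the last time step would be passed on to a classifier in a setup like this, which is why the order of the inputs can influence the prediction.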

#### RNNs in TensorFlow
How would we implement an RNN in TensorFlow? Given the different forms of RNNs, there are quite a few ways, but we'll stick to a simple one. 

When we are dealing with a Recurrent Neural Network, we can have each word be a separate input to the network. Given our word embeddings, each review is then represented by a matrix with one 300-dimensional row per word. The preprocessing pipeline will be slightly different from what it was before.
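One reason to feed words in one at a time is that averaging discards word order. The sketch below uses made-up 2-D "embeddings" (`toy_embeddings` is a hypothetical stand-in for the real 300-dimensional vectors) to show two reviews with opposite meanings that share the same mean vector but differ as sequences.

```python
import numpy as np

# Hypothetical 2-D embeddings, invented for this illustration
toy_embeddings = {
    'not':  np.array([-1.0, 0.0]),
    'good': np.array([0.0, 1.0]),
    'bad':  np.array([0.0, -1.0]),
    'just': np.array([0.5, 0.5]),
}

def as_sequence(words):
    # One row per word, in the order the words appear
    return np.vstack([toy_embeddings[w] for w in words])

seq_a = as_sequence(['not', 'good', 'just', 'bad'])   # "not good, just bad"
seq_b = as_sequence(['not', 'bad', 'just', 'good'])   # "not bad, just good"

# The mean-of-embeddings representation cannot tell the two apart...
mean_a = seq_a.mean(axis=0)
mean_b = seq_b.mean(axis=0)

# ...but the per-word matrices differ, which a sequence model can exploit
rnn_input_a = seq_a.reshape([1, -1, 2])   # [batch, time steps, embedding size]

print(mean_a, mean_b)
print(rnn_input_a.shape)
```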

In [28]:
# This function converts a line of our data file into
# a dict with keys 'x' (the 300-dimensional mean embedding of the
# words in a review), 'y' (its label), 'w' (the list of per-word
# embeddings, in order), and 'text' (the raw review text)
def convert_line_to_example_rnn(line):
    # Pull out the first character: that's our label (0 or 1)
    y = int(line[0])
    # Split the line into words using Python's split() function
    words = line[2:].translate(remove_punct).lower().split()
    # Look up the embeddings of each word, ignoring words not
    # in our pretrained vocabulary.
    embeddings = [normalized_embeddings[index[w]] for w in words
                  if w in index]
    # Take the mean of the embeddings
    x = np.mean(np.vstack(embeddings), axis=0)
    return {'x': x, 'y': y, 'w':embeddings, 'text':line[2:]}

In [29]:
with open("Resources/movie-simple.txt", "r", encoding='utf-8', errors='ignore') as f:
    dataset = [convert_line_to_example_rnn(l) for l in f.readlines()]
import random
random.seed(200)
random.shuffle(dataset)
batch_size = 1
total_batches = len(dataset) // batch_size
train_batches = 3 * total_batches // 4
train, test = dataset[:train_batches*batch_size], dataset[train_batches*batch_size:]

In [30]:
print(train[0]['text'])
print(train[0]['y'])
print(train[0]['x'].shape)
print(np.array(train[0]['w']).reshape([1,-1,300]).shape)

The DaVinci Code and Mission Impossible 3 are AWESOME.

1
(300,)
(1, 9, 300)


We now apply an RNN to the text reviews, starting with the simpler dataset.

In [31]:
tf.reset_default_graph()
# sizes
n_steps = None
n_inputs = 300
n_neurons = 5
# Build RNN
X= tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y= tf.placeholder(tf.float32, [None, 1])
basic_cell = tf.contrib.rnn.BasicRNNCell(n_neurons,activation=tf.nn.tanh)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
last_cell_output=outputs[:,-1,:]
y_=tf.layers.dense(last_cell_output,1)

W0707 14:09:13.951710 4324869568 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0707 14:09:13.952934 4324869568 deprecation.py:323] From <ipython-input-31-6895aaacaaf8>:9: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.
W0707 14:09:13.954792 4324869568 deprecation.py:323] From <ipython-input-31-6895aaacaaf8>:10: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `ker

In [32]:
# Loss and metrics
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_, labels=y))
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(tf.sigmoid(y_)), y), tf.float32))

# Training
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

In [33]:
initialize_all = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(initialize_all)
for epoch in range(10):
    for batch in range(train_batches):
        data = train[batch*batch_size:(batch+1)*batch_size]
        reviews = np.array([sample['w'] for sample in data]).reshape([batch_size,-1,300])
        labels  = np.array([sample['y'] for sample in data]).reshape([batch_size,1])
        labels = np.array(labels).reshape([-1, 1])
        _, l, acc = sess.run([train_step, loss, accuracy], feed_dict={X: reviews, y: labels})
        if (batch+1) % 500 == 0:
            print("batch", batch, "Loss", l, "Acc", acc)
    if epoch % 1 == 0:
        print("Epoch", epoch, "Loss", l, "Acc", acc)
    random.shuffle(train)


batch 499 Loss 0.71062994 Acc 0.0
batch 999 Loss 0.25448325 Acc 1.0
Epoch 0 Loss 0.19355781 Acc 1.0
batch 499 Loss 0.10916404 Acc 1.0
batch 999 Loss 0.06045605 Acc 1.0
Epoch 1 Loss 0.08459929 Acc 1.0
batch 499 Loss 0.04655488 Acc 1.0
batch 999 Loss 0.08473836 Acc 1.0
Epoch 2 Loss 0.691367 Acc 1.0
batch 499 Loss 0.011898962 Acc 1.0
batch 999 Loss 0.029992454 Acc 1.0
Epoch 3 Loss 1.4959812 Acc 0.0
batch 499 Loss 0.004806321 Acc 1.0
batch 999 Loss 0.018642053 Acc 1.0
Epoch 4 Loss 0.07946477 Acc 1.0
batch 499 Loss 0.042612962 Acc 1.0
batch 999 Loss 0.026600048 Acc 1.0
Epoch 5 Loss 0.01851268 Acc 1.0
batch 499 Loss 0.42374453 Acc 1.0
batch 999 Loss 0.06636396 Acc 1.0
Epoch 6 Loss 0.0114364475 Acc 1.0
batch 499 Loss 0.071039006 Acc 1.0
batch 999 Loss 2.0734136 Acc 0.0
Epoch 7 Loss 0.012908939 Acc 1.0
batch 499 Loss 0.53614306 Acc 1.0
batch 999 Loss 0.0009447867 Acc 1.0
Epoch 8 Loss 0.0026357113 Acc 1.0
batch 499 Loss 0.007646476 Acc 1.0
batch 999 Loss 4.2392597 Acc 0.0
Epoch 9 Loss 0.0016768

In [34]:
# Evaluate on test set
test_acc=0
n=0
for sample in test:
    test_reviews = np.array([sample['w'] ]).reshape([1,-1,300])
    test_labels  = np.array([sample['y']]).reshape([1,1])
    test_labels = np.array(test_labels).reshape([-1, 1])
    test_acc += sess.run(accuracy, feed_dict={X: test_reviews, y: test_labels})
    n+=1
acc=test_acc/n 
print("Final accuracy:", acc)

Final accuracy: 0.9546742209631728


In [35]:
sess.close()

Switching to an LSTM-based neural network is fairly easy in code.

In [36]:
tf.reset_default_graph()
# sizes
n_steps = None
n_inputs = 300
n_neurons = 5
# Build RNN
X= tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y= tf.placeholder(tf.float32, [None, 1])
basic_cell = tf.contrib.rnn.LSTMCell(n_neurons,activation=tf.nn.tanh)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
last_cell_output=outputs[:,-1,:]
y_=tf.layers.dense(last_cell_output,1)

W0707 14:09:45.469296 4324869568 deprecation.py:323] From <ipython-input-36-17e30b7b2cf4>:9: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.


In [37]:
# Loss and metrics
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_, labels=y))
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(tf.sigmoid(y_)), y), tf.float32))

# Training
train_step = tf.train.AdamOptimizer(0.0005).minimize(loss)

In [38]:
initialize_all = tf.global_variables_initializer()
sess = tf.Session()
sess.run(initialize_all)
for epoch in range(10):
    for batch in range(train_batches):
        data = train[batch*batch_size:(batch+1)*batch_size]
        reviews = np.array([sample['w'] for sample in data]).reshape([batch_size,-1,300])
        labels  = np.array([sample['y'] for sample in data]).reshape([batch_size,1])
        labels = np.array(labels).reshape([-1, 1])
        _, l, acc = sess.run([train_step, loss, accuracy], feed_dict={X: reviews, y: labels})
        if (batch+1) % 500 == 0:
            print("batch", batch, "Loss", l, "Acc", acc)
    if epoch % 1 == 0:
        print("Epoch", epoch, "Loss", l, "Acc", acc)
    random.shuffle(train)

batch 499 Loss 0.5775424 Acc 1.0
batch 999 Loss 0.24180458 Acc 1.0
Epoch 0 Loss 0.24794434 Acc 1.0
batch 499 Loss 0.113387376 Acc 1.0
batch 999 Loss 0.068844244 Acc 1.0
Epoch 1 Loss 0.04553669 Acc 1.0
batch 499 Loss 0.10069063 Acc 1.0
batch 999 Loss 0.05680757 Acc 1.0
Epoch 2 Loss 0.07640334 Acc 1.0
batch 499 Loss 0.0638089 Acc 1.0
batch 999 Loss 0.039913934 Acc 1.0
Epoch 3 Loss 0.13318259 Acc 1.0
batch 499 Loss 0.11254047 Acc 1.0
batch 999 Loss 0.011101064 Acc 1.0
Epoch 4 Loss 0.060186535 Acc 1.0
batch 499 Loss 0.37860316 Acc 1.0
batch 999 Loss 0.024089823 Acc 1.0
Epoch 5 Loss 0.006876581 Acc 1.0
batch 499 Loss 0.027292065 Acc 1.0
batch 999 Loss 0.011620495 Acc 1.0
Epoch 6 Loss 0.0006780038 Acc 1.0
batch 499 Loss 0.03335786 Acc 1.0
batch 999 Loss 0.006775183 Acc 1.0
Epoch 7 Loss 0.0286195 Acc 1.0
batch 499 Loss 0.0084675625 Acc 1.0
batch 999 Loss 0.0048526647 Acc 1.0
Epoch 8 Loss 0.0012415291 Acc 1.0
batch 499 Loss 0.012337194 Acc 1.0
batch 999 Loss 0.0023873444 Acc 1.0
Epoch 9 Loss 0

In [39]:
# Evaluate on test set
test_acc=0
n=0
for sample in test:
    test_reviews = np.array([sample['w'] ]).reshape([1,-1,300])
    test_labels  = np.array([sample['y']]).reshape([1,1])
    test_labels = np.array(test_labels).reshape([-1, 1])
    test_acc += sess.run(accuracy, feed_dict={X: test_reviews, y: test_labels})
    n+=1
acc=test_acc/n 
print("Final accuracy:", acc)

Final accuracy: 0.9858356940509915


In [40]:
sess.close()

Finally, we switch to the more complex dataset.

In [41]:
with open("Resources/movie-pang02.txt", "r", encoding='utf-8') as f:
    dataset = [convert_line_to_example_rnn(l) for l in f.readlines()]
import random
random.seed(42)
random.shuffle(dataset)
batch_size = 1
total_batches = len(dataset) // batch_size
train_batches = 3 * total_batches // 4
train, test = dataset[:train_batches*batch_size], dataset[train_batches*batch_size:]

By examining one of the reviews, we can see that these reviews are much more complex and require greater nuance to analyze.

In [42]:
print(train[0]['text'])
print(train[0]['y'])
print(train[0]['x'].shape)
print(np.array(train[0]['w']).reshape([1,-1,300]).shape)

 the central focus of michael winterbottom s   welcome to sarajevo   is sarajevo itself   the city under siege   and its different effect on the characters unfortunate enough to be stuck there    it proves the backdrop for a stunningly realized story which refreshingly strays from mythic portents     platoon       racial tumultuosness   the risible   the walking dead     or a tinge of schmaltziness     schindler s list        the two leads   stephen dillane as a reporter and emira nusevic as an orphan with a plight few can identify with   are extremely believable   not one moment with them involved rings false    the question is not what went right    the question is what went wrong    for one   the film fails to provide a political overview of the war as it progresses   the dillane characters reports an american plane departing from sarajevo as it departs   and that s about it        the assortment of high profile supporting actors   ranging from woody harrelson as a yankee reporter  

In [43]:
len(train)

1500

We can retrain the same LSTM architecture that we were using before. Note that this is _much_ slower because the RNN now has to deal with much longer sequences.

In [44]:
initialize_all = tf.global_variables_initializer()
sess = tf.Session()
sess.run(initialize_all)
for epoch in range(10):
    for batch in range(train_batches):
        data = train[batch*batch_size:(batch+1)*batch_size]
        reviews = np.array([sample['w'] for sample in data]).reshape([batch_size,-1,300])
        labels  = np.array([sample['y'] for sample in data]).reshape([batch_size,1])
        labels = np.array(labels).reshape([-1, 1])
        _, l, acc = sess.run([train_step, loss, accuracy], feed_dict={X: reviews, y: labels})
        if (batch+1) % 500 == 0:
            print("batch", batch, "Loss", l, "Acc", acc)
    if epoch % 1 == 0:
        print("Epoch", epoch, "Loss", l, "Acc", acc)
    random.shuffle(train)

batch 499 Loss 0.60423577 Acc 1.0
batch 999 Loss 0.654327 Acc 1.0
batch 1499 Loss 0.6145861 Acc 1.0
Epoch 0 Loss 0.6145861 Acc 1.0
batch 499 Loss 0.6395343 Acc 1.0
batch 999 Loss 0.6404175 Acc 1.0
batch 1499 Loss 0.52117807 Acc 1.0
Epoch 1 Loss 0.52117807 Acc 1.0
batch 499 Loss 0.91462994 Acc 0.0
batch 999 Loss 0.26005527 Acc 1.0
batch 1499 Loss 0.6265969 Acc 1.0
Epoch 2 Loss 0.6265969 Acc 1.0
batch 499 Loss 0.078632645 Acc 1.0
batch 999 Loss 1.0986214 Acc 0.0
batch 1499 Loss 0.38152325 Acc 1.0
Epoch 3 Loss 0.38152325 Acc 1.0
batch 499 Loss 0.19007088 Acc 1.0
batch 999 Loss 0.33761886 Acc 1.0
batch 1499 Loss 0.10387851 Acc 1.0
Epoch 4 Loss 0.10387851 Acc 1.0
batch 499 Loss 0.35062757 Acc 1.0
batch 999 Loss 0.52068454 Acc 1.0
batch 1499 Loss 0.34132633 Acc 1.0
Epoch 5 Loss 0.34132633 Acc 1.0
batch 499 Loss 0.2740763 Acc 1.0
batch 999 Loss 0.15343344 Acc 1.0
batch 1499 Loss 0.11593928 Acc 1.0
Epoch 6 Loss 0.11593928 Acc 1.0
batch 499 Loss 0.27533206 Acc 1.0
batch 999 Loss 0.30982345 Acc 

In [45]:
# Evaluate on test set
test_acc=0
n=0
for sample in test:
    test_reviews = np.array([sample['w'] ]).reshape([1,-1,300])
    test_labels  = np.array([sample['y']]).reshape([1,1])
    test_labels = np.array(test_labels).reshape([-1, 1])
    test_acc += sess.run(accuracy, feed_dict={X: test_reviews, y: test_labels})
    n+=1
acc=test_acc/n 
print("Final accuracy:", acc)

Final accuracy: 0.732


In [46]:
sess.close()

This final accuracy is not great, but it is higher than the 0.686 we got with the MLP approach. Playing around with the settings can probably improve this further (this really isn't that many epochs).