# Assignment 3 - Tensorflow

Implementing a Linear Classifier for polarity movie reviews.

See course homepage: http://stp.lingfil.uu.se/~nivre/master/ml.html

See assignment: http://stp.lingfil.uu.se/~shaooyan/ml18/Assignment3.pdf 

## Imports

In [1]:
import pandas as pd
import tensorflow as tf
import numpy as np

import matplotlib.pyplot as plt

import util
import collections
import time

## Prepare Dataset

In [223]:
# Type of features to use. This can be set to 'bigram' or 'unigram+bigram'
# to use bigram features instead of or in addition to unigram features.
# Not required for assignment.
feature_type = 'unigram'

data = util.load_movie_data('poldata.zip')

data.select_feature_type(feature_type)

# Split the data set randomly into training, validation and test sets.
training_data, val_data, test_data = data.train_val_test_split()

nfeatures = len(training_data.vocabulary)

# Convert the sparse indices into dense vectors
training_X = np.asarray(util.sparse_to_dense(training_data, nfeatures))
training_y = np.asarray(training_data.labels)

validation_X = np.asarray(util.sparse_to_dense(val_data, nfeatures))
validation_y = np.asarray(val_data.labels)

test_X = np.asarray(util.sparse_to_dense(test_data, nfeatures))
test_y = np.asarray(test_data.labels)

print("Number of features: %s" % nfeatures)

Number of features: 50920


## Convert to sparse data

It is a bit pointless to first create the dense data and then transform it into sparse data,
however I didn't want to mess with the internal data structures.

See https://www.tensorflow.org/api_docs/python/tf/SparseTensor for a documentation (SparseTensorValue is should be used outside of a Graph context). In the graph, a `sparse_placeholder` has to be used, as well as the `sparse_tensor_dense_matmul` instead of `matmul` to multiply the weights and the features (https://www.tensorflow.org/api_docs/python/tf/sparse_tensor_dense_matmul).

In [226]:
def generate_indices(dense_matrix):
    for r_i, row in enumerate(dense_matrix):
        for c_i in np.nditer(np.where(row == 1)):
            yield r_i, int(c_i)

def dense_to_sparse(dense_matrix):            
    indices = list(generate_indices(dense_matrix))
    values = [1 for _ in indices]

    return tf.SparseTensorValue(indices, values, dense_matrix.shape)

In [227]:
training_X_sparse = dense_to_sparse(training_X)
validation_X_sparse = dense_to_sparse(validation_X)
test_X_sparse = dense_to_sparse(test_X)

## Loss Functions (Task 7)

In [4]:
def logistic_loss(y, pred):
    y = tf.cast(y, tf.float32)
    pred = tf.cast(pred, tf.float32)
    return tf.reduce_mean(tf.log(1.0 + tf.exp(-y*pred)))

def hinge_loss(y, pred):
    y = tf.cast(y, tf.float32)
    pred = tf.cast(pred, tf.float32)
    return tf.losses.hinge_loss(y, pred)

def mse_loss(y, pred):
    y = tf.cast(y, tf.float32)
    pred = tf.cast(pred, tf.float32)
    return tf.losses.mean_squared_error(y, pred)

## Hyperparameters

In [261]:
# Regularisation strength
reg_lambda = 0.001

# Learning rate
learning_rate = 0.001

# Number of training iterations
niterations = 15

# Number of elements in one batch
batch_size = 512

# Loss function to use
loss_function = logistic_loss
loss_function_name = "logistic_loss"

# Type of regularisation to use (select one and comment out the other)
# regulariser = tf.contrib.layers.l2_regularizer(reg_lambda)
regulariser = tf.contrib.layers.l1_regularizer(reg_lambda)
regulariser_name = "l1"

# This should only be enabled once you've decided on a final set
# of hyperparameters
enable_test_set_scoring = False

## Building the Computational Graph

In [262]:
graph = tf.Graph()

with graph.as_default():
    with tf.variable_scope('classifier'):

        # Define the placeholder where we feed in the data
        features = tf.sparse_placeholder(tf.int32, [None, nfeatures],
                                         name='input_placeholder')

        labels = tf.placeholder(tf.int32, [None], name='labels_placeholder')
        
        # Define the weights of the classifier
        weights = tf.get_variable('weights', [nfeatures],
                                  initializer=tf.random_normal_initializer())
        
        # The bias is a scalar
        bias = tf.get_variable('bias', [], dtype=tf.float32,
                               initializer=tf.random_normal_initializer())

        # Two tensors must have same dtype and compatible shape for dot product
        features = tf.cast(features, tf.float32)
        exp_weights = tf.reshape(weights, [nfeatures, 1])

        # Compute dot product and predict
        logits = tf.sparse_tensor_dense_matmul(features, exp_weights) + bias

        # Reshape the result to a vector to remove the dimension
        # added to `exp_weights`.
        logits = tf.reshape(logits, [-1])
        
        labels = tf.cast(labels, tf.float32)
        
        # Multiply predictions and labels. When a cell is positive,
        # the guess was correct. If it was negative, it must be corrected.
        a = tf.multiply(logits, labels)
        
        # Create a vector 'mask' with the length of the batchs
        # that has a 1 whereever a predictions was wrong.
        mask = tf.map_fn(lambda x: tf.sign(tf.minimum(x, 0) * -1), a)
        
        # Now create new weights by multiplying labels and the features.
        # Also multiply the mask with the result to 'remove' all entries
        # for already correctly predicted instances.
        fm = tf.sparse_transpose(features) * mask * labels
        
        # Since sparse tensors dont support the 'reduce_mean' operation,
        # we have to simulate it by dividing by the number of instances.
        divider = tf.cast(tf.fill([nfeatures], batch_size), tf.float32)
        
        # Sum up the differences for all instances and add them to the weights.
        weights = weights + tf.sparse_reduce_sum(fm, 1) / divider 
        
        # Now also update the bias in a similar way.
        bias = bias + tf.reduce_mean(labels * mask)
        
        # Calculate the loss so we know if we are getting better.
        loss_ureg = loss_function(labels, logits)
        
        # Initialiser
        init = tf.global_variables_initializer()

graph.finalize()

## Training

In [263]:
# Define a training session and train the classifier
sess = tf.Session(graph=graph)

def predict(input_features):
    """Applies the classifier to the data and returns a list of predicted labels."""
    predictions = []
    pred = sess.run(logits, feed_dict={features: input_features})
    for x in pred:
        if x > 0:
            predictions.append(1.0)
        else:
            predictions.append(-1.0)
    return predictions

def accuracy(gold, hypothesis):
    """Computes an accuracy score given two vectors of labels."""
    assert len(gold) == len(hypothesis)
    return sum(g == h for g, h in zip(gold, hypothesis)) / len(gold)

# Before starting, initialize the variables. We will 'run' this first.
sess.run(init)

# Training iterations
print("Run %s iterations..." % niterations)

# Set up containers to collect logs
stats = {
    "training_loss_reg": [],
    "training_loss_unreg": [],
    "training_acc": [],
    "val_loss": [],
    "val_acc": []
}

training_log = []
for i in range(50):
    num_instances = training_X.shape[0]
    
    # Shuffle the data using a random numpy index array
    permutation = np.random.permutation(num_instances)
    shuffled_X = training_X[permutation]
    shuffled_y = training_y[permutation]
    
    # Now create batches for both
    for position in range(0, num_instances - 1, batch_size):
        # Make sure that the last batch has an apropriate size
        actual_batch_size = batch_size if position + batch_size < num_instances \
            else position + batch_size - num_instances
        
        batch_X = shuffled_X[position:position + actual_batch_size]
        batch_y = shuffled_y[position:position + actual_batch_size]
        
        # Create dense tensor for the features
        sparse_X = dense_to_sparse(batch_X)
        
        b, w, loss_val = sess.run(
            [bias, weights, loss_ureg],
            feed_dict={features: sparse_X, labels: batch_y})
#         print("LOSS", loss_val)

    training_predictions = predict(training_X_sparse)
    training_accuracy = accuracy(training_y, training_predictions)
    print(training_accuracy)
    
print(regulariser)
print('Training completed.')

Run 15 iterations...
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
0.5175
<function l1_regularizer.<locals>.l1 at 0x7f031b3c8730>
Training completed.


In [190]:
df = pd.DataFrame.from_dict(stats)

df.plot(subplots=True, layout=(-1, 2), figsize=(20, 25), sharex=False, sharey=False)

plt.savefig('graphs_%s.pdf' % time.time())

df

TypeError: Empty 'DataFrame': no numeric data to plot

## Evaluation

In [9]:
print('=====================')
print('MODEL CHARACTERISTICS')
print('=====================')
print()

# Display some useful statistics about the model and the training process.
title = 'Data set: %s - Regulariser: %g - Learning rate: %g' % (data.name, reg_lambda, learning_rate)

print()

final_weights = sess.run(weights)
final_bias = sess.run(bias)
util.show_stats(title, training_log, final_weights, final_bias, data.vocabulary,
                top_n=1, write_to_file="results.csv", configuration={
                    'reg_lambda': reg_lambda,
                    'learning_rate': learning_rate,
                    'loss_function': loss_function_name,
                    'regulariser': regulariser_name,
                    'niterations': niterations,
                    'val_accuracy': val_accuracy
                })


# util.create_plots(title, training_log, weights, log_keys=['training_loss_reg', 'val_loss'])

if enable_test_set_scoring:
    # Check the performance on the test set.
    test_loss = sess.run(loss, feed_dict={features: ds_test, labels:test_data.labels})
    test_predictions = predict(ds_test)
    test_accuracy = accuracy(test_data.labels, test_predictions)

    print()
    print('====================')
    print('TEST SET PERFORMANCE')
    print('====================')
    print()
    print('Test loss: %g' % test_loss)
    print('Test accuracy: %g' % test_accuracy)
    
sess.close()

MODEL CHARACTERISTICS


Data set: Movie reviews - Regulariser: 0.001 - Learning rate: 0.001

Best regularised training loss: 0.691499
Final regularised training loss: 0.691499
Best validation loss: 0.691958
Final validation loss: 0.691958

Number of weights: 50920
Bias: 2.77394e-05
Number of weights with magnitude > 0.01: 0

Top 1 positive features:
0.000562158	life

Top 1 negative features:
-0.000954029	bad


## Task 6

Run with standard parameters.

Top 10 positive words:

1. life
2. also
3. best
4. world
5. many
6. both
7. perfect
8. performances
9. very
10. great

Top 10 negative words:
bad
1. worst
2. plot
3. stupid
4. ?
5. boring
6. script
7. nothing
8. why
9. least
10. !