# Experimental results and comparisons

## Reported state-of-the-art results

<img src="./_doc/reported_exp.png" alt="table" style="width: 450px;"/>

The table above is from the paper *Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks*.

The best results reported so far, to my knowledge, are **Fine-grained: 52.1, Binary: 88.6**, from *Ask Me Anything: Dynamic Memory Networks for Natural Language Processing*.

## Results in my experiments

Algorithm | Cell Size | Attn Size | Num. of Params | Fine-grained Acc. (%) | Binary Acc. (%)
--------- | --------- | --------- | -------------- | --------------------- | ---------------
[LSTM baseline](#LSTM-Baseline) | 300 | - | 1,351,500 | 49.8 | 89.0
[LSTM + ATTN 1](#LSTM-+-ATTN) | 150 | 100 | 480,850 | 50.9 | 89.2
[LSTM + ATTN 2](#LSTM-+-ATTN-2) | 200 | 100 | 741,100 | 51.3 | 89.0
[LSTM + ATTN 3](#LSTM-+-ATTN-3) | 300 | 100 | 1,411,600 | 51.1 | 88.1
LSTM + ATTN + Layer2 | 200 | 100 | 1,542,200 | 49.2 | 89.1
LSTM + ATTN + Layer2 | 300 | 100 | 3,183,200 | 51.5 | 89.4
[RNN + ATTN](#RNN-+-ATTN) | 400 | 100 | 800,500 | 45.3 | 84.9
[RNN + ATTN + 2Layers](#RNN-+-ATTN-+-2Layers) | 400 | 150 | 1,801,100 | 47.1 | 86.7
GRU v1 + ATTN | 150 | 100 | 480,850 | 51.1 | 87.4
GRU v1 + ATTN | 400 | 100 | 2,282,100 | 51.9 | 89.4
GRU v1 + ATTN + Layer2 | 200 | 100 | 1,400,200 | 47.5 | 88.2
GRU v1 + ATTN + Layer2 | 300 | 100 | 2,823,200 | 48.6 | 86.5
GRU v1 + ATTN + Layer2 | 400 | 100 | 4,764,200 | 48.7 | 86.5
GRU v2 + ATTN | 400 | 100 | 1,841,700 | 50.1 | 88.5

## Summary

- Combining a recursive LSTM with the attention mechanism (LSTM + ATTN) gives a new state-of-the-art result on the binary Stanford Sentiment Treebank task: accuracy improves from the best reported 88.6% to 89.2%.
- LSTM + ATTN not only reaches higher accuracy but also does so with far fewer parameters: 480,850 with CELL = 150, about a third of the roughly 1.35 million in the LSTM baseline.
- With attention, even the basic recursive neural network becomes competitive (RNN + ATTN reaches 84.9% binary accuracy, and 86.7% with two layers).
- The attention mechanism makes the results interpretable: by inspecting the substrings with the highest attention weights, we can see why a sentence is classified as positive or negative. For instance, in [this example](#A-positive-example) the sentence "The movie 's ripe, enrapturing beauty will tempt those willing to probe its inscrutable mysteries." is classified as positive because:
    - It attends to positive words: willing, tempt, beauty.
    - It also attends to a longer positive phrase: will tempt those willing to probe its inscrutable mysteries.
- The attention mechanism itself is not new, but combining it with a recursive neural network brings the following additional advantages:
    - Attention can be paid to phrases of arbitrary length, not only to individual words.
    - Because each attended phrase is a subtree of the parse tree, it is usually well structured and therefore easier to interpret.
    - In the examples examined, the attention mechanism does pick out the intuitively important words and phrases.
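The point about attending to whole subtrees can be made concrete with a small NumPy sketch. It assumes an additive (Bahdanau-style) score, v^T tanh(W_node h_j + W_query h_root), in which the root's hidden state acts as the query. This form is consistent with the extra 2 x ATTN x CELL + ATTN parameters listed for the LSTM + ATTN models below, but the code of `lstm_attn_model` is not shown here, so `W_node`, `W_query` and `v` are hypothetical names. The root is masked out because its own weight is 0 in every printed attention vector.

```python
import numpy as np


def tree_attention(node_states, W_node, W_query, v, mask):
    """Additive attention over the hidden states of all tree nodes (a sketch).

    node_states: (num_nodes, CELL) hidden states; the last row is the root,
        used here as the query (an assumption).
    W_node, W_query: (ATTN, CELL) projections (hypothetical names).
    v: (ATTN,) scoring vector.
    mask: (num_nodes,) booleans, True for nodes that may receive attention.
    Returns (weights, context); weights sum to 1 over the unmasked nodes.
    """
    query = node_states[-1]
    # score_j = v . tanh(W_node h_j + W_query h_root)
    scores = np.tanh(node_states @ W_node.T + W_query @ query) @ v
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores[mask].max()  # shift for numerical stability
    weights = np.exp(scores)              # exp(-inf) == 0 for masked nodes
    weights /= weights.sum()
    context = weights @ node_states       # attention-weighted sum of node states
    return weights, context


rng = np.random.default_rng(0)
CELL, ATTN, num_nodes = 150, 100, 7
h = rng.standard_normal((num_nodes, CELL))
W_node = rng.standard_normal((ATTN, CELL)) * 0.01
W_query = rng.standard_normal((ATTN, CELL)) * 0.01
v = rng.standard_normal(ATTN)
mask = np.ones(num_nodes, dtype=bool)
mask[-1] = False  # the root does not attend to itself
weights, context = tree_attention(h, W_node, W_query, v, mask)
print(weights.round(3), context.shape)
```

Because every node, leaf or internal, has its own weight, the highest-weighted entries can be single words or whole phrases.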

## Experimental details

The sections after Preliminaries give the following experimental details for each algorithm:

- Show the confusion matrix, accuracy and some other metrics of each algorithm.
- Show some sentences that are correctly classified.
- Show some examples of how the attention mechanism works on the correct examples.
- Show some sentences that are incorrectly classified.

# Preliminaries

In [1]:
import tensorflow as tf
import numpy as np
import logging
import cPickle
import os, shutil, random
from os import path

from tfrecord_reader import get_data

from lstm_model import LSTMModel
from lstm_attn_model import LSTMAttnModel
from lstm_attn2_model import LSTMAttn2Model
from rnn_attn_model import RNNAttnModel
from rnn_attn2_model import RNNAttn2Model
from gru_attn_model import GRUAttnModel
from gru_attn2_model import GRUAttn2Model
from gru2_attn_model import GRU2AttnModel

os.environ['CUDA_VISIBLE_DEVICES']='1'
sess = tf.InteractiveSession()

In [2]:
dict_path = "_data/dict.pkl"

(wv_word2ind, wv_ind2word) = cPickle.load(open(dict_path))
print len(wv_word2ind)
wv_ind2word[len(wv_ind2word)+1] = 'OOV'
wv_ind2word[-1] = '-1'

print 'examples of dict:\n'
for _ in range(5):
    (k, v) = random.choice(wv_word2ind.items())
    print k, v

20725
examples of dict:

couples 17732
parent-child 13934
capably 12814
witnessed 7821
behaving 8867


In [3]:
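def ind2str(ind, wv_ind2word=wv_ind2word):
    # Map the word indices of a tree to a string; internal nodes carry -1.
    return ' '.join([wv_ind2word[i] for i in ind])

def nodeid2str(node_id, wv, left, right, is_leaf, wv_ind2word=wv_ind2word):
    # Rebuild the substring spanned by the subtree rooted at node_id.
    s = ''
    if is_leaf[node_id] == 0:  # internal node: concatenate both children
        s1 = nodeid2str(left[node_id], wv, left, right, is_leaf, wv_ind2word)
        s2 = nodeid2str(right[node_id], wv, left, right, is_leaf, wv_ind2word)
        s = s1 + ' ' + s2 + ' '
    if not wv[node_id] == -1:  # leaf: append its word
        s = s + wv_ind2word[wv[node_id]]
    return s.replace('  ', ' ')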
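Judging from the printed examples below, each tree is stored as parallel arrays in post-order: leaves carry a word index in `wv`, internal nodes carry -1 and point to their children through `left`/`right`, and the root comes last. The sketch below is an illustration, not the `tfrecord_reader` code: it flattens a nested-tuple tree into that layout and rebuilds a subtree's text with a single `join`, which avoids the stray double spaces `nodeid2str` leaves in some outputs. The helper names, the toy word indices, and the use of -1 for unused `left`/`right` slots are assumptions.

```python
def flatten(tree, word2ind):
    """Flatten a nested binary tree into post-order parallel arrays.

    A leaf is a word string; an internal node is a (left, right) tuple.
    Returns (wv, left, right, is_leaf); the root is the last node.
    """
    wv, left, right, is_leaf = [], [], [], []

    def visit(node):
        if isinstance(node, str):
            # Leaf: store its word index; it has no children.
            wv.append(word2ind[node])
            left.append(-1)
            right.append(-1)
            is_leaf.append(1)
        else:
            # Internal node: emit both children first (post-order), then the node.
            l_id = visit(node[0])
            r_id = visit(node[1])
            wv.append(-1)
            left.append(l_id)
            right.append(r_id)
            is_leaf.append(0)
        return len(wv) - 1  # index of the node just appended

    visit(tree)
    return wv, left, right, is_leaf


def subtree_words(node_id, wv, left, right, is_leaf, ind2word):
    """Return the list of words spanned by the subtree rooted at node_id."""
    if is_leaf[node_id]:
        return [ind2word[wv[node_id]]]
    return (subtree_words(left[node_id], wv, left, right, is_leaf, ind2word)
            + subtree_words(right[node_id], wv, left, right, is_leaf, ind2word))


word2ind = {'Effective': 0, 'but': 1, 'OOV': 2, 'biopic': 3}  # toy indices
ind2word = {i: w for w, i in word2ind.items()}
wv, left, right, is_leaf = flatten((('Effective', 'but'), ('OOV', 'biopic')), word2ind)
print(wv)  # [0, 1, -1, 2, 3, -1, -1]
print(' '.join(subtree_words(2, wv, left, right, is_leaf, ind2word)))  # Effective but
```

The toy tree reproduces the layout of the first test sentence, where node 2 spans "Effective but".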

In [4]:
test_record = "_data/finegrained_test.record"
test_size = 2210

test_data = []

l, wv, left, right, label, is_leaf, mask = get_data(test_record)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord, sess=sess)

# Get all test data for demonstration

for _step in range(test_size):
    #print _step
    _l, _wv, _left, _right, _label, _is_leaf = sess.run([l, wv, left, right, label, is_leaf])
    test_data.append((_wv, _label, _left, _right, _is_leaf))
    
print 'examples of test data\n'
for i in range(3):
    print ind2str(test_data[i][0])
    print test_data[i][1]
    print nodeid2str(2, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])
    print

examples of test data

Effective but -1 OOV biopic -1 -1
[3 2 3 1 2 1 2]
Effective but 

If you sometimes like to go to the movies -1 -1 -1 to have fun -1 -1 -1 -1 -1 -1 -1 -1 , Wasabi is a good place to start -1 -1 -1 -1 -1 . -1 -1 -1 -1
[2 2 2 2 2 2 2 2 2 2 2 3 2 2 4 3 3 3 3 2 3 3 3 2 2 2 2 3 2 2 2 2 2 2 2 3 2
 3 2 2 3]
sometimes

Emerges as something rare -1 -1 -1 , -1 an issue movie -1 -1 that 's so honest -1 and -1 keenly observed -1 -1 -1 that it does n't -1 feel like one -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1
[2 2 2 3 3 3 3 2 4 2 2 2 2 2 2 2 2 4 3 2 3 2 2 3 4 3 2 2 2 2 1 2 2 2 2 2 2
 2 2 3 3 4 4 2 4]
something



In [5]:
def test(model, layers=1):
    print("Start testing")
    test_losses = []
    overall_metrics = np.zeros((5, 5), dtype=np.int32)
    root_metrics = np.zeros((5, 5), dtype=np.int32)
    overall_binary_metrics = np.zeros((2, 2), dtype=np.int32)
    root_binary_metrics = np.zeros((2, 2), dtype=np.int32)
    
    right_ind = []
    wrong_ind = []
    if layers == 0: # 0 means there's no attention
        pass
    elif layers == 1:
        attn = []
    else:
        attn1 = []
        attn2 = []

    for i in xrange(test_size):
        if layers == 0:
            test_loss, test_pred, test_binary_pred, target_v = \
                sess.run([model.sum_loss, model.pred, model.binary_pred, model.ground_truth])
        elif layers == 1:
            test_loss, test_pred, test_binary_pred, target_v, attn_vecs = \
                sess.run([model.sum_loss, model.pred, model.binary_pred, model.ground_truth, model.attn_vecs])
            attn.append(attn_vecs)
        else:
            test_loss, test_pred, test_binary_pred, target_v, attn_vecs1, attn_vecs2 = \
                sess.run([model.sum_loss, model.pred, model.binary_pred, model.ground_truth, model.attn_vecs1, model.attn_vecs2])
            attn1.append(attn_vecs1)            
            attn2.append(attn_vecs2)            
        test_losses.append(test_loss)
        # The root is the last node; all confusion matrices are indexed [prediction, ground truth].
        root_pred = test_pred[-1]
        root_target = target_v[-1]
        if root_target != 2:
            # Binary task: skip neutral (2); labels 3-4 are positive (1), 0-1 negative (0).
            root_binary_target = (root_target > 2).astype(np.int32)
            root_binary_pred = test_binary_pred[-1]
            root_binary_metrics[root_binary_pred, root_binary_target] += 1
            # Remember which sentences were classified correctly, for the example analysis.
            if root_binary_pred == root_binary_target:
                right_ind.append(i)
            else:
                wrong_ind.append(i)
        root_metrics[root_pred, root_target] += 1
        # Overall metrics count every node (phrase) in the tree, not just the root.
        for k in range(len(test_pred)):
            overall_metrics[test_pred[k], target_v[k]] += 1
            if target_v[k] != 2:
                target_temp = (target_v[k] > 2).astype(np.int32)
                overall_binary_metrics[test_binary_pred[k], target_temp] += 1

    sum_loss = sum(test_losses)
    mean_loss = sum_loss / test_size
    print('test finish')

    print('Root Metrics:\n %s' % str(root_metrics))
    print('Overall Metrics:\n %s' % str(overall_metrics))
    print('Root Binary Metrics:\n %s' % str(root_binary_metrics))
    print('Overall Binary Metrics:\n %s' % str(overall_binary_metrics))
    root_acc = 1.0 * np.trace(root_metrics) / np.sum(root_metrics)
    overall_acc = 1.0 * np.trace(overall_metrics) / np.sum(overall_metrics)
    root_binary_acc = 1.0 * np.trace(root_binary_metrics) / np.sum(root_binary_metrics)
    overall_binary_acc = 1.0 * np.trace(overall_binary_metrics) / np.sum(overall_binary_metrics)
    print('mean loss: %f, root_acc: %f, overall_acc: %f, root_binary_acc: %f, overall_binary_acc: %f' % 
          (mean_loss, root_acc, overall_acc, root_binary_acc, overall_binary_acc))
    
    if layers == 0:
        return right_ind, wrong_ind
    elif layers == 1:
        return right_ind, wrong_ind, attn
    else:
        return right_ind, wrong_ind, attn1, attn2        

# LSTM Baseline
Num of params: 

EMB = 300, CELL = 300

(EMB + 2 x CELL) x 5 x CELL + 5 x CELL = 1,351,500
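The arithmetic can be checked with a few lines of Python. The comments read the formula as a binary Tree-LSTM, where each of 5 gate/candidate blocks sees the word embedding plus the left and right child states and has its own bias. That reading is an interpretation, not something stated here. Embeddings and the output classifier do not appear in the formula. The function name is hypothetical.

```python
def lstm_params(emb, cell):
    """Parameter count of the recursive LSTM baseline, per the formula above."""
    # Assumed: 5 blocks, each with input = embedding + two child states.
    weights = (emb + 2 * cell) * 5 * cell
    biases = 5 * cell  # one bias vector per block
    return weights + biases


print(f"{lstm_params(300, 300):,}")  # 1,351,500
```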

In [6]:
model_dir = 'model/lstm_joint/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 300,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0,
               'drop_fw_hs': False,
               'drop_fw_cs': False
               }

test_md = LSTMModel(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind = test(test_md, layers=0)

Start testing
test finish
Root Metrics:
 [[ 96  69   9   1   0]
 [157 408 162  54  10]
 [ 17 109 113  80  15]
 [  9  41  98 311 202]
 [  0   6   7  64 172]]
Overall Metrics:
 [[  735   430    58     3     1]
 [ 1070  6256  2376   374    49]
 [  172  2283 52103  2662   221]
 [   29   277  1971  7482  1713]
 [    2     9    40   477  1807]]
Root Binary Metrics:
 [[817 106]
 [ 95 803]]
Overall Binary Metrics:
 [[10241  1023]
 [ 1022 13766]]
mean loss: 15.936891, root_acc: 0.497738, overall_acc: 0.827881, root_binary_acc: 0.889621, overall_binary_acc: 0.921503
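`test` indexes every confusion matrix as `[prediction, ground truth]`, so accuracy is the trace divided by the total count. The sketch below recomputes the baseline's root binary accuracy from the printed matrix. It also adds per-class precision and recall, which `test` does not report. The function name `binary_report` is hypothetical.

```python
import numpy as np


def binary_report(cm):
    """Metrics from a 2x2 confusion matrix indexed [prediction, ground truth].

    Class 0 = negative (labels 0-1), class 1 = positive (labels 3-4).
    """
    cm = np.asarray(cm, dtype=np.float64)
    accuracy = np.trace(cm) / cm.sum()
    precision = np.diag(cm) / cm.sum(axis=1)  # rows are predictions
    recall = np.diag(cm) / cm.sum(axis=0)     # columns are ground truth
    return accuracy, precision, recall


# Root Binary Metrics of the LSTM baseline, copied from the output above.
acc, prec, rec = binary_report([[817, 106], [95, 803]])
print(round(float(acc), 6))  # 0.889621
print(prec.round(3), rec.round(3))
```

The 1,821 counted sentences are the non-neutral roots; neutral sentences never enter the binary matrices.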


In [7]:
# Analysis of some correct predictions

print 'Some correct binary predictions:\n'
for _ in range(5):
    ind = random.choice(right_ind)
    print ind2str(test_data[ind][0]), '\tground truth label:', test_data[ind][1][-1]
    print

Some correct binary predictions:

A movie that tries to fuse the two -1 ` woods ' -1 -1 -1 -1 -1 -1 but -1 winds up -1 a OOV masala mess -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 1

Collateral Damage -1 is , -1 despite its alleged provocation -1 -1 -1 -1 post-9 / 11 , -1 an antique -1 -1 , -1 in the end -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 1

Chicago is sophisticated , brash , sardonic , completely joyful -1 -1 -1 -1 -1 -1 -1 -1 in its execution -1 -1 -1 . -1 -1 	ground truth label: 4

Arguably the year -1 's silliest and -1 most incoherent -1 -1 movie -1 -1 . -1 -1 -1 	ground truth label: 1

The drama -1 is played out -1 with such aching beauty -1 -1 and -1 truth -1 -1 -1 that it brings tears -1 to your eyes -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 4



In [8]:
# Analysis of some incorrect predictions

print 'Some incorrect binary predictions:\n'
for _ in range(5):
    ind = random.choice(wrong_ind)
    print ind2str(test_data[ind][0]), '\tground truth label:', test_data[ind][1][-1]
    print

Some incorrect binary predictions:

Not a bad journey -1 -1 at all -1 -1 . -1 -1 	ground truth label: 3

The best way -1 -1 to hope for any chance -1 of enjoying this film -1 -1 -1 -1 -1 -1 -1 -1 is by lowering your expectations -1 -1 -1 -1 . -1 -1 	ground truth label: 1

While Benigni -LRB- who stars and -1 co-wrote -1 -1 -RRB- -1 -1 -1 seems to be having a wonderful time -1 -1 -1 -1 -1 -1 -1 -1 , he might be alone -1 in that -1 -1 -1 . -1 -1 -1 -1 	ground truth label: 1

For all its failed connections -1 -1 -1 -1 , Divine Secrets -1 of the Ya-Ya Sisterhood -1 -1 -1 -1 is nurturing -1 , -1 in a gauzy -1 , -1 dithering way -1 -1 -1 -1 . -1 -1 -1 -1 	ground truth label: 3

The Transporter -1 is as lively -1 and -1 as fun -1 -1 -1 as it is unapologetically dumb -1 -1 -1 -1 -1 -1 	ground truth label: 3



# LSTM + ATTN 

Num of params: 

EMB = 300, CELL = 150, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 5 x CELL + 2 x ATTN x CELL + ATTN = 480,850
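A quick check of this formula, which also covers the CELL = 200 and 300 variants in the LSTM + ATTN 2 and 3 sections. The text gives only the formula. Reading the extra 2 x ATTN x CELL + ATTN term as two ATTN x CELL projection matrices plus one ATTN-sized scoring vector is an assumption. The function name is hypothetical.

```python
def lstm_attn_params(emb, cell, attn):
    """Parameter count of LSTM + ATTN, per the formula above."""
    lstm = (emb + 2 * cell) * 5 * cell + 5 * cell  # recursive LSTM, as in the baseline
    attention = 2 * attn * cell + attn             # assumed: two projections + scoring vector
    return lstm + attention


for cell in (150, 200, 300):
    print(cell, f"{lstm_attn_params(300, cell, 100):,}")
# 150 480,850
# 200 741,100
# 300 1,411,600
```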

In [6]:
model_dir = 'model/lstm_attn_cell150_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 150,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = LSTMAttnModel(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn = test(test_md)

Start testing
test finish
Root Metrics:
 [[106  76  16   5   3]
 [155 445 176  58  11]
 [  7  54  82  55   7]
 [  9  57 108 337 223]
 [  2   1   7  55 155]]
Overall Metrics:
 [[  924   877   254    37    10]
 [  936  6571  3826   539    66]
 [   84  1314 48576  1760    82]
 [   60   479  3843  8078  1803]
 [    4    14    49   584  1830]]
Root Binary Metrics:
 [[822 107]
 [ 90 802]]
Overall Binary Metrics:
 [[10282  1214]
 [  981 13575]]
mean loss: 17.837229, root_acc: 0.509050, overall_acc: 0.798777, root_binary_acc: 0.891818, overall_binary_acc: 0.915745


In [7]:
# Analysis of some correct predictions

print 'Some correct binary predictions:\n'
for _ in range(5):
    ind = random.choice(right_ind)
    print ind2str(test_data[ind][0]), '\tground truth label:', test_data[ind][1][-1]
    print

Some correct binary predictions:

It wo n't -1 be long -1 before you 'll spy I Spy at a video store -1 -1 near you -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 1

An incredibly clever -1 and -1 superbly paced -1 -1 caper -1 -1 filled with scams -1 -1 within scams -1 -1 within scams -1 -1 . -1 -1 	ground truth label: 3

OOV handily directs and -1 edits around his screenplay 's -1 -1 sappier elements -1 -1 -1 -1 -1 ... -1 and -1 sustains Off -1 the Hook 's -1 -1 buildup -1 with remarkable assuredness -1 for a first-timer -1 -1 -1 -1 -1 -1 -1 . -1 -1 -1 	ground truth label: 3

Its scenes -1 and -1 sensibility -1 are all -1 more than -1 familiar -1 -1 -1 , -1 but -1 it exudes a kind -1 of nostalgic OOV charm -1 -1 -1 -1 -1 and -1 , at the same time -1 -1 -1 , -1 -1 -1 is so fresh -1 -1 and -1 free of the usual thriller nonsense -1 -1 -1 -1 -1 -1 that it all seems to be happening for the first time -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 	ground truth label: 3

Bullo

## A positive example

In [8]:
k = 12 # an arbitrarily chosen correct prediction, analysed in detail
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

print 'Attention vector of root:'
print attn[i][-1]
print

print 'Indices of top 5 important nodes:'
top_ind = np.argsort(attn[i][-1])[::-1][:5]
print top_ind
print

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

The movie 's -1 -1 ripe , enrapturing -1 -1 beauty -1 -1 will tempt those willing to probe its inscrutable mysteries -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 3

Attention vector of root:
[ 0.00349537  0.00455809  0.01480896  0.02800366  0.0266175   0.04388244
  0.00434648  0.03815329  0.03542445  0.03398605  0.05433063  0.03201456
  0.03265094  0.00553974  0.0567746   0.01294396  0.07164057  0.0019329
  0.05077274  0.00636492  0.04187195  0.04173682  0.03137547  0.03093641
  0.03170796  0.03056097  0.03258435  0.03174249  0.03735219  0.03769103
  0.01368807  0.04571361  0.        ]

Indices of top 5 important nodes:
[16 14 10 18 31]

Top 5 important sub-strings:
willing
tempt
beauty
probe
will tempt those willing to probe its inscrutable mysteries . 
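The ranking step can be reproduced outside TensorFlow. Below, the root attention vector printed above is pasted in and ranked. `np.argsort(-v, kind='stable')` gives the same order as `np.argsort(v)[::-1]` when no weights are tied, and it makes the order of exactly tied weights deterministic. Ties do occur: nodes 8 and 14 in the next example both have weight 0.00124436. The helper name `top_k_nodes` is hypothetical.

```python
import numpy as np


def top_k_nodes(attn_vec, k=5):
    """Indices of the k nodes with the largest attention weight, highest first.

    A stable sort on the negated weights keeps exactly tied nodes in index order.
    """
    attn_vec = np.asarray(attn_vec)
    return np.argsort(-attn_vec, kind='stable')[:k]


# Root attention vector of the positive example above (33 nodes, root last).
attn_root = [0.00349537, 0.00455809, 0.01480896, 0.02800366, 0.0266175, 0.04388244,
             0.00434648, 0.03815329, 0.03542445, 0.03398605, 0.05433063, 0.03201456,
             0.03265094, 0.00553974, 0.0567746, 0.01294396, 0.07164057, 0.0019329,
             0.05077274, 0.00636492, 0.04187195, 0.04173682, 0.03137547, 0.03093641,
             0.03170796, 0.03056097, 0.03258435, 0.03174249, 0.03735219, 0.03769103,
             0.01368807, 0.04571361, 0.0]
print(top_k_nodes(attn_root).tolist())  # [16, 14, 10, 18, 31]
```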


In [9]:
k = 24 # an arbitrarily chosen correct prediction, analysed in detail
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

print 'Attention vector of root:'
print attn[i][-1]
print

print 'Indices of top 5 important nodes:'
top_ind = np.argsort(attn[i][-1])[::-1][:5]
print top_ind
print

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

Cantet perfectly captures the hotel lobbies -1 -1 , -1 two-lane highways -1 -1 , -1 and -1 roadside cafes -1 that permeate Vincent 's -1 days -1 -1 -1 -1 -1 -1 -1 -1 	ground truth label: 3

Attention vector of root:
[ 0.04554181  0.03951189  0.04932036  0.00022318  0.030342    0.04970415
  0.04887385  0.05224044  0.00124436  0.03921333  0.06084915  0.03297601
  0.01959155  0.02685734  0.00124436  0.03535077  0.00094744  0.04310905
  0.04978576  0.04826697  0.03682045  0.00094778  0.06222716  0.02522867
  0.00435613  0.0031312   0.00289662  0.00170586  0.0130778   0.01822985
  0.021375    0.0274193   0.03534235  0.03720098  0.        ]

Indices of top 5 important nodes:
[22 10  7 18  5]

Top 5 important sub-strings:
permeate
two-lane
the hotel lobbies 
roadside
lobbies


In [10]:
k = 48 # an arbitrarily chosen correct prediction, analysed in detail
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

print 'Attention vector of root:'
print attn[i][-1]
print

print 'Indices of top 5 important nodes:'
top_ind = np.argsort(attn[i][-1])[::-1][:5]
print top_ind
print

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

At heart -1 the movie -1 is a deftly wrought suspense yarn -1 -1 -1 -1 whose richer shadings -1 work as coloring rather than substance -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 -1 	ground truth label: 4

Attention vector of root:
[ 0.00176917  0.04901871  0.03447079  0.0001959   0.00156532  0.00172315
  0.00304355  0.00046547  0.0300404   0.03139539  0.02612001  0.03626014
  0.03010962  0.03146257  0.03048464  0.03013186  0.00458601  0.05435914
  0.03976879  0.02969466  0.00242321  0.00085276  0.0315246   0.0067939
  0.0005196   0.03945554  0.0386363   0.05209846  0.05335733  0.04505504
  0.0467037   0.03267373  0.0307165   0.02844965  0.03035104  0.00386342
  0.02958277  0.02965588  0.        ]

Indices of top 5 important nodes:
[17 28 27  1 30]

Top 5 important sub-strings:
richer
coloring rather than substance 
rather than substance 
heart
work as coloring rather than substance 


In [11]:
# Analysis of some incorrect predictions

print 'Some incorrect binary predictions:\n'
for _ in range(5):
    ind = random.choice(wrong_ind)
    print ind2str(test_data[ind][0]), '\tground truth label:', test_data[ind][1][-1]
    print

Some incorrect binary predictions:

If there 's a way to effectively teach kids about the dangers -1 of drugs -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 , I think it 's in projects like the -LRB- unfortunately R-rated -RRB- -1 -1 -1 -1 Paid -1 -1 -1 -1 -1 -1 -1 . -1 -1 -1 -1 	ground truth label: 3

No. . -1 	ground truth label: 1

The title Trapped -1 -1 turns out -1 to be a pretty fair -1 description -1 -1 of how you feel while you 're watching this OOV thriller -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 1

Happily for Mr. Chin -1 -1 -1 -- though unhappily for his subjects -1 -1 -1 -1 -- -1 -1 the invisible hand -1 -1 of the marketplace -1 -1 -1 wrote a script -1 -1 that no human screenwriter -1 -1 could have hoped to match -1 -1 -1 -1 -1 -1 -1 . -1 -1 -1 -1 	ground truth label: 3

Somewhere short -1 of Tremors on the modern OOV -1 -1 : -1 neither as funny -1 -1 nor -1 as clever -1 -1 -1 -1 -1 -1 , -1 though -1 -1 an agreeably -1 unpretentious way -1 to spend nine

# LSTM + ATTN 2

Num of params: 

EMB = 300, CELL = 200, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 5 x CELL + 2 x ATTN x CELL + ATTN = 741,100

In [6]:
model_dir = 'model/lstm_attn_cell200_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 200,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = LSTMAttnModel(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn = test(test_md)

Start testing
test finish
Root Metrics:
 [[ 97  66  10   2   2]
 [160 439 170  58  13]
 [  9  59  88  49   8]
 [ 12  66 114 320 186]
 [  1   3   7  81 190]]
Overall Metrics:
 [[  892   786   193    25     8]
 [  969  6722  3932   575    73]
 [   87  1236 48295  1622    79]
 [   55   497  3930  7743  1399]
 [    5    14   198  1033  2232]]
Root Binary Metrics:
 [[806  95]
 [106 814]]
Overall Binary Metrics:
 [[10253  1157]
 [ 1010 13632]]
mean loss: 18.475058, root_acc: 0.513122, overall_acc: 0.797627, root_binary_acc: 0.889621, overall_binary_acc: 0.916820


# LSTM + ATTN 3

Num of params: 

EMB = 300, CELL = 300, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 5 x CELL + 2 x ATTN x CELL + ATTN = 1,411,600

In [6]:
model_dir = 'model/lstm_attn_cell300_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 300,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = LSTMAttnModel(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn = test(test_md)

Start testing
test finish
Root Metrics:
 [[101  69  11   0   1]
 [151 409 148  50   7]
 [ 13  70  88  32   6]
 [ 13  83 138 377 230]
 [  1   2   4  51 155]]
Overall Metrics:
 [[  900   751   154     8     3]
 [  928  6444  3327   449    46]
 [   96  1385 48177  1472    57]
 [   80   666  4788  8421  1791]
 [    4     9   102   648  1894]]
Root Binary Metrics:
 [[770  74]
 [142 835]]
Overall Binary Metrics:
 [[10016   878]
 [ 1247 13911]]
mean loss: 18.070416, root_acc: 0.511312, overall_acc: 0.797046, root_binary_acc: 0.881384, overall_binary_acc: 0.918432


# LSTM + ATTN + Layer2 (CELL = 200)

Num of params: 

EMB = 300, CELL = 200, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 19 x CELL x CELL + 10 x CELL + 4 x ATTN x CELL + 2 x ATTN = 1,542,200
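The two-layer formula can be checked the same way. The text does not say which component each term (such as 19 x CELL x CELL) belongs to. This hypothetical helper therefore only transcribes the formula term by term and does not attribute terms to components.

```python
def lstm_attn2_params(emb, cell, attn):
    """Parameter count of LSTM + ATTN + Layer2, transcribed from the formula above."""
    return ((emb + 2 * cell) * 5 * cell
            + 19 * cell * cell
            + 10 * cell
            + 4 * attn * cell
            + 2 * attn)


for cell in (200, 300):
    print(cell, f"{lstm_attn2_params(300, cell, 100):,}")
# 200 1,542,200
# 300 3,183,200
```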

In [6]:
model_dir = 'model/lstm_attn2_cell200_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 200,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'drop_fw_hs': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = LSTMAttn2Model(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn1, attn2 = test(test_md, layers=2)

Start testing
test finish
Root Metrics:
 [[ 76  45   5   0   0]
 [178 464 180  66  11]
 [  6  54  64  31   5]
 [ 19  70 139 380 280]
 [  0   0   1  33 103]]
Overall Metrics:
 [[  761   502    72     0     0]
 [ 1120  7044  3896   592    73]
 [   70  1180 49383  1656    69]
 [   56   528  3179  8367  2064]
 [    1     1    18   383  1585]]
Root Binary Metrics:
 [[807  94]
 [105 815]]
Overall Binary Metrics:
 [[10325  1099]
 [  938 13690]]
mean loss: 16.465627, root_acc: 0.491855, overall_acc: 0.812833, root_binary_acc: 0.890719, overall_binary_acc: 0.921810


# LSTM + ATTN + Layer2 (CELL = 300)

Num of params: 

EMB = 300, CELL = 300, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 19 x CELL x CELL + 10 x CELL + 4 x ATTN x CELL + 2 x ATTN = 3,183,200

In [6]:
model_dir = 'model/lstm_attn2_cell300_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 300,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'drop_fw_hs': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = LSTMAttn2Model(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn1, attn2 = test(test_md, layers=2)

Start testing
test finish
Root Metrics:
 [[138 110  24   1   2]
 [116 378 152  56   9]
 [ 17 101 105  68  17]
 [  8  41  99 268 122]
 [  0   3   9 117 249]]
Overall Metrics:
 [[ 1107  1153   231    21    11]
 [  728  5965  2826   451    52]
 [  135  1798 50803  2105   123]
 [   31   319  2539  7004  1059]
 [    7    20   149  1417  2546]]
Root Binary Metrics:
 [[822 103]
 [ 90 806]]
Overall Binary Metrics:
 [[10214  1004]
 [ 1049 13785]]
mean loss: 16.649294, root_acc: 0.514932, overall_acc: 0.816283, root_binary_acc: 0.894014, overall_binary_acc: 0.921196


# RNN + ATTN 
Num of params: 

EMB = 300, CELL = 400, ATTN = 100

(EMB + 2 x CELL) x CELL + 2 x CELL x CELL + CELL + CELL x ATTN + ATTN = 800,500
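A term-by-term transcription of the RNN + ATTN formula (the helper name is hypothetical). Unlike the LSTM formulas, it has no 5x gate multiplier. For comparison, the results table lists 741,100 parameters for LSTM + ATTN with CELL = 200.

```python
def rnn_attn_params(emb, cell, attn):
    """Parameter count of RNN + ATTN, transcribed term by term from the formula above."""
    return ((emb + 2 * cell) * cell
            + 2 * cell * cell
            + cell
            + cell * attn
            + attn)


print(f"{rnn_attn_params(300, 400, 100):,}")  # 800,500
```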

In [6]:
model_dir = 'model/rnn_attn_cell400_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 400,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = RNNAttnModel(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn = test(test_md)

Start testing
test finish
Root Metrics:
 [[ 91  70  20   4   2]
 [157 389 157  81  20]
 [ 15  62  69  34  12]
 [ 16 111 139 351 263]
 [  0   1   4  40 102]]
Overall Metrics:
 [[  617   530  1434    54    66]
 [ 1135  6427  6343  1164   137]
 [   88  1148 41865  1124    88]
 [  131  1047  5669  8046  2116]
 [   37   103  1237   610  1384]]
Root Binary Metrics:
 [[766 129]
 [146 780]]
Overall Binary Metrics:
 [[ 9451  1885]
 [ 1812 12904]]
mean loss: 27.970765, root_acc: 0.453394, overall_acc: 0.706283, root_binary_acc: 0.848984, overall_binary_acc: 0.858092


In [7]:
# Analysis of some correct predictions

print 'Some correct binary predictions:\n'
for _ in range(5):
    ind = random.choice(right_ind)
    print ind2str(test_data[ind][0]), '\tground truth label:', test_data[ind][1][-1]
    print

Some correct binary predictions:

I loved it -1 ! -1 -1 	ground truth label: 4

McKay deflates his piece -1 of puffery with a sour cliche -1 -1 -1 -1 and -1 heavy doses -1 of mean-spiritedness -1 -1 -1 -1 -1 -1 -1 	ground truth label: 1

This is a throwaway , junk-food movie -1 -1 -1 -1 whose rap -1 soundtrack -1 was better -1 tended to than the film -1 -1 itself -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 0

Smith 's -1 approach -1 is never -1 to tease -1 -1 , -1 except gently -1 and -1 in that way -1 that makes us consider our own eccentricities -1 -1 -1 -1 and -1 how they are expressed through our homes -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 3

Maybe I found the proceedings -1 a little bit -1 -1 too conventional -1 -1 -1 -1 . -1 -1 -1 	ground truth label: 1



In [14]:
k = 12 # an arbitrarily chosen correct prediction, analysed in detail
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

print 'Attention vector of root:'
print attn[i][-1]
print

print 'Indices of top 5 important nodes:'
top_ind = np.argsort(attn[i][-1])[::-1][:5]
print top_ind
print

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

The movie 's -1 -1 ripe , enrapturing -1 -1 beauty -1 -1 will tempt those willing to probe its inscrutable mysteries -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 3

Attention vector of root:
[ 0.00472508  0.01236391  0.00592726  0.00581668  0.00370169  0.02663458
  0.00336349  0.11336799  0.0454242   0.04113133  0.07280442  0.0481532
  0.04172279  0.00927238  0.08192968  0.00308198  0.02412561  0.00369212
  0.00277828  0.00409471  0.13038668  0.02992338  0.00652305  0.00495997
  0.00332622  0.00362642  0.00552763  0.00367674  0.0142454   0.01965987
  0.058197    0.05986447  0.        ]

Indices of top 5 important nodes:
[20  7 14 10 31]

Top 5 important sub-strings:
inscrutable
enrapturing
tempt
beauty
will tempt those willing to probe its inscrutable mysteries         . 


In [15]:
k = 24 # an arbitrarily chosen correct prediction, analysed in detail
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

print 'Attention vector of root:'
print attn[i][-1]
print

print 'Indices of top 5 important nodes:'
top_ind = np.argsort(attn[i][-1])[::-1][:5]
print top_ind
print

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

The story -1 loses its bite -1 -1 in a last-minute happy ending -1 -1 -1 that 's even less plausible -1 -1 than the rest -1 of the picture -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 0

Attention vector of root:
[ 0.00454059  0.00915212  0.0042886   0.05790763  0.00304922  0.04729232
  0.00421499  0.00873589  0.00848241  0.00367327  0.06181822  0.05534934
  0.00308452  0.04477623  0.08128538  0.05717977  0.00304677  0.00617294
  0.00956263  0.03983257  0.03761867  0.01154378  0.0208891   0.01960257
  0.00449949  0.03010256  0.00673123  0.00268357  0.00449949  0.01133271
  0.01172593  0.0047576   0.00579599  0.0067781   0.00949739  0.0129954
  0.00828245  0.03789391  0.00960336  0.01505514  0.05788542  0.09494645
  0.        ]

Indices of top 5 important nodes:
[41 14 10  3 40]

Top 5 important sub-strings:
loses its bite   in a last-minute happy ending    that 's even less plausible   than the rest  of the picture           . 
last-minute happy ending  
last-minute
lose

In [16]:
k = 48 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

print 'Attention vector of root:'
print attn[i][-1]
print

print 'Indices of top 5 important nodes:'
top_ind = np.argsort(attn[i][-1])[::-1][:5]
print top_ind
print

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

Gollum 's -1 ` performance ' -1 -1 -1 is incredible -1 ! -1 -1 	ground truth label: 4

Attention vector of root:
[ 0.12941813  0.01561992  0.00880938  0.025031    0.0617275   0.02064649
  0.04642288  0.03932489  0.00750063  0.02774767  0.06765139  0.08335026
  0.16085345  0.11467416  0.        ]

Indices of top 5 important nodes:
[12  0 13 11 10]

Top 5 important sub-strings:
!
Gollum
is incredible  ! 
is incredible 
incredible


In [17]:
# Analysis of some incorrect predictions

print 'Some incorrect binary predictions:\n'
for _ in range(5):
    ind = random.choice(wrong_ind)
    print ind2str(test_data[ind][0]), '\tground truth label:', test_data[ind][1][-1]
    print

Some incorrect binary predictions:

Watching Beanie -1 and -1 his gang -1 -1 put together -1 his slasher video -1 -1 -1 from spare parts -1 -1 -1 and -1 borrowed materials is as much fun -1 -1 -1 as it must have been for them to make it -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 3

What Full Frontal -1 lacks in thematic coherence -1 -1 -1 -1 -1 it largely makes up -1 for as loosey-goosey , experimental entertainment -1 -1 -1 -1 -1 -1 . -1 -1 -1 -1 	ground truth label: 3

The two leads -1 -1 are almost -1 good enough -1 to camouflage the dopey plot -1 -1 -1 -1 -1 -1 -1 , -1 but -1 so much naturalistic small talk -1 -1 -1 -1 , -1 delivered in almost muffled -1 exchanges -1 -1 -1 -1 , -1 eventually has a lulling effect -1 -1 -1 -1 -1 -1 . -1 	ground truth label: 1

The tone -1 shifts abruptly -1 from tense -1 -1 to celebratory to soppy -1 -1 -1 -1 . -1 -1 	ground truth label: 1

Blessed with immense physical prowess -1 -1 -1 -1 he may well -1 be -1 -1 -1 , -1 but 

# RNN + ATTN + 2Layers

Num of params:

EMB = 300, CELL = 400, ATTN = 150

(EMB + 2 x CELL) x CELL + 7 x CELL x CELL + 2 x CELL + CELL x ATTN x 4 + ATTN x 2 = 1,801,100
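The function below is a quick check of the arithmetic; the function name is ours, not from the model code. It evaluates the formula above term by term. With EMB = 300, CELL = 400 and ATTN = 150 it gives 1,801,100, which matches the count in the results table. The comments only restate each term and say nothing about which weight matrix a term belongs to.

```python
def rnn_attn2_params(emb, cell, attn):
    """Evaluate the RNN + ATTN + 2Layers parameter-count formula term by term."""
    return ((emb + 2 * cell) * cell   # (EMB + 2 x CELL) x CELL
            + 7 * cell * cell         # 7 x CELL x CELL
            + 2 * cell                # 2 x CELL
            + cell * attn * 4         # CELL x ATTN x 4
            + attn * 2)               # ATTN x 2


n = rnn_attn2_params(emb=300, cell=400, attn=150)
print("RNN + ATTN + 2Layers: %d parameters" % n)  # 1801100
```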

In [6]:
model_dir = 'model/rnn_attn2_cell400_attn150/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 400,
               'attn_size': 150,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'drop_fw_hs': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = RNNAttn2Model(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn1, attn2 = test(test_md, layers=2)

Start testing
test finish
Root Metrics:
 [[ 84  58  19   3   2]
 [168 430 174  91  26]
 [ 14  74  79  44  16]
 [ 13  62 112 320 226]
 [  0   9   5  52 129]]
Overall Metrics:
 [[  685   664   699   217    57]
 [ 1111  6460  7864  1620   182]
 [  102  1315 41943  1663   152]
 [  103   772  5660  6812  1738]
 [    7    44   382   686  1662]]
Root Binary Metrics:
 [[812 142]
 [100 767]]
Overall Binary Metrics:
 [[ 9967  2965]
 [ 1296 11824]]
mean loss: 28.911734, root_acc: 0.471493, overall_acc: 0.696877, root_binary_acc: 0.867106, overall_binary_acc: 0.836442
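The accuracies in the last line follow directly from the confusion matrices: accuracy is the diagonal total divided by the grand total. The sketch below recomputes `root_acc` and `root_binary_acc` from the printed root matrices; the helper names are ours.

It also rests on an observation from the printed numbers, not from the `test()` code. For every model in this section, the column sums of the root 5-class matrix are 279, 633, 389, 510 and 399. So columns appear to index the ground-truth class and rows the prediction. The binary column sums here are 912 and 909, and 912 + 909 = 1,821 = 2,210 - 389. That is consistent with neutral sentences being excluded from the binary metrics. Per-class recall below is computed under this assumption.

```python
import numpy as np

# Root-level matrices printed above for RNN + ATTN + 2Layers.
# Assumption (inferred from the column sums, not from the test() code):
# columns index the ground-truth class, rows the predicted class.
root_cm = np.array([
    [84, 58, 19, 3, 2],
    [168, 430, 174, 91, 26],
    [14, 74, 79, 44, 16],
    [13, 62, 112, 320, 226],
    [0, 9, 5, 52, 129],
])
root_binary_cm = np.array([
    [812, 142],
    [100, 767],
])


def accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    return float(np.trace(cm)) / cm.sum()


def per_class_recall(cm):
    """Diagonal divided by column sums, i.e. recall if columns are ground truth."""
    return np.diag(cm) / cm.sum(axis=0).astype(float)


print("root_acc: %.6f" % accuracy(root_cm))                # 0.471493
print("root_binary_acc: %.6f" % accuracy(root_binary_cm))  # 0.867106
print(np.round(per_class_recall(root_cm), 3))
```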


In [7]:
# Analysis of some correct predictions

print 'Some correct binary predictions:\n'
for _ in range(5):
    ind = random.choice(right_ind)
    print ind2str(test_data[ind][0]), '\tground truth label:', test_data[ind][1][-1]
    print

Some correct binary predictions:

Obvious politics -1 and -1 rudimentary animation -1 -1 reduce the chances -1 that the appeal -1 of Hey Arnold -1 -1 -1 -1 -1 -1 ! -1 -1 	ground truth label: 1

Just when the movie -1 seems confident enough -1 to handle subtlety -1 -1 -1 -1 -1 -1 -1 , it dives into soapy bathos -1 -1 -1 . -1 -1 -1 -1 	ground truth label: 1

It 's neither as romantic -1 nor -1 as thrilling -1 as it should be -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 0

Succeeds in providing a disquiet world -1 -1 -1 the OOV completion -1 -1 of the Police Academy series -1 -1 -1 -1 -1 -1 -1 -1 . -1 	ground truth label: 3

OOV ridiculous -1 , OOV noisy -1 . -1 -1 -1 	ground truth label: 1



In [8]:
k = 12 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

print 'Indices of top 5 important nodes (from layer 1):'
top_ind1 = np.argsort(attn1[i][-1])[::-1][:5]
print top_ind1
print 'Indices of top 5 important nodes (from layer 2):'
top_ind2 = np.argsort(attn2[i][-1])[::-1][:5]
print top_ind2
print

print 'Top 5 important sub-strings (from layer 1):'
for ind in top_ind1:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])
print 'Top 5 important sub-strings (from layer 2):'
for ind in top_ind2:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

A thoughtful , provocative , insistently humanizing -1 -1 -1 -1 -1 film -1 -1 . -1 	ground truth label: 4

Indices of top 5 important nodes (from layer 1):
[13 11  9  0  1]
Indices of top 5 important nodes (from layer 2):
[13  5  1  9  3]

Top 5 important sub-strings (from layer 1):
thoughtful , provocative , insistently humanizing film 
thoughtful , provocative , insistently humanizing 
provocative , insistently humanizing 
A
thoughtful
Top 5 important sub-strings (from layer 2):
thoughtful , provocative , insistently humanizing film 
insistently
thoughtful
provocative , insistently humanizing 
provocative


In [9]:
k = 24 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

print 'Indices of top 5 important nodes (from layer 1):'
top_ind1 = np.argsort(attn1[i][-1])[::-1][:5]
print top_ind1
print 'Indices of top 5 important nodes (from layer 2):'
top_ind2 = np.argsort(attn2[i][-1])[::-1][:5]
print top_ind2
print

print 'Top 5 important sub-strings (from layer 1):'
for ind in top_ind1:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])
print 'Top 5 important sub-strings (from layer 2):'
for ind in top_ind2:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

Cantet perfectly captures the hotel lobbies -1 -1 , -1 two-lane highways -1 -1 , -1 and -1 roadside cafes -1 that permeate Vincent 's -1 days -1 -1 -1 -1 -1 -1 -1 -1 	ground truth label: 3

Indices of top 5 important nodes (from layer 1):
[32 17 28 27 31]
Indices of top 5 important nodes (from layer 2):
[ 1 18 33 10 11]

Top 5 important sub-strings (from layer 1):
captures the hotel lobbies , two-lane highways , and roadside cafes that permeate Vincent 's days 
the hotel lobbies , two-lane highways , and 
permeate Vincent 's days 
Vincent 's days 
the hotel lobbies , two-lane highways , and roadside cafes that permeate Vincent 's days 
Top 5 important sub-strings (from layer 2):
perfectly
roadside
perfectly captures the hotel lobbies , two-lane highways , and roadside cafes that permeate Vincent 's days 
two-lane
highways


In [10]:
k = 128 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

print 'Indices of top 5 important nodes (from layer 1):'
top_ind1 = np.argsort(attn1[i][-1])[::-1][:5]
print top_ind1
print 'Indices of top 5 important nodes (from layer 2):'
top_ind2 = np.argsort(attn2[i][-1])[::-1][:5]
print top_ind2
print

print 'Top 5 important sub-strings (from layer 1):'
for ind in top_ind1:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])
print 'Top 5 important sub-strings (from layer 2):'
for ind in top_ind2:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

It 's traditional -1 moviemaking all the way -1 -1 -1 -1 -1 , -1 but -1 it 's done with a lot -1 of careful period attention -1 -1 as well as -1 -1 -1 some very welcome wit -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 	ground truth label: 4

Indices of top 5 important nodes (from layer 1):
[16 50 38 46  3]
Indices of top 5 important nodes (from layer 2):
[14 11 29 43 25]

Top 5 important sub-strings (from layer 1):
It 's traditional moviemaking all the way , but 
It 's traditional moviemaking all the way , but it 's done with a lot of careful period attention as well as some very welcome wit 
welcome
with a lot of careful period attention as well as some very welcome wit 
's traditional 
Top 5 important sub-strings (from layer 2):
It 's traditional moviemaking all the way , 
's traditional moviemaking all the way 
careful period attention 
careful period attention as well as some very welcome wit 
careful


In [11]:
# Analysis of some incorrect predictions

print 'Some incorrect binary predictions:\n'
for _ in range(5):
    ind = random.choice(wrong_ind)
    print ind2str(test_data[ind][0]), '\tground truth label:', test_data[ind][1][-1]
    print

Some incorrect binary predictions:

The story -1 is familiar from its many predecessors -1 -1 -1 -1 -1 -1 ; -1 like them -1 , it eventually culminates in the OOV -1 - -1 stunning insight -1 that crime does n't -1 pay -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 	ground truth label: 1

It is far from the worst -1 , -1 thanks to the topical issues -1 -1 it raises , -1 the performances -1 of Stewart and -1 Hardy -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 , and -1 that essential feature -1 -- -1 a decent full-on space battle -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 3

It 's a great deal -1 -1 of sizzle and -1 very little -1 -1 steak -1 -1 -1 -1 . -1 -1 	ground truth label: 1

The story -1 is familiar from its many predecessors -1 -1 -1 -1 -1 -1 ; -1 like them -1 , it eventually culminates in the OOV -1 - -1 stunning insight -1 that crime does n't -1 pay -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 	ground truth label: 1

Based on a David Leavitt story -1 -1 -1 -1 -1 , the film -1 shares that 

# GRU v1 + ATTN

Num of params: 

EMB = 300, CELL = 150, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 5 x CELL + 2 x ATTN x CELL + ATTN = 480,850
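The same kind of check works for the GRU v1 + ATTN formula. Below it is evaluated at both cell sizes used in this section; the helper name is ours. It reproduces 480,850 and 2,282,100. It also shows that the (EMB + 2 x CELL) x 5 x CELL term dominates: it accounts for 450,000 of the 480,850 parameters at CELL = 150.

```python
def gru_v1_attn_params(emb, cell, attn):
    """Evaluate the GRU v1 + ATTN parameter-count formula term by term."""
    return ((emb + 2 * cell) * 5 * cell  # (EMB + 2 x CELL) x 5 x CELL
            + 5 * cell                   # 5 x CELL
            + 2 * attn * cell            # 2 x ATTN x CELL
            + attn)                      # ATTN


for cell in (150, 400):
    print("CELL=%d: %d parameters" % (cell, gru_v1_attn_params(300, cell, 100)))
```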

In [6]:
model_dir = 'model/gru_attn_cell150_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 150,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = GRUAttnModel(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn = test(test_md)

Start testing
test finish
Root Metrics:
 [[122  93  16   1   2]
 [136 407 160  77  18]
 [ 10  56  92  50  10]
 [ 10  65 111 285 146]
 [  1  12  10  97 223]]
Overall Metrics:
 [[  970   918   237    26     9]
 [  894  6684  4366   691    85]
 [   89  1201 48175  1662    98]
 [   43   409  3615  7713  1432]
 [   12    43   155   906  2167]]
Root Binary Metrics:
 [[811 129]
 [101 780]]
Overall Binary Metrics:
 [[10359  1377]
 [  904 13412]]
mean loss: 18.100765, root_acc: 0.510860, overall_acc: 0.795508, root_binary_acc: 0.873696, overall_binary_acc: 0.912444


In [7]:
k = 12 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

top_ind = np.argsort(attn[i][-1])[::-1][:5]

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

Offers a breath -1 of the fresh air -1 -1 of true sophistication -1 -1 -1 -1 -1 -1 . -1 	ground truth label: 4

Top 5 important sub-strings:
breath
sophistication
Offers a breath of the fresh air of true sophistication 
a breath of the fresh air of true sophistication 
Offers


In [9]:
k = 25 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

top_ind = np.argsort(attn[i][-1])[::-1][:5]

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

Though it is by no means -1 his best work -1 -1 -1 -1 -1 -1 -1 , OOV is a distinguished and -1 distinctive -1 effort -1 -1 by a bona-fide master -1 -1 , -1 a fascinating film -1 -1 replete with rewards to be had by all willing to make the effort to reap them -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 -1 -1 	ground truth label: 3

Top 5 important sub-strings:
reap
no means his best work 
it is by no means his best work 
is by no means his best work 
by no means his best work 


In [10]:
k = 48 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

top_ind = np.argsort(attn[i][-1])[::-1][:5]

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

A truly moving -1 experience -1 -1 , -1 and -1 a perfect example -1 -1 of how art -- when done right -1 -1 -- -1 -1 -1 can help heal -1 -1 -1 -1 -1 -1 -1 , -1 clarify , -1 and -1 comfort -1 . -1 -1 	ground truth label: 4

Top 5 important sub-strings:
clarify
when
help
heal
perfect example 


# GRU v1 + ATTN

Num of params: 

EMB = 300, CELL = 400, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 5 x CELL + 2 x ATTN x CELL + ATTN = 2,282,100

In [6]:
model_dir = 'model/gru_attn_cell400_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 400,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = GRUAttnModel(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn = test(test_md)

Start testing
test finish
Root Metrics:
 [[115  91  19   2   0]
 [147 428 157  72  13]
 [  8  54  87  37  12]
 [  8  55 118 316 172]
 [  1   5   8  83 202]]
Overall Metrics:
 [[  892   774   185    14     3]
 [  963  6558  3567   568    64]
 [  105  1498 48751  1751   100]
 [   43   410  3931  7947  1597]
 [    5    15   114   718  2027]]
Root Binary Metrics:
 [[825 106]
 [ 87 803]]
Overall Binary Metrics:
 [[10293  1142]
 [  970 13647]]
mean loss: 17.765202, root_acc: 0.519457, overall_acc: 0.801150, root_binary_acc: 0.894014, overall_binary_acc: 0.918931


# GRU v1 + ATTN + Layer2

Num of params: 

EMB = 300, CELL = 200, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 10 x CELL + 15 x CELL x CELL + 4 x ATTN x CELL + 2 x ATTN = 1,382,200
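The sketch below evaluates the GRU v1 + ATTN + Layer2 formula for the three cell sizes in this section; the helper name is ours. It gives 1,382,200, 2,823,200 and 4,764,200. The last two match the counts reported for CELL = 300 and CELL = 400. For CELL = 200, the formula comes out 18,000 lower than the 1,400,200 listed in the results table. One of the two should be rechecked against the actual model variables.

```python
def gru_v1_attn2_params(emb, cell, attn):
    """Evaluate the GRU v1 + ATTN + Layer2 parameter-count formula term by term."""
    return ((emb + 2 * cell) * 5 * cell  # (EMB + 2 x CELL) x 5 x CELL
            + 10 * cell                  # 10 x CELL
            + 15 * cell * cell           # 15 x CELL x CELL
            + 4 * attn * cell            # 4 x ATTN x CELL
            + 2 * attn)                  # 2 x ATTN


for cell in (200, 300, 400):
    print("CELL=%d: %d parameters" % (cell, gru_v1_attn2_params(300, cell, 100)))
```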

In [6]:
model_dir = 'model/gru_attn2_cell200_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 200,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'drop_fw_hs': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = GRUAttn2Model(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn1, attn2 = test(test_md, layers=2)

Start testing
test finish
Root Metrics:
 [[163 186  37   9   3]
 [ 84 268 116  41   6]
 [ 23 127 127  84  23]
 [  5  39  89 215  91]
 [  4  13  20 161 276]]
Overall Metrics:
 [[ 1326  2276  2088   274    30]
 [  494  5316  5009   621    40]
 [  119  1127 44671  1431   116]
 [   38   377  3207  5993   717]
 [   31   159  1573  2679  2888]]
Root Binary Metrics:
 [[799 101]
 [113 808]]
Overall Binary Metrics:
 [[10280  1679]
 [  983 13110]]
mean loss: 26.253259, root_acc: 0.474661, overall_acc: 0.728741, root_binary_acc: 0.882482, overall_binary_acc: 0.897820


# GRU v1 + ATTN + Layer2

Num of params: 

EMB = 300, CELL = 300, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 10 x CELL + 15 x CELL x CELL + 4 x ATTN x CELL + 2 x ATTN = 2,823,200

In [6]:
model_dir = 'model/gru_attn2_cell300_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 300,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'drop_fw_hs': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = GRUAttn2Model(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn1, attn2 = test(test_md, layers=2)

Start testing
test finish
Root Metrics:
 [[108  83  24   7   3]
 [152 450 197 104  19]
 [  5  23  39  17   5]
 [ 13  73 121 328 224]
 [  1   4   8  54 148]]
Overall Metrics:
 [[ 1040  1302  1054   219    64]
 [  840  6611  6313  1276   137]
 [   41   592 43713  1057    59]
 [   68   626  4498  7505  1719]
 [   19   124   970   941  1812]]
Root Binary Metrics:
 [[806 140]
 [106 769]]
Overall Binary Metrics:
 [[10133  2254]
 [ 1130 12535]]
mean loss: 26.040727, root_acc: 0.485520, overall_acc: 0.734637, root_binary_acc: 0.864909, overall_binary_acc: 0.870106


# GRU v1 + ATTN + Layer2

Num of params: 

EMB = 300, CELL = 400, ATTN = 100

(EMB + 2 x CELL) x 5 x CELL + 10 x CELL + 15 x CELL x CELL + 4 x ATTN x CELL + 2 x ATTN = 4,764,200

In [6]:
model_dir = 'model/gru_attn2_cell400_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 400,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'drop_fw_hs': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = GRUAttn2Model(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn1, attn2 = test(test_md, layers=2)

Start testing
test finish
Root Metrics:
 [[ 84  61  10   0   1]
 [181 474 192 106  22]
 [ 10  65  96  71  26]
 [  4  33  89 297 225]
 [  0   0   2  36 125]]
Overall Metrics:
 [[  764   654   724   119    33]
 [ 1111  7132  5213   933   119]
 [   88  1088 46324  1510   139]
 [   44   357  3684  7778  1904]
 [    1    24   603   658  1596]]
Root Binary Metrics:
 [[854 187]
 [ 58 722]]
Overall Binary Metrics:
 [[10616  2178]
 [  647 12611]]
mean loss: 22.463824, root_acc: 0.486878, overall_acc: 0.769903, root_binary_acc: 0.865459, overall_binary_acc: 0.891563


# GRU v2 + ATTN 

Num of params: 

EMB = 300, CELL = 400, ATTN = 100

(EMB + 2 x CELL) x 4 x CELL + 4 x CELL + 2 x ATTN x CELL + ATTN = 1,841,700
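The two GRU formulas differ only in their multipliers. GRU v2 uses 4 where GRU v1 uses 5 on the (EMB + 2 x CELL) x CELL and CELL terms. At the same sizes, the gap is therefore exactly (EMB + 2 x CELL) x CELL + CELL. The sketch below evaluates both formulas at EMB = 300, CELL = 400, ATTN = 100; the helper names are ours.

```python
def gru_v2_attn_params(emb, cell, attn):
    """Evaluate the GRU v2 + ATTN parameter-count formula term by term."""
    return ((emb + 2 * cell) * 4 * cell  # (EMB + 2 x CELL) x 4 x CELL
            + 4 * cell                   # 4 x CELL
            + 2 * attn * cell            # 2 x ATTN x CELL
            + attn)                      # ATTN


def gru_v1_attn_params(emb, cell, attn):
    """GRU v1 + ATTN formula from the earlier section, for comparison."""
    return (emb + 2 * cell) * 5 * cell + 5 * cell + 2 * attn * cell + attn


v2 = gru_v2_attn_params(300, 400, 100)
v1 = gru_v1_attn_params(300, 400, 100)
print("v2: %d, v1: %d, difference: %d" % (v2, v1, v1 - v2))  # 1841700, 2282100, 440400
```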

In [6]:
model_dir = 'model/gru2_attn_cell400_attn100/'
config_dict = {'embed_size': 300,
               'fw_cell_size': 400,
               'attn_size': 100,
               'wv_emb_file': 'tmp/embeddings.pkl',
               'wv_dict': '_data/dict.pkl',
               'wv_vocab_size': 20726,
               'mask_type': 'subtree_mask',
               'drop_embed': False,
               'drop_weight': False,
               'class_size': 5,
               'rec_keep_prob': 0.0,
               'output_keep_prob': 0.0,
               'L2_lambda': 0.0
               }

test_md = GRU2AttnModel(config_dict)
test_md.is_training = False
test_md.add_variables(reuse=False)
test_md.build_model(left, right, wv, label, is_leaf, l, mask)

tf.train.Saver().restore(sess, model_dir)

right_ind, wrong_ind, attn = test(test_md)

Start testing
test finish
Root Metrics:
 [[151 127  28   7   3]
 [ 95 357 128  55   9]
 [ 19  76  95  49  13]
 [ 12  65 128 335 205]
 [  2   8  10  64 169]]
Overall Metrics:
 [[ 1100  1354  1461   131    30]
 [  716  6017  4284   542    48]
 [  100  1280 44969  1438   104]
 [   74   542  4541  8009  1669]
 [   18    62  1293   878  1940]]
Root Binary Metrics:
 [[797  94]
 [115 815]]
Overall Binary Metrics:
 [[10064  1341]
 [ 1199 13448]]
mean loss: 23.230562, root_acc: 0.500905, overall_acc: 0.751029, root_binary_acc: 0.885228, overall_binary_acc: 0.902503


In [8]:
k = 10 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

top_ind = np.argsort(attn[i][-1])[::-1][:5]

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

An utterly -1 compelling ` -1 who wrote it -1 ' -1 in which the reputation -1 of the most famous -1 author -1 -1 who ever lived -1 -1 -1 -1 -1 comes into question -1 -1 -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 3

Top 5 important sub-strings:
compelling ` who wrote it ' in which the reputation of the most famous author who ever lived comes into question . 
compelling ` who wrote it ' in which the reputation of the most famous author who ever lived comes into question 
the most famous author who ever lived 
An utterly 
the reputation of the most famous author who ever lived comes into question 


In [9]:
k = 20 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

top_ind = np.argsort(attn[i][-1])[::-1][:5]

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

At about 95 -1 minutes -1 -1 , Treasure Planet -1 maintains a brisk pace -1 -1 as it races -1 -1 -1 -1 through the familiar story -1 -1 -1 -1 . -1 -1 -1 -1 	ground truth label: 3

Top 5 important sub-strings:
maintains a brisk pace as it races through the familiar story . 
maintains a brisk pace as it races through the familiar story 
a brisk pace as it races 
Treasure Planet maintains a brisk pace as it races through the familiar story . 
brisk pace 


In [10]:
k = 30 # pick a correct prediction (fixed index k) for detailed analysis
i = right_ind[k]
print ind2str(test_data[i][0]), '\tground truth label:', test_data[i][1][-1]
print

top_ind = np.argsort(attn[i][-1])[::-1][:5]

print 'Top 5 important sub-strings:'
for ind in top_ind:
    print nodeid2str(ind, test_data[i][0], test_data[i][2], test_data[i][3], test_data[i][4])

Fuller would surely -1 have called this gutsy -1 and -1 at times exhilarating -1 movie -1 -1 -1 a great yarn -1 -1 -1 -1 -1 -1 . -1 -1 	ground truth label: 4

Top 5 important sub-strings:
would surely have called this gutsy and at times exhilarating movie a great yarn . 
would surely have called this gutsy and at times exhilarating movie a great yarn 
this gutsy and at times exhilarating movie 
Fuller
this gutsy and at times exhilarating movie a great yarn 
