# Plumbing
1. Download and unpack `sentence polarity dataset v1.0` from http://www.cs.cornell.edu/people/pabo/movie-review-data/
2. Download BNC (TODO)
3. Download the EasyCCG parser from http://homepages.inf.ed.ac.uk/s1049478/easyccg.html and unpack the package (you should get a directory like `easyccg-0.2`). From the same page, download the regular pretrained model (`model.tar.gz`) and unpack it into the parser's directory.

# Getting the British National Corpus & the word list

We parse the BNC XML files with lxml. NLTK does ship a dedicated BNC reader, but it is extremely slow in lazy mode, and in non-lazy mode it is still very slow and also consumes more than 8 GB of memory.
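To make the extraction step concrete, here is a minimal sketch of collecting word forms from `w` and `c` elements. It uses the standard-library `xml.etree.ElementTree` as a stand-in for lxml, and both the XML snippet and the function name `collect_word_forms` are made up for illustration; the snippet only imitates BNC markup and is not an actual corpus file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating BNC markup: <s> is a sentence,
# <w> a word and <c> a punctuation mark.
SAMPLE_XML = """
<bncDoc>
  <s n="1"><w pos="ART">The </w><w pos="SUBST">cat </w><w pos="VERB">sleeps</w><c>.</c></s>
  <s n="2"><w pos="ART">The </w><w pos="SUBST">dog </w><w pos="VERB">barks</w><c>.</c></s>
</bncDoc>
"""

def collect_word_forms(xml_text):
    """Return the set of stripped word and punctuation forms in the document."""
    root = ET.fromstring(xml_text)
    forms = set()
    for element in root.iter():
        # Only <w> and <c> elements carry surface forms; skip empty ones.
        if element.tag in ('w', 'c') and element.text:
            forms.add(element.text.strip())
    return forms

sample_forms = collect_word_forms(SAMPLE_XML)
print(sorted(sample_forms))
```

Using a set means that repeated forms such as `The` are stored only once, which is what the word list needs.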

In [1]:
bnc_path = 'BNC/Texts/'
from os.path import exists

def bnc_files_iter():
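    # NOTE: neither list contains 'G' or 'Q', so any file whose name includes
    # one of these letters is silently skipped.
    top_level = ['A', 'B', 'C', 'D', 'E', 'F', 'H', 'I', 'J', 'K']
    symbols = top_level + ['L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'W', 'V', 'X', 'Y', 'Z',
                           '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']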
    for top in top_level:
        top_path = bnc_path + '/' + top
        if not exists(top_path):
            continue
        for symbol2 in symbols:
            path2 = top_path + '/' + top + symbol2
            if not exists(path2):
                continue
            for symbol3 in symbols:
                current_path = path2 + '/' + top + symbol2 + symbol3 + '.xml'
                if not exists(current_path):
                    continue
                yield open(current_path)

In [2]:
from lxml import etree

In [3]:
unique_words = set()

for bnc_file in bnc_files_iter():
    file_tree = etree.parse(bnc_file)
    for element in file_tree.iter():
        if (element.tag == 'w' or element.tag == 'c') and element.text:
            unique_words.add(element.text.strip())
    bnc_file.close()
    
unique_words = list(unique_words)
print(unique_words[:10])

['', 'telegrams/telexes/faxes/express', '91–100%', 'toques', 'rush-strewn', 'Angerer', 'Big-Endians', 'Sayed', 'WARMER', 'Brews']


In [4]:
unique_count = len(unique_words)
print(unique_count)

705241


# Getting CCG parse trees for BNC

In [5]:
# we will run the underlying parser with pexpect, and intercept its outputs from within Python
import pexpect
parser = pexpect.spawn('java -jar easyccg-0.2/easyccg.jar --model easyccg-0.2/model')
parser.expect('Model loaded, ready to parse.')
parser.send('The cat chases a ball of yarn.\n')
parser.expect('ID')
parser.expect(r'\n\(.*\n')
parser_output = parser.after.decode().strip() # decode from bytes into str, strip whitespace
print(parser_output)
parser.terminate()

(<T S[dcl] 1 2> (<T NP[nb] 0 2> (<L NP[nb]/N POS POS The NP[nb]/N>) (<L N POS POS cat N>) ) (<T S[dcl]\NP 0 2> (<L (S[dcl]\NP)/NP POS POS chases (S[dcl]\NP)/NP>) (<T NP[nb] 0 2> (<T NP[nb] 0 2> (<L NP[nb]/N POS POS a NP[nb]/N>) (<L N POS POS ball N>) ) (<T NP\NP 0 2> (<L (NP\NP)/NP POS POS of (NP\NP)/NP>) (<T NP 0 1> (<L N POS POS yarn. N>) ) ) ) ) )


False

Let's see how NLTK can handle parse trees.

In [6]:
import re
only_word = re.compile(r'<L\s\S+\sPOS\sPOS\s(\S+)\s\S+>')
concat_label = re.compile(r'<(\S+)\s(\S+)\s(\S+)\s(\S+)>')

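# Some string cleanup: reduce each leaf to its bare word, and join the fields of each node label
# with underscores (turning parentheses inside categories into brackets) so that NLTK can read it.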
def clean_parser_output(parse_output):
    return concat_label.sub(lambda match: '<'+match.group(1)+'_'+match.group(2).replace('(', '[').replace(')', ']')
                            +'_'+match.group(3)+'_'+match.group(4)+'>',
                            only_word.sub(lambda match: match.group(1), parse_output))

from nltk.tree import ParentedTree
tree = ParentedTree.fromstring(clean_parser_output(parser_output))
print(tree)

(<T_S[dcl]_1_2>
  (<T_NP[nb]_0_2> (The ) (cat ))
  (<T_S[dcl]\NP_0_2>
    (chases )
    (<T_NP[nb]_0_2>
      (<T_NP[nb]_0_2> (a ) (ball ))
      (<T_NP\NP_0_2> (of ) (<T_NP_0_1> (yarn. ))))))


In each `(parenthesized expression)`, the first item `(head)` is the category of the node, and the next two items are its child nodes.
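The bracketed layout can also be read without NLTK. The following is a minimal sketch, not the method used in this notebook: `read_tree` is a hypothetical helper, and the example string is a shortened, hand-written tree in the cleaned format shown above.

```python
import re

# A token is an opening paren, a closing paren, or a run of other non-space characters.
TOKEN = re.compile(r'\(|\)|[^\s()]+')

def read_tree(text):
    """Parse '(head child ...)' into nested (head, [children]) tuples."""
    tokens = TOKEN.findall(text)
    position = 0

    def read_node():
        nonlocal position
        assert tokens[position] == '('
        head = tokens[position + 1]
        position += 2
        children = []
        while tokens[position] != ')':
            children.append(read_node())
        position += 1  # consume the closing paren
        return head, children

    return read_node()

# Hand-written example in the cleaned format (not real parser output).
example = '(<T_S[dcl]_1_2> (<T_NP[nb]_0_2> (The ) (cat )) (sleeps ))'
head, children = read_tree(example)
print(head)
print([child_head for child_head, _ in children])
```

Leaves come out as nodes with an empty child list, which matches how the cleaned words such as `(The )` are represented.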

## Learning word embeddings

Our embedding procedure will be based on this Tensorflow [word2vec tutorial](https://www.tensorflow.org/tutorials/word2vec).

In [7]:
# Consistently map each unique word to an integer.
word_map = { word: index for index, word in enumerate(unique_words) }

In [8]:
# Collect all sentences from the corpus, with words as their indices in the word map.
corpus_sents = []

for bnc_file in bnc_files_iter():
    file_tree = etree.parse(bnc_file)
    for element in file_tree.iter():
        if element.tag == 's':
            corpus_sents.append([])
        if (element.tag == 'w' or element.tag == 'c') and element.text:
            corpus_sents[-1].append(word_map[element.text.strip()])
    bnc_file.close()

Generate batches of (target word, context word) pairs, where the target is the word at the centre of the window. For simplicity, we hardcode the window size (2 words on each side) and the number of examples per window (4).
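As an illustration of the pairing scheme only, here is a small pure-Python sketch that works on word strings rather than integer ids. The names `skipgram_pairs` and `BOUNDARY` are hypothetical; the boundary marker plays the role of the special token used when the window reaches past the start or end of a sentence.

```python
BOUNDARY = '<bound>'  # hypothetical stand-in for the boundary token

def skipgram_pairs(sentence, window=2):
    """Yield (target, context) pairs for every word in the sentence.

    For each offset 1..window the left neighbour is emitted first, then the
    right one; positions outside the sentence yield the boundary marker.
    """
    for position, target in enumerate(sentence):
        for offset in range(1, window + 1):
            left = position - offset
            right = position + offset
            yield target, (sentence[left] if left >= 0 else BOUNDARY)
            yield target, (sentence[right] if right < len(sentence) else BOUNDARY)

pairs = list(skipgram_pairs(['the', 'cat', 'sleeps']))
for pair in pairs:
    print(pair)
```

Each word contributes exactly `2 * window` pairs, so a fixed batch size can be filled with a whole number of target words.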

In [9]:
import numpy as np

In [10]:
from random import randint
from math import floor

vocabulary_size = len(unique_words) + 1 # add the boundary token
embedding_size = 128
batch_size = 128
# Number of sample correct word pairs to be shown to word2vec for one random target word.
num_samples = 4
assert num_samples % 2 == 0
assert batch_size % num_samples == 0
# We need a special token for cases when the target word is near the start or end of sentence.
bound_token_id = vocabulary_size - 1

corp_runs = 6
sent_step = 1 # we train 1/sent_step of all the sentences

def skipgram_batches():
    for run_n in range(corp_runs):
        sent_n = 0
        word_n = 0
        
        target_n = 0 # relative to the current batch
        
        batch = np.ndarray(shape=(batch_size), dtype=np.int32)
        labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
        
        while sent_n < len(corpus_sents):
            for j in range(num_samples):
                batch[target_n*num_samples+j] = corpus_sents[sent_n][word_n]
            # "Good" examples - words near the target (we will let TensorFlow randomize the "bad" ones)
            for j in range(num_samples // 2):
                labels[target_n*num_samples+j*2][0] = (corpus_sents[sent_n][word_n-j-1] if word_n-j-1 >= 0
                                                       else bound_token_id)
                labels[target_n*num_samples+j*2+1][0] = (corpus_sents[sent_n][word_n+j+1]
                                                         if word_n+j+1 < len(corpus_sents[sent_n])
                                                         else bound_token_id)
                
            target_n += 1
            if target_n == (batch_size // num_samples):
                yield batch, labels, False
                batch = np.ndarray(shape=(batch_size), dtype=np.int32)
                labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
                target_n = 0
                
            word_n += 1
            try:
                while word_n == len(corpus_sents[sent_n]):
                    word_n = 0
                    sent_n += sent_step
                    if (floor(sent_n / len(corpus_sents) * 10)
                        > floor((sent_n-sent_step) / len(corpus_sents) * 10)):
                        print('{}0%'.format(floor(sent_n / len(corpus_sents) * 10)), end=' ')
            except IndexError: # happens on the end of the corpus
                break
                
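        # Zero-fill only the unused tail of the last, partially filled batch
        # (each target word occupies num_samples consecutive slots).
        batch[target_n*num_samples:] = 0
        labels[target_n*num_samples:, :] = 0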
        yield batch, labels, (run_n == corp_runs - 1)

In [11]:
import tensorflow as tf
import math


In [12]:
tf.reset_default_graph()
with tf.device('/cpu:0'):
    # Model parameters: word embeddings and model weights & biases for each word.
    embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    nce_weights = tf.Variable(tf.truncated_normal([vocabulary_size, embedding_size],
                                                  stddev=1.0 / math.sqrt(embedding_size)))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

In [13]:
with tf.device('/cpu:0'):
    # The computation graph.
    inputs = tf.placeholder(tf.int32, shape=[batch_size])
    labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    embedding_layer = tf.nn.embedding_lookup(embeddings, inputs)
    # Note that word2vec has no "real" hidden layers apart from the embedding.
    
    # Number of random words to sample apart from the true target; the model should learn to
    # assign low probability to them given the context.
    negative_samples_n = batch_size
    
    loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weights,
                                         biases=nce_biases,
                                         labels=labels,
                                         inputs=embedding_layer,
                                         num_sampled=negative_samples_n,
                                         num_classes=vocabulary_size))
    # Vanilla SGD seems to work here better - since we train practically a different word vector
    # each time, decaying momentum hinders training of later vectors before they can even be shown
    # to the net, especially in the case of Adagrad's vanishing updates.
    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

In [14]:
import datetime

trained_embeddings = [] # we want to use them later
with tf.Session() as sess:
    print('Training start:', datetime.datetime.now())
    tf.global_variables_initializer().run()
    i = 0
    for batch_inputs, batch_labels, is_last in skipgram_batches():
        if is_last:
            _, loss_val, trained_embeddings = sess.run([optimizer, loss, embeddings], feed_dict={inputs: batch_inputs,
                                                             labels: batch_labels})
            print('Final loss:', loss_val)
            print('Training end:', datetime.datetime.now())
        else:
            _, loss_val = sess.run([optimizer, loss], feed_dict={inputs: batch_inputs,
                                                             labels: batch_labels})
            if (i % 250000 == 0):
                print('(loss: {})'.format(loss_val), end=' ')
        i += 1

Training start: 2017-12-10 02:20:30.649435
(loss: 756.727783203125) (loss: 0.83027184009552) 10% (loss: 11.908158302307129) 20% (loss: 1.0684494972229004) (loss: 13.583332061767578) 30% (loss: 3.6548843383789062) 40% (loss: 0.25112614035606384) 50% (loss: 1.2839581966400146) 60% (loss: 102.95244598388672) (loss: 2.302624225616455) 70% (loss: 0.3007936179637909) 80% (loss: 0.3345237970352173) 90% (loss: 0.7518746852874756) 100% (loss: 1.0946375131607056) 10% (loss: 0.7554669380187988) 20% (loss: 3.34639310836792) (loss: 0.5009531378746033) 30% (loss: 1.2865395545959473) 40% (loss: 0.3014761209487915) (loss: 0.8342760801315308) 50% (loss: 0.8358651399612427) 60% (loss: 1.373337745666504) 70% (loss: 0.6899360418319702) 80% (loss: 0.18796537816524506) 90% (loss: 1.2796738147735596) 100% (loss: 2.521474599838257) 10% (loss: 0.6453143358230591) (loss: 0.6291516423225403) 20% (loss: 1.1040339469909668) 30% (loss: 1.1350901126861572) 40% (loss: 2.5494096279144287) (loss: 0.5065284967422485) 50

In [15]:
def nearest_neighbor(word):
    dists = np.abs(trained_embeddings - trained_embeddings[word_map[word], ]).sum(axis=1)
    dists[word_map[word]] = 1e6
    return unique_words[dists.argmin(axis=0)]

In [16]:
print('Nearest word vectors for:')
print('cat:', nearest_neighbor('cat'))
print('doctor:', nearest_neighbor('doctor'))
print('cold:', nearest_neighbor('cold'))
print('blue:', nearest_neighbor('blue'))
print('red:', nearest_neighbor('red'))
print('walk:', nearest_neighbor('walk'))
print('bring:', nearest_neighbor('bring'))
print('is:', nearest_neighbor('is'))
print('Europe:', nearest_neighbor('europe'))  # note: the lookup key is lowercase 'europe', unlike the label

Nearest word vectors for:
cat: dog
doctor: nurse
cold: hot
blue: dark
red: yellow
walk: talk
bring: brought
is: was
Europe: clinic


In [27]:
np.savetxt('trained_embeddings.csv', trained_embeddings)

## Learning the transformation matrix

In [17]:
import numpy as np
import torch
from torch.autograd import Variable

In [129]:
enc_W = Variable(torch.randn(embedding_size*2, embedding_size), requires_grad=True)
enc_b = Variable(torch.zeros(1, embedding_size), requires_grad=True)
dec_W = Variable(torch.randn(embedding_size, embedding_size*2), requires_grad=True)
dec_b = Variable(torch.zeros(1, embedding_size*2), requires_grad=True)

In [19]:
def encode_node(child1, child2):
    """Both child1 and child2 are Variables holding vectors of shape (embedding_size,).
    Return their joint encoding, also of shape (embedding_size,)."""
    # Concatenating the raw .data detaches the children from the autograd graph, so gradients
    # reach enc_W and enc_b only through this node.
    conc_embeds = Variable(torch.cat((child1.data, child2.data), 0))
    # We use .view() to make sure that the return value is a vector (like the word embeddings),
    # not a matrix.
    return conc_embeds.matmul(enc_W).add(enc_b).tanh().view(embedding_size)

def decode_node(node):
    # node has shape (embedding_size,); the output has shape (2*embedding_size,)
    return node.matmul(dec_W).add(dec_b).tanh().view(embedding_size*2)
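
In equation form, the two functions above amount to the following. The symbols are notation introduced here for readability, not taken from the source: $c_1, c_2$ are the child vectors, $p$ is the parent encoding, $d$ stands for `embedding_size`, and $[\cdot\,;\cdot]$ denotes concatenation.

$$
p = \tanh\big([c_1; c_2]\,W_{enc} + b_{enc}\big), \qquad
[\hat{c}_1; \hat{c}_2] = \tanh\big(p\,W_{dec} + b_{dec}\big)
$$

Here $W_{enc} \in \mathbb{R}^{2d \times d}$, $b_{enc} \in \mathbb{R}^{1 \times d}$, $W_{dec} \in \mathbb{R}^{d \times 2d}$ and $b_{dec} \in \mathbb{R}^{1 \times 2d}$, matching the shapes of `enc_W`, `enc_b`, `dec_W` and `dec_b`.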

In [100]:
from functools import reduce
from random import choice, randint
encoding_train_batch_size = 50 # number of sentences

# Handle special treatment of parens by our parser.
def nd_lbl(node):
    if node.label() == '-LRB-':
        return '('
    elif node.label() == '-RRB-':
        return ')'
    else:
        return node.label()

# Note that node_encodings is passed by reference, so every recursive call modifies the dictionary
# given to the topmost function call. Encodings are keyed by node label, so nodes that share
# a label overwrite each other's entries.
def encode_tree(node, node_encodings):
    "Fill node_encodings with the encoding of node and of all its descendants; return nothing."
    subtrees = [subtr for subtr in node]
    if len(subtrees) == 0: # a leaf
        if nd_lbl(node) in word_map:
            node_encodings[nd_lbl(node)] = Variable(
                torch.from_numpy(trained_embeddings[word_map[nd_lbl(node)], ]))
        else: # replace unknowns with a random word
            node_encodings[nd_lbl(node)] = Variable(
                torch.from_numpy(trained_embeddings[randint(0, trained_embeddings.shape[0] - 1), ]))
    elif len(subtrees) == 1:
        encode_tree(subtrees[0], node_encodings)
        node_encodings[nd_lbl(node)] = node_encodings[nd_lbl(subtrees[0])]
    else:
        if len(subtrees) != 2: # dbg
            print(subtrees)
        encode_tree(subtrees[0], node_encodings)
        encode_tree(subtrees[1], node_encodings)
        node_encodings[nd_lbl(node)] = encode_node(
            node_encodings[nd_lbl(subtrees[0])],
            node_encodings[nd_lbl(subtrees[1])])

def make_parser():
    parser = pexpect.spawn('java -jar easyccg-0.2/easyccg.jar --model easyccg-0.2/model')
    parser.expect('Model loaded, ready to parse.')
    return parser

def kill_parser(parser):
    parser.terminate()
    
def sentence_tree(sentence_form, parser):
    parser.send(sentence_form+'\n')
    # Wait for 'ID' first; this keeps us from finding one of the patterns below in the
    # sentence itself.
    response = parser.expect([pexpect.TIMEOUT, 'ID', pexpect.EOF])
    if response == 1: # the parser has started its answer
        response = parser.expect(['Skipping sentence of length', r'\n\(.*\n', pexpect.TIMEOUT])
    if response == 0: # timed out waiting for 'ID', or the sentence was skipped as too long
        return False
    if response == 2: # EOF instead of 'ID', or a timeout while waiting for the tree
        print('received EOF from the parser on', sentence_form)
        raise RuntimeError('parser died')
    parser_output = parser.after.decode().strip() # decode from bytes into str, strip whitespace
    return ParentedTree.fromstring(clean_parser_output(parser_output))

In [180]:
# Parser has to be a global, so tree generators can restart it in case of problems.
parser = None

def run_autoencoder(feed_generator, learning_rate=1.0,
        # The following can be supplied for the additional task of sentiment binary classification.
        sentim_classif_fun=None, sentim_W=None, sentim_b=None,
        # for evaluating sentiment classification:
        enable_training=True, measure_accuracy=False):
    
    def commit_gradients():
        # Perform the gradient descent.
        enc_W.data -= enc_W.grad.data * learning_rate
        enc_b.data -= enc_b.grad.data * learning_rate
        dec_W.data -= dec_W.grad.data * learning_rate
        dec_b.data -= dec_b.grad.data * learning_rate
            
        # Clear the gradient after applying.
        enc_W.grad.data.zero_()
        enc_b.grad.data.zero_()
        dec_W.grad.data.zero_()
        dec_b.grad.data.zero_()
    
    global parser
    parser = make_parser()
    accuracy = 0

    tree_n = 0
    for tree, sentence in feed_generator:
        tree_n += 1
        tree_accum_error = 0.0
        tree_node_n = 0 # count the binary nodes of this tree to average the error

        # Encode the tree.
        node_encodings = dict()
        encode_tree(tree.root(), node_encodings)
        
        # Decode the tree back again.
        # this dictionary in fact maps nodes to their *partial* decodings from which their children are to be
        # recreated; thus for the root it's just its encoding, from which we will retrieve immediate children
        node_decodings = dict()
        node_decodings[nd_lbl(tree.root())] = node_encodings[nd_lbl(tree.root())]
        nodes_to_visit = [ tree.root() ]
        while nodes_to_visit:
            current_node = nodes_to_visit.pop()
            children = [child for child in current_node]
            if len(children) in [0, 1]: # leaf
                continue
            elif len(children) == 2: # not a leaf
                decoded_node = decode_node(node_decodings[nd_lbl(current_node)])
                node_decodings[nd_lbl(children[0])] = decoded_node[:embedding_size]
                node_decodings[nd_lbl(children[1])] = decoded_node[embedding_size:]
                
                nodes_to_visit += children
                
                if enable_training:
                    err = node_encodings[nd_lbl(current_node)].sub(node_decodings[nd_lbl(current_node)]).abs().sum()
                    err.backward(retain_graph=True) # accumulate gradient
                    commit_gradients()
                    tree_accum_error += err.data
                tree_node_n += 1
            else:
                raise RuntimeError('unexpected number of node children ({}) in decode: {}'.format(
                    len(children), str(current_node)))
                
        # Additional task - sentiment classification.
        if sentim_classif_fun:
            y = Variable(torch.Tensor([ sentim_classif_fun(sentence) ]))
            y_pred = node_encodings[nd_lbl(tree.root())].matmul(sentim_W).add(sentim_b).sigmoid()
            classif_err = y.add(- y_pred).abs()
            
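            if measure_accuracy:
                # NOTE: this compares the raw sigmoid output with the 0/1 label for exact
                # equality, which almost never holds; thresholding y_pred at 0.5 would be needed
                # for a meaningful accuracy figure.
                if y.data[0] == y_pred.data[0][0]:
                    accuracy += 1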
            
            if enable_training:
                classif_err.backward(retain_graph=True)
                
                sentim_W.data -= sentim_W.grad.data * learning_rate
                sentim_b.data -= sentim_b.grad.data * learning_rate
                sentim_W.grad.data.zero_()
                sentim_b.grad.data.zero_()
                commit_gradients()
            
        if tree_node_n <= 1:
            continue
        
        if enable_training and tree_n % 75 == 0:
            print('Error metric: ', (tree_accum_error / tree_node_n)[0])

    kill_parser(parser)
    if measure_accuracy:
        print('Accuracy: {}'.format(accuracy / tree_n))
    parser = None

In [163]:
bnc_trees_batch_size = 101
used_sents = set() # indices of the sentences already fed to the autoencoder

def bnc_trees_batch():
    global parser
    for i in range(bnc_trees_batch_size):
        tree = False
        # It's possible that sentence_tree() returns False, if the sentence was too long and
        # rejected by the parser, or if the parser times out.
        sentence_form = ''
        while not tree:
            # randint() includes both endpoints, hence the -1.
            sentence_n = randint(0, len(corpus_sents) - 1)
            while sentence_n in used_sents:
                sentence_n = randint(0, len(corpus_sents) - 1)
            sentence = corpus_sents[sentence_n]
            used_sents.add(sentence_n)
            
            sentence_form = ' '.join([unique_words[word_id] for word_id in sentence])
            #print(sentence_n, sentence_form)
            try:
                tree = sentence_tree(sentence_form, parser)
            except RuntimeError as err: # parser died
                if err.args[0] == 'parser died':
                    parser = make_parser()
                    tree = False
                    continue
                else:
                    raise
        yield tree, sentence_form

In [136]:
learning_rate = 0.2
iters_n = 50 # number of batches to be fed to the autoencoder from BNC alone

for i in range(iters_n):
    feed_gen = bnc_trees_batch()
    run_autoencoder(feed_gen, learning_rate)

Error metric:  82.29735565185547
Error metric:  119.57599639892578
Error metric:  124.4228515625
Error metric:  119.85279083251953
Error metric:  119.9255142211914
Error metric:  67.9533462524414
Error metric:  105.62971496582031
Error metric:  115.7391128540039
Error metric:  124.64643859863281
Error metric:  123.59395599365234
Error metric:  121.17572784423828
Error metric:  60.99999237060547
Error metric:  127.54508209228516
Error metric:  121.99143981933594
Error metric:  124.6204605102539
Error metric:  112.07550811767578
Error metric:  129.04229736328125
Error metric:  120.79609680175781
Error metric:  102.4010238647461
Error metric:  105.70718383789062
Error metric:  106.33282470703125
Error metric:  58.47898483276367
Error metric:  119.68660736083984
Error metric:  72.68099212646484
Error metric:  111.3286361694336
Error metric:  114.75283813476562
Error metric:  114.69685363769531
Error metric:  110.8473892211914
Error metric:  120.12096405029297
Error metric:  90.997711181640

# Training a sentiment analysis model

In [43]:
import io
pos_sents = []
neg_sents = []
with io.open('rt-polaritydata/rt-polaritydata/rt-polarity.pos', encoding='latin-1') as pos_file:
    pos_sents = [line.split() for line in pos_file.read().split('\n')]
with io.open('rt-polaritydata/rt-polaritydata/rt-polarity.neg', encoding='latin-1') as neg_file:
    neg_sents = [line.split() for line in neg_file.read().split('\n')]
    
print(pos_sents[0])

['the', 'rock', 'is', 'destined', 'to', 'be', 'the', '21st', "century's", 'new', '"', 'conan', '"', 'and', 'that', "he's", 'going', 'to', 'make', 'a', 'splash', 'even', 'greater', 'than', 'arnold', 'schwarzenegger', ',', 'jean-claud', 'van', 'damme', 'or', 'steven', 'segal', '.']


Now we split the sentence polarity corpus into test and training sets in a 10/90 proportion, just as in the paper.

In [137]:
from random import sample
assert len(pos_sents) == len(neg_sents)
# The same indices are held out from both the positive and the negative corpus.
test_ids = set(sample(range(len(pos_sents)), len(pos_sents) // 10))

In [179]:
def sent_polarity_feed(is_test=False):
    global parser
    for sent_i in range(len(pos_sents)):
        if (sent_i in test_ids) != is_test:
            continue
        
        # We have to alternate the positive and negative sentences at least for the training phase.
        for sent_corp in [pos_sents, neg_sents]:
            try:
                tree = sentence_tree(' '.join(sent_corp[sent_i]), parser)
            except RuntimeError as err:
                if err.args[0] == 'parser died':
                    parser = make_parser()
                    tree = False
                else:
                    raise
                    
            if tree:
                #tree.pprint()
                yield tree, ' '.join(sent_corp[sent_i])

In [139]:
polarity_dict = { }
for sent in pos_sents:
    polarity_dict[' '.join(sent)] = 1.0
for sent in neg_sents:
    polarity_dict[' '.join(sent)] = 0.0

def get_sent_polarity(sent):
    return polarity_dict[sent]

In [161]:
learning_rate = 0.5

sentim_W = Variable(torch.randn(embedding_size, 1), requires_grad=True)
sentim_b = Variable(torch.zeros(1, 1), requires_grad=True)

feed_gen = sent_polarity_feed()
run_autoencoder(feed_gen, learning_rate,
                sentim_classif_fun=get_sent_polarity, sentim_W=sentim_W, sentim_b=sentim_b)

Error metric:  133.8715362548828
Error metric:  132.85879516601562
Error metric:  124.361328125
Error metric:  119.7922592163086
Error metric:  128.8044891357422
Error metric:  108.0
Error metric:  116.0915298461914
Error metric:  108.21324920654297
Error metric:  110.42928314208984
Error metric:  98.72684478759766
Error metric:  111.57719421386719
Error metric:  92.7457046508789
Error metric:  118.29249572753906
Error metric:  122.05277252197266
Error metric:  127.01619720458984
Error metric:  119.55056762695312
Error metric:  118.27478790283203
Error metric:  128.71327209472656
Error metric:  123.3331298828125
Error metric:  124.17900848388672
Error metric:  75.33333587646484
Error metric:  113.25749206542969
Error metric:  116.8322982788086
Error metric:  128.8441162109375
Error metric:  59.000083923339844
Error metric:  127.61494445800781
Error metric:  124.37112426757812
Error metric:  118.12250518798828
Error metric:  114.53072357177734
Error metric:  124.99211883544922
Error met

In [181]:
feed_gen = sent_polarity_feed(is_test=True)
run_autoencoder(feed_gen, learning_rate,
                sentim_classif_fun=get_sent_polarity, sentim_W=sentim_W, sentim_b=sentim_b,
                enable_training=False, measure_accuracy=True)

Accuracy: 0.030389363722697058


## Logistic regression

The original paper seems to use just a binary classifier over sentence vectors, without any neural net hidden layers. This is the approach we try first.
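
As a rough illustration of such a classifier, the following is a minimal pure-Python sketch of logistic regression trained with stochastic gradient descent on the log loss. It is not the paper's implementation: the function names, the hyperparameters and the toy two-dimensional "sentence vectors" are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(vectors, labels, learning_rate=0.5, epochs=200, seed=0):
    """Fit weights and bias with plain SGD; labels are 0.0 or 1.0."""
    rng = random.Random(seed)
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = sum(w * x for w, x in zip(weights, vectors[i])) + bias
            error = sigmoid(z) - labels[i]  # gradient of the log loss w.r.t. z
            weights = [w - learning_rate * error * x for w, x in zip(weights, vectors[i])]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, vector):
    """Return the predicted class, thresholding the probability at 0.5."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 if sigmoid(z) > 0.5 else 0.0

# Toy, linearly separable data standing in for sentence vectors (made up).
toy_vectors = [[1.0, 0.2], [0.8, -0.1], [-0.9, 0.3], [-1.1, -0.2]]
toy_labels = [1.0, 1.0, 0.0, 0.0]

weights, bias = train_logistic_regression(toy_vectors, toy_labels)
predictions = [predict(weights, bias, vector) for vector in toy_vectors]
print(predictions)
```

The prediction is thresholded at 0.5 before it is compared with the label, since the raw sigmoid output is a probability and will practically never equal 0 or 1 exactly.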

### TODO
Try to do an autoencoder without loops: https://groups.google.com/forum/#!topic/theano-users/O5CM49-jMqQ