# Bake-off: Stanford Sentiment Treebank

In [1]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2018 term"

## Contents

0. [Overview](#Overview)
0. [Bake-off submission](#Bake-off-submission)
0. [Methodological note](#Methodological-note)
0. [Set-up](#Set-up)
0. [Baseline](#Baseline)
0. [TfRNNClassifier wrapper](#TfRNNClassifier-wrapper)
0. [TreeNN wrapper](#TreeNN-wrapper)

## Overview

The goal of this in-class bake-off is to __achieve the highest average F1 score__ on the SST development set, using the binary class function (`sst.binary_class_func`).

The only restriction: __you cannot make any use of the subtree labels__.
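For orientation, here is a minimal stdlib sketch of what "binary class function" and "no subtree labels" mean in practice. It assumes SST trees are bracketed strings whose node labels are the digits 0 to 4. It re-implements (rather than imports) the mapping we understand `sst.binary_class_func` to apply: 0 and 1 become negative, 3 and 4 become positive, and 2 (neutral) is dropped. The example tree and the helpers `root_label` and `all_labels` are hypothetical.

```python
import re

def binary_class_func(label):
    """Map an SST label ('0'..'4') to a binary class, or None for neutral."""
    if label in ("0", "1"):
        return "negative"
    if label in ("3", "4"):
        return "positive"
    return None  # neutral ('2') examples are left out of the binary task

def root_label(tree_str):
    """The only label the bake-off allows you to use: the one on the root."""
    return re.match(r"\((\d)", tree_str).group(1)

def all_labels(tree_str):
    """Every node label, including the subtree labels that are off-limits."""
    return re.findall(r"\((\d)", tree_str)

# Hypothetical SST-style tree: root labeled 3; subtrees carry their own labels.
example = "(3 (2 It) (4 (4 works) (2 .)))"
print(root_label(example), binary_class_func(root_label(example)))  # 3 positive
print(all_labels(example))  # ['3', '2', '4', '4', '2']
```

Only the first of those labels may inform training; the other four are the subtree labels the rule excludes.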

## Bake-off submission

Please submit:

1. A description of the model you created.
1. The value of `f1-score` in the `avg / total` row of the classification report on the dev set.

Submission URL: https://docs.google.com/forms/d/1R41Zxxils7lOPzuThMdv2p1TKmFEy8c0DyUg-YkzTa0/edit

## Methodological note

You don't have to use the experimental framework defined below (based on `sst`). However, if you don't use `sst.experiment` as below, make sure that you train only on `train`, evaluate on `dev`, and report your results with:

```
from sklearn.metrics import classification_report
print(classification_report(y_dev, predictions))
```
where `predictions` holds your model's predicted labels for the dev set and `y_dev = [y for tree, y in sst.dev_reader(class_func=sst.binary_class_func)]`.

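If you report outside `sst.experiment`, it can help to know exactly what the report's numbers are. The stdlib sketch below (the function name is hypothetical and the labels are made up) recomputes per-class precision, recall, and F1 and their support-weighted averages, which is what the `avg / total` row shows. Treat it as a cross-check, not a replacement for `classification_report`.

```python
def classification_summary(y_true, y_pred):
    """Per-class (precision, recall, F1, support) plus support-weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    rows = {}
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        predicted = sum(1 for p in y_pred if p == lab)
        support = sum(1 for t in y_true if t == lab)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support if support else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows[lab] = (prec, rec, f1, support)
    total = sum(r[3] for r in rows.values())
    # Weight each class's precision, recall, and F1 by its support.
    avg = tuple(sum(r[i] * r[3] for r in rows.values()) / total for i in range(3))
    return rows, avg

# Made-up gold labels and predictions, for illustration only.
y_dev = ["negative", "negative", "positive", "positive", "positive"]
predictions = ["negative", "positive", "positive", "positive", "negative"]
rows, (avg_p, avg_r, avg_f1) = classification_summary(y_dev, predictions)
print(round(avg_f1, 3))  # 0.6
```

One side effect of support weighting: the weighted recall always equals accuracy (here 3 of 5 correct).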
## Set-up

See [the first notebook in this unit](sst_01_overview.ipynb#Set-up) for set-up instructions.

In [11]:
from collections import Counter
from rnn_classifier import RNNClassifier
from sklearn.linear_model import LogisticRegression
import sst
import os
import utils
import vsm
import numpy as np
import tensorflow as tf
from sklearn import naive_bayes
from tf_rnn_classifier import TfRNNClassifier
from tf_shallow_neural_classifier import TfShallowNeuralClassifier
from tree_nn import TreeNN
vsmdata_home = 'vsmdata'
glove_home = os.path.join(vsmdata_home, 'glove.6B')

## Baseline

In [5]:
def unigrams_phi(tree):
    """The basis for a unigrams feature function.
    
    Parameters
    ----------
    tree : nltk.tree.Tree
        The tree to represent.

    Returns
    -------
    collections.Counter
        A map from each leaf (word) in `tree` to its count. (`Counter`
        maps a list to a dict of counts of the elements in that list.)
    
    """
    return Counter(tree.leaves())

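Since the feature function is yours to write, here is one possible variant: bigram counts. To keep it runnable without nltk, this sketch takes a plain token list; in the notebook you would wrap it as `def bigrams_phi(tree): return bigrams_phi_from_leaves(tree.leaves())`. The function name and the `<S>`/`</S>` boundary markers are our own choices, not part of `sst`.

```python
from collections import Counter

def bigrams_phi_from_leaves(leaves):
    """Count adjacent word pairs, padded with hypothetical boundary markers.

    Keys are strings like 'a great' so they work as ordinary feature names.
    """
    padded = ["<S>"] + list(leaves) + ["</S>"]
    return Counter(f"{left} {right}" for left, right in zip(padded, padded[1:]))

feats = bigrams_phi_from_leaves(["a", "great", "movie"])
print(sorted(feats))  # ['<S> a', 'a great', 'great movie', 'movie </S>']
```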
In [4]:
def fit_maxent_classifier(X, y):
    """Fit a logistic regression (maximum entropy) classifier to `X`, `y`."""
    mod = LogisticRegression(fit_intercept=True)
    mod.fit(X, y)
    return mod

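`sst.experiment` hands `fit_maxent_classifier` a matrix `X`, not the raw `Counter`s. Our understanding is that, with the default `vectorize=True`, it fits scikit-learn's `DictVectorizer` on the training dicts and reuses that fitted vectorizer on `dev`. The stdlib sketch below imitates that step (`fit_feature_index` and `transform` are hypothetical names) to show one consequence: a feature that occurs only in `dev` gets no column and is ignored.

```python
from collections import Counter

def fit_feature_index(train_dicts):
    """Assign one column per feature seen in training, in sorted order."""
    names = sorted({name for d in train_dicts for name in d})
    return {name: j for j, name in enumerate(names)}

def transform(dicts, index):
    """Dense rows; features missing from `index` (e.g. dev-only words) are dropped."""
    rows = []
    for d in dicts:
        row = [0.0] * len(index)
        for name, value in d.items():
            if name in index:
                row[index[name]] = float(value)
        rows.append(row)
    return rows

train_feats = [Counter(["good", "fun", "fun"]), Counter(["bad"])]
dev_feats = [Counter(["fun", "dull"])]  # 'dull' never occurs in training
index = fit_feature_index(train_feats)
print(index)                        # {'bad': 0, 'fun': 1, 'good': 2}
print(transform(dev_feats, index))  # [[0.0, 1.0, 0.0]]
```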
In [5]:
_ = sst.experiment(
    unigrams_phi,                      # Free to write your own!
    fit_maxent_classifier,             # Free to write your own!
    train_reader=sst.train_reader,     # Fixed by the competition.
    assess_reader=sst.dev_reader,      # Fixed.
    class_func=sst.binary_class_func)  # Fixed.

Accuracy: 0.772
             precision    recall  f1-score   support

   negative      0.783     0.741     0.761       428
   positive      0.762     0.802     0.782       444

avg / total      0.772     0.772     0.772       872

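Checking the baseline's `avg / total` F1 by hand confirms that it weights each class by its support (the number of dev examples in that class) rather than taking a plain mean. In this scikit-learn version the row is labeled `avg / total`; newer releases call it `weighted avg`.

```python
# Per-class F1 and support copied from the baseline report above.
report = {"negative": (0.761, 428), "positive": (0.782, 444)}

total_support = sum(s for _, s in report.values())
weighted_f1 = sum(f1 * s for f1, s in report.values()) / total_support
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

print(total_support)          # 872
print(round(weighted_f1, 3))  # 0.772, matching the avg / total row
print(round(macro_f1, 4))     # 0.7715
```

The two averages differ only slightly here because the classes are nearly balanced (428 vs. 444).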


By the way, with some informal hyperparameter search on a GPU machine, I found this model
```
tf_rnn_glove = TfRNNClassifier(
    sst_glove_vocab,
    embedding=glove_embedding, ## 100d version
    hidden_dim=300,
    max_length=52,
    hidden_activation=tf.nn.relu,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=True,
    max_iter=5000,
    batch_size=1028,
    eta=0.001)
```
which finished with almost identical performance to the above:
    
```
             precision    recall  f1-score   support

   negative       0.78      0.75      0.76       428
   positive       0.77      0.80      0.78       444

avg / total       0.77      0.77      0.77       872
```

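`max_length=52` caps how many tokens per example the RNN sees. The sketch below shows the usual preprocessing under stated assumptions: that `TfRNNClassifier` maps out-of-vocabulary words to `$UNK`, keeps the first `max_length` tokens, pads with index 0, and tracks the true lengths. We have not checked its exact padding index or truncation side, and `to_padded_indices` is a hypothetical name, so treat this as illustrative.

```python
def to_padded_indices(token_lists, vocab, max_length):
    """Map tokens to vocab indices ($UNK for unknown words), then pad/truncate.

    Returns (rows, lengths): every row has exactly `max_length` ints, and
    `lengths` records how many of them are real tokens.
    """
    index = {w: i for i, w in enumerate(vocab)}
    unk = index["$UNK"]
    rows, lengths = [], []
    for tokens in token_lists:
        ids = [index.get(w, unk) for w in tokens][:max_length]  # keep the first max_length
        lengths.append(len(ids))
        rows.append(ids + [0] * (max_length - len(ids)))  # pad index 0 is an assumption
    return rows, lengths

vocab = ["$UNK", "a", "fun", "movie"]
examples = [["a", "fun", "movie"], ["a", "dull", "film", "indeed", "ok"]]
rows, lengths = to_padded_indices(examples, vocab, max_length=4)
print(rows)     # [[1, 2, 3, 0], [1, 0, 0, 0]]
print(lengths)  # [3, 4]
```

Because padding and `$UNK` share index 0 in this toy vocabulary, the `lengths` list is what tells the two apart.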
In [6]:
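def fit_nb_classifier_with_crossvalidation_ppmi(X, y):
    # Reweight the training count matrix with positive PMI before fitting.
    # Caveat: `sst.experiment` passes raw (un-reweighted) dev counts to
    # `predict`, so train and dev features are on different scales here.
    ppmi = vsm.pmi(X)
    basemod = naive_bayes.MultinomialNB()
    cv = 3
    param_grid = {'alpha': [3.2, 3.45, 3.38]}  # additive smoothing values to try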
    return sst.fit_classifier_with_crossvalidation(ppmi, y, basemod, cv, param_grid)

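`vsm.pmi` reweights the count matrix with positive pointwise mutual information, PPMI(i, j) = max(0, log(P(i, j) / (P(i) P(j)))), where log 0 is treated as 0. As a reminder of what that computes, here is a stdlib re-implementation on a tiny dense matrix. It is our own sketch for illustration, not the code of `vsm.pmi`.

```python
import math

def ppmi(matrix):
    """Positive PMI of a nonnegative count matrix given as a list of rows."""
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    out = []
    for i, row in enumerate(matrix):
        new_row = []
        for j, count in enumerate(row):
            if count == 0:
                new_row.append(0.0)  # log(0) is treated as 0
                continue
            expected = row_sums[i] * col_sums[j] / total
            new_row.append(max(0.0, math.log(count / expected)))  # clip negatives
        out.append(new_row)
    return out

counts = [[2, 0], [1, 1]]
print([[round(v, 3) for v in row] for row in ppmi(counts)])  # [[0.288, 0.0], [0.0, 0.693]]
```

Cell (1, 0) occurs less often than chance predicts (1 vs. an expected 1.5), so its negative PMI is clipped to 0.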
In [12]:
_ = sst.experiment(
    unigrams_phi,                      # Free to write your own!
    fit_nb_classifier_with_crossvalidation_ppmi,             # Free to write your own!
    train_reader=sst.train_reader,     # Fixed by the competition.
    assess_reader=sst.dev_reader,      # Fixed.
    class_func=sst.binary_class_func)  # Fixed.   uni_phi-0.786

Best params {'alpha': 3.45}
Best score: 0.777
Accuracy: 0.792
             precision    recall  f1-score   support

   negative      0.808     0.757     0.782       428
   positive      0.779     0.827     0.802       444

avg / total      0.793     0.792     0.792       872

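The `Best params` and `Best score` lines come from `sst.fit_classifier_with_crossvalidation`, which, as far as we can tell, wraps scikit-learn's `GridSearchCV`: each candidate in `param_grid` is scored by `cv`-fold cross-validation on the training data, and the winner (here `alpha=3.45`) is refit on all of it. The stdlib sketch below mimics that control flow with a hypothetical `toy_score` in place of "fit Naive Bayes and compute F1". It uses plain contiguous folds, whereas `GridSearchCV` stratifies folds for classifiers.

```python
from itertools import product

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; yield (train_idx, test_idx)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def grid_search(n, k, param_grid, score_fn):
    """Return (best_params, best_mean_score) over every combination in the grid."""
    keys = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = [score_fn(params, tr, te) for tr, te in kfold_indices(n, k)]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score

def toy_score(params, train_idx, test_idx):
    """Hypothetical stand-in scorer that happens to peak at alpha = 3.45."""
    return -abs(params["alpha"] - 3.45)

best_params, best_score = grid_search(9, 3, {"alpha": [3.2, 3.45, 3.38]}, toy_score)
print(best_params)  # {'alpha': 3.45}
```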


## TfRNNClassifier wrapper

In [4]:
def rnn_phi(tree):
    return tree.leaves()    

In [19]:
def fit_tf_rnn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TfRNNClassifier(
        vocab, 
        eta=0.05,
        batch_size=2048,
        embed_dim=50,
        hidden_dim=50,
        max_length=52, 
        max_iter=50,
        cell_class=tf.nn.rnn_cell.LSTMCell,
        hidden_activation=tf.nn.tanh,
        train_embedding=True)
    mod.fit(X, y)
    return mod

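`sst.get_vocab(X, n_words=3000)` limits the RNN vocabulary to the most frequent training words. Here is a stdlib sketch of that idea, assuming (as we understand the helper) that it also adds a `$UNK` token and returns a sorted list. `get_vocab_sketch` is a hypothetical name.

```python
from collections import Counter

def get_vocab_sketch(token_lists, n_words=None):
    """The `n_words` most frequent tokens plus '$UNK', as a sorted list."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    kept = counts.most_common(n_words) if n_words else counts.items()
    vocab = {w for w, _ in kept}
    vocab.add("$UNK")
    return sorted(vocab)

X = [["fun", "fun", "a"], ["a", "fun", "movie"], ["dull"]]  # counts: fun 3, a 2, movie 1, dull 1
print(get_vocab_sketch(X, n_words=2))  # ['$UNK', 'a', 'fun']
```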
In [20]:
_ = sst.experiment(
    rnn_phi,
    fit_tf_rnn_classifier,
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader,
    class_func=sst.binary_class_func)  # Fixed by the competition.

Iteration 50: loss: 2.7455841302871704

Accuracy: 0.552
             precision    recall  f1-score   support

   negative      0.557     0.421     0.479       428
   positive      0.548     0.678     0.606       444

avg / total      0.553     0.552     0.544       872



In [48]:
def fit_tf_rnn_classifier_with_crossvalidation(X, y):
    # Needs `sst_glove_vocab` and `glove_embedding` from the GloVe cell below.
    basemod = TfRNNClassifier(sst_glove_vocab)
    cv = 5
    # Grid values must be the objects themselves, not their names as strings.
    param_grid = {'vocab': [sst_glove_vocab], 'embedding': [glove_embedding],
                  'hidden_dim': [100, 200, 300], 'max_iter': [10], 'max_length': [52]}
    return sst.fit_classifier_with_crossvalidation(X, y, basemod, cv, param_grid)

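Two details of `param_grid` are easy to trip over. A Python dict literal silently keeps only the last value for a repeated key, so listing a key twice does not add options. And grid search builds one candidate per combination of values and fits each one `cv` times; scikit-learn's `GridSearchCV` then, by default, refits the winner once more on all the training data. The placeholder strings below stand in for the real vocabulary and embedding objects.

```python
from itertools import product

# A repeated key in a dict literal is not an error: the last value wins.
grid = {"embedding": ["first"], "hidden_dim": [100, 200, 300], "embedding": ["second"]}
print(grid["embedding"])  # ['second']

# Count the work for a grid shaped like the one above.
param_grid = {"vocab": ["<vocab>"], "embedding": ["<embedding>"],
              "hidden_dim": [100, 200, 300], "max_iter": [10], "max_length": [52]}
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
cv = 5
print(len(candidates), len(candidates) * cv + 1)  # 3 16
```

So this search trains 16 RNNs, which is worth knowing before launching it without a GPU.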
In [61]:
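glove_lookup = utils.glove2dict(
    os.path.join(glove_home, 'glove.6B.50d.txt'))
X_rnn_train, y_rnn_train = sst.build_binary_rnn_dataset(sst.train_reader)
sst_train_vocab = sst.get_vocab(X_rnn_train, n_words=3000)
# Keep only the training-vocabulary words that GloVe covers; row i of
# `glove_embedding` is the vector for `sst_glove_vocab[i]`.
sst_glove_vocab = sorted(set(glove_lookup) & set(sst_train_vocab))
glove_embedding = np.array([glove_lookup[w] for w in sst_glove_vocab])
# `$UNK` has no GloVe vector, so the intersection drops it; add it back
# with a random representation so unknown words still get a row.
sst_glove_vocab.append("$UNK")
glove_embedding = np.vstack(
    (glove_embedding, utils.randvec(glove_embedding.shape[1])))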

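The GloVe cell aligns an embedding matrix with a vocabulary: row i must be the vector for word i. Here is a stdlib sketch of the same steps on made-up data. It assumes the usual GloVe text format (a word followed by its space-separated floats) and gives `$UNK` a small random vector, which is a common choice rather than a requirement. The helper names are hypothetical.

```python
import random

def parse_glove_lines(lines):
    """GloVe text format: 'word v1 v2 ... vd' per line -> {word: [floats]}."""
    lookup = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        lookup[word] = [float(v) for v in values]
    return lookup

def aligned_embedding(lookup, train_vocab, seed=0):
    """Keep training words that GloVe covers, then append '$UNK' with a random row."""
    vocab = sorted(set(lookup) & set(train_vocab))
    matrix = [lookup[w] for w in vocab]
    dim = len(matrix[0])
    rng = random.Random(seed)
    vocab.append("$UNK")
    matrix.append([rng.uniform(-0.5, 0.5) for _ in range(dim)])
    return vocab, matrix

glove_lines = ["fun 0.1 0.2", "movie 0.3 -0.1", "the 0.0 0.5"]  # toy 2-d "GloVe" file
vocab, matrix = aligned_embedding(parse_glove_lines(glove_lines), ["$UNK", "fun", "movie", "zzz"])
print(vocab)       # ['fun', 'movie', '$UNK']
print(matrix[:2])  # [[0.1, 0.2], [0.3, -0.1]]
```

Note that `zzz` is dropped because GloVe has no vector for it, and `$UNK` survives only because it is appended after the intersection.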
In [66]:
def fit_tf_rnn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TfRNNClassifier(
        vocab,  # built above from the training data passed in by sst.experiment
        eta=0.005,
        batch_size=1024,
        embed_dim=50,
        hidden_dim=300,
        max_length=52, 
        max_iter=100,
        cell_class=tf.nn.rnn_cell.LSTMCell,
        hidden_activation=tf.nn.relu,
        train_embedding=True)
    mod.fit(X, y)
    return mod

In [67]:
_ = sst.experiment(
    rnn_phi,    
    fit_tf_rnn_classifier,
    vectorize=False,
    assess_reader=sst.dev_reader,
    class_func=sst.binary_class_func)

Iteration 100: loss: 4.798083841800695

Accuracy: 0.544
             precision    recall  f1-score   support

   negative      0.570     0.285     0.380       428
   positive      0.535     0.793     0.639       444

avg / total      0.552     0.544     0.512       872



## TreeNN wrapper

In [16]:
def tree_phi(tree):
    return tree

In [23]:
def fit_tree_nn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TreeNN(
        vocab, 
        embed_dim=100, 
        max_iter=50)
    mod.fit(X, y)
    return mod

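`TreeNN` builds a sentence representation bottom-up over a tree. The core recursion of a basic tree-structured network is that a leaf is its word vector, and an internal node with children l and r becomes tanh(W [l; r] + b). The sketch below uses nested tuples for binary trees and tiny hand-picked weights. It illustrates the general technique and is not claimed to match the internals of `tree_nn.TreeNN`.

```python
import math

def compose(node, embed, W, b):
    """Recursively compute a vector for a binary tree of nested tuples.

    Leaves are word strings looked up in `embed`; an internal node
    (left, right) becomes tanh(W @ concat(left_vec, right_vec) + b).
    """
    if isinstance(node, str):
        return embed[node]
    left, right = node
    child = compose(left, embed, W, b) + compose(right, embed, W, b)  # list concatenation
    return [math.tanh(sum(w * x for w, x in zip(row, child)) + bias)
            for row, bias in zip(W, b)]

embed = {"not": [1.0, 0.0], "very": [0.0, 1.0], "good": [0.5, 0.5]}
W = [[0.5, -0.5, 0.25, 0.0],   # shape: dim x (2 * dim), with dim = 2
     [0.0, 0.25, -0.5, 0.5]]
b = [0.0, 0.0]
vec = compose(("not", ("very", "good")), embed, W, b)
print(len(vec))  # 2: every node, including the root, lives in the same space
```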
In [24]:
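_ = sst.experiment(
    rnn_phi,  # flat token lists; `tree_phi` above would pass the full parse trees
    fit_tree_nn_classifier,
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader,
    class_func=sst.binary_class_func)  # Fixed by the competition.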

Finished epoch 50 of 50; error is 0.8251852776912483

Accuracy: 0.514
             precision    recall  f1-score   support

   negative      0.506     0.411     0.454       428
   positive      0.519     0.613     0.562       444

avg / total      0.513     0.514     0.509       872

