# Bake-off: Stanford Sentiment Treebank

In [1]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2018 term"

## Contents

0. [Overview](#Overview)
0. [Bake-off submission](#Bake-off-submission)
0. [Methodological note](#Methodological-note)
0. [Set-up](#Set-up)
0. [Baseline](#Baseline)
0. [TfRNNClassifier wrapper](#TfRNNClassifier-wrapper)
0. [TreeNN wrapper](#TreeNN-wrapper)

## Overview

The goal of this in-class bake-off is to __achieve the highest average F1 score__ on the SST development set, with the binary class function.

The only restriction: __you cannot make any use of the subtree labels__.

## Bake-off submission

1. A description of the model you created.
1. The value of `f1-score` in the `avg / total` row of the classification report.

Submission URL: https://docs.google.com/forms/d/1R41Zxxils7lOPzuThMdv2p1TKmFEy8c0DyUg-YkzTa0/edit

## Methodological note

You don't have to use the experimental framework defined below (based on `sst`). However, if you don't use `sst.experiment` as below, then make sure you're training only on `train`, evaluating on `dev`, and that you report with 

```
from sklearn.metrics import classification_report
classification_report(y_dev, predictions)
```
where `y_dev = [y for tree, y in sst.dev_reader(class_func=sst.binary_class_func)]`

## Set-up

See [the first notebook in this unit](sst_01_overview.ipynb#Set-up) for set-up instructions.

In [1]:
from collections import Counter
from rnn_classifier import RNNClassifier
from sklearn.linear_model import LogisticRegression
import sst
import tensorflow as tf
from tf_rnn_classifier import TfRNNClassifier
from tree_nn import TreeNN

  return f(*args, **kwds)
  return f(*args, **kwds)


## Baseline

In [2]:
def unigrams_phi(tree):
    """The basis for a unigrams feature function.
    
    Parameters
    ----------
    tree : nltk.tree
        The tree to represent.
    
    Returns
    -------    
    defaultdict
        A map from strings to their counts in `tree`. (Counter maps a 
        list to a dict of counts of the elements in that list.)
    
    """
    return Counter(tree.leaves())

In [9]:
def fit_maxent_classifier(X, y):
    mod = LogisticRegression(fit_intercept=True)
    mod.fit(X, y)
    return mod

In [11]:
_ = sst.experiment(
    unigrams_phi,                      # Free to write your own!
    fit_maxent_classifier,             # Free to write your own!
    train_reader=sst.train_reader,     # Fixed by the competition.
    assess_reader=sst.dev_reader,      # Fixed.
    class_func=sst.binary_class_func,
    view_errors=5)  # Fixed.

Accuracy: 0.772
             precision    recall  f1-score   support

   negative      0.783     0.741     0.761       428
   positive      0.762     0.802     0.782       444

avg / total      0.772     0.772     0.772       872

Error: Nothing 's at stake , just a twisty double-cross you can smell a mile away -- still , the derivative Nine Queens is lots of fun . marked as negative but was positive
Error: A beguiling splash of pastel colors and prankish comedy from Disney . marked as negative but was positive
Error: The film serves as a valuable time capsule to remind us of the devastating horror suffered by an entire people . marked as negative but was positive
Error: ... an otherwise intense , twist-and-turn thriller that certainly should n't hurt talented young Gaghan 's resume . marked as negative but was positive
Error: A poignant , artfully crafted meditation on mortality . marked as negative but was positive


By the way, with some informal hyperparameter search on a GPU machine, I found this model
```
tf_rnn_glove = TfRNNClassifier(
    sst_glove_vocab,
    embedding=glove_embedding, ## 100d version
    hidden_dim=300,
    max_length=52,
    hidden_activation=tf.nn.relu,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=True,
    max_iter=5000,
    batch_size=1028,
    eta=0.001)
```
which finished with almost identical performance to the above:
    
```
             precision    recall  f1-score   support

   negative       0.78      0.75      0.76       428
   positive       0.77      0.80      0.78       444

avg / total       0.77      0.77      0.77       872
```

## TfRNNClassifier wrapper

In [8]:
def rnn_phi(tree):
    return tree.leaves()    

In [11]:
def fit_tf_rnn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TfRNNClassifier(
        vocab, 
        eta=0.05,
        batch_size=2048,
        embed_dim=50,
        hidden_dim=50,
        max_length=52, 
        max_iter=10,
        cell_class=tf.nn.rnn_cell.LSTMCell,
        hidden_activation=tf.nn.tanh,
        train_embedding=True)
    mod.fit(X, y)
    return mod

In [12]:
_ = sst.experiment(
    rnn_phi,
    fit_tf_rnn_classifier, 
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader)

Iteration 10: loss: 2.7656717896461487

Accuracy: 0.522
             precision    recall  f1-score   support

   negative      0.522     0.304     0.384       428
   positive      0.522     0.732     0.609       444

avg / total      0.522     0.522     0.499       872



## TreeNN wrapper

In [13]:
def tree_phi(tree):
    return tree

In [18]:
def fit_tree_nn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TreeNN(
        vocab, 
        embed_dim=2, 
        max_iter=10)
    mod.fit(X, y)
    return mod

In [19]:
_ = sst.experiment(
    rnn_phi,
    fit_tree_nn_classifier, 
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader)

Finished epoch 10 of 10; error is 0.6924667762476523

Accuracy: 0.505
             precision    recall  f1-score   support

   negative      0.480     0.114     0.185       428
   positive      0.508     0.881     0.644       444

avg / total      0.494     0.505     0.419       872



# Helpers

In [33]:
def combine_phis(*phis):
    def new_phi(tree):
        new_dict = {}
        
        for phi in phis:
            phi_dict = phi(tree)
            for key in phi_dict:
                if key in new_dict:
                    raise Exception('Keys collision: {}'.format(key))
                new_dict[key] = phi_dict[key]
            
        return new_dict
    
    return new_phi

# Baseline + Cross Validation

In [22]:
from sklearn.linear_model import LogisticRegressionCV

def fit_maxent_cv_classifier(X, y):        
    mod = LogisticRegressionCV(fit_intercept=True, n_jobs=-1)
    mod.fit(X, y)
    return mod

In [23]:
_ = sst.experiment(
    unigrams_phi,                      # Free to write your own!
    fit_maxent_cv_classifier,             # Free to write your own!
    train_reader=sst.train_reader,     # Fixed by the competition.
    assess_reader=sst.dev_reader,      # Fixed.
    class_func=sst.binary_class_func)  # Fixed.

Accuracy: 0.768
             precision    recall  f1-score   support

   negative      0.780     0.736     0.757       428
   positive      0.759     0.800     0.779       444

avg / total      0.769     0.768     0.768       872



# Bigrammerinos

In [5]:
def emit_bigrams(unigrams):
    for i in range(len(unigrams)):
        yield unigrams[i]
        if i+1 < len(unigrams):
            yield unigrams[i] + ' ' + unigrams[i+1]

def bigrams_phi(tree):
    return Counter(emit_bigrams(tree.leaves()))

In [29]:
_ = sst.experiment(
    bigrams_phi,                      # Free to write your own!
    fit_maxent_classifier,             # Free to write your own!
    train_reader=sst.train_reader,     # Fixed by the competition.
    assess_reader=sst.dev_reader,      # Fixed.
    class_func=sst.binary_class_func)  # Fixed.

Accuracy: 0.775
             precision    recall  f1-score   support

   negative      0.786     0.745     0.765       428
   positive      0.766     0.804     0.785       444

avg / total      0.776     0.775     0.775       872



# Bigrams No Stop Words

In [33]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/mwilber/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [34]:
from nltk.corpus import stopwords

def emit_bigrams_no_stop_words(unigrams):
    stop_words = stopwords.words('english')
    clean_unigrams = [unigram for unigram in unigrams if unigram not in stop_words]
    return emit_bigrams(clean_unigrams)

def bigrams_no_stop_words_phi(tree):
    return Counter(emit_bigrams_no_stop_words(tree.leaves()))

In [35]:
_ = sst.experiment(
    bigrams_no_stop_words_phi,                      # Free to write your own!
    fit_maxent_classifier,             # Free to write your own!
    train_reader=sst.train_reader,     # Fixed by the competition.
    assess_reader=sst.dev_reader,      # Fixed.
    class_func=sst.binary_class_func)  # Fixed.

Accuracy: 0.764
             precision    recall  f1-score   support

   negative      0.766     0.748     0.757       428
   positive      0.762     0.779     0.771       444

avg / total      0.764     0.764     0.764       872



# Random Forest Unigrams

In [38]:
from sklearn.ensemble import RandomForestClassifier

def fit_random_forest(X, y):
    random_forest = RandomForestClassifier(n_jobs=-1)
    
    return random_forest.fit(X, y)

In [39]:
_ = sst.experiment(
    unigrams_phi,
    fit_random_forest,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func) 

Accuracy: 0.689
             precision    recall  f1-score   support

   negative      0.685     0.680     0.682       428
   positive      0.694     0.698     0.696       444

avg / total      0.689     0.689     0.689       872



In [40]:
_ = sst.experiment(
    bigrams_phi,
    fit_random_forest,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func) 

Accuracy: 0.704
             precision    recall  f1-score   support

   negative      0.710     0.671     0.690       428
   positive      0.699     0.736     0.717       444

avg / total      0.704     0.704     0.704       872



# Support Vector Machine

In [3]:
from sklearn.svm import LinearSVC

def fit_svm(X, y):
    svm = LinearSVC()
    
    return svm.fit(X, y)

In [4]:
_ = sst.experiment(
    unigrams_phi,
    fit_svm,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func,
    view_errors=5) 

Accuracy: 0.757
             precision    recall  f1-score   support

   negative      0.762     0.734     0.748       428
   positive      0.752     0.779     0.765       444

avg / total      0.757     0.757     0.757       872

Error: Dazzles with its fully-written characters , its determined stylishness -LRB- which always relates to characters and story -RRB- and Johnny Dankworth 's best soundtrack in years . marked as negative but was positive
Error: Nothing 's at stake , just a twisty double-cross you can smell a mile away -- still , the derivative Nine Queens is lots of fun . marked as negative but was positive
Error: A beguiling splash of pastel colors and prankish comedy from Disney . marked as negative but was positive
Error: The film serves as a valuable time capsule to remind us of the devastating horror suffered by an entire people . marked as negative but was positive
Error: ... an otherwise intense , twist-and-turn thriller that certainly should n't hurt talented young G

In [46]:
_ = sst.experiment(
    bigrams_phi,
    fit_svm,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func) 

Accuracy: 0.755
             precision    recall  f1-score   support

   negative      0.758     0.734     0.746       428
   positive      0.751     0.775     0.763       444

avg / total      0.755     0.755     0.754       872



# Max Entropy Lemmatized

In [19]:
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')

def lemmatized_unigrams_phi(tree):
    lemmatizer = WordNetLemmatizer()
    return Counter([lemmatizer.lemmatize(unigram) for unigram in tree.leaves()])

def lemmatized_bigrams_phi(tree):
    lemmatizer = WordNetLemmatizer()
    return Counter(emit_bigrams([lemmatizer.lemmatize(unigram) for unigram in tree.leaves()]))

[nltk_data] Downloading package wordnet to /Users/mwilber/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [17]:
_ = sst.experiment(
    lemmatized_bigrams_phi,
    fit_maxent_classifier,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func,
    view_errors=5) 

Accuracy: 0.763
             precision    recall  f1-score   support

   negative      0.771     0.734     0.752       428
   positive      0.755     0.791     0.772       444

avg / total      0.763     0.763     0.762       872

Error: A beguiling splash of pastel colors and prankish comedy from Disney . marked as negative but was positive
Error: The film serves as a valuable time capsule to remind us of the devastating horror suffered by an entire people . marked as negative but was positive
Error: ... an otherwise intense , twist-and-turn thriller that certainly should n't hurt talented young Gaghan 's resume . marked as negative but was positive
Error: A poignant , artfully crafted meditation on mortality . marked as negative but was positive
Error: Woody Allen 's latest is an ambling , broad comedy about all there is to love -- and hate -- about the movie biz . marked as negative but was positive


In [20]:
_ = sst.experiment(
    lemmatized_unigrams_phi,
    fit_maxent_classifier,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func,
    view_errors=5) 

Accuracy: 0.760
             precision    recall  f1-score   support

   negative      0.776     0.720     0.747       428
   positive      0.747     0.800     0.773       444

avg / total      0.761     0.760     0.760       872

Error: A beguiling splash of pastel colors and prankish comedy from Disney . marked as negative but was positive
Error: The film serves as a valuable time capsule to remind us of the devastating horror suffered by an entire people . marked as negative but was positive
Error: ... an otherwise intense , twist-and-turn thriller that certainly should n't hurt talented young Gaghan 's resume . marked as negative but was positive
Error: A poignant , artfully crafted meditation on mortality . marked as negative but was positive
Error: A rarity among recent Iranian films : It 's a comedy full of gentle humor that chides the absurdity of its protagonist 's plight . marked as negative but was positive


In [21]:
[WordNetLemmatizer().lemmatize(word) for word in 'A beguiling splash of pastel colors and prankish comedy from Disney'.split(' ')]

['A',
 'beguiling',
 'splash',
 'of',
 'pastel',
 'color',
 'and',
 'prankish',
 'comedy',
 'from',
 'Disney']

# Stemming

In [22]:
from nltk.stem import PorterStemmer

def stemmed_unigrams_phi(tree):
    stemmer = PorterStemmer()
    return Counter([stemmer.stem(unigram) for unigram in tree.leaves()])

def stemmed_bigrams_phi(tree):
    stemmer = PorterStemmer()
    return Counter(emit_bigrams([stemmer.stem(unigram) for unigram in tree.leaves()]))

In [23]:
_ = sst.experiment(
    stemmed_unigrams_phi,
    fit_maxent_classifier,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func,
    view_errors=5) 

Accuracy: 0.766
             precision    recall  f1-score   support

   negative      0.775     0.738     0.756       428
   positive      0.759     0.793     0.775       444

avg / total      0.766     0.766     0.766       872

Error: A beguiling splash of pastel colors and prankish comedy from Disney . marked as negative but was positive
Error: The film serves as a valuable time capsule to remind us of the devastating horror suffered by an entire people . marked as negative but was positive
Error: ... an otherwise intense , twist-and-turn thriller that certainly should n't hurt talented young Gaghan 's resume . marked as negative but was positive
Error: A rarity among recent Iranian films : It 's a comedy full of gentle humor that chides the absurdity of its protagonist 's plight . marked as negative but was positive
Error: Woody Allen 's latest is an ambling , broad comedy about all there is to love -- and hate -- about the movie biz . marked as negative but was positive


In [24]:
_ = sst.experiment(
    stemmed_bigrams_phi,
    fit_maxent_classifier,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func,
    view_errors=5) 

Accuracy: 0.789
             precision    recall  f1-score   support

   negative      0.798     0.764     0.780       428
   positive      0.781     0.813     0.797       444

avg / total      0.789     0.789     0.789       872

Error: A beguiling splash of pastel colors and prankish comedy from Disney . marked as negative but was positive
Error: ... an otherwise intense , twist-and-turn thriller that certainly should n't hurt talented young Gaghan 's resume . marked as negative but was positive
Error: A poignant , artfully crafted meditation on mortality . marked as negative but was positive
Error: Woody Allen 's latest is an ambling , broad comedy about all there is to love -- and hate -- about the movie biz . marked as negative but was positive
Error: It 's a stunning lyrical work of considerable force and truth . marked as negative but was positive


# Sentence Length

In [28]:
def sentence_length_phi(tree):
    return {'sentence_length': len(tree.leaves())}

stemmed_bigrams_sentence_length_phi = combine_phis(sentence_length_phi, stemmed_bigrams_phi)

In [29]:
_ = sst.experiment(
    stemmed_bigrams_sentence_length_phi,
    fit_maxent_classifier,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func,
    view_errors=5)

Accuracy: 0.790
             precision    recall  f1-score   support

   negative      0.797     0.769     0.782       428
   positive      0.784     0.811     0.797       444

avg / total      0.790     0.790     0.790       872

Error: A beguiling splash of pastel colors and prankish comedy from Disney . marked as negative but was positive
Error: ... an otherwise intense , twist-and-turn thriller that certainly should n't hurt talented young Gaghan 's resume . marked as negative but was positive
Error: Woody Allen 's latest is an ambling , broad comedy about all there is to love -- and hate -- about the movie biz . marked as negative but was positive
Error: It 's a stunning lyrical work of considerable force and truth . marked as negative but was positive
Error: The inhospitability of the land emphasizes the spare precision of the narratives and helps to give them an atavistic power , as if they were tales that had been handed down since the beginning of time . marked as negative but

# Word Length

In [34]:
def average_word_length_phi(tree):
    return Counter({'avg_word_length': sum(len(leaf) for leaf in tree.leaves()) / len(tree.leaves())})

stemmed_bigrams_sentence_word_length_phi = combine_phis(sentence_length_phi, stemmed_bigrams_phi, average_word_length_phi)

In [35]:
_ = sst.experiment(
    stemmed_bigrams_sentence_word_length_phi,
    fit_maxent_classifier,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func,
    view_errors=5)

Accuracy: 0.787
             precision    recall  f1-score   support

   negative      0.795     0.762     0.778       428
   positive      0.779     0.811     0.795       444

avg / total      0.787     0.787     0.787       872

Error: A beguiling splash of pastel colors and prankish comedy from Disney . marked as negative but was positive
Error: ... an otherwise intense , twist-and-turn thriller that certainly should n't hurt talented young Gaghan 's resume . marked as negative but was positive
Error: Woody Allen 's latest is an ambling , broad comedy about all there is to love -- and hate -- about the movie biz . marked as negative but was positive
Error: It 's a stunning lyrical work of considerable force and truth . marked as negative but was positive
Error: The inhospitability of the land emphasizes the spare precision of the narratives and helps to give them an atavistic power , as if they were tales that had been handed down since the beginning of time . marked as negative but

# Number of Capitalized Words

In [39]:
import string

def num_capitalized_words_phi(tree):
    return Counter({'num_capitalized_words': sum(1 for leaf in tree.leaves() if leaf[0] in string.ascii_uppercase)})

stemmed_bigrams_sentence_length_caps_phi = combine_phis(sentence_length_phi, stemmed_bigrams_phi, num_capitalized_words_phi)

In [40]:
_ = sst.experiment(
    stemmed_bigrams_sentence_length_caps_phi,
    fit_maxent_classifier,
    train_reader=sst.train_reader,  
    assess_reader=sst.dev_reader,   
    class_func=sst.binary_class_func,
    view_errors=5)

Accuracy: 0.790
             precision    recall  f1-score   support

   negative      0.794     0.773     0.783       428
   positive      0.787     0.806     0.796       444

avg / total      0.790     0.790     0.790       872

Error: A beguiling splash of pastel colors and prankish comedy from Disney . marked as negative but was positive
Error: The film serves as a valuable time capsule to remind us of the devastating horror suffered by an entire people . marked as negative but was positive
Error: ... an otherwise intense , twist-and-turn thriller that certainly should n't hurt talented young Gaghan 's resume . marked as negative but was positive
Error: Woody Allen 's latest is an ambling , broad comedy about all there is to love -- and hate -- about the movie biz . marked as negative but was positive
Error: It 's a stunning lyrical work of considerable force and truth . marked as negative but was positive
