# Homework and bake-off: Word similarity

In [38]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2020"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [Dataset readers](#Dataset-readers)
1. [Dataset comparisons](#Dataset-comparisons)
  1. [Vocab overlap](#Vocab-overlap)
  1. [Pair overlap and score correlations](#Pair-overlap-and-score-correlations)
1. [Evaluation](#Evaluation)
  1. [Dataset evaluation](#Dataset-evaluation)
  1. [Dataset error analysis](#Dataset-error-analysis)
  1. [Full evaluation](#Full-evaluation)
1. [Homework questions](#Homework-questions)
  1. [PPMI as a baseline [0.5 points]](#PPMI-as-a-baseline-[0.5-points])
  1. [Gigaword with LSA at different dimensions [0.5 points]](#Gigaword-with-LSA-at-different-dimensions-[0.5-points])
  1. [Gigaword with GloVe for a small number of iterations [0.5 points]](#Gigaword-with-GloVe-for-a-small-number-of-iterations-[0.5-points])
  1. [Dice coefficient [0.5 points]](#Dice-coefficient-[0.5-points])
  1. [t-test reweighting [2 points]](#t-test-reweighting-[2-points])
  1. [Enriching a VSM with subword information [2 points]](#Enriching-a-VSM-with-subword-information-[2-points])
  1. [Your original system [3 points]](#Your-original-system-[3-points])
1. [Bake-off [1 point]](#Bake-off-[1-point])

## Overview

Word similarity datasets have long been used to evaluate distributed representations. This notebook provides basic code for conducting such analyses with a number of datasets:

| Dataset | Pairs | Task-type | Current best Spearman $\rho$ | Best $\rho$ paper |
|---------|-------|-----------|------------------------------|-------------------|
| [WordSim-353](http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/) | 353 | Relatedness | 82.8 | [Speer et al. 2017](https://arxiv.org/abs/1612.03975) |
| [MTurk-771](http://www2.mta.ac.il/~gideon/mturk771.html) | 771 | Relatedness | 81.0 | [Speer et al. 2017](https://arxiv.org/abs/1612.03975) |
| [The MEN Test Collection](http://clic.cimec.unitn.it/~elia.bruni/MEN) | 3,000 | Relatedness | 86.6 | [Speer et al. 2017](https://arxiv.org/abs/1612.03975) |
| [SimVerb-3500-dev](http://people.ds.cam.ac.uk/dsg40/simverb.html) | 500 | Similarity | 61.1 | [Mrkšić et al. 2016](https://arxiv.org/pdf/1603.00892.pdf) |
| [SimVerb-3500-test](http://people.ds.cam.ac.uk/dsg40/simverb.html) | 3,000 | Similarity | 62.4 | [Mrkšić et al. 2016](https://arxiv.org/pdf/1603.00892.pdf) |

Each of the similarity datasets contains word pairs with an associated human-annotated similarity score. (We convert these to distances to align intuitively with our distance measure functions.) The evaluation code measures the distance between the word pairs in your chosen VSM (which should be a `pd.DataFrame`).

The evaluation metric for each dataset is the [Spearman correlation coefficient $\rho$](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient) between the annotated scores and your distances, as is standard in the literature. We also macro-average these correlations across the datasets for an overall summary. (In using the macro-average, we are saying that we care about all the datasets equally, even though they vary in size.)
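
As a rough illustration of why the similarity scores are negated, the sketch below uses made-up numbers (not drawn from any of the datasets). When a model agrees with the annotators, its distances are anti-correlated with the raw similarities and positively correlated with the negated ones.

```python
from scipy.stats import spearmanr

# Hypothetical human similarity scores for four word pairs (higher = more similar).
similarities = [9.0, 7.5, 3.0, 1.0]
# Hypothetical model distances for the same pairs (lower = more similar).
distances = [0.10, 0.25, 0.60, 0.90]

# Raw similarities rank the pairs in the opposite order from the distances.
rho_raw = spearmanr(similarities, distances)[0]

# Negating the similarities turns them into distance-like scores.
negated = [-s for s in similarities]
rho_negated = spearmanr(negated, distances)[0]

print(rho_raw, rho_negated)
```

Because Spearman $\rho$ depends only on ranks, the absolute scale of the scores and distances plays no role here.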

This homework ([questions at the bottom of this notebook](#Homework-questions)) asks you to write code that uses the count matrices in `data/vsmdata` to create and evaluate some baseline models as well as an original model $M$ that you design. This accounts for 9 of the 10 points for this assignment.

For the associated bake-off, we will distribute two new word similarity or relatedness datasets and associated reader code, and you will evaluate $M$ (no additional training or tuning allowed!) on those new datasets. Systems that enter will receive the additional homework point, and systems that achieve the top score will receive an additional 0.5 points.

## Set-up

In [39]:
from collections import defaultdict
import csv
import itertools
import numpy as np
import os
import pandas as pd
from scipy.stats import spearmanr
import vsm
from IPython.display import display

In [40]:
VSM_HOME = os.path.join('data', 'vsmdata')

WORDSIM_HOME = os.path.join('data', 'wordsim')

## Dataset readers

In [41]:
def wordsim_dataset_reader(
        src_filename, 
        header=False, 
        delimiter=',', 
        score_col_index=2):
    """Basic reader that works for all similarity datasets. They are 
    all tabular-style releases where the first two columns give the 
    words and a later column (`score_col_index`) gives the score.

    Parameters
    ----------
    src_filename : str
        Full path to the source file.
    header : bool
        Whether `src_filename` has a header. Default: False
    delimiter : str
        Field delimiter in `src_filename`. Default: ','
    score_col_index : int
        Column containing the similarity scores. Default: 2

    Yields
    ------
    (str, str, float)
       (w1, w2, score) where `score` is the negative of the similarity
       score in the file so that we are intuitively aligned with our
       distance-based code. To align with our VSMs, all the words are 
       downcased.

    """
    with open(src_filename) as f:
        reader = csv.reader(f, delimiter=delimiter)
        if header:
            next(reader)
        for row in reader:
            w1 = row[0].strip().lower()
            w2 = row[1].strip().lower()
            score = row[score_col_index]
            # Negative of scores to align intuitively with distance functions:
            score = -float(score)
            yield (w1, w2, score)

def wordsim353_reader():
    """WordSim-353: http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/"""
    src_filename = os.path.join(
        WORDSIM_HOME, 'wordsim353', 'combined.csv')
    return wordsim_dataset_reader(
        src_filename, header=True)

def mturk771_reader():
    """MTURK-771: http://www2.mta.ac.il/~gideon/mturk771.html"""
    src_filename = os.path.join(
        WORDSIM_HOME, 'MTURK-771.csv')
    return wordsim_dataset_reader(
        src_filename, header=False)

def simverb3500dev_reader():
    """SimVerb-3500: http://people.ds.cam.ac.uk/dsg40/simverb.html"""
    src_filename = os.path.join(
        WORDSIM_HOME, 'SimVerb-3500', 'SimVerb-500-dev.txt')
    return wordsim_dataset_reader(
        src_filename, delimiter="\t", header=False, score_col_index=3)

def simverb3500test_reader():
    """SimVerb-3500: http://people.ds.cam.ac.uk/dsg40/simverb.html"""
    src_filename = os.path.join(
        WORDSIM_HOME, 'SimVerb-3500', 'SimVerb-3000-test.txt')
    return wordsim_dataset_reader(
        src_filename, delimiter="\t", header=False, score_col_index=3)

def men_reader():
    """MEN: http://clic.cimec.unitn.it/~elia.bruni/MEN"""
    src_filename = os.path.join(
        WORDSIM_HOME, 'MEN', 'MEN_dataset_natural_form_full')
    return wordsim_dataset_reader(
        src_filename, header=False, delimiter=' ') 

This collection of readers will be useful for flexible evaluations:

In [42]:
READERS = (wordsim353_reader, mturk771_reader, simverb3500dev_reader, 
           simverb3500test_reader, men_reader)

## Dataset comparisons

This section does some basic analysis of the datasets. The goal is to obtain a deeper understanding of what problem we're solving – what strengths and weaknesses the datasets have and how they relate to each other. For a full-fledged project, we would want to continue work like this and report on it in the paper, to provide context for the results.

In [43]:
def get_reader_name(reader):
    """Return a cleaned-up name for the similarity dataset 
    iterator `reader`
    """
    return reader.__name__.replace("_reader", "")

### Vocab overlap

How many vocabulary items are shared across the datasets?

In [44]:
def get_reader_vocab(reader):
    """Return the set of words (str) in `reader`."""
    vocab = set()
    for w1, w2, _ in reader():
        vocab.add(w1)
        vocab.add(w2)
    return vocab

In [45]:
def get_reader_vocab_overlap(readers=READERS):
    """Get data on the vocab-level relationships between pairs of 
    readers. Returns a pd.DataFrame containing this information.
    """
    data = []
    for r1, r2 in itertools.product(readers, repeat=2):       
        v1 = get_reader_vocab(r1)
        v2 = get_reader_vocab(r2)
        d = {
            'd1': get_reader_name(r1),
            'd2': get_reader_name(r2),
            'overlap': len(v1 & v2), 
            'union': len(v1 | v2),
            'd1_size': len(v1),
            'd2_size': len(v2)}
        data.append(d)
    return pd.DataFrame(data)

In [46]:
vocab_overlap = get_reader_vocab_overlap()

In [47]:
def vocab_overlap_crosstab(vocab_overlap):
    """Return an intuitively formatted `pd.DataFrame` giving 
    vocab-overlap counts for all the datasets represented in 
    `vocab_overlap`, the output of `get_reader_vocab_overlap`.
    """        
    xtab = pd.crosstab(
        vocab_overlap['d1'], 
        vocab_overlap['d2'], 
        values=vocab_overlap['overlap'], 
        aggfunc=np.mean)
    # Blank out the upper right to reduce visual clutter:
    for i in range(0, xtab.shape[0]):
        for j in range(i+1, xtab.shape[1]):
            xtab.iloc[i, j] = ''        
    return xtab        

In [48]:
vocab_overlap_crosstab(vocab_overlap)

| d1 / d2 | men | mturk771 | simverb3500dev | simverb3500test | wordsim353 |
|---------|-----|----------|----------------|-----------------|------------|
| men | 751 | | | | |
| mturk771 | 230 | 1113.0 | | | |
| simverb3500dev | 23 | 67.0 | 536.0 | | |
| simverb3500test | 30 | 94.0 | 532.0 | 823.0 | |
| wordsim353 | 86 | 158.0 | 13.0 | 17.0 | 437.0 |


This looks reasonable. By design, the SimVerb dev and test sets have a lot of overlap. The other overlap numbers are pretty small, even adjusting for dataset size.
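
One way to make "adjusting for dataset size" concrete is to divide each overlap count by the size of the union of the two vocabularies (the Jaccard coefficient). The sketch below does this for two pairs of datasets, using counts copied from the crosstab; `jaccard_overlap` is a hypothetical helper, not part of the notebook's code.

```python
def jaccard_overlap(overlap, size1, size2):
    """Size-adjusted overlap: |A & B| / |A | B|."""
    union = size1 + size2 - overlap
    return overlap / union

# Counts copied from the crosstab: shared vocab, then each vocab's size.
men_mturk = jaccard_overlap(230, 751, 1113)
simverb_dev_test = jaccard_overlap(532, 536, 823)

print(round(men_mturk, 3), round(simverb_dev_test, 3))
```

The SimVerb dev/test pair shares well over half of its combined vocabulary, whereas MEN and MTurk-771 share only a small fraction of theirs.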

### Pair overlap and score correlations

How many word pairs are shared across datasets and, for shared pairs, what is the correlation between their scores? That is, do the datasets agree?

In [49]:
def get_reader_pairs(reader):
    """Return the set of alphabetically-sorted word (str) tuples 
    in `reader`, as a dict mapping each tuple to its score.
    """
    return {tuple(sorted([w1, w2])): score for w1, w2, score in reader()}

In [50]:
def get_reader_pair_overlap(readers=READERS):
    """Return a `pd.DataFrame` giving the number of overlapping 
    word-pairs in pairs of readers, along with the Spearman 
    correlations.
    """    
    data = []
    for r1, r2 in itertools.product(readers, repeat=2):
        if r1.__name__ != r2.__name__:
            d1 = get_reader_pairs(r1)
            d2 = get_reader_pairs(r2)
            overlap = []
            for p, s in d1.items():
                if p in d2:
                    overlap.append([s, d2[p]])
            if overlap:
                s1, s2 = zip(*overlap)
                rho = spearmanr(s1, s2)[0]
            else:
                rho = None
            # Canonical order for the pair:
            n1, n2 = sorted([get_reader_name(r1), get_reader_name(r2)])
            d = {
                'd1': n1,
                'd2': n2,
                'pair_overlap': len(overlap),
                'rho': rho}
            data.append(d)
    df = pd.DataFrame(data)
    df = df.sort_values(['pair_overlap','d1','d2'], ascending=False)
    # Return only every other row to avoid repeats:
    return df[::2].reset_index(drop=True)

In [51]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    display(get_reader_pair_overlap())

|   | d1 | d2 | pair_overlap | rho |
|---|----|----|--------------|-----|
| 0 | men | mturk771 | 11 | 0.592191 |
| 1 | men | wordsim353 | 5 | 0.7 |
| 2 | mturk771 | simverb3500test | 4 | 0.4 |
| 3 | men | simverb3500test | 2 | 1.0 |
| 4 | simverb3500dev | simverb3500test | 1 | |
| 5 | simverb3500test | wordsim353 | 0 | |
| 6 | simverb3500dev | wordsim353 | 0 | |
| 7 | mturk771 | wordsim353 | 0 | |
| 8 | mturk771 | simverb3500dev | 0 | |
| 9 | men | simverb3500dev | 0 | |


This looks reasonable: none of the datasets have a lot of overlapping pairs, so we don't have to worry too much about places where they give conflicting scores.

## Evaluation

This section builds up the evaluation code that you'll use for the homework and bake-off. For illustrations, I'll read in a VSM created from `data/vsmdata/giga_window5-scaled.csv.gz`:

In [52]:
giga5 = pd.read_csv(
    os.path.join(VSM_HOME, "giga_window5-scaled.csv.gz"), index_col=0)

### Dataset evaluation

In [53]:
def word_similarity_evaluation(reader, df, distfunc=vsm.cosine):
    """Word-similarity evaluation framework.
    
    Parameters
    ----------
    reader : iterator
        A reader for a word-similarity dataset. Just has to yield
        tuples (word1, word2, score).    
    df : pd.DataFrame
        The VSM being evaluated.        
    distfunc : function mapping vector pairs to floats.
        The measure of distance between vectors. Can also be 
        `vsm.euclidean`, `vsm.matching`, `vsm.jaccard`, as well as 
        any other float-valued function on pairs of vectors.    
        
    Raises
    ------
    ValueError
        If any word in `reader` is missing from `df.index`.
    
    Returns
    -------
    float, data
        `float` is the Spearman rank correlation coefficient between 
        the dataset scores and the similarity values obtained from 
        `df` using  `distfunc`. This evaluation is sensitive only to 
        rankings, not to absolute values.  `data` is a `pd.DataFrame` 
        with columns ['word1', 'word2', 'score', 'distance'].
        
    """
    data = []
    for w1, w2, score in reader():
        d = {'word1': w1, 'word2': w2, 'score': score}
        for w in [w1, w2]:
            if w not in df.index:
                raise ValueError(
                    "Word '{}' is in the similarity dataset {} but not in the "
                    "DataFrame, making this evaluation ill-defined. Please "
                    "switch to a DataFrame with an appropriate vocabulary.".
                    format(w, get_reader_name(reader))) 
        d['distance'] = distfunc(df.loc[w1], df.loc[w2])
        data.append(d)
    data = pd.DataFrame(data)
    rho, pvalue = spearmanr(data['score'].values, data['distance'].values)
    return rho, data
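
Since `distfunc` can be any float-valued function on pairs of vectors, it is easy to plug in a measure of your own. As a minimal sketch, here is a cosine distance written directly in NumPy; the name `cosine_distance_sketch` is hypothetical and this is not claimed to be how `vsm.cosine` is implemented.

```python
import numpy as np

def cosine_distance_sketch(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Vectors pointing the same way are at distance (approximately) 0.
parallel = cosine_distance_sketch([1.0, 2.0], [2.0, 4.0])
# Orthogonal vectors are at distance 1.
orthogonal = cosine_distance_sketch([1.0, 0.0], [0.0, 3.0])

print(parallel, orthogonal)
```

A function like this could be passed as `distfunc=cosine_distance_sketch` to `word_similarity_evaluation`.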

In [54]:
rho, eval_df = word_similarity_evaluation(men_reader, giga5)

In [55]:
rho

0.40375964105441753

In [56]:
eval_df.head()

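|   | distance | score | word1 | word2 |
|---|----------|-------|-------|-------|
| 0 | 0.956828 | -50.0 | sun | sunlight |
| 1 | 0.979143 | -50.0 | automobile | car |
| 2 | 0.970105 | -49.0 | river | water |
| 3 | 0.980475 | -49.0 | stairs | staircase |
| 4 | 0.963624 | -49.0 | morning | sunrise |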


### Dataset error analysis

For error analysis, we can look at the word pairs with the largest delta between the gold score and the distance value in our VSM. We do these comparisons based on ranks, just as with our primary metric (Spearman $\rho$), and we normalize both rankings so that they have a comparable number of levels.

In [57]:
def word_similarity_error_analysis(eval_df):
    """Add rank-based error columns to `eval_df` (modified in place)
    and return it sorted from smallest to largest error.
    """
    eval_df['distance_rank'] = _normalized_ranking(eval_df['distance'])
    eval_df['score_rank'] = _normalized_ranking(eval_df['score'])
    eval_df['error'] = abs(eval_df['distance_rank'] - eval_df['score_rank'])
    return eval_df.sort_values('error')


def _normalized_ranking(series):
    """Dense ranks of `series`, scaled so that they sum to 1."""
    ranks = series.rank(method='dense')
    return ranks / ranks.sum()
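
To see what the normalized dense ranking does, here is a small sketch on made-up distances. Tied values share a rank, the ranks have no gaps, and dividing by the sum of the ranks puts two different rankings on a comparable scale.

```python
import pandas as pd

# Hypothetical distances for four word pairs; two of them are tied.
distances = pd.Series([0.9, 0.5, 0.5, 0.1])

# Dense ranking: ties share a rank and the next rank follows immediately.
ranks = distances.rank(method='dense')

# Scale the ranks so that they sum to 1.
normalized = ranks / ranks.sum()

print(ranks.tolist(), normalized.tolist())
```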

Best predictions:

In [58]:
word_similarity_error_analysis(eval_df).head()

|   | distance | score | word1 | word2 | distance_rank | score_rank | error |
|---|----------|-------|-------|-------|---------------|------------|-------|
| 1041 | 0.975007 | -32.0 | hummingbird | pelican | 0.000243 | 0.000244 | 2.434543e-07 |
| 2315 | 0.980834 | -13.0 | lily | pigs | 0.000488 | 0.000487 | 4.016842e-07 |
| 2951 | 0.983473 | -4.0 | bucket | girls | 0.000602 | 0.000603 | 4.151568e-07 |
| 150 | 0.96869 | -43.0 | night | sunset | 0.000102 | 0.000103 | 6.520315e-07 |
| 2062 | 0.979721 | -17.0 | oak | petals | 0.000435 | 0.000436 | 7.162632e-07 |


Worst predictions:

In [59]:
word_similarity_error_analysis(eval_df).tail()

|   | distance | score | word1 | word2 | distance_rank | score_rank | error |
|---|----------|-------|-------|-------|---------------|------------|-------|
| 67 | 0.984622 | -45.0 | branch | twigs | 0.00063 | 7.7e-05 | 0.000553 |
| 190 | 0.987704 | -43.0 | birds | stork | 0.000657 | 0.000103 | 0.000554 |
| 185 | 0.990993 | -43.0 | bloom | tulip | 0.000663 | 0.000103 | 0.000561 |
| 167 | 0.99176 | -43.0 | bloom | blossom | 0.000664 | 0.000103 | 0.000561 |
| 198 | 0.992406 | -43.0 | bloom | rose | 0.000664 | 0.000103 | 0.000561 |


### Full evaluation

A full evaluation is just a loop over all the readers on which one wants to evaluate, with a macro-average at the end:

In [60]:
def full_word_similarity_evaluation(df, readers=READERS, distfunc=vsm.cosine):
    """Evaluate a VSM against all datasets in `readers`.
    
    Parameters
    ----------
    df : pd.DataFrame
    readers : tuple 
        The similarity dataset readers on which to evaluate.
    distfunc : function mapping vector pairs to floats.
        The measure of distance between vectors. Can also be 
        `vsm.euclidean`, `vsm.matching`, `vsm.jaccard`, as well as 
        any other float-valued function on pairs of vectors.    
    
    Returns
    -------
    pd.Series
        Mapping dataset names to Spearman r values.
        
    """        
    scores = {}     
    for reader in readers:
        score, data_df = word_similarity_evaluation(reader, df, distfunc=distfunc)
        scores[get_reader_name(reader)] = score
    series = pd.Series(scores, name='Spearman r')
    series['Macro-average'] = series.mean()
    return series

In [61]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    display(full_word_similarity_evaluation(giga5))

wordsim353         0.327831
mturk771           0.143146
simverb3500dev    -0.065020
simverb3500test   -0.066314
men                0.403760
Macro-average      0.148681
Name: Spearman r, dtype: float64
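
The macro-average is simply the unweighted mean of the per-dataset correlations. The sketch below recomputes it from the `giga5` scores shown in the output and, purely for contrast, also computes a size-weighted mean using the pair counts from the Overview table. The weighted variant is an illustration only; it is not part of the evaluation.

```python
import pandas as pd

# Per-dataset Spearman correlations copied from the giga5 output.
scores = pd.Series({
    'wordsim353': 0.327831,
    'mturk771': 0.143146,
    'simverb3500dev': -0.065020,
    'simverb3500test': -0.066314,
    'men': 0.403760})

# Macro-average: every dataset counts equally, regardless of its size.
macro_average = scores.mean()

# For contrast only: weight each dataset by its number of pairs.
pair_counts = pd.Series({
    'wordsim353': 353,
    'mturk771': 771,
    'simverb3500dev': 500,
    'simverb3500test': 3000,
    'men': 3000})
weighted_average = (scores * pair_counts).sum() / pair_counts.sum()

print(round(macro_average, 6), round(weighted_average, 6))
```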

## Homework questions

Please embed your homework responses in this notebook, and do not delete any cells from the notebook. (You are free to add as many cells as you like as part of your responses.)

### PPMI as a baseline [0.5 points]

The insight behind PPMI is a recurring theme in word representation learning, so it is a natural baseline for our task. For this question, write a function called `run_giga_ppmi_baseline` that does the following:

1. Reads the Gigaword count matrix with a window of 20 and a flat scaling function into a `pd.DataFrame`, as is done in the VSM notebooks. The file is `data/vsmdata/giga_window20-flat.csv.gz`, and the VSM notebooks provide examples of the needed code.

1. Reweights this count matrix with PPMI.

1. Evaluates this reweighted matrix using `full_word_similarity_evaluation`. The return value of `run_giga_ppmi_baseline` should be the return value of this call to `full_word_similarity_evaluation`.

The goal of this question is to help you get more familiar with the code in `vsm` and the function `full_word_similarity_evaluation`.

The function `test_run_giga_ppmi_baseline` can be used to test that you've implemented this specification correctly.
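
For intuition about the reweighting step, here is a minimal NumPy sketch of PPMI on a toy count matrix. It assumes a dense matrix with no all-zero rows or columns, `ppmi_sketch` is a hypothetical name, and it is not claimed to match the internals of `vsm.pmi`.

```python
import numpy as np

def ppmi_sketch(X):
    """Positive PMI for a dense count matrix (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    total = X.sum()
    row_totals = X.sum(axis=1, keepdims=True)
    col_totals = X.sum(axis=0, keepdims=True)
    # Expected count for each cell if rows and columns were independent.
    expected = row_totals @ col_totals / total
    # Zero counts give log(0) = -inf; silence the warning and clip below.
    with np.errstate(divide='ignore'):
        pmi = np.log(X / expected)
    # PPMI replaces every negative PMI value (including -inf) with 0.
    return np.maximum(pmi, 0.0)

X = np.array([
    [4., 4., 2., 0.],
    [4., 61., 8., 18.],
    [2., 8., 10., 0.],
    [0., 18., 0., 5.]])

ppmi = ppmi_sketch(X)
print(ppmi.round(3))
```

Cells with a count of zero, and cells that occur less often than chance would predict, all end up at 0, so only positive associations survive.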

In [26]:
def run_giga_ppmi_baseline():
    ##### YOUR CODE HERE
    giga20 = pd.read_csv(os.path.join(VSM_HOME, 'giga_window20-flat.csv.gz'), index_col=0)
    giga20_ppmi = vsm.pmi(giga20, positive=True)
    results = full_word_similarity_evaluation(giga20_ppmi)
    display(results)
    return results

In [27]:
def test_run_giga_ppmi_baseline(run_giga_ppmi_baseline):
    result = run_giga_ppmi_baseline()
    ws_result = result.loc['wordsim353'].round(2)
    ws_expected = 0.58
    assert ws_result == ws_expected, \
        "Expected wordsim353 value of {}; got {}".format(ws_expected, ws_result)

In [28]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_run_giga_ppmi_baseline(run_giga_ppmi_baseline)

wordsim353         0.582573
mturk771           0.495228
simverb3500dev     0.231279
simverb3500test    0.158934
men                0.624885
Macro-average      0.418580
Name: Spearman r, dtype: float64

### Gigaword with LSA at different dimensions [0.5 points]

We might expect PPMI and LSA to form a solid pipeline that combines the strengths of PPMI with those of dimensionality reduction. However, LSA has a hyper-parameter $k$ – the dimensionality of the final representations – that will impact performance. For this problem, write a wrapper function `run_ppmi_lsa_pipeline` that does the following:

1. Takes as input a count `pd.DataFrame` and an LSA parameter `k`.
1. Reweights the count matrix with PPMI.
1. Applies LSA with dimensionality `k`.
1. Evaluates this reduced matrix using `full_word_similarity_evaluation`. The return value of `run_ppmi_lsa_pipeline` should be the return value of this call to `full_word_similarity_evaluation`.

The goal of this question is to help you get a feel for how much LSA alone can contribute to this problem. 

The function `test_run_ppmi_lsa_pipeline` will test your function on the count matrix in `data/vsmdata/giga_window20-flat.csv.gz`.
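
For intuition about the role of `k`, LSA can be sketched as a truncated singular value decomposition. The code below is a minimal NumPy version under that assumption; `lsa_sketch` is a hypothetical name, and whether `vsm.lsa` scales its output in exactly this way is not something this sketch establishes.

```python
import numpy as np

def lsa_sketch(X, k):
    """Keep the top-k left singular vectors, scaled by their singular values."""
    U, s, Vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    return U[:, :k] * s[:k]

X = np.array([
    [4., 4., 2., 0.],
    [4., 61., 8., 18.],
    [2., 8., 10., 0.],
    [0., 18., 0., 5.]])

# Two dimensions per row instead of four.
reduced = lsa_sketch(X, k=2)
# With k equal to the full dimensionality, row-by-row dot products are preserved.
full = lsa_sketch(X, k=4)

print(reduced.shape, np.allclose(full @ full.T, X @ X.T))
```

Smaller values of `k` discard the directions with the smallest singular values, which is where the trade-off between compression and lost information comes from.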

In [29]:
def run_ppmi_lsa_pipeline(count_df, k):
    
    ##### YOUR CODE HERE
    counts_ppmi = vsm.pmi(count_df, positive=True)
    counts_ppmi_lsa = vsm.lsa(counts_ppmi, k=k)
    results = full_word_similarity_evaluation(counts_ppmi_lsa)
    display(results)
    return results

In [30]:
def test_run_ppmi_lsa_pipeline(run_ppmi_lsa_pipeline):
    giga20 = pd.read_csv(
        os.path.join(VSM_HOME, "giga_window20-flat.csv.gz"), index_col=0)
    results = run_ppmi_lsa_pipeline(giga20, k=10)
    men_expected = 0.57
    men_result = results.loc['men'].round(2)
    assert men_result == men_expected,\
        "Expected men value of {}; got {}".format(men_expected, men_result)

In [31]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_run_ppmi_lsa_pipeline(run_ppmi_lsa_pipeline)

wordsim353         0.432193
mturk771           0.416597
simverb3500dev     0.183850
simverb3500test    0.133256
men                0.566648
Macro-average      0.346509
Name: Spearman r, dtype: float64

### Gigaword with GloVe for a small number of iterations [0.5 points]

Ideally, we would run GloVe for a very large number of iterations on a GPU machine to compare it against its close cousin PMI. However, we don't want this homework to cost you a lot of money or monopolize a lot of your available computing resources, so let's instead just probe GloVe a little bit to see if it has promise for our task. For this problem, write a function `run_small_glove_evals` that does the following:

1. Reads in `data/vsmdata/giga_window20-flat.csv.gz`.
1. Runs GloVe for 10, 100, and 200 iterations on `data/vsmdata/giga_window20-flat.csv.gz`, using the `mittens` implementation of `GloVe`. 
  * For all the other parameters to `mittens.GloVe` besides `max_iter`, use the package's defaults.
  * Because of the way that implementation is designed, these will have to be separate runs, but they should be relatively quick. 
1. Stores the values in a `dict` mapping each `max_iter` value to its associated 'Macro-average' score according to `full_word_similarity_evaluation`. `run_small_glove_evals`  should return this `dict`.

The trend should give you a sense for whether it is worth running GloVe for more iterations.

Some implementation notes:

* Your trained GloVe matrix `X` needs to be wrapped in a `pd.DataFrame` to work with `full_word_similarity_evaluation`. `pd.DataFrame(X, index=giga20.index)` will do the trick.

* If `glv` is your GloVe model, then running `glv.sess.close()` after each model is trained will silence warnings from TensorFlow about interactive sessions being active.

Performance will vary a lot for this function, so there is some uncertainty in the testing, but `test_run_small_glove_evals` will at least check that you wrote a function with the right general logic.

In [32]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    !pip install mittens

Collecting mittens
  Downloading mittens-0.2-py3-none-any.whl (15 kB)
Installing collected packages: mittens
Successfully installed mittens-0.2


In [33]:
def run_small_glove_evals(iters=(10, 100, 200)):

    from mittens import GloVe
    
    ##### YOUR CODE HERE
    giga20 = pd.read_csv(os.path.join(VSM_HOME, "giga_window20-flat.csv.gz"), index_col=0)
    results = {}
    for max_iter in iters:
        model = GloVe(max_iter=max_iter)
        glove_matrix = model.fit(giga20.values)
        glove_df = pd.DataFrame(glove_matrix, index=giga20.index)
        similarity_eval = full_word_similarity_evaluation(glove_df)
        display(similarity_eval)
        results[max_iter] = similarity_eval["Macro-average"]
    
    return results

In [34]:
def test_run_small_glove_evals(run_small_glove_evals):
    data = run_small_glove_evals()
    for max_iter in (10, 100, 200):
        assert max_iter in data
        assert isinstance(data[max_iter], float)

In [35]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_run_small_glove_evals(run_small_glove_evals)

Iteration 10: error 14593035.5112

wordsim353         0.058827
mturk771           0.150674
simverb3500dev    -0.048756
simverb3500test   -0.044960
men                0.136597
Macro-average      0.050476
Name: Spearman r, dtype: float64

Iteration 100: error 2711866.4103

wordsim353         0.161448
mturk771           0.240650
simverb3500dev    -0.006228
simverb3500test    0.006953
men                0.270360
Macro-average      0.134637
Name: Spearman r, dtype: float64

Iteration 200: error 1951997.8272

wordsim353         0.243787
mturk771           0.313898
simverb3500dev     0.044016
simverb3500test    0.029366
men                0.375409
Macro-average      0.201295
Name: Spearman r, dtype: float64

### Dice coefficient [0.5 points]

Implement the Dice coefficient for real-valued vectors, as

$$
\textbf{dice}(u, v) = 
1 - \frac{
  2 \sum_{i=1}^{n}\min(u_{i}, v_{i})
}{
    \sum_{i=1}^{n} (u_{i} + v_{i})
}$$
 
You can use `test_dice_implementation` below to check that your implementation is correct.
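
As a worked example, take $u = (4, 4, 2, 0)$ and $v = (4, 61, 8, 18)$. The element-wise minima are $(4, 4, 2, 0)$, the entries of $u$ sum to 10, and the entries of $v$ sum to 91, so

$$
\textbf{dice}(u, v) = 
1 - \frac{2 \cdot (4 + 4 + 2 + 0)}{10 + 91} = 
1 - \frac{20}{101} \approx 0.80198
$$

This agrees with the first assertion in `test_dice_implementation`.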

In [36]:
def test_dice_implementation(func):
    """`func` should be an implementation of `dice` as defined above."""
    X = np.array([
        [  4.,   4.,   2.,   0.],
        [  4.,  61.,   8.,  18.],
        [  2.,   8.,  10.,   0.],
        [  0.,  18.,   0.,   5.]]) 
    assert func(X[0], X[1]).round(5) == 0.80198
    assert func(X[1], X[2]).round(5) == 0.67568

In [37]:
def dice(u, v):
    ##### YOUR CODE HERE
    numerator = 2*np.sum(np.minimum(u, v))
    denominator = np.sum(u) + np.sum(v)
    return 1 - numerator/denominator

In [38]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_dice_implementation(dice)

### t-test reweighting [2 points]



The t-test statistic can be thought of as a reweighting scheme. For a count matrix $X$, row index $i$, and column index $j$:

$$\textbf{ttest}(X, i, j) = 
\frac{
    P(X, i, j) - \big(P(X, i, *)P(X, *, j)\big)
}{
\sqrt{(P(X, i, *)P(X, *, j))}
}$$

where $P(X, i, j)$ is $X_{ij}$ divided by the sum of all values in $X$, $P(X, i, *)$ is the sum of the values in row $i$ of $X$ divided by the sum of all values in $X$, and $P(X, *, j)$ is the sum of the values in column $j$ of $X$ divided by the sum of all values in $X$.

For this problem, implement this reweighting scheme. You can use `test_ttest_implementation` below to check that your implementation is correct. You do not need to use this for any evaluations, though we hope you will be curious enough to do so!
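
As a worked example, consider the count matrix used in `test_ttest_implementation`. Its entries sum to 144, its first row and first column each sum to 10, and its top-left cell is 4. For that cell (taking $i = j = 1$):

$$
\textbf{ttest}(X, 1, 1) = 
\frac{
    \frac{4}{144} - \frac{10}{144} \cdot \frac{10}{144}
}{
    \sqrt{\frac{10}{144} \cdot \frac{10}{144}}
} = \frac{4}{10} - \frac{10}{144} \approx 0.33056
$$

This matches the top-left value of the expected matrix in the test.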

In [39]:
def test_ttest_implementation(func):
    """`func` should be an implementation of t-test reweighting as 
    defined above.
    """
    X = pd.DataFrame(np.array([
        [  4.,   4.,   2.,   0.],
        [  4.,  61.,   8.,  18.],
        [  2.,   8.,  10.,   0.],
        [  0.,  18.,   0.,   5.]]))    
    actual = np.array([
        [ 0.33056, -0.07689,  0.04321, -0.10532],
        [-0.07689,  0.03839, -0.10874,  0.07574],
        [ 0.04321, -0.10874,  0.36111, -0.14894],
        [-0.10532,  0.07574, -0.14894,  0.05767]])    
    predicted = func(X)
    assert np.array_equal(predicted.round(5), actual)

In [40]:
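def ttest(df):
    ##### YOUR CODE HERE
    x = df.values
    div = x.sum()
    
    # Normalized matrix: P(X, i, j)
    P_x_i_j = x / div
    
    # Normalized row sums: P(X, i, *)
    P_x_i_star = np.sum(x, axis=1) / div

    # Normalized col sums: P(X, *, j)
    P_x_star_j = np.sum(x, axis=0) / div
    
    # Outer product of row sums and col sums
    product = np.outer(P_x_i_star, P_x_star_j)

    # Return a DataFrame so that the result keeps the vocabulary of `df`
    # and can be passed to `full_word_similarity_evaluation`.
    return pd.DataFrame(
        (P_x_i_j - product) / np.sqrt(product),
        index=df.index, columns=df.columns)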

In [41]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_ttest_implementation(ttest)

### Enriching a VSM with subword information [2 points]

It might be useful to combine character-level information with word-level information. To help you begin assessing this idea, this question asks you to write a function that modifies an existing VSM so that the representation for each word $w$ is the element-wise sum of $w$'s original word-level representation with all the representations for the n-grams $w$ contains. 

The following starter code should help you structure this and clarify the requirements, and a simple test is included below as well.

You don't need to write a lot of code; the motivation for this question is that the function you write could have practical value.
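
To make the target computation concrete, here is a self-contained sketch on a four-word toy vocabulary. It assumes that words are padded with `<w>` and `</w>` boundary symbols and that each n-gram's vector is the sum of the vectors of the words it occurs in. `char_ngrams_sketch` is a hypothetical helper, not a function from `vsm`.

```python
import numpy as np

def char_ngrams_sketch(word, n):
    """Character n-grams of `word`, with <w> and </w> as boundary symbols."""
    chars = ['<w>'] + list(word) + ['</w>']
    return [''.join(chars[i:i + n]) for i in range(len(chars) - n + 1)]

# Toy word-level VSM: each word maps to its count vector.
vocab = {
    'ABCD': np.array([1, 1, 2, 1]),
    'BCDA': np.array([3, 4, 2, 4]),
    'CDAB': np.array([0, 0, 1, 0]),
    'DABC': np.array([1, 0, 0, 0])}

# Each n-gram's vector is the sum of the vectors of the words it occurs in.
ngram_vecs = {}
for w, vec in vocab.items():
    for ng in char_ngrams_sketch(w, 2):
        ngram_vecs[ng] = ngram_vecs.get(ng, 0) + vec

# Enriched vector: the word's own vector plus the vectors of its n-grams.
ngrams = char_ngrams_sketch('ABCD', n=2)
enriched = vocab['ABCD'] + sum(ngram_vecs[ng] for ng in ngrams)

print(ngrams, enriched)
```

The dimensionality is unchanged: the enriched vector for `ABCD` has four entries, just like the original.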

In [42]:
def subword_enrichment(df, n=4):
    
    # 1. Use `vsm.ngram_vsm` to create a character-level 
    # VSM from `df`, using the above parameter `n` to 
    # set the size of the ngrams.
    
    ##### YOUR CODE HERE
    df_ngrams = vsm.ngram_vsm(df, n=n)
    print(df_ngrams)

        
    # 2. Use `vsm.character_level_rep` to get the representation
    # for every word in `df` according to the character-level
    # VSM you created above.
    
    ##### YOUR CODE HERE
    char_reps = {}
    for word in df.index:
        char_reps[word] = vsm.character_level_rep(word, df_ngrams, n=n)
    
    # 3. For each representation created at step 2, add in its
    # original representation from `df`. (This should use
    # element-wise addition; the dimensionality of the vectors
    # will be unchanged.)
                            
    ##### YOUR CODE HERE
    for word in df.index:
        char_reps[word] += df.loc[word].values
        print(char_reps[word])

    
    # 4. Return a `pd.DataFrame` with the same index and column
    # values as `df`, but filled with the new representations
    # created at step 3.
                            
    ##### YOUR CODE HERE
    ret_df = df.copy()
    for word in df.index:
        ret_df.loc[word] = char_reps[word]
        
    return ret_df



In [43]:
def test_subword_enrichment(func):
    """`func` should be an implementation of subword_enrichment as 
    defined above.
    """
    vocab = ["ABCD", "BCDA", "CDAB", "DABC"]
    df = pd.DataFrame([
        [1, 1, 2, 1],
        [3, 4, 2, 4],
        [0, 0, 1, 0],
        [1, 0, 0, 0]], index=vocab)
    expected = pd.DataFrame([
        [14, 14, 18, 14],
        [22, 26, 18, 26],
        [10, 10, 14, 10],
        [14, 10, 10, 10]], index=vocab)
    new_df = func(df, n=2)
    assert np.array_equal(expected.columns, new_df.columns), \
        "Columns are not the same"
    assert np.array_equal(expected.index, new_df.index), \
        "Indices are not the same"
    assert np.array_equal(expected.values, new_df.values), \
        "Co-occurrence values aren't the same"    

In [44]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    test_subword_enrichment(subword_enrichment)

       0  1  2  3
<w>A   1  1  2  1
AB     2  1  3  1
BC     5  5  4  5
CD     4  5  5  5
D</w>  1  1  2  1
<w>B   3  4  2  4
DA     4  4  3  4
A</w>  3  4  2  4
<w>C   0  0  1  0
B</w>  0  0  1  0
<w>D   1  0  0  0
C</w>  1  0  0  0
[14 14 18 14]
[22 26 18 26]
[10 10 14 10]
[14 10 10 10]


### Your original system [3 points]

This question asks you to design your own model. You can of course include steps made above (ideally, the above questions informed your system design!), but your model should not be literally identical to any of the above models. Other ideas: retrofitting, autoencoders, GloVe, subword modeling, ... 

Requirements:

1. Your code must operate on one of the count matrices in `data/vsmdata`. You can choose which one. __Other pretrained vectors cannot be introduced__.

1. Your code must be self-contained, so that we can work with your model directly in your homework submission notebook. If your model depends on external data or other resources, please submit a ZIP archive containing these resources along with your submission.

In the cell below, please provide a brief technical description of your original system, so that the teaching team can gain an understanding of what it does. This will help us to understand your code and analyze all the submissions to identify patterns and strategies.

### PPMI

In [48]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    giga20 = pd.read_csv(os.path.join(VSM_HOME, 'giga_window20-flat.csv.gz'), index_col=0)
    giga20_ppmi = vsm.pmi(giga20, positive=True)
    print("giga20_ppmi")
    display(full_word_similarity_evaluation(giga20_ppmi))

giga20_ppmi


wordsim353         0.582573
mturk771           0.495228
simverb3500dev     0.231279
simverb3500test    0.158934
men                0.624885
Macro-average      0.418580
Name: Spearman r, dtype: float64
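
For reference, here is a small NumPy sketch of positive PMI reweighting. It assumes that `vsm.pmi(..., positive=True)` follows the standard definition (log of observed over expected counts, with undefined and negative values mapped to 0). The function name `ppmi` is illustrative and this is not the course implementation.

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information for a co-occurrence matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row_sums = counts.sum(axis=1, keepdims=True)
    col_sums = counts.sum(axis=0, keepdims=True)
    # Expected count of each cell if rows and columns were independent.
    expected = row_sums @ col_sums / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    # log(0) and 0/0 cells carry no evidence: treat them as 0.
    pmi[~np.isfinite(pmi)] = 0.0
    # Keep only positive associations.
    return np.maximum(pmi, 0.0)

toy_counts = [[4, 1],
              [1, 4]]
print(ppmi(toy_counts))
```

In the toy matrix, the diagonal cells occur more often than chance and keep a positive score, while the off-diagonal cells have negative PMI and are clipped to 0.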

### PPMI + LSA

In [51]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    print("giga20_ppmi_lsa")
    for k in (5, 10, 20, 50, 100):
        giga20_ppmi_lsa = vsm.lsa(giga20_ppmi, k=k)
        print("========", k, "========")
        display(full_word_similarity_evaluation(giga20_ppmi_lsa))


giga20_ppmi_lsa


wordsim353         0.328601
mturk771           0.339240
simverb3500dev     0.166079
simverb3500test    0.123667
men                0.470291
Macro-average      0.285575
Name: Spearman r, dtype: float64



wordsim353         0.432193
mturk771           0.416597
simverb3500dev     0.183850
simverb3500test    0.133256
men                0.566648
Macro-average      0.346509
Name: Spearman r, dtype: float64



wordsim353         0.489217
mturk771           0.444573
simverb3500dev     0.195391
simverb3500test    0.133209
men                0.607706
Macro-average      0.374019
Name: Spearman r, dtype: float64



wordsim353         0.535330
mturk771           0.472596
simverb3500dev     0.214763
simverb3500test    0.146140
men                0.641217
Macro-average      0.402009
Name: Spearman r, dtype: float64



wordsim353         0.555447
mturk771           0.487073
simverb3500dev     0.233373
simverb3500test    0.160549
men                0.655327
Macro-average      0.418354
Name: Spearman r, dtype: float64

In [60]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    print("giga20_ppmi_lsa")
    for k in (200, 500, 1000):
        giga20_ppmi_lsa = vsm.lsa(giga20_ppmi, k=k)
        print("========", k, "========")
        display(full_word_similarity_evaluation(giga20_ppmi_lsa))

giga20_ppmi_lsa


wordsim353         0.571024
mturk771           0.501608
simverb3500dev     0.239609
simverb3500test    0.168402
men                0.656537
Macro-average      0.427436
Name: Spearman r, dtype: float64



wordsim353         0.580265
mturk771           0.511601
simverb3500dev     0.228796
simverb3500test    0.168358
men                0.642052
Macro-average      0.426214
Name: Spearman r, dtype: float64



wordsim353         0.581723
mturk771           0.511432
simverb3500dev     0.226292
simverb3500test    0.164554
men                0.633490
Macro-average      0.423498
Name: Spearman r, dtype: float64

In [70]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    giga20_ppmi_lsa = vsm.lsa(giga20_ppmi, k=100)
    display(full_word_similarity_evaluation(giga20_ppmi_lsa))

wordsim353         0.555447
mturk771           0.487073
simverb3500dev     0.233373
simverb3500test    0.160549
men                0.655327
Macro-average      0.418354
Name: Spearman r, dtype: float64
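
In the sweeps above, the macro-average rises with `k`, peaks at `k=200` (0.427436), and at `k=100` (0.418354) is roughly level with plain PPMI (0.418580). The reduction itself can be sketched as a truncated SVD. This is a minimal NumPy version that assumes `vsm.lsa` does something along these lines; the function name `lsa_sketch` is illustrative.

```python
import numpy as np

def lsa_sketch(X, k):
    """Keep the top-k left singular vectors, scaled by their singular values."""
    X = np.asarray(X, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
reduced = lsa_sketch(X, k=2)
print(reduced.shape)
```

With `k` equal to the full rank, the row-by-row dot products of the reduced matrix match those of the original; smaller `k` keeps only the directions with the largest singular values.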

### L2 Norm

In [53]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    giga20_l2 = giga20.apply(vsm.length_norm, axis=1)
    display(full_word_similarity_evaluation(giga20_l2))

wordsim353         0.176563
mturk771           0.196009
simverb3500dev    -0.012232
simverb3500test    0.046487
men                0.215239
Macro-average      0.124413
Name: Spearman r, dtype: float64
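
Row-wise L2 normalization rescales each vector to unit length. Cosine distance already ignores vector length, so, assuming the evaluation uses cosine distance (as the bake-off cell below does with `distfunc=vsm.cosine`), this score is effectively that of the raw counts. The plain-Python sketch below illustrates the invariance; both helper functions are illustrative stand-ins, not the `vsm` implementations.

```python
import math

def length_norm(vec):
    """Divide a vector by its L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

u = [3.0, 4.0, 0.0]
v = [1.0, 0.0, 2.0]
before = cosine_distance(u, v)
after = cosine_distance(length_norm(u), length_norm(v))
print(before, after)
```

The two distances agree up to floating-point error, which is why normalization alone does not change a cosine-based ranking.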

### Autoencoder

In [56]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    !pip install torch



In [69]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    import torch
    from torch_autoencoder import TorchAutoencoder

    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"

    print("Using device: {}".format(device))
    for hidden_dim in (50, 150, 300, 500):
        for l2_strength in (0.0, 0.5, 1.0):
            print("Dim: {}, l2: {}".format(hidden_dim, l2_strength))
            ae = TorchAutoencoder(max_iter=100, hidden_dim=hidden_dim, eta=0.1, l2_strength=l2_strength, device=device)
            giga20_ppmi_ae = ae.fit(giga20_ppmi)
            display(full_word_similarity_evaluation(giga20_ppmi_ae))

Finished epoch 1 of 100; error is 3.0698821544647217

Using device: cuda
Dim: 50, l2: 0.0


Finished epoch 100 of 100; error is 0.17561252415180206

wordsim353         0.332852
mturk771           0.294596
simverb3500dev     0.119231
simverb3500test    0.103771
men                0.358365
Macro-average      0.241763
Name: Spearman r, dtype: float64

Dim: 50, l2: 0.5


  dist = 1.0 - uv / np.sqrt(uu * vv)


wordsim353         0.440495
mturk771           0.366970
simverb3500dev     0.199323
simverb3500test    0.127592
men                0.474385
Macro-average      0.321753
Name: Spearman r, dtype: float64

Dim: 50, l2: 1.0


Finished epoch 100 of 100; error is 0.24019087851047516

wordsim353         0.536076
mturk771           0.407135
simverb3500dev     0.178165
simverb3500test    0.127424
men                0.535160
Macro-average      0.356792
Name: Spearman r, dtype: float64

Dim: 150, l2: 0.0


Finished epoch 100 of 100; error is 0.1610780507326126

wordsim353         0.325914
mturk771           0.265512
simverb3500dev     0.076332
simverb3500test    0.076491
men                0.426522
Macro-average      0.234154
Name: Spearman r, dtype: float64

Dim: 150, l2: 0.5


Finished epoch 100 of 100; error is 0.24584408104419708

wordsim353         0.420318
mturk771           0.279132
simverb3500dev     0.138606
simverb3500test    0.060468
men                0.446730
Macro-average      0.269051
Name: Spearman r, dtype: float64

Dim: 150, l2: 1.0


Finished epoch 100 of 100; error is 0.24325376749038696

wordsim353         0.390269
mturk771           0.245672
simverb3500dev     0.120390
simverb3500test    0.037251
men                0.394941
Macro-average      0.237705
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 15.324353218078613

Dim: 300, l2: 0.0


Finished epoch 100 of 100; error is 0.2609081566333771

wordsim353         0.235238
mturk771           0.189944
simverb3500dev     0.056935
simverb3500test    0.051564
men                0.369013
Macro-average      0.180539
Name: Spearman r, dtype: float64

Dim: 300, l2: 0.5


Finished epoch 100 of 100; error is 0.24181407690048218

wordsim353         0.507652
mturk771           0.444950
simverb3500dev     0.170747
simverb3500test    0.133997
men                0.580343
Macro-average      0.367538
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 0.8020574450492859

Dim: 300, l2: 1.0


Finished epoch 100 of 100; error is 0.25171220302581787

wordsim353         0.546939
mturk771           0.401779
simverb3500dev     0.170598
simverb3500test    0.122196
men                0.551310
Macro-average      0.358564
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 111.53180694580078

Dim: 500, l2: 0.0


Finished epoch 100 of 100; error is 0.19151516258716583

wordsim353         0.289082
mturk771           0.163957
simverb3500dev     0.072032
simverb3500test    0.038249
men                0.272345
Macro-average      0.167133
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 1.405219316482544

Dim: 500, l2: 0.5


Finished epoch 100 of 100; error is 0.24234427511692047

wordsim353         0.586143
mturk771           0.534493
simverb3500dev     0.240346
simverb3500test    0.183395
men                0.645231
Macro-average      0.437922
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 1.3378393650054932

Dim: 500, l2: 1.0


Finished epoch 100 of 100; error is 0.2508573532104492

wordsim353         0.566840
mturk771           0.521720
simverb3500dev     0.206004
simverb3500test    0.166347
men                0.642543
Macro-average      0.420691
Name: Spearman r, dtype: float64
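
To make concrete what is being tuned here, the following is a rough NumPy sketch of a one-hidden-layer autoencoder: a `tanh` hidden layer, a linear reconstruction, and mean squared error as the loss. It is not the course's `TorchAutoencoder`; in particular, plain full-batch gradient descent is swapped in for whatever optimizer that class uses, and regularization is left out. All names are illustrative.

```python
import numpy as np

def train_autoencoder(X, hidden_dim, eta=0.1, max_iter=300, seed=0):
    """Return (hidden representations, per-epoch reconstruction errors)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_features, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = rng.normal(scale=0.1, size=(hidden_dim, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(max_iter):
        # Forward pass: encode, then reconstruct the input.
        H = np.tanh(X @ W1 + b1)
        err = (H @ W2 + b2) - X
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean squared error.
        grad_out = 2.0 * err / err.size
        grad_W2 = H.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W2.T) * (1.0 - H ** 2)
        grad_W1 = X.T @ grad_pre
        grad_b1 = grad_pre.sum(axis=0)
        # Gradient descent step with learning rate eta.
        W1 -= eta * grad_W1
        b1 -= eta * grad_b1
        W2 -= eta * grad_W2
        b2 -= eta * grad_b2
    return np.tanh(X @ W1 + b1), losses

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))
hidden, losses = train_autoencoder(X, hidden_dim=3)
print(hidden.shape, losses[0], losses[-1])
```

The rows of `hidden` play the role of the learned word vectors: they are what would be passed on to the similarity evaluation.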

In [71]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    print("Using device: {}".format(device))
    for hidden_dim in (50, 150, 300, 500):
        for l2_strength in (0.0, 0.5, 1.0):
            print("Dim: {}, l2: {}".format(hidden_dim, l2_strength))
            ae = TorchAutoencoder(max_iter=100, hidden_dim=hidden_dim, eta=0.1, l2_strength=l2_strength, device=device)
            giga20_ppmi_ae = ae.fit(giga20_ppmi_lsa)
            display(full_word_similarity_evaluation(giga20_ppmi_ae))

Using device: cuda
Dim: 50, l2: 0.0


Finished epoch 100 of 100; error is 1.8038939237594604

wordsim353         0.534215
mturk771           0.496707
simverb3500dev     0.173119
simverb3500test    0.157890
men                0.645791
Macro-average      0.401544
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 5.277029991149902

Dim: 50, l2: 0.5


Finished epoch 100 of 100; error is 5.519812583923345

wordsim353         0.241636
mturk771           0.123369
simverb3500dev     0.094943
simverb3500test    0.032114
men                0.233112
Macro-average      0.145035
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 6.149313926696777

Dim: 50, l2: 1.0


Finished epoch 100 of 100; error is 6.055180549621582

wordsim353         0.279825
mturk771           0.138127
simverb3500dev     0.106928
simverb3500test    0.038580
men                0.254939
Macro-average      0.163680
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 6.16350793838501

Dim: 150, l2: 0.0


Finished epoch 100 of 100; error is 0.8190496563911438

wordsim353         0.587156
mturk771           0.568848
simverb3500dev     0.220410
simverb3500test    0.197052
men                0.714831
Macro-average      0.457659
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 5.411563396453857

Dim: 150, l2: 0.5


Finished epoch 100 of 100; error is 4.811972141265869

wordsim353         0.257775
mturk771           0.163777
simverb3500dev     0.056085
simverb3500test    0.041440
men                0.233585
Macro-average      0.150533
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 5.782042503356934

Dim: 150, l2: 1.0


Finished epoch 100 of 100; error is 5.621646404266357

wordsim353         0.226002
mturk771           0.135118
simverb3500dev     0.084738
simverb3500test    0.028017
men                0.279986
Macro-average      0.150772
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 16.26687240600586

Dim: 300, l2: 0.0


Finished epoch 100 of 100; error is 0.526965856552124

wordsim353         0.562067
mturk771           0.529460
simverb3500dev     0.263235
simverb3500test    0.195694
men                0.711147
Macro-average      0.452321
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 6.722036361694336

Dim: 300, l2: 0.5


Finished epoch 100 of 100; error is 4.650359630584717

wordsim353         0.279002
mturk771           0.192701
simverb3500dev     0.067709
simverb3500test    0.059963
men                0.299743
Macro-average      0.179824
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 6.609253883361816

Dim: 300, l2: 1.0


Finished epoch 100 of 100; error is 5.2164154052734375

wordsim353         0.277754
mturk771           0.184519
simverb3500dev     0.085541
simverb3500test    0.028095
men                0.335955
Macro-average      0.182373
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 40.05063247680664

Dim: 500, l2: 0.0


Finished epoch 100 of 100; error is 0.40535134077072144

wordsim353         0.544575
mturk771           0.531733
simverb3500dev     0.218291
simverb3500test    0.187124
men                0.706151
Macro-average      0.437575
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 26.445741653442383

Dim: 500, l2: 0.5


Finished epoch 100 of 100; error is 4.483879566192627

wordsim353         0.306169
mturk771           0.248699
simverb3500dev     0.084006
simverb3500test    0.086965
men                0.370279
Macro-average      0.219224
Name: Spearman r, dtype: float64

Finished epoch 1 of 100; error is 7.7993669509887695

Dim: 500, l2: 1.0


Finished epoch 100 of 100; error is 4.930451869964645

wordsim353         0.277375
mturk771           0.211078
simverb3500dev     0.086844
simverb3500test    0.042793
men                0.358423
Macro-average      0.195303
Name: Spearman r, dtype: float64
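
The grid above is easier to compare once the macro-averages are collected in one place. The sketch below copies the twelve values from this cell's output and ranks them; `results` and `best` are illustrative names.

```python
# (hidden_dim, l2_strength) -> macro-average, copied from the output above.
results = {
    (50, 0.0): 0.401544, (50, 0.5): 0.145035, (50, 1.0): 0.163680,
    (150, 0.0): 0.457659, (150, 0.5): 0.150533, (150, 1.0): 0.150772,
    (300, 0.0): 0.452321, (300, 0.5): 0.179824, (300, 1.0): 0.182373,
    (500, 0.0): 0.437575, (500, 0.5): 0.219224, (500, 1.0): 0.195303,
}

best = max(results, key=results.get)
print("best setting:", best, results[best])

# Worst unregularized run vs. best regularized run.
worst_no_l2 = min(v for (dim, l2), v in results.items() if l2 == 0.0)
best_with_l2 = max(v for (dim, l2), v in results.items() if l2 > 0.0)
print(worst_no_l2, best_with_l2)
```

On the LSA input, every run with `l2_strength=0.0` beats every regularized run, and `hidden_dim=150` comes out on top.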

In [73]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    print("Using device: {}".format(device))
    for hidden_dim in (10, 20, 30):
        for l2_strength in (0.0, 0.5, 1.0):
            for eta in (0.1, 0.5, 1.0):
                print("Dim: {}, l2: {}, eta: {}".format(hidden_dim, l2_strength, eta))
                ae = TorchAutoencoder(max_iter=100, hidden_dim=hidden_dim, eta=eta, l2_strength=l2_strength, device=device)
                giga20_ppmi_ae = ae.fit(giga20_ppmi_lsa)
                display(full_word_similarity_evaluation(giga20_ppmi_ae))

Finished epoch 1 of 100; error is 6.11306095123291

Using device: cuda
Dim: 10, l2: 0.0, eta: 0.1


Finished epoch 100 of 100; error is 3.4306042194366455

wordsim353         0.291471
mturk771           0.314562
simverb3500dev     0.148877
simverb3500test    0.117086
men                0.451390
Macro-average      0.264677
Name: Spearman r, dtype: float64

Dim: 10, l2: 0.0, eta: 0.5


Finished epoch 100 of 100; error is 3.3566787242889404

wordsim353         0.292752
mturk771           0.333698
simverb3500dev     0.131321
simverb3500test    0.119080
men                0.425390
Macro-average      0.260448
Name: Spearman r, dtype: float64

Dim: 10, l2: 0.0, eta: 1.0


Finished epoch 100 of 100; error is 3.5232250690460205

wordsim353         0.343136
mturk771           0.357096
simverb3500dev     0.122547
simverb3500test    0.099359
men                0.447978
Macro-average      0.274023
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 6.3725199699401855

Dim: 10, l2: 0.5, eta: 0.1


Finished epoch 100 of 100; error is 6.502764701843262

wordsim353         0.213178
mturk771           0.080769
simverb3500dev     0.097541
simverb3500test    0.030368
men                0.180317
Macro-average      0.120435
Name: Spearman r, dtype: float64

Finished epoch 3 of 100; error is 6.997666835784912

Dim: 10, l2: 0.5, eta: 0.5


Finished epoch 100 of 100; error is 6.415804862976074

wordsim353         0.141738
mturk771           0.128505
simverb3500dev     0.059976
simverb3500test    0.045288
men                0.253750
Macro-average      0.125851
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 7.4270763397216836

Dim: 10, l2: 0.5, eta: 1.0


Finished epoch 100 of 100; error is 6.742050647735596

wordsim353         0.206332
mturk771           0.131885
simverb3500dev     0.068154
simverb3500test    0.034637
men                0.178068
Macro-average      0.123815
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 6.7501549720764165

Dim: 10, l2: 1.0, eta: 0.1


Finished epoch 100 of 100; error is 6.675544738769531

wordsim353         0.264099
mturk771           0.135139
simverb3500dev     0.139053
simverb3500test    0.040064
men                0.232361
Macro-average      0.162143
Name: Spearman r, dtype: float64

Finished epoch 2 of 100; error is 7.886216640472412

Dim: 10, l2: 1.0, eta: 0.5


Finished epoch 100 of 100; error is 6.605301380157471

wordsim353         0.248433
mturk771           0.156753
simverb3500dev     0.113561
simverb3500test    0.050180
men                0.230968
Macro-average      0.159979
Name: Spearman r, dtype: float64

Finished epoch 2 of 100; error is 10.032896041870117

Dim: 10, l2: 1.0, eta: 1.0


Finished epoch 100 of 100; error is 7.079300403594971

wordsim353         0.219671
mturk771           0.161290
simverb3500dev     0.110399
simverb3500test    0.063797
men                0.291801
Macro-average      0.169392
Name: Spearman r, dtype: float64

Finished epoch 2 of 100; error is 4.134362697601318

Dim: 20, l2: 0.0, eta: 0.1


Finished epoch 100 of 100; error is 2.7006473541259766

wordsim353         0.417910
mturk771           0.394294
simverb3500dev     0.132262
simverb3500test    0.125800
men                0.526487
Macro-average      0.319351
Name: Spearman r, dtype: float64

Finished epoch 2 of 100; error is 5.830965042114258

Dim: 20, l2: 0.0, eta: 0.5


Finished epoch 100 of 100; error is 2.8810653686523438

wordsim353         0.397474
mturk771           0.412087
simverb3500dev     0.172434
simverb3500test    0.135826
men                0.549496
Macro-average      0.333463
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 5.533768653869629

Dim: 20, l2: 0.0, eta: 1.0


Finished epoch 100 of 100; error is 3.0062711238861084

wordsim353         0.401047
mturk771           0.455971
simverb3500dev     0.112159
simverb3500test    0.123965
men                0.535052
Macro-average      0.325639
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 6.102771759033203

Dim: 20, l2: 0.5, eta: 0.1


Finished epoch 100 of 100; error is 5.933148384094238

wordsim353         0.238960
mturk771           0.100978
simverb3500dev     0.111871
simverb3500test    0.036346
men                0.205239
Macro-average      0.138679
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 6.343475818634033

Dim: 20, l2: 0.5, eta: 0.5


Finished epoch 100 of 100; error is 6.174006462097168

wordsim353         0.228247
mturk771           0.100934
simverb3500dev     0.054983
simverb3500test    0.023262
men                0.158604
Macro-average      0.113206
Name: Spearman r, dtype: float64

Finished epoch 3 of 100; error is 9.5312309265136727

Dim: 20, l2: 0.5, eta: 1.0


Finished epoch 100 of 100; error is 5.930158615112305

wordsim353         0.213996
mturk771           0.107609
simverb3500dev     0.101781
simverb3500test    0.033054
men                0.217849
Macro-average      0.134858
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 6.5489373207092285

Dim: 20, l2: 1.0, eta: 0.1


Finished epoch 100 of 100; error is 6.522564888000488

wordsim353         0.258069
mturk771           0.144129
simverb3500dev     0.098881
simverb3500test    0.037969
men                0.256958
Macro-average      0.159201
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 7.210444927215576

Dim: 20, l2: 1.0, eta: 0.5


Finished epoch 100 of 100; error is 6.739877700805664

wordsim353         0.245960
mturk771           0.131166
simverb3500dev     0.130809
simverb3500test    0.067919
men                0.195579
Macro-average      0.154287
Name: Spearman r, dtype: float64

Finished epoch 2 of 100; error is 13.031438827514648

Dim: 20, l2: 1.0, eta: 1.0


Finished epoch 100 of 100; error is 6.536740779876709

wordsim353         0.274907
mturk771           0.240632
simverb3500dev     0.105485
simverb3500test    0.059976
men                0.306816
Macro-average      0.197563
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 2.9736282825469974

Dim: 30, l2: 0.0, eta: 0.1


Finished epoch 100 of 100; error is 2.370361566543579

wordsim353         0.502674
mturk771           0.456348
simverb3500dev     0.133834
simverb3500test    0.137850
men                0.605022
Macro-average      0.367146
Name: Spearman r, dtype: float64

Finished epoch 3 of 100; error is 5.393923282623291

Dim: 30, l2: 0.0, eta: 0.5


Finished epoch 100 of 100; error is 2.6753220558166504

wordsim353         0.460831
mturk771           0.444011
simverb3500dev     0.161880
simverb3500test    0.111606
men                0.605662
Macro-average      0.356798
Name: Spearman r, dtype: float64

Finished epoch 5 of 100; error is 6.113372802734375

Dim: 30, l2: 0.0, eta: 1.0


Finished epoch 100 of 100; error is 2.832245349884033

wordsim353         0.418259
mturk771           0.427392
simverb3500dev     0.180052
simverb3500test    0.112017
men                0.559547
Macro-average      0.339453
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 5.6678247451782235

Dim: 30, l2: 0.5, eta: 0.1


Finished epoch 100 of 100; error is 5.721559047698975

wordsim353         0.211749
mturk771           0.107659
simverb3500dev     0.119320
simverb3500test    0.033906
men                0.210996
Macro-average      0.136726
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 6.456701278686523

Dim: 30, l2: 0.5, eta: 0.5


Finished epoch 100 of 100; error is 5.746185302734375

wordsim353         0.255637
mturk771           0.138244
simverb3500dev     0.119807
simverb3500test    0.050267
men                0.175084
Macro-average      0.147808
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 7.8840093612670948

Dim: 30, l2: 0.5, eta: 1.0


Finished epoch 100 of 100; error is 5.719851493835449

wordsim353         0.250790
mturk771           0.108781
simverb3500dev     0.113053
simverb3500test    0.052952
men                0.178283
Macro-average      0.140772
Name: Spearman r, dtype: float64

Finished epoch 3 of 100; error is 6.614424228668213

Dim: 30, l2: 1.0, eta: 0.1


Finished epoch 100 of 100; error is 6.1493024826049805

wordsim353         0.275325
mturk771           0.122544
simverb3500dev     0.132369
simverb3500test    0.039726
men                0.254508
Macro-average      0.164894
Name: Spearman r, dtype: float64

Finished epoch 4 of 100; error is 7.497975826263428

Dim: 30, l2: 1.0, eta: 0.5


Finished epoch 100 of 100; error is 6.440415382385254

wordsim353         0.290283
mturk771           0.198414
simverb3500dev     0.093659
simverb3500test    0.067166
men                0.246198
Macro-average      0.179144
Name: Spearman r, dtype: float64

Finished epoch 5 of 100; error is 7.7939023971557625

Dim: 30, l2: 1.0, eta: 1.0


Finished epoch 100 of 100; error is 6.664883136749268

wordsim353         0.268336
mturk771           0.202978
simverb3500dev     0.125737
simverb3500test    0.066979
men                0.320146
Macro-average      0.196835
Name: Spearman r, dtype: float64

In [72]:
if 'IS_GRADESCOPE_ENV' not in os.environ:
    print("Using device: {}".format(device))
    for hidden_dim in (150,):
        for l2_strength in (0.0,):
            print("Dim: {}, l2: {}".format(hidden_dim, l2_strength))
            ae = TorchAutoencoder(max_iter=1000, hidden_dim=hidden_dim, eta=0.1, l2_strength=l2_strength, device=device)
            giga20_ppmi_ae = ae.fit(giga20_ppmi_lsa)
            display(full_word_similarity_evaluation(giga20_ppmi_ae))

Finished epoch 1 of 1000; error is 5.741008281707764

Using device: cuda
Dim: 150, l2: 0.0


Finished epoch 1000 of 1000; error is 0.5069425702095032

wordsim353         0.563231
mturk771           0.569908
simverb3500dev     0.239474
simverb3500test    0.197006
men                0.729435
Macro-average      0.459811
Name: Spearman r, dtype: float64

In [62]:
# Enter your system description in this cell.

# (1) System Description
"""
My system is a pipeline based on giga20 consisting of the following:
* PPMI
* LSA with k=100
* AutoEncoder with max_iter=1000, hidden_dim=150, eta=0.1

I ran sweeps over k for LSA and over several hyperparameters of the
AutoEncoder (hidden_dim, l2_strength, eta, max_iter), and ended up
with the choices above.

I also tried several other combinations/pipelines, but most of them
performed worse than applying PPMI by itself.

One thing I found unintuitive: giving the AutoEncoder a hidden layer
larger than its input (150 vs. 100 dimensions) performed better than
using it for its usual purpose of reducing dimensionality. This may
just be because the dimensionality reduction done by LSA was already
good enough.
"""

# (2) Code
if 'IS_GRADESCOPE_ENV' not in os.environ:
    from torch_autoencoder import TorchAutoencoder
    giga20 = pd.read_csv(os.path.join(VSM_HOME, 'giga_window20-flat.csv.gz'), index_col=0)
    giga20_ppmi = vsm.pmi(giga20, positive=True)
    giga20_ppmi_lsa = vsm.lsa(giga20_ppmi, k=100)
    ae = TorchAutoencoder(max_iter=1000, hidden_dim=150, eta=0.1, l2_strength=0.0)
    giga20_ppmi_lsa_ae = ae.fit(giga20_ppmi_lsa)
    display(full_word_similarity_evaluation(giga20_ppmi_lsa_ae))

# (3) Score
# My peak score was: 0.467820
    
# Please do not remove this comment.

Finished epoch 1000 of 1000; error is 0.5111429691314697

wordsim353         0.578531
mturk771           0.560254
simverb3500dev     0.258844
simverb3500test    0.206764
men                0.721122
Macro-average      0.465103
Name: Spearman r, dtype: float64
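
Each per-dataset number reported above is a Spearman rank correlation. As a reminder of what that measures, here is a plain-Python sketch: rank both lists (ties get the average rank), then take the Pearson correlation of the ranks. The function names are illustrative; in practice a library routine such as `scipy.stats.spearmanr` would be used.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

print(spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]))
```

Because only ranks matter, any monotonic rescaling of the model's distances leaves the score unchanged.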

## Bake-off [1 point]

For the bake-off, we will release two additional datasets. The announcement will go out on the discussion forum. We will also release reader code for these datasets that you can paste into this notebook. You will evaluate your custom model $M$ (from the previous question) on these new datasets using `full_word_similarity_evaluation`. Rules:

1. Only one evaluation is permitted.
1. No additional system tuning is permitted once the bake-off has started.

The cells below this one constitute your bake-off entry.

People who enter will receive the additional homework point, and people whose systems achieve the top score will receive an additional 0.5 points. We will test the top-performing systems ourselves, and only systems for which we can reproduce the reported results will win the extra 0.5 points.

Late entries will be accepted, but they cannot earn the extra 0.5 points. Similarly, you cannot win the bake-off unless your homework is submitted on time.

The announcement will include the details on where to submit your entry.

In [63]:
# Enter your bake-off assessment code into this cell. 
# Please do not remove this comment.

if 'IS_GRADESCOPE_ENV' not in os.environ:
    pass
    # Please enter your code in the scope of the above conditional.
    ##### YOUR CODE HERE
    def mturk287_reader():
        """MTurk-287: http://tx.technion.ac.il/~kirar/Datasets.html"""
        src_filename = os.path.join(
            WORDSIM_HOME, 'bakeoff-wordsim-test-data', 'MTurk-287.csv')
        return wordsim_dataset_reader(
            src_filename, header=False)

    def simlex999_reader(wordsim_test_home=WORDSIM_HOME):
        """SimLex999: https://www.cl.cam.ac.uk/~fh295/SimLex-999.zip"""
        src_filename = os.path.join(
            wordsim_test_home, 'bakeoff-wordsim-test-data', 'SimLex-999', 'SimLex-999.txt')
        return wordsim_dataset_reader(
            src_filename, delimiter="\t", header=True, score_col_index=3)

    BAKEOFF = (simlex999_reader, mturk287_reader)    
    display(full_word_similarity_evaluation(giga20_ppmi_lsa_ae, readers=BAKEOFF, distfunc=vsm.cosine))

simlex999        0.281926
mturk287         0.641918
Macro-average    0.461922
Name: Spearman r, dtype: float64
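
The macro-average is the unweighted mean of the per-dataset correlations, so SimLex-999 and MTurk-287 count equally even though they differ in size. A quick check against the values printed above (the variable names are illustrative):

```python
# Per-dataset Spearman correlations, copied from the output above.
bakeoff_scores = {"simlex999": 0.281926, "mturk287": 0.641918}

macro_average = sum(bakeoff_scores.values()) / len(bakeoff_scores)
print(round(macro_average, 6))
```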

In [None]:
# On an otherwise blank line in this cell, please enter
# your "Macro-average" value as reported by the code above. 
# Please enter only a number between 0 and 1 inclusive.
# Please do not remove this comment.
if 'IS_GRADESCOPE_ENV' not in os.environ:
    # Please enter your score in the scope of the above conditional.
    ##### YOUR CODE HERE
    0.461922