# Homework and bake-off: word-level entailment with neural networks

In [None]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2020"

## Contents

1. [Overview](#Overview)
1. [Set-up](#Set-up)
1. [Data](#Data)
  1. [Edge disjoint](#Edge-disjoint)
  1. [Word disjoint](#Word-disjoint)
1. [Baseline](#Baseline)
  1. [Representing words: vector_func](#Representing-words:-vector_func)
  1. [Combining words into inputs: vector_combo_func](#Combining-words-into-inputs:-vector_combo_func)
  1. [Classifier model](#Classifier-model)
  1. [Baseline results](#Baseline-results)
1. [Homework questions](#Homework-questions)
  1. [Hypothesis-only baseline [2 points]](#Hypothesis-only-baseline-[2-points])
  1. [Alternatives to concatenation [2 points]](#Alternatives-to-concatenation-[2-points])
  1. [A deeper network [2 points]](#A-deeper-network-[2-points])
  1. [Your original system [3 points]](#Your-original-system-[3-points])
1. [Bake-off [1 point]](#Bake-off-[1-point])

## Overview

The general problem is word-level natural language inference.

Training examples are pairs of words $(w_{L}, w_{R}), y$ with $y = 1$ if $w_{L}$ entails $w_{R}$, otherwise $0$.

The homework questions below ask you to define baseline models for this and develop your own system for entry in the bake-off, which will take place on a held-out test set distributed at the start of the bake-off. (Thus, all the data you have available for development is available for training your final system before the bake-off begins.)

<img src="fig/wordentail-diagram.png" width=600 alt="wordentail-diagram.png" />

## Set-up

See [the first notebook in this unit](nli_01_task_and_data.ipynb) for set-up instructions.

In [1]:
from collections import defaultdict
import json
import numpy as np
import os
import pandas as pd
from torch_shallow_neural_classifier import TorchShallowNeuralClassifier
import nli
import utils

In [2]:
DATA_HOME = 'data'

NLIDATA_HOME = os.path.join(DATA_HOME, 'nlidata')

wordentail_filename = os.path.join(
    NLIDATA_HOME, 'nli_wordentail_bakeoff_data.json')

GLOVE_HOME = os.path.join(DATA_HOME, 'glove.6B')

## Data

I've processed the data into two different train/test splits, in an effort to put some pressure on our models to actually learn these semantic relations, as opposed to exploiting regularities in the sample.

* `edge_disjoint`: The `train` and `dev` __edge__ sets are disjoint, but many __words__ appear in both `train` and `dev`.
* `word_disjoint`: The `train` and `dev` __vocabularies are disjoint__, and thus the edges are disjoint as well.

These are very different problems. For `word_disjoint`, there is real pressure on the model to learn abstract relationships, as opposed to memorizing properties of individual words.
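To make the two notions of disjointness concrete, here is a minimal sketch of how edge overlap and vocabulary overlap between two portions could be counted. It assumes each example is a `[[left_word, right_word], label]` list and treats edges as ordered pairs. The function names and toy data are hypothetical, and this is not necessarily how the `nli` helpers used below are implemented.

```python
def edge_overlap_size(train, dev):
    """Count (left, right) word pairs that occur in both portions."""
    train_edges = {tuple(pair) for pair, _ in train}
    dev_edges = {tuple(pair) for pair, _ in dev}
    return len(train_edges & dev_edges)


def vocab_overlap_size(train, dev):
    """Count individual words that occur in both portions."""
    train_vocab = {w for pair, _ in train for w in pair}
    dev_vocab = {w for pair, _ in dev for w in pair}
    return len(train_vocab & dev_vocab)


# Toy examples, invented for illustration only.
toy_train = [[['herring', 'animal'], 1], [['sweater', 'stroke'], 0]]
toy_dev = [[['herring', 'fish'], 1], [['disease', 'inflammation'], 0]]

print(edge_overlap_size(toy_train, toy_dev))   # no pair appears in both
print(vocab_overlap_size(toy_train, toy_dev))  # only 'herring' is shared
```

The toy data is edge-disjoint but not word-disjoint: a model could still exploit what it learned about `herring` during training.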

In [3]:
with open(wordentail_filename) as f:
    wordentail_data = json.load(f)

The outer keys are the splits plus a list giving the vocabulary for the entire dataset:

In [4]:
wordentail_data.keys()

dict_keys(['edge_disjoint', 'vocab', 'word_disjoint'])

### Edge disjoint

In [5]:
wordentail_data['edge_disjoint'].keys()

dict_keys(['dev', 'train'])

This is what the split looks like; the `train` and `dev` portions have the same format:

In [6]:
wordentail_data['edge_disjoint']['dev'][: 5]

[[['sweater', 'stroke'], 0],
 [['constipation', 'hypovolemia'], 0],
 [['disease', 'inflammation'], 0],
 [['herring', 'animal'], 1],
 [['cauliflower', 'outlook'], 0]]

Let's test to make sure no edges are shared between `train` and `dev`:

In [7]:
nli.get_edge_overlap_size(wordentail_data, 'edge_disjoint')

0

As we expect, a *lot* of vocabulary items are shared between `train` and `dev`:

In [8]:
nli.get_vocab_overlap_size(wordentail_data, 'edge_disjoint')

2916

This is a large percentage of the entire vocab:

In [9]:
len(wordentail_data['vocab'])

8470

Here's the distribution of labels in the `train` set. It's highly imbalanced, which will pose a challenge for learning. (I'll go ahead and reveal that the `dev` set is similarly distributed.)

In [10]:
def label_distribution(split):
    return pd.DataFrame(wordentail_data[split]['train'])[1].value_counts()

In [11]:
label_distribution('edge_disjoint')

0    14650
1     2745
Name: 1, dtype: int64
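To see why this imbalance matters, the following sketch works out the scores of a trivial classifier that always predicts `0`, using the label counts shown above. It assumes the convention that precision for a class that is never predicted counts as 0; the helper `f1` is defined here for illustration only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Label counts for the `edge_disjoint` train set, as shown above.
n_neg, n_pos = 14650, 2745
total = n_neg + n_pos

# A classifier that always predicts 0:
accuracy = n_neg / total
f1_neg = f1(precision=n_neg / total, recall=1.0)
f1_pos = f1(precision=0.0, recall=0.0)  # class 1 is never predicted
macro_f1 = (f1_neg + f1_pos) / 2

print("accuracy: {:.3f}".format(accuracy))
print("macro-F1: {:.3f}".format(macro_f1))
```

Accuracy comes out at roughly 0.84 while macro-F1 stays below 0.5, which is why the macro-averaged F1 is the more informative number to watch on this task.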

### Word disjoint

In [12]:
wordentail_data['word_disjoint'].keys()

dict_keys(['dev', 'train'])

In the `word_disjoint` split, no __words__ are shared between `train` and `dev`:

In [13]:
nli.get_vocab_overlap_size(wordentail_data, 'word_disjoint')

0

Because no words are shared between `train` and `dev`, no edges are either:

In [14]:
nli.get_edge_overlap_size(wordentail_data, 'word_disjoint')

0

The label distribution is similar to that of `edge_disjoint`, though the overall number of examples is a bit smaller:

In [15]:
label_distribution('word_disjoint')

0    7199
1    1349
Name: 1, dtype: int64

## Baseline

Even in deep learning, __feature representation is vital and requires care!__ For our task, feature representation has two parts: representing the individual words and combining those representations into a single network input.

### Representing words: vector_func

Let's consider two baseline word representation methods:

1. Random vectors (as returned by `utils.randvec`).
1. 50-dimensional GloVe representations.

In [4]:
def randvec(w, n=50, lower=-1.0, upper=1.0):
    """Returns a random vector of length `n`. `w` is ignored."""
    return utils.randvec(n=n, lower=lower, upper=upper)

In [17]:
# Any of the files in glove.6B will work here:

glove_dim = 50

glove_src = os.path.join(GLOVE_HOME, 'glove.6B.{}d.txt'.format(glove_dim))

# Creates a dict mapping strings (words) to GloVe vectors:
GLOVE = utils.glove2dict(glove_src)

def glove_vec(w):    
    """Return `w`'s GloVe representation if available, else return 
    a random vector."""
    return GLOVE.get(w, randvec(w, n=glove_dim))

### Combining words into inputs: vector_combo_func

Here we decide how to combine the two word vectors into a single representation. In more detail, where `u` is a vector representation of the left word and `v` is a vector representation of the right word, we need a function `vector_combo_func` such that `vector_combo_func(u, v)` returns a new input vector `z` of dimension `m`. A simple example is concatenation:

In [8]:
def vec_concatenate(u, v):
    """Concatenate np.array instances `u` and `v` into a new np.array"""
    return np.concatenate((u, v))

`vector_combo_func` could instead be vector average, vector difference, etc. (even combinations of those) – there's lots of space for experimentation here; [homework question 2](#Alternatives-to-concatenation-[2-points]) below pushes you to do some exploration.
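As one illustration of "combinations of those", here is a sketch of a combination function that stacks both vectors with their element-wise difference and product. The name `vec_concat_diff_prod` is hypothetical and not part of the course code; it only assumes that `u` and `v` are NumPy arrays of equal length.

```python
import numpy as np


def vec_concat_diff_prod(u, v):
    """Hypothetical combination: [u; v; u - v; u * v]."""
    return np.concatenate((u, v, u - v, u * v))


u = np.array([1.0, 2.0])
v = np.array([3.0, 0.5])
z = vec_concat_diff_prod(u, v)
print(z.shape)  # four times the input dimension
```

The resulting input has dimension `4 * len(u)`, so the first layer of any classifier trained on it is correspondingly larger.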

### Classifier model

For a baseline model, I chose `TorchShallowNeuralClassifier`:

In [95]:
net = TorchShallowNeuralClassifier(hidden_dim=50, max_iter=100)

### Baseline results

The following puts the above pieces together, using `vector_func=glove_vec`, since `vector_func=randvec` seems so hopelessly misguided for `word_disjoint`!

In [96]:
word_disjoint_experiment = nli.wordentail_experiment(
    train_data=wordentail_data['word_disjoint']['train'],
    assess_data=wordentail_data['word_disjoint']['dev'], 
    model=net, 
    vector_func=glove_vec,
    vector_combo_func=vec_concatenate)

Finished epoch 100 of 100; error is 0.025118726072832942

              precision    recall  f1-score   support

           0      0.924     0.928     0.926      1910
           1      0.404     0.389     0.397       239

    accuracy                          0.868      2149
   macro avg      0.664     0.659     0.661      2149
weighted avg      0.866     0.868     0.867      2149



## Homework questions

Please embed your homework responses in this notebook, and do not delete any cells from the notebook. (You are free to add as many cells as you like as part of your responses.)

### Hypothesis-only baseline [2 points]

During our discussion of SNLI and MultiNLI, we noted that a number of research teams have shown that hypothesis-only baselines for NLI tasks can be remarkably robust. This question asks you to explore briefly how this baseline affects the 'edge_disjoint' and 'word_disjoint' versions of our task.

For this problem, submit two functions:

1. A `vector_combo_func` function called `hypothesis_only` that simply throws away the premise, using the unmodified hypothesis (second) vector as its representation of the example.

1. A function called `run_hypothesis_only_evaluation` that does the following:
    1. Loops over the two conditions 'word_disjoint' and 'edge_disjoint' and the two `vector_combo_func` values `vec_concatenate` and `hypothesis_only`, calling `nli.wordentail_experiment` to train on the condition's 'train' portion and assess on its 'dev' portion, with `glove_vec` as the `vector_func`. So that the results are consistent, use an `sklearn.linear_model.LogisticRegression` with default parameters as the model.
    1. Returns a `dict` mapping `(condition_name, function_name)` pairs to the 'macro-F1' score for that pair, as returned by the call to `nli.wordentail_experiment`. (Tip: you can get the `str` name of your function `hypothesis_only` with `hypothesis_only.__name__`.)
    
The test functions `test_hypothesis_only` and `test_run_hypothesis_only_evaluation` will help ensure that your functions have the desired logic.

In [90]:
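##### YOUR CODE HERE
# Import the class explicitly: a bare `import sklearn` is not guaranteed
# to make the `sklearn.linear_model` submodule available.
from sklearn.linear_model import LogisticRegression

def hypothesis_only(u, v):
    """Ignore the premise vector `u`; return the hypothesis vector `v`."""
    ##### YOUR CODE HERE
    return v

def run_hypothesis_only_evaluation():
    """Return a dict mapping (condition_name, function_name) pairs to
    the macro-F1 score for that pair."""
    ##### YOUR CODE HERE
    results = {}
    conditions = ['word_disjoint', 'edge_disjoint']
    combo_funcs = [vec_concatenate, hypothesis_only]
    for condition in conditions:
        for combo_func in combo_funcs:
            # Use a fresh default-parameter model for each run.
            experiment = nli.wordentail_experiment(
                train_data=wordentail_data[condition]['train'],
                assess_data=wordentail_data[condition]['dev'],
                model=LogisticRegression(),
                vector_func=glove_vec,
                vector_combo_func=combo_func)
            results[(condition, combo_func.__name__)] = experiment['macro-F1']

    return results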



In [91]:
def test_hypothesis_only(hypothesis_only):
    v = hypothesis_only(1, 2)
    assert v == 2   

In [92]:
test_hypothesis_only(hypothesis_only)

In [93]:
def test_run_hypothesis_only_evaluation(run_hypothesis_only_evaluation):
    results = run_hypothesis_only_evaluation()
    assert ('word_disjoint', 'vec_concatenate') in results, \
        "The return value of `run_hypothesis_only_evaluation` does not have the intended kind of keys"
    assert isinstance(results[('word_disjoint', 'vec_concatenate')], float), \
        "The values of the `run_hypothesis_only_evaluation` result should be floats"

In [94]:
test_run_hypothesis_only_evaluation(run_hypothesis_only_evaluation)

              precision    recall  f1-score   support

           0      0.901     0.982     0.940      1910
           1      0.493     0.142     0.221       239

    accuracy                          0.888      2149
   macro avg      0.697     0.562     0.580      2149
weighted avg      0.856     0.888     0.860      2149

              precision    recall  f1-score   support

           0      0.893     0.989     0.939      1910
           1      0.382     0.054     0.095       239

    accuracy                          0.885      2149
   macro avg      0.638     0.522     0.517      2149
weighted avg      0.836     0.885     0.845      2149

              precision    recall  f1-score   support

           0      0.875     0.970     0.920      7376
           1      0.574     0.228     0.326      1321

    accuracy                          0.857      8697
   macro avg      0.725     0.599     0.623      8697
weighted avg      0.830     0.857     0.830      8697

              preci

### Alternatives to concatenation [2 points]

We've so far just used vector concatenation to represent the premise and hypothesis words. This question asks you to explore two simple alternatives:

1. Write a function `vec_diff` that, for a given pair of vector inputs `u` and `v`, returns the element-wise difference between `u` and `v`.

1. Write a function `vec_max` that, for a given pair of vector inputs `u` and `v`, returns the element-wise max values between `u` and `v`.

You needn't include your uses of `nli.wordentail_experiment` with these functions, but we assume you'll be curious to see how they do!

In [54]:
def vec_diff(u, v):
    ##### YOUR CODE HERE
    return u-v



    
def vec_max(u, v):
    ##### YOUR CODE HERE
    return np.max([u,v],axis=0)



In [55]:
def test_vec_diff(vec_diff):
    u = np.array([10.2, 8.1])
    v = np.array([1.2, -7.1])
    result = vec_diff(u, v)
    expected = np.array([9.0, 15.2])
    assert np.array_equal(result, expected), \
        "Expected {}; got {}".format(expected, result)

In [56]:
test_vec_diff(vec_diff)

In [57]:
def test_vec_max(vec_max):
    u = np.array([1.2,  8.1])
    v = np.array([10.2, -7.1])
    result = vec_max(u, v)
    expected = np.array([10.2, 8.1])
    assert np.array_equal(result, expected), \
        "Expected {}; got {}".format(expected, result)

In [58]:
test_vec_max(vec_max)

### A deeper network [2 points]

It is very easy to subclass `TorchShallowNeuralClassifier` if all you want to do is change the network graph: all you have to do is write a new `define_graph`. If your graph has new arguments that the user might want to set, then you should also redefine `__init__` so that these values are accepted and set as attributes.

For this question, please subclass `TorchShallowNeuralClassifier` so that it defines the following graph:

$$\begin{align}
h_{1} &= xW_{1} + b_{1} \\
r_{1} &= \textbf{Bernoulli}(1 - \textbf{dropout\_prob}, n) \\
d_{1} &= r_1 * h_{1} \\
h_{2} &= f(d_{1}) \\
h_{3} &= h_{2}W_{2} + b_{2}
\end{align}$$

Here, $r_{1}$ and $d_{1}$ define a dropout layer: $r_{1}$ is a random binary vector of dimension $n$, where the probability of a value being $1$ is given by $1 - \textbf{dropout\_prob}$. $r_{1}$ is multiplied element-wise by our first hidden representation, thereby zeroing out some of the values. The result is fed to the user's activation function $f$, and the result of that is fed through another linear layer to produce $h_{3}$. (Inside `TorchShallowNeuralClassifier`, $h_{3}$ is the basis for a softmax classifier, so no activation function is applied to it.)
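The dropout step in these equations can be sketched in plain NumPy as follows. This is an illustration of the math only, with made-up values for $h_{1}$ and `tanh` standing in for $f$; it is not the PyTorch implementation.

```python
import numpy as np

rng = np.random.RandomState(0)

dropout_prob = 0.5
h1 = np.array([0.5, -1.0, 2.0, 0.25, -0.75, 1.5])  # made-up hidden values

# r1: binary mask with one Bernoulli draw per hidden unit; each entry
# is 1 with probability 1 - dropout_prob. (The `n=1` argument is the
# number of trials per draw, not the vector dimension.)
r1 = rng.binomial(n=1, p=1 - dropout_prob, size=h1.shape)

# d1: element-wise product, which zeroes out the dropped units.
d1 = r1 * h1

# h2: the activation function applied to the masked representation.
h2 = np.tanh(d1)

print(r1)
print(d1)
```

Note that `nn.Dropout` differs from this sketch in two ways: during training it also rescales the retained values by `1 / (1 - p)`, and in evaluation mode it passes its input through unchanged.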

For your implementation, please use `nn.Sequential`, `nn.Linear`, and `nn.Dropout` to define the required layers.

For comparison, using this notation, `TorchShallowNeuralClassifier` defines the following graph:

$$\begin{align}
h_{1} &= xW_{1} + b_{1} \\
h_{2} &= f(h_{1}) \\
h_{3} &= h_{2}W_{2} + b_{2}
\end{align}$$

The following code starts this sub-class for you, so that you can concentrate on `define_graph`. Be sure to make use of `self.dropout_prob`.

For this problem, submit just your completed `TorchDeepNeuralClassifier`. You needn't evaluate it, though we assume you will be keen to do that!

You can use `test_TorchDeepNeuralClassifier` to ensure that your network has the intended structure.

In [101]:
import torch.nn as nn

class TorchDeepNeuralClassifier(TorchShallowNeuralClassifier):
    def __init__(self, dropout_prob=0.7, **kwargs):
        self.dropout_prob = dropout_prob
        super().__init__(**kwargs)
    
    def define_graph(self):
        """Complete this method!
        
        Returns
        -------
        an `nn.Module` instance, which can be a free-standing class you 
        write yourself, as in `torch_rnn_classifier`, or the output of 
        `nn.Sequential`, as in `torch_shallow_neural_classifier`.
        
        """
        ##### YOUR CODE HERE
        return nn.Sequential(
            nn.Linear(self.input_dim, self.hidden_dim),
            nn.Dropout(self.dropout_prob),
            self.hidden_activation,
            nn.Linear(self.hidden_dim, self.n_classes_))



    

##### YOUR CODE HERE    
net = TorchDeepNeuralClassifier(hidden_dim=50, max_iter=100)
word_disjoint_experiment = nli.wordentail_experiment(
    train_data=wordentail_data['word_disjoint']['train'],
    assess_data=wordentail_data['word_disjoint']['dev'], 
    model=net, 
    vector_func=glove_vec,
    vector_combo_func=vec_concatenate)

Finished epoch 100 of 100; error is 2.5403620302677155

              precision    recall  f1-score   support

           0      0.902     0.990     0.944      1910
           1      0.623     0.138     0.226       239

    accuracy                          0.895      2149
   macro avg      0.762     0.564     0.585      2149
weighted avg      0.871     0.895     0.864      2149



In [66]:
def test_TorchDeepNeuralClassifier(TorchDeepNeuralClassifier):
    dropout_prob = 0.55
    assert hasattr(TorchDeepNeuralClassifier(), "dropout_prob"), \
        "TorchDeepNeuralClassifier must have an attribute `dropout_prob`."
    try:
        inst = TorchDeepNeuralClassifier(dropout_prob=dropout_prob)
    except TypeError:
        raise TypeError("TorchDeepNeuralClassifier must allow the user "
                        "to set `dropout_prob` on initialization")
    inst.input_dim = 10
    inst.n_classes_ = 5
    graph = inst.define_graph()
    assert len(graph) == 4, \
        "The graph should have 4 layers; yours has {}".format(len(graph))    
    expected = {
        0: 'Linear',
        1: 'Dropout',
        2: 'Tanh',
        3: 'Linear'}
    for i, label in expected.items():
        name = graph[i].__class__.__name__
        assert label in name, \
            "The {} layer of the graph should be a {} layer; yours is {}".format(i, label, name)
    assert graph[1].p == dropout_prob, \
        "The user's value for `dropout_prob` should be the value of `p` for the Dropout layer."

In [67]:
test_TorchDeepNeuralClassifier(TorchDeepNeuralClassifier)

### Your original system [3 points]

This is a simple dataset, but our focus on the 'word_disjoint' condition ensures that it's a challenging one, and there are lots of modeling strategies one might adopt. 

You are free to do whatever you like. We require only that your system differ in some way from those defined in the preceding questions. They don't have to be completely different, though. For example, you might want to stick with the model but represent examples differently, or the reverse.

Keep in mind that, for the bake-off evaluation, the 'edge_disjoint' portions of the data are off limits. You can, though, train on the combination of the 'word_disjoint' 'train' and 'dev' portions. You are free to use different pretrained word vectors and the like. Please do not introduce additional entailment datasets into your training data, though.

Please embed your code in this notebook so that we can rerun it.

In the cell below, please provide a brief technical description of your original system, so that the teaching team can gain an understanding of what it does. This will help us to understand your code and analyze all the submissions to identify patterns and strategies.

In [None]:
# Enter your system description in this cell.
# For the model, we implemented a deeper fully connected network with 2 hidden layers. We then ran a grid
# search over four hyperparameters: GloVe vector dimension (50, 100, 200, 300), dropout probability
# (0.3, 0.5, 0.7), hidden dimension (50, 100, 200), and number of training iterations (100, 200, 300, 500).
# The best combination was GloVe dimension 100, dropout probability 0.5, hidden dimension 200,
# and 200 training iterations, giving a macro-average F1 of 0.7068. We also tried retrofitted vectors
# and RoBERTa embeddings; neither gave better results.


# My peak score was: 0.7068
# This is my code
import torch.nn as nn

class TorchCustomNeuralClassifier(TorchShallowNeuralClassifier):
    def __init__(self, dropout_prob=0.7, **kwargs):
        self.dropout_prob = dropout_prob
        super().__init__(**kwargs)
    
    def define_graph(self):
        """Complete this method!
        
        Returns
        -------
        an `nn.Module` instance, which can be a free-standing class you 
        write yourself, as in `torch_rnn_classifier`, or the output of 
        `nn.Sequential`, as in `torch_shallow_neural_classifier`.
        
        """
        ##### YOUR CODE HERE
        return nn.Sequential(
            nn.Linear(self.input_dim, self.hidden_dim),
            nn.Dropout(self.dropout_prob),
            self.hidden_activation,
            nn.Linear(self.hidden_dim, self.hidden_dim),
            nn.Dropout(self.dropout_prob),
            self.hidden_activation,
            nn.Linear(self.hidden_dim, self.n_classes_))


if 'IS_GRADESCOPE_ENV' not in os.environ:
    glove_dim = 100
    dropout = 0.5
    dim = 200
    iters = 200
    
    def glove_vec(w):    
        """Return `w`'s GloVe representation if available, else return 
        a random vector."""
        return GLOVE.get(w, randvec(w, n=glove_dim))
    glove_src = os.path.join(GLOVE_HOME, 'glove.6B.{}d.txt'.format(glove_dim))

    # Creates a dict mapping strings (words) to GloVe vectors:
    GLOVE = utils.glove2dict(glove_src)


    net = TorchCustomNeuralClassifier(hidden_dim=dim, max_iter=iters, dropout_prob=dropout)
    word_disjoint_experiment = nli.wordentail_experiment(
        train_data=wordentail_data['word_disjoint']['train'],
        assess_data=wordentail_data['word_disjoint']['dev'], 
        model=net, 
        vector_func=glove_vec,
        vector_combo_func=vec_concatenate)

# Please do not remove this comment.
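The grid search described in the system description can also be written generically with `itertools.product`. The sketch below is one possible arrangement, not the code that produced the reported score: `grid_search` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is an arbitrary stand-in for a function that would train a model and return its dev-set macro-F1.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Return (best_params, best_score) over all combinations in `grid`.

    `grid` maps parameter names to lists of candidate values;
    `evaluate` maps a dict of parameters to a score (higher is better).
    """
    names = sorted(grid)
    best_params, best_score = None, float('-inf')
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# The search space listed in the system description.
grid = {
    'glove_dim': [50, 100, 200, 300],
    'dropout_prob': [0.3, 0.5, 0.7],
    'hidden_dim': [50, 100, 200],
    'max_iter': [100, 200, 300, 500]}


def toy_evaluate(params):
    """Arbitrary stand-in scorer, for illustration only. A real scorer
    would train a model with `params` and return its macro-F1."""
    return -abs(params['hidden_dim'] - 100) - abs(params['max_iter'] - 200)


best_params, best_score = grid_search(grid, toy_evaluate)
print(best_params, best_score)
```

This grid has 4 × 3 × 3 × 4 = 144 combinations, each requiring a full training run, so in practice it may be worth caching the GloVe dictionaries per dimension rather than reloading them inside the loop.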

In [9]:

# from nltk.corpus import wordnet as wn
# import retrofitting
# from retrofitting import Retrofitter

# glove_dim = 300

# glove_src = os.path.join(GLOVE_HOME, 'glove.6B.{}d.txt'.format(glove_dim))

# # Creates a dict mapping strings (words) to GloVe vectors:
# GLOVE = utils.glove2dict(glove_src)

# X_glove = pd.DataFrame(GLOVE).T

# def get_wordnet_edges():
#     edges = defaultdict(set)
#     for ss in wn.all_synsets():
#         lem_names = {lem.name() for lem in ss.lemmas()}
#         for lem in lem_names:
#             edges[lem] |= lem_names
#     return edges

# wn_edges = get_wordnet_edges()


# def convert_edges_to_indices(edges, Q):
#     lookup = dict(zip(Q.index, range(Q.shape[0])))
#     index_edges = defaultdict(set)
#     for start, finish_nodes in edges.items():
#         s = lookup.get(start)
#         if s:
#             f = {lookup[n] for n in finish_nodes if n in lookup}
#             if f:
#                 index_edges[s] = f
#     return index_edges
# wn_index_edges = convert_edges_to_indices(wn_edges, X_glove)

# wn_retro = Retrofitter(verbose=True)
# X_retro = wn_retro.fit(X_glove, wn_index_edges)

# def glove_vec(w):    
#     """Return `w`'s GloVe representation if available, else return 
#     a random vector."""
#     return GLOVE.get(w, randvec(w, n=glove_dim))

# net = TorchShallowNeuralClassifier(hidden_dim=50, max_iter=200)
# word_disjoint_experiment = nli.wordentail_experiment(
#     train_data=wordentail_data['word_disjoint']['train'],
#     assess_data=wordentail_data['word_disjoint']['dev'], 
#     model=net, 
#     vector_func=glove_vec,
#     vector_combo_func=vec_concatenate)

# import torch
# roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
# # roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
# roberta.eval()

# # tokens = roberta.encode(['hello', 'hi'])
# # print(tokens)
# # last_layer_features = roberta.extract_features(tokens)
# # print(last_layer_features.shape)

# bert_mapping = {}
# def bert_vec(w):    
#     """Return `w`'s RoBERTa representation, as precomputed by 
#     `bert_vec_mapping`."""
#     return bert_mapping[w]

# def bert_vec_mapping():
#     for w in wordentail_data['vocab']:
#         tokens = roberta.encode(w)
#         last_layer_features = roberta.extract_features(tokens)
#         # Store the feature vector itself, not its shape:
#         bert_mapping[w] = last_layer_features[0,1,:].detach().numpy()

# bert_vec_mapping()


# glove_dim = 100


# # net = TorchShallowNeuralClassifier(hidden_dim=300, max_iter=1000)
# max_f1 = 0.0
# max_param = None
# for glove_dim in [50, 100, 200, 300]:              
#     def glove_vec(w):    
#         """Return `w`'s GloVe representation if available, else return 
#         a random vector."""
#         return GLOVE.get(w, randvec(w, n=glove_dim))
#     glove_src = os.path.join(GLOVE_HOME, 'glove.6B.{}d.txt'.format(glove_dim))

#     # Creates a dict mapping strings (words) to GloVe vectors:
#     GLOVE = utils.glove2dict(glove_src)
#     for dropout in [0.3,0.5,0.7]:
#         for dim in [50, 100, 200]:
#             for iters in [100, 200, 300, 500]:  


#                 net = TorchCustomNeuralClassifier(hidden_dim=dim, max_iter=iters, dropout_prob=dropout)
#                 word_disjoint_experiment = nli.wordentail_experiment(
#                     train_data=wordentail_data['word_disjoint']['train'],
#                     assess_data=wordentail_data['word_disjoint']['dev'], 
#                     model=net, 
#                     vector_func=glove_vec,
#                     vector_combo_func=vec_concatenate)
#                 print(dropout, dim, iters, glove_dim, word_disjoint_experiment['macro-F1'])
#                 if word_disjoint_experiment['macro-F1']>max_f1:
#                     max_f1 = word_disjoint_experiment['macro-F1']
#                     max_param = (dropout, dim, iters, glove_dim)

# print("----")
# print(max_param)
# print(max_f1)

Finished epoch 100 of 100; error is 1.1998164653778076

              precision    recall  f1-score   support

           0      0.917     0.968     0.941      1910
           1      0.534     0.297     0.382       239

    accuracy                          0.893      2149
   macro avg      0.725     0.632     0.662      2149
weighted avg      0.874     0.893     0.879      2149

0.3 50 100 50 0.6615683149009361


Finished epoch 200 of 200; error is 0.9480023384094238

              precision    recall  f1-score   support

           0      0.923     0.968     0.945      1910
           1      0.575     0.351     0.436       239

    accuracy                          0.899      2149
   macro avg      0.749     0.660     0.690      2149
weighted avg      0.884     0.899     0.888      2149

0.3 50 200 50 0.6904537323141974


Finished epoch 300 of 300; error is 0.8455384373664856

              precision    recall  f1-score   support

           0      0.918     0.968     0.942      1910
           1      0.548     0.310     0.396       239

    accuracy                          0.895      2149
   macro avg      0.733     0.639     0.669      2149
weighted avg      0.877     0.895     0.882      2149

0.3 50 300 50 0.669063816797222


Finished epoch 500 of 500; error is 0.6920782290399075

              precision    recall  f1-score   support

           0      0.915     0.970     0.941      1910
           1      0.532     0.276     0.364       239

    accuracy                          0.893      2149
   macro avg      0.723     0.623     0.652      2149
weighted avg      0.872     0.893     0.877      2149

0.3 50 500 50 0.6524662123137346


Finished epoch 100 of 100; error is 0.5887668505311012

              precision    recall  f1-score   support

           0      0.918     0.966     0.942      1910
           1      0.536     0.310     0.393       239

    accuracy                          0.893      2149
   macro avg      0.727     0.638     0.667      2149
weighted avg      0.875     0.893     0.881      2149

0.3 100 100 50 0.6670847378970747


Finished epoch 200 of 200; error is 0.37508489564061165

              precision    recall  f1-score   support

           0      0.924     0.971     0.947      1910
           1      0.613     0.364     0.457       239

    accuracy                          0.904      2149
   macro avg      0.768     0.668     0.702      2149
weighted avg      0.890     0.904     0.893      2149

0.3 100 200 50 0.701923173568073


Finished epoch 300 of 300; error is 0.37815694883465767

              precision    recall  f1-score   support

           0      0.926     0.968     0.946      1910
           1      0.595     0.381     0.464       239

    accuracy                          0.902      2149
   macro avg      0.760     0.674     0.705      2149
weighted avg      0.889     0.902     0.893      2149

0.3 100 300 50 0.7052611367127496


Finished epoch 500 of 500; error is 0.23948436975479126

              precision    recall  f1-score   support

           0      0.922     0.963     0.942      1910
           1      0.545     0.351     0.427       239

    accuracy                          0.895      2149
   macro avg      0.734     0.657     0.685      2149
weighted avg      0.880     0.895     0.885      2149

0.3 100 500 50 0.6849312390652031


Finished epoch 100 of 100; error is 0.31455796398222446

              precision    recall  f1-score   support

           0      0.917     0.979     0.947      1910
           1      0.627     0.289     0.395       239

    accuracy                          0.902      2149
   macro avg      0.772     0.634     0.671      2149
weighted avg      0.884     0.902     0.885      2149

0.3 200 100 50 0.6709921121810244


Finished epoch 200 of 200; error is 0.22625968232750893

              precision    recall  f1-score   support

           0      0.918     0.973     0.945      1910
           1      0.592     0.310     0.407       239

    accuracy                          0.899      2149
   macro avg      0.755     0.641     0.676      2149
weighted avg      0.882     0.899     0.885      2149

0.3 200 200 50 0.6758437292245121


Finished epoch 300 of 300; error is 0.19083105307072422

              precision    recall  f1-score   support

           0      0.919     0.973     0.945      1910
           1      0.591     0.314     0.410       239

    accuracy                          0.899      2149
   macro avg      0.755     0.643     0.677      2149
weighted avg      0.882     0.899     0.886      2149

0.3 200 300 50 0.6774510948418191


Finished epoch 500 of 500; error is 0.12709568557329476

              precision    recall  f1-score   support

           0      0.919     0.969     0.943      1910
           1      0.559     0.318     0.405       239

    accuracy                          0.896      2149
   macro avg      0.739     0.643     0.674      2149
weighted avg      0.879     0.896     0.883      2149

0.3 200 500 50 0.674244540742629


Finished epoch 100 of 100; error is 1.8614066541194916

              precision    recall  f1-score   support

           0      0.917     0.979     0.947      1910
           1      0.627     0.289     0.395       239

    accuracy                          0.902      2149
   macro avg      0.772     0.634     0.671      2149
weighted avg      0.884     0.902     0.885      2149

0.5 50 100 50 0.6709921121810244


Finished epoch 200 of 200; error is 1.7258075475692751

              precision    recall  f1-score   support

           0      0.909     0.985     0.946      1910
           1      0.646     0.213     0.321       239

    accuracy                          0.899      2149
   macro avg      0.777     0.599     0.633      2149
weighted avg      0.880     0.899     0.876      2149

0.5 50 200 50 0.6332416800986063


Finished epoch 300 of 300; error is 1.6514739096164703

              precision    recall  f1-score   support

           0      0.909     0.985     0.945      1910
           1      0.637     0.213     0.320       239

    accuracy                          0.899      2149
   macro avg      0.773     0.599     0.633      2149
weighted avg      0.879     0.899     0.876      2149

0.5 50 300 50 0.63260645032187


Finished epoch 500 of 500; error is 1.5279100090265274

              precision    recall  f1-score   support

           0      0.910     0.982     0.945      1910
           1      0.614     0.226     0.330       239

    accuracy                          0.898      2149
   macro avg      0.762     0.604     0.638      2149
weighted avg      0.877     0.898     0.877      2149

0.5 50 500 50 0.6375626965222635


Finished epoch 100 of 100; error is 1.2023473531007767

              precision    recall  f1-score   support

           0      0.916     0.979     0.947      1910
           1      0.630     0.285     0.392       239

    accuracy                          0.902      2149
   macro avg      0.773     0.632     0.669      2149
weighted avg      0.884     0.902     0.885      2149

0.5 100 100 50 0.6692633171334438


Finished epoch 200 of 200; error is 0.9936803504824638

              precision    recall  f1-score   support

           0      0.919     0.980     0.949      1910
           1      0.658     0.314     0.425       239

    accuracy                          0.906      2149
   macro avg      0.789     0.647     0.687      2149
weighted avg      0.890     0.906     0.890      2149

0.5 100 200 50 0.6867358186394368


Finished epoch 300 of 300; error is 0.9402824118733406

              precision    recall  f1-score   support

           0      0.908     0.985     0.945      1910
           1      0.636     0.205     0.310       239

    accuracy                          0.899      2149
   macro avg      0.772     0.595     0.628      2149
weighted avg      0.878     0.899     0.875      2149

0.5 100 300 50 0.6276901118323595


Finished epoch 500 of 500; error is 0.9116632789373398

              precision    recall  f1-score   support

           0      0.908     0.987     0.946      1910
           1      0.662     0.197     0.303       239

    accuracy                          0.899      2149
   macro avg      0.785     0.592     0.625      2149
weighted avg      0.880     0.899     0.874      2149

0.5 100 500 50 0.6245316594946129


Finished epoch 100 of 100; error is 0.7279393561184406

              precision    recall  f1-score   support

           0      0.915     0.981     0.947      1910
           1      0.644     0.272     0.382       239

    accuracy                          0.902      2149
   macro avg      0.779     0.627     0.665      2149
weighted avg      0.885     0.902     0.884      2149

0.5 200 100 50 0.6646479208156229


Finished epoch 200 of 200; error is 0.5861949585378171

              precision    recall  f1-score   support

           0      0.921     0.977     0.948      1910
           1      0.642     0.331     0.436       239

    accuracy                          0.905      2149
   macro avg      0.782     0.654     0.692      2149
weighted avg      0.890     0.905     0.891      2149

0.5 200 200 50 0.6923174100525535


Finished epoch 300 of 300; error is 0.56400155276060187

              precision    recall  f1-score   support

           0      0.911     0.984     0.946      1910
           1      0.647     0.230     0.340       239

    accuracy                          0.900      2149
   macro avg      0.779     0.607     0.643      2149
weighted avg      0.882     0.900     0.879      2149

0.5 200 300 50 0.6428280738379716


Finished epoch 500 of 500; error is 0.43929134495556355

              precision    recall  f1-score   support

           0      0.914     0.980     0.946      1910
           1      0.624     0.264     0.371       239

    accuracy                          0.900      2149
   macro avg      0.769     0.622     0.658      2149
weighted avg      0.882     0.900     0.882      2149

0.5 200 500 50 0.6582602621644918


Finished epoch 100 of 100; error is 2.524699628353119

              precision    recall  f1-score   support

           0      0.916     0.972     0.943      1910
           1      0.561     0.289     0.381       239

    accuracy                          0.896      2149
   macro avg      0.739     0.630     0.662      2149
weighted avg      0.877     0.896     0.881      2149

0.7 50 100 50 0.6621524502537843


Finished epoch 200 of 200; error is 2.5196665078401566

              precision    recall  f1-score   support

           0      0.909     0.976     0.941      1910
           1      0.535     0.222     0.314       239

    accuracy                          0.892      2149
   macro avg      0.722     0.599     0.628      2149
weighted avg      0.868     0.892     0.872      2149

0.7 50 200 50 0.6275118044348813


Finished epoch 300 of 300; error is 2.4628156721591954

              precision    recall  f1-score   support

           0      0.910     0.980     0.944      1910
           1      0.581     0.226     0.325       239

    accuracy                          0.896      2149
   macro avg      0.745     0.603     0.634      2149
weighted avg      0.873     0.896     0.875      2149

0.7 50 300 50 0.6344105620667237


Finished epoch 500 of 500; error is 2.3586134463548663

              precision    recall  f1-score   support

           0      0.917     0.971     0.944      1910
           1      0.567     0.301     0.393       239

    accuracy                          0.897      2149
   macro avg      0.742     0.636     0.668      2149
weighted avg      0.878     0.897     0.882      2149

0.7 50 500 50 0.6684914030318696


Finished epoch 100 of 100; error is 2.1830798387527466

              precision    recall  f1-score   support

           0      0.911     0.982     0.945      1910
           1      0.618     0.230     0.335       239

    accuracy                          0.899      2149
   macro avg      0.764     0.606     0.640      2149
weighted avg      0.878     0.899     0.877      2149

0.7 100 100 50 0.6402270074338022


Finished epoch 200 of 200; error is 2.0344047844409943

              precision    recall  f1-score   support

           0      0.913     0.983     0.947      1910
           1      0.656     0.255     0.367       239

    accuracy                          0.902      2149
   macro avg      0.785     0.619     0.657      2149
weighted avg      0.885     0.902     0.883      2149

0.7 100 200 50 0.6572599019375536


Finished epoch 300 of 300; error is 1.9342128783464432

              precision    recall  f1-score   support

           0      0.910     0.985     0.946      1910
           1      0.659     0.226     0.336       239

    accuracy                          0.901      2149
   macro avg      0.785     0.606     0.641      2149
weighted avg      0.882     0.901     0.879      2149

0.7 100 300 50 0.6414453199354231


Finished epoch 500 of 500; error is 1.8713262975215912

              precision    recall  f1-score   support

           0      0.907     0.986     0.945      1910
           1      0.625     0.188     0.289       239

    accuracy                          0.897      2149
   macro avg      0.766     0.587     0.617      2149
weighted avg      0.875     0.897     0.872      2149

0.7 100 500 50 0.6169794597715889


Finished epoch 100 of 100; error is 1.7333941757678986

              precision    recall  f1-score   support

           0      0.914     0.984     0.948      1910
           1      0.670     0.264     0.378       239

    accuracy                          0.904      2149
   macro avg      0.792     0.624     0.663      2149
weighted avg      0.887     0.904     0.884      2149

0.7 200 100 50 0.6630857843972598


Finished epoch 200 of 200; error is 1.5667931735515594

              precision    recall  f1-score   support

           0      0.912     0.989     0.949      1910
           1      0.727     0.234     0.354       239

    accuracy                          0.905      2149
   macro avg      0.819     0.612     0.652      2149
weighted avg      0.891     0.905     0.883      2149

0.7 200 200 50 0.6515999211642264


Finished epoch 300 of 300; error is 1.3807200714945793

              precision    recall  f1-score   support

           0      0.910     0.988     0.948      1910
           1      0.697     0.222     0.337       239

    accuracy                          0.903      2149
   macro avg      0.804     0.605     0.642      2149
weighted avg      0.887     0.903     0.880      2149

0.7 200 300 50 0.6420174631070941


Finished epoch 500 of 500; error is 1.2947669476270676

              precision    recall  f1-score   support

           0      0.906     0.990     0.946      1910
           1      0.694     0.180     0.286       239

    accuracy                          0.900      2149
   macro avg      0.800     0.585     0.616      2149
weighted avg      0.882     0.900     0.873      2149

0.7 200 500 50 0.6159619714786089


Finished epoch 100 of 100; error is 0.6498827189207077

              precision    recall  f1-score   support

           0      0.921     0.966     0.943      1910
           1      0.555     0.339     0.421       239

    accuracy                          0.896      2149
   macro avg      0.738     0.652     0.682      2149
weighted avg      0.880     0.896     0.885      2149

0.3 50 100 100 0.6818948493367099


Finished epoch 200 of 200; error is 0.47510512173175817

              precision    recall  f1-score   support

           0      0.923     0.949     0.935      1910
           1      0.470     0.364     0.410       239

    accuracy                          0.884      2149
   macro avg      0.696     0.656     0.673      2149
weighted avg      0.872     0.884     0.877      2149

0.3 50 200 100 0.672922287918489


Finished epoch 300 of 300; error is 0.36259709671139717

              precision    recall  f1-score   support

           0      0.924     0.951     0.937      1910
           1      0.489     0.377     0.426       239

    accuracy                          0.887      2149
   macro avg      0.707     0.664     0.681      2149
weighted avg      0.876     0.887     0.880      2149

0.3 50 300 100 0.6814111187371311


Finished epoch 500 of 500; error is 0.26975079253315926

              precision    recall  f1-score   support

           0      0.914     0.968     0.940      1910
           1      0.516     0.276     0.360       239

    accuracy                          0.891      2149
   macro avg      0.715     0.622     0.650      2149
weighted avg      0.870     0.891     0.876      2149

0.3 50 500 100 0.6499458991860271


Finished epoch 100 of 100; error is 0.2478663120418787

              precision    recall  f1-score   support

           0      0.918     0.966     0.942      1910
           1      0.536     0.314     0.396       239

    accuracy                          0.893      2149
   macro avg      0.727     0.640     0.669      2149
weighted avg      0.876     0.893     0.881      2149

0.3 100 100 100 0.6686725451608798


Finished epoch 200 of 200; error is 0.15650203265249734

              precision    recall  f1-score   support

           0      0.926     0.946     0.936      1910
           1      0.475     0.393     0.430       239

    accuracy                          0.884      2149
   macro avg      0.700     0.669     0.683      2149
weighted avg      0.876     0.884     0.879      2149

0.3 100 200 100 0.6828574425828431


Finished epoch 300 of 300; error is 0.14477656874805691

              precision    recall  f1-score   support

           0      0.924     0.951     0.938      1910
           1      0.492     0.377     0.427       239

    accuracy                          0.887      2149
   macro avg      0.708     0.664     0.682      2149
weighted avg      0.876     0.887     0.881      2149

0.3 100 300 100 0.6820523919220969


Finished epoch 500 of 500; error is 0.12513916147872806

              precision    recall  f1-score   support

           0      0.926     0.953     0.939      1910
           1      0.508     0.389     0.441       239

    accuracy                          0.890      2149
   macro avg      0.717     0.671     0.690      2149
weighted avg      0.879     0.890     0.884      2149

0.3 100 500 100 0.6899353904694829


Finished epoch 100 of 100; error is 0.12943522864952683

              precision    recall  f1-score   support

           0      0.919     0.966     0.942      1910
           1      0.546     0.322     0.405       239

    accuracy                          0.895      2149
   macro avg      0.733     0.644     0.674      2149
weighted avg      0.878     0.895     0.883      2149

0.3 200 100 100 0.673790333413933


Finished epoch 200 of 200; error is 0.09955076361075044

              precision    recall  f1-score   support

           0      0.930     0.947     0.938      1910
           1      0.500     0.427     0.460       239

    accuracy                          0.889      2149
   macro avg      0.715     0.687     0.699      2149
weighted avg      0.882     0.889     0.885      2149

0.3 200 200 100 0.6992496040146039


Finished epoch 300 of 300; error is 0.141114001162350186

              precision    recall  f1-score   support

           0      0.922     0.959     0.940      1910
           1      0.521     0.356     0.423       239

    accuracy                          0.892      2149
   macro avg      0.722     0.657     0.682      2149
weighted avg      0.878     0.892     0.883      2149

0.3 200 300 100 0.6816686587595902


Finished epoch 500 of 500; error is 0.086459193378686926

              precision    recall  f1-score   support

           0      0.917     0.966     0.941      1910
           1      0.529     0.305     0.387       239

    accuracy                          0.893      2149
   macro avg      0.723     0.636     0.664      2149
weighted avg      0.874     0.893     0.879      2149

0.3 200 500 100 0.6641771810228133


Finished epoch 100 of 100; error is 1.295236013829708

              precision    recall  f1-score   support

           0      0.922     0.966     0.944      1910
           1      0.564     0.351     0.433       239

    accuracy                          0.898      2149
   macro avg      0.743     0.659     0.688      2149
weighted avg      0.883     0.898     0.887      2149

0.5 50 100 100 0.688361853033459


Finished epoch 200 of 200; error is 1.1103333532810211

              precision    recall  f1-score   support

           0      0.913     0.982     0.946      1910
           1      0.635     0.255     0.364       239

    accuracy                          0.901      2149
   macro avg      0.774     0.618     0.655      2149
weighted avg      0.882     0.901     0.882      2149

0.5 50 200 100 0.6552159716180641


Finished epoch 300 of 300; error is 0.9951177611947062

              precision    recall  f1-score   support

           0      0.919     0.969     0.943      1910
           1      0.559     0.318     0.405       239

    accuracy                          0.896      2149
   macro avg      0.739     0.643     0.674      2149
weighted avg      0.879     0.896     0.883      2149

0.5 50 300 100 0.674244540742629


Finished epoch 500 of 500; error is 0.8930237516760826

              precision    recall  f1-score   support

           0      0.917     0.971     0.943      1910
           1      0.559     0.297     0.388       239

    accuracy                          0.896      2149
   macro avg      0.738     0.634     0.666      2149
weighted avg      0.877     0.896     0.881      2149

0.5 50 500 100 0.6655048390952198


Finished epoch 100 of 100; error is 0.725236538797617

              precision    recall  f1-score   support

           0      0.925     0.955     0.940      1910
           1      0.517     0.385     0.441       239

    accuracy                          0.892      2149
   macro avg      0.721     0.670     0.691      2149
weighted avg      0.880     0.892     0.884      2149

0.5 100 100 100 0.6906054646105326


Finished epoch 200 of 200; error is 0.5480327829718592

              precision    recall  f1-score   support

           0      0.919     0.972     0.945      1910
           1      0.589     0.318     0.413       239

    accuracy                          0.899      2149
   macro avg      0.754     0.645     0.679      2149
weighted avg      0.883     0.899     0.886      2149

0.5 100 200 100 0.6790408230999005


Finished epoch 300 of 300; error is 0.46839436888694763

              precision    recall  f1-score   support

           0      0.915     0.980     0.947      1910
           1      0.635     0.276     0.385       239

    accuracy                          0.902      2149
   macro avg      0.775     0.628     0.666      2149
weighted avg      0.884     0.902     0.884      2149

0.5 100 300 100 0.6657447302561985


Finished epoch 500 of 500; error is 0.38407134450972083

              precision    recall  f1-score   support

           0      0.918     0.971     0.944      1910
           1      0.570     0.305     0.398       239

    accuracy                          0.897      2149
   macro avg      0.744     0.638     0.671      2149
weighted avg      0.879     0.897     0.883      2149

0.5 100 500 100 0.6708001860430297


Finished epoch 100 of 100; error is 0.38020011223852634

              precision    recall  f1-score   support

           0      0.922     0.966     0.944      1910
           1      0.562     0.343     0.426       239

    accuracy                          0.897      2149
   macro avg      0.742     0.655     0.685      2149
weighted avg      0.882     0.897     0.886      2149

0.5 200 100 100 0.6847478103292056


Finished epoch 200 of 200; error is 0.28921281918883324

              precision    recall  f1-score   support

           0      0.928     0.961     0.944      1910
           1      0.565     0.402     0.469       239

    accuracy                          0.899      2149
   macro avg      0.746     0.681     0.707      2149
weighted avg      0.887     0.899     0.891      2149

0.5 200 200 100 0.7068196235259503


Finished epoch 300 of 300; error is 0.28513365797698536

              precision    recall  f1-score   support

           0      0.924     0.976     0.949      1910
           1      0.649     0.356     0.459       239

    accuracy                          0.907      2149
   macro avg      0.786     0.666     0.704      2149
weighted avg      0.893     0.907     0.895      2149

0.5 200 300 100 0.7042714812572246


Finished epoch 500 of 500; error is 0.23476530704647303

              precision    recall  f1-score   support

           0      0.924     0.975     0.949      1910
           1      0.647     0.360     0.462       239

    accuracy                          0.907      2149
   macro avg      0.785     0.668     0.706      2149
weighted avg      0.893     0.907     0.895      2149

0.5 200 500 100 0.705711578174727


Finished epoch 100 of 100; error is 2.3449499011039734

              precision    recall  f1-score   support

           0      0.913     0.960     0.936      1910
           1      0.458     0.272     0.341       239

    accuracy                          0.883      2149
   macro avg      0.686     0.616     0.639      2149
weighted avg      0.863     0.883     0.870      2149

0.7 50 100 100 0.638563848142929


Finished epoch 200 of 200; error is 1.9962491393089294

              precision    recall  f1-score   support

           0      0.913     0.966     0.939      1910
           1      0.492     0.264     0.343       239

    accuracy                          0.888      2149
   macro avg      0.703     0.615     0.641      2149
weighted avg      0.866     0.888     0.872      2149

0.7 50 200 100 0.6410083476758831


Finished epoch 300 of 300; error is 1.9423058927059174

              precision    recall  f1-score   support

           0      0.916     0.965     0.940      1910
           1      0.511     0.293     0.372       239

    accuracy                          0.890      2149
   macro avg      0.713     0.629     0.656      2149
weighted avg      0.871     0.890     0.877      2149

0.7 50 300 100 0.6560835223019085


Finished epoch 500 of 500; error is 1.8452902436256409

              precision    recall  f1-score   support

           0      0.917     0.966     0.941      1910
           1      0.529     0.301     0.384       239

    accuracy                          0.893      2149
   macro avg      0.723     0.634     0.663      2149
weighted avg      0.874     0.893     0.879      2149

0.7 50 500 100 0.6625582462401224


Finished epoch 100 of 100; error is 1.715098500251772

              precision    recall  f1-score   support

           0      0.920     0.964     0.942      1910
           1      0.537     0.331     0.409       239

    accuracy                          0.894      2149
   macro avg      0.729     0.647     0.676      2149
weighted avg      0.878     0.894     0.883      2149

0.7 100 100 100 0.6755221081407546


Finished epoch 200 of 200; error is 1.4907733201980594

              precision    recall  f1-score   support

           0      0.912     0.977     0.943      1910
           1      0.573     0.247     0.345       239

    accuracy                          0.896      2149
   macro avg      0.742     0.612     0.644      2149
weighted avg      0.874     0.896     0.877      2149

0.7 100 200 100 0.6442031942005333


Finished epoch 300 of 300; error is 1.5076206177473068

              precision    recall  f1-score   support

           0      0.912     0.981     0.945      1910
           1      0.611     0.243     0.347       239

    accuracy                          0.899      2149
   macro avg      0.761     0.612     0.646      2149
weighted avg      0.878     0.899     0.879      2149

0.7 100 300 100 0.6461552173151175


Finished epoch 500 of 500; error is 1.3614499792456627

              precision    recall  f1-score   support

           0      0.914     0.976     0.944      1910
           1      0.578     0.264     0.362       239

    accuracy                          0.897      2149
   macro avg      0.746     0.620     0.653      2149
weighted avg      0.876     0.897     0.879      2149

0.7 100 500 100 0.6529332169358358


Finished epoch 100 of 100; error is 1.232787236571312

              precision    recall  f1-score   support

           0      0.920     0.975     0.946      1910
           1      0.613     0.318     0.419       239

    accuracy                          0.902      2149
   macro avg      0.766     0.646     0.683      2149
weighted avg      0.885     0.902     0.888      2149

0.7 200 100 100 0.6825557177411168


Finished epoch 200 of 200; error is 1.1237896680831914

              precision    recall  f1-score   support

           0      0.914     0.981     0.946      1910
           1      0.626     0.259     0.367       239

    accuracy                          0.900      2149
   macro avg      0.770     0.620     0.656      2149
weighted avg      0.882     0.900     0.882      2149

0.7 200 200 100 0.6564117506425199


Finished epoch 300 of 300; error is 0.9636130332946777

              precision    recall  f1-score   support

           0      0.918     0.975     0.946      1910
           1      0.608     0.305     0.407       239

    accuracy                          0.901      2149
   macro avg      0.763     0.640     0.676      2149
weighted avg      0.884     0.901     0.886      2149

0.7 200 300 100 0.6763052992678741


Finished epoch 500 of 500; error is 0.9842429384589195

              precision    recall  f1-score   support

           0      0.917     0.983     0.948      1910
           1      0.673     0.285     0.400       239

    accuracy                          0.905      2149
   macro avg      0.795     0.634     0.674      2149
weighted avg      0.889     0.905     0.887      2149

0.7 200 500 100 0.6742294087923193


Finished epoch 100 of 100; error is 0.3418534845113754

              precision    recall  f1-score   support

           0      0.913     0.951     0.932      1910
           1      0.415     0.276     0.332       239

    accuracy                          0.876      2149
   macro avg      0.664     0.614     0.632      2149
weighted avg      0.858     0.876     0.865      2149

0.3 50 100 200 0.6317265816260791


Finished epoch 200 of 200; error is 0.26090736687183384

              precision    recall  f1-score   support

           0      0.915     0.947     0.931      1910
           1      0.416     0.301     0.350       239

    accuracy                          0.875      2149
   macro avg      0.666     0.624     0.640      2149
weighted avg      0.860     0.875     0.866      2149

0.3 50 200 200 0.6402745229327085


Finished epoch 300 of 300; error is 0.20307589694857597

              precision    recall  f1-score   support

           0      0.913     0.951     0.932      1910
           1      0.415     0.276     0.332       239

    accuracy                          0.876      2149
   macro avg      0.664     0.614     0.632      2149
weighted avg      0.858     0.876     0.865      2149

0.3 50 300 200 0.6317265816260791


Finished epoch 500 of 500; error is 0.18675629142671824

              precision    recall  f1-score   support

           0      0.914     0.953     0.933      1910
           1      0.427     0.280     0.338       239

    accuracy                          0.878      2149
   macro avg      0.670     0.617     0.636      2149
weighted avg      0.860     0.878     0.867      2149

0.3 50 500 200 0.635619392282642


Finished epoch 100 of 100; error is 0.18888165894895792

              precision    recall  f1-score   support

           0      0.921     0.952     0.937      1910
           1      0.480     0.351     0.406       239

    accuracy                          0.886      2149
   macro avg      0.701     0.652     0.671      2149
weighted avg      0.872     0.886     0.878      2149

0.3 100 100 200 0.671230167614442


Finished epoch 200 of 200; error is 0.12836042442359033

              precision    recall  f1-score   support

           0      0.922     0.953     0.938      1910
           1      0.491     0.360     0.415       239

    accuracy                          0.887      2149
   macro avg      0.707     0.657     0.677      2149
weighted avg      0.875     0.887     0.880      2149

0.3 100 200 200 0.6765760185475405


Finished epoch 300 of 300; error is 0.12455742945894599

              precision    recall  f1-score   support

           0      0.920     0.955     0.938      1910
           1      0.488     0.339     0.400       239

    accuracy                          0.887      2149
   macro avg      0.704     0.647     0.669      2149
weighted avg      0.872     0.887     0.878      2149

0.3 100 300 200 0.668790136141793


Finished epoch 500 of 500; error is 0.10621501924470067

              precision    recall  f1-score   support

           0      0.920     0.958     0.939      1910
           1      0.500     0.335     0.401       239

    accuracy                          0.889      2149
   macro avg      0.710     0.646     0.670      2149
weighted avg      0.873     0.889     0.879      2149

0.3 100 500 200 0.6698523688035168


Finished epoch 100 of 100; error is 0.0894282846711576

              precision    recall  f1-score   support

           0      0.923     0.941     0.932      1910
           1      0.440     0.368     0.401       239

    accuracy                          0.878      2149
   macro avg      0.681     0.655     0.666      2149
weighted avg      0.869     0.878     0.873      2149

0.3 200 100 200 0.6663793953253082


Finished epoch 200 of 200; error is 0.09627418685704473

              precision    recall  f1-score   support

           0      0.924     0.940     0.932      1910
           1      0.442     0.381     0.409       239

    accuracy                          0.878      2149
   macro avg      0.683     0.660     0.670      2149
weighted avg      0.870     0.878     0.874      2149

0.3 200 200 200 0.6703651320873565


Finished epoch 300 of 300; error is 0.149441602174192674

              precision    recall  f1-score   support

           0      0.921     0.942     0.931      1910
           1      0.433     0.351     0.388       239

    accuracy                          0.877      2149
   macro avg      0.677     0.647     0.660      2149
weighted avg      0.866     0.877     0.871      2149

0.3 200 300 200 0.6597133629511007


Finished epoch 500 of 500; error is 0.090327335055917556

              precision    recall  f1-score   support

           0      0.922     0.947     0.934      1910
           1      0.457     0.356     0.400       239

    accuracy                          0.881      2149
   macro avg      0.689     0.651     0.667      2149
weighted avg      0.870     0.881     0.875      2149

0.3 200 500 200 0.6670797831138652


Finished epoch 100 of 100; error is 0.9805916845798492

              precision    recall  f1-score   support

           0      0.909     0.965     0.936      1910
           1      0.446     0.226     0.300       239

    accuracy                          0.883      2149
   macro avg      0.678     0.595     0.618      2149
weighted avg      0.857     0.883     0.865      2149

0.5 50 100 200 0.61800406297613


Finished epoch 200 of 200; error is 0.7437615096569061

              precision    recall  f1-score   support

           0      0.911     0.964     0.937      1910
           1      0.460     0.243     0.318       239

    accuracy                          0.884      2149
   macro avg      0.685     0.604     0.627      2149
weighted avg      0.860     0.884     0.868      2149

0.5 50 200 200 0.6272488845699716


Finished epoch 300 of 300; error is 0.6955608502030373

              precision    recall  f1-score   support

           0      0.911     0.961     0.936      1910
           1      0.448     0.251     0.322       239

    accuracy                          0.882      2149
   macro avg      0.679     0.606     0.629      2149
weighted avg      0.860     0.882     0.867      2149

0.5 50 300 200 0.6286286094841276


Finished epoch 500 of 500; error is 0.6332023888826372

              precision    recall  f1-score   support

           0      0.917     0.957     0.936      1910
           1      0.468     0.305     0.370       239

    accuracy                          0.884      2149
   macro avg      0.692     0.631     0.653      2149
weighted avg      0.867     0.884     0.873      2149

0.5 50 500 200 0.6529115869973438


Finished epoch 100 of 100; error is 0.530303843319416

              precision    recall  f1-score   support

           0      0.919     0.962     0.940      1910
           1      0.517     0.322     0.397       239

    accuracy                          0.891      2149
   macro avg      0.718     0.642     0.669      2149
weighted avg      0.874     0.891     0.880      2149

0.5 100 100 200 0.6685303345901337


Finished epoch 200 of 200; error is 0.41549502313137054

              precision    recall  f1-score   support

           0      0.915     0.967     0.940      1910
           1      0.519     0.285     0.368       239

    accuracy                          0.891      2149
   macro avg      0.717     0.626     0.654      2149
weighted avg      0.871     0.891     0.877      2149

0.5 100 200 200 0.6539976330709528


Finished epoch 300 of 300; error is 0.38132304698228836

              precision    recall  f1-score   support

           0      0.916     0.966     0.940      1910
           1      0.519     0.289     0.371       239

    accuracy                          0.891      2149
   macro avg      0.717     0.628     0.656      2149
weighted avg      0.872     0.891     0.877      2149

0.5 100 300 200 0.6556825464644307


Finished epoch 500 of 500; error is 0.26285112649202347

              precision    recall  f1-score   support

           0      0.912     0.968     0.939      1910
           1      0.492     0.251     0.332       239

    accuracy                          0.888      2149
   macro avg      0.702     0.609     0.636      2149
weighted avg      0.865     0.888     0.871      2149

0.5 100 500 200 0.6355979249354621


Finished epoch 100 of 100; error is 0.35275446996092796

              precision    recall  f1-score   support

           0      0.916     0.965     0.940      1910
           1      0.507     0.289     0.368       239

    accuracy                          0.890      2149
   macro avg      0.711     0.627     0.654      2149
weighted avg      0.870     0.890     0.876      2149

0.5 200 100 200 0.6537935253632424


Finished epoch 200 of 200; error is 0.29693037830293187

              precision    recall  f1-score   support

           0      0.923     0.955     0.939      1910
           1      0.503     0.360     0.420       239

    accuracy                          0.889      2149
   macro avg      0.713     0.658     0.679      2149
weighted avg      0.876     0.889     0.881      2149

0.5 200 200 200 0.6791491016762019


Finished epoch 300 of 300; error is 0.24316098727285862

              precision    recall  f1-score   support

           0      0.919     0.958     0.938      1910
           1      0.487     0.322     0.388       239

    accuracy                          0.887      2149
   macro avg      0.703     0.640     0.663      2149
weighted avg      0.871     0.887     0.877      2149

0.5 200 300 200 0.662808799913734


Finished epoch 500 of 500; error is 0.21679653041064743

              precision    recall  f1-score   support

           0      0.914     0.973     0.942      1910
           1      0.552     0.268     0.361       239

    accuracy                          0.894      2149
   macro avg      0.733     0.620     0.651      2149
weighted avg      0.874     0.894     0.878      2149

0.5 200 500 200 0.6514965011984154


Finished epoch 100 of 100; error is 2.0367581099271774

              precision    recall  f1-score   support

           0      0.910     0.958     0.933      1910
           1      0.417     0.243     0.307       239

    accuracy                          0.878      2149
   macro avg      0.664     0.600     0.620      2149
weighted avg      0.855     0.878     0.864      2149

0.7 50 100 200 0.6200207860922147


Finished epoch 200 of 200; error is 1.8980394452810287

              precision    recall  f1-score   support

           0      0.914     0.961     0.937      1910
           1      0.475     0.280     0.353       239

    accuracy                          0.886      2149
   macro avg      0.695     0.621     0.645      2149
weighted avg      0.866     0.886     0.872      2149

0.7 50 200 200 0.6449222213266704


Finished epoch 300 of 300; error is 1.7050997763872147

              precision    recall  f1-score   support

           0      0.912     0.962     0.936      1910
           1      0.459     0.259     0.332       239

    accuracy                          0.884      2149
   macro avg      0.686     0.611     0.634      2149
weighted avg      0.862     0.884     0.869      2149

0.7 50 300 200 0.6339201513243606


Finished epoch 500 of 500; error is 1.5898693427443504

              precision    recall  f1-score   support

           0      0.912     0.963     0.937      1910
           1      0.462     0.255     0.329       239

    accuracy                          0.884      2149
   macro avg      0.687     0.609     0.633      2149
weighted avg      0.862     0.884     0.869      2149

0.7 50 500 200 0.6327168946480822


Finished epoch 100 of 100; error is 1.6153663992881775

              precision    recall  f1-score   support

           0      0.909     0.967     0.937      1910
           1      0.466     0.230     0.308       239

    accuracy                          0.885      2149
   macro avg      0.688     0.599     0.623      2149
weighted avg      0.860     0.885     0.867      2149

0.7 100 100 200 0.6227244005950515


Finished epoch 200 of 200; error is 1.3792806267738342

              precision    recall  f1-score   support

           0      0.912     0.967     0.939      1910
           1      0.492     0.255     0.336       239

    accuracy                          0.888      2149
   macro avg      0.702     0.611     0.637      2149
weighted avg      0.865     0.888     0.872      2149

0.7 100 200 200 0.6374214596000434


Finished epoch 300 of 300; error is 1.2593568116426468

              precision    recall  f1-score   support

           0      0.908     0.975     0.941      1910
           1      0.520     0.213     0.303       239

    accuracy                          0.891      2149
   macro avg      0.714     0.594     0.622      2149
weighted avg      0.865     0.891     0.870      2149

0.7 100 300 200 0.6216710853671967


Finished epoch 500 of 500; error is 1.2042898610234268

              precision    recall  f1-score   support

           0      0.908     0.971     0.939      1910
           1      0.486     0.218     0.301       239

    accuracy                          0.887      2149
   macro avg      0.697     0.594     0.620      2149
weighted avg      0.861     0.887     0.868      2149

0.7 100 500 200 0.6196716084341578


Finished epoch 100 of 100; error is 1.1736869812011719

              precision    recall  f1-score   support

           0      0.916     0.971     0.943      1910
           1      0.552     0.289     0.379       239

    accuracy                          0.895      2149
   macro avg      0.734     0.630     0.661      2149
weighted avg      0.876     0.895     0.880      2149

0.7 200 100 200 0.6608364944663878


Finished epoch 200 of 200; error is 0.9564328417181969

              precision    recall  f1-score   support

           0      0.916     0.965     0.940      1910
           1      0.515     0.293     0.373       239

    accuracy                          0.891      2149
   macro avg      0.715     0.629     0.657      2149
weighted avg      0.871     0.891     0.877      2149

0.7 200 200 200 0.656715098988869


Finished epoch 300 of 300; error is 0.9442865848541263

              precision    recall  f1-score   support

           0      0.911     0.974     0.942      1910
           1      0.538     0.238     0.330       239

    accuracy                          0.893      2149
   macro avg      0.724     0.606     0.636      2149
weighted avg      0.869     0.893     0.874      2149

0.7 200 300 200 0.635999076100705


Finished epoch 500 of 500; error is 0.8457340747117996

              precision    recall  f1-score   support

           0      0.907     0.973     0.939      1910
           1      0.480     0.201     0.283       239

    accuracy                          0.887      2149
   macro avg      0.693     0.587     0.611      2149
weighted avg      0.859     0.887     0.866      2149

0.7 200 500 200 0.6109033522812366


Finished epoch 100 of 100; error is 0.28221018984913826

              precision    recall  f1-score   support

           0      0.918     0.944     0.931      1910
           1      0.418     0.322     0.364       239

    accuracy                          0.875      2149
   macro avg      0.668     0.633     0.647      2149
weighted avg      0.862     0.875     0.868      2149

0.3 50 100 300 0.6473234195073592


Finished epoch 200 of 200; error is 0.18176826741546392

              precision    recall  f1-score   support

           0      0.914     0.939     0.927      1910
           1      0.380     0.297     0.333       239

    accuracy                          0.868      2149
   macro avg      0.647     0.618     0.630      2149
weighted avg      0.855     0.868     0.861      2149

0.3 50 200 300 0.6299931129476584


Finished epoch 300 of 300; error is 0.16412707231938844

              precision    recall  f1-score   support

           0      0.917     0.943     0.930      1910
           1      0.408     0.314     0.355       239

    accuracy                          0.873      2149
   macro avg      0.662     0.628     0.642      2149
weighted avg      0.860     0.873     0.866      2149

0.3 50 300 300 0.6420791580873941


Finished epoch 500 of 500; error is 0.14919451810419563

              precision    recall  f1-score   support

           0      0.911     0.952     0.931      1910
           1      0.399     0.255     0.311       239

    accuracy                          0.874      2149
   macro avg      0.655     0.604     0.621      2149
weighted avg      0.854     0.874     0.862      2149

0.3 50 500 300 0.6210500329163924


Finished epoch 100 of 100; error is 0.1835204018279913

              precision    recall  f1-score   support

           0      0.912     0.936     0.924      1910
           1      0.349     0.276     0.308       239

    accuracy                          0.862      2149
   macro avg      0.630     0.606     0.616      2149
weighted avg      0.849     0.862     0.855      2149

0.3 100 100 300 0.615962713419788


Finished epoch 200 of 200; error is 0.09349926753202453

              precision    recall  f1-score   support

           0      0.917     0.941     0.929      1910
           1      0.407     0.322     0.360       239

    accuracy                          0.872      2149
   macro avg      0.662     0.632     0.645      2149
weighted avg      0.861     0.872     0.866      2149

0.3 100 200 300 0.6445060252602092


Finished epoch 300 of 300; error is 0.096466523129493944

              precision    recall  f1-score   support

           0      0.921     0.939     0.930      1910
           1      0.421     0.356     0.385       239

    accuracy                          0.874      2149
   macro avg      0.671     0.647     0.658      2149
weighted avg      0.865     0.874     0.869      2149

0.3 100 300 300 0.6576128333971217


Finished epoch 500 of 500; error is 0.103030268568545585

              precision    recall  f1-score   support

           0      0.924     0.943     0.933      1910
           1      0.455     0.381     0.415       239

    accuracy                          0.880      2149
   macro avg      0.690     0.662     0.674      2149
weighted avg      0.872     0.880     0.876      2149

0.3 100 500 300 0.6739905117817651


Finished epoch 100 of 100; error is 0.09856183733791113

              precision    recall  f1-score   support

           0      0.917     0.928     0.923      1910
           1      0.366     0.331     0.347       239

    accuracy                          0.862      2149
   macro avg      0.641     0.629     0.635      2149
weighted avg      0.856     0.862     0.859      2149

0.3 200 100 300 0.6349846874437037


Finished epoch 200 of 200; error is 0.099425366148352626

              precision    recall  f1-score   support

           0      0.920     0.930     0.925      1910
           1      0.390     0.356     0.372       239

    accuracy                          0.866      2149
   macro avg      0.655     0.643     0.649      2149
weighted avg      0.861     0.866     0.864      2149

0.3 200 200 300 0.6486355611486568


Finished epoch 300 of 300; error is 0.070024670101702216

              precision    recall  f1-score   support

           0      0.917     0.942     0.930      1910
           1      0.409     0.318     0.358       239

    accuracy                          0.873      2149
   macro avg      0.663     0.630     0.644      2149
weighted avg      0.860     0.873     0.866      2149

0.3 200 300 300 0.6435795325101381


Finished epoch 500 of 500; error is 0.088148603914305575

              precision    recall  f1-score   support

           0      0.915     0.941     0.927      1910
           1      0.386     0.297     0.336       239

    accuracy                          0.869      2149
   macro avg      0.650     0.619     0.632      2149
weighted avg      0.856     0.869     0.862      2149

0.3 200 500 300 0.6315906352474643


Finished epoch 100 of 100; error is 0.7480380982160568

              precision    recall  f1-score   support

           0      0.911     0.953     0.932      1910
           1      0.407     0.255     0.314       239

    accuracy                          0.876      2149
   macro avg      0.659     0.604     0.623      2149
weighted avg      0.855     0.876     0.863      2149

0.5 50 100 300 0.6226603823093633


Finished epoch 200 of 200; error is 0.6300494670867922

              precision    recall  f1-score   support

           0      0.913     0.966     0.939      1910
           1      0.496     0.264     0.344       239

    accuracy                          0.888      2149
   macro avg      0.705     0.615     0.642      2149
weighted avg      0.867     0.888     0.873      2149

0.5 50 200 300 0.6416123276020214


Finished epoch 300 of 300; error is 0.5919140167534351

              precision    recall  f1-score   support

           0      0.907     0.962     0.934      1910
           1      0.410     0.209     0.277       239

    accuracy                          0.879      2149
   macro avg      0.658     0.586     0.605      2149
weighted avg      0.851     0.879     0.861      2149

0.5 50 300 300 0.6053570888305212


Finished epoch 500 of 500; error is 0.49044629186391834

              precision    recall  f1-score   support

           0      0.911     0.963     0.937      1910
           1      0.462     0.251     0.325       239

    accuracy                          0.884      2149
   macro avg      0.686     0.607     0.631      2149
weighted avg      0.861     0.884     0.869      2149

0.5 50 500 300 0.6309141737383268


Finished epoch 100 of 100; error is 0.4992050230503082

              precision    recall  f1-score   support

           0      0.918     0.956     0.937      1910
           1      0.475     0.318     0.381       239

    accuracy                          0.885      2149
   macro avg      0.697     0.637     0.659      2149
weighted avg      0.869     0.885     0.875      2149

0.5 100 100 300 0.6588014020689065


Finished epoch 200 of 200; error is 0.36335050687193873

              precision    recall  f1-score   support

           0      0.916     0.960     0.937      1910
           1      0.479     0.293     0.364       239

    accuracy                          0.886      2149
   macro avg      0.698     0.627     0.651      2149
weighted avg      0.867     0.886     0.874      2149

0.5 100 200 300 0.6505122784192551


Finished epoch 300 of 300; error is 0.31297542527318014

              precision    recall  f1-score   support

           0      0.909     0.969     0.938      1910
           1      0.478     0.226     0.307       239

    accuracy                          0.886      2149
   macro avg      0.694     0.598     0.622      2149
weighted avg      0.861     0.886     0.868      2149

0.5 100 300 300 0.6224917062157306


Finished epoch 500 of 500; error is 0.28552147373557093

              precision    recall  f1-score   support

           0      0.916     0.971     0.943      1910
           1      0.552     0.289     0.379       239

    accuracy                          0.895      2149
   macro avg      0.734     0.630     0.661      2149
weighted avg      0.876     0.895     0.880      2149

0.5 100 500 300 0.6608364944663878


Finished epoch 100 of 100; error is 0.2594151049852371

              precision    recall  f1-score   support

           0      0.916     0.955     0.935      1910
           1      0.459     0.301     0.364       239

    accuracy                          0.883      2149
   macro avg      0.687     0.628     0.650      2149
weighted avg      0.865     0.883     0.872      2149

0.5 200 100 300 0.6495270490657472


Finished epoch 200 of 200; error is 0.20564289484173064

              precision    recall  f1-score   support

           0      0.915     0.950     0.932      1910
           1      0.428     0.297     0.351       239

    accuracy                          0.878      2149
   macro avg      0.671     0.624     0.642      2149
weighted avg      0.861     0.878     0.868      2149

0.5 200 200 300 0.6415300650423521


Finished epoch 300 of 300; error is 0.21300643496215343

              precision    recall  f1-score   support

           0      0.919     0.958     0.938      1910
           1      0.494     0.326     0.393       239

    accuracy                          0.888      2149
   macro avg      0.706     0.642     0.666      2149
weighted avg      0.872     0.888     0.878      2149

0.5 200 300 300 0.6655840361284358


Finished epoch 500 of 500; error is 0.20264989044517284

              precision    recall  f1-score   support

           0      0.909     0.964     0.936      1910
           1      0.443     0.226     0.299       239

    accuracy                          0.882      2149
   macro avg      0.676     0.595     0.617      2149
weighted avg      0.857     0.882     0.865      2149

0.5 200 500 300 0.6174534232724975


Finished epoch 100 of 100; error is 1.8760058283805847

              precision    recall  f1-score   support

           0      0.915     0.954     0.934      1910
           1      0.439     0.289     0.348       239

    accuracy                          0.880      2149
   macro avg      0.677     0.621     0.641      2149
weighted avg      0.862     0.880     0.869      2149

0.7 50 100 300 0.641182454995884


Finished epoch 200 of 200; error is 1.6539913415908813

              precision    recall  f1-score   support

           0      0.913     0.966     0.939      1910
           1      0.496     0.268     0.348       239

    accuracy                          0.888      2149
   macro avg      0.705     0.617     0.643      2149
weighted avg      0.867     0.888     0.873      2149

0.7 50 200 300 0.6433786923332228


Finished epoch 300 of 300; error is 1.5761985927820206

              precision    recall  f1-score   support

           0      0.912     0.964     0.937      1910
           1      0.469     0.255     0.331       239

    accuracy                          0.885      2149
   macro avg      0.691     0.610     0.634      2149
weighted avg      0.863     0.885     0.870      2149

0.7 50 300 300 0.6338787185275772


Finished epoch 500 of 500; error is 1.4908157140016556

              precision    recall  f1-score   support

           0      0.910     0.964     0.936      1910
           1      0.456     0.238     0.313       239

    accuracy                          0.884      2149
   macro avg      0.683     0.601     0.625      2149
weighted avg      0.860     0.884     0.867      2149

0.7 50 500 300 0.624819131046889


Finished epoch 100 of 100; error is 1.422526627779007

              precision    recall  f1-score   support

           0      0.915     0.963     0.938      1910
           1      0.489     0.285     0.360       239

    accuracy                          0.887      2149
   macro avg      0.702     0.624     0.649      2149
weighted avg      0.868     0.887     0.874      2149

0.7 100 100 300 0.6490268329554043


Finished epoch 200 of 200; error is 1.1991243958473206

              precision    recall  f1-score   support

           0      0.909     0.971     0.939      1910
           1      0.491     0.222     0.305       239

    accuracy                          0.888      2149
   macro avg      0.700     0.596     0.622      2149
weighted avg      0.862     0.888     0.869      2149

0.7 100 200 300 0.6222391442140282


Finished epoch 300 of 300; error is 1.1489757224917412

              precision    recall  f1-score   support

           0      0.910     0.970     0.939      1910
           1      0.487     0.230     0.312       239

    accuracy                          0.887      2149
   macro avg      0.698     0.600     0.626      2149
weighted avg      0.863     0.887     0.869      2149

0.7 100 300 300 0.6255860364926509


Finished epoch 500 of 500; error is 1.0441089794039726

              precision    recall  f1-score   support

           0      0.910     0.968     0.938      1910
           1      0.483     0.238     0.319       239

    accuracy                          0.887      2149
   macro avg      0.697     0.603     0.629      2149
weighted avg      0.863     0.887     0.869      2149

0.7 100 500 300 0.6288341269012045


Finished epoch 100 of 100; error is 1.1044548153877258

              precision    recall  f1-score   support

           0      0.912     0.967     0.939      1910
           1      0.488     0.251     0.331       239

    accuracy                          0.887      2149
   macro avg      0.700     0.609     0.635      2149
weighted avg      0.865     0.887     0.871      2149

0.7 200 100 300 0.635003986434892


Finished epoch 200 of 200; error is 0.8670862540602684

              precision    recall  f1-score   support

           0      0.911     0.971     0.940      1910
           1      0.513     0.243     0.330       239

    accuracy                          0.890      2149
   macro avg      0.712     0.607     0.635      2149
weighted avg      0.867     0.890     0.872      2149

0.7 200 200 300 0.6348690273234114


Finished epoch 300 of 300; error is 0.9246631935238838

              precision    recall  f1-score   support

           0      0.910     0.977     0.942      1910
           1      0.557     0.226     0.321       239

    accuracy                          0.894      2149
   macro avg      0.733     0.602     0.632      2149
weighted avg      0.871     0.894     0.873      2149

0.7 200 300 300 0.6319409389197375


Finished epoch 500 of 500; error is 0.8570183813571933

              precision    recall  f1-score   support

           0      0.907     0.984     0.944      1910
           1      0.603     0.197     0.297       239

    accuracy                          0.896      2149
   macro avg      0.755     0.590     0.620      2149
weighted avg      0.873     0.896     0.872      2149

0.7 200 500 300 0.6202569460457679
----
(0.5, 200, 200, 100)
0.7068196235259503
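
The sweep output above prints one summary line per configuration (four hyperparameter values followed by the macro-average F1) and closes with the best tuple and its score. A minimal sketch of that selection step follows. The names `results`, `best_config`, and `best_score` are hypothetical, and only four rows from the log are copied in, so the winner here is the best of that subset rather than the overall best reported above. The third value in each tuple matches the epoch count in the `Finished epoch` lines; the meaning of the other three is not stated in this output.

```python
# Hypothetical reconstruction of the selection step, not the original code.
# Keys: hyperparameter tuples in the order printed in the log.
# Values: the macro-average F1 printed at the end of the same line.
results = {
    (0.5, 50, 500, 200): 0.6529115869973438,
    (0.5, 100, 100, 200): 0.6685303345901337,
    (0.5, 200, 200, 200): 0.6791491016762019,
    (0.7, 200, 500, 300): 0.6202569460457679,
}

# Pick the configuration whose macro-F1 is largest.
best_config = max(results, key=results.get)
best_score = results[best_config]

print('----')
print(best_config)
print(best_score)
```

Storing the scores in a dictionary keyed by the full tuple keeps every run available for later inspection, which helps when several configurations score within noise of each other.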


## Bake-off [1 point]

The goal of the bake-off is to achieve the highest macro-average F1 score on the __word_disjoint__ problem, evaluated on a test set that we will make available at the start of the bake-off. The announcement will go out on the discussion forum. To enter, you'll be asked to run `nli.bake_off_evaluation` on the output of your chosen `nli.wordentail_experiment` run.
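
Macro-average F1 is the unweighted mean of the per-class F1 scores, so the much smaller class 1 counts as much as class 0. The plain-Python sketch below illustrates the metric on toy labels. The helpers `f1_for_class` and `macro_f1` are hypothetical names introduced here; the course code may compute the score differently (the report format above resembles scikit-learn's `classification_report`).

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for a single class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy data (not from the assignment): four negatives, two positives.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
score = macro_f1(y_true, y_pred)
print(score)
```

Because each class contributes equally, a system that labels everything 0 scores poorly on this metric even though its accuracy would be high on data this imbalanced.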

The cells below this one constitute your bake-off entry.

The rules described in the [Your original system](#Your-original-system-[3-points]) homework question are also in effect for the bake-off.

Systems that enter will receive the additional homework point, and systems that achieve the top score will receive an additional 0.5 points. We will test the top-performing systems ourselves, and only systems for which we can reproduce the reported results will win the extra 0.5 points.

Late entries will be accepted, but they cannot earn the extra 0.5 points. Similarly, you cannot win the bake-off unless your homework is submitted on time.

The announcement will include details on where to submit your entry.

In [None]:
# Enter your bake-off assessment code into this cell. 
# Please do not remove this comment.
##### YOUR CODE HERE




In [None]:
# On an otherwise blank line in this cell, please enter
# your macro-avg f1 value as reported by the code above. 
# Please enter only a number between 0 and 1 inclusive.
# Please do not remove this comment.

##### YOUR CODE HERE


