# HW 1 Classification

Welcome to CS 6741 HW1. To begin this assignment first turn on the Python 3 and GPU backend for this Colab by clicking `Runtime > Change Runtime Type` above.  

In this homework you will be building several varieties of text classifiers. Text classifiers are not that exciting from an NLP point of view, but they are a great way to get up to speed on the core technologies we will use in this class.



## Goal

We ask that you construct the following models in PyTorch:

1. A naive Bayes unigram classifer (follow Wang and Manning http://www.aclweb.org/anthology/P/P12/P12-2.pdf#page=118: you should only implement Naive Bayes, not the combined classifer with SVM).
2. A logistic regression model over word types (you can implement this as $y = \sigma(\sum_i W x_i + b)$) 
3. A continuous bag-of-word neural network with embeddings (similar to CBOW in Mikolov et al https://arxiv.org/pdf/1301.3781.pdf ).
4. A simple convolutional neural network (any variant of CNN as described in Kim http://aclweb.org/anthology/D/D14/D14-1181.pdf ).
5. Your own extensions to these models...

Consult the papers provided for hyperparameters. 


## Setup

This notebook provides a working definition of the setup of the problem itself. You may construct your models inline or use an external setup (preferred) to build your system.

In [0]:
import torch
# Text text processing library and methods for pretrained word embeddings
import torchtext
from torchtext.vocab import Vectors, GloVe

The dataset we will use of this problem is known as the Stanford Sentiment Treebank ( https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf ). It is a variant of a standard sentiment classification task. For simplicity, we will use the most basic form. Classifying a sentence as positive or negative in sentiment. 

To start, `torchtext` requires that we define a mapping from the raw text data to featurized indices. These fields make it easy to map back and forth between readable data and math, which helps for debugging.

In [0]:
# Our input $x$
TEXT = torchtext.data.Field()

# Our labels $y$
LABEL = torchtext.data.Field(sequential=False, unk_token=None)

Next we input our data. Here we will use the standard SST train split, and tell it the fields. Torchtext also gives us the option of using subtrees in the treebank as examples as well. The subtrees can be obtained by passing the option `train_subtrees=True` to splits. Feel free to experiment with using subtrees and report their effect on performance.

In [0]:
train, val, test = torchtext.datasets.SST.splits(
    TEXT, LABEL,
    filter_pred=lambda ex: ex.label != 'neutral')

Let's look at this data. It's still in its original form, we can see that each example consists of a label and the original words.

Be sure to double check that examples with neutral labels were filtered out. 

The length of the training data should be 6920.

In [94]:
print('len(train)', len(train))
print('vars(train[0])', vars(train[0]))

len(train) 6920
vars(train[0]) {'text': ['The', 'Rock', 'is', 'destined', 'to', 'be', 'the', '21st', 'Century', "'s", 'new', '``', 'Conan', "''", 'and', 'that', 'he', "'s", 'going', 'to', 'make', 'a', 'splash', 'even', 'greater', 'than', 'Arnold', 'Schwarzenegger', ',', 'Jean-Claud', 'Van', 'Damme', 'or', 'Steven', 'Segal', '.'], 'label': 'positive'}


In order to map this data to features, we need to assign an index to each word an label. The function build vocab allows us to do this and provides useful options that we will need in future assignments.

In [95]:
TEXT.build_vocab(train)
LABEL.build_vocab(train)
print('len(TEXT.vocab)', len(TEXT.vocab))
print('len(LABEL.vocab)', len(LABEL.vocab))

len(TEXT.vocab) 16284
len(LABEL.vocab) 2


Finally we are ready to create batches of our training data that can be used for training and validating the model. This function produces 3 iterators that will let us go through the train, val and test data. 

In [0]:
train_iter, val_iter, test_iter = torchtext.data.BucketIterator.splits(
    (train, val, test), batch_size=10, device=torch.device("cuda"))

Let's look at a single batch from one of these iterators. The library automatically converts the underlying words into indices. It then produces tensors for batches of x and y. In this case it will consist of the number of words of the longest sentence (with padding) followed by the number of batches. We can use the vocabulary dictionary to convert back from these indices to words.

In [97]:
batch = next(iter(train_iter))
print("Size of text batch:", batch.text.shape)
# example = batch.text.get("batch", 1)
example = batch.text[:,1]
print("Second in batch", example)
print("Converted back to string:", " ".join([TEXT.vocab.itos[i] for i in example.tolist()]))

Size of text batch: torch.Size([24, 10])
Second in batch tensor([ 4087,    25,   279,    25,   170,  6278,    56,  3935,  3958,     3,
           13,  2386,    25,    20,     7, 13161,     2,     1,     1,     1,
            1,     1,     1,     1], device='cuda:0')
Converted back to string: Whenever you think you 've figured out Late Marriage , it throws you for a loop . <pad> <pad> <pad> <pad> <pad> <pad> <pad>


Similarly it produces a vector for each of the labels in the batch. 

In [98]:
print("Size of label batch:", batch.label.shape)
# example = batch.label.get("batch", 1)
example = batch.label[1]
print("Second in batch", example.item())
print("Converted back to string:", LABEL.vocab.itos[example.item()])

Size of label batch: torch.Size([10])
Second in batch 0
Converted back to string: positive


Finally the Vocab object can be used to map pretrained word vectors to the indices in the vocabulary. This will be very useful for part 3 and 4 of the problem.  Feel free to experiment with different word vectors and report their effect on performance.

In [99]:
# Build the vocabulary with word embeddings
url = 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.simple.vec'
TEXT.vocab.load_vectors(vectors=Vectors('wiki.simple.vec', url=url))

print("Word embeddings size ", TEXT.vocab.vectors.size())
print("Word embedding of 'follows', first 10 dim ", TEXT.vocab.vectors[TEXT.vocab.stoi['follows']][:10])

Word embeddings size  torch.Size([16284, 300])
Word embedding of 'follows', first 10 dim  tensor([ 0.3925, -0.4770,  0.1754, -0.0845,  0.1396,  0.3722, -0.0878, -0.2398,
         0.0367,  0.2800])


## Assignment

Now it is your turn to build the models described at the top of the assignment. 

Using the data given by this iterator, you should construct 4 different torch models that take in batch.text and produce a distribution over labels. 


In [0]:
def test_code(model):
    "All models should be able to be run with following command."
    upload = []
    # Update: for kaggle the bucket iterator needs to have batch_size 10
    test_iter = torchtext.data.BucketIterator(test, train=False, batch_size=10)
    for batch in test_iter:
        # Your prediction data here (don't cheat!)
        probs = model(batch.text)
        # here we assume that the name for dimension classes is `classes`
        _, argmax = probs.max('classes')
        upload += argmax.tolist()

    with open("predictions.txt", "w") as f:
        f.write("Id,Category\n")
        for i, u in enumerate(upload):
            f.write(str(i) + "," + str(u) + "\n")


#1. Naive Bayes unigram classifer 
Follows [Wang and Manning](https://www.aclweb.org/anthology/P12-2018.pdf).

In [0]:
"""
NaiveBayes
Author: Emily Tseng (et397)

--

This implements Multinomial Naive Bayes per Wang & Manning 2012 (https://www.aclweb.org/anthology/P12-2018.pdf), section 2.1.
    y = wx + b
    w = r = log( (p/l1_norm(p)) / (q/l1_norm(q)) )
    p = alpha + sum of all positive training cases
    q = alpha + sum of all negative training cases
    alpha = smoothing parameter (e.g. 1.0)
    b = log(N+/N−), where:
        N+ = the number of positive cases in the training data
        N- = the number of negative cases in the training data
    Note this formulation featurizes using a binarized indicator:
        x(k) = f^(k) = 1 if f(k) > 0, 0 otherwise
        f(k) = feature count vector; each index is the number of occurrences of that feature in the given training case. e.g.
            "The cat in the hat."
            vocab = [the, cat, in, hat, banana]
            f = [ 2, 1, 1, 1, 0 ]
            f^ = [ 1, 1, 1, 1, 0 ]
"""

from tqdm import tqdm
import torch
import numpy as np


class NaiveBayes:
    def __init__(self, alpha, TEXT, LABEL):
        """
            Initializes with a given smoothing parameter
        """
        self.alpha = alpha
        self.vocab = TEXT.vocab
        self.labels = LABEL.vocab
        # Store which label is which
        self.label_map = {
            self.labels.itos[0]: 0,
            self.labels.itos[1]: 1
        }
        # Initialize with zero seen pos or neg samples
        self.p = self.alpha + np.zeros((len(self.vocab),))
        self.q = self.alpha + np.zeros((len(self.vocab),))
        # Initialize counts at 1 here also to prevent div by 0 error
        self.nplus = 1
        self.nminus = 1
        self.update()
        print("Initialized NaiveBayes model with vocab size {}, label size {}".format(len(self.vocab), len(self.labels)))
        print('\tself.p: {}\n\tself.q: {}\n\tself.r: {}\n\tself.b: {}'.format(self.p, self.q, self.r, self.b))
    
    def update(self):
        self.r = np.log((self.p/np.linalg.norm(self.p, ord=1)) / (self.q/np.linalg.norm(self.q, ord=1)))
        self.b = np.log(self.nplus / self.nminus)

    def featurize(self, x):
        """
            Input: <vec> x
            Output: <vec> fx, featurized using the vocabulary
        """
        output = np.zeros((len(self.vocab),))
        for word_idx in x:
            output[word_idx] = 1
        return output

    def train(self, train_iter, val_iter):
        """
            "Trains" the model based on the provided train and val sets.
        """
        for batch_idx, train_batch in enumerate(train_iter):
            # Update the count vectors...
            for i in range(len(train_batch)):
                x = train_batch.text[:, i]
                fx = self.featurize(x)
                y = train_batch.label[i]
                if self.labels.itos[y.item()] == 'positive':
                  self.p += fx
                  self.nplus += 1
                else:
                  self.q += fx
                  self.nminus += 1
            # And recalculate r & b
            self.update()
            # Let's hope to see improvement at every 10 batches
            if batch_idx % 50 == 0:
              batch_acc = self.evaluate(val_iter)
              print('val acc after training batch {}: {}'.format(batch_idx, batch_acc))
              # print('\tself.p: {}\n\tself.q: {}\n\tself.r: {}\n\tself.b: {}'.format(self.p, self.q, self.r, self.b))


    def evaluate(self, val_iter):
        """
            Evaluates against a batch of given data.
        """
        correct = 0
        total = 0

        for batch_idx, val_batch in enumerate(val_iter):
            for i in range(len(val_batch)):
                x = val_batch.text[:, i]
                y = val_batch.label[i]
                fx = self.featurize(x)
                yhat = self.predict(fx)
                if y == yhat:
                    correct += 1
                total += 1
        
        return float(correct / total)

    def predict(self, x):
        val = np.matmul(self.r.T, x) + self.b
        if val >= 0:
            return self.label_map['positive']
        else:
            return self.label_map['negative']

In [12]:
alpha = 1
model = NaiveBayes(alpha, TEXT, LABEL)
model.train(train_iter, val_iter)
# Evaluate on training set first
train_acc = model.evaluate(train_iter)
print('NaiveBayes train_acc: ', train_acc)

Initialized NaiveBayes model with vocab size 16284, label size 2
	self.p: [1. 1. 1. ... 1. 1. 1.]
	self.q: [1. 1. 1. ... 1. 1. 1.]
	self.r: [0. 0. 0. ... 0. 0. 0.]
	self.b: 0.0
val acc after training batch 0: 0.5378440366972477
val acc after training batch 50: 0.676605504587156
val acc after training batch 100: 0.7087155963302753
val acc after training batch 150: 0.7545871559633027
val acc after training batch 200: 0.7637614678899083
val acc after training batch 250: 0.783256880733945
val acc after training batch 300: 0.7809633027522935
val acc after training batch 350: 0.7935779816513762
val acc after training batch 400: 0.7947247706422018
val acc after training batch 450: 0.805045871559633
val acc after training batch 500: 0.7970183486238532
val acc after training batch 550: 0.7993119266055045
val acc after training batch 600: 0.7993119266055045
val acc after training batch 650: 0.8004587155963303
NaiveBayes train_acc:  0.9486994219653179


In [13]:
# Then run the test set
test_acc = model.evaluate(test_iter)
print('NaiveBayes test_acc: ', test_acc)

NaiveBayes test_acc:  0.8215266337177375


# 2) Logistic Regression

A logistic regression model over word types (you can implement this as $y = \sigma(\sum_i W x_i + b)$) 


In [0]:
"""
LogisticRegression
Author: Emily Tseng et397

--

This implements logistic regression.

"""

import torch
import torch.nn as nn
from torch.autograd import Variable

from tqdm import tqdm

class LogisticRegression(nn.Module):
  def __init__(self, input_size, output_size, batch_size):
    super(LogisticRegression, self).__init__()
    self.batch_size = batch_size
    self.vocab_size = input_size
    self.linear = torch.nn.Linear(input_size, output_size, bias=True)
  
  def forward(self, x):
    x.to(device)
    output = self.linear(x)
    output = torch.sigmoid(output)
    return output
  
  def featurize(self, x):
    """
      Input: <vec> x
      Output: <vec> fx, featurized using the vocabulary
    """
    output = np.zeros((self.vocab_size,))
    for word_idx in x:
        output[word_idx] = 1
    return output
  
  def featurize_batch(self, batch, train=True):
    """
      Takes in a batch from the training data and featurizes it as binarized counts.
    """
    output = np.zeros((self.batch_size, self.vocab_size)) #e.g. 10 x 6000
    for i in range(len(batch)):
      x = batch.text[:,i]
      fx = self.featurize(x)
      output[i] = fx
    if train:
      return Variable(torch.FloatTensor(output)).cuda(), batch.label
    else:
      return Variable(torch.FloatTensor(output)).cuda()
  
  def train(self, train_iter, val_iter, epochs, lr):
    """ Trains the model over the given parameters. Returns vector of epoch losses for plotting.
    """
    # Use SGD and CrossEnt
    crossent = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(self.parameters(), lr=lr)
    losses = []
    for epoch in range(epochs):
      epoch_loss = 0.
      for batch_idx, batch in enumerate(train_iter):
        optimizer.zero_grad()
        batch_fx, batch_y = self.featurize_batch(batch, True)
        output = self.forward(batch_fx)
        loss = crossent(output, batch_y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
      losses.append(epoch_loss / len(train_iter))
      val_acc = self.evaluate(val_iter)
      print('val acc after training epoch {}: {}'.format(epoch, val_acc))
      print('epoch loss: {}'.format(epoch_loss))
    return losses
  
  def evaluate(self, val_iter):
    """ Evaluates against a batch of given data.
    """
    correct = 0
    total = 0

    for batch_idx, batch in enumerate(val_iter):
      for i in range(len(batch)):
        x = batch.text[:, i]
        y = batch.label[i]
        fx = Variable(torch.FloatTensor(self.featurize(x))).cuda()
        output = self.forward(fx)
        yhat = output.max(0)[1]
        if y == yhat:
          correct += 1
        total += 1
      return float(correct / total)


In [17]:
lr = 1e-1
epochs = 10
batch_size = 10

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

logreg = LogisticRegression(len(TEXT.vocab), len(LABEL.vocab), batch_size)
logreg.to(device)

logreg.train(train_iter, val_iter, epochs, lr)
test_acc = logreg.evaluate(test_iter)
print('final test_acc for LogReg: {}'.format(test_acc))



val acc after training epoch 0: 0.6
epoch loss: 471.56829857826233
val acc after training epoch 1: 0.7
epoch loss: 458.23772871494293
val acc after training epoch 2: 0.8
epoch loss: 449.29273730516434
val acc after training epoch 3: 0.8
epoch loss: 442.3376330137253
val acc after training epoch 4: 0.8
epoch loss: 436.597915828228
val acc after training epoch 5: 0.8
epoch loss: 431.50663301348686
val acc after training epoch 6: 0.8
epoch loss: 427.09725347161293
val acc after training epoch 7: 0.8
epoch loss: 423.08561247587204
val acc after training epoch 8: 0.8
epoch loss: 419.5032809972763
val acc after training epoch 9: 0.8
epoch loss: 416.00726372003555
final test_acc for LogReg: 0.8


# 3) CBOW

A continuous bag-of-word neural network with embeddings (similar to CBOW in Mikolov et al https://arxiv.org/pdf/1301.3781.pdf ).

In [0]:
"""
CBOW NN
Author: Emily Tseng et397

--



"""

import torch
import torch.nn as nn
from torch.autograd import Variable

from tqdm import tqdm

class CBOW(nn.Module):
  def __init__(self, embedding_size, output_size, batch_size):
    super(CBOW, self).__init__()
    # Shared projection layer
    self.embeddings = nn.Embedding(embedding_size[0], embedding_size[1])
    # And includes 1 linear layer
    self.linear = torch.nn.Linear(embedding_size[1], output_size, bias=True)
  
  def forward(self, x):
    x.to(device)
    embedded_x = self.embeddings(x)
    # CBOW uses a vector of averaged word embeddings  
    avg_embed = embedded_x.mean(0)
    output = self.linear(avg_embed)
    logits = nn.functional.softmax(output, dim=1)
    return logits
  
  def predict(self, x):
    logits = self.forward(x)
    return logits.max(1)[1]
  
  def train_model(self, train_iter, val_iter, epochs, lr):
    """ Trains the model over the given parameters. Returns vector of epoch losses for plotting.
    """
    self.train()
    # Use SGD and CrossEnt
    crossent = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(self.parameters(), lr=lr)
    losses = []
    for epoch in range(epochs):
      epoch_loss = 0.
      for batch_idx, batch in enumerate(train_iter):
        optimizer.zero_grad()
        yhat = self.forward(batch.text)

        loss = crossent(yhat, batch.label)
        loss.backward()

        optimizer.step()
        epoch_loss += loss.item()
      losses.append(epoch_loss / len(train_iter))
      print('epoch loss: {}'.format(epoch_loss))
      val_acc = self.evaluate(val_iter)
      print('val acc after training epoch {}: {}'.format(epoch, val_acc))
    return losses
  
  def evaluate(self, val_iter):
    """ Evaluates against a batch of given data.
    """
    with torch.no_grad():
      self.eval()
      correct = 0
      total = 0

      for batch_idx, batch in enumerate(val_iter):
        yhats = self.predict(batch.text)
        for i, yhat in enumerate(yhats):
          if batch.label[i] == yhat:
            correct += 1
          total += 1
      return float(correct / total)


In [25]:
lr = 1e-1
epochs = 20
batch_size = 10

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# print("Word embeddings size ", TEXT.vocab.vectors.size())

cbow = CBOW(TEXT.vocab.vectors.size(), len(LABEL.vocab), batch_size)
cbow.to(device)

cbow.train_model(train_iter, val_iter, epochs, lr)
test_acc = cbow.evaluate(test_iter)
print('\nfinal test_acc for CBOW: {}'.format(test_acc))



epoch loss: 488.2887523472309
val acc after training epoch 0: 0.6227064220183486
epoch loss: 477.29427444934845
val acc after training epoch 1: 0.6238532110091743
epoch loss: 466.7005034983158
val acc after training epoch 2: 0.6399082568807339
epoch loss: 463.7628065645695
val acc after training epoch 3: 0.6307339449541285
epoch loss: 463.3408650457859
val acc after training epoch 4: 0.6318807339449541
epoch loss: 458.61534252762794
val acc after training epoch 5: 0.643348623853211
epoch loss: 452.72417974472046
val acc after training epoch 6: 0.6376146788990825
epoch loss: 454.3987463116646
val acc after training epoch 7: 0.6376146788990825
epoch loss: 449.31176966428757
val acc after training epoch 8: 0.6467889908256881
epoch loss: 445.5145474374294
val acc after training epoch 9: 0.6444954128440367
epoch loss: 445.24565187096596
val acc after training epoch 10: 0.6444954128440367
epoch loss: 443.7803664803505
val acc after training epoch 11: 0.6444954128440367
epoch loss: 441.258554

# 4) CNN

A simple convolutional neural network (any variant of CNN as described in Kim http://aclweb.org/anthology/D/D14/D14-1181.pdf ).

In [80]:
def sanitize_batch(batch, min_length):
  """
    If a batch contains sentences less than a defined minimum length, pad it.
  """
  if batch.text.size(0) < min_length:
    print('\tpadding batch')
    nparr = batch.text.data.cpu().numpy()
    nparr = nparr.T
    output = np.zeros((nparr.shape[0], min_length))
    for i, sentence in enumerate(nparr):
      output[i] = np.concatenate((sentence, [pad_idx] * (min_length - nparr.shape[1])))
    return torch.LongTensor(output.T).to(device)
  else:
    return batch.text

sanitized_batch = sanitize_batch(ex, 50)

	padding batch


In [0]:
"""
CNN
Author: Emily Tseng et397

--

"""

import torch
import torch.nn as nn
from torch.autograd import Variable

from tqdm import tqdm

class CNN(nn.Module):
  def __init__(self, embedding_size, output_size, pad_idx):
    super(CNN, self).__init__()
    # Store stuff
    vocab_size = embedding_size[0]
    embed_dim = embedding_size[1]
    self.pad_idx = pad_idx

    # Shared embedding layer, |V| x embed_dim 
    self.embeddings = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)

    # Convolutional layers. Per Kim:
    #   window sizes h=3,4,5 
    #   "100 feature maps" = output size is 100
    self.convs = nn.ModuleList([
                                nn.Conv2d(1, 100, kernel_size=(window_size, embed_dim))
                                for window_size in [3,4,5]
    ])

    # Dropout layer with p=0.5 per Kim
    self.dropout = nn.Dropout(p=0.5)

    # And includes 1 linear layer. 3 filters of size=100 each -> 300
    self.linear = torch.nn.Linear(300, output_size, bias=True)
  
  def forward(self, x, train):
    # x: (max sentence length, batch size)
    x = x.T
    # x: (batch size, max sentence length)
    x.to(device)
    # print('x size: ', x.size())

    embedded_x = self.embeddings(x).unsqueeze(1)
    # embedded_x: (batch size, 1, max sentence length, embed_dim)
    # print('embedded_x size: ', embedded_x.size())

    # Push the sentence through each of the 3 convs + activate via ReLU per Kim
    conved_x = [nn.functional.relu(conv(embedded_x).squeeze(3)) for conv in self.convs]
    # conv_3: (batch size, 100, max sentence length - 3 + 1) etc
    # print('conved_x size: ', conved_x[0].size())

    # Apply max-pooling over each of the layers
    pooled_x = [nn.functional.max_pool1d(conv, conv.size(2)).squeeze(2) for conv in conved_x]
    # pooled_3: (batch size, 100) etc
    # print('pooled_x size: ', pooled_x[0].size())

    # Then concat.
    # output: (batch size, 100 * 3)
    output = torch.cat(pooled_x, dim=1)

    # If training, apply dropout at the penultimate layer for regularization per Kim
    if train:
      output = self.dropout(output)

    # Then apply ultimate linear + softmax
    output = self.linear(output)
    return nn.functional.log_softmax(output, dim=1)
  
  def predict(self, x):
    logits = self.forward(x, train=False)
    print('logits: ', logits)
    return logits.max(1)[1]
  
  def train_model(self, train_iter, val_iter, epochs, lr):
    """ Trains the model over the given parameters. Returns vector of epoch losses for plotting.
    """
    self.train()
    # Use Adadelta (per Kim) and CrossEnt
    lossfunc = torch.nn.CrossEntropyLoss()
    lossfunc.to(device)
    optimizer = torch.optim.Adadelta(self.parameters(), lr=lr)
    losses = []
    for epoch in range(epochs):
      epoch_loss = 0.
      for batch_idx, batch in enumerate(train_iter):
        optimizer.zero_grad()

        # Sanitize so it's at least 5 long
        sanitized_batch = sanitize_batch(batch, 5)
        yhat = self.forward(sanitized_batch, train=True)

        loss = lossfunc(yhat, batch.label)
        loss.backward()

        optimizer.step()
        epoch_loss += loss.item()
      epoch_loss /= len(train_iter)
      losses.append(epoch_loss)
      print('epoch loss: {}'.format(epoch_loss))
      val_acc = self.evaluate(val_iter)
      print('val acc after training epoch {}: {}'.format(epoch, val_acc))
    return losses
  
  def evaluate(self, val_iter):
    """ Evaluates against a batch of given data.
    """
    with torch.no_grad():
      self.eval()
      correct = 0
      total = 0

      for batch_idx, batch in enumerate(val_iter):
        sanitized_batch = sanitize_batch(batch, 5)
        yhats = self.predict(sanitized_batch)
        print('yhats: ', yhats)
        print('batch.label: ', batch.label)
        for i, yhat in enumerate(yhats):
          if batch.label[i] == yhat:
            correct += 1
          total += 1
      return float(correct / total)


In [105]:
lr = 1e-1
epochs = 10

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

pad_idx=TEXT.vocab.stoi[TEXT.pad_token]

cnn = CNN(TEXT.vocab.vectors.size(), len(LABEL.vocab), pad_idx)
cnn.to(device)

cnn.train_model(train_iter, val_iter, epochs, lr)
test_acc = cnn.evaluate(test_iter)
print('\nfinal test_acc for CNN: {}'.format(test_acc))



epoch loss: 0.7201926793063307
logits:  tensor([[-0.4057, -1.0982],
        [-0.6080, -0.7862],
        [-0.6707, -0.7161],
        [-0.8268, -0.5753],
        [-0.7849, -0.6091],
        [-0.7189, -0.6680],
        [-0.5457, -0.8662],
        [-1.1665, -0.3732],
        [-0.6539, -0.7340],
        [-0.7695, -0.6222]], device='cuda:0')
yhats:  tensor([0, 0, 0, 1, 1, 1, 0, 1, 0, 1], device='cuda:0')
batch.label:  tensor([0, 0, 1, 1, 1, 0, 1, 1, 1, 0], device='cuda:0')
logits:  tensor([[-0.4600, -0.9977],
        [-0.3940, -1.1218],
        [-0.7760, -0.6166],
        [-0.7814, -0.6120],
        [-0.4850, -0.9564],
        [-0.9186, -0.5093],
        [-0.5029, -0.9283],
        [-0.6246, -0.7668],
        [-0.3161, -1.3056],
        [-0.5271, -0.8923]], device='cuda:0')
yhats:  tensor([0, 0, 1, 1, 0, 1, 0, 0, 0, 0], device='cuda:0')
batch.label:  tensor([0, 1, 1, 1, 0, 1, 0, 1, 0, 0], device='cuda:0')
logits:  tensor([[-0.9153, -0.5115],
        [-0.8272, -0.5750],
        [-0.6056, -0.7

In addition, you should put up a (short) write-up following the [template](https://github.com/harvardnlp/cs6741/tree/master/nlp-template) provided in the repository. 