# HW 1 Classification

In this homework you will be building several varieties of text classifiers.

## Goal

We ask that you construct the following models in PyTorch:

1. A naive Bayes unigram classifer (follow Wang and Manning http://www.aclweb.org/anthology/P/P12/P12-2.pdf#page=118: you should only implement Naive Bayes, not the combined classifer with SVM).
2. A logistic regression model over word types (you can implement this as $y = \sigma(\sum_i W x_i + b)$) 
3. A continuous bag-of-word neural network with embeddings (similar to CBOW in Mikolov et al https://arxiv.org/pdf/1301.3781.pdf).
4. A simple convolutional neural network (any variant of CNN as described in Kim http://aclweb.org/anthology/D/D14/D14-1181.pdf).
5. Your own extensions to these models...

Consult the papers provided for hyperparameters. 


## Setup

This notebook provides a working definition of the setup of the problem itself. You may construct your models inline or use an external setup (preferred) to build your system.

In [None]:
%pip install torchtext

In [None]:
# Text text processing library and methods for pretrained word embeddings
import torch 
import torchtext
from torchtext.vocab import Vectors, GloVe
import torch.nn as nn 

The dataset we will use of this problem is known as the Stanford Sentiment Treebank (https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf). It is a variant of a standard sentiment classification task. For simplicity, we will use the most basic form. Classifying a sentence as positive or negative in sentiment. 

To start, `torchtext` requires that we define a mapping from the raw text data to featurized indices. These fields make it easy to map back and forth between readable data and math, which helps for debugging.

In [None]:
# Our input $x$
TEXT = torchtext.data.Field(fix_length=56)
#TEXT = torchtext.data.Field() 
# Our labels $y$
LABEL = torchtext.data.Field(sequential=False)

Next we input our data. Here we will use the standard SST train split, and tell it the fields.

In [None]:
train, val, test = torchtext.datasets.SST.splits(
    TEXT, LABEL,
    filter_pred=lambda ex: ex.label != 'neutral')

Let's look at this data. It's still in its original form, we can see that each example consists of a label and the original words.

In [None]:
print('len(train)', len(train))
print('vars(train[0])', vars(train[0]))

In order to map this data to features, we need to assign an index to each word an label. The function build vocab allows us to do this and provides useful options that we will need in future assignments.

In [None]:
TEXT.build_vocab(train)
LABEL.build_vocab(train)
#this is just the set of stuff 
print('len(TEXT.vocab)', len(TEXT.vocab))
print('len(LABEL.vocab)', len(LABEL.vocab))

Finally we are ready to create batches of our training data that can be used for training and validating the model. This function produces 3 iterators that will let us go through the train, val and test data. 

In [None]:
print(len(val))
print(len(test))
print(len(train))
train_iter, val_iter, test_iter = torchtext.data.BucketIterator.splits(
    (train, val, test), batch_size=10, device=-1, repeat=False)

Let's look at a single batch from one of these iterators. The library automatically converts the underlying words into indices. It then produces tensors for batches of x and y. In this case it will consist of the number of words of the longest sentence (with padding) followed by the number of batches. We can use the vocabulary dictionary to convert back from these indices to words.

In [None]:
batch = next(iter(train_iter))
print("Size of text batch [max sent length, batch size]", batch.text.size())
print(batch.text[:, 0].data)
print("Converted back to string: ", " ".join([TEXT.vocab.itos[i] for i in batch.text[:, 0].data]))

In [None]:
count = 0 
for batch in train_iter: 
    count += 1 
print(count)

Similarly it produces a vector for each of the labels in the batch. 

In [None]:
print("Size of label batch [batch size]", batch.label.size())
print("Second in batch", batch.label[0])
print("Converted back to string: ", LABEL.vocab.itos[batch.label.data[0]])

Finally the Vocab object can be used to map pretrained word vectors to the indices in the vocabulary. This will be very useful for part 3 and 4 of the problem.  

In [None]:
# Build the vocabulary with word embeddings
url = 'https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.simple.vec'
TEXT.vocab.load_vectors(vectors=Vectors('wiki.simple.vec', url=url))

print("Word embeddings size ", TEXT.vocab.vectors.size())
print("Word embedding of 'follows', first 10 dim ", TEXT.vocab.vectors[TEXT.vocab.stoi['follows']][:10])

## Assignment

Now it is your turn to build the models described at the top of the assignment. 

Using the data given by this iterator, you should construct 4 different torch models that take in batch.text and produce a distribution over labels. 

When a model is trained, use the following test function to produce predictions, and then upload to the kaggle competition:  https://www.kaggle.com/c/harvard-cs281-hw1

## Section 0: Setup

In [48]:
import torch 
import torchtext
from torchtext.vocab import Vectors, GloVe
import torch.nn as nn 

# Our input $x$
#TEXT = torchtext.data.Field(fix_length=56)
TEXT = torchtext.data.Field() 
# Our labels $y$
LABEL = torchtext.data.Field(sequential=False)

train, val, test = torchtext.datasets.SST.splits(
    TEXT, LABEL,
    filter_pred=lambda ex: ex.label != 'neutral')

TEXT.build_vocab(train)
LABEL.build_vocab(train)
#this is just the set of stuff 
print('len(TEXT.vocab)', len(TEXT.vocab))
print('len(LABEL.vocab)', len(LABEL.vocab))

train_iter, val_iter, test_iter = torchtext.data.BucketIterator.splits(
    (train, val, test), batch_size=10, device=-1, repeat=False)

# Build the vocabulary with word embeddings
url = 'https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.simple.vec'
TEXT.vocab.load_vectors(vectors=Vectors('wiki.simple.vec', url=url))

#Glove embeddings 
TEXT.vocab.load_vectors(vectors= GloVe(name="6B", dim="300"))

len(TEXT.vocab) 16284
len(LABEL.vocab) 3


## Section 1: Naive Bayes

In [49]:
def batch_vectorize_init(word_ind, labels, vocab_dim):
    num_pos = (labels.data == 2).sum()
    num_neg = (labels.data == 1).sum()
    pos_out = torch.zeros(vocab_dim)
    neg_out = torch.zeros(vocab_dim)
    for j in range(word_ind.size(1)):
        curr_vec = torch.zeros(vocab_dim)
        for i in range(word_ind.size(0)):
                curr_vec[int(word_ind[i, j])] = 1
        #neg class 
        if labels.data[j] == 1: 
            neg_out += curr_vec
        elif labels.data[j] == 2: 
            pos_out += curr_vec
        else: 
            print("class not found")
    
    return pos_out, neg_out, num_pos, num_neg

def batch_vectorize(word_ind, vocab_size):
    out = torch.zeros(word_ind.size(1), vocab_size)
    for j in range(word_ind.size(1)): 
        for i in range(word_ind.size(0)):
            out[j, int(word_ind[i, j])] = 1 
    return out

def test_naive(data, vocab_size):
    correct = 0.
    num_examples = 0.
    nll = 0.
    for batch in data:
        if len(batch.label) == 10: 
            text, label = batch.text, batch.label
            x = batch_vectorize(batch.text, vocab_size)
            y_pred = torch.mm(x, R.t()) + b
            y_pred_max, y_pred_argmax = torch.max(y_pred, 1) #prediction is the argmax
            correct += (y_pred_argmax == label.data - 1).sum() 
            num_examples += text.size(1) 
    return correct/num_examples
    
n_pos = 0 
n_neg = 0 
alpha = 0.5 
p = torch.zeros(len(TEXT.vocab))
q = torch.zeros(len(TEXT.vocab))
for batch in train_iter:
    text, label = batch.text, batch.label
    pos_vec, neg_vec, curr_pos, curr_neg = batch_vectorize_init(batch.text, batch.label, len(TEXT.vocab))
    n_pos += curr_pos
    n_neg += curr_neg
    p += pos_vec 
    q += neg_vec 
n_pos_vec = torch.log(torch.Tensor([n_pos/(n_pos+n_neg)])).repeat(10)
n_neg_vec = torch.log(torch.Tensor([n_neg/(n_pos+n_neg)])).repeat(10)
b = torch.cat((n_neg_vec, n_pos_vec)).view(2, -1)
p += alpha 
q += alpha 
R = torch.log(torch.cat((q/torch.abs(q).sum(), p/torch.abs(p).sum())).view(2, -1))

print("train: ", test_naive(train_iter, len(TEXT.vocab)))
print("valid: ", test_naive(val_iter, len(TEXT.vocab)))
print("test: ", test_naive(test_iter, len(TEXT.vocab)))

  return self.add(other)


train:  0.9612716763005781
valid:  0.7931034482758621
test:  0.8203296703296703


## Part 2: Linear Regression

In [50]:
def test_model(model, data):
    correct = 0.
    num_examples = 0.
    nll = 0.
    for batch in data:
        text = batch.text
        label = batch.label
        y_pred = model(text)
        nll_batch = criterion(y_pred, label-1)
        nll += nll_batch.data[0] * text.size(0) #by default NLL is averaged over each batch
        y_pred_max, y_pred_argmax = torch.max(y_pred, 1) #prediction is the argmax
        correct += (y_pred_argmax.data == label.data-1).sum() 
        num_examples += text.size(1) 
    return nll/num_examples, correct/num_examples

class LR_Unigram(nn.Module): 
    def __init__(self, vocab, output_dim=2):
        super(LR_Unigram, self).__init__()
        self.embed = nn.Embedding(len(vocab), len(vocab))
        self.embed.weight.data = torch.eye(len(vocab))
        self.embed.weight.requires_grad = False
        self.input_dim = len(vocab)
        self.linear = nn.Linear(len(vocab), output_dim, bias=True)
        self.sigmoid = nn.Sigmoid() 
        self.logsoftmax = nn.LogSoftmax()
    
    def forward(self, x): 
        x_embed = self.embed(x.t())
        x_flatten = torch.sum(x_embed, dim=1)
        out = self.linear(x_flatten)
        out = self.sigmoid(out)
        return self.logsoftmax(out)

lr_model = LR_Unigram(TEXT.vocab)
print(lr_model)
criterion = nn.NLLLoss()
parameters = filter(lambda p: p.requires_grad, lr_model.parameters())
optim = torch.optim.SGD(parameters, lr = 0.5)
num_epochs = 20
for e in range(num_epochs):
    for batch in train_iter:
        optim.zero_grad()
        #text = torch.autograd.Variable(batch_index_to_vec(batch.text))
        text = batch.text 
        label = batch.label
        y_pred = lr_model(text)
        nll_batch = criterion(y_pred, label - 1)    
        nll_batch.backward()
        optim.step()
    nll_train, accuracy_train = test_model(lr_model, train_iter)
    nll_val, accuracy_val = test_model(lr_model, val_iter)
    print('Training performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_train, accuracy_train))
    print('Validation performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_val, accuracy_val))

LR_Unigram(
  (embed): Embedding(16284, 16284)
  (linear): Linear(in_features=16284, out_features=2)
  (sigmoid): Sigmoid()
  (logsoftmax): LogSoftmax()
)


  """


Training performance after epoch 1: NLL: 1.2077, Accuracy: 0.6874
Validation performance after epoch 1: NLL: 1.2612, Accuracy: 0.6972
Training performance after epoch 2: NLL: 1.1492, Accuracy: 0.7475
Validation performance after epoch 2: NLL: 1.2271, Accuracy: 0.7282
Training performance after epoch 3: NLL: 1.1235, Accuracy: 0.7754
Validation performance after epoch 3: NLL: 1.1963, Accuracy: 0.7626
Training performance after epoch 4: NLL: 1.0818, Accuracy: 0.7880
Validation performance after epoch 4: NLL: 1.1836, Accuracy: 0.7259
Training performance after epoch 5: NLL: 1.0539, Accuracy: 0.8143
Validation performance after epoch 5: NLL: 1.1667, Accuracy: 0.7534
Training performance after epoch 6: NLL: 1.0334, Accuracy: 0.8214
Validation performance after epoch 6: NLL: 1.1602, Accuracy: 0.7511
Training performance after epoch 7: NLL: 1.0144, Accuracy: 0.8448
Validation performance after epoch 7: NLL: 1.1618, Accuracy: 0.7546
Training performance after epoch 8: NLL: 1.0005, Accuracy: 0.8

In [52]:
nll_test, accuracy_test = test_model(lr_model, test_iter)
print('Test performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_test, accuracy_test))



Test performance after epoch 20: NLL: 1.0818, Accuracy: 0.7908


  """


## Part 3: CBOW 


In [59]:
TEXT.vocab.load_vectors(vectors=GloVe(name="twitter.27B", dim="200"))

.vector_cache/glove.twitter.27B.zip: 1.52GB [02:15, 11.2MB/s]                               
100%|██████████| 1193514/1193514 [01:36<00:00, 12338.31it/s]


In [62]:
def test_cbow(model, data):
    correct = 0.
    num_examples = 0.
    nll = 0.
    for batch in data:
        text = batch.text
        label = batch.label
        y_pred = model(text)
        nll_batch = criterion(y_pred, label-1)
        nll += nll_batch.data[0] * text.size(0) #by default NLL is averaged over each batch
        y_pred_max, y_pred_argmax = torch.max(y_pred, 1) #prediction is the argmax
        correct += (y_pred_argmax.data == label.data-1).sum() 
        num_examples += text.size(1) 
    return nll/num_examples, correct/num_examples

class CBOW(nn.Module):

    def __init__(self, vocab, embedding_dim, output_dim=2):
        super(CBOW, self).__init__()
        #linear classifier 
        self.embed = nn.Embedding(len(vocab), embedding_dim)
        self.embed.weight.data.copy_(vocab.vectors)
        self.embed.weight.requires_grad = False
        self.linear = nn.Linear(embedding_dim, output_dim, bias=True)
        #activation function 
        self.sigmoid = nn.Sigmoid() 
        self.logsoftmax = nn.LogSoftmax()

    def forward(self, x):
        x_embed = self.embed(x.t())
        x_flatten = torch.sum(x_embed, dim=1)
        out = self.linear(x_flatten)
        out = self.sigmoid(out)
        return self.logsoftmax(out)
    
cbow_model = CBOW(TEXT.vocab, embedding_dim=200)
print(cbow_model)
criterion = nn.NLLLoss()
parameters = filter(lambda p: p.requires_grad, cbow_model.parameters())
optim = torch.optim.SGD(parameters, lr = 0.5)
num_epochs = 20
for e in range(num_epochs):
    for batch in train_iter:
        optim.zero_grad()
        #text = torch.autograd.Variable(batch_index_to_vec(batch.text))
        text = batch.text 
        label = batch.label
        y_pred = cbow_model(text)
        nll_batch = criterion(y_pred, label - 1)    
        nll_batch.backward()
        optim.step()
    nll_train, accuracy_train = test_cbow(cbow_model, train_iter)
    nll_val, accuracy_val = test_cbow(cbow_model, val_iter)
    print('Training performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_train, accuracy_train))
    print('Validation performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_val, accuracy_val))

torch.save(cbow_model.state_dict(), 'cbow_model')

CBOW(
  (embed): Embedding(16284, 200)
  (linear): Linear(in_features=200, out_features=2)
  (sigmoid): Sigmoid()
  (logsoftmax): LogSoftmax()
)


  """


Training performance after epoch 1: NLL: 1.6389, Accuracy: 0.4783
Validation performance after epoch 1: NLL: 1.6947, Accuracy: 0.4908
Training performance after epoch 2: NLL: 1.6378, Accuracy: 0.4783
Validation performance after epoch 2: NLL: 1.6947, Accuracy: 0.4908
Training performance after epoch 3: NLL: 1.3542, Accuracy: 0.5160
Validation performance after epoch 3: NLL: 1.4012, Accuracy: 0.5195
Training performance after epoch 4: NLL: 1.3546, Accuracy: 0.5267
Validation performance after epoch 4: NLL: 1.4014, Accuracy: 0.5264
Training performance after epoch 5: NLL: 1.3540, Accuracy: 0.5269
Validation performance after epoch 5: NLL: 1.4014, Accuracy: 0.5264
Training performance after epoch 6: NLL: 1.6373, Accuracy: 0.4783
Validation performance after epoch 6: NLL: 1.6947, Accuracy: 0.4908
Training performance after epoch 7: NLL: 1.3535, Accuracy: 0.5299
Validation performance after epoch 7: NLL: 1.4014, Accuracy: 0.5321
Training performance after epoch 8: NLL: 1.6370, Accuracy: 0.4

In [63]:
nll_test, accuracy_test = test_cbow(cbow_model, test_iter)
print('Test performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_test, accuracy_test))

Test performance after epoch 40: NLL: 1.3336, Accuracy: 0.6019


  """


## Part 4 CNN 

In [76]:
import torch 
import torchtext
from torchtext.vocab import Vectors, GloVe
import torch.nn as nn 
import torch.nn.functional as F
from random import shuffle

# Our input $x$
seq_length = 56
TEXT = torchtext.data.Field(fix_length=seq_length)

# Our labels $y$
LABEL = torchtext.data.Field(sequential=False)

train, val, test = torchtext.datasets.SST.splits(
    TEXT, LABEL,
    filter_pred=lambda ex: ex.label != 'neutral')

train_2, val, test = torchtext.datasets.SST.splits(
    TEXT, LABEL,
    filter_pred=lambda ex: ex.label != 'neutral')

TEXT.build_vocab(train)
LABEL.build_vocab(train)
#this is just the set of stuff 
print('len(TEXT.vocab)', len(TEXT.vocab))
print('len(LABEL.vocab)', len(LABEL.vocab))

for i in range(len(train_2)): 
    shuffle(train_2[i].text) 
    
train_iter, val_iter, test_iter = torchtext.data.BucketIterator.splits(
    (train + train_2, val, test), batch_size=10, device=-1, repeat=False, shuffle=True)

#Glove embeddings 
TEXT.vocab.load_vectors(vectors= GloVe(name="6B", dim="300"))

len(TEXT.vocab) 16284
len(LABEL.vocab) 3


len(train) <class 'torchtext.datasets.sst.SST'>
vars(train[0]) ['new', 'that', 'Steven', 'he', 'splash', 'Damme', 'make', 'a', 'greater', 'the', 'destined', 'Century', 'or', 'Van', 'Schwarzenegger', 'and', 'going', 'Conan', "''", 'Rock', 'Segal', '``', 'than', 'Jean-Claud', "'s", 'even', 'to', '.', '21st', ',', 'The', 'be', 'Arnold', 'to', 'is', "'s"]
vars(train[0]) None
vars(train[0]) ['to', 'or', 'that', 'Segal', 'Van', 'even', 'be', "'s", ',', '``', 'Century', 'the', 'Steven', 'Conan', 'Schwarzenegger', 'destined', "'s", 'make', 'is', 'new', 'a', 'to', 'Rock', 'he', 'and', 'The', 'Damme', '.', 'Jean-Claud', 'splash', '21st', 'Arnold', 'greater', 'going', 'than', "''"]


In [None]:
def test_cnn(model, data):
    correct = 0.
    num_examples = 0.
    nll = 0.
    for batch in data:
        text = batch.text
        label = batch.label
        y_pred = model(text)
        nll_batch = criterion(y_pred, label - 1)
        nll += nll_batch.data[0] * text.size(0) #by default NLL is averaged over each batch
        y_pred_max, y_pred_argmax = torch.max(y_pred, 1) #prediction is the argmax
        correct += (y_pred_argmax.data == label.data - 1).sum() 
        num_examples += text.size(1)
    return nll/num_examples, correct/num_examples

class CNN(nn.Module):

    def __init__(self, vocab, seq_len, embedding_dim, output_dim=2):
        super(CNN, self).__init__()
        self.embed = nn.Embedding(len(vocab), embedding_dim)
        self.embed.weight.data.copy_(vocab.vectors)
        map_size = 100
        filter_w = embedding_dim
        filter_h = 4
        self.conv1 = nn.Conv2d(1, map_size, (filter_h, filter_w))
        #self.bn = nn.BatchNorm2d(map_size)
        self.dropout = nn.Dropout(0.3)
        self.relu = nn.ReLU() 
        pool_h = seq_len-filter_h + 1 
        pool_w = 1 
        self.pooling = nn.MaxPool2d((pool_h, pool_w))
        #self.embed.weight.requires_grad = False
        self.fc = nn.Linear(map_size, output_dim)
        self.sigmoid = nn.Sigmoid() 
        self.logsoftmax = nn.LogSoftmax()

    def forward(self, x):
        # here x is batch size x length of post X embedding dim 
        #print(x.shape)
        x_embed = self.embed(x.t())
        #print(x_embed.shape)
        fc = self.conv1(x_embed.unsqueeze(1))
        #print(fc.shape)
        #pool = self.pooling(self.bn(fc))
        pool = self.pooling(self.dropout(fc))
        relu = self.relu(pool)
        #print(pool.shape)
        out = self.fc(pool.view(x_embed.size(0), -1))
        out = self.sigmoid(out)
        return self.logsoftmax(out)

cnn_model = CNN(TEXT.vocab, seq_len=seq_length, embedding_dim=300)
criterion = nn.NLLLoss()
parameters = filter(lambda p: p.requires_grad, cnn_model.parameters())
optim = torch.optim.SGD(parameters, lr = 0.1)
num_epochs = 20
for e in range(num_epochs):
    for batch in train_iter:
        optim.zero_grad()
        text = batch.text
        label = batch.label
        y_pred = cnn_model(text)
        nll_batch = criterion(y_pred, label-1)    
        nll_batch.backward()
        optim.step()
    nll_train, accuracy_train = test_cnn(cnn_model, train_iter)
    nll_val, accuracy_val = test_cnn(cnn_model, val_iter)
    print('Training performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_train, accuracy_train))
    print('Validation performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_val, accuracy_val))

torch.save(cnn_model.state_dict(), 'cnn_model')

In [None]:
nll_test, accuracy_test = test_cnn(cnn_model, test_iter)
print('Test performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_test, accuracy_test))

## LSTM 


In [None]:
def test_lstm(model, data):
    correct = 0.
    num_examples = 0.
    nll = 0.
    for batch in data:
        if batch.text.size(1) == 10: 
            text = batch.text
            label = batch.label
            model.hidden = model.init_hidden()
            y_pred = model(text)
            nll_batch = criterion(y_pred, label - 1)
            nll += nll_batch.data[0] * text.size(0) #by default NLL is averaged over each batch
            y_pred_max, y_pred_argmax = torch.max(y_pred, 1) #prediction is the argmax
            correct += (y_pred_argmax.data == label.data - 1).sum() 
            num_examples += text.size(1)
    return nll/num_examples, correct/num_examples

class LSTM(nn.Module):

    def __init__(self, hidden_dim, vocab, batch_size, seq_len, embedding_dim):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        self.embed = nn.Embedding(len(vocab), embedding_dim)
        self.embed.weight.data.copy_(vocab.vectors)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=0.5)
        self.fc = nn.Linear(hidden_dim + embedding_dim, 2)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        return (torch.autograd.Variable(torch.zeros(1, self.batch_size, self.hidden_dim)),
               torch.autograd.Variable(torch.zeros(1, self.batch_size, self.hidden_dim)))
        
    def forward(self, sentence):
        embeds = self.embed(sentence.t())
        x = torch.t(embeds)
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        cat_layer = torch.cat((torch.sum(embeds, dim=1), torch.mean(lstm_out, dim=0)), dim=1)
        y  = self.fc(cat_layer) 
        out = F.log_softmax(y, dim=1)
        return out

lstm_model = LSTM(hidden_dim=100, vocab=TEXT.vocab, batch_size=10, seq_len=56, embedding_dim=300)
criterion = nn.NLLLoss()
parameters = filter(lambda p: p.requires_grad, lstm_model.parameters())
optim = torch.optim.SGD(parameters, lr = 0.05, weight_decay=0.0001)
num_epochs = 20

for e in range(num_epochs):
    for batch in train_iter:
        optim.zero_grad()
        lstm_model.hidden = lstm_model.init_hidden()
        #text = torch.cat((batch.text[torch.randperm(15), :], batch.text[15:, :])) 
        text = batch.text
        label = batch.label
        y_pred = lstm_model(text)
        nll_batch = criterion(y_pred, label-1) 
        nll_batch.backward()
        optim.step()
    nll_train, accuracy_train = test_lstm(lstm_model, train_iter)
    nll_val, accuracy_val = test_lstm(lstm_model, val_iter)
    nll_test, accuracy_test = test_lstm(lstm_model, test_iter)
    print('Test performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_test, accuracy_test))
    print('Training performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_train, accuracy_train))
    print('Validation performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_val, accuracy_val))

torch.save(lstm_model.state_dict(), 'lstm_model')

  """


Test performance after epoch 1: NLL: 381.0838, Accuracy: 0.5022
Training performance after epoch 1: NLL: 360.7707, Accuracy: 0.5189
Validation performance after epoch 1: NLL: 368.4312, Accuracy: 0.5149
Test performance after epoch 2: NLL: 1207.8478, Accuracy: 0.5011
Training performance after epoch 2: NLL: 1260.8997, Accuracy: 0.4783
Validation performance after epoch 2: NLL: 1218.2925, Accuracy: 0.4920
Test performance after epoch 3: NLL: 478.3004, Accuracy: 0.5011
Training performance after epoch 3: NLL: 502.2271, Accuracy: 0.4789
Validation performance after epoch 3: NLL: 486.3800, Accuracy: 0.4908
Test performance after epoch 4: NLL: 61.6009, Accuracy: 0.5154
Training performance after epoch 4: NLL: 59.1166, Accuracy: 0.5451
Validation performance after epoch 4: NLL: 55.4619, Accuracy: 0.5402
Test performance after epoch 5: NLL: 17.0739, Accuracy: 0.5489
Training performance after epoch 5: NLL: 17.7876, Accuracy: 0.5426
Validation performance after epoch 5: NLL: 16.5534, Accuracy: 

In [67]:
nll_test, accuracy_test = test_lstm(lstm_model, test_iter)
print('Test performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_test, accuracy_test))

Test performance after epoch 20: NLL: 4.3825, Accuracy: 0.8038


  """


Test performance after epoch 17: NLL: 5.7222, Accuracy: 0.7984


  """


In [86]:
def test_lstm(model, data):
    correct = 0.
    num_examples = 0.
    nll = 0.
    for batch in data:
        if batch.text.size(1) == 10: 
            text = batch.text
            label = batch.label
            model.hidden = model.init_hidden()
            y_pred = model(text)
            nll_batch = criterion(y_pred, label - 1)
            nll += nll_batch.data[0] * text.size(0) #by default NLL is averaged over each batch
            y_pred_max, y_pred_argmax = torch.max(y_pred, 1) #prediction is the argmax
            correct += (y_pred_argmax.data == label.data - 1).sum() 
            num_examples += text.size(1)
    return nll/num_examples, correct/num_examples

class BiLSTM(nn.Module):

    def __init__(self, hidden_dim, vocab, batch_size, seq_len, embedding_dim):
        super(BiLSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        self.embed = nn.Embedding(len(vocab), embedding_dim)
        self.embed.weight.data.copy_(vocab.vectors)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=0.5, bidirectional=True)
        self.fc = nn.Linear(hidden_dim*2 + embedding_dim, 2)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        return (torch.autograd.Variable(torch.zeros(2, self.batch_size, self.hidden_dim)),
               torch.autograd.Variable(torch.zeros(2, self.batch_size, self.hidden_dim)))
        
    def forward(self, sentence):
        embeds = self.embed(sentence.t())
        x = torch.t(embeds)
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        cat_layer = torch.cat((torch.mean(embeds), torch.mean(lstm_out, dim=0)))
        y  = self.fc(cat_layer)
        out = F.log_softmax(y, dim=1)
        return out

bilstm_model = BiLSTM(hidden_dim=100, vocab=TEXT.vocab, batch_size=10, seq_len=56, embedding_dim=300)
criterion = nn.NLLLoss()
parameters = filter(lambda p: p.requires_grad, bilstm_model.parameters())
optim = torch.optim.SGD(parameters, lr = 0.05, weight_decay=0.0001)
num_epochs = 20
for e in range(num_epochs):
    for batch in train_iter:
        optim.zero_grad()
        bilstm_model.hidden = bilstm_model.init_hidden()
        text = batch.text
        label = batch.label
        y_pred = bilstm_model(text)
        nll_batch = criterion(y_pred, label-1) 
        nll_batch.backward()
        optim.step()
    nll_train, accuracy_train = test_lstm(bilstm_model, train_iter)
    nll_val, accuracy_val = test_lstm(bilstm_model, val_iter)
    print('Training performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_train, accuracy_train))
    print('Validation performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_val, accuracy_val))
    nll_test, accuracy_test = test_lstm(bilstm_model, test_iter)
    print('Test performance after epoch %d: NLL: %.4f, Accuracy: %.4f'% (e+1, nll_test, accuracy_test))

torch.save(bilstm_model.state_dict(), 'bilstm_model')

  """


Training performance after epoch 1: NLL: 3.8572, Accuracy: 0.6014
Validation performance after epoch 1: NLL: 3.8562, Accuracy: 0.6161
Test performance after epoch 1: NLL: 3.8573, Accuracy: 0.6148
Training performance after epoch 2: NLL: 3.8103, Accuracy: 0.5247
Validation performance after epoch 2: NLL: 3.8217, Accuracy: 0.5092
Test performance after epoch 2: NLL: 3.8327, Accuracy: 0.4995
Training performance after epoch 3: NLL: 3.7097, Accuracy: 0.6473
Validation performance after epoch 3: NLL: 3.6879, Accuracy: 0.6724
Test performance after epoch 3: NLL: 3.6933, Accuracy: 0.6533
Training performance after epoch 4: NLL: 3.4764, Accuracy: 0.6384
Validation performance after epoch 4: NLL: 3.5845, Accuracy: 0.6333
Test performance after epoch 4: NLL: 3.6006, Accuracy: 0.6104
Training performance after epoch 5: NLL: 3.1567, Accuracy: 0.7425
Validation performance after epoch 5: NLL: 3.3123, Accuracy: 0.7379
Test performance after epoch 5: NLL: 3.3116, Accuracy: 0.7214
Training performance

KeyboardInterrupt: 

In [None]:
class Attack:
    def __init__(self, model):
        self.net = model
        self.optimizer = optim.SGD(params=[self.net.r], lr=0.008)

    def attack(self, x, y_true, y_target, regularization=None):
        _x = x
        _y_target = Variable(torch.LongTensor([y_target]))

        # Reset value of r word perturbations
        self.net.r.data = torch.zeros(1,56*300) 

        y_pred =  np.argmax(self.net(_x).data.numpy())
        incorrect_classify = False
        if y_true != y_pred:
            incorrect_classify = True

        # Optimization Loop 
        for iteration in range(1000):

            self.optimizer.zero_grad() 
            outputs = self.net(_x)
            xent_loss = self.softmaxwithxent(outputs, _y_target) 
            adv_loss = xent_loss + torch.mean(torch.abs(self.net.r))
            adv_loss.backward() 
            self.optimizer.step() 

            # keep optimizing Until classif_op == _y_target
            y_pred_adversarial = np.argmax(self.net(_x).data.numpy())
            if y_pred_adversarial == y_target:
                break 

        if iteration == 999:
            print "Warning: optimization loop ran for 1000 iterations. The result may not be correct"

        return self.net.r.data.numpy(), y_pred, y_pred_adversarial 

In [None]:
net = CNN(TEXT.vocab, embedding_dim=300)
print(net)
SoftmaxWithXent = nn.CrossEntropyLoss()

# OPTIMIZE FOR "r" 
optimizer = optim.SGD(params=[net.r], lr=0.008)


for example in test:
    _x = example.text 
    _y_target = (example.label - 1) ^ (example.label - 1)
    
    # Reset value of r 
    net.r.data = torch.zeros(1, 56) 

    # Classification before Adv 
    y_pred =  np.argmax(net(Variable(_x)).data.numpy())
    y_preds.append(y_pred)
    
    print "Y_TRUE: {} | Y_PRED: {}".format(_y_true, y_pred)
    if _y_true != y_pred:
        print "WARNING: IMAGE WAS NOT CLASSIFIED CORRECTLY"

    # Optimization Loop 
    tqd_loop = trange(1000)
    for iteration in tqd_loop:

        x,y = _x, _y_target
        optimizer.zero_grad() 
        outputs = net(x)
        xent_loss = SoftmaxWithXent(outputs, y) 
        adv_loss  = xent_loss + torch.mean(torch.pow(net.r,2))

        adv_loss.backward() 
        # xent_loss.backward()
        optimizer.step() 

        # print stats 
        classif_op = np.argmax(net(Variable(_x)).data.numpy())
        tqd_loop.set_description("xent Loss: {} classif: {}".format(xent_loss.data.numpy(), classif_op))

        # keep optimizing Until classif_op == _y_target
        if classif_op == _y_target.numpy()[0]:
            tqd_loop.close()
            break 

    # save adv_image and noise to list 
    noise.append(net.r.data.numpy())
    print "After Optimization Image is classified as: "
    print np.argmax(net(Variable(_x)).data.numpy())

## Part 6: Test Code 

In [None]:
def test(model, test_iter):
    "All models should be able to be run with following command."
    upload = []
    # Update: for kaggle the bucket iterator needs to have batch_size 10
    #test_iter = torchtext.data.BucketIterator(test, train=False, batch_size=10)
    for batch in test_iter:
        # Your prediction data here (don't cheat!)
        probs = model(batch.text)
        _, argmax = probs.max(1)
        upload += list(argmax.data)
    with open("predictions.txt", "w") as f:
        f.write('Id,Cat\n')
        for i in range(len(upload)):
            f.write(str(i) + "," + str(upload[i] + 1) + "\n")
test(cnn_model, test_iter)

In addition, you should put up a (short) write-up following the template provided in the repository:  https://github.com/harvard-ml-courses/cs287-s18/blob/master/template/