# HW03 Sentiment Analysis with FNN, RNN, GRU and LSTM
-- by Ruijie Rao

## 1. Data Loading
For cleaning and loading, I lowercased the text, removed non-alphabetical characters, and expanded contractions. To speed up later steps, I then sampled the dataset and pickled the sample to a `.pkl` file.
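The cleaning steps can be sketched on a single string as follows. This is a minimal illustration, not the notebook's exact function: `CONTRACTIONS` and `clean_review` are hypothetical names, the two-entry table stands in for the full contraction list, and the sketch assumes contractions are expanded while the apostrophes are still present.

```python
import re

# Hypothetical two-entry table; the real contraction list is much longer.
CONTRACTIONS = {"can't": "cannot", "don't": "do not"}

def clean_review(text):
    """Lowercase, expand contractions, then keep letters only."""
    text = text.lower()
    # Expand contractions while the apostrophes are still present.
    tokens = [CONTRACTIONS.get(tok, tok) for tok in text.split()]
    text = " ".join(tokens)
    # Replace every run of non-letters with a single space.
    text = re.sub(r"[^a-z ]+", " ", text)
    # Collapse repeated whitespace.
    return " ".join(text.split())

print(clean_review("I CAN'T believe it... 5 stars!!"))
# → i cannot believe it stars
```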

## 2. Word2Vec

### 2.1 Import Google News model

I first checked the model's coverage of my corpus vocabulary by building a bag of words with counts. For every word in my vocabulary, I checked whether the Google model covers it, then summed the counts of the covered words to get the share of the total text that is covered.

In the first pass, only 70% of the total text was covered, which is poor. I skimmed through the uncovered words, sorted in descending order of count, and found that stopwords such as "and", "of", and "to" were not covered despite having huge counts in my corpus. After removing them, text coverage reached 98%, although only about 50% of the vocabulary is covered. The remaining uncovered words are misspelled or very rare, so I left them alone.
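The two coverage figures (share of distinct words versus share of all tokens) can be illustrated with a small sketch. The function name `coverage`, the toy reviews, and the `known` set (standing in for the pretrained model's vocabulary) are all assumptions made for illustration.

```python
from collections import Counter

def coverage(reviews, known_words):
    """Return (vocab coverage, text coverage) for whitespace-tokenised reviews."""
    counts = Counter(word for review in reviews for word in review.split())
    covered = {w: c for w, c in counts.items() if w in known_words}
    vocab_cov = len(covered) / len(counts)                   # share of distinct words
    text_cov = sum(covered.values()) / sum(counts.values())  # share of all tokens
    return vocab_cov, text_cov

# Toy data: "grt" stands in for a misspelling missing from the embedding model.
reviews = ["great product great price", "grt product"]
known = {"great", "product", "price"}
vocab_cov, text_cov = coverage(reviews, known)
print(f"{vocab_cov:.2%} of vocab, {text_cov:.2%} of text")
```

Because frequent words dominate the token count, text coverage can be high even when a large share of the distinct words is missing.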

#### Check Semantic Similarities
I used the L2 norm of the difference between two vectors as the distance measure.

1. "Cat", "Car", and "Bus"

    - Comparing "car" with both "cat" and "bus" checks whether the word vectors capture semantic similarity rather than spelling similarity (edit distance).
    - Result: 
        - Cat&Car: 3.5739133148105315
        - Bus&Car: 2.9155472320880684
    - "Bus" is closer to "car" than "cat" is, so meaning outweighs spelling. Passed.


2. King − Man + Woman = Queen
    - This checks whether the vector arithmetic preserves the semantic relation.
    - Result:
        - King − Man + Woman vs. Queen: 2.298657801924729
        - King vs. Queen: 2.479692373238762 (as a reference)

3. excellent ∼ outstanding
    - Result:
        - excellent vs. outstanding: 2.4881580884687127
        - excellent vs. terrible: 2.957787185080837
    - This tests whether similar words are closer to each other than opposite words are. Passed.
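All three checks rely on the same distance measure. As a rough sketch of the idea, the example below uses made-up 2-D vectors (not values from any real model) and a hypothetical helper `l2_distance` to show how the analogy test is computed.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Made-up 2-D vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
vec = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
}

# king - man + woman, computed component-wise.
analogy = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]

print(l2_distance(analogy, vec["queen"]))      # → 0.0 (composed vector vs. "queen")
print(l2_distance(vec["king"], vec["queen"]))  # → 2.0 (reference distance)
```

In this idealised toy space the composed vector lands exactly on "queen"; with real embeddings the test only asks that it ends up closer to "queen" than the reference pair.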

### 2.2 Build my own model

I built the model following this [tutorial](https://www.tutorialspoint.com/gensim/gensim_creating_a_dictionary.htm) and the requirements in the homework instructions: "Set the embedding size to be 300 and the window size to be 13. You can also consider a minimum word count of 9."

#### Check Semantic Similarities
I used the L2 norm of the difference between two vectors as the distance measure.

2. King − Man + Woman = Queen
    - This checks whether the vector arithmetic preserves the semantic relation.
    - Result:
        - King − Man + Woman vs. Queen: 4.871180486922247
        - King vs. Queen: 1.5191899898462256 (as a reference)
    - The composed vector is so far from "queen" that even "king" is closer to it. Failed.

3. excellent ∼ outstanding
    - Result:
        - excellent vs. outstanding: 10.07535627275851
        - excellent vs. terrible: 11.696851275335957
    - Both distances are very large, although the synonym is still closer than the antonym. Half-passed.

### 2.3 Conclusion
- In the King and Queen test, my model does poorly: the composed "king − man + woman" vector is farther from the actual "queen" than "king" itself is.
- In the excellent ∼ outstanding test, the result is also unsatisfying, though at least "excellent" is closer to "outstanding" than to "terrible".
- To conclude, the pretrained Google News model performs much better and agrees more with common sense. I believe this is due to its far larger corpus and broader range of contexts: Google News contains more semantic information than a limited set of product reviews.

## 3. Simple models

- Perceptron Model: 
    - TFIDF: Precision: 0.6866362883623626, Recall: 0.6598754678793339, F-1: 0.6648885837753792, **Accuracy: 0.6600833333333334**
    - Word2Vec: Precision: 0.6598653875356274, Recall: 0.6036417528177076, F-1: 0.6021460529466545, **Accuracy: 0.60375**

- SVM Model: 
    - TFIDF: Precision: 0.7062674968978809, Recall: 0.7082192621000024, F-1: 0.7068133498932015, **Accuracy: 0.7076666666666667**
    - Word2Vec: Precision: 0.7035451625004853, Recall: 0.7070133675657541, F-1: 0.7035801054315015, **Accuracy: 0.70625**

### Conclusion
- For the perceptron, TF-IDF features clearly outperform the Word2Vec embeddings (accuracy 0.660 vs. 0.604).
- For the SVM, both feature types work about equally well (0.708 vs. 0.706).
- Comparing the two models, the SVM outperforms the perceptron with both input types.
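The precision, recall, and F-1 values above are unweighted means over the three classes. A minimal sketch of that averaging is shown below; `macro_scores` and the toy labels are hypothetical, and returning 0.0 for a class with no predictions is an assumption of this sketch.

```python
def macro_scores(y_true, y_pred, classes=(0, 1, 2)):
    """Unweighted mean of per-class precision, recall and F1, plus accuracy."""
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        # Assumption: a class with no predictions or no samples scores 0.0.
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, accuracy

# Toy labels for the three sentiment classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall, f1, accuracy = macro_scores(y_true, y_pred)
print(precision, recall, f1, accuracy)
```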

## 4. Feedforward Neural Networks

### 4.1 Dataloader
I built a dataset class that takes a two-column dataframe as input: the first column holds the feature (either the mean or the concatenated version of the embeddings) and the second column holds the label. The feature array should be of dtype float32.

I built a function that takes a whole dataframe, splits it into train, validation, and test sets (0.6, 0.2, 0.2), and loads them into separate data loaders with a default batch size of 64.
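One detail of the 0.6/0.2/0.2 split is that the three sizes must add up to the dataset length exactly. The sketch below shows one way to guarantee that by giving the last split the remainder; `split_sizes` is a hypothetical helper, not part of the notebook.

```python
def split_sizes(n, fractions=(0.6, 0.2, 0.2)):
    """Integer split sizes that always sum to n; the last split takes the remainder."""
    sizes = [int(n * f) for f in fractions[:-1]]
    sizes.append(n - sum(sizes))
    return sizes

print(split_sizes(60000))  # → [36000, 12000, 12000]
print(split_sizes(101))    # → [60, 20, 21]
```

For the 60,000 sampled reviews the sizes divide evenly; for lengths that do not, the remainder ends up in the test split.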

I tried different batch sizes and found that smaller batches mildly reduce overfitting, although their instability requires a smaller learning rate (step size) and thus more epochs. A batch size of 32-64 with a learning rate of 0.001 worked best.

### 4.2 Validation set and Best model
I use a validation set to check for overfitting during training. The validation performance also indicates which model state is best: I save the model state with the lowest validation loss and load it as the final version that is evaluated on the test set.
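The selection rule can be sketched independently of any deep learning library. In the example below, `select_best`, the per-epoch `states`, and the `losses` are all made up for illustration; the dictionaries stand in for saved model weights.

```python
import copy

def select_best(epoch_states, val_losses):
    """Return (best_epoch, best_state) for the lowest validation loss."""
    best_loss = float("inf")
    best_epoch, best_state = None, None
    for epoch, (state, loss) in enumerate(zip(epoch_states, val_losses), start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            # Copy the state so later training steps cannot overwrite the snapshot.
            best_state = copy.deepcopy(state)
    return best_epoch, best_state

# Hypothetical per-epoch "weights" and validation losses.
states = [{"w": 0.1}, {"w": 0.4}, {"w": 0.7}, {"w": 0.9}]
losses = [0.71, 0.69, 0.70, 0.73]
best_epoch, best_state = select_best(states, losses)
print(best_epoch, best_state)  # → 2 {'w': 0.4}
```

Even though training continues for two more epochs, the snapshot from epoch 2 is the one that would be evaluated on the test set.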

### 4.3 FNN
I built an FNN (MLP) following [this](https://www.kaggle.com/mishra1993/pytorch-multi-layer-perceptron-mnist) tutorial. The FNN is composed of an input layer (300), hidden layer 1 (100), hidden layer 2 (10), and an output layer (3). Each hidden layer is followed by a ReLU (for non-linearity; it also tends to speed up convergence) and a dropout layer (to reduce overfitting). Finally, log-softmax is applied to the output vector.
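To make the final step concrete, here is a small pure-Python sketch of log-softmax on three made-up output scores. The helper `log_softmax` below is a hand-written illustration under these assumptions, not the PyTorch function.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

scores = [2.0, 0.5, -1.0]  # made-up output-layer scores for the 3 classes
log_probs = log_softmax(scores)
probs = [math.exp(lp) for lp in log_probs]

print(sum(probs))                       # probabilities sum to 1 up to rounding
print(log_probs.index(max(log_probs)))  # predicted class = index of the largest score
```

Applying log-softmax to its own output returns the same values, so passing these log-probabilities to a loss that normalises internally (such as `nn.CrossEntropyLoss`, used in the notebook) does not change the result.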

### 4.4 Mean Embedding vs. Concat Embedding
- Mean Embedding: 0.71875
- Concat Embedding: 0.6965

### Conclusion
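The mean embedding outperforms the concatenated version. A likely reason is that the mean uses every word in the review, although the individual words are blurred together, while the concatenation keeps each word distinct but only covers the first 10 words, so the rest of the review is lost.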
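The difference between the two feature types can be seen on a toy review. The sketch below uses made-up 2-D vectors and hypothetical helpers `mean_embedding` and `concat_embedding`; it only illustrates averaging versus truncation with zero padding.

```python
def mean_embedding(vectors):
    """Average all word vectors component-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def concat_embedding(vectors, max_len, dim):
    """Concatenate the first max_len vectors, zero-padding short inputs."""
    kept = vectors[:max_len]
    padded = kept + [[0.0] * dim for _ in range(max_len - len(kept))]
    return [x for v in padded for x in v]

# Made-up 2-D vectors for a 3-word review; the last word carries the sentiment.
review = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]

print(mean_embedding(review))              # influenced by all three words
print(concat_embedding(review, 2, 2))      # third word is cut off
print(concat_embedding(review[:1], 2, 2))  # short review is zero-padded
```

With `max_len=2`, the strongly weighted third word never reaches the concatenated feature, while it still shifts the mean.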

The FNN outperforms the single perceptron and does slightly better than the SVM.

## 5. Recurrent Neural Network

### 5.1 RNN
The RNN is composed of input layer (300), hidden RNN layer (20), and output layer (3). Following the hidden layer, there is a dropout layer (for reducing overfitting). Finally, log-softmax is used on the output vector. The initial hidden state h0 is created as a zero tensor.
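The recurrence behind the hidden layer can be sketched with scalars instead of 300-dimensional inputs and 20 hidden units. The weights below are made up, `rnn_last_state` is a hypothetical helper, and ReLU is used as the non-linearity, matching the `nonlinearity='relu'` setting in the notebook.

```python
def relu(x):
    return max(0.0, x)

def rnn_last_state(inputs, w_in, w_rec, bias=0.0):
    """Run h_t = relu(w_in * x_t + w_rec * h_{t-1} + bias) and return the last h."""
    h = 0.0  # the initial hidden state h0 is zero
    for x in inputs:
        h = relu(w_in * x + w_rec * h + bias)
    return h

print(rnn_last_state([1.0, 2.0, 3.0], w_in=0.5, w_rec=0.5))  # → 2.125
print(rnn_last_state([3.0, 2.0, 1.0], w_in=0.5, w_rec=0.5))  # → 1.375
```

The same inputs in a different order give a different final state, which is the kind of word-order information a mean embedding cannot carry.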

**Result**: 0.728

### 5.2 GRU
The GRU is composed of input layer (300), hidden GRU layer (20), and output layer (3). Following the hidden layer, there is a dropout layer (for reducing overfitting). Finally, log-softmax is used on the output vector. The initial hidden state h0 is created as a zero tensor.

**Result**: 0.74075

### 5.3 LSTM
The LSTM is composed of input layer (300), hidden LSTM layer (20), and output layer (3). Following the hidden layer, there is a dropout layer (for reducing overfitting). Finally, log-softmax is used on the output vector. The initial hidden state and cell state h0,c0 are created as zero tensors.

**Result**: 0.73758

### Conclusion
- Within the RNN family, GRU > LSTM > simple RNN in accuracy (0.741 > 0.738 > 0.728).
- Compared with the models above, the recurrent models perform best overall, ahead of the FNN (0.719) and the SVM (0.708).

# Notebook

In [None]:
import pandas as pd
import numpy as np
import nltk
import re
import csv
import matplotlib.pyplot as plt
import gensim
import pickle
import torch
from torch.utils.data import DataLoader, Dataset, random_split
from torch.utils.data.sampler import SubsetRandomSampler
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms

In [None]:
DATA_DIR = ''

# Preprocess

In [None]:
#with open(DATA_DIR+"sampled_df.pkl","rb") as file:
    #sampled_df = pickle.load(file)

## Read and sampling

In [None]:
raw_df = pd.read_table(DATA_DIR+'amazon_reviews_us_Beauty_v1_00.tsv.gz', compression='gzip', quotechar='"', error_bad_lines=False, quoting=csv.QUOTE_NONE)





In [None]:
df = raw_df[["star_rating","review_body"]]

In [None]:
df = df.dropna()

In [None]:
def label_class(x):
    if x<3:
        return 0
    if x>3:
        return 2
    else:
        return 1

In [None]:
df["label"] = df["star_rating"].apply(label_class)

In [None]:
sampled_df = pd.concat([df[df["label"] == k].sample(n=20000) for k in range(3)])

In [None]:
len(sampled_df)

60000

#### contraction dict

In [None]:
contractions = { 
    "dont":"do not",
    "ain't": "are not",
    "aren't": "are not",
    "can't": "cannot",
    "can't've": "cannot have"
    # More..
    }

## decontract

In [None]:
def decontract(x):
    tokens = x.split(' ')
    for i,token in enumerate(tokens):
        if token in contractions.keys():
            tokens[i] = contractions[token]
    return ' '.join(tokens)

## cleaning

In [None]:
def data_cleaning(x):
    my_stopwords = ["a", "and", "of", "to"]
    x = x.lower() # convert the review to lowercase
    x = re.sub(r'<br />', ' ', x) # replace HTML line breaks with a space so words do not merge
    x = re.sub(r'\s*https?://\S+(\s+|$)', ' ', x) # remove URLs
    x = decontract(x) # expand contractions while the apostrophes are still present
    x = re.sub(r'[^a-zA-Z ]+', ' ', x) # remove non-alphabetical characters
    x = ' '.join(x.split()) # collapse extra whitespace
    x = ' '.join([word for word in x.split(" ") if word not in my_stopwords])
    return x

In [None]:
sampled_df["review_cleaned"] = sampled_df["review_body"].apply(data_cleaning)

In [None]:
with open(DATA_DIR+"sampled_df.pkl","wb") as file:
    pickle.dump(sampled_df, file)

# Word2Vec

In [None]:
import gensim.downloader as api
gg_model = api.load('word2vec-google-news-300')

In [None]:
#gg_model = gensim.models.KeyedVectors.load_word2vec_format(DATA_DIR+"gensim-data/word2vec-google-news-300/word2vec-google-news-300.gz", binary=True)

## create bag of words

In [None]:
bow = {}
def gen_bow(x,bow):
  # count every space-separated token of one review
  for word in x.split(" "):
    bow[word] = bow.get(word, 0) + 1

In [None]:
sampled_df["review_cleaned"].apply(lambda x: gen_bow(x,bow))

In [None]:
with open(DATA_DIR+"bow.pkl","wb") as file:
    pickle.dump(bow, file)

## Check coverage

In [None]:
def check_coverage(wv, bow):
  covered = {}
  uncovered = {}
  for word, count in bow.items():
    if word in wv: # found in wv
      covered[word] = count
    else: # not found in wv
      uncovered[word] = count
  print('Found embeddings for {:.2%} of vocab'.format(len(covered) / len(bow)))
  print('Found embeddings for  {:.2%} of all text'.format(sum(covered.values())/ sum(bow.values())))
  # uncovered words, most frequent first
  result = sorted(uncovered.items(), key=lambda x: x[1])[::-1]

  return result

In [None]:
with open(DATA_DIR+"bow.pkl","rb") as file:
    bow = pickle.load(file)

In [None]:
uncovered = check_coverage(gg_model, bow)

Found embeddings for 47.71% of vocab
Found embeddings for  98.35% of all text


## Google News Model

### Check semantic similarities

In [None]:
def compare_difference(w,v):
  return np.sqrt(sum(np.square(w-v)))

#### "Cat", "Car", and "Bus"

In [None]:
compare_difference(gg_model["cat"],gg_model["car"])

3.5739133148105315

In [None]:
compare_difference(gg_model["bus"],gg_model["car"])

2.9155472320880684

#### King − Man + Woman = Queen 

In [None]:
pred = gg_model["king"]-gg_model["man"]+gg_model["woman"]
compare_difference(pred,gg_model["queen"])

2.298657801924729

In [None]:
compare_difference(gg_model["queen"],gg_model["king"])

2.479692373238762

#### excellent ∼ outstanding

In [None]:
compare_difference(gg_model["excellent"],gg_model["outstanding"])

2.4881580884687127

In [None]:
compare_difference(gg_model["excellent"],gg_model["terrible"])

2.957787185080837

## Build my own Word2Vec

In [None]:
from gensim import utils

In [None]:
sentences = sampled_df["review_cleaned"].apply(lambda x: x.split()).values

In [None]:
class MyCorpus:
    """An iterator that yields sentences (lists of str)."""
    def __init__(self, sentences):
      self.sentences = sentences

    def __iter__(self):
        for line in self.sentences:
            # assume there's one document per line, tokens separated by whitespace
            yield line

In [None]:
corpus = MyCorpus(sentences)

In [None]:
my_model = gensim.models.Word2Vec(sentences=corpus, min_count=9, size=300, window=13)

### Check semantic similarities

#### King − Man + Woman = Queen 

In [None]:
pred = my_model.wv["king"]-my_model.wv["man"]+my_model.wv["woman"]
compare_difference(pred,my_model.wv["queen"])


4.871180486922247

In [None]:
compare_difference(my_model.wv["queen"],my_model.wv["king"])


1.5191899898462256

#### excellent ∼ outstanding

In [None]:
compare_difference(my_model.wv["excellent"],my_model.wv["outstanding"])


10.07535627275851

In [None]:
compare_difference(my_model.wv["excellent"],my_model.wv["terrible"])


11.696851275335957

# Data Loader

In [None]:
def gen_mean_embed(x, wv):
  shp = (1,300)
  result = []
  tokens = x.split(" ")
  for word in tokens:
    try:
      result.append(wv[word].reshape(shp))
    except KeyError: # skip words without an embedding
      continue
  if len(result) == 0:
    return np.zeros(shp).astype('float32')
  return np.mean(result, axis=0).astype('float32')

In [None]:
def gen_cat_embed(x, wv, max_len=10):
  shp = wv["king"].shape
  result = []
  tokens = x.split(" ")
  count = 0
  for word in tokens:
    try:
      result.append(wv[word].reshape(shp))
      count += 1
    except KeyError: # skip words without an embedding
      continue
    if count==max_len:
      break
  if len(result) < max_len:
    pad_len = max_len-len(result)
    result += [np.zeros(shp) for i in range(pad_len)]
  return np.array(result)

In [None]:
sampled_df["mean_emb"] = sampled_df["review_body"].apply(lambda x: gen_mean_embed(x, gg_model))

In [None]:
sampled_df["cat_emb"] = sampled_df["review_body"].apply(lambda x: gen_cat_embed(x, gg_model))

In [None]:
sampled_df["rnn_emb"] = sampled_df["review_body"].apply(lambda x: gen_cat_embed(x, gg_model, 20))

In [None]:
class Word2Vec(Dataset):
    
  def __init__(self, df, transform=None):
      self.data = df
      self.transform = transform
      
  def __len__(self):
      return len(self.data)
  
  def __getitem__(self, index):
      emb = self.data.iloc[index,0].astype('float32') #.reshape(())
      label = self.data.iloc[index,1]
      
      if self.transform is not None:
          emb = self.transform(emb) # apply the transform to the returned feature
          
      return emb, label

In [None]:
def gen_train_test_loader(feature_df, batch_size=64):
  data = Word2Vec(feature_df, transform=transforms.ToTensor())
  # 60/20/20 split; the test split takes the remainder so the sizes always sum to len(data)
  n_train = int(len(data)*0.6)
  n_val = int(len(data)*0.2)
  n_test = len(data) - n_train - n_val
  train, val, test = random_split(data, [n_train, n_val, n_test])
  train_loader = DataLoader(train, batch_size=batch_size, shuffle=True)
  val_loader = DataLoader(val, batch_size=batch_size, shuffle=False)
  test_loader = DataLoader(test, batch_size=batch_size, shuffle=False)
  return train_loader, val_loader, test_loader

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(sampled_df["mean_emb"], sampled_df[['label']], test_size=0.2, random_state=0)

# Perceptron

In [None]:
from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score
def evaluation(y_pred, y_true):
    prc = precision_score(y_true, y_pred, average=None)
    recall = recall_score(y_true, y_pred, average=None)
    f1 = f1_score(y_true, y_pred, average=None)
    acc = accuracy_score(y_true, y_pred)
    for cls in range(1,4):
        print(f'<Class {cls}> Precision: {prc[cls-1]}, Recall: {recall[cls-1]}, F-1: {f1[cls-1]}')
    print(f'<Overall Mean> Precision: {np.mean(prc)}, Recall: {np.mean(recall)}, F-1: {np.mean(f1)}, Accuracy: {np.mean(acc)}')

In [None]:
from sklearn.linear_model import Perceptron

In [None]:
#best_seed = find_best_randseed(20)
perceptron_md = Perceptron(tol=1e-3, random_state=0)
perceptron_md.fit(np.stack(X_train), y_train.values.ravel())
evaluation(perceptron_md.predict(np.stack(X_test)), y_test)

<Class 1> Precision: 0.536787040161998, Recall: 0.7958468851638729, F-1: 0.6411367529980853
<Class 2> Precision: 0.5492424242424242, Recall: 0.5388999008919723, F-1: 0.5440220110055027
<Class 3> Precision: 0.8935666982024598, Recall: 0.47617847239727756, F-1: 0.6212793948363756
<Overall Mean> Precision: 0.6598653875356274, Recall: 0.6036417528177076, F-1: 0.6021460529466545, Accuracy: 0.60375


# SVM

In [None]:
from sklearn.svm import SVC, LinearSVC

In [None]:
svm_md = LinearSVC(random_state=0, dual=False, C=0.05)
svm_md.fit(np.stack(X_train), y_train.values.ravel())
evaluation(svm_md.predict(np.stack(X_test)), y_test)

<Class 1> Precision: 0.6834581347855684, Recall: 0.7535651738804103, F-1: 0.7168015230842456
<Class 2> Precision: 0.6408713098308971, Recall: 0.554013875123885, F-1: 0.5942857142857142
<Class 3> Precision: 0.7863060428849903, Recall: 0.813461053692967, F-1: 0.7996530789245447
<Overall Mean> Precision: 0.7035451625004853, Recall: 0.7070133675657541, F-1: 0.7035801054315015, Accuracy: 0.70625


# Train and Eval

In [None]:
def evaluate(model, loader, criterion):
  test_loss = 0.0
  correct_count = 0
  model.eval()
  with torch.no_grad(): # no gradients are needed for evaluation
    for data, label in loader:
      data = data.to(device) # .to() returns a new tensor, so reassign
      label = label.to(device)
      pred = model(data)
      loss = criterion(pred, label)
      test_loss += loss.item()*data.size(0)
      correct_count += (pred.argmax(dim=1) == label).sum().item()
  test_loss = test_loss/len(loader.dataset)
  test_acc = correct_count/len(loader.dataset)
  return test_loss, test_acc

def train(model, train_loader, val_loader, test_loader, epoch_num, lr=0.01):
  criterion = nn.CrossEntropyLoss()
  #optimizer = torch.optim.SGD(model.parameters(), lr=lr)
  weight_decay = 1e-4
  optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
  best_val_loss = np.inf
  for epoch in range(epoch_num):
    train_loss = 0.0
    model.train()
    for data, label in train_loader:
      data = data.to(device)
      label = label.to(device) # class indices; CrossEntropyLoss does not need one-hot targets
      optimizer.zero_grad()
      pred = model(data)
      #print(pred.shape,label.shape)
      loss = criterion(pred, label)
      loss.backward()
      optimizer.step()
      train_loss += loss.item()*data.size(0)
    train_loss = train_loss/len(train_loader.dataset)
    val_loss, val_acc = evaluate(model, val_loader, criterion)
    #if epoch_num>10 and (epoch+1)%10 == 0:
    print(f"[Epoch {epoch+1}] Train Loss: {train_loss}")
    print(f"[Epoch {epoch+1}] Val Loss: {val_loss}")
    if val_loss<best_val_loss:
      # keep the checkpoint with the lowest validation loss
      best_val_loss = val_loss
      torch.save({
            'epoch': epoch+1,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': val_loss,
            }, DATA_DIR+'best_model.pth')
  #print(best_val_loss)
  # reload the best checkpoint before the final test evaluation
  checkpoint = torch.load(DATA_DIR+'best_model.pth')
  model.load_state_dict(checkpoint['model_state_dict'])
  test_loss, test_acc = evaluate(model, test_loader, criterion)
  
  print(f"Test Loss: {test_loss}")
  print(f"Test Accuracy: {test_acc}")

In [None]:
#device = torch.device("cuda:0")

In [None]:
device = torch.device("cpu")

# FNN

In [None]:
class ForwardNeuralNetwork(nn.Module):
  def __init__(self, input_size=300, dropout=0.2):
    super(ForwardNeuralNetwork, self).__init__()
    hidden_1 = 100
    hidden_2 = 10
    self.input_size = input_size
    self.layer1 = nn.Linear(input_size, hidden_1)
    self.layer2 = nn.Linear(hidden_1, hidden_2)
    self.layer3 = nn.Linear(hidden_2, 3)
    self.dropout = nn.Dropout(dropout)
  
  def forward(self, x):
    x = x.view(-1, self.input_size)
    x = F.relu(self.layer1(x))
    x = self.dropout(x)
    x = F.relu(self.layer2(x))
    x = self.dropout(x)
    x = self.layer3(x)
    return F.log_softmax(x, dim=1)

## Mean Embedding

In [None]:
train_loader, val_loader, test_loader = gen_train_test_loader(sampled_df[["mean_emb","label"]].sample(60000), batch_size=64)

In [None]:
fnn = ForwardNeuralNetwork(dropout=0.2).to(device)
train(fnn, train_loader, val_loader, test_loader, epoch_num=10, lr=0.001)

[Epoch 1] Train Loss: 0.8137290114296807
[Epoch 1] Val Loss: 0.7110626417795817
[Epoch 2] Train Loss: 0.7230893446604411
[Epoch 2] Val Loss: 0.6878786160151164
[Epoch 3] Train Loss: 0.7082717269791498
[Epoch 3] Val Loss: 0.6764097997347513
[Epoch 4] Train Loss: 0.6940510704252455
[Epoch 4] Val Loss: 0.6754676510492961
[Epoch 5] Train Loss: 0.6834821257591247
[Epoch 5] Val Loss: 0.6729069339434306
[Epoch 6] Train Loss: 0.68030215660731
[Epoch 6] Val Loss: 0.6731184935569763
[Epoch 7] Train Loss: 0.6698232576052348
[Epoch 7] Val Loss: 0.6558108727137247
[Epoch 8] Train Loss: 0.6691948196093241
[Epoch 8] Val Loss: 0.655287299156189
[Epoch 9] Train Loss: 0.6626195336447822
[Epoch 9] Val Loss: 0.6526960207621256
[Epoch 10] Train Loss: 0.6559046327273051
[Epoch 10] Val Loss: 0.6530044225056966
0.6526960207621256
Test Loss: 0.6445308755238851
Test Accuracy: 0.71875


## Concat Embedding

In [None]:
train_loader, val_loader, test_loader = gen_train_test_loader(sampled_df[["cat_emb","label"]].sample(60000), batch_size=64)

In [None]:
fnn = ForwardNeuralNetwork(input_size=300*10, dropout=0.3).to(device)
train(fnn, train_loader, val_loader, test_loader, epoch_num=7, lr=0.001)

[Epoch 1] Train Loss: 0.8086624857584636
[Epoch 1] Val Loss: 0.7154063116709392
[Epoch 2] Train Loss: 0.7198042540550232
[Epoch 2] Val Loss: 0.6951276100476583
[Epoch 3] Train Loss: 0.6868773303031921
[Epoch 3] Val Loss: 0.6905115958849589
[Epoch 4] Train Loss: 0.6580080616209242
[Epoch 4] Val Loss: 0.699057316939036
[Epoch 5] Train Loss: 0.6290875034862095
[Epoch 5] Val Loss: 0.6917047092119852
[Epoch 6] Train Loss: 0.6017075389226277
[Epoch 6] Val Loss: 0.6940470565954844
[Epoch 7] Train Loss: 0.5759318087895712
[Epoch 7] Val Loss: 0.7051472538312277
[Epoch 8] Train Loss: 0.5507594144609239
[Epoch 8] Val Loss: 0.7293712181250255
[Epoch 9] Train Loss: 0.5260124085744222
[Epoch 9] Val Loss: 0.7501742374897004
[Epoch 10] Train Loss: 0.5027230452961392
[Epoch 10] Val Loss: 0.7736876887480418
0.6905115958849589
Test Loss: 0.6847688097953797
Test Accuracy: 0.6965


# RNN

In [None]:
class myRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1, dropout=0.2):
        super(myRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.layer1 = nn.RNN(input_size, hidden_size, n_layers, batch_first=True, nonlinearity='relu')
        self.layer2 = nn.Linear(hidden_size, output_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        x = x.view(batch_size, -1, self.input_size)
        x, hidden = self.layer1(x, hidden)
        x = self.dropout(x)
        x = self.layer2(x[:, -1, :])
        x = F.log_softmax(x, dim=1)
        return x

    def init_hidden(self, batch_size):
        # h0 is a zero tensor placed on the same device as the model
        return torch.zeros(self.n_layers, batch_size, self.hidden_size).to(device)

In [None]:
train_loader, val_loader, test_loader = gen_train_test_loader(sampled_df[["rnn_emb","label"]].sample(60000), batch_size=64)

In [None]:
rnn = myRNN(300, 20, 3, n_layers=1, dropout=0.2)
rnn.to(device)
train(rnn, train_loader, val_loader, test_loader, epoch_num=15, lr=0.001)

[Epoch 1] Train Loss: 1.0802663766013252
[Epoch 1] Val Loss: 0.9088628629048665
[Epoch 2] Train Loss: 0.8709351355234782
[Epoch 2] Val Loss: 0.8131041674613952
[Epoch 3] Train Loss: 0.7944715034696791
[Epoch 3] Val Loss: 0.773356876373291
[Epoch 4] Train Loss: 0.7610397662056817
[Epoch 4] Val Loss: 0.7353272897402445
[Epoch 5] Train Loss: 0.7351154816415575
[Epoch 5] Val Loss: 0.733051476319631
[Epoch 6] Train Loss: 0.7148104082213508
[Epoch 6] Val Loss: 0.6991502415339153
[Epoch 7] Train Loss: 0.6982838567097982
[Epoch 7] Val Loss: 0.6818477902412414
[Epoch 8] Train Loss: 0.6869051894611783
[Epoch 8] Val Loss: 0.6661720527013143
[Epoch 9] Train Loss: 0.6735935222307841
[Epoch 9] Val Loss: 0.6682776913642884
[Epoch 10] Train Loss: 0.6611515500280593
[Epoch 10] Val Loss: 0.6587955342928569
[Epoch 11] Train Loss: 0.6552951300409106
[Epoch 11] Val Loss: 0.6586754361788432
[Epoch 12] Train Loss: 0.6543201027446323
[Epoch 12] Val Loss: 0.6671794934272766
[Epoch 13] Train Loss: 0.64883193688

## GRU

In [None]:
class myGRU(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1, dropout=0.2):
        super(myGRU, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.layer1 = nn.GRU(input_size, hidden_size, n_layers, batch_first=True)
        self.layer2 = nn.Linear(hidden_size, output_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        x = x.view(batch_size, -1, self.input_size)
        x, hidden = self.layer1(x, hidden)
        x = self.dropout(x)
        #x = x.contiguous().view(-1, self.hidden_dim)
        x = self.layer2(x[:, -1, :])
        x = F.log_softmax(x, dim=1)
        return x

    def init_hidden(self, batch_size):
        # h0 is a zero tensor placed on the same device as the model
        return torch.zeros(self.n_layers, batch_size, self.hidden_size).to(device)

In [None]:
train_loader, val_loader, test_loader = gen_train_test_loader(sampled_df[["rnn_emb","label"]].sample(60000), batch_size=64)

In [None]:
rnn = myGRU(300, 20, 3, n_layers=1, dropout=0.3)
rnn.to(device)
train(rnn, train_loader, val_loader, test_loader, epoch_num=15, lr=0.001)

[Epoch 1] Train Loss: 0.9135241704516941
[Epoch 1] Val Loss: 0.7000537357330322
[Epoch 2] Train Loss: 0.6907125168906317
[Epoch 2] Val Loss: 0.6515696303049723
[Epoch 3] Train Loss: 0.6555210551685757
[Epoch 3] Val Loss: 0.6322451899846395
[Epoch 4] Train Loss: 0.6360423141055637
[Epoch 4] Val Loss: 0.6234472675323486
[Epoch 5] Train Loss: 0.6196180650658077
[Epoch 5] Val Loss: 0.6086734843254089
[Epoch 6] Train Loss: 0.6099090187284681
[Epoch 6] Val Loss: 0.6082295484542847
[Epoch 7] Train Loss: 0.6014884813096788
[Epoch 7] Val Loss: 0.599206537882487
[Epoch 8] Train Loss: 0.594681343237559
[Epoch 8] Val Loss: 0.5931967439651489
[Epoch 9] Train Loss: 0.5885585050582886
[Epoch 9] Val Loss: 0.5993342148462931
[Epoch 10] Train Loss: 0.5820469205644395
[Epoch 10] Val Loss: 0.590727757136027
[Epoch 11] Train Loss: 0.5759189331266615
[Epoch 11] Val Loss: 0.5888265194892883
[Epoch 12] Train Loss: 0.5716812987327575
[Epoch 12] Val Loss: 0.5919062320391337
[Epoch 13] Train Loss: 0.565135067356

## LSTM

In [None]:
class myLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1, dropout=0.2):
        super(myLSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.layer1 = nn.LSTM(input_size, hidden_size, n_layers, batch_first=True)
        self.layer2 = nn.Linear(hidden_size, output_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        x = x.view(batch_size, -1, self.input_size)
        x, hidden = self.layer1(x, hidden)
        x = self.dropout(x)
        x = self.layer2(x[:, -1, :])
        x = F.log_softmax(x, dim=1)
        return x

    def init_hidden(self, batch_size):
        h0 = torch.zeros((self.n_layers,batch_size,self.hidden_size)).to(device)
        c0 = torch.zeros((self.n_layers,batch_size,self.hidden_size)).to(device)
        hidden = (h0,c0)
        return hidden

In [None]:
train_loader, val_loader, test_loader = gen_train_test_loader(sampled_df[["rnn_emb","label"]].sample(60000), batch_size=64)

In [None]:
rnn = myLSTM(300, 20, 3, n_layers=1, dropout=0.2)
rnn.to(device)
train(rnn, train_loader, val_loader, test_loader, epoch_num=15, lr=0.001)



[Epoch 1] Train Loss: 0.8904835794766744
[Epoch 1] Val Loss: 0.7527339668273926
[Epoch 2] Train Loss: 0.7216494943300883
[Epoch 2] Val Loss: 0.7012647609710694
[Epoch 3] Train Loss: 0.6794473994572957
[Epoch 3] Val Loss: 0.6808966921170553
[Epoch 4] Train Loss: 0.6562810870276558
[Epoch 4] Val Loss: 0.6736346255938213
[Epoch 5] Train Loss: 0.6380674528545803
[Epoch 5] Val Loss: 0.6466740821202596
[Epoch 6] Train Loss: 0.6264161426756117
[Epoch 6] Val Loss: 0.6452412053743998
[Epoch 7] Train Loss: 0.6166972374386258
[Epoch 7] Val Loss: 0.6355683681170146
[Epoch 8] Train Loss: 0.6050843118561638
[Epoch 8] Val Loss: 0.6251179626782735
[Epoch 9] Train Loss: 0.6000131556193034
[Epoch 9] Val Loss: 0.6296871143976848
[Epoch 10] Train Loss: 0.5929494455125597
[Epoch 10] Val Loss: 0.628282518227895
[Epoch 11] Train Loss: 0.5872453300688002
[Epoch 11] Val Loss: 0.6162628966967265
[Epoch 12] Train Loss: 0.5822600102424622
[Epoch 12] Val Loss: 0.620447011311849
[Epoch 13] Train Loss: 0.57735133502