# Sentiment Analysis with an RNN

In this notebook, you'll implement a recurrent neural network that performs sentiment analysis. 
>Using an RNN rather than a strictly feedforward network can be more accurate, since the RNN can use information about the *sequence* of words. 

Here we'll use a dataset of Formspring question-and-answer posts annotated for bullying. We treat the annotations as binary sentiment labels: positive (bullying) or negative (not bullying).

<img src="assets/reviews_ex.png" width=40%>

### Network Architecture

The architecture for this network is shown below.

<img src="assets/network_diagram.png" width=40%>

>**First, we'll pass in words to an embedding layer.** We need an embedding layer because we have tens of thousands of words, so we'll need a more efficient representation for our input data than one-hot encoded vectors. You should have seen this before from the Word2Vec lesson. You could look up a different, pretrained embedding model and use those embeddings as input here. However, it's good enough to just have an embedding layer and let the network learn its own embedding table. **In this case, the embedding layer is for dimensionality reduction, rather than for learning semantic representations.**

>**After input words are passed to an embedding layer, the new embeddings will be passed to LSTM cells.** The LSTM cells will add *recurrent* connections to the network and give us the ability to include information about the *sequence* of words in the text data. 

>**Finally, the LSTM outputs will go to a sigmoid output layer.** We're using a sigmoid function because positive and negative are encoded as 1 and 0, respectively, and a sigmoid outputs predicted sentiment values between 0 and 1. 

We don't care about the sigmoid outputs except for the **very last one**; we can ignore the rest. We'll calculate the loss by comparing the output at the last time step and the training label (pos or neg).
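As a small illustration of keeping only the last output, the sketch below uses NumPy with made-up per-step sigmoid outputs for a batch of two sequences. The values and the names `step_outputs`, `last_output`, and `predicted` are hypothetical and are not taken from this notebook's model.

```python
import numpy as np

# Hypothetical per-time-step sigmoid outputs: 2 sequences, 4 time steps each
step_outputs = np.array([
    [0.51, 0.48, 0.60, 0.83],
    [0.50, 0.44, 0.31, 0.12],
])

# Keep only the prediction at the final time step of every sequence
last_output = step_outputs[:, -1]

# Threshold at 0.5 to turn probabilities into 0/1 class predictions
predicted = (last_output >= 0.5).astype(int)

print(last_output)
print(predicted)
```

Only `last_output` would be compared against the training labels; the earlier columns are discarded.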

---
### Load in and visualize the data

In [2]:
import numpy as np
import pandas as pd
import torch
import string
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from transformers import BertTokenizer, BertModel
import matplotlib.pyplot as plt
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
import re
from sklearn.metrics import fbeta_score
from IPython.display import Image
%matplotlib inline

In [3]:
# First checking if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [4]:
df = pd.read_csv("/kaggle/input/forms-spring/formspring.csv")

In [5]:
df.drop(['post', 'asker', 'bully1', 'bully2', 'bully3'], axis = 1, inplace = True)

In [6]:
def impute_ans_columns(value):
    v = ['No','nan']
    if value in v:
        return 0
    return 1

In [7]:
for col in ['ans1', 'ans2', 'ans3']:
    df[col] = df[col].apply(impute_ans_columns)
df.sample(10)

Unnamed: 0,userid,ques,ans,ans1,severity1,ans2,severity2,ans3,severity3
128,aprilpooh15,Well dats so tight ur friends wit @truace . Ho...,Ohhkayy! First offf i was playin i mean he i...,0,0.0,0,0,0,0
11959,outlaw9000,Imagine someone said to you u201cif you piss ...,go strait to my GF / BF and communicate what ...,0,0.0,0,0,1,2
9327,kellyblake1,If you add your username to your ignore list ...,Why would I read my own posts anyway? :P I po...,0,0.0,0,0,0,0
8346,eatthebaby,something about a sloth and an ipod! :O,Oshit. o: The answer is 3 ;D,0,0.0,0,0,0,0
12569,outlaw9000,What type of sexual performance do you preform...,Intercourse with women Oral with Men. I like ...,0,0.0,0,0,0,0
516,nicomm,alright cool :),:))),0,,0,0,0,0
3402,tabithalocascio,so i make you mad? well maybe you could find m...,sorry i think you left the book in the stands...,0,0.0,0,0,0,0
125,aprilpooh15,the girl on your background bottom right hand ...,:) Aww!! Yeahh she is my best friend * raphae...,0,0.0,0,0,0,0
5799,zooshay,how can ones mood change from happy to sad in ...,wish i new =( where are you??,0,0.0,0,0,0,0
9639,kellyblake1,Tell us all something ourageous you&#039;ve do...,Ermmmm waking up next to a bloke after takin...,0,0.0,0,0,0,0


In [8]:
def impute_severity_columns(value):
    '''Convert a raw severity entry to an int.'''
    v = ['nan', 'None', '0']
    if value in v:
        return 0
    try:
        return int(value)
    except ValueError:
        # Entries that cannot be parsed as an integer are assigned 5
        return 5

In [9]:
for col in ['severity1', 'severity2', 'severity3']:
    df[col] = df[col].apply(impute_severity_columns)

In [10]:
df['IsBully'] = (
    (df.ans1 * df.severity1 + df.ans2 * df.severity2 + df.ans3 * df.severity3) / 30) >= 0.0333

# Remove unnecessary columns
df_2 = df.drop(['userid','ans1', 'severity1','ans2','severity2','ans3','severity3'], axis = 1)


In [11]:
df_2.sample(10)

Unnamed: 0,ques,ans,IsBully
8634,Are you a heavy or light sleeper?,Definitely a light sleeper. I wake up at the ...,False
11579,have you ever sucked another mans dick?,yes In Jail,False
12528,What latest natural disaster occured nearest y...,nothing. Nada,False
4123,where do you live?,Billings? Hah.,True
12733,Who has let you down in the past month? What h...,no one. I have had a pretty good month,False
148,What was your favorite birthday gift?,Life(: haha!! umm idk thats hard!,False
2633,hahahahha pfffffffffffffftttt everyone knows t...,Haha duhh!,False
8217,ASHLEY your amazing from Gabe :),Awh :3 Thankies :D,False
10730,rofl xD r then let me suck it ?,wish you ;D,True
4968,to the pathetic fool that said this &quot;...t...,baby i love youuuu<33,True


In [12]:
for col in ['ques', 'ans']:
    df_2[col] = df_2[col].str.replace("&#039;", "'") # Put back the apostrophe

    df_2[col] = df_2[col].str.replace("<br>", "") 
    df_2[col] = df_2[col].str.replace("&quot;", "") 

In [13]:
df_2 = df_2.dropna(how='all')

In [14]:
df_2.head()

Unnamed: 0,ques,ans,IsBully
0,what's your favorite song? :D,I like too many songs to have a favorite,False
1,<3,</3 ? haha jk! <33,False
2,hey angel you duh sexy,Really?!?! Thanks?! haha,False
3,(:,;(,False
4,******************MEOWWW*************************,*RAWR*?,False


In [15]:
df_2['ques_ans'] = df_2['ques'] + ' ' + df_2['ans'] 

In [16]:
# Keep only the combined text and the label; .copy() avoids pandas' SettingWithCopyWarning
columns = ['ques_ans', 'IsBully']
df2_ordered = df_2[columns].copy()
df2_ordered['ques_ans'] = df2_ordered['ques_ans'].str.lower()
# Remove punctuation using regex (re.escape keeps characters such as ] and \ literal)
df2_ordered['ques_ans'] = df2_ordered['ques_ans'].str.replace(f'[{re.escape(string.punctuation)}]', '', regex=True)


In [17]:
df2_ordered = df2_ordered[df2_ordered['ques_ans'].notna()]  # Remove NaN values
df2_ordered = df2_ordered[df2_ordered['ques_ans'].str.strip() != '']  # Remove empty strings
df2_ordered.head()

Unnamed: 0,ques_ans,IsBully
0,whats your favorite song d i like too many so...,False
1,3 3 haha jk 33,False
2,hey angel you duh sexy really thanks haha,False
4,meowww rawr,False
5,any makeup tips i suck at doing my makeup lol ...,False


In [18]:
X = df2_ordered['ques_ans'].values
y = df2_ordered['IsBully'].values

In [19]:
X_str = ' '.join(map(str, X))

In [20]:
words = X_str.split()

In [21]:
words[:30]

['whats',
 'your',
 'favorite',
 'song',
 'd',
 'i',
 'like',
 'too',
 'many',
 'songs',
 'to',
 'have',
 'a',
 'favorite',
 '3',
 '3',
 'haha',
 'jk',
 '33',
 'hey',
 'angel',
 'you',
 'duh',
 'sexy',
 'really',
 'thanks',
 'haha',
 'meowww',
 'rawr',
 'any']

In [22]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# Initialize the tokenizer and fit it on the text data
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X)

# Convert the text to sequences of integers
sequences = tokenizer.texts_to_sequences(X)

# Pad (or truncate) sequences to a uniform input length; padding='post' appends zeros at the end
max_len = 200  # Maximum length of sequences
padded_sequences = pad_sequences(sequences, maxlen=max_len, padding='post')

# Vocabulary size
vocab_size = len(tokenizer.word_index) + 1  # +1 because the tokenizer index starts at 1



In [24]:
# Hold out 20% of the data as the test set (this first split is random, not stratified)
X_train, X_test, y_train, y_test = train_test_split(padded_sequences, y, test_size=0.2, random_state=42)

# Further split train into train and validation; stratify keeps the class proportions equal in both
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, stratify=y_train, random_state=42)


In [25]:
def create_attention_mask(X):
    # Convert NumPy array to a PyTorch tensor
    X_tensor = torch.tensor(X)
    
    # Create attention mask (non-zero tokens are 1, padding tokens are 0)
    return (X_tensor != 0).long()

In [26]:
from imblearn.under_sampling import RandomUnderSampler

# Assuming X_train is your feature matrix and y_train is your target (label)
undersampler = RandomUnderSampler(random_state=42)

# Perform undersampling
X_train_resampled, y_train_resampled = undersampler.fit_resample(X_train, y_train)
X_val_resampled, y_val_resampled = undersampler.fit_resample(X_val, y_val)
X_test_resampled, y_test_resampled = undersampler.fit_resample(X_test, y_test)

# Check the new class distribution after undersampling
print("Original class distribution:", pd.Series(y_train).value_counts())
print("Resampled class distribution:", pd.Series(y_train_resampled).value_counts())
print("Original validation class distribution:", pd.Series(y_val).value_counts())
print("Resampled validation class distribution:", pd.Series(y_val_resampled).value_counts())
print("Original test class distribution:", pd.Series(y_test).value_counts())
print("Resampled test class distribution:", pd.Series(y_test_resampled).value_counts())


Original class distribution: False    6535
True     1148
Name: count, dtype: int64
Resampled class distribution: False    1148
True     1148
Name: count, dtype: int64
Original validation class distribution: False    2179
True      382
Name: count, dtype: int64
Resampled validation class distribution: False    382
True     382
Name: count, dtype: int64
Original test class distribution: False    2190
True      371
Name: count, dtype: int64
Resampled test class distribution: False    371
True     371
Name: count, dtype: int64


In [27]:
from torch.utils.data import DataLoader, TensorDataset

# Create attention masks
train_attention_masks = create_attention_mask(X_train_resampled)
test_attention_masks = create_attention_mask(X_test_resampled)
val_attention_masks = create_attention_mask(X_val_resampled)

# Convert the data to PyTorch tensors
X_train_tensor = torch.tensor(X_train_resampled, dtype=torch.long)
y_train_tensor = torch.tensor(y_train_resampled, dtype=torch.float32)  # Assuming binary classification
X_test_tensor = torch.tensor(X_test_resampled, dtype=torch.long)
y_test_tensor = torch.tensor(y_test_resampled, dtype=torch.float32)
X_val_tensor = torch.tensor(X_val_resampled, dtype=torch.long)
y_val_tensor = torch.tensor(y_val_resampled, dtype=torch.float32)

# create_attention_mask already returns long tensors, so no further conversion is needed.
# Note: the masks are computed here but are not passed to the datasets below.

# Create TensorDataset (combines inputs and labels)
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)

# Create DataLoader for training, validation, and testing sets
batch_size = 24

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)




## Data pre-processing

The first step when building a neural network model is getting your data into the proper form to feed into the network. Since we're using embedding layers, we'll need to encode each word with an integer. We'll also want to clean it up a bit.

You can see examples of the data above. The main cleaning step we'll want to take is:
>* Get rid of periods and extraneous punctuation.

Punctuation was already removed above, when the combined text was lowercased and stripped of `string.punctuation`.

### Encoding the words

The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our reviews into integers so they can be passed into the network.

> **Problem to solve:** Now you're going to encode the words with integers. Build a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers **start at 1, not 0**.
> Also, convert the reviews to integers and store the reviews in a new list called `reviews_ints`. 

In [28]:
# feel free to use this import 
from collections import Counter

## Build a dictionary that maps words to integers
counts = Counter(words)
vocab = sorted(counts, key=counts.get, reverse=True)
vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}

## use the dict to tokenize each text in X
## store the tokenized texts in reviews_ints, one list of integers per text
reviews_ints = [[vocab_to_int[word] for word in str(text).split()] for text in X]

**Test your code**

As a test that you've implemented the dictionary correctly, print out the number of unique words in your vocabulary and the contents of the first tokenized review.

In [29]:
# stats about vocabulary
print('Unique words: ', len((vocab_to_int)))  # about 20,000 for this dataset
print()

# print tokens in first review
print('Tokenized review: \n', reviews_ints[:1])

Unique words:  20636

Tokenized review: 
 [[102, 9, 123, 209, 42, 2, 19, 79, 126, 867, 3, 13, 5, 123]]

### Encoding the labels

Our labels are stored as booleans in `IsBully`: `True` (positive) or `False` (negative). To use these labels in our network, we need to convert them to 1 and 0.

> **In this step:** Convert labels from `True` and `False` to 1 and 0, respectively, and place those in a new array, `encoded_labels`.

In [30]:
# 1 = True (positive), 0 = False (negative) label conversion
encoded_labels = np.array([1 if label else 0 for label in y])

Okay, a couple of issues here. We seem to have one review with zero length, and the maximum review length is far too many steps for our RNN. We'll have to remove any zero-length reviews and truncate very long ones. This removes outliers and should allow our model to train more efficiently.

> **Next problem to solve:** First, remove *any* reviews with zero length from the `reviews_ints` list and their corresponding label in `encoded_labels`.

---
## Padding sequences

To deal with both short and very long reviews, we'll pad or truncate all our reviews to a specific length. For reviews shorter than some `seq_length`, we'll pad with 0s. For reviews longer than `seq_length`, we can truncate them to the first `seq_length` words. A good `seq_length`, in this case, is 200.

> Define a function that returns an array `features` that contains the padded data, of a standard size, that we'll pass to the network. 
* The data should come from `reviews_ints`, since we want to feed integers to the network. 
* Each row should be `seq_length` elements long. 
* For reviews shorter than `seq_length` words, **left pad** with 0s. That is, if the review is `['best', 'movie', 'ever']`, `[117, 18, 128]` as integers, the row will look like `[0, 0, 0, ..., 0, 117, 18, 128]`. 
* For reviews longer than `seq_length`, use only the first `seq_length` words as the feature vector.

As a small example, if the `seq_length=10` and an input review is: 
```
[117, 18, 128]
```
The resultant, padded sequence should be: 

```
[0, 0, 0, 0, 0, 0, 0, 117, 18, 128]
```

**Your final `features` array should be a 2D array, with as many rows as there are reviews, and as many columns as the specified `seq_length`.**

This isn't trivial and there are a bunch of ways to do this. But, if you're going to be building your own deep learning networks, you're going to have to get used to preparing your data.
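One possible way to write such a function is sketched below with NumPy. The name `pad_features` and the choice to leave empty reviews as all-zero rows are assumptions of this sketch. Note that the Keras `pad_sequences(..., padding='post')` call used earlier in this notebook pads on the right instead, so the two approaches produce different rows.

```python
import numpy as np

def pad_features(reviews_ints, seq_length):
    """Return a 2D array where each row is a review left-padded with 0s
    or truncated to its first `seq_length` tokens."""
    features = np.zeros((len(reviews_ints), seq_length), dtype=int)
    for i, review in enumerate(reviews_ints):
        trimmed = review[:seq_length]
        if trimmed:  # empty reviews stay as all-zero rows
            features[i, -len(trimmed):] = trimmed
    return features

# One short review (gets left-padded) and one long review (gets truncated)
example = pad_features([[117, 18, 128], list(range(1, 13))], seq_length=10)
print(example)
```

Left padding keeps the real words next to the final time step, which matters here because only the last output is used for the prediction.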

## Training, Validation, Test

With our data in nice shape, we'll split it into training, validation, and test sets.

> **Now:** Create the training, validation, and test sets. 
* You'll need to create sets for the features and the labels, `train_x` and `train_y`, for example. 
* Define a split fraction, `split_frac` as the fraction of data to **keep** in the training set. Usually this is set to 0.8 or 0.9. 
* Whatever data is left will be split in half to create the validation and *testing* data.
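The steps above can be sketched as follows. The helper name `split_data` and the toy arrays are hypothetical; this version slices the arrays in order without shuffling, unlike the `train_test_split` calls used earlier in this notebook.

```python
import numpy as np

def split_data(features, labels, split_frac=0.8):
    """Split into train/validation/test sets; the data left over after
    the training split is halved between validation and test."""
    split_idx = int(len(features) * split_frac)
    train_x, remaining_x = features[:split_idx], features[split_idx:]
    train_y, remaining_y = labels[:split_idx], labels[split_idx:]

    half = len(remaining_x) // 2
    val_x, test_x = remaining_x[:half], remaining_x[half:]
    val_y, test_y = remaining_y[:half], remaining_y[half:]
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)

# Toy data: 20 examples with 2 features each
features = np.arange(40).reshape(20, 2)
labels = np.arange(20) % 2
(train_x, train_y), (val_x, val_y), (test_x, test_y) = split_data(features, labels)
print(train_x.shape, val_x.shape, test_x.shape)
```

If the rows are ordered in some meaningful way, shuffling them (with the labels) before slicing is usually advisable.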

---
## DataLoaders and Batching

After creating training, test, and validation data, we can create DataLoaders for this data by following two steps:
1. Create a known format for accessing our data, using [TensorDataset](https://pytorch.org/docs/stable/data.html#) which takes in an input set of data and a target set of data with the same first dimension, and creates a dataset.
2. Create DataLoaders and batch our training, validation, and test Tensor datasets.

```
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
train_loader = DataLoader(train_data, batch_size=batch_size)
```

This is an alternative to creating a generator function for batching our data into full batches.

---
# Sentiment Network with PyTorch

Below is where you'll define the network.

<img src="assets/network_diagram.png" width=40%>

The layers are as follows:
1. An [embedding layer](https://pytorch.org/docs/stable/nn.html#embedding) that converts our word tokens (integers) into embeddings of a specific size.
2. An [LSTM layer](https://pytorch.org/docs/stable/nn.html#lstm) defined by a hidden_state size and number of layers
3. A fully-connected output layer that maps the LSTM layer outputs to a desired output_size
4. A sigmoid activation layer which turns all outputs into a value 0-1; return **only the last sigmoid output** as the output of this network.

### The Embedding Layer

We need to add an [embedding layer](https://pytorch.org/docs/stable/nn.html#embedding) because there are tens of thousands of words in our vocabulary (about 20,000 for this dataset). It is massively inefficient to one-hot encode that many classes. So, instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using Word2Vec, then load it here. But it's fine to just make a new layer, use it only for dimensionality reduction, and let the network learn the weights.
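To make the "lookup table" idea concrete, here is a minimal NumPy sketch with toy sizes (not the notebook's). It shows that indexing rows of an embedding table gives the same result as multiplying one-hot vectors by that table, without ever building the one-hot vectors. The names `embedding_table`, `tokens`, `looked_up`, and `via_matmul` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 6, 3  # toy sizes for illustration
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

tokens = np.array([2, 5, 2])  # integer-encoded words

# Lookup: index rows of the table directly
looked_up = embedding_table[tokens]

# Equivalent one-hot route: build one-hot rows, then matrix-multiply
one_hot = np.eye(vocab_size)[tokens]
via_matmul = one_hot @ embedding_table

print(looked_up.shape)
print(np.allclose(looked_up, via_matmul))
```

In the network, the table's entries are ordinary weights that are updated during training.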


### The LSTM Layer(s)

We'll create an [LSTM](https://pytorch.org/docs/stable/nn.html#lstm) to use in our recurrent network, which takes in an input_size, a hidden_dim, a number of layers, a dropout probability (for dropout between multiple layers), and a batch_first parameter.

Most of the time, your network will perform better with more than one layer, typically 2-3. Adding more layers allows the network to learn more complex relationships. 

> **Here implement:** Complete the `__init__`, `forward`, and `init_hidden` functions for the SentimentRNN model class.

Note: `init_hidden` should initialize the hidden and cell states of an LSTM layer to all zeros, and move those states to GPU, if available.

In [31]:
# First checking if GPU is available
train_on_gpu=torch.cuda.is_available()

if(train_on_gpu):
    print('Training on GPU.')
else:
    print('No GPU available, training on CPU.')

No GPU available, training on CPU.


In [32]:
import torch.nn as nn

class SentimentRNN(nn.Module):
    """
    The RNN model that will be used to perform Sentiment analysis.
    """

    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.2):
        """
        Initialize the model by setting up the layers.
        """
        super(SentimentRNN, self).__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # embedding and LSTM layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, 
                            dropout=drop_prob, batch_first=True)
        
        # dropout layer
        self.dropout = nn.Dropout(drop_prob)
        
        # linear and sigmoid layers
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sig = nn.Sigmoid()
        

    def forward(self, x, hidden):
        """
        Perform a forward pass of our model on some input and hidden state.
        """
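        batch_size = x.size(0)  # Dynamically get the batch size from the input tensor
        # Start every batch from a fresh zero state sized to the actual batch, so a
        # smaller final batch still works. This overrides the `hidden` argument.
        hidden = (torch.zeros(self.n_layers, batch_size, self.hidden_dim, device=x.device),
                  torch.zeros(self.n_layers, batch_size, self.hidden_dim, device=x.device))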
        # embeddings and lstm_out
        x = x.long()
        embeds = self.embedding(x)
        
        lstm_out, hidden = self.lstm(embeds, hidden)
    
        # stack up lstm outputs
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
        
        # dropout and fully-connected layer
        out = self.dropout(lstm_out)
        out = self.fc(out)
        # sigmoid function
        sig_out = self.sig(out)
        
        # reshape to be batch_size first
        sig_out = sig_out.view(batch_size, -1)
        sig_out = sig_out[:, -1] # keep only the output at the last time step
        
        # return last sigmoid output and hidden state
        return sig_out, hidden
    
    
    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden
        

## Instantiate the network

Here, we'll instantiate the network. First up, defining the hyperparameters.

* `vocab_size`: Size of our vocabulary or the range of values for our input, word tokens.
* `output_size`: Size of our desired output; the number of class scores we want to output (pos/neg).
* `embedding_dim`: Number of columns in the embedding lookup table; size of our embeddings.
* `hidden_dim`: Number of units in the hidden layers of our LSTM cells. Usually larger is better performance wise. Common values are 128, 256, 512, etc.
* `n_layers`: Number of LSTM layers in the network. Typically between 1-3

> Define the model  hyperparameters.


In [33]:
# Instantiate the model w/ hyperparams
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
embedding_dim = 400
hidden_dim = 256
n_layers = 2

net = SentimentRNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)

print(net)

SentimentRNN(
  (embedding): Embedding(20637, 400)
  (lstm): LSTM(400, 256, num_layers=2, batch_first=True, dropout=0.2)
  (dropout): Dropout(p=0.2, inplace=False)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)
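As a rough check on model size, the sketch below counts the trainable parameters implied by the printed layer shapes. It assumes PyTorch's LSTM layout of four gates per layer, each with input weights, recurrent weights, and two bias vectors; the helper name `lstm_layer_params` is hypothetical.

```python
def lstm_layer_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights, and two bias vectors
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)

# Sizes taken from the printed model above
vocab_size, embedding_dim, hidden_dim, output_size = 20637, 400, 256, 1

embedding_params = vocab_size * embedding_dim
lstm_params = (lstm_layer_params(embedding_dim, hidden_dim)
               + lstm_layer_params(hidden_dim, hidden_dim))  # two stacked layers
fc_params = hidden_dim * output_size + output_size  # weights plus bias
total_params = embedding_params + lstm_params + fc_params

print(embedding_params, lstm_params, fc_params, total_params)
```

The embedding table accounts for most of the parameters. The total can be compared against `sum(p.numel() for p in net.parameters())` on the instantiated model.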


---
## Training

Below is typical training code. If you'd rather write the training loop yourself, feel free to delete this code and implement your own. You can also add code to save a model by name.

>We'll also be using a new kind of cross entropy loss, which is designed to work with a single Sigmoid output. [BCELoss](https://pytorch.org/docs/stable/nn.html#bceloss), or **Binary Cross Entropy Loss**, applies cross entropy loss to a single value between 0 and 1.
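To show what binary cross entropy computes, here is a plain-Python sketch of the mean loss over a few predictions. The function name `binary_cross_entropy` and the example values are illustrative; in the notebook the loss is computed by `nn.BCELoss`.

```python
import math

def binary_cross_entropy(predictions, targets):
    """Mean binary cross entropy for probabilities strictly between 0 and 1."""
    losses = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for p, t in zip(predictions, targets)
    ]
    return sum(losses) / len(losses)

# Confident, correct predictions give a low loss
confident = binary_cross_entropy([0.9, 0.1], [1, 0])
# Always predicting 0.5 gives ln(2), regardless of the labels
uninformed = binary_cross_entropy([0.5, 0.5], [1, 0])

print(round(confident, 4), round(uninformed, 4))
```

A model that always outputs 0.5 scores ln 2 ≈ 0.693, which is a useful baseline when reading loss values during training.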

We also have some data and training hyperparameters:

* `lr`: Learning rate for our optimizer.
* `epochs`: Number of times to iterate through the training dataset.
* `clip`: The maximum gradient norm to clip at (to prevent exploding gradients).
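The idea behind clipping by norm can be sketched in plain Python: if the combined L2 norm of all gradients exceeds the limit, every gradient is scaled down by the same factor. This is a simplified illustration of what `nn.utils.clip_grad_norm_` does, not its exact implementation; `clip_by_global_norm` is a hypothetical helper.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

# Gradients with norm 50 are rescaled so that the norm becomes 5
clipped, norm_before = clip_by_global_norm([30.0, 40.0], max_norm=5)
print(norm_before, clipped)
```

Because all gradients are scaled together, the direction of the update is preserved and only its size is limited.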

In [34]:
# loss and optimization functions
lr=0.001

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)


In [35]:
# training params

epochs = 4 # 3-4 is approx where I noticed the validation loss stop decreasing

counter = 0
print_every = 100
clip=5 # gradient clipping

# move model to GPU, if available
if(train_on_gpu):
    net.cuda()

net.train()
# train for some number of epochs
for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        counter += 1

        if(train_on_gpu):
            inputs, labels = inputs.cuda(), labels.cuda()

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        net.zero_grad()

        # get the output from the model
        output, h = net(inputs, h)

        # calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()

        # loss stats
        if counter % print_every == 0:
            # Get validation loss
            val_h = net.init_hidden(batch_size)
            val_losses = []
            net.eval()
            with torch.no_grad():  # gradients are not needed for validation
                for inputs, labels in valid_loader:

                    # Creating new variables for the hidden state, otherwise
                    # we'd backprop through the entire training history
                    val_h = tuple([each.data for each in val_h])

                    if(train_on_gpu):
                        inputs, labels = inputs.cuda(), labels.cuda()

                    output, val_h = net(inputs, val_h)
                    val_loss = criterion(output.squeeze(), labels.float())

                    val_losses.append(val_loss.item())

            net.train()
            print("Epoch: {}/{}...".format(e+1, epochs),
                  "Step: {}...".format(counter),
                  "Loss: {:.6f}...".format(loss.item()),
                  "Val Loss: {:.6f}".format(np.mean(val_losses)))

Epoch: 2/4... Step: 100... Loss: 0.670029... Val Loss: 0.696938
Epoch: 3/4... Step: 200... Loss: 0.693244... Val Loss: 0.693148
Epoch: 4/4... Step: 300... Loss: 0.693403... Val Loss: 0.693203


---
## Testing

There are a few ways to test your network.

* **Test data performance:** First, we'll see how our trained model performs on all of our defined test_data, above. We'll calculate the average loss and accuracy over the test data.

* **Inference on user-generated data:** Second, we'll see if we can input just one example review at a time (without a label), and see what the trained model predicts. Looking at new, user input data like this, and predicting an output label, is called **inference**.

In [36]:
# Get test data loss and accuracy

test_losses = [] # track loss
num_correct = 0

# init hidden state
h = net.init_hidden(batch_size)

net.eval()
# iterate over test data
for inputs, labels in test_loader:

    # Creating new variables for the hidden state, otherwise
    # we'd backprop through the entire training history
    h = tuple([each.data for each in h])

    if(train_on_gpu):
        inputs, labels = inputs.cuda(), labels.cuda()
    
    # get predicted outputs
    output, h = net(inputs, h)
    
    # calculate loss
    test_loss = criterion(output.squeeze(), labels.float())
    test_losses.append(test_loss.item())
    
    # convert output probabilities to predicted class (0 or 1)
    pred = torch.round(output.squeeze())  # rounds to the nearest integer
    
    # compare predictions to true label
    correct_tensor = pred.eq(labels.float().view_as(pred))
    correct = np.squeeze(correct_tensor.numpy()) if not train_on_gpu else np.squeeze(correct_tensor.cpu().numpy())
    num_correct += np.sum(correct)


# -- stats! -- ##
# avg test loss
print("Test loss: {:.3f}".format(np.mean(test_losses)))

# accuracy over all test data
test_acc = num_correct/len(test_loader.dataset)
print("Test accuracy: {:.3f}".format(test_acc))

Test loss: 0.693
Test accuracy: 0.500
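
For the second approach listed above, inference on a single piece of text, the input has to go through the same preprocessing as the training data. The sketch below shows only that preprocessing step in plain Python. The helper `preprocess_text`, the toy vocabulary, and the choice to skip unknown words are assumptions of this sketch; the model trained above was fed indices from the Keras tokenizer with right padding, so real inference should reuse that tokenizer and padding.

```python
import string

def preprocess_text(text, vocab_to_int, seq_length):
    """Lowercase, strip punctuation, map words to integers, and left-pad with 0s.
    Words missing from the vocabulary are skipped (an assumption of this sketch)."""
    cleaned = text.lower().translate(str.maketrans('', '', string.punctuation))
    tokens = [vocab_to_int[w] for w in cleaned.split() if w in vocab_to_int]
    tokens = tokens[:seq_length]
    return [0] * (seq_length - len(tokens)) + tokens

# Toy vocabulary for illustration only
toy_vocab = {'you': 1, 'are': 2, 'so': 3, 'nice': 4}
encoded = preprocess_text("You are SO nice!!", toy_vocab, seq_length=6)
print(encoded)
```

The encoded list could then be wrapped in a tensor of shape `(1, seq_length)` and passed to the network in evaluation mode, with an output above 0.5 read as the positive class.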
