* I am writing this kernel to learn how to build an LSTM with PyTorch.
* I chose this dataset because it is small and easy to use.
* I'll add comments to make the code easier to read and understand.


**Thanks for everything:
https://www.analyticsvidhya.com/blog/2020/01/first-text-classification-in-pytorch/**

In [209]:
import torch
import pandas as pd

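# NLP library of PyTorch. Field, TabularDataset and BucketIterator belong to the
# legacy API: torchtext 0.9-0.11 moved them to torchtext.legacy, and 0.12 removed them.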
from torchtext import data

import warnings as wrn
wrn.filterwarnings('ignore')

In [210]:
SEED = 2021

torch.manual_seed(SEED)
# the determinism flag lives on the cuDNN backend, not on torch.backends.cuda
torch.backends.cudnn.deterministic = True

# Data Preprocessing
PyTorch offers a good way of preprocessing text data: **torchtext**. Although its API is not fully stable and can feel hard for newcomers at first, it has nice features and is convenient once you know the basics.



In [211]:
data_ = pd.read_csv('../input/email-spam-ham-prediction/sms_spam.csv')
data_.head()

Unnamed: 0,type,text
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."


In [212]:
data_.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5574 entries, 0 to 5573
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   type    5574 non-null   object
 1   text    5574 non-null   object
dtypes: object(2)
memory usage: 87.2+ KB


In [213]:
# Field describes how a regular column (here the message text) is processed.
# LabelField does the same for the label column.

TEXT = data.Field(tokenize='spacy',batch_first=True,include_lengths=True)
LABEL = data.LabelField(dtype = torch.float,batch_first=True)

In [214]:
fields = [("type",LABEL),('text',TEXT)]

In [215]:
training_data = data.TabularDataset(path="../input/email-spam-ham-prediction/sms_spam.csv",
                                    format="csv",
                                    fields=fields,
                                    skip_header=True
                                   )

print(vars(training_data.examples[0]))

{'type': 'ham', 'text': ['Go', 'until', 'jurong', 'point', ',', 'crazy', '..', 'Available', 'only', 'in', 'bugis', 'n', 'great', 'world', 'la', 'e', 'buffet', '...', 'Cine', 'there', 'got', 'amore', 'wat', '...']}


In [216]:
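import random
# train and validation splitting
# random.seed() returns None, so seed first and pass the generator state explicitly
random.seed(SEED)
train_data,valid_data = training_data.split(split_ratio=0.75,
                                            random_state=random.getstate())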
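
To show what a reproducible split boils down to, here's a minimal pure-Python sketch. It is not torchtext's implementation: the helper name `split_sketch` and the use of plain integers as stand-ins for examples are my own assumptions for illustration.

```python
import random

def split_sketch(examples, split_ratio=0.75, seed=2021):
    """Shuffle with a dedicated seeded generator, then cut at split_ratio."""
    rng = random.Random(seed)          # private generator, independent of global state
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(round(split_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset: 100 examples represented by their indices
train_part, valid_part = split_sketch(range(100))
train_again, valid_again = split_sketch(range(100))
```

Because the generator is seeded, calling the function twice gives the same split, which is the property we want from `random_state`.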


In [217]:
# Building vocabularies => (Token to integer)
TEXT.build_vocab(train_data,
                 min_freq=5)

LABEL.build_vocab(train_data)
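
To make the `min_freq=5` argument concrete, here's a small pure-Python sketch of building a vocabulary with a frequency cutoff. It is not torchtext's code: `build_vocab_sketch`, `numericalize` and the toy sentences are made up for illustration. It assumes that `<unk>` and `<pad>` take the first two indices, which is what a default `Field` does.

```python
from collections import Counter

def build_vocab_sketch(tokenized_texts, min_freq=1, specials=("<unk>", "<pad>")):
    """Map tokens to integer ids, dropping tokens rarer than min_freq."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Special tokens come first, then tokens by descending frequency (ties alphabetical)
    kept = sorted((t for t, c in counts.items() if c >= min_freq),
                  key=lambda t: (-counts[t], t))
    itos = list(specials) + kept
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return stoi, itos

def numericalize(tokens, stoi):
    """Replace each token by its id; unseen or rare tokens map to <unk>."""
    return [stoi.get(tok, stoi["<unk>"]) for tok in tokens]

# Toy corpus (hypothetical), with a cutoff of 2 instead of 5
texts = [["free", "entry", "now"], ["call", "now"], ["free", "call", "now"]]
stoi, itos = build_vocab_sketch(texts, min_freq=2)
ids = numericalize(["free", "prize", "now"], stoi)
```

Here "entry" appears only once, so it is dropped, and both "entry" and the unseen word "prize" end up as `<unk>`. That is why the vocabulary size depends so much on `min_freq`.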

In [218]:
print("Size of text vocab:",len(TEXT.vocab))

Size of text vocab: 1748


In [219]:
print("Size of label vocab:",len(LABEL.vocab))

Size of label vocab: 2


In [220]:
TEXT.vocab.freqs.most_common(10)

[('.', 3661),
 ('to', 1615),
 ('I', 1478),
 (',', 1461),
 ('you', 1383),
 ('?', 1086),
 ('!', 1019),
 ('a', 1003),
 ('the', 882),
 ('...', 869)]

In [221]:
# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

BATCH_SIZE = 64

# We'll create iterators to get batches of data when we want to use them
"""
BucketIterator groups samples of similar length into the same batch, which
reduces the number of padding tokens each batch needs.

"""
train_iterator,validation_iterator = data.BucketIterator.splits(
    (train_data,valid_data),
    batch_size = BATCH_SIZE,
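    # sort_key defines the value used to group and sort the samples
    sort_key = lambda x:len(x.text),
    # pack_padded_sequence expects each batch sorted by length, longest first
    sort_within_batch = True,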
    device = device
)
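
To see why batching by length saves padding, here's a small pure-Python sketch. It is a simplification, not how `BucketIterator` is implemented (the real iterator also shuffles, and it doesn't sort the whole dataset at once). The random lengths and the helper names `pad_count` and `make_batches` are assumptions made for the example.

```python
import random

def pad_count(batches):
    """Total padding tokens needed to pad every batch to its longest sample."""
    return sum(max(batch) * len(batch) - sum(batch) for batch in batches)

def make_batches(lengths, batch_size):
    """Cut a list of sample lengths into consecutive batches."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]

rng = random.Random(0)
# Hypothetical token counts for 256 messages
lengths = [rng.randint(3, 60) for _ in range(256)]

random_batches = make_batches(lengths, 64)            # arbitrary order
bucketed_batches = make_batches(sorted(lengths), 64)  # similar lengths together

padding_random = pad_count(random_batches)
padding_bucketed = pad_count(bucketed_batches)
```

With equally sized batches, sorting by length never needs more padding than an arbitrary order, so `padding_bucketed` is at most `padding_random`.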

# RNN Network
Now we'll use PyTorch to build an LSTM network that classifies SMS messages as spam or not spam.

In [222]:
# PyTorch's nn module provides the layers we need
import torch.nn as nn

class LSTMNet(nn.Module):
    
    def __init__(self,vocab_size,embedding_dim,hidden_dim,output_dim,n_layers,bidirectional,dropout):
        
        super(LSTMNet,self).__init__()
        
        # Embedding layer converts integer sequences to vector sequences
        self.embedding = nn.Embedding(vocab_size,embedding_dim)
        
        # LSTM layer processes the vector sequences
        self.lstm = nn.LSTM(embedding_dim,
                            hidden_dim,
                            num_layers = n_layers,
                            bidirectional = bidirectional,
                            dropout = dropout,
                            batch_first = True
                           )
        
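        # Dense layer to predict. Its input is hidden_dim * 2 because the final forward
        # and backward hidden states are concatenated (this assumes bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2,output_dim)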
        # Prediction activation function
        self.sigmoid = nn.Sigmoid()
        
    
    def forward(self,text,text_lengths):
        embedded = self.embedding(text)
        
        # Thanks to packing, the LSTM doesn't see the padding tokens,
        # so padding does not influence the final hidden states
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.cpu(),batch_first=True)
        
        packed_output,(hidden_state,cell_state) = self.lstm(packed_embedded)
        
        # Concatenating the final forward and backward hidden states
        hidden = torch.cat((hidden_state[-2,:,:], hidden_state[-1,:,:]), dim = 1)
        
        dense_outputs=self.fc(hidden)

        #Final activation function
        outputs=self.sigmoid(dense_outputs)
        
        return outputs
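
The indexing `hidden_state[-2]` and `hidden_state[-1]` is easier to follow with the shapes written out. Here's a small shape-arithmetic sketch in plain Python (no tensors involved). The helper names are made up; the numbers are the ones used in this notebook (2 layers, bidirectional, hidden size 64, batch size 64).

```python
def lstm_hidden_state_shape(num_layers, bidirectional, batch_size, hidden_dim):
    """Shape of the final hidden state h_n returned by nn.LSTM."""
    num_directions = 2 if bidirectional else 1
    return (num_layers * num_directions, batch_size, hidden_dim)

def classifier_input_features(bidirectional, hidden_dim):
    """Width of the vector fed to the final linear layer."""
    return hidden_dim * (2 if bidirectional else 1)

h_n_shape = lstm_hidden_state_shape(num_layers=2, bidirectional=True,
                                    batch_size=64, hidden_dim=64)
fc_in = classifier_input_features(bidirectional=True, hidden_dim=64)

# Row order along the first axis of h_n for this configuration
row_order = ["layer 0 forward", "layer 0 backward",
             "layer 1 forward", "layer 1 backward"]
last_two = row_order[-2:]
```

So the last two rows are the forward and backward states of the top layer, and concatenating them gives a vector of width 128 per sample.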
    

* Our model class is ready; let's declare the hyperparameters.

In [223]:
SIZE_OF_VOCAB = len(TEXT.vocab)
EMBEDDING_DIM = 100
NUM_HIDDEN_NODES = 64
NUM_OUTPUT_NODES = 1
NUM_LAYERS = 2
BIDIRECTION = True
DROPOUT = 0.2

# Training
Now let's create our model instance, optimizer and loss function

In [224]:
model = LSTMNet(SIZE_OF_VOCAB,
                EMBEDDING_DIM,
                NUM_HIDDEN_NODES,
                NUM_OUTPUT_NODES,
                NUM_LAYERS,
                BIDIRECTION,
                DROPOUT
               )

In [225]:
import torch.optim as optim
model = model.to(device)
optimizer = optim.Adam(model.parameters(),lr=1e-4)
criterion = nn.BCELoss()
criterion = criterion.to(device)
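
To get a feeling for the numbers the loss produces, here's a pure-Python sketch of mean binary cross-entropy, the quantity `nn.BCELoss` computes by default. The function name `bce_loss` and the sample values are made up, and unlike PyTorch this sketch does not clamp the logarithm, so the probabilities must lie strictly between 0 and 1.

```python
import math

def bce_loss(probs, targets):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

uninformed = bce_loss([0.5, 0.5], [1.0, 0.0])        # always predicts 0.5
confident_right = bce_loss([0.99, 0.01], [1.0, 0.0])
confident_wrong = bce_loss([0.01, 0.99], [1.0, 0.0])
```

A model that always outputs 0.5 scores ln(2), about 0.693, which is a handy reference point when reading the losses of the first epochs.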

In [226]:
model

LSTMNet(
  (embedding): Embedding(1748, 100)
  (lstm): LSTM(100, 64, num_layers=2, batch_first=True, dropout=0.2, bidirectional=True)
  (fc): Linear(in_features=128, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
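
As a sanity check on the printed architecture, here's a sketch that counts the trainable parameters by hand. The helper `lstm_param_count` is my own; it assumes the standard `nn.LSTM` layout, where each direction of each layer has input weights, recurrent weights and two bias vectors for its 4 gates.

```python
def lstm_param_count(input_dim, hidden_dim, num_layers, bidirectional):
    """Number of weights and biases in a (possibly bidirectional) stacked LSTM."""
    num_directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers above the first receive the concatenated outputs of both directions
        layer_input = input_dim if layer == 0 else hidden_dim * num_directions
        # 4 gates: input weights + recurrent weights + two bias vectors
        per_direction = 4 * hidden_dim * (layer_input + hidden_dim) + 2 * 4 * hidden_dim
        total += per_direction * num_directions
    return total

embedding_params = 1748 * 100                     # Embedding(1748, 100)
lstm_params = lstm_param_count(100, 64, 2, True)  # LSTM(100, 64, num_layers=2, bidirectional)
fc_params = 128 * 1 + 1                           # Linear(128, 1) with bias
total_params = embedding_params + lstm_params + fc_params
```

Almost half of the parameters sit in the embedding table, so the vocabulary size has a big effect on model size.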

In [227]:
# We'll use this helper to compute accuracy
def binary_accuracy(preds, y):
    #round predictions to the closest integer
    rounded_preds = torch.round(preds)
    
    correct = (rounded_preds == y).float() 
    acc = correct.sum() / len(correct)
    return acc
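
Here's the same accuracy idea as a pure-Python sketch, with a second call that shows why accuracy alone can mislead on imbalanced data. The function name and the toy labels are made up for illustration.

```python
def binary_accuracy_sketch(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the 0/1 label."""
    predicted = [1.0 if p > threshold else 0.0 for p in probs]
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)

# Three of four predictions land on the right side of 0.5
acc = binary_accuracy_sketch([0.9, 0.2, 0.6, 0.4], [1.0, 0.0, 0.0, 0.0])

# Hypothetical imbalanced batch: 1 spam among 8 messages, model always says "ham"
always_ham = binary_accuracy_sketch([0.1] * 8, [1.0] + [0.0] * 7)
```

A model that never predicts spam still scores 87.5% on that batch, so an accuracy that sits at a constant value for several epochs may simply mean the model is predicting one class.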

In [228]:
def train(model,iterator,optimizer,criterion):
    
    epoch_loss = 0.0
    epoch_acc = 0.0
    
    model.train()
    
    for batch in iterator:
        
        # resetting the gradients accumulated from the previous batch
        optimizer.zero_grad()
        
        text,text_lengths = batch.text
        
        # forward propagation; squeeze(1) turns (batch, 1) into (batch,)
        # and stays correct even when a batch holds a single sample
        predictions = model(text,text_lengths).squeeze(1)
        
        # computing loss / backward propagation
        loss = criterion(predictions,batch.type)
        loss.backward()
        
        # accuracy
        acc = binary_accuracy(predictions,batch.type)
        
        # updating params
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    # Return the mean loss and mean accuracy over all batches
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
        

* We also need a function to evaluate the model

In [229]:
def evaluate(model,iterator,criterion):
    
    epoch_loss = 0.0
    epoch_acc = 0.0
    
    # switch to evaluation mode (disables dropout)
    model.eval()
    
    # disable gradient tracking during evaluation
    with torch.no_grad():
        for batch in iterator:
            text,text_lengths = batch.text
            
            predictions = model(text,text_lengths).squeeze(1)
              
            #compute loss and accuracy
            loss = criterion(predictions, batch.type)
            acc = binary_accuracy(predictions, batch.type)
            
            #keep track of loss and accuracy
            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
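
One detail about dividing by `len(iterator)`: it averages the per-batch values, so a small last batch counts as much as a full one. Here's a pure-Python sketch of the difference. The batch accuracies and sizes are invented for the example, and the helper names are my own.

```python
def mean_of_batch_means(batch_accs):
    """What dividing the summed batch accuracies by the number of batches gives."""
    return sum(batch_accs) / len(batch_accs)

def sample_weighted_mean(batch_accs, batch_sizes):
    """Accuracy over all samples, weighting each batch by its size."""
    total_correct = sum(a * n for a, n in zip(batch_accs, batch_sizes))
    return total_correct / sum(batch_sizes)

# Hypothetical epoch: two full batches and a tiny last batch
batch_accs = [0.75, 0.75, 1.0]
batch_sizes = [64, 64, 2]

unweighted = mean_of_batch_means(batch_accs)
weighted = sample_weighted_mean(batch_accs, batch_sizes)
```

With many batches of nearly equal size the two numbers are close, so the simpler average is usually fine for monitoring.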

* Let's train the model

In [230]:
EPOCH_NUMBER = 15
for epoch in range(1,EPOCH_NUMBER+1):
    
    train_loss,train_acc = train(model,train_iterator,optimizer,criterion)
    
    valid_loss,valid_acc = evaluate(model,validation_iterator,criterion)
    
    # Showing statistics
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')
    print()

	Train Loss: 0.603 | Train Acc: 86.22%
	 Val. Loss: 0.530 |  Val. Acc: 88.28%

	Train Loss: 0.476 | Train Acc: 86.22%
	 Val. Loss: 0.393 |  Val. Acc: 88.28%

	Train Loss: 0.397 | Train Acc: 86.22%
	 Val. Loss: 0.347 |  Val. Acc: 88.28%

	Train Loss: 0.353 | Train Acc: 86.22%
	 Val. Loss: 0.307 |  Val. Acc: 88.28%

	Train Loss: 0.306 | Train Acc: 86.27%
	 Val. Loss: 0.256 |  Val. Acc: 88.28%

	Train Loss: 0.262 | Train Acc: 86.72%
	 Val. Loss: 0.224 |  Val. Acc: 89.63%

	Train Loss: 0.230 | Train Acc: 88.26%
	 Val. Loss: 0.200 |  Val. Acc: 91.55%

	Train Loss: 0.205 | Train Acc: 90.22%
	 Val. Loss: 0.178 |  Val. Acc: 92.68%

	Train Loss: 0.184 | Train Acc: 92.07%
	 Val. Loss: 0.162 |  Val. Acc: 94.46%

	Train Loss: 0.162 | Train Acc: 93.73%
	 Val. Loss: 0.147 |  Val. Acc: 95.56%

	Train Loss: 0.143 | Train Acc: 95.48%
	 Val. Loss: 0.130 |  Val. Acc: 95.96%

	Train Loss: 0.120 | Train Acc: 96.50%
	 Val. Loss: 0.120 |  Val. Acc: 96.22%

	Train Loss: 0.108 | Train Acc: 97.25%
	 Val. Loss: 

# Conclusion
It's really fun to work with PyTorch. I don't know exactly why, but after using the Keras API, learning a lower-level API like PyTorch and seeing how things work in deep learning is awesome.

Thanks for your attention. I'm looking forward to your upvotes and questions.
