Welcome to this notebook where we'll be implementing a simple RNN character model with PyTorch to familiarize ourselves with the PyTorch library and get started with RNNs. You can run the code we’re using on FloydHub by clicking the button below and creating the project as well.

[![Run on FloydHub](https://static.floydhub.com/button/button-small.svg)](https://floydhub.com/run?template=https://github.com/gabrielloye/RNN-walkthrough)

In this implementation, we'll be building a model that can complete your sentence based on a few characters or a word used as input.
![Example](img/Slide4.jpg)
To keep this short and simple, we won't be using any large or external datasets. Instead, we'll just be defining a few sentences to see how the model learns from these sentences. The process that this implementation will take is as follows:
![Overview](img/Slide5.jpg)

We'll start off by importing the main PyTorch package along with the *nn* package, which we will use when building the model. In addition, we'll use pandas to load the sentences from CSV files and numpy to pre-process our data, as Torch works really well with numpy.

In [1]:
import torch
from torch import nn
import pandas as pd
# import json
import numpy as np

First, we'll load the sentences that we want our model to output when fed with the first word or the first few characters. Here they are read from a CSV file.

Then we'll create a dictionary out of all the characters that we have in the sentences and map them to an integer. This will allow us to convert our input characters to their respective integers (*char2int*) and vice versa (*int2char*).

In [2]:
text3 = ['hey how are you','good i am fine','have a nice day']
df = pd.read_csv('sentenceSingle.csv')
text = df['input'].tolist()
# text2 = df['target'].tolist()
# input_seq = df['input'].tolist()
# target_seq = df['target'].tolist()

dfOUT = pd.read_csv('sentenceTest.csv')

textOUTInput = dfOUT['input'].tolist()
textOUTTarget = dfOUT['target'].tolist()

textOUTC = textOUTInput + textOUTTarget
print(len(text))
print(len(textOUTC))

# Join all the sentences together and extract the unique characters from the combined sentences
chars = set(''.join(text))

# Creating a dictionary that maps integers to the characters
int2char = dict(enumerate(chars))

# Creating another dictionary that maps characters to integers
char2int = {char: ind for ind, char in int2char.items()}

50000
100000


In [3]:
print(char2int)

{'3': 0, '6': 1, '0': 2, '4': 3, '7': 4, ':': 5, ' ': 6, ']': 7, 'n': 8, 'o': 9, '1': 10, 'r': 11, 'i': 12, '[': 13, '9': 14, '8': 15, '-': 16, '2': 17, 't': 18, '5': 19, '>': 20, '.': 21, 'a': 22}
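
As a small illustration of the two mappings, the sketch below builds them for the three toy sentences stored in `text3` above and round-trips one sentence through them. Sorting the characters is an assumption added for this example so that the indices are reproducible between runs; the notebook itself enumerates an unordered `set`. The `_demo` names are made up for the example.

```python
# Toy sentences (the same strings as text3 above)
sentences = ['hey how are you', 'good i am fine', 'have a nice day']

# Assumption of this sketch: sort the characters so the index assignment is deterministic
unique_chars = sorted(set(''.join(sentences)))

# Integer -> character and character -> integer lookups
int2char_demo = dict(enumerate(unique_chars))
char2int_demo = {ch: i for i, ch in int2char_demo.items()}

# Encode a sentence to integers and decode it back again
encoded = [char2int_demo[ch] for ch in sentences[0]]
decoded = ''.join(int2char_demo[i] for i in encoded)

print(len(char2int_demo), decoded)
```

Because each dictionary is the inverse of the other, decoding the encoded sentence returns the original string.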


Next, we'll be padding our input sentences to ensure that all the sentences are of the same length. While RNNs are typically able to take in variably sized inputs, we will usually want to feed training data in batches to speed up the training process. In order to use batches to train on our data, we'll need to ensure that each sequence within the input data is of equal size.

Therefore, in most cases, padding can be done by filling up sequences that are too short with **0** values and trimming sequences that are too long. In our case, we'll be finding the length of the longest sequence and padding the rest of the sentences with blank spaces to match that length.

In [4]:
maxlen = len(max(text, key=len))
print("The longest string has {} characters".format(maxlen))

The longest string has 85 characters


In [5]:
# Padding

# A simple loop that loops through the list of sentences and adds a ' ' whitespace until the length of the sentence matches
# the length of the longest sentence
# half = len(textOUTC) - len(input_seq)
for i in range(len(text)):
    while len(text[i])<maxlen:
        text[i] += ' '
#     while len(target_seq[i])<maxlen:
#         target_seq[i] += ' '
#     while len(textOUTInput[i])<maxlen:
#         textOUTInput[i] += ' '
#     while len(textOUTTarget[i])<maxlen:
#         textOUTTarget[i] += ' '
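
The same right-padding can also be written with Python's built-in `str.ljust`. This is a minimal sketch on three toy sentences rather than the CSV data, and the variable names are chosen just for the example.

```python
# Toy sentences of different lengths
sentences = ['hey how are you', 'good i am fine', 'have a nice day']

# Length of the longest sentence
longest = len(max(sentences, key=len))

# str.ljust pads on the right with spaces up to the requested width
padded = [s.ljust(longest) for s in sentences]

for s in padded:
    print(repr(s), len(s))
```

Sentences that already have the maximum length are returned unchanged.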


As we're going to predict the next character in the sequence at each time step, we'll have to divide each sentence into:

- Input data
    - The last input character should be excluded as it does not need to be fed into the model
- Target/Ground Truth Label
    - One time-step ahead of the Input data as this will be the "correct answer" for the model at each time step corresponding to the input data
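
To make the one-step shift concrete, here is a small sketch on a single toy sentence. Pairing the two slices shows which target character the model is asked to predict for each input character.

```python
sentence = 'hey how are you'

# Input: everything except the last character
input_chars = sentence[:-1]

# Target: everything except the first character (the input shifted by one step)
target_chars = sentence[1:]

# Each pair is (character fed to the model, character it should predict next)
pairs = list(zip(input_chars, target_chars))
print(pairs[:3])
```

Both slices have the same length, one character shorter than the original sentence.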

In [6]:
#Creating lists that will hold our input and target sequences
input_seq = []
target_seq = []

for i in range(len(text)):
    # Remove last character for input sequence
    input_seq.append(text[i][:-1])
    
    # Remove first character for target sequence
    target_seq.append(text[i][1:])
    # Only print the first few pairs; printing every pair floods the notebook output
    if i < 5:
        print("Input Sequence: {}\nTarget Sequence: {}".format(input_seq[i], target_seq[i]))


Input Sequence: rotation 854-15-92 to 937-957-0  >> rotation: [ 854-15-92 ] -- [937-957-0 ] .       
Target Sequence: otation 854-15-92 to 937-957-0  >> rotation: [ 854-15-92 ] -- [937-957-0 ] .        
Input Sequence: rotation 1-5-98  >> rotation: [ 1-5-98 ] .                                          
Target Sequence: otation 1-5-98  >> rotation: [ 1-5-98 ] .                                           
Input Sequence: rotation 146-92-9 to 4-69-99  >> rotation: [ 146-92-9 ] -- [4-69-99 ] .             
Target Sequence: otation 146-92-9 to 4-69-99  >> rotation: [ 146-92-9 ] -- [4-69-99 ] .              
Input Sequence: rotation 2-02-47  >> rotation: [ 2-02-47 ] .                                        
Target Sequence: otation 2-02-47  >> rotation: [ 2-02-47 ] .                                         
Input Sequence: rotation 3-86-51 to 49-34-1  >> rotation: [ 3-86-51 ] -- [49-34-1 ] .               
Target Sequence: otation 3-86-51 to 49-34-1  >> rotation: [ 3-86-51 ] -- [49-34-1 ] .  


Now we can convert our input and target sequences to sequences of integers instead of characters by mapping them using the dictionaries we created above. This will allow us to one-hot-encode our input sequence subsequently.

In [7]:
for i in range(len(text)):
        input_seq[i] = [char2int[character] for character in input_seq[i]]
        target_seq[i] = [char2int[character] for character in target_seq[i]]
#         textOUTInput[i] = [char2int[character] for character in textOUTInput[i]]
#         textOUTTarget[i] = [char2int[character] for character in textOUTTarget[i]]

Before encoding our input sequence into one-hot vectors, we'll define 3 key variables:

- *dict_size*: The number of unique characters that we have in our text
    - This will determine the one-hot vector size as each character will have an assigned index in that vector
- *seq_len*: The length of the sequences that we're feeding into the model
    - As we standardised the length of all our sentences to be equal to the longest sentence, this value will be the max length - 1, as we also removed the last character from each input
- *batch_size*: The number of sentences that we defined and are going to feed into the model as a batch

In [8]:
dict_size = len(char2int)
seq_len = maxlen -1 
batch_size = len(text)

def one_hot_encode(sequence, dict_size, seq_len, batch_size):
    # Creating a multi-dimensional array of zeros with the desired output shape
    features = np.zeros((batch_size, seq_len, dict_size), dtype=np.float32)
    
    # Replacing the 0 at the relevant character index with a 1 to represent that character
    for i in range(batch_size):
        for u in range(seq_len):
            features[i, u, sequence[i][u]] = 1
    return features

We also defined a helper function that creates arrays of zeros for each character and replaces the corresponding character index with a **1**.
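
For larger datasets the same encoding can be produced without Python loops. The sketch below is one possible alternative, not the helper used in this notebook: it indexes an identity matrix with the integer sequences. The function name `one_hot_encode_vectorized` and the toy input are assumptions made for the example.

```python
import numpy as np

def one_hot_encode_vectorized(sequences, dict_size):
    # sequences: equal-length lists of character indices, shape (batch, seq_len)
    indices = np.asarray(sequences, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for index i,
    # so fancy indexing yields shape (batch, seq_len, dict_size)
    return np.eye(dict_size, dtype=np.float32)[indices]

toy = [[0, 2, 1], [1, 1, 0]]
features = one_hot_encode_vectorized(toy, dict_size=3)
print(features.shape)
```

Every position in the result contains exactly one **1**, at the index of its character.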

In [9]:
print(len(input_seq))
input_seq = one_hot_encode(input_seq, dict_size, seq_len, batch_size)
# textOUTInput = one_hot_encode(textOUTInput, dict_size, seq_len, half)

# target_seq = one_hot_encode(target_seq, dict_size, seq_len, half)

print("Input shape: {} --> (Batch Size, Sequence Length, One-Hot Encoding Size)".format(input_seq.shape))

50000
Input shape: (50000, 84, 23) --> (Batch Size, Sequence Length, One-Hot Encoding Size)


Since we're done with all the data pre-processing, we can now move the data from numpy arrays to PyTorch's very own data structure - **Torch Tensors**

In [10]:
input_seq = torch.from_numpy(input_seq)
#textOUTInput = torch.from_numpy(textOUTInput)
target_seq = torch.Tensor(target_seq)
#textOUTTarget = torch.Tensor(textOUTTarget)


Now we've reached the fun part of this project! We'll be defining the model using the Torch library, and this is where you can add or remove layers, be it fully connected layers, convolutional layers, vanilla RNN layers, LSTM layers, and many more! In this post, we'll be using the basic nn.RNN to demonstrate a simple example of how RNNs can be used.

Before we start building the model, let's use a built-in feature in PyTorch to check the device we're running on (CPU or GPU). This implementation will not require a GPU as the training is really simple. However, as you progress to larger datasets and models with millions of trainable parameters, using the GPU will be very important to speed up your training.

In [19]:
# torch.cuda.is_available() checks and returns a Boolean True if a GPU is available, else it'll return False
is_cuda = torch.cuda.is_available()

# If we have a GPU available, we'll set our device to GPU. We'll use this device variable later in our code.
if is_cuda:
    device = torch.device("cuda")
    deviceShared = torch.device("cuda:0")
    print("GPU is available")
else:
    device = torch.device("cpu")
    # Define deviceShared on CPU-only machines too, since later cells refer to it
    deviceShared = device
    print("GPU not available, CPU used")
# device = torch.device("cpu")

GPU is available


To start building our own neural network model, we can define a class that inherits PyTorch’s base class (nn.Module) for all neural network modules. After doing so, we can start defining some variables and also the layers for our model under the constructor. For this model, we’ll only be using 1 layer of RNN followed by fully connected layers. The fully connected layers will be in charge of converting the RNN output to our desired output shape.

We’ll also have to define the forward pass function under forward() as a class method. The forward function is executed sequentially, therefore we’ll have to pass the inputs and the zero-initialized hidden state through the RNN layer first, before passing the RNN outputs to the fully connected layers. Note that we are using the layers that we defined in the constructor.

The last method that we have to define is the method that we called earlier to initialize the hidden state - init_hidden(). This basically creates a tensor of zeros in the shape of our hidden states.
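
To see how the tensor shapes change along this path, here is a minimal sketch with toy sizes. The batch size, sequence length and hidden size are arbitrary values picked for the illustration, and a single linear layer stands in for the fully connected part.

```python
import torch
from torch import nn

# Toy sizes chosen for this illustration only
batch, seq_len, vocab, hidden_dim, n_layers = 2, 5, 23, 6, 1

rnn = nn.RNN(vocab, hidden_dim, n_layers, batch_first=True)
fc = nn.Linear(hidden_dim, vocab)

x = torch.zeros(batch, seq_len, vocab)          # one-hot style input
h0 = torch.zeros(n_layers, batch, hidden_dim)   # zero-initialized hidden state

out, hn = rnn(x, h0)                            # out: (batch, seq_len, hidden_dim)
flat = out.contiguous().view(-1, hidden_dim)    # (batch * seq_len, hidden_dim)
logits = fc(flat)                               # (batch * seq_len, vocab)

print(out.shape, hn.shape, logits.shape)
```

Flattening the batch and time dimensions gives one row of scores per character position, which is the layout the loss function receives later on.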

In [15]:
import torch.nn.functional as F  # needed for F.log_softmax in LSTMTagger
from torch.utils.data import Dataset
class Model(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(Model, self).__init__()

        # Defining some parameters
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        #Defining the layers
        # RNN Layer
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True, nonlinearity='tanh')
           
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_size)
        self.fc1 = nn.Linear(hidden_dim, 500)
        self.fc2 = nn.Linear(500, hidden_dim)
    
    def forward(self, x):
        
        batch_size = x.size(0)

        #Initializing hidden state for first input using method defined below
        hidden = self.init_hidden(batch_size)

        # Passing in the input and hidden state into the model and obtaining outputs
        out, hidden = self.rnn(x, hidden)
        

        # Reshaping the outputs such that it can be fit into the fully connected layers
        out = out.contiguous().view(-1, self.hidden_dim)
        out = self.fc1(out)
        out = self.fc2(out)
        out = self.fc(out)

        return out, hidden
    
    def init_hidden(self, batch_size):
        # This method generates the first hidden state of zeros which we'll use in the forward pass
        # We'll send the tensor holding the hidden state to the device we specified earlier as well
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device)
        return hidden
class LSTMModel(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(LSTMModel, self).__init__()

        # Defining some parameters
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        #Defining the layers
        # LSTM Layer
        self.lstm = nn.LSTM(input_size, hidden_dim, n_layers, batch_first=True)

        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_size)
        # Log-softmax over the class dimension of the flattened (N, output_size) output.
        # It is not applied in forward() because nn.CrossEntropyLoss expects raw scores.
        self.softmax = nn.LogSoftmax(dim=1)
    
    def forward(self, x):
        
        batch_size = x.size(0)

        #Initializing hidden and cell states for first input using method defined below
        hidden = self.init_hidden(batch_size)

        # Passing in the input and hidden state into the model and obtaining outputs
        out, hidden = self.lstm(x, hidden)

        # Reshaping the outputs such that it can be fit into the fully connected layer
        out = out.contiguous().view(-1, self.hidden_dim)
        # Map each hidden state to one score per character
        out = self.fc(out)

        return out, hidden
    
    def init_hidden(self, batch_size):
        # This method generates the first hidden and cell states of zeros which we'll use in the forward pass
        h0 = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device)  # Hidden state
        c0 = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(device)  # Cell state

        # nn.LSTM expects the tuple in (h_0, c_0) order
        return h0, c0
    
class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores    
class MyDataset(Dataset):
    def __init__(self, inputs, outputs):
        self.inputs = inputs
        self.outputs = outputs

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, index):
        input_data = self.inputs[index]
        output_data = self.outputs[index]
        return input_data, output_data

After defining the model above, we'll have to instantiate the model with the relevant parameters and define our hyperparameters as well. The hyperparameters we're defining below are:

- *n_epochs*: Number of Epochs --> This refers to the number of times our model will go through the entire training dataset
- *lr*: Learning Rate --> This affects the rate at which our model updates the weights in the cells each time backpropagation is done
    - A smaller learning rate means that the model changes the values of the weights by a smaller magnitude
    - A larger learning rate means that the weights are updated to a larger extent at each update step
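
The effect of the learning rate can be seen on a toy problem. The sketch below is not part of the model in this notebook: it runs plain gradient descent on f(w) = w², whose gradient is 2w, with two different learning rates. The function name `descend` and the step count are assumptions made for the illustration.

```python
def descend(lr, steps=10, w=1.0):
    # Plain gradient descent on f(w) = w**2, whose gradient is 2 * w
    for _ in range(steps):
        grad = 2 * w
        w = w - lr * grad
    return w

w_small_lr = descend(lr=0.01)  # small steps: w moves slowly towards the minimum at 0
w_large_lr = descend(lr=0.1)   # larger steps: w gets much closer to 0 in the same number of steps

print(w_small_lr, w_large_lr)
```

Note that a larger learning rate is not always better: on this toy function any value above 1.0 makes the updates overshoot and diverge.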

Similar to other neural networks, we have to define the optimizer and loss function as well. We’ll be using CrossEntropyLoss as the final output is basically a classification task.
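
As a quick check of what this loss expects, the sketch below feeds `nn.CrossEntropyLoss` random scores of shape (N, number of classes) together with integer class indices of shape (N,), and compares the result with a log-softmax followed by the negative log-likelihood loss. The sizes and the random inputs are toy values for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# 4 character positions, 23 possible characters (toy values)
logits = torch.randn(4, 23)            # raw, unnormalized scores
targets = torch.tensor([0, 5, 22, 8])  # index of the correct character at each position

loss_ce = nn.CrossEntropyLoss()(logits, targets)

# CrossEntropyLoss combines log-softmax and negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_ce.item(), loss_manual.item())
```

This is why the model's final layer can return raw scores: the normalization happens inside the loss.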

In [22]:
# Instantiate the model with hyperparameters

model = Model(input_size=dict_size, output_size=dict_size, hidden_dim=6, n_layers=1)

# We'll also set the model to the device that we defined earlier (default is CPU)
model = model.to(deviceShared) 

modelL = LSTMModel(input_size=dict_size, output_size=dict_size, hidden_dim=40, n_layers=1)
# modelL = modelL.to(device) 

# Define hyperparameters
n_epochs = 300
lr=0.01

# Define Loss, Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

dataset = MyDataset(input_seq, target_seq)
batch_size = 10000
shuffle = True  # Set to True if you want to shuffle the data
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)




In [24]:
lossPerEpoch = []
# Move the data to the training device once, before the loop
input_seq = input_seq.to(deviceShared)
target_seq = target_seq.to(deviceShared)
for epoch in range(1, n_epochs + 1):
    optimizer.zero_grad() # Clears existing gradients from previous epoch
    # Full-batch forward pass: every sequence goes through the model at once
    output, hidden = model(input_seq)
    loss = criterion(output, target_seq.view(-1).long())
    loss.backward() # Does backpropagation and calculates gradients
    optimizer.step() # Updates the weights accordingly
    lossPerEpoch.append(loss.item())
    if epoch%30 == 0:
        print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
        print("Loss: {:.4f}".format(loss.item()))

RuntimeError: CUDA out of memory. Tried to allocate 7.82 GiB (GPU 0; 8.00 GiB total capacity; 1.50 GiB already allocated; 4.33 GiB free; 1.85 GiB reserved in total by PyTorch)
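
The size of the failed allocation can be reproduced with some quick arithmetic. The sketch below assumes float32 activations (4 bytes each) and estimates the memory taken by the output of `fc1` alone, first for the full batch of 50,000 sequences and then for a mini-batch of 10,000 (the `batch_size` given to the `DataLoader` above). It is a rough estimate rather than a profiler measurement, and the helper name `fc1_activation_gib` is made up for this example.

```python
n_sequences = 50_000    # number of training sentences
seq_len = 84            # characters per input sequence
fc1_width = 500         # output features of fc1
bytes_per_float32 = 4   # assumption: activations are stored as float32

def fc1_activation_gib(batch):
    # One row per character position, fc1_width values per row
    return batch * seq_len * fc1_width * bytes_per_float32 / 2**30

full_batch = fc1_activation_gib(n_sequences)
mini_batch = fc1_activation_gib(10_000)

print(f"full batch: {full_batch:.2f} GiB, mini-batch: {mini_batch:.2f} GiB")
```

The full-batch figure is consistent with the 7.82 GiB reported in the error message, which suggests that feeding the data in mini-batches is one way to keep the forward pass within the memory of an 8 GiB GPU.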

In [None]:
# lossPerEpochLSTMnb = []
# input_seq = input_seq.to(device)
# for epoch in range(1, n_epochs + 1):
#     optimizer.zero_grad() # Clears existing gradients from previous epoch
#     # input_seq = input_seq.to(device)
#     output, hidden = modelL(input_seq)
#     output = output.to(device)
#     target_seq = target_seq.to(device)
#     loss = criterion(output,target_seq.view(-1).long())
#     loss.backward() # Does backpropagation and calculates gradients
#     optimizer.step() # Updates the weights accordingly
#     lossPerEpochLSTMnb.append(loss.item())
#     if epoch%10 == 0:
#         print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
#         print("Loss: {:.4f}".format(loss.item()))

Now we can begin our training! As we only have a few sentences, this training process is very fast. However, as we progress, larger datasets and deeper models mean that the input data is much larger and the number of parameters within the model that we have to compute is much greater.

In [None]:


# lossPerEpoch = []
# for epoch in range(1, n_epochs + 1):
#     for batch_inputs, batch_targets in dataloader:
#         # Move the batch tensors to the GPU if available
#         optimizer.zero_grad()
#         batch_inputs = batch_inputs.to(device)
#         batch_targets = batch_targets.to(device)

#         output, hidden = model(batch_inputs)
#         output = output.to(device)
        
#         loss = criterion(output,batch_targets.view(-1).long())
#         loss.backward() # Does backpropagation and calculates gradients
#         optimizer.step() # Updates the weights accordingly
#         lossPerEpoch.append(loss.item())
#         if epoch%40 == 0 :
#             print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
#             print("Loss: {:.4f}".format(loss.item()))
#         # Rest of the training loop
#         # ...


In [None]:

# # Training Run
# lossPerEpochLSTM = []
# for epoch in range(1, n_epochs + 1):
#     for batch_inputs, batch_targets in dataloader:
#         # Move the batch tensors to the GPU if available
#         optimizer.zero_grad()
        
#         batch_inputs = batch_inputs.to(device)
#         batch_targets = batch_targets.to(device)

#         output, hidden = modelL(batch_inputs)
#         output = output.to(device)
        
#         loss = criterion(output,batch_targets.view(-1).long())
#         loss.backward() # Does backpropagation and calculates gradients
#         optimizer.step() # Updates the weights accordingly
#         lossPerEpochLSTM.append(loss.item())
#         if epoch%20 == 0:
#             print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
#             print("Loss: {:.4f}".format(loss.item()))
            
#         # Rest of the training loop
#         # ...


In [None]:
def predict(model, character):
    # One-hot encoding our input to fit into the model
    character = np.array([[char2int[c] for c in character]])
    character = one_hot_encode(character, dict_size, character.shape[1], 1)
    character = torch.from_numpy(character)
    character = character.to(device)
    
    out, hidden = model(character)

    prob = nn.functional.softmax(out[-1], dim=0).data
    # Taking the class with the highest probability score from the output
    char_ind = torch.max(prob, dim=0)[1].item()

    return int2char[char_ind], hidden
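
The last two steps of this function, turning the scores of the final time step into probabilities and picking the most likely character, can be illustrated without a trained model. The scores below are made-up values and the `softmax` helper is written out by hand for the example.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for a 3-character vocabulary at the last time step
scores = [1.0, 3.0, 0.5]
probs = softmax(scores)

# Greedy choice: index of the character with the highest probability
best = max(range(len(probs)), key=probs.__getitem__)

print(probs, best)
```

Since softmax preserves the ordering of the scores, the greedy choice is the same whether it is taken before or after the normalization.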

In [None]:
def sample(model, out_len, start):
    model.eval() # eval mode
    start = start.lower()
    # First off, run through the starting characters
    chars = [ch for ch in start]
    size = out_len - len(chars)
    # Now pass in the previous characters and get a new one
    for ii in range(size):
        char, h = predict(model, chars)
        chars.append(char)

    return ''.join(chars)

In [None]:
sample(model, 150 , 'rotation 100-100-100 to 100-100-100 >>')

In [None]:
import matplotlib.pyplot as plt

  # Replace with your array of loss values
epochs = range(1, len(lossPerEpoch) + 1)
plt.plot(epochs, lossPerEpoch, 'b-o', markersize=2)
plt.title('Loss per Epoch (tanh and Adam)')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.ylim(0,10)
plt.xlim(0,500)
plt.show()


In [None]:
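import numpy as np
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# Character-level predictions on a subset of the training data
# (assumption: n_eval is an arbitrary choice to keep memory use low)
n_eval = 1000
model_device = next(model.parameters()).device
model.eval()
with torch.no_grad():
    logits, _ = model(input_seq[:n_eval].to(model_device))

# The model returns one row of scores per character, so take the argmax over classes
predictions = logits.argmax(dim=1).cpu().numpy()
true_labels = target_seq[:n_eval].reshape(-1).long().cpu().numpy()

# One tick label per character in the vocabulary
classes = [int2char[i] for i in range(dict_size)]

# Compute confusion matrix (rows are true labels, columns are predicted labels)
cm = confusion_matrix(true_labels, predictions, labels=np.arange(dict_size))

# Plot confusion matrix
plt.figure(figsize=(8, 6))
plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.colorbar()
plt.xticks(np.arange(len(classes)), classes, rotation=45)
plt.yticks(np.arange(len(classes)), classes)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')

plt.show()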




In [None]:
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# define the RNN model
class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super(RNNModel, self).__init__()
        
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        x = x.long()
        embedded = self.embedding(x)
        out, hidden = self.rnn(embedded, hidden)
        out = out.contiguous().view(-1, self.hidden_size)
        out = self.fc(out)
        
        return out, hidden
    
    def init_hidden(self, batch_size):
        return torch.zeros(self.num_layers, batch_size, self.hidden_size)

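# define the Word2Vec model
class W2VModel():
    def __init__(self, sentences):
        # Word2Vec expects tokenized sentences (lists of words), not raw strings
        self.model = Word2Vec([s.split() for s in sentences], min_count=1)

    def encode(self, sentence):
        # The DataLoader yields each batch as a list/tuple of strings, so join it first
        if isinstance(sentence, (list, tuple)):
            sentence = ' '.join(sentence)
        vecs = []
        for word in str(sentence).split():
            if word in self.model.wv.key_to_index:
                # Word vectors are looked up through the .wv attribute
                vecs.append(self.model.wv[word])
        return torch.Tensor(np.array(vecs))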

# define the training loop


In [None]:
def train(model, w2v_model, optimizer, criterion, train_loader, num_epochs):
    for epoch in range(num_epochs):
        for i, (inputs, targets) in enumerate(train_loader):
            optimizer.zero_grad()
            print(inputs)
            encoded_inputs = w2v_model.encode(inputs)
            encoded_targets = w2v_model.encode(targets)
            print("in",encoded_inputs)
            print("out",encoded_targets)
            
            outputs, _ = model(encoded_inputs.unsqueeze(0))
            
            loss = criterion(outputs, encoded_targets)
            loss.backward()
            optimizer.step()
            
            if (i+1) % 10 == 0:
                print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                       .format(epoch+1, num_epochs, i+1, len(train_loader), loss.item()))

In [None]:

# sample data
sentences = ['this is a sample sentence.', 'this is another sample sentence.']
inputs = ['this is the input.', 'this is another input.']
targets = ['this is the target output.', 'this is another target output.']
allStuff = inputs + targets
print("inputs and targs", allStuff)
# create the Word2Vec model
w2v_model = W2VModel(allStuff)
print("w2v", w2v_model.model)


In [None]:
input_size = len(w2v_model.model.wv)
print(input_size)
hidden_size = 100
output_size = 100
num_layers = 1
model2 = RNNModel(input_size, hidden_size, output_size, num_layers)

# define the optimizer and loss function
learning_rate = 0.01
optimizer = torch.optim.Adam(model2.parameters(), lr=learning_rate)  # optimize model2, the model being trained here
criterion = nn.CrossEntropyLoss()

# define the data loader
train_data = [(inputs[i], targets[i]) for i in range(len(inputs))]
train_loader = torch.utils.data.DataLoader(train_data, batch_size=1, shuffle=True)
print(train_loader.dataset)
# train the model
num_epochs = 10
train(model2, w2v_model, optimizer, criterion, train_loader, num_epochs)

In [None]:
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers, batch_first=True)
    
    def forward(self, input):
        embedded = self.embedding(input)
        output, hidden = self.gru(embedded)
        return output, hidden
print("done")


In [None]:
class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size, num_layers):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=2)
    
    def forward(self, input, hidden):
        embedded = self.embedding(input)
        output, hidden = self.gru(embedded, hidden)
        output = self.fc(output)
        output = self.softmax(output)
        return output, hidden


In [None]:
# Assumption: index of the start-of-sequence token fed to the decoder at the first step.
# It is not defined elsewhere in this notebook, so adjust it to match your vocabulary.
SOS_token = 0

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder
    
    def forward(self, input, target, teacher_forcing_ratio=0.5):
        batch_size = input.size(0)
        target_len = target.size(1)
        target_vocab_size = self.decoder.fc.out_features
        
        outputs = torch.zeros(batch_size, target_len, target_vocab_size).to(input.device)
        
        encoder_output, hidden = self.encoder(input)
        
        decoder_input = torch.tensor([[SOS_token]] * batch_size, dtype=torch.long).to(input.device)
        
        for t in range(target_len):
            decoder_output, hidden = self.decoder(decoder_input, hidden)
            outputs[:, t, :] = decoder_output.squeeze(1)
            
            teacher_force = torch.rand(1).item() < teacher_forcing_ratio
            top1 = decoder_output.argmax(2)  # shape (batch_size, 1)
            # Keep the decoder input at shape (batch_size, 1) in both cases
            decoder_input = target[:, t].unsqueeze(1) if teacher_force else top1
        
        return outputs


In [None]:
# Instantiate the encoder and decoder models with hyperparameters
encoder = Encoder(input_size=dict_size, hidden_size=hidden_size, num_layers = 20)
decoder = Decoder(hidden_size=hidden_size, output_size=dict_size, num_layers = 20)

# We'll also set the models to the device that we defined earlier (default is CPU)
encoder = encoder.to(device)
decoder = decoder.to(device)

# Define hyperparameters
n_epochs = 300
lr = 0.01

# Define Loss, Optimizer
criterion = nn.CrossEntropyLoss()
encoder_optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)


In [None]:
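# Training Run
lossPerEpoch = []

# Assumption: start-of-sequence index used as the first decoder input; adjust to your vocabulary
SOS_token = 0

# Assumption: textOUTInput / textOUTTarget are still lists of strings at this point, so we
# convert them to padded tensors of character indices (every character must be in char2int)
def to_indices(sentences):
    width = len(max(sentences, key=len))
    return torch.tensor([[char2int[c] for c in s.ljust(width)] for s in sentences], dtype=torch.long)

enc_input = to_indices(textOUTInput).to(device)
dec_target = to_indices(textOUTTarget).to(device)

for epoch in range(1, n_epochs + 1):
    encoder_optimizer.zero_grad()  # Clears existing gradients from previous epoch
    decoder_optimizer.zero_grad()  # Clears existing gradients from previous epoch

    encoder_outputs, encoder_hidden = encoder(enc_input)
    decoder_hidden = encoder_hidden  # Use last encoder hidden state as decoder initial hidden state

    # The first decoder input is the start-of-sequence token, shape (batch, 1)
    decoder_input = torch.full((dec_target.size(0), 1), SOS_token, dtype=torch.long, device=device)
    loss = 0

    # Teacher forcing: Use the true targets as inputs for the decoder
    for di in range(dec_target.size(1)):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        # decoder_output has shape (batch, 1, vocab); drop the time dimension for the loss
        loss += criterion(decoder_output.squeeze(1), dec_target[:, di])

        decoder_input = dec_target[:, di].unsqueeze(1)  # Next decoder input is the current target

    loss.backward()  # Does backpropagation and calculates gradients
    encoder_optimizer.step()
    decoder_optimizer.step()  # Updates the weights accordingly

    lossPerEpoch.append(loss.item())

    if epoch % 10 == 0:
        print('Epoch: {}/{}.............'.format(epoch, n_epochs), end=' ')
        print("Loss: {:.4f}".format(loss.item()))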


As we can see, the model is able to come up with the sentence ‘good i am fine’ if we feed it with the word ‘good’, achieving what we intended for it to do!