# **LSTM: Long Short-Term Memory**

<div style="text-align: center;">
    <img src="https://media.licdn.com/dms/image/v2/D4D22AQFr-MfhWK4pHw/feedshare-shrink_2048_1536/feedshare-shrink_2048_1536/0/1680639962661?e=1730332800&v=beta&t=rQ-tHL8kvSN9CLRb4TewSYYSsieGT0PJJWNB1wc03PM" alt="RNN Architecture" style="width: 80%;">
</div>


Reference: StatQuest
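
For reference, the hand-written `lstm_unit` in the first code cell computes the update summarized here. The symbols are shorthand chosen for this summary (an assumption, not notation taken from the reference): $x_t$ is the day's input, $h_{t-1}$ and $c_{t-1}$ are the incoming short-term and long-term memories, and the weight subscripts match the parameter names in the class (`wlr1`, `wlr2`, `blr1`, and so on).

$$
\begin{aligned}
f_t &= \sigma\left(w_{lr1}\,h_{t-1} + w_{lr2}\,x_t + b_{lr1}\right) \\
i_t &= \sigma\left(w_{pr1}\,h_{t-1} + w_{pr2}\,x_t + b_{pr1}\right) \\
\tilde{c}_t &= \tanh\left(w_{p1}\,h_{t-1} + w_{p2}\,x_t + b_{p1}\right) \\
c_t &= f_t\,c_{t-1} + i_t\,\tilde{c}_t \\
o_t &= \sigma\left(w_{o1}\,h_{t-1} + w_{o2}\,x_t + b_{o1}\right) \\
h_t &= o_t\,\tanh\left(c_t\right)
\end{aligned}
$$

Here $f_t$ corresponds to `long_remember_percent`, $i_t$ to `potential_memory_remember_percent`, $\tilde{c}_t$ to `potential_long_term_memory`, $o_t$ to `output_percent`, and $c_t$, $h_t$ to the updated long-term and short-term memories.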

In [1]:
# %%writefile modules/LSTM_Architecture.py
import torch
import torch.nn as nn
import pytorch_lightning as pl

class LSTM(pl.LightningModule):
    
    def __init__(self):
        super().__init__()
        
        # Every weight starts as a draw from a standard normal; every bias starts at 0.
        mean = torch.tensor(0.0)
        std = torch.tensor(1.0)
        
        # Forget gate ("long remember"): fraction of the long-term memory to keep.
        self.wlr1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wlr2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.blr1 = nn.Parameter(torch.tensor(0.0),requires_grad=True)
        
        # Input gate ("potential remember"): fraction of the candidate memory to add.
        self.wpr1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wpr2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bpr1 = nn.Parameter(torch.tensor(0.0),requires_grad=True)
        
        # Candidate ("potential") long-term memory.
        self.wp1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wp2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bp1 = nn.Parameter(torch.tensor(0.0),requires_grad=True)
        
        # Output gate: fraction of the updated long-term memory to expose as short-term memory.
        self.wo1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wo2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bo1 = nn.Parameter(torch.tensor(0.0),requires_grad=True)
        
        
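    def lstm_unit(self, input_value, long_memory, short_memory):
        
        # Forget gate: the fraction of the existing long-term memory to keep.
        long_remember_percent = torch.sigmoid((short_memory*self.wlr1)+(input_value*self.wlr2)+self.blr1)
        
        # Input gate: the fraction of the candidate memory to add.
        potential_memory_remember_percent = torch.sigmoid((short_memory*self.wpr1) + (input_value*self.wpr2) + self.bpr1)
        
        # Candidate long-term memory built from the short-term memory and the input.
        potential_long_term_memory = torch.tanh((short_memory*self.wp1) + (input_value*self.wp2) + self.bp1)
        
        # New long-term memory: the kept part of the old memory plus the admitted candidate.
        updated_long_term_memory = ((long_memory*long_remember_percent) + (potential_long_term_memory*potential_memory_remember_percent))
        
        # Output gate: the fraction of the squashed long-term memory to pass on.
        output_percent = torch.sigmoid((short_memory*self.wo1) + (input_value*self.wo2) + self.bo1)
        
        # New short-term memory, which is also the unit's output for this day.
        updated_short_term_memory = torch.tanh(updated_long_term_memory) * output_percent
        
        return updated_long_term_memory, updated_short_term_memory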
    
    
    
    # This example feeds in a company's share values for 4 days and predicts the value on day 5.
    # Training sequences -> Company A: [0, 0.5, 0.25, 1] (expected prediction -> 0)
    #                       Company B: [1, 0.5, 0.25, 1] (expected prediction -> 1)
    def forward(self, input):
        long_memory = 0
        short_memory = 0
    
        for day in input:
            long_memory, short_memory = self.lstm_unit(day, long_memory, short_memory)
    
        return short_memory
    
    
    
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
    
    
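    def training_step(self, batch, batch_idx):
        # The DataLoader uses its default batch size of 1, so each batch holds one sequence and one label.
        input_i , label_i = batch
        output_i = self.forward(input_i[0])
        
        # output_i is a scalar, so index the label to compare tensors of the same shape.
        loss_fn = torch.nn.MSELoss()
        loss = loss_fn(output_i, label_i[0])
        
        self.log("Training Loss", loss)
        
        # Log each company's prediction under its own name, using the label to tell them apart.
        if label_i==0:
            self.log("out_0", output_i)
        else:
            self.log("out_1", output_i)
            
        return loss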

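To trace the arithmetic of one unit by hand, here is a minimal plain-Python sketch of the same scalar update that `lstm_unit` performs, without PyTorch or autograd. The helper names (`sigmoid`, `lstm_step`, `run_sequence`) and the weight values in `ILLUSTRATIVE_WEIGHTS` are hypothetical and chosen only for illustration; they are not the trained values from this notebook.

```python
import math

def sigmoid(z):
    """Logistic function, the same squashing that torch.sigmoid applies."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, long_memory, short_memory, w):
    """One scalar LSTM step. `w` maps the parameter names used in the class to floats."""
    forget = sigmoid(short_memory * w["wlr1"] + x * w["wlr2"] + w["blr1"])
    admit = sigmoid(short_memory * w["wpr1"] + x * w["wpr2"] + w["bpr1"])
    candidate = math.tanh(short_memory * w["wp1"] + x * w["wp2"] + w["bp1"])
    new_long = long_memory * forget + candidate * admit
    output = sigmoid(short_memory * w["wo1"] + x * w["wo2"] + w["bo1"])
    new_short = math.tanh(new_long) * output
    return new_long, new_short

def run_sequence(values, w):
    """Unroll the unit over a sequence, starting from zero memories as forward() does."""
    long_memory, short_memory = 0.0, 0.0
    for x in values:
        long_memory, short_memory = lstm_step(x, long_memory, short_memory, w)
    return short_memory

# Hypothetical weights (all weights 1.0, all biases 0.0), for illustration only.
ILLUSTRATIVE_WEIGHTS = {
    "wlr1": 1.0, "wlr2": 1.0, "blr1": 0.0,
    "wpr1": 1.0, "wpr2": 1.0, "bpr1": 0.0,
    "wp1": 1.0, "wp2": 1.0, "bp1": 0.0,
    "wo1": 1.0, "wo2": 1.0, "bo1": 0.0,
}

print(run_sequence([0.0, 0.5, 0.25, 1.0], ILLUSTRATIVE_WEIGHTS))
print(run_sequence([1.0, 0.5, 0.25, 1.0], ILLUSTRATIVE_WEIGHTS))
```

Because the final value is a `tanh` output scaled by a sigmoid, it always lies strictly between -1 and 1, which is why targets of 0 and 1 are reachable for this toy problem.
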
In [2]:
# %%writefile modules/training.py
import torch
import torch.nn as nn
import pytorch_lightning as pl  # imported explicitly because pl.Trainer is used below
from modules.LSTM_Architecture import LSTM
from torch.utils.data import TensorDataset, DataLoader
import warnings
warnings.filterwarnings("ignore")


model = LSTM()

#Before Training -> 
print("Before Training ===>\n\n\n")
print("Comparing Observed and Predicted Value ==>\n\nCompany A => Observed -> 0 & Predicted ->",model(torch.tensor([0.0,0.5,0.25,1.0])).detach(),"\n\nCompany B => Observed -> 1 & Predicted ->",model(torch.tensor([1.0,0.5,0.25,1.0])).detach())


#Training -> 
inputs = torch.tensor([[0.0,0.5,0.25,1.0],[1.0,0.5,0.25,1.0]])
labels = torch.tensor([0.0,1.0])

dataset = TensorDataset(inputs,labels)
dataloader = DataLoader(dataset=dataset)

trainer = pl.Trainer(max_epochs=20001)

trainer.fit(model=model, train_dataloaders=dataloader)
print("After Training ===>\n\n\n")
print("Comparing Observed and Predicted Value ==>\n\nCompany A => Observed -> 0 & Predicted ->",model(torch.tensor([0.0,0.5,0.25,1.0])).detach(),"\n\nCompany B => Observed -> 1 & Predicted ->",model(torch.tensor([1.0,0.5,0.25,1.0])).detach())

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 3050 Ti Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Before Training ===>



Comparing Observed and Predicted Value ==>

Company A => Observed -> 0 & Predicted -> tensor(0.0888) 

Company B => Observed -> 1 & Predicted -> tensor(0.1018)



  | Name         | Type | Params | Mode
---------------------------------------------
  | other params | n/a  | 12     | n/a 
---------------------------------------------
12        Trainable params
0         Non-trainable params
12        Total params
0.000     Total estimated model params size (MB)
0         Modules in train mode
0         Modules in eval mode


Epoch 20000: 100%|██████████| 2/2 [00:00<00:00, 120.16it/s, v_num=0]

`Trainer.fit` stopped: `max_epochs=20001` reached.


Epoch 20000: 100%|██████████| 2/2 [00:00<00:00, 87.81it/s, v_num=0] 
After Training ===>



Comparing Observed and Predicted Value ==>

Company A => Observed -> 0 & Predicted -> tensor(-7.3200e-06) 

Company B => Observed -> 1 & Predicted -> tensor(0.9977)


Expected: 0 and 1. Predicted: about -0.000007 and 0.9977. Both predictions land within 0.003 of their targets :)
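
To put a number on the improvement, the sketch below applies the same mean squared error that `training_step` uses to the predictions printed before and after training. The `mse` helper is a hypothetical stand-in for `torch.nn.MSELoss`, and the inputs are the rounded values from the printed output, so the results are approximate.

```python
def mse(predictions, targets):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [0.0, 1.0]                # Company A, Company B
before = [0.0888, 0.1018]           # printed predictions before training
after = [-7.32e-06, 0.9977]         # printed predictions after training

mse_before = mse(before, targets)
mse_after = mse(after, targets)

print(f"MSE before training: {mse_before:.4f}")
print(f"MSE after training:  {mse_after:.2e}")
```

On these rounded values the error falls from roughly 0.41 to roughly 2.6e-06, and almost all of the remaining error comes from Company B's prediction of 0.9977.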

In [15]:
# LSTM implementation using PyTorch's built-in nn.LSTM class

import pytorch_lightning as pl
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import warnings
warnings.filterwarnings("ignore")

class LSTMAuto(pl.LightningModule):
    
    def __init__(self):
        super().__init__()

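        # One input feature per day and a single hidden unit, matching the hand-written model.
        self.lstm = nn.LSTM(input_size = 1, hidden_size = 1, batch_first = True)
        
    def forward(self, input):
        
        # Reshape the 4 daily values into (sequence_length, input_size) = (4, 1).
        # nn.LSTM treats a 2-D input as a single unbatched sequence.
        input_trans = input.reshape(len(input),1)
        
        # lstm_output holds the short-term memory (hidden state) after each day.
        lstm_output , temp = self.lstm(input_trans)
        
        # The short-term memory after the last day is the prediction for day 5.
        prediction = lstm_output[-1]
        
        return prediction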
    
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr = 0.1)

    
    def training_step(self, batch, batch_idx):
        input_i , label_i = batch
        output_i = self.forward(input_i[0])
        
        loss_fn = torch.nn.MSELoss()
        loss = loss_fn(output_i, label_i)
        
        self.log("Training Loss", loss)
        
        # Log each company's prediction under its own name, using the label to tell them apart.
        if label_i==0:
            self.log("out_0", output_i)
        else:
            self.log("out_1", output_i)
            
        return loss
    
    
model2 = LSTMAuto()

print("Before Training ===>\n\n\n")
print("Comparing Observed and Predicted Value ==>\n\nCompany A => Observed -> 0 & Predicted ->",model2(torch.tensor([0.0,0.5,0.25,1.0])).detach(),"\n\nCompany B => Observed -> 1 & Predicted ->",model2(torch.tensor([1.0,0.5,0.25,1.0])).detach())

inputs = torch.tensor([[0.0,0.5,0.25,1.0],[1.0,0.5,0.25,1.0]])
labels = torch.tensor([0.0,1.0])

dataset = TensorDataset(inputs,labels)
dataloader = DataLoader(dataset=dataset)


trainer = pl.Trainer(max_epochs=1001, log_every_n_steps=5)

trainer.fit(model=model2, train_dataloaders=dataloader)
print("After Training ===>\n\n\n")
print("Comparing Observed and Predicted Value ==>\n\nCompany A => Observed -> 0 & Predicted ->",model2(torch.tensor([0.0,0.5,0.25,1.0])).detach(),"\n\nCompany B => Observed -> 1 & Predicted ->",model2(torch.tensor([1.0,0.5,0.25,1.0])).detach())


GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name | Type | Params | Mode 
--------------------------------------
0 | lstm | LSTM | 16     | train
--------------------------------------
16        Trainable params
0         Non-trainable params
16        Total params
0.000     Total estimated model params size (MB)
1         Modules in train mode
0         Modules in eval mode


Before Training ===>



Comparing Observed and Predicted Value ==>

Company A => Observed -> 0 & Predicted -> tensor([-0.1441]) 

Company B => Observed -> 1 & Predicted -> tensor([-0.1091])
Epoch 1000: 100%|██████████| 2/2 [00:00<00:00, 148.31it/s, v_num=4]

`Trainer.fit` stopped: `max_epochs=1001` reached.


Epoch 1000: 100%|██████████| 2/2 [00:00<00:00, 102.19it/s, v_num=4]
After Training ===>



Comparing Observed and Predicted Value ==>

Company A => Observed -> 0 & Predicted -> tensor([2.8894e-06]) 

Company B => Observed -> 1 & Predicted -> tensor([0.9951])


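The built-in model reaches similar accuracy in far fewer epochs (1,001 instead of 20,001). The most likely reason is the learning rate: this run sets `lr = 0.1` for Adam, while the hand-written model used Adam's default of 0.001. `nn.LSTM` also differs slightly from the hand-written class (16 parameters rather than 12, and a different weight initialization), so the two runs are not a like-for-like comparison.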
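
The two runs configure Adam differently: the hand-written model calls `torch.optim.Adam(self.parameters())` with the default learning rate of 0.001, while `LSTMAuto` passes `lr = 0.1`. The sketch below is a minimal single-parameter version of the Adam update rule, using Adam's default `beta1`, `beta2`, and `eps`, to show how the learning rate scales the step. The function names and the gradient value 0.05 are hypothetical, and this is an illustration of the update rule, not a measurement of either training run.

```python
import math

def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected mean
    v_hat = v / (1 - beta2 ** t)                # bias-corrected squared mean
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

def first_step_size(lr, grad):
    """Magnitude of the very first Adam update, starting from param = 0."""
    new_param, _, _ = adam_step(0.0, grad, 0.0, 0.0, 1, lr)
    return abs(new_param)

hypothetical_grad = 0.05  # illustrative value only
print("step with lr=0.001:", first_step_size(0.001, hypothetical_grad))
print("step with lr=0.1:  ", first_step_size(0.1, hypothetical_grad))
```

On the first update the bias-corrected ratio is close to the sign of the gradient, so the step is close to `lr` itself whatever the gradient's size. With `lr = 0.1` each early step is therefore about 100 times larger than with the default, which is consistent with the second model needing far fewer epochs, although it does not rule out other contributing factors.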

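The two Lightning summaries above report 12 trainable parameters for the hand-written class and 16 for `nn.LSTM`. The sketch below reproduces both counts. The function names are hypothetical, and the `nn.LSTM` formula assumes a single-layer, unidirectional LSTM with biases enabled and no projection, which matches `nn.LSTM(input_size = 1, hidden_size = 1)` as used here.

```python
def hand_written_lstm_params(gates=4):
    """The scalar class stores 2 weights and 1 bias for each of its 4 gate-like blocks."""
    return gates * (2 + 1)

def nn_lstm_params(input_size, hidden_size):
    """Parameter count for a single-layer, unidirectional nn.LSTM with biases.

    Four gate blocks are stacked in each tensor:
      weight_ih_l0: (4 * hidden_size, input_size)
      weight_hh_l0: (4 * hidden_size, hidden_size)
      bias_ih_l0:   (4 * hidden_size,)
      bias_hh_l0:   (4 * hidden_size,)
    """
    weight_ih = 4 * hidden_size * input_size
    weight_hh = 4 * hidden_size * hidden_size
    biases = 2 * 4 * hidden_size
    return weight_ih + weight_hh + biases

print("hand-written:", hand_written_lstm_params())
print("nn.LSTM(1, 1):", nn_lstm_params(input_size=1, hidden_size=1))
```

The 4 extra parameters are the second bias vector: `nn.LSTM` keeps separate input-side and hidden-side biases for each gate, while the hand-written class keeps one bias per gate.
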
In [19]:
!tensorboard --logdir lightning_logs --port 6008