# Predicting the Sentiment of IMDB Movie Reviews using LSTM in PyTorch
> This is a practice notebook.

- toc: true 
- badges: true
- comments: true
- categories: [pytorch, lstm]
- keyword: [ml, dl, nn, pytorch, LSTM, IMDB, sentiment]
- image: images/copied_from_nb/images/2022-11-09-pytorch-lstm-imdb-sentiment-prediction.jpeg

![](images/2022-11-09-pytorch-lstm-imdb-sentiment-prediction.jpeg)

## Credits
This notebook takes inspiration and ideas from the following sources.
* The outstanding book "Deep Learning with PyTorch Step-by-Step" by Daniel Voigt Godoy. You can get the book from its website: [pytorchstepbystep](https://pytorchstepbystep.com/). The book's GitHub repository also has valuable notebooks: [github.com/dvgodoy/PyTorchStepByStep](https://github.com/dvgodoy/PyTorchStepByStep). Parts of the code in this notebook are taken from the [chapter 3](https://colab.research.google.com/github/dvgodoy/PyTorchStepByStep/blob/master/Chapter03.ipynb) and [chapter 8](https://github.com/dvgodoy/PyTorchStepByStep/blob/master/Chapter08.ipynb) notebooks of the same book.
* A very helpful Kaggle notebook by Taron Zakaryan that predicts stock prices using an LSTM. [Link here](https://www.kaggle.com/code/taronzakaryan/predicting-stock-price-using-lstm-model-pytorch/notebook)

## Environment
This notebook is prepared with Google Colab.

```python
#collapse
from platform import python_version
import numpy, matplotlib, pandas, torch, seaborn

print("python==" + python_version())
print("numpy==" + numpy.__version__)
print("torch==" + torch.__version__)
print("matplotlib==" + matplotlib.__version__)
print("seaborn==" + seaborn.__version__)
```

```
python==3.7.15
numpy==1.21.6
torch==1.12.1+cu113
matplotlib==3.2.2
seaborn==0.11.2
```


## Introduction
Recurrent Neural Networks (RNNs) are well suited to data that has a one-dimensional (1D) ordered structure. We call such 1D ordered structures *sequences*. The two main kinds of sequence problems are *time series* and *natural language processing (NLP)*. RNNs and their variants work for both kinds of sequence problems, but in this notebook we will only deal with time series sequences.

I have divided this notebook into two sections. In the first section, we will focus on understanding the structure of sequences and on generating training sets and batches from them. We will create a simple synthetic sequence dataset and then build its training set. Next, we will make batches using PyTorch DataLoaders and write a training pipeline. We will end this section by training an RNN on this data.
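The first-section pipeline can be sketched roughly as follows. This is a minimal sketch under my own assumptions, not the notebook's actual code: the synthetic data is a noisy sine wave, `make_windows` and `SimpleRNN` are hypothetical names, and the window length, batch size, hidden size, and learning rate are arbitrary. It splits the series into sliding windows (each window of 20 values is an input and the value right after it is the target), wraps them in a `TensorDataset`/`DataLoader` for mini-batching, and trains an `nn.RNN` that predicts from its last time step.

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(42)
np.random.seed(42)

# Assumed synthetic series: a sine wave plus Gaussian noise
t = np.linspace(0, 20 * np.pi, 1000)
series = np.sin(t) + 0.1 * np.random.randn(len(t))

def make_windows(series, window):
    """Hypothetical helper: split a 1D series into (window, next value) pairs."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # input: `window` consecutive values
        y.append(series[i + window])     # target: the value that follows
    # X -> (N, L, 1) for batch_first RNNs, y -> (N, 1)
    X = torch.as_tensor(np.array(X), dtype=torch.float32).unsqueeze(-1)
    y = torch.as_tensor(np.array(y), dtype=torch.float32).unsqueeze(-1)
    return X, y

X, y = make_windows(series, window=20)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

class SimpleRNN(nn.Module):
    """Hypothetical model: one RNN layer followed by a linear output layer."""
    def __init__(self, hidden_dim=16):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        out, _ = self.rnn(x)          # out: (N, L, hidden_dim)
        return self.fc(out[:, -1])    # predict from the last time step only

model = SimpleRNN()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(5):
    epoch_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * len(xb)   # accumulate sum of per-sample losses
    losses.append(epoch_loss / len(loader.dataset))
    print(f"epoch {epoch}: loss {losses[-1]:.4f}")
```

Using `batch_first=True` keeps tensors in (batch, sequence length, features) order, which matches how the windows are stacked in `make_windows`.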

In the second section, we will focus more on the internals of different neural network architectures for sequence problems. We will use stock price data and train multiple networks (RNN, GRU, LSTM, CNN) on it while exploring their features and behavior.
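As a rough preview of how these architectures differ at the API level, the sketch below passes the same random batch through PyTorch's `nn.RNN`, `nn.GRU`, `nn.LSTM`, and `nn.Conv1d`. The sizes are arbitrary assumptions chosen for illustration, not values from this notebook. Note that the LSTM returns a cell state in addition to the hidden state, and that `nn.Conv1d` expects channels before the sequence axis, so the input has to be permuted.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Arbitrary sizes chosen for illustration
N, L, F, H = 4, 30, 1, 8   # batch size, sequence length, features, hidden size
x = torch.randn(N, L, F)   # batch_first layout: (N, L, F)

rnn = nn.RNN(input_size=F, hidden_size=H, batch_first=True)
gru = nn.GRU(input_size=F, hidden_size=H, batch_first=True)
lstm = nn.LSTM(input_size=F, hidden_size=H, batch_first=True)
# Conv1d expects (N, C, L): the features act as input channels
conv = nn.Conv1d(in_channels=F, out_channels=H, kernel_size=3)

rnn_out, rnn_h = rnn(x)                # out: (N, L, H), h: (1, N, H)
gru_out, gru_h = gru(x)                # same shapes as the RNN; gating is internal
lstm_out, (lstm_h, lstm_c) = lstm(x)   # adds a cell state c: (1, N, H)
conv_out = conv(x.permute(0, 2, 1))    # (N, H, L - kernel_size + 1), no padding

for name, t in [("rnn_out", rnn_out), ("gru_h", gru_h),
                ("lstm_c", lstm_c), ("conv_out", conv_out)]:
    print(name, tuple(t.shape))
```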