<p style="color:#153462; 
          font-weight: bold; 
          font-size: 30px; 
          font-family: Gill Sans, sans-serif; 
          text-align: center;">
          RNN Implementation</p>

<p style="text-align: justify; text-justify: inter-word;">
   <font size=3>
       In this notebook we use a Recurrent Neural Network (RNN) built from LSTM layers to predict the stock price of Google.
   </font>
</p>

### <span style="color:#3C4048; font-weight: bold; font-size: 18px; font-family: Gill Sans, sans-serif;">Importing Required Packages</span>

In [1]:
import pandas as pd
import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

### <span style="color:#3C4048; font-weight: bold; font-size: 18px; font-family: Gill Sans, sans-serif;">Data Reading</span>

In [2]:
start_date = '2017-01-01'
end_date = '2022-12-31'
df = yf.download('GOOG', start=start_date, end=end_date)

[*********************100%%**********************]  1 of 1 completed


In [3]:
df.head()

Date,Open,High,Low,Close,Adj Close,Volume
2017-01-03,38.940498,39.481499,38.790001,39.306999,39.306999,33146000
2017-01-04,39.417999,39.567001,39.158001,39.345001,39.345001,21460000
2017-01-05,39.304001,39.723999,39.250999,39.701,39.701,26704000
2017-01-06,39.763,40.395,39.610199,40.307499,40.307499,32804000
2017-01-09,40.32,40.498299,40.141499,40.3325,40.3325,25492000


In [4]:
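# Select the first column (Open) with a 0:1 slice rather than index 0, so the
# result stays 2D with shape (n_samples, 1), the layout MinMaxScaler expects.
training_set = df.iloc[:, 0:1].to_numpy()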
training_set

array([[38.94049835],
       [39.41799927],
       [39.30400085],
       ...,
       [87.5       ],
       [87.02999878],
       [87.36499786]])
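The difference between slicing with `0:1` and indexing with `0` can be seen on a small array. The sketch below uses plain NumPy and a few rounded Open/High values from the table above; the variable names are illustrative only.

```python
import numpy as np

# Three rows of (Open, High), rounded from the first rows of the table above
prices = np.array([
    [38.94, 39.48],
    [39.42, 39.57],
    [39.30, 39.72],
])

column_2d = prices[:, 0:1]  # slice: keeps the column axis
column_1d = prices[:, 0]    # integer index: drops the column axis

print(column_2d.shape)  # (3, 1)
print(column_1d.shape)  # (3,)
```

The same rule applies to `df.iloc`: a slice returns a DataFrame (2D), while a single integer returns a Series (1D).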

In [5]:
# Scale the data so that every value lies in the range [0, 1]
sc = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = sc.fit_transform(training_set)
training_set_scaled

array([[0.        ],
       [0.00422855],
       [0.00321903],
       ...,
       [0.43002314],
       [0.425861  ],
       [0.42882762]])
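For a `(0, 1)` feature range, min-max scaling amounts to `(x - min) / (max - min)` per column, and it can be undone by reversing the arithmetic. The following is a minimal NumPy sketch of that formula, not scikit-learn's implementation; `min_max_scale` and `min_max_invert` are hypothetical helper names, and the input is only the first five Open values, so the scaled numbers differ from the output above.

```python
import numpy as np

def min_max_scale(values):
    """Scale each column of a 2D array to [0, 1]; also return the min and max used."""
    lo = values.min(axis=0)
    hi = values.max(axis=0)
    scaled = (values - lo) / (hi - lo)
    return scaled, lo, hi

def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original price range."""
    return scaled * (hi - lo) + lo

# First five Open prices from df.head()
opens = np.array([[38.940498], [39.417999], [39.304001], [39.763], [40.32]])

scaled, lo, hi = min_max_scale(opens)
restored = min_max_invert(scaled, lo, hi)

print(scaled.ravel())
print(np.allclose(restored, opens))
```

Keeping the fitted scaler (`sc` in this notebook) matters for the same reason: predictions made on scaled data have to be mapped back to prices with `sc.inverse_transform`.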

### <span style="color:#3C4048; font-weight: bold; font-size: 18px; font-family: Gill Sans, sans-serif;">Creating Dataset</span>

Creating a data structure with 60 timesteps and 1 output: each sample holds the previous 60 scaled prices, and its target is the price that follows them.

In [6]:
# A small demonstration of the windowing scheme, using a window of 2 instead of 60
X_demo = []
y_demo = []
nums = list(range(0, 8)) # 0 to 7 numbers
for i in range(2, len(nums)):
    X_demo.append(nums[i-2:i])
    y_demo.append(nums[i])

In [7]:
X_demo

[[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]

In [8]:
y_demo

[2, 3, 4, 5, 6, 7]
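The same windows can also be built without an explicit loop. The sketch below is one possible vectorized variant, assuming NumPy 1.20 or later (for `sliding_window_view`); `make_windows` is a hypothetical helper name, not part of this notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, window):
    """Return (X, y): X[k] holds `window` consecutive values, y[k] is the value right after them."""
    series = np.asarray(series)
    # Every contiguous run of length `window`; the last run has no following value, so drop it
    X = sliding_window_view(series, window)[:-1]
    y = series[window:]
    return X, y

X_win, y_win = make_windows(range(8), window=2)
print(X_win.tolist())  # [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
print(y_win.tolist())  # [2, 3, 4, 5, 6, 7]
```

The result matches `X_demo` and `y_demo` above. With a window of 60 on a 1D series of scaled prices, the same helper would yield inputs of shape `(n - 60, 60)`.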

In [9]:
# Apply the same strategy to the scaled dataset
X_train = []
y_train = []
for i in range(60, len(training_set_scaled)):
    # The 60 rows before position i (column 0) form one input sequence of 60 elements.
    X_train.append(training_set_scaled[i-60:i, 0])
    # The target is the scaled price at position i, not the index i itself.
    y_train.append(training_set_scaled[i, 0])

In [10]:
X_train = np.array(X_train)
y_train = np.array(y_train)

In [11]:
print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of y_train: {y_train.shape}")

Shape of X_train: (1450, 60)
Shape of y_train: (1450,)


In [12]:
X_train.shape

(1450, 60)

In [13]:
# Reshaping the training array to (samples, timesteps, features)
X_train=np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_train.shape

(1450, 60, 1)

### <span style="color:#3C4048; font-weight: bold; font-size: 18px; font-family: Gill Sans, sans-serif;">Building the RNN</span>

In [14]:
# Initializing the RNN
regressor = Sequential()

The parameters passed to the <b>LSTM</b> layer are described below:
<p style="text-align: justify; text-justify: inter-word;">
   <font size=3>
       <b>units=50:</b><br>
       This parameter specifies the number of LSTM units (or neurons) in the layer. In this case, there are 50 LSTM units in the layer. 
       These units are responsible for capturing and learning patterns in the sequential data.
       <br><br>
       <b>return_sequences=True:</b><br>
       This parameter determines whether to return the full sequence of outputs for each timestep or just the last output. When return_sequences
       is set to True, the layer will output the full sequence, and when set to False (or omitted, as it defaults to False), only the last output
       for each input sequence will be returned.
       <br><br>
       <b>input_shape=(X_train.shape[1], 1):</b><br>
       This parameter specifies the shape of the input data that the LSTM layer will receive. In this case, the input shape is set to
       (X_train.shape[1], 1), where X_train.shape[1] represents the number of timesteps in each input sequence, and 1 represents the number
       of features at each timestep. The reshaping of the input data is done using np.reshape to match this input shape requirement.
   </font>
</p>
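To make the effect of `units`, `return_sequences` and the `(timesteps, features)` input shape concrete, here is a minimal NumPy sketch of an LSTM forward pass. It is an illustration under simplifying assumptions (random weights, zero initial state, no training) and not Keras's implementation; `lstm_forward` and `sigmoid` are hypothetical helper names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, units, return_sequences=False, seed=0):
    """Run a randomly initialised LSTM over x of shape (batch, timesteps, features)."""
    rng = np.random.default_rng(seed)
    batch, timesteps, features = x.shape
    # Weights for the four gates (input, forget, candidate, output), stacked side by side
    W = rng.normal(scale=0.1, size=(features, 4 * units))  # input weights
    U = rng.normal(scale=0.1, size=(units, 4 * units))     # recurrent weights
    b = np.zeros(4 * units)
    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    outputs = []
    for t in range(timesteps):
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # one output per timestep
    return h                              # only the last timestep

# A batch of 4 sequences, each with 60 timesteps and 1 feature
x = np.random.default_rng(1).random((4, 60, 1))
full = lstm_forward(x, units=50, return_sequences=True)
last = lstm_forward(x, units=50, return_sequences=False)

print(full.shape)  # (4, 60, 50)
print(last.shape)  # (4, 50)
```

A following LSTM layer needs a 3D input, which is why every LSTM layer except the last one in a stack is created with `return_sequences=True`.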


In [15]:
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))

<p style="text-align: justify; text-justify: inter-word;">
   <font size=3>
       The <b>Dropout</b> layer applies a regularization technique used in neural networks to reduce overfitting. Overfitting occurs when a model
       learns the training data too well, including its noise and specific patterns that may not generalize to new, unseen data. Dropout
       addresses this by randomly "dropping out" (i.e., setting to zero) a fraction of its input units during training; with a rate of 0.2, about 20%
       of the units are dropped at each training step. This prevents the network from relying too heavily on specific neurons and helps it generalize better to new data.
   </font>
</p>
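The mechanism can be sketched in a few lines of NumPy. This is an illustrative version of "inverted" dropout, in which the surviving values are scaled up by `1 / (1 - rate)` during training and the input passes through unchanged at inference time; the function name `dropout` and its arguments are assumptions made for this example.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Zero a random fraction `rate` of x during training and rescale the rest."""
    if not training or rate == 0.0:
        return x  # dropout is inactive at inference time
    keep = rng.random(x.shape) >= rate  # True for the units that survive
    return x * keep / (1.0 - rate)      # rescale so the expected sum is unchanged

rng = np.random.default_rng(0)
activations = np.ones(1000)

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
infer_out = dropout(activations, rate=0.2, training=False, rng=rng)

print(np.unique(train_out))     # values are either 0 or 1 / (1 - 0.2)
print((train_out == 0).mean())  # fraction dropped, roughly 0.2
```

Because the mask is drawn afresh at every training step, no single unit can be relied on consistently, which is the source of the regularizing effect.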

In [16]:
regressor.add(Dropout(0.2))

In [17]:
# Adding two more LSTM layers with return_sequences=True
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

In [18]:
# Final LSTM layer without return_sequences, so only the last timestep's output is passed on
regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))

In [19]:
# Adding the output layer: a single unit that predicts one (scaled) price
regressor.add(Dense(units=1))