# Introduction


In this sample, we're going to try to predict the upward and downward trends that exist in the Google stock price.
We implement this application by using a Recurrent Neural Network (RNN) with Keras.


Stock prices are often modelled as Brownian motion, under which future changes in the stock price are independent
of the past. So it's actually impossible to predict the future stock price exactly. Otherwise we would all become billionaires.

But it's actually possible to predict some trends.



We're going to train our model on five years of the Google stock price, from the beginning of 2012 to the end of 2016. The data files are given in the current directory. Then we will try to predict the stock price for the first month of 2017.

Again, we're only going to try to predict the upward or downward trend of the Google stock price.


# Look at the training data and test data

In [None]:
# Training data, given in "Google_Stock_Price_Train.csv", and its graph

![image.png](attachment:image.png)


As shown above, this is the training data of the Google stock price from 2012 to 2016. Using this training data, we're going to try to predict the Open Google stock price, that is, the price at the beginning of each financial day, for January 2017.



In [None]:
# Test data, given in "Google_Stock_Price_Test.csv", and its graph.
# This is what we have to predict.

![image.png](attachment:image.png)


As shown above, the Google stock price test data contains the same columns as the training data. However, we're only interested in the column highlighted in grey, the Open Google stock price for January 2017. Saturdays and Sundays are not included, which is why the data starts on January 3rd. As a result, we only have 20 days in this financial month.

# Data Preprocessing

In [None]:
 # Recurrent Neural Network

# Part 1 - Data Preprocessing

# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


In [None]:
# Importing the training set

dataset_train = pd.read_csv('Google_Stock_Price_Train.csv')
training_set = dataset_train.iloc[:, 1:2].values



Feature Scaling

In [None]:
# Two ways for feature scaling

![image.png](attachment:image.png)

In [None]:
# We will use "normalisation", so all the values of the training data will be in the range [0, 1]

In [None]:
# Feature Scaling

from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
training_set_scaled = sc.fit_transform(training_set)
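
To make the normalisation step concrete, here is a minimal sketch of the min-max formula, x' = (x - min) / (max - min), applied by hand with NumPy. The toy prices and the names `toy_prices` and `toy_scaled` are made up for illustration; they are not rows from the CSV file.

```python
import numpy as np

# Hypothetical opening prices (illustrative values, not taken from the dataset)
toy_prices = np.array([[300.0], [350.0], [400.0], [500.0]])

# Min-max normalisation, computed per column: x' = (x - min) / (max - min)
col_min = toy_prices.min(axis=0)
col_max = toy_prices.max(axis=0)
toy_scaled = (toy_prices - col_min) / (col_max - col_min)

print(toy_scaled.ravel())
```

The smallest price maps to 0 and the largest to 1, which is the behaviour we expect from `MinMaxScaler` with `feature_range = (0, 1)`.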

This is a supervised machine learning problem.

To organise the input data and its output data for training, we need to create a data structure with 60 timesteps and 1 output.


The 60 timesteps represent the past information from which our RNN is going to learn and understand
some correlations or trends. Based on this understanding, it tries to predict the next output,
the stock price at time (t+1).

Why 60 timesteps?

Based on experience:

One timestep was useless, as the model did not learn anything.
20 timesteps was not enough to capture trends. I then tried 30 to 40 timesteps, and eventually
the best number of timesteps I ended up with was 60.

The 60 timesteps correspond to the 60 previous financial days. Since there are about 20 financial days
in one month, 60 timesteps correspond to three months.

So each day we're going to look at the three previous months to try to predict the stock
price of the next day.

So we're going to have 60 timesteps and one output, which will be the stock price at time (t+1).


Now, we need to create two separate entities.
The first entity is X_train, the input of the neural network, and
the second entity is y_train, the output.

Because this is a special data structure, we need to build it for every time t.
So let's initialize X_train and y_train as empty lists to begin with.


In [None]:

# Creating a data structure with 60 timesteps and 1 output

X_train = []
y_train = []
for i in range(60, 1258): # five years of data give 1258 instances
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, 0])

# Convert them to NumPy arrays for the pre-processing steps before training
X_train, y_train = np.array(X_train), np.array(y_train)
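
To see what this sliding-window structure looks like, here is a small sketch on a toy series with a window of 3 timesteps instead of 60. The names `series`, `X_toy` and `y_toy` are made up for this illustration, and the series is just the numbers 0 to 9 standing in for scaled prices.

```python
import numpy as np

series = np.arange(10, dtype=float)  # stand-in for the scaled prices: 0.0 ... 9.0
timesteps = 3                        # the notebook uses 60

X_toy, y_toy = [], []
for i in range(timesteps, len(series)):
    X_toy.append(series[i - timesteps:i])  # the `timesteps` values before index i
    y_toy.append(series[i])                # the value to predict at index i
X_toy, y_toy = np.array(X_toy), np.array(y_toy)

print(X_toy.shape, y_toy.shape)  # one row per predictable time step
print(X_toy[0], y_toy[0])        # first window and its target
```

Each row of `X_toy` holds the values just before its target in `y_toy`, so a series of 10 values with a window of 3 gives 7 training samples.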



In [None]:
# Any time you want to add a dimension to a NumPy array, you can use the reshape function.


In [None]:
# Reshaping

# The last argument is the number of indicators (predictors).
# Here there is only one: the Open Google stock price.

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

# If you want to see more details about reshape, use help(np.reshape)

In [None]:
help(np.reshape)
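
As a quick illustration of the reshaping step, the sketch below turns a small 2D array of windows into the 3D shape (samples, timesteps, indicators) with a single indicator. The array `windows` is a made-up example with 4 samples and 3 timesteps.

```python
import numpy as np

# Toy 2D array: 4 samples, each with 3 timesteps
windows = np.arange(12, dtype=float).reshape(4, 3)

# Add a trailing dimension of size 1 for the single indicator
windows_3d = np.reshape(windows, (windows.shape[0], windows.shape[1], 1))

print(windows.shape, windows_3d.shape)

# The values are unchanged; only the shape differs.
# Indexing with np.newaxis gives the same result.
same = windows[..., np.newaxis]
print(np.array_equal(windows_3d, same))
```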

# Building the RNN

In [None]:

# Building the RNN

# Importing the Keras libraries and packages

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout


In [None]:

# Initialising the RNN, as we did when building the ANN and CNN samples before.
# This time we're predicting a continuous value,
# and therefore we call the neural network "regressor"

regressor = Sequential()


In [None]:
# Adding the first LSTM layer and some Dropout regularisation

regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

# As lectured, in order to address the "Vanishing Gradient problem", we use Long Short-Term Memory (LSTM) units.
# You may look at the LSTM() arguments with help(LSTM)


In [None]:
help(LSTM)

In [None]:
# Another class, Dropout()

# It randomly sets a fraction `rate` of the input units to 0 at each update during training.
# `rate` is between 0 and 1; here it is 0.2, so about 20% of the units are dropped.
# This helps to prevent overfitting.



In [None]:
# help(Dropout)
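
The idea behind dropout can be sketched in a few lines of NumPy. This is only an illustration of the technique, not Keras's actual implementation: each unit is zeroed with probability `rate`, and the surviving units are scaled by 1 / (1 - rate) so that the expected sum stays the same. The `activations` array and the seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
rate = 0.2                      # same dropout rate as in the notebook
activations = np.ones((1, 10))  # toy layer output: 10 units, all equal to 1

# Keep each unit with probability 1 - rate, drop it otherwise
keep_mask = rng.random(activations.shape) >= rate

# Zero the dropped units and scale the kept ones by 1 / (1 - rate)
dropped = activations * keep_mask / (1.0 - rate)

print(dropped)
```

Every entry of `dropped` is either 0 (a dropped unit) or 1.25 (a kept unit, scaled up). At prediction time no units are dropped.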

In [None]:

# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))


The next step is to add the output layer to our RNN.

Here, we use the Dense class.

We only have to pass one argument, units, the number of neurons in the layer. We need a single neuron because the output has only one dimension:
the stock price at time t+1, which is exactly what we have to predict as the output of the RNN.


In [None]:

# Adding the output layer
regressor.add(Dense(units = 1))


In [None]:

# Compiling the RNN, similar to the ANN and CNN samples
# I think you know the meaning of 'adam', as discussed before

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
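
For reference, the 'mean_squared_error' loss is the average of the squared differences between the true and the predicted values. Here is a minimal sketch with made-up scaled prices; `y_true` and `y_pred` are hypothetical values, not outputs of the model.

```python
import numpy as np

# Hypothetical scaled prices: true values and model predictions
y_true = np.array([0.50, 0.60, 0.70])
y_pred = np.array([0.40, 0.60, 0.90])

# Mean squared error: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)

print(mse)
```

Because the errors are squared, the prediction that is off by 0.2 contributes four times as much to the loss as the one that is off by 0.1.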

# Fitting the RNN 

In [None]:

# Fitting the RNN to the Training set

regressor.fit(X_train, y_train, epochs = 100, batch_size = 32) # 100, 32


# Why epochs = 100 and batch size = 32?

# You may start with smaller values, gradually increase them and see what happens.
# In order to achieve the best results, you need to tune the values of these parameters.

# It will take quite a while to train on the dataset.
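
To get a feeling for what these two parameters mean in practice, the sketch below counts the weight updates, assuming the 1258 training rows and 60 timesteps used above and that the last, smaller batch of each epoch is also processed.

```python
import math

n_samples = 1258 - 60  # number of training windows built earlier
batch_size = 32
epochs = 100

# One weight update per batch; the last batch of an epoch may be smaller
updates_per_epoch = math.ceil(n_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(n_samples, updates_per_epoch, total_updates)
```

A smaller batch size means more updates per epoch, and more epochs means more passes over the same data, so both choices directly affect the training time.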



So, we're getting closer to visualising our predictions against the real Google stock price for the first financial month of 2017.

But first we have to complete three steps:

Step 1: get the real Google stock price of January 2017.

Step 2: get the predicted Google stock price of January 2017.

Step 3: visualise the results.

In [None]:
# real data as shown before

![image.png](attachment:image.png)

# Making the predictions and visualising the results

In [None]:

# Getting the real stock price of 2017

dataset_test = pd.read_csv('Google_Stock_Price_Test.csv') # this is the test data
real_stock_price = dataset_test.iloc[:, 1:2].values




In order to predict each day's stock price in January 2017, we need the stock prices of the 60 financial days before that day.

For that we need both the training set and the test set, because those 60 days reach back into the 2016 training data.

So we're going to make a "concatenation", which links the training set, containing the real Google stock prices from 2012 to the end of 2016, together with the test set.

From this concatenation, we'll get the inputs of each prediction, that is, the 60 previous stock prices at each time t.


Then, we will scale them with the scaler that was fitted on the training set.



In [None]:
# Getting the predicted stock price of 2017

dataset_total = pd.concat((dataset_train['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)
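
The slicing above can be checked with a small index experiment. In this sketch each "price" is simply its own row index, so we can see which rows end up in the inputs. It assumes the 1258 training rows and 20 test rows mentioned earlier; `total` and `toy_inputs` are names made up for the illustration.

```python
import numpy as np

n_train, n_test, timesteps = 1258, 20, 60

# Stand-in for the concatenated Open prices: each value is its own row index
total = np.arange(n_train + n_test)

# Same slicing as in the notebook: the test rows plus the 60 rows before them
start = len(total) - n_test - timesteps
toy_inputs = total[start:]
print(start, len(toy_inputs))

# Window used to predict the first day of January 2017
first_window = toy_inputs[0:timesteps]
print(first_window[0], first_window[-1])

# Window used to predict the last day of January 2017
last_window = toy_inputs[n_test - 1:n_test - 1 + timesteps]
print(last_window[0], last_window[-1])
```

The first window consists only of the last 60 training rows, while the last window ends on the row just before the final test day.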

After gathering the right inputs with the right scaling for our predictions of the January 2017 stock prices,
we are ready to build the special 3D structure that the RNN expects, for training and also for predictions.


We will first build the structure and then do the reshaping.

So let's first make this special structure, where each row holds the 60 previous stock prices that we need to predict the next stock price.



In [None]:
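X_test = []
for i in range(60, 80): # 20 financial days in January 2017
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
# Reshape to the 3D structure expected by the RNN
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
# Convert the scaled predictions back to actual stock prices
predicted_stock_price = sc.inverse_transform(predicted_stock_price)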
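
The last line undoes the normalisation, so that the predictions are expressed as stock prices again. A minimal sketch of that inverse formula, x = x' * (max - min) + min, is shown below. The values of `train_min` and `train_max` are hypothetical; in the notebook the scaler uses the minimum and maximum it learned from the training set.

```python
import numpy as np

# Hypothetical minimum and maximum of the training prices (illustrative only)
train_min, train_max = 280.0, 820.0

# Toy scaled predictions in the range [0, 1]
scaled_predictions = np.array([[0.0], [0.5], [1.0]])

# Inverse of min-max normalisation: x = x' * (max - min) + min
restored = scaled_predictions * (train_max - train_min) + train_min

print(restored.ravel())
```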

# Visualising the results

In [None]:

# Visualising the results

plt.plot(real_stock_price, color = 'red', label = 'Real Google Stock Price')
plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()

