# Apple Stock Prediction

In this notebook, we use a recurrent neural network to predict the Apple stock price. Throughout the notebook, you will see the steps for visualising the data, preparing it, and building the neural network.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import math
import matplotlib.pyplot as plt
import tensorflow as tf 

from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, GRU, Bidirectional

from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

## Visualize our data

In [None]:
df_apple = pd.read_csv("../input/apple-stock-data-updated-till-22jun2021/AAPL.csv")
df_apple.head()

In [None]:
# Set the Date format
df_apple['Date'] = pd.to_datetime(df_apple['Date'], format="%Y-%m-%d")

In [None]:
# Compute the mid price as the average of the daily high and low
high_prices = df_apple.loc[:,'High'].to_numpy()
low_prices = df_apple.loc[:,'Low'].to_numpy()
mid_prices = (high_prices + low_prices) / 2.0

print("Size of our data : ", len(mid_prices))

In [None]:
# Plot the mid price over time
from matplotlib.dates import DateFormatter
formatter = DateFormatter('%Y-%m-%d')

plt.figure(figsize = (18, 9))
# Plot against the parsed dates so the date formatter receives real dates
# rather than integer row positions
plt.plot(df_apple['Date'], mid_prices)
plt.xticks(rotation=45)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Average price', fontsize=18)
plt.title('Average price of Apple\'s shares')

# Apply the date format to the x-axis ticks
plt.gca().xaxis.set_major_formatter(formatter)

plt.show()

The Apple share price looks roughly exponential. Intuitively, if we wanted to extend the chart, we would continue the exponential curve.

However, we can't see into the future, so we don't know whether the price will keep growing or collapse.
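
One way to check the "roughly exponential" impression is to fit a straight line to the logarithm of the prices: an exponential curve becomes linear in log space. The sketch below is only an illustration and uses a synthetic series in place of `mid_prices`; the growth rate, noise level and variable names are assumptions, not values from the dataset.

```python
import numpy as np

# Synthetic stand-in for mid_prices: 5% growth per step plus mild noise (assumption)
rng = np.random.default_rng(0)
t = np.arange(200)
prices = 10.0 * np.exp(0.05 * t) * np.exp(rng.normal(0.0, 0.01, size=t.size))

# Fit log(price) = intercept + slope * t with ordinary least squares
slope, intercept = np.polyfit(t, np.log(prices), deg=1)

# R^2 of the linear fit in log space: close to 1 means "roughly exponential"
fitted = intercept + slope * t
residuals = np.log(prices) - fitted
r_squared = 1.0 - residuals.var() / np.log(prices).var()

print("Estimated growth rate per step:", slope)
print("R^2 in log space:", r_squared)
```

On the real data, the same fit could be run on `mid_prices` with `t = np.arange(len(mid_prices))`. A good fit on past data still says nothing about whether the trend will continue.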

## Create our data sets

Before building the neural network, we need to split the data into training, validation and test sets.

In [None]:
mid_prices = mid_prices.reshape(-1, 1)

# Split chronologically into training, validation and test sets
train_set = mid_prices[:7000]
valid_set = mid_prices[7000:8000]
test_set = mid_prices[8000:]
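
The fixed indices above assume the file has well over 8000 rows. As a hedged alternative, the split can be expressed as fractions of the series length while keeping the chronological order. The helper name `chronological_split` and the default fractions are hypothetical and not part of the original notebook.

```python
import numpy as np

def chronological_split(data, train_frac=0.7, valid_frac=0.1):
    """Split a time-ordered array into train/valid/test without shuffling."""
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    n = len(data)
    train_end = round(n * train_frac)
    valid_end = train_end + round(n * valid_frac)
    return data[:train_end], data[train_end:valid_end], data[valid_end:]

# Toy series standing in for mid_prices (assumption)
toy = np.arange(100, dtype=float).reshape(-1, 1)
train, valid, test = chronological_split(toy)
print(train.shape, valid.shape, test.shape)
```

Whatever the split, each set must stay longer than the input window used later, otherwise no sample can be generated from it.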

### Normalize the data

Once the sets are created, we need to scale the data between 0 and 1. Neural networks train more reliably on normalized inputs, and scaling also helps the network handle the price differences: Apple's share price today is far higher than its earliest values. Normalizing reduces that gap, which can make the model more accurate.

In [None]:
# Normalize our data: each set gets its own scaler, fitted on that set only
sc = MinMaxScaler(feature_range=(0, 1))
sc_valid = MinMaxScaler(feature_range=(0, 1))
sc_test = MinMaxScaler(feature_range=(0, 1))

train_set_scaled = sc.fit_transform(train_set)
valid_set_scaled = sc_valid.fit_transform(valid_set)
test_set_scaled = sc_test.fit_transform(test_set)
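
With the default `feature_range=(0, 1)`, min-max scaling maps the smallest value of a set to 0 and the largest to 1, and the inverse transform undoes it exactly. The following sketch reproduces the arithmetic by hand on toy values; the numbers are illustrative assumptions.

```python
import numpy as np

# Toy prices standing in for one of the sets (assumption)
prices = np.array([[2.0], [4.0], [10.0]])

# Min-max scaling to [0, 1]: x_scaled = (x - min) / (max - min)
p_min, p_max = prices.min(), prices.max()
scaled = (prices - p_min) / (p_max - p_min)

# Inverse transform: x = x_scaled * (max - min) + min
restored = scaled * (p_max - p_min) + p_min

print(scaled.ravel())
print(restored.ravel())
```

Because each set has its own minimum and maximum, the same scaled value corresponds to a different real price in the training, validation and test sets.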

### Generate our data

To predict the next price of the share, we generate input data and the corresponding output values. Each input is a window of previous prices, which the network uses as its reference, and the output is the price that follows the window.

In this notebook, I decided to use 300 days as the reference window. This number is arbitrary, and you can increase or reduce it as you like. Note that the network needs enough data to get a good view of the market. If, for example, you only show it the 10 previous prices, it could struggle to see the current market trend.

In [None]:
def generate_data(data, window_size_input = 300):
    
    X = []
    y = []

    for i in range(window_size_input, len(data)):

        # Get the data 
        X_data = data[i - window_size_input : i, 0]
        y_data = data[i, 0]

        X.append(X_data)
        y.append(y_data)
        
    return np.array(X), np.array(y)

# Each window covers 300 days
# You could change this parameter
WINDOW_SIZE_SEARCH = 300

X_train, y_train = generate_data(train_set_scaled, WINDOW_SIZE_SEARCH)
X_valid, y_valid = generate_data(valid_set_scaled, WINDOW_SIZE_SEARCH)

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
X_valid = np.reshape(X_valid, (X_valid.shape[0], X_valid.shape[1], 1))
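
To see what the windowing produces, here is a small worked example with a window of 3 on a toy series. It uses NumPy's `sliding_window_view` (available from NumPy 1.20) as an equivalent to the loop in `generate_data`; the toy values and the window size are assumptions chosen for readability.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy scaled series with a single column, standing in for train_set_scaled (assumption)
series = np.arange(6, dtype=float).reshape(-1, 1)   # values 0..5
window = 3

# All windows of length 3 over the single column: shape (4, 3)
windows = sliding_window_view(series[:, 0], window)

# Drop the last window (nothing follows it) and pair each remaining one with the next value
X = windows[:-1]          # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
y = series[window:, 0]    # [3, 4, 5]

# Add the trailing feature axis expected by recurrent layers: (samples, timesteps, 1)
X = X[..., np.newaxis]
print(X.shape, y.shape)
```

A series of length `n` therefore yields `n - window` samples, which is why a 1000-row validation set with a 300-day window gives 700 samples.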

## Model creation

For this model, we use stacked bidirectional GRU layers. The model takes the 300 previous days as input and outputs the next day.

Note: you could also create a model that predicts the next 5 days (or more) instead of only one. With the current approach, we only look at the short term. Another possibility is to work with months instead of days, which keeps the one-step-ahead approach but with a wider view.
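
For the multi-day variant mentioned in the note, the targets would have to contain several future values per window. The sketch below shows one possible way to build them; `generate_multi_step_data` and its `horizon` parameter are hypothetical names, not part of this notebook, and the final `Dense` layer would need `units` equal to `horizon`.

```python
import numpy as np

def generate_multi_step_data(data, window_size_input=300, horizon=5):
    """Hypothetical variant of generate_data: each target is the next `horizon` values."""
    X, y = [], []
    # Stop early enough that a full horizon of targets exists after each window
    for i in range(window_size_input, len(data) - horizon + 1):
        X.append(data[i - window_size_input : i, 0])
        y.append(data[i : i + horizon, 0])
    return np.array(X), np.array(y)

# Toy series standing in for a scaled set (assumption)
toy = np.arange(20, dtype=float).reshape(-1, 1)
X_multi, y_multi = generate_multi_step_data(toy, window_size_input=10, horizon=5)
print(X_multi.shape, y_multi.shape)
```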

In [None]:
def create_model(input_shape = 300):

    model = Sequential()
    model.add(Bidirectional(GRU(50, return_sequences=True, input_shape=(input_shape, 1))))
    model.add(Bidirectional(GRU(50, return_sequences=True)))
    model.add(Bidirectional(GRU(50, return_sequences=True)))
    model.add(Bidirectional(GRU(50)))
    model.add(Dropout(0.2))
    
    # Note: adjust units here to change the number of predicted values.
    model.add(Dense(units = 1))
    
    model.compile(optimizer='rmsprop', loss='mean_squared_error', metrics=['mae', 'mse'])
    
    return model 


model = create_model()
model.fit(X_train, 
          y_train, 
          epochs=5, 
          batch_size=32, 
          validation_data=(X_valid, y_valid))

### Predict on the validation set

Once the model is trained, we predict the next price for each window of the validation set.

To plot the predictions against the real prices, we add a padding of 300 values at the beginning. Since the model needs 300 inputs, it makes no prediction for the first 300 values.

In [None]:
price_prediction = model.predict(X_valid)

# Mean squared error, computed on the scaled (0 to 1) values, not on real prices
price_error = mean_squared_error(y_valid, price_prediction)

print("Price error on the validation set : ", price_error)

# Transform back our data for the real price
price_prediction_valid = sc_valid.inverse_transform(price_prediction)

# Add padding from our data
zeros = np.zeros([300])
price_prediction = np.concatenate((zeros, price_prediction_valid), axis=None)
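
Padding with zeros makes the plot show a flat line at 0 for the first 300 points. A possible alternative, sketched here with made-up predictions, is to pad with `NaN`, which matplotlib leaves blank when drawing a line. This is a variation on the cell above, not what the notebook does.

```python
import numpy as np

window = 300
# Stand-in for the inverse-transformed predictions, shape (n_predictions, 1) (assumption)
predictions = np.linspace(100.0, 120.0, num=700).reshape(-1, 1)

# Pad with NaN rather than 0: matplotlib skips NaN points when drawing a line,
# so the padded region stays empty instead of showing a flat line at zero
padding = np.full(window, np.nan)
padded = np.concatenate((padding, predictions), axis=None)

print(padded.shape)
```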

In [None]:
plt.figure(figsize=(15,15))
plt.plot(valid_set, color='red',label='Real Apple Stock Price')
plt.plot(price_prediction, color='blue',label='Predicted Apple Stock Price')
plt.title('Apple Stock Price Prediction on the validation set')
plt.xlabel('Time')
plt.ylabel('Apple Stock Price')
plt.legend()
plt.show()

### Predict on the test set

We now repeat the same steps on the test set.

In [None]:
X_test, y_test = generate_data(test_set_scaled, WINDOW_SIZE_SEARCH)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

price_prediction_test = model.predict(X_test)


price_error = mean_squared_error(y_test, price_prediction_test)
print("Price error on the test set : ", price_error)

# Transform back our data
price_prediction_test = sc_test.inverse_transform(price_prediction_test)

zeros = np.zeros([300])
price_prediction_test = np.concatenate((zeros, price_prediction_test), axis=None)

plt.figure(figsize=(15,15))
plt.plot(test_set, color='red',label='Real Apple Stock Price')
plt.plot(price_prediction_test, color='blue',label='Predicted Apple Stock Price')
plt.title('Apple Stock Price Prediction on the test set')
plt.xlabel('Time')
plt.ylabel('Apple Stock Price')
plt.legend()
plt.show()
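
The errors printed above are mean squared errors on scaled values, so they are not in price units. Since min-max scaling is linear, an error in dollars can be recovered by multiplying the scaled RMSE by the range (max minus min) of the set. The sketch below checks this on toy numbers, which are assumptions; in the notebook the range of the test set should be available as `sc_test.data_range_`.

```python
import numpy as np

# Toy real prices and predictions in dollars (assumption, not the notebook's data)
real = np.array([100.0, 110.0, 120.0, 130.0])
pred = np.array([102.0, 108.0, 123.0, 129.0])

# Min-max scaling with the min and max of the real prices
p_min, p_max = real.min(), real.max()
real_scaled = (real - p_min) / (p_max - p_min)
pred_scaled = (pred - p_min) / (p_max - p_min)

# MSE on scaled values, as printed in the cells above
mse_scaled = np.mean((real_scaled - pred_scaled) ** 2)

# The scaling is linear, so errors shrink by a factor of (max - min);
# multiplying the scaled RMSE by that range gives the RMSE in dollars
rmse_price = np.sqrt(mse_scaled) * (p_max - p_min)
rmse_direct = np.sqrt(np.mean((real - pred) ** 2))

print(rmse_price, rmse_direct)
```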

## Try to predict the next days

One question we could ask is:

Is it possible to use our model recursively to see the future? Since our model can see tomorrow, how far ahead can it see?

To answer this question, we take our trained model and give it the first 300 days of the test set. Then we append each prediction to the input data and repeat until the end of the test set.

In [None]:
print("Size of our test set : ", len(test_set_scaled))

# Use the first 300 days of the test set as the initial input
input_data = X_test[0]

output = input_data

# One prediction per remaining day of the test set
for _ in range(300, len(test_set_scaled)):
    
    # Get the input data
    X_input = np.reshape(output[-300:], (1, 300, 1))
    
    # Make the prediction
    pred = model.predict(X_input)    
    
    # Add the prediction to our input data
    output = np.concatenate((output, pred))
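
The mechanics of this loop can be seen on a tiny example. In the sketch below, `predict_next` is a hypothetical stand-in for `model.predict` (it simply returns the mean of the window), and the window size and values are assumptions. The point is only to show how predictions are fed back in as inputs.

```python
import numpy as np

def predict_next(window):
    """Hypothetical stand-in for model.predict: returns the mean of the window."""
    return window.mean()

window_size = 5
# Toy scaled history standing in for X_test[0] (assumption)
history = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
n_steps = 10

output = history.copy()
for _ in range(n_steps):
    # Feed the most recent window, which increasingly contains predictions
    next_value = predict_next(output[-window_size:])
    output = np.append(output, next_value)

# After window_size steps the input consists only of the model's own predictions
print(output.shape)
```

With the real model and a 300-day window, the input is made up entirely of predicted values after 300 steps, so any error in an early prediction is carried into every later one.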


In [None]:

price_error = mean_squared_error(y_test, output[300:])
print("Price error on the test set (recursive prediction) : ", price_error)
    
# Transform our data to the real price value
output_price_prediction = sc_test.inverse_transform(output)

In [None]:
plt.figure(figsize=(15,15))
plt.plot(test_set, color='red',label='Real Apple Stock Price')
plt.plot(output_price_prediction, color='blue',label='Predicted Apple Stock Price')
plt.title('Apple Stock Price Prediction On Consecutive Day')
plt.xlabel('Time')
plt.ylabel('Apple Stock Price')
plt.legend()
plt.show()

As you can see from the graph above, the predicted values don't reflect the real prices.

I created this graph to show that our model is built to predict tomorrow's price, not the prices of the following days. Each recursive step feeds the model's own predictions back as input, so errors accumulate. If we want to predict a larger range of days, we have to increase the number of outputs of our model.
Another approach could be to predict by month instead of by day.

So please, if you use this kind of model, make sure you know in advance what you want to predict and how far ahead.

If you have questions, don't hesitate to ask. Also, if you know of interesting documentation or notebooks that explain stock prediction, feel free to leave a comment and share them with us.

In addition, the references include a link to a notebook that inspired this one. Don't hesitate to check it out.

I hope this notebook helps some of you.

## References

https://www.kaggle.com/subbhashit/time-series-prediction-a-complete-guide