# Stock price prediction and forecasting using LSTM
Long Short-Term Memory networks, usually called “LSTMs”, are a special kind of recurrent neural network (RNN) capable of learning long-term dependencies.

## LSTM network for stock price prediction and forecasting

NOTE: The required data is stored in IPython's database.<br>

### Steps for creating LSTM network

Step 1 : Read the required data from the IPython database. The model is trained on the "open_value" of the stock.<br>
Step 2 : LSTMs are sensitive to the scale of the data, so apply MinMaxScaler.<br>
Step 3 : Pre-process the data and create the training and testing datasets. We used 60% of the historical data for training because this split gave the best performance.<br>
Step 4 : Create the LSTM network.<br>
Step 5 : Train and test the model and plot the results.<br>
Step 6 : Evaluate the performance of the network on the training and test datasets.<br>
Step 7 : Forecast the stock price for the next 30 days.<br>
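
Steps 2 and 3 can be illustrated on a toy series. The sketch below is a NumPy-only illustration, not the notebook's code: `prices` is made-up data, and the scaling is written out by hand using the formula that `MinMaxScaler` applies with its default `feature_range=(0, 1)`.

```python
import numpy as np

# Hypothetical toy series standing in for the "open_value" column
prices = np.array([10.0, 12.0, 11.0, 15.0, 20.0, 18.0, 16.0, 14.0, 19.0, 30.0])

# Min-max scaling to [0, 1]: (x - min) / (max - min)
p_min, p_max = prices.min(), prices.max()
scaled = (prices - p_min) / (p_max - p_min)

# Chronological 60/40 split: no shuffling, so the test set is the most recent data
train_size = int(len(scaled) * 0.6)
train, test = scaled[:train_size], scaled[train_size:]

# Inverting the scaling recovers the original prices
restored = scaled * (p_max - p_min) + p_min

print(len(train), len(test))
print(scaled.min(), scaled.max())
```

Keeping the split chronological matters for time series: the model is then evaluated on data that comes after everything it was trained on.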

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math

from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense,LSTM
from sklearn.metrics import mean_squared_error

In [None]:
# Read data, commenting this as it will be executed in Main
# %run PrepareData.ipynb

In [None]:
tsla_feature_df = tsla_stock_values_df["open_value"]
# tsla_feature_df.info()

In [None]:
# LSTMs are sensitive to the scale of the data, so we apply a MinMax scaler
minmax_scaler = MinMaxScaler(feature_range = (0,1))
tsla_feature_df = minmax_scaler.fit_transform(np.array(tsla_feature_df).reshape(-1,1))

In [None]:
# Pre-process the data into supervised-learning pairs: each X sample holds
# time_step consecutive values and the matching Y is the value that follows them
def create_model_dataset(dataset,time_step):
    XData, YData = [],[]
    # The extra -1 leaves the last possible window unused
    for i in range(len(dataset)-time_step-1):
        XData.append(dataset[i : i+time_step,0])
        YData.append(dataset[i+time_step,0])
    return np.array(XData),np.array(YData)
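
The windowing idea is easier to see on a tiny series. The following is a minimal sketch with a hypothetical helper, `make_windows`, and made-up data. Unlike the notebook's function, it works on a 1-D array and iterates over `len(series) - time_step` positions, so it also keeps the final window.

```python
import numpy as np

def make_windows(series, time_step):
    """Hypothetical helper: pair each run of `time_step` values with the value that follows it."""
    X, y = [], []
    for i in range(len(series) - time_step):
        X.append(series[i:i + time_step])
        y.append(series[i + time_step])
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)  # toy data: 0.0, 1.0, ..., 9.0
X, y = make_windows(series, time_step=3)

print(X[0], y[0])        # first window and its target
print(X.shape, y.shape)

# Keras LSTM layers expect 3-D input: (samples, time steps, features)
X_lstm = X.reshape(X.shape[0], X.shape[1], 1)
print(X_lstm.shape)
```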


In [None]:
train_data_size = int(len(tsla_feature_df) * 0.6)
test_data_size = len(tsla_feature_df) - train_data_size

# Chronological split: the first 60% for training, the remainder for testing
train_data, test_data = tsla_feature_df[:train_data_size,:],tsla_feature_df[train_data_size:,:]

# Build windows of time_step consecutive values (X) and the value that follows each window (Y)
time_step = 100
X_train, y_train = create_model_dataset(train_data,time_step)
X_test, y_test = create_model_dataset(test_data,time_step)


In [None]:
# Keras LSTM layers expect input shaped [samples, time steps, features]; here there is one feature per time step
X_train = X_train.reshape(X_train.shape[0],X_train.shape[1],1)
X_test = X_test.reshape(X_test.shape[0],X_test.shape[1],1)

In [None]:
# Create LSTM model
model = Sequential()
model.add(LSTM(50,return_sequences=True,input_shape=(time_step,1)))
model.add(LSTM(50,return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
model.compile(loss='mean_squared_error',optimizer= 'adam')
model.summary()

In [None]:
# Train the model
model.fit(X_train,y_train,validation_data=(X_test,y_test), epochs=100,batch_size=64,verbose=1)

In [None]:
# Make Predictions using trained model
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)


In [None]:
# Transform the predicted data back to the original price scale
train_predict = minmax_scaler.inverse_transform(train_predict)
test_predict = minmax_scaler.inverse_transform(test_predict)

In [None]:
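# Calculate RMSE for the training data in original price units
# (y_train is still scaled, so invert it before comparing with the unscaled predictions)
math.sqrt(mean_squared_error(minmax_scaler.inverse_transform(y_train.reshape(-1,1)),train_predict))

In [None]:
# Calculate RMSE for the test data in original price units
math.sqrt(mean_squared_error(minmax_scaler.inverse_transform(y_test.reshape(-1,1)),test_predict))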
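
An RMSE can only be read in price units when both of its arguments are on the same scale. The sketch below uses made-up numbers and NumPy only; `p_min`, `p_max`, `y_scaled`, and `pred_scaled` are hypothetical and are not taken from the notebook's data.

```python
import numpy as np

p_min, p_max = 100.0, 300.0  # hypothetical price range used for min-max scaling

def to_price(x):
    """Invert min-max scaling from [0, 1] back to price units."""
    return x * (p_max - p_min) + p_min

def rmse(a, b):
    """Root mean squared error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

y_scaled = np.array([0.10, 0.50, 0.90])     # made-up scaled targets
pred_scaled = np.array([0.12, 0.45, 0.95])  # made-up scaled predictions

rmse_scaled = rmse(y_scaled, pred_scaled)                      # error in scaled units
rmse_price = rmse(to_price(y_scaled), to_price(pred_scaled))   # error in price units
rmse_mixed = rmse(y_scaled, to_price(pred_scaled))             # mismatched scales

print(rmse_scaled, rmse_price, rmse_mixed)
```

With consistent scales, the price-unit error is the scaled error multiplied by the price range. The mixed value mostly reflects the price level rather than the prediction error.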

In [None]:
# Plot the actual data against the train and test predictions
# Each prediction needs the previous time_step (100) days as input, so the predictions are shifted by that offset for plotting

back_time = time_step
# shift train predictions

trainpredict_plt = np.empty_like(tsla_feature_df)
trainpredict_plt[:,:] = np.nan
trainpredict_plt[back_time: len(train_predict) + back_time ,: ] = train_predict

# shift test predictions
testpredict_plt = np.empty_like(tsla_feature_df)
testpredict_plt[:,:] = np.nan
testpredict_plt[len(train_predict)+(back_time*2)+1: len(tsla_feature_df)-1,: ] = test_predict

plt.plot(minmax_scaler.inverse_transform(tsla_feature_df),label="Historical Price")
plt.plot(trainpredict_plt, label="Train Predictions")
plt.plot(testpredict_plt,label="Test Predictions")
plt.ylabel("Open Price")
plt.xlabel("Number of days")
plt.legend()
plt.title("LSTM performance for TSLA stock historical data")
plt.savefig("Resources/Images/LSTMTrain.png")
plt.show()

In [None]:
# Seed the forecast with the last time_step scaled values of the test data
x_input = test_data[test_data_size-time_step:].reshape(1,-1)
x_moving_input = x_input[0].tolist()


In [None]:
# Predictions for the next 30 days: each prediction is appended to the input
# history and used when predicting the following day
num_days = 30
output_l = []
for _ in range(num_days):
    # Use the most recent time_step values as the model input
    x_input = np.array(x_moving_input[-time_step:]).reshape(1,time_step,1)
    yhat = model.predict(x_input, verbose=0)
    x_moving_input.extend(yhat[0].tolist())
    output_l.extend(yhat.tolist())
    
# print(output_l)
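
The forecasting loop above is a recursive, one-step-ahead scheme. The sketch below shows the same idea in isolation. `predict_next` is a hypothetical stand-in for `model.predict` that simply averages its window, and the history values are made up.

```python
import numpy as np

def predict_next(window):
    """Hypothetical stand-in for model.predict: returns the mean of the window."""
    return float(np.mean(window))

time_step = 3
history = [1.0, 2.0, 3.0]  # made-up last `time_step` observations
num_days = 4

window = list(history)
forecast = []
for _ in range(num_days):
    # Predict from the most recent `time_step` values, then feed the prediction back in
    yhat = predict_next(window[-time_step:])
    forecast.append(yhat)
    window.append(yhat)

print(forecast)
```

Because every forecast after the first is computed partly or wholly from earlier forecasts, any error in an early prediction carries into the later ones.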

In [None]:
day_new=np.arange(1,time_step+1)
day_pred=np.arange(time_step+1,time_step+num_days+1)

plt.plot(day_new,minmax_scaler.inverse_transform(tsla_feature_df[len(tsla_feature_df)-time_step:]),label="Historical Price")
plt.plot(day_pred,minmax_scaler.inverse_transform(output_l),label="Predicted Price")
plt.ylabel("Open Price")
plt.xlabel("Number of days")
plt.legend()
plt.title("TSLA stock historical & forecasted open price\nNext 30 days price forecasted using LSTM")
plt.savefig("Resources/Images/LSTMPredict.png")
plt.show()

# LSTM Analysis

The model's errors on the training and testing data are close: the root mean squared errors (RMSE) differ by 8.71.<br>
<br>
RMSE for the training data is 17.98.<br>
RMSE for the testing data is 26.69.<br>