# Simple Stock Price Prediction Model
Here I implement a simple multilayer perceptron (MLP) neural network for stock price prediction. My GitHub profile is htjames0, and all code and files can be found in the repository.

In [2]:
import tensorflow as tf
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt


In [3]:
data = pd.read_csv('AMD.csv')
#features will be Open, High, Low, and Volume
#label will be Close

#dividing the data set into train and test data
train_end = data[data.Date=='2021-01-04'].index[0]
train_data = data.iloc[:train_end,:]
test_data = data.iloc[train_end:,:]
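
The lookup above assumes that 2021-01-04 appears in the Date column; if it does not, `.index[0]` raises an IndexError. A minimal sketch of an alternative split that compares dates instead, using a small made-up frame in place of AMD.csv (the values and the `split_by_date` helper are hypothetical):

```python
import pandas as pd

def split_by_date(data, split_date):
    """Rows dated before split_date go to train, the rest to test."""
    dates = pd.to_datetime(data["Date"])
    cutoff = pd.Timestamp(split_date)
    train = data[dates < cutoff]
    test = data[dates >= cutoff]
    return train, test

# Made-up rows for illustration only, not real AMD prices
toy = pd.DataFrame({
    "Date": ["2020-12-30", "2020-12-31", "2021-01-04", "2021-01-05"],
    "Close": [1.0, 2.0, 3.0, 4.0],
})

# 2021-01-02 is not a row in the frame, yet the split still works
train_toy, test_toy = split_by_date(toy, "2021-01-02")
print(len(train_toy), len(test_toy))
```

Either way the split stays chronological, so the test rows are all later than the training rows.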

# Data Methods

Training Data Method: 
This method takes the data together with the parameters window and consecutive, and returns the training sets x_train and y_train. The window parameter is an integer that sets how many days to look back in order to predict the stock price for the next day. The consecutive parameter is a boolean that selects how the training data is generated. If it is False, a window of fixed size slides along in time, advancing by 1 day each iteration, much like a convolution kernel sliding over an image. If it is True, the window keeps its starting point and grows by one day each iteration, so the training data accumulates over consecutive days and gives the network a long-term view. The comments in the code describe each step in more detail.
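
To make the two modes concrete, here is a minimal sketch on a toy list of integers standing in for daily rows. The helper name `make_windows` is hypothetical and only illustrates the indexing, not the full method.

```python
def make_windows(series, window, consecutive=False):
    """Return (inputs, labels) for next-day prediction."""
    inputs, labels = [], []
    for i in range(window, len(series)):
        # Sliding mode: the last `window` items. Consecutive mode: everything so far.
        start = 0 if consecutive else i - window
        inputs.append(series[start:i])
        labels.append(series[i])
    return inputs, labels

days = [10, 11, 12, 13, 14]

sliding_x, sliding_y = make_windows(days, window=3)
growing_x, growing_y = make_windows(days, window=3, consecutive=True)

print(sliding_x)  # [[10, 11, 12], [11, 12, 13]]
print(growing_x)  # [[10, 11, 12], [10, 11, 12, 13]]
print(sliding_y)  # [13, 14]
```

Because the growing windows differ in length, they cannot be stacked into a single rectangular array without padding or truncation.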

In [4]:
def getTrainData(data, window=90, consecutive=False):

    #need to divide training data to account for the prediction days
    #i.e. how many days to look back in order to predict the next day
    #I plan to experiment with several windows - 15, 30, 45, 90 but 
    #starting out with 90
    days = window
    
    #manipulating data so to add a correct label for the features
    #by creating a 3D array

    #initializing feature and label arrays
    x_train = []
    y_train = []

    #iterating from the start of the window to the length of the data 
    #to get a window of days for data points and making the label the
    #Close price of the day right after that specific window
    #using the data argument rather than the global train_data
    if not consecutive:
        for i in range(days, len(data)):
            x_train.append(data.iloc[i-days:i,[1,2,3,6]])
            y_train.append(data.iloc[i,4])
    
    #looking back x days to start with to predict the next day but after the 
    #first iteration the number of days to look back in order to predict the next
    #day grows by 1    
    else: 
        for i in range(days, len(data)):
            x_train.append(data.iloc[0:i,[1,2,3,6]])
            y_train.append(data.iloc[i,4])   

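    #converting to numpy array     
    y_train = np.array(y_train)

    #the growing windows of consecutive mode have different lengths and
    #cannot be stacked into one 3D array, so they stay a list of 2D arrays
    if consecutive:
        x_train = [np.asarray(w) for w in x_train]
    else:
        #3D array in python - (layer, row, column)
        #layer is each iteration of the loop, 1169
        #row is number of days used to predict next day, 90
        #column is the feature, Open, High, Low, or Volume 
        x_train = np.array(x_train)
        x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 4))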
    
    return x_train, y_train

Test Data Method: This method takes the parameters data and window and returns x_test and y_test, which are used to test how accurately the model predicts the stock price of the next day. The window parameter is an integer that sets how many days to look back in order to predict the stock price for the next day. Data is the pre-partitioned stock price data. Within the method, the data is arranged in the same format as the training data, i.e. a 3D array with layers, rows, and columns. Each layer is one iteration of the loop, which looks back window days to predict the price of the next day. The rows are the days in the window, so their number equals window. The columns are the features that are fed into the model.

In [None]:
def getTestData(data, window=90):
    #not implemented yet
    raise NotImplementedError
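
The method body is not given here. A minimal sketch of what such a method could look like, assuming it mirrors the sliding-window branch of getTrainData and that the columns are ordered Date, Open, High, Low, Close, Adj Close, Volume (an assumption based on the column indices used above). The name `get_test_data_sketch` and the toy frame are hypothetical, and this is not the author's implementation.

```python
import numpy as np
import pandas as pd

def get_test_data_sketch(data, window=90):
    """Build (x_test, y_test) with a fixed-size sliding window."""
    feature_cols = [1, 2, 3, 6]   # assumed: Open, High, Low, Volume
    label_col = 4                 # assumed: Close
    x_test, y_test = [], []
    for i in range(window, len(data)):
        # The `window` rows before day i are the inputs, day i's Close is the label
        x_test.append(data.iloc[i - window:i, feature_cols].to_numpy(dtype=float))
        y_test.append(data.iloc[i, label_col])
    # Shape: (samples, window, features); the reshape keeps an empty result 3D too
    x_test = np.array(x_test, dtype=float).reshape(-1, window, len(feature_cols))
    y_test = np.array(y_test, dtype=float)
    return x_test, y_test

# Toy frame with made-up numbers, only to show the resulting shapes
n = 6
toy = pd.DataFrame({
    "Date": [f"2021-01-{d:02d}" for d in range(4, 4 + n)],
    "Open": np.arange(n, dtype=float),
    "High": np.arange(n, dtype=float) + 1,
    "Low": np.arange(n, dtype=float) - 1,
    "Close": np.arange(n, dtype=float) + 0.5,
    "Adj Close": np.arange(n, dtype=float) + 0.5,
    "Volume": np.arange(n) * 100,
})

x_test, y_test = get_test_data_sketch(toy, window=2)
print(x_test.shape, y_test.shape)
```

Note that with this layout the first `window` rows of the test partition receive no prediction, since each label needs `window` prior days; prepending the tail of the training data would be one way around that.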

In [5]:
def MLPmodel(features, step_size, hidden_neurons=5, act='relu', loss_fxn='mean_squared_error'):
    
    #method inputs
        #features - feature data used to get the number of input layer nodes
        #step_size - learning rate for the optimizer
        #hidden_neurons - number of neurons in hidden layer, default of five
        #act - activation function of the output layer, default ReLU function
        #loss_fxn - loss function, default as mean squared error
    
    #defining model
    model = tf.keras.models.Sequential()
   
    #layers
    #input_shape must be a tuple; the last axis of features holds the feature columns
    model.add(tf.keras.layers.Dense(units=hidden_neurons,
                                    input_shape=(features.shape[-1],),
                                    activation='relu')
                                   )
    model.add(tf.keras.layers.Dense(units = 1,
                                    activation=act)
                                   )
    
    #optimizer
    opt = tf.keras.optimizers.Adam(learning_rate=step_size)
    
    #compile - loss fxn MSE
    model.compile(optimizer=opt,
                  loss=loss_fxn,
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
   
    return model 
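
For intuition, the network defined above has one hidden layer followed by a single output unit. A minimal NumPy sketch of the forward pass and of the RMSE metric it reports, using random weights rather than trained ones (the helper names `forward` and `rmse` are hypothetical, and ReLU is assumed for both layers as in the defaults):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, w1, b1, w2, b2):
    """x: (samples, features) -> predictions of shape (samples, 1)."""
    hidden = relu(x @ w1 + b1)      # hidden layer
    return relu(hidden @ w2 + b2)   # single output unit

def rmse(y_true, y_pred):
    """Root mean squared error, the square root of the MSE loss."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(0)
n_features, hidden_neurons = 4, 5
w1 = rng.normal(size=(n_features, hidden_neurons))
b1 = np.zeros(hidden_neurons)
w2 = rng.normal(size=(hidden_neurons, 1))
b2 = np.zeros(1)

x = rng.normal(size=(8, n_features))
y_pred = forward(x, w1, b1, w2, b2)
print(y_pred.shape)
print(rmse(np.array([1.0, 3.0]), np.array([1.0, 1.0])))
```

With 4 features and 5 hidden neurons this amounts to 4*5 + 5 + 5*1 + 1 = 31 trainable parameters.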

In [39]:
#train function
def train_model(model, feature, label, epochs, batch_size):

    #feeding features and labels to model, model iterates epoch number of times
    #using batch_size number of data points per iteration
    history = model.fit(x=feature,
                        y=label,
                        batch_size=batch_size,
                        epochs=epochs)

    #weights and bias of the first (hidden) layer
    #get_weights returns a list ordered [kernel, bias] for each layer
    trained_weight = model.get_weights()[0]
    trained_bias = model.get_weights()[1]

    #historical data of model for each epoch
    epochs = history.epoch
    hist = pd.DataFrame(history.history)

    #rmse for each epoch
    rmse = hist["root_mean_squared_error"]

    return trained_weight, trained_bias, epochs, rmse

In [13]:
#predict function
def predict():
    #not implemented yet
    raise NotImplementedError