# Predicting stock prices using MLP and LSTM models

### Part 1: Introduction

Predicting stock market prices is challenging because prices are highly volatile, and that same volatility is what makes accurate prediction so valuable to investors. Even now, some investors combine technical and fundamental analysis to make better decisions about their equity investments.

Previous studies have compared linear regression models with artificial neural networks (ANNs) and found that ANNs yielded much higher profits. The main reason for this difference is that ANNs can capture non-linear patterns. Because stock markets are chaotic and non-linear, ANNs have become a popular choice for stock prediction.

For this project, we've decided to investigate several ANN models and compare them to see which performs best. In particular, we look at multilayer perceptron (MLP) and long short-term memory (LSTM) models.

### Part 2: Methodology

#### Section 1: Multilayer perceptron model (Model 1)

The first model is a multilayer perceptron. It uses Meta's daily opening, highest, and lowest prices and trading volume to predict the closing price on the same day.

Figure [NO] shows the simple structure of our model, which consists of one input layer, one hidden layer, and one output layer. The forward and backward propagation for this model can be expressed as follows:
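One standard way to write these passes is given below. It is a sketch under our own assumptions, since the loss and hidden-layer size are not fixed here: a single training example $\mathbf{x} \in \mathbb{R}^4$ (open, high, low, volume) with target closing price $y$, one hidden layer of $H$ ReLU units with weights $W_1 \in \mathbb{R}^{H \times 4}$ and bias $\mathbf{b}_1$, a linear output with weights $\mathbf{w}_2 \in \mathbb{R}^{H}$ and bias $b_2$, and a squared-error loss $L$.

Forward propagation:

$$
\mathbf{z} = W_1 \mathbf{x} + \mathbf{b}_1, \qquad
\mathbf{h} = \max(\mathbf{0}, \mathbf{z}), \qquad
\hat{y} = \mathbf{w}_2^{\top} \mathbf{h} + b_2, \qquad
L = (\hat{y} - y)^2
$$

Backward propagation and the stochastic gradient descent update for every parameter $\theta$:

$$
\frac{\partial L}{\partial b_2} = 2(\hat{y} - y), \qquad
\frac{\partial L}{\partial \mathbf{w}_2} = 2(\hat{y} - y)\,\mathbf{h}, \qquad
\boldsymbol{\delta} = \frac{\partial L}{\partial \mathbf{z}} = 2(\hat{y} - y)\,\mathbf{w}_2 \odot \mathbb{1}[\mathbf{z} > 0]
$$

$$
\frac{\partial L}{\partial W_1} = \boldsymbol{\delta}\,\mathbf{x}^{\top}, \qquad
\frac{\partial L}{\partial \mathbf{b}_1} = \boldsymbol{\delta}, \qquad
\theta \leftarrow \theta - \eta\,\frac{\partial L}{\partial \theta}
$$

Here $\mathbb{1}[\mathbf{z} > 0]$ is the element-wise indicator (the derivative of ReLU), $\odot$ is element-wise multiplication, and $\eta$ is the learning rate.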

For all our MLP models, we use fully connected (linear) layers with the ReLU activation function. Our models are simple, and we found that fully connected layers suit them well: because every neuron in one layer is connected to every neuron in the next, every weight receives a gradient and is updated during backpropagation.

We chose the ReLU activation function (AF) after weighing the advantages and limitations of the alternatives. Its main advantage is that it speeds up the convergence of stochastic gradient descent compared with saturating functions such as sigmoid and tanh. Its main drawback is that its gradient is zero for negative inputs, so a neuron whose input stays negative stops updating its weights during backpropagation. Stock prices can be low but are never negative. Note, however, that ReLU acts on each neuron's pre-activation (the weighted sum of its inputs plus a bias), which can be negative even when every input is positive, so non-negative prices alone do not rule out this problem.
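The architecture described above can be sketched in plain NumPy. This is an illustration only, not our project code: the hidden size of 8, He-style initialisation, mean-squared-error loss, full-batch gradient descent, and the class name `TinyMLP` are all our own assumptions, and the data below is synthetic rather than real Meta prices.

```python
import numpy as np

# Hypothetical sketch of Model 1: 4 inputs (open, high, low, volume)
# -> n_hidden ReLU units -> 1 linear output (the closing price).
# Assumptions: MSE loss, plain gradient descent, inputs already scaled to [0, 1].

def relu(z):
    return np.maximum(0.0, z)

class TinyMLP:
    def __init__(self, n_in=4, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        # He-style initialisation, a common choice for ReLU layers
        self.W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=n_hidden)
        self.b2 = 0.0

    def forward(self, X):
        # X has shape (batch, n_in); cache values needed for backpropagation
        self.X = X
        self.Z = X @ self.W1.T + self.b1      # pre-activations, (batch, n_hidden)
        self.H = relu(self.Z)                 # hidden activations
        return self.H @ self.w2 + self.b2     # predictions, (batch,)

    def sgd_step(self, X, y, lr=0.05):
        y_hat = self.forward(X)
        err = y_hat - y
        loss = np.mean(err ** 2)
        d_yhat = 2.0 * err / len(y)           # dL/dy_hat for the mean loss
        grad_w2 = self.H.T @ d_yhat
        grad_b2 = d_yhat.sum()
        # ReLU only passes gradient where its pre-activation was positive
        delta = np.outer(d_yhat, self.w2) * (self.Z > 0)
        grad_W1 = delta.T @ self.X
        grad_b1 = delta.sum(axis=0)
        # every weight of the fully connected layers is updated
        self.W1 -= lr * grad_W1
        self.b1 -= lr * grad_b1
        self.w2 -= lr * grad_w2
        self.b2 -= lr * grad_b2
        return loss

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(256, 4))   # synthetic, already-scaled features
y = X[:, :3].mean(axis=1)                  # stand-in target, NOT real Meta data
model = TinyMLP()
first = model.sgd_step(X, y)
for _ in range(500):
    last = model.sgd_step(X, y)
print(f"training loss {first:.4f} -> {last:.4f}")
```

In practice a framework such as PyTorch would supply the linear layers, ReLU, and gradients automatically; writing the backward pass by hand simply makes it visible that every weight receives an update and that ReLU blocks the gradient wherever its pre-activation is negative.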

#### Section 2: Multilayer perceptron model with the sliding window method (Model 2)

For an investor, predicting the closing price from the same day's information is not very useful, because the day's highest and lowest prices are only known once trading has ended.

A more useful model predicts today's price from the stock's past. For instance, we can use Meta's closing prices on the days before day $t$ to predict its closing price on day $t$. We chose a window of 14 days: the closing prices on days $t-14, \dots, t-1$ are used to predict the closing price on day $t$, the day immediately after the window. The second MLP model therefore has 14 inputs and one output. We build these input-output pairs with the sliding window method.

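Sliding window method: we slide a window of 14 consecutive closing prices along the price series one day at a time. Each window forms one input vector, and the closing price on the day right after the window is its target.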
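A minimal sketch of this windowing step in NumPy follows. The function name `sliding_window` is hypothetical, and the example prices are made up; the only assumption is that closing prices arrive as a 1-D series ordered from oldest to newest.

```python
import numpy as np

def sliding_window(prices, window=14):
    """Turn a 1-D series of closing prices into (inputs, targets) pairs.

    Row i of X holds prices[i : i + window]; y[i] is prices[i + window],
    the closing price on the day right after that window.
    """
    prices = np.asarray(prices, dtype=float)
    n_samples = len(prices) - window
    if n_samples <= 0:
        raise ValueError("need more prices than the window length")
    X = np.stack([prices[i:i + window] for i in range(n_samples)])
    y = prices[window:]
    return X, y

# 20 made-up closing prices, where the value equals the day number
closes = np.arange(1.0, 21.0)
X, y = sliding_window(closes, window=14)
print(X.shape, y)
```

With 20 prices and a 14-day window this yields 6 training pairs: the first input holds the prices of days 1 to 14 and its target is the price on day 15. NumPy 1.20 and later also provide `np.lib.stride_tricks.sliding_window_view`, which builds the same input matrix as a read-only view via `sliding_window_view(prices[:-1], 14)`.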

#### Section 3: RNN model with the sliding window method (Model 3)

#### Section 4: LSTM model with the sliding window method (Model 4)

Before building our LSTM network, we first need to look at recurrent neural networks (RNNs), since LSTM networks are an extension of RNNs designed to overcome one of their key limitations.

An RNN works somewhat like human memory. People learn new things on top of what they already know rather than thinking from scratch each time, and an RNN likewise carries information about the past forward as it reads a sequence. This makes RNNs well suited to stock market prediction, because technical analysis also relies on a stock's past behaviour, such as its opening and closing prices and trading volumes. In our RNN, the hidden state computed at time $t-1$ is concatenated with the input at time $t$ to produce a new hidden state, which is used to predict the stock's price at time $t+1$. Figure [NO] illustrates the structure of the RNN we built.
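The concatenation step can be sketched as a vanilla (Elman) RNN forward pass in NumPy. This is an illustration under our own assumptions, not the network from the figure: a tanh hidden activation, a linear read-out from the last hidden state, five input features per day, a hidden size of 8, and random untrained weights. The function name `rnn_forward` is hypothetical.

```python
import numpy as np

def rnn_forward(xs, W, b, V, c):
    """Vanilla RNN forward pass (illustrative sketch).

    xs: (T, n_in) sequence of daily features, oldest first.
    W: (n_hidden, n_hidden + n_in) recurrent and input weights, shared across steps.
    b: (n_hidden,) bias; V: (n_hidden,) read-out weights; c: read-out bias.
    Returns the prediction for the step after the sequence and all hidden states.
    """
    n_hidden = W.shape[0]
    h = np.zeros(n_hidden)                  # h_0: no memory before the first day
    states = []
    for x_t in xs:
        # concatenate the previous hidden state with today's input
        h = np.tanh(W @ np.concatenate([h, x_t]) + b)
        states.append(h)
    y_next = V @ h + c                      # predicted price at time t + 1
    return y_next, np.array(states)

rng = np.random.default_rng(0)
T, n_in, n_hidden = 14, 5, 8
xs = rng.uniform(size=(T, n_in))            # synthetic scaled features, not real data
W = rng.normal(0.0, 0.3, size=(n_hidden, n_hidden + n_in))
b = np.zeros(n_hidden)
V = rng.normal(0.0, 0.3, size=n_hidden)
y_next, states = rnn_forward(xs, W, b, V, 0.0)
print(states.shape, float(y_next))
```

The same weight matrix `W` is reused at every time step, which is what lets the hidden state act as a running memory of the sequence.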

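However, the main limitation of RNNs is the vanishing gradient problem. As explained earlier, an RNN carries information forward from every past time step, so every earlier step contributes to the output. When the model is trained, the error must therefore be propagated back through all of these steps (backpropagation through time). At each step the gradient is multiplied by the derivative of the activation function and by the recurrent weights, and when these factors are smaller than one, the gradient shrinks exponentially with the number of steps. The weights then receive almost no learning signal from distant time steps, so the network struggles to learn long-term dependencies.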
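The shrinking gradient can be seen numerically with a one-unit RNN, $h_t = \tanh(w\,h_{t-1} + x_t)$, where the chain rule gives $\partial h_T / \partial h_0 = \prod_{t=1}^{T} w\,(1 - h_t^2)$. The values $w = 0.9$ and a constant input $x_t = 0.5$ are our own illustrative assumptions, and `grad_through_time` is a hypothetical helper name.

```python
import numpy as np

# One-unit RNN: h_t = tanh(w * h_{t-1} + x_t), starting from h_0 = 0.
# dh_T/dh_0 is the product of the local derivatives w * (1 - h_t**2).
# Assumptions (ours): w = 0.9 and a constant input x_t = 0.5.

def grad_through_time(T, w=0.9, x=0.5):
    h = 0.0
    grad = 1.0
    for _ in range(T):
        h = np.tanh(w * h + x)
        grad *= w * (1.0 - h ** 2)   # local derivative dh_t / dh_{t-1}
    return grad

for T in (1, 5, 20, 50):
    print(T, grad_through_time(T))
```

Here every factor is at most $w < 1$, so the product shrinks at least geometrically as the sequence gets longer. With larger recurrent weights the same product can instead grow without bound, which is the related exploding gradient problem.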

#### Results

#### Conclusion