# Stock Price Forecasting
Goals:
- Use historical data to build a machine learning model that predicts future stock prices
- Train and validate the model to improve its precision and accuracy

## Background
### RNNs (Recurrent Neural Networks)
- loosely inspired by the frontal lobe of the brain, which is responsible for short-term memory
- like short-term memory, an RNN carries a hidden state from one time step to the next, so it can recall the recent past when making decisions about the immediate future
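
To make the "short-term memory" idea concrete, here is a minimal NumPy sketch of a plain recurrent step, `h_t = tanh(W_x x_t + W_h h_{t-1} + b)`. The function name `rnn_forward`, the weight names, the sizes, and the random initialization are all assumptions made for illustration; this is not how Keras implements its layers, only the general idea.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a plain RNN over a sequence and return every hidden state."""
    h = np.zeros(W_h.shape[0])          # hidden state starts at zero
    states = []
    for x_t in inputs:                  # process one time step at a time
        # the new state mixes the current input with the previous state
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)          # fixed seed so the sketch is repeatable
hidden_size, n_features = 4, 1          # illustrative sizes
W_x = rng.normal(scale=0.5, size=(hidden_size, n_features))
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

sequence = np.array([[0.07], [0.12], [0.11]])   # three scaled prices (illustrative)
states = rnn_forward(sequence, W_x, W_h, b)
print(states.shape)                     # (3, 4): one hidden vector per time step
```

Because each hidden state is computed from the previous one, information from earlier prices can influence later steps.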

### LSTM Models
- We will be working with the LSTM (Long Short-Term Memory) layer from Keras, the high-level neural network API that ships with TensorFlow

## Part 1 ETL Layer
1) Extract: Query data from Yahoo Finance
- We covered how to do this in the last meeting. For everyone's convenience, I have included a CSV of Microsoft's historical stock data
- This data is from the Yahoo Finance API

### Interpreting and manipulating data
- Each day of stock data has four price columns
    1. Open
    2. High
    3. Low
    4. Close
- We will ignore the `Adj Close` and `Volume` columns for this project

In [37]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
training_set = pd.read_csv('MSFT-2.csv')
# keep only column 1 (assumed to be `Open`, the first column after `Date`) as a 2-D numpy array
training_set = training_set.iloc[:, 1:2].values
training_set[0:5]

array([[235.059998],
       [241.300003],
       [239.570007],
       [242.660004],
       [242.229996]])
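
The positional slice `iloc[:, 1:2]` only works while the column order stays the same. As a hedged alternative, the sketch below selects the column by name. It assumes the CSV uses the usual Yahoo Finance header (`Date, Open, High, Low, Close, Adj Close, Volume`). The `Open` values are the first three printed above; the dates and every other number are made-up placeholders.

```python
import io
import pandas as pd

# Tiny stand-in for MSFT-2.csv. Only the Open values come from the notebook;
# the dates (D1..D3) and the remaining columns are placeholders.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
D1,235.059998,1.0,1.0,1.0,1.0,1
D2,241.300003,1.0,1.0,1.0,1.0,1
D3,239.570007,1.0,1.0,1.0,1.0,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Selecting by name keeps working even if the column order changes.
open_prices = df[["Open"]].to_numpy()          # 2-D array, shape (n_rows, 1)
same_by_position = df.iloc[:, 1:2].to_numpy()  # what the positional slice returns

print(open_prices.shape)
print((open_prices == same_by_position).all())
```

The double brackets in `df[["Open"]]` keep the result two-dimensional, which is the shape the scaler in the next step expects.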

### Feature Scaling
1. Normalization of Inputs: Feature scaling in LSTMs normalizes the input data so that all the features are on a similar scale. This is important in LSTMs because the model uses activation functions like sigmoid and tanh, which flatten out (saturate) for large-magnitude inputs and respond most strongly to inputs roughly in the range of -1 to 1.
<p align=center>
*tanh function*
</p>


$$
\begin{align}
    f(x) = \tanh(x) = \frac{2}{1+e^{-2x}}-1
\end{align}
$$

<p align=center>
*sigmoid function*
</p>


$$
\begin{align}
    \phi(z) = \frac{1}{1+e^{-z}}
\end{align}
$$
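2. Improved Training Speed: Normalizing inputs through feature scaling can speed up the training process of an LSTM. Keeping inputs out of the flat, saturated regions of sigmoid and tanh stops gradients from shrinking toward zero, which helps avoid slow convergence and reduces the vanishing gradient problem, making it easier for the model to learn from the data.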
3. Better Generalization: Feature scaling can help improve the generalization ability of an LSTM by making the model less sensitive to the scale of the input features. This can lead to better performance on unseen data.
4. Consistent Interpretation: Normalizing inputs through feature scaling can help maintain consistent interpretations of feature importance across different features, allowing for better interpretation of the model's predictions.
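
As a quick check of the two formulas above, and of why raw prices are a poor fit for them, the following sketch uses only the standard library. The helper names are assumptions for illustration. The two sample inputs are the first raw price and the first scaled price shown in this notebook.

```python
import math

def sigmoid(z):
    """phi(z) = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + math.exp(-z))

def tanh_from_formula(x):
    """tanh(x) written as 2 / (1 + e^(-2x)) - 1"""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def sigmoid_grad(z):
    """Derivative of the sigmoid: phi(z) * (1 - phi(z))"""
    s = sigmoid(z)
    return s * (1.0 - s)

# The formula agrees with the library tanh at a few sample points.
max_diff = max(abs(tanh_from_formula(x) - math.tanh(x)) for x in (-2.0, 0.0, 0.5, 2.0))
print(max_diff)

raw_price = 235.059998      # unscaled input
scaled_price = 0.0705802    # the same price after min-max scaling
print(sigmoid_grad(raw_price))     # effectively zero: the sigmoid is saturated
print(sigmoid_grad(scaled_price))  # close to the maximum slope of 0.25
```

A gradient that is effectively zero gives the optimizer nothing to work with, which is the practical reason for scaling before training.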

In [39]:
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler()
training_set_scaled = sc.fit_transform(training_set)
training_set_scaled[0:5]

array([[0.0705802 ],
       [0.12351543],
       [0.10883952],
       [0.1350526 ],
       [0.13140475]])
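
With its default settings, `MinMaxScaler` rescales each column with `(x - min) / (max - min)`. The NumPy sketch below applies that formula by hand to the five prices printed earlier and then inverts it. Because the minimum and maximum here come from only these five rows rather than the whole file, the scaled values will not match the notebook output above.

```python
import numpy as np

# First five prices from the training set shown earlier
prices = np.array([[235.059998], [241.300003], [239.570007],
                   [242.660004], [242.229996]])

col_min = prices.min(axis=0)    # per-column minimum
col_max = prices.max(axis=0)    # per-column maximum

# Forward transform: map [min, max] onto [0, 1]
scaled = (prices - col_min) / (col_max - col_min)

# Inverse transform: map [0, 1] back to the original price range
restored = scaled * (col_max - col_min) + col_min

print(scaled.ravel())
print(np.allclose(restored, prices))
```

The inverse step is what `sc.inverse_transform` does later, when predictions are converted back into prices.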

### Splitting into X and Y training
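- Each input in X is the price on day t, and the matching target in Y is the price on day t + 1
- X therefore cannot include the last row (there is no next day to predict), and Y cannot start at the first row (there is no earlier day to predict it from)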

In [44]:
# find the number of rows to train the model on
training_set_scaled.shape
X_train = training_set_scaled[0:252]
Y_train = training_set_scaled[1:253]
print(f'X train is \n {X_train[0:5]}')
print(f'Y train is \n {Y_train[0:5]}')


X train is 
 [[0.0705802 ]
 [0.12351543]
 [0.10883952]
 [0.1350526 ]
 [0.13140475]]
Y train is 
 [[0.12351543]
 [0.10883952]
 [0.1350526 ]
 [0.13140475]
 [0.13920928]]


In [46]:
# Reshape to (samples, timesteps, features), the 3-D input shape the LSTM layer expects
X_train = np.reshape(X_train, (252, 1, 1))
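
The split and reshape above hard-code 252 and 253 rows. A hedged sketch of the same idea that works for any series length is shown below. The helper name `make_next_step_pairs` is hypothetical, and the demo values are the first five scaled prices from the output above.

```python
import numpy as np

def make_next_step_pairs(series):
    """Pair each value with the one that follows it.

    Returns X with shape (samples, 1, 1) and Y with shape (samples, 1).
    """
    series = np.asarray(series, dtype=float).reshape(-1, 1)
    X = series[:-1]                 # every row except the last
    Y = series[1:]                  # every row except the first
    X = X.reshape(len(X), 1, 1)     # (samples, timesteps, features)
    return X, Y

demo = np.array([0.0705802, 0.12351543, 0.10883952, 0.1350526, 0.13140475])
X_demo, Y_demo = make_next_step_pairs(demo)
print(X_demo.shape, Y_demo.shape)   # (4, 1, 1) (4, 1)
```

Five values give four training pairs, since the last value has no following day to serve as its target.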


In [48]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
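regressor = Sequential()
# LSTM layer with 4 units; input is a sequence of any length with 1 feature per step
regressor.add(LSTM(units = 4, activation = 'sigmoid', input_shape = (None, 1)))
# single output neuron so each prediction is one scaled price, matching Y_train
regressor.add(Dense(units = 1))
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(X_train, Y_train, batch_size = 32, epochs = 200)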

2023-01-31 18:24:37.055160: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


Epoch 1/200
...
Epoch 78

<keras.callbacks.History at 0x142445960>

### Your Task:
1. Collect a separate set of Microsoft's historical stock data to use as the testing dataset
2. Transform the data in the same way as the training data

In [None]:
# Getting the real stock price of MSFT
test_set = pd.read_csv(None) #TODO: replace None with the path to the test set
real_stock_price = test_set.iloc[:, 1:2].values #transforming the data to a numpy array
inputs = sc.transform(real_stock_price) #scaling the data
inputs = np.reshape(inputs, (len(inputs), 1, 1)) # reshape to (samples, timesteps, features)

In [None]:
predicted_stock_price = regressor.predict(inputs) #predicting the stock price
predicted_stock_price = sc.inverse_transform(predicted_stock_price) #inverse transforming the data
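
Once the predictions are back in dollars, one way to put a number on how close they are is the root mean squared error (RMSE), compared against a naive baseline that assumes tomorrow's price equals today's. The sketch below is an assumption about how you might evaluate, not part of the original workflow. The arrays are made-up stand-ins for `real_stock_price` and `predicted_stock_price`. Because the model was trained to map day t to day t + 1, prediction `i` is lined up with the real price at `i + 1`.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally long series."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up stand-ins for real_stock_price and predicted_stock_price
real_demo = np.array([[100.0], [102.0], [101.0], [105.0]])
pred_demo = np.array([[101.0], [101.5], [103.0], [104.0]])

# The prediction made from day t is a forecast for day t + 1.
model_rmse = rmse(real_demo[1:], pred_demo[:-1])

# Naive baseline: tomorrow's price equals today's price.
baseline_rmse = rmse(real_demo[1:], real_demo[:-1])

print(model_rmse, baseline_rmse)
```

If the model's RMSE is not clearly below the baseline's, the network has learned little beyond copying the previous day's price.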

### Plotting Your Findings
1. Use matplotlib (or your plotting library of choice)
    - [Matplotlib](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)
    - [Plotly](https://plotly.com/python/line-charts/)
2. Compare the real stock prices and the predicted price
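
As a starting point, here is one possible Matplotlib layout for the comparison. The function name `plot_real_vs_predicted`, the colors, and the labels are assumptions, and the numbers are made up so the sketch runs on its own. Swap in your own `real_stock_price` and `predicted_stock_price` arrays.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_real_vs_predicted(real, predicted, ticker="MSFT"):
    """Draw both price series on one set of axes and return the Axes."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(np.ravel(real), color="red", label=f"Real {ticker} price")
    ax.plot(np.ravel(predicted), color="blue", label=f"Predicted {ticker} price")
    ax.set_xlabel("Trading day")
    ax.set_ylabel("Price (USD)")
    ax.set_title(f"{ticker} price: real vs. predicted")
    ax.legend()
    return ax

# Made-up numbers so the sketch runs on its own
ax = plot_real_vs_predicted([100.0, 102.0, 101.0, 105.0],
                            [101.0, 101.5, 103.0, 104.0])
```

`np.ravel` flattens the `(n, 1)` arrays produced earlier, so each series is drawn as a single line.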

In [None]:
# Visualising the results 

### Question:
How can you improve this process to make this into a better project that uses machine learning to predict stock prices, or the price of anything else?