# Introduction
We hope you're reading this on <font color = "orange">Google Colab</font>. If not, go back to Part III and follow the guide! 

In this section, we will:
1. Mount the Google Drive and read the CSV
2. Train a vanilla LSTM model using Open
3. Train another vanilla LSTM model using FilteredOpen
4. Plot and compare our model results

As mentioned at the end of Part I, we will be answering two questions:
1. Can LSTM help with stock predictions?
2. Can signal processing techniques help improve stock predictions?

### Step 1: Import pandas
Let's start with importing pandas.

In [None]:
# Step 1: Import pandas

### Step 2: Mount your drive
Before we can read the CSV we'll need to mount the drive.

![MountDriveInstructions](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectFinancexLSTM/MountDriveInstructions.png)

Steps you'll need:
1. Connect the runtime
2. Mount your Google Drive
3. Navigate through your directory until you reach the folder containing "Project Finance x LSTM (Part IV).ipynb" and the CSV from Part III
4. [Not shown] Right click, and click 'Copy path'
5. Use that to read your CSV using pandas later

In [None]:
# Step 2: Click on 'Mount Drive' button (2) and mount drive

### Step 3: Read the CSV from Part III
Now that you've mounted the Drive, you can now read the CSV that you've uploaded into the Google Drive.

Make sure you set the first column as your index, and parse the dates so that the index becomes a DatetimeIndex object.

You should have:
- 4,904 rows
- 7 columns
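
Here is a minimal sketch of the read, assuming a CSV whose first column holds the dates. The inline CSV text and its values are made up for illustration; in the notebook you would pass the path you copied in Step 2 instead of the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Part III CSV; replace with your copied path.
csv_text = """Date,Open,FilteredOpen
2000-01-03,1469.25,1455.10
2000-01-04,1455.22,1448.73
2000-01-05,1399.42,1421.56
"""

# index_col=0 makes the first column the index; parse_dates=True converts
# that index to a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(df.index).__name__)
print(df.shape)
```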

In [None]:
# Step 3: Read the CSV from Part III

### Step 4: Import libraries
Now, we import the rest of the libraries needed for LSTM model training. Here are the libraries you'll need:
- matplotlib.pyplot as plt
- numpy as np
- StandardScaler from sklearn.preprocessing
- mean_squared_error from sklearn.metrics
- Sequential from keras.models
- Dense from keras.layers
- LSTM from keras.layers
- Dropout from keras.layers

In [None]:
# Step 4: Import libraries

### Step 5: Split your data into train and test
Similarly, split your DataFrame into train and test DataFrames according to the following dates:

- Train data: January 3 2000 to May 16 2019
- Test data: May 15 2019 to July 1 2019

You might be wondering why the last 2 days of Train overlap with the first 2 days of Test.

We'll explain a bit more later.
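
A sketch of the date-based split, assuming the DataFrame has a DatetimeIndex. The prices and the shortened date range below are made up; with the real data the train slice would start at January 3 2000.

```python
import numpy as np
import pandas as pd

# Made-up prices on business days, only to demonstrate the slicing.
index = pd.bdate_range("2019-05-01", "2019-07-01")
df = pd.DataFrame({"Open": np.linspace(2900.0, 2950.0, len(index))}, index=index)

# .loc slicing on a DatetimeIndex includes both endpoints.
train = df.loc[:"2019-05-16"]
test = df.loc["2019-05-15":]

# The two shared rows are 15 and 16 May 2019.
overlap = train.index.intersection(test.index)
print(list(overlap.strftime("%Y-%m-%d")))
```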

In [None]:
# Step 5: Split DataFrame into train and test

# Data Preparation
We'll have to do a bit more data preparation before we start training our model.

This step was not mentioned in the publication, but it's good practice to scale your values. 

We'll prepare three separate sets of scaled training data:
1. Open
2. FilteredOpen
3. ZeroMeanFilteredOpen (we'll get to that soon)

### Step 6: Add a new column called ZeroMeanFilteredOpen
![ResearchPaperNormalization](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectFinancexLSTM/ResearchPaperNormalization.png)

In the paper, the authors performed zero-mean normalization. What is zero-mean normalization?

![ZeroMeanNormalization](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectFinancexLSTM/ZeroMeanNormalization.png)

Zero-mean normalization means subtracting the overall mean of a column from every value in that column.

We will do the same with our denoised Open data and name the new column 'ZeroMeanFilteredOpen'.

Here's what we'll do:
1. Create a new column 'ZeroMeanFilteredOpen' in the train DataFrame, using the mean of 'FilteredOpen' in train
2. Create a new column 'ZeroMeanFilteredOpen' in the test DataFrame, using the mean of 'FilteredOpen' in train

It's not the usual kind of normalization, but the same rule applies: normalize after splitting, and use statistics from the train set only.
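
A small sketch of the idea, using made-up numbers in place of the real train and test DataFrames. The variable name `train_mean` is our own choice.

```python
import pandas as pd

# Made-up values standing in for the real train and test DataFrames.
train = pd.DataFrame({"FilteredOpen": [100.0, 102.0, 104.0]})
test = pd.DataFrame({"FilteredOpen": [105.0, 107.0]})

# The mean comes from train only, so no test information leaks in.
train_mean = train["FilteredOpen"].mean()

train["ZeroMeanFilteredOpen"] = train["FilteredOpen"] - train_mean
test["ZeroMeanFilteredOpen"] = test["FilteredOpen"] - train_mean

print(train["ZeroMeanFilteredOpen"].tolist())  # [-2.0, 0.0, 2.0]
print(test["ZeroMeanFilteredOpen"].tolist())   # [3.0, 5.0]
```

If your train and test DataFrames were created by slicing, pandas may raise a SettingWithCopyWarning here; calling `.copy()` on each slice first is one way to avoid it.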

In [None]:
# Step 6a: Get the mean of 'FilteredOpen' from train

# Step 6b: Create 'ZeroMeanFilteredOpen' in train

# Step 6c: Create 'ZeroMeanFilteredOpen' in test

### Step 7: Transform Open, FilteredOpen, and ZeroMeanFilteredOpen with StandardScaler
After creating ZeroMeanFilteredOpen, we'll proceed with feature scaling. 

This step was not mentioned in the paper, but it's good practice to do so for training. 

First, let's declare three variables containing a StandardScaler, without any additional parameters. 

We can't share one scaler across the three Opens. Each scaler will also be used to un-scale the predictions later, so its fitted properties must be specific to the data it was fitted on.

Then, we'll scale our column values and use that for training.

Note: If you get an error that says <strong>"Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."</strong>, you passed in a Series.

Either use a DataFrame containing only the Open/FilteredOpen/ZeroMeanFilteredOpen column, or reshape the NumPy array of your Series with a (-1, 1) reshape.
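
Both input options can be sketched like this, with made-up prices in place of the real train DataFrame. The names `open_scaler` and `scaled_open` are our own.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up prices standing in for the train DataFrame.
train = pd.DataFrame({"Open": [100.0, 102.0, 104.0, 106.0]})

open_scaler = StandardScaler()

# Double brackets select a one-column DataFrame (2D), which the scaler accepts.
scaled_open = open_scaler.fit_transform(train[["Open"]])

# Equivalent 2D input built from the Series via a (-1, 1) reshape.
open_2d = train["Open"].values.reshape(-1, 1)

print(scaled_open.shape)  # (4, 1)

# The fitted scaler can undo its own scaling, which is why each column
# needs a scaler of its own.
restored = open_scaler.inverse_transform(scaled_open)
```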

In [None]:
# Step 7a: Declare a StandardScaler for Open

# Step 7b: Call .fit_transform on the 'Open' column values from your train dataset


# Step 7c: Declare a StandardScaler for FilteredOpen

# Step 7d: Call .fit_transform on the 'FilteredOpen' column values from your train dataset


# Step 7e: Declare a StandardScaler for ZeroMeanFilteredOpen

# Step 7f: Call .fit_transform on the 'ZeroMeanFilteredOpen' column values from your train dataset


### Step 8: Prepare arrays for training
A bit of context on why we're creating more arrays. For LSTM, we're taking a sequence of data and predicting an output in the end. 

In our case, we're taking a window of two prices in sequence, and predicting the next one.

![LSTMTrainingWindow](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectFinancexLSTM/LSTMTrainingWindow.png)

This is why we had the small overlap - so that we have enough data to predict for the "first" day of our test data, which the authors said was 17th May 2019.

<strong>You will create a list of length-2 NumPy arrays for your X train data, and a list of prices for your y train data.</strong>

Something like this for the scaled 'Open':

![OpenXyTrainList](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectFinancexLSTM/OpenXyTrainList.png)

Expect six variables after running this step: an X train list and a y train list for each of "Open", "FilteredOpen", and "ZeroMeanFilteredOpen".

<details>
    <summary>Click for instructions if you're stuck</summary>
    <div>
        <ol>
            <li>Declare list for X train</li>
            <li>Declare list for y train</li>
            <li>Loop through your scaled "Open" values using a for loop and range function</li>
              <ul>
                  <li>Start at index 2 and end at the last index, so you'll need to configure your range</li>
                  <li>Append to your X train list a NumPy array containing the items from the two preceding indices, i.e. if you are at index 2, append a NumPy array containing the items from indices 0 and 1</li>
                  <li>Append the item at the current index to your y train list</li>
              </ul>
            <li>Don't forget that the scaled values from Step 7 are in a NumPy array as well so you need to reshape your sliced array</li>
        </ol>
    </div>
    <div>
        <p>Tweak the values around, using something like [i-2:i, y] as your slicing (figure out what y is and you'll be fine)</p>
    </div>
</details>
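
If you want to check your approach, here is one possible sketch of the windowing loop. It uses a tiny made-up array in place of the scaled values from Step 7, and the list names are our own.

```python
import numpy as np

# Stand-in for the scaled "Open" values from Step 7: a 2D array of shape (n, 1).
scaled_open = np.array([[0.1], [0.2], [0.3], [0.4], [0.5]])

open_X_train = []
open_y_train = []

# Start at index 2 so that two earlier values are always available.
for i in range(2, len(scaled_open)):
    # Column 0 is the only column; slicing it gives a length-2 1D array.
    open_X_train.append(scaled_open[i - 2:i, 0])
    open_y_train.append(scaled_open[i, 0])

print(len(open_X_train))  # 3
print(open_X_train[0], open_y_train[0])
```

Five values give three windows, because the first two values can only serve as inputs.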



In [None]:
# Step 8a: Prepare X train and y train using scaled "Open"

# Step 8b: Prepare X train and y train using scaled "FilteredOpen"

# Step 8c: Prepare X train and y train using scaled "ZeroMeanFilteredOpen"

### Step 9: Turn the X train and y train lists into NumPy arrays
Now that you have a list of NumPy arrays, time to turn them into a NumPy array of NumPy arrays.

Sounds confusing, we know. That's why we had this as a separate instruction.

After you turn the lists into NumPy arrays, you can look at the .shape attribute of each: expect (4871, 2) for X train and (4871,) for y train.

<details>
    <summary>Click once if you need a hint</summary>
    <div>
        <strong>Google "convert python list into numpy array"</strong>
    </div>
</details>
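
As a rough illustration with made-up lists laid out like the Step 8 output, the conversion could look like this:

```python
import numpy as np

# Made-up lists in the same layout as the Step 8 output.
open_X_train = [np.array([0.1, 0.2]), np.array([0.2, 0.3]), np.array([0.3, 0.4])]
open_y_train = [0.3, 0.4, 0.5]

# np.array stacks equal-length 1D arrays into one 2D array.
open_X_train = np.array(open_X_train)
open_y_train = np.array(open_y_train)

print(open_X_train.shape)  # (3, 2)
print(open_y_train.shape)  # (3,)
```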

In [None]:
# Step 9a: Turn open X train list into NumPy array

# Step 9b: Turn open y train list into NumPy array


# Step 9c: Turn filtered X train list into NumPy array

# Step 9d: Turn filtered y train list into NumPy array


# Step 9e: Turn zero-mean filtered X train list into NumPy array

# Step 9f: Turn zero-mean filtered y train list into NumPy array

In [None]:
# Optional: Get the shape of your converted arrays

### Step 10: Reshape your X train NumPy arrays
We will now reshape our X train arrays from 2D to 3D.

Earlier on, you might have found that our X train arrays have a shape of (4871, 2). Reshape each one so that it becomes (4871, 2, 1), a 3D array.
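
A minimal sketch of the reshape, using an array of zeros as a stand-in for the real X train array:

```python
import numpy as np

# Zeros standing in for the real X train array of shape (4871, 2).
open_X_train = np.zeros((4871, 2))

# Keras LSTM layers expect (samples, timesteps, features); we have one feature.
open_X_train = open_X_train.reshape(open_X_train.shape[0], open_X_train.shape[1], 1)

print(open_X_train.shape)  # (4871, 2, 1)
```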

In [None]:
# Step 10a: Reshape your "Open" X train array

# Step 10b: Reshape your "FilteredOpen" X train array

# Step 10c: Reshape your "ZeroMeanFilteredOpen" X train array

# Model building and prediction
Now that we've prepared our data, it's time to build and train the model. Since the authors did not detail their architecture, we'll be using a simple LSTM model architecture in this exercise.

Don't worry, it does its job well.

### Step 11: Set up the model architecture
We'll do the following steps to set up a model.

1. Declare a variable, and store a Sequential object
2. [First layer] Add an LSTM layer
  *   50 units
  *   return sequences
  *   input shape as a tuple with (2, 1)
3. [Second layer] Add a Dropout layer, with a rate of 0.3
4. [Third layer] Add an LSTM layer
  *   50 units
5. [Fourth layer] Add a Dropout layer, with a rate of 0.3
6. [Fifth layer] Add a Dense layer
  *   1 unit

That's it, you're done. When you call the model's .summary method, you'll see the following:

![LSTMModelArchitecture](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectFinancexLSTM/LSTMModelArchitecture.png)

In [None]:
# Step 11: Set up your model architecture

### Step 12: Compile and fit your model with "Open" data
Now that you're done setting up, let's start with the "Open" data. Do the following next:
1. Call the compile method
  *    Use the 'adam' optimizer
  *    Use mean_squared_error as the loss function
2. Call the fit method
  *    Use the "Open" X train and y train data
  *    Have 15 epochs
  *    Use a batch size of 32
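
Putting the architecture from Step 11 and the compile and fit settings above together, one possible sketch looks like this. The random arrays are stand-ins for the real X train and y train data, and the variable names are our own. Newer Keras versions may warn about passing `input_shape` to a layer and suggest an `Input` layer instead; we keep `input_shape` here to match the instructions.

```python
import numpy as np
from keras.layers import LSTM, Dense, Dropout
from keras.models import Sequential

# Random stand-in data with the same layout as the Step 10 arrays.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(64, 2, 1))
y_train = rng.normal(size=(64,))

model = Sequential()
# return_sequences=True passes the full sequence on to the next LSTM layer.
model.add(LSTM(50, return_sequences=True, input_shape=(2, 1)))
model.add(Dropout(0.3))
model.add(LSTM(50))
model.add(Dropout(0.3))
model.add(Dense(1))

model.compile(optimizer="adam", loss="mean_squared_error")
history = model.fit(X_train, y_train, epochs=15, batch_size=32, verbose=0)

print(model.count_params())
```

`history.history["loss"]` holds one loss value per epoch, which is handy for checking that training is converging.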

In [None]:
# Step 12: Compile and fit your data

### Step 13: Prepare the "Open" test data
Repeat what you did in Steps 8-10 for the test set.
 
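Use the respective scalers with the corresponding test data. For example, scale the "Open" column from test using the scaler from Step 7a. Call .transform rather than .fit_transform, so that the scaler keeps the properties it learned from the train data.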

Take note that for the reshape step, the dimensions will be different.

For the "Open" X test array, this is what we expect to see after repeating the reshape step:

![OpenXTestArray](https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectFinancexLSTM/OpenXTestArray.png)
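
Here is a sketch of the whole test preparation under stated assumptions: the prices are made up, and the 33 test rows mirror the length of the real test DataFrame.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up prices: 100 train rows and 33 test rows, as one-column 2D arrays.
train_open = np.linspace(2800.0, 2900.0, 100).reshape(-1, 1)
test_open = np.linspace(2890.0, 2950.0, 33).reshape(-1, 1)

open_scaler = StandardScaler()
open_scaler.fit(train_open)

# Reuse the train-fitted scaler: transform only, no refitting on test data.
scaled_test = open_scaler.transform(test_open)

open_X_test = []
for i in range(2, len(scaled_test)):
    open_X_test.append(scaled_test[i - 2:i, 0])

# 33 rows minus the 2-row window leaves 31 samples.
open_X_test = np.array(open_X_test).reshape(-1, 2, 1)
print(open_X_test.shape)  # (31, 2, 1)
```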

In [None]:
# Step 13: Transform, loop, transform, and reshape "Open" test X data

### Step 14: Make your predictions
Call the predict method of your model, using the X test data you have prepared in Step 13.

The predictions must also be transformed using the .inverse_transform method of your scaler from Step 7. 

Just ignore any warnings that appear. Don't worry.
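
To illustrate the un-scaling on its own, the sketch below replaces the model output with an array of zeros of the same shape. In your notebook, `scaled_predictions` would come from the model's predict call instead.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up train prices used to fit the scaler, as in Step 7.
train_open = np.linspace(2800.0, 2900.0, 100).reshape(-1, 1)
open_scaler = StandardScaler().fit(train_open)

# Stand-in for the model's predictions, which have shape (31, 1).
scaled_predictions = np.zeros((31, 1))

# inverse_transform maps scaled values back to price units.
open_predictions = open_scaler.inverse_transform(scaled_predictions)

print(open_predictions.shape)  # (31, 1)
print(open_predictions[0, 0])
```

A scaled value of 0 maps back to the train mean, which is a quick sanity check that the right scaler was used.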

In [None]:
# Step 14a: Make the predict method call

# Step 14b: Make the inverse_transform call

### Step 15: Create a DataFrame for your "Open" prediction
Now that we're done with prediction, let's create a DataFrame because we need the date index for plotting and comparison.

Our DataFrame is 31 rows long, starts on the 17th May 2019 and ends on 1st July 2019.

You can borrow the index of your original test DataFrame, but don't forget that DataFrame is 33 rows long, so you'll need to skip its first two dates.
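
One way to do this is sketched below. The index is rebuilt from business days as a stand-in for your test DataFrame's index, the prediction values are made up, and the column name `OpenPrediction` is our own choice.

```python
import numpy as np
import pandas as pd

# Stand-in for the test index: business days from 15 May to 1 July 2019,
# minus the 27 May market holiday, which gives 33 dates.
test_index = pd.bdate_range("2019-05-15", "2019-07-01")
test_index = test_index[test_index != pd.Timestamp("2019-05-27")]

# Stand-in for the inverse-transformed predictions from Step 14.
open_predictions = np.full((31, 1), 2850.0)

# Skip the first two dates, which only served as input for the first window.
open_prediction_df = pd.DataFrame(
    open_predictions, index=test_index[2:], columns=["OpenPrediction"]
)

print(open_prediction_df.index[0].date(), len(open_prediction_df))
```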

In [None]:
# Step 15: Create a DataFrame for your "Open" prediction

### Step 16: Plot the "Open" prediction with the original test "Open"
Moment of truth. 

Let's plot the "Open" data from the original test DataFrame, from 17th of May 2019 to 1st of July 2019.

<details>
    <summary><font color = 'green'>SPOILERS! Click once for a look to compare our plot and yours</font></summary>
    <div>
        <img src = 'https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectFinancexLSTM/OpenPredictionPlot.png'>
    </div>
</details>
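
A plotting sketch with made-up data, in case you need a starting point. The "predicted" series here is just the actual series lagged by one day, purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up actual and predicted prices on a shared date index.
index = pd.bdate_range("2019-05-17", periods=31)
actual = pd.Series(np.linspace(2860.0, 2960.0, 31), index=index)
predicted = actual.shift(1).bfill()  # hypothetical predictions lagging by a day

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(actual.index, actual.values, label="Open (actual)")
ax.plot(predicted.index, predicted.values, label="Open (predicted)")
ax.set_xlabel("Date")
ax.set_ylabel("Price")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they don't overlap
```

The same pattern extends to more series: add one `ax.plot` call per set of predictions.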

In [None]:
# Step 16: Plot "Open" prediction with the original test "Open"

### Step 17: Repeat Steps 11-15 for "FilteredOpen" and "ZeroMeanFilteredOpen" 
Now that we've successfully done predictions using data from "Open", let's work on "FilteredOpen" and "ZeroMeanFilteredOpen" next.

Don't forget that you have to add the FilteredOpen mean to the predictions for ZeroMeanFilteredOpen.
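
The mean adjustment can be sketched as follows. All numbers are made up: `train_mean` stands for the value from Step 6a, and `zero_mean_predictions` for the inverse-transformed ZeroMeanFilteredOpen predictions.

```python
import numpy as np

# Made-up values: the train mean from Step 6a and inverse-transformed
# ZeroMeanFilteredOpen predictions, which sit well below actual prices.
train_mean = 1500.0
zero_mean_predictions = np.array([[1350.0], [1360.0], [1355.0]])

# Adding the train mean reverses the zero-mean normalization.
restored_predictions = zero_mean_predictions + train_mean

print(restored_predictions.ravel())
```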

In [None]:
# Step 17a: Set up your model architecture for FilteredOpen, compile, and fit "FilteredOpen" data (Steps 11-12)

In [None]:
# Step 17b: Transform, loop, transform, and reshape "FilteredOpen" test X data (Step 13)

# Step 17c: Make the predict method and inverse_transform call (Step 14)

# Step 17d: Create a DataFrame for your "FilteredOpen" prediction (Step 15)

In [None]:
# Step 17e: Set up your model architecture for ZeroMeanFilteredOpen, compile, and fit "ZeroMeanFilteredOpen" data (Steps 11-12)

In [None]:
# Step 17f: Transform, loop, transform, and reshape "ZeroMeanFilteredOpen" test X data (Step 13)

# Step 17g: Make the predict method and inverse_transform call (Step 14)

# Step 17h: Create a DataFrame for your "ZeroMeanFilteredOpen" prediction (Step 15)

# Step 17i: Add the FilteredOpen mean (Step 6a) from train DataFrame into all values (reverse the normalization)

### Step 18: Plot all three predictions with the original test "Open"
Which predictions did the best? Let's find out by plotting all three sets of predictions on the same plot. 

If your ZeroMeanFilteredOpen plot is way lower than others, make sure you added the mean to undo the zero-mean normalization.

<details>
    <summary><font color = 'green'>SPOILERS! Click once for a look to compare our plot and yours</font></summary>
    <div>
        <img src = 'https://uplevelsg.s3-ap-southeast-1.amazonaws.com/ProjectFinancexLSTM/FinalPredictionPlots.png'>
    </div>
</details>

In [None]:
# Step 18: Plot all predictions with the original test "Open"

### Step 19: Calculate the RMSE of the three predictions
Visually, we know which predictions performed best.

However, it's good to put a number on it as well. Let's calculate the RMSE of the predictions.
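
As a quick sketch with made-up numbers, RMSE is the square root of the mean squared error, so it can be built from the `mean_squared_error` function imported in Step 4:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted prices.
actual = np.array([2850.0, 2860.0, 2870.0])
predicted = np.array([2853.0, 2856.0, 2870.0])

# The errors are 3, -4 and 0, so the MSE is 25 / 3.
rmse = np.sqrt(mean_squared_error(actual, predicted))

print(round(rmse, 4))
```

A lower RMSE means the predictions are closer to the actual prices, in the same units as the prices themselves.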

In [None]:
# Step 19a: Print the RMSE of test 'Open' and 'Open' predictions

# Step 19b: Print the RMSE of test 'Open' and 'FilteredOpen' predictions

# Step 19c: Print the RMSE of test 'Open' and 'ZeroMeanFilteredOpen' predictions

# The end
And that's the end! What a journey: you have successfully applied classical modelling and deep learning to S&P 500 stock prices.

To recap, you've:
1. Read research on stock pricing and retrieved the data
2. Investigated the ARIMA terms and performed ARIMA modelling
3. Used signal processing techniques to denoise stock data
4. Trained an LSTM model to predict stock pricing data

You have also answered the two questions that we wanted to ask at the start of this project.

Go on, give yourself a pat on the back. We hope this project series has given you more confidence in coding and deep learning.

What you've learned here is just the tip of the iceberg, and a launchpad for bigger and better things to come.

If you're keen, here are some more things you can try:
- More datasets, e.g., HSI and DJI
- More modelling, e.g., more complex LSTM architectures

Come join us in our Telegram community over at https://bit.ly/UpLevelSG and our Facebook page at https://fb.com/UpLevelSG

<strong>Most importantly, UpLevel wouldn't be what it is today without learners like yourself, so help us grow by spreading the word and getting us more subscribers <font color = 'red'><3</font></strong>
