# Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook will be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or Python modules that might be required to run this notebook.

## Create Input and Output Parameters
In the previous notebook, you learned how to read data from a CSV file and drop missing values.

In this notebook, you will learn how to create custom input and output parameters that you will use in your regression model. The key steps are:

1. [Import the Libraries](#import)
2. [Read the GLD Data](#read)
3. [Create Output Parameters](#output)
4. [Create Input Parameters](#input)


<a id='import'></a>
## Import the Libraries

First, we will import the `pandas` and `numpy` libraries for data manipulation and analysis.

In [1]:
# Data manipulation
import pandas as pd
import numpy as np

<a id='read'></a>
## Read the GLD Data

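We have saved the Gold ETF (GLD) data in OHLC format in a CSV file named `gold_prices.csv`. You can read the file using the `pandas.read_csv()` function. Passing `parse_dates=['Date']` and `index_col='Date'` converts the `Date` column to datetime values and uses it as the index.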

In [2]:
# Read the data
gold_prices = pd.read_csv('../data_modules/gold_prices.csv',
                          parse_dates=['Date'], index_col='Date')
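
To see what `parse_dates` and `index_col` do without needing the course file, here is a minimal sketch that reads a small in-memory CSV. The two rows and the names `sample_csv` and `sample` are made up for illustration; they are not actual GLD prices.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for gold_prices.csv (values are made up)
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close\n"
    "2019-01-02,121.0,122.0,120.5,121.5\n"
    "2019-01-03,121.6,122.4,121.2,122.1\n"
)

sample = pd.read_csv(sample_csv, parse_dates=['Date'], index_col='Date')

# The Date column is now a DatetimeIndex, so rows can be looked up by date
print(type(sample.index).__name__)
print(sample.loc[pd.Timestamp('2019-01-03'), 'Close'])
```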

<a id='output'></a>
## Create Output Parameters

First, we will create `Std_U` and `Std_D`, which are the upward and downward deviations from the `Open` price, respectively. These will be our output parameters: the machine learning model will predict the upward and downward deviations using the `Open` price and other custom input parameters.

The formulas for them are as follows: 

Upward deviation  = High - Open

Downward deviation = Open - Low

In [3]:
# Calculate the upward and downward deviations from the Open
gold_prices['Std_U'] = gold_prices['High']-gold_prices['Open']
gold_prices['Std_D'] = gold_prices['Open']-gold_prices['Low']
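
As a quick sanity check on these formulas, the sketch below applies them to a small made-up OHLC frame (the prices and the name `demo` are illustrative, not GLD data). Because the High is never below the Open and the Low is never above it, both deviations should be non-negative.

```python
import pandas as pd

# Made-up OHLC rows for illustration only
demo = pd.DataFrame({
    'Open':  [100.0, 101.0, 102.0],
    'High':  [101.5, 101.0, 103.0],
    'Low':   [99.0, 100.5, 101.0],
    'Close': [101.0, 100.8, 102.5],
})

# Same formulas as the notebook cell above
demo['Std_U'] = demo['High'] - demo['Open']
demo['Std_D'] = demo['Open'] - demo['Low']

print(demo[['Std_U', 'Std_D']])
```

In the second row the Open equals the High, so `Std_U` is zero for that day.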

<a id='input'></a>
## Create Input Parameters

We will create custom input parameters by using the raw data from the CSV file. In the machine learning model, these indicators will be used as inputs for prediction.

We will create `S_3`, `S_15` and `S_60` which are 3-day, 15-day and 60-day moving averages for GLD `Close` prices, respectively.

To calculate these moving averages, we will use the `shift()`, `rolling()` and `mean()` methods.

The `shift()` method shifts the values by the desired number of periods while leaving the index unchanged. We use `shift(1)` so that the current day's value is excluded from the moving average. The `rolling()` method defines a moving window over a fixed number of observations; we change the window size according to the moving average we want. Finally, `mean()` gives the average of the values in each window.

In [4]:
# Calculate 3-day moving average of close prices
gold_prices['S_3'] = gold_prices['Close'].shift(1).rolling(window=3).mean()

# Calculate 15-day moving average of close prices
gold_prices['S_15'] = gold_prices['Close'].shift(1).rolling(window=15).mean()

# Calculate 60-day moving average of close prices
gold_prices['S_60'] = gold_prices['Close'].shift(1).rolling(window=60).mean()
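
To make the effect of `shift(1)` concrete, here is a minimal sketch on a made-up series of five closing prices. It compares a 3-period average computed with and without the shift; the series and the variable names are illustrative only.

```python
import pandas as pd

# Made-up closing prices for illustration
close = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Window includes the current row
with_current = close.rolling(window=3).mean()

# Window covers only the three rows before the current one
without_current = close.shift(1).rolling(window=3).mean()

print(pd.DataFrame({
    'close': close,
    'with_current': with_current,
    'without_current': without_current,
}))
```

At the fourth row (close of 40), the shifted version averages 10, 20 and 30, so it uses only prices known before that day. The shift also delays the first valid value by one row: a window of n needs n + 1 rows, so `S_60` is `NaN` for the first 60 rows.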

We will calculate the correlation between the previous close values and corresponding 3-day moving average values. We will use a 10-day window so that we get the recent correlation which only considers the last 10 days.  

In [5]:
# Calculate correlation between previous close and 3-day moving average for past 10 days
gold_prices['Corr'] = gold_prices['Close'].shift(
    1).rolling(window=10).corr(gold_prices['S_3'].shift(1))
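
The behaviour of `rolling().corr()` can be explored on synthetic data. The sketch below builds a random-walk price series (generated with a fixed seed, not GLD data), repeats the construction from the cell above, and reports where the first non-missing correlation appears. The names `rng`, `close`, `s_3` and `corr` are chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk closes for illustration (not GLD data)
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 30).cumsum())

# Same construction as the notebook cells above
s_3 = close.shift(1).rolling(window=3).mean()
corr = close.shift(1).rolling(window=10).corr(s_3.shift(1))

# Position of the first row where the 10-day window is fully populated
print(corr.first_valid_index())

# Correlations lie between -1 and 1 (up to floating-point rounding)
print(corr.min(), corr.max())
```

In this sketch the first 13 rows are `NaN`, because both shifted inputs must supply 10 overlapping observations before a correlation can be computed.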

Next, we will add `OD` which will tell us how much the market has changed compared to the previous day's Open. The formula is as follows: 

OD = Today's open - Previous open

We will also add `OL`, an indicator of the overnight change in the ETF price, measured as the gap between the previous day's close and today's open. We can calculate this with the following formula:

OL = Previous close - Today's open

In [6]:
# Calculate OD, which shows changes since previous open
gold_prices['OD'] = gold_prices['Open']-gold_prices['Open'].shift(1)

# Calculate OL, which shows overnight changes
gold_prices['OL'] = gold_prices['Close'].shift(1)-gold_prices['Open']

gold_prices.tail()

| Date | Open | High | Low | Close | Std_U | Std_D | S_3 | S_15 | S_60 | Corr | OD | OL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-05-08 | 121.540001 | 121.540001 | 120.769997 | 120.910004 | 0.0 | 0.770004 | 120.89 | 120.606668 | 122.611834 | -0.221595 | 0.520004 | -0.330002 |
| 2019-05-09 | 120.959999 | 121.620003 | 120.860001 | 121.199997 | 0.660004 | 0.099998 | 120.976667 | 120.633335 | 122.567001 | -0.290695 | -0.580002 | -0.049995 |
| 2019-05-10 | 121.410004 | 121.730003 | 121.300003 | 121.43 | 0.319999 | 0.110001 | 121.106667 | 120.694668 | 122.522667 | -0.280418 | 0.450005 | -0.210007 |
| 2019-05-13 | 122.629997 | 122.849998 | 122.330002 | 122.669998 | 0.220001 | 0.299995 | 121.18 | 120.765334 | 122.490334 | 0.078028 | 1.219993 | -1.199997 |
| 2019-05-14 | 122.599998 | 122.660004 | 122.120003 | 122.459999 | 0.060006 | 0.479995 | 121.766665 | 120.918667 | 122.467167 | 0.365089 | -0.029999 | 0.07 |
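
The `OD` and `OL` formulas can be checked by hand on a tiny example. The sketch below uses three made-up rows (the prices and the name `demo` are illustrative, not GLD data) and applies the same expressions as the notebook cell.

```python
import pandas as pd

# Made-up Open and Close prices for illustration only
demo = pd.DataFrame({
    'Open':  [100.0, 101.0, 103.0],
    'Close': [100.5, 102.0, 102.5],
})

# OD: today's open minus the previous open
demo['OD'] = demo['Open'] - demo['Open'].shift(1)

# OL: previous close minus today's open
demo['OL'] = demo['Close'].shift(1) - demo['Open']

print(demo)
```

The first row of both columns is `NaN` because there is no previous day to compare against. With this sign convention, a negative `OL` means the market opened above the previous close, since `OL` subtracts today's open from the previous close.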


## Conclusion
In this notebook, we have learned how to create input and output parameters from a raw dataset. We will use these parameters later in our regression model.