<a href="https://colab.research.google.com/github/gingerchien/QuantHub/blob/main/LinearRegressionPricePrediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Goal: Predicting a Day's High and Low Given Its Open

Objectives:

* Learn the types of regression
* Understand the bias-variance trade-off
* Make predictions

### scikit-learn

* Pre-installed in Colab
* Important for regression problems
* Has data pre-processing classes that rescale features to a common scale, such as `StandardScaler` (zero mean and unit variance), `MinMaxScaler`, and `MaxAbsScaler`
* To tackle NaN values, use `SimpleImputer`. It fills in the missing entries, so the non-NaN values in a row's other columns are not lost, as they would be if the whole row were dropped.

In [1]:
import sklearn

In [2]:
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
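
As a minimal sketch of how these two classes behave, the example below imputes and then scales a toy matrix. The numbers are made up for illustration and are not taken from the gold price data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (made-up numbers) with one missing value.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0]])

# Replace each NaN with the mean of its column.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)

# Rescale each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_imputed)

print(X_imputed)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

The missing entry is filled with 20.0, the mean of the other two values in its column, and each scaled column ends up with mean 0 and standard deviation 1.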

### Hyperparameter Optimization

* In machine learning, hyperparameter optimization (or model selection) is the problem of choosing a set of hyperparameters for a learning algorithm, usually with the goal of optimizing a measure of the algorithm's performance on an independent data set.

In [3]:
from sklearn.model_selection import GridSearchCV
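
A minimal sketch of a grid search, assuming synthetic data and a one-parameter grid chosen only for illustration (`fit_intercept` of `LinearRegression`). The notebook's actual grid may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data (an assumption for this sketch): y depends linearly on X
# and has a non-zero intercept of 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=100)

# Candidate hyperparameter values to try.
param_grid = {"fit_intercept": [True, False]}

# Score every candidate with 5-fold cross-validation.
search = GridSearchCV(
    LinearRegression(),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print(search.best_params_)
```

Because the synthetic target has an intercept, the search selects `fit_intercept=True`; each candidate is scored on held-out folds rather than on the data it was fitted to.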

### Linear Regression Model

In [4]:
from sklearn.linear_model import LinearRegression

### Pipeline

`Pipeline` lets us pass in the functions and steps that we want the algorithm to follow, in order, and treats them as a single estimator. The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters.

In [5]:
from sklearn.pipeline import Pipeline
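
The sketch below shows one way such a pipeline could be assembled and cross-validated. The step names (`imputer`, `scaler`, `regression`) and the synthetic data are assumptions made for this example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (an assumption for this sketch).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)
X[::10, 0] = np.nan  # remove a few values so the imputer has work to do

# Each step is a (name, estimator) pair; the steps run in order.
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("regression", LinearRegression()),
])

# A step's parameter is addressed as "<step name>__<parameter>".
pipe.set_params(imputer__strategy="median")

# All steps are refitted inside every cross-validation fold.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores)
```

Because the imputer and scaler are refitted inside each fold, statistics from the held-out rows never leak into the preprocessing.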

# Importing Required Libraries

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Drop NaN Values

In [11]:
df = pd.read_csv('gold_prices.csv')
df = df.dropna()

In [12]:
df.head()

| | Date | Open | High | Low | Close |
|---|---|---|---|---|---|
| 0 | 2013-04-15 | 136.0 | 136.75 | 130.509995 | 131.309998 |
| 1 | 2013-04-16 | 134.899994 | 135.110001 | 131.759995 | 132.800003 |
| 2 | 2013-04-17 | 133.809998 | 134.949997 | 132.320007 | 132.869995 |
| 3 | 2013-04-18 | 134.119995 | 135.309998 | 133.619995 | 134.300003 |
| 4 | 2013-04-19 | 136.0 | 136.020004 | 134.600006 | 135.470001 |


# Check That NaN Values Are Dropped

A count of 0 in every column confirms that all NaN values have been dropped.

In [13]:
df.isna().sum()

Date     0
Open     0
High     0
Low      0
Close    0
dtype: int64

# Create Feature Columns

* Std_U = High - Open
* Std_D = Open - Low
* 3-period moving average: S_3 = Close.shift(1).rolling(window=3).mean()
* 15-period moving average: S_15 = Close.shift(1).rolling(window=15).mean()
* 60-period moving average: S_60 = Close.shift(1).rolling(window=60).mean()
* Today's open minus yesterday's open: OD = Open - Open.shift(1)
* Correlation indicator: Corr = Close.shift(1).rolling(window=10).corr(S_3.shift(1)), the 10-period rolling correlation between the previous close and the 3-period moving average
* Overnight change, yesterday's close minus today's open: OL = Close.shift(1) - Open

In [14]:
#Calculate Upward and Downward Deviations from the Open
df['Std_U'] = df['High'] - df['Open']
df['Std_D'] = df['Open'] - df['Low']

In [15]:
#calculate the moving averages as inputs for prediction
df['S_3'] = df['Close'].shift(1).rolling(window=3).mean()
df['S_15'] = df['Close'].shift(1).rolling(window=15).mean()
df['S_60'] = df['Close'].shift(1).rolling(window=60).mean()
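
To see what `shift(1)` contributes, here is a small sketch on a made-up five-value series. Shifting first means each average uses only closes from earlier rows, never the current row's close.

```python
import pandas as pd

# Made-up closing prices, for illustration only.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="Close")

# Shift by one row, then average the three most recent shifted values.
s_3 = close.shift(1).rolling(window=3).mean()

print(s_3.tolist())  # → [nan, nan, nan, 11.0, 12.0]
```

The value at row 3 is the mean of rows 0 to 2 (10, 11, 12). The first three rows are NaN because fewer than three earlier closes exist, so a window of 60 leaves the first 60 rows without a value.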

In [18]:
#Calculate the correlation between the previous close and the 3-day moving average over a 10-day window to capture the recent relationship
df['Corr'] = df['Close'].shift(1).rolling(window=10).corr(df['S_3'].shift(1))
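
As a rough illustration of how `rolling(...).corr(...)` behaves, the sketch below uses two made-up series and a window of 3 rather than 10, with one series an exact linear function of the other.

```python
import numpy as np
import pandas as pd

# Made-up series: b is an exact linear function of a.
a = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
b = 2.0 * a + 1.0

# Correlation of a and b over each trailing 3-row window.
rolling_corr = a.rolling(window=3).corr(b)

print(rolling_corr)
```

The first two rows are NaN because the window is not yet full. Every later value is 1 up to floating-point error, since the two series move together exactly. On real prices the value varies between -1 and 1 from window to window.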

In [19]:
#Calculate how much the market has changed compared to the previous day's open
df['OD'] = df['Open'] - df['Open'].shift(1)

#Calculate the overnight change: the previous day's close minus today's open
df['OL'] = df['Close'].shift(1) - df['Open']
df.tail()

| | Date | Open | High | Low | Close | Std_U | Std_D | S_3 | S_15 | S_60 | Corr | OD | OL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1527 | 2019-05-08 | 121.540001 | 121.540001 | 120.769997 | 120.910004 | 0.0 | 0.770004 | 120.89 | 120.606668 | 122.611834 | -0.221595 | 0.520004 | -0.330002 |
| 1528 | 2019-05-09 | 120.959999 | 121.620003 | 120.860001 | 121.199997 | 0.660004 | 0.099998 | 120.976667 | 120.633335 | 122.567001 | -0.290695 | -0.580002 | -0.049995 |
| 1529 | 2019-05-10 | 121.410004 | 121.730003 | 121.300003 | 121.43 | 0.319999 | 0.110001 | 121.106667 | 120.694668 | 122.522667 | -0.280418 | 0.450005 | -0.210007 |
| 1530 | 2019-05-13 | 122.629997 | 122.849998 | 122.330002 | 122.669998 | 0.220001 | 0.299995 | 121.18 | 120.765334 | 122.490334 | 0.078028 | 1.219993 | -1.199997 |
| 1531 | 2019-05-14 | 122.599998 | 122.660004 | 122.120003 | 122.459999 | 0.060006 | 0.479995 | 121.766665 | 120.918667 | 122.467167 | 0.365089 | -0.029999 | 0.07 |
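
As a quick sanity check on the sign conventions, the sketch below recomputes `OD` and `OL` for 2019-05-09 from the Open and Close values of the two rows dated 2019-05-08 and 2019-05-09 in the output above. The `df_check`, `od`, and `ol` names are introduced here for illustration only.

```python
import pandas as pd

# Open and Close for 2019-05-08 and 2019-05-09, copied from the output above.
df_check = pd.DataFrame({
    "Open": [121.540001, 120.959999],
    "Close": [120.910004, 121.199997],
})

# Today's open minus yesterday's open.
od = df_check["Open"] - df_check["Open"].shift(1)

# Yesterday's close minus today's open.
ol = df_check["Close"].shift(1) - df_check["Open"]

print(round(od.iloc[1], 6), round(ol.iloc[1], 6))
```

Both results agree with the `OD` (-0.580002) and `OL` (-0.049995) values in the 2019-05-09 row, so a negative `OL` means the market opened above the previous close.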
