# Multiple Linear Regression Part 2

Multiple Linear Regression (MLR) models the relationship between one target and more than one variable (feature) by fitting a linear equation.

Y: one continuous target variable  
X: two or more predictor variables  
b0: the intercept (the value of Y when every X is 0)  
b1: the coefficient (parameter) of x1  
b2: the coefficient (parameter) of x2, and so on  
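
Written out with the symbols above, the model takes the standard form below. The index n (the number of predictors) and the error term ε are symbols introduced here for illustration.

$$
Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + \varepsilon
$$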


To find the coefficients of a multiple linear regression on a very large, high-dimensional dataset, use an iterative optimization approach such as gradient descent.
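
A minimal sketch of one such optimization approach, batch gradient descent on the mean squared error, is shown here. The function name `fit_gradient_descent`, the learning rate, the iteration count, and the synthetic data are all assumptions made for illustration; they are not part of the workflow in this notebook.

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.1, n_iters=2000):
    """Estimate the intercept b0 and coefficients b by batch gradient descent."""
    n_samples, n_features = X.shape
    b0 = 0.0
    b = np.zeros(n_features)
    for _ in range(n_iters):
        residual = X @ b + b0 - y                      # prediction error per row
        grad_b0 = 2.0 * residual.mean()                # d(MSE)/d(b0)
        grad_b = 2.0 * (X.T @ residual) / n_samples    # d(MSE)/d(b)
        b0 -= lr * grad_b0
        b -= lr * grad_b
    return b0, b

# Synthetic data (not the AMD prices): y = 1.5 + 2*x1 - 3*x2 + small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
y_demo = 1.5 + X_demo @ np.array([2.0, -3.0]) + rng.normal(scale=0.01, size=500)

b0_hat, b_hat = fit_gradient_descent(X_demo, y_demo)
print("intercept:", b0_hat, "coefficients:", b_hat)
```

The synthetic features are already on a unit scale. With raw columns on very different scales, a fixed learning rate like this one would likely need the features to be standardized first.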

## Data Preprocessing

In [1]:
# Importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

# fix_yahoo_finance (since renamed to yfinance) is used to fetch data
import fix_yahoo_finance as yf
yf.pdr_override()

In [2]:
# input
symbol = 'AMD'
start = '2014-01-01'
end = '2018-08-27'

# Read data 
dataset = yf.download(symbol, start, end)

# View columns 
dataset.head()

[*********************100%***********************]  1 of 1 downloaded


Date,Open,High,Low,Close,Adj Close,Volume
2014-01-02,3.85,3.98,3.84,3.95,3.95,20548400
2014-01-03,3.98,4.0,3.88,4.0,4.0,22887200
2014-01-06,4.01,4.18,3.99,4.13,4.13,42398300
2014-01-07,4.19,4.25,4.11,4.18,4.18,42932100
2014-01-08,4.23,4.26,4.14,4.18,4.18,30678700


In [3]:
dataset.describe()

Statistic,Open,High,Low,Close,Adj Close,Volume
count,1172.0,1172.0,1172.0,1172.0,1172.0,1172.0
mean,7.013959,7.161143,6.868012,7.015776,7.015776,37857470.0
std,4.880564,4.98504,4.779099,4.887464,4.887464,34562190.0
min,1.62,1.69,1.61,1.62,1.62,0.0
25%,2.71,2.78,2.66,2.7075,2.7075,13082020.0
50%,4.25,4.35,4.175,4.275,4.275,29003800.0
75%,11.585,11.785,11.31,11.555,11.555,50642720.0
max,24.940001,27.299999,24.629999,25.26,25.26,325058400.0


## Explore Features

In [4]:
features = dataset[['Open','High','Low','Close','Adj Close','Volume']]
features.head(9)

Date,Open,High,Low,Close,Adj Close,Volume
2014-01-02,3.85,3.98,3.84,3.95,3.95,20548400
2014-01-03,3.98,4.0,3.88,4.0,4.0,22887200
2014-01-06,4.01,4.18,3.99,4.13,4.13,42398300
2014-01-07,4.19,4.25,4.11,4.18,4.18,42932100
2014-01-08,4.23,4.26,4.14,4.18,4.18,30678700
2014-01-09,4.2,4.23,4.05,4.09,4.09,30667600
2014-01-10,4.09,4.2,4.07,4.17,4.17,20840800
2014-01-13,4.19,4.2,4.09,4.13,4.13,22856100
2014-01-14,4.14,4.3,4.13,4.3,4.3,42434800


In [5]:
target = dataset['Adj Close']
target.head(9)

Date
2014-01-02    3.95
2014-01-03    4.00
2014-01-06    4.13
2014-01-07    4.18
2014-01-08    4.18
2014-01-09    4.09
2014-01-10    4.17
2014-01-13    4.13
2014-01-14    4.30
Name: Adj Close, dtype: float64

In [6]:
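# Randomly assign roughly 80% of the rows to the training set
# and the remaining rows to the test set (no seed is set, so the split changes per run)
msk = np.random.rand(len(dataset)) < 0.8
train = features[msk]
test = features[~msk]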
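
The mask-based split can be made repeatable by drawing the random numbers from a seeded generator. The sketch below assumes a small synthetic frame (`demo`) and a seed of 42, both chosen only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)   # fixed seed: an assumption, for reproducibility

# Synthetic stand-in for the price data
demo = pd.DataFrame({
    "Open": rng.uniform(1, 25, size=1000),
    "Volume": rng.integers(0, 1_000_000, size=1000),
})

mask = rng.random(len(demo)) < 0.8   # True for roughly 80% of the rows
train_demo = demo[mask]
test_demo = demo[~mask]

print(len(train_demo), len(test_demo))
```

Every row lands in exactly one of the two sets. For time-ordered data such as prices, a chronological split (earlier rows for training, later rows for testing) is a common alternative to a random mask.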

## Multiple Regression Model

In [7]:
from sklearn import linear_model

regr = linear_model.LinearRegression()
# Predictors: Open, High, Low; target: Adj Close
x = np.asanyarray(train[['Open', 'High', 'Low']])
y = np.asanyarray(train[['Adj Close']])
regr.fit(x, y)
# The coefficients, in the order Open, High, Low
print('Coefficients: ', regr.coef_)

Coefficients:  [[-0.51741512  0.71866713  0.80080015]]
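
The printed coefficients are only part of the fitted equation; the intercept b0 is stored separately in `intercept_`. The sketch below, on synthetic data rather than the AMD prices, shows how the two combine into a prediction and checks them against an ordinary least-squares solve with `np.linalg.lstsq`. All variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 0.5 + 1*x1 - 2*x2 + 3*x3 + small noise
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
y_demo = 0.5 + X_demo @ np.array([1.0, -2.0, 3.0]) + rng.normal(scale=0.01, size=200)

model = LinearRegression().fit(X_demo, y_demo)

# Least squares with an explicit column of ones for the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

print("intercept:", model.intercept_, "coefficients:", model.coef_)
print("lstsq:    ", beta)

# A prediction is the intercept plus the dot product of features and coefficients
manual = model.intercept_ + X_demo @ model.coef_
```

Here `beta[0]` matches the intercept and `beta[1:]` matches the coefficients, and `manual` agrees with `model.predict(X_demo)`.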


In [8]:
# Evaluate on the held-out test rows
x = np.asanyarray(test[['Open', 'High', 'Low']])
y = np.asanyarray(test[['Adj Close']])
y_pred = regr.predict(x)
# np.mean of the squared residuals is the mean squared error (MSE)
print("Mean squared error: %.2f"
      % np.mean((y_pred - y) ** 2))

# regr.score returns R^2 (coefficient of determination): 1 is perfect prediction
print('R^2 score: %.2f' % regr.score(x, y))

Mean squared error: 0.01
R^2 score: 1.00
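
The two metrics printed above can be reproduced by hand. The sketch below uses four made-up values to show how the residual sum of squares (RSS), the mean squared error (MSE), and R² relate to one another. Taking `np.mean` of the squared residuals gives the MSE, which is the RSS divided by the number of rows, and `LinearRegression.score` returns R².

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([3.0, 4.0, 5.0, 6.0])
y_hat = np.array([2.5, 4.0, 5.5, 6.0])

residuals = y_true - y_hat
rss = np.sum(residuals ** 2)                   # residual sum of squares
mse = np.mean(residuals ** 2)                  # mean squared error = RSS / n
tss = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1.0 - rss / tss                           # coefficient of determination

print(rss, mse, r2)
```

For these values the RSS is 0.5, the MSE is 0.125, and R² is 0.9.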


In [9]:
# Refit with Volume added as a fourth predictor
regr = linear_model.LinearRegression()
x = np.asanyarray(train[['Open', 'High', 'Low', 'Volume']])
y = np.asanyarray(train[['Adj Close']])
regr.fit(x, y)
print('Coefficients: ', regr.coef_)

# Evaluate on the held-out test rows
x = np.asanyarray(test[['Open', 'High', 'Low', 'Volume']])
y = np.asanyarray(test[['Adj Close']])
y_pred = regr.predict(x)
print("Mean squared error: %.2f" % np.mean((y_pred - y) ** 2))
print('R^2 score: %.2f' % regr.score(x, y))

Coefficients:  [[-5.28488950e-01  8.11577802e-01  7.19334192e-01 -8.48431352e-10]]
Mean squared error: 0.01
R^2 score: 1.00
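
The Volume coefficient is many orders of magnitude smaller than the price coefficients, partly because Volume is measured in tens of millions while prices are in single or double digits. One way to put coefficients on a comparable footing is to standardize the features before fitting. The sketch below uses synthetic price-like and volume-like columns; the data and every name in it are assumptions for illustration, not a rerun of the model above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
price = rng.uniform(1, 25, size=300)         # price-like scale
volume = rng.uniform(1e6, 1e8, size=300)     # volume-like scale
X_demo = np.column_stack([price, volume])
y_demo = price + 1e-8 * volume + rng.normal(scale=0.01, size=300)

# Fit once on the raw columns and once on standardized columns
raw = LinearRegression().fit(X_demo, y_demo)
X_scaled = StandardScaler().fit_transform(X_demo)
scaled = LinearRegression().fit(X_scaled, y_demo)

print("raw coefficients:   ", raw.coef_)
print("scaled coefficients:", scaled.coef_)
```

Each standardized coefficient equals the raw coefficient multiplied by that column's standard deviation, so it reads as the change in the target per one standard deviation of the feature.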
