### Multiple Linear Regression
A multiple linear regression model has the form

\begin{align}
y=\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\dots+\theta_{n}x_{n}
\end{align}


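In this example there are three features (area, bedrooms and age), so the model is

\begin{align}
y=\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\theta_{3}x_{3}
\end{align}

where $x_{1}$ is the area, $x_{2}$ the number of bedrooms, $x_{3}$ the age and $y$ the price.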

We need to find the parameters $\theta_{0},\theta_{1},\dots,\theta_{n}$ that best fit the model to the data.

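The $\theta$ values are chosen to minimize the squared error between the predictions $\hat{y}$ and the observed $y$, either iteratively with gradient descent or directly with the least-squares method. Once they are known, $\hat{y}$ can be calculated for any input.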
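To make the two fitting routes concrete, here is a minimal NumPy sketch. It is not how `sklearn` is used later in this notebook. The data is made up and noise-free, generated from known parameters so the result can be checked, and the learning rate and iteration count are assumed values that happen to suit this toy data.

```python
import numpy as np

# Made-up, noise-free data generated from y = 1 + 2*x1 + 3*x2,
# so the true parameters are known in advance.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so that theta[0] plays the role of the intercept.
A = np.column_stack([np.ones(len(X)), X])

# Route 1: least squares, solving min ||A @ theta - y||^2 directly.
theta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

# Route 2: gradient descent on the mean squared error.
theta_gd = np.zeros(A.shape[1])
learning_rate = 0.05  # assumed step size, small enough for this data
for _ in range(20000):
    residual = A @ theta_gd - y
    gradient = (2.0 / len(y)) * (A.T @ residual)
    theta_gd -= learning_rate * gradient

print(theta_ls)
print(theta_gd)
```

Both routes should land on parameters close to 1, 2 and 3. On real data with features on very different scales (such as area and bedrooms), gradient descent usually needs feature scaling to converge in a reasonable number of steps.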

When the optimum $\theta$ values have been found, the model is applied to test data to make predictions.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

In [4]:
#read csv file
df = pd.read_csv("Downloads/py-master/ML/2_linear_reg_multivariate/homeprices.csv")

In [5]:
#display first five rows
df.head()

   area  bedrooms  age   price
0  2600       3.0   20  550000
1  3000       4.0   15  565000
2  3200       NaN   18  610000
3  3600       3.0   30  595000
4  4000       5.0    8  760000


In [7]:
# the bedrooms column has a missing value (NaN) that must be filled before fitting
# use the median (middle value) of the known bedroom counts
# and round it down to a whole number

import math
bedroom_median = math.floor(df.bedrooms.median())
bedroom_median

4

In [9]:
#mean value
df.bedrooms.mean()

4.2
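The median and mean above are computed over the known values only, because pandas skips NaN in these reductions by default (`skipna=True`). A small sketch that rebuilds just the bedrooms column from the values shown in this notebook illustrates this; the variable names are chosen for the example.

```python
import math

import numpy as np
import pandas as pd

# Bedroom counts as they appear in the notebook, with one missing value.
bedrooms = pd.Series([3.0, 4.0, np.nan, 3.0, 5.0, 6.0], name="bedrooms")

median_value = bedrooms.median()  # NaN is skipped by default (skipna=True)
mean_value = bedrooms.mean()      # likewise averaged over the 5 known values

print(median_value)               # 4.0
print(mean_value)                 # 4.2
print(math.floor(median_value))   # 4
```

The median is a whole number here, so `math.floor` only converts it to an `int`. The mean is 4.2, which would not be a sensible bedroom count without rounding.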

In [10]:
# fillna is the pandas method that replaces NaN entries with a given value;
# it returns a new Series and leaves df unchanged
df.bedrooms.fillna(bedroom_median)  # value to insert in place of NaN, here the median

0    3.0
1    4.0
2    4.0
3    3.0
4    5.0
5    6.0
Name: bedrooms, dtype: float64

In [11]:
#assign altered column back to original data
df.bedrooms = df.bedrooms.fillna(bedroom_median)
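The assignment is needed because `fillna` returns a new Series rather than changing the column in place. A minimal sketch on a hypothetical one-column frame (`df_demo`, built from the bedroom values shown above) makes the difference visible:

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame holding the bedroom counts from the notebook.
df_demo = pd.DataFrame({"bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0, 6.0]})

# fillna returns a new Series; the frame itself still contains the NaN.
filled = df_demo["bedrooms"].fillna(4)
missing_before = int(df_demo["bedrooms"].isna().sum())

# Assigning the result back is what actually updates the frame.
df_demo["bedrooms"] = filled
missing_after = int(df_demo["bedrooms"].isna().sum())

print(missing_before, missing_after)
```

Checking `isna().sum()` before fitting is a cheap way to confirm that no missing values remain, since `LinearRegression.fit` rejects input containing NaN.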

In [12]:
#invoke sklearn model
reg = linear_model.LinearRegression()

In [13]:
#pass multi dimensional data as X variable and price as y variable
reg.fit(df[['area','bedrooms','age']], df.price)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

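`reg.coef_` holds the coefficients $\theta_{1},\theta_{2},\dots,\theta_{n}$, one per feature, in the order the columns were passed to `fit`.

`reg.intercept_` holds the intercept $\theta_{0}$.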

In [14]:
reg.coef_

array([  112.06244194, 23388.88007794, -3231.71790863])

In [15]:
reg.intercept_

221323.00186540425
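Reading the coefficients in the order the columns were passed to `fit` (area, bedrooms, age) and rounding to two decimals, the fitted model can be written out as follows. The rounding is only for readability.

\begin{align}
\hat{y} \approx 221323.00 + 112.06\,x_{\text{area}} + 23388.88\,x_{\text{bedrooms}} - 3231.72\,x_{\text{age}}
\end{align}

With the other features held fixed, each extra unit of area adds about 112 to the predicted price, each extra bedroom adds about 23389, and each extra year of age subtracts about 3232.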

In [16]:
# predict the price of a home with area 3000, 3 bedrooms, age 30
reg.predict([[3000, 3, 30]])

array([530725.43066664])

In [17]:
# same home, but age 10 instead of 30
reg.predict([[3000, 3, 10]])

array([595359.7888393])
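The two predictions can be reproduced by hand from the intercept and coefficients printed earlier, which shows that `predict` is just evaluating the linear equation. In this sketch the helper name `predict_by_hand` is made up for illustration, and the coefficients are copied from the rounded output, so the results agree with `reg.predict` only to within a small rounding error.

```python
import numpy as np

# Values copied from the reg.coef_ and reg.intercept_ outputs above.
coef = np.array([112.06244194, 23388.88007794, -3231.71790863])
intercept = 221323.00186540425


def predict_by_hand(area, bedrooms, age):
    """Return intercept + coef . [area, bedrooms, age]."""
    features = np.array([area, bedrooms, age], dtype=float)
    return intercept + coef @ features


price_age_30 = predict_by_hand(3000, 3, 30)
price_age_10 = predict_by_hand(3000, 3, 10)

print(round(price_age_30, 2))
print(round(price_age_10, 2))
print(round(price_age_10 - price_age_30, 2))
```

The two inputs differ only in age, so the gap between the predictions is 20 times the magnitude of the age coefficient, roughly 64634.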