# Multiple Linear Regression

### Task

We would like to build a predictive model to help us determine the median value of homes in a metropolitan city.

### Input features
1) Crime_rate: per capita crime rate by town
2) Mountain_view: 1 if the property has mountain views; 0 otherwise (stored as text such as "No" in the raw file)
3) Avg_room_num: average number of rooms per dwelling
4) Low_stat: % of lower income individuals in area

### Target variable
1) Value: Median value of homes in $1000's

In [3]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [1]:
# Mount Google Drive so the dataset can be read (for Google Colab users)
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Housing_dataset.csv')
df.head()

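| Unnamed: 0 | Crime_rate | Mountain_view | Avg_room_num | Low_stat | Value |
|---|---|---|---|---|---|
| 0 | 0.00632 | No | 6.575 | 4.98 | 24.0 |
| 1 | 0.02731 | No | 6.421 | 9.14 | 21.6 |
| 2 | 0.02729 | No | 7.185 | 4.03 | 34.7 |
| 3 | 0.03237 | No | 6.998 | 2.94 | 33.4 |
| 4 | 0.06905 | No | 7.147 | 5.33 | 36.2 |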


In [5]:
# Independent and dependent variables (Mountain_view is not included here)
X = df[['Crime_rate','Avg_room_num', 'Low_stat']]
y = df['Value']
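The feature matrix above leaves out Mountain_view, which appears as text in the preview rather than the 1/0 coding listed under the input features. If it were to be included, one possible approach is to map the labels to integers first. This is a minimal sketch on hypothetical rows; the "Yes" label is an assumption, since only "No" is visible in the preview, and `sample` and `X_with_view` are illustrative names.

```python
import pandas as pd

# Hypothetical rows in the same layout as the dataset preview.
# The "Yes" label is an assumption: only "No" appears in df.head().
sample = pd.DataFrame({
    "Crime_rate": [0.00632, 0.02731, 0.06905],
    "Mountain_view": ["No", "Yes", "No"],
    "Avg_room_num": [6.575, 6.421, 7.147],
    "Low_stat": [4.98, 9.14, 5.33],
})

# Map the text labels to the 1/0 coding described under "Input features".
sample["Mountain_view"] = sample["Mountain_view"].map({"Yes": 1, "No": 0})

# Feature matrix with all four inputs.
X_with_view = sample[["Crime_rate", "Mountain_view", "Avg_room_num", "Low_stat"]]
print(X_with_view.dtypes)
```

Any label not present in the mapping becomes NaN, so it is worth checking the column's unique values before relying on this.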

In [6]:
X.shape, y.shape

((506, 3), (506,))

In [7]:
# Training and testing split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size = 0.2, random_state = 0)

In [8]:
# Initialise scalers
min_max_scaler = MinMaxScaler()
standard_scaler = StandardScaler()

# Fit scalers on training data
min_max_scaler.fit(Xtrain)
standard_scaler.fit(Xtrain)

# Apply feature scaling
Xtrain_minmax_scaled = min_max_scaler.transform(Xtrain)
Xtest_minmax_scaled = min_max_scaler.transform(Xtest)

Xtrain_standard_scaled = standard_scaler.transform(Xtrain)
Xtest_standard_scaled = standard_scaler.transform(Xtest)
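To make the two transformations concrete, the sketch below reproduces them by hand on a small made-up matrix and compares the result with scikit-learn. The arrays `train` and `test` are illustrative and are not taken from the housing data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Small made-up training and test matrices (2 features), for illustration only.
train = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
test = np.array([[3.0, 50.0]])

# Min-max scaling: (x - min) / (max - min), using training statistics only.
col_min, col_max = train.min(axis=0), train.max(axis=0)
test_minmax_manual = (test - col_min) / (col_max - col_min)

# Standardisation: (x - mean) / std, using training statistics only.
# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for std().
col_mean, col_std = train.mean(axis=0), train.std(axis=0)
test_standard_manual = (test - col_mean) / col_std

# Compare against scikit-learn's scalers fitted on the same training data.
test_minmax_sklearn = MinMaxScaler().fit(train).transform(test)
test_standard_sklearn = StandardScaler().fit(train).transform(test)

print(np.allclose(test_minmax_manual, test_minmax_sklearn))
print(np.allclose(test_standard_manual, test_standard_sklearn))
print(test_minmax_manual)
```

Because the scalers are fitted on the training split only, a test value outside the training range (50 in the second column here) maps outside [0, 1] under min-max scaling. That is expected and avoids leaking test information into the fit.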

In [9]:
# Create an unfitted linear regression model (to be trained on min-max scaled features)
housing_model1 = LinearRegression()

# fit on training data
housing_model1.fit(Xtrain_minmax_scaled, ytrain)

In [10]:
# Create an unfitted linear regression model (to be trained on standardised features)
housing_model2 = LinearRegression()

# fit on training data
housing_model2.fit(Xtrain_standard_scaled, ytrain)

In [11]:
# Intercept and coefficients values for housing_model1
print('Intercept: \n', housing_model1.intercept_)
print('Coefficients: \n', housing_model1.coef_)

Intercept: 
 14.646283131921713
Coefficients: 
 [-12.4925288   27.93965696 -20.01574118]


In [12]:
# Intercept and coefficients values for housing_model2
print('Intercept: \n', housing_model2.intercept_)
print('Coefficients: \n', housing_model2.coef_)

Intercept: 
 22.6118811881188
Coefficients: 
 [-1.14125798  3.71439284 -4.13942085]


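Fitted equation for housing_model2 (each feature here is its standardised value, not the raw measurement):

Value = 22.6118811881188 - 1.14125798 * Crime_rate + 3.71439284 * Avg_room_num - 4.13942085 * Low_stat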
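Since housing_model2 was trained on standardised inputs, its coefficients are expressed per standard deviation of each feature. They can be converted back to the original units with the scaler's `mean_` and `scale_` attributes. The sketch below shows the algebra on synthetic data; `X_demo`, `y_demo`, and the numbers used to generate them are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data with three features on different scales (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[3.0, 6.0, 12.0], scale=[8.0, 0.7, 7.0], size=(200, 3))
y_demo = (20.0 - 0.1 * X_demo[:, 0] + 5.0 * X_demo[:, 1]
          - 0.6 * X_demo[:, 2] + rng.normal(scale=1.0, size=200))

# Fit on standardised features, as in the notebook.
scaler = StandardScaler().fit(X_demo)
model = LinearRegression().fit(scaler.transform(X_demo), y_demo)

# With z = (x - mean) / scale, the model b0 + sum(b_j * z_j) becomes
# (b0 - sum(b_j * mean_j / scale_j)) + sum((b_j / scale_j) * x_j).
coef_original = model.coef_ / scaler.scale_
intercept_original = model.intercept_ - np.sum(
    model.coef_ * scaler.mean_ / scaler.scale_
)

# Both forms should give the same predictions.
pred_scaled = model.predict(scaler.transform(X_demo))
pred_original = intercept_original + X_demo @ coef_original
print(np.allclose(pred_scaled, pred_original))
```

The converted coefficients read as "change in Value per one-unit change in the raw feature", which is usually easier to interpret than the standardised form.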

In [13]:
# Using both housing models to obtain predictions on test set
y_pred1 = housing_model1.predict(Xtest_minmax_scaled)
y_pred2 = housing_model2.predict(Xtest_standard_scaled)

In [17]:
R_sq = r2_score(ytest, y_pred1)
print("R-squared for housing_model1 is", round(R_sq*100), "%")

R-squared for housing_model1 is 52 %


In [19]:
R_sq = r2_score(ytest, y_pred2)
print("R-squared for housing_model2 is", round(R_sq*100), "%")

R-squared for housing_model2 is 52 %
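For reference, `r2_score` computes one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below works through that formula on a few made-up values (not the housing predictions) and also reports RMSE, which is in the same units as the target.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

# Residual sum of squares: squared error of the model's predictions.
ss_res = np.sum((y_true - y_hat) ** 2)

# Total sum of squares: squared error of always predicting the mean.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))

print(r2_manual, r2_score(y_true, y_hat))
print(rmse)
```

Reporting RMSE next to R-squared can help here, because with Value measured in $1000's the RMSE reads directly as a typical prediction error in thousands of dollars.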


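An R-squared of 0.52 means the model explains only about 52% of the variance in Value on the test set, leaving roughly half of it unexplained. That is a weak fit for a model meant to predict home values. The two scalers produce the same score, which suggests that the choice of scaler is not the limiting factor; improving the result would more likely require additional informative features or a more flexible model.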
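One way to check that the matching scores for housing_model1 and housing_model2 are expected rather than a coincidence: for ordinary least squares with an intercept, rescaling each feature changes the coefficients but not the predictions. The sketch below illustrates this on synthetic data; `X_demo`, `y_demo`, and the generating coefficients are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic regression data with features on different scales (illustrative).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3)) * np.array([8.0, 0.7, 7.0])
y_demo = X_demo @ np.array([-0.1, 5.0, -0.6]) + rng.normal(scale=3.0, size=300)

Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

scores = {}

# Fit the same linear model after each kind of scaling.
for name, scaler in [("minmax", MinMaxScaler()), ("standard", StandardScaler())]:
    scaler.fit(Xtr)
    model = LinearRegression().fit(scaler.transform(Xtr), ytr)
    scores[name] = r2_score(yte, model.predict(scaler.transform(Xte)))

# Fit once more with no scaling at all.
unscaled_model = LinearRegression().fit(Xtr, ytr)
scores["unscaled"] = r2_score(yte, unscaled_model.predict(Xte))

print(scores)
```

All three scores agree up to floating-point error. Scaling matters for regularised or distance-based models, but not for the test R-squared of a plain `LinearRegression`.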