# Multiple Linear Regression with Dummies - Exercise

You are given a real estate dataset. 

Real estate is an example that almost every regression course covers, because it is easy to understand and there is almost always a clear causal relationship to be found.

The data is located in the file: 'real_estate_price_size_year_view.csv'. 

You are expected to create a multiple linear regression (similar to the one in the lecture), using the new data. 

In this exercise, the dependent variable is 'price', while the independent variables are 'size', 'year', and 'view'.

#### Regarding the 'view' variable:
There are two options: 'Sea view' and 'No sea view'. You are expected to create a dummy variable for 'view' and include it in the regression.

Good luck!

## Import the relevant libraries

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

## Load the data

In [2]:
df = pd.read_csv('real_estate_price_size_year_view.csv')
df.head()

| | price | size | year | view |
|---|---|---|---|---|
| 0 | 234314.144 | 643.09 | 2015 | No sea view |
| 1 | 228581.528 | 656.22 | 2009 | No sea view |
| 2 | 281626.336 | 487.29 | 2018 | Sea view |
| 3 | 401255.608 | 1504.75 | 2015 | No sea view |
| 4 | 458674.256 | 1275.46 | 2009 | Sea view |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
price    100 non-null float64
size     100 non-null float64
year     100 non-null int64
view     100 non-null object
dtypes: float64(2), int64(1), object(1)
memory usage: 3.2+ KB


In [4]:
df.describe()

| | price | size | year |
|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 |
| mean | 292289.47016 | 853.0242 | 2012.6 |
| std | 77051.727525 | 297.941951 | 4.729021 |
| min | 154282.128 | 479.75 | 2006.0 |
| 25% | 234280.148 | 643.33 | 2009.0 |
| 50% | 280590.716 | 696.405 | 2015.0 |
| 75% | 335723.696 | 1029.3225 | 2018.0 |
| max | 500681.128 | 1842.51 | 2018.0 |


In [5]:
df.isnull().sum()

price    0
size     0
year     0
view     0
dtype: int64

## Create a dummy variable for 'view'

In [6]:
df['view'].unique()

array(['No sea view', 'Sea view'], dtype=object)

In [7]:
data = df.copy()
data['view'] = data['view'].map({'Sea view' : 1, 'No sea view' : 0})
data.head()

| | price | size | year | view |
|---|---|---|---|---|
| 0 | 234314.144 | 643.09 | 2015 | 0 |
| 1 | 228581.528 | 656.22 | 2009 | 0 |
| 2 | 281626.336 | 487.29 | 2018 | 1 |
| 3 | 401255.608 | 1504.75 | 2015 | 0 |
| 4 | 458674.256 | 1275.46 | 2009 | 1 |
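
`Series.map` returns `NaN` for any value that is missing from the mapping, so a quick check after encoding can catch misspelled or unexpected categories before they reach the regression. The following is a minimal sketch on a toy frame; the rows and the deliberately misspelled 'Sea View' label are made up for illustration and are not part of the exercise data.

```python
import pandas as pd

# Toy data (hypothetical), not the exercise dataset.
# The third label is deliberately misspelled to show what map() does with it.
toy = pd.DataFrame({'view': ['Sea view', 'No sea view', 'Sea View']})

mapping = {'Sea view': 1, 'No sea view': 0}
encoded = toy['view'].map(mapping)

# Values absent from the mapping become NaN, so count them before modelling.
n_unmapped = int(encoded.isna().sum())
print(n_unmapped)
```

In the exercise data, `df['view'].unique()` already shows that only the two expected labels occur, so the count there should be zero.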


## Create the regression

### Declare the dependent and the independent variables

In [8]:
X = data[['size', 'year', 'view']]
y = data['price']
X.shape, y.shape

((100, 3), (100,))

### Regression

In [9]:
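x = sm.add_constant(X)  # adds a 'const' intercept column, but x is unused below
# Fitted on X, not x, so the model has no intercept (hence the uncentered
# R-squared in the summary). Pass x instead of X to include one.
results = sm.OLS(y, X).fit()
results.summary()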


| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared (uncentered): | 0.993 |
| Model: | OLS | Adj. R-squared (uncentered): | 0.992 |
| Method: | Least Squares | F-statistic: | 4377.0 |
| Date: | Fri, 12 Nov 2021 | Prob (F-statistic): | 2.3e-103 |
| Time: | 23:36:57 | Log-Likelihood: | -1158.0 |
| No. Observations: | 100 | AIC: | 2322.0 |
| Df Residuals: | 97 | BIC: | 2330.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| size | 218.5337 | 8.866 | 24.650 | 0.000 | 200.938 | 236.129 |
| year | 38.6216 | 4.126 | 9.361 | 0.000 | 30.433 | 46.810 |
| view | 5.75e+04 | 5261.330 | 10.928 | 0.000 | 4.71e+04 | 6.79e+04 |

| | | | |
|---|---|---|---|
| Omnibus: | 24.678 | Durbin-Watson: | 1.962 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 54.705 |
| Skew: | 0.906 | Prob(JB): | 1.32e-12 |
| Kurtosis: | 6.138 | Cond. No. | 4380.0 |


## Predict on new data

In [11]:
data_predict = pd.DataFrame({'size': [705.29, 549.80, 500.72], 
                             'year': [2019, 2020, 2021], 
                             'view': ['Sea view', 'Sea view', 'No sea view']})
data_predict = data_predict[['size','year','view']]
data_predict

| | size | year | view |
|---|---|---|---|
| 0 | 705.29 | 2019 | Sea view |
| 1 | 549.8 | 2020 | Sea view |
| 2 | 500.72 | 2021 | No sea view |


In [15]:
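# Encode 'view' exactly as in training and label the rows with property names.
# Note: this reassigns `data`, replacing the training DataFrame of the same name.
data = data_predict.copy()
data['view'] = data['view'].map({'Sea view' : 1, 'No sea view' : 0})
data.rename(index={0 : 'BhuvanaRes',1 : 'juliet', 2 : 'LembongHouse'}, inplace = True)
data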

| | size | year | view |
|---|---|---|---|
| BhuvanaRes | 705.29 | 2019 | 1 |
| juliet | 549.8 | 2020 | 1 |
| LembongHouse | 500.72 | 2021 | 0 |


In [16]:
predict = results.predict(data)
predict

BhuvanaRes      289605.079132
juliet          255663.894416
LembongHouse    187478.494243
dtype: float64
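
These predictions can be checked by hand: each one is the sum of the coefficients from the summary table multiplied by the corresponding feature values, with no intercept term because none was fitted. The sketch below redoes that arithmetic with NumPy, using the coefficients as printed in the summary (so they are rounded, the `view` coefficient to three significant figures). The names `coef`, `new_X`, `manual`, and `reported` are introduced here for illustration.

```python
import numpy as np

# Coefficients as printed (rounded) in the summary table; no intercept term.
coef = np.array([218.5337, 38.6216, 5.75e4])  # size, year, view

# The three new observations: size, year, view (1 = sea view, 0 = no sea view).
new_X = np.array([
    [705.29, 2019, 1],
    [549.80, 2020, 1],
    [500.72, 2021, 0],
])

# Linear prediction: one dot product per row.
manual = new_X @ coef

# Values returned by results.predict(data) in the cell above.
reported = np.array([289605.079132, 255663.894416, 187478.494243])

# Any gap comes from the rounding of the printed coefficients.
print(np.abs(manual - reported).max())
```

The hand-computed values agree with the reported predictions to within a few units of price, which is consistent with rounding in the printed `view` coefficient.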