# Linear regression
---
Linear regression is a **Supervised Learning** method, which means that the algorithm is trained on labeled data points.

Supervised learning is further split into **Classification** and **Regression**. Linear regression belongs to the latter because it predicts a continuous value.
## How does it work?
Linear regression models are generally fitted using the least-squares (LS) approach.
The goal is to find the line that minimizes the sum of squared vertical distances (residuals) between the observations and the line.
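
To make the least-squares idea concrete, here is a minimal sketch of the closed-form fit for a single feature. The toy data and the helper name `fit_line` are assumptions made for illustration; they are not part of the real-estate dataset or of scikit-learn.

```python
import numpy as np

# Toy data (hypothetical): roughly y = 2x + 1 with small deviations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

def fit_line(x, y):
    """Return (slope, intercept) that minimize the sum of squared residuals."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope = covariance of x and y divided by the variance of x.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = fit_line(x, y)

# Residuals are the vertical distances between observations and the line.
residuals = y - (slope * x + intercept)
sse = np.sum(residuals ** 2)
print(slope, intercept, sse)
```

Any other choice of slope or intercept gives a larger `sse` on the same data, which is what "least squares" means.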

### Importing the required libraries

In [1]:
import pandas as pd  # data loading and manipulation
import numpy as np  # numerical arrays and grids
import plotly.graph_objects as go  # low-level plotting (regression line, surface)
import plotly.express as px  # high-level plotting (scatter plots)
from sklearn.linear_model import LinearRegression  # the regression model

In [3]:
df = pd.read_csv('../data/Real_estate.csv')

In [4]:
df

Unnamed: 0,No,X1 transaction date,X2 house age,X3 distance to the nearest MRT station,X4 number of convenience stores,X5 latitude,X6 longitude,Y house price of unit area
0,1,2012.917,32.0,84.87882,10,24.98298,121.54024,37.9
1,2,2012.917,19.5,306.59470,9,24.98034,121.53951,42.2
2,3,2013.583,13.3,561.98450,5,24.98746,121.54391,47.3
3,4,2013.500,13.3,561.98450,5,24.98746,121.54391,54.8
4,5,2012.833,5.0,390.56840,5,24.97937,121.54245,43.1
...,...,...,...,...,...,...,...,...
409,410,2013.000,13.7,4082.01500,0,24.94155,121.50381,15.4
410,411,2012.667,5.6,90.45606,9,24.97433,121.54310,50.0
411,412,2013.250,18.8,390.96960,7,24.97923,121.53986,40.6
412,413,2013.000,8.1,104.81010,5,24.96674,121.54067,52.5


In [5]:
fig = px.scatter(df, x='X3 distance to the nearest MRT station',
                 y='Y house price of unit area', opacity=0.8, color_discrete_sequence=['black'])
fig.update_layout(dict(plot_bgcolor = 'white'))
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1,zerolinecolor='lightgrey',
                 showline=True,linewidth=1, linecolor='black')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1,zerolinecolor='lightgrey',
                 showline=True,linewidth=1, linecolor='black')
fig.update_layout(title_text="Scatter plot")
fig.update_traces(marker=dict(size=3))
fig.show()

In [6]:
# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
x = df['X3 distance to the nearest MRT station'].values.reshape(-1,1)
y = df['Y house price of unit area'].values

model = LinearRegression()
reg = model.fit(x,y)

print(reg.coef_)
print(reg.intercept_)

[-0.00726205]
45.85142705777498
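
The printed slope and intercept define the fitted line, price = 45.85 - 0.00726 × distance, so each additional unit of distance to the nearest MRT station lowers the predicted price by about 0.0073. The sketch below applies that line by hand. The function name `predict_price` and the query distances are illustrative assumptions, and the slope is the rounded value as printed above.

```python
# Values copied from the output above (the slope is rounded as printed).
slope = -0.00726205
intercept = 45.85142705777498

def predict_price(distance):
    """Predicted house price of unit area for a given distance to the nearest MRT station."""
    return intercept + slope * distance

# Hypothetical query distances, in the same units as the dataset column.
for distance in (0, 1000, 2000):
    print(distance, round(predict_price(distance), 2))
```

With the fitted model in memory, `model.predict([[1000]])` should give essentially the same number, up to the rounding of the printed slope.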


In [7]:
x_range = np.linspace(x.min(),x.max(),20)
y_range = model.predict(x_range.reshape(-1,1))
fig = px.scatter(df, x='X3 distance to the nearest MRT station',
                 y='Y house price of unit area', opacity=0.8, color_discrete_sequence=['black'])

fig.add_traces(go.Scatter(x = x_range, y = y_range, name = 'Regression Fit'))

fig.update_layout(dict(plot_bgcolor = 'white'))

fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1,zerolinecolor='lightgrey',
                 showline=True,linewidth=1, linecolor='black')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1,zerolinecolor='lightgrey',
                 showline=True,linewidth=1, linecolor='black')
fig.update_layout(title_text="Scatter plot with linear regression line")
fig.update_traces(marker=dict(size=3))

fig.show()

# Multiple Linear Regression
---
In this part we add a second independent variable, house age, and use both variables (distance to the nearest MRT station and house age) to predict the house price of unit area.

In [8]:
fig = px.scatter_3d(df, x='X3 distance to the nearest MRT station',
                    y='X2 house age', z='Y house price of unit area',
                    opacity=0.8, color_discrete_sequence=['black'])
fig.update_layout(title_text="Scatter 3D Plot",
                  scene = dict(xaxis=dict(backgroundcolor='white',
                                          color='black',
                                          gridcolor='lightgrey'),
                               yaxis=dict(backgroundcolor='white',
                                          color='black',
                                          gridcolor='lightgrey'
                                          ),
                               zaxis=dict(backgroundcolor='white',
                                          color='black', 
                                          gridcolor='lightgrey')))
fig.update_traces(marker = dict(size=2))
fig.show()

In [9]:
x = df[['X3 distance to the nearest MRT station','X2 house age']]
y = df['Y house price of unit area'].values
model = LinearRegression()
reg = model.fit(x,y)

print(reg.coef_)
print(reg.intercept_)

[-0.00720862 -0.23102658]
49.885585756906636
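
With two features the fitted model is a plane: price = intercept + c1 × distance + c2 × house age, where c1 and c2 are the two printed coefficients. The sketch below shows one way such coefficients can be obtained by least squares with plain NumPy. The synthetic data and the "true" parameter values are assumptions made for illustration. This is not how `LinearRegression` is necessarily implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features (hypothetical stand-ins for distance and house age).
X = rng.uniform(0, 10, size=(50, 2))
true_coef = np.array([-0.5, 2.0])
true_intercept = 3.0
y = X @ true_coef + true_intercept  # noise-free, so the fit can recover the values

# Append a column of ones so the intercept is estimated together with the slopes.
A = np.column_stack([X, np.ones(len(X))])

# Solve min ||A w - y||^2 for w = (coef_1, coef_2, intercept).
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = solution[:2], solution[2]
print(coef, intercept)
```

Each coefficient is the change in the prediction per unit change of its feature while the other feature is held fixed.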


In [10]:
mesh_size = 1
x_min, x_max = x['X3 distance to the nearest MRT station'].min(), x['X3 distance to the nearest MRT station'].max()
y_min, y_max = x['X2 house age'].min(), x['X2 house age'].max()

xrange = np.arange(x_min, x_max, mesh_size)
yrange = np.arange(y_min, y_max, mesh_size)

xx, yy = np.meshgrid(xrange, yrange)

In [11]:
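# Use the training column names so scikit-learn does not warn about missing feature names
grid = pd.DataFrame(np.c_[xx.ravel(), yy.ravel()], columns=x.columns)
pred = model.predict(grid)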
pred = pred.reshape(xx.shape)
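
The grid-building steps above can be easier to follow on a tiny example. The sketch below uses a 3 × 2 grid and a hypothetical `plane` function standing in for `model.predict`, so the shapes at each step are easy to check.

```python
import numpy as np

xrange = np.arange(0, 3, 1)  # 3 values along the first feature
yrange = np.arange(0, 2, 1)  # 2 values along the second feature

# meshgrid returns two arrays of shape (len(yrange), len(xrange)).
xx, yy = np.meshgrid(xrange, yrange)

# Flatten both grids and pair them up: one row per grid point, one column per feature.
grid_points = np.c_[xx.ravel(), yy.ravel()]

def plane(points):
    """Hypothetical stand-in for model.predict: a fixed plane over two features."""
    return 1.0 + 2.0 * points[:, 0] + 10.0 * points[:, 1]

# Reshape the flat predictions back to the grid layout expected by a surface plot.
pred = plane(grid_points).reshape(xx.shape)
print(xx.shape, grid_points.shape, pred.shape)
```

The reshaped `pred` has one row per `yrange` value and one column per `xrange` value, which is the layout `go.Surface` uses for `z` when `x` and `y` are passed as 1D arrays.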

In [12]:
fig = px.scatter_3d(df, x='X3 distance to the nearest MRT station',
                    y='X2 house age', z='Y house price of unit area',
                    opacity=0.8, color_discrete_sequence=['black'])
fig.update_layout(title_text="Scatter 3D Plot with Prediction Surface",
                  scene = dict(xaxis=dict(backgroundcolor='white',
                                          color='black',
                                          gridcolor='lightgrey'),
                               yaxis=dict(backgroundcolor='white',
                                          color='black',
                                          gridcolor='lightgrey'
                                          ),
                               zaxis=dict(backgroundcolor='white',
                                          color='black', 
                                          gridcolor='lightgrey')))
fig.update_traces(marker=dict(size=3))
fig.add_traces(go.Surface(x = xrange,y = yrange,z = pred, name = 'pred_surface'))
fig.show()