# Why do you need Machine Learning models?

Let's say you have a house to sell. What Price should you ask for it?

In [2]:
form_input = {
    'Bedrooms': 3,
    'Bathrooms': 2,
    'Garage': 2,
    'Build Year': 2000,
    'Floor Area': 200
}

In [3]:
import pandas as pd
df_input = pd.DataFrame(form_input, index=[0])
df_input

,Bedrooms,Bathrooms,Garage,Build Year,Floor Area
0,3,2,2,2000,200


## Data

Given a dataset of many houses and the Prices they sold for, you can use Machine Learning to predict a reasonable Price for your house.

In [8]:
import pandas as pd

path = '../../../data/house_perth/output/ml_simple.csv'
df_base = pd.read_csv(path, index_col=0)
df_base

ADDRESS,Price,Bedrooms,Bathrooms,Garage,Build Year,Floor Area
1 Datchet Turn,270000,3,2,2.0,2011.0,109
1 McKenzie Corner,470000,4,2,2.0,2005.0,279
...,...,...,...,...,...,...
93 Centennial Avenue,350000,4,2,2.0,2005.0,177
98 Centennial Avenue,441000,4,2,2.0,2004.0,195


## ML Model

**Machine learning, what does it mean?**

Visual explanation: https://twitter.com/jsulopzs/status/1449735653328031745

### Import algorithm from sklearn

In [17]:
from sklearn.linear_model import LinearRegression

### Instantiate the algorithm and assign it to a variable

In [18]:
model = LinearRegression()

### Fit model with historical data

In [19]:
y = df_base['Price']
X = df_base.drop(columns='Price')

model.fit(X, y)

### Predict your house Price

In [20]:
df_input

,Bedrooms,Bathrooms,Garage,Build Year,Floor Area
0,3,2,2,2000,200


In [21]:
model.predict(df_input)

array([415238.81026797])

### Interpretation

In [22]:
model.coef_

array([20407.52444216, -5446.13921643,  1952.49704666, -2468.27502394,
        1080.61667876])

In [23]:
model.intercept_

5081430.233404346

In [24]:
(df_input * model.coef_).sum(axis=1) + model.intercept_

0    415238.810268
dtype: float64
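
The expression above is the whole model: a linear regression prediction is the intercept plus each feature value multiplied by its coefficient. Here is a minimal sketch of that arithmetic in plain Python. It uses the rounded `coef_` and `intercept_` values printed above and assumes the coefficients follow the column order of `X` (Bedrooms, Bathrooms, Garage, Build Year, Floor Area). The names `contributions` and `predicted_price` are introduced here for illustration only.

```python
# Values copied (rounded) from the outputs of model.coef_ and model.intercept_
features = ['Bedrooms', 'Bathrooms', 'Garage', 'Build Year', 'Floor Area']
coefficients = [20407.52444216, -5446.13921643, 1952.49704666,
                -2468.27502394, 1080.61667876]
intercept = 5081430.233404346

# The house from form_input
house = {'Bedrooms': 3, 'Bathrooms': 2, 'Garage': 2,
         'Build Year': 2000, 'Floor Area': 200}

# Contribution of each feature = coefficient * feature value
contributions = {name: coef * house[name]
                 for name, coef in zip(features, coefficients)}

# Prediction = intercept + sum of all contributions
predicted_price = intercept + sum(contributions.values())

for name, value in contributions.items():
    print(f'{name:>10}: {value:>15,.2f}')
print(f'{"Price":>10}: {predicted_price:>15,.2f}')
```

Each coefficient reads as the change in predicted Price for one extra unit of that feature while the others stay fixed. The large intercept mostly offsets the large negative Build Year term.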

### How does the machine learn?

- Visual explanation: https://youtu.be/Ht3rYS-JilE
- Visit the source code of `model.fit` to see the math behind the model.

<img src="src/sourcecode.png" width=800>
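
`LinearRegression` fits its coefficients by ordinary least squares: it picks the line that minimizes the sum of squared differences between the real and the predicted Prices. As a rough sketch of that idea (not scikit-learn's actual implementation), the one-feature case has a closed-form solution. The four rows below are the ones visible in the `df_base` output, and `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Floor Area and Price of the four houses shown in the table
floor_area = [109, 279, 177, 195]
price = [270000, 470000, 350000, 441000]

slope, intercept = fit_line(floor_area, price)

# Residuals: real Price minus the Price predicted by the fitted line
residuals = [yi - (intercept + slope * xi)
             for xi, yi in zip(floor_area, price)]
print(slope, intercept)
```

With several features the same minimization is solved with linear algebra, and a fit on four rows and one feature will not reproduce the coefficients of the full model.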

### Model evaluation

In [28]:
y_pred = model.predict(X)
y_pred

array([289751.66723767, 508673.67721233, 425773.23461438, 367112.89229502,
       281008.42845204, 445690.86817794, 360015.10888963, 402466.20102762,
       377452.52573668, 419903.61787467, 293865.39764184, 423304.95959044,
       362483.38391356, 416612.52320704, 377304.97574976, 292895.21191837,
       395208.92594273, 408335.81776732, 320782.50925523, 389032.2675366 ,
       371582.90899698, 344431.88373366, 426276.88699193, 411111.13445768,
       286411.51184583, 434884.70139036, 459223.10702452, 446464.44319028,
       404934.47605155, 295768.8339643 , 412498.79280286, 384255.20916822,
       412498.79280286, 373142.00071422, 399224.35099134, 422838.42624453,
       390726.9675482 , 367272.38397451, 462207.16002573, 380546.82578603,
       383015.10080997, 398610.2676585 , 355226.10882868, 385483.37583391,
       347968.83374379, 481658.26024338, 447692.60985596, 419449.02622132,
       419903.61787467, 280947.24212811, 325719.05930311, 473160.87680024,
       402172.92086015, ...

In [29]:
data = {
    'y_true': y,
    'y_pred': y_pred
}

df_pred = pd.DataFrame(data)
df_pred

ADDRESS,y_true,y_pred
1 Datchet Turn,270000,289751.667238
1 McKenzie Corner,470000,508673.677212
...,...,...
93 Centennial Avenue,350000,398450.775979
98 Centennial Avenue,441000,420370.151221


In [30]:
model.score(X, y)

0.5881533749382085

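Watch out! The score is the coefficient of determination (R²): the model explains only about 59% of the variance in the real Prices, and it was scored on the same data it was trained on... is there a better model?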
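
For a regressor, `model.score` returns the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the real Prices from their mean. Below is a small sketch of that formula in plain Python. It is applied only to the four rows visible in `df_pred`, so the result will differ from the 0.588 computed on the full dataset. `r_squared` is a name introduced for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared errors of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared errors of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# The four rows shown in df_pred
y_true = [270000, 470000, 350000, 441000]
y_pred = [289751.667238, 508673.677212, 398450.775979, 420370.151221]

r2 = r_squared(y_true, y_pred)
print(round(r2, 3))
```

A score of 1.0 means perfect predictions, while a model that always predicts the mean Price scores 0.0.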

And that's why you need Machine Learning!

Explore the other supervised learning models in scikit-learn: https://scikit-learn.org/stable/supervised_learning.html