# Predicting house prices 🏠💵

🔎 For this experiment, we are going to test two different regression models:

- Linear Regression
- Decision Tree

MLflow will help us compare their training metrics with some great charts in the Monitor tab. 😎

## Imports 

These are the libraries we are going to use:

In [83]:
import math
import mlflow
import mlflow.sklearn  # explicit import of the scikit-learn flavor used by autolog below
import pandas as pd
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import numpy as np

## Dataset

In [96]:
df = pd.read_csv('/home/jovyan/datafabric/Full_data_sources/MLFlow/houses.csv') # Place your local path here :)
# Translating the column names to English
df.rename(columns={'tamanho': 'size', 'ano': 'year', 'garagem':'n_garage', 'preco':'price'}, inplace=True)

display(df.head())
display(df.shape)

| | size | year | n_garage | price |
|---|---|---|---|---|
| 0 | 159.0 | 2003 | 2 | 208500 |
| 1 | 117.0 | 1976 | 2 | 181500 |
| 2 | 166.0 | 2001 | 2 | 223500 |
| 3 | 160.0 | 1915 | 3 | 140000 |
| 4 | 204.0 | 2000 | 3 | 250000 |


(1460, 4)

## Exploratory Data Analysis

Let's create some quick and cool interactive data visualizations of our data using the plotly library, which comes with the data science workspace :) 

In [85]:
corr = df.corr()  # pairwise Pearson correlation between all numeric columns

fig = go.Figure()

fig.add_trace(
    go.Heatmap(
        x=corr.columns,
        y=corr.index,
        z=corr.values,
        text=corr.values,
        texttemplate='%{text:.2f}',
        colorbar=dict(title='Correlation level')  # the colorbar title belongs to the trace
    )
)
fig.update_layout(
    title='Correlation Matrix'
)

fig.show()
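By default, `df.corr()` computes the Pearson correlation coefficient: the covariance of two columns divided by the product of their standard deviations, so every cell of the heatmap lies between -1 and 1. A minimal plain-Python sketch of what one cell holds; the helper `pearson` and the toy values are hypothetical, not taken from the dataset:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance without the 1/n factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations (std devs without 1/n)
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # The 1/n factors cancel in the ratio, so they are omitted
    return cov / (sx * sy)

# Hypothetical toy values, for illustration only
sizes = [100.0, 120.0, 150.0, 200.0]
prices = [150000, 170000, 210000, 260000]
print(round(pearson(sizes, prices), 3))
```

A value close to 1 here simply means that, in these made-up numbers, price grows almost linearly with size.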

In [86]:
fig = px.histogram(
    df,
    x='price'
)

fig.update_layout(
    title='Price distribution',
    xaxis_title='Price',
    yaxis_title='Count'
)

fig.show()

In [87]:
fig = px.scatter(
    df,
    x='year',         # year on the x-axis
    y='price',        # price on the y-axis
    color='n_garage'  # color each house by its number of garages
)

fig.update_layout(
    title='Price x Number of garages over the years',
    xaxis_title='Year',
    yaxis_title='Price'
)

fig.show()

In [88]:
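# Average price per year (selecting 'price' explicitly, since iloc[:, 0] would pick 'size')
avg_price = df.groupby('year', as_index=False)['price'].mean()
fig = px.line(avg_price, x='year', y='price')

fig.update_layout(
    title='Average price over the years',
    xaxis_title='Year',
    yaxis_title='Average price'
)

fig.show()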

## Machine Learning

After the EDA, it's time to instantiate and train the two models mentioned at the beginning.
Both come from scikit-learn, a very popular data science library :) 

### Features and target

In [89]:
X = df[['size', 'year', 'n_garage']].values  # Features: size of the house, year and number of garages
y = df['price'].values  # Target: price

### Train and test split

Let's do the usual train/test split on our data.

We're choosing 70% of the data for training and 30% for testing.

In [99]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [91]:
X_train.shape, X_test.shape # 1022 data points for training and 438 for testing

((1022, 3), (438, 3))
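The 1022/438 sizes follow from how scikit-learn handles a float `test_size`: the test set gets `ceil(test_size * n_samples)` rows (here `ceil(0.3 * 1460) = 438`) and the training set gets the rest. A minimal sketch of a seeded shuffle split under that rule; the helper `shuffle_split_indices` is hypothetical, and its shuffled order will not match the rows that `train_test_split(random_state=42)` picks:

```python
import math
import random

def shuffle_split_indices(n_samples, test_size=0.3, seed=42):
    """Shuffle row indices and split them into train/test index lists.

    Sketch only: the test-set size follows the ceil(test_size * n) rule,
    but the shuffle is Python's random, not scikit-learn's.
    """
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return train_idx, test_idx

train_idx, test_idx = shuffle_split_indices(1460, test_size=0.3)
print(len(train_idx), len(test_idx))  # 1022 438
```

Fixing the seed (like `random_state=42` above) is what makes the split, and therefore the metrics, repeatable between runs.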

### Start of MLflow

Let's give our experiment a name so we can select it later in the Monitor tab :)

In [98]:
mlflow.set_experiment('house-prices-models')
mlflow.sklearn.autolog() # Autolog records parameters, training metrics and the fitted model for each scikit-learn fit



### Linear Regression


Linear regression is a statistical technique for predicting the value of a variable based on the values of other variables, assuming a linear relationship between them.
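`LinearRegression` fits ordinary least squares: it picks coefficients $\beta$ and an intercept $b$ that minimize $\sum_i (y_i - x_i^\top \beta - b)^2$. A minimal NumPy sketch of the same idea on synthetic, noise-free data (made up for illustration, not the house data); appending a column of ones gives the same solution as `fit_intercept=True`, although scikit-learn handles the intercept internally in its own way:

```python
import numpy as np

# Hypothetical toy data with 3 features, like size, year and n_garage
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 3.0
y_toy = X_toy @ true_coef + true_intercept  # noise-free linear target

# Append a column of ones so the last coefficient acts as the intercept
X_design = np.hstack([X_toy, np.ones((X_toy.shape[0], 1))])
# Ordinary least squares: minimize ||X_design @ beta - y_toy||^2
beta, *_ = np.linalg.lstsq(X_design, y_toy, rcond=None)
coef, intercept = beta[:-1], beta[-1]
print(coef, intercept)
```

Because the toy target is exactly linear, the fitted `coef` and `intercept` match the values used to generate it; on real prices the fit is only the best linear approximation.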

In [93]:
lr = LinearRegression(fit_intercept=True)
with mlflow.start_run(run_name='LinearRegression') as run:
    lr.fit(X_train, y_train)

y_pred_lr = lr.predict(X_test)

### Decision Tree Regressor


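A Decision Tree Regressor is a machine learning model that predicts a continuous value by recursively splitting the data into branches based on thresholds on the input variables; each leaf predicts the average target value of the training samples that reach it.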
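With squared error (the default criterion of `DecisionTreeRegressor`), each split chooses the threshold that most reduces the summed squared error of the two resulting groups, each predicting its own mean. A minimal pure-Python sketch of one split on one feature; the helpers `sse` and `best_split` and the toy numbers are hypothetical, and a real tree repeats this over every feature and recursively on each branch:

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Best single threshold on one feature for a regression tree.

    Returns (threshold, left_mean, right_mean): rows with x <= threshold go
    left, and each side predicts the mean of its targets.
    """
    pairs = sorted(zip(xs, ys))
    best_cost, best = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
            best_cost = cost
            best = (threshold, sum(left) / len(left), sum(right) / len(right))
    return best

# Hypothetical toy data: house sizes and prices (in thousands)
sizes = [50, 60, 70, 150, 160, 170]
prices = [100, 110, 105, 300, 310, 305]
print(best_split(sizes, prices))  # (110.0, 105.0, 305.0)
```

The split lands in the gap between small and large houses, and each branch predicts its group's mean price.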





In [94]:
regressor = DecisionTreeRegressor(random_state=42)
with mlflow.start_run(run_name='DecisionTreeRegressor') as run:
    regressor.fit(X_train, y_train)

y_pred_regressor = regressor.predict(X_test)
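The metrics `mlflow.sklearn.autolog()` records at fit time are computed on the training data, and an unconstrained decision tree can fit its training set almost perfectly, so training metrics alone can flatter it. A minimal sketch for scoring both models on the held-out 30% with the `math` module imported at the top; the helper `regression_metrics` is hypothetical, not part of scikit-learn or MLflow:

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }

# In the notebook, y_test, y_pred_lr and y_pred_regressor come from the cells above:
# for name, y_pred in [("LinearRegression", y_pred_lr),
#                      ("DecisionTreeRegressor", y_pred_regressor)]:
#     print(name, regression_metrics(y_test, y_pred))

# Tiny self-contained check with made-up numbers
print(regression_metrics([1, 2, 3], [2, 2, 2]))
```

In practice, `mean_squared_error`, `mean_absolute_error` and `r2_score` from `sklearn.metrics` compute the same quantities and are the more usual choice.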

### End of MLflow

In [95]:
mlflow.end_run()  # Ends any run still active; the `with` blocks above already closed theirs

Now you can go to the Monitor tab, check the metrics logged for this experiment, and compare the two runs using the Chart option! :)