# Student Project - Machine Learning Fundamentals

A machine learning experiment using data from the company Data Money.

**Regression algorithms:**

- Linear Regression
    - Linear Regression Lasso
    - Linear Regression Ridge 
    - Linear Regression Elastic Net
- Polynomial Regression
    - Polynomial Regression Lasso
    - Polynomial Regression Ridge 
    - Polynomial Regression Elastic Net
- Decision Tree Regressor
- Random Forest Regressor

**Performance metrics:** R2, MSE, RMSE, MAE and MAPE
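For reference, the five metrics can be computed by hand. The sketch below assumes plain NumPy arrays and is meant to mirror what the `sklearn.metrics` functions used in this notebook return. One detail matters when reading the results: scikit-learn's MAPE is returned as a fraction, not a percentage, so the MAPE values of about 8.6 in the tables correspond to roughly 860%. Values that large usually mean some true targets are close to zero (an assumption here, since the target's distribution is never shown).

```python
import numpy as np
from sklearn import metrics as mt


def manual_metrics(y_true, y_pred):
    """Compute R2, MSE, RMSE, MAE and MAPE by hand with NumPy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    eps = np.finfo(np.float64).eps                   # guard against division by zero, as scikit-learn does
    return {
        "R2": 1 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(residuals)),
        # MAPE as a fraction: 0.25 means 25%
        "MAPE": np.mean(np.abs(residuals) / np.maximum(np.abs(y_true), eps)),
    }


y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
ours = manual_metrics(y_true, y_pred)
print(ours)  # R2 = 0.948, MSE = 6.5, MAE = 2.5, MAPE = 0.11875
```

Comparing `ours` against `mt.r2_score`, `mt.mean_squared_error`, `mt.mean_absolute_error` and `mt.mean_absolute_percentage_error` on the same inputs should give matching numbers.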

## 0.0 Imports:

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics as mt
from sklearn import datasets as dt
from sklearn import linear_model as lm
from sklearn import preprocessing as pp
from sklearn import tree as tr
from sklearn import model_selection as ms
from sklearn.ensemble import RandomForestRegressor

## 1.0 Load datasets:

### 1.1 Regression:

In [2]:
#Training data:
X_train = pd.read_csv("regressao/X_training.csv")
y_train = pd.read_csv("regressao/y_training.csv")

#Test data:
X_test = pd.read_csv("regressao/X_test.csv")
y_test = pd.read_csv("regressao/y_test.csv")

#Validation data:
X_val = pd.read_csv("regressao/X_validation.csv")
y_val = pd.read_csv("regressao/y_val.csv")

## 2.0 Data analysis:

### 2.1 Regression:

In [5]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10547 entries, 0 to 10546
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   song_duration_ms  10547 non-null  float64
 1   acousticness      10547 non-null  float64
 2   danceability      10547 non-null  float64
 3   energy            10547 non-null  float64
 4   instrumentalness  10547 non-null  float64
 5   key               10547 non-null  float64
 6   liveness          10547 non-null  float64
 7   loudness          10547 non-null  float64
 8   audio_mode        10547 non-null  int64  
 9   speechiness       10547 non-null  float64
 10  tempo             10547 non-null  float64
 11  time_signature    10547 non-null  float64
 12  audio_valence     10547 non-null  float64
dtypes: float64(12), int64(1)
memory usage: 1.0 MB


In [6]:
X_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3767 entries, 0 to 3766
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   song_duration_ms  3767 non-null   float64
 1   acousticness      3767 non-null   float64
 2   danceability      3767 non-null   float64
 3   energy            3767 non-null   float64
 4   instrumentalness  3767 non-null   float64
 5   key               3767 non-null   float64
 6   liveness          3767 non-null   float64
 7   loudness          3767 non-null   float64
 8   audio_mode        3767 non-null   int64  
 9   speechiness       3767 non-null   float64
 10  tempo             3767 non-null   float64
 11  time_signature    3767 non-null   float64
 12  audio_valence     3767 non-null   float64
dtypes: float64(12), int64(1)
memory usage: 382.7 KB


In [7]:
X_val.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4521 entries, 0 to 4520
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   song_duration_ms  4521 non-null   float64
 1   acousticness      4521 non-null   float64
 2   danceability      4521 non-null   float64
 3   energy            4521 non-null   float64
 4   instrumentalness  4521 non-null   float64
 5   key               4521 non-null   float64
 6   liveness          4521 non-null   float64
 7   loudness          4521 non-null   float64
 8   audio_mode        4521 non-null   int64  
 9   speechiness       4521 non-null   float64
 10  tempo             4521 non-null   float64
 11  time_signature    4521 non-null   float64
 12  audio_valence     4521 non-null   float64
dtypes: float64(12), int64(1)
memory usage: 459.3 KB


Conclusion:
- There are no null values in any of the three sets;
- All columns are numeric (12 `float64` and 1 `int64`);
- There is no "id" column;
- No data cleaning or preprocessing was performed.
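The checks behind this conclusion can be bundled into a small helper. This is a minimal sketch: `check_split` is a hypothetical name, the toy frame only reuses three of the real column names, and the target column name `target` is an assumption because the notebook never prints the columns of `y_train`.

```python
import pandas as pd


def check_split(X: pd.DataFrame, y: pd.DataFrame) -> dict:
    """Repeat the checks from the conclusion above for one (X, y) split."""
    return {
        # total count of missing values in features and target
        "n_nulls": int(X.isnull().sum().sum() + y.isnull().sum().sum()),
        # True only if every feature column has a numeric dtype
        "all_numeric": all(pd.api.types.is_numeric_dtype(t) for t in X.dtypes),
        # flags an identifier column that should not be used as a feature
        "has_id_column": any(str(c).lower() == "id" for c in X.columns),
        # features and target must have the same number of rows
        "rows_match": len(X) == len(y),
    }


# Hypothetical toy split reusing three of the notebook's column names
X_toy = pd.DataFrame({
    "song_duration_ms": [210000.0, 185000.0, 240000.0],
    "audio_mode": [1, 0, 1],
    "tempo": [120.0, 98.5, 133.2],
})
y_toy = pd.DataFrame({"target": [55.0, 43.0, 71.0]})  # hypothetical target column name
print(check_split(X_toy, y_toy))
```

On the real data it would be called as `check_split(X_train, y_train)`, and likewise for the validation and test sets.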

## 3.0 Regression models:

### 3.1.0 Linear Regression:

#### Function:

In [12]:
def lr_performance(X_train, y_train, X, y, tipo_dados):

    #define and train:
    model_lr = lm.LinearRegression()
    model_lr.fit(X_train, y_train)

    #predict:
    y_pred = model_lr.predict(X)

    #R2 
    r2 = np.round(mt.r2_score(y, y_pred), 3)

    #MSE
    mse = np.round(mt.mean_squared_error(y, y_pred), 3)

    #RMSE
    rmse = np.round(np.sqrt(mse), 2)

    #MAE
    mae = np.round(mt.mean_absolute_error(y, y_pred), 2)

    #MAPE
    mape= np.round(mt.mean_absolute_percentage_error(y, y_pred), 3)

    performance = [{'algoritmo': 'Linear Regression' , 'dados': tipo_dados, 'R2': r2, 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(performance)
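The performance functions in this section differ only in the estimator they build, and each call retrains the model from scratch. A single generic helper could fit once and score every dataset. The sketch below is one possible refactoring, not the notebook's code: `evaluate` is a hypothetical name, and synthetic data stands in for `X_train`/`y_train`.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model as lm
from sklearn import metrics as mt


def evaluate(model, name, X_train, y_train, datasets):
    """Fit `model` once on the training data, then score it on each (X, y) in `datasets`."""
    model.fit(X_train, np.ravel(y_train))  # ravel: a one-column y becomes 1-D
    rows = []
    for tipo_dados, (X, y) in datasets.items():
        y_pred = model.predict(X)
        mse = mt.mean_squared_error(y, y_pred)
        rows.append({
            "algoritmo": name,
            "dados": tipo_dados,
            "R2": round(mt.r2_score(y, y_pred), 3),
            "MSE": round(mse, 3),
            "RMSE": round(float(np.sqrt(mse)), 2),  # computed from the unrounded MSE
            "MAE": round(mt.mean_absolute_error(y, y_pred), 2),
            "MAPE": round(mt.mean_absolute_percentage_error(y, y_pred), 3),
        })
    return pd.DataFrame(rows)


# Synthetic stand-in data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_tr, y_tr = X[:120], y[:120]

table = evaluate(
    lm.LinearRegression(), "Linear Regression", X_tr, y_tr,
    {"treino": (X_tr, y_tr), "validação": (X[120:160], y[120:160]), "teste": (X[160:], y[160:])},
)
print(table)
```

Fitting once means every row of the table describes the same fitted model. For deterministic estimators such as `LinearRegression` this gives the same numbers as retraining, but it matters for randomized models like a random forest.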

In [13]:
tabela_lr_treino = lr_performance(X_train, y_train, X_train, y_train, "treino")
tabela_lr_val = lr_performance(X_train, y_train, X_val, y_val, "validação")
tabela_lr_test = lr_performance(X_train, y_train, X_test, y_test, "teste")

#### Linear Regression performance:

In [14]:
perf_lr = pd.concat([tabela_lr_treino, tabela_lr_val, tabela_lr_test])
perf_lr

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Linear Regression | treino | 0.046 | 455.996 | 21.35 | 17.0 | 8.653 |
| 0 | Linear Regression | validação | 0.04 | 458.447 | 21.41 | 17.04 | 8.683 |
| 0 | Linear Regression | teste | 0.052 | 461.428 | 21.48 | 17.13 | 8.522 |


#### 3.1.1 Linear Regression - Lasso:

##### Function:

In [15]:
def lr_lasso(X_train, y_train, X, y, tipo_dados):

    #define and train:
    model_lasso = lm.Lasso(alpha = 1)
    model_lasso.fit(X_train, y_train)

    #predict:
    y_pred = model_lasso.predict(X)

    #R2 
    r2 = np.round(mt.r2_score(y, y_pred), 3)

    #MSE
    mse = np.round(mt.mean_squared_error(y, y_pred), 3)

    #RMSE
    rmse = np.round(np.sqrt(mse), 2)

    #MAE
    mae = np.round(mt.mean_absolute_error(y, y_pred), 2)

    #MAPE
    mape= np.round(mt.mean_absolute_percentage_error(y, y_pred), 3)

    performance = [{'algoritmo': 'Linear Regression - Lasso' , 'dados': tipo_dados, 'R2': r2, 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(performance)

In [16]:
tabela_lasso_treino = lr_lasso(X_train, y_train, X_train, y_train, "treino")
tabela_lasso_val = lr_lasso(X_train, y_train, X_val, y_val, "validação")
tabela_lasso_test = lr_lasso(X_train, y_train, X_test, y_test, "teste")

##### Linear Regression - Lasso performance:

In [17]:
perf_lasso = pd.concat([tabela_lasso_treino, tabela_lasso_val, tabela_lasso_test])
perf_lasso

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Linear Regression - Lasso | treino | 0.007 | 474.475 | 21.78 | 17.31 | 8.737 |
| 0 | Linear Regression - Lasso | validação | 0.008 | 473.747 | 21.77 | 17.26 | 8.696 |
| 0 | Linear Regression - Lasso | teste | 0.008 | 483.178 | 21.98 | 17.47 | 8.753 |


#### 3.1.2 Linear Regression - Ridge:

##### Function:

In [18]:
def lr_ridge(X_train, y_train, X, y, tipo_dados):

    #define and train:
    model_ridge = lm.Ridge(alpha = 1)
    model_ridge.fit(X_train, y_train)

    #predict:
    y_pred = model_ridge.predict(X)

    #R2 
    r2 = np.round(mt.r2_score(y, y_pred), 3)

    #MSE
    mse = np.round(mt.mean_squared_error(y, y_pred), 3)

    #RMSE
    rmse = np.round(np.sqrt(mse), 2)

    #MAE
    mae = np.round(mt.mean_absolute_error(y, y_pred), 2)

    #MAPE
    mape= np.round(mt.mean_absolute_percentage_error(y, y_pred), 3)

    performance = [{'algoritmo': 'Linear Regression - Ridge' , 'dados': tipo_dados, 'R2': r2, 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(performance)

In [19]:
tabela_ridge_treino = lr_ridge(X_train, y_train, X_train, y_train, "treino")
tabela_ridge_val = lr_ridge(X_train, y_train, X_val, y_val, "validação")
tabela_ridge_test = lr_ridge(X_train, y_train, X_test, y_test, "teste")

##### Linear Regression - Ridge performance:

In [20]:
perf_ridge = pd.concat([tabela_ridge_treino, tabela_ridge_val, tabela_ridge_test])
perf_ridge

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Linear Regression - Ridge | treino | 0.046 | 455.996 | 21.35 | 17.0 | 8.653 |
| 0 | Linear Regression - Ridge | validação | 0.04 | 458.445 | 21.41 | 17.04 | 8.682 |
| 0 | Linear Regression - Ridge | teste | 0.052 | 461.431 | 21.48 | 17.13 | 8.523 |


#### 3.1.3 Linear Regression - Elastic Net:

##### Function:

In [21]:
def lr_elasticnet(X_train, y_train, X, y, tipo_dados):

    #define and train:
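    #l1_ratio=1 keeps only the L1 penalty, so this is equivalent to Lasso(alpha=0.1); max_iter=50 is low and may stop before convergence
    model_elasticnet = lm.ElasticNet(alpha = 0.1, l1_ratio=1, max_iter = 50)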
    model_elasticnet.fit(X_train, y_train)

    #predict:
    y_pred = model_elasticnet.predict(X)

    #R2 
    r2 = np.round(mt.r2_score(y, y_pred), 3)

    #MSE
    mse = np.round(mt.mean_squared_error(y, y_pred), 3)

    #RMSE
    rmse = np.round(np.sqrt(mse), 2)

    #MAE
    mae = np.round(mt.mean_absolute_error(y, y_pred), 2)

    #MAPE
    mape= np.round(mt.mean_absolute_percentage_error(y, y_pred), 3)

    performance = [{'algoritmo': 'Linear Regression - Elastic Net' , 'dados': tipo_dados, 'R2': r2, 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(performance)

In [22]:
tabela_en_treino = lr_elasticnet(X_train, y_train, X_train, y_train, "treino")
tabela_en_val = lr_elasticnet(X_train, y_train, X_val, y_val, "validação")
tabela_en_test = lr_elasticnet(X_train, y_train, X_test, y_test, "teste")

##### Linear Regression - Elastic Net performance:

In [23]:
perf_en = pd.concat([tabela_en_treino, tabela_en_val, tabela_en_test])
perf_en

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Linear Regression - Elastic Net | treino | 0.041 | 458.309 | 21.41 | 17.05 | 8.668 |
| 0 | Linear Regression - Elastic Net | validação | 0.037 | 459.75 | 21.44 | 17.05 | 8.687 |
| 0 | Linear Regression - Elastic Net | teste | 0.045 | 465.123 | 21.57 | 17.18 | 8.593 |
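In scikit-learn's parameterization, `ElasticNet` with `l1_ratio=1` has no L2 term left, so `lr_elasticnet` is in effect a Lasso with `alpha=0.1`, a weaker penalty than the `alpha=1` Lasso of 3.1.1. That is consistent with its scores landing between plain linear regression and that Lasso. The sketch below checks the equivalence on synthetic data, which is an assumption standing in for the real features.

```python
import numpy as np
from sklearn import linear_model as lm

# Synthetic data with two irrelevant features (an assumption for illustration)
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = X @ np.array([3.0, 0.0, -1.0, 0.0, 2.0]) + rng.normal(size=300)

# With l1_ratio=1.0 the Elastic Net objective reduces to the Lasso objective
en = lm.ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = lm.Lasso(alpha=0.1).fit(X, y)

print(np.round(en.coef_, 4))
print(np.round(lasso.coef_, 4))
```

To actually mix L1 and L2 penalties, `l1_ratio` has to be strictly between 0 and 1.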


#### 3.1.4 Linear Regression performance summary:

In [24]:
performance_lr = pd.concat([perf_lr, perf_lasso, perf_ridge, perf_en])
performance_lr

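| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Linear Regression | treino | 0.046 | 455.996 | 21.35 | 17.0 | 8.653 |
| 0 | Linear Regression | validação | 0.04 | 458.447 | 21.41 | 17.04 | 8.683 |
| 0 | Linear Regression | teste | 0.052 | 461.428 | 21.48 | 17.13 | 8.522 |
| 0 | Linear Regression - Lasso | treino | 0.007 | 474.475 | 21.78 | 17.31 | 8.737 |
| 0 | Linear Regression - Lasso | validação | 0.008 | 473.747 | 21.77 | 17.26 | 8.696 |
| 0 | Linear Regression - Lasso | teste | 0.008 | 483.178 | 21.98 | 17.47 | 8.753 |
| 0 | Linear Regression - Ridge | treino | 0.046 | 455.996 | 21.35 | 17.0 | 8.653 |
| 0 | Linear Regression - Ridge | validação | 0.04 | 458.445 | 21.41 | 17.04 | 8.682 |
| 0 | Linear Regression - Ridge | teste | 0.052 | 461.431 | 21.48 | 17.13 | 8.523 |
| 0 | Linear Regression - Elastic Net | treino | 0.041 | 458.309 | 21.41 | 17.05 | 8.668 |
| 0 | Linear Regression - Elastic Net | validação | 0.037 | 459.75 | 21.44 | 17.05 | 8.687 |
| 0 | Linear Regression - Elastic Net | teste | 0.045 | 465.123 | 21.57 | 17.18 | 8.593 |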


### 3.2.0 Polynomial Regression:

#### Finding the best value for the degree parameter:

In [25]:
degree = np.arange(1, 4)

r2_list = []
mse_list = []
rmse_list = []
mae_list = []
mape_list = []

In [26]:
for i in degree:
    # Define polynomial model
    poly = pp.PolynomialFeatures(degree=i)
    X_poly_train = poly.fit_transform(X_train)

    # Train and Fit Model
    poly_reg = lm.LinearRegression()
    poly_reg.fit(X_poly_train, y_train)

    # Predict
    y_pred = poly_reg.predict(X_poly_train)

    # Performance Metrics
    r2 = mt.r2_score(y_train, y_pred)
    mse = mt.mean_squared_error(y_train, y_pred)
    rmse = np.sqrt(mse)
    mae = mt.mean_absolute_error(y_train, y_pred)
    mape = mt.mean_absolute_percentage_error(y_train, y_pred)

    r2_list.append(r2)
    mse_list.append(mse)
    rmse_list.append(rmse)
    mae_list.append(mae)
    mape_list.append(mape)

    print(f"Degree: {i}, R2: {r2}, MSE: {mse}, RMSE: {rmse}, MAE: {mae}, MAPE: {mape}" )

Degree: 1, R2: 0.04605830473391903, MSE: 455.99611182562677, RMSE: 21.35406546364478, MAE: 16.9982490660112, MAPE: 8.65318594380437
Degree: 2, R2: 0.09419491057528084, MSE: 432.9862096386579, RMSE: 20.80832068280999, MAE: 16.458031755824443, MAPE: 8.35053982092811
Degree: 3, R2: 0.154417721408185, MSE: 404.1989496632411, RMSE: 20.10469969094891, MAE: 15.883591754209826, MAPE: 7.800181445552562


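Conclusion: degree 2 was chosen. Note that the loop above scores each degree on the training data only, where adding higher-degree terms never increases the least-squares training error. Degree 3 winning there (R2 0.154 vs 0.094) therefore does not by itself show that it generalizes better; comparing the degrees on the validation set is the fairer test.

#### Function: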
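One way to choose the degree on held-out data is to fit on the training set and compare validation MSE. This is a minimal sketch under stated assumptions: `select_degree` is a hypothetical helper, it uses a scikit-learn `Pipeline` instead of the notebook's manual transform, and the data are synthetic with a known quadratic signal.

```python
import numpy as np
from sklearn import linear_model as lm
from sklearn import metrics as mt
from sklearn import preprocessing as pp
from sklearn.pipeline import make_pipeline


def select_degree(X_train, y_train, X_val, y_val, degrees=(1, 2, 3)):
    """Return the validation MSE for each degree and the degree with the lowest value."""
    scores = {}
    for d in degrees:
        # the pipeline fits PolynomialFeatures on the training data only
        model = make_pipeline(pp.PolynomialFeatures(degree=d), lm.LinearRegression())
        model.fit(X_train, y_train)
        scores[d] = mt.mean_squared_error(y_val, model.predict(X_val))
    best = min(scores, key=scores.get)
    return scores, best


# Synthetic data with a quadratic signal (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
y = 1 + X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=400)

scores, best = select_degree(X[:300], y[:300], X[300:], y[300:])
print(scores, best)
```

With the notebook's data the call would be `select_degree(X_train, y_train, X_val, y_val)`, keeping the test set untouched until the final comparison.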

In [27]:
def poly_performance(X_train, y_train, X, y, n, tipo_dados):

    #define
    poly = pp.PolynomialFeatures(degree=n)
    X_poly_train = poly.fit_transform(X_train)
    X_poly = poly.transform(X)  # reuse the transformer fitted on X_train

    # Train and Fit Model
    poly_reg = lm.LinearRegression()
    poly_reg.fit(X_poly_train, y_train)

    # Predict
    y_pred = poly_reg.predict(X_poly)

    # Performance Metrics
    r2 = mt.r2_score(y, y_pred)
    mse = mt.mean_squared_error(y, y_pred)
    rmse = np.sqrt(mse)
    mae = mt.mean_absolute_error(y, y_pred)
    mape = mt.mean_absolute_percentage_error(y, y_pred)

    perf_poly = [{'algoritmo': 'Polinomial Regression' , 'dados': tipo_dados, 'R2': r2 , 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(perf_poly)
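With 13 input columns, `PolynomialFeatures(degree=2)` produces 105 columns: 1 bias term, 13 linear terms, 13 squares and 78 pairwise products, which equals C(13 + 2, 2). The transformer should be fitted on the training data and only applied with `transform` to validation and test data. For `PolynomialFeatures` refitting happens to give the same output, but the habit matters for transformers that learn statistics, such as scalers. The arrays below are random stand-ins for `X_train` and `X_val`/`X_test` (an assumption).

```python
from math import comb

import numpy as np
from sklearn import preprocessing as pp

n_features = 13  # number of columns in X_train
rng = np.random.default_rng(0)
X_train_demo = rng.normal(size=(5, n_features))  # hypothetical stand-in for X_train
X_eval_demo = rng.normal(size=(3, n_features))   # hypothetical stand-in for X_val / X_test

poly = pp.PolynomialFeatures(degree=2)
X_poly_train = poly.fit_transform(X_train_demo)  # fit once, on training data only
X_poly_eval = poly.transform(X_eval_demo)        # reuse the fitted transformer

print(X_poly_train.shape, X_poly_eval.shape)  # (5, 105) (3, 105)
print(comb(n_features + 2, 2))                # 105
```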

- Training data:

In [28]:
tabela_poly_treino = poly_performance(X_train, y_train, X_train, y_train, 2, 'treino')
tabela_poly_treino

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Polinomial Regression | treino | 0.094195 | 432.98621 | 20.808321 | 16.458032 | 8.35054 |


- Validation data:

In [29]:
tabela_poly_val = poly_performance(X_train, y_train, X_val, y_val, 2, 'validação')
tabela_poly_val

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Polinomial Regression | validação | 0.066477 | 445.768223 | 21.113224 | 16.749939 | 8.547931 |


- Test data:

In [30]:
tabela_poly_test = poly_performance(X_train, y_train, X_test, y_test, 2, 'teste')
tabela_poly_test

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Polinomial Regression | teste | 0.090079 | 443.041256 | 21.048545 | 16.720535 | 8.242464 |


#### Degree-2 Polynomial Regression performance:

In [31]:
perf_pr = pd.concat([tabela_poly_treino, tabela_poly_val, tabela_poly_test])
perf_pr

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Polinomial Regression | treino | 0.094195 | 432.98621 | 20.808321 | 16.458032 | 8.35054 |
| 0 | Polinomial Regression | validação | 0.066477 | 445.768223 | 21.113224 | 16.749939 | 8.547931 |
| 0 | Polinomial Regression | teste | 0.090079 | 443.041256 | 21.048545 | 16.720535 | 8.242464 |


#### 3.2.1 Polynomial Regression - Lasso:

In [32]:
#Defining the model function:

def poly_lasso_performance(X_train, y_train, X, y, tipo_dados):

    #define
    poly = pp.PolynomialFeatures(degree=2)
    X_poly_train = poly.fit_transform(X_train)
    X_poly = poly.transform(X)  # reuse the transformer fitted on X_train
    
    # Train and Fit Model
    poly_lasso = lm.Lasso(alpha = 19, max_iter=1000)
    poly_lasso.fit(X_poly_train, y_train)

    # Predict
    y_pred = poly_lasso.predict(X_poly)

    # Performance Metrics
    r2 = np.round(mt.r2_score(y, y_pred),3)
    mse = np.round(mt.mean_squared_error(y, y_pred),3)
    rmse = np.round(np.sqrt(mse),3)
    mae = np.round(mt.mean_absolute_error(y, y_pred),3)
    mape = np.round(mt.mean_absolute_percentage_error(y, y_pred),3)

    perf_poly = [{'algoritmo': 'Polinomial Regression Lasso' , 'dados': tipo_dados, 'R2': r2 , 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(perf_poly)

In [33]:
#Predicting on the training, validation and test data:

tabela_plasso_treino = poly_lasso_performance(X_train, y_train, X_train, y_train, "treino")
tabela_plasso_val = poly_lasso_performance(X_train, y_train, X_val, y_val, "validação")
tabela_plasso_test = poly_lasso_performance(X_train, y_train, X_test, y_test, "teste")

In [34]:
#Performance of the Polynomial Regression - Lasso model:

perf_plasso = pd.concat([tabela_plasso_treino, tabela_plasso_val, tabela_plasso_test])
perf_plasso

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Polinomial Regression Lasso | treino | 0.0 | 478.013 | 21.864 | 17.365 | 8.742 |
| 0 | Polinomial Regression Lasso | validação | -0.0 | 477.512 | 21.852 | 17.353 | 8.679 |
| 0 | Polinomial Regression Lasso | teste | -0.0 | 486.961 | 22.067 | 17.551 | 8.715 |
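A training R2 of 0.0 is what one would expect if `alpha=19` shrinks every coefficient to zero, leaving a model that always predicts the training mean. Whether that happened here is an assumption, since the notebook never prints `poly_lasso.coef_`. For a Lasso with an intercept, the threshold can be computed directly: all coefficients are zero once `alpha` reaches `max_j |x_j . y| / n_samples`, with `X` and `y` centered. The sketch checks this on synthetic data.

```python
import numpy as np
from sklearn import linear_model as lm
from sklearn import metrics as mt

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(size=200)

# Smallest alpha at which every Lasso coefficient is zero (model with intercept)
Xc = X - X.mean(axis=0)
yc = y - y.mean()
alpha_max = np.max(np.abs(Xc.T @ yc)) / X.shape[0]

above = lm.Lasso(alpha=1.01 * alpha_max).fit(X, y)   # just above the threshold
below = lm.Lasso(alpha=0.5 * alpha_max).fit(X, y)    # below it, some coefficients survive
r2_train = mt.r2_score(y, above.predict(X))          # constant prediction, so R2 is 0

print(alpha_max, above.coef_, r2_train)
print(below.coef_)
```

Computing `alpha_max` on the real `X_poly_train` would show whether `alpha=19` lies above it. Because the features are unscaled, that threshold depends heavily on the scale of columns such as `song_duration_ms`.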


#### 3.2.2 Polynomial Regression - Ridge:

In [35]:
#Defining the model function:

def poly_ridge_performance(X_train, y_train, X, y, tipo_dados):

    #define
    poly = pp.PolynomialFeatures(degree=2)
    X_poly_train = poly.fit_transform(X_train)
    X_poly = poly.transform(X)  # reuse the transformer fitted on X_train
    
    # Train and Fit Model
    poly_ridge = lm.Ridge()
    poly_ridge.fit(X_poly_train, y_train)

    # Predict
    y_pred = poly_ridge.predict(X_poly)

    # Performance Metrics
    r2 = np.round(mt.r2_score(y, y_pred),3)
    mse = np.round(mt.mean_squared_error(y, y_pred),3)
    rmse = np.round(np.sqrt(mse),3)
    mae = np.round(mt.mean_absolute_error(y, y_pred),3)
    mape = np.round(mt.mean_absolute_percentage_error(y, y_pred),3)

    perf_poly = [{'algoritmo': 'Polinomial Regression Ridge' , 'dados': tipo_dados, 'R2': r2 , 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(perf_poly)

In [36]:
#Predicting on the training, validation and test data:

tabela_pridge_treino = poly_ridge_performance(X_train, y_train, X_train, y_train, "treino")
tabela_pridge_val = poly_ridge_performance(X_train, y_train, X_val, y_val, "validação")
tabela_pridge_test = poly_ridge_performance(X_train, y_train, X_test, y_test, "teste")

In [37]:
#Performance of the Polynomial Regression - Ridge model:

perf_pridge = pd.concat([tabela_pridge_treino, tabela_pridge_val, tabela_pridge_test])
perf_pridge

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Polinomial Regression Ridge | treino | 0.093 | 433.475 | 20.82 | 16.472 | 8.373 |
| 0 | Polinomial Regression Ridge | validação | 0.068 | 445.184 | 21.099 | 16.739 | 8.569 |
| 0 | Polinomial Regression Ridge | teste | 0.089 | 443.485 | 21.059 | 16.729 | 8.289 |


#### 3.2.3 Polynomial Regression - Elastic Net:

In [38]:
#Defining the model function:

def poly_en_performance(X_train, y_train, X, y, tipo_dados):

    #define
    poly = pp.PolynomialFeatures(degree=2)
    X_poly_train = poly.fit_transform(X_train)
    X_poly = poly.transform(X)  # reuse the transformer fitted on X_train
    
    # Train and Fit Model
    poly_en = lm.ElasticNet()
    poly_en.fit(X_poly_train, y_train)

    # Predict
    y_pred = poly_en.predict(X_poly)

    # Performance Metrics
    r2 = np.round(mt.r2_score(y, y_pred),3)
    mse = np.round(mt.mean_squared_error(y, y_pred),3)
    rmse = np.round(np.sqrt(mse),3)
    mae = np.round(mt.mean_absolute_error(y, y_pred),3)
    mape = np.round(mt.mean_absolute_percentage_error(y, y_pred),3)

    perf_poly = [{'algoritmo': 'Polinomial Regression Elastic Net' , 'dados': tipo_dados, 'R2': r2 , 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(perf_poly)

In [39]:
#Predicting on the training, validation and test data:

tabela_pen_treino = poly_en_performance(X_train, y_train, X_train, y_train, "treino")
tabela_pen_val = poly_en_performance(X_train, y_train, X_val, y_val, "validação")
tabela_pen_test = poly_en_performance(X_train, y_train, X_test, y_test, "teste")

In [40]:
#Performance of the Polynomial Regression - Elastic Net model:

perf_pen = pd.concat([tabela_pen_treino, tabela_pen_val, tabela_pen_test])
perf_pen

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Polinomial Regression Elastic Net | treino | 0.013 | 471.878 | 21.723 | 17.244 | 8.679 |
| 0 | Polinomial Regression Elastic Net | validação | 0.013 | 471.408 | 21.712 | 17.2 | 8.675 |
| 0 | Polinomial Regression Elastic Net | teste | 0.011 | 481.695 | 21.948 | 17.426 | 8.751 |


#### 3.2.4 Polynomial Regression performance summary:

In [41]:
performance_reg_poly = pd.concat([perf_pr, perf_plasso, perf_pridge, perf_pen])
performance_reg_poly

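| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Polinomial Regression | treino | 0.094195 | 432.98621 | 20.808321 | 16.458032 | 8.35054 |
| 0 | Polinomial Regression | validação | 0.066477 | 445.768223 | 21.113224 | 16.749939 | 8.547931 |
| 0 | Polinomial Regression | teste | 0.090079 | 443.041256 | 21.048545 | 16.720535 | 8.242464 |
| 0 | Polinomial Regression Lasso | treino | 0.0 | 478.013 | 21.864 | 17.365 | 8.742 |
| 0 | Polinomial Regression Lasso | validação | -0.0 | 477.512 | 21.852 | 17.353 | 8.679 |
| 0 | Polinomial Regression Lasso | teste | -0.0 | 486.961 | 22.067 | 17.551 | 8.715 |
| 0 | Polinomial Regression Ridge | treino | 0.093 | 433.475 | 20.82 | 16.472 | 8.373 |
| 0 | Polinomial Regression Ridge | validação | 0.068 | 445.184 | 21.099 | 16.739 | 8.569 |
| 0 | Polinomial Regression Ridge | teste | 0.089 | 443.485 | 21.059 | 16.729 | 8.289 |
| 0 | Polinomial Regression Elastic Net | treino | 0.013 | 471.878 | 21.723 | 17.244 | 8.679 |
| 0 | Polinomial Regression Elastic Net | validação | 0.013 | 471.408 | 21.712 | 17.2 | 8.675 |
| 0 | Polinomial Regression Elastic Net | teste | 0.011 | 481.695 | 21.948 | 17.426 | 8.751 |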


### 3.3.0 Decision Tree Regressor:

In [42]:
#Defining the model function:

def tree_perf(X_train, y_train, X, y, tipo_dados):

    #define and train:
    model = tr.DecisionTreeRegressor(random_state=0)
    model.fit(X_train, y_train)

    #predict:
    y_pred = model.predict(X)

    #R2 
    r2 = np.round(mt.r2_score(y, y_pred), 3)

    #MSE
    mse = np.round(mt.mean_squared_error(y, y_pred), 3)

    #RMSE
    rmse = np.round(np.sqrt(mse), 2)

    #MAE
    mae = np.round(mt.mean_absolute_error(y, y_pred), 2)

    #MAPE
    mape= np.round(mt.mean_absolute_percentage_error(y, y_pred), 3)

    performance = [{'algoritmo': 'Decision Tree Regressor' , 'dados': tipo_dados, 'R2': r2, 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(performance)

In [43]:
#Predicting on the training, validation and test data:

tabela_tree_treino = tree_perf(X_train, y_train, X_train, y_train, "treino")
tabela_tree_val = tree_perf(X_train, y_train, X_val, y_val, "validação")
tabela_tree_test = tree_perf(X_train, y_train, X_test, y_test, "teste")

In [44]:
#Performance of the Decision Tree Regressor model:

perf_tree = pd.concat([tabela_tree_treino, tabela_tree_val, tabela_tree_test])
perf_tree 

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Decision Tree Regressor | treino | 0.992 | 3.94 | 1.98 | 0.21 | 0.083 |
| 0 | Decision Tree Regressor | validação | -0.292 | 617.164 | 24.84 | 17.06 | 7.304 |
| 0 | Decision Tree Regressor | teste | -0.244 | 605.65 | 24.61 | 16.96 | 6.31 |
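`tree_perf` uses `DecisionTreeRegressor` with its default `max_depth=None`, so the tree keeps splitting until its leaves are (nearly) pure. That is consistent with a training R2 of 0.992 next to negative validation and test R2. A common way to probe this is to limit the depth and compare training and validation scores. The sketch below does so on synthetic data; the depths tried and the data are assumptions, not values from the notebook.

```python
import numpy as np
from sklearn import metrics as mt
from sklearn import tree as tr

# Synthetic noisy data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(600, 3))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(scale=0.5, size=600)
X_tr, y_tr, X_val, y_val = X[:400], y[:400], X[400:], y[400:]

results = {}
for depth in [2, 4, 6, 8, None]:  # None grows the tree fully, as in tree_perf
    model = tr.DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    results[depth] = (mt.r2_score(y_tr, model.predict(X_tr)),
                      mt.r2_score(y_val, model.predict(X_val)))

for depth, (r2_tr, r2_val) in results.items():
    print(f"max_depth={depth}: R2 treino={r2_tr:.3f}, R2 validação={r2_val:.3f}")
```

The fully grown tree memorizes the training noise (training R2 of 1.0 here), while a shallower tree usually trades some training fit for a better validation score.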


### 3.4.0 Random Forest Regressor:

In [45]:
#Defining the model function:

def randomforest_perf(X_train, y_train, X, y, tipo_dados):

    #define and train:
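    model = RandomForestRegressor()
    #np.ravel turns the one-column y DataFrame into the 1-D array scikit-learn expects (avoids a DataConversionWarning):
    model.fit(X_train, np.ravel(y_train))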

    #predict:
    y_pred = model.predict(X)

    #R2 
    r2 = np.round(mt.r2_score(y, y_pred), 3)

    #MSE
    mse = np.round(mt.mean_squared_error(y, y_pred), 3)

    #RMSE
    rmse = np.round(np.sqrt(mse), 2)

    #MAE
    mae = np.round(mt.mean_absolute_error(y, y_pred), 2)

    #MAPE
    mape= np.round(mt.mean_absolute_percentage_error(y, y_pred), 3)

    performance = [{'algoritmo': 'Random Forest Regressor' , 'dados': tipo_dados, 'R2': r2, 'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape}]
    
    return pd.DataFrame(performance)

In [47]:
#Predicting on the training, validation and test data:

tabela_rf_treino = randomforest_perf(X_train, y_train, X_train, y_train, "treino")
tabela_rf_val = randomforest_perf(X_train, y_train, X_val, y_val, "validação")
tabela_rf_test = randomforest_perf(X_train, y_train, X_test, y_test, "teste")

In [48]:
#Performance of the Random Forest Regressor model:

perf_rf = pd.concat([tabela_rf_treino, tabela_rf_val, tabela_rf_test])
perf_rf

| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Random Forest Regressor | treino | 0.903 | 46.169 | 6.79 | 4.87 | 2.576 |
| 0 | Random Forest Regressor | validação | 0.334 | 317.937 | 17.83 | 12.99 | 7.08 |
| 0 | Random Forest Regressor | teste | 0.351 | 316.12 | 17.78 | 13.03 | 6.576 |
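`randomforest_perf` builds and fits a new `RandomForestRegressor()` without a `random_state` on every call. As a result, the three rows above come from three different forests, and rerunning the notebook changes the numbers. Fixing `random_state` makes a forest reproducible, as the sketch below shows on synthetic data (an assumption). Fitting once and scoring all three sets would additionally make the rows describe one and the same model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.2, size=300)
X_new = rng.normal(size=(10, 4))

# Two forests with the same random_state are built identically
rf_a = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
rf_b = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
# A forest with a different seed will generally give different predictions
rf_c = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)

pred_a, pred_b, pred_c = rf_a.predict(X_new), rf_b.predict(X_new), rf_c.predict(X_new)
print(np.array_equal(pred_a, pred_b))
print(np.array_equal(pred_a, pred_c))
```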


## 4.0 Summary:

In [49]:
resumo = pd.concat([perf_lr, perf_lasso, perf_ridge, perf_en, 
                    perf_pr, perf_plasso, perf_pridge, perf_pen,
                    perf_tree, perf_rf])
resumo

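| | algoritmo | dados | R2 | MSE | RMSE | MAE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Linear Regression | treino | 0.046 | 455.996 | 21.35 | 17.0 | 8.653 |
| 0 | Linear Regression | validação | 0.04 | 458.447 | 21.41 | 17.04 | 8.683 |
| 0 | Linear Regression | teste | 0.052 | 461.428 | 21.48 | 17.13 | 8.522 |
| 0 | Linear Regression - Lasso | treino | 0.007 | 474.475 | 21.78 | 17.31 | 8.737 |
| 0 | Linear Regression - Lasso | validação | 0.008 | 473.747 | 21.77 | 17.26 | 8.696 |
| 0 | Linear Regression - Lasso | teste | 0.008 | 483.178 | 21.98 | 17.47 | 8.753 |
| 0 | Linear Regression - Ridge | treino | 0.046 | 455.996 | 21.35 | 17.0 | 8.653 |
| 0 | Linear Regression - Ridge | validação | 0.04 | 458.445 | 21.41 | 17.04 | 8.682 |
| 0 | Linear Regression - Ridge | teste | 0.052 | 461.431 | 21.48 | 17.13 | 8.523 |
| 0 | Linear Regression - Elastic Net | treino | 0.041 | 458.309 | 21.41 | 17.05 | 8.668 |
| 0 | Linear Regression - Elastic Net | validação | 0.037 | 459.75 | 21.44 | 17.05 | 8.687 |
| 0 | Linear Regression - Elastic Net | teste | 0.045 | 465.123 | 21.57 | 17.18 | 8.593 |
| 0 | Polinomial Regression | treino | 0.094195 | 432.98621 | 20.808321 | 16.458032 | 8.35054 |
| 0 | Polinomial Regression | validação | 0.066477 | 445.768223 | 21.113224 | 16.749939 | 8.547931 |
| 0 | Polinomial Regression | teste | 0.090079 | 443.041256 | 21.048545 | 16.720535 | 8.242464 |
| 0 | Polinomial Regression Lasso | treino | 0.0 | 478.013 | 21.864 | 17.365 | 8.742 |
| 0 | Polinomial Regression Lasso | validação | -0.0 | 477.512 | 21.852 | 17.353 | 8.679 |
| 0 | Polinomial Regression Lasso | teste | -0.0 | 486.961 | 22.067 | 17.551 | 8.715 |
| 0 | Polinomial Regression Ridge | treino | 0.093 | 433.475 | 20.82 | 16.472 | 8.373 |
| 0 | Polinomial Regression Ridge | validação | 0.068 | 445.184 | 21.099 | 16.739 | 8.569 |
| 0 | Polinomial Regression Ridge | teste | 0.089 | 443.485 | 21.059 | 16.729 | 8.289 |
| 0 | Polinomial Regression Elastic Net | treino | 0.013 | 471.878 | 21.723 | 17.244 | 8.679 |
| 0 | Polinomial Regression Elastic Net | validação | 0.013 | 471.408 | 21.712 | 17.2 | 8.675 |
| 0 | Polinomial Regression Elastic Net | teste | 0.011 | 481.695 | 21.948 | 17.426 | 8.751 |
| 0 | Decision Tree Regressor | treino | 0.992 | 3.94 | 1.98 | 0.21 | 0.083 |
| 0 | Decision Tree Regressor | validação | -0.292 | 617.164 | 24.84 | 17.06 | 7.304 |
| 0 | Decision Tree Regressor | teste | -0.244 | 605.65 | 24.61 | 16.96 | 6.31 |
| 0 | Random Forest Regressor | treino | 0.903 | 46.169 | 6.79 | 4.87 | 2.576 |
| 0 | Random Forest Regressor | validação | 0.334 | 317.937 | 17.83 | 12.99 | 7.08 |
| 0 | Random Forest Regressor | teste | 0.351 | 316.12 | 17.78 | 13.03 | 6.576 |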
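To pick a model from the summary, the validation rows can be ranked by R2. The sketch copies four validation rows from the table above into a small DataFrame. On the real `resumo` the same filter would be `resumo[resumo["dados"] == "validação"]`. Call `reset_index(drop=True)` first, though: every row keeps index 0 from its original one-row DataFrame because `pd.concat` was called without `ignore_index=True`, so `idxmax` would otherwise return an ambiguous label.

```python
import pandas as pd

# Four validation rows copied from the summary table above
resumo_val = pd.DataFrame({
    "algoritmo": ["Linear Regression", "Polinomial Regression Ridge",
                  "Decision Tree Regressor", "Random Forest Regressor"],
    "dados": ["validação"] * 4,
    "R2": [0.04, 0.068, -0.292, 0.334],
    "RMSE": [21.41, 21.099, 24.84, 17.83],
})

# Highest validation R2 wins; the full ranking is shown for context
best = resumo_val.loc[resumo_val["R2"].idxmax(), "algoritmo"]
ranking = resumo_val.sort_values("R2", ascending=False)["algoritmo"].tolist()
print(best)     # Random Forest Regressor
print(ranking)
```

Among these rows the Random Forest also has the lowest validation RMSE, although its R2 of 0.334 still leaves most of the target's variance unexplained.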
