<a href="https://colab.research.google.com/github/ladsong/if697-2020.2-data-science/blob/projeto2/Projeto_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Preparando o ambiente

In [1]:
import pandas as pd
import numpy as np
import re
import gc
import warnings
import sys

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [3]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/My Drive/Kaggle"
# /content/gdrive/My Drive/Kaggle é o caminho onde o arquivo kaggle.json está presente do Google Drive
#Mudar o diretorio
%cd /content/gdrive/My Drive/Kaggle

/content/gdrive/My Drive/Kaggle


##Carregando o dataset e processando os dados
No projeto 1, foi adicionado o dump dos dados para que possamos utiliza-los aqui. O objetivo desse projeto é predizer a popularidade de uma música através de suas caracteristicas informadas no subset.

In [49]:
df = pd.read_csv('project1_output.csv')

Observamos que o dataset é grande e assim decidimos utilizar um subset  com o objetivo de diminuir o tempo de experimentação

In [50]:
df = df[:5000]
df = df.rename(columns = lambda x:re.sub('[^A-Za-z0-9_]+', '', x))
df.dtypes

year                   int64
acousticness         float64
artists_data          object
danceability         float64
duration_ms            int64
energy               float64
explicit               int64
id                    object
instrumentalness     float64
loudness             float64
name                  object
popularity           float64
release_date          object
speechiness          float64
tempo                float64
first_artist          object
artists_by_artist     object
first_genre           object
dtype: object

Tirando colunas desnecessárias, como `artist_data, id, name, release_date, first_artist, artists_by_artist` e `first_genre,` pois só queremos trabalhar com valores numéricos




In [51]:
df = df.select_dtypes(exclude=['object'])

In [52]:
df.columns

Index(['year', 'acousticness', 'danceability', 'duration_ms', 'energy',
       'explicit', 'instrumentalness', 'loudness', 'popularity', 'speechiness',
       'tempo'],
      dtype='object')



###Escolher coluna para predição

In [53]:
target_col = df['popularity']

In [54]:
target_col

0       0.04
1       0.02
2       0.04
3       0.00
4       0.01
        ... 
4995    0.00
4996    0.00
4997    0.00
4998    0.00
4999    0.00
Name: popularity, Length: 5000, dtype: float64

In [55]:
df = df.drop(columns=['popularity'])

##Separando os dados em teste e predição
Para realizacão do treinamento, teste e vaidacão, iremos separa o subset em: 

*   3/5 dos dados para treinamento
*   1/5 dos dados para teste 
*   1/5 dos dados para validacão

Seguindo assim, uma base 60/20/20



In [56]:
def get_x_data():
    # input 
    train, val, test = np.split(df.sample(frac=1), [int(.6*len(df)), int(.8*len(df))])
    
    return train, val, test

In [57]:
def get_y_data():
    # output
    train_labels, val_labels, test_labels = (
        np.split(
            target_col, 
            [int(.6*len(target_col)), int(.8*len(target_col))])
    )
    
    return train_labels, val_labels, test_labels

##Escolher os 4 algoritmos para predição

Os algoritmos de predicão que iremos usar são:

*   Regressão Linear
*   Multilayer perceptron
*   Random forests
*   Gradient boost com lightgbm


In [38]:
!pip install mlflow --quiet

In [39]:
!pip install optuna



In [58]:
import mlflow
import mlflow.sklearn

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import mixed_precision
from tensorflow.keras.layers.experimental import preprocessing

import optuna

from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    confusion_matrix,
    classification_report,
    accuracy_score
)
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

import lightgbm
from lightgbm import LGBMRegressor

#Função de avaliação das metricas


In [59]:
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    return rmse, mae

In [60]:
mlflow.sklearn.autolog()
mlflow.tensorflow.autolog()
mlflow.lightgbm.autolog()



###Regressão Linear

In [62]:
def linear_regression(trial):
    train, test, val = get_x_data()
    train_labels, val_labels, test_labels = get_y_data()
    
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    with mlflow.start_run(run_name="Linear Regression"):
        reg = LinearRegression()
        reg.fit(train, train_labels)

        predictions = reg.predict(val)

        (rmse, mae) = eval_metrics(val_labels, predictions)

        print("Modelo de regressão linear")
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)

        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("mae", mae)
        
        
        gc.collect()
        
        return rmse

In [63]:
study = optuna.create_study()
study.optimize(linear_regression, n_trials=1)

[32m[I 2021-08-17 01:17:36,432][0m A new study created in memory with name: no-name-50d9be8c-8a38-4b20-b1eb-ae94e3bf35eb[0m


Modelo de regressão linear
  RMSE: 0.13007774724559681
  MAE: 0.08351024898801312


[32m[I 2021-08-17 01:17:36,954][0m Trial 0 finished with value: 0.13007774724559681 and parameters: {}. Best is trial 0 with value: 0.13007774724559681.[0m


###Multilayer Perceptron

In [64]:
def mlp(trial):
    train, test, val = get_x_data()
    train_labels, val_labels, test_labels = get_y_data()
    
    params = {
        "hidden_units": trial.suggest_int("hidden_units", 3, 15),
        "lr": trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        "epochs": trial.suggest_int("epochs", 10, 50)
    }
    
    warnings.filterwarnings("ignore")
    np.random.seed(40)
    
    with mlflow.start_run(run_name="MLP"):
        normalizer = preprocessing.Normalization(axis=-1)
        normalizer.adapt(np.array(train))
        
        mlp_model = tf.keras.Sequential([
            normalizer,
            layers.Dense(units=params["hidden_units"]),
            layers.Dense(units=params["hidden_units"]),
            layers.Dense(units=params["hidden_units"]),
            layers.Dense(units=1),
        ])

        mlp_model.summary()
        
        mlp_model.compile(
            optimizer=tf.optimizers.Adam(learning_rate=params["lr"]),
            loss='mean_squared_error'
        )

        history = mlp_model.fit(
            train, train_labels,
            validation_data=(test, test_labels),
            epochs=params["epochs"]
        )
        
        predictions = mlp_model.predict(val)

        (rmse, mae) = eval_metrics(val_labels, predictions)

        print("MLP model")
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)

        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("mae", mae)
        mlflow.log_params(trial.params)
        mlflow.set_tags(
            {
                "estimator_name":"MultiLayerPerceptron",
                "estimator_class":"Keras"
            }
        )
        
        tf.keras.backend.clear_session()

        gc.collect()
        
        return rmse

In [65]:

study = optuna.create_study()
study.optimize(mlp, n_trials=10)

[32m[I 2021-08-17 01:17:45,126][0m A new study created in memory with name: no-name-cbdeedb0-58d5-4d93-be34-d9399d758b4a[0m


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization_1 (Normalizati (None, 10)                21        
_________________________________________________________________
dense_4 (Dense)              (None, 14)                154       
_________________________________________________________________
dense_5 (Dense)              (None, 14)                210       
_________________________________________________________________
dense_6 (Dense)              (None, 14)                210       
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 15        
Total params: 610
Trainable params: 589
Non-trainable params: 21
_________________________________________________________________
Epoch 1/17
Epoch 2/17
Epoch 3/17
Epoch 4/17
Epoch 5/17
Epoch 6/17
Epoch 7/17
Epoch 8/17
Epoch 9/17
Epoch 10/17
Epoch 11/1

[32m[I 2021-08-17 01:17:57,155][0m Trial 0 finished with value: 0.3674490223143016 and parameters: {'hidden_units': 14, 'lr': 3.566083414277468e-05, 'epochs': 17}. Best is trial 0 with value: 0.3674490223143016.[0m


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization (Normalization (None, 10)                21        
_________________________________________________________________
dense (Dense)                (None, 3)                 33        
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 12        
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 12        
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 4         
Total params: 82
Trainable params: 61
Non-trainable params: 21
_________________________________________________________________
Epoch 1/22
Epoch 2/22
Epoch 3/22
Epoch 4/22
Epoch 5/22
Epoch 6/22
Epoch 7/22
Epoch 8/22
Epoch 9/22
Epoch 10/22
Epoch 11/22
Ep

[32m[I 2021-08-17 01:18:04,588][0m Trial 1 finished with value: 1.597210751913307 and parameters: {'hidden_units': 3, 'lr': 4.2617695824325495e-05, 'epochs': 22}. Best is trial 0 with value: 0.3674490223143016.[0m


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization (Normalization (None, 10)                21        
_________________________________________________________________
dense (Dense)                (None, 7)                 77        
_________________________________________________________________
dense_1 (Dense)              (None, 7)                 56        
_________________________________________________________________
dense_2 (Dense)              (None, 7)                 56        
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 8         
Total params: 218
Trainable params: 197
Non-trainable params: 21
_________________________________________________________________
Epoch 1/44
Epoch 2/44
Epoch 3/44
Epoch 4/44
Epoch 5/44
Epoch 6/44
Epoch 7/44
Epoch 8/44
Epoch 9/44
Epoch 10/44
Epoch 11/44


[32m[I 2021-08-17 01:18:16,605][0m Trial 2 finished with value: 0.13499030385042662 and parameters: {'hidden_units': 7, 'lr': 6.873920055800635e-05, 'epochs': 44}. Best is trial 2 with value: 0.13499030385042662.[0m


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization (Normalization (None, 10)                21        
_________________________________________________________________
dense (Dense)                (None, 13)                143       
_________________________________________________________________
dense_1 (Dense)              (None, 13)                182       
_________________________________________________________________
dense_2 (Dense)              (None, 13)                182       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 14        
Total params: 542
Trainable params: 521
Non-trainable params: 21
_________________________________________________________________
Epoch 1/37
Epoch 2/37
Epoch 3/37
Epoch 4/37
Epoch 5/37
Epoch 6/37
Epoch 7/37
Epoch 8/37
Epoch 9/37
Epoch 10/37
Epoch 11/37


[32m[I 2021-08-17 01:18:28,665][0m Trial 3 finished with value: 0.12948633635721293 and parameters: {'hidden_units': 13, 'lr': 0.0001467568330615016, 'epochs': 37}. Best is trial 3 with value: 0.12948633635721293.[0m


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization (Normalization (None, 10)                21        
_________________________________________________________________
dense (Dense)                (None, 13)                143       
_________________________________________________________________
dense_1 (Dense)              (None, 13)                182       
_________________________________________________________________
dense_2 (Dense)              (None, 13)                182       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 14        
Total params: 542
Trainable params: 521
Non-trainable params: 21
_________________________________________________________________
Epoch 1/43
Epoch 2/43
Epoch 3/43
Epoch 4/43
Epoch 5/43
Epoch 6/43
Epoch 7/43
Epoch 8/43
Epoch 9/43
Epoch 10/43
Epoch 11/43


[32m[I 2021-08-17 01:18:41,295][0m Trial 4 finished with value: 0.12994271856002898 and parameters: {'hidden_units': 13, 'lr': 9.418718530131483e-05, 'epochs': 43}. Best is trial 3 with value: 0.12948633635721293.[0m


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization (Normalization (None, 10)                21        
_________________________________________________________________
dense (Dense)                (None, 14)                154       
_________________________________________________________________
dense_1 (Dense)              (None, 14)                210       
_________________________________________________________________
dense_2 (Dense)              (None, 14)                210       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 15        
Total params: 610
Trainable params: 589
Non-trainable params: 21
_________________________________________________________________
Epoch 1/41
Epoch 2/41
Epoch 3/41
Epoch 4/41
Epoch 5/41
Epoch 6/41
Epoch 7/41
Epoch 8/41
Epoch 9/41
Epoch 10/41
Epoch 11/41


[32m[I 2021-08-17 01:18:53,538][0m Trial 5 finished with value: 0.13176706403213764 and parameters: {'hidden_units': 14, 'lr': 0.0007220704618852015, 'epochs': 41}. Best is trial 3 with value: 0.12948633635721293.[0m


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization (Normalization (None, 10)                21        
_________________________________________________________________
dense (Dense)                (None, 9)                 99        
_________________________________________________________________
dense_1 (Dense)              (None, 9)                 90        
_________________________________________________________________
dense_2 (Dense)              (None, 9)                 90        
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 10        
Total params: 310
Trainable params: 289
Non-trainable params: 21
_________________________________________________________________
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30


[32m[I 2021-08-17 01:19:03,484][0m Trial 6 finished with value: 0.3023314311213293 and parameters: {'hidden_units': 9, 'lr': 6.957766341766563e-05, 'epochs': 30}. Best is trial 3 with value: 0.12948633635721293.[0m


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization (Normalization (None, 10)                21        
_________________________________________________________________
dense (Dense)                (None, 10)                110       
_________________________________________________________________
dense_1 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_2 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 11        
Total params: 362
Trainable params: 341
Non-trainable params: 21
_________________________________________________________________
Epoch 1/37
Epoch 2/37
Epoch 3/37
Epoch 4/37
Epoch 5/37
Epoch 6/37
Epoch 7/37
Epoch 8/37
Epoch 9/37
Epoch 10/37
Epoch 11/37


[32m[I 2021-08-17 01:19:15,642][0m Trial 7 finished with value: 0.13129157540998543 and parameters: {'hidden_units': 10, 'lr': 0.0003237035794040871, 'epochs': 37}. Best is trial 3 with value: 0.12948633635721293.[0m


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization (Normalization (None, 10)                21        
_________________________________________________________________
dense (Dense)                (None, 6)                 66        
_________________________________________________________________
dense_1 (Dense)              (None, 6)                 42        
_________________________________________________________________
dense_2 (Dense)              (None, 6)                 42        
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 7         
Total params: 178
Trainable params: 157
Non-trainable params: 21
_________________________________________________________________
Epoch 1/49
Epoch 2/49
Epoch 3/49
Epoch 4/49
Epoch 5/49
Epoch 6/49
Epoch 7/49
Epoch 8/49
Epoch 9/49
Epoch 10/49
Epoch 11/49


[32m[I 2021-08-17 01:19:38,045][0m Trial 8 finished with value: 0.13025249308712858 and parameters: {'hidden_units': 6, 'lr': 0.00017408766108121688, 'epochs': 49}. Best is trial 3 with value: 0.12948633635721293.[0m


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
normalization (Normalization (None, 10)                21        
_________________________________________________________________
dense (Dense)                (None, 11)                121       
_________________________________________________________________
dense_1 (Dense)              (None, 11)                132       
_________________________________________________________________
dense_2 (Dense)              (None, 11)                132       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 12        
Total params: 418
Trainable params: 397
Non-trainable params: 21
_________________________________________________________________
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40


[32m[I 2021-08-17 01:19:49,931][0m Trial 9 finished with value: 0.12808454436221042 and parameters: {'hidden_units': 11, 'lr': 0.0006410770284508693, 'epochs': 40}. Best is trial 9 with value: 0.12808454436221042.[0m


###Random Forest

In [66]:
def random_forest(trial):
    train, test, val = get_x_data()
    train_labels, val_labels, test_labels = get_y_data()
    
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 150),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 5),
    }
    
    warnings.filterwarnings("ignore")
    np.random.seed(40)
    
    with mlflow.start_run(run_name="Random Forest"):
        rf = RandomForestRegressor(
            max_depth=params["max_depth"],
            n_estimators=params["n_estimators"],
            min_samples_split=params["min_samples_split"],
            random_state=0
        )
        rf.fit(train, train_labels)
        
        predictions = rf.predict(val)
        
        (rmse, mae) = eval_metrics(val_labels, predictions)
        
        print("Random Forest model")
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("mae", mae)
        mlflow.log_params(trial.params)
        
        gc.collect()
        
        return rmse

In [67]:
study = optuna.create_study()
study.optimize(random_forest, n_trials=10)

[32m[I 2021-08-17 01:20:18,889][0m A new study created in memory with name: no-name-b479c938-bc2a-4aa3-984d-9537ac272c97[0m


Random Forest model
  RMSE: 0.1314256592118017
  MAE: 0.08515884839388493


[32m[I 2021-08-17 01:20:21,077][0m Trial 0 finished with value: 0.1314256592118017 and parameters: {'n_estimators': 92, 'max_depth': 9, 'min_samples_split': 3}. Best is trial 0 with value: 0.1314256592118017.[0m


Random Forest model
  RMSE: 0.1308495018336663
  MAE: 0.08465131118158103


[32m[I 2021-08-17 01:20:23,357][0m Trial 1 finished with value: 0.1308495018336663 and parameters: {'n_estimators': 122, 'max_depth': 7, 'min_samples_split': 2}. Best is trial 1 with value: 0.1308495018336663.[0m


Random Forest model
  RMSE: 0.13105243094581254
  MAE: 0.08485392485558302


[32m[I 2021-08-17 01:20:24,943][0m Trial 2 finished with value: 0.13105243094581254 and parameters: {'n_estimators': 62, 'max_depth': 8, 'min_samples_split': 3}. Best is trial 1 with value: 0.1308495018336663.[0m


Random Forest model
  RMSE: 0.13146618299193122
  MAE: 0.08512712138234363


[32m[I 2021-08-17 01:20:27,167][0m Trial 3 finished with value: 0.13146618299193122 and parameters: {'n_estimators': 99, 'max_depth': 9, 'min_samples_split': 2}. Best is trial 1 with value: 0.1308495018336663.[0m


Random Forest model
  RMSE: 0.13104756281281413
  MAE: 0.08477837352381126


[32m[I 2021-08-17 01:20:29,603][0m Trial 4 finished with value: 0.13104756281281413 and parameters: {'n_estimators': 120, 'max_depth': 8, 'min_samples_split': 4}. Best is trial 1 with value: 0.1308495018336663.[0m


Random Forest model
  RMSE: 0.13117927744415908
  MAE: 0.08495719411581604


[32m[I 2021-08-17 01:20:32,256][0m Trial 5 finished with value: 0.13117927744415908 and parameters: {'n_estimators': 120, 'max_depth': 9, 'min_samples_split': 5}. Best is trial 1 with value: 0.1308495018336663.[0m


Random Forest model
  RMSE: 0.13089874770936405
  MAE: 0.08474632363199583


[32m[I 2021-08-17 01:20:33,967][0m Trial 6 finished with value: 0.13089874770936405 and parameters: {'n_estimators': 77, 'max_depth': 7, 'min_samples_split': 4}. Best is trial 1 with value: 0.1308495018336663.[0m


Random Forest model
  RMSE: 0.13113398739037418
  MAE: 0.08472899644465384


[32m[I 2021-08-17 01:20:36,730][0m Trial 7 finished with value: 0.13113398739037418 and parameters: {'n_estimators': 148, 'max_depth': 8, 'min_samples_split': 2}. Best is trial 1 with value: 0.1308495018336663.[0m


Random Forest model
  RMSE: 0.13088249166934718
  MAE: 0.08483020271396965


[32m[I 2021-08-17 01:20:38,931][0m Trial 8 finished with value: 0.13088249166934718 and parameters: {'n_estimators': 105, 'max_depth': 7, 'min_samples_split': 5}. Best is trial 1 with value: 0.1308495018336663.[0m


Random Forest model
  RMSE: 0.13104747490338675
  MAE: 0.08456780324609338


[32m[I 2021-08-17 01:20:40,261][0m Trial 9 finished with value: 0.13104747490338675 and parameters: {'n_estimators': 88, 'max_depth': 3, 'min_samples_split': 5}. Best is trial 1 with value: 0.1308495018336663.[0m


###Gradient boost com lightgbm

In [68]:
def gradient_boosting(trial):
    train, test, val = get_x_data()
    train_labels, val_labels, test_labels = get_y_data()
    
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 150),
        "num_leaves": trial.suggest_int("num_leaves", 25, 35),
        "max_depth": trial.suggest_int("max_depth", 2, 10)
    }
    
    warnings.filterwarnings("ignore")
    np.random.seed(40)
    
    with mlflow.start_run(run_name="Gradient Boosting"):
        model = XGBRegressor(
            max_depth=params["max_depth"],
            n_estimators=params["n_estimators"],
        )
        model.fit(train, train_labels)
        
        predictions = model.predict(test)
        print('Prediction: %.3f' % predictions[0])
        
        (rmse, mae) = eval_metrics(val_labels, predictions)

        print("LGBM model")
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)

        # Log mlflow attributes for mlflow UI
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("mae", mae)
        mlflow.log_params(trial.params)
        mlflow.set_tags(
            {
                "estimator_class":"LightGBM",
                "estimator_name":"Gradient Boosting"
            }
        )
        mlflow.sklearn.log_model(model, "model")
        
        gc.collect()
        
        return rmse

In [69]:
study = optuna.create_study()
study.optimize(gradient_boosting, n_trials=10)

[32m[I 2021-08-17 01:20:50,679][0m A new study created in memory with name: no-name-33bff1bf-22a2-4dae-b789-be0ba5bdf38b[0m


Prediction: 0.047
LGBM model
  RMSE: 0.13193137386516426
  MAE: 0.0849083552324772


[32m[I 2021-08-17 01:20:51,500][0m Trial 0 finished with value: 0.13193137386516426 and parameters: {'n_estimators': 80, 'num_leaves': 35, 'max_depth': 4}. Best is trial 0 with value: 0.13193137386516426.[0m


Prediction: 0.053
LGBM model
  RMSE: 0.1309694932155369
  MAE: 0.08409614305257795


[32m[I 2021-08-17 01:20:52,193][0m Trial 1 finished with value: 0.1309694932155369 and parameters: {'n_estimators': 94, 'num_leaves': 25, 'max_depth': 2}. Best is trial 1 with value: 0.1309694932155369.[0m


Prediction: 0.027
LGBM model
  RMSE: 0.13748194357603463
  MAE: 0.08948320661306382


[32m[I 2021-08-17 01:20:53,574][0m Trial 2 finished with value: 0.13748194357603463 and parameters: {'n_estimators': 140, 'num_leaves': 30, 'max_depth': 7}. Best is trial 1 with value: 0.1309694932155369.[0m


Prediction: 0.049
LGBM model
  RMSE: 0.1311140683268585
  MAE: 0.08431868470549583


[32m[I 2021-08-17 01:20:54,282][0m Trial 3 finished with value: 0.1311140683268585 and parameters: {'n_estimators': 57, 'num_leaves': 29, 'max_depth': 4}. Best is trial 1 with value: 0.1309694932155369.[0m


Prediction: 0.026
LGBM model
  RMSE: 0.13571364493282384
  MAE: 0.08876396795272828


[32m[I 2021-08-17 01:20:55,277][0m Trial 4 finished with value: 0.13571364493282384 and parameters: {'n_estimators': 78, 'num_leaves': 31, 'max_depth': 8}. Best is trial 1 with value: 0.1309694932155369.[0m


Prediction: 0.041
LGBM model
  RMSE: 0.13310145330378742
  MAE: 0.0854999857211113


[32m[I 2021-08-17 01:20:56,146][0m Trial 5 finished with value: 0.13310145330378742 and parameters: {'n_estimators': 124, 'num_leaves': 31, 'max_depth': 4}. Best is trial 1 with value: 0.1309694932155369.[0m


Prediction: 0.020
LGBM model
  RMSE: 0.13553866299464548
  MAE: 0.0875528560936451


[32m[I 2021-08-17 01:20:57,258][0m Trial 6 finished with value: 0.13553866299464548 and parameters: {'n_estimators': 87, 'num_leaves': 30, 'max_depth': 9}. Best is trial 1 with value: 0.1309694932155369.[0m


Prediction: 0.029
LGBM model
  RMSE: 0.13660785695535804
  MAE: 0.08945129908442498


[32m[I 2021-08-17 01:20:58,380][0m Trial 7 finished with value: 0.13660785695535804 and parameters: {'n_estimators': 98, 'num_leaves': 33, 'max_depth': 8}. Best is trial 1 with value: 0.1309694932155369.[0m


Prediction: 0.040
LGBM model
  RMSE: 0.13395784018887136
  MAE: 0.08791508105158806


[32m[I 2021-08-17 01:20:59,257][0m Trial 8 finished with value: 0.13395784018887136 and parameters: {'n_estimators': 59, 'num_leaves': 28, 'max_depth': 8}. Best is trial 1 with value: 0.1309694932155369.[0m


Prediction: 0.030
LGBM model
  RMSE: 0.1349349509209797
  MAE: 0.08813694376349449


[32m[I 2021-08-17 01:21:00,203][0m Trial 9 finished with value: 0.1349349509209797 and parameters: {'n_estimators': 72, 'num_leaves': 32, 'max_depth': 8}. Best is trial 1 with value: 0.1309694932155369.[0m


# Análise de melhor modelo de predicão

In [71]:
!pip install pyngrok --quiet

import mlflow

with mlflow.start_run(run_name="MLflow on Colab"):
  mlflow.log_metric("m1", 2.0)
  mlflow.log_param("p1", "mlflow-colab")

# run tracking UI in the background
get_ipython().system_raw("mlflow ui --port 5000 &") # run tracking UI in the background


# create remote tunnel using ngrok.com to allow local port access
# borrowed from https://colab.research.google.com/github/alfozan/MLflow-GBRT-demo/blob/master/MLflow-GBRT-demo.ipynb#scrollTo=4h3bKHMYUIG6

from pyngrok import ngrok

# Terminate open tunnels if exist
ngrok.kill()

# Setting the authtoken (optional)
# Get your authtoken from https://dashboard.ngrok.com/auth
NGROK_AUTH_TOKEN = ""
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Open an HTTPs tunnel on port 5000 for http://localhost:5000
ngrok_tunnel = ngrok.connect(addr="5000", proto="http", bind_tls=True)
print("MLflow Tracking UI:", ngrok_tunnel.public_url)

[?25l[K     |▍                               | 10 kB 22.6 MB/s eta 0:00:01[K     |▉                               | 20 kB 26.7 MB/s eta 0:00:01[K     |█▎                              | 30 kB 29.4 MB/s eta 0:00:01[K     |█▊                              | 40 kB 29.7 MB/s eta 0:00:01[K     |██▏                             | 51 kB 31.4 MB/s eta 0:00:01[K     |██▋                             | 61 kB 34.1 MB/s eta 0:00:01[K     |███                             | 71 kB 34.1 MB/s eta 0:00:01[K     |███▌                            | 81 kB 34.6 MB/s eta 0:00:01[K     |████                            | 92 kB 36.6 MB/s eta 0:00:01[K     |████▍                           | 102 kB 31.1 MB/s eta 0:00:01[K     |████▉                           | 112 kB 31.1 MB/s eta 0:00:01[K     |█████▎                          | 122 kB 31.1 MB/s eta 0:00:01[K     |█████▊                          | 133 kB 31.1 MB/s eta 0:00:01[K     |██████▏                         | 143 kB 31.1 MB/s eta 0: