# Model Experimentation Tracking (MLflow) - Hyperparameter Optimization

Record and query experiments: code, data, configuration, parameters, metrics, and results.

![Data](images/MLflow_Model_experimentation.png)

## Import Packages

In [1]:
# Data analysis library
import numpy as np
import pandas as pd
import joblib

# Machine Learning library
import sklearn
# Note: plot_confusion_matrix and plot_roc_curve exist in scikit-learn 0.23 (used here) but were
# removed in scikit-learn 1.2 in favour of ConfusionMatrixDisplay and RocCurveDisplay
from sklearn.metrics import roc_curve, auc, accuracy_score, plot_confusion_matrix, plot_roc_curve
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
import lightgbm as lgb
from lightgbm import LGBMClassifier
from lightgbm import plot_importance, plot_metric

# Model experimentation library
import mlflow
import mlflow.lightgbm
import mlflow.sklearn  # provides mlflow.sklearn.log_model(), used in the training function
from mlflow.tracking import MlflowClient

# Hyperparameter tuning library
import optuna

# Plotting library
import matplotlib.pyplot as plt
# Turn interactive mode off so figures are written to disk instead of being displayed
plt.ioff()
import warnings
warnings.filterwarnings("ignore")

In [2]:
print(f'Numpy version is {np.__version__}')
print(f'Pandas version is {pd.__version__}')
print(f'sklearn version is {sklearn.__version__}')
print(f'mlflow version is {mlflow.__version__}')
print(f'joblib version is {joblib.__version__}')
print(f'optuna version is {optuna.__version__}')

Numpy version is 1.19.4
Pandas version is 1.1.5
sklearn version is 0.23.2
mlflow version is 1.12.1
joblib version is 1.0.0
optuna version is 2.3.0


## Download data 

### Campus Recruitment Dataset
#### Academic and Employability Factors influencing placement

https://www.kaggle.com/benroshan/factors-affecting-campus-placement

## Load data

In [3]:
## Files
data_file = '../data/placement_data/Placement_Data_Full_Class.csv'

# Load the campus placement dataset
try:
    data = pd.read_csv(data_file)
    print("The dataset has {} samples with {} features.".format(*data.shape))
except FileNotFoundError:
    print("The dataset could not be loaded. Is the dataset missing?")

The dataset has 215 samples with 15 features.


## Introduction To The Data

In [4]:
data.head()

|   | sl_no | gender | ssc_p | ssc_b | hsc_p | hsc_b | hsc_s | degree_p | degree_t | workex | etest_p | specialisation | mba_p | status | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M | 67.0 | Others | 91.0 | Others | Commerce | 58.0 | Sci&Tech | No | 55.0 | Mkt&HR | 58.8 | Placed | 270000.0 |
| 1 | 2 | M | 79.33 | Central | 78.33 | Others | Science | 77.48 | Sci&Tech | Yes | 86.5 | Mkt&Fin | 66.28 | Placed | 200000.0 |
| 2 | 3 | M | 65.0 | Central | 68.0 | Central | Arts | 64.0 | Comm&Mgmt | No | 75.0 | Mkt&Fin | 57.8 | Placed | 250000.0 |
| 3 | 4 | M | 56.0 | Central | 52.0 | Central | Science | 52.0 | Sci&Tech | No | 66.0 | Mkt&HR | 59.43 | Not Placed |  |
| 4 | 5 | M | 85.8 | Central | 73.6 | Central | Commerce | 73.3 | Comm&Mgmt | No | 96.8 | Mkt&Fin | 55.5 | Placed | 425000.0 |


In [5]:
data['status'].value_counts()

Placed        148
Not Placed     67
Name: status, dtype: int64
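The classes are imbalanced (148 vs. 67, roughly 69% / 31%). A minimal sketch of why that matters for model selection: a model that always predicts "Placed" already reaches about 69% accuracy, yet its ROC AUC is only 0.5 because it ranks no student above another. The helper `pairwise_auc` is a hypothetical illustration, not part of the notebook: it computes ROC AUC as the share of (positive, negative) pairs ranked correctly, counting ties as half a pair, which matches the area that `roc_curve` + `auc` report.

```python
from itertools import product

# Class counts taken from the value_counts() output above.
counts = {"Placed": 148, "Not Placed": 67}
total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class ("Placed").
majority_baseline_accuracy = max(shares.values())


def pairwise_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    credit = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return credit / (len(pos) * len(neg))


# Same encoding as the data-preparation step: 1 = "Not Placed", 0 = "Placed".
y_true = [1] * counts["Not Placed"] + [0] * counts["Placed"]
constant_scores = [0.0] * total  # the majority-class model gives every student the same score

print(total, round(majority_baseline_accuracy, 3))        # 215 0.688
print(pairwise_auc(y_true, constant_scores))              # 0.5
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC counts pairs, on a small validation split it can only move in steps of 1/(n_pos × n_neg) (half that when scores tie), so small AUC differences between trials may come down to a single pair.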

## Start MLflow UI

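Start the MLflow UI by running **mlflow ui** from a command prompt. By default it serves the UI at http://127.0.0.1:5000, the address used as the tracking URI below. Running it inside a notebook cell, as here, blocks the kernel until the process is interrupted (hence the `^C` output), so a separate terminal is more practical.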

In [6]:
!mlflow ui

^C


## Initialize MLflow

**Experiments**: You can organize runs into experiments, which group together runs for a specific task.

**Tracking URI**: MLflow runs can be recorded to local files, to a database, or remotely to a tracking server. By default, the MLflow Python API logs runs locally to files in an `mlruns` directory wherever you ran your program.

#### MLflow Tracking Servers 
An MLflow tracking server uses two storage components: a **backend store** and an **artifact store**.

The **backend store** is where MLflow Tracking Server stores experiment and run metadata as well as params, metrics, and tags for runs. MLflow supports two types of backend stores: **file store and database-backed store**.

The **artifact store** is a location suitable for large data (such as an S3 bucket or shared NFS file system) and is where clients log their artifact output (for example, models).

Supported artifact stores include:

- Amazon S3 and S3-compatible storage
- Azure Blob Storage
- Google Cloud Storage
- FTP server
- SFTP server
- NFS
- HDFS

In [6]:
experiment_name = "campus_recruitment_experiments_v2"
artifact_repository = './mlflow-run'

# Provide uri and connect to your tracking server
mlflow.set_tracking_uri('http://127.0.0.1:5000/')

# Initialize client
client = MlflowClient()

# Reuse the experiment if it already exists; otherwise create it with the chosen artifact location
experiment = client.get_experiment_by_name(experiment_name)
if experiment is None:
    experiment_id = client.create_experiment(experiment_name, artifact_location=artifact_repository)
else:
    experiment_id = experiment.experiment_id

## Prepare data for model training

In [7]:
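# salary is only recorded for placed students, so it would leak the target; sl_no is just a row ID
exclude_feature = ['sl_no', 'salary', 'status']
# Define the target: 1 = "Not Placed" (positive class), 0 = "Placed"
target = data['status'].map({"Placed": 0, "Not Placed": 1})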

# Define numeric and categorical features
numeric_columns = data.select_dtypes(include=['int64', 'float64']).columns.tolist()
categorical_columns = data.select_dtypes(include=['object']).columns.tolist()
numeric_features = [col for col in numeric_columns if col not in exclude_feature]
categorical_features = [col for col in categorical_columns if col not in exclude_feature]

# Define final feature list for training and validation
features = numeric_features + categorical_features
# Final data for training and validation
data = data[features]
data = data.fillna(0)

# Split data into training and validation sets
X_train, X_valid, y_train, y_valid = train_test_split(data, target, test_size=0.15, random_state=10)

# Label-encode each categorical variable: fit on the training split only, then apply the same mapping to the validation split
for feature in categorical_features:
    le = LabelEncoder()
    le.fit(X_train.loc[:, feature])
    X_train.loc[:, feature] = le.transform(X_train.loc[:, feature])
    X_valid.loc[:, feature] = le.transform(X_valid.loc[:, feature])
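With `test_size=0.15`, the validation split holds 33 of the 215 rows (scikit-learn rounds the test size up), leaving 182 rows to fit each `LabelEncoder`. A minimal pure-Python sketch of what the loop above does per column, assuming scikit-learn's behaviour of ordering codes by the sorted classes seen during `fit` and raising `ValueError` on unseen labels; `fit_label_encoder`, `transform`, and the toy values are hypothetical.

```python
def fit_label_encoder(train_values):
    """Map each category seen in training to an integer code, in sorted order
    (mirroring how scikit-learn's LabelEncoder orders its classes_)."""
    return {label: code for code, label in enumerate(sorted(set(train_values)))}


def transform(mapping, values):
    """Encode values with a fitted mapping; unseen categories are an error."""
    unseen = sorted(set(values) - set(mapping))
    if unseen:
        raise ValueError(f"previously unseen labels: {unseen}")
    return [mapping[v] for v in values]


# Hypothetical toy values shaped like the 'hsc_s' column.
train_hsc_s = ["Commerce", "Science", "Arts", "Science"]
valid_hsc_s = ["Science", "Arts"]

mapping = fit_label_encoder(train_hsc_s)
print(mapping)                          # {'Arts': 0, 'Commerce': 1, 'Science': 2}
print(transform(mapping, valid_hsc_s))  # [2, 0]

try:
    transform(mapping, ["Vocational"])  # category absent from the training split
except ValueError as err:
    print(err)                          # previously unseen labels: ['Vocational']
```

Fitting on the training split only keeps the validation data unseen during preprocessing; the price is that a rare category appearing only in validation would stop the loop with an error.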

## LightGBM hyperparameter tuning + MLflow for model tracking

### Define model training function to train and track model results

In [8]:
def model_training_tracking(params):
    # mlflow.start_run() returns an ActiveRun, which is a Python context manager, so each call
    # scopes one run to this block; this lets one program launch many runs (one per trial)
    with mlflow.start_run(experiment_id=experiment_id, run_name='Lightgbm_model') as run:
        # Get the run ID
        run_id = run.info.run_id
        
        # Set the notes (description) shown for the run in the MLflow UI
        MlflowClient().set_tag(run_id,
                               "mlflow.note.content",
                               "This is an experiment for hyperparameter optimization of LightGBM models on the Campus Recruitment Dataset")
        
        # Define custom tags
        tags = {"Application": "Payment Monitoring Platform",
                "release.candidate": "PMP",
                "release.version": "2.2.0"}

        # Set Tag
        mlflow.set_tags(tags)
                        
        # Log Python environment details (expects requirements.txt in the working directory)
        mlflow.log_artifact('requirements.txt')
        
        # logging params
        mlflow.log_params(params)

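        # Perform model training (early_stopping_rounds and verbose are fit() arguments in LightGBM 3.x;
        # LightGBM 4.0 replaced them with the lgb.early_stopping / lgb.log_evaluation callbacks)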
        lgb_clf = LGBMClassifier(**params)
        lgb_clf.fit(X_train, y_train, 
                    eval_set = [(X_train, y_train), (X_valid, y_valid)], 
                    early_stopping_rounds=50,
                    verbose=20)

        # Log model artifacts
        mlflow.sklearn.log_model(lgb_clf, "model")

        # Perform model evaluation 
        lgb_valid_prediction = lgb_clf.predict_proba(X_valid)[:, 1]
        fpr, tpr, thresholds = roc_curve(y_valid, lgb_valid_prediction)
        roc_auc = auc(fpr, tpr) # compute area under the curve
        print("=====================================")
        print("Validation AUC:{}".format(roc_auc))
        print("=====================================")   

        # log metrics
        mlflow.log_metrics({"Validation_AUC": roc_auc})

        # Plot and save feature importance details
        ax = plot_importance(lgb_clf, height=0.4)
        filename = './images/lgb_validation_feature_importance.png'
        plt.savefig(filename)
        # log model artifacts
        mlflow.log_artifact(filename)

        # Plot and save the training vs. validation AUC curves recorded during boosting
        ax = plot_metric(lgb_clf.evals_result_)
        filename = './images/lgb_validation_metrics_comparison.png'
        plt.savefig(filename)
        # log model artifacts
        mlflow.log_artifact(filename)

        # Plot and save the confusion matrix
        plot_confusion_matrix(lgb_clf, X_valid, y_valid, 
                              display_labels=['Placed', 'Not Placed'],
                              cmap='magma')
        plt.title('Confusion Matrix')
        filename = './images/lgb_validation_confusion_matrix.png'
        plt.savefig(filename)
        # log model artifacts
        mlflow.log_artifact(filename)

        # Plot and save the ROC curve
        plot_roc_curve(lgb_clf, X_valid, y_valid, name='Validation')
        plt.title('ROC AUC Curve')
        filename = './images/lgb_validation_roc_curve.png'
        plt.savefig(filename)
        # log model artifacts
        mlflow.log_artifact(filename)
        
        # Close all figures so they do not accumulate in memory across trials
        plt.close('all')
        return roc_auc
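The `fit` call above passes `early_stopping_rounds=50` with the validation set in `eval_set`, which is what produces the "Early stopping, best iteration is" and "Did not meet early stopping" lines in the log; with LightGBM's default of 100 boosting rounds, many trials end before patience runs out. A simplified sketch of patience-based early stopping, assuming a single higher-is-better metric and strict improvement; LightGBM's own callback supports more options, and `early_stopping` plus the score curves are hypothetical.

```python
def early_stopping(valid_scores, patience):
    """Simplified patience-based early stopping for a higher-is-better metric.

    valid_scores[i] is the validation score after boosting round i + 1.
    Returns (best_round, last_round, stopped_early), using 1-based rounds
    like the LightGBM log.
    """
    best_round, best_score = 0, float("-inf")
    for round_no, score in enumerate(valid_scores, start=1):
        if score > best_score:  # strict improvement resets the patience counter
            best_round, best_score = round_no, score
        elif round_no - best_round >= patience:
            return best_round, round_no, True
    return best_round, len(valid_scores), False


# Hypothetical curve: peaks at round 2, then never improves (compare Trial 1 in the log).
plateau = [0.90, 0.91] + [0.88] * 98
print(early_stopping(plateau, patience=50))  # (2, 52, True)

# Hypothetical curve that keeps improving until the 100th (last) round.
rising = [0.80 + 0.001 * i for i in range(100)]
print(early_stopping(rising, patience=50))   # (100, 100, False)
```

In the log, the "Validation AUC" printed after each fit matches the best round's `valid_1` score, consistent with `predict_proba` using the best iteration.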

### Define an objective function to be maximized

In [9]:
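def objective(trial):

    param = {
        "objective": "binary",
        "metric": "auc",
        # Sample the learning rate on a log scale between 0.01 and 0.1
        "learning_rate": trial.suggest_float("learning_rate", 1e-2, 1e-1, log=True),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.4, 1.0),
        # Note: LightGBM only applies row subsampling when subsample_freq > 0 (default 0),
        # so as written this parameter does not change the model
        "subsample": trial.suggest_float("subsample", 0.4, 1.0),
        "random_state": 42,
    }

    # Train, log to MLflow, and return the validation AUC for Optuna to maximize
    validation_auc = model_training_tracking(param)
    return validation_auc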
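`log=True` makes Optuna sample `learning_rate` from the range in the log domain. A minimal sketch of what that means for the search space, using plain random sampling; Optuna's default sampler is TPE, so this illustrates only the distribution of the range, not how Optuna chooses trials. `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Random draw whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
low, high = 1e-2, 1e-1  # same bounds as the learning_rate search space
samples = [sample_log_uniform(low, high, rng) for _ in range(20_000)]

geometric_mid = math.sqrt(low * high)  # ~0.0316: the median of a log-uniform draw
arithmetic_mid = (low + high) / 2      # 0.055: the median of a plain uniform draw

share_below_geo = sum(s < geometric_mid for s in samples) / len(samples)
share_below_arith = sum(s < arithmetic_mid for s in samples) / len(samples)

print(round(share_below_geo, 2))    # close to 0.50
print(round(share_below_arith, 2))  # close to log10(5.5), about 0.74, not 0.50
```

Each decade of learning rates gets equal probability, so small rates such as 0.01 to 0.03 are explored as thoroughly as large ones.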

### Create a study object and optimize the objective function

In [10]:
# Create a study object and optimize the objective function.
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=40)
trial = study.best_trial
print('AUC: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))

[I 2020-12-20 16:47:54,563] A new study created in memory with name: no-name-8668af4e-34dd-4a71-aad4-990b78ad4b0b


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.974331	valid_1's auc: 0.87218
[40]	training's auc: 0.979743	valid_1's auc: 0.902256
[60]	training's auc: 0.986983	valid_1's auc: 0.894737
[80]	training's auc: 0.990493	valid_1's auc: 0.902256
Early stopping, best iteration is:
[38]	training's auc: 0.981205	valid_1's auc: 0.909774
Validation AUC:0.9097744360902256


[I 2020-12-20 16:47:56,361] Trial 0 finished with value: 0.9097744360902256 and parameters: {'learning_rate': 0.02701990132762442, 'colsample_bytree': 0.6804278072196543, 'subsample': 0.8445415704095669}. Best is trial 0 with value: 0.9097744360902256.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.978646	valid_1's auc: 0.868421
[40]	training's auc: 0.991663	valid_1's auc: 0.883459
Early stopping, best iteration is:
[2]	training's auc: 0.941568	valid_1's auc: 0.911654
Validation AUC:0.9116541353383458


[I 2020-12-20 16:47:57,384] Trial 1 finished with value: 0.9116541353383458 and parameters: {'learning_rate': 0.05806050035338419, 'colsample_bytree': 0.45345881652972353, 'subsample': 0.68724782238422}. Best is trial 1 with value: 0.9116541353383458.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.96614	valid_1's auc: 0.853383
[40]	training's auc: 0.974916	valid_1's auc: 0.860902
[60]	training's auc: 0.977695	valid_1's auc: 0.883459
[80]	training's auc: 0.980474	valid_1's auc: 0.894737
[100]	training's auc: 0.983326	valid_1's auc: 0.887218
Did not meet early stopping. Best iteration is:
[98]	training's auc: 0.983472	valid_1's auc: 0.887218
Validation AUC:0.8872180451127819


[I 2020-12-20 16:47:58,663] Trial 2 finished with value: 0.8872180451127819 and parameters: {'learning_rate': 0.013805860617504004, 'colsample_bytree': 0.7471234605162884, 'subsample': 0.9566608274427805}. Best is trial 1 with value: 0.9116541353383458.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.976232	valid_1's auc: 0.883459
[40]	training's auc: 0.985374	valid_1's auc: 0.898496
[60]	training's auc: 0.990639	valid_1's auc: 0.909774
[80]	training's auc: 0.995173	valid_1's auc: 0.909774
[100]	training's auc: 0.997952	valid_1's auc: 0.906015
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.997952	valid_1's auc: 0.906015
Validation AUC:0.906015037593985


[I 2020-12-20 16:47:59,932] Trial 3 finished with value: 0.906015037593985 and parameters: {'learning_rate': 0.041189695845684694, 'colsample_bytree': 0.8363235488071805, 'subsample': 0.4780505367371751}. Best is trial 1 with value: 0.9116541353383458.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.972941	valid_1's auc: 0.890977
[40]	training's auc: 0.983619	valid_1's auc: 0.894737
[60]	training's auc: 0.988591	valid_1's auc: 0.894737
[80]	training's auc: 0.993272	valid_1's auc: 0.906015
[100]	training's auc: 0.99649	valid_1's auc: 0.909774
Did not meet early stopping. Best iteration is:
[99]	training's auc: 0.99649	valid_1's auc: 0.906015
Validation AUC:0.906015037593985


[I 2020-12-20 16:48:01,097] Trial 4 finished with value: 0.906015037593985 and parameters: {'learning_rate': 0.033733197171224305, 'colsample_bytree': 0.7913667257134975, 'subsample': 0.4283813092596981}. Best is trial 1 with value: 0.9116541353383458.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.980254	valid_1's auc: 0.883459
[40]	training's auc: 0.989615	valid_1's auc: 0.898496
[60]	training's auc: 0.994735	valid_1's auc: 0.898496
[80]	training's auc: 0.99766	valid_1's auc: 0.917293
[100]	training's auc: 0.999854	valid_1's auc: 0.921053
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.999854	valid_1's auc: 0.921053
Validation AUC:0.9210526315789475


[I 2020-12-20 16:48:02,293] Trial 5 finished with value: 0.9210526315789475 and parameters: {'learning_rate': 0.04980680386446043, 'colsample_bytree': 0.6734515470523221, 'subsample': 0.4965623082097034}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.958973	valid_1's auc: 0.845865
[40]	training's auc: 0.971332	valid_1's auc: 0.87594
[60]	training's auc: 0.9736	valid_1's auc: 0.913534
[80]	training's auc: 0.97711	valid_1's auc: 0.898496
[100]	training's auc: 0.981717	valid_1's auc: 0.917293
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.981717	valid_1's auc: 0.917293
Validation AUC:0.9172932330827068


[I 2020-12-20 16:48:03,486] Trial 6 finished with value: 0.9172932330827068 and parameters: {'learning_rate': 0.01082051702665304, 'colsample_bytree': 0.8853883680967638, 'subsample': 0.5907139539623958}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.969797	valid_1's auc: 0.857143
[40]	training's auc: 0.974038	valid_1's auc: 0.864662
[60]	training's auc: 0.978134	valid_1's auc: 0.890977
[80]	training's auc: 0.980547	valid_1's auc: 0.898496
[100]	training's auc: 0.984204	valid_1's auc: 0.902256
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.984204	valid_1's auc: 0.902256
Validation AUC:0.9022556390977443


[I 2020-12-20 16:48:04,744] Trial 7 finished with value: 0.9022556390977443 and parameters: {'learning_rate': 0.01459830413595709, 'colsample_bytree': 0.6718121466054068, 'subsample': 0.5826408347246305}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.987275	valid_1's auc: 0.898496
[40]	training's auc: 0.993711	valid_1's auc: 0.909774
[60]	training's auc: 0.998537	valid_1's auc: 0.909774
[80]	training's auc: 1	valid_1's auc: 0.913534
[100]	training's auc: 1	valid_1's auc: 0.921053
Did not meet early stopping. Best iteration is:
[80]	training's auc: 1	valid_1's auc: 0.913534
Validation AUC:0.9135338345864662


[I 2020-12-20 16:48:05,967] Trial 8 finished with value: 0.9135338345864662 and parameters: {'learning_rate': 0.06762840920170832, 'colsample_bytree': 0.6753995071271871, 'subsample': 0.7461951900070407}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.968992	valid_1's auc: 0.857143
[40]	training's auc: 0.977183	valid_1's auc: 0.887218
Early stopping, best iteration is:
[2]	training's auc: 0.941568	valid_1's auc: 0.911654
Validation AUC:0.9116541353383458


[I 2020-12-20 16:48:07,062] Trial 9 finished with value: 0.9116541353383458 and parameters: {'learning_rate': 0.014075661053172043, 'colsample_bytree': 0.4028642807338941, 'subsample': 0.7691546434266776}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.985593	valid_1's auc: 0.906015
[40]	training's auc: 0.997221	valid_1's auc: 0.894737
[60]	training's auc: 1	valid_1's auc: 0.909774
[80]	training's auc: 1	valid_1's auc: 0.909774
Early stopping, best iteration is:
[46]	training's auc: 0.998245	valid_1's auc: 0.917293
Validation AUC:0.9172932330827067


[I 2020-12-20 16:48:08,325] Trial 10 finished with value: 0.9172932330827067 and parameters: {'learning_rate': 0.09437884401470567, 'colsample_bytree': 0.5564401195600701, 'subsample': 0.4026094060316379}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.963507	valid_1's auc: 0.87406
[40]	training's auc: 0.975574	valid_1's auc: 0.887218
[60]	training's auc: 0.982302	valid_1's auc: 0.902256
[80]	training's auc: 0.985666	valid_1's auc: 0.894737
[100]	training's auc: 0.988006	valid_1's auc: 0.902256
Did not meet early stopping. Best iteration is:
[96]	training's auc: 0.988006	valid_1's auc: 0.902256
Validation AUC:0.9022556390977443


[I 2020-12-20 16:48:09,433] Trial 11 finished with value: 0.9022556390977443 and parameters: {'learning_rate': 0.020599142420924314, 'colsample_bytree': 0.9955991086188408, 'subsample': 0.5701850898409796}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.981205	valid_1's auc: 0.902256
[40]	training's auc: 0.989177	valid_1's auc: 0.906015
[60]	training's auc: 0.995027	valid_1's auc: 0.902256
[80]	training's auc: 0.998684	valid_1's auc: 0.917293
[100]	training's auc: 1	valid_1's auc: 0.913534
Did not meet early stopping. Best iteration is:
[95]	training's auc: 1	valid_1's auc: 0.913534
Validation AUC:0.9135338345864663


[I 2020-12-20 16:48:10,561] Trial 12 finished with value: 0.9135338345864663 and parameters: {'learning_rate': 0.052893540852799494, 'colsample_bytree': 0.9353985181014517, 'subsample': 0.547687969417654}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.986178	valid_1's auc: 0.902256
[40]	training's auc: 0.997952	valid_1's auc: 0.902256
[60]	training's auc: 1	valid_1's auc: 0.909774
[80]	training's auc: 1	valid_1's auc: 0.894737
Early stopping, best iteration is:
[34]	training's auc: 0.995466	valid_1's auc: 0.913534
Validation AUC:0.9135338345864663


[I 2020-12-20 16:48:11,674] Trial 13 finished with value: 0.9135338345864663 and parameters: {'learning_rate': 0.09941021276002143, 'colsample_bytree': 0.8815708064619441, 'subsample': 0.6322571625693427}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.95305	valid_1's auc: 0.887218
[40]	training's auc: 0.961094	valid_1's auc: 0.87594
Early stopping, best iteration is:
[3]	training's auc: 0.94581	valid_1's auc: 0.898496
Validation AUC:0.8984962406015037


[I 2020-12-20 16:48:12,665] Trial 14 finished with value: 0.8984962406015037 and parameters: {'learning_rate': 0.010346477598912857, 'colsample_bytree': 0.5282033702150506, 'subsample': 0.4893561612196504}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.968115	valid_1's auc: 0.868421
[40]	training's auc: 0.977329	valid_1's auc: 0.887218
[60]	training's auc: 0.983619	valid_1's auc: 0.87594
[80]	training's auc: 0.988445	valid_1's auc: 0.890977
[100]	training's auc: 0.990932	valid_1's auc: 0.902256
Did not meet early stopping. Best iteration is:
[99]	training's auc: 0.990932	valid_1's auc: 0.902256
Validation AUC:0.9022556390977443


[I 2020-12-20 16:48:13,776] Trial 15 finished with value: 0.9022556390977443 and parameters: {'learning_rate': 0.022866495658727747, 'colsample_bytree': 0.5944223778393047, 'subsample': 0.6325888384476621}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.973673	valid_1's auc: 0.87594
[40]	training's auc: 0.984935	valid_1's auc: 0.894737
[60]	training's auc: 0.989469	valid_1's auc: 0.898496
[80]	training's auc: 0.994881	valid_1's auc: 0.902256
[100]	training's auc: 0.997367	valid_1's auc: 0.902256
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.997367	valid_1's auc: 0.902256
Validation AUC:0.9022556390977443


[I 2020-12-20 16:48:14,901] Trial 16 finished with value: 0.9022556390977443 and parameters: {'learning_rate': 0.03938398181744731, 'colsample_bytree': 0.9891010819583818, 'subsample': 0.5120946802440023}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.984642	valid_1's auc: 0.883459
[40]	training's auc: 0.995612	valid_1's auc: 0.902256
[60]	training's auc: 1	valid_1's auc: 0.913534
[80]	training's auc: 1	valid_1's auc: 0.898496
[100]	training's auc: 1	valid_1's auc: 0.902256
Did not meet early stopping. Best iteration is:
[59]	training's auc: 1	valid_1's auc: 0.909774
Validation AUC:0.9097744360902257


[I 2020-12-20 16:48:16,228] Trial 17 finished with value: 0.9097744360902257 and parameters: {'learning_rate': 0.08107687539002113, 'colsample_bytree': 0.7363074369863956, 'subsample': 0.4109611659715784}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.976013	valid_1's auc: 0.87218
[40]	training's auc: 0.988591	valid_1's auc: 0.906015
[60]	training's auc: 0.993272	valid_1's auc: 0.894737
[80]	training's auc: 0.997075	valid_1's auc: 0.894737
[100]	training's auc: 0.99883	valid_1's auc: 0.902256
Did not meet early stopping. Best iteration is:
[96]	training's auc: 0.99883	valid_1's auc: 0.898496
Validation AUC:0.8984962406015038


[I 2020-12-20 16:48:17,362] Trial 18 finished with value: 0.8984962406015038 and parameters: {'learning_rate': 0.04861880115106162, 'colsample_bytree': 0.6082830100227862, 'subsample': 0.6400828010991286}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.961752	valid_1's auc: 0.853383
[40]	training's auc: 0.971771	valid_1's auc: 0.879699
[60]	training's auc: 0.973892	valid_1's auc: 0.906015
[80]	training's auc: 0.976817	valid_1's auc: 0.898496
[100]	training's auc: 0.982229	valid_1's auc: 0.917293
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.982229	valid_1's auc: 0.917293
Validation AUC:0.9172932330827068


[I 2020-12-20 16:48:18,444] Trial 19 finished with value: 0.9172932330827068 and parameters: {'learning_rate': 0.011010267161052172, 'colsample_bytree': 0.8840917371917998, 'subsample': 0.46694342653455423}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.983472	valid_1's auc: 0.890977
[40]	training's auc: 0.994442	valid_1's auc: 0.909774
[60]	training's auc: 0.99883	valid_1's auc: 0.906015
[80]	training's auc: 1	valid_1's auc: 0.902256
Early stopping, best iteration is:
[36]	training's auc: 0.992248	valid_1's auc: 0.913534
Validation AUC:0.9135338345864662


[I 2020-12-20 16:48:19,538] Trial 20 finished with value: 0.9135338345864662 and parameters: {'learning_rate': 0.06980001239018617, 'colsample_bytree': 0.8006395276867925, 'subsample': 0.45339747814234266}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.962484	valid_1's auc: 0.857143
[40]	training's auc: 0.970601	valid_1's auc: 0.87218
[60]	training's auc: 0.97477	valid_1's auc: 0.894737
[80]	training's auc: 0.978792	valid_1's auc: 0.898496
[100]	training's auc: 0.982887	valid_1's auc: 0.913534
Did not meet early stopping. Best iteration is:
[99]	training's auc: 0.982887	valid_1's auc: 0.913534
Validation AUC:0.9135338345864662


[I 2020-12-20 16:48:20,664] Trial 21 finished with value: 0.9135338345864662 and parameters: {'learning_rate': 0.012248756415700507, 'colsample_bytree': 0.8891546071289493, 'subsample': 0.5356268898462679}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.966798	valid_1's auc: 0.87218
[40]	training's auc: 0.972868	valid_1's auc: 0.879699
[60]	training's auc: 0.979743	valid_1's auc: 0.902256
[80]	training's auc: 0.983399	valid_1's auc: 0.909774
[100]	training's auc: 0.986471	valid_1's auc: 0.898496
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.986471	valid_1's auc: 0.898496
Validation AUC:0.8984962406015038


[I 2020-12-20 16:48:21,867] Trial 22 finished with value: 0.8984962406015038 and parameters: {'learning_rate': 0.01809118466587407, 'colsample_bytree': 0.9377369477850716, 'subsample': 0.595190040548166}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.960875	valid_1's auc: 0.855263
[40]	training's auc: 0.970821	valid_1's auc: 0.87218
Early stopping, best iteration is:
[2]	training's auc: 0.936741	valid_1's auc: 0.906015
Validation AUC:0.9060150375939849


[I 2020-12-20 16:48:22,915] Trial 23 finished with value: 0.9060150375939849 and parameters: {'learning_rate': 0.010028075572274678, 'colsample_bytree': 0.8400138073058714, 'subsample': 0.5071169831717103}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.959119	valid_1's auc: 0.836466
[40]	training's auc: 0.969065	valid_1's auc: 0.853383
[60]	training's auc: 0.975355	valid_1's auc: 0.890977
[80]	training's auc: 0.977549	valid_1's auc: 0.898496
[100]	training's auc: 0.97945	valid_1's auc: 0.890977
Did not meet early stopping. Best iteration is:
[98]	training's auc: 0.97945	valid_1's auc: 0.890977
Validation AUC:0.8909774436090225


[I 2020-12-20 16:48:24,009] Trial 24 finished with value: 0.8909774436090225 and parameters: {'learning_rate': 0.010043392472654534, 'colsample_bytree': 0.748698242833366, 'subsample': 0.45356264109512295}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.966067	valid_1's auc: 0.87594
[40]	training's auc: 0.973307	valid_1's auc: 0.890977
[60]	training's auc: 0.980474	valid_1's auc: 0.894737
[80]	training's auc: 0.983399	valid_1's auc: 0.906015
[100]	training's auc: 0.986105	valid_1's auc: 0.902256
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.986105	valid_1's auc: 0.902256
Validation AUC:0.9022556390977443


[I 2020-12-20 16:48:25,261] Trial 25 finished with value: 0.9022556390977443 and parameters: {'learning_rate': 0.017344657238993057, 'colsample_bytree': 0.9341218870299812, 'subsample': 0.6952496255306627}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.970821	valid_1's auc: 0.890977
[40]	training's auc: 0.980474	valid_1's auc: 0.894737
[60]	training's auc: 0.984935	valid_1's auc: 0.902256
[80]	training's auc: 0.988884	valid_1's auc: 0.898496
[100]	training's auc: 0.992833	valid_1's auc: 0.898496
Did not meet early stopping. Best iteration is:
[99]	training's auc: 0.992833	valid_1's auc: 0.898496
Validation AUC:0.8984962406015038


[I 2020-12-20 16:48:26,322] Trial 26 finished with value: 0.8984962406015038 and parameters: {'learning_rate': 0.026704708686043722, 'colsample_bytree': 0.87599676636132, 'subsample': 0.5411716717803219}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.957949	valid_1's auc: 0.853383
[40]	training's auc: 0.970309	valid_1's auc: 0.868421
[60]	training's auc: 0.976452	valid_1's auc: 0.879699
[80]	training's auc: 0.978938	valid_1's auc: 0.890977
[100]	training's auc: 0.98201	valid_1's auc: 0.887218
Did not meet early stopping. Best iteration is:
[98]	training's auc: 0.98201	valid_1's auc: 0.890977
Validation AUC:0.8909774436090224


[I 2020-12-20 16:48:27,464] Trial 27 finished with value: 0.8909774436090224 and parameters: {'learning_rate': 0.011461131366753852, 'colsample_bytree': 0.623617850350357, 'subsample': 0.4576168302915386}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.977987	valid_1's auc: 0.902256
[40]	training's auc: 0.986544	valid_1's auc: 0.894737
[60]	training's auc: 0.991517	valid_1's auc: 0.906015
[80]	training's auc: 0.995758	valid_1's auc: 0.921053
[100]	training's auc: 0.997806	valid_1's auc: 0.913534
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.997806	valid_1's auc: 0.913534
Validation AUC:0.9135338345864662


[I 2020-12-20 16:48:28,598] Trial 28 finished with value: 0.9135338345864662 and parameters: {'learning_rate': 0.038900761866477576, 'colsample_bytree': 0.7880806829157138, 'subsample': 0.6160156356634937}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.974331	valid_1's auc: 0.87218
[40]	training's auc: 0.980035	valid_1's auc: 0.906015
[60]	training's auc: 0.986836	valid_1's auc: 0.898496
[80]	training's auc: 0.989908	valid_1's auc: 0.906015
Early stopping, best iteration is:
[38]	training's auc: 0.981205	valid_1's auc: 0.909774
Validation AUC:0.9097744360902256


[I 2020-12-20 16:48:29,796] Trial 29 finished with value: 0.9097744360902256 and parameters: {'learning_rate': 0.02723908101053236, 'colsample_bytree': 0.6555897917305228, 'subsample': 0.8337125843659479}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.975647	valid_1's auc: 0.879699
[40]	training's auc: 0.988591	valid_1's auc: 0.894737
[60]	training's auc: 0.994003	valid_1's auc: 0.913534
[80]	training's auc: 0.99766	valid_1's auc: 0.921053
[100]	training's auc: 0.999415	valid_1's auc: 0.906015
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.999415	valid_1's auc: 0.906015
Validation AUC:0.906015037593985


[I 2020-12-20 16:48:30,952] Trial 30 finished with value: 0.906015037593985 and parameters: {'learning_rate': 0.04773268434452678, 'colsample_bytree': 0.7163228839828573, 'subsample': 0.6672533547767582}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.964897	valid_1's auc: 0.883459
[40]	training's auc: 0.979523	valid_1's auc: 0.883459
Early stopping, best iteration is:
[3]	training's auc: 0.945663	valid_1's auc: 0.898496
Validation AUC:0.8984962406015037


[I 2020-12-20 16:48:32,012] Trial 31 finished with value: 0.8984962406015037 and parameters: {'learning_rate': 0.03138850978861116, 'colsample_bytree': 0.5160079058647584, 'subsample': 0.4061395850406978}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.985666	valid_1's auc: 0.894737
[40]	training's auc: 0.994149	valid_1's auc: 0.902256
[60]	training's auc: 0.999707	valid_1's auc: 0.909774
[80]	training's auc: 1	valid_1's auc: 0.902256
[100]	training's auc: 1	valid_1's auc: 0.894737
Did not meet early stopping. Best iteration is:
[69]	training's auc: 1	valid_1's auc: 0.906015
Validation AUC:0.906015037593985


[I 2020-12-20 16:48:33,167] Trial 32 finished with value: 0.906015037593985 and parameters: {'learning_rate': 0.08009158229415009, 'colsample_bytree': 0.5509460335167732, 'subsample': 0.4056580856612075}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.979084	valid_1's auc: 0.890977
[40]	training's auc: 0.99532	valid_1's auc: 0.887218
Early stopping, best iteration is:
[3]	training's auc: 0.945225	valid_1's auc: 0.906015
Validation AUC:0.9060150375939849


[I 2020-12-20 16:48:34,194] Trial 33 finished with value: 0.9060150375939849 and parameters: {'learning_rate': 0.09446865238433827, 'colsample_bytree': 0.4797597702031502, 'subsample': 0.5054281992888205}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.976452	valid_1's auc: 0.87218
[40]	training's auc: 0.990493	valid_1's auc: 0.890977
[60]	training's auc: 0.996636	valid_1's auc: 0.883459
[80]	training's auc: 0.999415	valid_1's auc: 0.887218
Early stopping, best iteration is:
[30]	training's auc: 0.98786	valid_1's auc: 0.902256
Validation AUC:0.9022556390977444


[I 2020-12-20 16:48:35,514] Trial 34 finished with value: 0.9022556390977444 and parameters: {'learning_rate': 0.0626828153207858, 'colsample_bytree': 0.5941152476436451, 'subsample': 0.9726327449362198}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.969431	valid_1's auc: 0.857143
[40]	training's auc: 0.979377	valid_1's auc: 0.887218
Early stopping, best iteration is:
[2]	training's auc: 0.941568	valid_1's auc: 0.911654
Validation AUC:0.9116541353383458


[I 2020-12-20 16:48:36,787] Trial 35 finished with value: 0.9116541353383458 and parameters: {'learning_rate': 0.0161729272396665, 'colsample_bytree': 0.45270494148222873, 'subsample': 0.43860601615042444}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.9638	valid_1's auc: 0.857143
[40]	training's auc: 0.972283	valid_1's auc: 0.864662
[60]	training's auc: 0.977549	valid_1's auc: 0.898496
[80]	training's auc: 0.979743	valid_1's auc: 0.906015
[100]	training's auc: 0.982595	valid_1's auc: 0.902256
Did not meet early stopping. Best iteration is:
[98]	training's auc: 0.982741	valid_1's auc: 0.898496
Validation AUC:0.8984962406015037


[I 2020-12-20 16:48:38,346] Trial 36 finished with value: 0.8984962406015037 and parameters: {'learning_rate': 0.01173219649284997, 'colsample_bytree': 0.64613351220743, 'subsample': 0.47903076008359685}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.985081	valid_1's auc: 0.87218
[40]	training's auc: 0.996197	valid_1's auc: 0.909774
Early stopping, best iteration is:
[2]	training's auc: 0.936888	valid_1's auc: 0.913534
Validation AUC:0.9135338345864662


[I 2020-12-20 16:48:39,616] Trial 37 finished with value: 0.9135338345864662 and parameters: {'learning_rate': 0.08586337681168602, 'colsample_bytree': 0.8338092003510462, 'subsample': 0.558474310039239}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.964385	valid_1's auc: 0.860902
[40]	training's auc: 0.973307	valid_1's auc: 0.868421
[60]	training's auc: 0.976964	valid_1's auc: 0.887218
[80]	training's auc: 0.979011	valid_1's auc: 0.894737
[100]	training's auc: 0.983619	valid_1's auc: 0.906015
Did not meet early stopping. Best iteration is:
[98]	training's auc: 0.983765	valid_1's auc: 0.902256
Validation AUC:0.9022556390977443


[I 2020-12-20 16:48:40,882] Trial 38 finished with value: 0.9022556390977443 and parameters: {'learning_rate': 0.01285573635448991, 'colsample_bytree': 0.6981101641465832, 'subsample': 0.598703668098904}. Best is trial 5 with value: 0.9210526315789475.


Training until validation scores don't improve for 50 rounds
[20]	training's auc: 0.977695	valid_1's auc: 0.894737
[40]	training's auc: 0.988738	valid_1's auc: 0.890977
[60]	training's auc: 0.994149	valid_1's auc: 0.902256
[80]	training's auc: 0.997367	valid_1's auc: 0.921053
[100]	training's auc: 0.999269	valid_1's auc: 0.913534
Did not meet early stopping. Best iteration is:
[100]	training's auc: 0.999269	valid_1's auc: 0.913534
Validation AUC:0.9135338345864661


[I 2020-12-20 16:48:42,135] Trial 39 finished with value: 0.9135338345864661 and parameters: {'learning_rate': 0.044358476903108804, 'colsample_bytree': 0.7617131228746111, 'subsample': 0.7467523870939182}. Best is trial 5 with value: 0.9210526315789475.


AUC: 0.9210526315789475
Best hyperparameters: {'learning_rate': 0.04980680386446043, 'colsample_bytree': 0.6734515470523221, 'subsample': 0.4965623082097034}
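Every trial log above follows the same pattern: training stops once the validation AUC has not improved for 50 rounds, and the iteration with the best score is reported. The patience rule can be sketched in plain Python as below. This is a simplified illustration of patience-based early stopping, not LightGBM's actual callback, and it assumes a single higher-is-better validation metric such as AUC; the function name `early_stopping` is hypothetical.

```python
from typing import List, Tuple


def early_stopping(valid_scores: List[float], stopping_rounds: int = 50) -> Tuple[int, float, bool]:
    """Replay a validation curve (higher is better) and apply a patience rule.

    Returns (best_iteration, best_score, stopped_early). Iterations are 1-based,
    like the [N] counters in the training log.
    """
    best_iter, best_score = 0, float("-inf")
    for iteration, score in enumerate(valid_scores, start=1):
        if score > best_score:
            # Strict improvement: remember this iteration as the new best.
            best_iter, best_score = iteration, score
        elif iteration - best_iter >= stopping_rounds:
            # No improvement for `stopping_rounds` rounds: stop and report the best.
            return best_iter, best_score, True
    # Ran out of boosting rounds before the patience was exhausted.
    return best_iter, best_score, False
```

For instance, in Trial 31 the best validation AUC appeared at iteration 3, so with a patience of 50 training halts at iteration 53, which is why that log ends shortly after the `[40]` line. When the 100-round budget runs out first, the log prints "Did not meet early stopping" instead.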


## Load best LightGBM model

Check the MLflow UI and pick the best run's model for deployment.

In [11]:
# Load the best model (run ID picked from the MLflow UI)
import mlflow.sklearn

lgb_best_model = mlflow.sklearn.load_model("./mlflow-run/6ffb67cfcd87414a961bcdd0b69c04b0/artifacts/model")

# Make predictions against the validation data
lgb_best_val_prediction = lgb_best_model.predict(X_valid)
lgb_best_val_prediction

array([0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1], dtype=int64)
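The run ID in the cell above was read off the MLflow UI. As an alternative, the best run could be selected programmatically with `mlflow.search_runs`, which returns a pandas DataFrame with a `run_id` column and one `metrics.<name>` column per logged metric. A minimal sketch, assuming each Optuna trial was logged as its own MLflow run, that the validation AUC was stored under a metric named `valid_auc` (a hypothetical name; use whatever key the tracking code actually logged), and that the model was logged under the artifact path `model`. The helper names `pick_best_run_id` and `load_best_model` are also hypothetical.

```python
import pandas as pd


def pick_best_run_id(runs: pd.DataFrame, metric_column: str) -> str:
    """Return the run_id of the row with the highest value in `metric_column`.

    `runs` is expected to have the shape returned by mlflow.search_runs().
    """
    if metric_column not in runs.columns:
        raise ValueError(f"no column named {metric_column!r}")
    # Ignore runs that never logged this metric (e.g. failed trials).
    ranked = runs.dropna(subset=[metric_column])
    if ranked.empty:
        raise ValueError(f"no run has a value for {metric_column!r}")
    best_index = ranked[metric_column].idxmax()
    return str(ranked.loc[best_index, "run_id"])


def load_best_model(experiment_id: str, metric_name: str = "valid_auc"):
    # Imported here so the selection logic above works without MLflow installed.
    import mlflow
    import mlflow.sklearn

    runs = mlflow.search_runs(experiment_ids=[experiment_id])
    run_id = pick_best_run_id(runs, f"metrics.{metric_name}")
    # Assumes the model was logged under the artifact path "model".
    return mlflow.sklearn.load_model(f"runs:/{run_id}/model")
```

Usage would then be along the lines of `lgb_best_model = load_best_model("0")`, where `"0"` stands in for the experiment ID shown in the MLflow UI. Keeping the ranking logic in a separate function makes it easy to check against a hand-built DataFrame before pointing it at a real tracking server.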

## Reference

### Model experimentation
https://www.mlflow.org/docs/latest/tracking.html#

### Hyperparameter Optimization
https://github.com/optuna/optuna