# **Lab: Model Interpretation**


## Exercise 2: LightGBM

We will train a LightGBM model on the same dataset as in the previous exercise.

The steps are:
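1.   Create new Git branch
2.   Load the dataset
3.   Train LightGBM model with default hyperparameters
4.   Hyperparameter tuning with `n_estimators`
5.   Hyperparameter tuning with `num_leaves`
6.   Hyperparameter tuning with `max_depth`
7.   Push changes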


### 1.   Create new Git branch


**[1.1]** Create a new git branch called `adv_mla_6_lightgbm`


In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git checkout -b adv_mla_6_lightgbm

**[1.2]** Launch Jupyter Lab from your virtual environment

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
poetry run jupyter lab

**[1.3]** Navigate to the folder `notebooks` and create a new Jupyter notebook called `2_xgboost_hyperopt.ipynb`

### 2. Load the dataset

**[2.1]** Launch magic commands to automatically reload modules

In [None]:
# Placeholder for student's code (Python code)

In [1]:
# Solution
%load_ext autoreload
%autoreload 2

**[2.2]** Import the pandas, numpy packages and dump from joblib

In [None]:
# Placeholder for student's code (Python code)

In [2]:
# Solution
import pandas as pd
import numpy as np
from joblib import dump

**[2.3]** Import the `load_sets()` function from your custom package

In [None]:
# Placeholder for student's code (Python code)

In [3]:
# Solution
from my_krml_studentid.data.sets import load_sets

**[2.4]** Load the saved sets from `data/processed`

In [None]:
# Placeholder for student's code (Python code)

In [4]:
# Solution:
X_train, y_train, X_val, y_val, X_test, y_test = load_sets(path='../data/processed/')

### 3. Train LightGBM model

**[3.1]** Import the lightgbm package as lgb


In [None]:
# Placeholder for student's code (Python code)

In [6]:
# Solution:
import lightgbm as lgb

**[3.2]** Instantiate the LGBMClassifier class into a variable called `clf` with random_state=8

In [None]:
# Placeholder for student's code (Python code)

In [7]:
# Solution
clf = lgb.LGBMClassifier(random_state=8)

**[3.3]** Import the function `fit_assess_classifier` from your custom package

In [None]:
# Placeholder for student's code (Python code)

In [8]:
# Solution
from my_krml_studentid.models.performance import fit_assess_classifier

**[3.4]** Fit the model and display its performance on the training and validation sets

In [None]:
# Placeholder for student's code (Python code)

In [12]:
# Solution
lightgbm = fit_assess_classifier(clf, X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.004039 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9127070275569873
F1 Training: 0.9122539839313584
Accuracy Validation: 0.9065316498245761
F1 Validation: 0.9059793760877006
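
The four `Start training from score` lines in the log give the starting raw score for each of the four target classes. With the default multiclass objective, LightGBM is understood to initialise each class's score to the natural log of that class's share of the training rows, so exponentiating the logged values should recover the class proportions. The sketch below checks this using only the numbers printed above; the variable names are illustrative.

```python
import math

# Initial scores copied from the LightGBM log above (one per class, in log order).
init_scores = [-1.205363, -1.063080, -1.130706, -3.435136]

# Assumption: each score is the natural log of the class's share of the training rows.
class_shares = [math.exp(score) for score in init_scores]

for class_index, share in enumerate(class_shares):
    print(f"class {class_index}: {share:.1%} of training rows")

# If the assumption holds, the shares should sum to (approximately) 1.
print(f"sum of shares: {sum(class_shares):.4f}")
```

The shares sum to roughly 1, and the last class accounts for only about 3% of the rows. The target is therefore imbalanced, which is worth keeping in mind when reading the accuracy figures.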


**[3.5]** Import `dump` from `joblib` and save the fitted model into the folder `models` as a file called `lightgbm_default.joblib`

In [None]:
# Placeholder for student's code (Python code)

In [13]:
# Solution:
from joblib import dump

dump(lightgbm,  '../models/lightgbm_default.joblib')

['../models/lightgbm_default.joblib']

### 4. Hyperparameter tuning with `n_estimators` (`num_boost_round`)

**[4.1]** Assess the performance of LightGBM with `n_estimators` = 200

In [None]:
# Placeholder for student's code (Python code)

In [14]:
# Solution:
lightgbm1 = fit_assess_classifier(lgb.LGBMClassifier(n_estimators=200), X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003859 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9218249938723345
F1 Training: 0.9215295320126403
Accuracy Validation: 0.9063005525326162
F1 Validation: 0.9058181108995103


**[4.2]** Assess the performance of LightGBM with `n_estimators` = 50

In [None]:
# Placeholder for student's code (Python code)

In [15]:
# Solution:
lightgbm2 = fit_assess_classifier(lgb.LGBMClassifier(n_estimators=50), X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003971 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9077418677124549
F1 Training: 0.9072158210448226
Accuracy Validation: 0.9051870837622639
F1 Validation: 0.9045544548424002


**[4.3]** Assess the performance of LightGBM with `n_estimators` = 75

In [None]:
# Placeholder for student's code (Python code)

In [16]:
# Solution:
lightgbm3 = fit_assess_classifier(lgb.LGBMClassifier(n_estimators=75), X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003708 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9103750131307119
F1 Training: 0.9098951454717765
Accuracy Validation: 0.9061114729301035
F1 Validation: 0.9055549228677523
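
To compare the four `n_estimators` settings side by side, the reported accuracies can be collected into a small table. The sketch below uses only the figures printed above (rounded to four decimals), with the default run from step 3.4 standing in for `n_estimators` = 100, which is LightGBM's default. The gap between training and validation accuracy is used here as a rough indicator of overfitting.

```python
# (training accuracy, validation accuracy) per n_estimators, rounded from the outputs above.
results = {
    50: (0.9077, 0.9052),
    75: (0.9104, 0.9061),
    100: (0.9127, 0.9065),  # default model from step 3.4
    200: (0.9218, 0.9063),
}

print(f"{'n_estimators':>12} {'train':>8} {'val':>8} {'gap':>8}")
for n_estimators, (train_acc, val_acc) in sorted(results.items()):
    gap = train_acc - val_acc
    print(f"{n_estimators:>12} {train_acc:>8.4f} {val_acc:>8.4f} {gap:>8.4f}")

# Train/validation gap per setting, in increasing order of n_estimators.
gaps = [train_acc - val_acc for _, (train_acc, val_acc) in sorted(results.items())]
```

Training accuracy keeps rising with more trees while validation accuracy barely moves (about 0.905 to 0.907), so the extra boosting rounds mostly widen the gap.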


### 5. Hyperparameter tuning with `num_leaves`

**[5.1]** Assess the performance of LightGBM with `n_estimators` = 50 and `num_leaves` = 50

In [None]:
# Placeholder for student's code (Python code)

In [17]:
# Solution:
lightgbm4 = fit_assess_classifier(lgb.LGBMClassifier(n_estimators=50, num_leaves=50), X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003692 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9113904548478589
F1 Training: 0.910997516400859
Accuracy Validation: 0.906594676358747
F1 Validation: 0.9060976538426337


**[5.2]** Assess the performance of LightGBM with `n_estimators` = 50 and `num_leaves` = 25

In [None]:
# Placeholder for student's code (Python code)

In [18]:
# Solution:
lightgbm5 = fit_assess_classifier(lgb.LGBMClassifier(n_estimators=50, num_leaves=25), X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003799 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9064673132812774
F1 Training: 0.9057908429385462
Accuracy Validation: 0.9043677388180424
F1 Validation: 0.9035340777009959


**[5.3]** Assess the performance of LightGBM with `n_estimators` = 50 and `num_leaves` = 15

In [None]:
# Placeholder for student's code (Python code)

In [19]:
# Solution:
lightgbm6 = fit_assess_classifier(lgb.LGBMClassifier(n_estimators=50, num_leaves=15), X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003777 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9026646591267201
F1 Training: 0.901728147653567
Accuracy Validation: 0.9006911909914074
F1 Validation: 0.8996304615720507


### 6. Hyperparameter tuning with `max_depth`

**[6.1]** Assess the performance of LightGBM with `n_estimators` = 50, `num_leaves` = 15 and `max_depth` = 10

In [None]:
# Placeholder for student's code (Python code)

In [20]:
# Solution:
lightgbm7 = fit_assess_classifier(lgb.LGBMClassifier(n_estimators=50, num_leaves=15, max_depth=10), X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003709 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9022234672082355
F1 Training: 0.9012297315829294
Accuracy Validation: 0.900544129078342
F1 Validation: 0.8994275530192228


**[6.2]** Assess the performance of LightGBM with `n_estimators` = 50, `num_leaves` = 15 and `max_depth` = 20

In [None]:
# Placeholder for student's code (Python code)

In [21]:
# Solution:
lightgbm8 = fit_assess_classifier(lgb.LGBMClassifier(n_estimators=50, num_leaves=15, max_depth=20), X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.004652 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9026646591267201
F1 Training: 0.901728147653567
Accuracy Validation: 0.9006911909914074
F1 Validation: 0.8996304615720507
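
The figures for `max_depth` = 20 match those from step 5.3 (no depth limit) exactly. One way to see why: a binary tree with `num_leaves` leaves is produced by `num_leaves - 1` splits, so it can never be deeper than `num_leaves - 1` levels, while a tree of depth `d` can hold at most `2**d` leaves. The sketch below applies these two bounds to the settings used in this section; the helper names are illustrative and not part of LightGBM.

```python
def max_possible_depth(num_leaves: int) -> int:
    """Deepest tree reachable with this many leaves (a single chain of splits)."""
    return num_leaves - 1


def max_possible_leaves(max_depth: int) -> int:
    """Largest number of leaves a binary tree of this depth can hold."""
    return 2 ** max_depth


num_leaves = 15
for max_depth in (5, 10, 20):
    # A limit "can bind" only if it is tighter than what the other limit already implies.
    depth_can_bind = max_depth < max_possible_depth(num_leaves)
    leaves_can_bind = num_leaves < max_possible_leaves(max_depth)
    print(
        f"max_depth={max_depth:>2}: depth limit can bind={depth_can_bind}, "
        f"leaf limit can bind={leaves_can_bind}"
    )
```

With 15 leaves a tree can be at most 14 levels deep, so `max_depth` = 20 is never reached and the model is unchanged, whereas limits of 5 or 10 can still cut off the deeper leaf-wise branches.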


**[6.3]** Assess the performance of LightGBM with `n_estimators` = 50, `num_leaves` = 15 and `max_depth` = 5

In [None]:
# Placeholder for student's code (Python code)

In [22]:
# Solution:
lightgbm9 = fit_assess_classifier(lgb.LGBMClassifier(n_estimators=50, num_leaves=15, max_depth=5), X_train, y_train, X_val, y_val)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.003589 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1426
[LightGBM] [Info] Number of data points in the train set: 142795, number of used features: 24
[LightGBM] [Info] Start training from score -1.205363
[LightGBM] [Info] Start training from score -1.063080
[LightGBM] [Info] Start training from score -1.130706
[LightGBM] [Info] Start training from score -3.435136
Accuracy Training: 0.9001995868202668
F1 Training: 0.8990694008335641
Accuracy Validation: 0.8985482888295973
F1 Validation: 0.8972936483310199


In [23]:
# Solution:
from joblib import dump

dump(lightgbm9,  '../models/lightgbm_best.joblib')

['../models/lightgbm_best.joblib']
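
Which model counts as "best" depends on the selection criterion. The sketch below gathers the accuracies printed in this exercise (rounded to five decimals) and ranks the models two ways: by highest validation accuracy and by smallest gap between training and validation accuracy. Both criteria are illustrative assumptions rather than a rule prescribed by this lab.

```python
# (training accuracy, validation accuracy), rounded from the outputs in this exercise.
scores = {
    "lightgbm":  (0.91271, 0.90653),  # default hyperparameters
    "lightgbm1": (0.92182, 0.90630),
    "lightgbm2": (0.90774, 0.90519),
    "lightgbm3": (0.91038, 0.90611),
    "lightgbm4": (0.91139, 0.90659),
    "lightgbm5": (0.90647, 0.90437),
    "lightgbm6": (0.90266, 0.90069),
    "lightgbm7": (0.90222, 0.90054),
    "lightgbm8": (0.90266, 0.90069),
    "lightgbm9": (0.90020, 0.89855),
}

# Criterion 1: highest accuracy on the validation set.
best_by_validation = max(scores, key=lambda name: scores[name][1])

# Criterion 2: smallest difference between training and validation accuracy.
smallest_gap = min(scores, key=lambda name: scores[name][0] - scores[name][1])

print("Highest validation accuracy:", best_by_validation)
print("Smallest train/validation gap:", smallest_gap)
```

On these figures `lightgbm4` has the highest validation accuracy, while `lightgbm9`, the model saved above, has the smallest gap between training and validation accuracy.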

### 7.   Push changes

**[7.1]** Add your changes to the git staging area

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git add .

**[7.2]** Create the snapshot of your repository and add a description

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git commit -m "lightgbm"

**[7.3]** Push your snapshot to GitHub


In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git push

**[7.4]** Go to GitHub and merge the branch after reviewing the code and fixing any conflicts




**[7.5]** Check out the master branch

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution:
git checkout master

**[7.6]** Pull the latest updates

In [None]:
# Placeholder for student's code (command line)

In [None]:
# Solution
git pull