# PiML Toolbox: Deal with external models

This example notebook demonstrates how to use PiML with external models. There are two common scenarios:

1. The external model is not yet trained: you can train and test it entirely through PiML's APIs. Models trained this way share the same dataset.
2. The external model is already trained (for example, with scikit-learn's `fit()`): you can register it as a PiML pipeline and then run the tests. Each pipeline manages its model's dataset separately.
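
Both scenarios in this notebook pass scikit-learn-compatible estimators (LightGBM and XGBoost wrappers). Assuming that the `fit`/`predict` convention is what matters for an external model, the interface can be sketched in plain Python. The `MeanRegressor` class and the `looks_like_estimator` helper are hypothetical illustrations and are not part of PiML.

```python
class MeanRegressor:
    """Toy model that always predicts the mean of the training targets."""

    def fit(self, X, y):
        # Store the learned state with a trailing underscore, as scikit-learn does.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # One prediction per input row.
        return [self.mean_ for _ in X]


def looks_like_estimator(model):
    """Return True if `model` exposes callable `fit` and `predict` methods."""
    return all(callable(getattr(model, name, None)) for name in ("fit", "predict"))


model = MeanRegressor().fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
print(looks_like_estimator(model))     # True
print(looks_like_estimator(object()))  # False
print(model.predict([[5.0]]))          # [2.0]
```

The toy class only illustrates the two methods; a real external model would normally be a full scikit-learn-compatible estimator such as the regressors used in the scenarios of this notebook.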

# Install PiML package on Google Colab

1. Run `!pip install piml` to install the latest version of PiML.
2. In Colab, you'll need to restart the runtime before the newly installed PiML version can be used.

In [None]:
# !pip install piml
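
After restarting the runtime, it can be useful to confirm which PiML version the kernel actually sees. The following is a minimal sketch using only the standard library (Python 3.8 or later); `installed_version` is a hypothetical helper name, not a PiML function.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


piml_version = installed_version("piml")
if piml_version is None:
    print("piml is not installed in this runtime")
else:
    print(f"piml {piml_version} is available")
```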

# Scenario 1: Load dataset in PiML and train an external model

## 1.1 Prepare dataset with PiML

In [1]:
from piml import Experiment
exp = Experiment()

In [2]:
# Select 'BikeSharing' from the dropdown that appears
exp.data_loader()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Data', 'CoCircles', 'Friedman', 'BikeShar…

In [3]:
exp.data_summary()

VBox(children=(HTML(value='Data Shape:(17379, 13)'), Tab(children=(Output(), Output()), _dom_classes=('data-su…

In [4]:
exp.data_prepare()

VBox(children=(HBox(children=(VBox(children=(HTML(value='<p>Target Variable:</p>'), HTML(value='<p>Split Metho…

In [5]:
exp.feature_select()

VBox(children=(ToggleButtons(layout=Layout(width='100%'), options=('Correlation', 'Distance Correlation', 'Fea…

In [6]:
exp.eda()

VBox(children=(HBox(children=(VBox(children=(HTML(value='<h4>Univariate:</h4>'), HBox(children=(Dropdown(layou…

## 1.2 Train external models with PiML

In [7]:
# Fit and register an arbitrary model with the existing dataset 
# generated from `exp.data_loader()` and `exp.data_prepare()`
from lightgbm import LGBMRegressor
lgbm_1 = LGBMRegressor(max_depth=1)
exp.model_train(lgbm_1, name='LGBM_1')

In [8]:
# Fit and register a second, slightly deeper model (max_depth=2)
# on the same dataset so the two models can be compared later
from lightgbm import LGBMRegressor
lgbm_2 = LGBMRegressor(max_depth=2)
exp.model_train(lgbm_2, name='LGBM_2')

## 1.3 Test models with PiML

In [9]:
exp.model_explain()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Model', 'LGBM_1', 'LGBM_2'), style=Descri…

In [10]:
exp.model_diagnose()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Model', 'LGBM_1', 'LGBM_2'), style=Descri…

In [11]:
exp.model_compare()

VBox(children=(HBox(children=(Dropdown(layout=Layout(width='30%'), options=('Select Model', 'LGBM_1', 'LGBM_2'…

# Scenario 2: Register fitted external models with their datasets

## 2.1 Fit models without PiML

In [12]:
from xgboost import XGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

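# Load the California housing data and hold out 20% of it for testing
data = fetch_california_housing()
train_x, test_x, train_y, test_y = train_test_split(data.data, data.target, test_size=0.2)
# Keep the feature and target names so PiML can label its outputs
feature_names = data.feature_names
target_name = data.target_names[0]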
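
The split above holds out 20% of the rows for testing. Because no `random_state` is passed to `train_test_split`, the split (and therefore the fitted models) will differ from run to run; passing a fixed `random_state` makes it repeatable. The idea behind a seeded hold-out split can be sketched in plain Python. This is an illustration only, not scikit-learn's implementation, and `holdout_split` is a hypothetical helper.

```python
import random


def holdout_split(n_samples, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) for a shuffled hold-out split.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle, and so the split, repeatable.
    random.Random(seed).shuffle(indices)
    n_test = round(test_size * n_samples)
    return indices[n_test:], indices[:n_test]


# The California housing dataset has 20640 rows.
train_idx, test_idx = holdout_split(n_samples=20640, test_size=0.2, seed=0)
print(len(train_idx), len(test_idx))  # prints: 16512 4128
```

Calling `holdout_split` twice with the same seed returns the same indices, which is the property a fixed `random_state` gives the real split.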

In [13]:
xgb_7 = XGBRegressor(max_depth=7, n_estimators=100)
xgb_7.fit(train_x, train_y)

XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None,
             colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
             early_stopping_rounds=None, enable_categorical=False,
             eval_metric=None, feature_types=None, gamma=0, gpu_id=-1,
             grow_policy='depthwise', importance_type=None,
             interaction_constraints='', learning_rate=0.300000012, max_bin=256,
             max_cat_threshold=64, max_cat_to_onehot=4, max_delta_step=0,
             max_depth=7, max_leaves=0, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=0,
             num_parallel_tree=1, predictor='auto', random_state=0, ...)

In [14]:
xgb_2 = XGBRegressor(max_depth=2, n_estimators=100)
xgb_2.fit(train_x, train_y)

XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None,
             colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,
             early_stopping_rounds=None, enable_categorical=False,
             eval_metric=None, feature_types=None, gamma=0, gpu_id=-1,
             grow_policy='depthwise', importance_type=None,
             interaction_constraints='', learning_rate=0.300000012, max_bin=256,
             max_cat_threshold=64, max_cat_to_onehot=4, max_delta_step=0,
             max_depth=2, max_leaves=0, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=0,
             num_parallel_tree=1, predictor='auto', random_state=0, ...)

## 2.2 Register fitted models to PiML

In [15]:
from piml import Experiment
exp_2 = Experiment()

In [16]:
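# Wrap the fitted model together with its own train/test data,
# then register the pipeline under a display name
pipeline_1 = exp_2.make_pipeline(model=xgb_7, train_x=train_x, train_y=train_y.ravel(),
                                 test_x=test_x, test_y=test_y.ravel(),
                                 feature_names=feature_names, target_name=target_name)
exp_2.register(pipeline_1, "XGB-External-7")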

In [17]:
pipeline_2 = exp_2.make_pipeline(model=xgb_2, train_x=train_x, train_y=train_y.ravel(),
                                 test_x=test_x, test_y=test_y.ravel(),
                                 feature_names=feature_names, target_name=target_name)
exp_2.register(pipeline_2, "XGB-External-2")
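
Each pipeline carries its own training and test data, so mismatched arrays are an easy mistake to make when registering several models. Before building a pipeline, a quick consistency check along the following lines may help. `check_pipeline_inputs` is a hypothetical helper, not a PiML function, and the toy lists stand in for the real arrays.

```python
def check_pipeline_inputs(train_x, train_y, test_x, test_y, feature_names):
    """Raise ValueError if the inputs for a pipeline do not line up.

    Hypothetical helper; accepts lists of rows or NumPy arrays.
    """
    if len(train_x) != len(train_y):
        raise ValueError("train_x and train_y differ in length")
    if len(test_x) != len(test_y):
        raise ValueError("test_x and test_y differ in length")
    n_features = len(feature_names)
    for name, rows in (("train_x", train_x), ("test_x", test_x)):
        # Every row must have one value per feature name.
        if any(len(row) != n_features for row in rows):
            raise ValueError(f"{name} does not have {n_features} columns")


# Toy data standing in for the real train/test arrays.
toy_train_x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_train_y = [0.5, 1.5, 2.5]
toy_test_x = [[7.0, 8.0]]
toy_test_y = [3.5]

check_pipeline_inputs(toy_train_x, toy_train_y, toy_test_x, toy_test_y, ["f1", "f2"])
print("inputs are consistent")
```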

## 2.3 Test models with PiML

In [18]:
exp_2.model_explain()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Model', 'XGB-External-7', 'XGB-External-2…

In [19]:
exp_2.model_diagnose()

VBox(children=(Dropdown(layout=Layout(width='20%'), options=('Select Model', 'XGB-External-7', 'XGB-External-2…

In [20]:
exp_2.model_compare()

VBox(children=(HBox(children=(Dropdown(layout=Layout(width='30%'), options=('Select Model', 'XGB-External-7', …
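
The comparison panel is interactive, so its numbers do not appear in this export. As a rough plain-Python sketch of the kind of test-set metrics a regression comparison typically rests on (MSE, MAE and R²; which metrics PiML reports is an assumption here), two sets of predictions can be scored side by side. The `regression_metrics` helper and the toy values are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R2 for paired true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MSE": ss_res / n,
        "MAE": sum(abs(e) for e in errors) / n,
        # R2 is undefined when the targets are constant.
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Toy test targets and predictions from two hypothetical models.
y_test = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "model_a": [2.5, 5.5, 7.0, 8.0],
    "model_b": [4.0, 4.0, 8.0, 10.0],
}
for name, y_pred in predictions.items():
    print(name, regression_metrics(y_test, y_pred))
```

On this toy data `model_a` has the lower MSE and MAE and the higher R², which is the sort of ranking a model comparison is meant to surface.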