#### In this tutorial, we implement distillation of a complex model's knowledge into simpler models. The complex model, called the teacher, is a TabularAutoML object. The simpler models, called students, are BoostCB and BoostLGBM objects.
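
In its simplest form, distillation works like this. The teacher labels the training rows with soft targets (probabilities or logits). The student is then fitted to those outputs instead of the original 0/1 labels. The NumPy sketch below illustrates the idea on toy data. The teacher function, the student's feature basis, and fitting the student by least squares on the teacher's logits are all illustrative assumptions. They are not how LightAutoML's `Distiller` trains BoostCB or BoostLGBM.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: two features and a binary target defined by a non-linear rule.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 1.0).astype(int)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def teacher_logit(X):
    """Hypothetical stand-in for a complex teacher: returns one logit per row."""
    return 4.0 * (X[:, 0] ** 2 + X[:, 1] - 1.0)


def student_features(X):
    """Feature basis of a simple linear student: bias, raw features, one square term."""
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0] ** 2])


# 1. The teacher labels the training data with soft targets (logits here).
soft_targets = teacher_logit(X)

# 2. The student is fitted to the teacher's outputs, not to the original 0/1 labels.
coef, *_ = np.linalg.lstsq(student_features(X), soft_targets, rcond=None)

# 3. Compare the student's probabilities with the teacher's and with the true labels.
teacher_proba = sigmoid(soft_targets)
student_proba = sigmoid(student_features(X) @ coef)
student_accuracy = np.mean((student_proba > 0.5) == y)
print(f"max |student - teacher| = {np.abs(student_proba - teacher_proba).max():.2e}")
print(f"student accuracy on true labels = {student_accuracy:.3f}")
```

The toy student's basis happens to contain the teacher's logit, so the match here is near-exact. A real student with less capacity than its teacher only approximates it.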

In [1]:
import numpy as np
from pandas import read_csv
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import train_test_split

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.dataset.roles import DatetimeRole
from lightautoml.tasks import Task
from lightautoml.addons.distillation import Distiller
from lightautoml.utils.profiler import Profiler

### 1. Setup

In [2]:
RANDOM_STATE = 42
TEST_SIZE = 2000
TARGET_NAME = 'TARGET'

np.random.seed(RANDOM_STATE)

By default, profiling decorators are turned off to save time and reduce memory usage. To see a profiling report after using LAMA, turn the decorators on with the command below.

In [3]:
p = Profiler()
p.change_deco_settings({'enabled': True})
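
Switching profiling off saves time and memory because a disabled decorator can skip all timing work. The sketch below shows this with a decorator that checks a global switch before timing anything. It is a hypothetical stand-in written for illustration. `PROFILING`, `TIMINGS`, and `profiled` are invented names, and this is not how LightAutoML's `Profiler` is implemented.

```python
import functools
import time

# Hypothetical global switch and storage; invented names, not LightAutoML's API.
PROFILING = {"enabled": False}
TIMINGS = {}


def profiled(func):
    """Time each call of `func`, but only while PROFILING["enabled"] is True."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not PROFILING["enabled"]:
            return func(*args, **kwargs)  # disabled: no timer calls, nothing stored
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TIMINGS.setdefault(func.__name__, []).append(time.perf_counter() - start)

    return wrapper


@profiled
def slow_square(x):
    time.sleep(0.01)
    return x * x


slow_square(3)               # not recorded: profiling is off
PROFILING["enabled"] = True  # plays the role of the enabling command above
slow_square(4)               # recorded
print(TIMINGS)
```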

### 2. Data loading and preparation

In [4]:
data = read_csv('example_data/test_data_files/sampled_app_train.csv')

data['BIRTH_DATE'] = (np.datetime64('2018-01-01') + data['DAYS_BIRTH'].astype(np.dtype('timedelta64[D]'))).astype(str)
data['EMP_DATE'] = (np.datetime64('2018-01-01') + np.clip(data['DAYS_EMPLOYED'], None, 0).astype(np.dtype('timedelta64[D]'))
                    ).astype(str)

data['report_dt'] = np.datetime64('2018-01-01')
data['constant'] = 1
data['allnan'] = np.nan
data.drop(['DAYS_BIRTH', 'DAYS_EMPLOYED'], axis=1, inplace=True)

train, test = train_test_split(data, test_size=TEST_SIZE, random_state=RANDOM_STATE)
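
The cell above turns day offsets into calendar dates by adding them to a fixed base date, 2018-01-01. The call `np.clip(..., None, 0)` caps positive `DAYS_EMPLOYED` values at 0, so those rows map to the base date itself. The NumPy sketch below repeats the same arithmetic on toy offsets. The values are made up and are not taken from the dataset.

```python
import numpy as np

base = np.datetime64('2018-01-01')
# Toy day offsets (made up): negative values lie before the base date.
days_birth = np.array([-365, -10000])
days_employed = np.array([-31, 1000])

birth_date = (base + days_birth.astype('timedelta64[D]')).astype(str)
# Positive offsets are capped at 0, so they map to the base date itself.
emp_date = (base + np.clip(days_employed, None, 0).astype('timedelta64[D]')).astype(str)
print(birth_date, emp_date)
```

If your pandas version rejects the `astype(np.dtype('timedelta64[D]'))` cast on a Series, `pd.to_timedelta(series, unit='D')` is a possible alternative.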

### 3. AutoML and distiller creation

In [5]:
roles = {'target': TARGET_NAME,
         DatetimeRole(base_date=True, seasonality=(), base_feats=False): 'report_dt'}

task = Task('binary')

automl = TabularAutoML(task=task, timeout=30, general_params={'verbose': 0})
distiller = Distiller(automl)

### 4. Distiller fitting and evaluation

In [6]:
distiller.fit(train, roles=roles)
test_pred = distiller.predict(test)
print('Teacher TEST ROC AUC: {}'.format(roc_auc_score(test[roles['target']].values, test_pred.data[:, 0])))

Copying TaskTimer may affect the parent PipelineTimer, so copy will create new unlimited TaskTimer


Layer 1 ...
Train process start. Time left 25.034852027893066 secs
Time limit exceeded after calculating fold 1
Time left 22.61545205116272
Time limit exceeded after calculating fold 0
Time limit exceeded after calculating fold 3
Time limit exceeded after calculating fold 3
Time left 2.8722121715545654
Time limit exceeded in one of the tasks. AutoML will blend level 1 models.                                         
Try to set higher time limits or use Profiler to find bottleneck and optimize Pipelines settings
Teacher TEST ROC AUC: 0.7453926944883353
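
ROC AUC, the metric printed above, is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The pure-Python sketch below computes it exactly that way, as an illustration. `roc_auc` is a hypothetical helper. Its pairwise loop is quadratic in the number of examples, so it only suits small inputs; `sklearn.metrics.roc_auc_score` is the practical choice.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```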


### 5. Training the students on true labels and evaluating them

In [7]:
distiller.distill(train, labels=train[TARGET_NAME])

metrics = distiller.eval_metrics(test, metrics=[roc_auc_score, accuracy_score])
metrics



Layer 1 ...
Train process start. Time left 9999999997.97606 secs
Time left 9999999992.295929


Layer 1 ...
Train process start. Time left 9999999997.823078 secs
Time left 9999999993.712234


| Model | roc_auc_score | accuracy_score |
|---|---|---|
| Lvl_0_Pipe_0_Mod_0_CatBoost | 0.73866 | 0.9275 |
| Lvl_0_Pipe_0_Mod_0_LightGBM | 0.726759 | 0.9275 |
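
Both students report the same accuracy, 0.9275, which is 1855 correct predictions out of the 2000 test rows. With an imbalanced binary target, it is worth comparing accuracy against a baseline that always predicts the most frequent class. The helper below computes that baseline; `majority_baseline_accuracy` is a hypothetical name. Applied to `test[TARGET_NAME]`, it would show whether 0.9275 beats the majority-class share, which this tutorial does not report.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


print(majority_baseline_accuracy([0, 0, 0, 1]))  # 0.75
```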


### 6. Teacher knowledge distillation

In [8]:
automl = TabularAutoML(task=task, timeout=30, verbose=0)
distiller = Distiller(automl)
distiller.fit(train, roles=roles)
best_model = distiller.distill(train)
print('Best model after distillation: {}'.format(best_model.levels[0][0].ml_algos[0].name))

Time limit exceeded after calculating fold 2
Time limit exceeded after calculating fold 1
Time limit exceeded after calculating fold 3
Time limit exceeded after calculating fold 1
Best model after distillation: Lvl_0_Pipe_0_Mod_0_LightGBM
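
The output above reports a single best student. One plausible selection rule, assumed here, scores every student on held-out data with a metric and keeps the highest scorer. The sketch below implements that rule with hypothetical names and uses a negative Brier score as the metric. It does not claim to reproduce how `Distiller.distill` actually chooses its best model.

```python
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[int], Sequence[float]], float]


def neg_brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Negative mean squared error between 0/1 labels and probabilities (higher is better)."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def select_best_student(
    student_preds: Dict[str, Sequence[float]],
    y_val: Sequence[int],
    metric: Metric = neg_brier_score,
) -> str:
    """Hypothetical helper: return the name of the highest-scoring student on validation data."""
    scores = {name: metric(y_val, preds) for name, preds in student_preds.items()}
    return max(scores, key=scores.get)


# Toy validation labels and predictions from two students (made-up numbers).
y_val = [0, 1, 1, 0]
preds = {
    "student_a": [0.2, 0.7, 0.6, 0.4],
    "student_b": [0.1, 0.9, 0.8, 0.2],
}
print(select_best_student(preds, y_val))  # student_b
```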


### 7. Evaluation of the students trained on teacher-derived labels

In [9]:
metrics = distiller.eval_metrics(test, metrics=[roc_auc_score, accuracy_score])
metrics

| Model | roc_auc_score | accuracy_score |
|---|---|---|
| Lvl_0_Pipe_0_Mod_0_CatBoost | 0.742482 | 0.9275 |
| Lvl_0_Pipe_0_Mod_0_LightGBM | 0.742203 | 0.9275 |
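
Comparing the two result tables shows the effect of training the students on teacher-derived labels instead of the true labels. The snippet below only does arithmetic on the reported test ROC AUC values. These numbers come from one run on one split with a 30-second budget, so the differences are indicative rather than conclusive.

```python
# Test ROC AUC values copied from the two result tables in this tutorial.
true_label_auc = {"CatBoost": 0.73866, "LightGBM": 0.726759}
teacher_label_auc = {"CatBoost": 0.742482, "LightGBM": 0.742203}

# Gain from replacing the true labels with labels derived from the teacher.
gains = {name: teacher_label_auc[name] - true_label_auc[name] for name in true_label_auc}
for name, gain in gains.items():
    print(f"{name}: {true_label_auc[name]:.4f} -> {teacher_label_auc[name]:.4f} ({gain:+.4f})")
```

On these numbers, CatBoost gains about 0.0038 ROC AUC and LightGBM about 0.0154.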


### 8. Profiling report creation

In [10]:
p.profile('profiling_report.html')