#### Please upvote if you find the notebook interesting/useful :)

# Install the [AutoWoE](https://github.com/sberbank-ai-lab/AutoMLWhitebox) library

This library is part of the [LightAutoML](https://github.com/sberbank-ai-lab/LightAutoML) framework, where it powers the Whitebox preset. Here we show how to use it on its own.

In [None]:
!pip install -U autowoe

# Imports 

In [None]:
%matplotlib inline

import pandas as pd
import numpy as np

from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt

from autowoe import AutoWoE, ReportDeco

# Data loading

In [None]:
INPUT_PATH = '../input/tabular-playground-series-sep-2021/'
train_data = pd.read_csv(INPUT_PATH + 'train.csv')
train_data

In [None]:
test_data = pd.read_csv(INPUT_PATH + 'test.csv')
test_data

In [None]:
submission = pd.read_csv(INPUT_PATH + 'sample_solution.csv')
submission

In [None]:
print('TRAIN TARGET MEAN = {:.3f}'.format(train_data['claim'].mean()))

# Split data into train and holdout sets

In [None]:
# Hold out 20% of the rows for validation, stratified on the target
tr_data, val_data = train_test_split(
    train_data,
    test_size=0.2,
    stratify=train_data['claim'],
    random_state=13,
)
print(tr_data.shape, val_data.shape)

# Set up the interpretable AutoWoE model

Here we set up the model and wrap it in the `ReportDeco` decorator, which lets us build an automatic report for the trained model (see the Bonus 1 section).

In [None]:
auto_woe = AutoWoE(
    monotonic=False,
    vif_th=20.,
    imp_th=0,
    th_const=32,
    force_single_split=True,
    min_bin_size=0.005,
    oof_woe=True,
    n_folds=5,
    n_jobs=4,
    regularized_refit=True,
    verbose=2,
)

auto_woe = ReportDeco(auto_woe)
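
The wrapping idea can be sketched in plain Python. This is a minimal illustration of the general pattern only, not `ReportDeco`'s actual implementation: `RecordingWrapper`, `MeanModel`, and the `stats` dictionary are hypothetical names invented for this example. The one detail taken from this notebook is that the wrapped estimator stays reachable through a `.model` attribute.

```python
class MeanModel:
    """Hypothetical stand-in estimator: always predicts the training target mean."""

    def fit(self, rows, target):
        self.mean_ = sum(target) / len(target)
        return self

    def predict_proba(self, rows):
        return [self.mean_ for _ in rows]


class RecordingWrapper:
    """Hypothetical wrapper: delegates to the model and records statistics for a report."""

    def __init__(self, model):
        self.model = model  # the wrapped estimator stays accessible
        self.stats = {}     # collected while fitting and scoring

    def fit(self, rows, target):
        self.model.fit(rows, target)
        self.stats["n_train"] = len(rows)
        self.stats["train_target_mean"] = sum(target) / len(target)
        return self

    def predict_proba(self, rows):
        preds = self.model.predict_proba(rows)
        self.stats["n_scored"] = len(preds)
        return preds


wrapped = RecordingWrapper(MeanModel()).fit([[0], [1], [2], [3]], [0, 1, 1, 0])
print(wrapped.predict_proba([[5], [6]]))
print(wrapped.stats)
```

The wrapper keeps the same `fit` / `predict_proba` interface as the model it wraps, so the rest of a notebook does not need to change when the wrapper is added.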

# Model training

In [None]:
%%time
auto_woe.fit(tr_data.sample(500000, random_state = 13), 
             target_name="claim")

In [None]:
val_pred = auto_woe.predict_proba(val_data)
print("AUC_SCORE = {:.5f}".format(roc_auc_score(val_data['claim'], val_pred)))

# Bonus 1 - Automatic report generation for trained model

In [None]:
report_params = {"output_path": "./AUTOWOE_REPORT_Validation",
                 "report_name": "AutoWoE automatic report for TPS September 2021 dataset model",
                 "report_version_id": 1,
                 "city": "Moscow",
                 "model_aim": "Here we want to build a model to solve TPS September 2021 competition",
                 "model_name": "TPS_September_AutoWoE_model",
                 "zakazchik": "Kaggle", # transliterated Russian key: the customer, i.e. the group that asked for this model
                 "high_level_department": "Google",
                 "ds_name": "Alexander Ryzhkov",
                 "target_descr": "Target claim equal 1",
                 "non_target_descr": "Target claim equal 0"}

auto_woe.generate_report(report_params)

#### The generated report is [here](./AUTOWOE_REPORT_Validation/autowoe_report.html). P.S. It is interactive: to expand a subtree, click the black triangle to the left of the text.

# Bonus 2 - Automatic SQL inference query generation for trained model

Because our model is interpretable, we can generate an SQL query for it automatically. With this query you can compute model predictions directly inside a database, without Python at all.

All you need to do is set `table_name` to the table that holds the initial data.

In [None]:
print(auto_woe.get_sql_inference_query(table_name = 'TABLE_NAME'))
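
To see why such a model translates to SQL, here is a minimal sketch under stated assumptions. It treats the model as a logistic regression over binned features, where each bin carries a fixed weight-of-evidence (WoE) value. The scorecard numbers, the feature names `f1` and `f2`, and the helper functions are all hypothetical; the query that AutoWoE actually generates will differ in structure and in how it handles missing values.

```python
import math

# Hypothetical scorecard with invented numbers; NOT taken from the trained model.
# Each feature has a regression coefficient and ordered bins of
# (upper_bound, woe_value); upper_bound None marks the final catch-all bin.
SCORECARD = {
    "f1": {"coef": -0.8, "bins": [(0.5, -0.30), (2.0, 0.10), (None, 0.45)]},
    "f2": {"coef": -1.2, "bins": [(10.0, 0.25), (None, -0.40)]},
}
INTERCEPT = -0.05


def woe_case_sql(feature, bins):
    """Render one feature's binning as a SQL CASE expression."""
    lines = ["CASE"]
    for upper, woe in bins:
        if upper is None:
            # NULL comparisons also fall through to ELSE in this sketch
            lines.append(f"  ELSE {woe}")
        else:
            lines.append(f"  WHEN {feature} <= {upper} THEN {woe}")
    lines.append("END")
    return "\n".join(lines)


def scorecard_sql(table_name):
    """Build a query that applies the sigmoid to intercept + sum(coef * WoE)."""
    terms = [
        f"{spec['coef']} * ({woe_case_sql(name, spec['bins'])})"
        for name, spec in SCORECARD.items()
    ]
    linear = " + ".join([str(INTERCEPT)] + terms)
    return f"SELECT 1 / (1 + EXP(-({linear}))) AS PROB\nFROM {table_name}"


def scorecard_predict(row):
    """The same computation in Python (no missing values), for cross-checking."""
    z = INTERCEPT
    for name, spec in SCORECARD.items():
        for upper, woe in spec["bins"]:
            if upper is None or row[name] <= upper:
                z += spec["coef"] * woe
                break
    return 1 / (1 + math.exp(-z))


print(scorecard_sql("TABLE_NAME"))
print(scorecard_predict({"f1": 0.0, "f2": 5.0}))
```

Each feature becomes one `CASE` expression, so the whole prediction is a single arithmetic expression that a database can evaluate row by row.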

# Predict for the test dataset

In [None]:
preds = auto_woe.model.predict_proba(test_data)

In [None]:
preds

# Create submissions

In [None]:
submission['claim'] = preds
submission.to_csv('AutoWoE_submission.csv', index = False)

In [None]:
submission['claim'].describe()