# Simulated data
This notebook uses simulated data to evaluate how correlation between variables affects a LightGBM classifier, measured by test ROC AUC and by a SHAP-based risk figure.

## Setup

In [1]:
import pandas as pd
import numpy as np
import lightgbm
from sklearn import datasets, model_selection, metrics
import shap



In [2]:
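def run(setup):
    """Fit LightGBM on a simulated dataset; return test ROC AUC and SHAP risk."""
    X, y = datasets.make_classification(**setup)

    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        X, y,
        test_size=0.2,
        stratify=y,
        random_state=0
    )
    lgbc = lightgbm.LGBMClassifier().fit(X_train, y_train)
    roc = metrics.roc_auc_score(y_test, lgbc.predict_proba(X_test)[:, 1])

    # This shap version returns one array per class for a binary LightGBM model
    # (see the warning in the output); [0] takes class 0, whose values are the
    # negated class 1 values, so the absolute values are unaffected.
    shap_values = shap.TreeExplainer(lgbc).shap_values(X_test)[0]
    # SHAP risk: exp(mean |SHAP|) - 1 per feature, averaged over features.
    shap_risk = np.exp(pd.DataFrame(shap_values).abs().mean()).sub(1).mean()

    return roc, shap_risk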
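
The `shap_risk` figure returned by `run` is not a standard metric, so a small worked example may help. The sketch below is a NumPy-only restatement of the same arithmetic, under the assumption that the SHAP values are expressed in log-odds (TreeExplainer's default raw output for a LightGBM classifier), in which case `exp(|value|) - 1` reads as a relative change in odds. The helper name `shap_risk_from_values` and the toy matrix are hypothetical and not part of the notebook.

```python
import numpy as np

def shap_risk_from_values(shap_values):
    """Average relative odds change implied by each feature's mean |SHAP|."""
    shap_values = np.asarray(shap_values, dtype=float)
    # Mean absolute SHAP value per feature (columns are features).
    mean_abs = np.abs(shap_values).mean(axis=0)
    # exp(x) - 1 per feature, then averaged over features.
    return np.expm1(mean_abs).mean()

# Toy matrix: 2 samples x 2 features (illustrative values only).
toy_shap = np.array([[0.1, -0.2],
                     [-0.3, 0.4]])
toy_risk = shap_risk_from_values(toy_shap)
print(f"toy SHAP risk = {toy_risk:.1%}")
```

Here the per-feature mean absolute values are 0.2 and 0.3, so the result is the average of `exp(0.2) - 1` and `exp(0.3) - 1`.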

## Benchmark

In [3]:
BENCHMARK_SETUP = {
    "n_samples": 50_000,
    "n_features": 20,
    "n_informative": 20,
    "n_redundant": 0,
    "flip_y": 0.15,
    "hypercube": False,
    "random_state": 0,
}
BENCHMARK = {}

In [4]:
BENCHMARK["roc"], BENCHMARK["shap_risk"] = run(BENCHMARK_SETUP)
print("BENCHMARK ROC={:.0%}".format(BENCHMARK["roc"]))
print("BENCHMARK SHAP RISK={:.0%}".format(BENCHMARK["shap_risk"]))

BENCHMARK ROC=91%
BENCHMARK SHAP RISK=13%


LightGBM binary classifier with TreeExplainer shap values output has changed to a list of ndarray


## Experiments
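
The experiments introduce correlation by replacing informative features with redundant ones. In `make_classification`, redundant features are generated as random linear combinations of the informative features, so they add correlated columns without adding new signal. The sketch below illustrates this on a small dataset; the sizes and the names `X_demo`, `corr` and `rank` are illustrative assumptions, and `shuffle=False` is used only so that the informative columns come first.

```python
import numpy as np
from sklearn import datasets

# Small illustrative dataset: 2 informative + 2 redundant features.
X_demo, y_demo = datasets.make_classification(
    n_samples=1_000,
    n_features=4,
    n_informative=2,
    n_redundant=2,
    shuffle=False,   # keep column order: informative first, then redundant
    random_state=0,
)

# Pairwise Pearson correlations between the 4 columns.
corr = np.corrcoef(X_demo, rowvar=False)
print(np.round(corr, 2))

# Redundant columns are linear combinations of the informative ones,
# so the feature matrix only spans 2 dimensions.
rank = np.linalg.matrix_rank(X_demo)
print("matrix rank:", rank)
```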

In [5]:
res = []
for n in range(2, BENCHMARK_SETUP["n_informative"]+1):
    n_redundant = BENCHMARK_SETUP["n_features"] - n
    roc, shap_risk = run(dict(BENCHMARK_SETUP, n_informative=n, n_redundant=n_redundant))
    res.append({"n_informative": n, "n_redundant": n_redundant, "ROC": roc, "SHAP Risk": shap_risk})

res = pd.DataFrame(res)
res.sort_values(by="n_redundant")

LightGBM binary classifier with TreeExplainer shap values output has changed to a list of ndarray

index,n_informative,n_redundant,ROC,SHAP Risk
18,20,0,0.906708,0.134371
17,19,1,0.913337,0.152072
16,18,2,0.906384,0.139759
15,17,3,0.904925,0.144503
14,16,4,0.90639,0.133109
13,15,5,0.902775,0.145179
12,14,6,0.902518,0.137714
11,13,7,0.900623,0.160044
10,12,8,0.891011,0.149029
9,11,9,0.89692,0.13089
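
One way to summarize the table is to correlate each metric with `n_redundant`. The sketch below does this for the ten rows visible above only, so it is a partial view of the experiment rather than a result for the full range; the variable names are hypothetical.

```python
import numpy as np

# Values copied from the ten visible rows of the table (n_redundant = 0..9).
n_redundant = np.arange(10)
roc = np.array([0.906708, 0.913337, 0.906384, 0.904925, 0.90639,
                0.902775, 0.902518, 0.900623, 0.891011, 0.89692])
shap_risk = np.array([0.134371, 0.152072, 0.139759, 0.144503, 0.133109,
                      0.145179, 0.137714, 0.160044, 0.149029, 0.13089])

# Pearson correlation of each metric with the number of redundant features.
roc_corr = np.corrcoef(n_redundant, roc)[0, 1]
risk_corr = np.corrcoef(n_redundant, shap_risk)[0, 1]

print(f"corr(n_redundant, ROC)       = {roc_corr:.2f}")
print(f"corr(n_redundant, SHAP Risk) = {risk_corr:.2f}")
```

With only ten points from a single random seed, these coefficients are descriptive at best and should not be read as a significance test.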
