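# AutoML Credit Risk Model & Model Interpretation
- Importing the Python SDK
- Connecting to the Azure ML Workspace
- Creating an Experiment
- Preparing the data
- Configuring automated ML
- Training the model and reviewing the results
- Model interpretation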

## Importing the Python SDK

In [1]:
import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.train.automl import AutoMLConfig

In [2]:
# Check the Python SDK version
print(azureml.core.VERSION)

1.0.79


## Connecting to Azure Machine Learning

In [3]:
subscription_id = '9c0f91b8-eb2f-484c-979c-15848c098a6b'
resource_group = 'AML-HOL'
workspace_name = 'azureml'

ws = Workspace(subscription_id, resource_group, workspace_name)
print(ws.name, ws.location, ws.resource_group, sep='\t')

azureml	japaneast	AML-HOL
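Hard-coding the subscription ID in a notebook makes it easy to leak. The SDK's `Workspace.from_config()` reads the same three values from a `config.json` file instead. The sketch below shows the idea with the standard library only, assuming the file holds the keys `subscription_id`, `resource_group` and `workspace_name` (the layout of the config file you can download from the Azure portal); `load_workspace_settings` is a hypothetical helper, not part of the SDK.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_settings(path="config.json"):
    """Hypothetical helper: read workspace coordinates from a config.json file.

    Returns a (subscription_id, resource_group, workspace_name) tuple that can be
    passed to Workspace(...) in place of hard-coded strings.
    """
    settings = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError(f"config.json is missing keys: {missing}")
    return tuple(settings[key] for key in REQUIRED_KEYS)


if __name__ == "__main__":
    import tempfile

    # Write a throwaway config file with placeholder values for the demo
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "config.json"
        cfg.write_text(json.dumps({
            "subscription_id": "<subscription-id>",
            "resource_group": "AML-HOL",
            "workspace_name": "azureml",
        }), encoding="utf-8")
        print(load_workspace_settings(cfg))
```

In the notebook, `ws = Workspace(*load_workspace_settings())` would then replace the literal IDs; with the SDK itself, `Workspace.from_config()` does the equivalent lookup.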


## Creating an Experiment

In [4]:
# Choose a name for the experiment
experiment_name = 'automl-hmeq-ja'
experiment = Experiment(ws, experiment_name)

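## Preparing the Data
### Historical Home-Equity Loan Repayment / Default Data

We use the Kaggle dataset [HMEQ_Data](https://www.kaggle.com/ajay1735/hmeq-data) as training data. The copy registered in the workspace as `hmeq_ja` uses Japanese column names. The columns are: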

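* BAD (`不履行フラグ`): default flag, the prediction target (1 = defaulted or seriously delinquent, 0 = repaid)
* LOAN (`融資依頼金額`): requested loan amount
* MORTDUE (`未払担保金額`): amount due on the existing mortgage
* VALUE (`現在資産価値`): current value of the property
* REASON (`債務理由`): reason for the loan (`DebtCon` = debt consolidation, `HomeImp` = home improvement)
* JOB (`職種`): occupation category
* YOJ (`勤務年数`): years at the present job
* DEROG (`信用調査会社問い合わせ数`): number of major derogatory reports
* DELINQ (`延滞トレードライン数`): number of delinquent credit lines
* CLAGE (`最も古いトレードラインの月齢`): age of the oldest credit line, in months
* NINQ (`最近のクレジットの問い合わせ数`): number of recent credit inquiries
* CLNO (`トレード(クレジット)ラインの数`): number of credit lines
* DEBTINC (`債務対所得の割合`): debt-to-income ratio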
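If you start from the original Kaggle CSV (English headers) rather than the registered `hmeq_ja` dataset, the headers have to be renamed to the Japanese column names this notebook uses, including the label `不履行フラグ`. A minimal standard-library sketch, assuming the Kaggle file's first row holds the English names listed above; `HMEQ_JA_COLUMNS` and `rename_header` are hypothetical names, and how `hmeq_ja` was actually created is not shown in this notebook.

```python
import csv
import io

# Hypothetical mapping from the Kaggle HMEQ headers to the Japanese column
# names that appear in the hmeq_ja dataset output
HMEQ_JA_COLUMNS = {
    "BAD": "不履行フラグ",
    "LOAN": "融資依頼金額",
    "MORTDUE": "未払担保金額",
    "VALUE": "現在資産価値",
    "REASON": "債務理由",
    "JOB": "職種",
    "YOJ": "勤務年数",
    "DEROG": "信用調査会社問い合わせ数",
    "DELINQ": "延滞トレードライン数",
    "CLAGE": "最も古いトレードラインの月齢",
    "NINQ": "最近のクレジットの問い合わせ数",
    "CLNO": "トレード(クレジット)ラインの数",
    "DEBTINC": "債務対所得の割合",
}


def rename_header(csv_text):
    """Return the CSV text with its header row translated via HMEQ_JA_COLUMNS.

    Unknown column names are kept unchanged; data rows are copied as-is.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows:
        rows[0] = [HMEQ_JA_COLUMNS.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


sample = "BAD,LOAN,REASON\n1,1100,HomeImp\n"
print(rename_header(sample))
```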

In [5]:
dataset = Dataset.get_by_name(ws, name='hmeq_ja')
dataset.to_pandas_dataframe().head()



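|   | 不履行フラグ | 融資依頼金額 | 未払担保金額 | 現在資産価値 | 債務理由 | 職種 | 勤務年数 | 信用調査会社問い合わせ数 | 延滞トレードライン数 | 最も古いトレードラインの月齢 | 最近のクレジットの問い合わせ数 | トレード(クレジット)ラインの数 | 債務対所得の割合 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1100 | 25860.0 | 39025.0 | HomeImp | Other | 10.5 | 0.0 | 0.0 | 94.37 | 1.0 | 9.0 |   |
| 1 | 1 | 1300 | 70053.0 | 68400.0 | HomeImp | Other | 7.0 | 0.0 | 2.0 | 121.83 | 0.0 | 14.0 |   |
| 2 | 1 | 1500 | 13500.0 | 16700.0 | HomeImp | Other | 4.0 | 0.0 | 0.0 | 149.47 | 1.0 | 10.0 |   |
| 3 | 1 | 1500 |   |   |   |   |   |   |   |   |   |   |   |
| 4 | 0 | 1700 | 97800.0 | 112000.0 | HomeImp | Office | 3.0 | 0.0 | 0.0 | 93.33 | 0.0 | 14.0 |   |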


In [6]:
label = '不履行フラグ'  # target column: the default flag (BAD)

In [7]:
# Split into roughly 95% training / 5% test data; the fixed seed makes the split reproducible
train_data, test_data = dataset.random_split(percentage=0.95, seed=1234)

In [8]:
train_data.to_pandas_dataframe().head()

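|   | 不履行フラグ | 融資依頼金額 | 未払担保金額 | 現在資産価値 | 債務理由 | 職種 | 勤務年数 | 信用調査会社問い合わせ数 | 延滞トレードライン数 | 最も古いトレードラインの月齢 | 最近のクレジットの問い合わせ数 | トレード(クレジット)ラインの数 | 債務対所得の割合 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1100 | 25860.0 | 39025.0 | HomeImp | Other | 10.5 | 0.0 | 0.0 | 94.37 | 1.0 | 9.0 |   |
| 1 | 1 | 1300 | 70053.0 | 68400.0 | HomeImp | Other | 7.0 | 0.0 | 2.0 | 121.83 | 0.0 | 14.0 |   |
| 2 | 1 | 1500 | 13500.0 | 16700.0 | HomeImp | Other | 4.0 | 0.0 | 0.0 | 149.47 | 1.0 | 10.0 |   |
| 3 | 1 | 1500 |   |   |   |   |   |   |   |   |   |   |   |
| 4 | 0 | 1700 | 97800.0 | 112000.0 | HomeImp | Office | 3.0 | 0.0 | 0.0 | 93.33 | 0.0 | 14.0 |   |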


In [9]:
train_data.to_pandas_dataframe().shape

(5658, 13)
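`random_split` divides the records only approximately in the requested ratio, so the training set size (5,658 rows here) is close to, but not necessarily exactly, 95% of the data. The sketch below illustrates one way a seeded, approximate split can work: each record independently lands in the first part with probability `percentage`. This is an assumption for illustration, not the Azure ML implementation, and the record IDs are placeholders.

```python
import random


def approximate_random_split(records, percentage, seed=None):
    """Split records into two lists: each record goes to the first list with
    probability `percentage`, independently of the others (sketch only)."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0 and 1")
    rng = random.Random(seed)  # local generator so the global random state is untouched
    first, second = [], []
    for record in records:
        (first if rng.random() < percentage else second).append(record)
    return first, second


rows = list(range(6000))  # placeholder record IDs, not the real dataset
train, test = approximate_random_split(rows, percentage=0.95, seed=1234)
print(len(train), len(test))
```

Fixing `seed` (as the notebook does with `seed=1234`) makes the split reproducible across runs; the exact sizes still vary with the seed.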

## Configuring Automated ML

In [10]:
automl_settings = {
    "iteration_timeout_minutes": 5,   # time limit for each iteration (minutes)
    "iterations": 5,                  # number of models (pipelines) to try
    "n_cross_validations": 3,         # 3-fold cross-validation
    "primary_metric": 'accuracy',     # metric used to rank the models
    "preprocess": True,               # enable automatic featurization
    "enable_voting_ensemble": False,
    "enable_stack_ensemble": False,
    #"model_explainability" : True,
}

automl_config = AutoMLConfig(task='classification',
                             training_data=train_data,   # training data
                             label_column_name=label,    # target column to predict
                             **automl_settings
                            )
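With `n_cross_validations: 3` and `primary_metric: 'accuracy'`, each candidate pipeline is scored by training on two thirds of the training data, measuring accuracy on the remaining third, and repeating so that every fold is held out once. The sketch below shows that scoring loop on toy labels with a trivial majority-class "model"; the contiguous fold assignment and the averaging of fold scores are illustrative assumptions, not AutoML's internal code.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous, nearly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_accuracy(y, k=3):
    """Score a majority-class baseline with k-fold CV and average the fold scores."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(y), k):
        # "Train": remember the most common label in the training folds
        majority = Counter(y[i] for i in train_idx).most_common(1)[0][0]
        preds = [majority] * len(valid_idx)
        scores.append(accuracy([y[i] for i in valid_idx], preds))
    return sum(scores) / len(scores)


labels = [0, 0, 1, 0, 0, 1, 0, 0, 0]  # toy default flags, not HMEQ data
print(round(cross_validated_accuracy(labels, k=3), 3))  # → 0.778
```

A majority-class baseline is a useful reference point: when defaults are the minority, it already reaches a high accuracy without identifying a single default.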

In [11]:
local_run = experiment.submit(automl_config, show_output = True)


Running on local machine
Parent Run ID: AutoML_e05c3a10-f12d-4797-aa35-f4b794eebb24

Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Classes are balanced in the training data.

TYPE:         Missing values imputation
STATUS:       FIXED
DESCRIPTION:  The training data had the following missing values which were resolved. Please review your data source for data quality issues and possibly filter out the rows with these missing values. If the missing values are exp
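The *Missing values imputation* guardrail reports that AutoML filled in the gaps (for example row 3 of the sample data, where everything after the loan amount is empty). Azure ML's featurization documentation describes the defaults as the column mean for numeric features and the most frequent value for categorical features. The sketch below applies that rule to a few toy records, with `None` marking a missing value; it is a plain-Python illustration, not the AutoML code.

```python
from collections import Counter
from statistics import mean


def impute(records, numeric_cols, categorical_cols):
    """Return copies of `records` (dicts) with None replaced:
    numeric columns -> column mean, categorical columns -> most frequent value."""
    fill = {}
    for col in numeric_cols:
        observed = [r[col] for r in records if r[col] is not None]
        fill[col] = mean(observed) if observed else None
    for col in categorical_cols:
        observed = [r[col] for r in records if r[col] is not None]
        fill[col] = Counter(observed).most_common(1)[0][0] if observed else None
    return [
        {col: (fill[col] if value is None and col in fill else value)
         for col, value in r.items()}
        for r in records
    ]


rows = [  # toy values in the spirit of the HMEQ sample rows
    {"LOAN": 1100, "YOJ": 10.5, "REASON": "HomeImp"},
    {"LOAN": 1300, "YOJ": 7.0, "REASON": "HomeImp"},
    {"LOAN": 1500, "YOJ": None, "REASON": None},
]
print(impute(rows, numeric_cols=["YOJ"], categorical_cols=["REASON"]))
```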

In [12]:
from azureml.widgets import RunDetails
RunDetails(local_run).show()


In [13]:
automl_run, fitted_model = local_run.get_output()
automl_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| automl-hmeq-ja | AutoML_e05c3a10-f12d-4797-aa35-f4b794eebb24_0 |   | Completed | Link to Azure Machine Learning studio | Link to Documentation |


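## Model Interpretation
This section explains the best model found by automated ML (`fitted_model`), first in terms of the engineered features produced by featurization and then in terms of the raw input columns.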

In [14]:
from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations
from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
from azureml.explain.model.mimic_wrapper import MimicWrapper
# azureml.contrib.interpret is not installed in this environment (importing it raised
# "ModuleNotFoundError: No module named 'azureml.contrib.interpret'"),
# so use the dashboard from the interpret-community package instead
from interpret_community.widget import ExplanationDashboard

In [None]:
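# Prepare the data used for model interpretation
X_train = train_data.drop_columns([label])   # training features (label removed)
y_train = train_data.keep_columns([label])   # training labels
X_test = test_data.drop_columns([label])     # test features
y_test = test_data.keep_columns([label])     # test labels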

In [None]:
automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, 'classification',
                                                             X=X_train, X_test=X_test,
                                                             y=y_train)

In [None]:
# (Optional) inspect the engineered test features as a DataFrame:
# import pandas as pd
# pd.DataFrame(automl_explainer_setup_obj.X_test_transform.toarray(), columns=automl_explainer_setup_obj.engineered_feature_names)

### Engineered Explanation (features after featurization)

In [None]:
# Global surrogate model: train an LGBMExplainableModel to mimic the AutoML model's predictions on the engineered features
explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel,
                         init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,
                         features=automl_explainer_setup_obj.engineered_feature_names,
                         feature_maps=[automl_explainer_setup_obj.feature_map],
                         classes=automl_explainer_setup_obj.classes)
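`MimicWrapper` explains the AutoML model through a *global surrogate*: an interpretable model (here `LGBMExplainableModel`) is trained to reproduce the AutoML model's predictions rather than the true labels, and the surrogate is then read as the explanation. The sketch below shows the same idea with a one-split decision stump as the surrogate instead of LightGBM, plus a *fidelity* score (how often the surrogate agrees with the black box). The black-box function and the data are made up for illustration.

```python
def black_box(x):
    """Stand-in for the AutoML model: predicts default (1) from two features."""
    debt_to_income, delinquencies = x
    return 1 if debt_to_income > 40 or delinquencies >= 3 else 0


def fit_stump(X, y):
    """Find the (feature, threshold) whose rule 'x[f] > t -> 1' best matches
    the target labels y; returns (feature, threshold, agreement)."""
    best = (None, None, -1.0)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            preds = [1 if row[f] > t else 0 for row in X]
            agree = sum(p == target for p, target in zip(preds, y)) / len(y)
            if agree > best[2]:
                best = (f, t, agree)
    return best


X = [(25, 0), (32, 1), (45, 0), (38, 4), (50, 2), (28, 0), (42, 1), (35, 0), (30, 5)]
# Key point of a global surrogate: fit to the black box's outputs, not to true labels
y_black_box = [black_box(x) for x in X]
feature, threshold, fidelity = fit_stump(X, y_black_box)
print(f"surrogate rule: feature {feature} > {threshold}, fidelity = {fidelity:.2f}")
# → surrogate rule: feature 0 > 35, fidelity = 0.89
```

The one disagreement (low debt-to-income but many delinquencies) is why fidelity matters: an explanation is only as trustworthy as the surrogate's agreement with the model it explains.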

In [None]:
# Compute the engineered explanations
engineered_explanations = explainer.explain(['local', 'global'], get_raw=False,
                                            eval_dataset=automl_explainer_setup_obj.X_test_transform)
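`explain(['local', 'global'], ...)` returns both per-row (local) importance values and an overall (global) ranking. A common way to derive the global view, used for SHAP-style explanations, is to average the absolute local importance of each feature over the evaluation rows. The sketch below shows that aggregation on a made-up 3×3 matrix; whether interpret-community uses exactly this formula for the mimic explainer is an assumption here.

```python
def global_importance(local_importance, feature_names):
    """Average |local importance| per feature and return (name, score) pairs
    sorted from most to least important.

    local_importance: list of rows, one row per evaluated sample,
                      one column per feature.
    """
    n_rows = len(local_importance)
    totals = [0.0] * len(feature_names)
    for row in local_importance:
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign shows direction; magnitude shows strength
    scores = [total / n_rows for total in totals]
    return sorted(zip(feature_names, scores), key=lambda pair: pair[1], reverse=True)


# Made-up local importance values for three test rows (not real model output)
local = [
    [0.30, -0.05, 0.10],
    [-0.20, 0.02, 0.25],
    [0.40, -0.08, -0.05],
]
for name, score in global_importance(local, ["DEBTINC", "LOAN", "DELINQ"]):
    print(f"{name}: {score:.3f}")
```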

In [None]:
# Dashboard for the engineered explanations; features and labels must both come from the test set
ExplanationDashboard(engineered_explanations,
                     automl_explainer_setup_obj.automl_estimator,
                     datasetX=automl_explainer_setup_obj.X_test_transform,
                     trueY=y_test.to_pandas_dataframe().values)

### Raw Explanation (original features before featurization)

In [None]:
# Compute the raw explanations
raw_explanations = explainer.explain(['local', 'global'], get_raw=True,
                                     raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                     eval_dataset=automl_explainer_setup_obj.X_test_transform)
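With `get_raw=True`, importance computed on engineered features is mapped back to the original columns using the feature map captured during setup (`feature_map`). In sketch form: if a raw column such as REASON was one-hot encoded into several engineered columns, its raw importance is the sum of those columns' importances. The sketch below assumes a 0/1 feature map (rows = raw features, columns = engineered features) and uses made-up values and hypothetical engineered column names; interpret-community's actual handling, including weighted maps, may differ.

```python
def raw_importance(engineered_importance, feature_map):
    """Map importance from engineered features back to raw features.

    engineered_importance: one value per engineered feature.
    feature_map: matrix with one row per raw feature and one column per
                 engineered feature; entry 1 means "derived from this raw feature".
    Returns one value per raw feature (a matrix-vector product).
    """
    return [
        sum(weight * value for weight, value in zip(row, engineered_importance))
        for row in feature_map
    ]


raw_names = ["LOAN", "REASON"]
engineered_names = ["LOAN", "REASON_DebtCon", "REASON_HomeImp", "REASON_missing"]  # hypothetical
feature_map = [
    [1, 0, 0, 0],  # LOAN passes through unchanged
    [0, 1, 1, 1],  # REASON was one-hot encoded into three columns
]
engineered = [0.12, 0.05, 0.03, 0.01]  # made-up values
for name, score in zip(raw_names, raw_importance(engineered, feature_map)):
    print(f"{name}: {score:.2f}")
```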

In [None]:
from interpret_community.widget import ExplanationDashboard

# Dashboard for the raw explanations; features and labels must both come from the test set
ExplanationDashboard(raw_explanations,
                     automl_explainer_setup_obj.automl_pipeline,
                     datasetX=automl_explainer_setup_obj.X_test_raw,
                     trueY=y_test.to_pandas_dataframe().values)

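In [None]:
# Sanity check: the number of rows in the raw test features...
automl_explainer_setup_obj.X_test_raw.shape

In [None]:
# ...must equal the number of test labels passed to the dashboard
len(y_test.to_pandas_dataframe().values)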
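These checks catch a common slip when building the dashboard: pairing test features with training labels (`y_train`), whose row counts differ. A small guard such as the hypothetical `check_alignment` helper below can fail fast before the dashboard is built. It only compares lengths, so it cannot detect rows that are in the wrong order.

```python
def check_alignment(n_feature_rows, labels):
    """Raise ValueError unless there is exactly one label per feature row.

    Only lengths are compared; row order is not checked.
    """
    if n_feature_rows != len(labels):
        raise ValueError(
            f"{n_feature_rows} feature rows but {len(labels)} labels; "
            "are features and labels from the same split?"
        )


# In the notebook this could be called as (not executed here):
# check_alignment(automl_explainer_setup_obj.X_test_raw.shape[0],
#                 y_test.to_pandas_dataframe().values)
check_alignment(3, [0, 1, 0])  # passes silently
try:
    check_alignment(3, [0, 1, 0, 1, 1])
except ValueError as err:
    print(err)
```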