# Exploring Root-Cause Factors with the Azure Machine Learning Interpretability SDK

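A machine learning model that predicts quality makes it possible to estimate the quality of manufactured products from manufacturing-process data. Beyond prediction, understanding the structure of the model helps identify the explanatory variables (factors) that influence defects, which in turn helps locate the causes of those defects. This notebook uses the sample manufacturing-process dataset **Factory.csv** to build a machine learning model that predicts quality from process data, then uses the **Azure Machine Learning Interpretability SDK** to analyze how strongly each factor influences quality.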

## 1. Importing the Python SDK
Import the Python SDK for Azure Machine Learning service.

In [1]:
import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
import os


In [2]:
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.0.55


### Connecting to the Azure ML workspace
Connect to Azure Machine Learning service. This step requires authenticating to Azure.

In [3]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep='\t')

azureml	mlservice	eastus
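
`Workspace.from_config()` reads the workspace coordinates from a `config.json` file. The following is a minimal sketch of the shape of that file, assuming the three keys `subscription_id`, `resource_group`, and `workspace_name`. The helper `write_workspace_config` is hypothetical, the subscription ID is a placeholder, and the file is written to a temporary directory only so the example has no side effects.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the three keys assumed above (hypothetical helper)."""
    path = Path(directory) / "config.json"
    payload = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path.write_text(json.dumps(payload, indent=2))
    return path


# The resource group and workspace name match the output shown above;
# the subscription ID is a placeholder, not a real value.
config_dir = tempfile.mkdtemp()
config_path = write_workspace_config(
    config_dir, "<your-subscription-id>", "mlservice", "azureml")
print(config_path.read_text())
```

In practice the file would sit next to the notebook (or in a directory the SDK searches) rather than in a temporary directory.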


### Setting the experiment name

In [4]:
experiment=Experiment(ws, "factory_AutoML")

## 2. Preparing the training data

In [5]:
import pandas as pd
#os.makedirs("./outputs", exist_ok=True)
df = pd.read_csv('./data/Factory.csv')

In [6]:
df.tail(10)

Unnamed: 0,ID,Quality,ProcessA-Pressure,ProcessA-Humidity,ProcessA-Vibration,ProcessB-Light,ProcessB-Skill,ProcessB-Temp,ProcessB-Rotation,ProcessC-Density,ProcessC-PH,ProcessC-skewness,ProcessC-Time
4888,4889,0,6.8,0.22,0.36,1.2,0.05,38.0,127.0,0.99,3.04,0.54,9.2
4889,4890,0,4.9,0.23,0.27,11.75,0.03,34.0,118.0,1.0,3.07,0.5,9.4
4890,4891,0,6.1,0.34,0.29,2.2,0.04,25.0,100.0,0.99,3.06,0.44,11.8
4891,4892,0,5.7,0.21,0.32,0.9,0.04,38.0,121.0,0.99,3.24,0.46,10.6
4892,4893,0,6.5,0.23,0.38,1.3,0.03,29.0,112.0,0.99,3.29,0.54,9.7
4893,4894,0,6.2,0.21,0.29,1.6,0.04,24.0,92.0,0.99,3.27,0.5,11.2
4894,4895,0,6.6,0.32,0.36,8.0,0.05,57.0,168.0,0.99,3.15,0.46,9.6
4895,4896,0,6.5,0.24,0.19,1.2,0.04,30.0,111.0,0.99,2.99,0.46,9.4
4896,4897,1,5.5,0.29,0.3,1.1,0.02,20.0,110.0,0.99,3.34,0.38,12.8
4897,4898,0,6.0,0.21,0.38,0.8,0.02,22.0,98.0,0.99,3.26,0.32,11.8


In [7]:
from sklearn.model_selection import train_test_split

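# Separate the features from the target; ID is only a row identifier.
X = df.drop(columns=["Quality", "ID"])
y = df["Quality"].values

# Hold out 10% for testing; stratify=y keeps the Quality class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=100, stratify=y)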
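
Stratification matters here because defective items (`Quality` = 1) appear to be much rarer than good ones. The sketch below illustrates the idea in plain Python: it splits each class separately so that both subsets keep the same class ratio. It illustrates the concept and is not scikit-learn's implementation; `stratified_split` and the 90/10 label mix are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size, seed):
    """Return (train_idx, test_idx), sampling the test share from each class separately."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # same share taken from every class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 900 good items (0) and 100 defective items (1).
labels = [0] * 900 + [1] * 100
train_idx, test_idx = stratified_split(labels, test_size=0.1, seed=100)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts), "test:", dict(test_counts))
```

Both subsets end up with one defective item for every nine good ones, so an evaluation on the test set sees the same imbalance the model was trained on.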

## 3. Configuration (Automated Machine Learning)

In [9]:
Automl_config = AutoMLConfig(task = 'classification',
                             primary_metric = 'AUC_weighted',
                             iteration_timeout_minutes = 10,
                             iterations = 10,
                             X = X_train,
                             y = y_train,
                             n_cross_validations = 3,
                             enable_stack_ensemble=False,
                             enable_voting_ensemble=False)
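
The configuration above optimizes `AUC_weighted`, which is documented as the mean of the per-class AUC scores weighted by the number of true instances in each class. The sketch below shows that calculation in plain Python on a handful of made-up predictions. `binary_auc`, `weighted_auc`, and the sample scores are hypothetical, and AutoML's own implementation may differ in detail.

```python
def binary_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_auc(y_true, proba_by_class):
    """Support-weighted mean of one-vs-rest AUC scores."""
    total = len(y_true)
    result = 0.0
    for cls, proba in proba_by_class.items():
        one_vs_rest = [1 if y == cls else 0 for y in y_true]
        support = sum(one_vs_rest)
        result += support / total * binary_auc(one_vs_rest, proba)
    return result


# Made-up labels and predicted defect probabilities for six items.
y_true = [0, 0, 0, 1, 0, 1]
p_defect = [0.1, 0.4, 0.35, 0.8, 0.2, 0.3]
proba_by_class = {0: [1 - p for p in p_defect], 1: p_defect}

auc_defect = binary_auc(y_true, p_defect)
auc_weighted = weighted_auc(y_true, proba_by_class)
print(auc_defect, auc_weighted)
```

With two classes and complementary probabilities, the one-vs-rest AUC is the same for both classes, so the weighted value equals the ordinary binary AUC. The weighting only changes the result in multiclass problems.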

## 4. Running the experiment and reviewing the results

In [10]:
local_run = experiment.submit(Automl_config, show_output=True)

Running on local machine
Parent Run ID: AutoML_b74b8ca3-7aa8-4566-947d-dff6ce927650
Current status: DatasetCrossValidationSplit. Generating CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************

 ITERATION   PIPELINE                                       DURATION      METRIC      BEST
         0   StandardScalerWrapper SGD                      0:00:28       0.7815    0.7815
         1   StandardScalerWrapper SGD                      0:00:27       0.7833    0.7833
         2   MinMaxScaler LightGBM                      

In [11]:
from azureml.widgets import RunDetails
RunDetails(local_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [12]:
local_run.get_details()

{'runId': 'AutoML_b74b8ca3-7aa8-4566-947d-dff6ce927650',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2019-08-19T09:35:37.333875Z',
 'endTimeUtc': '2019-08-19T09:40:30.501338Z',
 'properties': {'num_iterations': '10',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'MaxTimeSeconds': '600',
  'acquisition_parameter': '0',
  'num_cross_validation': '3',
  'target': 'local',
  'DataPrepJsonString': None,
  'EnableSubsampling': 'False',
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "1.0.55", "azureml-train": "1.0.55", "azureml-train-restclients-hyperdrive": "1.0.55", "azureml-train-core": "1.0.55", "azureml-train-automl": "1.0.55", "azureml-telemetry": "1.0.55", "azureml-sdk": "1.0.55", "azureml-pipeline": "1.0.55", "azureml-pipeline-steps": "1.0.55", "azureml-pipeline-core": "1.0.55", "azureml-op

In [13]:
best_run, fitted_model = local_run.get_output()
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
factory_AutoML,AutoML_b74b8ca3-7aa8-4566-947d-dff6ce927650_5,,Completed,Link to Azure Portal,Link to Documentation


In [14]:
fitted_model

Pipeline(memory=None,
     steps=[('StandardScalerWrapper', <automl.client.core.common.model_wrappers.StandardScalerWrapper object at 0x1304925c0>), ('LightGBMClassifier', LightGBMClassifier(boosting_type='gbdt', class_weight=None,
          colsample_bytree=0.6933333333333332, importance_type='split',
          learning_rate..., subsample=0.3963157894736842,
          subsample_for_bin=200000, subsample_freq=0, verbose=-10))])

### Understanding the model

In [15]:
from pprint import pprint


def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            pprint(step[1].get_params())
            print()


print_model(fitted_model)

StandardScalerWrapper
{'class_name': 'StandardScaler',
 'copy': True,
 'module_name': 'sklearn.preprocessing.data',
 'with_mean': False,
 'with_std': False}

LightGBMClassifier
{'boosting_type': 'gbdt',
 'class_weight': None,
 'colsample_bytree': 0.6933333333333332,
 'importance_type': 'split',
 'learning_rate': 0.07894947368421053,
 'max_bin': 240,
 'max_depth': 3,
 'min_child_samples': 77,
 'min_child_weight': 6,
 'min_split_gain': 0.631578947368421,
 'n_estimators': 50,
 'n_jobs': 1,
 'num_leaves': 65,
 'objective': None,
 'random_state': None,
 'reg_alpha': 0.5789473684210527,
 'reg_lambda': 0.631578947368421,
 'silent': True,
 'subsample': 0.3963157894736842,
 'subsample_for_bin': 200000,
 'subsample_freq': 0,
 'verbose': -10}



## 5. Azure Machine Learning Interpretability SDK

The [Azure Machine Learning Interpretability SDK](https://docs.microsoft.com/en-US/azure/machine-learning/service/machine-learning-interpretability-explainability?view=azuremgmtcompute-fluent-1.0.0) is a model-interpretation framework that combines Microsoft's own libraries with major third-party libraries (LIME, SHAP, and others) behind a unified API.  
<img src="https://docs.microsoft.com/en-US/azure/machine-learning/service/media/machine-learning-interpretability-explainability/interpretability-architecture.png#lightbox" width=800 align=left>
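
SHAP, one of the libraries named above, attributes a prediction to individual features using Shapley values. To give a rough idea of what such an attribution means, the sketch below computes exact Shapley values for a three-feature toy function by averaging each feature's marginal contribution over every feature ordering. This illustrates the concept only: the SHAP library uses far more efficient algorithms, and `toy_defect_score`, the instance, and the all-zero baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start with every feature "absent"
        previous = predict(current)
        for j in order:
            current[j] = instance[j]  # switch feature j to its actual value
            value = predict(current)
            totals[j] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_defect_score(x):
    """Hypothetical score: one additive term plus a humidity-temperature interaction."""
    pressure, humidity, temp = x
    return 0.5 * pressure + 2.0 * humidity * temp


instance = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_defect_score, instance, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features that produce it.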

In [16]:
from azureml.explain.model.tabular_explainer import TabularExplainer
classes = ["false","true"]
tabular_explainer = TabularExplainer(fitted_model, X_train, features=X_train.columns, classes=classes)

In [17]:
global_explanation = tabular_explainer.explain_global(X_test[:100])

100%|██████████| 100/100 [00:15<00:00,  6.13it/s]
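
A global explanation summarizes per-row (local) contributions into one importance score per feature. A common convention, and the one assumed in this sketch, is to take the mean absolute local contribution and rank features by it. The contribution values below are made up for illustration, `global_importance` is a hypothetical helper, and only the feature names come from Factory.csv.

```python
def global_importance(local_values, feature_names):
    """Rank features by mean absolute local contribution, largest first."""
    n_rows = len(local_values)
    means = [
        sum(abs(row[j]) for row in local_values) / n_rows
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda kv: kv[1], reverse=True)


# Made-up local contributions for three rows and three features.
names = ["ProcessA-Pressure", "ProcessB-Temp", "ProcessC-PH"]
local_values = [
    [0.20, -0.05, 0.01],
    [-0.30, 0.10, 0.00],
    [0.10, -0.15, 0.02],
]
ranking = global_importance(local_values, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values keeps positive and negative contributions from cancelling out, so a feature that pushes some predictions up and others down still ranks as influential.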


In [18]:
from azureml.contrib.explain.model.visualize import ExplanationDashboard
ExplanationDashboard(global_explanation, fitted_model, X_test[:100])

ExplanationWidget(value={'predictedY': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1…

<azureml.contrib.explain.model.visualize.ExplanationDashboard.ExplanationDashboard at 0x131f0b0b8>