# Quality Control Modeling and Model Interpretation with Automated Machine Learning

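This notebook builds a quality control model from sensor data collected during the manufacturing process and the corresponding inspection results.
- Importing the Python SDK
- Connecting to the Azure ML service Workspace
- Creating an Experiment
- Preparing the data
- Configuring automated machine learning
- Training models and reviewing the results
- Interpreting the model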

## 1. Preparation
### Import the Python SDK
Import the Azure Machine Learning service Python SDK.

In [1]:
import logging

from matplotlib import pyplot as plt
import pandas as pd
import os

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.train.automl import AutoMLConfig

### Connect to the Azure ML workspace
Connect to the Azure Machine Learning service. This step requires authenticating to Azure.

In [2]:
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')

azureml	eastus	mlservice
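
`Workspace.from_config()` reads the workspace coordinates from a `config.json` file that holds the subscription ID, resource group, and workspace name. The sketch below writes such a file to a temporary directory. The helper name `write_workspace_config` is hypothetical, the subscription ID is a placeholder, and the resource group and workspace name are taken from the output above, so substitute your own values.

```python
import json
import os
import tempfile


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three workspace coordinates."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = os.path.join(directory, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return path


# Placeholder subscription ID (assumption); replace it with your own.
config_path = write_workspace_config(
    tempfile.mkdtemp(),
    subscription_id="00000000-0000-0000-0000-000000000000",
    resource_group="mlservice",
    workspace_name="azureml",
)

# Read the file back to confirm its contents.
with open(config_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["workspace_name"])
```

In practice the file would sit next to the notebook (or in a `.azureml` subfolder) rather than in a temporary directory.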


### Set the experiment name
Specify a name for the machine learning experiment. The name is used later, for example when reviewing the logged metrics.

In [3]:
# choose a name for experiment
experiment_name = 'automl-classif-factoryQC'
experiment = Experiment(ws, experiment_name)

### Prepare the training data
You can use a pandas DataFrame or a NumPy array. If the data is already registered in Azure Machine Learning as a _Dataset_, you can also load it through the Python SDK and use it directly.

In [4]:
# If the Dataset is already registered through the Azure ML service web interface
df = Dataset.get(ws, name='factory').to_pandas_dataframe()  # convert to a pandas DataFrame
df.head()



Unnamed: 0,ID,Quality,ProcessA-Pressure,ProcessA-Humidity,ProcessA-Vibration,ProcessB-Light,ProcessB-Skill,ProcessB-Temp,ProcessB-Rotation,ProcessC-Density,ProcessC-PH,ProcessC-skewness,ProcessC-Time
0,1,0,7.0,0.27,0.36,20.7,0.04,45.0,170.0,1.0,3.0,0.45,8.8
1,2,0,6.3,0.3,0.34,1.6,0.05,14.0,132.0,0.99,3.3,0.49,9.5
2,3,0,8.1,0.28,0.4,6.9,0.05,30.0,97.0,1.0,3.26,0.44,10.1
3,4,0,7.2,0.23,0.32,8.5,0.06,47.0,186.0,1.0,3.19,0.4,9.9
4,5,0,7.2,0.23,0.32,8.5,0.06,47.0,186.0,1.0,3.19,0.4,9.9


In [5]:
# # To register the Dataset from this notebook instead, use the following (named factory-dataset here)
# datastore = ws.get_default_datastore()
# datastore.upload_files(files = ['../data/Factory.csv'],
#                        target_path = 'dllab/',
#                        overwrite = True,
#                        show_progress = True)
# dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'dllab/Factory.csv')])


# dataset = dataset.register(workspace = ws,
#                            name = 'factory-dataset',
#                            description='training dataset from client python',
#                            create_new_version=True)
# df = dataset.to_pandas_dataframe()  # convert to a pandas DataFrame
# df.head()

In [6]:
from sklearn.model_selection import train_test_split

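# Features: every column except the target (Quality) and the row identifier (ID)
X = df.drop(columns=["Quality", "ID"])
y = df["Quality"].values

# Hold out 10% of the rows for testing, keeping the class ratio of y in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=100, stratify=y)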

In [7]:
X_train.head()

Unnamed: 0,ProcessA-Pressure,ProcessA-Humidity,ProcessA-Vibration,ProcessB-Light,ProcessB-Skill,ProcessB-Temp,ProcessB-Rotation,ProcessC-Density,ProcessC-PH,ProcessC-skewness,ProcessC-Time
3294,7.3,0.25,0.28,1.5,0.04,19.0,113.0,0.99,3.38,0.56,10.1
1935,8.8,0.34,0.33,9.7,0.04,46.0,172.0,1.0,3.08,0.4,10.2
917,7.7,0.3,0.32,1.6,0.04,23.0,124.0,0.99,2.93,0.33,11.0
1478,7.9,0.22,0.24,4.6,0.04,39.0,159.0,0.99,2.99,0.28,11.5
858,6.7,0.22,0.39,10.2,0.04,60.0,149.0,1.0,3.17,0.54,10.0


In [8]:
y_train

array([0, 0, 0, ..., 1, 0, 0], dtype=int64)
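
Because the split above uses `stratify=y`, the training and test labels should have nearly the same class proportions. The following sketch shows one way to check this with the standard library only. `class_ratios` is a hypothetical helper and the label lists are made up for illustration; in the notebook you would pass `y_train.tolist()` and `y_test.tolist()`.

```python
from collections import Counter


def class_ratios(labels):
    """Return {class: fraction of samples} for an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in sorted(counts)}


# Made-up labels for illustration only (not the factory data).
example_train = [0] * 18 + [1] * 2
example_test = [0] * 9 + [1] * 1

print(class_ratios(example_train))
print(class_ratios(example_test))
```

If the two dictionaries differ noticeably, the split was probably not stratified.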

## 2. Automated Machine Learning
### Training configuration

<div style="text-align: left">

|Property|Description|
|-|-|
|**task**|classification, regression or forecasting|
|**primary_metric**|Metric to optimize. Classification supports the following metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i><br>For details, see [Primary metric](https://docs.microsoft.com/ja-JP/azure/machine-learning/service/how-to-configure-auto-train#primary-metric).|
|**iteration_timeout_minutes**|Maximum run time per iteration, in minutes|
|**iterations**|Number of iterations (= number of pipelines to try)|
|**X**|Training data (features)|
|**y**|Training data (target variable)|
    
</div>
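
To make the `AUC_weighted` metric in the table more concrete, here is a small sketch of a support-weighted, one-vs-rest AUC written in plain Python. It follows the usual definition (the AUC of each class, averaged with weights proportional to the number of true samples in that class) and is not AutoML's own implementation. The function names and the toy probabilities are assumptions made for illustration.

```python
from collections import Counter


def binary_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def auc_weighted(y_true, proba, classes):
    """One-vs-rest AUC per class, averaged with weights equal to class support."""
    support = Counter(y_true)
    total = len(y_true)
    result = 0.0
    for column, cls in enumerate(classes):
        if support[cls] == 0:
            continue  # a class without true samples contributes zero weight
        one_vs_rest = [1 if y == cls else 0 for y in y_true]
        class_scores = [row[column] for row in proba]
        result += support[cls] / total * binary_auc(one_vs_rest, class_scores)
    return result


# Toy data: each row holds the predicted probabilities for class 0 and class 1.
y_true = [0, 0, 1, 1]
proba = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]
print(auc_weighted(y_true, proba, classes=[0, 1]))
```

For a binary problem like this one, the two per-class AUCs coincide, so the weighted value equals the ordinary AUC.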



In [9]:
automl_settings = {
    "iteration_timeout_minutes": 5,
    "iterations": 5,
    "n_cross_validations": 3,
    "primary_metric": 'AUC_weighted',
    "preprocess": True,
    "enable_voting_ensemble": False,
    "enable_stack_ensemble": False
}

automl_config = AutoMLConfig(task = 'classification', # regression, forecasting
                             X = X_train,
                             y = y_train,
                             **automl_settings
                            )
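
With `n_cross_validations` set to 3, each candidate pipeline is scored by 3-fold cross-validation on the training data rather than on a separate validation set. The sketch below illustrates the idea with a plain, unshuffled k-fold splitter. It is only an illustration: `kfold_indices` is a hypothetical helper, and the way AutoML actually assigns rows to folds is not shown in this notebook.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, valid_idx) lists for plain, unshuffled k-fold CV."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


folds = list(kfold_indices(n_samples=10, n_splits=3))
for train_idx, valid_idx in folds:
    print(len(train_idx), valid_idx)
```

Every row is used for validation exactly once, and the reported metric is typically the average over the folds.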

### Run and review the results

In [10]:
local_run = experiment.submit(automl_config, show_output = True)

Running on local machine
Parent Run ID: AutoML_35685078-a4d6-46b0-b761-1cd0137037f1
Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS SUMMARY:
For more details, use API: run.get_guardrails()

TYPE:         Class Balancing Detection
STATUS:       PASSED
DESCRIPTION:  Classes are balanced in the training data.

TYPE:         Missing Values Imputation
STATUS:       PASSED
DESCRIPTION:  There were no missing values found in the training data.

TYPE:         High Cardinality Feature Detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were an

In [11]:
# Review the results in a widget
from azureml.widgets import RunDetails
RunDetails(local_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [12]:
# Output the detailed log
local_run.get_details()

{'runId': 'AutoML_35685078-a4d6-46b0-b761-1cd0137037f1',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2019-10-04T04:33:00.602713Z',
 'endTimeUtc': '2019-10-04T04:35:14.483804Z',
 'properties': {'num_iterations': '5',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'MaxTimeSeconds': '300',
  'acquisition_parameter': '0',
  'num_cross_validation': '3',
  'target': 'local',
  'DataPrepJsonString': None,
  'EnableSubsampling': 'False',
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "1.0.65", "azureml-train": "1.0.65", "azureml-train-restclients-hyperdrive": "1.0.65", "azureml-train-core": "1.0.65", "azureml-train-automl": "1.0.65", "azureml-telemetry": "1.0.65", "azureml-sdk": "1.0.65", "azureml-pipeline": "1.0.65", "azureml-pipeline-steps": "1.0.65", "azureml-pipeline-core": "1.0.65", "azureml-exp
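
The dictionary returned by `get_details()` can be processed like any other Python dict. As a small example, the sketch below computes the wall-clock duration of the run from the `startTimeUtc` and `endTimeUtc` values shown above. The helper `run_duration_seconds` is hypothetical, and it assumes both timestamps use the format seen in this output (with fractional seconds and a trailing `Z`).

```python
from datetime import datetime

# Format of the timestamps in the output above (assumed to always include microseconds).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration_seconds(details):
    """Return the elapsed time between startTimeUtc and endTimeUtc in seconds."""
    start = datetime.strptime(details["startTimeUtc"], TIMESTAMP_FORMAT)
    end = datetime.strptime(details["endTimeUtc"], TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


# Values copied from the get_details() output above.
details = {
    "startTimeUtc": "2019-10-04T04:33:00.602713Z",
    "endTimeUtc": "2019-10-04T04:35:14.483804Z",
}
print(run_duration_seconds(details))
```

In the notebook, `run_duration_seconds(local_run.get_details())` would do the same for the live run.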

#### Retrieve the champion model

In [13]:
best_run, fitted_model = local_run.get_output()
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
automl-classif-factoryQC,AutoML_35685078-a4d6-46b0-b761-1cd0137037f1_2,,Completed,Link to Azure Portal,Link to Documentation
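
The held-out `X_test` / `y_test` split can be used to check the champion model on data it did not see during training. The sketch below computes accuracy, precision, and recall for any object that exposes `predict()`. `evaluate_holdout` and `ThresholdModel` are hypothetical names; the stub model and the demo data exist only so that the example runs on its own.

```python
def evaluate_holdout(model, X_test, y_test, positive_label=1):
    """Score any object with a predict() method on held-out data."""
    predictions = list(model.predict(X_test))
    truth = list(y_test)
    pairs = list(zip(truth, predictions))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


class ThresholdModel:
    """Hypothetical stand-in for fitted_model so the sketch is self-contained."""

    def predict(self, rows):
        return [1 if row[0] > 0.5 else 0 for row in rows]


# Made-up demo data, not the factory dataset.
X_demo = [[0.9], [0.2], [0.7], [0.4], [0.1]]
y_demo = [1, 0, 0, 1, 0]
metrics = evaluate_holdout(ThresholdModel(), X_demo, y_demo)
print(metrics)
```

In the notebook the call would be `evaluate_holdout(fitted_model, X_test, y_test)`, assuming label 1 marks the class of interest.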


### Understand the model
Reference: [Understand automated ML models](https://docs.microsoft.com/ja-JP/azure/machine-learning/service/how-to-configure-auto-train#understand-automated-ml-models)

In [14]:
fitted_model.named_steps['datatransformer'].get_featurization_summary()

[{'RawFeatureName': 'ProcessA-Pressure',
  'TypeDetected': 'Numeric',
  'Dropped': 'No',
  'EngineeredFeatureCount': 1,
  'Tranformations': ['MeanImputer']},
 {'RawFeatureName': 'ProcessA-Humidity',
  'TypeDetected': 'Numeric',
  'Dropped': 'No',
  'EngineeredFeatureCount': 1,
  'Tranformations': ['MeanImputer']},
 {'RawFeatureName': 'ProcessA-Vibration',
  'TypeDetected': 'Numeric',
  'Dropped': 'No',
  'EngineeredFeatureCount': 1,
  'Tranformations': ['MeanImputer']},
 {'RawFeatureName': 'ProcessB-Light',
  'TypeDetected': 'Numeric',
  'Dropped': 'No',
  'EngineeredFeatureCount': 1,
  'Tranformations': ['MeanImputer']},
 {'RawFeatureName': 'ProcessB-Skill',
  'TypeDetected': 'Numeric',
  'Dropped': 'No',
  'EngineeredFeatureCount': 1,
  'Tranformations': ['MeanImputer']},
 {'RawFeatureName': 'ProcessB-Temp',
  'TypeDetected': 'Numeric',
  'Dropped': 'No',
  'EngineeredFeatureCount': 1,
  'Tranformations': ['MeanImputer']},
 {'RawFeatureName': 'ProcessB-Rotation',
  'TypeDetected': 'N
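
The featurization summary is a list of dictionaries, so it is easy to condense. The sketch below maps each raw feature to its transformations and collects any dropped features. `summarize_featurization` is a hypothetical helper, and the sample list copies only the first two entries of the output above. Treating `'Yes'` as the value for a dropped feature, and accepting a `'Transformations'` key as an alternative spelling, are both assumptions.

```python
def summarize_featurization(summary):
    """Map each raw feature to its transformations and list dropped features."""
    transformations = {}
    dropped = []
    for entry in summary:
        name = entry["RawFeatureName"]
        # The output above spells the key 'Tranformations'; the corrected
        # spelling is accepted too, as an assumption about other SDK versions.
        steps = entry.get("Tranformations", entry.get("Transformations", []))
        transformations[name] = list(steps)
        # Assumption: dropped features are flagged with 'Yes' (the output only shows 'No').
        if entry.get("Dropped") == "Yes":
            dropped.append(name)
    return transformations, dropped


# First two entries copied from the output above.
sample_summary = [
    {"RawFeatureName": "ProcessA-Pressure", "TypeDetected": "Numeric",
     "Dropped": "No", "EngineeredFeatureCount": 1, "Tranformations": ["MeanImputer"]},
    {"RawFeatureName": "ProcessA-Humidity", "TypeDetected": "Numeric",
     "Dropped": "No", "EngineeredFeatureCount": 1, "Tranformations": ["MeanImputer"]},
]
transformations, dropped = summarize_featurization(sample_summary)
print(transformations)
print(dropped)
```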

In [15]:
from pprint import pprint


def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            pprint(step[1].get_params())
            print()


print_model(fitted_model)

datatransformer
{'enable_feature_sweeping': None,
 'feature_sweeping_timeout': None,
 'featurization_config': None,
 'is_cross_validation': None,
 'is_onnx_compatible': None,
 'jasmine_client': None,
 'logger': None,
 'observer': None,
 'parent_run_id': 'AutoML_35685078-a4d6-46b0-b761-1cd0137037f1',
 'task': None}

MinMaxScaler
{'copy': True, 'feature_range': (0, 1)}

LightGBMClassifier
{'boosting_type': 'goss',
 'class_weight': None,
 'colsample_bytree': 0.7922222222222222,
 'importance_type': 'split',
 'learning_rate': 0.1,
 'max_bin': 170,
 'max_depth': 4,
 'min_child_samples': 168,
 'min_child_weight': 4,
 'min_split_gain': 0.8421052631578947,
 'n_estimators': 50,
 'n_jobs': 1,
 'num_leaves': 62,
 'objective': None,
 'random_state': None,
 'reg_alpha': 0.7894736842105263,
 'reg_lambda': 0.15789473684210525,
 'silent': True,
 'subsample': 1,
 'subsample_for_bin': 200000,
 'subsample_freq': 0,
 'verbose': -10}



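## 3. Model interpretation

The Azure Machine Learning Interpretability SDK is a model interpretation framework built from Microsoft's own libraries and major third-party libraries (LIME, SHAP, and others), and it exposes them through a unified API.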

In [16]:
# Collect the information needed for explanations from the Automated ML model
from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations
automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train, X_test=X_test, y=y_train, task='classification')



Current status: Setting up data for AutoMl explanations
Current status: Setting up the AutoML featurization for explanations
Current status: Setting up the AutoML estimator
Current status: Setting up the AutoML featurizer
Current status: Generating a feature map for raw feature importance
Current status: Finding all classes from the dataset
Current status: Data for AutoMl explanations successfully setup


In [17]:
# Use MimicWrapper to explain Automated ML models
from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
from azureml.explain.model.mimic_wrapper import MimicWrapper
explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, 
                         LGBMExplainableModel, 
                         init_dataset=automl_explainer_setup_obj.X_transform, run=best_run,
                         features=automl_explainer_setup_obj.engineered_feature_names, 
                         feature_maps=[automl_explainer_setup_obj.feature_map],
                         classes=automl_explainer_setup_obj.classes)
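
A mimic explainer works by training an interpretable surrogate model on the predictions of the original model and then reading feature importance from the surrogate. The cell above uses `LGBMExplainableModel` as that surrogate. The sketch below illustrates the same idea on synthetic data, swapping in a much simpler linear surrogate fitted with NumPy. The black-box function, the data, and all names are assumptions made for illustration and are unrelated to the factory model.

```python
import numpy as np


def black_box_predict(X):
    """Stand-in for an opaque model: a probability driven mostly by feature 0."""
    z = 3.0 * X[:, 0] + 0.5 * X[:, 1]  # feature 2 is deliberately ignored
    return 1.0 / (1.0 + np.exp(-z))


def fit_linear_surrogate(X, targets):
    """Least-squares linear model that mimics the black box's outputs."""
    design = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[:-1], coef[-1]


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
targets = black_box_predict(X)  # the surrogate learns predictions, not true labels

weights, intercept = fit_linear_surrogate(X, targets)
approx = X @ weights + intercept

# Fidelity: R^2 of the surrogate measured against the black box's outputs.
residual = np.sum((targets - approx) ** 2)
total = np.sum((targets - targets.mean()) ** 2)
fidelity = 1.0 - residual / total

# Global importance: coefficient magnitude scaled by each feature's spread.
importance = np.abs(weights) * X.std(axis=0)
ranking = np.argsort(importance)[::-1]
print(ranking, round(float(fidelity), 3))
```

The fidelity score matters: importances read from a surrogate are only as trustworthy as the surrogate's agreement with the original model.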

In [18]:
raw_explanations = explainer.explain(['local', 'global'], get_raw=True, 
                                     raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                     eval_dataset=automl_explainer_setup_obj.X_test_transform)
#print(raw_explanations.get_feature_importance_dict())

In [19]:
# Dashboard dedicated to global and local model explanations
from azureml.contrib.explain.model.visualize import ExplanationDashboard
ExplanationDashboard(raw_explanations, automl_explainer_setup_obj.automl_pipeline, automl_explainer_setup_obj.X_test_raw)

ExplanationWidget(value={'predictedY': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…

<azureml.contrib.explain.model.visualize.ExplanationDashboard.ExplanationDashboard at 0x123bb6e48>