# Quality-Control Modeling and Model Interpretation with Automated Machine Learning (Fast Remote Execution)

1. Preparation
    - Import the Python SDK
    - Connect to the Azure ML `Workspace`
    - Create an `Experiment`
    - Create and register a `Dataset`


2. Automated Machine Learning
    - Prepare the `Machine Learning Compute` target
    - Configure Automated ML
    - Train models and review the results


3. Model interpretation

## 1. Preparation
### Import the Python SDK
Import the Azure Machine Learning service Python SDK.

In [1]:
import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.train.automl import AutoMLConfig

In [2]:
# Check the SDK version
azureml.core.VERSION

'1.0.76'

Next, import the other libraries needed for the analysis.

In [3]:
import logging
from matplotlib import pyplot as plt
import pandas as pd
import os

### Connect to the Azure ML workspace
Connect to the Azure Machine Learning workspace. Azure Active Directory authentication is required.

In [4]:
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')

azureml	eastus	mlservice
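
`Workspace.from_config()` reads the workspace details from a `config.json` file, which it looks for starting from the current directory. As a minimal sketch, such a file could be generated as follows. The subscription ID is a placeholder, the resource group and workspace names are taken from the output above, and the file is written to a temporary directory purely for illustration.

```python
import json
import os
import tempfile

# Placeholder values: replace the subscription ID with your own.
config = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",  # placeholder
    "resource_group": "mlservice",
    "workspace_name": "azureml",
}

# Written to a temporary directory here; in a real project the file would
# normally sit next to the notebook or in an .azureml subfolder.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "config.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=4)

# Read the file back to confirm it is valid JSON with the expected keys.
with open(config_path) as f:
    loaded = json.load(f)
print(sorted(loaded))
```

The same file can also be downloaded from the workspace overview page in the Azure portal, which avoids typing the values by hand.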


### Set the experiment name
Azure Machine Learning provides a mechanism for tracking experiments. Automated ML records its metrics and logs through that mechanism automatically.

In [5]:
# choose a name for experiment
experiment_name = 'automl-classif-factoryQC-remote'
experiment=Experiment(ws, experiment_name)

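### Register the data as a Dataset

To run training on Azure Machine Learning compute (Machine Learning Compute), define the data in the Azure Machine Learning `Dataset` format. The cell below loads a dataset that is already registered under the name `factory`.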

In [6]:
dataset = Dataset.get_by_name(ws, name='factory')
dataset.take(5).to_pandas_dataframe()



| | ID | Quality | ProcessA-Pressure | ProcessA-Humidity | ProcessA-Vibration | ProcessB-Light | ProcessB-Skill | ProcessB-Temp | ProcessB-Rotation | ProcessC-Density | ProcessC-PH | ProcessC-skewness | ProcessC-Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.04 | 45.0 | 170.0 | 1.0 | 3.0 | 0.45 | 8.8 |
| 1 | 2 | 0 | 6.3 | 0.3 | 0.34 | 1.6 | 0.05 | 14.0 | 132.0 | 0.99 | 3.3 | 0.49 | 9.5 |
| 2 | 3 | 0 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 1.0 | 3.26 | 0.44 | 10.1 |
| 3 | 4 | 0 | 7.2 | 0.23 | 0.32 | 8.5 | 0.06 | 47.0 | 186.0 | 1.0 | 3.19 | 0.4 | 9.9 |
| 4 | 5 | 0 | 7.2 | 0.23 | 0.32 | 8.5 | 0.06 | 47.0 | 186.0 | 1.0 | 3.19 | 0.4 | 9.9 |


In [7]:
# Specify the target variable (label column)
label = 'Quality'

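### Split into training and test data

Split the dataset into training and test data. The test data is not used for model training; it is used later for the local explanations in the model interpretation section.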

In [8]:
train_dataset, test_dataset = dataset.random_split(0.8, seed=1234)
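
`random_split` divides the records randomly and only approximately by the given percentage, so the first split holds roughly, not exactly, 80% of the rows. The following plain-Python sketch illustrates that idea under the assumption that each record is assigned independently. It does not reproduce the SDK's actual implementation, and `approx_random_split` is a hypothetical helper.

```python
import random


def approx_random_split(records, percentage, seed=None):
    """Assign each record to the first split with probability `percentage`."""
    rng = random.Random(seed)
    first, second = [], []
    for record in records:
        if rng.random() < percentage:
            first.append(record)
        else:
            second.append(record)
    return first, second


rows = list(range(1000))  # toy stand-in for dataset rows
train_rows, test_rows = approx_random_split(rows, 0.8, seed=1234)
print(len(train_rows), len(test_rows))

# The same seed reproduces the same split.
train_again, _ = approx_random_split(rows, 0.8, seed=1234)
print(train_rows == train_again)
```

Fixing the seed, as the notebook does with `seed=1234`, keeps the training and test sets stable across reruns.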

In [9]:
train_dataset = train_dataset.register(workspace = ws, name = 'FactoryTrain', description = 'Factory AutoML workshop', create_new_version=True)
test_dataset = test_dataset.register(workspace = ws, name = 'FactoryTest', description = 'Factory AutoML workshop', create_new_version=True)

### Set up the compute target (Machine Learning Compute)

In [10]:
# Assumes a Machine Learning Compute cluster named cpucluster has already been created
from azureml.core.compute import ComputeTarget
compute_target = ComputeTarget(ws, "cpucluster")

## 2. Automated Machine Learning
### Training configuration

In [11]:
automl_settings = {
    "iteration_timeout_minutes": 5,  # maximum run time per iteration, in minutes
    "iterations": 10,  # number of machine learning pipelines to try
    #"max_concurrent_iterations": 4,
    "max_cores_per_iteration": -1,
    "n_cross_validations": 3,
    "primary_metric": 'accuracy',  # metric used to evaluate the models
    "preprocess": True,
    "enable_voting_ensemble": False,
    "enable_stack_ensemble": False
}

automl_config = AutoMLConfig(task = 'classification',
                             training_data = train_dataset,
                             label_column_name = label,                             
                             #compute_target=compute_target,
                             #model_explainability = True,
                             **automl_settings
                            )
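
With `n_cross_validations` set to 3, each candidate pipeline is scored by 3-fold cross-validation on the training data, using `accuracy` as the metric. As a minimal sketch of how k-fold index splitting and score averaging work in general: `k_fold_indices`, `accuracy`, and the majority-class "model" below are hypothetical illustrations on toy labels, not the AutoML internals.

```python
from collections import Counter


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


labels = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]  # toy labels, not the factory data

fold_scores = []
for train_idx, valid_idx in k_fold_indices(len(labels), 3):
    # "Train" a trivial model: always predict the majority class of the training fold.
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    y_valid = [labels[i] for i in valid_idx]
    fold_scores.append(accuracy(y_valid, [majority] * len(y_valid)))

mean_accuracy = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_accuracy)
```

Averaging over folds gives a steadier estimate than a single hold-out split, at the cost of training each pipeline once per fold.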

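### Run and review the results

When the run is submitted to the remote compute target, the very first run builds a Docker image, so it takes about 20 to 30 minutes. You can watch the Docker image build from Azure Machine Learning studio. With `compute_target` commented out as in the configuration above, the run executes on the local machine, as the output below shows.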

In [12]:
remote_run = experiment.submit(automl_config, show_output = True)

Running on local machine


Parent Run ID: AutoML_23cbe687-7d8b-411e-a568-693ddae43165

Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS SUMMARY:
For more details, use API: run.get_guardrails()

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Classes are balanced in the training data.

TYPE:         Missing values imputation
STATUS:       PASSED
DESCRIPTION:  There were no missing values found in the training data.

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high card
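
The guardrail output above reports checks for class balance, missing values, and high-cardinality features. The following is a rough sketch of the first two kinds of checks in plain Python. The 20% minority-to-majority threshold, the helper names, and the toy rows (including the deliberately missing value) are assumptions made for illustration, not the rules AutoML applies.

```python
from collections import Counter


def class_balance_check(labels, min_ratio=0.2):
    """Compare the smallest class count with the largest (threshold is an assumption)."""
    counts = Counter(labels)
    ratio = min(counts.values()) / max(counts.values())
    return {"counts": dict(counts), "ratio": ratio, "passed": ratio >= min_ratio}


def missing_value_check(rows):
    """Count None values per column; the check passes when nothing is missing."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                missing[column] += 1
    return {"missing": dict(missing), "passed": not missing}


# Toy rows; the None value is inserted on purpose to trigger the check.
rows = [
    {"ProcessA-Pressure": 7.0, "ProcessC-PH": 3.0, "Quality": 0},
    {"ProcessA-Pressure": 6.3, "ProcessC-PH": None, "Quality": 0},
    {"ProcessA-Pressure": 8.1, "ProcessC-PH": 3.26, "Quality": 1},
]

balance = class_balance_check([row["Quality"] for row in rows])
missing = missing_value_check(rows)
print(balance)
print(missing)
```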

In [13]:
# Review the results in the widget
from azureml.widgets import RunDetails
RunDetails(remote_run).show()


In [14]:
# Print the detailed log
#remote_run.get_details()

In [15]:
#remote_run.get_guardrails()

In [16]:
best_run, fitted_model = remote_run.get_output()
best_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| automl-classif-factoryQC-remote | AutoML_23cbe687-7d8b-411e-a568-693ddae43165_0 | | Completed | Link to Azure Machine Learning studio | Link to Documentation |


### Understanding the machine learning model

In [17]:
pd.DataFrame(fitted_model.named_steps['datatransformer'].get_engineered_feature_names())

| | 0 |
|---|---|
| 0 | ID_MeanImputer |
| 1 | ProcessA-Pressure_MeanImputer |
| 2 | ProcessA-Humidity_MeanImputer |
| 3 | ProcessA-Vibration_MeanImputer |
| 4 | ProcessB-Light_MeanImputer |
| 5 | ProcessB-Skill_MeanImputer |
| 6 | ProcessB-Temp_MeanImputer |
| 7 | ProcessB-Rotation_MeanImputer |
| 8 | ProcessC-Density_MeanImputer |
| 9 | ProcessC-PH_MeanImputer |


In [18]:
pd.DataFrame.from_records(fitted_model.named_steps['datatransformer'].get_featurization_summary())

| | Dropped | EngineeredFeatureCount | RawFeatureName | Transformations | TypeDetected |
|---|---|---|---|---|---|
| 0 | No | 1 | ID | [MeanImputer] | Numeric |
| 1 | No | 1 | ProcessA-Pressure | [MeanImputer] | Numeric |
| 2 | No | 1 | ProcessA-Humidity | [MeanImputer] | Numeric |
| 3 | No | 1 | ProcessA-Vibration | [MeanImputer] | Numeric |
| 4 | No | 1 | ProcessB-Light | [MeanImputer] | Numeric |
| 5 | No | 1 | ProcessB-Skill | [MeanImputer] | Numeric |
| 6 | No | 1 | ProcessB-Temp | [MeanImputer] | Numeric |
| 7 | No | 1 | ProcessB-Rotation | [MeanImputer] | Numeric |
| 8 | No | 1 | ProcessC-Density | [MeanImputer] | Numeric |
| 9 | No | 1 | ProcessC-PH | [MeanImputer] | Numeric |
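
The featurization summary shows that each numeric column went through a `MeanImputer` transformation. As a sketch of the general idea (not the AutoML transformer itself), mean imputation replaces each missing value with the mean of the observed values in the same column. `mean_impute` is a hypothetical helper, and the values are toy data with gaps inserted on purpose.

```python
import math


def mean_impute(values):
    """Replace None/NaN entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None or math.isnan(v) else v for v in values]


pressure = [7.0, 6.3, None, 7.2, float("nan")]  # toy column with two gaps
imputed = mean_impute(pressure)
print(imputed)
```

When a column has no missing values, as the guardrails reported for this training data, the imputer leaves the column unchanged.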


## 3. Model interpretation
Azure Machine Learning provides a mechanism for interpreting Automated ML models. For details, see [How to interpret your model](https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-machine-learning-interpretability#how-to-interpret-your-model).

In [19]:
# Convert to pandas DataFrames
train_df = train_dataset.to_pandas_dataframe()
test_df = test_dataset.to_pandas_dataframe()

In [20]:
from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations
automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, 
                                                             X=train_df.drop([label], axis=1), 
                                                             X_test=test_df.drop([label], axis=1), 
                                                             y=train_df[label].values, 
                                                             task='classification')

Current status: Setting up data for AutoML explanations
Current status: Setting up the AutoML featurizer
Current status: Setting up the AutoML featurization for explanations
Current status: Setting up the AutoML estimator
Current status: Generating a feature map for raw feature importance
Current status: Finding all classes from the dataset
Current status: Data for AutoML explanations successfully setup


`MimicWrapper` provides global model interpretation through a **global surrogate model**. LightGBM is used as the surrogate here.

In [21]:
from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
from azureml.explain.model.mimic_wrapper import MimicWrapper
explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, 
                         init_dataset=automl_explainer_setup_obj.X_transform, run=best_run,
                         features=automl_explainer_setup_obj.engineered_feature_names, 
                         feature_maps=[automl_explainer_setup_obj.feature_map],
                         classes=automl_explainer_setup_obj.classes)
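
A global surrogate works by training a simple, interpretable model to reproduce the *predictions* of the model being explained, and then reading the explanation off the simple model. The following stdlib-only sketch assumes a toy black-box function and uses a one-feature threshold rule as the surrogate, where the notebook uses LightGBM instead. All names below are hypothetical.

```python
import random


def black_box(row):
    """Stand-in for an opaque model: predicts 1 when a weighted score is high."""
    return int(0.9 * row[0] + 0.1 * row[1] > 0.5)


def fit_stump(X, y_hat):
    """Find the single-feature threshold rule that best mimics y_hat."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            preds = [int(row[feature] > threshold) for row in X]
            fidelity = sum(p == t for p, t in zip(preds, y_hat)) / len(y_hat)
            if best is None or fidelity > best["fidelity"]:
                best = {"feature": feature, "threshold": threshold, "fidelity": fidelity}
    return best


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]

# The surrogate is fitted to the black box's predictions, not to true labels.
y_hat = [black_box(row) for row in X]
surrogate = fit_stump(X, y_hat)
print(surrogate)
```

The `fidelity` value measures how closely the surrogate agrees with the black box; an explanation read from a low-fidelity surrogate says little about the original model.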

### Engineered Explanation
Interpret the model using the features generated by feature engineering.

In [22]:
# Compute the engineered explanations
engineered_explanations = explainer.explain(['local', 'global'],get_raw=False,
                                            eval_dataset=automl_explainer_setup_obj.X_test_transform)

In [23]:
# Display the dashboard
from interpret_community.widget import ExplanationDashboard
ExplanationDashboard(engineered_explanations, 
                     automl_explainer_setup_obj.automl_estimator, 
                     datasetX=automl_explainer_setup_obj.X_test_transform)


### Raw Explanation
Interpret the model using the original features, before feature engineering.

In [24]:
raw_explanations = explainer.explain(['local', 'global'], get_raw=True, 
                                     raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                     eval_dataset=automl_explainer_setup_obj.X_test_transform)
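
Raw explanations relate importance values computed on engineered features back to the original columns through a feature map. One simple way to picture this, shown below as a sketch under stated assumptions, is to sum the importances of all engineered features derived from the same raw column, and to take the mean absolute local importance as the global importance. The helper names, the `Line` column with its one-hot features, and all numbers are hypothetical and are not taken from this notebook's results or from the library's implementation.

```python
def to_raw_importances(local_engineered, feature_map):
    """Sum engineered-feature importances per raw feature, row by row."""
    raw_names = sorted(set(feature_map.values()))
    raw_rows = []
    for row in local_engineered:
        totals = dict.fromkeys(raw_names, 0.0)
        for engineered_name, value in row.items():
            totals[feature_map[engineered_name]] += value
        raw_rows.append(totals)
    return raw_rows


def global_importance(local_rows):
    """Mean absolute local importance per feature."""
    names = local_rows[0].keys()
    return {name: sum(abs(row[name]) for row in local_rows) / len(local_rows)
            for name in names}


# Engineered feature name -> raw feature name (the "Line" entries are hypothetical).
feature_map = {
    "ProcessB-Temp_MeanImputer": "ProcessB-Temp",
    "Line_OneHot_A": "Line",
    "Line_OneHot_B": "Line",
}

# Toy local importances for two samples (made-up numbers).
local_engineered = [
    {"ProcessB-Temp_MeanImputer": 0.30, "Line_OneHot_A": 0.10, "Line_OneHot_B": -0.05},
    {"ProcessB-Temp_MeanImputer": -0.20, "Line_OneHot_A": 0.00, "Line_OneHot_B": 0.15},
]

local_raw = to_raw_importances(local_engineered, feature_map)
global_raw = global_importance(local_raw)
print(local_raw)
print(global_raw)
```

In this notebook every raw column maps to exactly one `MeanImputer` feature, so the raw and engineered views should look very similar; the distinction matters more when one column expands into several engineered features.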

In [25]:
# Display the dashboard
from interpret_community.widget import ExplanationDashboard
ExplanationDashboard(raw_explanations, 
                     automl_explainer_setup_obj.automl_pipeline, 
                     datasetX=automl_explainer_setup_obj.X_test_raw)
