# Automobile Price Prediction and Model Interpretation with Automated Machine Learning

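This notebook builds a model that predicts the price of a used car from its attribute data. It covers the following steps:
- Importing the Python SDK
- Connecting to the Azure ML service Workspace
- Creating an Experiment
- Preparing the data
- Configuring automated machine learning
- Training the model and reviewing the results
- Interpreting the model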

## 1. Setup
### Importing the Python SDK
Import the Python SDK for Azure Machine Learning service.

In [1]:
import logging

from matplotlib import pyplot as plt
import pandas as pd
import os

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.train.automl import AutoMLConfig




### Connecting to the Azure ML workspace
Connect to the Azure Machine Learning service workspace. This step requires authentication against Azure.

In [2]:
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')

azureml	eastus	mlservice


### Setting the experiment name

In [3]:
# Choose a name for the experiment
experiment_name = 'automl-regression-automobile'
experiment = Experiment(ws, experiment_name)

### Preparing the training data

In [4]:
df = Dataset.get(ws, name='automobile').to_pandas_dataframe()

In [5]:
from sklearn.model_selection import train_test_split

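# Separate the features from the target column "price"
X = df.drop(columns=["price"])
y = df["price"].values

# Hold out 10% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=100)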
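
The split above reserves 10% of the rows for testing. As a rough illustration of what that means, the sketch below does a seeded shuffle-and-split in plain Python. The helper name `holdout_split` is hypothetical, and it does not reproduce the exact rows that `train_test_split` selects; it only follows the same idea, with the test size rounded up.

```python
import math
import random


def holdout_split(rows, test_size=0.1, seed=100):
    """Shuffle rows with a fixed seed and split off a test fraction.

    Hypothetical helper for illustration only; it mirrors the idea of
    train_test_split, not its exact row selection.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * len(rows))  # test size is rounded up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# 20 dummy rows stand in for the automobile data
train_rows, test_rows = holdout_split(list(range(20)), test_size=0.1)
print(len(train_rows), len(test_rows))
```

Fixing the seed keeps the split reproducible across runs, which is the role `random_state=100` plays in the cell above.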

## 2. Automated Machine Learning
### Training configuration

In [6]:
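automl_settings = {
    "iteration_timeout_minutes": 5,   # time limit for each iteration
    "iterations": 5,                  # number of pipelines to try
    "n_cross_validations": 2,         # number of cross-validation folds
    "primary_metric": 'normalized_mean_absolute_error',  # metric to optimize
    "preprocess": True,               # enable automatic featurization
    "enable_voting_ensemble": False,  # skip the voting ensemble iteration
    "enable_stack_ensemble": False    # skip the stacking ensemble iteration
}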

automl_config = AutoMLConfig(task = 'regression',
                             X = X_train,
                             y = y_train,
                             **automl_settings
                            )
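
The primary metric chosen above is `normalized_mean_absolute_error`. The sketch below shows how such a value can be computed by hand, assuming the metric is the mean absolute error divided by the range of the true target values, which is how the Azure ML documentation describes it. The function name `normalized_mae` and the sample prices are made up for illustration; this is not the SDK's implementation.

```python
def normalized_mae(y_true, y_pred):
    """Mean absolute error divided by the range of the true values.

    Assumed definition for illustration; not the SDK's own code.
    """
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("the range of y_true is zero; the metric is undefined")
    return mae / value_range


# Made-up prices, only to exercise the function
example_true = [10000.0, 15000.0, 20000.0]
example_pred = [11000.0, 14000.0, 20500.0]
print(normalized_mae(example_true, example_pred))
```

Dividing by the range makes the error comparable across targets with different scales; lower is better.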

### Running the experiment and reviewing the results

In [7]:
local_run = experiment.submit(automl_config, show_output = True)

Running on local machine
Parent Run ID: AutoML_c477d7ec-6000-4ac2-b554-7690d1de094c
Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS SUMMARY:
For more details, use API: run.get_guardrails()

TYPE:         Missing Values Imputation
STATUS:       FIXED
DESCRIPTION:  The training data had the following missing values which were resolved.

Please review your data source for data quality issues and possibly filter out the rows with these missing values.

If the missing values are expected, you can either accept the above imputation, or implement

In [8]:
from azureml.widgets import RunDetails
RunDetails(local_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [9]:
# Print the detailed run information
local_run.get_details()

{'runId': 'AutoML_c477d7ec-6000-4ac2-b554-7690d1de094c',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2019-09-23T00:41:54.906407Z',
 'endTimeUtc': '2019-09-23T00:44:01.359189Z',
 'properties': {'num_iterations': '5',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'normalized_mean_absolute_error',
  'train_split': '0',
  'MaxTimeSeconds': '300',
  'acquisition_parameter': '0',
  'num_cross_validation': '2',
  'target': 'local',
  'DataPrepJsonString': None,
  'EnableSubsampling': 'False',
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'regression',
  'dependencies_versions': '{"azureml-widgets": "1.0.62", "azureml-train": "1.0.62", "azureml-train-restclients-hyperdrive": "1.0.62", "azureml-train-core": "1.0.62", "azureml-train-automl": "1.0.62", "azureml-telemetry": "1.0.62", "azureml-sdk": "1.0.62", "azureml-pipeline": "1.0.62", "azureml-pipeline-steps": "1.0.62", "azureml-pipeline-core": "1.0.62"

In [10]:
best_run, fitted_model = local_run.get_output()
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
automl-regression-automobile,AutoML_c477d7ec-6000-4ac2-b554-7690d1de094c_4,,Completed,Link to Azure Portal,Link to Documentation
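
`get_output()` returns the best child run together with its fitted pipeline. A natural next step is to check that pipeline against the held-out rows. The sketch below is one way to do so; the helper `evaluate_regressor` is hypothetical and works with any object that exposes `predict`. A small stub stands in for `fitted_model` so the block runs without a workspace.

```python
import math


def evaluate_regressor(model, X, y_true):
    """Return MAE and RMSE of model.predict(X) against y_true."""
    y_pred = list(model.predict(X))
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


class ConstantModel:
    """Stand-in for fitted_model: always predicts the same price."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


# Made-up rows and prices, only to exercise the helper
metrics = evaluate_regressor(ConstantModel(10000.0), [[0], [1]], [9000.0, 12000.0])
print(metrics)
```

With the objects from this notebook, the call would presumably be `evaluate_regressor(fitted_model, X_test, y_test)`.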


### Understanding the model

In [11]:
fitted_model.named_steps['datatransformer'].get_featurization_summary()

[{'RawFeatureName': 'symboling',
  'TypeDetected': 'Categorical',
  'Dropped': 'No',
  'EngineeredFeatureCount': 6,
  'Tranformations': ['StringCast-CharGramCountVectorizer']},
 {'RawFeatureName': 'fuel-type',
  'TypeDetected': 'Categorical',
  'Dropped': 'No',
  'EngineeredFeatureCount': 1,
  'Tranformations': ['ModeCatImputer-StringCast-LabelEncoder']},
 {'RawFeatureName': 'aspiration',
  'TypeDetected': 'Categorical',
  'Dropped': 'No',
  'EngineeredFeatureCount': 1,
  'Tranformations': ['ModeCatImputer-StringCast-LabelEncoder']},
 {'RawFeatureName': 'num-of-doors',
  'TypeDetected': 'Categorical',
  'Dropped': 'No',
  'EngineeredFeatureCount': 3,
  'Tranformations': ['StringCast-CharGramCountVectorizer']},
 {'RawFeatureName': 'body-style',
  'TypeDetected': 'Categorical',
  'Dropped': 'No',
  'EngineeredFeatureCount': 5,
  'Tranformations': ['StringCast-CharGramCountVectorizer']},
 {'RawFeatureName': 'drive-wheels',
  'TypeDetected': 'Categorical',
  'Dropped': 'No',
  'EngineeredF
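
The featurization summary is a list of dictionaries, so it can be condensed into a compact table. The sketch below assumes the keys shown in the output above (including the key spelled `Tranformations`, exactly as the output prints it) and uses the first three entries of that output as sample input. The helper name `summarize_featurization` is hypothetical.

```python
def summarize_featurization(summary):
    """Return (rows, total): one row per raw feature plus the total
    number of engineered features."""
    rows = [
        (
            entry["RawFeatureName"],
            entry["EngineeredFeatureCount"],
            ", ".join(entry.get("Tranformations", [])),
        )
        for entry in summary
    ]
    total = sum(count for _, count, _ in rows)
    return rows, total


# First three entries copied from the output above
sample_summary = [
    {"RawFeatureName": "symboling", "TypeDetected": "Categorical", "Dropped": "No",
     "EngineeredFeatureCount": 6,
     "Tranformations": ["StringCast-CharGramCountVectorizer"]},
    {"RawFeatureName": "fuel-type", "TypeDetected": "Categorical", "Dropped": "No",
     "EngineeredFeatureCount": 1,
     "Tranformations": ["ModeCatImputer-StringCast-LabelEncoder"]},
    {"RawFeatureName": "aspiration", "TypeDetected": "Categorical", "Dropped": "No",
     "EngineeredFeatureCount": 1,
     "Tranformations": ["ModeCatImputer-StringCast-LabelEncoder"]},
]

rows, total = summarize_featurization(sample_summary)
for name, count, transforms in rows:
    print(f"{name:<15}{count:>3}  {transforms}")
print("total engineered features:", total)
```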

In [12]:
from pprint import pprint


def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            pprint(step[1].get_params())
            print()


print_model(fitted_model)

datatransformer
{'enable_feature_sweeping': None,
 'feature_sweeping_timeout': None,
 'is_onnx_compatible': None,
 'logger': None,
 'observer': None,
 'task': None}

StandardScalerWrapper
{'class_name': 'StandardScaler',
 'copy': True,
 'module_name': 'sklearn.preprocessing.data',
 'with_mean': False,
 'with_std': False}

DecisionTreeRegressor
{'criterion': 'friedman_mse',
 'max_depth': None,
 'max_features': 0.9,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 0.006056302831963706,
 'min_samples_split': 0.015297321160913582,
 'min_weight_fraction_leaf': 0.0,
 'presort': False,
 'random_state': None,
 'splitter': 'best'}



## 3. Model interpretation

In [13]:
from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations

automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train, X_test=X_test, y=y_train, task='regression')

Current status: Setting up data for AutoMl explanations
Current status: Setting up the AutoML featurization for explanations
Current status: Setting up the AutoML estimator
Current status: Setting up the AutoML featurizer
Current status: Generating a feature map for raw feature importance
Current status: Data for AutoMl explanations successfully setup


In [14]:
from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
from azureml.explain.model.mimic_wrapper import MimicWrapper
explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, 
                         init_dataset=automl_explainer_setup_obj.X_transform, run=best_run,
                         features=automl_explainer_setup_obj.engineered_feature_names, 
                         feature_maps=[automl_explainer_setup_obj.feature_map],
                         classes=automl_explainer_setup_obj.classes)

Using older than supported version of lightgbm, please upgrade to version greater than 2.2.1


In [15]:
raw_explanations = explainer.explain(['local', 'global'], get_raw=True, 
                                     raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                     eval_dataset=automl_explainer_setup_obj.X_test_transform)
#print(raw_explanations.get_feature_importance_dict())
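
The commented-out line above would print a dictionary that maps feature names to importance values. A small helper can rank such a dictionary so the most influential features stand out. The sketch below assumes that dictionary shape; the helper name `top_features` is hypothetical, and the importance values are made up purely for illustration and are not results from this run.

```python
def top_features(importance, k=3):
    """Return the k features with the largest absolute importance."""
    ranked = sorted(importance.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Made-up importance values; NOT output from this notebook
example_importance = {
    "symboling": 120.0,
    "fuel-type": 45.5,
    "aspiration": 310.2,
    "body-style": 80.0,
}

for name, score in top_features(example_importance, k=2):
    print(f"{name}: {score}")
```

With the notebook objects, the input would presumably come from `raw_explanations.get_feature_importance_dict()`.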

In [16]:
from azureml.contrib.explain.model.visualize import ExplanationDashboard
ExplanationDashboard(raw_explanations, automl_explainer_setup_obj.automl_pipeline, automl_explainer_setup_obj.X_test_raw)

ExplanationWidget(value={'predictedY': [7397.666666666667, 8376.5, 10470.0, 10787.5, 7712.0, 8467.0, 18785.0, …

<azureml.contrib.explain.model.visualize.ExplanationDashboard.ExplanationDashboard at 0x14341e160>