Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Automated Machine Learning (AutoML)
_**Classification and deployment with the Bank Marketing dataset**_

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Train](#Train)
1. [Results](#Results)
1. [Deploy](#Deploy)
1. [Test](#Test)
1. [Acknowledgements](#Acknowledgements)

## Introduction

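In this example we use the UCI Bank Marketing dataset to show how AutoML can be used for a classification problem and how the resulting model can be deployed to an Azure Container Instance (ACI). The classification goal is to predict whether a client will subscribe to a term deposit with the bank.

If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, if you have not yet established a connection to your AzureML Workspace, go through the [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) notebook first.

The ONNX documentation is available [here](https://github.com/onnx/onnx).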

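In this notebook you will learn how to:
1. Create an experiment using an existing workspace.
1. Configure AutoML using `AutoMLConfig`.
1. Train the model with ONNX-compatible models enabled (this notebook submits the run to an AmlCompute cluster).
1. Explore the results and the featurization transparency options, and save the ONNX model.
1. Run inference with the ONNX model.
1. Register the model.
1. Create a container image.
1. Create an Azure Container Instance (ACI) service.
1. Test the ACI service.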

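In addition, this notebook showcases the following features:
- **Blocking** certain pipelines
- Specifying a **target metric** as the stopping criterion
- Handling **missing data** in the input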

## Setup

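As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you also need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments.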

In [41]:
import logging

from matplotlib import pyplot as plt
import pandas as pd
import os

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.automl.core.featurization import FeaturizationConfig
from azureml.core.dataset import Dataset
from azureml.train.automl import AutoMLConfig
from azureml.interpret import ExplanationClient

This sample notebook may use features that are not available in earlier versions of the Azure ML SDK.

In [42]:
print("This notebook was created using version 1.32.0 of the Azure ML SDK")
print("You are currently using version", azureml.core.VERSION, "of the Azure ML SDK")

This notebook was created using version 1.32.0 of the Azure ML SDK
You are currently using version 1.32.0 of the Azure ML SDK


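Accessing the Azure ML workspace requires authentication with Azure.

The default is interactive authentication against the default tenant. Executing the `ws = Workspace.from_config()` line in the cell below prompts for authentication the first time it is run.

If you have multiple Azure tenants, you can specify the tenant by replacing the `ws = Workspace.from_config()` line in the cell below with the following: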

```
from azureml.core.authentication import InteractiveLoginAuthentication
auth = InteractiveLoginAuthentication(tenant_id = 'mytenantid')
ws = Workspace.from_config(auth = auth)
```

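If you need to run in an environment where interactive login is not possible, you can use service principal authentication by replacing the `ws = Workspace.from_config()` line in the cell below with the following: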

```
from azureml.core.authentication import ServicePrincipalAuthentication
auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')
ws = Workspace.from_config(auth = auth)
```
For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth).
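
In non-interactive environments it is common to keep the service principal credentials out of the notebook itself. The sketch below reads them from environment variables. The variable names (`AML_TENANT_ID`, `AML_APP_ID`, `AML_APP_SECRET`) and the helper `load_sp_credentials` are hypothetical and chosen only for illustration.

```python
import os


def load_sp_credentials(env=None):
    """Collect service principal settings from environment variables.

    The environment variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    names = {
        "tenant_id": "AML_TENANT_ID",
        "service_principal_id": "AML_APP_ID",
        "service_principal_password": "AML_APP_SECRET",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Demonstration with a fake environment; real values would come from os.environ.
fake_env = {
    "AML_TENANT_ID": "mytenantid",
    "AML_APP_ID": "myappid",
    "AML_APP_SECRET": "mypassword",
}
creds = load_sp_credentials(fake_env)
print(sorted(creds))
```

The dictionary keys match the keyword arguments of `ServicePrincipalAuthentication`, so the result could be passed as `ServicePrincipalAuthentication(**creds)`.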

In [43]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'automl-classification-bmarketing-all'

experiment=Experiment(ws, experiment_name)

output = {}
output['Subscription ID'] = ws.subscription_id
output['Workspace'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Experiment Name'] = experiment.name
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 is deprecated in recent pandas
outputDf = pd.DataFrame(data = output, index = [''])
outputDf.T

Unnamed: 0,Unnamed: 1
Subscription ID,1bab3e78-2764-4555-845d-297f7a7ca7c0
Workspace,amlwsea
Resource Group,aml
Location,eastasia
Experiment Name,automl-classification-bmarketing-all


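## Create or Attach existing AmlCompute
You need a compute target for your AutoML run. In this tutorial, you create an AmlCompute cluster as the training compute resource.

> Note that if you have the AzureML Data Scientist role, you do not have permission to create compute resources. If the compute target described in this section does not exist yet, ask your workspace or IT admin to create it.

#### Creation of AmlCompute takes approximately 5 minutes.
If an AmlCompute with that name already exists in your workspace, this code skips the creation process.
As with other Azure services, there are limits on certain resources (such as AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.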

In [44]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
# cpu_cluster_name = "cpu-cluster-4"
cpu_cluster_name = "zhzhen1"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',
                                                           max_nodes=6)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Found existing cluster, use it.

Running


# Data

### Load Data

Use Azure compute to load the Bank Marketing dataset as a tabular dataset into the dataset variable.

### Training Data

In [45]:
data = pd.read_csv("https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv")
data.head()

Unnamed: 0,age,job,marital,education,default,housing,loan,contact,month,day_of_week,...,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y
0,57,technician,married,high.school,no,no,yes,cellular,may,mon,...,1,999,1,failure,-1.8,92.89,-46.2,1.3,5099.1,no
1,55,unknown,married,unknown,unknown,yes,no,telephone,may,thu,...,2,999,0,nonexistent,1.1,93.99,-36.4,4.86,5191.0,no
2,33,blue-collar,married,basic.9y,no,no,no,cellular,may,fri,...,1,999,1,failure,-1.8,92.89,-46.2,1.31,5099.1,no
3,36,admin.,married,high.school,no,no,no,telephone,jun,fri,...,4,999,0,nonexistent,1.4,94.47,-41.8,4.97,5228.1,no
4,27,housemaid,married,high.school,no,yes,no,cellular,jul,fri,...,2,999,0,nonexistent,1.4,93.92,-42.7,4.96,5228.1,no


In [46]:
# Add missing values in 75% of the rows.
import numpy as np

missing_rate = 0.75
n_missing_samples = int(np.floor(data.shape[0] * missing_rate))
# Use the built-in bool: the np.bool alias was removed in newer NumPy releases.
missing_samples = np.hstack((np.zeros(data.shape[0] - n_missing_samples, dtype=bool), np.ones(n_missing_samples, dtype=bool)))
rng = np.random.RandomState(0)
rng.shuffle(missing_samples)
missing_features = rng.randint(0, data.shape[1], n_missing_samples)
data.values[np.where(missing_samples)[0], missing_features] = np.nan
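
The cell above is meant to blank out one randomly chosen column in 75% of the rows. The same idea can be sketched in plain Python on a small list of rows. `inject_missing` is a hypothetical helper written for this illustration and is not part of the notebook or the SDK.

```python
import math
import random


def inject_missing(rows, missing_rate, seed=0):
    """Return a copy of `rows` where a fraction of the rows has one cell set to None."""
    rng = random.Random(seed)
    n_rows = len(rows)
    n_missing = math.floor(n_rows * missing_rate)
    # Choose which rows receive a missing value, without repetition.
    chosen_rows = rng.sample(range(n_rows), n_missing)
    result = [list(row) for row in rows]
    for r in chosen_rows:
        # Blank out one randomly chosen column in the selected row.
        c = rng.randrange(len(result[r]))
        result[r][c] = None
    return result


rows = [[i, i * 10, "x"] for i in range(8)]
damaged = inject_missing(rows, missing_rate=0.75)
rows_with_gap = sum(1 for row in damaged if None in row)
print(rows_with_gap)  # 6 of the 8 rows contain one missing cell
```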

In [47]:
if not os.path.isdir('data'):
    os.mkdir('data')
    
# Save the training data to a CSV file so it can be uploaded to the datastore
pd.DataFrame(data).to_csv("data/train_data.csv", index=False)

ds = ws.get_default_datastore()
ds.upload(src_dir='./data', target_path='bankmarketing', overwrite=True, show_progress=True)

 

# Register the uploaded training data as a tabular dataset so remote compute can access it during training
train_data = Dataset.Tabular.from_delimited_files(path=ds.path('bankmarketing/train_data.csv'))
label = "y"

Uploading an estimated of 1 files
Uploading ./data/train_data.csv
Uploaded ./data/train_data.csv, 1 files out of an estimated total of 1
Uploaded 1 files


### Validation Data

In [48]:
validation_data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv"
validation_dataset = Dataset.Tabular.from_delimited_files(validation_data)

### Test Data

In [49]:
test_data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_test.csv"
test_dataset = Dataset.Tabular.from_delimited_files(test_data)

## Train

Instantiate an `AutoMLConfig` object. It defines the settings and data used to run the experiment.

|Property|Description|
|-|-|
|**task**|classification or regression or forecasting|
|**primary_metric**|The metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|
|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|
|**blocked_models** | *List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run. <br><br> Allowed values for **Classification**<br><i>LogisticRegression</i><br><i>SGD</i><br><i>MultinomialNaiveBayes</i><br><i>BernoulliNaiveBayes</i><br><i>SVM</i><br><i>LinearSVM</i><br><i>KNN</i><br><i>DecisionTree</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>GradientBoosting</i><br><i>TensorFlowDNN</i><br><i>TensorFlowLinearClassifier</i><br><br>Allowed values for **Regression**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><br>Allowed values for **Forecasting**<br><i>ElasticNet</i><br><i>GradientBoosting</i><br><i>DecisionTree</i><br><i>KNN</i><br><i>LassoLars</i><br><i>SGD</i><br><i>RandomForest</i><br><i>ExtremeRandomTrees</i><br><i>LightGBM</i><br><i>TensorFlowLinearRegressor</i><br><i>TensorFlowDNN</i><br><i>Arima</i><br><i>Prophet</i>|
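|**allowed_models** | *List* of *strings* indicating machine learning algorithms for AutoML to use in this run. The values listed above for **blocked_models** are also the allowed values for **allowed_models**.|
|**experiment_exit_score**| Target value for *primary_metric*. <br>Once the target is surpassed, the run terminates.|
|**experiment_timeout_hours**| Maximum time in hours that all iterations combined can take before the experiment terminates.|
|**enable_early_stopping**| Flag to enable early termination if the score is not improving in the short term.|
|**featurization**| 'auto' / 'off'  Indicates whether the featurization step should be done automatically. Note: if the input data is sparse, featurization cannot be turned on.|
|**n_cross_validations**| Number of cross-validation splits.|
|**training_data**| Input dataset, containing both the features and the label column.|
|**label_column_name**| The name of the label column.|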

**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)
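
For intuition about an AUC-based primary metric: in a binary problem such as this one, AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly. It is an illustration only, not AutoML's implementation; `AUC_weighted` additionally averages the per-class AUC values weighted by class frequency.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, scores))  # 0.75
```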

In [50]:
automl_settings = {
    "experiment_timeout_hours" : 0.3,
    "enable_early_stopping" : True,
    "iteration_timeout_minutes": 5,
    "max_concurrent_iterations": 4,
    "max_cores_per_iteration": -1,
    #"n_cross_validations": 2,
    "primary_metric": 'AUC_weighted',
    "featurization": 'auto',
    "verbosity": logging.INFO,
}

automl_config = AutoMLConfig(task = 'classification',
                             debug_log = 'automl_errors.log',
                             compute_target=compute_target,
                             experiment_exit_score = 0.9984,
                             blocked_models = ['KNN','LinearSVM'],
                             enable_onnx_compatible_models=True,
                             training_data = train_data,
                             label_column_name = label,
                             validation_data = validation_dataset,
                             **automl_settings
                            )
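
The configuration above combines two stopping criteria: a target score (`experiment_exit_score`) and a time budget (`experiment_timeout_hours`). How they interact can be sketched with a small function. `should_stop` is a hypothetical helper for illustration and does not reproduce AutoML's internal logic, which also includes early stopping.

```python
def should_stop(best_score, elapsed_minutes, exit_score, timeout_hours):
    """Return the reason to stop the experiment, or None to keep going."""
    if best_score >= exit_score:
        return "exit score reached"
    if elapsed_minutes >= timeout_hours * 60:
        return "timeout"
    return None


# Values taken from the configuration above: exit score 0.9984, timeout 0.3 h (18 minutes).
print(should_stop(0.95, 10, exit_score=0.9984, timeout_hours=0.3))   # None
print(should_stop(0.95, 20, exit_score=0.9984, timeout_hours=0.3))   # timeout
print(should_stop(0.999, 5, exit_score=0.9984, timeout_hours=0.3))   # exit score reached
```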

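Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations, the run can take a while. With `show_output=True`, validation errors and the current status are shown and the call blocks until the run finishes.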

In [None]:
remote_run = experiment.submit(automl_config, show_output = False)

Run the following cell to access a previous run. Uncomment the cell below and update the run ID.

In [51]:
# from azureml.train.automl.run import AutoMLRun
# remote_run = AutoMLRun(experiment=experiment, run_id='AutoML_56941284-2658-46cd-9230-e0592db0fcef')
# remote_run

Experiment,Id,Type,Status,Details Page,Docs Page
automl-classification-bmarketing-all,AutoML_56941284-2658-46cd-9230-e0592db0fcef,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [None]:
# Wait for the remote run to complete
remote_run.wait_for_completion()

In [52]:
best_run_customized, fitted_model_customized = remote_run.get_output()

Package:azureml-automl-runtime, training version:1.32.0, current version:1.18.0.post2
Package:azureml-dataprep, training version:2.18.0, current version:2.4.5
Package:azureml-dataprep-native, training version:36.0.0, current version:24.0.0
Package:azureml-dataprep-rslex, training version:1.16.1, current version:1.2.3
Package:azureml-dataset-runtime, training version:1.32.0, current version:1.18.0
Package:azureml-defaults, training version:1.32.0, current version:1.18.0
Package:azureml-interpret, training version:1.32.0, current version:1.18.0
Package:azureml-mlflow, training version:1.32.0, current version:1.31.0
Package:azureml-telemetry, training version:1.32.0, current version:1.18.0.post1
Package:azureml-train-automl-client, training version:1.32.0, current version:1.18.0
Package:azureml-train-automl-runtime, training version:1.32.0, current version:1.18.0.post1


## Transparency

View the updated featurization summary.

In [53]:
custom_featurizer = fitted_model_customized.named_steps['datatransformer']
df = custom_featurizer.get_featurization_summary()
pd.DataFrame(data=df)

Unnamed: 0,RawFeatureName,TypeDetected,Dropped,EngineeredFeatureCount,Transformations
0,age,Numeric,No,1,[MeanImputer]
1,duration,Numeric,No,1,[MeanImputer]
2,emp.var.rate,Numeric,No,1,[MeanImputer]
3,cons.price.idx,Numeric,No,1,[MeanImputer]
4,cons.conf.idx,Numeric,No,1,[MeanImputer]
5,euribor3m,Numeric,No,1,[MeanImputer]
6,nr.employed,Numeric,No,1,[MeanImputer]
7,job,Categorical,No,12,[StringCast-CharGramCountVectorizer]
8,marital,Categorical,No,4,[StringCast-CharGramCountVectorizer]
9,education,Categorical,No,8,[StringCast-CharGramCountVectorizer]


Set `is_user_friendly=False` to get a more detailed summary of the transformations that were applied.

In [54]:
df = custom_featurizer.get_featurization_summary(is_user_friendly=False)
pd.DataFrame(data=df)

Unnamed: 0,RawFeatureName,TypeDetected,Dropped,EngineeredFeatureCount,Transformations,TransformationParams
0,age,Numeric,No,1,[MeanImputer],"{'Transformer1': {'Input': ['age'], 'TransformationFunction': 'Imputer', 'Operator': 'Mean', 'FeatureType': 'Numeric', 'ShouldOutput': True, 'TransformationParams': {'add_indicator': False, 'copy': True, 'fill_value': None, 'missing_values': nan, 'strategy': 'mean', 'verbose': 0}}}"
1,duration,Numeric,No,1,[MeanImputer],"{'Transformer1': {'Input': ['duration'], 'TransformationFunction': 'Imputer', 'Operator': 'Mean', 'FeatureType': 'Numeric', 'ShouldOutput': True, 'TransformationParams': {'add_indicator': False, 'copy': True, 'fill_value': None, 'missing_values': nan, 'strategy': 'mean', 'verbose': 0}}}"
2,emp.var.rate,Numeric,No,1,[MeanImputer],"{'Transformer1': {'Input': ['emp.var.rate'], 'TransformationFunction': 'Imputer', 'Operator': 'Mean', 'FeatureType': 'Numeric', 'ShouldOutput': True, 'TransformationParams': {'add_indicator': False, 'copy': True, 'fill_value': None, 'missing_values': nan, 'strategy': 'mean', 'verbose': 0}}}"
3,cons.price.idx,Numeric,No,1,[MeanImputer],"{'Transformer1': {'Input': ['cons.price.idx'], 'TransformationFunction': 'Imputer', 'Operator': 'Mean', 'FeatureType': 'Numeric', 'ShouldOutput': True, 'TransformationParams': {'add_indicator': False, 'copy': True, 'fill_value': None, 'missing_values': nan, 'strategy': 'mean', 'verbose': 0}}}"
4,cons.conf.idx,Numeric,No,1,[MeanImputer],"{'Transformer1': {'Input': ['cons.conf.idx'], 'TransformationFunction': 'Imputer', 'Operator': 'Mean', 'FeatureType': 'Numeric', 'ShouldOutput': True, 'TransformationParams': {'add_indicator': False, 'copy': True, 'fill_value': None, 'missing_values': nan, 'strategy': 'mean', 'verbose': 0}}}"
5,euribor3m,Numeric,No,1,[MeanImputer],"{'Transformer1': {'Input': ['euribor3m'], 'TransformationFunction': 'Imputer', 'Operator': 'Mean', 'FeatureType': 'Numeric', 'ShouldOutput': True, 'TransformationParams': {'add_indicator': False, 'copy': True, 'fill_value': None, 'missing_values': nan, 'strategy': 'mean', 'verbose': 0}}}"
6,nr.employed,Numeric,No,1,[MeanImputer],"{'Transformer1': {'Input': ['nr.employed'], 'TransformationFunction': 'Imputer', 'Operator': 'Mean', 'FeatureType': 'Numeric', 'ShouldOutput': True, 'TransformationParams': {'add_indicator': False, 'copy': True, 'fill_value': None, 'missing_values': nan, 'strategy': 'mean', 'verbose': 0}}}"
7,job,Categorical,No,12,[StringCast-CharGramCountVectorizer],"{'Transformer1': {'Input': ['job'], 'TransformationFunction': 'StringCast', 'Operator': None, 'FeatureType': 'Categorical', 'ShouldOutput': False, 'TransformationParams': {}}, 'Transformer2': {'Input': ['Transformer1'], 'TransformationFunction': 'CountVectorizer', 'Operator': 'CharGram', 'FeatureType': None, 'ShouldOutput': True, 'TransformationParams': {'analyzer': 'word', 'binary': True, 'decode_error': 'strict', 'encoding': 'utf-8', 'input': 'content', 'lowercase': False, 'max_df': 1.0, 'max_features': None, 'min_df': 1, 'ngram_range': (1, 1), 'stop_words': None, 'strip_accents': None, 'token_pattern': '(?u)\b\w\w+\b', 'vocabulary': None}}}"
8,marital,Categorical,No,4,[StringCast-CharGramCountVectorizer],"{'Transformer1': {'Input': ['marital'], 'TransformationFunction': 'StringCast', 'Operator': None, 'FeatureType': 'Categorical', 'ShouldOutput': False, 'TransformationParams': {}}, 'Transformer2': {'Input': ['Transformer1'], 'TransformationFunction': 'CountVectorizer', 'Operator': 'CharGram', 'FeatureType': None, 'ShouldOutput': True, 'TransformationParams': {'analyzer': 'word', 'binary': True, 'decode_error': 'strict', 'encoding': 'utf-8', 'input': 'content', 'lowercase': False, 'max_df': 1.0, 'max_features': None, 'min_df': 1, 'ngram_range': (1, 1), 'stop_words': None, 'strip_accents': None, 'token_pattern': '(?u)\b\w\w+\b', 'vocabulary': None}}}"
9,education,Categorical,No,8,[StringCast-CharGramCountVectorizer],"{'Transformer1': {'Input': ['education'], 'TransformationFunction': 'StringCast', 'Operator': None, 'FeatureType': 'Categorical', 'ShouldOutput': False, 'TransformationParams': {}}, 'Transformer2': {'Input': ['Transformer1'], 'TransformationFunction': 'CountVectorizer', 'Operator': 'CharGram', 'FeatureType': None, 'ShouldOutput': True, 'TransformationParams': {'analyzer': 'word', 'binary': True, 'decode_error': 'strict', 'encoding': 'utf-8', 'input': 'content', 'lowercase': False, 'max_df': 1.0, 'max_features': None, 'min_df': 1, 'ngram_range': (1, 1), 'stop_words': None, 'strip_accents': None, 'token_pattern': '(?u)\b\w\w+\b', 'vocabulary': None}}}"
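
The summary above shows `MeanImputer` with `strategy: 'mean'` for the numeric columns, meaning missing entries are replaced by the mean of the observed values in that column. A minimal plain-Python sketch of this idea follows. `mean_impute` is a hypothetical helper, not the transformer AutoML uses.

```python
import math


def mean_impute(values):
    """Replace None/NaN entries with the mean of the observed entries."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("Cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if is_missing(v) else v for v in values]


ages = [57, None, 33, float("nan"), 27]
print(mean_impute(ages))  # [57, 39.0, 33, 39.0, 27]
```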


In [55]:
df = custom_featurizer.get_stats_feature_type_summary()
pd.DataFrame(data=df)

## Results

In [56]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show() 

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

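### Retrieve the Best Model's explanation
Retrieve the explanation from the best run, which includes explanations for both engineered features and raw features. Make sure that the run that generates explanations for the best model has completed.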

In [57]:
# Wait for the best model explanation run to complete
from azureml.core.run import Run
model_explainability_run_id = remote_run.id + "_" + "ModelExplain"
print(model_explainability_run_id)
model_explainability_run = Run(experiment=experiment, run_id=model_explainability_run_id)
model_explainability_run.wait_for_completion()

# Get the best run object
best_run, fitted_model = remote_run.get_output()

AutoML_56941284-2658-46cd-9230-e0592db0fcef_ModelExplain




#### Download engineered feature importance from artifact store
You can use `ExplanationClient` to download the engineered feature explanations from the artifact store of the best run.

In [58]:
client = ExplanationClient.from_run(best_run)
engineered_explanations = client.download_model_explanation(raw=False)
exp_data = engineered_explanations.get_feature_importance_dict()
exp_data

{'duration_MeanImputer': 1.0161931693493682,
 'nr.employed_MeanImputer': 0.5239865699015908,
 'cons.conf.idx_MeanImputer': 0.2174226550040145,
 'emp.var.rate_MeanImputer': 0.20101416384082046,
 'euribor3m_MeanImputer': 0.19203022291893437,
 'cons.price.idx_MeanImputer': 0.054616947985119534,
 'age_MeanImputer': 0.04776001485439931,
 'pdays_CharGramCountVectorizer_999': 0.034757647546202676,
 'poutcome_CharGramCountVectorizer_success': 0.03317393017530207,
 'default_CharGramCountVectorizer_no': 0.023340372231114574,
 'poutcome_CharGramCountVectorizer_failure': 0.023235888431868492,
 'month_CharGramCountVectorizer_oct': 0.020365296456522797,
 'job_CharGramCountVectorizer_blue-collar': 0.019141179488508797,
 'contact_ModeCatImputer_LabelEncoder': 0.017242977881314955,
 'education_CharGramCountVectorizer_university.degree': 0.016936309781585505,
 'day_of_week_CharGramCountVectorizer_wed': 0.015680151607357074,
 'campaign_CharGramCountVectorizer_2': 0.014592644929293328,
 'month_CharGramCou

#### Download raw feature importance from artifact store
You can use `ExplanationClient` to download the raw feature explanations from the artifact store of the best run.

In [59]:
client = ExplanationClient.from_run(best_run)
raw_explanations = client.download_model_explanation(raw=True)
exp_data = raw_explanations.get_feature_importance_dict()
exp_data

{'duration': 1.0161931693493682,
 'nr.employed': 0.5239865699015908,
 'cons.conf.idx': 0.2174226550040145,
 'emp.var.rate': 0.20101416384082046,
 'euribor3m': 0.19203022291893437,
 'poutcome': 0.056409818607170564,
 'cons.price.idx': 0.054616947985119534,
 'age': 0.04776001485439931,
 'month': 0.04138082509214909,
 'pdays': 0.040656015534370364,
 'day_of_week': 0.04014257734595896,
 'education': 0.030376669672807895,
 'default': 0.028860553348805943,
 'campaign': 0.027900783017970603,
 'job': 0.02764456984085936,
 'contact': 0.017242977881314955,
 'previous': 0.012943602508246634,
 'marital': 0.0017782749137085102,
 'housing': 0.0005155075928404481,
 'loan': 0.0004008046379516597}
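
A dictionary like the one above is easier to read once it is ranked and normalized. The sketch below takes a few of the raw importances from the output and prints the top features as a share of the total. `top_features` is a hypothetical helper, and the shares are relative to this subset only, not to the full dictionary.

```python
# A few raw feature importances copied from the output above.
raw_importance = {
    "duration": 1.0161931693493682,
    "nr.employed": 0.5239865699015908,
    "cons.conf.idx": 0.2174226550040145,
    "emp.var.rate": 0.20101416384082046,
    "euribor3m": 0.19203022291893437,
    "poutcome": 0.056409818607170564,
    "housing": 0.0005155075928404481,
    "loan": 0.0004008046379516597,
}


def top_features(importance, k):
    """Return the k (name, share) pairs with the largest importance, as a share of the total."""
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [(name, value / total) for name, value in ranked[:k]]


for name, share in top_features(raw_importance, 3):
    print(f"{name}: {share:.1%}")
```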

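### Retrieve the Best ONNX Model

Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.

Set the parameter `return_onnx_model=True` to retrieve the best ONNX model instead of the Python model.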

In [60]:
best_run, onnx_mdl = remote_run.get_output(return_onnx_model=True)

### Save the best ONNX model

In [61]:
from azureml.automl.runtime.onnx_convert import OnnxConverter
onnx_fl_path = "./best_model.onnx"
OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)

### Predict with the ONNX model, using onnxruntime package

In [62]:
import sys
import json
from azureml.automl.core.onnx_convert import OnnxConvertConstants
from azureml.train.automl import constants

# The inference helper is only used on Python versions below the ONNX-incompatible one.
python_version_compatible = sys.version_info < OnnxConvertConstants.OnnxIncompatiblePythonVersion

import onnxruntime
from azureml.automl.runtime.onnx_convert import OnnxInferenceHelper

def get_onnx_res(run):
    res_path = 'onnx_resource.json'
    run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)
    with open(res_path) as f:
        onnx_res = json.load(f)
    return onnx_res

if python_version_compatible:
    test_df = test_dataset.to_pandas_dataframe()
    mdl_bytes = onnx_mdl.SerializeToString()
    onnx_res = get_onnx_res(best_run)

    onnxrt_helper = OnnxInferenceHelper(mdl_bytes, onnx_res)
    pred_onnx, pred_prob_onnx = onnxrt_helper.predict(test_df)

    print(pred_onnx)
    print(pred_prob_onnx)
else:
    print('Please use Python version 3.6 or 3.7 to run the inference helper.')

['yes' 'no' 'no' ... 'yes' 'no' 'no']
[[0.24757645 0.75242364]
 [0.9726738  0.02732623]
 [0.889619   0.11038105]
 ...
 [0.38156956 0.6184305 ]
 [0.99136186 0.00863812]
 [0.99007356 0.00992649]]
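
The two printed arrays are related: each predicted label corresponds to the column with the highest probability in the matching row. The sketch below reproduces this for the first three rows of the output. The class order `["no", "yes"]` is an assumption, and `probabilities_to_labels` is a hypothetical helper.

```python
def probabilities_to_labels(prob_rows, class_labels):
    """Pick, for each row, the class label with the highest probability."""
    labels = []
    for row in prob_rows:
        best_index = max(range(len(row)), key=lambda i: row[i])
        labels.append(class_labels[best_index])
    return labels


# First three probability rows from the output above; the class order is an assumption.
probs = [
    [0.24757645, 0.75242364],
    [0.9726738, 0.02732623],
    [0.889619, 0.11038105],
]
print(probabilities_to_labels(probs, ["no", "yes"]))  # ['yes', 'no', 'no']
```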


## Deploy

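### Retrieve the Best Model

Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*.

#### Widget for Monitoring Runs

The widget first reports a "loading" status while the first iteration is running. After the first iteration completes, an auto-updating graph is shown. The widget refreshes once per minute, so you should see the graph update as child runs complete.

**Note:** The widget displays a link at the bottom. Use this link to open a web interface and explore the individual run details.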

In [66]:
best_run, fitted_model = remote_run.get_output()



In [65]:
model_name = best_run.properties['model_name']

script_file_name = 'inference/score.py'

best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'inference/score.py')

### Register the Fitted Model for Deployment
If neither `metric` nor `iteration` is specified in the `register_model` call, the iteration with the best primary metric is registered.

In [67]:
description = 'AutoML Model trained on bank marketing data to predict if a client will subscribe to a term deposit'
tags = None
model = remote_run.register_model(model_name = model_name, description = description, tags = tags)

print(remote_run.model_id) # This will be written to the script file later in the notebook.

AutoML56941284252


### Deploy the model as a Web Service on Azure Container Instance

In [68]:
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.model import Model
from azureml.core.environment import Environment

inference_config = InferenceConfig(entry_script=script_file_name)

aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 1, 
                                               tags = {'area': "bmData", 'type': "automl_classification"}, 
                                               description = 'sample service for Automl Classification')

aci_service_name = 'automl-sample-bankmarketing-all'
print(aci_service_name)
aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)
aci_service.wait_for_deployment(True)
print(aci_service.state)

automl-sample-bankmarketing-all


WebserviceException: WebserviceException:
	Message: Service automl-sample-bankmarketing-all with the same name already exists, please use a different service name or delete the existing service.
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Service automl-sample-bankmarketing-all with the same name already exists, please use a different service name or delete the existing service."
    }
}
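
The error above occurs because a service with the chosen name already exists in the workspace. One option is to delete the existing service first; another is to derive a name that is still free. The sketch below illustrates the second option. `unique_service_name` is a hypothetical helper, and the naming rules it enforces (lowercase letters, digits and dashes, starting with a letter, at most 32 characters) are assumptions of this sketch.

```python
import re


def unique_service_name(base, existing, max_len=32):
    """Return `base`, or `base` with a numeric suffix, so that it is not in `existing`.

    The length limit and the allowed characters are assumptions of this sketch.
    """
    if not re.fullmatch(r"[a-z][a-z0-9-]*", base):
        raise ValueError("Use lowercase letters, digits and dashes, starting with a letter")
    candidate = base[:max_len]
    counter = 2
    while candidate in existing:
        suffix = f"-{counter}"
        # Trim the base so that the suffixed name still fits the length limit.
        candidate = base[: max_len - len(suffix)].rstrip("-") + suffix
        counter += 1
    return candidate


existing = {"automl-sample-bankmarketing-all"}
new_name = unique_service_name("automl-sample-bankmarketing-all", existing)
print(new_name, len(new_name))
```

The set of existing names could be collected from the workspace, for instance from the keys of `ws.webservices`.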

### Get Logs from a Deployed Web Service

Retrieve the logs from the deployed web service.

In [None]:
#aci_service.get_logs()

## Test

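Now that the model is trained, run the test data through it to get the predicted values. This calls the ACI web service to do the prediction.

Note that the JSON passed to the ACI web service is an array of data rows. Each row should either be an array of values in the same order as was used for training, or a dictionary whose keys match the column names used for training. The example below uses dictionary rows.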
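
The request and response formats can be sketched without a running service. The rows below are shortened to three columns, and the response text is simulated. Building the payload with `json.dumps` is a choice made for this sketch.

```python
import json

# Two dictionary rows whose keys match training column names (only a few columns shown).
rows = [
    {"age": 51, "job": "retired", "marital": "married"},
    {"age": 53, "job": "self-employed", "marital": "divorced"},
]

# json.dumps handles quoting and escaping, so no manual string concatenation is needed.
payload = json.dumps({"data": rows})
print(payload)

# The notebook decodes the response twice, which implies the service returns a JSON
# string that itself contains JSON. The response text below is simulated.
simulated_response_text = json.dumps(json.dumps({"result": ["no", "no"]}))
predictions = json.loads(json.loads(simulated_response_text))["result"]
print(predictions)  # ['no', 'no']
```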

In [69]:
# Load the bank marketing datasets.
from numpy import array

In [70]:
X_test = test_dataset.drop_columns(columns=['y'])
y_test = test_dataset.keep_columns(columns=['y'], validate=True)
test_dataset.take(5).to_pandas_dataframe()

Unnamed: 0,age,job,marital,education,default,housing,loan,contact,month,day_of_week,...,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y
0,51,retired,married,basic.4y,no,no,no,cellular,jul,wed,...,1,13,1,success,-2.9,92.47,-33.6,1.08,5076.2,no
1,53,self-employed,divorced,university.degree,no,no,no,cellular,jul,mon,...,3,999,0,nonexistent,1.4,93.92,-42.7,4.96,5228.1,no
2,32,services,married,basic.9y,no,no,no,cellular,jul,tue,...,1,999,0,nonexistent,1.4,93.92,-42.7,4.96,5228.1,no
3,44,management,married,university.degree,no,yes,no,cellular,nov,mon,...,1,999,0,nonexistent,-0.1,93.2,-42.0,4.19,5195.8,no
4,36,admin.,single,university.degree,no,no,no,cellular,may,fri,...,3,999,1,failure,-1.8,92.89,-46.2,1.31,5099.1,no


In [71]:
X_test = X_test.to_pandas_dataframe()
y_test = y_test.to_pandas_dataframe()

In [72]:
import json
import requests

X_test_json = X_test.to_json(orient='records')
data = "{\"data\": " + X_test_json +"}"
headers = {'Content-Type': 'application/json'}

resp = requests.post(aci_service.scoring_uri, data, headers=headers)

y_pred = json.loads(json.loads(resp.text))['result']

In [73]:
actual = array(y_test)
actual = actual[:,0]
print(len(y_pred), " ", len(actual))

4120   4120


### Calculate metrics for the prediction

Now visualize the results as a confusion matrix that compares the predicted values against the actual values.


In [75]:
%matplotlib notebook
from sklearn.metrics import confusion_matrix
import numpy as np
import itertools

cf =confusion_matrix(actual,y_pred)
plt.imshow(cf,cmap=plt.cm.Blues,interpolation='nearest')
plt.colorbar()
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
class_labels = ['no','yes']
tick_marks = np.arange(len(class_labels))
plt.xticks(tick_marks,class_labels)
plt.yticks([-0.5,0,1,1.5],['','no','yes',''])
# plotting text value inside cells
thresh = cf.max() / 2.
for i,j in itertools.product(range(cf.shape[0]),range(cf.shape[1])):
    plt.text(j,i,format(cf[i,j],'d'),horizontalalignment='center',color='white' if cf[i,j] >thresh else 'black')
plt.show()

<IPython.core.display.Javascript object>
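
The counts behind such a plot can also be computed by hand, which makes precision and recall for the `yes` class easy to derive. The sketch below uses a tiny made-up example rather than results from the deployed model. `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(actual, predicted, labels):
    """Build a confusion matrix as nested lists: rows are actual, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Tiny made-up example, not results from the deployed model.
actual = ["no", "no", "yes", "yes", "no", "yes"]
predicted = ["no", "yes", "yes", "no", "no", "yes"]
cm = confusion_counts(actual, predicted, ["no", "yes"])
tn, fp = cm[0]
fn, tp = cm[1]
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(cm, precision, recall)
```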

### Delete a Web Service

Delete the specified web service.

In [None]:
aci_service.delete()

**Note:** This is a translated version. The original notebook is available here: https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features

## Acknowledgements

This Bank Marketing dataset is made available under the Creative Commons (CC0: Public Domain) License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: https://creativecommons.org/publicdomain/zero/1.0/. The dataset is available at: https://www.kaggle.com/janiobachmann/bank-marketing-dataset .

_**Acknowledgements**_
This data set is originally available within the UCI Machine Learning Database: https://archive.ics.uci.edu/ml/datasets/bank+marketing

[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014