Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

# Udacity Capstone Project: Azure AutoML
This notebook demonstrates the use of AutoML in Azure Machine Learning Pipeline for the Udacity capstone project.

## Introduction
In this example we showcase how you can use AzureML Dataset to load data for AutoML via AML Pipeline. 

If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook.

In this notebook you will learn how to:
1. Create an `Experiment` in an existing `Workspace`.
2. Create or attach an existing `AmlCompute` cluster to a workspace.
3. Define data loading in a `TabularDataset`.
4. Configure AutoML using `AutoMLConfig`.
5. Use `AutoMLStep`.
6. Train the model using `AmlCompute`.
7. Explore the results.
8. Test the best fitted model.

## Azure Machine Learning and Pipeline SDK-specific imports

In [2]:
import os
import sys
import json
import azureml
import logging
import pickle
import requests
import pandas as pd
import numpy as np
from io import BytesIO
import joblib  # sklearn.externals.joblib is deprecated and removed in newer scikit-learn releases
from sklearn.metrics import confusion_matrix
from pprint import pprint
from matplotlib import pyplot as plt
from train import *


import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.automl.core.featurization import FeaturizationConfig
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.core import Dataset  # Workspace is already imported above
from azureml.data.datapath import DataPath

from azureml.widgets import RunDetails
from azureml.train.automl import constants
from azureml.pipeline.steps import AutoMLStep
from azureml.pipeline.core import PipelineData, TrainingOutput
from azureml.pipeline.core import Pipeline

# Model deployment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.model import Model

import warnings
warnings.filterwarnings("ignore")

pd.set_option('display.max_rows', None)

# Check system and core SDK version number
print("System version: {}".format(sys.version))
print("SDK version:", azureml.core.VERSION)

System version: 3.6.13 |Anaconda, Inc.| (default, Feb 23 2021, 12:58:59) 
[GCC Clang 10.0.0 ]
SDK version: 1.23.0


## Initialize Workspace
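Initialize a workspace object. The cell below passes the subscription ID, resource group, and workspace name explicitly and signs in interactively. To load them from persisted configuration instead, use `Workspace.from_config()` and make sure the config file is present at `.\config.json`.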

In [3]:
interactive_auth = InteractiveLoginAuthentication(tenant_id="660b3398-b80e-49d2-bc5b-ac1dc93b5254")
ws = Workspace(subscription_id="f5091c60-1c3c-430f-8d81-d802f6bf2414",
                   resource_group="aml-quickstarts-142350",
                   workspace_name="quick-starts-ws-142350",
                   auth=interactive_auth)

experiment_name = 'online_news_project'
experiment=Experiment(ws, experiment_name)
experiment

Note, we have launched a browser for you to login. For old experience with device code, use "az login --use-device-code"


Performing interactive authentication. Please follow the instructions on the terminal.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.


Name,Workspace,Report Page,Docs Page
online_news_project,quick-starts-ws-142350,Link to Azure Machine Learning studio,Link to Documentation
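
As a rough illustration of the persisted-configuration route, the sketch below writes and reads back a `config.json` with the three keys that `Workspace.from_config()` looks for. The helper names and the placeholder values are assumptions made for this example; nothing here contacts Azure.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values for illustration only; substitute your own workspace details.
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def write_workspace_config(directory, settings):
    """Write settings to <directory>/config.json and return the file path."""
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(settings, indent=2))
    return path


def read_workspace_config(path):
    """Load config.json and check that the three expected keys are present."""
    settings = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - settings.keys()
    if missing:
        raise KeyError(f"config.json is missing: {sorted(missing)}")
    return settings


# Round-trip the file in a temporary folder so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    config_path = write_workspace_config(tmp, config)
    loaded = read_workspace_config(config_path)
    print(loaded["workspace_name"])
```

Keeping the identifiers in a file rather than in the notebook body makes it easier to share the notebook without editing code cells.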


In [4]:
dic_data = {'Workspace name': ws.name,
            'Azure region': ws.location,
            'Subscription id': ws.subscription_id,
            'Resource group': ws.resource_group,
            'Experiment Name': experiment.name}

az_data = pd.DataFrame.from_dict(data = dic_data, orient='index')
az_data.rename(columns={0:''}, inplace = True)
az_data

Field,Value
Workspace name,quick-starts-ws-142350
Azure region,southcentralus
Subscription id,f5091c60-1c3c-430f-8d81-d802f6bf2414
Resource group,aml-quickstarts-142350
Experiment Name,online_news_project


### Create or Attach an AmlCompute cluster
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you get the default `AmlCompute` as your training compute resource.

**Udacity note:** There is no need to create a new compute target; the cell below reuses the existing cluster if one is found.

In [5]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# Define CPU cluster name
compute_target_name = "cpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=compute_target_name)
    print("Found existing cpu-cluster. Use it.")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS12_V2",
                                                           min_nodes=1, 
                                                           max_nodes=4) 
    compute_target = ComputeTarget.create(ws, compute_target_name, compute_config)

compute_target.wait_for_completion(show_output=True)

print(compute_target.get_status().serialize())

Creating...
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded..................
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 1, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2021-04-10T23:24:11.244000+00:00', 'errors': None, 'creationTime': '2021-04-10T23:22:27.803998+00:00', 'modifiedTime': '2021-04-10T23:22:43.243621+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_DS12_V2'}
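
The serialized status printed above is a plain dictionary, so it can be condensed with ordinary Python. The helper below is a hypothetical convenience function, not part of the SDK, and it runs on a hand-copied excerpt of that dictionary.

```python
def summarize_compute_status(status):
    """Return a short summary of a serialized AmlCompute status dictionary."""
    scale = status["scaleSettings"]
    counts = status["nodeStateCounts"]
    # Keep only the node states that currently have at least one node.
    active = {state: n for state, n in counts.items() if n > 0}
    return {
        "vm_size": status["vmSize"],
        "provisioning_state": status["provisioningState"],
        "nodes": f'{status["currentNodeCount"]}/{scale["maxNodeCount"]}',
        "active_states": active,
    }


# Excerpt of the dictionary printed above.
status = {
    "currentNodeCount": 1,
    "nodeStateCounts": {"preparingNodeCount": 1, "runningNodeCount": 0, "idleNodeCount": 0},
    "provisioningState": "Succeeded",
    "scaleSettings": {"minNodeCount": 1, "maxNodeCount": 4},
    "vmSize": "STANDARD_DS12_V2",
}
summary = summarize_compute_status(status)
print(summary)
```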


In [6]:
# Check details about compute_targets (i.e. compute_target)
compute_targets = ws.compute_targets
for name, ct in compute_targets.items():
    print(name, ct.type, ct.provisioning_state)

cpu-cluster AmlCompute Succeeded


## Dataset

**Udacity note:** Make sure the `key` matches the name of the uploaded dataset and that the description matches as well. If the name is unknown or hard to find, loop over `ws.datasets.keys()` and `print()` each one.
If the dataset *isn't* found because it was deleted, it can be recreated from the CSV link below.
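
The lookup described in the note can be sketched as follows. It assumes only that `ws.datasets` behaves like a dictionary keyed by dataset name; the function name and the registered names in the example are hypothetical stand-ins.

```python
def list_and_find(datasets, wanted):
    """Print every registered name and return the one matching `wanted`, ignoring case."""
    names = sorted(datasets.keys())
    for name in names:
        print(name)
    target = wanted.strip().lower()
    matches = [name for name in names if name.strip().lower() == target]
    return matches[0] if matches else None


# Stand-in for ws.datasets; the names are made up for illustration.
registered = {"OnlineNewsPopularity": object(), "bank-marketing": object()}
key = list_and_find(registered, "onlinenewspopularity")
print("match:", key)
```

A `None` result is the cue to recreate the dataset from the CSV link.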

In [7]:
DATA_LOC = "https://raw.githubusercontent.com/franckess/AzureML_Capstone/main/data/OnlineNewsPopularity.csv"
BORUTA_LOC = "https://github.com/franckess/AzureML_Capstone/releases/download/1.1/boruta_model_final.pkl"

# Loading data
df = pd.read_csv(DATA_LOC)

# Removing space character in the feature names
df.columns=df.columns.str.replace(' ','')

# Drop URL column
df = df.drop(['url'], axis=1)

# Perform Data pre-processing
df = corr_drop_cols(df)
df = create_label(df)
df = scaling_num(df)
df = feature_selection(df, BORUTA_LOC)
    
# Split train data into train & test
X_train, X_test, y_train, y_test = split_train_test(df)

m, k = X_train.shape
print("{} x {} table of data:".format(m, k))
X_train.info()

31715 x 47 table of data:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 31715 entries, 38512 to 35050
Data columns (total 47 columns):
n_tokens_title                   31715 non-null float64
n_tokens_content                 31715 non-null float64
n_unique_tokens                  31715 non-null float64
num_hrefs                        31715 non-null float64
num_self_hrefs                   31715 non-null float64
num_imgs                         31715 non-null float64
num_videos                       31715 non-null float64
average_token_length             31715 non-null float64
num_keywords                     31715 non-null float64
data_channel_is_entertainment    31715 non-null int64
data_channel_is_bus              31715 non-null int64
data_channel_is_socmed           31715 non-null int64
data_channel_is_tech             31715 non-null int64
data_channel_is_world            31715 non-null int64
kw_min_min                       31715 non-null float64
kw_max_min                     
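
The `split_train_test` helper comes from the local `train` module and is not shown in this notebook. Purely as an illustration of what a hold-out split does, the sketch below shuffles row indices and reserves a fraction for testing. The 20% test fraction, the seed, and the function name are assumptions, not the author's implementation.

```python
import random


def split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(n_rows))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))
```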

### Upload data to Azure Datastore

In [8]:
# Merge the X and y dataframes into a single table for the AutoML experiment
train_data = pd.concat([X_train, y_train], axis=1)
os.makedirs('./data', exist_ok=True)  # make sure the target folder exists before writing
train_data.to_csv('./data/train_data.csv', index=False, header=True)

datastore = ws.get_default_datastore()
datastore.upload_files(files = ['./data/train_data.csv'],  target_path='data/', overwrite=True, show_progress=True)

datastore_path =[
    DataPath(datastore, 'data/train_data.csv')
]

# Create a tabular dataset from the uploaded file so it can be read during training on remote compute
train_data = Dataset.Tabular.from_delimited_files(path=datastore_path)

Uploading an estimated of 1 files
Uploading ./data/train_data.csv
Uploaded ./data/train_data.csv, 1 files out of an estimated total of 1
Uploaded 1 files
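
Before relying on the uploaded file, it can help to confirm that the merged CSV carries exactly one label column, since AutoML is later told to predict it by name. The check below is a minimal sketch using only the standard library. The helper name is hypothetical and the two data rows are toy values.

```python
import csv
import io


def feature_columns(csv_text, label_column):
    """Return the non-label column names, raising if the label is absent or repeated."""
    header = next(csv.reader(io.StringIO(csv_text)))
    found = header.count(label_column)
    if found != 1:
        raise ValueError(f"expected exactly one '{label_column}' column, found {found}")
    return [name for name in header if name != label_column]


# Toy file: column names borrowed from the dataset, values illustrative.
sample_csv = "n_tokens_title,num_hrefs,label\n0.52,0.04,0\n0.19,0.05,1\n"
features = feature_columns(sample_csv, "label")
print(features)
```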


In [9]:
print(
    "Datastore type: " + datastore.datastore_type,
    "Account name: " + datastore.account_name,
    "Container name: " + datastore.container_name,
    sep="\n",
)

Datastore type: AzureBlob
Account name: mlstrg142350
Container name: azureml-blobstore-6e20d511-a881-482c-a2aa-f5d3dec6ca1a


In [10]:
train_data

{
  "source": [
    "('workspaceblobstore', 'data/train_data.csv')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes"
  ]
}

## Train
This creates a general AutoML settings object.
**Udacity note:** These inputs must match what was used when training in the portal. For example, `label_column_name` must be the name of the target column in the training data, which is `label` in this notebook.

In [11]:
automl_settings = {
    "experiment_timeout_minutes": 60,  # maximum duration of the experiment, in minutes
    "max_concurrent_iterations": 4,    # AmlCompute runs one iteration per node; the cluster above has max_nodes=4
    "primary_metric": 'accuracy'
}

project_folder = './capstone-project'

automl_config = AutoMLConfig(compute_target=compute_target,
                             task = "classification",
                             training_data=train_data,
                             label_column_name="label",   
                             path = project_folder,
                             enable_early_stopping= True,
                             featurization= 'auto',
                             debug_log = "online_news_automl_errors.log",
                             n_cross_validations=5,
                             max_cores_per_iteration=-1,
                             verbosity=logging.INFO,
                             **automl_settings)
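
AmlCompute runs one AutoML iteration per node, so a concurrency setting above the cluster's `max_nodes` leaves the extra iterations queued. The check below is a small sketch of that rule. The function name is hypothetical and the numbers are illustrative rather than read from a live workspace.

```python
def check_concurrency(settings, max_nodes):
    """Return a warning string if more parallel iterations are requested than nodes exist."""
    # AutoMLConfig defaults max_concurrent_iterations to 1 when it is not set.
    requested = settings.get("max_concurrent_iterations", 1)
    if requested > max_nodes:
        return (f"max_concurrent_iterations={requested} exceeds max_nodes={max_nodes}; "
                "extra iterations will wait for a free node")
    return None


# Illustrative values only.
too_many = check_concurrency({"max_concurrent_iterations": 9}, max_nodes=4)
within_limit = check_concurrency({"max_concurrent_iterations": 4}, max_nodes=4)
print(too_many)
print(within_limit)
```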

In [12]:
# Submit your automl run
automl_exp = Experiment(workspace=ws, name="Udacity_capstone_AutoML")  
automl_run = automl_exp.submit(automl_config, show_output = True)
RunDetails(automl_run).show()
automl_run.wait_for_completion(show_output=True)

Running on remote.
No run_configuration provided, running on cpu-cluster with default configuration
Running on remote compute: cpu-cluster
Parent Run ID: AutoML_d183b4db-641e-489f-87cd-136da8ee1c90

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more abo

{'runId': 'AutoML_d183b4db-641e-489f-87cd-136da8ee1c90',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-04-10T23:29:19.634749Z',
 'endTimeUtc': '2021-04-10T23:49:38.320424Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'cpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"c87031cc-54df-4d26-b266-4e42a127bddd\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"data/train_data.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-142350\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"f5091c60-1c3c-430f-8d81-d802f6bf2414\\\\\\", \\\\\\"work

## Examine Results

### Retrieve the Best Model

In [13]:
best_run, best_model = automl_run.get_output()
print(best_model.steps)

Package:azureml-automl-runtime, training version:1.25.0, current version:1.23.0
Package:azureml-core, training version:1.25.0, current version:1.23.0
Package:azureml-dataprep, training version:2.11.2, current version:2.10.1
Package:azureml-dataprep-rslex, training version:1.9.1, current version:1.8.1
Package:azureml-dataset-runtime, training version:1.25.0, current version:1.23.0
Package:azureml-defaults, training version:1.25.0, current version:1.23.0
Package:azureml-interpret, training version:1.25.0, current version:1.23.0
Package:azureml-pipeline-core, training version:1.25.0, current version:1.23.0
Package:azureml-telemetry, training version:1.25.0, current version:1.23.0
Package:azureml-train-automl-client, training version:1.25.0, current version:1.23.0.post1
Package:azureml-train-automl-runtime, training version:1.25.0, current version:1.23.0.post1


[('datatransformer', DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                feature_sweeping_config=None, feature_sweeping_timeout=None,
                featurization_config=None, force_text_dnn=None,
                is_cross_validation=None, is_onnx_compatible=None, logger=None,
                observer=None, task=None, working_dir=None)), ('prefittedsoftvotingclassifier', PreFittedSoftVotingClassifier(classification_labels=None,
                              estimators=[('7',
                                           Pipeline(memory=None,
                                                    steps=[('standardscalerwrapper',
                                                            <azureml.automl.runtime.shared.model_wrappers.StandardScalerWrapper object at 0x7f8a91c72f60>),
                                                           ('lightgbmclassifier',
                                                            LightGBMClassifier(boosting_type='gbdt',
    

In [14]:
get_best_autoML_metrics = best_run.get_metrics()
for run_metric in get_best_autoML_metrics:
    metric = get_best_autoML_metrics[run_metric]
    print(run_metric,metric)

matthews_correlation 0.34805196897239615
weighted_accuracy 0.6803398340697117
average_precision_score_micro 0.7304928932620735
accuracy 0.6765883651269116
AUC_macro 0.7353407682779607
norm_macro_recall 0.34560037998080656
average_precision_score_weighted 0.728469356583559
recall_score_macro 0.6728001899904033
average_precision_score_macro 0.7264377994192253
precision_score_weighted 0.6759779362969539
AUC_weighted 0.7353407682779606
f1_score_macro 0.6731894547408708
log_loss 0.6009250740456948
AUC_micro 0.7374037388455619
recall_score_weighted 0.6765883651269116
recall_score_micro 0.6765883651269116
precision_score_macro 0.675260861272658
f1_score_weighted 0.6754505723163502
balanced_accuracy 0.6728001899904033
precision_score_micro 0.6765883651269116
f1_score_micro 0.6765883651269116
accuracy_table aml://artifactId/ExperimentRun/dcid.AutoML_d183b4db-641e-489f-87cd-136da8ee1c90_46/accuracy_table
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_d183b4db-641e-489f-87cd-136da8ee
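
The dictionary returned by `get_metrics()` mixes numeric scores with references to stored artifacts such as `accuracy_table`. A simple way to focus on the scores, sketched here on a few values copied from the output above, is to keep only the numeric entries. The helper name is an assumption.

```python
def scalar_metrics(metrics):
    """Keep numeric metrics and drop artifact references such as aml:// URIs."""
    return {
        name: value
        for name, value in metrics.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# A few entries copied from the printed output.
metrics = {
    "accuracy": 0.6765883651269116,
    "AUC_weighted": 0.7353407682779606,
    "f1_score_macro": 0.6731894547408708,
    "log_loss": 0.6009250740456948,
    "accuracy_table": "aml://artifactId/ExperimentRun/dcid.AutoML_d183b4db-641e-489f-87cd-136da8ee1c90_46/accuracy_table",
}
numeric = scalar_metrics(metrics)
for name, value in sorted(numeric.items()):
    print(f"{name:>16}: {value:.4f}")
```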

In [15]:
best_run.get_file_names()

['accuracy_table',
 'automl_driver.py',
 'azureml-logs/55_azureml-execution-tvmps_665d3ee8c3b0743b5f28c4c7ac12549d28d1a94a41a5aed93aa191ccff5a0140_d.txt',
 'azureml-logs/65_job_prep-tvmps_665d3ee8c3b0743b5f28c4c7ac12549d28d1a94a41a5aed93aa191ccff5a0140_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_665d3ee8c3b0743b5f28c4c7ac12549d28d1a94a41a5aed93aa191ccff5a0140_d.txt',
 'azureml-logs/process_info.json',
 'azureml-logs/process_status.json',
 'confusion_matrix',
 'explanation/1e90e0cc/classes.interpret.json',
 'explanation/1e90e0cc/eval_data_viz.interpret.json',
 'explanation/1e90e0cc/expected_values.interpret.json',
 'explanation/1e90e0cc/features.interpret.json',
 'explanation/1e90e0cc/global_names/0.interpret.json',
 'explanation/1e90e0cc/global_rank/0.interpret.json',
 'explanation/1e90e0cc/global_values/0.interpret.json',
 'explanation/1e90e0cc/local_importance_values.interpret.json',
 'explanation/1e90e0cc/per_class_names/0.interpret.json',
 'explanati

In [18]:
# Save the best model locally
automl_model_name = best_run.properties['model_name']
os.makedirs("output", exist_ok=True)  # joblib.dump does not create missing folders
joblib.dump(best_model, filename="output/automl_model.pkl")
print("Model saved successfully!")

Model saved successfully!


In [19]:
# Register best model
AutoML_model = best_run.register_model(model_name = 'best_autoML_model', model_path =  'outputs/model.pkl')
AutoML_model

Model(workspace=Workspace.create(name='quick-starts-ws-142350', subscription_id='f5091c60-1c3c-430f-8d81-d802f6bf2414', resource_group='aml-quickstarts-142350'), name=best_autoML_model, id=best_autoML_model:1, version=1, tags={}, properties={})

In [20]:
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
Udacity_capstone_AutoML,AutoML_d183b4db-641e-489f-87cd-136da8ee1c90_46,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [21]:
def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0]+ ' - ')
        elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):
            print("\nMeta Learner")
            pprint(step[1]._meta_learner)
            print()
            for estimator in step[1]._base_learners:
                print_model(estimator[1], estimator[0]+ ' - ')
        else:
            pprint(step[1].get_params())
            print()
            
print_model(best_model)

datatransformer
{'enable_dnn': None,
 'enable_feature_sweeping': None,
 'feature_sweeping_config': None,
 'feature_sweeping_timeout': None,
 'featurization_config': None,
 'force_text_dnn': None,
 'is_cross_validation': None,
 'is_onnx_compatible': None,
 'logger': None,
 'observer': None,
 'task': None,
 'working_dir': None}

prefittedsoftvotingclassifier
{'estimators': ['7', '0', '37', '21', '13', '11', '15', '31'],
 'weights': [0.23076923076923078,
             0.07692307692307693,
             0.07692307692307693,
             0.15384615384615385,
             0.23076923076923078,
             0.07692307692307693,
             0.07692307692307693,
             0.07692307692307693]}

7 - standardscalerwrapper
{'class_name': 'StandardScaler',
 'copy': True,
 'module_name': 'sklearn.preprocessing._data',
 'with_mean': False,
 'with_std': False}

7 - lightgbmclassifier
{'boosting_type': 'gbdt',
 'class_weight': None,
 'colsample_bytree': 0.6933333333333332,
 'importance_type': 'split',
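
The weights printed for `prefittedsoftvotingclassifier` sum to one. In soft voting as it is usually defined, each estimator's class probabilities are averaged with these weights and the class with the highest average wins. The sketch below applies that rule with the printed weights. The per-model probabilities are invented for illustration, and this is not the AutoML implementation itself.

```python
def soft_vote(probabilities, weights):
    """Return (weighted average probabilities, index of the winning class)."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return averaged, winner


# Weights as printed above for estimators '7', '0', '37', '21', '13', '11', '15', '31'.
weights = [0.23076923076923078, 0.07692307692307693, 0.07692307692307693,
           0.15384615384615385, 0.23076923076923078, 0.07692307692307693,
           0.07692307692307693, 0.07692307692307693]

# Hypothetical probabilities: the two heaviest models favor class 1, the rest class 0.
probabilities = [[0.3, 0.7], [0.6, 0.4], [0.6, 0.4], [0.6, 0.4],
                 [0.3, 0.7], [0.6, 0.4], [0.6, 0.4], [0.6, 0.4]]

averaged, winner = soft_vote(probabilities, weights)
print(averaged, winner)
```

In this made-up case six of the eight models lean toward class 0, yet class 1 wins because the two dissenting models carry the largest weights.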

In [None]:
# Download scoring file 
best_run.download_file('outputs/scoring_file_v_1_0_0.py', './score.py')
script_file_name = './score.py'

In [None]:
# Download environment file
best_run.download_file('outputs/conda_env_v_1_0_0.yml', './AzureML_envFile.yml')

## Model Deployment

Create an inference config and deploy the model as a web service.

In [22]:
inference_config = InferenceConfig(entry_script=script_file_name)

aciconfig = AciWebservice.deploy_configuration(cpu_cores = 2, 
                                               memory_gb = 2, 
                                               tags = {'Company': "Mashable", 'type': "capstone_Classifier"}, 
                                               description = 'sample service for Capstone Project AutoML Classifier for Online News popularity')

In [25]:
aci_service_name = 'capstone-automl-sample'
print(aci_service_name)
aci_service = Model.deploy(ws, aci_service_name, [AutoML_model], inference_config, aciconfig)
aci_service.wait_for_deployment(True)
print(aci_service.state)
print(aci_service.scoring_uri)
print(aci_service.swagger_uri)

capstone-automl-sample
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running......................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://31655174-f860-46bc-8045-124807bbbdc1.southcentralus.azurecontainer.io/score
http://31655174-f860-46bc-8045-124807bbbdc1.southcentralus.azurecontainer.io/swagger.json


### Test the deployed web service

In [30]:
test_data = pd.concat([X_test, y_test], axis=1)
test_data = test_data[10:15]
display(test_data)

index,n_tokens_title,n_tokens_content,n_unique_tokens,num_hrefs,num_self_hrefs,num_imgs,num_videos,average_token_length,num_keywords,data_channel_is_entertainment,...,avg_positive_polarity,min_positive_polarity,max_positive_polarity,avg_negative_polarity,min_negative_polarity,max_negative_polarity,title_subjectivity,title_sentiment_polarity,abs_title_sentiment_polarity,label
29000,0.52,0.13,0.0,0.04,0.01,0.01,0.01,0.64,0.33,0,...,0.4,0.05,0.8,0.65,0.0,0.93,0.83,0.75,0.5,0
4914,0.19,0.07,0.0,0.05,0.03,0.09,0.0,0.6,0.33,0,...,0.37,0.03,0.7,0.78,0.6,0.88,1.0,0.75,0.5,1
19445,0.43,0.05,0.0,0.02,0.04,0.01,0.0,0.57,0.44,0,...,0.46,0.14,0.8,0.68,0.1,0.95,0.0,0.5,0.0,1
32949,0.24,0.1,0.0,0.04,0.03,0.02,0.01,0.55,0.33,0,...,0.4,0.1,1.0,0.72,0.2,0.95,0.0,0.5,0.0,0
22685,0.29,0.04,0.0,0.02,0.0,0.01,0.0,0.58,0.44,0,...,0.35,0.03,0.8,0.8,0.69,0.9,0.45,0.57,0.14,0


In [31]:
# remove label column
label_data = test_data.pop('label')

# convert test input data to dictionary form
input_data = json.dumps({'data': test_data.to_dict(orient='records')})

# print test input data
print(input_data)

{"data": [{"n_tokens_title": 0.5238095238095238, "n_tokens_content": 0.13323105971206042, "n_unique_tokens": 0.0006325920646276748, "num_hrefs": 0.042763157894736836, "num_self_hrefs": 0.008620689655172414, "num_imgs": 0.0078125, "num_videos": 0.01098901098901099, "average_token_length": 0.6387343741908725, "num_keywords": 0.3333333333333333, "data_channel_is_entertainment": 0, "data_channel_is_bus": 0, "data_channel_is_socmed": 0, "data_channel_is_tech": 0, "data_channel_is_world": 1, "kw_min_min": 0.0, "kw_max_min": 0.002047587131367292, "kw_min_max": 0.012332503260998459, "kw_avg_max": 0.47818095576900277, "kw_min_avg": 0.5908604678089657, "kw_max_avg": 0.01182515012030831, "kw_avg_avg": 0.05734695866517057, "self_reference_min_shares": 0.001152614727854856, "self_reference_max_shares": 0.001152614727854856, "weekday_is_wednesday": 1, "weekday_is_saturday": 0, "weekday_is_sunday": 0, "is_weekend": 0, "LDA_00": 0.05393875832876898, "LDA_01": 0.3236042844264332, "LDA_02": 0.5982119195

In [32]:
output = aci_service.run(input_data)
print(output)

{"result": [0, 1, 0, 1, 1]}
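
The same payload can also be posted straight to the scoring URI, and the returned predictions compared with the held-out labels. The sketch below builds the HTTP request with the standard library without sending it, then scores the five predictions printed above against the five labels shown in the test table. The helper names and the placeholder URI are assumptions, and five rows are far too few to judge the model.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, records):
    """Build a JSON POST request in the {"data": [...]} shape used above."""
    body = json.dumps({"data": records}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fraction_correct(predicted, actual):
    """Share of positions where the prediction equals the label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Placeholder URI; urllib.request.urlopen(request) would send it, which is not done here.
request = build_scoring_request("http://example.invalid/score", [{"n_tokens_title": 0.52}])

predicted = [0, 1, 0, 1, 1]   # service output printed above
actual = [0, 1, 1, 0, 0]      # label column of the five test rows shown earlier
print(fraction_correct(predicted, actual))
```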


In [33]:
print(aci_service.get_logs())

2021-04-05T12:42:09,839673247+00:00 - rsyslog/run 
2021-04-05T12:42:09,839800452+00:00 - gunicorn/run 
2021-04-05T12:42:09,839800252+00:00 - iot-server/run 
2021-04-05T12:42:09,886409272+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_2b14f450572e78de640d54eaabed5e4d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_2b14f450572e78de640d54eaabed5e4d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_2b14f450572e78de640d54eaabed5e4d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_2b14f450572e78de640d54eaabed5e4d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_2b14f450572e78de640d54eaabed5e4d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
EdgeHubC

### Create Pipeline and AutoMLStep

Define outputs for the AutoMLStep using TrainingOutput.

In [34]:
ds = ws.get_default_datastore()
metrics_output_name = 'metrics_output'
best_model_output_name = 'best_model_output'

metrics_data = PipelineData(name='metrics_data',
                           datastore=ds,
                           pipeline_output_name=metrics_output_name,
                           training_output=TrainingOutput(type='Metrics'))
model_data = PipelineData(name='model_data',
                           datastore=ds,
                           pipeline_output_name=best_model_output_name,
                           training_output=TrainingOutput(type='Model'))

In [35]:
# Create AutoMLStep
automl_step = AutoMLStep(name='automl_module',
                         automl_config=automl_config,
                         outputs=[metrics_data, model_data],
                         allow_reuse=True)

In [36]:
pipeline = Pipeline(description="pipeline_with_automlstep",
                    workspace=ws,    
                    steps=[automl_step])

pipeline_run = experiment.submit(pipeline)

Created step automl_module [eaf425fa][b8747483-e2ef-4c0e-8c42-047678b0ef8f], (This step will run and generate new outputs)
Submitted PipelineRun 59876b24-0fdb-4158-a29f-550f1cec6a38
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/online_news_project/runs/59876b24-0fdb-4158-a29f-550f1cec6a38?wsid=/subscriptions/81cefad3-d2c9-4f77-a466-99a7f541c7bb/resourcegroups/aml-quickstarts-142004/workspaces/quick-starts-ws-142004


In [37]:
RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion()

PipelineRunId: 59876b24-0fdb-4158-a29f-550f1cec6a38
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/online_news_project/runs/59876b24-0fdb-4158-a29f-550f1cec6a38?wsid=/subscriptions/81cefad3-d2c9-4f77-a466-99a7f541c7bb/resourcegroups/aml-quickstarts-142004/workspaces/quick-starts-ws-142004
PipelineRun Status: Running


StepRunId: e91a76c5-f4c8-47df-bb1a-917f4499b45a
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/online_news_project/runs/e91a76c5-f4c8-47df-bb1a-917f4499b45a?wsid=/subscriptions/81cefad3-d2c9-4f77-a466-99a7f541c7bb/resourcegroups/aml-quickstarts-142004/workspaces/quick-starts-ws-142004
StepRun( automl_module ) Status: Running

StepRun(automl_module) Execution Summary
StepRun( automl_module ) Status: Finished



PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '59876b24-0fdb-4158-a29f-550f1cec6a38', 'status': 'Completed', 'startTimeUtc': '2021-04-05T12:44:35.408354Z', 'endTimeUtc': '2021-04-05T13:18:31.62

'Finished'

## Examine Results

### Retrieve the metrics of all child runs

Outputs of the run above can be used as inputs to other steps in a pipeline. In this tutorial, we examine the outputs by retrieving the output data and running some tests.

In [38]:
metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)
num_file_downloaded = metrics_output.download('.', show_progress=True)

Downloading azureml/e91a76c5-f4c8-47df-bb1a-917f4499b45a/metrics_data
Downloaded azureml/e91a76c5-f4c8-47df-bb1a-917f4499b45a/metrics_data, 1 files out of an estimated total of 1


In [39]:
with open(metrics_output._path_on_datastore) as f:
    metrics_output_result = f.read()
    
deserialized_metrics_output = json.loads(metrics_output_result)
df = pd.DataFrame(deserialized_metrics_output)
df

Unnamed: 0,e91a76c5-f4c8-47df-bb1a-917f4499b45a_10,e91a76c5-f4c8-47df-bb1a-917f4499b45a_14,e91a76c5-f4c8-47df-bb1a-917f4499b45a_18,e91a76c5-f4c8-47df-bb1a-917f4499b45a_12,e91a76c5-f4c8-47df-bb1a-917f4499b45a_16,e91a76c5-f4c8-47df-bb1a-917f4499b45a_13,e91a76c5-f4c8-47df-bb1a-917f4499b45a_6,e91a76c5-f4c8-47df-bb1a-917f4499b45a_3,e91a76c5-f4c8-47df-bb1a-917f4499b45a_0,e91a76c5-f4c8-47df-bb1a-917f4499b45a_8,...,e91a76c5-f4c8-47df-bb1a-917f4499b45a_34,e91a76c5-f4c8-47df-bb1a-917f4499b45a_5,e91a76c5-f4c8-47df-bb1a-917f4499b45a_9,e91a76c5-f4c8-47df-bb1a-917f4499b45a_37,e91a76c5-f4c8-47df-bb1a-917f4499b45a_47,e91a76c5-f4c8-47df-bb1a-917f4499b45a_28,e91a76c5-f4c8-47df-bb1a-917f4499b45a_35,e91a76c5-f4c8-47df-bb1a-917f4499b45a_36,e91a76c5-f4c8-47df-bb1a-917f4499b45a_39,e91a76c5-f4c8-47df-bb1a-917f4499b45a_31
recall_score_micro,[0.6583320195491092],[0.6542014819486047],[0.6503231909191235],[0.6505439066687687],[0.6502601292763677],[0.6463818382468863],[0.6704083241368437],[0.6398549582216617],[0.6705659782437333],[0.6628093961847705],...,[0.6359136055494246],[0.6505123758473907],[0.657764464764307],[0.6447422355352357],[0.6734983446318776],[0.6249093488885386],[0.5335960901781491],[0.64467917389248],[0.6486835882074727],[0.6391297493299699]
f1_score_macro,[0.6547647283783528],[0.6496540151793444],[0.6472124019157367],[0.6474199843631328],[0.6471447648277447],[0.6391155315643541],[0.6676093181537974],[0.6300033065803461],[0.6675810642841186],[0.659810288429086],...,[0.6341548645119586],[0.6454216235985741],[0.6548266361799391],[0.6369411160959247],[0.6703324082084879],[0.617233484123485],[0.3479294656582928],[0.6383640224637881],[0.6436793770085767],[0.6275131195084732]
average_precision_score_micro,[0.7125552971533352],[0.7005325482725837],[0.6921102723688115],[0.6921099013602381],[0.6921131884260724],[0.6908847664115539],[0.7246478978980603],[0.6804580852664778],[0.7239524261539938],[0.7156741999721948],...,[0.6746364822439135],[0.695960905664503],[0.7101601830591809],[0.6893091726511728],[0.7303825690676914],[0.648767520869617],[0.6203909417031054],[0.6831230141830456],[0.6937443038902359],[0.6812627761912491]
precision_score_micro,[0.6583320195491092],[0.6542014819486047],[0.6503231909191235],[0.6505439066687687],[0.6502601292763677],[0.6463818382468863],[0.6704083241368437],[0.6398549582216617],[0.6705659782437333],[0.6628093961847705],...,[0.6359136055494246],[0.6505123758473907],[0.657764464764307],[0.6447422355352357],[0.6734983446318776],[0.6249093488885386],[0.5335960901781491],[0.64467917389248],[0.6486835882074727],[0.6391297493299699]
precision_score_weighted,[0.6575872982933005],[0.6534698817776933],[0.6495471900149183],[0.6497644591525243],[0.6494833171952296],[0.6461479379759709],[0.6697765760298268],[0.6404694130692056],[0.6699276031834301],[0.6620850647784119],...,[0.6358890816900249],[0.6498128820998772],[0.6570271935771176],[0.6446567468413681],[0.6728548657688473],[0.623842995791152],[0.2847550268184723],[0.644092882598223],[0.6479484212747418],[0.6407535374840835]
f1_score_weighted,[0.6571399646959639],[0.6523506992737038],[0.6494597137205791],[0.6496717410012642],[0.6493937722436901],[0.6425720456843695],[0.6696799281966336],[0.6340756371726581],[0.669718299701219],[0.6619818957571459],...,[0.6358752315306856],[0.6482789183484394],[0.6569913369015551],[0.6405345931766953],[0.6725242817779786],[0.6209119486039931],[0.37133324903971276],[0.6415831947026416],[0.6465159085104881],[0.6319457489535198]
recall_score_weighted,[0.6583320195491092],[0.6542014819486047],[0.6503231909191235],[0.6505439066687687],[0.6502601292763677],[0.6463818382468863],[0.6704083241368437],[0.6398549582216617],[0.6705659782437333],[0.6628093961847705],...,[0.6359136055494246],[0.6505123758473907],[0.657764464764307],[0.6447422355352357],[0.6734983446318776],[0.6249093488885386],[0.5335960901781491],[0.64467917389248],[0.6486835882074727],[0.6391297493299699]
AUC_macro,[0.7181072193478162],[0.708812150283816],[0.7032473988397153],[0.7032627550898373],[0.7032518137576504],[0.6996003310093031],[0.7303001774043407],[0.6924487081435464],[0.7293441988330767],[0.7219386407731664],...,[0.68836652929166],[0.70454093174092],[0.7154511062950153],[0.6985060149919518],[0.7351986212947589],[0.6652602230241971],[0.6723169528423117],[0.6957571240325043],[0.7023423663566405],[0.6922159905199802]
recall_score_macro,[0.6544973204856218],[0.6495254922797465],[0.6469555562766557],[0.6471612888436343],[0.6468884859208197],[0.6397717535824385],[0.6672849326147958],[0.631801628032827],[0.6672425986154177],[0.6594929053716118],...,[0.6341570433499378],[0.6454401915222379],[0.6545326897715629],[0.6377981416744116],[0.6699658241634128],[0.6181771850828766],[0.5],[0.6387403635161081],[0.6437030172428984],[0.6301569149984066]
AUC_weighted,[0.7181072193478162],[0.708812150561319],[0.7032473988397153],[0.7032627550898373],[0.7032518137576504],[0.6996003310093031],[0.7303001774043407],[0.6924487081435464],[0.7293441988330768],[0.7219386407731665],...,[0.68836652929166],[0.7045409324514027],[0.7154511062950151],[0.6985060149919518],[0.7351986212947589],[0.6652602230241971],[0.6723169528423116],[0.6957571240325042],[0.702342366001399],[0.6922159905199802]


### Retrieve the Best Model

In [40]:
# Retrieve best model from Pipeline Run
best_model_output = pipeline_run.get_pipeline_output(best_model_output_name)
num_file_downloaded = best_model_output.download('.', show_progress=True)

Downloading azureml/e91a76c5-f4c8-47df-bb1a-917f4499b45a/model_data
Downloaded azureml/e91a76c5-f4c8-47df-bb1a-917f4499b45a/model_data, 1 files out of an estimated total of 1


In [42]:
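# download('.') recreates the datastore path locally, so open the pickle from there.
# Note: `_path_on_datastore` is a private attribute and may change between SDK versions.
with open(best_model_output._path_on_datastore, "rb") as f:
    best_model = pickle.load(f)
best_model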

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                  min_samples_split=0.01,
                                                                                                  min_weight_fraction_leaf=0.0,
                                                                                                  n_estimators=200,
                          

In [43]:
best_model.steps

[('datatransformer',
  DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                  feature_sweeping_config=None, feature_sweeping_timeout=None,
                  featurization_config=None, force_text_dnn=None,
                  is_cross_validation=None, is_onnx_compatible=None, logger=None,
                  observer=None, task=None, working_dir=None)),
 ('prefittedsoftvotingclassifier',
  PreFittedSoftVotingClassifier(classification_labels=None,
                                estimators=[('0',
                                             Pipeline(memory=None,
                                                      steps=[('maxabsscaler',
                                                              MaxAbsScaler(copy=True)),
                                                             ('lightgbmclassifier',
                                                              LightGBMClassifier(boosting_type='gbdt',
                                                          
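
The fitted AutoML model is a scikit-learn `Pipeline`, so its steps can be inspected by name or by position. The following is a minimal sketch on a toy pipeline, not the AutoML model itself: the `"classifier"` step name and the `LogisticRegression` estimator are illustrative assumptions standing in for the truncated steps shown above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MaxAbsScaler

# Toy stand-in for the fitted AutoML pipeline; the step names are illustrative.
toy_pipeline = Pipeline(steps=[
    ("maxabsscaler", MaxAbsScaler()),
    ("classifier", LogisticRegression()),
])

# `steps` is a list of (name, estimator) tuples in execution order.
step_names = [name for name, _ in toy_pipeline.steps]

# `named_steps` gives dictionary-style access to a step by its name.
scaler = toy_pipeline.named_steps["maxabsscaler"]

# The last step is the final estimator that produces predictions.
final_name, final_estimator = toy_pipeline.steps[-1]

print(step_names)
print(type(final_estimator).__name__)
```

The same two access patterns (`steps` and `named_steps`) should apply to `best_model`, using the step names printed above, such as `'datatransformer'`.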

### Test Model

Test the best fitted model by predicting on `X_test` and comparing the predictions with `y_test` in a confusion matrix.

In [44]:
ypred = best_model.predict(X_test)
cm = confusion_matrix(y_test, ypred)

In [45]:
# Visualize the confusion matrix
pd.DataFrame(cm).style.background_gradient(cmap='Blues', low=0, high=0.9)

true label \ predicted label,0,1
0,2317,1381
1,1136,3095
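
The confusion matrix can be condensed into summary metrics. The sketch below copies the four counts from the output above and assumes that label `1` is the positive class. It relies on the scikit-learn convention that rows are true labels and columns are predicted labels.

```python
# Confusion matrix copied from the output above.
# scikit-learn convention: rows are true labels, columns are predicted labels.
cm = [[2317, 1381],
      [1136, 3095]]

# Assumption: label 1 is treated as the positive class.
(tn, fp), (fn, tp) = cm
total = tn + fp + fn + tp

accuracy = (tp + tn) / total      # share of all predictions that are correct
precision = tp / (tp + fp)        # share of predicted positives that are correct
recall = tp / (tp + fn)           # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

With these counts the test accuracy comes out at about 0.68, which is in the same range as the `recall_score_micro` values reported for the pipeline runs.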
