Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

# Udacity Capstone Project: Azure AutoML
This notebook demonstrates the use of AutoML in Azure Machine Learning Pipeline for the Udacity capstone project.

## Introduction
This example shows how to use an AzureML `Dataset` to load data for AutoML via an AML Pipeline.

If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook.

In this notebook you will learn how to:
1. Create an `Experiment` in an existing `Workspace`.
2. Create or attach an existing `AmlCompute` to a workspace.
3. Define data loading in a `TabularDataset`.
4. Configure AutoML using `AutoMLConfig`.
5. Use `AutoMLStep`.
6. Train the model using `AmlCompute`.
7. Explore the results.
8. Test the best fitted model.

## Azure Machine Learning and Pipeline SDK-specific imports

In [2]:
import os
import sys
import json
import azureml
import logging
import pickle
import requests
import pandas as pd
import numpy as np
from io import BytesIO
import joblib  # sklearn.externals.joblib is deprecated and was removed in scikit-learn 0.23
from sklearn.metrics import confusion_matrix
from pprint import pprint
from matplotlib import pyplot as plt
from train import *


import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.automl.core.featurization import FeaturizationConfig
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.core import Dataset
from azureml.data.datapath import DataPath

from azureml.widgets import RunDetails
from azureml.train.automl import constants
from azureml.pipeline.steps import AutoMLStep
from azureml.pipeline.core import PipelineData, TrainingOutput
from azureml.pipeline.core import Pipeline

# Model deployment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.model import Model

import warnings
warnings.filterwarnings("ignore")

pd.set_option('display.max_rows', None)

# Check system and core SDK version number
print("System version: {}".format(sys.version))
print("SDK version:", azureml.core.VERSION)

System version: 3.6.13 |Anaconda, Inc.| (default, Feb 23 2021, 12:58:59) 
[GCC Clang 10.0.0 ]
SDK version: 1.23.0


## Initialize Workspace
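Retrieve the workspace by subscription ID, resource group, and workspace name, authenticating with an interactive login. Alternatively, `Workspace.from_config()` loads the same details from a persisted configuration file at `./config.json`.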

In [3]:
interactive_auth = InteractiveLoginAuthentication(tenant_id="660b3398-b80e-49d2-bc5b-ac1dc93b5254")
ws = Workspace.get(subscription_id="976ee174-3882-4721-b90a-b5fef6b72f24",
                   resource_group="aml-quickstarts-143096",
                   name="quick-starts-ws-143096",
                   auth=interactive_auth)

experiment_name = 'automl_project'
experiment=Experiment(ws, experiment_name)
experiment

Name,Workspace,Report Page,Docs Page
automl_project,quick-starts-ws-143096,Link to Azure Machine Learning studio,Link to Documentation
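
The persisted-configuration route relies on a small JSON file. As a minimal sketch, assuming the usual `config.json` layout with `subscription_id`, `resource_group`, and `workspace_name` keys, the file can be read with the standard library and its values passed to `Workspace.get`. The helper name `load_workspace_config` and the placeholder values are hypothetical.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Read config.json and return keyword arguments for Workspace.get."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    # Workspace.get expects the workspace name under the keyword `name`.
    return {
        "subscription_id": config["subscription_id"],
        "resource_group": config["resource_group"],
        "name": config["workspace_name"],
    }


# Demonstration with placeholder values written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    demo_path.write_text(json.dumps({
        "subscription_id": "<subscription-id>",
        "resource_group": "<resource-group>",
        "workspace_name": "<workspace-name>",
    }))
    workspace_kwargs = load_workspace_config(demo_path)

print(workspace_kwargs)
# With the SDK installed: ws = Workspace.get(auth=interactive_auth, **workspace_kwargs)
```

Keeping the identifiers in a file rather than in the notebook also avoids committing subscription details alongside the code.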


In [4]:
dic_data = {'Workspace name': ws.name,
            'Azure region': ws.location,
            'Subscription id': ws.subscription_id,
            'Resource group': ws.resource_group,
            'Experiment Name': experiment.name}

az_data = pd.DataFrame.from_dict(data = dic_data, orient='index')
az_data.rename(columns={0:''}, inplace = True)
az_data

Property,Value
Workspace name,quick-starts-ws-143096
Azure region,southcentralus
Subscription id,976ee174-3882-4721-b90a-b5fef6b72f24
Resource group,aml-quickstarts-143096
Experiment Name,automl_project


### Create or Attach an AmlCompute cluster
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you get the default `AmlCompute` as your training compute resource.

**Udacity note:** There is no need to create a new compute target; the previous cluster can be reused.

In [5]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# Define CPU cluster name
compute_target_name = "cpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=compute_target_name)
    print("Found existing cpu-cluster. Use it.")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS12_V2",
                                                           min_nodes=1, 
                                                           max_nodes=4) 
    compute_target = ComputeTarget.create(ws, compute_target_name, compute_config)

compute_target.wait_for_completion(show_output=True)

print(compute_target.get_status().serialize())

Found existing cpu-cluster. Use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 1, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2021-04-19T03:06:20.048000+00:00', 'errors': None, 'creationTime': '2021-04-19T00:47:03.009049+00:00', 'modifiedTime': '2021-04-19T00:47:18.524985+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_DS12_V2'}


In [6]:
# List every compute target in the workspace with its type and provisioning state
compute_targets = ws.compute_targets
for name, ct in compute_targets.items():
    print(name, ct.type, ct.provisioning_state)

cpu-cluster AmlCompute Succeeded
new-compute ComputeInstance Succeeded


## Dataset

**Udacity note:** Make sure the `key` matches the name of the uploaded dataset and that the description matches. If the name is unknown or hard to find, loop over `ws.datasets.keys()` and `print()` each key.
If the dataset is not found because it was deleted, it can be recreated from the link to the CSV file.
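
The lookup described in the note can be sketched as follows. `ws.datasets` behaves like a dictionary keyed by dataset name, so the example uses a plain `dict` as a stand-in. The helper `find_dataset` and the sample keys and values are hypothetical.

```python
def find_dataset(datasets, key):
    """Return datasets[key], or print the available keys and return None."""
    if key in datasets:
        return datasets[key]
    print(f"Dataset '{key}' not found. Available keys:")
    for name in datasets.keys():
        print(" -", name)
    return None


# Stand-in for ws.datasets (hypothetical names and values).
registered = {
    "online-news-train": "dataset-object-1",
    "bank-marketing": "dataset-object-2",
}

found = find_dataset(registered, "online-news-train")
missing = find_dataset(registered, "online-news")
```

When the helper returns `None`, the dataset would need to be recreated from the CSV before training.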

In [7]:
DATA_LOC = "https://raw.githubusercontent.com/franckess/AzureML_Capstone/main/data/OnlineNewsPopularity.csv"
BORUTA_LOC = "https://github.com/franckess/AzureML_Capstone/releases/download/1.1/boruta_model_final.pkl"

# Loading data
df = pd.read_csv(DATA_LOC)

# Removing space character in the feature names
df.columns=df.columns.str.replace(' ','')

# Drop URL column
df = df.drop(['url'], axis=1)

# Perform Data pre-processing
df = corr_drop_cols(df)
df = create_label(df)
df = scaling_num(df)
df = feature_selection(df, BORUTA_LOC)
    
# Split train data into train & test
X_train, X_test, y_train, y_test = split_train_test(df)

m, k = X_train.shape
print("{} x {} table of data:".format(m, k))
X_train.info()

31715 x 47 table of data:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 31715 entries, 38512 to 35050
Data columns (total 47 columns):
n_tokens_title                   31715 non-null float64
n_tokens_content                 31715 non-null float64
n_unique_tokens                  31715 non-null float64
num_hrefs                        31715 non-null float64
num_self_hrefs                   31715 non-null float64
num_imgs                         31715 non-null float64
num_videos                       31715 non-null float64
average_token_length             31715 non-null float64
num_keywords                     31715 non-null float64
data_channel_is_entertainment    31715 non-null int64
data_channel_is_bus              31715 non-null int64
data_channel_is_socmed           31715 non-null int64
data_channel_is_tech             31715 non-null int64
data_channel_is_world            31715 non-null int64
kw_min_min                       31715 non-null float64
kw_max_min                     

### Upload data to Azure Datastore

In [8]:
# Merge the X and y dataframes into a single table for the AutoML experiment
train_df = pd.concat([X_train, y_train], axis=1)
os.makedirs('./data', exist_ok=True)  # to_csv fails if the folder does not exist
train_df.to_csv('./data/train_data.csv', index=False, header=True)

# Upload the CSV to the workspace's default datastore
datastore = ws.get_default_datastore()
datastore.upload_files(files=['./data/train_data.csv'], target_path='data/', overwrite=True, show_progress=True)

datastore_path = [
    DataPath(datastore, 'data/train_data.csv')
]

# Create a tabular dataset from the uploaded file for access during training on remote compute
train_data = Dataset.Tabular.from_delimited_files(path=datastore_path)

Uploading an estimated of 1 files
Uploading ./data/train_data.csv
Uploaded ./data/train_data.csv, 1 files out of an estimated total of 1
Uploaded 1 files


In [9]:
print(
    "Datastore type: " + datastore.datastore_type,
    "Account name: " + datastore.account_name,
    "Container name: " + datastore.container_name,
    sep="\n",
)

Datastore type: AzureBlob
Account name: mlstrg143096
Container name: azureml-blobstore-43a6a916-893a-4a7a-92a1-649d05a98e8e


In [10]:
train_data

{
  "source": [
    "('workspaceblobstore', 'data/train_data.csv')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes"
  ]
}

## Train
This cell creates the AutoML settings and configuration objects.
**Udacity notes:** These inputs must match what was used when training in the portal. For example, `label_column_name` must name the target column of the training data, which is `label` here.

In [11]:
automl_settings = {
    "experiment_timeout_minutes": 60,  # maximum duration of the experiment, in minutes
    "max_concurrent_iterations": 9,  # exceeds the cluster's max_nodes (4), so extra iterations are queued
    "primary_metric": 'accuracy'
}

project_folder = './capstone-project'

automl_config = AutoMLConfig(compute_target=compute_target,
                             task = "classification",
                             training_data=train_data,
                             label_column_name="label",   
                             path = project_folder,
                             enable_early_stopping= True,
                             featurization= 'auto',
                             debug_log = "online_news_automl_errors.log",
                             n_cross_validations=5,
                             max_cores_per_iteration=-1,
                             verbosity=logging.INFO,
                             **automl_settings)
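
AmlCompute runs one AutoML iteration per node, so `max_concurrent_iterations` is usually kept at or below the cluster's `max_nodes`; iterations beyond that wait in a queue. A minimal sketch of capping the setting before it is passed to `AutoMLConfig` follows. The helper `cap_concurrency` and the name `example_settings` are hypothetical, and the values mirror this notebook (9 requested, 4 nodes).

```python
def cap_concurrency(settings, max_nodes):
    """Return a copy of settings with max_concurrent_iterations <= max_nodes."""
    capped = dict(settings)
    requested = capped.get("max_concurrent_iterations", 1)
    capped["max_concurrent_iterations"] = min(requested, max_nodes)
    return capped


example_settings = {
    "experiment_timeout_minutes": 60,
    "max_concurrent_iterations": 9,
    "primary_metric": "accuracy",
}

capped_settings = cap_concurrency(example_settings, max_nodes=4)
print(capped_settings["max_concurrent_iterations"])  # 4
```

The helper returns a copy, so the original dictionary stays available for comparison.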

In [12]:
# Submit your automl run
automl_exp = Experiment(workspace=ws, name="Udacity_capstone_AutoML")  
automl_run = automl_exp.submit(automl_config, show_output = True)
RunDetails(automl_run).show()
automl_run.wait_for_completion(show_output=True)

Running on remote.
No run_configuration provided, running on cpu-cluster with default configuration
Running on remote compute: cpu-cluster
Parent Run ID: AutoML_ba44cd3a-cc64-4ad3-b51c-7e38c30a1073

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more abo

{'runId': 'AutoML_ba44cd3a-cc64-4ad3-b51c-7e38c30a1073',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-04-19T03:36:32.807368Z',
 'endTimeUtc': '2021-04-19T03:57:25.506574Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'cpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"83b970e0-a086-40ca-abc6-82f157f0c214\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"data/train_data.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-143096\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"976ee174-3882-4721-b90a-b5fef6b72f24\\\\\\", \\\\\\"work

## Examine Results

### Retrieve the Best Model

In [13]:
best_run, best_model = automl_run.get_output()
print(best_model.steps)

Package:azureml-automl-runtime, training version:1.25.0, current version:1.23.0
Package:azureml-core, training version:1.25.0, current version:1.23.0
Package:azureml-dataprep, training version:2.11.2, current version:2.10.1
Package:azureml-dataprep-rslex, training version:1.9.1, current version:1.8.1
Package:azureml-dataset-runtime, training version:1.25.0, current version:1.23.0
Package:azureml-defaults, training version:1.25.0, current version:1.23.0
Package:azureml-interpret, training version:1.25.0, current version:1.23.0
Package:azureml-pipeline-core, training version:1.25.0, current version:1.23.0
Package:azureml-telemetry, training version:1.25.0, current version:1.23.0
Package:azureml-train-automl-client, training version:1.25.0, current version:1.23.0.post1
Package:azureml-train-automl-runtime, training version:1.25.0, current version:1.23.0.post1


[('datatransformer', DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                feature_sweeping_config=None, feature_sweeping_timeout=None,
                featurization_config=None, force_text_dnn=None,
                is_cross_validation=None, is_onnx_compatible=None, logger=None,
                observer=None, task=None, working_dir=None)), ('prefittedsoftvotingclassifier', PreFittedSoftVotingClassifier(classification_labels=None,
                              estimators=[('0',
                                           Pipeline(memory=None,
                                                    steps=[('maxabsscaler',
                                                            MaxAbsScaler(copy=True)),
                                                           ('lightgbmclassifier',
                                                            LightGBMClassifier(boosting_type='gbdt',
                                                                               clas

In [14]:
# Best model: name, id, type
print(best_run)

Run(Experiment: Udacity_capstone_AutoML,
Id: AutoML_ba44cd3a-cc64-4ad3-b51c-7e38c30a1073_46,
Type: azureml.scriptrun,
Status: Completed)


In [15]:
# Print every metric logged for the best run
best_run_metrics = best_run.get_metrics()
for metric_name, metric_value in best_run_metrics.items():
    print(metric_name, metric_value)

average_precision_score_weighted 0.7276774801596801
balanced_accuracy 0.671542820082786
f1_score_micro 0.675516317200063
precision_score_micro 0.675516317200063
average_precision_score_macro 0.7256116346658349
precision_score_weighted 0.6749067650945209
AUC_macro 0.7345790309734715
precision_score_macro 0.6742397954052416
recall_score_weighted 0.675516317200063
matthews_correlation 0.34577146926033936
accuracy 0.675516317200063
recall_score_micro 0.675516317200063
average_precision_score_micro 0.72973044547779
f1_score_weighted 0.6742565657010988
norm_macro_recall 0.3430856401655721
recall_score_macro 0.671542820082786
AUC_micro 0.7365544397924235
AUC_weighted 0.7345790302629889
log_loss 0.6072316194712652
f1_score_macro 0.6719261246437711
weighted_accuracy 0.6794509918828663
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_ba44cd3a-cc64-4ad3-b51c-7e38c30a1073_46/confusion_matrix
accuracy_table aml://artifactId/ExperimentRun/dcid.AutoML_ba44cd3a-cc64-4ad3-b51c-7e38c30a1073_4
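
Several of these metrics are related. For a classifier with C classes, normalized macro recall rescales macro recall so that random guessing scores 0 and perfect recall scores 1: (recall_macro - 1/C) / (1 - 1/C). The sketch below checks that relationship against the values printed above, assuming the binary case (C = 2) used in this project. The function name is hypothetical.

```python
def normalized_macro_recall(recall_macro, n_classes):
    """Rescale macro recall so chance level maps to 0 and perfect recall to 1."""
    chance = 1.0 / n_classes
    return (recall_macro - chance) / (1.0 - chance)


# Values copied from the metric listing above.
recall_score_macro = 0.671542820082786
reported_norm_macro_recall = 0.3430856401655721

computed = normalized_macro_recall(recall_score_macro, n_classes=2)
print(computed, reported_norm_macro_recall)
```

`balanced_accuracy` matches `recall_score_macro` in the listing for a similar reason: both average the per-class recall.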

In [41]:
best_run.get_file_names()

['accuracy_table',
 'automl_driver.py',
 'azureml-logs/55_azureml-execution-tvmps_854d0b1b98de6672c56ae45169b0e8c1c9cac75d7d16df2e76a3eac47d7e0e5b_d.txt',
 'azureml-logs/65_job_prep-tvmps_854d0b1b98de6672c56ae45169b0e8c1c9cac75d7d16df2e76a3eac47d7e0e5b_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_854d0b1b98de6672c56ae45169b0e8c1c9cac75d7d16df2e76a3eac47d7e0e5b_d.txt',
 'azureml-logs/process_info.json',
 'azureml-logs/process_status.json',
 'confusion_matrix',
 'explanation/3bc09132/classes.interpret.json',
 'explanation/3bc09132/eval_data_viz.interpret.json',
 'explanation/3bc09132/expected_values.interpret.json',
 'explanation/3bc09132/features.interpret.json',
 'explanation/3bc09132/global_names/0.interpret.json',
 'explanation/3bc09132/global_rank/0.interpret.json',
 'explanation/3bc09132/global_values/0.interpret.json',
 'explanation/3bc09132/local_importance_values.interpret.json',
 'explanation/3bc09132/per_class_names/0.interpret.json',
 'explanati

In [16]:
# Save the best model locally
os.makedirs("output", exist_ok=True)  # joblib.dump does not create the folder
automl_model_name = best_run.properties['model_name']
joblib.dump(best_model, filename="output/automl_model.pkl")
print("Model saved successfully!")

Model saved successfully!


In [17]:
# Register the best model; 'outputs/model.pkl' is a path inside the run's artifacts, not the local 'output' folder
AutoML_model = best_run.register_model(model_name='best_autoML_model', model_path='outputs/model.pkl')
AutoML_model

Model(workspace=Workspace.create(name='quick-starts-ws-143096', subscription_id='976ee174-3882-4721-b90a-b5fef6b72f24', resource_group='aml-quickstarts-143096'), name=best_autoML_model, id=best_autoML_model:1, version=1, tags={}, properties={})

In [44]:
def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0]+ ' - ')
        elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):
            print("\nMeta Learner")
            pprint(step[1]._meta_learner)
            print()
            for estimator in step[1]._base_learners:
                print_model(estimator[1], estimator[0]+ ' - ')
        else:
            pprint(step[1].get_params())
            print()
            
print_model(best_model)

datatransformer
{'enable_dnn': None,
 'enable_feature_sweeping': None,
 'feature_sweeping_config': None,
 'feature_sweeping_timeout': None,
 'featurization_config': None,
 'force_text_dnn': None,
 'is_cross_validation': None,
 'is_onnx_compatible': None,
 'logger': None,
 'observer': None,
 'task': None,
 'working_dir': None}

prefittedsoftvotingclassifier
{'estimators': ['7', '0', '13', '14', '11', '30', '8', '5', '6', '19', '20'],
 'weights': [0.2,
             0.06666666666666667,
             0.13333333333333333,
             0.06666666666666667,
             0.06666666666666667,
             0.06666666666666667,
             0.13333333333333333,
             0.06666666666666667,
             0.06666666666666667,
             0.06666666666666667,
             0.06666666666666667]}

7 - maxabsscaler
{'copy': True}

7 - lightgbmclassifier
{'boosting_type': 'gbdt',
 'class_weight': None,
 'colsample_bytree': 0.4955555555555555,
 'importance_type': 'split',
 'learning_rate': 0.1,
 'max
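
The ensemble listed above combines its member pipelines by soft voting: each member predicts class probabilities, the probabilities are averaged using the weights, and the class with the highest average wins. The following is a generic sketch of that idea in plain Python, not the `PreFittedSoftVotingClassifier` implementation. The toy probabilities and toy weights are made up for illustration, while `ensemble_weights` holds the values printed above.

```python
def soft_vote(member_probabilities, weights):
    """Weighted average of per-member class probabilities for one sample."""
    total = sum(weights)
    n_classes = len(member_probabilities[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, member_probabilities)) / total
        for c in range(n_classes)
    ]
    predicted_class = max(range(n_classes), key=averaged.__getitem__)
    return averaged, predicted_class


# Weights printed for the 11 ensemble members above.
ensemble_weights = [
    0.2,
    0.06666666666666667,
    0.13333333333333333,
    0.06666666666666667,
    0.06666666666666667,
    0.06666666666666667,
    0.13333333333333333,
    0.06666666666666667,
    0.06666666666666667,
    0.06666666666666667,
    0.06666666666666667,
]
print(sum(ensemble_weights))  # approximately 1.0

# Toy example: two members, two classes (made-up probabilities).
toy_probabilities = [[0.9, 0.1], [0.2, 0.8]]
toy_weights = [0.75, 0.25]
averaged, predicted = soft_vote(toy_probabilities, toy_weights)
print(averaged, predicted)
```

In the toy case the first member carries three times the weight of the second, so its confident vote for class 0 decides the outcome.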

In [19]:
# Download scoring file 
best_run.download_file('outputs/scoring_file_v_1_0_0.py', './automl_score.py')

In [20]:
with open('automl_score.py') as f:
    print(f.read())

# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
import json
import logging
import os
import pickle
import numpy as np
import pandas as pd
import joblib

import azureml.automl.core
from azureml.automl.core.shared import logging_utilities, log_server
from azureml.telemetry import INSTRUMENTATION_KEY

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType


input_sample = pd.DataFrame({"n_tokens_title": pd.Series([0.0], dtype="float64"), "n_tokens_content": pd.Series([0.0], dtype="float64"), "n_unique_tokens": pd.Series([0.0], dtype="float64"), "num_hrefs": pd.Series([0.0], dtype="float64"), "num_self_hrefs": pd.Series([0.0], dtype="float64"), "num_imgs": pd.Series([0.0]

In [21]:
# Download environment file
best_run.download_file('outputs/conda_env_v_1_0_0.yml', './AzureML_envFile.yml')

## Model Deployment

Create an inference config and deploy the model as a web service.

In [22]:
inference_config = InferenceConfig(entry_script='./automl_score.py')

aciconfig = AciWebservice.deploy_configuration(cpu_cores = 2, 
                                               memory_gb = 2, 
                                               tags = {'Company': "Mashable", 'type': "capstone_Classifier"}, 
                                               description = 'sample service for Capstone Project AutoML Classifier for Online News popularity')

In [23]:
aci_service_name = 'automl-deployment'
print(aci_service_name)
aci_service = Model.deploy(ws, aci_service_name, [AutoML_model], inference_config, aciconfig)
aci_service.wait_for_deployment(True)
print(aci_service.state)
print(aci_service.scoring_uri)
print(aci_service.swagger_uri)

automl-deployment
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running.............................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://a2363964-7952-40b8-b9f7-8933218819e2.southcentralus.azurecontainer.io/score
http://a2363964-7952-40b8-b9f7-8933218819e2.southcentralus.azurecontainer.io/swagger.json
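
The scoring URI printed above accepts HTTP POST requests with a JSON body, so the service can also be called without the SDK. A minimal sketch using only the standard library follows. It assumes the service was deployed without key authentication (the notebook does not enable it) and that the scoring script expects a top-level `data` list of records. The helper name, the example URI, and the shortened sample row are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, records):
    """Build a POST request carrying {"data": [...]} as a JSON body."""
    body = json.dumps({"data": records}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical URI and a shortened row; a real row needs every model feature.
request = build_scoring_request(
    "http://example.invalid/score",
    [{"n_tokens_title": 0.52, "n_tokens_content": 0.13}],
)
print(request.get_method(), request.full_url)

# To send it against a live service:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```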


### Testing the deployed web service

In [24]:
test_data = pd.concat([X_test, y_test], axis=1)
test_data = test_data[10:15]
display(test_data)

Unnamed: 0,n_tokens_title,n_tokens_content,n_unique_tokens,num_hrefs,num_self_hrefs,num_imgs,num_videos,average_token_length,num_keywords,data_channel_is_entertainment,...,avg_positive_polarity,min_positive_polarity,max_positive_polarity,avg_negative_polarity,min_negative_polarity,max_negative_polarity,title_subjectivity,title_sentiment_polarity,abs_title_sentiment_polarity,label
29000,0.52,0.13,0.0,0.04,0.01,0.01,0.01,0.64,0.33,0,...,0.4,0.05,0.8,0.65,0.0,0.93,0.83,0.75,0.5,0
4914,0.19,0.07,0.0,0.05,0.03,0.09,0.0,0.6,0.33,0,...,0.37,0.03,0.7,0.78,0.6,0.88,1.0,0.75,0.5,1
19445,0.43,0.05,0.0,0.02,0.04,0.01,0.0,0.57,0.44,0,...,0.46,0.14,0.8,0.68,0.1,0.95,0.0,0.5,0.0,1
32949,0.24,0.1,0.0,0.04,0.03,0.02,0.01,0.55,0.33,0,...,0.4,0.1,1.0,0.72,0.2,0.95,0.0,0.5,0.0,0
22685,0.29,0.04,0.0,0.02,0.0,0.01,0.0,0.58,0.44,0,...,0.35,0.03,0.8,0.8,0.69,0.9,0.45,0.57,0.14,0


In [25]:
# remove label column
label_data = test_data.pop('label')

# serialize the test rows to the JSON payload expected by the scoring script
input_data = json.dumps({'data': test_data.to_dict(orient='records')})

# print test input data
print(input_data)

{"data": [{"n_tokens_title": 0.5238095238095238, "n_tokens_content": 0.13323105971206042, "n_unique_tokens": 0.0006325920646276748, "num_hrefs": 0.042763157894736836, "num_self_hrefs": 0.008620689655172414, "num_imgs": 0.0078125, "num_videos": 0.01098901098901099, "average_token_length": 0.6387343741908725, "num_keywords": 0.3333333333333333, "data_channel_is_entertainment": 0, "data_channel_is_bus": 0, "data_channel_is_socmed": 0, "data_channel_is_tech": 0, "data_channel_is_world": 1, "kw_min_min": 0.0, "kw_max_min": 0.002047587131367292, "kw_min_max": 0.012332503260998459, "kw_avg_max": 0.47818095576900277, "kw_min_avg": 0.5908604678089657, "kw_max_avg": 0.01182515012030831, "kw_avg_avg": 0.05734695866517057, "self_reference_min_shares": 0.001152614727854856, "self_reference_max_shares": 0.001152614727854856, "weekday_is_wednesday": 1, "weekday_is_saturday": 0, "weekday_is_sunday": 0, "is_weekend": 0, "LDA_00": 0.05393875832876898, "LDA_01": 0.3236042844264332, "LDA_02": 0.5982119195

In [26]:
output = aci_service.run(input_data)
print(output)

{"result": [0, 1, 0, 1, 1]}
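
To judge these five predictions, they can be compared with the labels removed earlier (`label_data`). The sketch below assumes the service returns a JSON string shaped like the output above and copies the five labels from the displayed test rows. It is a quick sanity check on a tiny sample, not a performance estimate.

```python
import json

service_output = '{"result": [0, 1, 0, 1, 1]}'  # printed by the service above
true_labels = [0, 1, 1, 0, 0]  # `label` column of the five displayed test rows

predictions = json.loads(service_output)["result"]
matches = sum(int(p == t) for p, t in zip(predictions, true_labels))
accuracy = matches / len(true_labels)

print(f"{matches} of {len(true_labels)} correct (accuracy {accuracy:.2f})")
```

With only five rows the result says little on its own; the cross-validated metrics from the AutoML run remain the better guide.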


In [27]:
print(aci_service.get_logs())

2021-04-16T06:12:08,821807800+00:00 - gunicorn/run 
2021-04-16T06:12:08,824846900+00:00 - rsyslog/run 
2021-04-16T06:12:08,825948800+00:00 - iot-server/run 
2021-04-16T06:12:08,832464100+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd