# <center>Automated Machine Learning </center>

### Check that the required package is installed

In [10]:
pip show azureml-train-automl

Name: azureml-train-automl
Version: 1.44.0
Summary: Used for automatically finding the best machine learning model and its parameters.
Home-page: https://docs.microsoft.com/python/api/overview/azure/ml/?view=azure-ml-py
Author: Microsoft Corp
Author-email: None
License: https://aka.ms/azureml-sdk-license
Location: /anaconda/envs/azureml_py38/lib/python3.8/site-packages
Requires: azureml-automl-runtime, azureml-train-automl-runtime, azureml-dataset-runtime, azureml-train-automl-client, azureml-automl-core
Required-by: azureml-automl-dnn-nlp
Note: you may need to restart the kernel to use updated packages.


### Connect to the Workspace

In [11]:
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.44.0 to work with avanade-airline-delays


### Get data

In [12]:
from azureml.core import Dataset

# The dataset is already registered in the ML workspace
df_airlines = Dataset.get_by_name(ws, name='airlines_processed_df_final_2')

# Train/test split (approximately 70/30; the seed makes it reproducible)
df_train, df_test = df_airlines.random_split(percentage=0.7, seed=42)

print('Dataset is ready')

Dataset is ready
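
`random_split` divides the records approximately by the given percentage, so the two parts are not guaranteed to be exactly 70% and 30%. One way to picture a seeded, approximate split is the sketch below. It is plain Python and not the Azure ML implementation; the helper `random_split_rows` and the toy `rows` list are hypothetical.

```python
import random


def random_split_rows(rows, percentage, seed=None):
    """Hypothetical helper: each row goes to the first part with probability `percentage`."""
    rng = random.Random(seed)  # a local generator keeps the split reproducible
    first, second = [], []
    for row in rows:
        if rng.random() < percentage:
            first.append(row)
        else:
            second.append(row)
    return first, second


rows = list(range(1000))  # stand-in for dataset records
train_a, test_a = random_split_rows(rows, 0.7, seed=42)
train_b, test_b = random_split_rows(rows, 0.7, seed=42)

print(len(train_a), len(test_a))  # close to 700/300, but not exactly
print(train_a == train_b)         # same seed, same split
```

Using the same seed across experiments keeps the train and validation rows identical, so metric differences between runs come from the configuration rather than from the split.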


### Prepare compute

In [13]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = 'avanade-compute-cluster'

try:
    # Check for existing compute target
    training_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Cluster already exists')
except ComputeTargetException:
    # If it doesn't already exist, create it
    try:
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        training_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
        training_cluster.wait_for_completion(show_output=True)
    except Exception as ex:
        print(ex)

Cluster already exists


### Configure automated machine learning

Now let's configure the automated ML experiment. The performance metric should be chosen first. These are the primary metrics available for a classification task:

In [14]:
import azureml.train.automl.utilities as automl_utils

for metric in automl_utils.get_primary_metrics('classification'):
    print(metric)

norm_macro_recall
accuracy
AUC_weighted
precision_score_weighted
average_precision_score_weighted
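
Of the metrics listed, AUC_weighted is the per-class AUC averaged with each class's number of true instances as the weight. AUC itself can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes both in plain Python to make the definition concrete. The function names and the toy labels and probabilities are made up for illustration, and this is not how the SDK computes the metric internally.

```python
def binary_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_weighted(labels, prob_by_class):
    """One-vs-rest AUC per class, averaged with class support as the weight."""
    total = len(labels)
    result = 0.0
    for cls, probs in prob_by_class.items():
        one_vs_rest = [1 if y == cls else 0 for y in labels]
        support = sum(one_vs_rest)
        result += (support / total) * binary_auc(one_vs_rest, probs)
    return result


# Toy data (hypothetical): 1 = delayed, 0 = on time
labels = [0, 0, 0, 1, 1, 0]
p_delayed = [0.1, 0.4, 0.35, 0.8, 0.3, 0.2]
probs = {1: p_delayed, 0: [1 - p for p in p_delayed]}

auc_delayed = binary_auc(labels, p_delayed)
auc_w = auc_weighted(labels, probs)
print(auc_delayed, auc_w)
```

With two classes and complementary probabilities, both one-vs-rest AUCs coincide, so the weighted average equals the plain binary AUC. The weighting only changes the result when there are more than two classes.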


In this scenario, I will choose <b>AUC_weighted</b> as the performance metric. Let's prepare the AutoMLConfig object now.

In [15]:
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(name='automl_airlines_delays_3',
                             task='classification',
                             compute_target=training_cluster,
                             training_data = df_train,
                             validation_data = df_test,
                             label_column_name='TARGET',
                             iterations=4,
                             primary_metric = 'AUC_weighted',
                             max_concurrent_iterations=2,
                             featurization='off'
                             )
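
A common pattern is to keep the scalar settings in a dictionary and unpack it into `AutoMLConfig`, which makes it easier to compare configurations between experiments. The sketch below uses the same values as the cell above. The `check_concurrency` helper is hypothetical, and the cluster size of 2 is taken from the `max_nodes=2` used when provisioning the compute. It only checks that the number of concurrent iterations does not exceed the number of nodes available.

```python
# Same values as in the configuration above, collected in one place
automl_settings = {
    "task": "classification",
    "label_column_name": "TARGET",
    "primary_metric": "AUC_weighted",
    "iterations": 4,
    "max_concurrent_iterations": 2,
    "featurization": "off",
}

cluster_max_nodes = 2  # max_nodes used for the compute cluster


def check_concurrency(settings, max_nodes):
    """Hypothetical helper: True when concurrent iterations fit on the cluster."""
    return settings["max_concurrent_iterations"] <= max_nodes


print(check_concurrency(automl_settings, cluster_max_nodes))
```

The dictionary could then be passed as `AutoMLConfig(compute_target=training_cluster, training_data=df_train, validation_data=df_test, **automl_settings)`.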

### Run an AutoML experiment


    from azureml.core.experiment import Experiment
    from azureml.widgets import RunDetails

    print('Running..')
    automl_experiment = Experiment(ws, 'automl_airlines_delays_3')
    automl_run = automl_experiment.submit(automl_config)
    RunDetails(automl_run).show()
    automl_run.wait_for_completion(show_output=True)

### AutoML output


I ran a few experiments with different datasets and numbers of iterations. I tested a dataset that wasn't standardized with MaxAbsScaler and one where I kept the outliers. The latter actually had the best performance.
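
For context on how scaling and outliers interact: max-abs scaling divides each feature by its largest absolute value, so a single extreme value sets the scale for the whole column. The sketch below is a plain-Python illustration of that idea, mirroring what scikit-learn's `MaxAbsScaler` does per feature. The helper name and the `delays` values are made up, and this is not the preprocessing code used for the registered datasets.

```python
def max_abs_scale(column):
    """Divide every value by the column's largest absolute value (result lies in [-1, 1])."""
    max_abs = max(abs(v) for v in column)
    if max_abs == 0:
        return list(column)  # an all-zero column is left unchanged
    return [v / max_abs for v in column]


# Hypothetical delay values in minutes; 400.0 plays the role of an outlier
delays = [5.0, -10.0, 20.0, 400.0]
scaled = max_abs_scale(delays)
print(scaled)

# Without the outlier, the remaining values spread over a wider range
scaled_no_outlier = max_abs_scale(delays[:-1])
print(scaled_no_outlier)
```

With the outlier present, the ordinary values are squeezed close to zero. Removing it lets them use the full range, which is one reason the two dataset variants can behave differently in training.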

Here is the list of experiments I ran.

In [16]:
from azureml.core import Experiment, Run

ex = ws.experiments

for key in ex:
    print(key)

automl_airlines_delays
automl_airlines_delays_2
automl_airlines_delays_3
automl_airlines_delays_firstconfig_rerun


The best model was created in <b>automl_airlines_delays_3</b> with RunID: <b>AutoML_4d7894fc-b7b1-420c-987f-efc733671a90</b>

In [17]:
aml_exp = Experiment(ws, 'automl_airlines_delays_3')

aml_run = ws.get_run('AutoML_4d7894fc-b7b1-420c-987f-efc733671a90')

best_run, fitted_model = aml_run.get_output()
print(best_run)

print('\nBest Run Metrics:')
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name,':  ', metric)

Package:azureml-automl-runtime, training version:1.46.1, current version:1.44.0
Package:azureml-core, training version:1.46.0, current version:1.44.0
Package:azureml-dataprep, training version:4.5.7, current version:4.2.2
Package:azureml-dataprep-rslex, training version:2.11.4, current version:2.8.1
Package:azureml-dataset-runtime, training version:1.46.0, current version:1.44.0
Package:azureml-defaults, training version:1.46.0, current version:1.44.0
Package:azureml-interpret, training version:1.46.0, current version:1.44.0
Package:azureml-mlflow, training version:1.46.0, current version:1.44.0
Package:azureml-pipeline-core, training version:1.46.0, current version:1.44.0
Package:azureml-responsibleai, training version:1.46.0, current version:1.44.0
Package:azureml-telemetry, training version:1.46.0, current version:1.44.0
Package:azureml-train-automl-client, training version:1.46.0, current version:1.44.0
Package:azureml-train-automl-runtime, training version:1.46.1, current version:

Run(Experiment: automl_airlines_delays_3,
Id: AutoML_4d7894fc-b7b1-420c-987f-efc733671a90_7,
Type: azureml.scriptrun,
Status: Completed)

Best Run Metrics:
accuracy :   0.8259597752362156
f1_score_weighted :   0.7685567757525456
matthews_correlation :   0.18077760409354432
weighted_accuracy :   0.9462719620998075
recall_score_macro :   0.5376385610340318
log_loss :   0.43525938969436395
precision_score_macro :   0.7170682223495031
norm_macro_recall :   0.0752771220680637
balanced_accuracy :   0.5376385610340318
precision_score_weighted :   0.7907534681867007
average_precision_score_micro :   0.8620856713957593
f1_score_macro :   0.5282728900954706
AUC_macro :   0.6865246130797513
precision_score_micro :   0.8259597752362156
recall_score_micro :   0.8259597752362156
recall_score_weighted :   0.8259597752362156
f1_score_micro :   0.8259597752362156
average_precision_score_weighted :   0.8013364788757266
AUC_micro :   0.8757310208475149
average_precision_score_macro :   0.6271038445767481
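
The gap between accuracy (about 0.83) and balanced_accuracy (about 0.54) suggests that the classes are imbalanced and that the model leans heavily on the majority class. norm_macro_recall makes this explicit by rescaling macro recall so that chance level maps to 0 and perfect recall maps to 1. The check below reproduces the reported value from recall_score_macro, assuming a binary target (an assumption, though the reported numbers are consistent with it).

```python
# Values copied from the best run metrics above
recall_score_macro = 0.5376385610340318
accuracy = 0.8259597752362156
n_classes = 2  # assumption: binary delayed / not delayed target


def norm_macro_recall(recall_macro, n_classes):
    """Rescale macro recall so that random guessing gives 0 and a perfect model gives 1."""
    chance = 1.0 / n_classes
    return (recall_macro - chance) / (1.0 - chance)


value = norm_macro_recall(recall_score_macro, n_classes)
print(value)  # agrees with the reported norm_macro_recall up to floating-point rounding
```

A value of roughly 0.075 means the model recovers only about 7.5% of the headroom between random guessing and perfect per-class recall, which is a stricter reading than the accuracy figure alone.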

Even though this model does not perform very well, I will register it for now and then deploy it.

In [19]:
from azureml.core import Model

# Register model
best_run.register_model(model_path ='', model_name='airline_delays_classification',
                        tags={'Training context':'Auto ML',
                             'Task': 'Classification',
                             'Objective': 'Avanade Challenge'},
                        properties={'AUC': best_run_metrics['AUC_weighted'], 'Accuracy': best_run_metrics['accuracy'],
                                   'F1_score': best_run_metrics['f1_score_weighted']})

# List registered models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

airline_delays_classification_2 version: 1
	 Training context : Auto ML
	 Task : Classification
	 Objective : Avanade Challenge
	 AUC : 0.6865246130797511
	 Accuracy : 0.8259597752362156
	 F1_score : 0.7685567757525456


airline_delays_classification version: 1
	 Training context : Auto ML
	 Task : Classification
	 Objective : Avanade Challenge
	 AUC : 0.6865246130797511
	 Accuracy : 0.8259597752362156
	 F1_score : 0.7685567757525456




The next step is in the <b>Deployment Notebook</b>.