# Submitting an experiment

In this notebook, we use an AutoMLConfig object to submit an experiment for processing on an Azure ML compute target.

An experiment is a process that can be tracked through its generated metrics.

In this particular case, we use Automated ML to:

- get a registered dataset from an Azure datastore
- search for the best classification pipeline, comparing candidates with cross-validation and drawing on algorithms such as:
    - logistic regression
    - decision trees
    - random forests
    - gradient boosting
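Whatever the search strategy, cross-validated model selection ends the same way: each candidate is scored on every validation fold, and the candidate with the best mean score wins. A minimal, dependency-free sketch of that final step; the fold scores are made up for illustration and are not results from this experiment:

```python
from statistics import mean, stdev

# Made-up accuracy scores on 5 validation folds per candidate pipeline
# (illustrative only, not taken from this experiment)
fold_scores = {
    "logistic_regression": [0.50, 0.51, 0.49, 0.50, 0.50],
    "decision_tree": [0.60, 0.58, 0.61, 0.59, 0.62],
    "random_forest": [0.66, 0.65, 0.67, 0.64, 0.66],
    "gradient_boosting": [0.72, 0.70, 0.73, 0.71, 0.72],
}


def pick_best(scores_by_model):
    """Return (name, mean_score) of the candidate with the highest mean fold score."""
    summary = {name: mean(scores) for name, scores in scores_by_model.items()}
    best = max(summary, key=summary.get)
    return best, summary[best]


for name, scores in fold_scores.items():
    # The spread across folds hints at how stable each candidate is
    print(f"{name}: mean={mean(scores):.3f} std={stdev(scores):.3f}")
print("best:", pick_best(fold_scores))
```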

## Imports


In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget
from azureml.widgets import RunDetails
from azureml.train.estimator import Estimator
from azureml.train.automl import AutoMLConfig


In [2]:
# Load the workspace from the local config.json
ws = Workspace.from_config()

# Get the registered dataset by name
data = ws.datasets.get('poker_ds')

## Compute target

In [3]:
# get compute target and start it
cpu_cluster = ComputeTarget(workspace=ws, name='pc3')
cpu_cluster.start()
cpu_cluster.wait_for_completion(show_output=True)


Running


## Configure AutoML


In [4]:
from azureml.train.automl.utilities import get_primary_metrics

get_primary_metrics('classification')

['accuracy',
 'AUC_weighted',
 'average_precision_score_weighted',
 'norm_macro_recall',
 'precision_score_weighted']
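The data guardrail later in this notebook flags class imbalance, and on imbalanced data `accuracy` can look good even for a useless model. A minimal sketch on made-up labels compares accuracy with `norm_macro_recall` for a predictor that always outputs the majority class. It assumes `norm_macro_recall` is computed as (macro recall - R) / (1 - R) with R = 1/C for C classes, and counts C from the labels present:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        support = sum(1 for t in y_true if t == c)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


def norm_macro_recall(y_true, y_pred):
    """Rescale macro recall so random guessing scores 0 and perfect scores 1.

    Assumption: R = 1 / C, where C is the number of classes in y_true.
    """
    r = 1 / len(set(y_true))
    return (macro_recall(y_true, y_pred) - r) / (1 - r)


# Made-up, heavily imbalanced labels: class 0 dominates
y_true = [0] * 90 + [1] * 8 + [2] * 2
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)  # always predict the majority class

print(accuracy(y_true, y_pred))           # 0.9
print(norm_macro_recall(y_true, y_pred))  # 0.0
```

The constant predictor reaches 90% accuracy while learning nothing, which `norm_macro_recall` exposes as a score of zero.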

In [5]:
automl_config = AutoMLConfig(
    name='PokerHand_Classification_AutoML',
    task='classification',
    compute_target=cpu_cluster,
    label_column_name='class',
    iterations=1000,  # upper bound: early stopping or the timeout can end the run sooner
    primary_metric='accuracy',
    max_concurrent_iterations=2,
    featurization='auto',
    n_cross_validations=5,
    enable_early_stopping=True,
    max_cores_per_iteration=-1,  # use all available cores for each iteration
    experiment_timeout_hours=4,
    training_data=data,
)
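With `n_cross_validations=5`, each candidate pipeline is trained on four fifths of the data and validated on the remaining fifth, five times over, so every row is used for validation exactly once. A dependency-free sketch of how such fold indices can be generated; the folds here are contiguous and unshuffled for readability, and AutoML's own splitting logic is not reproduced:

```python
def kfold_indices(n_samples, n_folds=5):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled k-fold CV."""
    # The first n_samples % n_folds folds receive one extra sample
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


for fold, (train, valid) in enumerate(kfold_indices(12, n_folds=5)):
    print(fold, "valid:", valid, "train size:", len(train))
```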

In [6]:
print('Submitting Auto ML experiment...')
automl_experiment = Experiment(ws, 'PokerHand_Classification_AutoML')
automl_run = automl_experiment.submit(automl_config)
RunDetails(automl_run).show()
automl_run.wait_for_completion(show_output=True)

Submitting Auto ML experiment...
Running on remote or ADB.




Current status: DatasetBalancing. Performing class balancing sweeping
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+---------------------------------+---------------------------------+--------------------------------------+
|Size of the smallest class       |Name/Label of the smallest class |Number of samples in the training data|
|4                                |9       

{'runId': 'AutoML_09cb4d47-dda3-4804-b9c6-5dbbad687dbc',
 'target': 'pc3',
 'status': 'Completed',
 'startTimeUtc': '2020-06-09T04:00:27.795411Z',
 'endTimeUtc': '2020-06-09T04:06:49.374354Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'pc3',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"80587e9e-f525-478b-b839-99750da184f8\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"poker_data/*.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"erickfis-ml-rg\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"d8610c93-6c20-40ef-8ce5-281bf8b7f1d0\\\\\\", \\\\\\"workspaceName\\\\\\": \\\\\\"po
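The guardrail above reports the smallest class (label 9, with only 4 samples). A similar check can be run locally before submitting. A minimal sketch with made-up labels follows; for the real dataset, and assuming `data` is a TabularDataset, `data.to_pandas_dataframe()['class'].value_counts()` would give the per-class counts:

```python
from collections import Counter


def class_balance_report(labels):
    """Return (smallest_class, smallest_count, share_of_total) for a list of labels."""
    counts = Counter(labels)
    smallest_class, smallest_count = min(counts.items(), key=lambda kv: kv[1])
    return smallest_class, smallest_count, smallest_count / len(labels)


# Made-up labels for illustration only, not the dataset's real class counts
labels = [0] * 500 + [1] * 420 + [2] * 50 + [9] * 4
print(class_balance_report(labels))
```

A class with a handful of rows may be missing entirely from some cross-validation folds, which is one reason to fix the imbalance before trusting per-class metrics.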

In [33]:
# automl_run.fail()

## Get metrics & best model

In [7]:
best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

accuracy_table aml://artifactId/ExperimentRun/dcid.AutoML_09cb4d47-dda3-4804-b9c6-5dbbad687dbc_0/accuracy_table
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_09cb4d47-dda3-4804-b9c6-5dbbad687dbc_0/confusion_matrix
AUC_micro 0.9676023891417869
AUC_macro 0.7232309227645399
f1_score_micro 0.8047904453495546
f1_score_macro 0.17644670642183505
weighted_accuracy 0.8741160102640702
AUC_weighted 0.8987243934131314
average_precision_score_micro 0.7588379445487563
f1_score_weighted 0.7748766745455948
precision_score_macro 0.2177417900282846
log_loss 0.9668714681369996
precision_score_micro 0.8047904453495546
average_precision_score_macro 0.22411431548808852
average_precision_score_weighted 0.7787693352205851
norm_macro_recall 0.1024956146344643
matthews_correlation 0.6466066837220275
precision_score_weighted 0.7684853158155894
balanced_accuracy 0.18808625663485296
recall_score_weighted 0.8047904453495546
recall_score_micro 0.8047904453495546
recall_score_macro 0.18808625663485296
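In the output above, `f1_score_micro`, `precision_score_micro`, and `recall_score_micro` all equal 0.8048, which is also the accuracy recorded when the model is registered, while the macro-averaged scores are far lower and `balanced_accuracy` equals `recall_score_macro`. Both patterns follow from the definitions for single-label multiclass data: micro averaging pools every prediction, so it reduces to accuracy, while macro averaging gives each class equal weight, so poorly predicted rare classes pull it down. A minimal sketch with made-up labels:

```python
def per_class_recall(y_true, y_pred):
    """Map each class in y_true to its recall (hits / support)."""
    recall = {}
    for c in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recall[c] = hits / support
    return recall


def micro_recall(y_true, y_pred):
    # Pooling hits over all classes reduces to plain accuracy for single-label data
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    # Unweighted mean over classes; this is the definition of balanced accuracy
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Made-up labels: the large class is predicted well, the rare ones mostly missed
y_true = [0] * 80 + [1] * 15 + [2] * 5
y_pred = [0] * 78 + [1] * 2 + [1] * 6 + [0] * 9 + [0] * 5

print(micro_recall(y_true, y_pred))
print(macro_recall(y_true, y_pred))
```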

## Check the pipeline steps



In [8]:
for step in fitted_model.named_steps:
    print(step)

datatransformer
MaxAbsScaler
LightGBMClassifier
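The fitted pipeline applies `MaxAbsScaler` before `LightGBMClassifier`. Max-abs scaling divides each feature by the largest absolute value seen during fitting, so every value lands in [-1, 1] without shifting the data. A dependency-free sketch of the transform; it only approximates scikit-learn's `MaxAbsScaler`, since it fits and transforms in one step:

```python
def max_abs_scale(rows):
    """Divide each column by its largest absolute value (all-zero columns are left as is)."""
    n_cols = len(rows[0])
    scales = []
    for j in range(n_cols):
        m = max(abs(row[j]) for row in rows)
        scales.append(m if m != 0 else 1.0)  # avoid division by zero
    return [[row[j] / scales[j] for j in range(n_cols)] for row in rows]


# Two made-up integer features
rows = [[1, 13], [4, 1], [2, 7]]
print(max_abs_scale(rows))
```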


## Register the model




In [9]:
best_run.register_model(
    model_path='outputs/model.pkl',
    model_name='model_automl',
    tags={'Training context':'Auto ML'},
    properties={
        'AUC': best_run_metrics['AUC_weighted'],
        'Accuracy': best_run_metrics['accuracy']
        }
    )

Model(workspace=Workspace.create(name='poker-ws', subscription_id='d8610c93-6c20-40ef-8ce5-281bf8b7f1d0', resource_group='erickfis-ml-rg'), name=model_automl, id=model_automl:2, version=2, tags={'Training context': 'Auto ML'}, properties={'AUC': '0.8987243934131314', 'Accuracy': '0.8047904453495546'})

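## Stopping the compute instances

In [10]:
from azureml.core.compute import ComputeInstance

# Stop every compute instance in the workspace; AmlCompute clusters have
# no stop() method and scale down to their minimum node count when idle
for pc_name, pc in ws.compute_targets.items():
    if isinstance(pc, ComputeInstance):
        pc.stop()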

