# Submitting an experiment

In this notebook, we will use the Estimator object to submit an experiment for processing on an Azure ML compute target.

An experiment groups related runs so that they can be tracked and compared through the metrics they generate.

In this particular case, we are using an external Python script to:

- get a dataset from Azure DataStorage
- create a machine learning pipeline for classification that includes:
    - grid search cross validation
    - logistic regression
    - decision trees
    - random forests
    - gradient boosting

## Imports


In [4]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget
from azureml.widgets import RunDetails
from azureml.train.estimator import Estimator
from azureml.train.automl import AutoMLConfig


In [2]:
# Load the workspace from the saved config file
ws = Workspace.from_config()

# Get the dataset
data = ws.datasets.get('poker_ds')

# Dataset head
data.take(10).to_pandas_dataframe()



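| Unnamed: 0 | Rank_1 | Rank_2 | Rank_3 | Rank_4 | Rank_5 | class | Suit_1_2 | Suit_1_3 | Suit_1_4 | Suit_2_2 | ... | Suit_2_4 | Suit_3_2 | Suit_3_3 | Suit_3_4 | Suit_4_2 | Suit_4_3 | Suit_4_4 | Suit_5_2 | Suit_5_3 | Suit_5_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11 | 13 | 10 | 12 | 1 | 9 | 1 | 0 | 0 | 1 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 12 | 11 | 13 | 10 | 1 | 9 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 2 | 10 | 11 | 1 | 13 | 12 | 9 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 3 | 1 | 13 | 12 | 11 | 10 | 9 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 4 | 2 | 4 | 5 | 3 | 6 | 8 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 9 | 12 | 10 | 11 | 13 | 8 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 1 | 2 | 3 | 4 | 5 | 8 | 1 | 0 | 0 | 1 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 7 | 5 | 6 | 9 | 7 | 8 | 8 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 8 | 1 | 4 | 2 | 3 | 5 | 8 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 9 | 1 | 1 | 9 | 5 | 3 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |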


In [24]:
# The first dataset gets ~30% of the rows (training); the other ~70% is held out
train_dataset, test_dataset = data.random_split(percentage=0.3, seed=95276)
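
To make the effect of `percentage` and `seed` concrete, here is a minimal pure-Python sketch of a seeded random split. It is only an illustration of the idea, not the SDK's implementation; the helper name `random_split_rows` and the toy rows are assumptions made for this example.

```python
import random


def random_split_rows(rows, percentage, seed=None):
    """Randomly split rows into two lists.

    Each row lands in the first list with probability `percentage`,
    so the first list holds approximately that share of the rows.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    first, second = [], []
    for row in rows:
        if rng.random() < percentage:
            first.append(row)
        else:
            second.append(row)
    return first, second


# Toy rows standing in for dataset records (hypothetical data)
rows = list(range(1000))
train_rows, test_rows = random_split_rows(rows, percentage=0.3, seed=95276)
print(len(train_rows), len(test_rows))
```

Because the split is random, the first part holds about 30% of the rows rather than exactly 30%. Reusing the same seed reproduces the same split.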

## Compute target

In [13]:
# get compute target and start it
cpu_cluster = ComputeTarget(workspace=ws, name='pc3')
cpu_cluster.start()
cpu_cluster.wait_for_completion(show_output=True)


Starting................
Running


## Configure AutoML


In [3]:
from azureml.train.automl.utilities import get_primary_metrics

get_primary_metrics('classification')

['accuracy',
 'norm_macro_recall',
 'average_precision_score_weighted',
 'precision_score_weighted',
 'AUC_weighted']

In [35]:
automl_config = AutoMLConfig(
    name='PokerHand_Classification_AutoML',
    task='classification',
    compute_target=cpu_cluster,
    label_column_name='class',
    iterations=100,  # at most 100 models; early stopping may end the run sooner
    primary_metric='accuracy',
    max_concurrent_iterations=2,
    featurization='auto',
    n_cross_validations=3,
    enable_early_stopping=True,
    max_cores_per_iteration=-1,
    experiment_timeout_hours=4,
    training_data=train_dataset,
    )
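
With `n_cross_validations=3`, each candidate model is scored on three train/validation splits of the training data. The sketch below shows the general idea of k-fold index generation in plain Python. It uses contiguous, unshuffled folds for readability, which is an assumption of this example; AutoML's own splitting logic may shuffle or stratify and is not reproduced here. The helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, valid_idx) pairs for contiguous k-fold cross-validation."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


folds = list(k_fold_indices(n_samples=10, n_splits=3))
for train_idx, valid_idx in folds:
    print('train size:', len(train_idx), 'validation rows:', valid_idx)
```

Every row is used for validation exactly once, and the reported metric is typically an average over the folds.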

In [36]:
print('Submitting Auto ML experiment...')
automl_experiment = Experiment(ws, 'PokerHand_Classification_AutoML')
automl_run = automl_experiment.submit(automl_config)
RunDetails(automl_run).show()
automl_run.wait_for_completion(show_output=True)

Submitting Auto ML experiment...
Running on remote or ADB.


_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…


Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+---------------------------------+---------------------------------+--------------------------------------+
|Size of the smallest class       |Name/Label of the smallest class |Number of samples in the training data|
|1                                |9                                |7456                                  |
+---

{'runId': 'AutoML_f7d3fed1-d8b1-425a-90cf-cda6520ccb1f',
 'target': 'pc3',
 'status': 'Completed',
 'startTimeUtc': '2020-06-06T23:15:00.940877Z',
 'endTimeUtc': '2020-06-06T23:42:41.647293Z',
 'properties': {'num_iterations': '100',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '3',
  'target': 'pc3',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"80587e9e-f525-478b-b839-99750da184f8\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"poker_data/*.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"erickfis-ml-rg\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"d8610c93-6c20-40ef-8ce5-281bf8b7f1d0\\\\\\", \\\\\\"workspaceName\\\\\\": \\\\\\"pok
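
The data guardrail above reports that the smallest class (label 9) has a single sample. A quick way to inspect class balance before training is to count the labels. The sketch below does this on toy labels; the helper `class_balance_report` and its `min_ratio` threshold are assumptions made for illustration and are not the rule AutoML applies.

```python
from collections import Counter


def class_balance_report(labels, min_ratio=0.2):
    """Summarize class counts and flag imbalance.

    `min_ratio` is a hypothetical threshold: the data is flagged when the
    smallest class has fewer than min_ratio times the samples of the largest.
    """
    counts = Counter(labels)
    smallest_label, smallest = min(counts.items(), key=lambda kv: kv[1])
    largest_label, largest = max(counts.items(), key=lambda kv: kv[1])
    ratio = smallest / largest
    return {
        'counts': dict(counts),
        'smallest_label': smallest_label,
        'largest_label': largest_label,
        'ratio': ratio,
        'imbalanced': ratio < min_ratio,
    }


# Toy labels (not the poker data): one very rare class
toy_labels = [0] * 50 + [1] * 40 + [9] * 1
report = class_balance_report(toy_labels)
print(report)
```

On the real data, the same counts could be obtained from the `class` column of `train_dataset.to_pandas_dataframe()`.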

In [33]:
# Uncomment to mark the run as failed, e.g. to abandon a stuck run
# automl_run.fail()

## Get metrics & best model

In [37]:
best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()
for metric_name, metric in best_run_metrics.items():
    print(metric_name, metric)

accuracy_table aml://artifactId/ExperimentRun/dcid.AutoML_f7d3fed1-d8b1-425a-90cf-cda6520ccb1f_32/accuracy_table
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_f7d3fed1-d8b1-425a-90cf-cda6520ccb1f_32/confusion_matrix
f1_score_micro 0.6875026506585774
log_loss 0.8026195341518662
AUC_micro 0.9627938326989574
precision_score_micro 0.6875026506585774
recall_score_weighted 0.6875026506585774
f1_score_macro 0.16484911004089578
precision_score_weighted 0.6307679450440071
balanced_accuracy 0.17220782390211029
average_precision_score_weighted 0.6811427792841375
accuracy 0.6875026506585774
weighted_accuracy 0.7492878633736503
average_precision_score_macro 0.2160722770519291
recall_score_macro 0.17220782390211029
AUC_macro 0.6315899631240934
average_precision_score_micro 0.7097650930656049
f1_score_weighted 0.656253911632371
norm_macro_recall 0.06290951219319695
precision_score_macro 0.15893473457493498
recall_score_micro 0.6875026506585774
matthews_correlation 0.420967609725239
AUC_
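
In the output above, `accuracy` is about 0.69 while `balanced_accuracy` (equal to `recall_score_macro`) is about 0.17. A gap like this is typical of imbalanced data. The following sketch, on toy labels that are not the poker data, shows how a model that always predicts the majority class scores well on accuracy and poorly on macro-averaged recall. The function names are chosen for this example only.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (often called balanced accuracy)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: 90 samples of class 0, 8 of class 1, 2 of class 9
y_true = [0] * 90 + [1] * 8 + [9] * 2
# A degenerate model that always predicts the majority class
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal = macro_recall(y_true, y_pred)
print('accuracy:', acc)
print('macro recall:', bal)
```

Accuracy rewards getting the frequent class right, whereas macro recall gives every class the same weight, so rare classes that are never predicted pull it down.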

## Check preprocessing steps



In [38]:
for step in fitted_model.named_steps:
    print(step)

datatransformer
stackensembleclassifier


## Register the model




In [39]:
best_run.register_model(
    model_path='outputs/model.pkl',
    model_name='model_automl',
    tags={'Training context':'Auto ML'},
    properties={
        'AUC': best_run_metrics['AUC_weighted'],
        'Accuracy': best_run_metrics['accuracy']
        }
    )

Model(workspace=Workspace.create(name='poker-ws', subscription_id='d8610c93-6c20-40ef-8ce5-281bf8b7f1d0', resource_group='erickfis-ml-rg'), name=model_automl, id=model_automl:1, version=1, tags={'Training context': 'Auto ML'}, properties={'AUC': '0.7929115031500755', 'Accuracy': '0.6875026506585774'})
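
Note that the registered model's properties are shown as strings in the output above, even though floats were passed in. When reading them back for comparison or sorting, they need to be converted again. A minimal sketch, using the values printed above as a plain dictionary (the variable names are assumptions for this example):

```python
# Property values copied from the registration output above
registered_properties = {
    'AUC': '0.7929115031500755',
    'Accuracy': '0.6875026506585774',
}

# Convert the string values back to floats before comparing models
numeric_properties = {
    name: float(value) for name, value in registered_properties.items()
}
print(numeric_properties)
```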

## Stopping the compute target

In [40]:
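# Stop every compute target in the workspace that supports stop()
# (compute instances do; other target types may not expose it)
for pc_name, pc in ws.compute_targets.items():
    if hasattr(pc, 'stop'):
        pc.stop()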

