# Automated ML

All steps relevant to AutoML are implemented as Python functions in the file `functions.py`, which keeps this notebook uncluttered. The scikit-learn and Azure imports are also located in `functions.py`.

In [1]:
import azureml.core
from functions import get_workspace, get_data, get_compute_cluster, run_automl, \
    show_and_test_local_automl_model, register_and_deploy_automl_model, \
    test_deployed_automl_model, clean_up

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.42.0


In [2]:
ws = get_workspace()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

quick-starts-ws-199863
aml-quickstarts-199863
southcentralus
48a74bb7-9950-4cc1-9caa-5d50f995cc55


## Dataset

The dataset used is the Adult dataset from the UCI Machine Learning Repository. The task is to predict an individual's income class (above or below 50k) from that person's features. For a more thorough description of the dataset, please see the `README.md` file. The function `get_data()` called below downloads the training and test data, applies some preprocessing and stores the data in a blob store of the current workspace. For details, please see the comments in the file `functions.py`.

In [3]:
train_ds, test_ds = get_data()

Loading datasets from workspace ...
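
The preprocessing inside `get_data()` is not shown in this notebook. As a rough sketch of the kind of cleanup the raw UCI files typically need, the following assumes three things about the raw data: the test file starts with a comment line, its labels carry a trailing period, and missing values are marked with `?`. The helper name `clean_adult_rows`, the reduced column list, the fill value `Unknown` and the sample rows are illustrative assumptions, not the code in `functions.py`.

```python
import csv
import io

# Illustrative rows in the raw UCI layout, reduced to four columns for brevity.
RAW_TEST_SAMPLE = """\
|1x3 Cross validator
25, Private, 11th, <=50K.
18, ?, Some-college, <=50K.
44, Private, Some-college, >50K.
"""

COLUMNS = ["age", "workclass", "education", "income"]  # subset, for illustration


def clean_adult_rows(text, columns=COLUMNS, missing_token="?", fill_value="Unknown"):
    """Parse raw Adult rows into dicts with normalized string values."""
    rows = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for fields in reader:
        # Skip blank lines and the comment line at the top of the test file.
        if len(fields) != len(columns):
            continue
        values = [field.strip() for field in fields]
        # Replace the missing-value marker with an explicit category (assumption).
        values = [fill_value if value == missing_token else value for value in values]
        row = dict(zip(columns, values))
        # Labels in the test file end with a period; drop it so that the
        # training and test labels use the same spelling.
        row["income"] = row["income"].rstrip(".")
        rows.append(row)
    return rows


cleaned = clean_adult_rows(RAW_TEST_SAMPLE)
for row in cleaned:
    print(row)
```

Whether missing values are better replaced, imputed or dropped depends on the model; the sketch only makes the choice explicit.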


## AutoML 

Since the target variable `income` is categorical, AutoML is configured for a classification task with the `income` column as the label. To get results in a reasonable time, the experiment timeout is set to 30 minutes. Cross-validation usually gives a more reliable performance estimate than a single validation set, so 5-fold cross-validation is used. `accuracy` is chosen as the primary metric because it allows easier comparison with previously published results on this dataset. The function `run_automl()` stores the best model found under `./outputs/best_model_automl.pkl`. For more details, please see the file `functions.py`.
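
To illustrate what 5-fold cross validation means for the data, here is a minimal sketch of how row indices can be split so that every row is used for validation exactly once. The helper `k_fold_indices` is hypothetical. AutoML's own splitting may well shuffle or stratify the rows, which this sketch does not attempt.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_indices, validation_indices) for each fold."""
    # Distribute samples as evenly as possible; the first folds get one extra.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        # Everything outside the validation slice is used for training.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=12, n_folds=5))
for number, (train, validation) in enumerate(folds, start=1):
    print(f"fold {number}: train on {len(train)} rows, validate on {validation}")
```

The reported metric is then the average over the five validation folds, which is why it tends to be less sensitive to one lucky or unlucky split.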

In [4]:
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    experiment_timeout_minutes=30,
    task='classification',
    primary_metric='accuracy',
    training_data=train_ds,
    label_column_name='income',
    n_cross_validations=5,
    compute_target=get_compute_cluster())

run_automl(automl_config)

Found existing cluster, use it.
Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| adult-automl | AutoML_edc30558-b27c-4168-a13d-a7caec7264ff | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFea

## Best Model

The function `show_and_test_local_automl_model()` loads the best model from the AutoML run, prints its properties and assesses its performance on an independent test set. For details, see the file `functions.py`.

In [4]:
show_and_test_local_automl_model(test_ds)

datatransformer
{'enable_dnn': False,
 'enable_feature_sweeping': True,
 'feature_sweeping_config': {},
 'feature_sweeping_timeout': 86400,
 'featurization_config': None,
 'force_text_dnn': False,
 'is_cross_validation': True,
 'is_onnx_compatible': False,
 'observer': None,
 'task': 'classification',
 'working_dir': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/ndeg-prj3-inst/code/Users/odl_user_199863'}

prefittedsoftvotingclassifier
{'estimators': ['24', '0', '1', '21', '18', '6', '20'],
 'weights': [0.125, 0.25, 0.125, 0.125, 0.125, 0.125, 0.125]}

24 - standardscalerwrapper
{'class_name': 'StandardScaler',
 'copy': True,
 'module_name': 'sklearn.preprocessing._data',
 'with_mean': False,
 'with_std': False}

24 - xgboostclassifier
{'base_score': 0.5,
 'booster': 'gbtree',
 'colsample_bylevel': 1,
 'colsample_bynode': 1,
 'colsample_bytree': 1,
 'eta': 0.05,
 'gamma': 0,
 'gpu_id': -1,
 'importance_type': 'gain',
 'interaction_constraints': '',
 'learning_rate': 0.0500000007,
 '

|   | 0 | 1 |
|---|---|---|
| 0 | 11746 | 689 |
| 1 | 1336 | 2510 |
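
Reading the 2×2 table above as a confusion matrix for the test set (an assumption, since the output does not label it), the test accuracy follows from the diagonal entries. The short calculation below makes that arithmetic explicit. It does not depend on which axis holds the true labels.

```python
# Counts copied from the 2x2 table in the cell output above.
confusion = [
    [11746, 689],
    [1336, 2510],
]

# Correct predictions sit on the diagonal; the total is the sum of all cells.
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

print(f"{correct} of {total} test rows classified correctly: accuracy = {accuracy:.4f}")
```

This gives 14256 correct predictions out of 16281 rows, an accuracy of roughly 0.8756.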


## Model Deployment

The `register_and_deploy_automl_model()` function registers the best model found by AutoML and deploys it on a compute instance. For more details, see the file `functions.py`.

In [None]:
register_and_deploy_automl_model()

The `test_deployed_automl_model()` function takes a row from the test set, encodes it as a JSON string and sends it in an HTTP request to the endpoint. Below, we request predictions for the first ten samples in the test set. For more details, see the file `functions.py`.

In [None]:
for row in range(0, 10):
    test_deployed_automl_model(test_ds, row)
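
The request itself is built inside `functions.py`. As a minimal sketch of the idea, the following encodes one row as JSON and prepares an HTTP POST without sending it. The helper `build_scoring_request`, the `{"data": [...]}` payload layout, the bearer-token header, the example row and the URI are all assumptions for illustration; the deployed service may expect a different schema.

```python
import json
import urllib.request


def build_scoring_request(row, scoring_uri, api_key=None):
    """Build (but do not send) an HTTP POST request for one input row."""
    # Assumed payload layout: a list of records under the key "data".
    payload = json.dumps({"data": [row]}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Only needed if the endpoint has key-based authentication enabled.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        scoring_uri, data=payload, headers=headers, method="POST"
    )


# Hypothetical row and URI, for illustration only.
example_row = {"age": 25, "workclass": "Private", "education": "11th"}
request = build_scoring_request(example_row, "http://localhost:8080/score")

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending the request would be: urllib.request.urlopen(request).read()
```

Keeping request construction separate from sending makes the payload easy to inspect before anything reaches the endpoint.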

Finally, the resources are cleaned up: the compute cluster, web service and registered model are deleted from the current workspace. For details, see the file `functions.py`.

In [None]:
clean_up(automl=True)