# Automated ML

All steps relevant to AutoML are implemented as Python functions in the file `functions.py`, which keeps this notebook uncluttered. The scikit-learn and Azure dependencies are imported there as well.

In [1]:
import azureml.core
from functions import get_workspace, get_data, get_compute_cluster, run_automl, \
    show_and_test_local_automl_model, deploy_automl_model, AUTOML_MODEL_PATH, \
    create_automl_test_input, clean_up

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.42.0


In [2]:
ws = get_workspace()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

quick-starts-ws-199949
aml-quickstarts-199949
southcentralus
9e65f93e-bdd8-437b-b1e8-0647cd6098f7


## Dataset

The dataset used is the Adult dataset from the UCI Machine Learning Repository. The task is to predict the income class (`>50K` or `<=50K`) from an individual's features. For a more thorough description of the dataset, please see the `README.md` file. The function `get_data()` called below downloads the training and test data, does some preprocessing and stores the data in a blob store of the current workspace. For details, please see the comments in the file `functions.py`.

In [3]:
train_ds, test_ds = get_data()

Loading datasets from workspace ...


## AutoML 

Since the target variable `income` is categorical, AutoML is configured for a classification task with the `income` column as the label. To get results in a reasonable time, the maximum running time for AutoML is limited to 30 minutes. Cross-validation usually gives a more reliable performance estimate than a single validation set, so 5-fold cross-validation is used. `accuracy` is chosen as the primary metric because it makes comparison with already published results on this dataset easier. The function `run_automl()` stores the best model found under `./outputs/best_model_automl.pkl`. For more details, please see the file `functions.py`.
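To illustrate what 5-fold cross-validation means for the data, here is a minimal sketch in plain Python. It only shows how sample indices can be partitioned into folds; the helper name `k_fold_indices` is hypothetical, and the splitting that AutoML performs internally (for example shuffling or stratification) is not shown in this notebook and may differ.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_indices, validation_indices) for each of n_folds folds.

    Illustrative only: contiguous, unshuffled folds.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(12, n_folds=5))
print([len(validation) for _, validation in folds])  # [3, 3, 2, 2, 2]
```

Each sample is used for validation exactly once and for training in the other four folds; the cross-validated score is then the mean of the five per-fold scores.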

In [4]:
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    experiment_timeout_minutes=30,
    task='classification',
    primary_metric='accuracy',
    training_data=train_ds,
    label_column_name='income',
    n_cross_validations=5,
    compute_target=get_compute_cluster())

run_automl(automl_config)

Found existing cluster, use it.
Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
adult-automl,AutoML_edc30558-b27c-4168-a13d-a7caec7264ff,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation





Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFea

## Best Model

The function `show_and_test_local_automl_model()` loads the best model from the AutoML run, prints the properties of its steps and assesses its performance on an independent test set. For details, see the file `functions.py`.

In [4]:
show_and_test_local_automl_model(test_ds)

datatransformer
{'enable_dnn': False,
 'enable_feature_sweeping': True,
 'feature_sweeping_config': {},
 'feature_sweeping_timeout': 86400,
 'featurization_config': None,
 'force_text_dnn': False,
 'is_cross_validation': True,
 'is_onnx_compatible': False,
 'observer': None,
 'task': 'classification',
 'working_dir': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/notebook199949/code/Users/odl_user_199949'}

prefittedsoftvotingclassifier
{'estimators': ['24', '0', '1', '21', '18', '6', '20'],
 'weights': [0.125, 0.25, 0.125, 0.125, 0.125, 0.125, 0.125]}
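The ensemble printed above combines seven estimators with weights that sum to 1. As a rough sketch of what weighted soft voting usually means, the snippet below averages per-estimator class probabilities using those weights. It assumes the standard soft-voting definition; the internals of `prefittedsoftvotingclassifier` are not shown in this notebook, and the probabilities used here are made up for illustration.

```python
def soft_vote(probabilities, weights):
    """Return the weighted average of per-estimator class probabilities.

    probabilities: one list of class probabilities per estimator.
    weights: one non-negative weight per estimator.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per estimator")
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


# Weights copied from the printed ensemble; the probabilities are made up
weights = [0.125, 0.25, 0.125, 0.125, 0.125, 0.125, 0.125]
made_up_probabilities = [
    [0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.6, 0.4],
    [0.8, 0.2], [0.5, 0.5], [0.3, 0.7],
]
combined = soft_vote(made_up_probabilities, weights)
# The predicted class is the one with the highest combined probability
predicted_class = max(range(len(combined)), key=combined.__getitem__)
print(combined, predicted_class)
```

Under this definition, the second estimator (weight 0.25) has twice the influence of each of the others.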

24 - standardscalerwrapper
{'class_name': 'StandardScaler',
 'copy': True,
 'module_name': 'sklearn.preprocessing._data',
 'with_mean': False,
 'with_std': False}

24 - xgboostclassifier
{'base_score': 0.5,
 'booster': 'gbtree',
 'colsample_bylevel': 1,
 'colsample_bynode': 1,
 'colsample_bytree': 1,
 'eta': 0.05,
 'gamma': 0,
 'gpu_id': -1,
 'importance_type': 'gain',
 'interaction_constraints': '',
 'learning_rate': 0.0500000007,
 '

Unnamed: 0,0,1
0,11746,689
1,1336,2510
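Assuming the 2×2 table above is the confusion matrix on the test set, the overall accuracy can be recomputed from it. The sketch below uses only the four printed counts. Accuracy depends only on the diagonal and the total, so it does not matter whether rows hold the true labels or the predictions.

```python
# Counts copied from the 2x2 table above
matrix = [
    [11746, 689],
    [1336, 2510],
]


def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


print(f"{accuracy(matrix):.4f}")  # 0.8756
```

The four counts add up to 16281 samples, of which 14256 lie on the diagonal.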


## Model Deployment

First, the model is registered under the name `adult-automl-model`. It is then deployed as the web service `adult-automl-service` using the function `deploy_automl_model()` defined in the file `functions.py`.

In [4]:
from azureml.core.model import Model

model = Model.register(ws, model_name='adult-automl-model',
    description='AutoML model for the Adult dataset from the UCI machine learning repository',
    model_path=AUTOML_MODEL_PATH)

deploy_automl_model(model)

Registering model adult-automl-model
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-07-01 06:35:14+00:00 Creating Container Registry if not exists.
2022-07-01 06:35:14+00:00 Registering the environment.
2022-07-01 06:35:16+00:00 Building image..
2022-07-01 06:55:33+00:00 Generating deployment configuration.
2022-07-01 06:55:34+00:00 Submitting deployment to compute..
2022-07-01 06:55:37+00:00 Checking the status of deployment adult-automl-service..
2022-07-01 06:58:09+00:00 Checking the status of inference endpoint adult-automl-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"
/bin/bash: /azureml-envs/azureml_7becf92bd9d8786204e7278bb441885c/lib/libtinfo.so.6: no version information available (required by /bin/bash)

AciWebservice(workspace=Workspace.create(name='quick-starts-ws-199949', subscription_id='9e65f93e-bdd8-437b-b1e8-0647cd6098f7', resource_group='aml-quickstarts-199949'), name=adult-automl-service, image_id=None, image_digest=None, compute_type=ACI, state=Healthy, scoring_uri=http://d8469b14-9a10-41a6-8f5a-ece9b9b7a2ef.southcentralus.azurecontainer.io/score, tags={}, properties={'hasInferenceSchema': 'False', 'hasHttps': 'False', 'authEnabled': 'True'}, created_by={'userObjectId': '3336bb5d-9fd8-47b9-83ad-d1c66fdaec87', 'userPuId': '100320020BABF014', 'userIdp': None, 'userAltSecId': None, 'userIss': 'https://sts.windows.net/660b3398-b80e-49d2-bc5b-ac1dc93b5254/', 'userTenantId': '660b3398-b80e-49d2-bc5b-ac1dc93b5254', 'userName': 'ODL_User 199949', 'upn': 'odl_user_199949@udacitylabs.onmicrosoft.com'})

Next, the first 10 rows of the Adult test set are each encoded as a JSON string and sent to the endpoint in an HTTP POST request. The returned predictions are printed together with the true labels for comparison.

In [7]:
import json
import requests
from azureml.core import Webservice

# Get web service (endpoint) by name
service = Webservice(workspace=ws, name='adult-automl-service')
# Get scoring URI and authorization keys
scoring_uri = service.scoring_uri
key, _ = service.get_keys()

# Construct request headers
headers = {"Content-Type": "application/json"}
headers['Authorization'] = f'Bearer {key}'

for row in range(0, 10):
    # Create test input for specified row in Adult test set
    test_input, test_label = create_automl_test_input(test_ds, row)
    # JSON encode input data
    data = json.dumps(test_input)
    # Send headers and data via HTTP post request
    response = requests.post(scoring_uri, data=data, headers=headers)
    # Fail early on an HTTP error instead of trying to parse an error body
    response.raise_for_status()

    # Print label and received prediction for comparison
    print('label:', test_label, ', prediction:', str(response.json()))

label: <=50K , prediction: <=50K
label: <=50K , prediction: <=50K
label: >50K , prediction: <=50K
label: >50K , prediction: >50K
label: <=50K , prediction: <=50K
label: <=50K , prediction: <=50K
label: <=50K , prediction: <=50K
label: >50K , prediction: >50K
label: <=50K , prediction: <=50K
label: <=50K , prediction: <=50K
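The request built in the loop above can be sketched without any network access. The payload shape `{"data": [...]}`, the helper name `build_scoring_request`, the placeholder key and the example row are all assumptions for illustration; the actual format produced by `create_automl_test_input()` is defined in `functions.py` and is not shown in this notebook.

```python
import json


def build_scoring_request(key, rows):
    """Return (headers, body) for a key-authenticated scoring request."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    # Assumption: the scoring script expects {"data": [row, ...]}
    body = json.dumps({"data": rows})
    return headers, body


# Placeholder key and a partial, made-up row for illustration only
headers, body = build_scoring_request(
    "<primary-key>", [{"age": 39, "education": "Bachelors"}]
)
print(headers["Authorization"])
print(body)
```

The resulting `headers` and `body` would be passed to `requests.post(scoring_uri, data=body, headers=headers)` in the same way as in the cell above.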


Finally, the resources are cleaned up: the compute cluster, web service and registered model are deleted from the current workspace. For details, see the file `functions.py`.

In [None]:
clean_up(automl=True)