# Model training with Automated ML

In the cell below, import all the dependencies that you will need to complete the project.

In [None]:
!pip install azureml-sdk
!pip install azureml-sdk[notebooks]

In [6]:
# Imports
import pandas as pd

In [2]:
# Azure ML Imports
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core import Workspace, Experiment
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
import azureml.core

## Workspace

In [3]:
print("SDK version:", azureml.core.VERSION)
ws = Workspace.from_config()

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

SDK version: 1.19.0
If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.
Workspace name: quick-starts-ws-135706
Azure region: southcentralus
Subscription id: 48a74bb7-9950-4cc1-9caa-5d50f995cc55
Resource group: aml-quickstarts-135706


## Compute

Create a remote GPU compute cluster for model training

In [4]:
# Choose a name for your CPU cluster
gpu_cluster_name = "gpu-cluster"

# Verify that cluster does not exist already
try:
    gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC24',
                                                           max_nodes=10)
    gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)

gpu_cluster.wait_for_completion(show_output=True)

Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [5]:


# Create TabularDataset using TabularDatasetFactory
# Data is available at: 
# "https://www.kaggle.com/datamunge/sign-language-mnist"

found = False
key = "sign-language-mnist"
description_text = "sign Language MNIST"

if key in ws.datasets.keys(): 
    found = True
    ds = ws.datasets[key] 

if not found:    
    datastore_path = "https://github.com/emanbuc/ASL-Recognition-Deep-Learning/raw/main/datasets/sign-language-mnist/sign_mnist_train/sign_mnist_train.csv"
    ds = TabularDatasetFactory.from_delimited_files(path=datastore_path,header=True)       
    #Register Dataset in Workspace
    ds = ds.register(workspace=ws,name=key,description=description_text)


df = ds.to_pandas_dataframe()
df.describe()

Unnamed: 0,label,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
count,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0,...,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0,27455.0
mean,12.318813,145.419377,148.500273,151.247714,153.546531,156.210891,158.411255,160.472154,162.339683,163.954799,...,141.104863,147.495611,153.325806,159.125332,161.969259,162.736696,162.906137,161.966454,161.137898,159.824731
std,7.287552,41.358555,39.942152,39.056286,38.595247,37.111165,36.125579,35.016392,33.661998,32.651607,...,63.751194,65.512894,64.427412,63.708507,63.738316,63.444008,63.50921,63.298721,63.610415,64.396846
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,6.0,121.0,126.0,130.0,133.0,137.0,140.0,142.0,144.0,146.0,...,92.0,96.0,103.0,112.0,120.0,125.0,128.0,128.0,128.0,125.5
50%,13.0,150.0,153.0,156.0,158.0,160.0,162.0,164.0,165.0,166.0,...,144.0,162.0,172.0,180.0,183.0,184.0,184.0,182.0,182.0,182.0
75%,19.0,174.0,176.0,178.0,179.0,181.0,182.0,183.0,184.0,185.0,...,196.0,202.0,205.0,207.0,208.0,207.0,207.0,206.0,204.0,204.0
max,24.0,255.0,255.0,255.0,255.0,255.0,255.0,255.0,255.0,255.0,...,255.0,255.0,255.0,255.0,255.0,255.0,255.0,255.0,255.0,255.0


## AutoML Configuration

**max_concurrent_iterations** : 10

AmlCompute clusters support one interation running per node. For multiple AutoML experiment parent runs executed in parallel on a single AmlCompute cluster, the sum of the max_concurrent_iterations values for all experiments should be less than or equal to the maximum number of nodes. Otherwise, runs will be queued until nodes are available.
Set to 10 as number of node in compute cluster.

**iteration_timeout_minutes** : 10

Maximum time in minutes that each iteration can run for before it terminates. 30 minutes to avoid Lab timeout.

**experiment_timeout_hours**: 1.1

Experiment must end before lab timeout. 
The ExperimentTimeout should be set more than 60 minutes with an input data of rows*cols(24709*784=19371856), and up to 10,000,000. The minimum allowed is 1.1

**enable_early_stopping**: true

Whether to enable early termination if the score is not improving in the short term. Set to True to avoid waste time. We don't need to try every possible iteration in this demo experiment.

**enable_onnx_compatible_models**: True

Whether to enable or disable enforcing the ONNX-compatible models. Must be True to anable deploy on ONNX runtime.




In [14]:
automl_settings = {
    "experiment_timeout_hours" : 1.1,
    #"experiment_exit_score": 0.999,
    "enable_early_stopping" : True,
    "iteration_timeout_minutes": 10,
    "max_concurrent_iterations": 10,
    "enable_onnx_compatible_models": True
}

automl_config = AutoMLConfig(
    debug_log='automl_errors.log',
    compute_target=gpu_cluster,
    task='classification',
    primary_metric='accuracy',
    training_data= ds,
    label_column_name='label',
    **automl_settings)

In [13]:
# Submit AutoML Experiment
experiment_name = 'ASL-DeepLearning-AutoML'
exp_automl = Experiment(workspace=ws, name=experiment_name)
automl_run = exp_automl.submit(automl_config)

Running on remote.


## Run Details

In [15]:
RunDetails(automl_run).show()
automl_run.wait_for_completion(show_output=True)

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…


Current status: DatasetEvaluation. Generating features for the dataset.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed fit featurizers and featurizing the dataset.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Train-Test data split
STATUS:       DONE
DESCRIPTION:  Your input data has been split into a training dataset and a holdout test dataset for validation of the model. The test holdout dataset reflects the original distribution of your input data.
              
DETAILS:      
+---------------------------------+---------------------------------+---------------------------------+
|Dataset                          |Row counts                       |Percentage                       |
|train                            |24709                            |89.99

\\\\\\"column\\\\\\": {\\\\\\"type\\\\\\": 2, \\\\\\"details\\\\\\": {\\\\\\"selectedColumn\\\\\\": \\\\\\"pixel680\\\\\\"}}, \\\\\\"typeProperty\\\\\\": 2}, {\\\\\\"column\\\\\\": {\\\\\\"type\\\\\\": 2, \\\\\\"details\\\\\\": {\\\\\\"selectedColumn\\\\\\": \\\\\\"pixel681\\\\\\"}}, \\\\\\"typeProperty\\\\\\": 2}, {\\\\\\"column\\\\\\": {\\\\\\"type\\\\\\": 2, \\\\\\"details\\\\\\": {\\\\\\"selectedColumn\\\\\\": \\\\\\"pixel682\\\\\\"}}, \\\\\\"typeProperty\\\\\\": 2}, {\\\\\\"column\\\\\\": {\\\\\\"type\\\\\\": 2, \\\\\\"details\\\\\\": {\\\\\\"selectedColumn\\\\\\": \\\\\\"pixel683\\\\\\"}}, \\\\\\"typeProperty\\\\\\": 2}, {\\\\\\"column\\\\\\": {\\\\\\"type\\\\\\": 2, \\\\\\"details\\\\\\": {\\\\\\"selectedColumn\\\\\\": \\\\\\"pixel684\\\\\\"}}, \\\\\\"typeProperty\\\\\\": 2}, {\\\\\\"column\\\\\\": {\\\\\\"type\\\\\\": 2, \\\\\\"details\\\\\\": {\\\\\\"selectedColumn\\\\\\": \\\\\\"pixel685\\\\\\"}}, \\\\\\"typeProperty\\\\\\": 2}, {\\\\\\"column\\\\\\": {\\\\\\"type\\\\\\": 2, 

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [18]:
best_automl_run, auto_ml_fitted_model = automl_run.get_output()
print(best_automl_run)
print(auto_ml_fitted_model)

Package:azureml-core, training version:1.20.0, current version:1.19.0
Package:azureml-dataprep, training version:2.7.2, current version:2.6.3
Package:azureml-dataprep-native, training version:27.0.0, current version:26.0.0
Package:azureml-dataprep-rslex, training version:1.5.0, current version:1.4.0
Package:azureml-dataset-runtime, training version:1.20.0, current version:1.19.0.post1
Package:azureml-pipeline-core, training version:1.20.0, current version:1.19.0
Package:azureml-telemetry, training version:1.20.0, current version:1.19.0
Package:azureml-train-automl-client, training version:1.20.0, current version:1.19.0
Package:azureml-defaults, training version:1.20.0
Package:azureml-interpret, training version:1.20.0
Package:azureml-model-management-sdk, training version:1.0.1b6.post1
Package:azureml-train-automl-runtime, training version:1.20.0
Run(Experiment: ASL-DeepLearning-AutoML,
Id: AutoML_05929722-afb4-4f00-ba86-05193341869f_48,
Type: azureml.scriptrun,
Status: Completed)
None

In [19]:
vc=auto_ml_fitted_model.steps[1][1]
vc.named_estimators

AttributeError: 'NoneType' object has no attribute 'steps'

## Model Deployment

Remember you have to deploy only one of the two models you trained.. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [None]:
#Register the best model
model = best_automl_run.register_model(model_name='automl-model', model_path='outputs/model.pkl')
# To load already registered model
#model = Model(ws, name="automl-model", version=1)


In [None]:
model = Model(ws, name="automl-model", version=1)

In [None]:
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig


env = Environment.get(workspace, "AzureML-Minimal").clone(env_name)

for pip_package in ["scikit-learn"]:
    env.python.conda_dependencies.add_pip_package(pip_package)

inference_config = InferenceConfig(entry_script='../models/AutoML05929722a48/scoring_file_v_1_0_0.py',
                                    environment=env)

In [None]:
from azureml.core.webservice import AciWebservice, AksWebservice, LocalWebservice
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
#deployment_config = LocalWebservice.deploy_configuration(port=8890)

from azureml.core.webservice import LocalWebservice, Webservice
service = Model.deploy(ws, "asl-automl-004", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)
print(service.state)


TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service