# AzureML AutoML Demo
An MLRun function for working with Azure AutoML. It includes the following handlers:
1. `init_experiment` - Initialize the workspace and experiment in Azure ML.
2. `init_compute` - Initialize the Azure ML compute target that runs the experiment.
3. `register_dataset` - Register a dataset object (which can also be an Iguazio FeatureVector) in Azure ML.
4. `download_model` - Download a trained model from Azure ML to the local filesystem.
5. `upload_model` - Upload a pre-trained model from the local filesystem to Azure ML.
6. `submit_training_job` - Submit a training job to Azure AutoML and download the trained model when it completes.
7. `automl_train` - Run the whole Azure AutoML training flow:
   - Initialize the workspace and experiment in Azure ML
   - Register the dataset/feature vector
   - Submit the training job
   - Download the trained model

## 1. Setup MLRun Project

Create the MLRun project, or load it if it already exists.

In [1]:
import os
import json
import pandas as pd
import mlrun

> 2022-01-03 17:34:51,578 [info] Server and client versions are not the same: {'parsed_server_version': VersionInfo(major=0, minor=8, patch=0, prerelease=None, build=None), 'parsed_client_version': VersionInfo(major=0, minor=9, patch=1, prerelease=None, build=None)}


In [2]:
# Initialize the MLRun project object
project = mlrun.get_or_create_project('azureml', context="./", user_project=True)

> 2022-01-03 17:34:51,619 [info] loaded project azureml from MLRun DB


## 2. Preparing Dataset (Iris)

- Prepare the training data URI for the MLRun function.

In [3]:
DATA_URL = "https://s3.wasabisys.com/iguazio/data/iris/iris_dataset.csv"

mlrun.get_dataitem(DATA_URL).as_df().head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),label
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


## 3. Submit Azure AutoML Training Job

### Submit Azure Secrets
For more information about working with secrets, see [MLRun docs: Working with secrets](https://docs.mlrun.org/en/latest/secrets.html).

In [4]:
from dotenv import dotenv_values
secrets = dict(dotenv_values("env"))

mlrun.get_run_db().create_project_secrets(
    project.name,
    provider=mlrun.api.schemas.SecretProviderName.kubernetes,
    secrets=secrets
)
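
Before submitting the secrets, it can help to confirm that the `env` file defines every value the handlers will need. The following is a minimal, stdlib-only sketch of that check. It uses a simplified `KEY=VALUE` parser rather than `dotenv_values`, and the key names in `REQUIRED_KEYS` are hypothetical placeholders, not names confirmed by this notebook; replace them with the ones `azure_automl.py` actually reads.

```python
def parse_env_lines(lines):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    secrets = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        secrets[key.strip()] = value.strip().strip('"').strip("'")
    return secrets


def missing_keys(secrets, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not secrets.get(key)]


# Hypothetical key names, for illustration only.
REQUIRED_KEYS = ["AZURE_TENANT_ID", "AZURE_SUBSCRIPTION_ID"]

sample_lines = [
    "# Azure credentials",
    "AZURE_TENANT_ID=abc",
    "",
    "AZURE_SUBSCRIPTION_ID=",
]
parsed = parse_env_lines(sample_lines)
print(missing_keys(parsed, REQUIRED_KEYS))
```

In the notebook, the same `missing_keys` check could be applied directly to the `secrets` dictionary returned by `dotenv_values("env")`.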

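### Import azureml_utils from marketplace

The cell below builds the function from the local `azure_automl.py` file. The commented-out line shows the alternative of importing `azureml_utils` from the marketplace.
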
In [5]:
azureml_fn = mlrun.code_to_function(
    name="azure",
    filename="azure_automl.py",
    kind="job",
)

# azureml_fn = mlrun.import_function('hub://azureml_utils')

In [6]:
mlrun.build_function(
    azureml_fn, 
    with_mlrun=False, 
    skip_deployed=False, 
    base_image="mlrun/mlrun", 
    commands=["python -m pip install azureml-core azureml-train-automl-client"]
)

> 2022-01-03 17:34:52,312 [info] Started building image: .mlrun/func-azureml-yonatan-azure:latest
INFO[0000] Retrieving image manifest mlrun/mlrun:0.8.0
INFO[0000] Retrieving image mlrun/mlrun:0.8.0 from registry index.docker.io
INFO[0000] Built cross stage deps: map[]
INFO[0000] Retrieving image manifest mlrun/mlrun:0.8.0
INFO[0000] Returning cached image manifest
INFO[0000] Executing 0 build triggers
INFO[0000] Unpacking rootfs as cmd RUN python -m pip install azureml-core azureml-train-automl-client requires it.
INFO[0014] RUN python -m pip install azureml-core azureml-train-automl-client
INFO[0014] Taking snapshot of full filesystem...
INFO[0026] cmd: /bin/sh
INFO[0026] args: [-c python -m pip install azureml-core azureml-train-automl-client]
INFO[0026] Running: [/bin/sh -c pytho

BuildStatus(ready=True, outputs={'image': '.mlrun/func-azureml-yonatan-azure:latest'})

### Automl configuration & run parameters

- The `automl_settings` object configures Azure AutoML. It holds the `task` type, the number of models to train (`iterations`), the metric to optimize (`primary_metric`), the model types that may be tried (`allowed_models`), and more.

- The `params` are the parameters for the MLRun function, such as the AzureML experiment name (`experiment_name`) and CPU cluster name (`cpu_cluster_name`), the dataset properties used for registration, the target label for training (`label_column_name`), the number of models to download (`save_n_models`), and more.

In [7]:
label_column_name = 'label' # target label

# Configure automl settings:
automl_settings = {
            "task": 'classification',
            "debug_log": 'automl_errors.log',
#             "experiment_exit_score" : 0.9,
            "enable_early_stopping": False,
            "allowed_models": ['LogisticRegression', 'SGD', 'SVM'],
            "iterations": 5,
            "iteration_timeout_minutes": 2,
            "max_concurrent_iterations": 2,
            "max_cores_per_iteration": -1,
            "n_cross_validations": 5,
            "primary_metric": 'accuracy',
            "featurization": 'off',
            "model_explainability": False,
            "enable_voting_ensemble": False,
            "enable_stack_ensemble": False
        }

# Setting params to azure_run function:
params = {
    "experiment_name": 'azure-automl-test',
    "cpu_cluster_name": 'azureml-cpu',
    "dataset_name": 'iris',
    "dataset_description": 'iris training data',
    "label_column_name": label_column_name,
    "create_new_version": True,
    "register_model_name": "iris-model",
    "save_n_models": 3,
    "automl_settings": automl_settings
}
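
Because `automl_settings` and `params` are plain dictionaries, inconsistent values are only discovered once the job is running. The sketch below shows one way to sanity-check them beforehand. The rules in `check_automl_settings` are illustrative plausibility checks chosen for this example (a hypothetical helper), not validation performed by Azure AutoML or MLRun.

```python
def check_automl_settings(settings, params):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    iterations = settings.get("iterations", 1)

    # Running more iterations in parallel than exist in total is pointless.
    if settings.get("max_concurrent_iterations", 1) > iterations:
        problems.append("max_concurrent_iterations exceeds iterations")

    # Assumption: at most one model per iteration is available to download.
    if params.get("save_n_models", 1) > iterations:
        problems.append("save_n_models exceeds iterations")

    if not params.get("label_column_name"):
        problems.append("label_column_name is empty")

    return problems


# Values copied from the configuration cell above (only the keys being checked).
settings = {"iterations": 5, "max_concurrent_iterations": 2}
run_params = {"save_n_models": 3, "label_column_name": "label"}
print(check_automl_settings(settings, run_params))
```

With the values used in this demo, the helper reports no problems.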

### Run Azure AutoML train:

This MLRun function performs the following steps:
- Initializes the workspace and experiment in your AzureML account.
- Registers the dataset/feature vector in Iguazio and in AzureML.
- Submits the training job to AzureML and prints the live training results for each model.
- Downloads the top trained models (up to `save_n_models`).

In [8]:
azureml_run = azureml_fn.run(
    handler="automl_train",
    inputs={"training_data": DATA_URL},
    params=params,
)

> 2022-01-03 17:35:49,618 [info] starting run azure-automl_train uid=4f6705ed258746d58f9603ae2b8a5314 DB=http://mlrun-api:8080
> 2022-01-03 17:35:49,869 [info] Job is running in the background, pod: azure-automl-train-44ph4
> 2022-01-03 17:35:59,136 [info] Loading AzureML Workspace
> 2022-01-03 17:36:02,114 [info] Initializing AzureML experiment azure-automl-test
> 2022-01-03 17:36:03,331 [info] Initializing AzureML compute target azureml-cpu
> 2022-01-03 17:36:03,567 [info] Found existing cluster, will use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
> 2022-01-03 17:36:03,747 [info] Connecting to AzureML experiment default datastore
> 2022-01-03 17:36:05,005 [info] Retrieving feature vector and uploading to Azure blob storage: az://azureml-blobstore-27f8977b-4946-4ca0-bdc5-5a685d2fe8d7/iris.csv
> 2022-01-03 17:36:05,312 [info] Registering dataset iris in Azure ML
> 2022-01-03 17:36:05,312 [info] OpenSSL version must be 

project,uid,iter,start,state,name,labels,inputs,parameters,results,artifacts
azureml-yonatan,...2b8a5314,0,Jan 03 17:35:59,completed,azure-automl_train,"v3io_user=yonatan; kind=job; owner=yonatan; host=azure-automl-train-44ph4",training_data,"experiment_name=azure-automl-test; cpu_cluster_name=azureml-cpu; dataset_name=iris; dataset_description=iris training data; label_column_name=label; create_new_version=True; register_model_name=iris-model; save_n_models=3; automl_settings={'task': 'classification', 'debug_log': 'automl_errors.log', 'enable_early_stopping': False, 'allowed_models': ['LogisticRegression', 'SGD', 'SVM'], 'iterations': 5, 'iteration_timeout_minutes': 2, 'max_concurrent_iterations': 2, 'max_cores_per_iteration': -1, 'n_cross_validations': 5, 'primary_metric': 'accuracy', 'featurization': 'off', 'model_explainability': False, 'enable_voting_ensemble': False, 'enable_stack_ensemble': False}","dataset_blob_path=az://azureml-blobstore-27f8977b-4946-4ca0-bdc5-5a685d2fe8d7/iris.csv; best_iteration=1; auc_macro=0.9973298059964726; recall_score_macro=0.9729629629629629; average_precision_score_micro=0.9962110898173272; recall_score_micro=0.9733333333333334; norm_macro_recall=0.9594444444444443; weighted_accuracy=0.9739694654594448; auc_weighted=0.9972857142857142; log_loss=0.0724925102796574; f1_score_macro=0.9721779225097302; matthews_correlation=0.9613232982405628; f1_score_micro=0.9733333333333334; average_precision_score_macro=0.9952267382822939; auc_micro=0.998; precision_score_micro=0.9733333333333334; average_precision_score_weighted=0.9952664742664743; precision_score_macro=0.9754761904761905; recall_score_weighted=0.9733333333333334; precision_score_weighted=0.9767380952380952; balanced_accuracy=0.9729629629629629; f1_score_weighted=0.9730901151988108; accuracy=0.9733333333333334","model; model_0; model_1; model_2"





> 2022-01-03 17:48:16,307 [info] run executed, status=completed


## 4. Deploying the Model-Serving Function

### Importing the `v2_model_server` function from the marketplace to serve the model

First, we collect the model paths from the run object and pick the best model.

Then we import the serving function from the marketplace and add the best model to it.

In [9]:
# Get trained models:
model_paths = [azureml_run.outputs[key] for key in azureml_run.outputs.keys() if "model" in key]
best_model_path = model_paths[0]

# Importing serving function from marketplace:
serving_fn = mlrun.import_function('hub://v2_model_server')
serving_fn.add_model('best_model', model_path=best_model_path)

<mlrun.serving.states.TaskStep at 0x7feea7f07a90>

In [10]:
best_model_path

'store://artifacts/azureml-yonatan/model_0_logisticregression:4f6705ed258746d58f9603ae2b8a5314'
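
The list comprehension in cell 9 keeps every output whose key contains `"model"`, whatever its value. A slightly stricter variant, sketched below, also requires the value to be a `store://` artifact path, so a metric whose name happens to contain "model" would not be mistaken for a model. The helper name `model_artifact_paths` and the `model_0` key in the sample dictionary are assumptions made for this illustration; the path itself is the one printed above.

```python
def model_artifact_paths(outputs):
    """Return (key, path) pairs for outputs that look like model artifacts."""
    return [
        (key, value)
        for key, value in outputs.items()
        if "model" in key
        and isinstance(value, str)
        and value.startswith("store://")
    ]


# Sample outputs: two results from the run plus one model artifact.
# The "model_0" key is an assumption; only the path is taken from the notebook.
sample_outputs = {
    "best_iteration": 1,
    "accuracy": 0.9733333333333334,
    "model_0": "store://artifacts/azureml-yonatan/model_0_logisticregression:4f6705ed258746d58f9603ae2b8a5314",
}

for key, path in model_artifact_paths(sample_outputs):
    print(key, path)
```

In the notebook, the helper could be called as `model_artifact_paths(azureml_run.outputs)`.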

In [11]:
[(key, val) for key, val in azureml_run.outputs.items()]

[('dataset_blob_path',
  'az://azureml-blobstore-27f8977b-4946-4ca0-bdc5-5a685d2fe8d7/iris.csv'),
 ('best_iteration', 1),
 ('auc_macro', 0.9973298059964726),
 ('recall_score_macro', 0.9729629629629629),
 ('average_precision_score_micro', 0.9962110898173272),
 ('recall_score_micro', 0.9733333333333334),
 ('norm_macro_recall', 0.9594444444444443),
 ('weighted_accuracy', 0.9739694654594448),
 ('auc_weighted', 0.9972857142857142),
 ('log_loss', 0.0724925102796574),
 ('f1_score_macro', 0.9721779225097302),
 ('matthews_correlation', 0.9613232982405628),
 ('f1_score_micro', 0.9733333333333334),
 ('average_precision_score_macro', 0.9952267382822939),
 ('auc_micro', 0.998),
 ('precision_score_micro', 0.9733333333333334),
 ('average_precision_score_weighted', 0.9952664742664743),
 ('precision_score_macro', 0.9754761904761905),
 ('recall_score_weighted', 0.9733333333333334),
 ('precision_score_weighted', 0.9767380952380952),
 ('balanced_accuracy', 0.9729629629629629),
 ('f1_score_weighted', 0.973

### Building and Deploying the Serving Function

In [12]:
function_address = serving_fn.deploy()

> 2022-01-03 17:48:16,624 [info] Starting remote function deploy
2022-01-03 17:48:16  (info) Deploying function
2022-01-03 17:48:16  (info) Building
2022-01-03 17:48:17  (info) Staging files and preparing base images
2022-01-03 17:48:17  (info) Building processor image
2022-01-03 17:48:18  (info) Build complete
2022-01-03 17:48:24  (info) Function deploy complete
> 2022-01-03 17:48:25,070 [info] successfully deployed function: {'internal_invocation_urls': ['nuclio-azureml-yonatan-v2-model-server.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['azureml-yonatan-v2-model-server-azureml-yonatan.default-tenant.app.yh41.iguazio-cd1.com/']}


## 5. Using the Live Model-Serving Function

In [13]:
print (f'The address for the function is {function_address} \n')

!curl $function_address

The address for the function is http://azureml-yonatan-v2-model-server-azureml-yonatan.default-tenant.app.yh41.iguazio-cd1.com/ 

{"name": "ModelRouter", "version": "v2", "extensions": []}

After deploying the serving function with the required model, we can send a prediction request:

In [17]:
serving_fn.invoke(f'/v2/models/best_model/infer', '0,1,2')

> 2022-01-03 17:50:08,449 [info] invoking function: {'method': 'POST', 'path': 'http://nuclio-azureml-yonatan-v2-model-server.default-tenant.svc.cluster.local:8080/v2/models/best_model/infer'}


RuntimeError: bad function response 400: Unrecognized request format: Extra data: line 1 column 2 (char 1)
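
The 400 response above is consistent with the request body `'0,1,2'` not being valid JSON: a JSON parser reads `0` as a complete value and then reports the trailing `,1,2` as extra data. MLRun V2 model servers generally expect a JSON body with an `inputs` key that holds a list of feature rows. The sketch below builds such a body with the standard library only. The helper `build_infer_body` is hypothetical, and the sample row reuses the four measurements from the first row of the dataset preview; whether the trained model also expects the `Unnamed: 0` index column is an assumption to verify against the training data.

```python
import json


def build_infer_body(rows):
    """Wrap feature rows in an {"inputs": [...]} envelope and return it as JSON text."""
    for row in rows:
        if not isinstance(row, (list, tuple)):
            raise TypeError("each row must be a list of feature values")
    return json.dumps({"inputs": [list(row) for row in rows]})


# Reproduce the parsing failure behind the 400 response.
try:
    json.loads("0,1,2")
except json.JSONDecodeError as err:
    print(err)  # Extra data: line 1 column 2 (char 1)

# One row: sepal length, sepal width, petal length, petal width.
body = build_infer_body([[5.1, 3.5, 1.4, 0.2]])
print(body)

# In the notebook the body would then be sent with (not executed here):
# serving_fn.invoke('/v2/models/best_model/infer', body)
```

The number and order of values in each row need to match the columns the model was trained on.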

## 6. Clean up

To clean up the AzureML resources, see:
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-auto-train-models#clean-up-resources