# AzureML AutoML Demo
An MLRun function for working with Azure AutoML. It includes the following handlers:
1. `init_experiment` - Initialize a workspace and experiment in Azure ML.
2. `init_compute` - Initialize an Azure ML compute target to run the experiment.
3. `register_dataset` - Register a dataset object (which can also be an Iguazio FeatureVector) in Azure ML.
4. `download_model` - Download a trained model from Azure ML to the local filesystem.
5. `upload_model` - Upload a pre-trained model from the local filesystem to Azure ML.
6. `submit_training_job` - Submit a training job to Azure AutoML and download the trained model when it completes.
7. `automl_train` - Run the whole training flow for Azure AutoML:
   - Initialize the workspace and experiment in Azure ML
   - Register the dataset/feature vector
   - Submit the training job
   - Download the trained model

## 1. Setup MLRun Project

Create (or load) the MLRun project:

In [1]:
import os
import json
import pandas as pd
import mlrun



In [2]:
# Initialize the MLRun project object
project = mlrun.get_or_create_project('azureml', context="./", user_project=True)

> 2022-01-20 10:38:56,262 [info] loaded project azureml from MLRun DB


## 2. Preparing Dataset (Iris)

- Prepare the training data URI for the MLRun function.

In [3]:
DATA_URL = "https://s3.wasabisys.com/iguazio/data/iris/iris_dataset.csv"

mlrun.get_dataitem(DATA_URL).as_df().head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),label
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
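
The training parameters later in this notebook refer to the target column by name, so a quick check that the column exists can catch a mismatch early. The following is a minimal sketch that uses only the standard library and two of the rows displayed above. `split_columns` is a hypothetical helper, and treating the `Unnamed: 0` column as a row index to be skipped is an assumption based on its header, not something this notebook states.

```python
import csv
import io

# Two of the rows displayed above, copied as shown (not re-downloaded).
SAMPLE = """\
Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),label
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
"""


def split_columns(csv_text, label):
    """Return (feature_names, label) after checking that `label` is in the header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if label not in header:
        raise ValueError(f"label column {label!r} not found in {header}")
    # Assumption: columns named 'Unnamed: N' are leftover row indices, not features.
    features = [c for c in header if c != label and not c.startswith("Unnamed")]
    return features, label


features, label = split_columns(SAMPLE, "label")
print(features)
```

If the label name were misspelled, the helper would raise a `ValueError` locally instead of failing later inside the remote training job.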


## 3. Submit Azure AutoML Training Job

### Submit Azure Secrets
For more information about working with secrets, see [MLRun docs: Working with secrets](https://docs.mlrun.org/en/latest/secrets.html).

In [4]:
from dotenv import dotenv_values
secrets = dict(dotenv_values("env"))

mlrun.get_run_db().create_project_secrets(
    project.name,
    provider=mlrun.api.schemas.SecretProviderName.kubernetes,
    secrets=secrets
)
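
Before uploading, it can help to confirm that the `env` file defines every credential the function expects. The sketch below is one way to do that, under stated assumptions: the key names in `REQUIRED_KEYS` are illustrative placeholders for typical Azure settings and are not taken from this notebook, and `missing_secrets` is a hypothetical helper. Replace the key names with whatever the function's documentation lists.

```python
# Hypothetical key names; replace them with the ones your function actually reads.
REQUIRED_KEYS = (
    "AZURE_TENANT_ID",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_WORKSPACE_NAME",
)


def missing_secrets(secrets, required=REQUIRED_KEYS):
    """Return the required keys that are absent, None, or empty in `secrets`."""
    return [key for key in required if not secrets.get(key)]


# Example with a deliberately incomplete dictionary (placeholder values only).
example = {"AZURE_TENANT_ID": "placeholder", "AZURE_RESOURCE_GROUP": ""}
print(missing_secrets(example))
```

In the notebook, the same check could be applied to the `secrets` dictionary loaded from the `env` file, since `dotenv_values` may return `None` for keys that have no value.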

### Import `azureml_utils` from marketplace

In [5]:
azureml_fn = mlrun.import_function('function.yaml')

mlrun.build_function(
    azureml_fn, 
    with_mlrun=True, 
    skip_deployed=True, 
    base_image="python:3.7.9-slim", 
    commands=["python -m pip install pip==21.2.4", 
              "apt-get update && apt-get install -y --no-install-recommends git"]
)

> 2022-01-20 10:38:56,916 [info] Started building image: .mlrun/func-azureml-yonatan-azureml-utils:latest
INFO[0000] Retrieving image manifest python:3.7.9-slim
INFO[0000] Retrieving image python:3.7.9-slim from registry index.docker.io
INFO[0000] Built cross stage deps: map[]
INFO[0000] Retrieving image manifest python:3.7.9-slim
INFO[0000] Returning cached image manifest
INFO[0000] Executing 0 build triggers
INFO[0000] Unpacking rootfs as cmd RUN python -m pip install azureml-core azureml-train-automl-client requires it.
INFO[0002] RUN python -m pip install azureml-core azureml-train-automl-client
INFO[0002] Taking snapshot of full filesystem...
INFO[0008] cmd: /bin/sh
INFO[0008] args: [-c python -m pip install azureml-core azureml-train-automl-client]
INFO[0008] Running: [/bin/sh

BuildStatus(ready=True, outputs={'image': '.mlrun/func-azureml-yonatan-azureml-utils:latest'})

### AutoML configuration & run parameters

- The `automl_settings` object configures Azure AutoML. It holds the `task` type, the number of models to train (`iterations`), the metric to optimize (`primary_metric`), the allowed model types (`allowed_models`), and more.

- The `params` are the parameters for the MLRun function, such as the experiment name (`experiment_name`) and CPU cluster name (`cpu_cluster_name`) in AzureML, dataset properties for registration, the target label for training (`label_column_name`), the number of models to download (`save_n_models`), and more.

In [6]:
label_column_name = 'label' # target label

# Configure automl settings:
automl_settings = {
    "task": 'classification',
    "debug_log": 'automl_errors.log',
    # "experiment_exit_score" : 0.9,
    "enable_early_stopping": False,
    "allowed_models": ['LogisticRegression', 'SGD', 'SVM'],
    "iterations": 5,
    "iteration_timeout_minutes": 2,
    "max_concurrent_iterations": 2,
    "max_cores_per_iteration": -1,
    "n_cross_validations": 5,
    "primary_metric": 'accuracy',
    "featurization": 'off',
    "model_explainability": False,
    "enable_voting_ensemble": False,
    "enable_stack_ensemble": False
}

# Set the parameters for the MLRun function run:
params = {
    "experiment_name": 'azure-automl-test',
    "cpu_cluster_name": 'azureml-cpu',
    "dataset_name": 'iris',
    "dataset_description": 'iris training data',
    "label_column_name": label_column_name,
    "create_new_version": True,
    "register_model_name": "iris-model",
    "save_n_models": 3,
    "automl_settings": automl_settings
}
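
Because a misconfigured settings dictionary only fails after the job reaches Azure, a local sanity check can save a round trip. The following is a minimal sketch, not part of the MLRun function: `check_automl_settings` is a hypothetical helper, and the set of classification metrics is an assumption based on the Azure AutoML documentation that should be verified against the SDK version in use.

```python
# Primary metrics documented by Azure AutoML for classification tasks
# (an assumption to verify against the SDK version you use).
CLASSIFICATION_METRICS = {
    "accuracy",
    "AUC_weighted",
    "average_precision_score_weighted",
    "norm_macro_recall",
    "precision_score_weighted",
}


def check_automl_settings(settings):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if settings.get("task") == "classification":
        if settings.get("primary_metric") not in CLASSIFICATION_METRICS:
            problems.append("primary_metric is not a known classification metric")
    if settings.get("iterations", 1) < 1:
        problems.append("iterations must be at least 1")
    if "allowed_models" in settings and not settings["allowed_models"]:
        problems.append("allowed_models is empty")
    return problems


# A subset of the settings used in this notebook.
sample = {
    "task": "classification",
    "primary_metric": "accuracy",
    "iterations": 5,
    "allowed_models": ["LogisticRegression", "SGD", "SVM"],
}
print(check_automl_settings(sample))
```

The helper only covers a few obvious mistakes; Azure AutoML performs its own, more complete validation when the job is submitted.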

### Run Azure AutoML training

This MLRun function performs the following steps:
- Initialize the workspace and experiment in your AzureML account.
- Register the dataset/feature vector in Iguazio and in AzureML.
- Submit the training job to AzureML and print the live training results for each model.
- Generate the top trained models (the number is set by `save_n_models`).

In [None]:
azureml_run = azureml_fn.run(
    handler="train",
    inputs={"dataset": DATA_URL},
    params=params,
)

> 2022-01-20 10:41:50,310 [info] starting run azureml-utils-train uid=c5ad72a81754473f8f461e994d399db7 DB=http://mlrun-api:8080
> 2022-01-20 10:41:50,628 [info] Job is running in the background, pod: azureml-utils-train-vxrqg
> 2022-01-20 10:42:17,136 [info] Loading AzureML Workspace
> 2022-01-20 10:42:20,062 [info] Initializing AzureML experiment azure-automl-test
> 2022-01-20 10:42:21,314 [info] Initializing AzureML compute target azureml-cpu
> 2022-01-20 10:42:21,495 [info] Found existing cluster, will use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
> 2022-01-20 10:42:21,677 [info] Connecting to AzureML experiment default datastore
> 2022-01-20 10:42:22,010 [info] Retrieving feature vector and uploading to Azure blob storage: az://azureml-blobstore-27f8977b-4946-4ca0-bdc5-5a685d2fe8d7/iris.csv
> 2022-01-20 10:42:22,324 [info] Registering dataset iris in Azure ML
> 2022-01-20 10:42:22,324 [info] OpenSSL version must b

## 4. Clean up

To clean up AzureML resources, see:
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-auto-train-models#clean-up-resources