In [1]:
import azureml.core
print("SDK Version:", azureml.core.VERSION)

SDK Version: 1.26.0


# ESML - accelerator: Quick 3 step DEMO
- 1) `AutoMap datalake` & init ESML project
- 2) `Train model & compare` scoring to existing active model in DEV, TEST, or PROD
- 3) `Deploy model & Test` AKS webservice




Uses `ESML accelerators` & factories to `code faster`: they abstract away `Azure ML Studio` work (automated creation of datasets, versioning, and experiments) and add `NEW` concepts.
- ESML enables `enterprise CONCEPTS` (Project/Model/Dev_Test_Prod), which can scale across Azure subscriptions in DEV, TEST, and PROD for a model.
- ESML includes `accelerators for data refinement, with CONCEPTS`: Bronze, Silver, Gold, making it possible to `share refined data ACROSS projects` & models.
- ESML includes `accelerators for ML CONCEPTS` such as `Split to TRAIN, VALIDATE, TEST` (X_test and y_test are used to auto-generate charts).
![](./images/split_gold_and_train_automl_small.png)
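The TRAIN/VALIDATE/TEST concept can be sketched in plain pandas. This is a minimal sketch under an assumption: that a train percentage of 0.6 leaves the remaining 40% split evenly into VALIDATE and TEST (60/20/20). `split_train_validate_test` and `to_X_y` are hypothetical helpers, not ESML's implementation, and the data is made up.

```python
import pandas as pd


def split_train_validate_test(df: pd.DataFrame, train_percentage: float = 0.6, seed: int = 42):
    """Shuffle, keep train_percentage for TRAIN, split the rest evenly into VALIDATE and TEST."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = round(len(shuffled) * train_percentage)
    n_validate = (len(shuffled) - n_train) // 2
    train = shuffled.iloc[:n_train]
    validate = shuffled.iloc[n_train:n_train + n_validate]
    test = shuffled.iloc[n_train + n_validate:]  # TEST also gets any leftover row
    return train, validate, test


def to_X_y(df: pd.DataFrame, label: str):
    """Separate the feature columns (X) from the label column (y)."""
    return df.drop(columns=[label]), df[label]


# Made-up GOLD data that uses the same label name ("Y") as the notebook
gold = pd.DataFrame({"AGE": [i / 100 for i in range(10)], "Y": list(range(10))})
train, validate, test = split_train_validate_test(gold, train_percentage=0.6)
X_test, y_test = to_X_y(test, label="Y")
print(len(train), len(validate), len(test))  # 6 2 2
```

Returning X and y separately for the TEST split is what makes charting (predicted vs. actual) straightforward afterwards.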

# 1) `ESML - Autolake-mapping` generates Azure ML Datasets + Feature engineering: `Bronze->Silver->Gold`

In [2]:
######  NB! InteractiveLoginAuthentication is only needed on the 1st run. Once the workspace config (ws_config) is written, use the later CELL in this notebook, which just reads that file.
import sys, os
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication
sys.path.append(os.path.abspath("../azure-enterprise-scale-ml/esml/common/"))  # NOQA: E402
from esml import ESMLDataset, ESMLProject

p = ESMLProject()
p.dev_test_prod = "dev"
auth = InteractiveLoginAuthentication(tenant_id=p.tenant)
ws, config_name = p.authenticate_workspace_and_write_config(auth)  # Log in once and write the workspace config file
######  NB!

Performing interactive authentication. Please follow the instructions on the terminal.
Note, we have launched a browser for you to login. For old experience with device code, use "az login --use-device-code"
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.
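The cell above logs in once and writes a workspace config file, so later cells only need to read it. With the Azure ML SDK itself, `Workspace.write_config()` writes such a `config.json` and `Workspace.from_config()` reads it back. The sketch below only shows the round trip with the three standard keys; whether ESML's `authenticate_workspace_and_write_config` and `get_workspace_from_config` use exactly this file name and location is an assumption, and the subscription value is a placeholder.

```python
import json
import tempfile
from pathlib import Path

# Keys used by an Azure ML workspace config.json; the subscription value is a placeholder
ws_config = {
    "subscription_id": "<subscription-id>",
    "resource_group": "MSFT-WEU-EAP_PROJECT02_AI-DEV-RG",
    "workspace_name": "msft-weu-DEV-eap-proj02_ai-amls",
}


def write_ws_config(folder: Path, config: dict) -> Path:
    """First run: persist the workspace identity after the interactive login."""
    path = folder / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


def read_ws_config(folder: Path) -> dict:
    """Later runs: read the saved file instead of logging in interactively again."""
    return json.loads((folder / "config.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    write_ws_config(Path(tmp), ws_config)
    loaded = read_ws_config(Path(tmp))

print(loaded["workspace_name"])  # msft-weu-DEV-eap-proj02_ai-amls
```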


In [3]:
demo_mode = False
unregister_all_datasets=False

In [4]:
import sys, os
import pandas as pd 
from azureml.core import Workspace
sys.path.append(os.path.abspath("./common/"))  # NOQA: E402
from esml import ESMLDataset, ESMLProject

if(demo_mode):
    p = ESMLProject(True) # Demo settings, will search in internal TEMPLATE SETTINGS folder '../settings'
    print("DEMO mode - using internal TEMPLATE settings")
else:
    p = ESMLProject() # Will search in ROOT for your copied SETTINGS folder '../../../settings'; copy the template settings from '../settings' first

ws = p.get_workspace_from_config() #2) Load DEV or TEST or PROD Azure ML Studio workspace

if(unregister_all_datasets):
    p.unregister_all_datasets(ws) # For DEMO purpose
    

In [5]:
p.describe()

Training

 - ds01_diabetes
master/1_projects/project002/03_diabetes_model_reg/train/ds01_diabetes/in/dev/2020/01/01/
master/1_projects/project002/03_diabetes_model_reg/train/ds01_diabetes/out/bronze/dev/
master/1_projects/project002/03_diabetes_model_reg/train/ds01_diabetes/out/silver/dev/

 - ds02_other
master/1_projects/project002/03_diabetes_model_reg/train/ds02_other/in/dev/2020/01/01/
master/1_projects/project002/03_diabetes_model_reg/train/ds02_other/out/bronze/dev/
master/1_projects/project002/03_diabetes_model_reg/train/ds02_other/out/silver/dev/
Training GOLD 

master/1_projects/project002/03_diabetes_model_reg/train/gold/dev/
 

ENVIRONMENT - DEV, TEST, or PROD?
ACTIVE ENVIRONMENT = dev
ACTIVE subscription = ca0a8c40-b06a-4e4e-8434-63c03a1dee34
- MSFT-WEU-EAP_PROJECT02_AI-DEV-RG
- msft-weu-DEV-eap-proj02_ai-amls
- westeurope
- MSFT-WEU-EAP_CMN_AI-DEV-RG
- msft-weu-dev-cmnai-vnet
- msft-weu-dev-cmnai-sn-aml
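The folders printed by `p.describe()` follow a fixed lake convention: project, model, dataset, layer (`in`, `out/bronze`, `out/silver`), environment, and a date partition for `in`, with GOLD shared at model level. The hypothetical helper `lake_paths` below rebuilds that layout; it is inferred only from the printed output, so ESML's real naming code may differ.

```python
from datetime import date


def lake_paths(project: str, model: str, dataset: str, env: str, in_date: date) -> dict:
    """Rebuild the folder layout printed by p.describe() (hypothetical helper)."""
    base = f"master/1_projects/{project}/{model}/train"
    ds = f"{base}/{dataset}"
    return {
        "in": f"{ds}/in/{env}/{in_date:%Y/%m/%d}/",  # raw input, partitioned by date
        "bronze": f"{ds}/out/bronze/{env}/",
        "silver": f"{ds}/out/silver/{env}/",
        "gold": f"{base}/gold/{env}/",  # GOLD is per model, not per dataset
    }


paths = lake_paths("project002", "03_diabetes_model_reg", "ds01_diabetes", "dev", date(2020, 1, 1))
for layer, path in paths.items():
    print(layer, path)
```

Swapping `env` for `"test"` or `"prod"` yields the same structure in another environment, which is what lets the same notebook run unchanged against DEV, TEST, or PROD.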


In [6]:
datastore = p.init(ws) # 3) Automapping from datalake to Azure ML datasets

...
....
Using GEN2 as Datastore
ds01_diabetes
ds02_other

####### Automap & Autoregister - SUCCESS!
1) Auto mapped 2 ESML Dataset with registered Azure ML Datasets (potentially all 3: IN,BRONZE, SILVER) in Datastore project002lake 

Dataset 'ds01_diabetes' status:
 - IN_Folder_has_files
 - BRONZE_Folder_has_files
 - SILVER_Folder_has_files
Dataset 'ds02_other' status:
 - IN_Folder_has_files
 - BRONZE_Folder_has_files
 - SILVER_Folder_has_files

2) Registered each Dataset with suffixes (_IN_CSV, _BRONZE, _SILVER) 
 Tip: Use ESMLProject.Datasets list or .DatasetByName(myDatasetName) to read/write
#######
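The log reports, per dataset, which layer folders have files and then registers one dataset per layer with the suffixes `_IN_CSV`, `_BRONZE`, `_SILVER`. A minimal local sketch of that decision, assuming a layer is registered only when its folder has files, is shown below. It uses local temp folders instead of the lake, the helpers are hypothetical, and the exact registered names in Azure ML may carry extra prefixes.

```python
import tempfile
from pathlib import Path

# Layer -> name suffix, as listed in the automap log above
SUFFIXES = {"in": "_IN_CSV", "bronze": "_BRONZE", "silver": "_SILVER"}


def folder_has_files(folder: Path) -> bool:
    """True if the folder exists and contains at least one file (at any depth)."""
    return folder.is_dir() and any(child.is_file() for child in folder.rglob("*"))


def datasets_to_register(dataset_name: str, layer_folders: dict) -> list:
    """One suffixed dataset name per layer folder that has files."""
    return [
        dataset_name + SUFFIXES[layer]
        for layer, folder in layer_folders.items()
        if folder_has_files(folder)
    ]


with tempfile.TemporaryDirectory() as tmp:
    folders = {layer: Path(tmp) / layer for layer in SUFFIXES}
    for folder in folders.values():
        folder.mkdir()
    (folders["in"] / "diabetes.csv").write_text("AGE,Y\n0.05,151\n")  # made-up row
    (folders["bronze"] / "diabetes.parquet").write_bytes(b"")
    # The silver folder stays empty, so no _SILVER dataset is expected
    names = datasets_to_register("ds01_diabetes", folders)

print(names)  # ['ds01_diabetes_IN_CSV', 'ds01_diabetes_BRONZE']
```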


In [7]:
# Feature engineering: Bronze -> Silver -> Gold, working with Azure ML Datasets using the Bronze, Silver, Gold concept
esml_dataset = p.DatasetByName("ds01_diabetes") # Get dataset
df_bronze = esml_dataset.Bronze.to_pandas_dataframe()
p.save_silver(esml_dataset,df_bronze) #Bronze -> Silver

df = esml_dataset.Silver.to_pandas_dataframe() 
df_filtered = df[df.AGE > 0.015] 
gold_train = p.save_gold(df_filtered)  #Silver -> Gold
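The same Bronze -> Silver -> Gold flow can be sketched with in-memory pandas dataframes, without Azure. The GOLD step reuses the notebook's row filter (`AGE > 0.015`). The cleaning rule in `bronze_to_silver` is an assumption added for illustration (the cell above saves BRONZE to SILVER unchanged), and the rows are made up.

```python
import pandas as pd

# Made-up BRONZE rows with two of the diabetes columns plus the label "Y"
bronze = pd.DataFrame({
    "AGE": [0.04, 0.01, -0.02, 0.05, 0.05],
    "BMI": [0.06, -0.05, 0.04, 0.01, 0.01],
    "Y": [151.0, 75.0, 141.0, 206.0, 206.0],
})


def bronze_to_silver(df: pd.DataFrame) -> pd.DataFrame:
    # Example cleaning rule (assumption): drop duplicate rows and rows with missing values
    return df.drop_duplicates().dropna().reset_index(drop=True)


def silver_to_gold(df: pd.DataFrame) -> pd.DataFrame:
    # Same row filter as the notebook cell: keep rows with AGE > 0.015
    return df[df.AGE > 0.015].reset_index(drop=True)


silver = bronze_to_silver(bronze)
gold = silver_to_gold(silver)
print(len(bronze), len(silver), len(gold))  # 5 4 2
```

Keeping each layer transition in its own function mirrors the layered folders: SILVER can be reused by other projects, while GOLD holds the model-specific selection.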

## SUMMARY - step 1
- ESML has now used `Automap` and `Autoregister` to create Azure ML Datasets as: `IN, BRONZE, SILVER, GOLD`
- ESML has read the configuration for the correct environment (DEV, TEST, PROD).
    - Both small customers and large enterprise customers often want DEV, TEST, PROD in `different Azure ML workspaces` (and different subscriptions)
- User has done feature engineering and saved GOLD with `p.save_gold`
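Reading configuration for the active environment amounts to a lookup keyed by `dev`, `test`, or `prod`. The sketch below is hypothetical: only the DEV resource group and workspace names appear in the `p.describe()` output, so the TEST and PROD values and all subscription IDs are placeholders, and ESML's real settings files are not reproduced here.

```python
# Hypothetical per-environment settings: only the "dev" names appear in p.describe() above
ENVIRONMENTS = {
    "dev": {
        "subscription_id": "<dev-subscription-id>",
        "resource_group": "MSFT-WEU-EAP_PROJECT02_AI-DEV-RG",
        "workspace_name": "msft-weu-DEV-eap-proj02_ai-amls",
    },
    "test": {
        "subscription_id": "<test-subscription-id>",
        "resource_group": "<test-resource-group>",
        "workspace_name": "<test-workspace>",
    },
    "prod": {
        "subscription_id": "<prod-subscription-id>",
        "resource_group": "<prod-resource-group>",
        "workspace_name": "<prod-workspace>",
    },
}


def get_environment_settings(dev_test_prod: str) -> dict:
    """Return the workspace settings for the active environment (dev, test or prod)."""
    env = dev_test_prod.strip().lower()
    if env not in ENVIRONMENTS:
        raise ValueError(f"Unknown environment {dev_test_prod!r}; expected one of {sorted(ENVIRONMENTS)}")
    return ENVIRONMENTS[env]


print(get_environment_settings("dev")["workspace_name"])  # msft-weu-DEV-eap-proj02_ai-amls
```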

# 2) `ESML` Train model in `5 codelines`

In [9]:
from esml import ESMLDataset, ESMLProject
from baselayer_azure_ml import AutoMLFactory,azure_metric_regression,azure_metric_classification
from azureml.train.automl import AutoMLConfig

automl_performance_config = p.get_automl_performance_config() # 1)Get config, for active environment (dev,test or prod)
aml_compute = p.get_training_aml_compute(ws) # 2)Get compute, for active environment

label = "Y"
train_6, validate_set_2, test_set_2 = p.split_gold_3(0.6, label) # 3) Auto-registers in AZURE (M03_GOLD_TRAIN | M03_GOLD_VALIDATE | M03_GOLD_TEST). Alt: p.Gold.random_split(percentage=0.8, seed=23)

automl_config = AutoMLConfig(task = 'regression', # 4) Override the ENV config for this model (which inherits from the enterprise DEV_TEST_PROD config baseline)
                            primary_metric = azure_metric_regression.MAE, # Note: MAPE is not available as a primary metric for AutoML regression
                            compute_target = aml_compute,
                            training_data = p.GoldTrain, # The 'train_6' pandas dataframe, registered as an Azure ML Dataset
                            experiment_exit_score = 0.208, # DEMO purpose
                            label_column_name = label,
                            **automl_performance_config
                        )
train_as_pipeline = False
best_run, fitted_model, experiment = None, None, None # Consistent/same return values from both AutoML ALTERNATIVES

if (train_as_pipeline):
    print("train_as_pipeline")
    best_run, fitted_model, experiment = AutoMLFactory(p).train_pipeline(automl_config) # 5) Train model
else: 
    print("train_as_run")
    best_run, fitted_model, experiment = AutoMLFactory(p).train_as_run(automl_config)

Loading AutoML config settings from: dev
Note: OVERRIDING enterprise performance settings with project specifics. (to change, set flag in 'dev_test_prod_settings.json' -> override_enterprise_settings_with_model_specific=False)
Using a model specific cluster, per configuration in project specific settings, (the integer of 'model_number' is the base for the name)
Note: OVERRIDING enterprise performance settings with project specifics. (to change, set flag in 'dev_test_prod_settings.json' -> override_enterprise_settings_with_model_specific=False)
Found existing cluster prj02-m03-dev for project and environment, using it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
train_as_run
Experiment name: 03_diabetes_model_reg
Azure ML Studio Workspace: msft-weu-DEV-eap-proj02_ai-amls
Start training run...
Submitting remote run.
No run_configuration provided, running on prj02-m03-dev with default configuration
Running on remote compute: p

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| 03_diabetes_model_reg | AutoML_449305f3-031c-4cef-bd90-b3d7c452a7ad | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more about high cardinality feature handling: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

*********************************

## SUMMARY - step 2
- ESML has now fetched the `configuration & train compute` for the enterprise `environment (DEV, TEST or PROD)`
- ESML has `autogenerated` an AutoML experiment, optionally as a `pipeline`, in the correct environment.
- User has overridden some AutoML settings (`label, split percentage`, `target metric`) and used the `1-liner TRAIN` code snippet
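The log line "OVERRIDING enterprise performance settings with project specifics" and the flag `override_enterprise_settings_with_model_specific` suggest a layered configuration: an enterprise baseline that model-specific values can override. A minimal sketch with a dict merge follows; the numbers are hypothetical and `get_performance_config` is not ESML's function, although the keys are real `AutoMLConfig` parameters.

```python
# Hypothetical values; the keys are real AutoMLConfig parameters
enterprise_baseline = {
    "iteration_timeout_minutes": 20,
    "experiment_timeout_hours": 1.0,
    "max_concurrent_iterations": 4,
    "n_cross_validations": 3,
}
model_specific = {"experiment_timeout_hours": 0.5, "max_concurrent_iterations": 2}


def get_performance_config(baseline: dict, model_overrides: dict,
                           override_enterprise_settings_with_model_specific: bool = True) -> dict:
    """Merge settings; when the flag is on, model-specific values win on key collisions."""
    if not override_enterprise_settings_with_model_specific:
        return dict(baseline)
    return {**baseline, **model_overrides}


automl_performance_config = get_performance_config(enterprise_baseline, model_specific)
print(automl_performance_config)
# The merged dict is then splatted into AutoMLConfig(..., **automl_performance_config).
# It must not repeat a key that is also passed explicitly (e.g. label_column_name),
# otherwise Python raises "TypeError: ... got multiple values for keyword argument".
```

Returning a new dict (rather than updating the baseline in place) keeps the enterprise defaults intact for the next model that reads them.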

## 2b) ESML scoring comparison in `1 codeline`: Promote the model or not? If better, then `Register model`
- `IF` the newly trained model in the `current` environment (`DEV`, `TEST` or `PROD`) scores BETTER than the existing model in the `target` environment, then the `new model` can be registered and promoted.
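The promotion decision can be sketched as a metric comparison. This is a minimal sketch under assumptions: "better" means strictly lower for error metrics such as normalized MAE and strictly higher for scores such as R2, and a debug flag (the log below mentions `debug_always_promote_model`) can force promotion. `should_promote` and `can_register_directly` are hypothetical helpers, not ESML's code.

```python
def should_promote(new_score: float, current_score: float,
                   lower_is_better: bool = True, always_promote: bool = False) -> bool:
    """Promote only if the new model is strictly better on the chosen metric."""
    if always_promote:  # e.g. a debug flag that forces promotion
        return True
    if lower_is_better:  # error metrics such as normalized MAE
        return new_score < current_score
    return new_score > current_score  # score metrics such as R2 or Spearman


def can_register_directly(current_env: str, target_env: str) -> bool:
    """Registering happens in the same workspace; dev -> test needs a retrain in test."""
    return current_env == target_env


# Normalized MAE values printed by the comparison cell below: the two models tie
new_mae, current_mae = 0.1641754076941266, 0.1641754076941266
print(should_promote(new_mae, current_mae))                       # False
print(should_promote(new_mae, current_mae, always_promote=True))  # True
```

With a strict comparison, a tie does not promote, which is why a tie like the one in the output below would only lead to registration when a flag forces it.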

In [10]:
p.dev_test_prod

'dev'

In [11]:
from baselayer_azure_ml import AutoMLFactory
target_env = p.dev_test_prod # Target environment: "dev", "test" or "prod". Does model A score better than model B?
print("Example: If new model scores better in DEV, we can promote this to TEST")

promote, m1_name, r1_id, m2_name, r2_run_id = AutoMLFactory(p).compare_scoring_current_vs_new_model(target_env)

print("Promote model?  {}".format(promote))
print("New Model: {} in environment {}".format(m1_name, p.dev_test_prod))
print("Existing Model: {} in environment {}".format(m2_name,target_env))

if (promote and p.dev_test_prod == target_env): # A model can only be registered in the same workspace (test->test); going from dev->test requires retraining
    AutoMLFactory(p).register_active_model(target_env)


Example: If new model scores better in DEV, we can promote this to TEST
Loading AutoML config settings from: dev
targe=source environement. Compare model version in DEV/TEST/PROD with latest registered in same DEV/TEST/PROD workspace (same workspace & subscriptiom comparison)
MAPE (Mean average Percentage Error): 37.36539689710541
MAE (normalized_mean_absolute_error): 0.1641754076941266
R2 (r2_score): 0.46617256831419573
Spearman (spearman_correlation): 0.6726695514799195
TARGET is in the same Azure ML Studio workspace as SOURCE, comparing with latest registered model...
target_best_run_id AutoML_aca8ec09-9803-4d71-bf0f-87dfc5649f13
MAPE (Mean average Percentage Error): 37.36539689710541
MAE (normalized_mean_absolute_error): 0.1641754076941266
R2 (r2_score): 0.46617256831419573
Spearman (spearman_correlation): 0.6726695514799195
Current Production model normalized mean mse: 0.1641754076941266, New trained model mse: 0.1641754076941266

OBS! 'debug_always_promote_model' config-flag acti

# 3) ESML `Deploy model ONLINE` in `2 lines of code` (AKS) 
- Deploy an "offline" model from an earlier `run` to the `DEV`, `TEST` or `PROD` environment
- ESML saves the `API key in Azure Key Vault automatically`
- ESML auto-config handles 4 common sources of errors: `correct compute name` and `valid replicas, valid agents, valid auto scaling`
    - Tip: You can adjust the number of replicas, use a different CPU/memory configuration, or use a different compute target.
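The auto-config step can be pictured as a validation pass over the requested deployment settings before deploying. The sketch below is hypothetical: `DeployConfig` and `sanitize_deploy_config` are illustrative names, and the rules are assumptions modeled on the log messages further down (an existing cluster is reused; autoscaling is shut off when `aks_dev_test=True`). It covers compute name, autoscaling, and replicas, not the AKS agent count.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DeployConfig:
    """Hypothetical subset of an AKS deployment configuration."""
    compute_name: str
    num_replicas: int = 1
    autoscale_enabled: bool = False
    cpu_cores: float = 1.0
    memory_gb: float = 1.0


def sanitize_deploy_config(cfg: DeployConfig, aks_dev_test: bool, existing_clusters: set) -> DeployConfig:
    """Apply assumed auto-config rules before deploying."""
    if cfg.compute_name not in existing_clusters:  # rule 1: the compute name must exist
        raise ValueError(f"AKS cluster {cfg.compute_name!r} not found")
    if aks_dev_test and cfg.autoscale_enabled:  # rule 2: no autoscaling on a dev/test cluster
        cfg = replace(cfg, autoscale_enabled=False)
    if not cfg.autoscale_enabled and cfg.num_replicas < 1:  # rule 3: fixed scale needs >= 1 replica
        cfg = replace(cfg, num_replicas=1)
    return cfg


requested = DeployConfig("esml-dev-prj02", num_replicas=0, autoscale_enabled=True)
fixed = sanitize_deploy_config(requested, aks_dev_test=True, existing_clusters={"esml-dev-prj02"})
print(fixed)
```

Using a frozen dataclass with `replace` returns a corrected copy, so the caller can still log what was requested versus what was deployed.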

In [12]:
p.dev_test_prod,ws.name

('dev', 'msft-weu-DEV-eap-proj02_ai-amls')

In [13]:
inference_config, model, best_run = p.get_active_model_inference_config(ws) #  AutoML support 
service,api_uri, kv_aks_api_secret= p.deploy_automl_model_to_aks(model,inference_config)

# How-to: inject your own deployment config
# own_deploy_config_to_inject = p.ComputeFactory.get_deploy_config(p.dev_test_prod,False, "02","01")  # 1) Create your config, or use the config baseline, then modify it
# service,api_uri, kv_aks_api_secret= p.deploy_automl_model_to_aks(model,inference_config,own_deploy_config_to_inject) # 2) Pass it as an argument

Loading AutoML config settings from: dev
Loading AutoML config settings from: dev
Deploying model: AutoML449305f300 with verison: 1 to environment: dev with overwrite_endpoint=True
Note: OVERRIDING enterprise performance settings with project specifics. (to change, set flag in 'dev_test_prod_settings.json' -> override_enterprise_settings_with_model_specific=False)
Found existing AksWebservice endpoint, deleting it, since overwrite=True
Note: OVERRIDING enterprise performance settings with project specifics. (to change, set flag in 'dev_test_prod_settings.json' -> override_enterprise_settings_with_model_specific=False)
Found existing cluster, esml-dev-prj02, using it.
Note: Autoscale_enabled=False, or since aks_dev_test=True in config, autoscaling is automatically shut off, e.g. overridden in config (since not supported) for environment dev
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deploymen

## 3b) ESML Test AKS webservice, `2 lines of code`

In [14]:
X_test, y_test, tags = p.get_gold_validate_Xy() # Get the VALIDATE split as X_test/y_test; ESML already knows the SPLIT and LABEL from training
print(tags)

df = p.call_webservice(p.ws, X_test, False) # Auto-fetches the key from Key Vault and calls the webservice
df.head()

M03_GOLD_VALIDATE : (37, 11)
X_test  (37, 10)
y_test  (37,)
{'split_percentage': '0.2', 'label': 'Y'}
Note: OVERRIDING enterprise performance settings with project specifics. (to change, set flag in 'dev_test_prod_settings.json' -> override_enterprise_settings_with_model_specific=False)
Note: Fetching keys automatically via workspace keyvault.
Saving scoring to lake for project folder project002 and inference_model_version: 1 ...
...
....
..

Saved DATA to score successfully in LAKE, as file 'to_score_False.parquet'
..
Saved SCORED data in LAKE, as file 'scored_False.parquet'


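| | AGE | SEX | BMI | BP | S1 | S2 | S3 | S4 | S5 | S6 | result |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.05 | 0.05 | 0.12 | 0.08 | -0.1 | -0.1 | -0.07 | -0.0 | 0.04 | -0.03 | 243.13 |
| 1 | 0.07 | -0.04 | 0.07 | 0.04 | 0.02 | 0.0 | -0.04 | 0.04 | 0.08 | 0.11 | 293.24 |
| 2 | 0.06 | 0.05 | -0.03 | 0.01 | 0.02 | 0.02 | 0.03 | -0.04 | -0.03 | -0.06 | 94.01 |
| 3 | 0.02 | -0.04 | 0.02 | -0.02 | 0.06 | 0.04 | 0.03 | -0.0 | 0.04 | -0.0 | 148.81 |
| 4 | 0.02 | -0.04 | 0.11 | 0.06 | 0.01 | -0.03 | -0.02 | 0.02 | 0.1 | 0.02 | 253.0 |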
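Under the hood, calling a key-authenticated AKS endpoint is an HTTP POST with a JSON body and an `Authorization: Bearer <key>` header. The sketch below builds, but does not send, such a request with the standard library. The `{"data": [...]}` body shape is an assumption (check the input schema of the deployed scoring script), the URI and key are placeholders, and `build_scoring_request` is a hypothetical helper, not ESML's `call_webservice`.

```python
import json
import urllib.request

import pandas as pd


def build_scoring_request(api_uri: str, api_key: str, X: pd.DataFrame) -> urllib.request.Request:
    """Build (but do not send) a key-authenticated POST to a scoring endpoint."""
    body = json.dumps({"data": X.to_dict(orient="records")}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # the key ESML fetches from Key Vault
    }
    return urllib.request.Request(api_uri, data=body, headers=headers, method="POST")


# Two rows of features taken from the scored output above (subset of columns)
X_sample = pd.DataFrame({"AGE": [0.05, 0.07], "BMI": [0.12, 0.07]})
req = build_scoring_request("https://example.invalid/score", "<api-key>", X_sample)
print(req.get_method(), json.loads(req.data))
# Sending would be urllib.request.urlopen(req); the response body format is defined by the scoring script
```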


# END