# ESML - `AutoMLFactory` and `ComputeFactory`

## PROJECT + DATA CONCEPTS + ENTERPRISE Datalake Design + DEV->PROD MLOps
- `1) ESML Project`: The ONLY thing you need to remember is your `Project number` (and the `BRONZE, SILVER, GOLD` concept)
   - ... read the earlier notebook
## ENTERPRISE Deployment of Models & Governance - MLOps at scale
- `3) DEV->TEST->PROD` (configs, compute, performance)
    - ESML has configuration for 3 environments: easy DEPLOY of a model across subscriptions and Azure ML Studio workspaces
        - Save costs & time:
            - `DEV` uses cheaper, lower-performance compute for TRAIN and INFERENCE (batch, AKS)
            - `DEV` uses quick-debug ML training (fast training in `DEV` versus good scoring in `TEST` and `PROD`)
        - How? ESML `AutoMLFactory` and `ComputeFactory`
        - Where to config these?
            - settings/dev_test_prod/`dev_test_prod_settings.json`
            - settings/dev_test_prod/`train/*/automl/*`
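
The idea of one settings file with a block per environment can be sketched as below. This is only an illustration: apart from `active_dev_test_prod`, which appears in the notebook code, every key name and value (`train_vm_size`, `experiment_timeout_hours`, `n_cross_validations`) is a hypothetical assumption and not the actual ESML schema.

```python
import json

# Hypothetical settings document. The per-environment keys are illustrative
# assumptions, not the real ESML configuration schema.
SETTINGS_JSON = """
{
  "active_dev_test_prod": "dev",
  "dev":  {"train_vm_size": "small",  "experiment_timeout_hours": 0.25, "n_cross_validations": 2},
  "test": {"train_vm_size": "medium", "experiment_timeout_hours": 1.0,  "n_cross_validations": 5},
  "prod": {"train_vm_size": "large",  "experiment_timeout_hours": 3.0,  "n_cross_validations": 5}
}
"""

VALID_ENVIRONMENTS = ("dev", "test", "prod")


def settings_for(all_settings, env=None):
    """Return the settings block for `env`, defaulting to the active environment."""
    env = env or all_settings["active_dev_test_prod"]
    if env not in VALID_ENVIRONMENTS:
        raise ValueError(f"Unknown environment: {env!r}")
    return all_settings[env]


all_settings = json.loads(SETTINGS_JSON)
dev_settings = settings_for(all_settings)           # falls back to the active environment
prod_settings = settings_for(all_settings, "prod")  # explicit environment
print(dev_settings["n_cross_validations"], prod_settings["n_cross_validations"])
```

Keeping the cheap `DEV` values and the more expensive `TEST`/`PROD` values side by side in one document is what lets the same notebook code run unchanged in every environment.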

In [1]:
import json
import os
import sys
sys.path.append(os.path.abspath("../azure-enterprise-scale-ml/esml/common/"))  # NOQA: E402
from esml import ESMLDataset, ESMLProject

# Note: These and the other configs for TRAIN and AutoML are self-booting from ESMLProject
try:
    with open("../settings/active_dev_test_prod.json") as f2: # Enterprise: MSFT-WEU-EAP_PROJECT{}_AI-{}-RG
        active_env =  json.load(f2)
    with open("../settings/enterprise_specific/dev_test_prod_settings.json") as f:
        environment_settings = json.load(f)
        environment_settings["active_dev_test_prod"] = active_env["active_dev_test_prod"]
    with open("../settings/project_specific/model/lake_settings.json") as f:
        esml_settings = json.load(f)
    with open("../settings/project_specific/security_config.json") as f2: # Project service principals, etc.
        security_config = json.load(f2)
except Exception as e:
    raise Exception("Could not open one or more ESML settings files - could not load experiment name, or access storage") from e

os.getcwd()

'c:\\Users\\jostrm\\OneDrive - Microsoft\\0_GIT\\2_My\\github2\\azure-enterprise-scale-ml-usage\\notebook_demos'

In [2]:
p = ESMLProject(esml_settings,environment_settings,security_config) # read from config
#p = ESMLProject() #  self-booting config

# Azure ML Studio Workspace
- ESML will `Automap` and `Autoregister` Azure ML Datasets as: `IN, SILVER, BRONZE, GOLD`

In [3]:
from azureml.core import Workspace
ws = p.get_workspace_from_config()
datastore = p.init(ws)

...
....
Using GEN2 as Datastore
ds01_diabetes
ds02_other

####### Automap & Autoregister - SUCCESS!
1) Auto mapped 2 ESML Dataset with registered Azure ML Datasets (potentially all 3: IN,BRONZE, SILVER) in Datastore project002lake 

Dataset 'ds01_diabetes' status:
 - IN_Folder_has_files
 - BRONZE_Folder_has_files
 - SILVER_Folder_has_files
Dataset 'ds02_other' status:
 - IN_Folder_has_files
 - BRONZE_Folder_has_files
 - SILVER_Folder_has_files

2) Registered each Dataset with suffixes (_IN_CSV, _BRONZE, _SILVER) 
 Tip: Use ESMLProject.Datasets list or .DatasetByName(myDatasetName) to read/write
#######


# ESML `GOLD` Dataset

In [4]:
ds_01 = p.DatasetByName("ds01_diabetes")
print(ds_01.InData.name)
print(ds_01.Bronze.name)
print(ds_01.Silver.name)
#print(p.Gold.name)

M03_ds01_diabetes_IN_CSV
M03_ds01_diabetes_BRONZE
M03_ds01_diabetes_SILVER
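
The printed names follow a visible pattern: a prefix, the dataset name, and a per-layer suffix. A minimal sketch of that pattern, inferred only from the output above, could look like this. The helper `registered_name` is hypothetical and not part of ESML, and the meaning of the `M03` prefix is not stated in this notebook.

```python
# Suffixes as printed by the Automap & Autoregister log in this notebook.
LAYER_SUFFIXES = {"IN": "_IN_CSV", "BRONZE": "_BRONZE", "SILVER": "_SILVER"}


def registered_name(prefix, dataset_name, layer):
    """Build a registered dataset name such as 'M03_ds01_diabetes_SILVER'.

    Hypothetical helper: the pattern is inferred from the printed names only.
    """
    try:
        suffix = LAYER_SUFFIXES[layer.upper()]
    except KeyError:
        raise ValueError(f"Unknown layer: {layer!r}") from None
    return f"{prefix}_{dataset_name}{suffix}"


names = [registered_name("M03", "ds01_diabetes", layer) for layer in ("IN", "BRONZE", "SILVER")]
print(names)
```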


In [5]:
df_01 = ds_01.Silver.to_pandas_dataframe() 

ds_02 = p.DatasetByName("ds02_other")  # keep ds_01 pointing at ds01_diabetes
df_02 = ds_02.Silver.to_pandas_dataframe()
df_gold1_join = df_01.join(df_02) # left join -> NULL on df_02
print("Diabetes shape: ", df_01.shape)
print(df_gold1_join.shape)

Diabetes shape:  (442, 11)
(442, 19)


In [6]:
ds_gold_v1 = p.save_gold(df_01)

# Look at `GOLD` vLatest

In [7]:
import pandas as pd 
df = p.Gold.to_pandas_dataframe()
df.head()

Unnamed: 0,AGE,SEX,BMI,BP,S1,S2,S3,S4,S5,S6,Y
0,0.038076,0.05068,0.061696,0.021872,-0.044223,-0.034821,-0.043401,-0.002592,0.019908,-0.017646,151.0
1,-0.001882,-0.044642,-0.051474,-0.026328,-0.008449,-0.019163,0.074412,-0.039493,-0.06833,-0.092204,75.0
2,0.085299,0.05068,0.044451,-0.005671,-0.045599,-0.034194,-0.032356,-0.002592,0.002864,-0.02593,141.0
3,-0.089063,-0.044642,-0.011595,-0.036656,0.012191,0.024991,-0.036038,0.034309,0.022692,-0.009362,206.0
4,0.005383,-0.044642,-0.036385,0.021872,0.003935,0.015596,0.008142,-0.002592,-0.031991,-0.046641,135.0


In [8]:
train, validate, test = p.split_gold_3(0.6) # Also registers the datasets in AZURE as M03_GOLD_TRAIN | M03_GOLD_VALIDATE | M03_GOLD_TEST
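
What a three-way split with a train fraction of `0.6` might do can be sketched in plain Python. This is not the ESML implementation: the function `split_3`, the fixed seed, and the choice to divide the remaining rows evenly between validate and test are all assumptions made for illustration.

```python
import random


def split_3(rows, train_fraction=0.6, seed=42):
    """Shuffle rows and split them into train / validate / test.

    Assumption: the rows left after the train share are divided evenly
    between validate and test (test receives the odd row, if any).
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_fraction)
    n_validate = (len(shuffled) - n_train) // 2
    train_rows = shuffled[:n_train]
    validate_rows = shuffled[n_train:n_train + n_validate]
    test_rows = shuffled[n_train + n_validate:]
    return train_rows, validate_rows, test_rows


# 442 rows, the same row count as the diabetes dataset in this notebook.
train_rows, validate_rows, test_rows = split_3(range(442), 0.6)
print(len(train_rows), len(validate_rows), len(test_rows))
```

Every row ends up in exactly one of the three sets, which is the property that matters when the validate and test sets are later used for model comparison.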

## 3) ESML TRAIN model -> See other notebook `esml_howto_2_train.ipynb`
- `AutoMLFactory, ComputeFactory`
- Get `Train COMPUTE` for `X` environment
- Get `Train Hyperparameters` for `X` environment (fewer cross-validations in DEV, etc.)

## 4a) ESML Scoring compare: Promote model or not? Register
- `IF` newly trained model in `current` environment scores BETTER than existing model in `target` environment, then `new model` can be registered and promoted.
- `ValidationSet` comparison of an offline/previous `AutoML run` for the `DEV` environment
- For `DEV`, `TEST` or `PROD` environment
- Future roadmap: Also include `TestSet SCORING` comparison
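
The promotion rule above can be sketched as a simple metric comparison. This is a hedged illustration, not the logic inside `compare_scoring_current_vs_new_model`: the function `should_promote` and its parameters are hypothetical. The metric values are the ones printed in this notebook's run output for the newly trained model and the existing target model.

```python
def should_promote(new_metrics, current_metrics,
                   metric="normalized_mean_absolute_error", lower_is_better=True):
    """Return True if the new model beats the current model on `metric`.

    Hypothetical helper: a strict comparison on a single metric.
    """
    new_value = new_metrics[metric]
    current_value = current_metrics[metric]
    if lower_is_better:
        return new_value < current_value
    return new_value > current_value


# Values copied from the run output in this notebook.
new_model = {
    "normalized_mean_absolute_error": 0.1641754076941266,
    "r2_score": 0.46617256831419573,
}
current_model = {
    "normalized_mean_absolute_error": 0.15358817665517124,
    "r2_score": 0.4828444894860793,
}

promote_on_mae = should_promote(new_model, current_model)
promote_on_r2 = should_promote(new_model, current_model, metric="r2_score", lower_is_better=False)
print(promote_on_mae, promote_on_r2)
```

With these numbers the new model has the higher error and the lower R2, so neither comparison favours promotion unless a debug flag overrides the result.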

In [9]:
from baselayer_azure_ml import AutoMLFactory
p.dev_test_prod = "dev" # Current env, new unregistered model A to validate
target_env = "test" # Target env. Existing registered model B - Does Model A score better than Model B?

print("Example: If new model scores better in DEV, we can promote this to TEST")
print("But: If new model we trained was DEV workspace, we can register it as a new version in DEV (same workspace),or in TEST subscription/workpace directly")

promote, m1_name, r1_id, m2_name, r2_run_id = AutoMLFactory(p).compare_scoring_current_vs_new_model(target_env)

print("Promote model?  {}".format(promote))
print("New Model: {} in environment {}".format(m1_name, p.dev_test_prod))
print("Existing Model: {} in environment {}".format(m2_name,target_env))

# A model can only be registered in the same workspace (test->test); going from dev->test requires retraining
if promote and p.dev_test_prod == target_env:
    AutoMLFactory(p).register_active_model(target_env)


Example: If new model scores better in DEV, we can promote this to TEST
But: If new model we trained was DEV workspace, we can register it as a new version in DEV (same workspace),or in TEST subscription/workpace directly
Compare model version in DEV with latest registered in TEST subscription/workspace
MAPE (Mean average Percentage Error): 37.36539689710541
MAE (normalized_mean_absolute_error): 0.1641754076941266
R2 (r2_score): 0.46617256831419573
Spearman (spearman_correlation): 0.6726695514799195
target_best_run_id AutoML_15923484-6857-4dab-a9ae-dd755e0e1538
MAPE (Mean average Percentage Error): 35.06257640918979
MAE (normalized_mean_absolute_error): 0.15358817665517124
R2 (r2_score): 0.4828444894860793
Spearman (spearman_correlation): 0.7233493484272395
Current Production model normalized mean mse: 0.15358817665517124, New trained model mse: 0.1641754076941266

OBS! 'debug_always_promote_model' config-flag active - new model will probably always perform better, added +10 to all err

# START 2) TEST env - `register a model` starting "offline", without an active training run

### Alt 1) No ESMLProject dependency

In [10]:
from azureml.core import Workspace
sys.path.append(os.path.abspath("../common/"))  # NOQA: E402
from baselayer_azure_ml import AutoMLFactory

ws = p.get_workspace_from_config()
AutoMLFactory().register_active_model_in_ws(ws,"dev")

model.version 4
Model name AutoMLa4b60322a0 is registered.


Model(workspace=Workspace.create(name='msft-weu-DEV-eap-proj02_ai-amls', subscription_id='ca0a8c40-b06a-4e4e-8434-63c03a1dee34', resource_group='MSFT-WEU-EAP_PROJECT02_AI-DEV-RG'), name=AutoMLa4b60322a0, id=AutoMLa4b60322a0:4, version=4, tags={'run_id': 'AutoML_a4b60322-a808-4aa6-b5c8-4c5da22a4802', 'model_name': 'AutoMLa4b60322a0', 'trained_in_environment': 'dev', 'trained_in_workspace': 'msft-weu-DEV-eap-proj02_ai-amls'}, properties={})

### Alt 2) ESMLProject dependency: `ENVIRONMENT Self aware` and `config aware`
 - More `Future proof`: for features such as being able to register a trained model in TARGET (from TEST to PROD) without retraining

In [11]:
sys.path.append(os.path.abspath("../common/"))  # NOQA: E402
from esml import ESMLDataset, ESMLProject
from baselayer_azure_ml import AutoMLFactory
from azureml.core import Workspace

ws = p.get_workspace_from_config()

p = ESMLProject() # Makes it "environment aware (dev,test,prod)", and "configuration aware"
p.init(ws) 
p.dev_test_prod = "dev"
# ....train model....

model = AutoMLFactory(p).register_active_model(p.dev_test_prod)

...
Using GEN2 as Datastore
ds01_diabetes
ds02_other

####### Automap & Autoregister - SUCCESS!
1) Auto mapped 2 ESML Dataset with registered Azure ML Datasets (potentially all 3: IN,BRONZE, SILVER) in Datastore project002lake 

Dataset 'ds01_diabetes' status:
 - IN_Folder_has_files
 - BRONZE_Folder_has_files
 - SILVER_Folder_has_files
Dataset 'ds02_other' status:
 - IN_Folder_has_files
 - BRONZE_Folder_has_files
 - SILVER_Folder_has_files

2) Registered each Dataset with suffixes (_IN_CSV, _BRONZE, _SILVER) 
 Tip: Use ESMLProject.Datasets list or .DatasetByName(myDatasetName) to read/write
#######
model.version 5
Model name AutoMLa4b60322a0 is registered.


Model(workspace=Workspace.create(name='msft-weu-DEV-eap-proj02_ai-amls', subscription_id='ca0a8c40-b06a-4e4e-8434-63c03a1dee34', resource_group='MSFT-WEU-EAP_PROJECT02_AI-DEV-RG'), name=AutoMLa4b60322a0, id=AutoMLa4b60322a0:5, version=5, tags={'run_id': 'AutoML_a4b60322-a808-4aa6-b5c8-4c5da22a4802', 'model_name': 'AutoMLa4b60322a0', 'trained_in_environment': 'dev', 'trained_in_workspace': 'msft-weu-DEV-eap-proj02_ai-amls'}, properties={})

### ...Model compared, promoted, registered - ready for deployment

## 4b) ESML Loadtesting performance
- Using the `GOLD_TEST` TestSet with AutoML to see which algorithm is fastest and has the smallest size footprint
- For `DEV`, `TEST` or `PROD` environment
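
A load test of this kind boils down to measuring prediction latency and serialized size per candidate. The sketch below uses only the standard library and two toy stand-in models; it does not call ESML or AutoML, and `benchmark`, `linear_predict`, and `nearest_neighbour_predict` are hypothetical names introduced for illustration.

```python
import pickle
import random
import time


def linear_predict(params, row):
    """Toy linear model: intercept plus weighted sum of the features."""
    return params["intercept"] + sum(w * x for w, x in zip(params["weights"], row))


def nearest_neighbour_predict(params, row):
    """Toy 1-nearest-neighbour model: return the target of the closest stored point."""
    closest = min(params["points"],
                  key=lambda point: sum((a - b) ** 2 for a, b in zip(point[0], row)))
    return closest[1]


def benchmark(candidates, rows, repeats=5):
    """Measure mean seconds per predicted row and pickled parameter size."""
    results = {}
    for name, (params, predict) in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            for row in rows:
                predict(params, row)
        elapsed = time.perf_counter() - start
        results[name] = {
            "seconds_per_row": elapsed / (repeats * len(rows)),
            "size_bytes": len(pickle.dumps(params)),
        }
    return results


rng = random.Random(0)
n_features = 10
test_rows = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(50)]
candidates = {
    "linear": ({"intercept": 150.0, "weights": [1.0] * n_features}, linear_predict),
    "nearest_neighbour": (
        {"points": [([rng.uniform(-0.1, 0.1) for _ in range(n_features)], rng.uniform(25, 350))
                    for _ in range(200)]},
        nearest_neighbour_predict,
    ),
}

results = benchmark(candidates, test_rows)
for name, stats in results.items():
    print(name, stats)
```

Timings vary by machine, so only the relative ordering of candidates is meaningful; the size figure covers the pickled parameters only, not a full deployed image.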

In [12]:
label = "Y"
train, validate, test = p.split_gold_3() # Save as M03_GOLD_TRAIN | M03_GOLD_VALIDATE | M03_GOLD_TEST  # Alt: train_data, test_data = p.Gold.random_split(percentage=0.8, seed=223) 
test.head()

Unnamed: 0,AGE,SEX,BMI,BP,S1,S2,S3,S4,S5,S6,Y
207,0.01,-0.04,0.05,0.03,0.01,-0.01,0.03,-0.04,0.05,0.04,202.0
212,0.07,-0.04,0.0,0.04,0.05,0.03,0.07,-0.04,-0.0,0.02,73.0
295,-0.05,0.05,0.04,-0.04,-0.01,-0.01,0.01,-0.04,0.02,0.0,85.0
403,-0.02,-0.04,0.1,-0.01,-0.01,-0.02,-0.02,-0.0,0.06,0.04,275.0
251,-0.05,0.05,0.1,0.09,0.06,0.05,-0.06,0.11,0.08,0.04,243.0


## 5a) ESML Deploy ONLINE, to AKS -> See other notebook
- Deploy "offline" from old `AutoML run` for `DEV` environment
- To →  `DEV`, `TEST` or `PROD` environment

GOTO Notebook [`esml_howto_3_compare_and_deploy`](./esml_howto_3_compare_and_deploy)

## 5b) ESML `Deploy BATCH` pipeline
- Deploy same model "offline / previous" `AutoML Run` for `DEV` environment
- To →  `DEV`, `TEST` or `PROD` environment
