![](../azure-enterprise-scale-ml/esml/images/split_gold_and_train_automl_small.png)


# ESML - `AutoMLFactory` and `ComputeFactory`

## PROJECT + DATA CONCEPTS + ENTERPRISE Datalake Design + DEV->PROD MLOps
- `1) ESML Project`: The ONLY thing you need to remember is your `Project number` (and the `BRONZE, SILVER, GOLD` concept)
   - ...read the earlier notebook
## ENTERPRISE Deployment of Models & Governance - MLOps  at scale
- `3) DEV->TEST->PROD` (configs, compute, performance)
    - ESML has config for 3 environments: easily DEPLOY a model across subscriptions and Azure ML Studio workspaces
        - Save costs & time: 
            - `DEV` has cheaper compute performance for TRAIN and INFERENCE (batch, AKS)
            - `DEV` has Quick-debug ML training (fast training...VS good scoring in TEST and PROD)
        - How? ESML `AutoMLFactory` and `ComputeFactory`
        - Where to config these?
            - settings/dev_test_prod/`dev_test_prod_settings.json`
            - settings/dev_test_prod/`train/*/automl/*`
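
To make the idea of per-environment settings concrete, here is a minimal sketch of a lookup keyed on `dev`, `test` and `prod`. The dictionary keys (`vm_size`, `max_nodes`, `n_cross_validations`), their values, and the helper `settings_for` are all hypothetical illustrations; the real ESML settings files may be structured differently.

```python
# Hypothetical per-environment settings; NOT the actual ESML file layout.
ENV_SETTINGS = {
    "dev":  {"vm_size": "small",  "max_nodes": 1, "n_cross_validations": 2},
    "test": {"vm_size": "medium", "max_nodes": 2, "n_cross_validations": 5},
    "prod": {"vm_size": "large",  "max_nodes": 4, "n_cross_validations": 5},
}


def settings_for(env: str) -> dict:
    """Return a copy of the (hypothetical) settings for 'dev', 'test' or 'prod'."""
    key = env.lower()
    if key not in ENV_SETTINGS:
        raise ValueError(f"Unknown environment: {env!r}")
    return dict(ENV_SETTINGS[key])


dev = settings_for("dev")
prod = settings_for("PROD")  # lookup is case-insensitive
print(dev["max_nodes"], prod["max_nodes"])
```

The point of the sketch is only that `DEV` can be given cheaper compute and a lighter training configuration than `TEST` and `PROD`, while the calling code stays the same.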

## SAVE COST in R&D MODE
- ESML has an R&D mode: set `p.rnd = True`
    - Versioning on datasets is turned off (saves storage and avoids clutter in Azure)
    - Save cost & time while debugging and building your codebase:
        - In R&D mode the IN data is filtered to use only 20% of the data.
        - Meaning: you can use a SMALL compute, since there is less data
        - ...when your codebase is ready for the real TRAINING, switch COMPUTE and turn off R&D mode: `p.rnd = False`
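
As a rough illustration of what a 20% R&D filter could look like, the sketch below samples a pandas dataframe when a flag is set. The helper `apply_rnd_filter` and its parameters are hypothetical; how ESML actually filters the IN data is not shown in this notebook and may differ (for example, it may not sample randomly).

```python
import pandas as pd


def apply_rnd_filter(df: pd.DataFrame, rnd: bool,
                     fraction: float = 0.2, seed: int = 42) -> pd.DataFrame:
    """Hypothetical helper: keep only `fraction` of the rows when R&D mode is on."""
    if not rnd:
        return df  # full data for the real training run
    return df.sample(frac=fraction, random_state=seed)


df_in = pd.DataFrame({"x": range(100), "Y": range(100)})
df_small = apply_rnd_filter(df_in, rnd=True)
df_full = apply_rnd_filter(df_in, rnd=False)
print(len(df_small), len(df_full))
```

With less data flowing through every step, a small compute target is enough while the codebase is still being debugged.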

---
**Q: How do you work with different environments?**
- A: *Train in TEST instead of DEV, like below:*
```python
print(p.dev_test_prod)
ws_test = p.get_other_workspace(p.dev_test_prod)
datastore = p.init(ws_test)
```


In [None]:
import os
import sys
sys.path.append(os.path.abspath("../azure-enterprise-scale-ml/esml/common/"))  # NOQA: E402
from esml import ESMLDataset, ESMLProject
p = ESMLProject()  # Self-booting config. Alt: p = ESMLProject(esml_settings, env_settings, security_settings)  # read from config
p.describe()

# Azure ML Studio Workspace
- ESML will `Automap` and `Autoregister` Azure ML Datasets as: `IN, SILVER, BRONZE, GOLD`

In [None]:
import azureml.core 
from azureml.core import Workspace
ws = p.get_workspace_from_config()
print("SDK version:", azureml.core.VERSION)

In [None]:
p.unregister_all_datasets(ws) # For DEMO purpose
p.init(ws)

# ESML `GOLD` Dataset

In [None]:
ds_01 = p.DatasetByName("ds01_diabetes")
print(ds_01.InData.name)
print(ds_01.Bronze.name)
print(ds_01.Silver.name)
#print(p.Gold.name)

In [None]:
p.dataset_list[1].Name

In [None]:
df_01 = ds_01.Silver.to_pandas_dataframe()

ds_02 = p.DatasetByName("ds02_other")  # Do not reassign ds_01 here
df_02 = ds_02.Silver.to_pandas_dataframe()
df_gold1_join = df_01.join(df_02) # left join -> NULL on df_02
print("Diabetes shape: ", df_01.shape)
print(df_gold1_join.shape)
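
The join above relies on the default behavior of pandas `DataFrame.join`: a left join aligned on the index. The small sketch below, with made-up frames, shows why rows missing from the right-hand frame end up as `NaN`.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"b": [10.0, 20.0]}, index=[0, 1])

# how="left" is the default; rows are matched on the index, not on a column
joined = left.join(right)

print(joined.shape)
print(joined["b"].isna().sum())  # rows in `left` without a match in `right`
```

Note that `join` raises a `ValueError` if both frames share column names and no `lsuffix`/`rsuffix` is given.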

In [None]:
ds_gold_v1 = p.save_gold(df_01)

# Look at `GOLD` vLatest

In [None]:
import pandas as pd 
df = p.Gold.to_pandas_dataframe()
df.head()

In [None]:
print(p.rnd)

In [None]:
label = "Y"
train_6, validate_set_2, test_set_2 = p.split_gold_3(0.6, label)
print(" Q:  But...Why add LABEL info when splitting for TRAIN?")

In [None]:
print("...This is why:")
X_test, y_test, tags = p.get_gold_validate_Xy() # Version is default latest
print(tags)
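
To show why the label is useful at split time, here is a minimal pandas sketch of a three-way split followed by an `X`/`y` separation. The helpers `split_three_ways` and `to_Xy` are hypothetical, and the assumption that the remainder after the train fraction is divided evenly between validate and test is inferred from the variable names above, not confirmed by ESML.

```python
import pandas as pd


def split_three_ways(df: pd.DataFrame, train_fraction: float = 0.6, seed: int = 42):
    """Hypothetical helper: shuffle, then cut into train / validate / test."""
    shuffled = df.sample(frac=1.0, random_state=seed)
    n_train = int(round(len(shuffled) * train_fraction))
    n_validate = (len(shuffled) - n_train) // 2  # assumption: even split of the rest
    train = shuffled.iloc[:n_train]
    validate = shuffled.iloc[n_train:n_train + n_validate]
    test = shuffled.iloc[n_train + n_validate:]
    return train, validate, test


def to_Xy(df: pd.DataFrame, label: str):
    """Separate the feature columns (X) from the label column (y)."""
    return df.drop(columns=[label]), df[label]


gold = pd.DataFrame({"age": range(10), "bmi": range(10, 20), "Y": range(100, 110)})
train, validate, test = split_three_ways(gold, 0.6)
X_val, y_val = to_Xy(validate, "Y")
print(len(train), len(validate), len(test))
print(list(X_val.columns), y_val.name)
```

Because the label is known when the split is made, the validate set can later be returned directly as features and target, which is what the scoring step needs.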

## 3B) ESML TRAIN model - via PIPELINE (reuse, call via REST, etc..)
- `AutoMLFactory, ComputeFactory`
- Get `Train COMPUTE` for `X` environment
- Get `Train Hyperparameters` for `X` environment (fewer cross-validations in DEV etc.)
- Splits GOLD into versioned `train, validate, test` sets, `and registers them` as Azure ML datasets

In [None]:
sys.path.append(os.path.abspath("../common/"))  # NOQA: E402
from azureml.train.automl import AutoMLConfig
from baselayer_azure_ml import AutoMLFactory, ComputeFactory, azure_metric_regression, azure_metric_classification

In [None]:
automl_performance_config = p.get_automl_performance_config()
aml_compute = p.get_training_aml_compute(ws)

label = "Y" 
# Automatically registers the dataframes in Azure as M03_GOLD_TRAIN | M03_GOLD_VALIDATE | M03_GOLD_TEST
# Alt: train, test = p.Gold.random_split(percentage=0.8, seed=23)
train_6, validate_set_2, test_set_2 = p.split_gold_3(0.6, label)

automl_config = AutoMLConfig(task = 'regression',
                             compute_target = aml_compute,
                             primary_metric = azure_metric_regression.MAE,  # Other options: normalized_mean_absolute_error, normalized_root_mean_squared_error, spearman_correlation, r2_score
                             experiment_exit_score = 0.208,  # DEMO purpose. Passed as a float, not a string
                             training_data = p.GoldTrain,  # The Azure ML Dataset representation of the 'train_6' pandas dataframe
                             label_column_name = label,
                             **automl_performance_config
                            )
best_run, fitted_model, exp = AutoMLFactory(p).train_pipeline(automl_config)

## 3A) ESML TRAIN model
- `AutoMLFactory, ComputeFactory`
- Get `Train COMPUTE` for `X` environment
- Get `Train Hyperparameters` for `X` environment (fewer cross-validations in DEV etc.)
- Splits GOLD into versioned `train, validate, test` sets, `and registers them` as Azure ML datasets

In [None]:
automl_performance_config = p.get_automl_performance_config()
aml_compute = p.get_training_aml_compute(ws)

label = "Y" 
# Automatically registers the dataframes in Azure as M03_GOLD_TRAIN | M03_GOLD_VALIDATE | M03_GOLD_TEST
# Alt: train, test = p.Gold.random_split(percentage=0.8, seed=23)
train_6, validate_set_2, test_set_2 = p.split_gold_3(0.6, label)

automl_config = AutoMLConfig(task = 'regression',
                             compute_target = aml_compute,
                             primary_metric = azure_metric_regression.RMSE,
                             training_data = p.GoldTrain,  # The 'train_6' pandas dataframe, but as an Azure ML Dataset
                             experiment_exit_score = 0.208,  # DEMO purpose. Passed as a float, not a string
                             label_column_name = label,
                             **automl_performance_config
                            )

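via_pipeline = False  # True: train via pipeline. False: train as a regular run
factory = AutoMLFactory(p)
if via_pipeline:
    best_run, fitted_model, experiment = factory.train_pipeline(automl_config)
else:
    best_run, fitted_model, experiment = factory.train_as_run(automl_config)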

## 4a) ESML Scoring compare: Promote model or not? Register
- `IF` the newly trained model in the `current` environment scores BETTER than the existing model in the `target` environment, then the `new model` can be registered and promoted.
- `ValidationSet` comparison against an offline/previous `AutoML run` for the `DEV` environment
- For `DEV`, `TEST` or `PROD` environment
- Future roadmap: Also include `TestSet SCORING` comparison
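
The promotion rule above can be sketched as a simple score comparison. The function `should_promote`, the metric names, and the choice to promote when no model exists yet are assumptions made for illustration; they are not the ESML implementation behind `compare_scoring_current_vs_new_model`.

```python
from typing import Optional

# Error metrics where a LOWER value means a better model (illustrative set)
LOWER_IS_BETTER = {"mae", "rmse"}


def should_promote(new_score: float, existing_score: Optional[float], metric: str) -> bool:
    """Hypothetical rule: promote only if the new model beats the existing one."""
    if existing_score is None:
        return True  # assumption: nothing registered in the target environment yet
    if metric.lower() in LOWER_IS_BETTER:
        return new_score < existing_score
    return new_score > existing_score  # e.g. r2_score: higher is better


print(should_promote(0.19, 0.21, "MAE"))
print(should_promote(0.80, 0.85, "r2_score"))
```

The direction of the comparison matters: for error metrics such as MAE and RMSE a lower value wins, while for metrics such as R2 a higher value wins.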

In [None]:
from baselayer_azure_ml import AutoMLFactory
p.dev_test_prod = "dev" # Current env, new unregistered model A to validate
target_env = "dev" # Target env. Existing registered model B - Does Model A score better than Model B?

print("If the new model scores better, we can register it in DEV/TEST/PROD")
print("If the new model was trained in the DEV workspace, we can register it in DEV - or in the TEST subscription/workspace.")

promote, m1_name, r1_id, m2_name, r2_run_id = AutoMLFactory(p).compare_scoring_current_vs_new_model(target_env)

print("Promote model?  {}".format(promote))
print("New Model 1: {}".format(m1_name))
print("Existing Model: {} in environment {}".format(m2_name,target_env))

if (promote and p.dev_test_prod == target_env ): # Can only register a model in same workspace (test->test) - need to retrain if going from dev->test
    AutoMLFactory(p).register_active_model(target_env)


### ..Model compared, promoted, registered - ready for deployment

## 4b) ESML Loadtesting performance & Cost estimation
- Using ESML GOLD_TEST Dataset for AutoML to see which algorithm is fastest and has the smallest size footprint
- Using ESML GOLD_SCORING Dataset, to see `COST` of a `Training run`
- ...For different environments: `DEV`, `TEST` or `PROD` environment

GOTO Notebook [`esml_howto_5_load_test_and_predict_cost`](./esml_howto_5_load_test_and_predict_cost.ipynb)

## 5a) ESML Deploy ONLINE, to AKS
- Deploy "offline" from old `AutoML run` for `DEV` environment
- To →  `DEV`, `TEST` or `PROD` environment


GOTO Notebook [`esml_howto_3_deploy_score.ipynb`](./esml_howto_3_deploy_score.ipynb)

## 5b) ESML `Deploy BATCH` pipeline
- Deploy same model "offline / previous" `AutoML Run` for `DEV` environment
- To →  `DEV`, `TEST` or `PROD` environment
