# PRODUCTION phase: About this notebook
- Purpose: Creates 1 PIPELINE to serve the model.
    - `Batch scoring pipeline:` Fetches the best trained model and BUILDs an `Azure Machine Learning pipeline` that batch scores data on a schedule or when triggered

## DETAILS - about this notebook and the generated pipeline
- 1) Initiate ESMLPipelineFactory:
- 2) `AUTO-GENERATE code: a snapshot folder` via ESML, which generates Python scripts and the `ESML runtime`
    - azure-enterprise-scale-ml\2_A_aml_pipeline\4_inference\batch\\`M11`
        - Edit the feature engineering files if needed
            - azure-enterprise-scale-ml\2_A_aml_pipeline\4_inference\batch\\`M11\your_code\your_custom_code.py`
            - `your_custom_code.py` is referenced from all the `in_2_silver_...` files, such as 2_A_aml_pipeline\4_inference\batch\M11\\`in2silver_ds01_diabetes.py`, and from `silver_merged_2_gold`
- 3) `BUILDS the pipeline` of type `IN_2_GOLD_SCORING`
    - `An Azure Machine Learning pipeline` with steps will be auto-generated by ESML, based on the dataset array in your `lake_settings.json`.
    - 3b) BUILDS a `batch scoring pipeline` of ESML type `IN_2_GOLD_SCORING`
- 4) `EXECUTES the pipeline` (for smoke testing: verify that it runs end-to-end)
    - 4b) Batch scoring pipeline (`IN_2_GOLD_SCORING`)
        - Feature engineering of each input dataset via an `IN_2_SILVER` step (sample data is required here, otherwise a `StreamAccessException` is raised)
        - Merges all SILVERS to `GOLD`
        - Score data: fetches the best trained (leading) model and scores the data with it
        - Saves the scored data to the datalake, and writes metadata about WHAT data was scored, WHEN it was scored, and WHICH model_version was used.
- 5) PUBLISH the pipeline
    - Purpose: Now that the pipeline is `smoke tested`, we can publish it to get a `pipeline_id to use in Azure Data factory`
    - Also PRINT the pipeline ID after publishing
- DONE.
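The data flow of the batch scoring pipeline in step 4b can be sketched in plain Python. This is a toy model of the flow only, not ESML's generated code: the function names mirror the generated files (`in_2_silver`, `silver_merged_2_gold`), but their bodies, the column names, the dummy linear "model", the second dataset and the metadata keys are all hypothetical.

```python
from datetime import datetime, timezone

def in_2_silver(rows):
    """Hypothetical feature engineering for one input dataset:
    drop rows with missing values and add a derived feature."""
    silver = []
    for row in rows:
        if None in row.values():
            continue
        silver.append(dict(row, bmi_x_age=row["bmi"] * row["age"]))
    return silver

def silver_merged_2_gold(silvers):
    """Merge all SILVER datasets into one GOLD dataset.
    Here simply concatenated; a real merge may join on keys instead."""
    gold = []
    for silver in silvers:
        gold.extend(silver)
    return gold

def score_gold(gold, model, model_version, dataset_names):
    """Score GOLD and record WHAT was scored, WHEN, and WHICH model_version."""
    scored = [dict(row, prediction=model(row)) for row in gold]
    metadata = {
        "datasets": list(dataset_names),
        "rows_scored": len(scored),
        "scored_at_utc": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
    }
    return scored, metadata

def toy_model(row):
    # Dummy linear "model" standing in for the leading registered model
    return 0.5 * row["bmi"] + 0.1 * row["age"]

# Toy input data per dataset (hypothetical values, for illustration only)
inputs = {
    "ds01_diabetes": [{"age": 50, "bmi": 30.0}, {"age": 40, "bmi": None}],
    "ds02_extra": [{"age": 60, "bmi": 25.0}],
}
silvers = [in_2_silver(rows) for rows in inputs.values()]
gold = silver_merged_2_gold(silvers)
scored, metadata = score_gold(gold, toy_model, model_version=1, dataset_names=inputs)
print(metadata["rows_scored"], metadata["model_version"])  # → 2 1
```

The row with a missing `bmi` is dropped in the SILVER step, so only two rows reach GOLD and get scored.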
    

Note: This notebook is called: `M11_v143_esml_regression_batch_scoring.ipynb` in the notebook_templates folder
 

## 1) Initiate ESMLPipelineFactory (Always run this CELL below)
- To attach the ESML control plane to your project
- To point at `template-data`, so the pipeline knows the schema of the data.
    - NB! Azure Machine Learning pipelines need sample data. You need to have sample data under the datalake folder structure:
    - `1` is recommended for the `model_version` folder
    - `1000-01-01 00:00:00.243860` is recommended for the `date_folder`
    - Example: project002/11_diabetes_model_reg/inference/`1`/ds01_diabetes/in/dev/`1000/01/01/`
- To init the ESMLPipelineFactory
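As a sketch, the sample-data folder for a dataset can be derived from the recommended `model_version` and `date_folder` values. The helper `sample_data_folder` and its parameters are hypothetical, inferred from the example path above; ESML resolves these paths itself.

```python
from datetime import datetime

def sample_data_folder(project, model_folder, dataset, environment="dev",
                       model_version=1, date_folder="1000-01-01 00:00:00.243860"):
    """Build a lake path shaped like the example above (hypothetical helper)."""
    # date_folder is 'YYYY-MM-DD HH:MM:SS.ffffff'; only the date part is used
    # in the folder structure, as YYYY/MM/DD.
    dt = datetime.strptime(date_folder, "%Y-%m-%d %H:%M:%S.%f")
    date_path = f"{dt.year:04d}/{dt.month:02d}/{dt.day:02d}"
    return (f"{project}/{model_folder}/inference/{model_version}/"
            f"{dataset}/in/{environment}/{date_path}/")

print(sample_data_folder("project002", "11_diabetes_model_reg", "ds01_diabetes"))
# → project002/11_diabetes_model_reg/inference/1/ds01_diabetes/in/dev/1000/01/01/
```

In this sketch, a time part such as `10:35:01.243860` maps to the same `1000/01/01/` folder, since only the date is used.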

In [None]:
import sys
sys.path.insert(0, "../azure-enterprise-scale-ml/esml/common/")
from esml import ESMLProject
from baselayer_azure_ml_pipeline import ESMLPipelineFactory, esml_pipeline_types

p = ESMLProject()
p.inference_mode = True
p.active_model = 11 # 10=titanic , 11=Diabetes
p_factory = ESMLPipelineFactory(p)

# Azure machine learning pipelines need sample data to know the schema
model_version = 0
p_factory.batch_pipeline_parameters[0].default_value = model_version
training_datefolder = '1000-01-01 10:35:01.243860'
p_factory.batch_pipeline_parameters[1].default_value = training_datefolder # overrides ESMLProject.date_scoring_folder.
p_factory.describe()


# 2) `AUTO-GENERATE code: a snapshot folder`

In [None]:
## Generate CODE, then edit it to get the correct environments
p_factory.create_dataset_scripts_from_template(overwrite_if_exists=True) # Run once, then edit the scripts manually. Rerunning with True overwrites your edits; overwrite_if_exists=False is the DEFAULT
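The `overwrite_if_exists` flag matters once you have edited the generated scripts. A minimal sketch of write-only-if-missing behavior, assuming the flag works like a typical code generator's (the helper `write_generated_script` is hypothetical, not the ESML implementation):

```python
import tempfile
from pathlib import Path

def write_generated_script(path: Path, content: str, overwrite_if_exists: bool = False) -> bool:
    """Write a generated script; return True if the file was written."""
    if path.exists() and not overwrite_if_exists:
        return False  # keep the manually edited file untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "M11" / "your_code" / "your_custom_code.py"
    first = write_generated_script(script, "# template\n")
    script.write_text("# my manual edits\n")                  # simulate a manual edit
    kept = write_generated_script(script, "# template\n")     # default: keeps the edit
    overwritten = write_generated_script(script, "# template\n", overwrite_if_exists=True)
    final_text = script.read_text()

print(first, kept, overwritten, final_text.strip())
# → True False True # template
```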

# 3) `BUILD the pipeline`

Take note of the `esml_pipeline_types` below, of type: esml_pipeline_types.`IN_2_GOLD_SCORING`

In [None]:
## BUILD
batch_pipeline = p_factory.create_batch_pipeline(esml_pipeline_types.IN_2_GOLD_SCORING) # Note the esml_pipeline_types

# 4) `Execute the pipeline (smoke testing)`

In [None]:
## RUN for smoke testing purpose, to see that it works during runtime
pipeline_run = p_factory.execute_pipeline(batch_pipeline) # Tip: the most common error, "StreamAccessException", comes from pointing at the wrong sample-data folder
pipeline_run.wait_for_completion(show_output=False)
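To make the smoke test fail loudly, you can check the run's final status after it completes. A minimal sketch, assuming the Azure ML SDK v1 `Run.get_status()` method and that a successful pipeline run ends in `"Finished"` (or `"Completed"`); the helper `check_smoke_test` is hypothetical:

```python
SUCCESS_STATUSES = {"Finished", "Completed"}  # assumption: terminal success states

def check_smoke_test(status: str) -> None:
    """Raise if the smoke-test run did not end successfully."""
    if status in SUCCESS_STATUSES:
        return
    raise RuntimeError(
        f"Smoke test failed with status {status!r}. "
        "If a step raised StreamAccessException, check that sample data "
        "exists in the expected lake folder (model_version and date_folder)."
    )

# Usage in the notebook (requires the Azure ML SDK):
# check_smoke_test(pipeline_run.get_status())
check_smoke_test("Finished")  # passes silently
```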

# 5) PUBLISH the BATCH SCORING pipeline & PRINT its ID

In [None]:
# PUBLISH
published_pipeline, endpoint = p_factory.publish_pipeline(batch_pipeline, "_1") # "_1" is an optional suffix: it creates a NEW pipeline with no run history, instead of ADDING a version to the existing pipeline & endpoint
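The comment on the `"_1"` suffix describes two publishing behaviors. The toy registry below models them conceptually: publishing under the same name adds a version to the existing endpoint, while a new suffix creates a fresh endpoint with no history. `ToyPipelineRegistry` and the name `M11_batch_scoring` are hypothetical stand-ins, not the Azure ML or ESML API.

```python
from dataclasses import dataclass, field

@dataclass
class ToyEndpoint:
    name: str
    pipeline_versions: list = field(default_factory=list)

class ToyPipelineRegistry:
    """Conceptual model of publishing; not the Azure ML API."""

    def __init__(self):
        self.endpoints = {}

    def publish(self, base_name: str, pipeline_id: str, suffix: str = "") -> ToyEndpoint:
        name = base_name + suffix
        # Same name: reuse the endpoint and ADD a version to its history.
        # New name (e.g. with "_1"): create a NEW endpoint with no history.
        endpoint = self.endpoints.setdefault(name, ToyEndpoint(name))
        endpoint.pipeline_versions.append(pipeline_id)
        return endpoint

registry = ToyPipelineRegistry()
registry.publish("M11_batch_scoring", "pipe-a")
existing = registry.publish("M11_batch_scoring", "pipe-b")
fresh = registry.publish("M11_batch_scoring", "pipe-c", suffix="_1")
print(existing.name, len(existing.pipeline_versions))  # → M11_batch_scoring 2
print(fresh.name, len(fresh.pipeline_versions))        # → M11_batch_scoring_1 1
```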

# PRINT: Get info to use in Azure Data Factory
- `published_pipeline.id` (if private Azure ML workspace)

In [None]:
print("2) Fetch scored data: the values below are needed for the Azure Data Factory PIPELINE activity (Pipeline OR Endpoint; prefer the Endpoint)")
print("- Endpoint ID")
print("Endpoint ID:  {}".format(endpoint.id))
print("Endpoint Name:  {}".format(endpoint.name))
print("Experiment name:  {}".format(p_factory.experiment_name))

print("In AZURE DATA FACTORY - This is the ID you need if using PRIVATE LINK (a private Azure ML workspace):")
print("- You need the PIPELINE ID, not the pipeline ENDPOINT ID (endpoints cannot be chosen in Azure Data Factory with a private Azure ML workspace)")
published_pipeline.id
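The rule printed above (prefer the endpoint, but use the pipeline ID with a private Azure ML workspace) can be captured in a small helper. `adf_pipeline_reference` and its `(kind, id)` return format are hypothetical; you still enter the chosen ID in the Azure Data Factory pipeline activity yourself.

```python
def adf_pipeline_reference(private_workspace: bool, pipeline_id: str, endpoint_id: str) -> tuple:
    """Return (kind, id) to configure in the Azure Data Factory activity."""
    if private_workspace:
        # Pipeline endpoints cannot be chosen in ADF for a private (Private Link) workspace
        return ("pipeline_id", pipeline_id)
    # Otherwise prefer the endpoint, which can be kept while new pipeline versions are added
    return ("endpoint_id", endpoint_id)

# Usage in the notebook:
# kind, value = adf_pipeline_reference(True, published_pipeline.id, endpoint.id)
print(adf_pipeline_reference(True, "pipe-123", "ep-456"))   # → ('pipeline_id', 'pipe-123')
print(adf_pipeline_reference(False, "pipe-123", "ep-456"))  # → ('endpoint_id', 'ep-456')
```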

# DONE! Next steps

- Q: `Next step in the PRODUCTION phase after the 2a and 3a or 3b notebooks are done?`

1) Go to your ESMLProject's `Azure Data Factory`, and use the `ESML DataOps templates` (Azure Data Factory templates) for `IN_2_GOLD_SCORING`
    - azure-enterprise-scale-ml\copy_my_subfolders_to_my_grandparent\adf\v1_3\PROJECT000\LakeOnly\`STEP03_IN_2_GOLD_SCORING.zip`
2) Go to the next notebooks in the `mlops` folder, to set up `CI/CD` in Azure DevOps
    - Import this into Azure DevOps:
        azure-enterprise-scale-ml\copy_my_subfolders_to_my_grandparent\mlops\01_template_v14\azure-devops-build-pipeline-to-import\\`ESML-v14-project002_M11-DevTest.json`
    - Change the Azure DevOps `VARIABLES` for service principal, tenant, etc.
    - Change the parameters in the `inline Azure CLI script` to the model you want to work with, and the data you want to train with or score.
        - File: `31-deploy_and_smoketest_batch_scoring.py`
        - INLINE code: `--esml_model_number 11 --esml_date_utc "1000-01-01 10:35:01.243860"`
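For reference, the INLINE arguments above could be parsed as follows. This is a sketch assuming the script uses `argparse`; it is not the actual code of `31-deploy_and_smoketest_batch_scoring.py`, and `parse_inline_args` is hypothetical. `shlex.split` keeps the quoted date (which contains a space) as a single argument, mirroring how a shell passes it.

```python
import argparse
import shlex
from datetime import datetime

def parse_inline_args(inline: str) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="ESML batch scoring smoke test (sketch)")
    parser.add_argument("--esml_model_number", type=int, required=True)
    parser.add_argument("--esml_date_utc", type=str, required=True)
    args = parser.parse_args(shlex.split(inline))
    # Validate the date folder format early, before any cloud call
    datetime.strptime(args.esml_date_utc, "%Y-%m-%d %H:%M:%S.%f")
    return args

args = parse_inline_args('--esml_model_number 11 --esml_date_utc "1000-01-01 10:35:01.243860"')
print(args.esml_model_number, args.esml_date_utc)
# → 11 1000-01-01 10:35:01.243860
```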