Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Develop your own Azure Machine Learning component using CLI & dsl.component

In this notebook, you learn how to create a simple machine learning module from scratch and use it in an ML pipeline.

* Init a module from an existing Python script with the module CLI;
* Run a local test (optional) to make sure the code works correctly;
* Register the module to your Machine Learning workspace.



## Prerequisites
* Install the Azure CLI with the azure-cli-ml extension following the [instructions here](https://github.com/Azure/DesignerPrivatePreviewFeatures/blob/sdkpreview/azureml-modules/samples/setup-environment.ipynb).


## Setup workspace

Log in to Azure with the CLI and set the default workspace using the `az ml folder attach` command.

After this operation, the workspace can be retrieved with `Workspace.from_config()` for SDK usage.
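
If `Workspace.from_config()` later fails to find a workspace, it can help to check where a config file is visible from the notebook's working directory. The helper below is a minimal sketch using only the standard library. It assumes the config is a file named `config.json`, stored either in a `.azureml` subfolder or directly in the current directory or one of its parents. The name `find_workspace_config` is hypothetical and is not part of the SDK or the CLI.

```python
from pathlib import Path
from typing import Optional


def find_workspace_config(start: str = ".") -> Optional[Path]:
    """Return the first config.json found in `start` or any parent directory.

    Assumption: the file sits either in a `.azureml` subfolder or directly
    in the directory being checked.
    """
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        candidates = (
            directory / ".azureml" / "config.json",
            directory / "config.json",
        )
        for candidate in candidates:
            if candidate.is_file():
                return candidate
    return None
```

Calling `find_workspace_config()` from the notebook folder returns the path of the closest matching file, or `None` when nothing is found.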

In [None]:
# NOTE: Update the following information with your environment

SUBSCRIPTION_ID = '<your subscription ID>'
WORKSPACE_NAME = '<your workspace name>'
RESOURCE_GROUP_NAME = '<your resource group>'

In [None]:
!az login -o none # The first time you use the az CLI, log in so that you can access your subscription and workspace
!az account set -s $SUBSCRIPTION_ID 
!az ml folder attach -w $WORKSPACE_NAME -g $RESOURCE_GROUP_NAME 

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()

## Init a module with an existing python script

We will init a module from an existing Python script with the AzureML CLI. The script is in [modules/creation/prepare_data](modules/creation/prepare_data). Print the script to see what's in it.

In [None]:
with open('modules/creation/prepare_data/prepare_data.py') as fin:
    print(fin.read())

Now init a dsl module with the `az ml module init` command. In the following cell, `--source` specifies the source script to initialize the dsl module from (the script must parse its command-line arguments with argparse), `--inputs` and `--outputs` name the argparse parameters of the source script that are inputs and outputs, `--source-dir` specifies the directory in which the initialized files are generated, and `--entry-only` makes the command generate only the entry script.

Run `az ml module init --help` to learn more about how to use this command.

In [1]:
!az ml module init --source prepare_data.py --inputs input_data --outputs output_data --source-dir modules/creation/prepare_data --entry-only

az: error: unrecognized arguments: --entry-only
usage: az [-h] [--verbose] [--debug] [--only-show-errors]
          [--output {json,jsonc,yaml,yamlc,table,tsv,none}] [--query JMESPATH]
          {ml} ...


Note: This tutorial runs the `az ml module init` command with the `--entry-only` argument, which means only the entry file is generated. Without `--entry-only`, the following files are generated to help with module development and testing.

|File name|Purpose|
| -----------| ----------- |
|prepare_data_entry.py|Module entry script wrapped by dsl module decorator, which defines the interface and main logic of the module|
|prepare_data.spec.yaml|Module spec in yaml format, use it to register the module|
|conda.yaml|YAML file that manages the conda environment. It's referenced by the module yaml spec|
|prepare_data_entry.test.ipynb| Test notebook for this module. It contains sample code on how to test the module|
|config.json|A blank AML workspace config.json file. Copy your AML workspace config here|
|./tests|The template test folder. It contains a template unit test script for the module.|
|./data|The placeholder data folder to put local test data.|
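
After running the init command without `--entry-only`, a quick way to see which of the items listed in the table are present is a small existence check. This is only a sketch: the names are copied from the table, while `EXPECTED_ITEMS` and `missing_items` are hypothetical helpers, not part of the CLI.

```python
from pathlib import Path

# Names taken from the table above; "tests" and "data" are folders.
EXPECTED_ITEMS = [
    "prepare_data_entry.py",
    "prepare_data.spec.yaml",
    "conda.yaml",
    "prepare_data_entry.test.ipynb",
    "config.json",
    "tests",
    "data",
]


def missing_items(source_dir, expected=EXPECTED_ITEMS):
    """Return the expected names that do not exist under source_dir."""
    root = Path(source_dir)
    return [name for name in expected if not (root / name).exists()]
```

For example, `missing_items('modules/creation/prepare_data')` returns the entries that were not generated. With `--entry-only`, most of them are expected to be missing.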


The `--source` option of `az ml module init` requires the source script to parse its command line using argparse. If your script does not use argparse, we suggest initializing a template dsl module (by not defining a source) and editing the module script manually.

Print the generated module entry file:
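
To make the argparse requirement concrete, here is a minimal sketch of a source script in that style. It is not the actual `prepare_data.py` from this sample: the copy logic and the helper names `build_parser`, `prepare` and `main` are hypothetical. Only the parameter names `input_data` and `output_data` come from the init command used in this notebook.

```python
import argparse
import shutil
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Declare one argparse parameter per module input and output."""
    parser = argparse.ArgumentParser(description="Illustrative data preparation script")
    parser.add_argument("--input_data", type=str, required=True,
                        help="Directory that holds the raw data")
    parser.add_argument("--output_data", type=str, required=True,
                        help="Directory that receives the prepared data")
    return parser


def prepare(input_dir: str, output_dir: str) -> int:
    """Copy every .csv file from input_dir to output_dir and return the count."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in sorted(Path(input_dir).glob("*.csv")):
        shutil.copy(src, out / src.name)
        copied += 1
    return copied


def main(argv=None) -> int:
    # argv=None makes argparse read the real command line (sys.argv[1:]).
    args = build_parser().parse_args(argv)
    return prepare(args.input_data, args.output_data)
```

In a real script, `main()` would be called under the usual `if __name__ == "__main__":` guard, so that `python script.py --input_data in_dir --output_data out_dir` runs it from the command line.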

In [None]:
with open('modules/creation/prepare_data/prepare_data_entry.py') as fin:
    print(fin.read())

For this tutorial, the initialized entry script parses all module arguments correctly, so there is no need to edit it further. For a different module, you may need to adjust the following parts of the entry script:

* Put the meta information in `@dsl.module()`; see [module spec](https://aka.ms/azureml-module-specs) for more details;
* Adjust the function args to define the module interface: use InputFile/InputDirectory/OutputFile/OutputDirectory to mark inputs and outputs; all other args are parameters;
* Adjust the logic in the function body, which is the main logic of the module.

In AzureML, the script is executed with command-line args. In the main part, we call `ModuleExecutor(prepare_data).execute(sys.argv)` to parse the command line into function args and pass them to the function.
Do not change this line; the module needs it to run correctly in AzureML.
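
The idea of turning command-line args into function args can be illustrated with a simplified stand-in built on the standard library. This is not how `ModuleExecutor` is implemented (its internals are not shown in this notebook). The `execute` helper and the body of `prepare_data` below are hypothetical, and the sketch only handles string parameters and parameters with `str`, `int` or `float` defaults.

```python
import argparse
import inspect


def execute(func, argv):
    """Parse `--name value` pairs from argv and call func with them as keyword args.

    argv is the argument list without the program name (for example sys.argv[1:]).
    """
    parser = argparse.ArgumentParser()
    for name, param in inspect.signature(func).parameters.items():
        default = param.default
        if default is inspect.Parameter.empty:
            # No default in the signature: the argument is required.
            parser.add_argument(f"--{name}", required=True)
        else:
            # Reuse the default's type to convert the command-line string.
            parser.add_argument(f"--{name}", type=type(default), default=default)
    args = parser.parse_args(argv)
    return func(**vars(args))


def prepare_data(input_data, output_data, sample_rate=1.0):
    # Hypothetical body: simply echo the parsed values.
    return {"input_data": input_data, "output_data": output_data,
            "sample_rate": sample_rate}
```

With this stand-in, `execute(prepare_data, ["--input_data", "in_dir", "--output_data", "out_dir"])` calls `prepare_data(input_data="in_dir", output_data="out_dir", sample_rate=1.0)`.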

## Run the module locally

After developing the module, we can load it and run it locally to ensure it works correctly.

`Module.from_func` loads a module from a Python function. 
`Module.from_yaml` loads a module object from a yaml spec. 

These two methods are designed for loading a module quickly for test purposes.

In [None]:
from azureml.pipeline.wrapper import Module
import sys
sys.path.insert(0, 'modules/creation/prepare_data/')  # Add modules/creation/prepare_data to the import path so the entry script can be imported.
from prepare_data_entry import prepare_data

prepare_module_func = Module.from_func(ws, prepare_data)
help(prepare_module_func)

Run the module with local test data. `module.run` supports running the module either in the local Python environment or in a local container, configured by the `use_docker` parameter. Note that the input type in the module entry script is InputDirectory, so pass a directory for the local test.
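
Because the input must be a directory, a small pre-check before the local run can give a clearer error than a failed run. The helper below is a sketch using only the standard library. `check_input_directory` is a hypothetical name and not part of the SDK.

```python
from pathlib import Path


def check_input_directory(path):
    """Return the sorted entry names in `path`; raise if it is not a usable directory."""
    directory = Path(path)
    if not directory.is_dir():
        raise NotADirectoryError(f"{path!r} is not a directory")
    names = sorted(entry.name for entry in directory.iterdir())
    if not names:
        raise ValueError(f"{path!r} is empty")
    return names
```

For example, calling `check_input_directory('modules/creation/prepare_data_small_input')` before `module.run` lists the sample files that will be passed to the module.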

In [None]:
local_sample = 'modules/creation/prepare_data_small_input'
module = prepare_module_func(input_data=local_sample)
module.run(experiment_name='prepare_data_local_run', use_docker=False, track_run_history=True)

## Register the module
Once we have successfully tested the module, we can register it to the workspace with the following commands. After registration, the module is available to all users who have access to the workspace, and it can be consumed in both the designer UI and the SDK.  
- `az ml module build` automatically generates the yaml spec from a module entry script that is wrapped as a dsl module.
- `az ml module register` registers the module to an Azure Machine Learning workspace.


In [None]:
!az ml module build --target modules/creation/prepare_data/prepare_data_entry.py --source-dir modules/creation/prepare_data
!az ml module register --set-as-default-version --spec-file modules/creation/prepare_data/prepare_data_entry.spec.yaml

In [None]:
# Check whether the module is successfully registered
module_func = Module.load(ws, name='Prepare data')
help(module_func)

It's also possible to consume the registered module in the designer UI. Add `&flight=cm` at the end of your URL to see custom modules in the designer.
![consume-in-designer](./modules/media/consume-in-designer.png)