## Azure Machine Learning SDK for Python

Sources from __[What is the Azure Machine Learning SDK for Python?](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py)__

Key areas of the SDK include:

- Explore, prepare and manage the lifecycle of your datasets used in machine learning experiments.
- Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.
- Train models either locally or by using cloud resources, including GPU-accelerated model training.
- Use automated machine learning, which accepts configuration parameters and training data. It automatically iterates through algorithms and hyperparameter settings to find the best model for running predictions.
- Deploy web services to convert your trained models into RESTful services that can be consumed in any application.


AML SDK for Python Namespace:
* Workspace
* Dataset
* Experiment
* Run
* Model
* ComputeTarget
* RunConfiguration
* ScriptRunConfig
* Environment
* Pipeline
* PythonScriptStep

![AMLWorkspace](https://docs.microsoft.com/en-us/azure/machine-learning/media/concept-workspace/azure-machine-learning-taxonomy.png#lightbox)


## Check version of the SDK

In [1]:
import azureml.core


print("Azure Machine Learning SDK for python version {0}".format(azureml.core.VERSION))

Azure Machine Learning SDK for python version 1.37.0


## Workspace

### azureml.core.workspace.Workspace

Create AML Workspace

```python
from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
           subscription_id='<azure-subscription-id>',
           resource_group='myresourcegroup',
           create_resource_group=True,
           location='eastus2'
           )
```

__Get workspace object__

```python
from azureml.core import Workspace

ws = Workspace.get(name="myworkspace",
            subscription_id='<azure-subscription-id>',
            resource_group='myresourcegroup')
```

In [1]:
from azureml.core import Workspace


# Read from '.azureml/' in the current working directory and 'config.json' file
ws = Workspace.from_config()

ws.get_details()

{'id': '/subscriptions/89da9f33-fd31-4ece-861e-5fab7af4dc11/resourceGroups/mtcs-dev-aml-rg/providers/Microsoft.MachineLearningServices/workspaces/mtcs-dev-aml',
 'name': 'mtcs-dev-aml',
 'identity': {'principal_id': '825a800c-aeb0-41be-945e-3caf8d9b5f19',
  'tenant_id': '72f988bf-86f1-41af-91ab-2d7cd011db47',
  'type': 'SystemAssigned'},
 'location': 'westus2',
 'type': 'Microsoft.MachineLearningServices/workspaces',
 'tags': {},
 'sku': 'Basic',
 'workspaceid': '55f81537-6bfe-40ca-8b34-4b2093de7c9e',
 'sdkTelemetryAppInsightsKey': '19f24253-9564-406c-9a1e-a48a21b145aa',
 'description': '',
 'friendlyName': 'mtcs-dev-aml',
 'creationTime': '2020-08-04T22:57:58.9401208+00:00',
 'containerRegistry': '/subscriptions/89da9f33-fd31-4ece-861e-5fab7af4dc11/resourcegroups/mtcs-dev-aml-rg/providers/microsoft.containerregistry/registries/mtcsdevamlcr',
 'adbWorkspace': '/subscriptions/89da9f33-fd31-4ece-861e-5fab7af4dc11/resourceGroups/mtcs-dev-adb-rg/providers/Microsoft.Databricks/workspaces/hy

In [3]:
fs = open('./.azureml/config.json')

print(fs.read())

{"Id": null, "Scope": "/subscriptions/89da9f33-fd31-4ece-861e-5fab7af4dc11/resourceGroups/mtcs-dev-aml-rg/providers/Microsoft.MachineLearningServices/workspaces/mtcs-dev-aml"}


## Datastore and DataSet


In [71]:
csvfile = 'diabetes.csv'

In [82]:
from azureml.core import Datastore

datastore_name = 'workspaceblobstore' # Update the value with your datastore name

# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(ws, datastore_name)
datastore

{
  "name": "workspaceblobstore",
  "container_name": "azureml-blobstore-55f81537-6bfe-40ca-8b34-4b2093de7c9e",
  "account_name": "mtcsdevamlsa",
  "protocol": "https",
  "endpoint": "core.windows.net"
}

In [74]:
file_datastore = datastore.upload('./data', 'data', overwrite=True)

Uploading an estimated of 1 files
Uploading ./data/diabetes.csv
Uploaded ./data/diabetes.csv, 1 files out of an estimated total of 1
Uploaded 1 files


In [81]:
from azureml.core import Dataset

ds = Dataset.File.from_files(path=[(datastore,'./data/diabetes.csv')])
ds.register(ws, 'diabetes', create_new_version=True)

{
  "source": [
    "('workspaceblobstore', './data/diabetes.csv')"
  ],
  "definition": [
    "GetDatastoreFiles"
  ],
  "registration": {
    "id": "609ceedf-1263-404e-bc7a-4ae357c0c93c",
    "name": "diabetes",
    "version": 2,
    "workspace": "Workspace.create(name='mtcs-dev-aml', subscription_id='89da9f33-fd31-4ece-861e-5fab7af4dc11', resource_group='mtcs-dev-aml-rg')"
  }
}

## Experiment

### azureml.core.experiment.Experiment

The Experiment class is another foundational cloud resource that represents a collection of trials (individual model runs). The following code fetches an Experiment object from within Workspace by name, or it creates a new Experiment object if the name doesn't exist.


In [2]:
from azureml.core.experiment import Experiment

expName = "mtc-aml-lab-exp"
exp = Experiment(workspace=ws, name=expName)
exp

Name,Workspace,Report Page,Docs Page
mtc-aml-lab-exp,mtcs-dev-aml,Link to Azure Machine Learning studio,Link to Documentation


In [5]:
exp.tag("projectName","AML-Lab")
exp.tag("MTCLocation","Seattle")
exp.tag("MTCTeam","MTCS")
exp.tag("MTCTeam","MTC Seattle") # Careful, tags are mutable

In [None]:
list_experiments = Experiment.list(ws)
list_experiments

In [None]:
for experiment in list_experiments:
    if experiment.name == expName:
        print(experiment.name) 
        print(experiment.tags)

# Check the value of key 'Team'

In [6]:
list_runs = exp.get_runs()

for run in list_runs:
    print(run.id)

There are two ways to execute an experiment trial. 

If you're interactively experimenting in a Jupyter notebook, use the `start_logging` function. 

If you're submitting an experiment from a standard Python environment, use the `submit` function. 

Both functions return a Run object. The experiment variable represents an Experiment object in the following code examples.

In [22]:
run = exp.start_logging()
run

Experiment,Id,Type,Status,Details Page,Docs Page
mtc-aml-lab-exp,7ea2acca-a102-4d37-9cfa-5c0adfe61208,,Running,Link to Azure Machine Learning studio,Link to Documentation


In [8]:
list_runs = exp.get_runs()

for run in list_runs:
    print(run)

Run(Experiment: mtc-aml-lab-exp,
Id: db0262dc-7a57-4138-a6ec-7a48ef6b73c9,
Type: None,
Status: Running)


## Run

### azureml.core.run.Run

A run represents a single trial of an experiment. Run is the object that you use to monitor the asynchronous execution of a trial, store the output of the trial, analyze results, and access generated artifacts. You use Run inside your experimentation code to log metrics and artifacts to the Run History service. Functionality includes:

- Storing and retrieving metrics and data.
- Using tags and the child hierarchy for easy lookup of past runs.
- Registering stored model files for deployment.
- Storing, modifying, and retrieving properties of a run.

Create a Run object by submitting an Experiment object with a run configuration object. Use the tags parameter to attach custom categories and labels to your runs. You can easily find and retrieve them later from Experiment.


In [23]:
# run = experiment.submit(config=your_config_object, tags=tags)

run.tag("owner","hyun")
run.tag("build","dev")
run.tag("codeVersion",1) # Integer or string for value

Converting non-string tag to string: (codeVersion: 1)


In [10]:
print(run)

Run(Experiment: mtc-aml-lab-exp,
Id: db0262dc-7a57-4138-a6ec-7a48ef6b73c9,
Type: None,
Status: Running)


In [11]:
from azureml.core.run import Run

filtered_list_runs = Run.list(exp, tags={"owner":"hyun", "build":"dev"})

for filtered_run in filtered_list_runs:
    print(filtered_run)
    print(filtered_run.tags)

Run(Experiment: mtc-aml-lab-exp,
Id: db0262dc-7a57-4138-a6ec-7a48ef6b73c9,
Type: None,
Status: Running)
{'owner': 'hyun', 'build': 'dev', 'codeVersion': '1'}


In [12]:
run_details = run.get_details()
run_details

{'runId': 'db0262dc-7a57-4138-a6ec-7a48ef6b73c9',
 'target': 'local',
 'status': 'Running',
 'startTimeUtc': '2022-03-01T07:25:41.922778Z',
 'services': {},
 'properties': {'azureml.git.repository_uri': 'https://github.com/hyssh/mtc-open-workshop.git',
  'mlflow.source.git.repoURL': 'https://github.com/hyssh/mtc-open-workshop.git',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': '971b75559670c5ff8880ddd2b70372227a814603',
  'mlflow.source.git.commit': '971b75559670c5ff8880ddd2b70372227a814603',
  'azureml.git.dirty': 'True',
  'ContentSnapshotId': '5eb990c6-bb4a-4a25-934a-b4213668b1ff'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {},
 'submittedBy': 'Hyun Suk Shin (MTC SEATTLE)'}

Output for this function is a dictionary that includes:

- Run ID
- Status
- Start and end time
- Compute target (local versus cloud)
- Dependencies and versions used in the run
- Training-specific data (differs depending on model type)

### Upload file/s to AML using RUN

In [24]:
run.upload_file(name='aml-lab/workshop.ipynb', path_or_stream="./0.AMLSDKforPython.ipynb")

# Go to 'Run' > 'Outputs + Logs' in Experiment of Azure Machine Learning

<azureml._restclient.models.batch_artifact_content_information_dto.BatchArtifactContentInformationDto at 0x7f721415fac0>

### Logging metrics using RUN

In [14]:
run.log_list(name='Fibonacci', value=[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89])

In [15]:
import numpy as np


for i in (range(-10, 10)): 
    run.log(name='Sigmoid', value=1 / (1 + np.exp(-i)))
    angle = i / 2.0

In [16]:
for i in (range(-10, 10)):
    angle = i / 2.0
    run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle))


In [17]:
citrus = ['orange', 'lemon', 'lime']
sizes = [ 10, 7, 3]

for index in range(len(citrus)):
    run.log_row("citrus", fruit = citrus[index], size=sizes[index])

In [19]:
run.log_image(name='AML Concept Whiteboard', path='./AML_concept.png', plot=None, description='Discussion lead by Hyun at MTC Seattle')

In [20]:
metrics = run.get_metrics()
# metrics is of type Dict[str, List[float]] mapping metric names
# to a list of the values for that metric in the given run.

print(metrics)

{'Fibonacci': [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89], 'Sigmoid': [4.5397868702434395e-05, 0.00012339457598623172, 0.0003353501304664781, 0.0009110511944006454, 0.0024726231566347743, 0.0066928509242848554, 0.01798620996209156, 0.04742587317756678, 0.11920292202211755, 0.2689414213699951, 0.5, 0.7310585786300049, 0.8807970779778823, 0.9525741268224334, 0.9820137900379085, 0.9933071490757153, 0.9975273768433653, 0.9990889488055994, 0.9996646498695336, 0.9998766054240137], 'Cosine Wave': {'angle': [-5.0, -4.5, -4.0, -3.5, -3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5], 'cos': [0.28366218546322625, -0.2107957994307797, -0.6536436208636119, -0.9364566872907963, -0.9899924966004454, -0.8011436155469337, -0.4161468365471424, 0.0707372016677029, 0.5403023058681398, 0.8775825618903728, 1.0, 0.8775825618903728, 0.5403023058681398, 0.0707372016677029, -0.4161468365471424, -0.8011436155469337, -0.9899924966004454, -0.9364566872907963, -0.6536436208636119, 

In [32]:
run.complete()
# run.cancel()

ServiceException: ServiceException:
	Code: 400
	Message: (UserError) Run with id compute_target_test_1646123827_f7b9f0c7 is in a terminal state and cannot be updated.
	Details:

	Headers: {
	    "Date": "Tue, 01 Mar 2022 08:40:22 GMT",
	    "Content-Type": "application/json; charset=utf-8",
	    "Content-Length": "898",
	    "Connection": "keep-alive",
	    "Request-Context": "appId=cid-v1:2d2e8e63-272e-4b3c-8598-4ee570a0e70d",
	    "x-ms-response-type": "error",
	    "Strict-Transport-Security": "max-age=15724800; includeSubDomains; preload",
	    "X-Content-Type-Options": "nosniff",
	    "x-request-time": "0.036"
	}
	InnerException: {
    "additional_properties": {},
    "error": {
        "additional_properties": {
            "debugInfo": null
        },
        "code": "UserError",
        "severity": null,
        "message": "Run with id compute_target_test_1646123827_f7b9f0c7 is in a terminal state and cannot be updated.",
        "message_format": "Run with id {runId} is in a terminal state and cannot be updated.",
        "message_parameters": {
            "runId": "compute_target_test_1646123827_f7b9f0c7"
        },
        "reference_code": null,
        "details_uri": null,
        "target": null,
        "details": [],
        "inner_error": {
            "additional_properties": {},
            "code": "BadArgument",
            "inner_error": {
                "additional_properties": {},
                "code": "TerminalRunState",
                "inner_error": null
            }
        },
        "additional_info": null
    },
    "correlation": {
        "operation": "22de665b996bd7555657231db9866732",
        "request": "5a64625575f8d25c"
    },
    "environment": "westus2",
    "location": "westus2",
    "time": {},
    "component_name": "run-history"
}

## Model

### azureml.core.model.Model

The `Model` class is used for working with cloud representations of machine learning models. Methods help you transfer models between local development environments and the `Workspace` object in the cloud.

You can use model registration to store and version your models in the Azure cloud, in your workspace. Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. Azure Machine Learning supports any model that can be loaded through Python 3, not just Azure Machine Learning models.

The following example shows how to build a simple local classification model with `scikit-learn`, register the model in `Workspace`, and download the model from the cloud.

Create a simple classifier, `clf`, to predict customer churn based on their age. Then dump the model to a `.pkl` file in the same directory.

In [25]:
from sklearn import svm
import joblib
import numpy as np

# customer ages
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62])
X_train = X_train.reshape(-1, 1)
# churn y/n
y_train = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

joblib.dump(value=clf, filename="churn-model.pkl")

['churn-model.pkl']

Use the `register` function to register the model in your workspace. Specify the local model path and the model name. Registering the same name more than once will create a new version.

In [26]:
from azureml.core.model import Model

model = Model.register(workspace=ws,
                       model_path="churn-model.pkl",
                       model_name="churn-model-test")

Registering model churn-model-test


Now that the model is registered in your workspace, it's easy to manage, download, and organize your models. To retrieve a model (for example, in another environment) object from `Workspace`, use the class constructor and specify the model name and any optional parameters. Then, use the download function to `download` the model, including the cloud folder structure.

## Environment

### azureml.core.environment.Environment

Azure Machine Learning environments specify the Python packages, environment variables, and software settings around your training and scoring scripts. In addition to Python, you can also configure PySpark, Docker and R for environments. Internally, environments result in Docker images that are used to run the training and scoring processes on the compute target. The environments are managed and versioned entities within your Machine Learning workspace that enable reproducible, auditable, and portable machine learning workflows across a variety of compute targets and compute types.

You can use an Environment object to:

- Develop your training script.
- Reuse the same environment on Azure Machine Learning Compute for model training at scale.
- Deploy your model with that same environment without being tied to a specific compute type.

The following code imports the Environment class from the SDK and to instantiates an environment object.

In [16]:
import sys
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies

# pyVersion = f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"
icName = '' # <update the value with your Compute Instance Name>

myenv = Environment(name="myEnv")
conda_dep = CondaDependencies()

# Installs pillow package
conda_dep.add_conda_package("numpy==1.17.0")
conda_dep.add_pip_package("scikit-learn")
conda_dep.add_pip_package("pillow")


# Adds dependencies to PythonSection of myenv
myenv.python.conda_dependencies=conda_dep
myenv.python.conda_dependencies.set_python_version("3.6")
myenv.register(ws)
myenv.build(ws, icName)

<azureml.core.environment.ImageBuildDetails at 0x7f3850b9e0d0>

In [27]:
from azureml.core.model import Model
import os

model = Model(workspace=ws, name="churn-model-test")
model.download(target_dir=os.path.join(os.getcwd(),"myDownload"))

'/mnt/batch/tasks/shared/LS_root/mounts/clusters/hyssh1/code/Users/hyssh/mtc-open-workshop/Hands-on-Labs/0.AMLSDKforPython/myDownload/churn-model.pkl'

## ComputeTarget

### azureml.core.compute.ComputeTarget

The `ComputeTarget` class is the abstract parent class for creating and managing compute targets. A compute target represents a variety of resources where you can train your machine learning models. A compute target can be either a local machine or a cloud resource, such as Azure Machine Learning Compute, Azure HDInsight, or a remote virtual machine.

Use compute targets to take advantage of powerful virtual machines for model training, and set up either persistent compute targets or temporary runtime-invoked targets. For a comprehensive guide on setting up and managing compute targets, see the how-to.

The following code shows a simple example of setting up an `ComputeInstance` target. The resource scales automatically when a job is submitted. It's deleted automatically when the run finishes.

Reuse the simple scikit-learn churn model and build it into its own file, train.py, in the current directory. At the end of the file, create a new directory called outputs. This step creates a directory in the cloud (your workspace) to store your trained model that joblib.dump() serialized.

In [4]:
from azureml.core.compute import ComputeInstance 

icName = ''
myInstance = ComputeInstance(ws, icName)

## RunConfiguration

### azureml.core.compute.RunConfiguration

Next you create the compute target by instantiating a RunConfiguration object and setting the type and size. This example uses the smallest resource size (1 CPU core, 3.5 GB of memory). The list_vms variable contains a list of supported virtual machines and their sizes.

In [11]:
from azureml.core.runconfig import RunConfiguration

compute_config = RunConfiguration()
compute_config.target = myInstance
compute_config

{
    "script": null,
    "arguments": [],
    "target": "hyssh1",
    "framework": "Python",
    "communicator": "None",
    "maxRunDurationSeconds": null,
    "nodeCount": 1,
    "priority": null,
    "environment": {
        "name": null,
        "version": null,
        "environmentVariables": {
            "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
        },
        "python": {
            "userManagedDependencies": false,
            "interpreterPath": "python",
            "condaDependenciesFile": null,
            "baseCondaEnvironment": null,
            "condaDependencies": {
                "name": "project_environment",
                "dependencies": [
                    "python=3.6.2",
                    {
                        "pip": [
                            "azureml-defaults"
                        ]
                    }
                ],
                "channels": [
                    "anaconda",
                    "conda-forge"
                ]
            }


Define Environment

In [12]:
# from azureml.core.conda_dependencies import CondaDependencies

# dependencies = CondaDependencies()
# dependencies.add_pip_package("scikit-learn")
# dependencies.add_pip_package("numpy==1.15.4")

# compute_config.environment.python.conda_dependencies = dependencies
compute_config.environment = myenv

In [68]:
%%writefile ./source/train/train.py
from sklearn import svm
import numpy as np
import joblib
import os

# customer ages
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62])
X_train = X_train.reshape(-1, 1)
# churn y/n
y_train = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

os.makedirs("outputs", exist_ok=True)
joblib.dump(value=clf, filename="outputs/churn-model.pkl")

Overwriting ./source/train/train.py


In [13]:
from azureml.core.experiment import Experiment
from azureml.core import ScriptRunConfig

script_run_config = ScriptRunConfig(
    source_directory="./source/train",
    script="train.py",
    run_config=compute_config)

experiment = Experiment(workspace=ws, name=expName)

run = experiment.submit(config=script_run_config)

# This may take around 7 minutes
run.wait_for_completion(show_output=True)

RunId: mtc-aml-lab-exp_1646127808_cca6ccee
Web View: https://ml.azure.com/runs/mtc-aml-lab-exp_1646127808_cca6ccee?wsid=/subscriptions/89da9f33-fd31-4ece-861e-5fab7af4dc11/resourcegroups/mtcs-dev-aml-rg/workspaces/mtcs-dev-aml&tid=72f988bf-86f1-41af-91ab-2d7cd011db47

Streaming azureml-logs/20_image_build_log.txt

2022/03/01 09:43:32 Downloading source code...
2022/03/01 09:43:33 Finished downloading source code
2022/03/01 09:43:33 Creating Docker network: acb_default_network, driver: 'bridge'
2022/03/01 09:43:34 Successfully set up Docker network: acb_default_network
2022/03/01 09:43:34 Setting up Docker configuration...
2022/03/01 09:43:34 Successfully set up Docker configuration
2022/03/01 09:43:34 Logging in to registry: mtcsdevamlcr.azurecr.io
2022/03/01 09:43:35 Successfully logged into mtcsdevamlcr.azurecr.io
2022/03/01 09:43:35 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: 'acb_default_network'
2022/03/01 09:43:35 Scanning for dependencies..

Ran pip subprocess with arguments:
['/azureml-envs/azureml_919bc46b07c95161e32444ce80d6d83c/bin/python', '-m', 'pip', 'install', '-U', '-r', '/azureml-environment-setup/condaenv.gvosgg28.requirements.txt']
Pip subprocess output:
Collecting azureml-defaults
  Downloading azureml_defaults-1.39.0-py3-none-any.whl (3.0 kB)
Collecting scikit-learn
  Downloading scikit_learn-0.24.2-cp36-cp36m-manylinux2010_x86_64.whl (22.2 MB)
Collecting pillow
  Downloading Pillow-8.4.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting azureml-core~=1.39.0
  Downloading azureml_core-1.39.0-py3-none-any.whl (2.5 MB)
Collecting azureml-inference-server-http~=0.4.1
  Downloading azureml_inference_server_http-0.4.10-py3-none-any.whl (38 kB)
Collecting azureml-dataset-runtime[fuse]~=1.39.0
  Downloading azureml_dataset_runtime-1.39.0-py3-none-any.whl (3.5 kB)
Collecting configparser==3.7.4
  Downloading configparser-3.7.4-py2.py3-none-any.whl (22 kB)
Collecting json-logging-py==0.2
  

 ---> Running in 4924a806fa66
Removing intermediate container 4924a806fa66
 ---> 15382aa967c9
Step 19/21 : RUN rm -rf azureml-environment-setup
 ---> Running in 05c8e245bf27
Removing intermediate container 05c8e245bf27
 ---> 1355185a9e54
Step 20/21 : ENV AZUREML_ENVIRONMENT_IMAGE True
 ---> Running in b5e0e4391028
Removing intermediate container b5e0e4391028
 ---> efdbc096b4cd
Step 21/21 : CMD ["bash"]
 ---> Running in daa95f0da2cf
Removing intermediate container daa95f0da2cf
 ---> fb5eaccbc3d7
Successfully built fb5eaccbc3d7
Successfully tagged mtcsdevamlcr.azurecr.io/azureml/azureml_a2297523ba63ca9bf5eeed281421a16a:latest
Successfully tagged mtcsdevamlcr.azurecr.io/azureml/azureml_a2297523ba63ca9bf5eeed281421a16a:1
2022/03/01 09:46:25 Successfully executed container: acb_step_0
2022/03/01 09:46:25 Executing step ID: acb_step_1. Timeout(sec): 5400, Working directory: '', Network: 'acb_default_network'
2022/03/01 09:46:25 Pushing image: mtcsdevamlcr.azurecr.io/azureml/azureml_a2297523b

{'runId': 'mtc-aml-lab-exp_1646127808_cca6ccee',
 'target': 'hyssh1',
 'status': 'Completed',
 'startTimeUtc': '2022-03-01T09:47:53.377109Z',
 'endTimeUtc': '2022-03-01T09:48:38.611061Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '07e79727-d135-4b40-934b-67523cae3ea3',
  'azureml.git.repository_uri': 'https://github.com/hyssh/mtc-open-workshop.git',
  'mlflow.source.git.repoURL': 'https://github.com/hyssh/mtc-open-workshop.git',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': '971b75559670c5ff8880ddd2b70372227a814603',
  'mlflow.source.git.commit': '971b75559670c5ff8880ddd2b70372227a814603',
  'azureml.git.dirty': 'True',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'train.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': [

> TIP
> [Prep your code for production](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-convert-ml-experiment-to-production)

## ScriptRunConfig

### azureml.core.script_run_config.ScriptRunConfig


In [14]:
from azureml.core.experiment import Experiment
from azureml.core import ScriptRunConfig

runconfig = ScriptRunConfig(
    source_directory="./source/train",
    script="train.py")

# Attach compute target to run config
runconfig.run_config.target = "local"

# Attach environment to run config
runconfig.run_config.environment = myenv

experiment = Experiment(workspace=ws, name=expName)

run = experiment.submit(config=script_run_config)

run.wait_for_completion(show_output=True)

RunId: mtc-aml-lab-exp_1646128159_f5b2fa8e
Web View: https://ml.azure.com/runs/mtc-aml-lab-exp_1646128159_f5b2fa8e?wsid=/subscriptions/89da9f33-fd31-4ece-861e-5fab7af4dc11/resourcegroups/mtcs-dev-aml-rg/workspaces/mtcs-dev-aml&tid=72f988bf-86f1-41af-91ab-2d7cd011db47

Streaming azureml-logs/55_azureml-execution-tvmps_c05eed74706cac4f2642307f37d208436dd1b83b3f336d5227dcbcb41e7d4673_d.txt

2022-03-01T09:49:31Z Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/mtcs-dev-aml/azureml/mtc-aml-lab-exp_1646128159_f5b2fa8e/mounts/workspaceblobstore -- stdout/stderr: 
2022-03-01T09:49:31Z The vmsize standard_ds3_v2 is not a GPU VM, skipping get GPU count by running nvidia-smi command.
2022-03-01T09:49:31Z Starting output-watcher...
2022-03-01T09:49:31Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
2022-03-01T09:49:31Z Executing 'Copy ACR Details file' on 10.1.0.5
2022-03-01T09:49:31Z Copy ACR Details file succeeded on 10.1.0.5. Output: 
>>>   


{'runId': 'mtc-aml-lab-exp_1646128159_f5b2fa8e',
 'target': 'hyssh1',
 'status': 'Completed',
 'startTimeUtc': '2022-03-01T09:49:29.771858Z',
 'endTimeUtc': '2022-03-01T09:49:52.686096Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '07e79727-d135-4b40-934b-67523cae3ea3',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json',
  'azureml.git.repository_uri': 'https://github.com/hyssh/mtc-open-workshop.git',
  'mlflow.source.git.repoURL': 'https://github.com/hyssh/mtc-open-workshop.git',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': '971b75559670c5ff8880ddd2b70372227a814603',
  'mlflow.source.git.commit': '971b75559670c5ff8880ddd2b70372227a814603',
  'azureml.git.dirty': 'True'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'train.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': [

## Pipeline, PythonScriptStep

### azureml.pipeline.core.pipeline.Pipeline
### azureml.pipeline.steps.python_script_step.PythonScriptStep


An Azure Machine Learning pipeline is an automated workflow of a complete machine learning task. Subtasks are encapsulated as a series of steps within the pipeline. An Azure Machine Learning pipeline can be as simple as one step that calls a Python script. Pipelines include functionality for:

- Data preparation including importing, validating and cleaning, munging and transformation, normalization, and staging
- Training configuration including parameterizing arguments, filepaths, and logging / reporting configurations
- Training and validating efficiently and repeatably, which might include specifying specific data subsets, different hardware compute resources, distributed processing, and progress monitoring
- Deployment, including versioning, scaling, provisioning, and access control
- Publishing a pipeline to a REST endpoint to rerun from any HTTP library

A ```PythonScriptStep``` is a basic, built-in step to run a Python Script on a compute target. It takes a script name and other optional parameters like arguments for the script, compute target, inputs and outputs. 

### Pattern for creating and using ML Pipeline

An Azure Machine Learning pipeline is associated with an Azure Machine Learning workspace and a pipeline step is associated with a compute target available within that workspace. For more information, see this article about workspaces or this explanation of compute targets.

A common pattern for pipeline steps is:

1. Specify workspace, compute, and storage
2. Configure your input and output data using
    1. Dataset which makes available an existing Azure datastore
    2. PipelineDataset which encapsulates typed tabular data
    3. PipelineData which is used for intermediate file or directory data written by one step and intended to be consumed by another
3. Define one or more pipeline steps
4. Instantiate a pipeline using your workspace and steps
5. Create an experiment to which you submit the pipeline
6. Monitor the experiment results

In [None]:
from azureml.core import Dataset
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import Pipeline, PipelineData

# get input dataset
input_ds = Dataset.get_by_name(workspace, 'weather_ds')

# register pipeline output as dataset
output_ds = PipelineData('prepared_weather_ds', datastore=datastore).as_dataset()
output_ds = output_ds.register(name='prepared_weather_ds', create_new_version=True)

# configure pipeline step to use dataset as the input and output
prep_step = PythonScriptStep(script_name="prepare.py",
                             inputs=[input_ds.as_named_input('weather_ds')],
                             outputs=[output_ds],
                             compute_target=compute_target,
                             source_directory=project_folder)


# Python Script Step
from azureml.pipeline.steps import PythonScriptStep

train_step = PythonScriptStep(
    script_name="train.py",
    arguments=["--input", blob_input_data, "--output", output_data1],
    inputs=[blob_input_data],
    outputs=[output_data1],
    compute_target=compute_target,
    source_directory=project_folder
)

In [None]:
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(workspace=ws, steps=[train_step])
# pipeline_run = experiment.submit(pipeline)

In [None]:
#End of notebook