# Single-step pipeline examples

In this example, we'll build a minimal pipeline that contains a single training step. The dataset and compute cluster created here are reused in the subsequent examples in this module.

In [None]:
!pip install azureml-sdk --upgrade

In [8]:
import os
import azureml.core
from azureml.core import Workspace, Experiment, Dataset, RunConfiguration
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep
from azureml.data.dataset_consumption_config import DatasetConsumptionConfig

print("Azure ML SDK version:", azureml.core.VERSION)

Azure ML SDK version: 1.20.0


First, we connect to the workspace. `Workspace.from_config()` will either:
* read the local `config.json` that holds the workspace reference, if that file exists, or
* use the `az` CLI to connect to the workspace that was attached to this folder via `az ml folder attach -g <resource group> -w <workspace name>`

In [9]:
ws = Workspace.from_config()
print(f'WS name: {ws.name}\nRegion: {ws.location}\nSubscription id: {ws.subscription_id}\nResource group: {ws.resource_group}')

WS name: demo-ent-ws
Region: westeurope
Subscription id: bcbf34a7-1936-4783-8840-8f324c37f354
Resource group: demo
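
For reference, the `config.json` that `Workspace.from_config()` reads is a small JSON file holding the subscription id, resource group, and workspace name. The sketch below is a stdlib-only illustration of that file layout, not how the SDK itself loads it; the helper `load_workspace_reference` and all values are hypothetical placeholders.

```python
import json
import tempfile
from pathlib import Path

# Keys that make up the workspace reference stored in config.json
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_reference(path):
    """Read a config.json and return its workspace reference as a dict."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    return {key: config[key] for key in REQUIRED_KEYS}


# Placeholder values for illustration only; use your own workspace details.
example_config = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my-resource-group",
    "workspace_name": "my-workspace",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"
    config_path.write_text(json.dumps(example_config, indent=2), encoding="utf-8")
    reference = load_workspace_reference(config_path)

print(reference["workspace_name"])
```

Checking the file this way before calling `Workspace.from_config()` can make a missing or incomplete `config.json` easier to diagnose.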


# Preparation

Let's quickly create a compute cluster named `cluster`, in case it does not exist yet.

In [10]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

aml_compute_target = "cluster"
try:
    # Reuse the cluster if it already exists in the workspace
    aml_compute = AmlCompute(ws, aml_compute_target)
except ComputeTargetException:
    # Otherwise provision a cluster with 0 to 1 nodes that scales down after one idle hour
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2", min_nodes=0, max_nodes=1,
                                                   idle_seconds_before_scaledown=3600)
    aml_compute = ComputeTarget.create(ws, aml_compute_target, config)
    aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

Next, we'll create a new dataset and register it with the workspace. The subsequent pipelines use this dataset as well. If you have already created it, skip to the next cell.

In [None]:
from azureml.core import Dataset

datastore = ws.get_default_datastore()
datastore.upload(src_dir='../data-training', target_path='german-credit-train-tutorial', overwrite=True)
ds = Dataset.File.from_files(path=[(datastore, 'german-credit-train-tutorial')])
ds.register(ws, name='german-credit-train-tutorial', description='Dataset for workshop tutorials', create_new_version=True)

Next, let's reference our newly created training dataset, so that we can use it as the pipeline input:

In [11]:
training_dataset = Dataset.get_by_name(ws, "german-credit-train-tutorial")
# Download the dataset to the compute node; use .as_mount() instead if the dataset does not fit on the node
training_dataset_consumption = DatasetConsumptionConfig("training_dataset", training_dataset).as_download()

Next, we can create a `PythonScriptStep` that runs our training code. In this case, we load a `runconfig` from a YAML file ([`runconfig.yml`](runconfig.yml)) that defines our training job (target compute cluster, conda environment, etc.). Have a look at it.

In [12]:
runconfig = RunConfiguration.load("runconfig.yml")

train_step = PythonScriptStep(name="train-step",
                              source_directory="./",
                              script_name="train.py",
                              arguments=['--data-path', training_dataset_consumption],
                              inputs=[training_dataset_consumption],
                              runconfig=runconfig,
                              allow_reuse=False)

steps = [train_step]
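
The step passes the dataset to the script through the `--data-path` argument, which at run time is expected to resolve to the directory the dataset was downloaded to. The actual `train.py` is not shown in this notebook, so the following is only a minimal sketch of how such a script might read that argument; `parse_args` and `list_data_files` are hypothetical helper names, and the example path is a placeholder.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the command-line arguments that the pipeline step passes to the script."""
    parser = argparse.ArgumentParser(description="Hypothetical training script")
    parser.add_argument("--data-path", type=str, required=True,
                        help="Directory the dataset was downloaded or mounted to")
    return parser.parse_args(argv)


def list_data_files(data_path):
    """Return the sorted relative paths of all files below data_path."""
    found = []
    for root, _dirs, files in os.walk(data_path):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), data_path))
    return sorted(found)


# Example: the step effectively runs `python train.py --data-path <resolved path>`.
# The path below is a placeholder, not the real location on the compute node.
args = parse_args(["--data-path", "/tmp/german-credit"])
print(args.data_path)
```

Note that `argparse` exposes `--data-path` as `args.data_path`. Listing the files under that directory first is a cheap way to confirm the dataset arrived before training starts.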

Next, we can create our pipeline object and validate it. Validation checks that the inputs and outputs are properly linked and that the pipeline graph is acyclic:

In [13]:
pipeline = Pipeline(workspace=ws, steps=steps)
pipeline.validate()

Step train-step is ready to be created [e4b00fa2]


[]
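
To make the "acyclic" requirement concrete: a pipeline is a dependency graph of steps, and it can only be scheduled if no step (directly or indirectly) depends on itself. The sketch below illustrates such a check with Kahn's algorithm. It is not the SDK's implementation, and the multi-step graphs (`prep`, `train`, `register`, `a`, `b`) are made-up examples.

```python
from collections import deque


def is_acyclic(dependencies):
    """Return True if the step graph has no cycles (Kahn's algorithm).

    dependencies maps each step name to the list of steps it depends on.
    """
    # Collect every step, including ones that only appear as a dependency
    steps = set(dependencies)
    for upstream in dependencies.values():
        steps.update(upstream)

    # Number of unfinished upstream steps per step, and the reverse edges
    remaining = {step: len(set(dependencies.get(step, ()))) for step in steps}
    downstream = {step: [] for step in steps}
    for step, upstream in dependencies.items():
        for dep in set(upstream):
            downstream[dep].append(step)

    # Repeatedly "run" steps whose dependencies are all satisfied
    ready = deque(step for step, count in remaining.items() if count == 0)
    visited = 0
    while ready:
        step = ready.popleft()
        visited += 1
        for child in downstream[step]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    # Steps on a cycle never reach zero pending dependencies
    return visited == len(steps)


print(is_acyclic({"train-step": []}))                                      # True
print(is_acyclic({"prep": [], "train": ["prep"], "register": ["train"]}))  # True
print(is_acyclic({"a": ["b"], "b": ["a"]}))                                # False
```

Our single-step pipeline trivially passes this check; it becomes relevant once later examples chain several steps together.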

Lastly, we can submit the pipeline against an experiment:

In [14]:
pipeline_run = Experiment(ws, 'mlops-workshop-pipelines').submit(pipeline)
pipeline_run.wait_for_completion()

Created step train-step [e4b00fa2][585717ca-65f0-4175-97d5-cfdb742ad49e], (This step will run and generate new outputs)
Submitted PipelineRun 8e15e619-e240-4c45-8c20-614c55517286
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/mlops-workshop-pipelines/runs/8e15e619-e240-4c45-8c20-614c55517286?wsid=/subscriptions/bcbf34a7-1936-4783-8840-8f324c37f354/resourcegroups/demo/workspaces/demo-ent-ws
PipelineRunId: 8e15e619-e240-4c45-8c20-614c55517286
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/mlops-workshop-pipelines/runs/8e15e619-e240-4c45-8c20-614c55517286?wsid=/subscriptions/bcbf34a7-1936-4783-8840-8f324c37f354/resourcegroups/demo/workspaces/demo-ent-ws
PipelineRun Status: NotStarted
PipelineRun Status: Running


StepRunId: ebbc9dcb-3976-4d3f-ab08-c9a32ad1104f
Link to Azure Machine Learning Portal: https://ml.azure.com/experiments/mlops-workshop-pipelines/runs/ebbc9dcb-3976-4d3f-ab08-c9a32ad1104f?wsid=/subscriptions/bcbf34a7-1936-4783-8840

Verifying transaction: ...working... done
Executing transaction: ...working... 
done
Collecting azureml-defaults
  Downloading azureml_defaults-1.20.0-py3-none-any.whl (3.1 kB)
Collecting azureml-dataprep[fuse,pandas]
  Downloading azureml_dataprep-2.8.2-py3-none-any.whl (39.4 MB)
Collecting scikit-learn==0.20.3
  Downloading scikit_learn-0.20.3-cp36-cp36m-manylinux1_x86_64.whl (5.4 MB)
Collecting pandas==0.25.3
  Downloading pandas-0.25.3-cp36-cp36m-manylinux1_x86_64.whl (10.4 MB)
Collecting joblib==0.13.2
  Downloading joblib-0.13.2-py2.py3-none-any.whl (278 kB)
Collecting configparser==3.7.4
  Downloading configparser-3.7.4-py2.py3-none-any.whl (22 kB)
Collecting flask==1.0.3
  Downloading Flask-1.0.3-py2.py3-none-any.whl (92 kB)
Collecting json-logging-py==0.2
  Downloading json-logging-py-0.2.tar.gz (3.6 kB)
Collecting gunicorn==19.9.0
  Downloading gunicorn-19.9.0-py2.py3-none-any.whl (112 kB)
Collecting azureml-dataset-runtime[fuse]~=1.20.0
  Downloading azureml_dataset_runtime-

ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

azureml-dataset-runtime 1.20.0 requires azureml-dataprep<2.8.0a,>=2.7.0a, but you'll have azureml-dataprep 2.8.2 which is incompatible.
Successfully installed Jinja2-2.11.2 MarkupSafe-1.1.1 PyJWT-1.7.1 SecretStorage-3.3.0 adal-1.2.5 applicationinsights-0.11.9 azure-common-1.1.26 azure-core-1.10.0 azure-graphrbac-0.61.1 azure-identity-1.4.1 azure-mgmt-authorization-0.61.0 azure-mgmt-containerregistry-2.8.0 azure-mgmt-keyvault-2.2.0 azure-mgmt-resource-12.0.0 azure-mgmt-storage-11.2.0 azureml-core-1.20.0 azureml-dataprep-2.8.2 azureml-dataprep-native-28.0.0 azureml-dataprep-rslex-1.6.0 azureml-dataset-runtime-1.20.0 azureml-defaults-1.20.0 azureml-model-management-sdk-1.0.1b6.post1 backpor

699b75ff4717: Pull complete
b177109c9d16: Pull complete
59cea07bb66c: Pull complete
d54d011de0e3: Pull complete
ec2c061b6e79: Pull complete
45be97372f16: Pull complete
741ed879c2f2: Pull complete
dcb42b399f96: Pull complete
c5158f856775: Pull complete
Digest: sha256:96a223a2d683aab4b4f91719ba3f705a79883c430ca39e73845fd2ba36704f14
Status: Downloaded newer image for viennaglobal.azurecr.io/azureml/azureml_9d4fa30783fc98f2c7c7f19c6a312f30:latest
viennaglobal.azurecr.io/azureml/azureml_9d4fa30783fc98f2c7c7f19c6a312f30:latest
2021-01-19T09:18:23Z Check if container ebbc9dcb-3976-4d3f-ab08-c9a32ad1104f already exist exited with 0, 


Streaming azureml-logs/65_job_prep-tvmps_3b58b23ebb37469ac1f07c2ab41fbc7f43557fb2270f063578cd53530d107746_d.txt
[2021-01-19T09:18:35.748220] Entering job preparation.
[2021-01-19T09:18:36.588790] Starting job preparation.
[2021-01-19T09:18:36.588827] Extracting the control code.
[2021-01-19T09:18:36.616573] fetching and extracting the control code on master node


StepRun(train-step) Execution Summary
StepRun( train-step ) Status: Finished
{'runId': 'ebbc9dcb-3976-4d3f-ab08-c9a32ad1104f', 'target': 'cluster', 'status': 'Completed', 'startTimeUtc': '2021-01-19T09:17:43.049641Z', 'endTimeUtc': '2021-01-19T09:20:13.784114Z', 'properties': {'azureml.runsource': 'azureml.StepRun', 'ContentSnapshotId': '2194c1ae-958c-4883-bdf7-244de1662fd3', 'StepType': 'PythonScriptStep', 'ComputeTargetType': 'AmlCompute', 'azureml.moduleid': '585717ca-65f0-4175-97d5-cfdb742ad49e', 'azureml.nodeid': 'e4b00fa2', 'azureml.pipelinerunid': '8e15e619-e240-4c45-8c20-614c55517286', '_azureml.ComputeTargetType': 'amlcompute', 'ProcessInfoFile': 'azureml-logs/process_info.json', 'ProcessStatusFile': 'azureml-logs/process_status.json'}, 'inputDatasets': [{'dataset': {'id': '73b4c537-e008-4d3c-8770-055011622520'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'training_dataset', 'mechanism': 'Download'}}], 'outputDatasets': [], 'runDefinition': {'script': 'train.py',



PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '8e15e619-e240-4c45-8c20-614c55517286', 'status': 'Completed', 'startTimeUtc': '2021-01-19T09:07:54.522925Z', 'endTimeUtc': '2021-01-19T09:20:24.393796Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://demoentws5367325393.blob.core.windows.net/azureml/ExperimentRun/dcid.8e15e619-e240-4c45-8c20-614c55517286/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=KRc31AiqTFxT6jXINcTmmUxESMwXeU3OtDjLTiqz9t8%3D&st=2021-01-19T09%3A10%3A26Z&se=2021-01-19T17%3A20%3A26Z&sp=r', 'logs/azureml/stderrlogs.txt': 'https://demoentws5367325393.blob.core.windows.net/azureml/ExperimentRun/dcid.8e15e619-e240-4c45-8c20-614c55517286/logs/azureml/stderrlogs.txt?sv=2019-02-02&sr=b&sig=EmZ4byUrxlGHuPWXZmHjNY6clvxw0Ww8ohJkiW%2FFzwA%3D&st=2021-01-19T09%3A10%3A26Z&se=20

'Finished'

We can also publish the pipeline as a REST endpoint, so that it can be triggered without the SDK:

In [16]:
published_pipeline = pipeline.publish('mlops-training-pipeline')
published_pipeline

| Name | Id | Status | Endpoint |
|---|---|---|---|
| mlops-training-pipeline | 0d97563c-77e9-46a3-bf6d-47ca2d574946 | Active | REST Endpoint |
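
A published pipeline is started by sending an authenticated HTTP POST to its endpoint, with the target experiment name in the JSON body. The sketch below only builds such a request with the standard library and does not send it. The URL and token are placeholders (in this notebook the URL would come from `published_pipeline.endpoint`, and the token from your Azure AD login), and `build_trigger_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_trigger_request(endpoint_url, bearer_token, experiment_name):
    """Build (but do not send) the POST request that starts a published pipeline run."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder URL and token, for illustration only
request = build_trigger_request(
    endpoint_url="https://example.invalid/pipelines/placeholder-id",
    bearer_token="<AAD token>",
    experiment_name="mlops-workshop-pipelines",
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the payload before triggering a real (and billable) run.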


What if we want to continuously publish new pipelines, but keep them available under the same URL as the prior version? For this, we can use [`PipelineEndpoint`](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelineendpoint?view=azure-ml-py), which keeps multiple `PublishedPipeline`s behind a single endpoint URL. Its `default_version` setting determines which `PublishedPipeline` incoming requests are routed to.

In [17]:
from azureml.pipeline.core import PipelineEndpoint

endpoint_name = "mlops-training-pipeline-new"

# Try to find an endpoint with the name defined above.
# If it does not exist, create a new endpoint with that name and this pipeline as its default
try:
    pipeline_endpoint = PipelineEndpoint.get(workspace=ws, name=endpoint_name)
    # Add the new version and make it the default - only works with a PublishedPipeline
    pipeline_endpoint.add_default(published_pipeline)
except Exception:
    pipeline_endpoint = PipelineEndpoint.publish(workspace=ws,
                                                 name=endpoint_name,
                                                 pipeline=pipeline,
                                                 description="New Training Pipeline Endpoint")
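
To illustrate the idea of one stable URL in front of several pipeline versions, here is a toy in-memory model of the routing behaviour described above. It is not the SDK's `PipelineEndpoint` class: `ToyPipelineEndpoint`, its `resolve` method, the version labels, and the pipeline ids are all assumptions made for this sketch.

```python
class ToyPipelineEndpoint:
    """In-memory model of one stable endpoint routing to versioned pipelines."""

    def __init__(self, name):
        self.name = name
        self.versions = {}           # version label -> published pipeline id
        self.default_version = None  # version that unqualified requests go to

    def add_default(self, pipeline_id):
        """Register a new version and route future requests to it."""
        version = str(len(self.versions))
        self.versions[version] = pipeline_id
        self.default_version = version
        return version

    def resolve(self, version=None):
        """Return the pipeline id a request is routed to."""
        if not self.versions:
            raise LookupError("endpoint has no versions yet")
        if version is None:
            version = self.default_version
        return self.versions[version]


endpoint = ToyPipelineEndpoint("mlops-training-pipeline-new")
endpoint.add_default("pipeline-a")
endpoint.add_default("pipeline-b")

print(endpoint.resolve())     # pipeline-b
print(endpoint.resolve("0"))  # pipeline-a
```

Callers keep using the same endpoint name while each new publish moves the default, and older versions remain addressable for comparison or rollback.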
