# Introduction 

In this notebook we investigate the `Pipeline` object of `azureml-sdk`. One of the great things about a pipeline is that it can be scheduled to run periodically, for example to run a model for batch inference.

In [2]:
from azureml.core import Workspace, Environment, Experiment
from azureml.core.environment import CondaDependencies
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import PipelineData, Pipeline

In [3]:
ws = Workspace.from_config()

# Create an Environment 

In [4]:
myenv = Environment(name='env_azure_pipeline')
myenv_dep = CondaDependencies.create(conda_packages=['pandas', 'scikit-learn', 'pip'], 
                                    pip_packages=['azureml-defaults'])
myenv.python.conda_dependencies = myenv_dep
myenv.register(ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210806.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "env_azure_pipeline",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda-fo

# Compute target in the cloud 

In [5]:
compute_name = 'rk-test-compute'

compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', 
                                                       max_nodes=2)

rk_cluster = ComputeTarget.create(ws, compute_name, compute_config)
rk_cluster.wait_for_completion(show_output=True)

InProgress...
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [7]:
run_config = RunConfiguration()
run_config.target = rk_cluster
run_config.environment = myenv

# Pipeline and steps 

This is the new part. As in the previous notebook, we create an `Experiment` object and run the model through Python scripts. The difference is that this time we submit a `Pipeline` object, which consists of one or more steps.

##### To do: check the `OutputFileDatasetConfig` option as an alternative to `PipelineData`
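
As a starting point for that check, here is a minimal sketch of how the two steps in this notebook might look with `OutputFileDatasetConfig`. It assumes SDK v1, where the output object is passed to the producing step through `arguments` and to the consuming step through `.as_input()`. The function name and the datastore path `pipeline_demo/prepared` are hypothetical and not part of the original notebook.

```python
def build_steps_with_output_config(ws, input_ds, run_config):
    """Build the two pipeline steps with OutputFileDatasetConfig as the hand-over folder."""
    # Imported inside the function so that defining it does not require azureml-sdk.
    from azureml.data import OutputFileDatasetConfig
    from azureml.pipeline.steps import PythonScriptStep

    # Hypothetical destination path on the workspace's default datastore.
    datafolder = OutputFileDatasetConfig(
        name='datafolder',
        destination=(ws.get_default_datastore(), 'pipeline_demo/prepared'))

    # Step 01 writes into the folder it receives as --datafolder.
    preprocess_step = PythonScriptStep(
        name='01 data_preprocessing',
        source_directory='.',
        script_name='pipeline_data_prepare.py',
        arguments=['--datafolder', datafolder],
        inputs=[input_ds.as_named_input('raw_data')],
        runconfig=run_config)

    # Step 02 reads the same location, passed as an input.
    model_train_step = PythonScriptStep(
        name='02 model_training',
        source_directory='.',
        script_name='pipeline_model_training.py',
        arguments=['--datafolder', datafolder.as_input()],
        runconfig=run_config)

    return [preprocess_step, model_train_step]
```

The returned list could then be passed as `steps=` to `Pipeline(workspace=ws, ...)` in the same way as the `PipelineData` version.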

In [8]:
input_ds = ws.datasets.get('Irish Data')
datafolder = PipelineData('datafolder', datastore=ws.get_default_datastore())

# Step 01 - Data Preprocessing 
preprocess_step = PythonScriptStep(name='01 data_preprocessing', 
                                   source_directory='.', 
                                   script_name='pipeline_data_prepare.py', 
                                   arguments=['--datafolder', datafolder],
                                   inputs=[input_ds.as_named_input('raw_data')], 
                                   outputs=[datafolder],
                                   runconfig=run_config)

# Step 02 - Train Model 
model_train_step = PythonScriptStep(name='02 model_training', 
                                   source_directory='.', 
                                   script_name='pipeline_model_training.py', 
                                   arguments=['--datafolder', datafolder],
                                   inputs=[datafolder],
                                   runconfig=run_config)
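
The two scripts are not shown in this notebook. As a rough sketch of the hand-over between them, assume the preprocessing script writes a single CSV file into the folder it receives as `--datafolder` and the training script reads it back. The file name `prepared.csv` and the helper names are hypothetical. In a real step the raw data would typically come from the run context, for example `Run.get_context().input_datasets['raw_data']`, rather than from an in-memory list.

```python
import argparse
import csv
import os


def parse_args(argv=None):
    """Read the --datafolder argument that the pipeline step passes to the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--datafolder', type=str, required=True,
                        help='Folder shared between pipeline steps')
    return parser.parse_args(argv)


def save_prepared(rows, header, folder, filename='prepared.csv'):
    """Write the prepared rows into the shared folder and return the file path."""
    os.makedirs(folder, exist_ok=True)  # the folder may not exist yet
    path = os.path.join(folder, filename)
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def load_prepared(folder, filename='prepared.csv'):
    """Read the file back as a list of rows, as the training script might."""
    with open(os.path.join(folder, filename), newline='') as f:
        return list(csv.reader(f))
```

Both scripts would call `parse_args()` first. Because the same `datafolder` object is wired into both steps, the path each script receives refers to the same data.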

In [9]:
# Construct the pipeline 
train_pipeline = Pipeline(workspace=ws,
                          steps=[preprocess_step, model_train_step])

# Create the experiment 
experiment = Experiment(workspace=ws, name='exp-008')

# Run the experiment with pipeline
pipeline_run = experiment.submit(train_pipeline, 
                                 regenerate_outputs=True)
pipeline_run.wait_for_completion(show_output=True)

Created step 01 data_preprocessing [d1ec2c59][32b7e507-26a9-4df0-bee2-53a7cafba485], (This step will run and generate new outputs)
Created step 02 model_training [1b7eaa45][3b8d6047-7c93-46c7-906d-42bd24a6552a], (This step will run and generate new outputs)
Submitted PipelineRun 6a399762-9681-4083-8554-a236775bf31f
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/6a399762-9681-4083-8554-a236775bf31f?wsid=/subscriptions/038a8790-7ab1-483b-abba-30f101e8dcce/resourcegroups/aml-resources-mstutorial/workspaces/aml-mstutorial&tid=68fda48c-5b34-479d-91f9-034da6f0efe3
PipelineRunId: 6a399762-9681-4083-8554-a236775bf31f
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/6a399762-9681-4083-8554-a236775bf31f?wsid=/subscriptions/038a8790-7ab1-483b-abba-30f101e8dcce/resourcegroups/aml-resources-mstutorial/workspaces/aml-mstutorial&tid=68fda48c-5b34-479d-91f9-034da6f0efe3
PipelineRun Status: NotStarted
PipelineRun Status: Running


Expected a StepRun object but received <class 'azureml.core.run.Run'> instead.
This usually indicates a package conflict with one of the dependencies of azureml-core or azureml-pipeline-core.
Please check for package conflicts in your python environment

PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '6a399762-9681-4083-8554-a236775bf31f', 'status': 'Completed', 'startTimeUtc': '2021-10-05T18:21:03.63169Z', 'endTimeUtc': '2021-10-05T18:39:22.716214Z', 'services': {}, 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}', 'azureml.pipelineComponent': 'pipelinerun'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://amlmstutstorageaac522ec0.blob.core.windows.net/azureml/ExperimentRun/dcid.6a399762-9681-4083-8554-a236775bf31f/logs/azureml/executionlogs.txt?sv=2019-07-07&sr=b&sig=DnflbnItcu87o3wOUJ2crnjaWka1rfsI8bL9uPeDcAw%3D&skoid=e53d356d-d897-4c01-860f-b0c035994e1c&sktid=68fda48c-5b34-479d-91f9-034da6f0efe3&skt=2021-10-05T16%3A56%3A26Z&ske=2021-10-07T01%3A06%3A26Z&sks=b&skv=2019-07-07&st=2021-10-05T18%3A11%3A16Z&se=2021-10-06T02%3A21%3A16Z&sp=r', 'logs/azureml/stderrlogs.txt': 'https://amlmstut

'Finished'

# Publish pipeline 

Publishing the pipeline gives us a REST endpoint through which it can be triggered.

In [31]:
# The run retrieved here looks the same as the pipeline_run created in this notebook,
# but using it in the following cells raises an error.
pipeline_exp = ws.experiments.get('exp-008')
list(pipeline_exp.get_runs())[0]

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| exp-008 | 6300c1b3-af44-4977-a5c6-3f95ea032b03 | azureml.PipelineRun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


In [21]:
pipeline_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| exp-008 | 6a399762-9681-4083-8554-a236775bf31f | azureml.PipelineRun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


In [19]:
published_pipeline = pipeline_run.publish_pipeline(name='training_pipeline', 
                                                   description='Pipeline demo', 
                                                   version='1.0')
published_pipeline

| Name | Id | Status | Endpoint |
| --- | --- | --- | --- |
| training_pipeline | f6aea66a-750e-458f-a078-df8d2addefdc | Active | REST Endpoint |


In [20]:
rest_endpoint = published_pipeline.endpoint
rest_endpoint

'https://eastus.api.azureml.ms/pipelines/v1.0/subscriptions/038a8790-7ab1-483b-abba-30f101e8dcce/resourceGroups/aml-resources-mstutorial/providers/Microsoft.MachineLearningServices/workspaces/aml-mstutorial/PipelineRuns/PipelineSubmit/f6aea66a-750e-458f-a078-df8d2addefdc'

In [24]:
published_pipeline.id

'f6aea66a-750e-458f-a078-df8d2addefdc'

# Using published pipeline 

In [23]:
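import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Get an Azure AD bearer-token header for the REST call
auth_header = InteractiveLoginAuthentication().get_authentication_header()

response = requests.post(rest_endpoint, 
                         headers=auth_header,  # the keyword is `headers`, not `header`
                         json={'ExperimentName':'exp-008'})
response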
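
The `auth_header` passed to the request is a plain dictionary carrying an Azure AD bearer token. In SDK v1 it is commonly obtained with `InteractiveLoginAuthentication().get_authentication_header()` from `azureml.core.authentication`. The sketch below uses only the standard library to show the shape of the request without sending it. The URL and token are placeholders, and `build_submit_request` is a hypothetical helper, not part of `azureml-sdk`.

```python
import json
import urllib.request


def build_submit_request(endpoint, auth_header, experiment_name):
    """Build (but do not send) the POST request that triggers a published pipeline."""
    body = {'ExperimentName': experiment_name}
    headers = dict(auth_header)  # copy so the caller's dict is not modified
    headers['Content-Type'] = 'application/json'
    return urllib.request.Request(endpoint,
                                  data=json.dumps(body).encode('utf-8'),
                                  headers=headers,
                                  method='POST')


# Placeholder values only: a real header comes from an azureml authentication object.
request = build_submit_request('https://example.invalid/PipelineSubmit/some-id',
                               {'Authorization': 'Bearer <token>'},
                               'exp-008')
```

Sending the request with `urllib.request.urlopen(request)` or with `requests.post(...)` as in the cell above should be equivalent. The point is only that the endpoint expects an authenticated JSON POST naming the experiment.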

# Schedule a pipeline 

We can schedule a pipeline to run at a regular interval. This is one of the key steps in deploying a model for batch inference.

In [10]:
from azureml.pipeline.core import ScheduleRecurrence, Schedule

In [25]:
hourly = ScheduleRecurrence(frequency='Hour', interval=1)
pipeline_schedule = Schedule.create(workspace=ws, 
                                   name='trains model hourly', 
                                   pipeline_id=published_pipeline.id, 
                                   experiment_name='exp-008', 
                                   recurrence=hourly)
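
Recurrences other than hourly follow the same pattern. As a sketch, assuming a daily run at 02:00 is wanted, the helper below builds such a schedule with the `hours` and `minutes` arguments of `ScheduleRecurrence`. The helper name, the schedule name and the chosen time are hypothetical.

```python
def create_daily_schedule(ws, pipeline_id, experiment_name):
    """Schedule a published pipeline to run once a day at 02:00."""
    # Imported inside the function so that defining it does not require azureml-sdk.
    from azureml.pipeline.core import Schedule, ScheduleRecurrence

    daily = ScheduleRecurrence(frequency='Day', interval=1,
                               hours=[2], minutes=[0])
    return Schedule.create(workspace=ws,
                           name='trains model daily',
                           pipeline_id=pipeline_id,
                           experiment_name=experiment_name,
                           recurrence=daily)
```

It would be called as `create_daily_schedule(ws, published_pipeline.id, 'exp-008')`.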

In [26]:
pipeline_schedule

| Name | Id | Status | Pipeline Id | Pipeline Endpoint Id | Recurrence Details |
| --- | --- | --- | --- | --- | --- |
| trains model hourly | 08da9f2d-8811-42b8-8180-4c77c14feca3 | Active | f6aea66a-750e-458f-a078-df8d2addefdc |  | Runs every Hour |


# Disable/Enable schedule 

In [34]:
pipeline_schedule.disable()

```python
pipeline_schedule.disable()
pipeline_schedule.enable()
```

# Delete compute cluster 

How can we delete the cluster between scheduled runs?

```python 
cluster.delete()
```

In [33]:
rk_cluster.delete()