Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

# How to Publish a Pipeline and Invoke the REST endpoint
This notebook shows how to publish a pipeline and then invoke its REST endpoint.

## Prerequisites and AML Basics
Make sure you go through the [00.configuration](../01.tutorials/00.configuration.ipynb) notebook first if you haven't already.

### Initialization Steps

In [None]:
import azureml.core
from azureml.core import Workspace, Run, Experiment, Datastore
from azureml.core.compute import BatchAiCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute import DataFactoryCompute
from azureml.train.widgets import RunDetails

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

from azureml.data.data_reference import DataReference
from azureml.pipeline.core import Pipeline, PipelineData, StepSequence
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.steps import DataTransferStep
from azureml.pipeline.core import PublishedPipeline
from azureml.pipeline.core.graph import PipelineParameter

print("Pipeline SDK-specific imports completed")

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

# Default datastore (Azure file storage)
def_data_store = ws.get_default_datastore() 
print("Default datastore's name: {}".format(def_data_store.name))

def_blob_store = Datastore(ws, "workspaceblobstore")
print("Blobstore's name: {}".format(def_blob_store.name))

# project folder
project_folder = './scripts'

### Compute Targets
#### Retrieve an already attached BatchAI cluster

In [None]:
# choose a name for your cluster
batchai_cluster_name = "batchai"

# Note: batch_ai holds the name of the compute target, not the compute target object
batch_ai = None
# see if this compute target already exists in the workspace
cts = ws.compute_targets
if cts is not None:
    for ct in cts:
        if ct == batchai_cluster_name:
            batch_ai = ct
            break

if batch_ai is None:
    print("No compute target '{}' found. No worries, we will create one in the next call.".format(batchai_cluster_name))
else:
    print(batch_ai)

#### Create and attach a BatchAI cluster if one doesn't already exist
**See the previous notebook for details.**

#### Attach a DSVM cluster as a compute target

In [None]:
# YOU MAY SKIP THIS BLOCK
from azureml.core.compute import DsvmCompute
# cpu dsvm
dsvm_name = "cpudsvm"
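dsvm = None
try:
    dsvm = DsvmCompute(ws, dsvm_name)
    print("found existing dsvm.")
    print(dsvm)
except Exception:
    # The lookup failed, e.g. no target with this name is attached to the workspace
    print("See the previous notebook to understand how to create a DSVM target")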

## Building Pipeline Steps with Inputs and Outputs
As mentioned earlier, a step in the pipeline can take data as input. This data can be a data source that lives in one of the accessible data locations, or intermediate data produced by a previous step in the pipeline.

In [None]:
# Reference the data uploaded to blob storage using DataReference
# Assign the datasource to blob_input_data variable
blob_input_data = DataReference(
    datastore=def_blob_store,
    data_reference_name="test_data",
    path_on_datastore="20newsgroups/20news.pkl")
print("DataReference object created")

In [None]:
# Define intermediate data using PipelineData
processed_data1 = PipelineData("processed_data1",datastore=def_blob_store)
print("PipelineData object created")

#### Define a Step that consumes a datasource and produces intermediate data.
This step reads the datasource defined above (`blob_input_data`) and writes its result to the intermediate data `processed_data1`.

**Open `train.py` on your local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**
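
To make the link between the step's `arguments` list and the script concrete, here is a minimal sketch of how a script like `train.py` might parse them. The real `train.py` is not shown in this notebook, so the function below is hypothetical. Only the two argument names are taken from the step definition, and the paths in the usage lines are made up for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the two arguments that trainStep passes to the script.

    Hypothetical sketch: the actual contents of train.py may differ.
    """
    parser = argparse.ArgumentParser(description="Sketch of train.py argument handling")
    # The names must match the strings used in the step's `arguments` list
    parser.add_argument("--input_data", type=str, required=True,
                        help="path to the input datasource")
    parser.add_argument("--output_train", type=str, required=True,
                        help="path where the step writes its output")
    return parser.parse_args(argv)


# Example invocation with made-up paths; at run time the pipeline supplies the real ones
args = parse_args(["--input_data", "/tmp/in/20news.pkl", "--output_train", "/tmp/out"])
print("Input path:", args.input_data)
print("Output path:", args.output_train)
```

If the names in the step's `arguments` list and the names declared by the parser drift apart, a script written this way exits with an argument error, which is why the names matter.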

In [None]:
# trainStep consumes the datasource (DataReference) defined above
# and produces processed_data1
trainStep = PythonScriptStep(
    script_name="train.py",
    arguments=["--input_data", blob_input_data, "--output_train", processed_data1],
    inputs=[blob_input_data],
    outputs=[processed_data1],
    target=batch_ai, 
    source_directory=project_folder
)
print("trainStep created")

#### Define a Step that consumes intermediate data and produces intermediate data
This step consumes the intermediate data produced by `trainStep` (`processed_data1`) and produces new intermediate data (`processed_data2`).

**Open `extract.py` on your local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**

In [None]:
# extractStep uses the intermediate data produced by trainStep
# This step also produces an output, processed_data2
processed_data2 = PipelineData("processed_data2", datastore=def_blob_store)

extractStep = PythonScriptStep(
    script_name="extract.py",
    arguments=["--input_extract", processed_data1, "--output_extract", processed_data2],
    inputs=[processed_data1],
    outputs=[processed_data2],
    target=batch_ai, 
    source_directory=project_folder)
print("extractStep created")

#### Define a Step that consumes multiple intermediate data and produces intermediate data
This step consumes two pieces of intermediate data (`processed_data1` and `processed_data2`) and produces a third (`processed_data3`).

### PipelineParameter

This step also takes a [PipelineParameter](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.graph.pipelineparameter?view=azure-ml-py) argument, which lets callers supply a value when invoking the REST endpoint of the published pipeline.

In [None]:
# We will use this parameter later, when invoking the published pipeline
pipeline_param = PipelineParameter(name="pipeline_arg", default_value=10)
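
The parameter above has a default value of 10, and the REST call later in this notebook assigns 45 to it. As a rough mental model only, the effective value for a run can be thought of as the defaults overlaid with the caller's assignments. The helper `resolve_parameters` below is hypothetical and is not part of the Azure ML SDK.

```python
def resolve_parameters(defaults, assignments=None):
    """Return the effective parameter values for one run.

    Hypothetical helper for illustration; not an Azure ML SDK function.
    """
    effective = dict(defaults)            # start from the declared defaults
    effective.update(assignments or {})   # caller-supplied values take precedence
    return effective


defaults = {"pipeline_arg": 10}

default_run = resolve_parameters(defaults)
overridden_run = resolve_parameters(defaults, {"pipeline_arg": 45})

print(default_run)     # {'pipeline_arg': 10}
print(overridden_run)  # {'pipeline_arg': 45}
```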

**Open `compare.py` on your local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.**

In [None]:
# Now define compareStep, which takes two inputs (both intermediate data) and produces an output
processed_data3 = PipelineData("processed_data3", datastore=def_blob_store)

compareStep = PythonScriptStep(
    script_name="compare.py",
    arguments=["--compare_data1", processed_data1, "--compare_data2", processed_data2, "--output_compare", processed_data3, "--pipeline_param", pipeline_param],
    inputs=[processed_data1, processed_data2],
    outputs=[processed_data3],    
    target=batch_ai, 
    source_directory=project_folder)
print("compareStep created")

#### Build the pipeline

In [None]:
pipeline1 = Pipeline(workspace=ws, steps=[compareStep])
print("Pipeline is built")

pipeline1.validate()
print("Simple validation complete") 
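
Only `compareStep` is passed to the pipeline above, yet its inputs are produced by the other two steps. The sketch below illustrates, in plain Python, how an execution order can be derived from each step's declared inputs and outputs. It is a conceptual model using the names from this notebook, not the SDK's actual graph-building code, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Inputs and outputs of each step, as declared in this notebook
steps = {
    "trainStep": {"inputs": ["test_data"], "outputs": ["processed_data1"]},
    "extractStep": {"inputs": ["processed_data1"], "outputs": ["processed_data2"]},
    "compareStep": {"inputs": ["processed_data1", "processed_data2"],
                    "outputs": ["processed_data3"]},
}

# Map each piece of intermediate data to the step that produces it
producer = {out: name for name, spec in steps.items() for out in spec["outputs"]}

# A step depends on the producers of its inputs; datasources such as
# "test_data" have no producing step and therefore add no dependency
dependencies = {
    name: {producer[data] for data in spec["inputs"] if data in producer}
    for name, spec in steps.items()
}

execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)  # ['trainStep', 'extractStep', 'compareStep']
```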

## Publish the pipeline

In [None]:
published_pipeline1 = pipeline1.publish(name="My New Pipeline", description="My Published Pipeline Description")
print(published_pipeline1.id)

### Run published pipeline using its REST endpoint

In [None]:
from azureml.core.authentication import AzureCliAuthentication
import requests

cli_auth = AzureCliAuthentication()
aad_token = cli_auth.get_authentication_header()

rest_endpoint1 = published_pipeline1.endpoint

print(rest_endpoint1)

# specify the param when running the pipeline
response = requests.post(rest_endpoint1, 
                         headers=aad_token, 
                         json={"ExperimentName": "My_Pipeline1",
                               "RunSource": "SDK",
                               "ParameterAssignments": {"pipeline_arg": 45}})
# Fail early with a clear HTTP error instead of a KeyError on "Id"
response.raise_for_status()
run_id = response.json()["Id"]

print(run_id)
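
When the endpoint is called from several places, it can help to build the request body in one function. The sketch below only assembles the JSON payload and makes no network call. The field names are the ones used in the cell above, while `build_run_request` itself is a hypothetical helper, not an SDK function.

```python
import json


def build_run_request(experiment_name, parameter_assignments):
    """Assemble the JSON body for a published-pipeline run request.

    Hypothetical helper; the field names mirror the request shown above.
    """
    return {
        "ExperimentName": experiment_name,
        "RunSource": "SDK",
        # Copy the mapping so later changes by the caller do not alter the body
        "ParameterAssignments": dict(parameter_assignments),
    }


body = build_run_request("My_Pipeline1", {"pipeline_arg": 45})
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, together with the authentication header obtained above.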

# Next: Data Transfer
The next [notebook](./04.datatransfer-between-adls-and-blob.ipynb) will showcase data transfer steps between ADLS and Blob storage.