Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

# AML Pipeline with ScopeStep
This notebook demonstrates how to use ScopeStep in an Azure Machine Learning (AML) pipeline.

## Initialize Workspace

Initialize a workspace object from the persisted configuration. Make sure the config file is present at `.\config.json`.
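Before calling `Workspace.from_config()`, it can help to confirm that the config file exists and carries the expected entries. The sketch below is a minimal check that uses only the standard library. The helper name `missing_config_keys` is hypothetical, and the key list assumes the usual `config.json` layout (`subscription_id`, `resource_group`, `workspace_name`).

```python
import json
from pathlib import Path

# Assumption: these are the keys a workspace config.json normally carries.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(path="config.json"):
    """Return the required keys that are absent from the config file.

    Raises FileNotFoundError if the file does not exist.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Workspace config not found: {config_path}")
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    # Keep the order of REQUIRED_KEYS so the report is stable.
    return [key for key in REQUIRED_KEYS if key not in config]
```

An empty list means all expected keys are present; any other result names the entries to add before loading the workspace.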

In [None]:
import azureml.core
from azureml.core import Workspace, Run, Experiment
from azureml.core.compute import ComputeTarget, DataFactoryCompute
from azureml.core.datastore import Datastore
from azureml.data.data_reference import DataReference
from azureml.exceptions import ComputeTargetException
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import AdlaStep, AzureBatchStep, DataTransferStep
from azureml.pipeline.steps_internal import ScopeStep

print("SDK version:", azureml.core.VERSION)

In [None]:
from azureml.core import Workspace

# Development note: this notebook was run against a new workspace in eastus2euap;
# change the config.json settings back to master afterwards.

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

## Create AML experiment

In [None]:
from azureml.core import Experiment

exp = Experiment(ws, 'sample_experiment')

## Copy scope script to script folder

In [None]:
import os
import shutil

script_folder = './scripts'
os.makedirs(script_folder, exist_ok=True)

# Copy the scope script into the script folder; uncomment the line for the script you want to run.
#shutil.copy('./failed/script.script', script_folder)
#shutil.copy('./working/script.script', script_folder)
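The copy step can also be wrapped in a small helper that fails early when the source script is missing. This is a minimal sketch; the function name `copy_script` is hypothetical and not part of the SDK.

```python
import os
import shutil


def copy_script(script_path, script_folder="./scripts"):
    """Copy one script file into script_folder and return the new path."""
    if not os.path.isfile(script_path):
        raise FileNotFoundError(f"Scope script not found: {script_path}")
    # Create the target folder if it does not exist yet.
    os.makedirs(script_folder, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return shutil.copy(script_path, script_folder)
```

For example, `copy_script('./working/script.script', script_folder)` would place the script where the step's `source_directory` points.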

## Register the migrated ADLS Datastore
For this, you will first need to assign the Azure AD application to the Azure Data Lake Storage Gen1 account file or folder. This is detailed in [this article](https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory).

In [None]:
adl_datastore_name='MigratedADLS2'

adls_datastore = Datastore.get(ws, adl_datastore_name)
print("found datastore with name: %s" % adl_datastore_name)


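# To register the datastore when it does not exist yet, use this block instead.
# Define subscription_id, resource_group, store_name, tenant_id, client_id and client_secret first.
# try:
#     adls_datastore = Datastore.get(ws, adl_datastore_name)
#     print("found datastore with name: %s" % adl_datastore_name)
# except Exception: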
#     adls_datastore = Datastore.register_azure_data_lake(
#         workspace=ws,
#         datastore_name=adl_datastore_name,
#         subscription_id=subscription_id, # subscription id of ADLS account
#         resource_group=resource_group, # resource group of ADLS account
#         store_name=store_name, # ADLS account name
#         tenant_id=tenant_id, # tenant id of service principal
#         client_id=client_id, # client id of service principal
#         client_secret=client_secret) # the secret of service principal
#     print("registered datastore with name: %s" % adl_datastore_name)

## Create data references

In [None]:
input_data = DataReference(
    datastore=adls_datastore,
    data_reference_name="InputData",
    #path_on_datastore="local/temp/juwang/input.tsv")
    path_on_datastore="local/AMLTest/input3.tsv")

output_ref = PipelineData("Destination", datastore=adls_datastore)

## Create Scope step

**ScopeStep** runs a Scope script on a Cosmos-migrated Azure Data Lake Analytics (ADLA) account.

- **name:** Name of module
- **script_name:** Name of scope script
- **scope_param:** Parameters to pass to scope job
- **params:** Dictionary of name-value pairs to replace in script *(optional)*
- **custom_job_name_suffix:** Optional string to append to scope job name
- **inputs:** List of input port bindings
- **outputs:** List of output port bindings
- **resources:** List of input port bindings whose resource files are downloaded and whose local paths are substituted into the script
- **adla_account_name:** The ADLA account name to use for this job
- **source_directory:** Folder that contains the script, assemblies, etc. *(optional)*
- **hash_paths:** List of paths to hash to detect a change (the script file is always hashed) *(optional)*

In [None]:
script_step = ScopeStep(
    name='Another_Script_4',
    script_name='script.script',
    inputs=[input_data],
    outputs=[output_ref],
    adla_account_name='searchrelevance-aether-test-c09',  # ADLA account name; any ADLA account can be used
    allow_reuse=False,
    source_directory=script_folder)

## Run pipeline

In [None]:
pipeline = Pipeline(
    description="Scope Script Alone 3",
    workspace=ws, 
    steps=[script_step])

In [None]:
pipeline_run = exp.submit(pipeline)
# Uncomment to block until the pipeline run completes.
#pipeline_run.wait_for_completion()
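If blocking with `wait_for_completion()` is not suitable, for example when a custom timeout is needed, the run status can be polled instead. The sketch below is a generic polling loop, not SDK code: `wait_for_status` is a hypothetical helper, and the set of terminal state names is an assumption to check against the status strings your run actually reports.

```python
import time

# Assumption: state names that mean the run will not change any further.
TERMINAL_STATES = {"Finished", "Failed", "Canceled"}


def wait_for_status(get_status, poll_seconds=30.0, timeout_seconds=3600.0,
                    sleep=time.sleep):
    """Poll get_status() until it returns a terminal state.

    Raises TimeoutError once the accumulated wait reaches timeout_seconds.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Run still in state {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With a submitted run, this could be called as `wait_for_status(pipeline_run.get_status)`. The `sleep` argument is injectable so the loop can be exercised without real delays.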

In [None]:
from azureml.widgets import RunDetails
RunDetails(pipeline_run).show()