# Build a Pipeline with Components from YAML

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../configuration.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2 installed - [install instructions](../../../README.md) - check the getting started section

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define and load a `CommandComponent` from YAML
- Create a `Pipeline` using the loaded component

**Motivations** - This notebook covers the scenario in which a user defines components in YAML and then uses those components to build a pipeline.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

In [2]:
# Import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import Input, MLClient, UserIdentityConfiguration, load_component
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.ai.ml.dsl import pipeline



## 1.2 Configure credential

We use `DefaultAzureCredential` to get access to the workspace. `DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios.

If it does not work for you, see these references for other available credentials: [configure credential example](../../configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).
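
One credential that `DefaultAzureCredential` can use is a service principal configured through environment variables. The sketch below only checks whether the three variables that `azure-identity` reads for a client-secret service principal are set; the helper name `missing_service_principal_vars` is an illustrative assumption, not part of any SDK.

```python
import os

# Variables read by azure-identity's EnvironmentCredential for a service
# principal that authenticates with a client secret.
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(environ=None):
    """Return the names of the required variables that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


missing = missing_service_principal_vars()
if missing:
    print("Service principal login is not configured; missing:", ", ".join(missing))
else:
    print("Service principal variables are set.")
```

If any variable is missing, other credential types such as the interactive browser login remain available.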

In [3]:
try:
    credential = DefaultAzureCredential()
    # Check whether the credential can obtain a token.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

## 1.3 Get a handle to the workspace

We use a config file to connect to a workspace. The Azure ML workspace should be configured with a compute cluster. [See this notebook to configure a workspace](../../configuration.ipynb).
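
For orientation, the config file is a small JSON document named `config.json` that holds the subscription ID, resource group, and workspace name. The following is a minimal sketch, using only the standard library, of writing and validating such a file; the helper names and the placeholder values are assumptions for illustration and should be replaced with your own workspace details.

```python
import json
import tempfile
from pathlib import Path

# Keys that a workspace config.json is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json into `directory` and return its path (hypothetical helper)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path


def load_workspace_config(path):
    """Read a config.json and fail early if a required key is missing or empty."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config.json is missing: {', '.join(missing)}")
    return config


# Demonstration in a temporary directory with placeholder values.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = write_workspace_config(
        tmp_dir, "<subscription-id>", "<resource-group>", "<workspace-name>"
    )
    print(load_workspace_config(config_path))
```

Validating the file before creating the client tends to give a clearer error than a failed workspace lookup later on.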

In [4]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Retrieve an already attached Azure Machine Learning Compute.
# Use your own cluster name here
cluster_name = "dscompute"
print(ml_client.compute.get(cluster_name))

Found the config file in: /config.json


enable_node_public_ip: true
id: /subscriptions/6025ba02-1dfd-407f-b358-88f811c7c7aa/resourceGroups/sc1-ml1/providers/Microsoft.MachineLearningServices/workspaces/sc1ml1/computes/dscompute
identity:
  principal_id: deb5e55d-5304-44b5-b972-8e1d82eb4e00
  tenant_id: 16b3c013-d300-468d-ac64-7eda0820b6d3
  type: system_assigned
idle_time_before_scale_down: 120
location: southcentralus
max_instances: 10
min_instances: 0
name: dscompute
provisioning_state: Succeeded
size: Standard_DS3_v2
ssh_public_access_enabled: false
tier: low_priority
type: amlcompute



# 2. Define and create components in the workspace
## 2.1 Load components from YAML

In [5]:
parent_dir = "."
read_component = load_component(source=parent_dir + "/read_adls.yml")


## 2.2 Inspect loaded component

In [6]:
# Print the component as yaml
print(read_component)

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: train_model
version: 0.0.1
display_name: Train Model
description: A dummy training component
type: command
inputs:
  training_data:
    type: uri_folder
outputs:
  model_output:
    type: uri_folder
command: 'python read_folder.py  --training_data ${{inputs.training_data}} '
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1
code: /mnt/batch/tasks/shared/LS_root/mounts/clusters/jacwang3/code/Users/jacwang/azureml-examples/sdk/python/jobs/pipelines/1a_read_example-copy/adls_src
is_deterministic: true
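
The component's `command` runs `read_folder.py` and passes the mounted input folder through `--training_data`. The script itself lives in `adls_src` and is not shown in this notebook, so the following is only a hypothetical sketch of what such a script could look like: it parses the argument and lists the files it finds. Note that the command above does not pass `model_output`, so this sketch does not write any output either.

```python
import argparse
import tempfile
from pathlib import Path


def list_files(folder):
    """Return the relative paths of all files under `folder`, sorted."""
    root = Path(folder)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def main(argv=None):
    """Parse --training_data and print the files found in that folder."""
    parser = argparse.ArgumentParser(description="List the files in a mounted folder.")
    parser.add_argument("--training_data", required=True, help="Path to the input folder")
    args = parser.parse_args(argv)

    files = list_files(args.training_data)
    print(f"Found {len(files)} file(s) under {args.training_data}")
    for name in files:
        print(name)
    return files


# Local smoke test: a temporary folder stands in for the mounted input.
# Inside the component, the script would call main() with no arguments
# so that argparse reads the real command line.
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "sample.csv").write_text("x\n")
    main(["--training_data", tmp_dir])
```

Running the script locally against a small folder like this is a quick way to catch argument or path errors before submitting the pipeline.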



In [7]:
# Inspect more information
print(type(read_component))
help(read_component._func)

<class 'azure.ai.ml.entities._component.command_component.CommandComponent'>
Help on function [component] Train Model:

[component] Train Model(*, training_data: 'uri_folder' = None)
    A dummy training component
    
    Component yaml:
    ```yaml
    $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
    name: train_model
    version: 0.0.1
    display_name: Train Model
    description: A dummy training component
    type: command
    inputs:
      training_data:
        type: uri_folder
    outputs:
      model_output:
        type: uri_folder
    command: 'python read_folder.py  --training_data ${{inputs.training_data}} '
    environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1
    code: /mnt/batch/tasks/shared/LS_root/mounts/clusters/jacwang3/code/Users/jacwang/azureml-examples/sdk/python/jobs/pipelines/1a_read_example-copy/adls_src
    is_deterministic: true
    
    ```



# 3. Sample pipeline job
## 3.1 Build pipeline

In [8]:
# Construct pipeline
@pipeline()
def pipeline_with_components_from_yaml(
    training_input,
):
    """E2E dummy train-score-eval pipeline with components defined via yaml."""
    # Call component obj as function: apply given inputs & parameters to create a node in pipeline
    read_with_sample_data = read_component(
        training_data=training_input,
    )

    # Return: pipeline outputs
    return {
        "trained_model": read_with_sample_data.outputs.model_output,
    }

In [10]:
# Get the registered data asset
data_asset = ml_client.data.get("root_folder", version="1")

pipeline_job = pipeline_with_components_from_yaml(
    training_input=Input(
        path=data_asset.id,
        type=AssetTypes.URI_FOLDER,
        mode=InputOutputModes.RO_MOUNT,
    )
)

# Set the pipeline-level default compute
pipeline_job.settings.default_compute = cluster_name
# Run the job under the submitting user's identity
pipeline_job.identity = UserIdentityConfiguration()
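
The cell above passes the full asset ID as the input path. As a hedged aside, Azure ML also accepts shorter path strings: `azureml:<name>:<version>` for a registered data asset and `azureml://datastores/<datastore>/paths/<path>` for a location in a datastore. The sketch below only builds those strings; the helper names and the datastore name `my_adls_datastore` are hypothetical.

```python
def data_asset_ref(name, version):
    """Build the short reference for a registered data asset."""
    return f"azureml:{name}:{version}"


def datastore_uri(datastore_name, relative_path):
    """Build a URI for a path inside a workspace datastore."""
    return f"azureml://datastores/{datastore_name}/paths/{relative_path.lstrip('/')}"


# Either string could be passed as Input(path=...) in place of data_asset.id.
print(data_asset_ref("root_folder", "1"))
# "my_adls_datastore" is a placeholder; use the name of a datastore in your workspace.
print(datastore_uri("my_adls_datastore", "/raw/2024/"))
```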

In [11]:
# Inspect built pipeline
print(pipeline_job)

Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


display_name: pipeline_with_components_from_yaml
description: E2E dummy train-score-eval pipeline with components defined via yaml.
type: pipeline
inputs:
  training_input:
    mode: ro_mount
    type: uri_folder
    path: azureml:/subscriptions/6025ba02-1dfd-407f-b358-88f811c7c7aa/resourceGroups/sc1-ml1/providers/Microsoft.MachineLearningServices/workspaces/sc1ml1/data/root_folder/versions/1
outputs:
  trained_model:
    type: uri_folder
jobs:
  read_with_sample_data:
    type: command
    inputs:
      training_data:
        path: ${{parent.inputs.training_input}}
    outputs:
      model_output: ${{parent.outputs.trained_model}}
    component:
      $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
      name: train_model
      version: 0.0.1
      display_name: Train Model
      description: A dummy training component
      type: command
      inputs:
        training_data:
          type: uri_folder
      outputs:
        model_output:
          typ

## 3.2 Submit pipeline job

In [12]:
# Submit pipeline job to workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="read_folder_pipeline_samples"
)
pipeline_job

Uploading adls_src (0.0 MBs):   0%|          | 0/1044 [00:00<?, ?it/s]
Uploading adls_src (0.0 MBs): 100%|██████████| 1044/1044 [00:00<00:00, 16416.37it/s]



| Experiment | Name | Type | Status | Details Page |
| --- | --- | --- | --- | --- |
| read_folder_pipeline_samples | gray_parang_tw6z1dj9ch | pipeline | Preparing | Link to Azure Machine Learning studio |


In [13]:
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)

RunId: gray_parang_tw6z1dj9ch
Web View: https://ml.azure.com/runs/gray_parang_tw6z1dj9ch?wsid=/subscriptions/6025ba02-1dfd-407f-b358-88f811c7c7aa/resourcegroups/sc1-ml1/workspaces/sc1ml1

Streaming logs/azureml/executionlogs.txt

[2024-03-06 22:44:59Z] Submitting 1 runs, first five are: 264eaf95:01da1ad6-c8a1-431a-9956-772c5caa2bb9
[2024-03-06 22:49:48Z] Execution of experiment failed, update experiment status and cancel running nodes.

Execution Summary
RunId: gray_parang_tw6z1dj9ch
Web View: https://ml.azure.com/runs/gray_parang_tw6z1dj9ch?wsid=/subscriptions/6025ba02-1dfd-407f-b358-88f811c7c7aa/resourcegroups/sc1-ml1/workspaces/sc1ml1


JobException: Exception : 
 {
    "error": {
        "code": "UserError",
        "message": "Pipeline has failed child jobs. Failed nodes: /read_with_sample_data. For more details and logs, please go to the job detail page and check the child jobs.",
        "message_format": "Pipeline has failed child jobs. {0}",
        "message_parameters": {},
        "reference_code": "PipelineHasStepJobFailed",
        "details": []
    },
    "environment": "southcentralus",
    "location": "southcentralus",
    "time": "2024-03-06T22:49:48.317053Z",
    "component_name": ""
} 
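
The error above only says that a child job failed. One way to find out which one is to list the children of the pipeline job and keep those that did not complete. The sketch below assumes a client that exposes `jobs.list(parent_job_name=...)`, as `MLClient` does, and child objects with `display_name`, `name`, and `status` attributes; the stand-in client and job names are made up so the example runs without a workspace.

```python
from types import SimpleNamespace


def failed_child_jobs(ml_client, parent_job_name):
    """Return (display_name, name, status) for each child job that did not complete."""
    children = ml_client.jobs.list(parent_job_name=parent_job_name)
    return [
        (child.display_name, child.name, child.status)
        for child in children
        if child.status != "Completed"
    ]


# Stand-in client with made-up child jobs, used here only for demonstration.
fake_children = [
    SimpleNamespace(display_name="read_with_sample_data", name="child_job_1", status="Failed"),
]
fake_client = SimpleNamespace(
    jobs=SimpleNamespace(list=lambda parent_job_name: fake_children)
)

for display_name, name, status in failed_child_jobs(fake_client, "parent_job"):
    print(f"{display_name} ({name}): {status}")
```

With a real workspace, you would pass `ml_client` and `pipeline_job.name` instead of the stand-ins, then open the failed child job in the studio or download its logs with `ml_client.jobs.download(name=..., all=True)`.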

# Next Steps
You can find further examples of running a pipeline job [here](../README.md).