# Build pipeline with registered components

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../configuration.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2 installed - [install instructions](../../../README.md) (see the getting started section)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define a `CommandComponent` using YAML or the `command_component` decorator
- Register components in the workspace
- Build a `Pipeline` from registered components

**Motivations** - This notebook explains different methods for creating components with the SDK and then using those components to build a pipeline.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

In [45]:
# Import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component

## 1.2 Configure credential

We use `DefaultAzureCredential` to access the workspace. `DefaultAzureCredential` should handle most Azure SDK authentication scenarios.

If it does not work for you, see these references for other available credentials: [configure credential example](../../configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [46]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

## 1.3 Get a handle to the workspace

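You can connect to the workspace in two ways. One is a config file, via the commented-out `MLClient.from_config` call. The other is to pass the subscription ID, resource group, and workspace name directly, as the cell below does. The Azure ML workspace should have a compute cluster configured. [See this notebook to configure a workspace](../../configuration.ipynb)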

In [47]:
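# Get a handle to the workspace
# Option 1: read the workspace details from a config.json file
# ml_client = MLClient.from_config(credential=credential)

# Option 2: pass the workspace details explicitly (fill in your own values)
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(credential, subscription_id, resource_group, workspace)

# Name of an existing compute cluster attached to the workspace (replace with yours)
cluster_name = "cpu-cluster"
# print(ml_client.compute.get(cluster_name))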
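The commented-out `MLClient.from_config(credential=credential)` call reads the workspace details from a `config.json` file. That file holds `subscription_id`, `resource_group`, and `workspace_name` keys, and it can be downloaded from the workspace overview in the Azure portal. The sketch below uses only the standard library. It reads and validates such a file explicitly, so missing values fail early with a clear message. `read_workspace_config` is a hypothetical helper and not part of the SDK. It also only looks at the path you give it, unlike `from_config`, which searches for the file itself.

```python
import json
from pathlib import Path

# Keys expected in an Azure ML workspace config.json, e.g.
# {"subscription_id": "...", "resource_group": "...", "workspace_name": "..."}
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def read_workspace_config(path="config.json"):
    """Return (subscription_id, resource_group, workspace_name) from a config file.

    Hypothetical helper: raises ValueError if any required key is missing or empty.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"{path} is missing required keys: {', '.join(missing)}")
    return config["subscription_id"], config["resource_group"], config["workspace_name"]


# Possible usage (not executed here):
# subscription_id, resource_group, workspace = read_workspace_config()
# ml_client = MLClient(credential, subscription_id, resource_group, workspace)
```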

# 2. Define and create components in the workspace
## 2.1 Load component definitions from YAML and register them in the workspace

#### data extraction

In [48]:
parent_dir = "."

In [49]:
data_extraction = load_component(source=parent_dir + "/data_extraction.yml")

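try:
    # Reuse the component version already registered in the workspace, if any
    data_extraction = ml_client.components.get(name="data_extraction", version="1.0.1")
except Exception:
    # Not registered yet: register the definition loaded from YAML
    data_extraction = ml_client.components.create_or_update(data_extraction)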


print(data_extraction)

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: data_extraction
version: 1.0.1
display_name: data_extraction
type: command
inputs:
  blobname:
    type: string
outputs:
  output:
    type: uri_folder
command: python form-recognizer.py  --blobname ${{inputs.blobname}} --output ${{outputs.output}}
environment: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/environments/docker-image-for-pair-matching/versions/6
code: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/codes/af84d5e7-8a70-4422-acf6-f45fa90dfeec/versions/1
resources:
  instance_count: 1
tags: {}
is_deterministic: true
id: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/components/data_extraction/versions/1.0
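The load, get, and register steps repeat for every component in this section, so they can be factored into one helper. The sketch below is hypothetical (`get_or_register` is not an SDK function). It assumes only the `ml_client.components.get(name=..., version=...)` and `ml_client.components.create_or_update(...)` calls already used in this notebook, and it loads the YAML only when registration is actually needed. It catches `Exception` to match the cell above. Catching the narrower `azure.core.exceptions.ResourceNotFoundError`, which we assume is what `get` raises for a missing version, would avoid hiding unrelated errors such as authentication failures. Also keep `version` in sync with the `version` field in the YAML. Otherwise every run looks up one version but registers another.

```python
def get_or_register(ml_client, load_fn, yaml_path, name, version):
    """Return component `name` at `version`, registering it from YAML if absent.

    Hypothetical helper. `load_fn` is expected to behave like
    azure.ai.ml.load_component (called as load_fn(source=path)).
    """
    try:
        # Reuse the version already registered in the workspace, if any
        return ml_client.components.get(name=name, version=version)
    except Exception:
        # Not found: load the local YAML definition and register it
        component = load_fn(source=yaml_path)
        return ml_client.components.create_or_update(component)


# Possible usage in this notebook (not executed here):
# data_extraction = get_or_register(
#     ml_client, load_component, parent_dir + "/data_extraction.yml",
#     "data_extraction", "1.0.1",
# )
```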

#### data enrichment

In [50]:
data_enrichment = load_component(source=parent_dir + "/data_enrichment.yml")

try:
    data_enrichment = ml_client.components.get(name="data_enrichment", version="1.7")
except Exception:
    data_enrichment = ml_client.components.create_or_update(data_enrichment)


print(data_enrichment)

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: data_enrichment
version: '1.7'
display_name: data_enrichment
description: Enrich data based on Bing API
type: command
inputs:
  file_path_in:
    type: uri_folder
outputs:
  file_path_out:
    type: uri_folder
command: python bing-enrichment.py  --file_path_in ${{inputs.file_path_in}} --file_path_out
  ${{outputs.file_path_out}}
environment: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/environments/docker-image-for-pair-matching/versions/6
code: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/codes/c7e7c783-4537-42a2-94dd-d94a77ad68c1/versions/1
resources:
  instance_count: 1
tags: {}
is_deterministic: true
id: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.M
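Each component's `command` uses `${{inputs.<name>}}` and `${{outputs.<name>}}` placeholders. When the job runs, these are resolved to the bound parameter values or data paths. The printed YAML wraps the `data_enrichment` command across two lines, but it is a single command string. The sketch below is a simplified illustration of that binding step and not the service's implementation. `render_command` is a hypothetical name, and the example paths are made up.

```python
import re

# Matches ${{inputs.name}} or ${{outputs.name}}, tolerating inner whitespace
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def render_command(template, inputs, outputs):
    """Replace ${{inputs.x}} / ${{outputs.y}} placeholders with concrete values.

    Raises KeyError if the template references a name with no bound value.
    """
    def substitute(match):
        kind, name = match.group(1), match.group(2)
        values = inputs if kind == "inputs" else outputs
        if name not in values:
            raise KeyError(f"no value bound for {kind}.{name}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "python bing-enrichment.py  --file_path_in ${{inputs.file_path_in}} "
    "--file_path_out ${{outputs.file_path_out}}"
)
print(render_command(template, {"file_path_in": "/mnt/in"}, {"file_path_out": "/mnt/out"}))
```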

In [51]:
openai_enrichment = load_component(source=parent_dir + "/data_enrichment_openAI.yml")

try:
    openai_enrichment = ml_client.components.get(name="openai_enrichment", version="1.1.5")
except Exception:
    openai_enrichment = ml_client.components.create_or_update(openai_enrichment)


print(openai_enrichment)

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: openai_enrichment
version: 1.1.5
display_name: openai_enrichment
description: Enrich and categorize data based on OpenAI API
type: command
inputs:
  file_path_in:
    type: uri_folder
  gpt_engine:
    type: string
  temperature_clasify:
    type: number
    default: '0.9'
  max_tokens_clasify:
    type: integer
    default: '100'
  temperature_summarize:
    type: number
    default: '0.9'
  max_token_summarize:
    type: integer
    default: '500'
  top_p:
    type: number
    default: '1.0'
  frequency_penalty:
    type: number
    default: '0.5'
  presence_penalty:
    type: number
    default: '0.5'
  best_of:
    type: number
    default: '1.0'
outputs:
  file_path_out:
    type: uri_folder
command: python data_enrichment_openAI.py  --file_path_in ${{inputs.file_path_in}}
  --gpt_engine ${{inputs.gpt_engine}} --file_path_out ${{outputs.file_path_out}}
environment: azureml:/subscriptions/aa18b0
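As printed, the `openai_enrichment` command passes only `file_path_in`, `gpt_engine`, and the output path to `data_enrichment_openAI.py`. The other declared inputs, such as `temperature_clasify`, `top_p`, and `best_of`, have defaults but are not forwarded to the script by this command. If the script is meant to use them, they would need to be added to `command` in the YAML. The sketch below flags such gaps by comparing declared input names against the `${{inputs.<name>}}` references in a command string. `unreferenced_inputs` is a hypothetical helper, and the values in the usage lines are copied from the printed spec.

```python
import re

# Matches ${{inputs.name}} references in a component command string
INPUT_REF = re.compile(r"\$\{\{\s*inputs\.(\w+)\s*\}\}")


def unreferenced_inputs(declared_inputs, command):
    """Return declared input names never referenced by `command`, in declaration order."""
    referenced = set(INPUT_REF.findall(command))
    return [name for name in declared_inputs if name not in referenced]


declared = [
    "file_path_in", "gpt_engine", "temperature_clasify", "max_tokens_clasify",
    "temperature_summarize", "max_token_summarize", "top_p",
    "frequency_penalty", "presence_penalty", "best_of",
]
command = (
    "python data_enrichment_openAI.py  --file_path_in ${{inputs.file_path_in}} "
    "--gpt_engine ${{inputs.gpt_engine}} --file_path_out ${{outputs.file_path_out}}"
)
print(unreferenced_inputs(declared, command))
```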

In [52]:
data_search = load_component(source=parent_dir + "/data_search.yml")

try:
    data_search = ml_client.components.get(name="cog_search", version="1.0.2")
except Exception:
    data_search = ml_client.components.create_or_update(data_search)


print(data_search)


$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: cog_search
version: 1.0.2
display_name: cog_search
description: Search for similar products from Cognitive Search
type: command
inputs:
  file_path_in:
    type: uri_folder
outputs:
  file_path_out:
    type: uri_folder
command: python data_searhing.py --file_path_in ${{inputs.file_path_in}} --file_path_out
  ${{outputs.file_path_out}}
environment: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/environments/docker-image-for-pair-matching/versions/15
code: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/codes/6ce5cd74-d204-4617-85dd-c5de3d93f609/versions/1
resources:
  instance_count: 1
tags: {}
is_deterministic: true
id: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Mic

In [53]:
data_notification = load_component(source=parent_dir + "/data_notification.yml")

try:
    data_notification = ml_client.components.get(name="teams_notification", version="1.0.3")
except Exception:
    data_notification = ml_client.components.create_or_update(data_notification)


print(data_notification)


Uploading data-teams-webhook-src (0.0 MBs): 100%|██████████| 4327/4327 [00:00<00:00, 132705.13it/s]

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: teams_notification
version: 1.0.3
display_name: teams_notification
description: Send results to teams channel
type: command
inputs:
  file_path_in:
    type: uri_folder
outputs:
  file_path_out:
    type: uri_folder
command: python data-teams-webhook.py --file_path_in ${{inputs.file_path_in}} --file_path_out
  ${{outputs.file_path_out}}
environment: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/environments/docker-image-for-pair-matching/versions/15
code: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/codes/98bc6e47-75d7-4dcc-a3cc-3a9a441d116c/versions/1
resources:
  instance_count: 1
tags: {}
is_deterministic: true
id: azureml:/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Mi

##### Check that the environment is available and get the environment ID

In [25]:
envs = ml_client.environments.get(name="docker-image-for-pair-matching", version="2")
print(envs)

Environment({'is_anonymous': False, 'auto_increment_version': False, 'name': 'docker-image-for-pair-matching', 'description': 'Environment created from a Docker image plus pair matching packages.', 'tags': {}, 'properties': {}, 'id': '/subscriptions/aa18b01c-698a-4766-8181-9121aa576dc4/resourceGroups/rs1/providers/Microsoft.MachineLearningServices/workspaces/ymao-ws1/environments/docker-image-for-pair-matching/versions/2', 'Resource__source_path': None, 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/yuexinmao1/code/Users/yuexinmao/collectioin-ym/pipeline_comfort_poc', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x7f756cd943d0>, 'serialize': <msrest.serialization.Serializer object at 0x7f7594542040>, 'version': '2', 'latest_version': None, 'conda_file': {'channels': ['conda-forge'], 'dependencies': ['python=3.8', 'pip=21.2.4', {'pip': ['jellyfish==0.9.0', 'joblib==1.2.0', 'numpy==1.23.4', 'pandas==1.5.1', 'python-dateutil==2.8.2', 'pytz==20
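The `id` in the output above is the environment's full resource ID. The component specs printed earlier reference environments by the same kind of path with an `azureml:` prefix. Note that those specs point at `docker-image-for-pair-matching` versions 6 and 15, while this cell fetches version 2. When you only have an ID string, a small parser makes such comparisons easy. The sketch below is hypothetical (`parse_asset_id` is not an SDK function). It assumes the alternating `<segment>/<value>` layout visible in the printed IDs. On a real `Environment` object, the `name` and `version` attributes are the more direct route.

```python
def parse_asset_id(asset_id):
    """Split an Azure ML asset ID into a {segment: value} dict (hypothetical helper).

    Expects the alternating layout seen above, e.g.
    /subscriptions/<id>/resourceGroups/<rg>/providers/<namespace>/workspaces/<ws>/environments/<name>/versions/<v>
    """
    # Drop the optional "azureml:" prefix used inside component specs
    if asset_id.startswith("azureml:"):
        asset_id = asset_id[len("azureml:"):]
    parts = asset_id.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"unexpected asset ID layout: {asset_id}")
    # Pair each segment name with the value that follows it
    return dict(zip(parts[0::2], parts[1::2]))


example_id = (
    "azureml:/subscriptions/0000/resourceGroups/my-rg/providers/"
    "Microsoft.MachineLearningServices/workspaces/my-ws/environments/my-env/versions/2"
)
segments = parse_asset_id(example_id)
print(segments["environments"], segments["versions"])
```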

# 3. Pipeline job
## 3.1 Build pipeline

##### data extraction, form recognizer pipeline

### Create the Pipeline

In [54]:
from azure.ai.ml.constants import AssetTypes

# Construct the pipeline
@pipeline()
def pipeline_data_extraction(blobname, openai_engine):
    """Vendor Invoice Analysis Inference Pipeline"""
    # Call each component object as a function: this applies the given inputs & parameters to create a node in the pipeline
    Form_recognizer_Process = data_extraction(blobname=blobname)

    OpenAI_Data_Enrichment_Process = openai_enrichment(
        file_path_in=Form_recognizer_Process.outputs.output,
        gpt_engine=openai_engine,
    )

    Similar_Product_Searching_Process = data_search(
        file_path_in=OpenAI_Data_Enrichment_Process.outputs.file_path_out
    )

    Data_Summarization_to_Teams = data_notification(
        file_path_in=Similar_Product_Searching_Process.outputs.file_path_out
    )

    # Return: pipeline outputs
    return {"outputs": Data_Summarization_to_Teams.outputs.file_path_out}


pipeline_job = pipeline_data_extraction(
    blobname="s_1_1.tif",
    openai_engine="davinci",
    # file_path_out=Input(type="uri_folder", path=parent_dir + "/data/output/"),
)

# Set the pipeline-level default compute (the cluster named in section 1.3)
pipeline_job.settings.default_compute = cluster_name

# submit job to workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="Invoice_Analysis_OpenAI_Inference"
)
pipeline_job

| Experiment | Name | Type | Status | Details Page |
|---|---|---|---|---|
| Invoice_Analysis_OpenAI_Inference | khaki_collar_ygfw5pljpx | pipeline | Preparing | Link to Azure Machine Learning studio |
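Each node in `pipeline_data_extraction` consumes the previous node's output, so the four steps form a linear chain and run one after another. Azure ML infers these dependencies from the output-to-input bindings. The stdlib sketch below (Python 3.9+ `graphlib`) mirrors the same graph to make the ordering explicit. It illustrates the dependency structure only and is not how the service schedules jobs. If two nodes did not depend on each other, both would become ready at the same time.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose outputs it consumes, mirroring the
# output-to-input bindings in pipeline_data_extraction.
pipeline_graph = {
    "Form_recognizer_Process": set(),
    "OpenAI_Data_Enrichment_Process": {"Form_recognizer_Process"},
    "Similar_Product_Searching_Process": {"OpenAI_Data_Enrichment_Process"},
    "Data_Summarization_to_Teams": {"Similar_Product_Searching_Process"},
}

# A valid execution order: every node appears after all of its predecessors
execution_order = list(TopologicalSorter(pipeline_graph).static_order())
print(" -> ".join(execution_order))
```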


# Next Steps
You can find more examples of running pipeline jobs [here](../).