# Use Flow in Pipeline

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster
- A Python environment (3.9 recommended)
- The Azure Machine Learning Python SDK v2 installed
- The PromptFlow SDK installed

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Load a flow as a `ParallelComponent`
- Use the component along with other components loaded from YAML in one `PipelineJob`

**Motivations** - This notebook covers the scenario in which a user runs a flow along with other data processing steps in a pipeline.

Run the following commands to install the required packages:

```python
%pip install promptflow-sdk[builtins,azure]==0.0.99531891 --extra-index-url https://azuremlsdktestpypi.azureedge.net/promptflow/
# azure-ai-ml is a private build for now, so it must be installed separately, after promptflow-sdk.
# It is targeted for public release in 1.9.0 (July release).
%pip install azure-ai-ml==1.9.0a20230703002 --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/
```
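To confirm that the pinned versions were actually installed (for example, that `%pip` did not resolve a different `azure-ai-ml` build), you can query the installed distributions with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical, the distribution names are taken from the install commands above, and you may need to restart the kernel after `%pip install` before the new versions are visible.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report each package's version, or flag it as missing
for dist in ("promptflow-sdk", "azure-ai-ml"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```

Compare the printed versions with the `==` pins in the install commands.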

## Connect to workspace

As with other azure-ai-ml workflows, first import the required packages and create an `MLClient` connected to a specific workspace.

```python
# Import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input, load_component
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline

try:
    credential = DefaultAzureCredential()
    # Check that the credential can successfully acquire a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

try:
    ml_client = MLClient.from_config(credential=credential)
except Exception as ex:
    # NOTE: Update the following workspace information if no config file has been set up yet
    client_config = {
        "subscription_id": "<SUBSCRIPTION_ID>",
        "resource_group": "<RESOURCE_GROUP>",
        "workspace_name": "<AML_WORKSPACE_NAME>",
    }

    if client_config["subscription_id"].startswith("<"):
        print(
            "Please update <SUBSCRIPTION_ID>, <RESOURCE_GROUP> and <AML_WORKSPACE_NAME> in this notebook cell"
        )
        raise ex
    else:  # write the config file, then reload the client from it
        import json
        import os

        config_path = "../.azureml/config.json"
        os.makedirs(os.path.dirname(config_path), exist_ok=True)
        with open(config_path, "w") as fo:
            fo.write(json.dumps(client_config))
        ml_client = MLClient.from_config(credential=credential, path=config_path)

print(ml_client)
```
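The fallback branch above writes the workspace details to a `config.json` file so that `MLClient.from_config` can read them back. The same logic can be factored into a small standalone helper that refuses to write missing or placeholder values. This is a sketch under assumptions: the helper name `write_workspace_config` is hypothetical, and it only writes the three keys used in the cell above.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def write_workspace_config(client_config: dict, config_path: str) -> Path:
    """Validate workspace settings and write them as JSON for MLClient.from_config."""
    missing = [key for key in REQUIRED_KEYS if not client_config.get(key)]
    if missing:
        raise ValueError(f"missing workspace settings: {missing}")
    # Values such as "<SUBSCRIPTION_ID>" are unfilled placeholders
    placeholders = [key for key in REQUIRED_KEYS if client_config[key].startswith("<")]
    if placeholders:
        raise ValueError(f"replace placeholder values for: {placeholders}")

    path = Path(config_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write only the required keys, in a stable order
    path.write_text(json.dumps({key: client_config[key] for key in REQUIRED_KEYS}, indent=2))
    return path
```

With real values filled in, the `else` branch could become `ml_client = MLClient.from_config(credential=credential, path=str(write_workspace_config(client_config, "../.azureml/config.json")))`.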

## Load a flow as a component

If you have already authored a flow, you can load it as a component:

```python
from promptflow.azure import load_as_component, configure

# The remote component is created when load_as_component is called,
# so promptflow-sdk must be configured with ml_client first
configure(client=ml_client)

flow_component = load_as_component(
    # Replace with the path to your own flow folder
    r"C:\PyCharmProjects\PromptFlow\src\promptflow-sdk\tests\test_configs\flows\web_classification",
    inputs_mapping={
        "groundtruth": "${data.answer}",
        "prediction": "${variant.outputs.category}",
    },
    component_type="parallel",
)

print(flow_component)
```
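The `inputs_mapping` values use a `${...}` reference syntax: `${data.answer}` points at the `answer` column of each input row, and `${variant.outputs.category}` presumably at a `category` output produced under `variant`. The snippet below only illustrates how such dotted references could bind to a row; it is not promptflow's implementation, and both the helper `resolve_inputs_mapping` and the nested-dict `context` layout are assumptions.

```python
import re
from typing import Any, Dict

# Matches a whole-string reference such as "${data.answer}" or "${variant.outputs.category}"
_REFERENCE = re.compile(r"^\$\{([A-Za-z_][\w.]*)\}$")


def resolve_inputs_mapping(mapping: Dict[str, str], context: Dict[str, Any]) -> Dict[str, Any]:
    """Replace each ${a.b.c} reference with the value found at context[a][b][c].

    Values that are not ${...} references are passed through unchanged (an assumption).
    """
    resolved = {}
    for input_name, value in mapping.items():
        match = _REFERENCE.match(value)
        if match is None:
            resolved[input_name] = value
            continue
        node: Any = context
        for key in match.group(1).split("."):
            node = node[key]  # raises KeyError if the column or output does not exist
        resolved[input_name] = node
    return resolved


row = {"data": {"answer": "App"}, "variant": {"outputs": {"category": "Academic"}}}
print(resolve_inputs_mapping(
    {"groundtruth": "${data.answer}", "prediction": "${variant.outputs.category}"}, row
))
# → {'groundtruth': 'App', 'prediction': 'Academic'}
```

The practical takeaway is that every `${data.<column>}` reference must name a column that actually exists in the input data.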

Then you can use this component along with other components in a pipeline:

```python
tsv2jsonl_component = load_component("./tsv2jsonl-component/component_spec.yaml")

@pipeline
def pipeline_with_flow(input_data):
    data_transfer = tsv2jsonl_component(input_data=input_data)

    flow_node = flow_component(
        # This can be either a URI to a jsonl file or a URI to a folder containing multiple jsonl files
        data=data_transfer.outputs.output_data,
        # Override connection settings for nodes in the flow
        connections={
            # Override connection-related settings of an LLM node;
            # "summarize_text_content" is the node name
            "summarize_text_content": {
                "deployment_name": "another_deployment_name",
                "connection": "another_connection",
            },
            # You can also override the inputs mapping here
            "groundtruth": "Channel",
            # You can override a connection input of a Python node here
            # "convert_to_dict": {
            #     "conn1": "another_connection"
            # }
        },
    )
    # Node-level run settings for a flow node are similar to those of a `ParallelComponent`
    flow_node.logging_level = "DEBUG"
    flow_node.max_concurrency_per_instance = 2
    return flow_node.outputs

# Use a name other than `pipeline` so the imported @pipeline decorator is not shadowed
pipeline_job = pipeline_with_flow(
    input_data=Input(path="./data.tsv", type=AssetTypes.URI_FILE),
)

# Replace "cpu-cluster" with the name of a compute cluster in your workspace
pipeline_job.settings.default_compute = "cpu-cluster"

created_job = ml_client.jobs.create_or_update(pipeline_job)
```
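The spec of `tsv2jsonl-component` is not shown here, but as its name suggests it turns the tab-separated input into the JSON Lines format that the flow node consumes. A minimal local equivalent can help you check `data.tsv` before submitting the pipeline. This sketch assumes the first TSV row is a header and that every column should become a string field; the function name `tsv_to_jsonl` is hypothetical and is not the component's actual code.

```python
import csv
import io
import json


def tsv_to_jsonl(tsv_text: str) -> str:
    """Convert TSV text with a header row into JSON Lines (one object per data row)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = [json.dumps(row, ensure_ascii=False) for row in reader]
    return "\n".join(lines) + ("\n" if lines else "")


sample = "url\tanswer\nhttps://example.com\tApp\n"
print(tsv_to_jsonl(sample), end="")
# → {"url": "https://example.com", "answer": "App"}
```

Each output line should contain the columns referenced by `${data.<column>}` in `inputs_mapping`, such as `answer`.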


As with other pipeline jobs in azure-ai-ml, you can monitor the job's status with `ml_client.jobs.stream`:

```python
ml_client.jobs.stream(created_job.name)
```
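`ml_client.jobs.stream` blocks the notebook until the job finishes. If you would rather poll the status yourself, a small loop is enough. This is a sketch under assumptions: the status is read through a callable you pass in (for example `lambda: ml_client.jobs.get(created_job.name).status`), the helper name `wait_for_job` is hypothetical, and the set of terminal status strings below is an assumption to check against the statuses your workspace actually reports.

```python
import time
from typing import Callable, Iterable

# Assumed terminal job states; verify against the statuses Azure ML reports for your jobs
TERMINAL_STATES = frozenset({"Completed", "Failed", "Canceled", "NotResponding"})


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    terminal_states: Iterable[str] = TERMINAL_STATES,
) -> str:
    """Poll get_status() until it returns a terminal state, then return that state."""
    terminal = set(terminal_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

Usage: `final_status = wait_for_job(lambda: ml_client.jobs.get(created_job.name).status)`. Passing a callable rather than the client keeps the loop independent of the SDK, which also makes it easy to exercise with a fake status source.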