# Use Flow in Pipeline

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster
- A Python environment (3.9 recommended)
- The Azure Machine Learning Python SDK v2, installed
- The PromptFlow SDK, installed

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Load a flow as a `ParallelComponent`
- Use the component along with other components loaded from YAML in one `PipelineJob`

**Motivations** - This notebook covers the scenario where you use a flow alongside other data processing steps in a pipeline.

Run the following command to install the required packages:

In [None]:
# azure-ai-ml is currently a private pre-release build; the public release is targeted for 1.9.0 (July release)
# The requirement with extras is quoted so the shell passes it to pip as a single argument.
%pip install "promptflow-sdk[azure, built-ins]" azure-ai-ml==1.9.0a20230703002 --extra-index-url https://azuremlsdktestpypi.azureedge.net/promptflow/ --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/

Then, as with other azure-ai-ml workflows, create an `MLClient` connected to a specific workspace:

In [None]:
# Import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input, load_component
# AssetTypes is needed later to declare the type of the pipeline input
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline
import promptflow as pf


try:
    credential = DefaultAzureCredential()
    # Check that the credential can actually obtain a token.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Retrieve an already attached Azure Machine Learning Compute.
cluster_name = "cpu-cluster"
print(ml_client.compute.get(cluster_name))
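
`MLClient.from_config` reads the workspace details from a `config.json` file containing `subscription_id`, `resource_group`, and `workspace_name`. If the call above fails, a quick check of that file can help. The following is a minimal sketch using only the standard library; the helper name `missing_workspace_keys` and the placeholder values are illustrative assumptions, not part of the SDK.

```python
import json
import tempfile
from pathlib import Path

# Keys that MLClient.from_config expects to find in config.json.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_path):
    """Return the required keys that are absent or empty in a workspace config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values only; replace them with your own workspace details.
sample_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"

    # A complete config file has no missing keys.
    config_path.write_text(json.dumps(sample_config, indent=2), encoding="utf-8")
    complete = missing_workspace_keys(config_path)

    # A partial config file reports the keys that still need to be filled in.
    config_path.write_text(json.dumps({"subscription_id": "<id>"}), encoding="utf-8")
    incomplete = missing_workspace_keys(config_path)

print(complete)
print(incomplete)
```

In practice you would point `missing_workspace_keys` at the `config.json` downloaded from your workspace rather than at a temporary file.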

## Load flow as a component

If you have already authored a flow, you can load it as a component:

In [None]:
# the remote component will be created on calling load_as_component, 
# so ml_client must be configured for promptflow-sdk first
pf.azure.configure(ml_client)

flow_component = pf.load_as_component(
    "../../flows/standard/web-classification",
    inputs_mapping={
        "groundtruth": "1",
        "prediction": "${{variant.outputs.category}}"
    },
    node_variant="variant_1"
)

print(flow_component)
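
The `inputs_mapping` above mixes a literal value (`"1"`) with a reference to another node's output (`${{variant.outputs.category}}`). To make that distinction concrete, here is a small standalone sketch of how such a mapping could be resolved. It is an illustration based on one reading of the syntax, not the PromptFlow implementation; `resolve_inputs` and the sample output value `"App"` are hypothetical.

```python
import re

# Matches references of the form ${{source.outputs.name}}.
REFERENCE_PATTERN = re.compile(r"^\$\{\{\s*(\w+)\.outputs\.(\w+)\s*\}\}$")


def resolve_inputs(inputs_mapping, node_outputs):
    """Resolve each mapping value to a literal or to a named output of another node."""
    resolved = {}
    for input_name, value in inputs_mapping.items():
        match = REFERENCE_PATTERN.match(value)
        if match is None:
            # Plain literal: passed through unchanged.
            resolved[input_name] = value
            continue
        source, output_name = match.groups()
        resolved[input_name] = node_outputs[source][output_name]
    return resolved


inputs_mapping = {
    "groundtruth": "1",
    "prediction": "${{variant.outputs.category}}",
}
# Hypothetical output produced by the flow variant for one input row.
node_outputs = {"variant": {"category": "App"}}

resolved = resolve_inputs(inputs_mapping, node_outputs)
print(resolved)
```

Under this reading, `groundtruth` receives the constant `"1"` for every row, while `prediction` is filled from the `category` output of the flow variant.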

In [None]:
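# load_component needs a source; replace the placeholder with the path to your component YAML.
data_process_component = load_component(source="<path-to-data-process-component-yaml>")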

@pipeline
def pipeline_with_flow(input_data):
    data_process = data_process_component(input_data=input_data)

    # Feed the processed data into the flow component loaded earlier.
    flow_node = flow_component(
        data=data_process.outputs.output_data,
        # Override connection settings per flow node.
        connections={
            # Override connection-related settings for an LLM node.
            "summarize_text_content": {
                "deployment_name": "another_deployment_name",
                "connection": "another_connection"
            },
            # You can also override a connection input of a Python node here.
            "post_process": {
                "conn1": "another_connection"
            }
        },
    )
    # Node-level run settings for a flow node are similar to those of a `ParallelComponent`.
    flow_node.logging_level = "DEBUG"
    flow_node.max_concurrency_per_instance = 2
    return flow_node.outputs

# Use a distinct name so the `pipeline` decorator imported above is not shadowed.
pipeline_job = pipeline_with_flow(
    input_data=Input(path="../../data", type=AssetTypes.URI_FOLDER),
)

pipeline_job.settings.default_compute = "cpu-cluster"

created_job = ml_client.jobs.create_or_update(pipeline_job)
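
The `connections` argument in the cell above replaces selected settings on individual flow nodes and leaves everything else as defined in the flow. The sketch below models that behaviour with plain dictionaries. It is an assumption about the semantics for illustration only: `apply_connection_overrides`, the default values, and the `max_tokens` setting are hypothetical and not taken from the SDK.

```python
import copy


def apply_connection_overrides(node_settings, overrides):
    """Return a copy of node_settings with per-node override values merged in."""
    merged = copy.deepcopy(node_settings)
    for node_name, node_overrides in overrides.items():
        if node_name not in merged:
            raise KeyError(f"unknown node: {node_name}")
        merged[node_name].update(node_overrides)
    return merged


# Hypothetical defaults, standing in for what the flow definition declares.
defaults = {
    "summarize_text_content": {
        "deployment_name": "default_deployment",
        "connection": "default_connection",
        "max_tokens": 128,
    },
    "post_process": {"conn1": "default_connection"},
}
# The same overrides that are passed to the flow node in the pipeline.
overrides = {
    "summarize_text_content": {
        "deployment_name": "another_deployment_name",
        "connection": "another_connection",
    },
    "post_process": {"conn1": "another_connection"},
}

effective = apply_connection_overrides(defaults, overrides)
print(effective["summarize_text_content"])
print(effective["post_process"])
```

Settings that are not mentioned in the overrides, such as the hypothetical `max_tokens`, keep their original values, and the defaults themselves are left unmodified.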


In [None]:
ml_client.jobs.stream(created_job.name)