# Use Flow in Pipeline

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace - [Configure workspace](../../configuration.ipynb)
- A Python environment
- Installed Azure Machine Learning Python SDK v2
- Installed PromptFlow SDK

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Load a flow as a `ParallelComponent`
- Use the component along with other components loaded from YAML in one `PipelineJob`

**Motivations** - This guide shows how to use a flow alongside other data processing steps in a pipeline.

**Known issues** - This feature is not yet stable. We are actively fixing the following known issues:
- You must include a `.promptflow/flow.tools.json` file in the flow directory first. This file is generated automatically when you run the flow locally.
- A component with a given name can be created only once, even with a different version. When neither a component name nor a version is provided, an auto-generated, hash-based component name is used.
- The flow nodes can only run on a compute cluster whose managed identity is assigned the Azure ML Data Scientist role.
- Overwriting `connections` or `columns_mapping` doesn't work for now.
- This feature currently works only in canary workspaces: [sample job link](https://ml.azure.com/experiments/id/9ce1a534-9d3d-4761-a5e7-5299dd6912f1/runs/clever_leek_4xh6x9z7s5?wsid=/subscriptions/96aede12-2f73-41cb-b983-6d11a904839b/resourcegroups/promptflow/workspaces/promptflow-canary-dev&tid=72f988bf-86f1-41af-91ab-2d7cd011db47)
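
Because the first known issue depends on a generated file being present, a quick check before loading the flow may save a failed submission. The following is a minimal sketch, not part of the PromptFlow SDK: the helper name `has_flow_tools_json` is hypothetical, and the flow directory is the one used later in this tutorial.

```python
from pathlib import Path


def has_flow_tools_json(flow_dir: str) -> bool:
    """Return True if <flow_dir>/.promptflow/flow.tools.json exists."""
    return (Path(flow_dir) / ".promptflow" / "flow.tools.json").is_file()


# Flow directory used later in this tutorial; adjust it to your own layout.
flow_dir = "../../flows/standard/web-classification/"
if not has_flow_tools_json(flow_dir):
    print(f"Missing .promptflow/flow.tools.json in {flow_dir}; run the flow locally first.")
```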

## 0. Install dependent packages

Please follow [configuration.ipynb](../../configuration.ipynb) to install dependent packages and connect to a workspace first.

In [1]:
!pip install "promptflow[azure]" promptflow-tools --extra-index-url https://azuremlsdktestpypi.azureedge.net/promptflow/

Looking in indexes: https://pypi.org/simple, https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/
Collecting azure-ai-ml==1.9.0a20230703002
  Downloading https://pkgs.dev.azure.com/azure-sdk/29ec6040-b234-4e31-b139-33dc4287b756/_packaging/3572dbf9-b5ef-433b-9137-fc4d7768e7cc/pypi/download/azure-ai-ml/1.9a20230703002/azure_ai_ml-1.9.0a20230703002-py3-none-any.whl (6.2 MB)
     ---------------------------------------- 6.2/6.2 MB 4.5 MB/s eta 0:00:00

Installing collected packages: azure-ai-ml
  Attempting uninstall: azure-ai-ml
    Found existing installation: azure-ai-ml 1.7.0
    Uninstalling azure-ai-ml-1.7.0:
      Successfully uninstalled azure-ai-ml-1.7.0
Successfully installed azure-ai-ml-1.9.0a20230703002
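
To confirm which versions ended up in the environment, the standard library can be queried directly. This is a small sketch that assumes Python 3.8 or later; the helper name `installed_version` is hypothetical, and the package names are the ones that appear in the install command and its output.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this tutorial relies on.
for name in ("promptflow", "promptflow-tools", "azure-ai-ml"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```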


## 1. Connect to MLClient and create necessary connections
As with other azure-ai-ml workflows, you first need to import the related packages and create an `MLClient` connected to a specific workspace.

In [1]:
# Import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component
from promptflow.azure import PFClient

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Create PFClient connected to workspace
pf = PFClient(ml_client)

Found the config file in: C:\github\config.json
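
`MLClient.from_config` reads the workspace details from a `config.json` file, which is expected to contain `subscription_id`, `resource_group`, and `workspace_name`. If the call fails, a check like the one below can help narrow down the cause. It is only a sketch: the helper name `missing_config_keys` is hypothetical, and it verifies that the keys are present, not that their values are valid.

```python
import json
from pathlib import Path

# Keys that a workspace config.json is expected to provide.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(config_path: str) -> list:
    """Return the required workspace keys that are absent from the config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [key for key in REQUIRED_KEYS if key not in config]


# Example usage (the path is an assumption; point it at your own file):
# print(missing_config_keys("config.json"))
```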


## 2. Load flow as a component

Once you have authored a flow, you can load it as a component:

In [2]:
flow_component = pf.load_as_component(
    "../../flows/standard/web-classification/",
    columns_mapping={
        "url": "${data.url}",
        "groundtruth": "${data.answer}",
    },
    component_type="parallel",
)

print(flow_component)

run_mode is not a known attribute of class <class 'promptflow.azure._restclient.flow.models._models_py3.LoadFlowAsComponentRequest'> and will be ignored


name: azureml_promptflow_3bea27d3_f880_ed1f_3caa_f1a06c099cf7
version: '1'
display_name: azureml_promptflow_3bea27d3_f880_ed1f_3caa_f1a06c099cf7
type: parallel
inputs:
  data:
    type: uri_folder
    optional: false
  run_outputs:
    type: uri_folder
    optional: true
  url:
    type: string
    optional: false
    default: ${data.url}
  groundtruth:
    type: string
    optional: false
    default: ${data.answer}
  connections.classify_with_llm.connection:
    type: string
    optional: true
    default: azure_open_ai_connection
  connections.classify_with_llm.deployment_name:
    type: string
    optional: true
    default: text-davinci-003
  connections.summarize_text_content.connection:
    type: string
    optional: true
    default: azure_open_ai_connection
  connections.summarize_text_content.deployment_name:
    type: string
    optional: true
    default: text-davinci-003
outputs:
  flow_outputs:
    type: uri_folder
  debug_info:
    type: uri_folder
task:
  type: run_func
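
The `${data.url}` and `${data.answer}` defaults above refer to columns of each input row. The sketch below illustrates that idea in plain Python. It is not PromptFlow's implementation: the function `resolve_mapping`, the sample row, and the choice to pass non-reference values through as literals are all assumptions made for illustration.

```python
import re

# Matches a whole-string reference of the form ${data.<column>}.
_DATA_REF = re.compile(r"^\$\{data\.([^}]+)\}$")


def resolve_mapping(columns_mapping: dict, row: dict) -> dict:
    """Resolve `${data.<column>}` references against one input row.

    Values that are not references are passed through unchanged (an assumption).
    """
    resolved = {}
    for flow_input, expr in columns_mapping.items():
        match = _DATA_REF.match(expr)
        resolved[flow_input] = row[match.group(1)] if match else expr
    return resolved


# Hypothetical input row with the two columns the mapping refers to.
row = {"url": "https://example.com", "answer": "App"}
mapping = {"url": "${data.url}", "groundtruth": "${data.answer}"}
print(resolve_mapping(mapping, row))
```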

## 3. Use the component in a pipeline

Then you can use this component along with other components in a pipeline:

In [3]:
tsv2jsonl_component = load_component("./tsv2jsonl-component/component_spec.yaml")


@pipeline
def pipeline_with_flow(input_data):
    data_transfer = tsv2jsonl_component(input_data=input_data)

    flow_node = flow_component(
        # this can be either a URI jsonl file or a URI folder containing multiple jsonl files
        data=data_transfer.outputs.output_data,
        # you can overwrite inputs mapping here
        groundtruth="Channel",
        # this is to overwrite connection settings
        connections={
            # this is to overwrite connection-related settings for an LLM node
            # "summarize_text_content" is the node name
            "summarize_text_content": {
                "deployment_name": "another_deployment_name",
            },
            # you can overwrite custom connection input of a python node here
            # "convert_to_dict": {
            #     "conn1": "another_connection"
            # }
        },
    )
    # node-level run settings for a flow node are similar to those of `ParallelComponent`
    flow_node.logging_level = "DEBUG"
    flow_node.max_concurrency_per_instance = 2
    return flow_node.outputs


# Use a distinct variable name so the imported `pipeline` decorator is not shadowed
pipeline_job = pipeline_with_flow(
    input_data=Input(path="./data.tsv", type=AssetTypes.URI_FILE),
)

pipeline_job.settings.default_compute = "cpu-cluster"

created_job = ml_client.jobs.create_or_update(pipeline_job)

Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
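
The pipeline above converts a TSV file to JSONL before passing it to the flow node. The spec in `./tsv2jsonl-component/` is not shown in this tutorial, so the following is only a sketch of what such a conversion step might look like. The function name `tsv_to_jsonl` is hypothetical, and it assumes the TSV file has a header row.

```python
import csv
import json


def tsv_to_jsonl(tsv_path: str, jsonl_path: str) -> int:
    """Convert a TSV file with a header row to JSONL; return the number of rows written."""
    count = 0
    with open(tsv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        # Each TSV row becomes one JSON object keyed by the header names.
        for row in csv.DictReader(src, delimiter="\t"):
            dst.write(json.dumps(row) + "\n")
            count += 1
    return count


# Example usage (paths are assumptions):
# tsv_to_jsonl("./data.tsv", "./data.jsonl")
```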


Like other pipeline jobs in azure-ai-ml, you can monitor the status of the job via `ml_client.jobs.stream`:

In [4]:
ml_client.jobs.stream(created_job.name)

RunId: bright_ring_574b03b4qb
Web View: https://ml.azure.com/runs/bright_ring_574b03b4qb?wsid=/subscriptions/96aede12-2f73-41cb-b983-6d11a904839b/resourcegroups/promptflow/workspaces/promptflow-eastus

Streaming logs/azureml/executionlogs.txt

[2023-08-02 04:03:19Z] Submitting 1 runs, first five are: d5148e39:efbf785b-90cb-404c-a59b-6d8af9ae6235
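
Once streaming ends, you may want to confirm the final state of the job in code. The sketch below uses `ml_client.jobs.get` and the job's `status` attribute. The helper name `job_succeeded` is hypothetical, and it assumes that a successful run reports the status `"Completed"`.

```python
def job_succeeded(ml_client, job_name: str) -> bool:
    """Return True if the named job finished with status "Completed"."""
    job = ml_client.jobs.get(job_name)
    return job.status == "Completed"


# Example usage with the objects created earlier in this tutorial:
# if not job_succeeded(ml_client, created_job.name):
#     print("The pipeline job did not complete successfully.")
```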
