# CIFAR-10 pipeline

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../configuration.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2, installed - [install instructions](../../../README.md) - check the getting started section

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Define a `CommandComponent` using YAML
- Create a basic `Pipeline` using components from local YAML files

**Motivations** - This notebook explains how to define a `CommandComponent` via YAML and then use it to build a pipeline. The command component is a fundamental construct of Azure Machine Learning pipelines. It can be used to run a task on a specified compute (either local or in the cloud). The command component accepts an `Environment` to set up the required infrastructure. You can define a `command` to run on this infrastructure with `inputs`. You can reuse the same `Component` in different pipelines.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

In [13]:
# import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component

## 1.2 Configure credential

We use `DefaultAzureCredential` to get access to the workspace.
`DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios.

If it does not work for you, see these references for other available credentials: [configure credential example](../../configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).
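
If `DefaultAzureCredential` cannot obtain a token in your environment, one common fallback is an interactive browser login. The sketch below shows that pattern with the credential classes passed in as arguments, so it is only an illustration: `pick_credential` is a hypothetical helper, not part of the Azure SDK, and the default token scope is an assumption based on the Azure Resource Manager endpoint.

```python
def pick_credential(primary_factory, fallback_factory,
                    scope="https://management.azure.com/.default"):
    """Return a credential from primary_factory if it can fetch a token, else the fallback.

    Hypothetical helper for illustration; both factories are zero-argument callables.
    """
    try:
        credential = primary_factory()
        # Requesting a token up front surfaces authentication problems early.
        credential.get_token(scope)
        return credential
    except Exception:
        # Any failure (no cached login, no managed identity, ...) triggers the fallback.
        return fallback_factory()


# Intended usage with the classes imported in section 1.1 (not executed here):
# credential = pick_credential(DefaultAzureCredential, InteractiveBrowserCredential)
```

Passing the classes in, rather than importing them inside the helper, keeps the fallback logic easy to try out without an Azure login.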

In [14]:
credential = DefaultAzureCredential()

# Enter details of your AML workspace
subscription_id = ""
resource_group = ""
workspace = ""
ml_client = MLClient(credential, subscription_id, resource_group, workspace)

## 1.3 Get a handle to the workspace

The `ml_client` created above is the handle to the workspace; a config file can be used to supply the workspace details instead. The Azure ML workspace should be configured with a compute cluster. [Check this notebook to configure a workspace](../../configuration.ipynb)
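
As an alternative to passing the workspace details directly, they can be stored in a `config.json` file and loaded with `MLClient.from_config`. The sketch below writes such a file using only the standard library. The `write_workspace_config` helper name and the placeholder values are assumptions for illustration.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the three workspace identifiers and return its path."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


# Placeholder values; replace them with your own workspace details (not executed here):
# config_path = write_workspace_config(".", "<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<WORKSPACE_NAME>")
# ml_client = MLClient.from_config(credential=credential, path=str(config_path))
```

Keeping the identifiers in a file avoids hard-coding them in every notebook that connects to the same workspace.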

# 2. Define command component via YAML
Define each command component in YAML and load it as a function.

In [24]:
parent_dir = "."
get_data_func = load_component("./get-data-1.yml")
train_model_func = load_component("./train-model.yml")
eval_model_func = load_component("./eval-model.yml")

In [25]:
print(get_data_func)

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: get_data
display_name: Get Data
type: command
inputs:
  cifar_zip:
    type: uri_file
outputs:
  cifar:
    type: uri_folder
command: python get_data.py --input_data ${{inputs.cifar_zip}} --output_folder ${{outputs.cifar}}
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1
code: azureml:./get-data
is_deterministic: true
tags: {}
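
To make the `command` line in the printed component concrete: the `${{inputs.cifar_zip}}` and `${{outputs.cifar}}` placeholders are replaced with actual paths when the component runs. The following sketch imitates that substitution in plain Python. The `render_command` helper and the example paths are hypothetical, and this is not how Azure ML implements it internally.

```python
import re

# Matches ${{inputs.<name>}} or ${{outputs.<name>}}, allowing optional inner spaces.
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def render_command(template, inputs, outputs):
    """Replace ${{inputs.x}} / ${{outputs.y}} placeholders with concrete values."""
    values = {"inputs": inputs, "outputs": outputs}

    def substitute(match):
        kind, name = match.group(1), match.group(2)
        # A KeyError here means the template references an undeclared input or output.
        return str(values[kind][name])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "python get_data.py --input_data ${{inputs.cifar_zip}} "
    "--output_folder ${{outputs.cifar}}"
)
rendered = render_command(
    template,
    inputs={"cifar_zip": "/mnt/in/cifar-10-python.tar.gz"},  # made-up path
    outputs={"cifar": "/mnt/out/cifar"},  # made-up path
)
print(rendered)
```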



# 3. Basic pipeline job

## 3.1 Build pipeline

In [17]:
# Define pipeline
@pipeline()
def train_cifar_10_with_pytorch():
    """CIFAR-10 Pipeline Example."""
    # define the job to get data
    get_data = get_data_func(
        cifar_zip=Input(
            path="wasbs://datasets@azuremlexamples.blob.core.windows.net/cifar-10-python.tar.gz",
            type="uri_file",
        )
    )
    get_data.outputs.cifar.mode = "upload"

    # define the job to train the model
    train_model = train_model_func(epochs=1, cifar=get_data.outputs.cifar)
    train_model.compute = "gpu-cluster-V100"
    train_model.outputs.model_dir.mode = "upload"
    train_model.resources.instance_count = 1

    # define the job to evaluate the model
    eval_model = eval_model_func(
        cifar=get_data.outputs.cifar, model_dir=train_model.outputs.model_dir
    )
    eval_model.compute = "gpu-cluster-V100"
    eval_model.resources.instance_count = 1


pipeline_job = train_cifar_10_with_pytorch()

# set pipeline level compute
pipeline_job.settings.default_compute = "cpu-cluster"
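
The pipeline definition never states an execution order explicitly; the order follows from which outputs feed which inputs. As a rough illustration (not Azure ML's actual scheduler), the same dependencies can be written as a graph and sorted with the standard library's `graphlib` module (Python 3.9+). The dictionary below is hand-written from the pipeline definition.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it consumes.
dependencies = {
    "get_data": set(),
    "train_model": {"get_data"},                # uses get_data.outputs.cifar
    "eval_model": {"get_data", "train_model"},  # uses cifar and model_dir
}

# static_order() yields every step after all of its prerequisites.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Because `eval_model` depends on both earlier steps and `train_model` depends on `get_data`, only one ordering is valid for this pipeline.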

## 3.2 Submit pipeline job

In [27]:
# submit the pipeline job
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)
pipeline_job

In [26]:
# Wait until the job completes
ml_client.jobs.stream(pipeline_job.name)

# Next Steps
You can find further examples of running a pipeline job [here](../).