# Build a simple ML pipeline for image classification

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- Installed Azure Machine Learning Python SDK v2 - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Create a `Pipeline` from components

**Motivations** - This tutorial shows how to train a simple deep neural network on the Fashion MNIST dataset using Keras on Azure Machine Learning. Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from one of 10 classes.


# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [1]:
# import required libraries
from azure.ml import MLClient, dsl
from azure.ml.entities import load_component

## 1.2. Configure credential

We use `DefaultAzureCredential` to get access to the workspace. When an access token is needed, it requests one from multiple identities in turn (`EnvironmentCredential`, `ManagedIdentityCredential`, `SharedTokenCacheCredential`, `VisualStudioCodeCredential`, `AzureCliCredential`, `AzurePowerShellCredential`), stopping when one provides a token.
See the [`DefaultAzureCredential` reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for more information.

`DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios.
If it does not work for you, see the [`azure.identity` reference](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python) for all available credentials.

In [2]:
from azure.identity import DefaultAzureCredential

try:
    credential = DefaultAzureCredential(exclude_visual_studio_code_credential=True)
    # Check if given credential can get token successfully.
    credential.get_token('https://management.azure.com/.default')
except Exception as ex:
    # If an exception is raised while retrieving a token, exclude the failing credential and try again.
    # For example, to exclude the VS Code credential:
    # credential = DefaultAzureCredential(exclude_visual_studio_code_credential=True)
    raise Exception("Failed to retrieve a token from the included credentials due to the following exception. Try adding `exclude_xxx_credential=True` to `DefaultAzureCredential` and run again.") from ex

## 1.3. Configure workspace details and get a handle to the workspace

To connect to a workspace, we need three identifiers: a subscription ID, a resource group, and a workspace name. We pass these details to `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. `MLClient.from_config()` reads the file `config.json` and loads the details into a client object, named `ml_client` here. It expects the workspace configuration file to be saved in the current directory or its parent.

In [3]:
config_path = "./config.json"
ml_client = MLClient.from_config(credential=credential, path=config_path)
print(ml_client)

Found the config file in: config.json


MLClient(credential=<azure.identity._credentials.default.DefaultAzureCredential object at 0x0000022983D5A748>,
         subscription_id=d128f140-94e6-4175-87a7-954b9d27db16,
         resource_group_name=ModuleX-rg,
         workspace_name=shiyuws-canary)
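
To make the lookup concrete, here is a minimal sketch of the "current directory or its parent" search and of reading the three identifiers. It is not the SDK's own code. The helper names `find_workspace_config` and `load_workspace_details` are hypothetical, the key names `subscription_id`, `resource_group`, and `workspace_name` are assumed to match the `config.json` downloaded from the Azure portal, and the values are placeholders.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


def find_workspace_config(start_dir, file_name: str = "config.json") -> Optional[Path]:
    """Look for the config file in start_dir, then in its parent directory."""
    start = Path(start_dir).resolve()
    for directory in (start, start.parent):
        candidate = directory / file_name
        if candidate.is_file():
            return candidate
    return None


def load_workspace_details(config_file) -> Dict[str, str]:
    """Read the three workspace identifiers from the config file."""
    with open(config_file, encoding="utf-8") as handle:
        config = json.load(handle)
    # A KeyError here means the file is missing one of the assumed keys.
    return {key: config[key] for key in ("subscription_id", "resource_group", "workspace_name")}


# Demonstration in a temporary directory, with placeholder values.
with tempfile.TemporaryDirectory() as root:
    placeholder = {
        "subscription_id": "<subscription-id>",
        "resource_group": "<resource-group>",
        "workspace_name": "<workspace-name>",
    }
    (Path(root) / "config.json").write_text(json.dumps(placeholder), encoding="utf-8")
    notebook_dir = Path(root) / "notebooks"
    notebook_dir.mkdir()
    # The config lives one level above the working directory and is still found.
    found = find_workspace_config(notebook_dir)
    details = load_workspace_details(found)
print(details)
```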


## 1.4. Retrieve or create an Azure Machine Learning compute target

In [4]:
from azure.ml.entities import AmlCompute

# specify aml compute name.
gpu_compute_target = 'gpu-cluster'
cpu_compute_target = 'test-ci'

try:
    ml_client.compute.get(cpu_compute_target)
except Exception:
    print('Creating a new cpu compute target...')
    compute = AmlCompute(name=cpu_compute_target, size="STANDARD_D2_V2", min_instances=0, max_instances=4)
    ml_client.compute.begin_create_or_update(compute)

try:
    ml_client.compute.get(gpu_compute_target)
except Exception:
    print('Creating a new gpu compute target...')
    compute = AmlCompute(name=gpu_compute_target, size="STANDARD_NC6", min_instances=0, max_instances=4)
    ml_client.compute.begin_create_or_update(compute)

## 1.5. Prepare Job Input
By defining `JobInput`, you create a reference to the data source location. The data remains in its existing location, so no extra storage cost is incurred.

In [5]:
from azure.ml.entities import JobInput

fashion_ds = JobInput(path="wasbs://demo@data4mldemo6150520719.blob.core.windows.net/mnist-fashion/")

In [6]:
fashion_ds

{'type': 'uri_folder', 'path': 'wasbs://demo@data4mldemo6150520719.blob.core.windows.net/mnist-fashion/', 'mode': 'ro_mount'}
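
Because the input is only a reference, everything needed to locate the data is encoded in the URI string itself. As an illustration, a `wasbs://` URI has the form `wasbs://<container>@<account>.blob.core.windows.net/<path>`, and the sketch below splits it with the standard library. `parse_wasbs_uri` is a hypothetical helper, not part of the SDK.

```python
from urllib.parse import urlsplit


def parse_wasbs_uri(uri: str) -> dict:
    """Split a wasb(s) URI into container, storage account, and folder path."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb(s) URI: {uri!r}")
    if not parts.username or not parts.hostname:
        raise ValueError(f"expected <container>@<account-host> in: {uri!r}")
    return {
        "container": parts.username,                # text before the '@'
        "account": parts.hostname.split(".", 1)[0], # first label of the host name
        "path": parts.path.lstrip("/"),             # folder inside the container
    }


info = parse_wasbs_uri(
    "wasbs://demo@data4mldemo6150520719.blob.core.windows.net/mnist-fashion/"
)
print(info)
```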

# 2. Define and load command component
In this section, we define and load the components used to build the pipeline in two ways:
1. Using a Python function
1. Using YAML


## 2.1 Load components defined with python function
We define the `Prep Data` and `Train Image Classification Keras` components using `dsl.command_component` in [./prep/prep_dsl_component.py](./prep/prep_dsl_component.py) and [./train/train_dsl_component.py](./train/train_dsl_component.py), respectively.

Use the following code to import the components.

In [7]:
%load_ext autoreload
%autoreload 2

# if you modify the component source code, use the following code to reload it;
# otherwise the notebook keeps a cached module and may not load the latest update
import importlib, prep.prep_dsl_component, train.train_dsl_component
importlib.reload(prep.prep_dsl_component)
importlib.reload(train.train_dsl_component)

# load component function from dsl component python file
from prep.prep_dsl_component import prep
from train.train_dsl_component import keras_train

# print help for the components
# help(prep)
# help(keras_train)




## 2.2 Load component defined with yaml
We define `Score Image Classification Keras` in [yaml](./score/score.yaml).

Use the following code to load the component from YAML.

In [8]:
# load component function from yaml
keras_score = load_component(yaml_file='./score/score.yaml')

# 3. Build pipeline

We define a pipeline containing 3 nodes:
- `prepare_data_node` will load the images and labels from the Fashion MNIST dataset into mnist_train.csv and mnist_test.csv.
- `train_node` will train a CNN model with Keras using training data.
- `score_node` will score the model using test data.

In [9]:
# define a pipeline containing 3 nodes: Prepare data node, train node, and score node
@dsl.pipeline(
    description='E2E image classification pipeline with keras',
    default_compute=cpu_compute_target,
)
def image_classification_keras_minist_convnet():

    prepare_data_node = prep(input_data=fashion_ds)

    train_node = keras_train(input_data=prepare_data_node.outputs.training_data)
    train_node.compute = gpu_compute_target

    score_node = keras_score(input_data=prepare_data_node.outputs.test_data, input_model=train_node.outputs.output_model)

# create a pipeline
pipeline = image_classification_keras_minist_convnet()
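
The pipeline function never states an execution order. The order follows from which node consumes which output. As a minimal sketch of that idea, independent of the SDK, the dependencies of the three nodes above can be written as a graph and ordered with the standard library. This requires Python 3.9 or later for `graphlib`. The `dependencies` mapping is hand-written from the pipeline definition, not extracted from the `pipeline` object.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose outputs it consumes.
dependencies = {
    "prepare_data_node": set(),                        # reads only the external dataset
    "train_node": {"prepare_data_node"},               # needs training_data
    "score_node": {"prepare_data_node", "train_node"}, # needs test_data and output_model
}

# A topological order runs every node after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

With these dependencies only one order is valid: prepare, then train, then score.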

In [10]:
print(pipeline)

name: cyan_wire_wbsrw7qz2p
display_name: image_classification_keras_minist_convnet
description: E2E image classification pipeline with keras
type: pipeline
inputs: {}
outputs: {}
tags: {}
compute: azureml:test-ci
settings: {}
properties: {}
jobs:
  prepare_data_node:
    $schema: '{}'
    type: command
    inputs:
      input_data:
        mode: ro_mount
        type: uri_folder
        path: azureml:wasbs://demo@data4mldemo6150520719.blob.core.windows.net/mnist-fashion/
    outputs: {}
    command: python -m azure.ml.dsl.executor --file prep_dsl_component.py --name prep_data
      --params --input_data ${{inputs.input_data}} --training_data ${{outputs.training_data}}
      --test_data ${{outputs.test_data}}
    code: d:/Github/test/2e_image_classification_keras_minist_convnet/prep
    component:
      name: prep_data
      version: '1'
      display_name: Prep Data
      description: Convert data to CSV file, and split to training and test data
      type: command
      inputs:
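
The `command` field in the output above contains placeholders such as `${{inputs.input_data}}` and `${{outputs.training_data}}`, which are replaced with concrete locations when the job runs. The sketch below shows that substitution idea with a regular expression. It is not the service's implementation: `resolve_command` and the `/mnt/...` paths are hypothetical.

```python
import re

# Matches ${{inputs.<name>}} and ${{outputs.<name>}}.
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def resolve_command(template: str, inputs: dict, outputs: dict) -> str:
    """Replace every input/output placeholder in template with its bound value."""
    bindings = {"inputs": inputs, "outputs": outputs}

    def substitute(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        if name not in bindings[kind]:
            raise KeyError(f"no value bound for {kind}.{name}")
        return bindings[kind][name]

    return PLACEHOLDER.sub(substitute, template)


template = (
    "--input_data ${{inputs.input_data}} "
    "--training_data ${{outputs.training_data}} "
    "--test_data ${{outputs.test_data}}"
)
resolved = resolve_command(
    template,
    inputs={"input_data": "/mnt/in"},  # hypothetical mount points
    outputs={"training_data": "/mnt/out/train", "test_data": "/mnt/out/test"},
)
print(resolved)
```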
      

# 4. Submit pipeline job

In [11]:
pipeline_job = ml_client.jobs.create_or_update(pipeline, experiment_name='pipeline_samples')
pipeline_job



Experiment,Name,Type,Status,Details Page
pipeline_samples,cyan_wire_wbsrw7qz2p,pipeline,Preparing,Link to Azure Machine Learning studio


In [None]:
# wait until the job completes
ml_client.jobs.stream(pipeline_job.name)

# 5. (Optional) Register your component to workspace

If you want to share and reuse your component within the workspace, you can register it once you are confident in its performance.

Use the following code to register a component and to load a registered component from your workspace. After loading the registered component, you can use it to build a pipeline in the same way as in the previous section.

In [13]:
try:
    # try to get the registered version of the component defined with dsl.command_component
    prep = ml_client.components.get(name="prep_data", version="1")
except Exception:
    # if it does not exist, register the component using the following code
    prep = ml_client.components.create_or_update(prep)

# list all components registered in workspace
for c in ml_client.components.list():
    print(c)

{'additional_properties': {}, 'id': '/subscriptions/d128f140-94e6-4175-87a7-954b9d27db16/resourceGroups/ModuleX-rg/providers/Microsoft.MachineLearningServices/workspaces/shiyuws-canary/components/prep_data', 'name': 'prep_data', 'type': 'Microsoft.MachineLearningServices/workspaces/components', 'system_data': <azure.ml._restclient.v2022_02_01_preview.models._models_py3.SystemData object at 0x0000022985120988>, 'properties': <azure.ml._restclient.v2022_02_01_preview.models._models_py3.ComponentContainerDetails object at 0x000002298513C548>}
{'additional_properties': {}, 'id': '/subscriptions/d128f140-94e6-4175-87a7-954b9d27db16/resourceGroups/ModuleX-rg/providers/Microsoft.MachineLearningServices/workspaces/shiyuws-canary/components/eval_model', 'name': 'eval_model', 'type': 'Microsoft.MachineLearningServices/workspaces/components', 'system_data': <azure.ml._restclient.v2022_02_01_preview.models._models_py3.SystemData object at 0x000002298513C5C8>, 'properties': <azure.ml._restclient.v2
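
The registration cell follows a "get, and create only if missing" pattern, which makes it safe to rerun. Here is a minimal sketch of that pattern against an in-memory stand-in. `InMemoryRegistry` and `get_or_register` are hypothetical and only mimic the shape of the calls, not the behaviour of `ml_client.components`.

```python
class InMemoryRegistry:
    """Hypothetical stand-in for a workspace component registry."""

    def __init__(self):
        self._items = {}

    def get(self, name, version):
        # Raises KeyError when the (name, version) pair is not registered.
        return self._items[(name, version)]

    def create_or_update(self, name, version, definition):
        self._items[(name, version)] = definition
        return definition


def get_or_register(registry, name, version, definition):
    """Return (component, created), registering the definition only when absent."""
    try:
        return registry.get(name, version), False
    except KeyError:
        return registry.create_or_update(name, version, definition), True


registry = InMemoryRegistry()
first, created_first = get_or_register(registry, "prep_data", "1", {"type": "command"})
second, created_second = get_or_register(registry, "prep_data", "1", {"type": "command"})
print(created_first, created_second)
```

The first call registers the component and the second call retrieves the stored one, so rerunning the cell does not create a duplicate.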

# Next Steps
You can find further examples of running a pipeline job [here](/sdk/jobs/pipelines/).