# Build a simple ML pipeline for image classification

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription - [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace with a compute cluster - [Configure workspace](../../configuration.ipynb)
- A Python environment
- The Azure Machine Learning Python SDK v2 installed - [install instructions](../../../README.md) - check the getting started section

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Create a `Pipeline` from components

**Motivations** - This tutorial shows how to train a simple deep neural network using the Fashion MNIST dataset and Keras on Azure Machine Learning. Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes.


# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1 Import the required libraries

In [1]:
# import required libraries
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient
from azure.ai.ml.dsl import pipeline
from azure.ai.ml import load_component

## 1.2 Configure credential
We use `DefaultAzureCredential` to get access to the workspace.

`DefaultAzureCredential` should be capable of handling most Azure SDK authentication scenarios.

If it does not work for you, see these references for other available credentials: [configure credential example](../../configuration.ipynb), [azure-identity reference doc](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [2]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

## 1.3 Get a handle to the workspace

We use a config file to connect to the workspace. The Azure ML workspace should be configured with a compute cluster. [Check this notebook to configure a workspace](../../configuration.ipynb).
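For reference, `MLClient.from_config` reads a `config.json` file that holds the keys `subscription_id`, `resource_group`, and `workspace_name`. The following is a minimal sketch of writing such a file by hand. The helper name `write_workspace_config` is hypothetical and not part of the SDK, and the values you pass in must be your own identifiers.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the three keys used to locate a workspace.

    Hypothetical helper for illustration; it is not part of the Azure ML SDK.
    """
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path
```

Calling `write_workspace_config(".", "<subscription-id>", "<resource-group>", "<workspace-name>")` with your own values places the file in the current directory, where the cell below can pick it up.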

In [4]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

# Retrieve an already attached Azure Machine Learning Compute.
cpu_compute_target = "test-computing-cluster"
print(ml_client.compute.get(cpu_compute_target))
# gpu_compute_target = "gpu-cluster"
# print(ml_client.compute.get(gpu_compute_target))

Found the config file in: /Users/dgmr2/notebook/azure/azureml/config.json


enable_node_public_ip: true
id: /subscriptions/f4550262-c232-4d1f-8f2f-70bcd32ebfe1/resourceGroups/rg-azureml/providers/Microsoft.MachineLearningServices/workspaces/ws-azureml/computes/test-computing-cluster
idle_time_before_scale_down: 30
location: westus
max_instances: 1
min_instances: 0
name: test-computing-cluster
provisioning_state: Succeeded
size: Standard_DS11_v2
ssh_public_access_enabled: false
tier: dedicated
type: amlcompute



## 1.4 Prepare Job Input
By defining an `Input`, you create a reference to the data source location. This example uses a local folder.

This sample uses [`mnist`](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-mnist?tabs=azure-storage) as the target task. You can also try [`fashion-mnist`](https://github.com/zalandoresearch/fashion-mnist?tab=readme-ov-file#get-the-data), which has the same structure as `mnist` but is a harder task.

Note that **the cell below may fail if you are in a private network without access to the public blob**. If so, download the files from the [official site of `fashion-mnist`](https://github.com/zalandoresearch/fashion-mnist?tab=readme-ov-file#get-the-data) or from another machine, and unzip them into [mnist](./mnist/).
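If you place the files manually, a small check like the one below can confirm that the folder is complete before you continue. This is only a sketch: the helper name `missing_mnist_files` is hypothetical, and the expected names are the unzipped file names that the download cell produces.

```python
from pathlib import Path

# Unzipped file names, i.e. the download targets without the ".gz" suffix.
EXPECTED_FILES = (
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
)


def missing_mnist_files(base_dir):
    """Return the expected file names that are not present in base_dir."""
    base = Path(base_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

An empty list from `missing_mnist_files("mnist")` means all four files are in place.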

In [5]:
import urllib3
import shutil
import gzip
import os
from pathlib import Path
from azure.ai.ml import Input

base_url = "https://azureopendatastorage.blob.core.windows.net/mnist/"
base_dir = Path("mnist")
if not base_dir.exists():
    base_dir.mkdir(parents=True)

c = urllib3.PoolManager()
for target_file in [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]:
    if (base_dir / target_file[:-3]).exists():
        continue
    with c.request("GET", base_url + target_file, preload_content=False) as resp, open(
        base_dir / target_file, "wb"
    ) as out_file:
        shutil.copyfileobj(resp, out_file)
        resp.release_conn()
    with gzip.open(base_dir / target_file, "rb") as f_in, open(
        base_dir / target_file[:-3], "wb"
    ) as f_out:
        shutil.copyfileobj(f_in, f_out)
    os.unlink(base_dir / target_file)

mnist_ds = Input(path=base_dir.as_posix())

In [6]:
mnist_ds

{'type': 'uri_folder', 'path': 'mnist'}
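As a quick sanity check on the download, the expected folder size can be derived from the IDX file layout: an image file has a 16-byte header followed by one byte per pixel, and a label file has an 8-byte header followed by one byte per label. The sketch below assumes the standard MNIST split (60,000 training and 10,000 test examples of 28x28 pixels); the function names are illustrative. The total agrees with the 54950048 bytes reported for the `mnist` folder in the upload log in section 4.

```python
IMAGE_HEADER_BYTES = 16  # magic number, image count, rows, cols (4 bytes each)
LABEL_HEADER_BYTES = 8   # magic number, label count (4 bytes each)


def idx_image_file_size(num_images, rows=28, cols=28):
    """Size in bytes of an uncompressed IDX image file with 1 byte per pixel."""
    return IMAGE_HEADER_BYTES + num_images * rows * cols


def idx_label_file_size(num_labels):
    """Size in bytes of an uncompressed IDX label file with 1 byte per label."""
    return LABEL_HEADER_BYTES + num_labels


def expected_folder_size(train_examples=60_000, test_examples=10_000):
    """Total size of the four unzipped files for the assumed split sizes."""
    return (
        idx_image_file_size(train_examples)
        + idx_label_file_size(train_examples)
        + idx_image_file_size(test_examples)
        + idx_label_file_size(test_examples)
    )


print(expected_folder_size())  # 54950048
```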

# 2. Define and load command component
In this section, we define and load components to build the pipeline in two ways:
1. Using a Python function
1. Using YAML


## 2.1 Load components defined with python function
We define the `Prep Data` component and the `Train Image Classification Keras` component using `@command_component` in [./prep/prep_component.py](./prep/prep_component.py) and [./train/train_component.py](./train/train_component.py), respectively.

Use the following code to import the components.

In [10]:
%load_ext autoreload
%autoreload 2

# load component function from component python file
from prep.prep_component import prepare_data_component
from train.train_component import keras_train_component

# print hint of components
help(prepare_data_component)
help(keras_train_component)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Help on function prepare_data_component in module prep.prep_component:

prepare_data_component(input_data: <mldesigner._input_output.Input object at 0x138a5c700>, training_data: <mldesigner._input_output.Output object at 0x138a5c190>, test_data: <mldesigner._input_output.Output object at 0x138baa880>)

Help on function keras_train_component in module train.train_component:

keras_train_component(input_data: <mldesigner._input_output.Input object at 0x138baa580>, output_model: <mldesigner._input_output.Output object at 0x138728c40>, epochs=10)



(Optional) Python caches imported modules, and the import is not refreshed when the code itself does not change. If you change only the conda.yaml file, you need to force a re-import.

In [None]:
# import importlib, prep.prep_component, train.train_component
# importlib.reload(prep.prep_component)
# importlib.reload(train.train_component)
# from prep.prep_component import prepare_data_component
# from train.train_component import keras_train_component
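The caching behavior can be seen in isolation with a throwaway module. The sketch below is independent of the tutorial's components: the module name `demo_component_module` and the function `demo_reload` are made up for illustration. It imports a module, edits the file on disk, and shows that only `importlib.reload` picks up the change.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_reload():
    """Return the value seen on first import, on re-import, and after reload."""
    name = "demo_component_module"  # hypothetical module name
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / f"{name}.py"
        module_path.write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)
            first = module.VALUE

            # Edit the source on disk after the module has been imported.
            module_path.write_text("VALUE = 2  # edited on disk\n")

            # A plain re-import returns the cached module from sys.modules.
            cached = importlib.import_module(name).VALUE

            # reload() re-executes the module source and sees the new value.
            importlib.reload(module)
            reloaded = module.VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
    return first, cached, reloaded


print(demo_reload())  # (1, 1, 2)
```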

## 2.2 Load component defined with yaml
We define `Score Image Classification Keras` in [YAML](./score/score.yaml).

Use the following code to load the component from YAML.

In [11]:
# load component function from yaml
keras_score_component = load_component(source="./score/score.yaml")

# 3. Build pipeline

We define a pipeline containing 3 nodes:
- `prepare_data_node` will load the images and labels from the input dataset into `mnist_train.csv` and `mnist_test.csv`.
- `train_node` will train a CNN model with Keras using training data.
- `score_node` will score the model using test data.
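To make the first step more concrete, here is a minimal sketch of decoding IDX image and label files into CSV-style rows. It is not the code in `prep_component.py`: the function name `idx_to_rows` and the label-first row layout are assumptions made for illustration. The header layout (big-endian 32-bit fields, magic number 2051 for images and 2049 for labels) follows the IDX format used by MNIST.

```python
import struct


def idx_to_rows(image_bytes, label_bytes):
    """Decode IDX image/label bytes into rows of [label, pixel0, pixel1, ...]."""
    magic, count, rows, cols = struct.unpack(">IIII", image_bytes[:16])
    if magic != 2051:
        raise ValueError(f"unexpected image magic number: {magic}")
    label_magic, label_count = struct.unpack(">II", label_bytes[:8])
    if label_magic != 2049:
        raise ValueError(f"unexpected label magic number: {label_magic}")
    if label_count != count:
        raise ValueError("image and label counts differ")

    pixels_per_image = rows * cols
    result = []
    for i in range(count):
        start = 16 + i * pixels_per_image
        pixels = image_bytes[start : start + pixels_per_image]
        # Assumed layout: the label comes first, followed by the flattened pixels.
        result.append([label_bytes[8 + i], *pixels])
    return result
```

Each row can then be written with `csv.writer(...).writerow(row)`, which would give one line per image.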

In [12]:
# define a pipeline containing 3 nodes: Prepare data node, train node, and score node
@pipeline(
    default_compute=cpu_compute_target,
)
def image_classification_keras_minist_convnet(pipeline_input_data):
    """E2E image classification pipeline with keras using python sdk."""
    prepare_data_node = prepare_data_component(input_data=pipeline_input_data)

    train_node = keras_train_component(
        input_data=prepare_data_node.outputs.training_data
    )
    # train_node.compute = gpu_compute_target

    score_node = keras_score_component(
        input_data=prepare_data_node.outputs.test_data,
        input_model=train_node.outputs.output_model,
    )


# create a pipeline
pipeline_job = image_classification_keras_minist_convnet(pipeline_input_data=mnist_ds)
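The wiring in the pipeline function determines the order in which the nodes can run, because a node can only start once the outputs it consumes exist. The sketch below illustrates that idea with the standard library (`graphlib`, Python 3.9+). It is a plain illustration of the dependency graph, not a description of how the Azure ML service schedules jobs, and `execution_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# (producer node, consumer node) pairs read off the pipeline function above.
EDGES = [
    ("prepare_data_node", "train_node"),  # training_data -> input_data
    ("prepare_data_node", "score_node"),  # test_data -> input_data
    ("train_node", "score_node"),         # output_model -> input_model
]


def execution_order(edges):
    """Return the nodes in an order that respects every producer/consumer edge."""
    sorter = TopologicalSorter()
    for producer, consumer in edges:
        # add(node, *predecessors): the consumer depends on the producer.
        sorter.add(consumer, producer)
    return list(sorter.static_order())


print(execution_order(EDGES))
# ['prepare_data_node', 'train_node', 'score_node']
```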

In [13]:
print(pipeline_job)

Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


display_name: image_classification_keras_minist_convnet
description: E2E image classification pipeline with keras using python sdk.
type: pipeline
inputs:
  pipeline_input_data:
    type: uri_folder
    path: azureml:mnist
jobs:
  prepare_data_node:
    type: command
    inputs:
      input_data:
        path: ${{parent.inputs.pipeline_input_data}}
    component:
      name: prep_data
      version: '1'
      display_name: Prep Data
      description: Convert data to CSV file, and split to training and test data
      tags:
        codegenBy: mldesigner
      type: command
      inputs:
        input_data:
          type: uri_folder
      outputs:
        training_data:
          type: uri_folder
        test_data:
          type: uri_folder
      command: mldesigner execute --source prep_component.py --name prep_data --inputs
        input_data='${{inputs.input_data}}' --outputs training_data='${{outputs.training_data}}'
        test_data='${{outputs.test_data}}'
      environment:
  

# 4. Submit pipeline job

In [14]:
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)
pipeline_job

Uploading prep (0.0 MBs): 100%|██████████| 1692/1692 [00:00<00:00, 3986.25it/s]

Uploading train (0.0 MBs): 100%|██████████| 4446/4446 [00:00<00:00, 9059.53it/s]

Uploading score (0.0 MBs): 100%|██████████| 3368/3368 [00:00<00:00, 6743.00it/s]

Uploading mnist (54.95 MBs): 100%|██████████| 54950048/54950048 [00:15<00:00, 3555376.10it/s]



| Experiment | Name | Type | Status | Details Page |
| --- | --- | --- | --- | --- |
| pipeline_samples | epic_oxygen_dfxg35xncz | pipeline | NotStarted | Link to Azure Machine Learning studio |


In [15]:
# wait until the job completes
ml_client.jobs.stream(pipeline_job.name)

RunId: epic_oxygen_dfxg35xncz
Web View: https://ml.azure.com/runs/epic_oxygen_dfxg35xncz?wsid=/subscriptions/f4550262-c232-4d1f-8f2f-70bcd32ebfe1/resourcegroups/rg-azureml/workspaces/ws-azureml

Streaming logs/azureml/executionlogs.txt

[2025-01-29 14:20:49Z] Submitting 1 runs, first five are: ff7ecd4b:46ce7c0b-07e4-4e79-8636-6df923654375
[2025-01-29 14:37:59Z] Completing processing run id 46ce7c0b-07e4-4e79-8636-6df923654375.
[2025-01-29 14:38:00Z] Submitting 1 runs, first five are: 1dad1d0e:ecdc6f91-2adf-4840-aca4-293e52941df3
[2025-01-29 14:45:05Z] Completing processing run id ecdc6f91-2adf-4840-aca4-293e52941df3.
[2025-01-29 14:45:06Z] Submitting 1 runs, first five are: 48e19523:22920551-1414-4f22-acd5-969e5c1db33b
[2025-01-29 14:46:38Z] Completing processing run id 22920551-1414-4f22-acd5-969e5c1db33b.

Execution Summary
RunId: epic_oxygen_dfxg35xncz
Web View: https://ml.azure.com/runs/epic_oxygen_dfxg35xncz?wsid=/subscriptions/f4550262-c232-4d1f-8f2f-70bcd32ebfe1/resourcegroups/r
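The execution log above can be turned into per-step durations by pairing each "Submitting" line with the "Completing" line that follows it. The sketch below assumes the steps run one after another, as they do in this log; `step_durations` is a hypothetical helper, and the sample text is copied from the output above.

```python
import re
from datetime import datetime

SAMPLE_LOG = """\
[2025-01-29 14:20:49Z] Submitting 1 runs, first five are: ff7ecd4b:46ce7c0b-07e4-4e79-8636-6df923654375
[2025-01-29 14:37:59Z] Completing processing run id 46ce7c0b-07e4-4e79-8636-6df923654375.
[2025-01-29 14:38:00Z] Submitting 1 runs, first five are: 1dad1d0e:ecdc6f91-2adf-4840-aca4-293e52941df3
[2025-01-29 14:45:05Z] Completing processing run id ecdc6f91-2adf-4840-aca4-293e52941df3.
[2025-01-29 14:45:06Z] Submitting 1 runs, first five are: 48e19523:22920551-1414-4f22-acd5-969e5c1db33b
[2025-01-29 14:46:38Z] Completing processing run id 22920551-1414-4f22-acd5-969e5c1db33b.
"""

LOG_LINE = re.compile(r"\[(?P<ts>[^\]]+)Z\] (?P<event>Submitting|Completing)")


def step_durations(log_text):
    """Return the duration in seconds of each Submitting/Completing pair."""
    durations = []
    started = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        timestamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        if match["event"] == "Submitting":
            started = timestamp
        elif started is not None:
            durations.append(int((timestamp - started).total_seconds()))
            started = None
    return durations


print(step_durations(SAMPLE_LOG))  # [1030, 425, 92]
```

In this run the three steps took about 17, 7, and 1.5 minutes respectively, measured from submission to completion.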

# 5. (Optional) Register your component to workspace

If you want to share and reuse your component within the workspace, you can register it to the machine learning workspace.

The following sample code shows how to register a component to your workspace and how to get a registered component back from it.

In [18]:
try:
    # try to get the registered component back
    prep = ml_client.components.get(name="prep_data", version="1")
except Exception:
    # if it does not exist, register the component using the following code
    prep = ml_client.components.create_or_update(prepare_data_component)

# list all components registered in the workspace
for component in ml_client.components.list():
    print(component)

creation_context:
  created_at: '2025-01-29T15:50:31.025631+00:00'
  created_by: S T
  created_by_type: User
  last_modified_at: '2025-01-29T15:50:31.025631+00:00'
  last_modified_by: S T
  last_modified_by_type: User
description: ''
id: azureml:/subscriptions/f4550262-c232-4d1f-8f2f-70bcd32ebfe1/resourceGroups/rg-azureml/providers/Microsoft.MachineLearningServices/workspaces/ws-azureml/components/prep_data
name: prep_data



# Next Steps
You can see further examples of running a pipeline job [here](../)

In [17]:
pipeline_job.name

'epic_oxygen_dfxg35xncz'