# Simple Azure ML Pipeline Tutorial using Designer and MPI components

In this notebook-based tutorial, we create and run an Azure ML pipeline
for a simple cats vs. dogs classification model.
The pipeline consists of the following components:
- Convert to Image Directory (Designer component)
- Init Image Transformation (Designer component)
- Apply Image Transformation (Designer component)
- Train Image Classification (MPI custom component)

We followed this Azure ML [tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2d_image_classification_with_densenet/image_classification_with_densenet.ipynb).

## Set Up
We first need to install the `azure-ai-ml`, `mldesigner`, and `azureml` Python packages.

In [None]:
!pip install azure-ai-ml

In [None]:
!pip install mldesigner

In [None]:
!pip install azureml

## Connect to workspace

In [8]:
# Handle to the workspace
from azure.ai.ml import MLClient, Input

# Authentication package
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In [9]:
# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="SUBSCRIPTION_ID",
    resource_group_name="RESOURCE_GROUP_NAME",
    workspace_name="WORKSPACE_NAME",
)
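
Hard-coding the subscription, resource group, and workspace names works for a quick test, but the values are easy to leak when the notebook is shared. The following is a minimal sketch, assuming the three values are exported as environment variables. The variable names `AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, and `AZURE_ML_WORKSPACE` and the helper `workspace_kwargs` are illustrative choices for this notebook, not something the SDK requires.

```python
import os


def workspace_kwargs(env=None):
    """Collect MLClient keyword arguments from environment variables.

    The environment variable names are hypothetical conventions for this
    notebook; rename them to match your own setup.
    """
    env = os.environ if env is None else env
    mapping = {
        "subscription_id": "AZURE_SUBSCRIPTION_ID",
        "resource_group_name": "AZURE_RESOURCE_GROUP",
        "workspace_name": "AZURE_ML_WORKSPACE",
    }
    # Fail early with a readable message instead of a confusing auth error later.
    missing = [var for var in mapping.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in mapping.items()}
```

The result can then be unpacked into the client, for example `MLClient(credential=credential, **workspace_kwargs())`.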

## Create a directory for pipeline components

In [3]:
import os

components_dir = "./components"
os.makedirs(components_dir, exist_ok=True)

trainer_dir = "./components/trainer"
os.makedirs(trainer_dir, exist_ok=True)

## Set up necessary variables

The following variables are used to define the pipeline. You can customize
them as needed. By default, all output from the pipeline is
generated under the current directory.

In [10]:
import os

_components_dir = 'components'
_trainer_file = os.path.join(trainer_dir, 'train_component.py')

IMG_SIZE = 150  # Height and width of the images

compute_instance = "COMPUTE_INSTANCE_NAME"

In [11]:
data_root = '../datasets/dogs_vs_cats_small'
train_dir = os.path.join(data_root, 'train')
# The 'test' split of the dataset is used as the validation set
validation_dir = os.path.join(data_root, 'test')

train_cats_dir = os.path.join(train_dir, 'cats')  # directory with our training cat pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')  # directory with our training dog pictures
validation_cats_dir = os.path.join(validation_dir, 'cats')  # directory with our validation cat pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')  # directory with our validation dog pictures


num_cats_tr = len(os.listdir(train_cats_dir))
num_dogs_tr = len(os.listdir(train_dogs_dir))

num_cats_val = len(os.listdir(validation_cats_dir))
num_dogs_val = len(os.listdir(validation_dogs_dir))

total_train = num_cats_tr + num_dogs_tr
total_val = num_cats_val + num_dogs_val


print('total training cat images:', num_cats_tr)
print('total training dog images:', num_dogs_tr)
print('total validation cat images:', num_cats_val)
print('total validation dog images:', num_dogs_val)
print("--")
print("Total training images:", total_train)
print("Total validation images:", total_val)

total training cat images: 102
total training dog images: 100
total validation cat images: 52
total validation dog images: 50
--
Total training images: 202
Total validation images: 102
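
Note that `os.listdir` counts every directory entry, including hidden or non-image files. As a hedged alternative, the sketch below counts only files with an image suffix, per class subdirectory. The helper `count_images` and the `IMAGE_EXTENSIONS` set are hypothetical names introduced here, and the suffix list is an assumption about the dataset.

```python
from pathlib import Path

# Assumed set of image suffixes; extend it if the dataset uses other formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(split_dir):
    """Return {class_name: number_of_image_files} for one dataset split."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts
```

Calling `count_images(train_dir)` and `count_images(validation_dir)` returns one dictionary per split, keyed by the class folder name.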


## Load the components from YAML

In [12]:
from azure.ai.ml import load_component

# Load the component functions from their YAML specs
convert_to_image = load_component(
    source="components/convert_to_image/convert_to_image_component.yaml"
)

apply_transform = load_component(
    source="components/apply_image_transformation/apply_image_transformation.yaml"
)

init_transform = load_component(
    source="components/init_image_transformation/init_image_transformation.yaml"
)

# The train component is an MPI component.
imagecnn_train = load_component(source="components/imagecnn_train/entry.spec.yaml")
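
A missing or misplaced spec file is a likely reason for the loading cell to fail. The sketch below checks that the four YAML paths used above exist before they are loaded. It uses only the standard library; the `COMPONENT_SPECS` dictionary and the `missing_specs` helper are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Spec paths copied from the loading cell above.
COMPONENT_SPECS = {
    "convert_to_image": "components/convert_to_image/convert_to_image_component.yaml",
    "apply_transform": "components/apply_image_transformation/apply_image_transformation.yaml",
    "init_transform": "components/init_image_transformation/init_image_transformation.yaml",
    "imagecnn_train": "components/imagecnn_train/entry.spec.yaml",
}


def missing_specs(specs, root="."):
    """Return the sorted names of components whose spec file is not under root."""
    return sorted(
        name for name, rel_path in specs.items()
        if not (Path(root) / rel_path).is_file()
    )
```

An empty list from `missing_specs(COMPONENT_SPECS)` means all four files were found relative to the current directory.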

## Load the datasets

In [22]:
# Both paths point to directories, so the inputs are declared as folders
train_ds = Input(name="TrainData", type="uri_folder", path="../datasets/dogs_vs_cats_small/train")
test_ds = Input(name="TestData", type="uri_folder", path="../datasets/dogs_vs_cats_small/test")

## Pipeline definition

In [23]:
from azure.ai.ml.dsl import pipeline

# Define a pipeline with six nodes: two image-directory conversions, one
# transformation init, two transformation applications, and one training node
@pipeline(
    default_compute=compute_instance,
)
def cat_vs_dog_classifier():
    convert_train = convert_to_image(input_path = train_ds)
    convert_test = convert_to_image(input_path = test_ds)

    init_image_transformation = init_transform(
        resize="True",
        size=IMG_SIZE,
        center_crop="False",
        pad="False",
        padding=0,
        color_jitter="False",
        grayscale="False",
        random_resized_crop="False",
        random_crop="False",
        random_horizontal_flip="True",
        random_vertical_flip="False",
        random_rotation="False",
        random_affine="False",
        random_grayscale="False",
        random_perspective="False",
    )

    apply_trans_on_train = apply_transform(
        mode="For training",
        input_image_transform_path=init_image_transformation.outputs.output_path,
        input_image_dir_path=convert_train.outputs.output_path,
    )

    apply_trans_on_val = apply_transform(
        mode="For inference",
        input_image_transform_path=init_image_transformation.outputs.output_path,
        input_image_dir_path=convert_test.outputs.output_path,
    )

    imagecnn_train_gpu = imagecnn_train(
        train_data=apply_trans_on_train.outputs.output_path,
        valid_data=apply_trans_on_val.outputs.output_path,
        data_backend="pytorch",
        arch="resnet50",
        model_config="classic",
        workers=5,
        epochs=4,
        batch_size=16,
        optimizer_batch_size=-1,
        lr=0.1,
        lr_schedule="step",
        warmup=0,
        label_smoothing=0.0,
        mixup=0.0,
        momentum=0.9,
        weight_decay=0.0001,
        print_freq=10,
        resume="",
        pretrained_weights="",
        static_loss_scale=1.0,
        prof=-1,
        seed=123,
        raport_file="experiment_raport.json",
        save_checkpoint_epochs=2,
    )

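    # Optional: run the training node on a dedicated GPU compute target.
    # It does not work with the GPU of type "Standard_NC24ads_A100_v4 (24 cores, 220 GB RAM, 64 GB disk)".
    # `gpu_compute_target` is not defined in this notebook; set it to your compute name first.
    # imagecnn_train_gpu.compute = gpu_compute_target
    # imagecnn_train_gpu.resources.instance_count = 1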

# Create the pipeline job from the pipeline definition
pipeline_job = cat_vs_dog_classifier()
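
The data flow wired up in `cat_vs_dog_classifier` can be written down as a small dependency graph. The sketch below is independent of Azure ML: it reuses the node variable names from the function above and the standard-library `graphlib` module (Python 3.9+) to derive one valid execution order. It only illustrates the structure and is not how Azure ML schedules the job.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose outputs it consumes.
dependencies = {
    "convert_train": set(),
    "convert_test": set(),
    "init_image_transformation": set(),
    "apply_trans_on_train": {"init_image_transformation", "convert_train"},
    "apply_trans_on_val": {"init_image_transformation", "convert_test"},
    "imagecnn_train_gpu": {"apply_trans_on_train", "apply_trans_on_val"},
}

# static_order() yields every node after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

The three nodes without dependencies can run in parallel, while the training node always comes last because it consumes both transformed datasets.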

## Run the pipeline job

In [None]:
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="dogs_vs_cats_designer_components"
)
pipeline_job

In [None]:
# Stream the job logs and wait until the job completes
ml_client.jobs.stream(pipeline_job.name)