# Machine learning with Flux.jl and AzureClusterlessHPC.jl

This tutorial demonstrates how to run a simple machine learning example with Julia's Flux library and AzureClusterlessHPC. We start by installing Flux, setting the environment variables that point to our credential and parameter files, and loading AzureClusterlessHPC:

In [1]:
# Install required packages for this example
using Pkg; Pkg.add("Flux")

# Set paths to credentials + parameters
ENV["CREDENTIALS"] = joinpath(pwd(), "../..", "credentials.json")
ENV["PARAMETERS"] = joinpath(pwd(), "parameters.json")

# Load package
using AzureClusterlessHPC
batch_clear();

Next, we set up a pool with 2 workers:

In [2]:
# Create pool
startup_script = "pool_startup_script_flux.sh"
create_pool_and_resource_file(startup_script);

Pool 1 of 1 in canadacentral already exists.


We load the required packages and tag packages that we want to use on the batch workers with the `@batchdef` macro:

In [3]:
using Random
@batchdef using Flux
@batchdef using Flux.Optimise: update!

Next, we define our loss function, which takes the current network, the input `x`, and the target output `y` as arguments and returns the sum of squared errors between `y` and the network prediction:

In [4]:
# Loss function
@batchdef function loss(model, x, y)
    ŷ = model(x)
    sum((y .- ŷ).^2)
end;
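
Written out, the loss above is the sum of squared errors between the target and the prediction. The symbol $f_\theta$ is notation introduced for this note only; it stands for `model` with parameters $\theta$, and the sum runs over all entries of `y`:

```latex
\hat{y} = f_\theta(x), \qquad
\mathcal{L}(\theta; x, y) = \sum_{k} \left( y_k - \hat{y}_k \right)^2
```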

As the loss function only computes the function value, but not the gradients, we define an additional function called `objective`, which fetches the broadcast network, evaluates the loss function, and uses Flux's automatic differentiation to compute the gradients with respect to the network parameters. We then collect all gradients in an array and return it:

In [5]:
# Function for evaluating the network and computing gradients
@batchdef function objective(_model, x, y)

    # Load model into memory
    model = fetch(_model)
    θ = params(model)

    # Compute grad
    gs = gradient(() -> loss(model, x, y), θ)

    # Return array of local gradients
    grads = []
    for p in θ
        push!(grads, gs[p])
    end
    return grads
end;

With our objective function in place, we now define our neural network. In this case, our network is a simple model with two dense layers followed by a softmax function:

In [6]:
# Define network
model = Chain(
  Dense(10, 5, σ),
  Dense(5, 2),
  softmax);
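
To make the shapes in this network concrete, here is a minimal plain-Julia sketch of the same forward pass for a single example. It does not use Flux: the weight and bias names (`W1`, `b1`, `W2`, `b2`), the random initial values, and the helper functions `sigmoid` and `softmax_vec` are assumptions made for illustration only.

```julia
using Random

# Logistic sigmoid, the activation used by the first layer
sigmoid(z) = 1 / (1 + exp(-z))

# Numerically stable softmax for a single vector (hypothetical helper)
function softmax_vec(z)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

Random.seed!(1234)

# Hypothetical stand-in parameters with the same shapes as the two dense layers
W1, b1 = randn(5, 10), zeros(5)   # corresponds to Dense(10, 5, σ)
W2, b2 = randn(2, 5), zeros(2)    # corresponds to Dense(5, 2)

# Forward pass for one example with 10 features
x_example = rand(10)
h = sigmoid.(W1 * x_example .+ b1)   # hidden activations, length 5
ŷ = softmax_vec(W2 * h .+ b2)        # output vector, length 2, sums to 1

println(size(h), " ", size(ŷ), " ", sum(ŷ))
```

The softmax at the end means each prediction is a length-2 vector of positive entries that sum to one.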

We create a simple training data set consisting of 100 training examples (i.e. matrix columns), each with 10 input features and 2 target values:

In [7]:
# Create training dataset
ntrain = 100
x = rand(10, ntrain)
y = rand(2, ntrain);
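
As a quick check of the data layout, the following sketch recreates arrays of the same shape and shows that each training example is one column. The names `x1`, `y1`, and `idx` are introduced here for illustration only.

```julia
using Random

Random.seed!(0)

ntrain = 100
x = rand(10, ntrain)   # 10 features per example, one example per column
y = rand(2, ntrain)    # 2 target values per example

# A single training example is a column of x and y
x1 = x[:, 1]
y1 = y[:, 1]

# Selecting several examples keeps the feature dimension first
idx = randperm(ntrain)[1:4]
x_batch = x[:, idx]
y_batch = y[:, idx]

println(size(x1), " ", size(y1), " ", size(x_batch), " ", size(y_batch))
```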

For this example, we use a simple SGD optimizer with a learning rate of 10^-3, which we run for 2 iterations with a batch size of 4:

In [8]:
# SGD optimizer
opt = Descent(1f-3)
maxiter = 2
batchsize = 4;
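
With these settings, each iteration of this tutorial amounts to one plain gradient descent step on the summed (not averaged) per-example gradients. The notation below is introduced for this note only: $\theta$ denotes the network parameters, $\eta = 10^{-3}$ the learning rate, $\mathcal{B}$ the set of 4 examples drawn in the current iteration, and $\mathcal{L}$ the loss function:

```latex
\theta \leftarrow \theta - \eta \sum_{i \in \mathcal{B}} \nabla_\theta \, \mathcal{L}(\theta; x_i, y_i),
\qquad \eta = 10^{-3}, \quad |\mathcal{B}| = 4
```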

Finally, we run our training loop. During each iteration, we select a random set of columns (i.e. training examples) from the training data. Then, we broadcast the current version of our network to the batch workers, which returns a batch future `_model`. Next, we evaluate the objective function as a multi-task batch job, in which each task computes the gradients of the weights for one example of the current data batch. We then collect and sum all gradients into a single update, delete the job, and use the summed gradients to update our (locally stored) network:

In [9]:
# Training loop
for j=1:maxiter
    print("Iteration $j\n")

    # Current batch
    idx = randperm(ntrain)[1:batchsize]
    x_batch = x[:, idx]
    y_batch = y[:, idx]

    # Broadcast current version of network
    _model = @bcast(model)

    # Compute gradients using Azure Batch
    bctrl = @batchexec pmap(i -> objective(_model, x_batch[:,i], y_batch[:,i]), 1:batchsize)
    grad = fetchreduce(bctrl; op=+); delete_job(bctrl)

    # Update local network parameters
    for (p, g) in zip(params(model), grad)
        update!(opt, p, g)
    end
end;

Iteration 1
  6.396477 seconds (4.37 M allocations: 259.314 MiB, 2.32% gc time, 4.60% compilation time)
Monitoring tasks for 'Completed' state, timeout in 60 minutes ...Uploading file model.dat to container [azureclusterlesstemp]...
Creating job [FluxDeepLearning_SI77hbf2_1]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/application-cmd to blob container [azureclusterlesstemp]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/batch_runtime.jl to blob container [azureclusterlesstemp]...
Uploading file packages.dat to container [azureclusterlesstemp]...
Uploading file task_1.dat to container [azureclusterlesstemp]...
Uploading file task_2.dat to container [azureclusterlesstemp]...
Uploading file task_3.dat to container [azureclusterlesstemp]...
Uploading file task_4.dat to container [azureclusterlesstemp]...
.............................
Fetch output from task task_4...............................
Fetch output from task task_3...........
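
The reduction and update steps of the training loop can be mimicked locally without Azure Batch or Flux. The sketch below is a stand-in under stated assumptions: `task_grads` holds made-up per-task results in the format returned by `objective` (one gradient array per parameter array), Base's `reduce(+, ...)` plays the role of the summation, and the hand-written update assumes a plain gradient descent rule with step size `η`.

```julia
# Hypothetical per-task results: each task returns a list of gradient arrays,
# one entry per parameter array, in the same order as the parameters
task_grads = [
    Any[fill(1.0, 2, 3), fill(1.0, 2)],   # task 1
    Any[fill(2.0, 2, 3), fill(2.0, 2)],   # task 2
    Any[fill(3.0, 2, 3), fill(3.0, 2)],   # task 3
]

# Adding two lists adds them entry by entry, so the reduction yields
# one summed gradient per parameter array
grad = reduce(+, task_grads)

# Stand-in parameters and a plain gradient descent step with step size η
η = 1e-3
θ = Any[zeros(2, 3), zeros(2)]
for (p, g) in zip(θ, grad)
    p .-= η .* g   # in-place update of each parameter array
end

println(grad[2], " ", θ[2])
```

Because the gradients are summed rather than averaged, the effective step grows with the batch size; dividing `grad` by the number of tasks would give the averaged variant instead.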

The last step is to clean up our resources. We delete the blob container that contains all temporary files, delete any remaining jobs, and delete the batch pool:

In [10]:
# Delete the container, all jobs, and the pool specified in the parameter JSON file
delete_container()
delete_all_jobs()
delete_pool();

## Copyright

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License (MIT). See LICENSE in the repo root for license information.