# Hybrid cloud machine learning with Flux.jl

This tutorial demonstrates how to use AzureClusterlessHPC to develop a machine learning model on a local machine while running the training on a remote machine with a GPU. This hybrid cloud scenario lets researchers make efficient use of expensive GPU instances without manually moving data and code to and from GPU VMs.

In [1]:
# Install required packages for this example
using Pkg; Pkg.add("Flux")

# Set paths to credentials + parameters
ENV["CREDENTIALS"] = joinpath(pwd(), "../..", "credentials.json")
ENV["PARAMETERS"] = joinpath(pwd(), "parameters.json")

# Load package
using AzureClusterlessHPC
batch_clear();

Next, we set up a pool with 1 GPU instance:

In [2]:
# Create pool
startup_script = "pool_startup_script_flux.sh"
create_pool_and_resource_file(startup_script);

Pool 1 of 1 in southcentralus already exists.


We load the required packages and tag packages that we want to use on the batch workers with the `@batchdef` macro:

In [3]:
using Random
@batchdef using Flux
@batchdef using Flux.Optimise: update!

Next, we define a simple neural network consisting of two dense layers, with a sigmoid activation after the first layer and a softmax applied to the output:

In [4]:
# Define network
model = Chain(
  Dense(10, 5, σ),
  Dense(5, 2),
  softmax);
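
To make the shapes in this network concrete, the following plain-Julia sketch reproduces the same forward pass without Flux. The names `sigmoid`, `softmax_cols`, `forward`, `W1`, `b1`, `W2` and `b2` are hypothetical and only serve as illustration; the weights are random rather than Flux's default initialization.

```julia
using Random

# Element-wise logistic sigmoid, the activation of the first layer
sigmoid(z) = 1 / (1 + exp(-z))

# Column-wise softmax; subtracting the column maximum avoids overflow
function softmax_cols(z)
    e = exp.(z .- maximum(z; dims=1))
    return e ./ sum(e; dims=1)
end

Random.seed!(1)
W1, b1 = randn(5, 10), zeros(5)   # first dense layer: 10 inputs -> 5 outputs
W2, b2 = randn(2, 5), zeros(2)    # second dense layer: 5 inputs -> 2 outputs

# Dense -> sigmoid -> Dense -> softmax, applied to a batch stored column-wise
forward(x) = softmax_cols(W2 * sigmoid.(W1 * x .+ b1) .+ b2)

out = forward(rand(10, 3))   # batch of 3 samples with 10 features each
size(out)                    # (2, 3)
sum(out; dims=1)             # every column sums to 1 up to rounding
```

Each column of the input is one sample, so a `10 × n` input yields a `2 × n` output whose columns are normalized by the softmax.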

Having defined our model, we implement our loss function. Here we use a simple sum of squares loss:

In [5]:
# Loss function
@batchdef function loss(model, x, y)
    ŷ = model(x)
    sum((y .- ŷ).^2)
end;
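
As a quick sanity check of what this loss computes, here is a small worked example in plain Julia. It omits `@batchdef` so that it runs without AzureClusterlessHPC, and it uses a hypothetical stand-in model (`identity_model`) that returns its input unchanged; `loss_local`, `x_demo` and `y_demo` are illustrative names as well.

```julia
# Same sum-of-squares formula as above, defined locally for illustration
loss_local(model, x, y) = sum((y .- model(x)).^2)

# Hypothetical stand-in for a network: the prediction equals the input
identity_model(x) = x

x_demo = [1.0 2.0; 3.0 4.0]
y_demo = [1.0 2.0; 3.0 6.0]

# Only the last entry differs (6 vs. 4), so the loss is (6 - 4)^2 = 4.0
l = loss_local(identity_model, x_demo, y_demo)
```

The loss is summed over all entries and all samples, so its value grows with the batch size.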

Now we create a simple training data set:

In [6]:
# Create training dataset
ntrain = 100
x = rand(10, ntrain)
y = rand(2, ntrain);

Next, we define a function for training our model. The function takes the network, the input data, and the training labels as arguments and returns the trained model. The keyword arguments set the learning rate and provide a `cuda` flag for optionally running the training on a GPU:

In [7]:
# Training function
@batchdef function train_model(model, x, y; cuda=false, learning_rate=1f-3)

    # Move data and model to the GPU? `gpu` returns a copy, so rebind the results.
    if cuda
        x = x |> gpu
        y = y |> gpu
        model = model |> gpu
    end

    # Optimization
    θ = params(model)
    opt = Descent(learning_rate)

    # Training loop
    for j=1:10
        print("Training epoch $j of 10.\n")

        # Evaluate network & compute gradients
        grads = gradient(() -> loss(model, x, y), θ)

        # Update network weights
        for p in θ
            update!(opt, p, grads[p])
        end
    end

    # Return the model on the CPU so it can be sent back to the local machine
    return model |> cpu
end;
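
For intuition about the update step: with a `Descent` optimizer, `update!` subtracts the learning rate times the gradient from each parameter in place. The sketch below reproduces that rule in plain Julia for a hypothetical linear model `W * x` with a hand-derived gradient instead of automatic differentiation. `descent_demo`, `W`, `x_demo` and `y_demo` are illustrative names and not part of Flux or AzureClusterlessHPC.

```julia
using Random

# Plain gradient descent on L(W) = sum((y - W*x).^2),
# whose gradient with respect to W is -2 * (y - W*x) * x'
function descent_demo(x, y; learning_rate=1e-3, nepochs=10)
    W = zeros(size(y, 1), size(x, 1))
    losses = Float64[]
    for j = 1:nepochs
        r = y .- W * x                 # residual of the current prediction
        push!(losses, sum(r.^2))       # loss before this update
        g = -2 .* (r * x')             # hand-derived gradient
        W .-= learning_rate .* g       # descent step, performed in place
    end
    return W, losses
end

Random.seed!(1234)
x_demo = rand(10, 100)
y_demo = rand(2, 100)
W_demo, losses = descent_demo(x_demo, y_demo)
```

With this step size the recorded losses decrease from one epoch to the next, which is a useful property to check before moving a training loop to a remote machine.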

With everything set up, we first test the training function locally to make sure it is implemented correctly:

In [8]:
# Test locally
model_train = train_model(model, x, y);

Training epoch 1 of 10.
Training epoch 2 of 10.
Training epoch 3 of 10.
Training epoch 4 of 10.
Training epoch 5 of 10.
Training epoch 6 of 10.
Training epoch 7 of 10.
Training epoch 8 of 10.
Training epoch 9 of 10.
Training epoch 10 of 10.


Having tested our training function locally on a CPU, we can run the actual training on an Azure GPU instance. Instead of manually starting an instance and moving all required code and data over, we can simply execute our training function remotely with `@batchexec`. We set the `cuda` flag to `true` to move the data and network to the GPU of the VM. After training, we copy the trained network back to our notebook:

In [9]:
# Train remotely on GPU
bctrl = @batchexec train_model(model, x, y; cuda=true)
model_train = fetch(bctrl);

Monitoring tasks for 'Completed' state, timeout in 60 minutes ...Creating job [FluxDeepLearning_tC2Rx4AY_1]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/application-cmd to blob container [azureclusterlesstemp]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/batch_runtime.jl to blob container [azureclusterlesstemp]...
Uploading file packages.dat to container [azureclusterlesstemp]...
Uploading file task_1.dat to container [azureclusterlesstemp]...
....................................................
Fetch output from task task_1


We can also run multiple trainings in parallel, e.g. to compare different learning rates:

In [10]:
bctrl = @batchexec pmap(α -> train_model(model, x, y; learning_rate=α), [1f-2, 1f-3, 1f-4])
models_tune = fetch(bctrl);

  0.954479 seconds (102.41 k allocations: 6.401 MiB, 14.05% compilation time)
Monitoring tasks for 'Completed' state, timeout in 60 minutes ...Creating job [FluxDeepLearning_80TRorha_1]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/application-cmd to blob container [azureclusterlesstemp]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/batch_runtime.jl to blob container [azureclusterlesstemp]...
Uploading file packages.dat to container [azureclusterlesstemp]...
Uploading file task_1.dat to container [azureclusterlesstemp]...
Uploading file task_2.dat to container [azureclusterlesstemp]...
Uploading file task_3.dat to container [azureclusterlesstemp]...
...........................................
Fetch output from task task_3..................................................
Fetch output from task task_1.......................................................
Fetch output from task task_2
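
The same sweep pattern can be tried locally before submitting it. The sketch below uses plain `map` on a hypothetical toy objective, not the network above; `final_loss`, `rates`, `sweep_losses` and `best_rate` are illustrative names. It shows one possible way to pick the best learning rate from the collected results.

```julia
# Hypothetical toy problem: minimize f(w) = (w - 3)^2 by gradient descent
function final_loss(α; nsteps=10)
    w = 0.0
    for _ = 1:nsteps
        w -= α * 2 * (w - 3)   # the gradient of (w - 3)^2 is 2(w - 3)
    end
    return (w - 3)^2
end

rates = [1e-2, 1e-3, 1e-4]
sweep_losses = map(final_loss, rates)       # one run per learning rate
best_rate = rates[argmin(sweep_losses)]     # rate with the smallest final loss
```

For this toy problem and only ten steps, the largest of the three rates makes the most progress; on a real model the ranking depends on the data and the number of epochs.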


The last step is to clean up our resources. We delete the blob container that holds all temporary files, then the jobs, and finally the batch pool:

In [11]:
# Delete the container, all jobs and the pool specified in the parameter json file
delete_container()
delete_all_jobs()
delete_pool();

## Copyright

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License (MIT). See LICENSE in the repo root for license information.