# Julia workloads with nested levels of task parallelism

AzureClusterlessHPC supports multi-node batch tasks that use Julia's standard `Distributed` library for distributed programming. This lets users nest parallel calls, for example by calling `pmap` inside a task that was itself launched through a parallel `pmap`.

## Set up

To enable multi-node Julia tasks, we need to set the following parameters in our `parameters.json` file:

- `"_POOL_COUNT"`: Set this parameter to the number of batch pools that you want to use in parallel.

- `"_NODE_COUNT_PER_POOL"`: Number of nodes per pool

- `"_MPI_RUN"`: Set to `"0"`

- `"_INTER_NODE_CONNECTION"`: Set to `"1"`

- `"_NUM_NODES_PER_TASK"`: Number of parallel Julia workers per task. This value needs to be equal to or smaller than the number of nodes per pool.

- `"_NUM_PROCS_PER_NODE"`: Set to `"1"`

Note that, in contrast to a multi-node MPI task, a multi-node distributed task requires `"_MPI_RUN"` to be set to `"0"` and `"_INTER_NODE_CONNECTION"` to be set to `"1"`. With this combination, the runtime executes the code of each AzureClusterlessHPC task via `julia -p $_NUM_NODES_PER_TASK`.
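The constraints listed above can be checked before a pool is created. The following is a minimal sketch, not part of AzureClusterlessHPC: the helper `check_distributed_params` is hypothetical, the values are illustrative, and it assumes that all entries in `parameters.json` are stored as strings (as `"0"` and `"1"` above suggest).

```julia
# Illustrative parameter values only; assumed to mirror the keys in parameters.json
params = Dict(
    "_POOL_COUNT"            => "2",
    "_NODE_COUNT_PER_POOL"   => "4",
    "_MPI_RUN"               => "0",
    "_INTER_NODE_CONNECTION" => "1",
    "_NUM_NODES_PER_TASK"    => "4",
    "_NUM_PROCS_PER_NODE"    => "1",
)

# Hypothetical helper: verifies the settings described in this section
function check_distributed_params(p)
    getint(key) = parse(Int, p[key])
    p["_MPI_RUN"] == "0" ||
        error("_MPI_RUN must be \"0\" for multi-node distributed tasks")
    p["_INTER_NODE_CONNECTION"] == "1" ||
        error("_INTER_NODE_CONNECTION must be \"1\"")
    p["_NUM_PROCS_PER_NODE"] == "1" ||
        error("_NUM_PROCS_PER_NODE must be \"1\"")
    getint("_NUM_NODES_PER_TASK") <= getint("_NODE_COUNT_PER_POOL") ||
        error("_NUM_NODES_PER_TASK must not exceed _NODE_COUNT_PER_POOL")
    return true
end

check_distributed_params(params)
```

With these values, each task would be started with four Julia workers, and a pool of four nodes could run one such task at a time.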

We start by setting the environment variables that point to our credentials and our `parameters.json` file:

In [1]:
# Set path to credentials
ENV["CREDENTIALS"] = joinpath(pwd(), "credentials.json")

# Set path to batch parameters (pool id, VM types, etc.)
ENV["PARAMETERS"] = joinpath(pwd(), "parameters.json")

# Load package
using AzureClusterlessHPC;

Next, we start our batch pool. If `"_INTER_NODE_CONNECTION"` is set to `"1"`, AzureClusterlessHPC enables inter-node communication in the batch pool(s).

In [2]:
# Create pool
startup_script = "pool_startup_script.sh"
create_pool_and_resource_file(startup_script);

Created pool 1 of 2 in canadacentral with 4 nodes.
Created pool 2 of 2 in canadacentral with 4 nodes.


## Parallel Julia jobs with nested levels of parallelization

To execute individual Julia tasks in parallel Julia sessions, we first need to load the Distributed.jl package:

In [3]:
@batchdef using Distributed;

Next, we define a `hello_world` function that will be executed in parallel by multiple Julia tasks:

In [4]:
@batchdef function hello_world(name)
    print("Hello ", name, " from ", myid(), "\n")
    return "Goodbye from $name"
end;

Now, we define another function called `say_hello`, which executes the `hello_world` function in parallel via the `pmap` function from Julia's `Distributed` library.

In [5]:
@batchdef function say_hello(name_lists)
    N = length(name_lists)
    out = pmap(i -> hello_world(name_lists[i]), 1:N)
    return out
end;

Our goal is to use AzureClusterlessHPC to execute the `say_hello` function in parallel as a multi-task batch job. Each task calls `say_hello`, which in turn runs its own parallel `pmap` over `hello_world`. We now define our list of input arguments, which consists of two individual lists, each with four names:

In [6]:
name_lists = [["Bob", "Jane", "John", "Anne"],
              ["Mark", "Sarah", "Max", "Emma"]];
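Before submitting anything to Azure, the two levels of parallelism can be tried on a single machine. The following is a local sketch under stated assumptions: `addprocs(4)` stands in for the `julia -p 4` call made by the runtime, `@everywhere` plays the role of `@batchdef`, and a serial `map` stands in for the two batch tasks. It does not use AzureClusterlessHPC at all.

```julia
using Distributed

# Start 4 local worker processes (local stand-in for `julia -p 4`)
addprocs(4)

# Define both functions on every process (stand-in for @batchdef)
@everywhere function hello_world(name)
    print("Hello ", name, " from ", myid(), "\n")
    return "Goodbye from $name"
end

@everywhere function say_hello(name_list)
    # Inner level: distribute the names over the available workers
    return pmap(hello_world, name_list)
end

name_lists = [["Bob", "Jane", "John", "Anne"],
              ["Mark", "Sarah", "Max", "Emma"]]

# Outer level: one call per list; a serial map replaces the batch tasks here
out = map(say_hello, name_lists)

# Shut down the local workers again
rmprocs(workers())
```

Here `out` holds one vector of return strings per input list, in the same order as the inputs, because `pmap` preserves input order. Which worker prints which greeting can vary between runs.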

We now execute the `say_hello` function twice via AzureClusterlessHPC. Each task receives a list with four names as an input argument and then calls the `hello_world` function via the `pmap` function in `say_hello`.

In [7]:
# Run say_hello function in parallel via AzureClusterlessHPC
bctrl = @batchexec pmap(i -> say_hello(name_lists[i]), 1:2);

  4.229663 seconds (3.53 M allocations: 210.794 MiB, 1.34% gc time, 53.39% compilation time)


Each `say_hello` call collects the output from its own parallel Julia session. Via the `fetch` function, we can then collect the output from the two `say_hello` tasks:

In [8]:
# Collect output
out = fetch(bctrl)
print(out);

Monitoring tasks for 'Completed' state, timeout in 60 minutes ...Creating job [BatchJob_sIld3PxL_1]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/application-cmd to blob container [azureclusterlesstemp]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/batch_runtime.jl to blob container [azureclusterlesstemp]...
Uploading file packages.dat to container [azureclusterlesstemp]...
Uploading file task_1.dat to container [azureclusterlesstemp]...
Creating job [BatchJob_sIld3PxL_2]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/application-cmd to blob container [azureclusterlesstemp]...
Uploading file /home/pwitte/.julia/dev/AzureClusterlessHPC/src/runtime/batch_runtime.jl to blob container [azureclusterlesstemp]...
Uploading file packages.dat to container [azureclusterlesstemp]...
Uploading file task_2.dat to container [azureclusterlesstemp]...
.....................................................................

At the end, we clean up all consumed Azure resources:

In [9]:
destroy!(bctrl);

## Copyright

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License (MIT). See LICENSE in the repo root for license information.