# Tutorial: Running DVC stages asynchronously with SLURM using EncFS and Sarus

This guide shows how to configure app-policies to run an application on a SLURM cluster within [Sarus](https://products.cscs.ch/sarus/) containers while operating on encrypted data managed through EncFS. We focus on an asynchronous execution model as familiar from `sbatch`: when reproducing a stage, DVC only blocks during job submission to the SLURM queue and does not wait until the resources have been allocated, the job has executed and the output data is available.

We assume familiarity with the basic constructs, such as building a DVC repository with infrastructure-as-code techniques as covered in the [tutorial for an ML repository](ml_tutorial.ipynb) and running [DVC stages on EncFS and Docker](encfs_sim_tutorial.ipynb). As an example application, we use the same iterative simulation workflow as in the EncFS tutorial. The key steps we will cover include:

 * Configure app-policies to run DVC stages on a SLURM cluster
 * Submit DVC stages to the SLURM queue, where they are asynchronously completed, committed and pushed
 * Monitor and control SLURM jobs corresponding to a DVC stage

The integration of these features with support for containers and EncFS is seamless, i.e. a previously developed application using them can be configured without any invasive code changes to run on a SLURM cluster.

As a prerequisite to this tutorial, you should have `encfs` installed as described in the [instructions](../async_encfs_dvc/encfs_int/README.md).

## Initializing the DVC repository
We first import the dependencies for the tutorial.

In [1]:
import os

In [2]:
from IPython.display import SVG  # test_slurm_async_sim_tutorial: skip

Create a new directory `data/v2` for the DVC root and change to it.

In [3]:
os.makedirs('data/v2', exist_ok=True)  # create the DVC root if it does not exist yet
os.chdir('data/v2')

Initialize an `encfs` DVC repository using the command

In [4]:
!dvc_init_repo . encfs

2023-07-25 16:01:41,052 DEBUG: v0.1.dev8866+gd72e04c, CPython 3.9.4 on Linux-5.3.18-24.102-default-x86_64-with-glibc2.26
2023-07-25 16:01:41,052 DEBUG: command: /scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/venv/bin/dvc init --subdir --verbose
2023-07-25 16:01:42,191 DEBUG: Added '/scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples/data/v2/.dvc/config.local' to gitignore file.
2023-07-25 16:01:42,192 DEBUG: Added '/scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples/data/v2/.dvc/tmp' to gitignore file.
2023-07-25 16:01:42,193 DEBUG: Added '/scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples/data/v2/.dvc/cache' to gitignore file.
2023-07-25 16:01:42,194 DEBUG: Removing '/var/tmp/dvc/repo/61ef98c1ff5e4b884fd58ad9105a0120'
2023-07-25 16:01:42,435 DEBUG: Staging files: {'/scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples

As a next step, EncFS needs to be configured. This can be achieved by running

```shell
${ENCFS_INSTALL_DIR}/bin/encfs -o allow_root,max_write=1048576,big_writes -f encrypt decrypt
```
as described in the [EncFS initialization instructions](../async_encfs_dvc/encfs_int/README.md).

Here, only for the purpose of this tutorial, we use a pre-established configuration with a simple password. This is for demonstration purposes only; in practice, always generate a **random** key and store it in a **safe location**!

In [5]:
%%bash
echo 1234 > encfs_tutorial.key
cp $(git rev-parse --show-toplevel)/examples/.encfs6.xml.tutorial encrypt/ && mv encrypt/.encfs6.xml.tutorial encrypt/.encfs6.xml
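
For use outside of this tutorial, a random key could be generated along the following lines. This is a minimal sketch, not part of `async_encfs_dvc`: the helper `write_random_key` and the file name `encfs_random.key` are hypothetical, and the demo call writes to a temporary directory. Note that an EncFS volume is tied to the password it was initialized with, so such a key has to be supplied when the volume is first configured and will not work with the pre-established `.encfs6.xml` used here.

```python
import os
import secrets
import tempfile  # only needed for the demo call below


def write_random_key(path, nbytes=32):
    """Write a random hex passphrase to `path`, accessible by the owner only."""
    key = secrets.token_hex(nbytes)  # 2 * nbytes hexadecimal characters
    # O_EXCL makes the call fail instead of overwriting an existing key file
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key + "\n")
    return path


# Demo: write the key to a throw-away directory (hypothetical location)
demo_dir = tempfile.mkdtemp()
key_file = write_random_key(os.path.join(demo_dir, "encfs_random.key"))
print(key_file)
```

The key file itself still has to be kept in a safe location, as stressed above.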

At runtime, EncFS will read the password from a file. The location of that file is passed in an environment variable that has to be set when `dvc repro` is run on a stage or `encfs_launch` is used to e.g. inspect the encrypted data interactively.

In [6]:
os.environ['ENCFS_PW_FILE'] = os.path.realpath('encfs_tutorial.key')
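
Since a missing or wrong `ENCFS_PW_FILE` only shows up once a stage is launched, it can be convenient to check the setting beforehand. The following is a small sketch of such a check; the helper `check_encfs_pw_file` is hypothetical and not provided by `async_encfs_dvc`.

```python
import os


def check_encfs_pw_file(environ=os.environ):
    """Return the password file path, or raise if the setup is incomplete."""
    pw_file = environ.get("ENCFS_PW_FILE")
    if not pw_file:
        raise RuntimeError("ENCFS_PW_FILE is not set")
    if not os.path.isfile(pw_file):
        raise RuntimeError(f"ENCFS_PW_FILE points to a missing file: {pw_file}")
    if not os.access(pw_file, os.R_OK):
        raise RuntimeError(f"ENCFS_PW_FILE is not readable: {pw_file}")
    return pw_file
```

Calling `check_encfs_pw_file()` before `dvc repro` then either returns the configured path or fails with a descriptive error.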

The DVC repo has been initialized with repo and stage policies available under `.dvc_policies`.

In [7]:
!tree .dvc_policies

.dvc_policies
├── repo
│   └── dvc_root.yaml
└── stages
    ├── dvc_config.yaml
    ├── dvc_etl.yaml
    ├── dvc_in.yaml
    ├── dvc_ml_inference.yaml
    ├── dvc_ml_training.yaml
    └── dvc_simulation.yaml

2 directories, 7 files


For the purpose of this tutorial, we will extract the paths of the encrypted directory and the mount target of EncFS into environment variables. This is not a necessary step to run DVC stages with EncFS, though.

In [8]:
from async_encfs_dvc.encfs_int.mount_config import load_mount_config

mount_config = [os.popen(f"echo {d}").read().strip() for d in  # evaluating shell exprs in paths
                load_mount_config('.dvc_policies/repo/dvc_root.yaml')]

os.environ['ENCFS_ENCRYPT_DIR'] = mount_config[0]  # encrypt (same on all hosts)
os.environ['ENCFS_DECRYPT_DIR'] = mount_config[1]  # host-specific
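
The cell above pipes each path through `echo` in a subshell so that shell expressions in the mount configuration are evaluated. If the paths only contain plain environment variable references (an assumption, as the configuration may also rely on command substitution such as `$(...)`, which the sketch below leaves untouched), the standard library can expand them without spawning a process. The helper `expand_path` and the variable `DEMO_TMPDIR` are hypothetical.

```python
import os


def expand_path(path):
    """Expand $VAR, ${VAR} and a leading ~ without spawning a shell."""
    return os.path.expanduser(os.path.expandvars(path))


os.environ["DEMO_TMPDIR"] = "/tmp/demo"  # hypothetical variable for this example
print(expand_path("${DEMO_TMPDIR}/encfs_decrypt"))  # → /tmp/demo/encfs_decrypt
```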

## Establishing the input dataset
Our pipeline will be based on a dataset labeled `sim_dataset_v1`, of which we use a specific subset `ex2` (label chosen arbitrarily). First, we create a DVC stage to track the encrypted input data. As `dvc add` no longer supports the `--file` option to locate the `.dvc` file in a different folder, we use a workaround with a frozen no-op stage, analogous to the manual preprocessing step for the training dataset in the ML tutorial.

In [9]:
!dvc_create_stage --app-yaml ../../in/dvc_app.yaml --stage add --dataset-name sim_dataset_v1 --subset-name ex2

Not using SLURM or MPI in this DVC stage.
Writing DVC stage to config/in/sim_dataset_v1/original/ex2
Using encfs - don't forget to set ENCFS_PW_FILE/ENCFS_INSTALL_DIR when running 'dvc repro'.
Added stage 'in_original_add_23-07-25_16-01-44_daint101_lukasd' in 'dvc.yaml'

To track the changes with git, run:

	git add dvc.yaml ../../../../../encrypt/in/sim_dataset_v1/original/ex2/.gitignore .gitignore

To enable auto staging, run:

	dvc config core.autostage true
Freezing stage for execution outside dvc - run 'dvc commit in_original_add_23-07-25_16-01-44_daint101_lukasd' when outputs are done.
Modifying stage 'in_original_add_23-07-25_16-01-44_daint101_lukasd' in 'dvc.yaml'

We now populate this dataset with input data that we generate randomly here, even though it could equally be downloaded from a remote source. As this data will be encrypted, the `dd` command that generates it has to run in an environment managed by EncFS, which is achieved by wrapping the command with `encfs_mount_and_run`.

In [10]:
%%bash
encfs_mount_and_run encrypt ${ENCFS_DECRYPT_DIR} ${ENCFS_DECRYPT_DIR}/in/sim_dataset_v1/original/ex2/output/encfs_out.log \
    dd if=/dev/urandom of=${ENCFS_DECRYPT_DIR}/in/sim_dataset_v1/original/ex2/output/sim_in.dat bs=4k iflag=fullblock,count_bytes count=$((10**7)) \
    > config/in/sim_dataset_v1/original/ex2/output/stage_out.log

We can inspect the logs of the outer command, that is of `encfs_mount_and_run`, which we capture in the file `stage_out.log`.

In [11]:
!cat config/in/sim_dataset_v1/original/ex2/output/stage_out.log

encfs_mount_and_run[encrypt->/tmp/encfs_25680_async_encfs_dvc_ab9b828085d9]: Unable to determine local MPI rank - assuming to run in non-distributed mode (as a single process).
encfs_mount_and_run[encrypt->/tmp/encfs_25680_async_encfs_dvc_ab9b828085d9]: Rank 0 on daint101: Running encfs-mount at /tmp/encfs_25680_async_encfs_dvc_ab9b828085d9.
encfs_mount_and_run[encrypt->/tmp/encfs_25680_async_encfs_dvc_ab9b828085d9]: total 4.0K
drwxr-xr-x 3 lukasd csstaff 4.0K 25. Jul 16:01 in
encfs_mount_and_run[encrypt->/tmp/encfs_25680_async_encfs_dvc_ab9b828085d9]: Rank 0 on daint101: Successfully mounted encfs-dir at /tmp/encfs_25680_async_encfs_dvc_ab9b828085d9 and wrote to sync-file /scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples/data/v2/encrypt/.encfs_25680_async_encfs_dvc_ab9b828085d9_daint101_local_sync - starting encfs-job
encfs_mount_and_run[encrypt->/tmp/encfs_25680_async_encfs_dvc_ab9b828085d9]: Rank 0 on daint101: encfs-job completed.
encfs_mount_and_run[encrypt->/tmp/encfs_25

These logs are unencrypted and do not leak any details of the application inside the EncFS environment. The application logs are captured separately in the encrypted file `encfs_out.log`. To inspect it, we need to mount a decrypted view of `encrypt`. In a typical session, a user would launch EncFS in the foreground in another terminal using
```shell
encfs_launch
```
and then access the decrypted data here. However, *exclusively* for the purpose of this notebook, we will copy the decrypted data to an unencrypted location to inspect it outside of EncFS. We emphasize that this is only for demonstration purposes and confidential data should *never* be handled in that manner.

In [12]:
%%bash
encfs_mount_and_run encrypt ${ENCFS_DECRYPT_DIR} /dev/null cp ${ENCFS_DECRYPT_DIR}/in/sim_dataset_v1/original/ex2/output/encfs_out.log encfs_out.log >/dev/null
cat encfs_out.log
rm encfs_out.log

2441+1 records in
2441+1 records out
10000000 bytes (10 MB, 9.5 MiB) copied, 0.47271 s, 21.2 MB/s
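
The record counts reported by `dd` follow directly from the block size and the requested number of bytes: with `iflag=count_bytes`, `count` is interpreted in bytes rather than blocks, so 10^7 bytes are written as full 4 KiB blocks plus one partial block. The arithmetic can be checked as follows.

```python
block_size = 4 * 1024   # bs=4k
total_bytes = 10**7     # count=$((10**7)) together with iflag=count_bytes

full_blocks, remainder = divmod(total_bytes, block_size)
partial_blocks = 1 if remainder else 0

print(f"{full_blocks}+{partial_blocks} records")  # → 2441+1 records
print(f"{remainder} bytes in the last record")    # → 1664 bytes in the last record
print(f"{total_bytes / 2**20:.1f} MiB")           # → 9.5 MiB
```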


To avoid logging the output of an EncFS-managed application to a file, you can supply `/dev/null` as a third parameter to `encfs_mount_and_run` as done here.

Finally, we commit the newly added data to DVC as described in the manual preprocessing step for the training dataset in the ML tutorial.

In [13]:
!dvc commit --force config/in/sim_dataset_v1/original/ex2/dvc.yaml 

Generating lock file 'config/in/sim_dataset_v1/original/ex2/dvc.lock'           
Updating lock file 'config/in/sim_dataset_v1/original/ex2/dvc.lock'

The resulting file hierarchy looks as follows:

In [14]:
!tree encrypt/in config

encrypt/in
└── sim_dataset_v1
    └── original
        └── ex2
            └── output
                ├── encfs_out.log
                └── sim_in.dat
config
└── in
    └── sim_dataset_v1
        └── original
            └── ex2
                ├── dvc_app.yaml
                ├── dvc.lock
                ├── dvc.yaml
                └── output
                    └── stage_out.log

9 directories, 6 files
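
As the listing shows, the stage directory under `config` and the encrypted output directory under `encrypt` share the same relative path. A small sketch of this convention, as read off from the tree above, is given below; the helper `encrypted_output_dir` is hypothetical and only illustrates the layout.

```python
from pathlib import PurePosixPath


def encrypted_output_dir(stage_dir, config_root="config", encrypt_root="encrypt"):
    """Map a stage directory under `config` to its output directory under `encrypt`."""
    relative = PurePosixPath(stage_dir).relative_to(config_root)
    return PurePosixPath(encrypt_root) / relative / "output"


print(encrypted_output_dir("config/in/sim_dataset_v1/original/ex2"))
# → encrypt/in/sim_dataset_v1/original/ex2/output
```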


Before moving to the definition of preprocessing stages, we define execution labels based on timestamps for the subsequent DVC stages. In a real-world application, the timestamps would usually be generated on the fly when creating the DVC stage.

In [15]:
%env ETL_RUN_LABEL=ex2-20230714-083624
%env SIM_0_RUN_LABEL=ex2-20230714-083812-0
%env SIM_1_RUN_LABEL=ex2-20230714-083812-1
%env SIM_2_RUN_LABEL=ex2-20230714-083812-2
%env SIM_3_RUN_LABEL=ex2-20230714-083812-3

env: ETL_RUN_LABEL=ex2-20230714-083624
env: SIM_0_RUN_LABEL=ex2-20230714-083812-0
env: SIM_1_RUN_LABEL=ex2-20230714-083812-1
env: SIM_2_RUN_LABEL=ex2-20230714-083812-2
env: SIM_3_RUN_LABEL=ex2-20230714-083812-3
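
To generate such labels on the fly, something like the following could be used. This is a sketch under the assumption that a label consists of the subset name, a `YYYYMMDD-HHMMSS` timestamp and an optional iteration suffix, as in the fixed values above; the helper `make_run_label` is hypothetical.

```python
from datetime import datetime


def make_run_label(subset, when=None, suffix=None):
    """Build a label such as 'ex2-20230714-083812-0' from a timestamp."""
    when = when or datetime.now()
    label = f"{subset}-{when:%Y%m%d-%H%M%S}"
    return label if suffix is None else f"{label}-{suffix}"


print(make_run_label("ex2", datetime(2023, 7, 14, 8, 36, 24)))
# → ex2-20230714-083624
print(make_run_label("ex2", datetime(2023, 7, 14, 8, 38, 12), suffix=0))
# → ex2-20230714-083812-0
```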


## Constructing the preprocessing stage
The next step involves setting up the preprocessing stage. For illustration purposes, we will use the same preprocessing application as in the ML repository tutorial. To simplify things, we do not run this step over SLURM yet, though this could be changed without difficulty.

In [16]:
%%bash

dvc_create_stage --app-yaml ../../app_prep/dvc_app.yaml --stage sim \
    --run-label ${ETL_RUN_LABEL} --input-etl ex2 --input-etl-file output/sim_in.dat

Added stage 'app_prep_v1_sim_ex2-20230714-083624' in 'dvc.yaml'

To track the changes with git, run:

	git add .gitignore ../../../../../../encrypt/in/sim_dataset_v1/app_prep_v1/auto/ex2-20230714-083624/.gitignore dvc.yaml

To enable auto staging, run:

	dvc config core.autostage true
Not using SLURM or MPI in this DVC stage.
Writing DVC stage to config/in/sim_dataset_v1/app_prep_v1/auto/ex2-20230714-083624
Using encfs - don't forget to set ENCFS_PW_FILE/ENCFS_INSTALL_DIR when running 'dvc repro'.


In [17]:
%%bash

dvc repro config/in/sim_dataset_v1/app_prep_v1/auto/${ETL_RUN_LABEL}/dvc.yaml



Stage 'config/in/sim_dataset_v1/original/ex2/dvc.yaml:in_original_add_23-07-25_16-01-44_daint101_lukasd' didn't change, skipping
Running stage 'config/in/sim_dataset_v1/app_prep_v1/auto/ex2-20230714-083624/dvc.yaml:app_prep_v1_sim_ex2-20230714-083624':
> if (set -o pipefail) 2>/dev/null; then set -o pipefail; fi; mkdir -p ../../../../../../encrypt/in/sim_dataset_v1/app_prep_v1/auto/ex2-20230714-083624/output output && encfs_mount_and_run ../../../../../../encrypt /tmp/encfs_25680_async_encfs_dvc_app_prep_v1_sim_ex2-20230714-083624_1d604493207b_ab9b828085d9 ../../../../../../../../../../../../../../tmp/encfs_25680_async_encfs_dvc_app_prep_v1_sim_ex2-20230714-083624_1d604493207b_ab9b828085d9/in/sim_dataset_v1/app_prep_v1/auto/ex2-20230714-083624/output/encfs_out_{MPI_RANK}.log bash -c "$(git rev-parse --show-toplevel)/examples/app_prep/prep.sh --etl-input /tmp/encfs_25680_async_encfs_dvc_app_prep_v1_sim_ex2-20230714-083624_1d604493207b_ab9b828085d9/in/sim_dataset_v1/original/ex2/output/s

This will execute the stage synchronously within the DVC-launched process. Execution can also be deferred to the simulation stages, where it will be triggered as a dependency. However, it is not recommended to launch mixed graphs of synchronous and asynchronous (SLURM) stages. In particular, asynchronous stages must not be launched as ancestors of synchronous DVC stages. 
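
The restriction on mixed graphs can be stated as a simple check on the stage dependency graph: no synchronous stage may have an asynchronous stage among its ancestors. The sketch below illustrates this rule on a hand-written graph; the function names and the short stage names are hypothetical, and the check is not part of `async_encfs_dvc`.

```python
def ancestors(stage, deps):
    """All direct and indirect dependencies of `stage`."""
    seen = set()
    stack = list(deps.get(stage, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps.get(current, ()))
    return seen


def find_violations(deps, async_stages):
    """Return (async ancestor, sync stage) pairs that break the rule."""
    violations = []
    for stage in deps:
        if stage in async_stages:
            continue  # asynchronous stages may depend on anything
        for ancestor in sorted(ancestors(stage, deps) & set(async_stages)):
            violations.append((ancestor, stage))
    return violations


# `deps` maps each stage to the stages it directly depends on
deps = {"in": [], "prep": ["in"], "sim_0": ["prep"], "sim_1": ["sim_0"]}
print(find_violations(deps, async_stages={"sim_0", "sim_1"}))  # → []

# A synchronous post-processing stage downstream of a SLURM stage is flagged
deps["post"] = ["sim_1"]
print(find_violations(deps, async_stages={"sim_0", "sim_1"}))
# → [('sim_0', 'post'), ('sim_1', 'post')]
```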

## Creating the simulation stages for SLURM
Lastly, we will establish a rudimentary structure for an iterative simulation workflow that utilizes the preprocessed data. A simulation usually comes with its own tunable parameters. These could be kept in a file, as done for the hyperparameters in the machine learning tutorial. For brevity, however, we skip this step here and assume that all parameters are passed through the command line.

To enable stages to be run with SLURM, we add `slurm_opts` to the application policy under the corresponding stages (here `base_simulation` and `simulation`). Upon reproducing a DVC SLURM stage, the stage dependency graph will be submitted to the SLURM queue as a corresponding job graph. Multiple jobs are created per DVC stage: a stage job for the actual application, upon its success a DVC commit job and optionally a DVC push job, and upon failure of the stage job a cleanup job. There is a one-to-one correspondence between dependencies of DVC stages and those of SLURM stage jobs. In the application policy, we typically specify the options for stage jobs under `stage` and those for DVC operations like commit and push under `dvc`. For options that apply to all SLURM jobs, we can use `all`.
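
The job structure per stage can be pictured with a small model. The following sketch is purely illustrative and does not reproduce how `slurm_enqueue.sh` wires the jobs: the job names are hypothetical, and the use of the SLURM dependency types `afterok` and `afternotok` is an assumption based on the success and failure conditions described above. In a real submission, dependencies refer to job IDs rather than names.

```python
def jobs_for_stage(stage, parents=(), push=False):
    """Return {job name: [(dependency type, job name), ...]} for one DVC stage."""
    stage_job = f"{stage}.stage"
    jobs = {
        # the stage job waits for the stage jobs of all parent DVC stages
        stage_job: [("afterok", f"{parent}.stage") for parent in parents],
        # the DVC commit job only runs if the application succeeded
        f"{stage}.commit": [("afterok", stage_job)],
        # the cleanup job only runs if the application failed
        f"{stage}.cleanup": [("afternotok", stage_job)],
    }
    if push:
        # the optional DVC push job follows a successful commit
        jobs[f"{stage}.push"] = [("afterok", f"{stage}.commit")]
    return jobs


graph = {}
for stage, parents in [("sim_0", []), ("sim_1", ["sim_0"]), ("sim_2", ["sim_1"])]:
    graph.update(jobs_for_stage(stage, parents))

for job, deps in graph.items():
    spec = ", ".join(f"{kind}:{dep}" for kind, dep in deps) or "(none)"
    print(f"{job:15s} depends on {spec}")
```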

We are now ready to set up the simulation stages. Where necessary, we can obtain completion suggestions with `--show-opts`.

In [18]:
%%bash
dvc_create_stage --app-yaml ../../app_sim/dvc_app_slurm.yaml --stage base_simulation \
  --run-label ${SIM_0_RUN_LABEL} \
  --input-simulation ${ETL_RUN_LABEL} \
  --simulation-output-file-num-per-rank 3 \
  --simulation-output-file-size $((10**6 * 2**0 / 3))

dvc_create_stage --app-yaml ../../app_sim/dvc_app_slurm.yaml --stage simulation \
  --run-label ${SIM_1_RUN_LABEL} \
  --input-simulation ${SIM_0_RUN_LABEL} \
  --simulation-output-file-num-per-rank 3 \
  --simulation-output-file-size $((10**6 * 2**1 / 3))

dvc_create_stage --app-yaml ../../app_sim/dvc_app_slurm.yaml --stage simulation \
  --run-label ${SIM_2_RUN_LABEL} \
  --input-simulation ${SIM_1_RUN_LABEL} \
  --simulation-output-file-num-per-rank 3 \
  --simulation-output-file-size $((10**6 * 2**2 / 3))

dvc_create_stage --app-yaml ../../app_sim/dvc_app_slurm.yaml --stage simulation \
  --run-label ${SIM_3_RUN_LABEL} \
  --input-simulation ${SIM_2_RUN_LABEL} \
  --simulation-output-file-num-per-rank 3 \
  --simulation-output-file-size $((10**6 * 2**3 / 3))

tree encrypt config

Added stage 'app_sim_v1_sim_dataset_v1_base_simulation_ex2-20230714-083812-0' in 'dvc.yaml'

To track the changes with git, run:

	git add dvc.yaml ../../../../../encrypt/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-0/.gitignore .gitignore

To enable auto staging, run:

	dvc config core.autostage true
Writing DVC stage to config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-0
Using encfs - don't forget to set ENCFS_PW_FILE/ENCFS_INSTALL_DIR when running 'dvc repro --no-commit'.
Added stage 'app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-1' in 'dvc.yaml'

To track the changes with git, run:

	git add .gitignore ../../../../../encrypt/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-1/.gitignore dvc.yaml

To enable auto staging, run:

	dvc config core.autostage true
Writing DVC stage to config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-1
Using encfs - don't forget to set ENCFS_PW_FILE/ENCFS_INSTALL_DIR when running 'dvc repro --no-co
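
The `--simulation-output-file-size` arguments passed in the stage definitions above double from one iteration to the next. Bash arithmetic truncates the division by 3, which corresponds to integer division in Python:

```python
base_size = 10**6

# one entry per stage SIM_0 .. SIM_3, matching $((10**6 * 2**k / 3)) in bash
sizes = [base_size * 2**k // 3 for k in range(4)]

for k, size in enumerate(sizes):
    print(f"SIM_{k}: {size} bytes per output file")
# → SIM_0: 333333 bytes per output file
# → SIM_1: 666666 bytes per output file
# → SIM_2: 1333333 bytes per output file
# → SIM_3: 2666666 bytes per output file
```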

## Running and monitoring the pipeline with SLURM
The stages can be inspected with:

In [19]:
%%bash
dvc dag --dot config/app_sim_v1/sim_dataset_v1/simulation/${SIM_3_RUN_LABEL}/dvc.yaml | tee config/app_sim_v1/sim_dataset_v1/simulation/${SIM_3_RUN_LABEL}/dvc_dag.dot
if [[ $(command -v dot) ]]; then
    dot -Tsvg config/app_sim_v1/sim_dataset_v1/simulation/${SIM_3_RUN_LABEL}/dvc_dag.dot > config/app_sim_v1/sim_dataset_v1/simulation/${SIM_3_RUN_LABEL}/dvc_dag.svg
fi

strict digraph  {
"config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-3/dvc.yaml:app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-3";
"config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-2/dvc.yaml:app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-2";
"config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-1/dvc.yaml:app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-1";
"config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-0/dvc.yaml:app_sim_v1_sim_dataset_v1_base_simulation_ex2-20230714-083812-0";
"config/in/sim_dataset_v1/app_prep_v1/auto/ex2-20230714-083624/dvc.yaml:app_prep_v1_sim_ex2-20230714-083624";
"config/in/sim_dataset_v1/original/ex2/dvc.yaml:in_original_add_23-07-25_16-01-44_daint101_lukasd";
"config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-2/dvc.yaml:app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-2" -> "config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-3/dvc.yaml:app_

In [20]:
dvc_dag_img = 'config/app_sim_v1/sim_dataset_v1/simulation/' + os.environ['SIM_3_RUN_LABEL'] + '/dvc_dag.svg'  # test_slurm_async_sim_tutorial: skip
if os.path.exists(dvc_dag_img):  # test_slurm_async_sim_tutorial: skip
    display(SVG(filename=dvc_dag_img))  # test_slurm_async_sim_tutorial: skip
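
The DOT output of `dvc dag` also lends itself to programmatic inspection. As a minimal sketch, the edges can be extracted with a regular expression and sorted topologically with `graphlib` from the standard library (Python 3.9+). It assumes that an edge `"a" -> "b"` means that stage `b` depends on stage `a`, as suggested by the output above; the function `execution_order` is hypothetical, and stages without any edge are not reported.

```python
import re
from graphlib import TopologicalSorter  # available from Python 3.9

EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def execution_order(dot_text):
    """Return the stages of a DOT graph with dependencies listed first."""
    sorter = TopologicalSorter()
    for line in dot_text.splitlines():
        match = EDGE.search(line)
        if match:
            dependency, dependent = match.groups()
            sorter.add(dependent, dependency)
    return list(sorter.static_order())


example_dot = '''strict digraph  {
"prep";
"sim_0";
"sim_1";
"prep" -> "sim_0";
"sim_0" -> "sim_1";
}'''
print(execution_order(example_dot))  # → ['prep', 'sim_0', 'sim_1']
```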

We can now submit the pipeline of asynchronous SLURM stages. Note that we use `dvc repro --no-commit` to submit asynchronous stages. This avoids adding any output to the DVC cache at SLURM job submission time.

In [21]:
%%bash
if [[ ! -x "$(command -v sarus)" ]]; then  # slurm_enqueue checks availability of sarus
    module load sarus
fi
dvc repro --no-commit config/app_sim_v1/sim_dataset_v1/simulation/${SIM_3_RUN_LABEL}/dvc.yaml



Stage 'config/in/sim_dataset_v1/original/ex2/dvc.yaml:in_original_add_23-07-25_16-01-44_daint101_lukasd' didn't change, skipping
Stage 'config/in/sim_dataset_v1/app_prep_v1/auto/ex2-20230714-083624/dvc.yaml:app_prep_v1_sim_ex2-20230714-083624' didn't change, skipping
Running stage 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-0/dvc.yaml:app_sim_v1_sim_dataset_v1_base_simulation_ex2-20230714-083812-0':
> if (set -o pipefail) 2>/dev/null; then set -o pipefail; fi; mkdir -p ../../../../../encrypt/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-0/output output && slurm_enqueue.sh app_sim_v1_sim_dataset_v1_base_simulation_ex2-20230714-083812-0 dvc_app_slurm.yaml base_simulation encfs_mount_and_run ../../../../../encrypt /tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_base_simulation_ex2-20230714-083812-0_b8eaaec1281c_ab9b828085d9 ../../../../../../../../../../../../../tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_base_simulation_ex2-2023071



Generating lock file 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-0/dvc.lock'
Updating lock file 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-0/dvc.lock'





Running stage 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-1/dvc.yaml:app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-1':
> if (set -o pipefail) 2>/dev/null; then set -o pipefail; fi; mkdir -p ../../../../../encrypt/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-1/output output && slurm_enqueue.sh app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-1 dvc_app_slurm.yaml simulation encfs_mount_and_run ../../../../../encrypt /tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-1_f23b2839f70c_ab9b828085d9 ../../../../../../../../../../../../../tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-1_f23b2839f70c_ab9b828085d9/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-1/output/encfs_out_{MPI_RANK}.log sarus run --mount=type=bind,source="/tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-1_f23b2839f70c_ab9b828085d9",destinatio



Generating lock file 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-1/dvc.lock'
Updating lock file 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-1/dvc.lock'





Running stage 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-2/dvc.yaml:app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-2':
> if (set -o pipefail) 2>/dev/null; then set -o pipefail; fi; mkdir -p ../../../../../encrypt/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-2/output output && slurm_enqueue.sh app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-2 dvc_app_slurm.yaml simulation encfs_mount_and_run ../../../../../encrypt /tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-2_f9f408e24b78_ab9b828085d9 ../../../../../../../../../../../../../tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-2_f9f408e24b78_ab9b828085d9/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-2/output/encfs_out_{MPI_RANK}.log sarus run --mount=type=bind,source="/tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-2_f9f408e24b78_ab9b828085d9",destinatio



Generating lock file 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-2/dvc.lock'
Updating lock file 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-2/dvc.lock'





Running stage 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-3/dvc.yaml:app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-3':
> if (set -o pipefail) 2>/dev/null; then set -o pipefail; fi; mkdir -p ../../../../../encrypt/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-3/output output && slurm_enqueue.sh app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-3 dvc_app_slurm.yaml simulation encfs_mount_and_run ../../../../../encrypt /tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-3_f0aa05e5c171_ab9b828085d9 ../../../../../../../../../../../../../tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-3_f0aa05e5c171_ab9b828085d9/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-3/output/encfs_out_{MPI_RANK}.log sarus run --mount=type=bind,source="/tmp/encfs_25680_async_encfs_dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-3_f0aa05e5c171_ab9b828085d9",destinatio



Generating lock file 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-3/dvc.lock'
Updating lock file 'config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-3/dvc.lock'

To track the changes with git, run:

	git add config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-3/dvc.lock config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-2/dvc.lock config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-0/dvc.lock config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812-1/dvc.lock

To enable auto staging, run:

	dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.


The submitted SLURM stages can be monitored with a status file and log files in the `dvc.yaml` directory and with the tool `dvc_scontrol`. The latter makes the functionality of SLURM's `scontrol` available for asynchronous DVC stages and thus also allows controlling DVC SLURM job groups. In particular, we can list the submitted jobs.

In [22]:
!dvc_scontrol show all

dvc_scontrol: DVC stage jobs:
           USER           JOBID     REASON       TIME  TIME_LEFT                                                         NAME                                                                                                                           WORK_DIR
         lukasd        47951743 JobHeldUse       0:00       5:00 dvc_app_sim_v1_sim_dataset_v1_base_simulation_ex2-20230714-0 /scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples/data/v2/config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812
         lukasd        47951750 JobHeldUse       0:00       5:00 dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812 /scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples/data/v2/config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812
         lukasd        47951757 JobHeldUse       0:00       5:00 dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812 /scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples/data/v2/config

As the stage and commit jobs are put on hold in order to enable the submission of more DVC SLURM stages by the user, we have to release them. 

In [23]:
!dvc_scontrol release stage,commit

dvc_scontrol: Releasing job dvc_app_sim_v1_sim_dataset_v1_base_simulation_ex2-20230714-083812-0_ab9b828085d9[sbatch_dvc_stage_app_sim_v1_sim_dataset_v1_base_simulation_ex2-2023] (reason JobHeldUser) at 47951743.
dvc_scontrol: Releasing job dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-1_ab9b828085d9[sbatch_dvc_stage_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-] (reason JobHeldUser) at 47951750.
dvc_scontrol: Releasing job dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-2_ab9b828085d9[sbatch_dvc_stage_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-] (reason JobHeldUser) at 47951757.
dvc_scontrol: Releasing job dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812-3_ab9b828085d9[sbatch_dvc_stage_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-] (reason JobHeldUser) at 47951766.
dvc_scontrol: Releasing job dvc_op_ab9b828085d9[sbatch_dvc_commit.sh] (reason JobHeldUser) at 47951745.
dvc_scontrol: Releasing job dvc_op_ab9b828085d9[sbatch_dvc_comm

This will enable resource allocation and execution of the application stages. Upon successful execution, the stage outputs will also be committed. We can now log out from the SLURM cluster and return later to inspect results.

For the purpose of this tutorial, however, we want to keep monitoring the pipeline on SLURM and wait for the completion of the last job, which is a DVC commit.  

In [24]:
%%bash
# ID of last commit job
commit_jobid=$(cat config/app_sim_v1/sim_dataset_v1/simulation/${SIM_3_RUN_LABEL}/app_sim_v1_sim_dataset_v1_simulation_${SIM_3_RUN_LABEL}.dvc_commit_jobid)
../../slurm_wait_for_job.sh ${commit_jobid}

Monitoring SLURM job 47951772.
The SLURM pipeline is still in process.
dvc_scontrol: DVC stage jobs:
           USER           JOBID     REASON       TIME  TIME_LEFT                                                         NAME                                                                                                                           WORK_DIR
         lukasd        47951743   Priority       0:00       5:00 dvc_app_sim_v1_sim_dataset_v1_base_simulation_ex2-20230714-0 /scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples/data/v2/config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812
         lukasd        47951750 Dependency       0:00       5:00 dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812 /scratch/snx3000/lukasd/mitraccel/async-encfs-dvc/examples/data/v2/config/app_sim_v1/sim_dataset_v1/simulation/ex2-20230714-083812
         lukasd        47951757 Dependency       0:00       5:00 dvc_app_sim_v1_sim_dataset_v1_simulation_ex2-20230714-083812 /s
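
The contents of `slurm_wait_for_job.sh` are not shown here. As a rough idea of what waiting for a job involves, the following sketch polls `squeue` for a single job ID until the job is no longer in an active state. The function names and the set of states treated as active are assumptions made for this example; the sketch does not tell a successful job from a failed one, for which the accounting tool `sacct` could be consulted.

```python
import subprocess
import time

# Job states treated as "not finished yet" (an assumption for this sketch)
ACTIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "SUSPENDED", "COMPLETING"}


def job_is_active(jobid):
    """Ask squeue for the state of a single job."""
    result = subprocess.run(
        ["squeue", "--noheader", "--jobs", str(jobid), "--format=%T"],
        capture_output=True, text=True)
    # squeue fails or prints nothing once the job is unknown to the controller
    return result.returncode == 0 and result.stdout.strip() in ACTIVE_STATES


def wait_for_job(jobid, is_active=job_is_active, poll_seconds=30, sleep=time.sleep):
    """Block until the job is no longer active; return the number of polls that found it active."""
    polls = 0
    while is_active(jobid):
        polls += 1
        sleep(poll_seconds)
    return polls
```

Passing `is_active` and `sleep` as parameters keeps the polling loop testable without access to a SLURM cluster.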

Once the pipeline has completed, as described above, results can be inspected by running EncFS with `encfs_launch` from the DVC root directory in another terminal and then accessing the data through the decrypted directory.

In [25]:
!tree encrypt config

[01;34mencrypt[00m
├── [01;34mapp_sim_v1[00m
│   └── [01;34msim_dataset_v1[00m
│       └── [01;34msimulation[00m
│           ├── [01;34mex2-20230714-083812-0[00m
│           │   └── [01;34moutput[00m
│           │       ├── encfs_out_0.log
│           │       ├── encfs_out_10.log
│           │       ├── encfs_out_11.log
│           │       ├── encfs_out_12.log
│           │       ├── encfs_out_13.log
│           │       ├── encfs_out_14.log
│           │       ├── encfs_out_15.log
│           │       ├── encfs_out_1.log
│           │       ├── encfs_out_2.log
│           │       ├── encfs_out_3.log
│           │       ├── encfs_out_4.log
│           │       ├── encfs_out_5.log
│           │       ├── encfs_out_6.log
│           │       ├── encfs_out_7.log
│           │       ├── encfs_out_8.log
│           │       ├── encfs_out_9.log
│           │       ├── sim.0.0.dat
│           │       ├── sim.0.1.dat
│           │       ├── sim.0.2.dat
│           │       ├── sim.10.0.d