Using the signac and row workflows provides the following benefits:

- The `signac` and `row` workflows provide contained and fully reproducible results, since all the project steps and calculations are contained within a single `signac`/`row` project. However, to ensure total reproducibility, the project should be run from a container. Note: This involves building a container (Docker, Apptainer, Podman, etc.), using it to run the original calculations, and providing it to the future parties that are trying to reproduce the exact results.
- The `signac` and `row` workflows can simply track the progress of any project locally or on the HPC, providing as much or as little detail of the project status as the user programs into the `actions.py` and `workflow.toml` files. Note: `row` tracks the progress and completion of a project step or section by determining if a file exists. Therefore, the user can generate this file after a verification step is performed to confirm a successful completion, or after the commands run without error (Example: `Exit Code 0`).
- These `signac` and `row` workflows are designed to track the progress of all the project's parts or stages, only resubmitting the jobs locally or to the HPC if they are not completed or not already in the queue.
- These `signac` and `row` workflows also allow colleagues to quickly transfer their workflows to each other, and easily add new state points to a project, without the fear of rerunning the original state points.
- Please also see the signac website and row website, which outline some of the other major features.
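The file-based completion tracking described above can be sketched in Python. The marker file name and the exit-code check below are hypothetical examples, not the repository's actual `actions.py` code:

```python
from pathlib import Path

def mark_complete_if_valid(job_dir, exit_code):
    """Write a completion marker only when the command succeeded
    (exit code 0); row treats the marker file's existence as 'done'.
    The file name 'step_completed.txt' is a hypothetical example."""
    marker = Path(job_dir) / "step_completed.txt"
    if exit_code == 0:
        marker.write_text("completed\n")
    return marker.exists()
```

row would then list a file like this as the action's product, so the step is only considered finished after the verification passes.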
This is a signac and row workflow example/tutorial for a simple numpy calculation, which utilizes the following workflow steps:

- Part 1: For each individual job (set of state points), this code generates the `signac_job_document.json` file from the `signac_statepoint.json` data. The `signac_statepoint.json` file only stores the set of state points or required variables for the given job. The `signac_job_document.json` file can be used to store any other variables that the user wants to save for later use or searching.
- Part 2: This writes the input values into a file that `numpy` will use to do a calculation in `Part 3`. Four (4) random numbers are generated, using the initial `value_0_int` value and the `replicate_number_int` value to seed the random number generator.
- Part 3: Calculates the dot product of the four (4) random numbers generated in `Part 2` (4 numbers dot [1, 2, 3, 4]). It also runs a bash command, `echo "Running the echo command or any other bash command here"`, which is an example of how to run a bash command for a software package inside the commands for each state point.
- Part 4: Obtains the average and standard deviation for each input `value_0_int` value across all the replicates, and prints the output data file (`analysis/output_avg_std_of_replicates_txt_filename.txt`). Signac is set up to automatically loop through all the JSON files (`signac_statepoint.json`), calculating the average and standard deviation for the jobs whose state points differ only in their `replicate_number_int` values.
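Parts 2-4 above can be sketched in plain `numpy`. This is a minimal illustration, not the repository's actual `actions.py` code, and the seeding scheme (summing the two state point values) is an assumption:

```python
import numpy as np

def run_job(value_0_int, replicate_number_int):
    """Parts 2-3 sketch: seed the RNG from the state point values,
    draw four random numbers, and dot them with [1, 2, 3, 4].
    The seeding scheme here is an assumption for illustration."""
    rng = np.random.default_rng(value_0_int + replicate_number_int)
    four_numbers = rng.random(4)                               # Part 2
    return float(np.dot(four_numbers, [1.0, 2.0, 3.0, 4.0]))  # Part 3

# Part 4 sketch: average and standard deviation across the replicates
# that share the same value_0_int.
results = [run_job(5, r) for r in range(4)]
avg, std = float(np.mean(results)), float(np.std(results))
```

Because the RNG is seeded from the state point, rerunning the same job always reproduces the same number, which is what makes resubmitting or extending the workspace safe.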
`src` directory: This directory can be used to store any custom functions that are required for this workflow. This includes any developed `Python` functions or any template files used for the custom workflow (Example: A base template file that is used for a find-and-replace function, changing the variables with the differing state point inputs).
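A find-and-replace helper of the kind described could look like the following sketch; the placeholder token format (`<name>`) and the function name are hypothetical:

```python
def fill_template(template_text, statepoint):
    """Replace placeholder tokens such as <value_0_int> in a base
    template with the job's state point values (hypothetical token
    format; the real template files may use a different convention)."""
    for key, value in statepoint.items():
        template_text = template_text.replace(f"<{key}>", str(value))
    return template_text

filled = fill_template(
    "value_0 = <value_0_int>\nreplicate = <replicate_number_int>",
    {"value_0_int": 7, "replicate_number_int": 0},
)
```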
- The signac documentation, row documentation, signac GitHub, and row GitHub can be used for reference.
Please cite this GitHub repository and the following repositories:
The signac workflows for "this project" can be built using mamba. Alternatively, you can use micromamba or miniforge, substituting micromamba or conda, respectively, for mamba when using them.
If you are using an HPC, you will likely need the below command or a similar command to load the correct python package manager.
```bash
module load mamba
```

The following steps can be used to build the environment:

```bash
cd signac_numpy_tutorial
mamba env create -f environment.yml
```

Activate the environment:

```bash
mamba activate signac_numpy_tutorial
```

- All the signac and row commands are run from the `<local_path>/signac_numpy_tutorial/signac_numpy_tutorial/project` directory.
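The project's state points are typically enumerated in `init.py`. A minimal sketch of that pattern follows; the specific values and ranges below are assumptions, and the commented `project.open_job(sp).init()` call is how signac would create each job:

```python
# Hypothetical sketch of how an init.py can enumerate state points;
# the actual values and ranges in the tutorial may differ.
from itertools import product

value_0_ints = [1, 2, 3]         # assumed input values
replicate_number_ints = [0, 1]   # assumed replicate numbers

statepoints = [
    {"value_0_int": v, "replicate_number_int": r}
    for v, r in product(value_0_ints, replicate_number_ints)
]
# In init.py, each dict would then be initialized as a signac job, e.g.:
#   project.open_job(sp).init()  # writes workspace/<hash>/signac_statepoint.json
```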
The `clusters.toml` file is used to specify the HPC environment. It will need to be set up for each specific HPC and identified in the `workflow.toml` file.
The following files are located here:

```bash
cd <your_local_path>/signac_numpy_tutorial/signac_numpy_tutorial/project
```

- Modify the `clusters.toml` file to fit your HPC.
  - Replace the `<ADD_YOUR_HPC_NAME_STRING>` values with your unique HPC name as a string. For example, at GT, `<ADD_YOUR_HPC_NAME_STRING>` is replaced with `"phoenix"`.
  - You also may need to change the `"LMOD_SYSHOST"` environment variable to match how your specific HPC is set up.
- Add the modified cluster configuration file (`clusters.toml`) to the following location on the HPC under your account (`~/.config/row/clusters.toml`):

```bash
cp clusters.toml ~/.config/row/clusters.toml
```
- Modify the `workflow.toml` file to fit your HPC.
  - Replace the `<ADD_YOUR_HPC_NAME>` values with your unique HPC name. For example, at GT, `<ADD_YOUR_HPC_NAME>` is replaced with `phoenix`, changing `[action.submit_options.<ADD_YOUR_HPC_NAME>]` to `[action.submit_options.phoenix]`.
  - Replace the `<ADD_YOUR_CHARGE_ACCOUNT_NAME_STRING>` values with your specific charge account as a string. For example, `<ADD_YOUR_CHARGE_ACCOUNT_NAME_STRING>` is replaced with `"project_x"`, changing `account = <ADD_YOUR_CHARGE_ACCOUNT_NAME_STRING>` to `account = "project_x"`.
- To select the cluster partitions that you want to use, modify the Slurm submission script, or modify the `workflow.toml` file with the below addition.
For parts 1, 2, and 4, add the CPU partition(s) you want to use:

```toml
custom = ["", "--partition=cpu-1,cpu-2,cpu-3"]
```
For part 3, add the GPU partition(s) you want to use:

```toml
custom = ["", "--partition=gpu-1,gpu-2,gpu-3"]
```
Note: The cluster partitions in the `clusters.toml` file can be specified for each HPC, or you can list only the partitions that you commonly use. This allows you to specify the partition in the `workflow.toml` file (i.e., `partition = real_partition_name`), without needing the partitions to be specified/overwritten in the custom line (i.e., `custom = ["", "--partition=cpu-1,cpu-2,cpu-3"]`).

Note: As needed, the cluster partitions in the `clusters.toml` file can be placeholder (fake) ones. Specifying a fake partition in the `workflow.toml` file (i.e., `partition = fake_partition_name`) then allows you to override the selected partition and list many real partitions in the custom line (i.e., `custom = ["", "--partition=cpu-1,cpu-2,cpu-3"]`), which is used to write the Slurm submission script.

- This can also be done if more than one partition is needed.
Build the test workspace:

```bash
python init.py
```

Run the following command as a test:

```bash
row submit --dry-run
```

You should see an output that looks something like this (with `export ACTION_CLUSTER=<YOUR_HPC_NAME>` in the output) if it is working:

```bash
...
directories=(
be31aae200171ac52a9e48260b7ba5b1
)
export ACTION_WORKSPACE_PATH=workspace
export ACTION_CLUSTER=<YOUR_HPC_NAME>
...
```

Clean up row and delete the test workspace:

```bash
row clean
rm -r workspace
```

- If `row submit` is run locally like this, then you must remove the HPC parts in the `workflow.toml` file (see the notes in the `workflow.toml`).
- Change the GPU parts to run only on CPU, if the local hardware only supports CPU workflows (see the notes in the `workflow.toml`).
Build the test workspace:

```bash
python init.py
```

Run the following command as a test:

```bash
row submit --dry-run
```

You should see an output that looks something like this (with `export ACTION_CLUSTER=none` in the output) if it is working:

```bash
...
directories=(
be31aae200171ac52a9e48260b7ba5b1
)
export ACTION_WORKSPACE_PATH=workspace
export ACTION_CLUSTER=none
...
```

Clean up row and delete the test workspace:

```bash
row clean
rm -r workspace
```