# glotzerlab / signac-docs

This is a collection of recipes on how to solve typical problems using signac.

## Migrating (changing) the data space schema

Oftentimes, one discovers at a later stage that important keys are missing from the metadata schema. For example, in the tutorial we are modeling a gas using the ideal gas law, but we might discover later that important effects are not captured using this overly simplistic model and decide to replace it with the van der Waals equation:

$$\left(p + \frac{N^2 a}{V^2}\right) \left(V - Nb \right) = N k_B T$$

Since the ideal gas law can be considered a special case of the equation above with a=b=0, we could migrate all jobs with:

>>> for job in project:
...     job.sp.setdefault('a', 0)
...     job.sp.setdefault('b', 0)
...

The setdefault() function sets the values of a and b to 0 if they are not already present.
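
To see why a = b = 0 is a safe default, the van der Waals equation above can be solved for the pressure and compared against the ideal gas law. The following is a minimal sketch in plain Python; the function names and the numerical values are illustrative assumptions and are not part of signac or the tutorial code.

```python
def vdw_pressure(N, V, kT, a=0.0, b=0.0):
    """Pressure from the van der Waals equation, solved for p.

    (p + N^2 a / V^2) (V - N b) = N kT
    =>  p = N kT / (V - N b) - N^2 a / V^2
    """
    return N * kT / (V - N * b) - a * N**2 / V**2


def ideal_gas_pressure(N, V, kT):
    """Pressure from the ideal gas law, p V = N kT."""
    return N * kT / V


# Illustrative values only (assumed, not taken from the tutorial).
N, V, kT = 1000, 5000.0, 1.0

p_ideal = ideal_gas_pressure(N, V, kT)
p_default = vdw_pressure(N, V, kT)              # a = b = 0
p_vdw = vdw_pressure(N, V, kT, a=0.5, b=0.1)    # non-zero corrections

print(p_ideal, p_default, p_vdw)
```

With the default values the two pressures agree, so jobs migrated with a = b = 0 still describe the same physical model as before.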

• To delete a key use del job.sp['key_to_be_removed'].
• To rename a key, use job.sp.new_name = job.sp.pop('old_name').

Note

The job.sp and job.doc attributes provide all basic functions of a regular Python dict.
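
Because these attributes behave like dictionaries, the three schema operations above (add a default, delete a key, rename a key) can be tried out on plain dicts first. In this sketch the dicts merely stand in for job.sp; the keys old_name and key_to_be_removed are the placeholder names used above, and the values are made up for illustration.

```python
# Plain dicts stand in for job.sp; no signac project is needed to run this.
statepoints = [
    {"p": 1, "kT": 1.0, "N": 1000, "old_name": "x", "key_to_be_removed": 1},
    {"p": 2, "kT": 1.0, "N": 1000, "a": 0.5, "old_name": "y", "key_to_be_removed": 2},
]

for sp in statepoints:
    # Add missing keys without overwriting existing values.
    sp.setdefault("a", 0)
    sp.setdefault("b", 0)
    # Delete a key.
    del sp["key_to_be_removed"]
    # Rename a key: pop() removes the old key and returns its value.
    sp["new_name"] = sp.pop("old_name")

for sp in statepoints:
    print(sp)
```

Note that the second state point keeps its existing value of a, since setdefault() only fills in keys that are absent.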

### Initializing Jobs with Replica Indices

If you want to initialize your workspace with multiple instances of the same state point, you may want to include a replica_index or random_seed parameter in the state point.

num_reps = 3
for i in range(num_reps):
    for p in range(1, 11):
        sp = {'p': p, 'kT': 1.0, 'N': 1000, "replica_index": i}
        job = project.open_job(sp)
        job.init()

### Applying document-wide changes

The safest way to apply multiple document-wide changes is to replace the document in one operation. Here is an example of how we could recursively replace all dot (.) characters with the underscore character in all keys [1]:

import signac
from collections.abc import Mapping

def migrate(doc):
    if isinstance(doc, Mapping):
        return {k.replace('.', '_'): migrate(v) for k, v in doc.items()}
    else:
        return doc

for job in signac.get_project():
    job.sp = migrate(job.sp)
    job.doc = migrate(job.doc)

This approach also makes it easy to compare the pre- and post-migration states before actually applying the changes.
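
One possible way to do such a comparison is a dry run on a copy of the data. The sketch below repeats the migrate() function so that it runs on its own and uses a plain dict in place of job.doc; the document contents are invented for illustration.

```python
from collections.abc import Mapping


def migrate(doc):
    """Recursively replace dots with underscores in all keys."""
    if isinstance(doc, Mapping):
        return {k.replace(".", "_"): migrate(v) for k, v in doc.items()}
    return doc


# Stand-in for job.doc (assumed example data).
before = {"sim.steps": 1000, "results": {"mean.p": 0.2, "N": 1000}}
after = migrate(before)

# Dry run: inspect what would change before assigning anything.
changed = before != after
removed_keys = sorted(set(before) - set(after))
added_keys = sorted(set(after) - set(before))
print(changed, removed_keys, added_keys)
```

Since migrate() returns a new dict and leaves its input untouched, the assignment to job.sp or job.doc can be made conditional on the outcome of the comparison.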

 [1] The use of dots in keys is deprecated. Dots will be exclusively used to denote nested keywords in the future.

## Initializing state points with replica indices

We often require multiple jobs with the same state point to collect enough information to make statistical inferences about the data. Instead of creating multiple projects to handle this, we can simply add a replica_index to the state point. For example, we can use the following code to generate 3 copies of each state point in a workspace:

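# init.py
import signac

project = signac.init_project('ideal-gas-project')
num_reps = 3

for i in range(num_reps):
    for p in range(1, 11):
        sp = {'p': p, 'kT': 1.0, 'N': 1000, 'replica_index': i}
        project.open_job(sp).init()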

## Running in containerized environments

Using signac-flow in combination with container systems such as Docker or Singularity is easily achieved by modifying the executable directive. For example, assuming that we want to use a Singularity container named software.simg, which is placed within the project root directory, we use the following directive to specify that a given operation is to be executed within the container:

@Project.operation.with_directives({"executable": "singularity exec software.simg python"})
def containerized_operation(job):
    pass

If you are using the run command for execution, simply execute the whole script in the container:

$ singularity exec software.simg python project.py run

Attention!

Many cluster environments will not allow you to submit jobs to the scheduler using the container image. This means that the actual submission (e.g. python project.py submit or similar) will need to be executed with a local Python executable.

To avoid issues with dependencies that are only available in the container image, move imports into the operation function. Condition functions are executed during the submission process to determine what to submit, so dependencies for those must be installed in the local environment as well.
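
The pattern of moving imports into the operation function can be sketched as follows. This is plain Python without signac-flow decorators: the job is represented by a dict, the function names are hypothetical, and the standard-library statistics module stands in for a package that would only be installed inside the container.

```python
def is_done(job):
    # Condition function: evaluated during submission, so it may only use
    # modules that are available in the local environment.
    return "mean" in job


def containerized_operation(job):
    # Imported inside the function so that loading this script (for example
    # when submitting) does not require the container-only dependency.
    import statistics  # stand-in for a package that exists only in the container

    job["mean"] = statistics.mean(job["samples"])


# A dict stands in for a signac job here.
job = {"samples": [1, 2, 3]}
done_before = is_done(job)
containerized_operation(job)
done_after = is_done(job)
print(done_before, done_after, job["mean"])
```

The module-level code never touches the container-only dependency, so the script can be loaded by a local Python executable while the operation body still runs inside the container.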

Tip

You can define a decorator that can be reused like this:

import flow

def on_container(func):
    return flow.directives(executable='singularity exec software.simg python')(func)

@on_container
@Project.operation
def containerized_operation(job):
    pass

## Using multiple execution environments for operations

Suppose that for a given project you want to run jobs on multiple supercomputers, your laptop, and your desktop. On each of these machines, different operation directives may be needed. The FlowGroup class provides a mechanism to easily specify the requirements of each environment.

# project.py
from flow import FlowProject, directives

class Project(FlowProject):
    pass

supercomputer = Project.make_group(name='supercomputer')
laptop = Project.make_group(name='laptop')
desktop = Project.make_group(name='desktop')

@supercomputer.with_directives(directives=dict(
    ngpu=4, executable="singularity exec --nv /path/to/container python"))
@laptop.with_directives(directives=dict(ngpu=0))
@desktop.with_directives(directives=dict(ngpu=1))
@Project.operation
def op1(job):
    pass

@supercomputer.with_directives(directives=dict(
    nranks=40, executable="singularity exec /path/to/container python"))
@laptop.with_directives(directives=dict(nranks=4))
@desktop.with_directives(directives=dict(nranks=8))
@Project.operation
def op2(job):
    pass

if __name__ == '__main__':
    Project().main()

Tip

Sometimes, a machine should only run certain operations. To specify that an operation should only run on certain machines, only decorate the operation with the groups for the 'right' machine(s).

Tip

To test operations with a small interactive job, a 'test' group can be used to ensure that the operations do not try to run on multiple cores or GPUs.
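
The idea behind both tips can be pictured as a lookup of directives by group and operation. The sketch below is a conceptual model in plain Python, not the signac-flow API: the table and the helper directives_for() are hypothetical, the ngpu and nranks values for the three machines are taken from the example above, and the values for the 'test' group are an assumption.

```python
# Conceptual model only: which directives each group attaches to each operation.
GROUP_DIRECTIVES = {
    "supercomputer": {"op1": {"ngpu": 4}, "op2": {"nranks": 40}},
    "laptop": {"op1": {"ngpu": 0}, "op2": {"nranks": 4}},
    "desktop": {"op1": {"ngpu": 1}, "op2": {"nranks": 8}},
    # Assumed 'test' group: a single rank and no GPUs for small interactive jobs.
    "test": {"op1": {"ngpu": 0}, "op2": {"nranks": 1}},
}


def directives_for(group, operation):
    """Return the directives of an operation within a group.

    Raises LookupError if the operation is not part of the group, which models
    an operation that was not decorated for that machine.
    """
    try:
        return GROUP_DIRECTIVES[group][operation]
    except KeyError:
        raise LookupError(f"{operation!r} is not part of group {group!r}") from None


print(directives_for("supercomputer", "op1"))
print(directives_for("test", "op2"))
```

An operation that should not run on a given machine is simply left out of that group's entry, which mirrors leaving out the corresponding group decorator.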