Add documentation for aggregation (#122)
* Add aggregation docs without FlowGroups

* Add documentation for aggregation with FlowGroups

* Address reviews and update docs with current API

* Improve wordings

* Address code review

* Improvements to doc

* Improve wording

* Update docs/source/aggregation.rst

Co-authored-by: Carl Simon Adorf <carl.simon.adorf@gmail.com>

* Update docs/source/aggregation.rst

Co-authored-by: Carl Simon Adorf <carl.simon.adorf@gmail.com>

* Update docs/source/aggregation.rst

Co-authored-by: Carl Simon Adorf <carl.simon.adorf@gmail.com>

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Update docs/source/aggregation.rst

* Explain that operations are like aggregate operations acting on aggregates of one job.

Co-authored-by: Carl Simon Adorf <carl.simon.adorf@gmail.com>

* Add aggregation to table of contents.

* Rename section to match FlowGroup.

* Unitalicize.

* Fix links to pre/post.

* Fix intersphinx references.

* Use :py: role prefix for consistency with other docs.

Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Co-authored-by: Carl Simon Adorf <carl.simon.adorf@gmail.com>
3 people committed Jun 26, 2021
1 parent f945b5c commit 81bfeb3
Showing 3 changed files with 193 additions and 1 deletion.
191 changes: 191 additions & 0 deletions docs/source/aggregation.rst
@@ -0,0 +1,191 @@
.. _aggregation:

===========
Aggregation
===========

This chapter provides information about passing aggregates of jobs to operation functions.


.. _aggregator_definition:

Definition
==========

An :py:class:`~flow.aggregator` is used as a decorator for operation functions which accept a variable number of positional arguments, ``*jobs``.
The argument ``*jobs`` is unpacked into an *aggregate*, defined as an ordered tuple of jobs.
See also the Python documentation about :ref:`argument unpacking <python:tut-unpacking-arguments>`.

.. code-block:: python

    # project.py
    from flow import FlowProject, aggregator


    class Project(FlowProject):
        pass


    @aggregator()
    @Project.operation
    def op1(*jobs):
        print("Number of jobs in aggregate:", len(jobs))


    @Project.operation
    def op2(job):
        pass


    if __name__ == "__main__":
        Project().main()

If :py:class:`~flow.aggregator` is used with the default arguments, it will create a single aggregate containing all the jobs present in the project.
In the example above, ``op1`` is an *aggregate operation* where all the jobs present in the project are passed as a variable number of positional arguments (via ``*jobs``), while ``op2`` is an operation where only a single job is passed as an argument.

.. tip::

    The concept of aggregation may be easier to understand if one realizes that "normal" operation functions are equivalent to aggregate operation functions with an aggregate group size of one job.
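
As an illustrative sketch of this equivalence (the operation name is chosen purely for illustration), an operation decorated with ``aggregator.groupsof(1)`` receives aggregates of exactly one job and therefore behaves like a "normal" per-job operation:

.. code-block:: python

    @aggregator.groupsof(1)
    @Project.operation
    def per_job_op(job):
        # Receives exactly one job per call, just like an undecorated operation.
        pass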


.. note::

    For an aggregate operation, all conditions such as :py:meth:`~flow.FlowProject.pre` or :py:meth:`~flow.FlowProject.post`, callable directives, and other features must accept the same number of jobs as arguments as the operation itself.
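
As a hedged sketch of this requirement (the decorator order follows the examples in this chapter, and the ``initialized`` document key is assumed purely for illustration), a pre-condition attached to an aggregate operation also receives the full aggregate:

.. code-block:: python

    @aggregator()
    @Project.operation
    @Project.pre(lambda *jobs: all("initialized" in job.doc for job in jobs))
    def op_all(*jobs):
        # The pre-condition above is called with the same aggregate of jobs.
        pass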

.. _types_of_aggregation:

Types of Aggregation
====================

Currently, **signac-flow** allows users to aggregate jobs in the following ways:

- *All jobs*: All of the project's jobs are passed to the operation function.
- *Group by state point key(s)*: The aggregates are grouped by one or more state point keys.
- *Group by arbitrary key function*: The aggregates are grouped by keys determined by a key function that takes an instance of :py:class:`~signac.contrib.job.Job` and returns the grouping key (see the sketch below).
- *Groups of a specific size*: The jobs are aggregated into groups of a specified size.
- *Custom aggregator function*: A completely custom aggregator function can be used when even greater flexibility is needed.
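
As an illustrative sketch of grouping by an arbitrary key function (the state point keys ``temperature`` and ``pressure`` are assumed here for illustration only), the callable returns the grouping value for each job:

.. code-block:: python

    @aggregator.groupby(lambda job: (job.sp.temperature, job.sp.pressure))
    @Project.operation
    def analyze_state(*jobs):
        # All jobs in this aggregate share the same (temperature, pressure) pair.
        pass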

Group By
--------

:py:meth:`~flow.aggregator.groupby` allows users to aggregate jobs by grouping them by a state point key, an iterable of state point keys whose values define the grouping, or an arbitrary callable that maps a :py:class:`~signac.contrib.job.Job` to a grouping key.

.. code-block:: python

    @aggregator.groupby("temperature")
    @Project.operation
    def op3(*jobs):
        pass

In the above example, the jobs will be aggregated based on the state point key ``"temperature"``.
All jobs having the same value of **temperature** in their state point will be placed in the same aggregate.

Groups Of
---------

:py:meth:`~flow.aggregator.groupsof` allows users to aggregate jobs by generating aggregates of a given size.

.. code-block:: python

    @aggregator.groupsof(2)
    @Project.operation
    def op4(job1, job2=None):
        pass

In the above example, the jobs will be aggregated in groups of 2, so up to two jobs will be passed as arguments at once.

.. note::

    If the number of jobs in the project in this example is odd, there will be one aggregate containing only a single job.
    In general, the last aggregate from :py:meth:`~flow.aggregator.groupsof` will contain the remaining jobs if the aggregate size does not evenly divide the number of jobs in the project.
    If a remainder is expected and valid, users should make sure that the operation function can be called with the reduced number of arguments (e.g. by using ``*jobs`` or providing default arguments as shown above).

Sorting jobs for aggregation
----------------------------

The ``sort_by`` parameter allows users to sort the jobs before aggregates are created.
The sorting order can be controlled with the ``sort_ascending`` parameter.
By default, when no ``sort_by`` parameter is specified, the order of the jobs is determined by the iteration order of the **signac** project.

.. code-block:: python

    @aggregator.groupsof(2, sort_by="temperature", sort_ascending=False)
    @Project.operation
    def op5(*jobs):
        pass

.. note::

    In the above example, all the jobs will be sorted by the state point key ``temperature`` in descending order and then aggregated into groups of 2.

Selecting jobs for aggregation
------------------------------

**signac-flow** allows users to selectively choose which jobs to pass into operation functions.
This can be used to generate aggregates from only the selected jobs, excluding any jobs that do not meet the selection criteria.

.. code-block:: python

    @aggregator(select=lambda job: job.sp.temperature > 0)
    @Project.operation
    def op6(*jobs):
        pass

.. _aggregate_id:

Aggregate ID
============

Similar to the concept of a job id, an aggregate id is a unique hash identifying an aggregate of jobs.
The aggregate id is sensitive to the order of the jobs in the aggregate.


.. note::

    The id of an aggregate containing one job is that job's id.

To distinguish an aggregate id from a job id, the id of an aggregate containing more than one job always has the prefix ``agg-``.

Users can generate the aggregate id of an aggregate using :py:func:`flow.get_aggregate_id`.
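
A minimal sketch of generating an aggregate id interactively (assuming a **signac** project containing at least two jobs is initialized in the current directory):

.. code-block:: python

    import signac
    from flow import get_aggregate_id

    project = signac.get_project()
    aggregate = tuple(project)[:2]  # an ordered tuple of jobs

    # Ids of aggregates with more than one job carry the "agg-" prefix;
    # the id of a single-job aggregate is just that job's id.
    print(get_aggregate_id(aggregate))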

.. tip::

    Users can also pass an aggregate id to the ``--job-id`` command line flag provided by **signac-flow** in ``run``, ``submit``, and ``exec``.


.. _aggregation_with_flow_groups:

Aggregation with FlowGroups
===========================

In order to associate an aggregator object with a :py:class:`~flow.project.FlowGroup`, **signac-flow** provides a ``group_aggregator`` parameter in :py:meth:`~flow.FlowProject.make_group`.
By default, no aggregation takes place for a :py:class:`~flow.project.FlowGroup`.

.. note::

    All the operations in a :py:class:`~flow.project.FlowGroup` will use the same :py:class:`~flow.aggregator` object provided to the group's ``group_aggregator`` parameter.

.. code-block:: python

    # project.py
    from flow import FlowProject, aggregator


    class Project(FlowProject):
        pass


    group = Project.make_group("agg-group", group_aggregator=aggregator())


    @group
    @aggregator()
    @Project.operation
    def op1(*jobs):
        pass


    @group
    @Project.operation
    def op2(*jobs):
        pass


    if __name__ == "__main__":
        Project().main()

In the above example, when the group ``agg-group`` is executed using ``python project.py run -o agg-group``, all the jobs in the project are passed as positional arguments for both ``op1`` and ``op2``.
If ``op1`` is executed using ``python project.py run -o op1``, all the jobs in the project are passed as positional arguments because a :py:class:`~flow.aggregator` is associated with the operation function ``op1`` (separately from the aggregator used for ``agg-group``).
If ``op2`` is executed using ``python project.py run -o op2``, only a single job is passed as an argument because no :py:class:`~flow.aggregator` is associated with the operation function ``op2``.
2 changes: 1 addition & 1 deletion docs/source/flow-project.rst
@@ -49,7 +49,7 @@ Defining a workflow
===================

We will reproduce the simple workflow introduced in the previous section by first copying both the ``greeted()`` condition function and the ``hello()`` *operation* function into the ``project.py`` module.
-We then use the :py:func:`~flow.FlowProject.operation` and the :py:func:`~.flow.FlowProject.post` decorator functions to specify that the ``hello()`` operation function is part of our workflow and that it should only be executed if the ``greeted()`` condition is not met.
+We then use the :py:meth:`~flow.FlowProject.operation` and the :py:meth:`~flow.FlowProject.post` decorator functions to specify that the ``hello()`` operation function is part of our workflow and that it should only be executed if the ``greeted()`` condition is not met.

.. code-block:: python
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -48,6 +48,7 @@ If you are new to **signac**, the best place to start is to read the :ref:`intro
environments
templates
flow-group
aggregation
indexing
collections
configuration
