7 changes: 4 additions & 3 deletions docs/source/index.rst
@@ -27,11 +27,12 @@ Installation

Pydra is implemented purely in Python and has a small number of dependencies.
It is easy to install via pip for Python >= 3.11 (preferably within a
-`virtual environment`_):
+`virtual environment`_). To get the latest version you will need to explicitly specify
+a version greater than or equal to 1.0a, otherwise PyPI will install the last 0.* version:

.. code-block:: bash

-$ pip install pydra
+$ pip install "pydra>=1.0a"

Pre-designed tasks are available under the `pydra.tasks.*` namespace. These tasks
are typically implemented within separate packages that are specific to a given
@@ -41,7 +42,7 @@ ANTs_ (*pydra-ants*), or a collection of related tasks/workflows, such as Niwork

.. code-block:: bash

-$ pip install pydra-fsl pydra-ants
+$ pip install pydra-tasks-fsl pydra-tasks-ants

Of course, if you use Pydra to execute commands within non-Python toolkits, you will
need to either have those commands installed on the execution machine, or use containers
68 changes: 38 additions & 30 deletions docs/source/reference/glossary.rst
@@ -4,62 +4,69 @@ Glossary
.. glossary::

Cache-root
-The directory where cache directories for tasks to be executed are created.
-Task cache directories are named within the cache root directory using a hash
-of the task's parameters, so that the same task with the same parameters can be
-reused.
+The root directory in which separate cache directories for each job are created.
+Job cache directories are named within the cache-root directory using a unique
+checksum for the job based on the task's parameters and software environment,
+so that if the same job is run again the outputs from the previous run can be
+reused.
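
To illustrate the naming scheme (a simplified sketch, not pydra's actual
checksum implementation), a job's cache directory can be derived by hashing
its parameters:

.. code-block:: python

    import hashlib
    import json
    from pathlib import Path

    def job_cache_dir(cache_root: Path, task_name: str, params: dict) -> Path:
        # Hash the task name and parameters into a stable checksum; pydra
        # also folds in the software environment, omitted here for brevity.
        payload = json.dumps({"task": task_name, "params": params}, sort_keys=True)
        checksum = hashlib.sha256(payload.encode()).hexdigest()[:16]
        return cache_root / f"{task_name}-{checksum}"

    # The same task with the same parameters maps to the same directory,
    # so outputs from a previous run can be detected and reused.
    print(job_cache_dir(Path("/tmp/cache"), "add", {"a": 1, "b": 2}))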

Combiner
A combiner is used to combine :ref:`State-array` values created by a split operation
defined by a :ref:`Splitter` on the current node, upstream workflow nodes or
stand-alone tasks.
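
As a rough plain-Python illustration (not pydra's API), a splitter fans one
list-valued input out into a state array of jobs, and a combiner gathers the
per-job outputs back together:

.. code-block:: python

    # Hypothetical stand-in for a task: squares its input.
    def task(x: int) -> int:
        return x ** 2

    # Split: one job per element of the input list.
    state_array = [task(x) for x in [1, 2, 3]]

    # Combine: collect the per-job outputs into a single list
    # for downstream nodes.
    print(state_array)  # [1, 4, 9]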

Container-ndim
-The number of dimensions of the container object to be iterated over when using
-a :ref:`Splitter` to split over an iterable value. For example, a list-of-lists
-or a 2D array with `container_ndim=2` would be split over the elements of the
-inner lists into a single 1-D state array. However, if `container_ndim=1`,
-the outer list/2D would be split into a 1-D state array of lists/1D arrays.
+The number of dimensions of the container object to be flattened into a single
+state array when splitting over nested containers/multi-dimensional arrays.
+For example, with `container_ndim=1`, a list-of-lists-of-floats or a 2D numpy
+array would be split into a 1-D state array consisting of lists-of-floats or
+1D numpy arrays, respectively, whereas with `container_ndim=2` they would be
+split into a state array of floats consisting of all the elements of the
+inner lists/arrays.
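
A plain-Python sketch of the distinction (illustrative only):

.. code-block:: python

    nested = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

    # container_ndim=1: only the outer container is split, so each
    # state-array element is an inner list of floats.
    state_ndim1 = list(nested)
    print(state_ndim1)  # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

    # container_ndim=2: both levels are flattened, so each state-array
    # element is a single float.
    state_ndim2 = [x for inner in nested for x in inner]
    print(state_ndim2)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]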

Environment
An environment refers to a specific software encapsulation, such as a Docker
-or Singularity image, that is used to run a task.
+or Singularity image, in which shell tasks are run. They are specified in the
+Submitter object to be used when executing a task.

Field
-A field is a parameter of a task, or a task outputs object, that can be set to
-a specific value. Fields are specified to be of any types, including objects
-and file-system objects.
+A field is a parameter of a task, or an output in a task outputs class.
+Fields define the expected datatype of the parameter and other metadata
+that control how the field is validated and passed through to the
+execution of the task.
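
Schematically (using plain dataclasses rather than pydra's field machinery):

.. code-block:: python

    from dataclasses import dataclass, field

    @dataclass
    class AddTask:
        # Each attribute plays the role of a field: a name, an expected
        # datatype, and metadata that control validation and help text.
        a: int = field(metadata={"help": "first operand"})
        b: int = field(default=0, metadata={"help": "second operand"})

    task = AddTask(a=1)  # `b` falls back to its default of 0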

Hook
-A hook is a user-defined function that is executed at a specific point in the task
-execution process. Hooks can be used to prepare/finalise the task cache directory
+A hook is a user-defined function that is executed at a specific point either before
+or after a task is run. Hooks can be used to prepare/finalise the task cache directory
or send notifications.
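
A minimal sketch of the idea (generic Python; pydra's actual hook signature
is not shown here):

.. code-block:: python

    from typing import Callable

    pre_run_hooks: list[Callable[[str], None]] = []
    post_run_hooks: list[Callable[[str], None]] = []

    def notify(cache_dir: str) -> None:
        print(f"task finished, outputs in {cache_dir}")

    post_run_hooks.append(notify)

    def run_task(cache_dir: str) -> None:
        for hook in pre_run_hooks:
            hook(cache_dir)  # called just before the task runs
        ...                  # execute the task itself
        for hook in post_run_hooks:
            hook(cache_dir)  # called just after the task completes

    run_task("/tmp/cache/add-1a2b")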

Job
-A job is a discrete unit of work, a :ref:`Task`, with all inputs resolved
-(i.e. not lazy-values or state-arrays) that has been assigned to a worker.
-A task describes "what" is to be done and a submitter object describes
-"how" it is to be done, a job combines both objects to describe a concrete unit
-of processing.
+A job consists of a :ref:`Task` with all inputs resolved
+(i.e. not lazy-values or state-arrays) and a Submitter object. It therefore
+represents a concrete unit of work to be executed, combining "what" is to be
+done (Task) with "how" it is to be done (Submitter).
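
Conceptually (an illustrative sketch, not pydra's actual classes):

.. code-block:: python

    from dataclasses import dataclass

    @dataclass
    class Task:
        """What is to be done: a fully resolved parameterisation."""
        name: str
        inputs: dict

    @dataclass
    class Submitter:
        """How it is to be done: worker type, cache locations, etc."""
        worker: str = "debug"

    @dataclass
    class Job:
        """A concrete unit of work: one task plus one submitter."""
        task: Task
        submitter: Submitter

    job = Job(Task("add", {"a": 1, "b": 2}), Submitter())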

Lazy-fields
A lazy-field is a field that is not immediately resolved to a value. Instead,
-it is a placeholder that will be resolved at runtime, allowing for dynamic
-parameterisation of tasks.
+it is a placeholder that will be resolved at runtime when a workflow is executed,
+allowing for dynamic parameterisation of tasks.
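
A toy sketch of the placeholder idea (not pydra's implementation):

.. code-block:: python

    class LazyField:
        """Placeholder for a value produced by an upstream node."""

        def __init__(self, node: str, field: str):
            self.node, self.field = node, field

        def resolve(self, results: dict) -> object:
            # Looked up only at runtime, once the upstream node has run.
            return results[self.node][self.field]

    out = LazyField("add", "sum")            # wired when the workflow is built
    print(out.resolve({"add": {"sum": 3}}))  # resolved when it executes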

Node
-A single task within the context of a workflow, which is assigned a name and
-references a state. Note this task can be nested workflow task.
+A single task within the context of a workflow. It is assigned a unique name
+within the workflow and references a state object that, if present, determines
+the state-array of jobs to be run (if the state is None then a single job
+will be run for the node).

Read-only-caches
A read-only cache is a cache root directory that was created by a previous
-pydra runs, which is checked for matching task caches to be reused if present
-but not written not modified during the execution of a task.
+pydra run. The read-only caches are checked for matching job checksums, which
+are reused if present. However, new job cache dirs are written to the cache root,
+so the read-only caches are not modified during the execution.
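
The lookup order, as an illustrative sketch (not pydra's code):

.. code-block:: python

    from pathlib import Path

    def find_or_create(checksum: str, cache_root: Path,
                       readonly_caches: list[Path]) -> Path:
        # First look for a finished run in any read-only cache...
        for ro_cache in readonly_caches:
            candidate = ro_cache / checksum
            if candidate.exists():
                return candidate  # reuse, but never write here
        # ...otherwise create a fresh job dir in the writable cache root.
        job_dir = cache_root / checksum
        job_dir.mkdir(parents=True, exist_ok=True)
        return job_dir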

State
The combination of all upstream splits and combines with any splitters and
-combiners for a given node, it is used to track how many jobs, and their
-parameterisations, need to be run for a given workflow node.
+combiners for a given node. It is used to track how many jobs, and their
+parameterisations, need to be run for a given workflow node.

State-array
A state array is a collection of parameterised tasks or values that were generated
@@ -84,8 +84,9 @@

Worker
Encapsulation of a task execution environment. It is responsible for executing
-tasks and managing their lifecycle. Workers can be local (e.g., a thread or
-process) or remote (e.g., high-performance cluster).
+tasks and managing their lifecycle. Workers can be local (e.g., debug and
+concurrent-futures multiprocess) or orchestrated through a remote scheduler
+(e.g., SLURM, SGE).
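
For intuition, a local multiprocess worker behaves much like a process pool
(plain Python, not pydra's Worker class):

.. code-block:: python

    from concurrent.futures import ProcessPoolExecutor

    def run_job(x: int) -> int:
        return x ** 2  # stand-in for executing one pydra job

    if __name__ == "__main__":
        # The executor plays the worker's role: it accepts jobs and
        # manages their execution lifecycle across processes.
        with ProcessPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(run_job, [1, 2, 3, 4]))
        print(results)  # [1, 4, 9, 16]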

Workflow
A Directed-Acyclic-Graph (DAG) of parameterised tasks, to be executed in order.
162 changes: 0 additions & 162 deletions empty-docs/conf.py

This file was deleted.

5 changes: 0 additions & 5 deletions empty-docs/index.rst

This file was deleted.

1 change: 0 additions & 1 deletion empty-docs/requirements.txt

This file was deleted.

12 changes: 7 additions & 5 deletions pydra/compose/base/task.py
@@ -196,11 +196,13 @@ def __call__(
readonly_caches : list[os.PathLike], optional
Alternate cache locations to check for pre-computed results, by default None
audit_flags : AuditFlag, optional
-Auditing configuration, by default AuditFlag.NONE
-messengers : list, optional
-Messengers, by default None
-messenger_args : dict, optional
-Messenger arguments, by default None
+Configure provenance tracking. Available flags: :class:`~pydra.utils.messenger.AuditFlag`.
+Default is no provenance tracking.
+messengers : :class:`Messenger` or :obj:`list` of :class:`Messenger` or None
+Messenger(s) used by Audit. Saved in the `audit` attribute.
+See the available messengers at :class:`~pydra.utils.messenger.Messenger`.
+messenger_args : dict[str, Any], optional
+Argument(s) used by `messengers`. Saved in the `audit` attribute.
**kwargs : dict
Keyword arguments to pass on to the worker initialisation

12 changes: 7 additions & 5 deletions pydra/engine/submitter.py
@@ -64,11 +64,13 @@ class Submitter:
max_concurrent: int | float, optional
Maximum number of concurrent tasks to run, by default float("inf") (unlimited)
audit_flags : AuditFlag, optional
-Auditing configuration, by default AuditFlag.NONE
-messengers : list, optional
-Messengers, by default None
-messenger_args : dict, optional
-Messenger arguments, by default None
+Configure provenance tracking. Available flags: :class:`~pydra.utils.messenger.AuditFlag`.
+Default is no provenance tracking.
+messengers : :class:`Messenger` or :obj:`list` of :class:`Messenger` or None
+Messenger(s) used by Audit. Saved in the `audit` attribute.
+See the available messengers at :class:`~pydra.utils.messenger.Messenger`.
+messenger_args : dict[str, Any], optional
+Argument(s) used by `messengers`. Saved in the `audit` attribute.
clean_stale_locks : bool, optional
Whether to clean stale lock files, i.e. lock files that were created before the
start of the current run. Don't set if using a global cache where there are
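
A hedged usage sketch of the options documented above (parameter names are
taken from this docstring; `AuditFlag` and `PrintMessenger` are assumed to
live in `pydra.utils.messenger`):

.. code-block:: python

    from pydra.engine.submitter import Submitter
    from pydra.utils.messenger import AuditFlag, PrintMessenger

    # Enable full provenance tracking, print audit messages to stdout,
    # and clean up lock files left over from interrupted runs.
    submitter = Submitter(
        audit_flags=AuditFlag.ALL,
        messengers=PrintMessenger(),
        clean_stale_locks=True,
    )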