jsmentch/pydra

 
 

Pydra

A simple dataflow engine with scalable semantics.

The goal of pydra is to provide a lightweight Python dataflow engine for DAG construction, manipulation, and distributed execution.

Feature list:

  1. Python 3.7+ using type annotations and attrs
  2. Composable dataflows with simple node semantics. A dataflow can be a node of another dataflow.
  3. A splitter and a combiner provide many ways of expressing complex loop semantics compactly
  4. Cached execution with support for a global cache across dataflows and users
  5. Distributed execution, presently via concurrent.futures, SLURM, and Dask (the Dask support is an experimental implementation with limited testing)
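
To make the splitter and combiner idea concrete, here is a minimal sketch in plain Python. It does not use Pydra's API: the helper names `split` and `run_split` are hypothetical, and the sketch assumes an outer-product split executed through the standard library's `concurrent.futures`. It only illustrates how a loop over input combinations can be expressed as "split, run, combine".

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def split(inputs, fields):
    """Yield one input dict per combination of the split fields (outer product).

    Hypothetical helper for illustration only; not part of Pydra.
    """
    values = [inputs[name] for name in fields]
    for combo in product(*values):
        state = dict(inputs)              # copy the non-split inputs unchanged
        state.update(zip(fields, combo))  # replace each split field with one element
        yield state


def run_split(func, inputs, fields, max_workers=4):
    """Run func once per split state and combine the results into one list."""
    states = list(split(inputs, fields))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the states
        return list(pool.map(lambda state: func(**state), states))


def multiply(x, y):
    return x * y


# Split over both x and y, i.e. the equivalent of two nested for-loops.
results = run_split(multiply, {"x": [1, 2], "y": [10, 100]}, fields=["x", "y"])
print(results)  # [10, 100, 20, 200]
```

In this sketch, fields that are not listed in `fields` are passed through unchanged to every call, so `run_split(multiply, {"x": [3], "y": 2}, fields=["x"])` runs a single call with `x=3, y=2`.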

[API Documentation] [PyCon 2020 Poster]

Tutorial

The Pydra Tutorial can be found in the pydra-tutorial repository.

The tutorial can be run locally (with the necessary requirements installed) or on the Binder service.

Please note that mybinder times out after an hour.

Installation

pip install pydra

Developer installation

Pydra requires Python 3.7+. To install in developer mode:

git clone git@github.com:nipype/pydra.git
cd pydra
pip install -e ".[dev]"

If you want to test execution with Dask:

git clone git@github.com:nipype/pydra.git
cd pydra
pip install -e ".[dask]"

It is also useful to install pre-commit:

pip install pre-commit
pre-commit
