pytask is a workflow management system that facilitates reproducible data analyses. Its features include:
- Automatic discovery of tasks.
- Lazy evaluation. If a task, its dependencies, and its products have not changed, do not execute it.
- Debug mode. Jump into the debugger if a task fails, get feedback quickly, and be more productive.
- Repeat a task with different inputs. Loop over task functions to run the same task with different inputs.
- Select tasks via expressions. Run only a subset of tasks with expressions and marker expressions.
- Easily extensible with plugins. pytask is built on pluggy, a plugin management framework that allows you to adjust pytask to your needs. Plugins are available for parallelization, LaTeX, R, Stata, and more. The documentation includes a tutorial on plugins.
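The lazy evaluation feature listed above can be pictured with a small standard-library sketch. It is not pytask's implementation: the helper names `file_hash`, `needs_run`, and `record_state`, and the JSON state file, are hypothetical and chosen only for illustration. The idea is to record a fingerprint of every file a task touches and to skip the task while all fingerprints still match.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def file_hash(path: Path) -> Optional[str]:
    """Return the SHA-256 digest of a file, or None if the file is missing."""
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_run(files: list, state_file: Path) -> bool:
    """True if a tracked file is missing or differs from the recorded state."""
    recorded = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {str(p): file_hash(p) for p in files}
    return any(h is None for h in current.values()) or current != recorded


def record_state(files: list, state_file: Path) -> None:
    """Store the current fingerprints after a successful run."""
    state_file.write_text(json.dumps({str(p): file_hash(p) for p in files}))


# Hypothetical task: one source file and one product in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "task_hello.py"
    product = Path(tmp) / "hello_earth.txt"
    state = Path(tmp) / "state.json"
    source.write_text("# task source\n")

    first = needs_run([source, product], state)  # product does not exist yet
    product.write_text("Hello, earth!")  # pretend the task ran
    record_state([source, product], state)
    second = needs_run([source, product], state)  # nothing has changed
    source.write_text("# task source, edited\n")
    third = needs_run([source, product], state)  # the source was modified

print(first, second, third)
```

In this sketch the task would run the first and third time and be skipped the second time.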
pytask is available on PyPI and on Anaconda.org. Install the package with
$ pip install pytask
or
$ conda install -c conda-forge pytask
Color support is automatically available on non-Windows platforms. On Windows, please use Windows Terminal, which can be installed, for example, via the Microsoft Store.
To quickly set up a new project, use the cookiecutter-pytask-project template or start from other templates or example projects.
A task is a function that is detected if the module and the function name are prefixed with task_. Here is an example.
# Content of task_hello.py.
from pathlib import Path
from pytask import Product
from typing import Annotated


def task_hello_earth(path: Annotated[Path, Product] = Path("hello_earth.txt")):
    path.write_text("Hello, earth!")
- The purpose of the task is to create the file hello_earth.txt and add some content.
- To tell pytask that hello_earth.txt is a product and not an input, use the Product annotation. (If you are not used to type annotations, do not worry. pytask also offers simpler interfaces without type annotations.)
- Since you pass a pathlib.Path to the function, pytask will check whether the file exists after the function is executed.
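The detection rule described above (both the module name and the function name start with task_) can be illustrated with a short standard-library sketch. This is only an approximation of the idea, not pytask's collection code; the function `discover_tasks` and the file names in the temporary project are hypothetical.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def discover_tasks(root: Path, prefix: str = "task_") -> list:
    """Return (module name, function name) pairs that follow the prefix rule."""
    found = []
    # Only modules whose file name starts with the prefix are considered.
    for file in sorted(root.rglob(f"{prefix}*.py")):
        spec = importlib.util.spec_from_file_location(file.stem, file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Within such a module, only functions with the prefix count as tasks.
        for name, _ in inspect.getmembers(module, inspect.isfunction):
            if name.startswith(prefix):
                found.append((file.stem, name))
    return found


# Hypothetical project layout, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "task_hello.py").write_text(
        "def task_hello_earth():\n    pass\n\n\ndef helper():\n    pass\n"
    )
    (root / "utils.py").write_text("def task_ignored():\n    pass\n")
    discovered = discover_tasks(root)

print(discovered)
```

Here `helper` is skipped because the function name lacks the prefix, and `task_ignored` is skipped because its module, utils.py, lacks the prefix.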
To execute the task, enter pytask on the command line.
You can find the documentation at https://pytask-dev.readthedocs.io/en/stable with tutorials and guides for best practices.
Consult the release notes to find out about what is new.
pytask is distributed under the terms of the MIT license.
The license also includes a copyright and permission notice from pytest since some modules, classes, and functions are copied from pytest. Beyond that, pytest has inspired the development of pytask in general. Without the excellent work of Holger Krekel and pytest's many contributors, this project would not have been possible. Thank you!
pytask owes its beautiful appearance on the command line to rich, written by Will McGugan.
Repeating tasks in loops is inspired by ward, written by Darren Burns.
If you rely on pytask to manage your research project, please cite it with the following entry to help others discover the tool.
@Unpublished{Raabe2020,
    Title  = {A Python tool for managing scientific workflows.},
    Author = {Tobias Raabe},
    Year   = {2020},
    Url    = {https://github.com/pytask-dev/pytask}
}