[Coverage Status](https://coveralls.io/github/pyiron/executorlib?branch=main)
[Binder](https://mybinder.org/v2/gh/pyiron/executorlib/HEAD?labpath=notebooks%2Fexamples.ipynb)


Up-scale Python functions for high performance computing (HPC) with executorlib.

## Key Features
* **Up-scale your Python functions beyond a single computer** - executorlib extends the [Executor interface](https://docs.python.org/3/library/concurrent.futures.html#executor-objects)
  from the Python standard library and combines it with job schedulers for high performance computing (HPC), including
  the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) and [flux](http://flux-framework.org).
  With this combination executorlib allows users to distribute their Python functions over multiple compute nodes.
* **Parallelize your Python program one function at a time** - executorlib allows users to assign dedicated computing
  resources like CPU cores, threads or GPUs to one Python function call at a time, so you can accelerate your Python
  code function by function.
* **Permanent caching of intermediate results to accelerate rapid prototyping** - To accelerate the development of
  machine learning pipelines and simulation workflows, executorlib provides optional caching of intermediate results for
  iterative development in interactive environments like Jupyter notebooks (see the sketch after this list).

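As a minimal sketch of the caching feature (the `cache_directory` parameter is an assumption here; consult the
executorlib documentation for the exact option name), previously computed results are reloaded from the cache directory
instead of being recomputed:
```python
from executorlib import Executor


# Minimal caching sketch: the cache_directory parameter is assumed here -
# check the executorlib documentation for the exact option name. Results that
# were already computed are read back from the cache directory on re-execution.
with Executor(backend="local", cache_directory="./cache") as exe:
    fs = exe.submit(sum, [2, 2])
    print(fs.result())
```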
## Examples
The Python standard library provides the [Executor interface](https://docs.python.org/3/library/concurrent.futures.html#executor-objects)
with the [ProcessPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) and the
[ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor) for parallel
execution of Python functions on a single computer. executorlib extends this functionality to distribute Python
functions over multiple computers within a high performance computing (HPC) cluster. This can be achieved either by
submitting each function as an individual job to the HPC job scheduler - [HPC Submission Mode]() - or by requesting a
compute allocation of multiple nodes and then distributing the Python functions within this allocation - [HPC Allocation Mode]().
Finally, to accelerate the development process executorlib also provides a [Local Mode]() to test the executorlib
functionality on a single workstation. Start with the [Local Mode]() by setting the backend parameter to local -
`backend="local"`:
```python
from executorlib import Executor


with Executor(backend="local") as exe:
    future_lst = [exe.submit(sum, [i, i]) for i in range(1, 5)]
    print([f.result() for f in future_lst])
```
In the same way executorlib can also execute Python functions which use additional computing resources, like multiple
CPU cores, CPU threads or GPUs. For example, if the Python function internally uses the Message Passing Interface (MPI)
via the [mpi4py](https://mpi4py.readthedocs.io) Python library:
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


with Executor(backend="local") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```
The additional `resource_dict` parameter defines the computing resources allocated to the execution of the submitted
Python function. In addition to the number of compute cores `cores`, the resource dictionary can also define the threads
per core as `threads_per_core`, the GPUs per core as `gpus_per_core`, the working directory with `cwd`, the option to
use the OpenMPI oversubscribe feature with `openmpi_oversubscribe` and, finally, for the [Simple Linux Utility for
Resource Management (SLURM)](https://slurm.schedmd.com) queuing system the option to provide additional command line
arguments with the `slurm_cmd_args` parameter - [resource dictionary]().
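The sketch below collects these keys in a single dictionary; the values are illustrative placeholders rather than
recommendations, and `slurm_cmd_args` is only relevant for the SLURM backends:
```python
from executorlib import Executor


def calc(i):
    return i


# Illustrative placeholder values, not recommendations - adjust them to your cluster.
resource_dict = {
    "cores": 2,                      # CPU cores (MPI ranks) for the function call
    "threads_per_core": 1,           # CPU threads per core
    "gpus_per_core": 0,              # GPUs per core
    "cwd": ".",                      # working directory of the function call
    "openmpi_oversubscribe": False,  # OpenMPI oversubscribe feature
    # "slurm_cmd_args": [],          # extra command line arguments, SLURM backends only
}

with Executor(backend="local") as exe:
    fs = exe.submit(calc, 3, resource_dict=resource_dict)
    print(fs.result())
```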

This flexibility to assign computing resources on a per-function-call basis simplifies the up-scaling of Python programs.
Only the parts of a Python program which benefit from parallel execution are implemented as MPI-parallel Python
functions, while the rest of the program remains serial.
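For example, serial and MPI-parallel functions can be submitted through the same `Executor`, with only the MPI-parallel
call requesting additional cores (a sketch that only combines the calls already shown above):
```python
from executorlib import Executor


def calc_serial(i):
    # plain Python function, executed on a single core
    return 2 * i


def calc_mpi(i):
    # MPI-parallel Python function, executed on the requested number of cores
    from mpi4py import MPI

    return i, MPI.COMM_WORLD.Get_size(), MPI.COMM_WORLD.Get_rank()


with Executor(backend="local") as exe:
    fs_serial = exe.submit(calc_serial, 3)
    fs_mpi = exe.submit(calc_mpi, 3, resource_dict={"cores": 2})
    print(fs_serial.result(), fs_mpi.result())
```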

The same function can also be submitted to the [SLURM](https://slurm.schedmd.com) queuing system by simply changing the
`backend` parameter to `slurm_submission`. The rest of the example remains the same, which highlights how executorlib
accelerates the rapid prototyping and up-scaling of HPC Python programs.
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


with Executor(backend="slurm_submission") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```
In this case the [Python simple queuing system adapter (pysqa)](https://pysqa.readthedocs.io) is used to submit the
`calc()` function to the [SLURM](https://slurm.schedmd.com) job scheduler and to request an allocation with two CPU
cores for the execution of the function - [HPC Submission Mode](). In the background the [sbatch](https://slurm.schedmd.com/sbatch.html)
command is used to request the allocation to execute the Python function.
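For example, a whole series of function calls can be submitted in this way, with each call ending up as its own SLURM
job (a sketch that assumes a working SLURM installation and a correspondingly configured submission environment):
```python
from executorlib import Executor


def calc(i):
    return i ** 2


# Sketch: every submit() call is translated into a separate SLURM job which is
# submitted in the background via pysqa / sbatch.
with Executor(backend="slurm_submission") as exe:
    future_lst = [exe.submit(calc, i, resource_dict={"cores": 1}) for i in range(4)]
    print([f.result() for f in future_lst])
```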

Within a given [SLURM](https://slurm.schedmd.com) allocation executorlib can also be used to assign a subset of the
available computing resources to execute a given Python function. In terms of the [SLURM](https://slurm.schedmd.com)
commands, this functionality internally uses the [srun](https://slurm.schedmd.com/srun.html) command to receive a subset
of the resources of a given queuing system allocation.
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


with Executor(backend="slurm_allocation") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```
In addition to [SLURM](https://slurm.schedmd.com), executorlib also provides support for the hierarchical
[flux](http://flux-framework.org) job scheduler. The [flux](http://flux-framework.org) job scheduler is developed at
[Lawrence Livermore National Laboratory](https://computing.llnl.gov/projects/flux-building-framework-resource-management)
to address the needs of the upcoming generation of exascale computers. Even on traditional HPC clusters the
hierarchical approach of [flux](http://flux-framework.org) is beneficial for distributing hundreds of tasks within a
given allocation. So even when [SLURM](https://slurm.schedmd.com) is used as the primary job scheduler of your HPC, it is
recommended to use [SLURM with flux]() as the hierarchical job scheduler within the allocations.
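A sketch for flux, assuming executorlib was installed with flux support and the code is executed inside a flux
allocation; the backend name `flux_allocation` is an assumption that mirrors the naming of the SLURM backends above and
should be checked against the installation documentation:
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    return i, MPI.COMM_WORLD.Get_size(), MPI.COMM_WORLD.Get_rank()


# Assumption: backend="flux_allocation" mirrors the slurm_allocation backend -
# verify the exact backend name in the executorlib installation documentation.
with Executor(backend="flux_allocation") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```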

## Documentation
* [Installation](https://executorlib.readthedocs.io/en/latest/installation.html)
  * [Compatible Job Schedulers](https://executorlib.readthedocs.io/en/latest/installation.html#compatible-job-schedulers)
  * [executorlib with Flux Framework](https://executorlib.readthedocs.io/en/latest/installation.html#executorlib-with-flux-framework)