
Conversation

@jan-janssen commented Feb 16, 2024

@ligerzero-ai Based on your performance analysis I designed a small benchmark, and with it I found the bug that this pull request fixes. The updated benchmark results are posted below: pympipool is now competitive with the ProcessPoolExecutor. Can you test this branch with your own benchmark?

Code

# Benchmark: 
# >>> python benchmark.py static
# Result: 79.38484382629395
# >>> python benchmark.py process
# Result: 33.66195225715637
# >>> python benchmark.py thread
# Result: 52.65341782569885
# >>> mpiexec -n 4 python -m mpi4py.futures benchmark.py mpi4py
# Result: 39.54463863372803
# >>> python benchmark.py pympipool
# Result: 34.79054880142212
# >>> python benchmark.py flux
# Result: 36.002100467681885

import sys
from time import time


def llh_numpy(mean, sigma):
    # Evaluate the Gaussian log-likelihood of 10**8 standard-normal samples;
    # numpy is imported inside the function so each worker loads it itself.
    import numpy
    data = numpy.random.normal(size=100000000).astype('float64')
    s = (data - mean) ** 2 / (2 * (sigma ** 2))
    pdfs = numpy.exp(-s)
    pdfs /= numpy.sqrt(2 * numpy.pi) * sigma
    return numpy.log(pdfs).sum()


def run_with_executor(executor=None, mean=0.1, sigma=1.1, runs=32, **kwargs):
    # Submit `runs` independent evaluations to the given executor class and
    # block until all futures have completed.
    with executor(**kwargs) as exe:
        future_lst = [exe.submit(llh_numpy, mean=mean, sigma=sigma) for _ in range(runs)]
        return [f.result() for f in future_lst]


def run_static(mean=0.1, sigma=1.1, runs=32):
    # Serial reference: run the same workload sequentially in a plain loop.
    return [llh_numpy(mean=mean, sigma=sigma) for _ in range(runs)]


if __name__ == "__main__":
    run_mode = sys.argv[1]
    start_time = time()
    if run_mode == "static":
        run_static(mean=0.1, sigma=1.1, runs=32)
    elif run_mode == "process":
        from concurrent.futures import ProcessPoolExecutor
        run_with_executor(executor=ProcessPoolExecutor, mean=0.1, sigma=1.1, runs=32, max_workers=4)
    elif run_mode == "thread":
        from concurrent.futures import ThreadPoolExecutor
        run_with_executor(executor=ThreadPoolExecutor, mean=0.1, sigma=1.1, runs=32, max_workers=4)
    elif run_mode == "pympipool":
        from pympipool.mpi.executor import PyMPIExecutor
        run_with_executor(executor=PyMPIExecutor, mean=0.1, sigma=1.1, runs=32, max_workers=4)
    elif run_mode == "flux":
        from pympipool.flux.executor import PyFluxExecutor
        run_with_executor(executor=PyFluxExecutor, mean=0.1, sigma=1.1, runs=32, max_workers=4)
    elif run_mode == "mpi4py":
        from mpi4py.futures import MPIPoolExecutor
        run_with_executor(executor=MPIPoolExecutor, mean=0.1, sigma=1.1, runs=32, max_workers=4)
    else:
        raise ValueError(run_mode)
    stop_time = time()
    print("Result:", stop_time-start_time)

Results

Command                                                      Runtime on cmpc06 [s]
python benchmark.py static                                   79.38484382629395
python benchmark.py process                                  33.66195225715637
python benchmark.py thread                                   52.65341782569885
mpiexec -n 4 python -m mpi4py.futures benchmark.py mpi4py    39.54463863372803
python benchmark.py pympipool                                34.79054880142212
flux start; python benchmark.py flux                         36.002100467681885

With four workers this corresponds to speedups over the serial run of roughly 2.36x (process), 2.28x (pympipool), 2.21x (flux), 2.01x (mpi4py), and 1.51x (thread).

jan-janssen marked this pull request as draft February 16, 2024 21:26
@jan-janssen

As an internal reference, the benchmark from @ligerzero-ai: https://github.com/orgs/pyiron/discussions/211#discussioncomment-8034046

jan-janssen marked this pull request as ready for review February 17, 2024 10:11
@jan-janssen

@XzzX just for your awareness: I am currently benchmarking the Executor we use inside pyiron for parallelisation. Since you have considerable experience optimising the performance of simulation codes, any suggestions are highly appreciated.

@jan-janssen

The benchmark is now included in the continuous integration setup.
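
The exact CI wiring is not shown in this thread; as a rough sketch, a scaled-down pytest-style check along these lines could exercise the same submission pattern quickly (all names here are hypothetical, and ProcessPoolExecutor stands in for the executors benchmarked above):

# test_benchmark.py - hypothetical sketch, not the actual CI code
from concurrent.futures import ProcessPoolExecutor
from time import time

import numpy


def llh_numpy(mean, sigma, size=1000000):
    # Same kernel as the benchmark, but with far fewer samples so it fits CI.
    data = numpy.random.normal(size=size).astype("float64")
    s = (data - mean) ** 2 / (2 * (sigma ** 2))
    pdfs = numpy.exp(-s) / (numpy.sqrt(2 * numpy.pi) * sigma)
    return numpy.log(pdfs).sum()


def test_executor_smoke():
    runs, workers = 8, 4
    start = time()
    with ProcessPoolExecutor(max_workers=workers) as exe:
        futures = [exe.submit(llh_numpy, mean=0.1, sigma=1.1) for _ in range(runs)]
        results = [f.result() for f in futures]
    runtime = time() - start
    # Each result sums logs of pdf values below one, so it is negative.
    assert all(numpy.isfinite(r) and r < 0 for r in results)
    # Loose sanity bound so a hung worker fails the test quickly.
    assert runtime < 120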

@jan-janssen

OpenMPI vs. MPICH

Mode         OpenMPI [s]            MPICH [s]
static       119.52881598472595     119.12273931503296
process      50.263325691223145     50.84898900985718
thread       70.6196928024292       70.48506259918213
mpi4py       55.50081467628479      56.26586318016052
pympipool    53.83580446243286      54.29321050643921

jan-janssen merged commit d9edd2d into main Feb 18, 2024
jan-janssen deleted the mpi_only_for_parallel branch February 18, 2024 12:47
@ligerzero-ai

Sure, I'll get it done this week (hopefully).
