Allow multi-processing of results via a pool parameter #516
Conversation
I'm not fully convinced that we should have this code at Lynx. First, we make a lot of decisions here about data flow control: in this implementation the whole obstable would be pickled and transferred, the whole result dataframe would be pickled and transferred, and all of the results (doubled) would be held in memory at a few points.
I see several other pipeline control decisions that could potentially be made:
- Pre-generate parameters, pre-select obstable values.
- Do not transfer the results back; just dump them to a Parquet file.
- `pool.map` is a common interface, but others exist. A `.submit`/`.result` interface would allow a progress bar!
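As a rough illustration of that last point, the standard `concurrent.futures` API already supports this: submit one future per batch and advance a progress bar as each one completes. A minimal sketch, assuming `tqdm` is installed and using a stand-in `run_batch` worker rather than anything from this PR:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

from tqdm import tqdm


def run_batch(batch):
    """Stand-in for the real per-batch simulation work."""
    return sum(batch)


def run_all(param_batches, max_workers=4):
    """Submit one task per batch and collect results as each future finishes."""
    results = []
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(run_batch, batch) for batch in param_batches]
        # as_completed yields futures out of order, which is what lets the
        # progress bar advance as soon as any worker finishes.
        for future in tqdm(as_completed(futures), total=len(futures)):
            results.append(future.result())
    return results


if __name__ == "__main__":
    print(run_all([[1, 2], [3, 4], [5, 6]]))
```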
I would also love to see batch-size-independent randomness, but I do understand that it could be tricky to achieve.
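For reference, one common way to get batch-size-independent randomness is to derive each object's random stream from its global index rather than from its batch, e.g. with numpy's `SeedSequence`. This is only a sketch of the general technique, not a proposal for how it should be wired into this codebase:

```python
import numpy as np


def rng_for_object(base_seed, object_index):
    """Return a Generator whose stream depends only on the object's global index."""
    seed_seq = np.random.SeedSequence(entropy=base_seed, spawn_key=(object_index,))
    return np.random.default_rng(seed_seq)


# The same object index yields the same draws no matter how the batches are cut.
rng_a = rng_for_object(base_seed=12345, object_index=7)
rng_b = rng_for_object(base_seed=12345, object_index=7)
assert rng_a.normal() == rng_b.normal()
```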
Allow the core simulation function to take an executor parameter and distribute the tasks via that executor. This allows the simulation to be parallelized by a variety of mechanisms including ProcessPoolExecutor, Dask, and Ray.
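Roughly, the pattern is the one sketched below; the names (`simulate`, `simulate_batch`) and the serial fallback are illustrative stand-ins, not the exact API added by this PR:

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from functools import partial
from typing import Optional


def simulate_batch(model, batch):
    """Stand-in for the real per-batch simulation."""
    return [model * x for x in batch]


def simulate(model, batches, executor: Optional[Executor] = None):
    """Run all batches, optionally spreading them across the given executor."""
    if executor is None:
        return [simulate_batch(model, batch) for batch in batches]
    # Any executor exposing a concurrent.futures-style .map() works here:
    # ProcessPoolExecutor, or Dask/Ray shims that implement the same interface.
    return list(executor.map(partial(simulate_batch, model), batches))


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(simulate(2, [[1, 2], [3, 4]], executor=pool))
```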
This PR also provides the option (via an argument) to output the results to a file instead of returning the NestedFrame, so that we can distribute the computation without worrying about all of the results fitting in memory together.
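A minimal sketch of the per-batch file-output idea; the `write_batch` helper, the directory layout, and the Parquet backend (pandas' `to_parquet` needs pyarrow or fastparquet) are assumptions for illustration, not the PR's actual interface:

```python
from pathlib import Path

import pandas as pd


def write_batch(result_frame: pd.DataFrame, out_dir: str, batch_index: int) -> None:
    """Write one batch's results to disk so nothing large is returned to the caller."""
    out_path = Path(out_dir) / f"batch_{batch_index:05d}.parquet"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    result_frame.to_parquet(out_path)


if __name__ == "__main__":
    write_batch(pd.DataFrame({"flux": [1.0, 2.0]}), "results", batch_index=0)
```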