Conversation


@jeremykubica jeremykubica commented Sep 29, 2025

Allow the core simulation function to accept an executor parameter and distribute its tasks via that executor. This lets the simulation be parallelized by a variety of mechanisms, including ProcessPoolExecutor, Dask, and Ray.

This PR also adds an option (via an argument) to write the results to a file instead of returning the NestedFrame, so that computation can be distributed without requiring all of the results to fit in memory at once.
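As a rough sketch of the pattern (the function and parameter names below are illustrative, not this PR's actual API), the core function can run serially when no executor is given and otherwise submit batches to any object exposing the concurrent.futures Executor interface, which is what makes ProcessPoolExecutor, Dask's distributed client, or a Ray-backed executor interchangeable:

```python
from concurrent.futures import ProcessPoolExecutor


def run_batch(seed, num_samples):
    """Stand-in for simulating one batch of light curves."""
    import random
    rng = random.Random(seed)
    return [rng.random() for _ in range(num_samples)]


def simulate(total_samples, batch_size, executor=None):
    """Run serially, or distribute batches via an Executor-like object."""
    seeds = range(0, total_samples, batch_size)
    if executor is None:
        return [run_batch(seed, batch_size) for seed in seeds]
    # Any object with a compatible submit()/result() interface works here.
    futures = [executor.submit(run_batch, seed, batch_size) for seed in seeds]
    return [future.result() for future in futures]


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = simulate(1_000, 100, executor=pool)
```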

@jeremykubica jeremykubica requested a review from hombit September 29, 2025 13:46


github-actions bot commented Sep 29, 2025

| Before [e457f29] | After [3d44991] | Ratio | Benchmark (Parameter) |
|------------------|-----------------|-------|-----------------------|
| 8.57±0.02ms | 8.97±0.1ms | 1.05 | benchmarks.TimeSuite.time_load_passbands |
| 1.05±0.01s | 1.07±0.01s | 1.02 | benchmarks.TimeSuite.time_make_x1_from_hostmass |
| 552±7μs | 556±6μs | 1.01 | benchmarks.TimeSuite.time_apply_passbands |
| 4.46±0.06ms | 4.52±0.05ms | 1.01 | benchmarks.TimeSuite.time_evaluate_salt3_passbands |
| 679±30μs | 687±20μs | 1.01 | benchmarks.TimeSuite.time_fnu_to_flam |
| 4.79±0.04ms | 4.85±0.1ms | 1.01 | benchmarks.TimeSuite.time_lightcurve_source |
| 111±2μs | 112±1μs | 1.01 | benchmarks.TimeSuite.time_sample_x0_from_distmod |
| 11.2±0.1ms | 11.2±0.05ms | 1 | benchmarks.TimeSuite.time_make_evaluate_constant_sed_model |
| 47.4±0.3μs | 47.2±0.5μs | 1 | benchmarks.TimeSuite.time_make_new_salt3_model |
| 34.4±0.4ms | 34.1±0.1ms | 0.99 | benchmarks.TimeSuite.time_additive_multi_model_source |


@hombit hombit left a comment


I'm not fully convinced that we should have this code in Lynx. First, we make a lot of decisions here about data-flow control: e.g., in this implementation the whole obstable is pickled and transferred, the whole result dataframe is pickled and transferred back, and all of the results (doubled) sit in memory at a few points.

I see several different pipeline-control decisions that could potentially be made:

  1. Pre-generate the parameters and pre-select the obstable values.
  2. Do not transfer the results back; just dump them to a Parquet file.
  3. pool.map is a common interface, but others exist; a .submit()/.result() interface would allow a progress bar! (A sketch of points 2 and 3 follows this list.)
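
As an illustration of points 2 and 3, here is a minimal sketch (not this PR's actual code; the function and file names are hypothetical) of a submit()/as_completed() loop that reports progress as batches finish and writes each batch to its own Parquet file instead of returning it:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
import os

import pandas as pd


def run_batch(batch_id):
    """Stand-in for simulating one batch of light curves; returns a small DataFrame."""
    return pd.DataFrame({"batch": [batch_id], "flux": [1.0]})


def simulate_to_parquet(num_batches, output_dir="results"):
    """Submit batches, report progress as each finishes, and dump each to Parquet."""
    os.makedirs(output_dir, exist_ok=True)
    with ProcessPoolExecutor() as executor:
        futures = {executor.submit(run_batch, i): i for i in range(num_batches)}
        for done, future in enumerate(as_completed(futures), start=1):
            batch_id = futures[future]
            # Write this batch to disk instead of accumulating results in memory.
            future.result().to_parquet(os.path.join(output_dir, f"batch_{batch_id}.parquet"))
            print(f"{done}/{num_batches} batches complete")


if __name__ == "__main__":
    simulate_to_parquet(8)
```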

I would also love to see batch-size-independent randomness, but I do understand that it could be tricky to achieve.
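
One common pattern for batch-size-independent randomness (an assumption about how it could be done, not a committed design for this codebase) is to spawn one child seed per sample from a single base seed, so a sample's random stream does not depend on how samples are grouped into batches:

```python
import numpy as np

base = np.random.SeedSequence(12345)
child_seeds = base.spawn(1_000)  # one child SeedSequence per sample
rngs = [np.random.default_rng(s) for s in child_seeds]
# rngs[i] yields identical draws for sample i regardless of which batch it runs in.
```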

@jeremykubica jeremykubica requested a review from hombit October 2, 2025 17:19
@jeremykubica jeremykubica merged commit a9b4597 into main Oct 2, 2025
7 checks passed
@jeremykubica jeremykubica deleted the pool branch October 2, 2025 19:17