
use parallel reductions to compute sum, maximum, etc #126

Closed
musoke opened this issue Feb 28, 2023 · 7 comments · Fixed by #128

Comments

musoke (Owner) commented Feb 28, 2023

No description provided.

musoke (Owner) commented Feb 28, 2023

https://github.com/JuliaFolds/FLoops.jl (might be better for replacing @threads on loops and in some cases reducing memory use)
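As a sketch of what FLoops.jl offers (using its documented `@floop`/`@reduce` macros; the function and data here are hypothetical, not from this project):

```julia
using FLoops

# With Base's @threads, a reduction like this needs a per-thread
# accumulator array that is combined afterwards.  @floop/@reduce
# expresses the reduction directly and avoids that allocation.
function sumsq(xs)
    @floop for x in xs
        @reduce(s += x^2)
    end
    return s
end
```

By default `@floop` picks a threaded executor when threads are available, which is what makes it a candidate replacement for `@threads` loops here.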

musoke (Owner) commented Feb 28, 2023

See also https://juliapackages.com/p/kissthreading for reductions, though it is unmaintained.

musoke changed the title from "use Folds.jl to parallelise sum, maximum, etc" to "use parallel reductions to compute sum, maximum, etc" on Feb 28, 2023
musoke added a commit that referenced this issue Mar 15, 2023
`maximum` and `sum` as defined in Base aren't parallel.  This means that
when more than one thread is available, some threads sit idle every time
these are computed.  This isn't a huge part of the simulation time, but it
does happen every time step, especially if certain summary statistics are
extracted.

Folds.jl has nearly drop-in replacements for these and other
reductions.

Benchmarks suggest that with 8 threads, moving to Folds.jl gives a ~30%
speedup of each call to `maximum` and an ~80% speedup for `sum`.

Fixes: #126
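The replacement described above can be sketched as follows (a minimal illustration using the public `Folds.sum`/`Folds.maximum` API on made-up data, not the project's actual call sites):

```julia
using Folds

a = randn(1_000_000)

# Near drop-in replacements for the Base reductions used each time step:
total = Folds.sum(a)        # parallel equivalent of sum(a)
peak  = Folds.maximum(a)    # parallel equivalent of maximum(a)

# maximum should agree exactly with Base; sum only up to
# floating-point reassociation, since the parallel reduction
# combines partial sums in a different order.
@assert peak == maximum(a)
@assert isapprox(total, sum(a))
```

Run Julia with multiple threads (e.g. `julia --threads=8`) for these to actually parallelise; on a single thread they fall back to serial execution.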
musoke added a commit that referenced this issue Mar 15, 2023
`maximum` and `sum` as defined in Base aren't parallel.  This means that
when more than one thread is available, some threads sit idle every time
these are computed.  This isn't a huge part of the simulation time, but it
does happen every time step, especially if certain summary statistics are
extracted.

Folds.jl has nearly drop-in replacements for these and other
reductions.

Benchmarks suggest that with 8 threads, moving to Folds.jl gives a
~25-35% speedup of each call to `maximum` and a ~60-80% speedup for `sum`:

    julia> include("benchmarks/folds.jl")
    res_min = minimum(results) = 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "Base" => 2-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "sum" => TrialEstimate(322.835 μs)
        "maximum" => TrialEstimate(825.208 μs)
      "Folds.jl" => 2-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "sum" => TrialEstimate(44.668 μs)
        "maximum" => TrialEstimate(573.368 μs)
    res_med = median(results) = 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "Base" => 2-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "sum" => TrialEstimate(332.623 μs)
        "maximum" => TrialEstimate(865.565 μs)
      "Folds.jl" => 2-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "sum" => TrialEstimate(62.110 μs)
        "maximum" => TrialEstimate(665.724 μs)
    2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "sum" => TrialJudgement(-81.33% => improvement)
      "maximum" => TrialJudgement(-23.09% => improvement)

Fixes: #126
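The script `benchmarks/folds.jl` is not shown in this thread, but a benchmark producing output of this shape could be sketched with standard BenchmarkTools.jl groups (the array size here is a guess, purely illustrative):

```julia
using BenchmarkTools
using Folds

# Hypothetical test data; the real script's array is not shown here.
a = randn(512, 512, 8)

suite = BenchmarkGroup()
suite["Base"] = BenchmarkGroup()
suite["Folds.jl"] = BenchmarkGroup()

# Interpolate `a` with $ so setup cost isn't timed.
suite["Base"]["sum"]         = @benchmarkable sum($a)
suite["Base"]["maximum"]     = @benchmarkable maximum($a)
suite["Folds.jl"]["sum"]     = @benchmarkable Folds.sum($a)
suite["Folds.jl"]["maximum"] = @benchmarkable Folds.maximum($a)

results = run(suite)
@show res_min = minimum(results)
@show res_med = median(results)

# Compare the two implementations; negative % means Folds.jl is faster.
judge(minimum(results["Folds.jl"]), minimum(results["Base"]))
```

`judge` on the minimum estimates is what produces the `TrialJudgement(... => improvement)` lines in the quoted output.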
musoke added a commit that referenced this issue Mar 15, 2023