Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use assocreduce in mapcompute instead of splatting #73

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

DrChainsaw
Copy link
Collaborator

Fixes #72

There is a fair bit of overhead here, more than 100% more when the tree is really large. One argument why this could be acceptable is that people might not use the lazy mode unless they have heavy computations in which case the overhead should not be significant. I don't see assocreduce popping up in the profile viewer either, so maybe it is just more work for Dagger to move the partial results around.

I suppose one could have a more coarse grained splitting of the thunks array than going all the way down to a single element (e.g. doing the current splatting on up to 100-ish elements at the time). Not sure if it is worth the effort and extra moving parts though.

Benchmarks

4 Threads, no Distributed:

julia> smalltree = maketree("root" => [string(i) for i in 1:3]);

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $smalltree; lazy=true))
BenchmarkTools.Trial: 9588 samples with 1 evaluation.
 Range (min  max):  378.400 μs  27.309 ms  ┊ GC (min  max): 0.00%  64.64%
 Time  (median):     404.700 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   518.627 μs ±  1.306 ms  ┊ GC (mean ± σ):  9.67% ±  3.79%
 Memory estimate: 158.53 KiB, allocs estimate: 2728.

# With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $smalltree; lazy=true))
BenchmarkTools.Trial: 6720 samples with 1 evaluation.
 Range (min  max):  488.200 μs  44.203 ms  ┊ GC (min  max): 0.00%  74.48%
 Time  (median):     532.850 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   738.081 μs ±  1.742 ms  ┊ GC (mean ± σ):  9.43% ±  4.04%
 Memory estimate: 191.28 KiB, allocs estimate: 3320.

julia> mediumtree = maketree("root" => [string(i) for i in 1:1000]);

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $mediumtree; lazy=true))
BenchmarkTools.Trial: 75 samples with 1 evaluation.
 Range (min  max):  49.478 ms  85.537 ms  ┊ GC (min  max):  0.00%  21.06%
 Time  (median):     69.424 ms              ┊ GC (median):    21.73%
 Time  (mean ± σ):   66.818 ms ± 10.082 ms  ┊ GC (mean ± σ):  17.39% ± 10.23%
 Memory estimate: 41.53 MiB, allocs estimate: 572385.

# With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $mediumtree; lazy=true))
BenchmarkTools.Trial: 34 samples with 1 evaluation.
 Range (min  max):  136.017 ms  174.173 ms  ┊ GC (min  max): 10.26%  18.41%
 Time  (median):     143.190 ms               ┊ GC (median):    11.37%
 Time  (mean ± σ):   147.357 ms ±   9.639 ms  ┊ GC (mean ± σ):  13.47% ±  3.75%
 Memory estimate: 72.54 MiB, allocs estimate: 1133724.

julia> largetree = maketree("root" => [string(i) for i in 1:10000]);

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $largetree; lazy=true))
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min  max):  806.612 ms     1.073 s  ┊ GC (min  max): 22.84%  35.78%
 Time  (median):     969.528 ms               ┊ GC (median):    31.32%
 Time  (mean ± σ):   960.597 ms ± 107.355 ms  ┊ GC (mean ± σ):  31.01% ±  6.69%
 Memory estimate: 489.02 MiB, allocs estimate: 6003015.

# With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $largetree; lazy=true))
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min  max):  2.060 s     2.708 s  ┊ GC (min  max): 25.88%  31.11%
 Time  (median):     2.257 s               ┊ GC (median):    32.34%
 Time  (mean ± σ):   2.342 s ± 332.086 ms  ┊ GC (mean ± σ):  29.97% ±  3.43%
 Memory estimate: 928.02 MiB, allocs estimate: 11738336.

Distributed with 4 workers (single thread per worker):

julia> addprocs(4; exeflags=["--project=.", "-t 1"], lazy=false);

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $smalltree; lazy=true))
BenchmarkTools.Trial: 1323 samples with 1 evaluation.
 Range (min  max):  2.479 ms  48.137 ms  ┊ GC (min  max): 0.00%  27.85%
 Time  (median):     3.161 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.767 ms ±  3.109 ms  ┊ GC (mean ± σ):  2.54% ±  4.24%
 Memory estimate: 239.81 KiB, allocs estimate: 4244.

# With PR
julia> @benchmark exec(FileTrees.load($Returns(1), $smalltree; lazy=true))
BenchmarkTools.Trial: 803 samples with 1 evaluation.
 Range (min  max):  4.098 ms  41.777 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     5.297 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.220 ms ±  3.890 ms  ┊ GC (mean ± σ):  2.03% ± 4.23%
 Memory estimate: 294.66 KiB, allocs estimate: 5232.

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $mediumtree; lazy=true))
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min  max):  3.096 s    3.139 s  ┊ GC (min  max): 1.29%  1.51%
 Time  (median):     3.117 s              ┊ GC (median):    1.40%
 Time  (mean ± σ):   3.117 s ± 30.027 ms  ┊ GC (mean ± σ):  1.40% ± 0.16%
 Memory estimate: 182.52 MiB, allocs estimate: 6762341.

# With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $mediumtree; lazy=true))
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min  max):  3.978 s     4.406 s  ┊ GC (min  max): 1.67%  2.08%
 Time  (median):     4.192 s               ┊ GC (median):    1.88%
 Time  (mean ± σ):   4.192 s ± 302.113 ms  ┊ GC (mean ± σ):  1.88% ± 0.29%
 Memory estimate: 236.52 MiB, allocs estimate: 7776384.

# Current main:
julia> @benchmark exec(FileTrees.load($Returns(1), $largetree; lazy=true))
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 406.002 s (1.05% GC) to evaluate,
 with a memory estimate of 14.73 GiB, over 607986258 allocations.

With PR:
julia> @benchmark exec(FileTrees.load($Returns(1), $largetree; lazy=true))
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 401.316 s (1.57% GC) to evaluate,
 with a memory estimate of 15.45 GiB, over 619338178 allocations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Large lazy trees with heterogenous values causes inference overflow in Dagger
1 participant