Commit b155236

Do imaginary time-propagation in-place
1 parent 597a93d commit b155236

3 files changed: +306 -93 lines changed


README.md

Lines changed: 124 additions & 80 deletions
@@ -2,6 +2,8 @@
22

33
See https://discourse.julialang.org/t/scaling-of-threads-for-trivially-parallel-problem/92949/1
44

5+
**Note**: The parallelization scaling issue was resolved by reducing the number of allocations. The original benchmark results are in the [old version of this README](https://github.com/goerz-testing/2023-01_rotating_tai_benchmark/blob/597a93ddc58b7e4cb92e6c8d3900f4776ddf748b/README.md#benchmark-of-julia-threads-for-trivially-parallel-problem).
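
As a rough illustration of why in-place updates reduce allocations, here is a minimal sketch. It is not the repository's propagation code: the function names (`propagate_alloc`, `propagate_inplace`), the `decay` factor, and the array sizes are all hypothetical stand-ins for a time-stepping loop.

```julia
# Toy stand-in for a propagation loop; NOT the code from this repository.
# `decay` is a hypothetical per-grid-point damping factor.

# Allocating variant: every step creates a new array for the state.
function propagate_alloc(psi0, decay, nsteps)
    psi = copy(psi0)
    for _ in 1:nsteps
        psi = decay .* psi          # allocates a fresh array each step
    end
    return psi
end

# In-place variant: the same state array is overwritten at every step.
function propagate_inplace(psi0, decay, nsteps)
    psi = copy(psi0)                # single allocation up front
    for _ in 1:nsteps
        psi .*= decay               # fused broadcast, no new array
    end
    return psi
end

psi0 = rand(ComplexF64, 1024)
decay = exp.(-0.01 .* rand(1024))

# Warm up so that compilation is not counted in the measurement.
propagate_alloc(psi0, decay, 1)
propagate_inplace(psi0, decay, 1)

bytes_alloc = @allocated propagate_alloc(psi0, decay, 1000)
bytes_inplace = @allocated propagate_inplace(psi0, decay, 1000)
println("allocating: $bytes_alloc bytes, in-place: $bytes_inplace bytes")
```

Both variants return the same result; only the number of temporary arrays, and with it the work left for the garbage collector, differs.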
6+
57
```
68
julia --project="." -e 'using Pkg; Pkg.instantiate()'
79
julia --project="." -t 1 benchmark.jl
@@ -96,24 +98,24 @@ Note: compared to the [original issue](https://discourse.julialang.org/t/scaling
9698

9799
```
98100
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark.jl
99-
0.621074 seconds (1.13 M allocations: 1.605 GiB, 10.51% gc time)
100-
3.164572 seconds (4.70 M allocations: 1.745 GiB, 1.30% gc time, 0.05% compilation time)
101-
218.653798 seconds (382.22 M allocations: 414.555 GiB, 3.55% gc time, 0.03% compilation time)
101+
0.208167 seconds (33.80 k allocations: 1.917 MiB)
102+
2.751524 seconds (3.60 M allocations: 144.673 MiB, 0.75% gc time, 0.07% compilation time)
103+
120.464164 seconds (100.59 M allocations: 4.076 GiB, 0.33% gc time, 0.03% compilation time)
102104
103105
:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark.jl
104-
0.634131 seconds (1.13 M allocations: 1.605 GiB, 9.40% gc time)
105-
3.179692 seconds (4.70 M allocations: 1.745 GiB, 1.41% gc time, 0.05% compilation time)
106-
161.966233 seconds (393.42 M allocations: 414.873 GiB, 4.29% gc time, 0.05% compilation time)
106+
0.207674 seconds (33.80 k allocations: 1.917 MiB)
107+
2.714671 seconds (3.60 M allocations: 144.673 MiB, 0.72% gc time, 0.06% compilation time)
108+
94.199065 seconds (100.59 M allocations: 4.076 GiB, 0.26% gc time, 0.05% compilation time)
107109
108110
:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark.jl
109-
0.634875 seconds (1.13 M allocations: 1.605 GiB, 10.65% gc time)
110-
3.158805 seconds (4.70 M allocations: 1.745 GiB, 1.43% gc time, 0.05% compilation time)
111-
150.478485 seconds (419.44 M allocations: 415.518 GiB, 3.84% gc time, 0.06% compilation time)
111+
0.211552 seconds (33.80 k allocations: 1.917 MiB)
112+
2.811510 seconds (3.60 M allocations: 144.673 MiB, 0.70% gc time, 0.05% compilation time)
113+
80.154506 seconds (100.59 M allocations: 4.076 GiB, 0.37% gc time, 0.06% compilation time)
112114
113115
:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark.jl
114-
0.637444 seconds (1.13 M allocations: 1.605 GiB, 10.64% gc time)
115-
3.157171 seconds (4.70 M allocations: 1.745 GiB, 1.62% gc time, 0.05% compilation time)
116-
148.714731 seconds (459.50 M allocations: 416.672 GiB, 3.43% gc time, 0.06% compilation time)
116+
0.210447 seconds (33.80 k allocations: 1.917 MiB)
117+
2.786815 seconds (3.60 M allocations: 144.673 MiB, 0.70% gc time, 0.05% compilation time)
118+
66.755464 seconds (100.59 M allocations: 4.076 GiB, 0.68% gc time, 0.07% compilation time)
117119
```
118120

119121

@@ -130,21 +132,21 @@ This means that every call to `propagate_splitting` does the same exact thing, s
130132

131133
```
132134
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark2.jl
133-
0.629423 seconds (1.13 M allocations: 1.605 GiB, 9.97% gc time)
134-
0.593253 seconds (1.13 M allocations: 1.605 GiB, 4.79% gc time, 0.47% compilation time)
135-
153.666410 seconds (290.36 M allocations: 410.962 GiB, 5.05% gc time, 0.04% compilation time)
135+
0.210143 seconds (33.80 k allocations: 1.917 MiB)
136+
0.221251 seconds (33.88 k allocations: 1.922 MiB, 1.29% compilation time)
137+
54.201312 seconds (8.73 M allocations: 494.694 MiB, 0.10% gc time, 0.07% compilation time)
136138
:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark2.jl
137-
0.633268 seconds (1.13 M allocations: 1.605 GiB, 9.65% gc time)
138-
0.596045 seconds (1.13 M allocations: 1.605 GiB, 4.71% gc time, 0.49% compilation time)
139-
98.435518 seconds (309.35 M allocations: 411.444 GiB, 6.31% gc time, 0.08% compilation time)
139+
0.212420 seconds (33.80 k allocations: 1.917 MiB)
140+
0.225637 seconds (33.88 k allocations: 1.922 MiB, 1.24% compilation time)
141+
27.806800 seconds (8.73 M allocations: 494.716 MiB, 0.24% gc time, 0.17% compilation time)
140142
:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark2.jl
141-
0.644217 seconds (1.13 M allocations: 1.605 GiB, 9.94% gc time)
142-
0.625575 seconds (1.13 M allocations: 1.605 GiB, 6.09% gc time, 0.46% compilation time)
143-
99.298586 seconds (333.82 M allocations: 412.088 GiB, 5.35% gc time, 0.08% compilation time)
143+
0.213204 seconds (33.80 k allocations: 1.917 MiB)
144+
0.215185 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
145+
14.431739 seconds (8.73 M allocations: 494.777 MiB, 0.41% gc time, 0.34% compilation time)
144146
:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark2.jl
145-
0.639156 seconds (1.13 M allocations: 1.605 GiB, 10.09% gc time)
146-
0.618517 seconds (1.13 M allocations: 1.605 GiB, 6.23% gc time, 0.47% compilation time)
147-
121.498563 seconds (374.74 M allocations: 413.286 GiB, 3.96% gc time, 0.07% compilation time)
147+
0.207609 seconds (33.80 k allocations: 1.917 MiB)
148+
0.212293 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
149+
8.424910 seconds (8.73 M allocations: 494.711 MiB, 0.48% compilation time)
148150
```
149151

150152
### Refactoring: t_r outside of inner loop
@@ -180,21 +182,21 @@ end
180182

181183
```
182184
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark3.jl
183-
0.637919 seconds (1.13 M allocations: 1.605 GiB, 10.09% gc time)
184-
0.616912 seconds (1.13 M allocations: 1.605 GiB, 5.80% gc time, 0.46% compilation time)
185-
155.916642 seconds (290.35 M allocations: 410.961 GiB, 5.63% gc time, 0.04% compilation time)
185+
0.206196 seconds (33.80 k allocations: 1.917 MiB)
186+
0.209529 seconds (33.88 k allocations: 1.922 MiB, 1.35% compilation time)
187+
53.185182 seconds (8.72 M allocations: 494.140 MiB, 0.10% gc time, 0.07% compilation time)
186188
:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark3.jl
187-
0.649860 seconds (1.13 M allocations: 1.605 GiB, 9.97% gc time)
188-
0.619335 seconds (1.13 M allocations: 1.605 GiB, 5.39% gc time, 0.46% compilation time)
189-
101.228960 seconds (310.37 M allocations: 411.469 GiB, 6.39% gc time, 0.08% compilation time)
189+
0.208810 seconds (33.80 k allocations: 1.917 MiB)
190+
0.212513 seconds (33.88 k allocations: 1.922 MiB, 1.31% compilation time)
191+
27.373068 seconds (8.72 M allocations: 494.182 MiB, 0.16% gc time, 0.17% compilation time)
190192
:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark3.jl
191-
0.647611 seconds (1.13 M allocations: 1.605 GiB, 10.04% gc time)
192-
0.626761 seconds (1.13 M allocations: 1.605 GiB, 6.26% gc time, 0.46% compilation time)
193-
98.742242 seconds (333.60 M allocations: 412.081 GiB, 5.43% gc time, 0.08% compilation time)
193+
0.211508 seconds (33.80 k allocations: 1.917 MiB)
194+
0.213555 seconds (33.88 k allocations: 1.922 MiB, 1.53% compilation time)
195+
14.384025 seconds (8.72 M allocations: 494.191 MiB, 0.34% gc time, 0.32% compilation time)
194196
:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark3.jl
195-
0.630414 seconds (1.13 M allocations: 1.605 GiB, 7.85% gc time)
196-
0.613773 seconds (1.13 M allocations: 1.605 GiB, 4.99% gc time, 0.46% compilation time)
197-
119.975070 seconds (374.90 M allocations: 413.286 GiB, 3.85% gc time, 0.07% compilation time)
197+
0.212579 seconds (33.80 k allocations: 1.917 MiB)
198+
0.216517 seconds (33.88 k allocations: 1.922 MiB, 1.35% compilation time)
199+
8.450139 seconds (8.72 M allocations: 494.156 MiB, 0.45% compilation time)
198200
```
199201

200202
### Refactoring: "false sharing"
@@ -223,21 +225,21 @@ end
223225

224226
```
225227
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark4.jl
226-
0.634388 seconds (1.13 M allocations: 1.605 GiB, 10.14% gc time)
227-
0.619022 seconds (1.13 M allocations: 1.605 GiB, 6.21% gc time, 0.47% compilation time)
228-
156.127696 seconds (290.42 M allocations: 410.965 GiB, 5.66% gc time, 0.06% compilation time)
228+
0.204621 seconds (33.80 k allocations: 1.917 MiB)
229+
0.210162 seconds (33.88 k allocations: 1.922 MiB, 1.38% compilation time)
230+
53.749415 seconds (8.79 M allocations: 498.078 MiB, 0.14% gc time, 0.12% compilation time)
229231
:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark4.jl
230-
0.639602 seconds (1.13 M allocations: 1.605 GiB, 10.56% gc time)
231-
0.606554 seconds (1.13 M allocations: 1.605 GiB, 5.80% gc time, 0.48% compilation time)
232-
98.933797 seconds (305.89 M allocations: 411.390 GiB, 6.50% gc time, 0.11% compilation time)
232+
0.211242 seconds (33.80 k allocations: 1.917 MiB)
233+
0.215194 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
234+
27.770841 seconds (8.79 M allocations: 498.106 MiB, 0.29% gc time, 0.26% compilation time)
233235
:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark4.jl
234-
0.637588 seconds (1.13 M allocations: 1.605 GiB, 9.29% gc time)
235-
0.612409 seconds (1.13 M allocations: 1.605 GiB, 5.37% gc time, 0.47% compilation time)
236-
98.618285 seconds (333.16 M allocations: 412.074 GiB, 5.18% gc time, 0.11% compilation time)
236+
0.211822 seconds (33.80 k allocations: 1.917 MiB)
237+
0.213592 seconds (33.88 k allocations: 1.922 MiB, 1.35% compilation time)
238+
14.596262 seconds (8.79 M allocations: 498.086 MiB, 0.30% gc time, 0.45% compilation time)
237239
:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark4.jl
238-
0.637485 seconds (1.13 M allocations: 1.605 GiB, 8.71% gc time)
239-
0.622894 seconds (1.13 M allocations: 1.605 GiB, 5.53% gc time, 0.49% compilation time)
240-
121.259312 seconds (373.66 M allocations: 413.257 GiB, 3.95% gc time, 0.09% compilation time)
240+
0.209312 seconds (33.80 k allocations: 1.917 MiB)
241+
0.212435 seconds (33.88 k allocations: 1.922 MiB, 1.36% compilation time)
242+
8.479042 seconds (8.79 M allocations: 498.104 MiB, 0.29% gc time, 1.08% compilation time)
241243
```
242244

243245
### Sleeping
@@ -257,21 +259,21 @@ end
257259

258260
```
259261
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark5.jl
260-
0.601872 seconds (7 allocations: 176 bytes)
261-
0.604645 seconds (90 allocations: 5.031 KiB, 0.46% compilation time)
262-
154.111709 seconds (73.22 k allocations: 3.763 MiB, 0.02% compilation time)
262+
0.601209 seconds (7 allocations: 176 bytes)
263+
0.604624 seconds (90 allocations: 5.031 KiB, 0.44% compilation time)
264+
154.104534 seconds (73.22 k allocations: 3.763 MiB, 0.02% compilation time)
263265
:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark5.jl
264-
0.601741 seconds (7 allocations: 176 bytes)
265-
0.604557 seconds (90 allocations: 5.031 KiB, 0.45% compilation time)
266-
77.088923 seconds (73.25 k allocations: 3.765 MiB, 0.05% compilation time)
266+
0.602025 seconds (7 allocations: 176 bytes)
267+
0.604489 seconds (90 allocations: 5.031 KiB, 0.44% compilation time)
268+
77.081345 seconds (73.25 k allocations: 3.765 MiB, 0.05% compilation time)
267269
:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark5.jl
268-
0.601994 seconds (7 allocations: 176 bytes)
269-
0.604744 seconds (90 allocations: 5.031 KiB, 0.45% compilation time)
270-
38.558808 seconds (73.31 k allocations: 3.768 MiB, 0.09% compilation time)
270+
0.602031 seconds (7 allocations: 176 bytes)
271+
0.604647 seconds (90 allocations: 5.031 KiB, 0.46% compilation time)
272+
38.565316 seconds (73.31 k allocations: 3.768 MiB, 0.09% compilation time)
271273
:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark5.jl
272-
0.601660 seconds (7 allocations: 176 bytes)
273-
0.604487 seconds (90 allocations: 5.031 KiB, 0.45% compilation time)
274-
19.300890 seconds (73.41 k allocations: 3.775 MiB, 0.19% compilation time)
274+
0.601987 seconds (7 allocations: 176 bytes)
275+
0.604643 seconds (90 allocations: 5.031 KiB, 0.46% compilation time)
276+
19.299850 seconds (73.42 k allocations: 3.775 MiB, 0.19% compilation time)
275277
```
276278

277279
### Multi-process parallelization
@@ -304,39 +306,81 @@ Note that some overhead here is expected due to the `vcat` reduction.
304306
:> julia -p1 benchmark_distributed_driver.jl
305307
From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
306308
Activating project at `~/2023-01_rotating_tai_benchmark`
307-
0.659220 seconds (1.13 M allocations: 1.605 GiB, 10.05% gc time)
308-
0.623259 seconds (1.13 M allocations: 1.605 GiB, 6.04% gc time, 0.45% compilation time)
309-
159.409128 seconds (2.71 M allocations: 179.434 MiB, 0.01% gc time, 0.42% compilation time: 1% of which was recompilation)
309+
0.206928 seconds (33.80 k allocations: 1.917 MiB)
310+
0.209682 seconds (33.88 k allocations: 1.922 MiB, 1.31% compilation time)
311+
59.544753 seconds (2.71 M allocations: 179.497 MiB, 0.07% gc time, 1.15% compilation time: 1% of which was recompilation)
310312
311313
:> julia -p2 benchmark_distributed_driver.jl
312314
From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
313315
From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
314316
Activating project at `~/2023-01_rotating_tai_benchmark`
315-
0.665675 seconds (1.13 M allocations: 1.605 GiB, 10.35% gc time)
316-
0.631846 seconds (1.13 M allocations: 1.605 GiB, 6.31% gc time, 0.45% compilation time)
317-
91.579614 seconds (2.71 M allocations: 179.529 MiB, 0.02% gc time, 0.75% compilation time: 1% of which was recompilation)
317+
0.232625 seconds (33.80 k allocations: 1.917 MiB, 9.91% gc time)
318+
0.211565 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
319+
34.113498 seconds (2.71 M allocations: 179.529 MiB, 0.05% gc time, 2.00% compilation time: 1% of which was recompilation)
318320
319321
:> julia -p4 benchmark_distributed_driver.jl
320-
From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
321-
From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
322322
From worker 4: Activating project at `~/2023-01_rotating_tai_benchmark`
323+
From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
324+
From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
323325
From worker 5: Activating project at `~/2023-01_rotating_tai_benchmark`
324326
Activating project at `~/2023-01_rotating_tai_benchmark`
325-
0.667681 seconds (1.13 M allocations: 1.605 GiB, 10.48% gc time)
326-
0.623414 seconds (1.13 M allocations: 1.605 GiB, 6.34% gc time, 0.45% compilation time)
327-
56.961555 seconds (2.71 M allocations: 179.590 MiB, 0.04% gc time, 1.23% compilation time: 1% of which was recompilation)
327+
0.209870 seconds (33.80 k allocations: 1.917 MiB)
328+
0.213252 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
329+
21.967531 seconds (2.71 M allocations: 179.590 MiB, 0.08% gc time, 3.15% compilation time: 1% of which was recompilation)
328330
329331
:> julia -p8 benchmark_distributed_driver.jl
330-
From worker 6: Activating project at `~/2023-01_rotating_tai_benchmark`
331-
From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
332-
From worker 5: Activating project at `~/2023-01_rotating_tai_benchmark`
333-
From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
334-
From worker 7: Activating project at `~/2023-01_rotating_tai_benchmark`
332+
From worker 9: Activating project at `~/2023-01_rotating_tai_benchmark`
335333
From worker 4: Activating project at `~/2023-01_rotating_tai_benchmark`
334+
From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
336335
From worker 8: Activating project at `~/2023-01_rotating_tai_benchmark`
337-
From worker 9: Activating project at `~/2023-01_rotating_tai_benchmark`
336+
From worker 7: Activating project at `~/2023-01_rotating_tai_benchmark`
337+
From worker 5: Activating project at `~/2023-01_rotating_tai_benchmark`
338+
From worker 6: Activating project at `~/2023-01_rotating_tai_benchmark`
339+
From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
338340
Activating project at `~/2023-01_rotating_tai_benchmark`
339-
0.640059 seconds (1.13 M allocations: 1.605 GiB, 9.37% gc time)
340-
0.625552 seconds (1.13 M allocations: 1.605 GiB, 6.69% gc time, 0.44% compilation time)
341-
48.310770 seconds (2.71 M allocations: 179.714 MiB, 0.04% gc time, 1.51% compilation time: 1% of which was recompilation)
341+
0.210575 seconds (33.80 k allocations: 1.917 MiB)
342+
0.212953 seconds (33.88 k allocations: 1.922 MiB, 1.35% compilation time)
343+
18.250584 seconds (2.71 M allocations: 179.729 MiB, 0.20% gc time, 4.08% compilation time: 1% of which was recompilation)
344+
```
345+
346+
### Better runtime distribution
347+
348+
Going back to the [original computation](#original-code), the different calls to `propagate_splitting` have different runtimes. Since the runtime is mostly proportional to the separation time, the original loop order put all the long-running calls in one thread and all the short-running calls in another. We can remedy this by transposing the construction of the fidelity matrix, so that the threaded outer loop runs over the potential depths and every thread sees all separation times:
349+
350+
```
351+
function map_fidelity(potential_depth_values, separation_time_values; kwargs...)
352+
N = length(potential_depth_values)
353+
M = length(separation_time_values)
354+
F = zeros(M, N)
355+
Threads.@threads for i = 1:N
356+
@inbounds V0 = potential_depth_values[i]
357+
@inbounds for j = 1:M
358+
t_r = separation_time_values[j]
359+
F[j, i] = propagate_splitting(t_r, V0; kwargs...)
360+
end
361+
end
362+
return transpose(F)
363+
end
364+
```
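
The effect of the loop order on load balance can be sketched with a toy cost model. The assumptions here are not taken from the benchmark: each call is assumed to cost exactly `t_r` units of work, the threaded loop is assumed to be split into contiguous, roughly equal chunks, and the earlier version is assumed to have threaded over the separation times. The helper `chunks` and all values are hypothetical.

```julia
# Toy cost model (assumption): one call costs `t_r` units of work.
separation_times = collect(1.0:8.0)   # hypothetical values
n_depths = 8                          # hypothetical number of potential depths
nthreads = 4

# Split 1:n into k contiguous chunks of (nearly) equal length.
chunks(n, k) = [((c - 1) * n ÷ k + 1):(c * n ÷ k) for c in 1:k]

# Threaded loop over separation times: each thread gets a block of t_r values
# and runs every potential depth for them.
work_by_time = [n_depths * sum(separation_times[r])
                for r in chunks(length(separation_times), nthreads)]

# Threaded loop over potential depths: each thread runs all t_r values
# for its block of depths.
work_by_depth = [length(r) * sum(separation_times)
                 for r in chunks(n_depths, nthreads)]

println("per-thread work, threading over t_r: ", work_by_time)
println("per-thread work, threading over V0:  ", work_by_depth)
```

In this model the total work is identical for both orderings, but threading over the separation times leaves the slowest thread with several times the work of the fastest, while threading over the potential depths gives every thread the same share.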
365+
366+
```
367+
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark6.jl
368+
0.208736 seconds (33.80 k allocations: 1.917 MiB)
369+
2.796440 seconds (3.60 M allocations: 144.673 MiB, 0.74% gc time, 0.06% compilation time)
370+
120.989621 seconds (100.59 M allocations: 4.077 GiB, 0.34% gc time, 0.04% compilation time)
371+
372+
:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark6.jl
373+
0.213141 seconds (33.80 k allocations: 1.917 MiB)
374+
2.789801 seconds (3.60 M allocations: 144.673 MiB, 0.70% gc time, 0.05% compilation time)
375+
61.719640 seconds (100.59 M allocations: 4.077 GiB, 0.74% gc time, 0.07% compilation time)
376+
377+
:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark6.jl
378+
0.210999 seconds (33.80 k allocations: 1.917 MiB)
379+
2.777318 seconds (3.60 M allocations: 144.673 MiB, 0.76% gc time, 0.05% compilation time)
380+
33.486723 seconds (100.59 M allocations: 4.077 GiB, 1.53% gc time, 0.12% compilation time)
381+
382+
:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark6.jl
383+
0.210624 seconds (33.80 k allocations: 1.917 MiB)
384+
2.793085 seconds (3.60 M allocations: 144.673 MiB, 0.95% gc time, 0.06% compilation time)
385+
19.351503 seconds (100.59 M allocations: 4.077 GiB, 3.05% gc time, 0.22% compilation time)
342386
```
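
To read the scaling off these numbers, one can compute the speedup and parallel efficiency relative to the single-thread run. The sketch below uses the third (longest) timing of each `benchmark6.jl` run listed above; the names `times` and `speedup` are only for illustration.

```julia
# Wall-clock times in seconds: third timing of each benchmark6.jl run above.
times = Dict(1 => 120.989621, 2 => 61.719640, 4 => 33.486723, 8 => 19.351503)

# Speedup relative to the single-thread run.
speedup(n) = times[1] / times[n]

for n in sort(collect(keys(times)))
    s = speedup(n)
    println("threads = $n: speedup = $(round(s, digits=2)), ",
            "efficiency = $(round(s / n, digits=2))")
end
```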
