See https://discourse.julialang.org/t/scaling-of-threads-for-trivially-parallel-problem/92949/1

**Note**: The parallelization scaling issue was resolved by reducing the number of allocations. The original benchmark results are in the [old version of this README](https://github.com/goerz-testing/2023-01_rotating_tai_benchmark/blob/597a93ddc58b7e4cb92e6c8d3900f4776ddf748b/README.md#benchmark-of-julia-threads-for-trivially-parallel-problem).
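The code changes that reduced the allocations are not part of this diff. As a generic illustration only, the sketch below assumes a hot loop that repeatedly applies a matrix to a state vector; the function names `apply_repeatedly` and `apply_repeatedly!` are hypothetical and do not come from the benchmark. It contrasts a version that allocates a new vector in every step with one that reuses two preallocated buffers via `mul!`.

```julia
using LinearAlgebra

# Allocating version: every step creates a new vector for the product A * x.
function apply_repeatedly(A, x0, nsteps)
    x = copy(x0)
    for _ in 1:nsteps
        x = A * x
    end
    return x
end

# In-place version: two preallocated buffers are reused and swapped each step.
# The result is whichever buffer holds the last product, so use the return value.
function apply_repeatedly!(x, tmp, A, nsteps)
    for _ in 1:nsteps
        mul!(tmp, A, x)    # writes A * x into tmp without allocating a new vector
        x, tmp = tmp, x    # swap the roles of the two buffers
    end
    return x
end

# Hypothetical toy data, chosen only to make the sketch runnable
A = [0.5 0.1; 0.2 0.3]
x0 = [1.0, 2.0]
x_alloc = apply_repeatedly(A, x0, 10)
x_inplace = apply_repeatedly!(copy(x0), similar(x0), A, 10)
```

Both versions compute the same vector. Fewer allocations mean less work for the garbage collector, which is the effect the `gc time` percentages in the timings below make visible.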

```
julia --project="." -e 'using Pkg; Pkg.instantiate()'
julia --project="." -t 1 benchmark.jl
@@ -96,24 +98,24 @@ Note: compared to the [original issue](https://discourse.julialang.org/t/scaling
```
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark.jl
-0.621074 seconds (1.13 M allocations: 1.605 GiB, 10.51% gc time)
-3.164572 seconds (4.70 M allocations: 1.745 GiB, 1.30% gc time, 0.05% compilation time)
-218.653798 seconds (382.22 M allocations: 414.555 GiB, 3.55% gc time, 0.03% compilation time)
+0.208167 seconds (33.80 k allocations: 1.917 MiB)
+2.751524 seconds (3.60 M allocations: 144.673 MiB, 0.75% gc time, 0.07% compilation time)
+120.464164 seconds (100.59 M allocations: 4.076 GiB, 0.33% gc time, 0.03% compilation time)

:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark.jl
-0.634131 seconds (1.13 M allocations: 1.605 GiB, 9.40% gc time)
-3.179692 seconds (4.70 M allocations: 1.745 GiB, 1.41% gc time, 0.05% compilation time)
-161.966233 seconds (393.42 M allocations: 414.873 GiB, 4.29% gc time, 0.05% compilation time)
+0.207674 seconds (33.80 k allocations: 1.917 MiB)
+2.714671 seconds (3.60 M allocations: 144.673 MiB, 0.72% gc time, 0.06% compilation time)
+94.199065 seconds (100.59 M allocations: 4.076 GiB, 0.26% gc time, 0.05% compilation time)

:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark.jl
-0.634875 seconds (1.13 M allocations: 1.605 GiB, 10.65% gc time)
-3.158805 seconds (4.70 M allocations: 1.745 GiB, 1.43% gc time, 0.05% compilation time)
-150.478485 seconds (419.44 M allocations: 415.518 GiB, 3.84% gc time, 0.06% compilation time)
+0.211552 seconds (33.80 k allocations: 1.917 MiB)
+2.811510 seconds (3.60 M allocations: 144.673 MiB, 0.70% gc time, 0.05% compilation time)
+80.154506 seconds (100.59 M allocations: 4.076 GiB, 0.37% gc time, 0.06% compilation time)

:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark.jl
-0.637444 seconds (1.13 M allocations: 1.605 GiB, 10.64% gc time)
-3.157171 seconds (4.70 M allocations: 1.745 GiB, 1.62% gc time, 0.05% compilation time)
-148.714731 seconds (459.50 M allocations: 416.672 GiB, 3.43% gc time, 0.06% compilation time)
+0.210447 seconds (33.80 k allocations: 1.917 MiB)
+2.786815 seconds (3.60 M allocations: 144.673 MiB, 0.70% gc time, 0.05% compilation time)
+66.755464 seconds (100.59 M allocations: 4.076 GiB, 0.68% gc time, 0.07% compilation time)
```
@@ -130,21 +132,21 @@ This means that every call to `propagate_splitting` does the same exact thing, s
```
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark2.jl
-0.629423 seconds (1.13 M allocations: 1.605 GiB, 9.97% gc time)
-0.593253 seconds (1.13 M allocations: 1.605 GiB, 4.79% gc time, 0.47% compilation time)
-153.666410 seconds (290.36 M allocations: 410.962 GiB, 5.05% gc time, 0.04% compilation time)
+0.210143 seconds (33.80 k allocations: 1.917 MiB)
+0.221251 seconds (33.88 k allocations: 1.922 MiB, 1.29% compilation time)
+54.201312 seconds (8.73 M allocations: 494.694 MiB, 0.10% gc time, 0.07% compilation time)
:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark2.jl
-0.633268 seconds (1.13 M allocations: 1.605 GiB, 9.65% gc time)
-0.596045 seconds (1.13 M allocations: 1.605 GiB, 4.71% gc time, 0.49% compilation time)
-98.435518 seconds (309.35 M allocations: 411.444 GiB, 6.31% gc time, 0.08% compilation time)
+0.212420 seconds (33.80 k allocations: 1.917 MiB)
+0.225637 seconds (33.88 k allocations: 1.922 MiB, 1.24% compilation time)
+27.806800 seconds (8.73 M allocations: 494.716 MiB, 0.24% gc time, 0.17% compilation time)
:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark2.jl
-0.644217 seconds (1.13 M allocations: 1.605 GiB, 9.94% gc time)
-0.625575 seconds (1.13 M allocations: 1.605 GiB, 6.09% gc time, 0.46% compilation time)
-99.298586 seconds (333.82 M allocations: 412.088 GiB, 5.35% gc time, 0.08% compilation time)
+0.213204 seconds (33.80 k allocations: 1.917 MiB)
+0.215185 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
+14.431739 seconds (8.73 M allocations: 494.777 MiB, 0.41% gc time, 0.34% compilation time)
:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark2.jl
-0.639156 seconds (1.13 M allocations: 1.605 GiB, 10.09% gc time)
-0.618517 seconds (1.13 M allocations: 1.605 GiB, 6.23% gc time, 0.47% compilation time)
-121.498563 seconds (374.74 M allocations: 413.286 GiB, 3.96% gc time, 0.07% compilation time)
+0.207609 seconds (33.80 k allocations: 1.917 MiB)
+0.212293 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
+8.424910 seconds (8.73 M allocations: 494.711 MiB, 0.48% compilation time)
```

### Refactoring: t_r outside of inner loop
@@ -180,21 +182,21 @@ end
```
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark3.jl
-0.637919 seconds (1.13 M allocations: 1.605 GiB, 10.09% gc time)
-0.616912 seconds (1.13 M allocations: 1.605 GiB, 5.80% gc time, 0.46% compilation time)
-155.916642 seconds (290.35 M allocations: 410.961 GiB, 5.63% gc time, 0.04% compilation time)
+0.206196 seconds (33.80 k allocations: 1.917 MiB)
+0.209529 seconds (33.88 k allocations: 1.922 MiB, 1.35% compilation time)
+53.185182 seconds (8.72 M allocations: 494.140 MiB, 0.10% gc time, 0.07% compilation time)
:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark3.jl
-0.649860 seconds (1.13 M allocations: 1.605 GiB, 9.97% gc time)
-0.619335 seconds (1.13 M allocations: 1.605 GiB, 5.39% gc time, 0.46% compilation time)
-101.228960 seconds (310.37 M allocations: 411.469 GiB, 6.39% gc time, 0.08% compilation time)
+0.208810 seconds (33.80 k allocations: 1.917 MiB)
+0.212513 seconds (33.88 k allocations: 1.922 MiB, 1.31% compilation time)
+27.373068 seconds (8.72 M allocations: 494.182 MiB, 0.16% gc time, 0.17% compilation time)
:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark3.jl
-0.647611 seconds (1.13 M allocations: 1.605 GiB, 10.04% gc time)
-0.626761 seconds (1.13 M allocations: 1.605 GiB, 6.26% gc time, 0.46% compilation time)
-98.742242 seconds (333.60 M allocations: 412.081 GiB, 5.43% gc time, 0.08% compilation time)
+0.211508 seconds (33.80 k allocations: 1.917 MiB)
+0.213555 seconds (33.88 k allocations: 1.922 MiB, 1.53% compilation time)
+14.384025 seconds (8.72 M allocations: 494.191 MiB, 0.34% gc time, 0.32% compilation time)
:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark3.jl
-0.630414 seconds (1.13 M allocations: 1.605 GiB, 7.85% gc time)
-0.613773 seconds (1.13 M allocations: 1.605 GiB, 4.99% gc time, 0.46% compilation time)
-119.975070 seconds (374.90 M allocations: 413.286 GiB, 3.85% gc time, 0.07% compilation time)
+0.212579 seconds (33.80 k allocations: 1.917 MiB)
+0.216517 seconds (33.88 k allocations: 1.922 MiB, 1.35% compilation time)
+8.450139 seconds (8.72 M allocations: 494.156 MiB, 0.45% compilation time)
```

### Refactoring: "false sharing"
@@ -223,21 +225,21 @@ end
```
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark4.jl
-0.634388 seconds (1.13 M allocations: 1.605 GiB, 10.14% gc time)
-0.619022 seconds (1.13 M allocations: 1.605 GiB, 6.21% gc time, 0.47% compilation time)
-156.127696 seconds (290.42 M allocations: 410.965 GiB, 5.66% gc time, 0.06% compilation time)
+0.204621 seconds (33.80 k allocations: 1.917 MiB)
+0.210162 seconds (33.88 k allocations: 1.922 MiB, 1.38% compilation time)
+53.749415 seconds (8.79 M allocations: 498.078 MiB, 0.14% gc time, 0.12% compilation time)
:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark4.jl
-0.639602 seconds (1.13 M allocations: 1.605 GiB, 10.56% gc time)
-0.606554 seconds (1.13 M allocations: 1.605 GiB, 5.80% gc time, 0.48% compilation time)
-98.933797 seconds (305.89 M allocations: 411.390 GiB, 6.50% gc time, 0.11% compilation time)
+0.211242 seconds (33.80 k allocations: 1.917 MiB)
+0.215194 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
+27.770841 seconds (8.79 M allocations: 498.106 MiB, 0.29% gc time, 0.26% compilation time)
:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark4.jl
-0.637588 seconds (1.13 M allocations: 1.605 GiB, 9.29% gc time)
-0.612409 seconds (1.13 M allocations: 1.605 GiB, 5.37% gc time, 0.47% compilation time)
-98.618285 seconds (333.16 M allocations: 412.074 GiB, 5.18% gc time, 0.11% compilation time)
+0.211822 seconds (33.80 k allocations: 1.917 MiB)
+0.213592 seconds (33.88 k allocations: 1.922 MiB, 1.35% compilation time)
+14.596262 seconds (8.79 M allocations: 498.086 MiB, 0.30% gc time, 0.45% compilation time)
:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark4.jl
-0.637485 seconds (1.13 M allocations: 1.605 GiB, 8.71% gc time)
-0.622894 seconds (1.13 M allocations: 1.605 GiB, 5.53% gc time, 0.49% compilation time)
-121.259312 seconds (373.66 M allocations: 413.257 GiB, 3.95% gc time, 0.09% compilation time)
+0.209312 seconds (33.80 k allocations: 1.917 MiB)
+0.212435 seconds (33.88 k allocations: 1.922 MiB, 1.36% compilation time)
+8.479042 seconds (8.79 M allocations: 498.104 MiB, 0.29% gc time, 1.08% compilation time)
```

### Sleeping
@@ -257,21 +259,21 @@ end
```
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark5.jl
19.299850 seconds (73.42 k allocations: 3.775 MiB, 0.19% compilation time)
```

### Multi-process parallelization
@@ -304,39 +306,81 @@ Note that some overhead here is expected due to the `vcat` reduction.
:> julia -p1 benchmark_distributed_driver.jl
From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
Activating project at `~/2023-01_rotating_tai_benchmark`
-0.659220 seconds (1.13 M allocations: 1.605 GiB, 10.05% gc time)
-0.623259 seconds (1.13 M allocations: 1.605 GiB, 6.04% gc time, 0.45% compilation time)
-159.409128 seconds (2.71 M allocations: 179.434 MiB, 0.01% gc time, 0.42% compilation time: 1% of which was recompilation)
+0.206928 seconds (33.80 k allocations: 1.917 MiB)
+0.209682 seconds (33.88 k allocations: 1.922 MiB, 1.31% compilation time)
+59.544753 seconds (2.71 M allocations: 179.497 MiB, 0.07% gc time, 1.15% compilation time: 1% of which was recompilation)

:> julia -p2 benchmark_distributed_driver.jl
From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
Activating project at `~/2023-01_rotating_tai_benchmark`
-0.665675 seconds (1.13 M allocations: 1.605 GiB, 10.35% gc time)
-0.631846 seconds (1.13 M allocations: 1.605 GiB, 6.31% gc time, 0.45% compilation time)
-91.579614 seconds (2.71 M allocations: 179.529 MiB, 0.02% gc time, 0.75% compilation time: 1% of which was recompilation)
+0.232625 seconds (33.80 k allocations: 1.917 MiB, 9.91% gc time)
+0.211565 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
+34.113498 seconds (2.71 M allocations: 179.529 MiB, 0.05% gc time, 2.00% compilation time: 1% of which was recompilation)

:> julia -p4 benchmark_distributed_driver.jl
-From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
-From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
From worker 4: Activating project at `~/2023-01_rotating_tai_benchmark`
+From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
+From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
From worker 5: Activating project at `~/2023-01_rotating_tai_benchmark`
Activating project at `~/2023-01_rotating_tai_benchmark`
-0.667681 seconds (1.13 M allocations: 1.605 GiB, 10.48% gc time)
-0.623414 seconds (1.13 M allocations: 1.605 GiB, 6.34% gc time, 0.45% compilation time)
-56.961555 seconds (2.71 M allocations: 179.590 MiB, 0.04% gc time, 1.23% compilation time: 1% of which was recompilation)
+0.209870 seconds (33.80 k allocations: 1.917 MiB)
+0.213252 seconds (33.88 k allocations: 1.922 MiB, 1.33% compilation time)
+21.967531 seconds (2.71 M allocations: 179.590 MiB, 0.08% gc time, 3.15% compilation time: 1% of which was recompilation)

:> julia -p8 benchmark_distributed_driver.jl
-From worker 6: Activating project at `~/2023-01_rotating_tai_benchmark`
-From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
-From worker 5: Activating project at `~/2023-01_rotating_tai_benchmark`
-From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
-From worker 7: Activating project at `~/2023-01_rotating_tai_benchmark`
+From worker 9: Activating project at `~/2023-01_rotating_tai_benchmark`
From worker 4: Activating project at `~/2023-01_rotating_tai_benchmark`
+From worker 2: Activating project at `~/2023-01_rotating_tai_benchmark`
From worker 8: Activating project at `~/2023-01_rotating_tai_benchmark`
-From worker 9: Activating project at `~/2023-01_rotating_tai_benchmark`
+From worker 7: Activating project at `~/2023-01_rotating_tai_benchmark`
+From worker 5: Activating project at `~/2023-01_rotating_tai_benchmark`
+From worker 6: Activating project at `~/2023-01_rotating_tai_benchmark`
+From worker 3: Activating project at `~/2023-01_rotating_tai_benchmark`
Activating project at `~/2023-01_rotating_tai_benchmark`
-0.640059 seconds (1.13 M allocations: 1.605 GiB, 9.37% gc time)
-0.625552 seconds (1.13 M allocations: 1.605 GiB, 6.69% gc time, 0.44% compilation time)
-48.310770 seconds (2.71 M allocations: 179.714 MiB, 0.04% gc time, 1.51% compilation time: 1% of which was recompilation)
+0.210575 seconds (33.80 k allocations: 1.917 MiB)
+0.212953 seconds (33.88 k allocations: 1.922 MiB, 1.35% compilation time)
+18.250584 seconds (2.71 M allocations: 179.729 MiB, 0.20% gc time, 4.08% compilation time: 1% of which was recompilation)
```
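To put the multi-process timings into perspective, the speedup S(p) = T(1) / T(p) and the parallel efficiency E(p) = S(p) / p can be computed from the runtimes. The sketch below takes the last (longest) timing of each updated `-p` run listed above; the variable names are illustrative only.

```julia
# Runtime in seconds of the last timing line of each updated `-p` run
n_workers = [1, 2, 4, 8]
runtimes = [59.544753, 34.113498, 21.967531, 18.250584]

speedup = runtimes[1] ./ runtimes      # S(p) = T(1) / T(p)
efficiency = speedup ./ n_workers      # E(p) = S(p) / p

for (p, s, e) in zip(n_workers, speedup, efficiency)
    println("p = $p: speedup = $(round(s; digits=2)), efficiency = $(round(e; digits=2))")
end
```

With these numbers the speedup at eight workers comes out at roughly 3.3, so the efficiency drops well below one as workers are added.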

### Better runtime distribution

Going back to the [original computation](#original-code), the different calls to `propagate_splitting` have different runtimes. Since the runtime is mostly proportional to the separation time, the original loop order put all the long-running calls in one thread and all the short-running calls in another. We can remedy this by transposing the construction of the fidelity matrix, so that the threaded loop runs over the potential depths and every thread handles the full range of separation times:

```
function map_fidelity(potential_depth_values, separation_time_values; kwargs...)
    N = length(potential_depth_values)
    M = length(separation_time_values)
    # F is built transposed (M×N): each outer iteration fills one column F[:, i]
    F = zeros(M, N)
    # Thread over the potential depths, so that every thread handles the full
    # range of separation times and therefore a similar total runtime
    Threads.@threads for i = 1:N
        @inbounds V0 = potential_depth_values[i]
        @inbounds for j = 1:M
            t_r = separation_time_values[j]
            F[j, i] = propagate_splitting(t_r, V0; kwargs...)
        end
    end
    # Return the (lazy) transpose, an N×M matrix
    return transpose(F)
end
```
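The effect of the loop order on the load balance can be illustrated with a toy cost model. The sketch below assumes that the cost of one call is exactly proportional to the separation time and that the threaded loop is split into contiguous, near-equal chunks, one per thread. The helper names `chunk_costs` and `imbalance` and the numerical values are hypothetical and not part of the benchmark.

```julia
# Total cost of each contiguous, near-equal chunk of the outer loop
function chunk_costs(costs::AbstractVector, nchunks::Integer)
    n = length(costs)
    bounds = [round(Int, k * n / nchunks) for k in 0:nchunks]
    return [sum(costs[bounds[k]+1:bounds[k+1]]) for k in 1:nchunks]
end

# Ratio of the slowest chunk to the average chunk (1.0 means perfectly balanced)
imbalance(c) = maximum(c) / (sum(c) / length(c))

separation_times = collect(range(1.0, 10.0; length=8))  # hypothetical values
n_depths = 8                                            # hypothetical grid size

# Outer loop over separation times: iteration j costs n_depths * t_r[j]
costs_over_times = n_depths .* separation_times
# Outer loop over potential depths: every iteration costs sum(t_r)
costs_over_depths = fill(sum(separation_times), n_depths)

println(imbalance(chunk_costs(costs_over_times, 2)))
println(imbalance(chunk_costs(costs_over_depths, 2)))
```

In this model the loop over separation times leaves one chunk with far more work than the other, while the loop over potential depths gives every chunk the same total cost.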
```
:> JULIA_EXCLUSIVE=1 julia --project=. -t 1 benchmark6.jl
0.208736 seconds (33.80 k allocations: 1.917 MiB)
2.796440 seconds (3.60 M allocations: 144.673 MiB, 0.74% gc time, 0.06% compilation time)
120.989621 seconds (100.59 M allocations: 4.077 GiB, 0.34% gc time, 0.04% compilation time)

:> JULIA_EXCLUSIVE=1 julia --project=. -t 2 benchmark6.jl
0.213141 seconds (33.80 k allocations: 1.917 MiB)
2.789801 seconds (3.60 M allocations: 144.673 MiB, 0.70% gc time, 0.05% compilation time)
61.719640 seconds (100.59 M allocations: 4.077 GiB, 0.74% gc time, 0.07% compilation time)

:> JULIA_EXCLUSIVE=1 julia --project=. -t 4 benchmark6.jl
0.210999 seconds (33.80 k allocations: 1.917 MiB)
2.777318 seconds (3.60 M allocations: 144.673 MiB, 0.76% gc time, 0.05% compilation time)
33.486723 seconds (100.59 M allocations: 4.077 GiB, 1.53% gc time, 0.12% compilation time)

:> JULIA_EXCLUSIVE=1 julia --project=. -t 8 benchmark6.jl
0.210624 seconds (33.80 k allocations: 1.917 MiB)
2.793085 seconds (3.60 M allocations: 144.673 MiB, 0.95% gc time, 0.06% compilation time)
19.351503 seconds (100.59 M allocations: 4.077 GiB, 3.05% gc time, 0.22% compilation time)