parallel some more #522

pca006132 · 2023-08-04T18:25:36Z

Use tbb for some optimizations directly:

- Parallelize complex face triangulation. The approach with std::async creates a thread for every invocation, which is too much overhead, so tbb task_arena is needed.
- Parallelize AddNewEdgeVerts. This requires concurrent_map, as we will be accessing elements concurrently.
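
To make the two ideas above concrete, here is a minimal standard-library sketch, not the PR's TBB code. It assumes a hypothetical `ParallelFor` helper that shares work among a fixed number of threads (rather than one thread per invocation), and a `std::map` guarded by a `std::mutex` as a stand-in for tbb's concurrent_map. The names `ParallelFor` and `TriangleCounts`, and the per-face work itself, are illustrative only.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: runs fn(i) for i in [0, n) on a fixed set of workers
// instead of spawning one thread per item.
template <typename Fn>
void ParallelFor(std::size_t n, Fn fn) {
  unsigned hw = std::thread::hardware_concurrency();
  std::size_t workers = std::min<std::size_t>(hw == 0 ? 1 : hw, n);
  if (workers <= 1) {  // no parallelism available or needed: run inline
    for (std::size_t i = 0; i < n; ++i) fn(i);
    return;
  }
  std::atomic<std::size_t> next{0};
  // Each worker repeatedly claims the next unprocessed index.
  auto loop = [&] {
    for (std::size_t i = next.fetch_add(1); i < n; i = next.fetch_add(1)) fn(i);
  };
  std::vector<std::thread> pool;
  for (std::size_t w = 1; w < workers; ++w) pool.emplace_back(loop);
  loop();  // the calling thread also takes work
  for (auto& t : pool) t.join();
}

// Hypothetical stand-in for per-face work whose results are keyed by face id.
std::map<int, int> TriangleCounts(const std::vector<int>& polygonSizes) {
  std::map<int, int> counts;
  std::mutex m;  // guards counts; a concurrent map would replace this lock
  ParallelFor(polygonSizes.size(), [&](std::size_t face) {
    // A simple polygon with n vertices triangulates into n - 2 triangles.
    int tris = polygonSizes[face] - 2;
    std::lock_guard<std::mutex> lock(m);
    counts[static_cast<int>(face)] = tris;
  });
  return counts;
}
```

The work done outside the lock is what runs in parallel; only the map update is serialized, which is the part a concurrent container is meant to relieve.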

Combining these two optimizations, we can cut down the running time for Samples.Sponge4 from ~4s to ~3s. With MANIFOLD_DEBUG=on, the running time for Samples.Sponge4 is reduced from ~7s to ~3.7s. The main bottleneck for now should be simplification, which takes about 1s.

pca006132 · 2023-08-04T19:11:33Z

It seems to me that the CI somehow stalled, but I have no idea why (I don't think the way I use the mutex can cause a deadlock?)

pca006132 · 2023-08-04T20:08:46Z

Maybe enqueue can cause a deadlock if the calling thread is itself the sole worker thread... I guess I should use task_group instead.

pca006132 · 2023-08-04T20:28:45Z

It seems that face reordering causes SimplifyTopology to fail (I triggered both a segfault and an infinite loop). It may be related to #518. Disabling it for now. The performance is not as good as before, but there is still a ~0.4s improvement.

pca006132 · 2023-08-04T20:43:45Z

Thinking about it, it should be possible to parallelize this without reordering. Will try it tomorrow.

codecov · 2023-08-04T20:48:57Z

Codecov Report

Patch coverage: 96.61% and no project coverage change.

Comparison is base (2d51344) 90.36% compared to head (351f1a6) 90.37%.
Report is 1 commit behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #522   +/-   ##
=======================================
  Coverage   90.36%   90.37%           
=======================================
  Files          35       35           
  Lines        4433     4456   +23     
=======================================
+ Hits         4006     4027   +21     
- Misses        427      429    +2

| Files Changed | Coverage Δ |
|---|---|
| src/manifold/src/impl.h | 72.72% <ø> (ø) |
| src/utilities/include/par.h | 94.28% <ø> (ø) |
| src/manifold/src/boolean_result.cpp | 96.90% <89.47%> (-0.59%) ⬇️ |
| src/manifold/src/edge_op.cpp | 96.23% <100.00%> (+0.05%) ⬆️ |
| src/manifold/src/face_op.cpp | 98.48% <100.00%> (+0.33%) ⬆️ |
| src/manifold/src/manifold.cpp | 95.66% <100.00%> (+0.02%) ⬆️ |

... and 2 files with indirect coverage changes


Commit note for "use explicit stack and scratch buffer": this avoids potential stack overflow and reduces allocation calls.
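
As a generic illustration of that commit note (the PR's actual code is not shown in this thread), a recursive traversal can be replaced by a loop over an explicit stack that lives in a caller-owned scratch buffer. The `Tree` layout and `SumSubtree` function below are hypothetical; the point is only that the call depth stays constant however deep the structure is, and that repeated calls reuse the buffer's capacity instead of allocating again.

```cpp
#include <vector>

// Hypothetical binary tree stored in arrays; -1 marks a missing child.
struct Tree {
  std::vector<int> left, right, value;
};

// Sums all values reachable from root without recursion. `stack` is a
// caller-owned scratch buffer, so repeated calls reuse its capacity.
long long SumSubtree(const Tree& t, int root, std::vector<int>& stack) {
  long long sum = 0;
  stack.clear();  // keeps the capacity from earlier calls
  if (root >= 0) stack.push_back(root);
  while (!stack.empty()) {
    int node = stack.back();
    stack.pop_back();
    sum += t.value[node];
    if (t.left[node] >= 0) stack.push_back(t.left[node]);
    if (t.right[node] >= 0) stack.push_back(t.right[node]);
  }
  return sum;
}
```

A degenerate, chain-shaped tree with hundreds of thousands of nodes is the case where a recursive version risks overflowing the call stack, while this version only grows a heap-allocated vector.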

pca006132 · 2023-08-05T20:18:18Z

Admittedly I am getting a bit crazy now: I keep seeing so many optimization opportunities that I can hardly sleep without implementing them.

Samples.Sponge4 now runs in 2600ms with tbb and MANIFOLD_DEBUG=off (it took 4000ms previously), and the entire test suite completes in 4.5s. The perfTest results with tbb:

nTri = 512, time = 0.00160684 sec
nTri = 2048, time = 0.00310806 sec
nTri = 8192, time = 0.00775844 sec
nTri = 32768, time = 0.0200173 sec
nTri = 131072, time = 0.029914 sec
nTri = 524288, time = 0.0974839 sec
nTri = 2097152, time = 0.434753 sec
nTri = 8388608, time = 1.94247 sec

pca006132 · 2023-08-05T20:22:36Z

The segfault is weird; it doesn't seem to me that the collider update is doing anything that would cause a segfault.

pca006132 · 2023-08-06T08:21:06Z

@elalish is it required that the collider output order agrees with the query input order? e.g. say the query is [a, b] and a collides with [1, 2], b collides with [3, 4], can we output [3, 4, 1, 2]?
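
To illustrate the question with code: if each collision is recorded as a (query index, hit index) pair, an output produced in arbitrary order can be brought back into query order afterward. This is only a sketch under that assumption; the `Pair` type and `GroupByQuery` function are hypothetical, and whether the collider's consumers actually depend on the order is not settled in this thread.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical representation: each collision is (query index, hit index).
using Pair = std::pair<int, int>;

// Restores query order after an unordered collection pass.
// std::stable_sort keeps the relative order of hits within one query.
std::vector<Pair> GroupByQuery(std::vector<Pair> pairs) {
  std::stable_sort(
      pairs.begin(), pairs.end(),
      [](const Pair& a, const Pair& b) { return a.first < b.first; });
  return pairs;
}
```

With the query [a, b] at indices 0 and 1, the unordered output [3, 4, 1, 2] corresponds to the pairs (1,3), (1,4), (0,1), (0,2), and sorting by query index yields (0,1), (0,2), (1,3), (1,4).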

pca006132 · 2023-08-06T13:49:54Z

I found the issue: the children of CsgOpNode are somehow empty in some runs, but I cannot find out why (I tried adding checks everywhere and only ever see them empty when we call GetChildren). Perhaps there is some issue with the way I use thread local. Anyway, I am removing the collider optimization, which seems to be causing the issue (but I have no idea why...).

pca006132 · 2023-08-06T14:04:18Z

@elalish this should be ready now

elalish

This looks great! Remind me, which of our platforms is TBB not available on? If it's only WASM and that's in progress, then we should put in a TODO to remove the old code when that's ready.

src/manifold/src/boolean_result.cpp

src/manifold/src/edge_op.cpp

src/manifold/src/face_op.cpp

pca006132 · 2023-08-06T17:04:15Z

It is unavailable only on WASM. Yes, we can mark the old code as about to be removed.
I actually tried to compile tbb from source for WASM. It seems that it can work, but the performance is not great. I can look into it later.

elalish

Thanks! If the WASM compiles with TBB and isn't worse than without it, I'd say we can go ahead and remove the alternate code paths. Take a look at the effect on binary size too, just in case.

pca006132 · 2023-08-07T17:27:39Z

I will check that. My main concern is that the PR is still under review, and tbb with wasm is not much tested, so I am afraid that it may not be stable.

pca006132 · 2023-08-07T17:56:22Z

Weird, it seems that this somehow causes the Windows build to fail (when tbb is enabled).

* parallel some more
* use std::mutex
* fix compilation error
* check if max concurrency > 1
* use task_group
* disable face reordering
* preserve face order
* comments
* fix cuda build
* fix meshid not found situation
* use explicit stack and scratch buffer
  this avoids potential stack overflow and reduces allocation calls
* faster collider
* please cuda
* missing commit
* remove collider optimization
* dedup face_op.cpp
* dedup boolean_result.cpp
* remove ambiguous comment
* include array

parallel some more

fdd6a7a

pca006132 requested a review from elalish August 4, 2023 18:25

pca006132 added 2 commits August 5, 2023 02:42

use std::mutex

b450501

fix compilation error

b6ab33f

check if max concurrency > 1

b4ab45c

pca006132 added 2 commits August 5, 2023 04:26

use task_group

b3f4a64

disable face reordering

3e7173b

pca006132 added 8 commits August 5, 2023 14:05

preserve face order

f1a3f55

comments

56f6736

fix cuda build

1a8909c

fix meshid not found situation

6a7fa7e

use explicit stack and scratch buffer

ac55e14

this avoids potential stack overflow and reduces allocation calls

faster collider

931a860

please cuda

2cb9270

missing commit

79979dc

remove collider optimization

3eb0948

elalish requested changes Aug 6, 2023


pca006132 added 2 commits August 7, 2023 12:45

dedup face_op.cpp

92d48f6

dedup boolean_result.cpp

41b2042

pca006132 added 2 commits August 7, 2023 13:57

remove ambiguous comment

9f239b8

include array

351f1a6

elalish approved these changes Aug 7, 2023


pca006132 merged commit 8497e44 into elalish:master Aug 7, 2023
22 checks passed

pca006132 deleted the tbb-opt branch August 15, 2023 12:54

elalish mentioned this pull request Nov 3, 2023

V2.2.1 #589

Merged
