Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallel some more #522

Merged
merged 19 commits into from
Aug 7, 2023
Merged

parallel some more #522

merged 19 commits into from
Aug 7, 2023

Conversation

pca006132
Copy link
Collaborator

Use tbb for some optimizations directly:

  1. Parallelize complex face triangulation. The approach with std::async will create a thread for every invocation, which is too much overhead, so tbb thread_arena is needed.
  2. Parallelize AddNewEdgeVerts. This requires concurrent_map as we will be obtaining elements concurrently.

Combining these two optimizations, we can cut down the running time for Samples.Sponge4 from ~4s to ~3s. When MANIFOLD_DEBUG=on, the running time for Samples.Sponge4 is reduced from ~7s to ~3.7s. The main bottleneck for now should be simplification which takes about 1s.

@pca006132 pca006132 requested a review from elalish August 4, 2023 18:25
@pca006132
Copy link
Collaborator Author

it seems to me that the CI somehow stalled, but I have no idea why (I don't think the way I use mutex can cause deadlock?)

@pca006132
Copy link
Collaborator Author

Maybe enqueue can cause a deadlock if itself is the sole worker thread... I guess I should use task_group instead.

@pca006132
Copy link
Collaborator Author

pca006132 commented Aug 4, 2023

It seems that face reordering causes SimplifyTopology failure (I triggered both segfault and infinite loop). May be related to #518. Disabling it for now. The performance is not as good as before but there is still ~0.4s improvement.

@pca006132
Copy link
Collaborator Author

thinking about it, it should be possible to parallelize this without reordering. will try it tmr

@codecov
Copy link

codecov bot commented Aug 4, 2023

Codecov Report

Patch coverage: 96.61% and no project coverage change.

Comparison is base (2d51344) 90.36% compared to head (351f1a6) 90.37%.
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #522   +/-   ##
=======================================
  Coverage   90.36%   90.37%           
=======================================
  Files          35       35           
  Lines        4433     4456   +23     
=======================================
+ Hits         4006     4027   +21     
- Misses        427      429    +2     
Files Changed Coverage Δ
src/manifold/src/impl.h 72.72% <ø> (ø)
src/utilities/include/par.h 94.28% <ø> (ø)
src/manifold/src/boolean_result.cpp 96.90% <89.47%> (-0.59%) ⬇️
src/manifold/src/edge_op.cpp 96.23% <100.00%> (+0.05%) ⬆️
src/manifold/src/face_op.cpp 98.48% <100.00%> (+0.33%) ⬆️
src/manifold/src/manifold.cpp 95.66% <100.00%> (+0.02%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pca006132
Copy link
Collaborator Author

Admittedly I am getting a bit crazy now, I was seeing so many optimization opportunities and can hardly sleep without implementing them.

Samples.Sponge4 now runs in 2600ms with tbb (it took 4000ms previously) and MANIFOLD_DEBUG=off, the entire test suite completes in 4.5s, and the perfTest with tbb:

nTri = 512, time = 0.00160684 sec
nTri = 2048, time = 0.00310806 sec
nTri = 8192, time = 0.00775844 sec
nTri = 32768, time = 0.0200173 sec
nTri = 131072, time = 0.029914 sec
nTri = 524288, time = 0.0974839 sec
nTri = 2097152, time = 0.434753 sec
nTri = 8388608, time = 1.94247 sec

@pca006132
Copy link
Collaborator Author

the segfault is weird, it doesn't seem to me that the collider update is doing anything that will cause segfault.

@pca006132
Copy link
Collaborator Author

@elalish is it required that the collider output order agrees with the query input order? e.g. say the query is [a, b] and a collides with [1, 2], b collides with [3, 4], can we output [3, 4, 1, 2]?

@pca006132
Copy link
Collaborator Author

I found the issue: CsgOpNode children is somehow empty in some runs, but I cannot find why it can be empty (I tried adding checks everywhere and can only see it to be empty when we call GetChildren). Perhaps there is some issue with the way I use thread local. Anyway, I am removing that collider optimization which seems to be causing the issue (but I have no idea why...).

@pca006132
Copy link
Collaborator Author

@elalish this should be ready now

Copy link
Owner

@elalish elalish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Remind me, which of our platforms is TBB not available on? If it's only WASM and that's in progress, then we should put in a TODO to remove the old code when that's ready.

src/manifold/src/boolean_result.cpp Outdated Show resolved Hide resolved
src/manifold/src/boolean_result.cpp Outdated Show resolved Hide resolved
src/manifold/src/edge_op.cpp Outdated Show resolved Hide resolved
src/manifold/src/face_op.cpp Outdated Show resolved Hide resolved
src/manifold/src/face_op.cpp Outdated Show resolved Hide resolved
@pca006132
Copy link
Collaborator Author

Only not available on WASM. Yes, we can mark the old code as about to be removed.
I actually tried to compile tbb from source for WASM, it seems that it can work, but the performance is not great. I can look into it later.

Copy link
Owner

@elalish elalish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! If the WASM compiles with TBB and isn't worse than without it, I'd say we can go ahead and remove the alternate code paths. Take a look at the effect on binary size too, just in case.

@pca006132 pca006132 merged commit 8497e44 into elalish:master Aug 7, 2023
22 checks passed
@pca006132
Copy link
Collaborator Author

I will check that. My main concern is that the PR is still under review, and tbb with wasm is not much tested, so I am afraid that it may not be stable.

@pca006132
Copy link
Collaborator Author

weird, it seems that this somehow causes windows build to fail (when tbb is enabled)

@pca006132 pca006132 deleted the tbb-opt branch August 15, 2023 12:54
@elalish elalish mentioned this pull request Nov 3, 2023
cartesian-theatrics pushed a commit to SovereignShop/manifold that referenced this pull request Mar 11, 2024
* parallel some more

* use std::mutex

* fix compilation error

* check if max concurrency > 1

* use task_group

* disable face reordering

* preserve face order

* comments

* fix cuda build

* fix meshid not found situation

* use explicit stack and scratch buffer

this avoids potential stack overflow and reduces allocation calls

* faster collider

* please cuda

* missing commit

* remove collider optimization

* dedup face_op.cpp

* dedup boolean_result.cpp

* remove ambiguous comment

* include array
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants