Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

some more optimizations #503

Merged
merged 13 commits into from
Jul 24, 2023
Merged

some more optimizations #503

merged 13 commits into from
Jul 24, 2023

Conversation

pca006132
Copy link
Collaborator

This is mainly optimization for tbb, with as much as -30% running time for some extreme example.

  1. Use parallel stl algorithms when thrust doesn't use the parallel version when thrust::tbb::par is specified (Thrust with tbb backend uses sequential versions for several algorithms NVIDIA/cccl#635)
  2. Use parallel sort from stl, which seems to be slightly faster than the implementation in thrust.
  3. Optimize triangulator using monotonic memory pool (list nodes are not freed until the triangulator is destroyed), and change std::stack which uses deque backend to vector.

Performance differences with tbb

Surprisingly, the performance improvement is quite significant for such simple changes.

perfTest (max: -8%):

nTri = 512, time = 0.00169123 sec
nTri = 2048, time = 0.00497043 sec
nTri = 8192, time = 0.01013 sec
nTri = 32768, time = 0.0302614 sec
nTri = 131072, time = 0.0534231 sec
nTri = 524288, time = 0.15851 sec
nTri = 2097152, time = 0.691089 sec
nTri = 8388608, time = 2.94239 sec

-------------------------------------

nTri = 512, time = 0.00170492 sec
nTri = 2048, time = 0.00422943 sec
nTri = 8192, time = 0.00834682 sec
nTri = 32768, time = 0.0227696 sec
nTri = 131072, time = 0.0404606 sec
nTri = 524288, time = 0.139214 sec
nTri = 2097152, time = 0.619758 sec
nTri = 8388608, time = 2.69757 sec

Selected long-running tests (max: -30%)

Boolean.Close (852 ms -> 614ms)
Boolean.Sweep (245 ms -> 209ms)
Samples.Bracelet (330 ms -> 231ms)
Samples.GyroidModule (309 ms -> 248ms)
Samples.Sponge4 (7234 ms -> 6854ms)

And I did not observe notable performance differences for wasm build (-60ms in total), so it should not cause regression.

@pca006132
Copy link
Collaborator Author

oh the rebase somehow changed the committer, weird

@codecov
Copy link

codecov bot commented Jul 22, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.01 ⚠️

Comparison is base (047159e) 90.34% compared to head (ff589c0) 90.34%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     NVIDIA/thrust#503      +/-   ##
==========================================
- Coverage   90.34%   90.34%   -0.01%     
==========================================
  Files          35       35              
  Lines        4414     4422       +8     
==========================================
+ Hits         3988     3995       +7     
- Misses        426      427       +1     
Impacted Files Coverage Δ
src/collider/src/collider.cpp 94.91% <100.00%> (ø)
src/polygon/src/polygon.cpp 89.81% <100.00%> (-0.06%) ⬇️
src/utilities/include/par.h 94.28% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@elalish
Copy link
Owner

elalish commented Jul 23, 2023

oh the rebase somehow changed the committer, weird

In general, I would recommend avoiding the use of rebase entirely. Merge is safer and since we do a squash and merge, there's no pollution of our commit history anyway. It also avoids the need for force-push and makes it easier to read the sequential diffs on Github.

Copy link
Owner

@elalish elalish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if we can pull master to simplify this diff, since it seems to have a bunch of previous PRs in it. Then I'll review, but this sounds great!

it seems that Apple Clang 14 does not support pstl
@pca006132
Copy link
Collaborator Author

Let's see if we can pull master to simplify this diff, since it seems to have a bunch of previous PRs in it. Then I'll review, but this sounds great!

Should be simplified now.

@pca006132
Copy link
Collaborator Author

I give up with macos fix, maybe @elalish you can see what preprocessor macro does macos provide

Copy link
Owner

@elalish elalish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about pmr, but it looks like it's not ready on Apple's compiler. Can we fall back without making the code too crazy?

src/polygon/src/polygon.cpp Outdated Show resolved Hide resolved
src/polygon/src/polygon.cpp Outdated Show resolved Hide resolved
Copy link
Owner

@elalish elalish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pca006132 pca006132 merged commit 18edefc into elalish:master Jul 24, 2023
@pca006132 pca006132 deleted the optimizations branch August 15, 2023 12:54
@elalish elalish mentioned this pull request Nov 3, 2023
cartesian-theatrics pushed a commit to SovereignShop/manifold that referenced this pull request Mar 11, 2024
* simple speedup from stl algorithms

* fix build

* std::list allocator optimization

* added documentation

* fix weird formatting issues

* try fixing MacOS build

it seems that Apple Clang 14 does not support pstl

* fix again

* fix macos build

* fix apple

* clean up using macro

* fix apple?

* small fix...

* fix apple (again...)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants