Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization (std::map, etc) #483

Closed
stephomi opened this issue Jul 7, 2023 · 8 comments · Fixed by #491
Closed

Optimization (std::map, etc) #483

stephomi opened this issue Jul 7, 2023 · 8 comments · Fixed by #491
Milestone

Comments

@stephomi
Copy link
Contributor

stephomi commented Jul 7, 2023

std::map should only be used if ordering needs to be preserved.

Case study: subtract a 150k verts mesh on a 400k verts one.
The entire getMeshGL() took 770ms, by replacing 2 minor thing I got it down to <600ms.

By testing this section (175ms):

std::map<std::pair<int, int>, int> vertPropPair;

diff --git a/src/manifold/src/manifold.cpp b/src/manifold/src/manifold.cpp
index 48efe44..51356ee 100644
--- a/src/manifold/src/manifold.cpp
+++ b/src/manifold/src/manifold.cpp
@@ -21,6 +21,10 @@
 #include "impl.h"
 #include "par.h"
 
+#include "../parallel-hashmap/parallel_hashmap/phmap.h"
+#include <glm/gtx/hash.hpp>
+#include <unordered_map>
+
 namespace {
 using namespace manifold;
 using namespace thrust::placeholders;
@@ -284,7 +288,8 @@ MeshGL Manifold::GetMeshGL(glm::ivec3 normalIdx) const {
 
   // Duplicate verts with different props
   std::vector<int> vert2idx(impl.NumVert(), -1);
-  std::map<std::pair<int, int>, int> vertPropPair;
+  // std::unordered_map<glm::ivec2, int> vertPropPair;
+  phmap::flat_hash_map<glm::ivec2, int> vertPropPair;
   for (int run = 0; run < out.runOriginalID.size(); ++run) {
     for (int tri = out.runIndex[run] / 3; tri < out.runIndex[run + 1] / 3;
          ++tri) {
@@ -294,14 +299,10 @@ MeshGL Manifold::GetMeshGL(glm::ivec3 normalIdx) const {
         const int prop = triProp[i];
         const int vert = out.triVerts[3 * tri + i];
 
-        const auto it = vertPropPair.find({vert, prop});
-        if (it != vertPropPair.end()) {
-          out.triVerts[3 * tri + i] = it->second;
-          continue;
-        }
         const int idx = out.vertProperties.size() / out.numProp;
-        vertPropPair.insert({{vert, prop}, idx});
-        out.triVerts[3 * tri + i] = idx;
+        const auto it = vertPropPair.try_emplace({vert, prop},idx);
+        out.triVerts[3 * tri + i] = it.first->second;
+        if (!it.second) continue;
 
         for (int p : {0, 1, 2}) {
           out.vertProperties.push_back(impl.vertPos_[vert][p]);

  1. try_emplace instead of find/insert to perform only 1 map query
  2. Just by using std::unordered_map I get 175ms -> 65ms
  3. By replacing it with the widely used parallel_hashmap, I get 45ms
@stephomi
Copy link
Contributor Author

stephomi commented Jul 8, 2023

Another 150ms saved here:

diff --git a/src/manifold/src/boolean_result.cpp b/src/manifold/src/boolean_result.cpp
index 8d54ed8..cd717a8 100644
--- a/src/manifold/src/boolean_result.cpp
+++ b/src/manifold/src/boolean_result.cpp
@@ -481,7 +481,7 @@ void CreateProperties(Manifold::Impl &outR, const VecDH<TriRef> &refPQ,
                           inP.halfedge_.cptrD(), inQ.halfedge_.cptrD(),
                           outR.halfedge_.cptrD(), outR.precision_}));
 
-  std::map<std::tuple<int, int, int, int>, int> propIdx;
+  phmap::flat_hash_map<glm::ivec4, int> propIdx;
   int idx = 0;
 
   for (int tri = 0; tri < numTri; ++tri) {
@@ -501,14 +501,14 @@ void CreateProperties(Manifold::Impl &outR, const VecDH<TriRef> &refPQ,
       const int vert = outR.halfedge_[3 * tri + i].startVert;
       const glm::vec3 uvw = bary[3 * tri + i];
 
-      auto key = std::make_tuple(PQ, -1, -1, -1);
+      glm::ivec4 key(PQ, -1, -1, -1);
       if (oldNumProp > 0) {
-        std::get<1>(key) = vert;
+        key[1] = vert;
         int edge = -1;
         for (const int j : {0, 1, 2}) {
           if (uvw[j] == 1) {
             // On a retained vert, the propVert must also match
-            std::get<2>(key) = triProp[j];
+            key[2] = triProp[j];
             edge = -1;
             break;
           }
@@ -518,18 +518,15 @@ void CreateProperties(Manifold::Impl &outR, const VecDH<TriRef> &refPQ,
           // On an edge, both propVerts must match
           const int p0 = triProp[Next3(edge)];
           const int p1 = triProp[Prev3(edge)];
-          std::get<2>(key) = glm::min(p0, p1);
-          std::get<3>(key) = glm::max(p0, p1);
+          key[2] = glm::min(p0, p1);
+          key[3] = glm::max(p0, p1);
         }
       }
 
-      const auto it = propIdx.find(key);
-      if (it != propIdx.end()) {
-        outR.meshRelation_.triProperties[tri][i] = it->second;
-        continue;
-      }
-      outR.meshRelation_.triProperties[tri][i] = idx;
-      propIdx.insert({key, idx++});
+      const auto it = propIdx.try_emplace(key, idx);
+      outR.meshRelation_.triProperties[tri][i] = it.first->second;
+      if (!it.second) continue;
+      idx++;
 
       for (int p = 0; p < numProp; ++p) {
         if (p < oldNumProp) {

@stephomi stephomi changed the title Better std::map Optimization (std::map, etc) Jul 8, 2023
@elalish
Copy link
Owner

elalish commented Jul 8, 2023

Excellent - again, can you put this in the form of a PR? Would love to review and merge.

@elalish
Copy link
Owner

elalish commented Jul 8, 2023

Also, we have our own simple little parallel hash table - curious how it stacks up against the others? Would be nice to be consistent.

@elalish elalish added this to the v2.2 milestone Jul 12, 2023
@pca006132
Copy link
Collaborator

It seems that there are also other uses of std::map, wondering if replacing those can make a different as well

@stephomi
Copy link
Contributor Author

stephomi commented Jul 14, 2023

Other uses of std::map weren't a bottleneck when I checked.

I checked the SimplifyTopology function and could halve its cost, mostly by rewriting the copy_if and using boost indirect_sort on a single packed array.
On the total manifold algorithm , it saves ~13%.
Not going to open a PR as I am using OpenMP instead of thrust (and also dependency to boost sort).

My test branch: stephomi@0b5161d

Edit: I'm testing against the TBB backend (OMP through thrust backend has broken performance on my machine)

@elalish
Copy link
Owner

elalish commented Jul 14, 2023

I'm open to this if we can avoid pulling in a boost dependency.

pca006132 added a commit to pca006132/manifold that referenced this issue Jul 15, 2023
pca006132 added a commit that referenced this issue Jul 15, 2023
* remove extra prefetch

this is in the hot path and causes performance issues

* implemented optimization proposed by @stephomi in #483

* formatting

* fix cuda build

* combine the loop
@pentacular
Copy link

Just be careful that deterministic operation is preserved when you disorder maps, etc.

@pca006132
Copy link
Collaborator

It should be fine as we don't perform anything that is order dependent here. For deterministic behavior however, I think our output is not deterministic when multithreading is enabled, I forgot if we have deterministic output when we execute everything in a single thread.

@elalish elalish mentioned this issue Nov 3, 2023
cartesian-theatrics pushed a commit to SovereignShop/manifold that referenced this issue Mar 11, 2024
* remove extra prefetch

this is in the hot path and causes performance issues

* implemented optimization proposed by @stephomi in elalish#483

* formatting

* fix cuda build

* combine the loop
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants