Optimization (std::map, etc) #483

stephomi · 2023-07-07T22:52:30Z

std::map should only be used if ordering needs to be preserved.

Case study: subtract a 150k verts mesh on a 400k verts one.
The entire getMeshGL() took 770ms, by replacing 2 minor thing I got it down to <600ms.

By testing this section (175ms):

manifold/src/manifold/src/manifold.cpp

Line 287 in 96c57c0

std::map<std::pair<int, int>, int> vertPropPair;

diff --git a/src/manifold/src/manifold.cpp b/src/manifold/src/manifold.cpp
index 48efe44..51356ee 100644
--- a/src/manifold/src/manifold.cpp
+++ b/src/manifold/src/manifold.cpp
@@ -21,6 +21,10 @@
 #include "impl.h"
 #include "par.h"
 
+#include "../parallel-hashmap/parallel_hashmap/phmap.h"
+#include <glm/gtx/hash.hpp>
+#include <unordered_map>
+
 namespace {
 using namespace manifold;
 using namespace thrust::placeholders;
@@ -284,7 +288,8 @@ MeshGL Manifold::GetMeshGL(glm::ivec3 normalIdx) const {
 
   // Duplicate verts with different props
   std::vector<int> vert2idx(impl.NumVert(), -1);
-  std::map<std::pair<int, int>, int> vertPropPair;
+  // std::unordered_map<glm::ivec2, int> vertPropPair;
+  phmap::flat_hash_map<glm::ivec2, int> vertPropPair;
   for (int run = 0; run < out.runOriginalID.size(); ++run) {
     for (int tri = out.runIndex[run] / 3; tri < out.runIndex[run + 1] / 3;
          ++tri) {
@@ -294,14 +299,10 @@ MeshGL Manifold::GetMeshGL(glm::ivec3 normalIdx) const {
         const int prop = triProp[i];
         const int vert = out.triVerts[3 * tri + i];
 
-        const auto it = vertPropPair.find({vert, prop});
-        if (it != vertPropPair.end()) {
-          out.triVerts[3 * tri + i] = it->second;
-          continue;
-        }
         const int idx = out.vertProperties.size() / out.numProp;
-        vertPropPair.insert({{vert, prop}, idx});
-        out.triVerts[3 * tri + i] = idx;
+        const auto it = vertPropPair.try_emplace({vert, prop},idx);
+        out.triVerts[3 * tri + i] = it.first->second;
+        if (!it.second) continue;
 
         for (int p : {0, 1, 2}) {
           out.vertProperties.push_back(impl.vertPos_[vert][p]);

try_emplace instead of find/insert to perform only 1 map query
Just by using std::unordered_map I get 175ms -> 65ms
By replacing it with the widely used parallel_hashmap, I get 45ms

The text was updated successfully, but these errors were encountered:

stephomi · 2023-07-08T00:56:14Z

Another 150ms saved here:

diff --git a/src/manifold/src/boolean_result.cpp b/src/manifold/src/boolean_result.cpp
index 8d54ed8..cd717a8 100644
--- a/src/manifold/src/boolean_result.cpp
+++ b/src/manifold/src/boolean_result.cpp
@@ -481,7 +481,7 @@ void CreateProperties(Manifold::Impl &outR, const VecDH<TriRef> &refPQ,
                           inP.halfedge_.cptrD(), inQ.halfedge_.cptrD(),
                           outR.halfedge_.cptrD(), outR.precision_}));
 
-  std::map<std::tuple<int, int, int, int>, int> propIdx;
+  phmap::flat_hash_map<glm::ivec4, int> propIdx;
   int idx = 0;
 
   for (int tri = 0; tri < numTri; ++tri) {
@@ -501,14 +501,14 @@ void CreateProperties(Manifold::Impl &outR, const VecDH<TriRef> &refPQ,
       const int vert = outR.halfedge_[3 * tri + i].startVert;
       const glm::vec3 uvw = bary[3 * tri + i];
 
-      auto key = std::make_tuple(PQ, -1, -1, -1);
+      glm::ivec4 key(PQ, -1, -1, -1);
       if (oldNumProp > 0) {
-        std::get<1>(key) = vert;
+        key[1] = vert;
         int edge = -1;
         for (const int j : {0, 1, 2}) {
           if (uvw[j] == 1) {
             // On a retained vert, the propVert must also match
-            std::get<2>(key) = triProp[j];
+            key[2] = triProp[j];
             edge = -1;
             break;
           }
@@ -518,18 +518,15 @@ void CreateProperties(Manifold::Impl &outR, const VecDH<TriRef> &refPQ,
           // On an edge, both propVerts must match
           const int p0 = triProp[Next3(edge)];
           const int p1 = triProp[Prev3(edge)];
-          std::get<2>(key) = glm::min(p0, p1);
-          std::get<3>(key) = glm::max(p0, p1);
+          key[2] = glm::min(p0, p1);
+          key[3] = glm::max(p0, p1);
         }
       }
 
-      const auto it = propIdx.find(key);
-      if (it != propIdx.end()) {
-        outR.meshRelation_.triProperties[tri][i] = it->second;
-        continue;
-      }
-      outR.meshRelation_.triProperties[tri][i] = idx;
-      propIdx.insert({key, idx++});
+      const auto it = propIdx.try_emplace(key, idx);
+      outR.meshRelation_.triProperties[tri][i] = it.first->second;
+      if (!it.second) continue;
+      idx++;
 
       for (int p = 0; p < numProp; ++p) {
         if (p < oldNumProp) {

elalish · 2023-07-08T04:09:46Z

Excellent - again, can you put this in the form of a PR? Would love to review and merge.

elalish · 2023-07-08T04:12:41Z

Also, we have our own simple little parallel hash table - curious how it stacks up against the others? Would be nice to be consistent.

pca006132 · 2023-07-12T11:50:47Z

It seems that there are also other uses of std::map, wondering if replacing those can make a different as well

stephomi · 2023-07-14T16:49:01Z

Other uses of std::map weren't a bottleneck when I checked.

I checked the SimplifyTopology function and could halve its cost, mostly by rewriting the copy_if and using boost indirect_sort on a single packed array.
On the total manifold algorithm , it saves ~13%.
Not going to open a PR as I am using OpenMP instead of thrust (and also dependency to boost sort).

My test branch: stephomi@0b5161d

Edit: I'm testing against the TBB backend (OMP through thrust backend has broken performance on my machine)

elalish · 2023-07-14T16:52:00Z

I'm open to this if we can avoid pulling in a boost dependency.

@stephomi

* remove extra prefetch this is in the hot path and causes performance issues * implemented optimization proposed by @stephomi in #483 * formatting * fix cuda build * combine the loop

pentacular · 2023-07-15T11:07:18Z

Just be careful that deterministic operation is preserved when you disorder maps, etc.

pca006132 · 2023-07-15T11:10:12Z

It should be fine as we don't perform anything that is order dependent here. For deterministic behavior however, I think our output is not deterministic when multithreading is enabled, I forgot if we have deterministic output when we execute everything in a single thread.

@stephomi

* remove extra prefetch this is in the hot path and causes performance issues * implemented optimization proposed by @stephomi in elalish#483 * formatting * fix cuda build * combine the loop

stephomi changed the title ~~Better std::map~~ Optimization (std::map, etc) Jul 8, 2023

stephomi mentioned this issue Jul 8, 2023

Replace map with unordered_map #484

Merged

elalish added this to the v2.2 milestone Jul 12, 2023

pca006132 added a commit to pca006132/manifold that referenced this issue Jul 15, 2023

implemented optimization proposed by @stephomi in elalish#483

a71a530

pca006132 mentioned this issue Jul 15, 2023

optimization in #483 #491

Merged

pca006132 closed this as completed in #491 Jul 15, 2023

pca006132 added a commit that referenced this issue Jul 15, 2023

optimization in #483 (#491)

c167a48

* remove extra prefetch this is in the hot path and causes performance issues * implemented optimization proposed by @stephomi in #483 * formatting * fix cuda build * combine the loop

elalish mentioned this issue Nov 3, 2023

V2.2.1 #589

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimization (std::map, etc) #483

Optimization (std::map, etc) #483

stephomi commented Jul 7, 2023 •

edited

Loading

stephomi commented Jul 8, 2023

elalish commented Jul 8, 2023

elalish commented Jul 8, 2023

pca006132 commented Jul 12, 2023

stephomi commented Jul 14, 2023 •

edited

Loading

elalish commented Jul 14, 2023

pentacular commented Jul 15, 2023

pca006132 commented Jul 15, 2023

Optimization (std::map, etc) #483

Optimization (std::map, etc) #483

Comments

stephomi commented Jul 7, 2023 • edited Loading

stephomi commented Jul 8, 2023

elalish commented Jul 8, 2023

elalish commented Jul 8, 2023

pca006132 commented Jul 12, 2023

stephomi commented Jul 14, 2023 • edited Loading

elalish commented Jul 14, 2023

pentacular commented Jul 15, 2023

pca006132 commented Jul 15, 2023

stephomi commented Jul 7, 2023 •

edited

Loading

stephomi commented Jul 14, 2023 •

edited

Loading