Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#528 causes invalid UpdateVert calls #529

Closed
pca006132 opened this issue Aug 14, 2023 · 31 comments · Fixed by #534
Closed

#528 causes invalid UpdateVert calls #529

pca006132 opened this issue Aug 14, 2023 · 31 comments · Fixed by #534

Comments

@pca006132
Copy link
Collaborator

pca006132 commented Aug 14, 2023

Not sure why, but it seems that when openscad is compiled with the current master after #528, https://gist.github.com/ochafik/70a6b15e982b7ccd5a79ff9afd99dbcf#file-condensed-matter-scad will break and access invalid memory in UpdateVert call:

#1  0x0000000000f9d655 in manifold::Manifold::Impl::UpdateVert(int, int, int) ()
#2  0x0000000000f9d9de in manifold::Manifold::Impl::FormLoop(int, int) ()
#3  0x0000000000f9fa40 in manifold::Manifold::Impl::CollapseEdge(int, std::vector<int, std::allocator<int> >&) ()
#4  0x0000000000fa2676 in manifold::Manifold::Impl::SimplifyTopology() ()
#5  0x0000000000f65008 in manifold::Boolean3::Result(manifold::OpType) const ()
#6  0x0000000000f8cadb in manifold::CsgOpNode::BatchBoolean(manifold::OpType, std::vector<std::shared_ptr<manifold::Manifold::Impl const>, std::allocator<std::shared_ptr<manifold::Manifold::Impl const> > >&) ()
#7  0x0000000000f8dccd in manifold::CsgOpNode::BatchUnion() const ()
#8  0x0000000000f8f812 in manifold::CsgOpNode::ToLeafNode() const ()
#9  0x0000000000edf87c in manifold::Manifold::GetCsgLeafNode() const ()
#10 0x0000000000edfa9b in manifold::Manifold::IsEmpty() const ()
#11 0x00000000008fed03 in GeometryEvaluator::applyToChildren3D(AbstractNode const&, OpenSCADOperator) ()
#12 0x0000000000900422 in GeometryEvaluator::applyToChildren(AbstractNode const&, OpenSCADOperator) ()
#13 0x00000000009004e5 in GeometryEvaluator::visit(State&, AbstractNode const&) ()
#14 0x0000000000739fd6 in NodeVisitor::traverse(AbstractNode const&, State const&) ()
#15 0x0000000000739f3b in NodeVisitor::traverse(AbstractNode const&, State const&) ()
#16 0x0000000000739f3b in NodeVisitor::traverse(AbstractNode const&, State const&) ()
#17 0x0000000000739f3b in NodeVisitor::traverse(AbstractNode const&, State const&) ()
#18 0x00000000008fd56a in GeometryEvaluator::evaluateGeometry(AbstractNode const&, bool) ()
#19 0x00000000005c364c in do_export(CommandLine const&, RenderVariables const&, FileFormat, SourceFile*) ()
#20 0x00000000005c5463 in cmdline(CommandLine const&) ()
#21 0x00000000004ddf29 in main ()

I tried compiling with #525 and it works fine. Maybe we should try to port the example to c++ and debug. We can use address sanitizer to try to debug this kind of issues.

There is still invalid access when parallelization is turned off.

@pca006132
Copy link
Collaborator Author

I ported it to C++ and reduced it to triangulator error:

Error message for triangulation:
Error in file: /home/pca006132/code/manifold/src/polygon/src/polygon.cpp (158): 'std::all_of(triangles.begin(), triangles.end(), [&vertPos, precision](const glm::ivec3 &tri) { return CCW(vertPos[tri[0]], vertPos[tri[1]], vertPos[tri[2]], precision) >= 0; })' is false: triangulation is not entirely CCW!
------------
Final polygon:
polys.push_back({
    {2.09305096, 5.86389685},  //
    {2.06142592, 5.85550117},  //
    {1.93763256, 5.4309001},  //
});
polys.push_back({
    {1.84193516, 5.90364313},  //
    {1.93763256, 5.4309001},  //
    {1.84193516, 5.90364313},  //
    {1.93763256, 5.4309001},  //
    {1.9408412, 5.90364313},  //
});
show(array([
  [2.09305096, 5.86389685],
  [2.06142592, 5.85550117],
  [1.93763256, 5.4309001],
]))
show(array([
  [1.84193516, 5.90364313],
  [1.93763256, 5.4309001],
  [1.84193516, 5.90364313],
  [1.93763256, 5.4309001],
  [1.9408412, 5.90364313],
]))
------------
Erroneous triangulation:
show(array([[1.8419352, 5.9036431], [1.9376326, 5.4309001], [1.9408412, 5.9036431]]))
show(array([[1.9408412, 5.9036431], [1.8419352, 5.9036431], [1.9376326, 5.4309001]]))
show(array([[1.8419352, 5.9036431], [1.9408412, 5.9036431], [1.9376326, 5.4309001]]))
^ not CCW
show(array([[2.0930510, 5.8638968], [2.0614259, 5.8555012], [1.9376326, 5.4309001]]))

It looks something like this:
image

Code for the sample:

#include "samples.h"

namespace {
using namespace manifold;
using namespace glm;

constexpr float AtomicRadiusN2 = 0.65;
constexpr float BondPairN2 = 1.197;
constexpr float AtomicRadiusSi = 1.1;
constexpr float LatticeCellSizeSi = 5.4309;
constexpr float fccOffset = 0.25;
constexpr float AtomicRadiusC = 0.7;
constexpr float LatticeCellSizeC = 3.65;
constexpr float cellLenA = 2.464;
constexpr float cellLenB = cellLenA;
constexpr float cellLenC = 6.711;
constexpr float cellAngleA = 90;
constexpr float cellAngleB = cellAngleA;
constexpr float cellAngleC = 120;
constexpr float LayerSeperationC = 3.364;

Manifold bond(int fn = 16, vec3 p1 = {0, 0, 0}, vec3 p2 = {1, 1, 1},
              float ar1 = 1.0, float ar2 = 2.0) {
  float cyR = std::min(ar1, ar2) / 5.0;
  float dist = length(p1 - p2);
  vec3 cyC = (p1 + p2) / 2.0f;
  float beta = degrees(acos((p1.z - p2.z) / dist));
  float gamma = degrees(atan2(p1.y - p2.y, p1.x - p2.x));
  vec3 rot = {0.0, beta, gamma};
  return Manifold::Cylinder(dist, cyR, -1, fn, true)
      .Rotate(rot.x, rot.y, rot.z)
      .Translate(cyC);
}

Manifold bondPair(int fn = 16, float d = 0.0, float ar = 1.0) {
  float axD = pow(d, 1.0 / 3.0);
  vec3 p1 = {+axD, -axD, -axD};
  vec3 p2 = {-axD, +axD, +axD};
  Manifold sphere = Manifold::Sphere(ar, fn);
  return sphere.Translate(p1) + sphere.Translate(p2) + bond(fn, p1, p2, ar, ar);
}

Manifold hexagonalClosePacked(int fn = 16, vec3 dst = {1.0, 1.0, 1.0},
                              float ar = 1.0) {
  std::vector<Manifold> parts;
  vec3 p1 = {0, 0, 0};
  parts.push_back(Manifold::Sphere(ar, fn));
  float baseAg = 30;
  vec3 ag = {baseAg, baseAg + 120, baseAg + 240};
  vec3 points[] = {{cosd(ag.x) * dst.x, sind(ag.x) * dst.x, 0},
                   {cosd(ag.y) * dst.y, sind(ag.y) * dst.y, 0},
                   {cosd(ag.z) * dst.z, sind(ag.z) * dst.z, 0}};
  for (vec3 p2 : points) {
    parts.push_back(Manifold::Sphere(ar, fn).Translate(p2));
    parts.push_back(bond(fn, p1, p2, ar, ar));
  }
  return Manifold::BatchBoolean(parts, OpType::Add);
}

Manifold fccDiamond(int fn = 16, float ar = 1.0, float unitCell = 2.0,
                    float fccOffset = 0.25) {
  std::vector<Manifold> parts;
  float huc = unitCell / 2.0;
  float od = fccOffset * unitCell;
  vec3 interstitial[] = {
      {+od, +od, +od}, {+od, -od, -od}, {-od, +od, -od}, {-od, -od, +od}};
  vec3 corners[] = {{+huc, +huc, +huc},
                    {+huc, -huc, -huc},
                    {-huc, +huc, -huc},
                    {-huc, -huc, +huc}};
  vec3 fcc[] = {{+huc, 0, 0}, {-huc, 0, 0}, {0, +huc, 0},
                {0, -huc, 0}, {0, 0, +huc}, {0, 0, -huc}};
  for (auto p : corners) parts.push_back(Manifold::Sphere(ar, fn).Translate(p));

  for (auto p : fcc) parts.push_back(Manifold::Sphere(ar, fn).Translate(p));
  for (auto p : interstitial)
    parts.push_back(Manifold::Sphere(ar, fn).Translate(p));

  vec3 bonds[][2] = {{interstitial[0], corners[0]}, {interstitial[0], fcc[0]},
                     {interstitial[0], fcc[2]},     {interstitial[0], fcc[4]},
                     {interstitial[1], corners[1]}, {interstitial[1], fcc[0]},
                     {interstitial[1], fcc[3]},     {interstitial[1], fcc[5]},
                     {interstitial[2], corners[2]}, {interstitial[2], fcc[1]},
                     {interstitial[2], fcc[2]},     {interstitial[2], fcc[5]},
                     {interstitial[3], corners[3]}, {interstitial[3], fcc[1]},
                     {interstitial[3], fcc[3]},     {interstitial[3], fcc[4]}};
  for (auto b : bonds) parts.push_back(bond(fn, b[0], b[1], ar, ar));

  return Manifold::BatchBoolean(parts, OpType::Add);
}

Manifold SiCell(int fn = 16, float x = 1.0, float y = 1.0, float z = 1.0) {
  return fccDiamond(fn, AtomicRadiusSi, LatticeCellSizeSi, fccOffset)
      .Translate({LatticeCellSizeSi * x, LatticeCellSizeSi * y,
                  LatticeCellSizeSi * z});
}

Manifold SiN2Cell(int fn = 16, float x = 1.0, float y = 1.0, float z = 1.0) {
  float n2Offset = LatticeCellSizeSi / 8;
  return bondPair(fn, BondPairN2, AtomicRadiusN2)
             .Translate({LatticeCellSizeSi * x - n2Offset,
                         LatticeCellSizeSi * y + n2Offset,
                         LatticeCellSizeSi * z + n2Offset}) +
         SiCell(fn, x, y, z);
}

Manifold GraphiteCell(int fn, vec3 xyz = {1.0, 1.0, 1.0}) {
  vec3 loc = {(cellLenA * xyz.x * cosd(30) * 2),
              ((cellLenB * sind(30)) + cellLenC) * xyz.y, xyz.z};
  return hexagonalClosePacked(fn, {cellLenA, cellLenB, cellLenC}, AtomicRadiusC)
      .Translate(loc);
}

}  // namespace

namespace manifold {
Manifold CondensedMatter(int fn) {
  std::vector<Manifold> parts;
  float siOffset = 3.0 * LatticeCellSizeSi / 8.0;
  for (int x = -3; x <= 3; x++)
    for (int y = -1; y <= 2; y++)
      parts.push_back(
          GraphiteCell(fn, {x + (y % 2 == 0 ? 0.0 : 0.5), y,
                            LayerSeperationC * 0.5 + LatticeCellSizeSi * 1.5})
              .Translate({0, -siOffset, 0})
              .Rotate(0, 0, 45));

  float xyPlane[] = {-2, -1, 0, +1, +2};
  for (float x : xyPlane)
    for (float y : xyPlane) parts.push_back(SiN2Cell(fn, x, y, 1));
  return Manifold::BatchBoolean(parts, OpType::Add);
}
}  // namespace manifold

@elalish
Copy link
Owner

elalish commented Aug 14, 2023

Okay, so two different bugs? Looks like the more serious is that SimplifyTopology has a segfault. The second is that the triangulator has yet another bug, but even if it doesn't get a good geometric result, that's no excuse for anything downstream to segfault, since everything is still manifold.

@pca006132
Copy link
Collaborator Author

Yeah the segfault behavior is a very serious bug. I was not sure if the invalid triangulation can still preserve manifoldness so I did not make it a separate issue.

@pca006132
Copy link
Collaborator Author

I think I should add bound check for VecDH by default, it will slow down things a tiny bit but it is worth paying for the price.

@elalish
Copy link
Owner

elalish commented Aug 15, 2023

Yeah, probably an assert so it only compiles for debug mode, right? Because we're not going to be able to recover anyway.

@elalish
Copy link
Owner

elalish commented Aug 15, 2023

I can repro your triangulator error with CondensedMatter(16), but no seg fault. If I enable the address sanitizer, I get an out of bounds error in SparseIndices::Unique() inside Boolean3, before the triangulator even runs. Have you seen that?

@pca006132
Copy link
Collaborator Author

I actually forgot how I got the segfault, I think it is related to compiler version (the undefined behavior may not cause segfault in some cases, but it is bad enough....). I do get address sanitizer errors.

For the issue with Unique, it is because thrust Unique is buggy and can access invalid memory. That is the main reason I started switching the algorithms to pstl, but sadly OSX does not support pstl for now so we still default to thrust implementation in that case.

@pca006132
Copy link
Collaborator Author

pca006132 commented Aug 15, 2023

Ah, I think I only got the access error when using openscad, the index was -1.

Backtrace:

-1
terminate called after throwing an instance of 'std::out_of_range'
  what():  Vec out of range

Thread 1 "openscad" received signal SIGABRT, Aborted.
0x00007ffff488abc7 in __pthread_kill_implementation () from /nix/store/4nlgxhb09sdr51nc9hdm8az5b08vzkgx-glibc-2.35-163/lib/libc.so.6
(gdb) bt
#0  0x00007ffff488abc7 in __pthread_kill_implementation () from /nix/store/4nlgxhb09sdr51nc9hdm8az5b08vzkgx-glibc-2.35-163/lib/libc.so.6
#1  0x00007ffff483db46 in raise () from /nix/store/4nlgxhb09sdr51nc9hdm8az5b08vzkgx-glibc-2.35-163/lib/libc.so.6
#2  0x00007ffff48284b5 in abort () from /nix/store/4nlgxhb09sdr51nc9hdm8az5b08vzkgx-glibc-2.35-163/lib/libc.so.6
#3  0x00007ffff4ca996a in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /nix/store/mdck89nsfisflwjv6xv8ydj7dj0sj2pn-gcc-11.3.0-lib/lib/libstdc++.so.6
#4  0x00007ffff4cb4f9a in __cxxabiv1::__terminate(void (*)()) () from /nix/store/mdck89nsfisflwjv6xv8ydj7dj0sj2pn-gcc-11.3.0-lib/lib/libstdc++.so.6
#5  0x00007ffff4cb5005 in std::terminate() () from /nix/store/mdck89nsfisflwjv6xv8ydj7dj0sj2pn-gcc-11.3.0-lib/lib/libstdc++.so.6
#6  0x00007ffff4cb5258 in __cxa_throw () from /nix/store/mdck89nsfisflwjv6xv8ydj7dj0sj2pn-gcc-11.3.0-lib/lib/libstdc++.so.6
#7  0x00000000004dfc2e in manifold::VecView<manifold::Halfedge>::operator[] (i=<optimized out>, this=<optimized out>) at /home/pca006132/code/openscad/submodules/manifold/src/utilities/include/vec.h:69
#8  manifold::Manifold::Impl::UpdateVert (this=this@entry=0x7ffffffe8140, vert=vert@entry=92318, startEdge=startEdge@entry=1, endEdge=endEdge@entry=899) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/edge_op.cpp:315
#9  0x0000000000fe6640 in manifold::Manifold::Impl::FormLoop (this=0x7ffffffe8140, current=5266, end=5249) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/edge_op.cpp:335
#10 0x0000000000fe7ae4 in manifold::Manifold::Impl::CollapseEdge (this=this@entry=0x7ffffffe8140, edge=edge@entry=2, edges=...) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/edge_op.cpp:479
#11 0x0000000000fea377 in manifold::Manifold::Impl::SimplifyTopology (this=this@entry=0x7ffffffe8140) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/edge_op.cpp:185
#12 0x0000000000fae3c5 in manifold::Boolean3::Result (this=this@entry=0x7ffffffe84a0, op=<optimized out>) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/boolean_result.cpp:766
#13 0x0000000000fd561e in manifold::CsgOpNode::BatchBoolean (operation=<optimized out>, operation@entry=manifold::OpType::Add, results=...) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/csg_tree.cpp:461
#14 0x0000000000fd6838 in manifold::CsgOpNode::BatchUnion (this=this@entry=0x2e803235970) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/csg_tree.cpp:583
#15 0x0000000000fd800f in manifold::CsgOpNode::ToLeafNode (this=0x2e803235970) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/csg_tree.cpp:393
#16 0x0000000000f29b5f in manifold::Manifold::GetCsgLeafNode (this=0x2e804d200d0) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/manifold.cpp:110
#17 0x0000000000f29d9f in manifold::Manifold::IsEmpty (this=<optimized out>) at /home/pca006132/code/openscad/submodules/manifold/src/manifold/src/manifold.cpp:352
#18 0x000000000091dd1b in GeometryEvaluator::applyToChildren3D (this=this@entry=0x7ffffffe9a20, node=..., op=op@entry=OpenSCADOperator::UNION) at /home/pca006132/code/openscad/src/geometry/GeometryEvaluator.cc:169
#19 0x000000000091f4a2 in GeometryEvaluator::applyToChildren (this=this@entry=0x7ffffffe9a20, node=..., op=op@entry=OpenSCADOperator::UNION) at /home/pca006132/code/openscad/src/geometry/GeometryEvaluator.cc:116
#20 0x000000000091f56e in GeometryEvaluator::visit (this=0x7ffffffe9a20, state=..., node=...) at /home/pca006132/code/openscad/src/geometry/GeometryEvaluator.cc:473
#21 0x00000000007512ea in NodeVisitor::traverse (this=0x7ffffffe9a20, node=..., state=...) at /home/pca006132/code/openscad/src/core/NodeVisitor.cc:30
#22 0x0000000000751240 in NodeVisitor::traverse (this=0x7ffffffe9a20, node=..., state=...) at /nix/store/1gf2flfqnpqbr1b4p4qz2f72y42bs56r-gcc-11.3.0/include/c++/11.3.0/bits/shared_ptr_base.h:1295
#23 0x0000000000751240 in NodeVisitor::traverse (this=0x7ffffffe9a20, node=..., state=...) at /nix/store/1gf2flfqnpqbr1b4p4qz2f72y42bs56r-gcc-11.3.0/include/c++/11.3.0/bits/shared_ptr_base.h:1295
#24 0x0000000000751240 in NodeVisitor::traverse (this=this@entry=0x7ffffffe9a20, node=..., state=...) at /nix/store/1gf2flfqnpqbr1b4p4qz2f72y42bs56r-gcc-11.3.0/include/c++/11.3.0/bits/shared_ptr_base.h:1295
#25 0x000000000091c4c2 in GeometryEvaluator::evaluateGeometry (this=this@entry=0x7ffffffe9a20, node=..., allownef=allownef@entry=true) at /home/pca006132/code/openscad/src/geometry/GeometryEvaluator.cc:64
#26 0x00000000005d4d9a in do_export (cmd=..., render_variables=..., curFormat=curFormat@entry=FileFormat::ASCIISTL, root_file=0x2e803190540) at /nix/store/1gf2flfqnpqbr1b4p4qz2f72y42bs56r-gcc-11.3.0/include/c++/11.3.0/bits/shared_ptr_base.h:1295
#27 0x00000000005d6c89 in cmdline (cmd=...) at /home/pca006132/code/openscad/src/openscad.cc:448
#28 0x00000000004eb84a in main (argc=<optimized out>, argv=<optimized out>) at /home/pca006132/code/openscad/src/openscad.cc:1225

@elalish
Copy link
Owner

elalish commented Aug 15, 2023

Any chance you can write a test that'll fail this way on our CI?

@elalish
Copy link
Owner

elalish commented Aug 15, 2023

I think I mentioned this before, but this is another reason I'd like to switch SparseIndices from relying on sort and unique to just using a hash table. It should be faster, simpler, and probably allow us to remove these weird thrust calls entirely.

@pca006132
Copy link
Collaborator Author

Any chance you can write a test that'll fail this way on our CI?

Yeah I can try to do that tmr. I think this is related to how their spheres and cylinders are constructed, as they are using constructors in openscad instead of ours so there will be subtle differences.

@elalish
Copy link
Owner

elalish commented Aug 15, 2023

Ah yes, that makes sense. Now that we have a way to import 3D model files into our tests, hopefully you can just export the models from OpenSCAD and use them directly.

@pca006132
Copy link
Collaborator Author

while modifying the examples I found a bug: 582c85a

originally this can evaluate to 0 when fn is not specified, and causes the triangulator to crash.

@pca006132
Copy link
Collaborator Author

No, I am unable to reproduce the problem in our tests...

@elalish
Copy link
Owner

elalish commented Aug 16, 2023

Hmm, that's troubling... Well, I'm working on the triangulator refactor now, though that'll likely take some time. I can't make much progress on the decimator until I have a repro - maybe we'll luck out and it'll be randomly fixed by another merge.

@pca006132
Copy link
Collaborator Author

OK I got a simpler reproduce, but it is non-deterministic...
Boolean.Sweep under the correct condition will trigger the out of bound bug in FormLoop, basically you need to run the test 5 times or more, and add the following patch so the debugger can be triggered (as tbb will mess up with the stack trace otherwise...):

--- a/src/utilities/include/vec.h
+++ b/src/utilities/include/vec.h
@@ -59,12 +59,20 @@ class VecView {
   operator VecView<const T>() const { return {ptr_, size_}; }
 
   inline const T &operator[](int i) const {
-    if (i < 0 || i >= size_) throw std::out_of_range("Vec out of range");
+    if (i < 0 || i >= size_) {
+      printf("i: %d, size: %d\n", i, size_);
+      __builtin_trap();
+      throw std::out_of_range("Vec out of range");
+    }
     return ptr_[i];
   }
 
   inline T &operator[](int i) {
-    if (i < 0 || i >= size_) throw std::out_of_range("Vec out of range");
+    if (i < 0 || i >= size_) {
+      printf("i: %d, size: %d\n", i, size_);
+      __builtin_trap();
+      throw std::out_of_range("Vec out of range");
+    }
     return ptr_[i];
   }

The out of range are always -1 or -3, so I think we should probably add more check to make sure that the halfedge is not already removed...

Also, I implemented multithreaded CollapseEdge, but the performance is not great: the performance is slightly better on my machine with 20 cores (50ms faster on average? over something that took 2400ms), and slower when the number of cores decreases. I double checked to make sure that my partition implementation should be pretty fast, so I guess the problem is just too hard to parallelize (at least for now).

@elalish
Copy link
Owner

elalish commented Aug 22, 2023

I ran ./manifold_test --gtest_filter=Boolean.Sweep at least 20 times but never saw a failure. This was in TBB mode, with MANIFOLD_DEBUG enabled? Still, I agree that some more bounds checks are a good idea, though I'd really love to get a consistent repro and figure out where my logic is flawed.

I am seeing the test fail hard with address sanitizer enabled, but at unique. That seems like a pretty serious issue, if for no other reason than it makes address sanitizer unusable for debug. As far as I can tell, unique is only called in one line of code; I think it shouldn't be too hard to remove, though I still think the right way is to switch SparseIndicies to a hash table.

@pca006132
Copy link
Collaborator Author

Interesting, I guess the problem with concurrency issue is that it can be hard to reproduce...
For unique, you can replace the call with a sequential one. The thrust one is known to be buggy, but normally it will not trigger a segfault. Maybe we should switch to the std sequential implementation for platforms that does not have pstl as well.

I actually tried to switch SparseIndices to a hash table, at least in Kernel12. The performance for Kernel12 is now much faster, but we first have to build the hash table and the build is pretty slow (using phmap, which is faster than std::unordered_map alreadh). There are several reasons that forbid using hash table everywhere:

  1. Parallel insertion is not supported even for those parallel hashmaps. The best you can do is to use submap mutex, but those are costly. We can use our own parallel hashmap implementation, but will need some time to refactor that to support complex keys.
  2. Hashmaps does not support random iterators, and stl/phmap hashmaps cannot provide a range that can split. Again, using our own hashmap implementation + tbb can work around this.

@elalish
Copy link
Owner

elalish commented Aug 22, 2023

Agreed we should use our existing parallel hashmap. Does it need a refactor? Despite all the code, what SparseIndices is doing is actually quite simple: it's just a list of (p, q) pairs where data exists (the sparse entries of a matrix), and those various data entries are found because all related value vectors are implicitly ordered the same way as the sparse list, which happens to be sorted to make binary searching possible. I believe a hashmap with key = (p << 32) + q and value being the index (into all the value vectors) should be basically a drop-in replacement. What do you think?

@elalish
Copy link
Owner

elalish commented Aug 22, 2023

And for unique, it might be simpler to just iterate through the hashmap and add the p part of the key to a new hashmap that will act as a Set.

@pca006132
Copy link
Collaborator Author

Indeed, that should work. I was thinking about using pair as the key type but it is not really needed.

@pca006132
Copy link
Collaborator Author

I ran ./manifold_test --gtest_filter=Boolean.Sweep at least 20 times but never saw a failure. This was in TBB mode, with MANIFOLD_DEBUG enabled? Still, I agree that some more bounds checks are a good idea, though I'd really love to get a consistent repro and figure out where my logic is flawed.

I am seeing the test fail hard with address sanitizer enabled, but at unique. That seems like a pretty serious issue, if for no other reason than it makes address sanitizer unusable for debug. As far as I can tell, unique is only called in one line of code; I think it shouldn't be too hard to remove, though I still think the right way is to switch SparseIndicies to a hash table.

In case this is helpful:

Note: Google Test filter = Boolean.Sweep
[==========] Running 1 test from 1 test suite.
[----------] 1 test from Boolean
[ RUN      ] Boolean.Sweep
UpdateVert -1
i = -1, n = 58938

Thread 1 "manifold_test" received signal SIGILL, Illegal instruction.
0x00005555556b5baa in manifold::Manifold::Impl::UpdateVert (this=this@entry=0x7fffffff7b28, vert=<optimized out>, startEdge=startEdge@entry=7, endEdge=<optimized out>) at /home/pca006132/code/manifold/src/utilities/include/vec.h:72
72	     printf("i = %d, n = %d\n", i, size_);
(gdb) bt
#0  0x00005555556b5baa in manifold::Manifold::Impl::UpdateVert (this=this@entry=0x7fffffff7b28, vert=<optimized out>, startEdge=startEdge@entry=7, endEdge=<optimized out>) at /home/pca006132/code/manifold/src/utilities/include/vec.h:72
#1  0x00005555556b5df2 in manifold::Manifold::Impl::FormLoop (this=0x7fffffff7b28, current=1, end=28805) at /home/pca006132/code/manifold/src/manifold/src/edge_op.cpp:340
#2  0x00005555556b4e45 in manifold::Manifold::Impl::CollapseEdge (this=this@entry=0x7fffffff7b28, edge=edge@entry=1, edges=...) at /home/pca006132/code/manifold/src/manifold/src/edge_op.cpp:484
#3  0x00005555556b2db6 in manifold::Manifold::Impl::SimplifyTopology (this=0x7fffffff7b28) at /home/pca006132/code/manifold/src/manifold/src/edge_op.cpp:185
#4  0x0000555555753dfa in manifold::Boolean3::Result (this=0x7fffffff7c58, op=manifold::OpType::Add) at /home/pca006132/code/manifold/src/manifold/src/boolean_result.cpp:772
#5  0x0000555555694cd7 in manifold::CsgOpNode::BatchBoolean(manifold::OpType, std::vector<std::shared_ptr<manifold::Manifold::Impl const>, std::allocator<std::shared_ptr<manifold::Manifold::Impl const> > >&)::$_2::operator()() const::{lambda()#1}::operator()() const (this=0x5555561161c0) at /home/pca006132/code/manifold/src/manifold/src/csg_tree.cpp:481
#6  tbb::detail::d2::(anonymous namespace)::task_ptr_or_nullptr<manifold::CsgOpNode::BatchBoolean(manifold::OpType, std::vector<std::shared_ptr<manifold::Manifold::Impl const>, std::allocator<std::shared_ptr<manifold::Manifold::Impl const> > >&)::$_2::operator()() const::{lambda()#1} const&>(manifold::CsgOpNode::BatchBoolean(manifold::OpType, std::vector<std::shared_ptr<manifold::Manifold::Impl const>, std::allocator<std::shared_ptr<manifold::Manifold::Impl const> > >&)::$_2::operator()() const::{lambda()#1} const&) (f=...) at _deps/tbb-src/src/tbb/../../include/oneapi/tbb/detail/../task_group.h:132
#7  tbb::detail::d1::function_task<manifold::CsgOpNode::BatchBoolean(manifold::OpType, std::vector<std::shared_ptr<manifold::Manifold::Impl const>, std::allocator<std::shared_ptr<manifold::Manifold::Impl const> > >&)::$_2::operator()() const::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) (this=0x555556116180, ed=...) at _deps/tbb-src/src/tbb/../../include/oneapi/tbb/detail/../task_group.h:452
#8  0x00005555557b3fc0 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (this=this@entry=0x5555559a1300, t=0x555556116180, t@entry=0x7fffffff80c0, waiter=...)
    at _deps/tbb-src/src/tbb/../../include/oneapi/tbb/task_group.h:383
#9  0x00005555557b2696 in tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (this=0x5555559a1300, t=0x7fffffff80c0, waiter=...) at _deps/tbb-src/src/tbb/task_dispatcher.h:458
#10 tbb::detail::r1::task_dispatcher::execute_and_wait (t=0x7fffffff80c0, wait_ctx=..., w_ctx=...) at /home/pca006132/code/manifold/build/_deps/tbb-src/src/tbb/task_dispatcher.cpp:168
#11 0x00005555556af62c in tbb::detail::d1::execute_and_wait (t=..., t_ctx=..., wait_ctx=..., w_ctx=...) at _deps/tbb-src/src/tbb/../../include/oneapi/tbb/detail/_task.h:191
#12 tbb::detail::d1::task_group_base::internal_run_and_wait<std::function<void ()> >(std::function<void ()> const&)::{lambda()#1}::operator()() const (this=<optimized out>) at _deps/tbb-src/src/tbb/../../include/oneapi/tbb/detail/../task_group.h:504
#13 tbb::detail::d0::try_call_proxy<tbb::detail::d1::task_group_base::internal_run_and_wait<std::function<void ()> >(std::function<void ()> const&)::{lambda()#1}>::on_completion<tbb::detail::d1::task_group_base::internal_run_and_wait<std::function<void ()> >(std::function<void ()> const&)::{lambda()#2}>(tbb::detail::d1::task_group_base::internal_run_and_wait<std::function<void ()> >(std::function<void ()> const&)::{lambda()#2}) (this=this@entry=0x7fffffff7ec0, on_completion_body=...)
    at _deps/tbb-src/src/tbb/../../include/oneapi/tbb/detail/_template_helpers.h:230
#14 0x000055555568b802 in tbb::detail::d1::task_group_base::internal_run_and_wait<std::function<void ()> >(std::function<void ()> const&) (this=0x7fffffff8020, f=...) at _deps/tbb-src/src/tbb/../../include/oneapi/tbb/detail/../task_group.h:505
#15 tbb::detail::d1::task_group::run_and_wait<std::function<void ()> >(std::function<void ()> const&) (this=0x7fffffff8020, f=...) at _deps/tbb-src/src/tbb/../../include/oneapi/tbb/detail/../task_group.h:623
#16 manifold::CsgOpNode::BatchBoolean (operation=operation@entry=manifold::OpType::Add, results=...) at /home/pca006132/code/manifold/src/manifold/src/csg_tree.cpp:486
#17 0x000055555568a8b8 in manifold::CsgOpNode::BatchUnion (this=this@entry=0x555555ab58f0) at /home/pca006132/code/manifold/src/manifold/src/csg_tree.cpp:582
#18 0x0000555555688004 in manifold::CsgOpNode::ToLeafNode (this=0x555555ab58f0) at /home/pca006132/code/manifold/src/manifold/src/csg_tree.cpp:391
#19 0x00005555556ebc95 in manifold::Manifold::GetCsgLeafNode (this=0x7fffffff8820) at /home/pca006132/code/manifold/src/manifold/src/manifold.cpp:110
#20 0x00005555556efabb in manifold::Manifold::GetProperties (this=0x7fffffff7010) at /home/pca006132/code/manifold/src/manifold/src/manifold.cpp:419
#21 0x00005555556598ec in Boolean_Sweep_Test::TestBody (this=<optimized out>) at /home/pca006132/code/manifold/test/boolean_test.cpp:901
#22 0x00007ffff7fa7cad in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) () from /nix/store/gxkiqnfy25f4pplmy5hjimzi5zifsiyx-gtest-1.12.1/lib/libgtest.so.1.12.1
#23 0x00007ffff7f9647e in testing::Test::Run() () from /nix/store/gxkiqnfy25f4pplmy5hjimzi5zifsiyx-gtest-1.12.1/lib/libgtest.so.1.12.1
#24 0x00007ffff7f96635 in testing::TestInfo::Run() () from /nix/store/gxkiqnfy25f4pplmy5hjimzi5zifsiyx-gtest-1.12.1/lib/libgtest.so.1.12.1
#25 0x00007ffff7f96755 in testing::TestSuite::Run() () from /nix/store/gxkiqnfy25f4pplmy5hjimzi5zifsiyx-gtest-1.12.1/lib/libgtest.so.1.12.1
#26 0x00007ffff7f9d58f in testing::internal::UnitTestImpl::RunAllTests() () from /nix/store/gxkiqnfy25f4pplmy5hjimzi5zifsiyx-gtest-1.12.1/lib/libgtest.so.1.12.1
#27 0x00007ffff7f9691b in testing::UnitTest::Run() () from /nix/store/gxkiqnfy25f4pplmy5hjimzi5zifsiyx-gtest-1.12.1/lib/libgtest.so.1.12.1
#28 0x00005555556632d1 in RUN_ALL_TESTS () at /nix/store/fzpckgzg6ra074f6mxg770hykm8gc7da-gtest-1.12.1-dev/include/gtest/gtest.h:2293
#29 main (argc=<optimized out>, argv=0x7fffffff8d98) at /home/pca006132/code/manifold/test/test_main.cpp:83
(gdb) f 1
#1  0x00005555556b5df2 in manifold::Manifold::Impl::FormLoop (this=0x7fffffff7b28, current=1, end=28805) at /home/pca006132/code/manifold/src/manifold/src/edge_op.cpp:340
340	 UpdateVert(startVert, oldMatch, newMatch);
(gdb) list
335	 int newMatch = halfedge_[end].pairedHalfedge;
336	
337	 if (oldMatch == -1) printf("oldMatch == -1\n");
338	 if (newMatch == -1) printf("newMatch == -1\n");
339	
340	 UpdateVert(startVert, oldMatch, newMatch);
341	 UpdateVert(endVert, end, current);
342	
343	 halfedge_[current].pairedHalfedge = newMatch;
344	 halfedge_[newMatch].pairedHalfedge = current;
(gdb) p end
$1 = 28805
(gdb) p endVert
$2 = <optimized out>
(gdb) p current
$3 = 1

@pca006132
Copy link
Collaborator Author

pca006132 commented Aug 23, 2023

Oh it seems that this is triggered by triangulation failure, fixing the triangulator should hopefully fix this...

(reverting 8011fb6 fixes the issue)

@pca006132 pca006132 mentioned this issue Aug 23, 2023
@pca006132
Copy link
Collaborator Author

I sometimes got this as well:

Error in file: /home/pca006132/code/manifold/src/polygon/src/polygon.cpp (103): 'std::distance(forward.begin(), end) == n_edges' is false: Half of halfedges should be forward.
polys.push_back({
    {20.1267338, 2.00670671},  //
    {20.12673, 2.00670576},  //
    {19.7702141, 1.63188827},  //
    {19.7702141, 1.63188827},  //
    {19.7702141, 1.63188827},  //
    {19.5752983, 1.81728792},  //
    {19.5623493, 1.81283963},  //
    {19.7630997, 1.54846776},  //
    {19.7630997, 1.54846776},  //
});
polys.push_back({
    {19.7702141, 1.63188827},  //
    {19.898922, 1.92845321},  //
    {19.5752983, 1.81728792},  //
});
polys.push_back({
    {19.7630997, 1.54846776},  //
    {19.7630997, 1.54846776},  //
    {19.7630997, 1.54846776},  //
});
array([
  [20.1267338, 2.00670671],
  [20.12673, 2.00670576],
  [19.7702141, 1.63188827],
  [19.7702141, 1.63188827],
  [19.7702141, 1.63188827],
  [19.5752983, 1.81728792],
  [19.5623493, 1.81283963],
  [19.7630997, 1.54846776],
  [19.7630997, 1.54846776],
])
array([
  [19.7702141, 1.63188827],
  [19.898922, 1.92845321],
  [19.5752983, 1.81728792],
])
array([
  [19.7630997, 1.54846776],
  [19.7630997, 1.54846776],
  [19.7630997, 1.54846776],
])

@elalish
Copy link
Owner

elalish commented Aug 23, 2023

Thanks, that looks actionable! I'll take a look.

@pca006132
Copy link
Collaborator Author

I am now pretty sure that:

  1. The -1 index in UpdateVert is caused by the above topology error. I never encounter that issue when MANIFOLD_DEBUG is on.
  2. The topology error 'halfedge_.size() % 6 == 0' is false: Not an even number of faces after sorting faces! seems to be another thing caused by the decimator problem, but this is very rare. It seems that the bug is fixed by using stable sort but I have no idea why. I just verified that this works by adding the following trap (immediate stops execution and call gdb) and run gdb --args ./test/manifold_test --gtest_filter=Boolean.Sweep --gtest_repeat=10000...
diff --git a/src/manifold/src/edge_op.cpp b/src/manifold/src/edge_op.cpp
index 4e3db36..9f69dff 100644
--- a/src/manifold/src/edge_op.cpp
+++ b/src/manifold/src/edge_op.cpp
@@ -158,7 +158,7 @@ void Manifold::Impl::SimplifyTopology() {
     entries[i].index = i;
   });
 
-  sort(policy, entries.begin(), entries.end());
+  stable_sort(policy, entries.begin(), entries.end());
   for (int i = 0; i < nbEdges - 1; ++i) {
     if (entries[i].start == entries[i + 1].start &&
         entries[i].end == entries[i + 1].end) {

diff --git a/src/manifold/src/sort.cpp b/src/manifold/src/sort.cpp
index 13deab4..007b305 100644
--- a/src/manifold/src/sort.cpp
+++ b/src/manifold/src/sort.cpp
@@ -261,6 +261,7 @@ void Manifold::Impl::Finish() {
   if (halfedge_.size() == 0) return;
   CompactProps();
 
+  if (halfedge_.size() % 6 != 0) __builtin_trap();
   ASSERT(halfedge_.size() % 6 == 0, topologyErr,
          "Not an even number of faces after sorting faces!");
  1. CollapseTri and RemoveIfFolded can also encounter -1 index, and the following patch seems to fix them (not sure if the output is still correct)
diff --git a/src/manifold/src/edge_op.cpp b/src/manifold/src/edge_op.cpp
index 4e3db36..9161fe2 100644
--- a/src/manifold/src/edge_op.cpp
+++ b/src/manifold/src/edge_op.cpp
@@ -343,6 +351,8 @@ void Manifold::Impl::FormLoop(int current, int end) {
 }
 
 void Manifold::Impl::CollapseTri(const glm::ivec3& triEdge) {
+  if (halfedge_[triEdge[1]].pairedHalfedge == -1)
+    return;
   int pair1 = halfedge_[triEdge[1]].pairedHalfedge;
   int pair2 = halfedge_[triEdge[2]].pairedHalfedge;
   halfedge_[pair1].pairedHalfedge = pair2;
@@ -355,6 +365,8 @@ void Manifold::Impl::CollapseTri(const glm::ivec3& triEdge) {
 void Manifold::Impl::RemoveIfFolded(int edge) {
   const glm::ivec3 tri0edge = TriOf(edge);
   const glm::ivec3 tri1edge = TriOf(halfedge_[edge].pairedHalfedge);
+  if (halfedge_[tri0edge[1]].pairedHalfedge == -1)
+    return;
   if (halfedge_[tri0edge[1]].endVert == halfedge_[tri1edge[1]].endVert) {
     if (halfedge_[tri0edge[1]].pairedHalfedge == tri1edge[2]) {
       if (halfedge_[tri0edge[2]].pairedHalfedge == tri1edge[1]) {

@pca006132
Copy link
Collaborator Author

pca006132 commented Aug 23, 2023

Alternatively we can use sort, but only dedupe the smaller indices:

diff --git a/src/manifold/src/edge_op.cpp b/src/manifold/src/edge_op.cpp
index 4e3db36..cae7f74 100644
--- a/src/manifold/src/edge_op.cpp
+++ b/src/manifold/src/edge_op.cpp
@@ -162,7 +162,8 @@ void Manifold::Impl::SimplifyTopology() {
   for (int i = 0; i < nbEdges - 1; ++i) {
     if (entries[i].start == entries[i + 1].start &&
         entries[i].end == entries[i + 1].end) {
-      DedupeEdge(entries[i].index);
+      DedupeEdge(std::min(entries[i].index, entries[i + 1].index));
+      entries[i + 1].index = std::max(entries[i].index, entries[i + 1].index);
       numFlagged++;
     }
   }

And I can confirm that 1, 2 and 3 are all independent from each other. Fixing one will not fix another.

I feel like I am starting to use my brain now.

@elalish
Copy link
Owner

elalish commented Aug 23, 2023

I like your solution - let's get that into a PR, close this confusing thread and start some new issues for the remaining problems independently.

@elalish
Copy link
Owner

elalish commented Aug 23, 2023

And extra points if you can manage to add a test first that allows our CI to catch this problem(s) and verify a fix.

@pca006132
Copy link
Collaborator Author

And extra points if you can manage to add a test first that allows our CI to catch this problem(s) and verify a fix.

For 2, one way would be to sort with descending ID in order to verify that ID ordering is indeed important. But I don't think it would be very helpful to put this to the CI.

For 3, I don't think it is possible to reproduce in the CI (reliably and easily). The only way I can think of is to record a trace, but this can only let us reproduce a specific instance, and cannot prove that our fix indeed fixes the problem.

@elalish
Copy link
Owner

elalish commented Aug 23, 2023

Yeah, I was afraid of that, but figured I'd ask 😄

This was referenced Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants