OpenMP issue with complex NURBS #656
SolveSpace version: 3.0~188b2e2
Operating system: Fedora 31
CPU AMD 2400G - 4 core 8 thread.
Software should work when compiled with OpenMP
The attached model seems to work fine until the depth of the hole is dragged. When dragging, the top surface (which is 2 patches) can get distorted. Not just naked edges, but the geometry of the surface changes as seen in the picture:
This model was created to demonstrate NURBS boolean issues. Failures normally occur when the hole intersects the seam between the two top surface patches - that's an expected failure. The issue here is that the 2 surface patches change shape when the hole depth is changed. As soon as the hole position is moved even a little the surfaces revert to normal. It is also strange that both surfaces can change shape even if only one of them contains the hole. This distortion only happens when using OpenMP.
Reverting to commit 9802b5d which is prior to the mimalloc changes did not help - The problem exists there as well (possibly worse but that's terribly subjective).
Compiling single-threaded "fixes" the issue.
The text was updated successfully, but these errors were encountered:
Confirmed that this is due to the use of OMP in Shell::CopySurfacesTrimAgainst() Commented out 2 #pragma in my issue656 branch.
I'll create a PR after I let this sit. I don't want to look any further, but things like this sometimes pull me in. This is some of the most complex code in SolveSpace and it's not easy to see where or why this is a problem. Fixing it to allow OMP might be simple or complicated.
This is one reason I want to write a new NURBS kernel in Rust.
Some updates and thoughts for anyone following along, and to write it down for myself.
The buggy model consists of 4 groups.
I'm assuming that group 3 creates a copy of the shell from group 2. Then the boolean is done with that copy of the shell and the cylindrical shell from group 4. The top surfaces from the group 3 shell are corrupted during the boolean operation and the damage continues as subsequent booleans are done when group 4 is changed (the length of the extrusion). Then if I move the hole by dragging the circle in group 3, a fresh copy of the group 2 shell is created for group 3 which restores the correct shape prior to more damage accumulating.
So what is corrupting the data defining the surfaces of that shell?
The boolean process looks like this - in function MakeFromBoolean():
Each surface has a handle. When a copy is made, the original surface contains a member "newH" which is the handle the copy will recieve. This handle is created when the new surface is added to the new shell. The call to add the surface to the new shell is inside an OMP critical section because updating the list of surfaces is not thread safe. As far as I can tell, that and creating the BSP trees are the only operations that change a surface object from the original shells.
Trim curves each have 2 handles because they are shared between 2 surfaces. The new (possibly split) curves in the new shell are copies of the originals so they have handles for surfaces in the old shells. Step 7 simply looks up these handles to point at the new surfaces created in step 6 (which is why the newH had to be updated there).
SolveSpace is written in so that existing data structures are rarely modified. The handle updates and BSP creation really are the only places I see this happening. I checked over this prior to introducing OMP and now have gone over it again with nothing found.
Other thoughts: 2 and 3 can probably do curves in parallel. Step 5 seems redundant - the new shell should contain all the curves that are needed and none that are not.
Update: I've narrowed this down to the following call:
Placing a critical around the first instance of that inside MakeCopyTrimAgainst() seems to fix it. There are two calls though and if one isn't thread-safe then the other isn't. Not sure what isn't safe inside that function yet. Leaving this mostly as a reminder for myself as I have to stop for now.
SShell::ClassifyEdge uses random() which is possibly not thread safe - I hear it can cause cache related slowdowns when used from multiple threads. More later.
The problem is in raycast.cpp function SSurface::SplitInHalf() which modifies the weights of the control points of a surface by multiplying by their weights. Then it puts them back when it's done by dividing by the weights. This isn't really consistent with the way SolveSpace is written so I'll fix it rather than #pragma around it.