Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenMP issue with complex NURBS #656

Closed
phkahler opened this issue Jul 21, 2020 · 4 comments · Fixed by #665
Closed

OpenMP issue with complex NURBS #656

phkahler opened this issue Jul 21, 2020 · 4 comments · Fixed by #665

Comments

@phkahler
Copy link
Member

phkahler commented Jul 21, 2020

System information

SolveSpace version: 3.0~188b2e2

Operating system: Fedora 31

CPU AMD 2400G - 4 core 8 thread.

Expected behavior

Software should work when compiled with OpenMP

Actual behavior

The attached model seems to work fine until the depth of the hole is dragged. When dragging, the top surface (which is 2 patches) can get distorted. Not just naked edges, but the geometry of the surface changes as seen in the picture:

parallel_bug

Revolve120.zip

Additional information

This model was created to demonstrate NURBS boolean issues. Failures normally occur when the hole intersects the seam between the two top surface patches - that's an expected failure. The issue here is that the 2 surface patches change shape when the hole depth is changed. As soon as the hole position is moved even a little the surfaces revert to normal. It is also strange that both surfaces can change shape even if only one of them contains the hole. This distortion only happens when using OpenMP.

Reverting to commit 9802b5d which is prior to the mimalloc changes did not help - The problem exists there as well (possibly worse but that's terribly subjective).

Compiling single-threaded "fixes" the issue.

phkahler added a commit to phkahler/solvespace that referenced this issue Jul 21, 2020
@phkahler
Copy link
Member Author

phkahler commented Jul 21, 2020

Confirmed that this is due to the use of OMP in Shell::CopySurfacesTrimAgainst() Commented out 2 #pragma in my issue656 branch.

I'll create a PR after I let this sit. I don't want to look any further, but things like this sometimes pull me in. This is some of the most complex code in SolveSpace and it's not easy to see where or why this is a problem. Fixing it to allow OMP might be simple or complicated.

This is one reason I want to write a new NURBS kernel in Rust.

@phkahler
Copy link
Member Author

phkahler commented Jul 22, 2020

Some updates and thoughts for anyone following along, and to write it down for myself.

The buggy model consists of 4 groups.

  1. sketch
  2. revolve
  3. sketch
  4. extrude - as difference

I'm assuming that group 3 creates a copy of the shell from group 2. Then the boolean is done with that copy of the shell and the cylindrical shell from group 4. The top surfaces from the group 3 shell are corrupted during the boolean operation and the damage continues as subsequent booleans are done when group 4 is changed (the length of the extrusion). Then if I move the hole by dragging the circle in group 3, a fresh copy of the group 2 shell is created for group 3 which restores the correct shape prior to more damage accumulating.

So what is corrupting the data defining the surfaces of that shell?

The boolean process looks like this - in function MakeFromBoolean():

  1. make a BSP tree for every surface in both shells (this part is done in parallel - not the problem)
  2. Make a copy of all the curves in shell A split against shell B
  3. Make a copy of all curves in B split against shell A
  4. Find all curves formed by the intersection of surfaces from A and B
  5. Make new BSPs for surfaces in A and B (also done in parallel - not the problem)
  6. Copy surfaces from A and B into the new shell - CopySurfacesTrimAgainst() (run in parallel causes a problem)
  7. rewrite handles for curves in the new shell
    Steps 2,3 and 4 are still single threaded (may be room for more OMP)

Each surface has a handle. When a copy is made, the original surface contains a member "newH" which is the handle the copy will recieve. This handle is created when the new surface is added to the new shell. The call to add the surface to the new shell is inside an OMP critical section because updating the list of surfaces is not thread safe. As far as I can tell, that and creating the BSP trees are the only operations that change a surface object from the original shells.

Trim curves each have 2 handles because they are shared between 2 surfaces. The new (possibly split) curves in the new shell are copies of the originals so they have handles for surfaces in the old shells. Step 7 simply looks up these handles to point at the new surfaces created in step 6 (which is why the newH had to be updated there).

SolveSpace is written in so that existing data structures are rarely modified. The handle updates and BSP creation really are the only places I see this happening. I checked over this prior to introducing OMP and now have gone over it again with nothing found.

Other thoughts: 2 and 3 can probably do curves in parallel. Step 5 seems redundant - the new shell should contain all the curves that are needed and none that are not.

@phkahler
Copy link
Member Author

phkahler commented Jul 31, 2020

Update: I've narrowed this down to the following call:

        agnst->ClassifyEdge(&indir_shell, &outdir_shell,
                            ret.PointAt(auv), ret.PointAt(buv), pt,
                            enin, enout, surfn);

Placing a critical around the first instance of that inside MakeCopyTrimAgainst() seems to fix it. There are two calls though and if one isn't thread-safe then the other isn't. Not sure what isn't safe inside that function yet. Leaving this mostly as a reminder for myself as I have to stop for now.

SShell::ClassifyEdge uses random() which is possibly not thread safe - I hear it can cause cache related slowdowns when used from multiple threads. More later.

@phkahler
Copy link
Member Author

phkahler commented Jul 31, 2020

The problem is in raycast.cpp function SSurface::SplitInHalf() which modifies the weights of the control points of a surface by multiplying by their weights. Then it puts them back when it's done by dividing by the weights. This isn't really consistent with the way SolveSpace is written so I'll fix it rather than #pragma around it.

phkahler added a commit to phkahler/solvespace that referenced this issue Jul 31, 2020
phkahler added a commit to phkahler/solvespace that referenced this issue Aug 1, 2020
phkahler added a commit to phkahler/solvespace that referenced this issue Aug 1, 2020
SSurface::SplitInHalf was modifying the surface and then restoring it at the end. Make temporary copy instead.
@phkahler phkahler linked a pull request Aug 1, 2020 that will close this issue
phkahler added a commit that referenced this issue Aug 1, 2020
SSurface::SplitInHalf was modifying the surface and then restoring it at the end. Make temporary copy instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant