Poly2TriTriangulator refinement fixes #3572
Conversation

The compiler warnings are easy to fix, but it looks like we have at least three regoldings coming in MOOSE.

I'm pulling this branch to work on some of my test cases; I'll report the results once I have them.

@roystgnr With reactor compiled in devel mode against this branch, I was able to mesh my gHPMR_2d_core_v33.i without any assertions or errors.

Unfortunately my MHTGR mesh seems to error out, but it does that with the current version of moose/libmesh too, so I need to troubleshoot that.

Ok, I was able to run my MHTGR mesh with this branch also. The error was being caused by a boundary I defined getting renamed during stitching while I was still trying to use the old name. Works fine now.

@roystgnr I decided to run gHPMR_2d_core_v33.i in parallel and it just hangs, or is just taking a lot longer: in serial it runs in 15 seconds, but with four processors I've been waiting for several minutes just getting …

In serial, even in devel mode, it runs in 9 seconds for me; on 4 procs I get segfaults or assertion failures (boy, I love parallel race conditions!) after more like 40 seconds. Looking into it now.

2 procs is enough to trigger the failure too; hooray for small blessings.

Whoa. The failure is coming from …

One failure was coming from …

Okay; the underlying failure is in Poly2TriTriangulator. I've found one obvious bug where it can get out of sync in parallel; still hunting for others. Ironically, @lindsayad wrote three hours ago that "There may be some algorithms that require iterating in sync across processes for which you do have to have a deterministic ordering and I think we have those in the places we need them", but what he forgot is that I'm always writing new bugs to replace the old ones!

Hahaha, how timely. We have so many development intersections and cross-references these days. This from @olinwc included:
|
@olinwc - give it a try now?

Since this is my personal GitHub account I don't get notified on my work email... Trying this now!

So, good news. 4 cores: … Serial: … Will try MHTGR next.

MHTGR Serial: … 4 cores: … No more hanging!

Poly2TriTriangulator isn't parallelized, so with more cores we don't replace "do X" with "do a fourth of X"; we replace it with "do X, and then also do a bunch of communication to try to check that everybody did X the same way". That's pretty awful anti-scaling, though. The "a little longer the more cores that have to chat" behavior from 2 to 16 cores is about what I'd expect from syncing on a replicated mesh, but that initial jump from 1 rank to 2 is crazy. Still, I'm not seeing any correctness issues, and performance fixes can wait. I'll disable those tests for now and we can regold them with the libMesh submodule update this week.
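For what that "check that everybody did X the same way" step can look like, here's a minimal standalone sketch (my own illustration, not libMesh code): reduce the replicated data on each rank to one canonical hash, then compare hashes instead of shipping whole meshes around. Sorting first makes the hash independent of construction order; in real MPI code the comparison would be an allgather over the per-rank hashes.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: a canonical, order-independent hash of a set
// of 2D points, usable to verify cross-rank consistency cheaply.
struct Pt2 { double x, y; };

std::uint64_t canonical_hash(std::vector<Pt2> pts)
{
  // Canonicalize: sort so that insertion order doesn't matter.
  std::sort(pts.begin(), pts.end(),
            [](const Pt2 & a, const Pt2 & b)
            { return a.x != b.x ? a.x < b.x : a.y < b.y; });

  // FNV-1a over the sorted coordinate bytes.
  std::uint64_t h = 14695981039346656037ull;
  auto mix = [&h](double v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    for (unsigned char byte : bytes)
    {
      h ^= byte;
      h *= 1099511628211ull;
    }
  };
  for (const Pt2 & p : pts) { mix(p.x); mix(p.y); }
  return h;
}
```

Two ranks that built the same points in different orders get the same hash; any divergence shows up as a hash mismatch.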
These aren't compatible with the fixed + improved exodiff behavior in libMesh/libmesh#3572 - we'll need to regold them this week when we do the next libMesh submodule update. Refs idaholab#20192
I say, right before I dive into the code to look for fixes. With a swath of low-hanging micro-optimizations, I managed to make things faster by ... a fraction of a second? So that's not the way to go. But during the process I couldn't help but notice that gHPMR_2d_core_v33.i is triggering FOUR HUNDRED AND NINETEEN repartitionings, which for a mesh with no extrusions and no refinements is probably approximately four hundred eighteen too many. We've long had vague plans to make the …

That is truly brutal.

@roystgnr That parallel behavior was all running with … I'm also on a 2013 Mac Pro running locally, so people with more up-to-date local machines, like the new M1 Macs, will be a lot faster per core.
It's useful to be able to test this without comparing pre- and post-orient().
I'm seeing one somehow, and I'd love to find it *before* I try marching a ray through it.
This isn't a *sufficient* fix for some of the triangulation refinement issues I've seen, but it's a *necessary* fix for sufficiently fine refinement or sufficiently complex geometries.
Anything with a while loop probably ought to be profiled...
This allows us to give in_circumcircle a (signed!) tolerance
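For reference, a self-contained toy version (mine, not the actual libMesh method) of what a signed-tolerance in-circumcircle test can look like: the classic determinant predicate for a counterclockwise triangle, with the usual zero threshold replaced by a signed tolerance, so callers can be strict (tol > 0, reject points barely inside) or lenient (tol < 0, also accept points slightly outside).

```cpp
// Toy sketch, not libMesh's implementation: determinant form of the
// in-circumcircle predicate for a *counterclockwise* triangle (a,b,c).
// det > 0 means p lies strictly inside the circumcircle; comparing
// against a signed tolerance instead of zero shifts that threshold.
struct Pt { double x, y; };

bool in_circumcircle(Pt a, Pt b, Pt c, Pt p, double tol)
{
  const double ax = a.x - p.x, ay = a.y - p.y;
  const double bx = b.x - p.x, by = b.y - p.y;
  const double cx = c.x - p.x, cy = c.y - p.y;
  const double det =
      (ax*ax + ay*ay) * (bx*cy - by*cx)
    - (bx*bx + by*by) * (ax*cy - ay*cx)
    + (cx*cx + cy*cy) * (ax*by - ay*bx);
  return det > tol;
}
```

A point exactly on the circle gives det = 0, so it's "inside" only when the tolerance is negative.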
Here it's more robust to retriangulate a wider cavity
We check later, too, but catching them earlier is easier to debug.
Our refinement algorithms assume that we only have to futz with one cavity at a time, because the rest of the triangulation was already Delaunay. This was not true for everything Poly2Tri was returning to us, so let's fix up their triangulations manually.
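As a concrete statement of the property being assumed (a purely illustrative brute-force check; the names and approach are mine, not libMesh's): a 2D triangulation is Delaunay exactly when no vertex lies strictly inside any triangle's circumcircle, which can be verified directly.

```cpp
#include <array>
#include <vector>

// Illustrative sketch: brute-force Delaunay verification.
struct P { double x, y; };

// Positive iff q is strictly inside the circumcircle of the
// *counterclockwise* triangle (a,b,c).
static double incircle(P a, P b, P c, P q)
{
  const double ax = a.x - q.x, ay = a.y - q.y;
  const double bx = b.x - q.x, by = b.y - q.y;
  const double cx = c.x - q.x, cy = c.y - q.y;
  return (ax*ax + ay*ay) * (bx*cy - by*cx)
       - (bx*bx + by*by) * (ax*cy - ay*cx)
       + (cx*cx + cy*cy) * (ax*by - ay*bx);
}

// True iff no vertex lies strictly inside any triangle's circumcircle.
// Triangles are index triples into pts, assumed counterclockwise.
bool is_delaunay(const std::vector<P> & pts,
                 const std::vector<std::array<int,3>> & tris)
{
  for (const auto & t : tris)
    for (int v = 0; v < (int)pts.size(); ++v)
    {
      if (v == t[0] || v == t[1] || v == t[2])
        continue;
      if (incircle(pts[t[0]], pts[t[1]], pts[t[2]], pts[v]) > 0)
        return false;
    }
  return true;
}
```

A check like this is the sort of thing that flags a triangulator handing back non-Delaunay output before refinement trips over it.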
I wasn't actually able to trigger any of these, but they're an obvious risk in cases with weird boundaries/holes.
This lets us do parallel verification of e.g. Point
These are useful when mesh generation gets out of sync.
I'm playing around with different container options for fixing parallel consistency.
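One reason container choice matters for parallel consistency (a generic C++ illustration, nothing libMesh-specific): std::map iterates in key order regardless of insertion order, so ranks that built the same contents in different orders still loop over them identically, whereas std::unordered_map makes no ordering promise at all.

```cpp
#include <map>
#include <string>
#include <vector>

// Generic illustration: std::map's iteration order depends only on
// its contents (sorted by key), never on insertion history. An
// unordered_map offers no such guarantee, which is exactly how
// per-rank iteration can silently diverge.
std::vector<int> iteration_order(const std::map<int, std::string> & m)
{
  std::vector<int> keys;
  for (const auto & kv : m)
    keys.push_back(kv.first);
  return keys;
}
```

Two maps with identical contents always report the same key sequence, no matter how they were populated.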

Rebased on …
That timeout on the …
This fixes some failure cases where triangulation refinement runs away creating sliver elements ... and it fixes more failure cases where it turns out we can't trust poly2tri to give us a true Delaunay triangulation.
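For readers unfamiliar with the term: a sliver is a triangle whose area is tiny relative to its edge lengths. One common way to quantify that (a hypothetical sketch, not the criterion this PR actually uses) is a normalized shape quality that equals 1 for an equilateral triangle and approaches 0 for a sliver; a refinement loop can then refuse insertions that would create triangles below some quality floor.

```cpp
#include <cmath>

// Hypothetical quality metric: q = 4*sqrt(3)*area / (sum of squared
// edge lengths). Equilateral triangles score 1; slivers score near 0;
// degenerate (colinear) triangles score exactly 0.
struct V { double x, y; };

double triangle_quality(V a, V b, V c)
{
  const double abx = b.x - a.x, aby = b.y - a.y;
  const double acx = c.x - a.x, acy = c.y - a.y;
  const double bcx = c.x - b.x, bcy = c.y - b.y;
  const double area = 0.5 * std::fabs(abx * acy - aby * acx);
  const double edge2 = abx*abx + aby*aby + acx*acx + acy*acy
                     + bcx*bcx + bcy*bcy;
  return edge2 > 0.0 ? 4.0 * std::sqrt(3.0) * area / edge2 : 0.0;
}
```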
The more careful behavior at boundaries will probably force us to regold tests in MOOSE. Here's hoping it doesn't force too much of that further downstream.
This fixes idaholab/moose#24474 for me.