
Poly2TriTriangulator refinement fixes#3572

Merged
roystgnr merged 31 commits into libMesh:devel from roystgnr:poly2tri_work
Jun 14, 2023

Conversation

@roystgnr
Member

This fixes some failure cases where triangulation refinement runs away creating sliver elements ... and it fixes more failure cases where it turns out we can't trust poly2tri to give us a true Delaunay triangulation.

The more careful behavior at boundaries will probably force us to regold tests in MOOSE. Here's hoping it doesn't force too much of that further downstream.

This fixes idaholab/moose#24474 for me.

@roystgnr changed the title from Poly2tri work to Poly2TriTriangulator refinement fixes on May 31, 2023
@roystgnr
Member Author

The compiler warnings are easy to fix, but looks like we have at least three regoldings coming in MOOSE.

@olinwc

olinwc commented Jun 7, 2023

I'm pulling this branch to work on some of my test cases; I'll report the results once I have them.

@olinwc

olinwc commented Jun 7, 2023

@roystgnr With devel-mode reactor compiled against this branch, I was able to mesh my gHPMR_2d_core_v33.i without any assertions or errors.

@olinwc

olinwc commented Jun 7, 2023

Unfortunately my MHTGR mesh seems to error out, but it does that with the current version of moose/libmesh, so I need to troubleshoot that.

@olinwc

olinwc commented Jun 7, 2023

Ok I was able to run my MHTGR mesh with this branch also. Error was being caused by a boundary I defined being renamed during stitching and me trying to use the old name. Works fine now.

@olinwc

olinwc commented Jun 8, 2023

@roystgnr I decided to run gHPMR_2d_core_v33.i in parallel, and it either hangs or just takes a lot longer. In serial it runs in 15 seconds; with four processors I've been waiting for several minutes and have gotten no further than

Setting Up....................

@roystgnr
Member Author

roystgnr commented Jun 8, 2023

In serial, even in devel mode, it runs in 9 seconds for me; on 4 procs I get segfaults or assertion failures (boy, I love parallel race conditions!) after more like 40 seconds.

Looking into it now.

@roystgnr
Member Author

roystgnr commented Jun 8, 2023

2 procs is enough to trigger failure too; hooray for small blessings.

@roystgnr
Member Author

roystgnr commented Jun 8, 2023

Whoa. The failure is coming from LaplaceMeshSmoother??

@roystgnr
Member Author

roystgnr commented Jun 8, 2023

One failure was coming from LaplaceMeshSmoother. If I turn off smoothing in XYDelaunay then I see assertion failures from ... Partitioner::partition_unpartitioned_elements? Weirder and weirder. There's got to be some underlying failure elsewhere.

@roystgnr
Member Author

roystgnr commented Jun 8, 2023

Okay; the underlying failure is in Poly2TriTriangulator. I've found one obvious bug where it can get out of sync in parallel; still hunting for others.

@lindsayad wrote, an ironic three hours ago, that "There may be some algorithms that require iterating in sync across processes for which you do have to have a deterministic ordering and I think we have those in the places we need them" — but what he forgot is that I'm always writing new bugs to replace the old ones!

@lindsayad
Member

Hahaha, how timely. We have so many development intersections and cross-references these days, this one from @olinwc included:

Error was being caused by a boundary I defined being renamed during stitching and me trying to use the old name.

@roystgnr
Member Author

roystgnr commented Jun 8, 2023

@olinwc - give it a try now?

@olinwc

olinwc commented Jun 9, 2023

Since this is my personal GitHub account, I don't get notified at my work email... Trying this now!

@olinwc

olinwc commented Jun 10, 2023

So, good news: gHPMR_2d_core_v33.i did run on 4 cores. It was slower, but it's also a small problem, so that might just be expected.

4 cores:

Finished Setting Up                                                                      [ 75.41 s] [  521 MB]

Serial:

Finished Setting Up                                                                      [ 14.61 s] [  210 MB]

Will try MHTGR next

@olinwc

olinwc commented Jun 10, 2023

MHTGR

Serial:

Finished Setting Up                                    [143.10 s] [ 1814 MB]

4 Cores:

Finished Setting Up                                    [790.19 s] [ 3770 MB]

No more hanging!

@roystgnr
Member Author

Poly2TriTriangulator isn't parallelized, so with more cores we don't replace "do X" with "do a fourth of X"; we replace it with "do X and then also do a bunch of communication to try to check that everybody did X the same way". That's pretty awful anti-scaling, though (unless it was a METHOD=dbg run?). Let me see what I get on v33.

With METHOD=opt, I see 7s on 1 rank, 37s on 2, 38 on 4, 40 on 8, 42 on 16 ... Damn.

The "a little longer the more cores that have to chat" behavior from 2 to 16 cores is about what I'd expect from sync on a replicated mesh, but that initial jump from 1 rank to 2 is crazy. And libMesh::PerfLog says the cost is roughly 95% in the prepare_for_use() after stitching, 40% via find_global_indices() and 40%+ in other MetisPartitioner-related activity? I'll bet that's something we can cut way down.

But I'm not seeing any correctness issues, and performance fixes can wait. I'll disable those tests for now and we can regold them with the libMesh submodule update this week.

roystgnr added a commit to roystgnr/moose that referenced this pull request Jun 12, 2023
These aren't compatible with the fixed + improved exodiff behavior in
libMesh/libmesh#3572 - we'll need to regold them
this week when we do the next libMesh submodule update.

Refs idaholab#20192
@roystgnr
Member Author

and performance fixes can wait.

I say, right before I dive into the code to look for fixes.

With a swath of low-hanging micro-optimizations, I managed to make things faster by ... a fraction of a second? So that's not the way to go.

But during the process I couldn't help but notice that gHPMR_2d_core_v33.i is triggering FOUR HUNDRED AND NINETEEN repartitionings, which for a mesh with no extrusions and no refinements is probably approximately four hundred eighteen too many. We've long had vague plans to make the is_prepared()/prepare_for_use() system more fine-grained, but this seems to be the perfect motivational and test case; it's triggering an order of magnitude more overkill than I was imagining.

@lindsayad
Member

That is truly brutal

@olinwc

olinwc commented Jun 13, 2023

@roystgnr That parallel behavior was all running with METHOD=devel

I'm also on a 2013 Mac Pro running locally, so people with more up-to-date local machines, like the new M1 Macs, are a lot faster per core.

roystgnr added 6 commits June 14, 2023 09:03
It's useful to be able to test this without comparing pre-and-post
orient()
I'm seeing one somehow, and I'd love to find it *before* I try marching
a ray through it.
This isn't a *sufficient* fix for some of the triangulation refinement
issues I've seen, but it's a *necessary* fix for sufficiently fine
refinement or sufficiently complex geometries.
roystgnr added 23 commits June 14, 2023 09:03
Anything with a while loop probably ought to be profiled...
This allows us to give in_circumcircle a (signed!) tolerance
Here it's more robust to retriangulate a wider cavity
We check later, too, but catching them earlier is easier to debug.
Our refinement algorithms assume that we only have to futz with one cavity
at a time because the rest of the triangulation was already Delaunay.
This was not true for everything Poly2Tri was returning to us, so let's
fix up their triangulations manually.
I wasn't actually able to trigger any of these, but they're an obvious
risk in cases with weird boundaries/holes.
This lets us do parallel verification of e.g. Point
These are useful when mesh generation gets out of sync.
I'm playing around with different container options for fixing parallel
consistency.
@roystgnr
Member Author

Rebased on master, to kick off new CI now that the tests to regold are temporarily disabled.

@moosebuild

Job Coverage on 001fa48 wanted to post the following:

Coverage

          49a13c    #3572 001fa4    +/-       New
Rate      60.25%    60.27%          +0.02%    86.71%
Hits      49426     49461           +35       137
Misses    32606     32605           -1        21

Diff coverage report

Full coverage report

Warnings

  • New line coverage rate 86.71% is less than the suggested 90.0%

This comment will be updated on new commits.

@roystgnr
Member Author

That timeout on the Test MOOSE debug modules worries me. Looked like the timing-out test was unrelated, so I just kicked it, but I'm also going to see if I get any weirdness in a local dbg modules make test.

@roystgnr roystgnr merged commit 4905c2e into libMesh:devel Jun 14, 2023
@roystgnr roystgnr deleted the poly2tri_work branch June 14, 2023 19:49


Development

Successfully merging this pull request may close these issues.

Mesh input runs in opt mode, but hits assertion in devel mode

5 participants