Explain about calling renumber when skipping renumbering #1666

Closed

Conversation

@roystgnr (Member)

In the long run we should perhaps rename that method to "delete_disconnected_nodes_and_update_counts_and_maybe_renumber", or we should actually break those into separate methods.

In theory this should just add some comments and remove a little redundancy.

In practice, something in #1621 is breaking Rattlesnake and this is the most likely of the very few remaining suspects.

@roystgnr (Member Author)

And now I get it. ReplicatedMesh::renumber_nodes_and_elements() doesn't really respect skip_renumber_nodes_and_elements: if there's been adaptive coarsening or other element deletion, it still renumbers as necessary to make the numbering contiguous again.

I'll try fixing that in a separate PR, and with luck #1658 was all we needed to support discontiguous numbering properly.
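To make the contract in question concrete, here is a minimal sketch against libMesh's public API; the mesh file name and the particular element deleted are placeholder assumptions:

#include "libmesh/libmesh.h"
#include "libmesh/replicated_mesh.h"

using namespace libMesh;

int main (int argc, char ** argv)
{
  LibMeshInit init (argc, argv);

  ReplicatedMesh mesh (init.comm());
  mesh.read ("input.e");            // placeholder mesh file

  // This sets skip_renumber_nodes_and_elements, promising users that
  // node and element ids will be left alone from here on.
  mesh.allow_renumbering (false);

  // Deleting an element (by hand, or via adaptive coarsening) leaves a
  // gap in the id sequence.
  mesh.delete_elem (mesh.elem_ptr (0));

  // prepare_for_use() calls renumber_nodes_and_elements(), which should
  // now be a no-op; before this fix, ReplicatedMesh renumbered anyway to
  // make the ids contiguous again.
  mesh.prepare_for_use ();

  return 0;
}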

@roystgnr (Member Author) commented Jul 6, 2018

So, this is baffling. Pinging @jwpeterson and anyone he knows who has exodiff experience.

Two XFEM tests now break here, so I was excited to have something more convenient than Rattlesnake to use for debugging. I replicated the failures, and for one I used ncdump to get at the differences between the gold file and the tested file...

And the only significant difference is the node_num_map array. Now that turning renumbering off actually turns renumbering off, we end up with a gap in the node numbers in the tested file, just as we'd expect when nodes are being added and deleted. And if I renumber node_num_map, change nothing else, and ncgen a replacement exodus file, then exodiff is happy with it.
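For anyone who wants to poke at this themselves, a minimal sketch using the ExodusII C API that dumps node_num_map; the file name is a placeholder, error checking is omitted, and a default 32-bit-id build is assumed:

#include <exodusII.h>
#include <cstdio>
#include <vector>

int main ()
{
  int cpu_ws = 0, io_ws = 0;
  float version;
  int exoid = ex_open ("out.e", EX_READ, &cpu_ws, &io_ws, &version);

  // node_num_map: entry i holds the global id of local node i+1.
  // Coordinates and connectivity are stored by local index, so a gap
  // in these ids (..., 36, 38, ...) changes nothing else in the file.
  long num_nodes = (long) ex_inquire_int (exoid, EX_INQ_NODES);
  std::vector<int> node_map (num_nodes);
  ex_get_id_map (exoid, EX_NODE_MAP, node_map.data());

  for (long i = 0; i != num_nodes; ++i)
    std::printf ("local %ld -> global %d\n", i + 1, node_map[i]);

  ex_close (exoid);
  return 0;
}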

So how does the node matching in exodiff work? The error

exodiff: ERROR: Unable to match node 37 in first file with node in second file.

makes no sense when the second file has a node with the exact same location, exact same connectivity, but simply a different number! I could swear I've seen exodiff handle much more scrambled node numbering in the past!

@jwpeterson (Member)

We usually run exodiff with the -m option, which ignores the node numbering and just tries to compare the spatial locations of the nodes. The exodiffing test does not seem to be using -m, however:

/opt/civet/build_0/moose/framework/contrib/exodiff/exodiff  -F 1e-10 -t 5.5e-06  /opt/civet/build_0/moose/modules/xfem/test/tests/moving_interface/gold/moving_bimaterial_out.e /opt/civet/build_0/moose/modules/xfem/test/tests/moving_interface/moving_bimaterial_out.e

That seems to be because map = false is set in the test spec for the test in question.

@jwpeterson (Member)

So it's possible that test is still working fine. The nodes may have been getting inadvertently renumbered up to now, and now that we are better enforcing allow_renumbering() == false, the numbering has changed for the better?

@roystgnr (Member Author) commented Jul 6, 2018

Aha! I should ping @bwspenc and @jiangwen84, then, so we can ask whether that map setting is overzealous?

@roystgnr (Member Author) commented Jul 6, 2018

> The nodes may have been getting inadvertently renumbered up to now, and now that we are better enforcing allow_renumbering() == false, the numbering has changed for the better?

Well, if by better you mean "actually always doing what our API says we'll do" then this is definitely a change for the better. There may be other users in the same depending-on-old-broken-behavior boat, though. Anyone using ReplicatedMesh and deleting (or coarsening away) nodes or elements might be affected.

@jwpeterson (Member) commented Jul 6, 2018

> Aha! I should ping @bwspenc and @jiangwen84, then, so we can ask whether that map setting is overzealous?

Looks like that test was added fairly recently in idaholab/moose@c520971; there's no comment about why map=false was specified, and it seems to me that it shouldn't be necessary for the test to work...

@roystgnr (Member Author) commented Jul 6, 2018

Hmmm... if I remove map = false from xfem/test/tests/moving_interface/tests then I get an error sooner:

xfem/test:moving_interface.moving_interface: exodiff: ERROR: Two elements in file 2 have the same midpoint (within tolerance). 
xfem/test:moving_interface.moving_interface:    Local element  19 at (0.7, 0.1, 0) and 
xfem/test:moving_interface.moving_interface:    Local element 20 at (0.7, 0.1, 0) 
xfem/test:moving_interface.moving_interface:    No unique element mapping possible. 

So that would explain why map=false was set.
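That fits with how the mapping has to work: with overlapping XFEM elements, any matching based on element midpoints is genuinely ambiguous. A sketch of the idea, not exodiff's actual implementation:

#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

using Point = std::array<double, 3>;

// Match one element from file 1 against file 2 by midpoint distance.
// Returns the index of the unique match within tol, -1 if no element
// matches ("Unable to match node/element"), or -2 if two candidates
// match ("No unique element mapping possible").
int match (const Point & mid1, const std::vector<Point> & mids2, double tol)
{
  int found = -1;
  for (std::size_t j = 0; j != mids2.size(); ++j)
    {
      double d = std::hypot (mid1[0] - mids2[j][0],
                             mid1[1] - mids2[j][1],
                             mid1[2] - mids2[j][2]);
      if (d < tol)
        {
          if (found >= 0)
            return -2;          // second candidate: mapping is ambiguous
          found = (int) j;
        }
    }
  return found;
}

int main ()
{
  // In the XFEM output, elements 19 and 20 both sit at (0.7, 0.1, 0),
  // so any midpoint-based mapping between files must fail.
  std::vector<Point> mids2 = {{0.7, 0.1, 0.}, {0.7, 0.1, 0.}};
  std::printf ("%d\n", match (Point{0.7, 0.1, 0.}, mids2, 1e-6)); // prints -2
  return 0;
}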

@jwpeterson (Member)

> So that would explain why map=false was set.

Ah, yeah that would make sense for XFEM as I believe they end up with overlapping elements in the Mesh during the simulation and in the output. We could wait on this PR until the next libmesh update and regold the failing tests (assuming the only difference is the node numbering) at that time?

@jwpeterson (Member)

The Marmot and Rattlesnake failures, on the other hand, don't look like the same problem to me (the Marmot one has to do with PeriodicBoundaries?).

@roystgnr (Member Author) commented Jul 6, 2018

Hell, there's a Marmot failure too now?

Yeah, it looks like this will continue to be delayed.

@jwpeterson (Member)

Just FYI, both tests seem to involve a SolutionUserObject in some way: one reads a solution from an XDA file and one reads from an Exodus file. It's possible a different mesh numbering is causing problems with that...

@bwspenc (Contributor) commented Jul 9, 2018

@roystgnr Did you get your question about the XFEM test resolved? I was out last week, and am just digging through the things that piled up. It looks like you figured this out already, but we use that map=false option in all of the XFEM tests because they all generate output files with overlapping elements.

@roystgnr (Member Author) commented Jul 9, 2018

> Did you get your question about the XFEM test resolved?

I think so. The question of "why do we turn off exodiff remapping" is answered, at least.

I'm not entirely sure where to go from here, though.

I think the short-term question of "how do we avoid false positive test failures when remapping is off" will be answered by "keep renumbering disabled, force ReplicatedMesh to keep the numbering identical, and regenerate a new gold standard after this PR gets merged", but that's hardly a sufficient long-term answer. Anything creating and destroying elements on a distributed mesh will end up with a partitioning-dependent element numbering, which will require remapping to compare against a gold standard with exodiff.

@roystgnr (Member Author) commented Jul 4, 2020

This was causing breakage and we've moved past it now.

roystgnr closed this on Jul 4, 2020.