Checkpoint splitter #1103

friedmud · 2016-09-19T21:45:49Z

Forget Nemesis reading. It's too damn complicated.

We can do this ourselves. DistributedMesh is actually really awesome at reinitializing itself... it barely needs anything. Just the local elements and the nodes they're connected to... along with BC info. Done.

That said... there is still a small issue with this PR that must be resolved before it can be merged: it currently trips an assert() in devel/dbg mode if you leave mesh partitioning on when you prepare the newly read-in mesh for use.

You can see how I currently have to do the reading here:

friedmud/moose@6e08eef

@jwpeterson @roystgnr could I get just the tiniest bit of help tracking that one down?

To easily create meshes split with this new capability, use this branch of my simple_libmesh_app:

https://github.com/friedmud/simple_libmesh_app/tree/splitter

refs idaholab/moose#7752, idaholab/moose#7744, idaholab/moose#7745, #1087, #1086

roystgnr · 2016-09-19T22:04:11Z

include/mesh/checkpoint_io.h

+   */
+  const processor_id_type & current_n_processors() const { return _my_n_processors; }
+  processor_id_type & current_n_processors() { return _my_n_processors; }
+


These aren't redundant with the ParallelObject APIs because during reading they come from the file rather than the CheckpointIO object?

If I'm understanding that correctly, could you make it clearer in the comments?

friedmud · 2016-09-19

Actually, those are for m->n splitting: run with m MPI processes... but create output files for n MPI processes.

To do that, a code can loop over the target processor ids, setting current_n_processors() and current_processor_id() and calling write() each time... and the file that comes out will be for that processor count and id.

I'll update the comment

See here: https://github.com/friedmud/simple_libmesh_app/blob/9062a3b0fb1d9ae7e75958e6270c15c0edb6b3f8/src/main.C#L77
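
The loop described above can be sketched roughly as follows. This is a standalone mock, not libMesh code: `MockSplitWriter`, the `write()` signature, `split_for`, and the file-naming scheme are all hypothetical stand-ins. Only the two reference-returning accessors mirror the ones shown in this PR's diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

using processor_id_type = unsigned int;

// Hypothetical stand-in for the writer; NOT the real CheckpointIO class.
class MockSplitWriter
{
public:
  // Reference-returning accessors, mirroring the pattern added in this PR.
  processor_id_type & current_processor_id() { return _my_processor_id; }
  processor_id_type & current_n_processors() { return _my_n_processors; }

  // Assumed behavior: "write" the piece for the currently selected rank.
  // This mock only records a made-up file name instead of touching disk.
  void write(const std::string & base)
  {
    _written.push_back(base + "-" + std::to_string(_my_n_processors) + "-" +
                       std::to_string(_my_processor_id));
  }

  const std::vector<std::string> & written() const { return _written; }

private:
  processor_id_type _my_processor_id = 0;
  processor_id_type _my_n_processors = 1;
  std::vector<std::string> _written;
};

// m->n splitting: regardless of how many processes run this code,
// emit one output per target rank 0..n-1.
inline std::vector<std::string> split_for(processor_id_type n, const std::string & base)
{
  assert(n > 0);
  MockSplitWriter writer;
  writer.current_n_processors() = n;
  for (processor_id_type p = 0; p < n; ++p)
  {
    writer.current_processor_id() = p;
    writer.write(base);
  }
  return writer.written();
}
```

Calling `split_for(3, "mesh.cpr")` in this mock records three names, one per target rank, which is the shape of the loop in the linked main.C.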

moosebuild · 2016-09-20T13:26:43Z

Job Test debug:linux-gnu on 771fe81 : invalidated by @friedmud

Rerun bcs/periodic.testlevel1

permcody · 2016-09-20T14:43:30Z

Hmm - that same test timed out yesterday as well. I don't recall it ever timing out before...

friedmud · 2016-09-20T15:23:03Z

@permcody It looks like it passed this time. It definitely doesn't have anything to do with this PR... so I don't know what's up.

jwpeterson · 2016-09-20T15:36:22Z

src/mesh/parallel_mesh.C

@@ -1349,14 +1349,14 @@ void DistributedMesh::delete_remote_elements()
 {
 #ifdef DEBUG
  // Make sure our neighbor links are all fine
-  MeshTools::libmesh_assert_valid_neighbors(*this);
+  //MeshTools::libmesh_assert_valid_neighbors(*this);


@roystgnr can we just delete these asserts altogether if they make the code too slow in DEBUG mode? Or maybe add some kind of "extra_paranoid_slow" debugging flag that people can enable. I don't think they should just be commented out...

roystgnr · 2016-09-20

I definitely don't want to delete them entirely. "This assert would pass here" is very useful information to leave in the code even if we don't compile it.

Personally, I thought DEBUG was the extra_paranoid_slow flag. We've already got -O0 and GLIBCXX_DEBUG* active in dbg mode too; it's basically as unoptimized as it gets.
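
For reference, the opt-in flag floated above could look something like this. It is only a sketch: `EXTRA_PARANOID_SLOW`, `paranoid_assert`, and the two helper functions are hypothetical names, not existing libMesh macros or APIs. The check stays in the source as documentation but is not evaluated unless the flag is defined.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opt-in flag; define it at compile time to enable slow checks.
#ifdef EXTRA_PARANOID_SLOW
#  define paranoid_assert(expr) assert(expr)
#else
// Not evaluated at runtime, but sizeof keeps the expression compiled,
// so the check cannot silently rot the way a commented-out line can.
#  define paranoid_assert(expr) ((void)sizeof(expr))
#endif

// Stand-in for an expensive consistency check (a linear scan here).
inline bool neighbors_look_valid(const std::vector<int> & neighbor_ids, int n_elem)
{
  for (int id : neighbor_ids)
    if (id < -1 || id >= n_elem) // -1 means "no neighbor" in this mock
      return false;
  return true;
}

// Counts real neighbors; the validity check only runs when opted in.
inline int count_neighbors(const std::vector<int> & neighbor_ids, int n_elem)
{
  paranoid_assert(neighbors_look_valid(neighbor_ids, n_elem));
  int count = 0;
  for (int id : neighbor_ids)
    if (id != -1)
      ++count;
  return count;
}
```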

friedmud · 2016-09-21

Sorry guys: I didn't mean to leave these changes in here. Those just slipped through. I'll revert those and force-push this.

BTW: As you can see with my newest commit... I'm still working on this a bit. One of my problems still won't run using this capability. Still hunting down why.

Only write out boundary info for truly local objects. Need to write out all point neighbors of local elements (and the nodes that connect to them)
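
A rough sketch of that selection rule, using plain containers rather than libMesh types (the `Elem` and `Piece` structs and `piece_for` are hypothetical): an element is written if it is local or shares at least one node with a local element, and every node of a written element is written too. Boundary info is not modeled here.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Minimal stand-in for a mesh element; not libMesh's Elem class.
struct Elem
{
  unsigned int id;
  unsigned int processor_id;
  std::vector<unsigned int> nodes;
};

// What gets written for one target processor.
struct Piece
{
  std::set<unsigned int> elems;
  std::set<unsigned int> nodes;
};

inline Piece piece_for(const std::vector<Elem> & mesh, unsigned int pid)
{
  // Nodes touched by truly local elements.
  std::set<unsigned int> local_nodes;
  for (const Elem & e : mesh)
    if (e.processor_id == pid)
      local_nodes.insert(e.nodes.begin(), e.nodes.end());

  Piece piece;
  for (const Elem & e : mesh)
  {
    // Keep local elements and their point neighbors, i.e. any element
    // sharing at least one node with a local element.
    bool keep = (e.processor_id == pid);
    for (unsigned int n : e.nodes)
      if (local_nodes.count(n))
      {
        keep = true;
        break;
      }

    if (keep)
    {
      piece.elems.insert(e.id);
      // All nodes of a kept element go along with it.
      piece.nodes.insert(e.nodes.begin(), e.nodes.end());
    }
  }
  return piece;
}
```

For a 1D chain of four edge elements split two-and-two, the piece for processor 0 holds its two local elements plus the one remote element that touches the shared node.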

friedmud · 2016-09-27T16:24:42Z

Closing this in favor of #1106

friedmud added 3 commits September 19, 2016 17:39

Rework CheckpointIO a bit to make it more conducive toward working as a splitting format

619063b

Add the ability to set the processor_id and n_procs for CheckpointIO

b19ed3a

Fixup CheckpointIO again to make it work properly for serial

771fe81

roystgnr reviewed Sep 19, 2016

jwpeterson reviewed Sep 20, 2016

friedmud and others added 2 commits September 20, 2016 15:19

More fixes for CheckpointIO parallel writing

ca44a0b

Only write out boundary info for truly local objects. Need to write out all point neighbors of local elements (and the nodes that connect to them)

Write out elements connected to local elements

aa11216

friedmud mentioned this pull request Sep 22, 2016

Use CheckpointIO For Mesh Splitting idaholab/moose#7752

Closed

Some optimization of Parallel CheckpointIO writing

1db9888

friedmud closed this Sep 27, 2016
