Use CheckpointIO For Mesh Splitting #7752
Comments
@idaholab/moose-developers take note! I got this all working... and it's awesome. It depends on: libMesh/libmesh#1103. Here is the summary of how awesome: this is for a problem with ~10M Hex8 elements and ~700M DoFs (but I'm not solving a linear system). These numbers are for running with 240 MPI processes. What I did is just run up to the point where ...
Yep... those numbers are real. ~40x faster startup! ~17x less RAM! Holy crap. This is a game changer for me... (BTW: I also have numbers for 720 procs for Parallel Checkpoint: ~20 s and ~250 MB RAM/process.) Here are the raw timing numbers for Exodus and Parallel Checkpoint: ...
I guess it depends on how you spin it. If you are using 240 processors each ... Before you start celebrating too soon, that RAM and startup savings isn't ... @idaholab/moose-team
@friedmud How is this set up? In lots of our codes, we assumed a replicated mesh. I want to see how many tests will fail if the mesh is made parallel. I really like this and want it.
@YaqiWang, it should be fine if you do not use mesh adaptivity much.
@permcody true on the efficiency :-) but who cares? 700 million DoFs in 400 MB. Anything under 1 GB per process is awesome. I'm also happy to see it go down even more after spreading it out more. There will always be some fixed-size overhead... but, honestly, I'm amazed it's this low.

As for being slower with parallel Mesh... that's going to be problem dependent. For normal stuff there won't be any impact. For mesh adaptivity there is quite a bit of overhead. If you're wanting to output Exodus there will be a big overhead (the Mesh has to be serialized... and so does the solution vector). Also, contact stuff may be a bit slower. Other than that it shouldn't impact solve speed. What were you doing with it?

For my application it runs EXACTLY the same speed with and without distributed Mesh. My algorithm is already domain decomposed... it doesn't matter if there are extra non-local elements there or not. (Already confirmed this last night.) BTW: I was doing this with the Mesh files on ...

Oh: this also makes threading even less useful. The only time threading used to be a good idea is when you were RAM limited... this removes some of the times you would be in that situation.

@YaqiWang I'm going to finalize the process for this in the next couple of days so you can try it out.
Yeah I was being facetious. I guess we're going to have to quit telling people to knock it off when they start using DistributedMesh now!
Now you are starting to sound more like the PETSc guys. You don't need threading, you just sometimes need "shared memory". The new version of MPI allows that: hierarchical communicators with shared-memory support.

@YaqiWang - You can try out DistributedMesh by using the "parallel_type" option in the Mesh block. However, that's only half of the battle. Derek is pre-splitting his meshes and reading them in already split, so there are really two steps here. All the pieces you need aren't merged yet, so give us some time. There are still a few bugs (assertions) that we're seeing with DistributedMesh, so you probably don't want to go nuts with it until we know everything is working properly.
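For reference, the MPI-3 shared-memory calls in question look roughly like this (the buffer size and names are just illustrative):

```c++
#include <mpi.h>

int main(int argc, char ** argv)
{
  MPI_Init(&argc, &argv);

  // Split the world communicator into one communicator per shared-memory node.
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);

  // Allocate a window of memory that every rank on this node can map directly.
  MPI_Win win;
  double * my_segment;
  MPI_Aint local_bytes = 1024 * sizeof(double); // illustrative size
  MPI_Win_allocate_shared(local_bytes, sizeof(double), MPI_INFO_NULL,
                          node_comm, &my_segment, &win);

  // ... ranks on the same node can now share data without threads ...

  MPI_Win_free(&win);
  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}
```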
@permcody Maybe so! I'll definitely lower my threshold for when you should go to DistributedMesh... it's now planted at around 1 million elements. One of the reasons why I've always warned people away from it is that we didn't have a reliable tool for pre-splitting the meshes.

My new splitter is awesome. You can run with any number of MPI processes and write out partitions for any number of processors (like use 10 MPI processes to write out files for 1000 processes). That makes a huge difference over our old tools. Also: since it's using our existing partitioner infrastructure we can use any of our partitioners with it easily (or make our own).

Where do you think I should put the splitter? Should it be in our contrib? Should it be one of those libMesh executables that always gets built with libMesh? Should there not be a separate executable... and maybe a command-line option to any MOOSE-based executable should automatically invoke the splitting? (I kind of like that last one.) I think I'll try to implement that last one and we can see what it looks like...
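Roughly, the splitter boils down to something like this (a sketch only, not the actual splitter source: the file names and target processor count are made up, the `parallel()`/`binary()` flags are from memory, and writing N pieces from fewer ranks leans on the new CheckpointIO plumbing in the libMesh PR mentioned above):

```c++
#include "libmesh/libmesh.h"
#include "libmesh/replicated_mesh.h"
#include "libmesh/metis_partitioner.h"
#include "libmesh/checkpoint_io.h"

using namespace libMesh;

int main(int argc, char ** argv)
{
  LibMeshInit init(argc, argv);

  // Read the unsplit mesh once (this is the expensive part).
  ReplicatedMesh mesh(init.comm());
  mesh.read("big_mesh.e"); // hypothetical input file

  // Partition for the target number of processors -- not necessarily
  // the number of ranks the splitter itself is running on.
  const unsigned int target_procs = 1000;
  MetisPartitioner partitioner;
  partitioner.partition(mesh, target_procs);

  // Write the split mesh in checkpoint format so a DistributedMesh run
  // can later read just its own piece.
  CheckpointIO cpio(mesh, /*binary=*/true);
  cpio.parallel() = true;
  cpio.write("big_mesh_split.cpr"); // hypothetical output name

  return 0;
}
```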
I'd love it as src/apps/meshsplit.C in libMesh.
I agree with Roy; let's put it in libMesh.
Ok - I may do both ;-)
@permcody we were WAY off on the parallel efficiency. You can't compare a ReplicatedMesh run at 240 procs to a DistributedMesh run at 240 procs.... you need to compare both to a truly serial run... I just did that. On one processor this problem takes ~85,000 MB of RAM (written that way for easy computation)... So... to add to the table I had earlier... here is the memory scaling efficiency:
"Perfect" memory scaling would have been 354 MB. The 400MB number was kind of an "eyeball" average anyway... some of the processes were less than that... and a few were more (just looking at the head node for the job) At any rate... our memory scaling efficiency is actually REALLY good. I'm going to be generating plots of this for my upcoming paper which will make it easier to see. |
This is very interesting. I was just messing with you and I was only referring to timing. The memory scaling is actually more important here anyway. So for the ReplicatedMesh run, I'm having a hard time understanding why one process would require 85 GB of RAM on the mesh but drop to 6.5 GB of RAM per process when we are reading it in on 240 procs. If it's replicated, shouldn't it be almost equal per process? We aren't considering the EQ objects or anything else, right? Is this just the memory required to hold the mesh? What tool are you using to measure memory usage?
No - all of these numbers are the total memory for the process (which is everything, not just the mesh).
Oh, OK. Since all of the pieces of the EQs are distributed, that would explain the huge difference between the parallel and serial case. From a frameworks perspective, we really should care about both. This helps give us more information about when DistributedMesh should be used versus Replicated.
I think in this case you can basically look at it like there is a 6 GB overhead for using ReplicatedMesh when running on 240 processors. That's a lot of overhead!
I guess there are cases where ReplicatedMesh is necessary. I do want to make DistributedMesh the default though.
No - not the default, at least not universally. When running small meshes ...
Agreed - definitely don't want Distributed to be the default. In particular... it requires extra steps to split your mesh before running. That's just unnecessary for 90% of the runs out there with MOOSE.
It's possible to use distributed mesh if you don't split first. You'll lose ...
Depends on your problem. For most of my problems the initial memory spike kills my runs... I really think the utility of DistributedMesh is only unlocked if you split first.
Really? I'm running 180^3 without any problems in ReplicatedMesh mode. Are you running that much bigger meshes? I think the Mesh structure is around 5-6 GB per process at that size. Maybe down the road, we should think about a live-splitter mechanism. It should be feasible to create a sub-communicator using maybe 10% of the total procs to read and split the mesh. Then we could launch the run, all online. Perhaps overkill, I don't know. With your new utility maybe it's not so much of a pain, but if you are running on a cluster, having to schedule two jobs (one to split, then one to run) is not always convenient.
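For what it's worth, the sub-communicator part of that idea could look roughly like this in plain MPI (the 10% split and the names here are just illustrative):

```c++
#include <mpi.h>

int main(int argc, char ** argv)
{
  MPI_Init(&argc, &argv);

  int rank = 0, size = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Put roughly the first 10% of the ranks into a "reader" communicator
  // that would read and split the mesh; everyone else waits.
  const int n_readers = (size + 9) / 10;
  const int color = (rank < n_readers) ? 0 : 1;

  MPI_Comm sub_comm;
  MPI_Comm_split(MPI_COMM_WORLD, color, rank, &sub_comm);

  if (color == 0)
  {
    // ... readers: read the mesh, partition it for 'size' processors,
    //     and hand the pieces out ...
  }

  // ... then everyone proceeds with the actual solve on MPI_COMM_WORLD ...

  MPI_Comm_free(&sub_comm);
  MPI_Finalize();
  return 0;
}
```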
At 6 GB you already can't utilize every core in a node. That's what I'm talking about. Yes, I can run my problems... but I waste a lot of cores. It's a pretty big issue when you're trying to run on ~10k cores (like one of the jobs I currently have queued). If I needed to reserve 15k-20k cores to actually use 10k... that sucks. With pre-splitting and DistributedMesh that's not the case.
@roystgnr I'm running into something here. I can't seem to be able to partition the same Mesh object twice... do you think that should work? It runs into this assert: https://github.com/libMesh/libmesh/blob/master/src/mesh/mesh_communication_global_indices.C#L785

The issue seems to be because I'm partitioning for more processors than I'm running on. The weird thing is that it works the first time... only on the second time does it hit that assert. I don't understand either thing: why it should work the first time... and/or why it doesn't the second time. Is there something I should do to "clear" the mesh before re-partitioning it?

BTW: My purpose here is to try to avoid reading the same mesh multiple times (as that can take many minutes)... but be able to write out partitionings for multiple numbers of processors. i.e. read once but output partitionings for 24, 48, 96, etc. processors.
@roystgnr I figured out a workaround... I just "reset" the partitioning before each partitioning by partitioning it for one processor. Like this:

```c++
for (auto n_procs : all_n_procs)
{
  // "Reset" the partitioning first -- without this, the second call to
  // partition() trips the assert linked above.
  partitioner.partition(mesh, 1);

  // Partition for the processor count we actually want.
  partitioner.partition(mesh, n_procs);

  // ... write out the split mesh for n_procs ...
}
```

The nice thing about resetting it to 1 is ... So... kind of ugly... but still much faster than needing to do a separate run / read of the mesh to create multiple partitionings.
So, I'm pretty sure I have this working perfectly now, but I don't know how to properly test that it's working perfectly. Ideally we want to have a test which:

- runs the splitter to pre-split a mesh,
- runs a MOOSE app with DistributedMesh on the pre-split mesh, and
- exodiffs the output against a gold file.
We can't exactly add such a test to libMesh since moose apps aren't available there, though I suppose we could just run a libMesh example code instead. Can we do this easily in the MOOSE test harness? The "RunCommand" tester looks like it's flexible enough to call a python script, which could in turn run the splitter-mooseapp-exodiff sequence we'd like, but the only example I can see of that tester is with command='echo Hello World', which is less impressive. |
Yes, you can absolutely run an arbitrary command. The ...
That should be everything we need to describe the test. However, I suspect that you really won't need the second part as you'll define success or failure in the script itself and return the appropriate exit code (which we also look at). |
Summary of changes in this update:

- Remove 'old' Makefiles from contrib and unit test directories
- ASCII option, naming fixes for apps/splitter
- Small update to RBEIMConstruction
- Put sideset/nodeset names in CheckpointIO header
- Checkpoint N->M restart fixes
- Check for exact file extensions in NameBasedIO::is_parallel_file_format()
- Update TetGenIO code for reading .ele files
- allgather(vector<string>) overload
- Store integer data type in CheckpointIO files

This is necessary to support idaholab#9782 in service of idaholab#9700, and to add more robustness to support of idaholab#7752
Based heavily on @friedmud's test in idaholab#8472; adds test coverage for issue idaholab#7752, for both ASCII and binary CheckpointIO pre-split mesh reads.
Closing this issue, as it has been completed.
Description of the enhancement or error report
Forget #7744 and #7745. Screw libMesh/libmesh#1087.
I'm over it.
We don't really need to use Nemesis for reading split meshes. What incentive do we have? All of the tools that used to create split Nemesis meshes have bit-rotted at this point.
Instead: let's just use our own format. We already have `CheckpointIO`... it just needs a few tweaks and then it should work.

Rationale for the enhancement or information for reproducing the error

We need a reliable method for creating split meshes and using them in simulations with `DistributedMesh`.

Identified impact
The ability to partition and run truly huge problems.
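As a rough sketch of what the reading side would look like once those tweaks are in (the file name is hypothetical and the exact CheckpointIO read options may differ; treat this as the idea, not the final API):

```c++
#include "libmesh/libmesh.h"
#include "libmesh/distributed_mesh.h"
#include "libmesh/checkpoint_io.h"

using namespace libMesh;

int main(int argc, char ** argv)
{
  LibMeshInit init(argc, argv);

  // Each processor reads only its own piece of the pre-split mesh.
  DistributedMesh mesh(init.comm());

  CheckpointIO cpio(mesh, /*binary=*/true);
  cpio.read("big_mesh_split.cpr"); // hypothetical pre-split file from the splitter

  mesh.prepare_for_use();

  // ... set up EquationSystems and solve as usual ...

  return 0;
}
```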