Use CheckpointIO For Mesh Splitting #7752

Closed
friedmud opened this issue Sep 19, 2016 · 30 comments
Assignees
Labels
C: Framework C: libMesh P: normal A defect affecting operation with a low possibility of significant effects. T: task An enhancement to the software.

Comments

@friedmud
Contributor

friedmud commented Sep 19, 2016

Description of the enhancement or error report

Forget #7744 and #7745. Screw libMesh/libmesh#1087.

I'm over it.

We don't really need to use Nemesis for reading split meshes. What incentive do we have? All of the tools that used to create split Nemesis meshes have bit-rotted at this point.

Instead: let's just use our own format. We already have CheckpointIO... it just needs a few tweaks and then it should work.
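Roughly, I'm picturing something like this on the libMesh side (just a sketch, not the final tool; the CheckpointIO constructor/accessor details here are from memory and may need adjusting to the actual API):

#include "libmesh/libmesh.h"
#include "libmesh/replicated_mesh.h"
#include "libmesh/distributed_mesh.h"
#include "libmesh/checkpoint_io.h"

using namespace libMesh;

int main (int argc, char ** argv)
{
  LibMeshInit init (argc, argv);

  // Split side: read the original (e.g. Exodus) mesh once and dump it
  // in checkpoint format.
  ReplicatedMesh mesh (init.comm());
  mesh.read ("input.e");

  CheckpointIO writer (mesh, /*binary=*/true);
  writer.write ("input.cpr");

  // Run side: a DistributedMesh reads the checkpoint back in, with each
  // processor keeping only its own piece.
  DistributedMesh dmesh (init.comm());
  CheckpointIO reader (dmesh, /*binary=*/true);
  reader.read ("input.cpr");
  dmesh.prepare_for_use ();

  return 0;
}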

Rationale for the enhancement or information for reproducing the error

We need a reliable method for creating split meshes and using them in simulations with DistributedMesh.

Identified impact

The ability to partition and run truly huge problems.

@friedmud friedmud added C: Framework T: task An enhancement to the software. P: normal A defect affecting operation with a low possibility of significant effects. C: libMesh labels Sep 19, 2016
@friedmud friedmud self-assigned this Sep 19, 2016
friedmud added a commit to friedmud/moose that referenced this issue Sep 22, 2016
@friedmud
Contributor Author

@idaholab/moose-developers take note!

I got this all working... and it's awesome. It depends on: libMesh/libmesh#1103

Here is the summary of how awesome:

This is for a problem with ~10M Hex8 elements and ~700M DoFs (but I'm not solving a linear system). These numbers are for running with 240 MPI processes. What I did was just run up to the point where FEProblem::solve() is called for the first time. So, this is all the time before the first solve...

                     Time    RAM/process
Exodus               402s    ~6500 MB
Parallel Checkpoint  12s     ~400 MB

Yep... those numbers are real. ~40x faster startup! ~17x less RAM!

Holy crap. This is a game changer for me...

(BTW: I also have numbers for 720 procs for Parallel Checkpoint: ~20s and ~250 MB RAM/process)

Here are the raw timing numbers...

Exodus:

 -------------------------------------------------------------------------------------------------------------------------
| Mocodile Performance: Alive time=402.223, Active time=393.72                                                            |
 -------------------------------------------------------------------------------------------------------------------------
| Event                                      nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                                       w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                         |
|                                                                                                                         |
| Application                                                                                                             |
|   Full Runtime                             1          0.8219      0.821885    393.7204    393.720447  0.21     100.00   |
|                                                                                                                         |
| Execution                                                                                                               |
|   computeAuxiliaryKernels()                2          0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|   computeControls()                        2          0.0000      0.000002    0.0000      0.000002    0.00     0.00     |
|   computeUserObjects()                     4          0.0001      0.000021    0.0001      0.000021    0.00     0.00     |
|                                                                                                                         |
| Output                                                                                                                  |
|   CSV::output()                            4          0.0115      0.002867    0.0115      0.002867    0.00     0.00     |
|                                                                                                                         |
| Setup                                                                                                                   |
|   Application Setup                        1          49.0642     49.064164   360.9090    360.909035  12.46    91.67    |
|   FEProblem::init::meshChanged()           1          31.2364     31.236431   31.2364     31.236431   7.93     7.93     |
|   Initial updateActiveSemiLocalNodeRange() 1          0.0284      0.028443    0.0284      0.028443    0.01     0.01     |
|   Initial updateGeomSearch()               2          0.0000      0.000002    0.0000      0.000002    0.00     0.00     |
|   NonlinearSystem::update()                1          0.0539      0.053939    0.0539      0.053939    0.01     0.01     |
|   Read Mesh                                1          174.6405    174.640493  174.6405    174.640493  44.36    44.36    |
|   eq.init()                                1          105.9140    105.914006  105.9140    105.914006  26.90    26.90    |
|   execMultiApps()                          1          0.0000      0.000005    0.0000      0.000005    0.00     0.00     |
|   execTransfers()                          1          0.0000      0.000002    0.0000      0.000002    0.00     0.00     |
|   initial adaptivity                       1          0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|   initialSetup()                           1          31.3767     31.376665   31.9780     31.977971   7.97     8.12     |
|   reinit() after updateGeomSearch()        1          0.0037      0.003739    0.0037      0.003739    0.00     0.00     |
|                                                                                                                         |
| Utility                                                                                                                 |
|   projectSolution()                        1          0.5691      0.569112    0.5691      0.569112    0.14     0.14     |
 -------------------------------------------------------------------------------------------------------------------------
| Totals:                                    27         393.7204                                        100.00            |
 -------------------------------------------------------------------------------------------------------------------------

Parallel Checkpoint:

 -------------------------------------------------------------------------------------------------------------------------
| Mocodile Performance: Alive time=11.7636, Active time=9.40401                                                           |
 -------------------------------------------------------------------------------------------------------------------------
| Event                                      nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                                       w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|-------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                         |
|                                                                                                                         |
| Application                                                                                                             |
|   Full Runtime                             1          0.0923      0.092292    9.4040      9.404009    0.98     100.00   |
|                                                                                                                         |
| Execution                                                                                                               |
|   computeAuxiliaryKernels()                2          0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|   computeControls()                        2          0.0000      0.000003    0.0000      0.000003    0.00     0.00     |
|   computeUserObjects()                     4          0.0001      0.000028    0.0001      0.000028    0.00     0.00     |
|                                                                                                                         |
| Output                                                                                                                  |
|   CSV::output()                            4          0.0345      0.008635    0.0345      0.008635    0.37     0.37     |
|                                                                                                                         |
| Setup                                                                                                                   |
|   Application Setup                        1          0.6168      0.616845    8.2994      8.299362    6.56     88.25    |
|   FEProblem::init::meshChanged()           1          0.1178      0.117847    0.1178      0.117847    1.25     1.25     |
|   Initial updateActiveSemiLocalNodeRange() 1          0.0284      0.028362    0.0284      0.028362    0.30     0.30     |
|   Initial updateGeomSearch()               2          0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|   NonlinearSystem::update()                1          0.0144      0.014432    0.0144      0.014432    0.15     0.15     |
|   Read Mesh                                1          2.9575      2.957462    2.9575      2.957462    31.45    31.45    |
|   eq.init()                                1          4.5928      4.592775    4.5928      4.592775    48.84    48.84    |
|   execMultiApps()                          1          0.0000      0.000005    0.0000      0.000005    0.00     0.00     |
|   execTransfers()                          1          0.0000      0.000002    0.0000      0.000002    0.00     0.00     |
|   initial adaptivity                       1          0.0000      0.000001    0.0000      0.000001    0.00     0.00     |
|   initialSetup()                           1          0.4741      0.474141    0.9777      0.977697    5.04     10.40    |
|   reinit() after updateGeomSearch()        1          0.0022      0.002178    0.0022      0.002178    0.02     0.02     |
|                                                                                                                         |
| Utility                                                                                                                 |
|   projectSolution()                        1          0.4730      0.473006    0.4730      0.473006    5.03     5.03     |
 -------------------------------------------------------------------------------------------------------------------------
| Totals:                                    27         9.4040                                          100.00            |
 -------------------------------------------------------------------------------------------------------------------------

@permcody
Member

permcody commented Sep 22, 2016

I guess it depends on how you spin it. If you are using 240 processors, each one is working on ~1/240 of the original elements. This means you have a parallel efficiency of 16% :) Haha!

Before you start celebrating too soon, those RAM and startup savings aren't all free. When we used parallel mesh on those big 3D GrainTracker runs last year, each time step of the simulation took about 40% longer than the same step on the ReplicatedMesh. That's terrible! Maybe we had other problems back then, but I'll be curious to see the timings of the simulation now.

@idaholab/moose-team


@YaqiWang
Contributor

@friedmud How is this set up? In lots of our codes we assume a replicated mesh. I want to see how many tests will fail if the mesh is switched to parallel. I really like this and want it.

@fdkong
Contributor

fdkong commented Sep 22, 2016

@YaqiWang, it should be fine if you do not use mesh adaptivity much.

@friedmud
Contributor Author

@permcody true on the efficiency :-) but who cares? 700 million DoFs in 400 MB. Anything under 1 GB per process is awesome. I'm also happy to see it go down even further as the job is spread over more processes.

There will always be some fixed size overhead... but, honestly, I'm amazed it's this low.

As for being slower with parallel Mesh... that's going to be problem dependent. For normal stuff there won't be any impact. For mesh adaptivity there is quite a bit of overhead. If you want to output Exodus there will be a big overhead (the Mesh has to be serialized... and so does the solution vector). Also, contact stuff may be a bit slower.

Other than that it shouldn't impact solve speed. What were you doing with it?

For my application it runs EXACTLY the same speed with and without distributed Mesh. My algorithm is already domain decomposed... it doesn't matter if there are extra non-local elements there or not. (Already confirmed this last night)

BTW: I was doing this with the Mesh files on /scratch. It didn't bat an eye when I tried to simultaneously load ~1000 Mesh files from it.

Oh: this also makes threading even less useful. The only time threading used to be a good idea is when you were RAM limited... this removes some of the times you would be in that situation.

@YaqiWang I'm going to finalize the process for this in the next couple of days so you can try it out.

@permcody
Member

permcody commented Sep 22, 2016

Yeah I was being facetious. I guess we're going to have to quit telling people to knock it off when they start using DistributedMesh now!

Oh: this also makes threading even less useful. The only time threading used to be a good idea is when you were RAM limited... this removes some of the times you would be in that situation.

Now you are starting to sound more like the PETSc guys. You don't need threading, you just sometimes need "shared memory". The new version of MPI allows that: hierarchical communicators with shared-memory support.
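For reference, the MPI-3 pattern in question looks roughly like this (purely an illustrative sketch, not MOOSE code; the buffer size and variable names are made up):

#include <mpi.h>

int main (int argc, char ** argv)
{
  MPI_Init (&argc, &argv);

  // Put all ranks that live on the same node into one "node_comm".
  MPI_Comm node_comm;
  MPI_Comm_split_type (MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED,
                       0, MPI_INFO_NULL, &node_comm);

  int node_rank;
  MPI_Comm_rank (node_comm, &node_rank);

  // Rank 0 on each node allocates one shared block; everyone else
  // attaches with size 0 and asks for rank 0's base pointer.
  MPI_Aint size = (node_rank == 0) ? 1024 * sizeof(double) : 0;
  double * base = nullptr;
  MPI_Win win;
  MPI_Win_allocate_shared (size, sizeof(double), MPI_INFO_NULL,
                           node_comm, &base, &win);

  MPI_Aint shared_size;
  int disp_unit;
  MPI_Win_shared_query (win, 0, &shared_size, &disp_unit, &base);

  // ... all ranks on the node can now read/write "base" directly,
  //     with the usual window synchronization around accesses ...

  MPI_Win_free (&win);
  MPI_Comm_free (&node_comm);
  MPI_Finalize ();
  return 0;
}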

@YaqiWang - You can try out DistributedMesh by using the "parallel_type" option in the Mesh block. However, that's only half of the battle. Derek is pre-splitting his meshes and reading them in already split, so there are really two steps here. All the pieces you need aren't merged yet, so give us some time. There are still a few bugs (assertions) that we're seeing with DistributedMesh, so you probably don't want to go nuts with it until we know everything is working properly.

@friedmud
Contributor Author

@permcody Maybe so!

I'll definitely lower my threshold for when you should go to DistributedMesh... it's now planted at around 1 million elements...

One of the reasons why I've always warned people away from it is that we didn't have a reliable tool for pre-splitting meshes. My new splitter is awesome. You can run with any number of MPI processes and write out partitions for any number of target processors (e.g. use 10 MPI processes to write out files for 1000 processes). That makes a huge difference over our old tools.

Also: since it's using our existing partitioner infrastructure we can use any of our partitioners with it easily (or make our own).

Where do you think I should put the splitter? Should it be in our contrib? Should it be one of those libMesh executables that always gets built with libMesh? Or should there not be a separate executable at all... with a command-line option to any MOOSE-based executable automatically invoking the splitting? (I kind of like that last one.)

I think I'll try to implement that last one and we can see what it looks like...
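For the curious, the rough shape of the driver I'm picturing is below (again, just a sketch and not the actual splitter; how CheckpointIO is told to emit N chunks is exactly what libMesh/libmesh#1103 adds, so that call is only indicated by a comment):

#include "libmesh/libmesh.h"
#include "libmesh/replicated_mesh.h"
#include "libmesh/checkpoint_io.h"
#include "libmesh/metis_partitioner.h"

using namespace libMesh;

int main (int argc, char ** argv)
{
  LibMeshInit init (argc, argv);

  // Read the (expensive) mesh exactly once...
  ReplicatedMesh mesh (init.comm());
  mesh.read ("input.e");

  MetisPartitioner partitioner;

  // ...then write a pre-split checkpoint for each target processor count.
  for (unsigned int n_procs : {24u, 48u, 96u})
    {
      partitioner.partition (mesh, n_procs);

      CheckpointIO cp (mesh, /*binary=*/true);
      // ... tell cp to emit n_procs chunks (the part that needs the
      //     CheckpointIO tweaks in libMesh/libmesh#1103) and then
      //     call cp.write() with a name keyed on n_procs ...
    }

  return 0;
}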

@roystgnr
Contributor

I'd love it as src/apps/meshsplit.C in libMesh.

@permcody
Member

I agree with Roy. Let's put it in libMesh.


@friedmud
Contributor Author

Ok - I may do both ;-)

@friedmud
Contributor Author

@permcody we were WAY off on the parallel efficiency. You can't compare a ReplicatedMesh run at 240 procs to a DistributedMesh run at 240 procs.... you need to compare both to a truly serial run...

I just did that. On one processor this problem takes ~85,000 MB of RAM (written that way for easy computation)...

So... to add to the table I had earlier... here is the memory scaling efficiency:

                     Time    RAM/process   Memory Efficiency
Exodus               402s    ~6500 MB      5%
Parallel Checkpoint  12s     ~400 MB       89%

"Perfect" memory scaling would have been 354 MB. The 400MB number was kind of an "eyeball" average anyway... some of the processes were less than that... and a few were more (just looking at the head node for the job)

At any rate... our memory scaling efficiency is actually REALLY good. I'm going to be generating plots of this for my upcoming paper which will make it easier to see.

@permcody
Member

This is very interesting. I was just messing with you and I was only referring to timing. The memory scaling is actually more important here anyway.

So for the ReplicatedMesh run, I'm having a hard time understanding why one process would require 85 GB of RAM for the mesh but drop to 6.5 GB of RAM per process when we read it in on 240 procs. If it's replicated, shouldn't it be almost equal per process? We aren't considering the EQ objects or anything else, right? Is this just the memory required to hold the mesh? What tool are you using to measure memory usage?

@friedmud
Contributor Author

No - all of these numbers are the total memory for the process (which is what I care about).

@permcody
Member

Oh, OK. Since all of the pieces of the EQs are distributed, that would explain the huge difference between the parallel and serial cases. From a framework perspective, we really should care about both. This gives us more information about when DistributedMesh should be used versus ReplicatedMesh.

@friedmud
Contributor Author

I think in this case you can basically look at it like there is a ~6 GB (per process) overhead for using ReplicatedMesh when running on 240 processors.

That's a lot of overhead!

@YaqiWang
Contributor

I guess there are cases where ReplicatedMesh is necessary. I do want to make DistributedMesh the default though.

@permcody
Member

No - not the default, at least not universally. When running small meshes or on small numbers of processors, ReplicatedMesh should still be faster and just more robust. We'll start by possibly making it the default for larger problems, if anything.


@friedmud
Contributor Author

Agreed - definitely don't want Distributed to be the default. In particular... it requires extra steps to split your mesh before running. That's just unnecessary for 90% of the runs done with MOOSE.

@permcody
Member

It's possible to use distributed mesh if you don't split first. You'll lose the faster startup, but the nonlocal elements are deleted, saving you memory after startup.

@friedmud
Contributor Author

Depends on your problem. For most of my problems the initial memory spike kills my runs...

I really think the utility of DistributedMesh is only unlocked if you split first.

@permcody
Member

Really? I'm running 180^3 without any problems in ReplicatedMesh mode. Are you running meshes that much bigger? I think the Mesh structure is around 5-6 GB per process at that size.

Maybe down the road we should think about a live-splitter mechanism. It should be feasible to create a sub-communicator using maybe 10% of the total procs to read and split the mesh, then launch the run all online. Perhaps overkill, I don't know. With your new utility maybe it's not so much of a pain, but if you are running on a cluster, having to schedule two jobs to split and then run is not always convenient.
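At the MPI level, the sub-communicator part of that idea is cheap to sketch (this is purely illustrative, nothing like it exists in MOOSE yet, and the 10% split and the names are made up):

#include <mpi.h>

int main (int argc, char ** argv)
{
  MPI_Init (&argc, &argv);

  int rank, size;
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  MPI_Comm_size (MPI_COMM_WORLD, &size);

  // Roughly the first 10% of the ranks become "readers" (color 0);
  // everyone else (color 1) waits for their piece of the mesh.
  const int n_readers = (size + 9) / 10;
  const int color = (rank < n_readers) ? 0 : 1;

  MPI_Comm sub_comm;
  MPI_Comm_split (MPI_COMM_WORLD, color, rank, &sub_comm);

  if (color == 0)
    {
      // ... readers: read the mesh on sub_comm, partition it for
      //     "size" processors, and ship each rank its chunk ...
    }
  // ... all ranks: receive the local chunk and build a DistributedMesh ...

  MPI_Comm_free (&sub_comm);
  MPI_Finalize ();
  return 0;
}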

@friedmud
Contributor Author

friedmud commented Sep 24, 2016

At 6 GB you already can't utilize every core in a node. That's what I'm talking about. Yes, I can run my problems... but I waste a lot of cores. It's a pretty big issue when you're trying to run on ~10k cores (like one of the jobs I currently have queued). If I needed to reserve 15k-20k cores to actually use 10k... that sucks. With pre-splitting and DistributedMesh that's not the case.

@friedmud
Contributor Author

@roystgnr I'm running into something here.

I can't seem to partition the same Mesh object twice... do you think that should work?

It runs into this assert: https://github.com/libMesh/libmesh/blob/master/src/mesh/mesh_communication_global_indices.C#L785

The issue seems to be that I'm partitioning for more processors than I'm running on. The weird thing is that it works the first time... it only hits that assert the second time. I don't understand either part: why it works the first time, and/or why it doesn't the second time.

Is there something I should do to "clear" the mesh before re-partitioning it?

BTW: My purpose here is to avoid reading the same mesh multiple times (as that can take many minutes) while still being able to write out partitionings for multiple numbers of processors, i.e. read once but output partitionings for 24, 48, 96, etc. processors.

@friedmud
Contributor Author

friedmud commented Sep 25, 2016

@roystgnr I figured out a workaround... before each partitioning I "reset" the mesh by partitioning it for one processor. Like this:

for (auto n_procs : all_n_procs)
{
  // "Reset" the previous partitioning; partitioning onto a single
  // processor is cheap (MetisPartitioner has a branch just for it).
  partitioner.partition(mesh, 1);

  // Now partition for the real target processor count.
  partitioner.partition(mesh, n_procs);
  .....
}

The nice thing about resetting it to 1 processor is that the operation is fast (there is a branch specifically for it in MetisPartitioner).

So... kind of ugly... but still much faster than needing to do a separate run / read of the mesh to create multiple partitionings.

@permcody
Member

permcody commented Feb 2, 2017

Tagging @friedmud, @roystgnr. I'm assigning this ticket to the A&M Milestone. When Derek gets his PR up so that we can use Checkpoint meshes in MOOSE, we can close this ticket.

friedmud added a commit to friedmud/moose that referenced this issue Feb 3, 2017
friedmud added a commit to friedmud/moose that referenced this issue Feb 3, 2017
friedmud added a commit to friedmud/moose that referenced this issue Feb 5, 2017
friedmud added a commit to friedmud/moose that referenced this issue Feb 5, 2017
permcody pushed a commit to permcody/moose that referenced this issue Feb 15, 2017
permcody pushed a commit to permcody/moose that referenced this issue Feb 15, 2017
permcody pushed a commit to permcody/moose that referenced this issue Jun 12, 2017
permcody pushed a commit to permcody/moose that referenced this issue Jun 12, 2017
@roystgnr
Contributor

So, I'm pretty sure I have this working perfectly now, but I don't know how to properly test that. Ideally we want to have a test which:

  1. Runs splitter in serial on a serial mesh
  2. Runs a simple MOOSE app in parallel on the resulting split checkpoint mesh

We can't exactly add such a test to libMesh, since MOOSE apps aren't available there, though I suppose we could just run a libMesh example code instead. Can we do this easily in the MOOSE test harness? The "RunCommand" tester looks like it's flexible enough to call a python script, which could in turn run the splitter-mooseapp-exodiff sequence we'd like, but the only example I can see of that tester uses command='echo Hello World', which is less impressive.

@permcody
Member

Yes, you can absolutely run an arbitrary command. The Tester interface is basically two parts:

  • Return a command that will be scheduled to run asynchronously and possibly in parallel with other Testers.
  • Define a "post command" to determine success or failure using the results of the command in the first step. You are given the output (stdout + stderr, or all even all streams from all processors, along with any output files you want to open in the working directory).

That should be everything we need to describe the test. However, I suspect that you really won't need the second part as you'll define success or failure in the script itself and return the appropriate exit code (which we also look at).

roystgnr added a commit that referenced this issue Aug 17, 2017
Based heavily on @friedmud's test in #8472; adds test coverage for
issue #7752.
roystgnr added a commit that referenced this issue Aug 17, 2017
Based heavily on @friedmud's test in #8472; adds test coverage for
issue #7752, for both ASCII and binary CheckpointIO pre-split mesh
reads.
roystgnr added a commit to roystgnr/moose that referenced this issue Sep 8, 2017
Summary of changes in this update:
    Remove 'old' Makefiles from contrib and unit test directories.
    ASCII option, naming fixes for apps/splitter
    Small update to RBEIMConstruction
    Put sideset/nodeset names in CheckpointIO header
    Checkpoint N->M restart fixes
    Check for exact file extensions in NameBasedIO::is_parallel_file_format()
    Update TetGenIO code for reading .ele files.
    allgather(vector<string>) overload
    Store integer data type in CheckpointIO files

This is necessary to support idaholab#9782 in service of idaholab#9700, and to add
more robustness to support of idaholab#7752
jarons pushed a commit to jarons/moose that referenced this issue Oct 5, 2017
Based heavily on @friedmud's test in idaholab#8472; adds test coverage for
issue idaholab#7752, for both ASCII and binary CheckpointIO pre-split mesh
reads.
jarons pushed a commit to jarons/moose that referenced this issue Oct 5, 2017
Summary of changes in this update:
    Remove 'old' Makefiles from contrib and unit test directories.
    ASCII option, naming fixes for apps/splitter
    Small update to RBEIMConstruction
    Put sideset/nodeset names in CheckpointIO header
    Checkpoint N->M restart fixes
    Check for exact file extensions in NameBasedIO::is_parallel_file_format()
    Update TetGenIO code for reading .ele files.
    allgather(vector<string>) overload
    Store integer data type in CheckpointIO files

This is necessary to support idaholab#9782 in service of idaholab#9700, and to add
more robustness to support of idaholab#7752
@permcody
Member

Closing this issue, as it has been completed.
