Allow for alternate communicators in parallel objects #71

Closed
benkirk opened this Issue Apr 2, 2013 · 73 comments

Projects

None yet

6 participants

@benkirk
Member
benkirk commented Apr 2, 2013

OK, the time has come. In the subcell integration stuff I'm working I'd like to be able to do

  SerialMesh my_local_mesh(MPI_COMM_SELF);

This is a necessary step for being able to split an MPI communicator and execute physics A on one portion of the parallel resource and physics B on another.

My plan of attack is to begin with t he Mesh classes, since that is what I need now, and then propagate to the EquationSystems later.

Basically, the Mesh will have an optional constructor argument that takes a reference to a Parallel::Communicator, defaulting to CommWorld.

Once the mesh is done the EquationSystems will inherit its communicator from the Mesh, so that will be easy, just tedious.

@roystgnr, any other pitfalls I'm not thinking about?

In order to keep things like parallel_only() simple I may need to impose a standard naming convention, like have class_parallel_only() expand to

   #define class_parallel_only() do { \
    libmesh_assert(this->communicator().verify(std::string(__FILE__).size())); \
    libmesh_assert(this->communicator().verify(std::string(__FILE__))); \
    libmesh_assert(this->communicator().verify(__LINE__)); } while (0)

So here I assume that any class with a communicator object has a method communicator() that returns that object...

@benkirk benkirk was assigned Apr 2, 2013
@friedmud
Member
friedmud commented Apr 2, 2013

lol - oh god!

I just got all of this working in my own code... the thought of having to change all of that.... sigh. ;-)

Hopefully my hacks for swapping out the libMesh::CommWorld communicator will continue to work while you do this work ;-)

@benkirk
Member
benkirk commented Apr 2, 2013

Thanks for reminding me - I think it can if that is a design goal.

Basically each class will need a pointer to its own communicator rather than a reference if it is to be switched behind the scenes... Hmm...

-Ben

On Apr 2, 2013, at 11:20 AM, "Derek Gaston" <notifications@github.commailto:notifications@github.com> wrote:

lol - oh god!

I just got all of this working in my own code... the thought of having to change all of that.... sigh. ;-)

Hopefully my hacks for swapping out the libMesh::CommWorld communicator will continue to work while you do this work ;-)


Reply to this email directly or view it on GitHubhttps://github.com/libMesh/libmesh/issues/71#issuecomment-15785715.

@roystgnr
Member
roystgnr commented Apr 2, 2013

Is it a design goal to allow each class to switch its communicator behind the scenes? Derek needed to swap CommWorld in and out as a workaround to "fake" the effect of having different per-object communicators; once we actually have real per-object communicators I can't see under what conditions we'd want to swap them in and out.

@benkirk
Member
benkirk commented Apr 2, 2013

@roystgnr, what are your thoughts about Parallel::Communicator references vs. Communicator::duplicate() everywhere?

For example, ultimately should the EquationSystems store a reference to it's Mesh's communicator, or should it store its own that is simply setup via Communicator::duplicate()?

I can see benefits to duplicating - the integer space for message tags is then disjoint, but I don't have a feel for the cost any other downsides...

@roystgnr
Member
roystgnr commented Apr 2, 2013

I don't even see benefits to duplicating unless we're writing dangrously asynchronous code - if we have any synchronized points at all in an algorithm we ought to be able to use them to get_unique_tag() rather than tagging with magic numbers in the first place.

On the other hand, I don't see any downsides to duplicating either. AFAIK MPI communicators aren't supposed to have much overhead, and we don't add much more ourselves.

@roystgnr
Member
roystgnr commented Apr 2, 2013

My vote would be to store references rather than to duplicate. I'm happy either way if others disagree.

@friedmud
Member
friedmud commented Apr 2, 2013

Well, Jed Brown posted something about this on Google+ a while back:

"V1R2M0 makes MPI_Comm_dup use O(P) memory on each rank. Apparently the regression was caused by fixing something else in PAMI for POWER7... This is yet another reason for responsible management of MPI communicators."

Personally, I would say just have the EquationSystems object reference the MPI Comm from the Mesh. That way calling communicator() will return exactly the same object for all objects associated with that Mesh/ES.

@benkirk
Member
benkirk commented Apr 2, 2013

I'll go with references as a start then, easy enough to refactor later if needed.

@benkirk
Member
benkirk commented Apr 3, 2013

I've started down the implementation. This is useful for scoping the problem:

grep 'CommWorld\|libMesh::n_processors\|libMesh::processor_id' src/*/*.C include/*/*.h

I'm implementing a ParallelObject base class, which is going to make this very clean:

  class ParallelObject
  {
  public:

    /**
     * Constructor. Requires a reference to the communicator
     * that defines the object's parallel decomposition.
     */
    ParallelObject (const Parallel::Communicator &comm) :
      _communicator(comm)
    {}

    /**
     * Copy Constructor.
     */
    ParallelObject (const ParallelObject &other) :
      _communicator(other._communicator)
    {}

    /**
     * Destructor.  Virtual because we are a base class.
     */
    virtual ~ParallelObject () {};

    /**
     * @returns a reference to the \p Parallel::Communicator object
     * used by this mesh.
     */
    const Parallel::Communicator & communicator () const
    { return _communicator; }

    /**
     * @returns the number of processors used in the
     * current simulation.
     */
    processor_id_type n_processors () const
    { return libmesh_cast_int<processor_id_type>(_communicator.size()); }

    /**
     * @returns the subdomain id for this processor.
     */
    processor_id_type processor_id () const
    { return libmesh_cast_int<processor_id_type>(_communicator.rank()); }


  protected:

    const Parallel::Communicator &_communicator;
  };

The trick then is Mesh, DofMap, EquationSystems, etc... are all ParallelObjects and share the same communicator by construction.

So you can construct a mesh with its own communicator, and that gets propagated all the way down.

Then mostly a bunch of replacing:

  CommWorld                     --> this->communicator()
  libMesh::n_processors() --> this->n_processors()
  libMesh::processor_id()  --> this->processor_id()
@roystgnr
Member
roystgnr commented Apr 3, 2013

This looks good provisionally - we're not going to get any diamond
inheritance patterns out of this plan, right?

@benkirk
Member
benkirk commented Apr 3, 2013

Doesn't look like it.

@benkirk
Member
benkirk commented Apr 3, 2013

I'm working on this at
https://github.com/benkirk/libmesh/tree/communicators
if anyone is interested. Pretty tedious but straightforwad.

I've done the mesh and equation system stuff, next major hurtle will be linear algebra data structures.

@friedmud
Member
friedmud commented Apr 4, 2013

Ok - I'm going to jump in here. I'll see what I can get done tonight...

Derek

@benkirk
Member
benkirk commented Apr 4, 2013

Thanks, but no biggie if you want to hold off, I can probably wrap it up tomorrow...

On Apr 3, 2013, at 7:00 PM, "Derek Gaston" <notifications@github.commailto:notifications@github.com> wrote:

Ok - I'm going to jump in here. I'll see what I can get done tonight...

Derek


Reply to this email directly or view it on GitHubhttps://github.com/libMesh/libmesh/issues/71#issuecomment-15872966.

@friedmud
Member
friedmud commented Apr 4, 2013

As long as you're not working on it tonight (so we're not doubling up) - I'll continue the work tonight and then give you a pull request before I go to sleep.

@benkirk
Member
benkirk commented Apr 4, 2013

Awesome - I haven't touched any of the parallel linear algebra stuff, so in particular NumericVector and SparseMatrix being derived from ParallelObject would be great.

On Apr 3, 2013, at 7:05 PM, "Derek Gaston" <notifications@github.commailto:notifications@github.com> wrote:

As long as you're not working on it tonight (so we're not doubling up) - I'll continue the work tonight and then give you a pull request before I go to sleep.


Reply to this email directly or view it on GitHubhttps://github.com/libMesh/libmesh/issues/71#issuecomment-15873112.

@benkirk
Member
benkirk commented Apr 4, 2013

So it may look a little weird, but when NumericVector is derived from ParallelObject it's constructor will need either a parent ParallelObject or a Communicator.

That's where you provide a reference to the containing class (e.g. System) to make sure they share the same underlying communicator.

Thanks!

-Ben

On Apr 3, 2013, at 7:05 PM, "Derek Gaston" <notifications@github.commailto:notifications@github.com> wrote:

As long as you're not working on it tonight (so we're not doubling up) - I'll continue the work tonight and then give you a pull request before I go to sleep.


Reply to this email directly or view it on GitHubhttps://github.com/libMesh/libmesh/issues/71#issuecomment-15873112.

@friedmud
Member
friedmud commented Apr 4, 2013

Yep - I got it figured out. I put in a Pull Request for you (on your Fork).

I only got NumericVectors worked out... damn it's fiddly. It's tough to ferret out all the places where these objects get built and make sure a communicator is getting passed in!

One thing that's not great with my patches is the way I deal with PetscVector::PetscVector(Vec) and EpetraVector::EpetraVector(Epetra_Vector)... those constructors that take native objects and create libMesh objects out of them.

You can technically get the MPI_Comm from both Petsc Vecs and Epetra_Vectors... but I couldn't see a great clean way to get those out and create a Communicator from them on the fly in the constructor (to pass down to NumericVector and ultimately to ParallelObject). I'm sure it can be done (you could do it with a function call for instance... but I didn't know about how the memory would be handled)... but for now I just made them take a Communicator just like the other constructors. Which means you have to be careful when you're creating a libMesh NumericVector from a native object....

Speaking of that, there are two places where we do just that that I couldn't fix up: callbacks for ShellMatrix and Preconditioners. Those aren't done yet because those objects need to be turned into ParallelObjects so we can get the Communicator from them to pass into the NumericVector (see the above paragraph ;-). I did put commented out stubs in there to remind us that that needs to be done.

Anyway... hopefully I didn't hork it up too much. Sorry it wasn't more... it took a bit longer than I thought ;-)

I can do Matrices tomorrow morning if you want. Just let me know.

@friedmud
Member
friedmud commented Apr 4, 2013

One more thing... what do you think about being able to do:

./configure --disable-default-communicator

??

It would definitely be useful in tracking down all the places in an existing code where we're accidentally passing in CommWorld and we don't mean to.... not too mention all the places in libMesh itself! ;-)

@benkirk
Member
benkirk commented Apr 4, 2013

I was thinking about that... My only worry is all the places we are using it as a default constructor argument - we'll have to change those lines of code conditionally.

Would you then propose something like

#define LIBMESH_DEFAULT_COMM_OBJECT =libMesh::CommWorld

...

explicit
  NumericVector (const ParallelType ptype = AUTOMATIC,
                 const Parallel::Communicator &comm LIBMESH_DEFAULT_COMM_OBJECT);

Should be doable. The other bigge will be fixing parallel_only() to take a communicator.

@roystgnr
Member
roystgnr commented Apr 4, 2013

Let's add a new libmesh_parallel_only() which takes a communicator?
That'll help avoid name clashes in the future while still preserving
backwards compatibility for the original macro name for a while.

For testing purposes, maybe we could optionally initialize CommWorld
to be a Communicator subclass that can be duplicated properly but that

screams and dies if anyone tries to use it for anything else??

Roy

@benkirk
Member
benkirk commented Apr 4, 2013

On Apr 4, 2013, at 8:58 AM, roystgnr notifications@github.com wrote:

Let's add a new libmesh_parallel_only() which takes a communicator?
That'll help avoid name clashes in the future while still preserving
backwards compatibility for the original macro name for a while.

Sounds good. Also, anything that is a ParallelObject should have

parallel_object_only();

which relies on this->communicator() working, but I haven't started the replacement yet.

For testing purposes, maybe we could optionally initialize CommWorld
to be a Communicator subclass that can be duplicated properly but that
screams and dies if anyone tries to use it for anything else??

Hmm… something like RemoteElem? Good idea.

In that case ParallelObject would need virtual functions, which is fine but I'd say since it is so short we should make two implementations depending on this option, lest this->communicator() and this->processor_id() (which are everywhere) become virtual function calls all the time.

-Ben

@friedmud
Member
friedmud commented Apr 4, 2013

I was thinking something simpler for --disable-default-comm

Just make libMesh::defaultComm(). It normally will return libMesh::CommWorld but if you use --disable-default-comm then it just throws an error.

That way all of the constructors will look like:

Object(Stuff stuff, const Parallel::Communicator &comm = libMesh::defaultComm())

That's not too fiddly. Sure, it doesn't give us compile time errors - but in general we won't be able to do that anyway (because we are adding "comm" as the last argument to a lot of functions that already have default arguments... so we can't just omit the default for comm).

At least this way we won't ever silently be getting a CommWorld slipped in when we're trying to work on a sub-communicator.

@benkirk
Member
benkirk commented Apr 4, 2013

Ahh, that makes a lot of sense. I was trying to figure out the best way to comment out libMesh::CommWorld for testing purposes!

BTW, I'm well in to the matrix stuff ATM, and thanks for last night!

(uhh....)

@friedmud
Member
friedmud commented Apr 4, 2013

lol - anytime ;-)

@roystgnr
Member
roystgnr commented Apr 4, 2013

Do we really need ParallelObject to wrap n_processors() etc.? Why not just make the member reference a bit terser? "this->comm->n_processors()" isn't too much worse than "libMesh::n_processors()", and we're going to need to be doing "this->comm->send()" etc. anyway so we might as well give that idiom as much brevity as the stupid C++ dependent-name-lookup rules allow.

@friedmud
Member
friedmud commented Apr 4, 2013

I actually agree Roy. In MOOSE I believe I'm going to name the member _comm... so that our users can do _comm->send(), etc.

this->comm->n_processors() is not bad at all. But for some reason this->communicator()->send() feels like it's too long ;-)

@roystgnr
Member
roystgnr commented Apr 4, 2013

I've come up with about three different alternatives now to Derek's "libMesh::defaultComm()" idea, none of which are as good as his, which is evidence that he's found the right way to proceed. But there's still something that bothers me: if we don't do something to make "CommWorld.whatever()" unsafe to use, then our --disable-default-comm option will only be debugging constructions with missing communicator arguments, and won't also be debugging inappropriate CommWorld uses. Granted, we can hunt for inappropriate CommWorld uses with "grep" too, but it would be nice to have something that's more natural to toss onto our list of regression test configurations and that automatically checks application-level code too.

@friedmud
Member
friedmud commented Apr 4, 2013

If we implement libMesh::default_comm() then when we use --disable-default-comm we can comment out libMesh::CommWorld all together.

We should never be calling libMesh::CommWorld anywhere in the library anymore... everything should use libMesh::default_comm().

@benkirk
Member
benkirk commented Apr 4, 2013

No preference.

But ATM it would have to be comm().rank() and comm.size()

Personally I like the clarity of n_processors() and processor_id(),
but that's just a preference.

-Ben

On Apr 4, 2013, at 10:22 AM, "Derek Gaston" <notifications@github.commailto:notifications@github.com> wrote:

I actually agree Roy. In MOOSE I believe I'm going to name the member _comm... so that our users can do _comm->send(), etc.

this->comm->n_processors() is not bad at all. But for some reason this->communicator()->send() feels like it's too long ;-)


Reply to this email directly or view it on GitHubhttps://github.com/libMesh/libmesh/issues/71#issuecomment-15903840.

@friedmud
Member
friedmud commented Apr 4, 2013

BTW - I'm not against leaving n_processors() and processor_id() where they are in ParallelObject...

@pbauman
Member
pbauman commented Apr 4, 2013

On Thu, Apr 4, 2013 at 10:36 AM, Benjamin S. Kirk
notifications@github.comwrote:

Personally I like the clarity of n_processors() and processor_id(),
but that's just a preference.

FWIW, +1.

@roystgnr
Member
roystgnr commented Apr 4, 2013

I think "communicator methods all use the communicator" is clearer
than "some communicator methods use the communicator, others don't".
But that's not a strong preference, and anyway it sounds like I'm
outvoted.

@roystgnr
Member
roystgnr commented Apr 4, 2013

If everything should use libMesh::default_comm(), then we almost might as well rename "default_comm()" to "CommWorld", no? If we make its --disable-default-comm implementation be some class with an implicit conversion-to-Communicator operator (which throws an error) then we don't even need parentheses or virtual functions to use it as a default argument to methods expecting a Communicator&.

@friedmud
Member
friedmud commented Apr 4, 2013

Sounds fine to me!

@friedmud
Member
friedmud commented Apr 4, 2013

Actually - no it doesn't. Because you wouldn't get the error until you try to use it. I want the error to happen right when we try to assign it.

@roystgnr
Member
roystgnr commented Apr 4, 2013

The one catch is that we then need some way for standard only-playing-with-the-world-communicator applications to initialize their stuff; "Mesh(CommWorld);" would break. Is LibMeshInit a ParallelObject yet? If so then we could just use the same API to grab its communicator.

@roystgnr
Member
roystgnr commented Apr 4, 2013

The implicit conversion-to-Communicator operator would be called on assignment; it would throw a run-time error when that happened. You'd get a compile-time error for any other attempt to use CommWorld as a communicator.

@friedmud
Member
friedmud commented Apr 4, 2013

Hmmm.... as long as the stack trace is clean and shows where the problem is... I'm ok with that.

@friedmud
Member
friedmud commented Apr 4, 2013

Why would Mesh(CommWorld) fail? It would only fail in the case that you use --disable-default-comm right? That's what we want, right?

@jwpeterson
Member

Testing to see if "at-tags" will help issue emails make it through to @libmesh-devel

@roystgnr
Member
roystgnr commented Apr 4, 2013

I'll put together a "FakeCommunicator" class implementation sometime today or tonight, then, and we can try it out. The stack trace should work just fine.

We do want Mesh(CommWorld) to fail in the --disable-default-comm case, and we also want Mesh() to fail... but we want some way to create a mesh without failing. My proposal was "Mesh(libmeshinit.communicator());" Expresses what we'd want to do when creating a mesh in main(), but because LibMeshInit objects usually aren't in scope anywhere else it'd be hard for app codes to "abuse" and thereby inadvertently make themselves less extensible to the multi-communicator case.

@roystgnr
Member
roystgnr commented Apr 4, 2013

What's the proper git procedure for this? Create a roystgnr/libmesh/fake_comm branch, regularly "pull --rebase" from benkirk/libmesh/communicators, then set up a pull request back to benkirk/libmesh/communicators when I'm done?

@libmesh-devel

On Apr 4, 2013, at 11:07 AM, roystgnr notifications@github.com wrote:

What's the proper git procedure for this? Create a roystgnr/libmesh/fake_comm branch, regularly "pull --rebase" from benkirk/libmesh/communicators, then set up a pull request back to benkirk/libmesh/communicators when I'm done?

Yes, I think so. Pretty sure Derek forked from my branch and then sent me a pull request when it was done?

@friedmud
Member
friedmud commented Apr 4, 2013

I didn't fork your repo (in the Github sense).

In my local repo that is cloned from my libMesh fork I added Ben's libMesh fork as a remote:

git remote add ben git@github.com:benkirk/libmesh.git

Then fetched to get Ben's branches:

git fetch ben

Then created a local tracking branch of his communicators branch:

git co -tb communicators ben/communicators

Then you can push that branch to your own libMesh fork repo on Github by doing:

git push origin communicators

(Note: That assumes that your clone was made from your own libMesh fork... so "origin" here is referring to your own libMesh fork on Github)

Then issue a pull request to Ben's repo and branch here:

https://github.com/libMesh/libmesh/pull/new/benkirk:communicators...master

The only weirdness there is that you have to go to the main libMesh/libmesh repo page and hit "Pull Request" to do a pull request to Ben's fork of the libmesh repo. There is no "Pull Request" button on Ben's repo because you never technically forked Ben's repo. But, the pull request page for the main repo allows you to select that you want to do a pull request to any of the forked repos (including Ben's).

One last thing.... I never issue pull requests for branches I'm actively working in. So what I actually did, is when I was ready to give Ben back my changes I made another branch from my local tracking branch and actually pushed that to GitHub instead. Like so:

/// Currently in my "communicators" branch that I made like above
git co -b new_stuff_for_ben
git push origin new_stuff_for_ben

Then issue the pull request from new_stuff_for_ben to Ben's communicators branch.

@friedmud
Member
friedmud commented Apr 4, 2013

@roystgnr I was actually thinking that anyone working with --disable-default-comm would actually create their own Communicator object from MPI_COMM_WORLD (or whatever they want to use) and pass that in to Mesh. ie they would literally do that in main()

I don't think we want to do anything for you if you use --disable-default-comm.... that should be a very pedantic mode.

BTW - I'm planning to actually use that mode as the default for MOOSE. MOOSE now relies on the ability to run on sub-communicators.... so we never want to accidentally pick up COMM_WORLD.

@benkirk
Member
benkirk commented Apr 4, 2013

On Apr 4, 2013, at 11:50 AM, Derek Gaston notifications@github.com wrote:

I didn't fork your repo (in the Github sense).

In my local repo that is cloned from my libMesh fork I added Ben's libMesh fork as a remote:

If y'all would prefer I can instead push my branch to libMesh instead of my own space so we can all write to it, but that seems pretty svn'ish of me...

@friedmud
Member
friedmud commented Apr 4, 2013

Nah - this is really the right way to work with it. This is the whole point of "distributed" VC...

@benkirk
Member
benkirk commented Apr 4, 2013

On Apr 4, 2013, at 12:38 PM, Derek Gaston notifications@github.com wrote:

Nah - this is really the right way to work with it. This is the whole point of "distributed" VC...

Back to the whole DefaultCommunicator business, though -

If we do it right, shouldn't we be able to remove the default constructor argument from everywhere except Mesh, and it should all work by design, right? The constant reference inside each ParallelObject is forced to be initialized by the containing object, and that unwinds all the way to the top mesh.

So the derived meshes (Serial, Parallel) are the only place where we refer to any sort of default communicator, and the rest falls through. Ok, and the e.g.

NumericVector<>::build() functions as well, those are the only places we should need to provide a default communicator, and only for backwards compatibility.

Basically I'm just verbalizing that right now there are way more instances of

(const libMesh::Parallel::Communicator& = libMesh::CommWorld )

than are actually required, thereby increasing the likelihood for mistakes.

-Ben

@roystgnr
Member
roystgnr commented Apr 4, 2013

Email reply on this ticket just broke for me... is John still futzing with things?

Pasting to the web form:

Ben, your claim intrigues and confuses me.

To be able to use multiple communicators, we need to be able to pass
non-default communicators in to ParallelObjects, except for
ParallelObjects whose constructors already all require another
ParallelObject subclass (e.g. EquationSystems always gets built with a
Mesh) or whose parallel_only() methods already all require a
ParallelObject subclass or something with a handle to one (e.g.
EstimateError subclasses always get handed a System to work with)

To keep this from breaking old code, this argument has to have a
default value.

What am I missing?

Am I just agreeing with you unintentionally, because it turns out that most of our classes do get passed other ParallelObjects before they do any parallel work?

@jwpeterson
Member

On Thu, Apr 4, 2013 at 12:23 PM, roystgnr notifications@github.com wrote:

Email reply on this ticket just broke for me... is John still futzing with
things?

Nope!

Pasting to the web form:

Ben, your claim intrigues and confuses me.

To be able to use multiple communicators, we need to be able to pass
non-default communicators in to ParallelObjects, except for
ParallelObjects whose constructors already all require another
ParallelObject subclass (e.g. EquationSystems always gets built with a
Mesh) or whose parallel_only() methods already all require a
ParallelObject subclass or something with a handle to one (e.g.
EstimateError subclasses always get handed a System to work with)

To keep this from breaking old code, this argument has to have a
default value.

What am I missing?

Am I just agreeing with you unintentionally, because it turns out that
most of our classes do get passed other ParallelObjects before they do any
parallel work?

John

@roystgnr
Member
roystgnr commented Apr 4, 2013

Derek, I assume it's okay if --disable-default-comm-world also causes runtime libmesh_error()s in opt mode when something tries to use the default CommWorld, not just devel/debug modes?

@roystgnr
Member
roystgnr commented Apr 4, 2013

Ben, try running the reduced basis examples from your current communicators branch? I'm seeing a failure to find RBEvaluation::RBEvaluation().

@roystgnr
Member
roystgnr commented Apr 4, 2013

Anyone mind if I get rid of the non-LibMeshInit init()/close() methods? They've been marked as libmesh_deprecated() for years now.

@benkirk
Member
benkirk commented Apr 4, 2013

On Apr 4, 2013, at 1:23 PM, roystgnr notifications@github.com wrote:

To keep this from breaking old code, this argument has to have a
default value.

What am I missing?

Am I just agreeing with you unintentionally, because it turns out that most of our classes do get passed other ParallelObjects before they do any parallel work?

I think that is the case.

Specifically, we just committed some PetscVector code with a default constructor, but that should not be required because we should always be using NumericVector<>::build(). So in that case the only place CommWorld should appear is as the default argument to build() - not PetscVector.

Basically I want to remove the default constructor spec from all base classes, and all classes not intended to be directly instantiated from user code. So anything with a ::build() method.

@benkirk
Member
benkirk commented Apr 4, 2013

On Apr 4, 2013, at 2:24 PM, roystgnr notifications@github.com wrote:

Anyone mind if I get rid of the non-LibMeshInit init()/close() methods? They've been marked as libmesh_deprecated() for years now.

That'd be great.

@friedmud
Member
friedmud commented Apr 4, 2013
  1. Agree with @benkirk on removing default arguments.
  2. @roystgnr Yes! I would like runtime errors no matter what METHOD.
@roystgnr
Member
roystgnr commented Apr 4, 2013

I'm suspicious of the phrase "not intended to be directly instantiated
from user code". I'd never use anything but NumericVector::build()
myself, but I'd always presumed we also wanted to support application
codes deliberately selecting potentially-non-default algebra types.

@benkirk
Member
benkirk commented Apr 4, 2013

Well we do, through the optional argument to NumericVector::build().
But you're right we don't force the issue by not hiding the constructor.

Anyway, for example, I'm leaving the default out of the MeshfreeInterpolation base class constructor for this reason, it helps catch all the cases of derived classes accidentally calling the base class default constructor - which should never happen.

@benkirk
Member
benkirk commented Apr 4, 2013

Btw, I've got a bunch of incoming changes too but no Internet until tonight, in case anyone is pulling from my branch...

On Apr 4, 2013, at 2:31 PM, "roystgnr" <notifications@github.commailto:notifications@github.com> wrote:

I'm suspicious of the phrase "not intended to be directly instantiated
from user code". I'd never use anything but NumericVector::build()
myself, but I'd always presumed we also wanted to support application
codes deliberately selecting potentially-non-default algebra types.


Reply to this email directly or view it on GitHubhttps://github.com/libMesh/libmesh/issues/71#issuecomment-15918524.

@roystgnr
Member
roystgnr commented Apr 4, 2013

Your changes are almost certainly going to cause me merge conflicts,
assuming they include fixes for some of the same bugs I've fixed. Que
será.

@benkirk
Member
benkirk commented Apr 5, 2013
grep 'CommWorld\|libMesh::n_processors\|libMesh::processor_id\|libMesh::COMM_WORLD' src/*/*.C > listC.txt
grep 'CommWorld\|libMesh::n_processors\|libMesh::processor_id\|libMesh::COMM_WORLD' include/*/*.h  | grep -v "Parallel::Communicator &" > listh.txt 

making serious progress - almost ready for testing. AFAICT the only remaining instances of global communicator related cruft are in places they might be appropriate, and some asserts.

gotta get something other than COMM_WORLD inside DMRegister() (I think), but otherwise looking good...

@roystgnr
Member
roystgnr commented Apr 5, 2013

Did you get a chance to pull in my FakeCommunicator bits yet?

@benkirk
Member
benkirk commented Apr 5, 2013

On Apr 5, 2013, at 10:38 AM, roystgnr notifications@github.com wrote:

FakeCommunicator bits yet?

Crap - didn't see that request, will try to pull in a bit after a 10:45 meeting...

@benkirk
Member
benkirk commented Apr 5, 2013

On Apr 5, 2013, at 10:39 AM, "Benjamin S. Kirk" benjamin.kirk@nasa.gov wrote:

On Apr 5, 2013, at 10:38 AM, roystgnr notifications@github.com wrote:

FakeCommunicator bits yet?

Crap - didn't see that request, will try to pull in a bit after a 10:45 meeting...

It is in there now. I'm in the process now of finding all the places we accidentally called a default constructor and fixing them.

I

@roystgnr
Member
roystgnr commented Apr 5, 2013

--disable-default-comm-world is working for that, then? Great! I
thought I had it in working order, but when I tried testing it there
were still too many compile-time using-globals errors that it caught,
and I didn't have time to dig through those and make sure it was also
catching the run-time using-default-argument errors.

@roystgnr
Member
roystgnr commented Apr 5, 2013

Let's try and get this ready to merge back to master soon, but in the
longer term, should we enable turning those run-time errors into
compile-time errors? It'd be easy to define a macro called something
like LIBMESH_DEFAULT_COMMWORLD_ARG macro that evaluates to "=
CommWorld" by default but to "" whenever someone requests
--disable-default-comm-world. Then any code relying on the default
arguments would fail to compile, not just fail if/when it managed to
hit that constructor.

The only downside I see is documentation; every single constructor
would require some additional comment explaining what that confusing
macro meant.

@benkirk
Member
benkirk commented Apr 5, 2013

On Apr 5, 2013, at 12:16 PM, roystgnr notifications@github.com wrote:

--disable-default-comm-world is working for that, then? Great! I
thought I had it in working order, but when I tried testing it there
were still too many compile-time using-globals errors that it caught,
and I didn't have time to dig through those and make sure it was also
catching the run-time using-default-argument errors.

What I did ATM is to remove all default constructor arguments that referenced COMM_WORLD:

AdjointResidualErrorEstimator(const Parallel::Communicator &comm /* = libMesh::CommWorld */);

This has helped find at compile time a lot of places we were inadvertently calling the default constructor, which I think is a better solution until waiting for a runtime trip, especially for rarely used code.

So I have done that and got the entire library to compile, and will turn on all the default arguments needed to get the examples to run unchanged.

-Ben

@benkirk
Member
benkirk commented Apr 5, 2013

On Apr 5, 2013, at 12:22 PM, roystgnr notifications@github.com wrote:

Let's try and get this ready to merge back to master soon, but in the
longer term, should we enable turning those run-time errors into
compile-time errors? It'd be easy to define a macro called something
like LIBMESH_DEFAULT_COMMWORLD_ARG macro that evaluates to "=
CommWorld" by default but to "" whenever someone requests
--disable-default-comm-world. Then any code relying on the default
arguments would fail to compile, not just fail if/when it managed to
hit that constructor.

The only downside I see is documentation; every single constructor
would require some additional comment explaining what that confusing
macro meant.

Sounds good. So long as the macro is sufficiently documented where it's defined I think we can let the per-class documentation slide.

I'm on a tear and will be ready to commit these constructor fixes, at which point I think we can replace the commented-out default with a suitably defined macro.

-Ben

@benkirk
Member
benkirk commented Apr 5, 2013

The only outstanding problem are constructors like

NumericVector 
(const numeric_index_type N,
 const numeric_index_type n_local,
 const std::vector<numeric_index_type>& ghost,
 const ParallelType ptype = AUTOMATIC,
 const Parallel::Communicator &comm=libMesh::CommWorld);

where we can't remove the default without changing the constructor args...

@roystgnr
Member
roystgnr commented Apr 5, 2013

I think the only things we can do for constructors with pre-existing
default arguments are either break them (put the Communicator before
the other default arguments) or overload them (leave the existing
function signature as-is, except that we'll eventually ifdef it out in
the --disable-default-comm-world case, and add a new constructor with
the communicator before the other default arguments). I'd vote for
the latter option.

@benkirk
Member
benkirk commented Apr 5, 2013
grep 'CommWorld\|libMesh::n_processors\|libMesh::processor_id\|libMesh::COMM_WORLD\|Communicator_World' src/*/*.C 
grep 'CommWorld\|libMesh::n_processors\|libMesh::processor_id\|libMesh::COMM_WORLD\|Communicator_World' include/*/*.h  | grep -v "Parallel::Communicator &"

produces

src/base/libmesh.C:Parallel::FakeCommunicator CommWorld;
src/base/libmesh.C:Parallel::FakeCommunicator& Parallel::Communicator_World = CommWorld;
src/base/libmesh.C:Parallel::Communicator CommWorld;
src/base/libmesh.C:Parallel::Communicator& Parallel::Communicator_World = CommWorld;
src/base/libmesh.C:      libMesh::COMM_WORLD = this->comm.get();
src/base/libmesh.C:      Parallel::Communicator_World = COMM_WORLD_IN;
src/base/libmesh.C:      //MPI_Comm_set_name (libMesh::COMM_WORLD, "libMesh::COMM_WORLD");
src/base/libmesh.C:   MPI_Errhandler_set(libMesh::COMM_WORLD, libmesh_errhandler);
src/base/libmesh.C:  Parallel::Communicator communicator(libMesh::COMM_WORLD);
src/base/libmesh.C:      PETSC_COMM_WORLD = libMesh::COMM_WORLD;
src/base/libmesh.C:             CHKERRABORT(libMesh::COMM_WORLD,ierr);
src/base/libmesh.C:          CHKERRABORT(libMesh::COMM_WORLD,ierr);
src/base/libmesh.C:          CHKERRABORT(libMesh::COMM_WORLD,ierr);
src/base/libmesh.C:        libMesh::processor_id());
src/base/libmesh.C:  if (libMesh::processor_id() != 0)
src/base/libmesh.C:  Parallel::Communicator communicator(libMesh::COMM_WORLD);
src/base/libmesh.C:        MPI_Abort(libMesh::COMM_WORLD,1);
src/base/libmesh.C:      Parallel::Communicator_World.clear();
src/base/libmesh.C:      MPI_Comm_free (&libMesh::COMM_WORLD);
src/geom/node.C:                 processor_id < libMesh::n_processors());
src/solvers/petsc_dm_nonlinear_solver.C:    ierr = DMRegister(DMLIBMESH, PETSC_NULL, "DMCreate_libMesh", DMCreate_libMesh); CHKERRABORT(libMesh::COMM_WORLD,ierr);
src/utils/perf_log.C:      if (libMesh::n_processors() > 1)
src/utils/perf_log.C:     pid_stream     << "| Processor id:   " << libMesh::processor_id();
src/utils/perf_log.C:     nprocs_stream  << "| Num Processors: " << libMesh::n_processors();
include/base/libmesh_base.h:libMesh::processor_id_type libMesh::n_processors()
include/base/libmesh_base.h:libMesh::processor_id_type libMesh::processor_id()
include/base/libmesh_common.h:#define libmesh_here()     do { libMesh::err << "[" << libMesh::processor_id() << "] " << __FILE__ << ", line " << __LINE__ << ", compiled " << __DATE__ << " at " << __TIME__ << std::endl; } while (0)
include/base/libmesh_common.h:#  define libmesh_stop()     do { if (libMesh::n_processors() == 1) { libmesh_here(); libMesh::out << "Stopping process " << getpid() << "..." << std::endl; std::raise(SIGSTOP); libMesh::out << "Continuing process " << getpid() << "..." << std::endl; } } while(0)
include/base/libmesh_common.h:#  define libmesh_stop()     do { if (libMesh::n_processors() == 1) { libmesh_here(); libMesh::out << "WARNING:  libmesh_stop() does not work without the <csignal> header file!" << std::endl; } } while(0)
include/base/libmesh_common.h:  #define libmesh_write_traceout()   do { std::stringstream outname; outname << "traceout_" << libMesh::processor_id() << '_' << getpid() << ".txt"; std::ofstream traceout(outname.str().c_str()); libMesh::print_trace(traceout); } while(0)
include/base/libmesh_common.h:#define libmesh_error()    do { if (libMesh::n_processors() == 1) libMesh::print_trace(); libmesh_here(); LIBMESH_THROW(libMesh::LogicError()); } while(0)
include/base/libmesh_common.h:#define libmesh_error_msg(msg)    do { if (libMesh::n_processors() == 1) libMesh::print_trace(); libmesh_here(); libMesh::err << msg << std::endl; LIBMESH_THROW(libMesh::LogicError()); } while(0)
include/base/libmesh_common.h:#define libmesh_not_implemented()    do { if (libMesh::n_processors() == 1) libMesh::print_trace(); libmesh_here(); LIBMESH_THROW(libMesh::NotImplemented()); } while(0)
include/base/libmesh_common.h:#define libmesh_not_implemented_msg(msg)    do { if (libMesh::n_processors() == 1) libMesh::print_trace(); libmesh_here(); libMesh::err << msg << std::endl; LIBMESH_THROW(libMesh::NotImplemented()); } while(0)
include/base/libmesh_common.h:#define libmesh_file_error(filename)    do { if (libMesh::n_processors() == 1) libMesh::print_trace(); libmesh_here(); LIBMESH_THROW(libMesh::FileError(filename)); } while(0)
include/base/libmesh_common.h:#define libmesh_file_error_msg(filename, msg)    do { if (libMesh::n_processors() == 1) libMesh::print_trace(); libmesh_here(); libMesh:err << msg << std::endl; LIBMESH_THROW(libMesh::FileError(filename)); } while(0)
include/geom/elem.h:                   libMesh::n_processors() ||
include/parallel/parallel.h:    libmesh_assert(CommWorld.verify(std::string(__FILE__).size())); \
include/parallel/parallel.h:    libmesh_assert(CommWorld.verify(std::string(__FILE__))); \
include/parallel/parallel.h:    libmesh_assert(CommWorld.verify(__LINE__)); } while (0)
include/parallel/parallel.h:    libmesh_assert(CommWorld.verify(std::string(__FILE__).size(), comm_arg)); \
include/parallel/parallel.h:    libmesh_assert(CommWorld.verify(std::string(__FILE__), comm_arg)); \
include/parallel/parallel.h:    libmesh_assert(CommWorld.verify(__LINE__), comm_arg); } while (0)
include/parallel/parallel.h:  // FakeCommunicator for debugging inappropriate CommWorld uses
include/parallel/parallel.h:  extern Parallel::FakeCommunicator CommWorld;
include/parallel/parallel.h:  extern Parallel::Communicator CommWorld;
include/parallel/parallel_implementation.h: * deprecated - instead of libMesh::Parallel::Communicator_World use
include/parallel/parallel_implementation.h: * libMesh::CommWorld
include/parallel/parallel_implementation.h:extern FakeCommunicator& Communicator_World;
include/parallel/parallel_implementation.h:extern Communicator& Communicator_World;
include/parallel/parallel_implementation.h:inline void barrier (const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                   const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                   const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                   const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                      const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                      const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                     const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                  const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                  const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                  const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                  const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                               const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                               const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                              const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                              const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                       const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                     const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                       const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                     const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                                  const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                                  const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                                 const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                                 const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                         const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                                      const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                         const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                   const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                   const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                      const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                      const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                                    const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                     const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                      const Communicator &comm = Communicator_World)
include/parallel/parallel_implementation.h:                                    const Communicator &comm = Communicator_World)

I think we are good. the only leftovers are in e.g. libmesh_error(); and friends, which AFAICT need some sort of global access to the processor id.

In node.C for example that one is an assert, and worth leaving I think even if it is less effective when running on a subset.

I'm mentally fried - I cannot look at another constructor until Monday.

I'm going to let this branch sit over the weekend.

@roystgnr if you want to pick up the torch with FakeCommunicator, @friedmud please make sure I didn't break DTKSolutionTransfer...

Notice that all default constructor arguments not required to get the examples to run are still commended out, so applications may break ATM, but I'd like to leave them commented out until we're sure the library implementation is clean.

@benkirk benkirk referenced this issue Apr 5, 2013
Merged

Communicators #76

@roystgnr
Member

Closing this issue; Pull Request #76 pretty much handled everything.

@roystgnr roystgnr closed this Jul 24, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment