Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow using dynamic reference to communicator #157

Merged
merged 10 commits into from
Nov 10, 2023

Conversation

Cydhra
Copy link
Contributor

@Cydhra Cydhra commented Apr 4, 2023

This should fix #148.
It is independent of #155, it should work (in its current unfinished state) without change if 155 is merged.

process_at_rank is currently not callable on dyn Communicator for reasons outlined in #148. I have an experimental commit in here that fixes this (building upon #155)

The same is true for split_by_subgroup_collective, and probably a few other that require Sized, for the same reason. I haven't gotten to that yet.

@jedbrown
Copy link
Contributor

Thanks for this, and thanks for your patience (I'm finally into summer mode and have more capacity to do Rust things).

Does this need to be Draft or is it ready from your perspective? Note the existence of tests/trait_object_safety_test.rs (from #113) is supposed to test object safety, but obviously was inadequate because of CommunicatorCollectives.

@Cydhra
Copy link
Contributor Author

Cydhra commented May 31, 2023

Does this need to be Draft or is it ready from your perspective?

I made this a draft because, while it does work, there are things left that do not (as outlined in the initial post).
It obviously compiles and can be used as is, there are just some functions that cannot be called on trait objects.

If #155 is merged, I have another commit that changes the Process API to implement Communicator as well, this way it is no longer generic and functions like process_at_rank can be used with trait objects as well. (I am unsure, however, if the change to Process is desired).

If you are happy to merge this in its incomplete state to at least partially support collective communication on trait objects, then it doesn't need to be a draft.

Note the existence of tests/trait_object_safety_test.rs (from #113) is supposed to test object safety, but obviously was inadequate because of CommunicatorCollectives.

This is green and untouched. Should I add a call to a collective function to it to make it more adequate?

@jedbrown
Copy link
Contributor

Sorry about the delays on my end. Would you like to pick this up now that #155 has merged?

@Cydhra
Copy link
Contributor Author

Cydhra commented Oct 20, 2023

I will gladly have another look at this in the following weeks

…to use trait-objects to make them object-safe
 - changed constructors for Process type
 - changed AnyProcess analogous
 - implemented Communicator for Process, such that AsCommunicator can be implemented without violating borrow checker
 - changed functions in Communicator trait that return processes to be object-safe
@Cydhra
Copy link
Contributor Author

Cydhra commented Oct 29, 2023

Changes so far:

  • I have provided the blanket implementation of CommunicatorCollectives for unsized Communicators
  • I have sealed the CommunicatorHandle enum from Remove type-distinction between system-communicators and user-communicators #155 in a crate-public module, because I needed it to be sealed for the following change, and, also, I don't think downstream crates should access that type anyway?
  • I have added a sealed trait AsHandle that must be implemented for all Communicator implementations, which returns the CommunicatorHandle instance that wraps the raw comm handle. I require this trait for the following change, and I enforce its implementation by extending the Communicator trait from it. Since it is sealed, it cannot be seen from outside the crate, hiding CommunicatorHandle from the public API (this has the side effect that downstream crates cannot implement Communicator. If this is undesired, the following changes must be reverted).
  • I changed Process to be non-generic (previously it had a Communicator parameter), and instead contain a CommunicatorHandle for the communicator from which it was created (instead of a reference to the Communicator itself). This is why I need the AsHandle trait. Since the handle is part of the Communicator instance, the lifetime of the Process is still bound to its Communicator, so nothing is unsound here.
  • Process now implements Communicator. This is probably my least favorite change of the API, because before, there was a strict separation between Communicator and Process. But Process implements AsCommunicator, which previously exposed the Communicator instance within the process. This is no longer possible, so instead AsCommunicator is now a no-op, which just returns the Process. Process emulates the underlying Communicator using its CommunicatorHandle.

Progress:

  • Add blanket implementation impl<C: Communicator + ?Sized> CommunicatorCollectives for C {} to allow collective communicator functions for Communicator trait-objects
  • Change functions of Communicator trait that take generic parameters of the Group trait to take dyn Group objects instead
  • Remove generic parameter from Process struct, and then change functions in Communicator trait that return Process instances. This will remove the Self: Sized requirement
  • Change functions of Communicator trait that take generic parameters of Attribute, so the Self: Sized requirement is lifted, or accept that those functions cannot be called on trait-objects
  • Change functions of Communicator trait that take generic parameters of Datatype, so the Self: Sized requirement is lifted, or accept that those functions cannot be called on trait-objects
  • Change functions of Communicator trait that take generic parameters of Buffer, so the Self: Sized requirement is lifted, or accept that those functions cannot be called on trait-objects
  • Decide if Group should also work with dyn Group objects (though if so, I'd extract that into a separate pull request)

Questions

  • I do not know what Attributes in MPI are, so I have a hard time deciding if get_attr should be made object-safe and if so, how.
  • I haven't looked into Buffer yet, but I feel like changing it, so it doesn't need to be a generic parameter to the Communicator functions that use it, is a huge change
  • I also haven't looked into Datatype yet.
  • All those changes (including the changes to Process) are only required, if the respective functions in the Communicator trait (e.g. this_process(), get_attr(..), etc) must be available for trait-objects. If it is okay for trait-objects to not offer some or all of those generic functions, no changes are required (and the changes to Process may be reverted). So the main question is, should I continue with those changes, and if so, which one of them?

Obviously, all of this are breaking changes.

@jedbrown
Copy link
Contributor

jedbrown commented Nov 3, 2023

This looks good so far.

I agree with sealing, though might have just moved the sealed stuff to a different file instead of adding nesting. For #32, we need to be able to be able to instantiate a communicator from a raw MPI_Comm (e.g., passed from FFI in a third-party library), but that doesn't require exposing the handle.

As a design question, should Process have a type-level distinction of its intra- vs inter-communicator status? (I'm honestly not sure that Process is a desirable concept. It doesn't exist in the C MPI interface and I don't know if it's providing any real safety. Like we use Communicator::process_at_rank, which just asserts that the rank is in range and (gasp as I look at it, Process::by_rank_unchecked isn't even unsafe). Meanwhile, there is no type system enforcement that send_receive_with_tags has matching source and destination communicators (just a run-time assertion). Getting rid of Process (or substantially changing what can be type checked) would have a lot of potential consequences so I would leave that for a separate PR where we can weigh the costs/benefits.

Attributes allow callers to attach data to an MPI_Comm. Suppose there is a function f(comm, g, data) implemented in libf with g(comm, data) implemented in libg. libf does not depend on libg or vice-versa, so comm must have type MPI_Comm (not a lib-specific newtype), but g might need to access persistent context (like a tag dispenser). Attributes offer that, which is a big deal for library interoperability. I see no reason why it shouldn't be object-safe, but it also need not be a priority.

You've perhaps come across DynBuffer and DynBufferMut, which are similar to &dyn Buffer. My inclination is to postpone object safety for Buffer and Datatype to another PR if indeed we have use cases for it.

@Cydhra
Copy link
Contributor Author

Cydhra commented Nov 4, 2023

I agree with sealing, though might have just moved the sealed stuff to a different file instead of adding nesting.

Done.


For #32, we need to be able to be able to instantiate a communicator from a raw MPI_Comm (e.g., passed from FFI in a third-party library), but that doesn't require exposing the handle.

This was already solved by #155, because the traits AsRaw and FromRaw are implemented for Communicators (the latter is not implemented for Process though).

Currently, the implementation asserts that the passed raw handle isn't a system handle (like ffi::RSMPI_COMM_WORLD), and simply creates a CommunicatorHandle::User handle from it. This means, the raw handle is freed when dropped.


As a design question, should Process have a type-level distinction of its intra- vs inter-communicator status?

I agree that this design is confusing. This confusion is amplified by this Pull Request, because Process now implements Communicator by delegation, and therefore behaves exactly like its original counterpart (but cannot be created from a raw handle). There is even a trait AsCommunicator which converts a Process back into a Communicator. Previously, this side-stepped the entire split in the type system, because MPI isn't designed with this distinction in mind. Now this sidestep is unnecessary, because Process is a Communicator, but the trait still exists for compatibility.

@Cydhra
Copy link
Contributor Author

Cydhra commented Nov 4, 2023

I am not sure how to make get_attr object-safe though:

fn get_attr<A: Attribute>(&self, key: A) -> Option<&<A as Attribute>::Target>

key's type could be changed to &dyn Attribute, if a new method is added to the Attibute trait:

pub trait Attribute: AsRaw<Raw = c_int> {
    type Target;

    /// Allocate memory for an attribute value.
    unsafe fn allocate() -> *mut Self::Target;
}

This way, the allocation inside get_attr can be done dynamically.

However, get_attr cannot define a return type anymore.

@jedbrown
Copy link
Contributor

jedbrown commented Nov 7, 2023

Hmm, I hadn't thought about it carefully. Could we make a method in impl dyn Communicator similar to Any::downcast_mut?

@Cydhra
Copy link
Contributor Author

Cydhra commented Nov 7, 2023

The return type still isn't statically computable anymore, as far as I understand. The function get_attr cannot be generic if it is object-safe. So what would we even cast into, if we cannot name the type of key?

@jedbrown
Copy link
Contributor

jedbrown commented Nov 7, 2023

If we want to use a typed Attribute with dyn Communicator, then impl dyn Communicator as with Any will work (look at how downcast_ref and downcast_mut are defined; it's a subtle point).

If in addition, we want to work with a dyn Attribute, then I think Target = dyn Any and the user needs to handle the downcast.

Feel free to leave this for a separate PR -- I think I can do it quickly and add a test.

@Cydhra Cydhra marked this pull request as ready for review November 7, 2023 15:21
@Cydhra
Copy link
Contributor Author

Cydhra commented Nov 7, 2023

In that case I'd like to defer this to a separate PR. Same with Groups, if you want them to work as trait-objects as well. Marked as ready for review

Copy link
Contributor

@jedbrown jedbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks. Are you okay with squash-merging or do you want to make the history atomic (at least squash the sealing with moving it to a separate file, and the legacy extern syntax into that example)?

examples/dyn_communicator.rs Outdated Show resolved Hide resolved
@Cydhra
Copy link
Contributor Author

Cydhra commented Nov 10, 2023

Are you okay with squash-merging or do you want to make the history atomic

I don't really mind. Do it as you want it in the history.

@jedbrown jedbrown merged commit d03d7f5 into rsmpi:main Nov 10, 2023
7 checks passed
@jedbrown
Copy link
Contributor

Thanks for your contribution!

Comment on lines +952 to +959
impl<'a> AsCommunicator for Process<'a> {
type Out = Self;

fn as_communicator(&self) -> &Self::Out {
self
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turns out to be a bug, in that process.as_communicator().rank() now returns the rank of process, not the rank of the current process on the communicator. One can call Communicator::rank(&process), but I think process.as_communicator() should behave like a communicator, not silently use the wrong rank() method.

Do you have thoughts about how to fix it? We could put an actual communicator back into Process? It could be comm: &'a dyn Communicator to avoid making it generic over Communicator or any other trivial wrapper around a CommunicatorHandle. AnyProcess<'a> could be abused for that purpose, but it seems decidedly unclean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty convinced that the PR didn't change any behavior regarding to this.
Previously Process looked like this:

pub struct Process<'a, C>
where
    C: 'a + Communicator,
{
    comm: &'a C,
    rank: Rank,
}

and the AsCommunicator trait was implemented to return comm. Now it returns self, because comm is no longer a Communicator but a CommunicatorHandle. Instead, Process implements Communicator itself, which in turn delegates all calls to the internal CommunicatorHandle. So if this bug exists, I am pretty sure it existed before this PR (I haven't tested this though, so maybe I am missing something).

Not sure what even causes this, because I am not sure how MPI differentiates between a Process handle and a Communicator handle. Is the differentiation only done by choosing different rank functions?

If so, maybe we could change the Communicator trait impl of Process by explicitly changing its rank implementation:

impl<'a> Communicator for Process<'a> {
    // ...

    fn rank(&self) -> Rank {
        Process::rank(&self)
    }
}

Copy link
Contributor

@jedbrown jedbrown Feb 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your reply and sorry I'm slow following up. The issue is that impl Process has a method called rank() referring to the rank of the designated process, while the rank() method on the Communicator trait returns the rank of self.

In the old code, process.as_communicator() was just a Communicator (no longer a Process) so there was only one rank() method. After your change here, it returned a Process (which implements Communicator, but prefers the method on Process instead of the trait method).

I resolved the issue in #177, though I'm not entirely happy with the solution. Part of me wants to remove/rename the Process::rank() method. There are also experimental systems in which a rank is not a process, so I'm not wild about the Process name. But these are design questions for another time/place.

Copy link
Contributor Author

@Cydhra Cydhra Feb 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see the problem now. Would it be confusing for users if the rank methods on both traits simply had different names (e.g. communicator_rank() and process_rank())? That way, it'd be clear which rank is called at any given point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, though comm.comm_rank() and process.process_rank() are somewhat tedious and I'm not sure it's better than the current use of AnyProcess, at least until we get around to making Rank stronger (e.g., a newtype instead of an alias to c_int). It's necessary to have a type that can be put in an array and sorted, though it would be nice for it to retain a branding associating it with the communicator on which it's valid.

I think the current Process is pure ergonomics and I have mixed feelings about its existence -- many/most real-world patterns will be calling comm.process_at_rank() and using it immediately in exactly one point-to-point operation, but then we need run-time assertions like you'll see in send_receive_with_tags to check that source and destination communicators match. In a sense, it is more error-prone (weaker compile-time guarantees) to use source_process, dest_process than comm, source_rank, dest_rank.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using dynamic reference to communicator
2 participants