Allow using dynamic reference to communicator #157

Cydhra · 2023-04-04T22:52:29Z

This should fix #148.
It is independent of #155, it should work (in its current unfinished state) without change if 155 is merged.

process_at_rank is currently not callable on dyn Communicator for reasons outlined in #148. I have an experimental commit in here that fixes this (building upon #155)

The same is true for split_by_subgroup_collective, and probably a few other that require Sized, for the same reason. I haven't gotten to that yet.

jedbrown · 2023-05-29T02:29:48Z

Thanks for this, and thanks for your patience (I'm finally into summer mode and have more capacity to do Rust things).

Does this need to be Draft or is it ready from your perspective? Note the existence of tests/trait_object_safety_test.rs (from #113) is supposed to test object safety, but obviously was inadequate because of CommunicatorCollectives.

Cydhra · 2023-05-31T10:11:06Z

Does this need to be Draft or is it ready from your perspective?

I made this a draft because, while it does work, there are things left that do not (as outlined in the initial post).
It obviously compiles and can be used as is, there are just some functions that cannot be called on trait objects.

If #155 is merged, I have another commit that changes the Process API to implement Communicator as well, this way it is no longer generic and functions like process_at_rank can be used with trait objects as well. (I am unsure, however, if the change to Process is desired).

If you are happy to merge this in its incomplete state to at least partially support collective communication on trait objects, then it doesn't need to be a draft.

Note the existence of tests/trait_object_safety_test.rs (from #113) is supposed to test object safety, but obviously was inadequate because of CommunicatorCollectives.

This is green and untouched. Should I add a call to a collective function to it to make it more adequate?

jedbrown · 2023-10-20T19:18:34Z

Sorry about the delays on my end. Would you like to pick this up now that #155 has merged?

Cydhra · 2023-10-20T20:30:16Z

I will gladly have another look at this in the following weeks

…to use trait-objects to make them object-safe

…jects

- changed constructors for Process type - changed AnyProcess analogous - implemented Communicator for Process, such that AsCommunicator can be implemented without violating borrow checker - changed functions in Communicator trait that return processes to be object-safe

Cydhra · 2023-10-29T14:23:37Z

Changes so far:

I have provided the blanket implementation of CommunicatorCollectives for unsized Communicators
I have sealed the CommunicatorHandle enum from Remove type-distinction between system-communicators and user-communicators #155 in a crate-public module, because I needed it to be sealed for the following change, and, also, I don't think downstream crates should access that type anyway?
I have added a sealed trait AsHandle that must be implemented for all Communicator implementations, which returns the CommunicatorHandle instance that wraps the raw comm handle. I require this trait for the following change, and I enforce its implementation by extending the Communicator trait from it. Since it is sealed, it cannot be seen from outside the crate, hiding CommunicatorHandle from the public API (this has the side effect that downstream crates cannot implement Communicator. If this is undesired, the following changes must be reverted).
I changed Process to be non-generic (previously it had a Communicator parameter), and instead contain a CommunicatorHandle for the communicator from which it was created (instead of a reference to the Communicator itself). This is why I need the AsHandle trait. Since the handle is part of the Communicator instance, the lifetime of the Process is still bound to its Communicator, so nothing is unsound here.
Process now implements Communicator. This is probably my least favorite change of the API, because before, there was a strict separation between Communicator and Process. But Process implements AsCommunicator, which previously exposed the Communicator instance within the process. This is no longer possible, so instead AsCommunicator is now a no-op, which just returns the Process. Process emulates the underlying Communicator using its CommunicatorHandle.

Progress:

Add blanket implementation impl<C: Communicator + ?Sized> CommunicatorCollectives for C {} to allow collective communicator functions for Communicator trait-objects
Change functions of Communicator trait that take generic parameters of the Group trait to take dyn Group objects instead
Remove generic parameter from Process struct, and then change functions in Communicator trait that return Process instances. This will remove the Self: Sized requirement
Change functions of Communicator trait that take generic parameters of Attribute, so the Self: Sized requirement is lifted, or accept that those functions cannot be called on trait-objects
Change functions of Communicator trait that take generic parameters of Datatype, so the Self: Sized requirement is lifted, or accept that those functions cannot be called on trait-objects
Change functions of Communicator trait that take generic parameters of Buffer, so the Self: Sized requirement is lifted, or accept that those functions cannot be called on trait-objects
Decide if Group should also work with dyn Group objects (though if so, I'd extract that into a separate pull request)

Questions

I do not know what Attributes in MPI are, so I have a hard time deciding if get_attr should be made object-safe and if so, how.
I haven't looked into Buffer yet, but I feel like changing it, so it doesn't need to be a generic parameter to the Communicator functions that use it, is a huge change
I also haven't looked into Datatype yet.
All those changes (including the changes to Process) are only required, if the respective functions in the Communicator trait (e.g. this_process(), get_attr(..), etc) must be available for trait-objects. If it is okay for trait-objects to not offer some or all of those generic functions, no changes are required (and the changes to Process may be reverted). So the main question is, should I continue with those changes, and if so, which one of them?

Obviously, all of this are breaking changes.

jedbrown · 2023-11-03T03:53:02Z

This looks good so far.

I agree with sealing, though might have just moved the sealed stuff to a different file instead of adding nesting. For #32, we need to be able to be able to instantiate a communicator from a raw MPI_Comm (e.g., passed from FFI in a third-party library), but that doesn't require exposing the handle.

As a design question, should Process have a type-level distinction of its intra- vs inter-communicator status? (I'm honestly not sure that Process is a desirable concept. It doesn't exist in the C MPI interface and I don't know if it's providing any real safety. Like we use Communicator::process_at_rank, which just asserts that the rank is in range and (gasp as I look at it, Process::by_rank_unchecked isn't even unsafe). Meanwhile, there is no type system enforcement that send_receive_with_tags has matching source and destination communicators (just a run-time assertion). Getting rid of Process (or substantially changing what can be type checked) would have a lot of potential consequences so I would leave that for a separate PR where we can weigh the costs/benefits.

Attributes allow callers to attach data to an MPI_Comm. Suppose there is a function f(comm, g, data) implemented in libf with g(comm, data) implemented in libg. libf does not depend on libg or vice-versa, so comm must have type MPI_Comm (not a lib-specific newtype), but g might need to access persistent context (like a tag dispenser). Attributes offer that, which is a big deal for library interoperability. I see no reason why it shouldn't be object-safe, but it also need not be a priority.

You've perhaps come across DynBuffer and DynBufferMut, which are similar to &dyn Buffer. My inclination is to postpone object safety for Buffer and Datatype to another PR if indeed we have use cases for it.

Cydhra · 2023-11-04T11:46:13Z

I agree with sealing, though might have just moved the sealed stuff to a different file instead of adding nesting.

Done.

For #32, we need to be able to be able to instantiate a communicator from a raw MPI_Comm (e.g., passed from FFI in a third-party library), but that doesn't require exposing the handle.

This was already solved by #155, because the traits AsRaw and FromRaw are implemented for Communicators (the latter is not implemented for Process though).

Currently, the implementation asserts that the passed raw handle isn't a system handle (like ffi::RSMPI_COMM_WORLD), and simply creates a CommunicatorHandle::User handle from it. This means, the raw handle is freed when dropped.

As a design question, should Process have a type-level distinction of its intra- vs inter-communicator status?

I agree that this design is confusing. This confusion is amplified by this Pull Request, because Process now implements Communicator by delegation, and therefore behaves exactly like its original counterpart (but cannot be created from a raw handle). There is even a trait AsCommunicator which converts a Process back into a Communicator. Previously, this side-stepped the entire split in the type system, because MPI isn't designed with this distinction in mind. Now this sidestep is unnecessary, because Process is a Communicator, but the trait still exists for compatibility.

Cydhra · 2023-11-04T13:07:45Z

I am not sure how to make get_attr object-safe though:

fn get_attr<A: Attribute>(&self, key: A) -> Option<&<A as Attribute>::Target>

key's type could be changed to &dyn Attribute, if a new method is added to the Attibute trait:

pub trait Attribute: AsRaw<Raw = c_int> {
    type Target;

    /// Allocate memory for an attribute value.
    unsafe fn allocate() -> *mut Self::Target;
}

This way, the allocation inside get_attr can be done dynamically.

However, get_attr cannot define a return type anymore.

jedbrown · 2023-11-07T03:45:12Z

Hmm, I hadn't thought about it carefully. Could we make a method in impl dyn Communicator similar to Any::downcast_mut?

Cydhra · 2023-11-07T14:05:35Z

The return type still isn't statically computable anymore, as far as I understand. The function get_attr cannot be generic if it is object-safe. So what would we even cast into, if we cannot name the type of key?

jedbrown · 2023-11-07T14:34:24Z

If we want to use a typed Attribute with dyn Communicator, then impl dyn Communicator as with Any will work (look at how downcast_ref and downcast_mut are defined; it's a subtle point).

If in addition, we want to work with a dyn Attribute, then I think Target = dyn Any and the user needs to handle the downcast.

Feel free to leave this for a separate PR -- I think I can do it quickly and add a test.

Cydhra · 2023-11-07T15:22:35Z

In that case I'd like to defer this to a separate PR. Same with Groups, if you want them to work as trait-objects as well. Marked as ready for review

jedbrown

Looks good, thanks. Are you okay with squash-merging or do you want to make the history atomic (at least squash the sealing with moving it to a separate file, and the legacy extern syntax into that example)?

examples/dyn_communicator.rs

Cydhra · 2023-11-10T14:31:30Z

Are you okay with squash-merging or do you want to make the history atomic

I don't really mind. Do it as you want it in the history.

jedbrown · 2023-11-10T17:57:37Z

Thanks for your contribution!

jedbrown · 2024-01-23T03:14:13Z

src/topology/mod.rs

+impl<'a> AsCommunicator for Process<'a> {
+    type Out = Self;
+
+    fn as_communicator(&self) -> &Self::Out {
+        self
+    }
+}
+


This turns out to be a bug, in that process.as_communicator().rank() now returns the rank of process, not the rank of the current process on the communicator. One can call Communicator::rank(&process), but I think process.as_communicator() should behave like a communicator, not silently use the wrong rank() method.

Do you have thoughts about how to fix it? We could put an actual communicator back into Process? It could be comm: &'a dyn Communicator to avoid making it generic over Communicator or any other trivial wrapper around a CommunicatorHandle. AnyProcess<'a> could be abused for that purpose, but it seems decidedly unclean.

I am pretty convinced that the PR didn't change any behavior regarding to this.
Previously Process looked like this:

pub struct Process<'a, C> where C: 'a + Communicator, { comm: &'a C, rank: Rank, }

and the AsCommunicator trait was implemented to return comm. Now it returns self, because comm is no longer a Communicator but a CommunicatorHandle. Instead, Process implements Communicator itself, which in turn delegates all calls to the internal CommunicatorHandle. So if this bug exists, I am pretty sure it existed before this PR (I haven't tested this though, so maybe I am missing something).

Not sure what even causes this, because I am not sure how MPI differentiates between a Process handle and a Communicator handle. Is the differentiation only done by choosing different rank functions?

If so, maybe we could change the Communicator trait impl of Process by explicitly changing its rank implementation:

impl<'a> Communicator for Process<'a> { // ... fn rank(&self) -> Rank { Process::rank(&self) } }

Thanks for your reply and sorry I'm slow following up. The issue is that impl Process has a method called rank() referring to the rank of the designated process, while the rank() method on the Communicator trait returns the rank of self.

In the old code, process.as_communicator() was just a Communicator (no longer a Process) so there was only one rank() method. After your change here, it returned a Process (which implements Communicator, but prefers the method on Process instead of the trait method).

I resolved the issue in #177, though I'm not entirely happy with the solution. Part of me wants to remove/rename the Process::rank() method. There are also experimental systems in which a rank is not a process, so I'm not wild about the Process name. But these are design questions for another time/place.

Oh, I see the problem now. Would it be confusing for users if the rank methods on both traits simply had different names (e.g. communicator_rank() and process_rank())? That way, it'd be clear which rank is called at any given point.

Yeah, though comm.comm_rank() and process.process_rank() are somewhat tedious and I'm not sure it's better than the current use of AnyProcess, at least until we get around to making Rank stronger (e.g., a newtype instead of an alias to c_int). It's necessary to have a type that can be put in an array and sorted, though it would be nice for it to retain a branding associating it with the communicator on which it's valid.

I think the current Process is pure ergonomics and I have mixed feelings about its existence -- many/most real-world patterns will be calling comm.process_at_rank() and using it immediately in exactly one point-to-point operation, but then we need run-time assertions like you'll see in send_receive_with_tags to check that source and destination communicators match. In a sense, it is more error-prone (weaker compile-time guarantees) to use source_process, dest_process than comm, source_rank, dest_rank.

Cydhra added 2 commits April 5, 2023 00:47

fix rsmpi#148

9aa7b48

add example for dynamic communicator

110b5c4

Cydhra marked this pull request as draft April 4, 2023 23:02

Cydhra mentioned this pull request Apr 4, 2023

Remove type-distinction between system-communicators and user-communicators #155

Merged

Cydhra force-pushed the allow_dyn_comm branch from 898b35f to 110b5c4 Compare October 29, 2023 11:43

Cydhra added 6 commits October 29, 2023 12:52

Merge branch 'rsmpi:main' into allow_dyn_comm

970c767

changed generic functions for group parameters in CommunicatorHandle …

6965ac8

…to use trait-objects to make them object-safe

changed compare function in communicator traits to work with trait-ob…

4dadcc4

…jects

sealed CommunicatorHandle enum to hide it from public API

466f413

added sealed trait AsHandle to obtain comm handles from communicators

5cda5d0

extracted sealed module into separate file

df61b91

Cydhra marked this pull request as ready for review November 7, 2023 15:21

jedbrown approved these changes Nov 7, 2023

View reviewed changes

examples/dyn_communicator.rs Outdated Show resolved Hide resolved

remove obsolete syntax

046d132

jedbrown merged commit d03d7f5 into rsmpi:main Nov 10, 2023
7 checks passed

jedbrown reviewed Jan 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow using dynamic reference to communicator #157

Allow using dynamic reference to communicator #157

Cydhra commented Apr 4, 2023 •

edited

jedbrown commented May 29, 2023

Cydhra commented May 31, 2023

jedbrown commented Oct 20, 2023

Cydhra commented Oct 20, 2023

Cydhra commented Oct 29, 2023 •

edited

jedbrown commented Nov 3, 2023

Cydhra commented Nov 4, 2023

Cydhra commented Nov 4, 2023 •

edited

jedbrown commented Nov 7, 2023

Cydhra commented Nov 7, 2023

jedbrown commented Nov 7, 2023

Cydhra commented Nov 7, 2023

jedbrown left a comment

Cydhra commented Nov 10, 2023 •

edited

jedbrown commented Nov 10, 2023

jedbrown Jan 23, 2024

Cydhra Feb 5, 2024

jedbrown Feb 17, 2024 •

edited

Cydhra Feb 17, 2024 •

edited

jedbrown Feb 17, 2024

Allow using dynamic reference to communicator #157

Allow using dynamic reference to communicator #157

Conversation

Cydhra commented Apr 4, 2023 • edited

jedbrown commented May 29, 2023

Cydhra commented May 31, 2023

jedbrown commented Oct 20, 2023

Cydhra commented Oct 20, 2023

Cydhra commented Oct 29, 2023 • edited

Changes so far:

Progress:

Questions

jedbrown commented Nov 3, 2023

Cydhra commented Nov 4, 2023

Cydhra commented Nov 4, 2023 • edited

jedbrown commented Nov 7, 2023

Cydhra commented Nov 7, 2023

jedbrown commented Nov 7, 2023

Cydhra commented Nov 7, 2023

jedbrown left a comment

Choose a reason for hiding this comment

Cydhra commented Nov 10, 2023 • edited

jedbrown commented Nov 10, 2023

jedbrown Jan 23, 2024

Choose a reason for hiding this comment

Cydhra Feb 5, 2024

Choose a reason for hiding this comment

jedbrown Feb 17, 2024 • edited

Choose a reason for hiding this comment

Cydhra Feb 17, 2024 • edited

Choose a reason for hiding this comment

jedbrown Feb 17, 2024

Choose a reason for hiding this comment

Cydhra commented Apr 4, 2023 •

edited

Cydhra commented Oct 29, 2023 •

edited

Cydhra commented Nov 4, 2023 •

edited

Cydhra commented Nov 10, 2023 •

edited

jedbrown Feb 17, 2024 •

edited

Cydhra Feb 17, 2024 •

edited