Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ch4/rma: use winattr flag to replace ptr dereference based checks at RMA fast path #4778

Merged
merged 12 commits into from
Sep 16, 2020

Conversation

minsii
Copy link
Contributor

@minsii minsii commented Sep 9, 2020

Pull Request Description

Expected Impact

Author Checklist

  • Reference appropriate issues (with "Fixes" or "See" as appropriate)
  • Remove xfail from the test suite when fixing a test
  • Commits are self-contained and do not do two things at once
  • Commit message is of the form: module: short description and follows good practice
  • Passes whitespace checkers
  • Passes warning tests
  • Passes all tests
  • Add comments such that someone without knowledge of the code could understand
  • You or your company has a signed contributor's agreement on file with Argonne
  • For non-Argonne authors, request an explicit comment from your companies PR approval manager

@minsii
Copy link
Contributor Author

minsii commented Sep 9, 2020

test:mpich/ch4/most

{
MPIDI_av_entry_t *av = NULL;

if (HANDLE_CHECK_MPI_RESERVE_BIT(win->handle)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why a handle bit and not a field in the win struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention was to save pointer dereferences into the window object. In my prototype code for OSHMPI analysis, I used directly the input handle at MPI_Put. But then we had to break the MPID API in order to pass down the winattr parameter.

This PR used win->handle at MPID_put in order to avoid breaking MPID API, which does cost a pointer deference. I guess it is the same as using a field of win now.

Do you have any better idea (i.e., how to pass down handle encoded info without breaking MPID API)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention was to save pointer dereferences into the window object. In my prototype code for OSHMPI analysis, I used directly the input handle at MPI_Put. But then we had to break the MPID API in order to pass down the winattr parameter.

This PR used win->handle at MPID_put in order to avoid breaking MPID API, which does cost a pointer deference. I guess it is the same as using a field of win now.

Do you have any better idea (i.e., how to pass down handle encoded info without breaking MPID API)?

Ah, yes. For datatypes, we pass the handle all the way down and do not dereference it in the common case. Unfortunately I don't see a way to do this with window handles without a change in the ADI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to change ADI? e.g., after the upcoming big release?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to change ADI? e.g., after the upcoming big release?

I think it would be worth considering. IMO, it's not too difficult to adapt to using handle vs. ptr in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At MPI_Put, we already get the win ptr for error checking. Getting the ptr costs 9 instructions. The best would be pass down both ptr and handle, if it is not too messy.

Error checking can be configured off, right? I don't think we should make an optimization decision based on an optional part. I mean, without error checking, does the argument -- duplicated cost of getting the ptr -- still hold?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hzhou raised a good point. But error checking is enabled by default. I think we still want to avoid getting the ptr twice in such a case. It just looks bad from engineer's view.

If we move the error checking down to MPID_ layer, we don't need get ptr any more at MPI_ layer. But I guess that will force duplicated code in both ch3 and ch4, which is not good too.

I am afraid that the only way to have handle-based optimization is to add a parameter (e.g., MPI_Win win_handle, MPIR_Win *win) in the MPID API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From "manager" point of view -- 😄 -- if the application can afford turning on error checking, then it should afford the extra 9 instructions. A scenario could be for the application to run with error checking on to ensure correctness, then run with error checking off for the bulk task. The extra parameter in the ADI for specific optimization is obscure, very difficult to comprehend for new developers, so I am leaning toward not to complicate the interface unless the gain is justifiable, such as real measurable benefit for real applications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss further in a separate PR. This PR will just add a winattr field in the win object.

@minsii minsii force-pushed the pr/winbit branch 2 times, most recently from 7f7d092 to 13879a2 Compare September 11, 2020 19:11
@minsii
Copy link
Contributor Author

minsii commented Sep 11, 2020

Retest after removing win handle patch

test:mpich/ch4/most

@minsii
Copy link
Contributor Author

minsii commented Sep 12, 2020

Retest after fixing bugs

test:mpich/ch4/most

Copy link
Contributor

@raffenet raffenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One minor nitpick, but I am still approving.

Comment on lines 189 to 190
if (mpi_errno)
MPIR_ERR_POP(mpi_errno);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: can use MPIR_ERR_CHECK(mpi_errno) macro here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest push.

@minsii
Copy link
Contributor Author

minsii commented Sep 16, 2020

Retest after minor fix and rebasing

test:mpich/ch4/most

@minsii minsii changed the title ch4/rma: use a bit of win handle to optimize window attribute access with direct_intra comm ch4/rma: use winattr flag to replace ptr dereference based checks at RMA fast path Sep 16, 2020
winattr is a bit-vector of multiple attributes set for optimization at
RMA fast-path.

This patch introduces MPIDI_WINATTR_DIRECT_INTRA_COMM which is set at
win init if the comm equals to comm_world or a dup of comm_world.
In such a case, we can save instructions and pointer deferences at RMA
fast-path by directly accessing global variables of comm_world (e.g.,
rank, av) rather than accessing to the dynamically allocated objects
win->comm_ptr.

Note that the intention is to load winattr only once and pass down to
shmmod and netmod in order to avoid dereference to win.
By checking the winattr flag, we can skip expensive pointer dereferences
into dynamic object (win->comm_ptr) when checking whether it is a self
message.
By checking the winattr flag, we can skip expensive pointer dereferences
into dynamic object (win->comm_ptr) when checking whether it is a self
message.
By checking the winattr flag, we can skip expensive pointer dereferences
into dynamic object (win->comm_ptr).
By using winattr instead of the original shm_allocated field, it saves
pointer dereferences at shm fast path.
By using winattr instead of the original info field, it saves
pointer dereferences at shm fast path.
By using winattr instead of the original info hint field, it saves
pointer dereferences at shm fast path.
@minsii minsii merged commit 7a2aac3 into pmodels:main Sep 16, 2020
@minsii minsii deleted the pr/winbit branch September 16, 2020 21:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants