-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ch4/rma: use winattr flag to replace ptr dereference based checks at RMA fast path #4778
Conversation
test:mpich/ch4/most |
src/mpid/ch4/src/ch4_impl.h
Outdated
{ | ||
MPIDI_av_entry_t *av = NULL; | ||
|
||
if (HANDLE_CHECK_MPI_RESERVE_BIT(win->handle)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why a handle bit and not a field in the win struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to save pointer dereferences into the window object. In my prototype code for OSHMPI analysis, I used directly the input handle at MPI_Put. But then we had to break the MPID API in order to pass down the winattr
parameter.
This PR used win->handle
at MPID_put in order to avoid breaking MPID API, which does cost a pointer deference. I guess it is the same as using a field of win now.
Do you have any better idea (i.e., how to pass down handle encoded info without breaking MPID API)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to save pointer dereferences into the window object. In my prototype code for OSHMPI analysis, I used directly the input handle at MPI_Put. But then we had to break the MPID API in order to pass down the
winattr
parameter.This PR used
win->handle
at MPID_put in order to avoid breaking MPID API, which does cost a pointer deference. I guess it is the same as using a field of win now.Do you have any better idea (i.e., how to pass down handle encoded info without breaking MPID API)?
Ah, yes. For datatypes, we pass the handle all the way down and do not dereference it in the common case. Unfortunately I don't see a way to do this with window handles without a change in the ADI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to change ADI? e.g., after the upcoming big release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to change ADI? e.g., after the upcoming big release?
I think it would be worth considering. IMO, it's not too difficult to adapt to using handle vs. ptr in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At MPI_Put, we already get the win ptr for error checking. Getting the ptr costs 9 instructions. The best would be pass down both ptr and handle, if it is not too messy.
Error checking can be configured off, right? I don't think we should make an optimization decision based on an optional part. I mean, without error checking, does the argument -- duplicated cost of getting the ptr -- still hold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hzhou raised a good point. But error checking is enabled by default. I think we still want to avoid getting the ptr twice in such a case. It just looks bad from engineer's view.
If we move the error checking down to MPID_
layer, we don't need get ptr any more at MPI_
layer. But I guess that will force duplicated code in both ch3 and ch4, which is not good too.
I am afraid that the only way to have handle-based optimization is to add a parameter (e.g., MPI_Win win_handle, MPIR_Win *win
) in the MPID
API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From "manager" point of view -- 😄 -- if the application can afford turning on error checking, then it should afford the extra 9 instructions. A scenario could be for the application to run with error checking on to ensure correctness, then run with error checking off for the bulk task. The extra parameter in the ADI for specific optimization is obscure, very difficult to comprehend for new developers, so I am leaning toward not to complicate the interface unless the gain is justifiable, such as real measurable benefit for real applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can discuss further in a separate PR. This PR will just add a winattr
field in the win object.
7f7d092
to
13879a2
Compare
Retest after removing win handle patch test:mpich/ch4/most |
Retest after fixing bugs test:mpich/ch4/most |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One minor nitpick, but I am still approving.
src/mpi/comm/comm_compare.c
Outdated
if (mpi_errno) | ||
MPIR_ERR_POP(mpi_errno); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: can use MPIR_ERR_CHECK(mpi_errno)
macro here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in the latest push.
Retest after minor fix and rebasing test:mpich/ch4/most |
winattr is a bit-vector of multiple attributes set for optimization at RMA fast-path. This patch introduces MPIDI_WINATTR_DIRECT_INTRA_COMM which is set at win init if the comm equals to comm_world or a dup of comm_world. In such a case, we can save instructions and pointer deferences at RMA fast-path by directly accessing global variables of comm_world (e.g., rank, av) rather than accessing to the dynamically allocated objects win->comm_ptr. Note that the intention is to load winattr only once and pass down to shmmod and netmod in order to avoid dereference to win.
By checking the winattr flag, we can skip expensive pointer dereferences into dynamic object (win->comm_ptr) when checking whether it is a self message.
By checking the winattr flag, we can skip expensive pointer dereferences into dynamic object (win->comm_ptr) when checking whether it is a self message.
By checking the winattr flag, we can skip expensive pointer dereferences into dynamic object (win->comm_ptr).
By using winattr instead of the original shm_allocated field, it saves pointer dereferences at shm fast path.
By using winattr instead of the original info field, it saves pointer dereferences at shm fast path.
By using winattr instead of the original info hint field, it saves pointer dereferences at shm fast path.
Pull Request Description
Expected Impact
Author Checklist
module: short description
and follows good practice