New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ch4: Message Queue Dumping Support for CH4 active message #3713
Conversation
2e515e1
to
26e0870
Compare
test:jenkins/ch3/tcp |
@raffenet Since I have touched the commits as well. Can you review this PR? |
@@ -2774,6 +2774,10 @@ on all link lines (consider adding it to LDFLAGS)]) | |||
## most Unix versions, libtvmpich.dylib for Mac OSX). | |||
##ENVVAR END: | |||
|
|||
if test "$device_name" = "ch4" ; then | |||
AC_DEFINE(HAVE_CH4_DEBUGGER_SUPPORT,1,[Define if debugger support is included for CH4]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a comment here explaining why this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another hack solution we used in the hcoll integration code was to check if MPIDCH4_H_INCLUDED
is defined. Maybe that could work and we could drop this? It's still hacky, but I don't have a solution for that right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raffenet Currently, MPIDCH4_H_INCLUDED
is not working under /home/caochong/git/mpich-ofi/src/mpi/debugger
. It is only useful under mpid
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. That's true of dll_mpich.c
since it only includes mpi.h
for some reason. The files that include mpiimpl.h
should be able to use MPIDCH4_H
, but as I mentioned, it is super hacky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some comments.
@@ -88,6 +88,10 @@ typedef struct communicator_t { | |||
* matchine */ | |||
int present; | |||
mqs_communicator comm_info; /* Info needed at the higher level */ | |||
#ifdef HAVE_CH4_DEBUGGER_SUPPORT | |||
mqs_taddr_t posted_base; /* CH4 has the option for per-communicator queue */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do debuggers support per-comm queues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is set is ch4 level:
#ifdef HAVE_DEBUGGER_SUPPORT
#ifndef MPIDI_CH4U_USE_PER_COMM_QUEUE
MPIDIG_COMM(comm, posted_head_ptr) = &(MPIDI_global.posted_list);
MPIDIG_COMM(comm, unexp_head_ptr) = &(MPIDI_global.unexp_list);
#else
MPIDIG_COMM(comm, posted_head_ptr) = &(MPIDIG_COMM(comm, posted_list));
MPIDIG_COMM(comm, unexp_head_ptr) = &(MPIDIG_COMM(comm, unexp_list));
#endif
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ch4_comm.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not my question. Does a debugger support viewing per-comm queues? Totalview, DDT, etc. Does exposing this in the comm struct result in something that the user can view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested with Totalview in 2017, it did see per-comm queue. Our license expired for now, so I cannot test with it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that sounds good. I think someone had reached out to me after your PR was submitted asking if that functionality existed. I'll take your word for it :).
@yfguo and I discussed that the MQD code could use a rethinking as ADI-level functions, but it's low priority I think. This is functional so I think it's OK to take as-is. |
Message queue dumping (MQD) support is added for CH4 active message. Since device option (CH3 or CH4) is not exported to debugger layer at this moment, we hard-coded the selection for CH4 using Macro USING_CH4_AM. CH4 active message related offsets are implemented for MQD interfaces to look for information in MPI communicator and MPI request.
Posted and unexp message queue headers are defined for message queue dumping interface. Also information for pending send is recorded.
26e0870
to
9a2493e
Compare
mpi/debugger
.mpi/debugger
at this moment, so we have to use MACRO to hard code selection for CH4. (USING_CH4_AM
is used here).