Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ch4: Message Queue Dumping Support for CH4 active message #3713

Merged
merged 3 commits into from Jun 27, 2019

Conversation

shawnccx
Copy link
Contributor

  • Added CH4 active message related offset definition into mpi/debugger.
    • As issue Supporting Message Queue Dumping Interfaces for CH4 #2721 mentioned, device selection info (ch3 or ch4) is not exported to mpi/debugger at this moment, so we have to use MACRO to hard code selection for CH4. (USING_CH4_AM is used here).
    • We can only support message queue for active message in CH4 for now, since netmod hardware queue is not exposed now.
  • Enabled debugger support for CH4 active message.

@shawnccx
Copy link
Contributor Author

@raffenet
FYI, this PR supports message queue dumping for CH4 active message, but it hard-codes device selection. Exporting device selection info to mpi/debugger is reported int issue #2721 .

Once that issue is resolved, it will be a straightforward change for this PR to get rid of hard-coding.

@hajimefu hajimefu added this to To do in Intel Work via automation Mar 29, 2019
@wesbland wesbland changed the title Debugger: Message Queue Dumping Support for CH4 active message ch4: Message Queue Dumping Support for CH4 active message Apr 4, 2019
@yfguo yfguo force-pushed the ch4_mqd_am_march_19 branch 2 times, most recently from 2e515e1 to 26e0870 Compare June 24, 2019 18:02
@yfguo
Copy link
Contributor

yfguo commented Jun 24, 2019

test:jenkins/ch3/tcp
test:jenkins/ch4/ofi

@yfguo yfguo requested a review from raffenet June 25, 2019 21:16
@yfguo
Copy link
Contributor

yfguo commented Jun 25, 2019

@raffenet Since I have touched the commits as well. Can you review this PR?

@@ -2774,6 +2774,10 @@ on all link lines (consider adding it to LDFLAGS)])
## most Unix versions, libtvmpich.dylib for Mac OSX).
##ENVVAR END:

if test "$device_name" = "ch4" ; then
AC_DEFINE(HAVE_CH4_DEBUGGER_SUPPORT,1,[Define if debugger support is included for CH4])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a comment here explaining why this is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another hack solution we used in the hcoll integration code was to check if MPIDCH4_H_INCLUDED is defined. Maybe that could work and we could drop this? It's still hacky, but I don't have a solution for that right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raffenet Currently, MPIDCH4_H_INCLUDED is not working under /home/caochong/git/mpich-ofi/src/mpi/debugger. It is only useful under mpid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That's true of dll_mpich.c since it only includes mpi.h for some reason. The files that include mpiimpl.h should be able to use MPIDCH4_H, but as I mentioned, it is super hacky.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some comments.

@@ -88,6 +88,10 @@ typedef struct communicator_t {
* matchine */
int present;
mqs_communicator comm_info; /* Info needed at the higher level */
#ifdef HAVE_CH4_DEBUGGER_SUPPORT
mqs_taddr_t posted_base; /* CH4 has the option for per-communicator queue */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do debuggers support per-comm queues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is set is ch4 level:

#ifdef HAVE_DEBUGGER_SUPPORT
#ifndef MPIDI_CH4U_USE_PER_COMM_QUEUE
    MPIDIG_COMM(comm, posted_head_ptr) = &(MPIDI_global.posted_list);
    MPIDIG_COMM(comm, unexp_head_ptr) = &(MPIDI_global.unexp_list);
#else
    MPIDIG_COMM(comm, posted_head_ptr) = &(MPIDIG_COMM(comm, posted_list));
    MPIDIG_COMM(comm, unexp_head_ptr) = &(MPIDIG_COMM(comm, unexp_list));
#endif
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ch4_comm.c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not my question. Does a debugger support viewing per-comm queues? Totalview, DDT, etc. Does exposing this in the comm struct result in something that the user can view?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with Totalview in 2017, it did see per-comm queue. Our license expired for now, so I cannot test with it again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that sounds good. I think someone had reached out to me after your PR was submitted asking if that functionality existed. I'll take your word for it :).

@raffenet
Copy link
Contributor

@yfguo and I discussed that the MQD code could use a rethinking as ADI-level functions, but it's low priority I think. This is functional so I think it's OK to take as-is.

yfguo and others added 3 commits June 27, 2019 10:09
Message queue dumping (MQD) support is added for CH4 active message.
Since device option (CH3 or CH4) is not exported to debugger layer at
this moment, we hard-coded the selection for CH4 using Macro USING_CH4_AM.

CH4 active message related offsets are implemented for MQD interfaces
to look for information in MPI communicator and MPI request.
Posted and unexp message queue headers are defined for message queue
dumping interface. Also information for pending send is recorded.
@yfguo yfguo merged commit 3f8a4f9 into pmodels:master Jun 27, 2019
Intel Work automation moved this from To do to Done Jun 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Intel Work
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants