nemesis ext_procs optimization #29

Closed
mpichbot opened this issue Oct 14, 2016 · 7 comments

mpichbot commented Oct 14, 2016

Originally by goodell on 2008-08-01 08:42:34 -0500


In [de6e5ee] I committed a rough cut of dynamic processes for nemesis
newtcp. In mpid_nem_inline.h I commented out an optimization that
uses MPID_nem_mem_region.ext_procs because it prevents the proper
operation of dynamic processes. Unfortunately, removing it adds
~100ns to our zero-byte message latencies. So there is a FIXME in
the code that reads like this:

 /* FIXME the ext_procs bit is an optimization for the all-local-procs case.
    This has been commented out for now because it breaks dynamic processes.
    Some other solution should be implemented eventually, possibly using a
    flag that is set whenever a port is opened. [goodell@ 2008-06-18] */

In general, this won't affect real users who run any inter-node jobs,
since they were already polling the network every time anyway. However,
it does hurt those wonderful microbenchmarks. A hack fix is to leave the
optimization in but also check whether a port has been opened. A possibly
better fix is to poll the network only every X iterations of "poll
everything", where X is some tunable parameter.
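
A minimal sketch of the "hack fix" described above, assuming a flag that is set when a port is opened. Every name here (`struct mem_region`, `port_opened`, `note_port_opened`, `network_poll`, `poll_everything`) is a hypothetical stand-in for illustration, not the actual nemesis code:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shared region; only the field used here. */
struct mem_region {
    int ext_procs;                 /* number of processes outside this node */
};

static struct mem_region region = { 0 };
static bool port_opened = false;   /* assumed flag: set once a port is opened */
static int network_polls = 0;      /* counts network polls, for illustration */

/* Stand-in for the (comparatively expensive) network poll. */
static void network_poll(void)
{
    network_polls++;
}

/* Would be called from the code path that opens a port. */
static void note_port_opened(void)
{
    port_opened = true;
}

/* One pass of "poll everything": the network is polled only when there are
   external processes or a port has been opened. */
static void poll_everything(void)
{
    /* ... shared-memory queues would be polled here on every pass ... */
    if (region.ext_procs > 0 || port_opened)
        network_poll();
}
```

With all processes local and no port open, `poll_everything()` skips the network entirely; after `note_port_opened()` every pass polls it again.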

This req is a reminder for this FIXME.

-Dave

@mpichbot mpichbot self-assigned this Oct 14, 2016

@mpichbot mpichbot added this to the mpich2-1.1rc1 milestone Oct 14, 2016

Originally by thakur on 2008-09-15 14:34:01 -0500


Darius will check if this is already fixed.


Originally by buntinas on 2008-09-16 12:37:32 -0500


This is still an issue in 1.0 and 1.1, but since it's a performance issue, I think we shouldn't hold up 1.0.8 for this.

When we add support for multiple netmods, we'll have a list of "active" netmods, and only call poll on those netmods. Doing that will resolve this issue, so we should just leave this until then.
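
A rough sketch of what such an "active" netmod list could look like. The names (`netmod_poll_fn`, `netmod_activate`, `poll_active_netmods`, `demo_tcp_poll`) and the fixed-size array are assumptions made for illustration, not the design that was eventually implemented:

```c
#include <stddef.h>

#define MAX_NETMODS 4              /* hypothetical upper bound */

typedef int (*netmod_poll_fn)(void);

static netmod_poll_fn active_netmods[MAX_NETMODS];
static size_t num_active = 0;

/* Add a netmod's poll function to the active list (idempotent).
   Returns 0 on success, -1 if the list is full. */
static int netmod_activate(netmod_poll_fn poll)
{
    for (size_t i = 0; i < num_active; i++)
        if (active_netmods[i] == poll)
            return 0;              /* already active */
    if (num_active == MAX_NETMODS)
        return -1;
    active_netmods[num_active++] = poll;
    return 0;
}

/* Poll only the netmods that have been activated.
   Returns the number of poll functions called. */
static int poll_active_netmods(void)
{
    int calls = 0;
    for (size_t i = 0; i < num_active; i++) {
        active_netmods[i]();
        calls++;
    }
    return calls;
}

/* Demo netmod used only to exercise the sketch. */
static int tcp_polls = 0;
static int demo_tcp_poll(void)
{
    tcp_polls++;
    return 0;
}
```

Until some netmod is activated, `poll_active_netmods()` does no network work at all, which is the property that would make the all-local case cheap.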

-d

@mpichbot mpichbot modified the milestones: mpich2-1.1b1, mpich2-1.1rc1 Oct 14, 2016

Originally by balaji on 2009-03-03 23:21:41 -0600


OSU reported nearly a 0.5us increase in latency here. The additional latency could have other causes too, but this note is to make sure we compare performance against 1.0.8.


Originally by buntinas on 2009-05-07 14:39:55 -0500


Fixed in [888cb39].

The original plan was to poll the network only when there is an external process (not on this node), or while a port is open (by MPI_Open_port). The problem is that through some communicator creation magic, a process may end up belonging to a communicator with spawned or connected processes, even though it has never called spawn or connect. So keeping track of whether there are external processes ends up being pretty hairy.

Instead, we took a different approach, and reduced the polling frequency for the tcp module. If a process hasn't had any network activity (nothing from the listener socket and no connect requests), then the poll period is very large (1<<22 for now). As soon as some activity is detected, we reduce the polling period to something smaller (currently 128). Note that in this method, because we don't know whether another process might try to connect to us, we still need to poll once in a while, even if we haven't initiated a network connection ourselves.
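
The adaptive polling period described above can be sketched roughly as follows. The two period values come from this comment; everything else (`tcp_progress`, `tcp_poll_sockets`, `pending_activity`, the counters) is a hypothetical stand-in rather than the code in [888cb39], and this sketch assumes the period is never raised back once it has been reduced:

```c
#include <stdbool.h>

#define IDLE_POLL_PERIOD   (1u << 22)  /* period while no activity has been seen */
#define ACTIVE_POLL_PERIOD 128u        /* period once activity has been detected */

static unsigned poll_period = IDLE_POLL_PERIOD;
static unsigned calls_since_poll = 0;
static unsigned long tcp_socket_polls = 0;  /* for illustration only */

/* Hypothetical stand-in for checking the listener socket and connections.
   Returns true if any network activity was seen. */
static bool pending_activity = false;
static bool tcp_poll_sockets(void)
{
    tcp_socket_polls++;
    return pending_activity;
}

/* Called on every pass of the progress loop. Touches the sockets only once
   every poll_period calls, and shortens the period after the first activity. */
static void tcp_progress(void)
{
    if (++calls_since_poll < poll_period)
        return;                         /* not time to poll yet */
    calls_since_poll = 0;
    if (tcp_poll_sockets())
        poll_period = ACTIVE_POLL_PERIOD;
}
```

An all-local job pays only a counter increment and a compare on most passes, while a process that someone connects to still notices within at most `IDLE_POLL_PERIOD` passes.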

-d


Originally by jayesh on 2009-05-07 14:44:44 -0500


Since the changes are in the tcp network module, I need to port the changes to wintcp.

Regards,
Jayesh

@mpichbot mpichbot reopened this Oct 14, 2016

Originally by jayesh on 2009-05-11 14:38:19 -0500


Ported the changes to windows netmod in [56c991e]

-jayesh
