Skip to content

Conversation

@jsquyres
Copy link
Member

@jsquyres jsquyres commented Oct 1, 2016

The (effective) "+42" computation was, in fact, the incorrect answer in this case (gasp!).

We should just take the max_msg_size from the command (which came from the libfabric endpoint max_msg_size attribute in the client) and subtract off the max header size: 68 (which is explained in the comment). This will result in a "large" message size which is likely slightly smaller than the MTU, but still right up near the MTU, and therefore good enough.

Note: the old computation (i.e., -(68-42)) worked fine when we asked for Libfabric API v1.1 because the usnic provider would return a max_msg_size that was already less than the MTU due to FI_PREFIX behavior shenanigans. Once we started asking for Libfabric API v1.4, the usnic Libfabric provider started returning (MTU + prefix_size), and the -(68-42) computation started giving a value that was over the MTU. This caused sendto() on the connectivity checker UDP socket to fail.

This commit also removes an old/misleading comment.

Signed-off-by: Jeff Squyres jsquyres@cisco.com

@bturrubiates please review

The (effective) "+42" computation was, in fact, the incorrect answer
in this case (gasp!).

We should just take the max_msg_size from the command (which came from
the libfabric endpoint max_msg_size attribute in the client) and
subtract off the max header size: 68 (which is explained in the
comment).  This will result in a "large" message size which is likely
slightly smaller than the MTU, but still right up near the MTU, and
therefore good enough.

Note: the old computation (i.e., -(68-42)) worked fine when we asked
for Libfabric API v1.1 because the usnic provider would return a
max_msg_size that was already less than the MTU due to FI_PREFIX
behavior shenanigans.  Once we started asking for Libfabric API v1.4,
the usnic Libfabric provider started returning (MTU + prefix_size),
and the -(68-42) computation started giving a value that was over the
MTU.  This caused sendto() on the connectivity checker UDP socket
to fail.

This commit also removes an old/misleading comment.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres jsquyres added the bug label Oct 1, 2016
@jsquyres jsquyres merged commit 2975c7f into open-mpi:master Oct 1, 2016
@jsquyres jsquyres deleted the pr/usnic-fix-cagent-max-msg-size branch October 1, 2016 00:57
clementFoyer pushed a commit to bosilca/ompi that referenced this pull request Nov 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants