crash when sending very large messages #23
Comments
aravindksg
Jan 25, 2018
Collaborator
We have seen this problem before with message sizes that are not a multiple of a DW (doubleword), but the issue was fixed as of PSM2 version PSM2_10.2-235.
Also, the line numbers you posted above do not match: your execution is failing at ips_proto.c:1646, whereas the assert in the latest PSM2 master is at ips_proto.c:1957. Could you clarify whether you are actually using the latest PSM2 version from GitHub or a different PSM2 version (either from a distro or from IFS)? If it is indeed an older version, could you please update to the latest GitHub master and retry?
mattijsjanssens
commented
Jan 26, 2018
Thanks for the answer. I will check.
Can this issue be closed or is this still a problem?
mattijsjanssens
May 3, 2018
From what I'm told it can probably be closed. Many thanks for the feedback.
Mattijs
On 26 April 2018 at 17:25, Russell McGuire ***@***.***> wrote:
> Can this issue be closed or is this still a problem?
Thank you for confirming.
mattijsjanssens commented Jan 25, 2018
We're occasionally seeing assert messages of the form
ips_proto.c:1646: (scb->payload_size & 0x3) == 0
which seem to originate from somewhere in the network stack (e.g. https://github.com/intel/opa-psm2/blob/master/ptl_ips/ips_proto.c#L1957) when the size is not a multiple of 4.
Is this a known problem? We don't pad our MPI messages to be a multiple of 4 bytes. Should we? If so, why does it not show up in ordinary usage (i.e. with smaller messages)?
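For reference, here is a minimal sketch of what the `& 0x3` test in that assert checks, and of how a byte count could be rounded up to the next multiple of 4 if padding were ever wanted on the application side. The helper names `is_dword_multiple` and `round_up_to_dword` are hypothetical and are not part of PSM2 or MPI. This illustrates the arithmetic only, not how the library handles payload sizes internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: same shape as the failing check. Returns nonzero
 * when the low two bits are clear, i.e. nbytes is a multiple of 4. */
static int is_dword_multiple(size_t nbytes)
{
    return (nbytes & 0x3) == 0;
}

/* Hypothetical helper: rounds nbytes up to the next multiple of 4.
 * Values that are already a multiple of 4 are returned unchanged.
 * Assumes nbytes is at most SIZE_MAX - 3, so the addition cannot wrap. */
static size_t round_up_to_dword(size_t nbytes)
{
    size_t padded = (nbytes + 3) & ~(size_t)0x3;
    assert(is_dword_multiple(padded));
    return padded;
}
```

With these definitions, `round_up_to_dword(10)` gives 12 and `round_up_to_dword(12)` stays 12, so a 10-byte message would carry 2 bytes of padding.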