A bunch of little fixes #1

Closed
wants to merge 8 commits into from

Conversation

jsquyres
Member

Sean --

Here's a bunch of little fixes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
@jsquyres
Member Author

Closing this pull request so that I can submit a new one from a branch in my repo (vs. from my master).

@jsquyres jsquyres closed this Aug 21, 2014
shefty added a commit that referenced this pull request Nov 14, 2014
commit 42a1f96809d0dfb72e1abaad3923761eba4c6fe2
Merge: dc1317b fca6e10
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Aug 8 11:53:16 2014 -0700

    Merge branch 'dev'

commit fca6e10a83eb592135fd47bc73600c7a955ca2b5
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 7 15:43:00 2014 -0700

    Release 1.0.19-1 hotfix

commit dc1317b5668200bf0947dcac21a4d95959d333b3
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Aug 4 10:01:31 2014 -0700

    indexer: Include errno.h directly

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 064c9cb1bddbab9e6d54ba301facfae7e1992455
Author: Ilya Nelkenbaum <ilyan@mellanox.com>
Date:   Mon Jul 28 15:48:09 2014 +0300

    rsocket: Segmentation fault fix in case of multiple connections

    When more than 16 rsocket connections are established,
    the "svc->rss" buffer is reallocated with more memory.
    Index 0 is reserved for the service's communication
    socket, and this is not taken into account when data
    is copied from the old buffer location to the new one.

    Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
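
The off-by-one described above can be illustrated with a minimal sketch. It assumes an array whose slot 0 is reserved and whose entries occupy slots 1..cnt; `struct svc_sketch` and `svc_grow` are hypothetical names for illustration, not the actual librdmacm code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the service structure (illustration only). */
struct svc_sketch {
	int *rss;	/* slot 0 is reserved, entries live in slots 1..cnt */
	int cnt;	/* number of entries, not counting slot 0 */
	int size;	/* capacity in entries, not counting slot 0 */
};

/* Grow the array by 'grow' entries, carrying over slot 0 and all entries. */
static int svc_grow(struct svc_sketch *svc, int grow)
{
	int *rss;

	assert(grow > 0);
	rss = malloc(sizeof(*rss) * (size_t)(svc->size + grow + 1));
	if (!rss)
		return -1;

	if (svc->rss) {
		/* cnt + 1 rather than cnt: the reserved slot 0 must be copied too */
		memcpy(rss, svc->rss, sizeof(*rss) * (size_t)(svc->cnt + 1));
		free(svc->rss);
	}
	svc->rss = rss;
	svc->size += grow;
	return 0;
}
```

Copying only `cnt` elements would silently drop the last entry, which is one plausible way to end up with the crash the commit describes.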

commit a7287adaea52d21cd2d50f1621f8eda37c4c3c90
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 22 23:24:53 2014 -0700

    udpong: Fix client_recv error check

    We only want to report an error if it's not EAGAIN.  The if
    statement is reversed.  Correct it.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
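
As a rough illustration of the corrected check, the helper below reports a failure only when the receive call failed for a reason other than EAGAIN. `recv_failed` is a hypothetical name; the real udpong code is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/*
 * Return 1 if a receive result should be reported as an error, 0 otherwise.
 * 'ret' is the return value of the receive call and 'err' the errno it set.
 */
static int recv_failed(ssize_t ret, int err)
{
	/* On a non-blocking socket, EAGAIN only means "no data yet". */
	return ret < 0 && err != EAGAIN;
}
```

A caller would pass the return value and `errno` right after the receive call, for example `if (recv_failed(ret, errno)) perror("recv");`.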

commit 806de778b1fe665dee2f62c7bf7211ab9bd2d53f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jul 16 15:49:16 2014 -0700

    Release 1.0.19

commit 8f53f2a5d3cb5d6c30fe5695b48268ea1bbe2ff0
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jul 16 13:44:56 2014 -0700

    riostream: Only verify last data transfer

    Data verification will fail when running the bandwidth
    tests or when the transfer count is > 1.  The issue is that
    subsequent writes by the initiator side will overwrite
    the data in the target buffer before the receiver can
    verify that it is correct.

    To fix this, only verify that the data in the buffer
    is correct after the last transfer has completed.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c4f8e22a6d078fa914cd4102d65fa854587e1248
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jul 7 08:40:44 2014 -0700

    Revert "Revert "rsocket: Change keepalive to 0-byte RDMA write""

    This reverts commit a34703c53259845dd20450a87eb6747030e23e8b.

    0-byte RDMA writes appear to be working correctly with
    HCAs from 2 different vendors.  The original problem that
    was reported turned out to be a user error.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit fa85dc408e28afd67b81c3a590fd874ef6fdc63a
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Jul 3 13:45:52 2014 -0700

    rsocket: Update correct rsocket keepalive time

    When the keepalive time of an rsocket is updated, the
    updated information is forwarded to the keepalive service
    thread.  However, the thread updates the time for the
    wrong service as shown:

    tcp_svc_timeouts[svc->cnt] = rs_get_time() + msg.rs->keepalive_time;

    The index into tcp_svc_timeouts should correspond to the
    rsocket being updated, not the last one in the list.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
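
A minimal sketch of the idea, assuming parallel arrays in which slot 0 is reserved and entries occupy slots 1..cnt: look up the slot of the rsocket being updated and write the new timeout there, rather than at `cnt`. The names `svc_index` and `svc_update_timeout` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the slot holding 'rs'; slots 1..cnt are valid, 0 means "not found". */
static int svc_index(void *const *rss, int cnt, const void *rs)
{
	int i;

	assert(cnt >= 0);
	for (i = 1; i <= cnt; i++) {
		if (rss[i] == rs)
			return i;
	}
	return 0;
}

/* Update the timeout that belongs to 'rs', not the last one in the list. */
static int svc_update_timeout(void *const *rss, uint64_t *timeouts, int cnt,
			      const void *rs, uint64_t now, uint64_t keepalive)
{
	int i = svc_index(rss, cnt, rs);

	if (!i)
		return -1;
	timeouts[i] = now + keepalive;
	return 0;
}
```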

commit 1695abfa9f6bf429a5aa07117310c4ad87d4b3ae
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Jul 3 13:55:39 2014 -0700

    rsocket: Fix removing rsocket from service thread

    When removing an rsocket from a service thread, we replace
    the removed service with the one at the end of the service list.
    This keeps the array tightly packed.  However, rs_svc_rm_rs
    decrements the rsocket count before doing the swap.  The result
    is that the entry at the end of the list gets dropped off.
    Defer decrementing the count until the swap has been made.

    In this case, the cnt value is a valid index into the array,
    because we start at index 1.  Index 0 is used internally by
    the service thread.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
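
The ordering issue can be sketched as follows, again assuming entries live in slots 1..cnt with slot 0 reserved. `svc_remove` is a hypothetical helper that stores plain integers instead of rsocket pointers to keep the example short.

```c
#include <assert.h>

/*
 * Remove 'rs' from slots 1..*cnt by moving the last entry into its place.
 * Returns 0 on success, -1 if 'rs' is not in the list.
 */
static int svc_remove(int *rss, int *cnt, int rs)
{
	int i;

	assert(*cnt >= 0);
	for (i = 1; i <= *cnt; i++) {
		if (rss[i] == rs) {
			rss[i] = rss[*cnt];	/* swap first: *cnt is still a valid index */
			(*cnt)--;		/* only then shrink the list */
			return 0;
		}
	}
	return -1;
}
```

Decrementing before the copy would read `rss[*cnt - 1]` instead of the true last entry, so the entry at the end of the list would be lost.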

commit 9085562c22189850e1f16b9a9955f11e79caac06
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jul 2 15:37:10 2014 -0700

    rsocket: Fix crash resulting from keepalive timeout

    The following crash was reported by Hal Rosenstock,
    <hal@mellanox.com>, with keepalive enabled.  The crash
    occurs in the keepalive thread attempting to send a
    keepalive message.

    report:
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fffecf08700 (LWP 6013)]
    rs_post_write (rs=<value optimized out>, sgl=0x0, nsge=0, wr_data=3758096385,
        flags=0, addr=0, rkey=0) at src/rsocket.c:1660
    1660            return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
    (gdb)
    (gdb) p/x rs
    $1 = value has been optimized out

    So I added in the following to debug:
    1660    if (rs == NULL)
    1661    abort();
    1662    if (rs->cm_id == NULL)
    1663    abort();
    1664    if (rs->cm_id->qp == NULL)
    1665    abort();
    1666            return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
    1667    }

    And saw in gdb:

    Program received signal SIGABRT, Aborted.
    [Switching to Thread 0x7fffecf08700 (LWP 8096)]
    0x00000030d50328a5 in raise () from /lib64/libc.so.6
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
    (gdb)
    (gdb) bt
    #0  0x00000030d50328a5 in raise () from /lib64/libc.so.6
    #1  0x00000030d5034085 in abort () from /lib64/libc.so.6
    #2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
        nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
    #3  0x00007ffff058193d in tcp_svc_send_keepalive (arg=0x7ffff0789f20)
        at src/rsocket.c:4245
    #4  tcp_svc_run (arg=0x7ffff0789f20) at src/rsocket.c:4279
    #5  0x00000030d5807851 in start_thread () from /lib64/libpthread.so.0
    #6  0x00000030d50e890d in clone () from /lib64/libc.so.6
    (gdb) fr 2
    #2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
        nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
    1665    abort();

    So qp is NULL somehow...
    :end report

    There is an issue if an rsocket is closed without going through
    the rshutdown.

    int rshutdown(int socket, int how)
    {
    	...
    	if (rs->opts & RS_OPT_SVC_ACTIVE)
    		rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);

    We remove the rsocket from the keepalive thread in rshutdown.

    int rclose(int socket)
    {
    	...
    		if (rs->state & rs_connected)
    			rshutdown(socket, SHUT_RDWR);
    	...
    	rs_free(rs);

    rclose will call rshutdown only if we're connected.  However, if the
    keepalive failed, the socket will be in an error state.  So there is
    no call to rshutdown, which leaves the freed rsocket on
    the keepalive thread's list.

    The fix is to have rclose remove an rsocket from being processed
    by a service thread if it is still active.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 310f630ac87f1deee1534ab405d5b771b801c25d
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 22:52:40 2014 -0700

    example/rdma_xclient/server: Update XRC support in sample programs

    Update rdma_xclient and rdma_xserver sample programs to test
    XRC data transfers.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5662340a12429f8882be36d8787924be91a1cb74
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 22:56:43 2014 -0700

    rdmacm: Update addrinfo with XRC support

    Remove internal defines, and use libibverbs exported values
    instead.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 995eb0c90c1a0967179fe3f523861e15300d3dfa
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 17:47:22 2014 -0700

    rdmacm: Add support for XRC QPs

    Export a new extended create QP call.  Add support for XRC
    QPs.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05eabc5335b95ab9d0d6a6132092fac6e1af1cc5
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 17:14:13 2014 -0700

    rdmacm: Add support for allocating XRC SRQs

    Add extended SRQ creation call, to support allocating
    XRC SRQs.  Use the rdma_cm_id qp type field to
    determine which type of SRQ should be allocated.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 89a782a52a48db38d917084233006fb91cbd0694
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 16:46:34 2014 -0700

    rdmacm: Add functionality to allocate an XRCD

    XRC QPs and SRQs are associated by an XRC domain.  Provide a
    call to allocate an XRCD, similar to how the rdmacm allocates
    a PD for the user.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit f916b9b6bfbcd86b5326d84c0dfa106ddc9c907c
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 16:17:30 2014 -0700

    build: Add build support for XRC

    Modify autotools to check for and require a libibverbs
    version that includes XRC and extension support.

    Remove any code used to support older versions of
    libibverbs.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 0cd1e9b0e7a2d438a0f1004e6c6ff1b6785c4038
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 13:30:42 2014 -0700

    librdmacm: Use SRQ in rdma_create_qp

    If an application has allocated an SRQ on an rdma_cm_id, use
    it when creating a QP.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3e1fc1cfad65c83a05c8550d8e359c8b9223d859
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jun 25 12:56:18 2014 -0700

    librdmacm: Remove NULL checks after calling alloca

    alloca doesn't return a NULL pointer on failure.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a34703c53259845dd20450a87eb6747030e23e8b
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Jun 20 17:44:26 2014 -0700

    Revert "rsocket: Change keepalive to 0-byte RDMA write"

    This reverts commit 0f2c76e81ecf1470cf152600c08c421e7e82b00e.

    Testing has shown that this does not always result in the
    keep-alive message working correctly, such that a broken
    connection is reported as having failed.  The reason for this
    behavior is unknown, but revert the patch until the issue has
    been resolved.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7b1eb6407f1f7a953673ab23a2d75f8a3cd8dbb9
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Thu Jun 19 13:08:02 2014 -0400

    librdmacm: In ucma_convert_path, fix selector values

    The intent is for the selectors to mean "exactly equal to" rather than
    "less than".  The selector value for "exactly" is 2 rather than 1.

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7f0fbf984a5140efb76f93fef1f35202c617249d
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Thu Jun 19 11:54:11 2014 -0400

    rsocket: Add support for RDMA_ROUTE option in rgetsockopt

    Create ibv_path_data structs from the rsocket's RDMA route
    ibv_sa_path_rec structs, as many as fit into the
    supplied buffer.

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 106899eccc5fa61dd5e69c90bc0651ccd57e725f
Merge: 6c7d6d3 0f2c76e
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jun 18 11:56:42 2014 -0700

    Merge branch 'dev'

commit 0f2c76e81ecf1470cf152600c08c421e7e82b00e
Author: Susan K. Coulter <markus@cj-fe1.lanl.gov>
Date:   Mon Jun 16 10:28:08 2014 -0700

    rsocket: Change keepalive to 0-byte RDMA write

    Signed-off-by: Susan K. Coulter <markus@cj-fe1.lanl.gov>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6c7d6d3038524c275ecfb7468b4455fe2cc39a19
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:45:23 2014 -0700

    rdma_server: handle IBV_SEND_INLINE correctly

    Not all RDMA devices support IBV_SEND_INLINE.  At least some of those
    that don't will ignore the flag passed to rdma_post_send and attempt to
    send the command by using an sge entry instead.  Because we don't
    register the send memory, this fails.  The proper way to deal with the
    fact that IBV_SEND_INLINE is not guaranteed is to check the returned
    value in our cap struct to see if we have support for inline data, and
    if not, fall back to non-inline sends and to register the send memory
    region.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 9fe390a793203a13b0507472848e1e7da8c75bed
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:44:49 2014 -0700

    rdma_client: handle IBV_SEND_INLINE correctly

    Not all RDMA devices support IBV_SEND_INLINE.  At least some of those
    that don't will ignore the flag passed to rdma_post_send and attempt to
    send the command by using an sge entry instead.  Because we don't
    register the send memory, this fails.  The proper way to deal with the
    fact that IBV_SEND_INLINE is not guaranteed is to check the returned
    value in our cap struct to see if we have support for inline data, and
    if not, fall back to non-inline sends and to register the send memory
    region.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2c2e44e144f17c2cef4af052ec91a680c9a81fb9
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:44:28 2014 -0700

    rdma_server: use perror, unwind allocs on failure

    Our main test function prints out errno directly, which is hard to read
    as it's not decoded at all.  Instead, use perror() to make failures more
    readable.  Also redo the failure flow so that we can do a simple unwind
    at the end of the function and just jump to the right unwind spot on
    error.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1bc834aeca99a4dd0c5bea733e2735f148b4418c
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:44:13 2014 -0700

    rdma_client: use perror, unwind allocs on failure

    Our main test function prints out errno directly, which is hard to read
    as it's not decoded at all.  Instead, use perror() to make failures more
    readable.  Also redo the failure flow so that we can do a simple unwind
    at the end of the function and just jump to the right unwind spot on
    error.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05fc15b44805a23a4e8562d1953074243950dfbe
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:43:04 2014 -0700

    cmtime: rework program to be multithreaded

    When using very large numbers of connections (10,000 was in use here),
    resolving a performance problem in the kernel cma.c code exposed a new
    problem.  With the underlying kernel issue resolved, 10,000 connect
    requests would flood the server side of the test and the cmtime
    application would respond as quickly as possible.
    However, the client side would not bother to check any of the returns
    until after having sent all 10,000 connect requests.  When the kernel
    had a serializing performance problem, this was OK.  When it was fixed,
    this caused a general slowdown in connect operations due to overruns in
    the event processing.  This patch causes the client side to fire off
    threads that will handle responses to connect requests as they come in
    instead of allowing them to backlog uncontrollably.  Times for a 10,000
    connect run changed from this:

    [root@rdma-dev-01 ~]# more
    3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+.output
    ib1:
    step              total ms     max ms     min us  us / conn
    create id    :       46.64       0.10       1.00       4.66
    bind addr    :       89.61       0.04       7.00       8.96
    resolve addr :       50.63      26.18   23976.00       5.06
    resolve route:      565.44     538.77   26736.00      56.54
    create qp    :     4028.31       5.70     326.00     402.83
    connect      :    50077.42   49990.49   90734.00    5007.74
    disconnect   :     5277.25    4850.35  380017.00     527.72
    destroy      :       42.15       0.04       2.00       4.21

    ib0:
    step              total ms     max ms     min us  us / conn
    create id    :       34.82       0.04       1.00       3.48
    bind addr    :       25.94       0.02       1.00       2.59
    resolve addr :       48.18      25.01   22779.00       4.82
    resolve route:      501.28     476.26   25071.00      50.13
    create qp    :     3274.12       6.05     257.00     327.41
    connect      :    55549.64   55490.32   62150.00    5554.96
    disconnect   :     5263.64    4851.18  375628.00     526.36
    destroy      :       47.20       0.07       2.00       4.72

    to this:

    [root@rdma-dev-01 ~]# more
    3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+-fixed-cmtime.output
    ib1:
    step              total ms     max ms     min us  us / conn
    create id    :       34.45       0.08       1.00       3.44
    bind addr    :       88.41       0.04       7.00       8.84
    resolve addr :       33.59       4.65     612.00       3.36
    resolve route:      618.68       0.61      97.00      61.87
    create qp    :     4024.03       6.30     341.00     402.40
    connect      :     6983.35    6886.33    8509.00     698.33
    disconnect   :     5066.47     230.34     831.00     506.65
    destroy      :       37.02       0.03       2.00       3.70

    ib0:
    step              total ms     max ms     min us  us / conn
    create id    :       42.61       0.14       1.00       4.26
    bind addr    :       27.05       0.03       2.00       2.70
    resolve addr :       40.65      10.73     869.00       4.06
    resolve route:      626.75       0.60     103.00      62.68
    create qp    :     3334.50       6.48     273.00     333.45
    connect      :     6310.29    6251.59   13298.00     631.03
    disconnect   :     5111.12     365.87     867.00     511.11
    destroy      :       36.57       0.02       2.00       3.66

    with this patch.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6551bab0b75b1f2499d97c2384cd3ac723da625f
Author: Hal Rosenstock <hal@mellanox.com>
Date:   Wed Jun 18 09:55:06 2014 -0700

    rsocket: Use malloc instead of calloc

    There is no need to clear the allocated memory, as the allocation is
    immediately followed by a memcpy which covers all of it.

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2a0944dc5e0e64290b8dfca332e6d5645c25b12e
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue May 27 11:43:05 2014 -0700

    librdmacm: Update rdma_accept man page

    Document NULL conn_param parameter for rdma_accept.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 386b97e807917a8ca7f6d12d66e34dc9441f7502
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu May 22 16:13:08 2014 -0700

    indexer: Free index_map resources when cleared

    Free memory allocated for index map entries when they are no
    longer in use.  To handle this, count the number of entries
    stored by the index map item arrays and release the arrays when
    no items are being tracked.

    This reduces valgrind noise.

    Problem reported by: Hannes Weisbach <hannes_weisbach@gmx.net>

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
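
One way to picture the counting scheme is the sketch below: each item array keeps a count of stored entries and is freed when that count drops to zero. The structure, the sizes, and the names `map_sketch`, `map_set`, and `map_clear` are assumptions for illustration and do not mirror the actual indexer code.

```c
#include <assert.h>
#include <stdlib.h>

#define ENTRY_BITS	4
#define ENTRY_SIZE	(1 << ENTRY_BITS)	/* items per array */
#define ARRAY_CNT	8			/* number of item arrays */

/* Hypothetical index map: lazily allocated item arrays plus a use count. */
struct map_sketch {
	void **array[ARRAY_CNT];
	int count[ARRAY_CNT];
};

static int map_set(struct map_sketch *m, int index, void *item)
{
	int a = index >> ENTRY_BITS;
	int e = index & (ENTRY_SIZE - 1);

	assert(index >= 0 && a < ARRAY_CNT && item);
	if (!m->array[a]) {
		m->array[a] = calloc(ENTRY_SIZE, sizeof(void *));
		if (!m->array[a])
			return -1;
	}
	if (!m->array[a][e])
		m->count[a]++;		/* a previously empty slot becomes used */
	m->array[a][e] = item;
	return 0;
}

static void *map_clear(struct map_sketch *m, int index)
{
	int a = index >> ENTRY_BITS;
	int e = index & (ENTRY_SIZE - 1);
	void *item;

	assert(index >= 0 && a < ARRAY_CNT);
	if (!m->array[a])
		return NULL;

	item = m->array[a][e];
	if (item) {
		m->array[a][e] = NULL;
		if (--m->count[a] == 0) {
			/* no items tracked any more: release the array */
			free(m->array[a]);
			m->array[a] = NULL;
		}
	}
	return item;
}
```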

commit 397b1a79f077c2fd1ae35be15bc3a7d8918800f1
Author: Patrick MacArthur <pmacarth@iol.unh.edu>
Date:   Tue Apr 29 21:30:08 2014 -0700

    rstream: fix "-T resolve" detection

    Signed-off-by: Patrick MacArthur <pmacarth@iol.unh.edu>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b3758e215f0abbea0d48996ef9b95f01530a4210
Author: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Date:   Tue Apr 29 19:57:36 2014 -0700

    librdmacm: Fix verbs leak due to reentrancy issue

    Any call to ucma_init_device must be done under lock.

    Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1291d9c7b52e829057458dad0e0ddd5aa9821a2a
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 16 22:01:51 2014 -0700

    rsocket: Relax requirement for minimal inline data

    Inline data support is optional.  Allow rsockets to work
    with devices that do not support inline data, provided
    that they do support RDMA writes with immediate data.
    This allows rsockets to work over Intel TrueScale HCA.

    Patch derived from work by: Amir Hanania

    Signed-off-by: Amir Hanania <amir.hanania@intel.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit dfb5886db5975d209be6b31656c95b0d9c608195
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 16 22:33:38 2014 -0700

    rsocket: Modify when control messages are available

    Rsockets currently tracks how many control messages (i.e.
    entries in the send queue) are available using a
    single ctrl_avail counter.  Seems simple enough.

    However, control messages currently require the use of
    inline data.  In order to support control messages that
    do not use inline data, we need to associate each
    control message with a specific data buffer.  This will
    become easier to manage if we modify how we track when
    control messages are available.

    We replace the single ctrl_avail counter with two new
    counters.  The new counters conceptually treat control
    messages as if each message had its own sequence number.
    The sequence number will then be able to correspond to
    a specific data buffer in a follow up patch.

    ctrl_seqno will be used to indicate the current control
    message being sent.  ctrl_max_seqno will track the
    highest control message that may be sent.

    A side effect of this change is that we will be able to
    see how many control messages have been sent.  This also
    separates the updating of the control count on the
    sending side from that on the receiving side.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
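
The two-counter scheme might look roughly like the sketch below. Only the names ctrl_seqno and ctrl_max_seqno come from the commit message; the structure, the helper functions, the queue size `CTRL_SIZE`, and the choice to treat ctrl_max_seqno as an exclusive limit are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CTRL_SIZE 4	/* assumed number of SQEs reserved for control messages */

struct ctrl_sketch {
	uint16_t ctrl_seqno;		/* sequence number of the next control message */
	uint16_t ctrl_max_seqno;	/* sending stops once ctrl_seqno reaches this */
};

static void ctrl_init(struct ctrl_sketch *c)
{
	c->ctrl_seqno = 0;
	c->ctrl_max_seqno = CTRL_SIZE;
}

/* A control message may be sent while the two counters differ. */
static int ctrl_avail(const struct ctrl_sketch *c)
{
	return c->ctrl_seqno != c->ctrl_max_seqno;
}

/* Sending side: returns the buffer slot to use, or -1 if none is free. */
static int ctrl_send(struct ctrl_sketch *c)
{
	int slot;

	if (!ctrl_avail(c))
		return -1;
	slot = c->ctrl_seqno % CTRL_SIZE;	/* seqno maps to a specific buffer */
	c->ctrl_seqno++;
	return slot;
}

/* Completion side: one more control message may now be sent. */
static void ctrl_complete(struct ctrl_sketch *c)
{
	assert((uint16_t)(c->ctrl_max_seqno - c->ctrl_seqno) < CTRL_SIZE);
	c->ctrl_max_seqno++;
}
```

Because each sequence number maps to one slot, a later change can attach a dedicated data buffer to every control message, and ctrl_seqno doubles as a running count of control messages sent.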

commit 5ac6f3eab852606575f9affa515ec77b978a001c
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Apr 17 08:37:47 2014 -0700

    rsocket: Dedicate a fixed number of SQEs for control messages

    The number of SQEs allocated for control messages is set
    to 1 of 2 constant values (either 4 or 2).  A default
    value is used unless the size of the SQ is below a certain
    threshold (16 entries).  This results in additional code
    complexity, and it is highly unlikely that the SQ would
    ever be allocated smaller than 16 entries.

    Simplify the code to use a single constant value for the
    number of SQEs allocated for control messages.  This will
    also help in subsequent patches that will need to deal
    with HCAs that do not support inline data.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit d62a52590741da993c5ac3c39c82601c273175d9
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 16 21:42:06 2014 -0700

    rsocket: Check max inline data after creating QP

    The ipath provider will ignore the max_inline_size
    specified as input into ibv_create_qp and instead
    return the size that it supports (which is 0) on
    output.

    Update the actual inline size returned from create QP,
    and check that it meets the minimum requirement for
    rsockets.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8ce5823e02b6a38fd5ed7e11a1bb586847dbcb03
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Apr 29 20:11:35 2014 -0700

    librdmacm: Make ucma_init_all static

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 23ffef06cf462c4c5ac4ec5880b96c8719b64774
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 9 12:19:25 2014 -0700

    librdmacm: Support lazy initialization

    librdmacm currently opens a device context per configured HCA. This is
    usually done in rdma_create_event_channel() or the first time that
    ucma_init() is called. If a process is only going to use one of the
    configured HCAs/RDMA IPs then the remaining device contexts are not
    used/required. Opening a device context on each device a priori limits the
    maximum number of processes that can be supported on a node to the maximum
    number of open contexts supported per HCA, regardless of the number of HCAs
    present in the system.

    Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 984b1e3c189db9d156ea429c1726bd8739893247
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Mar 6 13:42:31 2014 -0800

    rsocket: Fix sbuf_bytes_avail counter 'overrun' with iwarp

    Reported-by: Jonas Pfefferle1 <JPF@zurich.ibm.com>

    "The problem is that on the client side sbuf_bytes_avail overflows
    in rs_poll_cq.  And from what I debugged so far there are 2
    completions for every send and this is because I use iWarp hardware
    which does not support write with immediate so there is one completion
    for the write and one for the send (both go into the default case
    and add the length to sbuf_bytes_avail)."

    To avoid the issue, we flag send message operations that are used
    in place of immediate data.  Other send message operations are
    not affected.  The completion code can then check whether the
    completion is for a send message which was paired with an RDMA
    write transaction and adjust the behavior accordingly.  Additionally,
    such send messages only carry the opcode in their WR_ID, with the
    data portion zeroed.  This avoids adding the length value twice.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a2340d891eaa3f8a766a627bb4402ea85bcec6cb
Author: Hal Rosenstock <hal@mellanox.com>
Date:   Wed Mar 5 12:51:54 2014 -0800

    riostream: Add AF_IB support

    Allow the user to specify GID addresses (AF_IB) with riostream

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8e760a4486f776df4f6728326dc7e8aed4a18971
Author: Hal Rosenstock <hal@mellanox.com>
Date:   Tue Mar 4 17:06:47 2014 -0800

    rsocket: Return EBADF on bad rsocket fd

    Eliminates potential seg faults when passed an invalid rsocket.

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3c19c968a240a2c50809373f9aa90bdf3454f6b1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Mar 4 16:59:20 2014 -0800

    man/rsocket: Enhance riomap documentation

    Document that the user must set IOMAPSIZE in order to
    use the riomap call.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 176e6e961d17c51ae1f2dad5a2f50546e3a2ecf4
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 27 12:10:55 2014 -0800

    librdmacm 1.0.18

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b4603c864860e5e35379458cd1c0a42bb983af59
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 27 11:30:34 2014 -0800

    udaddy: Remove support for port space IB

    UD support for the IB port space requires that the application
    use rdma_create_ep, rather than rdma_create_id.  However, using
    rdma_create_ep results in address and route resolution being
    performed synchronously as part of the rdma_create_ep call.
    Since udaddy is an example, we want to show how it can be used
    with asynchronous events.  So, rather than update udaddy to
    use rdma_create_ep in order to support the IB port space, it
    would be better to remove that support.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit df7ecde0da9df4af5d8bc3e1ca472e2e5ec9095b
Author: Susan K. Coulter <markus@cj-fe2.lanl.gov>
Date:   Fri Jan 17 14:31:42 2014 -0800

    rsocket: Add keepalive logic

    Actually send and receive keepalive messages if keepalive is
    enabled on an rsocket.

    Signed-off-by: Susan K. Coulter <markus@cj-fe2.lanl.gov>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit da2b7c1cde16df7936273b1ebd38e7c25856c843
Author: Or Gerlitz <ogerlitz@mellanox.com>
Date:   Tue Dec 3 16:51:07 2013 -0800

    librdmacm: Add directives on binding to IPv6 any address to man pages

    Explain how to bind to IPv6 any address in the man pages for the examples

    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ea5851998c11b8211170179a6d924d4935fec0a1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Nov 26 13:16:19 2013 -0800

    librdmacm: Check 'init' under mutex

    ucma_ib_init() does a quick check that access to ibacm has
    been initialized.  This check is done outside of the
    acm_lock mutex.  We need to check init again while
    holding the mutex to ensure that we don't run the
    initialization code twice.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
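
The check-then-recheck pattern described above can be sketched as follows. All names here are hypothetical stand-ins, and `init_runs` exists only so the example can show that the setup runs once. The unlocked first read mirrors the pattern in the commit message; a stricter version would use atomics or pthread_once.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int init_done;	/* set once initialization has completed */
static int init_runs;	/* how many times the setup code actually ran */

static void init_once_sketch(void)
{
	if (init_done)		/* quick check without taking the lock */
		return;

	pthread_mutex_lock(&init_lock);
	if (!init_done) {	/* re-check: another thread may have won the race */
		init_runs++;	/* placeholder for the real initialization work */
		init_done = 1;
	}
	pthread_mutex_unlock(&init_lock);

	assert(init_done);
}
```

Without the second check, two threads that both pass the quick test could each run the initialization after acquiring the mutex in turn.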

commit b70a390d8bd8a679571f06ab82e42d68a99bc7d2
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Nov 18 13:12:04 2013 -0800

    rping: Fix server reporting error on exit

    Commit e57196c71ddd850e14f3e66355f02786e4914f72
    ("rping: added checks to the return values functions")
    resulted in the rping server always reporting that
    it failed.  Fix this by only failing in the case of
    an unexpected termination, and not as the result of
    the client completing.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c38d43aa2d5dc39dd98f813749dfa496875ad2e1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Nov 11 10:24:54 2013 -0800

    Retrieve SGID after calling rdma_bind_addr

    A change was made to rdma_bind_addr when AF_IB is enabled
    to only retrieve the resulting bound address.  Previously,
    rdma_bind_addr would retrieve the corresponding SGID as
    well.  This breaks some apps which were checking the
    SGID after binding to an IP address.  Revert to the
    previous behavior of also retrieving the SGID after
    calling rdma_bind_addr.

    Tested-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit faafeac08920a37994da19d72fd7ba1e64281f83
Author: Guy Shapiro <guysh@mellanox.com>
Date:   Tue Nov 5 19:52:20 2013 +0200

    librdmacm: Some fixes to man pages

    Fix the man pages of rdma_destroy_ep & rdma_destroy_qp to show the correct return value (void).

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 41900ddd3b09ed0625a721b014692b8c5c6f7246
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Mon Nov 4 07:56:08 2013 -0500

    [librdmacm] Makefile.am: Add missing riostream man page to man_MANS

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 86520b86ffb45d3caf6e5bd94271f99deef0a5f9
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Aug 16 15:15:12 2013 -0700

    rsockets: Handle race between rshutdown and rpoll

    Multi-threaded applications which call rpoll and rshutdown
    simultaneously can hang.  Ceph developers reported an issue
    with the rsocket implementation.  Ceph calls rpoll in
    one thread, and while that thread is blocked in rpoll,
    a second thread may call rshutdown on the socket.  In
    normal sockets, this results in the poll call unblocking
    (since a call to read on the socket will no longer block).
    However, rsockets does not free the thread blocked on the
    rpoll call.

    To fix this, we add some additional state checking to
    protect against threads calling rpoll and rshutdown
    simultaneously.  We also have the rshutdown call
    transition the QP into an error state.  This causes all
    posted receives to complete as flushed, which results
    in unblocking the thread in rpoll (to process the flushed
    receives).
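
    The behavior of normal sockets that rsockets aims to match can be
    sketched with plain POSIX calls.  This example uses an AF_UNIX
    socket pair and issues shutdown() before poll() in a single thread
    rather than racing two threads; it illustrates the expected
    readiness report and is not rsockets code.  The function name
    poll_after_shutdown is hypothetical.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns the number of ready descriptors reported by poll() after the
 * socket has been shut down, or -1 on error.
 */
int poll_after_shutdown(void)
{
	int sv[2];
	struct pollfd pfd;
	int ret;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	/* In the scenario above, a second thread would issue this call. */
	if (shutdown(sv[0], SHUT_RDWR)) {
		ret = -1;
		goto out;
	}

	pfd.fd = sv[0];
	pfd.events = POLLIN;
	pfd.revents = 0;
	/* Zero timeout: report readiness without blocking. */
	ret = poll(&pfd, 1, 0);
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

    On Linux the descriptor is expected to be reported as ready, since
    a read on it would no longer block.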

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6152fb2ea9f4e331c63c00810ee4b920e6f1af2d
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Wed Sep 11 15:37:11 2013 -0400

    [librdmacm] man/rstream.1: Update man page to be consistent with rstream -h

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 77cab40df7f29bdc718a4a6da74c6145bf81468a
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Wed Sep 11 14:44:32 2013 -0400

    [librdmacm] rstream.c: Indicate when specified address family is unknown

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05ea9d16da8808e464750fa976ba3d6151df0a54
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Wed Sep 11 14:44:28 2013 -0400

    [librdmacm] man/rdma_create_id.3: Add RDMA_PS_IB port space description

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a53376c3c7887c52cf5b311b0b96cfa405a49d31
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Aug 27 11:37:54 2013 -0700

    examples: Add cmtime to .gitignore

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 78dd0371cdad6bf27e98903ba66cebc01f52f6d5
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 15:29:15 2013 -0700

    rsocket: Update rsocket man page

    Update fork support and RDMA_ROUTE socket option.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5a5ec3458c67b1b431a18a0acbc950ef4e31f87f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 12:00:54 2013 -0700

    cmtime: Add retry support for address and route resolution

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b031fead061eb0d2874be8f259c84e21433e4505
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 11:54:56 2013 -0700

    cmtime: Allow user to specify timeout values

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit afd49dcc2bb13052075e07a7593f6593b43606ce
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 11:30:33 2013 -0700

    cmtime: Add ability to time rdma_bind_addr calls

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2949a92960546b75c647bcf14fec1f4369fd17fa
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Aug 5 10:57:43 2013 -0700

    cmtime: Add example program that times rdma cm calls

    cmtime is a new sample program that measures how long it
    takes for each step in the connection process to complete.
    It can be used to analyze the performance of the various
    CM steps.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8fd079abb8b2835908017f74ac70781d84e1e163
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Jul 26 09:52:55 2013 -0700

    rstream: Use rsocket option to set route directly

    If we're using GID addressing, rdma_getaddrinfo can return
    routing data directly.  Add an option for the user to
    indicate that rdma_getaddrinfo should be called in place of
    getaddrinfo.  And if routing data is available, call
    rsetsockopt to set the route.

    This helps test rsockets when ibacm and AF_IB support are
    available.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 21c703e5a594283cf119ce1286831df5d1483b34
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Aug 2 14:18:06 2013 -0700

    rsocket: Return 0 on success for SOL_RDMA options

    The processing of SOL_RDMA does not set the return value in
    the case of successfully handled options.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit e33755decd339712fc57fbe25bed704d24e8621a
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jun 10 12:33:20 2013 -0700

    rsockets: Add ability to set the IB route directly

    Add an RDMA specific rsocket option that allows the user
    to program the RDMA route directly.  This is useful
    for apps that have path record data available, e.g. from
    ibacm.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit f77079d79becf4476cb75ea5c816aae70724116e
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Sat Jul 20 19:22:55 2013 -0700

    examples: Add support for native IB addressing to samples

    Allow the user to specify GID addresses (AF_IB) into
    udaddy and rstream.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ca353a3f985135504c429f82bf5a342ec26d11d4
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Jul 18 13:26:15 2013 -0700

    rsockets: Support native IB addressing on connected rsockets

    Update rsockets to support AF_IB addresses on connected rsockets.
    Support for datagram rsockets is more difficult as a result of
    using real UDP sockets for QP resolution, so that support is
    deferred.  For connected sockets, we need to update internal
    checks to handle AF_IB.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a8becf33bbbb363cb2e0f2b45456bc82b345c453
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun Jul 28 11:20:54 2013 +0200

    [4/4] Declare 'server_port' as an unsigned variable

    Change the data type of the 'server_port' variable from signed to
    unsigned such that the cast in the fscanf() call can be removed.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>

commit eee05e6604a60b007249f97613d3bb513c07c20d
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun Jul 28 11:19:48 2013 +0200

    [3/4] rsocket: Remove the unused variable 'ret'

    The variable 'ret' is assigned a value but that value is never used.
    This triggers the following compiler warning:

    src/rsocket.c:3720:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]

    Hence remove this variable.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>

commit 9e758e0655242bb02aea5ec28fe4eeac2ec655f7
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun Jul 28 11:19:15 2013 +0200

    [2/4] cma: Remove the unused variable 'id_priv'

    The variable 'id_priv' is assigned a value but is never used.
    This triggers the following compiler warning:

    src/cma.c:1178:25: warning: variable 'id_priv' set but not used [-Wunused-but-set-variable]

    Hence remove this variable.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>

commit 2a31c855fc95d04370db56de5b35d8271e577f6f
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun Jul 28 11:18:36 2013 +0200

    [1/4] acm: Remove the unused variable 'pri_path'

    The variable 'pri_path' is assigned a value but is never used.
    This triggers the following compiler warning:

    src/acm.c:301:26: warning: variable 'pri_path' set but not used [-Wunused-but-set-variable]

    Hence remove this variable.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>

commit c8be3cfde6902e490fadd6a51206c1bcba3e3aa2
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jun 10 10:57:56 2013 -0700

    init: Remove USE_IB_ACM configuration option

    When the librdmacm is configured, it sets the USE_IB_ACM option
    if infiniband/acm.h is found.  We can remove this option with
    very little overhead, which would allow a user to install
    ACM after installing the librdmacm, and the librdmacm would be
    able to make use of ACM.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6efb57780ca142ea4e3b0feebef554849047f79f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jun 10 11:07:12 2013 -0700

    acm: Define needed ACM protocol messages

    The librdmacm needs message definitions used to communicate
    with the ibacm.  It currently pulls these from infiniband/acm.h,
    which is installed by ibacm.  This creates an install order
    dependency on ibacm.  However, work on the scalable SA has
    the ibacm using the librdmacm (via rsockets) for communication
    between the different SSA components.

    To resolve this issue, have the librdmacm define the message
    structures that it needs to communicate with ibacm.  The
    librdmacm already defines some ACM messages through configuration
    checks.  We just expand that capability, which isolates the librdmacm
    package from the ibacm package.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c8173d50d1a8c2bbfb0c4459e05d3941175676b2
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Aug 29 15:02:54 2012 -0700

    cmatose: Allow user to specify address format

    Provide an option for the user to indicate the type of
    addresses used as input.  Support hostname, IPv4, IPv6,
    and GIDs.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 704f54358a1f74229cd9e982b530ca8327c7658e
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 16:03:42 2013 -0700

    Remove executable mode bit on text files

    Source code and man page should not be executable.

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3eb1704b2e11413077933d6d3a963d81d508bdf8
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 23:59:52 2013 +0200

    Open files with "close on exec" flag

    Files opened by librdmacm are not supposed to be inherited across
    exec*(), most of the files are of no use for another program, and
    others cannot be used without the associated memory mapping.

    This patch changes fopen(), open(), and socket() to always set
    the close-on-exec flag.

    This patch also adds checks to configure to guess whether fopen() supports
    the "e" flag. If O_CLOEXEC and SOCK_CLOEXEC are supported, fopen() should
    support "e". If not supported, it is discarded according to POSIX. Many
    operating systems have support for fopen("e").

    You might find more information about close on exec in the following articles:

    - "Excuse me son, but your code is leaking !!!" by Dan Walsh
      http://danwalsh.livejournal.com/53603.html

    - "Secure File Descriptor Handling" by Ulrich Drepper
      http://udrepper.livejournal.com/20407.html

    Note: this patch won't set close on exec flag on file descriptors
    created by the kernel for completion channel and such.
    This is addressed by another kernel patch.
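
    A small sketch of opening a descriptor with the close-on-exec flag
    and verifying it afterwards.  The helper names open_cloexec and
    has_cloexec are hypothetical and are not taken from the patch; only
    open() is shown, not fopen() or socket().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open a file read-only with close-on-exec set atomically at open time. */
int open_cloexec(const char *path)
{
	return open(path, O_RDONLY | O_CLOEXEC);
}

/* Return 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

    Setting the flag in the open() call itself avoids the window
    between open() and a later fcntl(F_SETFD) in which another thread
    could fork and exec.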

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit d53cd79c3bde6186bda6822a04708b9d2666f8ae
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 23:59:50 2013 +0200

    Add .gitignore rules

    Add the list of files/patterns to be excluded from git status output.
    Additionally, it will prevent such files/patterns from being added and committed.

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit e9ef6c2e2d8141dd5c32472918b8c087f745524b
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 23:59:49 2013 +0200

    configure: Use automake's option "subdir-objects"

    Following advice in "Autotool Mythbuster" [1], option subdir-objects
    can be used to have Makefiles create object files in the same
    directory as their source files.

    It reduces clobbering in the build directory.

    [1] "Autotool Mythbuster", by Diego Elio "Flameeyes" Pettenò
    http://www.flameeyes.eu/autotools-mythbuster/automake/nonrecursive.html

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3edfff79d98f72b754278c854f871c4a22a7ce3c
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 23:59:48 2013 +0200

    configure: Apply updates proposed by autoupdate

    'autoupdate' is a tool that helps developers update configure.ac.

    This patch applies a few fixes as suggested by autoupdate.

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit f49ac33aaab147e5b126a75565f57e596600f372
Author: Jeff Squyres <jsquyres@cisco.com>
Date:   Tue Jul 16 23:59:47 2013 +0200

    autogen.sh: Use autoreconf in autogen.sh

    The old sequence of Autotools commands listed in autogen.sh is no
    longer correct.  Instead, just use the single "autoreconf" command,
    which will invoke all the Right Autotools commands in the correct
    order.

    Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 9d2f1b068e6fcd62853fe013c7cc4316dcb3fc4b
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Tue Jul 16 23:59:46 2013 +0200

    Makefile.am: Fix an automake warning

    Fix the following automake warning message:

        Makefile.am:1: `INCLUDES' is the old name for `AM_CPPFLAGS' (or `*_CPPFLAGS')

    A quote from the automake manual:

        INCLUDES
            This does the same job as AM_CPPFLAGS (or any per-target _CPPFLAGS variable
            if it is used). It is an older name for the same functionality. This
            variable is deprecated; we suggest using AM_CPPFLAGS and per-target
            _CPPFLAGS instead.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 715965b7231cd97d302e24c9e8ac89b2a57a57ab
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Tue Jul 16 23:59:45 2013 +0200

    Add "foreign" option to AM_INIT_AUTOMAKE

    Switch to the modern form of the AM_INIT_AUTOMAKE macro and tell
    automake that the librdmacm package does not follow the GNU
    standards. This change makes it possible to use 'autoreconf' for the
    librdmacm package.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ef095323918acac8fdc5386ebb7877fb5d34e5e3
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu May 2 13:47:51 2013 -0700

    lib: Rename configure.in to configure.ac

    Update to latest autotools naming.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit faae8c5db396985a40dc56ad6f82f89a16b8e9f1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Apr 11 10:05:29 2013 -0700

    rsocket: Add support for iWarp

    iWarp does not support RDMA writes with immediate data.
    Instead of sending messages using immediate data, allow
    the rsocket protocol to exchange messages using sends.

    The rsocket protocol remains the same.  RDMA writes are
    used for data transfers, with send messages used to transfer
    rsocket protocol messages.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 0d6ca1300d88377ae7f9162457e64c541a4630eb
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Apr 12 14:41:52 2013 -0700

    rsocket: Merge usage of wr_id between stream and datagram svcs

    The rsocket data streaming and datagram services use different
    formats for the wr_id.  Although some differences are needed,
    we can make them more similar.  This will be useful when the
    wr_id is used for iwarp support, plus eliminates use of wr_id
    bits that aren't actually needed.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit e57928b701ded6c5417b5ac0c153a239bf947612
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Mar 5 17:18:11 2013 -0800

    librdmacm: Release 1.0.17

commit 24590bc96d8871d80124d68d182c915d7efcc9e6
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Feb 19 20:03:58 2013 -0800

    librdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown

    Shutdown switches an rsocket from nonblocking to blocking to
    ensure that all data has been sent.  After completing all
    transfers, it should switch back to nonblocking; this handles
    partial shutdown situations, where only half the connection
    is shut down.  However, the code uses the value of '1' to
    set the nonblocking flag, rather than O_NONBLOCK.  Fix this.
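
    The distinction can be sketched with ordinary fcntl() on a file
    descriptor.  The helpers set_nonblocking and is_nonblocking are
    hypothetical; rsockets uses its own rfcntl() wrapper, which is not
    shown here.

```c
#include <fcntl.h>

/* Enable or disable nonblocking mode, preserving other status flags. */
int set_nonblocking(int fd, int enable)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	if (enable)
		flags |= O_NONBLOCK;
	else
		flags &= ~O_NONBLOCK;
	return fcntl(fd, F_SETFL, flags);
}

/* Return 1 if O_NONBLOCK is set on fd, 0 if not, -1 on error. */
int is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return (flags & O_NONBLOCK) ? 1 : 0;
}
```

    Passing the literal 1 to F_SETFL does not set the O_NONBLOCK bit on
    common platforms, so the descriptor silently stays in (or returns
    to) blocking mode.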

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit be2a2a44663282cda1a60e05c3b85275c732acc6
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Feb 4 16:52:18 2013 -0800

    librdmacm/rstream: Reduce default transfer count

    1 million ping-pong transfers takes over 3 seconds to
    complete, and I'm impatient.  Reduce the default number of
    transfers for small messages to speed up running
    performance tests, especially when running over slower
    connections, like TCP sockets or over a WAN.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 69fadb50636d98de57c9069b83adf6d2c5c77fc6
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Feb 1 17:17:34 2013 -0800

    librdmacm: Work-around kernel bug returning uid = 0

    Older kernels have a bug where they can report an event with the
    uid set to 0.  The librdmacm crashes when casting the uid to
    an rdma_cm_id and dereferencing the NULL pointer.

    There are a limited number of events where this can occur and
    in most cases it's safe to simply discard the event.  (This is
    what the kernel does anyway.)  However, it's possible for us
    to process an RDMA_CM_EVENT_ESTABLISHED event with the uid
    set to 0.  (See kernel commit 418edaaba96e58112b15c82b4907084e2a9caf42.)

    Although it's rare for this to occur, it does in fact happen
    in practice.  To work-around the kernel bug, when the uid of an
    established event is set to 0, we first try to locate the correct
    user space id based on related data before discarding the event.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 75e5b5b17d8a478b4fad5d9ee700edb943b050ba
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 28 14:56:25 2013 -0800

    librdmacm: Define ucma_ib_init when IB_ACM is disabled

    ucma_ib_init is only defined if IB_ACM is enabled, which is
    determined by looking for the infiniband/acm.h header file.
    Define ucma_ib_init when IB_ACM is disabled.

    Problem reported by Suresh Shelvapille <suri@baymicrosystems.com>

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1f6088f85af3c60ba4d57de1d8f1098e06761237
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 21 15:28:39 2013 -0800

    rsockets: Update rsocket man page

    Update man page to include recently added rsocket options
    and undocumented configuration file.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 56e1a7cd4904fbfde59adbdfedd5374e5bde2e87
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jan 9 14:54:47 2013 -0800

    rsockets: Add support for existing UDP apps

    Support for existing UDP applications is done via the rspreload
    library.  However, when the preload library is loaded, socket
    calls used by rsockets get intercepted and converted into
    rsocket calls.

    The preload library was able to handle this for TCP rsockets
    by using a per thread variable and checking for recursive calls
    coming from rsockets back into the preload library.  The preload
    library would direct such calls to the real socket calls.

    The problem is more complex for UDP rsockets, which can invoke
    socket calls from an internal rsocket thread.  The result is that
    the preload library intercepts socket calls that originate from
    the rsocket library which are not recursive.

    Although this is really a problem with the preload library,
    the simplest solution is for rsockets to fully initialize the
    library when allocating the first rsocket, versus deferring
    initialization until required.  The preload library can then
    detect the recursive calls.
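
    The per-thread recursion check used for TCP rsockets can be
    sketched as follows.  All names here are hypothetical stand-ins;
    the real preload library intercepts the actual socket calls, which
    this toy model only imitates with counters.

```c
/* Nonzero while the current thread is inside an rsocket call. */
static _Thread_local int in_rsocket_call;

int real_calls;      /* calls passed through to the "real" socket call */
int rsocket_calls;   /* calls converted into rsocket calls */

int intercepted_socket(void);

/* Stand-in for the real socket call. */
static int real_socket(void)
{
	real_calls++;
	return 0;
}

/* Stand-in for an rsocket call that internally uses a socket call. */
static int rsocket_impl(void)
{
	rsocket_calls++;
	return intercepted_socket();   /* re-enters the interposer */
}

/* Interposed entry point: convert outer calls, pass recursive ones through. */
int intercepted_socket(void)
{
	int ret;

	if (in_rsocket_call)
		return real_socket();

	in_rsocket_call = 1;
	ret = rsocket_impl();
	in_rsocket_call = 0;
	return ret;
}
```

    Because the flag is per thread, it cannot identify calls made from
    a separate internal service thread, which is the limitation the
    commit message describes for UDP rsockets.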

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6047e1991e95b96b1992f39a466457e584c01226
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Dec 5 15:58:03 2012 -0800

    examples/udpong: Add test program for rsocket datagrams

    Add a sample test program to test datagram rsockets.  Move
    common routines used by udpong and other test programs into
    a common source file.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit e6e93ed4231976eeab707b31e283be0a7acff6db
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Nov 9 10:26:38 2012 -0800

    rsocket: Add datagram support

    Add datagram support through the rsocket API.

    Datagram support is handled through an entirely different protocol and
    internal implementation than streaming sockets.  Unlike connected rsockets,
    datagram rsockets are not necessarily bound to a network (IP) address.
    A datagram socket may use any number of network (IP) addresses, including
    those which map to different RDMA devices.  As a result, a single datagram
    rsocket must support using multiple RDMA devices and ports, and a datagram
    rsocket references a single UDP socket, plus zero or more UD QPs.

    Rsockets uses headers inserted before user data sent over UDP sockets to
    resolve remote UD QP numbers.  When a user first attempts to send a datagram
    to a remote address (IP and UDP port), rsockets will take the following steps:

    1. Store the destination address into a lookup table.
    2. Resolve which local network address should be used when sending
       to the specified destination.
    3. Allocate a UD QP on the RDMA device associated with the local address.
    4. Send the user's datagram to the remote UDP socket.

    A header is inserted before the user's datagram.  The header specifies the
    UD QP number associated with the local network address (IP and UDP port) of
    the send.

    A service thread is used to process messages received on the UDP socket.  This
    thread updates the rsocket lookup tables with the remote QPN and path record
    data.  The service thread forwards data received on the UDP socket to an
    rsocket QP.  After the remote QPN and path records have been resolved, datagram
    communication between two nodes is done over the UD QP.
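
    The idea of inserting a header that carries the sender's UD QP
    number before the user's datagram can be sketched as below.  The
    struct dgram_hdr layout and both helper names are hypothetical; the
    actual rsockets wire format is not given in this message and may
    differ.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout, for illustration only. */
struct dgram_hdr {
	uint32_t qpn;   /* sender's UD QP number, network byte order */
};

/* Prepend the header to payload; returns total length, 0 if out is too small. */
size_t dgram_pack(void *out, size_t out_len, uint32_t qpn,
		  const void *payload, size_t len)
{
	struct dgram_hdr hdr = { .qpn = htonl(qpn) };

	if (out_len < sizeof(hdr) || len > out_len - sizeof(hdr))
		return 0;
	memcpy(out, &hdr, sizeof(hdr));
	memcpy((char *) out + sizeof(hdr), payload, len);
	return sizeof(hdr) + len;
}

/* Extract the QP number from a received message; returns 0 on success. */
int dgram_unpack_qpn(const void *msg, size_t msg_len, uint32_t *qpn)
{
	struct dgram_hdr hdr;

	if (msg_len < sizeof(hdr))
		return -1;
	memcpy(&hdr, msg, sizeof(hdr));
	*qpn = ntohl(hdr.qpn);
	return 0;
}
```

    A receiver in this model would store the unpacked QP number in its
    lookup table, keyed by the sender's address.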

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c6bfc1c5b15e6207188a97e8a5df0405cfd2587f
Author: Or Gerlitz <ogerlitz@mellanox.com>
Date:   Sun Dec 2 12:04:23 2012 +0000

    [librdmacm] Fixed build problem due to missing macro

    rsocket.c was failing to compile because of a missing definition for the
    container_of macro; fix it. Reported-by: Eyal Salamon <esalomon@mellanox.com>

    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ab0d488c1e3ba7658f61a4d8da022b5afc17737f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Nov 5 11:53:03 2012 -0800

    rsocket: Remove fscanf build warnings

    Cast fscanf return values to (void) to indicate that we don't
    care if the call fails.  In the case of a failure, we simply
    fall back to using default values.

    Problem reported by Or Gerlitz <ogerlitz@mellanox.com>

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7d92d0106f50e0371256e74863963a0e2e99a5c8
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Oct 24 10:23:52 2012 -0700

    riostream: Add example program for using iomap routines.

    riostream is based on rstream, but uses the new riomap, riounmap,
    and riowrite calls instead.  It runs a series of latency and
    bandwidth tests using remote iomapped memory.

    riostream is limited to using zero copy transfers at the
    receiving side only at this time.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit bb9fcba81acdfe34ea5df3bb23a45e0a486207da
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Sun Oct 21 14:16:03 2012 -0700

    rsocket: Add APIs for direct data placement

    We introduce rsocket extensions for supporting direct
    data placement (also known as zero copy).  Direct data
    placement avoids data copies into network buffers when
    sending or receiving data.  This patch implements zero
    copies on the receive side, but adds some basic framework for
    supporting it on the sending side.

    Integrating zero copy support into the existing socket APIs
    is difficult to achieve when the sockets are set as
    nonblocking.  Any such implementation is likely to be unusable
    in practice.  The problem stems from the fact that socket
    operations are synchronous in nature.  Support for asynchronous
    operations is limited to connection establishment.

    Therefore we introduce new calls to handle direct data placement.
    The use of the new calls is optional and does not affect the
    use of the existing calls.  An attempt is made to have the new
    routines integrate naturally with the existing APIs.  The new
    functions are: riomap, riounmap, and riowrite.  The basic operation
    can be described as follows:

    1. App A calls riomap to register a data buffer with the local
       RDMA device.  Riomap returns an off_t offset value that
       corresponds to the registered data buffer.  The app may
       select the offset value.
    2. Rsockets will transmit an internal message to the remote
       peer with information about the registration.  This exchange
       is hidden from the applications.
    3. App A sends a notification message to app B indicating that
       the remote iomapped buffer is now available to receive data.
    4. App B calls riowrite to transmit data directly into the
       riomapped data buffer.
    5. App B sends a notification message to app A indicating that
       data is available in the mapped buffer.
    6. After all transfers are complete, app A calls riounmap to
       deregister its data buffer.

    Riomap and riounmap are functionally equivalent to RDMA
    memory registration and deregistration routines.  They are loosely
    based on the mmap and munmap APIs.

    off_t riomap(int socket, void *buf, size_t len,
    	     int prot, int flags, off_t offset)

    Riomap registers an application buffer with the RDMA hardware
    associated with an rsocket.  The buffer is registered either for
    local only access (PROT_NONE) or for remote write access (PROT_WRITE).
    When registered for remote access, the buffer is mapped to a given
    offset.  The offset is either provided by the user, or if the user
    selects -1 for the offset, rsockets selects one.  The remote peer may
    access an iomapped buffer directly by specifying the correct offset.
    The mapping is not guaranteed to be available until after the remote
    peer receives a data transfer initiated after riomap has completed.

    int riounmap(int socket, void *buf, size_t len)

    Riounmap removes the mapping between a buffer and an rsocket.

    size_t riowrite(int socket, const void *buf, size_t count,
    		off_t offset, int flags)

    Riowrite allows an application to transfer data over an rsocket
    directly into a remotely iomapped buffer.  The remote buffer is specified
    through an offset parameter, which corresponds to a remote iomapped buffer.
    From the sender's perspective, riowrite behaves similarly to rwrite.  From
    a receiver's view, riowrite transfers are silently redirected into a pre-
    determined data buffer.  Data is received automatically, and the receiver
    is not informed of the transfer.  However, iowrite data is still considered
    part of the data stream, such that iowrite data will be written before a
    subsequent transfer is received.  A message sent immediately after
    initiating an iowrite may be used to notify the receiver of the iowrite.

    It should be noted that the current implementation primarily focused
    on being functional for evaluation purposes.  Some checks have been
    deferred for subsequent patches, and performance is currently limited
    by linear lookups.
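
    The offset-to-buffer mapping and the linear lookup mentioned above
    can be modeled in a single process.  This is a toy model only: the
    toy_* names are hypothetical, no RDMA registration or network
    transfer takes place, and features such as the -1 offset or
    overlap checks are not modeled.

```c
#include <string.h>
#include <sys/types.h>

#define TOY_MAX_MAPS 8

/* One registered buffer, addressed by the peer through 'offset'. */
struct toy_map {
	off_t offset;
	char *buf;
	size_t len;
	int used;
};

static struct toy_map toy_maps[TOY_MAX_MAPS];

/* Register buf at the given offset; returns the offset, or -1 if full. */
off_t toy_iomap(void *buf, size_t len, off_t offset)
{
	for (int i = 0; i < TOY_MAX_MAPS; i++) {
		if (!toy_maps[i].used) {
			toy_maps[i].offset = offset;
			toy_maps[i].buf = buf;
			toy_maps[i].len = len;
			toy_maps[i].used = 1;
			return offset;
		}
	}
	return -1;
}

/* Copy data into the mapping that fully contains the target range. */
int toy_iowrite(const void *data, size_t count, off_t offset)
{
	for (int i = 0; i < TOY_MAX_MAPS; i++) {   /* linear lookup */
		struct toy_map *m = &toy_maps[i];
		size_t start;

		if (!m->used || offset < m->offset)
			continue;
		start = (size_t) (offset - m->offset);
		if (start <= m->len && count <= m->len - start) {
			memcpy(m->buf + start, data, count);
			return 0;
		}
	}
	return -1;
}

/* Remove the mapping for buf; returns 0 on success, -1 if not found. */
int toy_iounmap(void *buf)
{
	for (int i = 0; i < TOY_MAX_MAPS; i++) {
		if (toy_maps[i].used && toy_maps[i].buf == (char *) buf) {
			toy_maps[i].used = 0;
			return 0;
		}
	}
	return -1;
}
```

    In this model a write succeeds only while the target range lies
    inside a currently registered buffer.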

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit d2e96e99bf1fc3d14e33c741502cb689c810a27b
Author: Roland Dreier <roland@purestorage.com>
Date:   Tue Oct 16 19:44:39 2012 +0000

    rdma_xserver/client: Fix man page formatting

    Putting 'r' at the beginning of a line in the nroff source for man pages
    is confusing to nroff because lines that start with a single quote
    character ' or a dot character . are treated as control lines, which is
    not what's intended here.  Some of the man page text ends up left out of
    the formatted output.

    Fix this by just wrapping the text slightly differently in the source
    (which doesn't matter since nroff reflows the text anyway).  Also add a
    missing ".TP" so that the -p and -c options are not run together in the
    formatted output.

    Signed-off-by: Roland Dreier <roland@purestorage.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 507cc241e8b212c3cf3ed0ffb04e37095bbf8bb3
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Oct 8 10:33:21 2012 -0700

    librdmacm: Disable ACM support if ibacm.port is not found

    The librdmacm will try to connect to port 6125 if ibacm.port is
    not found.  The problem is that some other service or application
    could be using that port and respond with garbage.  Rather
    than falling back to a hard coded port number, if ibacm.port
    is not found, simply disable ACM support.

    This has the effect of removing support for older versions
    of ibacm, unless the port file is created manually.

    Patch created based on feedback from Doug Ledford and Florian
    Weimer from RedHat.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit e57196c71ddd850e14f3e66355f02786e4914f72
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:52 2012 +0000

    [5/5,librdmacm] rping: added checks to the return values functions

    This makes rping exit with a nonzero return value in case of an
    error.

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6c56dc404c999daa16a039f59b0160ab983acc98
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:51 2012 +0000

    [4/5,librdmacm] rstream: added missing return if accept() failed

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 41d6547bede80581b384b49bb35eac4fe089d08c
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:50 2012 +0000

    [3/5,librdmacm] rstream: initialize return value in server_connect()

    If use_async == 0 and rs_accept() passes (i.e. non negative value), then
    the return value from the function was uninitialized.

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1f1a03dae14cbb25a43b1b56aa5ae689776edc11
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:49 2012 +0000

    [2/5,librdmacm] rsocket: added missing break

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit eddbe8f0abc3d0f69755f0e510df2a7f21412c0b
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:48 2012 +0000

    [1/5,librdmacm] rsocket: add missing va_end() after calling va_start()

    Not doing so may lead to a resource leak.

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
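
For reference, a small sketch of the pairing rule behind this fix: every `va_start()` must be matched by a `va_end()` before the function returns. The helper `format_msg` is hypothetical and is not taken from rsocket.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical variadic helper: format into buf, always pairing
 * va_start() with va_end() before returning.
 */
int format_msg(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf != NULL && fmt != NULL);
	va_start(args, fmt);
	n = vsnprintf(buf, len, fmt, args);
	va_end(args);           /* required on every path after va_start() */
	return n;
}
```

When a function has several exit paths, each one reached after `va_start()` needs its own `va_end()`, or the exits should funnel through a single cleanup point.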

commit 8a92d0c3c8ce5f513dff974912143f6b0283f8e3
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Oct 4 12:01:50 2012 -0700

    ucmatose: Remove connect parameter passed into rdma_accept

    Pass in NULL for conn_param into rdma_accept to indicate
    that the passive side will use the values specified by the
    active side.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 714af39b2bc2cc54dd2391a0df2c7e54856bc9c7
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Oct 4 11:49:59 2012 -0700

    ucmatose: Fix number of connections to disconnect

    When ucmatose aborts because of issues trying to connect
    to the server, it moves to disconnecting all connections.
    However, not all connections may have been established.
    The result is that ucmatose will hang in disconnect_events.
    Fix this by setting the number of times that we need to
    disconnect to the number of times that we successfully
    connect.

    This problem is based on a report by Doug Ledford
    <dledford@redhat.com>

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 860b1a8784f1846be759eec46770cc723991479c
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Oct 3 15:05:20 2012 -0700

    rping: Reduce retry_count to fit in 3 bits

    retry_count is a 3-bit value on IB, so reduce it from
    10 to 7.

    A value of 10 prevents rping from working over the Intel
    IB HCA.  Problem reported by Doug Ledford <dledford@redhat.com>

    The retry_count is also not set when calling rdma_accept.
    Rather than passing different values into rdma_accept than
    what was specified by the remote side, use the values given
    in the connection request.

    Signed-off-by: …
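
As an aside, the 3-bit limit mentioned in this commit can be expressed as a small bounds check. The commit itself simply changes the constant from 10 to 7; the clamping helper and the macro name `IB_RETRY_COUNT_MAX` below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IB_RETRY_COUNT_MAX 7u   /* largest value that fits in 3 bits */

/* Hypothetical helper: limit a requested retry count to the 3-bit range. */
uint8_t clamp_retry_count(unsigned int requested)
{
	unsigned int clamped = requested > IB_RETRY_COUNT_MAX ?
			       IB_RETRY_COUNT_MAX : requested;

	assert(clamped <= IB_RETRY_COUNT_MAX);  /* fits in the 3-bit field */
	return (uint8_t)clamped;
}
```

With this helper, the old value of 10 would be reduced to 7, while values already in range pass through unchanged.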
aingerson referenced this pull request in aingerson/libfabric Jan 25, 2017
Add the port of libibverbs/examples/rc_pingpong.c to libfabric.
aingerson referenced this pull request in aingerson/libfabric Jan 25, 2017
Fix crash in uber test:

#0  0x00000000004036fb in fts_cur_info (series=<value optimized out>,
    info=0x60c920) at complex/ft_config.c:613
#1  0x0000000000402568 in ft_fw_client (argc=<value optimized out>,
    argv=<value optimized out>) at complex/ft_main.c:412
#2  main (argc=<value optimized out>, argv=<value optimized out>)
    at complex/ft_main.c:511

The opts strings may be overrun.

Set the default destination port, so that the test can run.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
sydidelot referenced this pull request in sydidelot/libfabric Aug 3, 2017
=================================================================
==849267== ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff4caa7230 at pc 0x7ffdf8608687 bp 0x7fff4caa71b0 sp 0x7fff4caa71a0
READ of size 8 at 0x7fff4caa7230 thread T0
    #0 0x7ffdf8608686 in fi_tostr_ libfabric-current/src/fi_tostr.c:618
    #1 0x402f3a in run_test_set ofi/libfabric-current/fabtest/unit/size_left_test.c:262
    #2 0x403457 in main libfabric-current/fabtest/unit/size_left_test.c:317
    #3 0x7ffdf4819b14 in __libc_start_main (/usr/lib64/libc.so.6+0x21b14)
    #4 0x401988 in _start (libfabric-1.4.0/ofi_inst/bin/fi_size_left_test+0x401988)
Address 0x7fff4caa7230 is located at offset 32 in frame <run_test_set> of T0's stack:
  This frame has 2 object(s):
    [32, 36) 'ep_type'
    [96, 104) 'info'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow libfabric-current/src/fi_tostr.c:618 fi_tostr_
Shadow bytes around the buggy address:
  0x10006994cdf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10006994ce40: 00 00 f1 f1 f1 f1[04]f4 f4 f4 f2 f2 f2 f2 00 f4
  0x10006994ce50: f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
  0x10006994ce60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==849267== ABORTING

Change-Id: I90e59ca4127a792718cac5180da33ff2caf66f2b
shefty added a commit that referenced this pull request Jul 28, 2020
Problem reported by Address Sanitizer:

=================================================================
    ==25220==ERROR: AddressSanitizer: heap-use-after-free on address 0x6270000072e0 at pc 0x00010b926a3c bp 0x700001bd1c30 sp 0x700001bd1c28
    READ of size 4 at 0x6270000072e0 thread T4
        #0 0x10b926a3b in sock_conn_listener_thread (libfabric.1.dylib:x86_64+0xdca3b)
        #1 0x7fff7e2d5660 in _pthread_body (libsystem_pthread.dylib:x86_64+0x3660)
        #2 0x7fff7e2d550c in _pthread_start (libsystem_pthread.dylib:x86_64+0x350c)
        #3 0x7fff7e2d4bf8 in thread_start (libsystem_pthread.dylib:x86_64+0x2bf8)

    0x6270000072e0 is located 480 bytes inside of 12944-byte region [0x627000007100,0x62700000a390)
    freed by thread T0 here:
        #0 0x10baf1a9d in wrap_free (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x56a9d)
        #1 0x10b9016bf in sock_ep_close (libfabric.1.dylib:x86_64+0xb76bf)
        #2 0x10b7f4a8f in fi_close fabric.h:593
        #3 0x10b7f4209 in main shared_ctx.c:649
        #4 0x7fff7dfbd014 in start (libdyld.dylib:x86_64+0x1014)

    previously allocated by thread T0 here:
        #0 0x10baf1e27 in wrap_calloc (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x56e27)
        #1 0x10b906df4 in sock_alloc_endpoint (libfabric.1.dylib:x86_64+0xbcdf4)
        #2 0x10b8f7fdb in sock_msg_ep (libfabric.1.dylib:x86_64+0xadfdb)
        #3 0x10b7f7c93 in fi_endpoint fi_endpoint.h:164
        #4 0x10b7f5e40 in server_connect shared_ctx.c:471
        #5 0x10b7f49ba in run shared_ctx.c:573
        #6 0x10b7f411b in main shared_ctx.c:647
        #7 0x7fff7dfbd014 in start (libdyld.dylib:x86_64+0x1014)

    Thread T4 created by T0 here:
        #0 0x10bae999d in wrap_pthread_create (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x4e99d)
        #1 0x10b925f9b in sock_conn_start_listener_thread (libfabric.1.dylib:x86_64+0xdbf9b)
        #2 0x10b8e7eb2 in sock_domain (libfabric.1.dylib:x86_64+0x9deb2)
        #3 0x10b7f87d3 in fi_domain fi_domain.h:306
        #4 0x10b7f5c9f in server_connect shared_ctx.c:460
        #5 0x10b7f49ba in run shared_ctx.c:573
        #6 0x10b7f411b in main shared_ctx.c:647
        #7 0x7fff7dfbd014 in start (libdyld.dylib:x86_64+0x1014)

The issue shows up more frequently on OS X, which emulates epoll.  However, I believe the
problem could occur on any platform.

In sock_ep_close, we remove the socket from the epoll fd, then free the endpoint.
However, if the listener thread has received an event on the socket, but has not
yet started processing it, then a race can occur.  The listener thread could have
returned from ofi_epoll_wait, but suspended trying to acquire the signal_lock.
The signal_lock is acquired from sock_ep_close, where ofi_epoll_del is called, then
released.  The endpoint is then freed.  The listener thread can now acquire the
signal_lock, where it will attempt to access the freed endpoint data.

To avoid the race, we add a change boolean to the listener.  That boolean is
only changed while holding the signal_lock.  When a socket is removed from the
epollfd, we mark the listener state as 'changed'.  The listener thread checks the
changed state prior to processing any events.  If set, it clears the state, and
calls ofi_epoll_wait again to get a new set of events to process.

Note that this works for epoll set to level-triggered (poll semantics).
Sockets that reported events will report those same events when wait is called
a second time.  Sockets which were removed from the epoll set would have their
events removed, as they are no longer being monitored.

This fix is applied both to the listener thread and cm thread.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
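
A simplified sketch of the 'changed' flag scheme described above, assuming a POSIX mutex. Only `signal_lock` and the `changed` flag come from the commit message; `struct listener`, `listener_mark_changed`, `listener_process`, and the `handled` counter are hypothetical, and the real provider code differs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct listener {
	pthread_mutex_t signal_lock;
	bool changed;   /* set when an fd was removed from the wait set */
	int handled;    /* placeholder for real per-event processing */
};

/* Closing thread: called right after removing its fd from the wait set,
 * and before freeing the endpoint. */
void listener_mark_changed(struct listener *l)
{
	assert(l != NULL);
	pthread_mutex_lock(&l->signal_lock);
	l->changed = true;
	pthread_mutex_unlock(&l->signal_lock);
}

/*
 * Listener thread: called after the wait call returned nevents events.
 * Returns the number of events processed, or -1 if the wait set changed
 * in the meantime, in which case the caller discards the events and
 * waits again. The lock is held across both the check and the
 * processing so a close cannot slip in between them.
 */
int listener_process(struct listener *l, int nevents)
{
	assert(l != NULL);
	pthread_mutex_lock(&l->signal_lock);
	if (l->changed) {
		l->changed = false;     /* clear, then fetch a fresh set */
		pthread_mutex_unlock(&l->signal_lock);
		return -1;
	}
	l->handled += nevents;          /* stand-in for touching endpoint data */
	pthread_mutex_unlock(&l->signal_lock);
	return nevents;
}
```

Discarding the stale batch is safe only because of the level-triggered behavior noted above: sockets that are still registered report their events again on the next wait.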
Honggang-LI added a commit to Honggang-LI/libfabric that referenced this pull request Dec 17, 2020
ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff4c61e7e0 at pc 0x14f2cb7ae0b9 bp 0x7fff4c61e650 sp 0x7fff4c61ddd8
WRITE of size 17 at 0x7fff4c61e7e0 thread T0
    #0 0x14f2cb7ae0b8  (/lib64/libasan.so.5+0xb40b8)
    #1 0x14f2cb7aedd2 in vsscanf (/lib64/libasan.so.5+0xb4dd2)
    #2 0x14f2cb7aeede in __interceptor_sscanf (/lib64/libasan.so.5+0xb4ede)
    #3 0x14f2cb230766 in ofi_addr_format src/common.c:401
    #4 0x14f2cb233238 in ofi_str_toaddr src/common.c:780
    #5 0x14f2cb314332 in vrb_handle_ib_ud_addr prov/verbs/src/verbs_info.c:1670
    #6 0x14f2cb314332 in vrb_get_match_infos prov/verbs/src/verbs_info.c:1787
    #7 0x14f2cb314332 in vrb_getinfo prov/verbs/src/verbs_info.c:1841
    #8 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010
    #9 0x14f2cb25fcc0 in ofi_get_core_info prov/util/src/util_attr.c:298
    #10 0x14f2cb269b20 in ofix_getinfo prov/util/src/util_attr.c:321
    #11 0x14f2cb3e29fd in rxd_getinfo prov/rxd/src/rxd_init.c:122
    #12 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010
    #13 0x407150 in ft_getinfo common/shared.c:794
    #14 0x414917 in ft_init_fabric common/shared.c:1042
    #15 0x402f40 in run functional/bw.c:155
    #16 0x402f40 in main functional/bw.c:252
    #17 0x14f2ca1b28e2 in __libc_start_main (/lib64/libc.so.6+0x238e2)
    #18 0x401d1d in _start (/root/libfabric/fabtests/functional/fi_bw+0x401d1d)

Address 0x7fff4c61e7e0 is located in stack of thread T0 at offset 48 in frame
    #0 0x14f2cb2306f3 in ofi_addr_format src/common.c:397

  This frame has 1 object(s):
    [32, 48) 'fmt' <== Memory access at offset 48 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/lib64/libasan.so.5+0xb40b8)
Shadow bytes around the buggy address:
  0x1000698bbca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000698bbcf0: 00 00 00 00 00 00 f1 f1 f1 f1 00 00[f2]f2 f3 f3
  0x1000698bbd00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
  0x1000698bbd10: f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
  0x1000698bbd20: f2 f2 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
  0x1000698bbd30: f2 f2 00 00 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00
  0x1000698bbd40: 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

Fixes: 5d31276 ("common: Redo address string conversions")
Signed-off-by: Honggang Li <honli@redhat.com>
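
The report above shows a 17-byte write into the 16-byte `fmt` buffer through `sscanf`. One common way to rule out this class of overflow is to give the conversion a field width one less than the buffer size. The sketch below illustrates that idea; `parse_addr_scheme`, `FMT_LEN`, and the `scheme://` input shape are assumptions, and this is not necessarily the change made in the commit.

```c
#include <assert.h>
#include <stdio.h>

#define FMT_LEN 16

/*
 * Hypothetical parser: copy the scheme prefix of "scheme://rest" into
 * fmt (FMT_LEN bytes). The field width of 15 leaves room for the
 * terminating NUL, so sscanf cannot write past the buffer.
 * Returns 0 if a scheme was stored, -1 otherwise.
 */
int parse_addr_scheme(const char *str, char fmt[FMT_LEN])
{
	assert(str != NULL && fmt != NULL);
	fmt[0] = '\0';
	/* "%15[^:]" stores at most 15 characters plus the NUL */
	return sscanf(str, "%15[^:]://", fmt) == 1 ? 0 : -1;
}
```

An over-long prefix is truncated to 15 characters instead of overrunning the buffer; a caller that must reject such input would need an additional length check.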
shefty pushed a commit that referenced this pull request Dec 19, 2020
ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff4c61e7e0 at pc 0x14f2cb7ae0b9 bp 0x7fff4c61e650 sp 0x7fff4c61ddd8
WRITE of size 17 at 0x7fff4c61e7e0 thread T0
    #0 0x14f2cb7ae0b8  (/lib64/libasan.so.5+0xb40b8)
    #1 0x14f2cb7aedd2 in vsscanf (/lib64/libasan.so.5+0xb4dd2)
    #2 0x14f2cb7aeede in __interceptor_sscanf (/lib64/libasan.so.5+0xb4ede)
    #3 0x14f2cb230766 in ofi_addr_format src/common.c:401
    #4 0x14f2cb233238 in ofi_str_toaddr src/common.c:780
    #5 0x14f2cb314332 in vrb_handle_ib_ud_addr prov/verbs/src/verbs_info.c:1670
    #6 0x14f2cb314332 in vrb_get_match_infos prov/verbs/src/verbs_info.c:1787
    #7 0x14f2cb314332 in vrb_getinfo prov/verbs/src/verbs_info.c:1841
    #8 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010
    #9 0x14f2cb25fcc0 in ofi_get_core_info prov/util/src/util_attr.c:298
    #10 0x14f2cb269b20 in ofix_getinfo prov/util/src/util_attr.c:321
    #11 0x14f2cb3e29fd in rxd_getinfo prov/rxd/src/rxd_init.c:122
    #12 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010
    #13 0x407150 in ft_getinfo common/shared.c:794
    #14 0x414917 in ft_init_fabric common/shared.c:1042
    #15 0x402f40 in run functional/bw.c:155
    #16 0x402f40 in main functional/bw.c:252
    #17 0x14f2ca1b28e2 in __libc_start_main (/lib64/libc.so.6+0x238e2)
    #18 0x401d1d in _start (/root/libfabric/fabtests/functional/fi_bw+0x401d1d)

Address 0x7fff4c61e7e0 is located in stack of thread T0 at offset 48 in frame
    #0 0x14f2cb2306f3 in ofi_addr_format src/common.c:397

  This frame has 1 object(s):
    [32, 48) 'fmt' <== Memory access at offset 48 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/lib64/libasan.so.5+0xb40b8)
Shadow bytes around the buggy address:
  0x1000698bbca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbcd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000698bbce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000698bbcf0: 00 00 00 00 00 00 f1 f1 f1 f1 00 00[f2]f2 f3 f3
  0x1000698bbd00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
  0x1000698bbd10: f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
  0x1000698bbd20: f2 f2 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
  0x1000698bbd30: f2 f2 00 00 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00
  0x1000698bbd40: 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

Fixes: 5d31276 ("common: Redo address string conversions")
Signed-off-by: Honggang Li <honli@redhat.com>
JuliaRS pushed a commit to JuliaRS/libfabric that referenced this pull request Feb 11, 2021
I'm not entirely sure whether this fixes the issue our QA is seeing
(they get err_entry.err=-104, a wrong negative value), but
with error injection I could easily trigger a use-after-free
rooted in this function (with err_entry.err=104, though,
so I still don't know where the wrong error sign came from).

In my error injection reproducer ofi_send_socket() fails sometimes,
which then frees cm_ctx without removing the fd and
cm_ctx from polling. The next poll round will then access cm_ctx and
trigger a use-after-free.

client_send_connreq
    tx_cm_data
        ofi_send_socket -> fails
    goto err
    ...
err:
    free(cm_ctx)

ASAN reports

READ of size 4 at 0x6120000106c8 thread T4 (rpc_poll-0)
    #0 0x7f77005e0f21 in process_cm_ctx prov/tcp/src/tcpx_conn_mgr.c:482
    #1 0x7f77005e15ef in tcpx_conn_mgr_run prov/tcp/src/tcpx_conn_mgr.c:535
    #2 0x7f77005fc429 in tcpx_eq_read prov/tcp/src/tcpx_eq.c:48
    #3 0x4926dd in fi_eq_read /home/bschubert/local/rhel7/libfabric/include/rdma/fi_eq.h:352

0x6120000106c8 is located 8 bytes inside of 280-byte region [0x6120000106c0,0x6120000107d8)
freed by thread T4 (rpc_poll-0) here:
    #0 0x7f77015915e7 in __interceptor_free
    #1 0x7f77005e083b in client_send_connreq prov/tcp/src/tcpx_conn_mgr.c:422
    #2 0x7f77005e0f7e in process_cm_ctx prov/tcp/src/tcpx_conn_mgr.c:487
    #3 0x7f77005e15ef in tcpx_conn_mgr_run prov/tcp/src/tcpx_conn_mgr.c:535
    #4 0x7f77005fc429 in tcpx_eq_read prov/tcp/src/tcpx_eq.c:48

previously allocated by thread T5 (rpc_conn_mgr) here:
    #0 0x7f7701591b7e in __interceptor_calloc
    #1 0x7f77005edb5c in tcpx_ep_connect prov/tcp/src/tcpx_ep.c:103
    #2 0x478b2f in fi_connect /home/bschubert/local/rhel7/libfabric/include/rdma/fi_cm.h:98

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
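
The ordering problem described above (freeing `cm_ctx` while the poll loop can still reach it) can be modeled with a tiny registry. This is a sketch under assumptions: `struct poll_set`, `poll_set_add`, `poll_set_del`, and `cm_ctx_release` are hypothetical names, not functions from the tcp provider.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CTX 8

/* Hypothetical stand-in for the set of contexts the poll loop visits. */
struct poll_set {
	void *ctx[MAX_CTX];
	int count;
};

int poll_set_add(struct poll_set *set, void *ctx)
{
	if (set->count == MAX_CTX)
		return -1;
	set->ctx[set->count++] = ctx;
	return 0;
}

/* Remove ctx if present; the last entry fills the hole. */
void poll_set_del(struct poll_set *set, void *ctx)
{
	for (int i = 0; i < set->count; i++) {
		if (set->ctx[i] == ctx) {
			set->ctx[i] = set->ctx[--set->count];
			return;
		}
	}
}

/* Error-path cleanup: unregister first, free second. */
void cm_ctx_release(struct poll_set *set, void *ctx)
{
	assert(set != NULL);
	poll_set_del(set, ctx);   /* the next poll round can no longer see ctx */
	free(ctx);
}
```

Freeing without the `poll_set_del()` step leaves a dangling pointer in the set, which is the pattern the ASAN report flags.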
JuliaRS pushed a commit to JuliaRS/libfabric that referenced this pull request Feb 11, 2021
Problem reported by Address Sanitizer:

=================================================================
    ==25220==ERROR: AddressSanitizer: heap-use-after-free on address 0x6270000072e0 at pc 0x00010b926a3c bp 0x700001bd1c30 sp 0x700001bd1c28
    READ of size 4 at 0x6270000072e0 thread T4
        #0 0x10b926a3b in sock_conn_listener_thread (libfabric.1.dylib:x86_64+0xdca3b)
        #1 0x7fff7e2d5660 in _pthread_body (libsystem_pthread.dylib:x86_64+0x3660)
        #2 0x7fff7e2d550c in _pthread_start (libsystem_pthread.dylib:x86_64+0x350c)
        #3 0x7fff7e2d4bf8 in thread_start (libsystem_pthread.dylib:x86_64+0x2bf8)

    0x6270000072e0 is located 480 bytes inside of 12944-byte region [0x627000007100,0x62700000a390)
    freed by thread T0 here:
        #0 0x10baf1a9d in wrap_free (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x56a9d)
        #1 0x10b9016bf in sock_ep_close (libfabric.1.dylib:x86_64+0xb76bf)
        #2 0x10b7f4a8f in fi_close fabric.h:593
        #3 0x10b7f4209 in main shared_ctx.c:649
        #4 0x7fff7dfbd014 in start (libdyld.dylib:x86_64+0x1014)

    previously allocated by thread T0 here:
        #0 0x10baf1e27 in wrap_calloc (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x56e27)
        #1 0x10b906df4 in sock_alloc_endpoint (libfabric.1.dylib:x86_64+0xbcdf4)
        #2 0x10b8f7fdb in sock_msg_ep (libfabric.1.dylib:x86_64+0xadfdb)
        #3 0x10b7f7c93 in fi_endpoint fi_endpoint.h:164
        #4 0x10b7f5e40 in server_connect shared_ctx.c:471
        #5 0x10b7f49ba in run shared_ctx.c:573
        #6 0x10b7f411b in main shared_ctx.c:647
        #7 0x7fff7dfbd014 in start (libdyld.dylib:x86_64+0x1014)

    Thread T4 created by T0 here:
        #0 0x10bae999d in wrap_pthread_create (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x4e99d)
        #1 0x10b925f9b in sock_conn_start_listener_thread (libfabric.1.dylib:x86_64+0xdbf9b)
        #2 0x10b8e7eb2 in sock_domain (libfabric.1.dylib:x86_64+0x9deb2)
        #3 0x10b7f87d3 in fi_domain fi_domain.h:306
        #4 0x10b7f5c9f in server_connect shared_ctx.c:460
        #5 0x10b7f49ba in run shared_ctx.c:573
        #6 0x10b7f411b in main shared_ctx.c:647
        #7 0x7fff7dfbd014 in start (libdyld.dylib:x86_64+0x1014)

The issue shows up more frequently on OS X, which emulates epoll.  However, I believe the
problem could occur on any platform.

In sock_ep_close, we remove the socket from the epoll fd, then free the endpoint.
However, if the listener thread has received an event on the socket, but has not
yet started processing it, then a race can occur.  The listener thread could have
returned from ofi_epoll_wait, but suspended trying to acquire the signal_lock.
The signal_lock is acquired from sock_ep_close, where ofi_epoll_del is called, then
released.  The endpoint is then freed.  The listener thread can now acquire the
signal_lock, where it will attempt to access the freed endpoint data.

To avoid the race, we add a change boolean to the listener.  That boolean is
only changed while holding the signal_lock.  When a socket is removed from the
epollfd, we mark the listener state as 'changed'.  The listener thread checks the
changed state prior to processing any events.  If set, it clears the state, and
calls ofi_epoll_wait again to get a new set of events to process.

Note that this works for epoll set to level-triggered (poll semantics).
Sockets that reported events will report those same events when wait is called
a second time.  Sockets which were removed from the epoll set would have their
events removed, as they are no longer being monitored.

This fix is applied both to the listener thread and cm thread.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
shefty pushed a commit that referenced this pull request Feb 1, 2022
Utility providers have to call fi_getinfo again to get core providers
resulting in deceptive and confusing log lines where a core provider
might return FI_ENODATA for a utility provider but FI_SUCCESS for the
app. Extra log levels were added that say Begin/End ofi_get_core_info
to make this clearer but these debug-only (not info) logs can get lost
among the hundreds of lines of output.

To make it easier to distinguish between log lines with and without a
core provider, specifically during fi_getinfo, add a log_prefix to the
log output which clarifies that the log line was output as part of
the layered fi_getinfo call.

For example, the log line changes as follows (before, then after):
libfabric:53685:1643663041:verbs:fabric:vrb_get_matching_info():1514<info> checking domain: #1 mlx5_0
libfabric:53685:1643663041:ofi_rxm:verbs:fabric:vrb_get_matching_info():1514<info> checking domain: #1 mlx5_0

Signed-off-by: aingerson <alexia.ingerson@intel.com>
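
The before/after lines in this message differ only in the `ofi_rxm:` segment placed ahead of the provider name. A rough sketch of that formatting step follows; `format_log_prov` and its parameters are hypothetical and do not reflect libfabric's logging API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical formatter: build the provider part of a log line.
 * prefix names the layered (utility) provider whose fi_getinfo call is
 * in progress, or is NULL/empty for a direct call.
 * Returns the snprintf result (length that was or would be written).
 */
int format_log_prov(char *buf, size_t len, const char *prefix,
		    const char *prov)
{
	assert(buf != NULL && prov != NULL);
	if (prefix && prefix[0] != '\0')
		return snprintf(buf, len, "%s:%s", prefix, prov);
	return snprintf(buf, len, "%s", prov);
}
```

Called with prefix `"ofi_rxm"` and provider `"verbs"`, this yields `ofi_rxm:verbs`, matching the second sample line; with no prefix it yields `verbs`, matching the first.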
BrendanCunningham referenced this pull request in cornelisnetworks/libfabric Mar 11, 2022
1. opx_shm_tx_next() did not detect when the FIFO was full. This would
   cause hangs as unread packets would be overwritten by new ones. This
   was caused by a simple one-line bug in opx_shm_tx_next().

2. Reduced the size of the FIFOs which were enlarged by an earlier
   work-around to the hangs caused by #1. The new size, 1024 entries, creates
   a memory-mapped file 8 megabytes long rather than 64 megabytes long.
   Since there's one of these FIFOs for every OPX process on a node,
   64 megabytes each can be expensive.

   To verify that the fix to opx_shm_tx_next() was correct I
   experimented with reducing the FIFO size to as small as 64 entries
   and verified that while making the FIFO so small would cause a large loss
   of performance, MPI applications still ran to completion.

3. Renamed fi_opx_shm_poll_once to fi_opx_shm_poll_many because, well,
   calling it will process every packet in the FIFO, not just one.

Signed-off-by: Tim Thompson <tim.thompson@cornelisnetworks.com>
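
To illustrate the kind of full-FIFO check item 1 refers to, here is a single-threaded ring-buffer sketch. It is not the OPX implementation: `struct fifo`, `fifo_tx_next`, and `fifo_rx_next` are hypothetical, entries are plain `int`s, and a real shared-memory FIFO would also need atomic operations and memory barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_ENTRIES 1024u          /* power of two, as in the new size above */

struct fifo {
	uint32_t head;              /* next slot the consumer reads */
	uint32_t tail;              /* next slot the producer writes */
	int slots[FIFO_ENTRIES];
};

/* Reserve the next slot for writing, or return NULL when the FIFO is full. */
int *fifo_tx_next(struct fifo *f)
{
	assert(f != NULL);
	if (f->tail - f->head == FIFO_ENTRIES)
		return NULL;        /* full: writing would clobber unread data */
	return &f->slots[f->tail++ % FIFO_ENTRIES];
}

/* Consume the oldest entry; returns 0 on success, -1 when empty. */
int fifo_rx_next(struct fifo *f, int *out)
{
	assert(f != NULL && out != NULL);
	if (f->head == f->tail)
		return -1;
	*out = f->slots[f->head++ % FIFO_ENTRIES];
	return 0;
}
```

Without the `tail - head == FIFO_ENTRIES` test, the producer would wrap around and overwrite entries the consumer has not read yet, which is the failure mode described in item 1.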
BrendanCunningham referenced this pull request in cornelisnetworks/libfabric Mar 14, 2022
mwheinz referenced this pull request in cornelisnetworks/libfabric Mar 15, 2022
Author: Michael Heinz <mheinz@cornelisnetworks.com>
jtamzn pushed a commit to jtamzn/libfabric that referenced this pull request Oct 19, 2022
Utility providers have to call fi_getinfo again to get core providers
resulting in deceptive and confusing log lines where a core provider
might return FI_ENODATA for a utility provider but FI_SUCCESS for the
app. Extra log levels were added that say Begin/End ofi_get_core_info
to make this clearer but these debug-only (not info) logs can get lost
among the hundreds of lines of output.

To make it easier to distinguish between log lines with and without a
core provider, specifically during fi_getinfo, add a log_prefix to the
log output which clarifies that the log line was output as part of
the layered fi_getinfo call.

For example, the following log line changes as follows:
libfabric:53685:1643663041:verbs:fabric:vrb_get_matching_info():1514<info> checking domain: #1 mlx5_0
libfabric:53685:1643663041:ofi_rxm:verbs:fabric:vrb_get_matching_info():1514<info> checking domain: #1 mlx5_0

Signed-off-by: aingerson <alexia.ingerson@intel.com>
shefty added a commit that referenced this pull request Feb 10, 2023
If a posted receive matches with a saved receive, we may need to
increment the rx counter.  Set the rx counter increment callback
to match that of the posted receive.  This fixes an assert in
xnet_cntr_inc() accessing a NULL cntr_inc function pointer.

Program received signal SIGABRT, Aborted.
0x0000155552d4d37f in raise () from /lib64/libc.so.6
#0  0x0000155552d4d37f in raise () from /lib64/libc.so.6
#1  0x0000155552d37db5 in abort () from /lib64/libc.so.6
#2  0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6
#4  0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347
#5  0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354
#6  0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153
#7  0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188
#8  0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445
#9  0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558
#10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91
#11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212
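
The failure mode and the fix can be illustrated with a minimal sketch. The types and functions below are simplified, hypothetical stand-ins and not the xnet provider's real structures; in particular, storing the counter pointer in the entry is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

struct cntr { unsigned long count; };

struct xfer_entry;
typedef void (*cntr_inc_fn)(struct xfer_entry *entry);

/* Hypothetical transfer entry: a saved receive starts with no callback. */
struct xfer_entry {
    struct cntr *cntr;     /* assumption: counter reachable from the entry */
    cntr_inc_fn cntr_inc;  /* NULL on a saved entry until it is matched */
};

/* Callback used for posted receives in this sketch. */
static void rx_cntr_inc(struct xfer_entry *entry)
{
    if (entry->cntr)
        entry->cntr->count++;
}

/* Matching step: the saved entry inherits the posted receive's callback. */
static void recv_saved(struct xfer_entry *saved, const struct xfer_entry *posted)
{
    saved->cntr = posted->cntr;
    saved->cntr_inc = posted->cntr_inc;
}

/* Completion step: without the copy above, this assert would fire. */
static void complete_saved(struct xfer_entry *saved)
{
    assert(saved->cntr_inc != NULL);
    saved->cntr_inc(saved);
}
```

Calling `complete_saved` on an entry that never went through `recv_saved` trips the assert, which corresponds to the abort shown in the backtrace.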

Signed-off-by: Sean Hefty <sean.hefty@intel.com>