redo patch for data (transfer) operations #4

jeffhammond · 2014-08-25T22:32:32Z

"data operations" was not defined or used anywhere else, whereas "data transfer operations" is/was.

redo patch for data (transfer) operations

commit 42a1f96809d0dfb72e1abaad3923761eba4c6fe2
Merge: dc1317b fca6e10
Author: Sean Hefty <sean.hefty@intel.com>
Date: Fri Aug 8 11:53:16 2014 -0700

Merge branch 'dev'

commit fca6e10a83eb592135fd47bc73600c7a955ca2b5
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu Aug 7 15:43:00 2014 -0700

Release 1.0.19-1 hotfix

commit dc1317b5668200bf0947dcac21a4d95959d333b3
Author: Sean Hefty <sean.hefty@intel.com>
Date: Mon Aug 4 10:01:31 2014 -0700

indexer: Include errno.h directly

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 064c9cb1bddbab9e6d54ba301facfae7e1992455
Author: Ilya Nelkenbaum <ilyan@mellanox.com>
Date: Mon Jul 28 15:48:09 2014 +0300

rsocket: Segmentation fault fix in case of multiple connections

When more than 16 rsocket connections are established, the "svc->rss" buffer is reallocated with more memory. Index 0 is reserved for the service's communication socket, and this is not taken into account when data is copied from the old buffer location to the new one.

Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a7287adaea52d21cd2d50f1621f8eda37c4c3c90
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Jul 22 23:24:53 2014 -0700

udpong: Fix client_recv error check

We only want to report an error if it's not EAGAIN. The if statement is reversed. Correct it.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 806de778b1fe665dee2f62c7bf7211ab9bd2d53f
Author: Sean Hefty <sean.hefty@intel.com>
Date: Wed Jul 16 15:49:16 2014 -0700

Release 1.0.19

commit 8f53f2a5d3cb5d6c30fe5695b48268ea1bbe2ff0
Author: Sean Hefty <sean.hefty@intel.com>
Date: Wed Jul 16 13:44:56 2014 -0700

riostream: Only verify last data transfer

Data verification will fail when running the bandwidth tests or when the transfer count is > 1. The issue is that subsequent writes by the initiator side will overwrite the data in the target buffer before the receiver can verify that it is correct.
To fix this, only verify that the data in the buffer is correct after the last transfer has completed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c4f8e22a6d078fa914cd4102d65fa854587e1248
Author: Sean Hefty <sean.hefty@intel.com>
Date: Mon Jul 7 08:40:44 2014 -0700

Revert "Revert "rsocket: Change keepalive to 0-byte RDMA write""

This reverts commit a34703c53259845dd20450a87eb6747030e23e8b.

0-byte RDMA writes appear to be working correctly with HCAs from 2 different vendors. The original problem that was reported turned out to be a user error.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit fa85dc408e28afd67b81c3a590fd874ef6fdc63a
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu Jul 3 13:45:52 2014 -0700

rsocket: Update correct rsocket keepalive time

When the keepalive time of an rsocket is updated, the updated information is forwarded to the keepalive service thread. However, the thread updates the time for the wrong service as shown:

    tcp_svc_timeouts[svc->cnt] = rs_get_time() + msg.rs->keepalive_time;

The index into tcp_svc_timeouts should correspond to the rsocket being updated, not the last one in the list.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1695abfa9f6bf429a5aa07117310c4ad87d4b3ae
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu Jul 3 13:55:39 2014 -0700

rsocket: Fix removing rsocket from service thread

When removing an rsocket from a service thread, we replace the removed service with the one at the end of the service list. This keeps the array tightly packed. However, rs_svc_rm_rs decrements the rsocket count before doing the swap. The result is that the entry at the end of the list gets dropped off. Defer decrementing the count until the swap has been made. In this case, the cnt value is a valid index into the array, because we start at index 1. Index 0 is used internally by the service thread.
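
The swap-then-decrement ordering described in the rs_svc_rm_rs fix can be sketched as follows. This is a minimal illustration, not the librdmacm code: `struct svc`, its fixed-size `slots` array, and `svc_remove` are hypothetical names, and plain integers stand in for rsocket pointers. The only properties taken from the commit message are that slot 0 is reserved and that the count must stay a valid index until the swap is done.

```c
#include <assert.h>

#define SVC_SLOTS 8

/* Hypothetical stand-in for a service's rsocket list. */
struct svc {
    int cnt;               /* tracked entries; valid slots are 1..cnt */
    int slots[SVC_SLOTS];  /* slot 0 is reserved for the service itself */
};

/* Remove `value` by moving the last entry into its slot, then shrinking.
 * Returns 0 on success, -1 if the value is not tracked. */
int svc_remove(struct svc *svc, int value)
{
    assert(svc->cnt < SVC_SLOTS);

    for (int i = 1; i <= svc->cnt; i++) {
        if (svc->slots[i] == value) {
            /* cnt is still the index of the last entry here ... */
            svc->slots[i] = svc->slots[svc->cnt];
            /* ... so decrement only after the swap has been made. */
            svc->cnt--;
            return 0;
        }
    }
    return -1;
}
```

Decrementing before the copy would read `slots[cnt - 1]` instead of the true last entry, which is how the tail entry gets lost in the scenario the commit describes.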
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 9085562c22189850e1f16b9a9955f11e79caac06
Author: Sean Hefty <sean.hefty@intel.com>
Date: Wed Jul 2 15:37:10 2014 -0700

rsocket: Fix crash resulting from keepalive timeout

The following crash was reported by Hal Rosenstock, <hal@mellanox.com>, with keepalive enabled. The crash occurs in the keepalive thread attempting to send a keepalive message.

report:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fffecf08700 (LWP 6013)]
    rs_post_write (rs=<value optimized out>, sgl=0x0, nsge=0, wr_data=3758096385, flags=0, addr=0, rkey=0) at src/rsocket.c:1660
    1660 return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
    (gdb)
    (gdb) p/x rs
    $1 = value has been optimized out

So I added in the following to debug:

    1660 if (rs == NULL)
    1661 abort();
    1662 if (rs->cm_id == NULL)
    1663 abort();
    1664 if (rs->cm_id->qp == NULL)
    1665 abort();
    1666 return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
    1667 }

And saw in gdb:

    Program received signal SIGABRT, Aborted.
    [Switching to Thread 0x7fffecf08700 (LWP 8096)]
    0x00000030d50328a5 in raise () from /lib64/libc.so.6
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
    (gdb)
    (gdb) bt
    #0 0x00000030d50328a5 in raise () from /lib64/libc.so.6
    #1 0x00000030d5034085 in abort () from /lib64/libc.so.6
    #2 0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0, nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
    #3 0x00007ffff058193d in tcp_svc_send_keepalive (arg=0x7ffff0789f20) at src/rsocket.c:4245
    #4 tcp_svc_run (arg=0x7ffff0789f20) at src/rsocket.c:4279
    #5 0x00000030d5807851 in start_thread () from /lib64/libpthread.so.0
    #6 0x00000030d50e890d in clone () from /lib64/libc.so.6
    (gdb) fr 2
    #2 0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0, nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
    1665 abort();

So qp is NULL somehow...

:end report

There is an issue if an rsocket is closed without going through the rshutdown.

    int rshutdown(int socket, int how)
    {
        ...
        if (rs->opts & RS_OPT_SVC_ACTIVE)
            rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);

We remove the rsocket from the keepalive thread in rshutdown.

    int rclose(int socket)
    {
        ...
        if (rs->state & rs_connected)
            rshutdown(socket, SHUT_RDWR);
        ...
        rs_free(rs);

rclose will call shutdown only if we're connected. However, if the keepalive failed, the socket will be in an error state. So, no call to rshutdown, which will leave the freed rsocket on the keepalive thread's list.

The fix is to have rclose remove an rsocket from being processed by a service thread if it is still active.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 310f630ac87f1deee1534ab405d5b771b801c25d
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Jul 1 22:52:40 2014 -0700

example/rdma_xclient/server: Update XRC support in sample programs

Update rdma_xclient and rdma_xserver sample programs to test XRC data transfers.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5662340a12429f8882be36d8787924be91a1cb74
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Jul 1 22:56:43 2014 -0700

rdmacm: Update addrinfo with XRC support

Remove internal defines, and use libibverbs exported values instead.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 995eb0c90c1a0967179fe3f523861e15300d3dfa
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Jul 1 17:47:22 2014 -0700

rdmacm: Add support for XRC QPs

Export a new extended create QP call. Add support for XRC QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05eabc5335b95ab9d0d6a6132092fac6e1af1cc5
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Jul 1 17:14:13 2014 -0700

rdmacm: Add support for allocating XRC SRQs

Add extended SRQ creation call, to support allocating XRC SRQs. Use the rdma_cm_id qp type field to determine which type of SRQ should be allocated.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 89a782a52a48db38d917084233006fb91cbd0694
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Jul 1 16:46:34 2014 -0700

rdmacm: Add functionality to allocate an XRCD

XRC QPs and SRQs are associated by an XRC domain. Provide a call to allocate an XRCD, similar to how the rdmacm allocates a PD for the user.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit f916b9b6bfbcd86b5326d84c0dfa106ddc9c907c
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Jul 1 16:17:30 2014 -0700

build: Add build support for XRC

Modify autotools to check for and require a libibverbs version that includes XRC and extension support. Remove any code used to support older versions of libibverbs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 0cd1e9b0e7a2d438a0f1004e6c6ff1b6785c4038
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Jul 1 13:30:42 2014 -0700

librdmacm: Use SRQ in rdma_create_qp

If an application has allocated an SRQ on an rdma_cm_id, use it when creating a QP.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3e1fc1cfad65c83a05c8550d8e359c8b9223d859
Author: Sean Hefty <sean.hefty@intel.com>
Date: Wed Jun 25 12:56:18 2014 -0700

librdmacm: Remove NULL checks after calling alloca

alloca doesn't return a NULL pointer on failure.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a34703c53259845dd20450a87eb6747030e23e8b
Author: Sean Hefty <sean.hefty@intel.com>
Date: Fri Jun 20 17:44:26 2014 -0700

Revert "rsocket: Change keepalive to 0-byte RDMA write"

This reverts commit 0f2c76e81ecf1470cf152600c08c421e7e82b00e.

Testing has shown that this does not always result in the keep-alive message working correctly, such that a broken connection is reported as having failed. The reason for this behavior is unknown, but revert the patch until the issue has been resolved.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7b1eb6407f1f7a953673ab23a2d75f8a3cd8dbb9
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date: Thu Jun 19 13:08:02 2014 -0400

librdmacm: In ucma_convert_path, fix selector values

Intent is for the selectors to be equal to (exactly) rather than less than. Selector for exactly is value of 2 rather than 1.

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7f0fbf984a5140efb76f93fef1f35202c617249d
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date: Thu Jun 19 11:54:11 2014 -0400

rsocket: Add support for RDMA_ROUTE option in rgetsockopt

Create as many ibv_path_data structs from the RDMA route ibv_sa_path_rec struct for the rsocket based on how many fit into the supplied buffer.

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 106899eccc5fa61dd5e69c90bc0651ccd57e725f
Merge: 6c7d6d3 0f2c76e
Author: Sean Hefty <sean.hefty@intel.com>
Date: Wed Jun 18 11:56:42 2014 -0700

Merge branch 'dev'

commit 0f2c76e81ecf1470cf152600c08c421e7e82b00e
Author: Susan K. Coulter <markus@cj-fe1.lanl.gov>
Date: Mon Jun 16 10:28:08 2014 -0700

rsocket: Change keepalive to 0-byte RDMA write

Signed-off-by: Susan K. Coulter <markus@cj-fe1.lanl.gov>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6c7d6d3038524c275ecfb7468b4455fe2cc39a19
Author: Doug Ledford <dledford@redhat.com>
Date: Wed Jun 18 10:45:23 2014 -0700

rdma_server: handle IBV_SEND_INLINE correctly

Not all RDMA devices support IBV_SEND_INLINE. At least some of those that don't will ignore the flag passed to rdma_post_send and attempt to send the command by using an sge entry instead. Because we don't register the send memory, this fails. The proper way to deal with the fact that IBV_SEND_INLINE is not guaranteed is to check the returned value in our cap struct to see if we have support for inline data, and if not, fall back to non-inline sends and to register the send memory region.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 9fe390a793203a13b0507472848e1e7da8c75bed
Author: Doug Ledford <dledford@redhat.com>
Date: Wed Jun 18 10:44:49 2014 -0700

rdma_client: handle IBV_SEND_INLINE correctly

Not all RDMA devices support IBV_SEND_INLINE. At least some of those that don't will ignore the flag passed to rdma_post_send and attempt to send the command by using an sge entry instead. Because we don't register the send memory, this fails. The proper way to deal with the fact that IBV_SEND_INLINE is not guaranteed is to check the returned value in our cap struct to see if we have support for inline data, and if not, fall back to non-inline sends and to register the send memory region.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2c2e44e144f17c2cef4af052ec91a680c9a81fb9
Author: Doug Ledford <dledford@redhat.com>
Date: Wed Jun 18 10:44:28 2014 -0700

rdma_server: use perror, unwind allocs on failure

Our main test function prints out errno directly, which is hard to read as it's not decoded at all. Instead, use perror() to make failures more readable. Also redo the failure flow so that we can do a simple unwind at the end of the function and just jump to the right unwind spot on error.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1bc834aeca99a4dd0c5bea733e2735f148b4418c
Author: Doug Ledford <dledford@redhat.com>
Date: Wed Jun 18 10:44:13 2014 -0700

rdma_client: use perror, unwind allocs on failure

Our main test function prints out errno directly, which is hard to read as it's not decoded at all. Instead, use perror() to make failures more readable. Also redo the failure flow so that we can do a simple unwind at the end of the function and just jump to the right unwind spot on error.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05fc15b44805a23a4e8562d1953074243950dfbe
Author: Doug Ledford <dledford@redhat.com>
Date: Wed Jun 18 10:43:04 2014 -0700

cmtime: rework program to be multithread

When using very large numbers of connections (10,000 was in use here), we ran into a problem where when we resolved a performance problem in the kernel cma.c code, we suddenly developed a new problem. That new problem turned out to be the fact that with the underlying kernel issue resolved, 10,000 connect requests would flood the server side of the test and the cmtime application would respond as quickly as possible. However, the client side would not bother to check any of the returns until after having sent all 10,000 connect requests.
When the kernel had a serializing performance problem, this was OK. When it was fixed, this caused a general slowdown in connect operations due to overruns in the event processing. This patch causes the client side to fire off threads that will handle responses to connect requests as they come in instead of allowing them to backlog uncontrollably.

Times for a 10,000 connect run changed from this:

[root@rdma-dev-01 ~]# more 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+.output
ib1:
step              total ms      max ms       min us   us / conn
create id    :       46.64        0.10         1.00        4.66
bind addr    :       89.61        0.04         7.00        8.96
resolve addr :       50.63       26.18     23976.00        5.06
resolve route:      565.44      538.77     26736.00       56.54
create qp    :     4028.31        5.70       326.00      402.83
connect      :    50077.42    49990.49     90734.00     5007.74
disconnect   :     5277.25     4850.35    380017.00      527.72
destroy      :       42.15        0.04         2.00        4.21
ib0:
step              total ms      max ms       min us   us / conn
create id    :       34.82        0.04         1.00        3.48
bind addr    :       25.94        0.02         1.00        2.59
resolve addr :       48.18       25.01     22779.00        4.82
resolve route:      501.28      476.26     25071.00       50.13
create qp    :     3274.12        6.05       257.00      327.41
connect      :    55549.64    55490.32     62150.00     5554.96
disconnect   :     5263.64     4851.18    375628.00      526.36
destroy      :       47.20        0.07         2.00        4.72

to this:

[root@rdma-dev-01 ~]# more 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+-fixed-cmtime.output
ib1:
step              total ms      max ms       min us   us / conn
create id    :       34.45        0.08         1.00        3.44
bind addr    :       88.41        0.04         7.00        8.84
resolve addr :       33.59        4.65       612.00        3.36
resolve route:      618.68        0.61        97.00       61.87
create qp    :     4024.03        6.30       341.00      402.40
connect      :     6983.35     6886.33      8509.00      698.33
disconnect   :     5066.47      230.34       831.00      506.65
destroy      :       37.02        0.03         2.00        3.70
ib0:
step              total ms      max ms       min us   us / conn
create id    :       42.61        0.14         1.00        4.26
bind addr    :       27.05        0.03         2.00        2.70
resolve addr :       40.65       10.73       869.00        4.06
resolve route:      626.75        0.60       103.00       62.68
create qp    :     3334.50        6.48       273.00      333.45
connect      :     6310.29     6251.59     13298.00      631.03
disconnect   :     5111.12      365.87       867.00      511.11
destroy      :       36.57        0.02         2.00        3.66

with this patch.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6551bab0b75b1f2499d97c2384cd3ac723da625f
Author: Hal Rosenstock <hal@mellanox.com>
Date: Wed Jun 18 09:55:06 2014 -0700

rsocket: Use malloc instead of calloc

No need to clear allocated memory as immediately followed by memcpy which covers the allocated memory.

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2a0944dc5e0e64290b8dfca332e6d5645c25b12e
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue May 27 11:43:05 2014 -0700

librdmacm: Update rdma_accept man page

Document NULL conn_param parameter for rdma_accept.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 386b97e807917a8ca7f6d12d66e34dc9441f7502
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu May 22 16:13:08 2014 -0700

indexer: Free index_map resources when cleared

Free memory allocated for index map entries when they are no longer in use. To handle this, count the number of entries stored by the index map item arrays and release the arrays when no items are being tracked. This reduces valgrind noise.

Problem reported by: Hannes Weisbach <hannes_weisbach@gmx.net>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 397b1a79f077c2fd1ae35be15bc3a7d8918800f1
Author: Patrick MacArthur <pmacarth@iol.unh.edu>
Date: Tue Apr 29 21:30:08 2014 -0700

rstream: fix "-T resolve" detection

Signed-off-by: Patrick MacArthur <pmacarth@iol.unh.edu>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b3758e215f0abbea0d48996ef9b95f01530a4210
Author: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Date: Tue Apr 29 19:57:36 2014 -0700

librdmacm: Fix verbs leak due to reentrancy issue

Any call to ucma_init_device must be done under lock.
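
The "must be done under lock" rule can be illustrated with a small sketch. It is not the librdmacm implementation: `device_init`, `init_lock`, `initialized`, and `open_count` are hypothetical names, and a counter stands in for opening a verbs context. The point is only that the "already initialized?" check and the initialization itself happen inside the same critical section, so two callers cannot both open the resource.

```c
#include <pthread.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int initialized;  /* guarded by init_lock */
static int open_count;   /* how many times the resource was opened */

/* Hypothetical init routine: safe to call repeatedly and concurrently. */
int device_init(void)
{
    pthread_mutex_lock(&init_lock);
    if (!initialized) {
        open_count++;     /* stands in for opening a device context */
        initialized = 1;
    }
    pthread_mutex_unlock(&init_lock);
    return 0;
}
```

If the check were made outside the lock, two threads could both observe `initialized == 0` and open the resource twice, leaking one of the two handles.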
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1291d9c7b52e829057458dad0e0ddd5aa9821a2a
Author: Sean Hefty <sean.hefty@intel.com>
Date: Wed Apr 16 22:01:51 2014 -0700

rsocket: Relax requirement for minimal inline data

Inline data support is optional. Allow rsockets to work with devices that do not support inline data, provided that they do support RDMA writes with immediate data. This allows rsockets to work over Intel TrueScale HCA.

Patch derived from work by: Amir Hanania
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit dfb5886db5975d209be6b31656c95b0d9c608195
Author: Sean Hefty <sean.hefty@intel.com>
Date: Wed Apr 16 22:33:38 2014 -0700

rsocket: Modify when control messages are available

Rsockets currently tracks how many control messages (i.e. entries in the send queue) that are available using a single ctrl_avail counter. Seems simple enough. However, control messages currently require the use of inline data. In order to support control messages that do not use inline data, we need to associate each control message with a specific data buffer. This will become easier to manage if we modify how we track when control messages are available.

We replace the single ctrl_avail counter with two new counters. The new counters conceptually treat control messages as if each message had its own sequence number. The sequence number will then be able to correspond to a specific data buffer in a follow up patch.

ctrl_seqno will be used to indicate the current control message being sent. ctrl_max_seqno will track the highest control message that may be sent.

A side effect of this change is that we will be able to see how many control messages have been sent. This also separates the updating of the control count on the sending side, versus the receiving side.
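
One way to picture the two-counter scheme is the sketch below. The field names `ctrl_seqno` and `ctrl_max_seqno` come from the commit message; everything else (`struct ctrl_state`, the helper functions, and the exact rule that a message may be sent while the two counters differ) is an assumption made for illustration and may not match the rsocket source.

```c
#include <stdint.h>

/* Hypothetical container for the two counters named in the commit message. */
struct ctrl_state {
    uint16_t ctrl_seqno;      /* sequence number of the next control message */
    uint16_t ctrl_max_seqno;  /* limit: sending stops when seqno reaches it */
};

/* Assumed rule: a control message may be sent while the counters differ. */
int ctrl_avail(const struct ctrl_state *s)
{
    return s->ctrl_seqno != s->ctrl_max_seqno;
}

/* Sending side: consume one sequence number, or fail if none is available. */
int ctrl_send(struct ctrl_state *s)
{
    if (!ctrl_avail(s))
        return -1;
    s->ctrl_seqno++;
    return 0;
}

/* Completion side: a finished control message frees one more slot. */
void ctrl_complete(struct ctrl_state *s)
{
    s->ctrl_max_seqno++;
}
```

With this layout, `ctrl_seqno` doubles as a running total of control messages sent, and the sender and the completion path each update their own counter, which matches the side effects the commit message mentions.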
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5ac6f3eab852606575f9affa515ec77b978a001c
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu Apr 17 08:37:47 2014 -0700

rsocket: Dedicate a fixed number of SQEs for control messages

The number of SQEs allocated for control messages is set to 1 of 2 constant values (either 4 or 2). A default value is used unless the size of the SQ is below a certain threshold (16 entries). This results in additional code complexity, and it is highly unlikely that the SQ would ever be allocated smaller than 16 entries.

Simplify the code to use a single constant value for the number of SQEs allocated for control messages. This will also help in subsequent patches that will need to deal with HCAs that do not support inline data.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit d62a52590741da993c5ac3c39c82601c273175d9
Author: Sean Hefty <sean.hefty@intel.com>
Date: Wed Apr 16 21:42:06 2014 -0700

rsocket: Check max inline data after creating QP

The ipath provider will ignore the max_inline_size specified as input into ibv_create_qp and instead return the size that it supports (which is 0) on output. Update the actual inline size returned from create QP, and check that it meets the minimum requirement for rsockets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8ce5823e02b6a38fd5ed7e11a1bb586847dbcb03
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Apr 29 20:11:35 2014 -0700

librdmacm: Make ucma_init_all static

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 23ffef06cf462c4c5ac4ec5880b96c8719b64774
Author: Sean Hefty <sean.hefty@intel.com>
Date: Wed Apr 9 12:19:25 2014 -0700

librdmacm: Support lazy initialization

librdmacm currently opens a device context per configured HCA. This is usually done in rdma_create_event_channel() or first time whenever ucma_init() is called. If a process is only going to use one of the configured HCAs/RDMA IPs then the remaining device contexts are not used/required.
Opening a device context on each device a priori limits the maximum number of processes that can be supported on a node to the maximum number of open contexts supported per HCA, regardless of the number of HCAs present in the system.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 984b1e3c189db9d156ea429c1726bd8739893247
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu Mar 6 13:42:31 2014 -0800

rsocket: Fix sbuf_bytes_avail counter 'overrun' with iwarp

Reported-by: Jonas Pfefferle1 <JPF@zurich.ibm.com>

"The problem is that on the client side sbuf_bytes_avail overflows in rs_poll_cq. And from what I debugged so far there are 2 completions for every send and this is because I use iWarp hardware which does not support write with immediate so there is one completion for the write and one for the send (both go into the default case and add the length to sbuf_bytes_avail)."

To avoid the issue, we flag send message operations that are used in place of immediate data. Other send message operations are not affected. The completion code can then check whether the completion is for a send message which was paired with an RDMA write transaction and adjust the behavior accordingly. Additionally, such send messages only carry the opcode in their WR_ID, with the data portion zeroed. This avoids adding the length value twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a2340d891eaa3f8a766a627bb4402ea85bcec6cb
Author: Hal Rosenstock <hal@mellanox.com>
Date: Wed Mar 5 12:51:54 2014 -0800

riostream: Add AF_IB support

Allow the user to specify GID addresses (AF_IB) with riostream

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8e760a4486f776df4f6728326dc7e8aed4a18971
Author: Hal Rosenstock <hal@mellanox.com>
Date: Tue Mar 4 17:06:47 2014 -0800

rsocket: Return EBADF on bad rsocket fd

Eliminates potential seg faults when passed an invalid rsocket.
Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3c19c968a240a2c50809373f9aa90bdf3454f6b1
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Mar 4 16:59:20 2014 -0800

man/rsocket: Enhance riomap documentation

Document that the user must set IOMAPSIZE in order to use the riomap call.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 176e6e961d17c51ae1f2dad5a2f50546e3a2ecf4
Author: Sean Hefty <sean.hefty@intel.com>
Date: Mon Jan 27 12:10:55 2014 -0800

librdmacm 1.0.18

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b4603c864860e5e35379458cd1c0a42bb983af59
Author: Sean Hefty <sean.hefty@intel.com>
Date: Mon Jan 27 11:30:34 2014 -0800

udaddy: Remove support for port space IB

UD support for the IB port space requires that the application use rdma_create_ep, rather than rdma_create_id. However, using rdma_create_ep results in address and route resolution being performed synchronously as part of the rdma_create_ep call. Since udaddy is an example, we want to show how it can be used with asynchronous events. So, rather than update udaddy to use rdma_create_ep in order to support the IB port space, it would be better to remove that support.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit df7ecde0da9df4af5d8bc3e1ca472e2e5ec9095b
Author: Susan K. Coulter <markus@cj-fe2.lanl.gov>
Date: Fri Jan 17 14:31:42 2014 -0800

rsocket: Add keepalive logic

Actually send and receive keepalive messages if keepalive is enabled on an rsocket.

Signed-off-by: Susan K. Coulter <markus@cj-fe2.lanl.gov>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit da2b7c1cde16df7936273b1ebd38e7c25856c843
Author: Or Gerlitz <ogerlitz@mellanox.com>
Date: Tue Dec 3 16:51:07 2013 -0800

librdmacm: Add directives on binding to IPv6 any address to man pages

Explain how to bind to IPv6 any address in the man pages for the examples

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ea5851998c11b8211170179a6d924d4935fec0a1
Author: Sean Hefty <sean.hefty@intel.com>
Date: Tue Nov 26 13:16:19 2013 -0800

librdmacm: Check 'init' under mutex

ucma_ib_init() does a quick check that access to ibacm has been initialized. This check is done outside of the acm_lock mutex. We need to check init again inside of holding the mutex to ensure that we don't run the initialization code twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b70a390d8bd8a679571f06ab82e42d68a99bc7d2
Author: Sean Hefty <sean.hefty@intel.com>
Date: Mon Nov 18 13:12:04 2013 -0800

rping: Fix server reporting error on exit

Commit e57196c71ddd850e14f3e66355f02786e4914f72
rping: added checks to the return values functions
resulted in the rping server always reporting that it failed. Fix this by only failing in the case of an unexpected termination, and not the result of the client completing.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c38d43aa2d5dc39dd98f813749dfa496875ad2e1
Author: Sean Hefty <sean.hefty@intel.com>
Date: Mon Nov 11 10:24:54 2013 -0800

Retrieve SGID after calling rdma_bind_addr

A change was made to rdma_bind_addr when AF_IB is enabled to only retrieve the resulting bound address. Previously, rdma_bind_addr would retrieve the corresponding SGID as well. This breaks some apps which were checking the SGID after binding to an IP address. Revert to the previous behavior of also retrieving the SGID after calling rdma_bind_addr.
Tested-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit faafeac08920a37994da19d72fd7ba1e64281f83
Author: Guy Shapiro <guysh@mellanox.com>
Date: Tue Nov 5 19:52:20 2013 +0200

librdmacm: Some fixes to man pages

Fix the man pages of rdma_destroy_ep & rdma_destroy_qp to the correct return value (void).

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 41900ddd3b09ed0625a721b014692b8c5c6f7246
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date: Mon Nov 4 07:56:08 2013 -0500

[librdmacm] Makefile.am: Add missing riostream man page to man_MANS

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 86520b86ffb45d3caf6e5bd94271f99deef0a5f9
Author: Sean Hefty <sean.hefty@intel.com>
Date: Fri Aug 16 15:15:12 2013 -0700

rsockets: Handle race between rshutdown and rpoll

Multi-threaded applications which call rpoll and rshutdown simultaneously can hang. Ceph developers reported an issue with the rsocket implementation. Ceph calls rpoll in one thread, and while that thread is blocked in rpoll, a second thread may call rshutdown on the socket. In normal sockets, this results in the poll call unblocking (since a call to read on the socket will no longer block). However, rsockets does not free the thread blocked on the rpoll call.

To fix this, we add some additional state checking to protect against threads calling rpoll and rshutdown simultaneously. We also have the rshutdown call transition the QP into an error state. This causes all posted receives to complete as flushed, which results in unblocking the thread in rpoll (to process the flushed receives).
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6152fb2ea9f4e331c63c00810ee4b920e6f1af2d
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date: Wed Sep 11 15:37:11 2013 -0400

[librdmacm] man/rstream.1: Update man page to be consistent with rstream -h

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 77cab40df7f29bdc718a4a6da74c6145bf81468a
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date: Wed Sep 11 14:44:32 2013 -0400

[librdmacm] rstream.c: Indicate when specified address family is unknown

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05ea9d16da8808e464750fa976ba3d6151df0a54
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date: Wed Sep 11 14:44:28 2013 -0400

[librdmacm] man/rdma_create_id.3: Add RDMA_PS_IB port space description

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a53376c3c7887c52cf5b311b0b96cfa405a49d31
Author: Yan Droneaud <ydroneaud@opteya.com>
Date: Tue Aug 27 11:37:54 2013 -0700

examples: Add cmtime to .gitignore

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 78dd0371cdad6bf27e98903ba66cebc01f52f6d5
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu Aug 22 15:29:15 2013 -0700

rsocket: Update rsocket man page

Update fork support and RDMA_ROUTE socket option.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5a5ec3458c67b1b431a18a0acbc950ef4e31f87f
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu Aug 22 12:00:54 2013 -0700

cmtime: Add retry support for address and route resolution

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b031fead061eb0d2874be8f259c84e21433e4505
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu Aug 22 11:54:56 2013 -0700

cmtime: Allow user to specify timeout values

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit afd49dcc2bb13052075e07a7593f6593b43606ce
Author: Sean Hefty <sean.hefty@intel.com>
Date: Thu Aug 22 11:30:33 2013 -0700

cmtime: Add ability to time rdma_bind_addr calls

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2949a92960546b75c647bcf14fec1f4369fd17fa
Author: Sean Hefty <sean.hefty@intel.com>
Date: Mon Aug 5 10:57:43 2013 -0700

cmtime: Add example program that times rdma cm calls

cmtime is a new sample program that measures how long it takes for each step in the connection process to complete. It can be used to analyze the performance of the various CM steps.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8fd079abb8b2835908017f74ac70781d84e1e163
Author: Sean Hefty <sean.hefty@intel.com>
Date: Fri Jul 26 09:52:55 2013 -0700

rstream: Use rsocket option to set route directly

If we're using GID addressing, rdma_getaddrinfo can return routing data directly. Add an option for the user to indicate that rdma_getaddrinfo should be called in place of getaddrinfo. And if routing data is available, call rsetsockopt to set the route. This helps test rsockets when ibacm and AF_IB support are available.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 21c703e5a594283cf119ce1286831df5d1483b34
Author: Sean Hefty <sean.hefty@intel.com>
Date: Fri Aug 2 14:18:06 2013 -0700

rsocket: Return 0 on success for SOL_RDMA options

The processing of SOL_RDMA does not set the return value in the case of successfully handled options.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e33755decd339712fc57fbe25bed704d24e8621a Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jun 10 12:33:20 2013 -0700 rsockets: Add ability to set the IB route directly Add an RDMA specific rsocket option that allows the user to program the RDMA route directly. This is useful for apps that have path record data available, e.g. from ibacm. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit f77079d79becf4476cb75ea5c816aae70724116e Author: Sean Hefty <sean.hefty@intel.com> Date: Sat Jul 20 19:22:55 2013 -0700 examples: Add support for native IB addressing to samples Allow the user to specify GID addresses (AF_IB) into udaddy and rstream. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit ca353a3f985135504c429f82bf5a342ec26d11d4 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu Jul 18 13:26:15 2013 -0700 rsockets: Support native IB addressing on connected rsockets Update rsockets to support AF_IB addresses on connected rsockets. Support for datagram rsockets is more difficult as a result of using real UDP sockets for QP resolution, so that support is deferred. For connected sockets, we need to update internal checks to handle AF_IB. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit a8becf33bbbb363cb2e0f2b45456bc82b345c453 Author: Bart Van Assche <bvanassche@acm.org> Date: Sun Jul 28 11:20:54 2013 +0200 [4/4] Declare 'server_port' as an unsigned variable Change the data type of the 'server_port' variable from signed to unsigned such that the cast in the fscanf() call can be removed. Signed-off-by: Bart Van Assche <bvanassche@acm.org> commit eee05e6604a60b007249f97613d3bb513c07c20d Author: Bart Van Assche <bvanassche@acm.org> Date: Sun Jul 28 11:19:48 2013 +0200 [3/4] rsocket: Remove the unused variable 'ret' The variable 'ret' is assigned a value but that value is never used. 
This triggers the following compiler warning: src/rsocket.c:3720:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable] Hence remove this variable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> commit 9e758e0655242bb02aea5ec28fe4eeac2ec655f7 Author: Bart Van Assche <bvanassche@acm.org> Date: Sun Jul 28 11:19:15 2013 +0200 [2/4] cma: Remove the unused variable 'id_priv' The variable 'id_priv' is assigned a value but is never used. This triggers the following compiler warning: src/cma.c:1178:25: warning: variable 'id_priv' set but not used [-Wunused-but-set-variable] Hence remove this variable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> commit 2a31c855fc95d04370db56de5b35d8271e577f6f Author: Bart Van Assche <bvanassche@acm.org> Date: Sun Jul 28 11:18:36 2013 +0200 [1/4] acm: Remove the unused variable 'pri_path' The variable 'pri_path' is assigned a value but is never used. This triggers the following compiler warning: src/acm.c:301:26: warning: variable 'pri_path' set but not used [-Wunused-but-set-variable] Hence remove this variable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> commit c8be3cfde6902e490fadd6a51206c1bcba3e3aa2 Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jun 10 10:57:56 2013 -0700 init: Remove USE_IB_ACM configuration option When the librdmacm is configured, it sets the USE_IB_ACM option if infiniband/acm.h is found. We can remove this option with very little overhead, which would allow a user to install ACM after installing the librdmacm, and the librdmacm would be able to make use of ACM. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 6efb57780ca142ea4e3b0feebef554849047f79f Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jun 10 11:07:12 2013 -0700 acm: Define needed ACM protocol messages The librdmacm needs message definitions used to communicate with the ibacm. It currently pulls these from infiniband/acm.h, which is installed by ibacm. This creates an install order dependency on ibacm. 
However, work on the scalable SA has the ibacm using the librdmacm (via rsockets) for communication between the different SSA components. To resolve this issue, have the librdmacm define the message structures that it needs to communicate with ibacm. The librdmacm already defines some ACM messages through configuration checks. We just expand that capability, which isolates the librdmacm package from the ibacm package. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit c8173d50d1a8c2bbfb0c4459e05d3941175676b2 Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Aug 29 15:02:54 2012 -0700 cmatose: Allow user to specify address format Provide an option for the user to indicate the type of addresses used as input. Support hostname, IPv4, IPv6, and GIDs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 704f54358a1f74229cd9e982b530ca8327c7658e Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 16:03:42 2013 -0700 Remove executable mode bit on text files Source code and man page should not be executable. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 3eb1704b2e11413077933d6d3a963d81d508bdf8 Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 23:59:52 2013 +0200 Open files with "close on exec" flag File opened by librdmacm are not supposed to be inherited across exec*(), most of the files are of no use for another program, and others cannot be used without the associated memory mapping. This patch changes fopen() open() and socket() to always set close on exec flag. This patch also add checks to configure to guess if fopen() supports "e" flag. If O_CLOEXEC and SOCK_CLOEXEC are supported, fopen() should support "e". If not supported, its discarded according to POSIX. Many operating systems have support for fopen("e"). You might find more information about close on exec in the following articles: - "Excuse me son, but your code is leaking !!!" 
by Dan Walsh http://danwalsh.livejournal.com/53603.html - "Secure File Descriptor Handling" by Ulrich Drepper http://udrepper.livejournal.com/20407.html Note: this patch won't set the close on exec flag on file descriptors created by the kernel for completion channel and such. This is addressed by another kernel patch. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit d53cd79c3bde6186bda6822a04708b9d2666f8ae Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 23:59:50 2013 +0200 Add .gitignore rules Add the list of files/patterns to be excluded from git status output. Additionally it will prevent such files/patterns from being added and committed. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e9ef6c2e2d8141dd5c32472918b8c087f745524b Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 23:59:49 2013 +0200 configure: Use automake's option "subdir-objects" Following advice in "Autotool Mythbuster" [1], option subdir-objects can be used to have Makefiles create object files in the same directory as their source files. It reduces clobbering in the build directory. [1] "Autotool Mythbuster", by Diego Elio "Flameeyes" Pettenò http://www.flameeyes.eu/autotools-mythbuster/automake/nonrecursive.html Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 3edfff79d98f72b754278c854f871c4a22a7ce3c Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 23:59:48 2013 +0200 configure: Apply updates proposed by autoupdate 'autoupdate' is a tool to help developers update configure.ac. This patch applies a few fixes as suggested by autoupdate. 
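The "Open files with close on exec flag" commit above (3eb1704b) describes setting the flag at open time. A minimal sketch of that idea using plain POSIX calls follows; the helper names open_cloexec and has_cloexec are hypothetical and are not taken from librdmacm.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for O_CLOEXEC under strict -std modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: open a file read-only with close-on-exec set
 * atomically at open time, so there is no window in which a concurrent
 * fork()+exec() could inherit the descriptor. */
int open_cloexec(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Hypothetical helper: report whether fd has FD_CLOEXEC set.
 * Returns 1 if set, 0 if not set, -1 if fcntl() fails. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A caller would open a file with open_cloexec() and could confirm the flag with has_cloexec(). For sockets, the analogous flag mentioned in the commit is SOCK_CLOEXEC, passed in the type argument of socket().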
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit f49ac33aaab147e5b126a75565f57e596600f372 Author: Jeff Squyres <jsquyres@cisco.com> Date: Tue Jul 16 23:59:47 2013 +0200 autogen.sh: Use autoreconf in autogen.sh The old sequence of Autotools commands listed in autogen.sh is no longer correct. Instead, just use the single "autoreconf" command, which will invoke all the Right Autotools commands in the correct order. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 9d2f1b068e6fcd62853fe013c7cc4316dcb3fc4b Author: Bart Van Assche <bvanassche@acm.org> Date: Tue Jul 16 23:59:46 2013 +0200 Makefile.am: Fix an automake warning Fix the following automake warning message: Makefile.am:1: `INCLUDES' is the old name for `AM_CPPFLAGS' (or `*_CPPFLAGS') A quote from the automake manual: INCLUDES This does the same job as AM_CPPFLAGS (or any per-target _CPPFLAGS variable if it is used). It is an older name for the same functionality. This variable is deprecated; we suggest using AM_CPPFLAGS and per-target _CPPFLAGS instead. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 715965b7231cd97d302e24c9e8ac89b2a57a57ab Author: Bart Van Assche <bvanassche@acm.org> Date: Tue Jul 16 23:59:45 2013 +0200 Add "foreign" option to AM_INIT_AUTOMAKE Switch to the modern form of the AM_INIT_AUTOMAKE macro and tell automake that the librdmacm package does not follow the GNU standards. This change makes it possible to use 'autoreconf' for the librdmacm package. 
Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit ef095323918acac8fdc5386ebb7877fb5d34e5e3 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu May 2 13:47:51 2013 -0700 lib: Rename configure.in to configure.ac Update to latest autotools naming. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit faae8c5db396985a40dc56ad6f82f89a16b8e9f1 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu Apr 11 10:05:29 2013 -0700 rsocket: Add support for iWarp iWarp does not support RDMA writes with immediate data. Instead of sending messages using immediate data, allow the rsocket protocol to exchange messages using sends. The rsocket protocol remains the same. RDMA writes are used for data transfers, with send messages used to transfer rsocket protocol messages. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 0d6ca1300d88377ae7f9162457e64c541a4630eb Author: Sean Hefty <sean.hefty@intel.com> Date: Fri Apr 12 14:41:52 2013 -0700 rsocket: Merge usage of wr_id between stream and datagram svcs The rsocket data streaming and datagram services use different formats for the wr_id. Although some differences are needed, we can make them more similar. This will be useful when the wr_id is used for iwarp support, plus eliminates use of wr_id bits that aren't actually needed. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e57928b701ded6c5417b5ac0c153a239bf947612 Author: Sean Hefty <sean.hefty@intel.com> Date: Tue Mar 5 17:18:11 2013 -0800 librdmacm: Release 1.0.17 commit 24590bc96d8871d80124d68d182c915d7efcc9e6 Author: Sean Hefty <sean.hefty@intel.com> Date: Tue Feb 19 20:03:58 2013 -0800 librdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown Shutdown switches an rsocket from nonblocking to blocking to ensure that all data has been sent. 
After completing all transfers, it should switch back to nonblocking; this handles partial shutdown situations, where only half the connection is shut down. However, the code uses the value of '1' to set the nonblocking flag, rather than O_NONBLOCK. Fix this. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit be2a2a44663282cda1a60e05c3b85275c732acc6 Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Feb 4 16:52:18 2013 -0800 librdmacm/rstream: Reduce default transfer count 1 million ping-pong transfers takes over 3 seconds to complete, and I'm impatient. Reduce the default number of transfers for small messages to speed up running performance tests, especially when running over slower connections, like TCP sockets or over a WAN. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 69fadb50636d98de57c9069b83adf6d2c5c77fc6 Author: Sean Hefty <sean.hefty@intel.com> Date: Fri Feb 1 17:17:34 2013 -0800 librdmacm: Work-around kernel bug returning uid = 0 Older kernels have a bug where they can report an event with the uid set to 0. The librdmacm crashes when casting the uid to an rdma_cm_id and dereferencing the NULL pointer. There are a limited number of events where this can occur and in most cases it's safe to simply discard the event. (This is what the kernel does anyway.) However, it's possible for us to process an RDMA_CM_EVENT_ESTABLISHED event with the uid set to 0. (See kernel commit 418edaaba96e58112b15c82b4907084e2a9caf42.) Although it's rare for this to occur, it does in fact happen in practice. To work around the kernel bug, when the uid of an established event is set to 0, we first try to locate the correct user space id based on related data before discarding the event. 
Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 75e5b5b17d8a478b4fad5d9ee700edb943b050ba Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jan 28 14:56:25 2013 -0800 librdmacm: Define ucma_ib_init when IB_ACM is disabled ucma_ib_init is only defined if IB_ACM is enabled, which is determined by looking for the infiniband/acm.h header file. Define ucma_ib_init when IB_ACM is disabled. Problem reported by Suresh Shelvapille <suri@baymicrosystems.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 1f6088f85af3c60ba4d57de1d8f1098e06761237 Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jan 21 15:28:39 2013 -0800 rsockets: Update rsocket man page Update man page to include recently added rsocket options and undocumented configuration file. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 56e1a7cd4904fbfde59adbdfedd5374e5bde2e87 Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Jan 9 14:54:47 2013 -0800 rsockets: Add support for existing UDP apps Support for existing UDP applications is done via the rspreload library. However, when the preload library is loaded, socket calls used by rsockets get intercepted and converted into rsocket calls. The preload library was able to handle this for TCP rsockets by using a per thread variable and checking for recursive calls coming from rsockets back into the preload library. The preload library would direct such calls to the real socket calls. The problem is more complex for UDP rsockets, which can invoke socket calls from an internal rsocket thread. The result is that the preload library intercepts socket calls that originate from the rsocket library which are not recursive. Although this is really a problem with the preload library, the simplest solution is for rsockets to fully initialize the library when allocating the first rsocket, versus deferring initialization until required. The preload library can then detect the recursive calls. 
Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 6047e1991e95b96b1992f39a466457e584c01226 Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Dec 5 15:58:03 2012 -0800 examples/udpong: Add test program for rsocket datagrams Add a sample test program to test datagram rsockets. Move common routines used by udpong and other test programs into a common source file. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e6e93ed4231976eeab707b31e283be0a7acff6db Author: Sean Hefty <sean.hefty@intel.com> Date: Fri Nov 9 10:26:38 2012 -0800 rsocket: Add datagram support Add datagram support through the rsocket API. Datagram support is handled through an entirely different protocol and internal implementation than streaming sockets. Unlike connected rsockets, datagram rsockets are not necessarily bound to a network (IP) address. A datagram socket may use any number of network (IP) addresses, including those which map to different RDMA devices. As a result, a single datagram rsocket must support using multiple RDMA devices and ports, and a datagram rsocket references a single UDP socket, plus zero or more UD QPs. Rsockets uses headers inserted before user data sent over UDP sockets to resolve remote UD QP numbers. When a user first attempts to send a datagram to a remote address (IP and UDP port), rsockets will take the following steps: 1. Store the destination address into a lookup table. 2. Resolve which local network address should be used when sending to the specified destination. 3. Allocate a UD QP on the RDMA device associated with the local address. 4. Send the user's datagram to the remote UDP socket. A header is inserted before the user's datagram. The header specifies the UD QP number associated with the local network address (IP and UDP port) of the send. A service thread is used to process messages received on the UDP socket. This thread updates the rsocket lookup tables with the remote QPN and path record data. 
The service thread forwards data received on the UDP socket to an rsocket QP. After the remote QPN and path records have been resolved, datagram communication between two nodes are done over the UD QP. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit c6bfc1c5b15e6207188a97e8a5df0405cfd2587f Author: Or Gerlitz <ogerlitz@mellanox.com> Date: Sun Dec 2 12:04:23 2012 +0000 [librdmacm] Fixed build problem due to missing macro rsocket.c wasn't passing compilation as of missing definition for the container_of macro, fix it. Reported-by: Eyal Salamon <esalomon@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit ab0d488c1e3ba7658f61a4d8da022b5afc17737f Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Nov 5 11:53:03 2012 -0800 rsocket: Remove fscanf build warnings Cast fscanf return values to (void) to indicate that we don't care if the call fails. In the case of a failure, we simply fall back to using default values. Problem reported by Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 7d92d0106f50e0371256e74863963a0e2e99a5c8 Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Oct 24 10:23:52 2012 -0700 riostream: Add example program for using iomap routines. riostream is based on rstream, but uses the new riomap, riounmap, and riowrite calls instead. It runs a series of latency and bandwidth tests using remote iomapped memory. riostream is limited to using zero copy transfers at the receiving side only at this time. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit bb9fcba81acdfe34ea5df3bb23a45e0a486207da Author: Sean Hefty <sean.hefty@intel.com> Date: Sun Oct 21 14:16:03 2012 -0700 rsocket: Add APIs for direct data placement We introduce rsocket extensions for supporting direct data placement (also known as zero copy). Direct data placement avoids data copies into network buffers when sending or receiving data. 
This patch implements zero copies on the receive side, but adds some basic framework for supporting it on the sending side. Integrating zero copy support into the existing socket APIs is difficult to achieve when the sockets are set as nonblocking. Any such implementation is likely to be unusable in practice. The problem stems from the fact that socket operations are synchronous in nature. Support for asynchronous operations is limited to connection establishment. Therefore we introduce new calls to handle direct data placement. The use of the new calls is optional and does not affect the use of the existing calls. An attempt is made to have the new routines integrate naturally with the existing APIs. The new functions are: riomap, riounmap, and riowrite. The basic operation can be described as follows: 1. App A calls riomap to register a data buffer with the local RDMA device. Riomap returns an off_t offset value that corresponds to the registered data buffer. The app may select the offset value. 2. Rsockets will transmit an internal message to the remote peer with information about the registration. This exchange is hidden from the applications. 3. App A sends a notification message to app B indicating that the remote iomapped buffer is now available to receive data. 4. App B calls riowrite to transmit data directly into the riomapped data buffer. 5. App B sends a notification message to app A indicating that data is available in the mapped buffer. 6. After all transfers are complete, app A calls riounmap to deregister its data buffer. Riomap and riounmap are functionally equivalent to RDMA memory registration and deregistration routines. They are loosely based on the mmap and munmap APIs. off_t riomap(int socket, void *buf, size_t len, int prot, int flags, off_t offset) Riomap registers an application buffer with the RDMA hardware associated with an rsocket. 
The buffer is registered either for local only access (PROT_NONE) or for remote write access (PROT_WRITE). When registered for remote access, the buffer is mapped to a given offset. The offset is either provided by the user, or if the user selects -1 for the offset, rsockets selects one. The remote peer may access an iomapped buffer directly by specifying the correct offset. The mapping is not guaranteed to be available until after the remote peer receives a data transfer initiated after riomap has completed. int riounmap(int socket, void *buf, size_t len) Riounmap removes the mapping between a buffer and an rsocket. size_t riowrite(int socket, const void *buf, size_t count, off_t offset, int flags) Riowrite allows an application to transfer data over an rsocket directly into a remotely iomapped buffer. The remote buffer is specified through an offset parameter, which corresponds to a remote iomapped buffer. From the sender's perspective, riowrite behaves similar to rwrite. From a receiver's view, riowrite transfers are silently redirected into a pre- determined data buffer. Data is received automatically, and the receiver is not informed of the transfer. However, iowrite data is still considered part of the data stream, such that iowrite data will be written before a subsequent transfer is received. A message sent immediately after initiating an iowrite may be used to notify the receiver of the iowrite. It should be noted that the current implementation primarily focused on being functional for evaluation purposes. Some checks have been deferred for subsequent patches, and performance is currently limited by linear lookups. 
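The offset-to-buffer mapping and the linear lookup mentioned in the commit message above can be pictured with a small local model. This is only a toy sketch: it involves no RDMA and no librdmacm calls, every name in it (toy_table, toy_iomap, toy_riounmap, toy_riowrite) is hypothetical, and it is not the library's implementation. It only illustrates how an offset chosen at map time can later select the destination buffer for a write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* off_t, ssize_t */

#define TOY_MAX_MAPS 8

/* One registered buffer and the offset range it occupies. */
struct toy_map {
    void  *buf;
    size_t len;
    off_t  offset;
    int    used;
};

/* Zero-initialize before use, e.g. struct toy_table t = {0}; */
struct toy_table {
    struct toy_map maps[TOY_MAX_MAPS];
    off_t next_offset;   /* next offset handed out when the caller passes -1 */
};

/* Map buf at the given offset; offset == -1 lets the table pick one.
 * Returns the offset used, or -1 on failure. */
off_t toy_iomap(struct toy_table *t, void *buf, size_t len, off_t offset)
{
    struct toy_map *slot = NULL;

    assert(t != NULL && buf != NULL);
    if (len == 0)
        return -1;
    if (offset == -1)
        offset = t->next_offset;
    if (offset < 0)
        return -1;

    for (int i = 0; i < TOY_MAX_MAPS; i++) {
        struct toy_map *m = &t->maps[i];

        if (!m->used) {
            if (slot == NULL)
                slot = m;          /* remember the first free slot */
            continue;
        }
        /* Reject an offset range that overlaps an existing mapping. */
        if (offset < m->offset + (off_t)m->len &&
            m->offset < offset + (off_t)len)
            return -1;
    }
    if (slot == NULL)
        return -1;                 /* table full */

    slot->buf = buf;
    slot->len = len;
    slot->offset = offset;
    slot->used = 1;
    if (offset + (off_t)len > t->next_offset)
        t->next_offset = offset + (off_t)len;
    return offset;
}

/* Remove the mapping for (buf, len). Returns 0 on success, -1 if not found. */
int toy_riounmap(struct toy_table *t, void *buf, size_t len)
{
    assert(t != NULL);
    for (int i = 0; i < TOY_MAX_MAPS; i++) {
        struct toy_map *m = &t->maps[i];

        if (m->used && m->buf == buf && m->len == len) {
            m->used = 0;
            return 0;
        }
    }
    return -1;
}

/* Copy count bytes into whichever mapping fully contains
 * [offset, offset + count), found by linear search.
 * Returns count on success, -1 if no mapping covers the range. */
ssize_t toy_riowrite(struct toy_table *t, const void *src, size_t count,
                     off_t offset)
{
    assert(t != NULL && src != NULL);
    for (int i = 0; i < TOY_MAX_MAPS; i++) {
        struct toy_map *m = &t->maps[i];

        if (!m->used)
            continue;
        if (offset >= m->offset &&
            offset + (off_t)count <= m->offset + (off_t)m->len) {
            memcpy((char *)m->buf + (offset - m->offset), src, count);
            return (ssize_t)count;
        }
    }
    return -1;
}
```

In this model the "receiver" calls toy_iomap once, passes the returned offset to the "sender" out of band, and the sender's toy_riowrite lands directly in the mapped buffer. The linear scan in toy_riowrite is the kind of lookup that a later patch could replace with an indexed structure.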
Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit d2e96e99bf1fc3d14e33c741502cb689c810a27b Author: Roland Dreier <roland@purestorage.com> Date: Tue Oct 16 19:44:39 2012 +0000 rdma_xserver/client: Fix man page formatting Putting 'r' at the beginning of a line in the nroff source for man pages is confusing to nroff because lines that start with a single quote character ' or a dot character . are treated as control lines, which is not what's intended here. Some of the man page text ends up left out of the formatted output. Fix this by just wrapping the text slightly differently in the source (which doesn't matter since nroff reflows the text anyway). Also add a missing ".TP" so that the -p and -c options are not run together in the formatted output. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 507cc241e8b212c3cf3ed0ffb04e37095bbf8bb3 Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Oct 8 10:33:21 2012 -0700 librdmacm: Disable ACM support if ibacm.port is not found The librdmacm will try to connect port 6125 if ibacm.port is not found. The problem is that some other service or application could be using that port and respond with garbage. Rather than falling back to a hard coded port number, if ibacm.port is not found, simply disable ACM support. This has the effect of removing support for older versions of ibacm, unless the port file is created manually. Patch created based on feedback from Doug Ledford and Florian Weimer from RedHat. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e57196c71ddd850e14f3e66355f02786e4914f72 Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:52 2012 +0000 [5/5,librdmacm] rping: added checks to the return values functions This will make rping to exit with return value other than zero in case of an error. 
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 6c56dc404c999daa16a039f59b0160ab983acc98 Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:51 2012 +0000 [4/5,librdmacm] rstream: added missing return if accept() failed Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 41d6547bede80581b384b49bb35eac4fe089d08c Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:50 2012 +0000 [3/5,librdmacm] rstream: initialize return value in server_connect() If use_async == 0 and rs_accept() passes (i.e. non negative value), then the return value from the function was uninitialized. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 1f1a03dae14cbb25a43b1b56aa5ae689776edc11 Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:49 2012 +0000 [2/5,librdmacm] rsocket: added missing break Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit eddbe8f0abc3d0f69755f0e510df2a7f21412c0b Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:48 2012 +0000 [1/5,librdmacm] rsocket: add missing va_end() after calling va_start() Not doing so may lead to a resource leak. 
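The pairing rule behind the va_end() fix above can be shown with a small standalone function. This is a generic sketch, not code from rsocket.c; the function name sum_ints is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: sum `count` int arguments.
 * The C standard requires each va_start() to be matched by a va_end()
 * in the same function before it returns. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);
    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);   /* easy to forget, especially on early-return paths */
    return total;
}
```

In functions with several return paths, a common way to keep the pairing intact is to copy the needed arguments out, call va_end() immediately, and only then branch.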
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 8a92d0c3c8ce5f513dff974912143f6b0283f8e3 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu Oct 4 12:01:50 2012 -0700 ucmatose: Remove connect parameter passed into rdma_accept Pass in NULL for conn_param into rdma_accept to indicate that the passive side will use the values specified by the active side. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 714af39b2bc2cc54dd2391a0df2c7e54856bc9c7 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu Oct 4 11:49:59 2012 -0700 ucmatose: Fix number of connections to disconnect When ucmatose aborts because of issues trying to connect to the server, it moves to disconnecting all connections. However, not all connections may have been established. The result is that ucmatose will hang in disconnect_events. Fix this by setting the number of times that we need to disconnect to the number of times that we successfully connect. This problem is based on a report by Doug Ledford <dledford@redhat.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 860b1a8784f1846be759eec46770cc723991479c Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Oct 3 15:05:20 2012 -0700 rping: Reduce retry_count to fit in 3-bits retry_count is a 3 bit value on IB, reduce it from 10 to 7. A value of 10 prevents rping from working over the Intel IB HCA. Problem reported by Doug Ledford <dledford@redhat.com> The retry_count is also not set when calling rdma_accept. Rather than passing different values into rdma_accept than what was specified by the remote side, use the values given in the connection request. Signed-off-by: …
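The 3-bit limit described in the rping retry_count commit above can be checked with simple arithmetic. The sketch below is illustrative only; the names RETRY_COUNT_BITS, fits_retry_count and clamp_retry_count are hypothetical, and it makes no claim about how any particular HCA reacts to an out-of-range value.

```c
#include <assert.h>
#include <stdint.h>

#define RETRY_COUNT_BITS 3
#define RETRY_COUNT_MAX  ((1u << RETRY_COUNT_BITS) - 1u)   /* 7 */

/* Returns 1 if the value can be represented in a 3-bit field, else 0. */
int fits_retry_count(unsigned int value)
{
    return value <= RETRY_COUNT_MAX;
}

/* Clamp a requested retry count to the largest 3-bit value. */
uint8_t clamp_retry_count(unsigned int requested)
{
    uint8_t clamped = (uint8_t)(requested > RETRY_COUNT_MAX ?
                                RETRY_COUNT_MAX : requested);

    assert(clamped <= RETRY_COUNT_MAX);
    return clamped;
}
```

With these helpers, the old value of 10 does not fit and clamps to 7, which matches the value the commit switches to.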

usdf_mem.c: add fi_enosys.h

=================================================================
==849267== ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff4caa7230 at pc 0x7ffdf8608687 bp 0x7fff4caa71b0 sp 0x7fff4caa71a0
READ of size 8 at 0x7fff4caa7230 thread T0
    #0 0x7ffdf8608686 in fi_tostr_ libfabric-current/src/fi_tostr.c:618
    #1 0x402f3a in run_test_set ofi/libfabric-current/fabtest/unit/size_left_test.c:262
    #2 0x403457 in main libfabric-current/fabtest/unit/size_left_test.c:317
    #3 0x7ffdf4819b14 in __libc_start_main (/usr/lib64/libc.so.6+0x21b14)
    #4 0x401988 in _start (libfabric-1.4.0/ofi_inst/bin/fi_size_left_test+0x401988)
Address 0x7fff4caa7230 is located at offset 32 in frame <run_test_set> of T0's stack:
  This frame has 2 object(s):
    [32, 36) 'ep_type'
    [96, 104) 'info'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow libfabric-current/src/fi_tostr.c:618 fi_tostr_
Shadow bytes around the buggy address:
  0x10006994cdf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10006994ce40: 00 00 f1 f1 f1 f1[04]f4 f4 f4 f2 f2 f2 f2 00 f4
  0x10006994ce50: f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
  0x10006994ce60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10006994ce90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Heap righ redzone:     fb
  Freed Heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack partial redzone: f4
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  ASan internal:         fe
==849267== ABORTING

Change-Id: I90e59ca4127a792718cac5180da33ff2caf66f2b


================================================================= ==849267== ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff4caa7230 at pc 0x7ffdf8608687 bp 0x7fff4caa71b0 sp 0x7fff4caa71a0 READ of size 8 at 0x7fff4caa7230 thread T0 #0 0x7ffdf8608686 in fi_tostr_ libfabric-current/src/fi_tostr.c:618 #1 0x402f3a in run_test_set ofi/libfabric-current/fabtest/unit/size_left_test.c:262 #2 0x403457 in main libfabric-current/fabtest/unit/size_left_test.c:317 #3 0x7ffdf4819b14 in __libc_start_main (/usr/lib64/libc.so.6+0x21b14) #4 0x401988 in _start (libfabric-1.4.0/ofi_inst/bin/fi_size_left_test+0x401988) Address 0x7fff4caa7230 is located at offset 32 in frame <run_test_set> of T0's stack: This frame has 2 object(s): [32, 36) 'ep_type' [96, 104) 'info' HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions *are* supported) SUMMARY: AddressSanitizer: stack-buffer-overflow libfabric-current/src/fi_tostr.c:618 fi_tostr_ Shadow bytes around the buggy address: 0x10006994cdf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10006994ce00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10006994ce10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10006994ce20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10006994ce30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x10006994ce40: 00 00 f1 f1 f1 f1[04]f4 f4 f4 f2 f2 f2 f2 00 f4 0x10006994ce50: f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 0x10006994ce60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10006994ce70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10006994ce80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x10006994ce90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap righ redzone: fb Freed Heap region: fd Stack left redzone: f1 Stack mid 
redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 ASan internal: fe ==849267== ABORTING Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>

Here is the deadlock scenario: #0 0x00007fed3a439495 in pthread_spin_lock () #1 0x00007fed37ad7cfd in fastlock_acquire () #2 0x00007fed37ad80a4 in psmx2_lock () #3 0x00007fed37ad8361 in psmx2_am_trx_ctxt_handler_ext () #4 0x00007fed37b084e7 in psmx2_am_trx_ctxt_handler_0 () #5 0x00007fed373c08c5 in self_am_short_request () #6 0x00007fed3739bf83 in __psm2_am_request_short () #7 0x00007fed37ad84ee in psmx2_trx_ctxt_disconnect_peers () A lock has been held in psmx2_trx_ctxt_disconnect_peers before psm2_am_request_short is called. While making progress inside this function, the execution is redirected to the AM handler due to the arrival of an incoming disconnection request. The AM handler tries to acquire the same lock that has already been held and reaches a deadlock. Fix by avoiding calling psm2_am_request_short while holding the lock. Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
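
The locking pattern described above can be sketched roughly as follows. This is not the psmx2 code: `trx_ctxt`, `disconnect_handler`, `send_disconnect`, and `disconnect_peers` are hypothetical names, and a pthread mutex stands in for the provider's lock. The idea is to snapshot the peer list under the lock, release it, and only then issue requests that may run a handler taking the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_PEERS 8

/* Hypothetical stand-in for a transmit context and its peer list. */
struct trx_ctxt {
    pthread_mutex_t lock;
    int peers[MAX_PEERS];
    size_t npeers;
    int handled;    /* number of requests the handler has processed */
};

/* Hypothetical handler. Like the AM handler in the trace, it takes the
 * context lock, so it must never run while the caller still holds it. */
static void disconnect_handler(struct trx_ctxt *ctxt, int peer)
{
    (void) peer;
    pthread_mutex_lock(&ctxt->lock);
    ctxt->handled++;
    pthread_mutex_unlock(&ctxt->lock);
}

/* Hypothetical send. Making progress may invoke the handler inline,
 * which is what happens for a loopback request in the trace. */
static void send_disconnect(struct trx_ctxt *ctxt, int peer)
{
    disconnect_handler(ctxt, peer);
}

static void disconnect_peers(struct trx_ctxt *ctxt)
{
    int snapshot[MAX_PEERS];
    size_t i, n;

    /* Copy the peers out while holding the lock... */
    pthread_mutex_lock(&ctxt->lock);
    n = ctxt->npeers;
    assert(n <= MAX_PEERS);
    for (i = 0; i < n; i++)
        snapshot[i] = ctxt->peers[i];
    ctxt->npeers = 0;
    pthread_mutex_unlock(&ctxt->lock);

    /* ...and send only after the lock has been released. */
    for (i = 0; i < n; i++)
        send_disconnect(ctxt, snapshot[i]);
}
```

Calling `send_disconnect()` inside the locked region instead would self-deadlock on a non-recursive lock, which matches the shape of the backtrace.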

I'm not entirely sure if this fixes the issue our QA is seeing (as they get err_entry.err=-104 - a wrong negative value), but with error injection I could easily trigger a use-after-free rooted in this function (with err_entry.err=104, though, so I still don't know where the wrong error sign came from). In my error injection reproducer ofi_send_socket() fails sometimes, which then triggers a free of cm_ctx without removing the fd and cm_ctx from polling. The next poll round will then access cm_ctx and trigger a use-after-free. client_send_connreq tx_cm_data ofi_send_socket -> fails goto err ... err: free(cm_ctx) ASAN reports READ of size 4 at 0x6120000106c8 thread T4 (rpc_poll-0) #0 0x7f77005e0f21 in process_cm_ctx prov/tcp/src/tcpx_conn_mgr.c:482 #1 0x7f77005e15ef in tcpx_conn_mgr_run prov/tcp/src/tcpx_conn_mgr.c:535 #2 0x7f77005fc429 in tcpx_eq_read prov/tcp/src/tcpx_eq.c:48 #3 0x4926dd in fi_eq_read /home/bschubert/local/rhel7/libfabric/include/rdma/fi_eq.h:352 0x6120000106c8 is located 8 bytes inside of 280-byte region [0x6120000106c0,0x6120000107d8) freed by thread T4 (rpc_poll-0) here: #0 0x7f77015915e7 in __interceptor_free #1 0x7f77005e083b in client_send_connreq prov/tcp/src/tcpx_conn_mgr.c:422 #2 0x7f77005e0f7e in process_cm_ctx prov/tcp/src/tcpx_conn_mgr.c:487 #3 0x7f77005e15ef in tcpx_conn_mgr_run prov/tcp/src/tcpx_conn_mgr.c:535 #4 0x7f77005fc429 in tcpx_eq_read prov/tcp/src/tcpx_eq.c:48 previously allocated by thread T5 (rpc_conn_mgr) here: #0 0x7f7701591b7e in __interceptor_calloc #1 0x7f77005edb5c in tcpx_ep_connect prov/tcp/src/tcpx_ep.c:103 #2 0x478b2f in fi_connect /home/bschubert/local/rhel7/libfabric/include/rdma/fi_cm.h:98 Signed-off-by: Bernd Schubert <bschubert@ddn.com>
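
The error path described above can be illustrated with a small sketch. It is not the tcp provider code: `poll_set`, `poll_set_add`, `poll_set_del`, `send_connreq`, and `send_connreq_or_cleanup` are hypothetical names, and the array-based poll set is only a stand-in for the real polling structure. The point is that the context is unregistered from polling before it is freed, so a later poll round cannot reach it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_POLLED 16

struct cm_ctx {
    int fd;
};

/* Hypothetical poll set: the contexts the poll loop will visit. */
struct poll_set {
    struct cm_ctx *ctx[MAX_POLLED];
    size_t count;
};

static int poll_set_add(struct poll_set *ps, struct cm_ctx *ctx)
{
    if (ps->count == MAX_POLLED)
        return -1;
    ps->ctx[ps->count++] = ctx;
    return 0;
}

static void poll_set_del(struct poll_set *ps, struct cm_ctx *ctx)
{
    size_t i;

    for (i = 0; i < ps->count; i++) {
        if (ps->ctx[i] == ctx) {
            /* Fill the hole with the last entry. */
            ps->ctx[i] = ps->ctx[--ps->count];
            return;
        }
    }
}

/* Stand-in for a send that can fail; inject_error forces the failure. */
static int send_connreq(struct cm_ctx *ctx, int inject_error)
{
    (void) ctx;
    return inject_error ? -1 : 0;
}

static int send_connreq_or_cleanup(struct poll_set *ps, struct cm_ctx *ctx,
                                   int inject_error)
{
    if (send_connreq(ctx, inject_error))
        goto err;
    return 0;

err:
    /* Unregister first, free second. Freeing alone leaves a dangling
     * pointer in the poll set, as in the ASAN report. */
    poll_set_del(ps, ctx);
    free(ctx);
    return -1;
}
```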

Problem reported by Address Sanitizer: ================================================================= ==25220==ERROR: AddressSanitizer: heap-use-after-free on address 0x6270000072e0 at pc 0x00010b926a3c bp 0x700001bd1c30 sp 0x700001bd1c28 READ of size 4 at 0x6270000072e0 thread T4 #0 0x10b926a3b in sock_conn_listener_thread (libfabric.1.dylib:x86_64+0xdca3b) #1 0x7fff7e2d5660 in _pthread_body (libsystem_pthread.dylib:x86_64+0x3660) #2 0x7fff7e2d550c in _pthread_start (libsystem_pthread.dylib:x86_64+0x350c) #3 0x7fff7e2d4bf8 in thread_start (libsystem_pthread.dylib:x86_64+0x2bf8) 0x6270000072e0 is located 480 bytes inside of 12944-byte region [0x627000007100,0x62700000a390) freed by thread T0 here: #0 0x10baf1a9d in wrap_free (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x56a9d) #1 0x10b9016bf in sock_ep_close (libfabric.1.dylib:x86_64+0xb76bf) #2 0x10b7f4a8f in fi_close fabric.h:593 #3 0x10b7f4209 in main shared_ctx.c:649 #4 0x7fff7dfbd014 in start (libdyld.dylib:x86_64+0x1014) previously allocated by thread T0 here: #0 0x10baf1e27 in wrap_calloc (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x56e27) #1 0x10b906df4 in sock_alloc_endpoint (libfabric.1.dylib:x86_64+0xbcdf4) #2 0x10b8f7fdb in sock_msg_ep (libfabric.1.dylib:x86_64+0xadfdb) #3 0x10b7f7c93 in fi_endpoint fi_endpoint.h:164 #4 0x10b7f5e40 in server_connect shared_ctx.c:471 #5 0x10b7f49ba in run shared_ctx.c:573 #6 0x10b7f411b in main shared_ctx.c:647 #7 0x7fff7dfbd014 in start (libdyld.dylib:x86_64+0x1014) Thread T4 created by T0 here: #0 0x10bae999d in wrap_pthread_create (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x4e99d) #1 0x10b925f9b in sock_conn_start_listener_thread (libfabric.1.dylib:x86_64+0xdbf9b) #2 0x10b8e7eb2 in sock_domain (libfabric.1.dylib:x86_64+0x9deb2) #3 0x10b7f87d3 in fi_domain fi_domain.h:306 #4 0x10b7f5c9f in server_connect shared_ctx.c:460 #5 0x10b7f49ba in run shared_ctx.c:573 #6 0x10b7f411b in main shared_ctx.c:647 #7 0x7fff7dfbd014 in start (libdyld.dylib:x86_64+0x1014) The issue 
shows up more frequently on OS X, which emulates epoll. However, I believe the problem could occur on any platform. In sock_ep_close, we remove the socket from the epoll fd, then free the endpoint. However, if the listener thread has received an event on the socket, but has not yet started processing it, then a race can occur. The listener thread could have returned from ofi_epoll_wait, but suspended trying to acquire the signal_lock. The signal_lock is acquired from sock_ep_close, where ofi_epoll_del is called, then released. The endpoint is then freed. The listener thread can now acquire the signal_lock, where it will attempt to access the freed endpoint data. To avoid the race, we add a change boolean to the listener. That boolean is only changed while holding the signal_lock. When a socket is removed from the epollfd, we mark the listener state as 'changed'. The listener thread checks the changed state prior to processing any events. If set, it clears the state, and calls ofi_epoll_wait again to get a new set of events to process. Note that this works for epoll set to level-triggered (poll semantics). Sockets that reported events will report those same events when wait is called a second time. Sockets which were removed from the epoll set would have their events removed, as they are no longer being monitored. This fix is applied both to the listener thread and cm thread. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
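
A minimal sketch of the 'changed' flag idea, assuming a pthread mutex in place of the signal_lock and leaving out the actual epoll calls. The names `listener`, `listener_mark_changed`, `listener_process`, and `count_event` are hypothetical and not taken from the sockets provider.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct listener {
    pthread_mutex_t signal_lock;
    bool changed;    /* only read or written while holding signal_lock */
};

typedef void (*event_fn)(void *arg);

/* Closing side: called where a socket is removed from the poll set.
 * The real code would also perform the epoll delete under this lock. */
static void listener_mark_changed(struct listener *l)
{
    pthread_mutex_lock(&l->signal_lock);
    l->changed = true;
    pthread_mutex_unlock(&l->signal_lock);
}

/* Listener side: called after the wait returned a batch of events.
 * Returns false if the batch may be stale, in which case the caller
 * should wait again to get a fresh set of events. */
static bool listener_process(struct listener *l, event_fn fn, void *arg)
{
    bool stale;

    pthread_mutex_lock(&l->signal_lock);
    stale = l->changed;
    if (stale)
        l->changed = false;   /* drop this batch */
    else
        fn(arg);              /* events still refer to monitored sockets */
    pthread_mutex_unlock(&l->signal_lock);
    return !stale;
}

/* Example event callback used to observe whether processing happened. */
static void count_event(void *arg)
{
    ++*(int *) arg;
}
```

Dropping a batch is only safe because, with level-triggered semantics, sockets that are still monitored report their pending events again on the next wait.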

ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff4c61e7e0 at pc 0x14f2cb7ae0b9 bp 0x7fff4c61e650 sp 0x7fff4c61ddd8 WRITE of size 17 at 0x7fff4c61e7e0 thread T0 #0 0x14f2cb7ae0b8 (/lib64/libasan.so.5+0xb40b8) #1 0x14f2cb7aedd2 in vsscanf (/lib64/libasan.so.5+0xb4dd2) #2 0x14f2cb7aeede in __interceptor_sscanf (/lib64/libasan.so.5+0xb4ede) #3 0x14f2cb230766 in ofi_addr_format src/common.c:401 #4 0x14f2cb233238 in ofi_str_toaddr src/common.c:780 #5 0x14f2cb314332 in vrb_handle_ib_ud_addr prov/verbs/src/verbs_info.c:1670 #6 0x14f2cb314332 in vrb_get_match_infos prov/verbs/src/verbs_info.c:1787 #7 0x14f2cb314332 in vrb_getinfo prov/verbs/src/verbs_info.c:1841 #8 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010 #9 0x14f2cb25fcc0 in ofi_get_core_info prov/util/src/util_attr.c:298 #10 0x14f2cb269b20 in ofix_getinfo prov/util/src/util_attr.c:321 #11 0x14f2cb3e29fd in rxd_getinfo prov/rxd/src/rxd_init.c:122 #12 0x14f2cb21fc28 in fi_getinfo_ src/fabric.c:1010 #13 0x407150 in ft_getinfo common/shared.c:794 #14 0x414917 in ft_init_fabric common/shared.c:1042 #15 0x402f40 in run functional/bw.c:155 #16 0x402f40 in main functional/bw.c:252 #17 0x14f2ca1b28e2 in __libc_start_main (/lib64/libc.so.6+0x238e2) #18 0x401d1d in _start (/root/libfabric/fabtests/functional/fi_bw+0x401d1d) Address 0x7fff4c61e7e0 is located in stack of thread T0 at offset 48 in frame #0 0x14f2cb2306f3 in ofi_addr_format src/common.c:397 This frame has 1 object(s): [32, 48) 'fmt' <== Memory access at offset 48 overflows this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions *are* supported) SUMMARY: AddressSanitizer: stack-buffer-overflow (/lib64/libasan.so.5+0xb40b8) Shadow bytes around the buggy address: 0x1000698bbca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000698bbcb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000698bbcc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
0x1000698bbcd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000698bbce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x1000698bbcf0: 00 00 00 00 00 00 f1 f1 f1 f1 00 00[f2]f2 f3 f3 0x1000698bbd00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 0x1000698bbd10: f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2 0x1000698bbd20: f2 f2 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2 0x1000698bbd30: f2 f2 00 00 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00 0x1000698bbd40: 00 00 00 06 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Fixes: 5d31276 ("common: Redo address string conversions") Signed-off-by: Honggang Li <honli@redhat.com>
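
The report shows a 17-byte write into the 16-byte 'fmt' buffer through sscanf. One general way to avoid this class of overflow is to bound the conversion width to the buffer size minus one. The sketch below is an illustration under that assumption, not the change made in src/common.c; `parse_addr_scheme` and `FMT_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FMT_LEN 16

/* Copies the text before "://" into fmt.
 * Returns 0 on success, -1 if the prefix is missing or does not fit. */
static int parse_addr_scheme(const char *str, char fmt[FMT_LEN])
{
    int consumed = 0;

    /* %15[^:] stores at most 15 characters plus the terminating NUL,
     * so it can never write past the 16-byte buffer. %n is reached
     * only if the literal "://" matched right after the prefix. */
    if (sscanf(str, "%15[^:]://%n", fmt, &consumed) != 1 || consumed == 0)
        return -1;
    return 0;
}
```

An over-long prefix is rejected rather than silently truncated, because the literal "://" fails to match after the first 15 characters.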

If a posted receive matches with a saved receive, we may need to increment the rx counter. Set the rx counter increment callback to match that of the posted receive. This fixes an assert in xnet_cntr_inc() accessing a NULL cntr_inc function pointer. Program received signal SIGABRT, Aborted. 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #0 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #1 0x0000155552d37db5 in abort () from /lib64/libc.so.6 #2 0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6 #3 0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6 #4 0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347 #5 0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354 #6 0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153 #7 0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188 #8 0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445 #9 0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558 #10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91 #11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212 Signed-off-by: Sean Hefty <sean.hefty@intel.com>
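
The callback handoff described above can be sketched as follows. The structure and function names (`xfer_entry`, `rx_cntr_inc`, `complete_entry`, `recv_saved`) are simplified, hypothetical stand-ins and not the xnet definitions; the sketch only shows the saved entry inheriting the counter callback from the posted receive before it is completed.

```c
#include <assert.h>
#include <stddef.h>

struct xfer_entry;
typedef void (*cntr_inc_fn)(struct xfer_entry *entry);

struct xfer_entry {
    cntr_inc_fn cntr_inc;    /* how to bump the counter for this entry */
    int *rx_counter;         /* stand-in for the endpoint's rx counter */
};

static void rx_cntr_inc(struct xfer_entry *entry)
{
    ++*entry->rx_counter;
}

/* Completion path: asserts if no increment callback was set, which is
 * the failure seen in the backtrace. */
static void complete_entry(struct xfer_entry *entry)
{
    assert(entry->cntr_inc != NULL);
    entry->cntr_inc(entry);
}

/* A posted receive matched a previously saved one: copy the counter
 * settings from the posted entry before completing the saved entry. */
static void recv_saved(struct xfer_entry *saved,
                       const struct xfer_entry *posted)
{
    saved->cntr_inc = posted->cntr_inc;
    saved->rx_counter = posted->rx_counter;
    complete_entry(saved);
}
```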

LEX-3997: zero out MR key on fi_mr_close() While running some extensive RCCL tests, the test crashed one of the systems. Debugging with GDB showed that it was a double free in rbtErase() that caused the crash. The backtrace looks as follows: .#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 .#1 0x00007f458280c859 in __GI_abort () at abort.c:79 .#2 0x00007f458287726e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f45829a1298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155 .#3 0x00007f458287f2fc in malloc_printerr (str=str@entry=0x7f45829a3628 "double free or corruption (fasttop)") at malloc.c:5347 .#4 0x00007f4582880c65 in _int_free (av=0x7f4374000020, p=0x7f437409b1f0, have_lock=0) at malloc.c:4266 .#5 0x00007f4582719e78 in rbtErase (h=<optimized out>, i=<optimized out>) at /srcbuilddir/src/rbtree.c:353 .#6 0x00007f458274c0c3 in lpp_mr_map_remove (lpp_mrp=0x7f437409b130) at /srcbuilddir/prov/lpp/src/lpp_mr.c:148 .#7 lpp_mr_close_internal (lpp_mrp=0x7f437409b13… Approved-by: Eric Pilmore

redo patch for data (transfer) operations

140bd29

shefty added a commit that referenced this pull request Sep 2, 2014

Merge pull request #4 from jeffhammond/master

e532a4a

redo patch for data (transfer) operations

shefty merged commit e532a4a into ofiwg:master Sep 2, 2014

shefty mentioned this pull request Nov 5, 2014

Including usnic in build, but without using it results in crash #270

Closed

shefty mentioned this pull request Feb 26, 2015

sockets provider occasionally hangs #701

Closed

shefty mentioned this pull request Mar 6, 2015

prov/sockets: fi_cmatose hangs #725

Closed

shefty pushed a commit that referenced this pull request Mar 17, 2015

Merge pull request #4 from jsquyres/bturrubiates-topic/fi-ops-fix

0cfb34c

usdf_mem.c: add fi_enosys.h

shefty mentioned this pull request Sep 26, 2015

crash in sockets provider during finalize of fi_rdm_multi_recv #1309

Closed

shefty mentioned this pull request Oct 7, 2015

Crash in verbs provider using FI_INJECT #1349

Closed

tenbrugg mentioned this pull request Jun 27, 2016

running SNAP on 1k ranks with OpenMPI causes seg fault #2162

Closed

bturrubiates mentioned this pull request Aug 16, 2016

sockets: Size left hint behavior #2271

Closed

tonyzinger mentioned this pull request Jan 26, 2017

prov/socket segfaults in sock_ep_connect() when it tries to dereference dest_addr #2676

Closed

liuxuezhao mentioned this pull request May 26, 2017

send msg to a dead process got "-FI_EAGAIN" for socket provider #3007

Closed

arn314 mentioned this pull request Sep 13, 2017

Shared context test crashes on psm2 when it's built as a DL provider #3282

Closed

This was referenced Sep 29, 2017

prov/verbs - Assertion `buf_region->num_used == 0' failed. #3351

Closed

verbs+rxm - segfault in fi_ibv_wc_2_wce #3355

Closed

j-xiong mentioned this pull request Dec 12, 2017

prov/psm2: Fix a deadlock in connection cleanup handler #3613

Merged

swelch mentioned this pull request Aug 28, 2020

prov/verbs: account for off-by-one credit initialization #6212

Merged

Honggang-LI mentioned this pull request Dec 17, 2020

src/common.c: fix a stack-buffer-overflow issue #6466

Merged

shefty mentioned this pull request Dec 18, 2020

src/common.c: fix a stack-buffer-overflow issue #6471

Merged

frostedcmos mentioned this pull request Mar 30, 2021

DAOS: rxm crash in rxm_conn_close() on the server when client exits during rdma transfer #6665

Closed

frostedcmos mentioned this pull request Apr 20, 2021

daos: verbs;rxm - server crash when client compiled with intel-mpi #6696

Closed

frostedcmos mentioned this pull request Aug 5, 2021

DAOS: verbs;rxm - latest ofi main causes mem corruption when running at scale #6973

Closed

This was referenced Dec 8, 2021

DAOS: verbs;rxm - fi_cancel() error handling issue #7287

Closed

segfault in rxm_open_conn on master branch (NULL provider name) #7300

Closed

frostedcmos mentioned this pull request Feb 1, 2022

DAOS: tcp;ofi_rxm server segfault during failed connection attempt by the client #7417

Closed

bsbernd mentioned this pull request Feb 15, 2022

IME broken by "prov/tcp: Store a reference to the active transmit with the ep" #7443

Closed

zachdworkin mentioned this pull request Mar 5, 2022

contrib/intel/jenkins Cancel Stale Builds #7501

Merged

ghost mentioned this pull request Jun 6, 2022

prov/efa: fi_info crash in a system with mlnx but no efa defice #7805

Closed

bfaccini mentioned this pull request Jul 11, 2022

prov/verbs;ofi_rxm: rxm_handle_error():793<warn> fi_eq_readerr: err: Connection refused (111), prov_err: Unknown error -8 (-8) #7880

Closed

aingerson mentioned this pull request Oct 11, 2022

prov/psm3: race causing hangs in fi_multinode test #8090

Closed

aingerson mentioned this pull request Dec 5, 2022

fi_rdm_tagged_peek failures on occasional CI runs #8249

Closed

finjulhich mentioned this pull request May 13, 2023

prov/psm3: illegal instruction #8933

Closed

Juee14Desai mentioned this pull request Sep 15, 2023

prov/verbs: Few fabtests failing after setting FI_OFI_RXM_USE_SRX=true for verbs;ofi_rxm #9336

Closed

jordialcaraz mentioned this pull request Feb 15, 2024

prov/ofi_rxm Not working, Need core provider, skipping ofi_rxm #9820

Closed

zachdworkin mentioned this pull request Jun 25, 2024

prov/psm3: "munmap_chunk(): invalid pointer" on cleanup of fi_rdm_tagged_peek with OOB #10123

Open

iziemba mentioned this pull request Oct 22, 2024

prov/util: Change uffd stop routine to use pipe #10481

Merged

jeffhammond commented Aug 25, 2014
