@jgunthorpe

This is the first new work for the tree; it is enough to build the entire tree in one step using cmake. The build output is intended to be the same as what auto* would produce.

Now obsolete auto* and packaging files are removed.

- cm.c is a source file and should not be executable
- truescale-serdes.cmds is a script and should be executable

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
It is a mistake to not explicitly link to the libraries required.
Not linking causes the symbols to drop the symbol version which could
cause runtime problems down the road if pthreads ever goes through
another symbol version change.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
The recommended way to use this macro is at the top of the source file,
avoid globally setting it via 'gcc -D' as few source files actually
need it. In this tree we only need it in 19 out of 101 sources.

_GNU_SOURCE changes the behaviour of a few select calls away from the C99
standard and should generally be minimized.

Acked-by: Sean Hefty <sean.hefty@intel.com> (ibcm,rdmacm)
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
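As a rough illustration of the per-file approach described in the _GNU_SOURCE commit message above, the sketch below defines the macro at the top of a single source file, before any system header, and then uses one GNU extension. The function name ex_format_device_name is hypothetical and not part of this tree; the example assumes glibc, where the asprintf() man page lists _GNU_SOURCE as its feature test macro.

```c
/* Define the feature-test macro first, and only in files that need it.
 * It must appear before any system header is included. */
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: builds "<prefix><index>" in a freshly allocated
 * string using asprintf(), a GNU extension. Returns NULL on failure;
 * the caller owns the result and must free() it. */
char *ex_format_device_name(const char *prefix, int index)
{
	char *name = NULL;

	assert(prefix != NULL);
	if (asprintf(&name, "%s%d", prefix, index) < 0)
		return NULL;
	return name;
}
```

Source files that do not call any GNU extension simply omit the #define and keep the standard behaviour.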
The standard is to use the same name for the library and .driver
file. The library is called hfi1verbs/ipathverbs so should the .driver,
add the verbs suffix.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
There is a typo in the Makefile.am that resulted in the version
script being ignored.

This changes the symbol names in the shlib, eg
    50: 0000000000001620   530 FUNC    GLOBAL DEFAULT   11 ib_cm_open_device
becomes
    46: 00000000000015f0   530 FUNC    GLOBAL DEFAULT   12 ib_cm_open_device@@IBCM_1.0

Binaries linked to the old library will continue to work with the new one.

Binaries linked to the new library will print a dynamic linker warning when
using the old library:

  ./a.out: /usr/lib/x86_64-linux-gnu/libibcm.so.1: no version information available (required by ./a.out)

But apparently continue to work.

Drepper (https://www.akkadia.org/drepper/symbol-versioning) seems to indicate
that going forward if we do decide to rev a symbol version then resolution for
binaries linked to the old library will continue to prefer the IBCM_1.0
symbol.  This makes sense as it is the basic mechanism used to introduce
symbol versions in the first place.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
auto* drift has rendered it unbuildable on modern distros.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Since librspreload is a LD_PRELOAD library it should only export
symbols it intends to override. The following internal symbols were
leaking out:

 getenv_options
 idm_clear
 idm_set
 idx_insert
 idx_remove
 idx_replace
 set_rsocket_options

The simplest way to fix this is with a map file.

Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
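The commit above uses a linker map file to stop internal symbols from leaking. A different, source-level technique that has a similar effect is the GCC/Clang visibility attribute, sketched below. This is not what the commit does; the macro and function names are hypothetical, and the attribute is assumed to be supported by the compiler in use.

```c
#include <stddef.h>

/* Assumption: a GCC- or Clang-compatible compiler. */
#define EX_HIDDEN __attribute__((visibility("hidden")))
#define EX_EXPORT __attribute__((visibility("default")))

/* Internal helper: usable from other objects of the same shared
 * library, but not placed in its dynamic symbol table. */
EX_HIDDEN size_t ex_count_set_bits(unsigned int v)
{
	size_t n = 0;

	while (v) {
		n += v & 1u;
		v >>= 1;
	}
	return n;
}

/* Intended entry point: stays visible to users of the library. */
EX_EXPORT int ex_has_even_parity(unsigned int v)
{
	return ex_count_set_bits(v) % 2 == 0;
}
```

A map file keeps the export list in one place and needs no source changes, which is presumably why it is described as the simplest fix here.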
@jgunthorpe jgunthorpe force-pushed the stage1 branch 3 times, most recently from 17622c9 to c99f090 on September 20, 2016 22:39
This replaces the various autoconf/automake/libtool based schemes
with a unified cmake based scheme.

At this commit both schemes exist in the tree and can be run concurrently.
Except for the differences noted below they produce identical
'make install' and identical input to cpp. This commit is intended
to be nearly 'no change' in terms of building.

An analysis of the post-install result shows the following differences:
 - cmake makes shlib symlinks libX.so -> libX.so.1 -> libX.so.1.0.0, while
   libtool does libX.so -> libX.so.1.0.0
 - librspreload's non-link name is lib/rsocket/librspreload.so (not .so.1)
   This correctly reflects the fact it is a soname-less LD_PRELOAD library.
   Symlinks are maintained for the other two incorrect names.
 - No static version of librspreload is produced. This library is only
   useful for LD_PRELOAD and cannot be statically linked to.
 - The provider shared library plugins and the LD_PRELOAD library
   have no SONAME. This is standard for plugin libraries.
 - The plugin shared libraries are not marked executable
 - -std=gnu99 is turned on globally
 - All shlibs have correct shared library dependencies (--as-needed and
   --no-undefined are turned on).
   Several binaries drop their pthreads dependency as they don't use it.
 - NDEBUG is controlled globally.
   Previously only libcxgb4 explicitly defined it.
   Distros force this to be set during package builds, so this is no
   change.
 - libtool *.la files are produced by cmake since libtool is not used
   and have a few minor differences:
   - Providers do not list the 'libX.so' bogus name, the .so is called
     libX-rdmav2.so
   - version information (current/age/revision) is bogus
   - The .la files are not marked executable

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
cmake 2.8.12 is also from 2013, but .12 is broadly compatible with
versions up to 3.6. However, there were significant changes between .11 and .12,
such that .11 doesn't trivially work.

For some reason Centos 7 only includes 2.8.11 (Centos 6 includes .12). Even
though cmake 3.5.x is available from EPEL, make it easy for everyone and patch
in .11 support; this can be reverted someday.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
jgunthorpe and others added 7 commits September 21, 2016 14:56
This is in the same format as the Kernel's version of this file
and reflects who is responsible for each portion of the tree.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
The unified cmake version provides the same functionality now.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
These are either empty, an old copy of 'git log' or some very
old logs.

New stuff uses git, the changelog is in git, no need to keep these.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
These no longer apply to the combined build.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Building these is no longer supported.

Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
These were all almost the same and add no real value over the top-level
README.md file.  The only lost information is the exact names of supported
cards for some providers, but that already is information duplicated from
the kernel drivers and bound to get out of sync anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
No useful information left in it for the modern-day package.

Signed-off-by: Christoph Hellwig <hch@lst.de>
@jgunthorpe jgunthorpe merged commit 35929d4 into linux-rdma:master Sep 21, 2016
@jgunthorpe jgunthorpe deleted the stage1 branch September 21, 2016 21:24
@yishaih yishaih mentioned this pull request May 9, 2017
Hakon-Bugge added a commit to Hakon-Bugge/rdma-core that referenced this pull request Nov 22, 2018
In acm_ep_insert_addr() an attempt to zero out the tmp address buffer
is performed. But the subsequent memcpy(), which uses the supplied
addr_len as argument, copies the whole shebang. This implies that the
provider is called with an address with arbitrary data padded.

This leads to a false mis-compare in the default provider's binary
tree lookup. Here is the stack trace and dump of the address buffer
from gdb (edited for better brevity):

(gdb) where
 #0  acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289
 #1  tfind () from /lib64/libc.so.6
 #2  acmp_get_dest () at prov/acmp/src/acmp.c:336
 #3  acmp_acquire_dest () at prov/acmp/src/acmp.c:379
 #4  acmp_add_addr () at prov/acmp/src/acmp.c:2385
 #5  acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044
 #6  acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325
 #7  acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326
 #8  acm_ipnl_handler () at src/acm.c:1453
 #9  acm_server () at src/acm.c:1884
 #10 main () at src/acm.c:3245

(gdb) x/20u dest1
0x18c46a8:  192 168     200     200     155     127     0       0
0x18c46b0:  95  184     77      105     155     127     0       0
0x18c46b8:  0   0       64      49
(gdb) x/20u dest2
0x18c5d70:  192 168     200     200     0       0       0       0
0x18c5d78:  0   0       0       0       0       0       0       0
0x18c5d80:  0   0       0       0

The fix is to use the real length of the address in the memcpy() in
acm_ep_insert_addr(). This is derived from the addr_type. Hence, we
can re-factor and remove the addr_len from the call stack.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Mark Haywood <mark.haywood@oracle.com>
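The fix described in the commit message above (zero the buffer, then copy only as many bytes as the address type implies) could look roughly like the sketch below. All identifiers are hypothetical stand-ins rather than the real ibacm definitions. The buffer size of 64 and the value 2 for the IPv4 type come from the gdb output quoted in this thread; the IPv6 value is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins, not the real ibacm definitions. */
enum { EX_MAX_ADDRESS = 64 };
enum ex_addr_type {
	EX_ADDR_IP = 2,		/* IPv4, 4 significant bytes */
	EX_ADDR_IP6 = 3,	/* IPv6, 16 significant bytes (assumed value) */
};

/* Number of meaningful bytes for a given type; 0 if unknown. */
static size_t ex_addr_len(enum ex_addr_type type)
{
	switch (type) {
	case EX_ADDR_IP:
		return 4;
	case EX_ADDR_IP6:
		return 16;
	default:
		return 0;
	}
}

/* Zero the whole destination, then copy only the meaningful bytes, so
 * no stale data from src ends up in the padding.
 * Returns 0 on success, -1 for an unknown type. */
int ex_copy_addr(uint8_t dst[EX_MAX_ADDRESS], const uint8_t *src,
		 enum ex_addr_type type)
{
	size_t len = ex_addr_len(type);

	if (len == 0)
		return -1;
	memset(dst, 0, EX_MAX_ADDRESS);
	memcpy(dst, src, len);
	return 0;
}
```

Because the length follows from the type, a caller-supplied length parameter becomes redundant, which matches the refactoring the commit message mentions.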
Hakon-Bugge added a commit to Hakon-Bugge/rdma-core that referenced this pull request Nov 23, 2018
In acm_addr_lookup(), an address compare is performed. It compares
ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the
actual address length, as given by addr_type, may contain arbitrary
data.

For example, acm_svr_select_src() copies only the valid bytes of an
IPv4 or IPv6 address. The same holds for acm_nl_to_addr_data().

Here is an example from debugging with gdb, slightly edited for brevity:

(gdb) where
 #0  acm_addr_lookup () at src/acm.c:419
 #1  acm_get_port_ep_address () at src/acm.c:829
 #2  acm_get_ep_address () at src/acm.c:848
 #3  acm_rm_ep_ip () at src/acm.c:1322
 #4  acm_ipnl_handler () at src/acm.c:1452
 #5  acm_server () at src/acm.c:1867
 #6  main () at src/acm.c:3228

(gdb) x/16u ep->addr_info[i].addr.info.addr
0x1da66a8:  192 168     200     200     0       0       0       0
0x1da66b0:  0   0       0       0       0       0       0       0

(gdb) x/16u addr
0x7ffd165ca9f8: 192     168     200     200     62      127     0       0
0x7ffd165caa00: 95      8       14      129     62      127     0       0

(gdb) p addr_type
$5 = 2 '\002'

addr_type is here 2, which is ACM_ADDRESS_IP. We see that the IPv4
addresses are equal, but the compare detects different addresses,
because the full ACM_MAX_ADDRESS is used.

By introducing a helper function comparing names or addresses, the
actual length is used for addresses, and the functions
acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

---

v1 -> v2: Fixed Travis issue
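A helper of the kind described in the commit message above might look like the following sketch: addresses are compared over the length their type implies, and names as bounded strings. The identifiers are hypothetical and the type values other than 2 (IPv4, as shown in the gdb session) are assumptions, not the real ibacm definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins, not the real ibacm definitions. */
enum { EX_CMP_MAX_ADDRESS = 64 };
enum ex_cmp_type {
	EX_CMP_NAME = 1,	/* assumed value */
	EX_CMP_IP = 2,		/* value taken from the gdb output */
	EX_CMP_IP6 = 3,		/* assumed value */
};

/* Compare two address buffers using only the bytes that are meaningful
 * for the given type, ignoring whatever sits in the padding. */
bool ex_addr_equal(enum ex_cmp_type type, const uint8_t *a, const uint8_t *b)
{
	switch (type) {
	case EX_CMP_NAME:
		/* Names are treated as strings bounded by the buffer size. */
		return strncmp((const char *)a, (const char *)b,
			       EX_CMP_MAX_ADDRESS) == 0;
	case EX_CMP_IP:
		return memcmp(a, b, 4) == 0;
	case EX_CMP_IP6:
		return memcmp(a, b, 16) == 0;
	default:
		return false;
	}
}
```

With the two buffers from the gdb dump, a full-length memcmp() reports a difference, while a type-aware comparison of this shape treats the two IPv4 addresses as equal.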
Hakon-Bugge added a commit to Hakon-Bugge/rdma-core that referenced this pull request Nov 23, 2018
rosenbaumalex pushed a commit to rosenbaumalex/rdma-core that referenced this pull request Jan 7, 2019
rosenbaumalex pushed a commit to rosenbaumalex/rdma-core that referenced this pull request Jan 7, 2019
jgunthorpe referenced this pull request in jgunthorpe/rdma-plumbing Feb 19, 2019
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request Mar 27, 2019
MLX5_INLINE_SCATTER_{32,64} is an optimization inside the MLX5 driver,
that is triggered by the presence of flag MLX5_QP_FLAG_SCATTER_CQE.

It's best to understand first how SHPD works to understand the
underlying issue:

With Oracle's SHPD-hack, a memory region registered by process#0 is
shared by other processes (i.e. mapped into their address space).

That mapping is not guaranteed to land on the same virtual address in
process#1... and even "encouraged" to land on a different virtual
address if sysctl kernel.randomize_va_space is > 0.

What the client process does is calculate the delta between the address
that process#0 created the shared segment at, and the address that said
shared memory segment landed on in its own address space. It then adds
that delta to the virtual addresses used in verbs such as
"ibv_post_recv".

Let's say process #0 has a shared memory segment at virtual address
0x2000 and process #1 has the same shared memory segment at 0x1000.

So process #1 would now "fake" the address by adding "0x1000" to each
request. This assumes that only the HCA uses said addresses, and that the
memory-translation-table inside the HCA makes it land on the
correct address, as it doesn't really care which process is submitting
this request (it only checks for the PD to be correct... hence
shared-pd).
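
The address arithmetic described above can be sketched as follows. This is only an illustration of the delta calculation, with hypothetical names; it is not code from the SHPD patches or the MLX5 driver.

```c
#include <stdint.h>

/* Translate an address inside the local mapping of the shared segment
 * into the address the owning process would use for the same byte.
 * owner_base: where process #0 mapped the segment.
 * local_base: where the calling process mapped it.
 * Unsigned arithmetic wraps, so this also works when owner_base is
 * numerically below local_base. */
uintptr_t ex_to_owner_addr(uintptr_t local_addr, uintptr_t owner_base,
			   uintptr_t local_base)
{
	uintptr_t delta = owner_base - local_base;

	return local_addr + delta;
}
```

With the numbers from the example (owner at 0x2000, client at 0x1000) the delta is 0x1000, so a client-side address of 0x1010 is posted as 0x2010. The translated value is only meaningful to the HCA; dereferencing it in the client process would touch unrelated memory.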

That works for as long as ONLY the HCA is using the address. If process
case (such as in this bug report), or in the worst-case it'll overwrite
some memory it doesn't really own.

So we simply turn off the MLX5_INLINE_SCATTER_{32,64} feature for QPs
created on behalf of PDs that are considered "inherited", e.g.  as
a result of calling "ibv_share_pd".

Orabug: 27949058

Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Aron Silverton <aron.silverton@oracle.com>
aron-silverton added a commit to oracle/rdma-core that referenced this pull request Mar 27, 2019
Load the 'resilient_rdmaip' kernel module using the upstream kernel-boot
hot-plug framework.

NB: In the past, changes were made to the network service
(/etc/rc.d/init.d/network) to make it aware of the RDMAIP module. Those
changes include:

1. Disabling active/active bonding when the network service is stopped
   and enabling active/active bonding when the network service is
   started. In this context, stopping the network service means running
   'ifdown' on all interfaces so it is not clear why RDMAIP would still
   be attempting to do anything with the interfaces when they are down.
   RDMAIP should use netlink or other facilities to be aware of the
   interface state.

2. Triggering RDMAIP to run its bonding code when interfaces are up.
   This is apparently done to avoid some sort of delay in how long it
   might otherwise take RDMAIP to bond the interfaces. Again, this
   should be done automatically by having RDMAIP use netlink to be aware
   of interface state changes.

At this time, these changes are not included.

For #1: The network service used previously is no longer being used.
Instead, network interface configuration is done using systemd-networkd.
The systemd-networkd service has no concept of starting and stopping the
network. It is a network interface configuration service and stopping it
does not change the state of any of the interfaces it manages. Under
systemd, it is not possible to stop a service or socket and have all of
the network interfaces shut down in the way that would happen in the past
with "service network stop". If it is necessary to start and stop RDMAIP
then this must be done based on the state of the interfaces that RDMAIP
is currently bonding and not on a systemd-wide basis.

For #2: Again, there is no single service that enables all interfaces or
disables all interfaces. Furthermore, the concept of the network being
"up" in the current version of systemd is that the network is "up" as
soon as a single non-loopback interface has a routable IP address and
network connectivity. The network online target does NOT indicate that
all interfaces are active. It is possible, in some situations, to use
a 'systemd-networkd-wait-online.service' to have a service unit wait on
a particular interface but this requires prior knowledge of which
interface(s) to wait on and hardcoding this information into a service
unit. RDMAIP should be made aware of the state of the interfaces it is
bonding and run its actions when both initially reach link up.

Orabug: 28782057

Signed-off-by: Aron Silverton <aron.silverton@oracle.com>
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request Mar 27, 2019
In acm_addr_lookup(), an address compare is performed. It compares
ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the
actual address length, as given by addr_type, may contain arbitrary
data.

For example, acm_svr_select_src() copies only the valid bytes of an
IPv4 or IPv6 address. The same holds for acm_nl_to_addr_data().

Here is an example from debugging with gdb, slightly edited for brevity:

(gdb) where
 #0  acm_addr_lookup () at src/acm.c:419
 #1  acm_get_port_ep_address () at src/acm.c:829
 #2  acm_get_ep_address () at src/acm.c:848
 #3  acm_rm_ep_ip () at src/acm.c:1322
 #4  acm_ipnl_handler () at src/acm.c:1452
 #5  acm_server () at src/acm.c:1867
 #6  main () at src/acm.c:3228

(gdb) x/16u ep->addr_info[i].addr.info.addr
0x1da66a8:  192 168     200     200     0       0       0       0
0x1da66b0:  0   0       0       0       0       0       0       0

(gdb) x/16u addr
0x7ffd165ca9f8: 192     168     200     200     62      127     0       0
0x7ffd165caa00: 95      8       14      129     62      127     0       0

(gdb) p addr_type
$5 = 2 '\002'

addr_type is here 2, which is ACM_ADDRESS_IP. We see that the IPv4
addresses are equal, but the compare detects different addresses,
because the full ACM_MAX_ADDRESS is used.

By introducing a helper function comparing names or addresses, the
actual length is used for addresses, and the functions
acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

---

v1 -> v2: Fixed Travis issue

Orabug: 29037253

(cherry picked from commit c562033)
cherry-pick-repo=linux-rdma/rdma-core.git
unmodified-from-upstream: c562033

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Signed-off-by: Aron Silverton <aron.silverton@oracle.com>
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request Mar 27, 2019
In acm_ep_insert_addr() an attempt to zero out the tmp address buffer
is performed. But the subsequent memcpy(), which uses the supplied
addr_len as argument, copies the whole shebang. This implies that the
provider is called with an address with arbitrary data padded.

This leads to a false mis-compare in the default provider's binary
tree lookup. Here is the stack trace and dump of the address buffer
from gdb (edited for better brevity):

(gdb) where
 #0  acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289
 #1  tfind () from /lib64/libc.so.6
 #2  acmp_get_dest () at prov/acmp/src/acmp.c:336
 #3  acmp_acquire_dest () at prov/acmp/src/acmp.c:379
 #4  acmp_add_addr () at prov/acmp/src/acmp.c:2385
 #5  acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044
 #6  acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325
 #7  acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326
 #8  acm_ipnl_handler () at src/acm.c:1453
 #9  acm_server () at src/acm.c:1884
 #10 main () at src/acm.c:3245

(gdb) x/20u dest1
0x18c46a8:  192 168     200     200     155     127     0       0
0x18c46b0:  95  184     77      105     155     127     0       0
0x18c46b8:  0   0       64      49
(gdb) x/20u dest2
0x18c5d70:  192 168     200     200     0       0       0       0
0x18c5d78:  0   0       0       0       0       0       0       0
0x18c5d80:  0   0       0       0

The fix is to use the real length of the address in the memcpy() in
acm_ep_insert_addr(). This is derived from the addr_type. Hence, we
can re-factor and remove the addr_len from the call stack.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Mark Haywood <mark.haywood@oracle.com>

Orabug: 29037270

(cherry picked from commit c73f5d7)
cherry-pick-repo=linux-rdma/rdma-core.git
unmodified-from-upstream: c73f5d7

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Signed-off-by: Aron Silverton <aron.silverton@oracle.com>
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request Apr 9, 2019
In acm_addr_lookup(), an address compare is performed. It compares
ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the
actual address length, as given by addr_type, may contain arbitrary
data.

For example, acm_svr_select_src() copies only the valid bytes of an
IPv4 or IPv6 address. The same holds for acm_nl_to_addr_data().

Here is an example from debugging with gdb, slightly edited for brevity:

(gdb) where
 #0  acm_addr_lookup () at src/acm.c:419
 #1  acm_get_port_ep_address () at src/acm.c:829
 #2  acm_get_ep_address () at src/acm.c:848
 #3  acm_rm_ep_ip () at src/acm.c:1322
 #4  acm_ipnl_handler () at src/acm.c:1452
 #5  acm_server () at src/acm.c:1867
 #6  main () at src/acm.c:3228

(gdb) x/16u ep->addr_info[i].addr.info.addr
0x1da66a8:  192 168     200     200     0       0       0       0
0x1da66b0:  0   0       0       0       0       0       0       0

(gdb) x/16u addr
0x7ffd165ca9f8: 192     168     200     200     62      127     0       0
0x7ffd165caa00: 95      8       14      129     62      127     0       0

(gdb) p addr_type
$5 = 2 '\002'

addr_type is here 2, which is ACM_ADDRESS_IP. We see that the IPv4
addresses are equal, but the compare detects different addresses,
because the full ACM_MAX_ADDRESS is used.

By introducing a helper function comparing names or addresses, the
actual length is used for addresses, and the functions
acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

---

v1 -> v2: Fixed Travis issue

Orabug: 29037253

(cherry picked from commit c562033)
cherry-pick-repo=linux-rdma/rdma-core.git
unmodified-from-upstream: c562033

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Signed-off-by: Aron Silverton <aron.silverton@oracle.com>

Orabug: 29410510

Rebase from RDMA Core 19.2 -> 20.2.

(cherry picked from commit bbd44792)
cherry-pick-repo=linux-git/RDMA/rdma-core.git
unmodified-from-upstream: bbd44792

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request Apr 9, 2019
In acm_ep_insert_addr() an attempt to zero out the tmp address buffer
is performed. But the subsequent memcpy(), which uses the supplied
addr_len as argument, copies the whole shebang. This implies that the
provider is called with an address with arbitrary data padded.

This leads to a false mis-compare in the default provider's binary
tree lookup. Here is the stack trace and dump of the address buffer
from gdb (edited for better brevity):

(gdb) where
 #0  acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289
 #1  tfind () from /lib64/libc.so.6
 #2  acmp_get_dest () at prov/acmp/src/acmp.c:336
 #3  acmp_acquire_dest () at prov/acmp/src/acmp.c:379
 #4  acmp_add_addr () at prov/acmp/src/acmp.c:2385
 #5  acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044
 #6  acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325
 #7  acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326
 #8  acm_ipnl_handler () at src/acm.c:1453
 #9  acm_server () at src/acm.c:1884
 #10 main () at src/acm.c:3245

(gdb) x/20u dest1
0x18c46a8:  192 168     200     200     155     127     0       0
0x18c46b0:  95  184     77      105     155     127     0       0
0x18c46b8:  0   0       64      49
(gdb) x/20u dest2
0x18c5d70:  192 168     200     200     0       0       0       0
0x18c5d78:  0   0       0       0       0       0       0       0
0x18c5d80:  0   0       0       0

The fix is to use the real length of the address in the memcpy() in
acm_ep_insert_addr(). This is derived from the addr_type. Hence, we
can re-factor and remove the addr_len from the call stack.
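
The idea can be sketched as follows. This is an illustration under assumptions, not the patch itself: sketch_normalize_addr() and the SKETCH_* constants are hypothetical, and the lengths 4 and 16 are assumed for IPv4 and IPv6.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical constants. 64 matches the addr_len seen in the trace and
 * 2 matches the IP address type; the rest is illustrative. */
#define SKETCH_MAX_ADDRESS 64
enum { SKETCH_ADDRESS_IP = 2, SKETCH_ADDRESS_IP6 = 3 };

/*
 * Fill out[] with the address followed by zero padding. The copy length
 * is derived from addr_type, so the caller does not pass a length.
 * Returns 0 on success and -1 for an unsupported address type.
 */
int sketch_normalize_addr(uint8_t out[SKETCH_MAX_ADDRESS],
                          uint8_t addr_type, const uint8_t *addr)
{
    size_t len;

    switch (addr_type) {
    case SKETCH_ADDRESS_IP:
        len = 4;
        break;
    case SKETCH_ADDRESS_IP6:
        len = 16;
        break;
    default:
        return -1;
    }

    memset(out, 0, SKETCH_MAX_ADDRESS); /* padding is always zero */
    memcpy(out, addr, len);             /* only the valid bytes */
    return 0;
}
```

Applied to the dest1 bytes shown above, the result would match dest2: the four address bytes followed by zeros, so a lookup keyed on the whole buffer no longer sees stale data.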

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Mark Haywood <mark.haywood@oracle.com>

Orabug: 29037270

(cherry picked from commit c73f5d7)
cherry-pick-repo=linux-rdma/rdma-core.git
unmodified-from-upstream: c73f5d7

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Signed-off-by: Aron Silverton <aron.silverton@oracle.com>

Orabug: 29410510

Rebase from RDMA Core 19.2 -> 20.2.

(cherry picked from commit fc2e7b4b)
cherry-pick-repo=linux-git/RDMA/rdma-core.git
unmodified-from-upstream: fc2e7b4b

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
nmorey pushed a commit that referenced this pull request Feb 10, 2020
[ Upstream commit d04d466 ]

This patch fixes the following Coverity complaint:

CID 1490689 (#1 of 1): Resource leak (RESOURCE_LEAK)
15. leaked_storage: Variable dev going out of scope leaks the storage it points to.
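
For readers unfamiliar with this class of report, here is a generic sketch of the pattern and a common remedy. It is not the code touched by the patch: struct demo_dev, demo_dev_create() and demo_dev_destroy() are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical device object, for illustration only. */
struct demo_dev {
    char *name;
};

/*
 * Every failure path after the allocation goes through the err label.
 * A bare "return NULL" on those paths would let dev go out of scope
 * while still owning heap storage, which is what the checker reports.
 */
struct demo_dev *demo_dev_create(const char *name)
{
    struct demo_dev *dev;
    size_t len;

    dev = calloc(1, sizeof(*dev));
    if (!dev)
        return NULL;

    if (!name || !*name)
        goto err;

    len = strlen(name) + 1;
    dev->name = malloc(len);
    if (!dev->name)
        goto err;
    memcpy(dev->name, name, len);

    return dev;

err:
    free(dev);
    return NULL;
}

/* Release a device created by demo_dev_create(). NULL is accepted. */
void demo_dev_destroy(struct demo_dev *dev)
{
    if (!dev)
        return;
    free(dev->name);
    free(dev);
}
```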

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request Nov 16, 2020
In acm_addr_lookup(), an address compare is performed. It compares
ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the
actual address length, as given by addr_type, may contain arbitrary
data.

For example, acm_svr_select_src() copies only the valid bytes of an
IPv4 or IPv6 address. The same holds for acm_nl_to_addr_data().

Here is an example from debugging with gdb, slightly edited for brevity:

(gdb) where
 #0  acm_addr_lookup () at src/acm.c:419
 #1  acm_get_port_ep_address () at src/acm.c:829
 #2  acm_get_ep_address () at src/acm.c:848
 #3  acm_rm_ep_ip () at src/acm.c:1322
 #4  acm_ipnl_handler () at src/acm.c:1452
 #5  acm_server () at src/acm.c:1867
 #6  main () at src/acm.c:3228

(gdb) x/16u ep->addr_info[i].addr.info.addr
0x1da66a8:  192 168     200     200     0       0       0       0
0x1da66b0:  0   0       0       0       0       0       0       0

(gdb) x/16u addr
0x7ffd165ca9f8: 192     168     200     200     62      127     0       0
0x7ffd165caa00: 95      8       14      129     62      127     0       0

(gdb) p addr_type
$5 = 2 '\002'

Here addr_type is 2, which is ACM_ADDRESS_IP. The IPv4 addresses are
equal, but the compare reports a mismatch because it covers the full
ACM_MAX_ADDRESS bytes, including the arbitrary data after the first four.

By introducing a helper function comparing names or addresses, the
actual length is used for addresses, and the functions
acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

---

v1 -> v2: Fixed Travis issue

Orabug: 29037253

(cherry picked from commit c562033)
cherry-pick-repo=github.com/linux-rdma/rdma-core.git
unmodified-from-upstream: c562033

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Acked-by: Aron Silverton <aron.silverton@oracle.com>

Orabug: 29410510

Rebase from RDMA Core 19.2 -> 20.2.

(cherry picked from commit 8763162)
cherry-pick-repo=linux-git.us.oracle.com/RDMA/rdma-core.git
unmodified-from-upstream: 8763162

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Acked-by: Aron Silverton <aron.silverton@oracle.com>
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request Nov 16, 2020
In acm_ep_insert_addr(), the tmp address buffer is zeroed out first.
But the subsequent memcpy() uses the supplied addr_len as its length
and copies all addr_len bytes, not just the valid address bytes. As a
result, the provider is called with an address padded with arbitrary data.

This leads to a false mis-compare in the default provider's binary
tree lookup. Here is the stack trace and dump of the address buffer
from gdb (edited for brevity):

(gdb) where
 #0  acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289
 #1  tfind () from /lib64/libc.so.6
 #2  acmp_get_dest () at prov/acmp/src/acmp.c:336
 #3  acmp_acquire_dest () at prov/acmp/src/acmp.c:379
 #4  acmp_add_addr () at prov/acmp/src/acmp.c:2385
 #5  acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044
 #6  acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325
 #7  acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326
 #8  acm_ipnl_handler () at src/acm.c:1453
 #9  acm_server () at src/acm.c:1884
 #10 main () at src/acm.c:3245

(gdb) x/20u dest1
0x18c46a8:  192 168     200     200     155     127     0       0
0x18c46b0:  95  184     77      105     155     127     0       0
0x18c46b8:  0   0       64      49
(gdb) x/20u dest2
0x18c5d70:  192 168     200     200     0       0       0       0
0x18c5d78:  0   0       0       0       0       0       0       0
0x18c5d80:  0   0       0       0

The fix is to use the real length of the address in the memcpy() in
acm_ep_insert_addr(). This is derived from the addr_type. Hence, we
can re-factor and remove the addr_len from the call stack.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Mark Haywood <mark.haywood@oracle.com>

Orabug: 29037270

(cherry picked from commit c73f5d7)
cherry-pick-repo=github.com/linux-rdma/rdma-core.git
unmodified-from-upstream: c73f5d7

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Acked-by: Aron Silverton <aron.silverton@oracle.com>

Orabug: 29410510

Rebase from RDMA Core 19.2 -> 20.2.

(cherry picked from commit 303f845)
cherry-pick-repo=linux-git.us.oracle.com/RDMA/rdma-core.git
unmodified-from-upstream: 303f845

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Acked-by: Aron Silverton <aron.silverton@oracle.com>
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request Nov 16, 2020
This patch fixes the following Coverity complaint:

CID 1490689 (#1 of 1): Resource leak (RESOURCE_LEAK)
15. leaked_storage: Variable dev going out of scope leaks the storage it points to.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Orabug: 31219526

(cherry picked from commit d04d466)
cherry-pick-repo=github.com/linux-rdma/rdma-core.git
unmodified-from-upstream: d04d466

Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
@rleon rleon mentioned this pull request Jun 8, 2021
rleon pushed a commit that referenced this pull request Nov 12, 2025
Fix a deadlock in rdma_resolve_addrinfo() when it is run in sync
mode:
 (gdb) bt
 #0  futex_wait
 #1  __GI___lll_lock_wait
 #2  0x00007ffff7dae791 in lll_mutex_lock_optimized
 #3  ___pthread_mutex_lock
 #4  0x00007ffff7f9f018 in ucma_process_addrinfo_resolved
 #5  0x00007ffff7fa1447 in rdma_get_cm_event
 #6  0x00007ffff7fa1fef in ucma_complete
 #7  0x00007ffff7fa2f9c in resolve_ai_sa
 #8  0x00007ffff7fa36ab in __rdma_resolve_addrinfo
 #9  rdma_resolve_addrinfo
 #10 0x00000000004017b6 in start_cm_client_sync
 #11 0x00000000004018ee in main

Fixes: 7b1a686 ("librdmacm: Provide interfaces to resolve IB services")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Sean Hefty <shefty@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
nmorey pushed a commit that referenced this pull request Nov 21, 2025
[ Upstream commit 7528827 ]

Fix a deadlock in rdma_resolve_addrinfo() when it is run in sync
mode:
 (gdb) bt
 #0  futex_wait
 #1  __GI___lll_lock_wait
 #2  0x00007ffff7dae791 in lll_mutex_lock_optimized
 #3  ___pthread_mutex_lock
 #4  0x00007ffff7f9f018 in ucma_process_addrinfo_resolved
 #5  0x00007ffff7fa1447 in rdma_get_cm_event
 #6  0x00007ffff7fa1fef in ucma_complete
 #7  0x00007ffff7fa2f9c in resolve_ai_sa
 #8  0x00007ffff7fa36ab in __rdma_resolve_addrinfo
 #9  rdma_resolve_addrinfo
 #10 0x00000000004017b6 in start_cm_client_sync
 #11 0x00000000004018ee in main

Fixes: 7b1a686 ("librdmacm: Provide interfaces to resolve IB services")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Sean Hefty <shefty@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Nicolas Morey <nmorey@suse.com>