Warning fixes #6
Merged
We set up the two standard CMake build types so that Release includes NDEBUG and RelWithDebInfo does not (by default CMake sets it in both). The recommendation is for packagers to use Release (by setting -DCMAKE_BUILD_TYPE=Release) and for developers to use RelWithDebInfo (the default). This also replaces the default flags for Release with the RelWithDebInfo flags (-O2 -g -DNDEBUG), which is what we consider suitable for packaging. The CMake default of -O3 is not tested. Note that all the packaging systems I looked at force NDEBUG into the CFLAGS. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
We now recommend that this source be built with valgrind memcheck.h present, so use it automatically if it is available. Users looking to remove this tiny overhead can build with -DENABLE_VALGRIND=0. Downstream packagers should ensure the build is done with valgrind headers available. NOTE: Fedora/CentOS have shipped with valgrind turned on in their packaging, so for most users this is no change. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
These were preserved as part of the cmake transition, but no distributor uses them and we don't need them internally, so time for them to go. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
We no longer recommend that static libraries be distributed; this never worked sanely for libibverbs. Use: cmake -DENABLE_STATIC=1 to restore the old behaviour. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
rsocket.7 had an errant text substitution that never worked. It is a good idea to have the man pages use the correct paths, so let us have cmake run them through substitution. Any man page ending in '.in' will be substituted automatically. Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
config.h is the only place we pass through cmake substitution, so it is the only place that can define the various filesystem paths. This patch handles the C code portions that use paths. Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
This removes hardwired paths from the documentation and broadly makes the documentation and scripts match what the C code is now doing. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
'/var/tmp' is an inappropriate place for lock files of this nature; they belong in /var/run. /var/lock does not seem suitable because this lock is not against a basic device node. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
The Debian packaging has always used this path, provide official support for this configuration so Debian does not rely on the absolute path in the .driver file, which breaks biarch. Since there is no reason for the providers to be in the system library search path (they export no symbols, and have no soname) make this the default configuration. The old behaviour can be restored by using: cmake -DVERBS_PROVIDER_DIR='' This continues to support out-of-tree drivers by searching both the provider path and the system library path if an unqualified name is given in the .driver file. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
This is the FHS defined place for non-user runnable helper programs. Debian forbids the use of /usr/libexec/ so we provide substitution support to let cmake customize this. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
This is particularly important for the three shared libraries, and we haven't been doing it right historically, perhaps due to libtool braindamage. The names of the shlibs are updated to:
libibcm 1.0.11
libibumad 3.1.11
libibverbs 1.3.11
librdmacm 1.1.11
The SONAME remains the same. The overall package release is set to 11 due to libibumad having got up to a .10 release. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Turns out this is not a mailing list. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Necessary to use the Travis CI service. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
In C99 casting a pointer to an integer should always be done via uintptr_t. When compiling on 32 bit all these sites produce warnings. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
A u64 value needs to be cast to uintptr_t before being converted back into a pointer, otherwise gcc produces a warning on a 32 bit build. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
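The two conversions described in this commit and the one before it can be sketched as follows. The helper names `ptr_to_u64` and `u64_to_ptr` are hypothetical and are not taken from this repository; the sketch only shows both directions going through `uintptr_t`.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Pointer -> 64-bit integer: convert to uintptr_t first, then widen.
 * On a 32-bit build a direct cast to uint64_t triggers a size warning. */
static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* 64-bit integer -> pointer: narrow to uintptr_t first, otherwise gcc
 * warns about a cast to pointer from an integer of a different size. */
static inline void *u64_to_ptr(uint64_t v)
{
	return (void *)(uintptr_t)v;
}
```

On a 64-bit build both casts are no-ops in terms of generated code, so the extra step only costs a few characters of source.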
It appears the original intent was to zero the remainder of the struct. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
gcc 6.1 remarks:
../ibacm/src/acme.c:1069:6: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (!ret && make_addr)
This is because query_svcs() can return without setting ret if svc_list is empty. It looks like parse() probably cannot return an empty list, so avoid the compiler warning by initializing ret to -1.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
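A minimal sketch of the pattern behind this warning, assuming a loop that is the only place the result gets assigned. The function `demo_query_svcs` and its argument types are hypothetical stand-ins, not the real query_svcs().

```c
#include <stddef.h>

/* Hypothetical stand-in: the loop body is the only place that assigns ret,
 * so without an initializer an empty list would return an unset value. */
static int demo_query_svcs(const int *svc_list, size_t count)
{
	int ret = -1;	/* start from "failure" so the empty case is well defined */
	size_t i;

	for (i = 0; i < count; i++)
		ret = svc_list[i] < 0 ? -1 : 0;	/* pretend per-service result */

	return ret;
}
```

Initializing to an error value rather than 0 means a caller never mistakes "nothing was queried" for success.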
gcc observes:
../ibacm/src/acm.c:3007:3: error: ignoring return value of 'lockf', declared with attribute warn_unused_result [-Werror=unused-result]
lockf(lock_fd, F_ULOCK, 0);
lockf locks are only held so long as the FD is open, so there is no reason to unlock it before calling close.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Otherwise a 32bit compile will print garbage for the GUID. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
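The commit title is not shown here, so the following is an assumption about the kind of fix involved: printing a 64-bit value with a format specifier that is correct on every word size. The helper `demo_format_guid` is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit GUID as hex. PRIx64 expands to the correct length
 * modifier for uint64_t on both 32-bit and 64-bit targets, whereas a
 * hard-coded "%lx" reads only 32 bits on a 32-bit build. */
static int demo_format_guid(char *buf, size_t len, uint64_t guid)
{
	return snprintf(buf, len, "0x%016" PRIx64, guid);
}
```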
The signedness of an 'enum X' is implementation-defined in the C standard; compilers are free to pick any compatible integer type. So, coercing -1 into an enum and expecting '< 0' to work is not portable, and as the warning shows, at least clang does not compile this code as intended. Instead use 0 to indicate an undefined MTU from pp_mtu_to_enum; 0 is unused in the mtu enum. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
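A sketch of the sentinel approach, assuming an enum whose valid members start at 1. The `demo_mtu` enum and `demo_mtu_to_enum` are hypothetical and only modelled on the description above; they are not the pingpong code.

```c
/* Hypothetical enum: valid values start at 1, so 0 is free to mean
 * "no such MTU" without relying on the enum being signed. */
enum demo_mtu {
	DEMO_MTU_INVALID = 0,
	DEMO_MTU_256 = 1,
	DEMO_MTU_512 = 2,
	DEMO_MTU_1024 = 3,
	DEMO_MTU_2048 = 4,
	DEMO_MTU_4096 = 5,
};

static enum demo_mtu demo_mtu_to_enum(int mtu)
{
	switch (mtu) {
	case 256:  return DEMO_MTU_256;
	case 512:  return DEMO_MTU_512;
	case 1024: return DEMO_MTU_1024;
	case 2048: return DEMO_MTU_2048;
	case 4096: return DEMO_MTU_4096;
	default:   return DEMO_MTU_INVALID;	/* not -1: the enum may be unsigned */
	}
}
```

A caller then tests the result with `if (!mtu)` instead of `if (mtu < 0)`, which behaves the same whichever underlying type the compiler picks.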
gcc remarks:
../libibverbs/src/neigh.c:339:6: warning: 'sock_fd' may be used uninitialized in this function [-Wmaybe-uninitialized]
err = try_send_to(sock_fd, buff, sizeof(buff), &addr_dst);
But this is bogus because create_socket will always return an error if it does not set psock_fd. It looks like the insane if logic is just a tish too much for gcc to handle. Since the result of create_socket is discarded anyhow, simplify the tortured logic.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
../providers/nes/nes_uverbs.c:489:5: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
if (++nesuqp->rq_tail >= nesuqp->rq_size)
^~
../providers/nes/nes_uverbs.c:491:6: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
if (entry->status == NES_CQ_BUF_OV_ERR)
^~
Presumably this has been tested as is, so I've opted to preserve the
behaviour, but I can't tell if that is right or not.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
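For readers unfamiliar with -Wmisleading-indentation, here is a minimal sketch of a behaviour-preserving fix of this kind. The struct and function are hypothetical and are not the nes provider code: the second statement was never guarded by the first `if`, so it is simply moved to the outer indentation level.

```c
/* Hypothetical ring consumer, for illustration only. */
struct demo_ring {
	unsigned int tail;
	unsigned int size;
	unsigned int overflows;
};

static void demo_consume(struct demo_ring *r, int overflowed)
{
	/* The 'if' guards only the wrap-around of tail ... */
	if (++r->tail >= r->size)
		r->tail = 0;
	/* ... so this check sits at the outer level, which is how the
	 * compiler treated it all along regardless of indentation. */
	if (overflowed)
		r->overflows++;
}
```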
convert_send_wr switches incompletely on an enum. Assume the intent was to not do any copies for other enum members and dummy them in. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
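A sketch of what "dummying in" the remaining enum members can look like. The enum, its members and the returned length are all hypothetical; which members convert_send_wr actually handles is not stated here.

```c
/* Hypothetical opcode enum, for illustration only. */
enum demo_opcode { DEMO_SEND, DEMO_RDMA_WRITE, DEMO_RDMA_READ, DEMO_ATOMIC };

/* Returns how many bytes of op-specific data would be copied. */
static unsigned int demo_copy_len(enum demo_opcode op)
{
	switch (op) {
	case DEMO_RDMA_WRITE:
	case DEMO_RDMA_READ:
		return 16;
	case DEMO_SEND:
	case DEMO_ATOMIC:
		break;	/* dummy cases: nothing to copy, listed to cover the enum */
	}
	return 0;
}
```

Listing the members explicitly, rather than adding a `default:`, keeps the compiler's -Wswitch check useful when a new member is added to the enum later.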
Mostly just delete cruft. nes has a number of unused items related to HAVE_DECL_IBV_QPT_RAW_ETH; perhaps this code should be deleted entirely because whatever QPT_RAW_ETH is, it is not part of this repository. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
E.g.: comparison of unsigned expression < 0 is always false. These are all harmless cases where some simple adjustments will suppress the warning. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
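One common shape of such an adjustment, sketched with a hypothetical bounds check (not code from this series): when the value is unsigned, the lower-bound test is dropped and only the upper bound remains.

```c
#include <stddef.h>

/* Returns 1 if idx is a valid index into a table of n entries. idx is
 * unsigned, so an "idx < 0" test could never be true and would only
 * trigger the warning; the upper bound is the only check needed. */
static int demo_index_ok(size_t idx, size_t n)
{
	return idx < n;
}
```

A negative value converted to `size_t` becomes very large, so the single comparison still rejects it.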
It used to be you could suppress this with (void), however the gcc developers have decided to get rid of that. So, look closely at each occurrence and decide what to do:
- *pingpong: Join the error handling with the if statement directly above
- neigh: read on a timer_fd should never fail, so just use assert. The assert is compiled out for Release builds so this is no change
- acm: Failure of ucma_set_server_port is detected by a 0 return so check fscanf and return appropriately. This is no change since fscanf failure was assumed to have left server_port as 0 (though I doubt the standard supports that usage)
- rsocket: This looks super sketchy. At least let's make the intent clear with a read_all/write_all wrapper that calls assert. Most likely this code is wrong. Mangle the code with failable_fscanf to make it clear, but as with acm, I don't think the standard supports this usage.
Acked-by: Sean Hefty <sean.hefty@intel.com> (rdmacm)
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
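A read wrapper of the kind mentioned for rsocket might look roughly like this. It is a sketch rather than the code from the commit: the name `demo_read_all` and the exact condition it asserts are assumptions.

```c
#include <assert.h>
#include <unistd.h>

/* Read exactly len bytes, treating a failed or short read as a programming
 * error. When NDEBUG is defined (Release builds) the assert compiles out
 * and the result is silently ignored, matching the old behaviour. */
static void demo_read_all(int fd, void *buf, size_t len)
{
	ssize_t rc = read(fd, buf, len);

	assert(rc >= 0 && (size_t)rc == len);
	(void)rc;	/* avoid an unused-variable warning when NDEBUG is set */
}
```

Storing the result in a variable is what satisfies warn_unused_result; the `(void)rc` here only silences the separate unused-variable diagnostic.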
The canonical way to zero fill a struct is {}.
Sometimes people will write this as {0}, which does the same thing
if the first struct member is integral.
However, {} is preferred because it lets the compiler see
that the intent is for every member to be zero, and that this is not an
inadvertent incomplete initialization of an array or struct.
We have a random jumble of both styles, so let's prefer {} since
it avoids tripping an otherwise useful warning.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
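A small sketch of the {} style on a struct whose first member is not integral. The types are hypothetical. One caveat worth stating: before C23 the empty initializer is a GNU extension in C, accepted by gcc and clang but flagged under -pedantic.

```c
/* Hypothetical types, for illustration only. */
struct demo_inner { int a; int b; };
struct demo_outer {
	struct demo_inner first;	/* first member is itself a struct */
	char name[8];
	void *ptr;
};

/* Returns 1 if an empty-brace initializer zeroed every member. */
static int demo_is_zeroed(void)
{
	struct demo_outer o = {};

	return o.first.a == 0 && o.first.b == 0 &&
	       o.name[0] == 0 && o.ptr == 0;
}
```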
Now that the build is warning free, try to keep it that way. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Hakon-Bugge
added a commit
to Hakon-Bugge/rdma-core
that referenced
this pull request
Nov 1, 2018
In acm_addr_lookup(), an address compare is performed. It compares ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the actual address length, as given by addr_type, may contain arbitrary data. For example, in acm_svr_select_src() only the valid bytes for an IPv4 or IPv6 address are copied. Similarly in acm_nl_to_addr_data().
Here is an example from debugging with gdb, slightly edited for brevity:
(gdb) where
#0 acm_addr_lookup () at src/acm.c:419
#1 acm_get_port_ep_address () at src/acm.c:829
#2 acm_get_ep_address () at src/acm.c:848
#3 acm_rm_ep_ip () at src/acm.c:1322
#4 acm_ipnl_handler () at src/acm.c:1452
#5 acm_server () at src/acm.c:1867
#6 main () at src/acm.c:3228
(gdb) x/16u ep->addr_info[i].addr.info.addr
0x1da66a8: 192 168 200 200 0 0 0 0
0x1da66b0: 0 0 0 0 0 0 0 0
(gdb) x/16u addr
0x7ffd165ca9f8: 192 168 200 200 62 127 0 0
0x7ffd165caa00: 95 8 14 129 62 127 0 0
(gdb) p addr_type
$5 = 2 '\002'
addr_type is here 2, which is ACM_ADDRESS_IP. We see that the IPv4 addresses are equal, but the compare detects different addresses, because the full ACM_MAX_ADDRESS is used.
By introducing a helper function comparing names or addresses, the actual length is used for addresses, and the functions acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
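The idea of a length-aware compare helper can be sketched as below. Everything prefixed `demo_`/`DEMO_` is hypothetical: only the value 2 for the IPv4 type and the 64-byte maximum are taken from the trace above, and the real helper in ibacm may differ.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ADDRESS 64	/* assumed buffer size */

enum demo_addr_type { DEMO_ADDR_NAME = 1, DEMO_ADDR_IP = 2, DEMO_ADDR_IP6 = 3 };

/* Number of meaningful bytes for a given address type. */
static size_t demo_addr_len(uint8_t type)
{
	switch (type) {
	case DEMO_ADDR_IP:  return 4;	/* IPv4 */
	case DEMO_ADDR_IP6: return 16;	/* IPv6 */
	default:            return DEMO_MAX_ADDRESS;
	}
}

/* Returns 0 when equal. Names are compared as strings; binary addresses
 * are compared over their real length only, ignoring trailing padding. */
static int demo_addr_cmp(uint8_t type, const uint8_t *a, const uint8_t *b)
{
	if (type == DEMO_ADDR_NAME)
		return strncmp((const char *)a, (const char *)b,
			       DEMO_MAX_ADDRESS);
	return memcmp(a, b, demo_addr_len(type));
}
```

With the two buffers from the gdb session, a full-length memcmp reports a difference while this helper reports equality for the IPv4 type.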
Hakon-Bugge
added a commit
to Hakon-Bugge/rdma-core
that referenced
this pull request
Nov 1, 2018
In acm_ep_insert_addr() an attempt to zero out the tmp address buffer is performed. But the subsequent memcpy(), which uses the supplied addr_len as argument, copies the whole shebang. This implies that the provider is called with an address with arbitrary data padded. This leads to a false mis-compare in the default provider's binary tree lookup.
Here is the stack trace and dump of the address buffer from gdb (edited for brevity):
(gdb) where
#0 acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289
#1 tfind () from /lib64/libc.so.6
#2 acmp_get_dest () at prov/acmp/src/acmp.c:336
#3 acmp_acquire_dest () at prov/acmp/src/acmp.c:379
#4 acmp_add_addr () at prov/acmp/src/acmp.c:2385
#5 acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044
#6 acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325
#7 acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326
#8 acm_ipnl_handler () at src/acm.c:1453
#9 acm_server () at src/acm.c:1884
#10 main () at src/acm.c:3245
(gdb) x/20u dest1
0x18c46a8: 192 168 200 200 155 127 0 0
0x18c46b0: 95 184 77 105 155 127 0 0
0x18c46b8: 0 0 64 49
(gdb) x/20u dest2
0x18c5d70: 192 168 200 200 0 0 0 0
0x18c5d78: 0 0 0 0 0 0 0 0
0x18c5d80: 0 0 0 0
The fix is to use the real length of the address in the memcpy() in acm_ep_insert_addr().
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
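The copy described in this commit message can be sketched as follows, assuming a 64-byte destination buffer as suggested by addr_len=64 in the trace. The function `demo_fill_addr` and the macro name are hypothetical and are not the ibacm code.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_ADDR_BUF_LEN 64	/* assumed size of the fixed address buffer */

/* Zero the whole destination, then copy only the meaningful bytes, so the
 * padding is deterministic and a later full-buffer compare is reliable. */
static void demo_fill_addr(uint8_t dst[DEMO_ADDR_BUF_LEN],
			   const uint8_t *src, size_t real_len)
{
	if (real_len > DEMO_ADDR_BUF_LEN)
		real_len = DEMO_ADDR_BUF_LEN;
	memset(dst, 0, DEMO_ADDR_BUF_LEN);
	memcpy(dst, src, real_len);	/* real length, not the buffer size */
}
```

Passing the buffer size instead of the real length to memcpy would carry the caller's stale bytes into the padding, which is the mis-compare shown in the dump of dest1.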
Hakon-Bugge
added a commit
to Hakon-Bugge/rdma-core
that referenced
this pull request
Nov 22, 2018
In acm_addr_lookup(), an address compare is performed. It compares ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the actual address length, as given by addr_type, may contain arbitrary data. For example, in acm_svr_select_src() is only the valid bytes for an IPv4 or IPv6 copied. Similar in acm_nl_to_addr_data(). Here is an example from debugging with gdb, slightly edited for better brevity: (gdb) where #0 acm_addr_lookup () at src/acm.c:419 linux-rdma#1 acm_get_port_ep_address () at src/acm.c:829 linux-rdma#2 acm_get_ep_address () at src/acm.c:848 linux-rdma#3 acm_rm_ep_ip () at src/acm.c:1322 linux-rdma#4 acm_ipnl_handler () at src/acm.c:1452 linux-rdma#5 acm_server () at src/acm.c:1867 linux-rdma#6 main () at src/acm.c:3228 (gdb) x/16u ep->addr_info[i].addr.info.addr 0x1da66a8: 192 168 200 200 0 0 0 0 0x1da66b0: 0 0 0 0 0 0 0 0 (gdb) x/16u addr 0x7ffd165ca9f8: 192 168 200 200 62 127 0 0 0x7ffd165caa00: 95 8 14 129 62 127 0 0 (gdb) p addr_type $5 = 2 '\002' addr_type is here 2, which is ACM_ADDRESS_IP. We see that the IPv4 addresses are equal, but the compare detects different addresses, because the full ACM_MAX_ADDRESS is used. By introducing a helper function comparing names or addresses, the actual length is used for addresses, and the functions acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Hakon-Bugge
added a commit
to Hakon-Bugge/rdma-core
that referenced
this pull request
Nov 22, 2018
In acm_ep_insert_addr() an attempt to zero out the tmp address buffer is performed. But the subsequent memcpy(), which uses the supplied addr_len as argument, copies the whole shebang. This implies that the provider is called with an address with arbitrary data padded. This leads to a false mis-compare in the default provider's binary tree lookup. Here is the stack trace and dump of the address buffer from gdb (edited for better brevity): (gdb) where #0 acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289 linux-rdma#1 tfind () from /lib64/libc.so.6 linux-rdma#2 acmp_get_dest () at prov/acmp/src/acmp.c:336 linux-rdma#3 acmp_acquire_dest () at prov/acmp/src/acmp.c:379 linux-rdma#4 acmp_add_addr () at prov/acmp/src/acmp.c:2385 linux-rdma#5 acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044 linux-rdma#6 acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325 linux-rdma#7 acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326 linux-rdma#8 acm_ipnl_handler () at src/acm.c:1453 linux-rdma#9 acm_server () at src/acm.c:1884 linux-rdma#10 main () at src/acm.c:3245 (gdb) x/20u dest1 0x18c46a8: 192 168 200 200 155 127 0 0 0x18c46b0: 95 184 77 105 155 127 0 0 0x18c46b8: 0 0 64 49 (gdb) x/20u dest2 0x18c5d70: 192 168 200 200 0 0 0 0 0x18c5d78: 0 0 0 0 0 0 0 0 0x18c5d80: 0 0 0 0 The fix is to use the real length of the address in the memcpy() in acm_ep_insert_addr(). This is derived from the addr_type. Hence, we can re-factor and remove the addr_len from the call stack. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Mark Haywood <mark.haywood@oracle.com>
Hakon-Bugge
added a commit
to Hakon-Bugge/rdma-core
that referenced
this pull request
Nov 23, 2018
In acm_addr_lookup(), an address compare is performed. It compares ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the actual address length, as given by addr_type, may contain arbitrary data. For example, in acm_svr_select_src() is only the valid bytes for an IPv4 or IPv6 copied. Similar in acm_nl_to_addr_data(). Here is an example from debugging with gdb, slightly edited for better brevity: (gdb) where #0 acm_addr_lookup () at src/acm.c:419 linux-rdma#1 acm_get_port_ep_address () at src/acm.c:829 linux-rdma#2 acm_get_ep_address () at src/acm.c:848 linux-rdma#3 acm_rm_ep_ip () at src/acm.c:1322 linux-rdma#4 acm_ipnl_handler () at src/acm.c:1452 linux-rdma#5 acm_server () at src/acm.c:1867 linux-rdma#6 main () at src/acm.c:3228 (gdb) x/16u ep->addr_info[i].addr.info.addr 0x1da66a8: 192 168 200 200 0 0 0 0 0x1da66b0: 0 0 0 0 0 0 0 0 (gdb) x/16u addr 0x7ffd165ca9f8: 192 168 200 200 62 127 0 0 0x7ffd165caa00: 95 8 14 129 62 127 0 0 (gdb) p addr_type $5 = 2 '\002' addr_type is here 2, which is ACM_ADDRESS_IP. We see that the IPv4 addresses are equal, but the compare detects different addresses, because the full ACM_MAX_ADDRESS is used. By introducing a helper function comparing names or addresses, the actual length is used for addresses, and the functions acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> --- v1 -> v2: Fixed Travis issue
Hakon-Bugge
added a commit
to Hakon-Bugge/rdma-core
that referenced
this pull request
Nov 23, 2018
In acm_ep_insert_addr() an attempt to zero out the tmp address buffer is performed. But the subsequent memcpy(), which uses the supplied addr_len as argument, copies the whole shebang. This implies that the provider is called with an address with arbitrary data padded. This leads to a false mis-compare in the default provider's binary tree lookup. Here is the stack trace and dump of the address buffer from gdb (edited for better brevity): (gdb) where #0 acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289 linux-rdma#1 tfind () from /lib64/libc.so.6 linux-rdma#2 acmp_get_dest () at prov/acmp/src/acmp.c:336 linux-rdma#3 acmp_acquire_dest () at prov/acmp/src/acmp.c:379 linux-rdma#4 acmp_add_addr () at prov/acmp/src/acmp.c:2385 linux-rdma#5 acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044 linux-rdma#6 acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325 linux-rdma#7 acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326 linux-rdma#8 acm_ipnl_handler () at src/acm.c:1453 linux-rdma#9 acm_server () at src/acm.c:1884 linux-rdma#10 main () at src/acm.c:3245 (gdb) x/20u dest1 0x18c46a8: 192 168 200 200 155 127 0 0 0x18c46b0: 95 184 77 105 155 127 0 0 0x18c46b8: 0 0 64 49 (gdb) x/20u dest2 0x18c5d70: 192 168 200 200 0 0 0 0 0x18c5d78: 0 0 0 0 0 0 0 0 0x18c5d80: 0 0 0 0 The fix is to use the real length of the address in the memcpy() in acm_ep_insert_addr(). This is derived from the addr_type. Hence, we can re-factor and remove the addr_len from the call stack. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Mark Haywood <mark.haywood@oracle.com>
rosenbaumalex
pushed a commit
to rosenbaumalex/rdma-core
that referenced
this pull request
Jan 7, 2019
In acm_addr_lookup(), an address compare is performed. It compares ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the actual address length, as given by addr_type, may contain arbitrary data. For example, in acm_svr_select_src() is only the valid bytes for an IPv4 or IPv6 copied. Similar in acm_nl_to_addr_data(). Here is an example from debugging with gdb, slightly edited for better brevity: (gdb) where #0 acm_addr_lookup () at src/acm.c:419 linux-rdma#1 acm_get_port_ep_address () at src/acm.c:829 linux-rdma#2 acm_get_ep_address () at src/acm.c:848 linux-rdma#3 acm_rm_ep_ip () at src/acm.c:1322 linux-rdma#4 acm_ipnl_handler () at src/acm.c:1452 linux-rdma#5 acm_server () at src/acm.c:1867 linux-rdma#6 main () at src/acm.c:3228 (gdb) x/16u ep->addr_info[i].addr.info.addr 0x1da66a8: 192 168 200 200 0 0 0 0 0x1da66b0: 0 0 0 0 0 0 0 0 (gdb) x/16u addr 0x7ffd165ca9f8: 192 168 200 200 62 127 0 0 0x7ffd165caa00: 95 8 14 129 62 127 0 0 (gdb) p addr_type $5 = 2 '\002' addr_type is here 2, which is ACM_ADDRESS_IP. We see that the IPv4 addresses are equal, but the compare detects different addresses, because the full ACM_MAX_ADDRESS is used. By introducing a helper function comparing names or addresses, the actual length is used for addresses, and the functions acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> --- v1 -> v2: Fixed Travis issue
rosenbaumalex
pushed a commit
to rosenbaumalex/rdma-core
that referenced
this pull request
Jan 7, 2019
In acm_ep_insert_addr() an attempt to zero out the tmp address buffer is performed. But the subsequent memcpy(), which uses the supplied addr_len as argument, copies the whole shebang. This implies that the provider is called with an address with arbitrary data padded. This leads to a false mis-compare in the default provider's binary tree lookup. Here is the stack trace and dump of the address buffer from gdb (edited for better brevity): (gdb) where #0 acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289 linux-rdma#1 tfind () from /lib64/libc.so.6 linux-rdma#2 acmp_get_dest () at prov/acmp/src/acmp.c:336 linux-rdma#3 acmp_acquire_dest () at prov/acmp/src/acmp.c:379 linux-rdma#4 acmp_add_addr () at prov/acmp/src/acmp.c:2385 linux-rdma#5 acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044 linux-rdma#6 acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325 linux-rdma#7 acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326 linux-rdma#8 acm_ipnl_handler () at src/acm.c:1453 linux-rdma#9 acm_server () at src/acm.c:1884 linux-rdma#10 main () at src/acm.c:3245 (gdb) x/20u dest1 0x18c46a8: 192 168 200 200 155 127 0 0 0x18c46b0: 95 184 77 105 155 127 0 0 0x18c46b8: 0 0 64 49 (gdb) x/20u dest2 0x18c5d70: 192 168 200 200 0 0 0 0 0x18c5d78: 0 0 0 0 0 0 0 0 0x18c5d80: 0 0 0 0 The fix is to use the real length of the address in the memcpy() in acm_ep_insert_addr(). This is derived from the addr_type. Hence, we can re-factor and remove the addr_len from the call stack. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Mark Haywood <mark.haywood@oracle.com>
jgunthorpe
pushed a commit
to jgunthorpe/rdma-plumbing
that referenced
this pull request
Feb 19, 2019
Added check for successful strdup of node_name_map_file
aron-silverton
pushed a commit
to oracle/rdma-core
that referenced
this pull request
Mar 27, 2019
In acm_addr_lookup(), an address compare is performed. It compares ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the actual address length, as given by addr_type, may contain arbitrary data. For example, in acm_svr_select_src() is only the valid bytes for an IPv4 or IPv6 copied. Similar in acm_nl_to_addr_data(). Here is an example from debugging with gdb, slightly edited for better brevity: (gdb) where #0 acm_addr_lookup () at src/acm.c:419 linux-rdma#1 acm_get_port_ep_address () at src/acm.c:829 linux-rdma#2 acm_get_ep_address () at src/acm.c:848 linux-rdma#3 acm_rm_ep_ip () at src/acm.c:1322 linux-rdma#4 acm_ipnl_handler () at src/acm.c:1452 linux-rdma#5 acm_server () at src/acm.c:1867 linux-rdma#6 main () at src/acm.c:3228 (gdb) x/16u ep->addr_info[i].addr.info.addr 0x1da66a8: 192 168 200 200 0 0 0 0 0x1da66b0: 0 0 0 0 0 0 0 0 (gdb) x/16u addr 0x7ffd165ca9f8: 192 168 200 200 62 127 0 0 0x7ffd165caa00: 95 8 14 129 62 127 0 0 (gdb) p addr_type $5 = 2 '\002' addr_type is here 2, which is ACM_ADDRESS_IP. We see that the IPv4 addresses are equal, but the compare detects different addresses, because the full ACM_MAX_ADDRESS is used. By introducing a helper function comparing names or addresses, the actual length is used for addresses, and the functions acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> --- v1 -> v2: Fixed Travis issue Orabug: 29037253 (cherry picked from commit c562033) cherry-pick-repo=linux-rdma/rdma-core.git unmodified-from-upstream: c562033 Signed-off-by: Mark Haywood <mark.haywood@oracle.com> Signed-off-by: Aron Silverton <aron.silverton@oracle.com>
aron-silverton
pushed a commit
to oracle/rdma-core
that referenced
this pull request
Mar 27, 2019
In acm_ep_insert_addr() an attempt to zero out the tmp address buffer is performed. But the subsequent memcpy(), which uses the supplied addr_len as argument, copies the whole shebang. This implies that the provider is called with an address with arbitrary data padded. This leads to a false mis-compare in the default provider's binary tree lookup. Here is the stack trace and dump of the address buffer from gdb (edited for better brevity): (gdb) where #0 acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289 linux-rdma#1 tfind () from /lib64/libc.so.6 linux-rdma#2 acmp_get_dest () at prov/acmp/src/acmp.c:336 linux-rdma#3 acmp_acquire_dest () at prov/acmp/src/acmp.c:379 linux-rdma#4 acmp_add_addr () at prov/acmp/src/acmp.c:2385 linux-rdma#5 acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044 linux-rdma#6 acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325 linux-rdma#7 acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326 linux-rdma#8 acm_ipnl_handler () at src/acm.c:1453 linux-rdma#9 acm_server () at src/acm.c:1884 linux-rdma#10 main () at src/acm.c:3245 (gdb) x/20u dest1 0x18c46a8: 192 168 200 200 155 127 0 0 0x18c46b0: 95 184 77 105 155 127 0 0 0x18c46b8: 0 0 64 49 (gdb) x/20u dest2 0x18c5d70: 192 168 200 200 0 0 0 0 0x18c5d78: 0 0 0 0 0 0 0 0 0x18c5d80: 0 0 0 0 The fix is to use the real length of the address in the memcpy() in acm_ep_insert_addr(). This is derived from the addr_type. Hence, we can re-factor and remove the addr_len from the call stack. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Mark Haywood <mark.haywood@oracle.com> Orabug: 29037270 (cherry picked from commit c73f5d7) cherry-pick-repo=linux-rdma/rdma-core.git unmodified-from-upstream: c73f5d7 Signed-off-by: Mark Haywood <mark.haywood@oracle.com> Signed-off-by: Aron Silverton <aron.silverton@oracle.com>
aron-silverton
pushed a commit
to oracle/rdma-core
that referenced
this pull request
Apr 9, 2019
In acm_addr_lookup(), an address compare is performed. It compares ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the actual address length, as given by addr_type, may contain arbitrary data. For example, in acm_svr_select_src() is only the valid bytes for an IPv4 or IPv6 copied. Similar in acm_nl_to_addr_data(). Here is an example from debugging with gdb, slightly edited for better brevity: (gdb) where #0 acm_addr_lookup () at src/acm.c:419 linux-rdma#1 acm_get_port_ep_address () at src/acm.c:829 linux-rdma#2 acm_get_ep_address () at src/acm.c:848 linux-rdma#3 acm_rm_ep_ip () at src/acm.c:1322 linux-rdma#4 acm_ipnl_handler () at src/acm.c:1452 linux-rdma#5 acm_server () at src/acm.c:1867 linux-rdma#6 main () at src/acm.c:3228 (gdb) x/16u ep->addr_info[i].addr.info.addr 0x1da66a8: 192 168 200 200 0 0 0 0 0x1da66b0: 0 0 0 0 0 0 0 0 (gdb) x/16u addr 0x7ffd165ca9f8: 192 168 200 200 62 127 0 0 0x7ffd165caa00: 95 8 14 129 62 127 0 0 (gdb) p addr_type $5 = 2 '\002' addr_type is here 2, which is ACM_ADDRESS_IP. We see that the IPv4 addresses are equal, but the compare detects different addresses, because the full ACM_MAX_ADDRESS is used. By introducing a helper function comparing names or addresses, the actual length is used for addresses, and the functions acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> --- v1 -> v2: Fixed Travis issue Orabug: 29037253 (cherry picked from commit c562033) cherry-pick-repo=linux-rdma/rdma-core.git unmodified-from-upstream: c562033 Signed-off-by: Mark Haywood <mark.haywood@oracle.com> Signed-off-by: Aron Silverton <aron.silverton@oracle.com> Orabug: 29410510 Rebase from RDMA Core 19.2 -> 20.2. (cherry picked from commit bbd44792) cherry-pick-repo=linux-git/RDMA/rdma-core.git unmodified-from-upstream: bbd44792 Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request on Apr 9, 2019

In acm_ep_insert_addr() an attempt to zero out the tmp address buffer is performed. But the subsequent memcpy(), which uses the supplied addr_len as argument, copies the whole shebang. This implies that the provider is called with an address padded with arbitrary data. This leads to a false mis-compare in the default provider's binary tree lookup.

Here is the stack trace and dump of the address buffer from gdb (edited for brevity):

(gdb) where
#0 acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289
#1 tfind () from /lib64/libc.so.6
#2 acmp_get_dest () at prov/acmp/src/acmp.c:336
#3 acmp_acquire_dest () at prov/acmp/src/acmp.c:379
#4 acmp_add_addr () at prov/acmp/src/acmp.c:2385
#5 acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044
#6 acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325
#7 acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326
#8 acm_ipnl_handler () at src/acm.c:1453
#9 acm_server () at src/acm.c:1884
#10 main () at src/acm.c:3245
(gdb) x/20u dest1
0x18c46a8: 192 168 200 200 155 127 0 0
0x18c46b0: 95 184 77 105 155 127 0 0
0x18c46b8: 0 0 64 49
(gdb) x/20u dest2
0x18c5d70: 192 168 200 200 0 0 0 0
0x18c5d78: 0 0 0 0 0 0 0 0
0x18c5d80: 0 0 0 0

The fix is to use the real length of the address in the memcpy() in acm_ep_insert_addr(). This is derived from the addr_type. Hence, we can re-factor and remove the addr_len from the call stack.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Mark Haywood <mark.haywood@oracle.com>

Orabug: 29037270
(cherry picked from commit c73f5d7)
cherry-pick-repo=linux-rdma/rdma-core.git
unmodified-from-upstream: c73f5d7
Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Signed-off-by: Aron Silverton <aron.silverton@oracle.com>

Orabug: 29410510
Rebase from RDMA Core 19.2 -> 20.2.
(cherry picked from commit fc2e7b4b)
cherry-pick-repo=linux-git/RDMA/rdma-core.git
unmodified-from-upstream: fc2e7b4b
Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
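
The fix described in this commit message (copy only the real address length, derived from the address type, into a zeroed buffer) can be sketched as follows. This is an illustration under stated assumptions, not the actual change to acm_ep_insert_addr(): `ex_copy_len()` and `ex_store_addr()` are hypothetical names, and the constant values other than the IPv4 type 2 and the 64-byte length seen in the traces are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the ibacm constants (values partly assumed). */
#define EX_MAX_ADDRESS 64 /* matches addr_len=64 in the stack trace */
#define EX_ADDR_IP     2  /* IPv4 address type as seen in gdb */
#define EX_ADDR_IP6    3  /* assumed value, for illustration only */

/* Real address length derived from the type, not supplied by the caller. */
static size_t ex_copy_len(uint8_t type)
{
	switch (type) {
	case EX_ADDR_IP:
		return 4;
	case EX_ADDR_IP6:
		return 16;
	default:
		return EX_MAX_ADDRESS;
	}
}

/*
 * Zero the whole destination, then copy only the valid bytes. Anything
 * the caller left behind the address in src never reaches dst, so a
 * later full-width compare of two stored addresses is reliable.
 */
static void ex_store_addr(uint8_t dst[EX_MAX_ADDRESS], uint8_t type,
			  const uint8_t *src)
{
	size_t len = ex_copy_len(type);

	assert(len <= EX_MAX_ADDRESS);
	memset(dst, 0, EX_MAX_ADDRESS);
	memcpy(dst, src, len);
}
```

Storing the `dest1` and `dest2` byte patterns from the dump through `ex_store_addr()` yields two identical 64-byte buffers, which is the property a tree lookup comparing whole buffers depends on.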
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request on Nov 16, 2020

In acm_addr_lookup(), an address compare is performed. It compares ACM_MAX_ADDRESS worth of bytes. However, the bytes exceeding the actual address length, as given by addr_type, may contain arbitrary data. For example, acm_svr_select_src() copies only the valid bytes of an IPv4 or IPv6 address. The same holds for acm_nl_to_addr_data().

Here is an example from debugging with gdb, slightly edited for brevity:

(gdb) where
#0 acm_addr_lookup () at src/acm.c:419
#1 acm_get_port_ep_address () at src/acm.c:829
#2 acm_get_ep_address () at src/acm.c:848
#3 acm_rm_ep_ip () at src/acm.c:1322
#4 acm_ipnl_handler () at src/acm.c:1452
#5 acm_server () at src/acm.c:1867
#6 main () at src/acm.c:3228
(gdb) x/16u ep->addr_info[i].addr.info.addr
0x1da66a8: 192 168 200 200 0 0 0 0
0x1da66b0: 0 0 0 0 0 0 0 0
(gdb) x/16u addr
0x7ffd165ca9f8: 192 168 200 200 62 127 0 0
0x7ffd165caa00: 95 8 14 129 62 127 0 0
(gdb) p addr_type
$5 = 2 '\002'

Here addr_type is 2, which is ACM_ADDRESS_IP. We see that the IPv4 addresses are equal, but the compare detects different addresses, because the full ACM_MAX_ADDRESS is used.

By introducing a helper function comparing names or addresses, the actual length is used for addresses, and the functions acm_mark_addr_invalid() and acm_addr_lookup() are greatly simplified.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
v1 -> v2: Fixed Travis issue

Orabug: 29037253
(cherry picked from commit c562033)
cherry-pick-repo=github.com/linux-rdma/rdma-core.git
unmodified-from-upstream: c562033
Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Acked-by: Aron Silverton <aron.silverton@oracle.com>

Orabug: 29410510
Rebase from RDMA Core 19.2 -> 20.2.
(cherry picked from commit 8763162)
cherry-pick-repo=linux-git.us.oracle.com/RDMA/rdma-core.git
unmodified-from-upstream: 8763162
Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Acked-by: Aron Silverton <aron.silverton@oracle.com>
aron-silverton pushed a commit to oracle/rdma-core that referenced this pull request on Nov 16, 2020

In acm_ep_insert_addr() an attempt to zero out the tmp address buffer is performed. But the subsequent memcpy(), which uses the supplied addr_len as argument, copies the whole shebang. This implies that the provider is called with an address padded with arbitrary data. This leads to a false mis-compare in the default provider's binary tree lookup.

Here is the stack trace and dump of the address buffer from gdb (edited for brevity):

(gdb) where
#0 acmp_compare_dest (dest1=0x18c46a8, dest2=0x18c5d70) at prov/acmp/src/acmp.c:289
#1 tfind () from /lib64/libc.so.6
#2 acmp_get_dest () at prov/acmp/src/acmp.c:336
#3 acmp_acquire_dest () at prov/acmp/src/acmp.c:379
#4 acmp_add_addr () at prov/acmp/src/acmp.c:2385
#5 acm_ep_insert_addr (..., addr_len=addr_len@entry=64, ...) at src/acm.c:2044
#6 acm_ep_insert_addr (..., addr_len=64, ...) at src/acm.c:1325
#7 acm_add_ep_ip (ip_str=0x7ffeeda298e0 "192.168.200.200", ...) at src/acm.c:1326
#8 acm_ipnl_handler () at src/acm.c:1453
#9 acm_server () at src/acm.c:1884
#10 main () at src/acm.c:3245
(gdb) x/20u dest1
0x18c46a8: 192 168 200 200 155 127 0 0
0x18c46b0: 95 184 77 105 155 127 0 0
0x18c46b8: 0 0 64 49
(gdb) x/20u dest2
0x18c5d70: 192 168 200 200 0 0 0 0
0x18c5d78: 0 0 0 0 0 0 0 0
0x18c5d80: 0 0 0 0

The fix is to use the real length of the address in the memcpy() in acm_ep_insert_addr(). This is derived from the addr_type. Hence, we can re-factor and remove the addr_len from the call stack.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Mark Haywood <mark.haywood@oracle.com>

Orabug: 29037270
(cherry picked from commit c73f5d7)
cherry-pick-repo=github.com/linux-rdma/rdma-core.git
unmodified-from-upstream: c73f5d7
Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Acked-by: Aron Silverton <aron.silverton@oracle.com>

Orabug: 29410510
Rebase from RDMA Core 19.2 -> 20.2.
(cherry picked from commit 303f845)
cherry-pick-repo=linux-git.us.oracle.com/RDMA/rdma-core.git
unmodified-from-upstream: 303f845
Signed-off-by: Mark Haywood <mark.haywood@oracle.com>
Acked-by: Aron Silverton <aron.silverton@oracle.com>
shefty pushed a commit to shefty/rdma-core that referenced this pull request on Nov 10, 2025

Subject: [PATCH] librdmacm: Fix rdma_resolve_addrinfo() deadlock in sync mode

Fix the issue that rdma_resolve_addrinfo() deadlocks when run in sync mode:

(gdb) bt
#0 futex_wait
#1 __GI___lll_lock_wait
#2 0x00007ffff7dae791 in lll_mutex_lock_optimized
#3 ___pthread_mutex_lock
#4 0x00007ffff7f9f018 in ucma_process_addrinfo_resolved
#5 0x00007ffff7fa1447 in rdma_get_cm_event
#6 0x00007ffff7fa1fef in ucma_complete
#7 0x00007ffff7fa2f9c in resolve_ai_sa
#8 0x00007ffff7fa36ab in __rdma_resolve_addrinfo
#9 rdma_resolve_addrinfo
#10 0x00000000004017b6 in start_cm_client_sync
#11 0x00000000004018ee in main

Issue: 4582946
Fixes: 7b1a686 ("librdmacm: Provide interfaces to resolve IB services")
Change-Id: Ia724795a559bab6d965a35b8fd3e0f0096472a44
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
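
The backtrace shows a thread blocked in pthread_mutex_lock() inside ucma_process_addrinfo_resolved(), reached through rdma_get_cm_event() from ucma_complete(). The commit message does not say which lock is involved or how the patch resolves it, so the following is only a generic sketch of one common way such a trace arises: a thread calls into code that takes a non-recursive mutex it already holds. All names here (`ex_lock`, `ex_process_event()`, and so on) are hypothetical and not taken from librdmacm. An error-checking mutex is used so the relock returns `EDEADLK` instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical lock standing in for the mutex the blocked frame wants. */
static pthread_mutex_t ex_lock;

/* Create an error-checking mutex: relocking by the owner fails, not hangs. */
static int ex_lock_init(void)
{
	pthread_mutexattr_t attr;
	int ret = pthread_mutexattr_init(&attr);

	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&ex_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/* Inner step that takes the lock itself, like a nested event handler. */
static int ex_process_event(void)
{
	int ret = pthread_mutex_lock(&ex_lock);

	if (ret)
		return ret; /* EDEADLK when this thread already owns the lock */
	pthread_mutex_unlock(&ex_lock);
	return 0;
}

/* Problematic shape: the inner step runs while the lock is still held. */
static int ex_resolve_holding_lock(void)
{
	int ret = pthread_mutex_lock(&ex_lock);

	assert(ret == 0);
	ret = ex_process_event();
	pthread_mutex_unlock(&ex_lock);
	return ret;
}

/* Safer shape: finish the locked work, release, then run the inner step. */
static int ex_resolve_released(void)
{
	int ret = pthread_mutex_lock(&ex_lock);

	assert(ret == 0);
	/* ... work that needs the lock would go here ... */
	pthread_mutex_unlock(&ex_lock);
	return ex_process_event();
}
```

With a default (non-error-checking) mutex, `ex_resolve_holding_lock()` would typically block forever in the futex wait, which is what frames #0 to #3 of the trace look like. Whether the real bug has this shape is an assumption.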
shefty pushed a commit to shefty/rdma-core that referenced this pull request on Nov 11, 2025

Fix the issue that rdma_resolve_addrinfo() deadlocks when run in sync mode:

(gdb) bt
#0 futex_wait
#1 __GI___lll_lock_wait
#2 0x00007ffff7dae791 in lll_mutex_lock_optimized
#3 ___pthread_mutex_lock
#4 0x00007ffff7f9f018 in ucma_process_addrinfo_resolved
#5 0x00007ffff7fa1447 in rdma_get_cm_event
#6 0x00007ffff7fa1fef in ucma_complete
#7 0x00007ffff7fa2f9c in resolve_ai_sa
#8 0x00007ffff7fa36ab in __rdma_resolve_addrinfo
#9 rdma_resolve_addrinfo
#10 0x00000000004017b6 in start_cm_client_sync
#11 0x00000000004018ee in main

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
shefty pushed a commit to shefty/rdma-core that referenced this pull request on Nov 11, 2025

Fix the issue that rdma_resolve_addrinfo() deadlocks when run in sync mode:

(gdb) bt
#0 futex_wait
#1 __GI___lll_lock_wait
#2 0x00007ffff7dae791 in lll_mutex_lock_optimized
#3 ___pthread_mutex_lock
#4 0x00007ffff7f9f018 in ucma_process_addrinfo_resolved
#5 0x00007ffff7fa1447 in rdma_get_cm_event
#6 0x00007ffff7fa1fef in ucma_complete
#7 0x00007ffff7fa2f9c in resolve_ai_sa
#8 0x00007ffff7fa36ab in __rdma_resolve_addrinfo
#9 rdma_resolve_addrinfo
#10 0x00000000004017b6 in start_cm_client_sync
#11 0x00000000004018ee in main

Fixes: 7b1a686 ("librdmacm: Provide interfaces to resolve IB services")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Sean Hefty <shefty@nvidia.com>
rleon pushed a commit that referenced this pull request on Nov 12, 2025

Fix the issue that rdma_resolve_addrinfo() deadlocks when run in sync mode:

(gdb) bt
#0 futex_wait
#1 __GI___lll_lock_wait
#2 0x00007ffff7dae791 in lll_mutex_lock_optimized
#3 ___pthread_mutex_lock
#4 0x00007ffff7f9f018 in ucma_process_addrinfo_resolved
#5 0x00007ffff7fa1447 in rdma_get_cm_event
#6 0x00007ffff7fa1fef in ucma_complete
#7 0x00007ffff7fa2f9c in resolve_ai_sa
#8 0x00007ffff7fa36ab in __rdma_resolve_addrinfo
#9 rdma_resolve_addrinfo
#10 0x00000000004017b6 in start_cm_client_sync
#11 0x00000000004018ee in main

Fixes: 7b1a686 ("librdmacm: Provide interfaces to resolve IB services")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Sean Hefty <shefty@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
nmorey pushed a commit that referenced this pull request on Nov 21, 2025

[ Upstream commit 7528827 ]

Fix the issue that rdma_resolve_addrinfo() deadlocks when run in sync mode:

(gdb) bt
#0 futex_wait
#1 __GI___lll_lock_wait
#2 0x00007ffff7dae791 in lll_mutex_lock_optimized
#3 ___pthread_mutex_lock
#4 0x00007ffff7f9f018 in ucma_process_addrinfo_resolved
#5 0x00007ffff7fa1447 in rdma_get_cm_event
#6 0x00007ffff7fa1fef in ucma_complete
#7 0x00007ffff7fa2f9c in resolve_ai_sa
#8 0x00007ffff7fa36ab in __rdma_resolve_addrinfo
#9 rdma_resolve_addrinfo
#10 0x00000000004017b6 in start_cm_client_sync
#11 0x00000000004018ee in main

Fixes: 7b1a686 ("librdmacm: Provide interfaces to resolve IB services")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Sean Hefty <shefty@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Nicolas Morey <nmorey@suse.com>
These are largely cosmetic changes to get to a warning-free build; many have been on the list before.