v1.12.0

@brminich brminich released this 12 Jan 15:00
d367332

1.12.0 (January 12, 2022)

Features:

Core

  • Added beta-level support for Go language bindings
  • Added new objects to VFS (md, component, log_level, etc.)
  • Added configuration variable to specify which loadable modules are allowed
  • Added build-time configuration to disable sigaction overriding

UCP

  • Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
  • Added ucp_worker_address_query() API
  • Updated ucp_ep_query() API for getting local and remote addresses
  • Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
  • Added new client/server connection establishment packet header format
  • Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
  • Added iov zcopy support to RMA operations
  • Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
  • Added support for modifying UCT and UCS configs via the ucp_config_modify() API
  • Optimized unpacked rkeys memory consumption
  • Added request flag to influence latency vs. bandwidth protocol selection
  • Reduced memory management overhead with new protocols
  • Improved performance calculations for new protocols
  • Added AMO support with GPU memory target using new protocols
  • Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
  • Added support for user-defined alignment in Active Messages
  • Added support for offload tag sync in new protocols
  • Updated ucp_atomic_post() to use NBX flow

UCT

  • Added uct_iface_is_reachable_v2() API
  • Added IPv6 address support in TCP
  • Added latency estimation to uct_iface_estimate_perf()
  • Adjusted knem and cma overhead cost
  • Increased built-in TCP keep-alive interval to 2 seconds

RDMA CORE (IB, ROCE, etc.)

  • Added detection of IB NDR devices
  • Added check for CQ overrun in assert mode
  • Added bitmap usage for releasing detached DCIs
  • Added configuration for requests ack frequency with DevX
  • Added remote QP info to tx error CQE traces

UCS

  • Added API for a per-process aggregate-sum statistics report
  • Added memory pool set data structure
  • Added new ptr_array API for bulk allocation
  • Added ucs_string_buffer_append_flags() for string buffer
  • Added ucs_ffs32()
  • Added ucs_vsnprintf_safe() which always adds '\0'
  • Added thread-safe put to ptr_map
  • Improved accuracy of the topology distance estimation
  • Added prints of leaked callbacks from the callback queue
  • Removed a diagnostic message when fuse thread is stopped
  • Added configurable limit for the memory consumed by rcache
  • Added configuration for VFS(FUSE) thread affinity
  • Added memory limit support to memtrack
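
Two of the small UCS helpers listed above lend themselves to a short illustration. The following is a minimal sketch, not the UCX implementation: the sketch_* names are hypothetical, and it assumes that a 32-bit find-first-set helper returns the zero-based index of the least significant set bit, and that "always adds '\0'" means the last byte of the buffer is terminated even when the output is truncated.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 32-bit find-first-set helper.
 * Assumption: returns the zero-based index of the least significant
 * set bit; the input must be non-zero. */
unsigned sketch_ffs32(uint32_t n)
{
    unsigned index = 0;

    assert(n != 0);
    while ((n & 1u) == 0) {
        n >>= 1;
        ++index;
    }
    return index;
}

/* Hypothetical stand-in for a "safe" vsnprintf: after formatting, the
 * last byte of the buffer is explicitly set to '\0', so the result is
 * always a terminated string as long as size is non-zero. */
void sketch_vsnprintf_safe(char *buf, size_t size, const char *fmt,
                           va_list ap)
{
    if (size == 0) {
        return; /* nothing can be written, not even the terminator */
    }

    vsnprintf(buf, size, fmt, ap);
    buf[size - 1] = '\0';
}

/* Convenience wrapper (also hypothetical) taking variadic arguments */
void sketch_snprintf_safe(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    sketch_vsnprintf_safe(buf, size, fmt, ap);
    va_end(ap);
}
```

With a 4-byte buffer, formatting the string "abcdef" through sketch_snprintf_safe() leaves "abc" followed by the terminator in the buffer.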

CUDA

  • Added global memtype cache to allow UCT transports to query memory attributes
  • Added auto-registration of whole CUDA allocations to avoid repeated registration costs
  • Added capability to select CUDA stream based on source and destination memory type
    (required for device memory based pipelining)
  • Added selection of CUDA-IPC capabilities based on NVLINK topology
    (to prefer writes vs. reads for specific platforms using NVML)
  • Added option to set cuda_copy bandwidth
  • Added profiling of CUDA runtime function calls
  • Added option to limit GPUDirectRDMA size in rendezvous protocol

Java

  • Added ucp_listener_reject functionality
  • Added support for setting worker id and querying it from the connection request
  • Added support for binding to a free port in UcpListener

Packaging

  • Added cmake config files for better integration with external cmake based projects

Tests

  • Removed memcpy from AM eager flow in io_demo
  • Added check_qps.sh script to detect stuck QPs
  • Improved diagnostic in test_init_mt
  • Added iov support in ucp_client_server
  • Added option to use epoll in io_demo
  • Added registration of memory allocated by io_demo in memtrack
  • Extended statistics in io_demo
  • Improved logging in io_demo
  • Replaced rand by urand in io_demo
  • More improvements in io_demo
  • Generalized median calculation to support any percentile in ucx_perftest
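
The last item above, generalizing a median to an arbitrary percentile, can be sketched as follows. This is only an illustration under stated assumptions, not the ucx_perftest code: the sketch_* names are hypothetical, and the rank rule used here (sort the samples, take index floor(p * n / 100), clamp to the last element) is one common convention that ucx_perftest may or may not share.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending order */
int sketch_compare_doubles(const void *a, const void *b)
{
    double x = *(const double*)a;
    double y = *(const double*)b;

    return (x > y) - (x < y);
}

/* Hypothetical percentile helper: sorts the samples in place and
 * returns the element at index floor(percentile * count / 100),
 * clamped to the last element. percentile == 50 gives the median
 * (the upper median when count is even). */
double sketch_percentile(double *samples, size_t count, double percentile)
{
    size_t index;

    assert(count > 0);
    assert((percentile >= 0.0) && (percentile <= 100.0));

    qsort(samples, count, sizeof(*samples), sketch_compare_doubles);

    index = (size_t)(percentile * (double)count / 100.0);
    if (index >= count) {
        index = count - 1; /* percentile == 100 selects the maximum */
    }

    return samples[index];
}
```

Treating the median as the special case percentile == 50 means a single code path can report other latency percentiles from the same sample array.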

Tools

  • Added loop-back transport support in ucx_perftest
  • Split ucx_perftest into separate modules
  • Added process placement option for ucx_info
  • Extended parameters correctness check in ucx_perftest
  • Added support for GPU memory RMA and atomics in ucx_perftest

CI

  • Updated gtest 1.7 to 1.10
  • Increased uptime in network corrupter (used for io_demo)
  • Enabled set of gtests for new protocols
  • Added running CI in docker containers
  • Increased thresholds for test_ucp_wait_mem
  • Added test for ucx binary compatibility between OS versions
  • Increased test job timeout to 6 hours
  • Reduced testing time under valgrind
  • Added suppressions for glibc and libnl leaks
  • Relaxed performance requirements in perf test

Bugfixes:

Core

  • Fixed invalid remote memory access after connection error
  • Fixed creating more than 64K endpoints between the same peers
  • Fixed simultaneous endpoint close with ucp_hello_world

UCP

  • Fixes and improvements in new protocols infrastructure
  • Fixes in AM flows
  • Fixed tag short threshold selection
  • Multiple fixes in keep-alive protocol
  • Multiple fixes in wire-up protocol
  • Fixes in error flow during rendezvous protocol
  • Multiple fixes in general error flow
  • Fixed fallback to PUT pipeline in rendezvous protocol
  • Reduced default value of keep-alive interval to 20 seconds
  • Fixes in tag_send datatype processing

UCT

  • Fixed keep-alive protocol for intra-node transports (sm, cuda)
  • Fixed deadlock in TCP
  • Suppressed EHOSTUNREACH error in TCP sockcm
  • Restricted connecting loop-back to other devices in TCP

RDMA CORE (IB, ROCE, etc.)

  • Fixed pkey_index initialization when creating RC QP with DEVX
  • Disabled MP_SRQ by default
  • Fixed TX WQ overflow check
  • Fixed dci->pool_index initialization when HAVE_DC_DV is false
  • Fixed syndrome value for creating rdmacm reserved qpn
  • Fixed error code on rdma_establish failure
  • Fixed uct_ep_am_short_iov for UD verbs
  • Fixed handling of error CQE after rc_ep is destroyed
  • Fixes in flow control when error CQE is polled
  • Multiple fixes in RC and DC error flows
  • Fixed deadlock between DCIs and RDMA_READ credits
  • Removed AM handler invocation for PURE_GRANT messages
  • Fixed endpoint arbiter_group leak in DC
  • Fixed resource check in flush for DC

UCS

  • Fixed segmentation fault for ucs_stats_parser
  • Fixed potential crash on cleanup when using UCX profiling
  • Fixed read_profile print of new request
  • Fixed uninitialized variable access in VFS
  • Changed log level of inotify_init failure to diag
  • Fixed integer overflow in mpool chunk allocation

Packaging

  • Fixed with-fuse arg for RPM build

Documentation

  • Fixes in UCP, UCT, UCS, FAQ and README documentation

Tests

  • Multiple fixes in io_demo

CI

  • Fixed snapshot docker name
  • Fixed hipMallocManaged hook gtest
  • Fixes in Azure release pipeline
  • Fixes in Coverity CI
  • Fixed test_uct_query gtest for ROCm
  • Fixes in jenkins test script
  • Fixed release commit title check