Tier 1: pool curl resources, RDMA RAII + decline-fallback, fix RemoveObjectsResult UB #215

Merged
harshavardhana merged 2 commits into main from tier1-http-perf-fixes on May 27, 2026
@harshavardhana
Member

Summary

Tier 1 of the C++-native modernization audit, plus two latent bug fixes uncovered while running the full test suite against AIStor + RDMA. End-to-end validated on coe09 (client) against a single-node coe08 RDMA MinIO build of latest eos.

Changes

Performance (HTTP transport)

  • http.cc: move curl_global_init to a one-time function-local static. It previously ran on every request via curlpp::Cleanup, which is not thread-safe and is expensive (OpenSSL initialization, among other things).
  • http.cc: attach a process-wide CURLSH to each curlpp::Easy so the connection cache, DNS cache, and TLS session cache survive the Easy handle's lifetime. One mutex per CURL_LOCK_DATA_* slot — a single mutex covering all slots self-deadlocks because libcurl can hold one slot's lock while acquiring another on the same thread.
  • http.cc: enable CURLOPT_TCP_KEEPALIVE so pooled sockets survive idle gaps.
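
The per-slot locking described above can be sketched without libcurl. This is a minimal illustration, not the code from http.cc: the Slot enum is a hypothetical stand-in for libcurl's curl_lock_data values, and ShareLocks is an invented name. In real code the Lock/Unlock methods would be called from the callbacks registered with curl_share_setopt (CURLSHOPT_LOCKFUNC and CURLSHOPT_UNLOCKFUNC).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a subset of libcurl's curl_lock_data enum.
enum class Slot : std::size_t { kDns = 0, kSslSession, kConnect, kCount };

// One mutex per slot. A thread that holds the lock for one slot can still
// acquire the lock for a different slot, which a single shared std::mutex
// would not allow (locking it twice on one thread is undefined behavior and
// typically deadlocks).
class ShareLocks {
 public:
  void Lock(Slot s) { mutexes_[Index(s)].lock(); }
  void Unlock(Slot s) { mutexes_[Index(s)].unlock(); }

 private:
  static std::size_t Index(Slot s) {
    const auto i = static_cast<std::size_t>(s);
    assert(i < static_cast<std::size_t>(Slot::kCount));
    return i;
  }

  std::array<std::mutex, static_cast<std::size_t>(Slot::kCount)> mutexes_;
};
```

Nesting a lock on one slot inside a lock on another (for example kConnect, then kDns, then unlocking in reverse order) completes on a single thread with this layout.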

Thread safety

  • baseclient.{h,cc}: guard region_map_ with std::shared_mutex. Read sites now use find() rather than operator[], so they no longer silently insert empty entries on every miss and can take a shared lock; erase + write sites take a unique lock.
  • baseclient.cc: UploadPart propagates args.headers into the wrapped PutObjectApiArgs so caller-supplied per-part headers reach the signer.
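
A minimal sketch of the read/write locking pattern for the region map, assuming a plain std::map of bucket name to region. RegionCache and its method names are hypothetical and do not mirror the BaseClient interface; the point is only that find() lets the read path stay const and take a shared lock.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>

class RegionCache {
 public:
  // Read path: shared lock plus find(), so a miss never inserts an entry.
  std::optional<std::string> Get(const std::string& bucket) const {
    std::shared_lock<std::shared_mutex> lock(mu_);
    auto it = map_.find(bucket);
    if (it == map_.end()) return std::nullopt;
    return it->second;
  }

  // Write paths: unique lock, because they mutate the map.
  void Set(const std::string& bucket, std::string region) {
    std::unique_lock<std::shared_mutex> lock(mu_);
    map_[bucket] = std::move(region);
  }

  void Erase(const std::string& bucket) {
    std::unique_lock<std::shared_mutex> lock(mu_);
    map_.erase(bucket);
  }

  std::size_t Size() const {
    std::shared_lock<std::shared_mutex> lock(mu_);
    return map_.size();
  }

 private:
  mutable std::shared_mutex mu_;
  std::map<std::string, std::string> map_;
};
```

With operator[] in Get, every miss would grow the map and require a unique lock; with find(), concurrent readers do not block each other.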

GPU-direct friendliness

  • request.cc: BuildHeaders honors a caller-supplied x-amz-content-sha256. When the caller pre-sets it (e.g. UNSIGNED-PAYLOAD for a device-resident body) the signer uses that value and skips Sha256Hash(body) — avoids dragging GPU memory through OpenSSL just to compute a signing hash that TLS already authenticates.
  • client.cc: the GPU-buffer HTTP fallback in PutObject sets UNSIGNED-PAYLOAD, and the multipart loop propagates it into every UploadPart.
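
The header precedence rule can be illustrated as follows. This is a sketch under assumptions: Headers is a plain case-sensitive std::map rather than the library's header type, ResolveContentSha256 is an invented helper, and hash_body is a hypothetical stand-in for utils::Sha256Hash(body).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

using Headers = std::map<std::string, std::string>;

// Returns the value to sign as x-amz-content-sha256.
// If the caller already set the header (for example "UNSIGNED-PAYLOAD"),
// that value wins and hash_body is never invoked, so the body is not read.
// Otherwise the body is hashed and the digest is stored in the headers.
std::string ResolveContentSha256(
    Headers& headers, const std::function<std::string()>& hash_body) {
  auto it = headers.find("x-amz-content-sha256");
  if (it != headers.end() && !it->second.empty()) {
    return it->second;
  }
  std::string digest = hash_body();
  headers["x-amz-content-sha256"] = digest;
  return digest;
}
```

Passing the hash as a callable keeps the expensive step lazy: for a device-resident body the callable is simply never run.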

RDMA path robustness

  • client.cc: wrap the page-aligned part buffer and cuMemObj registration in AlignedBuffer + ScopedRDMARegistration RAII guards so deregister-before-free ordering is enforced structurally, not by careful early-return ordering.
  • client.cc: when cuMemObjGetDescriptor declines to register (peermem not loaded, IB unhealthy, GPUDirect misconfigured), silently skip the RDMA path and fall through to HTTP multipart instead of failing the whole call.
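
A rough sketch of how two RAII guards can enforce deregister-before-free and expose the decline case. The class names match the ones mentioned above, but the bodies are assumptions: the register and deregister callables are hypothetical stand-ins for the cuObj calls, and ChoosePath is an invented helper that models the fall-through to HTTP multipart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>

// Owns an aligned host buffer and frees it on destruction.
class AlignedBuffer {
 public:
  AlignedBuffer(std::size_t alignment, std::size_t size)
      : ptr_(std::aligned_alloc(alignment, RoundUp(size, alignment))),
        size_(size) {}
  ~AlignedBuffer() { std::free(ptr_); }
  AlignedBuffer(const AlignedBuffer&) = delete;
  AlignedBuffer& operator=(const AlignedBuffer&) = delete;

  void* get() const { return ptr_; }
  std::size_t size() const { return size_; }

 private:
  // std::aligned_alloc requires the size to be a multiple of the alignment.
  static std::size_t RoundUp(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
  }
  void* ptr_;
  std::size_t size_;
};

// Registers a buffer on construction; deregisters only if that succeeded.
class ScopedRDMARegistration {
 public:
  using RegisterFn = std::function<bool(void*, std::size_t)>;
  using DeregisterFn = std::function<void(void*)>;

  ScopedRDMARegistration(void* ptr, std::size_t size, const RegisterFn& reg,
                         DeregisterFn dereg)
      : ptr_(ptr), dereg_(std::move(dereg)), ok_(reg(ptr, size)) {}
  ~ScopedRDMARegistration() {
    if (ok_) dereg_(ptr_);
  }
  ScopedRDMARegistration(const ScopedRDMARegistration&) = delete;
  ScopedRDMARegistration& operator=(const ScopedRDMARegistration&) = delete;

  bool ok() const { return ok_; }

 private:
  void* ptr_;
  DeregisterFn dereg_;
  bool ok_;
};

enum class Path { kRdma, kHttpFallback };

// A declined registration is not an error: the caller takes the HTTP path.
inline Path ChoosePath(const ScopedRDMARegistration& reg) {
  return reg.ok() ? Path::kRdma : Path::kHttpFallback;
}
```

Declaring the AlignedBuffer before the ScopedRDMARegistration in the same scope is what gives the ordering: locals are destroyed in reverse order of declaration, so the registration is released before the buffer is freed on every exit path.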

Bug fixes uncovered during validation

  • client.cc: fix UB in RemoveObjectsResult::Populate. When the caller's func returned false on the first call (empty batch), done_ flipped to true but itr_ stayed uninitialized. operator bool() then compared the uninitialized iterator, which caused a SIGSEGV the first time a caller copied or dereferenced the result. Confirmed pre-existing in main (reproduces with Tier 1 reverted). The fix pins itr_ to errors.end() on empty batches.
  • tests/tests.cc: SelectObjectContent now detects MethodNotAllowed (AIStor doesn't implement S3 Select) and prints skipped: instead of failing the suite.
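
The iterator fix can be illustrated with a reduced model of a batched result. ErrorResult and FetchFn are hypothetical and much simpler than RemoveObjectsResult; the sketch only shows the invariant that the iterator always refers to the current batch, including when the first fetch yields nothing.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>
#include <utility>

class ErrorResult {
 public:
  // Hypothetical fetcher: appends the next batch to `out` and returns false
  // once there is nothing more to fetch.
  using FetchFn = std::function<bool(std::list<std::string>& out)>;

  explicit ErrorResult(FetchFn fetch) : fetch_(std::move(fetch)) { Populate(); }
  ErrorResult(const ErrorResult&) = delete;  // itr_ points into errors_
  ErrorResult& operator=(const ErrorResult&) = delete;

  explicit operator bool() const { return itr_ != errors_.end(); }
  const std::string& operator*() const { return *itr_; }

  ErrorResult& operator++() {
    assert(itr_ != errors_.end());
    ++itr_;
    if (itr_ == errors_.end() && !done_) Populate();
    return *this;
  }

 private:
  void Populate() {
    do {
      errors_.clear();
      if (!fetch_(errors_)) done_ = true;
    } while (errors_.empty() && !done_);
    // Always re-seat the iterator. On an empty batch begin() == end(),
    // so operator bool() compares two valid iterators and yields false.
    itr_ = errors_.begin();
  }

  FetchFn fetch_;
  std::list<std::string> errors_;
  std::list<std::string>::iterator itr_ = errors_.end();
  bool done_ = false;
};
```

Copying is disabled here only to keep the sketch short; a copyable version would need to re-seat the iterator into the copied list.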

Validation

Built with MINIO_CPP_ENABLE_RDMA=ON against vendored cuObj + libcufile, on coe09 (mlx5_0 IB NIC, 400 Gbps). Server: single-node AIStor at 15.15.15.59:9200 from latest miniohq/eos with make install-cgo TAGS=rdma.

MakeBucket()                ✅
RemoveBucket()              ✅
BucketExists()              ✅
ListBuckets()               ✅
StatObject()                ✅
RemoveObject()              ✅
DownloadObject()            ✅
GetObject()                 ✅
ListObjects()               ✅
ListObjects() 1010 objects  ✅
PutObject()                 ✅
CopyObject()                ✅
UploadObject()              ✅
RemoveObjects()             ✅
SelectObjectContent()       ⏭ skipped (server returned MethodNotAllowed — S3 Select not implemented in AIStor)
ListenBucketNotification()  ✅

RDMA round-trip (./GetPutRDMA 15.15.15.59:9200 minioadmin minioadmin 10485760) also passes: PUT + GET of a 10 MiB host-allocated, page-aligned, cuObj-registered buffer succeed against the AIStor RDMA endpoint.

Test plan

  • MINIO_CPP_ENABLE_RDMA=ON build (release)
  • Full tests binary exit 0 against AIStor
  • GetPutRDMA round-trip against AIStor RDMA endpoint
  • Build matrix: MINIO_CPP_ENABLE_RDMA=OFF (non-RDMA consumers)
  • Larger object multipart with the UNSIGNED-PAYLOAD path

…oadPart headers

http.cc:
- Replace the per-request curlpp::Cleanup with a function-local static so
  curl_global_init() runs exactly once per process. The previous code
  re-ran the (non-thread-safe, OpenSSL-touching) global init on every
  HTTP call.
- Attach a process-wide CURLSH to each curlpp::Easy with one
  std::mutex per CURL_LOCK_DATA_* slot, sharing the connection cache,
  DNS cache, and TLS session cache. A single mutex covering all slots
  deadlocked because libcurl can hold one slot's lock while acquiring
  another on the same thread.
- Enable CURLOPT_TCP_KEEPALIVE so pooled sockets survive idle gaps
  between S3 calls.
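
The once-per-process initialization in the first item can be sketched with a function-local static. GlobalInit, EnsureGlobalInit, and InitCount are hypothetical names, and the counter stands in for the real curl_global_init() call so the sketch runs without libcurl.

```cpp
#include <atomic>
#include <cassert>

// Counts how many times the stand-in global setup has run.
inline std::atomic<int>& InitCount() {
  static std::atomic<int> count{0};
  return count;
}

struct GlobalInit {
  GlobalInit() { ++InitCount(); }  // real code: curl_global_init(...)
  ~GlobalInit() {}                 // real code: curl_global_cleanup()
};

// Safe to call from every request path. Since C++11 the initialization of a
// function-local static runs exactly once, even with concurrent callers.
inline void EnsureGlobalInit() {
  static GlobalInit once;
  (void)once;
}
```

Calling EnsureGlobalInit() at the top of each request keeps call sites unchanged while the setup cost is paid only on the first call.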

baseclient:
- Guard region_map_ with a std::shared_mutex. Reads in
  HandleRedirectResponse and GetRegion now go through find() instead
  of operator[], so they no longer mutate the map (the prior code
  silently inserted empty entries on every lookup miss) and can take
  a shared lock; the erase and write sites take a unique lock.
- UploadPart now propagates args.headers into the wrapped
  PutObjectApiArgs so caller-supplied per-part headers reach the
  signer.
…t, fix RemoveObjectsResult UB

request.cc:
- BuildHeaders now honors a caller-supplied x-amz-content-sha256.
  When a Put/Post caller pre-sets it (e.g. "UNSIGNED-PAYLOAD" for a
  GPU-resident body) the signer uses that value and skips
  utils::Sha256Hash(body), avoiding a host-side read of device memory
  just to compute a signing hash that TLS already authenticates.

client.cc:
- Wrap the part-buffer allocation and cuObj registration in two small
  RAII guards (AlignedBuffer, ScopedRDMARegistration) so deregistration
  always runs before the buffer is freed and is enforced structurally,
  not by careful early-return ordering.
- When the cuObj layer is connected but cuMemObjGetDescriptor declines
  to register the buffer (e.g. nvidia_peermem.ko not loaded, IB device
  unhealthy, GPUDirect misconfigured), silently skip the RDMA path and
  fall through to HTTP multipart instead of failing the whole call.
- The GPU-buffer HTTP fallback in PutObject now sets
  x-amz-content-sha256: UNSIGNED-PAYLOAD, and the multipart loop
  propagates that header into every UploadPart, so the signer does
  not have to hash device memory.
- Fix UB in RemoveObjectsResult::Populate: when the caller's func
  returned false on the very first call (empty batch), done_ was
  set true but itr_ stayed uninitialized, so operator bool() compared
  an uninit iterator and the first dereference segfaulted. Pin
  itr_ to errors.end() when the batch is empty.

tests/tests.cc:
- SelectObjectContent now detects MethodNotAllowed (the server does
  not implement S3 Select) and prints "skipped:" instead of failing
  the suite.
@harshavardhana harshavardhana force-pushed the tier1-http-perf-fixes branch from ea3bf62 to c0b2cc8 Compare May 27, 2026 07:47
@harshavardhana harshavardhana merged commit 007c67a into main May 27, 2026
7 checks passed
@harshavardhana harshavardhana deleted the tier1-http-perf-fixes branch May 27, 2026 08:04