rdma: (fix) do not use cq_entry.len in send completions
According to the libfabric specs, the len field in a completion entry
only applies to completed receive operations. We were using it for send
completions as well. This currently works with the EFA provider, but
nothing guarantees that it will keep working in the future or with
other providers.

Signed-off-by: Amedeo Sapio <asapio@amazon.com>
AmedeoSapio authored and rajachan committed Apr 18, 2024
1 parent 4c9a063 commit 2c191c9
Showing 1 changed file with 1 addition and 1 deletion.
src/nccl_ofi_rdma.c
@@ -1433,7 +1433,7 @@ static inline int process_completions(struct fi_cq_data_entry *cq_entry, uint64_
 
 if (req->type == NCCL_OFI_RDMA_SEND_CONN || req->type == NCCL_OFI_RDMA_SEND_CONN_RESP) {
 	/* CONN or CONN_RESP send completion */
-	ret = inc_req_completion(req, cq_entry[comp_idx].len, 1);
+	ret = inc_req_completion(req, sizeof(nccl_ofi_rdma_connection_info_t), 1);
 
 } else if (req->type == NCCL_OFI_RDMA_SEND_CTRL) {
 	/* CTRL message send completion */