fi_getname() doesn't work with infiniband verbs #6554
Hi all,

I am trying to implement a basic one-sided RMA communication mechanism with libfabric and I'm stuck. I would appreciate a pointer about what I'm missing.

Expected result: fi_getname() returns the name of the current endpoint (to be used by another endpoint which will perform remote writes).

My code works with the "shm" provider but it doesn't with "verbs;ofi_rxm".

This is a minimal reproducer for my problem (the compilable version of the code is attached):

int main (int argc, char *argv[])
{
fi_context ctx;
struct fi_info *hints = fi_allocinfo();
assert(hints);
hints->ep_attr->type = FI_EP_RDM;
hints->addr_format = FI_FORMAT_UNSPEC;
hints->caps = FI_MSG | FI_RMA;
hints->mode = FI_CONTEXT;
hints->domain_attr->mr_mode = FI_MR_LOCAL | FI_MR_ALLOCATED | FI_MR_PROV_KEY | FI_MR_VIRT_ADDR;
hints->fabric_attr->prov_name = strdup("verbs");
struct fi_info *info;
CHECK(fi_getinfo(FI_VERSION(1, 11), argv[1], argv[2], 0, hints, &info));
assert(info);
std::cerr << "name=" << info->fabric_attr->name << " prov_name=" << info->fabric_attr->prov_name << std::endl;
struct fid_fabric *fabric;
CHECK(fi_fabric(info->fabric_attr, &fabric, &ctx));
struct fid_domain *domain;
CHECK(fi_domain(fabric, info, &domain, &ctx));
struct fid_ep *ep;
CHECK(fi_endpoint(domain, info, &ep, &ctx));
// Init CQ
fi_cq_attr cqAttr = {0};
cqAttr.format = FI_CQ_FORMAT_CONTEXT;
cqAttr.wait_obj = FI_WAIT_NONE;
cqAttr.size = info->tx_attr->size;
struct fid_cq *cq_tx, *cq_rx;
CHECK(fi_cq_open(domain, &cqAttr, &cq_tx, &ctx));
cqAttr.size = info->rx_attr->size;
CHECK(fi_cq_open(domain, &cqAttr, &cq_rx, &ctx));
int flags = FI_TRANSMIT;
flags |= FI_SELECTIVE_COMPLETION;
CHECK(fi_ep_bind(ep, &cq_tx->fid, flags));
flags = FI_RECV;
flags |= FI_SELECTIVE_COMPLETION;
CHECK(fi_ep_bind(ep, &cq_rx->fid, flags));
// Init Address vector
fi_av_attr avAttr;
memset(&avAttr, 0, sizeof(avAttr));
avAttr.count = 1;
struct fid_av *av;
CHECK(fi_av_open(domain, &avAttr, &av, NULL));
CHECK(fi_ep_bind(ep, &av->fid, 0));
CHECK(fi_enable(ep));
// GET NAME
size_t len = 256;
char addr[len];
memset(addr, '\0', len);
CHECK(fi_getname(&ep->fid, addr, &len));
std::cerr << "addr=" << addr << " len=" << len << std::endl;
std::cerr << std::hex << "0x" << *((unsigned long long *)(addr)) << std::endl;

This is the output with FI_LOG_LEVEL=warn and node=the_ip_address_corresponding_to_ib:

The "addr=�" part is what doesn't make sense to me.

Relevant details of the environment:

In this environment the

Let me know if you need additional information.

Thank you in advance,
Replies: 1 comment 1 reply
fi_getname() returns a binary address, not a string. In the example above, the format of that address can be determined by looking at info->addr_format. I think verbs uses a sockaddr-based format.