-
Notifications
You must be signed in to change notification settings - Fork 659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EFA host information userspace version #918
Conversation
To commit ?? ("RDMA/efa: Report userspace version in host info"). Signed-off-by: Gal Pressman <galpress@amazon.com>
Report the userspace package version to the kernel driver. Signed-off-by: Gal Pressman <galpress@amazon.com>
@@ -57,6 +57,8 @@ static struct verbs_context *efa_alloc_context(struct ibv_device *vdev, | |||
|
|||
cmd.comp_mask |= EFA_ALLOC_UCONTEXT_CMD_COMP_TX_BATCH; | |||
cmd.comp_mask |= EFA_ALLOC_UCONTEXT_CMD_COMP_MIN_SQ_WR; | |||
strncpy((char *)cmd.userspace_ver, PACKAGE_VERSION, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about it, but probably this is wrong version to report.
There are situations where provides/efa version will be different from rdma-core package_version.
For example, how well will it work for static compiled libraries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is rdma-core version reall valuable anyhow? I tought most of the tricky code in EFA was part of libfabric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you mean dynamically compiled libraries? How could the rdma-core package version be different than the EFA provider's if it's compiled statically?
What do you have in mind, passing
1 1.1.${PACKAGE_VERSION} |
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you mean dynamically compiled libraries? How could the rdma-core package version be different than the EFA provider's if it's compiled statically?
What do you have in mind, passing
1 1.1.${PACKAGE_VERSION} instead?
I mean that in theory you can compile rdma-core with -DENABLE_STATIC=1 and take the libefa.a file to your own hello_world application, but still use libibverbs.so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgunthorpe Well, it's both.
Having the libfabric version would clearly be nice too, but it's not necessarily instead of the rdma-core version (especially since not all apps necessarily run over libfabric).
Libfabric will require a weird direct verb to pass its version, that's a topic for another day 😄 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree, this has the potential to go too far in regards to what should be reported through rdma and what shouldn't.
But specifically talking about rdma-core/provider version, it is very tightly coupled with the EFA driver (as opposed to libfabric/any other app), it's two pieces of the same driver so the userspace version is actually needed to get the full picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that in theory you can compile rdma-core with -DENABLE_STATIC=1 and take the libefa.a file to your own hello_world application, but still use libibverbs.so.
No, you can't, it is blocked. All must be matched.
You are assuming that rdma-core is clean and received from distro. It is not always true.
It can be user-compiled and installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, our build system mangles all private symbols in the lib.a files to include the package version. It is deliberately made impossible to mix and match, and it is completely impossible for a static provider to link to a dynamic libibverbs at all.
$ nm lib/libmlx4.a
U rdmacore33_0_ibv_cmd_modify_cq
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Leon and Jason, I'm leaving the patch as is.
Let me know if anything else needs fix.
This small series reports the userspace rdma-core version to the EFA driver which passes it to the device through the host information feature, used for debugging and troubleshooting purposes.
Kernel submission:
https://lore.kernel.org/linux-rdma/20210105104326.67895-1-galpress@amazon.com/