Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EFA host information userspace version #918

Closed
wants to merge 2 commits into from

Conversation

gal-pressman
Copy link
Contributor

@gal-pressman gal-pressman commented Jan 5, 2021

This small series reports the userspace rdma-core version to the EFA driver which passes it to the device through the host information feature, used for debugging and troubleshooting purposes.

Kernel submission:
https://lore.kernel.org/linux-rdma/20210105104326.67895-1-galpress@amazon.com/

To commit ?? ("RDMA/efa: Report userspace version in host info").

Signed-off-by: Gal Pressman <galpress@amazon.com>
Report the userspace package version to the kernel driver.

Signed-off-by: Gal Pressman <galpress@amazon.com>
@@ -57,6 +57,8 @@ static struct verbs_context *efa_alloc_context(struct ibv_device *vdev,

cmd.comp_mask |= EFA_ALLOC_UCONTEXT_CMD_COMP_TX_BATCH;
cmd.comp_mask |= EFA_ALLOC_UCONTEXT_CMD_COMP_MIN_SQ_WR;
strncpy((char *)cmd.userspace_ver, PACKAGE_VERSION,
Copy link
Member

@rleon rleon Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about it, but probably this is wrong version to report.
There are situations where provides/efa version will be different from rdma-core package_version.
For example, how well will it work for static compiled libraries?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is rdma-core version reall valuable anyhow? I tought most of the tricky code in EFA was part of libfabric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you mean dynamically compiled libraries? How could the rdma-core package version be different than the EFA provider's if it's compiled statically?

What do you have in mind, passing

1 1.1.${PACKAGE_VERSION}

instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you mean dynamically compiled libraries? How could the rdma-core package version be different than the EFA provider's if it's compiled statically?

What do you have in mind, passing

1 1.1.${PACKAGE_VERSION}

instead?

I mean that in theory you can compile rdma-core with -DENABLE_STATIC=1 and take the libefa.a file to your own hello_world application, but still use libibverbs.so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgunthorpe Well, it's both.
Having the libfabric version would clearly be nice too, but it's not necessarily instead of the rdma-core version (especially since not all apps necessarily run over libfabric).

Libfabric will require a weird direct verb to pass its version, that's a topic for another day 😄 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree, this has the potential to go too far in regards to what should be reported through rdma and what shouldn't.

But specifically talking about rdma-core/provider version, it is very tightly coupled with the EFA driver (as opposed to libfabric/any other app), it's two pieces of the same driver so the userspace version is actually needed to get the full picture.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that in theory you can compile rdma-core with -DENABLE_STATIC=1 and take the libefa.a file to your own hello_world application, but still use libibverbs.so.

No, you can't, it is blocked. All must be matched.

You are assuming that rdma-core is clean and received from distro. It is not always true.
It can be user-compiled and installed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, our build system mangles all private symbols in the lib.a files to include the package version. It is deliberately made impossible to mix and match, and it is completely impossible for a static provider to link to a dynamic libibverbs at all.

$ nm lib/libmlx4.a
U rdmacore33_0_ibv_cmd_modify_cq

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Leon and Jason, I'm leaving the patch as is.
Let me know if anything else needs fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants