How to compile the CUDA-enabled version without GPU-enabled hfi1 driver? #57

Open
RemiLacroix-IDRIS opened this issue Aug 24, 2020 · 8 comments

Comments

@RemiLacroix-IDRIS
Contributor

Hello,

We currently manage all installations for our cluster from a node that has no GPU and consequently no GPU-enabled hfi1 driver.

Due to the following code snippet, this prevents us from building the CUDA-enabled version of PSM2:
https://github.com/intel/opa-psm2/blob/7a33bedc4bb3dff4e57c00293a2d70890db4d983/psm_hal_gen1/psm_hal_inline_i.h#L507-L516

Is there any way to work around that? There is a runtime check to ensure the hfi1 driver is actually GPU-enabled; wouldn't that be enough?

Best regards,
Rémi

@mwheinz

mwheinz commented Aug 25, 2020

If you look in the IFS package, the CUDA binaries should be there. You should be able to find the CUDA versions of the RPMs and, using commands like opascpall and opacmd, install them on the appropriate nodes.

@RemiLacroix-IDRIS
Contributor Author

We are in a context where we would like to build PSM2 instead of installing it from the RPMs.

@ToddRimmer

ToddRimmer commented Aug 25, 2020 via email

@RemiLacroix-IDRIS
Contributor Author

That's unfortunate but thanks for the answer.

> So to build the CUDA-enabled PSM, you need to have the CUDA-enabled hfi1 driver installed so its header files are available.

Wouldn't it be possible to distribute the required headers with PSM and test at runtime that the actual driver has the proper capabilities?

@BrendanCunningham
Contributor

> That's unfortunate but thanks for the answer.
>
> > So to build the CUDA-enabled PSM, you need to have the CUDA-enabled hfi1 driver installed so its header files are available.
>
> Wouldn't it be possible to distribute the required headers with PSM and test at runtime that the actual driver has the proper capabilities?

As we (PSM2) do not maintain hfi1 and we wish for PSM2 to build against the hfi1 headers installed on the system, we are not going to include the hfi1 headers with PSM2.

Runtime check

PSM2 does check at runtime whether the loaded hfi1 has matching GPUDirect capabilities:
https://github.com/intel/opa-psm2/blob/7a33bedc4bb3dff4e57c00293a2d70890db4d983/psm_context.c#L537-L550

That is, the following combinations do not work or are not advisable:

  • PSM2 (no CUDA) with hfi1-gpudirect => fatal
  • PSM2-CUDA with hfi1 (no GPUDirect support) => warning
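
The two combinations above can be written down as a small decision table, for example in a pre-deployment check script. This is only a sketch: the function name `gpudirect_compat` and its "yes"/"no" arguments are hypothetical, and how you determine whether a given libpsm2 or hfi1 build has CUDA/GPUDirect support is left to the caller. The two matching combinations are assumed to be fine, since the runtime check only complains about mismatches.

```shell
#!/bin/sh
# Hypothetical helper, not part of PSM2 or IFS.
# gpudirect_compat LIB_CUDA DRIVER_GPUDIRECT
#   LIB_CUDA         "yes" if PSM2 was built with CUDA support, else "no"
#   DRIVER_GPUDIRECT "yes" if the loaded hfi1 has GPUDirect support, else "no"
# Prints the outcome described in this thread: ok, warning, or fatal.
gpudirect_compat() {
    case "$1:$2" in
        no:yes)        echo fatal ;;    # non-CUDA PSM2 on hfi1-gpudirect
        yes:no)        echo warning ;;  # PSM2-CUDA on hfi1 without GPUDirect
        yes:yes|no:no) echo ok ;;       # capabilities match (assumed fine)
        *)
            echo "usage: gpudirect_compat yes|no yes|no" >&2
            return 2
            ;;
    esac
}
```

For example, `gpudirect_compat yes no` prints `warning`, which corresponds to running a CUDA-enabled PSM2 on nodes whose hfi1 driver lacks GPUDirect support.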

Building on a host that does not have the hfi1-gpudirect headers

You can get the hfi1 headers needed to build PSM2 with CUDA support (uapi/rdma/hfi/hfi1_{user,ioctl}.h) from the ifs-kernel-updates-devel .rpm found in an IFS tarball (from Intel RDC).

IFS tarballs for most distros should have both CUDA and non-CUDA ifs-kernel-updates-devel .rpms. Right now, the hfi1 headers in both the CUDA and non-CUDA ifs-kernel-updates-devel .rpms have the required CUDA/GPUDirect definitions.

You can install the ifs-kernel-updates-devel .rpm on your build node (headers will go under /usr/include/uapi/rdma/hfi). Alternatively, you can extract the .rpm with rpmdev-extract, place the headers where you want, then edit IFS_HFI_HEADER_PATH in psm/buildflags.mak to point to the appropriate uapi/ grandparent of hfi/hfi1_{user,ioctl}.h. I have tried this and it works.
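
The extraction route described above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: `find_uapi_dir` is a hypothetical helper, the .rpm file name and directories are placeholders, `PSM_CUDA=1` is assumed to be the make flag that enables the CUDA build, and passing `IFS_HFI_HEADER_PATH` on the make command line is assumed to be equivalent to editing it in the makefile (GNU make lets command-line variables override ordinary makefile assignments). It also assumes that the path should be the `uapi/` directory itself, that is, the directory containing `rdma/hfi/hfi1_{user,ioctl}.h`.

```shell
#!/bin/sh
# Hypothetical helper, not part of PSM2 or IFS.
# find_uapi_dir ROOT
#   Print the uapi/ directory under ROOT that contains rdma/hfi/hfi1_user.h.
#   Returns non-zero if no such header is found.
find_uapi_dir() {
    hdr=$(find "$1" -type f -path '*/uapi/rdma/hfi/hfi1_user.h' | head -n 1)
    [ -n "$hdr" ] || return 1
    # Strip the trailing rdma/hfi/hfi1_user.h to get the uapi/ directory.
    printf '%s\n' "${hdr%/rdma/hfi/hfi1_user.h}"
}

# Intended use (not executed here; all paths are placeholders):
#   mkdir /path/to/hfi1-headers && cd /path/to/hfi1-headers
#   rpmdev-extract /path/to/ifs-kernel-updates-devel-<version>.rpm
#   cd /path/to/opa-psm2
#   make PSM_CUDA=1 \
#        IFS_HFI_HEADER_PATH="$(find_uapi_dir /path/to/hfi1-headers)"
```

Keeping the extracted headers outside `/usr/include` leaves the build node's system headers untouched, which may matter on a shared installation node.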

Let me know if this helps or if you have any more questions. Thanks.

Brendan

@RemiLacroix-IDRIS
Contributor Author

Just to be sure I understand correctly, this RPM is not installed by default?

@BrendanCunningham
Contributor

BrendanCunningham commented Sep 2, 2020

> Just to be sure I understand correctly, this RPM is not installed by default?

No, the IFS 'INSTALL' script should install ifs-kernel-updates-devel.

I am saying that if you did not install IFS on your build node, you can extract the hfi1 headers required to build PSM2 from the ifs-kernel-updates-devel .rpm found in the IFS tarball.

@RemiLacroix-IDRIS
Contributor Author

Ok, then I need to double-check what is happening here because I couldn't find any /usr/include/uapi directory on our nodes, although I am confident that we have IFS installed on those.
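
One way to double-check is to search the node for the two header files by name rather than assuming they live under `/usr/include/uapi`. The sketch below is only an illustration: `list_hfi1_header_dirs` is a hypothetical helper, and the directories suggested in the comments are guesses about where headers might end up, not locations confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of PSM2 or IFS.
# list_hfi1_header_dirs ROOT...
#   Print each directory under the given roots that contains
#   hfi1_user.h or hfi1_ioctl.h, one directory per line.
list_hfi1_header_dirs() {
    find "$@" -type f \( -name hfi1_user.h -o -name hfi1_ioctl.h \) 2>/dev/null \
        | sed 's|/[^/]*$||' \
        | sort -u
}

# Possible checks on a node (not executed here):
#   list_hfi1_header_dirs /usr/include /usr/src
#   rpm -q ifs-kernel-updates-devel     # is the -devel package installed?
#   rpm -ql ifs-kernel-updates-devel    # which files does it own?
```

If the package is installed but its headers sit somewhere other than `/usr/include/uapi`, the directory printed here shows which path the build would need to be pointed at.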
