e4s: add upcxx +cuda #32157
Conversation
I have no easy way to test and no detailed knowledge of E4S internals, but looks right to me.
Force-pushed from c5e120d to 9d5375a
Looks like we can't build this in our standard container environment due to lack of NVIDIA driver:
The package fails to build. See https://gitlab.spack.io/spack/spack/-/jobs/3029172.
This sounds like the expected behavior. The CUDA support in UPC++ (and the underlying GASNet-EX communication layer) exists to allow GPUDirect RDMA offload communication. As such, the libraries have a requirement on the CUDA Driver Library when using the
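The distinction above matters because the CUDA Driver Library (libcuda) is installed by the NVIDIA driver package, not by the CUDA toolkit, so it is absent in a driverless container. A minimal Python sketch of that availability check (illustrative only; this is not the Spack package's or UPC++'s actual probe) might look like:

```python
import ctypes

def cuda_driver_available() -> bool:
    """Return True only if the CUDA driver library loads AND
    initializes. Hypothetical sketch; not UPC++ configure code."""
    try:
        # libcuda.so.1 comes from the NVIDIA driver install,
        # so it is typically missing on driverless CI hosts.
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return False
    # cuInit(0) returns CUDA_SUCCESS (0) only with a usable driver;
    # a stub library can load here yet still fail this call.
    return libcuda.cuInit(0) == 0

print(cuda_driver_available())
```

On the driverless CI runners described in this thread, a probe like this would report `False` at the `CDLL` step, which is consistent with the configure failure seen in the linked job.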
Of course! We have so many packages in E4S that I sometimes forget which ones require the driver. In the future we can adapt our CI build environment so that upcxx and others can be targeted to build in containers scheduled to machines where the GPU driver is available. Thank you!
Looking at UPCXX's
Line 1073 above can exit OK just using the stub driver, but then there is the additional check on the exit code for
We have a number of things going on at our end right now, including getting new releases of UPC++ and GASNet-EX out the door, with corresponding updates to their Spack packages to follow. Is there an impending cutoff date for a Spack and/or E4S release before which we need to address the inability to test this PR in an NVIDIA-driverless environment?
Independent of any other points in the existing discussion thread, I want to respond to some things @eugeneswalker said:
AND
There are reasons other than just the CUDA driver library to be concerned about the prospect of configuring and building UPC++ and its underlying GASNet-EX communications library in one environment for later deployment in a different environment. So this "future adaptation" is something we should potentially discuss elsewhere (when @bonachea and I are not quite as pressed for time as we are this week).
In response to what I think is the main sticking point for this pull request: I believe the current configuration logic accurately reflects a probe for a supported environment. We link the driver API as a normal library and do not currently have all of the necessary logic to allow reliable substitution of the stub libs. Therefore, removing the return code check would likely allow one to proceed past configuration, only to instead fail at application runtime. I am not sure how to best address the fact that "your standard container environment" wants to build the
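The failure mode described here, passing a link-style check against the stub library but failing once the application actually runs, can be sketched as a two-stage probe. This is a hypothetical illustration of the reasoning, not the actual UPC++ configure script:

```python
import ctypes
import ctypes.util

def probe_cuda_driver() -> str:
    """Hypothetical two-stage probe: stage 1 mirrors a link check
    (which a stub libcuda can satisfy), stage 2 mirrors the
    runtime exit-code check discussed in this thread."""
    # Stage 1: resolve and load *some* CUDA driver library.
    # The toolkit's stub libcuda passes this stage.
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        libcuda = ctypes.CDLL(name)
    except OSError:
        return "link-style check failed: no libcuda found"
    # Stage 2: actually initialize the driver. A stub library
    # (or a driverless container) fails here with a nonzero
    # CUresult, which is the failure that would otherwise be
    # deferred to application runtime.
    rc = libcuda.cuInit(0)
    if rc != 0:
        return f"runtime check failed: cuInit returned {rc}"
    return "driver usable"

print(probe_cuda_driver())
```

Dropping the stage-2 return-code check is exactly the change argued against above: configure would succeed in the driverless container, and the error would surface only when a user launches the application.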
No, not at all. It would be convenient if the driver weren’t required in order to ensure
Force-pushed from 9d5375a to 828f3ae
@spackbot run pipeline
I've started that pipeline for you!
Pull request was closed
Add upcxx +cuda to E4S stack
@wspear @bonachea