GPU calls in CPU mode #533

Open
haampie opened this issue Aug 20, 2020 · 4 comments

Comments

@haampie
Collaborator

haampie commented Aug 20, 2020

At various places we have if (acc::num_devices() > 0) { ... }, which still gets executed in CPU mode when the hardware is present. I only noticed this because I didn't yet have the fix for the excessive number of streams, and acc::create_streams made tests fail on Daint even though --control.processing_unit=cpu was set.
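For illustration, a guard that also respects the requested processing unit could look roughly like the sketch below. The names `processing_unit` and `gpu_calls_allowed` are hypothetical and not taken from SIRIUS; the device count is passed in as a plain integer standing in for the result of acc::num_devices() so that the snippet compiles on its own.

```cpp
// Hypothetical sketch, not SIRIUS code.
enum class processing_unit { cpu, gpu };

// Returns true only if the GPU was requested AND at least one device is present.
// num_devices stands in for the value returned by acc::num_devices().
inline bool gpu_calls_allowed(processing_unit pu, int num_devices)
{
    return pu == processing_unit::gpu && num_devices > 0;
}
```

With such a predicate, a block like if (gpu_calls_allowed(pu, acc::num_devices())) { ... } would be skipped in CPU mode even on a node that has a GPU.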

@toxa81
Collaborator

toxa81 commented Aug 20, 2020

A valid point. But the case "GPU is present, but the run is on the CPU" is mostly for debugging purposes. It should not be used in production.
The more likely case, "code compiled with GPU support, but no GPU device found", should be handled properly.

@haampie
Collaborator Author

haampie commented Aug 20, 2020

Yeah, I see. My real issue in the end appears to be not having set CRAY_CUDA_MPS=1. Running multi-process MPI tests in CPU mode on a single node with a GPU doesn't work otherwise.

@simonpintarelli
Collaborator

The if (acc::num_devices() > 0) check is used to guard calls to GPU functions when there is no device. A system without a device, but with GPU-enabled code, can be simulated using export CUDA_VISIBLE_DEVICES (should work). But it would make sense to disable at least the creation of streams if the processing unit is CPU.
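A minimal sketch of what skipping stream creation in CPU mode might look like is given below. Everything here is an assumption for illustration: `processing_unit` and `create_streams_if_needed` are hypothetical names, and the callback stands in for a call such as acc::create_streams, whose real signature is not shown in this thread.

```cpp
#include <functional>

// Hypothetical sketch, not SIRIUS code.
enum class processing_unit { cpu, gpu };

// Creates streams only when the GPU is requested and a device is present.
// create_streams is a stand-in for the real stream-creation call.
// Returns the number of streams that were requested from the callback.
inline int create_streams_if_needed(processing_unit pu, int num_devices, int num_streams,
                                    const std::function<void(int)>& create_streams)
{
    if (pu != processing_unit::gpu || num_devices <= 0) {
        return 0; // CPU mode or no device: do not touch the GPU runtime at all
    }
    create_streams(num_streams);
    return num_streams;
}
```

The point of the design is that in CPU mode the callback is never invoked, so no GPU runtime call is made even when a device is visible.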

@toxa81
Collaborator

toxa81 commented Aug 20, 2020

Agreed. But this happens very early in sirius::initialize(). This function should get the information about the CPU device as soon as possible. We can pass the information found on the command line or use a "hacky" solution with environment variables. Say, `export SIRIUS_PU_DEVICE=CPU` will be the only way to control which device to use.
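The environment-variable idea could be sketched as follows. This is only an illustration under stated assumptions: SIRIUS_PU_DEVICE is the variable name proposed above, while the helper names, the case-insensitive matching, and the acceptance of a "GPU" value are hypothetical additions, not existing SIRIUS behaviour.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical sketch, not SIRIUS code.
enum class processing_unit { cpu, gpu, unspecified };

// Parses a value such as "CPU" or "gpu" (case-insensitive; an assumption).
// A null pointer or an unknown value maps to unspecified.
inline processing_unit parse_processing_unit(const char* value)
{
    if (value == nullptr) {
        return processing_unit::unspecified;
    }
    std::string s(value);
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (s == "cpu") {
        return processing_unit::cpu;
    }
    if (s == "gpu") {
        return processing_unit::gpu;
    }
    return processing_unit::unspecified;
}

// Reads the variable proposed in this thread; std::getenv returns nullptr if it is unset.
inline processing_unit processing_unit_from_env()
{
    return parse_processing_unit(std::getenv("SIRIUS_PU_DEVICE"));
}
```

Because it needs no command-line parsing, a helper like processing_unit_from_env() could be called at the very start of initialization, before any device query or stream creation.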
