Assign GPU to MPI rank #261

Open
uvilla opened this issue Mar 22, 2024 · 0 comments

Currently, TPS assigns a GPU to each MPI rank as follows (see here):

device_id = mpi_rank % numGpusPerRank

where numGpusPerRank is set from the .ini file.
The default value of this variable is 1; see here. None of the *.ini input files in our test suite changes the default value, so I assume all local jobs are running on a single GPU.
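
For context, the current assignment boils down to something like the sketch below. This is illustrative only, not the actual TPS code: the HIP runtime call is an assumption chosen for concreteness (the CUDA equivalent is cudaSetDevice), and only the names device_id, mpi_rank, and numGpusPerRank come from the description above.

    #include <mpi.h>
    #include <hip/hip_runtime.h>

    // Sketch of the current policy: each rank picks a device by taking its
    // global MPI rank modulo numGpusPerRank, which is read from the .ini
    // file and defaults to 1 (so device 0 is used unless it is overridden).
    int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);

      int mpi_rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

      int numGpusPerRank = 1;  // default value from the .ini file
      int device_id = mpi_rank % numGpusPerRank;
      hipSetDevice(device_id);

      // ... solver setup and run would follow here ...

      MPI_Finalize();
      return 0;
    }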

This makes TPS hard to port across different clusters and local machines. Some schedulers (e.g. those on TACC) make all GPUs on a node visible to every task on that node, while other schedulers (e.g. flux) restrict which GPUs each task can see (e.g. through the variable ROCR_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, NVIDIA_VISIBLE_DEVICES, or CUDA_VISIBLE_DEVICES).

I propose a more flexible way to handle this by introducing the command-line argument --gpu-affinity (shorthand -ga).

Three affinity policies will be available:

  • default: Set the deviceID to 0. This is perfect for local resources with a single GPU or when the scheduler restricts which devices are visible to a task (like flux does).

  • direct (default): Set the deviceID equal to the MPI rank. This is perfect on a single node (local or on a cluster) when the number of MPI tasks is less than or equal to the number of GPUs.

  • env-localid: Set the device id from an environment variable whose name is given with --localid-varname. Many schedulers set an environment variable that provides a local numbering of the tasks running on a given node: in Slurm this variable is called SLURM_LOCALID, in flux it is FLUX_TASK_LOCAL_ID. See also: https://docs.nersc.gov/jobs/affinity/#gpus
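
A rough sketch of how the proposed policy dispatch could look is below. This is hypothetical code, not an implementation: the policy names and the options --gpu-affinity and --localid-varname come from the proposal above, while the helper name selectDevice and the HIP call are illustrative assumptions.

    #include <mpi.h>
    #include <cstdlib>
    #include <string>
    #include <hip/hip_runtime.h>

    // Hypothetical helper implementing the three proposed policies.
    // policy:          "default" | "direct" | "env-localid" (from --gpu-affinity)
    // localidVarname:  name of the scheduler's local-id variable
    //                  (from --localid-varname), e.g. SLURM_LOCALID or
    //                  FLUX_TASK_LOCAL_ID.
    int selectDevice(const std::string &policy, const std::string &localidVarname) {
      int device_id = 0;  // "default": always use device 0
      if (policy == "direct") {
        int mpi_rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
        device_id = mpi_rank;  // assumes #tasks <= #GPUs on the node
      } else if (policy == "env-localid") {
        const char *localid = std::getenv(localidVarname.c_str());
        if (localid != nullptr) device_id = std::atoi(localid);
      }
      hipSetDevice(device_id);
      return device_id;
    }

With env-localid, the same run configuration could work under both Slurm and flux by changing only --localid-varname on the command line.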
