Support Nvidia GPU-MPS #5

Closed
lsawade opened this issue Mar 5, 2022 · 1 comment

lsawade commented Mar 5, 2022

Hi,

I would like to think about how to implement GPU-MPS. Right now, job.py hardcodes the -a flag to -a 1. Given the [nprocs, cpus-per-task, gpus-per-task] setup, supporting MPS looks like it would require quite a lot of recoding on your end. To show how I usually implement GPU-MPS, I attached a Specfem example below.

My main worry is that, on the one hand, creating a new LSF job class to support this would be quite easy, but it starts making the package cluttered. On the other hand, incorporating the GPU-MPS capability into the current LSF(Job) class may overcomplicate it.

What do you think?

Maybe an add_special_mpi() in node.py and a special_mpiexec() in job.py?
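
To make the idea concrete, here is a rough standalone sketch of what such a pair could look like. The method names follow the proposal above, but the signatures, the tasks_per_gpu parameter, and the class internals are my assumptions, not existing nnodes code:

# Hypothetical sketch only -- not part of nnodes.
class LSF:
    jsrun = 'jsrun'

    def mpiexec(self, cmd, nprocs, cpus_per_proc, gpus_per_proc):
        # Current behaviour: one task per resource set, hence the hardcoded -a 1.
        return (f'{self.jsrun} -n {nprocs} -a 1 '
                f'-c {cpus_per_proc} -g {gpus_per_proc} {cmd}')

    def special_mpiexec(self, cmd, nprocs, cpus_per_proc, tasks_per_gpu):
        # MPS path: pack `tasks_per_gpu` tasks into each resource set,
        # give the set one full GPU, and shrink the number of sets.
        return (f'{self.jsrun} -n {nprocs // tasks_per_gpu} -a {tasks_per_gpu} '
                f'-c {cpus_per_proc * tasks_per_gpu} -g 1 {cmd}')

class Node:
    def __init__(self, job):
        self.job = job

    def add_special_mpi(self, cmd, nprocs, cpus_per_proc, tasks_per_gpu):
        # In nnodes this would queue a task; here it just returns the command.
        return self.job.special_mpiexec(cmd, nprocs, cpus_per_proc, tasks_per_gpu)

# Specfem case from the example below: 24 tasks, 4 MPS slices per GPU.
print(Node(LSF()).add_special_mpi('./bin/xspecfem3D', 24, 1, 4))
# jsrun -n 6 -a 4 -c 4 -g 1 ./bin/xspecfem3D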


Specfem example

I compile Specfem for 6 chunks and NEX_*=2, so a total of 24 MPI tasks. Now I want to run Specfem on a single node using 6 GPUs. GPU-MPS has to be enabled at the job-request level using the line

#BSUB -alloc_flags "gpumps"

Then, to run Specfem, you have to assign 4 tasks to a single GPU. The way I do it is to ask for 6 resource sets, each with 4 tasks and 4 CPUs but only 1 GPU:

jsrun -n 6 -a 4 -c 4 -g 1 ./bin/xspecfem3D
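
As a quick sanity check of those numbers (my own standalone sketch, not nnodes code; the variable names are illustrative), the resource-set layout follows directly from the task and GPU counts:

# Derive the jsrun resource-set flags from the task and GPU counts above.
total_tasks = 24       # 6 chunks compiled into 24 MPI tasks
gpus = 6               # GPUs used on the node
cpus_per_task = 1

tasks_per_set = total_tasks // gpus           # -a 4: tasks sharing one GPU via MPS
cpus_per_set = cpus_per_task * tasks_per_set  # -c 4: CPUs for the whole set
print(f'jsrun -n {gpus} -a {tasks_per_set} -c {cpus_per_set} -g 1 ./bin/xspecfem3D')
# -> jsrun -n 6 -a 4 -c 4 -g 1 ./bin/xspecfem3D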

lsawade commented Mar 6, 2022

As a fix, @icui implemented support for gpus_per_task being a floating-point number, so a fractional value packs several MPI tasks onto a single GPU.

Compare the 24-GPU setup with the 6-GPU setup using 4 MPS slices per GPU:

  1. 24 GPUs

    node.add_mpi('bin/xspecfem3D', 24, (1, 1), cwd=specfemdir)

    results in the following jsrun command:

    jsrun -n 24 -a 1 -c 1 -g 1 ./bin/xspecfem3D

  2. 6 GPUs

    node.add_mpi('bin/xspecfem3D', 24, (1, 0.25), cwd=specfemdir)

    results in the following jsrun command:

    jsrun -n 6 -a 4 -c 4 -g 1 ./bin/xspecfem3D

Source in nnodes/job.py:

class LSF(Job):
    ...
    def mpiexec(...):
        ...
        a = 1  # tasks per resource set (jsrun -a), default one task per GPU

        if isinstance(gpus_per_proc, float):
            # Fractional gpus_per_proc, e.g. 0.25 -> 4 MPS slices per GPU:
            # pack `a` tasks into each resource set, give the set one full GPU,
            # and reduce the number of resource sets accordingly.
            a = round(1 / gpus_per_proc)
            cpus_per_proc *= a
            gpus_per_proc = 1
            nprocs //= a

        return f'{jsrun} -n {nprocs} -a {a} -c {cpus_per_proc} -g {gpus_per_proc} {cmd}'
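
As a usage check, a standalone re-derivation of that transformation (my own sketch, not the nnodes source) reproduces both commands from the comparison above:

# Standalone re-derivation of the fractional-gpus_per_proc transformation.
def jsrun_command(cmd, nprocs, cpus_per_proc, gpus_per_proc):
    a = 1
    if isinstance(gpus_per_proc, float):
        a = round(1 / gpus_per_proc)   # e.g. 0.25 -> 4 tasks share one GPU
        cpus_per_proc *= a
        gpus_per_proc = 1
        nprocs //= a
    return f'jsrun -n {nprocs} -a {a} -c {cpus_per_proc} -g {gpus_per_proc} {cmd}'

assert jsrun_command('./bin/xspecfem3D', 24, 1, 1) == \
    'jsrun -n 24 -a 1 -c 1 -g 1 ./bin/xspecfem3D'
assert jsrun_command('./bin/xspecfem3D', 24, 1, 0.25) == \
    'jsrun -n 6 -a 4 -c 4 -g 1 ./bin/xspecfem3D'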

lsawade closed this as completed Mar 6, 2022