I would like to think about how to implement GPU-MPS. Right now `job.py` hardcodes the `-a` flag to `-a 1`. Because of the `[nprocs, cpus-per-task, gpus-per-task]` setup, it seems like supporting it would require quite a lot of recoding on your end. To show how I usually set up GPU-MPS for specfem, I have included an example below.
My main worry is that, on the one hand, creating a new LSF job class to support it is quite easy but starts to clutter the package. On the other hand, incorporating the GPU-MPS capability into the current `LSF(Job)` class may overcomplicate the class.
What do you think?
Maybe an `add_special_mpi()` in `node.py` and a `special_mpiexec()` in `job.py`?
Specfem example
I compile Specfem for 6 chunks and `NEX_*=2`, so a total of 24 MPI tasks. Now I want to run Specfem on a single node using 6 GPUs. GPU-MPS has to be enabled at the job-request level using the line
#BSUB -alloc_flags "gpumps"
Then, to run specfem you have to assign 4 tasks to a single GPU. The way I'm doing it is to ask for 6 resource sets, each with 4 tasks and 4 CPUs but only 1 GPU:
jsrun -n 6 -a 4 -c 4 -g 1 ./bin/xspecfem3D
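To make the arithmetic concrete, here is a rough sketch of how the `jsrun` line could be derived from the total task count and the number of GPUs. The function name `jsrun_mps_command` and its parameters are hypothetical and not part of the package. It assumes one resource set per GPU and that the tasks divide evenly across the GPUs.

```python
import shlex


def jsrun_mps_command(ntasks, ngpus, executable, cpus_per_task=1):
    """Build a jsrun command that packs several MPI tasks onto each GPU.

    Hypothetical helper for illustration only. Assumes one resource set
    per GPU, so every resource set gets ntasks / ngpus tasks and 1 GPU.
    """
    if ntasks < 1 or ngpus < 1 or cpus_per_task < 1:
        raise ValueError("ntasks, ngpus and cpus_per_task must be positive")
    if ntasks % ngpus != 0:
        raise ValueError("ntasks must be divisible by ngpus")

    tasks_per_rs = ntasks // ngpus  # tasks sharing one GPU through MPS
    args = [
        "jsrun",
        "-n", str(ngpus),                         # number of resource sets
        "-a", str(tasks_per_rs),                  # tasks per resource set
        "-c", str(tasks_per_rs * cpus_per_task),  # CPUs per resource set
        "-g", "1",                                # GPUs per resource set
        executable,
    ]
    return " ".join(shlex.quote(arg) for arg in args)


# 24 MPI tasks on 6 GPUs reproduces the command above.
print(jsrun_mps_command(24, 6, "./bin/xspecfem3D"))
```

With 24 tasks and 6 GPUs this prints the same `jsrun -n 6 -a 4 -c 4 -g 1 ./bin/xspecfem3D` line, so something along these lines could sit behind a `special_mpiexec()` without touching the existing `-a 1` path.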