Set cpus-per-tasks in sbatch script #36
Conversation
I agree with not renaming the parameter, since for people less familiar with SLURM, "cpus_per_node" is clearer than "cpus_per_task" (it's not clear if the task applies to one node or all nodes). Could you update the documentation in slurm_apply.R before we merge the pull request? I would suggest changing the current description of the cpus_per_node parameter:

    #' @param cpus_per_node The number of CPUs per node on the cluster; determines how
    #' many processes are run in parallel per node.

to:

Also, this part of the details section:
I added the requested documentation. There were also some other roxygen changes that had not been applied, so I put the Rd updates in another commit.
Can someone please accept this pull request so this fix gets into the release version? I am getting hit by this as well.
BTW, I don't want to sound like a snob, but the rslurm "nodes" are effectively SLURM's "tasks", which are different from SLURM/cluster "nodes" (a node is a physical computer; a task is a piece of work run on that computer), so I have to admit that it sounds confusing. I had to run a few rslurm examples to figure out that a "node" here is a SLURM task, not a SLURM node. For those familiar with SLURM, calling the rslurm "nodes" "tasks" would make much more sense. Then you'd ask for N "tasks", which means we'd run N R workers (N SLURM job array jobs), and it would then make sense to use cpus_per_task to say how many CPU cores each R worker should use. cpus_per_node reads as the physical CPU core count of that piece of hardware. MC
@mcuma The motivation for creating this package, and the slurm_apply function in particular, was to automatically split a parallel task across multiple nodes of a SLURM cluster where MPI was not supported, so R had to "manually" split up the task with slurm_apply and regroup the pieces with get_slurm_out. For example, if someone wanted to use 4 nodes with 8 CPUs each, the strategy was to split the initial task into 4 pieces (so it could occupy 4 nodes) and parallelize maximally within each node (hence the cpus_per_node argument). I understand that rslurm has gained much wider use now, and there may be cases where someone wants to split up a task for reasons other than submitting it to different nodes that don't communicate. However, changing the argument names now raises the question of backwards compatibility, as the OP said.
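For concreteness, a minimal sketch of that workflow, assuming the slurm_apply()/get_slurm_out() interface described above; the function, parameter set, and job name are purely illustrative.

```r
library(rslurm)

# Toy function and parameter set -- illustrative only.
f <- function(mu, sd) mean(rnorm(1000, mu, sd))
pars <- data.frame(mu = 1:40, sd = 0.1)

# Split the work into 4 chunks (rslurm "nodes"), each parallelized over 8 CPUs.
sjob <- slurm_apply(f, pars, jobname = "demo_apply",
                    nodes = 4, cpus_per_node = 8)

# Once the job array finishes, regroup the pieces and clean up.
res <- get_slurm_out(sjob, outtype = "table")
cleanup_files(sjob)
```

Each of the 4 array elements processes its own slice of pars, so the split/regroup logic stays entirely on the R side.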
Merged commit 68a3a92 into SESYNC-ci:master
Fixes #33
I did not rename cpus_per_node. While there is a discrepancy with the SLURM option name, I like the easier-to-understand option pair nodes and cpus_per_node. In addition, it's backwards compatible. If you like it, I could add a comment to the documentation about cpus_per_node and #SBATCH --cpus-per-task.
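For reference, a rough sketch of the kind of submission script header this change implies, assuming nodes = 2 and cpus_per_node = 8; the exact directives and file names (e.g. slurm_run.R) depend on rslurm's template and are shown here only as an illustration.

```sh
#!/bin/bash
#SBATCH --array=0-1            # one array element per rslurm "node" (one SLURM task each)
#SBATCH --cpus-per-task=8      # written from cpus_per_node by this fix
#SBATCH --job-name=demo_apply
Rscript --vanilla slurm_run.R  # each array element runs its chunk of the parameters
```

Without such a line, SLURM allocates a single core per task by default, so the within-node parallelism requested via cpus_per_node would not actually be reserved.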