running on SLURM with MP and MPI #53

Closed
chillydog opened this issue Oct 29, 2019 · 3 comments

Comments

chillydog commented Oct 29, 2019
Hi All,

I have a large data set that kept giving me out of memory errors. So, I've reduced it to 50000 rows by 100 columns (plus response column) for testing on a cluster managed by SLURM to explore if I can break things up among nodes to access more memory (and cores). Incidentally, all variables are 2-level factors with substantial imbalance in the response (output) factor, so I'm using imbalanced.rfsrc.

My R code and my SLURM batch script are below. Basically, I request 56 tasks over 2 nodes (28 cores per node) to explore whether I get the benefit of the memory of both nodes (plus the cores); I've asked for a minimum of 128GB per node. I'm wondering if my R code is structured correctly to exploit both shared memory (MP) and distributed memory (MPI). I get the following output (56 times), which does not look good (indeed, my batch job seems to be hanging as I type).

OUTPUT (so far...still running...seems to be hanging):

56 times I get:
randomForestSRC 2.9.1

Type rfsrc.news() to see new features, changes, and bug fixes.

**followed by 56 instances of**:

A process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: [[42790,0],2] (PID 302952)

If you are absolutely sure that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.

followed by 56 instances of the detectCores() output:
[1] 28

But, so far, I cannot tell whether imbalanced.rfsrc has even been called yet. As the warnings suggest, my job may be hanging. I expected that, with Rmpi etc. present, imbalanced.rfsrc would take care of the MPI + MP details behind the scenes. What's wrong? Your help is much appreciated.

Best, -- Jay

MY R CODE (testing.R):

lp <- .libPaths()
.libPaths(c("/home/jjb485/Rlib", lp))   # search the personal library first

library(parallel)
library(Rmpi, lib.loc = "/home/jjb485/Rlib")
library(randomForestSRC, lib.loc = "/home/jjb485/Rlib")

detectCores()                           # report cores visible on this node
options(rf.cores = detectCores(), mc.cores = detectCores())

load("/home/jjb485/neonatal/testing/testing.df.RData")
testing.rfsrc <- imbalanced(NAS ~ .,
                            data = testing.df,
                            method = "brf",
                            importance = "permute")
save("testing.rfsrc", file = "testing.RData")

MY SLURM BATCH SCRIPT:

#!/bin/bash
#SBATCH --job-name=testing
#SBATCH --output=/scratch/jjb485/neonatal/testing/testing.txt
#SBATCH --time=1-00:00:00
#SBATCH --chdir=/scratch/jjb485/neonatal/testing
#SBATCH --ntasks=56
#SBATCH --nodes=2-2
#SBATCH --mem=128G

module load openmpi
module load R/latest
# with --ntasks=56, srun launches 56 independent copies of this R script
srun Rscript /home/jjb485/neonatal/testing/testing.R

kogalur (Owner) commented Nov 4, 2019

By MP, I assume you mean OpenMP. On a cluster you have OpenMP and MPI occurring simultaneously. Within a node, you have OpenMP executing, and across nodes, you have MPI. We successfully ran a SLURM batch script on a cluster back in the day as a test of scalability. The use of a cluster is indicated if you want to grow more trees simultaneously. Hybrid computing is not indicated as a work-around for memory issues.
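
To make the within-node (OpenMP) half of that concrete, here is a minimal single-node sketch. It reuses placeholder names from the script above (testing.df, NAS), drops MPI entirely, and assumes randomForestSRC was compiled with OpenMP support:

```r
## Minimal single-node (OpenMP-only) sketch -- no Rmpi involved.
## Path and object names are placeholders taken from the script above.
library(parallel)
library(randomForestSRC)

## rf.cores sets the OpenMP thread count used by the underlying C code;
## mc.cores sets the forked R workers used by mclapply in some wrappers.
options(rf.cores = detectCores(), mc.cores = detectCores())

load("testing.df.RData")                  # provides testing.df
fit <- rfsrc(NAS ~ ., data = testing.df)  # OpenMP threads within this one node
```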

In your case, it's theoretically possible to grow 56 trees in parallel across two nodes. But each tree still needs to access all the data. Also, a complicating issue is that one must give the cluster instructions to grow two sub-forests, one on each node. The problem is then of combining the ensembles from each forest into a single forest. It's a non-trivial enterprise.

You are using the wrapper imbalanced.rfsrc() in a hybrid environment. Anything other than the standard rfsrc() or predict.rfsrc() calls is not recommended on a cluster, as many of the other functions contain multiple calls to these two core functions. Combining sub-forest outputs from either of the two core functions into a single forest would be necessary before any other calculations could proceed.

At the end of the day, using a cluster requires writing some code: using mpi.send.Robj(), mpi.recv.Robj(), and mpi.spawn.Rslaves(), and having the supervisor process parse the output sent back from the workers into a single forest output.
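
A rough sketch of that supervisor/worker pattern, under stated assumptions (two workers, an arbitrary total tree count, placeholder paths and object names), might look like the following; merging the returned sub-forests into a single forest object is deliberately left open, since that is the non-trivial step:

```r
## Hedged sketch of a supervisor/worker layout with Rmpi -- not a drop-in solution.
library(Rmpi)
library(randomForestSRC)

load("testing.df.RData")                 # provides testing.df (placeholder path)

nworkers  <- 2                           # e.g., one worker per node (assumption)
ntree.sub <- ceiling(1000 / nworkers)    # split an assumed total tree count

## start the workers and ship them the data and settings
mpi.spawn.Rslaves(nslaves = nworkers)
mpi.bcast.Robj2slave(testing.df)
mpi.bcast.Robj2slave(ntree.sub)
mpi.bcast.cmd(library(randomForestSRC))
mpi.bcast.cmd(options(rf.cores = parallel::detectCores()))

## each worker grows a sub-forest with the core rfsrc() call;
## OpenMP parallelism runs inside each worker via rf.cores
sub.forests <- mpi.remote.exec(
  rfsrc(NAS ~ ., data = testing.df, ntree = ntree.sub, forest = TRUE),
  simplify = FALSE
)

mpi.close.Rslaves()

## sub.forests is a list with one fitted object per worker; the supervisor
## must still merge their ensembles into a single forest itself -- the
## non-trivial step described above.
str(sub.forests, max.level = 1)
```

Under SLURM, a supervisor script like this would typically be launched as a single task that then spawns the workers, rather than being replicated across all 56 tasks as in the batch script above.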

I don't think any of this is what you need to do at all. What are the original dimensions of your data? What is ntree? What, specifically, are the parameters in your function call?

chillydog (Author) commented Nov 4, 2019 via email

ishwaran (Collaborator) commented Nov 4, 2019 via email
