"Growing trees... Killed" extreme high memory consumption for survival forests #202
I'm doing some tests + memory monitoring and will provide them here as soon as possible.
Thanks. This is very strange. Any idea what could still be different in the real dataset? I guess you are not allowed to share it? Could you check the size of the resulting forest in the two cases?
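One way to run that size check, sketched below with object.size() (the data frame names real_data and sim_data and the fitting settings are placeholders, not values from this thread):

```r
library(ranger)

# Fit one survival forest per dataset with identical, illustrative settings.
rf_real <- ranger(dependent.variable.name = "time", status.variable.name = "status",
                  data = real_data, num.trees = 1000, min.node.size = 15)
rf_sim  <- ranger(dependent.variable.name = "time", status.variable.name = "status",
                  data = sim_data, num.trees = 1000, min.node.size = 15)

# Compare the in-memory size of the two fitted forests.
print(object.size(rf_real), units = "auto")
print(object.size(rf_sim), units = "auto")

# The number of unique event times is a rough proxy for how large the stored
# cumulative hazard functions per terminal node become.
length(rf_real$unique.death.times)
length(rf_sim$unique.death.times)
```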
Main differences in the datasets (real vs. simulated)
I'll check the tree sizes as soon as my servers are working properly again. This is somewhat off-topic, but yesterday I ran some extensive run-and-kill tests, and I have had trouble with the servers ever since.
This had a bad effect.
Attachments: progress and memory demand example tables; simulation script (5-10 minute version).
Please excuse my late response.
The picture below shows a benchmark over min.node.size. Nonetheless, the memory demand (not shown) for the real data remains massive.
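As a rough sketch, such a min.node.size benchmark could be run as follows (the data frame dat, the grid of node sizes, and the recorded metrics are assumptions, not the original setup):

```r
library(ranger)

# Sweep over candidate terminal node sizes and record runtime and forest size.
node_sizes <- c(5, 15, 50, 100, 500)
results <- lapply(node_sizes, function(ns) {
  t <- system.time(
    rf <- ranger(dependent.variable.name = "time", status.variable.name = "status",
                 data = dat, num.trees = 1000, min.node.size = ns)
  )
  data.frame(min.node.size = ns,
             elapsed_sec = unname(t["elapsed"]),
             forest_MB = as.numeric(object.size(rf)) / 1024^2)
})
do.call(rbind, results)
```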
The reason is probably that in a survival forest a cumulative hazard function (CHF) has to be saved in each terminal node. If there are many unique time points in the dataset, these CHFs grow large, and with many deep trees there are a lot of them. To verify this, could you try to change the splitting rule to "extratrees" and/or "maxstat" and check if this changes the memory usage?
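A sketch of that check, reusing the hypothetical data frame dat from above (the settings are illustrative):

```r
library(ranger)

# Fit the same survival forest under different splitting rules and compare sizes.
for (rule in c("logrank", "extratrees", "maxstat")) {
  rf <- ranger(dependent.variable.name = "time", status.variable.name = "status",
               data = dat, num.trees = 1000, min.node.size = 15,
               splitrule = rule)
  cat(rule, ":", format(object.size(rf), units = "auto"), "\n")
}
```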
900-1000 unique time points do exist.
I'll check the alternative splitting rules as soon as my other computations have finished; this may take up to a month. I'll provide an update as soon as possible.
Today I trained an SRF in the same manner as before (min.node.size = 15, num.trees = 1000), but with more patients (62K; real data, not simulated) than before. The patients used before are contained as a subset of these. Very surprisingly, the memory demand dropped to ~50 GB, which is nice, but confusing.
Approximating survival times to a restricted grid of time values can greatly improve the performance. 1000 time points is way too many. By the way, in randomForestSRC they have a parameter for facilitating that operation. I don't feel like such a parameter is absolutely needed (I prefer full control in defining the time grid myself), but it might be a useful one to have.
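A minimal sketch of such a coarsening step before fitting (the 50-point quantile grid and the data frame dat are assumptions, not values from this thread):

```r
library(ranger)

# Snap each survival time down to the nearest grid point at or below it, so the
# forest only ever sees ~50 distinct time values instead of ~1000.
grid <- unique(quantile(dat$time, probs = seq(0, 1, length.out = 50)))
dat$time_coarse <- unname(grid[findInterval(dat$time, grid)])

rf_coarse <- ranger(dependent.variable.name = "time_coarse",
                    status.variable.name = "status",
                    data = dat[, setdiff(names(dat), "time")],
                    num.trees = 1000, min.node.size = 15)

# Fewer unique time points means smaller stored CHFs per terminal node.
length(unique(dat$time))
length(rf_coarse$unique.death.times)
print(object.size(rf_coarse), units = "auto")
```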
Dear all,
Hi @XavierPrudent, to point out a few things I did following @mnwright's recommendations:
More a workaround than a solution
Besides that, I also played around with other RF implementations. As far as I know, ranger is still the most efficient implementation in R for (survival) random forests.
@XavierPrudent Please give some details (best with a reproducible example).
Hello,
thanks for providing the ranger package for fast RSF, and thanks for taking the time to read this.
Depending on the value of num.trees, the growing of the trees aborts suddenly, even though I have two strong servers as described below. There is no error message, only "Killed" (please see the attached screenshot).
Given num.trees = 5000, the interruption occurred at a growing progress of e.g. 52%, 76%, or 86%, but never at a lower progress rate. In another dataset I've observed this behaviour at 99%, too.
I've tried using the dependent.variable.name and status.variable.name notation instead of providing a survival formula or survival object, but that didn't help either.
I'm running the R script from the shell (bash) to avoid any overhead or perturbations from RStudio.
The different training datasets I tried have 27,100 observations and 500 features.
The ranger() function call is:

```r
ranger(dependent.variable.name = "time", status.variable.name = "status",
       data = training, num.trees = num.trees, save.memory = TRUE)
```
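For reference, a sketch of the survival-formula variant mentioned above (assuming the survival package for Surv(); not necessarily the exact call that was tried):

```r
library(survival)
library(ranger)

# Same model specified via a survival formula instead of variable names.
rf <- ranger(Surv(time, status) ~ ., data = training,
             num.trees = num.trees, save.memory = TRUE)
```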
The call from the shell is, e.g.:

```sh
R < run.Ranger.R --no-save
```
All independent variables are numeric and scaled to [0, 1].
A workaround is reducing num.trees with a trial-and-error approach.
If using importance = TRUE, I have to reduce the number of trees even further to avoid "Killed" sessions. Setting save.memory = TRUE doesn't help.
I'd be glad and thankful for any ideas or proposals!
Finally, here are the hardware / software stats:
OS
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.8 (Santiago)
Release: 6.8
Codename: Santiago
Hardware: two servers, each with the following (the problem occurs on both, so it's server-independent):
CPU: 20 Cores, Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
RAM: 246 GB
R
platform x86_64-redhat-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 3.2
year 2016
month 10
day 31
svn rev 71607
language R
version.string R version 3.3.2 (2016-10-31)
nickname Sincere Pumpkin Patch