Continuous Treatment Memory Error #558
Comments
If you decrease the HPC memory to something insufficient, do you get a different error message than the malloc above for binary W?
For easier bug analysis, could you please post a simulated example that replicates your error message?
Thanks @erikcs, I will try to generate a simulated example. But just curious: does continuous treatment generally take more memory than binary treatment?
I tested on a small dataset, and it seems that binary treatment takes half the time that continuous treatment takes to run. I think it probably takes half the memory as well? I wonder what part of the code is lagging the performance? @erikcs
I don't see how the type of W should matter for speed, when what is being passed to the causal training is always continuous?
So I have tried the code on machines with different amounts of RAM. It fails when the HPC has 300GB or 1TB of RAM. When I raise the RAM to around 3TB, it finally runs through. I am not sure what the bottleneck is here and still need to investigate, but this is probably a memory issue. Not sure if it is related to #187, but it always gets stuck in the training phase on machines with less than 1TB of RAM, and it only happens with continuous treatment. I have not been able to create a simulated example yet; I guess I would need to simulate a very large dataset. Will keep trying though. On another note, I also use DMTCP to checkpoint my jobs on the HPCs. Not sure if that is related to the issue, because 3TB is actually a lot of RAM ... I could barely find a machine to do the job.
Have you tried running the forest with
Could you try the same experiment but set min.node.size larger, for example 200? (If for some reason there are so many more splits with continuous W, the number of nodes could cause the tree vectors to grow unreasonably large.)
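A minimal sketch of what that re-run could look like, assuming the data are already in a design matrix X, outcome Y, and continuous treatment W (all names here are placeholders; only num.trees = 4000 comes from the report below):

```r
# Hedged sketch: re-fitting with a larger min.node.size to cap tree growth.
# X, Y, W are placeholders for the user's actual data.
library(grf)

forest <- causal_forest(
  X, Y, W,
  num.trees     = 4000,
  min.node.size = 200  # larger leaves -> fewer nodes per tree, smaller tree vectors
)
```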
By setting min.node.size larger, does it limit the trees' depth? @erikcs
Actually, it probably limits the number of nodes.
It is actually possible that there are too many nodes, because when I tried to plot the graphs with the best-tree function provided in #281, I get the following error:
The fact that there is a stack overflow when traversing the trees indicates that there might be far too many nodes in the trees to be plotted.
Anyway, I have submitted a new job with min.node.size equal to 100. Let's wait and see what happens.
Looks like raising min.node.size to 100 worked! Maybe the problem is that the number of nodes grows unreasonably large when the treatment is continuous ...
Ok, thanks for checking |
@ginward
Here I just use a simulation to demonstrate the error, which shows that it only works for regression_forest, but not for causal_forest or instrumental_forest:
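The original simulation code was not preserved in this thread; a rough sketch of what such an example might look like (the sizes and the data-generating process are assumptions, not the poster's actual code):

```r
# Hedged sketch of a simulated example with a continuous treatment.
library(grf)

n <- 50000; p <- 8
X <- matrix(rnorm(n * p), n, p)
W <- rnorm(n)                                  # continuous treatment
Y <- X[, 1] + W * pmax(X[, 2], 0) + rnorm(n)

rf <- regression_forest(X, Y)                  # reportedly runs fine
cf <- causal_forest(X, Y, W)                   # reportedly fails
iv <- instrumental_forest(X, Y, W, W + rnorm(n))  # reportedly fails too
```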
thanks |
@tianshengwang Have you tried limiting the size of nodes to, say, 500? I think the issue is that a continuous treatment expands the trees too wide and creates too many nodes, thus causing a stack overflow.
You mean increase min.node.size? I tried 500, 1000, and 2000; it still doesn't work for causal_forest and instrumental_forest.
@tianshengwang It works on my side.
I have ignored the warnings. |
Got it, thanks! |
Closing this out because we didn't identify a bug and you seem to have found a way forward. |
Description of the bug
I am estimating a causal forest on a large dataset (around 2-3 GB, with 1.6 million observations, 8 independent variables, and 1 dependent variable) with 4000 trees. When I use a binary treatment W, the forest runs fine. However, when I switch to a continuous treatment W, R crashes. I am running it on an HPC with 32 CPUs and 12GB RAM per CPU.
The error message is (I masked some of my memory addresses with *):
Does continuous treatment consume more memory than binary treatment?
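A hedged sketch of the binary-vs-continuous comparison described above (only the dimensions and tree count come from the report; the data-generating process is an assumption):

```r
# Hedged sketch: same forest, binary vs. continuous W.
library(grf)

n <- 1.6e6; p <- 8
X <- matrix(rnorm(n * p), n, p)
Y <- rnorm(n)
W_binary     <- rbinom(n, 1, 0.5)
W_continuous <- rnorm(n)

f_bin  <- causal_forest(X, Y, W_binary, num.trees = 4000)      # runs fine
f_cont <- causal_forest(X, Y, W_continuous, num.trees = 4000)  # reportedly crashes (malloc)
```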
GRF version
development