I have been applying causal forests to datasets with tens of millions of observations, and I keep running into memory issues. When I build a forest with only 1000 trees (fewer than the recommended number), the whole forest takes more than 200 GB of memory, even though the raw data is only 2 GB. I think the reason is that each tree stores a copy of a fraction of the data (in both the "nodes" and "oob_samples" attributes).
One strategy I tried for reducing memory use was to set a small "sample.fraction" parameter. However, when I set a low value, the memory size of an individual tree actually went up. I think this is because tree$oob_samples grew, which makes sense: the fewer observations a tree samples, the more are left out of bag. Is there a way to avoid saving the out-of-bag observations for each tree?
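To make the arithmetic behind that guess concrete, here is a rough back-of-the-envelope sketch in Python. It is not how the library actually lays out its trees. It assumes each tree stores one integer index per out-of-bag observation, and the 8 bytes per index, the dataset size of 20 million rows, and the helper name `oob_index_bytes` are all hypothetical choices for illustration.

```python
def oob_index_bytes(n_obs, sample_fraction, n_trees, bytes_per_index=8):
    """Estimate the bytes needed to store OOB indices across all trees.

    Assumption (not confirmed by the library): every tree keeps one
    integer index for each observation it did NOT sample.
    """
    in_bag = round(n_obs * sample_fraction)   # observations drawn for a tree
    oob_per_tree = n_obs - in_bag             # observations left out of bag
    return oob_per_tree * bytes_per_index * n_trees


n_obs = 20_000_000  # hypothetical: "tens of millions" of observations

for fraction in (0.5, 0.1):
    total = oob_index_bytes(n_obs, fraction, n_trees=1000)
    print(f"sample.fraction={fraction}: ~{total / 1e9:.0f} GB of OOB indices")
```

Under these assumptions the out-of-bag storage scales with (1 - sample.fraction), so lowering the fraction shrinks the in-bag part of a tree but grows the stored OOB list, which would match the behavior described above.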
Yes, this is a very reasonable suggestion. We're currently planning to add an 'optimized' mode for the prediction-only case in #122, which will avoid storing the OOB sample information. I've also filed #145, which suggests we avoid storing OOB samples altogether.