I'm working with an imputer that uses random forests as its estimator. The saved file size for the model is 100GB, and it takes 15 min to load back into memory on my VM. Looking at the in-memory sizes of each of the object's attributes, it looks like the culprit (as expected) is the stored random forest estimators. I understand that random forests may lead to bigger files than linear models, but 100GB for 82 features seems excessive. Is there any way to streamline the imputer object to a more manageable size? Or am I stuck with this as long as I use random forests as the estimator?
Replies: 1 comment 1 reply
I am not that surprised: you have 150 trees x 82 features stored. Each tree is grown unpruned, which means that in the worst case each of the ~24k samples you are splitting can end up in its own leaf. You can limit the size of the forest by reducing the number of trees and capping the number of leaves per tree.
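Concretely, if this is scikit-learn's IterativeImputer with a RandomForestRegressor estimator, a minimal sketch of shrinking the forest might look like the following (the parameter values are illustrative assumptions, not tuned recommendations):

```python
# Minimal sketch: constrain the random forest so the fitted imputer stays small.
# Assumes scikit-learn's IterativeImputer + RandomForestRegressor; values are illustrative.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

estimator = RandomForestRegressor(
    n_estimators=30,      # fewer trees than 150
    max_leaf_nodes=256,   # cap tree size instead of growing to purity
    min_samples_leaf=20,  # larger leaves -> shallower, smaller trees
    n_jobs=-1,
    random_state=0,
)
imputer = IterativeImputer(estimator=estimator, random_state=0)
```

Saving with joblib compression (e.g. `joblib.dump(imputer, path, compress=3)`) can also shrink the on-disk file, though it won't reduce the size of the object once it is loaded back into memory.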