
Massive memory usage by parallel RandomForestClassifier #936

Closed
jni opened this issue Jul 6, 2012 · 32 comments

@jni commented Jul 6, 2012

I think this will be hard to fix without swapping out joblib (or maybe even the GIL ;), but basically the amount of memory used by RandomForestClassifier is exorbitant for n_jobs > 1. In my case, I have a dataset of about 1GB (300,000 samples × 415 features of 64-bit floats), but calling fit() on a RandomForestClassifier with n_jobs=16 results in 45GB of memory being used.

Does anyone have any ideas or is this hopeless without moving everything to C?

@glouppe (Contributor) commented Jul 7, 2012

The problem is that two copies of X (X and X_argsorted) are made for each job.

You can circumvent that by putting X into shared memory. I did that in one of my branches:
https://github.com/glouppe/scikit-learn/blob/cytomine/sklearn/ensemble/forest.py#L274

I used that module: https://bitbucket.org/cleemesser/numpy-sharedmem/issue/2/sharedmemory_sysvso-not-added-correctly-to

It was not merged into master because of the additional dependency, though.

@ogrisel (Member) commented Jul 7, 2012

To use shared memory you can memory-map your input set with joblib:

import numpy as np
from sklearn.externals import joblib

filename = '/tmp/dataset.joblib'
joblib.dump(np.asfortranarray(X), filename)
X = joblib.load(filename, mmap_mode='c')

IIRC the random forest model needs Fortran-layout data to work efficiently, hence the call to np.asfortranarray before serializing to disk.

@ogrisel (Member) commented Jul 7, 2012

BTW @glouppe, if the above strategy works as expected it would be great to make the RandomForest/ExtraTrees* classes able to do it automatically using a mmap_folder=None parameter. If provided, memmapping would be used when n_jobs > 1.

@ogrisel (Member) commented Jul 9, 2012

@jni any news on this? Have you tried any of the aforementioned solutions? If one works for you, we should devise a way to make it simpler to use, or at least better documented.

@jni (Author) commented Jul 9, 2012

Haven't tried it, busy weekend — I'll do it today! Thanks!

@jni (Author) commented Jul 9, 2012

Ok, two failures to report.

First, I tried to combine @glouppe's code with @ogrisel's joblib modification. This crashed and burned, and in any case didn't seem to affect memory usage much: it was up to 20GB before it crashed. I've made a gist with the diff against scikit-learn 0.11.X and the error for n_jobs=2.

I then tried @glouppe's cytomine branch directly, after installing sharedmem, but this also failed for some unknown reason.

... Any ideas?

@amueller (Member) commented Jul 9, 2012

@jni have you tried playing with "min_density"? That can really affect memory usage (and CPU usage too, though in a non-linear way).

@jni (Author) commented Jul 9, 2012

@amueller, I just tried min_density=0.001 and it still goes to 45GB on my dataset. By my understanding, the original data gets replicated to each process before any parameters like min_density take effect, so those parameters will only minimally affect the total memory footprint, which is dominated by n_jobs * dataset_size * c, with c in [2, 3]. (2 according to the above comments, but closer to 3 empirically. ;)

@jni (Author) commented Jul 9, 2012

I stand corrected: usage is >50GB with n_jobs=8 and min_density=1.0. But it looks like I can't make it go any lower.

@amueller (Member) commented Jul 9, 2012

Hm, ok, it was just an idea. I don't know where the additional copy comes from (3 copies instead of 2). You did take care of the memory layout, right?

@amueller (Member) commented Jul 9, 2012

I meant whether you use Fortran or C ordering. IIRC the forests want Fortran ordering, so if you provide C-ordered arrays, they'll make a copy.
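
For illustration (this snippet is not from the thread), checking and fixing the layout before calling fit would look something like the following, assuming X holds the training features:

import numpy as np

# Stand-in for the training data (C-ordered by default), for illustration.
X = np.random.rand(1000, 50)

# False here means the estimator would make its own Fortran-ordered copy.
print X.flags['F_CONTIGUOUS']

# Convert once in the parent process; this is a no-op if X is already
# Fortran-ordered.
X = np.asfortranarray(X)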

@jni (Author) commented Jul 9, 2012

I thought that's what you might have meant, so I ran rf.fit(np.asfortranarray(...)) instead, but got the same result (about 3×). =\

@glouppe (Contributor) commented Jul 10, 2012

For my branch to work, you need:

  1. to use bootstrap=False in your forests;

  2. to increase SHMMAX if you intend to share a large block of memory.

  • On OSX:
    sudo sysctl -w kern.sysv.shmmax=YOUR_VALUE
    sudo sysctl -w kern.sysv.shmall=YOUR_VALUE
  • On Linux:
    sudo sysctl -w kernel.shmmax=YOUR_VALUE
    sudo sysctl -w kernel.shmall=YOUR_VALUE
    sudo sysctl -p /etc/sysctl.conf

Hope this helps!

@jni (Author) commented Jul 10, 2012

Thanks, @glouppe! This'll help me, but I'm disappointed that it's of limited use if it won't make it into scikit-learn proper... It seems to me that if I can get it to work, this could be an optional dependency... I often use the following pattern:

import logging

try:
    import sharedmem as shm
    shm_available = True
except ImportError:
    logging.warning('sharedmem library is not available')
    shm_available = False

Otherwise, I would still be interested in getting joblib persistence to work...

Secondly, this may explain the close-to-3× memory usage when my data is copied, since it's not float32. It's probably a good idea to coerce the data within BaseForest before it is copied by multiprocessing, rather than having each tree coerce it after the copy. This'll bring the memory usage way down, and I'm sure it will speed things up too.
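
A caller-side sketch of that idea (illustrative only, assuming float32 and Fortran order are what the trees ultimately use), so that the cast happens once instead of once per forked worker:

import numpy as np

# Stand-in for the real dataset (float64, C-ordered), just for illustration.
features = np.random.rand(1000, 50)

# Cast once, up front, to the dtype/layout used inside the trees; otherwise
# each forked worker ends up holding a float64 copy plus its own float32
# conversion of it.
features = np.asarray(features, dtype=np.float32, order='F')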

I'll run these experiments this afternoon and report back. Thanks everyone!

@ogrisel (Member) commented Jul 10, 2012

@jni could you please report the error message you get with the memory-mapped file solution, or better, tell me whether it is the same as the following? https://gist.github.com/3084146

If so, I will take a deeper look and try to reproduce it in a pure joblib context, outside of scikit-learn. Maybe @GaelVaroquaux has an idea about the cause of the problem.

@jni (Author) commented Jul 10, 2012

@ogrisel, yes, it's the same error as you pointed out.

@ogrisel (Member) commented Jul 10, 2012

Alright, I'll try to see if there is an easy fix for this problem tonight, unless @GaelVaroquaux or someone else gets to it first.

@ogrisel (Member) commented Jul 10, 2012

@jni are you running Unix (Linux or OS X)? If so, just putting the data into the right memory layout before calling the fit method of the random forest might work, thanks to the copy-on-write semantics of the Unix fork() that backs the multiprocessing module of the standard library on those platforms:

#!/usr/bin/env python

import numpy as np
from sklearn.datasets.samples_generator import make_classification
from sklearn.externals import joblib
from sklearn.ensemble import RandomForestClassifier

print "generating dataset"
X, y = make_classification(n_samples=100000, n_features=500)

print "put data in the right layout"
X = np.asarray(X, dtype=np.float32, order='F')

print "fitting random forest:"
clf = RandomForestClassifier(n_estimators=100, n_jobs=2)
print clf.fit(X, y).score(X, y)

Can you tell us if it solves your issue?

Thanks to @larsmans for the heads up on COW unix forks.

@glouppe (Contributor) commented Jul 10, 2012

@jni In the end, does this work with my branch? I put it together a few months ago and it did solve my problems (as long as bootstrap=False). I remember that at the time I added extra checks to avoid coercing the data several times (i.e. it is coerced only once for the whole forest).

@jni (Author) commented Jul 10, 2012

@ogrisel, I'm on Linux (Fedora Core 16). I'm aware of the idea of copy-on-write in Unix fork(), but in my experience I have never been able to capitalise on it in Python. I believe we're running into the problem detailed here, namely that the objects themselves might not change, but the Python interpreter changes an object's metadata (e.g. its reference count), which results in the whole object getting copied.

To illustrate, here's some setup:

import numpy as np
from ray import classify # this is my own library
dat5 = classify.load_training_data_from_disk('training/multi-channel-graph-e05-5.trdat.h5')
from sklearn.ensemble import RandomForestClassifier
# using @glouppe's branch
features = np.asarray(dat5[0], dtype=np.float32, order='F')
labels = np.asarray(dat5[1][:, 0], dtype=np.float32)
features.shape
# (299351, 415)
float(features.nbytes) / 2**30
# 0.46279529109597206
labels.shape
np.unique(labels)
# array([-1.,  1.], dtype=float32)

Now we try with and without @glouppe's shared memory. If COW were working, there would be no difference in memory usage. But!

rf = RandomForestClassifier(100, max_depth=20, n_jobs=16, shared=False, bootstrap=False)
rf = rf.fit(features, labels)
# about 1GB/process
rf = RandomForestClassifier(100, max_depth=20, n_jobs=16, shared=True, bootstrap=False)
rf = rf.fit(features, labels)
# about 100MB/process!!!

So, in conclusion:

  • copy-on-write is a lovely theoretical construct that appears to fall on its face in Python in general and in sklearn.ensemble.RandomForestClassifier in particular
  • @glouppe's shared memory implementation is incredibly awesome and valuable (thanks @glouppe!). Incidentally, there was no time penalty in using shm.
  • It should probably be a priority to get some kind of shared memory implementation for fit()... Any sklearn admins care to comment?

@ogrisel (Member) commented Jul 10, 2012

Thanks for the COW check. It's good to know that it's not working and that it's not fixable. As for the shm module, we would rather avoid the maintenance burden of an external dependency (furthermore, it's probably quite experimental and not guaranteed to work on other platforms).

I would prefer a solution based on numpy.memmap'ed arrays or multiprocessing.Array, which are maintained outside of scikit-learn in existing dependencies (numpy and the standard Python library, respectively).
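
For reference, a minimal sketch of the multiprocessing.Array option mentioned above (illustrative, not scikit-learn code): allocate the buffer in shared memory before forking, then view it as a numpy array in each worker.

import numpy as np
import multiprocessing as mp

n_samples, n_features = 10000, 50

# Allocate a float32 buffer in shared memory (no lock needed for read-only use).
shared = mp.Array('f', n_samples * n_features, lock=False)
X = np.frombuffer(shared, dtype=np.float32).reshape(n_samples, n_features)
X[:] = np.random.rand(n_samples, n_features)  # fill it in the parent process


def total(_):
    # After fork, X refers to the same shared buffer: no per-worker copy.
    return X.sum()


if __name__ == '__main__':
    pool = mp.Pool(2)
    print pool.map(total, range(2))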

As for the problem with numpy memmap, it seems to be a known bug (a regression in numpy 1.5+):

http://projects.scipy.org/numpy/ticket/1809

It would be great to find a fix and then backport it into the sklearn.utils.fixes module as a monkey patch.

@ogrisel (Member) commented Jul 10, 2012

As for the joblib.load with mmap_mode issue, the real underlying problem is that numpy.memmap arrays are currently not correctly picklable (a known old issue). As it does not seem that easy to fix, we could improve the Parallel.dispatch and SafeFunction.__call__ methods of joblib.parallel to detect args/kwargs that are instances of numpy.memmap and recreate new instances in the new process using simple wrap/unwrap logic.
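
A rough sketch of what such wrap/unwrap logic could look like (illustrative only, not the actual joblib code): pickle just the metadata needed to reopen the memmap, and rebuild it in the worker. A real implementation would also have to cope with memmap views that have lost their filename attribute, which is the numpy regression mentioned in the previous comment.

import numpy as np


class WrappedMemmap(object):
    """Illustrative wrapper: stores only picklable metadata for a memmap."""

    def __init__(self, m):
        self.filename = m.filename
        self.dtype = m.dtype
        self.shape = m.shape
        self.mode = m.mode
        self.offset = m.offset

    def unwrap(self):
        # Re-open the same file-backed array inside the worker process;
        # never reopen with 'w+', which would truncate the backing file.
        mode = 'r+' if self.mode == 'w+' else self.mode
        return np.memmap(self.filename, dtype=self.dtype, mode=mode,
                         shape=self.shape, offset=self.offset)


def wrap_if_memmap(obj):
    # Hypothetical helper: only memmap instances need the wrapping treatment.
    return WrappedMemmap(obj) if isinstance(obj, np.memmap) else obj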

@ogrisel (Member) commented Jul 15, 2012

Hi @jni, FYI I have started a new branch in joblib to add support for numpy.memmap arrays to joblib.Parallel here:
joblib/joblib#40

This is not yet used in scikit-learn though: the joblib version embedded in scikit-learn will need to be synchronized with upstream once this PR is merged.

@jni (Author) commented Jul 15, 2012

Thanks @ogrisel! Is it actually fixed, i.e. do I just need to replace the bundled version with this branch? Or are you still working on it?

@ogrisel (Member) commented Jul 15, 2012

It should be fixed in my joblib branch. You can try to do the swap manually, but I am not sure whether other recent changes in joblib will impact its use in scikit-learn (they probably should not), as I have not tested it myself yet.

Then you can try something like:

#!/usr/bin/env python

import numpy as np
from sklearn.datasets.samples_generator import make_classification
from sklearn.externals import joblib
from sklearn.ensemble import RandomForestClassifier

print "generating dataset"
X, y = make_classification(n_samples=100000, n_features=500)

filename = '/tmp/dataset.joblib'
print "put data in the right layout and map to " + filename
joblib.dump(np.asarray(X, dtype=np.float32, order='F'), filename)
X = joblib.load(filename, mmap_mode='c')

print "fitting random forest:"
clf = RandomForestClassifier(n_estimators=100, n_jobs=2)
print clf.fit(X, y).score(X, y)

@ogrisel (Member) commented Jul 15, 2012

I have tried the previous script. I don't get the previous error anymore, but the memory usage does not seem to stay constant when I increase n_jobs, so there might be another part of the code triggering a memory copy. I will do more tests with just joblib.

@glouppe (Contributor) commented Jul 15, 2012

@ogrisel: This comes from RandomForestClassifier.

  1. Bootstrap samples are made from copies of X.
  2. X_argsorted should be put into a memmap array.

Note that I also fixed some useless data copies in #946 (not yet in master).
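
To illustrate the first point above (illustrative snippet, not taken from the branch): fancy indexing on a memmap materializes a regular in-memory array, so each worker's bootstrap sample becomes a private copy no matter how X itself is shared.

import numpy as np

# Small file-backed array standing in for the shared training data.
X = np.memmap('/tmp/X_shared.mmap', dtype=np.float32, mode='w+',
              shape=(10000, 50))

indices = np.random.randint(0, X.shape[0], X.shape[0])  # bootstrap sample
X_bootstrap = X[indices]

print np.may_share_memory(X, X_bootstrap)  # False: the sample is a private copy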

@bdholt1 (Member) commented Jul 25, 2012

@ogrisel I'm not sure if I'm doing something wrong here, but I've just imported your joblib parallel.py fixes and run the above code from #936 (comment).

generating dataset
put data in the right layout and map to /tmp/dataset.joblib
fitting random forest:
Traceback (most recent call last):
  File "test_forest_parallel_joblib.py", line 18, in <module>
    print clf.fit(X, y).score(X, y)
  File "/vol/vssp/signsrc/brian/python/scikit-learn/sklearn/ensemble/forest.py", line 288, in fit
    for i in xrange(n_jobs))
  File "/vol/vssp/signsrc/brian/python/scikit-learn/sklearn/externals/joblib/parallel.py", line 565, in __call__
    self.dispatch(function, args, kwargs)
  File "/vol/vssp/signsrc/brian/python/scikit-learn/sklearn/externals/joblib/parallel.py", line 397, in dispatch
    args, kwargs = wrap_mmap_args(args, kwargs)
  File "/vol/vssp/signsrc/brian/python/scikit-learn/sklearn/externals/joblib/parallel.py", line 94, in wrap_mmap_args
    args = [wrap_mmap(a) for a in args]
  File "/vol/vssp/signsrc/brian/python/scikit-learn/sklearn/externals/joblib/parallel.py", line 83, in wrap_mmap
    return WrappedMemmapArray(obj)
  File "/vol/vssp/signsrc/brian/python/scikit-learn/sklearn/externals/joblib/parallel.py", line 53, in __init__
    self.filename = mmap_array.filename
AttributeError: 'memmap' object has no attribute 'filename'

I'd love to get the shared memory working using your method instead of the sharedmem package...

@ogrisel (Member) commented Jul 26, 2012

Which branch are you using? I have started a new branch with another approach:

joblib/joblib#43

I am still working on it though.

@ogrisel (Member) commented Sep 2, 2012

FYI: I am working on a new approach to deal efficiently with shared memory in joblib.parallel / multiprocessing using numpy.memmap, here: joblib/joblib#44

@ogrisel (Member) commented Sep 9, 2012

@jni @bdholt1 I think my pull request is in a workable state, ready for testing on your use cases:

joblib/joblib#44

To test, you can just replace the joblib embedded within the sklearn source tree with the one from this repo/branch:

git clone https://github.com/ogrisel/joblib.git
(cd joblib && git checkout pickling-pool)
rm -rf scikit-learn/sklearn/externals/joblib
ln -s `pwd`/joblib/joblib scikit-learn/sklearn/externals/joblib

With this drop-in replacement, any numpy array larger than 1MB passed as an argument to a joblib.Parallel operation will be dumped to a temporary file and exposed as shared memory (an instance of a copy-on-write numpy.memmap), which should heavily reduce memory usage and also speed things up by reducing the number of redundant memory allocations, especially for large read-only data and large values of n_jobs.
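
A quick way to exercise that behaviour (a sketch assuming the pickling-pool branch has been swapped in as described above; column_sum is just an illustrative function):

import numpy as np
from sklearn.externals.joblib import Parallel, delayed


def column_sum(X, j):
    # With the auto-memmapping described above, X should arrive here as a
    # copy-on-write numpy.memmap backed by a temporary file rather than as a
    # private in-memory copy.
    return X[:, j].sum()


if __name__ == '__main__':
    X = np.random.rand(200000, 50)  # ~80MB, well above the 1MB threshold
    sums = Parallel(n_jobs=4)(delayed(column_sum)(X, j)
                              for j in range(X.shape[1]))
    print sums[:3]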

Please feel free to report any issue directly as comments to joblib/joblib#44 .

@glouppe (Contributor) commented Jul 22, 2013

Clone of #2179.
