RandomForestClassifier parallel issues with CPU usage decreasing over run #12482

Closed
@hermidalc

Description

This looks related or identical to issue #6023, but as of 0.19.2 it does not appear to be fixed, even though that issue is closed. I encountered it with RFE wrapping a RandomForestClassifier rather than with GridSearchCV. I see exactly the same strange behavior. Parallel CPU usage starts at 100% as it should, then steadily decreases to low values. Meanwhile, system CPU usage (as shown by top on Linux) rises to 10-15% per core, which is not normal. The fit also never finishes, or takes far too long if it ever does.

Steps/Code to Reproduce

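```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1000 samples x 3200 features, almost all of them redundant
X, y = make_classification(n_samples=1000, n_features=3200, n_informative=100,
                           n_redundant=3100, n_classes=2, n_clusters_per_class=30)

# RFE removes 1% of the features per step and refits a 1000-tree forest each time;
# n_jobs=-1 builds the trees in parallel on all cores.
# max_features='sqrt' is what 'auto' meant for classifiers in 0.19.2
# ('auto' was deprecated and later removed from scikit-learn).
pipe = Pipeline([
    ('slr', StandardScaler()),
    ('fs', RFE(RandomForestClassifier(n_estimators=1000, max_features='sqrt',
                                      class_weight='balanced', n_jobs=-1),
               step=0.01, n_features_to_select=10)),
])
pipe.fit(X, y)
```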
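To get a sense of how much work this pipeline asks for, the number of forest fits RFE performs can be counted in advance. This is a sketch that assumes RFE handles `step` the way scikit-learn's implementation does. A float `step` in (0, 1) becomes `int(max(1, step * n_features))` based on the original feature count. Each round removes at most that many features, and one final refit runs on the selected subset. `rfe_fit_count` is a hypothetical helper and is not part of scikit-learn.

```python
def rfe_fit_count(n_features, n_features_to_select, step):
    """Count estimator fits made by RFE (hypothetical helper).

    Assumes scikit-learn's RFE logic: a float step in (0, 1) is turned into
    int(max(1, step * n_features)) using the original feature count, each
    round removes min(step, remaining - n_features_to_select) features, and
    one final fit is made on the selected features.
    """
    if 0.0 < step < 1.0:
        step = int(max(1, step * n_features))
    else:
        step = int(step)

    remaining = n_features
    elimination_fits = 0
    while remaining > n_features_to_select:
        # One full estimator fit per elimination round
        remaining -= min(step, remaining - n_features_to_select)
        elimination_fits += 1
    # Plus the final refit on the selected subset
    return elimination_fits + 1


fits = rfe_fit_count(n_features=3200, n_features_to_select=10, step=0.01)
print(fits, "forest fits,", fits * 1000, "trees in total")
```

Under those assumptions, `step=0.01` on 3200 features means 32 features per round. That gives 100 elimination fits plus one final fit, or 101 fits of a 1000-tree forest in total. The run is therefore long even without the slowdown.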

Expected Results

Parallel CPU usage should stay at effectively 100% on every core used by n_jobs throughout each RFE iteration, and the pipeline fit should complete in a normal amount of time.

Actual Results

Parallel CPU usage starts at 100% as it should, then steadily decreases to low values. Meanwhile, system CPU usage (as shown by top on Linux) rises to 10-15% per core, which is not normal. The pipeline fit never finishes.
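The user vs. system split reported by top can also be measured from inside Python, which makes the symptom easier to quantify. This is a minimal sketch using `os.times()`. It assumes the forest builds its trees in threads of the current process (as far as I know, RandomForestClassifier in 0.19.x uses joblib's threading backend for fitting). CPU time spent in separate worker processes would not be counted here. `cpu_split` is a hypothetical helper, not a scikit-learn API.

```python
import os
import time


def cpu_split(func, *args, **kwargs):
    """Run func and report user vs. system CPU seconds (hypothetical helper).

    os.times() aggregates CPU time over all threads of the current process,
    so thread-based parallelism is covered; separate worker processes are not.
    """
    start_wall = time.perf_counter()
    t0 = os.times()
    func(*args, **kwargs)
    t1 = os.times()
    wall = time.perf_counter() - start_wall

    user = t1.user - t0.user
    system = t1.system - t0.system
    total = user + system
    return {
        "wall_s": wall,
        "user_s": user,
        "system_s": system,
        # Share of CPU time spent in the kernel; a high value corresponds to
        # the elevated system CPU seen in top
        "system_fraction": system / total if total > 0 else 0.0,
    }
```

For example, `cpu_split(RandomForestClassifier(n_estimators=1000, n_jobs=-1).fit, X, y)` times a single forest fit from the reproduction. Running it on the full `pipe.fit` would take as long as the hanging fit itself.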

Versions

Linux-4.18.16-200.fc28.x86_64-x86_64-with-fedora-28-Twenty_Eight
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) [GCC 7.2.0]
NumPy 1.14.3
SciPy 1.1.0
Scikit-Learn 0.19.2
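A snippet along these lines prints an environment summary in the same layout as the block above. This is an assumption about how such a block is produced, not necessarily how this one was. Newer scikit-learn releases also provide `sklearn.show_versions()`. `version_lines` is a hypothetical helper. Packages that are not installed are reported rather than raising an error.

```python
import importlib
import platform
import sys


def version_lines():
    """Collect platform, Python and package versions (hypothetical helper)."""
    lines = [platform.platform(), "Python " + sys.version]
    for label, module_name in [("NumPy", "numpy"), ("SciPy", "scipy"),
                               ("Scikit-Learn", "sklearn")]:
        try:
            module = importlib.import_module(module_name)
            lines.append(label + " " + str(getattr(module, "__version__", "unknown")))
        except ImportError:
            lines.append(label + " not installed")
    return lines


for line in version_lines():
    print(line)
```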