The multi-threading issues on RandomForestClassifier #6023

Closed
@xchmiao

Description

Hi,

I'm using RandomForestClassifier to train a model on Ubuntu 14.04 with Python 2.7.11 through the Anaconda package. Below is the core code:

rf = RandomForestClassifier(n_jobs=-1, random_state=seed)
parameters = {'n_estimators': [2000],
              'criterion': ['entropy'],
              'max_depth': [10],
              'min_samples_leaf': [3],
              #'oob_score': [False, True],
              'max_features': ['auto']}

print "Start parameter grid search..."
start = time()
clf = GridSearchCV(rf, parameters, n_jobs=4, scoring='roc_auc',
                   cv=StratifiedKFold(y_train, n_folds=4, shuffle=True, random_state=128),
                   verbose=2, refit=True)

clf.fit(x_train, y_train)
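For anyone trying to reproduce this, here is a minimal, self-contained sketch of the same setup (Python 3 syntax, synthetic data from make_classification, and far fewer trees so it finishes in seconds rather than hours; note that in current scikit-learn, StratifiedKFold lives in sklearn.model_selection and takes n_splits instead of y/n_folds, and max_features='auto' is no longer accepted, so it is omitted here). Leaving the estimator's own n_jobs unset also avoids nesting the forest's parallelism inside GridSearchCV's:

```python
# Minimal sketch of the reported setup, scaled down to run quickly.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Small synthetic binary-classification problem standing in for the real data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# n_jobs is deliberately left at its default so only GridSearchCV parallelizes;
# setting n_jobs=-1 here AND n_jobs=4 on the search oversubscribes the cores.
rf = RandomForestClassifier(random_state=0)
parameters = {'n_estimators': [50],       # original report used 2000
              'criterion': ['entropy'],
              'max_depth': [10],
              'min_samples_leaf': [3]}

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=128)
clf = GridSearchCV(rf, parameters, n_jobs=2, scoring='roc_auc',
                   cv=cv, verbose=2, refit=True)
clf.fit(X, y)
print(clf.best_score_)
```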

I turned on a CPU monitor to watch CPU status on a quad-core system. In the beginning, all CPUs are at ~99% usage. However, after ~1 hr the CPU usage drops to ~0.3%, which does not seem normal.

Below is the output from the terminal:


Fitting 4 folds for each of 1 candidates, totalling 4 fits
[CV] max_features=auto, n_estimators=2000, criterion=entropy, max_depth=10, min_samples_leaf=3
[CV] max_features=auto, n_estimators=2000, criterion=entropy, max_depth=10, min_samples_leaf=3
[CV] max_features=auto, n_estimators=2000, criterion=entropy, max_depth=10, min_samples_leaf=3
[CV] max_features=auto, n_estimators=2000, criterion=entropy, max_depth=10, min_samples_leaf=3
[CV] max_features=auto, n_estimators=2000, criterion=entropy, max_depth=10, min_samples_leaf=3 - 68.4min
[Parallel(n_jobs=4)]: Done 1 jobs | elapsed: 68.4min
[CV] max_features=auto, n_estimators=2000, criterion=entropy, max_depth=10, min_samples_leaf=3 - 68.8min


Below is the status of CPU usage:


top - 01:40:46 up 4:40, 0 users, load average: 0.00, 0.00, 0.95
Tasks: 66 total, 1 running, 65 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 6553600 total, 4579708 used, 1973892 free, 0 buffers
KiB Swap: 6553600 total, 5511600 used, 1042000 free. 36652 cached Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
994 root 20 0 23116 60 60 S 0.0 0.0 0:00.00 ptyserved
997 root 20 0 39372 0 0 S 0.0 0.0 0:00.01 nginx
1000 root 20 0 39876 924 536 S 0.0 0.0 0:01.75 nginx
1002 root 20 0 12736 0 0 S 0.0 0.0 0:00.00 getty
1004 root 20 0 12736 0 0 S 0.0 0.0 0:00.00 getty
1435 root 20 0 18144 24 24 S 0.0 0.0 0:00.00 bash
1455 root 20 0 59568 0 0 S 0.0 0.0 0:00.00 su
1456 root 20 0 18140 0 0 S 0.0 0.0 0:00.00 bash
1467 root 20 0 21916 800 504 R 0.0 0.0 0:12.72 top
1888 root 20 0 61316 4 4 S 0.0 0.0 0:00.00 sshd
1950 postfix 20 0 27408 272 184 S 0.0 0.0 0:00.05 qmgr
2062 root 20 0 59568 92 92 S 0.0 0.0 0:00.00 su
2063 root 20 0 18144 60 60 S 0.0 0.0 0:00.00 bash
2074 root 20 0 5861480 3260 888 S 0.0 0.0 1:26.11 python***
2087 root 20 0 8268200 2.020g 216 S 0.0 32.3 66:39.13 python***
2090 root 20 0 8268200 2.034g 1676 S 0.0 32.5 66:38.78 python***
2184 postfix 20 0 27356 524 240 S 0.0 0.0 0:00.00 pickup
2210 root 20 0 5861480 2836 104 S 0.0 0.0 0:00.00 python ****
2225 root 20 0 5861480 4040 820 S 0.0 0.1 0:00.00 python ****


Although the training data is only about 207 MB with 300 features, the drop in CPU usage doesn't seem normal.

Do you know what is going on?

Thank you very much!
