d:\Anaconda3\lib\site-packages\boruta\boruta_py.py:418: RuntimeWarning: invalid value encountered in greater hits = np.where(cur_imp[0] > imp_sha_max)[0] #30

shushan2017 · 2017-10-09T02:40:26Z

d:\Anaconda3\lib\site-packages\boruta\boruta_py.py:418: RuntimeWarning: invalid value encountered in greater
hits = np.where(cur_imp[0] > imp_sha_max)[0]

tagomatech · 2017-11-04T09:07:58Z

Can you provide a bit more background on this issue, such as the function used, parameters, dataset, etc.?

tagomatech · 2017-11-05T18:13:28Z

OK, I got that same warning myself. My understanding is that numpy emits it when a comparison involves NaN. Looking at the code in boruta_py.py, this suggests that, due to your data and/or your parameters, no feature is better than the best shadow.

Can you confirm that the attribute n_features_ of your Boruta object returned 0?
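To illustrate the NaN comparison described above, here is a minimal sketch. The array values and the threshold are made up for illustration, and treating NaN as "no importance score for this feature" is an assumption rather than something confirmed in this thread. It only shows that a comparison against NaN evaluates to False, so np.where skips those positions; depending on the numpy version, the comparison may or may not emit the RuntimeWarning.

```python
import numpy as np

# Hypothetical importance values; NaN stands for a feature without a score
# in this iteration (an assumption made for illustration only).
cur_imp = np.array([0.30, np.nan, 0.05, 0.40, np.nan])
imp_sha_max = 0.10  # hypothetical importance of the best shadow feature

# Silence the "invalid value" warning locally. Some NumPy versions emit it
# for comparisons involving NaN, others do not.
with np.errstate(invalid="ignore"):
    mask = cur_imp > imp_sha_max

# Any comparison with NaN is False, so NaN positions never count as hits.
hits = np.where(mask)[0]
print(mask)  # [ True False False  True False]
print(hits)  # [0 3]
```

In this sketch the warning, where it appears, does not change the result: the NaN entries are simply excluded from the hits.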

flaviozamponi · 2017-11-07T07:48:53Z

Dear all,
I get the same warning using this simple script (synthetic data from sklearn)

`from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from boruta import BorutaPy

# Synthetic regression data: 50 features, 10 of them informative
Xdum, ydum = make_regression(n_samples=100, n_features=50,
                             n_informative=10, bias=150.0,
                             noise=30, random_state=0)
rf = RandomForestRegressor(n_jobs=-1, max_depth=10)

feat_selector = BorutaPy(rf, n_estimators=1000, perc=100, max_iter=20, verbose=2)
feat_selector.fit(Xdum, ydum)`

Please note that I set n_informative to 10, and in the end Boruta indeed finds 7 relevant features: feat_selector.n_features_ returns 7. The problem also appears with max_iter=100.
I'm using sklearn 0.19.1 and numpy 1.13.3.

tagomatech · 2017-11-11T20:59:04Z

Note that the higher max_iter is, the more likely you are to get this warning.
BTW, I fixed the problem; please see my pull request. With this small code change, your snippet runs without the warning.

flaviozamponi · 2017-11-12T12:37:18Z

Thanks a lot!

danielhomola · 2017-11-19T19:47:55Z

Thanks @tagomatech, accepted the PR.

Saravji · 2018-02-08T17:03:39Z

Please re-open this issue, as the proposed (and implemented) solution introduces an error.
As soon as NaNs are encountered, the solution behaves as shown below.
(The array printed is hits, right after the assignment in question.)
Note: I am working with the Madelon example notebook in the package.

Referred to as code variant A:
[ 28 48 64 105 128 153 204 241 281 318 336 338 378 433 442 451 453 472
475 493]
Iteration: 7 / 100
Confirmed: 0
Tentative: 499
Rejected: 0
[ 28 48 64 105 128 153 241 281 318 336 338 378 433 442 451 453 472 475
493]
Iteration: 8 / 100
Confirmed: 0
Tentative: 21
Rejected: 478
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
Iteration: 9 / 100
Confirmed: 0
Tentative: 21
Rejected: 478
[ 0 1 2 3 4 5 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
Iteration: 10 / 100
Confirmed: 0
Tentative: 21
Rejected: 478

Instead of the actual feature indices, it uses the indices of the first n features that are not rejected.
For contrast, here is the same pipeline using only the code that was replaced:

Referred to as code variant B:
[ 28 48 64 105 128 153 204 241 281 318 336 338 378 433 442 451 453 472
475 493]
Iteration: 7 / 100
Confirmed: 0
Tentative: 499
Rejected: 0
[ 28 48 64 105 128 153 241 281 318 336 338 378 433 442 451 453 472 475
493]
Iteration: 8 / 100
Confirmed: 0
Tentative: 21
Rejected: 478

  print(hits)
[ 28  48  64 105 128 153 204 241 281 318 336 338 378 433 442 451 453 455
 472 475 493]
Iteration: 	9 / 100
Confirmed: 	19
Tentative: 	2
Rejected: 	478
~~~/boruta_py.py:421: RuntimeWarning: invalid value encountered in greater
  print(hits)
[ 28  48  64 105 128 153 241 281 318 336 338 378 433 442 451 453 455 472
 475 493]
Iteration: 	10 / 100
Confirmed: 	19
Tentative: 	2
Rejected: 	478

Line number is off, as I have both code snippets in the file.

The results of the two runs differ significantly:
Variant A repeatably terminates after 34 iterations; the results vary between 1 and 2 accepted features, with the remainder rejected.
Variant B terminates after 100 iterations with 21 accepted and 1 or 2 tentative features.
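The index shift seen in variant A (hits such as 0, 1, 2, ... instead of the real feature indices) can be reproduced with a small sketch. This is a hypothetical reconstruction of how such a symptom can arise, not the code from the pull request: it assumes that NaNs are dropped from the importance array before np.where is applied. The values, the threshold, and the names filtered, shifted_hits, safe_imp and aligned_hits are made up for illustration.

```python
import numpy as np

# Hypothetical importances for 6 features; NaN marks features without a score.
cur_imp = np.array([np.nan, 0.30, np.nan, 0.05, np.nan, 0.40])
imp_sha_max = 0.10  # hypothetical importance of the best shadow feature

# Dropping NaNs first compresses the array, so np.where returns positions
# within the filtered array rather than the original feature indices.
filtered = cur_imp[~np.isnan(cur_imp)]
shifted_hits = np.where(filtered > imp_sha_max)[0]

# Replacing NaNs in a copy keeps the array length, so positions stay aligned.
# -inf can never exceed the threshold, so those features are never hits.
safe_imp = np.where(np.isnan(cur_imp), -np.inf, cur_imp)
aligned_hits = np.where(safe_imp > imp_sha_max)[0]

print(shifted_hits)  # [0 2]
print(aligned_hits)  # [1 5]
```

Under these assumptions, only the second approach avoids the NaN comparison while still reporting the original feature positions (1 and 5 here).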

danielhomola closed this as completed Nov 19, 2017

Saravji added a commit to Saravji/boruta_py that referenced this issue Feb 8, 2018

Update boruta_py.py

b549626

solving issue scikit-learn-contrib#30 in _assign_hits

Saravji mentioned this issue Feb 8, 2018

Update boruta_py.py #38

Merged
