
d:\Anaconda3\lib\site-packages\boruta\boruta_py.py:418: RuntimeWarning: invalid value encountered in greater hits = np.where(cur_imp[0] > imp_sha_max)[0] #30

Closed
shushan2017 opened this issue Oct 9, 2017 · 7 comments

Comments

@shushan2017

d:\Anaconda3\lib\site-packages\boruta\boruta_py.py:418: RuntimeWarning: invalid value encountered in greater
hits = np.where(cur_imp[0] > imp_sha_max)[0]

@tagomatech
Contributor

Can you provide a bit more background on this issue, such as the function used, the parameters, and the dataset?

@tagomatech
Contributor

tagomatech commented Nov 5, 2017

OK. I got the same error message myself. My understanding is that NumPy emits this warning when a comparison involves NaN. Looking at the code in boruta_py.py, this suggests that, due to your data and/or your parameters, no feature is better than the best shadow.

Can you confirm that the attribute n_features_ of your Boruta object returned 0?
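
The warning can be reproduced in isolation. The following is a minimal sketch, not the boruta_py code: `importances` and `best_shadow` are made-up stand-ins for `cur_imp[0]` and `imp_sha_max`. It shows that a NaN never compares greater than anything, so NaN entries never count as hits, and that `np.errstate` can silence the warning locally. Whether the warning appears at all may depend on the NumPy version and platform.

```python
import numpy as np

# Hypothetical stand-ins for cur_imp[0] and imp_sha_max; the NaN entries
# represent features that have no importance value in this iteration.
importances = np.array([0.30, np.nan, 0.05, 0.40, np.nan])
best_shadow = 0.10

# Suppress the "invalid value" RuntimeWarning for this comparison only.
with np.errstate(invalid="ignore"):
    beats_shadow = importances > best_shadow  # NaN > x evaluates to False

# Indices of the features that beat the best shadow.
hits = np.where(beats_shadow)[0]
print(hits)
```

Because the comparison runs on the full-length array, `hits` still refers to the original feature positions.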

@flaviozamponi

flaviozamponi commented Nov 7, 2017

Dear all,
I get the same warning with this simple script (synthetic data from sklearn):

`from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from boruta import BorutaPy

# Synthetic regression data: 100 samples, 50 features, 10 of them informative
Xdum, ydum = make_regression(n_samples=100, n_features=50,
                             n_informative=10, bias=150.0,
                             noise=30, random_state=0)
rf = RandomForestRegressor(n_jobs=-1, max_depth=10)

feat_selector = BorutaPy(rf, n_estimators=1000, perc=100, max_iter=20, verbose=2)
feat_selector.fit(Xdum, ydum)`

Please note that I set n_informative to 10, and in the end Boruta does find 7 relevant features: feat_selector.n_features_ returns 7. The warning also appears with max_iter=100.
I'm using sklearn 0.19.1 and numpy 1.13.3.

@tagomatech
Contributor

tagomatech commented Nov 11, 2017

Note that the higher max_iter is, the more likely you are to get this warning.
BTW, I fixed the problem. Please see my pull request. With this small code change applied, your snippet runs without the warning.

@flaviozamponi

Thanks a lot!

@danielhomola
Collaborator

Thanks @tagomatech, accepted the PR.

@Saravji
Contributor

Saravji commented Feb 8, 2018

Please re-open this issue, as the proposed (and implemented) solution introduces an error.
As soon as NaNs are encountered, the solution behaves as shown below. The array printed is hits, right after the assignment in question.
Note: I am working with the Madelon example notebook in the package.

Referred to as code variant A:
[ 28 48 64 105 128 153 204 241 281 318 336 338 378 433 442 451 453 472
475 493]
Iteration: 7 / 100
Confirmed: 0
Tentative: 499
Rejected: 0
[ 28 48 64 105 128 153 241 281 318 336 338 378 433 442 451 453 472 475
493]
Iteration: 8 / 100
Confirmed: 0
Tentative: 21
Rejected: 478
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
Iteration: 9 / 100
Confirmed: 0
Tentative: 21
Rejected: 478
[ 0 1 2 3 4 5 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
Iteration: 10 / 100
Confirmed: 0
Tentative: 21
Rejected: 478

Instead of the actual feature indices, hits now holds positions counted within the n features not yet rejected (0 to n-1).
For contrast, here is the same pipeline using the code that was replaced:

Referred to as code variant B:
[ 28 48 64 105 128 153 204 241 281 318 336 338 378 433 442 451 453 472
475 493]
Iteration: 7 / 100
Confirmed: 0
Tentative: 499
Rejected: 0
[ 28 48 64 105 128 153 241 281 318 336 338 378 433 442 451 453 472 475
493]
Iteration: 8 / 100
Confirmed: 0
Tentative: 21
Rejected: 478

  print(hits)
[ 28  48  64 105 128 153 204 241 281 318 336 338 378 433 442 451 453 455
 472 475 493]
Iteration: 	9 / 100
Confirmed: 	19
Tentative: 	2
Rejected: 	478
~~~/boruta_py.py:421: RuntimeWarning: invalid value encountered in greater
  print(hits)
[ 28  48  64 105 128 153 241 281 318 336 338 378 433 442 451 453 455 472
 475 493]
Iteration: 	10 / 100
Confirmed: 	19
Tentative: 	2
Rejected: 	478

The line number in the warning is off because I have both code snippets in the file.

The results of the two runs differ significantly:
Variant A reproducibly terminates after 34 iterations; the result varies between 1 and 2 accepted features, with the remainder rejected.
Variant B terminates after 100 iterations with 21 accepted and 1 or 2 tentative features.
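
The behaviour of variant A is consistent with an index shift. The sketch below does not reproduce the code from the PR; it uses made-up numbers to show how dropping NaN entries before calling `np.where` yields positions within the shortened array, whereas comparing on a full-length array keeps the original feature indices. `importances` and `best_shadow` are hypothetical stand-ins for `cur_imp[0]` and `imp_sha_max`.

```python
import numpy as np

# Hypothetical importances; NaN stands for features without a value.
importances = np.array([np.nan, 0.30, np.nan, 0.05, 0.40])
best_shadow = 0.10

# Pitfall: filtering NaNs first shortens the array, so np.where returns
# positions within the filtered array, not the original feature indices.
filtered = importances[~np.isnan(importances)]
shifted_hits = np.where(filtered > best_shadow)[0]

# Index-preserving alternative: replace NaNs with 0 in a full-length copy.
# This assumes best_shadow is non-negative, so a 0 can never be a hit.
no_nan = np.where(np.isnan(importances), 0.0, importances)
hits = np.where(no_nan > best_shadow)[0]

print(shifted_hits)  # positions in the filtered array
print(hits)          # positions in the original array
```

With these numbers the filtered comparison reports positions 0 and 2, while the features that actually beat the shadow sit at indices 1 and 4.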

Saravji added a commit to Saravji/boruta_py that referenced this issue Feb 8, 2018
solving issue scikit-learn-contrib#30 in _assign_hits