Fixed "Tuple Index Out of range error", unit test and example notebook #48

Merged
merged 2 commits into from Jan 31, 2019

Conversation

guitarmind
Contributor

@guitarmind guitarmind commented Dec 27, 2018

This PR relates to #47, a bug I introduced in #46.
We should only check the size of the first dimension of the not_selected array, since it is already 0 when all features are relevant.
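
To illustrate why the second-dimension check fails, here is a minimal sketch. It assumes not_selected is a 1-D index array of the kind np.where(...)[0] returns; the dec_reg values and the way not_selected is built here are made up for the example and are not copied from boruta_py.py.

```python
import numpy as np

# Hypothetical stand-in for the decision register: assume 1 means "confirmed".
dec_reg = np.array([1, 1, 1])

# np.where(...)[0] always yields a 1-D array, here an empty one.
not_selected = np.where(dec_reg != 1)[0]
print(not_selected.shape)  # (0,)

# The shape tuple has only one entry, so shape[1] does not exist.
try:
    not_selected.shape[1]
    second_dim_error = None
except IndexError as exc:
    second_dim_error = str(exc)
print(second_dim_error)  # tuple index out of range

# Checking only the first dimension is enough to detect "nothing left".
has_unselected = not_selected.shape[0] > 0
print(has_unselected)  # False
```

Because the array is 1-D, the first dimension alone says whether any unselected features remain.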

This time I have also double-checked that the unit tests pass (and fixed a small issue in them as well).

Here is the output log of unit test:

$ python unit_tests.py
..
----------------------------------------------------------------------
Ran 2 tests in 11.859s

OK

I also discovered some compatibility issues with Python (I'm using 3.6.5) and Pandas in the example notebook while testing for correctness; the notebook should now work with the current versions as well!

@guitarmind guitarmind changed the title Fixed "Tuple Index Out of range error" and unit test Fixed "Tuple Index Out of range error", unit test and example notebook Dec 27, 2018
@freshnemo

Hi @guitarmind, I tried the Madalon Dataset .ipynb notebook provided with the package. It raised the error "TypeError: unhashable type: 'slice'" from pandas/core/generic.py, line 2487: "res = cache.get(item)". I am using Python 3.6.4 and pandas 0.23.4. Could you share the package versions from your test environment?

@guitarmind
Contributor Author

Hi @freshnemo,

Are you using my forked version? The error "TypeError: unhashable type: 'slice'" is actually what I was trying to fix in the Madalon_Data_Set notebook. It occurs because X needs to be a NumPy array for the slicing on line 402 of boruta_py.py to work:

x_cur = np.copy(X[:, x_cur_ind])

In the PR I made a change in the notebook to get X in numpy format:

y = data.pop('target')
X = data.copy().values
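
As a rough illustration of the difference, the sketch below uses a made-up toy DataFrame and a hypothetical x_cur_ind (neither is taken from the notebook). It assumes only that pandas and NumPy are installed; the exact exception pandas raises for this kind of indexing differs between versions, so the example records the exception name rather than relying on a specific type.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the notebook's dataset (values are made up).
data = pd.DataFrame({
    "f0": [1.0, 2.0, 3.0],
    "f1": [4.0, 5.0, 6.0],
    "target": [0, 1, 0],
})

y = data.pop("target")
X_frame = data.copy()   # still a DataFrame
X = X_frame.values      # plain NumPy array of shape (3, 2)

x_cur_ind = [1]         # hypothetical column indices

# NumPy-style 2-D indexing works on the ndarray.
x_cur = np.copy(X[:, x_cur_ind])

# The same expression on the DataFrame is not positional indexing and fails;
# the exception type depends on the pandas version.
try:
    X_frame[:, x_cur_ind]
    frame_error = None
except Exception as exc:
    frame_error = type(exc).__name__
```

Passing the array from .values keeps the estimator's internal X[:, ...] slicing valid without touching boruta_py.py itself.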

Note that this PR has not been merged yet, so the changes have not been applied.
Could you share the full stack trace so I can see more details? Thanks.

Test environment:

  • Python 3.6.5
  • Pandas 0.23.0

@freshnemo

Yes. When I used the code from your fork, Boruta_py runs at first but soon stops. With 100 iterations, Boruta usually stops at around iteration 45 and shows the same error as #47 on the line "if not_selected.shape[0] > 0 and not_selected.shape[1] > 0:": tuple index out of range.
My Python version is 3.6.4 and pandas is 0.23.4.
In addition, I found an example on Kaggle that uses Boruta_py and works, which is why I suspect Python 3.6.4 might have a bug.

@guitarmind
Contributor Author

guitarmind commented Dec 28, 2018

So what is the stack trace of the error? The line "if not_selected.shape[0] > 0 and not_selected.shape[1] > 0:" has been changed in the PR.

@freshnemo

Oh, sorry, I did not check the "Files changed" tab. I have now modified the code with the change you provided. Thanks for your help, the code runs now.

@guitarmind
Contributor Author

guitarmind commented Dec 28, 2018

Good to know that 👍

@silverstone1903

silverstone1903 commented Jan 18, 2019

Edit: Seems like it's working. 😄

Hi @guitarmind,

When I change line 336 to "if not_selected.shape[0] > 0:", I get this error:

IndexError                                Traceback (most recent call last)
<timed eval> in <module>()

<ipython-input-40-4c6a084e678c> in fit(self, X, y)
    199         """
    200 
--> 201         return self._fit(X, y)
    202 
    203     def transform(self, X, weak=False):

<ipython-input-40-4c6a084e678c> in _fit(self, X, y)
    312         tentative = np.where(dec_reg == 0)[0]
    313         # ignore the first row of zeros
--> 314         tentative_median = np.median(imp_history[1:, tentative], axis=0)
    315         # which tentative to keep
    316         tentative_confirmed = np.where(tentative_median

IndexError: too many indices for array

Before the change, it only raises the tuple index error at the end, but the function otherwise works properly. Do you have any idea why?
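
For reference, NumPy raises this message whenever an array is indexed with more indices than it has dimensions. The sketch below is not a diagnosis of the report above; it only reproduces the message with made-up arrays, under the assumption that an array like imp_history could end up 1-D instead of 2-D.

```python
import numpy as np

tentative = np.array([0, 2])

# With a 2-D history (rows = iterations, columns = features) the indexing works.
imp_history_2d = np.zeros((4, 3))
median_ok = np.median(imp_history_2d[1:, tentative], axis=0)  # shape (2,)

# With a 1-D array the same two-index expression fails.
imp_history_1d = np.zeros(4)
try:
    imp_history_1d[1:, tentative]
    message = None
except IndexError as exc:
    message = str(exc)  # begins with "too many indices for array"
```

Printing imp_history.shape just before the failing line would show whether this is what happens in a given run.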

@danielhomola danielhomola merged commit eaad6a3 into scikit-learn-contrib:master Jan 31, 2019

4 participants