Issue when run your example #12

Closed
robinbing opened this issue Sep 2, 2016 · 2 comments
Comments

@robinbing

Hi,

When I run your example code, the line 'feat_selector.fit(X,y)' raises 'TypeError: unhashable type: 'slice''. So I tried setting y = y.values and X = X.values. Then, after 99 iterations (maxrun = 100), another error appears: 'TypeError: iteration over a 0-d array'.

I was wondering what is happening there. Thanks a lot.

@danielhomola
Collaborator

Hi,

What are the dimensions of X and y? Are you sure they're both numpy arrays?
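A quick way to answer both questions is to print the type, shape and dtype of each input before calling fit. The snippet below is only a sketch: `describe_input` is a hypothetical helper, not part of BorutaPy, and the arrays are random placeholders standing in for the real data.

```python
import numpy as np


def describe_input(name, arr):
    """Return a one-line summary of an input's type, shape and dtype.

    Hypothetical helper for debugging; not part of BorutaPy.
    """
    is_ndarray = isinstance(arr, np.ndarray)
    # Objects without these attributes (e.g. plain lists) report None.
    shape = getattr(arr, "shape", None)
    dtype = getattr(arr, "dtype", None)
    return "{}: ndarray={}, shape={}, dtype={}".format(
        name, is_ndarray, shape, dtype
    )


# Placeholder data; substitute the X and y you pass to fit().
X = np.random.random((1000, 210))
y = np.zeros(1000, dtype=int)

print(describe_input("X", X))
print(describe_input("y", y))
```

If either line reports ndarray=False (for example for a pandas DataFrame), converting with .values before fitting is the step already tried above.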

@mavillan

mavillan commented Sep 6, 2016

Hi Daniel,

I have the same problem as @robinbing. Here is my test code:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta_py import BorutaPy

# load X and y
# NOTE BorutaPy accepts numpy arrays only, hence the .values attribute
#X = pd.read_csv('my_X_table.csv', index_col=0).values
#y = pd.read_csv('my_y_vector.csv', index_col=0).values
X = 10*np.random.random((1000,210))
y = np.zeros(1000, dtype=int)
y[np.random.random(1000) >= 0.5] = 1 


# define random forest classifier, with utilising all cores and
# sampling in proportion to y labels
rf = RandomForestClassifier(n_jobs=-1, class_weight='auto', max_depth=5)

# define Boruta feature selection method
feat_selector = BorutaPy(rf, n_estimators='auto', verbose=2, max_iter=1000)

# find all relevant features
feat_selector.fit(X, y)

# check selected features
feat_selector.support_

# check ranking of features
feat_selector.ranking_

# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)

It's basically the same as your example code, but with randomly generated data. Here is the error:

Traceback (most recent call last):
  File "boruta_example.py", line 23, in <module>
    feat_selector.fit(X, y)
  File "/home/martin/Repositories/svm/lib/boruta_py.py", line 191, in fit
    return self._fit(X, y)
  File "/home/martin/Repositories/svm/lib/boruta_py.py", line 325, in _fit
    iter_ranks = self._nanrankdata(imp_history_rejected, axis=1)
  File "/home/martin/Repositories/svm/lib/boruta_py.py", line 493, in _nanrankdata
    ranks = sp.stats.mstats.rankdata(np.ma.masked_invalid(X), axis=axis)
  File "/home/martin/miniconda2/envs/python3/lib/python3.5/site-packages/scipy/stats/mstats_basic.py", line 260, in rankdata
    return ma.apply_along_axis(_rank1d,axis,data,use_missing).view(ndarray)
  File "/home/martin/miniconda2/envs/python3/lib/python3.5/site-packages/numpy/ma/extras.py", line 394, in apply_along_axis
    res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
  File "/home/martin/miniconda2/envs/python3/lib/python3.5/site-packages/scipy/stats/mstats_basic.py", line 248, in _rank1d
    for r in repeats[0]:
TypeError: iteration over a 0-d array

It seems to be an error in SciPy's rankdata function.

Note: I tested this on Anaconda's Python 2 and Python 3.
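In case it helps narrow things down, here is a rough sketch of a NaN-aware row ranking that avoids the masked-array code path in the traceback. It is an assumption about what `_nanrankdata` is meant to compute (ranks along axis=1 with NaNs left unranked), not the library's implementation, and `nan_rank_rows` is a hypothetical name. It relies only on `scipy.stats.rankdata` applied to the non-NaN entries of each row.

```python
import numpy as np
from scipy.stats import rankdata


def nan_rank_rows(X):
    """Rank each row of a 2-D array, keeping NaN where the input is NaN.

    Hypothetical stand-in for a NaN-aware rankdata along axis=1.
    Ties receive the average rank (rankdata's default method).
    """
    X = np.asarray(X, dtype=float)
    ranks = np.full(X.shape, np.nan)
    for i, row in enumerate(X):
        valid = ~np.isnan(row)
        # Skip rows that are entirely NaN; they stay NaN in the output.
        if valid.any():
            ranks[i, valid] = rankdata(row[valid])
    return ranks


example = np.array([[3.0, np.nan, 1.0],
                    [2.0, 2.0, 5.0],
                    [np.nan, np.nan, np.nan]])
print(nan_rank_rows(example))
```

Swapping something like this in for the failing call would at least show whether the problem is confined to the masked rankdata.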
