Issue when run your example #12

robinbing · 2016-09-02T06:31:26Z

Hi,

When I run your example code, the line 'feat_selector.fit(X,y)' fails with 'TypeError: unhashable type: 'slice''. I then tried setting y = y.values and x = x.values. After 99 iterations (maxrun = 100), a different error appears: 'TypeError: iteration over a 0-d array'.

I was wondering what is happening there. Thanks a lot.

danielhomola · 2016-09-02T13:54:01Z

Hi,

What are the dimensions of X and y? Are you sure they're both numpy arrays?
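
A quick way to check is sketched below. It assumes the data was loaded with pandas; the frame, its column names, and the target series are made up for illustration and are not taken from the example script.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for data loaded with pandas (not from the thread)
df = pd.DataFrame(np.random.random((20, 5)), columns=list("abcde"))
target = pd.Series(np.random.randint(0, 2, size=20))

# Convert to plain numpy arrays before calling fit()
X = df.values
y = target.values.ravel()  # make sure y is 1-d

# Both should report numpy.ndarray; X should be 2-d and y 1-d
print(type(X), X.shape)
print(type(y), y.shape)
```

If X is still a DataFrame at this point, positional slicing such as X[:, 0] is a likely source of the "unhashable type: 'slice'" error.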

mavillan · 2016-09-06T01:51:00Z

Hi Daniel,

I have the same problem as @robinbing. Here is my test code:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from boruta_py import BorutaPy

# load X and y
# NOTE BorutaPy accepts numpy arrays only, hence the .values attribute
#X = pd.read_csv('my_X_table.csv', index_col=0).values
#y = pd.read_csv('my_y_vector.csv', index_col=0).values
X = 10*np.random.random((1000,210))
y = np.zeros(1000, dtype=int)
y[np.random.random(1000) >= 0.5] = 1 


# define random forest classifier, with utilising all cores and
# sampling in proportion to y labels
rf = RandomForestClassifier(n_jobs=-1, class_weight='auto', max_depth=5)

# define Boruta feature selection method
feat_selector = BorutaPy(rf, n_estimators='auto', verbose=2, max_iter=1000)

# find all relevant features
feat_selector.fit(X, y)

# check selected features
feat_selector.support_

# check ranking of features
feat_selector.ranking_

# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)

It's basically the same as your example code, but with randomly generated data. Here is the error:

Traceback (most recent call last):
  File "boruta_example.py", line 23, in <module>
    feat_selector.fit(X, y)
  File "/home/martin/Repositories/svm/lib/boruta_py.py", line 191, in fit
    return self._fit(X, y)
  File "/home/martin/Repositories/svm/lib/boruta_py.py", line 325, in _fit
    iter_ranks = self._nanrankdata(imp_history_rejected, axis=1)
  File "/home/martin/Repositories/svm/lib/boruta_py.py", line 493, in _nanrankdata
    ranks = sp.stats.mstats.rankdata(np.ma.masked_invalid(X), axis=axis)
  File "/home/martin/miniconda2/envs/python3/lib/python3.5/site-packages/scipy/stats/mstats_basic.py", line 260, in rankdata
    return ma.apply_along_axis(_rank1d,axis,data,use_missing).view(ndarray)
  File "/home/martin/miniconda2/envs/python3/lib/python3.5/site-packages/numpy/ma/extras.py", line 394, in apply_along_axis
    res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
  File "/home/martin/miniconda2/envs/python3/lib/python3.5/site-packages/scipy/stats/mstats_basic.py", line 248, in _rank1d
    for r in repeats[0]:
TypeError: iteration over a 0-d array

It seems to be an error in SciPy's rankdata function.

Note: I tested this on Anaconda's Python 2 and Python 3.

danielhomola closed this as completed Sep 5, 2016

nerdcha mentioned this issue Nov 29, 2016

iteration over a 0-d array in _nanrankdata #16

Closed

bittremieux mentioned this issue Dec 14, 2016

Fix nanrankdata bug when the entire row/column is NaN #19

Merged
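
Judging only by the title of #19 above, the failure appears to be tied to a row or column that is entirely NaN. The following is a minimal sketch of NaN-aware row ranking that tolerates such rows. It is not the code from that pull request, and the helper name nanrank_rows is hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def nanrank_rows(a):
    """Rank each row, ignoring NaNs; NaN entries stay NaN in the output."""
    a = np.asarray(a, dtype=float)
    ranks = np.full(a.shape, np.nan)
    for i, row in enumerate(a):
        valid = ~np.isnan(row)
        if valid.any():  # a row that is entirely NaN is left as NaN
            ranks[i, valid] = rankdata(row[valid])
    return ranks

demo = np.array([[0.3, np.nan, 0.1],
                 [np.nan, np.nan, np.nan]])
result = nanrank_rows(demo)
print(result)
```

Ranking only the non-NaN entries of each row means the ranking function never receives an empty or fully masked input, which seems to be the situation the traceback above ends in.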
