When using Pandas DataFrames #3

mbernico · 2016-04-09T19:37:49Z

_add_shadows_get_imps() fails when X is pandas rather than numpy

A Pandas DF cannot be sliced the way a numpy array can, so this line fails:
x_cur = np.copy(X[:, x_cur_ind])

Either of these works instead:
x_cur = np.copy(X.as_matrix()[:, x_cur_ind])
OR
x_cur = np.copy(X.ix[:, x_cur_ind])

I'd recommend testing/casting dataframes to numpy arrays in _fit
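
A minimal sketch of what casting up front could look like, so the positional slice works for both input types. The helper names `to_numpy_2d` and `column_copy` are hypothetical and not part of boruta_py; the sketch relies only on `np.asarray`, which converts a DataFrame to an array of its values.

```python
import numpy as np


def to_numpy_2d(X):
    """Hypothetical helper: return X as a 2-D numpy array."""
    # np.asarray accepts lists, arrays and pandas DataFrames alike.
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError("X must have shape [n_samples, n_features]")
    return X


def column_copy(X, x_cur_ind):
    """Hypothetical stand-in for the failing slice, made input-agnostic."""
    return np.copy(to_numpy_2d(X)[:, x_cur_ind])


# Plain nested lists stand in for any array-like input here.
x_cur = column_copy([[1, 2, 3], [4, 5, 6]], [0, 2])
```

Doing the conversion once at the top of _fit would keep every later slice unchanged.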

danielhomola · 2016-04-10T17:33:57Z

Yepp atm boruta expects a numpy array for X, but this is made explicit in the docstring of fit():
X : array-like, shape = [n_samples, n_features]
The training input samples.

If you feel this is an important issue, please add this to the fit and I'll review your changes.

Oh you did, that's wonderful, cheers!

mbernico · 2016-04-10T19:39:37Z

The examples show pandas going in. I suppose it would be as easy to just update the user doc to show them to only send numpy. I built a 'pandas check' but that has the unfortunate side effect of adding a dependency. It appears that's how sklearn handles it as well though. Toss up, I'll leave you to decide which you like better :)
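
For illustration only, one way a pandas check might avoid a hard dependency is to treat the import as optional. This is a sketch under that assumption, not necessarily what the PR does, and the function name `as_array` is hypothetical.

```python
import numpy as np

try:
    import pandas as pd  # optional: only needed to recognise DataFrames
except ImportError:
    pd = None


def as_array(X):
    """Hypothetical check: unwrap a DataFrame, pass other inputs to numpy."""
    if pd is not None and isinstance(X, pd.DataFrame):
        return X.values  # underlying numpy array of the DataFrame
    return np.asarray(X)


arr = as_array([[1.0, 2.0], [3.0, 4.0]])
```

With this shape, users without pandas installed are unaffected, and users with pandas can pass a DataFrame directly.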

PR for packaging, Python3 Support, and Issues #3 and #4

danielhomola · 2016-04-10T21:52:30Z

Hi Mike,

Yepp, I wanted it to have a scikit-learn interface, so kinda instinctively stuck with the numpy input as sklearn does. I added a warning to the examples as you recommended, and renamed boruta_py2 to boruta_py_plus. Also left in your sanity check for pandas dataframes just in case. Pandas is pretty common now, it's not a major dependency issue imo.

Thanks again for your valuable input, really appreciate it!

cheers,
Dan

mbernico added a commit to mbernico/boruta_py that referenced this issue Apr 9, 2016

Fix Issue scikit-learn-contrib#3

6c0f997

danielhomola closed this as completed Apr 10, 2016

mbernico mentioned this issue Apr 10, 2016

PR for packaging, Python3 Support, and Issues #3 and #4 #5

Merged

danielhomola added a commit that referenced this issue Apr 10, 2016

Merge pull request #5 from mbernico/master

809c31d

PR for packaging, Python3 Support, and Issues #3 and #4
