Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When using Pandas DataFrames #3

Closed
mbernico opened this issue Apr 9, 2016 · 3 comments
Closed

When using Pandas DataFrames #3

mbernico opened this issue Apr 9, 2016 · 3 comments

Comments

@mbernico
Copy link
Contributor

mbernico commented Apr 9, 2016

_add_shadows_get_imps() fails when X is pandas rather than numpy

Pandas DF can no longer be sliced as
x_cur = np.copy(X[:, x_cur_ind])

x_cur = np.copy(X.as_matrix()[:, x_cur_ind])
OR
x_cur = np.copy(X.ix[:, x_cur_ind])

I'd recommend testing/casting dataframes to numpy arrays in _fit

mbernico added a commit to mbernico/boruta_py that referenced this issue Apr 9, 2016
@danielhomola
Copy link
Collaborator

Yepp atm boruta expects a numpy array for X, but this is made explicit in the docstring of fit():
X : array-like, shape = [n_samples, n_features]
The training input samples.

If you feel this is an important issue, please add this to the fit and I'll review your changes.

Oh you did, that's wonderful, cheers!

@mbernico
Copy link
Contributor Author

The examples show pandas going in. I suppose it would be as easy to just update the user doc to show them to only send numpy. I built a 'pandas check' but that has the unfortunate side effect of adding a dependency. It appears that's how sklearn handles it as well though. Toss up, I'll leave you to decide which you like better :)

danielhomola added a commit that referenced this issue Apr 10, 2016
PR for packaging, Python3 Support, and Issues #3 and #4
@danielhomola
Copy link
Collaborator

Hi Mike,

Yepp, I wanted it to have a scikit learn interface, so kinda instinctively stuck with the numpy input as sklearn does.. I added a warning to the examples as you recommended, and renamed boruta_py2 to boruta_py_plus.. Also left in your sanity check for pandas dataframes jsut in case. Pandas is pretty common now, it's not a major dependency issue imo..

Thanks again for your valuable input, really appreciate it!

cheers,
Dan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants