
cross_val_score passes a list rather than a DataFrame to Estimator #26

Closed
cancan101 opened this issue Jun 16, 2015 · 3 comments

@cancan101

Docs say:

> sklearn-pandas provides a wrapper on sklearn's cross_val_score function which passes a pandas DataFrame to the estimator rather than a numpy array

but I see:

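```python
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn_pandas import cross_val_score

test = pd.DataFrame({'a': [1, 2, 3]})

class Model(BaseEstimator):
    def fit(self, x):
        print(type(x))  # report what the estimator actually receives
        return self

    def score(self, x):
        return 1

_ = cross_val_score(Model(), test)
```

```
<type 'list'>
<type 'list'>
<type 'list'>
```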
@cancan101
Author

The issue is in `_safe_split`:

```python
type(_safe_split(Model(), DataWrapper(test), None, [1, 2])[0])
```

```
list
```
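To illustrate why the split comes back as a list, here is a minimal sketch of a row-indexing helper with a generic fallback. `take_rows` and `FrameWrapper` are hypothetical names, not scikit-learn or sklearn-pandas code; the only assumption is that the real helper, like this one, keeps pandas objects and arrays intact but indexes anything else one element at a time.

```python
import pandas as pd


class FrameWrapper(object):
    """Hypothetical DataFrame wrapper: it exposes only __len__ and
    __getitem__, with no `iloc` and no `shape` attribute."""

    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, i):
        # Return a single row as a pandas Series
        return self.df.iloc[i]


def take_rows(X, indices):
    """Hypothetical row-indexing helper with a generic fallback."""
    if hasattr(X, "iloc"):   # pandas objects keep their type
        return X.iloc[indices]
    if hasattr(X, "shape"):  # array-likes support fancy indexing
        return X[indices]
    # Anything else is indexed element by element -> a plain list
    return [X[i] for i in indices]


df = pd.DataFrame({"a": [1, 2, 3]})
print(type(take_rows(df, [1, 2])))                # a pandas DataFrame
print(type(take_rows(FrameWrapper(df), [1, 2])))  # a plain list of row Series
```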

@cancan101
Author

It actually looks like the newest version of scikit-learn handles DataFrames natively.
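One way to check this on a current install is to rerun the original `Model` and record what it receives. This sketch assumes a recent scikit-learn where `cross_val_score` lives in `sklearn.model_selection` (at the time of this issue it was in `sklearn.cross_validation`); the `seen_types` list is an addition for illustration only.

```python
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score

seen_types = []  # illustrative: types of X observed by fit() and score()


class Model(BaseEstimator):
    def fit(self, X, y=None):
        seen_types.append(type(X))
        return self

    def score(self, X, y=None):
        seen_types.append(type(X))
        return 1.0


test = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6]})
scores = cross_val_score(Model(), test, cv=3)
print(set(seen_types))
```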

@dukebody
Collaborator

dukebody commented Nov 8, 2015

@cancan101 you are probably right regarding your last comment.

Regarding the cross_val_score wrapper, it was intended to make it possible to use a DataFrame with the cross-validation functions in pipelines. The list you see is in fact a somewhat dirty trick to prevent sklearn from turning the DataFrame into an array. That list is later mapped back to a DataFrame inside the DataFrameMapper. But you cannot use the CV wrapper directly without the DataFrameMapper.
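A minimal sketch, using plain pandas rather than sklearn-pandas's actual code, of why the list trick is reversible: each element is a row `Series` whose `name` is its original index label, so `pd.DataFrame` can rebuild the frame from the list. The columns and fold indices here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
train_idx = [0, 2]  # illustrative fold indices

# Roughly what element-by-element indexing of a wrapped frame yields:
# a plain list with one pandas Series per selected row.
rows = [df.iloc[i] for i in train_idx]

# Each Series' name is its original index label, so pd.DataFrame(rows)
# recovers both the column names and the row labels.
rebuilt = pd.DataFrame(rows)
print(rebuilt)
```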

If you want to use a DataFrame in cross-validation with old versions of sklearn, simply pass df.values to the sklearn CV functions.
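A sketch of the `df.values` workaround. The data and the `LinearRegression` estimator are illustrative assumptions; the import falls back to `sklearn.cross_validation` for releases older than 0.18, where `sklearn.model_selection` does not exist yet.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

try:
    from sklearn.model_selection import cross_val_score
except ImportError:  # scikit-learn releases before 0.18
    from sklearn.cross_validation import cross_val_score

# Illustrative data: y is an exact linear function of x1 and x2
x1 = np.arange(12, dtype=float)
x2 = np.arange(12) % 3
df = pd.DataFrame({"x1": x1, "x2": x2, "y": 2 * x1 + 3 * x2 + 1})

# .values hands the CV helper plain NumPy arrays instead of a DataFrame
X = df[["x1", "x2"]].values
y = df["y"].values
scores = cross_val_score(LinearRegression(), X, y, cv=3)
print(scores)  # R^2 per fold, close to 1.0 because y is exactly linear in X
```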

I'm closing this because I don't consider it a bug.

dukebody closed this as completed Nov 8, 2015