Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] split data using _safe_split in _permutaion_test_scorer to fix error… #5697

Merged
merged 1 commit into from Dec 29, 2016

Conversation

@equialgo
Copy link
Contributor

@equialgo equialgo commented Nov 2, 2015

… when using Pandas DataFrame/Series

Related to issue #5696

@raghavrv
Copy link
Member

@raghavrv raghavrv commented Nov 30, 2015

thanks for fixing this!

Could you also add this fix to the model_selection module and also add a NRT to model_selection/tests/test_validation.py? :)

@raghavrv
Copy link
Member

@raghavrv raghavrv commented Nov 30, 2015

Also could you edit the

PR title to -
[MRG] split data using _safe_split in _permutaion_test_scorer to fix error when using Pandas DataFrame/Series

And the PR description to "Fixes #5696"

@amueller amueller changed the title split data using _safe_split in _permutaion_test_scorer to fix error… [MRG] split data using _safe_split in _permutaion_test_scorer to fix error… Dec 10, 2015
@amueller
Copy link
Member

@amueller amueller commented Dec 10, 2015

yeah a test would be great, using the mock data frame or an acutal dataframe if pandas is installed.

@equialgo
Copy link
Contributor Author

@equialgo equialgo commented Jan 3, 2016

First of all, sorry for the late reply.

I added the fix also to model_selection/_validation.py.

Moreover, I added a test_permutation_test_score_pandas() test case to model_selection/tests/test_validation.py (based on the test_cross_val_score_pandas() test case)

@dankessler
Copy link

@dankessler dankessler commented Dec 14, 2016

Is there additional work that needs to be done before this can be merged? I'm still encountering the issue that this addresses (#5696), which I'm working around by calling .values on my series. Is it just the merge conflict that's hairy? If so, I'm happy to take a crack.

@amueller amueller added the Bug label Dec 14, 2016
@amueller
Copy link
Member

@amueller amueller commented Dec 14, 2016

Wow sorry this one has been lying around for a bit. It needs a rebase and an entry to whatsnew.rst - and reviews, though that should be reasonably quick.

@amueller
Copy link
Member

@amueller amueller commented Dec 14, 2016

This actually fixes another issue, namely using precomputed kernels / distance matrices with permutation_score. It would be great if there was also a regression test for that.
Otherwise looks good to me. Sorry for the slow turn-around.

check_df = lambda x: isinstance(x, InputFeatureType)
check_series = lambda x: isinstance(x, TargetType)
clf = CheckingClassifier(check_X=check_df, check_y=check_series)
permutation_test_score(clf, X_df, y_ser)

This comment has been minimized.

@jnothman

jnothman Dec 14, 2016
Member

Ideally we should check that, for fixed random_state, results are identical regardless of input type.

This comment has been minimized.

@equialgo

equialgo Dec 24, 2016
Author Contributor

At the moment, such input tests are also not present for the other validation methods, there is only: test_cross_val_score_pandas(), test_cross_val_predict_pandas().

I wonder is that really a necessity?

@jnothman
Copy link
Member

@jnothman jnothman commented Dec 14, 2016

@equialgo Sorry for the extreme delay. Let us know if you have time soon to complete the changes.

@equialgo
Copy link
Contributor Author

@equialgo equialgo commented Dec 24, 2016

Hi guys, I rebased the fix it should be relatively easy to merge now (if it passes the checks).

@amueller I do not really get why this also fixes the issue using precomputed kernels / distance matrices with permutation_score. If it is trivial please elaborate, otherwise it might be better to add a new ticket for that one.

@jnothman
Copy link
Member

@jnothman jnothman commented Dec 25, 2016

You've done something strange with your commit history. Perhaps a rebase is in order.

@equialgo equialgo force-pushed the equialgo:fix-issue-5696 branch from 210d663 to f0bc2cd Dec 28, 2016
@equialgo
Copy link
Contributor Author

@equialgo equialgo commented Dec 28, 2016

@jnothman oops, kinda messed-up there! Re-did the rebase; commit history should be okay now.

…hed commits)

Squashed commits:
[94fd9f4] split data using _safe_split in _permutaion_test_scorer
[522053b] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[21b23ce] running test_permutation_test_score_pandas on iris data to prevent warnings.
[15a48bf] adding safe_indexing to _shuffle function
[9ea5c9e] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[3cf5e8f] split  data using _safe_split in _permutaion_test_scorer to fix error when using Pandas DataFrame/Series
@equialgo equialgo force-pushed the equialgo:fix-issue-5696 branch from f0bc2cd to de14dfa Dec 28, 2016
@GaelVaroquaux GaelVaroquaux changed the title [MRG] split data using _safe_split in _permutaion_test_scorer to fix error… [MRG+1] split data using _safe_split in _permutaion_test_scorer to fix error… Dec 28, 2016
@GaelVaroquaux
Copy link
Member

@GaelVaroquaux GaelVaroquaux commented Dec 28, 2016

+1 to merge this guy (there is a flake8 failure, but it's because of lambda expressions, and here they seem legit).

Let's try to merge this fast, it's been lying around for too long.

@GaelVaroquaux
Copy link
Member

@GaelVaroquaux GaelVaroquaux commented Dec 28, 2016

@raghavrv : are you +1 for merge? If so, let's merge

@raghavrv
Copy link
Member

@raghavrv raghavrv commented Dec 28, 2016

I think this needs a whatsnew entry? I can send a quick PR after this gets merged if you want. Otherwise LGTM... That pep8 error needs to be added to ignore list... (#8131)

Copy link
Member

@raghavrv raghavrv left a comment

Pending whatsnew... (Feel free to merge and later add a whatsnew...)

@raghavrv raghavrv merged commit 986a49b into scikit-learn:master Dec 29, 2016
2 of 3 checks passed
2 of 3 checks passed
continuous-integration/travis-ci/pr The Travis CI build failed
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@raghavrv
Copy link
Member

@raghavrv raghavrv commented Dec 29, 2016

Okay... I'm merging this... Il make a whatsnew entry for this and another PR tomorrow...

@raghavrv
Copy link
Member

@raghavrv raghavrv commented Dec 29, 2016

Thanks @equialgo!

@equialgo
Copy link
Contributor Author

@equialgo equialgo commented Dec 29, 2016

You guys thanks for merging!

raghavrv added a commit to raghavrv/scikit-learn that referenced this pull request Jan 5, 2017
…arn#5697)

Squashed commits:
[94fd9f4] split data using _safe_split in _permutaion_test_scorer
[522053b] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[21b23ce] running test_permutation_test_score_pandas on iris data to prevent warnings.
[15a48bf] adding safe_indexing to _shuffle function
[9ea5c9e] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[3cf5e8f] split  data using _safe_split in _permutaion_test_scorer to fix error when using Pandas DataFrame/Series
sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…arn#5697)

Squashed commits:
[94fd9f4] split data using _safe_split in _permutaion_test_scorer
[522053b] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[21b23ce] running test_permutation_test_score_pandas on iris data to prevent warnings.
[15a48bf] adding safe_indexing to _shuffle function
[9ea5c9e] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[3cf5e8f] split  data using _safe_split in _permutaion_test_scorer to fix error when using Pandas DataFrame/Series
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…arn#5697)

Squashed commits:
[94fd9f4] split data using _safe_split in _permutaion_test_scorer
[522053b] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[21b23ce] running test_permutation_test_score_pandas on iris data to prevent warnings.
[15a48bf] adding safe_indexing to _shuffle function
[9ea5c9e] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[3cf5e8f] split  data using _safe_split in _permutaion_test_scorer to fix error when using Pandas DataFrame/Series
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…arn#5697)

Squashed commits:
[94fd9f4] split data using _safe_split in _permutaion_test_scorer
[522053b] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[21b23ce] running test_permutation_test_score_pandas on iris data to prevent warnings.
[15a48bf] adding safe_indexing to _shuffle function
[9ea5c9e] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[3cf5e8f] split  data using _safe_split in _permutaion_test_scorer to fix error when using Pandas DataFrame/Series
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…arn#5697)

Squashed commits:
[94fd9f4] split data using _safe_split in _permutaion_test_scorer
[522053b] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[21b23ce] running test_permutation_test_score_pandas on iris data to prevent warnings.
[15a48bf] adding safe_indexing to _shuffle function
[9ea5c9e] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[3cf5e8f] split  data using _safe_split in _permutaion_test_scorer to fix error when using Pandas DataFrame/Series
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…arn#5697)

Squashed commits:
[94fd9f4] split data using _safe_split in _permutaion_test_scorer
[522053b] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[21b23ce] running test_permutation_test_score_pandas on iris data to prevent warnings.
[15a48bf] adding safe_indexing to _shuffle function
[9ea5c9e] adding test case test_permutation_test_score_pandas() to check if permutation_test_score plays nice with pandas dataframe/series
[3cf5e8f] split  data using _safe_split in _permutaion_test_scorer to fix error when using Pandas DataFrame/Series
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants