[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics #7388

Merged
merged 150 commits into scikit-learn:master from raghavrv:multimetric_cross_val_score Jul 7, 2017

Conversation

@raghavrv
Member

raghavrv commented Sep 11, 2016

Supersedes #2759
Fixes #1837


TODO

  • Add utils function check_multimetric_scoring for validation of multimetric scoring param
    • Tests
  • Support multiple metrics for cross_val_score
    • Tests
  • Support multiple metrics for GridSearchCV and RandomizedSearchCV
    • Tests
    • More tests for the refit param w.r.t multimetric setting...
  • Example on GridSearchCV plotting multiple metrics for the search of min_samples_split on a dtc
  • Permit refit='<metric/scorer>'
  • Revert multiple metrics for validation_curve
  • Revert multiple metrics for learning_curve
  • Test fit_grid_point better to ensure previous public API is not broken
  • Make the output of cross_val_score a dict (like grid-search's cv_results_)
  • Make a section in cross_val_score's user guide for multi-metric
  • Make a section in GridSearchCV's user guide for multi-metric
  • Add whatsnew entry

Currently, in master

  • scoring can only be a single string ('precision' etc) or a single callable (make_scorer(precision_score), custom_scorer).

In this PR

  • scoring can now be a list/tuple like ('precision', 'accuracy', ...) or a dict like {'precision': make_scorer(precision_score), 'accuracy score': 'accuracy', 'custom': custom_scorer_callable}
  • If (and only if) the scoring is multi-metric, the return of cross_val_score / learning_curve / validation_curve will be a dict mapping scorer names to their corresponding train_scores or test_scores.
  • GridSearchCV's attributes best_index_, best_params_ and best_score_ will correspond to the metric set via the refit param. If refit is simply True, an error is raised.
  • GridSearchCV's cv_results_ attribute will contain keys ending with the scorer names when multiple metrics are used...
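For illustration, a minimal sketch of the API described above, assuming scikit-learn >= 0.19 (where this PR landed); the cv_results_ key names follow the per-metric convention mentioned in the last bullet:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={'min_samples_split': [2, 10, 50]},
    scoring=['precision', 'recall'],  # list of predefined metric names
    refit='precision',                # best_* attributes follow this metric
    cv=3,
)
search.fit(X, y)

# Per-metric result keys, e.g. 'mean_test_precision', 'rank_test_recall'
print(sorted(k for k in search.cv_results_ if k.endswith('_precision')))
print(search.best_params_)            # selected according to refit='precision'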

A sample plot of a multi-metric search over min_samples_split in a DecisionTreeClassifier (click on the plot to go to the example hosted at CircleCI)

cc: @jnothman @amueller @vene

@jnothman
Member

jnothman commented Sep 12, 2016

(If it's focused on cross_val_score then it doesn't supersede #2579...)

I've been thinking of a function cross_validate that returns a dict like GridSearchCV.cv_results_, but with each value a scalar. Perhaps a parameter would switch it to returning the split results as an array. The same functionality could instead be rolled into cross_val_score, but I haven't yet deeply considered the benefits of either approach.
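For reference, a rough sketch of how the cross_validate function discussed here eventually behaves once it was added later in this PR (per the commit log below); the keys shown are the ones the merged implementation uses:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
res = cross_validate(DecisionTreeClassifier(random_state=0), X, y, cv=3,
                     scoring=('precision', 'recall'),
                     return_train_score=True)

# A dict of arrays: fit/score times plus one test_/train_ entry per metric
print(sorted(res))            # ['fit_time', 'score_time', 'test_precision', ...]
print(res['test_precision'])  # one score per CV split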

@raghavrv
Member

raghavrv commented Sep 12, 2016

> I've been thinking of a function cross_validate that returns a dict like GridSearchCV.cv_results_

For a single param evaluation, I think it's easier to simply call mean/std directly on the return value of cross_val_score...

@jnothman
Member

jnothman commented Sep 12, 2016

Yes, but I mean to also get times, training score, multiple param results, etc.

@jnothman
Member

jnothman commented Sep 12, 2016

  • not multiple param, multiple metric

@raghavrv
Member

raghavrv commented Sep 12, 2016

Maybe as a separate function cross_validate, since rolling it into cross_val_score would complicate the common use case? (I believe not everyone wants multiple-metric support?)

Thoughts @vene @amueller @agramfort

I thought we could simply have

  • scoring as a list of predefined metric strings / a dict of names --> scorers: the scores will then be a dict of names --> scores
  • scoring as a single string / callable: the scores will be, like before, an array.
@agramfort
Member

agramfort commented Sep 12, 2016

@raghavrv
Member

raghavrv commented Sep 15, 2016

Thanks for the comment @agramfort. I will post a sample script soon.

@raghavrv
Member

raghavrv commented Sep 16, 2016

And @GaelVaroquaux thanks for the comment at #7435. Could you clarify what kind of output you have in mind for cross_val_score when multiple metrics are to be evaluated, if not a dict?

@agramfort this is the usage I had in mind:

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor
>>> from sklearn.datasets import make_regression
>>> from sklearn.metrics import get_scorer

>>> dtc = DecisionTreeRegressor()
>>> X, y = make_regression(n_samples=100, random_state=42)

# (Assumed definitions for the scorer objects used in the dict example below)
>>> neg_uae_scorer = get_scorer('neg_mean_absolute_error')
>>> neg_mse_scorer = get_scorer('neg_mean_squared_error')
>>> neg_mae_scorer = get_scorer('neg_median_absolute_error')


# For multiple metrics - as a list of metric names
>>> cross_val_score(dtc, X, y, cv=2, scoring=['neg_mean_absolute_error',
...                                           'neg_mean_squared_error',
...                                           'neg_median_absolute_error'])
{'neg_mean_absolute_error': array([-109.20020926, -124.05659102]),
 'neg_mean_squared_error': array([-15507.92864917, -27689.6700291 ]),
 'neg_median_absolute_error': array([ -87.57322795, -117.34946122])}

# For multiple metrics - as a dict of callables
>>> cross_val_score(dtc, X, y, cv=2,
...                 scoring={'neg_mean_absolute_error': neg_uae_scorer,
...                          'neg_mean_squared_error': neg_mse_scorer,
...                          'neg_median_absolute_error': neg_mae_scorer})
{'neg_mean_absolute_error': array([-109.20020926, -124.05659102]),
 'neg_mean_squared_error': array([-15507.92864917, -27689.6700291 ]),
 'neg_median_absolute_error': array([ -87.57322795, -117.34946122])}


# For a single metric (like before)
>>> cross_val_score(dtc, X, y, cv=2, scoring='neg_mean_absolute_error')
array([-109.20020926, -124.05659102])
@raghavrv
Member

raghavrv commented Sep 16, 2016

@mblondel WDYT?

@amueller
Member

amueller commented Sep 16, 2016

Ah, for this use case I actually support a dict. It's a bit weird if the output type changes depending on whether you provide a single metric or not, though. Again, I think some data format that is easily converted to a pandas dataframe is great.

For callables, couldn't we just use __name__ instead of a dict? Or is that not stable enough?

@amueller
Member

amueller commented Sep 16, 2016

So @jnothman suggested introducing a new function, and I think that might be a good idea. Optionally we could deprecate the current behavior of cross_val_score.

I think the new output should be structured like the cv_results_, with metrics and folds and times and summary statistics.

@raghavrv
Member

raghavrv commented Sep 16, 2016

> For callables, couldn't we just use __name__ instead of a dict? Or is that not stable enough?

Two functions can have the same name. We discussed this when we were brewing the cv_results_... This way we don't have to do complex heuristics to figure out the name of the scorer. We simply let the user supply the scorer name...
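A small illustration of the naming issue mentioned above (an assumed example, not from this PR): both scorers below wrap functions with the same __name__, so without user-supplied dict keys their results could not be told apart.

from sklearn.metrics import fbeta_score, make_scorer

# Same underlying function (hence the same __name__), two different scorers;
# the dict keys chosen by the user keep the results distinguishable.
scoring = {
    'f0.5': make_scorer(fbeta_score, beta=0.5),
    'f2': make_scorer(fbeta_score, beta=2),
}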

@raghavrv
Member

raghavrv commented Sep 16, 2016

> So @jnothman suggested introducing a new function, and I think that might be a good idea.

cross_validate? Which returns something similar to cv_results_? Okay! Any opposition to this from @agramfort or @GaelVaroquaux?

> Optionally we could deprecate the current behavior of cross_val_score.

Were you suggesting that cross_val_score also return a dict?

The list of scores as returned by cross_val_score for a single metric will still be the most common use case... When people use multiple metrics they should definitely be expected to check the docstring to know how the scores for different metrics will be returned... correct?

Can I suggest that we leave cross_val_score as such (without implementing multiple metrics there) and let it remain a quick, easy way to cross-validate for a single metric, and, like Joel suggested, add cross_validate which will return a dict like cv_results_? There we can easily support multiple metrics...

@amueller
Member

amueller commented Sep 16, 2016

> Two functions can have the same name. We discussed this when we were brewing the cv_results_... This way we don't have to do complex heuristics to figure out the name of the scorer. We simply let the user supply the scorer name...

Sorry I missed that. But there is no multi-metric support in GridSearchCV yet, right? So this PR would introduce the "scoring parameter as dict" as an interface.

@raghavrv
Member

raghavrv commented Sep 16, 2016

> But there is no multi-metric support in GridSearchCV yet, right?

Not yet. Implementing it there is very straightforward given our new cv_results_ attr...

But before that we need to settle on _fit_and_score and cross_val_score. They are the time-consuming part involving API discussion...

@amueller
Member

amueller commented Sep 16, 2016

Hm, so do we also want to support f1_score with average=None in this? When doing grid search, what would be used to decide the maximum, then? Hopefully not the first class.

@raghavrv
Member

raghavrv commented Sep 17, 2016

No, we won't support f1_score without averaging. For all such multiclass scorers you will get an error (as before):

ValueError: multiclass format is not supported

If the user wants it, they can quickly wrap the individual scorers into separate scorers with a single-value output each... In which case each such scorer will have a ranking associated with it...

And the best_estimator_ / best_index_ / best_score_ would all also have to be a dict of {scorer_name --> val}...

EDIT: For a single metric, the current format of best_estimator_ / best_index_ / best_score_ is preserved as such...
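A minimal sketch of the wrapping workaround described above (the helper and scorer names are illustrative, not part of this PR): each wrapped scorer returns a single value, so it fits the multi-metric scoring dict.

from functools import partial
from sklearn.metrics import f1_score, make_scorer

def f1_for_class(y_true, y_pred, label):
    # f1_score with average=None returns one value per requested label; keep it.
    return f1_score(y_true, y_pred, labels=[label], average=None)[0]

# One single-value scorer per class, usable as a multi-metric scoring dict
scoring = {
    'f1_class0': make_scorer(partial(f1_for_class, label=0)),
    'f1_class1': make_scorer(partial(f1_for_class, label=1)),
}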

@jnothman
Member

jnothman commented Sep 17, 2016

In short, we can't design cross_val_score in isolation. Make it work for GridSearchCV, then we'll adapt it to cross_val_score. A dict doesn't naturally specify one metric as score.

@raghavrv raghavrv changed the title from [WIP] ENH Allow `cross_val_score` to evaluate on multiple metrics to [WIP] ENH Allow `cross_val_score`, `GridSearchCV` and `RandomizedSearchCV` to evaluate on multiple metrics Sep 17, 2016

@raghavrv
Member

raghavrv commented Sep 26, 2016

(This is waiting for #7026 / #7325 to be merged)

@jnothman
Member

jnothman commented Sep 29, 2016

Please in 0.19. Please please please. While I have monkey patched this in my own code, I've fixed a colleague's code by simply avoiding cross_val_score altogether...

@raghavrv
Member

raghavrv commented Sep 29, 2016

> Please in 0.19. Please please please

Sure :P I thought the 0.18 milestone is already complete with the timing and training score added? I intended this for 0.19 only...

@raghavrv
Member

raghavrv commented Sep 29, 2016

*0.19 only

@raghavrv
Member

raghavrv commented Oct 1, 2016

I've not added tests for GridSearchCV / RandomizedSearchCV / learning_curve / validation_curve; apart from that, this is ready for an initial review. I have added an example which summarizes the API.

Example API for GridSearchCV and cross_val_score when multiple metrics are to be evaluated.

Ping @jnothman @amueller @MechCoder @vene @agramfort

@jnothman
Member

jnothman commented Oct 2, 2016

Tests are pretty useful when reviewing: they tell you what the author expected the code to do. But I gather you've got tests for cross_val_score. I'm going to be occupied with Jewish New Year for the next few days, so don't expect my reviews in a hurry.

@raghavrv
Member

raghavrv commented Oct 2, 2016

Please take your time. And wish you a prosperous Jewish New Year! :)

(I'll try adding the tests and making them pass meanwhile) - Done, except for learning_curve and validation_curve, as I'm not sure if we want multiple-metric support there...

@amueller
Member

amueller commented Oct 5, 2016

I'm wondering if we want refit=False in the XSearchCV classes for multiple metrics by default. Currently we are refitting once per scorer, which might be very time-consuming. I'd either declare a single scorer the one for selection or do that. When would you want the current one? Also, doesn't it encourage multiple hypothesis testing fallacies?

@amueller
Member

amueller commented Oct 5, 2016

I feel like we have an opportunity here to make the output of cross_val_score richer. Why shouldn't it include the training score and times?

@raghavrv
Member

raghavrv commented Oct 5, 2016

> I'm wondering if we want refit=False in the XSearchCV classes for multiple metrics by default....

That is a very good point... This also raises the question, do we want one universal best_estimator_? Currently I think the scores are not comparable, but is there a way to compare them?

> I feel like we have an opportunity here to make the output of cross_val_score richer. Why shouldn't it include the training score and times?

Yes, like Joel suggests, we could have a cross_validate function that returns something similar to cv_results_? But in that case I feel we should leave cross_val_score as such. (Remove multiple-metric support too...) Let it be a quick, easy way to cross-validate and get scores for a single metric... WDYT?

@jnothman
Member

jnothman commented Oct 6, 2016

I've always suggested to refit for one scorer only.

@raghavrv
Member

raghavrv commented Oct 6, 2016

Thanks for the comment! Do you suggest that we enforce refit=False when scoring is multi-metric? (That would simplify some of the code... Also, when the delegated methods are called the error would be "refit is False. Set it to True". If we want a more specific error message, something along the lines of "x method is not available with multimetric scoring.", we will have to retain the changes to the if_delegate_has_method helper...) WDYT?

Also, if that is the case, what will happen if one explicitly sets refit=True?

@amueller
Member

amueller commented Oct 6, 2016

Maybe we could allow users to specify which scorer to refit for via the refit option and either raise an error or don't refit if it's not set?

@jnothman
Member

jnothman commented Oct 6, 2016

Yes, I've previously suggested refit='accuracy' for instance.

@jnothman
Member

jnothman commented Oct 6, 2016

but really "best" and refit should correspond.

Honestly, I wonder sometimes whether we should be making this about multiple metrics at all, and really just having:

  • one scoring metric
  • any number of other diagnostics extracted by a specified function; users should be able to get top coefficients zipped together with feature names; or a measure of sparsity.

Sorry I've not looked at this yet.

@raghavrv
Member

raghavrv commented Oct 7, 2016

BTW does this qualify as one of Joel's pets? ;)

@raghavrv
Member

raghavrv commented Oct 7, 2016

> Maybe we could allow users to specify which scorer to refit for via the refit option and either raise an error or don't refit if it's not set?

Your Twitter poll seems to indicate that people want one per scorer ;)

On a serious note, the refit=<metric> seems neat to me. That would also mean we return the same type for both single and multiple metrics, like @GaelVaroquaux wanted.

So are we converging towards:

  1. refit=<metric_name> for multi-metric scoring and refit=<True/False> for single-metric scoring, and hence return a non-dict for all best_* params (see the sketch after this comment)...

     I have a question here - Should we add a param best_scorer, so that in the case of multi-metric scoring, when refit=False (and not any scorer name), the best_score_ would be the best score for that metric/scorer... (And restrict refit to be a bool)

  2. A cross_val_score which does not support multi-metric scoring, but instead a new cross_validate function which returns a cv_results_-like dict with keys for all the scorers...

  3. Multi-metric not implemented in any of learning_curve, validation_curve etc.

> Sorry I've not looked at this yet.

Please take your time. :) I'm just noting down what I infer from the discussions and my thoughts on that...
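Point 1 above is essentially what was merged; a minimal sketch of that behaviour, assuming scikit-learn >= 0.19 (the estimator and metric names here are only illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
gs = GridSearchCV(SVC(), {'C': [0.1, 1, 10]},
                  scoring={'AUC': 'roc_auc', 'Accuracy': 'accuracy'},
                  refit='AUC', cv=3)
gs.fit(X, y)

# best_index_, best_score_ and best_params_ all follow the refit metric
assert gs.best_index_ == gs.cv_results_['rank_test_AUC'].argmin()
print(gs.best_score_)   # mean test AUC of the best candidate

# With refit=False the search still runs, but the best_* attributes
# (and predict/score on the search object) are unavailable.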

@amueller amueller changed the title from [MRG + 1] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics to [MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics Jul 7, 2017

@amueller
Member

amueller commented Jul 7, 2017

Wait, we actually have +2 now? Alright, pressing the green button then!

@amueller amueller merged commit a08555a into scikit-learn:master Jul 7, 2017

5 checks passed

ci/circleci: Your tests passed on CircleCI!
codecov/patch: 97.36% of diff hit (target 96.31%)
codecov/project: 96.36% (+0.04%) compared to b413299
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@amueller
Member

amueller commented Jul 7, 2017

It looks like the last round of @jnothman's nitpicks was not addressed yet. @raghavrv can you open a new PR for that?

@jnothman
Member

jnothman commented Jul 8, 2017

= O M F G =

This was literally the feature request that brought me to scikit-learn's issue tracker for the first time...

Does that mean I have to go away now?

@GaelVaroquaux
Member

GaelVaroquaux commented Jul 8, 2017

@agramfort
Member

agramfort commented Jul 8, 2017

You made it @raghavrv! 🥇 🥇 🥇 :)

@MechCoder
Member

MechCoder commented Jul 8, 2017

Great work Raghav!

@raghavrv
Member

raghavrv commented Jul 8, 2017

OMG OMG OMG. I can't believe this is finally merged. Thanks everyone for the reviews!!! @vene @amueller - My dear mentors, hereby I successfully finish my GSoC 2015 :') :p

@raghavrv
Member

raghavrv commented Jul 8, 2017

> @raghavrv can you open a new PR for that?

Sure :) Sorry, I was flying to Austin!

@amueller
Member

amueller commented Jul 10, 2017

@raghavrv see you tomorrow :)

@raghavrv
Member

raghavrv commented Jul 10, 2017

> I think @amueller is suggesting you use this kind of instructive wording in the narrative docs. Perhaps just adopt his wording?

Where is it? I can't find it.

@raghavrv raghavrv deleted the raghavrv:multimetric_cross_val_score branch Jul 11, 2017

massich added a commit to massich/scikit-learn that referenced this pull request Jul 13, 2017

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)

* ENH cross_val_score now supports multiple metrics

* DOCFIX permutation_test_score

* ENH validate multiple metric scorers

* ENH Move validation of multimetric scoring param out

* ENH GridSearchCV and RandomizedSearchCV now support multiple metrics

* EXA Add an example demonstrating the multiple metric in GridSearchCV

* ENH Let check_multimetric_scoring tell if its multimetric or not

* FIX For single metric name of scorer should remain 'score'

* ENH validation_curve and learning_curve now support multiple metrics

* MNT move _aggregate_score_dicts helper into _validation.py

* TST More testing/ Fixing scores to the correct values

* EXA Add cross_val_score to multimetric example

* Rename to multiple_metric_evaluation.py

* MNT Remove scaffolding

* FIX doctest imports

* FIX wrap the scorer and unwrap the score when using _score() in rfe

* TST Cleanup the tests. Test for is_multimetric too

* TST Make sure it registers as single metric when scoring is of that type

* PEP8

* Don't use dict comprehension to make it work in python2.6

* ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation

* FIX+TST delegated methods NA when multimetric is enabled...

TST Add general tests to GridSearchCV and RandomizedSearchCV

* ENH add option to disable delegation on multimetric scoring

* Remove old function from __all__

* flake8

* FIX revert disable_on_multimetric

* stash

* Fix incorrect rebase

* [ci skip]

* Make sure refit works as expected and remove irrelevant tests

* Allow passing standard scorers by name in multimetric scorers

* Fix example

* flake8

* Address reviews

* Fix indentation

* Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs

* Test that for single metric, 'score' is a key

* Typos

* Fix incorrect rebase

* Compare multimetric grid search with multiple single metric searches

* Test X, y list and pandas input; Test multimetric for unsupervised grid search

* Fix tests; Unsupervised multimetric gs will not pass until #8117 is merged

* Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators

* Add example to grid_search.rst

* Use the classic tuning of C param in SVM instead of estimators in RF

* FIX Remove scoring arg in deafult scorer test

* flake8

* Search for min_samples_split in DTC; Also show f-score

* REVIEW Make check_multimetric_scoring private

* FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed

* REVIEW Plot best score; Shorten legends

* REVIEW/COSMIT multimetric --> multi-metric

* REVIEW Mark the best scores of P/R scores too

* Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed"

This reverts commit ba766d9.

* ENH Use looping for iid testing

* FIX use param grid as scipy's stats dist in 0.12 do not accept seed

* ENH more looping less code; Use small non-noisy dataset

* FIX Use named arg after expanded args

* TST More testing of the refit parameter

* Test that in multimetric search refit to single metric, the delegated methods
  work as expected.
* Test that setting probability=False works with multimetric too
* Test refit=False gives sensible error

* COSMIT multimetric --> multi-metric

* REV Correct example doc

* COSMIT

* REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer

* REVIEW refit param: Raise for empty strings

* TST Invalid refit params

* REVIEW Use <scorer_name> alone; recall --> Recall

* REV specify when we expect scorers to not be None

* FLAKE8

* REVERT multimetrics in learning_curve and validation_curve

* REVIEW Simpler coding style

* COSMIT

* COSMIT

* REV Compress example a bit. Move comment to top

* FIX fit_grid_point's previous API must be preserved

* Flake8

* TST Use loop; Compare with single-metric

* REVIEW Use dict-comprehension instead of helper

* REVIEW Remove redundant test

* Fix tests incorrect braces

* COSMIT

* REVIEW Use regexp

* REV Simplify aggregation of score dicts

* FIX precision and accuracy test

* FIX doctest and flake8

* TST the best_* attributes multimetric with single metric

* Address @jnothman's review

* Address more comments \o/

* DOCFIXES

* Fix use the validated fit_param from fit's arguments

* Revert alpha to a lower value as before

* Using def instead of lambda

* Address @jnothman's review batch 1: Fix tests / Doc fixes

* Remove superfluous tests

* Remove more superfluous testing

* TST/FIX loop over refit and check found n_clusters

* Cosmetic touches

* Use zip instead of manually listing the keys

* Fix inverse_transform

* FIX bug in fit_grid_point; Allow only single score

TST if fit_grid_point works as intended

* ENH Use only ROC-AUC and F1-score

* Fix typos and flake8; Address Andy's reviews

MNT Add a comment on why we do such a transpose + some fixes

* ENH Better error messages for incorrect multimetric scoring values +...

ENH Avoid exception traceback while using incorrect scoring string

* Dict keys must be of string type only

* 1. Better error message for invalid scoring 2...
Internal functions return single score for single metric scoring

* Fix test failures and shuffle tests

* Avoid wrapping scorer as dict in learning_curve

* Remove doc example as asked for

* Some leftover ones

* Don't wrap scorer in validation_curve either

* Add a doc example and skip it as dict order fails doctest

* Import zip from six for python2.7 compat

* Make cross_val_score return a cv_results-like dict

* Add relevant sections to userguide

* Flake8 fixes

* Add whatsnew and fix broken links

* Use AUC and accuracy instead of f1

* Fix failing doctests cross_validation.rst

* DOC add the wrapper example for metrics that return multiple return values

* Address andy's comments

* Be less weird

* Address more of andy's comments

* Make a separate cross_validate function to return dict and a cross_val_score

* Update the docs to reflect the new cross_validate function

* Add cross_validate to toc-tree

* Add more tests on type of cross_validate return and time limits

* FIX failing doctests

* FIX ensure keys are not plural

* DOC fix

* Address some pending comments

* Remove the comment as it is irrelevant now

* Remove excess blank line

* Fix flake8 inconsistencies

* Allow fit_times to be 0 to conform with windows precision

* DOC specify how refit param is to be set in multiple metric case

* TST ensure cross_validate works for string single metrics + address @jnothman's reviews

* Doc fixes

* Remove the shape and transform parameter of _aggregate_score_dicts

* Address Joel's doc comments

* Fix broken doctest

* Fix the spurious file

* Address Andy's comments

* MNT Remove erroneous entry

* Address Andy's comments

* FIX broken links

* Update whats_new.rst

missing newline

bmanohar16 added a commit to bmanohar16/scikit-learn that referenced this pull request Jul 20, 2017

Change tag name
Old refers to new tag added with PR #7388

@amueller amueller removed this from PR phase in Andy's pets Jul 21, 2017

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Jul 27, 2017

Merge tag '0.19b2' into releases
Release 0.19b2

* tag '0.19b2': (808 commits)
  Preparing 0.19b2
  [MRG+1] FIX out of bounds array access in SAGA (#9376)
  FIX make test_importances pass on 32 bit linux
  Release 0.19b1
  DOC remove 'in dev' header in whats_new.rst
  DOC typos in whats_news.rst [ci skip]
  [MRG] DOC cleaning up what's new for 0.19 (#9252)
  FIX t-SNE memory usage and many other optimizer issues (#9032)
  FIX broken link in gallery and bad title rendering
  [MRG] DOC Replace \acute by prime (#9332)
  Fix typos (#9320)
  [MRG + 1 (rv) + 1 (alex) + 1] Add a check to test the docstring params and their order (#9206)
  DOC Residual sum vs. regression sum (#9314)
  [MRG] [HOTFIX] Fix capitalization in test and hence fix failing travis at master (#9317)
  More informative error message for classification metrics given regression output (#9275)
  [MRG] COSMIT Remove unused parameters in private functions (#9310)
  [MRG+1] Ridgecv normalize (#9302)
  [MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)
  Add data_home parameter to fetch_kddcup99 (#9289)
  FIX makedirs(..., exists_ok) not available in Python 2 (#9284)
  ...

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Jul 27, 2017

Merge branch 'releases' into dfsg (reremoved joblib and jquery)

yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Jul 27, 2017

Merge branch 'dfsg' into debian

jnothman added a commit that referenced this pull request Jul 30, 2017

[MRG + 1] DOC Fix Sphinx errors (#9420)
* Fix Rouseeuw1984 broken link

* Change label vbgmm to bgmm
Previously modified with PR #6651

* Change tag name
Old refers to new tag added with PR #7388

* Remove prefix underscore to match tag

* Realign to fit 80 chars

* Link to metrics.rst.
pairwise metrics yet to be documented

* Remove tag as LSHForest is deprecated

* Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py.
It is deprecated.

* Fix few Sphinx warnings

* Realign to 80 chars

* Changes based on PR review

* Remove unused ref in calibration

* Fix link ref in covariance.rst

* Fix linking issues

* Differentiate Rouseeuw1999 tag within file.

* Change all duplicate Rouseeuw1999 tags

* Remove numbers from tag Rousseeuw

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Aug 6, 2017

[MRG + 1] DOC Fix Sphinx errors (#9420)

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)

* Remove the comment as it is irrelevant now

* Remove excess blank line

* Fix flake8 inconsistencies

* Allow fit_times to be 0 to conform with windows precision

* DOC specify how refit param is to be set in multiple metric case

* TST ensure cross_validate works for string single metrics + address @jnothman's reviews

* Doc fixes

* Remove the shape and transform parameter of _aggregate_score_dicts

* Address Joel's doc comments

* Fix broken doctest

* Fix the spurious file

* Address Andy's comments

* MNT Remove erroneous entry

* Address Andy's comments

* FIX broken links

* Update whats_new.rst

missing newline
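
For reference, the squashed commits above add up to a small user-facing API: `scoring` may be a list/tuple of scorer names or a dict mapping names to scorers, `cross_validate` returns a `cv_results_`-like dict, and in the multi-metric case `refit` must name the metric that drives the `best_*` attributes. Below is a minimal sketch of that usage; the dataset, estimator and parameter grid are placeholders chosen only for illustration, not taken from the PR.

```python
# Illustrative sketch of the multi-metric API added in this PR; the data,
# estimator and parameter grid are placeholders chosen only for the example.
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# scoring as a list of scorer names: the result is a dict with
# 'test_<scorer_name>' entries plus fit/score times.
scores = cross_validate(clf, X, y, scoring=['accuracy', 'roc_auc'], cv=5)
print(sorted(scores))

# scoring as a dict of name -> scorer; with several metrics, refit must name
# the metric that defines best_index_ / best_params_ / best_score_.
search = GridSearchCV(clf,
                      param_grid={'min_samples_split': [2, 5, 10]},
                      scoring={'AUC': 'roc_auc', 'Accuracy': 'accuracy'},
                      refit='AUC', cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)  # cv_results_ has e.g. 'mean_test_AUC'

# Metrics that return several values (e.g. confusion_matrix) need to be
# wrapped so that each named scorer returns a single number.
def tn(y_true, y_pred):
    return confusion_matrix(y_true, y_pred)[0, 0]

scores = cross_validate(clf, X, y, cv=5,
                        scoring={'accuracy': 'accuracy', 'tn': make_scorer(tn)})
```

With a single metric the returned keys stay `test_score` / `train_score` (and `cv_results_` keeps its usual `mean_test_score`-style keys), so existing single-metric code is unaffected.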

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG + 1] DOC Fix Sphinx errors (#9420)

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG + 1] DOC Fix Sphinx errors (#9420)

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG + 1] DOC Fix Sphinx errors (#9420)

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

[MRG + 1] DOC Fix Sphinx errors (#9420)

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG + 1] DOC Fix Sphinx errors (#9420)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (#7388)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG + 1] DOC Fix Sphinx errors (#9420)