
[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics #7388

Merged
merged 150 commits into from Jul 7, 2017

Conversation

@raghavrv (Member) commented Sep 11, 2016

Supersedes #2759
Fixes #1837


TODO

  • Add utils function check_multimetric_scoring for validation of multimetric scoring param
    • Tests
  • Support multiple metrics for cross_val_score
    • Tests
  • Support multiple metrics for GridSearchCV and RandomizedSearchCV
    • Tests
    • More tests for the refit param w.r.t multimetric setting...
  • Example on GridSearchCV plotting multiple metrics for the search of min_samples_split on a dtc
  • Permit refit='<metric/scorer>'
  • Revert multiple metrics for validation_curve
  • Revert multiple metrics for learning_curve
  • Test fit_grid_point better to ensure previous public API is not broken
  • Make output of cross_val_score a dict (like grid-search's cv_results_)
  • Make a section in cross_val_score's user guide for multi-metric
  • Make a section in GridSearchCV's user guide for multi-metric
  • Add whatsnew entry

Currently, in master

  • scoring can only be a single string ('precision' etc) or a single callable (make_scorer(precision_score), custom_scorer).

In this PR

  • scoring can now be a list/tuple like ('precision', 'accuracy'...) or a dict like {'precision': make_scorer(precision_score), 'accuracy score': 'accuracy', 'custom': custom_scorer_callable}
  • If (and only if) the scoring is of multimetric type, the return of cross_val_score / learning_curve / validation_curve will be a dict mapping scorer names to their corresponding train_scores or test_scores.
  • GridSearchCV's attributes best_index_, best_params_, best_score_ will correspond to the metric set via the refit param. If refit is simply True, an error is raised.
  • GridSearchCV's cv_results_ attribute will consist of keys ending with the scorer names for multiple metrics...
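A minimal sketch of the multi-metric interface described above, assuming the API as merged (scikit-learn >= 0.19); the metric names 'AUC' and 'Accuracy' are arbitrary labels chosen for this example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# scoring as a dict of names -> scorers; refit names the metric that
# best_index_ / best_params_ / best_score_ refer to
gs = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={'min_samples_split': [2, 10, 50]},
    scoring={'AUC': 'roc_auc', 'Accuracy': 'accuracy'},
    refit='AUC',
    cv=3,
)
gs.fit(X, y)

# cv_results_ keys end with the scorer names
multi_keys = sorted(k for k in gs.cv_results_
                    if k.startswith('mean_test_'))
# -> ['mean_test_AUC', 'mean_test_Accuracy']
best_auc = gs.best_score_  # mean test AUC of the best candidate
```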

A sample plot of a multiple-metric search over min_samples_split on a decision tree classifier (the plot links to the example hosted on CircleCI)

cc: @jnothman @amueller @vene

@jnothman (Member) commented Sep 12, 2016

(If it's focused on cross_val_score then it doesn't supersede #2579...)

I've been thinking of a function cross_validate that returns a dict like GridSearchCV.cv_results_, but with each value a scalar. Perhaps a parameter would switch it to returning the split results as an array. The same functionality could instead be rolled into cross_val_score, but I haven't yet deeply considered the benefits of either approach.
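What eventually shipped as cross_validate (in scikit-learn 0.19) roughly matches this idea; a minimal sketch, assuming that 0.19+ API:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, random_state=42)

results = cross_validate(
    DecisionTreeRegressor(random_state=0), X, y, cv=3,
    scoring=['neg_mean_absolute_error', 'neg_mean_squared_error'],
)
# results is a dict of per-split arrays: 'fit_time', 'score_time', and
# one 'test_<name>' entry per metric ('train_<name>' entries may also
# appear, depending on your version's return_train_score default)
n_splits = len(results['fit_time'])  # 3
```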

@raghavrv (Member, Author) commented Sep 12, 2016

I've been thinking of a function cross_validate that returns a dict like GridSearchCV.cv_results_

For a single param evaluation, I think it's easier to simply call mean/std directly on the return value of cross_val_score...

@jnothman (Member) commented Sep 12, 2016

Yes, but I mean to also get times, training score, multiple param results, etc.


@jnothman (Member) commented Sep 12, 2016

*not multiple param, multiple metric


@raghavrv (Member, Author) commented Sep 12, 2016

Maybe as a separate function cross_validate, as rolling it into cross_val_score would complicate the common use case? (I believe not everyone wants multiple-metric support?)

Thoughts @vene @amueller @agramfort

I thought we could simply have:

  • scoring as a list of predefined metric strings / dict of names --> scorers: the scores will now be a dict of names --> scores.
  • scoring as a single string / callable: the scores will be, like before, an array.
@agramfort (Member) commented Sep 12, 2016

@raghavrv (Member, Author) commented Sep 15, 2016

Thanks for the comment @agramfort. I will post a sample script soon.

@raghavrv (Member, Author) commented Sep 16, 2016

And @GaelVaroquaux thanks for the comment at #7435. Could you clarify what kind of output you have in mind for cross_val_score when multiple metrics are to be evaluated, if not a dict?

@agramfort this is the usage I had in mind -

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor
>>> from sklearn.datasets import make_regression

>>> dtc = DecisionTreeRegressor()
>>> X, y = make_regression(n_samples=100, random_state=42)


# For multiple metric - as list of metrics
>>> cross_val_score(dtc, X, y, cv=2, scoring=['neg_mean_absolute_error',
...                                           'neg_mean_squared_error',
...                                           'neg_median_absolute_error'])
{'neg_mean_absolute_error': array([-109.20020926, -124.05659102]),
 'neg_mean_squared_error': array([-15507.92864917, -27689.6700291 ]),
 'neg_median_absolute_error': array([ -87.57322795, -117.34946122])}

# For multiple metric - as dict of callables (scorers built via get_scorer)
>>> from sklearn.metrics import get_scorer
>>> neg_uae_scorer = get_scorer('neg_mean_absolute_error')
>>> neg_mse_scorer = get_scorer('neg_mean_squared_error')
>>> neg_mae_scorer = get_scorer('neg_median_absolute_error')
>>> cross_val_score(dtc, X, y, cv=2,
...                 scoring={'neg_mean_absolute_error': neg_uae_scorer,
...                          'neg_mean_squared_error': neg_mse_scorer,
...                          'neg_median_absolute_error': neg_mae_scorer})
{'neg_mean_absolute_error': array([-109.20020926, -124.05659102]),
 'neg_mean_squared_error': array([-15507.92864917, -27689.6700291 ]),
 'neg_median_absolute_error': array([ -87.57322795, -117.34946122])}


# For single metric (like before)
>>> cross_val_score(dtc, X, y, cv=2, scoring='neg_mean_absolute_error')
array([-109.20020926, -124.05659102])
@raghavrv (Member, Author) commented Sep 16, 2016

@mblondel WDYT?

@amueller (Member) commented Sep 16, 2016

Ah, for this use case I actually support a dict. It's a bit weird if the output type changes depending on whether you provide a single metric or not, though.
Again, I think some data format that is easily converted to a pandas dataframe is great.

For callables, couldn't we just use __name__ instead of a dict? Or is that not stable enough?

@amueller (Member) commented Sep 16, 2016

So @jnothman suggested introducing a new function, and I think that might be a good idea. Optionally we could deprecate the current behavior of cross_val_score.

I think the new output should be structured like the cv_results_ with metrics and folds and times and summary statistics.

@raghavrv (Member, Author) commented Sep 16, 2016

For callables, couldn't we just use __name__ instead of a dict? Or is that not stable enough?

Two functions can have the same name. We discussed this when we were brewing the cv_results_... This way we don't have to do complex heuristics to figure out the name of the scorer. We simply let the user supply the scorer name...
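The __name__ collision is easy to reproduce: callables produced by the same factory all report the inner function's name. The factory below is made up purely for this illustration, not a scikit-learn API:

```python
from sklearn.metrics import accuracy_score

def make_thresholded_scorer(metric_fn, threshold):
    """Made-up factory wrapping a metric (not a scikit-learn API)."""
    def scorer(estimator, X, y):
        return metric_fn(y, estimator.predict(X) >= threshold)
    return scorer

a = make_thresholded_scorer(accuracy_score, 0.3)
b = make_thresholded_scorer(accuracy_score, 0.7)

# Both callables report the inner function's name, so __name__ cannot
# serve as a unique key in a results dict:
assert a.__name__ == b.__name__ == 'scorer'
```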

@raghavrv (Member, Author) commented Sep 16, 2016

So @jnothman suggested introducing a new function, and I think that might be a good idea.

cross_validate? Which returns something similar to cv_results_? Okay! Any opposition to this from @agramfort or @GaelVaroquaux?

Optionally we could deprecate the current behavior of cross_val_score.

Were you suggesting that cross_val_score also return a dict?

The list of scores as returned by cross_val_score for a single metric will still be the most common use case... When people use multiple metrics they should definitely be expected to check the docstring to learn how the scores for different metrics will be returned... correct?

Can I suggest that we leave cross_val_score as such (without implementing multiple metrics there) and let it remain a quick, easy way to cross-validate for a single metric, and, like Joel suggested, add cross_validate, which will return a dict like cv_results_? There we can easily support multiple metrics...

@amueller (Member) commented Sep 16, 2016

Two functions can have same name. We discussed this when we were brewing the cv_results_... This way we don't have to do complex heuristics to figure out the name of the scorer. We simply let the user supply the scorer name...

Sorry I missed that. But there is no multiple metric in GridSearchCV yet, right? So this PR would introduce the "scoring parameter as dict" as an interface.

@raghavrv (Member, Author) commented Sep 16, 2016

But there is no multiple metric in GridSearchCV yet, right?

Not yet. Implementing it there is very straightforward given our new cv_results_ attr...

But before that we need to settle on _fit_and_score and cross_val_score. They are the time-consuming part involving API discussion...

@amueller (Member) commented Sep 16, 2016

Hm, so do we also want to support f1_score with average=None in this? When doing grid search, what would be used to decide the maximum, then? Hopefully not the first class.

@raghavrv (Member, Author) commented Sep 17, 2016

No, we won't support f1_score without averaging. For all such multiclass scorers you will get an error (as before).

ValueError: multiclass format is not supported

If the user wants it, they can quickly wrap the per-class scores into separate scorers, each with a single-value output... In which case each such scorer will have a ranking associated with it...

And the best_estimator_ / best_index_ / best_score_ all would also have to be a dict with {scorer_name --> val}...

EDIT For single metric, the current format of best_estimator_ / best_index_ / best_score_ is all preserved as such...
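The wrapping idea can be sketched as follows; per_class_f1_scorer is a hypothetical helper written for this sketch, not a scikit-learn API:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier

def per_class_f1_scorer(class_index):
    """Hypothetical helper: expose one class's F1 as a scalar scorer."""
    def scorer(estimator, X, y):
        # f1_score(average=None) returns one value per class; pick one
        # out so that each scorer is scalar-valued
        return f1_score(y, estimator.predict(X), average=None)[class_index]
    return scorer

# One scalar-valued scorer per class -> each gets its own ranking
scoring = {'f1_class_%d' % i: per_class_f1_scorer(i) for i in range(3)}

X, y = make_classification(n_samples=150, n_classes=3, n_informative=4,
                           random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
scores = {name: s(clf, X, y) for name, s in scoring.items()}
```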

@jnothman (Member) commented Sep 17, 2016

In short, we can't design cross_val_score in isolation. Make it work for GridSearchCV and then we'll adapt it to cross_val_score. A dict doesn't naturally specify one metric as the score.

@raghavrv raghavrv changed the title [WIP] ENH Allow `cross_val_score` to evaluate on multiple metrics [WIP] ENH Allow `cross_val_score`, `GridSearchCV` and `RandomizedSearchCV` to evaluate on multiple metrics Sep 17, 2016
@raghavrv (Member, Author) commented Sep 26, 2016

(This is waiting for #7026 and #7325 to be merged)

@raghavrv raghavrv force-pushed the raghavrv:multimetric_cross_val_score branch 2 times, most recently from f6d3fe6 to 8b89687 Sep 26, 2016
@jnothman (Member) commented Sep 29, 2016

Please in 0.19. Please please please. While I have monkey patched this in my own code, I've fixed a colleague's code by simply avoiding cross_val_score altogether...

@raghavrv (Member, Author) commented Sep 29, 2016

Please in 0.19. Please please please

Sure :P I thought the 0.18 milestone was already complete with the timing and training score added? I intended this for ~~0.18~~ 0.19 only...

@raghavrv (Member, Author) commented Sep 29, 2016

*0.19 only

@raghavrv raghavrv force-pushed the raghavrv:multimetric_cross_val_score branch 2 times, most recently from 1149527 to f5a917d Sep 29, 2016
@raghavrv (Member, Author) commented Jul 8, 2017

OMG OMG OMG. I can't believe this is finally merged. Thanks everyone for the reviews!!! @vene @amueller - My dear mentors, hereby I successfully finish my GSoC 2015 :') :p

@raghavrv (Member, Author) commented Jul 8, 2017

@raghavrv can you open a new PR for that?

Sure :) Sorry I was flying to Austin!

@amueller (Member) commented Jul 10, 2017

@raghavrv see you tomorrow :)

@raghavrv (Member, Author) commented Jul 10, 2017

I think @amueller is suggesting you use this kind of instructive wording in the narrative docs. Perhaps just adopt his wording?

Where is it? I can't find it.

@raghavrv raghavrv deleted the raghavrv:multimetric_cross_val_score branch Jul 11, 2017
massich added a commit to massich/scikit-learn that referenced this pull request Jul 13, 2017
[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (scikit-learn#7388)

* ENH cross_val_score now supports multiple metrics

* DOCFIX permutation_test_score

* ENH validate multiple metric scorers

* ENH Move validation of multimetric scoring param out

* ENH GridSearchCV and RandomizedSearchCV now support multiple metrics

* EXA Add an example demonstrating the multiple metric in GridSearchCV

* ENH Let check_multimetric_scoring tell if its multimetric or not

* FIX For single metric name of scorer should remain 'score'

* ENH validation_curve and learning_curve now support multiple metrics

* MNT move _aggregate_score_dicts helper into _validation.py

* TST More testing/ Fixing scores to the correct values

* EXA Add cross_val_score to multimetric example

* Rename to multiple_metric_evaluation.py

* MNT Remove scaffolding

* FIX doctest imports

* FIX wrap the scorer and unwrap the score when using _score() in rfe

* TST Cleanup the tests. Test for is_multimetric too

* TST Make sure it registers as single metric when scoring is of that type

* PEP8

* Don't use dict comprehension to make it work in python2.6

* ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation

* FIX+TST delegated methods NA when multimetric is enabled...

TST Add general tests to GridSearchCV and RandomizedSearchCV

* ENH add option to disable delegation on multimetric scoring

* Remove old function from __all__

* flake8

* FIX revert disable_on_multimetric

* stash

* Fix incorrect rebase

* [ci skip]

* Make sure refit works as expected and remove irrelevant tests

* Allow passing standard scorers by name in multimetric scorers

* Fix example

* flake8

* Address reviews

* Fix indentation

* Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs

* Test that for single metric, 'score' is a key

* Typos

* Fix incorrect rebase

* Compare multimetric grid search with multiple single metric searches

* Test X, y list and pandas input; Test multimetric for unsupervised grid search

* Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged

* Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators

* Add example to grid_search.rst

* Use the classic tuning of C param in SVM instead of estimators in RF

* FIX Remove scoring arg in deafult scorer test

* flake8

* Search for min_samples_split in DTC; Also show f-score

* REVIEW Make check_multimetric_scoring private

* FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed

* REVIEW Plot best score; Shorten legends

* REVIEW/COSMIT multimetric --> multi-metric

* REVIEW Mark the best scores of P/R scores too

* Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed"

This reverts commit ba766d9.

* ENH Use looping for iid testing

* FIX use param grid as scipy's stats dist in 0.12 do not accept seed

* ENH more looping less code; Use small non-noisy dataset

* FIX Use named arg after expanded args

* TST More testing of the refit parameter

* Test that in multimetric search refit to single metric, the delegated methods
  work as expected.
* Test that setting probability=False works with multimetric too
* Test refit=False gives sensible error

* COSMIT multimetric --> multi-metric

* REV Correct example doc

* COSMIT

* REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer

* REVIEW refit param: Raise for empty strings

* TST Invalid refit params

* REVIEW Use <scorer_name> alone; recall --> Recall

* REV specify when we expect scorers to not be None

* FLAKE8

* REVERT multimetrics in learning_curve and validation_curve

* REVIEW Simpler coding style

* COSMIT

* COSMIT

* REV Compress example a bit. Move comment to top

* FIX fit_grid_point's previous API must be preserved

* Flake8

* TST Use loop; Compare with single-metric

* REVIEW Use dict-comprehension instead of helper

* REVIEW Remove redundant test

* Fix tests incorrect braces

* COSMIT

* REVIEW Use regexp

* REV Simplify aggregation of score dicts

* FIX precision and accuracy test

* FIX doctest and flake8

* TST the best_* attributes multimetric with single metric

* Address @jnothman's review

* Address more comments \o/

* DOCFIXES

* Fix use the validated fit_param from fit's arguments

* Revert alpha to a lower value as before

* Using def instead of lambda

* Address @jnothman's review batch 1: Fix tests / Doc fixes

* Remove superfluous tests

* Remove more superfluous testing

* TST/FIX loop over refit and check found n_clusters

* Cosmetic touches

* Use zip instead of manually listing the keys

* Fix inverse_transform

* FIX bug in fit_grid_point; Allow only single score

TST if fit_grid_point works as intended

* ENH Use only ROC-AUC and F1-score

* Fix typos and flake8; Address Andy's reviews

MNT Add a comment on why we do such a transpose + some fixes

* ENH Better error messages for incorrect multimetric scoring values +...

ENH Avoid exception traceback while using incorrect scoring string

* Dict keys must be of string type only

* 1. Better error message for invalid scoring 2...
Internal functions return single score for single metric scoring

* Fix test failures and shuffle tests

* Avoid wrapping scorer as dict in learning_curve

* Remove doc example as asked for

* Some leftover ones

* Don't wrap scorer in validation_curve either

* Add a doc example and skip it as dict order fails doctest

* Import zip from six for python2.7 compat

* Make cross_val_score return a cv_results-like dict

* Add relevant sections to userguide

* Flake8 fixes

* Add whatsnew and fix broken links

* Use AUC and accuracy instead of f1

* Fix failing doctests cross_validation.rst

* DOC add the wrapper example for metrics that return multiple return values

* Address andy's comments

* Be less weird

* Address more of andy's comments

* Make a separate cross_validate function to return dict and a cross_val_score

* Update the docs to reflect the new cross_validate function

* Add cross_validate to toc-tree

* Add more tests on type of cross_validate return and time limits

* FIX failing doctests

* FIX ensure keys are not plural

* DOC fix

* Address some pending comments

* Remove the comment as it is irrelevant now

* Remove excess blank line

* Fix flake8 inconsistencies

* Allow fit_times to be 0 to conform with windows precision

* DOC specify how refit param is to be set in multiple metric case

* TST ensure cross_validate works for string single metrics + address @jnothman's reviews

* Doc fixes

* Remove the shape and transform parameter of _aggregate_score_dicts

* Address Joel's doc comments

* Fix broken doctest

* Fix the spurious file

* Address Andy's comments

* MNT Remove erroneous entry

* Address Andy's comments

* FIX broken links

* Update whats_new.rst

missing newline
bmanohar16 added a commit to bmanohar16/scikit-learn that referenced this pull request Jul 20, 2017
Old refers to new tag added with PR scikit-learn#7388
@amueller amueller removed this from PR phase in Andy's pets Jul 21, 2017
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Jul 27, 2017
Release 0.19b2

* tag '0.19b2': (808 commits)
  Preparing 0.19b2
  [MRG+1] FIX out of bounds array access in SAGA (scikit-learn#9376)
  FIX make test_importances pass on 32 bit linux
  Release 0.19b1
  DOC remove 'in dev' header in whats_new.rst
  DOC typos in whats_news.rst [ci skip]
  [MRG] DOC cleaning up what's new for 0.19 (scikit-learn#9252)
  FIX t-SNE memory usage and many other optimizer issues (scikit-learn#9032)
  FIX broken link in gallery and bad title rendering
  [MRG] DOC Replace \acute by prime (scikit-learn#9332)
  Fix typos (scikit-learn#9320)
  [MRG + 1 (rv) + 1 (alex) + 1] Add a check to test the docstring params and their order (scikit-learn#9206)
  DOC Residual sum vs. regression sum (scikit-learn#9314)
  [MRG] [HOTFIX] Fix capitalization in test and hence fix failing travis at master (scikit-learn#9317)
  More informative error message for classification metrics given regression output (scikit-learn#9275)
  [MRG] COSMIT Remove unused parameters in private functions (scikit-learn#9310)
  [MRG+1] Ridgecv normalize (scikit-learn#9302)
  [MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (scikit-learn#7388)
  Add data_home parameter to fetch_kddcup99 (scikit-learn#9289)
  FIX makedirs(..., exists_ok) not available in Python 2 (scikit-learn#9284)
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Jul 27, 2017
* releases: (808 commits)
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Jul 27, 2017
* dfsg: (808 commits)
jnothman added a commit that referenced this pull request Jul 30, 2017
* Fix Rouseeuw1984 broken link

* Change label vbgmm to bgmm
Previously modified with PR #6651

* Change tag name
Old refers to new tag added with PR #7388

* Remove prefix underscore to match tag

* Realign to fit 80 chars

* Link to metrics.rst.
pairwise metrics yet to be documented

* Remove tag as LSHForest is deprecated

* Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py.
It is deprecated.

* Fix few Sphinx warnings

* Realign to 80 chars

* Changes based on PR review

* Remove unused ref in calibration

* Fix link ref in covariance.rst

* Fix linking issues

* Differentiate Rouseeuw1999 tag within file.

* Change all duplicate Rouseeuw1999 tags

* Remove numbers from tag Rousseeuw
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Aug 6, 2017
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
[MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics (scikit-learn#7388)

* ENH cross_val_score now supports multiple metrics

* DOCFIX permutation_test_score

* ENH validate multiple metric scorers

* ENH Move validation of multimetric scoring param out

* ENH GridSearchCV and RandomizedSearchCV now support multiple metrics

* EXA Add an example demonstrating the multiple metric in GridSearchCV

* ENH Let check_multimetric_scoring tell if its multimetric or not

* FIX For single metric name of scorer should remain 'score'

* ENH validation_curve and learning_curve now support multiple metrics

* MNT move _aggregate_score_dicts helper into _validation.py

* TST More testing/ Fixing scores to the correct values

* EXA Add cross_val_score to multimetric example

* Rename to multiple_metric_evaluation.py

* MNT Remove scaffolding

* FIX doctest imports

* FIX wrap the scorer and unwrap the score when using _score() in rfe

* TST Cleanup the tests. Test for is_multimetric too

* TST Make sure it registers as single metric when scoring is of that type

* PEP8

* Don't use dict comprehension to make it work in python2.6

* ENH/FIX/TST grid_scores_ should not be available for multimetric evaluation

* FIX+TST delegated methods NA when multimetric is enabled...

TST Add general tests to GridSearchCV and RandomizedSearchCV

* ENH add option to disable delegation on multimetric scoring

* Remove old function from __all__

* flake8

* FIX revert disable_on_multimetric

* stash

* Fix incorrect rebase

* [ci skip]

* Make sure refit works as expected and remove irrelevant tests

* Allow passing standard scorers by name in multimetric scorers

* Fix example

* flake8

* Address reviews

* Fix indentation

* Ensure {'acc': 'accuracy'} and ['precision'] are valid inputs

* Test that for single metric, 'score' is a key

* Typos

* Fix incorrect rebase

* Compare multimetric grid search with multiple single metric searches

* Test X, y list and pandas input; Test multimetric for unsupervised grid search

* Fix tests; Unsupervised multimetric gs will not pass until scikit-learn#8117 is merged

* Make a plot of Precision vs ROC AUC for RandomForest varying the n_estimators

* Add example to grid_search.rst

* Use the classic tuning of C param in SVM instead of estimators in RF

* FIX Remove scoring arg in deafult scorer test

* flake8

* Search for min_samples_split in DTC; Also show f-score

* REVIEW Make check_multimetric_scoring private

* FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed

* REVIEW Plot best score; Shorten legends

* REVIEW/COSMIT multimetric --> multi-metric

* REVIEW Mark the best scores of P/R scores too

* Revert "FIX Add more samples to see if 3% mismatch on 32 bit systems gets fixed"

This reverts commit ba766d9.

* ENH Use looping for iid testing

* FIX use param grid as scipy's stats dist in 0.12 do not accept seed

* ENH more looping less code; Use small non-noisy dataset

* FIX Use named arg after expanded args

* TST More testing of the refit parameter

* Test that in multimetric search refit to single metric, the delegated methods
  work as expected.
* Test that setting probability=False works with multimetric too
* Test refit=False gives sensible error

* COSMIT multimetric --> multi-metric

* REV Correct example doc

* COSMIT

* REVIEW Make tests stronger; Fix bugs in _check_multimetric_scorer

* REVIEW refit param: Raise for empty strings

* TST Invalid refit params

* REVIEW Use <scorer_name> alone; recall --> Recall

* REV specify when we expect scorers to not be None

* FLAKE8

* REVERT multimetrics in learning_curve and validation_curve

* REVIEW Simpler coding style

* COSMIT

* COSMIT

* REV Compress example a bit. Move comment to top

* FIX fit_grid_point's previous API must be preserved

* Flake8

* TST Use loop; Compare with single-metric

* REVIEW Use dict-comprehension instead of helper

* REVIEW Remove redundant test

* Fix tests incorrect braces

* COSMIT

* REVIEW Use regexp

* REV Simplify aggregation of score dicts

* FIX precision and accuracy test

* FIX doctest and flake8

* TST the best_* attributes multimetric with single metric

* Address @jnothman's review

* Address more comments \o/

* DOCFIXES

* Fix use the validated fit_param from fit's arguments

* Revert alpha to a lower value as before

* Using def instead of lambda

* Address @jnothman's review batch 1: Fix tests / Doc fixes

* Remove superfluous tests

* Remove more superfluous testing

* TST/FIX loop over refit and check found n_clusters

* Cosmetic touches

* Use zip instead of manually listing the keys

* Fix inverse_transform

* FIX bug in fit_grid_point; Allow only single score

TST if fit_grid_point works as intended

* ENH Use only ROC-AUC and F1-score

* Fix typos and flake8; Address Andy's reviews

MNT Add a comment on why we do such a transpose + some fixes

* ENH Better error messages for incorrect multimetric scoring values +...

ENH Avoid exception traceback while using incorrect scoring string

* Dict keys must be of string type only

* 1. Better error message for invalid scoring 2...
Internal functions return single score for single metric scoring

* Fix test failures and shuffle tests

* Avoid wrapping scorer as dict in learning_curve

* Remove doc example as asked for

* Some leftover ones

* Don't wrap scorer in validation_curve either

* Add a doc example and skip it as dict order fails doctest

* Import zip from six for python2.7 compat

* Make cross_val_score return a cv_results-like dict

* Add relevant sections to userguide

* Flake8 fixes

* Add whatsnew and fix broken links

* Use AUC and accuracy instead of f1

* Fix failing doctests cross_validation.rst

* DOC add the wrapper example for metrics that return multiple return values

* Address andy's comments

* Be less weird

* Address more of andy's comments

* Make a separate cross_validate function to return dict and a cross_val_score

* Update the docs to reflect the new cross_validate function

* Add cross_validate to toc-tree

* Add more tests on type of cross_validate return and time limits

* FIX failing doctests

* FIX ensure keys are not plural

* DOC fix

* Address some pending comments

* Remove the comment as it is irrelevant now

* Remove excess blank line

* Fix flake8 inconsistencies

* Allow fit_times to be 0 to conform with windows precision

* DOC specify how refit param is to be set in multiple metric case

* TST ensure cross_validate works for string single metrics + address @jnothman's reviews

* Doc fixes

* Remove the shape and transform parameter of _aggregate_score_dicts

* Address Joel's doc comments

* Fix broken doctest

* Fix the spurious file

* Address Andy's comments

* MNT Remove erroneous entry

* Address Andy's comments

* FIX broken links

* Update whats_new.rst

missing newline
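The commits above culminate in the multi-metric API described at the top of this PR: `scoring` may be a list of scorer names or a dict mapping names to scorers, `cross_validate` returns a `cv_results_`-like dict, and `refit` must name a metric in the multi-metric case. A minimal sketch of that usage (assuming scikit-learn >= 0.19, where this PR landed; the dataset and parameter grid are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# scoring as a list of names: cross_validate returns a dict with one
# 'test_<scorer_name>' entry per metric, plus 'fit_time'/'score_time'.
scores = cross_validate(DecisionTreeClassifier(random_state=0), X, y,
                        scoring=['accuracy', 'roc_auc'], cv=5)
print(sorted(scores))

# scoring as a dict of {name: scorer}: with multiple metrics, refit must
# be set to the name of the metric used to select best_params_.
gs = GridSearchCV(DecisionTreeClassifier(random_state=0),
                  param_grid={'min_samples_split': [2, 10, 50]},
                  scoring={'acc': 'accuracy', 'auc': 'roc_auc'},
                  refit='acc', cv=5)
gs.fit(X, y)
print(gs.best_params_)
```

With `refit='acc'`, `best_index_`, `best_score_`, and `best_params_` all refer to the `'acc'` metric; setting `refit=False` in the multi-metric case leaves those attributes unavailable, as tested in the commits above.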
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* Fix Rouseeuw1984 broken link

* Change label vbgmm to bgmm
Previously modified with PR scikit-learn#6651

* Change tag name
Old refers to new tag added with PR scikit-learn#7388

* Remove prefix underscore to match tag

* Realign to fit 80 chars

* Link to metrics.rst.
pairwise metrics yet to be documented

* Remove tag as LSHForest is deprecated

* Remove all references to randomized_l1 and sphx_glr_auto_examples_linear_model_plot_sparse_recovery.py.
It is deprecated.

* Fix few Sphinx warnings

* Realign to 80 chars

* Changes based on PR review

* Remove unused ref in calibration

* Fix link ref in covariance.rst

* Fix linking issues

* Differentiate Rouseeuw1999 tag within file.

* Change all duplicate Rouseeuw1999 tags

* Remove numbers from tag Rousseeuw
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017