ENH extensible parameter search results #1787

Closed
wants to merge 9 commits

3 participants

@jnothman
Owner

GridSearch and friends need to be able to return more fields in their results (e.g. #1742, composite score).

More generally, the conceivable results from a parameter search can be classified into:
1. per-parameter setting, per-fold (currently only the test score for each fold)
2. per-parameter setting (currently the parameters and mean test score across folds)
3. per-search (best_params_, best_score_, best_estimator_; however best_params_ and best_score_ are redundantly available in grid_scores_ as long as the index of the best parameters is known.)

Hence this patch changes the output of a parameter search to be (attribute names are open for debate!):

  • grid_results_ (2.) a structured array (a numpy array with named fields) with one record per set of parameters
  • fold_results_ (1.) a structured array with one record per fold per set of parameters
  • best_index_ (3.)
  • best_estimator_ if refit==True (3.)

The structured arrays can be indexed by field name to produce an array of values; alternatively, they can be indexed as an array to produce a single record, akin to the namedtuples introduced in 0c94b55 (not in 0.13.1). In either case, this allows numpy vectorised operations, as used here when calculating the mean score for each parameter setting (in _aggregate_scores).
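
For illustration only (field names follow the proposal above; the numbers are made up), a fold_results_-style structured array supports both kinds of indexing, and the per-candidate means come from a single vectorised call:

import numpy as np

fold_results = np.array(
    [[(0.8, 30), (0.9, 20)],    # candidate 0, folds 0 and 1
     [(0.6, 30), (0.7, 20)]],   # candidate 1, folds 0 and 1
    dtype=[('test_score', '<f8'), ('test_n_samples', '<i4')])

fold_results['test_score']               # 2x2 array of per-fold scores
fold_results[0, 1]                       # a single record: (0.9, 20)
fold_results['test_score'].mean(axis=1)  # mean test score per candidate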

Given this data, the legacy grid_scores_ (already deprecated), best_params_ and best_score_ are calculated as properties.

This approach is extensible to new fields, in particular new fields within fold_results_ records, which are compiled from dicts returned from fit_fold (formerly fit_grid_point).

This PR is cut back from #1768; there you can see this extensibility exemplified to store training scores, training and test times, and precision and recall together with F-score.

@amueller
Owner

Could you rebase please?
Maybe parameter_results_ would be better than grid_results_?
I think I am with you on this one now. Not sure if it is easier to merge this one first or #1742.

It would be great if @GaelVaroquaux, @ogrisel and @larsmans could comment, as this is core api stuff :)

@amueller
Owner

This looks great! If the tests pass, I think this is good to go (plus the rebase obv).

@amueller
Owner

There are probably some examples that need updating and possibly also the docs.

@jnothman
Owner

Rebased. Any pointers to examples and docs needing updates?

@amueller
Owner

Well, the grid-search narrative documentation and the examples using grid-search, probably.
Also, you can run the test suite and see if there are any deprecation warnings.
Do you know where to look for the docs and examples?

@jnothman
Owner

As an aside (perhaps subject to a separate PR), I wonder whether we should return the parameters as a structured array (rather than dicts). So, rather than grid_results_ being something like:

array([({'foo': 5, 'bar': 'a'}, 1.0), ({'foo': 3, 'bar': 'a'}, 0.5)], 
      dtype=[('parameters', '|O4'), ('test_score', '<f4')])

it would be:

array([((5, 'a'), 1.0), ((3, 'a'), 0.5)], 
      dtype=[('parameters', [('foo', '<i4'), ('bar', '|S1')]), ('test_score', '<f4')])

This allows us to easily query the data by parameter value:

>>> grid_results_['parameters']['foo'] > 4
array([ True, False], dtype=bool)

Note this would also apply to randomised searches, helping towards #1020 in a way that a solution like #1034 could not.

This approach, however, doesn't handle grid searches with multiple grids (i.e. passing a list of dicts to ParameterGrid), because there's no assurance that the same fields will be set in each grid (and the opposite is likely with #1769). This could be solved by using a masked record array, in which case it would be sensible to make parameters_ separate from grid_results_.
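
A rough sketch of the masked-record-array option (dtype and values purely illustrative): fields that a given grid does not set are simply masked.

import numpy as np

data = np.array([(1.0, 0), (10.0, 3)],
                dtype=[('C', '<f8'), ('degree', '<i4')])
mask = np.array([(False, True), (False, False)],
                dtype=[('C', bool), ('degree', bool)])
parameters = np.ma.array(data, mask=mask)

parameters['C'] > 5     # masked_array([False,  True])
parameters['degree']    # masked_array([--, 3]); unset in the first grid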

WDYT?

@jnothman
Owner

This last commit (b18a278) moves cross-validation evaluation code into a single place, the cross_validation module. This means *SearchCV can focus on the parameter search, and that users not interested in a parameter search (or performing one by hand) can take advantage of these extended results.

@jnothman jnothman REFACTOR/ENH refactor CV scoring from grid_search and cross_validation
Provides consistent and enhanced structured-array result style for non-search CV evaluation.

So far only regression-tested.
b18a278
@jnothman

I'm not satisfied with these method names (score_folds, score_means; and in truth, these methods could be removed). I chose something verb-phrase-like. Suggestions are welcome.

@jnothman
Owner

[The exact form of this output structure depends on other issues like #1850; to handle sample_weights (#1574), test_n_samples should instead be stored in a separate searcher attribute, fold_weights_. And my current naming preference for the result attributes is search_results_ and fold_results_.]

@amueller
Owner

From your many pull requests, I think this is the one that I really want to merge first. Did you think about my proposal to rename grid_results_ to parameter_results_? I'll try to do a more serious review soon.

sklearn/grid_search.py
@@ -760,20 +702,27 @@ class RandomizedSearchCV(BaseSearchCV):
Attributes
----------
- `cv_scores_` : list of named tuples
- Contains scores for all parameter combinations in param_grid.
- Each entry corresponds to one parameter setting.
- Each named tuple has the attributes:
+ `grid_results_` : structured array of shape [# param combinations]
@amueller Owner
amueller added a note

This shows that grid_results_ is not a good name, as it is not a grid here.

sklearn/cross_validation.py
((45 lines not shown))
+ results : dict of string to any
+ An extensible storage of fold results including the following keys:
+
+ ``'test_score'`` : float
+ The estimator's score on the test set.
+ ``'test_n_samples'`` : int
+ The number of samples in the test set.
+ """
+ if verbose > 1:
+ start_time = time.time()
+ if est_params is None:
+ msg = ''
+ else:
+ msg = '%s' % (', '.join('%s=%s' % (k, v)
+ for k, v in est_params.items()))
+ print("[CVEvaluator]%s %s" % (msg, (64 - len(msg)) * '.'))
@amueller Owner
amueller added a note

I don't think this should print CVEvaluator as this is a public method, isn't it?

@jnothman Owner
jnothman added a note

Well, the previous code printed GridSearchCV when it was a public method, and it was called from RandomizedSearchCV as well.

@amueller Owner

Hum... ok... doesn't make it much better, but fair enough ;)

@amueller
Owner

Refactoring fit_fold out of fit_grid_point and cross_val_score seems to be the right thing to me.

What made you add the CVEvaluator class?
Basically this moves some functionality from grid_search to cross_validation, right?
The two things I notice are the iid mean scoring and the iteration over parameter settings.
You didn't add the iid to the cross_val_score function, though.

What I don't like about this is that suddenly the concept of parameters appears in cross_validation.
Also, looking a bit into the future, maybe we want to keep the for-loop in grid_search. If we start doing any smart searches, I think CVEvaluator will not be powerful enough any more - but maybe that is getting ahead of ourselves.

@jnothman
Owner
  • On this being the one to merge first: I agree (in particular because then we can merge in training scores and times), but there are some questions on the record structure and naming conventions to support multiple scores, so we need to think about that at the same time.
  • On a better name for grid_results_: I am currently swayed towards search_results_. But I think that, to create consistency, the documentation needs to define its nomenclature, i.e. define each "point" or "candidate" of the search as representing a set of parameters that is evaluated, and define "fold" as well.
  • On the refactoring: what this does is move the parameter evaluation to cross_validation; the search space/algorithm, the selection of a best candidate (probably), and any analysis of search results are still to be defined in model_selection. Perhaps cross_validation belongs in the new model_selection package anyway?
  • Yes, iid should be copied to cross_val_score.
  • If we want fit_fold to be shared, we need some concept of setting parameters within it anyway. The main reason for allowing a sequence of parameter settings in CVEvaluator is parallelism; also, if we work out ways to handle parameters that can be changed without refitting, that will need to be done on a per-fold basis, i.e. this needs to be optimised with respect to some parameter sequence.
  • Re smart searches: I actually made this refactoring while considering smarter searches, which come down to a series of searches over known candidates: evaluate some candidates, determine a trajectory, consider another set of candidates, and so on. So this fits that purpose well (though perhaps not with the ideal API, but who's to know?).
@jnothman
Owner

I just realised I didn't answer the question "What made you add the CVEvaluator class?"

Clearly it encapsulates the parallel fitting and the formatting of results. I also thought users of cross_val_score in an interactive context might appreciate something a bit more powerful, such that you can manually run an evaluation with one set of parameter settings ("candidate") then try another, etc. So something re-entrant was useful.

But a re-entrant setup was most important in the context of custom search algorithms (not in this PR; see https://github.com/jnothman/scikit-learn/tree/param_search_callback), where the CVEvaluator acts as a closure over its validated arguments and can be repeatedly called to evaluate different candidates. In particular, a more complicated search would inherit from CVEvaluator so that with every evaluation of candidates the results could be stored along the way (or not, depending on a setting).
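
As a concrete sketch of that re-entrant usage, written against the (unmerged) CVScorer added in this PR's branch (names and signatures may change with the renaming discussed in this thread):

from sklearn.cross_validation import CVScorer   # as added in this PR's branch
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
evaluator = CVScorer(cv=5)    # default scoring: the estimator's own score method

# evaluate one batch of candidates...
means, folds = evaluator.search([{'C': 1.0}, {'C': 10.0}], SVC(), iris.data, iris.target)
# ...inspect means['test_score'], then come back later with another batch
means, folds = evaluator.search([{'C': 100.0}], SVC(), iris.data, iris.target)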

@jnothman
Owner

I also considered making CVEvaluator more general so it would also handle the permutation-significance-testing case (i.e. parallelising over reorderings of y, not parameters), but I didn't like the result.

@amueller
Owner

Thanks a lot for the feedback :) sounds sensible to me.
search_results_ would be fine with me, too.
I guess when we write the docs we will see what sounds most natural.

I'll do a fine-grained review asap ;)

sklearn/cross_validation.py
@@ -1038,13 +1041,74 @@ def __len__(self):
##############################################################################
-def _cross_val_score(estimator, X, y, scorer, train, test, verbose,
- fit_params):
- """Inner loop for cross validation"""
- n_samples = X.shape[0] if sp.issparse(X) else len(X)
+def fit_fold(estimator, X, y, train, test, scorer,
@amueller Owner

If this function is public, it should be in the references (doc/modules/classes.rst)

@amueller amueller commented on the diff
sklearn/cross_validation.py
@@ -1103,50 +1185,221 @@ def cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1,
See 'Scoring objects' in the model evaluation section of the user guide
@amueller Owner

Is this comment appropriate? I don't think this section says anything about CVEvaluator.
Also, CVEvaluator should be added to the references.

@jnothman Owner

Yes, looks like a cut-and-paste fail.

sklearn/cross_validation.py
@@ -1103,50 +1185,221 @@ def cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1,
See 'Scoring objects' in the model evaluation section of the user guide
for details.
- cv : cross-validation generator, optional
- A cross-validation generator. If None, a 3-fold cross
- validation is used or 3-fold stratified cross-validation
- when y is supplied and estimator is a classifier.
+ cv : integer or cross-validation generator, optional
+ A cross-validation generator or number of stratified folds (default 3).
@amueller Owner

I think we can't leave out the detail that the folds are stratified only for classification, as it doesn't make sense otherwise.

@jnothman
Owner

Sure, I can add things to classes.rst... Alternatively, we could keep CVEvaluator and fit_fold private until we're happier with them and their APIs?

@jnothman
Owner

I considered changing grid_results_ to search_results_, but if it's found in a search object that seems redundant. Maybe we just want results_ or means_ or mean_results_.

But then I'm not sure about parameters in there. I'd like to see the parameters as a separate attribute, probably as a structured array of their own, but it makes the results somewhat harder to visually inspect without a zip (which is of course what the previous results formats have done).

While we're talking terminology, I think we want to change parameter_iter to candidates and point or similar to candidate. I think this only affects local variables and similar, but the nomenclature should be cleaned up, in the code and in the tutorial.

@amueller
Owner

I think we should always add stuff to classes.rst if it is public. Otherwise people have to look into the code to get help. It might help us get earlier feedback.

Just results_ seems a bit generic. mean_results_ would be OK with me. Why don't you like parameter_results_? The fold_results_ are indexed by the folds; the parameter_results_ are indexed by the parameters...

@amueller
Owner

I think the structure of results_ and fold_results_ should be explained in the narrative in 5.2.1. It is pretty bad that the current dict is not explained there.
Feel free to rename the variables. There are some left-over grids that probably should go.

@larsmans @GaelVaroquaux I'd really like you to have a look if you find the time.

@jnothman
Owner

I wrote about this to the ML in order to weigh up alternatives and potentially get wider consensus. I don't think structured arrays are used elsewhere in scikit-learn, and I worry that while appropriate, they are a little too constraining and unfamiliar.

@jnothman
Owner

It's not relevant to the rest of the proposal, but I've decided CVEvaluator should be CVScorer, adopting the Scorer interface (__call__(X, y=None, sample_weight=None, ...)), which is important for forward compatibility and API similarity. __call__ will delegate to a search method whose arguments are the same except for the first, which is an iterable of candidates. Both __call__ and search will return either a dict or a structured array; either way, a mapping of names to arrays/values. All other public methods from CVEvaluator will disappear. Finally, I intend to propose this refactoring as a separate PR.

@ogrisel
Owner

Sorry for the late feedback. I will try to have a look at this PR soon, as I am currently working with RandomizedSearchCV.

@ogrisel
Owner

I assigned this PR to the 0.14 milestone, as the new RandomizedSearchCV will be introduced in that release and we don't want to break the API too much once it's released.

@jnothman
Owner

@ogrisel: With regards to your comments on the ML, would we like to see the default storage / presentation of results as:

  • a list of dicts
  • a dict of arrays
  • structure-ambivalent because they will be hidden behind something like my search result interface

?

@ogrisel
Owner

I would prefer a list of dicts with:

  • parameters_id (an integer unique to each parameter combination, used for grouping)
  • fold_id (an integer unique to each CV fold, used for grouping when computing mean and std scores)
  • parameters (the dict of the actual parameter values; it cannot be hashed in general, so it cannot be used for grouping directly, hence the parameters_id field)
  • train_fold_size (integer; might be useful later if we use the same interface to compute learning curves simultaneously)
  • test_fold_size (useful for computing the iid mean score)
  • validation_error (for the provided scoring, used for model ranking once averaged across the collected folds)
  • training_error (to be able to evaluate the impact of the parameters on the underfitting and overfitting behavior of the model)
  • training_time (float, in seconds)

And later we will let the user compute additional attributes using a callback API, for instance to collect complementary scores such as per-class precision, recall and F1 score, or full confusion matrices.

Then make the search result interface compute the grouped statistics and rank models by mean validation error, grouping on the parameters_id field.
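
A minimal sketch of that grouping step (using the field names proposed above; the records are made up):

from collections import defaultdict

results_log = [
    {'parameters_id': 0, 'fold_id': 0, 'validation_error': 0.20, 'test_fold_size': 30},
    {'parameters_id': 0, 'fold_id': 1, 'validation_error': 0.30, 'test_fold_size': 20},
    {'parameters_id': 1, 'fold_id': 0, 'validation_error': 0.10, 'test_fold_size': 30},
    {'parameters_id': 1, 'fold_id': 1, 'validation_error': 0.15, 'test_fold_size': 20},
]

by_candidate = defaultdict(list)
for record in results_log:
    by_candidate[record['parameters_id']].append(record)

def iid_mean_error(records):
    # weight each fold's error by its test fold size (the iid mean)
    total = sum(r['validation_error'] * r['test_fold_size'] for r in records)
    return total / sum(r['test_fold_size'] for r in records)

ranking = sorted(by_candidate, key=lambda pid: iid_mean_error(by_candidate[pid]))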

@jnothman
Owner

That structure makes a lot of sense in terms of asynchronous parallelisation... I'm still not entirely convinced it's worthwhile having each fold available to the user as a separate record (which is providing the output of map, before reduce). I also don't think train and test fold size necessarily need to be in there if we are using the same folds for every candidate.

I guess what you're trying to say is that this is the nature of our raw data: a series of fold records. And maybe we need to make a distinction between:

  • the fold records produced by the search
  • the default in-memory storage
  • the default API

My suggestion of structured arrays was intended to provide compact in-memory storage with easy, flexible and efficient access, but still required per-fold intermediate records.

Let's say that we could pass some kind of results_manager to the CV-search constructor. Let's say it's a class that accepts a cv generator (or its listified form) so that it knows the number and sizes of folds, and that the constructed object is stored on the CV-search estimator as results_. Let's say it has to have an add method and a get_best method. I can think of three primary implementations:

  • no storage: get_best performs a find-max over average scores (and results_ provides no data).
  • in-memory storage: don't care what the underlying storage is as long as it can be pickled and produces an interface like my SearchResult object.
  • off-site storage: dump data to file / kv-store / RDBMS and perform find-max at the same time and/or provide a complete API.

Each of these needs to:

  • group data from the same candidate for multiple folds, if add is called per-fold.
  • know how to calculate the best score, including (iid -> fold-weighted) average and greater_is_better logic.

I don't really think that first point should be necessary. If we have an asynchronous processing queue, we will still expect folds for each candidate to be evaluated roughly at the same time, so grouping can happen more efficiently by handling it straight off the queue (storing all the fold results temporarily in memory) rather than in each results_manager implementation. (Perhaps you wouldn't want to store all the folds in memory for LeavePOut, but I don't think that's going to be used for a large dataset / candidate space.)
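
For concreteness, a hypothetical results_manager implementing the add/get_best contract sketched above; everything here (class name, argument names) is illustrative only:

class NoStorageResults(object):
    """Keep only running per-candidate weighted means; expose no per-fold data."""

    def __init__(self, fold_weights, greater_is_better=True):
        self.fold_weights = fold_weights        # e.g. test fold sizes, from the cv
        self.greater_is_better = greater_is_better
        self._sums = {}                         # candidate_id -> (weighted sum, total weight)

    def add(self, candidate_id, fold_id, score):
        total, weight = self._sums.get(candidate_id, (0.0, 0.0))
        w = self.fold_weights[fold_id]
        self._sums[candidate_id] = (total + w * score, weight + w)

    def get_best(self):
        def mean(candidate_id):
            total, weight = self._sums[candidate_id]
            return total / weight
        pick = max if self.greater_is_better else min
        return pick(self._sums, key=mean)

results = NoStorageResults(fold_weights=[30, 20])
results.add(0, 0, 0.8); results.add(0, 1, 0.9)
results.add(1, 0, 0.6); results.add(1, 1, 0.7)
results.get_best()    # -> 0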

@jnothman
Owner

In short: I can't think of a use-case where a user wants per-fold data to be in a list. In an iterable coming off a queue, yes. In a relational DB, perhaps. (Grouped by candidate, certainly.)

@ogrisel
Owner

That structure makes a lot of sense in terms of asynchronous parallelisation... I'm still not entirely convinced it's worthwhile having each fold available to the user as a separate record (which is providing the output of map, before reduce). I also don't think train and test fold size necessarily need to be in there if we are using the same folds for every candidate.

It is for failover, in case some parameter sets generate ill-conditioned optimization problems that are not numerically stable across all CV folds. That can happen with SGDClassifier and GBRT models, apparently.

Dealing with missing evaluations is very useful, even without async parallelization.

If we have an asynchronous processing queue, we will still expect folds for each candidate to be evaluated roughly at the same time

This statement does not hold if we would like to implement the "warm start with a growing number of CV folds" use case.

In short: I can't think of a use-case where a user wants per-fold data to be in a list. In an iterable coming off a queue, yes. In a relational DB, perhaps. (Grouped by candidate, certainly.)

Implementing fault-tolerant grid search is one; iteratively growable CV folds is another (warm restarts with a higher number of CV iterations).

I wasted a couple of grid search runs (lasting 10 minutes each) yesterday, precisely because of those two missing use cases. So they are not made-up use cases: as a regular user of the library I really feel the need for them.

Implementing learning curves with a variable train_fold_size will also be a use case where the append-only list of atomic evaluation dicts is easier.

In short: the dumb fold-log record data structure is so much simpler and more flexible, allowing the implementation of additional use cases in the future (e.g. learning curves and warm restarts in any dimension), that I think it should be the base underlying data structure we collect internally, even if we expect the user to rarely need to access it directly, but rather through the results_ object.

For instance we could have:

  • results_log_ : the append-only list-of-dicts data structure that stores the raw evaluations

  • results_summary_ : an object that provides user-friendly ways to query the results. This class could take the raw log as a constructor parameter and compute its own aggregates (like iid mean scores for ranking).

The results log can be kept if we implement warm restarts. The results_summary_ will have to be reset and recomputed from the updated log.

The end-user API can still be made simple by providing a results object that does the aggregation and can even output the structured-array data structure you propose, if it proves really useful from an end-user API standpoint.
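
An illustrative stand-in (attribute names taken from the comment above; the API itself is assumed) for that split, where the summary is simply rebuilt from the append-only log after a warm restart:

class ResultsSummary(object):
    def __init__(self, results_log):
        # aggregate the mean validation error per candidate from the raw log
        sums = {}
        for r in results_log:
            total, count = sums.get(r['parameters_id'], (0.0, 0))
            sums[r['parameters_id']] = (total + r['validation_error'], count + 1)
        self.mean_errors = dict((pid, total / count)
                                for pid, (total, count) in sums.items())

    def ranking(self):
        # candidate ids ordered from best (lowest mean error) to worst
        return sorted(self.mean_errors, key=self.mean_errors.get)

results_log_ = []      # append-only raw evaluations
results_log_.append({'parameters_id': 0, 'fold_id': 0, 'validation_error': 0.2})
results_log_.append({'parameters_id': 1, 'fold_id': 0, 'validation_error': 0.1})
results_summary_ = ResultsSummary(results_log_)   # recomputed after each warm restart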

@ogrisel
Owner

Also, I don't think memory efficiency will ever be an issue: even with millions of evaluations, the overhead of Python dicts and object references is pretty manageable in 2013 :)

@jnothman
Owner

Also, I don't think memory efficiency will ever be an issue: even with millions of evaluations, the overhead of Python dicts and object references is pretty manageable in 2013 :)

That assumes you're not collecting other data; but in that case you're right: the dict overhead will make little difference, and I'm going on about nothing. For fault tolerance there's still sense in storing some data on disk, though.

I'll think about how best to transform this PR into something like that.

@jnothman
Owner

So from master, the things that IMO should happen are:

  • the fit_grid_point function should return a dict that will be used directly as a results_log_ entry, which means it needs to be passed the candidate id and fold id, which it currently is not.
  • this implementation should also replace the parallelised function in cross_val_score, forming a CVScorer class to handle the shared parallelisation logic. These first two points form a PR on their own.
  • following on from that, PRs to store the log and an API for results access.
@jnothman
Owner

And again, I should point out that one difficulty with dicts is that our names for fields in them cannot have deprecation warnings, so it's a bit dangerous making them a public API...

@ogrisel
Owner

And again, I should point out that one difficulty with dicts is that our names for fields in them cannot have deprecation warnings, so it's a bit dangerous making them a public API...

That's a valid point I had not thought of.

@jnothman
Owner

So we could make them custom objects, but they're less portable. I can't yet think of a nice solution there, except to make the results_log_ an unstable advanced feature...

(And if we're not concerned by the memory consumption of dicts, your comment on the memory efficiency of namedtuples in the context of _CVScoreTuple is a bit superfluous!)
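
One hypothetical way to get deprecation warnings from dict-like records (purely illustrative, not proposed in this PR): a thin dict subclass whose __getitem__ warns when a renamed field is accessed.

import warnings

class FoldResult(dict):
    # made-up mapping of deprecated field names to their replacements
    _deprecated = {'score': 'test_score'}

    def __getitem__(self, key):
        if key in self._deprecated:
            new_key = self._deprecated[key]
            warnings.warn("'%s' is deprecated; use '%s' instead" % (key, new_key),
                          DeprecationWarning)
            key = new_key
        return dict.__getitem__(self, key)

record = FoldResult(test_score=0.9, test_n_samples=20)
record['score']    # warns, then returns 0.9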

@ogrisel
Owner

So from master, the things that IMO should happen are:

  • (a) the fit_grid_point function should return a dict that will be used directly as a results_log_ entry, which means it needs to be passed the candidate id and fold id, which it currently is not.
  • (b) this implementation should also replace the parallelised function in cross_val_score, forming a CVScorer class to handle the shared parallelisation logic. These first two points form a PR on their own.
  • (c) following on from that, PRs to store the log and an API for results access.

Sounds good. Also +1 for using candidate_id instead of parameters_id.

I would like to have other people's opinions on our discussion, though. Apparently people are pretty busy at the moment. Let's see:

Ping @larsmans @mblondel @amueller @pprett @glouppe @arjoly @vene

I know @GaelVaroquaux is currently traveling for conferences. We might have a look at this during the SciPy sprint next week with him and @jakevdp.

@ogrisel
Owner

(And if we're not concerned by the memory consumption of dicts, your comment on the memory efficiency of namedtuples in the context of _CVScoreTuple is a bit superfluous!)

Indeed; it's just that I added a __slots__ = () to the existing _CVScoreTuple namedtuple to make it more idiomatic, and then people started to ask why. Hence I added the comment.

@jnothman
Owner

I would like to have other people's opinions on our discussion, though.

I think the discussion is a bit hard to navigate, and it would be more sensible to present a cut-back PR: #2079. I'll close this one, as it seems we're unlikely to go with its solution.

@jnothman jnothman closed this
@jnothman jnothman referenced this pull request
Closed

[MRG] Refactor CV and grid search #2736

Commits on Apr 6, 2013
  1. @jnothman
Commits on Apr 7, 2013
  1. @jnothman
Commits on Apr 9, 2013
  1. @jnothman

    REFACTOR/ENH refactor CV scoring from grid_search and cross_validation

    jnothman authored
    Provides consistent and enhanced structured-array result style for non-search CV evaluation.
    
    So far only regression-tested.
Commits on May 1, 2013
  1. @jnothman

    COSMIT fix pep8 violations

    jnothman authored
  2. @jnothman
Commits on May 11, 2013
  1. @jnothman

    DOC Fix comments for cross_val_score and CVEvaluator

    jnothman authored
    also, add iid parameter to cross_val_score
  2. @jnothman
  3. @jnothman
Commits on Jun 9, 2013
  1. @jnothman
7 examples/grid_search_digits.py
@@ -59,9 +59,12 @@
print()
print("Grid scores on development set:")
print()
- for params, mean_score, scores in clf.cv_scores_:
+ candidates = clf.search_results_['parameters']
+ means = clf.search_results_['test_score']
+ stds = clf.fold_results_['test_score'].std(axis=1)
+ for params, mean, std in zip(candidates, means, stds):
print("%0.3f (+/-%0.03f) for %r"
- % (mean_score, scores.std() / 2, params))
+ % (mean, std / 2, params))
print()
print("Detailed classification report:")
8 examples/svm/plot_rbf_parameters.py
@@ -105,12 +105,8 @@
pl.axis('tight')
# plot the scores of the grid
-# cv_scores_ contains parameter settings and scores
-score_dict = grid.cv_scores_
-
-# We extract just the scores
-scores = [x[1] for x in score_dict]
-scores = np.array(scores).reshape(len(C_range), len(gamma_range))
+scores = grid.search_results_['test_score']
+scores = scores.reshape(len(C_range), len(gamma_range))
# draw heatmap of accuracy as a function of gamma and C
pl.figure(figsize=(8, 6))
2  examples/svm/plot_svm_scale_c.py
@@ -131,7 +131,7 @@
cv=ShuffleSplit(n=n_samples, train_size=train_size,
n_iter=250, random_state=1))
grid.fit(X, y)
- scores = [x[1] for x in grid.cv_scores_]
+ scores = grid.search_results_['test_score']
scales = [(1, 'No scaling'),
((n_samples * train_size), '1/n_samples'),
393 sklearn/cross_validation.py
@@ -14,18 +14,21 @@
from itertools import combinations
from math import ceil, floor, factorial
import numbers
+import time
import numpy as np
import scipy.sparse as sp
from .base import is_classifier, clone
from .utils import check_arrays, check_random_state, safe_mask
+from .utils.validation import _num_samples
from .utils.fixes import unique
-from .externals.joblib import Parallel, delayed
-from .externals.six import string_types
+from .externals.joblib import Parallel, delayed, logger
+from .externals.six import string_types, iterkeys
from .metrics import SCORERS, Scorer
__all__ = ['Bootstrap',
+ 'CVScorer',
'KFold',
'LeaveOneLabelOut',
'LeaveOneOut',
@@ -1038,13 +1041,74 @@ def __len__(self):
##############################################################################
-def _cross_val_score(estimator, X, y, scorer, train, test, verbose,
- fit_params):
- """Inner loop for cross validation"""
- n_samples = X.shape[0] if sp.issparse(X) else len(X)
+def _fit_fold(estimator, X, y, train, test, scoring,
+ verbose, est_params=None, fit_params=None):
+ """Run fit on one set of parameters.
+
+ Parameters
+ ----------
+ estimator : estimator object
+ This estimator will be cloned and then fitted.
+
+ X : array-like, sparse matrix or list
+ Input data.
+
+ y : array-like or None
+ Targets for input data.
+
+ train : ndarray, dtype int or bool
+ Boolean mask or indices for training set.
+
+ test : ndarray, dtype int or bool
+ Boolean mask or indices for test set.
+
+ scoring : callable or None.
+ If provided must be a scoring object / function with signature
+ ``scoring(estimator, X, y)``.
+
+ verbose : int
+ Verbosity level.
+
+ est_params : dict
+ Parameters to be set on estimator for this fold.
+
+ **fit_params : kwargs
+ Additional parameter passed to the fit function of the estimator.
+
+
+ Returns
+ -------
+ results : dict of string to any
+ An extensible storage of fold results including the following keys:
+
+ ``'test_score'`` : float
+ The estimator's score on the test set.
+ ``'test_n_samples'`` : int
+ The number of samples in the test set.
+ """
+ if verbose > 1:
+ start_time = time.time()
+ if est_params is None:
+ msg = ''
+ else:
+ msg = '%s' % (', '.join('%s=%s' % (k, v)
+ for k, v in est_params.items()))
+ print("Fitting fold %s %s" % (msg, (64 - len(msg)) * '.'))
+
+ n_samples = _num_samples(X)
+
+ # Adapt fit_params to train portion only
+ if fit_params is None:
+ fit_params = {}
fit_params = dict([(k, np.asarray(v)[train]
if hasattr(v, '__len__') and len(v) == n_samples else v)
for k, v in fit_params.items()])
+
+ if hasattr(estimator, 'kernel') and callable(estimator.kernel):
+ # cannot compute the kernel values with custom function
+ raise ValueError("Cannot use a custom kernel function. "
+ "Precompute the kernel matrix instead.")
+
if not hasattr(X, "shape"):
if getattr(estimator, "_pairwise", False):
raise ValueError("Precomputed kernels or affinity matrices have "
@@ -1062,27 +1126,248 @@ def _cross_val_score(estimator, X, y, scorer, train, test, verbose,
X_train = X[safe_mask(X, train)]
X_test = X[safe_mask(X, test)]
- if y is None:
- y_train = None
- y_test = None
- else:
- y_train = y[train]
- y_test = y[test]
- estimator.fit(X_train, y_train, **fit_params)
- if scorer is None:
- score = estimator.score(X_test, y_test)
+ # update parameters of the classifier after a copy of its base structure
+ if est_params is not None:
+ estimator = clone(estimator)
+ estimator.set_params(**est_params)
+
+ if scoring is None:
+ scoring = lambda estimator, *args: estimator.score(*args)
+
+ if y is not None:
+ y_test = y[safe_mask(y, test)]
+ y_train = y[safe_mask(y, train)]
+ fit_args = (X_train, y_train)
+ score_args = (X_test, y_test)
else:
- score = scorer(estimator, X_test, y_test)
- if not isinstance(score, numbers.Number):
- raise ValueError("scoring must return a number, got %s (%s)"
- " instead." % (str(score), type(score)))
+ fit_args = (X_train,)
+ score_args = (X_test,)
+
+ # do actual fitting
+ estimator.fit(*fit_args, **fit_params)
+ test_score = scoring(estimator, *score_args)
+
+ if not isinstance(test_score, numbers.Number):
+ raise ValueError("scoring must return a number, got %s (%s)"
+ " instead." % (str(test_score), type(test_score)))
+
+ if verbose > 2:
+ msg += ", score=%f" % test_score
if verbose > 1:
- print("score: %f" % score)
- return score
+ end_msg = "%s -%s" % (msg,
+ logger.short_format_time(time.time() -
+ start_time))
+ print("Fitting fold %s %s" % ((64 - len(end_msg)) * '.', end_msg))
+ return {
+ 'test_score': test_score,
+ 'test_n_samples': _num_samples(X_test),
+ }
+
+
+class CVScorer(object):
+ """Parallelized cross-validation for a given estimator and dataset
+
+ Parameters
+ ----------
+ cv : integer or cross-validation generator, optional
+ A cross-validation generator or number of folds (default 3). Folds will
+ be stratified if the estimator is a classifier.
+
+ scoring : string or callable, optional
+ Either one of either a string ("zero_one", "f1", "roc_auc", ... for
+ classification, "mse", "r2", ... for regression) or a callable.
+ See 'Scoring objects' in the model evaluation section of the user guide
+ for details.
+
+ iid : boolean, optional
+ If True (default), the data is assumed to be identically distributed
+ across the folds, and the mean score is the total score per sample,
+ and not the mean score across the folds.
+
+ n_jobs : integer, optional
+ The number of jobs to run in parallel (default 1). -1 means 'all CPUs'.
+
+ pre_dispatch : int, or string, optional
+ Controls the number of jobs that get dispatched during parallel
+ execution. Reducing this number can be useful to avoid an
+ explosion of memory consumption when more jobs get dispatched
+ than CPUs can process. This parameter can be:
+
+ - None, in which case all the jobs are immediatly
+ created and spawned. Use this for lightweight and
+ fast-running jobs, to avoid delays due to on-demand
+ spawning of the jobs
+
+ - An int, giving the exact number of total jobs that are
+ spawned
+
+ - A string, giving an expression as a function of n_jobs,
+ as in '2*n_jobs'
+ verbose : integer, optional
+ The verbosity level.
+
+ fit_params : dict, optional
+ Parameters to pass to the fit method of the estimator.
+ """
+
+ def __init__(self, cv=None, scoring=None, iid=True,
+ n_jobs=1, pre_dispatch='2*n_jobs', verbose=0, fit_params=None,
+ score_func=None):
+
+ self.cv = cv
+ self.iid = iid
+ self.verbose = verbose
+ self.fit_params = fit_params
+
+ if score_func is not None:
+ warnings.warn("Passing function as ``score_func`` is "
+ "deprecated and will be removed in 0.15. "
+ "Either use strings or score objects.", stacklevel=2)
+ self.scoring = Scorer(score_func)
+ elif isinstance(scoring, string_types):
+ self.scoring = SCORERS[scoring]
+ else:
+ self.scoring = scoring
+
+ self.parallel = Parallel(n_jobs=n_jobs, verbose=verbose,
+ pre_dispatch=pre_dispatch)
+
+ def _calc_means(self, scores, n_samples):
+ """
+ Calculate means of the final dimension of `scores`, weighted by
+ `n_samples` if `iid` is True.
+
+ Parameters
+ ----------
+ scores : array-like of floats
+ The scores to aggregate.
+
+ n_samples : array-like of integers with same shape as `scores`
+ The number of samples considered in calculating each score.
+
+ Returns
+ -------
+ means : ndarray with shape of `scores` except for last dimension
+ The means of the last dimension of scores.
+ """
+ scores = np.asarray(scores)
+ n_samples = np.asarray(n_samples)
+ if self.iid:
+ scores = scores * n_samples
+ scores = scores.sum(axis=-1) / n_samples.sum(axis=-1)
+ else:
+ scores = scores.sum(axis=-1) / scores.shape[-1]
+ return scores
+
+ def _format_results(self, out, n_folds):
+ # group by params
+ out = [
+ [fold_results for fold_results in out[start:start + n_folds]]
+ for start in range(0, len(out), n_folds)
+ ]
+
+ # dicts to structured arrays (assume keys are same throughout):
+ keys = sorted(iterkeys(out[0][0]))
+ arrays = (
+ [[fold_results[key] for fold_results in point] for point in out]
+ for key in keys)
+ out = np.rec.fromarrays(arrays, names=keys)
+
+ # for now, only one mean:
+ means = np.rec.fromarrays(
+ [self._calc_means(out['test_score'], out['test_n_samples'])],
+ names=['test_score']
+ )
-def cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1,
- verbose=0, fit_params=None, score_func=None):
+ return means, out
+
+ def __call__(self, estimator, X, y=None):
+ """Cross-validate the estimator on the given data.
+
+ Parameters
+ ----------
+ estimator : estimator object implementing 'fit'
+ The object to use to fit the data.
+
+ X : array-like of shape at least 2D
+ The data to fit.
+
+ y : array-like, optional
+ The target variable to try to predict in the case of
+ supervised learning.
+
+ Returns
+ -------
+ means : structured array
+ This provides fields:
+ * ``test_score``, the mean test score across folds
+ The array has one dimension corresponding to parameter settings if
+ an iterable is provided, and zero dimensions otherwise.
+ fold_results : structured array
+ For each cross-validation fold, this provides fields:
+ * ``test_score``, the score for this fold
+ * ``test_n_samples``, the number of samples in testing
+ The first axis indexes parameter settings where an iterable is
+ provided.
+ """
+ means, folds = self.search([{}], estimator, X, y)
+ return means[0], folds[0]
+
+ def search(self, candidates, estimator, X, y=None):
+ """Cross-validate the estimator for candidate parameter settings.
+
+ Parameters
+ ----------
+ candidates : iterable of dicts
+ The estimator will be cloned and have these parameters
+ set for each candidate.
+
+ estimator : estimator object implementing 'fit'
+ The object to use to fit the data.
+
+ X : array-like of shape at least 2D
+ The data to fit.
+
+ y : array-like, optional
+ The target variable to try to predict in the case of
+ supervised learning.
+
+ Returns
+ -------
+ means : structured array of shape (n_candidates,)
+ This provides fields:
+ * ``test_score``, the mean test score across folds
+ fold_results : structured array of shape (n_candidates, n_folds)
+ For each cross-validation fold, this provides fields:
+ * ``test_score``, the score for this fold
+ * ``test_n_samples``, the number of samples in testing
+ """
+ X, y = check_arrays(X, y, sparse_format='csr', allow_lists=True)
+ cv = check_cv(self.cv, X, y, classifier=is_classifier(estimator))
+ cv = list(cv)
+ n_folds = len(cv)
+ if self.scoring is None and not hasattr(estimator, 'score'):
+ raise TypeError(
+ "If no scoring is specified, the estimator passed "
+ "should have a 'score' method. The estimator %s "
+ "does not." % estimator)
+
+ out = self.parallel(
+ delayed(_fit_fold)(
+ estimator, X, y,
+ est_params=est_params, train=train, test=test,
+ scoring=self.scoring, verbose=self.verbose,
+ fit_params=self.fit_params
+ )
+ for est_params in candidates for train, test in cv)
+
+ means, folds = self._format_results(out, n_folds)
+ return means, folds
+
+
+def cross_val_score(estimator, X, y=None, scoring=None, cv=None, iid=True,
+ n_jobs=1, verbose=0, fit_params=None, score_func=None):
"""Evaluate a score by cross-validation
Parameters
@@ -1103,14 +1388,34 @@ def cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1,
See 'Scoring objects' in the model evaluation section of the user guide
for details.
- cv : cross-validation generator, optional
- A cross-validation generator. If None, a 3-fold cross
- validation is used or 3-fold stratified cross-validation
- when y is supplied and estimator is a classifier.
+ cv : integer or cross-validation generator, optional
+ A cross-validation generator or number of folds (default 3). Folds will
+ be stratified if the estimator is a classifier.
+
+ iid : boolean, optional
+ If True (default), the data is assumed to be identically distributed
+ across the folds, and the mean score is the total score per sample,
+ and not the mean score across the folds.
n_jobs : integer, optional
- The number of CPUs to use to do the computation. -1 means
- 'all CPUs'.
+ The number of jobs to run in parallel (default 1). -1 means 'all CPUs'.
+
+ pre_dispatch : int, or string, optional
+ Controls the number of jobs that get dispatched during parallel
+ execution. Reducing this number can be useful to avoid an
+ explosion of memory consumption when more jobs get dispatched
+ than CPUs can process. This parameter can be:
+
+ - None, in which case all the jobs are immediatly
+ created and spawned. Use this for lightweight and
+ fast-running jobs, to avoid delays due to on-demand
+ spawning of the jobs
+
+ - An int, giving the exact number of total jobs that are
+ spawned
+
+ - A string, giving an expression as a function of n_jobs,
+ as in '2*n_jobs'
verbose : integer, optional
The verbosity level.
@@ -1120,33 +1425,13 @@ def cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1,
Returns
-------
- scores : array of float, shape=(len(list(cv)),)
+ scores : array of float, shape=[len(cv)]
Array of scores of the estimator for each run of the cross validation.
"""
- X, y = check_arrays(X, y, sparse_format='csr', allow_lists=True)
- cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
- if score_func is not None:
- warnings.warn("Passing function as ``score_func`` is "
- "deprecated and will be removed in 0.15. "
- "Either use strings or score objects.", stacklevel=2)
- scorer = Scorer(score_func)
- elif isinstance(scoring, string_types):
- scorer = SCORERS[scoring]
- else:
- scorer = scoring
- if scorer is None and not hasattr(estimator, 'score'):
- raise TypeError(
- "If no scoring is specified, the estimator passed "
- "should have a 'score' method. The estimator %s "
- "does not." % estimator)
- # We clone the estimator to make sure that all the folds are
- # independent, and that it is pickle-able.
- fit_params = fit_params if fit_params is not None else {}
- scores = Parallel(n_jobs=n_jobs, verbose=verbose)(
- delayed(_cross_val_score)(
- clone(estimator), X, y, scorer, train, test, verbose, fit_params)
- for train, test in cv)
- return np.array(scores)
+ cv_score = CVScorer(
+ scoring=scoring, cv=cv, n_jobs=n_jobs,
+ verbose=verbose, fit_params=fit_params, score_func=score_func)
+ return cv_score(estimator, X, y)[1]['test_score']
def _permutation_test_score(estimator, X, y, cv, scorer):
267 sklearn/grid_search.py
@@ -9,22 +9,20 @@
# License: BSD Style.
from abc import ABCMeta, abstractmethod
-from collections import Mapping, namedtuple
+from collections import Mapping
from functools import partial, reduce
from itertools import product
-import numbers
import operator
-import time
import warnings
import numpy as np
+from numpy.lib import recfunctions
-from .base import BaseEstimator, is_classifier, clone
+from .base import BaseEstimator, clone
from .base import MetaEstimatorMixin
-from .cross_validation import check_cv
-from .externals.joblib import Parallel, delayed, logger
-from .externals.six import string_types
-from .utils import safe_mask, check_random_state
+from .cross_validation import CVScorer, _fit_fold
+from .externals.six import string_types, iterkeys
+from .utils import check_random_state, deprecated
from .utils.validation import _num_samples, check_arrays
from .metrics import SCORERS, Scorer
@@ -189,10 +187,13 @@ def __len__(self):
return self.n_iter
+@deprecated('fit_grid_point is deprecated and will be removed in 0.15.')
def fit_grid_point(X, y, base_clf, clf_params, train, test, scorer,
verbose, loss_func=None, **fit_params):
"""Run fit on one set of parameters.
+ This function is DEPRECATED. Use `cross_validation.fit_fold` instead.
+
Parameters
----------
X : array-like, sparse matrix or list
@@ -229,73 +230,17 @@ def fit_grid_point(X, y, base_clf, clf_params, train, test, scorer,
score : float
Score of this parameter setting on given training / test split.
- estimator : estimator object
- Estimator object of type base_clf that was fitted using clf_params
- and provided train / test split.
+ clf_params : dict
+ The parameters used to train this estimator.
n_samples_test : int
Number of test samples in this split.
"""
- if verbose > 1:
- start_time = time.time()
- msg = '%s' % (', '.join('%s=%s' % (k, v)
- for k, v in clf_params.items()))
- print("[GridSearchCV] %s %s" % (msg, (64 - len(msg)) * '.'))
-
- # update parameters of the classifier after a copy of its base structure
- clf = clone(base_clf)
- clf.set_params(**clf_params)
-
- if hasattr(base_clf, 'kernel') and callable(base_clf.kernel):
- # cannot compute the kernel values with custom function
- raise ValueError("Cannot use a custom kernel function. "
- "Precompute the kernel matrix instead.")
-
- if not hasattr(X, "shape"):
- if getattr(base_clf, "_pairwise", False):
- raise ValueError("Precomputed kernels or affinity matrices have "
- "to be passed as arrays or sparse matrices.")
- X_train = [X[idx] for idx in train]
- X_test = [X[idx] for idx in test]
- else:
- if getattr(base_clf, "_pairwise", False):
- # X is a precomputed square kernel matrix
- if X.shape[0] != X.shape[1]:
- raise ValueError("X should be a square kernel matrix")
- X_train = X[np.ix_(train, train)]
- X_test = X[np.ix_(test, train)]
- else:
- X_train = X[safe_mask(X, train)]
- X_test = X[safe_mask(X, test)]
-
- if y is not None:
- y_test = y[safe_mask(y, test)]
- y_train = y[safe_mask(y, train)]
- clf.fit(X_train, y_train, **fit_params)
-
- if scorer is not None:
- this_score = scorer(clf, X_test, y_test)
- else:
- this_score = clf.score(X_test, y_test)
- else:
- clf.fit(X_train, **fit_params)
- if scorer is not None:
- this_score = scorer(clf, X_test)
- else:
- this_score = clf.score(X_test)
-
- if not isinstance(this_score, numbers.Number):
- raise ValueError("scoring must return a number, got %s (%s)"
- " instead." % (str(this_score), type(this_score)))
-
- if verbose > 2:
- msg += ", score=%f" % this_score
- if verbose > 1:
- end_msg = "%s -%s" % (msg,
- logger.short_format_time(time.time() -
- start_time))
- print("[GridSearchCV] %s %s" % ((64 - len(end_msg)) * '.', end_msg))
- return this_score, clf_params, _num_samples(X_test)
+ res = _fit_fold(
+ base_clf, X, y, train, test, scorer, verbose,
+ loss_func=None, est_params=clf_params, fit_params=fit_params
+ )
+ return res['test_score'], clf_params, res['test_n_samples']
def _check_param_grid(param_grid):
@@ -316,11 +261,6 @@ def _check_param_grid(param_grid):
"list.")
-_CVScoreTuple = namedtuple('_CVScoreTuple',
- ('parameters', 'mean_validation_score',
- 'cv_validation_scores'))
-
-
class BaseSearchCV(BaseEstimator, MetaEstimatorMixin):
"""Base class for hyper parameter search with cross-validation.
"""
@@ -387,6 +327,27 @@ def decision_function(self):
def transform(self):
return self.best_estimator_.transform
+ @property
+ def grid_scores_(self):
+ warnings.warn("grid_scores_ is deprecated and will be removed in 0.15."
+ " Use search_results_ and fold_results_ instead.",
+ DeprecationWarning)
+ return zip(self.search_results_['parameters'],
+ self.search_results_['test_score'],
+ self.fold_results_['test_score'])
+
+ @property
+ def best_score_(self):
+ if not hasattr(self, 'best_index_'):
+ raise AttributeError('Call fit() to calculate best_score_')
+ return self.search_results_['test_score'][self.best_index_]
+
+ @property
+ def best_params_(self):
+ if not hasattr(self, 'best_index_'):
+ raise AttributeError('Call fit() to calculate best_params_')
+ return self.search_results_['parameters'][self.best_index_]
+
def _check_estimator(self):
"""Check that estimator can be fitted and score can be computed."""
if (not hasattr(self.estimator, 'fit') or
@@ -406,11 +367,6 @@ def _check_estimator(self):
def _fit(self, X, y, parameter_iterator, **params):
"""Actual fitting, performing the search over parameters."""
- estimator = self.estimator
- cv = self.cv
-
- n_samples = _num_samples(X)
- X, y = check_arrays(X, y, allow_lists=True, sparse_format='csr')
if self.loss_func is not None:
warnings.warn("Passing a loss function is "
@@ -428,55 +384,37 @@ def _fit(self, X, y, parameter_iterator, **params):
scorer = SCORERS[self.scoring]
else:
scorer = self.scoring
-
self.scorer_ = scorer
+ n_samples = _num_samples(X)
+ X, y = check_arrays(X, y, allow_lists=True, sparse_format='csr')
if y is not None:
if len(y) != n_samples:
raise ValueError('Target variable (y) has a different number '
'of samples (%i) than data (X: %i samples)'
% (len(y), n_samples))
y = np.asarray(y)
- cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
-
- base_clf = clone(self.estimator)
-
- pre_dispatch = self.pre_dispatch
-
- out = Parallel(
- n_jobs=self.n_jobs, verbose=self.verbose,
- pre_dispatch=pre_dispatch)(
- delayed(fit_grid_point)(
- X, y, base_clf, clf_params, train, test, scorer,
- self.verbose, **self.fit_params) for clf_params in
- parameter_iterator for train, test in cv)
-
- # Out is a list of triplet: score, estimator, n_test_samples
- n_param_points = len(list(parameter_iterator))
- n_fits = len(out)
- n_folds = n_fits // n_param_points
-
- scores = list()
- cv_scores = list()
- for grid_start in range(0, n_fits, n_folds):
- n_test_samples = 0
- score = 0
- these_points = list()
- for this_score, clf_params, this_n_test_samples in \
- out[grid_start:grid_start + n_folds]:
- these_points.append(this_score)
- if self.iid:
- this_score *= this_n_test_samples
- n_test_samples += this_n_test_samples
- score += this_score
- if self.iid:
- score /= float(n_test_samples)
- else:
- score /= float(n_folds)
- scores.append((score, clf_params))
- cv_scores.append(these_points)
- cv_scores = np.asarray(cv_scores)
+ cv_eval = CVScorer(cv=self.cv, scoring=self.scorer_,
+ iid=self.iid, fit_params=self.fit_params,
+ n_jobs=self.n_jobs, pre_dispatch=self.pre_dispatch,
+ verbose=self.verbose)
+ search_results, cv_results = cv_eval.search(parameter_iterator,
+ self.estimator, X, y)
+
+ # Append 'parameters' to search_results
+ # Broken due to https://github.com/numpy/numpy/issues/2346:
+ # search_results = recfunctions.append_fields(
+ # search_results, 'parameters',
+ # np.asarray(list(parameter_iterator)), usemask=False)
+ new_search_results = np.zeros(
+ search_results.shape,
+ dtype=search_results.dtype.descr + [('parameters', 'O')]
+ )
+ for name in search_results.dtype.names:
+ new_search_results[name] = search_results[name]
+ new_search_results['parameters'] = list(parameter_iterator)
+ search_results = new_search_results
# Note: we do not use max(out) to make ties deterministic even if
# comparison on estimator instances is not deterministic
@@ -490,30 +428,29 @@ def _fit(self, X, y, parameter_iterator, **params):
else:
best_score = np.inf
- for score, params in scores:
+ for i, score in enumerate(search_results['test_score']):
if ((score > best_score and greater_is_better)
- or (score < best_score and not greater_is_better)):
+ or (score < best_score
+ and not greater_is_better)):
best_score = score
- best_params = params
+ best_index = i
- self.best_params_ = best_params
- self.best_score_ = best_score
+ self.best_index_ = best_index
+ self.fold_results_ = cv_results
+ self.search_results_ = search_results
if self.refit:
# fit the best estimator using the entire dataset
# clone first to work around broken estimators
- best_estimator = clone(base_clf).set_params(**best_params)
+ best_estimator = clone(self.estimator).set_params(
+ **self.best_params_
+ )
if y is not None:
best_estimator.fit(X, y, **self.fit_params)
else:
best_estimator.fit(X, **self.fit_params)
self.best_estimator_ = best_estimator
- # Store the computed scores
- self.cv_scores_ = [
- _CVScoreTuple(clf_params, score, all_scores)
- for clf_params, (score, _), all_scores
- in zip(parameter_iterator, scores, cv_scores)]
return self
@@ -568,9 +505,9 @@ class GridSearchCV(BaseSearchCV):
as in '2*n_jobs'
iid : boolean, optional
- If True, the data is assumed to be identically distributed across
- the folds, and the loss minimized is the total loss per sample,
- and not the mean loss across the folds.
+ If True (default), the data is assumed to be identically distributed
+ across the folds, and the mean score is the total score per sample,
+ and not the mean score across the folds.
cv : integer or cross-validation generator, optional
If an integer is passed, it is the number of folds (default 3).
@@ -604,20 +541,27 @@ class GridSearchCV(BaseSearchCV):
Attributes
----------
- `cv_scores_` : list of named tuples
- Contains scores for all parameter combinations in param_grid.
- Each entry corresponds to one parameter setting.
- Each named tuple has the attributes:
+ `search_results_` : structured array of shape [# param combinations]
+ For each parameter combination in ``param_grid`` includes these fields:
- * ``parameters``, a dict of parameter settings
- * ``mean_validation_score``, the mean score over the
+ * ``parameters``, dict of parameter settings
+ * ``test_score``, the mean score over the
cross-validation folds
- * ``cv_validation_scores``, the list of scores for each fold
+
+ `fold_results_` : structured array of shape [# param combinations, # folds]
+ For each cross-validation fold includes these fields:
+
+ * ``test_score``, the score for this fold
+ * ``test_n_samples``, the number of samples in testing
`best_estimator_` : estimator
Estimator that was choosen by the search, i.e. estimator
which gave highest score (or smallest loss if specified)
- on the left out data.
+ on the left out data. Available only if refit=True.
+
+ `best_index_` : int
+ The index of the best parameter setting into ``search_results_`` and
+ ``fold_results_`` data.
`best_score_` : float
Score of best_estimator on the left out data.
@@ -625,6 +569,10 @@ class GridSearchCV(BaseSearchCV):
`best_params_` : dict
Parameter setting that gave the best results on the hold out data.
+ `grid_scores_` : list of tuples (deprecated)
+ Contains scores for all parameter combinations in ``param_grid``:
+ each tuple is (parameters, mean score, fold scores).
+
Notes
------
The parameters selected are those that maximize the score of the left out
@@ -659,12 +607,6 @@ def __init__(self, estimator, param_grid, scoring=None, loss_func=None,
self.param_grid = param_grid
_check_param_grid(param_grid)
- @property
- def grid_scores_(self):
- warnings.warn("grid_scores_ is deprecated and will be removed in 0.15."
- " Use cv_scores_ instead.", DeprecationWarning)
- return self.cv_scores_
-
def fit(self, X, y=None, **params):
"""Run fit with all sets of parameters.
@@ -760,20 +702,27 @@ class RandomizedSearchCV(BaseSearchCV):
Attributes
----------
- `cv_scores_` : list of named tuples
- Contains scores for all parameter combinations in param_grid.
- Each entry corresponds to one parameter setting.
- Each named tuple has the attributes:
+ `search_results_` : structured array of shape [# param combinations]
+ For each parameter combination in ``param_grid`` includes these fields:
- * ``parameters``, a dict of parameter settings
- * ``mean_validation_score``, the mean score over the
+ * ``parameters``, dict of parameter settings
+ * ``test_score``, the mean score over the
cross-validation folds
- * ``cv_validation_scores``, the list of scores for each fold
+
+ `fold_results_` : structured array of shape [# param combinations, # folds]
+ For each cross-validation fold includes these fields:
+
+ * ``test_score``, the score for this fold
+ * ``test_n_samples``, the number of samples in testing
`best_estimator_` : estimator
Estimator that was choosen by the search, i.e. estimator
which gave highest score (or smallest loss if specified)
- on the left out data.
+ on the left out data. Available only if refit=True.
+
+ `best_index_` : int
+ The index of the best parameter setting into ``search_results_`` and
+ ``fold_results_`` data.
`best_score_` : float
Score of best_estimator on the left out data.
@@ -781,6 +730,10 @@ class RandomizedSearchCV(BaseSearchCV):
`best_params_` : dict
Parameter setting that gave the best results on the hold out data.
+ `grid_scores_` : list of tuples (deprecated)
+ Contains scores for all parameter combinations in ``param_grid``:
+ each tuple is (parameters, mean score, fold scores).
+
Notes
-----
The parameters selected are those that maximize the score of the left out
61 sklearn/tests/test_grid_search.py
@@ -134,9 +134,11 @@ def test_grid_search():
grid_search.fit(X, y)
sys.stdout = old_stdout
assert_equal(grid_search.best_estimator_.foo_param, 2)
+ assert_equal(grid_search.best_params_, {'foo_param': 2})
+ assert_equal(grid_search.best_score_, 1.)
for i, foo_i in enumerate([1, 2, 3]):
- assert_true(grid_search.cv_scores_[i][0]
+ assert_true(grid_search.search_results_['parameters'][i]
== {'foo_param': foo_i})
# Smoke test the score etc:
grid_search.score(X, y)
@@ -145,19 +147,52 @@ def test_grid_search():
grid_search.transform(X)
-def test_trivial_cv_scores():
+def test_grid_scores():
+ """Test that GridSearchCV.grid_scores_ is filled in the correct format"""
+ clf = MockClassifier()
+ grid_search = GridSearchCV(clf, {'foo_param': [1, 2, 3]}, verbose=3)
+ # make sure it selects the smallest parameter in case of ties
+ old_stdout = sys.stdout
+ sys.stdout = StringIO()
+ grid_search.fit(X, y)
+ sys.stdout = old_stdout
+ assert_equal(grid_search.best_estimator_.foo_param, 2)
+
+ n_folds = 3
+ with warnings.catch_warnings(record=True):
+ for i, foo_i in enumerate([1, 2, 3]):
+ assert_true(grid_search.grid_scores_[i][0]
+ == {'foo_param': foo_i})
+ # mean score
+ assert_almost_equal(
+ grid_search.grid_scores_[i][1],
+ (1. if foo_i > 1 else 0.)
+ )
+ # all fold scores
+ assert_array_equal(
+ grid_search.grid_scores_[i][2],
+ [1. if foo_i > 1 else 0.] * n_folds
+ )
+
+
+def test_trivial_results():
"""Test search over a "grid" with only one point.
- Non-regression test: cv_scores_ wouldn't be set by GridSearchCV.
+ Non-regression test: search_results_, etc. wouldn't be set by GridSearchCV.
"""
clf = MockClassifier()
grid_search = GridSearchCV(clf, {'foo_param': [1]})
grid_search.fit(X, y)
- assert_true(hasattr(grid_search, "cv_scores_"))
+ # Ensure attributes are set
+ grid_search.search_results_
+ grid_search.fold_results_
+ grid_search.best_index_
random_search = RandomizedSearchCV(clf, {'foo_param': [0]})
random_search.fit(X, y)
- assert_true(hasattr(random_search, "cv_scores_"))
+ grid_search.search_results_
+ grid_search.fold_results_
+ grid_search.best_index_
def test_no_refit():
@@ -196,20 +231,22 @@ def test_grid_search_iid():
# once with iid=True (default)
grid_search = GridSearchCV(svm, param_grid={'C': [1, 10]}, cv=cv)
grid_search.fit(X, y)
- _, average_score, scores = grid_search.cv_scores_[0]
+ scores = grid_search.fold_results_[0]['test_score']
assert_array_almost_equal(scores, [1, 1. / 3.])
# for first split, 1/4 of dataset is in test, for second 3/4.
# take weighted average
+ average_score = grid_search.search_results_[0]['test_score']
assert_almost_equal(average_score, 1 * 1. / 4. + 1. / 3. * 3. / 4.)
# once with iid=False (default)
grid_search = GridSearchCV(svm, param_grid={'C': [1, 10]}, cv=cv,
iid=False)
grid_search.fit(X, y)
- _, average_score, scores = grid_search.cv_scores_[0]
# scores are the same as above
+ scores = grid_search.fold_results_[0]['test_score']
assert_array_almost_equal(scores, [1, 1. / 3.])
# averaged score is just mean of scores
+ average_score = grid_search.search_results_[0]['test_score']
assert_almost_equal(average_score, np.mean(scores))
@@ -419,7 +456,10 @@ def test_X_as_list():
cv = KFold(n=len(X), n_folds=3)
grid_search = GridSearchCV(clf, {'foo_param': [1, 2, 3]}, cv=cv)
grid_search.fit(X.tolist(), y).score(X, y)
- assert_true(hasattr(grid_search, "cv_scores_"))
+ # Ensure result attributes are set
+ grid_search.search_results_
+ grid_search.fold_results_
+ grid_search.best_index_
def test_unsupervised_grid_search():
@@ -466,7 +506,7 @@ def test_randomized_search():
params = dict(C=distributions.expon())
search = RandomizedSearchCV(LinearSVC(), param_distributions=params)
search.fit(X, y)
- assert_equal(len(search.cv_scores_), 10)
+ assert_equal(len(search.search_results_['test_score']), 10)
def test_grid_search_score_consistency():
@@ -479,9 +519,8 @@ def test_grid_search_score_consistency():
grid_search = GridSearchCV(clf, {'C': Cs}, scoring=score)
grid_search.fit(X, y)
cv = StratifiedKFold(n_folds=3, y=y)
- for C, scores in zip(Cs, grid_search.cv_scores_):
+ for C, scores in zip(Cs, grid_search.fold_results_['test_score']):
clf.set_params(C=C)
- scores = scores[2] # get the separate runs from grid scores
i = 0
for train, test in cv:
clf.fit(X[train], y[train])