
[MRG+ ½] Fixing #7155 in stochastic_gradient.py #7159

Closed · wants to merge 25 commits

Conversation (9 participants)
yl565 (Contributor) commented Aug 7, 2016

Fixes #7155
Calling predict_proba/predict_log_proba on an unfitted SGDClassifier now raises the correct not-fitted error, and calling predict_proba/predict_log_proba on GridSearchCV no longer raises an error when estimator=SGDClassifier and param_grid={'loss': ['log']}.

@yl565 yl565 changed the title [MRG] Fixing #7155 in stochastic_gradient.py [WIP] Fixing #7155 in stochastic_gradient.py Aug 7, 2016

raghavrv (Member) commented Aug 7, 2016

Thanks for the PR! Could you also add a non-regression test please?

yl565 (Author) commented Aug 8, 2016

Hi @raghavrv, thanks for your comments! It is exciting that I'm submitting my first PR to sklearn. I'm fairly new to the unit testing procedure. Do you mean add some test to the linear_model/tests/test_sgd.py and tests/test_grid_search.py for the bugs in #7155?

    try:
        check_is_fitted(self, ["t_", "coef_", "intercept_"],
                        all_or_any=all)
        is_fitted = True
    except _NotFittedError:
        is_fitted = False

agramfort (Member) commented Aug 8, 2016

I am not sure it's the right fix.

is check_is_fitted really needed to know if predict_proba is supported?

can't you just remove the call to check_is_fitted ?

yl565 (Author) commented Aug 8, 2016

If I remove the call, GridSearchCV would throw the loss hinge error when predict_proba is called. The problem came from this line. It tried to get the predict_proba attribute of the SGDClassifier class, but the default loss is hinge. So I changed it to raise the "probability estimates are not available for loss='hinge'" error only if the estimator has already been fitted.
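The back-off described here can be sketched outside scikit-learn. The class and names below (SketchSGD, the simplified loss set) are illustrative assumptions, not the actual implementation: the loss check raises its AttributeError only once the estimator is fitted, so an unfitted instance surfaces a not-fitted error instead.

```python
class NotFittedError(Exception):
    pass


class SketchSGD(object):
    def __init__(self, loss="hinge"):
        self.loss = loss

    def fit(self, X, y):
        self.coef_ = [0.0] * len(X[0])
        return self

    def _check_proba(self):
        # Back-off: complain about the loss only once the estimator is
        # fitted, so an unfitted instance still reaches the not-fitted
        # error below instead of the loss error.
        try:
            self.coef_
            is_fitted = True
        except AttributeError:
            is_fitted = False
        if is_fitted and self.loss not in ("log", "modified_huber"):
            raise AttributeError(
                "probability estimates are not available for loss=%r"
                % self.loss)

    def predict_proba(self, X):
        self._check_proba()
        if not hasattr(self, "coef_"):
            raise NotFittedError(
                "This SketchSGD instance is not fitted yet.")
        return [[0.5, 0.5] for _ in X]
```

With this ordering, an unfitted loss='hinge' instance raises NotFittedError, while a fitted one raises the AttributeError about the loss.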

raghavrv (Member) commented Aug 8, 2016

It is exciting that I'm submitting my first PR to sklearn.

Indeed it is ;)

I'm fairly new to the unit testing procedure. Do you mean add some test to the linear_model/tests/test_sgd.py and tests/test_grid_search.py for the bugs in #7155?

Adding a test along the lines of

def test_sgd_loss_param_grid_search():
    # Make sure the predict_proba works when loss is specified
    # as one of the parameters in the param_grid
    rng = np.random.RandomState(0)
    X = rng.random_sample([5, 2])
    y = rng.random_integers(0, 1, [5, ])
    param_grid = {
        'loss': ['log'],
    }
    clf_grid = grid_search.GridSearchCV(estimator=SGDClassifier(random_state=42),
                                        param_grid=param_grid)
    clf_grid.fit(X, y).predict_proba(X)

added to linear_model/tests/test_sgd.py would be good. (This is an issue with sgd (as we are hacking it a bit) and hence should go into sgd tests.)

yl565 (Author) commented Aug 8, 2016

@raghavrv Your instruction is really helpful! I added the test, along with a test to make sure that when predict_proba is called the error raised is the not-fitted error rather than AttributeError: probability estimates are not available for loss='hinge'. Now running test_sgd.py with the original stochastic_gradient.py gives the following errors:

======================================================================
ERROR: test_loss_param_grid_search (sklearn.linear_model.tests.test_sgd.DenseSGDClassifierTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/tests/test_sgd.py", line 836, in test_loss_param_grid_search
    clf_grid.predict_proba(X)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/utils/metaestimators.py", line 35, in __get__
    self.get_attribute(obj)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 756, in predict_proba
    self._check_proba()
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 721, in _check_proba
    " loss=%r" % self.loss)
AttributeError: probability estimates are not available for loss='hinge'

======================================================================
ERROR: test_not_fitted_error (sklearn.linear_model.tests.test_sgd.DenseSGDClassifierTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/tests/test_sgd.py", line 846, in test_not_fitted_error
    clf.predict_proba, X)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 756, in predict_proba
    self._check_proba()
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 721, in _check_proba
    " loss=%r" % self.loss)
AttributeError: probability estimates are not available for loss='hinge'

======================================================================
ERROR: test_loss_param_grid_search (sklearn.linear_model.tests.test_sgd.SparseSGDClassifierTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/tests/test_sgd.py", line 837, in test_loss_param_grid_search
    clf_grid.predict_log_proba(X)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/utils/metaestimators.py", line 35, in __get__
    self.get_attribute(obj)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 822, in predict_log_proba
    self._check_proba()
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 721, in _check_proba
    " loss=%r" % self.loss)
AttributeError: probability estimates are not available for loss='hinge'

======================================================================
ERROR: test_not_fitted_error (sklearn.linear_model.tests.test_sgd.SparseSGDClassifierTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/tests/test_sgd.py", line 846, in test_not_fitted_error
    clf.predict_proba, X)
  File "/home/yichuanliu/anaconda3/envs/scikit-learn-dev/lib/python3.5/unittest/case.py", line 1258, in assertRaisesRegex
    return context.handle('assertRaisesRegex', args, kwargs)
  File "/home/yichuanliu/anaconda3/envs/scikit-learn-dev/lib/python3.5/unittest/case.py", line 176, in handle
    callable_obj(*args, **kwargs)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/tests/test_sgd.py", line 44, in predict_proba
    return super(SparseSGDClassifier, self).predict_proba(X)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 756, in predict_proba
    self._check_proba()
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 721, in _check_proba
    " loss=%r" % self.loss)
AttributeError: probability estimates are not available for loss='hinge'

----------------------------------------------------------------------
Ran 168 tests in 1.292s

FAILED (errors=4)

@yl565 yl565 changed the title [WIP] Fixing #7155 in stochastic_gradient.py [MRG] Fixing #7155 in stochastic_gradient.py Aug 8, 2016

@@ -32,7 +32,7 @@
from .sgd_fast import Huber
from .sgd_fast import EpsilonInsensitive
from .sgd_fast import SquaredEpsilonInsensitive

from ..exceptions import NotFittedError as _NotFittedError

nelson-liu (Contributor) commented Aug 8, 2016

I think you should add a newline after this, to replace the one you removed.

@@ -20,8 +20,9 @@
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.preprocessing import LabelEncoder, scale, MinMaxScaler

nelson-liu (Contributor) commented Aug 8, 2016

same here, insert a newline to replace the one you removed

@@ -823,7 +824,27 @@ def test_multiple_fit(self):
y = [["ham", "spam"][i] for i in LabelEncoder().fit_transform(Y)]
clf.fit(X[:, :-1], y)


nelson-liu (Contributor) commented Aug 8, 2016

here as well

# Make sure raise not fitted error and not the loss='hinge' error
clf = self.factory()
assert_raises_regexp(_NotFittedError,
"This {} instance is not fitted yet. "

nelson-liu (Contributor) commented Aug 8, 2016

Python 2.6 requires numbered fields with str.format, hence why Travis is a bit angry.

yl565 added some commits Aug 8, 2016

@@ -757,6 +764,7 @@ def predict_proba(self):
return self._predict_proba

def _predict_proba(self, X):
check_is_fitted(self, ["t_", "coef_", "intercept_"], all_or_any=all)

raghavrv (Member) commented Aug 8, 2016

Now this gets checked twice no? Do we want that?

yl565 (Author) commented Aug 8, 2016

@nelson-liu Thanks for the comments! Newlines added.

raghavrv (Member) commented Aug 8, 2016

Also why do we even set the fit params like t_ at init? I think we should fix that too.

Additionally, I think we could simplify this by combining _check_proba, _predict_proba and predict_proba into a single method with a decorator. The decorator would be similar to the if_delegate_has_method decorator. It can be made to check the loss and raise AttributeError. That was the fix I had in mind, but I am not sure if that would work as I haven't tried it yet... Maybe you can try that. That would make the code more readable, I feel...

yl565 (Author) commented Aug 9, 2016

@raghavrv Following your idea, I used decorator on a single method predict_proba. Also removed setting t_ and coef_ at init. Seems to be working as expected now?
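The descriptor idea can be sketched without scikit-learn; all names below (TinySGD, _probability_method) are illustrative. The decorator hides predict_proba entirely for unsupported losses, so hasattr duck-typing works, while the not-fitted check stays inside the method:

```python
class NotFittedError(Exception):
    pass


class _probability_method(object):
    """Descriptor: expose the wrapped method only for log-type losses."""
    def __init__(self, fn):
        self.fn = fn

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        if obj.loss not in ("log", "modified_huber"):
            # hasattr(est, 'predict_proba') becomes False for e.g. 'hinge'
            raise AttributeError(
                "probability estimates are not available for loss=%r"
                % obj.loss)
        return lambda X: self.fn(obj, X)


class TinySGD(object):
    def __init__(self, loss="hinge"):
        self.loss = loss

    def fit(self, X, y):
        self.coef_ = [0.0] * len(X[0])
        return self

    @_probability_method
    def predict_proba(self, X):
        if not hasattr(self, "coef_"):
            raise NotFittedError("This TinySGD instance is not fitted yet.")
        return [[0.5, 0.5] for _ in X]
```

Here hasattr(TinySGD(), 'predict_proba') is False for the default hinge loss, while a loss='log' instance exposes the method and raises NotFittedError until fit is called.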

@@ -496,7 +501,7 @@ def partial_fit(self, X, y, classes=None, sample_weight=None):
Returns
-------
self : returns an instance of self.
"""
"""

raghavrv (Member) commented Aug 9, 2016

trailing white space

yl565 (Author) commented Aug 9, 2016

@raghavrv PEP8 problem fixed!

raghavrv (Member) commented Aug 9, 2016

Thanks! When loss='hinge' we would want an AttributeError instead of a NotFittedError, I think, when the estimator is not fitted yet...

Before further reviews - @MechCoder and @agramfort does this approach seem okay?

raghavrv (Member) commented Aug 9, 2016

@yl565 Could you revert back the PEP8 changes unrelated to this PR please?

yl565 (Author) commented Aug 9, 2016

@raghavrv Thanks for all your help and comments! I will revert the PEP8 changes unrelated to this PR later tonight.

When loss='hinge' we would want an AttributeError instead of a NotFittedError error I think when the estimator is not fitted yet...

I have tried that but it will give the following testing error:

======================================================================
FAIL: sklearn.tests.test_common.test_non_meta_estimators('SGDClassifier', <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/yichuanliu/anaconda3/envs/scikit-learn-dev/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/utils/testing.py", line 343, in wrapper
    return fn(*args, **kwargs)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/utils/estimator_checks.py", line 1060, in check_estimators_unfitted
    est.predict_proba, X)
  File "/home/data/Dropbox/Python/scikit-learn/sklearn/utils/testing.py", line 424, in assert_raise_message
    (message, error_message))
AssertionError: Error message does not include the expected string: 'fit'. Observed error message: "probability estimates are not available for loss='hinge'"

raghavrv (Member) commented Aug 9, 2016

While initializing the SGDClassifier for that test, you will have to special case and set loss='log'...

BTW can you check why when loss='hinge', this line passes and invokes the test?

raghavrv (Member) commented Aug 9, 2016

I was thinking along the lines of

import numpy as np

from sklearn.model_selection import GridSearchCV
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

class _CheckProba(object):
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args, **kw):
        return self.fn(*args, **kw)

    def __get__(self, obj, type=None):
        if obj.loss not in ("log", "modified_huber"):
            raise AttributeError("probability estimates are not available for"
                                 " loss=%r" % obj.loss)
        return self.__class__(self.fn.__get__(obj, type))

class Estim(object):
    def __init__(self, loss='hinge'):
        self.loss=loss

    @_CheckProba
    def predict_proba(self, X):
        check_is_fitted(self, 'coef_')
        return np.arange(X.shape[0])

    def fit(self, X, y=None):
        self.coef_ = 1
        return self

    def __repr__(self):
        return "Estim(loss=%r)" % self.loss

Test the following ...

>>> X = [[1, 2], [2, 3]]
>>> y = [0, 1]
>>> hasattr(Estim().fit(X, y), 'predict_proba')
False
>>> hasattr(Estim(loss='log').fit(X, y), 'predict_proba')
True
>>> Estim(loss='log').predict_proba(X)
NotFittedError
fn_check_obj(obj) with obj as the metaestimator instance.
The function will be executed before ducktyping. Typically used for
checking if the estimator has been fitted.

lesteve (Member) commented Sep 6, 2016

I would leave the doctests as they were or add them as tests to sklearn/utils/tests/test_metaestimators.py.

You could also add additional tests in the same file as you did at some point of this PR. Adding tests for if_fitted_delegate_has_method would be good too.

@@ -24,3 +33,33 @@ def test_delegated_docstring():
in str(MockMetaEstimator.func.__doc__))
assert_true("This is a mock delegated function"
in str(MockMetaEstimator().func.__doc__))


def test_stochastic_gradient_loss_param():

lesteve (Member) commented Sep 6, 2016

This test doesn't really belong to test_metaestimators.py maybe somewhere like sklearn/tests/test_grid_search.py would be better.

ogrisel (Member) commented Sep 11, 2016

+1 for moving it to sklearn/model_selection/tests/test_search.py

amueller (Member) commented Sep 6, 2016

Only being able to sniff out supported methods after fitting seems to be somewhat of a problem :( This "only" happens with GridSearchCV and RandomizedSearchCV, right?
This seems to have a weird impact on the estimator tags (#6599), because they are then only defined after fitting... Basically, all possible tags can change during fit based on what steps a grid search put into a pipeline.

jnothman (Member) commented Sep 7, 2016

It happens when capabilities can't be determined from parameters alone. Currently that means GridSearchCV and RandomizedSearchCV, I think.

jnothman (Member) commented Sep 7, 2016

Other estimators which could delegate to a GridSearchCV may be a problem too. A pipeline with a gridsearch as its last step, for instance, has indeterminate predict_proba support until after fitting.

Then in terms of duck-typing I see two choices:

  • only ever test for attributes after fitting
  • specify a set of attributes that can be tested for before fitting, the remainder only after. The same may be said of tags.

For reference, here is a quick-and-dirty listing of attributes we hasattr (on things that mostly aren't data):

1   "Lock"
1   "_idf_diag"
1   "_optimizer"
1   "alpha"
1   "alphas"
1   "args"
1   "compute_labels"
1   "connected_components"
1   "f_cached"
1   "fit_predict"
1   "iteritems"
1   "kind"
1   "lsqr"
1   "min_weight_fraction_leaf"
1   "n_nonzero_coefs"
1   "penalty"
1   "predict_log_proba"
1   "random_state"
1   "score"
1   "sorted_indices"
1   "sparsify"
1   "split"
1   "toarray"
1   attr
1   hasattr(MetaEst(HasNoPredict(
1   hasattr(MetaEst(HasPredict(
1   reg
2   "dtype"
2   "get_feature_names"
2   "get_params"
2   "item"
2   "items"
2   "kernel"
3   "data"
3   "fit_transform"
3   "l1_ratio"
3   "n_iter"
3   "transform"
3   method
4   "fit"
4   "partial_fit"
5   "decision_function"
5   "n_clusters"
5   "predict"
6   "n_components"
8   "predict_proba"

For example: predict_proba is an "only after fit" case. I think predict should be stable over fitting.

grep -oR 'hasattr([^\)]*)' sklearn/ | grep -v tests | grep -v external | grep -v '_\b' | cut -d: -f2 | grep -v 'hasattr(\(X\|t\|x\|v\|y\|np\|self.init\|f\|filename\|alg\|M\|yval\|w\|self.cv\)\b' | grep -v '^Binary' | cut -d' ' -f2 | tr "'" "\"" | sed 's/)$//' | count | sort -n
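The "only after fit" case can be illustrated with a toy search-like meta-estimator; all names here (ToyMeta, WithProba, NoProba) are hypothetical. Before fit it cannot know whether its inner estimator supports predict_proba, so the attribute only appears after fitting:

```python
class WithProba(object):
    def fit(self, X, y):
        return self

    def predict_proba(self, X):
        return [[1.0, 0.0] for _ in X]


class NoProba(object):
    def fit(self, X, y):
        return self


class ToyMeta(object):
    """Search-like wrapper: delegates attributes only after fit."""
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.best_estimator_ = self.estimator.fit(X, y)
        return self

    def __getattr__(self, name):
        # Called only when normal lookup fails; delegate to the fitted
        # sub-estimator if it exists and supports the attribute.
        best = self.__dict__.get("best_estimator_")
        if best is not None and hasattr(best, name):
            return getattr(best, name)
        raise AttributeError(name)
```

Before fit, hasattr(meta, 'predict_proba') is False regardless of the inner estimator; after fit it reflects what the fitted sub-estimator actually supports.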

yl565 (Author) commented Sep 7, 2016

So maybe we apply @if_delegate_has_method to stable method like predict and apply @if_fitted_delegate_has_method to predict_proba? Or we specify in metaestimators.py a list of attributes that should be checked if fitted?

amueller (Member) commented Sep 7, 2016

Sooo we want to warn people if they check hasattr before fit on the ones that are not stable?

predict can be not stable if I grid-search a Pipeline that can have as steps either PCA or SVC.

jnothman (Member) commented Sep 7, 2016

More practically, can we separate the long-term solution from one which supports the case broken in this bug, but does not break anything new? I think the answer is the back-off delegation that was implemented in this PR a few days ago...

On 8 September 2016 at 01:58, Yichuan Liu notifications@github.com wrote:

So maybe we apply @if_delegate_has_method to stable method like predict and apply @if_fitted_delegate_has_method to predict_proba? Or we specify in metaestimators.py a list of attributes that should be checked if fitted?



amueller (Member) commented Sep 9, 2016

@jnothman hm it kinda solves the problem after fit. I'm ok with it. Should we still target that for 0.18?

yl565 (Author) commented Sep 9, 2016

@jnothman Do you think it is better to revert back to 265fc70?

amueller (Member) commented Sep 9, 2016

@yl565 I think that was the suggestion.

This allows ducktyping of the decorated method based on ``attribute_name``.
This allows ducktyping of the decorated method based on
``delegate.attribute_name`` where ``delegate`` is the first item in
``delegate_names`` that is an attribute of the base object.

amueller (Member) commented Sep 9, 2016

I find this sentence hard to understand; maybe make it two sentences. And say "Here delegate is the first item in delegate_names for which hasattr(object, delegate) is true"? Not sure.

yl565 (Author) commented Sep 9, 2016

How about the following:

Using this class to create a decorator for a class method object.attribute_name.
An AttributeError will be raised if none of the delegates (specified in
delegate_names) is an attribute of object (i.e. hasattr(object, delegate )
is False) or none of the delegates has an attribute attribute_name (i.e.
hasattr(delegate, attribute_name) is False).

This allows ducktyping of the decorated method object.attribute_name based on
object.delegate.attribute_name where delegate is the first item in
delegate_names for which hasattr(object, delegate ) is True).

ogrisel (Member) commented Sep 11, 2016

I don't like " for a class method object.attribute_name":

  • attribute_name can be any attribute, not necessarily a method
  • a class method named "object.attribute_name" is weird or even contradictory.

ogrisel (Member) commented Sep 11, 2016

Let's just split the second sentence as @amueller suggested.

ogrisel (Member) commented Sep 11, 2016

Besides the unaddressed comments still visible in the diff view of this PR on GitHub, LGTM.

@@ -14,16 +14,22 @@ class _IffHasAttrDescriptor(object):
"""Implements a conditional property using the descriptor protocol.
Using this class to create a decorator will raise an ``AttributeError``
if the ``attribute_name`` is not present on the base object.
if none of the delegates (specified in ``delegate_names``) is an attribute
of the base object or none of the delegates has an attribute

jnothman (Member) commented Sep 12, 2016

"none of the delegates has" -> "the first found delegate does not have"

ogrisel (Member) commented Sep 12, 2016

@yl565 let me finish this PR for you. We want to have it merged before cutting the 0.18.X release branch tonight.

yl565 (Author) commented Sep 12, 2016

@ogrisel Sorry I couldn't respond earlier; I have been busy preparing a paper. Please go ahead. Thanks.

ogrisel (Member) commented Sep 12, 2016

No problem!

ogrisel added a commit that referenced this pull request Sep 13, 2016

ogrisel (Member) commented Sep 13, 2016

#7401 has been merged. Thanks for the fix @yl565!

@ogrisel ogrisel closed this Sep 13, 2016

rsmith54 pushed a commit to rsmith54/scikit-learn that referenced this pull request Sep 14, 2016

dhimmel added a commit to dhimmel/machine-learning that referenced this pull request Sep 30, 2016

Upgrades: sklearn to 0.18.0, data to v5
Update data to latest available on figshare (v5). Expression and mutation have
not changed since v4, so do not expect changes from this upgrade.

Update sklearn to 0.18.0. See what's new at
http://scikit-learn.org/0.18/whats_new.html. Particularly exciting is:

> Fix incomplete `predict_proba` method delegation from
`model_selection.GridSearchCV` to `linear_model.SGDClassifier`
(scikit-learn/scikit-learn#7159) by Yichuan Liu (@yl565).

`3.TCGA-MLexample_Pathway.ipynb` had incorrect data location. Fixed so all
root notebooks would execute.

TomDLT added a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016

dhimmel added a commit to cognoma/machine-learning that referenced this pull request Oct 4, 2016

Upgrades: sklearn to 0.18.0, data to v5 (#53)
* Upgrades: sklearn to 0.18.0, data to v5


* Rerun with up-to-date local conda env

* Fix import deprecation warning

* Fix deprecation warning with cv grid scores

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
