
[MRG+1] Fixes #10284 Added store_cv_values to RidgeClassifierCV and a test. #10297

Merged
Changes from 3 commits
15 changes: 11 additions & 4 deletions sklearn/linear_model/ridge.py
@@ -1301,10 +1301,15 @@ class RidgeClassifierCV(LinearClassifierMixin, _BaseRidgeCV):
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``

store_cv_values : boolean, default=False
Flag indicating if the cross-validation values corresponding to
each alpha should be stored in the `cv_values_` attribute (see
below). This flag is only compatible with `cv=None` (i.e. using
Member: Double backticks
Contributor Author: OK! I didn't touch this; I copied it from RidgeCV. Should I also change the documentation for store_cv_values_ and cv_values_ in other parts of the file, to keep consistency?

Member: I suppose so

Generalized Cross-Validation).

Attributes
----------
cv_values_ : array, shape = [n_samples, n_alphas] or \
shape = [n_samples, n_responses, n_alphas], optional
cv_values_ : array, shape = [n_samples, n_targets, n_alphas], optional
Cross-validation values for each alpha (if `store_cv_values=True` and
`cv=None`). After `fit()` has been called, this attribute will contain \
the mean squared errors (by default) or the values of the \
@@ -1332,11 +1337,13 @@ class RidgeClassifierCV(LinearClassifierMixin, _BaseRidgeCV):
a one-versus-all approach. Concretely, this is implemented by taking
advantage of the multi-variate response support in Ridge.
"""

def __init__(self, alphas=(0.1, 1.0, 10.0), fit_intercept=True,
normalize=False, scoring=None, cv=None, class_weight=None):
normalize=False, scoring=None, cv=None, class_weight=None,
store_cv_values=False):
super(RidgeClassifierCV, self).__init__(
alphas=alphas, fit_intercept=fit_intercept, normalize=normalize,
scoring=scoring, cv=cv)
scoring=scoring, cv=cv, store_cv_values=store_cv_values)
self.class_weight = class_weight

def fit(self, X, y, sample_weight=None):
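For context on the "Generalized Cross-Validation" the docstring above refers to: with `cv=None`, RidgeCV-style estimators compute leave-one-out errors efficiently for every alpha at once, which is what `cv_values_` stores. The following is a minimal NumPy-only sketch of that idea, not scikit-learn's actual implementation; `loo_ridge_errors` is a hypothetical helper, and it ignores the intercept and sample weights.

```python
import numpy as np


def loo_ridge_errors(X, y, alphas):
    # Squared leave-one-out errors for ridge regression, one column per
    # alpha -- mirroring the (n_samples, n_alphas) array stored in
    # cv_values_ when store_cv_values=True and cv=None.
    # Uses the exact LOO shortcut e_i = (y_i - yhat_i) / (1 - H_ii),
    # where H = X (X^T X + alpha * I)^{-1} X^T is the hat matrix.
    n_samples, n_features = X.shape
    errors = np.empty((n_samples, len(alphas)))
    for j, alpha in enumerate(alphas):
        H = X @ np.linalg.inv(X.T @ X + alpha * np.eye(n_features)) @ X.T
        residuals = y - H @ y
        errors[:, j] = (residuals / (1.0 - np.diag(H))) ** 2
    return errors


rng = np.random.RandomState(42)
X = rng.randn(8, 5)
y = rng.randn(8)
errors = loo_ridge_errors(X, y, alphas=[0.1, 1.0, 10.0])
print(errors.shape)  # (8, 3)
```

The shortcut is exact for ridge without an intercept: removing sample i and refitting gives the same prediction for x_i as dividing the in-sample residual by (1 - H_ii).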
44 changes: 37 additions & 7 deletions sklearn/linear_model/tests/test_ridge.py
@@ -49,8 +49,11 @@
X_iris = sp.csr_matrix(iris.data)
y_iris = iris.target

DENSE_FILTER = lambda X: X
SPARSE_FILTER = lambda X: sp.csr_matrix(X)

def DENSE_FILTER(X): return X
Member: While I said, sure, get rid of the extra rng =, in general we avoid changing unrelated things in PRs. They make it harder to review and may cause merge conflicts with other work.

Contributor Author: Sorry, pep8 made that change. I usually revert those changes, but I missed this one.

Member: Reverse this



def SPARSE_FILTER(X): return sp.csr_matrix(X)
Member: Could you reverse that?



def test_ridge():
@@ -352,7 +355,7 @@ def _test_ridge_loo(filter_):
assert_equal(ridge_gcv2.alpha_, alpha_)

# check that we get same best alpha with custom score_func
func = lambda x, y: -mean_squared_error(x, y)
def func(x, y): return -mean_squared_error(x, y)
Member: reverse this change

scoring = make_scorer(func)
ridge_gcv3 = RidgeCV(fit_intercept=False, scoring=scoring)
f(ridge_gcv3.fit)(filter_(X_diabetes), y_diabetes)
@@ -576,7 +579,7 @@ def test_class_weights_cv():

def test_ridgecv_store_cv_values():
# Test _RidgeCV's store_cv_values attribute.
rng = rng = np.random.RandomState(42)
rng = np.random.RandomState(42)

n_samples = 8
n_features = 5
@@ -592,10 +595,37 @@ def test_ridgecv_store_cv_values():
assert_equal(r.cv_values_.shape, (n_samples, n_alphas))

# with len(y.shape) == 2
n_responses = 3
y = rng.randn(n_samples, n_responses)
n_targets = 3
y = rng.randn(n_samples, n_targets)
r.fit(x, y)
assert_equal(r.cv_values_.shape, (n_samples, n_targets, n_alphas))
Member: assert r.cv_values_.shape == (n_samples, n_targets, n_alphas)



def test_ridge_classifier_cv_store_cv_values():
# Test RidgeClassifier
Member: remove this comment

x = np.array([[-1.0, -1.0], [-1.0, 0], [-.8, -1.0],
[1.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, 1, -1, -1])

n_samples = x.shape[0]

alphas = [1e-1, 1e0, 1e1]
n_alphas = len(alphas)

r = RidgeClassifierCV(alphas=alphas, store_cv_values=True)

# with len(y.shape) == 1
Member: len(y.shape) == 2.

n_targets = 1
r.fit(x, y)
assert_equal(r.cv_values_.shape, (n_samples, n_targets, n_alphas))
Member: plain assert


# with len(y.shape) == 2
Member: But this is the case with both tests. What do you intend to distinguish between the different ys?

Contributor Author (@mabelvj, Dec 14, 2017): I was trying to replicate the test for RidgeCV:

def test_ridgecv_store_cv_values():
    # Test _RidgeCV's store_cv_values attribute.
    rng = np.random.RandomState(42)

    n_samples = 8
    n_features = 5
    x = rng.randn(n_samples, n_features)
    alphas = [1e-1, 1e0, 1e1]
    n_alphas = len(alphas)

    r = RidgeCV(alphas=alphas, store_cv_values=True)

    # with len(y.shape) == 1
    y = rng.randn(n_samples)
    r.fit(x, y)
    assert_equal(r.cv_values_.shape, (n_samples, n_alphas))

    # with len(y.shape) == 2
    n_responses = 3
    y = rng.randn(n_samples, n_responses)
    r.fit(x, y)
    assert_equal(r.cv_values_.shape, (n_samples, n_responses, n_alphas))

However, for the first case (# with len(y.shape) == 1) I found that RidgeClassifierCV always returns cv_values_ with 3 dimensions. That's part of the confusion.

I guess the two tests are unnecessary then.

Contributor Author: Done! Removed it.

y = np.array([[1, 1, 1, -1, -1],
[1, -1, 1, -1, 1],
[-1, -1, 1, -1, -1]]).transpose()
n_targets = y.shape[1]
r.fit(x, y)
assert_equal(r.cv_values_.shape, (n_samples, n_responses, n_alphas))
assert_equal(r.cv_values_.shape, (n_samples, n_targets, n_alphas))
Member: plain assert
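For context on the thread above: RidgeClassifierCV encodes its class labels as {-1, +1} regression targets before delegating to the Ridge machinery (one column per class for multiclass problems, a single column for binary ones), which is why cv_values_ always has three dimensions even when y is 1-D. A minimal sketch of that encoding, where `binarize_targets` is a hypothetical stand-in for the label binarization the classifier performs:

```python
import numpy as np


def binarize_targets(y):
    # Mimic the {-1, +1} target encoding used before fitting Ridge:
    # binary y -> a single column; multiclass y -> one column per class.
    classes = np.unique(y)
    if classes.size == 2:
        return np.where(y == classes[1], 1.0, -1.0)[:, None]
    return np.where(y[:, None] == classes[None, :], 1.0, -1.0)


y_binary = np.array([1, 1, 1, -1, -1])
print(binarize_targets(y_binary).shape)  # (5, 1) -> n_targets == 1

y_multi = np.array([0, 1, 2, 1, 0])
print(binarize_targets(y_multi).shape)   # (5, 3) -> one column per class
```

This matches the shapes asserted in the tests: (n_samples, 1, n_alphas) for the binary y above, and (n_samples, n_targets, n_alphas) in general.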



def test_ridgecv_sample_weight():