New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Correcting length of explained_variance_ratio_, eigen solver, final PR #7632

Merged
merged 10 commits into from Oct 25, 2016

Conversation

Projects
None yet
3 participants
@JPFrancoia
Contributor

JPFrancoia commented Oct 10, 2016

Reference Issue

Fix #6032

What does this implement/fix? Explain your changes.

Attribute explained_variance_ratio_ from LinearDiscriminantAnalysis class will be of length n_components (eigen solver).

Any other comments?

This PR follows PR 7616. I mixed up my git history, so it was easier to open a new PR.

JPFrancoia added some commits Oct 10, 2016

Fix issue #6032. Attribute explained_variance_ratio_ from
LinearDiscriminantAnalysis class will be of length n_components, if
provided. If not provided, will have a maximum length of n_classes. The
attribute will have the same length, whatever the solver (SVD, Eigen).
@@ -615,6 +616,9 @@ Linear, kernelized and related models
- Access to public attributes ``.X_`` and ``.y_`` has been deprecated in
:class:`isotonic.IsotonicRegression`. By `Jonathan Arfa`_.
- Length of `explained_variance_ratio` of :clas:`discriminant_analysis.LinearDiscriminantAnalysis`

This comment has been minimized.

@jnothman

jnothman Oct 10, 2016

Member

Still cannot be under 0.18.0

:class:`discriminant_analysis.LinearDiscriminantAnalysis` now returns
correct results. By `JPFrancoia`_
:clas:`discriminant_analysis.LinearDiscriminantAnalysis` now returns correct results.
Attribute `explained_variance_ratio` calculated with SVD and Eigen solver are now of

This comment has been minimized.

@jnothman

jnothman Oct 10, 2016

Member

Still cannot be under 0.18.0

@jnothman jnothman added this to the 0.18.1 milestone Oct 10, 2016

@JPFrancoia

This comment has been minimized.

Contributor

JPFrancoia commented Oct 11, 2016

Ok. Under what should it be then ? We are almost there (and no more test failures).

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 12, 2016

Ok. Under what should it be then ? We are almost there (and no more test failures).

@amueller, should we create a 0.18.1 heading in what's new in master?

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 12, 2016

There's now a 0.18.1 section

JPFrancoia added some commits Oct 12, 2016

Modification of the whats_new file. The changes I made are now under
0.18.1. I created the sub-section "API changes summary" under 0.18.1.
@JPFrancoia

This comment has been minimized.

Contributor

JPFrancoia commented Oct 12, 2016

Ok, I put my changes under 0.18.1. I created the sub-section API changes summary under 0.18.1 too.

@@ -69,6 +69,18 @@ Bug fixes
`#6497 <https://github.com/scikit-learn/scikit-learn/pull/6497>`_
by `Sebastian Säger`_
- Attribute ``explained_variance_ratio`` of :clas:`discriminant_analysis.LinearDiscriminantAnalysis`

This comment has been minimized.

@jnothman

jnothman Oct 12, 2016

Member

You've repeatedly use :clas:. Should be :class:

This comment has been minimized.

@JPFrancoia

JPFrancoia Oct 13, 2016

Contributor

Arf. Sorry to waste your time with a typo.

@@ -69,6 +69,18 @@ Bug fixes
`#6497 <https://github.com/scikit-learn/scikit-learn/pull/6497>`_
by `Sebastian Säger`_
- Attribute ``explained_variance_ratio`` of :class:`discriminant_analysis.LinearDiscriminantAnalysis`

This comment has been minimized.

@jnothman

jnothman Oct 13, 2016

Member

I can waste both of our time a little further by asking that you make the lines somewhat shorter (around 80 chars max) and insert a link to the issue/PR which seems to have been adopted here... Sorry.

This comment has been minimized.

@JPFrancoia

JPFrancoia Oct 13, 2016

Contributor

No problem. Done.

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 13, 2016

Otherwise LGTM

@jnothman jnothman changed the title from Correcting length of explained_variance_ratio_, eigen solver, final PR to [MRG+1] Correcting length of explained_variance_ratio_, eigen solver, final PR Oct 13, 2016

@@ -335,8 +335,15 @@ class scatter). This solver supports both classification and
St = _cov(X, shrinkage) # total scatter
Sb = St - Sw # between scatter
# Get maximum length for explained_variance_ratio_
if self.n_components is None:
max_length_exp = len(self.classes_)

This comment has been minimized.

@amueller

amueller Oct 13, 2016

Member

shouldn't it be len(self.classes_) - 1?

This comment has been minimized.

@JPFrancoia

JPFrancoia Oct 13, 2016

Contributor

I don't think so. If n_components is not provided, then explained_variance_ratio_ will have a length of maximum len(self.classes). So [:max_length_exp]

This comment has been minimized.

@amueller

amueller Oct 14, 2016

Member

Well, that's a bug, the last one will always be zero.

@amueller

amueller requested changes Oct 14, 2016 edited

explained_variance_ratio_ definitely needs to be at most of length n_classes - 1

@JPFrancoia

This comment has been minimized.

Contributor

JPFrancoia commented Oct 15, 2016

Please explain why you think it should be len(self.classes_) - 1, I don't understand what you mean.

If I introduce this change, the tests fail:

nosetests -v sklearn.tests.test_discriminant_analysis
sklearn.tests.test_discriminant_analysis.test_lda_predict ... ok
sklearn.tests.test_discriminant_analysis.test_lda_priors ... ok
sklearn.tests.test_discriminant_analysis.test_lda_coefs ... ok
sklearn.tests.test_discriminant_analysis.test_lda_transform ... ok
sklearn.tests.test_discriminant_analysis.test_lda_explained_variance_ratio ... FAIL
sklearn.tests.test_discriminant_analysis.test_lda_orthogonality ... ok
sklearn.tests.test_discriminant_analysis.test_lda_scaling ... ok
sklearn.tests.test_discriminant_analysis.test_qda ... ok
sklearn.tests.test_discriminant_analysis.test_qda_priors ... ok
sklearn.tests.test_discriminant_analysis.test_qda_store_covariances ... ok
sklearn.tests.test_discriminant_analysis.test_qda_regularization ... ok
sklearn.tests.test_discriminant_analysis.test_deprecated_lda_qda_deprecation ... ok
sklearn.tests.test_discriminant_analysis.test_covariance ... ok

======================================================================
FAIL: sklearn.tests.test_discriminant_analysis.test_lda_explained_variance_ratio
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/djipey/.local/share/virtualenvs/sk/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/djipey/.local/share/virtualenvs/sk/lib/python3.5/site-packages/sklearn/tests/test_discriminant_analysis.py", line 180, in test_lda_explained_variance_ratio
    clf_lda_eigen.explained_variance_ratio_)
  File "/home/djipey/.local/share/virtualenvs/sk/lib/python3.5/site-packages/numpy/testing/utils.py", line 918, in assert_array_almost_equal
    precision=decimal)
  File "/home/djipey/.local/share/virtualenvs/sk/lib/python3.5/site-packages/numpy/testing/utils.py", line 694, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 6 decimals

(shapes (3,), (2,) mismatch)
 x: array([  6.037955e-01,   3.962045e-01,   1.787350e-32])
 y: array([ 0.603796,  0.396204])

----------------------------------------------------------------------
Ran 13 tests in 0.112s

FAILED (failures=1)
@amueller

This comment has been minimized.

Member

amueller commented Oct 17, 2016

The test fails because it's also wrong for the other solver imho.

My understanding is that the explained_variance_ratio_ refers to how much variance each of the components explains. However, there can only be n_classes - 1 components, as the components are spanned by the means of the classes.

@JPFrancoia

This comment has been minimized.

Contributor

JPFrancoia commented Oct 20, 2016

You are right (I double-checked, just to be sure). I fixed it for the 2 solvers, and I updated the tests so in the future the length of this attribute will be checked.

- Length of `explained_variance_ratio` of
:class:`discriminant_analysis.LinearDiscriminantAnalysis`
is now the same for Eigen and SVD solvers. (`#7632

This comment has been minimized.

@amueller

amueller Oct 20, 2016

Member

I think you should be more explicit by saying that it changed for both and is now min(n_components, n_classes - 1).

@amueller

This comment has been minimized.

Member

amueller commented Oct 20, 2016

can you maybe add an attribute n_components_ that is computed in fit and just use that? It should also be used in transform which currently has a kind of nonsensical check.

@JPFrancoia

This comment has been minimized.

Contributor

JPFrancoia commented Oct 20, 2016

There is already an attribute self.n_components. It might be ambigious. Maybe I can call it max_components_ ?

@amueller

This comment has been minimized.

Member

amueller commented Oct 20, 2016

Not really. We sometimes have attributes with names that are the same as parameters (but with a trailing underscore) if the "real" value is computed from the training data. We could do self._n_components, too, if you want to make it private for now.

Creation of the private attribute _max_components in
LinearDiscriminantAnalysis class. Used to determinate the maximum number
of components returned by the class. Also used for the lenght of
explained_variance_ratio_. See PR 7632.

Fixed some stuff to be more PEP8.
@JPFrancoia

This comment has been minimized.

Contributor

JPFrancoia commented Oct 23, 2016

Hum ok, I decided to call it _max_components, because it is really more explicit. This attribute is only used to store the maximum number of components of the classifier, so let's call a cat a cat.

I made it private for now. Let me know if you agree with these changes.

@amueller

This comment has been minimized.

Member

amueller commented Oct 24, 2016

hm I'm not sure why you feel _max_components is more apt than _n_components. If n_components < n_classes - 1 it's not really the max, right? But I don't really have a strong opinion, it's private anyhow.

@amueller

This comment has been minimized.

Member

amueller commented Oct 24, 2016

LGTM :)

@amueller

This comment has been minimized.

Member

amueller commented Oct 24, 2016

(There's one +1 on the current version which is from me. I think @jnothman's doesn't apply any more because there were substantial changes. maybe he has time to review again)

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 25, 2016

LGTM

@jnothman jnothman merged commit f260898 into scikit-learn:master Oct 25, 2016

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment