ENH Warn future change of default init in NMF #18525

Merged
merged 26 commits into from Oct 15, 2020

Conversation

cmarmo
Member

@cmarmo cmarmo commented Oct 2, 2020

Reference Issues/PRs

Closes #18505

What does this implement/fix? Explain your changes.

  • Make 'nndsvda' default init in tests.
  • Prepare default change in 0.26

Any other comments?

To make the tests pass I had to increase the tolerance of the float32/float64 consistency check.

Ping @jeremiedbb, @TomDLT

Member

@TomDLT TomDLT left a comment

It should be noted that the current PR will break backward compatibility:

  • raising an error when using solver="mu" and init="nndsvd" explicitly
  • changing the results with both solvers when using init=None. The changes will probably be small with the "cd" solver, but might be larger with the "mu" solver (though this might be considered a bugfix).
    I am fine with it, but other maintainers might disagree.

Other suggested changes:

  • add "Not available for solver="mu"." in the nndsvd section of the docstrings (in _initialize_nmf, NMF, and non_negative_matrix_factorization).
  • add a whats_new entry, including in the "Changed models" section.

Another possibility is to keep allowing init="nndsvd" with solver="mu" (the fixed sparsity might be considered as a kind of regularization), keep the warning, and change the behavior of init=None through a deprecation process.

Not sure what is the best solution, to be honest.

sklearn/decomposition/_nmf.py Outdated Show resolved Hide resolved
sklearn/decomposition/tests/test_nmf.py Outdated Show resolved Hide resolved
sklearn/decomposition/tests/test_nmf.py Outdated Show resolved Hide resolved
@GaelVaroquaux
Member

If we go ahead with this change, a clear note should be added to whats_new.

@GaelVaroquaux
Member

The proposed backward-incompatible change may be against our guidelines. The problem is that people need to be able to update scikit-learn without worrying about breakage.

An argument for going ahead with such a backward-incompatible change would be that the prior behavior should be considered a bug. In other words, if it often leads to significant breakage, it can be changed without a compatibility shim.

Trying to summarize things to make sure that I understand: in the current settings, the "transform" is broken by default, and the change is required to fix things? Am I correct?

@jeremiedbb
Member

Trying to summarize things to make sure that I understand: in the current settings, the "transform" is broken by default, and the change is required to fix things? Am I correct?

This will not completely fix the fit(...).transform(...) vs fit_transform(...) issue. It's just a better default for the multiplicative update solver because it avoids zeros in the init.

@jeremiedbb
Member

This PR currently changes the default for both solvers. Shouldn't it be only for the 'mu' solver?

I'm in favor of keeping the warning and changing the default for NMF through a deprecation cycle. (Note that for MiniBatchNMF you can already set the good default.)

@cmarmo cmarmo changed the title from "[MRG] Make 'nndsvdar' default init for 'mu' solver." to "[WIP] Make 'nndsvdar' default init for 'mu' solver." on Oct 4, 2020
@ogrisel
Member

ogrisel commented Oct 5, 2020

I think I like the last suggestion of @TomDLT's #18525 (review) most:

Another possibility is to keep allowing init="nndsvd" with solver="mu" (the fixed sparsity might be considered as a kind of regularization), keep the warning, and change the behavior of init=None through a deprecation process.

@ogrisel
Member

ogrisel commented Oct 5, 2020

I believe this is the same solution as @jeremiedbb's #18525 (comment).

@cmarmo cmarmo changed the title from "[WIP] Make 'nndsvdar' default init for 'mu' solver." to "[MRG] Deprecate 'nndsvd' default init for 'mu' solver." on Oct 5, 2020
@cmarmo
Member Author

cmarmo commented Oct 5, 2020

Thanks for all your comments.
I have reverted my changes and deprecated init='nndsvd'.

Member

@jeremiedbb jeremiedbb left a comment

Maybe we could take the opportunity to rename it "auto" (in 2 versions).

You also need to change the default in _initialize_nmf, and maybe elsewhere.
Could we also make the warning visible (i.e., even when init is None) in _check_string_parameters?
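
For readers unfamiliar with the deprecation cycle discussed here, the idea can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `resolve_init` is a hypothetical helper, the warning text is invented, and the sketch ignores any dependence of the default on the shape of the data.

```python
import warnings


def resolve_init(init=None):
    """Hypothetical helper: pick the init method, warning about a future default.

    While the deprecation cycle is running, init=None still maps to the old
    default ('nndsvd'), but the user is told that it will change to 'nndsvda'.
    """
    if init is None:
        warnings.warn(
            "The default value of init will change from 'nndsvd' to "
            "'nndsvda' in a future release. Set init explicitly to "
            "silence this warning.",
            FutureWarning,
        )
        return "nndsvd"  # old behavior is kept until the cycle ends
    return init  # an explicit choice never triggers the warning


# Explicit values pass through silently; None warns and keeps the old default.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen_default = resolve_init(None)
    chosen_explicit = resolve_init("nndsvda")

print(chosen_default, chosen_explicit, len(caught))
```

Users who already pass an explicit `init` see no change, which is the point of going through a warning first instead of switching the default directly.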

sklearn/decomposition/_nmf.py Outdated Show resolved Hide resolved
Member

@TomDLT TomDLT left a comment

I think we should change from 'nndsvd' to 'nndsvda' for all solvers. Indeed, I think it makes almost no difference for the "cd" solver, and it simplifies the behavior/documentation.

sklearn/decomposition/_nmf.py Outdated Show resolved Hide resolved
sklearn/decomposition/_nmf.py Outdated Show resolved Hide resolved
@cmarmo cmarmo changed the title from "[MRG] Deprecate 'nndsvd' default init for 'mu' solver." to "[MRG] Deprecate 'nndsvd' default init." on Oct 6, 2020
sklearn/decomposition/_nmf.py Outdated Show resolved Hide resolved
sklearn/decomposition/_nmf.py Outdated Show resolved Hide resolved
sklearn/utils/estimator_checks.py Outdated Show resolved Hide resolved
@TomDLT
Member

TomDLT commented Oct 8, 2020

The (non negative) SVD initialization:

  • is used directly when init="nndsvd"
  • changes zeros into X.mean() when init="nndsvda"
  • changes zeros into random values when init="nndsvdar"

Importantly, the "mu" solver is based on multiplicative updates, so zeros in the initialization remain zeros during optimization. This is why init="nndsvd" should not be the default, and it is the reason for this PR.
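
The point about zeros can be checked with a small NumPy sketch. It is a toy under stated assumptions, not scikit-learn's implementation: `mu_step` is a plain multiplicative update for the Frobenius objective, `fill_zeros` is a hypothetical helper that mimics how the three variants treat zeros, and the `avg / 100` scale for the random fill is an assumption.

```python
import numpy as np


def fill_zeros(A, X, init, rng):
    """Hypothetical helper mimicking how each variant treats zeros in A."""
    A = A.copy()
    avg = X.mean()
    if init == "nndsvda":
        A[A == 0] = avg  # zeros become the mean of X
    elif init == "nndsvdar":
        n_zeros = int((A == 0).sum())
        # zeros become small random positive values (scale is an assumption)
        A[A == 0] = np.abs(avg * rng.standard_normal(n_zeros) / 100)
    return A  # "nndsvd": zeros are kept as they are


def mu_step(X, W, H, eps=1e-12):
    """One multiplicative update of H, then W, for the Frobenius objective."""
    H = H * (W.T @ X) / (W.T @ W @ H + eps)
    W = W * (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def run_mu(X, W, H, n_iter=50):
    for _ in range(n_iter):
        W, H = mu_step(X, W, H)
    return W, H


rng = np.random.default_rng(0)
X = rng.random((6, 5))
W0 = rng.random((6, 2))
H0 = rng.random((2, 5))
W0[0, 0] = 0.0  # pretend the SVD-based init produced exact zeros
H0[1, 2] = 0.0

# init="nndsvd": every update multiplies the entry by a finite factor,
# so an entry that starts at zero can never leave zero.
W_nndsvd, H_nndsvd = run_mu(X, W0, H0)
print(W_nndsvd[0, 0], H_nndsvd[1, 2])

# init="nndsvda": the same entries start at X.mean() and are free to move.
W_a, H_a = run_mu(
    X, fill_zeros(W0, X, "nndsvda", rng), fill_zeros(H0, X, "nndsvda", rng)
)
print(W_a[0, 0], H_a[1, 2])
```

With the unfilled initialization the two entries stay at zero after 50 updates, which is the fixed sparsity pattern mentioned earlier in the thread.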

@cmarmo
Member Author

cmarmo commented Oct 9, 2020

I need some direction here.
The test test_nmf_regularization, with solver='mu' and l1_ratio=1., fails when NMF is initialized with 'nndsvda' but passes when it is initialized with 'nndsvd'. I have to increase alpha to 2.0 to make the test pass again for all the combinations of solver and l1_ratio. The failing assertion is:

assert W_model.mean() > W_regul.mean()

I guess this is because (quoting @TomDLT) the initialization "changes zeros into X.mean() when init='nndsvda'"?
Should I keep 'nndsvd' here, since we want to separate the effect of the initialization from that of the regularization?

@jeremiedbb
Member

jeremiedbb commented Oct 9, 2020

It's failing for both 'cd' and 'mu'. I wonder if the test is correct. I don't think it's guaranteed that the mean of W and the mean of H should both be smaller when using l2 reg. Don't you think we should instead check that ||W||^2 + ||H||^2 is smaller when using l2 reg?

@cmarmo
Member Author

cmarmo commented Oct 9, 2020

I wonder if the test is correct

I'm not the right person to ask... :)
Anyway, increasing max_iter fixes the other failing tests, so test_nmf_regularization is the last one to check.

@TomDLT
Member

TomDLT commented Oct 9, 2020

I don't think it's guaranteed that the mean of W and the mean of H should both be smaller when using l2 reg. Don't you think we should check that ||W||^2 + ||H||^2 is smaller when using l2 reg instead ?

You are right, the test is wrong, we should check what you suggest.

  • The test is looking at W.mean() (L1 norm) whereas it should consider squared_norm(W) (L2 norm).
  • The test is looking at W and H independently, while it should consider the sum of their norms. To support this, you can see in the first figure in [MRG] Add scaling to alpha regularization parameter in NMF #5296 that increasing the regularization can sometimes increase the mean of either W or H, but not both.
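
A toy example, not the test that was changed in this PR, shows why the per-factor mean is a poor check. The matrices and the helper `squared_norm_sum` below are made up for illustration: two factorizations give the same product W @ H, yet the one with the smaller ||W||^2 + ||H||^2 has the larger mean of W.

```python
import numpy as np


def squared_norm_sum(W, H):
    """Hypothetical helper: ||W||^2 + ||H||^2 with squared Frobenius norms."""
    return float(np.sum(W ** 2) + np.sum(H ** 2))


# Two factorizations of the same matrix, differing only by a rescaling.
W_a = np.ones((2, 2))
H_a = 4.0 * np.ones((2, 2))
W_b = 2.0 * W_a
H_b = H_a / 2.0

same_product = np.allclose(W_a @ H_a, W_b @ H_b)
print(same_product)                      # True

# The mean of W goes up ...
print(W_a.mean(), W_b.mean())            # 1.0 2.0

# ... while the quantity an L2 penalty acts on goes down.
print(squared_norm_sum(W_a, H_a), squared_norm_sum(W_b, H_b))  # 68.0 32.0
```

An L2 penalty would prefer the second factorization, so an assertion that the mean of W must shrink under regularization fails here even though the regularizer is doing its job.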

@cmarmo
Member Author

cmarmo commented Oct 9, 2020

Thanks @jeremiedbb and @TomDLT, I will change the test then: is this worth a new PR?

@jeremiedbb
Member

It's fine to do it in this PR.

@cmarmo cmarmo changed the title from "[WIP] Deprecate 'nndsvd' default init." to "[MRG] Deprecate 'nndsvd' default init." on Oct 11, 2020
Member

@ogrisel ogrisel left a comment

It seems that all the comments have been addressed. LGTM. Thanks for the fix @cmarmo and thanks @TomDLT for the interesting comments.

@TomDLT TomDLT changed the title from "[MRG] Deprecate 'nndsvd' default init." to "ENH Warn future change of default init in NMF" on Oct 15, 2020
@TomDLT TomDLT merged commit fe654ba into scikit-learn:master Oct 15, 2020
5 checks passed
@TomDLT
Member

TomDLT commented Oct 15, 2020

Thanks @cmarmo !

@cmarmo cmarmo deleted the nmf-init branch October 16, 2020 06:28
amrcode pushed a commit to amrcode/scikit-learn that referenced this pull request Oct 19, 2020
@vene
Member

vene commented Oct 20, 2020

Thanks for this and for catching the incorrect test! Just a note, there is a leftover comment in the test file that is now outdated (https://github.com/scikit-learn/scikit-learn/pull/18525/files#diff-e809907f1fe6840a81b7b9b9f1d0ef494052df97d328eeb25ae3e0495e238575R470) and it might confuse devs in the future.

(I probably introduced this mistake...)

@cmarmo
Member Author

cmarmo commented Oct 20, 2020

Just a note, there is a leftover comment in the test file that is now outdated (https://github.com/scikit-learn/scikit-learn/pull/18525/files#diff-e809907f1fe6840a81b7b9b9f1d0ef494052df97d328eeb25ae3e0495e238575R470) and it might confuse devs in the future.

Right! Thanks @vene! I will fix it in #16948 if there are no objections.

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Development

Successfully merging this pull request may close these issues.

NMF default init with 'mu' solver
6 participants