
MRG, API: Deprecate max_pca_components #8351

Merged: 12 commits into mne-tools:master on Oct 12, 2020

Conversation

@larsoner (Member) commented Oct 7, 2020

Todo:

  • Deprecate max_pca_components
  • Get test_ica.py working
  • Add n_pca_components to plot_ica_overlay
  • Make keyword-only after #8350 (MRG, BUG: Force keywords for verbose in short sigs) is in
  • Update plot_40_artifact_correction_ica.py
  • Add warning when n_components leads to unstable pseudoinversion
  • Update doc/_static/diagrams/ica.dot
  • Decide whether n_components as a float selects based on the variance explained by the n_pca_components retained components. This is how max_pca_components worked (it was relative to that)

Closes #8337
Closes #8342
Closes #8327
Closes #8328
Closes #8316
Closes #8313
Closes #7727
Closes #8353

@larsoner changed the title from "API: Deprecate max_pca_components" to "MRG, API: Deprecate max_pca_components" on Oct 7, 2020
@larsoner (Member, Author) commented Oct 7, 2020

Okay this is ready for review/merge from my end

@larsoner (Member, Author) commented Oct 7, 2020

The Travis failure is an unrelated conda-forge Python 3.9 bug that's hitting all PRs/branches.

@agramfort (Member) left a comment

@hoechenberger and @cbrnr you might want to review and test this branch before it gets merged

mne/preprocessing/ica.py: 4 review comments (outdated, resolved)
@agramfort (Member) commented Oct 8, 2020 via email

@larsoner (Member, Author) commented Oct 8, 2020

should break. you cannot limit the number of PCA components and require all the variance to be explained for ICA.

It depends on whether you're talking about wanting n_components=1. to mean the fraction of explained variance of the N PCA components (which is how it works in master with max_pca_components) or the fraction of the total variance of your original data (what you're saying/assuming). I can certainly code it that way, and if I were designing it from scratch this is probably how I would do it, but I suspect it would break more people's code than the way I have it coded now.
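For concreteness, a toy illustration of the two readings (plain NumPy, not MNE code; all numbers are made up):

```python
import numpy as np

eigvals = np.array([5.0, 3.0, 1.0, 0.5, 0.3, 0.2])  # hypothetical PCA variances
n_pca_components = 4                                 # pretend only 4 are retained

kept = eigvals[:n_pca_components]
cum_vs_kept = np.cumsum(kept) / kept.sum()      # relative to the 4 kept components
cum_vs_total = np.cumsum(kept) / eigvals.sum()  # relative to all of the original data

print(cum_vs_kept)   # ~[0.53, 0.84, 0.95, 1.00]
print(cum_vs_total)  # ~[0.50, 0.80, 0.90, 0.95]
# A threshold like n_components=0.93 keeps 3 components under the first reading
# but 4 under the second, so the choice isn't cosmetic.
```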

@larsoner (Member, Author) commented Oct 8, 2020

(It certainly breaks more tests, but those are quick enough to fix)

@cbrnr (Contributor) commented Oct 8, 2020

Wait, are we really distinguishing 1 from 1.0? That's a bit unexpected, so can we find a better solution for this edge case? (Wanting just 1 component should rarely be needed, whereas all components corresponding to a fraction of 1.0 should be the default.)

@larsoner (Member, Author) commented Oct 8, 2020

Wait, are we really distinguishing 1 from 1.0

Yes, float and int are different. I can add a test that n_components=1 will raise an error (it already does at least when selecting by explained variance; I can make sure it does for the int case too).

whereas all components corresponding to a fraction of 1.0 should be the default

This is the default, as n_components=None means "use all".
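For readers following along, a rough sketch of the three kinds of values (the exact float semantics are what this thread is deciding; the numbers are placeholders):

```python
from mne.preprocessing import ICA

# Integer: keep exactly this many components.
ica_int = ICA(n_components=15)

# Float in (0, 1): keep as many components as needed to explain at least this
# fraction of variance (exactly which variance that fraction refers to is the
# point being debated above).
ica_float = ICA(n_components=0.99)

# None (the default): "use all", subject to whatever the PCA step retains.
ica_none = ICA(n_components=None)
```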

@hoechenberger (Member) commented

@larsoner

It depends on whether you're talking about wanting n_components=1. to mean the fraction of explained variance of the N PCA components (which is how it works in master with max_pca_components) or the fraction of the total variance of your original data (what you're saying/assuming). I can certainly code it that way, and if I were designing it from scratch this is probably how I would do it, but I suspect it would break more people's code than the way I have it coded now.

If we're re-thinking these questions from scratch anyway, I'd prefer it if n_components: float referred to the fraction of total explained variance. So we'd need to check if n_components: float <= n_pca_components.

I cannot really imagine anyone wanting a fraction of a fraction?

@hoechenberger (Member) commented

Wait, are we really distinguishing 1 from 1.0

Yes, float and int are different. I can add a test that n_components=1 will raise an error (it already does at least when selecting by explained variance; I can make sure it does for the int case too).

I believe this would lead to a cleaner API, yes. Good idea.

@cbrnr (Contributor) commented Oct 8, 2020

What if I do want to select just one component?

@hoechenberger (Member) commented

What if I do want to select just one component?

Is that an actual use case or just a theoretical consideration?

@cbrnr (Contributor) commented Oct 8, 2020

Is that an actual use case or just a theoretical consideration?

It's theoretical, but I'm pretty sure someone will want to use it for a good reason.

Also, it is pretty unusual behavior to treat 1 and 1.0 differently: even though the types are different, 1 == 1.0, and I'd prefer a solution where we can avoid this case.

@hoechenberger (Member) commented

But OK, allowing 1 as an int probably won't hurt anyone either.

@cbrnr (Contributor) commented Oct 8, 2020

But OK, allowing 1 as an int probably won't hurt anyone either.

If both 1 and 1.0 do the same thing, then yes.

@larsoner (Member, Author) commented Oct 8, 2020

If both 1 and 1.0 do the same thing, then yes.

Can you open a separate issue about this? Let's solve it and discuss it elsewhere.

@agramfort (Member) commented Oct 8, 2020 via email

@larsoner larsoner marked this pull request as ready for review October 8, 2020 13:59
@larsoner (Member, Author) left a comment

  • docs updated
  • killed 1. (float) and 1 (int) to be safe
  • updated behavior of n_components=None (see inline comment)
  • made n_components float value operate on total explained variance, not relative to n_pca_components

Comment on lines 154 to 158

    - ``None``
        ``n_pca_components`` or ``0.999999`` will be used, whichever
        results in fewer components. This is done to avoid numerical
        stability problems when whitening.

@larsoner (Member, Author) commented

This is an important change/fix I think. The defaults of n_pca_components=None and n_components=None are still dangerous for rank-reduced data. This makes our defaults safe for rank-reduced data.
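(To make the rank-reduction concern concrete, a tiny illustrative sketch in plain NumPy, not MNE code; the 0.999999 cap is the one from the docstring quoted above:)

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((10, 1000))    # 10 hypothetical channels
data -= data.mean(axis=0, keepdims=True)  # average reference -> rank drops to 9

eigvals = np.linalg.eigvalsh(np.cov(data))[::-1]  # sorted descending
cum = np.cumsum(eigvals) / eigvals.sum()

# The last eigenvalue is ~0, so whitening all 10 components would divide by ~0.
# Capping at 0.999999 of the explained variance drops the degenerate component:
n_keep = np.searchsorted(cum, 0.999999) + 1
print(n_keep)  # 9 for this toy example
```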

@larsoner (Member, Author) commented Oct 9, 2020

Why do we insist on the float values referring to more than the specified explained variance?

To be consistent with sklearn:

"If 0 < n_components < 1 and svd_solver == 'full', select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components"

For floats you generally shouldn't rely on exact equality anyway, so in practice this is a tiny (hopefully meaningless) distinction.
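(A quick illustrative check of the scikit-learn convention being matched here, using random data; not code from this PR:)

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 8))  # correlated toy data

# With a float n_components and svd_solver='full', sklearn keeps the smallest
# number of components whose cumulative explained variance exceeds the threshold.
pca = PCA(n_components=0.9, svd_solver='full').fit(X)
print(pca.n_components_)                           # number of components kept
print(pca.explained_variance_ratio_.cumsum()[-1])  # > 0.9 by construction
```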

@hoechenberger (Member) left a comment

Proposal to make the docs a little easier to understand, otherwise looks super great and good to merge!

mne/preprocessing/ica.py: 1 review comment (outdated, resolved)
@hoechenberger (Member) commented

@cbrnr Would be great to get your feedback / approval too before merging. Thanks!

@hoechenberger (Member) commented

Friendly ping @cbrnr

@cbrnr (Contributor) commented Oct 12, 2020

I'll take a look within the next hour or so!

@hoechenberger (Member) commented

Great, thank you

@cbrnr (Contributor) commented Oct 12, 2020

Where can I find the updated rendered ICA diagram?

@hoechenberger (Member) commented

Where can I find the updated rendered ICA diagram?

The doc artifact is at https://22659-1301584-gh.circle-artifacts.com/0/dev/index.html

@cbrnr (Contributor) left a comment

LGTM! I've tested this with one of my current data sets, and everything seems to work as before.

I have one comment that we could also discuss in a separate issue, but I wanted to bring it up here. Do we really need n_pca_components in ICA.__init__? As far as I can see, this parameter is neither used during initialization nor fitting. It is used in ICA.apply, which also has an n_pca_components parameter, which when set overrides the value passed in the initializer (I think). Wouldn't our API be much cleaner and easier to understand if we dropped n_pca_components from __init__ and just kept it in apply? Or am I missing something and we actually need to pass it when constructing the ICA object? Again, we don't have to discuss this here, so feel free to merge this PR and discuss this in a separate issue.

doc/changes/latest.inc: 1 review comment (outdated, resolved)
@hoechenberger (Member) commented Oct 12, 2020

@cbrnr

I have one comment that we could also discuss in a separate issue, but I wanted to bring it up here. Do we really need n_pca_components in ICA.__init__? As far as I can see, this parameter is neither used during initialization nor fitting. It is used in ICA.apply, which also has an n_pca_components parameter, which when set overrides the value passed in the initializer (I think). Wouldn't our API be much cleaner and easier to understand if we dropped n_pca_components from __init__ and just kept it in apply? Or am I missing something and we actually need to pass it when constructing the ICA object?

You're absolutely right, it's only used during apply(). @larsoner and I had a brief discussion somewhere (too lazy to look it up now) and figured it might be best to just keep it in __init__() so you can set it right when instantiating the ICA object and forget about it later, but still be able to override this setting when calling apply().

But happy to open this discussion again!
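(For concreteness, the pattern described above as a rough sketch; the toy data and all parameter values are placeholders, not recommendations:)

```python
import numpy as np
import mne
from mne.preprocessing import ICA

# Toy Raw object just to make the sketch self-contained (50 fake EEG channels).
rng = np.random.default_rng(0)
info = mne.create_info(50, sfreq=100., ch_types='eeg')
raw = mne.io.RawArray(rng.standard_normal((50, 5000)), info)
raw.filter(1., None)  # ICA is usually fit on high-passed data

ica = ICA(n_components=15, n_pca_components=40)  # set once at construction...
ica.fit(raw)
ica.apply(raw.copy())                            # ...and forget about it later,

# ...or override the construction-time value for one particular call:
ica.apply(raw.copy(), n_pca_components=30)
```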

@hoechenberger hoechenberger merged commit 9147314 into mne-tools:master Oct 12, 2020
@hoechenberger (Member) commented

Thanks a bunch, @larsoner!

@larsoner larsoner deleted the ica branch October 12, 2020 13:46
@cbrnr (Contributor) commented Oct 12, 2020

This must have been the PR that closed the most issues at once! Thanks @larsoner!

@hoechenberger (Member) commented

Absolutely. I like how I kept opening issues and @larsoner would come along closing them. How convenient 💪☺️

marsipu pushed a commit to marsipu/mne-python that referenced this pull request Oct 14, 2020
* API: Deprecate max_pca_components

* FIX: Flake

* FIX: Test

* FIX: Fix viz tests

* FIX: Test

* ENH: Warn and doc

* FIX: Exp var and 1. and 1

* STY: Update docs [skip travis] [skip github]

* TST: Faster test

* DOC: Better docs [ci skip]

* Apply suggestions from code review [skip travis]

* Fix docstring [skip travis]

Co-authored-by: Richard Höchenberger <richard.hoechenberger@gmail.com>
@cbrnr (Contributor) commented Dec 1, 2020

@larsoner any chance we can backport this? It contains a nice fix in read_ica which I originally hadn't noticed. Previously, method was set to the default FastICA (as in ICA.__init__), but now we're actually reading in the method used to compute a particular solution. This is nice, because I have a bunch of files where I used picard, and currently I do need to install scikit-learn just because we don't read the actual method.

If it's too big of a change for a backport that's OK, it's not that I can't just install scikit-learn.
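(In other words, roughly; the filename here is just a placeholder:)

```python
from mne.preprocessing import read_ica

ica = read_ica('my_subject-ica.fif')  # hypothetical file previously fit with method='picard'
print(ica.method)  # with this fix, reports the method that was actually used, e.g. 'picard'
```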

@cbrnr (Contributor) commented Dec 1, 2020

Although, of course, just because of this I'm running into the issue that scikit-learn has not been released for Python 3.9 yet, so building the wheel myself takes ages.

@agramfort (Member) commented Dec 1, 2020 via email

@cbrnr (Contributor) commented Dec 1, 2020

Yes, if we don't have to stick to a 6-month release cycle, I'd be happy if the next one came a bit sooner.

@agramfort (Member) commented Dec 1, 2020 via email

@hoechenberger (Member) commented

Yes, can we cut a new release in December already? These ICA changes are really… important. Also lots of @drammock's viz changes are a real game changer already!

@agramfort (Member) commented Dec 1, 2020 via email
