
MRG, FIX: Test round-trip, fix EEGLAB ICA #8326

Merged
merged 7 commits into mne-tools:master on Oct 4, 2020

Conversation


@larsoner larsoner commented Sep 30, 2020

Closes #8325
Closes #8308
Closes #7657


larsoner commented Oct 1, 2020

On this branch I get the same plots as in #8308:

[Screenshot from 2020-10-01: the plots match those in #8308]

@christian-oreilly @marcpabst can you try it on your datasets to see if it makes sense? I was able to unify the code so that no branching is required, which is nice.

@larsoner larsoner changed the title WIP, FIX: Dont destabilize pinv MRG, FIX: Test round-trip, fix EEGLAB ICA Oct 1, 2020

larsoner commented Oct 2, 2020

Okay, this one is ready for review @agramfort @hoechenberger. We should wait to hear from @christian-oreilly and @marcpabst before merging, though.


@hoechenberger hoechenberger left a comment


Added a question, otherwise LGTM.
We need to do the same thing for n_components, for the sake of consistency.

(Resolved review thread on mne/utils/tests/test_numerics.py)

larsoner commented Oct 2, 2020

Argh, I went to update the documentation, and now I'm confused about why we need n_components, n_pca_components, and max_pca_components. I thought we'd only need two (n_pca_components and max_pca_components), with max_pca_components determining signal reconstruction and n_pca_components being what is passed to the ICA algorithm, but this is not at all what our docs say. I'll revert my doc changes (other than the refs) and hopefully we can fix it later.


larsoner commented Oct 2, 2020

@hoechenberger I'm beginning to wonder whether we need max_pca_components to be a float. It should really be enough to allow n_components to be a float, I think. The instability of the inversion comes from dividing by np.sqrt(self.pca_explained_variance_[:self.n_components_]), so if n_components is set appropriately, there shouldn't be a problem.
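The amplification described here can be seen with a toy example (hypothetical variance values, plain NumPy; this is a sketch of the numerical issue, not MNE's actual code path):

```python
import numpy as np

# Hypothetical PCA explained variances: a few strong components followed
# by numerically tiny ones, as happens with rank-deficient EEG data.
pca_explained_variance = np.array([5.0, 2.0, 1.0, 1e-12, 1e-15])

# Unwhitening divides by sqrt(variance); the tiny variances explode.
scale_all = 1.0 / np.sqrt(pca_explained_variance)
print(scale_all)  # trailing entries are on the order of 1e6 and 3e7

# Restricting to n_components keeps only well-conditioned terms.
n_components = 3
scale_sel = 1.0 / np.sqrt(pca_explained_variance[:n_components])
print(scale_sel)  # all entries are O(1)
```

Any component whose explained variance is at numerical-noise level turns into a huge scale factor in the pseudoinverse, which is exactly the destabilization the fix avoids.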


hoechenberger commented Oct 2, 2020

@larsoner

@hoechenberger I'm beginning to wonder whether we need max_pca_components to be a float. It should really be enough to allow n_components to be a float, I think. The instability of the inversion comes from dividing by np.sqrt(self.pca_explained_variance_[:self.n_components_]), so if n_components is set appropriately, there shouldn't be a problem.

All max_pca_components PCA components are submitted to ICA for fitting; after fitting, only n_components ICA components are retained.

If it is as you say and we can get rid of the instability caused by choosing an inappropriate max_pca_components while still producing sane results, then things should be okay… Still, I'm a little worried. Also, max_pca_components being a float could just be another way to limit how many components to fit ICA on, which would be consistent with n_components and n_pca_components.

@hoechenberger

Argh, I went to update the documentation, and now I'm confused about why we need n_components, n_pca_components, and max_pca_components. I thought we'd only need two (n_pca_components and max_pca_components), with max_pca_components determining signal reconstruction and n_pca_components being what is passed to the ICA algorithm, but this is not at all what our docs say. I'll revert my doc changes (other than the refs) and hopefully we can fix it later.

Yet another example of our API being way too complicated. We've had so many issues with this…


larsoner commented Oct 2, 2020

Okay, let me see if I can write the docs properly.


hoechenberger commented Oct 2, 2020

Maybe to clarify, my understanding is:

  • max_pca_components: number of PCA components to keep
  • n_components: number of ICA components that are retained after fitting ICA on data with dimensionality reduced to max_pca_components (the docs say otherwise, so either the docs are wrong or I made a mistake reading the sources)
  • n_pca_components: number of components (ICA + PCA!) to use for signal reconstruction (maybe this kwarg is just poorly named)
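The fit-time part of this split (the first two bullets) could be sketched with plain NumPy on toy data; this mirrors the description above, not MNE's actual implementation, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 32))   # (samples, channels)
max_pca_components = 20                  # PCA components to keep
n_components = 10                        # ICA is fit on this many

# PCA via SVD of the centered data.
centered = data - data.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
data_transformed = centered @ vt[:max_pca_components].T  # now 20-dimensional

# ICA would then be fit on only the first n_components PCA scores.
sel = data_transformed[:, :n_components]
print(data_transformed.shape, sel.shape)  # (1000, 20) (1000, 10)
```

So on this reading, max_pca_components bounds the PCA subspace, and n_components picks how much of that subspace the ICA algorithm actually sees.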

@hoechenberger

(edited my above post a couple times)


larsoner commented Oct 2, 2020

n_components: number of ICA components that are retained after fitting ICA on data with dimensionality reduced to max_pca_components

The code has this:

        sel = slice(0, self.n_components_)
        if self.method == 'fastica':
            from sklearn.decomposition import FastICA
            ica = FastICA(whiten=False, random_state=random_state,
                          **self.fit_params)
            ica.fit(data_transformed[:, sel])

So I think the docs are correct -- ICA is fitted on the first n_components of the PCA.

n_pca_components isn't even used until apply(), so it only affects the apply step. I'll add this to the docs.
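A minimal sketch of what "only affects the apply step" could mean (toy NumPy, hypothetical variable names, not MNE's code): at reconstruction time, PCA components up to n_pca_components are used to rebuild the signal (with the first n_components of them standing in for the ICA-cleaned sources), and everything beyond n_pca_components is dropped:

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_chan = 500, 16
n_components, n_pca_components = 6, 10  # in real ICA, the first 6 scores
                                        # would pass through un/mixing first

data = rng.standard_normal((n_samples, n_chan))
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt.T                # full PCA scores, shape (500, 16)

# Reconstruct from only the first n_pca_components components.
kept = scores[:, :n_pca_components]
reconstructed = kept @ vt[:n_pca_components] + data.mean(axis=0)

# Using all n_chan components recovers the data exactly; truncating at
# n_pca_components leaves a residual from the discarded components.
err_full = np.abs((scores @ vt + data.mean(axis=0)) - data).max()
err_trunc = np.abs(reconstructed - data).max()
print(err_full, err_trunc)  # ~0 vs. a clearly nonzero residual
```

This is consistent with n_pca_components being purely a reconstruction knob, untouched during fitting.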


larsoner commented Oct 2, 2020

Now I wonder if using n_components as a float would have been enough, and max_pca_components as a float is overkill... I'll push a commit that shows that setting n_components to a float also gives the behavior we want, even with max_pca_components=n_chan.
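The float semantics in question (pick the smallest number of leading components whose cumulative explained variance reaches the given fraction, as scikit-learn's PCA does for a float n_components) could be sketched like this, with hypothetical variance ratios:

```python
import numpy as np

# Hypothetical PCA explained-variance ratios, sorted descending.
explained_variance_ratio = np.array([0.70, 0.20, 0.08, 0.015, 0.005])

def n_components_for(fraction):
    """Smallest count of leading components reaching `fraction` of variance."""
    cumulative = np.cumsum(explained_variance_ratio)
    return int(np.searchsorted(cumulative, fraction) + 1)

print(n_components_for(0.95))   # 3
print(n_components_for(0.999))  # 5
```

With this selection, the near-zero-variance components that destabilize the inversion never make it past the cutoff in the first place.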

@hoechenberger

So I think the docs are correct -- ICA is fitted on the first n_components of the PCA.

Oh ok, I got this mixed up then.

@larsoner larsoner mentioned this pull request Oct 4, 2020

@hoechenberger hoechenberger left a comment


LGTM once CI is green

@hoechenberger hoechenberger merged commit 47ce37e into mne-tools:master Oct 4, 2020
2 checks passed
@hoechenberger

Thanks @larsoner! I'd like to try this on some of our data before we move forward with the API redesign, OK?

@larsoner larsoner deleted the ica branch October 4, 2020 15:38
marsipu pushed a commit to marsipu/mne-python that referenced this pull request Oct 14, 2020
* FIX: Dont destabilize pinv

* FIX: sklearn

* FIX: Fix for EEGLAB

* TST: Better tests

* FIX: Better logic for n_pca_components

* FIX: Better tests

* Fix typo [skip travis]

Co-authored-by: Richard Höchenberger <richard.hoechenberger@gmail.com>