Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permutation cluster test with TFCE: improvement of speed and memory usage in 2D #12609

Merged
merged 9 commits into from
May 17, 2024

Conversation

nfourcau
Copy link
Contributor

Reference issue

No ref, only a post in the forum

What does this implement/fix?

In mne.stats.cluster_level.py :
The use of TFCE with large 2D data implied a huge amount of memory because of the creation of as many boolean arrays of the same size of the data as the number of data points (with TFCE, each point is considered as a cluster - consisting of a single point - and need an array to describe it). For this case, I replace the large boolean array by a single index, and removed the clusters output when it was not necessary.

Additional information

It is my first ever PR, all comments are welcomed

Copy link

welcome bot commented May 14, 2024

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴

@nfourcau nfourcau changed the title cluster permutation test with TFCE: improvement of speed and memory usage in 2D WIP: cluster permutation test with TFCE: improvement of speed and memory usage in 2D May 14, 2024
@nfourcau nfourcau changed the title WIP: cluster permutation test with TFCE: improvement of speed and memory usage in 2D Cluster permutation test with TFCE: improvement of speed and memory usage in 2D May 14, 2024
@nfourcau nfourcau changed the title Cluster permutation test with TFCE: improvement of speed and memory usage in 2D Permutation cluster test with TFCE: improvement of speed and memory usage in 2D May 14, 2024
@nfourcau nfourcau marked this pull request as ready for review May 14, 2024 13:43
Comment on lines 377 to 378
clusters_out : bool
If True, clusters are returned, otherwise None is returned instead
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this param at all? Could we instead just effectively have clusters_out = not tfce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation. Indeed the clusters_out point is to not make the long list of arrays at each call of _find_cluster during permutation computation.
Since 'clusters' with TFCE is simply a list of one cluster per point, we could just decide to make this construction (for TFCE only) outside of the _find_cluster function. This would still need to depend on the dimension of input (1d or 2d) and the type of output wanted 'indices' or 'mask'.
I do not know what is the simplest (or more compliant with MNE practices) between the current additional parameter and moving the TFCE specific lines (lines 494-505) in _permutation_cluster_test (somewhere around lines1021 to 1033). But yes I would agree with the moving.
Any advice ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for just doing the construction outside of the _find_cluster function wherever it's needed. As long as the public API output stays the same (and hopefully this is checked in a test already, if not then please add it!) then we should be okay to refactor however is cleanest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I moved the "clusters" construction for TFCE out of _find_clusters.

Regarding the API, there is already a test test_output_equiv to which I added the threshold parameter to be tested.
However, I also noticed the False value for adjacency was not tested and indeed there are problems here:

  • adjacency=False runs only for 1D inputs (which I think is not expected)
  • for 1D input, the output is always "indices" whatever is out_type

I guess it is rarely used... Should this be corrected in the same or another PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is rarely used... Should this be corrected in the same or another PR?

Let's do it in another PR, I created #12613 so we don't forget

@larsoner
Copy link
Member

And merging main into your branch should fix CIs so I'll do it now

@nfourcau
Copy link
Contributor Author

nfourcau commented May 16, 2024

And merging main into your branch should fix CIs so I'll do it now

OK but with my new commits I need to choose a strategy (novice with this kind of stuff in git) :
git config pull.rebase false # merge (the default strategy)
git config pull.rebase true # rebase
git config pull.ff only # fast-forward only

What is the correct one? => I chose to merge (since it was was done to integrate the last commits of main in the current branch)

@larsoner
Copy link
Member

Pushed a tiny commit and merged main into your branch, marking for merge-when-green. Thanks in advance @nfourcau !

@larsoner larsoner enabled auto-merge (squash) May 16, 2024 13:32
@drammock
Copy link
Member

CI failure is due to edfio adding a post-checkout hook:
https://dev.azure.com/mne-tools/mne-python/_build/results?buildId=30068&view=logs&j=b9064c46-2375-5b70-72c1-f55d0d61c63a&t=e34ff71f-29f1-5601-0139-1a3a772fec70&l=496

@cbrnr do you know what that hook does / whether we should allow it to run in our CIs?

@cbrnr
Copy link
Contributor

cbrnr commented May 16, 2024

No, @hofaflo?

@hofaflo
Copy link
Contributor

hofaflo commented May 16, 2024

Hmm, strange! As far as I am aware, edfio did not add any hooks – didn't even know that this was possible on a non-local repo level. Opening .git/hooks/post-checkout locally shows this (which makes sense, as we're using LFS to store test files, but that has always been the case):

#!/bin/sh
command -v git-lfs >/dev/null 2>&1 || { echo >&2 "\nThis repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting the 'post-commit' file in the hooks directory (set by 'core.hookspath'; usually '.git/hooks').\n"; exit 2; }
git lfs post-commit "$@"

So no idea why this error is coming up now, sorry! 🤔

Edit: Found the relevant issue: git-lfs/git-lfs#5749

@larsoner
Copy link
Member

Weird, working on a workaround in #12615

@larsoner larsoner added this to the 1.8 milestone May 17, 2024
@larsoner larsoner merged commit cf0e12d into mne-tools:main May 17, 2024
30 checks passed
Copy link

welcome bot commented May 17, 2024

🎉 Congrats on merging your first pull request! 🥳 Looking forward to seeing more from you in the future! 💪

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants