Permutation cluster test with TFCE: improvement of speed and memory usage in 2D #12609
Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴
mne/stats/cluster_level.py
clusters_out : bool
    If True, clusters are returned, otherwise None is returned instead
Do we need this param at all? Could we instead just effectively have `clusters_out = not tfce`?
Good observation. Indeed, the point of clusters_out is to avoid building the long list of arrays on every call to _find_clusters during the permutation computation.
Since the 'clusters' output with TFCE is simply a list of one cluster per point, we could build it (for TFCE only) outside the _find_clusters function. The construction would still depend on the input dimension (1D or 2D) and on the requested output type, 'indices' or 'mask'.
I do not know which is simpler (or more in line with MNE practices): keeping the current additional parameter, or moving the TFCE-specific lines (lines 494-505) into _permutation_cluster_test (somewhere around lines 1021 to 1033). But yes, I would be in favor of moving them.
Any advice?
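To make the option of building the TFCE clusters outside the finder concrete, here is a minimal sketch. The helper name `build_tfce_clusters` and its signature are hypothetical, not the code in this PR, and the exact layout MNE uses for each output type may differ. It only shows the idea of one single-point cluster per data point, for both output types.

```python
import numpy as np


def build_tfce_clusters(shape, out_type="indices"):
    """Hypothetical helper: build one single-point cluster per data point.

    shape is the shape of the statistic map (1D or 2D here).
    """
    n_points = int(np.prod(shape))
    clusters = []
    for flat_idx in range(n_points):
        if out_type == "mask":
            # One boolean array of the full data size per point (memory hungry).
            mask = np.zeros(n_points, dtype=bool)
            mask[flat_idx] = True
            clusters.append(mask.reshape(shape))
        elif out_type == "indices":
            # One tuple of length-1 index arrays per point (one array per dimension).
            coords = np.unravel_index(flat_idx, shape)
            clusters.append(tuple(np.array([c]) for c in coords))
        else:
            raise ValueError(f"out_type must be 'indices' or 'mask', got {out_type!r}")
    return clusters


clusters = build_tfce_clusters((2, 3), out_type="indices")
print(len(clusters))  # one cluster per point
```

Because this list is only needed for the final output, a helper like this would be called once at the end rather than inside every permutation.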
+1 for just doing the construction outside of the `_find_clusters` function wherever it's needed. As long as the public API output stays the same (and hopefully this is checked in a test already, if not then please add it!) then we should be okay to refactor however is cleanest.
Done, I moved the "clusters" construction for TFCE out of `_find_clusters`.
Regarding the API, there is already a test, `test_output_equiv`, to which I added the `threshold` parameter to be tested.
However, I also noticed that the `False` value for `adjacency` was not tested, and indeed there are problems here:
- `adjacency=False` runs only for 1D inputs (which I think is not expected)
- for 1D input, the output is always `"indices"`, whatever `out_type` is

I guess it is rarely used... Should this be corrected in the same or another PR?
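As an illustration of what equivalence between the "mask" and "indices" output types means, here is a small NumPy-only sketch. The helper names are hypothetical and this is not the content of `test_output_equiv`. It only checks that a boolean mask and its index representation describe the same points.

```python
import numpy as np


def mask_to_indices(mask):
    """Hypothetical helper: boolean mask -> tuple of index arrays."""
    return np.where(mask)


def indices_to_mask(indices, shape):
    """Hypothetical helper: tuple of index arrays -> boolean mask."""
    mask = np.zeros(shape, dtype=bool)
    mask[indices] = True
    return mask


# A cluster covering two neighbouring points of a 3 x 4 map.
mask = np.zeros((3, 4), dtype=bool)
mask[1, 2:4] = True

indices = mask_to_indices(mask)
round_trip = indices_to_mask(indices, mask.shape)

# Both representations must describe exactly the same points.
assert np.array_equal(mask, round_trip)
```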
> I guess it is rarely used... Should this be corrected in the same or another PR?
Let's do it in another PR, I created #12613 so we don't forget
And merging
OK, but with my new commits I need to choose a strategy (I am a novice with this kind of thing in git). What is the correct one? => I chose to merge (since it was done to integrate the latest commits of main into the current branch)
Pushed a tiny commit and merged
CI failure is due to @cbrnr do you know what that hook does / whether we should allow it to run in our CIs?
No, @hofaflo?
Hmm, strange! As far as I am aware,
So no idea why this error is coming up now, sorry! 🤔 Edit: Found the relevant issue: git-lfs/git-lfs#5749
Weird, working on a workaround in #12615
🎉 Congrats on merging your first pull request! 🥳 Looking forward to seeing more from you in the future! 💪
Reference issue
No ref, only a post in the forum
What does this implement/fix?
In mne/stats/cluster_level.py:
Using TFCE with large 2D data required a huge amount of memory, because it created as many boolean arrays, each the size of the data, as there are data points (with TFCE, each point is treated as a cluster consisting of a single point, and needs an array to describe it). For this case, I replaced the large boolean array with a single index, and removed the clusters output when it was not necessary.
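A rough back-of-the-envelope sketch of why this matters. The function name is hypothetical, and the estimate assumes 1 byte per boolean element and 8-byte integer indices, ignoring per-array overhead. With one full-size mask per point, memory grows with the square of the number of points, whereas one index per point grows linearly.

```python
def tfce_cluster_memory(shape, index_itemsize=8):
    """Hypothetical estimate of bytes needed to describe one cluster per point.

    Returns (bytes with one full-size boolean mask per point,
             bytes with one flat integer index per point).
    """
    n_points = 1
    for dim in shape:
        n_points *= dim
    # One boolean mask of the full data size per point (assumed 1 byte per element).
    mask_bytes = n_points * n_points
    # One flat integer index per point (assumed 8-byte integers).
    index_bytes = n_points * index_itemsize
    return mask_bytes, index_bytes


mask_bytes, index_bytes = tfce_cluster_memory((100, 200))
print(f"masks: {mask_bytes / 1e6:.0f} MB, indices: {index_bytes / 1e3:.0f} kB")
# prints "masks: 400 MB, indices: 160 kB"
```

Even for a modest 100 x 200 map, the mask representation is several orders of magnitude larger, and the gap widens quickly as the data grow.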
Additional information
It is my first ever PR; all comments are welcome.