Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MATLAB-style implementations of the quantile and IQR functions #57

Merged
merged 5 commits into from Apr 10, 2021

Conversation

a-hurst
Copy link
Collaborator

@a-hurst a-hurst commented Apr 8, 2021

PR Description

Closes #56. As mentioned in that issue, MATLAB's quantile and IQR functions calculate quantiles slightly differently than Numpy and SciPy, leading to higher sensitivity to differences in channel correlation than in the original MATLAB PREP. This PR adds new mat_quantile and mat_iqr functions that match MATLAB's quantile logic and make PyPREP's internal correlation matrix for NoisyChannels.find_bad_by_correlation match the values in MATLAB PREP's equivalent matrix exactly.

TODO

  • I wasn't sure whether we'd want these functions to be part of the public API or private (i.e. _mat_irq and _mat_quantile) so I held off on writing docstrings.
  • I also wasn't sure what we wanted to do here for unit testing. For the sake of this PR, would individual tests for these two functions be fine, or should we add full-pipeline testing of these changes as well? Eventually we'll want to modify the CI tests to do proper comparisons between MATLAB and PyPREP output values for each stage of NoisyChannels, but that would require figuring out some good test data we could use for both pipelines that wouldn't demand too much in terms of RAM or CPU, so I think that's a bigger problem to tackle in a separate PR.
  • I need to update the changelog (which sort of depends on whether we want the functions to be public or private).

Merge Checklist

  • the PR has been reviewed and all comments are resolved
  • all CI checks pass
  • (if applicable): the PR description includes the phrase closes #<issue-number> to automatically close an issue
  • (if applicable): bug fixes, new features, or API changes are documented in whats_new.rst

@yjmantilla
Copy link
Collaborator

yjmantilla commented Apr 8, 2021

+1 for individual tests , at least at this stage. If you do a test comparing the saved matlab result (of only these functions) against this implementation and they come out as working better than before then I think that is enough; we are getting closer to matlab.

As you say a full pipeline test matlab vs python would be another PR . Thats a separate issue on its own.

+1 to private functions, I think this is too deep inside the code to be public, nor we are developing specifically these statistical functions.

@a-hurst a-hurst mentioned this pull request Apr 8, 2021
23 tasks
@sappelhoff sappelhoff added this to the 0.3.2 milestone Apr 8, 2021
@sappelhoff
Copy link
Owner

+1 to private functions, I think this is too deep inside the code to be public, nor we are developing specifically these statistical functions.

agreed

I wasn't sure whether we'd want these functions to be part of the public API or private (i.e. _mat_irq and _mat_quantile) so I held off on writing docstrings.

docstrings are almost always a good idea - even if it's not public-facing, the documentation will help us (the developers) in the future. Following the numpydoc convention makes the whole thing well structured and easy to read for us --> even if it does not end up being rendered as html on our docs page.

As you say a full pipeline test matlab vs python would be another PR . Thats a separate issue on its own.

agreed

pyprep/utils.py Outdated Show resolved Hide resolved
@a-hurst
Copy link
Collaborator Author

a-hurst commented Apr 9, 2021

@yjmantilla @sappelhoff Thanks for reviewing this! I've updated the pull request accordingly, and have also rebased to the current master branch.

Copy link
Owner

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great :) some minor questions/clarifications

tests/test_utils.py Show resolved Hide resolved
----------
arr : np.ndarray
Input array containing samples from the distribution to summarize.
axis : {int, tuple of int, None}, optional
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did you get the idea to document the different options in curly brackets?

I usually do: axis : int | tuple of int | None, and then write what it defaults to in the description string below (not writing "optional" explicitly)

which doesn't mean that that's better (hence my question).

Copy link
Collaborator Author

@a-hurst a-hurst Apr 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I just copy/pasted that bit directly from the docstring of numpy.quantile since that arguments getting passed unmodified to that function anyway. No rationale whatsoever on my part, just figured I'd go with whatever the Numpy maintainers thought was the correct format (I'm new to Numpy docstrings, I've previously only used Google-style ones for Python).

If you'd like I can change it to your preferred format for the sake of consistency across PyPREP!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thanks, I read the numpydoc styleguide again, and it seems like the curly brackets is advertised. I think I picked up the | style from MNE-Python. Let's stick with the curly braces here and in general slowly convert to that style whenever we touch (git diff) a line that still uses the | style in the remaining codebase.

pyprep/utils.py Outdated Show resolved Hide resolved
docs/whats_new.rst Outdated Show resolved Hide resolved
@sappelhoff sappelhoff merged commit 4b4eb10 into sappelhoff:master Apr 10, 2021
@sappelhoff
Copy link
Owner

Thanks @a-hurst :-) on a roll with these PRs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bad channels by correlation differ between PyPREP and MATLAB PREP
3 participants