Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read raw eyelink fix #11823

Merged
merged 9 commits into from
Jul 26, 2023
Merged

Conversation

scott-huberty
Copy link
Contributor

@scott-huberty scott-huberty commented Jul 21, 2023

Fixes #11809
Fixes #11758

This fixes the bug the researcher reported, where the STATUS column wasn't present in their ASCII file.

Plus, I knew that read_raw_eyelink was slow, but it became really apparent once the researcher shared their problematic file with me, which was 200mb and over 4 million lines long.. which is way bigger than any ASCII file I've come across.

So, I profiled the code to look for bottlenecks, and since we already require pandas for read_raw_eyelink I refactored the reader to use vector operations whenever possible.

I think we are seeing a decent speed up. If we read one of our eyelink sample files:

from mne.datasets.eyelink import data_path
from mne.io import read_raw_eyelink
fname  = data_path() / "sub-01_task-plr_eyetrack.asc"
%timeit read_raw_eyelink(fname)

On the main branch we get:
2.59 s ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And on this branch we get
797 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the refactored code is about 3x faster. Unfortunately that huge ASCII file still takes like about a minute to read into mne..

Finally, in my refactoring, i've tried to simplify the code and make it more organized.

TODOS:

  • add tests for the bug fix this PR addressed
  • get the testing coverage of this module to be above 95%
  • see if we can get any more performance speed ups, to quicken the loading time for the very large ASCII file

@scott-huberty scott-huberty marked this pull request as draft July 21, 2023 21:17
Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start, more red than green (hooray!)

mne/io/eyelink/_utils.py Outdated Show resolved Hide resolved
@@ -205,10 +204,10 @@ def __init__(
self.fname = Path(fname)
self._sample_lines = None # sample lines from file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this on the first read, but all this extra "stuff" is not supposed to be part of the RawEyelink class. You need to put it all in _raw_extras[0]. This can be a dict, too. This is going to be a big change so you don't need to do it here -- we can figure it out later. I'll create a new issue about adding a test for this since it's possible there are other readers that need to be updated, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip, I was trying to see if there was a way not to bring all this baggage after reading the file.

Will happily to move it all to _raw_extras

pd.replace is very slow. Using numpy is 10x faster on a large file.
for example on a file with 4 million lines, pd.replace takes 30 seconds
and using numpy where takes 2.5 seconds.

Also nested the system message check in parse_recordings blocks under
the block of code where we know those messages appear. this way we dont
check for system messages on literally every line of the file!
new test creates a new temp file and adds additional channels so that more of
read_raw_eyelink will be touched in testing
@scott-huberty scott-huberty marked this pull request as ready for review July 25, 2023 20:18
@scott-huberty
Copy link
Contributor Author

Sorry @larsoner we're no longer more red than green : ( but most of the lines I added were tests, and the mne.io.eyelink.eyelink module now has +95% coverage!

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than two minor comments LGTM!

scott-huberty and others added 2 commits July 25, 2023 16:48
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
now that we use the tmp_path fixture, no unlinking needed
Copy link
Member

@drammock drammock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a couple drive-by comments

mne/io/eyelink/_utils.py Show resolved Hide resolved
mne/io/eyelink/eyelink.py Outdated Show resolved Hide resolved
@scott-huberty
Copy link
Contributor Author

I'm not sure what is causing the failing tests? (they seem unrelated?).

@drammock
Copy link
Member

drammock commented Jul 26, 2023

looks like a change in how NumPy represents arrays of strings in printed output. No need to fix that here.

EDIT: also numbers are getting repr'd as np.int64(71) for example

@larsoner
Copy link
Member

That would almost certainly be numpy/numpy#22449

I should start paying closer attention to NEPs...

@larsoner
Copy link
Member

See also #11836. In the meantime I'll review and merge this if it looks good

@larsoner
Copy link
Member

95.62% of diff hit (target 95.00%) 🔥

Thanks @scott-huberty !

@larsoner larsoner merged commit cc1d182 into mne-tools:main Jul 26, 2023
19 of 22 checks passed
@scott-huberty scott-huberty deleted the read_raw_eyelink_fix branch July 28, 2023 18:14
snwnde pushed a commit to snwnde/mne-python that referenced this pull request Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read_raw_eyelink assumes STATUS data are always present Improve read_raw_eyelink
3 participants