Read raw eyelink fix #11823

scott-huberty · 2023-07-21T21:16:54Z

This fixes the bug the researcher reported, where the STATUS column wasn't present in their ASCII file.
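As a rough illustration of this kind of problem, the sketch below names the columns of monocular sample lines only after checking how many fields are actually present, so a file without a trailing status field still parses. The column names, the sample layout, and the helpers `name_sample_columns` and `samples_to_frame` are assumptions made for illustration; they are not the reader's actual code.

```python
import pandas as pd

# Assumed monocular sample layout; hypothetical, not taken from the reader.
BASE_COLS = ["time", "xpos", "ypos", "pupil"]


def name_sample_columns(tokens_per_line):
    """Return column names for sample rows with or without a status field."""
    n_fields = max(len(tokens) for tokens in tokens_per_line)
    cols = list(BASE_COLS)
    if n_fields > len(BASE_COLS):
        cols.append("status")  # only expect the column when it is present
    return cols


def samples_to_frame(lines):
    """Split whitespace-delimited sample lines into a DataFrame of strings."""
    tokens = [line.split() for line in lines]
    return pd.DataFrame(tokens, columns=name_sample_columns(tokens))


with_status = ["1000 512.3 384.1 1021.0 ...", "1001 512.9 384.0 1020.0 ..."]
without_status = ["1000 512.3 384.1 1021.0", "1001 512.9 384.0 1020.0"]

print(list(samples_to_frame(with_status).columns))
print(list(samples_to_frame(without_status).columns))
```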

Plus, I knew that read_raw_eyelink was slow, but it became really apparent once the researcher shared their problematic file with me. It is 200 MB and over 4 million lines long, which is way bigger than any ASCII file I've come across.

So, I profiled the code to look for bottlenecks, and since we already require pandas for read_raw_eyelink, I refactored the reader to use vectorized operations wherever possible.
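To make the idea concrete, here is a minimal sketch contrasting a per-line Python loop with a vectorized pandas version for turning sample lines into a numeric array. The four-column sample layout and the function names `parse_loop` and `parse_vectorized` are assumptions for illustration only, not the code in this PR.

```python
import numpy as np
import pandas as pd

# Hypothetical sample lines: time, x, y, pupil size.
lines = [
    "1000 512.3 384.1 1021.0",
    "1001 512.9 384.0 1020.0",
    "1002 513.4 383.8 1019.5",
]


def parse_loop(lines):
    """Convert each line separately in Python (slow for millions of lines)."""
    rows = []
    for line in lines:
        rows.append([float(token) for token in line.split()])
    return np.array(rows)


def parse_vectorized(lines):
    """Split and convert all lines at once using pandas string methods."""
    tokens = pd.Series(lines).str.split(expand=True)  # one column per field
    return tokens.astype(float).to_numpy()


assert np.allclose(parse_loop(lines), parse_vectorized(lines))
```

Both functions return the same array; the vectorized version moves the per-line work out of the Python interpreter loop, which is where the savings on very large files would be expected to come from.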

I think we are seeing a decent speedup. If we read one of our eyelink sample files:

from mne.datasets.eyelink import data_path
from mne.io import read_raw_eyelink
fname = data_path() / "sub-01_task-plr_eyetrack.asc"
%timeit read_raw_eyelink(fname)

On the main branch we get:
2.59 s ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And on this branch we get:
797 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the refactored code is about 3x faster. Unfortunately, that huge ASCII file still takes about a minute to read into MNE.
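Outside IPython, roughly the same measurement can be sketched with the standard `timeit` module. The `workload` function below is a stand-in (an assumption) so the snippet runs without the dataset; in practice it would be replaced by a call such as `lambda: read_raw_eyelink(fname)`. The last lines simply redo the ratio of the two timings reported above.

```python
import timeit


def workload():
    """Stand-in for the function being timed (hypothetical)."""
    return sum(i * i for i in range(10_000))


# 7 runs of 1 loop each, mirroring the %timeit summary format.
times = timeit.repeat(workload, repeat=7, number=1)
mean_s = sum(times) / len(times)
print(f"{mean_s * 1e3:.3f} ms per loop (mean of {len(times)} runs, 1 loop each)")

# Ratio of the timings quoted in this PR description.
main_s, branch_s = 2.59, 0.797
speedup = main_s / branch_s
print(f"speedup: {speedup:.2f}x")
```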

Finally, in my refactoring, I've tried to simplify the code and make it more organized.

TODOs:

- Add tests for the bug fix this PR addresses
- Get the test coverage of this module above 95%
- See if we can get any more performance speedups, to shorten the loading time for the very large ASCII file
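For the remaining performance work, a profiling pass could look something like the sketch below, which uses the standard `cProfile` and `pstats` modules. The functions `parse_lines` and `read_file_standin` are hypothetical stand-ins so the snippet runs on its own; the real target would be `read_raw_eyelink` called on the large file.

```python
import cProfile
import io
import pstats


def parse_lines(lines):
    """Hypothetical per-line parser standing in for the reader's hot loop."""
    return [[float(token) for token in line.split()] for line in lines]


def read_file_standin(n_lines=50_000):
    """Hypothetical stand-in for reading a large ASCII file."""
    lines = ["1000 512.3 384.1 1021.0"] * n_lines
    return parse_lines(lines)


profiler = cProfile.Profile()
profiler.enable()
read_file_standin()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)  # show only the ten most expensive entries
report = stream.getvalue()
print(report)
```

Sorting by cumulative time tends to surface the outer functions that dominate the run, which is usually the quickest way to decide where vectorizing is worth the effort.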

larsoner

Looks like a good start, more red than green (hooray!)

mne/io/eyelink/_utils.py

larsoner · 2023-07-22T12:39:35Z

mne/io/eyelink/eyelink.py

@@ -205,10 +204,10 @@ def __init__(
        self.fname = Path(fname)
        self._sample_lines = None  # sample lines from file


I missed this on the first read, but all this extra "stuff" is not supposed to be part of the RawEyelink class. You need to put it all in _raw_extras[0]. This can be a dict, too. This is going to be a big change so you don't need to do it here -- we can figure it out later. I'll create a new issue about adding a test for this since it's possible there are other readers that need to be updated, too.

Thanks for the tip, I was trying to see if there was a way not to bring all this baggage after reading the file.

Will happily move it all to _raw_extras.
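A minimal sketch of the pattern being suggested, using toy classes rather than MNE's real base class: reader-specific state goes into a dict stored as the first element of `_raw_extras` instead of being set as separate attributes on the instance. `ToyBaseRaw`, `ToyRawEyelink`, and the dict keys are all hypothetical names chosen for illustration.

```python
class ToyBaseRaw:
    """Toy stand-in for a raw base class; this is NOT MNE's BaseRaw."""

    def __init__(self, filenames, raw_extras):
        self.filenames = list(filenames)
        # One entry per file; each entry holds reader-specific extras.
        self._raw_extras = list(raw_extras)


class ToyRawEyelink(ToyBaseRaw):
    """Toy reader that keeps its parsing state in _raw_extras[0]."""

    def __init__(self, fname):
        # Instead of self._sample_lines = None, self._event_lines = None, ...
        extras = dict(sample_lines=None, event_lines=None)
        super().__init__(filenames=[fname], raw_extras=[extras])


raw = ToyRawEyelink("recording.asc")
raw._raw_extras[0]["sample_lines"] = ["1000 512.3 384.1 1021.0"]
print(sorted(raw._raw_extras[0]))
```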

pd.replace is very slow; using NumPy is about 10x faster on a large file. For example, on a file with 4 million lines, pd.replace takes 30 seconds and np.where takes 2.5 seconds. Also nested the system message check in parse_recordings blocks under the block of code where we know those messages appear. This way we don't check for system messages on literally every line of the file!
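The swap described in this commit message can be sketched as follows. The example assumes that missing values are encoded as "." and that all remaining tokens are numeric strings; the column names and the helpers `replace_with_pandas` and `replace_with_where` are illustrative, not the PR's code. The sketch only shows that the two approaches give the same result; the timings quoted above are the author's and are not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical gaze columns with "." marking missing samples.
df = pd.DataFrame(
    {"xpos": ["512.3", ".", "514.0"], "ypos": [".", "384.2", "385.1"]}
)


def replace_with_pandas(frame):
    """Replace the missing-value token via DataFrame.replace."""
    return frame.replace(".", np.nan).astype(float)


def replace_with_where(frame):
    """Replace the missing-value token via np.where on the raw values."""
    values = frame.to_numpy(dtype=object)
    cleaned = np.where(values == ".", np.nan, values).astype(float)
    return pd.DataFrame(cleaned, columns=frame.columns, index=frame.index)


slow = replace_with_pandas(df)
fast = replace_with_where(df)
assert np.allclose(slow.to_numpy(), fast.to_numpy(), equal_nan=True)
```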

new test creates a new temp file and adds additional channels so that more of read_raw_eyelink will be touched in testing

fixed conflict in test_eyelink from 11826

scott-huberty · 2023-07-25T20:23:45Z

Sorry @larsoner, we're no longer more red than green :( but most of the lines I added were tests, and the mne.io.eyelink.eyelink module now has over 95% coverage!

mne/io/eyelink/tests/test_eyelink.py

larsoner

Other than two minor comments LGTM!

Co-authored-by: Eric Larson <larson.eric.d@gmail.com>

now that we use the tmp_path fixture, no unlinking needed
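For context, a test in this style might look like the sketch below: it writes a modified copy of some lines into the directory that pytest's `tmp_path` fixture provides, so the test itself has no cleanup step. The helper `write_with_extra_column`, the tab-separated layout, and the file name are assumptions for illustration and do not reflect the actual test in this PR.

```python
from pathlib import Path


def write_with_extra_column(src_lines, dest: Path) -> Path:
    """Write src_lines to dest with one extra tab-separated column appended."""
    dest.write_text("\n".join(f"{line}\t0.0" for line in src_lines) + "\n")
    return dest


def test_extra_column(tmp_path):
    """pytest injects tmp_path, a per-test temporary directory (a Path)."""
    out = write_with_extra_column(
        ["1000\t512.3", "1001\t512.9"], tmp_path / "modified.asc"
    )
    rows = [line.split("\t") for line in out.read_text().splitlines()]
    assert all(len(row) == 3 for row in rows)
    # No out.unlink() here: pytest manages the lifetime of tmp_path.
```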

drammock

just a couple drive-by comments

mne/io/eyelink/_utils.py

mne/io/eyelink/eyelink.py

Required update of tests to catch warnings
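The knock-on effect on the tests can be sketched with the standard library alone. Here `warnings.warn` stands in for `mne.utils.warn`, and `find_recording_date` together with its header format is a hypothetical example rather than the reader's code. In a pytest suite, `pytest.warns` would play the role that `warnings.catch_warnings` plays below.

```python
import warnings


def find_recording_date(header_lines):
    """Return the date line from a header, warning if none is found."""
    for line in header_lines:
        if line.startswith("** DATE:"):
            return line.removeprefix("** DATE:").strip()
    # Emit a warning instead of failing; callers and tests must expect it.
    warnings.warn("No date found in header", RuntimeWarning)
    return None


# A test now has to catch the warning explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = find_recording_date(["** VERSION: hypothetical"])

assert result is None
assert any(issubclass(w.category, RuntimeWarning) for w in caught)
```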

scott-huberty · 2023-07-26T13:57:02Z

I'm not sure what is causing the failing tests (they seem unrelated?).

drammock · 2023-07-26T14:11:28Z

Looks like a change in how NumPy represents arrays of strings in printed output. No need to fix that here.

EDIT: also numbers are getting repr'd as np.int64(71) for example
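A small sketch of why such a change can break tests that compare printed output, and one way to sidestep it: convert NumPy scalars to built-in Python types before formatting or comparing. The helper `describe_count` is hypothetical; the `repr` of a NumPy scalar is deliberately printed without a stated expected value because it depends on the installed NumPy version.

```python
import numpy as np


def describe_count(n_events):
    """Format a count so the text does not depend on NumPy's scalar repr."""
    return f"{int(n_events)} events"


count = np.int64(71)
print(repr(count))  # version-dependent: may or may not include the type name
print(describe_count(count))

# Converting arrays to built-in types gives version-independent reprs as well.
values = np.array([71, 72]).tolist()
print(repr(values))
```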

larsoner · 2023-07-26T20:15:29Z

That would almost certainly be numpy/numpy#22449

I should start paying closer attention to NEPs...

larsoner · 2023-07-26T20:18:32Z

See also #11836. In the meantime I'll review and merge this if it looks good

larsoner · 2023-07-26T20:23:36Z

95.62% of diff hit (target 95.00%) 🔥

Thanks @scott-huberty !

scott-huberty added 2 commits July 21, 2023 16:55

FIX, ENH: fix status column bug and make performance improvements

99a0abf

Merge branch 'main' of https://github.com/mne-tools/mne-python into read_raw_eyelink_fix

16d2d6f

scott-huberty marked this pull request as draft July 21, 2023 21:17

larsoner reviewed Jul 22, 2023

larsoner mentioned this pull request Jul 22, 2023

BUG: Some raw classes use attributes rather than ._raw_extras #11825

Closed

scott-huberty added 3 commits July 22, 2023 12:31

FIX: Add tests

bfa4416

new test creates a new temp file and adds additional channels so that more of read_raw_eyelink will be touched in testing

Merge remote-tracking branch 'upstream/main' into read_raw_eyelink_fix

68f8401

fixed conflict in test_eyelink from 11826

scott-huberty marked this pull request as ready for review July 25, 2023 20:18

larsoner reviewed Jul 25, 2023

mne/io/eyelink/tests/test_eyelink.py (outdated)

larsoner reviewed Jul 25, 2023

mne/io/eyelink/tests/test_eyelink.py (outdated)

larsoner reviewed Jul 25, 2023

mne/io/eyelink/tests/test_eyelink.py (outdated)

larsoner reviewed Jul 25, 2023

scott-huberty and others added 2 commits July 25, 2023 16:48

Apply suggestions from code review [ci skip]

b704b79

Co-authored-by: Eric Larson <larson.eric.d@gmail.com>

FIX: remove unlink

7d45f5d

now that we use the tmp_path fixture, no unlinking needed

drammock reviewed Jul 25, 2023

mne/io/eyelink/_utils.py

mne/io/eyelink/eyelink.py (outdated)

scott-huberty added 2 commits July 26, 2023 08:41

FIX: Use mne.utils.warn as suggested by Dan

9f76eae

Required update of tests to catch warnings

DOC: update changelog

647a0b4

larsoner merged commit cc1d182 into mne-tools:main Jul 26, 2023
19 of 22 checks passed

scott-huberty deleted the read_raw_eyelink_fix branch July 28, 2023 18:14

scott-huberty mentioned this pull request Aug 21, 2023

MAINT, WIP: move extra attributes to _raw_extras in eyelink reader #11910

Merged

snwnde pushed a commit to snwnde/mne-python that referenced this pull request Mar 20, 2024

Read raw eyelink fix (mne-tools#11823)

5b722a4
