Read raw eyelink fix #11823
Looks like a good start, more red than green (hooray!)
@@ -205,10 +204,10 @@ def __init__(
self.fname = Path(fname)
self._sample_lines = None # sample lines from file
I missed this on the first read, but all this extra "stuff" is not supposed to be part of the RawEyelink class. You need to put it all in _raw_extras[0]. This can be a dict, too. This is going to be a big change so you don't need to do it here -- we can figure it out later. I'll create a new issue about adding a test for this since it's possible there are other readers that need to be updated, too.
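For illustration, the suggestion above could look roughly like the sketch below. It uses a toy class rather than MNE's real base class; `ToyRaw`, `read_toy`, and the dict keys are hypothetical names chosen for this example, not the reader's actual code.

```python
from pathlib import Path


class ToyRaw:
    """Hypothetical stand-in for a reader class (not MNE's BaseRaw)."""

    def __init__(self, fname, raw_extras):
        self.fname = Path(fname)
        # One entry per file; parsing by-products live here instead of
        # being stored as separate attributes on the instance.
        self._raw_extras = raw_extras


def read_toy(fname):
    # Hypothetical parsing by-products, bundled into a single dict.
    extras = dict(sample_lines=["line 1", "line 2"], event_lines=[])
    return ToyRaw(fname, raw_extras=[extras])


raw = read_toy("example.asc")
print(sorted(raw._raw_extras[0]))
```

The point of the design is that the instance carries only one container for reader-specific data, so nothing like `_sample_lines` needs to exist as its own attribute.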
Thanks for the tip, I was trying to see if there was a way not to bring all this baggage along after reading the file.
Will happily move it all to _raw_extras
pd.replace is very slow. Using numpy is about 10x faster on a large file. For example, on a file with 4 million lines, pd.replace takes 30 seconds and using numpy where takes 2.5 seconds. Also nested the system message check in the parse_recordings blocks under the block of code where we know those messages appear. This way we don't check for system messages on literally every line of the file!
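As a rough sketch of the kind of swap described in this commit message: the snippet below replaces a missing-value marker with NaN, once with `DataFrame.replace` and once with `np.where`. The column names, the sample values, and the use of "." as the missing-value marker are assumptions made for this example; the actual reader code may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical sample table; "." is assumed to mark a missing value.
df = pd.DataFrame(
    {"xpos": ["512.3", ".", "514.0"], "ypos": ["384.1", ".", "383.7"]}
)


def missing_to_nan_replace(frame):
    """Baseline: let pandas replace the marker, then cast to float."""
    return frame.replace(".", np.nan).astype(float)


def missing_to_nan_where(frame):
    """Same result with one vectorized np.where over the raw values."""
    values = frame.to_numpy()
    cleaned = np.where(values == ".", np.nan, values)
    out = pd.DataFrame(cleaned, columns=frame.columns, index=frame.index)
    return out.astype(float)


print(missing_to_nan_where(df))
```

Timing both functions on a large frame (for example with `timeit`) is the easiest way to check whether the speed-up holds for a given file.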
new test creates a new temp file and adds additional channels so that more of read_raw_eyelink will be touched in testing
fixed conflict in test_eyelink from 11826
Sorry @larsoner we're no longer more red than green :( but most of the lines I added were tests, and the
Other than two minor comments LGTM!
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
now that we use the tmp_path fixture, no unlinking needed
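A minimal sketch of why no unlinking is needed: pytest's built-in `tmp_path` fixture hands each test its own temporary directory as a `pathlib.Path`, and pytest takes care of cleaning old ones up. The helper `write_fake_asc` and the file contents are hypothetical and only stand in for the real test data.

```python
from pathlib import Path


def write_fake_asc(directory, n_samples=3):
    """Hypothetical helper: write a small tab-separated text file."""
    fname = Path(directory) / "fake.asc"
    lines = [f"{i}\t512.0\t384.0\t1000.0" for i in range(n_samples)]
    fname.write_text("\n".join(lines) + "\n")
    return fname


def test_temp_file_is_written(tmp_path):
    # tmp_path is injected by pytest; no manual cleanup is required.
    fname = write_fake_asc(tmp_path, n_samples=5)
    assert fname.exists()
    assert len(fname.read_text().splitlines()) == 5
```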
just a couple drive-by comments
Required update of tests to catch warnings
I'm not sure what is causing the failing tests? (they seem unrelated?).
Looks like a change in how NumPy represents arrays of strings in printed output. No need to fix that here. EDIT: also numbers are getting repr'd as
That would almost certainly be numpy/numpy#22449. I should start paying closer attention to NEPs...
See also #11836. In the meantime I'll review and merge this if it looks good
Thanks @scott-huberty !
Fixes #11809
Fixes #11758
This fixes the bug the researcher reported, where the STATUS column wasn't present in their ASCII file.
Plus, I knew that read_raw_eyelink was slow, but it became really apparent once the researcher shared their problematic file with me, which was 200 MB and over 4 million lines long, which is way bigger than any ASCII file I've come across. So, I profiled the code to look for bottlenecks, and since we already require pandas for read_raw_eyelink, I refactored the reader to use vector operations whenever possible.
I think we are seeing a decent speed up. If we read one of our eyelink sample files:
On the main branch we get:
2.59 s ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
And on this branch we get
797 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So the refactored code is about 3x faster. Unfortunately, that huge ASCII file still takes about a minute to read into MNE.
Finally, in my refactoring, I've tried to simplify the code and make it more organized.
TODOS: