FIX: reading large CNT files (more than 2Gb files) #6537
@LM-thinking can you try this in your files? Thx. |
Let's see how the CIs turn out. |
This pull request introduces 1 alert when merging a34a635 into 7d64af3 - view on LGTM.com new alerts:
|
Codecov Report
@@            Coverage Diff             @@
##           master    #6537      +/-   ##
==========================================
- Coverage   89.34%   87.93%   -1.41%
==========================================
  Files         416      416
  Lines       74865    74882      +17
  Branches    12341    12343       +2
==========================================
- Hits        66885    65845    -1040
- Misses       5137     6194    +1057
  Partials     2843     2843 |
This pull request introduces 2 alerts when merging 85e376d into 70789c4 - view on LGTM.com new alerts:
|
After reviewing the commit history of cnt.py, I found a similar problem in #4520, where the reporter mentioned that files bigger than 2 GB cause event_offset to become negative.
if event_offset < data_offset:  # no events
    data_size = n_samples * n_channels
else:
    data_size = event_offset - (data_offset + 75 * n_channels) |
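The negative event_offset reported in #4520 is consistent with a 4-byte field wrapping around. Below is a minimal sketch of that wrap-around, assuming the offset is stored as a little-endian signed 32-bit integer (as the '<i4' reads elsewhere in this thread suggest). `stored_int32` is a hypothetical helper for illustration, not part of MNE.

```python
import struct


def stored_int32(true_offset):
    """Return what a signed little-endian 32-bit field would hold.

    Hypothetical helper, for illustration only.
    """
    # Keep only the low 32 bits, as a 4-byte header field would.
    raw = (true_offset % 2**32).to_bytes(4, "little")
    # Reinterpret those 4 bytes as a signed integer, like dtype '<i4'.
    (value,) = struct.unpack("<i", raw)
    return value


print(stored_int32(133_700))        # small offsets survive unchanged
print(stored_int32(3_000_000_000))  # between 2**31 and 2**32: reads back negative
print(stored_int32(4_971_618_850))  # above 2**32: reads back positive but too small
```

This is also why a plain "file larger than 2 GB" test is only an approximation: the stored value turns negative once the offset passes 2**31 and becomes a small positive number again once it passes 2**32.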
In _utils, the CNT file size is used to decide whether to calculate the event table position. This is not very robust, because a 2 GB file size does not correspond exactly to the int32 overflow threshold. One possible solution is the following code, which obtains the event table position by checking whether the binary representation of the calculated value ends with the binary representation of the event_table_pos value read directly from the file, and infers the number of bytes per sample at the same time.
fid.seek(SETUP_NSAMPLES_OFFSET)
(n_samples,) = np.frombuffer(fid.read(4), dtype='<i4')
fid.seek(SETUP_NCHANNELS_OFFSET)
(n_channels,) = np.frombuffer(fid.read(2), dtype='<u2')
fid.seek(SETUP_EVENTTABLEPOS_OFFSET)
(event_table_pos,) = np.frombuffer(fid.read(4), dtype='<i4')
def _infer_n_bytes_event_table_pos(readed_event_table_pos):
    readed_event_table_pos_feature = np.binary_repr(readed_event_table_pos).lstrip('-')
    for n_bytes in [2, 4]:
        event_table_pos = (900 + 75 * int(n_channels) + n_bytes * int(n_channels) * int(n_samples))
        if np.binary_repr(event_table_pos).endswith(readed_event_table_pos_feature):
            return n_bytes, event_table_pos
    raise Exception("event_table_dismatch")
Note that all of the above relies on the n_samples value being reliable, but the comment in cnt.py says that it is not.
However, the EEGLAB source code trusts the n_samples metadata in the SETUP part, so it seems unreasonable to call it unreliable. |
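The suffix test above compares binary strings. A hedged alternative sketch does the same kind of check with modular arithmetic: a candidate position is accepted when it agrees with the stored value in its low 32 bits. This is not the code from the PR; `infer_n_bytes`, `header_size` and `ch_size` are hypothetical names, and the sketch still assumes that n_samples is reliable. Reducing modulo 2**32 also maps a negative stored value back to its unsigned bit pattern, which stripping the '-' sign does not do.

```python
def infer_n_bytes(n_channels, n_samples, stored_pos,
                  header_size=900, ch_size=75):
    """Guess bytes per sample from a possibly wrapped 32-bit position.

    Hypothetical helper; assumes n_samples in the header can be trusted.
    """
    # Unsigned view of the 4 header bytes (also handles negative int32 values).
    stored_low32 = stored_pos % 2**32
    for n_bytes in (2, 4):
        candidate = (header_size + ch_size * n_channels
                     + n_bytes * n_channels * n_samples)
        if candidate % 2**32 == stored_low32:
            return n_bytes, candidate
    raise ValueError("no candidate matches the stored event table position")


# Illustrative inputs, chosen so the candidates are easy to check by hand.
print(infer_n_bytes(64, 1000, 133_700))              # small 16-bit file
print(infer_n_bytes(10, 70_000_000, -1_494_965_646)) # stored value is negative
```

Requiring all 32 low bits to agree is stricter than the string suffix test, which ignores leading zero bits of the stored value.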
@AimerLee makes sense. However, it is not working for me. I'm sure it is a silly mistake. |
660eb9d
to
b16c2f3
Compare
I liked 660eb9d, it used
I've put together this toy example using private CNT data that people have been sharing with me over time.
import os.path as op
import numpy as np
import pytest
from mne import __file__ as _mne_file
from mne.utils import run_tests_if_main
from mne.io.cnt import read_raw_cnt
from mne.io.cnt._utils import CNTEventType1, CNTEventType2, CNTEventType3


def foo(fname):
    SETUP_NCHANNELS_OFFSET = 370
    SETUP_NSAMPLES_OFFSET = 864
    SETUP_EVENTTABLEPOS_OFFSET = 886

    def _compute(xx):
        # Expected event table position for xx bytes per sample.
        return (900 + 75 * int(n_channels) +
                xx * int(n_channels) * int(n_samples))

    with open(fname, 'rb') as fid:
        fid.seek(SETUP_NSAMPLES_OFFSET)
        (n_samples,) = np.frombuffer(fid.read(4), dtype='<i4')
        fid.seek(SETUP_NCHANNELS_OFFSET)
        (n_channels,) = np.frombuffer(fid.read(2), dtype='<u2')
        fid.seek(SETUP_EVENTTABLEPOS_OFFSET)
        (readed_event_table_pos,) = np.frombuffer(fid.read(4), dtype='<i4')
    print('readed_event_table_pos: ', readed_event_table_pos)
    print('readed b:', np.binary_repr(readed_event_table_pos))
    print('readed_ b:', np.binary_repr(readed_event_table_pos).lstrip('-'))
    print('compute(2) b:', np.binary_repr(_compute(2)))
    print('compute(4) b:', np.binary_repr(_compute(4)))
    print('compute(2) :', _compute(2))
    print('compute(4) :', _compute(4))
    readed_event_table_pos_feature = np.binary_repr(
        readed_event_table_pos).lstrip('-')
    for n_bytes_candidate in [2, 4]:
        computed_event_table_pos = _compute(n_bytes_candidate)
        if (
            np.binary_repr(computed_event_table_pos)
            .endswith(readed_event_table_pos_feature)
        ):
            n_bytes = n_bytes_candidate
            event_table_pos = computed_event_table_pos
            print('match found: (', 'n_bytes: ', n_bytes,
                  'event_table_pos: ', computed_event_table_pos, ')')
            break
    else:
        # Loop finished without a break: no candidate matched.
        n_bytes, event_table_pos = None, None
    if event_table_pos is None:
        print('No match')
    return n_channels, n_samples, event_table_pos, n_bytes


pp = pytest.param


@pytest.mark.parametrize((
    'fname,expected_n_bytes,expected_event_type,expected_event_table_pos'), [
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/914flankers.cnt'),
       4, CNTEventType2, 156474479),
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/confidential/cnt/cont_68chan_32bit.cnt'),
       4, CNTEventType2, 57267440),
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/confidential/cnt/'
               'pilote_resting_01_neurospin_2019-03-04_15-18-40.cnt'),
       2, CNTEventType2, 15518747),
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/confidential/cnt/'
               'cont_22chan_4gb_32bit_toolong.cnt'),
       4, CNTEventType3, 4971618850),
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/confidential/cnt/SampleCNTFile_16bit.cnt'),
       2, CNTEventType2, 133700),
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/confidential/cnt/cnt_files/'
               'SampleCNTFile_16bit.cnt'),
       2, CNTEventType2, 133700),
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/confidential/cnt/cnt_files/'
               'BoyoAEpic1_16bit.cnt'),
       2, CNTEventType2, 78536260),
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/confidential/cnt/cnt_files/'
               'cont_67chan_resp_32bit.cnt'),
       4, CNTEventType2, 54570725),
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/confidential/cnt/BoyoAEpic1_16bit.cnt'),
       2, CNTEventType2, 78536260),
    pp(op.join(op.dirname(_mne_file),
               '../sandbox/data/confidential/cnt/cont_67chan_resp_32bit.cnt'),
       4, CNTEventType2, 54570725),
], ids=[str(n) for n in range(10)]
)
def test_foo(fname, expected_n_bytes, expected_event_type,
             expected_event_table_pos):
    """Test reading raw cnt files."""
    print('\n')
    print('expected_n_bytes: ', expected_n_bytes)
    print('expected_event_table_pos: ', expected_event_table_pos)
    n_channels, n_samples, event_table_pos, n_bytes = foo(fname)
    assert True


run_tests_if_main()
Check out the trace:
rootdir: /home/sik/code/mne-python, inifile: setup.cfg
plugins: timeout-1.3.3, sugar-0.9.2, pudb-0.7.0, mock-1.10.3, faulthandler-1.5.0, cov-2.6.1
collecting ...
expected_n_bytes: 4
expected_event_table_pos: 156474479
readed_event_table_pos: 156474479
readed b: 1001010100111001110001101111
readed_ b: 1001010100111001110001101111
compute(2) b: 100101001110011110011100101
compute(4) b: 1001010011100110001010100101
compute(2) : 78068965
compute(4) : 156132005
No match
sandbox/mwe/6535_test_cnt.py ✓ 10% █
expected_n_bytes: 4
expected_event_table_pos: 57267440
readed_event_table_pos: 57267440
readed b: 11011010011101010011110000
readed_ b: 11011010011101010011110000
compute(2) b: 1101101001111011000110000
compute(4) b: 11011010011101010011110000
compute(2) : 28636720
compute(4) : 57267440
match found: ( n_bytes: 4 event_table_pos: 57267440 )
sandbox/mwe/6535_test_cnt.py ✓✓ 20% ██
expected_n_bytes: 2
expected_event_table_pos: 15518747
readed_event_table_pos: 15518747
readed b: 111011001100110000011011
readed_ b: 111011001100110000011011
compute(2) b: 101000100111011
compute(4) b: 1001100000110101
compute(2) : 20795
compute(4) : 38965
No match
sandbox/mwe/6535_test_cnt.py ✓✓✓ 30% ███
expected_n_bytes: 4
expected_event_table_pos: 4971618850
readed_event_table_pos: 676651554
readed b: 101000010101001110001000100010
readed_ b: 101000010101001110001000100010
compute(2) b: 10010100001010100111011010100010
compute(4) b: 100101000010101001110001000100010
compute(2) : 2485810850
compute(4) : 4971618850
match found: ( n_bytes: 4 event_table_pos: 4971618850 )
sandbox/mwe/6535_test_cnt.py ✓✓✓✓ 40% ████
expected_n_bytes: 2
expected_event_table_pos: 133700
readed_event_table_pos: 133700
readed b: 100000101001000100
readed_ b: 100000101001000100
compute(2) b: 100000101001000100
compute(4) b: 111111111001000100
compute(2) : 133700
compute(4) : 261700
match found: ( n_bytes: 2 event_table_pos: 133700 )
sandbox/mwe/6535_test_cnt.py ✓✓✓✓✓ 50% █████
expected_n_bytes: 2
expected_event_table_pos: 133700
readed_event_table_pos: 133700
readed b: 100000101001000100
readed_ b: 100000101001000100
compute(2) b: 100000101001000100
compute(4) b: 111111111001000100
compute(2) : 133700
compute(4) : 261700
match found: ( n_bytes: 2 event_table_pos: 133700 )
sandbox/mwe/6535_test_cnt.py ✓✓✓✓✓✓ 60% ██████
expected_n_bytes: 2
expected_event_table_pos: 78536260
readed_event_table_pos: 78536260
readed b: 100101011100101111001000100
readed_ b: 100101011100101111001000100
compute(2) b: 1111111111111111111010100110111000100
compute(4) b: 11111111111111111110101000010101000100
compute(2) : 137438776772
compute(4) : 274877547844
No match
sandbox/mwe/6535_test_cnt.py ✓✓✓✓✓✓✓ 70% ███████
expected_n_bytes: 4
expected_event_table_pos: 54570725
readed_event_table_pos: 54570725
readed b: 11010000001010111011100101
readed_ b: 11010000001010111011100101
compute(2) b: 1101000000110001100000101
compute(4) b: 11010000001010111011100101
compute(2) : 27288325
compute(4) : 54570725
match found: ( n_bytes: 4 event_table_pos: 54570725 )
sandbox/mwe/6535_test_cnt.py ✓✓✓✓✓✓✓✓ 80% ████████
expected_n_bytes: 2
expected_event_table_pos: 78536260
readed_event_table_pos: 78536260
readed b: 100101011100101111001000100
readed_ b: 100101011100101111001000100
compute(2) b: 1111111111111111111010100110111000100
compute(4) b: 11111111111111111110101000010101000100
compute(2) : 137438776772
compute(4) : 274877547844
No match
sandbox/mwe/6535_test_cnt.py ✓✓✓✓✓✓✓✓✓ 90% █████████
expected_n_bytes: 4
expected_event_table_pos: 54570725
readed_event_table_pos: 54570725
readed b: 11010000001010111011100101
readed_ b: 11010000001010111011100101
compute(2) b: 1101000000110001100000101
compute(4) b: 11010000001010111011100101
compute(2) : 27288325
compute(4) : 54570725
match found: ( n_bytes: 4 event_table_pos: 54570725 )
sandbox/mwe/6535_test_cnt.py ✓✓✓✓✓✓✓✓✓✓ 100% ██████████
It works for some cases; in particular, it fixes the >2 GB case:
expected_n_bytes: 4
expected_event_table_pos: 4971618850
readed_event_table_pos: 676651554
readed b: 101000010101001110001000100010
readed_ b: 101000010101001110001000100010
compute(2) b: 10010100001010100111011010100010
compute(4) b: 100101000010101001110001000100010
compute(2) : 2485810850
compute(4) : 4971618850
match found: ( n_bytes: 4 event_table_pos: 4971618850 )
sandbox/mwe/6535_test_cnt.py ✓✓✓✓ 40% ████
But it fails to find a match in cases where it should work (the "No match" entries in the trace above). |
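To see the hit-and-miss pattern in isolation, the sketch below replays the suffix test on the numbers printed in the trace (stored value, compute(2), compute(4)) for the seven distinct files. It uses Python's built-in `format(..., 'b')` in place of `np.binary_repr`, which yields the same string for positive integers. `suffix_match` and `TRACE` are hypothetical names, and the pairing of file names with trace entries assumes the tests ran in parametrization order.

```python
def suffix_match(stored, candidates):
    """Return the n_bytes whose candidate ends with the stored bit string."""
    feature = format(stored, "b").lstrip("-")
    for n_bytes, candidate in candidates.items():
        if format(candidate, "b").endswith(feature):
            return n_bytes
    return None


# (file, expected n_bytes, stored position, {n_bytes: computed position})
TRACE = [
    ("914flankers", 4, 156474479, {2: 78068965, 4: 156132005}),
    ("cont_68chan_32bit", 4, 57267440, {2: 28636720, 4: 57267440}),
    ("pilote_resting_01", 2, 15518747, {2: 20795, 4: 38965}),
    ("cont_22chan_4gb_32bit_toolong", 4, 676651554,
     {2: 2485810850, 4: 4971618850}),
    ("SampleCNTFile_16bit", 2, 133700, {2: 133700, 4: 261700}),
    ("BoyoAEpic1_16bit", 2, 78536260, {2: 137438776772, 4: 274877547844}),
    ("cont_67chan_resp_32bit", 4, 54570725, {2: 27288325, 4: 54570725}),
]

results = [suffix_match(stored, cands) for _, _, stored, cands in TRACE]
for (name, expected, _, _), got in zip(TRACE, results):
    print(f"{name}: expected {expected}, inferred {got}")
```

In the three misses, neither computed position reproduces the stored one even though those files are well below the overflow range, which suggests that the header's n_samples (or the assumed 900 + 75 * n_channels layout) does not describe those files, rather than a problem with the overflow handling itself.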
@agramfort, @AimerLee maybe we should merge with |
thx @massich |
Fixes #6535