Skip to content

Conversation

@AimerLee
Copy link

@AimerLee AimerLee commented Jul 10, 2019

In previous version, the way to obtain the last 32 bits of the event_table_pos is in fact wrong, which is fix in this PR.

As for the file mentioned in the Issue #6550, named 914flankers.cnt, is in fact a special file which has no overflow problem, but the calculated event table pos and the readed version from SETUP section just unmatch. That means reading this file will still throw an exception. One possible reason is there is the number of samples is not correct, and I will give another PR to fix this problem (one simple solution is not to throw an exception, but return the readed value)

Before that, I have test all the cnt file in my hand, if possible, could try the new version on your dataset agian?

@codecov
Copy link

codecov bot commented Jul 11, 2019

Codecov Report

Merging #24 into fix_cnt_better_auto_data_format will increase coverage by <.01%.
The diff coverage is 83.33%.

@@                         Coverage Diff                         @@
##           fix_cnt_better_auto_data_format      #24      +/-   ##
===================================================================
+ Coverage                            89.11%   89.12%   +<.01%     
===================================================================
  Files                                  416      416              
  Lines                                74865    74868       +3     
  Branches                             12340    12340              
===================================================================
+ Hits                                 66718    66728      +10     
+ Misses                                5307     5302       -5     
+ Partials                              2840     2838       -2

1 similar comment
@codecov
Copy link

codecov bot commented Jul 11, 2019

Codecov Report

Merging #24 into fix_cnt_better_auto_data_format will increase coverage by <.01%.
The diff coverage is 83.33%.

@@                         Coverage Diff                         @@
##           fix_cnt_better_auto_data_format      #24      +/-   ##
===================================================================
+ Coverage                            89.11%   89.12%   +<.01%     
===================================================================
  Files                                  416      416              
  Lines                                74865    74868       +3     
  Branches                             12340    12340              
===================================================================
+ Hits                                 66718    66728      +10     
+ Misses                                5307     5302       -5     
+ Partials                              2840     2838       -2

Copy link
Owner

@massich massich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I don't fully get is the logic we are following to detect the new position.

I mean if the distance is 0 it means that event_table_pos was right in the first place. How does it handle the overflow?

If I understand properly we start searching for this event_table_pos at whatever is in the SETUP and we keep incrementing by unit32. What if its 16?

What if we try to replicate the process that generate the value of the SETUP? If I understand properly the value depends on n_bytes, sfreq, n_samples and n_channels. We have them all but n_bytes.

We can carry the computation twice one with int32 and once with np.int64 or c_longlong then we detect the overflow and compare the result of the overflow with whatever is in the event_table_pos. If they match it means that we found it.

Just thinking out loud



def _compute_robust_event_table_position(fid):
def _compute_robust_event_table_position(fid, data_format):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the data_format? we could always test agains [2, 4] and who ever matches we return it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand properly we start searching for this event_table_pos at whatever is in the SETUP and we keep incrementing by unit32. What if its 16?

Yes, but why 16? For example, if the value of readed event table pos is 3 , and the file size is 30 bytes, the maximum of uint32 is 10. The real event table pos must in [3, 13, 23], right? And now, n_bytes is unknown, if cal_event_table=5, when n_bytes=2 and cal_event_table=13, when n_bytes=4. Then we can say the overflow occurs, real event table pos is 13, n_bytes=4.

What if we try to replicate the process that generate the value of the SETUP? If I understand properly the value depends on n_bytes, sfreq, n_samples and n_channels. We have them all but n_bytes.
Yes, tha's the reason, we will try both n_bytes =2 and n_bytes = 4.

We can carry the computation twice one with int32 and once with np.int64 or c_longlong then we detect the overflow and compare the result of the overflow with whatever is in the event_table_pos. If they match it means that we found it.
Yes, and Maybe int16 and int32.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the data_format? we could always test agains [2, 4] and who ever matches we return it.

The inference is based on the fact n_samples is reliable. But sometimes it not true (the 914flankers.cnt case), which makes the inference not so reliable. At that time, use the data format from user, could help use find out the correct event table pos correctly.

@massich
Copy link
Owner

massich commented Jul 11, 2019

As for the file mentioned in the Issue mne-tools#6550, named 914flankers.cnt, is in fact a special file which has no overflow problem, but the calculated event table pos and the readed version from SETUP section just unmatch. That means reading this file will still throw an exception. One possible reason is there is the number of samples is not correct, and I will give another PR to fix this problem (one simple solution is not to throw an exception, but return the readed value)

That was the problem all along, that the solution was breaking the current test suite 'cos the computation was not identifying those cases. i.e: https://github.com/mne-tools/mne-python/pull/6550/checks?check_run_id=165686581

@massich
Copy link
Owner

massich commented Jul 11, 2019

Then we need to find a way around it. But we need to make both scenarios work.

Any ideas on what to do if we can't trust n_samples?

@massich
Copy link
Owner

massich commented Jul 11, 2019

I guess the question is:
If n_samples is right do we always find a match between even_table_pos and the computation?

If so, it means that overflow or not we would match. If we don't match, we have to assume that either even_table_pos or n_samples are wrong. In which chase we can assume that the wrong one is n_samples.

@massich
Copy link
Owner

massich commented Jul 11, 2019

WDYT?

@AimerLee
Copy link
Author

AimerLee commented Jul 11, 2019

I guess the question is:
If n_samples is right do we always find a match between even_table_pos and the computation?

Definetly, I think. At first, there is little possible that the calc_event_table_pos could be wrong if n_samples is correct, n_channels has little possible to be wrong and overflow. And we also don't need to consider the situation n_samples overflows ( this takes a long experiment time, approximatly hundreds of hours).

If so, it means that overflow or not we would match. If we don't match, we have to assume that either >even_table_pos or n_samples are wrong. In which chase we can assume that the wrong one is >n_samples.

Yes, In my code, the n_samples will be considered as wrong instead of the event_table_pos. And the n_samples will returned directly without correction. Because I am not sure whether the data in the gap between n_samples and TEEG has any meaning, and should considered as sample point.

                                  Meaningless data?
|                               |xxxxxxxxxxxxxxxxxx|                     |
                          n_samples            Event_table_pos

@massich
Copy link
Owner

massich commented Aug 5, 2019

did we finish all this? now I'm confused. can you refresh me @AimerLee

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants