Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VWIP] Add EEGLAB raw event reader #2745

Closed
wants to merge 7 commits into from

Conversation

jona-sassenhagen
Copy link
Contributor

Start on a basic event reader for EEGLAB files.
Modelled partially after the Brainvision way of handling string events.

Needs everything a PR can need but for basic functionality.

@jona-sassenhagen
Copy link
Contributor Author

At least this (and the EEGLAB reader) work for my files.

unknown-51

@jona-sassenhagen
Copy link
Contributor Author

forgot to ping @jasmainak

@jasmainak
Copy link
Member

yeah, I think I'll take care of it. It needs to be integrated properly. If you have something working for now, that's great :)

@jona-sassenhagen
Copy link
Contributor Author

Note EEGLAB events can have fully arbitrary fields. (I have old EEGLAB files with 20 manually constructed fields). Quoting the manual,

In general, only the fields "type", "latency", and "urevent" are always a part of the event structure, other fields like "epoch" and "duration" are created automatically by EEGLAB as a result of extracting epochs and data rejection, respectively. There are also user defined fields

Maybe there should be a warning if there are extra fields that they will be dropped, and "duration" could be read out on demand.

This may have broken functionality, didn't test.
@teonbrooks teonbrooks mentioned this pull request Dec 24, 2015
6 tasks
@teonbrooks
Copy link
Member

to make a stim channel, all I do is create an event array of start times, trigger values, and optionally duration (default to one if not specified). then I create a vector the length of the data, and for the start times, I just add the trigger value for that duration
some pseudocode

events = [[25, 128, 1], [100, 16, 1]]
stim_channel = np.zeros(data.shape[1], int)
for event in events:
    stim_channel[event[0]:event[0] + event[2]] = event[1]

@jona-sassenhagen
Copy link
Contributor Author

No, that works (it's actually easier here than creating the array), I mean, how do I add it to the MNE raw object as a channel? Creating the info etc.

@teonbrooks
Copy link
Member

oh yeah, you will have to copy your data unless you generate an array with an additional empty channel. if you add the stim_channel to the info['chs'] prior to reading data, I think it will allocate an empty stim channel and you can bypass the extra copy

@@ -445,3 +445,46 @@ def __init__(self, input_fname, events=None, event_id=None, tmin=0,
reject=reject, flat=flat, reject_tmin=reject_tmin,
reject_tmax=reject_tmax, add_eeg_ref=False, verbose=verbose)
logger.info('Ready.')


def _create_events_from_eeglab_raw(fname, event_id=dict()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to try integrating it into the reader? :) Take a look at edf.py how the stim_channel argument is added. You'll have to do this for read_raw_eeglab.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd try putting it into the reader. I don't think we need a dedicated function. I'll probably refocus the reader to return an events channel directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a stim channel to the data before you call the _BaseEpochs and also update the info dict accordingly. Don't return anything from the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking having it work like e.g. the Brainvision reader, which just returns a raw with a stim channel. And this would remain a private method that is called internally by read_raw_eeglab to fill the stim channel with values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like a plan. Let us know when you are ready for review :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I'm done visiting all my grandparents and in-laws ...

@teonbrooks
Copy link
Member

@jasmainak was is the _read_segments_file function in utils.py? why not just override the _read_segment_file like the rest of the readers. Is the idea to make a generic use case?

@kingjr
Copy link
Member

kingjr commented Dec 24, 2015

Have a break guys, it's Christmas Eve ;)

On Thursday, 24 December 2015, Teon L Brooks notifications@github.com
wrote:

@jasmainak https://github.com/jasmainak was is the _read_segments_file
function in utils.py? why not just override the _read_segment_file like
the rest of the readers. Is the idea to make a generic use case?


Reply to this email directly or view it on GitHub
#2745 (comment)
.

@jona-sassenhagen
Copy link
Contributor Author

Have a break guys, it's Christmas Eve ;)

I'll see if I can manage opening a PR on Dec 31st, 24:00 :)

@jasmainak
Copy link
Member

come on, in India we have so many holidays I am bored ;)

@teonbrooks teonbrooks changed the title [VWIP] Add EELAB raw event reader [VWIP] Add EEGLAB raw event reader Dec 26, 2015
@jona-sassenhagen
Copy link
Contributor Author

@teonlamont I moved _synthesize_stim_channel out of brainvision and into io.utils (for code reusage here and in BV). I think I didn't break anything, but maybe you can check.

@jasmainak I'm creating a stim channel for the info, and I'm adding a stim channel under raw._event_ch (this is how the BV importer is doing it). However, I don't understand how to make this work with on-demand reading of the data. So currently, the code doesn't work at all. Can you take a look?

entirely dropping them (with a warning) if this is impossible"""
from scipy.io import loadmat

eeg = loadmat(fname, struct_as_record=False, squeeze_me=True)["EEG"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you load this again? It's already loaded once

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, because it used to be a standalone function.
I'll pass eeg as the 1st argument instead.

@jasmainak
Copy link
Member

@jasmainak I'm creating a stim channel for the info, and I'm adding a stim channel under raw._event_ch (this is how the BV importer is doing it). However, I don't understand how to make this work with on-demand reading of the data. So currently, the code doesn't work at all. Can you take a look?

You need to hack the _read_segment_file function to have a stim param so it takes the stim channel and returns you the segment of the stim channel that is read on-demand.

@jona-sassenhagen
Copy link
Contributor Author

Ugh. That indeed sounds hacky.

@jasmainak
Copy link
Member

Let me know when it's ready :) I can review the code.

@jona-sassenhagen
Copy link
Contributor Author

Sorry, I can't quite wrap my head around this. I've hacked together something that works if preload=True, but it's very ugly and fails for preload=False. I just don't understand the data io stuff enough.

def _read_segments_file(raw, data, idx, fi, start, stop, cals, mult,
                        dtype='<i2', eeglab=False):
    """Read a chunk of raw data"""
    n_channels = raw.info['nchan'] - (1 if eeglab else 0)
    n_bytes = np.dtype(dtype).itemsize
    # data_offset and data_left count data samples (channels x time points),
    # not bytes.
    data_offset = n_channels * start * n_bytes
    data_left = (stop - start) * n_channels
    if eeglab:
        idx = slice(0, n_channels, None)
        cals = cals[:-1]

    # Read up to 100 MB of data at a time, block_size is in data samples
    block_size = ((int(100e6) // n_bytes) // n_channels) * n_channels
    block_size = min(data_left, block_size)
    with open(raw._filenames[fi], 'rb', buffering=0) as fid:
        fid.seek(data_offset)
        # extract data in chunks
        for sample_start in np.arange(0, data_left, block_size) // n_channels:

            count = min(block_size, data_left - sample_start * n_channels)
            block = np.fromfile(fid, dtype, count)
            block = block.reshape(n_channels, -1, order='F')
            n_samples = block.shape[1]  # = count // n_channels
            sample_stop = sample_start + n_samples
            data_view = data[:-(1 if eeglab else 0), sample_start:sample_stop]
            _mult_cal_one(data_view, block, idx, cals, mult)
            if eeglab:
                data[-1, sample_start:sample_stop] = \
                raw._event_ch[sample_start:sample_stop]

(Of course, this would also have to be amended to deal with cases where the last channel isn't the stim channel via properly using a picks-like selection.)

Can you come up with something better @jasmainak ?

@jasmainak
Copy link
Member

can you push the changes so I can see exactly how it fails?

@jona-sassenhagen
Copy link
Contributor Author

I can, but isn't it awkward to push something you know is really bad and will have to be completely remade? But if you say so, I can push.

@jasmainak
Copy link
Member

you can do $ git commit --amend to amend your last commit when you remade it :)

@jona-sassenhagen
Copy link
Contributor Author

@jasmainak I've pushed. I haven't fixed any of the other stuff you've noted yet, will do when this one is solved.

non-integer part of events containing integers, and completely
dropping any events without integer parts. If non-integer events
should be read, this should be a dict mapping from their names to
integers, e.g. dict(fmri_scan_onset=199, recording_start=255).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.wikipedia.org/wiki/KISS_principle

we need to simplify this. Why not generate events from the unique event keys (which you get from EEGLAB). So, basically you generate a mapping yourself and then tell the user about the mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this, but consider this solution superior. 1. this is how BV does it. 2. most users use mostly int codes, and many readers already (silently) drop the non-integer parts of an event id (for example, the BV and EGI readers do this). 3. you want to predictably map the same triggers to the same event_ids across subjects, ruling out many ways of automatically generating mappings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this, but consider this solution superior. 1. this is how BV does it.

ok

  1. most users use mostly int codes, and many readers already (silently) drop the non-integer parts of an event id (for example, the BV and EGI readers do this).

silent stuff is always scary

  1. you want to predictably map the same triggers to the same event_ids across subjects, ruling out many ways of automatically generating mappings.

okay, I like this point. But you can sort the unique event keys and then assign them trigger codes? Kind of what we do in plot_events ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe look at hashlib if it can generate integer hashes ... that could be another way to go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

silent stuff is always scary

Yes, that's why I'm warning here!

But you can sort the unique event keys and then assign them trigger codes?

What if one subject has a trigger S1 and the other doesn't? Then the mapping will differ for every single trigger, and it will be very complicated to work across subjects.

or maybe look at hashlib if it can generate integer hashes ... that could be another way to go

The problem with hashing is they're unpredictable. This way, if a stimulus is called "S196" or "D196", it will be 196 - this is, I think, the least surprising solution. You don't have to look up what hash 8sj@us is, you just know "S196" == 196.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because for BV, most regular triggers have one of two types (style 'S 1', 'S 10' etc). But EEGLAB triggers can be arbitrary strings (although I expect most to be of the same type as the BV one). There are also string-only triggers in BV files, but they should be rare (essentially recording breaks/data discontinuities and MR scan onsets are the only ones I know).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because for BV, most regular triggers have one of two types (style 'S 1', 'S 10' etc). But EEGLAB triggers can be arbitrary strings

But this is exactly what you want for eeglab default behavior

So you could make both readers API consistent

(although I expect most to be of the same type as the BV one). There are also string-only triggers in BV files, but they should be rare (essentially recording breaks/data discontinuities and MR scan onsets are the only ones I know).


Reply to this email directly or view it on GitHub.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underlyingly, what we IMO want is something like

events = list()
for event_code in list_of_event_codes:
    parsed_code = event_id.get(event_code, event_id_func(event_code))
    events.append(parsed_code)

And for BV, event_id_func is always dropping leading alphabetic characters and returning the int part (with the exception of 3 or so events of the type recording_break or so). So for BV, event_id_func is unnecessary.

So I think having a separate event_id and event_id_func is clearer; first, look it up in an optional dict, and default to calling an optional function. For BV, the function is always the same, so it's not exposed. For EEGLAB, it defaults to the same one for BV, but it's exposed for dealing with other input types.

If you really think BV should have a callable option though, I'll ping Teon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agramfort @jona-sassenhagen are we converging? @jona-sassenhagen whatever solution you have, can you make it work first? You have a failing test: https://ci.appveyor.com/project/Eric89GXL/mne-python/build/1.0.4561#L695 and it's not because of preload

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underlyingly, what you IMO want is something like

events = list()
for event_code in list_of_event_codes:
parsed_code = event_id.get(event_code, event_id_func(event_code))
events.append(parsed_code)

And for BV, event_id_func is always dropping leading alphabetic
characters and returning the int part (with the exception of 3 or so events
of the type recording_break or so). So for BV, event_id_func is
unnecessary.

Ok you win. looks like we converged :)

@jasmainak
Copy link
Member

Let me know when @jona-sassenhagen when you've simplified the logic a bit.

@jona-sassenhagen
Copy link
Contributor Author

Oh, I see I've made it fail for Nicolet files - that I haven't looked at at all for now, I first want something that basically works.

@jona-sassenhagen
Copy link
Contributor Author

This one's next, as soon as I've understood how Jaakko did it.

@jaeilepp
Copy link
Contributor

If you rebase now, you might be able to use the new _read_segments_file function. You just have to pass the number of channels you want to read from the file (excluding stim channel that is added after), offset to point to the start of the actual data in the file and the data for the trigger channel.
It should be usable if you don't have to read header data from the file in midst of data reading.

@jona-sassenhagen
Copy link
Contributor Author

No - the way it works is, a fully functional stim channel is created and stored on the raw object. Will see.

@agramfort
Copy link
Member

can I close?

@jona-sassenhagen
Copy link
Contributor Author

Yes - will rework hopefully with a bit of assistance of @jaeilepp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants