Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Support exporting mne objects to xarray DataArray objects #11464

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

mmagnuski
Copy link
Member

@mmagnuski mmagnuski commented Feb 10, 2023

Adresses #7793

Part 1 of N parts:

  • tests adding xarray as dependency
  • adds .to_xarray() methods to Evoked and Epochs objects
  • adds a simple example of the functionality

other mne objects will be covered in future PRs (and also maybe exporting metadata to additional xarray coords).

TODO:

  • requires xarray decorator for pytest?
  • support picks
  • .to_xarray() for Evoked
  • .to_xarray() for Epochs
  • include 'bads' in .to_xarray()? (or just add "bad" channel coordinate?)
  • simple example

@mmagnuski
Copy link
Member Author

Ok, so far so good, test for a to_xarray sketch work. I was thinking about channel picks and handling types:

  • I don't think adding picks to .to_xarray() method would be necessary, one can just do evoked.copy().pick(picks).to_xarray()
  • I'm not sure how to handle channel types. I can just assign additional coordinate that contains these types with a 'ch_type' label:
    xarr = xarr.assign_coords(
        ch_type=('chan', evoked.get_channel_types())
    )

@larsoner
Copy link
Member

I'm not sure how to handle channel types. I can just assign additional coordinate that contains these types

I have no experience with xarray so no ideas here. Maybe @drammock or @hoechenberger ?


if isinstance(mne_inst, Epochs):
data = mne_inst.get_data()
dims = ('chan', 'trial', 'time')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call the dimension "channel"?

Copy link
Member Author

@mmagnuski mmagnuski Feb 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer shorter dimension names as you use them often during various operations, but I will change the name if this would be the common preference. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we're internally consistent on questions like this. E.g., in function names sometimes we use channel and sometimes ch. Also when exporting to data frames, the spectrum class makes a column freq (not frequency). So I'm not really sure what to do WRT "channel". However, I would say that trial is not ideal, it should be epoch (for e.g. resting state recordings cut into sequential chunks "trial" doesn't make sense)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yes I will definitelly change trial to epoch, especially that you can have multiple epochs from the same trial (for example successively presented items to remember).

@drammock
Copy link
Member

I don't think adding picks to .to_xarray() method would be necessary, one can just do evoked.copy().pick(picks).to_xarray()

not necessary, but wouldn't it avoid an extra copy operation to have picks in the to_xarray method?

I'm not sure how to handle channel types. I can just assign additional coordinate that contains these types with a 'ch_type' label:
python xarr = xarr.assign_coords( ch_type=('chan', evoked.get_channel_types()) )

I think yes, an additional coordinate that is "parallel to" or "overlaid on" the channels coord makes sense here (not sure exactly what the right terminology is, I don't use xarray very much)

@mmagnuski
Copy link
Member Author

not necessary, but wouldn't it avoid an extra copy operation to have picks in the to_xarray method?

It could lead to less characters used, for sure:

evoked.to_xarray(picks=picks)

vs

evoked.copy().pick(picks).to_xarray(copy=False)

I am still not sure, however. For operations like .to_data_frame() having picks seems a good idea as you frequently would like to export just a subset of channels. But for xarray, I actually expect you would frequently transform the whole object to xarray without any picking. And then at some point you can just use .sel(chan=ch_names) (or .sel(channel=ch_names) if we settle on such naming) or .isel(chan=picks) if you need a subset.

@drammock
Copy link
Member

it's a good point that picks might not get used very frequently (though I can imagine maybe it getting used to pull out all channels of one type, e.g., all grads). But what I'm saying is that .to_xarray() is already going to need to make a copy of the data, and if someone wants to subset via picks then they would need another copy operation before the call to .to_xarray(). So it would be more memory efficient to include the picks as a parameter to .to_xarray() (provided that the approach indexes the data array internally to the .to_xarray method)

@larsoner
Copy link
Member

FWIW internally to_xarray() should probably just call self.get_data(picks=picks) so I think it would be a pass-through argument / low effort to include it

@mmagnuski
Copy link
Member Author

But what I'm saying is that .to_xarray() is already going to need to make a copy of the data, and if someone wants to subset via picks then they would need another copy operation before the call to .to_xarray().

That's why the example used .to_xarray(copy=False) to avoid the second copy (current mne.utils.xarray.to_xarray already has copy). But I agree picks should be an easy addition, so I'll add it.

@larsoner
Copy link
Member

copy=False won't always even work, though -- sometimes the data isn't even loaded from disk. And if you .get_data(picks=[0]) it should read just the first channel's data from disk for example.

So we might need to think/plan out more deeply how all the copy stuff works with preloaded data vs not (at least for Epochs, where it matters).

@larsoner
Copy link
Member

IIRC I think .get_data always returns a copy of the data, so if that's actually the case I would expect .to_xarray to internally always implicitly make a copy and have the returned xarray own that data (and any modifications to it not be reflected in the Epochs/Evoked object itself)

@mmagnuski
Copy link
Member Author

IIRC I think .get_data always returns a copy of the data

The docs for Evoked claim that .get_data() returns a view, but the docs may be wrong. :)

@mmagnuski
Copy link
Member Author

@larsoner You are right. The docs say that the data returned in .get_data() are A view on evoked data., but this seems incorrect:

import numpy as np
import mne

info = mne.create_info(list('abcd'), sfreq=250)
data = np.random.rand(4, 350)
erp = mne.EvokedArray(data, info, tmin=-0.5)

data = erp.get_data()
data[0, 0] = 23.

print(data[0, 0])
print(erp.data[0, 0])

@mmagnuski
Copy link
Member Author

(I still remember about this PR and should have some time to finish it soon)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants