
[WIP] ENH: resampling with annotations #11408

Open
wants to merge 1 commit into base: main
Conversation

@jasmainak (Member) commented Jan 5, 2023

closes #10447

It begins to get a little hairy when one gets into the details, particularly preloaded vs not ... and how to handle concatenated raws

@larsoner (Member) commented Jan 6, 2023

I think it's actually going to be a bit worse than this (ignoring preloaded vs. not for now, which I think isn't the most annoying part). Resampling with and without annotations needs to end up with the same number of samples, and the annotation boundaries will likely land in between samples, so we have to handle that properly...

For example, if you have a 1 s signal at 1000 Hz with 100-200 ms annotated as bad, in principle you want to resample samples 0-99 and 200-999, and copy 100-199. But in practice, if you are resampling by some non-integer factor of the sample rate (e.g., to 1666 Hz or something), you have to be very careful about floor/round/ceil-ing the number of samples within each resampled or copied interval... yikes.
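To make the bookkeeping concrete, here is a toy calculation (the 1666 Hz target and the segment lengths are just example values) showing how rounding each segment's resampled length independently can disagree with rounding the whole signal's length:

```python
sfreq, new_sfreq = 1000.0, 1666.0   # a non-integer resampling factor
segments = [100, 100, 800]          # good, bad, good (lengths in samples)
ratio = new_sfreq / sfreq
per_segment = [round(n * ratio) for n in segments]
total_from_segments = sum(per_segment)
total_from_whole = round(sum(segments) * ratio)
print(per_segment, total_from_segments, total_from_whole)
# [167, 167, 1333] 1667 1666 -> segment-wise rounding gains an extra sample
```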

Also, I'm realizing now that we might not need to add upfirdn in this PR. Instead we should maybe resample all intervals the same way, including the ones within the BAD sections. If we want to use polyphase/upfirdn to do this rather than freq-domain resampling, we should probably resurrect #5136 to help us first.

@jasmainak (Member Author) commented Jan 6, 2023

I guess we could have a unit test where the annotation period has a huge artifact (like in my data), and then check that the artifacts don't spread to the non-annotation periods ... that would probably address the rounding issues you are concerned about.
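Something like the following pytest sketch, where skip_by_annotation is the option proposed in this PR (it doesn't exist on raw.resample yet), so this is only an illustration of the test idea:

```python
import numpy as np
import mne


def test_artifact_does_not_spread():
    # 1 s of a clean 10 Hz sine at 1000 Hz, with a huge artifact at 100-200 ms
    sfreq = 1000.0
    t = np.arange(int(sfreq)) / sfreq
    data = np.sin(2 * np.pi * 10.0 * t)[np.newaxis]
    data[:, 100:200] += 1e6  # the annotated artifact
    raw = mne.io.RawArray(data, mne.create_info(["ch1"], sfreq, "eeg"))
    raw.set_annotations(mne.Annotations([0.1], [0.1], ["BAD_artifact"]))
    raw.resample(500.0, skip_by_annotation=True)  # hypothetical parameter
    # After resampling to 500 Hz the artifact spans ~samples 50-100, so data
    # outside a small guard band around it should still be ~unit scale.
    good = np.concatenate([raw.get_data()[0, :45], raw.get_data()[0, 105:]])
    assert np.abs(good).max() < 10.0
```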

I like the idea of using the same resampling method in annotation vs. non-annotation periods ... although I'd be happy if it works even with just FFT-based methods.

@scott-huberty (Contributor) commented Jul 10, 2023

Hi @larsoner and @jasmainak - I think having a "skip_by_annotation"-like behavior for resampling would be good for eye-tracking data (for example, there can be NaNs in the signal during blinks or tracker dropout).

If you still think this is feasible given the challenges you raised regarding non-integer sampling frequencies, let me know; maybe I can help out on this at some point this summer.

@jasmainak (Member Author)

This PR dropped from my priority list ... feel free to take over from where I left off!

@larsoner (Member)

The more I think about this the more I think it'll be really difficult to resample segment-by-segment.

A potentially simple way to figure out the segment resampling mapping: take the original signal of length N with K usable segments and create an array of shape (N,), filling each sample with the index k (starting from 1) of the segment the data belong to, leaving the NaN/bad segments as 0. If we interp1d this in nearest mode, we end up with the mapping to the new usable segments. For data with 3 usable segments and 12 samples, for example, we might end up with an array that looks like:

1 1 1 0 2 2 0 0 0 3 3 3

If we use interp1d on it to downsample by a factor of 2, for example, I think we'd get something like:

1 1 2 0 0 3

You get some ugly stuff here: the first three samples end up being resampled to 2 samples in the output, while the last 3 samples get resampled to just 1 sample. These have different effective downsampling factors (!). The problem shrinks asymptotically as the number of samples increases, but it never goes away completely. So frequencies get remapped differently depending on each segment's offset relative to the downsampling factor.
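For what it's worth, this mapping can be checked directly with SciPy (just the toy arrays above, nothing MNE-specific):

```python
import numpy as np
from scipy.interpolate import interp1d

labels = np.array([1, 1, 1, 0, 2, 2, 0, 0, 0, 3, 3, 3])
x_old = np.arange(labels.size)
x_new = np.arange(0, labels.size, 2)  # downsample by a factor of 2
mapping = interp1d(x_old, labels, kind="nearest")(x_new).astype(int)
print(mapping)  # [1 1 2 0 0 3]
```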

So instead, I'm starting to think at least one safe thing to do would be to:

1. Resurrect MRG: Add polyphase resampling #5136 (i.e., allow upfirdn mode)
2. Raise an informative error when not in upfirdn mode in either of these cases:
   1. When skip_by_annotation=True (default False for backward compat)
   2. When not np.isfinite(data).all() (I don't actually know our current behavior; it might already raise)
3. Figure out the amount of signal spread from the FIR step
4. Expand the range of annotations by the signal spread

Step (4) could be made optional if people really want, i.e., we can add an option to disable the annotation expansion; a rough sketch of steps (1), (3), and (4) follows below.
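A minimal sketch of what steps (1), (3), and (4) could look like, assuming scipy.signal.resample_poly for the polyphase step; the function name and the 10 * max(up, down) filter half-length rule of thumb are assumptions, not a final design:

```python
import numpy as np
from scipy.signal import resample_poly


def resample_with_expanded_annots(data, onsets, durations, sfreq, up, down):
    """Hypothetical sketch: polyphase-resample and expand BAD_ annotations."""
    out = resample_poly(data, up, down, axis=-1)  # step (1): polyphase/upfirdn
    # Step (3): estimate the FIR spread. SciPy designs the filter at the
    # upsampled rate; ~10 * max(up, down) taps per side is a rule of thumb
    # for the default kaiser-windowed design (an assumption here).
    half_len = 10 * max(up, down)     # in samples of the upsampled signal
    spread = half_len / (sfreq * up)  # the same half-width, in seconds
    # Step (4): expand each annotation by the spread on both sides.
    new_onsets = np.asarray(onsets, float) - spread
    new_durations = np.asarray(durations, float) + 2 * spread
    return out, new_onsets, new_durations
```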

@scott-huberty (Contributor)

I think I can see your point about why the first simple approach is limited.

I'm not too familiar with polyphase resampling, but if I'm understanding correctly, is the idea that when skip_by_annotation=True and upfirdn=True, we resample block by block using polyphase resampling? Or is it that in this case we use polyphase resampling on the whole signal (and I'm assuming values like NaN won't propagate across the whole signal, hence steps 3 and 4 in your proposal)?

@larsoner (Member)

> Or is it that in this case we use polyphase resampling on the whole signal (and I'm assuming values like NaN won't propagate across the whole signal, hence steps 3 and 4 in your proposal)?

Yes, this is it -- bad values will spread, but we spread the BAD_ annotations to reflect that. And hopefully they don't spread super far (I don't think they will, at least).
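A quick way to convince ourselves of the spread size (pure SciPy, with made-up numbers):

```python
import numpy as np
from scipy.signal import resample_poly

x = np.random.default_rng(0).standard_normal(1000)
x[500] = np.nan                      # one bad sample halfway through
y = resample_poly(x, 2, 3)           # resample to 2/3 of the rate
bad = np.flatnonzero(np.isnan(y))
print(bad.min(), bad.max(), y.size)  # NaNs stay confined near 500 * 2/3,
# spreading only by the FIR's half-width, not across the whole signal
```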

@eduardosand commented Nov 30, 2023

> The more I think about this the more I think it'll be really difficult to resample segment-by-segment.
>
> A potentially simple way to figure out the segment resampling mapping: take the original signal of length N with K usable segments and create an array of shape (N,), filling each sample with the index k (starting from 1) of the segment the data belong to, leaving the NaN/bad segments as 0. If we interp1d this in nearest mode, we end up with the mapping to the new usable segments. For data with 3 usable segments and 12 samples, for example, we might end up with an array that looks like:
>
> 1 1 1 0 2 2 0 0 0 3 3 3
>
> If we use interp1d on it to downsample by a factor of 2, for example, I think we'd get something like:
>
> 1 1 2 0 0 3
>
> You get some ugly stuff here: the first three samples end up being resampled to 2 samples in the output, while the last 3 samples get resampled to just 1 sample. These have different effective downsampling factors (!). The problem shrinks asymptotically as the number of samples increases, but it never goes away completely. So frequencies get remapped differently depending on each segment's offset relative to the downsampling factor.

It seems like if you interpolate to replace the 0 values ahead of downsampling, you at least partly mitigate this issue, because the valid values will bleed into the 0s. So linear interpolation in my scheme would look like:

1 1 1 1.5 2 2 2.25 2.5 2.75 3 3 3

And then decimating by a factor of 2 gets you:

1 1 2 2.25 2.75 3
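For reference, this fill-then-decimate scheme can be reproduced with plain NumPy (just the toy arrays, nothing MNE-specific):

```python
import numpy as np

labels = np.array([1, 1, 1, 0, 2, 2, 0, 0, 0, 3, 3, 3], float)
good = labels != 0
x = np.arange(labels.size)
filled = np.interp(x, x[good], labels[good])  # linearly bridge the zeros
print(filled)       # [1. 1. 1. 1.5 2. 2. 2.25 2.5 2.75 3. 3. 3.]
print(filled[::2])  # [1. 1. 2. 2.25 2.75 3.]
```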

On my end, I've been using cubic splines to interpolate across dropped samples and then pre-processing as normal.

@larsoner (Member) commented Dec 1, 2023

> It seems like if you interpolate to replace the 0 values ahead of downsampling, you at least partly mitigate this issue, because the valid values will bleed into the 0s.

But the mapping between original-signal number of samples (three for the first good segment, three for the last good segment) and resampled-signal number of samples (two for the first good segment, one for the last good segment) will always have this wacky ratio problem. So I think the "resample segment by segment" idea is likely still doomed unless you're very careful (somehow) about padding each good segment the same way or something, which would have its own issues.

So I think the polyphase + annotation-expansion approach is probably the safest and a pretty straightforward route forward. Maybe someday we could add cubic or linear interpolation in the bad segments, but I think that can be a separate function like interpolate_bad_segments(raw, mode='cubic' | 'linear') or similar. And maybe/hopefully it won't even be needed (YAGNI) with the annotations being expanded by the resample function, we'll see!
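If that separate function ever materializes, a bare-bones version might look like this (the name and mode options come from the comment above; everything else, including the array-based signature, is an assumption):

```python
import numpy as np
from scipy.interpolate import CubicSpline


def interpolate_bad_segments(data, bad_mask, mode="cubic"):
    """Hypothetical helper: interpolate across bad samples, channel by channel.

    data: (n_channels, n_times); bad_mask: boolean array of shape (n_times,).
    """
    t = np.arange(data.shape[-1])
    out = data.copy()
    for ch in out:
        if mode == "cubic":
            ch[bad_mask] = CubicSpline(t[~bad_mask], ch[~bad_mask])(t[bad_mask])
        else:  # "linear"
            ch[bad_mask] = np.interp(t[bad_mask], t[~bad_mask], ch[~bad_mask])
    return out
```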

@larsoner modified the milestones: 1.7, 1.8 (Apr 9, 2024)
Successfully merging this pull request may close: add skip_by_annotation to notch_filter and resample