
[WIP] ENH: resampling with annotations #11408

Open
wants to merge 1 commit into base: main
Conversation

@jasmainak (Member) commented Jan 5, 2023

closes #10447

It begins to get a little hairy when one gets into the details, particularly preloaded vs not ... and how to handle concatenated raws

@larsoner (Member) commented Jan 6, 2023

I think it's actually going to be a bit worse than this (ignoring preloaded vs. not for now, which I think isn't the most annoying part). Resampling with and without annotations needs to end up with the same number of samples, and the annotation boundaries will likely land in between samples, so we have to handle that properly...

For example, if you have a 1 s signal at 1000 Hz with 100-200 ms annotated as bad, in principle you want to resample samples 0-99 and 200-999, and copy 100-199. But in practice, if you are resampling by some non-integer factor of the sample rate (e.g., to 1666 Hz or something), you have to be very careful about floor/round/ceil-ing the number of samples within each resampled or copied interval... yikes.
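To make the bookkeeping concrete, here is a toy calculation (the 1666 Hz target and the segment lengths are just example values) showing how rounding each segment's resampled length independently can disagree with rounding the whole signal's length:

```python
sfreq, new_sfreq = 1000.0, 1666.0   # a non-integer resampling factor
segments = [100, 100, 800]          # good, bad, good (lengths in samples)
ratio = new_sfreq / sfreq
per_segment = [round(n * ratio) for n in segments]
total_from_segments = sum(per_segment)
total_from_whole = round(sum(segments) * ratio)
print(per_segment, total_from_segments, total_from_whole)
# [167, 167, 1333] 1667 1666 -> segment-wise rounding gains an extra sample
```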

Also, I'm realizing now that we might not need to add upfirdn in this PR. Instead we should maybe resample all intervals the same way, including the ones within the BAD sections. If we want to use polyphase/upfirdn to do this rather than freq-domain resampling, we should probably resurrect #5136 to help us first.

@jasmainak (Member Author) commented Jan 6, 2023

I guess we could have a unit test where the annotation period has a huge artifact (like in my data), and then check that the artifacts don't spread to the non-annotation periods ... that would probably address the rounding issues you are concerned about.
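Something like the following pytest sketch, where skip_by_annotation is the option proposed in this PR (it doesn't exist on raw.resample yet), so this is only an illustration of the test idea:

```python
import numpy as np
import mne


def test_artifact_does_not_spread():
    # 1 s of a clean 10 Hz sine at 1000 Hz, with a huge artifact at 100-200 ms
    sfreq = 1000.0
    t = np.arange(int(sfreq)) / sfreq
    data = np.sin(2 * np.pi * 10.0 * t)[np.newaxis]
    data[:, 100:200] += 1e6  # the annotated artifact
    raw = mne.io.RawArray(data, mne.create_info(["ch1"], sfreq, "eeg"))
    raw.set_annotations(mne.Annotations([0.1], [0.1], ["BAD_artifact"]))
    raw.resample(500.0, skip_by_annotation=True)  # hypothetical parameter
    # After resampling to 500 Hz the artifact spans ~samples 50-100, so data
    # outside a small guard band around it should still be ~unit scale.
    good = np.concatenate([raw.get_data()[0, :45], raw.get_data()[0, 105:]])
    assert np.abs(good).max() < 10.0
```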

I like the idea of using the same resampling method in annotation vs. non-annotation periods ... although I'd be happy if it works even with just FFT-based methods.

@scott-huberty (Contributor) commented Jul 10, 2023

Hi @larsoner and @jasmainak - I think having a "skip_by_annotation"-like behavior for resampling would be good for eye-tracking data (for example, there can be NaNs in the signal during blinks or tracker dropout).

If you still think this is feasible given the challenges you raised regarding non-integer sampling frequencies, let me know; maybe I can help out on this at some point this summer.

@jasmainak (Member Author)

This PR dropped from my priority list ... feel free to take over from where I left off!

@larsoner (Member)

The more I think about this the more I think it'll be really difficult to resample segment-by-segment.

A potentially simple way to figure out the segment resampling mapping: take the original signal of length N with K usable segments and create an array of shape (N,), filling each sample with the index k (starting from 1) of the segment the data belong to, leaving the NaN/bad segments as 0. If we interp1d this in nearest mode, we end up with the mapping to the new usable segments. For data with 3 usable segments and 12 samples, for example, we might end up with an array that looks like:

1 1 1 0 2 2 0 0 0 3 3 3

If we use interp1d on it to downsample by a factor of 2, for example, I think we'd get something like:

1 1 2 0 0 3

You get some ugly stuff here: the first three samples end up being resampled to 2 samples in the output, while the last 3 samples get resampled to just 1 sample. These have different effective downsampling factors (!). The problem shrinks asymptotically as the number of samples increases, but it never goes away completely. So frequencies get remapped differently depending on each segment's offset relative to the downsampling factor.
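For what it's worth, this mapping can be checked directly with SciPy (just the toy arrays above, nothing MNE-specific):

```python
import numpy as np
from scipy.interpolate import interp1d

labels = np.array([1, 1, 1, 0, 2, 2, 0, 0, 0, 3, 3, 3])
x_old = np.arange(labels.size)
x_new = np.arange(0, labels.size, 2)  # downsample by a factor of 2
mapping = interp1d(x_old, labels, kind="nearest")(x_new).astype(int)
print(mapping)  # [1 1 2 0 0 3]
```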

So instead, I'm starting to think at least one safe thing to do would be to:

1. Resurrect MRG: Add polyphase resampling #5136 (i.e., allow upfirdn mode)
2. Raise an informative error when not in upfirdn mode in either of these cases:
   1. When skip_by_annotation=True (default False for backward compat)
   2. When not np.isfinite(data).all() (I don't actually know our current behavior; it might already raise)
3. Figure out the amount of signal spread from the FIR step
4. Expand the range of annotations by the signal spread

Step (4) could be made optional if people really want, i.e., we can add an option to disable the annotation expansion; a rough sketch of steps (1), (3), and (4) follows below.
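A minimal sketch of what steps (1), (3), and (4) could look like, assuming scipy.signal.resample_poly for the polyphase step; the function name and the 10 * max(up, down) filter half-length rule of thumb are assumptions, not a final design:

```python
import numpy as np
from scipy.signal import resample_poly


def resample_with_expanded_annots(data, onsets, durations, sfreq, up, down):
    """Hypothetical sketch: polyphase-resample and expand BAD_ annotations."""
    out = resample_poly(data, up, down, axis=-1)  # step (1): polyphase/upfirdn
    # Step (3): estimate the FIR spread. SciPy designs the filter at the
    # upsampled rate; ~10 * max(up, down) taps per side is a rule of thumb
    # for the default kaiser-windowed design (an assumption here).
    half_len = 10 * max(up, down)     # in samples of the upsampled signal
    spread = half_len / (sfreq * up)  # the same half-width, in seconds
    # Step (4): expand each annotation by the spread on both sides.
    new_onsets = np.asarray(onsets, float) - spread
    new_durations = np.asarray(durations, float) + 2 * spread
    return out, new_onsets, new_durations
```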

@scott-huberty (Contributor)

I think I can see your point about why the first simple approach is limited.

I'm not too familiar with polyphase resampling, but if I'm understanding correctly, is the idea that when skip_by_annotation=True and upfirdn=True, we resample block by block using polyphase resampling? Or is it that in this case we use polyphase resampling on the whole signal (and I'm assuming values like NaN won't propagate across the whole signal, hence steps 3 and 4 in your proposal)?

@larsoner (Member)

> Or is it that in this case we use polyphase resampling on the whole signal (and I'm assuming values like NaN won't propagate across the whole signal, hence steps 3 and 4 in your proposal)?

Yes, this is it -- bad values will spread, but we spread the BAD_ annotations to reflect that. And hopefully they don't spread super far (I don't think they will, at least).
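A quick way to convince ourselves of the spread size (pure SciPy, with made-up numbers):

```python
import numpy as np
from scipy.signal import resample_poly

x = np.random.default_rng(0).standard_normal(1000)
x[500] = np.nan                      # one bad sample halfway through
y = resample_poly(x, 2, 3)           # resample to 2/3 of the rate
bad = np.flatnonzero(np.isnan(y))
print(bad.min(), bad.max(), y.size)  # NaNs stay confined near 500 * 2/3,
# spreading only by the FIR's half-width, not across the whole signal
```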

@eduardosand commented Nov 30, 2023

> The more I think about this the more I think it'll be really difficult to resample segment-by-segment.
>
> A potentially simple way to figure out the segment resampling mapping: take the original signal of length N with K usable segments and create an array of shape (N,), filling each sample with the index k (starting from 1) of the segment the data belong to, leaving the NaN/bad segments as 0. If we interp1d this in nearest mode, we end up with the mapping to the new usable segments. For data with 3 usable segments and 12 samples, for example, we might end up with an array that looks like:
>
> 1 1 1 0 2 2 0 0 0 3 3 3
>
> If we use interp1d on it to downsample by a factor of 2, for example, I think we'd get something like:
>
> 1 1 2 0 0 3
>
> You get some ugly stuff here: the first three samples end up being resampled to 2 samples in the output, while the last 3 samples get resampled to just 1 sample. These have different effective downsampling factors (!). The problem shrinks asymptotically as the number of samples increases, but it never goes away completely. So frequencies get remapped differently depending on each segment's offset relative to the downsampling factor.

It seems like if you interpolate to replace the 0 values ahead of downsampling, you at least partly mitigate this issue, because the valid values will bleed into the 0s. So linear interpolation in my scheme would look like:

1 1 1 1.5 2 2 2.25 2.5 2.75 3 3 3

And then decimating by a factor of 2 gets you:

1 1 2 2.25 2.75 3
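For reference, this fill-then-decimate scheme can be reproduced with plain NumPy (just the toy arrays, nothing MNE-specific):

```python
import numpy as np

labels = np.array([1, 1, 1, 0, 2, 2, 0, 0, 0, 3, 3, 3], float)
good = labels != 0
x = np.arange(labels.size)
filled = np.interp(x, x[good], labels[good])  # linearly bridge the zeros
print(filled)       # [1. 1. 1. 1.5 2. 2. 2.25 2.5 2.75 3. 3. 3.]
print(filled[::2])  # [1. 1. 2. 2.25 2.75 3.]
```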

On my end, I've been using cubic splines to interpolate across dropped samples and then pre-processing as normal.

@larsoner (Member) commented Dec 1, 2023

> It seems like if you interpolate to replace the 0 values ahead of downsampling, you at least partly mitigate this issue, because the valid values will bleed into the 0s.

But the mapping between original-signal number of samples (three for the first good segment, three for the last good segment) and resampled-signal number of samples (two for the first good segment, one for the last good segment) will always have this wacky ratio problem. So I think the "resample segment by segment" idea is likely still doomed unless you're very careful (somehow) about padding each good segment the same way or something, which would have its own issues.

So I think the polyphase + annotation-expansion approach is probably the safest and a pretty straightforward route forward. Maybe someday we could add cubic or linear interpolation in the bad segments, but I think that can be a separate function like interpolate_bad_segments(raw, mode='cubic' | 'linear') or similar. And maybe/hopefully it won't even be needed (YAGNI) with the annotations being expanded by the resample function, we'll see!
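If that separate function ever materializes, a bare-bones version might look like this (the name and mode options come from the comment above; everything else, including the array-based signature, is an assumption):

```python
import numpy as np
from scipy.interpolate import CubicSpline


def interpolate_bad_segments(data, bad_mask, mode="cubic"):
    """Hypothetical helper: interpolate across bad samples, channel by channel.

    data: (n_channels, n_times); bad_mask: boolean array of shape (n_times,).
    """
    t = np.arange(data.shape[-1])
    out = data.copy()
    for ch in out:
        if mode == "cubic":
            ch[bad_mask] = CubicSpline(t[~bad_mask], ch[~bad_mask])(t[bad_mask])
        else:  # "linear"
            ch[bad_mask] = np.interp(t[bad_mask], t[~bad_mask], ch[~bad_mask])
    return out
```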

@larsoner modified the milestones: 1.7, 1.8 (Apr 9, 2024)
Successfully merging this pull request may close: add skip_by_annotation to notch_filter and resample