Resample function optimizations #2035

Closed

choldgraf opened this issue Apr 28, 2015 · 18 comments

@choldgraf
Contributor

Currently it takes me quite a long time to resample my data. I've got about 120 channels with about 10 minutes of data, sampled at 5 kHz, which comes to roughly 120 × 3,000,000 data points. Naturally, my first step is to resample the data so that it's not so densely sampled. However, this can take a really long time and use a lot of memory.

I have two questions regarding this:

  1. Is the resampling optimized at all in terms of making sure the signal length is a power of 2? In the past I've tried padding the signal with zeros up to the next power of 2, but this can become a memory issue if there are enough data points. Another option would be to find a nearby number with lots of factors, no?
  2. What kind of speedup do you all see when you use CUDA? Is it worth the time to set up in terms of the performance boost?

Anyway, just trying to see if anyone has thoughts on improving the speed of these functions. If more people who do ECoG research (and thus have little control over the recording parameters) start using MNE, it could prove useful.

@agramfort
Member

agramfort commented Apr 28, 2015 via email

@larsoner
Member

In my experience CUDA is about an order of magnitude faster. I'm pretty sure that holds for both FIR filtering and resampling. What would really help the CPU case is using upfirdn, as discussed in #1814.
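
For reference, a minimal sketch of how the CUDA path is enabled in MNE (a sketch, assuming a working GPU setup with the optional CUDA dependencies installed; 'raw.fif' is a placeholder filename):

  import mne

  # Opt in to CUDA-backed FFTs for filtering/resampling
  mne.set_config('MNE_USE_CUDA', 'true')

  raw = mne.io.read_raw_fif('raw.fif', preload=True)  # placeholder file
  # n_jobs='cuda' routes the FFT work to the GPU instead of CPU processes
  raw.resample(1000, n_jobs='cuda')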

@choldgraf
Contributor Author

Yeah, the scipy signal processing is the one thing that makes me miss MATLAB (just a little bit). I've tried writing helpers like make_len_pow_2 that just dumbly find the next power of 2 and append zeros to the signal. That definitely speeds things up, but it can blow up memory use. What if something like that were added to the resample function as an option for npad? So if you passed a string like 'pow2', it would figure out how many data points to add to reach a power of 2, then split that in half and pad the front/back of the signal?
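
Roughly what I have in mind, as a sketch (helper names are made up, and the crop-back step is simplified):

  import numpy as np
  from scipy.signal import resample

  def pad_to_pow2(x):
      # Zero-pad a 1D signal to the next power of 2, split front/back
      n_pad = 2 ** int(np.ceil(np.log2(len(x)))) - len(x)
      front = n_pad // 2
      return np.pad(x, (front, n_pad - front), mode='constant'), front

  def resample_pow2(x, num):
      # Resample on a fast power-of-2 length, then crop the padding back out
      padded, front = pad_to_pow2(x)
      ratio = float(num) / len(x)
      out = resample(padded, int(round(len(padded) * ratio)))
      start = int(round(front * ratio))
      return out[start:start + num]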

@larsoner
Member

larsoner commented Apr 28, 2015 via email

@choldgraf
Contributor Author

Did you already begin an upfirdn contribution to scipy? That may be outside of my signal processing chops :/

@larsoner
Member

No, and it's going to take some time to get right. So for now, padding to a power of 2 should help.

@larsoner
Member

Someone actually has a SWIG'ed version available that they gave me permission to relicense as BSD for scipy, but I'm not sure how complex that problem is going to be.

@choldgraf
Contributor Author

Ah ok - for the resample string thing, do you know if there's a function in Python for "give me a number, and I'll tell you the next highest number with more than N factors"? I've tried padding to a power of two, but once you're up in the higher ranges it becomes memory-limited. That said, this sounds like it may be too complicated to include natively :P.

Actually, another thought might be to let someone specify the total length they want the signal to be, rather than the amount of padding. E.g., have some length parameter, take the difference from the current length, divide that in half, and add zeros to the front/back accordingly.
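
For what it's worth, a brute-force sketch of that "next number with lots of factors" idea: search for the next 5-smooth length (only prime factors 2, 3, and 5), which keeps the FFT fast without padding all the way to a power of 2. (Later SciPy versions expose essentially this as next_fast_len; the helper name here is made up.)

  def next_smooth_len(n):
      # Smallest number >= n whose only prime factors are 2, 3, and 5
      while True:
          m = n
          for p in (2, 3, 5):
              while m % p == 0:
                  m //= p
          if m == 1:
              return n
          n += 1

For example, 3,000,000 = 2^6 · 3 · 5^6 is already 5-smooth, so next_smooth_len(3000000) would add no padding at all, whereas the next power of 2 (4,194,304) would add ~1.2 million samples per channel.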

@larsoner
Member

No, I don't know of one. If memory is an issue, one thing that might fix that is iterating over the channels instead of operating on them all as one contiguous block (i.e., resampling the whole array along axis=1). The overhead for a few hundred channels should be pretty negligible, but you'd want to test it to make sure.

@choldgraf
Contributor Author

Ya, that is true - if you do it one channel at a time it's not a big deal, so in that case you would basically do:

  all_res_chans = []
  for chan in channels:
    padded = pad_to_power_2(chan)  # zero-pad up to the next power of 2
    resampled = resample(padded)   # fast FFT length now
    all_res_chans.append(remove_padding(resampled))
  make_back_into_Raw(all_res_chans)

or something like this

@larsoner
Member

Yeah, that's the pseudocode anyway. Getting it to work with the current resampling might take some work, but it should be doable, at least for the n_jobs=1 case. Sometimes we branch based on whether n_jobs == 1 or not, and this would be a case where it could be useful to do so. You can't really avoid the memcpy-type operation with multiple jobs, unfortunately, so this will only work in the single-job case.

You're welcome to take a stab at it if you have time. I think if n_jobs=1 you'd want to preallocate the new array, and fill it in row-by-row by resampling.
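
A rough sketch of that single-job path (using scipy.signal.resample directly; in MNE proper the padding and Raw bookkeeping would be layered on top):

  import numpy as np
  from scipy.signal import resample

  def resample_by_row(data, num):
      # data: array of shape (n_channels, n_times). Preallocate the output
      # and resample one channel at a time, so only one channel's padded
      # copy ever lives in memory at once.
      out = np.empty((data.shape[0], num), dtype=np.float64)
      for ri, row in enumerate(data):
          out[ri] = resample(row, num)
      return out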

@choldgraf
Contributor Author

mm that's a good idea. I will try to include this if I finish my stuff for this week early. Have a plane flight tomorrow so maybe then.

@larsoner
Member

Hopefully I'll have time to actually make this PR work and get it into scipy:

scipy/scipy#5186

@larsoner
Member

An update -- this should be made ~2x faster in the near term (months) by scipy/scipy#5592 and the related scipy/scipy#5610. By my estimate, using upfirdn through scipy/scipy#5186 will require more time (~6-12 months).

@choldgraf
Contributor Author

That's great - looking forward to not dreading my resampling steps :)

@larsoner
Member

FYI, scipy/scipy#5610 (upfirdn) has been merged, and I've started an upfirdn-based resample discussion in scipy/scipy#5746.
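
For anyone who wants to try it early, a minimal polyphase resampling sketch using the newly merged scipy.signal.upfirdn (requires a recent SciPy; the filter design here is deliberately simple, and the edge handling is exactly what the resample discussion is about):

  import numpy as np
  from scipy.signal import firwin, upfirdn

  up, down = 1, 5                      # e.g., 5 kHz -> 1 kHz
  # Anti-aliasing low-pass FIR; cutoff (relative to Nyquist) = 1/max(up, down)
  h = firwin(101, 1.0 / max(up, down))
  x = np.random.randn(5000)            # toy signal
  # Scale by `up` to compensate for upsampling gain (a no-op when up == 1)
  y = upfirdn(h * up, x, up=up, down=down)  # ~len(x)/down samples plus edges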

@choldgraf
Contributor Author

woohoo! wheels are a-turning :)

@jona-sassenhagen
Contributor

Indeed, v. cool.
