ENH: Port upfirdn #5186

larsoner · 2015-08-27T14:59:05Z

PR to eventually add upfirdn polyphase filtering to scipy. Built off of BSD-licensed upfirdn Python package:

https://code.google.com/p/upfirdn/

Eventually this polyphase filtering approach be used to hopefully speed up resampling.

Opening WIP PR to get some feedback on how to scipy-ify this code. It will not run as is.

The original code used (and still uses) SWIG to compile C++ to .so. I don't really see that done anywhere else in scipy, so I'm assuming that's the wrong way to go. Should I change the code over to C instead? If so, right now it supports real and complex signals and filters -- should I make different functions for the four different combinations of inputs (RR, CR, RC, CC), or is there another approach I should follow (e.g., a nice example of "the right thing to do")? We could also just do RR for now, and leave the other cases for future work.

Let me know if I should put this to the scipy-dev listserv instead.

cc the original author @sirtom67, thanks for writing what appears to be very clear code!

rgommers · 2015-10-10T16:42:31Z

SWIG is kind of a fossil, it generates large files that we'd have to check in. For sparsetools we got rid of it last year, @pv replaced it by manual wrappers (see gh-3440).

For cKDTree @sturlamolden wrote a whole bunch of C++ code recently and used that from Cython.

Probably Cython is the way to go (http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html), but it would be good to get a more expert opinion than mine.

EDIT: this is pretty small though, so SWIG may be OK.

sturlamolden · 2015-10-10T19:13:00Z

Cython is good for wrapping C++, as is SIP, PyCXX and Boost.Python. Swig is a bit fossile, as @rgommers said, but can be used as well. I am not sure which ones are "approved" for SciPy, apart from Swig and Cython.

With Cython you need to be aware of the difference between wrapping C++ for Python and using C++ in Cython. There are also two ways of wrapping C++ for Python with C++. Either you can wrap the C++ as shown in the user guide @rgommers linked or export the C++ through C callable wrappers. The latter more tedious, but is what ZMQ has done, and also roughly what we did for cKDTree. One reason you might prefer to do the latter has to do with the GIL. If you release the GIL in C++ and an exception is thrown, you don't want the exception handling to jump back into Cython without reacquiring the GIL. One way is to use a C++ function as proxy, and let it take care of the GIL and translating C++ exceptions to Cython. There is a code example in cKDTree. If you try to release the GIL in Cython before you call into C++, it will complain about exception handling in a nogil context (unless things have improved).

Another thing when using C++ is you have to beware that an exception handler can jump out of a block surrounded byNPY_BEGIN_ALLOW_THREADS and NPY_END_ALLOW_THREADS, and thus mess up your GIL handling. So be very, very careful to make sure it is working correctly. As a minimum you will need to put something like this to ensure safe GIL handling in C++:

{
    int got_exc = 0;
    NPY_END_ALLOW_THREADS
    try {
         // safe to use C++ 
    }
    catch(...)  {
        got_exc = 1;
    }
    NPY_END_ALLOW_THREADS
    if (NPY_UNLIKELY(got_exc)) throw;
}

This will trap the exception, get safely out of a nogil block, and re-throw the C++ exception. Now Cython can be allowed to catch the C++ exception and raise a Python exception.

But that begs the question why not just do this?

extern "C" PyObject*
foobar(void *some_cpp_object)
{
    NPY_END_ALLOW_THREADS
    try {
         // safe to use C++ 
    }
    catch(...)  {
        // temporarily reacquire the GIL
        // translate C++ exception to Python exception
        // release the GIL
        //
        // just steal the function from cKDTree :-)
        translate_cpp_exception_with_gil(); 
    }
    NPY_END_ALLOW_THREADS
    if (NPY_UNLIKELY(PyErr_Occured())) 
        return NULL;
    Py_RETURN_NONE;
}

And now you don't really need Cython to wrap C++, because it just looks like C as far as Cython is concerned.

For cKDTree I came to the conclusion that manual wrappers were needed anyway, so all the C++ in Cython really did was to allow me to use STL containers in Cython.

sirtom67 · 2015-10-11T12:04:38Z

First a little background on upfirdn.

Named after the MATLAB function of the same name, upfirdn is a one-dimensional FIR filtering tool with the added ability of resampling the signal. It is "vectorized" in the sense that an N-D "bank" of filters can be applied to an N-D signal array. The implementation of upfirdn that has been submitted is a python wrapper of a templatized one-dimensional "Resampler" class. The template types are the input type, output type, and coefficient type. Swig wrappers are generated for "RR", "RC", "CR" and "CC" double precision cases (R=real, C=complex, the first one is the signal type which applies to both input and output, the second one is the coefficient type).

From Resampler.i:

%template(ResamplerRR) Resampler<double, double, double>;
%template(ResamplerRC) Resampler<double, complex<double>, complex<double> >;
%template(ResamplerCR) Resampler<complex<double>, complex<double>, double >;
%template(ResamplerCC) Resampler<complex<double>, complex<double>, complex<double> >;

The N-D case is handled by creating a numpy object array, where each element is a Resampler of the desired type with equivalent filter lengths and up- and down-rates. Most use cases are "one-shot" so you don't have to handle filter state retention between calls; however the Resampler<> class supports state retention for when you want to process a small amount of signal at a time - think of it as a stream of samples that goes on forever, you can keep feeding input samples into it in chunks and keep getting output samples in chunks as well.

Now to the question at hand.

If SWIG is in fact approved for scipy then that is probably the easiest way to import the Resampler. This is much smaller than the sparse stuff (by a factor of about 100), but it sounds like there may be some benefits besides the size in using cython. I don't understand why the swig generated wrappers need to be checked in under revision control. That is not ideal.

I have no experience with cython but I'm interested in learning about it. If I get anywhere with a cython wrapped Resampler I'll let you know. I don't understand the stuff @sturlamolden brings up about the GIL. Not sure this isn't a problem with SWIG too.

sturlamolden · 2015-10-11T15:11:12Z

Looking at Motorola's code it is just a tiny piece of C++:

https://code.google.com/p/upfirdn/source/browse/upfirdn/Resampler.h

I see no reason we cannot re-implement this in plain Cython instead.

(Another thing is that Motorola's C++ code can leak memory in two different places. Do you see the errors? This is what gives C++ such a bad reputation. If they had sticked to C those mistakes would not have been made. When porting to Cython we must be careful because it is easy to make the same errors in Cython. )

sturlamolden · 2015-10-11T15:26:16Z

Here are the memory leaks squashed out:

/*
Copyright (c) 2009, Motorola, Inc

All Rights Reserved.

Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright 
notice, this list of conditions and the following disclaimer in the 
documentation and/or other materials provided with the distribution.

* Neither the name of Motorola nor the names of its contributors may be 
used to endorse or promote products derived from this software without 
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS 
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,  
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR 
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef RESAMPLER_H
#define RESAMPLER_H

using namespace std;

#include <stdexcept>
#include <complex>
#include <vector>

template<class S1, class S2, class C>
class Resampler{
public:
    typedef    S1 inputType;
    typedef    S2 outputType;
    typedef    C coefType;

    Resampler(int upRate, int downRate, C *coefs, int coefCount);
    virtual ~Resampler();

    int        apply(S1* in, int inCount, S2* out, int outCount);
    int        neededOutCount(int inCount);
    int        coefsPerPhase() { return _coefsPerPhase; }

private:
    int        _upRate;
    int        _downRate;

    coefType   *_transposedCoefs;
    inputType  *_state;
    inputType  *_stateEnd;

    int        _paddedCoefCount;  // ceil(len(coefs)/upRate)*upRate
    int        _coefsPerPhase;    // _paddedCoefCount / upRate

    int        _t;                // "time" (modulo upRate)
    int        _xOffset;

};


#include <iostream>
#include <cmath>

/*
using std::cout;
using std::endl;

using std::fill;
using std::copy;
*/

using std::invalid_argument;

template<class S1, class S2, class C>
Resampler<S1, S2, C>::Resampler(int upRate, int downRate, C *coefs,
                                int coefCount):
  _upRate(upRate), _downRate(downRate), _t(0), _xOffset(0), 

// also initialize these members
_transposedCoefs(NULL), _state(NULL)

/*
  The coefficients are copied into local storage in a transposed, flipped
  arrangement.  For example, suppose upRate is 3, and the input number
  of coefficients coefCount = 10, represented as h[0], ..., h[9].
  Then the internal buffer will look like this:
        h[9], h[6], h[3], h[0],   // flipped phase 0 coefs
           0, h[7], h[4], h[1],   // flipped phase 1 coefs (zero-padded)
           0, h[8], h[5], h[2],   // flipped phase 2 coefs (zero-padded)
*/
{
    _paddedCoefCount = coefCount;
    while (_paddedCoefCount % _upRate) {
        _paddedCoefCount++;
    }
    _coefsPerPhase = _paddedCoefCount / _upRate;

    _transposedCoefs = new coefType[_paddedCoefCount];
    fill(_transposedCoefs, _transposedCoefs + _paddedCoefCount, 0.);

    _state = new inputType[_coefsPerPhase - 1];
    _stateEnd = _state + _coefsPerPhase - 1;
    fill(_state, _stateEnd, 0.);


    /* This both transposes, and "flips" each phase, while
     * copying the defined coefficients into local storage.
     * There is probably a faster way to do this
     */
    for (int i=0; i<_upRate; ++i) {
        for (int j=0; j<_coefsPerPhase; ++j) {
            if (j*_upRate + i  < coefCount)
                _transposedCoefs[(_coefsPerPhase-1-j) + i*_coefsPerPhase] =
                                                coefs[j*_upRate + i];
        }
    }
}

template<class S1, class S2, class C>
Resampler<S1, S2, C>::~Resampler() {

    // check that they are not NULL or this might segfault
    // delete [] _transposedCoefs;
    // delete [] _state;

    if (_transposedCoefs) delete [] _transposedCoefs;
    if (_state) delete [] _state;
}

template<class S1, class S2, class C>
int Resampler<S1, S2, C>::neededOutCount(int inCount)
/* compute how many outputs will be generated for inCount inputs  */
{
    unsigned long np = inCount * (unsigned long) _upRate;
    int need = np / _downRate;
    if ((_t + _upRate * _xOffset) < (np % _downRate))
        need++;
    return need;
}

template<class S1, class S2, class C>
int Resampler<S1, S2, C>::apply(S1* in, int inCount, 
                                S2* out, int outCount) {
    if (outCount < neededOutCount(inCount)) 
        throw invalid_argument("Not enough output samples");

    // x points to the latest processed input sample
    inputType *x = in + _xOffset;
    outputType *y = out;
    inputType *end = in + inCount;
    while (x < end) {
        outputType acc = 0.;
        coefType *h = _transposedCoefs + _t*_coefsPerPhase;
        inputType *xPtr = x - _coefsPerPhase + 1;
        int offset = in - xPtr;
        if (offset > 0) {
            // need to draw from the _state buffer
            inputType *statePtr = _stateEnd - offset;
            while (statePtr < _stateEnd) {
                acc += *statePtr++ * *h++;
            }
            xPtr += offset;
        }
        while (xPtr <= x) {
            acc += *xPtr++ * *h++;
        }
        *y++ = acc;
        _t += _downRate;

        int advanceAmount = _t / _upRate;

        x += advanceAmount;
        // which phase of the filter to use
        _t %= _upRate;
    }
    _xOffset = x - end;

    // manage _state buffer
    // find number of samples retained in buffer:
    int retain = (_coefsPerPhase - 1) - inCount;
    if (retain > 0) {
        // for inCount smaller than state buffer, copy end of buffer
        // to beginning:
        copy(_stateEnd - retain, _stateEnd, _state);
        // Then, copy the entire (short) input to end of buffer
        copy(in, end, _stateEnd - inCount);
    } else {
        // just copy last input samples into state buffer
        copy(end - (_coefsPerPhase - 1), end, _state);
    }
    // number of samples computed
    return y - out;
}

template<class S1, class S2, class C>
void upfirdn(int upRate, int downRate, 
             S1 *input, int inLength, C *filter, int filterLength, 
             vector<S2> &results)
/*
This template function provides a one-shot resampling.  Extra samples
are padded to the end of the input in order to capture all of the non-zero 
output samples.
The output is in the "results" vector which is modified by the function.

Note, I considered returning a vector instead of taking one on input, but
then the C++ compiler has trouble with implicit template instantiation
(e.g. have to say upfirdn<float, float, float> every time - this
way we can let the compiler infer the template types).

Thanks to Lewis Anderson (lkanders@ucsd.edu) at UCSD for
the original version of this function.
*/
{
    // Create the Resampler
    Resampler<S1, S2, C> theResampler(upRate, downRate, filter, filterLength);

    // pad input by length of one polyphase of filter to flush all values out
    int padding = theResampler.coefsPerPhase() - 1;

    // possible memory leak: new used outside constructor
    // S1 *inputPadded = new S1[inLength + padding];
    std::vector<S1> inputPadded(inLength + padding);

    for (int i = 0; i < inLength + padding; i++) {
        if (i < inLength)
            inputPadded[i] = input[i];
        else
            inputPadded[i] = 0;
    }

    // calc size of output
    int resultsCount = theResampler.neededOutCount(inLength + padding); 

    results.resize(resultsCount);

    // run filtering

    // crappy code with unused local variable
    // int numSamplesComputed = theResampler.apply(inputPadded, 
    //        inLength + padding, &results[0], resultsCount);
    theResampler.apply(&inputPadded[0], inLength + padding, &results[0], resultsCount);

   // Don't need this
   // delete[] inputPadded;
}

template<class S1, class S2, class C>
void upfirdn(int upRate, int downRate, 
             vector<S1> &input, vector<C> &filter, vector<S2> &results)
/*
This template function provides a one-shot resampling.
The output is in the "results" vector which is modified by the function.
In this version, the input and filter are vectors as opposed to 
pointer/count pairs.
*/
{
    upfirdn<S1, S2, C>(upRate, downRate, &input[0], input.size(), &filter[0], 
                       filter.size(), results);
}


#endif

sturlamolden · 2015-10-11T15:38:59Z

Oh yes, we also need to fix this for working correctly on 32 and 64 bit

/*
Copyright (c) 2009, Motorola, Inc

All Rights Reserved.

Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright 
notice, this list of conditions and the following disclaimer in the 
documentation and/or other materials provided with the distribution.

* Neither the name of Motorola nor the names of its contributors may be 
used to endorse or promote products derived from this software without 
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS 
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,  
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR 
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/


#ifndef RESAMPLER_H
#define RESAMPLER_H

#include <Python.h>
#include "numpy/arrayobject.h"

using namespace std;

#include <stdexcept>
#include <complex>
#include <vector>

template<class S1, class S2, class C>
class Resampler{
public:
    typedef    S1 inputType;
    typedef    S2 outputType;
    typedef    C coefType;

    Resampler(npy_intp upRate, npy_intp downRate, C *coefs, npy_intp coefCount);
    virtual ~Resampler();

    npy_intp        apply(S1* in, npy_intp inCount, S2* out, npy_intp outCount);
    npy_intp        neededOutCount(npy_intp inCount);
    npy_intp        coefsPerPhase() { return _coefsPerPhase; }

private:
    npy_intp        _upRate;
    npy_intp        _downRate;

    coefType   *_transposedCoefs;
    inputType  *_state;
    inputType  *_stateEnd;

    npy_intp        _paddedCoefCount;  // ceil(len(coefs)/upRate)*upRate
    npy_intp        _coefsPerPhase;    // _paddedCoefCount / upRate

    npy_intp        _t;                // "time" (modulo upRate)
    npy_intp        _xOffset;

};


#include <iostream>
#include <cmath>

/*
using std::cout;
using std::endl;

using std::fill;
using std::copy;
*/

using std::invalid_argument;

template<class S1, class S2, class C>
Resampler<S1, S2, C>::Resampler(npy_intp upRate, npy_intp downRate, C *coefs,
                                npy_intp coefCount):
  _upRate(upRate), _downRate(downRate), _t(0), _xOffset(0), 

  // also initialize these members
  _transposedCoefs(NULL), _state(NULL)

/*
  The coefficients are copied npy_intpo local storage in a transposed, flipped
  arrangement.  For example, suppose upRate is 3, and the input number
  of coefficients coefCount = 10, represented as h[0], ..., h[9].
  Then the npy_intpernal buffer will look like this:
        h[9], h[6], h[3], h[0],   // flipped phase 0 coefs
           0, h[7], h[4], h[1],   // flipped phase 1 coefs (zero-padded)
           0, h[8], h[5], h[2],   // flipped phase 2 coefs (zero-padded)
*/
{
    _paddedCoefCount = coefCount;
    while (_paddedCoefCount % _upRate) {
        _paddedCoefCount++;
    }
    _coefsPerPhase = _paddedCoefCount / _upRate;

    _transposedCoefs = new coefType[_paddedCoefCount];
    fill(_transposedCoefs, _transposedCoefs + _paddedCoefCount, 0.);

    _state = new inputType[_coefsPerPhase - 1];
    _stateEnd = _state + _coefsPerPhase - 1;
    fill(_state, _stateEnd, 0.);


    /* This both transposes, and "flips" each phase, while
     * copying the defined coefficients npy_intpo local storage.
     * There is probably a faster way to do this
     */
    for (npy_intp i=0; i<_upRate; ++i) {
        for (npy_intp j=0; j<_coefsPerPhase; ++j) {
            if (j*_upRate + i  < coefCount)
                _transposedCoefs[(_coefsPerPhase-1-j) + i*_coefsPerPhase] =
                                                coefs[j*_upRate + i];
        }
    }
}

template<class S1, class S2, class C>
Resampler<S1, S2, C>::~Resampler() {

    // check that they are not NULL or this might segfault
    // delete [] _transposedCoefs;
    // delete [] _state;

    if (_transposedCoefs) delete [] _transposedCoefs;
    if (_state) delete [] _state;
}

template<class S1, class S2, class C>
npy_intp Resampler<S1, S2, C>::neededOutCount(npy_intp inCount)
/* compute how many outputs will be generated for inCount inputs  */
{
    npy_uintp np = inCount * (npy_uintp) _upRate;
    npy_intp need = np / _downRate;
    if ((_t + _upRate * _xOffset) < (np % _downRate))
        need++;
    return need;
}

template<class S1, class S2, class C>
npy_intp Resampler<S1, S2, C>::apply(S1* in, npy_intp inCount, 
                                S2* out, npy_intp outCount) {
    if (outCount < neededOutCount(inCount)) 
        throw invalid_argument("Not enough output samples");

    // x points to the latest processed input sample
    inputType *x = in + _xOffset;
    outputType *y = out;
    inputType *end = in + inCount;
    while (x < end) {
        outputType acc = 0.;
        coefType *h = _transposedCoefs + _t*_coefsPerPhase;
        inputType *xPtr = x - _coefsPerPhase + 1;
        npy_intp offset = in - xPtr;
        if (offset > 0) {
            // need to draw from the _state buffer
            inputType *statePtr = _stateEnd - offset;
            while (statePtr < _stateEnd) {
                acc += *statePtr++ * *h++;
            }
            xPtr += offset;
        }
        while (xPtr <= x) {
            acc += *xPtr++ * *h++;
        }
        *y++ = acc;
        _t += _downRate;

        npy_intp advanceAmount = _t / _upRate;

        x += advanceAmount;
        // which phase of the filter to use
        _t %= _upRate;
    }
    _xOffset = x - end;

    // manage _state buffer
    // find number of samples retained in buffer:
    npy_intp retain = (_coefsPerPhase - 1) - inCount;
    if (retain > 0) {
        // for inCount smaller than state buffer, copy end of buffer
        // to beginning:
        copy(_stateEnd - retain, _stateEnd, _state);
        // Then, copy the entire (short) input to end of buffer
        copy(in, end, _stateEnd - inCount);
    } else {
        // just copy last input samples npy_intpo state buffer
        copy(end - (_coefsPerPhase - 1), end, _state);
    }
    // number of samples computed
    return y - out;
}

template<class S1, class S2, class C>
void upfirdn(npy_intp upRate, npy_intp downRate, 
             S1 *input, npy_intp inLength, C *filter, npy_intp filterLength, 
             vector<S2> &results)
/*
This template function provides a one-shot resampling.  Extra samples
are padded to the end of the input in order to capture all of the non-zero 
output samples.
The output is in the "results" vector which is modified by the function.

Note, I considered returning a vector instead of taking one on input, but
then the C++ compiler has trouble with implicit template instantiation
(e.g. have to say upfirdn<float, float, float> every time - this
way we can let the compiler infer the template types).

Thanks to Lewis Anderson (lkanders@ucsd.edu) at UCSD for
the original version of this function.
*/
{
    // Create the Resampler
    Resampler<S1, S2, C> theResampler(upRate, downRate, filter, filterLength);

    // pad input by length of one polyphase of filter to flush all values out
    npy_intp padding = theResampler.coefsPerPhase() - 1;

    // possible memory leak: new used outside constructor
    // S1 *inputPadded = new S1[inLength + padding];
    std::vector<S1> inputPadded(inLength + padding);

    for (npy_intp i = 0; i < inLength + padding; i++) {
        if (i < inLength)
            inputPadded[i] = input[i];
        else
            inputPadded[i] = 0;
    }

    // calc size of output
    npy_intp resultsCount = theResampler.neededOutCount(inLength + padding); 

    results.resize(resultsCount);

    // run filtering

    // crappy code with unused local variable
    // npy_intp numSamplesComputed = theResampler.apply(inputPadded, 
    //        inLength + padding, &results[0], resultsCount);
    theResampler.apply(&inputPadded[0], inLength + padding, &results[0], resultsCount);

   // Don't need this
   // delete[] inputPadded;
}

template<class S1, class S2, class C>
void upfirdn(npy_intp upRate, npy_intp downRate, 
             vector<S1> &input, vector<C> &filter, vector<S2> &results)
/*
This template function provides a one-shot resampling.
The output is in the "results" vector which is modified by the function.
In this version, the input and filter are vectors as opposed to 
ponpy_intper/count pairs.
*/
{
    upfirdn<S1, S2, C>(upRate, downRate, &input[0], input.size(), &filter[0], 
                       filter.size(), results);
}


#endif

larsoner · 2015-10-11T17:41:20Z

Can you open a pr into my branch? It would be much easier to follow changes that way

larsoner · 2015-10-11T17:59:58Z

Or you're welcome to open a different PR into scipy if it's easier, I just want the feature in one way or another :)

rgommers · 2015-10-11T19:00:58Z

I don't understand why the swig generated wrappers need to be checked in under revision control. That is not ideal.

Because otherwise everyone who builds scipy from a git clone needs to have SWIG installed - that's just a complete no-go.

sturlamolden · 2015-10-11T19:49:43Z

I still don't understand why we need C++ for this. It is a tiny piece of C++ that requires more code to wrap than to write from scratch in Cython. I even wonder if we need compiled code. Looking at the C++ code it seems to me that some plain Python calling scipy.signal.lfilter or scipy.signal.convolve should work just fine.

sturlamolden · 2015-10-11T20:21:18Z

Here is what Octave's upfirdn.m does:
http://octave-signal.sourcearchive.com/documentation/1.0.9/upfirdn_8m-source.html
We only need a few lines of Python code including a call to lfilter for this.

larsoner · 2015-10-11T21:07:24Z

@sturlamolden you can implement it that way, but doing that will involve a bunch of unnecessary calculations (thinking about the zero-insertion step of upsampling and the subselection step of downsampling). IIRC this is a polyphase filtering implementation that avoids those unnecessary calculations.

sirtom67 · 2015-10-11T22:50:39Z

Yes @Eric89GXL is correct, this is an efficient polyphase implementation - it only does the necessary multiplies and summations. In fact the "resample" function at line 49 in the test code at
https://code.google.com/p/upfirdn/source/browse/upfirdn/test_upfirdn.py
is the slow equivalent implementation, that is used as a reference. It can be 100s-1000s of times slower depending on the uprate, downrate, and number of coefficients.

I am the original author back when I was with Motorola. upfirdn is a staple of signal processing in matlab and I always wanted it in python. We were hoping it would be included in numpy or scipy - better late than never! As it is now I install this package along with numpy and scipy on every new machine.

@sturlamolden - good catch on the memory leak! I wasn't aware that the _transposedCoefs and _state could leak - I was assuming that the constructor would always finish. Now I see if either "new" throws an assertion, we would have this problem. Your approach of initializing to NULL solves this. Do I have that right or is there another issue here causing the leak? How would you test for this since most machines have a lot of RAM these days?

As far as the upfirdn C++ function: this is for folks wanting a C++ only implementation and was contributed later. This is not used by the python ResamplerBank wrapper class. It needn't be included in this version.

sturlamolden · 2015-10-11T23:00:54Z

Yes, if you were using a DSP that might be a concern. But this is SciPy, performance is mostly memory-bound, and we are not aiming to be the 'fastest' but rather 'fast enough'.

Simplicity of the source code, readability and easy maintenance are also important, as is avoiding a complicated build involving C++ and Swig. If you look at SciPy you will see that most of it is actually plain Python code. It is much more easy to maintain. The order of preference is still

Python > Cython > C > C++ > Fortran

How much faster is a C++ version likely to be? Not much. Slicing and lfilter are already in compiled C code. And in most cases the extra overhead from the Swig layer would dwarf any such savings.

The only valid concern I can see is if you need a huge upsampling and want to save memory by avoiding temporary storage. But you can do this with lfilter too because it can take zi as an optional argument. So you can just chop up the work in acceptable chunks of data. The extra Python overhead should be tiny.

sturlamolden · 2015-10-11T23:06:21Z

@sirtom67 The two new[] in the contructor cannot leak, but they can cause a segfault if they throw a std::bad_alloc and the destructor subsequently does a delete [] on a wild pointer. The one that can leak is the new in upfirdn which should be replaced by a std::vector on the stack. You can provoke it by having Resampler::apply throw std::invalid_argument. There are also other cases that can provoke this, e.g. if std::vector::resize or std::copy throws an exception. In case of an exception the delete statement will not be reached.

sirtom67 · 2015-10-11T23:23:39Z

As a point of reference, for many years (may still be the case) there were two MEX files in the signal processing toolbox in MATLAB: remez, and upfirdn. It is MUCH faster in C (or C++) compared to python. Much of the package (ResamplerBank and the upfirdn function) is already python. Only the 1D FIR resampling code, the most critical piece, is in native code. I'm all for a cython or c implementation if someone doesn't mind translating it.

As for bugs: the unittests for this code are pretty thorough. The bug that you found was not encountered because it was not tested in a way that exhausted the memory. The other leak in the upfirdn function doesn't matter because that code is not going to be included - the main thing is the Resampler C++ class. The unittests exercise Resampler with lot of combinations of length of filter, uprate, downrate, and size of signals being applied. I have pounded on it a lot and it's proven stable for a while.

sturlamolden · 2015-10-11T23:35:13Z

Ok, I didn't look at the Python code, but yes upfirdn in C++ is not used in the Python API. But the bug is still in there ;-)

sturlamolden · 2015-10-11T23:38:06Z

If the Swig wrappers are known to work correctly I don't really see the purpose of not using them. If they break sometime in the future they can be regenerated or replaced.

larsoner · 2015-10-11T23:38:30Z

How much faster is a C++ version likely to be? Not much.

The previous reply suggested 100-1000x can occur, which is about what I expected. These gains are relevant to many use cases, especially resampling. If you have tried scipy's FFT-based resampling and compared it to that of MATLAB, which wraps to upfirdn, the differences are striking. This is one of the few places I still really feel pain in scientific computing in Python compared to MATLAB. So to me it's worth the complexity of compiled code, even if the complexity can't be reduced from what it is now. But I can definitely try changing it over to Cython to simplify it if others also agree it's worthwhile.

sturlamolden · 2015-10-11T23:42:49Z

MATLAB also have the advantage of using FFTW (unless they have switched to MKL) for FFTs, while we are stuck with FFTPACK because of licensing issues.

sturlamolden · 2015-10-11T23:48:18Z

100x to 1000x speed difference is what we expect when comparing algorithmic Python code to C code. But here we already have the slicing and filtering code in C, which constitutes the bulk of the work. I would say 10x is more likely. But does anyone have a timing of upfirdn on Python?

Speed-differences in Matlab is not really relevant. It is often a lot slower than Python.

FFT-based resampling is O(n log n) compared to O(k * n) for convolution based, with k being the filter length. For small values of k this can matter a lot. But then we could solve this by using convolution instead of FFT-based resampling. I don't think the speed of SciPY's FFT based resampler has any relevance here.

larsoner · 2015-10-12T01:13:10Z

I think the 100-1000x was comparing full upsample fir downsample to polyphase, even if both are in C. There are examples where you'd do 1000x+ unnecessary calculations given the set of computations performed, regardless of language. (Think about upsampling by 1000, FIR filtering, then downsampling by 999. There are a lot of zero multiplies in the FIR and discarded intermediate results in the decimation in there.) But I can run some benchmarks if you still aren't convinced.

sturlamolden · 2015-10-12T01:20:14Z

You don't have to convince me :-)

I didn't think about all those multiply-by-zeros, but that is a good point.

sirtom67 · 2015-10-12T13:13:54Z

I thought a bit more about the algorithmic differences. The direct approach of (upsample by factor of P with zero insertion, FIR filter of length Nh, downsample by factor of Q) is O(Nh*Q) per output sample. The efficient polyphase implementation is O(Nh/P).

larsoner · 2015-10-12T14:23:33Z

Cool, I'll add that to the docstring, and also see if I can simplify the wrappers. It would be good for me to learn Cython C++ anyway.

larsoner · 2015-10-22T16:26:49Z

Okay, I've made some progress first translating to simple Python code (with tests passing), now I can go about converting it to Cython (not pushed for now). I'm just going to do the real case first, then expand later. Thanks for the feedback folks, hopefully I can have something useful soon.

larsoner · 2015-11-30T16:21:53Z

@sturlamolden thanks for the tip, I rebased and now we're green.

Ready for review/merge from my end.

larsoner · 2015-12-04T16:03:55Z

@endolith any interest in reviewing another filtering PR? This one hopefully is simpler than SOS with only +437 :)

endolith · 2015-12-05T23:10:34Z

@Eric89GXL So many things, so little time. :D

endolith · 2015-12-05T23:11:26Z

scipy/signal/_upfirdn.py

+
+    Parameters
+    ----------
+    h : array-like


I've always seen these as array_like. I don't know why, but consistency is good.

larsoner · 2015-12-07T15:59:45Z

Comments addressed.

ewmoore · 2016-01-18T15:51:06Z

@Eric89GXL, if you could squash this down a little bit, I'll merge it.

sturlamolden · 2016-01-19T05:25:26Z

Pyflakes error

larsoner · 2016-01-19T14:28:51Z

@sturlamolden yes but it's in benchmarks/benchmarks/signal_filtering.py:30:5 which I didn't change in this PR. So ready to merge then?

sturlamolden · 2016-01-19T14:44:48Z

I don't think there is a pyflakes error in master, so you should probably rebase.

codecov-io · 2016-01-19T15:40:39Z

@@            master   #5186   diff @@
======================================
  Files          234     235     +1
  Stmts        43180   43211    +31
  Branches      8161    8163     +2
  Methods          0       0       
======================================
+ Hit          33500   33531    +31
  Partial       2602    2602       
  Missed        7078    7078

Review entire Coverage Diff as of b64d39d

Powered by Codecov. Updated on successful CI builds.

larsoner · 2016-01-19T15:41:22Z

All better

sturlamolden · 2016-01-19T16:33:32Z

Yes, it looks ready to be merged now, I think.

ewmoore · 2016-01-20T17:57:10Z

Thanks @Eric89GXL!

ENH: Port upfirdn

larsoner · 2016-01-20T18:20:52Z

Thanks for the reviews! Now to make a fast resampling algorithm...

sirtom67 · 2016-01-20T18:48:55Z

Nice work Eric and other contributors! A very worthwhile addition.

larsoner mentioned this pull request Sep 1, 2015

BUG? epochs.resample takes foreever mne-tools/mne-python#2445

Closed

rgommers added enhancement A new feature or improvement scipy.signal labels Oct 10, 2015

larsoner mentioned this pull request Oct 21, 2015

Resample function optimizations mne-tools/mne-python#2035

Closed

larsoner mentioned this pull request Nov 30, 2015

ENH: Add zero_phase kwarg to signal.decimate #5392

Merged

rgommers added this to the 0.18.0 milestone Dec 1, 2015

endolith reviewed Dec 5, 2015
View reviewed changes

larsoner force-pushed the upfirdn branch 2 times, most recently from 9548d4d to f70ccea Compare January 2, 2016 16:55

larsoner force-pushed the upfirdn branch from f70ccea to 7abe128 Compare January 18, 2016 17:19

larsoner and others added 3 commits January 19, 2016 09:49

ENH: Add upfirdn

bba78ac

MAINT: make upfirdn compile

245246b

FIX: Add datatypes, address comments, coverage

e689e37

larsoner force-pushed the upfirdn branch from 7abe128 to e689e37 Compare January 19, 2016 14:49

ewmoore added a commit that referenced this pull request Jan 20, 2016

Merge pull request #5186 from Eric89GXL/upfirdn

8c841b6

ENH: Port upfirdn

ewmoore merged commit 8c841b6 into scipy:master Jan 20, 2016

larsoner mentioned this pull request Jul 15, 2016

Complex-valued resampling bmcfee/resampy#14

Closed

larsoner deleted the upfirdn branch July 15, 2016 11:38

ENH: Port upfirdn #5186

ENH: Port upfirdn #5186

Conversation

larsoner commented Aug 27, 2015

rgommers commented Oct 10, 2015

sturlamolden commented Oct 10, 2015

sirtom67 commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

larsoner commented Oct 11, 2015 via email

larsoner commented Oct 11, 2015 via email

rgommers commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

larsoner commented Oct 11, 2015

sirtom67 commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

sirtom67 commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

larsoner commented Oct 11, 2015 via email

sturlamolden commented Oct 11, 2015

sturlamolden commented Oct 11, 2015

larsoner commented Oct 12, 2015 via email

sturlamolden commented Oct 12, 2015

sirtom67 commented Oct 12, 2015

larsoner commented Oct 12, 2015 via email

larsoner commented Oct 22, 2015

larsoner commented Nov 30, 2015

larsoner commented Dec 4, 2015

endolith commented Dec 5, 2015

endolith Dec 5, 2015

Choose a reason for hiding this comment

larsoner commented Dec 7, 2015

ewmoore commented Jan 18, 2016

sturlamolden commented Jan 19, 2016

larsoner commented Jan 19, 2016

sturlamolden commented Jan 19, 2016

codecov-io commented Jan 19, 2016

larsoner commented Jan 19, 2016

sturlamolden commented Jan 19, 2016

ewmoore commented Jan 20, 2016

larsoner commented Jan 20, 2016

sirtom67 commented Jan 20, 2016