Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Port upfirdn #5186

Merged
merged 3 commits into from
Jan 20, 2016
Merged

ENH: Port upfirdn #5186

merged 3 commits into from
Jan 20, 2016

Conversation

larsoner
Copy link
Member

PR to eventually add upfirdn polyphase filtering to scipy. Built off of BSD-licensed upfirdn Python package:

https://code.google.com/p/upfirdn/

Eventually this polyphase filtering approach be used to hopefully speed up resampling.

Opening WIP PR to get some feedback on how to scipy-ify this code. It will not run as is.

The original code used (and still uses) SWIG to compile C++ to .so. I don't really see that done anywhere else in scipy, so I'm assuming that's the wrong way to go. Should I change the code over to C instead? If so, right now it supports real and complex signals and filters -- should I make different functions for the four different combinations of inputs (RR, CR, RC, CC), or is there another approach I should follow (e.g., a nice example of "the right thing to do")? We could also just do RR for now, and leave the other cases for future work.

Let me know if I should put this to the scipy-dev listserv instead.

cc the original author @sirtom67, thanks for writing what appears to be very clear code!

@rgommers
Copy link
Member

SWIG is kind of a fossil, it generates large files that we'd have to check in. For sparsetools we got rid of it last year, @pv replaced it by manual wrappers (see gh-3440).

For cKDTree @sturlamolden wrote a whole bunch of C++ code recently and used that from Cython.

Probably Cython is the way to go (http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html), but it would be good to get a more expert opinion than mine.

EDIT: this is pretty small though, so SWIG may be OK.

@sturlamolden
Copy link
Contributor

Cython is good for wrapping C++, as is SIP, PyCXX and Boost.Python. Swig is a bit fossile, as @rgommers said, but can be used as well. I am not sure which ones are "approved" for SciPy, apart from Swig and Cython.

With Cython you need to be aware of the difference between wrapping C++ for Python and using C++ in Cython. There are also two ways of wrapping C++ for Python with C++. Either you can wrap the C++ as shown in the user guide @rgommers linked or export the C++ through C callable wrappers. The latter more tedious, but is what ZMQ has done, and also roughly what we did for cKDTree. One reason you might prefer to do the latter has to do with the GIL. If you release the GIL in C++ and an exception is thrown, you don't want the exception handling to jump back into Cython without reacquiring the GIL. One way is to use a C++ function as proxy, and let it take care of the GIL and translating C++ exceptions to Cython. There is a code example in cKDTree. If you try to release the GIL in Cython before you call into C++, it will complain about exception handling in a nogil context (unless things have improved).

Another thing when using C++ is you have to beware that an exception handler can jump out of a block surrounded byNPY_BEGIN_ALLOW_THREADS and NPY_END_ALLOW_THREADS, and thus mess up your GIL handling. So be very, very careful to make sure it is working correctly. As a minimum you will need to put something like this to ensure safe GIL handling in C++:

{
    int got_exc = 0;
    NPY_END_ALLOW_THREADS
    try {
         // safe to use C++ 
    }
    catch(...)  {
        got_exc = 1;
    }
    NPY_END_ALLOW_THREADS
    if (NPY_UNLIKELY(got_exc)) throw;
} 

This will trap the exception, get safely out of a nogil block, and re-throw the C++ exception. Now Cython can be allowed to catch the C++ exception and raise a Python exception.

But that begs the question why not just do this?

extern "C" PyObject*
foobar(void *some_cpp_object)
{
    NPY_END_ALLOW_THREADS
    try {
         // safe to use C++ 
    }
    catch(...)  {
        // temporarily reacquire the GIL
        // translate C++ exception to Python exception
        // release the GIL
        //
        // just steal the function from cKDTree :-)
        translate_cpp_exception_with_gil(); 
    }
    NPY_END_ALLOW_THREADS
    if (NPY_UNLIKELY(PyErr_Occured())) 
        return NULL;
    Py_RETURN_NONE;
} 

And now you don't really need Cython to wrap C++, because it just looks like C as far as Cython is concerned.

For cKDTree I came to the conclusion that manual wrappers were needed anyway, so all the C++ in Cython really did was to allow me to use STL containers in Cython.

@sirtom67
Copy link
Contributor

First a little background on upfirdn.

Named after the MATLAB function of the same name, upfirdn is a one-dimensional FIR filtering tool with the added ability of resampling the signal. It is "vectorized" in the sense that an N-D "bank" of filters can be applied to an N-D signal array. The implementation of upfirdn that has been submitted is a python wrapper of a templatized one-dimensional "Resampler" class. The template types are the input type, output type, and coefficient type. Swig wrappers are generated for "RR", "RC", "CR" and "CC" double precision cases (R=real, C=complex, the first one is the signal type which applies to both input and output, the second one is the coefficient type).

From Resampler.i:

%template(ResamplerRR) Resampler<double, double, double>;
%template(ResamplerRC) Resampler<double, complex<double>, complex<double> >;
%template(ResamplerCR) Resampler<complex<double>, complex<double>, double >;
%template(ResamplerCC) Resampler<complex<double>, complex<double>, complex<double> >;

The N-D case is handled by creating a numpy object array, where each element is a Resampler of the desired type with equivalent filter lengths and up- and down-rates. Most use cases are "one-shot" so you don't have to handle filter state retention between calls; however the Resampler<> class supports state retention for when you want to process a small amount of signal at a time - think of it as a stream of samples that goes on forever, you can keep feeding input samples into it in chunks and keep getting output samples in chunks as well.

Now to the question at hand.

If SWIG is in fact approved for scipy then that is probably the easiest way to import the Resampler. This is much smaller than the sparse stuff (by a factor of about 100), but it sounds like there may be some benefits besides the size in using cython. I don't understand why the swig generated wrappers need to be checked in under revision control. That is not ideal.

I have no experience with cython but I'm interested in learning about it. If I get anywhere with a cython wrapped Resampler I'll let you know. I don't understand the stuff @sturlamolden brings up about the GIL. Not sure this isn't a problem with SWIG too.

@sturlamolden
Copy link
Contributor

Looking at Motorola's code it is just a tiny piece of C++:

https://code.google.com/p/upfirdn/source/browse/upfirdn/Resampler.h

I see no reason we cannot re-implement this in plain Cython instead.

(Another thing is that Motorola's C++ code can leak memory in two different places. Do you see the errors? This is what gives C++ such a bad reputation. If they had sticked to C those mistakes would not have been made. When porting to Cython we must be careful because it is easy to make the same errors in Cython. )

@sturlamolden
Copy link
Contributor

Here are the memory leaks squashed out:

/*
Copyright (c) 2009, Motorola, Inc

All Rights Reserved.

Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright 
notice, this list of conditions and the following disclaimer in the 
documentation and/or other materials provided with the distribution.

* Neither the name of Motorola nor the names of its contributors may be 
used to endorse or promote products derived from this software without 
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS 
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,  
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR 
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef RESAMPLER_H
#define RESAMPLER_H

using namespace std;

#include <stdexcept>
#include <complex>
#include <vector>

template<class S1, class S2, class C>
class Resampler{
public:
    typedef    S1 inputType;
    typedef    S2 outputType;
    typedef    C coefType;

    Resampler(int upRate, int downRate, C *coefs, int coefCount);
    virtual ~Resampler();

    int        apply(S1* in, int inCount, S2* out, int outCount);
    int        neededOutCount(int inCount);
    int        coefsPerPhase() { return _coefsPerPhase; }

private:
    int        _upRate;
    int        _downRate;

    coefType   *_transposedCoefs;
    inputType  *_state;
    inputType  *_stateEnd;

    int        _paddedCoefCount;  // ceil(len(coefs)/upRate)*upRate
    int        _coefsPerPhase;    // _paddedCoefCount / upRate

    int        _t;                // "time" (modulo upRate)
    int        _xOffset;

};


#include <iostream>
#include <cmath>

/*
using std::cout;
using std::endl;

using std::fill;
using std::copy;
*/

using std::invalid_argument;

template<class S1, class S2, class C>
Resampler<S1, S2, C>::Resampler(int upRate, int downRate, C *coefs,
                                int coefCount):
  _upRate(upRate), _downRate(downRate), _t(0), _xOffset(0), 

// also initialize these members
_transposedCoefs(NULL), _state(NULL)

/*
  The coefficients are copied into local storage in a transposed, flipped
  arrangement.  For example, suppose upRate is 3, and the input number
  of coefficients coefCount = 10, represented as h[0], ..., h[9].
  Then the internal buffer will look like this:
        h[9], h[6], h[3], h[0],   // flipped phase 0 coefs
           0, h[7], h[4], h[1],   // flipped phase 1 coefs (zero-padded)
           0, h[8], h[5], h[2],   // flipped phase 2 coefs (zero-padded)
*/
{
    _paddedCoefCount = coefCount;
    while (_paddedCoefCount % _upRate) {
        _paddedCoefCount++;
    }
    _coefsPerPhase = _paddedCoefCount / _upRate;

    _transposedCoefs = new coefType[_paddedCoefCount];
    fill(_transposedCoefs, _transposedCoefs + _paddedCoefCount, 0.);

    _state = new inputType[_coefsPerPhase - 1];
    _stateEnd = _state + _coefsPerPhase - 1;
    fill(_state, _stateEnd, 0.);


    /* This both transposes, and "flips" each phase, while
     * copying the defined coefficients into local storage.
     * There is probably a faster way to do this
     */
    for (int i=0; i<_upRate; ++i) {
        for (int j=0; j<_coefsPerPhase; ++j) {
            if (j*_upRate + i  < coefCount)
                _transposedCoefs[(_coefsPerPhase-1-j) + i*_coefsPerPhase] =
                                                coefs[j*_upRate + i];
        }
    }
}

template<class S1, class S2, class C>
Resampler<S1, S2, C>::~Resampler() {

    // check that they are not NULL or this might segfault
    // delete [] _transposedCoefs;
    // delete [] _state;

    if (_transposedCoefs) delete [] _transposedCoefs;
    if (_state) delete [] _state;
}

template<class S1, class S2, class C>
int Resampler<S1, S2, C>::neededOutCount(int inCount)
/* compute how many outputs will be generated for inCount inputs  */
{
    unsigned long np = inCount * (unsigned long) _upRate;
    int need = np / _downRate;
    if ((_t + _upRate * _xOffset) < (np % _downRate))
        need++;
    return need;
}

template<class S1, class S2, class C>
int Resampler<S1, S2, C>::apply(S1* in, int inCount, 
                                S2* out, int outCount) {
    if (outCount < neededOutCount(inCount)) 
        throw invalid_argument("Not enough output samples");

    // x points to the latest processed input sample
    inputType *x = in + _xOffset;
    outputType *y = out;
    inputType *end = in + inCount;
    while (x < end) {
        outputType acc = 0.;
        coefType *h = _transposedCoefs + _t*_coefsPerPhase;
        inputType *xPtr = x - _coefsPerPhase + 1;
        int offset = in - xPtr;
        if (offset > 0) {
            // need to draw from the _state buffer
            inputType *statePtr = _stateEnd - offset;
            while (statePtr < _stateEnd) {
                acc += *statePtr++ * *h++;
            }
            xPtr += offset;
        }
        while (xPtr <= x) {
            acc += *xPtr++ * *h++;
        }
        *y++ = acc;
        _t += _downRate;

        int advanceAmount = _t / _upRate;

        x += advanceAmount;
        // which phase of the filter to use
        _t %= _upRate;
    }
    _xOffset = x - end;

    // manage _state buffer
    // find number of samples retained in buffer:
    int retain = (_coefsPerPhase - 1) - inCount;
    if (retain > 0) {
        // for inCount smaller than state buffer, copy end of buffer
        // to beginning:
        copy(_stateEnd - retain, _stateEnd, _state);
        // Then, copy the entire (short) input to end of buffer
        copy(in, end, _stateEnd - inCount);
    } else {
        // just copy last input samples into state buffer
        copy(end - (_coefsPerPhase - 1), end, _state);
    }
    // number of samples computed
    return y - out;
}

template<class S1, class S2, class C>
void upfirdn(int upRate, int downRate, 
             S1 *input, int inLength, C *filter, int filterLength, 
             vector<S2> &results)
/*
This template function provides a one-shot resampling.  Extra samples
are padded to the end of the input in order to capture all of the non-zero 
output samples.
The output is in the "results" vector which is modified by the function.

Note, I considered returning a vector instead of taking one on input, but
then the C++ compiler has trouble with implicit template instantiation
(e.g. have to say upfirdn<float, float, float> every time - this
way we can let the compiler infer the template types).

Thanks to Lewis Anderson (lkanders@ucsd.edu) at UCSD for
the original version of this function.
*/
{
    // Create the Resampler
    Resampler<S1, S2, C> theResampler(upRate, downRate, filter, filterLength);

    // pad input by length of one polyphase of filter to flush all values out
    int padding = theResampler.coefsPerPhase() - 1;

    // possible memory leak: new used outside constructor
    // S1 *inputPadded = new S1[inLength + padding];
    std::vector<S1> inputPadded(inLength + padding);

    for (int i = 0; i < inLength + padding; i++) {
        if (i < inLength)
            inputPadded[i] = input[i];
        else
            inputPadded[i] = 0;
    }

    // calc size of output
    int resultsCount = theResampler.neededOutCount(inLength + padding); 

    results.resize(resultsCount);

    // run filtering

    // crappy code with unused local variable
    // int numSamplesComputed = theResampler.apply(inputPadded, 
    //        inLength + padding, &results[0], resultsCount);
    theResampler.apply(&inputPadded[0], inLength + padding, &results[0], resultsCount);

   // Don't need this
   // delete[] inputPadded;
}

template<class S1, class S2, class C>
void upfirdn(int upRate, int downRate, 
             vector<S1> &input, vector<C> &filter, vector<S2> &results)
/*
This template function provides a one-shot resampling.
The output is in the "results" vector which is modified by the function.
In this version, the input and filter are vectors as opposed to 
pointer/count pairs.
*/
{
    upfirdn<S1, S2, C>(upRate, downRate, &input[0], input.size(), &filter[0], 
                       filter.size(), results);
}


#endif

@sturlamolden
Copy link
Contributor

Oh yes, we also need to fix this for working correctly on 32 and 64 bit

/*
Copyright (c) 2009, Motorola, Inc

All Rights Reserved.

Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright 
notice, this list of conditions and the following disclaimer in the 
documentation and/or other materials provided with the distribution.

* Neither the name of Motorola nor the names of its contributors may be 
used to endorse or promote products derived from this software without 
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS 
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,  
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR 
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/


#ifndef RESAMPLER_H
#define RESAMPLER_H

#include <Python.h>
#include "numpy/arrayobject.h"

using namespace std;

#include <stdexcept>
#include <complex>
#include <vector>

template<class S1, class S2, class C>
class Resampler{
public:
    typedef    S1 inputType;
    typedef    S2 outputType;
    typedef    C coefType;

    Resampler(npy_intp upRate, npy_intp downRate, C *coefs, npy_intp coefCount);
    virtual ~Resampler();

    npy_intp        apply(S1* in, npy_intp inCount, S2* out, npy_intp outCount);
    npy_intp        neededOutCount(npy_intp inCount);
    npy_intp        coefsPerPhase() { return _coefsPerPhase; }

private:
    npy_intp        _upRate;
    npy_intp        _downRate;

    coefType   *_transposedCoefs;
    inputType  *_state;
    inputType  *_stateEnd;

    npy_intp        _paddedCoefCount;  // ceil(len(coefs)/upRate)*upRate
    npy_intp        _coefsPerPhase;    // _paddedCoefCount / upRate

    npy_intp        _t;                // "time" (modulo upRate)
    npy_intp        _xOffset;

};


#include <iostream>
#include <cmath>

/*
using std::cout;
using std::endl;

using std::fill;
using std::copy;
*/

using std::invalid_argument;

template<class S1, class S2, class C>
Resampler<S1, S2, C>::Resampler(npy_intp upRate, npy_intp downRate, C *coefs,
                                npy_intp coefCount):
  _upRate(upRate), _downRate(downRate), _t(0), _xOffset(0), 

  // also initialize these members
  _transposedCoefs(NULL), _state(NULL)

/*
  The coefficients are copied npy_intpo local storage in a transposed, flipped
  arrangement.  For example, suppose upRate is 3, and the input number
  of coefficients coefCount = 10, represented as h[0], ..., h[9].
  Then the npy_intpernal buffer will look like this:
        h[9], h[6], h[3], h[0],   // flipped phase 0 coefs
           0, h[7], h[4], h[1],   // flipped phase 1 coefs (zero-padded)
           0, h[8], h[5], h[2],   // flipped phase 2 coefs (zero-padded)
*/
{
    _paddedCoefCount = coefCount;
    while (_paddedCoefCount % _upRate) {
        _paddedCoefCount++;
    }
    _coefsPerPhase = _paddedCoefCount / _upRate;

    _transposedCoefs = new coefType[_paddedCoefCount];
    fill(_transposedCoefs, _transposedCoefs + _paddedCoefCount, 0.);

    _state = new inputType[_coefsPerPhase - 1];
    _stateEnd = _state + _coefsPerPhase - 1;
    fill(_state, _stateEnd, 0.);


    /* This both transposes, and "flips" each phase, while
     * copying the defined coefficients npy_intpo local storage.
     * There is probably a faster way to do this
     */
    for (npy_intp i=0; i<_upRate; ++i) {
        for (npy_intp j=0; j<_coefsPerPhase; ++j) {
            if (j*_upRate + i  < coefCount)
                _transposedCoefs[(_coefsPerPhase-1-j) + i*_coefsPerPhase] =
                                                coefs[j*_upRate + i];
        }
    }
}

template<class S1, class S2, class C>
Resampler<S1, S2, C>::~Resampler() {

    // check that they are not NULL or this might segfault
    // delete [] _transposedCoefs;
    // delete [] _state;

    if (_transposedCoefs) delete [] _transposedCoefs;
    if (_state) delete [] _state;
}

template<class S1, class S2, class C>
npy_intp Resampler<S1, S2, C>::neededOutCount(npy_intp inCount)
/* compute how many outputs will be generated for inCount inputs  */
{
    npy_uintp np = inCount * (npy_uintp) _upRate;
    npy_intp need = np / _downRate;
    if ((_t + _upRate * _xOffset) < (np % _downRate))
        need++;
    return need;
}

template<class S1, class S2, class C>
npy_intp Resampler<S1, S2, C>::apply(S1* in, npy_intp inCount, 
                                S2* out, npy_intp outCount) {
    if (outCount < neededOutCount(inCount)) 
        throw invalid_argument("Not enough output samples");

    // x points to the latest processed input sample
    inputType *x = in + _xOffset;
    outputType *y = out;
    inputType *end = in + inCount;
    while (x < end) {
        outputType acc = 0.;
        coefType *h = _transposedCoefs + _t*_coefsPerPhase;
        inputType *xPtr = x - _coefsPerPhase + 1;
        npy_intp offset = in - xPtr;
        if (offset > 0) {
            // need to draw from the _state buffer
            inputType *statePtr = _stateEnd - offset;
            while (statePtr < _stateEnd) {
                acc += *statePtr++ * *h++;
            }
            xPtr += offset;
        }
        while (xPtr <= x) {
            acc += *xPtr++ * *h++;
        }
        *y++ = acc;
        _t += _downRate;

        npy_intp advanceAmount = _t / _upRate;

        x += advanceAmount;
        // which phase of the filter to use
        _t %= _upRate;
    }
    _xOffset = x - end;

    // manage _state buffer
    // find number of samples retained in buffer:
    npy_intp retain = (_coefsPerPhase - 1) - inCount;
    if (retain > 0) {
        // for inCount smaller than state buffer, copy end of buffer
        // to beginning:
        copy(_stateEnd - retain, _stateEnd, _state);
        // Then, copy the entire (short) input to end of buffer
        copy(in, end, _stateEnd - inCount);
    } else {
        // just copy last input samples npy_intpo state buffer
        copy(end - (_coefsPerPhase - 1), end, _state);
    }
    // number of samples computed
    return y - out;
}

template<class S1, class S2, class C>
void upfirdn(npy_intp upRate, npy_intp downRate, 
             S1 *input, npy_intp inLength, C *filter, npy_intp filterLength, 
             vector<S2> &results)
/*
This template function provides a one-shot resampling.  Extra samples
are padded to the end of the input in order to capture all of the non-zero 
output samples.
The output is in the "results" vector which is modified by the function.

Note, I considered returning a vector instead of taking one on input, but
then the C++ compiler has trouble with implicit template instantiation
(e.g. have to say upfirdn<float, float, float> every time - this
way we can let the compiler infer the template types).

Thanks to Lewis Anderson (lkanders@ucsd.edu) at UCSD for
the original version of this function.
*/
{
    // Create the Resampler
    Resampler<S1, S2, C> theResampler(upRate, downRate, filter, filterLength);

    // pad input by length of one polyphase of filter to flush all values out
    npy_intp padding = theResampler.coefsPerPhase() - 1;

    // possible memory leak: new used outside constructor
    // S1 *inputPadded = new S1[inLength + padding];
    std::vector<S1> inputPadded(inLength + padding);

    for (npy_intp i = 0; i < inLength + padding; i++) {
        if (i < inLength)
            inputPadded[i] = input[i];
        else
            inputPadded[i] = 0;
    }

    // calc size of output
    npy_intp resultsCount = theResampler.neededOutCount(inLength + padding); 

    results.resize(resultsCount);

    // run filtering

    // crappy code with unused local variable
    // npy_intp numSamplesComputed = theResampler.apply(inputPadded, 
    //        inLength + padding, &results[0], resultsCount);
    theResampler.apply(&inputPadded[0], inLength + padding, &results[0], resultsCount);

   // Don't need this
   // delete[] inputPadded;
}

template<class S1, class S2, class C>
void upfirdn(npy_intp upRate, npy_intp downRate, 
             vector<S1> &input, vector<C> &filter, vector<S2> &results)
/*
This template function provides a one-shot resampling.
The output is in the "results" vector which is modified by the function.
In this version, the input and filter are vectors as opposed to 
ponpy_intper/count pairs.
*/
{
    upfirdn<S1, S2, C>(upRate, downRate, &input[0], input.size(), &filter[0], 
                       filter.size(), results);
}


#endif

@larsoner
Copy link
Member Author

larsoner commented Oct 11, 2015 via email

@larsoner
Copy link
Member Author

larsoner commented Oct 11, 2015 via email

@rgommers
Copy link
Member

I don't understand why the swig generated wrappers need to be checked in under revision control. That is not ideal.

Because otherwise everyone who builds scipy from a git clone needs to have SWIG installed - that's just a complete no-go.

@sturlamolden
Copy link
Contributor

I still don't understand why we need C++ for this. It is a tiny piece of C++ that requires more code to wrap than to write from scratch in Cython. I even wonder if we need compiled code. Looking at the C++ code it seems to me that some plain Python calling scipy.signal.lfilter or scipy.signal.convolve should work just fine.

@sturlamolden
Copy link
Contributor

Here is what Octave's upfirdn.m does:
http://octave-signal.sourcearchive.com/documentation/1.0.9/upfirdn_8m-source.html
We only need a few lines of Python code including a call to lfilter for this.

@larsoner
Copy link
Member Author

@sturlamolden you can implement it that way, but doing that will involve a bunch of unnecessary calculations (thinking about the zero-insertion step of upsampling and the subselection step of downsampling). IIRC this is a polyphase filtering implementation that avoids those unnecessary calculations.

@sirtom67
Copy link
Contributor

Yes @Eric89GXL is correct, this is an efficient polyphase implementation - it only does the necessary multiplies and summations. In fact the "resample" function at line 49 in the test code at
https://code.google.com/p/upfirdn/source/browse/upfirdn/test_upfirdn.py
is the slow equivalent implementation, that is used as a reference. It can be 100s-1000s of times slower depending on the uprate, downrate, and number of coefficients.

I am the original author back when I was with Motorola. upfirdn is a staple of signal processing in matlab and I always wanted it in python. We were hoping it would be included in numpy or scipy - better late than never! As it is now I install this package along with numpy and scipy on every new machine.

@sturlamolden - good catch on the memory leak! I wasn't aware that the _transposedCoefs and _state could leak - I was assuming that the constructor would always finish. Now I see if either "new" throws an assertion, we would have this problem. Your approach of initializing to NULL solves this. Do I have that right or is there another issue here causing the leak? How would you test for this since most machines have a lot of RAM these days?

As far as the upfirdn C++ function: this is for folks wanting a C++ only implementation and was contributed later. This is not used by the python ResamplerBank wrapper class. It needn't be included in this version.

@sturlamolden
Copy link
Contributor

Yes, if you were using a DSP that might be a concern. But this is SciPy, performance is mostly memory-bound, and we are not aiming to be the 'fastest' but rather 'fast enough'.

Simplicity of the source code, readability and easy maintenance are also important, as is avoiding a complicated build involving C++ and Swig. If you look at SciPy you will see that most of it is actually plain Python code. It is much more easy to maintain. The order of preference is still

Python > Cython > C > C++ > Fortran

How much faster is a C++ version likely to be? Not much. Slicing and lfilter are already in compiled C code. And in most cases the extra overhead from the Swig layer would dwarf any such savings.

The only valid concern I can see is if you need a huge upsampling and want to save memory by avoiding temporary storage. But you can do this with lfilter too because it can take zi as an optional argument. So you can just chop up the work in acceptable chunks of data. The extra Python overhead should be tiny.

@sturlamolden
Copy link
Contributor

@sirtom67 The two new[] in the contructor cannot leak, but they can cause a segfault if they throw a std::bad_alloc and the destructor subsequently does a delete [] on a wild pointer. The one that can leak is the new in upfirdn which should be replaced by a std::vector on the stack. You can provoke it by having Resampler::apply throw std::invalid_argument. There are also other cases that can provoke this, e.g. if std::vector::resize or std::copy throws an exception. In case of an exception the delete statement will not be reached.

@sirtom67
Copy link
Contributor

As a point of reference, for many years (may still be the case) there were two MEX files in the signal processing toolbox in MATLAB: remez, and upfirdn. It is MUCH faster in C (or C++) compared to python. Much of the package (ResamplerBank and the upfirdn function) is already python. Only the 1D FIR resampling code, the most critical piece, is in native code. I'm all for a cython or c implementation if someone doesn't mind translating it.

As for bugs: the unittests for this code are pretty thorough. The bug that you found was not encountered because it was not tested in a way that exhausted the memory. The other leak in the upfirdn function doesn't matter because that code is not going to be included - the main thing is the Resampler C++ class. The unittests exercise Resampler with lot of combinations of length of filter, uprate, downrate, and size of signals being applied. I have pounded on it a lot and it's proven stable for a while.

@sturlamolden
Copy link
Contributor

Ok, I didn't look at the Python code, but yes upfirdn in C++ is not used in the Python API. But the bug is still in there ;-)

@sturlamolden
Copy link
Contributor

If the Swig wrappers are known to work correctly I don't really see the purpose of not using them. If they break sometime in the future they can be regenerated or replaced.

@larsoner
Copy link
Member Author

larsoner commented Oct 11, 2015 via email

@sturlamolden
Copy link
Contributor

MATLAB also have the advantage of using FFTW (unless they have switched to MKL) for FFTs, while we are stuck with FFTPACK because of licensing issues.

@sturlamolden
Copy link
Contributor

100x to 1000x speed difference is what we expect when comparing algorithmic Python code to C code. But here we already have the slicing and filtering code in C, which constitutes the bulk of the work. I would say 10x is more likely. But does anyone have a timing of upfirdn on Python?

Speed-differences in Matlab is not really relevant. It is often a lot slower than Python.

FFT-based resampling is O(n log n) compared to O(k * n) for convolution based, with k being the filter length. For small values of k this can matter a lot. But then we could solve this by using convolution instead of FFT-based resampling. I don't think the speed of SciPY's FFT based resampler has any relevance here.

@larsoner
Copy link
Member Author

larsoner commented Oct 12, 2015 via email

@sturlamolden
Copy link
Contributor

You don't have to convince me :-)

I didn't think about all those multiply-by-zeros, but that is a good point.

@sirtom67
Copy link
Contributor

I thought a bit more about the algorithmic differences. The direct approach of (upsample by factor of P with zero insertion, FIR filter of length Nh, downsample by factor of Q) is O(Nh*Q) per output sample. The efficient polyphase implementation is O(Nh/P).

@larsoner
Copy link
Member Author

larsoner commented Oct 12, 2015 via email

@larsoner
Copy link
Member Author

Okay, I've made some progress first translating to simple Python code (with tests passing), now I can go about converting it to Cython (not pushed for now). I'm just going to do the real case first, then expand later. Thanks for the feedback folks, hopefully I can have something useful soon.

@larsoner
Copy link
Member Author

@sturlamolden thanks for the tip, I rebased and now we're green.

Ready for review/merge from my end.

@rgommers rgommers added this to the 0.18.0 milestone Dec 1, 2015
@larsoner
Copy link
Member Author

larsoner commented Dec 4, 2015

@endolith any interest in reviewing another filtering PR? This one hopefully is simpler than SOS with only +437 :)

@endolith
Copy link
Member

endolith commented Dec 5, 2015

@Eric89GXL So many things, so little time. :D


Parameters
----------
h : array-like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always seen these as array_like. I don't know why, but consistency is good.

@larsoner
Copy link
Member Author

larsoner commented Dec 7, 2015

Comments addressed.

@larsoner larsoner force-pushed the upfirdn branch 2 times, most recently from 9548d4d to f70ccea Compare January 2, 2016 16:55
@ewmoore
Copy link
Member

ewmoore commented Jan 18, 2016

@Eric89GXL, if you could squash this down a little bit, I'll merge it.

@sturlamolden
Copy link
Contributor

Pyflakes error

@larsoner
Copy link
Member Author

@sturlamolden yes but it's in benchmarks/benchmarks/signal_filtering.py:30:5 which I didn't change in this PR. So ready to merge then?

@sturlamolden
Copy link
Contributor

I don't think there is a pyflakes error in master, so you should probably rebase.

@codecov-io
Copy link

@@            master   #5186   diff @@
======================================
  Files          234     235     +1
  Stmts        43180   43211    +31
  Branches      8161    8163     +2
  Methods          0       0       
======================================
+ Hit          33500   33531    +31
  Partial       2602    2602       
  Missed        7078    7078       

Review entire Coverage Diff as of b64d39d

Powered by Codecov. Updated on successful CI builds.

@larsoner
Copy link
Member Author

All better

@sturlamolden
Copy link
Contributor

Yes, it looks ready to be merged now, I think.

@ewmoore
Copy link
Member

ewmoore commented Jan 20, 2016

Thanks @Eric89GXL!

ewmoore added a commit that referenced this pull request Jan 20, 2016
@ewmoore ewmoore merged commit 8c841b6 into scipy:master Jan 20, 2016
@larsoner
Copy link
Member Author

Thanks for the reviews! Now to make a fast resampling algorithm...

@sirtom67
Copy link
Contributor

Nice work Eric and other contributors! A very worthwhile addition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.signal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants