-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Port upfirdn #5186
ENH: Port upfirdn #5186
Conversation
SWIG is kind of a fossil, it generates large files that we'd have to check in. For For Probably Cython is the way to go (http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html), but it would be good to get a more expert opinion than mine. EDIT: this is pretty small though, so SWIG may be OK. |
Cython is good for wrapping C++, as is SIP, PyCXX and Boost.Python. Swig is a bit fossile, as @rgommers said, but can be used as well. I am not sure which ones are "approved" for SciPy, apart from Swig and Cython. With Cython you need to be aware of the difference between wrapping C++ for Python and using C++ in Cython. There are also two ways of wrapping C++ for Python with C++. Either you can wrap the C++ as shown in the user guide @rgommers linked or export the C++ through C callable wrappers. The latter more tedious, but is what ZMQ has done, and also roughly what we did for cKDTree. One reason you might prefer to do the latter has to do with the GIL. If you release the GIL in C++ and an exception is thrown, you don't want the exception handling to jump back into Cython without reacquiring the GIL. One way is to use a C++ function as proxy, and let it take care of the GIL and translating C++ exceptions to Cython. There is a code example in cKDTree. If you try to release the GIL in Cython before you call into C++, it will complain about exception handling in a nogil context (unless things have improved). Another thing when using C++ is you have to beware that an exception handler can jump out of a block surrounded by {
int got_exc = 0;
NPY_END_ALLOW_THREADS
try {
// safe to use C++
}
catch(...) {
got_exc = 1;
}
NPY_END_ALLOW_THREADS
if (NPY_UNLIKELY(got_exc)) throw;
} This will trap the exception, get safely out of a nogil block, and re-throw the C++ exception. Now Cython can be allowed to catch the C++ exception and raise a Python exception. But that begs the question why not just do this? extern "C" PyObject*
foobar(void *some_cpp_object)
{
NPY_END_ALLOW_THREADS
try {
// safe to use C++
}
catch(...) {
// temporarily reacquire the GIL
// translate C++ exception to Python exception
// release the GIL
//
// just steal the function from cKDTree :-)
translate_cpp_exception_with_gil();
}
NPY_END_ALLOW_THREADS
if (NPY_UNLIKELY(PyErr_Occured()))
return NULL;
Py_RETURN_NONE;
} And now you don't really need Cython to wrap C++, because it just looks like C as far as Cython is concerned. For cKDTree I came to the conclusion that manual wrappers were needed anyway, so all the C++ in Cython really did was to allow me to use STL containers in Cython. |
First a little background on upfirdn. Named after the MATLAB function of the same name, upfirdn is a one-dimensional FIR filtering tool with the added ability of resampling the signal. It is "vectorized" in the sense that an N-D "bank" of filters can be applied to an N-D signal array. The implementation of upfirdn that has been submitted is a python wrapper of a templatized one-dimensional "Resampler" class. The template types are the input type, output type, and coefficient type. Swig wrappers are generated for "RR", "RC", "CR" and "CC" double precision cases (R=real, C=complex, the first one is the signal type which applies to both input and output, the second one is the coefficient type). From Resampler.i:
The N-D case is handled by creating a numpy object array, where each element is a Resampler of the desired type with equivalent filter lengths and up- and down-rates. Most use cases are "one-shot" so you don't have to handle filter state retention between calls; however the Resampler<> class supports state retention for when you want to process a small amount of signal at a time - think of it as a stream of samples that goes on forever, you can keep feeding input samples into it in chunks and keep getting output samples in chunks as well. Now to the question at hand. If SWIG is in fact approved for scipy then that is probably the easiest way to import the Resampler. This is much smaller than the sparse stuff (by a factor of about 100), but it sounds like there may be some benefits besides the size in using cython. I don't understand why the swig generated wrappers need to be checked in under revision control. That is not ideal. I have no experience with cython but I'm interested in learning about it. If I get anywhere with a cython wrapped Resampler I'll let you know. I don't understand the stuff @sturlamolden brings up about the GIL. Not sure this isn't a problem with SWIG too. |
Looking at Motorola's code it is just a tiny piece of C++: https://code.google.com/p/upfirdn/source/browse/upfirdn/Resampler.h I see no reason we cannot re-implement this in plain Cython instead. (Another thing is that Motorola's C++ code can leak memory in two different places. Do you see the errors? This is what gives C++ such a bad reputation. If they had sticked to C those mistakes would not have been made. When porting to Cython we must be careful because it is easy to make the same errors in Cython. ) |
Here are the memory leaks squashed out: /*
Copyright (c) 2009, Motorola, Inc
All Rights Reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Motorola nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef RESAMPLER_H
#define RESAMPLER_H
using namespace std;
#include <stdexcept>
#include <complex>
#include <vector>
template<class S1, class S2, class C>
class Resampler{
public:
typedef S1 inputType;
typedef S2 outputType;
typedef C coefType;
Resampler(int upRate, int downRate, C *coefs, int coefCount);
virtual ~Resampler();
int apply(S1* in, int inCount, S2* out, int outCount);
int neededOutCount(int inCount);
int coefsPerPhase() { return _coefsPerPhase; }
private:
int _upRate;
int _downRate;
coefType *_transposedCoefs;
inputType *_state;
inputType *_stateEnd;
int _paddedCoefCount; // ceil(len(coefs)/upRate)*upRate
int _coefsPerPhase; // _paddedCoefCount / upRate
int _t; // "time" (modulo upRate)
int _xOffset;
};
#include <iostream>
#include <cmath>
/*
using std::cout;
using std::endl;
using std::fill;
using std::copy;
*/
using std::invalid_argument;
template<class S1, class S2, class C>
Resampler<S1, S2, C>::Resampler(int upRate, int downRate, C *coefs,
int coefCount):
_upRate(upRate), _downRate(downRate), _t(0), _xOffset(0),
// also initialize these members
_transposedCoefs(NULL), _state(NULL)
/*
The coefficients are copied into local storage in a transposed, flipped
arrangement. For example, suppose upRate is 3, and the input number
of coefficients coefCount = 10, represented as h[0], ..., h[9].
Then the internal buffer will look like this:
h[9], h[6], h[3], h[0], // flipped phase 0 coefs
0, h[7], h[4], h[1], // flipped phase 1 coefs (zero-padded)
0, h[8], h[5], h[2], // flipped phase 2 coefs (zero-padded)
*/
{
_paddedCoefCount = coefCount;
while (_paddedCoefCount % _upRate) {
_paddedCoefCount++;
}
_coefsPerPhase = _paddedCoefCount / _upRate;
_transposedCoefs = new coefType[_paddedCoefCount];
fill(_transposedCoefs, _transposedCoefs + _paddedCoefCount, 0.);
_state = new inputType[_coefsPerPhase - 1];
_stateEnd = _state + _coefsPerPhase - 1;
fill(_state, _stateEnd, 0.);
/* This both transposes, and "flips" each phase, while
* copying the defined coefficients into local storage.
* There is probably a faster way to do this
*/
for (int i=0; i<_upRate; ++i) {
for (int j=0; j<_coefsPerPhase; ++j) {
if (j*_upRate + i < coefCount)
_transposedCoefs[(_coefsPerPhase-1-j) + i*_coefsPerPhase] =
coefs[j*_upRate + i];
}
}
}
template<class S1, class S2, class C>
Resampler<S1, S2, C>::~Resampler() {
// check that they are not NULL or this might segfault
// delete [] _transposedCoefs;
// delete [] _state;
if (_transposedCoefs) delete [] _transposedCoefs;
if (_state) delete [] _state;
}
template<class S1, class S2, class C>
int Resampler<S1, S2, C>::neededOutCount(int inCount)
/* compute how many outputs will be generated for inCount inputs */
{
unsigned long np = inCount * (unsigned long) _upRate;
int need = np / _downRate;
if ((_t + _upRate * _xOffset) < (np % _downRate))
need++;
return need;
}
template<class S1, class S2, class C>
int Resampler<S1, S2, C>::apply(S1* in, int inCount,
S2* out, int outCount) {
if (outCount < neededOutCount(inCount))
throw invalid_argument("Not enough output samples");
// x points to the latest processed input sample
inputType *x = in + _xOffset;
outputType *y = out;
inputType *end = in + inCount;
while (x < end) {
outputType acc = 0.;
coefType *h = _transposedCoefs + _t*_coefsPerPhase;
inputType *xPtr = x - _coefsPerPhase + 1;
int offset = in - xPtr;
if (offset > 0) {
// need to draw from the _state buffer
inputType *statePtr = _stateEnd - offset;
while (statePtr < _stateEnd) {
acc += *statePtr++ * *h++;
}
xPtr += offset;
}
while (xPtr <= x) {
acc += *xPtr++ * *h++;
}
*y++ = acc;
_t += _downRate;
int advanceAmount = _t / _upRate;
x += advanceAmount;
// which phase of the filter to use
_t %= _upRate;
}
_xOffset = x - end;
// manage _state buffer
// find number of samples retained in buffer:
int retain = (_coefsPerPhase - 1) - inCount;
if (retain > 0) {
// for inCount smaller than state buffer, copy end of buffer
// to beginning:
copy(_stateEnd - retain, _stateEnd, _state);
// Then, copy the entire (short) input to end of buffer
copy(in, end, _stateEnd - inCount);
} else {
// just copy last input samples into state buffer
copy(end - (_coefsPerPhase - 1), end, _state);
}
// number of samples computed
return y - out;
}
template<class S1, class S2, class C>
void upfirdn(int upRate, int downRate,
S1 *input, int inLength, C *filter, int filterLength,
vector<S2> &results)
/*
This template function provides a one-shot resampling. Extra samples
are padded to the end of the input in order to capture all of the non-zero
output samples.
The output is in the "results" vector which is modified by the function.
Note, I considered returning a vector instead of taking one on input, but
then the C++ compiler has trouble with implicit template instantiation
(e.g. have to say upfirdn<float, float, float> every time - this
way we can let the compiler infer the template types).
Thanks to Lewis Anderson (lkanders@ucsd.edu) at UCSD for
the original version of this function.
*/
{
// Create the Resampler
Resampler<S1, S2, C> theResampler(upRate, downRate, filter, filterLength);
// pad input by length of one polyphase of filter to flush all values out
int padding = theResampler.coefsPerPhase() - 1;
// possible memory leak: new used outside constructor
// S1 *inputPadded = new S1[inLength + padding];
std::vector<S1> inputPadded(inLength + padding);
for (int i = 0; i < inLength + padding; i++) {
if (i < inLength)
inputPadded[i] = input[i];
else
inputPadded[i] = 0;
}
// calc size of output
int resultsCount = theResampler.neededOutCount(inLength + padding);
results.resize(resultsCount);
// run filtering
// crappy code with unused local variable
// int numSamplesComputed = theResampler.apply(inputPadded,
// inLength + padding, &results[0], resultsCount);
theResampler.apply(&inputPadded[0], inLength + padding, &results[0], resultsCount);
// Don't need this
// delete[] inputPadded;
}
template<class S1, class S2, class C>
void upfirdn(int upRate, int downRate,
vector<S1> &input, vector<C> &filter, vector<S2> &results)
/*
This template function provides a one-shot resampling.
The output is in the "results" vector which is modified by the function.
In this version, the input and filter are vectors as opposed to
pointer/count pairs.
*/
{
upfirdn<S1, S2, C>(upRate, downRate, &input[0], input.size(), &filter[0],
filter.size(), results);
}
#endif
|
Oh yes, we also need to fix this for working correctly on 32 and 64 bit /*
Copyright (c) 2009, Motorola, Inc
All Rights Reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Motorola nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef RESAMPLER_H
#define RESAMPLER_H
#include <Python.h>
#include "numpy/arrayobject.h"
using namespace std;
#include <stdexcept>
#include <complex>
#include <vector>
template<class S1, class S2, class C>
class Resampler{
public:
typedef S1 inputType;
typedef S2 outputType;
typedef C coefType;
Resampler(npy_intp upRate, npy_intp downRate, C *coefs, npy_intp coefCount);
virtual ~Resampler();
npy_intp apply(S1* in, npy_intp inCount, S2* out, npy_intp outCount);
npy_intp neededOutCount(npy_intp inCount);
npy_intp coefsPerPhase() { return _coefsPerPhase; }
private:
npy_intp _upRate;
npy_intp _downRate;
coefType *_transposedCoefs;
inputType *_state;
inputType *_stateEnd;
npy_intp _paddedCoefCount; // ceil(len(coefs)/upRate)*upRate
npy_intp _coefsPerPhase; // _paddedCoefCount / upRate
npy_intp _t; // "time" (modulo upRate)
npy_intp _xOffset;
};
#include <iostream>
#include <cmath>
/*
using std::cout;
using std::endl;
using std::fill;
using std::copy;
*/
using std::invalid_argument;
template<class S1, class S2, class C>
Resampler<S1, S2, C>::Resampler(npy_intp upRate, npy_intp downRate, C *coefs,
npy_intp coefCount):
_upRate(upRate), _downRate(downRate), _t(0), _xOffset(0),
// also initialize these members
_transposedCoefs(NULL), _state(NULL)
/*
The coefficients are copied npy_intpo local storage in a transposed, flipped
arrangement. For example, suppose upRate is 3, and the input number
of coefficients coefCount = 10, represented as h[0], ..., h[9].
Then the npy_intpernal buffer will look like this:
h[9], h[6], h[3], h[0], // flipped phase 0 coefs
0, h[7], h[4], h[1], // flipped phase 1 coefs (zero-padded)
0, h[8], h[5], h[2], // flipped phase 2 coefs (zero-padded)
*/
{
_paddedCoefCount = coefCount;
while (_paddedCoefCount % _upRate) {
_paddedCoefCount++;
}
_coefsPerPhase = _paddedCoefCount / _upRate;
_transposedCoefs = new coefType[_paddedCoefCount];
fill(_transposedCoefs, _transposedCoefs + _paddedCoefCount, 0.);
_state = new inputType[_coefsPerPhase - 1];
_stateEnd = _state + _coefsPerPhase - 1;
fill(_state, _stateEnd, 0.);
/* This both transposes, and "flips" each phase, while
* copying the defined coefficients npy_intpo local storage.
* There is probably a faster way to do this
*/
for (npy_intp i=0; i<_upRate; ++i) {
for (npy_intp j=0; j<_coefsPerPhase; ++j) {
if (j*_upRate + i < coefCount)
_transposedCoefs[(_coefsPerPhase-1-j) + i*_coefsPerPhase] =
coefs[j*_upRate + i];
}
}
}
template<class S1, class S2, class C>
Resampler<S1, S2, C>::~Resampler() {
// check that they are not NULL or this might segfault
// delete [] _transposedCoefs;
// delete [] _state;
if (_transposedCoefs) delete [] _transposedCoefs;
if (_state) delete [] _state;
}
template<class S1, class S2, class C>
npy_intp Resampler<S1, S2, C>::neededOutCount(npy_intp inCount)
/* compute how many outputs will be generated for inCount inputs */
{
npy_uintp np = inCount * (npy_uintp) _upRate;
npy_intp need = np / _downRate;
if ((_t + _upRate * _xOffset) < (np % _downRate))
need++;
return need;
}
template<class S1, class S2, class C>
npy_intp Resampler<S1, S2, C>::apply(S1* in, npy_intp inCount,
S2* out, npy_intp outCount) {
if (outCount < neededOutCount(inCount))
throw invalid_argument("Not enough output samples");
// x points to the latest processed input sample
inputType *x = in + _xOffset;
outputType *y = out;
inputType *end = in + inCount;
while (x < end) {
outputType acc = 0.;
coefType *h = _transposedCoefs + _t*_coefsPerPhase;
inputType *xPtr = x - _coefsPerPhase + 1;
npy_intp offset = in - xPtr;
if (offset > 0) {
// need to draw from the _state buffer
inputType *statePtr = _stateEnd - offset;
while (statePtr < _stateEnd) {
acc += *statePtr++ * *h++;
}
xPtr += offset;
}
while (xPtr <= x) {
acc += *xPtr++ * *h++;
}
*y++ = acc;
_t += _downRate;
npy_intp advanceAmount = _t / _upRate;
x += advanceAmount;
// which phase of the filter to use
_t %= _upRate;
}
_xOffset = x - end;
// manage _state buffer
// find number of samples retained in buffer:
npy_intp retain = (_coefsPerPhase - 1) - inCount;
if (retain > 0) {
// for inCount smaller than state buffer, copy end of buffer
// to beginning:
copy(_stateEnd - retain, _stateEnd, _state);
// Then, copy the entire (short) input to end of buffer
copy(in, end, _stateEnd - inCount);
} else {
// just copy last input samples npy_intpo state buffer
copy(end - (_coefsPerPhase - 1), end, _state);
}
// number of samples computed
return y - out;
}
template<class S1, class S2, class C>
void upfirdn(npy_intp upRate, npy_intp downRate,
S1 *input, npy_intp inLength, C *filter, npy_intp filterLength,
vector<S2> &results)
/*
This template function provides a one-shot resampling. Extra samples
are padded to the end of the input in order to capture all of the non-zero
output samples.
The output is in the "results" vector which is modified by the function.
Note, I considered returning a vector instead of taking one on input, but
then the C++ compiler has trouble with implicit template instantiation
(e.g. have to say upfirdn<float, float, float> every time - this
way we can let the compiler infer the template types).
Thanks to Lewis Anderson (lkanders@ucsd.edu) at UCSD for
the original version of this function.
*/
{
// Create the Resampler
Resampler<S1, S2, C> theResampler(upRate, downRate, filter, filterLength);
// pad input by length of one polyphase of filter to flush all values out
npy_intp padding = theResampler.coefsPerPhase() - 1;
// possible memory leak: new used outside constructor
// S1 *inputPadded = new S1[inLength + padding];
std::vector<S1> inputPadded(inLength + padding);
for (npy_intp i = 0; i < inLength + padding; i++) {
if (i < inLength)
inputPadded[i] = input[i];
else
inputPadded[i] = 0;
}
// calc size of output
npy_intp resultsCount = theResampler.neededOutCount(inLength + padding);
results.resize(resultsCount);
// run filtering
// crappy code with unused local variable
// npy_intp numSamplesComputed = theResampler.apply(inputPadded,
// inLength + padding, &results[0], resultsCount);
theResampler.apply(&inputPadded[0], inLength + padding, &results[0], resultsCount);
// Don't need this
// delete[] inputPadded;
}
template<class S1, class S2, class C>
void upfirdn(npy_intp upRate, npy_intp downRate,
vector<S1> &input, vector<C> &filter, vector<S2> &results)
/*
This template function provides a one-shot resampling.
The output is in the "results" vector which is modified by the function.
In this version, the input and filter are vectors as opposed to
ponpy_intper/count pairs.
*/
{
upfirdn<S1, S2, C>(upRate, downRate, &input[0], input.size(), &filter[0],
filter.size(), results);
}
#endif |
Can you open a pr into my branch? It would be much easier to follow changes
that way
|
Or you're welcome to open a different PR into scipy if it's easier, I just
want the feature in one way or another :)
|
Because otherwise everyone who builds scipy from a git clone needs to have SWIG installed - that's just a complete no-go. |
I still don't understand why we need C++ for this. It is a tiny piece of C++ that requires more code to wrap than to write from scratch in Cython. I even wonder if we need compiled code. Looking at the C++ code it seems to me that some plain Python calling |
Here is what Octave's |
@sturlamolden you can implement it that way, but doing that will involve a bunch of unnecessary calculations (thinking about the zero-insertion step of upsampling and the subselection step of downsampling). IIRC this is a polyphase filtering implementation that avoids those unnecessary calculations. |
Yes @Eric89GXL is correct, this is an efficient polyphase implementation - it only does the necessary multiplies and summations. In fact the "resample" function at line 49 in the test code at I am the original author back when I was with Motorola. upfirdn is a staple of signal processing in matlab and I always wanted it in python. We were hoping it would be included in numpy or scipy - better late than never! As it is now I install this package along with numpy and scipy on every new machine. @sturlamolden - good catch on the memory leak! I wasn't aware that the _transposedCoefs and _state could leak - I was assuming that the constructor would always finish. Now I see if either "new" throws an assertion, we would have this problem. Your approach of initializing to NULL solves this. Do I have that right or is there another issue here causing the leak? How would you test for this since most machines have a lot of RAM these days? As far as the upfirdn C++ function: this is for folks wanting a C++ only implementation and was contributed later. This is not used by the python ResamplerBank wrapper class. It needn't be included in this version. |
Yes, if you were using a DSP that might be a concern. But this is SciPy, performance is mostly memory-bound, and we are not aiming to be the 'fastest' but rather 'fast enough'. Simplicity of the source code, readability and easy maintenance are also important, as is avoiding a complicated build involving C++ and Swig. If you look at SciPy you will see that most of it is actually plain Python code. It is much more easy to maintain. The order of preference is still Python > Cython > C > C++ > Fortran How much faster is a C++ version likely to be? Not much. Slicing and The only valid concern I can see is if you need a huge upsampling and want to save memory by avoiding temporary storage. But you can do this with |
@sirtom67 The two new[] in the contructor cannot leak, but they can cause a segfault if they throw a |
As a point of reference, for many years (may still be the case) there were two MEX files in the signal processing toolbox in MATLAB: remez, and upfirdn. It is MUCH faster in C (or C++) compared to python. Much of the package (ResamplerBank and the upfirdn function) is already python. Only the 1D FIR resampling code, the most critical piece, is in native code. I'm all for a cython or c implementation if someone doesn't mind translating it. As for bugs: the unittests for this code are pretty thorough. The bug that you found was not encountered because it was not tested in a way that exhausted the memory. The other leak in the upfirdn function doesn't matter because that code is not going to be included - the main thing is the Resampler C++ class. The unittests exercise Resampler with lot of combinations of length of filter, uprate, downrate, and size of signals being applied. I have pounded on it a lot and it's proven stable for a while. |
Ok, I didn't look at the Python code, but yes |
If the Swig wrappers are known to work correctly I don't really see the purpose of not using them. If they break sometime in the future they can be regenerated or replaced. |
How much faster is a C++ version likely to be? Not much.
The previous reply suggested 100-1000x can occur, which is about what I
expected. These gains are relevant to many use cases, especially
resampling. If you have tried scipy's FFT-based resampling and compared it
to that of MATLAB, which wraps to upfirdn, the differences are striking.
This is one of the few places I still really feel pain in scientific
computing in Python compared to MATLAB.
So to me it's worth the complexity of compiled code, even if the complexity
can't be reduced from what it is now. But I can definitely try changing it
over to Cython to simplify it if others also agree it's worthwhile.
|
MATLAB also have the advantage of using FFTW (unless they have switched to MKL) for FFTs, while we are stuck with FFTPACK because of licensing issues. |
100x to 1000x speed difference is what we expect when comparing algorithmic Python code to C code. But here we already have the slicing and filtering code in C, which constitutes the bulk of the work. I would say 10x is more likely. But does anyone have a timing of upfirdn on Python? Speed-differences in Matlab is not really relevant. It is often a lot slower than Python. FFT-based resampling is O(n log n) compared to O(k * n) for convolution based, with k being the filter length. For small values of k this can matter a lot. But then we could solve this by using convolution instead of FFT-based resampling. I don't think the speed of SciPY's FFT based resampler has any relevance here. |
I think the 100-1000x was comparing full upsample fir downsample to
polyphase, even if both are in C. There are examples where you'd do 1000x+
unnecessary calculations given the set of computations performed,
regardless of language. (Think about upsampling by 1000, FIR filtering,
then downsampling by 999. There are a lot of zero multiplies in the FIR and
discarded intermediate results in the decimation in there.) But I can run
some benchmarks if you still aren't convinced.
|
You don't have to convince me :-) I didn't think about all those multiply-by-zeros, but that is a good point. |
I thought a bit more about the algorithmic differences. The direct approach of (upsample by factor of P with zero insertion, FIR filter of length Nh, downsample by factor of Q) is O(Nh*Q) per output sample. The efficient polyphase implementation is O(Nh/P). |
Cool, I'll add that to the docstring, and also see if I can simplify the
wrappers. It would be good for me to learn Cython C++ anyway.
|
Okay, I've made some progress first translating to simple Python code (with tests passing), now I can go about converting it to Cython (not pushed for now). I'm just going to do the real case first, then expand later. Thanks for the feedback folks, hopefully I can have something useful soon. |
@sturlamolden thanks for the tip, I rebased and now we're green. Ready for review/merge from my end. |
@endolith any interest in reviewing another filtering PR? This one hopefully is simpler than SOS with only +437 :) |
@Eric89GXL So many things, so little time. :D |
|
||
Parameters | ||
---------- | ||
h : array-like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always seen these as array_like
. I don't know why, but consistency is good.
Comments addressed. |
9548d4d
to
f70ccea
Compare
@Eric89GXL, if you could squash this down a little bit, I'll merge it. |
Pyflakes error |
@sturlamolden yes but it's in |
I don't think there is a pyflakes error in master, so you should probably rebase. |
@@ master #5186 diff @@
======================================
Files 234 235 +1
Stmts 43180 43211 +31
Branches 8161 8163 +2
Methods 0 0
======================================
+ Hit 33500 33531 +31
Partial 2602 2602
Missed 7078 7078
|
All better |
Yes, it looks ready to be merged now, I think. |
Thanks @Eric89GXL! |
Thanks for the reviews! Now to make a fast |
Nice work Eric and other contributors! A very worthwhile addition. |
PR to eventually add
upfirdn
polyphase filtering to scipy. Built off of BSD-licensedupfirdn
Python package:https://code.google.com/p/upfirdn/
Eventually this polyphase filtering approach be used to hopefully speed up resampling.
Opening WIP PR to get some feedback on how to
scipy
-ify this code. It will not run as is.The original code used (and still uses) SWIG to compile C++ to
.so
. I don't really see that done anywhere else inscipy
, so I'm assuming that's the wrong way to go. Should I change the code over to C instead? If so, right now it supports real and complex signals and filters -- should I make different functions for the four different combinations of inputs (RR, CR, RC, CC), or is there another approach I should follow (e.g., a nice example of "the right thing to do")? We could also just do RR for now, and leave the other cases for future work.Let me know if I should put this to the
scipy-dev
listserv instead.cc the original author @sirtom67, thanks for writing what appears to be very clear code!