Added optional pyFFTW backend #379
Conversation
Thanks for doing this! Would you mind posting some benchmark results with and without fftw? Additionally, the following things should be added
Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions.

librosa/core/spectrum.py, line 13 [r1] (raw file):
Should this side effect be optional? Or is there ever a situation in which you wouldn't enable fftw caching?

librosa/core/spectrum.py, line 17 [r1] (raw file):
I don't think we should have a warning here.

Comments from Reviewable
I just tried some benchmarks locally, and I'm not totally convinced. Maybe I'm doing something wrong? Using pyfftw/fftw out of conda, I get the following:

```
In [3]: y, sr = librosa.load(librosa.util.example_audio_file(), sr=None)

In [4]: %timeit librosa.stft(y)
10 loops, best of 3: 129 ms per loop

In [5]: import pyfftw.interfaces.scipy_fftpack as fft

In [6]: librosa.core.spectrum.fft = fft

In [7]: %timeit librosa.stft(y)
1 loop, best of 3: 543 ms per loop

In [9]: pyfftw.interfaces.cache.enable()

In [10]: %timeit librosa.stft(y)
1 loop, best of 3: 428 ms per loop

In [18]: import scipy.fftpack

In [19]: librosa.core.spectrum.fft = scipy.fftpack

In [20]: %timeit librosa.stft(y)
10 loops, best of 3: 129 ms per loop
```

Changing the frame length (and hop length) doesn't qualitatively change the run time.
Relevant: http://stackoverflow.com/questions/25527291/fastest-method-to-do-an-fft

If it's possible to revert to numpy.fft and remove scipy.fftpack, then it would be easy to transparently switch to a faster pyFFTW interface, with something like this:

```python
import multiprocessing

try:
    import pyfftw
    # a is the input array to be transformed
    fft = pyfftw.builders.fft(a, overwrite_input=True,
                              threads=multiprocessing.cpu_count(),
                              avoid_copy=True)
    ifft = pyfftw.builders.ifft(a, overwrite_input=True,
                                threads=multiprocessing.cpu_count(),
                                avoid_copy=True)
except ImportError:
    from numpy.fft import fft, ifft

b = fft()  # This works just like numpy.fft
```
Yes.
Yeah... Not awesome. Back to using the cache, for now. Let me think.
I think what makes FFTW slow in particular is that it needs to plan for a lot of different shapes. The frame chunking will differ a lot, even when the number of bins is fixed (which is probably pretty likely in most use cases), and the trimming at the end also causes an extra plan to occur. Would it be possible to maintain the same shape for all fft(...) calls in stft (and vice versa with ifft for istft), or is that frowned upon?
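This isn't librosa's actual framing code, just a numpy-only sketch of the idea (the function name is made up): zero-pad the signal so the last frame is full, then hand the FFT one fixed-shape batch, so a planning backend like FFTW would only ever see a single input shape per (n_fft, hop_length) pair.

```python
import numpy as np

def frame_fixed_shape(y, n_fft=2048, hop_length=512):
    """Zero-pad y so the last frame is full, then expose a strided
    (n_fft, n_frames) view: column j is the frame starting at j * hop_length."""
    n_frames = 1 + int(np.ceil(max(len(y) - n_fft, 0) / hop_length))
    pad = (n_frames - 1) * hop_length + n_fft - len(y)
    y = np.pad(y, (0, pad), mode='constant')
    # A single strided view instead of per-frame slicing: no copies here.
    return np.lib.stride_tricks.as_strided(
        y, shape=(n_fft, n_frames),
        strides=(y.itemsize, hop_length * y.itemsize))

y = np.random.randn(22050)
frames = frame_fixed_shape(y)
# One batched FFT call over a fixed shape: a planning backend plans once.
spec = np.fft.rfft(frames, axis=0)
```

The trailing partial frame is absorbed into the zero-padding, so no extra plan is triggered by end-of-signal trimming.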
Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks broke.

.travis_dependencies.sh, line 39 [r5] (raw file):
This would be better done through conda. You can also conda install fftw, so no need for apt.

librosa/core/spectrum.py, line 180 [r5] (raw file):
Does fftw not take an axis parameter?
I'm stuck on what would be a good way of using FFTW, actually.

For quick, interactive data analysis, we'd expect people to mix different FFT sizes, thus requiring a new FFTW plan on pretty much every call. However, when there's a lot of audio to be transformed, or even a long series of same-shaped transforms, planning once up front would pay off.

I'm reluctantly proposing an API change, in which stft/istft/fmt would behave just as before, but with a new function called "plan()" or something that users could call to warm up an FFTW object (preferably one for both forward/backward if possible, despite librosa currently using RFFT). Then in stft/istft/fmt, a check for whether a global FFTW object is set would determine whether that should be used, with some care taken to make sure that parameters match, of course. It would probably be thread-safe, but I need to double-check that with pyFFTW first.

This API would also support cuFFT, which could be sweet.

It seems there should be a better way of going about this, though. I'm still thinking. I'd love some feedback. @bmcfee, @hgomersall thoughts?
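A rough sketch of what that proposal could look like; the names (plan, run_fft, _PLAN) are hypothetical and not librosa API, and it falls back to numpy when pyFFTW is missing:

```python
import numpy as np

_PLAN = None  # (shape, fft_callable) or None

def plan(n_fft, n_frames):
    """Warm up a transform for a fixed (n_fft, n_frames) workload."""
    global _PLAN
    try:
        import pyfftw
        buf = pyfftw.empty_aligned((n_fft, n_frames), dtype='float64')
        # Planning happens once, here, instead of on every stft call.
        _PLAN = ((n_fft, n_frames), pyfftw.builders.rfft(buf, axis=0))
    except ImportError:
        # No pyFFTW: keep the same call signature with a numpy fallback.
        _PLAN = ((n_fft, n_frames), lambda a: np.fft.rfft(a, axis=0))

def run_fft(frames):
    """Use the planned transform only when the parameters match."""
    if _PLAN is not None and frames.shape == _PLAN[0]:
        return _PLAN[1](frames)
    return np.fft.rfft(frames, axis=0)

plan(2048, 10)
out = run_fft(np.random.randn(2048, 10))
```

The parameter check is what keeps interactive use safe: mismatched shapes silently take the unplanned path rather than erroring.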
@carlthome hey, apologies for the delay on this, I've been away for a bit. Yeah, this sounds like an issue. Do you have any idea how Matlab handles it?

Calls to execute FFTW objects should be thread-safe (assuming they don't clobber the same bits of memory), whether through `__call__` or `execute`.

It sounds a little like you're suggesting exactly what the interfaces caching does already.

As an aside (and apologies if you've noted this already or it's clear from the code, which I haven't read), it's possible to do an stft using stride tricks with pyFFTW, eliminating a copy - whether this speeds things up or not is to be determined.
How would that work with windowing? If I understand correctly, you'd either invoke a copy when applying the window in the time domain, or after the FFT by convolution.

@bmcfee Ah yes, good point - just ignore me :)
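To make the trade-off concrete, a numpy-only sketch (sliding_window_view needs NumPy >= 1.20): the framing itself can be a zero-copy view, but applying a time-domain window materializes a fresh array anyway, which is the copy being discussed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

y = np.random.randn(4096)
n_fft, hop = 1024, 256

# Framing by stride tricks is copy-free: `frames` is just a view into y.
frames = sliding_window_view(y, n_fft)[::hop].T

# ...but multiplying by the window allocates a brand-new array, so the
# copy happens here regardless of how cleverly the framing was done.
window = np.hanning(n_fft)[:, np.newaxis]
windowed = frames * window
```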
Reviewed 2 of 4 files at r7.

setup.py, line 47 [r7] (raw file):
I'd rather have the extras for this called `fftw`.

librosa/core/spectrum.py, line 6 [r7] (raw file):
Where do multiprocessing and functools get used here?
```diff
@@ -44,6 +44,6 @@
         'matplotlib >= 1.5'],
     'numba': ['numba >= 0.25'],
     'display': ['matplotlib >= 1.5'],
-    'stft': ['pyfftw']
+    'fftw': ['pyfftw => 0.10.4']
```
Hahaha, oops.
Reviewed 1 of 2 files at r8, 1 of 1 files at r9.
The code looks good now (thanks!), but I'd still like to see some benchmarks demonstrating the speedups before merging.
TODO on this one:
@carlthome I'd like to push out 0.5 at the end of this month. FFTW support would be nice to have, if you have the time to finish this one up.
Sorry, been very busy. Will get to this today!
I too fail to see any substantial performance improvement, even with longer audio files. I believe my original performance speedup had more to do with the GIL rather than FFTW itself:

```python
import librosa as lr
import numpy as np
import pyfftw
import pyfftw.interfaces.scipy_fftpack as fft
import scipy.fftpack

short_audio, sr = lr.load(lr.util.example_audio_file())
long_audio = np.tile(short_audio, 100)
print("Short audio file: {:.0f} seconds\nLong audio file: {:.0f} seconds"
      .format(len(short_audio) / sr, len(long_audio) / sr))

lr.core.spectrum.fft = fft
pyfftw.interfaces.cache.enable()
%timeit lr.stft(short_audio)
%timeit lr.stft(long_audio)

lr.core.spectrum.fft = scipy.fftpack
%timeit lr.stft(short_audio)
%timeit lr.stft(long_audio)
```
However, I noticed in SciPy's documentation that they offer more optimization if one calls
Perhaps that's better than nothing. I propose closing this pull request and rethinking how to use FFTW via Python in the future. It requires a bit more planning than anticipated (pun intended). Agreed?
Agreed. Thanks for all your effort on this!
I'm trying to understand why FFTW via pyFFTW is slower than scipy.fftpack (especially as SciPy's own docs mention pyFFTW for when performance is critical). Working with PyFFTW master, I profiled:

- librosa.stft with scipy.fftpack
- librosa.stft with PyFFTW's scipy_fftpack interface

@hgomersall, do you have some insight into why pyFFTW is slower than scipy.fftpack in librosa.core.spectrum? It looks like the SciPy interface delegates to the NumPy interface - could that cause extra overhead? Why is this copy necessary? I bet some silly ndarray.flags.writeable is false or something.
Not really; the best you can do is move the copy elsewhere. It's necessary here to allow for windowing. If you don't copy here, you need a separate copy / convolution in the frequency domain to achieve the same effect.
A couple of things come to mind.

The copy is a little strange - what's the transform you're trying to do? Most transforms don't copy by default, but the inverse real transforms with dimensions >= 2 do. This is because FFTW will clobber the arrays in those cases, so the only safe approach is to copy (this can be undone with the overwrite_input argument).

It seems to be spending rather too much time in the wrapper code. The interfaces stuff adds quite an overhead to the FFTW call. I suspect the fftpack call is a very thin shim around some C (I'm aware quite a bit of work has been done on speeding it up). There is probably something to be said for speeding up the interfaces code if there is appetite for that... 😉
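One way to sidestep that per-call interfaces overhead is to build the FFTW object once via pyfftw.builders and reuse it. A sketch under that assumption (not librosa code; it returns None when pyFFTW isn't installed):

```python
import numpy as np

def planned_irfft_demo():
    """Plan an inverse real FFT once and reuse the object.
    Returns the output shape, or None if pyFFTW is unavailable."""
    try:
        import pyfftw
    except ImportError:
        return None
    a = pyfftw.empty_aligned((1025, 400), dtype='complex128')
    # Planning happens here, once - not on every call like the
    # drop-in interfaces' setup path.
    irfft_obj = pyfftw.builders.irfft(a, axis=0)
    a[:] = np.random.randn(1025, 400) + 1j * np.random.randn(1025, 400)
    # Note: inverse real transforms with ndim >= 2 may clobber `a`,
    # which is exactly why the drop-in interfaces copy by default.
    return irfft_obj().shape

shape = planned_irfft_demo()
```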
As proposed in #353, scipy.fftpack is a terrible bottleneck, and supporting FFTW would be immensely useful. This pull request adds an optional wrapper for FFTW: if pyFFTW is installed, librosa will prefer it over scipy.fftpack. Everything will work as normal if the user is missing pyFFTW.
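The optional-backend pattern described here boils down to a guarded import; a minimal sketch (the actual attribute names inside librosa.core.spectrum may differ):

```python
import numpy as np

try:
    import pyfftw
    import pyfftw.interfaces.scipy_fftpack as fft
    pyfftw.interfaces.cache.enable()  # reuse FFTW plans across calls
except ImportError:
    import scipy.fftpack as fft  # graceful fallback, same call signature

# Downstream code uses the `fft` module name without caring which
# backend actually got imported.
spectrum = fft.fft(np.ones(8))
```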
@bmcfee, thoughts? The speedup is substantial, and as librosa depends on STFT all over the place (even for rmse) I feel this is a necessary addition, and a lot nicer than having users monkey patch outside of librosa.