Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added optional pyFFTW backend #379

Closed
wants to merge 15 commits into from
Closed

Added optional pyFFTW backend #379

wants to merge 15 commits into from

Conversation

carlthome
Copy link
Contributor

@carlthome carlthome commented Jun 28, 2016

As proposed in #353 scipy.fftpack is a terrible bottleneck and supporting FFTW would be immensely useful. This pull request adds an optional wrapper for FFTW. If pyFFTW is installed librosa will prefer that over scipy.fftpack. Everything will work as normal if the user is missing pyFFTW.

@bmcfee, thoughts? The speedup is substantial, and as librosa depends on STFT all over the place (even for rmse) I feel this is a necessary addition, and a lot nicer than having users monkey patch outside of librosa.


This change is Reviewable

@bmcfee
Copy link
Member

bmcfee commented Jun 28, 2016

Thanks for doing this!

Would you mind posting some benchmark results with and without fftw?

Additionally, the following things should be added

  • extras_require dependency for pyfftw
  • build matrix entry for pyfftw so we get test coverage with and without it. Check the dtw PR for an example of how to modify travis configs to achieve this.

Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions.


librosa/core/spectrum.py, line 13 [r1] (raw file):

try:
    import pyfftw
    pyfftw.interfaces.cache.enable()

Should this side effect be optional? Or is there ever a situation in which you wouldn't enable fftw caching?


librosa/core/spectrum.py, line 17 [r1] (raw file):

except ImportError:
    import scipy.fftpack as fft
    warnings.warn('Install pyFFTW for faster spectral transforms.')

I don't think we should have a warning here.


Comments from Reviewable

@bmcfee
Copy link
Member

bmcfee commented Jun 28, 2016

I just tried some benchmarks locally, and i'm not totally convinced. Maybe I'm doing something wrong?

Using pyfftw/fftw out of conda, I get the following:

In [3]: y, sr = librosa.load(librosa.util.example_audio_file(), sr=None)

In [4]: %timeit librosa.stft(y)
10 loops, best of 3: 129 ms per loop

In [5]: import pyfftw.interfaces.scipy_fftpack as fft

In [6]: librosa.core.spectrum.fft = fft

In [7]: %timeit librosa.stft(y)
1 loop, best of 3: 543 ms per loop

In [9]: pyfftw.interfaces.cache.enable()

In [10]: %timeit librosa.stft(y)
1 loop, best of 3: 428 ms per loop

In [18]: import scipy.fftpack

In [19]: librosa.core.spectrum.fft = scipy.fftpack

In [20]: %timeit librosa.stft(y)
10 loops, best of 3: 129 ms per loop

Changing the frame length (and hop length) doesn't qualitatively change the run time.

@bmcfee bmcfee added the enhancement Does this improve existing functionality? label Jun 28, 2016
@bmcfee bmcfee self-assigned this Jun 28, 2016
@carlthome
Copy link
Contributor Author

librosa.util.example_audio_file() is very short and there's a lot of overhead involved (FFTW planning). I've only tested on large runs of a couple of hours of audio material, in which the overhead wasn't noticeable.

Relevant: http://stackoverflow.com/questions/25527291/fastest-method-to-do-an-fft

If it's possible to revert to numpy.fft and remove scipy.fftpack then it would be easy to transparently switch to a faster pyFFTW interface, with something like this:

import multiprocessing
try:
    from pyfftw.builders import fft, ifft
    fft = pyfftw.builders.fft(a, overwrite_input=True, threads=multiprocessing.cpu_count(), avoid_copy=True)
    ifft = pyfftw.builders.ifft(a, overwrite_input=True, threads=multiprocessing.cpu_count(), avoid_copy=True)
except ImportError:
    from numpy.fft import fft, ifft

b = fft() # This works just like numpy.fft

Is scipy.fftpack needed?

@carlthome
Copy link
Contributor Author

Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions.


librosa/core/spectrum.py, line 13 [r1] (raw file):

Previously, bmcfee (Brian McFee) wrote…

Should this side effect be optional? Or is there ever a situation in which you wouldn't enable fftw caching?

It should always be enabled. However, perhaps using the [builders](http://pyfftw.github.io/pyFFTW/pyfftw/builders/builders.html) would make more sense.

librosa/core/spectrum.py, line 17 [r1] (raw file):

Previously, bmcfee (Brian McFee) wrote…

I don't think we should have a warning here.

Removed.

Comments from Reviewable

@bmcfee
Copy link
Member

bmcfee commented Jun 28, 2016

Yes, scipy.fftpack is generally preferred over numpy's fft module, and I'd prefer to not revert.


Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


librosa/core/spectrum.py, line 13 [r1] (raw file):

Previously, carlthome (Carl Thomé) wrote…

It should always be enabled. However, perhaps using the builders would make more sense.

Is it possible to use the builders with the scipy_fftpack interface?

More generally, I don't like the idea of hard-coding parameters to pyfftw inside librosa, since this makes it difficult (impossible?) for users to modify the configuration.


Comments from Reviewable

@carlthome
Copy link
Contributor Author

More generally, I don't like the idea of hard-coding parameters to pyfftw inside librosa, since this makes it difficult (impossible?) for users to modify the configuration.

Yeah... Not awesome. Back to using the cache, for now. Let me think.

@carlthome
Copy link
Contributor Author

I think in particular what makes FFTW slow is that it needs to plan for a lot of different shapes. The frame chunking will differ a lot, even when the number of bins are fixed (which is probably pretty likely in most use cases) and the trimming at the end also causes an extra plan to occur.

Would it be possible to maintain the same shape for all fft(...) calls in stft (and vice versa with ifft for istft) or is that frowned upon?

@bmcfee
Copy link
Member

bmcfee commented Jun 29, 2016

Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks broke.


.travis_dependencies.sh, line 39 [r5] (raw file):

        if [ "$ENABLE_FFTW" = true ]; then
            pip install pyfftw

This would be better done through conda. You can also conda install fftw, so no need for apt.


librosa/core/spectrum.py, line 180 [r5] (raw file):

        x = np.array(fft_window * y_frames, dtype='float32')
        y = np.empty((x.shape[0]//2 + 1, x.shape[1]), dtype=dtype)
        pyfftw.FFTW(x.T, y.T)()

does fftw not take an axis parameter?


Comments from Reviewable

@carlthome
Copy link
Contributor Author

Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks broke.


.travis_dependencies.sh, line 39 [r5] (raw file):

Previously, bmcfee (Brian McFee) wrote…

This would be better done through conda. You can also conda install fftw, so no need for apt.

Hmm, are you sure? I get this:
conda install fftw
Error: Package missing in current linux-64 channels: 
  - fftw

conda install pyfftw
Error: Package missing in current linux-64 channels: 
  - pyfftw

librosa/core/spectrum.py, line 180 [r5] (raw file):

Previously, bmcfee (Brian McFee) wrote…

does fftw not take an axis parameter?

It does, but I want to follow the existing code as much as possible at the moment, to easier catch where code duplication can be avoided.

Comments from Reviewable

@bmcfee
Copy link
Member

bmcfee commented Jun 29, 2016

Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks broke.


.travis_dependencies.sh, line 39 [r5] (raw file):

Previously, carlthome (Carl Thomé) wrote…

Hmm, are you sure? I get this:

conda install fftw
Error: Package missing in current linux-64 channels: 
  - fftw

conda install pyfftw
Error: Package missing in current linux-64 channels: 
  - pyfftw
O_o it works for me?
 →  conda search fftw
Fetching package metadata: ..........
fftw                         3.3.4                         0  https://conda.binstar.org/conda-forge/linux-64/ 
                             3.3.4                         1  https://conda.binstar.org/conda-forge/linux-64/ 
                             3.3.4                         2  https://conda.binstar.org/conda-forge/linux-64/ 
                          *  3.3.4                         3  https://conda.binstar.org/conda-forge/linux-64/ 
pyfftw                       0.10.1               np19py35_0  https://conda.binstar.org/conda-forge/linux-64/ 
                             0.10.1               np19py34_0  https://conda.binstar.org/conda-forge/linux-64/ 
                             0.10.1               np19py27_0  https://conda.binstar.org/conda-forge/linux-64/ 
...

Comments from Reviewable

@carlthome
Copy link
Contributor Author

Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks broke.


.travis_dependencies.sh, line 39 [r5] (raw file):

Previously, bmcfee (Brian McFee) wrote…

O_o it works for me?

 →  conda search fftw
Fetching package metadata: ..........
fftw                         3.3.4                         0  https://conda.binstar.org/conda-forge/linux-64/ 
                             3.3.4                         1  https://conda.binstar.org/conda-forge/linux-64/ 
                             3.3.4                         2  https://conda.binstar.org/conda-forge/linux-64/ 
                          *  3.3.4                         3  https://conda.binstar.org/conda-forge/linux-64/ 
pyfftw                       0.10.1               np19py35_0  https://conda.binstar.org/conda-forge/linux-64/ 
                             0.10.1               np19py34_0  https://conda.binstar.org/conda-forge/linux-64/ 
                             0.10.1               np19py27_0  https://conda.binstar.org/conda-forge/linux-64/ 
...
Ah, but you're using conda-forge! Is Travis using that channel too?

Comments from Reviewable

@bmcfee
Copy link
Member

bmcfee commented Jun 29, 2016

Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks broke.


.travis_dependencies.sh, line 39 [r5] (raw file):

Previously, carlthome (Carl Thomé) wrote…

Ah, but you're using conda-forge! Is Travis using that channel too?

Apparently not, but we could (should?) add it to the travis config.

Comments from Reviewable

@carlthome
Copy link
Contributor Author

Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions.


.travis_dependencies.sh, line 39 [r5] (raw file):

Previously, bmcfee (Brian McFee) wrote…

Apparently not, but we could (should?) add it to the travis config.

Whatever floats your boat. 👍
  1. In that case, I also propose to install ffmpeg (currently apt in .travis.yml) and python-coveralls (currently in .travis_dependencies.sh with pip) with conda just to be consistent.
  2. I'd also consider flattening the shell script and inlining create_conda() because it confused me and I don't like feeling stupid. 😕

Comments from Reviewable

@carlthome
Copy link
Contributor Author

Review status: 0 of 4 files reviewed at latest revision, 3 unresolved discussions.


.travis_dependencies.sh, line 39 [r5] (raw file):

Previously, carlthome (Carl Thomé) wrote…

Whatever floats your boat. 👍

  1. In that case, I also propose to install ffmpeg (currently apt in .travis.yml) and python-coveralls (currently in .travis_dependencies.sh with pip) with conda just to be consistent.
  2. I'd also consider flattening the shell script and inlining create_conda() because it confused me and I don't like feeling stupid. 😕
We could sneak this into this pull request, or if you'd like: just do a commit directly on master.

Comments from Reviewable

@carlthome
Copy link
Contributor Author

carlthome commented Jun 30, 2016

I'm stuck on what would be a good way of using FFTW actually. For quick, interactive data analysis and stuff, we'd expect people to mix different FFTs, thus requiring a new FFTW plan on pretty much every librosa.stft() call. The first call will always be slow (even with FFTW_ESTIMATE) and if there's ever only going to be a single call, in those cases perhaps it would make sense to rely on scipy.fftpack.

However, when there's a lot of audio to be transformed, or even a series of librosa.stft() calls with the same n_fft, hop, window size and so on (like in my case for machine learning and transforming entire datasets, out-of-core) then it makes more sense to rely on a shared FFTW object.

I'm reluctantly proposing an API change, in which stft/istft/fmt would be just as before, but with a new function called "plan()" or something that users could call to warmup a FFTW object (preferably one for both forward/backwards if possible, despite librosa currently using RFFT). Then in stft/istft/fmt a check for whether a global FFTW object is set would determine if that should be used or not, with some care taken to make sure that parameters match and stuff too, of course. It would probably be thread-safe but I need to double check that with pyFFTW first. This API would also support cuFFT, which could be sweet.

It seems there should be a better way of going about this though. I'm still thinking. I'd love some feedback. @bmcfee, @hgomersall thoughts?

@hgomersall
Copy link

hgomersall commented Jul 10, 2016

@carlthome hey, apologies for the delay on this, I've been away for a bit. Yeah, this sounds like an issue. Do you have any idea how Matlab handles it?

Calls to execute FFTW objects should be thread safe (assuming they don't clobber the same bits of memory) - either through the __call__ interface or through execute() explicitly, though I've little experience of it directly myself. If you find anything problematic I'd be keen to know about it. FFTW itself is thread safe for execution and pyFFTW doesn't complicate that aspect.

It sounds a little like you're suggesting exactly what pyfftw.interface does, albeit with optimisations for librosa (including the cache aspect).

As an aside (and apologies if you've noted with already or is clear from the code which I haven't read) it's possible to do an stft using stride tricks with pyFFTW, eliminating a copy - whether this speeds things up or not is to be determined.

@bmcfee
Copy link
Member

bmcfee commented Jul 10, 2016

As an aside (and apologies if you've noted with already or is clear from the code which I haven't read) it's possible to do an stft using stride tricks with pyFFTW, eliminating a copy - whether this speeds things up or not is to be determined.

How would that work with windowing? If I understand correctly, you'd either invoke a copy when applying the window in the time domain, or after the FFT by convolution.

@hgomersall
Copy link

@bmcfee Ah yes, good point - just ignore me :)

@carlthome carlthome closed this Aug 31, 2016
@carlthome carlthome reopened this Aug 31, 2016
@bmcfee
Copy link
Member

bmcfee commented Aug 31, 2016

Reviewed 2 of 4 files at r7.
Review status: 2 of 4 files reviewed at latest revision, 2 unresolved discussions.


setup.py, line 47 [r7] (raw file):

        'numba': ['numba >= 0.25'],
        'display': ['matplotlib >= 1.5'],
        'stft': ['pyfftw']

I'd rather have the extras for this called fftw, since stft will work either way. This would be consistent with the numba extra.


librosa/core/spectrum.py, line 6 [r7] (raw file):

import multiprocessing
import functools

where do multiprocessing and functools get used here?


Comments from Reviewable

@@ -44,6 +44,6 @@
'matplotlib >= 1.5'],
'numba': ['numba >= 0.25'],
'display': ['matplotlib >= 1.5'],
'stft': ['pyfftw']
'fftw': ['pyfftw => 0.10.4']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahaha, oops.

@carlthome
Copy link
Contributor Author

Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions.


setup.py, line 47 [r7] (raw file):

Previously, bmcfee (Brian McFee) wrote…

I'd rather have the extras for this called fftw, since stft will work either way. This would be consistent with the numba extra.

Done.

librosa/core/spectrum.py, line 6 [r7] (raw file):

Previously, bmcfee (Brian McFee) wrote…

where do multiprocessing and functools get used here?

Done.

Comments from Reviewable

@bmcfee
Copy link
Member

bmcfee commented Sep 6, 2016

Reviewed 1 of 2 files at r8, 1 of 1 files at r9.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

@bmcfee
Copy link
Member

bmcfee commented Sep 6, 2016

The code looks good now (thanks!), but I'd still like to see some benchmarks demonstrating the speedups before merging.

@bmcfee
Copy link
Member

bmcfee commented Dec 16, 2016

TODO on this one:

  • rebase from master to resolve a conflict on core/spectrum.py
  • report some benchmarks

@bmcfee
Copy link
Member

bmcfee commented Jan 5, 2017

@carlthome I'd like to push out 0.5 at the end of this month. FFTW support would be nice to have, if you have the time to finish this one up.

@carlthome
Copy link
Contributor Author

Sorry, been very busy. Will get to this today!

@carlthome
Copy link
Contributor Author

I too fail to see any substantial performance improvement, even with longer audio files. I believe my original performance speedup had more to do with the GIL rather (I was calling librosa.stft on multiple files asynchronously, which maybe got held up with fftpack but not the C code behind pyffftw, I guess. Or I'm just imagining things, the placebo is strong with this one... 🐨 ).

import librosa as lr
import numpy as np
import pyfftw
import pyfftw.interfaces.scipy_fftpack as fft
import scipy.fftpack

short_audio, sr = lr.load(lr.util.example_audio_file())
long_audio = np.tile(short_audio, 100)
print("Short audio file: {:.0f} seconds\nLong audio file: {:.0f} seconds"
      .format(len(short_audio) / sr, len(long_audio) / sr))

lr.core.spectrum.fft = fft
pyfftw.interfaces.cache.enable()
%timeit lr.stft(short_audio)
%timeit lr.stft(long_audio)

lr.core.spectrum.fft = scipy.fftpack
%timeit lr.stft(short_audio)
%timeit lr.stft(long_audio)

Short audio file: 61 seconds
Long audio file: 6146 seconds
10 loops, best of 100: 56.2 ms per loop
1 loop, best of 10: 7.01 s per loop
10 loops, best of 100: 42.9 ms per loop
1 loop, best of 10: 5.94 s per loop

However, I noticed in SciPy's documentation that they offer more optimization if one calls fftpack.rfft explicitly:

"If the data type of x is real, a “real FFT” algorithm is automatically used, which roughly halves the computation time. To increase efficiency a little further, use rfft, which does the same calculation, but only outputs half of the symmetrical spectrum. If the data is both real and symmetrical, the dct can again double the efficiency, by generating half of the spectrum from half of the signal."

Perhaps that's better than nothing. I propose closing this pull request and rethinking how to use FFTW via Python in the future. It requires a bit more planning than anticipated (pun intended). Agreed?

@bmcfee
Copy link
Member

bmcfee commented Jan 27, 2017

Perhaps that's better than nothing. I propose closing this pull request and rethinking how to use FFTW via Python in the future. It requires a bit more planning than anticipated (pun intended). Agreed?

Agreed. Thanks for all your effort on this!

@bmcfee bmcfee closed this Jan 27, 2017
@carlthome
Copy link
Contributor Author

carlthome commented Jan 5, 2018

I'm trying to get why FFTW via PyFFTW is slower than scipy.fftpack (especially as SciPy's own docs mention PyFFTW for when performance is critical).

Working with PyFFTW master (e.g. pip install git+http://github.com/pyfftw/pyfftw --upgrade) and profiling librosa.stft with the changes in this pull request leads me to believe that the performance hit comes from an extra copy being done per STFT frame. Could that be avoided?

librosa.stft with scipy.fftpack

         3565 function calls in 0.097 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      171    0.059    0.000    0.061    0.000 basic.py:184(fft)
        1    0.023    0.023    0.096    0.096 spectrum.py:34(stft)

librosa.stft with PyFFTW's scipy.fftpack interface

         6985 function calls in 0.189 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      171    0.062    0.000    0.142    0.001 _utils.py:50(_Xfftn)
      171    0.034    0.000    0.066    0.000 _utils.py:96(_Xfftn)
      344    0.031    0.000    0.031    0.000 {method 'copy' of 'numpy.ndarray' objects}
        1    0.030    0.030    0.187    0.187 spectrum.py:35(stft)

@hgomersall, do you have some insight into why PyFFTW is slower than scipy.fftpack in librosa.core.spectrum? It looks like the SciPy interface delegates to the NumPy interface, could that cause extra overhead? Why is this copy necessary? I bet some silly ndarray.flags.writeable is false or something.

@bmcfee
Copy link
Member

bmcfee commented Jan 5, 2018

with the changes in this pull request leads me to believe that the performance hit comes from an extra copy being done per STFT frame. Could that be avoided?

Not really; the best you can do is move the copy elsewhere. It's necessary here to allow for windowing. If you don't copy here, you need a separate copy / convolution in the frequency domain to achieve the same effect.

@hgomersall
Copy link

A couple of things come to mind.

The copy is a little strange for - what's the transform you're trying to do? Most transforms don't copy by default, but the inverse real transforms with dimensions >= 2 do. This is because FFTW will clobber the arrays in those cases, so the only safe approach is to copy (this can be undone with overwrite_input=True). The only other thing that should cause a copy is as you identified, the flags.writeable being True. I'd be interested to know if you're finding a copy still when you've ruled those bits out.

It seems to be spending rather too much time in _Xfftn. What is it doing in there? Do you have the cache turned on? It might be that the cache default time is too long given the other stuff that is happening?

The interfaces stuff adds quite an overhead to the FFTW call. I suspect the fftpack call is a very thin shim around some C (I'm aware quite a bit of work has been done on speeding it up). There is probably something to be said for speeding up the interfaces code if there is appetite for that... 😉

This was referenced Feb 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Does this improve existing functionality?
Development

Successfully merging this pull request may close these issues.

None yet

3 participants