numbafy #157

Closed
5 of 16 tasks
bmcfee opened this issue Jan 24, 2015 · 13 comments · Fixed by #1112
bmcfee commented Jan 24, 2015

Wherever it makes sense to accelerate loop code, we should jit-compile.

Specifically,

  • util.match_events
  • util.match_intervals
  • util.peak_pick
  • beat.__beat_track_dp
  • core.constantq.cqt internals
    • trim_stack
    • num_two_factors
  • core.spectrum.stft
  • core.spectrum.istft
  • decompose.nn_filter
  • feature.util.sync
  • feature.util.stack_memory
  • segment.lag_to_recurrence, segment.recurrence_to_lag
  • effects.remix
  • filters.mel_frequencies
  • filters.chroma
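For a concrete sense of what qualifies: a brute-force nearest-event matcher in the spirit of util.match_events (a simplified stand-in, not librosa's actual implementation) is exactly the kind of tight nested loop that jit compilation rewards:

```python
import numpy as np

def match_events_loop(events_from, events_to):
    # For each value in events_from, find the index of the closest
    # value in events_to.  A doubly-nested scalar loop like this is
    # slow in pure Python but trivially compilable with
    # numba.jit(nopython=True) once numba is in the picture.
    out = np.empty(len(events_from), dtype=np.int64)
    for i in range(len(events_from)):
        best = 0
        best_dist = abs(events_from[i] - events_to[0])
        for j in range(1, len(events_to)):
            dist = abs(events_from[i] - events_to[j])
            if dist < best_dist:
                best_dist = dist
                best = j
        out[i] = best
    return out
```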
@bmcfee bmcfee self-assigned this Jan 24, 2015
@bmcfee bmcfee added this to the 0.5 milestone Jan 24, 2015

bmcfee commented Jul 7, 2015

I took a quick cut at cythonizing stft today.

A naive, frame-oriented cython implementation gets almost as fast as our current batched (pure-python) implementation.

@bmcfee bmcfee changed the title from cythonize to numbafy Jul 27, 2016

bmcfee commented Jul 27, 2016

Updating: now that #323 is merged, we have optional numba jit decoration. This will be considerably easier to deploy than cythonization.
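The optional-decoration pattern might look roughly like this (frame_energy is a hypothetical example function, not one of the targets above): if numba is missing, the decorator degrades to a no-op and the function runs as plain Python.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Fallback: mimic numba.jit's decorator-factory form as a no-op,
    # so decorated functions still run as ordinary Python.
    def jit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

@jit(nopython=True)
def frame_energy(x, frame_length, hop_length):
    # Sum-of-squares energy per frame, written as explicit loops so
    # that numba's nopython mode can compile it when available.
    n_frames = 1 + (len(x) - frame_length) // hop_length
    out = np.zeros(n_frames)
    for t in range(n_frames):
        start = t * hop_length
        for i in range(start, start + frame_length):
            out[t] += x[i] * x[i]
    return out
```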


bmcfee commented Sep 30, 2016

Tagging @ejhumphrey in case he wants to take a look at this.

@bmcfee bmcfee added the enhancement Does this improve existing functionality? label Sep 30, 2016

bmcfee commented Oct 7, 2016

Maybe as a subtask, it would help to implement a profiling test suite so we know where the hot spots are.

stefan-balke commented

Like that one? https://pypi.python.org/pypi/pytest-profiling

This is the output of py.test test_core.py --profile:

Profiling (from prof/combined.prof):
Sat Oct  8 17:08:24 2016    prof/combined.prof

         4845410 function calls (4844615 primitive calls) in 127.496 seconds

   Ordered by: cumulative time
   List reduced from 272 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2120    0.097    0.000  127.498    0.060 {built-in method builtins.exec}
     2120    0.025    0.000  127.395    0.060 pluggy.py:586(execute)
     2120    0.208    0.000  127.367    0.060 python.py:277(pytest_pyfunc_call)
     1228    0.092    0.000  101.133    0.082 test_core.py:761(__test)
     2681    0.316    0.000  100.665    0.038 spectrum.py:770(fmt)
     2465    0.338    0.000   98.920    0.040 interpolate.py:408(__init__)
     2458    1.273    0.001   98.318    0.040 interpolate.py:2312(splmake)
     1704    1.959    0.001   96.600    0.057 interpolate.py:2036(_find_smoothest)
     1704   72.881    0.043   74.928    0.044 decomp_svd.py:15(svd)
     2120    0.004    0.000   26.903    0.013 <string>:1(<module>)
      660    0.228    0.000   25.895    0.039 test_core.py:296(__test)
    13632   17.909    0.001   17.909    0.001 {built-in method numpy.core.multiarray.dot}
      660    7.432    0.011   17.654    0.027 spectrum.py:177(istft)
   350929    0.954    0.000    7.993    0.000 basic.py:279(ifft)
      661    1.576    0.002    7.065    0.011 spectrum.py:25(stft)
   350929    5.882    0.000    6.320    0.000 basic.py:78(_fake_cfft)
     8864    3.970    0.000    4.086    0.000 basic.py:176(fft)
   357335    1.490    0.000    1.490    0.000 {method 'conj' of 'numpy.ndarray' objects}
     8130    0.166    0.000    1.188    0.000 _util.py:141(_asarray_validated)
     3137    0.023    0.000    1.127    0.000 numeric.py:2310(allclose)

Next step could be to run pstats for visualizations.
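The pstats step is straightforward; a self-contained sketch (profiling a toy function in memory instead of loading prof/combined.prof) could look like:

```python
import cProfile
import io
import pstats

def hotspot():
    # Stand-in for an expensive librosa call.
    return sum(i * i for i in range(10000))

prof = cProfile.Profile()
prof.enable()
hotspot()
prof.disable()

# Sort by cumulative time and print the top entries, exactly as one
# would after pstats.Stats('prof/combined.prof').
stream = io.StringIO()
pstats.Stats(prof, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```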


bmcfee commented Oct 8, 2016

> Like that one? https://pypi.python.org/pypi/pytest-profiling

Indeed! But doing that properly probably requires waiting until after #391 happens.


bmcfee commented Dec 3, 2016

I did some tinkering on this.

No benefit to jitting sync, since the callouts to aggregate break any benefits. Probably the same applies to nn_filter and the like.

Other functions become difficult to jit because of list comprehensions, so optimization will have to follow some slight refactoring.
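The refactoring in question is mostly mechanical: replace a list comprehension with a preallocated array and an explicit loop. A hypothetical before/after pair (frame-wise RMS, not one of the actual librosa functions):

```python
import numpy as np

def frame_rms_comprehension(x, frame_length, hop_length):
    # Comprehension version: builds a Python list of intermediate
    # values, which numba's nopython mode cannot handle well.
    return np.array([
        np.sqrt(np.mean(x[i:i + frame_length] ** 2))
        for i in range(0, len(x) - frame_length + 1, hop_length)
    ])

def frame_rms_loop(x, frame_length, hop_length):
    # Equivalent loop version: preallocated output and a scalar inner
    # loop, directly compilable with numba.jit(nopython=True).
    n = 1 + (len(x) - frame_length) // hop_length
    out = np.empty(n)
    for t in range(n):
        start = t * hop_length
        acc = 0.0
        for i in range(start, start + frame_length):
            acc += x[i] * x[i]
        out[t] = np.sqrt(acc / frame_length)
    return out
```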


bmcfee commented Dec 12, 2016

Since 0.5 is already getting pretty bloated, I'm going to put this one off to 0.6.

@bmcfee bmcfee removed this from the 0.5 milestone Dec 12, 2016

bmcfee commented Jul 14, 2017

More updates: numba now has pip wheels, so we can make it a hard dependency going forward. This should simplify a few things.


bmcfee commented Jul 15, 2017

Update from the scipy2017 sprint:

  • numba (llvmlite) wheels are WIP, but should be usable by August '17
  • many of the librosa ops we want to optimize are not quite numba-friendly, for a few reasons:
    • use of axis= arguments
    • use of fancy indexing (tuple of slices)
    • use of ufuncs as parameters (e.g., aggregate)
    • use of sparse matrices

OTOH, things like match_events and match_intervals could be accelerated.


bmcfee commented Sep 18, 2017

Updates:

  • llvmlite wheels are now shipping
  • resampy 0.2.0 has shipped, and puts numba in librosa's dependency chain


bmcfee commented Jun 8, 2018

> No benefit to jitting sync, since the callouts to aggregate break any benefits. Probably the same applies to nn_filter and the like.

A possible workaround to this is given in the numba faq, which suggests using a jitted closure to compile a separate backend function that depends on aggregate. This will have a bit of one-time overhead cost, but the jit cost is probably less than the interpreter cost that we're currently invoking anyway (for long signals).
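The closure trick from the numba FAQ, sketched with a hypothetical make_sync factory (librosa's real sync does more; the fallback decorator keeps the sketch runnable without numba):

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    def jit(*args, **kwargs):  # no-op fallback when numba is absent
        def decorator(func):
            return func
        return decorator

def make_sync(aggregate):
    # Compile a backend specialized to one aggregate function.  The
    # compile cost is paid once per aggregate; subsequent calls run
    # at compiled speed.
    @jit(nopython=True)
    def _sync(data, boundaries):
        n = len(boundaries) - 1
        out = np.empty(n)
        for k in range(n):
            out[k] = aggregate(data[boundaries[k]:boundaries[k + 1]])
        return out
    return _sync

sync_mean = make_sync(np.mean)
```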

@bmcfee bmcfee added this to the 0.8.0 milestone Jan 23, 2020
nartes commented Apr 17, 2020

An example of accelerated librosa.effects.trim + librosa.effects.remix to achieve something
akin to noise truncation in Audacity:

import numpy
import cython
import librosa.effects

def remix(
    d,
    frame_length,
    hop_length,
    top_db,
    margin,
):
    '''
        Truncate silence in a mono float32 waveform; the cut points
        are aligned to zero crossings.
        hop_length, frame_length, top_db: as in librosa.effects.trim
        margin: number of zero samples between consecutive intervals
    '''
    i = librosa.effects.split(
        d,
        top_db=top_db,
        frame_length=frame_length,
        hop_length=hop_length,
    )

    assert isinstance(margin, int) and margin > 0

    # Extend each interval boundary outward to the nearest zero
    # crossing, so cuts land on sign changes rather than mid-oscillation.
    i2 = numpy.empty_like(i)
    cython.inline(
        '''
            cdef long t1
            cdef long t2
            cdef long t3
            cdef long t4

            t4 = d.shape[0]

            for t1 in range(i.shape[0]):
                t2 = i[t1, 0]
                t3 = t2
                while t3 > 0 and d[t3] * d[t2] >= 0:
                    t3 -= 1
                i2[t1, 0] = t3

                t2 = i[t1, 1]
                t3 = t2
                while t3 < t4 and d[t3] * d[t2] >= 0:
                    t3 += 1
                i2[t1, 1] = t3
        ''',
        d=d,
        i=i,
        i2=i2,
    )
    o = (i2[:, 1] - i2[:, 0])
    d2 = numpy.zeros(
        o.sum() + 1 + (o.shape[0] - 1) * margin,
        dtype=d.dtype
    )
    # Copy the extended intervals into d2, separated by `margin` zero samples.
    cython.inline(
        '''
            cdef long t1
            cdef long t2
            cdef long t3
            cdef long t4

            t3 = 0
            t4 = i2.shape[0]
            for t1 in range(i2.shape[0]):
                t2 = i2[t1, 1] - i2[t1, 0]
                d2[t3 :][:t2] = d[
                    i2[t1, 0] : i2[t1, 1]
                ]
                if t1 > 0:
                    d2[t3] = 0
                if t1 < t4:
                    d2[t3 + t2] = 0
                t3 += t2
                if t1 < t4:
                    t3 += margin
        ''',
        d=d,
        d2=d2,
        margin=margin,
        i2=i2,
    )
    return d2

@bmcfee bmcfee linked a pull request May 6, 2020 that will close this issue
4 participants