
CPP Implementation of lfilter for CPU #290

Closed
wants to merge 19 commits into from


@engineerchuan (Contributor) commented Sep 18, 2019

To compare against the current Python implementation of lfilter (implementation 0), we developed two C++ implementations:
1.) an element-wise implementation using accessors (implementation 1)
2.) a matrix-based implementation (implementation 2)
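The recursion that an element-wise implementation evaluates sample by sample can be sketched in pure Python as below. This is an illustrative sketch, not the PR's C++ code (which works on tensors through accessors). The name `lfilter_elementwise` is hypothetical, and the sketch assumes `a[0] != 0` and zero initial conditions.

```python
from typing import List, Sequence


def lfilter_elementwise(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Hypothetical direct-form I IIR filter, computed one output sample at a time.

    y[n] = (sum_k b[k] * x[n - k] - sum_{k>=1} a[k] * y[n - k]) / a[0],
    with samples before n = 0 treated as zero (zero initial conditions).
    """
    if len(a) == 0 or a[0] == 0:
        raise ValueError("a[0] must be non-zero")
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward (numerator) terms over past inputs.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback (denominator) terms over previously computed outputs.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y
```

For example, `lfilter_elementwise([1.0, 0.0, 0.0, 0.0], [0.5], [1.0, -0.5])` returns the decaying impulse response `[0.5, 0.25, 0.125, 0.0625]`. Because each output depends on earlier outputs, the outer loop is inherently sequential, which is one reason a tight native loop can pay off here.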

We tested these implementations against each other using random inputs across a variety of signal lengths and filter orders:

Performance

  1. The element-wise C++ implementation (implementation 1) is much faster than the current Python one: roughly 650x to 1300x for lower-order filters (orders 5 and 8) and about 420x to 440x for higher-order filters (order 18). This is an argument for including it in the codebase. In the future we should also support GPU.
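The "Ratio Python / CPP ElementWise" rows in the logs are quotients of two runtimes; for [2 x 8000] at order 5, 0.413754 s / 0.000630 s is about 657. A minimal sketch of such a measurement follows, assuming best-of-N wall-clock timing with `time.perf_counter`. The PR's actual benchmark harness is not shown in this thread, and `runtime_ratio` is a hypothetical helper.

```python
import time
from typing import Callable


def runtime_ratio(slow: Callable[[], object], fast: Callable[[], object], repeats: int = 3) -> float:
    """Hypothetical helper: best-of-`repeats` runtime of `slow` divided by that of `fast`."""

    def best(fn: Callable[[], object]) -> float:
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        # Taking the minimum reduces noise from other processes.
        return min(times)

    # Guard against a zero measurement for extremely fast callables.
    return best(slow) / max(best(fast), 1e-9)
```

Using the best of several runs rather than a single run makes ratios for very short runtimes (around a millisecond, as for the smallest inputs) less sensitive to scheduling noise.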

Correctness

  1. For all data sizes and filter orders, the two matrix-based implementations (0 and 2) return the same results within a tolerance of 3e-4.
  2. For higher-order filters and longer inputs, the element-wise implementation can diverge from the matrix-based ones. This is expected: floating-point round-off errors accumulate through the recursion, and because the matrix and element-wise methods do not perform their operations in the same order, their results should not be expected to match exactly. In general, the user is expected to supply stable filter coefficients.
  3. The implementation has also been tested with a second-order filter against SoX biquad results.
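Point 2 relies on the user supplying stable coefficients. For the second-order (biquad) case in point 3, stability can be checked cheaply. Below is a sketch using the standard stability-triangle conditions, with a direct root computation as a cross-check. This is a textbook result, not code from this PR, and the helper names are hypothetical.

```python
import cmath


def biquad_is_stable(a0: float, a1: float, a2: float) -> bool:
    """Hypothetical stability-triangle test for a biquad denominator.

    After normalizing by a0, the poles are the roots of z**2 + (a1/a0) z + (a2/a0).
    Both lie strictly inside the unit circle iff |a2/a0| < 1 and |a1/a0| < 1 + a2/a0.
    """
    if a0 == 0:
        raise ValueError("a0 must be non-zero")
    p, q = a1 / a0, a2 / a0
    return abs(q) < 1 and abs(p) < 1 + q


def max_pole_radius(a0: float, a1: float, a2: float) -> float:
    """Largest pole magnitude, computed directly with the quadratic formula."""
    p, q = a1 / a0, a2 / a0
    disc = cmath.sqrt(p * p - 4 * q)
    return max(abs((-p + disc) / 2), abs((-p - disc) / 2))
```

For example, `(a0, a1, a2) = (1.0, -1.8, 0.81)` has a double pole at 0.9 and passes the test, while `(1.0, -2.1, 1.1)` has poles at 1.0 and 1.1 and fails it.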

Questions / Comments

  1. Is there a more efficient way to "slice"? @cpuhrsch
  2. @yf225 Thank you for your suggestion on a previous PR to parallelize across channels. I think that will generally reduce time by at most 4x, because audio waveforms usually have 2 channels. So I don't think it would be meaningful in choosing between the matrix and element-wise implementations.
  3. @vincentqb Any thoughts?
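To make the parallelization question concrete: channels share no filter state, so per-channel work can be mapped over a worker pool, and the ideal speedup is bounded by min(number of channels, number of workers). The sketch below uses a hypothetical helper `filter_per_channel`. In CPython, threads only help if the per-channel kernel releases the GIL (as native code may), so this illustrates the structure rather than a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Channel = List[float]


def filter_per_channel(
    waveform: Sequence[Sequence[float]],
    filt: Callable[[Sequence[float]], Channel],
    max_workers: int = 2,
) -> List[Channel]:
    """Hypothetical helper: apply `filt` to each channel independently.

    Channels are independent, so they can run concurrently; the best-case
    speedup is at most min(len(waveform), max_workers).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input (channel) order in its results.
        return list(pool.map(filt, waveform))
```

With a stereo `(2, n_frames)` waveform, at most two channels can run at once, which bounds what this kind of parallelism can contribute.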

@engineerchuan (Contributor, Author)

Raw performance test results:

--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 8000], Filter Order: 5
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   0.413754 s
CPP Element Wise Runtime       :   0.000630 s
CPP Matrix Runtime             :   0.273378 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     657.10 x
✓ - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 80000], Filter Order: 5
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   5.420492 s
CPP Element Wise Runtime       :   0.006082 s
CPP Matrix Runtime             :   3.899572 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     891.26 x
✓ - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 800000], Filter Order: 5
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:  58.134758 s
CPP Element Wise Runtime       :   0.044133 s
CPP Matrix Runtime             :  34.690434 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :    1317.27 x
✓ - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 8000], Filter Order: 8
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   0.633596 s
CPP Element Wise Runtime       :   0.000647 s
CPP Matrix Runtime             :   0.390352 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     979.18 x
✓ - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 80000], Filter Order: 8
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   6.352927 s
CPP Element Wise Runtime       :   0.005172 s
CPP Matrix Runtime             :   3.909453 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :    1228.44 x
✓ - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 800000], Filter Order: 8
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:  63.547796 s
CPP Element Wise Runtime       :   0.050219 s
CPP Matrix Runtime             :  28.999176 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :    1265.42 x
✓ - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 8000], Filter Order: 18
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   0.416699 s
CPP Element Wise Runtime       :   0.000992 s
CPP Matrix Runtime             :   0.281268 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     420.03 x
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 80000], Filter Order: 18
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   4.140644 s
CPP Element Wise Runtime       :   0.009811 s
CPP Matrix Runtime             :   2.813545 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     422.05 x
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 800000], Filter Order: 18
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:  41.416457 s
CPP Element Wise Runtime       :   0.093863 s
CPP Matrix Runtime             :  28.061018 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     441.25 x
.
----------------------------------------------------------------------
Ran 1 test in 284.240s

@vincentqb (Contributor)

We talked about order 40. Is this too high to test? :)

@engineerchuan (Contributor, Author)

@vincentqb, here are the results for order 40 (the code is in the commit). The Python implementation is now about 200x slower than the C++ element-wise one, and the C++ matrix implementation is roughly 140x to 160x slower than the element-wise one.

--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 8000], Filter Order: 40
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   0.417130 s
CPP Element Wise Runtime       :   0.002043 s
CPP Matrix Runtime             :   0.292215 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     204.20 x
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 80000], Filter Order: 40
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   4.260019 s
CPP Element Wise Runtime       :   0.019245 s
CPP Matrix Runtime             :   2.902780 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     221.35 x
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 800000], Filter Order: 40
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:  41.547153 s
CPP Element Wise Runtime       :   0.183620 s
CPP Matrix Runtime             :  29.410577 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     226.27 x
.
----------------------------------------------------------------------
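The matrix versus element-wise gap at order 40 can be read directly off the runtimes in the log above. This short calculation simply divides the reported numbers; no new measurements are involved.

```python
# Runtimes in seconds, copied from the order-40 log above:
# (n_frames, C++ element-wise runtime, C++ matrix runtime)
order_40 = [
    (8000, 0.002043, 0.292215),
    (80000, 0.019245, 2.902780),
    (800000, 0.183620, 29.410577),
]

# How many times slower the C++ matrix implementation is than the element-wise one.
ratios = [matrix_s / elementwise_s for _, elementwise_s, matrix_s in order_40]

for (n_frames, _, _), ratio in zip(order_40, ratios):
    print(f"[2 x {n_frames}] matrix / element-wise: {ratio:.1f}x")
```

This gives roughly 143x, 151x and 160x for the three input lengths.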

@engineerchuan (Contributor, Author)

Hi,

I refactored the C++ implementations to sit behind a Python wrapper, keeping the C++ code to the bare minimum:

  1. Asserts in the C++ code are compiled out in production builds, so input checks belong in Python.
  2. The Pythonic interface is a little more mature and expressive.
  3. All memory allocation is kept outside the C++ code for cleanliness.
  4. The matrix and element-wise implementations can share error-checking code, etc.
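A pure-Python sketch of this split between a wrapper and a minimal kernel is below. It assumes a kernel that writes into a caller-allocated buffer and performs no checks. `lfilter_wrapped` and `_lfilter_kernel` are hypothetical names; the kernel is a Python stand-in for native code, not the PR's C++.

```python
from typing import List, Sequence


def _lfilter_kernel(x: Sequence[float], b: Sequence[float], a: Sequence[float], out: List[float]) -> None:
    """Hypothetical stand-in for a native kernel: no validation, writes into `out`."""
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # out[n - k] for k >= 1 holds outputs already computed in earlier iterations.
        acc -= sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
        out[n] = acc / a[0]


def lfilter_wrapped(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Hypothetical wrapper: validation and allocation live here, not in the kernel."""
    # Explicit raises instead of `assert`: Python asserts are stripped under `python -O`,
    # much as C++ asserts are compiled out when NDEBUG is defined.
    if len(b) == 0 or len(a) == 0:
        raise ValueError("b and a must be non-empty")
    if a[0] == 0:
        raise ValueError("a[0] must be non-zero")
    out = [0.0] * len(x)  # output allocated once, outside the kernel
    _lfilter_kernel(x, b, a, out)
    return out
```

Keeping checks in one wrapper is also what lets several kernels (for example matrix-based and element-wise) share the same error handling.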

Performance results are similar to before. Small inputs may be affected by the added Python wrapper, but that is a fixed cost that does not scale with data size.

Chuan

--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 8000], Filter Order: 5
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   0.433291 s
CPP Element Wise Runtime       :   0.000642 s
CPP Matrix Runtime             :   0.283377 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     674.59 x
PASS - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 80000], Filter Order: 5
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   4.302623 s
CPP Element Wise Runtime       :   0.006038 s
CPP Matrix Runtime             :   2.805403 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     712.54 x
PASS - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 800000], Filter Order: 5
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:  57.354847 s
CPP Element Wise Runtime       :   0.044153 s
CPP Matrix Runtime             :  29.991314 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :    1299.01 x
PASS - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 8000], Filter Order: 8
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   0.438865 s
CPP Element Wise Runtime       :   0.000667 s
CPP Matrix Runtime             :   0.282346 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     657.64 x
PASS - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 80000], Filter Order: 8
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   4.281600 s
CPP Element Wise Runtime       :   0.006768 s
CPP Matrix Runtime             :   2.810288 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     632.65 x
PASS - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 800000], Filter Order: 8
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:  45.460900 s
CPP Element Wise Runtime       :   0.048582 s
CPP Matrix Runtime             :  28.323625 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     935.76 x
PASS - all outputs are identical
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 8000], Filter Order: 18
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   0.433779 s
CPP Element Wise Runtime       :   0.001148 s
CPP Matrix Runtime             :   0.283811 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     377.78 x
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 80000], Filter Order: 18
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   4.324719 s
CPP Element Wise Runtime       :   0.010699 s
CPP Matrix Runtime             :   2.854490 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     404.21 x
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 800000], Filter Order: 18
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:  43.343172 s
CPP Element Wise Runtime       :   0.085304 s
CPP Matrix Runtime             :  28.245840 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     508.10 x
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 8000], Filter Order: 40
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   0.435738 s
CPP Element Wise Runtime       :   0.002127 s
CPP Matrix Runtime             :   0.291140 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     204.82 x
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 80000], Filter Order: 40
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:   4.386663 s
CPP Element Wise Runtime       :   0.018836 s
CPP Matrix Runtime             :   2.883111 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     232.89 x
--------------------------------------------------------------------------------
lfilter perf - Data Size: [2 x 800000], Filter Order: 40
--------------------------------------------------------------------------------
Python Matrix Runtime [current]:  43.455556 s
CPP Element Wise Runtime       :   0.174953 s
CPP Matrix Runtime             :  28.797021 s
--------------------------------------------------------------------------------
Ratio Python / CPP ElementWise :     248.38 x

@vincentqb (Contributor) left a comment

Thanks for looking into this! The performance gain may make it worth offering this C++ implementation to users, so I'm OK with offering it. I'd like a simple mechanism to expose or remove the interface to this implementation.

What's the status on GPU? Is it CPU-only in its current form? Should we have a forking mechanism that falls back to the Python implementation when running on CUDA?
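One way to combine a simple on/off switch with a CUDA fallback is a small dispatch function. This is a sketch under stated assumptions: `choose_lfilter_impl` and its parameters are hypothetical, and it assumes the native implementation exists only for CPU. In PyTorch, `waveform.device.type` is `"cpu"` or `"cuda"`, which is what `device_type` would receive.

```python
from typing import Callable, List, Sequence, Tuple

Impl = Callable[[Sequence[float]], List[float]]


def choose_lfilter_impl(
    device_type: str,
    native: Impl,
    reference: Impl,
    use_native: bool = True,
    native_devices: Tuple[str, ...] = ("cpu",),
) -> Impl:
    """Hypothetical dispatcher: use the native (C++) implementation only where it exists.

    `use_native` is the single switch for exposing or removing the native path;
    any device outside `native_devices` (e.g. "cuda") falls back to `reference`.
    """
    if use_native and device_type in native_devices:
        return native
    return reference
```

Removing the native path then only requires flipping `use_native` (or dropping `"cpu"` from `native_devices`), with no change at call sites.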

This PR includes many commits from other PRs (e.g. deltas, SpecAugment) that are tangled with it, so it needs to be rebased to isolate what is meant to be part of this PR.

torchaudio/functional.py (outdated)
# Perform sanity checks, input check, and memory allocation in python

# Current limitations to be removed in future
assert(waveform.dtype == torch.float32 or waveform.dtype == torch.float64)
Contributor

Does this C++ implementation restrict functionality with respect to the Python one?

def biquad(waveform, b0, b1, b2, a0, a1, a2):
# type: (Tensor, float, float, float, float, float, float) -> Tensor
r"""Performs a biquad filter of input tensor. Initial conditions set to 0.
https://en.wikipedia.org/wiki/Digital_biquad_filter

Args:
waveform (torch.Tensor): audio waveform of dimension of `(n_channel, n_frames)`
waveform (torch.Tensor): Audio waveform of dimension of `(n_channel, n_frames)`.
Currently only supports float32. Normalized [-1, 1]
Contributor

What about other float types?

torchaudio/functional.py (outdated)
test/test_transforms.py (outdated)
test/test_functional.py (outdated)
test/test_functional_filtering.py (outdated)
torchaudio/augmentations.py (outdated)
torchaudio/functional.py (outdated)
torchaudio/transforms.py (outdated)
@vincentqb (Contributor)

By the way, it would be nice to compare against JIT-compiled Python code (#326).

@vincentqb (Contributor)

cc #1238

@mthrok (Collaborator) commented Mar 6, 2021

Superseded by #1238, #1318, #1319 and #1310

@mthrok closed this on Mar 6, 2021