CPP extension for overdrive effect in functional #580

bhargavkathivarapu · 2020-04-24T15:05:04Z

Hi,

I have tried implementing the existing functional overdrive in C++, as the Python version is slow compared to the SoX version (see #260, reducing the dependency on SoX).
The C++ version is slightly slower than the SoX version, but it is much faster than the Python version.

Comparing SoX with the new overdrive (with the C++ extension)

The old Python overdrive effect took 80000 ms.

That is a speedup of more than 700x over the Python implementation.

SoX compatibility - passed
Batch test - passed
TorchScript - not passed. I think a C++ extension cannot be converted to TorchScript.

$ python test/test_torchscript_consistency.py
.............................E.....sssssssssssssssssssssssssssssssssss................ssssssssssssssss
======================================================================
ERROR: test_overdrive (__main__.TestFunctionalCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_torchscript_consistency.py", line 474, in test_overdrive
    self._assert_consistency(func, waveform)
  File "test/test_torchscript_consistency.py", line 38, in _assert_consistency
    return _assert_functional_consistency(func, tensor, self.device, shape_only=shape_only)
  File "test/test_torchscript_consistency.py", line 14, in _assert_functional_consistency
    ts_func = torch.jit.script(func)
  File "/Users/ka387861/pytorch/torch/jit/__init__.py", line 1296, in script
    fn = torch._C._jit_script_compile(qualified_name, ast, _rcb, get_default_args(obj))
  File "/Users/ka387861/pytorch/torch/jit/_recursive.py", line 559, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/Users/ka387861/pytorch/torch/jit/__init__.py", line 1296, in script
    fn = torch._C._jit_script_compile(qualified_name, ast, _rcb, get_default_args(obj))
RuntimeError: 
Python builtin <built-in method _overdrive_helper of PyCapsule object at 0x143fe2270> is currently not supported in Torchscript:
  File "/Users/ka387861/audio/torchaudio/functional.py", line 1291

    # TODO: Implement a torch CPP extension
    _overdrive_helper(waveform, temp, last_in, last_out, output_waveform)
    ~~~~~~~~~~~~~~~~~ <--- HERE

    return output_waveform.clamp(min=-1, max=1).view(actual_shape)
'overdrive' is being compiled since it was called from 'func'
  File "test/test_torchscript_consistency.py", line 472
            gain = 30.
            colour = 50.
            return F.overdrive(tensor, gain, colour)
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE


----------------------------------------------------------------------
Ran 102 tests in 1839.777s

FAILED (errors=1, skipped=51)

Should we create a new folder to organize the C++ code in torchaudio?
- CUDA kernels and C++ files for other functions might clutter the torchaudio folder.
I am planning to write a CUDA kernel, but I am getting some weird errors when running the existing TorchScript tests in a remote GPU Docker container.
Env: PyTorch 1.4, CUDA 10, multi-GPU

Below is a part of the log

======================================================================
ERROR: test_Spectrogram (__main__.TestTransformsCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_torchscript_consistency.py", line 486, in test_Spectrogram
    self._assert_consistency(T.Spectrogram(), tensor)
  File "test/test_torchscript_consistency.py", line 482, in _assert_consistency
    _assert_transforms_consistency(transform, tensor, self.device)
  File "test/test_torchscript_consistency.py", line 27, in _assert_transforms_consistency
    ts_transform = torch.jit.script(transform)
  File "/miniconda/envs/python36/lib/python3.6/site-packages/torch/jit/__init__.py", line 1255, in script
    return torch.jit._recursive.recursive_script(obj)
  File "/miniconda/envs/python36/lib/python3.6/site-packages/torch/jit/_recursive.py", line 534, in recursive_script
    return create_script_module(nn_module, infer_methods_to_compile(nn_module))
  File "/miniconda/envs/python36/lib/python3.6/site-packages/torch/jit/_recursive.py", line 296, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, cpp_module, stubs)
  File "/miniconda/envs/python36/lib/python3.6/site-packages/torch/jit/_recursive.py", line 340, in create_script_module_impl
    create_methods_from_stubs(concrete_type, stubs)
  File "/miniconda/envs/python36/lib/python3.6/site-packages/torch/jit/_recursive.py", line 259, in create_methods_from_stubs
    concrete_type._create_methods(defs, rcbs, defaults)
RuntimeError: Can't redefine method: forward on class: __torch__.torchaudio.transforms.Spectrogram (addMethod at /opt/conda/conda-bld/pytorch_1579027003190/work/torch/csrc/jit/script/class_type.cpp:73)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f55078ed627 in /miniconda/envs/python36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::ClassType::addMethod(torch::jit::Function*) + 0x1d9 (0x7f550d426f69 in /miniconda/envs/python36/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: torch::jit::script::CompilationUnit::define(c10::optional<c10::QualifiedName> const&, torch::jit::script::Def const&, std::shared_ptr<torch::jit::script::Resolver> const&, torch::jit::script::Self const*, std::unordered_map<std::string, torch::jit::Function*, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, torch::jit::Function*> > > const&, bool) const + 0x6d7 (0x7f550d3b3f47 in /miniconda/envs/python36/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: torch::jit::script::CompilationUnit::define(c10::optional<c10::QualifiedName> const&, std::vector<torch::jit::script::Def, std::allocator<torch::jit::script::Def> > const&, std::vector<std::shared_ptr<torch::jit::script::Resolver>, std::allocator<std::shared_ptr<torch::jit::script::Resolver> > > const&, torch::jit::script::Self const*, bool) + 0x17d (0x7f550d3b468d in /miniconda/envs/python36/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #4: <unknown function> + 0x7897af (0x7f5538dd57af in /miniconda/envs/python36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x28c076 (0x7f55388d8076 in /miniconda/envs/python36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>


======================================================================
FAIL: test_TimeStretch (__main__.TestTransformsCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_torchscript_consistency.py", line 533, in test_TimeStretch
    tensor,
  File "test/test_torchscript_consistency.py", line 482, in _assert_consistency
    _assert_transforms_consistency(transform, tensor, self.device)
  File "test/test_torchscript_consistency.py", line 30, in _assert_transforms_consistency
    torch.testing.assert_allclose(ts_output, output)
  File "/miniconda/envs/python36/lib/python3.6/site-packages/torch/testing/__init__.py", line 59, in assert_allclose
    count - 1, 100 * count / actual.numel()))
AssertionError: Not within tolerance rtol=0.0001 atol=1e-05 at input[7, 0, 254, 5, 0] (-0.030245909467339516 vs. -0.029713068157434464) and 101 other locations (0.00%)

----------------------------------------------------------------------
Ran 102 tests in 254.452s

FAILED (failures=1, errors=14, skipped=2)

Any idea about the error above, "Can't redefine method: forward on class"?

@mthrok or @vincentqb, could you review these changes?

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

mthrok · 2020-04-24T15:33:21Z

Hi @bhargavkathivarapu

Thanks for the PR. This is exciting. I will take a look into it.

1. You can run only the related test with `pytest test -v -k overdrive`.
2. Can you run `python -m torch.utils.collect_env` and paste the result here?
3. In my env, both TimeStretch and Spectrogram tests run fine.

Environment (Docker `nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04`)

$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.6.0a0+de4d2e9
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 418.116.00
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.6.0a0+de4d2e9
[pip] torchaudio==0.6.0a0+954d512
[conda] blas                      1.0                         mkl
[conda] magma-cuda101             2.5.2                         1    pytorch
[conda] mkl                       2020.0                      166
[conda] mkl-include               2020.0                      166
[conda] mkl-service               2.3.0            py38he904b0f_0
[conda] mkl_fft                   1.0.15           py38ha843d7b_0
[conda] mkl_random                1.1.0            py38h962f231_0
[conda] numpy                     1.18.1           py38h4f9e942_0
[conda] numpy-base                1.18.1           py38hde5b4d6_1
[conda] torch                     1.6.0a0+de4d2e9           dev_0    <develop>
[conda] torchaudio                0.6.0a0+954d512           dev_0    <develop>

TimeStretch

$ pytest test -v -k TimeStretch
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.8.2, pytest-5.4.1, py-1.8.1, pluggy-0.13.1 -- /home/moto/conda/envs/PY3.8-cuda101/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/scratch/moto/torchaudio/.hypothesis/examples')
rootdir: /scratch/moto/torchaudio
plugins: hypothesis-5.8.3
collected 253 items / 250 deselected / 3 selected

test/test_batch_consistency.py::TestTransforms::test_batch_TimeStretch PASSED                                                                                                                                                          [ 33%]
test/test_torchscript_consistency.py::TestTransformsCPU::test_TimeStretch PASSED                                                                                                                                                       [ 66%]
test/test_torchscript_consistency.py::TestTransformsCUDA::test_TimeStretch PASSED                                                                                                                                                      [100%]

===================================================================================================== 3 passed, 250 deselected in 5.91s ======================================================================================================

Spectrogram

$ pytest test -v -k Spectrogram
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.8.2, pytest-5.4.1, py-1.8.1, pluggy-0.13.1 -- /home/moto/conda/envs/PY3.8-cuda101/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/scratch/moto/torchaudio/.hypothesis/examples')
rootdir: /scratch/moto/torchaudio
plugins: hypothesis-5.8.3
collected 253 items / 243 deselected / 10 selected

test/test_batch_consistency.py::TestTransforms::test_batch_melspectrogram PASSED                                                                                                                                                       [ 10%]
test/test_batch_consistency.py::TestTransforms::test_batch_spectrogram PASSED                                                                                                                                                          [ 20%]
test/test_compliance_kaldi.py::Test_Kaldi::test_spectrogram PASSED                                                                                                                                                                     [ 30%]
test/test_torchscript_consistency.py::TestFunctionalCPU::test_spectrogram PASSED                                                                                                                                                       [ 40%]
test/test_torchscript_consistency.py::TestFunctionalCUDA::test_spectrogram PASSED                                                                                                                                                      [ 50%]
test/test_torchscript_consistency.py::TestTransformsCPU::test_MelSpectrogram PASSED                                                                                                                                                    [ 60%]
test/test_torchscript_consistency.py::TestTransformsCPU::test_Spectrogram PASSED                                                                                                                                                       [ 70%]
test/test_torchscript_consistency.py::TestTransformsCUDA::test_MelSpectrogram PASSED                                                                                                                                                   [ 80%]
test/test_torchscript_consistency.py::TestTransformsCUDA::test_Spectrogram PASSED                                                                                                                                                      [ 90%]
test/test_transforms.py::Tester::test_melspectrogram_load_save PASSED                                                                                                                                                                  [100%]

============================================================================================================== warnings summary ==============================================================================================================
test/test_compliance_kaldi.py::Test_Kaldi::test_spectrogram
  ../torch/csrc/utils/tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program.

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=============================================================================================== 10 passed, 243 deselected, 1 warning in 7.01s ================================================================================================

bhargavkathivarapu · 2020-04-24T15:47:52Z

2. Can you run `python -m torch.utils.collect_env` and paste the result here.

My remote docker GPU environment

Collecting environment information...

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: Could not collect

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB

Nvidia driver version: 410.79
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.17.2
[pip] torch==1.4.0
[pip] torchaudio==0.6.0a0+fddbded
[pip] torchvision==0.2.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.0.130 0
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py36he904b0f_0
[conda] mkl_fft 1.0.14 py36ha843d7b_0
[conda] mkl_random 1.1.0 py36hd6b4f25_0
[conda] numpy 1.17.2 py36haad9e8e_0
[conda] numpy-base 1.17.2 py36hde5b4d6_0
[conda] pytorch 1.4.0 py3.6_cuda10.0.130_cudnn7.6.3_0 pytorch
[conda] torchaudio 0.6.0a0+fddbded dev_0
[conda] torchvision 0.2.2 py_3 pytorch

vincentqb

Adding C++ extensions is an ongoing topic of discussion for torchaudio/torchtext/torchvision (cc @fmassa), and requires some alignment, since we need to do so in a pytorch way to maintain jitability, and gpu-support. For instance:

* We would love to see `torchaudio.load` linked in a way that maintains jitability.
* In CPP Implementation of lfilter for CPU #290, we decided to delay merging a C++ implementation of `lfilter` for these reasons even though we know that the pytorch implementation is slower.

Since this is of interest to you, and we are indeed interested in offering something like this, we can provide guidance to align this with our plans though it will take a little bit more time to get this merged properly.

vincentqb · 2020-04-24T16:05:51Z

For completeness, in terms of performance, it'd be nice to see a comparison with the jitted version available in pytorch. :)

bhargavkathivarapu · 2020-04-25T09:21:51Z

For completeness, in terms of performance, it'd be nice to see a comparison with the jitted version available in pytorch. I don't expect a significance difference though. :)

Comparison between the JIT and Python versions of overdrive implemented in Python.

bhargavkathivarapu · 2020-04-25T10:06:35Z

Adding C++ extensions is an ongoing topic of discussion for torchaudio/torchtext/torchvision (cc @fmassa), and requires some alignment, since we need to do so in a pytorch way to maintain jitability, and gpu-support. For instance:
* We would love to see `torchaudio.load` linked in a way that maintains jitability.

* In #290, we decided to delay merging a C++ implementation of `lfilter` for these reasons even though we know that the pytorch implementation is slower.
Since this is of interest to you, and we are indeed interested in offering something like this, we can provide guidance to align this with our plans though it will take a little bit more time to get this merged properly.

OK. Once the approach for integrating C++ and CUDA extensions is finalized, maybe you could open a GitHub issue for optimizing the existing code; I would like to contribute and learn some torch C++ and CUDA internals in the process.
Meanwhile, I will try implementing other SoX effects in Python to reduce the SoX dependency.

cpuhrsch · 2020-05-06T05:25:52Z

torchaudio/torch_overdrive.cpp

+  int64_t n_frames = waveform_accessor.size(1);
+  int64_t n_channels = waveform_accessor.size(0);
+
+  for (int64_t i_channel = 0; i_channel < n_channels; ++i_channel) {


Depending on the amount of work you might benefit from using parallel_for.

Most PyTorch CPU operators are parallelized, unless there's no obvious need due to memory-boundedness.

Another issue with pure C for C++ extensions, for now, is autovectorization. We can't ship avx2 code without a CPU capability based dispatch. That means for C code in extensions like this we're for now restricted to SSE and related.

Of course this is taken care of when you call into at:: operations directly, since they each take advantage of being part of libtorch.

@cpuhrsch, yeah, parallelization can be applied only to the channels loop. I was not sure how parallel_for treats the inner sequential loop, so I kept it without parallel_for. A parallel thread won't interfere with another parallel thread's inner loop, right?

@bhargavkathivarapu - I'm not sure what you mean by "interfere with" exactly. As in, shared variables or creating integers etc.? In this particular case it seems that the inner loops are independent of each other given that they differ in i_channel. The pointers and such will still be picked up as shared variables, but as long as you don't write to a single memory location from multiple threads concurrently etc., there's no issue.

By default PyTorch uses OpenMP, which yields this implementation. Look into OpenMP's omp parallel (here is what looks like a good explanation) for some more detail on what that means.
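
The independence argument above can be sketched in a stand-alone form. This is only a minimal sketch, not the PR's code: it assumes plain `std::vector` storage instead of tensor accessors, uses an OpenMP pragma instead of `at::parallel_for`, and stores the filter state directly as the output because the rest of the loop body is not shown in the excerpt. The names `Channels` and `overdrive_channels` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical channel-major layout: temp[channel][frame].
using Channels = std::vector<std::vector<double>>;

// Runs the recurrence from the diff once per channel.
// Iteration c of the outer loop reads and writes only index c of
// last_in, last_out and out, so no two threads write to the same
// memory location. The inner loop over frames stays sequential.
void overdrive_channels(const Channels& temp, Channels& out,
                        std::vector<double>& last_in,
                        std::vector<double>& last_out) {
  assert(out.size() == temp.size());
  assert(last_in.size() == temp.size());
  assert(last_out.size() == temp.size());
  const std::int64_t n_channels = static_cast<std::int64_t>(temp.size());

#pragma omp parallel for
  for (std::int64_t c = 0; c < n_channels; ++c) {
    assert(out[c].size() == temp[c].size());
    const std::int64_t n_frames = static_cast<std::int64_t>(temp[c].size());
    for (std::int64_t i = 0; i < n_frames; ++i) {
      last_out[c] = temp[c][i] - last_in[c] + 0.995 * last_out[c];
      last_in[c] = temp[c][i];
      out[c][i] = last_out[c];  // simplification: state is used as output
    }
  }
}
```

When compiled without OpenMP support, the pragma is ignored and the outer loop simply runs sequentially, producing the same result.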

cpuhrsch · 2020-05-06T05:51:16Z

torchaudio/torch_overdrive.cpp

+    for (int64_t i_frame = 0; i_frame < n_frames; ++i_frame) {
+      last_out_accessor[i_channel] = temp_accessor[i_channel][i_frame] -
+          last_in_accessor[i_channel] + 0.995 * last_out_accessor[i_channel];
+      last_in_accessor[i_channel] = temp_accessor[i_channel][i_frame];


You're setting the value of last_in to the value of temp for the current iteration so that it can be used in the next iteration. But instead you could just read from temp all the time (except for the first iteration), right? I added a similar comment for the Python code above.

But instead you could just read from temp all the time (except for the first iteration) right?

Yes, the first iteration needs to be handled separately; then we can remove the last_in variable.
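
The suggestion in this thread can be illustrated with a single-channel sketch. It assumes plain `std::vector<double>` input rather than the PR's accessors, and both function names are hypothetical. The first function mirrors the reviewed loop; the second handles the first frame separately and then reads the previous input sample directly, so no `last_in` variable is carried through the loop.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the reviewed loop: last_in is updated on every iteration.
std::vector<double> filter_with_last_in(const std::vector<double>& x,
                                        double last_in, double last_out) {
  std::vector<double> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    last_out = x[i] - last_in + 0.995 * last_out;
    last_in = x[i];
    y[i] = last_out;
  }
  return y;
}

// Same recurrence, but the previous input is read from x itself.
// Only the first frame needs the initial state (initial_in).
std::vector<double> filter_reading_input(const std::vector<double>& x,
                                         double initial_in, double last_out) {
  std::vector<double> y(x.size());
  if (x.empty()) {
    return y;
  }
  last_out = x[0] - initial_in + 0.995 * last_out;  // first iteration
  y[0] = last_out;
  for (std::size_t i = 1; i < x.size(); ++i) {
    last_out = x[i] - x[i - 1] + 0.995 * last_out;
    y[i] = last_out;
  }
  return y;
}
```

Both functions evaluate the same expression at every step, so they should agree up to floating-point rounding for any input and initial state.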

mthrok · 2021-02-14T15:27:58Z

Hi @bhargavkathivarapu

Sorry for taking such a long time to get back to this, but we finally cleaned up the build process and how we write C++ extensions. Recently, @parmeet added a C++ loop for lfilter in #1244. This PR can follow the exact same pattern. If you are still around and interested, would you like to give it another shot? If not, we can take your commits and update them while keeping your credit. Let me know what you think.

bhargavkathivarapu · 2021-02-15T16:03:08Z

Hi @mthrok, thanks for sharing the update on this. I will try to implement the overdrive C++ extension along the lines of lfilter.

bhargavkathivarapu · 2021-02-23T15:36:47Z

Closing this PR, as there is a new version of it (#1299).

bhargavkathivarapu added 3 commits April 24, 2020 12:09

cpp overdrive

8ea3f4d

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

dynamic dispatch

fddbded

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

cpp linting

3a29cd6

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

mthrok requested review from vincentqb and mthrok and removed request for vincentqb April 24, 2020 15:38

vincentqb suggested changes Apr 24, 2020

cpuhrsch reviewed May 6, 2020

mthrok mentioned this pull request May 16, 2020

Add windows binary jobs #642

Merged

facebook-github-bot added the CLA Signed label Oct 30, 2020

bhargavkathivarapu mentioned this pull request Feb 23, 2021

Overdrive cpp extension #1299

Merged

bhargavkathivarapu closed this Feb 23, 2021

mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021

Fix some minor issues in Custom C++ and CUDA Extensions (pytorch#580)

75a581c
