Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overdrive cpp extension #1299

Merged
merged 7 commits into from Feb 24, 2021
Merged

Conversation

bhargavkathivarapu
Copy link
Contributor

New PR for overdrive C++ extension similar to lfilter . Relates to #580 ( old PR )

Copy link
Collaborator

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bhargavkathivarapu

Thanks for the contribution, and sorry for taking so long to get back to you.
Overall it looks good to me.
@cpuhrsch Can you take a second look?

# for i in range(waveform.shape[-1]):
# last_out = temp[:, i] - last_in + 0.995 * last_out
# last_in = temp[:, i]
# output_waveform[:, i] = waveform[:, i] * 0.5 + last_out * 0.75
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can get rid of these comments now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mthrok sure will remove them.
Also I will try to implement the parallel_for as mentioned by @cpuhrsch in the old PR comments ( #580 ).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bhargavkathivarapu sounds good. Could you try measure the performance improvement?
Something like the following and change the shape of tensor to variety of shapes.
If numactl is missing, you can install it with sudo apt-get install -y numactl on Ubuntu.

OMP_NUM_THREADS=1 numactl --membind 0 --cpubind 0 python -m timeit -n 100 -r 5 -s """
import torch;
import torchaudio
x = torch.zeros(32, 100, dtype=torch.float)
""" """
torchaudio.functional.overdrive(x)
"""

Copy link
Contributor

@cpuhrsch cpuhrsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM depending some basic benchmarks that show clear improvement.

@bhargavkathivarapu
Copy link
Contributor Author

bhargavkathivarapu commented Feb 24, 2021

@mthrok

Changes made

  • Removed commented code
  • Added parallel_for in overdrive cpp.

Performance details
( 60 sec audio clip, VM with 2 vCPUs)

Screenshot from 2021-02-24 17-17-58

Numctl details ( VM with 2 vCPUs)
Command used :

OMP_NUM_THREADS=1 numactl --membind 0 --cpubind 0 python -m timeit -n 100 -r 5 -s """
import torch
import torchaudio
x = torch.zeros(2, 100, dtype=torch.float)
torchaudio.functional.overdrive(x)
"""

(2 , 100 ) = 100 loops, best of 5: 14.1 nsec per loop
(32 , 100 ) = 100 loops, best of 5: 16.8 nsec per loop
(1000, 100) = 100 loops, best of 5: 28.2 nsec per loop
(2, 100000) = 100 loops, best of 5: 40.4 nsec per loop
(100, 100000) = 100 loops, best of 5: 44.3 nsec per loop
(1000, 100000) = 100 loops, best of 5: 55.7 nsec per loop
(10000, 10000) =100 loops, best of 5: 49.7 nsec per loop

@bhargavkathivarapu bhargavkathivarapu marked this pull request as ready for review February 24, 2021 12:13
@mthrok
Copy link
Collaborator

mthrok commented Feb 24, 2021

@mthrok

Changes made

  • Removed commented code
  • Added parallel_for in overdrive cpp.

Performance details
( 60 sec audio clip, VM with 2 vCPUs)

Screenshot from 2021-02-24 17-17-58

Numctl details ( VM with 2 vCPUs)
Command used :

OMP_NUM_THREADS=1 numactl --membind 0 --cpubind 0 python -m timeit -n 100 -r 5 -s """
import torch
import torchaudio
x = torch.zeros(2, 100, dtype=torch.float)
torchaudio.functional.overdrive(x)
"""

(2 , 100 ) = 100 loops, best of 5: 14.1 nsec per loop
(32 , 100 ) = 100 loops, best of 5: 16.8 nsec per loop
(1000, 100) = 100 loops, best of 5: 28.2 nsec per loop
(2, 100000) = 100 loops, best of 5: 40.4 nsec per loop
(100, 100000) = 100 loops, best of 5: 44.3 nsec per loop
(1000, 100000) = 100 loops, best of 5: 55.7 nsec per loop
(10000, 10000) =100 loops, best of 5: 49.7 nsec per loop

@bhargavkathivarapu

Thanks for the update. So my understanding is that the speed is comparable to sox implementation.

Can you compare the same torchaudio.functional.overdrive function and Tensor shape before adding the C++ code? That will tell more directly how the C++ implementation improves the performance.

@mthrok
Copy link
Collaborator

mthrok commented Feb 24, 2021

Can you merge the latest master commit to include #1297? I believe I killed the flaky bug. 🐛

@bhargavkathivarapu
Copy link
Contributor Author

bhargavkathivarapu commented Feb 24, 2021

@mthrok . Now all checks passed after merging master.

Can you compare the same torchaudio.functional.overdrive function and Tensor shape before adding the C++ code? That will tell more directly how the C++ implementation improves the performance.

( Reference 60 second clip ) Tensor shape of w = [1, 1323000]
Existing torchaudio overdrive timing
Screenshot from 2021-02-24 21-06-37

Comparison for that tensor w

Implementation Relative run time Run time
sox overdrive 1X 38.7ms
overdrive new(cpp) 2X 78.2ms
overdrive old (python) ~2000X 77000ms

@mthrok
Copy link
Collaborator

mthrok commented Feb 24, 2021

@mthrok . Now all checks passed after merging master.

Can you compare the same torchaudio.functional.overdrive function and Tensor shape before adding the C++ code? That will tell more directly how the C++ implementation improves the performance.

( Reference 60 second clip ) Tensor shape of w = [1, 1323000]
Existing torchaudio overdrive timing
Screenshot from 2021-02-24 21-06-37

Comparison for that tensor w

Implementation Relative run time Run time
sox overdrive 1X 38.7ms
overdrive new(cpp) 2X 78.2ms
overdrive old (python) ~2000X 77000ms

Wonderful! Thanks!

@mthrok mthrok merged commit 23e9ed3 into pytorch:master Feb 24, 2021
mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants