biquad filter similar to SoX #275
Conversation
Hi Community, We are currently exploring ways to make torchaudio's dependence on SoX optional (per #260). A subset of SoX's functionality is its frequency-filtering operations, e.g. highpass, lowpass, and bandpass. SoX's implementations are based on the digital biquad filter (https://en.wikipedia.org/wiki/Digital_biquad_filter). This WIP PR ports SoX's implementation of the biquad effect and the lowpass and highpass filters. Question: which parts of the frequency-filtering functions would be helpful to include in torchaudio vs. keep separate?
Thank you, Chuan
Relates to #260
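For readers unfamiliar with the filter being ported: a digital biquad is a second-order recursive filter, and SoX derives its lowpass/highpass coefficients from the standard RBJ "Audio EQ Cookbook" formulas. The sketch below is illustrative only (it is not the PR's C++ code): a Direct Form I biquad plus the cookbook lowpass coefficients.

```python
import math
import numpy as np

def biquad(x, b0, b1, b2, a0, a1, a2):
    """Direct Form I biquad:
       y[n] = (b0*x[n] + b1*x[n-1] + b2*x[n-2]
               - a1*y[n-1] - a2*y[n-2]) / a0
    """
    y = np.zeros(len(x), dtype=float)
    for n in range(len(x)):
        y[n] = (b0 * x[n]
                + (b1 * x[n - 1] if n >= 1 else 0.0)
                + (b2 * x[n - 2] if n >= 2 else 0.0)
                - (a1 * y[n - 1] if n >= 1 else 0.0)
                - (a2 * y[n - 2] if n >= 2 else 0.0)) / a0
    return y

def lowpass_coeffs(sample_rate, cutoff_freq, Q=0.707):
    """Lowpass biquad coefficients per the RBJ Audio EQ Cookbook."""
    w0 = 2.0 * math.pi * cutoff_freq / sample_rate
    alpha = math.sin(w0) / (2.0 * Q)
    cos_w0 = math.cos(w0)
    b0 = (1.0 - cos_w0) / 2.0
    b1 = 1.0 - cos_w0
    b2 = (1.0 - cos_w0) / 2.0
    a0 = 1.0 + alpha
    a1 = -2.0 * cos_w0
    a2 = 1.0 - alpha
    return b0, b1, b2, a0, a1, a2
```

A quick sanity check on the coefficients: the lowpass has unity gain at DC, so a constant input should pass through essentially unchanged once the filter settles.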
I took a quick look, and good job so far :)
A test is failing, but does not appear related to your changes.
test/test_datasets_vctk.py::TestVCTK::test_make_manifest FAILED
…tional_filtering to functional_sox_convenience
torchaudio/filtering.cpp (Outdated)

    assert(output_waveform.size(0) == n_channels);
    assert(output_waveform.size(1) == n_frames);

    auto input_accessor = input_waveform.accessor<float,2>();
We might need to look into using the appropriate CPU/GPU interface here. Unfortunately, we do not yet have the CI tools for anything other than Linux on CPU.
I added GPU typedefs based on my understanding of the code, but we need to test this.
torchaudio/filtering.cpp (Outdated)

    int n_order = a_coeffs.size(0);  // number of coefficients; filter order is n_order - 1
    assert(a_coeffs.size(0) == b_coeffs.size(0));

    for (int64_t i_channel = 0; i_channel < n_channels; ++i_channel) {
Can we do the channels in one pass for each frame?
I'm not sure there is necessarily a faster way. Would you use tensors to slice all channels at each frame? Would that be much faster?
Slicing would be great, yes. I'd expect this to work faster on GPU.
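The per-frame slicing idea can be sketched in NumPy (illustrative only, not the PR's implementation): the time loop stays sequential, because each output frame depends on earlier output frames, but each step updates every channel at once through a column slice instead of an explicit channel loop.

```python
import numpy as np

def biquad_all_channels(x, b0, b1, b2, a0, a1, a2):
    """Apply one biquad to a (n_channels, n_frames) array, updating
    all channels in a single vectorized step per frame."""
    n_channels, n_frames = x.shape
    y = np.zeros_like(x, dtype=float)
    for n in range(n_frames):
        y[:, n] = (b0 * x[:, n]
                   + (b1 * x[:, n - 1] if n >= 1 else 0.0)
                   + (b2 * x[:, n - 2] if n >= 2 else 0.0)
                   - (a1 * y[:, n - 1] if n >= 1 else 0.0)
                   - (a2 * y[:, n - 2] if n >= 2 else 0.0)) / a0
    return y
```

Whether the sliced form beats a plain scalar loop depends on the channel count: with few channels the per-frame slicing overhead can dominate, which matches the benchmark discussion later in this thread.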
@yf225 -- We want to implement a transformation, lfilter, that could "almost" be implemented by convolutions. We can't because of the way the terms depend on each other. @engineerchuan suggests implementing it in C++ (see torchaudio/filtering.cpp), since the for-loop over time is much faster and comparable to scipy in speed on CPU. We shouldn't need a for-loop over the channels, though. Thoughts on things to be careful about here?
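The "almost a convolution" point can be made concrete with a minimal NumPy sketch of the lfilter recurrence (same sign convention as scipy.signal.lfilter; not the PR's code): the feed-forward b-terms are a plain convolution of the input, but the feedback a-terms read earlier *outputs*, which forces the time loop to run sequentially.

```python
import numpy as np

def lfilter_1d(b, a, x):
    """IIR difference equation:
       a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]
    """
    y = np.zeros(len(x), dtype=float)
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):        # feed-forward: an ordinary convolution
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):     # feedback: depends on earlier outputs
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc / a[0]
    return y
```

With a = [1.0] the feedback vanishes and the result is exactly the causal convolution of b with x; any nontrivial a-coefficients introduce the sequential dependence the comment above refers to.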
Since the computation for each channel doesn't depend on the others, we might be able to use at::parallel_for (https://github.com/pytorch/pytorch/blob/cc61af3c3d8ca8b46f7234383513b5166e10150c/aten/src/ATen/Parallel.h#L48) to speed up on CPU, and thrust::for_each (or a custom CUDA kernel launch) to speed up on GPU.
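A self-contained sketch of the channel-parallel idea, with std::thread standing in for at::parallel_for so it compiles without ATen (in the real code the per-channel loop body would go inside the at::parallel_for lambda). The time recurrence inside each channel stays sequential; only the channels run concurrently. Names and signatures here are illustrative, not the PR's.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <thread>
#include <vector>

// Run one channel's IIR recurrence sequentially: the time loop cannot be
// parallelized because y[n] depends on y[n-1], y[n-2], ...
static void filter_channel(const std::vector<double>& x,
                           std::vector<double>& y,
                           const std::vector<double>& b,
                           const std::vector<double>& a) {
  for (size_t n = 0; n < x.size(); ++n) {
    double acc = 0.0;
    for (size_t k = 0; k < b.size() && k <= n; ++k) acc += b[k] * x[n - k];
    for (size_t k = 1; k < a.size() && k <= n; ++k) acc -= a[k] * y[n - k];
    y[n] = acc / a[0];
  }
}

// Channels are independent, so each gets its own worker; with ATen this
// would be at::parallel_for(0, n_channels, grain_size, lambda) instead.
void lfilter_parallel(const std::vector<std::vector<double>>& x,
                      std::vector<std::vector<double>>& y,
                      const std::vector<double>& b,
                      const std::vector<double>& a) {
  std::vector<std::thread> workers;
  for (size_t c = 0; c < x.size(); ++c)
    workers.emplace_back(filter_channel, std::cref(x[c]), std::ref(y[c]),
                         std::cref(b), std::cref(a));
  for (auto& t : workers) t.join();
}
```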
3 different implementations of lfilter were explored in this commit.
All 3 implementations return the same result at 1e-5 tolerance for a variety of random inputs, so we have confidence they are doing the same math. Performance (all on CPU):
Input (2 channels x 100K samples):
Input (10 channels x 10K samples):
Input (100 channels x 10K samples):
The basic element-wise implementation seems much faster. I need help evaluating my tensor-based implementation. Is the slicing method shown the most efficient way to access the data? Do I need to call contiguous() at all? Thank you, Chuan
Unwound the cpp lfilter implementations. We will move these to a separate PR.
Thanks for working on this! We're essentially ready to merge this :)
Use biquad filter similar to SoX.