Precompute transforms.Resample kernel #1499
Conversation
torchaudio/functional/functional.py (Outdated)

```diff
@@ -1377,9 +1378,16 @@ def resample(
             but less efficient. We suggest around 4 to 10 for normal use. (Default: ``6``)
         rolloff (float, optional): The roll-off frequency of the filter, as a fraction of the Nyquist.
             Lower values reduce anti-aliasing, but also reduce some of the highest frequencies. (Default: ``0.99``)
+        kernel (Tensor, optional): Tensor of dimension (f, 1, w) representing the windowed sinc function that is
```
@mthrok I am not sure how I feel about exposing `kernel` as a parameter here (without also exposing the `_get_sinc_resample_kernel` function to users), because the dimensions and values are very specific to how we compute it (e.g. the dimensions are based on first simplifying the input frequencies, and on the width/rolloff). I don't think these computations/dimensions should be enforced exactly if users do want to compute their own kernel differently, but that could also lead to improperly sized kernels and wrong results. Any thoughts?
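One way to make the sizing concern concrete is a shape sanity check. The helper below is hypothetical (not torchaudio API) and assumes the internal convention that, after reducing both rates by their gcd, the kernel's first dimension equals the reduced `new_freq`:

```python
from math import gcd

def validate_kernel_shape(kernel_shape, orig_freq, new_freq):
    # Hypothetical check: the kernel is expected to have shape (f, 1, w),
    # where f == new_freq // gcd(orig_freq, new_freq) under the assumed
    # frequency-simplification convention.
    g = gcd(int(orig_freq), int(new_freq))
    f, mid, _w = kernel_shape
    if mid != 1:
        raise ValueError("kernel must have shape (f, 1, w)")
    if f != new_freq // g:
        raise ValueError(f"expected first dimension {new_freq // g}, got {f}")
    return True
```

Even with such a check, the width dimension `w` depends on `lowpass_filter_width` and `rolloff`, which is part of why enforcing an exact shape on user-supplied kernels is awkward.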
Okay, I have two questions:

- Will passing `width` from the result of `_get_sinc_resample_kernel` make the computation flow simpler?
- Could you measure the performance improvement from kernel caching? If it's negligible, then we are good without caching.
- I don't think passing in `width` will make the computation flow (much) simpler. My concern is that different libraries may compute kernels differently or use different convolution sizes (which is what `width` is used for), so it is hard to sanity-check users' input to make sure it is reasonable and aligns with their parameters. Also, because there is simplification involved in creating these kernels (like dividing by the gcd) and it isn't very intuitive, I don't think `kernel` should be exposed to users (unless we also expose the kernel function and provide concrete documentation for it).
- Benchmarks: `%timeit torchaudio.functional.resample(sig_torch, P, Q)` gives `10.6 ms ± 218 µs per loop`, while `resampler = torchaudio.transforms.Resample(); %timeit resampler(sig_torch)` gives `1.46 ms ± 95.6 µs per loop`.
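For reference, the shape of this comparison can be reproduced in a self-contained way. The sketch below uses a pure-Python stand-in for the kernel computation (the actual torchaudio numbers depend on the machine and signal), just to show the recompute-per-call vs. precomputed-kernel pattern being measured:

```python
import timeit

def make_kernel(width=10000):
    # stand-in for _get_sinc_resample_kernel: an artificially costly computation
    return [i * 0.001 for i in range(width)]

def resample_recompute(sig):
    # functional-style path: rebuild the kernel on every call
    kernel = make_kernel()
    return sum(sig) + kernel[0]  # stand-in for the convolution

_CACHED = make_kernel()  # transform-style path: kernel built once at init

def resample_cached(sig):
    return sum(sig) + _CACHED[0]

sig = [0.0] * 100
t_recompute = timeit.timeit(lambda: resample_recompute(sig), number=300)
t_cached = timeit.timeit(lambda: resample_cached(sig), number=300)
```

With any nontrivial kernel cost, `t_cached` comes out well below `t_recompute`, which is the gap the `10.6 ms` vs. `1.46 ms` figures above reflect.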
Okay, so the speedup looks too significant to give up. Let me try one more suggestion. Instead of making it public, how about passing the precomputed kernel through private functions? That way users will not use it directly, but we can still add a variety of kernels.

- Extract the part that applies the kernel from `functional.resample`, so that `functional.resample` is basically a high-level function that calls two helper functions, one for kernel generation and one for convolution.
- From `transforms.Resample`, pass the pre-computed kernel to the second helper directly.

So the gist looks like:

```python
# in functional.py
def resample(...):
    kernel, width = _get_kernel(...)
    resampled = _apply_kernel(...)
    ...
```

and in `transforms`:

```python
class Resample(nn.Module):
    def __init__(self, ...):
        ...
        self.kernel, self.width = functional._get_kernel(...)

    def forward(self, ...):
        resampled = _apply_kernel(...)
        ...
```
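The two-helper split can be sketched end to end. This is a structural illustration only: `_get_kernel` below returns a stand-in moving-average kernel rather than the real windowed sinc, and `_apply_kernel` is a naive same-length convolution, so no actual rate conversion happens; the point is the division of responsibilities:

```python
# Structural sketch of the proposed split. Helper names follow the
# comment above; the kernel math is a stand-in, not torchaudio's.

def _get_kernel(orig_freq, new_freq, lowpass_filter_width=6):
    # stand-in: a normalized moving-average kernel instead of a windowed sinc
    width = 2 * lowpass_filter_width + 1
    return [1.0 / width] * width, width

def _apply_kernel(waveform, kernel, width):
    # naive same-length 1-D convolution standing in for conv1d
    half = width // 2
    out = []
    for i in range(len(waveform)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(waveform):
                acc += waveform[idx] * k
        out.append(acc)
    return out

def resample(waveform, orig_freq, new_freq):
    # high-level entry point: generate the kernel, then apply it
    kernel, width = _get_kernel(orig_freq, new_freq)
    return _apply_kernel(waveform, kernel, width)
```

A transform would call `_get_kernel` once in `__init__` and only `_apply_kernel` per forward pass, which is exactly where the speedup above comes from.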
Just to be sure: pre-computed kernels are only applicable for a fixed ratio of sampling rates, right? So if we want to reuse the kernel in the transform, and assuming the target sampling rate is fixed, the expected original sample rate must be fixed as well.
We may want to introduce a new Transform for fixed sampling rate, for faster computation.
Yes, the kernel depends on a set of parameters (`old_freq`, `new_freq`, `lowpass_filter_width`, `rolloff`, ...) and is the same for a given set of parameters. Currently in `transforms.Resample`, all of these parameters are passed in `__init__`, and the only argument to `forward` is the waveform itself, so I think it is fine to reuse the current transform.

The new suggestion seems pretty reasonable to me as well.
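Since the kernel is fully determined by that parameter tuple, one natural pattern is to memoize it on the parameters (after gcd reduction, so equivalent rate ratios share a key). The names below are illustrative, not torchaudio API, and the returned tuple stands in for the real kernel tensor:

```python
from functools import lru_cache
from math import gcd

@lru_cache(maxsize=None)
def get_cached_kernel(orig_freq, new_freq, lowpass_filter_width=6, rolloff=0.99):
    # Reduce the rates by their gcd so e.g. 16000->24000 and 2->3 share a kernel.
    g = gcd(int(orig_freq), int(new_freq))
    reduced = (orig_freq // g, new_freq // g)
    # ... real code would build the windowed-sinc kernel here from
    # `reduced`, `lowpass_filter_width`, and `rolloff` ...
    return ("kernel-for", reduced, lowpass_filter_width, rolloff)
```

Repeated calls with the same parameters then return the cached object instead of recomputing, which matches the "same for a given set of parameters" observation above.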
*(force-pushed: bf0979c → 313c4d1 → 74c0ac0 → 89b0630 → 7a482fb)*
```python
# pack batch
shape = waveform.size()
waveform = waveform.view(-1, shape[-1])
kernel = kernel.to(device=waveform.device, dtype=waveform.dtype)
```
Considering the fact that this function will be called from the Transform, we might want to avoid moving the parameter to a certain device. In a use case like:

- Initialize `Resample`
- Move the pipeline to a specific device
- Run inference

let's say the pipeline is on GPU but the waveform is on CPU. We want the pipeline to fail so that users can fix it, instead of performing the operation on CPU, which incurs the cost of moving the kernel from GPU to CPU (and then it would fail in the next step of the pipeline anyway, which expects the Tensor to be on GPU).

I think moving the `kernel` to the target device can happen between the `_get_kernel` and `_apply_kernel` functions.
sure, where do you think target device/dtype should initially be set in transforms?
> sure, where do you think target device/dtype should initially be set in transforms?

They are expected to be set by users with `.to`, so we do not need to set it explicitly:

```python
resamp = T.Resample(...)
resamp.to(device, dtype)
```
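For `.to(device, dtype)` to carry the precomputed kernel along with the module, the kernel needs to be registered as a buffer. The class below is a minimal sketch of that mechanism (requires PyTorch): the kernel is a stand-in averaging filter, not the real windowed sinc, and the constructor signature is illustrative only:

```python
import torch
from torch import nn
import torch.nn.functional as F

class Resample(nn.Module):
    # Minimal sketch: a precomputed stand-in kernel stored as a buffer,
    # so `module.to(...)` moves/converts it together with the module.
    def __init__(self, width: int = 7):
        super().__init__()
        kernel = torch.full((1, 1, width), 1.0 / width)  # stand-in kernel
        self.register_buffer("kernel", kernel)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, time) -> (batch, 1, time) for conv1d, then squeeze back
        return F.conv1d(waveform.unsqueeze(1), self.kernel).squeeze(1)

resamp = Resample()
resamp.to(torch.float64)  # the kernel buffer follows the module's dtype/device
```

With this, a CPU waveform fed to a GPU-resident module raises a device-mismatch error instead of silently copying the kernel back, which is the failure mode preferred in the comment above.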
as discussed offline, this is BC-breaking and left as a follow-up
*(force-pushed: 566c0db → 21e61c2)*
Looks good. Thanks!
Co-authored-by: Holly Sweeney <77758406+holly1238@users.noreply.github.com>
The kernel used for resampling is the same for a given set of resampling parameters, but is currently recomputed on every call to `resample`. To make computation more efficient for `transforms.Resample`, we can precompute the kernel and pass it in to the function instead.

Follow-ups:

cc #1487