
Precompute transforms.Resample kernel #1499

Merged: 5 commits into pytorch:master on May 14, 2021

Conversation

@carolineechen (Contributor) commented on May 11, 2021:

The kernel used for resampling is fully determined by the resampling parameters, but it is currently recomputed on every call to resample. To make transforms.Resample more efficient, we can precompute the kernel once and pass it into the function instead.
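The caching idea can be illustrated with a minimal stdlib-only sketch (toy kernel math and hypothetical names, not torchaudio's actual implementation): because the kernel depends only on the parameters, memoizing the construction means repeated calls with the same rates reuse one kernel.

```python
import math
from functools import lru_cache

# Toy sketch of the PR's idea: the resampling kernel is a pure function
# of its parameters, so it can be computed once and reused.
@lru_cache(maxsize=None)
def get_sinc_kernel(orig_freq: int, new_freq: int, width: int = 6):
    # Simplify the rate ratio first, as torchaudio does with the gcd.
    g = math.gcd(orig_freq, new_freq)
    orig_freq //= g
    new_freq //= g
    # Toy windowed-sinc taps; the real construction is more involved.
    taps = []
    for i in range(-width, width + 1):
        x = math.pi * i / max(orig_freq, 1)
        taps.append(1.0 if i == 0 else math.sin(x) / x)
    return tuple(taps)

# Repeated calls with the same parameters hit the cache instead of
# rebuilding the kernel.
k1 = get_sinc_kernel(16000, 8000)
k2 = get_sinc_kernel(16000, 8000)
assert k1 is k2  # same cached object, no recomputation
```

Here `get_sinc_kernel` stands in for torchaudio's private `_get_sinc_resample_kernel`; the PR achieves the same reuse by computing the kernel in the transform's `__init__` rather than via a cache decorator.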

Follow-ups:

  • [BC-breaking] do not move kernel device and dtype during computation

cc #1487

@@ -1377,9 +1378,16 @@ def resample(
but less efficient. We suggest around 4 to 10 for normal use. (Default: ``6``)
rolloff (float, optional): The roll-off frequency of the filter, as a fraction of the Nyquist.
Lower values reduce anti-aliasing, but also reduce some of the highest frequencies. (Default: ``0.99``)
kernel (Tensor, optional): Tensor of dimension (f, 1, w) representing the windowed sinc function that is
@carolineechen (Contributor, author) commented:
@mthrok I am not sure how I feel about exposing kernel as a parameter here (without also exposing the _get_sinc_resample_kernel function to users), because the dimensions and values are very specific to how we compute it (e.g. the dimensions depend on first simplifying the input frequencies and on the width/rolloff). I don't think these computations/dimensions should be enforced exactly if users want to compute their own kernel differently, but that could also lead to improperly sized kernels and wrong results. Any thoughts?

@mthrok (Collaborator) commented:
Okay, I have two questions:

  1. Would passing width from the result of _get_sinc_resample_kernel make the computation flow simpler?
  2. Could you measure the performance improvement from kernel caching? If it's negligible, then we are fine without caching.

@carolineechen (Contributor, author) commented:

  1. I don't think passing in width will make the computation flow (much) simpler. My concern is that different libraries may compute kernels differently or use different convolution sizes (which is what width is used for), so it is hard to sanity-check user input to make sure it is reasonable and consistent with their parameters. Also, because there is simplification involved in creating these kernels (like dividing by the gcd) and it isn't very intuitive, I don't think kernel should be a parameter exposed to users, unless we also expose the kernel function and provide concrete documentation for it.
  2. %timeit torchaudio.functional.resample(sig_torch, P, Q) gives 10.6 ms ± 218 µs per loop;
    resampler = torchaudio.transforms.Resample(); %timeit resampler(sig_torch) gives 1.46 ms ± 95.6 µs per loop.
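The shape of that comparison can be reproduced with a stdlib-only micro-benchmark sketch (toy stand-in functions, not torchaudio; absolute timings vary by machine): the functional path pays for kernel construction on every call, while the transform path pays once up front.

```python
import timeit

def make_kernel(n: int = 4096):
    # Stand-in for the comparatively expensive windowed-sinc construction.
    return [i * 0.001 for i in range(n)]

KERNEL = make_kernel()  # "transform" path: precomputed once at init

def resample_uncached(x):
    # Mirrors functional.resample: kernel rebuilt on every call.
    k = make_kernel()
    return sum(x) * k[1]

def resample_cached(x):
    # Mirrors transforms.Resample: reuses the precomputed kernel.
    return sum(x) * KERNEL[1]

sig = [0.0] * 1000
t_uncached = timeit.timeit(lambda: resample_uncached(sig), number=200)
t_cached = timeit.timeit(lambda: resample_cached(sig), number=200)
# The cached path avoids the per-call construction cost entirely.
```

The ~7x gap measured above is larger than this toy suggests because the real kernel construction involves tensor ops, not just list building.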

@mthrok (Collaborator) commented:

Okay, so the speed-up looks too significant to give up. Let me try one more suggestion. Instead of making it public, how about accepting the precomputed kernel through a private function? Users would not call it directly, but we could still add a variety of kernels.

  1. Extract the part that applies the kernel from functional.resample, so that functional.resample becomes a high-level function that calls two helpers: one for kernel generation, one for convolution.
  2. From transforms.Resample, pass the precomputed kernel to the second helper directly.

So the gist looks like:

# in functional.py
def resample(...):
    kernel, width = _get_kernel(...)
    resampled = _apply_kernel(...)
    ...

and in the transform:

class Resample(nn.Module):
    def __init__(self, ...):
        ...
        self.kernel, self.width = functional._get_kernel(...)

    def forward(self, ...):
        resampled = _apply_kernel(...)
        ...
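Fleshed out into a self-contained toy (stand-in math and hypothetical helper bodies, not the PR's actual code), the proposed split looks like this: the functional API composes the two helpers, while the transform caches the first helper's result at construction time.

```python
# Toy sketch of the proposed split: one helper builds the kernel,
# another applies it; the transform caches the kernel in __init__.
def _get_kernel(orig_freq, new_freq, width=6):
    # Stand-in for windowed-sinc kernel construction.
    kernel = [1.0 / (abs(i) + 1) for i in range(-width, width + 1)]
    return kernel, width

def _apply_kernel(waveform, kernel, width):
    # Stand-in for the strided convolution torchaudio performs.
    return [x * kernel[width] for x in waveform]

def resample(waveform, orig_freq, new_freq):
    # High-level functional API: recomputes the kernel each call.
    kernel, width = _get_kernel(orig_freq, new_freq)
    return _apply_kernel(waveform, kernel, width)

class Resample:
    def __init__(self, orig_freq=16000, new_freq=8000):
        # Kernel computed exactly once, at construction time.
        self.kernel, self.width = _get_kernel(orig_freq, new_freq)

    def __call__(self, waveform):
        # Reuses the cached kernel on every call.
        return _apply_kernel(waveform, self.kernel, self.width)
```

The design keeps the helpers private, so the kernel's shape and construction remain implementation details while the transform still gets the caching win.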

@mthrok (Collaborator) commented:
@carolineechen

Just to be sure, a precomputed kernel only applies to one fixed ratio of sampling rates, right? So if we want to reuse the kernel in the transform, and the target sampling rate is fixed, then the expected original sample rate is fixed as well.

@mthrok (Collaborator) commented:
We may want to introduce a new Transform for a fixed sampling rate, for faster computation.

@carolineechen (Contributor, author) commented:
Yes, the kernel depends on a set of parameters (old_freq, new_freq, lowpass_filter_width, rolloff, ...) and is the same for a given set. Currently in transforms.Resample all of these parameters are passed in init, and the only argument to forward is the waveform itself, so I think it is fine to reuse the current transform.

The new suggestion seems pretty reasonable to me as well.

@carolineechen carolineechen marked this pull request as ready for review May 12, 2021 17:49
# pack batch
shape = waveform.size()
waveform = waveform.view(-1, shape[-1])
kernel = kernel.to(device=waveform.device, dtype=waveform.dtype)
@mthrok (Collaborator) commented:
Considering that this function will be called from the Transform, we might want to avoid moving the parameter to a particular device.

In a use case like

  1. Initialize Resample
  2. Move the pipeline to a specific device
  3. Run inference

let's say the pipeline is on GPU but the waveform is on CPU. We want the pipeline to fail so that users can fix it, instead of performing the operation on CPU, which incurs the cost of moving the kernel from GPU to CPU (and then it would fail at the next step of the pipeline anyway, which expects the Tensor to be on GPU).

I think moving the kernel to the target device can happen between the _get_kernel and _apply_kernel functions.

@carolineechen (Contributor, author) commented:
Sure, where do you think the target device/dtype should initially be set in transforms?

@mthrok (Collaborator) commented:
> sure, where do you think target device/dtype should initially be set in transforms?

They are expected to be set by users with .to, so we do not need to set them explicitly.

resamp = T.Resample(...)
resamp.to(device, dtype)
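One way this pattern works in PyTorch (an assumed sketch of the approach, not the PR's actual code, with hypothetical class and buffer names): registering the precomputed kernel as a buffer makes `module.to(device, dtype)` move it along with the rest of the pipeline, so forward never has to move it implicitly.

```python
import torch

# Sketch: a transform whose precomputed kernel is a registered buffer.
# Buffers follow .to(device, dtype) automatically; if the input ends up
# on a different device than the kernel, conv1d raises instead of
# silently copying the kernel, which is the failure mode mthrok wants.
class FixedResample(torch.nn.Module):
    def __init__(self):
        super().__init__()
        kernel = torch.randn(1, 1, 13)  # placeholder for the sinc kernel
        self.register_buffer("kernel", kernel)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform expected as (batch, 1, time); padding keeps length.
        return torch.nn.functional.conv1d(waveform, self.kernel, padding=6)

resamp = FixedResample()
resamp.to(torch.float64)  # kernel buffer's dtype follows the module
out = resamp(torch.zeros(1, 1, 100, dtype=torch.float64))
```

With this layout, step 2 of the pipeline scenario above ("move the pipeline to a specific device") carries the kernel along for free, and a CPU waveform fed to a GPU module fails loudly at the conv.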

@carolineechen (Contributor, author) commented:

As discussed offline, this is BC-breaking and is left as a follow-up.

@mthrok (Collaborator) left an approving review:

Looks good. Thanks!

@carolineechen carolineechen merged commit 52e7bfd into pytorch:master May 14, 2021
@mthrok mthrok mentioned this pull request May 25, 2021
@carolineechen carolineechen mentioned this pull request Jun 1, 2021
mthrok pushed a commit to mthrok/audio that referenced this pull request Dec 13, 2022
Co-authored-by: Holly Sweeney <77758406+holly1238@users.noreply.github.com>