[BC-Breaking] Avoid resampling kernel device and dtype moves #1514
Conversation
Looks good!
@@ -1360,7 +1362,8 @@ def _get_sinc_resample_kernel(
     # they will have a lot of almost zero values to the left or to the right...
     # There is probably a way to evaluate those filters more efficiently, but this is kept for
     # future work.
-    idx = torch.arange(-width, width + orig_freq, dtype=torch.float64)
+    idx_dtype = dtype if dtype is not None else torch.float64
+    idx = torch.arange(-width, width + orig_freq, device=device, dtype=idx_dtype)
What if someone passes a low precision type like uint8? I think it might be better to pick whatever dtype is most efficient for this operation.
Following offline discussion: we can keep the higher-precision float64 type, because the kernel computation is a one-time cost whose dimensions are limited to roughly (orig_freq // gcd) x (new_freq // gcd). Normal resampling frequencies generally have a large gcd, in which case the dtype makes only a minor difference in computation time.
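To make the size claim concrete, here is a quick back-of-the-envelope check (the 44.1 kHz -> 16 kHz frequencies are illustrative, not taken from the PR):

import math

# A common resample: 44.1 kHz -> 16 kHz. After reducing by the gcd, the
# cached kernel has only about (new_freq // gcd) x (orig_freq // gcd)
# entries, so a one-time float64 computation is cheap.
orig_freq, new_freq = 44100, 16000
g = math.gcd(orig_freq, new_freq)     # 100
print(new_freq // g, orig_freq // g)  # 160 441 -> roughly 70k kernel entries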
-    return torch.stack(kernels).view(new_freq, 1, -1).mul_(scale), width
+    kernels = torch.stack(kernels).view(new_freq, 1, -1).mul_(scale)
+    if dtype is None:
+        kernels = kernels.to(dtype=torch.float32)
It might be better to just return the kernel and do the dtype and device cast at the call site, since you're not using dtype outside of arange.
Following offline discussion: it is fine to convert to this generally "default" type prior to returning the kernel to the caller in transforms.
Maybe we can override
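A minimal sketch of the caching pattern settled on in this thread (the class name and simplified sinc kernel are illustrative, not torchaudio's exact code):

import math
import torch
import torch.nn.functional as F

class CachedKernelResample(torch.nn.Module):
    # Illustrative sketch only: the kernel math is simplified and does not
    # reproduce torchaudio's polyphase sinc resampler.
    def __init__(self, orig_freq: int, new_freq: int, width: int = 16):
        super().__init__()
        g = math.gcd(orig_freq, new_freq)
        orig_freq, new_freq = orig_freq // g, new_freq // g
        # Compute the kernel once in float64 for precision...
        idx = torch.arange(-width, width + orig_freq, dtype=torch.float64)
        kernel = torch.sinc(idx / orig_freq * new_freq).view(1, 1, -1)
        # ...then cache it as the generally "default" float32. register_buffer
        # lets module.to(device=..., dtype=...) move the cached kernel later.
        self.register_buffer("kernel", kernel.to(torch.float32))
        self.stride = orig_freq

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # No implicit move here: the cached kernel must already match the
        # waveform's device and dtype, or conv1d raises an error (the
        # BC-breaking behavior this PR introduces).
        return F.conv1d(waveform.unsqueeze(1), self.kernel, stride=self.stride).squeeze(1)

Registering the kernel as a buffer (rather than re-deriving it per call) is what makes a one-time, up-front module.to(...) move possible.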
Initially, the kernel used for resampling was computed only after a waveform was fed in, and it was initialized to the same device and dtype as the input waveform being resampled. In a later change, we cached the resampling kernel in transforms.Resample, since kernel computation is significant and redundant for a given set of resampling parameters.

Maintaining the original behavior would require moving the cached kernel to the device and dtype of the input waveform on every call to transforms.Resample, which could have unintended side effects: running a CPU transform on CUDA inputs would incur the overhead of constantly moving the cached CPU kernel to CUDA. The expectation is instead that users manually move the transform to the correct device and dtype themselves (e.g. resample = transforms.Resample(...); resample = resample.to(device=torch.device('cuda'), dtype=torch.float16)). This PR removes the per-call move of the cached kernel, so an error is now raised if a user calls the transform on a CUDA waveform without first moving the transform to CUDA.

This PR additionally produces slightly different results because of the precision of the kernel. In the previous/functional implementation, the kernel computation is done in the waveform's dtype from the start, but in the new transforms implementation, the kernel is computed in float64 and cached as float32 in __init__, prior to the user moving it to the desired dtype themselves. This gives higher-precision resampling with transforms for waveforms of lower precision than float32, and slightly lower-precision resampling for waveforms of higher precision than float32, since the kernel was intermediately cached as float32.

cc #1487
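For reference, the usage contract described above, as a sketch using torchaudio's transforms.Resample (the frequencies and waveform shape are illustrative):

import torch
import torchaudio.transforms as T

resample = T.Resample(orig_freq=44100, new_freq=16000)  # kernel cached in __init__

# New contract: move the transform (and its cached kernel) once, up front,
# instead of the transform implicitly moving the kernel on every call.
resample = resample.to(device=torch.device("cuda"), dtype=torch.float16)

waveform = torch.randn(1, 44100, device="cuda", dtype=torch.float16)
resampled = resample(waveform)  # kernel and waveform now match; no error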