[BC-Breaking] Avoid moving resampling kernel device and dtype moves #1514
```diff
@@ -1305,7 +1305,9 @@ def _get_sinc_resample_kernel(
         lowpass_filter_width: int,
         rolloff: float,
         resampling_method: str,
-        beta: Optional[float]):
+        beta: Optional[float],
+        device: torch.device = torch.device("cpu"),
+        dtype: Optional[torch.dtype] = None):
 
     if not (int(orig_freq) == orig_freq and int(new_freq) == new_freq):
         warnings.warn(
@@ -1360,7 +1362,8 @@ def _get_sinc_resample_kernel(
     # they will have a lot of almost zero values to the left or to the right...
     # There is probably a way to evaluate those filters more efficiently, but this is kept for
     # future work.
-    idx = torch.arange(-width, width + orig_freq, dtype=torch.float64)
+    idx_dtype = dtype if dtype is not None else torch.float64
+    idx = torch.arange(-width, width + orig_freq, device=device, dtype=idx_dtype)
 
     for i in range(new_freq):
         t = (-i / new_freq + idx / orig_freq) * base_freq
@@ -1379,7 +1382,10 @@ def _get_sinc_resample_kernel(
         kernels.append(kernel)
 
     scale = base_freq / orig_freq
-    return torch.stack(kernels).view(new_freq, 1, -1).mul_(scale), width
+    kernels = torch.stack(kernels).view(new_freq, 1, -1).mul_(scale)
+    if dtype is None:
+        kernels = kernels.to(dtype=torch.float32)
```
It might be better to just return the kernel and do the dtype and device cast at the call site, since you're not using `dtype` outside of `arange`.

Following offline discussion, it is fine to convert to this generally "default" type before returning the kernel to the caller in transforms.
```diff
+    return kernels, width
 
 
 def _apply_sinc_resample_kernel(
@@ -1396,7 +1402,6 @@ def _apply_sinc_resample_kernel(
     # pack batch
     shape = waveform.size()
     waveform = waveform.view(-1, shape[-1])
-    kernel = kernel.to(device=waveform.device, dtype=waveform.dtype)
 
     num_wavs, length = waveform.shape
     waveform = torch.nn.functional.pad(waveform, (width, width + orig_freq))
@@ -1452,6 +1457,6 @@ def resample(
     gcd = math.gcd(int(orig_freq), int(new_freq))
 
     kernel, width = _get_sinc_resample_kernel(orig_freq, new_freq, gcd, lowpass_filter_width, rolloff,
-                                              resampling_method, beta)
+                                              resampling_method, beta, waveform.device, waveform.dtype)
     resampled = _apply_sinc_resample_kernel(waveform, orig_freq, new_freq, gcd, kernel, width)
     return resampled
```
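The dtype handling in this diff reduces to a small rule: the index grid (and therefore the kernel) is computed in the caller-supplied `dtype` when one is given; otherwise it is computed in `float64` and cast to `float32` before being returned. The sketch below models only that rule, using dtype names as plain strings so it runs without PyTorch. `kernel_dtypes` is a hypothetical helper for illustration, not a torchaudio function.

```python
from typing import Optional, Tuple


def kernel_dtypes(dtype: Optional[str]) -> Tuple[str, str]:
    """Return (compute_dtype, returned_dtype) for the sinc kernel.

    Hypothetical model of the rule in the diff; dtypes are plain strings
    here instead of torch.dtype objects.
    """
    # Mirrors: idx_dtype = dtype if dtype is not None else torch.float64
    compute_dtype = dtype if dtype is not None else "float64"
    # Mirrors: if dtype is None: kernels = kernels.to(dtype=torch.float32)
    returned_dtype = "float32" if dtype is None else compute_dtype
    return compute_dtype, returned_dtype


# `resample` now forwards waveform.dtype, so the kernel is built directly in
# the waveform's dtype. A caller that passes no dtype gets a float32 kernel
# that was computed at float64 precision.
for requested in (None, "float32", "float64", "float16"):
    print(requested, "->", kernel_dtypes(requested))
```

With this rule on the `resample` path, the kernel is already created on `waveform.device` in `waveform.dtype`. That is why the per-call `kernel.to(device=..., dtype=...)` could be dropped from `_apply_sinc_resample_kernel`.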
What if someone passes a low precision type like uint8? I think it might be better to pick whatever dtype is most efficient for this operation.
Following offline discussion, we can keep the higher-precision `float64` type because the kernel is computed only once, and its dimensions are limited to roughly `(orig_freq // gcd) x (new_freq // gcd)`. Typical resampling frequency pairs have a large gcd, so in those cases the choice of dtype makes only a minor difference in computation cost.
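To put this thread in numbers, the sketch below computes the reduced sizes `(orig_freq // gcd, new_freq // gcd)` that the reply uses for a few rate pairs. The pairs are illustrative choices, not taken from the PR. The sketch also shows why an unsigned type is a poor fit for the index grid: `idx` starts at `-width`, which an unsigned byte cannot hold. `reduced_dims` and `fits_unsigned_byte` are hypothetical helpers. The second one models only the representability problem in plain Python and does not show what `torch.arange` itself does with an unsigned dtype.

```python
import math
from array import array
from typing import Iterable, Tuple


def reduced_dims(orig_freq: int, new_freq: int) -> Tuple[int, int]:
    """Hypothetical helper: (orig_freq // gcd, new_freq // gcd)."""
    gcd = math.gcd(orig_freq, new_freq)
    return orig_freq // gcd, new_freq // gcd


def fits_unsigned_byte(values: Iterable[int]) -> bool:
    """Return True if every value can be stored as an unsigned byte ('B')."""
    try:
        array("B", values)
    except OverflowError:
        return False
    return True


# Common rate pairs share a large gcd, so the reduced sizes stay small.
print(reduced_dims(44100, 16000))  # gcd is 100
print(reduced_dims(48000, 44100))  # gcd is 300
# A pair with gcd 1 keeps the full sizes.
print(reduced_dims(44100, 44101))

# The index grid runs from -width to width + orig_freq (width=3 and
# orig_freq=4 are arbitrary example values), so it contains negatives.
width, orig_freq = 3, 4
print(fits_unsigned_byte(range(-width, width + orig_freq)))
```

For 44100 Hz to 16000 Hz the reduced sizes are 441 and 160, which is consistent with the reply's argument that a one-time `float64` computation is cheap. The cost grows only for rate pairs with a small gcd.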