🐛 Describe the bug
Hi, I've seen this issue about the case when center=False, but even with center=True I'm finding that passing a waveform into torchaudio.transforms.Spectrogram(power=None, n_fft=n_fft, hop_length=hop_length) and then immediately running that result through torchaudio.transforms.InverseSpectrogram(n_fft=n_fft, hop_length=hop_length) results in an output waveform that is shorter than the input waveform.
I can imagine that the output might be longer due to zero-padding, but I'm not figuring out why it's shorter.
Changing to center=False did not fix the problem; rather, it resulted in a runtime error.
Pre-padding the input waveform up to a power of 2 in length removes the issue (a rough sketch of what I'm doing is just below), but... I would've thought this would be handled automatically.
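For completeness, the pre-padding workaround I'm currently using looks roughly like this (the helper name pad_to_pow2 is just mine, not a torchaudio API):

```python
import torch
import torch.nn.functional as F

def pad_to_pow2(waveform: torch.Tensor) -> torch.Tensor:
    # Zero-pad the last (time) dimension up to the next power of two.
    n = waveform.shape[-1]
    target = 1 << (n - 1).bit_length()  # e.g. 55728 -> 65536
    return F.pad(waveform, (0, target - n))
```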
Example (the failing round trip, without any pre-padding):
```python
import torchaudio
from torchaudio import transforms as T

waveform, sr = torchaudio.load('example.wav')
print("waveform.shape = ", waveform.shape)

n_fft = 1024
hop_length = 256
center = True

stft = T.Spectrogram(power=None, n_fft=n_fft, hop_length=hop_length, center=center)
istft = T.InverseSpectrogram(n_fft=n_fft, hop_length=hop_length, center=center)

spec = stft(waveform)
recon = istft(spec)
print("spec.shape, recon.shape = ", spec.shape, recon.shape)

assert waveform.shape == recon.shape, \
    f"Expected waveform.shape ({waveform.shape}) == recon.shape ({recon.shape})"
```

Output:

```
waveform.shape = torch.Size([1, 55728])
spec.shape, recon.shape = torch.Size([1, 513, 218]) torch.Size([1, 55552])
AssertionError: Expected waveform.shape (torch.Size([1, 55728])) == recon.shape (torch.Size([1, 55552]))
```
Again, I can imagine why recon.shape[-1] might be greater than 55728, but I'm having a hard time understanding why it'd be less.
Can you help me?
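Purely as a numerical observation (I don't know if this is actually what happens internally), the output length matches truncation to a multiple of hop_length:

```python
num_samples, hop_length = 55728, 256
n_frames = 1 + num_samples // hop_length   # 218, matches spec.shape[-1]
trimmed = (n_frames - 1) * hop_length      # 55552, matches recon.shape[-1]
print(n_frames, trimmed)                   # 218 55552
```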
Also, note that the current documentation for InverseSpectrogram doesn't say anything regarding the length kwarg that was mentioned in the closed issue I referenced earlier.
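If that length argument is the intended route, I'd expect the usage to look roughly like this (reusing the variables from the example above; not confirmed against the current docs):

```python
# Assuming InverseSpectrogram's forward call accepts a length argument,
# as the referenced issue suggests (I couldn't find it documented):
recon = istft(spec, length=waveform.shape[-1])
assert recon.shape == waveform.shape
```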
Thanks for your work on this!
Versions
$ python collect_env.py
Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.24.3
Libc version: glibc-2.31
Python version: 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-1023-aws-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
[NOTE: output censored manually: additional GPUs removed from this list]
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] nwt-pytorch==0.0.4
[pip3] perceiver-pytorch==0.8.7
[pip3] pytorch-lightning==1.8.6
[pip3] torch==1.13.1
[pip3] torchaudio==0.13.1
[pip3] torchmetrics==0.11.0
[pip3] torchvision==0.14.1
[pip3] vector-quantize-pytorch==0.10.14
[conda] Could not collect