Inconsistent default and TS-incompatible lazy behavior in MelScale #1454

@mthrok

  1. All the spectrogram-related transforms default the frequency-bin parameter to n_fft: int = 400, whereas MelScale and InverseMelScale use n_stft: Optional[int] = None.

  2. In MelScale, when n_stft=None, forward tries to resize the buffer lazily, but this makes the TorchScripted version fail once it has been saved to and loaded from a file.

            specgram (Tensor): A spectrogram STFT of dimension (..., freq, time).

        Returns:
            Tensor: Mel frequency spectrogram of size (..., ``n_mels``, time).
        """
        # pack batch
        shape = specgram.size()
        specgram = specgram.reshape(-1, shape[-2], shape[-1])

        if self.fb.numel() == 0:
            tmp_fb = F.create_fb_matrix(specgram.size(1), self.f_min, self.f_max,
                                        self.n_mels, self.sample_rate, self.norm,
                                        self.mel_scale)
            # Attributes cannot be reassigned outside __init__ so workaround
            self.fb.resize_(tmp_fb.size())
            self.fb.copy_(tmp_fb)

        # (channel, frequency, time).transpose(...) dot (frequency, n_mels)
        # -> (channel, time, n_mels).transpose(...)
        mel_specgram = torch.matmul(specgram.transpose(1, 2), self.fb).transpose(1, 2)

https://app.circleci.com/pipelines/github/pytorch/audio/5716/workflows/fe399658-a33f-47f2-8227-7750b2f0af2f/jobs/197223/tests#failed-test-0

>       return callable(*args, **kwargs)
E       RuntimeError: The following operation failed in the TorchScript interpreter.
E       Traceback of TorchScript, serialized code (most recent call last):
E         File "code/__torch__/torchaudio/transforms.py", line 20, in forward
E           if torch.eq(torch.numel(self.fb), 0):
E             tmp_fb = _0(torch.size(specgram0, 1), 0., 8000., 128, 16000, self.norm, self.mel_scale, )
E             _1 = torch.resize_(self.fb, torch.size(tmp_fb), memory_format=None)
E                  ~~~~~~~~~~~~~ <--- HERE
E             _2 = torch.copy_(self.fb, tmp_fb, False)
E           else:
E       
E       Traceback of TorchScript, original code (most recent call last):
E         File "/root/project/env/lib/python3.9/site-packages/torchaudio-0.9.0a0+bb886e7-py3.9-linux-x86_64.egg/torchaudio/transforms.py", line 302, in forward
E                                               self.mel_scale)
E                   # Attributes cannot be reassigned outside __init__ so workaround
E                   self.fb.resize_(tmp_fb.size())
E                   ~~~~~~~~~~~~~~~ <--- HERE
E                   self.fb.copy_(tmp_fb)
E           
E       RuntimeError: Trying to resize storage that is not resizable at /opt/conda/conda-bld/pytorch_1617951974812/work/aten/src/TH/THStorageFunctions.cpp:87

To reproduce:

  1. Construct MelScale with n_stft=None.
  2. Script the transform and save it to a file.
  3. Load the transform from the file and feed it a spectrogram Tensor.
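
The steps above can be sketched without torchaudio. This is a minimal sketch, not the library code: `LazyScale` is a hypothetical stand-in that copies only the lazy `resize_` pattern from `MelScale.forward`, and a matrix of ones stands in for the real filter bank. Whether the final call raises depends on the PyTorch build; on the build shown in the traceback above it fails with the "not resizable" error, so the sketch records the outcome instead of assuming it.

```python
import os
import tempfile

import torch


class LazyScale(torch.nn.Module):
    """Hypothetical stand-in for MelScale(n_stft=None); not torchaudio code."""

    def __init__(self, n_mels: int = 4) -> None:
        super().__init__()
        self.n_mels = n_mels
        # Empty buffer, to be filled on the first forward call.
        self.register_buffer("fb", torch.empty(0))

    def forward(self, specgram: torch.Tensor) -> torch.Tensor:
        if self.fb.numel() == 0:
            # Placeholder values; the real code calls F.create_fb_matrix here.
            tmp_fb = torch.ones([specgram.size(-2), self.n_mels])
            # The in-place resize that the issue is about.
            self.fb.resize_(tmp_fb.size())
            self.fb.copy_(tmp_fb)
        return torch.matmul(specgram.transpose(-1, -2), self.fb).transpose(-1, -2)


def roundtrip_outcome() -> str:
    """Script, save, load, then call the module; report what happened."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "lazy_scale.pt")
        torch.jit.save(torch.jit.script(LazyScale()), path)
        loaded = torch.jit.load(path)
        try:
            loaded(torch.rand(1, 201, 5))
        except RuntimeError:
            return "failed"
        return "ok"


# The eager (unscripted) module handles the lazy resize without trouble.
eager_out = LazyScale()(torch.rand(1, 201, 5))
outcome = roundtrip_outcome()
print(eager_out.shape, outcome)
```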

Once the transform has been scripted and dumped, there is no way to fix this issue on the saved artifact.
Library code should not rely on hacks that can leave a transform in such a stuck state.

As a fix, since n_fft defaults to 400 everywhere, n_stft should default to 201 (n_fft // 2 + 1) as well.
This removes the need for the resize_ hack above.
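
A minimal sketch of what the proposed fix amounts to, assuming the filter bank is built once in `__init__` from a concrete n_stft. `EagerScale` and `default_n_stft` are hypothetical names used only for illustration, and a uniform matrix stands in for the output of `F.create_fb_matrix`. Because the buffer already has its final shape when the module is scripted, forward never has to resize it.

```python
import io

import torch


def default_n_stft(n_fft: int = 400) -> int:
    # A one-sided STFT of size n_fft has n_fft // 2 + 1 frequency bins.
    return n_fft // 2 + 1


class EagerScale(torch.nn.Module):
    """Hypothetical module that builds its filter bank eagerly in __init__."""

    def __init__(self, n_mels: int = 128, n_stft: int = 201) -> None:
        super().__init__()
        # Placeholder filter bank; the real one would come from F.create_fb_matrix.
        self.register_buffer("fb", torch.ones(n_stft, n_mels) / n_stft)

    def forward(self, specgram: torch.Tensor) -> torch.Tensor:
        # pack batch
        shape = specgram.size()
        specgram = specgram.reshape(-1, shape[-2], shape[-1])
        # (channel, frequency, time) -> (channel, n_mels, time); no resize needed
        mel = torch.matmul(specgram.transpose(1, 2), self.fb).transpose(1, 2)
        # unpack batch
        return mel.reshape(shape[:-2] + mel.shape[-2:])


# Script, save to an in-memory buffer, load, and run.
buffer = io.BytesIO()
torch.jit.save(torch.jit.script(EagerScale(n_stft=default_n_stft())), buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)
out = loaded(torch.rand(2, 201, 7))
print(out.shape)
```

The trade-off is that the number of frequency bins must be known at construction time, which is already true of every transform that takes n_fft.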
