Why does transforms.TimeStretch return a tensor of type complex64? #3688

Closed
kuraga opened this issue Nov 5, 2023 · 4 comments

@kuraga
Contributor

kuraga commented Nov 5, 2023

🐛 Describe the bug

Good day!

https://pytorch.org/audio/2.1.0/generated/torchaudio.transforms.TimeStretch.html#torchaudio.transforms.TimeStretch.forward:

Stretched spectrogram. The resulting tensor is of the same dtype as the input spectrogram, but the number of frames is changed to ceil(num_frame / rate).

But:

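import torch
import torchaudio

x = torch.randn(1, 16000)  # placeholder waveform; any (channel, time) tensor works
s = torchaudio.transforms.Spectrogram()(x)
s.dtype  # => torch.float32

t = torchaudio.transforms.TimeStretch(fixed_rate=0.9)(s)
t.dtype  # => torch.complex64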

Should I file a bug report, or am I misunderstanding time stretching?

(previously posted at the forum)

Versions

torchaudio 2.1.1 from Google Colab

@mthrok
Collaborator

mthrok commented Nov 8, 2023

TimeStretch (or the underlying phase_vocoder) expects the input to be a raw, complex-valued spectrogram (one computed with power=None), because it manipulates the signal in the complex plane based on the phase information. It alters both phase and magnitude, then returns the complex spectrogram.

def phase_vocoder(complex_specgrams: Tensor, rate: float, phase_advance: Tensor) -> Tensor:

torchaudio.transforms.Spectrogram has a power argument with a default value of 2, which produces a real-valued power spectrogram and discards the phase information. In this case, TimeStretch interprets the input signal as having zero phase everywhere.

I feel it would be more user-friendly for TimeStretch to warn about or reject real-valued spectrogram input.

@kuraga
Contributor Author

kuraga commented Nov 9, 2023

@mthrok , thanks!

https://pytorch.org/audio/2.1.0/generated/torchaudio.transforms.TimeStretch.html:

  1. Seems like we need to show the way of getting the picture in the Example.

  2. And fix the statement:

Stretched spectrogram. The resulting tensor is of the same dtype as the input spectrogram, but the number of frames is changed to ceil(num_frame / rate).

  3. Also:

hop_length (int or None, optional) – Length of hop between STFT windows. (Default: win_length // 2)

But there is no win_length argument.

  4. Your idea about the warning.

@mthrok
Collaborator

mthrok commented Nov 10, 2023

@kuraga

#3694 will fix the documentation and #3695 will add a warning if a real-valued tensor is passed.

Seems like we need to show the way of getting the picture in the Example.

That is covered in the Feature Extraction tutorial in the same documentation, so I will defer to it.

@kuraga
Contributor Author

kuraga commented Nov 10, 2023

@mthrok

#3694 will fix the documentation and #3695 will add a warning if a real-valued tensor is passed.

Wow-wow, thanks!!

Seems like we need to show the way of getting the picture in the Example.

That is covered in the Feature Extraction tutorial in the same documentation, so I will defer to it.

I meant that the librosa.amplitude_to_db call (or the .abs().pow(2) call, etc.) isn't shown.
But now I see that visualisation details aren't covered in the methods' documentation.

@kuraga kuraga closed this as completed Nov 10, 2023