Complex STFT transform from spectrogram #327

vincentqb · 2019-11-01T19:42:50Z

We offer a layer that applies a complex STFT without the power computation and normalization done in Spectrogram. However, we do not want to overload the name "STFT", which already exists in PyTorch, as the original proposal in #285 did.

@ksanjeevan -- thoughts?

CC @ksanjeevan @keunwoochoi @cpuhrsch

ksanjeevan · 2019-11-06T23:02:31Z

@vincentqb Yeah, that would work! I'm not seeing anything in Spectrogram that checks that power is an int or anything like that, so with just this change to the functional we could have a complex STFT transform, correct?

ksanjeevan · 2019-11-06T23:05:24Z

torchaudio/functional.py

    # default values are consistent with librosa.core.spectrum._spectrogram
    spec_f = _stft(
        waveform, n_fft, hop_length, win_length, window, True, "reflect", False, True
    )

+    # unpack batch
+    spec_f = spec_f.reshape(shape[:-1] + spec_f.shape[-3:])


@vincentqb is support for batching going to be done optionally then? 😁 I can add it to the augmentation transforms if needed.

Yeah, we decided to go ahead and add batching using reshape now :) We'll then update this code when nested tensors come out.
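
The reshape-based batching discussed here can be sketched on shapes alone. This is a minimal illustration, not torchaudio's code: it assumes the STFT result has three trailing dimensions (freq, frames, real/imag), as the spec_f.shape[-3:] slice in the diff suggests, and the helper names pack_shape and unpack_shape are hypothetical.

```python
from math import prod


def pack_shape(shape):
    """Collapse all leading dimensions into one so a 2-D-only op can run."""
    # (batch, channel, time) -> (batch * channel, time); (time,) -> (1, time)
    return (prod(shape[:-1]), shape[-1])


def unpack_shape(original_shape, result_shape, trailing=3):
    """Restore the leading dimensions, mirroring shape[:-1] + spec_f.shape[-3:]."""
    return tuple(original_shape[:-1]) + tuple(result_shape[-trailing:])


if __name__ == "__main__":
    waveform_shape = (3, 2, 16000)          # hypothetical (batch, channel, time)
    packed = pack_shape(waveform_shape)     # (6, 16000)
    stft_shape = (packed[0], 201, 81, 2)    # hypothetical (N, freq, frames, 2)
    print(unpack_shape(waveform_shape, stft_shape))  # (3, 2, 201, 81, 2)
```

Because only the leading dimensions are folded and restored, the same two steps cover inputs with zero, one, or several leading dimensions.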

vincentqb · 2019-11-07T01:59:02Z

> @vincentqb Yeah that would work! So I'm not seeing anything on Spectrogram that checks that power is int or anything like that so just with this change of the functional we could have a complex stft transform correct?

Right, the only check should be for None, which disables the power computation.
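
To make the None check concrete, here is a rough pure-Python sketch. It is not the torchaudio implementation: naive_stft is a hypothetical rectangular-window STFT without padding, and this spectrogram function only mimics the behavior described above, where power=None returns the complex values and any other power returns magnitudes raised to that power.

```python
import cmath
import math


def naive_stft(signal, n_fft, hop_length):
    """Hypothetical helper: rectangular-window STFT, no padding, one-sided."""
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop_length):
        frame = signal[start:start + n_fft]
        bins = [
            sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, x in enumerate(frame))
            for k in range(n_fft // 2 + 1)
        ]
        frames.append(bins)
    return frames


def spectrogram(signal, n_fft=8, hop_length=4, power=2):
    """Return complex STFT values if power is None, else |STFT| ** power."""
    spec = naive_stft(signal, n_fft, hop_length)
    if power is None:
        # Skip the power computation entirely and keep the phase.
        return spec
    return [[abs(c) ** power for c in frame] for frame in spec]


if __name__ == "__main__":
    # A cosine sitting exactly on bin 2 of an 8-point DFT.
    signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
    print(type(spectrogram(signal, power=None)[0][2]).__name__)  # complex
    print(type(spectrogram(signal, power=2)[0][2]).__name__)     # float
```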

keunwoochoi · 2019-11-10T01:38:38Z

torchaudio/functional.py

    r"""
    spectrogram(waveform, pad, window, n_fft, hop_length, win_length, power, normalized)

    Create a spectrogram from a raw audio signal.

    Args:
-        waveform (torch.Tensor): Tensor of audio of dimension (channel, time)
+        waveform (torch.Tensor): Tensor of audio of dimension ([batch,] channel, time)


It's not directly about this PR, but probably the optional last dimension for complex stft should be mentioned as well.

The convention was specifying the optional batch dimension as (*, channel, time). Maybe it's nicer to follow?

Yes, I agree, thanks for pointing this out!

keunwoochoi · 2019-11-10T01:39:20Z

torchaudio/functional.py

@@ -218,42 +218,51 @@ def istft(
 def spectrogram(
    waveform, pad, window, n_fft, hop_length, win_length, power, normalized
 ):
-    # type: (Tensor, int, Tensor, int, int, int, int, bool) -> Tensor
+    # type: (Tensor, int, Tensor, int, int, int, Optional[int], bool) -> Tensor
    r"""
    spectrogram(waveform, pad, window, n_fft, hop_length, win_length, power, normalized)

    Create a spectrogram from a raw audio signal.


I think the {optional / magnitude} behavior should also be mentioned here.

The same goes for the optional batch dimension. So, something like:

Create a spectrogram or a batch of spectrograms from a raw audio signal. The spectrogram can be either magnitude-only or complex.

keunwoochoi · 2019-11-10T01:41:30Z

test/test_transforms.py

+        waveform, sample_rate = torchaudio.load(self.test_filepath)  # (2, 278756), 44100
+
+        # Single then transform then batch
+        expected = transforms.Spectrogram()(waveform).unsqueeze(0).repeat(3,1,1,1)


Linter will fix this anyway, but (3,1,1,1) --> (3, 1, 1, 1)

keunwoochoi · 2019-11-10T01:41:43Z

test/test_transforms.py

+        expected = transforms.Spectrogram()(waveform).unsqueeze(0).repeat(3,1,1,1)
+
+        # Batch then transform
+        waveform = waveform.unsqueeze(0).repeat(3,1,1)


also (3, 1, 1)

cpuhrsch · 2019-11-11T18:55:21Z

torchaudio/functional.py


    Args:
-        waveform (torch.Tensor): Tensor of audio of dimension (channel, time)
+        waveform (torch.Tensor): Tensor of audio of dimension (*, channel, time)


I'd explicitly call this 'batch', i.e. ([batch], channel, time) or such.

This is more general than batch. It works with a tensor of any dimension ending in (channel, time). In fact, this could be updated to (*, time) by making channel optional. Saying "(batch, channel, time) or (channel, time) or (time)" is a little heavy. And saying "([batch], [channel], time)" hides the generality. Thoughts?

I can't think of anything more compact. How about just spelling out what it is, something like "(*, time), e.g., (time), (channel, time), (batch, channel, time)"? It's kinda verbose, but worth it if we can reduce the ambiguity :)

I agree with reducing ambiguity :)

Is (..., time) clearer than (*, time)?

Hmm... maybe? (In case I confused you, I meant "audio of dimension (*, time), e.g., (time), ...".) The current torch.stft doc uses (*, time, ..) to indicate an optional batch dimension, which is more limited than what we want here. So (..., time) probably makes more sense.

Good point, the better globbing to use would be (**, time) not (*, time) for (..., time) -- but (..., time) is most transparent :)
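
As an illustration of what the (..., time) convention promises, the sketch below applies a per-signal function over the last axis of nested lists with any number of leading dimensions. It is only a stand-in for tensor code: map_last_axis and energy are hypothetical names, and the final check mirrors the idea of the test in this PR that transforming and then batching should match batching and then transforming.

```python
def map_last_axis(data, fn):
    """Apply fn to each innermost list (the time axis), keeping leading dims."""
    if data and isinstance(data[0], list):
        return [map_last_axis(item, fn) for item in data]
    return fn(data)


def energy(signal):
    """Stand-in for a per-signal transform: sum of squares over time."""
    return sum(x * x for x in signal)


if __name__ == "__main__":
    mono = [1.0, 2.0, 3.0]                # (time)
    stereo = [mono, [0.0, 1.0, 0.0]]      # (channel, time)
    batch = [stereo, stereo, stereo]      # (batch, channel, time)
    print(map_last_axis(mono, energy))    # 14.0
    print(map_last_axis(stereo, energy))  # [14.0, 1.0]
    # Transforming then batching matches batching then transforming.
    print(map_last_axis(batch, energy) == [map_last_axis(stereo, energy)] * 3)  # True
```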

cpuhrsch

LGTM

vincentqb changed the title ~~Complex STFT transform from spectrogram.~~ Complex STFT transform from spectrogram Nov 1, 2019

vincentqb force-pushed the batchedstft branch 2 times, most recently from dbe4f0c to 008791c Compare November 6, 2019 19:55

ksanjeevan reviewed Nov 6, 2019

vincentqb marked this pull request as ready for review November 7, 2019 18:34

keunwoochoi reviewed Nov 10, 2019

vincentqb mentioned this pull request Nov 11, 2019

Batching for transforms #337

Merged

cpuhrsch reviewed Nov 11, 2019

vincentqb force-pushed the batchedstft branch from cd8b0bc to a67ed63 Compare November 18, 2019 16:00

vincentqb added 9 commits November 18, 2019 11:37

STFT transform and function from pytorch#285

1f87602

merge options in existing functionality.

594823b

Revert "STFT transform and function from pytorch#285"

18e55d0

This reverts commit 008791c.

remove dimension 2 check. add test.

8fbb982

writing batch for dimension name.

4dfad93

* convention for batch.

270bb76

upadte description.

f1fb7cd

using ...

9409257

update spectrogram test.

08b268b

vincentqb force-pushed the batchedstft branch from a67ed63 to 08b268b Compare November 18, 2019 16:37

cpuhrsch approved these changes Nov 18, 2019

vincentqb merged commit 1500d4e into pytorch:master Nov 18, 2019

vincentqb mentioned this pull request Dec 20, 2019

Update audio preprocessing tutorial pytorch/tutorials#797

Merged
