
Complex STFT transform from spectrogram #327

Merged (9 commits into pytorch:master, Nov 18, 2019)

Conversation

vincentqb (Contributor) commented Nov 1, 2019

We offer a layer that applies the complex STFT without the power/normalization done in spectrogram. However, we do not want to overload the name "STFT" that already exists in pytorch, as was done in the original proposal from #285.

@ksanjeevan -- thoughts?

CC @ksanjeevan @keunwoochoi @cpuhrsch
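For illustration, a minimal sketch of how the transform discussed here would be used once `power=None` disables the power step. The parameter values are arbitrary, and the output layout is an assumption: around the time of this PR the complex result is a real tensor with a trailing dimension of size 2 (real, imag); later torchaudio releases return a complex dtype instead.

```python
import torch
import torchaudio

waveform = torch.randn(1, 16000)  # (channel, time)

# Magnitude/power spectrogram (the existing behavior).
spec = torchaudio.transforms.Spectrogram(n_fft=400, power=2)(waveform)
# -> (channel, n_fft // 2 + 1, n_frames)

# Complex STFT: power=None skips the magnitude/power step.
stft = torchaudio.transforms.Spectrogram(n_fft=400, power=None)(waveform)
# -> (channel, n_fft // 2 + 1, n_frames, 2) in the layout assumed here,
#    with the last dimension holding (real, imag)
```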

@vincentqb vincentqb changed the title from "Complex STFT transform from spectrogram." to "Complex STFT transform from spectrogram" on Nov 1, 2019
@vincentqb vincentqb force-pushed the batchedstft branch 2 times, most recently from dbe4f0c to 008791c on November 6, 2019 19:55
ksanjeevan (Contributor)

@vincentqb Yeah, that would work! I'm not seeing anything in Spectrogram that checks that power is an int or anything like that, so with just this change to the functional we could have a complex STFT transform, correct?

# default values are consistent with librosa.core.spectrum._spectrogram
spec_f = _stft(
    waveform, n_fft, hop_length, win_length, window, True, "reflect", False, True
)

# unpack batch
spec_f = spec_f.reshape(shape[:-1] + spec_f.shape[-3:])
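For context, here is a self-contained sketch of the pack/compute/unpack pattern the snippet above belongs to. The function name and default values are illustrative, and `torch.stft` is called with the current `return_complex`/`view_as_real` API rather than the 2019 one, but the reshape logic is the same.

```python
import torch

def spectrogram_sketch(waveform, n_fft=400, hop_length=200):
    # Pack: collapse all leading dimensions into one batch dimension,
    # since torch.stft only accepts 1D or 2D input.
    shape = waveform.size()
    waveform = waveform.reshape(-1, shape[-1])

    window = torch.hann_window(n_fft)
    spec_f = torch.stft(
        waveform, n_fft, hop_length, win_length=n_fft, window=window,
        center=True, pad_mode="reflect", normalized=False, onesided=True,
        return_complex=True,
    )
    spec_f = torch.view_as_real(spec_f)  # (batch, freq, frame, 2)

    # Unpack: restore the original leading dimensions; the STFT adds
    # (freq, frame, 2) as the last three dimensions.
    return spec_f.reshape(shape[:-1] + spec_f.shape[-3:])

out = spectrogram_sketch(torch.randn(3, 2, 16000))
print(out.shape)  # torch.Size([3, 2, 201, 81, 2])
```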
Contributor
@vincentqb is support for batching going to be done optionally then? 😁 I can add it to the augmentation transforms if needed.

vincentqb (Contributor Author)

Yeah, we decided to go ahead and add batching using reshape now :) We'll update this code once nested tensors come out.

vincentqb (Contributor Author)

> @vincentqb Yeah, that would work! I'm not seeing anything in Spectrogram that checks that power is an int or anything like that, so with just this change to the functional we could have a complex STFT transform, correct?

Right, there should only be a check for None to disable the power computation.
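A rough sketch of what that check could look like inside the functional (illustrative only, not the exact torchaudio code); `spec_f` is assumed to hold the complex STFT with real and imaginary parts in its last dimension:

```python
# Only collapse to a magnitude/power spectrogram when `power` is given;
# with power=None the complex STFT passes through untouched.
if power is not None:
    spec_f = spec_f.pow(2.0).sum(-1).pow(0.5 * power)  # |z| ** power
```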

@vincentqb vincentqb marked this pull request as ready for review November 7, 2019 18:34
r"""
spectrogram(waveform, pad, window, n_fft, hop_length, win_length, power, normalized)

Create a spectrogram from a raw audio signal.

Args:
waveform (torch.Tensor): Tensor of audio of dimension (channel, time)
waveform (torch.Tensor): Tensor of audio of dimension ([batch,] channel, time)


It's not directly about this PR, but the optional last dimension for the complex STFT should probably be mentioned as well.


The convention was to specify the optional batch dimension as (*, channel, time). Maybe it's nicer to follow that?

vincentqb (Contributor Author)

Yes, I agree, thanks for pointing this out!

@@ -218,42 +218,51 @@ def istft(
def spectrogram(
    waveform, pad, window, n_fft, hop_length, win_length, power, normalized
):
-    # type: (Tensor, int, Tensor, int, int, int, int, bool) -> Tensor
+    # type: (Tensor, int, Tensor, int, int, int, Optional[int], bool) -> Tensor
    r"""
    spectrogram(waveform, pad, window, n_fft, hop_length, win_length, power, normalized)

    Create a spectrogram from a raw audio signal.


I think the {optional / magnitude} part should also be mentioned here.


So should the optional batch dimension. Something like:

Create a spectrogram or a batch of spectrograms from a raw audio signal. The spectrogram can be either magnitude-only or complex.

waveform, sample_rate = torchaudio.load(self.test_filepath) # (2, 278756), 44100

# Single then transform then batch
expected = transforms.Spectrogram()(waveform).unsqueeze(0).repeat(3,1,1,1)


Linter will fix this anyway, but (3,1,1,1) --> (3, 1, 1, 1)

expected = transforms.Spectrogram()(waveform).unsqueeze(0).repeat(3,1,1,1)

# Batch then transform
waveform = waveform.unsqueeze(0).repeat(3,1,1)


also (3, 1, 1)
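Put together, the batch-consistency check being reviewed above looks roughly like this (a synthetic waveform stands in for self.test_filepath, and the shapes are illustrative):

```python
import torch
import torchaudio

waveform = torch.randn(2, 44100)  # (channel, time)

# Single example through the transform, then repeated along a new batch dim.
expected = torchaudio.transforms.Spectrogram()(waveform).unsqueeze(0).repeat(3, 1, 1, 1)

# Batch first, then the transform.
batched = torchaudio.transforms.Spectrogram()(waveform.unsqueeze(0).repeat(3, 1, 1))

assert torch.allclose(expected, batched)
```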


Args:
-    waveform (torch.Tensor): Tensor of audio of dimension (channel, time)
+    waveform (torch.Tensor): Tensor of audio of dimension (*, channel, time)
Contributor

I'd explicitly call this 'batch', i.e. ([batch], channel, time) or such.

vincentqb (Contributor Author)

This is more general than batch. It works with a tensor of any dimension ending in (channel, time). In fact, this could be updated to (*, time) by making channel optional. Saying "(batch, channel, time) or (channel, time) or (time)" is a little heavy. And saying "([batch], [channel], time)" hides the generality. Thoughts?


I can't think of anything more compact. Just following what it is, something like "...audio of dimensions (*, time), e.g., (time), (channel, time), (batch, channel, time)"? It's kinda verbose, but it reduces the ambiguity :)

vincentqb (Contributor Author)

I agree with reducing ambiguity :)

Is (..., time) clearer than (*, time)?


Hmm... maybe? (In case I confused you, I meant "...audio of dimensions (*, time), e.g., (time), ...".) The current torch.stft doc uses (*, time, ...) to indicate an optional batch dimension, which is more limited than what we want here. Probably (..., time) makes more sense then.

vincentqb (Contributor Author) Nov 12, 2019

Good point, the better globbing analogy would be (**, time), not (*, time), for (..., time) -- but (..., time) is the most transparent :)
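To make the "(..., time)" convention concrete, a small sketch assuming the post-PR behavior where any number of leading dimensions is allowed (whether a bare (time) input is accepted depends on the implementation, so it is left out here):

```python
import torch
import torchaudio

spec = torchaudio.transforms.Spectrogram()

# The trailing time axis is consumed; every leading dimension is preserved.
# Expected shapes below assume the defaults n_fft=400, hop_length=200.
for shape in [(2, 16000), (3, 2, 16000), (4, 3, 2, 16000)]:
    out = spec(torch.randn(*shape))
    print(shape, "->", tuple(out.shape))
# (2, 16000)       -> (2, 201, 81)
# (3, 2, 16000)    -> (3, 2, 201, 81)
# (4, 3, 2, 16000) -> (4, 3, 2, 201, 81)
```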

cpuhrsch (Contributor) left a comment

LGTM

@vincentqb vincentqb merged commit 1500d4e into pytorch:master Nov 18, 2019