Adopt native complex dtype in griffinlim #1368
Conversation
torchaudio/functional/functional.py
Outdated
    batch, freq, frames = specgram.size()
    if rand_init:
-        angles = 2 * math.pi * torch.rand(batch, freq, frames)
+        angles = torch.rand(batch, freq, frames, dtype=torch.cfloat, device=specgram.device)
TODO: check to see if cfloat is preferred.
Yeah, or even take a step back and consider why we were creating float tensors by default, and whether it makes sense to add a dtype kwarg that defaults to float.
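For concreteness, a hypothetical sketch of what exposing such a dtype kwarg could look like (the init_angles helper and its signature are made up for illustration and are not part of torchaudio):

```python
import torch

def init_angles(specgram: torch.Tensor, rand_init: bool = True,
                dtype: torch.dtype = torch.cfloat) -> torch.Tensor:
    # Hypothetical helper: let the caller choose the dtype of the initial
    # phase tensor instead of hard-coding it inside griffinlim.
    batch, freq, frames = specgram.size()
    if rand_init:
        return torch.rand(batch, freq, frames, dtype=dtype, device=specgram.device)
    # Deterministic initialization: every phase starts at angle 0 (value 1).
    return torch.full((batch, freq, frames), 1.0, dtype=dtype, device=specgram.device)
```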
I looked into this and the following are my findings:
- The use of cfloat here is fine when the input Tensor is float32 or float64. This is because the dtype of angles is updated to the correct one in the next for loop. Specifically, specgram * angles there produces the same dtype as specgram, so after the first iteration all the tensors have a dtype matching the input.
- The above suggests that griffinlim will not work for a float16 input. But I did test it, and pow is not implemented for float16 (see the details below), so I do not think it's an issue.

With these findings, I suggest proceeding with the current approach (using cfloat for the intermediate variables), since it works for float32 and float64. Also note that float16 is in general not tested in torchaudio.
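For reference, the promotion behavior described above can be checked with a standalone snippet like the one below (illustrative only, not torchaudio code):

```python
import torch

# cfloat angles multiplied by a float64 spectrogram are promoted to cdouble,
# so after the first iteration the intermediates match the input precision.
specgram = torch.rand(2, 513, 10, dtype=torch.float64)
angles = torch.rand(2, 513, 10, dtype=torch.cfloat)

print((specgram * angles).dtype)                      # torch.complex128
print((specgram.to(torch.float32) * angles).dtype)    # torch.complex64
```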
The following is the result of running griffinlim for the different dtypes before and after this PR.
Script
import torch
import torchaudio.transforms as T

def test(waveform, dtype):
    n_fft = 1024
    win_length = None
    hop_length = 512

    spectrogram = T.Spectrogram(
        n_fft=n_fft,
        win_length=win_length,
        hop_length=hop_length,
    )
    griffin_lim = T.GriffinLim(
        n_fft=n_fft,
        win_length=win_length,
        hop_length=hop_length,
    )

    spec = spectrogram(waveform).to(dtype)
    print('*******')
    print('input:', spec.dtype)
    recon = griffin_lim(spec)
    print('output:', recon.dtype)
    assert recon.dtype == spec.dtype
    return recon

torch.random.manual_seed(0)
waveform = torch.randn(2, 1024)
test(waveform, torch.float64)
test(waveform, torch.float32)
test(waveform, torch.float16)
Result
Before applying this PR (commit 0433b7a)
*******
input: torch.float64
/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/functional.py:655: UserWarning: istft will require a complex-valued input tensor in a future PyTorch release. Matching the output from stft with return_complex=True. (Triggered internally at /opt/conda/conda-bld/pytorch_1617001512472/work/aten/src/ATen/native/SpectralOps.cpp:807.)
return _VF.istft(input, n_fft, hop_length, win_length, window, center, # type: ignore
output: torch.float64
*******
input: torch.float32
output: torch.float32
*******
input: torch.float16
Traceback (most recent call last):
File "foo.py", line 36, in <module>
test(waveform, torch.float16)
File "foo.py", line 24, in test
recon = griffin_lim(spec)
File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/moto/torchaudio/torchaudio/transforms.py", line 189, in forward
return F.griffinlim(specgram, self.window, self.n_fft, self.hop_length, self.win_length, self.power,
File "/scratch/moto/torchaudio/torchaudio/functional/functional.py", line 168, in griffinlim
specgram = specgram.pow(1 / power)
RuntimeError: "pow" not implemented for 'Half'
After applying this PR
*******
input: torch.float64
output: torch.float64
*******
input: torch.float32
output: torch.float32
*******
input: torch.float16
Traceback (most recent call last):
File "foo.py", line 36, in <module>
test(waveform, torch.float16)
File "foo.py", line 24, in test
recon = griffin_lim(spec)
File "/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1015, in _call_impl
return forward_call(*input, **kwargs)
File "/scratch/moto/torchaudio/torchaudio/transforms.py", line 189, in forward
return F.griffinlim(specgram, self.window, self.n_fft, self.hop_length, self.win_length, self.power,
File "/scratch/moto/torchaudio/torchaudio/functional/functional.py", line 168, in griffinlim
specgram = specgram.pow(1 / power)
RuntimeError: "pow" not implemented for 'Half'
This PR pytorch/pytorch#50999 recently added pow for torch.float16.
> The use of cfloat here is fine when the input Tensor is float32 or float64. This is because the dtype of angles is updated to the correct one in the next for loop. Specifically, specgram * angles there produces the same dtype as specgram, so after the first iteration all the tensors have a dtype matching the input.

So in that case, I think it makes sense to instead create angles with dtype specgram.dtype, since the multiplication in the for loop (specgram * angles) promotes all input tensors to the same dtype anyway.
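The diff further down ends up creating angles with a complex dtype derived from specgram.dtype via _get_complex_dtype; a minimal sketch of such a mapping could look like the following (the body here is an assumption, not the actual torchaudio helper):

```python
import torch

def _get_complex_dtype(real_dtype: torch.dtype) -> torch.dtype:
    # Map a real floating-point dtype to its complex counterpart so that
    # `angles` matches the precision of `specgram` from the start.
    if real_dtype == torch.float64:
        return torch.complex128
    if real_dtype == torch.float32:
        return torch.complex64
    # float16 is left out: the discussion above notes that griffinlim does not
    # currently work for half precision anyway (pow is missing on CPU).
    raise ValueError(f'Unexpected dtype: {real_dtype}')

# Usage in the spirit of the diff below (illustrative):
# angles = torch.rand(specgram.size(),
#                     dtype=_get_complex_dtype(specgram.dtype),
#                     device=specgram.device)
```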
Oh, this could be the reason why the autograd test in #1421 is failing.
> This PR pytorch/pytorch#50999 recently added pow for torch.float16.

Looks like it's still landing? pytorch/pytorch#55280

> So in that case, I think it makes sense to instead create angles with dtype specgram.dtype, since the multiplication in the for loop (specgram * angles) promotes all input tensors to the same dtype anyway.

Added the necessary change.
Sorry, pytorch/pytorch#55280 would hopefully land soon to enable pow for float16 on CPU. pow for float16 is already supported on CUDA.
@imaginary-person Thanks for the info.
Needs JIT, autograd, and functional correctness tests for complex to verify that everything works as expected.
Force-pushed from 2baec15 to 218db3d.
@mthrok this PR needs performance benchmarking!
LGTM! Thanks Moto. Let's merge this after #1421 is merged.
-        angles = 2 * math.pi * torch.rand(batch, freq, frames)
+        angles = torch.rand(
+            specgram.size(),
+            dtype=_get_complex_dtype(specgram.dtype), device=specgram.device)
Wait, the angles here should be a floating-point tensor, no?
No, angles is complex-valued. The Griffin-Lim algorithm iteratively optimizes the phase (i.e. the direction in the complex plane) of each element of the given spectrogram, so that at the end istft gives back the original waveform.
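As a rough illustration of that loop, a heavily simplified sketch (assumed STFT parameters, momentum omitted; not the actual torchaudio implementation):

```python
import torch

def griffin_lim_sketch(specgram: torch.Tensor, n_fft: int, hop: int,
                       window: torch.Tensor, n_iter: int = 32) -> torch.Tensor:
    # specgram: real-valued magnitude spectrogram of shape (freq, frames).
    angles = torch.rand(specgram.size(), dtype=torch.cfloat)
    for _ in range(n_iter):
        # Impose the known magnitude on the current phase estimate,
        # go back to the time domain, then re-analyze to get a new phase.
        inverse = torch.istft(specgram * angles, n_fft=n_fft, hop_length=hop,
                              window=window)
        rebuilt = torch.stft(inverse, n_fft=n_fft, hop_length=hop,
                             window=window, return_complex=True)
        # Keep only the phase (unit-magnitude complex numbers).
        angles = rebuilt / rebuilt.abs().clamp(min=1e-16)
    return torch.istft(specgram * angles, n_fft=n_fft, hop_length=hop,
                       window=window)

# Example call (illustrative):
# griffin_lim_sketch(torch.rand(513, 20), n_fft=1024, hop=256,
#                    window=torch.hann_window(1024))
```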
*The original implementation reused the variable name, constructing the complex-valued Tensor called angles from a real-valued tensor also called angles and from magnitude.
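Schematically, and only as an illustration of the difference (not the exact original code), the pseudo-complex construction versus the native complex one looks like this:

```python
import math
import torch

magnitude = torch.rand(257, 10)
angles = 2 * math.pi * torch.rand(257, 10)   # real-valued phases

# Old pseudo-complex style: real/imag parts stacked along a trailing dim of size 2.
pseudo = torch.stack([magnitude * torch.cos(angles),
                      magnitude * torch.sin(angles)], dim=-1)

# Native complex dtype: the same values as a single complex tensor.
native = torch.polar(magnitude, angles)       # magnitude * exp(1j * angles)

assert torch.allclose(torch.view_as_complex(pseudo), native)
```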
Force-pushed from de1877e to 4b599e5.
The autograd tests are failing for this case. Failing test log: https://app.circleci.com/pipelines/github/pytorch/audio/5665/workflows/746a53f5-6f18-4f2e-b717-869bf342eaf4/jobs/194528. The following is the corresponding code; however, it does not fail in my local environment with PyTorch nightly from March 29th or with today's build.
I was wondering if this is a regression, but I'm not sure, as it works on my local machine with the latest nightly as well. What do you think?
Force-pushed from 17f61e7 to 12f77e1.
Hmm, I think it's expected behavior: the numerical method will (I'm not sure, though) run the forward function multiple times, and with random initialization each run produces a different result. For functions like this, which have random behavior internally, maybe a wrapper that fixes the random seed would help.
Hi @yoyololicon, thanks for the insight. You are right. I added a wrapper that sets the random seed, and the test passes now.
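A sketch of that kind of seed-fixing wrapper, demonstrated on a toy module with internal randomness (the wrapper and the toy module are illustrative, not the exact code added to the test suite):

```python
import torch

class DeterministicWrapper(torch.nn.Module):
    """Re-seed the RNG before every forward so that the repeated evaluations
    performed by gradcheck all see the same random initialization."""
    def __init__(self, module: torch.nn.Module, seed: int = 0):
        super().__init__()
        self.module = module
        self.seed = seed

    def forward(self, x):
        torch.random.manual_seed(self.seed)
        return self.module(x)

class RandomScale(torch.nn.Module):
    # Toy module with internal randomness, standing in for rand_init in griffinlim.
    def forward(self, x):
        return x * torch.rand((), dtype=x.dtype)

wrapped = DeterministicWrapper(RandomScale())
x = torch.rand(3, dtype=torch.float64, requires_grad=True)
# Passes because every forward call uses the same random scale.
assert torch.autograd.gradcheck(wrapped, (x,))
```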
@anjali411 This PR is ready for the final review.
Get rid of pseudo complex type in F.griffinlim. Part of #1337.

griffinlim is tested for:
- float32 and float64
- librosa consistency on CPU

TODO
- Check whether cfloat is preferred for the intermediate variable.

Benchmark
Script
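The benchmark script itself is collapsed above; purely as a hypothetical illustration of how such a timing comparison could be set up (not the script attached to this PR):

```python
import time
import torch
import torchaudio.transforms as T

def bench(dtype: torch.dtype, n_runs: int = 10) -> float:
    # Time GriffinLim on a random spectrogram of the given dtype.
    torch.random.manual_seed(0)
    waveform = torch.randn(2, 16000)
    spec = T.Spectrogram(n_fft=1024, hop_length=512)(waveform).to(dtype)
    griffin_lim = T.GriffinLim(n_fft=1024, hop_length=512)
    griffin_lim(spec)                      # warm-up
    start = time.perf_counter()
    for _ in range(n_runs):
        griffin_lim(spec)
    return (time.perf_counter() - start) / n_runs

for dtype in (torch.float32, torch.float64):
    print(dtype, f'{bench(dtype):.4f} s/call')
```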