
FFT/IFFT instead of u-law encoding #119

Closed
nakosung opened this issue Oct 4, 2016 · 9 comments

Comments

@nakosung
Contributor

nakosung commented Oct 4, 2016

https://www.youtube.com/watch?v=NYDeH-knnAI

Is it worth trying FFT/IFFT? Most audio signal processing involves the FFT/IFFT, so I think it is natural to process in the frequency domain. What do you think about this approach?
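To make the question concrete, here is a minimal numpy sketch of the FFT/IFFT round trip being proposed. The transform itself is lossless up to floating-point rounding, unlike 8-bit u-law quantization (the signal and sample rate here are illustrative):

```python
import numpy as np

# A short test signal: a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(1024) / sr
x = np.sin(2 * np.pi * 440.0 * t)

# Forward transform into the frequency domain (complex spectrum)...
spectrum = np.fft.rfft(x)

# ...and back. The round trip is numerically exact (error ~1e-16).
x_rec = np.fft.irfft(spectrum, n=len(x))

print(np.max(np.abs(x - x_rec)))
```

The open question in this thread is not whether the transform is invertible, but whether an autoregressive model can predict in that domain.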

@ibab
Owner

ibab commented Oct 4, 2016

I don't think the network will be able to do a good job at predicting the next FFT sample from all previous ones, but it might be worth trying.
I think that WaveNet is quite specific to sequence prediction, considering the way we train/generate and the causality of the filter.

Edit: Thanks for pointing out that a spectrogram was meant. This makes more sense.

@lemonzi
Collaborator

lemonzi commented Oct 4, 2016

I think he means a spectrogram. Notice that in that case we would be doing multivariate regression, not classification, so the loss function would have to be adjusted.

The whole point of this network, though, is that it can extract a meaningful representation from raw audio. We usually use the FFT and spectrograms because they're the best we know, but they are destructive because we discard the phase, and they introduce a lot of artifacts because of the windowing and the time-frequency duality.

It's worth a shot if you feel like playing with the model, though!


@nakosung
Contributor Author

nakosung commented Oct 4, 2016

I agree about the artifacts an IFFT would introduce, so if we feed the FFT'd frame as additional input data (a spectrogram), I think the network could capture meaningful information that cannot be detected within the receptive field. (It could also be captured by a receptive field of matching size.)
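A minimal sketch of the kind of additional input being suggested: log-magnitude spectrogram frames computed alongside the raw waveform. The `frame`/`hop` sizes and the `log_mag_spectrogram` helper are illustrative, not part of the repo:

```python
import numpy as np

def log_mag_spectrogram(x, frame=256, hop=128):
    """Frame the signal, apply a Hann window, and take log-magnitude FFTs.
    frame/hop sizes here are arbitrary illustration values."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop : i * hop + frame] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

x = np.random.randn(16000)   # one second of noise at 16 kHz
spec = log_mag_spectrogram(x)
print(spec.shape)            # (n_frames, frame // 2 + 1)
```

Each row of `spec` could then be upsampled to the sample rate and concatenated with the waveform input as a conditioning signal.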

By the way (kind of off-topic): since WaveNet has proven its power for signal processing and reconstruction, could we apply the same technique to the motion synthesis described in the video above?

@lemonzi
Collaborator

lemonzi commented Oct 5, 2016

I think the receptive fields we are using now are already as large as an FFT frame, and if not, they should be.

Isn't that new paper on motion synthesis using a similar concept? It wouldn't be a WaveNet anymore (I would restrict "WaveNet" to the original paper: auto-regressing a 1D time series using a cascade of dilated convolutions, skip connections, etc., and maybe a one-hot input). But I agree that the concept of using dilated convolutions for time-series modelling rather than the classic recurrent units / LSTMs is very promising.
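For reference, the receptive field of a stack of dilated causal convolutions can be computed directly; each layer adds `(kernel_size - 1) * dilation` samples. A quick sketch (the dilation schedule below is the one typically used in WaveNet configs, assumed here):

```python
def receptive_field(dilations, kernel_size=2):
    # Each dilated causal conv layer widens the receptive field
    # by (kernel_size - 1) * dilation samples.
    return (kernel_size - 1) * sum(dilations) + 1

# Doubling dilations 1..512, repeated 5 times (assumed config):
dilations = [2 ** i for i in range(10)] * 5
print(receptive_field(dilations))   # 5116 samples
```

At 16 kHz, 5116 samples is roughly 320 ms, i.e. far longer than a typical FFT analysis frame.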

@nakosung
Contributor Author

nakosung commented Oct 8, 2016

Closing this issue.

@nakosung nakosung closed this as completed Oct 8, 2016
@Cortexelus

> but they are destructive because we discard the phase

List of Papers on Phase Recovery: https://www.evernote.com/shard/s260/sh/72efd25c-491c-4a8a-a8db-aa2d6959ee92/1d6b05ae86f948d3

@lemonzi
Collaborator

lemonzi commented Oct 8, 2016

These are all approximations that enforce different sets of constraints.


@nakosung
Contributor Author

nakosung commented Oct 8, 2016

@Cortexelus @lemonzi Would real + imaginary parts maintain the phase information? If so, we could feed the real and imaginary components produced by the FFT into WaveNet.

@Cortexelus

Cortexelus commented Oct 8, 2016

Real + imaginary maintain phase, yes. Think polar coordinates: if your complex number is 12 + 5i, the phase is the angle θ and the magnitude is the absolute value r.

[image: polar-coordinate diagram of a complex number, showing magnitude r and angle θ]

You will likely get better results with (magnitude, phase) than with (real, imag), because magnitudes are more strongly correlated with each other both vertically (same frame, different bin) and horizontally (same bin, different frame).
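The two representations carry the same information and convert both ways; a quick numpy sketch using the 12 + 5i example:

```python
import numpy as np

z = 12 + 5j                     # the example complex bin from above
r = np.abs(z)                   # magnitude: sqrt(12**2 + 5**2) = 13.0
theta = np.angle(z)             # phase: atan2(5, 12) radians

# The polar pair is equivalent information: r * e^(i*theta) recovers z.
z_back = r * np.exp(1j * theta)
print(r, theta, np.allclose(z, z_back))
```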

You could also try (magnitude, delta phase), or pairs of (instantaneous frequency, magnitude). This may make it easier to exploit correlations among frequencies in steady (harmonic) signals, but it may have trouble with transients (percussion, onsets), where absolute phase matters. The delta phase (the difference in phase between consecutive frames) can be used to calculate instantaneous frequency; a great explanation of this is in "Pitch shifting using the FT".

Also, not every spectrogram has a true waveform that corresponds to it. If you generate an invalid spectrogram, the IFFT may give you something close to it, with artifacts. Sometimes a phase-recovery method can correct it. The simplest phase-recovery method (Griffin-Lim) iterates IFFT → FFT → IFFT → FFT → ... while constraining the magnitudes to stay fixed across iterations.
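A minimal sketch of that Griffin-Lim iteration using scipy's STFT/ISTFT (the frame size, iteration count, and zero-phase initialization are illustrative choices):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256):
    """Recover a waveform from a magnitude-only STFT by alternating
    ISTFT -> STFT while pinning the magnitudes (Griffin-Lim)."""
    phase = np.zeros_like(mag)                 # start from zero phase
    for _ in range(n_iter):
        _, x = istft(mag * np.exp(1j * phase), nperseg=nperseg)
        _, _, Z = stft(x, nperseg=nperseg)
        phase = np.angle(Z[:, :mag.shape[1]])  # keep phase, pin magnitude
    _, x = istft(mag * np.exp(1j * phase), nperseg=nperseg)
    return x

# Demo: drop the phase of a sine's STFT, then recover a waveform.
x0 = np.sin(2 * np.pi * 0.05 * np.arange(1024))
_, _, Z0 = stft(x0, nperseg=256)
x_rec = griffin_lim(np.abs(Z0))
```

The result is a waveform whose STFT magnitudes approximate `mag`; the recovered phase is consistent but not necessarily equal to the original one.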
