
FFT/IFFT instead of u-law encoding #119

Closed
nakosung opened this issue Oct 4, 2016 · 9 comments

Comments

@nakosung
Contributor

nakosung commented Oct 4, 2016

https://www.youtube.com/watch?v=NYDeH-knnAI

Is it worth trying FFT/IFFT? Most audio signal processing involves the FFT/IFFT, so I think it is natural to process in the frequency domain. What do you think about this approach?
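To make the question concrete, here is a minimal numpy sketch of the FFT/IFFT round trip being proposed. The transform itself is lossless up to floating-point rounding, unlike 8-bit u-law quantization (the signal and sample rate here are illustrative):

```python
import numpy as np

# A short test signal: a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(1024) / sr
x = np.sin(2 * np.pi * 440.0 * t)

# Forward transform into the frequency domain (complex spectrum)...
spectrum = np.fft.rfft(x)

# ...and back. The round trip is numerically exact (error ~1e-16).
x_rec = np.fft.irfft(spectrum, n=len(x))

print(np.max(np.abs(x - x_rec)))
```

The open question in this thread is not whether the transform is invertible, but whether an autoregressive model can predict in that domain.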

@ibab
Owner

ibab commented Oct 4, 2016

I don't think the network will be able to do a good job at predicting the next FFT sample from all previous ones, but it might be worth trying.
I think that WaveNet is quite specific to sequence prediction, considering the way we train/generate and the causality of the filter.

Edit: Thanks for pointing out that a spectrogram was meant. This makes more sense.

@lemonzi
Collaborator

lemonzi commented Oct 4, 2016

I think he means a spectrogram. Notice that in that case we would be doing multivariate regression, not classification, so the loss function would have to be adjusted.

The whole point of this network, though, is that it can extract a meaningful representation from raw audio. We usually use the FFT and spectrograms because they're the best we know, but they are destructive because we discard the phase, and they introduce a lot of artifacts because of the windowing and the time-frequency duality.

It's worth a shot if you feel like playing with the model, though!


@nakosung
Contributor Author

nakosung commented Oct 4, 2016

I agree about the artifacts an IFFT would introduce, so if we feed the FFT'd frame as additional input data (a spectrogram), I think the network could capture meaningful information that cannot be detected within the receptive field. (It could also be captured by a receptive field of matching size.)
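A minimal sketch of the kind of additional input being suggested: log-magnitude spectrogram frames computed alongside the raw waveform. The `frame`/`hop` sizes and the `log_mag_spectrogram` helper are illustrative, not part of the repo:

```python
import numpy as np

def log_mag_spectrogram(x, frame=256, hop=128):
    """Frame the signal, apply a Hann window, and take log-magnitude FFTs.
    frame/hop sizes here are arbitrary illustration values."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop : i * hop + frame] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

x = np.random.randn(16000)   # one second of noise at 16 kHz
spec = log_mag_spectrogram(x)
print(spec.shape)            # (n_frames, frame // 2 + 1)
```

Each row of `spec` could then be upsampled to the sample rate and concatenated with the waveform input as a conditioning signal.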

By the way (kind of off-topic): since WaveNet has proven its power for signal processing and reconstruction, could we apply the same technique to the motion synthesis described in the video above?

@lemonzi
Collaborator

lemonzi commented Oct 5, 2016

I think the receptive fields we are using now are already as large as an FFT frame, and if not, they should be.

Isn't that new paper on motion synthesis using a similar concept? It wouldn't be a WaveNet anymore (I would restrict "WaveNet" to the original paper: auto-regressing a 1D time series using a cascade of dilated convolutions, skip connections, etc., and maybe a one-hot input). But I agree that the concept of using dilated convolutions for time-series modelling rather than the classic recurrent units / LSTMs is very promising.
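For reference, the receptive field of a stack of dilated causal convolutions can be computed directly; each layer adds `(kernel_size - 1) * dilation` samples. A quick sketch (the dilation schedule below is the one typically used in WaveNet configs, assumed here):

```python
def receptive_field(dilations, kernel_size=2):
    # Each dilated causal conv layer widens the receptive field
    # by (kernel_size - 1) * dilation samples.
    return (kernel_size - 1) * sum(dilations) + 1

# Doubling dilations 1..512, repeated 5 times (assumed config):
dilations = [2 ** i for i in range(10)] * 5
print(receptive_field(dilations))   # 5116 samples
```

At 16 kHz, 5116 samples is roughly 320 ms, i.e. far longer than a typical FFT analysis frame.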

@nakosung
Contributor Author

nakosung commented Oct 8, 2016

Closing this issue.

@nakosung nakosung closed this as completed Oct 8, 2016
@Cortexelus

> but they are destructive because we discard the phase

List of Papers on Phase Recovery: https://www.evernote.com/shard/s260/sh/72efd25c-491c-4a8a-a8db-aa2d6959ee92/1d6b05ae86f948d3

@lemonzi
Collaborator

lemonzi commented Oct 8, 2016

These are all approximations that enforce different sets of constraints.


@nakosung
Contributor Author

nakosung commented Oct 8, 2016

@Cortexelus @lemonzi Would real + imaginary parts maintain the phase information? If so, we could feed the real and imaginary components produced by the FFT into WaveNet.

@Cortexelus

Cortexelus commented Oct 8, 2016

Real + imaginary maintain phase, yes. Think polar coordinates: if your complex number is 12 + 5i, the phase is the angle θ and the magnitude is the absolute value r.

[image: polar-coordinate diagram of a complex number, showing magnitude r and angle θ]

You will likely get better results with (magnitude, phase) than with (real, imag), because magnitudes are more strongly correlated with each other both vertically (same frame, different bin) and horizontally (same bin, different frame).
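The two representations carry the same information and convert both ways; a quick numpy sketch using the 12 + 5i example:

```python
import numpy as np

z = 12 + 5j                     # the example complex bin from above
r = np.abs(z)                   # magnitude: sqrt(12**2 + 5**2) = 13.0
theta = np.angle(z)             # phase: atan2(5, 12) radians

# The polar pair is equivalent information: r * e^(i*theta) recovers z.
z_back = r * np.exp(1j * theta)
print(r, theta, np.allclose(z, z_back))
```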

You could also try (magnitude, delta phase), or pairs of (instantaneous frequency, magnitude). This may make it easier to exploit correlations among frequencies in steady (harmonic) signals, but it may have trouble with transients (percussion, onsets), where absolute phase matters. The delta phase (the difference in phase between consecutive frames) can be used to calculate instantaneous frequency; a great explanation of this is in "Pitch shifting using the FT".

Also, not every spectrogram has a true waveform that corresponds to it. If you generate an invalid spectrogram, the IFFT may give you something close to it, with artifacts. Sometimes a phase-recovery method can correct it. The simplest phase-recovery method (Griffin-Lim) iterates IFFT → FFT → IFFT → FFT → ... while constraining the magnitudes to stay fixed across iterations.
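A minimal sketch of that Griffin-Lim iteration using scipy's STFT/ISTFT (the frame size, iteration count, and zero-phase initialization are illustrative choices):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256):
    """Recover a waveform from a magnitude-only STFT by alternating
    ISTFT -> STFT while pinning the magnitudes (Griffin-Lim)."""
    phase = np.zeros_like(mag)                 # start from zero phase
    for _ in range(n_iter):
        _, x = istft(mag * np.exp(1j * phase), nperseg=nperseg)
        _, _, Z = stft(x, nperseg=nperseg)
        phase = np.angle(Z[:, :mag.shape[1]])  # keep phase, pin magnitude
    _, x = istft(mag * np.exp(1j * phase), nperseg=nperseg)
    return x

# Demo: drop the phase of a sine's STFT, then recover a waveform.
x0 = np.sin(2 * np.pi * 0.05 * np.arange(1024))
_, _, Z0 = stft(x0, nperseg=256)
x_rec = griffin_lim(np.abs(Z0))
```

The result is a waveform whose STFT magnitudes approximate `mag`; the recovered phase is consistent but not necessarily equal to the original one.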
