FFT/IFFT instead of u-law encoding #119
Comments
I don't think the network will be able to do a good job at predicting the next FFT sample from all previous ones, but it might be worth trying. Edit: Thanks for pointing out that a spectrogram was meant. This makes more sense. |
I think he means a spectrogram. Notice that in that case we would be doing [...] The whole point of this network, though, is that it can extract a [...] It's worth a shot if you feel like playing with the model, though! |
I agree that the IFFT would introduce artifacts. If we instead feed FFT'd frames (a spectrogram) as additional input data, I think the network could capture more meaningful information that cannot be detected within the receptive field (although a receptive field of matching size could do the same). By the way (somewhat off-topic): since WaveNet has proven its power in signal processing and reconstruction, could we apply the same technique to the motion synthesis described in the linked video? |
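To make "FFT'd frames as additional input" concrete, here is a minimal sketch of turning a waveform into magnitude spectrogram frames. It uses only the Python standard library and a naive DFT for clarity; the function names, `frame_size=64`, `hop=32`, and the toy sinusoid are assumptions for illustration, not anything taken from this project.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive DFT of a real frame, keeping only the non-negative frequency bins."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def magnitude_spectrogram(signal, frame_size=64, hop=32):
    """Slice the signal into overlapping windowed frames and take |DFT| of each."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        windowed = [signal[start + i] * window[i] for i in range(frame_size)]
        frames.append([abs(c) for c in dft(windowed)])
    return frames


# Toy input (assumed): a sinusoid that falls exactly on bin 8 of a 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = magnitude_spectrogram(signal)
# Index of the strongest bin in every frame.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
```

Each entry of `spec` is one frame of `frame_size // 2 + 1` magnitudes. Note that the magnitudes alone discard phase, which is why inverting them back to audio is not straightforward.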
I think the receptive fields we are using now are already as large as an FFT frame, and if not, they should be. Isn't that new paper on motion synthesis using a similar concept? It wouldn't be a WaveNet anymore (I would restrict "WaveNet" to the original paper, about auto-regressing a 1D time series using a cascade of dilated convolutions, skip connections, etc., and maybe with a one-hot input). But I agree that the concept of using dilated convolutions for time-series modelling rather than the classic recurrent units / LSTM is very promising. |
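As a rough check of the "receptive field as large as an FFT frame" point, the sketch below computes the receptive field of a stack of dilated causal convolutions with the usual formula (filter_width - 1) * sum(dilations) + 1. The dilation schedule (doubling from 1 to 512) and the filter width of 2 are assumptions for illustration; the actual configuration used in this project may differ.

```python
def receptive_field(dilations, filter_width=2):
    """Input samples seen by one output of stacked dilated causal convolutions."""
    return (filter_width - 1) * sum(dilations) + 1


# Hypothetical schedule: dilations 1, 2, 4, ..., 512.
one_stack = [2 ** i for i in range(10)]

print(receptive_field(one_stack))      # 1024 samples, the size of a 1024-point FFT frame
print(receptive_field(one_stack * 5))  # 5116 samples with the schedule repeated 5 times
```

Under these assumptions a single stack already covers a 1024-sample frame, so the network can in principle see the same span of audio that one FFT frame would summarize.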
Closing this issue. |
These are all approximations that enforce different sets of constraints. |
@Cortexelus @lemonzi Could the real and imaginary parts together preserve phase information? If so, we could feed the real and imaginary outputs of the FFT into WaveNet. |
Real+imaginary maintain phase, yes. Think polar geometry. If your complex number is [...] You will likely get better results with (magnitude, phase) than with (real, imag), because magnitudes are more strongly correlated with each other both vertically (same frame, different bin) and horizontally (same bin, different frame). You could also try (magnitude, delta phase) or pairs of (instantaneous frequency, magnitude). This may help you exploit correlations among frequencies in steady (harmonic) signals more easily, but it may have trouble with transients (percussion, onsets), where absolute phase matters. The delta phase (the difference in phase between frames) can be used to calculate instantaneous frequency. A great explanation of this is "Pitch shifting using the FT". Also, not every spectrogram has a true waveform that corresponds to it. If you generate an untrue spectrogram, the iFFT may give you something close, but with artifacts. Sometimes a phase-recovery method corrects it. The simplest phase-recovery method (Griffin-Lim) iterates iFFT > FFT > iFFT > FFT > iFFT while enforcing the magnitude to be constant over iterations. |
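The two representations discussed above can be sketched in a few lines: converting a complex bin between (real, imag) and (magnitude, phase), and using the delta phase between two frames to estimate instantaneous frequency. This is a minimal standard-library sketch, not code from this project; the helper names, the 16 kHz sample rate, the 64-sample Hann-windowed frame, the hop of 16, and the 1040 Hz test tone are all assumptions for illustration.

```python
import cmath
import math


def to_polar(z):
    """(real, imag) -> (magnitude, phase in radians)."""
    return abs(z), cmath.phase(z)


def from_polar(mag, ph):
    """(magnitude, phase) -> complex number; the exact inverse of to_polar."""
    return cmath.rect(mag, ph)


def windowed_bin(signal, start, frame_size, k):
    """DFT bin k of a Hann-windowed frame that begins at sample `start`."""
    total = 0j
    for t in range(frame_size):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * t / frame_size)
        total += signal[start + t] * w * cmath.exp(-2j * math.pi * k * t / frame_size)
    return total


def wrap(phase):
    """Wrap a phase value into [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def instantaneous_frequency(signal, k, frame_size, hop, sample_rate):
    """Estimate the frequency in bin k from the phase change between two frames."""
    _, ph0 = to_polar(windowed_bin(signal, 0, frame_size, k))
    _, ph1 = to_polar(windowed_bin(signal, hop, frame_size, k))
    # Phase advance a sinusoid at the exact bin-centre frequency would show.
    expected = 2 * math.pi * k * hop / frame_size
    # What remains is the deviation of the true frequency from the bin centre.
    deviation = wrap(ph1 - ph0 - expected)
    return (k / frame_size + deviation / (2 * math.pi * hop)) * sample_rate


# Round trip: nothing is lost going to polar form and back.
z = complex(3.0, -4.0)
mag, ph = to_polar(z)
z_back = from_polar(mag, ph)

# Assumed test tone: 1040 Hz at 16 kHz; bin 4 of a 64-point frame is centred at 1000 Hz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 1040 * n / sample_rate) for n in range(128)]
estimate = instantaneous_frequency(tone, k=4, frame_size=64, hop=16, sample_rate=sample_rate)
print(round(estimate, 1))  # close to 1040 Hz, finer than the 250 Hz bin spacing
```

The estimate is only reliable for steady tones where one sinusoid dominates the bin and the hop is small enough that the phase deviation stays within half a cycle, which matches the caveat above about transients.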
https://www.youtube.com/watch?v=NYDeH-knnAI
Is it worth trying FFT/IFFT? Most audio signal processing involves FFT/IFFT, so I think it is natural to work in the frequency domain. What do you think about this approach?