MFCC -> MP3? #424
Comments
DAn has some code/explanation on computing the inverse MFCC. On Fri, Oct 21, 2016 at 11:34 AM, Matt Lim notifications@github.com wrote:
Your question makes sense as reconstructing waveforms from derived

import librosa
def invlogamplitude(S):
# Load input audio.
filename = "64f999a2b468daf4_2490_2520.wav"
# Calculate mfccs.
Y = librosa.stft(y)
# Build reconstruction mappings,
n_mfcc = mfccs.shape[0]
# Empirical scaling of channels to get ~flat amplitude mapping.
bin_scaling = 1.0/np.maximum(0.0005, np.sum(np.dot(mel_basis.T, mel_basis),
# Reconstruct the approximate STFT squared-magnitude from the MFCCs.
recon_stft = bin_scaling[:, np.newaxis] * np.dot(mel_basis.T,
# Impose reconstructed magnitude on white noise STFT.
excitation = np.random.randn(y.shape[0])
# Listen to the reconstruction.
Audio(recon, rate=sr)

DAn.
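For readers following the steps above, here is a minimal, library-free sketch of the first stage of such a reconstruction for a single frame: undoing the DCT and the log-amplitude to get back an approximate mel power spectrum. It is not the code from the comment above. It assumes the MFCCs were computed as an orthonormal DCT-II of a mel power spectrum in dB, and all function names (`dct_ortho`, `idct_ortho`, `mfcc_frame`, `mel_from_mfcc`) are hypothetical helpers made up for illustration. Mapping the mel spectrum back to STFT bins and imposing it on a noise excitation are not covered here.

```python
import math

def dct_ortho(x, n_out):
    """Orthonormal DCT-II of x, keeping only the first n_out coefficients."""
    n = len(x)
    out = []
    for k in range(n_out):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct_ortho(c, n):
    """Inverse of dct_ortho (a DCT-III); coefficients beyond len(c) count as zero."""
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n)
                 * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                 for k in range(1, len(c)))
        out.append(s)
    return out

def power_to_db(p, amin=1e-10):
    """Convert power values to decibels, flooring at amin to avoid log(0)."""
    return [10.0 * math.log10(max(v, amin)) for v in p]

def db_to_power(db):
    """Invert power_to_db (ignoring the amin floor)."""
    return [10.0 ** (v / 10.0) for v in db]

def mfcc_frame(mel_power, n_mfcc):
    """MFCCs of one frame: DCT of the log-mel spectrum (assumed convention)."""
    return dct_ortho(power_to_db(mel_power), n_mfcc)

def mel_from_mfcc(mfcc, n_mels):
    """Approximate mel power spectrum recovered from one frame of MFCCs."""
    return db_to_power(idct_ortho(mfcc, n_mels))

# Toy mel power spectrum for one frame (made-up values).
mel = [1.0, 4.0, 9.0, 2.0, 0.5, 0.25, 3.0, 7.0]

# Keeping every coefficient makes the round trip exact up to rounding.
exact = mel_from_mfcc(mfcc_frame(mel, len(mel)), len(mel))

# Keeping only a few coefficients yields a smoothed spectral envelope.
smooth = mel_from_mfcc(mfcc_frame(mel, 4), len(mel))
```

The truncated case is the realistic one: with fewer MFCCs than mel bands, the fine spectral detail is lost and only a smoothed envelope comes back, which is why the reconstructed audio sounds noisy or buzzy rather than identical to the input.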
Thanks a bunch, DAn, this is super useful! This would be a great addition to librosa, something like
I'm not totally sold on this. It would be a nice demo to add to the gallery, but it seems a bit too niche for inclusion in the library proper.
@rafaelvalle I tried using your code, but NumPy raises a runtime exception when np.sqrt is called. Any tips on how to overcome this issue? Also, it would be nice to have a method inside librosa to do this, similar to torchaudio.
@limapedro librosa added MFCC inversion in 0.7: https://librosa.github.io/librosa/generated/librosa.feature.inverse.mfcc_to_audio.html#librosa.feature.inverse.mfcc_to_audio
Is there any possibility of adding MP3 reconstruction from MFCCs?
For example, https://ccrma.stanford.edu/~juhan/thesis.html or http://www.research.ibm.com/haifa/projects/multimedia/recovc/demo/index.html.
This would be useful in potential audio "deep dream" applications, or in other scenarios where it's desired to go from features to audio.