MFCC -> MP3? #424

Closed
arcticmatt opened this issue Oct 21, 2016 · 7 comments

Comments

@arcticmatt

Is there any possibility of adding MP3 reconstruction from MFCCs?

For example, https://ccrma.stanford.edu/~juhan/thesis.html or http://www.research.ibm.com/haifa/projects/multimedia/recovc/demo/index.html.

This would be useful in potential audio "deep dream" applications, or in other scenarios where it's desired to go from features to audio.

@rafaelvalle

rafaelvalle commented Oct 25, 2016

DAn has some code/explanation on computing the inverse MFCC.
http://labrosa.ee.columbia.edu/matlab/rastamat/


@dpwe
Contributor

dpwe commented Oct 25, 2016

> Is there any possibility of adding MP3 reconstruction from MFCCs?

So, of course, librosa doesn't know anything about MP3, the
psychoacoustic-based lossy encoding of waveforms, per se.

Your question makes sense as reconstructing waveforms from derived
representations. The Matlab code at
http://labrosa.ee.columbia.edu/matlab/rastamat/
steps through the path back from MFCC to waveform (using white noise
excitation to substitute for the missing phase information). To implement
this within librosa, you'd need to undo the DCT, the logamplitude, the Mel
mapping, and the STFT; something like:

import librosa
import numpy as np
from IPython.lib.display import Audio

def invlogamplitude(S):
    """librosa.logamplitude is actually 10*log10, so invert that."""
    return 10.0**(S/10.0)

# Load input audio.
filename = "64f999a2b468daf4_2490_2520.wav"
y, sr = librosa.load(filename)

# Calculate mfccs.
Y = librosa.stft(y)
mfccs = librosa.feature.mfcc(y)

# Build reconstruction mappings,
n_mfcc = mfccs.shape[0]
n_mel = 128
dctm = librosa.filters.dct(n_mfcc, n_mel)
n_fft = 2048
mel_basis = librosa.filters.mel(sr, n_fft)

# Empirical scaling of channels to get ~flat amplitude mapping.
bin_scaling = 1.0/np.maximum(0.0005, np.sum(np.dot(mel_basis.T, mel_basis),
                                            axis=0))

# Reconstruct the approximate STFT squared-magnitude from the MFCCs.
recon_stft = bin_scaling[:, np.newaxis] * np.dot(mel_basis.T,
    invlogamplitude(np.dot(dctm.T, mfccs)))

# Impose reconstructed magnitude on white noise STFT.
excitation = np.random.randn(y.shape[0])
E = librosa.stft(excitation)
recon = librosa.istft(E/np.abs(E)*np.sqrt(recon_stft))

# Listen to the reconstruction.
Audio(recon, rate=sr)

DAn.
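
To make the first two inversion steps concrete (undoing the DCT and the log amplitude), here is a minimal NumPy-only sketch. It is not librosa's own code: the helper names `dct_basis` and `db_to_power` are hypothetical, the basis is assumed to be the orthonormal DCT-II, and the input is random data standing in for a Mel power spectrogram.

```python
import numpy as np

def dct_basis(n_out, n_in):
    """Orthonormal DCT-II basis of shape (n_out, n_in) (hypothetical helper)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, np.newaxis]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] *= np.sqrt(0.5)  # scale the DC row so the rows are orthonormal
    return basis

def db_to_power(S_db):
    """Invert 10*log10 (hypothetical helper)."""
    return 10.0 ** (S_db / 10.0)

# Stand-in for a Mel power spectrogram: 128 bands, 10 frames (assumed sizes).
rng = np.random.default_rng(0)
n_mel, n_frames = 128, 10
mel_power = rng.uniform(0.1, 10.0, size=(n_mel, n_frames))
log_mel = 10.0 * np.log10(mel_power)

# Keeping all 128 coefficients, the transpose of the basis undoes the DCT.
full = dct_basis(n_mel, n_mel)
recon_full = db_to_power(full.T @ (full @ log_mel))

# Keeping only 20 coefficients gives a smoothed spectral envelope instead.
trunc = dct_basis(20, n_mel)
mfccs = trunc @ log_mel
recon_trunc = db_to_power(trunc.T @ mfccs)
```

With all coefficients kept, the round trip recovers the Mel power spectrum; with a truncated set, `trunc.T` only projects back onto the retained cosine components, which is why a reconstruction from MFCCs keeps the spectral envelope but loses fine detail.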


@arcticmatt
Author

Thanks a bunch, DAn, this is super useful! This would be a great add to librosa, something like librosa.feature.reconstruct (takes in an mfcc array and spits out the reconstruction).

@bmcfee
Member

bmcfee commented Nov 1, 2016

> This would be a great add to librosa, something like librosa.feature.reconstruct (takes in an mfcc array and spits out the reconstruction)

I'm not totally sold on this. It would be a nice demo to add to the gallery, but it seems a bit too niche for inclusion in the library proper.

@limapedro

@rafaelvalle I tried using your code, but it raises a NumPy runtime exception when calling np.sqrt. Any tips on how to overcome this issue?

Also, it would be nice to have a method inside librosa to do this similar to torchaudio.

@bmcfee
Member

bmcfee commented Dec 27, 2019

@limapedro librosa added mfcc inversion in 0.7: https://librosa.github.io/librosa/generated/librosa.feature.inverse.mfcc_to_audio.html#librosa.feature.inverse.mfcc_to_audio

@Vichoko

Vichoko commented Sep 1, 2020

Updated link:
https://librosa.org/doc/0.8.0/generated/librosa.feature.inverse.mfcc_to_audio.html#librosa.feature.inverse.mfcc_to_audio
