MFCC -> MP3? #424

arcticmatt · 2016-10-21T18:34:49Z

Is there any possibility of adding MP3 reconstruction from MFCCs?

For example, https://ccrma.stanford.edu/~juhan/thesis.html or http://www.research.ibm.com/haifa/projects/multimedia/recovc/demo/index.html.

This would be useful in potential audio "deep dream" applications, or in other scenarios where it's desired to go from features to audio.

rafaelvalle · 2016-10-25T18:20:33Z

DAn has some code/explanation on computing the inverse MFCC.
http://labrosa.ee.columbia.edu/matlab/rastamat/


dpwe · 2016-10-25T18:41:01Z

> Is there any possibility of adding MP3 reconstruction from MFCCs?

So, of course, librosa doesn't know anything about MP3, the
psychoacoustic-based lossy encoding of waveforms, per se.

Your question makes sense as reconstructing waveforms from derived
representations. The Matlab code at
http://labrosa.ee.columbia.edu/matlab/rastamat/
steps through the path back from MFCC to waveform (using white noise
excitation to substitute for the missing phase information). To implement
this within librosa, you'd need to undo the DCT, the logamplitude, the Mel
mapping, and the STFT; something like:

    import librosa
    import numpy as np
    from IPython.lib.display import Audio

    def invlogamplitude(S):
        """librosa.logamplitude is actually 10*log10, so invert that."""
        return 10.0**(S/10.0)

    # Load input audio.
    filename = "64f999a2b468daf4_2490_2520.wav"
    y, sr = librosa.load(filename)

    # Calculate mfccs.
    Y = librosa.stft(y)
    mfccs = librosa.feature.mfcc(y=y, sr=sr)

    # Build reconstruction mappings,
    n_mfcc = mfccs.shape[0]
    n_mel = 128
    dctm = librosa.filters.dct(n_mfcc, n_mel)
    n_fft = 2048
    mel_basis = librosa.filters.mel(sr=sr, n_fft=n_fft)

    # Empirical scaling of channels to get ~flat amplitude mapping.
    bin_scaling = 1.0/np.maximum(0.0005, np.sum(np.dot(mel_basis.T, mel_basis),
                                                axis=0))

    # Reconstruct the approximate STFT squared-magnitude from the MFCCs.
    recon_stft = bin_scaling[:, np.newaxis] * np.dot(mel_basis.T,
        invlogamplitude(np.dot(dctm.T, mfccs)))

    # Impose reconstructed magnitude on white noise STFT.
    excitation = np.random.randn(y.shape[0])
    E = librosa.stft(excitation)
    recon = librosa.istft(E/np.abs(E)*np.sqrt(recon_stft))

    # Listen to the reconstruction.
    Audio(recon, rate=sr)

DAn.
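
To make two of the inverse steps above concrete (undoing the DCT and undoing the log amplitude), here is a minimal NumPy-only sketch. The helper names `dct_basis`, `to_db` and `from_db` and the random stand-in data are assumptions for illustration and are not part of librosa. It shows that the log step inverts exactly, while transposing a truncated orthonormal DCT basis only recovers a smoothed version of the log-mel spectrum.

```python
import numpy as np


def dct_basis(n_out, n_in):
    """Orthonormal DCT-II basis of shape (n_out, n_in) (hypothetical helper)."""
    k = np.arange(n_out)[:, np.newaxis]
    n = np.arange(n_in)[np.newaxis, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_in))
    basis[0, :] = np.sqrt(1.0 / n_in)
    return basis


def to_db(power):
    """Forward log-amplitude step: 10 * log10(power)."""
    return 10.0 * np.log10(power)


def from_db(db):
    """Inverse log-amplitude step: 10 ** (db / 10)."""
    return 10.0 ** (db / 10.0)


# Made-up stand-in for a mel power spectrogram: 128 bands, 5 frames.
rng = np.random.default_rng(0)
mel_power = rng.uniform(0.01, 1.0, size=(128, 5))

# The log step inverts exactly (up to floating point).
log_mel = to_db(mel_power)
log_error = np.max(np.abs(from_db(log_mel) - mel_power))

# With all 128 coefficients kept, the transpose undoes the DCT exactly.
full = dct_basis(128, 128)
full_error = np.max(np.abs(full.T @ (full @ log_mel) - log_mel))

# With only 20 coefficients kept, the transpose gives a smoothed
# approximation: fine detail across mel bands is not recoverable.
trunc = dct_basis(20, 128)
mfcc_like = trunc @ log_mel
trunc_error = np.max(np.abs(trunc.T @ mfcc_like - log_mel))

print(f"log round trip error:   {log_error:.2e}")
print(f"full DCT round trip:    {full_error:.2e}")
print(f"20-coefficient inverse: {trunc_error:.2e}")
```

The truncated case is the one that applies to MFCCs in practice, which is part of why the reconstruction is only approximate even before the missing phase is considered.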


arcticmatt · 2016-10-31T22:32:10Z

Thanks a bunch, DAn, this is super useful! This would be a great add to librosa, something like librosa.feature.reconstruct (takes in an mfcc array and spits out the reconstruction).

bmcfee · 2016-11-01T14:17:33Z

> This would be a great add to librosa, something like librosa.feature.reconstruct (takes in an mfcc array and spits out the reconstruction)

I'm not totally sold on this. It would be a nice demo to add to the gallery, but it seems a bit too niche for inclusion in the library proper.

limapedro · 2019-12-27T13:27:21Z

@rafaelvalle I tried using your code, but NumPy raises a runtime exception when calling np.sqrt. Any tips on how to overcome this issue?

Also, it would be nice to have a method inside librosa to do this, similar to torchaudio.

bmcfee · 2019-12-27T15:51:18Z

@limapedro librosa added mfcc inversion in 0.7: https://librosa.github.io/librosa/generated/librosa.feature.inverse.mfcc_to_audio.html#librosa.feature.inverse.mfcc_to_audio

Vichoko · 2020-09-01T20:38:40Z

Updated link:
https://librosa.org/doc/0.8.0/generated/librosa.feature.inverse.mfcc_to_audio.html#librosa.feature.inverse.mfcc_to_audio

arcticmatt closed this as completed Oct 31, 2016

bmcfee mentioned this issue Nov 4, 2016: Synthetic phase for inverse transforms #434 (closed)

rafaelvalle mentioned this issue Jan 25, 2018: MFCC to Wav file #660 (closed)
