Dimension is different between lua---audio and librosa! #17

Open
ardasnck opened this issue Mar 9, 2017 · 7 comments

Comments

ardasnck commented Mar 9, 2017

Thank you very much for your contribution Soumith.

When I read the same audio file (.mp3) with your library and with 'librosa' in Python, I get outputs of different sizes.

Mono channel, sampling rate : 22050
lua---audio returns (417024x1)
librosa returns (417600x1)

Any idea what the reason might be?

Thank you very much.

eborboihuc commented Mar 23, 2017

I have the same problem here.

I use the following code to read an MP3 at 44100 Hz, 192 kb/s:

import librosa
audio, sr = librosa.load(path, sr=None, mono=False)  # keyword is lowercase `mono`
audio = audio[0]                                     # keep the first channel

and

require 'audio'
sound, sr = audio.load(line)       -- line holds the file path
sound = sound:select(2,1):clone()  -- keep the first channel
sound:mul(2^-31)                   -- keep it in [-1, 1]

And that's what I got:

librosa.max=0.916290283203, lua.max=0.916290223598, 
librosa.min=-0.888732910156, lua.min=-0.888751626015 
librosa.shape=(14154624,), lua.shape=(14153472,)
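
The `2^-31` factor in the Lua snippet suggests that lua---audio returns samples on a 32-bit integer scale, while librosa returns floats in [-1, 1]. A minimal Python sketch of that rescaling, assuming this interpretation is right (the helper name `int32_to_unit_float` is hypothetical, not part of either library):

```python
import numpy as np

def int32_to_unit_float(samples):
    """Scale int32-range samples to floats in [-1, 1) by multiplying by 2**-31."""
    return np.asarray(samples, dtype=np.float64) * 2.0 ** -31

# Extremes of the signed 32-bit range, plus silence.
raw = np.array([-2**31, 0, 2**31 - 1], dtype=np.int64)
scaled = int32_to_unit_float(raw)
print(scaled)
```

The most negative value maps to exactly -1.0 and the most positive to just under 1.0, which matches the value range reported for both libraries.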

The only way I have found to keep them (nearly) aligned is to drop the last slice of the librosa version (14154624 - 14153472 = 1152 samples), which I think does no harm as a workaround.
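
The workaround above can be sketched in Python as follows. This is only an illustration with small stand-in arrays; `trim_to_common_length` is a hypothetical helper, and it assumes the extra samples sit at the end of the longer signal, which this thread does not establish.

```python
import numpy as np

def trim_to_common_length(a, b):
    """Cut both signals to the shorter length; also report how many samples were dropped."""
    n = min(len(a), len(b))
    return a[:n], b[:n], abs(len(a) - len(b))

# Stand-ins for the librosa and lua---audio outputs (real files are far longer).
librosa_out = np.arange(10, dtype=np.float32)
lua_out = np.arange(7, dtype=np.float32)

x, y, dropped = trim_to_common_length(librosa_out, lua_out)
print(len(x), len(y), dropped)
```

If the offset were at the start of the signal instead, the trimmed arrays would be misaligned, so it is worth checking the sample-wise difference after trimming.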

[Image: data (the two decoded signals, with their difference in the third row)]

As you can see, they are nearly the same; the third row is the difference between them, on the order of 1e-5. But I still cannot find a proper fix or the reason why.

Any thoughts about the reason?

Thanks!

@ardasnck

@eborboihuc Surprisingly, you get very similar values. In my case the values also differ a lot, even though I used the same file with the same settings... But it seems your workaround pretty much solves the issue. I'm still curious about the reason, though.

Thanks.

@eborboihuc

@ardasnck

As for the similar values, I guess there might be a version or settings mismatch between librosa and Torch audio :( I use librosa 0.5.0 and audio-0.1-0, and my code is the same as above.

ardasnck commented Mar 26, 2017

@eborboihuc When I test both libraries with voice.mp3 (the example sound file in torch-audio), I get very similar values, and the dimension difference between the two libraries is 576 for sr=22050.
However, when I try different sound files (training data from SoundNet), the values are not that similar, and this time Torch-audio has a longer dimension than librosa. Also note that the dimension difference varies from file to file.

Is your example above based on only one file, or did you get similar results for different files too?

@eborboihuc

@ardasnck
I actually got similar results for different files. But yes, the difference varies from one file to another.

And for voice.mp3, I only got 4.27 for the difference between them at sr=22050:

rosa.max=0.496704101562, rosa.min=-0.511627197266, 
th.max=0.496706217527, th.min=-0.51163572073
librosa.shape=(417600,), lua.shape=(417024,)
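
One observation that may help narrow this down: an MP3 (Layer III) frame carries 1152 samples for MPEG-1 rates such as 44100 Hz and 576 samples for MPEG-2 rates such as 22050 Hz. The sketch below expresses two of the length gaps reported in this thread in units of frames, assuming the files are encoded at the stated rates. Whether a dropped or added frame is really the cause is not confirmed here; `length_gap_in_frames` is a hypothetical helper.

```python
# Samples per MP3 Layer III frame, keyed by sample rate:
# 1152 for MPEG-1 (44100 Hz), 576 for MPEG-2 (22050 Hz).
SAMPLES_PER_FRAME = {44100: 1152, 22050: 576}

def length_gap_in_frames(len_a, len_b, sample_rate):
    """Return the length gap in samples and in MP3 frames."""
    gap = abs(len_a - len_b)
    return gap, gap / SAMPLES_PER_FRAME[sample_rate]

voice_gap = length_gap_in_frames(417600, 417024, 22050)       # voice.mp3
other_gap = length_gap_in_frames(14154624, 14153472, 44100)   # the 44100 Hz file above
print(voice_gap)   # (576, 1.0)
print(other_gap)   # (1152, 1.0)
```

Both gaps come out to exactly one frame, which hints at a difference in how the two decoders handle padding or the first/last frame, but other files in this thread show gaps that are not whole frames.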

ardasnck commented Mar 26, 2017

@eborboihuc Yes, I can confirm this on my side for the voice.mp3 file. However, when I try a different sound file (.mp3), such as 02 - "Canon" (in D-Major), Pachebel from http://www.stephaniequinn.com/samples.htm, I get librosa.shape=(4010496,1) and lua.shape=(4013568,1), and the values are different. Can you validate this on your side?

@eborboihuc

@ardasnck

I tried that one and got a big difference with the originally downloaded version. After converting the file, the difference is considerably smaller.

I have tried several combinations and found a rule of thumb: convert it.

Here is what I do; a single command is enough:

sox input.mp3 output.mp3 trim 0
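
For batch conversion, the same command can be driven from Python. A minimal sketch, assuming the `sox` binary with MP3 support is installed and on the PATH; the function names are hypothetical:

```python
import subprocess

def build_sox_reencode_cmd(src, dst):
    """Build the `sox <src> <dst> trim 0` command used above."""
    return ["sox", src, dst, "trim", "0"]

def reencode(src, dst):
    """Run sox; raises CalledProcessError if sox exits with an error."""
    subprocess.run(build_sox_reencode_cmd(src, dst), check=True)

# Only builds and prints the command; call reencode(...) to actually run sox.
cmd = build_sox_reencode_cmd("input.mp3", "output.mp3")
print(" ".join(cmd))
```

Note that this re-encodes the audio, so the output is not bit-identical to the input.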

Below is the original Canon.mp3:

Input #0, mp3, from 'data/canon.mp3':
  Metadata:
    title           : Canon
  Duration: 00:01:30.98, start: 0.025057, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 192 kb/s
    Metadata:
      encoder         : LAME3.96r

--------------------------------------------------------

librosa.max=0.999969482422, librosa.min=-1.0, 
lua_audio.max=1.0, lua_audio.min=-1.0
librosa.shape=(4012416,), lua_audio.shape=(4013568,)

and Total Diff: 15436.9 is quite large.
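
The report for the original file shows `start: 0.025057`. Converting that offset to samples at 44100 Hz gives a sense of how large a shift it would be if one decoder skipped it and the other did not. That the two libraries treat this offset differently is an assumption, not something verified in this thread.

```python
def seconds_to_samples(seconds, sample_rate):
    """Convert a time offset in seconds to the nearest whole sample index."""
    return round(seconds * sample_rate)

start_offset = seconds_to_samples(0.025057, 44100)
print(start_offset)  # 1105
```

A shift of that size would misalign the two signals sample by sample, which would inflate any sample-wise difference even when the decoded audio is otherwise the same.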

After conversion,

Input #0, mp3, from 'data/canon2.mp3':
  Metadata:
    encoder         : LAME 64bits version 3.99.5 (http://lame.sf.net)
    title           : Canon
    TLEN            : 91010
  Duration: 00:01:31.04, start: 0.000000, bitrate: 128 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 128 kb/s

--------------------------------------------------------

librosa.max=0.999969482422, librosa.min=-1.0, 
lua_audio.max=1.0, lua_audio.min=-1.0
librosa.shape=(4014720,), lua_audio.shape=(4013541,)

and Total Diff: 40.9921 is reasonably small now.
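
The thread does not say how "Total Diff" was computed. One plausible reading is the sum of absolute sample differences over the overlapping part of the two signals; the sketch below implements that assumption with a hypothetical `total_abs_diff` helper.

```python
import numpy as np

def total_abs_diff(a, b):
    """Sum of |a[i] - b[i]| over the first min(len(a), len(b)) samples."""
    n = min(len(a), len(b))
    a = np.asarray(a, dtype=np.float64)[:n]
    b = np.asarray(b, dtype=np.float64)[:n]
    return float(np.sum(np.abs(a - b)))

# Small stand-in signals of unequal length.
x = np.array([0.5, -0.25, 0.0, 1.0])
y = np.array([0.5, -0.5, 0.25])
diff = total_abs_diff(x, y)
print(diff)  # 0.5
```

Because the sum grows with the number of samples, dividing by the overlap length gives a figure that is easier to compare across files of different durations.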

Hope this answers your question.
