Info length and rate returns different values for different backends #618

faroit · 2020-05-08T15:05:42Z

🐛 Bug

torchaudio.info returns the info objects directly from the respective backend. Due to same property naming, users might forget to check how the metadata is calculated. This results in metadata being reported differently depending on which backend is reported.
E.g. sox calculates the length summed across channels whereas soundfile does this per channel (correct)

I would propose to add wrapper for the info objects that - independent of the backend - the most important metadata (length and rate) is identical.

Currently, the sox backend reports a missleading length and the rate parameter is of type float instead of int.

To Reproduce

path =  "any/wavfile.wav"
# soundfile
torchaudio.set_audio_backend("soundfile")
info = torchaudio.info(path)
print(si.length)
print(type(si.rate))

# sox
torchaudio.set_audio_backend("sox")
info = torchaudio.info(path)
print(si.length)
print(type(si.rate))

Expected behavior

soundfile reports the correct metadata, sox should be corrected so that:

# sox
torchaudio.set_audio_backend("sox")
info = torchaudio.info(path)
print(si.length // si.channels)
print(int(si.rate))

Environment

torchaudio==0.5.0 from pypi

The text was updated successfully, but these errors were encountered:

vincentqb · 2020-05-08T20:05:42Z

Relates to #236?

vincentqb · 2020-05-08T20:06:02Z

I agree that both backend should return the same values. We need to make sure we document the change of either so that users are aware of any changes. Thoughts?

mthrok · 2020-05-08T23:28:16Z

I could reproduce this;

code

import torchaudio

path = "test/assets/vad-go-stereo-44100.wav"

# soundfile
torchaudio.set_audio_backend("soundfile")
si, _ = torchaudio.info(path)
print(si.length)
print(type(si.rate))

# sox
torchaudio.set_audio_backend("sox")
si, _ = torchaudio.info(path)
print(si.length)
print(type(si.rate))

43339
<class 'int'>
86678
<class 'float'>

mthrok · 2020-05-08T23:36:05Z

I checked #236, and it does not fix the issue.
I think it's more appropriate to handle it on Python side, as suggested.
(Copy data from sox backend C++ struct to SignalInfo Python class and apply the fix there.)
That way, info function returns the same class regardless of backend.

faroit · 2020-05-09T09:17:37Z

That way, info function returns the same class regardless of backend.

we had already used the sox backend in production (before soundfile was merge in), so we made the modifications to info in our code (should have reported earlier!). If we change the info object now, this might cause unexpected results for users. I would rather solve this by

clarify documentation on torchaudio.info(...), stating that this returns the native info object of each backend and each can contain a different number and definitions of metadata (e.g. some metadata also depends on the codec/container being used).
create a new torchaudio info object that holds just the most common important infos: sample_rate, nb_samples and nb_channels. No idea how to name that, though...
mark torchaudio.info() as deprecated in future torchaudio versions

I could provide a PR

vincentqb · 2020-05-11T14:03:08Z

we had already used the sox backend in production (before soundfile was merge in), so we made the modifications to info in our code (should have reported earlier!). If we change the info object now, this might cause unexpected results for users. I would rather solve this by

You mean that you adjusted your code for sox, soundfile, or both?

faroit · 2020-05-11T17:32:24Z

You mean that you adjusted your code for sox, soundfile, or both?

I had nb_samples = si.length // si.channels calculated for sox. With the introduction of the soundfile backend, I was required to change this.

I looked into the unit tests of the torchaudio.info and I guess the error was not caught before as the unit tests only cover mono signals (where sox and soundfile give the same output). Can we add a stereo (and possibly multichannel) regression test for info?

mthrok · 2020-05-13T14:53:15Z

I looked into the unit tests of the torchaudio.info and I guess the error was not caught before as the unit tests only cover mono signals (where sox and soundfile give the same output). Can we add a stereo (and possibly multichannel) regression test for info?

@faroit Sure. We should add the test. Let us know if you would like to open a PR. If you are busy I will add it to my backlog.

vincentqb · 2020-05-13T21:19:09Z

There are two issues here.

"When doing torchaudio.info() on a multi channel audio file, the length is multiplied by however many channels there are." as in Length Inaccurate on Multi-Channel Audio Files #236.
sox and soundfile backend return similar object, but with different values.

For the first, I'm concerned about such change for backward compatibility for sox. I would recommend instead adding a key nb_samples = si.length // si.channels to the object returned in python.

For the second, I second adding a stereo test for consistency of course. :) One of the two would need to be updated to match the other. Since sox has been there longer, soundfile's si.length is the one that should be adjusted. A warning when someone invokes info with soundfile would need to be raised, and the documentation aligned.

I prefer this solution to creating a new abstraction, simply since it keeps us closer to the intended set up so far.

Thoughts? Do you want to open pull requests for this?

faroit · 2020-05-15T09:39:44Z

Regarding the first issue, this is probably a just a matter of setting the right vocabulary to make a formal distinction between frames and samples as it's done in libsndfile. Over there:

A frame is just a block of samples, one for each channel.

and

...seeks are defined in number of (multichannel) frames. Therefore, a seek in a stereo file from the current position forward with an offset of 1 would skip forward by one sample of both channels.

which makes totally sense to me (also soundfile is the defacto standard when it comes to proper handling of audio I/O). However this would probably lead to too many changes here but it makes sense to put the definition that is used here ("we define samples are the number of frames in an audio signal per channel").

For the second, I second adding a stereo test for consistency of course. :) One of the two would need to be updated to match the other. Since sox has been there longer, soundfile's si.length is the one that should be adjusted. A warning when someone invokes info with soundfile would need to be raised, and the documentation aligned.

I agree this is probably the simplest solution

Do you want to open pull requests for this?

I started with a new test #639 that is expected to fail and can propose a fix for this as well ( in the same PR?)

mthrok · 2020-06-18T23:18:29Z

Hi @faroit

I am working on a new backend with sox and along the way, I addressed most of the issues with the existing sox backend.
#726
#728
Unfortunately the new backend is not compatible with the existing one due to TorchScript's technical limitation and design choice, but you may want to check out once it lands.

faroit · 2020-08-20T14:34:12Z

@mthrok

has this been addressed in #761 and can be closed?

mthrok · 2020-08-20T16:37:05Z

@mthrok

has this been addressed in #761 and can be closed?

Hi @faroit
#761 is for the new "sox_io" backend and the original "sox" backend is not affected. We plan to deprecate and decommission "sox" backend in the future, by replacing the default backend with the new "sox_io" backend. I was thinking that we can close this issue when we decommission the "sox" backend.

faroit · 2020-08-21T07:19:51Z

@mthrok

#761 is for the new "sox_io" backend and the original "sox" backend is not affected.

does sox_io match the metadata format of soundfile then? If not, I would still propose to unify the defintion of samples, and frames with respect to the duration of a file.

mthrok · 2020-08-21T18:04:30Z

does sox_io match the metadata format of soundfile then?

Yes it does. I followed your advice and made sure that it's #samples == #frames x #channels
and also I did my best to address all the known issues with the original sox backend. You can see the over view #726 (comment)

faroit · 2020-08-21T18:08:40Z

Yes it does. I followed your advice and made sure that it's #samples == #frames x #channels

👍 so a unified info backend could look like this?

def load_info(path: str) -> dict:
    # get length of file in samples
    info = {}
    if torchaudio.get_audio_backend() == "sox_io":
        si = torchaudio.info(str(path))
        info['samplerate'] = si.sample_rate
        info['samples'] = si.num_frames
    else:
        si, _ = torchaudio.info(str(path))
        info['samplerate'] = si.rate
        if torchaudio.get_audio_backend() == "sox":
            info['samples'] = si.length // si.channels
        else:
            # soundfile and sox_io calc per channel
            info['samples'] = si.length

    info['duration'] = info['samples'] / info['samplerate']
    return info

and also I did my best to address all the known issues with the original sox backend. You can see the over view #726
(comment)

nice work. Is this also faster?

mthrok · 2020-08-21T18:41:05Z

so a unified info backend could look like this?

Yes, that looks about right. Sorry for the interface discrepancy. We plan to unity the interface (align the sound file interface to sox_io) when we decommission sox backend.

Is this also faster?

I do not anticipate the speed improvement. (memory usage could be improved if you are handling large files) and I did not benchmark them.

mthrok · 2020-11-04T16:29:05Z

In 0.7.0 release, we introduced the new interface for soundfile backend and in #978 we switched the default backends which report the number of samples correctly. This will be a part of 0.8.0 release.
The legacy backends are scheduled to be removed in 0.9.0.

mthrok · 2020-12-17T17:53:34Z

Closing the issue as the new backends handle this properly.

faroit mentioned this issue May 14, 2020

add multichannel info io test #639

Closed

mthrok added bug C++ labels Jun 3, 2020

This was referenced Jun 18, 2020

Add TorchScript-based SoX I/O backend #726

Merged

Add TorchScript-able "info" func to sox_io backend #728

Merged

mthrok closed this as completed Dec 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Info length and rate returns different values for different backends #618

Info length and rate returns different values for different backends #618

faroit commented May 8, 2020

vincentqb commented May 8, 2020

vincentqb commented May 8, 2020

mthrok commented May 8, 2020

mthrok commented May 8, 2020

faroit commented May 9, 2020 •

edited

vincentqb commented May 11, 2020

faroit commented May 11, 2020 •

edited

mthrok commented May 13, 2020

vincentqb commented May 13, 2020

faroit commented May 15, 2020 •

edited

mthrok commented Jun 18, 2020 •

edited

faroit commented Aug 20, 2020

mthrok commented Aug 20, 2020

faroit commented Aug 21, 2020

mthrok commented Aug 21, 2020

faroit commented Aug 21, 2020

mthrok commented Aug 21, 2020

mthrok commented Nov 4, 2020

mthrok commented Dec 17, 2020

Info length and rate returns different values for different backends #618

Info length and rate returns different values for different backends #618

Comments

faroit commented May 8, 2020

🐛 Bug

To Reproduce

Expected behavior

Environment

vincentqb commented May 8, 2020

vincentqb commented May 8, 2020

mthrok commented May 8, 2020

mthrok commented May 8, 2020

faroit commented May 9, 2020 • edited

vincentqb commented May 11, 2020

faroit commented May 11, 2020 • edited

mthrok commented May 13, 2020

vincentqb commented May 13, 2020

faroit commented May 15, 2020 • edited

mthrok commented Jun 18, 2020 • edited

faroit commented Aug 20, 2020

mthrok commented Aug 20, 2020

faroit commented Aug 21, 2020

mthrok commented Aug 21, 2020

faroit commented Aug 21, 2020

mthrok commented Aug 21, 2020

mthrok commented Nov 4, 2020

mthrok commented Dec 17, 2020

faroit commented May 9, 2020 •

edited

faroit commented May 11, 2020 •

edited

faroit commented May 15, 2020 •

edited

mthrok commented Jun 18, 2020 •

edited