Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Info length and rate returns different values for different backends #618

Closed
faroit opened this issue May 8, 2020 · 19 comments
Closed

Info length and rate returns different values for different backends #618

faroit opened this issue May 8, 2020 · 19 comments

Comments

@faroit
Copy link
Contributor

faroit commented May 8, 2020

馃悰 Bug

torchaudio.info returns the info objects directly from the respective backend. Due to same property naming, users might forget to check how the metadata is calculated. This results in metadata being reported differently depending on which backend is reported.
E.g. sox calculates the length summed across channels whereas soundfile does this per channel (correct)

I would propose to add wrapper for the info objects that - independent of the backend - the most important metadata (length and rate) is identical.

Currently, the sox backend reports a missleading length and the rate parameter is of type float instead of int.

To Reproduce

path =  "any/wavfile.wav"
# soundfile
torchaudio.set_audio_backend("soundfile")
info = torchaudio.info(path)
print(si.length)
print(type(si.rate))

# sox
torchaudio.set_audio_backend("sox")
info = torchaudio.info(path)
print(si.length)
print(type(si.rate))

Expected behavior

soundfile reports the correct metadata, sox should be corrected so that:

# sox
torchaudio.set_audio_backend("sox")
info = torchaudio.info(path)
print(si.length // si.channels)
print(int(si.rate))

Environment

torchaudio==0.5.0 from pypi

@vincentqb
Copy link
Contributor

Relates to #236?

@vincentqb
Copy link
Contributor

I agree that both backend should return the same values. We need to make sure we document the change of either so that users are aware of any changes. Thoughts?

@mthrok
Copy link
Collaborator

mthrok commented May 8, 2020

I could reproduce this;

code
import torchaudio

path = "test/assets/vad-go-stereo-44100.wav"

# soundfile
torchaudio.set_audio_backend("soundfile")
si, _ = torchaudio.info(path)
print(si.length)
print(type(si.rate))

# sox
torchaudio.set_audio_backend("sox")
si, _ = torchaudio.info(path)
print(si.length)
print(type(si.rate))
43339
<class 'int'>
86678
<class 'float'>

@mthrok
Copy link
Collaborator

mthrok commented May 8, 2020

I checked #236, and it does not fix the issue.
I think it's more appropriate to handle it on Python side, as suggested.
(Copy data from sox backend C++ struct to SignalInfo Python class and apply the fix there.)
That way, info function returns the same class regardless of backend.

@faroit
Copy link
Contributor Author

faroit commented May 9, 2020

That way, info function returns the same class regardless of backend.

we had already used the sox backend in production (before soundfile was merge in), so we made the modifications to info in our code (should have reported earlier!). If we change the info object now, this might cause unexpected results for users. I would rather solve this by

  • clarify documentation on torchaudio.info(...), stating that this returns the native info object of each backend and each can contain a different number and definitions of metadata (e.g. some metadata also depends on the codec/container being used).
  • create a new torchaudio info object that holds just the most common important infos: sample_rate, nb_samples and nb_channels. No idea how to name that, though...
  • mark torchaudio.info() as deprecated in future torchaudio versions

I could provide a PR

@vincentqb
Copy link
Contributor

we had already used the sox backend in production (before soundfile was merge in), so we made the modifications to info in our code (should have reported earlier!). If we change the info object now, this might cause unexpected results for users. I would rather solve this by

You mean that you adjusted your code for sox, soundfile, or both?

@faroit
Copy link
Contributor Author

faroit commented May 11, 2020

You mean that you adjusted your code for sox, soundfile, or both?

I had nb_samples = si.length // si.channels calculated for sox. With the introduction of the soundfile backend, I was required to change this.

I looked into the unit tests of the torchaudio.info and I guess the error was not caught before as the unit tests only cover mono signals (where sox and soundfile give the same output). Can we add a stereo (and possibly multichannel) regression test for info?

@mthrok
Copy link
Collaborator

mthrok commented May 13, 2020

I looked into the unit tests of the torchaudio.info and I guess the error was not caught before as the unit tests only cover mono signals (where sox and soundfile give the same output). Can we add a stereo (and possibly multichannel) regression test for info?

@faroit Sure. We should add the test. Let us know if you would like to open a PR. If you are busy I will add it to my backlog.

@vincentqb
Copy link
Contributor

There are two issues here.

For the first, I'm concerned about such change for backward compatibility for sox. I would recommend instead adding a key nb_samples = si.length // si.channels to the object returned in python.

For the second, I second adding a stereo test for consistency of course. :) One of the two would need to be updated to match the other. Since sox has been there longer, soundfile's si.length is the one that should be adjusted. A warning when someone invokes info with soundfile would need to be raised, and the documentation aligned.

I prefer this solution to creating a new abstraction, simply since it keeps us closer to the intended set up so far.

Thoughts? Do you want to open pull requests for this?

@faroit
Copy link
Contributor Author

faroit commented May 15, 2020

Regarding the first issue, this is probably a just a matter of setting the right vocabulary to make a formal distinction between frames and samples as it's done in libsndfile. Over there:

A frame is just a block of samples, one for each channel.

and

...seeks are defined in number of (multichannel) frames. Therefore, a seek in a stereo file from the current position forward with an offset of 1 would skip forward by one sample of both channels.

which makes totally sense to me (also soundfile is the defacto standard when it comes to proper handling of audio I/O). However this would probably lead to too many changes here but it makes sense to put the definition that is used here ("we define samples are the number of frames in an audio signal per channel").

For the second, I second adding a stereo test for consistency of course. :) One of the two would need to be updated to match the other. Since sox has been there longer, soundfile's si.length is the one that should be adjusted. A warning when someone invokes info with soundfile would need to be raised, and the documentation aligned.

I agree this is probably the simplest solution

Do you want to open pull requests for this?

I started with a new test #639 that is expected to fail and can propose a fix for this as well ( in the same PR?)

@mthrok
Copy link
Collaborator

mthrok commented Jun 18, 2020

Hi @faroit

I am working on a new backend with sox and along the way, I addressed most of the issues with the existing sox backend.
#726
#728
Unfortunately the new backend is not compatible with the existing one due to TorchScript's technical limitation and design choice, but you may want to check out once it lands.

@faroit
Copy link
Contributor Author

faroit commented Aug 20, 2020

@mthrok

has this been addressed in #761 and can be closed?

@mthrok
Copy link
Collaborator

mthrok commented Aug 20, 2020

@mthrok

has this been addressed in #761 and can be closed?

Hi @faroit
#761 is for the new "sox_io" backend and the original "sox" backend is not affected. We plan to deprecate and decommission "sox" backend in the future, by replacing the default backend with the new "sox_io" backend. I was thinking that we can close this issue when we decommission the "sox" backend.

@faroit
Copy link
Contributor Author

faroit commented Aug 21, 2020

@mthrok

#761 is for the new "sox_io" backend and the original "sox" backend is not affected.

does sox_io match the metadata format of soundfile then? If not, I would still propose to unify the defintion of samples, and frames with respect to the duration of a file.

@mthrok
Copy link
Collaborator

mthrok commented Aug 21, 2020

does sox_io match the metadata format of soundfile then?

Yes it does. I followed your advice and made sure that it's #samples == #frames x #channels
and also I did my best to address all the known issues with the original sox backend. You can see the over view #726 (comment)

@faroit
Copy link
Contributor Author

faroit commented Aug 21, 2020

Yes it does. I followed your advice and made sure that it's #samples == #frames x #channels

馃憤 so a unified info backend could look like this?

def load_info(path: str) -> dict:
    # get length of file in samples
    info = {}
    if torchaudio.get_audio_backend() == "sox_io":
        si = torchaudio.info(str(path))
        info['samplerate'] = si.sample_rate
        info['samples'] = si.num_frames
    else:
        si, _ = torchaudio.info(str(path))
        info['samplerate'] = si.rate
        if torchaudio.get_audio_backend() == "sox":
            info['samples'] = si.length // si.channels
        else:
            # soundfile and sox_io calc per channel
            info['samples'] = si.length

    info['duration'] = info['samples'] / info['samplerate']
    return info

and also I did my best to address all the known issues with the original sox backend. You can see the over view #726
(comment)

nice work. Is this also faster?

@mthrok
Copy link
Collaborator

mthrok commented Aug 21, 2020

so a unified info backend could look like this?

Yes, that looks about right. Sorry for the interface discrepancy. We plan to unity the interface (align the sound file interface to sox_io) when we decommission sox backend.

Is this also faster?

I do not anticipate the speed improvement. (memory usage could be improved if you are handling large files) and I did not benchmark them.

@mthrok
Copy link
Collaborator

mthrok commented Nov 4, 2020

In 0.7.0 release, we introduced the new interface for soundfile backend and in #978 we switched the default backends which report the number of samples correctly. This will be a part of 0.8.0 release.
The legacy backends are scheduled to be removed in 0.9.0.

@mthrok
Copy link
Collaborator

mthrok commented Dec 17, 2020

Closing the issue as the new backends handle this properly.

@mthrok mthrok closed this as completed Dec 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants