Info length and rate returns different values for different backends #618
Comments
Relates to #236? |
I agree that both backends should return the same values. We need to make sure we document the change to either one so that users are aware of it. Thoughts? |
I could reproduce this; code
|
I checked #236, and it does not fix the issue. |
We had already used the sox backend in production (before soundfile was merged in), so we made the modifications to info in our code (should have reported earlier!). If we change the info object now, this might cause unexpected results for users. I would rather solve this by
I could provide a PR |
You mean that you adjusted your code for sox, soundfile, or both? |
I had I looked into the unit tests of the |
@faroit Sure. We should add the test. Let us know if you would like to open a PR. If you are busy I will add it to my backlog. |
There are two issues here.
For the first, I'm concerned about such a change for backward compatibility for sox. I would recommend instead adding a key
For the second, I second adding a stereo test for consistency, of course. :) One of the two would need to be updated to match the other. Since sox has been there longer, soundfile's
I prefer this solution to creating a new abstraction, simply because it keeps us closer to the intended setup so far. Thoughts? Do you want to open pull requests for this? |
Regarding the first issue, this is probably just a matter of setting the right vocabulary to make a formal distinction between frames and samples, as is done in libsndfile. Over there:
and
which makes total sense to me (also, soundfile is the de facto standard when it comes to proper handling of audio I/O). However, this would probably lead to too many changes here, but it makes sense to put the definition that is used here ("we define
I agree this is probably the simplest solution
I started with a new test, #639, that is expected to fail, and I can propose a fix for this as well (in the same PR?) |
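The frames versus samples distinction discussed above can be illustrated with a small, hedged sketch. It uses only Python's standard `wave` module as a stand-in for the audio backends, so it does not reproduce the torchaudio behavior itself; the helper names `write_silent_wav` and `frames_and_samples` are hypothetical and chosen for illustration. A frame here is one sample per channel, so a stereo file has twice as many samples as frames.

```python
import os
import tempfile
import wave


def write_silent_wav(path, num_frames, num_channels, sample_rate=8000):
    """Write a 16-bit PCM WAV file containing `num_frames` frames of silence."""
    with wave.open(path, "wb") as f:
        f.setnchannels(num_channels)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        f.setframerate(sample_rate)
        # One frame holds one 2-byte sample for every channel.
        f.writeframes(b"\x00\x00" * num_frames * num_channels)


def frames_and_samples(path):
    """Return (frames, samples): frames count per channel, samples sum over channels."""
    with wave.open(path, "rb") as f:
        frames = f.getnframes()
        samples = frames * f.getnchannels()
    return frames, samples


with tempfile.TemporaryDirectory() as tmp:
    stereo_path = os.path.join(tmp, "stereo.wav")
    write_silent_wav(stereo_path, num_frames=8000, num_channels=2)
    frames, samples = frames_and_samples(stereo_path)
```

A stereo consistency test along these lines would assert that every backend reports the per-channel value (`frames`), not the value summed across channels (`samples`).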
Hi @faroit I am working on a new backend with sox and along the way, I addressed most of the issues with the existing sox backend. |
Hi @faroit |
Yes it does. I followed your advice and made sure that it's |
👍 So a unified info backend could look like this?

    import torchaudio

    def load_info(path: str) -> dict:
        # get length of file in samples (per channel)
        info = {}
        if torchaudio.get_audio_backend() == "sox_io":
            si = torchaudio.info(str(path))
            info['samplerate'] = si.sample_rate
            info['samples'] = si.num_frames
        else:
            si, _ = torchaudio.info(str(path))
            info['samplerate'] = si.rate
            if torchaudio.get_audio_backend() == "sox":
                # sox reports length summed across channels
                info['samples'] = si.length // si.channels
            else:
                # soundfile reports length per channel
                info['samples'] = si.length
        info['duration'] = info['samples'] / info['samplerate']
        return info

Nice work. Is this also faster? |
Yes, that looks about right. Sorry for the interface discrepancy. We plan to unify the interface (align the soundfile interface to sox_io) when we decommission the sox backend.
I do not anticipate a speed improvement (memory usage could improve if you are handling large files), and I did not benchmark them. |
In the 0.7.0 release, we introduced the new interface for |
Closing the issue as the new backends handle this properly. |
🐛 Bug
torchaudio.info returns the info objects directly from the respective backend. Because the properties share the same names, users might forget to check how the metadata is calculated. This results in metadata being reported differently depending on which backend is used. E.g. sox calculates the length summed across channels, whereas soundfile does this per channel (correct).
I would propose adding a wrapper for the info objects so that, independent of the backend, the most important metadata (length and rate) is identical.
Currently, the sox backend reports a misleading length, and the rate parameter is of type float instead of int.
To Reproduce
Expected behavior
soundfile reports the correct metadata; sox should be corrected so that:
Environment
torchaudio==0.5.0 from pypi
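
The wrapper proposed in this report could look roughly like the following sketch. It is an illustration only, not the torchaudio API: `NormalizedInfo` and `normalize_info` are hypothetical names, and the function takes the raw `length`, `channels`, and `rate` values as plain arguments so that it runs without any audio backend installed. It assumes, as described above, that the legacy sox backend reports `length` summed across channels and `rate` as a float, while soundfile reports `length` per channel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedInfo:
    """Backend-independent metadata (hypothetical container, not a torchaudio type)."""
    num_frames: int    # length per channel
    num_channels: int
    sample_rate: int   # always an int, regardless of backend


def normalize_info(backend: str, length: int, channels: int, rate: float) -> NormalizedInfo:
    """Convert raw backend values into per-channel length and an integer rate."""
    if backend == "sox":
        # Assumption from the report: sox sums the length across channels.
        num_frames = length // channels
    elif backend == "soundfile":
        # Assumption from the report: soundfile already reports per-channel length.
        num_frames = length
    else:
        raise ValueError(f"unsupported backend: {backend!r}")
    return NormalizedInfo(num_frames, channels, int(rate))


# One second of 44.1 kHz stereo audio, as each backend would describe it.
sox_info = normalize_info("sox", length=88200, channels=2, rate=44100.0)
sf_info = normalize_info("soundfile", length=44100, channels=2, rate=44100)
```

With this normalization both calls describe the same file identically, which is the property a stereo consistency test would check.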