[Announcement] Improving I/O for correct and consistent experience #903
Comments
* Add deprecation warning to sox backend Refer to #903
As a part of the "sox" backend sunset plan (#903), we add a "soundfile" backend that is compatible with the "sox_io" backend. No new public backend name is added. We provide a switch to change the interface/behavior of the "soundfile" backend.

This commit contains:
- The implementation of the new "soundfile" backend.
- The flag to switch the behavior of the "soundfile" backend (`torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE`).
- Tests for the new backend and the switching mechanism.

The default behavior of the "soundfile" backend is not changed. Users who want to opt in to the new "soundfile" interface can do so by setting `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False` before changing the backend to "soundfile". In the 0.8.0 release, the "soundfile" backend will use this interface by default, and users can still use the legacy one with `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = True`. In 0.9.0, the legacy interface will be removed, and the `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE` flag will eventually be removed as well.
Just a quick question, does it mean that since 0.7 or 0.8 we can include |
Hi @snakers4
Yes. Technically, you can already do it with 0.6; however, the corresponding library is not yet available in any form, so you cannot run it outside a Python application. I plan to propose this to the team after the release work, but there is no fixed time frame for landing it yet, and I am not even sure that I can land it.
We are considering the possibility to add an I/O module (not another backend but something like
The Python "soundfile" package is not TorchScript compatible, so one of the things we are considering as a part of the I/O module described above is to bind |
Nice! This is probably months from becoming actually useful to end users like us, but it increases the value of the PyTorch ecosystem quite a bit. Btw, the current VAD in torchaudio seems to be a port of some energy-based algorithm. We are planning to make a public, general, TorchScript-able noise / voice / music VAD pre-trained on large voice / noise / music corpora. Guess we could collaborate on that.
Ah, that's a very optimistic view, although that's what I am aiming for. I am working on an RFC with example usage, so that the community can respond. Then we will finalize the interface and start working on the implementation.
Thanks, that's a nice reaction to have. One of the things we struggle with is getting a signal from the community, so feedback like that is really helpful. (and motivating for me ;) )
The current VAD is basically a port of the sox implementation.
That's very interesting. Please keep us updated!
The current state of audio is that there are no go-to tools / components that would work on all platforms. In real projects you basically need a VAD + STT + some post-processing. For edge deployments we still need a 2-4x reduction in model size (which is already achievable), but as I mentioned, there is still no easy way to run a PyTorch model in a browser.
I will post an update here.
Refer to #903 for the overview of planned I/O changes.
* Change the default backend from `"sox"(deprecated)` to `"sox_io"`
* Change the default interface of `"soundfile"` backend to the one identical to `"sox_io"` backend.
* Deprecate `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE`
* Update documentation
* Re-order backends (default first)
* Update overhaul timeline (removed 0.7.0)
* Simplify `"soundfile"` backend description
This is great news, this will definitely improve trust and adoption of torchaudio 🙂 !
In lines [151-160](https://github.com/pytorch/audio/blob/master/examples/pipeline_wav2letter/main.py#L151) and line [437](https://github.com/pytorch/audio/blob/fb3ef9ba427acd7db3084f988ab55169fab14854/examples/pipeline_wav2letter/main.py#L437) of main.py, the default values of `dataset-root` and `dataset-folder-in-archive` are None, so `main.py` does not know where the dataset is located on disk and cannot load it. Moreover, `n-hidden-channels 2000` is not defined in `main.py`, so it needs to be removed.

Error log:
```bash
python main.py \
    --reduce-lr-valid \
    --dataset-train train-clean-100 train-clean-360 train-other-500 \
    --dataset-valid dev-clean \
    --batch-size 128 \
    --learning-rate .6 \
    --momentum .8 \
    --weight-decay .00001 \
    --clip-grad 0. \
    --gamma .99 \
    --hop-length 160 \
    --win-length 400 \
    --n-bins 13 \
    --normalize \
    --optimizer adadelta \
    --scheduler reduceonplateau \
    --epochs 30
/home/hoangtnm/anaconda3/envs/dl/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to pytorch#903 for the detail.
  '"sox" backend is being deprecated. '
INFO:root:Namespace(batch_size=128, checkpoint='', clip_grad=0.0, dataset_folder_in_archive=None, dataset_root=None, dataset_train=['train-clean-100', 'train-clean-360', 'train-other-500'], dataset_valid=['dev-clean'], decoder='greedy', distributed=False, epochs=30, eps=1e-08, freq_mask=0, gamma=0.99, hop_length=160, jit=False, learning_rate=0.6, momentum=0.8, n_bins=13, normalize=True, optimizer='adadelta', progress_bar=False, reduce_lr_valid=True, rho=0.95, scheduler='reduceonplateau', seed=0, start_epoch=0, time_mask=0, type='mfcc', weight_decay=1e-05, win_length=400, workers=0, world_size=8)
INFO:root:Start time: 2020-11-28 21:18:22.337478
/home/hoangtnm/anaconda3/envs/dl/lib/python3.7/site-packages/torchaudio/backend/utils.py:64: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False` before setting the backend to "soundfile". Please refer to pytorch#903 for the detail.
  'The interface of "soundfile" backend is planned to change in 0.8.0 to '
Traceback (most recent call last):
  File "main.py", line 670, in <module>
    spawn_main(main, args)
  File "main.py", line 663, in spawn_main
    main(0, args)
  File "main.py", line 454, in main
    root=args.dataset_root,
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 65, in split_process_vlsp2020asr
    return tuple(create(dataset) for dataset in datasets)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 65, in <genexpr>
    return tuple(create(dataset) for dataset in datasets)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 57, in create
    for tag, transform in zip(tags, transform_list)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 57, in <listcomp>
    for tag, transform in zip(tags, transform_list)
  File "/media/aiteam/DATA/workspace/hoangtnm/audio/examples/pipeline_wav2letter/src/datasets.py", line 15, in __init__
    self._path = os.path.join(root, url)
  File "/home/hoangtnm/anaconda3/envs/dl/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
```
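The failure at the end of the traceback can be reproduced without torchaudio. The following is a minimal sketch, not the code of the example script: `build_parser` and `dataset_path` are hypothetical helpers, and the paths `/data` and `LibriSpeech` are placeholders. It only shows that an option left at its `None` default makes `os.path.join` raise `TypeError`, and that requiring the option avoids the problem.

```python
import argparse
import os


def build_parser(require_paths: bool) -> argparse.ArgumentParser:
    """Build a tiny parser that mimics the two dataset options discussed above."""
    parser = argparse.ArgumentParser()
    # With require_paths=False, both options silently default to None.
    parser.add_argument("--dataset-root", type=str, required=require_paths)
    parser.add_argument("--dataset-folder-in-archive", type=str, required=require_paths)
    return parser


def dataset_path(args: argparse.Namespace) -> str:
    # os.path.join raises TypeError when an argument is None.
    return os.path.join(args.dataset_root, args.dataset_folder_in_archive)


# Without explicit paths, both options are None and joining them fails.
args = build_parser(require_paths=False).parse_args([])
try:
    dataset_path(args)
    failed = False
except TypeError:
    failed = True

# With explicit (placeholder) paths, the join succeeds.
args = build_parser(require_paths=True).parse_args(
    ["--dataset-root", "/data", "--dataset-folder-in-archive", "LibriSpeech"]
)
resolved = dataset_path(args)
```

Marking the options as required is only one possible fix; giving them sensible defaults would work as well.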
This might be a stupid question, but should the warning I import
but still get the above warning. |
The warning is issued at the time |
Thanks @mthrok. Yes, data is being loaded as float32. Here's an example of a dataset that has many sound files that I'm using that are in 24-bit signed format.
I'm running into the same issue. I'm loading some 24-bit audio files and sox_io fails to load them. I can use the sox backend for now, but would appreciate it if the 24-bit format could be supported in sox_io too. A good way to handle normalize=False would be to make it unsupported for this specific format, given that most of the time people would use normalize=True (at least that's what I almost always do). Another idea would be to convert the 24-bit format automatically/internally to 32-bit even if normalize=False. Thanks
@ketanhdoshi 24-bit support seems to have been added a couple of days ago to the master branch #1389
@aelimame @ketanhdoshi Sorry, I forgot to let you know, but we added 24-bit support. It's nice to learn that it is working for you @aelimame.
FYI: @ketanhdoshi @aelimame 24-bit support has been ported to release
Closing the issue as 0.9 is released, which concludes the migration.
tl;dr: how to migrate to new backend/interface in 0.7
* If you are using `torchaudio` in Linux/macOS environments, please use `torchaudio.set_audio_backend("sox_io")` to adapt to the upcoming changes.
* If you are in a Windows environment, please set `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False` and reload the backend to use the new interface.
* Note that this ships with some bug fixes for formats other than 16-bit signed integer WAV, so you might experience some BC-breaking changes as described in the section below.
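The two steps above can be sketched as one small helper. This is an illustration under stated assumptions, not an official recipe: `adopt_new_io` is a hypothetical name, it takes the `torchaudio` module as an argument, and "reload backend" is interpreted as calling `set_audio_backend("soundfile")` after setting the flag. A stand-in object replaces `torchaudio` here so the sketch runs without the library installed.

```python
import platform
from types import SimpleNamespace


def adopt_new_io(ta, system=None):
    """Apply the tl;dr migration steps to a torchaudio-like module `ta`."""
    system = system or platform.system()
    if system == "Windows":
        # Opt in to the new "soundfile" interface, then (re)load the backend.
        ta.USE_SOUNDFILE_LEGACY_INTERFACE = False
        ta.set_audio_backend("soundfile")
        return "soundfile"
    # Linux/macOS: switch to the "sox_io" backend.
    ta.set_audio_backend("sox_io")
    return "sox_io"


# Stand-in for the torchaudio module (an assumption made for this sketch).
# It only records which backend names were requested.
calls = []
fake = SimpleNamespace(
    USE_SOUNDFILE_LEGACY_INTERFACE=True,
    set_audio_backend=calls.append,
)
chosen_linux = adopt_new_io(fake, system="Linux")
chosen_windows = adopt_new_io(fake, system="Windows")
```

With torchaudio 0.7 installed, the call would presumably be `adopt_new_io(torchaudio)`, made once at program start before any `load`/`save`/`info` call.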
News
* [UPDATE] 2021/03/06
* [UPDATE] 2021/02/12
  * Added `bits_per_sample` and `encoding` argument (replaced `dtype`) to `save` function.
* [UPDATE] 2021/01/29
  * Added `encoding` to `AudioMetaData`
* [UPDATE] 2021/01/22
  * Added `format` argument to `load`/`info`/`save` function.
  * Added `bits_per_sample` to `AudioMetaData`
* [UPDATE] 2020/10/21
  * `"soundfile"` backend legacy interface.
* [UPDATE] 2020/09/18
  * `"soundfile"` backend.
  * `"soundfile"` backend signatures change from 0.9.0 to 0.8.0 so that they match with `"sox_io"` backend, which becomes default in 0.8.0.
* [UPDATE] 2020/09/17
  * `libsox` structures such as `signalinfo_t` and `encoding_t`.
Improving I/O for correct and consistent experience
This is an announcement for users that we are making backward-incompatible changes to the I/O functions of `torchaudio` backends from the 0.7.0 release through the 0.9.0 release.
What is affected?
Public APIs
* `torchaudio.load`
  * With the default backend switching from `"sox"` backend to `"sox_io"` backend in 0.8.0, loading audio formats other than 16-bit signed integer WAV returns the correct tensor.
  * The interface of `"soundfile"` backend will be changed in 0.8.0 to match that of `"sox_io"` backend.
* `torchaudio.save`
  * With `"sox_io"` backend, saving audio files will no longer degrade the data. The supported formats will be restricted to the tested formats only. (Please refer to the doc for the supported formats.)
  * The interface of `"soundfile"` backend will be changed in 0.8.0 to match that of `"sox_io"` backend.
* `torchaudio.info`
  * The interface of `"soundfile"` backend will be changed in 0.8.0 to match that of `"sox_io"` backend.
* `torchaudio.load_wav`
  * Deprecated and scheduled for removal (`load` function with `normalize=False` will provide the same functionality)
Internal APIs
The following functions/classes of `"sox"` backend were accidentally exposed and will be removed in 0.9.0. There is no replacement for them. Please use `save`/`load`/`info` functions.
* `torchaudio.save_encinfo`
* `torchaudio.get_sox_signalinfo_t`
* `torchaudio.get_sox_encodinginfo_t`
* `torchaudio.get_sox_option_t`
* `torchaudio.get_sox_bool`
The signatures of the other backends are not planned to be changed within this overhaul plan.
* `torchaudio.SignalInfo` and `torchaudio.EncodingInfo`
  * Replaced by `AudioMetaData` in 0.8.0 for `"soundfile"` backend
Why
There are currently three backends in `torchaudio`. (Please refer to the documentation for the detail.)
`"sox"` backend is the original backend, which binds `libsox` with `pybind11`. The functionalities (`load`/`save`/`info`) of this backend are not well-tested and have a number of issues (see #726).
Fixing these issues in a backward-compatible manner is not straightforward. Therefore, while we were adding TorchScript-compatible I/O functions, we decided to deprecate this original `"sox"` backend and replace it with the new backend (`"sox_io"` backend), which is confirmed not to have those issues.
As we switch the default backend for Linux/macOS from `"sox"` to `"sox_io"` backend, we would like to align the interface of `"soundfile"` backend. Therefore, we introduced the new interface to `"soundfile"` backend (rather than a new backend, to reduce the number of public APIs).
When / What Changes
The following is the timeline for the planned changes:
* 0.7.0 (Oct 2020)
  * `"sox"` backend issues deprecation warning. (Add deprecation warning to sox backend #904)
  * `"soundfile"` backend issues warning of expected signature change. (Add expected BC-breaking change warning to soundfile #906)
  * New interface is added to `"soundfile"` backend. (Add soundfile compatibility backend #922)
  * `load_wav` functions of all backends are marked as deprecated. (Add deprecation warnings to load_wav functions #905)
* 0.8.0 (March 2021)
  * `"sox_io"` backend becomes default backend. Function signatures of `"soundfile"` backend are aligned with `"sox_io"` backend. (Switch the default backend to the ones with new interfaces #978)
  * `get_sox_XXX` functions issue deprecation warning. (Add deprecation warnings to libsox specific functions #975)
* 0.9.0
  * `"sox"` backend is removed. (Removed legacy backends from torchaudio #1311)
  * Legacy interface of `"soundfile"` backend is removed. (Removed legacy backends from torchaudio #1311)
  * `load_wav` functions are removed from all backends. (BC-Breaking: Remove deprecated load_wav functions from backends #1362)
Planned signature changes of `"soundfile"` backend in 0.8.0
The following is the planned signature change of `"soundfile"` backend functions in 0.8.0 release.
`info` function
`AudioMetaData` implementation can be found here. The placement of the `AudioMetaData` might be changed.
Migration
The values returned from `info` function will be changed. Please use the corresponding new attributes.
Note: If the attribute you are using is missing, file a Feature Request issue.
`load` function
Migration
Please change the argument names:
* `normalization` -> `normalize`
* `offset` -> `frame_offset`
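For code bases with many call sites, the renaming above could be applied mechanically. The following is a minimal sketch, assuming only the two renames listed in the migration note; `RENAMED_LOAD_ARGS` and `translate_load_kwargs` are hypothetical names and not part of `torchaudio`. The sketch renames keywords only and does not check that a legacy value keeps the same meaning under the new interface.

```python
# Legacy -> new keyword names, taken from the migration note above.
RENAMED_LOAD_ARGS = {"normalization": "normalize", "offset": "frame_offset"}


def translate_load_kwargs(kwargs):
    """Return a copy of `kwargs` with legacy `load` argument names renamed."""
    translated = {}
    for name, value in kwargs.items():
        new_name = RENAMED_LOAD_ARGS.get(name, name)
        if new_name in translated:
            # The same argument was passed under both its old and new name.
            raise TypeError(f"argument {new_name!r} given more than once")
        translated[new_name] = value
    return translated


# Example values are placeholders chosen for illustration.
legacy_call = {"normalization": True, "offset": 16000}
new_call = translate_load_kwargs(legacy_call)
```

A call site would then pass the result on, for example `load(path, **new_call)`, once it has been confirmed that the values themselves are still valid for the new arguments.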
`save` function
Migration
BC-breaking changes
Read and write operations on formats other than 16-bit signed integer WAV were affected by small bugs.