Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TorchScript-able "info" func to sox_io backend #728

Merged
merged 6 commits into from Jun 19, 2020

Conversation

mthrok
Copy link
Collaborator

@mthrok mthrok commented Jun 18, 2020

This is a part of PRs to add new "sox_io" backend #726, and depends on #718.

This PR adds info function to "sox_io" backend, which allows users to fetch some metadata of an audio file.
At this moment, the information retrieved are;

  • Number of samples in the audio file
  • Sampling rate
  • Number of channels

This implementation ended up being aligned with what was suggested in #618 (comment).

Compared to the original "sox" backend, which exposed all sox_signalinfo_t and sox_encodinginfo_t, this is limited but it was not helpful to expose all of sox internals either. For the details of these structures, see sox_signalinfo_t and sox_encodinginfo_t.

@mthrok mthrok changed the title Add TorchScript-able info func to sox_io backend Add TorchScript-able "info" func to sox_io backend Jun 18, 2020
@mthrok mthrok force-pushed the get-info branch 2 times, most recently from 9b76f9f to 5e970ee Compare June 18, 2020 19:10
@mthrok mthrok mentioned this pull request Jun 18, 2020
6 tasks
@codecov
Copy link

codecov bot commented Jun 19, 2020

Codecov Report

Merging #728 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #728      +/-   ##
==========================================
+ Coverage   88.97%   88.99%   +0.02%     
==========================================
  Files          31       32       +1     
  Lines        2522     2527       +5     
==========================================
+ Hits         2244     2249       +5     
  Misses        278      278              
Impacted Files Coverage Δ
torchaudio/backend/sox_io_backend.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f8eac89...74459d9. Read the comment docs.

Copy link
Contributor

@vincentqb vincentqb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -34,4 +34,7 @@ printf "* Installing dependencies (except PyTorch)\n"
conda env update --file "${this_dir}/environment.yml" --prune

# 4. Build codecs
build_tools/setup_helpers/build_third_party.sh
# build_tools/setup_helpers/build_third_party.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you elaborate on why you are changing this default mechanic as part of this pr?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static build of libsox (which is what we ship as binary distribution) does not contain codecs for ogg/vorbis.
Using libsox-fmt-all we can test ogg/vorbis types too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can still build third parties by using BUILD_SOX=1, but by default, we link with the static build. Is that correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When BUILD_SOX=0 _torchaudio is linked against a libsox found in the system. static/dynamic depends on what is found.

@@ -87,6 +89,33 @@ def set_audio_backend(backend):
torchaudio.set_audio_backend(be)


class TempDirMixin:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for this PR: just as is done below in a prior pr, what's the advantage of not inheriting PytorchTestCase here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid inheritance of PytorchTestCase multiple times at the test module.
TempDirMixin is designed to be composable so that each test case can choose use or not to use.
This decision should not collide with the fact we are requiring each test case to inherit PytorchTestCase.

Comment on lines +22 to +24
['float32', 'int32', 'int16', 'uint8'],
[8000, 16000],
[1, 2],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is nice :) is there a way to have pass keyword arguments with parameterized?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know. Please refer to the doc https://pypi.org/project/parameterized/

@@ -61,7 +55,7 @@ def assert_equal(self, output, *, expected, rtol=None, atol=None):
expected = expected.to(dtype=self.dtype, device=self.device)
self.assertEqual(output, expected, rtol=rtol, atol=atol)

@unittest.skipIf(_not_available('apply-cmvn-sliding'), '`apply-cmvn-sliding` not available')
@common_utils.skipIfNoExec('apply-cmvn-sliding')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: again, i'm not sure why this is changing as part of this pr?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as to avoid two duplicated implementation.

kaldi_compatibility_impl had this original version of skipIfNoExec implementation (which is _not_available func + unittest.skipIf), and now that implementation is promoted to common utility because the new sox_io test uses it too.

@mthrok mthrok merged commit 88fccd1 into pytorch:master Jun 19, 2020
@mthrok mthrok deleted the get-info branch June 19, 2020 20:22
@mthrok
Copy link
Collaborator Author

mthrok commented Jun 19, 2020

thanks!

mthrok added a commit to mthrok/audio that referenced this pull request Jun 22, 2020
In pytorch#728, linux unit test switches to libsox provided by apt.
For CPU jobs this is fine because all the job steps share the same Docker container,
but on CPU job, each job step runs a script in a new Docker container, so
libsox installed in a step is not available to the subsequent steps.

To fix this, this PR moves the installation of libsox and sox to Docker build.
seemethere pushed a commit that referenced this pull request Jun 23, 2020
In #728, linux unit test switches to libsox provided by apt.
For CPU jobs this is fine because all the job steps share the same Docker container,
but on CPU job, each job step runs a script in a new Docker container, so
libsox installed in a step is not available to the subsequent steps.

To fix this, this PR moves the installation of libsox and sox to Docker build.
mthrok added a commit that referenced this pull request Jun 25, 2020
This is a part of PRs to add new "sox_io" backend. #726 and depends on #718 and #728 .

This PR adds `load` function to "sox_io" backend, which is  tested on the following audio formats;
 - `wav`
 - `mp3`
 - `flac`
 - `ogg/vorbis` *

By default, "sox_io" backend returns Tensor with `float32` dtype and the shape of `[channel, time]`. The samples are normalized to fit in the range of `[-1.0, 1.0]`.

Unlike existing "sox" backend, the new `load` function can handle WAV file natively, when the input format is WAV with integer type, (such as 32-bit signed integer, 16-bit signed integer and 8-bit unsigned integer) by providing `normalize=False`, this function can return integer Tensor, where the samples are expressed within the whole range of the corresponding dtype, that is, `int32` tensor for `32-bit PCM`, `int16` for `16-bit PCM` and `uint8` for `8-bit PCM`. This behavior follows [scipy.io.wavfile.read](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html). `normalize` parameter has no effect for other formats and the load function always return normalized value with `float32` Tensor.

__* Note__ The current binary distribution of torchaudio does not contain `ogg/vorbis` and `opus` codecs. To handle these files, one needs to build torchaudio from the source with proper codecs in the system.

__Note 2__ Since this PR, `scipy` becomes required module for running test.
mthrok added a commit that referenced this pull request Jul 1, 2020
This is a part of PRs to add new "sox_io" backend. #726 and depends on #718, #728 and #731.

This PR adds `save` function to "sox_io" backend, which can save Tensor to a file with the following audio formats;
 - `wav`
 - `mp3`
 - `flac`
 - `ogg/vorbis`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants