New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TorchScript-able "info" func to sox_io backend #728
Conversation
info
func to sox_io backend9b76f9f
to
5e970ee
Compare
Codecov Report
@@ Coverage Diff @@
## master #728 +/- ##
==========================================
+ Coverage 88.97% 88.99% +0.02%
==========================================
Files 31 32 +1
Lines 2522 2527 +5
==========================================
+ Hits 2244 2249 +5
Misses 278 278
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -34,4 +34,7 @@ printf "* Installing dependencies (except PyTorch)\n" | |||
conda env update --file "${this_dir}/environment.yml" --prune | |||
|
|||
# 4. Build codecs | |||
build_tools/setup_helpers/build_third_party.sh | |||
# build_tools/setup_helpers/build_third_party.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you elaborate on why you are changing this default mechanic as part of this pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The static build of libsox
(which is what we ship as binary distribution) does not contain codecs for ogg/vorbis
.
Using libsox-fmt-all
we can test ogg/vorbis
types too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we can still build third parties by using BUILD_SOX=1
, but by default, we link with the static build. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When BUILD_SOX=0
_torchaudio
is linked against a libsox
found in the system. static/dynamic depends on what is found.
@@ -87,6 +89,33 @@ def set_audio_backend(backend): | |||
torchaudio.set_audio_backend(be) | |||
|
|||
|
|||
class TempDirMixin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not for this PR: just as is done below in a prior pr, what's the advantage of not inheriting PytorchTestCase
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid inheritance of PytorchTestCase
multiple times at the test module.
TempDirMixin
is designed to be composable so that each test case can choose use or not to use.
This decision should not collide with the fact we are requiring each test case to inherit PytorchTestCase
.
['float32', 'int32', 'int16', 'uint8'], | ||
[8000, 16000], | ||
[1, 2], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is nice :) is there a way to have pass keyword arguments with parameterized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know. Please refer to the doc https://pypi.org/project/parameterized/
@@ -61,7 +55,7 @@ def assert_equal(self, output, *, expected, rtol=None, atol=None): | |||
expected = expected.to(dtype=self.dtype, device=self.device) | |||
self.assertEqual(output, expected, rtol=rtol, atol=atol) | |||
|
|||
@unittest.skipIf(_not_available('apply-cmvn-sliding'), '`apply-cmvn-sliding` not available') | |||
@common_utils.skipIfNoExec('apply-cmvn-sliding') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: again, i'm not sure why this is changing as part of this pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as to avoid two duplicated implementation.
kaldi_compatibility_impl
had this original version of skipIfNoExec
implementation (which is _not_available
func + unittest.skipIf
), and now that implementation is promoted to common utility because the new sox_io
test uses it too.
thanks! |
In pytorch#728, linux unit test switches to libsox provided by apt. For CPU jobs this is fine because all the job steps share the same Docker container, but on CPU job, each job step runs a script in a new Docker container, so libsox installed in a step is not available to the subsequent steps. To fix this, this PR moves the installation of libsox and sox to Docker build.
In #728, linux unit test switches to libsox provided by apt. For CPU jobs this is fine because all the job steps share the same Docker container, but on CPU job, each job step runs a script in a new Docker container, so libsox installed in a step is not available to the subsequent steps. To fix this, this PR moves the installation of libsox and sox to Docker build.
This is a part of PRs to add new "sox_io" backend. #726 and depends on #718 and #728 . This PR adds `load` function to "sox_io" backend, which is tested on the following audio formats; - `wav` - `mp3` - `flac` - `ogg/vorbis` * By default, "sox_io" backend returns Tensor with `float32` dtype and the shape of `[channel, time]`. The samples are normalized to fit in the range of `[-1.0, 1.0]`. Unlike existing "sox" backend, the new `load` function can handle WAV file natively, when the input format is WAV with integer type, (such as 32-bit signed integer, 16-bit signed integer and 8-bit unsigned integer) by providing `normalize=False`, this function can return integer Tensor, where the samples are expressed within the whole range of the corresponding dtype, that is, `int32` tensor for `32-bit PCM`, `int16` for `16-bit PCM` and `uint8` for `8-bit PCM`. This behavior follows [scipy.io.wavfile.read](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html). `normalize` parameter has no effect for other formats and the load function always return normalized value with `float32` Tensor. __* Note__ The current binary distribution of torchaudio does not contain `ogg/vorbis` and `opus` codecs. To handle these files, one needs to build torchaudio from the source with proper codecs in the system. __Note 2__ Since this PR, `scipy` becomes required module for running test.
This is a part of PRs to add new "sox_io" backend #726, and depends on #718.
This PR adds
info
function to "sox_io" backend, which allows users to fetch some metadata of an audio file.At this moment, the information retrieved are;
This implementation ended up being aligned with what was suggested in #618 (comment).
Compared to the original "sox" backend, which exposed all
sox_signalinfo_t
andsox_encodinginfo_t
, this is limited but it was not helpful to expose all of sox internals either. For the details of these structures, seesox_signalinfo_t
andsox_encodinginfo_t
.