Port sox::vad #578
Conversation
Thanks again for working on this!
For the coverage of tests that we expect, you can see this readme that will be merged as part of #566. In particular, we need
- sox compatibility test
- jitability test
- batch test
In terms of code organization, we put the computation in a functional when possible so that the transform is more of a thin wrapper around this.
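As a sketch of that organization (the names and signatures here are illustrative placeholders, not the actual torchaudio code): the computation lives in a plain function, and the transform only stores its parameters and delegates.

```python
import torch


def vad(waveform: torch.Tensor, sample_rate: int, trigger_level: float = 7.0) -> torch.Tensor:
    # Stand-in for the actual VAD computation (illustrative only).
    return waveform


class Vad(torch.nn.Module):
    # Thin wrapper: store parameters in __init__, delegate to the functional.
    def __init__(self, sample_rate: int, trigger_level: float = 7.0):
        super().__init__()
        self.sample_rate = sample_rate
        self.trigger_level = trigger_level

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return vad(waveform, self.sample_rate, self.trigger_level)
```

This keeps all logic testable through the functional, while the transform remains a stateless parameter holder.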
Since this is WIP, I suggest prefixing the title with [WIP] so we know when we are ready to consider merging. As an alternative to prefixing with [WIP], GitHub allows creating a draft pull request, though I don't see a way to convert an existing pull request to draft mode. Do you?
Can you give a little more detail? Is the current implementation working on a single channel? How does sox handle multiple channels? Is the output meant to be per channel?
Got it. That was my original intent in the first commit, but after I ended up with two classes representing state, I moved things to
I tested the current implementation on a single channel. I will confirm sox behavior on multiple channels and report back. Looking at the original C code, I think it triggers on voice activity in any channel and outputs all channels after the trigger.
If the VAD works per channel, then it's easy to add batching by simply reshaping the tensor from (batch, channels, time) to (batch * channels, time), and back. If sox runs on each channel and takes the union of detected regions over the channels, it may be a good idea to leave the detection per channel and let the user decide what to do from there.
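The reshaping trick described here can be sketched roughly as below (`fn` is a stand-in for any per-channel function, not the real API; note this assumes `fn` returns equal-length outputs for every channel, which a trimming VAD may not):

```python
import torch


def apply_batched(fn, x: torch.Tensor) -> torch.Tensor:
    # x: (batch, channels, time). Fold channels into the batch dimension,
    # run the per-channel function, then restore the leading dimensions.
    batch, channels, time = x.shape
    out = fn(x.reshape(batch * channels, time))
    return out.reshape(batch, channels, -1)


x = torch.randn(2, 3, 100)
y = apply_batched(lambda t: t, x)  # identity as a trivial per-channel fn
assert y.shape == (2, 3, 100)
```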
That's right. I can technically pull the state inside the computation by declaring classes inside the function; let's see how that affects readability and vectorization.
EDIT: Wait, I no longer see it anywhere.
It is also worth noting what the common practice is to use. Having said that, if we are adding a backport module as a new dependency, the following has to be corrected, in addition to the implementation.
So I see a lot of maintenance overhead in adding a backport library, while the benefit does not outweigh the overhead.
@mthrok Ugh, that's correct, sorry about that. Having a state class was mostly an artifact of moving code from C. I ended up refactoring it after I got numerical parity with SoX.
Codecov Report
@@            Coverage Diff             @@
##           master     #578      +/-   ##
==========================================
+ Coverage   87.85%   88.58%   +0.72%
==========================================
  Files          19       19
  Lines        2051     2182     +131
==========================================
+ Hits         1802     1933     +131
  Misses        249      249
==========================================
Continue to review full report at Codecov.
Awesome, you were able to remove it!
I also see that you have the JIT consistency test now. Marked the test as added :)
Batch as well. |
test/test_batch_consistency.py
Outdated
    def test_vad(self):
        filepath = common_utils.get_asset_path("vad-hello-mono-32000.wav")
        waveform, _ = torchaudio.load(filepath)
        _test_batch(F.vad, waveform, sample_rate=32000)
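For reference, a batch-consistency check of this kind typically compares the batched call against the single-item call; a minimal sketch (the real `_test_batch` helper in torchaudio may differ):

```python
import torch


def check_batch_consistency(fn, waveform: torch.Tensor, **kwargs):
    # waveform: (channels, time). Repeat it along a new batch dimension and
    # verify the batched result matches the single-item result for each item.
    batch = waveform.unsqueeze(0).repeat(3, 1, 1)  # (3, channels, time)
    expected = fn(waveform, **kwargs)
    batched = fn(batch, **kwargs)
    for item in batched:
        assert torch.allclose(item, expected)
```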
nit: is the `sample_rate=32000` different from what `waveform, sample_rate = torchaudio.load(...)` would return as `sample_rate`?
test/test_torchscript_consistency.py
Outdated
    def test_Vad(self):
        filepath = common_utils.get_asset_path("vad-hello-mono-32000.wav")
        waveform, _ = torchaudio.load(filepath)
        self._assert_consistency(T.Vad(32000), waveform)
nit: same comment about `sample_rate` as above
torchaudio/functional.py
Outdated
    noise_down_time: float = .01,
    noise_reduction_amount: float = 1.35,
    measure_freq: float = 20.0,
    measure_duration: Optional[float] = None,  # by default, twice the measurement period; i.e. with overlap.
nit: adding this comment in the documentation as you have done is enough for me
torchaudio/functional.py
Outdated
    trigger_level: float = 7.0,
    trigger_time: float = 0.25,
    search_time: float = 1.0,
    allowed_gap: float = 0.25,
    pre_trigger_time: float = 0.0,
    # Fine-tuning parameters
    boot_time: float = .35,
    noise_up_time: float = .1,
    noise_down_time: float = .01,
    noise_reduction_amount: float = 1.35,
    measure_freq: float = 20.0,
    measure_duration: Optional[float] = None,  # by default, twice the measurement period; i.e. with overlap.
    measure_smooth_time: float = .4,
    hp_filter_freq: float = 50.,
    lp_filter_freq: float = 6000.,
    hp_lifter_freq: float = 150.,
    lp_lifter_freq: float = 2000.,
nit: should we set defaults both on the functional and the transform or just the transform?
In the existing codebase there are cases of both. For example, `complex_norm` and `compute_deltas` have defaults in both `functional` and `transforms`. Some are implemented only in transforms (`Fade`), some only in functional (`Overdrive`). Personally, I think that the user experience of calling a transform should be no different from calling a functional, which means both should have defaults.
A question that is out of scope for this pull request: is there value in synthesizing transform classes out of functions, including their docstrings?
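One way such synthesis could look, purely as a sketch of the idea raised here (not an existing torchaudio mechanism; `make_transform` and `gain` are hypothetical names, and a dynamically built class like this would likely not be TorchScript-friendly):

```python
import torch


def make_transform(fn):
    # Build a Module whose __init__ captures keyword arguments for `fn`
    # and whose forward applies `fn` to the input, reusing fn's docstring.
    class Transform(torch.nn.Module):
        def __init__(self, **kwargs):
            super().__init__()
            self.kwargs = kwargs

        def forward(self, waveform):
            return fn(waveform, **self.kwargs)

    Transform.__name__ = fn.__name__.title()
    Transform.__doc__ = fn.__doc__  # docstring comes along for free
    return Transform


def gain(waveform, factor=2.0):
    """Multiply the waveform by a constant factor."""
    return waveform * factor


Gain = make_transform(gain)
```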
Besides the minor point I mentioned, this looks good to me! Can you also add it to the documentation in
LGTM, thanks for working on this!
This PR adds a Voice Activity Detector (VAD) to transforms, to further reduce the dependency on sox (see #260).
Original implementation: https://sourceforge.net/p/sox/code/ci/master/tree/src/vad.c
Notes:
- `_measure`: it is still about 2-3x slower than the original C implementation.
- Couldn't get parity with sox when `normalization=True`, as `torchaudio.load` and sox seem to normalize differently. (Is that right?)
- Need help with handling multi-channel audio correctly.
- Some variable names reflect ones in the original code, for troubleshooting purposes.