
Port sox::vad #578
Merged — 24 commits, Apr 28, 2020

Conversation

@astaff (Contributor) commented Apr 23, 2020

This PR adds a Voice Activity Detector (vad) to Transforms, to further reduce the dependency on sox (see #260).

Original implementation: https://sourceforge.net/p/sox/code/ci/master/tree/src/vad.c

Notes:

  • Although @zou3519 and I were able to get about a 5x performance improvement by replacing loops with vector operations in _measure, it is still about 2-3x slower than the original C implementation.
  • Couldn't get parity with sox when normalization=True as torchaudio.load and sox seem to normalize differently. (is that right?)
  • Need help with handling multi-channel audio correctly.
  • Some variable names reflect ones in the original code for troubleshooting purposes.
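To illustrate the kind of loop-to-vector rewrite mentioned above, here is a generic framed-energy example; this is not the actual _measure code, which computes more than a mean-square energy, so it only sketches the technique.

```python
import numpy as np

# Illustrative only: replace a per-frame Python loop with one
# vectorized gather-and-reduce over all analysis frames.
x = np.arange(32.0)          # stand-in waveform
frame, hop = 8, 4

# Loop version: one Python iteration per analysis frame.
loop = [float((x[i:i + frame] ** 2).mean())
        for i in range(0, len(x) - frame + 1, hop)]

# Vectorized version: gather every frame with fancy indexing, reduce once.
starts = np.arange(0, len(x) - frame + 1, hop)
idx = starts[:, None] + np.arange(frame)     # shape (n_frames, frame)
vec = (x[idx] ** 2).mean(axis=1)

assert np.allclose(loop, vec)
```

The speedup comes from moving the per-frame work out of the Python interpreter and into a single array operation.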

@vincentqb vincentqb self-requested a review April 23, 2020 19:51
@vincentqb (Contributor) left a comment:

Thanks again for working on this!

For the coverage of tests that we expect, you can see this readme that will be merged as part of #566. In particular, we need

  • sox compatibility test
  • jitability test
  • batch test

In terms of code organization, we put the computation in a functional when possible so that the transform is more of a thin wrapper around this.
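A minimal sketch of that split, with invented names rather than the actual torchaudio code: the computation lives in functional.py as a pure function, and the transform only stores parameters and delegates.

```python
# Sketch of the functional/transform split described above (illustrative
# names; in torchaudio these live in functional.py and transforms.py).

def amplify(waveform, gain=2.0):
    """Stateless computation: belongs in functional.py."""
    return [sample * gain for sample in waveform]

class Amplify:
    """Thin wrapper that only holds parameters: belongs in transforms.py."""
    def __init__(self, gain=2.0):
        self.gain = gain

    def __call__(self, waveform):
        # Delegate all real work to the functional.
        return amplify(waveform, self.gain)
```

Keeping the transform this thin means compatibility, jitability, and batch tests can target the functional directly.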

(Inline review comment on torchaudio/transforms.py — outdated, resolved)
@vincentqb (Contributor) commented Apr 23, 2020

Since this is WIP, I suggest prefixing the title with [WIP] so we know when it is ready to be considered for merging. As an alternative to the WIP prefix, GitHub allows creating a draft pull request, though I don't see a way to convert an existing pull request to draft mode. Do you?

@vincentqb (Contributor) commented:

  • Need help with handling multi-channel audio correctly.

Can you give a little more details? Is the current implementation working on a single channel? How is sox handling multiple channel? Is the output meant to be per channel?

@astaff astaff changed the title Port sox::vad [WIP] Port sox::vad Apr 24, 2020
@astaff astaff marked this pull request as draft April 24, 2020 02:55
@astaff (Contributor, Author) commented Apr 24, 2020

In terms of code organization, we put the computation in a functional when possible so that the transform is more of a thin wrapper around this.

Got it. That was my original intent in the first commit, but after I ended up with two classes representing state, I moved things to transforms.py and merged one into the actual module. Do you want classes for holding the state in functional.py as well?

@astaff (Contributor, Author) commented Apr 24, 2020

Can you give a little more details? Is the current implementation working on a single channel? How is sox handling multiple channel? Is the output meant to be per channel?

I tested the current implementation on a single channel. I will confirm sox behavior on multiple channels and report back. Looking at the original C code, I think it triggers on voice activity in any channel and outputs all channels after the trigger.
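That described behavior ("trigger on activity in any channel, then keep all channels") could be sketched like this; the helper is hypothetical, not the ported code.

```python
import numpy as np

# Hypothetical sketch of "trigger on any channel, keep all channels":
# `active` marks per-channel, per-frame voice activity.
def trigger_frame(active):
    any_active = active.any(axis=0)          # union over channels
    hits = np.flatnonzero(any_active)
    return int(hits[0]) if hits.size else None

active = np.array([[0, 0, 1, 1],
                   [0, 0, 0, 1]], dtype=bool)
start = trigger_frame(active)                # channel 0 fires first, frame 2
```

Everything from `start` onward would then be emitted for all channels together.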

@vincentqb (Contributor) commented:
Got it. It was my original intent in the first commit, but after I ended up with two classes representing state I moved things to transforms.py and merged one into actual module. Do you want classes for holding the state in functional.py as well?

functional.py is for the computation, and transforms.py is for the state. Is the state necessary to the computation? Is that what you meant?

@vincentqb (Contributor) commented:
I tested current implementation on a single channel. I will confirm sox behavior on multiple channels and report back. Looking at the original C code, I think it triggers on voice activity in any channel and outputs all channels after the trigger.

If the VAD works per channel, then it's easy to add batching by simply reshaping the tensor from (batch, channels, time) to (batch * channels, time), and back.

If sox runs on each channel, and takes the union of detected regions over the channels, it may be a good idea to leave the detection per channel, and let the user decide what to do from there.
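The reshape trick above, sketched with a stand-in per-channel routine (an identity, used only to show the shapes):

```python
import numpy as np

# Sketch of folding channels into the batch dimension so a per-channel
# routine runs over everything at once. `per_channel_vad` is a stand-in
# (identity) that preserves length, used only to demonstrate shapes.
def per_channel_vad(x):
    return x

batch, channels, time = 2, 3, 100
waveforms = np.zeros((batch, channels, time))

folded = waveforms.reshape(batch * channels, time)   # (6, 100)
out = per_channel_vad(folded)
restored = out.reshape(batch, channels, -1)          # back to (2, 3, 100)
```

Note the reshape back assumes every channel yields the same output length, which a trimming VAD does not guarantee; that caveat is one more argument for leaving detection per channel and letting the user decide.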

@astaff (Contributor, Author) commented Apr 24, 2020

Is the state necessary to the computation? Is that what you meant?

That's right. I can technically pull the state inside the computation by declaring classes inside the function; let's see how that affects readability and vectorization.

(Inline review comment on torchaudio/transforms.py — outdated, resolved)
@mthrok (Collaborator) commented Apr 26, 2020

EDIT: Wait, I no longer see dataclass being used anywhere. Did you get rid of it?

I personally like `dataclass`, but I do not think it is worth adding a new dependency. Here are my thoughts.

  • Pros
    • dataclass is available
  • Cons
    • The backport library
      • could be wrong
      • could break in places where we do not have control
    • This adds maintenance cost
      • Implementation
      • CI
        • Packaging
        • Testing
      • Documentation

It is also worth noting that a common practice is to use collections.namedtuple in place of dataclass where Python 3.6 support matters.
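For example, a namedtuple gives an immutable, dependency-free record on Python 3.6; the field names below are invented for illustration, not taken from the PR.

```python
from collections import namedtuple

# Illustrative only: a namedtuple standing in for a dataclass on
# Python 3.6. Field names are invented, not from the actual code.
MeasureState = namedtuple("MeasureState", ["mean_sq", "noise_level"])

state = MeasureState(mean_sq=0.0, noise_level=1.0)
updated = state._replace(mean_sq=0.5)   # returns a new, modified copy
```

Unlike the dataclass backport, this needs nothing beyond the standard library.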

Having said that, if we do add the backport module as a new dependency, the following have to be addressed, in addition to the implementation.

  • Documentation
    This is the most important, because there is no automated system to tell if the instructions are missing, yet this is where users look.
  • Update tests
    The unittest for Python 3.6 is failing because the backport is missing on both Travis and Circle CI: https://github.com/pytorch/audio/pull/578/checks?check_run_id=617323351
    The dependency has to be added to the test environment. Note that although the library claims not to work with 3.7, a quick check shows that pip install dataclasses succeeds on Python 3.7 and installs the library. Therefore, ensuring the backport is not installed on Python 3.7+ requires careful attention.
  • Update binary builds
    Interestingly, not all binary builds fail: https://github.com/pytorch/audio/pull/578/checks?check_run_id=617323353
    Wheel builds succeed while Conda builds properly fail.
    I think this suggests that the wheel package build does not check the dependency list, presumably meaning the package would fail at run time.

So I see a lot of maintenance overhead in adding the backport library, while the benefits do not outweigh the overhead.

@astaff (Contributor, Author) commented Apr 27, 2020

EDIT: Wait, I no longer see dataclass being used anywhere. Did you get rid of it?

@mthrok ugh, that's correct, sorry about that. Having a state class was mostly an artifact of moving code from C. I ended up refactoring it after I got numerical parity with SoX.

@astaff astaff marked this pull request as ready for review April 27, 2020 21:27
@astaff astaff changed the title [WIP] Port sox::vad Port sox::vad Apr 27, 2020
@codecov bot commented Apr 27, 2020

Codecov Report

Merging #578 into master will increase coverage by 0.72%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #578      +/-   ##
==========================================
+ Coverage   87.85%   88.58%   +0.72%     
==========================================
  Files          19       19              
  Lines        2051     2182     +131     
==========================================
+ Hits         1802     1933     +131     
  Misses        249      249              
Impacted Files Coverage Δ
torchaudio/functional.py 95.61% <100.00%> (+0.81%) ⬆️
torchaudio/transforms.py 95.74% <100.00%> (+0.30%) ⬆️
torchaudio/_backend.py 82.85% <0.00%> (-0.48%) ⬇️
torchaudio/sox_effects.py 95.58% <0.00%> (-0.07%) ⬇️


@vincentqb (Contributor) commented:

having a state class was mostly the artifact of moving code from C. I ended up refactoring it, after I got numerical parity with SoX.

Awesome you were able to remove it!

@vincentqb (Contributor) commented:

I also see that you have the jit consistency test now. Marked test as added :)

@astaff (Contributor, Author) commented Apr 27, 2020

I also see that you have the jit consistency test now. Marked test as added :)

Batch as well.

def test_vad(self):
    filepath = common_utils.get_asset_path("vad-hello-mono-32000.wav")
    waveform, _ = torchaudio.load(filepath)
    _test_batch(F.vad, waveform, sample_rate=32000)
Review comment (Contributor):
nit: is the sample_rate=32000 different from what waveform, sample_rate = torchaudio.load(...) would return as sample_rate?

def test_Vad(self):
    filepath = common_utils.get_asset_path("vad-hello-mono-32000.wav")
    waveform, _ = torchaudio.load(filepath)
    self._assert_consistency(T.Vad(32000), waveform)
@vincentqb (Contributor) commented Apr 28, 2020:

nit: same about sample_rate

    noise_down_time: float = .01,
    noise_reduction_amount: float = 1.35,
    measure_freq: float = 20.0,
    measure_duration: Optional[float] = None,  # by default, twice the measurement period; i.e. with overlap.
@vincentqb (Contributor) commented Apr 28, 2020:
nit: adding this comment in the documentation as you have done is enough for me

Comment on lines 1928 to 1944

    trigger_level: float = 7.0,
    trigger_time: float = 0.25,
    search_time: float = 1.0,
    allowed_gap: float = 0.25,
    pre_trigger_time: float = 0.0,
    # Fine-tuning parameters
    boot_time: float = .35,
    noise_up_time: float = .1,
    noise_down_time: float = .01,
    noise_reduction_amount: float = 1.35,
    measure_freq: float = 20.0,
    measure_duration: Optional[float] = None,  # by default, twice the measurement period; i.e. with overlap.
    measure_smooth_time: float = .4,
    hp_filter_freq: float = 50.,
    lp_filter_freq: float = 6000.,
    hp_lifter_freq: float = 150.,
    lp_lifter_freq: float = 2000.,
Review comment (Contributor):
nit: should we set defaults both on the functional and the transform or just the transform?

@astaff (Contributor, Author) commented Apr 28, 2020:
In the existing codebase there are cases of both. For example, complex_norm and compute_deltas have defaults in both functional and transforms; some are implemented only in transforms (Fade), some only in functional (Overdrive). Personally, I think the user experience of calling a transform should be no different from calling a functional, which means both should have defaults.

A question that is out of scope for this pull request: is there value in synthesizing transform classes out of functionals, including their docstrings?
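That out-of-scope idea could look roughly like this; a hedged sketch with invented names, not a proposal for the actual codebase.

```python
# Hypothetical sketch of synthesizing a transform class from a functional,
# so defaults and docstrings are defined in exactly one place.

def gain(waveform, gain_db=1.0):
    """Scale each sample by gain_db (illustrative functional)."""
    return [sample * gain_db for sample in waveform]

def make_transform(fn):
    class Transform:
        __doc__ = fn.__doc__                    # reuse the docstring

        def __init__(self, **params):
            self.params = params                # state lives here

        def __call__(self, waveform):
            return fn(waveform, **self.params)  # delegate to the functional

    Transform.__name__ = fn.__name__.capitalize()
    return Transform

Gain = make_transform(gain)
```

Defaults stay on the functional signature, and the generated class inherits them automatically through the keyword delegation.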

@vincentqb (Contributor) commented:
Besides the minor point I mentioned, this looks good to me! Can you also add to the documentation in docs/?

@astaff astaff requested a review from vincentqb April 28, 2020 16:03
@vincentqb (Contributor) left a comment:
LGTM, thanks for working on this!

@vincentqb vincentqb merged commit 3ecc701 into pytorch:master Apr 28, 2020