Voice Activity Detection #143

juanmc2005 · 2023-04-19T16:06:48Z

This PR adds a voice activity detection pipeline that is fully compatible with all of diart's features:

- Streaming, writing predictions to disk, and plotting in real time (a single label named "speech" is displayed)
- Running Benchmark on a dataset and computing the detection error rate (instead of the diarization error rate)
- Running Benchmark in parallel
- Tuning hyper-parameters (in this case tau_active)
- Spinning up a websocket server
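The sketch below shows what reducing diarization output to a single "speech" label can look like. It assumes that a frame counts as speech when the highest activation among the local speakers exceeds tau_active, and that consecutive speech frames are merged into segments. The function names (speech_frames, to_segments) are hypothetical and are not diart's API.

```python
from typing import List, Sequence, Tuple


def speech_frames(segmentation: Sequence[Sequence[float]], tau_active: float) -> List[bool]:
    """Return one speech/non-speech flag per frame.

    segmentation[f][s] is the activation of local speaker s in frame f.
    A frame is speech if any speaker's activation exceeds tau_active.
    """
    return [max(frame) > tau_active for frame in segmentation]


def to_segments(flags: Sequence[bool], frame_step: float, label: str = "speech") -> List[Tuple[float, float, str]]:
    """Merge runs of speech frames into (start, end, label) tuples, in seconds."""
    segments = []
    start = None
    for i, active in enumerate(flags):
        if active and start is None:
            start = i * frame_step  # a speech run begins
        elif not active and start is not None:
            segments.append((start, i * frame_step, label))  # the run ends
            start = None
    if start is not None:  # close a run that lasts until the last frame
        segments.append((start, len(flags) * frame_step, label))
    return segments
```

For example, with tau_active=0.507 (the value reported in the performance notes), frames whose best speaker score is 0.9 or 0.6 are speech, while frames peaking at 0.2 or 0.3 are not.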

This is also available in the CLI: the --pipeline argument takes the name of the pipeline to run. For now, the only options are SpeakerDiarization and VoiceActivityDetection. More to come in the future!
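A minimal sketch of how a --pipeline argument can map a class name to a pipeline, using argparse's choices to reject unknown names. The two classes are hypothetical placeholders standing in for diart's real pipelines, and the SpeakerDiarization default is an assumption made for illustration.

```python
import argparse
from typing import List, Optional


# Hypothetical stand-ins: in diart these would be the real pipeline classes.
class SpeakerDiarization:
    """Placeholder for the speaker diarization pipeline."""


class VoiceActivityDetection:
    """Placeholder for the voice activity detection pipeline."""


# Map each CLI name to the class it selects.
PIPELINES = {cls.__name__: cls for cls in (SpeakerDiarization, VoiceActivityDetection)}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run a streaming pipeline")
    parser.add_argument(
        "--pipeline",
        default="SpeakerDiarization",  # assumed default, for illustration only
        choices=sorted(PIPELINES),  # argparse exits with an error on unknown names
        help="Name of the pipeline to run",
    )
    return parser


def resolve_pipeline(argv: Optional[List[str]] = None):
    """Parse argv and instantiate the selected pipeline class."""
    args = build_parser().parse_args(argv)
    return PIPELINES[args.pipeline]()
```

Keeping the name-to-class mapping in one dictionary means a future pipeline only needs to be registered there to become selectable from the command line.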

A significant refactoring was needed to fit this feature in, so backward compatibility with v0.7 is not guaranteed.
I also renamed some major classes to make them clearer:

- BasePipeline becomes Pipeline
- BasePipelineConfig becomes PipelineConfig
- OnlineSpeakerDiarization becomes SpeakerDiarization (all pipelines are online)
- PipelineConfig becomes SpeakerDiarizationConfig (there are 2 pipelines now)
- RealTimeInference becomes StreamingInference ("real-time" depends on hardware, I'm more comfortable with "streaming")
- RealTimePlot becomes StreamingPlot (same as above)
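Because PipelineConfig is both a new name (from BasePipelineConfig) and an old name (for SpeakerDiarizationConfig), renaming identifiers one after another would turn BasePipelineConfig into SpeakerDiarizationConfig. The sketch below, a hypothetical helper rather than anything shipped with diart, applies the renames listed above in a single regex pass so that replaced text is never renamed again. It only handles class names; the changelog notes that some classes also moved to diart.blocks.base, so import paths may need separate updates.

```python
import re

# Old (v0.7) name -> new name, as listed in this PR.
RENAMES = {
    "BasePipeline": "Pipeline",
    "BasePipelineConfig": "PipelineConfig",
    "OnlineSpeakerDiarization": "SpeakerDiarization",
    "PipelineConfig": "SpeakerDiarizationConfig",
    "RealTimeInference": "StreamingInference",
    "RealTimePlot": "StreamingPlot",
}

# One alternation, longest names first, with word boundaries so that
# "BasePipeline" never matches inside "BasePipelineConfig".
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, RENAMES), key=len, reverse=True)) + r")\b"
)


def migrate(source: str) -> str:
    """Rename v0.7 identifiers in one pass; run it on v0.7 code only, exactly once."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)
```

Running it twice would rename the new PipelineConfig again, which is why the docstring restricts it to a single pass over v0.7 code.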

Changelog

- Add VoiceActivityDetection pipeline with its VoiceActivityDetectionConfig
- Move base Pipeline, PipelineConfig and HyperParameter to diart.blocks.base
- Add --pipeline argument to CLI so the user can select a different pipeline to run, optimize, evaluate, etc.
- A Pipeline must be able to suggest an evaluation metric if none is provided
- Add metric: pyannote.metrics.BaseMetric parameter to Optimizer and Benchmark.__call__()
- Add direction: Literal["minimize", "maximize"] parameter to Optimizer
- Update GitHub link in setup.cfg
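To illustrate how the metric fallback and the direction parameter fit together, here is a minimal sketch. The Metric callable stands in for pyannote.metrics.BaseMetric, ToyVAD is a hypothetical pipeline that thresholds precomputed scores, and a plain grid search over tau_active replaces diart's actual Optimizer. Only the idea (use the pipeline's suggested metric when none is given, and minimize or maximize depending on the metric) comes from the changelog.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Literal, Optional, Sequence, Tuple

# Stand-in for pyannote.metrics.BaseMetric: maps (reference, hypothesis) to a score.
Metric = Callable[[Sequence[bool], Sequence[bool]], float]


class Pipeline(ABC):
    @abstractmethod
    def __call__(self, scores: Sequence[float], tau_active: float) -> List[bool]:
        ...

    @abstractmethod
    def suggest_metric(self) -> Metric:
        """Metric used when the caller does not provide one."""


def frame_error_rate(reference: Sequence[bool], hypothesis: Sequence[bool]) -> float:
    """Fraction of frames where the hypothesis disagrees with the reference."""
    return sum(r != h for r, h in zip(reference, hypothesis)) / len(reference)


class ToyVAD(Pipeline):
    """Hypothetical toy pipeline: thresholds precomputed speech scores."""

    def __call__(self, scores, tau_active):
        return [s > tau_active for s in scores]

    def suggest_metric(self) -> Metric:
        return frame_error_rate  # an error rate, so lower is better


def tune_tau_active(
    pipeline: Pipeline,
    scores: Sequence[float],
    reference: Sequence[bool],
    candidates: Sequence[float],
    metric: Optional[Metric] = None,
    direction: Literal["minimize", "maximize"] = "minimize",
) -> Tuple[float, float]:
    """Grid-search tau_active and return (best_tau, best_score)."""
    metric = metric if metric is not None else pipeline.suggest_metric()
    results = [(metric(reference, pipeline(scores, tau)), tau) for tau in candidates]
    best_score, best_tau = (min if direction == "minimize" else max)(results)
    return best_tau, best_score
```

An error rate such as the detection error rate should be minimized, while a score like accuracy would need direction="maximize"; making the direction explicit avoids silently optimizing a metric the wrong way.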

Notes on performance

Using pyannote/segmentation, duration=5s, step=0.5s, latency=5s and tau_active=0.507, the performance on AMI MixHeadset is:

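| Subset | Detection Error Rate (%) | False Alarm (%) | Missed Detection (%) |
|--------|--------------------------|-----------------|----------------------|
| test   | 5.9                      | 3.2             | 2.7                  |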
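The detection error rate is the sum of false alarm and missed detection, both divided by the total duration of reference speech, which is why the columns above add up (3.2 + 2.7 = 5.9). The following interval-based sketch makes that relation concrete. It is not pyannote.metrics' DetectionErrorRate implementation: it ignores collars and assumes the segments within each list do not overlap one another.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def total_duration(segments: Sequence[Segment]) -> float:
    return sum(end - start for start, end in segments)


def overlap_duration(a: Sequence[Segment], b: Sequence[Segment]) -> float:
    """Time covered by both lists (each list assumed non-overlapping internally)."""
    return sum(max(0.0, min(e1, e2) - max(s1, s2)) for s1, e1 in a for s2, e2 in b)


def detection_error_rate(reference: Sequence[Segment], hypothesis: Sequence[Segment]):
    """Return (detection error rate, false alarm rate, missed detection rate)."""
    total = total_duration(reference)
    correct = overlap_duration(reference, hypothesis)
    false_alarm = total_duration(hypothesis) - correct  # speech predicted where there is none
    missed = total - correct  # reference speech that was not detected
    return (false_alarm + missed) / total, false_alarm / total, missed / total
```

For instance, a reference speech turn over [0, 10] s and a prediction over [1, 11] s give 1 s of false alarm and 1 s of missed detection, so a detection error rate of 0.2 (20%).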


juanmc2005 added 4 commits April 19, 2023 17:41

New feature: streaming voice activity detection. Pipeline name changes

bca2873

Merge branch 'develop' of github.com:juanmc2005/OnlineDiarization into feat/vad

5e44ad4

Update link in setup.cfg

7447061

Update code snippets in README

4985394

juanmc2005 added the feature (New feature or request), API (Improvements to the API) and refactoring (Internal design improvements that don't change the API) labels Apr 19, 2023

juanmc2005 added this to the Version 0.8 milestone Apr 19, 2023

juanmc2005 added 2 commits April 19, 2023 21:18

Add minor README modifications

540ad0a

Rename base pipeline and config objects

6609e3c

juanmc2005 marked this pull request as ready for review April 24, 2023 09:46

juanmc2005 mentioned this pull request Apr 24, 2023

Speaker-blind speech recognition #144

Open

juanmc2005 merged commit 569c68f into develop Apr 24, 2023

juanmc2005 deleted the feat/vad branch April 24, 2023 10:39

juanmc2005 mentioned this pull request Oct 26, 2023

Version 0.8 #192

Merged
