Voice Activity Detection #143

Merged: 6 commits into develop from feat/vad, Apr 24, 2023
juanmc2005 (Owner) commented on Apr 19, 2023

This PR adds a voice activity detection pipeline that is fully compatible with all of diart's features:

  • Streaming, writing predictions to disk and plotting in real-time (only one label named "speech" will be displayed)
  • Running Benchmark on a dataset and computing the detection error rate (instead of the diarization error rate)
  • Running Benchmark in parallel
  • Tuning hyper-parameters (in this case tau_active)
  • Spinning up a websocket server
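As a rough illustration of what tau_active controls, here is a minimal sketch that turns per-frame speech probabilities into "speech" segments. It assumes the segmentation model yields one probability per frame and that a frame counts as speech when its probability reaches tau_active. The function name `speech_segments`, the `frame_duration` parameter and the `>=` comparison are hypothetical choices for this example, not diart's actual implementation.

```python
from typing import List, Tuple


def speech_segments(
    probs: List[float], tau_active: float, frame_duration: float
) -> List[Tuple[float, float]]:
    """Group consecutive frames with probability >= tau_active into segments.

    Hypothetical helper: returns (start, end) times in seconds.
    """
    segments = []
    start = None  # index of the first frame of the currently open segment
    for i, p in enumerate(probs):
        active = p >= tau_active  # assumption: threshold is inclusive
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start * frame_duration, i * frame_duration))
            start = None
    if start is not None:
        # Close a segment that runs until the end of the chunk
        segments.append((start * frame_duration, len(probs) * frame_duration))
    return segments


print(speech_segments([0.1, 0.6, 0.9, 0.4, 0.7], tau_active=0.507, frame_duration=0.5))
# → [(0.5, 1.5), (2.0, 2.5)]
```

Raising tau_active makes the detector more conservative (fewer false alarms, more missed speech), which is why it is the natural hyper-parameter to tune for this pipeline.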

The new pipeline is also available from the CLI, where it can be selected with the --pipeline argument, which takes the name of the pipeline to run. For the time being, the only options are SpeakerDiarization and VoiceActivityDetection. More to come in the future!
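To make the idea of selecting a pipeline by name concrete, here is a minimal sketch using argparse. The two classes are empty stand-ins rather than diart's real ones, and the `PIPELINES` registry, the `resolve_pipeline` helper and the default value are assumptions made for this example only.

```python
import argparse
from typing import List, Type


class SpeakerDiarization:
    """Empty stand-in for the real pipeline class (illustration only)."""


class VoiceActivityDetection:
    """Empty stand-in for the real pipeline class (illustration only)."""


# Hypothetical registry mapping a pipeline name to its class
PIPELINES = {cls.__name__: cls for cls in (SpeakerDiarization, VoiceActivityDetection)}


def resolve_pipeline(argv: List[str]) -> Type:
    """Parse --pipeline and return the matching class."""
    parser = argparse.ArgumentParser(description="Pipeline selection sketch")
    parser.add_argument(
        "--pipeline",
        default="SpeakerDiarization",  # assumption: the default is not stated in this PR
        choices=sorted(PIPELINES),
        help="Name of the pipeline to run",
    )
    args = parser.parse_args(argv)
    return PIPELINES[args.pipeline]


print(resolve_pipeline(["--pipeline", "VoiceActivityDetection"]).__name__)
# → VoiceActivityDetection
```

Restricting the argument with `choices` means an unknown name fails at parse time with a readable error instead of later during pipeline construction.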

A significant refactoring was needed to fit this feature in, so backward compatibility with v0.7 is not guaranteed.
I also renamed some major classes to make their purpose clearer:

  • BasePipeline becomes Pipeline
  • BasePipelineConfig becomes PipelineConfig
  • OnlineSpeakerDiarization becomes SpeakerDiarization (all pipelines are online)
  • The former PipelineConfig becomes SpeakerDiarizationConfig (there are 2 pipelines now)
  • RealTimeInference becomes StreamingInference ("real-time" depends on hardware, I'm more comfortable with "streaming")
  • RealTimePlot becomes StreamingPlot (same as above)
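Because PipelineConfig is both a new name (for BasePipelineConfig) and an old name (now SpeakerDiarizationConfig), applying the renames one after another to existing code would chain them incorrectly. The sketch below shows one way to apply them in a single pass with whole-word matching. The `RENAMES` table comes from the list above, while the `migrate` helper is a hypothetical convenience and not something shipped with diart.

```python
import re

# Old name -> new name, taken from the rename list above
RENAMES = {
    "BasePipeline": "Pipeline",
    "BasePipelineConfig": "PipelineConfig",
    "OnlineSpeakerDiarization": "SpeakerDiarization",
    "PipelineConfig": "SpeakerDiarizationConfig",
    "RealTimeInference": "StreamingInference",
    "RealTimePlot": "StreamingPlot",
}

# One alternation with word boundaries, so each identifier is matched whole
# and replaced exactly once (no BasePipelineConfig -> PipelineConfig ->
# SpeakerDiarizationConfig chain).
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in sorted(RENAMES, key=len, reverse=True)) + r")\b"
)


def migrate(source: str) -> str:
    """Rewrite v0.7 class names to their new names in a single pass."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


print(migrate("pipeline = OnlineSpeakerDiarization(PipelineConfig())"))
# → pipeline = SpeakerDiarization(SpeakerDiarizationConfig())
print(migrate("class MyConfig(BasePipelineConfig): pass"))
# → class MyConfig(PipelineConfig): pass
```

Note that such a rewrite is not idempotent: running it on code that already uses the new names would rename PipelineConfig a second time, so it should only be applied once to v0.7 code.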

Changelog

  • Add VoiceActivityDetection pipeline with its VoiceActivityDetectionConfig
  • Move base Pipeline, PipelineConfig and HyperParameter to diart.blocks.base
  • Add --pipeline argument to CLI so the user can select a different pipeline to run, optimize, evaluate, etc.
  • A Pipeline must be able to suggest an evaluation metric if none is provided
  • Add metric: pyannote.metrics.BaseMetric parameter to Optimizer and Benchmark.__call__()
  • Add direction: Literal["minimize", "maximize"] parameter to Optimizer
  • Update GitHub link in setup.cfg
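The interplay between a pipeline suggesting its own metric and the new metric and direction parameters can be sketched as follows. This is a simplified illustration under stated assumptions: metrics are represented by plain strings instead of pyannote.metrics.BaseMetric objects, the method name `suggest_metric` and the helpers `resolve_metric` and `best_value` are hypothetical, and the trial scores are made-up numbers.

```python
from typing import Dict, Literal, Optional, Type


class Pipeline:
    """Simplified base class: every pipeline can suggest a default metric."""

    @staticmethod
    def suggest_metric() -> str:  # hypothetical name and return type
        raise NotImplementedError


class SpeakerDiarization(Pipeline):
    @staticmethod
    def suggest_metric() -> str:
        return "diarization error rate"


class VoiceActivityDetection(Pipeline):
    @staticmethod
    def suggest_metric() -> str:
        return "detection error rate"


def resolve_metric(pipeline_class: Type[Pipeline], metric: Optional[str] = None) -> str:
    """Use the metric given by the caller, else fall back to the pipeline's suggestion."""
    return metric if metric is not None else pipeline_class.suggest_metric()


def best_value(
    trials: Dict[float, float],
    direction: Literal["minimize", "maximize"] = "minimize",
) -> float:
    """Return the hyper-parameter value whose score is best for the given direction."""
    if direction not in ("minimize", "maximize"):
        raise ValueError(f"Unknown direction: {direction}")
    pick = min if direction == "minimize" else max
    return pick(trials, key=trials.get)


# Made-up scores for three candidate tau_active values
trials = {0.3: 0.12, 0.5: 0.08, 0.7: 0.10}
print(resolve_metric(VoiceActivityDetection))  # → detection error rate
print(best_value(trials, direction="minimize"))  # → 0.5
```

Error rates are minimized, but a caller-supplied metric such as an accuracy-style score would need direction="maximize", which is presumably why the direction is exposed as a parameter.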

Notes on performance

Using pyannote/segmentation, duration=5s, step=0.5s, latency=5s and tau_active=0.507, the performance on AMI MixHeadset is:

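| Subset | Detection Error Rate (%) | False Alarm (%) | Missed Detection (%) |
|--------|--------------------------|-----------------|----------------------|
| test   | 5.9                      | 3.2             | 2.7                  |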
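The three columns are related: the detection error rate is the sum of the false alarm and missed detection rates (3.2 + 2.7 = 5.9). The sketch below computes these quantities from frame-level labels, assuming each is normalized by the total amount of reference speech, which to my understanding matches how pyannote.metrics defines the detection error rate. The function name and the frame-based representation are assumptions for this example; the real metric works on timed annotations.

```python
from typing import Dict, Sequence


def detection_error_rate(
    reference: Sequence[bool], hypothesis: Sequence[bool]
) -> Dict[str, float]:
    """Frame-level detection error rate, normalized by reference speech frames."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    total = sum(1 for r in reference if r)  # frames of reference speech
    if total == 0:
        raise ValueError("reference contains no speech")
    # Speech predicted where the reference has none
    false_alarm = sum(1 for r, h in zip(reference, hypothesis) if h and not r)
    # Reference speech that the hypothesis missed
    missed = sum(1 for r, h in zip(reference, hypothesis) if r and not h)
    return {
        "false alarm": false_alarm / total,
        "missed detection": missed / total,
        "detection error rate": (false_alarm + missed) / total,
    }


reference = [True, True, True, True, False, False, False, False, True, True]
hypothesis = [True, True, True, False, True, False, False, False, True, True]
rates = detection_error_rate(reference, hypothesis)
# 6 reference speech frames, 1 false alarm and 1 miss
print(rates["false alarm"] + rates["missed detection"] == rates["detection error rate"])
```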

juanmc2005 added the labels feature (New feature or request), API (Improvements to the API) and refactoring (Internal design improvements that don't change the API) on Apr 19, 2023
juanmc2005 added this to the Version 0.8 milestone on Apr 19, 2023
juanmc2005 marked this pull request as ready for review on April 24, 2023 09:46
juanmc2005 merged commit 569c68f into develop on Apr 24, 2023
juanmc2005 deleted the feat/vad branch on April 24, 2023 10:39
juanmc2005 mentioned this pull request on Oct 26, 2023