
Add batched mode for faster inference on pre-recorded conversations #35

Closed
juanmc2005 opened this issue Apr 21, 2022 · 1 comment
Labels: feature (New feature or request)

Comments

@juanmc2005 (Owner)

Problem

Running the online diarization pipeline on an entire dataset can be slow, because the current implementation simulates a real-time scenario and processes one chunk at a time.
This is the right behavior for real-time use, but a faster implementation would be very useful for evaluation, for example to measure the performance impact of swapping a component (issue #34 is a good example).

Idea

Pre-calculate segmentation and embeddings for all chunks in a file, and only run clustering and output reconstruction online; this would considerably speed up the process. It could be hosted in a new BatchedOnlineSpeakerDiarization class implementing the same interface as OnlineSpeakerDiarization.
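
A minimal sketch of the idea, using generic placeholder models rather than the actual diart components (all names and shapes below are hypothetical, not the library's API):

```python
import numpy as np

def precompute(chunks, segmentation_model, embedding_model):
    """Run the expensive neural models over every chunk of the file up front.
    Both models are placeholders (any callables); batching their inputs is
    where the real speedup would come from."""
    segmentations = [segmentation_model(chunk) for chunk in chunks]
    embeddings = [embedding_model(chunk, seg)
                  for chunk, seg in zip(chunks, segmentations)]
    return segmentations, embeddings

def batched_online_diarization(chunks, segmentation_model, embedding_model, clustering_step):
    """Replay the online pipeline: clustering and output reconstruction still
    see one chunk at a time, but segmentation and embeddings are precomputed."""
    segmentations, embeddings = precompute(chunks, segmentation_model, embedding_model)
    outputs = []
    for seg, emb in zip(segmentations, embeddings):
        outputs.append(clustering_step(seg, emb))  # incremental clustering per chunk
    return outputs

# Toy usage with dummy stand-ins (real pipeline components would go here)
if __name__ == "__main__":
    chunks = [np.random.randn(80000) for _ in range(4)]       # 4 chunks of raw audio samples
    seg_model = lambda chunk: np.random.rand(100, 3)           # frame-level speaker activity
    emb_model = lambda chunk, seg: np.random.rand(3, 192)      # one embedding per local speaker
    cluster = lambda seg, emb: emb.argmax(axis=1)              # stand-in for online clustering
    print(len(batched_online_diarization(chunks, seg_model, emb_model, cluster)))
```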

@juanmc2005 (Owner, Author)

Changes added as part of PR #46.
Implemented as OnlineSpeakerDiarization.from_file().
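
A rough usage sketch for reference; only the from_file method name comes from this comment, while the import path, constructor arguments, and file name are assumptions:

```python
# Import path and default construction are assumptions; only the method name
# OnlineSpeakerDiarization.from_file() is taken from this issue.
from diart import OnlineSpeakerDiarization

pipeline = OnlineSpeakerDiarization()
# Pre-computes segmentation and embeddings over the whole file, then runs
# clustering and output reconstruction as in the regular online pipeline.
prediction = pipeline.from_file("conversation.wav")
```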
