Online speaker diarization as a block #92

Merged (11 commits) on Sep 28, 2022
juanmc2005 (Owner) commented on Sep 8, 2022

This PR addresses issues #83 and #84.

Changelog

  • OnlineSpeakerDiarization is now independent from RxPY and can be used as a block (cc @hbredin)
    • It can receive a single waveform or a list of them (for batched inference)
    • This class is now stateful (speaker centroids), so the system can be initialized with centroids from a previous pipeline
  • Both can now be imported from the package root: from diart import OnlineSpeakerDiarization, PipelineConfig
  • AudioLoader cannot split files into audio chunks anymore
  • Emit additional chunks with zero padding at the end of file streams so the pipeline output is fully aggregated
    • Previously, this was done by notifying DelayedAggregation of the stream duration, which would then concatenate the last non-aggregated output
  • Fix bug: EmbeddingNormalization was squeezing the output
  • URIs are no longer required by blocks or output annotations; sinks add them instead
  • OnlineSpeakerDiarization no longer splits the stream into chunks, nor does it resample chunks dynamically
    • This is now done by RealTimeInference, which still uses RxPY as it is a higher level API
  • RealTimeInference can now do batched inference and includes new parameters
  • Benchmark reuses RealTimeInference internally (huge win here)
  • regularize_audio_stream() renamed to rearrange_audio_stream(), as the notion of regularity is not very clear here
  • diart.pipelines does not exist anymore
  • Audio sources don't have a length property anymore
  • Remove PrecalculatedFeaturesAudioSource
  • Customizable reading block size in most audio sources
  • When possible, AudioSource block size is set to the step size
  • Add OnlineSpeakerDiarization.reset() to reset internal state (centroids and buffers)
  • Add AudioSource.close() to correctly handle termination from external causes
  • Avoid confusion with Rx observers not being called with do and do_action
    • Replace diart.operators.profile with a stateful diart.utils.Chronometer
    • Absorb diart.operators.progress in RealTimeInference
    • RealTimeInference now handles all the complexity of Rx
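
To illustrate the zero-padding item above, here is a minimal sketch of how a file stream could keep emitting chunks past the end of the audio. It is not the code from this PR: `padded_chunks`, `chunk_size` and `step_size` are hypothetical names, the audio is a plain list of samples, and the stopping rule (emit while the window still starts inside the signal) is an assumption.

```python
from typing import Iterator, List


def padded_chunks(
    samples: List[float], chunk_size: int, step_size: int
) -> Iterator[List[float]]:
    """Yield sliding chunks over `samples`, zero-padding the trailing ones.

    Hypothetical helper for illustration only, not part of diart.
    """
    if chunk_size <= 0 or step_size <= 0:
        raise ValueError("chunk_size and step_size must be positive")
    start = 0
    # Assumption: keep emitting while the window still starts inside the signal.
    while start < len(samples):
        chunk = samples[start:start + chunk_size]
        # Trailing windows are shorter than chunk_size: fill them with zeros.
        chunk = chunk + [0.0] * (chunk_size - len(chunk))
        yield chunk
        start += step_size


chunks = list(padded_chunks([1.0, 2.0, 3.0, 4.0, 5.0], chunk_size=4, step_size=2))
# chunks == [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
```

With these settings the last sample appears in two windows, the same number as an interior sample, so an aggregation step downstream would see every window that covers it without needing to know the stream duration.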

RAM usage during inference is considerably reduced (~30%)
Runtime doesn't seem to be impacted
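
For anyone who wants to check runtime on their own setup, the idea of a stateful chronometer (as opposed to an Rx operator) can be sketched as below. This is only an assumption about the general shape of such a utility: `SimpleChronometer` and its methods are hypothetical and are not the interface of diart.utils.Chronometer.

```python
import time
from statistics import mean
from typing import List, Optional


class SimpleChronometer:
    """Hypothetical stateful timer: start() and stop() around each pipeline call."""

    def __init__(self) -> None:
        self._start: Optional[float] = None
        self.history: List[float] = []  # elapsed seconds of every timed call

    def start(self) -> None:
        self._start = time.perf_counter()

    def stop(self) -> float:
        if self._start is None:
            raise RuntimeError("stop() called before start()")
        elapsed = time.perf_counter() - self._start
        self._start = None
        self.history.append(elapsed)
        return elapsed

    def mean_time(self) -> float:
        # Average over all timed calls; 0.0 if nothing was timed yet.
        return mean(self.history) if self.history else 0.0


timer = SimpleChronometer()
for _ in range(3):
    timer.start()
    sum(range(1000))  # stand-in for one inference step
    timer.stop()
```

Because the timer holds its own state, it can be called directly around each processing step without relying on side-effect operators in the Rx chain.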

juanmc2005 added the labels feature (New feature or request), API (Improvements to the API) and refactoring (Internal design improvements that don't change the API) on Sep 8, 2022
juanmc2005 added this to the Version 0.6 milestone on Sep 8, 2022