Skip to content

entbappy/Speaker-Diarization-with-LSTM---Spectral-clustering-algorithm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

SPEAKER DIARIZATION WITH LSTM

Authors of the paper:

Quan Wang, Carlton Downey et. al

Speaker diarization is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity. It answers the question “who spoke when” in a multi-speaker environment. It has a wide variety of applications including multimedia information retrieval, speaker turn analysis, and audio processing. In particular, the speaker boundaries produced by diarization systems have the potential to significantly improve acoustic speech recognition (ASR) accuracy.

A typical speaker diarization system usually consists of four components: (1) Speech segmentation, where the input audio is segmented into short sections that are assumed to have a single speaker, and the non-speech sections are filtered out; (2) Audio embedding extraction, where specific features such as MFCCs [1], speaker factors [2], or i-vectors [3, 4, 5] are extracted from the segmented sections; (3) Clustering, where the number of speakers is determined, and the extracted audio embeddings are clustered into these speakers; and optionally (4) Resegmentation [6], where the clustering results are further refined to produce the final diarization results

Resources links:

demo video:

https://youtu.be/axfAxfhe1Ko

Author: Bappy Ahmed
Data Scientist
Email: entbappy73@gmail.com

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published