Awesome Speaker Diarization

Overview

This is a curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

The purpose of this repo is to organize the world’s resources for speaker diarization, and make them universally accessible and useful.

To add items to this page, simply send a pull request.

Publications

2018

2017

Speaker Diarization with LSTM
Speaker diarization using deep neural network embeddings
Speaker diarization using convolutional neural network for statistics accumulation refinement
pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks
Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings

2015

Diarization resegmentation in the factor analysis subspace

2014

Software

Framework

Link	Language	Description
SIDEKIT for diarization (s4d)	Python	An open-source extension of SIDEKIT for speaker diarization.
pyAudioAnalysis	Python	Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.
AaltoASR	Python & Perl	Speaker diarization scripts, based on AaltoASR.
LIUM_SpkDiarization	Java	LIUM_SpkDiarization is software dedicated to speaker diarization (i.e. speaker segmentation and clustering). It is written in Java and includes the most recent developments in the domain (as of 2013).
kaldi-asr	Bash	Example scripts for speaker diarization on a portion of CALLHOME used in the 2000 NIST speaker recognition evaluation.
Alize LIA_SpkSeg	C++	ALIZÉ is an open-source platform for speaker recognition. LIA_SpkSeg provides the tools for speaker diarization.
pyannote-audio	Python	Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding.
pyBK	Python	Speaker diarization using binary key speaker modelling. Computationally light solution that does not require external training data.

Evaluation

Link	Language	Description
pyannote-metrics	Python	A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems.
SimpleDER	Python	A lightweight library to compute Diarization Error Rate (DER).
modified NIST md-eval.pl	Perl	From Mary Tai Knox
NIST md-eval-v21.pl	Perl	From jitendra
NIST md-eval-22.pl	Perl	From nryant
dscore	Python & Perl	Diarization scoring tools.
Sequence Match Accuracy	Python	Compute the matching accuracy between two label sequences using the Hungarian algorithm.
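
The tools above all report some form of Diarization Error Rate (DER). As a rough illustration of what that number measures, here is a minimal frame-level sketch. It is not the API of any library in the table: the function name `frame_level_der` is hypothetical, each frame is assumed to carry at most one speaker label (`None` means non-speech), there is no overlap handling or forgiveness collar, and the speaker mapping is found by brute force over permutations rather than with the Hungarian algorithm.

```python
from itertools import permutations


def frame_level_der(reference, hypothesis):
    """Frame-level DER = (missed + false alarm + confusion) / reference speech.

    Both inputs are equal-length lists of per-frame speaker labels,
    with None marking non-speech frames (an assumption of this sketch).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(reference, hypothesis))
    ref_speech = sum(1 for r, _ in pairs if r is not None)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    # Errors that do not depend on how speakers are mapped.
    missed = sum(1 for r, h in pairs if r is not None and h is None)
    false_alarm = sum(1 for r, h in pairs if r is None and h is not None)

    # Frames where both sides have speech: find the one-to-one mapping from
    # hypothesis labels to reference labels with the most agreeing frames.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]
    ref_labels = list(dict.fromkeys(r for r, _ in both))
    hyp_labels = list(dict.fromkeys(h for _, h in both))
    # Pad with None so surplus hypothesis speakers can stay unmatched.
    padded = ref_labels + [None] * max(0, len(hyp_labels) - len(ref_labels))
    best_correct = 0
    for perm in permutations(padded, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        correct = sum(1 for r, h in both if mapping[h] == r)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / ref_speech


reference = ["A", "A", "A", "B", "B", None, None, "B"]
hypothesis = ["x", "x", "y", "y", "y", None, "y", None]
der = frame_level_der(reference, hypothesis)  # 1 miss + 1 false alarm + 1 confusion over 6 speech frames
```

The brute-force search is only practical for a handful of speakers. For real evaluations, the listed toolkits score time-stamped segments and are the better choice.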

Clustering

Link	Language	Description
uis-rnn	Python & PyTorch	Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization.
SpectralCluster	Python	Spectral clustering with affinity matrix refinement operations.
sklearn.cluster	Python	scikit-learn clustering algorithms.
PLDA	Python	Probabilistic Linear Discriminant Analysis & classification, written in Python.
PLDA	C++	Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis).
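
In a typical pipeline, the clustering step groups per-segment speaker embeddings so that each cluster corresponds to one speaker. The sketch below shows the idea with plain average-linkage agglomerative clustering on cosine similarity. It stands in for, and does not reproduce, the algorithms in the table. The names `cosine_similarity` and `cluster_embeddings`, the `threshold` value, and the toy 2-D embeddings are all assumptions made for illustration, and embeddings are assumed to be non-zero vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cluster_embeddings(embeddings, threshold=0.5):
    """Merge the most similar clusters until no pair reaches the threshold.

    Returns one integer cluster id per embedding.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        sims = [cosine_similarity(embeddings[i], embeddings[j])
                for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, (a, b) = max(
            (linkage(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best_sim < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(embeddings)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


# Toy embeddings: two segments near each axis, i.e. two hypothetical speakers.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = cluster_embeddings(segments, threshold=0.5)
```

The stopping threshold plays the role of deciding how many speakers there are, and in practice it needs tuning on held-out data.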

Speaker embedding

Link	Method	Language	Description
Speaker_Verification	d-vector	Python & TensorFlow	TensorFlow implementation of generalized end-to-end loss for speaker verification.
PyTorch_Speaker_Verification	d-vector	Python & PyTorch	PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration.
x-vector-kaldi-tf	x-vector	Python & TensorFlow & Perl	TensorFlow implementation of the x-vector topology on top of the Kaldi recipe.
kaldi-ivector	i-vector	C++ & Perl	Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure.
voxceleb-ivector	i-vector	Perl	Voxceleb1 i-vector based speaker recognition system.

Other

Link	Language	Description
VB Diarization	Python	VB Diarization with Eigenvoice and HMM Priors.

Datasets

Audio	Diarization ground truth	Language	Pricing	Additional information
2000 NIST Speaker Recognition Evaluation	Disk-6 (Switchboard), Disk-8 (CALLHOME)	Multiple	$2400.00	Evaluation Plan
2003 NIST Rich Transcription Evaluation Data	Included with the audio	en, ar, zh	$2000.00	Telephone speech, broadcast news
CALLHOME American English Speech	CALLHOME American English Transcripts	en	$1500.00 + $1000.00	CH109 whitelist
The ICSI Meeting Corpus	Included with the audio	en	Free	License
The AMI Meeting Corpus	Included with the audio (needs to be processed)	Multiple	Free	License
Fisher English Training Speech Part 1 Speech	Fisher English Training Speech Part 1 Transcripts	en	$7000.00 + $1000.00
Fisher English Training Part 2, Speech	Fisher English Training Part 2, Transcripts	en	$7000.00 + $1000.00

Leaderboards

Other learning materials

Tech blog

Literature Review For Speaker Change Detection by Halil Erdoğan

Video tutorials

Google's Diarization System: Speaker Diarization with LSTM by Google
Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings by Microsoft Research
Robust Speaker Diarization for Meetings: the ICSI system by Microsoft Research

Products

Company	Product
Google	Google Cloud Speech-to-Text API
Amazon	Amazon Transcribe
IBM	Watson Speech To Text API
DeepAffects	Speaker Diarization API
