binary-key-diarizer

Fast speaker diarization system using Binary Key modelling

For more information about the binary key speaker diarization process, refer to the following publication:

Héctor Delgado, Xavier Anguera, Corinne Fredouille and Javier Serrano, "Fast Single- and Cross-Show Speaker Diarization Using Binary Key Speaker Modeling," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23-12, pp 2286-2297, 2015.

Also, if you use the code in your research, please cite the aforementioned publication.

The code has been developed and tested in Matlab R20015b

Matlab toolboxes required:

Statistics toolbox
Parallel computing toolbox (optional, only if you want to take advantage of the cpu cores by using multiple workers)

External toolboxes required:

The system also requires some functions of external freely available toolboxes, but you do not need to download them: those funcions are included in the "/matlab/external" folder of this package. However, we acknoledge them next. The required functions are the following:

mvn_new.m: for handling multivariate normal distributions. From "Matlab/Octave Multivariate Normals Toolbox (Version 1)" (http://www.ofai.at/~dominik.schnitzer/mvn)
readhtk.m: for reading HTK feature files. From "VOICEBOX: Speech Processing Toolbox for MATLAB" (http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html):
pdist2.m (renamed to pdist22.m to avoid conflict with matlab pdist2 function): For calculating the chi-square similarity. From "Piotr's Computer Vision Matlab Toolbox" (http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html).
VLFeat (http://www.vlfeat.org/): The function "vl_gmm" is used for GMM training.

Running the system:

The included package consists of the following folders:

/eval/ contains the NIST md-eval-v21.pl for system evaluation
/featureFiles/ input feature files folder
/matlab/ contains the matlab code
/reference/ contains the reference files necessary to evaluate the system
/sad/ to place the Speech Activity Detection Files
/uem/ to place the UEM partition files

IMPORTANT NOTICE: Note that the system does not perform Feature Extraction nor Speech Activity Detection. You have to provide these files by yourself. For feature extraction, you can use HTK feature files, or just an ascii file in which each line is a feature composed of the coefficients separated by blanks.

In order to easily test the system, we have provided some example data:

In /featureFiles folder you can find 5 feature files:

3054300.mfc
3055877.mfc
3056696.mfc
3057402.mfc
3063115.mfc

Those are 19-order MFCCs feature files in HTK format. They were extracted from audio files of the SAIVT-BNEWS database of Australian broadcast news (https://wiki.qut.edu.au/display/saivt/SAIVT-BNEWS). We cannot distribute them (copyright All Rights Reserved by Fairfax Media), but you can watch/get the videos in:

In "/sad" folder we include speech/non-speech label files. They were actually extracted from the reference speaker labels in "/reference" folder, so the SAD files are "perfect" in this example.

In the root folder of this package, the file "main.m" is included. This is a matlab script file in which all the system parameters are configured and from which the speaker diarization system is called. The possible values of the parameters are explained in the code comments. Read them carefully.

The list of input feature files is obtained by scanning the folder specified for feature files. See the example inside.

Once the "main.m" has been edited (you can leave the values provided as a starting point), we are ready to run the system:

Run "main.m"

main;

The standard output will show information about the processes. Once finished, the output RTTM file will be at "/out" folder, and the log file at "/log" folder

Evaluating the obtained solution

In "/eval" folder, we have included the NIST md-eval script widely used to assess speaker diarization technology. Open a terminal and go to the package root folder.

run: $ eval/md-eval-v21.pl -af -c 0.25 -s out/[experiment_name].rttm -r reference/reference.rttm

(where [experiment_name] is the name you assigned to the variable "experimentName" in "main.m")

You will get the evaluation report in the standard output. If everything went fine the final result should be:

"OVERALL SPEAKER DIARIZATION ERROR = 4.62 percent of scored speaker time  `(ALL)"

Thanks for downloading and using the system!

Héctor Delgado

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

binary-key-diarizer

Matlab toolboxes required:

External toolboxes required:

Running the system:

Evaluating the obtained solution

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
eval		eval
featureFiles		featureFiles
matlab		matlab
reference		reference
sad		sad
uem		uem
.gitattributes		.gitattributes
README.md		README.md
main.m		main.m

h-delgado/binary-key-diarizer

Folders and files

Latest commit

History

Repository files navigation

binary-key-diarizer

Matlab toolboxes required:

External toolboxes required:

Running the system:

Evaluating the obtained solution

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages