
Speaker diarization system using Binary Key modelling


h-delgado/binary-key-diarizer



binary-key-diarizer

Fast speaker diarization system using Binary Key modelling

For more information about the binary key speaker diarization process, refer to the following publication:

Héctor Delgado, Xavier Anguera, Corinne Fredouille and Javier Serrano, "Fast Single- and Cross-Show Speaker Diarization Using Binary Key Speaker Modeling," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2286-2297, 2015.

Also, if you use the code in your research, please cite the aforementioned publication.

The code has been developed and tested in Matlab R2015b.

Matlab toolboxes required:

  • Statistics Toolbox
  • Parallel Computing Toolbox (optional; only needed if you want to take advantage of multiple CPU cores by running multiple workers)

External toolboxes required:

The system also requires some functions from external, freely available toolboxes. You do not need to download them, because those functions are included in the "/matlab/external" folder of this package. We nevertheless acknowledge them here. The required functions are the following:

Running the system:

The included package consists of the following folders:

  • /eval/ contains the NIST md-eval-v21.pl for system evaluation
  • /featureFiles/ input feature files folder
  • /matlab/ contains the matlab code
  • /reference/ contains the reference files necessary to evaluate the system
  • /sad/ to place the Speech Activity Detection Files
  • /uem/ to place the UEM partition files

IMPORTANT NOTICE: The system performs neither feature extraction nor speech activity detection (SAD). You have to provide these files yourself. For features, you can use HTK feature files, or a plain ASCII file in which each line is one feature vector whose coefficients are separated by blanks.
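The following Python sketch illustrates the ASCII option. It formats and parses such a file, with one frame per line and coefficients separated by whitespace. The function names are hypothetical and not part of this package. The sketch also assumes that the Matlab loader in /matlab accepts any whitespace as a separator and that every line must have the same number of coefficients, so check the code before relying on it.

```python
from typing import List, Sequence


def format_ascii_features(frames: Sequence[Sequence[float]]) -> str:
    """One feature vector per line, coefficients separated by single blanks."""
    return "".join(" ".join(repr(float(c)) for c in frame) + "\n" for frame in frames)


def parse_ascii_features(text: str) -> List[List[float]]:
    """Parse whitespace-separated coefficients, checking that all frames share one dimension."""
    frames: List[List[float]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines (our choice, not a documented rule of the package)
        frame = [float(tok) for tok in line.split()]
        if frames and len(frame) != len(frames[0]):
            raise ValueError(
                f"line {line_no}: expected {len(frames[0])} coefficients, got {len(frame)}"
            )
        frames.append(frame)
    return frames
```

To write a file, you could use `with open("features.txt", "w") as f: f.write(format_ascii_features(frames))`, where the file name is only an example.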

In order to easily test the system, we have provided some example data:

In the /featureFiles folder you can find five feature files:

  • 3054300.mfc
  • 3055877.mfc
  • 3056696.mfc
  • 3057402.mfc
  • 3063115.mfc

These are MFCC feature files (order 19) in HTK format. They were extracted from audio files of the SAIVT-BNEWS database of Australian broadcast news (https://wiki.qut.edu.au/display/saivt/SAIVT-BNEWS). We cannot distribute the audio (copyright: All Rights Reserved, Fairfax Media), but you can watch or download the videos at:
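An uncompressed HTK parameter file starts with a 12-byte big-endian header. The header holds the number of frames, the frame period in 100 ns units, the bytes per frame, and the parameter kind. The frames follow as big-endian 32-bit floats.

The Python sketch below reads such a file from raw bytes and can also build one, which is handy for inspecting the example files. The helpers are hypothetical and not part of this package. The sketch reads the feature dimension from the header instead of assuming 19, and it deliberately does not handle compressed (_C) files.

```python
import struct
from typing import List, Sequence, Tuple

# HTK header: nSamples (int32), sampPeriod (int32, 100 ns units),
# sampSize (int16, bytes per frame), parmKind (int16); all big-endian.
HTK_HEADER = struct.Struct(">iihh")
MFCC = 6             # base parameter kind code for MFCC
COMPRESSED = 0o2000  # _C qualifier bit


def parse_htk(data: bytes) -> Tuple[int, int, List[List[float]]]:
    """Return (frame period, parameter kind, frames) from an uncompressed HTK file."""
    n_frames, period, frame_bytes, kind = HTK_HEADER.unpack_from(data, 0)
    if kind & COMPRESSED:
        raise NotImplementedError("compressed (_C) HTK files are not handled here")
    dim = frame_bytes // 4  # each coefficient is a 4-byte float
    payload = data[HTK_HEADER.size:HTK_HEADER.size + n_frames * frame_bytes]
    if len(payload) != n_frames * frame_bytes:
        raise ValueError("file is shorter than its header announces")
    values = struct.unpack(f">{n_frames * dim}f", payload)
    frames = [list(values[i * dim:(i + 1) * dim]) for i in range(n_frames)]
    return period, kind, frames


def build_htk(frames: Sequence[Sequence[float]], period: int = 100000, kind: int = MFCC) -> bytes:
    """Serialize frames as an uncompressed HTK file (period 100000 = 10 ms)."""
    dim = len(frames[0])
    header = HTK_HEADER.pack(len(frames), period, dim * 4, kind)
    flat = [float(c) for frame in frames for c in frame]
    return header + struct.pack(f">{len(flat)}f", *flat)
```

For example, `with open("featureFiles/3054300.mfc", "rb") as f: period, kind, frames = parse_htk(f.read())`. After that, `len(frames[0])` gives the dimension, and `(kind & 0o77) == 6` identifies MFCC as the base parameter kind.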

In the "/sad" folder we include speech/non-speech label files. They were extracted from the reference speaker labels in the "/reference" folder, so the SAD in this example is "perfect".
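You may want similar "perfect" SAD labels for your own data. One way is to take the union of all speaker segments in a reference RTTM file.

The Python sketch below does that by merging overlapping SPEAKER segments per file. It is an assumption about how such labels can be derived, not necessarily how the included files were produced. It returns (start, end) pairs in seconds rather than the SAD file format expected by the package, which is not described here. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def speech_regions_from_rttm(rttm_text: str) -> Dict[str, List[Tuple[float, float]]]:
    """Union of all SPEAKER segments per file, as sorted (start, end) pairs in seconds."""
    segments: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for line in rttm_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "SPEAKER":
            continue  # ignore non-speaker records and blank lines
        start, duration = float(fields[3]), float(fields[4])
        segments[fields[1]].append((start, start + duration))

    regions: Dict[str, List[Tuple[float, float]]] = {}
    for file_id, intervals in segments.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:  # overlaps or touches the previous region
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions[file_id] = [(s, e) for s, e in merged]
    return regions
```

Overlapping speech from two speakers collapses into a single speech region, which is what a speech/non-speech labelling needs.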

The root folder of this package contains the file "main.m". This Matlab script configures all the system parameters and calls the speaker diarization system. The possible values of each parameter are explained in the code comments; read them carefully.

The list of input feature files is obtained by scanning the folder specified for feature files. See the example in "main.m".
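The scan itself is simple. Here is a Python sketch of the same idea, with hypothetical function names. The ".mfc" extension matches the example files but is an assumption for your own data. In Matlab, an equivalent listing would typically use `dir(fullfile(folder, '*.mfc'))`, though "main.m" may do it differently.

```python
import os
from typing import Iterable, List


def select_feature_files(names: Iterable[str], extension: str = ".mfc") -> List[str]:
    """Keep names ending in `extension` and return their base names, sorted."""
    return sorted(os.path.splitext(n)[0] for n in names if n.endswith(extension))


def list_feature_files(folder: str, extension: str = ".mfc") -> List[str]:
    """Scan `folder` (non-recursively) and return the sorted feature file base names."""
    names = [n for n in os.listdir(folder) if os.path.isfile(os.path.join(folder, n))]
    return select_feature_files(names, extension)
```

Called as `list_feature_files("featureFiles")` from the package root, this would list the five example recordings by name.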

Once "main.m" has been edited (you can keep the provided values as a starting point), you are ready to run the system:

  • Run "main.m"

    main;

The standard output shows progress information. Once the run has finished, the output RTTM file will be in the "/out" folder and the log file in the "/log" folder.
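The output follows the NIST RTTM format. In this format, each SPEAKER line gives the file name, channel, start time and duration in seconds, with the speaker label in the eighth field.

The Python sketch below totals the speaking time per hypothesised speaker and per file, which gives a quick sanity check on the number of clusters found. The helper is hypothetical and not part of this package.

```python
from collections import defaultdict
from typing import Dict


def speaker_time_per_file(rttm_text: str) -> Dict[str, Dict[str, float]]:
    """Sum SPEAKER segment durations per file and per speaker label."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for line in rttm_text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip other RTTM record types and malformed lines
        file_id, duration, speaker = fields[1], float(fields[4]), fields[7]
        totals[file_id][speaker] += duration
    return {f: dict(spk) for f, spk in totals.items()}
```

For example, `with open("out/[experiment_name].rttm") as f: print(speaker_time_per_file(f.read()))`.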

Evaluating the obtained solution

The "/eval" folder contains the NIST md-eval script, which is widely used to assess speaker diarization systems. Open a terminal and go to the package root folder.

  • Run: $ eval/md-eval-v21.pl -af -c 0.25 -s out/[experiment_name].rttm -r reference/reference.rttm

(where [experiment_name] is the name you assigned to the variable "experimentName" in "main.m")
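If you prefer to script the evaluation, the Python sketch below builds the same md-eval command, runs it from the package root, and extracts the overall DER from the report. The function names are hypothetical. The sketch assumes Perl is installed so that the script can be executed directly, as in the command above. The regular expression only relies on the "OVERALL SPEAKER DIARIZATION ERROR = ... percent" line that md-eval prints as its overall result.

Two notes on md-eval itself:
  • The -c 0.25 option sets a 0.25 s no-score collar around reference segment boundaries.
  • The DER is the sum of missed speech, false alarm and speaker error time, divided by the total scored speaker time.

```python
import re
import subprocess
from typing import List, Optional

DER_PATTERN = re.compile(r"OVERALL SPEAKER DIARIZATION ERROR\s*=\s*([0-9.]+)\s*percent")


def md_eval_command(experiment_name: str, collar: float = 0.25) -> List[str]:
    """Same arguments as the command line above, with the collar as a parameter."""
    return ["eval/md-eval-v21.pl", "-af", "-c", str(collar),
            "-s", f"out/{experiment_name}.rttm",
            "-r", "reference/reference.rttm"]


def parse_overall_der(report: str) -> Optional[float]:
    """Return the first overall DER (in percent) found in an md-eval report, or None."""
    match = DER_PATTERN.search(report)
    return float(match.group(1)) if match else None


def evaluate(experiment_name: str, package_root: str = ".") -> Optional[float]:
    """Run md-eval with the package root as working directory and return the overall DER."""
    result = subprocess.run(md_eval_command(experiment_name), cwd=package_root,
                            capture_output=True, text=True, check=True)
    return parse_overall_der(result.stdout)
```

With the provided example data and default settings, `evaluate` should return 4.62 if everything went well.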

You will get the evaluation report on the standard output. If everything went well, the final result should be:

"OVERALL SPEAKER DIARIZATION ERROR = 4.62 percent of scored speaker time  `(ALL)"

Thanks for downloading and using the system!

Héctor Delgado
