# Diarization Example

This example uses a [Tagesschau video](https://www.tagesschau.de/multimedia/sendung/ts-33043.html). Its audio is already available in the [test folder](./test), together with ground-truth speaker labels in RTTM format.

In [2]:
!soxi test/tagesschau02092019.wav


Input File     : 'test/tagesschau02092019.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:14:40.80 = 14092841 samples ~ 66060.2 CDDA sectors
File Size      : 28.2M
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM



In [3]:
from IPython.display import display, HTML
video_id = "E1q0yIW0O74"
raw_html = """
<a href="https://www.youtube.com/watch?v={video_id}" target="_blank">
    <img src="http://img.youtube.com/vi/{video_id}/0.jpg" width="400" border="4">
</a>
"""
display(HTML(raw_html.format(video_id=video_id)))

## Import Pipeline

The diarization system first applies voice activity detection (VAD) to segment the input audio file, then computes speaker embeddings for the speech segments with a pre-trained x-vector model, and finally assigns speaker labels using agglomerative hierarchical clustering (AHC) combined with a VBx hidden Markov model (HMM).
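To make the clustering step concrete, here is a minimal sketch of AHC on toy embeddings. It is not the code used by `Pipeline`: the cosine similarity, the average linkage, the stopping `threshold`, and the two-dimensional `toy_embeddings` are all assumptions chosen for illustration, and the VBx HMM stage is not shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def ahc(embeddings, threshold):
    """Average-linkage AHC; returns one cluster label per embedding.

    Merging stops once the most similar pair of clusters falls below
    `threshold` (a hypothetical tuning parameter).
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        sims = [cosine_similarity(embeddings[i], embeddings[j])
                for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best_sim, (a, b) = max(
            (linkage(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best_sim < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy 2-D "embeddings": three point roughly along one axis, two along the other.
toy_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 0.0]]
labels = ahc(toy_embeddings, threshold=0.8)
print(labels)
```

In a real system the embeddings would be high-dimensional x-vectors, and the resulting labels would typically serve as a starting point that a later stage refines.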

In [5]:
from main import Pipeline
diarizer = Pipeline.init_from_wav("./test/tagesschau02092019.wav")

## Start Diarization

The diarizer writes its output in RTTM format to the given directory and returns the path of the generated file:

In [6]:
%%time

rttm_path = diarizer.write_to_RTTM("./results")

Diarization completed. RTTM file generated at results/tagesschau02092019.rttm
CPU times: user 12min 33s, sys: 2min 41s, total: 15min 14s
Wall time: 1min 56s
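As a rough sanity check on speed, the wall time can be compared with the audio length reported by `soxi`. The sketch below only redoes that arithmetic with the numbers shown in this notebook; the variable names are illustrative.

```python
# Values copied from the soxi output and the %%time output in this notebook.
num_samples = 14092841
sample_rate = 16000
wall_time_seconds = 1 * 60 + 56  # "Wall time: 1min 56s"

audio_seconds = num_samples / sample_rate
# Real-time factor: processing time divided by audio duration (lower is faster).
real_time_factor = wall_time_seconds / audio_seconds

print(f"audio duration: {audio_seconds:.2f} s")
print(f"real-time factor: {real_time_factor:.3f}")
```

A value well below 1 means the run finished faster than the recording plays back, though the figure depends on the machine used for this particular run.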


Display the first few lines of the generated RTTM file:

In [7]:
!head {rttm_path}

SPEAKER tagesschau02092019 1 4.360000 2.990000 <NA> <NA> 34 <NA> <NA>
SPEAKER tagesschau02092019 1 14.970000 2.430000 <NA> <NA> 34 <NA> <NA>
SPEAKER tagesschau02092019 1 18.040000 26.320000 <NA> <NA> 32 <NA> <NA>
SPEAKER tagesschau02092019 1 45.530000 10.940000 <NA> <NA> 34 <NA> <NA>
SPEAKER tagesschau02092019 1 57.400000 21.960000 <NA> <NA> 21 <NA> <NA>
SPEAKER tagesschau02092019 1 79.360000 5.590000 <NA> <NA> 34 <NA> <NA>
SPEAKER tagesschau02092019 1 86.120000 7.080000 <NA> <NA> 27 <NA> <NA>
SPEAKER tagesschau02092019 1 93.200000 6.150000 <NA> <NA> 34 <NA> <NA>
SPEAKER tagesschau02092019 1 100.360000 15.000000 <NA> <NA> 27 <NA> <NA>
SPEAKER tagesschau02092019 1 115.360000 6.790000 <NA> <NA> 34 <NA> <NA>
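Each `SPEAKER` line holds ten whitespace-separated fields; the ones used here are the file ID (field 2), the onset and duration in seconds (fields 4 and 5), and the speaker label (field 8). The following sketch parses a few of the lines above and sums the speaking time per label. The helper names `parse_rttm` and `speaking_time` are hypothetical and not part of the pipeline.

```python
from collections import defaultdict

# First three lines of the RTTM output shown above.
RTTM_TEXT = """\
SPEAKER tagesschau02092019 1 4.360000 2.990000 <NA> <NA> 34 <NA> <NA>
SPEAKER tagesschau02092019 1 14.970000 2.430000 <NA> <NA> 34 <NA> <NA>
SPEAKER tagesschau02092019 1 18.040000 26.320000 <NA> <NA> 32 <NA> <NA>
"""

def parse_rttm(text):
    """Return a list of (speaker, start, end) tuples from RTTM text."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or non-SPEAKER lines.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        segments.append((fields[7], onset, onset + duration))
    return segments

def speaking_time(segments):
    """Sum the segment durations per speaker label."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    return dict(totals)

segments = parse_rttm(RTTM_TEXT)
totals = speaking_time(segments)
for speaker, seconds in sorted(totals.items()):
    print(f"speaker {speaker}: {seconds:.2f} s")
```

The same functions could be applied to the full file by reading it with `open(rttm_path).read()`.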


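Measure the diarization error rate (DER) with the `md-eval.pl` script. The `-r` and `-s` options specify the reference RTTM file and the system output RTTM file, respectively; `-c 0.25` sets a 0.25-second no-score collar around reference segment boundaries, and `-1` limits scoring to regions with a single active speaker: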

In [8]:
!./test/md-eval.pl -1 -c 0.25 -r test/tagesschau02092019.rttm -s results/tagesschau02092019.rttm \
    | grep "SPEAKER DIARIZATION ERROR"

 OVERALL SPEAKER DIARIZATION ERROR = 2.69 percent of scored speaker time  `(ALL)
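DER is commonly defined as the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total scored speaker time. The sketch below shows only that arithmetic. The function name is hypothetical, and the component durations are made-up numbers for illustration, not values from this evaluation.

```python
def diarization_error_rate(missed, false_alarm, confusion, scored_speaker_time):
    """Return DER in percent from error durations given in seconds."""
    if scored_speaker_time <= 0:
        raise ValueError("scored_speaker_time must be positive")
    return 100.0 * (missed + false_alarm + confusion) / scored_speaker_time

# Hypothetical durations in seconds, chosen only to illustrate the formula.
der = diarization_error_rate(
    missed=5.0,
    false_alarm=3.0,
    confusion=12.0,
    scored_speaker_time=800.0,
)
print(f"DER = {der:.2f} percent of scored speaker time")
```

Because the collar and single-speaker options exclude some regions from scoring, the scored speaker time can be shorter than the total speech in the reference.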
