# Diarize single-channel audio files

This notebook instantiates a `pyannote-audio` pipeline and diarizes the single-channel left and right audio files. The diarization results are stored in files of the type selected by `outputext` below, for example `.rttm` or `.TextGrid`.

In [1]:
from pathlib import Path
import diarize_utils as utils
from pyannote.audio import Pipeline

## Define the project

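The source audio files are stored in subdirectories named `audio/left` and `audio/right` under the project root. The left and right outputs are written to `left` and `right` subdirectories of `diarized/<type>`, where `<type>` is `outputext` without its leading dot (for example, `diarized/TextGrid/left` when `outputext` is `.TextGrid`).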

In [2]:
projroot = Path('/global/home/groups/fc_phonlab/spkrcorpus')
outputext = '.TextGrid' # Desired output type: '.eaf', '.TextGrid', '.rttm', '.lab'
wavleft = projroot / 'audio' / 'left'
wavright = projroot / 'audio' / 'right'
outleft = projroot / 'diarized' / outputext.replace('.', '') / 'left'
outright = projroot / 'diarized' / outputext.replace('.', '') / 'right'

## Instantiate the pipeline

TODO: more on auth tokens
TODO: more on setting params.

In [3]:
# Store the token as the first line of `tokenfile`. This file should not be
# readable by other users on the system and should not be added to a git
# repository.
tokenfile = '/global/home/users/rsprouse/pyannote-auth-token'
with open(tokenfile, 'r') as tf:
    auth_token = tf.readline().strip()
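
The comment in the cell above asks that the token file not be readable by other users. A minimal sketch of how that could be checked before reading the token, assuming a POSIX filesystem; `readable_by_others` and `read_token` are hypothetical helpers and are not part of `diarize_utils`:

```python
import os
import stat


def readable_by_others(path):
    """Return True if group or other users have read permission on `path`."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))


def read_token(path):
    """Return the first line of a token file, refusing overly permissive files."""
    if readable_by_others(path):
        raise PermissionError(f'{path} is readable by other users; try chmod 600')
    with open(path, 'r') as tf:
        return tf.readline().strip()
```

With this in place, `auth_token = read_token(tokenfile)` would behave like the cell above but fail early if the permissions are too open.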

In [4]:
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization",
    use_auth_token=auth_token
)
parameters = {
    "segmentation": {
        "min_duration_off": 0.3,
    },
}

pipeline.instantiate(parameters)

<pyannote.audio.pipelines.speaker_diarization.SpeakerDiarization at 0x2b7edc28e9d0>
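
As far as I understand the pyannote documentation, `min_duration_off` tells the segmentation step to fill same-speaker gaps shorter than the given number of seconds, so that brief pauses do not split a turn. The sketch below only illustrates that idea on plain `(start, end)` tuples. It is not pyannote's implementation, and `fill_short_gaps` is a hypothetical name:

```python
def fill_short_gaps(segments, min_duration_off):
    """Merge (start, end) segments separated by a gap shorter than min_duration_off.

    `segments` must be sorted by start time and belong to a single speaker.
    """
    merged = []
    for start, end in segments:
        if merged and start - merged[-1][1] < min_duration_off:
            # Gap is too short to count as a pause: extend the previous segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


turns = [(0.0, 1.0), (1.2, 2.0), (3.0, 4.0)]
print(fill_short_gaps(turns, min_duration_off=0.3))
# [(0.0, 2.0), (3.0, 4.0)]
```

With the `0.3` used above, the 0.2 s gap is bridged and the 1.0 s gap is kept. A larger value yields fewer, longer turns.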

## Diarize the left channels

The `compare_dirs` function finds `left` `.wav` files that do not yet have a corresponding output file. The `ext1` and `ext2` values ensure that `compare_dirs` only considers `.wav` files in the input directory and `outputext` files in the output directory.

In [5]:
todoleft = utils.compare_dirs(
    dir1=wavleft, ext1='.wav',
    dir2=outleft, ext2=outputext
)
todoleft

| | relpath | fname | barename |
|---|---|---|---|
| 0 | speaker_1 | interview_a.wav | interview_a |
| 1 | speaker_1 | interview_b.wav | interview_b |
| 2 | speaker_2 | interview_a.wav | interview_a |
| 3 | speaker_2 | interview_b.wav | interview_b |


`todoleft` is a dataframe in which the rows represent input audio files that require processing of the left channel.
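
The source of `compare_dirs` is not shown in this notebook. The following standard-library sketch shows one way the same comparison could work, returning a list of dicts with the same three fields instead of a dataframe. `find_missing_outputs` is a hypothetical stand-in and may differ from the real function:

```python
from pathlib import Path


def find_missing_outputs(dir1, ext1, dir2, ext2):
    """List files under dir1 ending in ext1 that lack a counterpart in dir2.

    A counterpart has the same relative directory and bare name, with ext2.
    """
    dir1, dir2 = Path(dir1), Path(dir2)
    todo = []
    for src in sorted(dir1.rglob(f'*{ext1}')):
        relpath = src.parent.relative_to(dir1)
        if not (dir2 / relpath / f'{src.stem}{ext2}').exists():
            todo.append({
                'relpath': str(relpath),
                'fname': src.name,
                'barename': src.stem,
            })
    return todo
```

Because only missing outputs are listed, rerunning the notebook after an interruption would pick up where the previous run stopped.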

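The loop below iterates over the rows of `todoleft` and calls `utils.diarize` on each input audio file to produce an output file of the type selected by `outputext`. The `diarize_df` function, called in a later cell, is an alternative that performs the same iteration over the rows itself.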

In [6]:
verbose = True
for row in todoleft.itertuples():
    wavfile = wavleft / row.relpath / row.fname
    outfile = outleft / row.relpath / f'{row.barename}{outputext}'
    if verbose:
        print(f'diarize: {outfile}')
    utils.diarize(wavfile, pipeline, 2, outfile)

[W NNPACK.cpp:51] Could not initialize NNPACK! Reason: Unsupported hardware.
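
For reference, when `outputext` is `.rttm` each output file is plain text with one space-separated line per speaker turn: the record type `SPEAKER`, a file ID, a channel, the turn onset and duration in seconds, and the speaker label, with `<NA>` in the unused fields. The sketch below formats and parses such lines. `rttm_line` and `parse_rttm_line` are hypothetical helpers for illustration, not functions from `diarize_utils` or pyannote:

```python
def rttm_line(file_id, start, duration, speaker, channel=1):
    """Format one speaker turn as a ten-field RTTM line."""
    return (f'SPEAKER {file_id} {channel} {start:.3f} {duration:.3f} '
            f'<NA> <NA> {speaker} <NA> <NA>')


def parse_rttm_line(line):
    """Extract the file ID, onset, duration, and speaker from an RTTM line."""
    fields = line.split()
    return {
        'file_id': fields[1],
        'start': float(fields[3]),
        'duration': float(fields[4]),
        'speaker': fields[7],
    }


print(rttm_line('interview_a', 0.5, 1.25, 'SPEAKER_00'))
# SPEAKER interview_a 1 0.500 1.250 <NA> <NA> SPEAKER_00 <NA> <NA>
```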


In [13]:
outfile.parent

PosixPath('/global/home/groups/fc_phonlab/spkrcorpus/diarized/TextGrid/left/speaker_2')
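
The output directory mirrors the `relpath` of each input file, so it may not exist the first time a speaker is processed. Whether `utils.diarize` creates missing directories is not shown in this notebook. If it does not, a small guard like the one below could be called on `outfile` before diarizing; `ensure_parent` is a hypothetical helper:

```python
from pathlib import Path


def ensure_parent(outfile):
    """Create the parent directory of `outfile` if needed and return the path."""
    outfile = Path(outfile)
    # parents=True builds intermediate directories; exist_ok=True makes reruns safe.
    outfile.parent.mkdir(parents=True, exist_ok=True)
    return outfile
```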

In [None]:
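# Alternative to the loop above. `num_spkr` and `rttmleft` were undefined here, so this uses 2 speakers and `outleft`.
utils.diarize_df(todoleft, pipeline, 2, wavleft, outleft)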

## Diarize the right channels

In [10]:
todoright = utils.compare_dirs(
    dir1=wavright, ext1='.wav',
    dir2=outright, ext2=outputext
)
todoright

| | relpath | fname | barename |
|---|---|---|---|
| 0 | speaker_1 | interview_a.wav | interview_a |
| 1 | speaker_1 | interview_b.wav | interview_b |
| 2 | speaker_2 | interview_a.wav | interview_a |
| 3 | speaker_2 | interview_b.wav | interview_b |


In [None]:
for row in todoright.itertuples():
    wavfile = wavright / row.relpath / row.fname
    outfile = outright / row.relpath / f'{row.barename}{outputext}'
    utils.diarize(wavfile, pipeline, 2, outfile)
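
The left and right loops differ only in their input and output directories. One possible way to share the path construction is a small generator that yields the `(wavfile, outfile)` pair for each row. `iter_jobs` is a hypothetical helper, and the `Row` tuple below stands in for one row of `todoright.itertuples()`:

```python
from collections import namedtuple
from pathlib import Path


def iter_jobs(rows, wavdir, outdir, ext):
    """Yield (wavfile, outfile) path pairs for rows with relpath, fname, barename."""
    for row in rows:
        wavfile = Path(wavdir) / row.relpath / row.fname
        outfile = Path(outdir) / row.relpath / f'{row.barename}{ext}'
        yield wavfile, outfile


# Stand-in for one row of `todoright.itertuples()`.
Row = namedtuple('Row', ['relpath', 'fname', 'barename'])
rows = [Row('speaker_1', 'interview_a.wav', 'interview_a')]
for wavfile, outfile in iter_jobs(rows, 'audio/right', 'diarized/TextGrid/right', '.TextGrid'):
    print(wavfile, '->', outfile)
```

In the notebook, each pair would then be passed to `utils.diarize(wavfile, pipeline, 2, outfile)` exactly as in the loops above.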

In [None]:
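# Alternative to the loop above. `num_spkr` and `rttmright` were undefined here, so this uses 2 speakers and `outright`.
utils.diarize_df(todoright, pipeline, 2, wavright, outright)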