# Speaker diarization

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sensein/senselab/blob/main/tutorials/audio/speaker_diarization.ipynb)

This tutorial demonstrates how to use the `diarize_audios` function to perform speaker diarization on audio files, that is, to split each recording into time segments and label each segment with the speaker who is talking.

In [None]:
%pip install senselab['audio']

In [None]:
# Import necessary modules
from senselab.audio.data_structures import Audio
from senselab.audio.tasks.plotting.plotting import play_audio
from senselab.audio.tasks.preprocessing import resample_audios
from senselab.audio.tasks.speaker_diarization import diarize_audios
from senselab.utils.data_structures import DeviceType, PyannoteAudioModel
from senselab.utils.tasks.plotting import plot_segment

In [None]:
# Initialize a PyannoteAudioModel for speaker diarization, providing the model's path or URI.
model = PyannoteAudioModel(path_or_uri="pyannote/speaker-diarization-3.1")

# Specify the device type to be used for processing (CPU in this case).
device = DeviceType.CPU

In [None]:
# Download a sample audio file, then load it from its file path into an Audio object.
!mkdir -p tutorial_audio_files
!wget -O tutorial_audio_files/audio_48khz_mono_16bits.wav https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_mono_16bits.wav

audio = Audio(filepath="tutorial_audio_files/audio_48khz_mono_16bits.wav")

# Resample the audio to 16 kHz, the sampling rate the model expects as input.
# The resample_audios function returns a list, so we take the first (and only) element.
audio = resample_audios([audio], 16000)[0]

# Play the resampled audio to verify the preprocessing step was successful.
play_audio(audio)

In [None]:
# Perform speaker diarization on the audio using the specified model and device.
# The function returns a list with one result per input audio; each result holds the speaker segments detected in that audio.
results = diarize_audios(audios=[audio], model=model, device=device)

# Print the results of speaker diarization to the console.
print(results)

In [None]:
# Plot the detected speakers for visualization.
plot_segment(results[0])
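
Besides reading the plot, it can help to summarize the segments numerically. The following is a minimal sketch, assuming each diarization segment carries a start time, an end time (both in seconds), and a speaker label. Plain `(start, end, speaker)` tuples with made-up values stand in for the real output here, and `speaking_time_per_speaker` is a hypothetical helper, not part of senselab.

```python
from collections import defaultdict


def speaking_time_per_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label.

    `segments` is an iterable of (start_s, end_s, speaker) tuples.
    """
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


# Made-up segments for illustration only; this is not output from the model.
example_segments = [
    (0.0, 2.5, "SPEAKER_00"),
    (2.5, 4.0, "SPEAKER_01"),
    (4.0, 7.0, "SPEAKER_00"),
]

totals = speaking_time_per_speaker(example_segments)
print(f"{len(totals)} distinct speakers")
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.1f} s")
```

To apply this to the real output, the segments in `results[0]` would first need to be converted to such tuples; the attribute names to read from each segment are an assumption to check against the senselab documentation.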

**But wait.** In the audio, we can hear four speakers, yet the speaker diarization results indicate only two. Why is this happening?

Unfortunately, the model is not perfect and can make mistakes. We can try adjusting the parameters by setting `num_speakers=4`, `min_speakers=4`, and `max_speakers=4` to force the model to recognize four speakers. However, this approach doesn't always work as expected.
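
The adjustment described above might look like the following. This is a minimal sketch, assuming `diarize_audios` accepts the three keyword arguments named above. `speaker_count_kwargs` is a hypothetical helper introduced only for illustration (it is not part of senselab); it checks that the requested constraints are mutually consistent before they are passed on.

```python
def speaker_count_kwargs(num_speakers=None, min_speakers=None, max_speakers=None):
    """Build keyword arguments that constrain the number of speakers.

    Hypothetical helper, not part of senselab. It rejects contradictory
    constraints and drops any argument left as None.
    """
    if min_speakers is not None and max_speakers is not None:
        if min_speakers > max_speakers:
            raise ValueError("min_speakers must not exceed max_speakers")
    if num_speakers is not None:
        if min_speakers is not None and num_speakers < min_speakers:
            raise ValueError("num_speakers is below min_speakers")
        if max_speakers is not None and num_speakers > max_speakers:
            raise ValueError("num_speakers is above max_speakers")
    candidates = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    return {name: value for name, value in candidates.items() if value is not None}


kwargs = speaker_count_kwargs(num_speakers=4, min_speakers=4, max_speakers=4)
print(kwargs)

# In the notebook, with `audio`, `model`, and `device` defined as in the cells above
# (assumption: diarize_audios accepts these keyword arguments):
# results = diarize_audios(audios=[audio], model=model, device=device, **kwargs)
# plot_segment(results[0])
```

Even with these constraints, the number of speakers the model actually finds may differ from the number requested, so it is worth inspecting the new results rather than assuming the setting took effect.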