# Speaker Verification

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sensein/senselab/blob/main/tutorials/audio/speaker_verification.ipynb)

Speaker verification is the task of determining whether two speech samples come from the same speaker. It is widely used in security systems, authentication processes, and personalized user experiences. The core idea is to compare voice characteristics extracted from the speech samples in order to confirm the speaker's identity.

Speaker verification can be done in senselab as follows:

In [None]:
%pip install 'senselab[audio]'

In [None]:
# Import necessary libraries
from senselab.audio.data_structures import Audio
from senselab.audio.tasks.plotting import play_audio
from senselab.audio.tasks.preprocessing import extract_segments, resample_audios
from senselab.audio.tasks.speaker_verification.speaker_verification import verify_speaker

# Download an audio file for testing
!mkdir -p tutorial_audio_files
!wget -O tutorial_audio_files/audio_48khz_mono_16bits.wav https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_mono_16bits.wav

# Load an audio file from the specified file path into an Audio object.
audio = Audio(filepath="tutorial_audio_files/audio_48khz_mono_16bits.wav")

# Resample the audio to 16 kHz
audio = resample_audios([audio], 16000)[0]

# Split the audio into two segments, one per speaker
audio_clips = extract_segments([(audio, [(0, 0.8), (0.9, 1.8)])])
audio1 = audio_clips[0][0]
audio2 = audio_clips[0][1]

# Display the audio clips
play_audio(audio1)
play_audio(audio2)

Now we can ask senselab to verify whether two clips come from the same speaker. If we give it the same audio clip twice, we should expect a high verification score; if we give it the two different clips, we should expect a low one. Let's see what happens.

In [None]:
# List of audio pairs to compare
audio_pairs = [(audio1, audio1), (audio1, audio2)]

# Verify if the audios are from the same speaker
results = verify_speaker(audio_pairs)

# Print the results
for score, is_same_speaker in results:
    print(f"Verification Score: {score}, Same Speaker: {is_same_speaker}")

The verify_speaker function performs speaker verification using a pre-trained model. Here's a breakdown of how it works:

Input Data: The function takes a list of tuples, where each tuple contains two audio samples to be compared. Each audio sample is represented by an Audio object which includes the signal data and sampling rate.

Model and Device Setup: The function uses a pre-trained speaker verification model (SpeechBrainModel). It also selects the appropriate device (CPU or GPU) to run the model efficiently.

Sampling Rate Check: The function checks that every audio sample has a sampling rate of 16 kHz, the rate the model was trained on. If a sampling rate does not match, it raises an error.
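To make the sampling rate check concrete, here is a minimal sketch of that kind of validation. It is not senselab's implementation: `ClipInfo` is a hypothetical stand-in for an audio object, and `check_sampling_rates` is an illustrative helper name. The choice of `ValueError` is also an assumption, since the text above only says that an error is raised.

```python
from dataclasses import dataclass


@dataclass
class ClipInfo:
    """Hypothetical stand-in for an audio object; only the sampling rate matters here."""

    sampling_rate: int


EXPECTED_RATE = 16000  # Hz, the rate the model is said to be trained on


def check_sampling_rates(pairs, expected_rate=EXPECTED_RATE):
    """Raise ValueError if any clip in any pair is not at the expected rate."""
    for index, (first, second) in enumerate(pairs):
        for clip in (first, second):
            if clip.sampling_rate != expected_rate:
                raise ValueError(
                    f"Pair {index}: expected {expected_rate} Hz, "
                    f"got {clip.sampling_rate} Hz"
                )


# A pair that is already at 16 kHz passes silently.
check_sampling_rates([(ClipInfo(16000), ClipInfo(16000))])
```

This is why the tutorial resamples the 48 kHz recording to 16 kHz before calling verify_speaker.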

Embedding Extraction: For each pair of audio samples, the function extracts speaker embeddings using the SpeechBrainEmbeddings module. Embeddings are numerical representations that capture the unique characteristics of a speaker's voice.

Cosine Similarity Calculation: The function calculates the cosine similarity between the embeddings of the two audio samples. Cosine similarity is a measure of similarity between two vectors, where a higher value indicates greater similarity.
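For two vectors a and b, cosine similarity is the dot product of a and b divided by the product of their lengths, which gives a value between -1 and 1. The sketch below computes it in plain Python on small made-up vectors. Real speaker embeddings have many more dimensions, and the library may well compute this with tensor operations instead.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; these are not real embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0.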

Threshold Comparison: The function compares the calculated similarity score against a predefined threshold (default is 0.25). If the score exceeds the threshold, it indicates that the two audio samples are likely from the same speaker.

Output: The function returns a list of tuples, each containing the similarity score and a boolean indicating whether the two audio samples are from the same speaker.
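The threshold comparison and the output format can be sketched as follows. This is an illustration, not the library's code: `decide` is a hypothetical helper, the scores are made-up numbers rather than model output, and the strict `>` comparison is an assumption based on the word "exceeds" above.

```python
def decide(scores, threshold=0.25):
    """Pair each similarity score with a same-speaker decision."""
    # Assumption: a score must be strictly greater than the threshold.
    return [(score, score > threshold) for score in scores]


# Made-up similarity scores for illustration only.
for score, is_same_speaker in decide([0.91, 0.12]):
    print(f"Verification Score: {score}, Same Speaker: {is_same_speaker}")
```

Raising the threshold makes the decision stricter (fewer false accepts, more false rejects), and lowering it does the opposite.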