# Voice activity detection (VAD)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sensein/senselab/blob/main/tutorials/audio/voice_activity_detection.ipynb)

This tutorial demonstrates how to use the `detect_human_voice_activity_in_audios` function to perform voice activity detection (VAD), that is, to find the time spans of a recording that contain human speech.

Specifically, it shows how to use the `pyannote/speaker-diarization-3.1` model (https://huggingface.co/pyannote/speaker-diarization-3.1).
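To make the idea of VAD concrete before using the neural pipeline, here is a toy energy-threshold detector. It splits a mono signal into 30 ms frames, marks frames whose RMS level exceeds a fixed dB threshold as voiced, and returns the voiced runs as `(start, end)` times in seconds. This is a deliberately simple, swapped-in technique for illustration only: it is not how the pyannote model works, and the function name `energy_vad` and its default parameters are hypothetical.

```python
import numpy as np


def energy_vad(signal: np.ndarray, sample_rate: int, frame_ms: float = 30.0,
               threshold_db: float = -40.0) -> list[tuple[float, float]]:
    """Toy VAD (hypothetical helper): return (start, end) seconds of frames louder than threshold_db."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len  # trailing partial frame is ignored
    segments = []
    start = None
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        rms = np.sqrt(np.mean(frame ** 2))
        # RMS level in dB relative to full scale 1.0; the epsilon avoids log10(0) on silence.
        level_db = 20 * np.log10(rms + 1e-12)
        voiced = level_db > threshold_db
        t = i * frame_len / sample_rate  # start time of this frame in seconds
        if voiced and start is None:
            start = t  # a voiced run begins at this frame
        elif not voiced and start is not None:
            segments.append((start, t))  # the voiced run ended at this frame boundary
            start = None
    if start is not None:
        # The signal ended while still voiced: close the run at the last full frame.
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments


# 1 s of silence, 1 s of a 220 Hz tone standing in for "voice", then 1 s of silence.
sr = 16000
t = np.arange(sr) / sr
signal = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 220 * t), np.zeros(sr)])
print(energy_vad(signal, sr))
```

Because this detector only looks at loudness, it would flag any loud sound (music, a slammed door) as voice; learned models such as the pyannote pipeline used below are meant to tell speech apart from other sounds.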

In [5]:
# Import necessary modules
from senselab.audio.data_structures import Audio
from senselab.audio.tasks.plotting.plotting import play_audio
from senselab.audio.tasks.preprocessing import resample_audios
from senselab.audio.tasks.voice_activity_detection import detect_human_voice_activity_in_audios
from senselab.utils.data_structures import DeviceType, PyannoteAudioModel
from senselab.utils.tasks.plotting import plot_segment

In [6]:
# Initialize a PyannoteAudioModel for speaker diarization, providing the model's path or URI.
model = PyannoteAudioModel(path_or_uri="pyannote/speaker-diarization-3.1")

# Specify the device type to be used for processing (CPU in this case).
device = DeviceType.CPU

In [None]:
# Load an audio file from the specified file path into an Audio object.
audio = Audio.from_filepath("../../src/tests/data_for_testing/audio_48khz_mono_16bits.wav")

# Resample the audio to 16 kHz, as this is the expected input format for the model.
# The resample_audios function returns a list, so we take the first (and only) element.
audio = resample_audios([audio], 16000)[0]

# Listen to the resampled audio as a quick sanity check that preprocessing worked.
play_audio(audio)
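`resample_audios` takes care of the 48 kHz to 16 kHz conversion for you. To see what that step does to the raw samples, here is a minimal sketch using SciPy's polyphase resampler. It is an assumption for illustration: this tutorial does not say which resampling method `resample_audios` uses internally, and `resample_to` is a hypothetical helper.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to(signal: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Hypothetical helper: polyphase resampling from orig_sr to target_sr."""
    g = gcd(orig_sr, target_sr)
    up, down = target_sr // g, orig_sr // g  # 48000 -> 16000 gives up=1, down=3
    # resample_poly upsamples by `up`, applies an anti-aliasing FIR low-pass filter,
    # then downsamples by `down`.
    return resample_poly(signal, up, down)


sr_in, sr_out = 48_000, 16_000
x = np.random.default_rng(0).standard_normal(sr_in)  # 1 s of noise at 48 kHz
y = resample_to(x, sr_in, sr_out)
print(len(x), len(y))  # the same 1 s of audio, now with one third as many samples
```

Filtering before decimation matters: simply keeping every third sample would fold content above 8 kHz (the new Nyquist frequency) back into the speech band as aliasing.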

In [None]:
# Run human voice activity detection on the audio with the chosen model and device.
# The function returns one result per input audio; each result lists the time segments in which a voice was detected.
results = detect_human_voice_activity_in_audios(audios=[audio], model=model, device=device)

# Print the results of the voice activity detection to the console.
print(results)
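Once you have the detected segments, a common next step is to clean them up and summarize them. The sketch below assumes you have already converted each detected segment into a `(start, end)` pair in seconds (inspect the printed `results` to see how the returned objects expose their timestamps). The helpers `merge_segments` and `speech_ratio`, the 0.2 s gap, and the example segments are all hypothetical, not part of senselab.

```python
def merge_segments(segments: list[tuple[float, float]],
                   max_gap: float = 0.2) -> list[tuple[float, float]]:
    """Merge (start, end) segments whose gap to the previous segment is below max_gap seconds."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to (or overlapping) the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def speech_ratio(segments: list[tuple[float, float]], duration: float) -> float:
    """Fraction of a clip of `duration` seconds that is covered by voiced segments."""
    # max_gap=0.0 fuses only overlapping segments, so overlapping time is not counted twice.
    voiced = sum(end - start for start, end in merge_segments(segments, max_gap=0.0))
    return voiced / duration


# Hypothetical segments (in seconds) for a 4-second clip.
segments = [(0.5, 1.2), (1.3, 2.0), (3.0, 3.5)]
print(merge_segments(segments))
print(speech_ratio(segments, duration=4.0))
```

Merging short gaps is a design choice: it keeps brief pauses between words from splitting one utterance into many tiny segments, at the cost of occasionally bridging two separate utterances.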

In [None]:
# Plot the voice-activity segments detected in the first (and only) audio.
plot_segment(results[0])
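`plot_segment` draws the detected segments for you. If you want to customize the figure, a minimal matplotlib timeline works on the same `(start, end)` representation. `plot_timeline` is a hypothetical helper, and it assumes the segments have already been converted to seconds.

```python
import matplotlib.pyplot as plt


def plot_timeline(segments: list[tuple[float, float]], duration: float):
    """Hypothetical helper: draw voiced (start, end) segments as bars on a 0..duration axis."""
    fig, ax = plt.subplots(figsize=(8, 1.5))
    # broken_barh takes (xmin, width) pairs and a (ymin, height) band for the bars.
    ax.broken_barh([(start, end - start) for start, end in segments], (0, 1))
    ax.set_xlim(0, duration)
    ax.set_yticks([])  # the vertical axis carries no information here
    ax.set_xlabel("Time (s)")
    ax.set_title("Detected voice activity")
    return fig, ax


# Hypothetical segments (in seconds) for a 4-second clip.
fig, ax = plot_timeline([(0.5, 1.2), (1.3, 2.0), (3.0, 3.5)], duration=4.0)
```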