# Getting Started with `senselab`

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sensein/senselab/blob/main/tutorials/audio/00_getting_started.ipynb)


Welcome to the `senselab` quick start tutorial!

This guide will showcase some of the key functionalities offered by `senselab`. We'll cover how to read, preprocess, analyze, and manipulate audio data. For more details, please check the documentation and task-specific tutorials.

Note that the package evolves continuously, so if you find that this tutorial breaks at some point, please let us know by opening an issue.

Let's get started!

## Installation

In [None]:
%pip install 'senselab[audio]'

First, let's download some audio data for our demo:

In [None]:
!mkdir -p tutorial_audio_files
!wget -O tutorial_audio_files/audio_48khz_mono_16bits.wav https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_mono_16bits.wav
!wget -O tutorial_audio_files/audio_48khz_stereo_16bits.wav https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_stereo_16bits.wav

In Google Colab, import `senselab` and enable inline plotting with the `matplotlib` inline magic.

In [None]:
import sys
if 'google.colab' in sys.modules:
    import senselab
    get_ipython().run_line_magic('matplotlib', 'inline')

## Reading audio clips from disk
Need to read some audio files from disk? **EASY!**

In [None]:
from senselab.audio.data_structures import Audio

MONO_AUDIO_PATH = "tutorial_audio_files/audio_48khz_mono_16bits.wav"
STEREO_AUDIO_PATH = "tutorial_audio_files/audio_48khz_stereo_16bits.wav"

audio1 = Audio(filepath=MONO_AUDIO_PATH)
audio2 = Audio(filepath=STEREO_AUDIO_PATH)

Alternatively, to read audio in chunks from a stream (a file path, a URL, or `sys.stdin.buffer`), use `Audio.from_stream`:

In [None]:
for audio in Audio.from_stream(MONO_AUDIO_PATH):
    print(f"Audio chunk: {audio}")
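To make the idea of chunked reading concrete, here is a minimal sketch that uses only Python's standard `wave` module. It is not how `Audio.from_stream` is implemented; the helper names `write_test_tone` and `iter_wav_chunks`, the chunk duration, and the synthetic tone are all assumptions made for illustration.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def write_test_tone(path, sample_rate=8000, seconds=1.0, freq=440.0):
    """Write a mono 16-bit sine tone so the example needs no external file."""
    n = int(sample_rate * seconds)
    samples = [
        int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / sample_rate))
        for i in range(n)
    ]
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{n}h", *samples))


def iter_wav_chunks(path, chunk_seconds=0.25):
    """Yield (sample_rate, samples) pairs, one per fixed-duration chunk."""
    with wave.open(str(path), "rb") as wf:
        sample_rate = wf.getframerate()
        frames_per_chunk = int(sample_rate * chunk_seconds)
        while True:
            raw = wf.readframes(frames_per_chunk)
            if not raw:
                break  # end of file
            count = len(raw) // 2  # mono 16-bit: two bytes per sample
            yield sample_rate, struct.unpack(f"<{count}h", raw)


with tempfile.TemporaryDirectory() as tmp:
    tone_path = Path(tmp) / "tone.wav"
    write_test_tone(tone_path)
    chunk_lengths = [len(samples) for _, samples in iter_wav_chunks(tone_path)]

print("Samples per chunk:", chunk_lengths)
```

Processing one chunk at a time keeps memory use bounded, which matters for long recordings or live input.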

## Downmixing audio clips to mono
Want to downmix your audio to mono? It has never been this **EASY!** Here’s how:

In [None]:
from senselab.audio.tasks.preprocessing import downmix_audios_to_mono

print("The original audio has {} channels.".format(audio2.waveform.shape[0]))
audio2 = downmix_audios_to_mono([audio2])[0]
print("The downmixed audio has {} channels.".format(audio2.waveform.shape[0]))
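If you are curious what downmixing does to the samples, a common approach is to average the channels sample by sample. The sketch below shows that idea on plain Python lists; whether `downmix_audios_to_mono` uses exactly this weighting is an assumption, and `downmix_to_mono` is a hypothetical helper, not part of `senselab`.

```python
def downmix_to_mono(channels):
    """Average equal-length channels sample by sample into a single channel."""
    if not channels:
        raise ValueError("expected at least one channel")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) groups the samples that occur at the same time step.
    return [sum(frame) / len(channels) for frame in zip(*channels)]


left = [0.5, -0.25, 1.0]
right = [0.5, 0.75, -1.0]
mono = downmix_to_mono([left, right])
print("Mono samples:", mono)
```

Note that content with opposite sign in the two channels cancels out, as the last sample shows.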

## Resampling audio clips to 16000 Hz
Need to resample your audio to 16000 Hz? **EASY!**



In [None]:
from senselab.audio.tasks.preprocessing import resample_audios

print("The original audio has a sampling rate of {} Hz.".format(audio1.sampling_rate))
[audio1, audio2] = resample_audios([audio1, audio2], resample_rate=16000)
print("The resampled audio has a sampling rate of {} Hz.".format(audio1.sampling_rate))
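Resampling changes the number of samples while keeping the duration the same. The sketch below illustrates this with a naive linear-interpolation resampler on a plain list. It is only meant to show the bookkeeping: it applies no anti-aliasing filter, and it is not the algorithm behind `resample_audios`. The helper name `resample_linear` is hypothetical.

```python
def resample_linear(samples, orig_rate, new_rate):
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if orig_rate <= 0 or new_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    new_length = max(1, round(len(samples) * new_rate / orig_rate))
    last = len(samples) - 1
    resampled = []
    for i in range(new_length):
        pos = i * orig_rate / new_rate  # position in the original signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Blend the two neighbouring original samples.
        resampled.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return resampled


ramp = [float(i) for i in range(48)]  # 48 samples at 48 kHz = 1 ms
resampled = resample_linear(ramp, orig_rate=48000, new_rate=16000)
print("Samples before:", len(ramp), "after:", len(resampled))
print("Duration before (s):", len(ramp) / 48000, "after (s):", len(resampled) / 16000)
```

Going from 48000 Hz to 16000 Hz keeps every third time position, so the sample count drops by a factor of three.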

## Playing and plotting audio
Want to play or plot your audio? **EASY!** Here is how:

In [None]:
from senselab.audio.tasks.plotting.plotting import play_audio

play_audio(audio1)

In [None]:
from senselab.audio.tasks.plotting.plotting import plot_waveform

plot_waveform(audio1);

## Voice Activity Detection
Want to detect when someone is speaking? **EASY!**

**Note**: You will need to request access to the following Hugging Face models:

- https://huggingface.co/pyannote/segmentation-3.0
- https://huggingface.co/pyannote/speaker-diarization-3.1

Then create a Hugging Face token and add it to your Google Colab secrets (see the key icon in the left sidebar) as a variable called `HF_TOKEN`.
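If you want to check that the token is visible before loading the gated models, the following sketch looks it up in Colab secrets and falls back to an environment variable when run elsewhere. `resolve_hf_token` is a hypothetical helper, and the idea that downstream libraries read the token from the `HF_TOKEN` environment variable is an assumption here, not something this tutorial states.

```python
import os


def resolve_hf_token(name="HF_TOKEN"):
    """Return the token from Colab secrets if available, else from the environment."""
    try:
        # This module is only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # Secret missing or notebook access not granted: fall back to the environment.
        return os.environ.get(name)


token = resolve_hf_token()
if token:
    # Assumption: downstream libraries look the token up under this variable name.
    os.environ["HF_TOKEN"] = token
print("Hugging Face token found" if token else "No Hugging Face token configured")
```

The token itself is never printed, which avoids leaking it into saved notebook output.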

In [None]:
from senselab.audio.tasks.voice_activity_detection import detect_human_voice_activity_in_audios
from senselab.utils.data_structures import PyannoteAudioModel

pyannote_model = PyannoteAudioModel(path_or_uri="pyannote/speaker-diarization-3.1", revision="main")
voice_activity_results = detect_human_voice_activity_in_audios(audios=[audio1, audio2], model=pyannote_model)
print("Voice activity detection results: {}".format(voice_activity_results))

## Speaker Diarization
Wondering who is speaking and when? **EASY!**

In [None]:
from senselab.audio.tasks.speaker_diarization import diarize_audios

pyannote_model = PyannoteAudioModel(path_or_uri="pyannote/speaker-diarization-3.1", revision="main")
diarization_results = diarize_audios(audios=[audio1, audio2], model=pyannote_model)

print("Diarization results: {}".format(diarization_results))
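Once you know who speaks when, a typical next step is to total the speaking time per speaker. The sketch below does this for segments given as `(speaker, start, end)` tuples in seconds. That tuple layout is a hypothetical simplification for illustration, not the return type of `diarize_audios`, so you would need to adapt the unpacking to the actual result objects.

```python
from collections import defaultdict


def speaking_time_per_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(speaker, start, end)}")
        totals[speaker] += end - start
    return dict(totals)


# Hypothetical segments, made up for this example.
segments = [
    ("SPEAKER_00", 0.0, 2.5),
    ("SPEAKER_01", 2.5, 4.0),
    ("SPEAKER_00", 4.0, 5.0),
]
totals = speaking_time_per_speaker(segments)
print("Speaking time per speaker (s):", totals)
```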

## Automatic Speech Recognition
Want to convert speech to text? **EASY!** Use this:


In [None]:
from senselab.audio.tasks.speech_to_text import transcribe_audios
from senselab.utils.data_structures import HFModel

hf_model = HFModel(path_or_uri="openai/whisper-tiny", revision="main")
transcripts = transcribe_audios(audios=[audio1, audio2], model=hf_model)

print("Transcripts: {}".format(transcripts))

## Speaker Embeddings
Need to get unique speaker signatures? **EASY!** Here’s how:

In [None]:
from senselab.audio.tasks.speaker_embeddings import extract_speaker_embeddings_from_audios
from senselab.utils.data_structures import SpeechBrainModel

ecapa_model = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main")
embeddings = extract_speaker_embeddings_from_audios(audios=[audio1, audio1], model=ecapa_model)

print("Speaker embeddings: {}".format(embeddings))
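Speaker embeddings are usually compared with cosine similarity: values near 1 suggest the same voice, lower values suggest different voices. The sketch below implements it on plain lists. It assumes each embedding can be flattened to a one-dimensional list of floats (for example via `.tolist()` on a tensor), and `cosine_similarity` is a hypothetical helper rather than a `senselab` function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print("Same direction:", same_direction)
print("Orthogonal:", orthogonal)
```

Because the cell above passes `audio1` twice, the two embeddings it returns would be expected to score close to 1 against each other.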

## Speech Emotion Recognition
Want to know the emotions in the speech? **EASY!**

In [None]:
from senselab.audio.tasks.classification.speech_emotion_recognition import classify_emotions_from_speech

emotion_model = HFModel(path_or_uri="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition")
emotion_results = classify_emotions_from_speech([audio1, audio2], emotion_model)

print("Emotion results: {}".format(emotion_results))

## Audio Augmentation
Need to augment your audio data? **EASY!** Here’s how:

In [None]:
from torch_audiomentations import Compose, PolarityInversion

from senselab.audio.tasks.data_augmentation import augment_audios

apply_augmentation = Compose(transforms=[PolarityInversion(p=1, output_type="dict")], output_type="dict")
[augmented_audio1, augmented_audio2] = augment_audios([audio1, audio2], apply_augmentation)

print("Augmented audio: {}".format(augmented_audio1))
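The `PolarityInversion` transform used above flips the sign of every sample. Here is a minimal sketch of that operation on a plain list, with `invert_polarity` as a hypothetical stand-in for the real transform.

```python
def invert_polarity(samples):
    """Flip the sign of every sample; magnitudes are unchanged."""
    return [-sample for sample in samples]


original = [0.1, -0.4, 0.0, 0.9]
inverted = invert_polarity(original)
print("Original:", original)
print("Inverted:", inverted)
```

The inverted clip sounds the same as the original to a listener, which is why this transform is a cheap way to add training variety without changing the content.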

## Feature Extraction
Want to extract some openSMILE features from audio? **EASY!**


In [None]:
from senselab.audio.tasks.features_extraction.opensmile import extract_opensmile_features_from_audios

features = extract_opensmile_features_from_audios([audio1, audio2])

print("OpenSMILE features: {}".format(features))

## Speech Enhancement
Need to clean up your audio? **EASY!** Here’s how:

In [None]:
from senselab.audio.tasks.speech_enhancement import enhance_audios
from senselab.utils.data_structures import SpeechBrainModel

speechbrain_model = SpeechBrainModel(path_or_uri="speechbrain/sepformer-wham16k-enhancement", revision="main")
enhanced_audios = enhance_audios(audios=[audio1, audio2], model=speechbrain_model)

print("Enhanced audios: {}".format(enhanced_audios))