# Getting Started with `senselab`

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sensein/senselab/blob/main/tutorials/audio/00_getting_started.ipynb)


Welcome to the `senselab` quick start tutorial! 

This guide will showcase some of the key functionalities offered by `senselab`. We'll cover how to read, preprocess, analyze, and manipulate audio data. For more details, please check the documentation and task-specific tutorials. 

Note that the package evolves continuously, so if you find that this tutorial breaks at some point, please let us know by opening an issue. 

Let's get started!

## Installation

In [1]:
!pip install senselab


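If you want to confirm that the installation worked before moving on, a quick check like the one below can help. It is only a minimal sketch using the Python standard library; `is_installed` is a hypothetical helper name, not part of `senselab`.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# The result depends on your environment, so no expected output is shown here.
print("senselab installed:", is_installed("senselab"))
```
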
## Reading audio clips from disk
Need to read some audio files from disk? **EASY!**

In [4]:
from senselab.audio.data_structures import Audio
!mkdir -p tutorial_audio_files
!wget -O tutorial_audio_files/audio_48khz_mono_16bits.wav https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_mono_16bits.wav
!wget -O tutorial_audio_files/audio_48khz_stereo_16bits.wav https://github.com/sensein/senselab/raw/main/src/tests/data_for_testing/audio_48khz_stereo_16bits.wav

MONO_AUDIO_PATH = "tutorial_audio_files/audio_48khz_mono_16bits.wav"
STEREO_AUDIO_PATH = "tutorial_audio_files/audio_48khz_stereo_16bits.wav"

audio1 = Audio.from_filepath(MONO_AUDIO_PATH)
audio2 = Audio.from_filepath(STEREO_AUDIO_PATH)


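The file names above encode the sampling rate, channel count, and bit depth. If you would like to verify such properties yourself, the sketch below reads them from a WAV header with Python's built-in `wave` module. It does not use `senselab`: it writes a short synthetic tone to a temporary file (the file name and the helpers `write_sine_wav` and `describe_wav` are hypothetical) and reads the header back.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def write_sine_wav(path: Path, sample_rate: int = 48000, seconds: float = 0.1) -> None:
    """Write a short mono, 16-bit, 440 Hz sine tone to use as a test file."""
    n_frames = int(sample_rate * seconds)
    samples = [
        int(32767 * 0.5 * math.sin(2 * math.pi * 440 * i / sample_rate))
        for i in range(n_frames)
    ]
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{n_frames}h", *samples))


def describe_wav(path: Path) -> dict:
    """Read channel count, bit depth, sampling rate, and duration from a WAV header."""
    with wave.open(str(path), "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "sampling_rate": wav_file.getframerate(),
            "duration_s": wav_file.getnframes() / wav_file.getframerate(),
        }


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_48khz_mono_16bits.wav"
    write_sine_wav(demo_path)
    info = describe_wav(demo_path)
    print(info)
```
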
## Downmixing audio clips to mono
Want to downmix your audio to mono? It has never been this **EASY!** Here’s how:

In [3]:
from senselab.audio.tasks.preprocessing import downmix_audios_to_mono

print("The original audio has {} channels.".format(audio2.waveform.shape[0]))
audio2 = downmix_audios_to_mono([audio2])[0]
print("The downmixed audio has {} channels.".format(audio2.waveform.shape[0]))

The original audio has 2 channels.
The downmixed audio has 1 channels.
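
To build some intuition for what downmixing does, here is a small sketch that averages the channels sample by sample. Averaging is a common convention and is an assumption here; `senselab`'s own implementation may differ in detail. Plain Python lists stand in for the waveform, and `downmix_to_mono` is a hypothetical helper.

```python
def downmix_to_mono(channels: list[list[float]]) -> list[float]:
    """Average equally long channels sample by sample into a single channel."""
    if not channels:
        raise ValueError("Expected at least one channel.")
    n_samples = len(channels[0])
    if any(len(channel) != n_samples for channel in channels):
        raise ValueError("All channels must have the same number of samples.")
    n_channels = len(channels)
    # zip(*channels) yields one tuple per time step with a sample from each channel.
    return [sum(frame) / n_channels for frame in zip(*channels)]


stereo = [[0.5, -0.25, 1.0], [0.0, 0.25, 0.0]]  # [left, right]
mono = downmix_to_mono(stereo)
print(mono)  # [0.25, 0.0, 0.5]
```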


## Resampling audio clips to 16000 Hz
Need to resample your audio to 16000 Hz? **EASY!**



In [None]:
from senselab.audio.tasks.preprocessing import resample_audios

print("The original audio has a sampling rate of {} Hz.".format(audio1.sampling_rate))
[audio1, audio2] = resample_audios([audio1, audio2], resample_rate=16000)
print("The resampled audio has a sampling rate of {} Hz.".format(audio1.sampling_rate))
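
Going from 48000 Hz to 16000 Hz keeps roughly one third of the samples. The sketch below illustrates the idea with linear interpolation on a plain Python list. It is for illustration only: it applies no anti-aliasing filter, which production resamplers normally do, and it is not how `resample_audios` is implemented. `resample_linear` is a hypothetical helper.

```python
def resample_linear(samples: list[float], orig_rate: int, target_rate: int) -> list[float]:
    """Resample by linear interpolation (no anti-aliasing; illustration only)."""
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("Sampling rates must be positive.")
    if not samples:
        return []
    n_out = max(1, round(len(samples) * target_rate / orig_rate))
    step = orig_rate / target_rate  # input samples advanced per output sample
    resampled = []
    for i in range(n_out):
        position = i * step
        left = min(int(position), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        fraction = position - left
        resampled.append(samples[left] * (1 - fraction) + samples[right] * fraction)
    return resampled


ramp = [float(i) for i in range(48)]  # 1 ms of audio at 48000 Hz
resampled_ramp = resample_linear(ramp, orig_rate=48000, target_rate=16000)
print(len(ramp), "->", len(resampled_ramp))  # 48 -> 16
```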

## Playing and plotting audio
Want to play or plot your audio? **EASY!** Here’s how:

In [None]:
from senselab.audio.tasks.plotting.plotting import play_audio

play_audio(audio1)

In [None]:
from senselab.audio.tasks.plotting.plotting import plot_waveform

plot_waveform(audio1)

## Voice Activity Detection
Want to detect when someone is speaking? **EASY!**

In [None]:
from senselab.audio.tasks.voice_activity_detection import detect_human_voice_activity_in_audios
from senselab.utils.data_structures import PyannoteAudioModel

pyannote_model = PyannoteAudioModel(path_or_uri="pyannote/speaker-diarization-3.1", revision="main")
voice_activity_results = detect_human_voice_activity_in_audios(audios=[audio1, audio2], model=pyannote_model)
print("Voice activity detection results: {}".format(voice_activity_results))
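
If you are curious what voice activity detection produces, the toy sketch below marks time spans whose short-term energy exceeds a threshold. This is a much simpler technique than the neural pyannote model used above, and it detects any loud sound, not specifically human voice. The function name, frame length, and threshold are all assumptions chosen for illustration.

```python
def detect_active_segments(
    samples: list[float],
    sampling_rate: int,
    frame_ms: int = 20,
    threshold: float = 0.01,
) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans whose mean frame energy exceeds the threshold."""
    frame_len = max(1, int(sampling_rate * frame_ms / 1000))
    segments = []
    start = None  # sample index where the current active span began
    for frame_start in range(0, len(samples), frame_len):
        frame = samples[frame_start:frame_start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold and start is None:
            start = frame_start
        elif energy <= threshold and start is not None:
            segments.append((start / sampling_rate, frame_start / sampling_rate))
            start = None
    if start is not None:  # the signal ended while still active
        segments.append((start / sampling_rate, len(samples) / sampling_rate))
    return segments


# 0.1 s of silence, 0.2 s of signal, 0.1 s of silence at a toy rate of 1000 Hz.
toy_signal = [0.0] * 100 + [0.5] * 200 + [0.0] * 100
active = detect_active_segments(toy_signal, sampling_rate=1000)
print(active)  # [(0.1, 0.3)]
```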

## Speaker Diarization
Wondering who is speaking and when? **EASY!**

In [None]:
from senselab.audio.tasks.speaker_diarization import diarize_audios

pyannote_model = PyannoteAudioModel(path_or_uri="pyannote/speaker-diarization-3.1", revision="main")
diarization_results = diarize_audios(audios=[audio1, audio2], model=pyannote_model)

print("Diarization results: {}".format(diarization_results))
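
A common next step after diarization is to total the speaking time per speaker. The sketch below assumes the segments have already been converted to plain `(speaker, start_s, end_s)` tuples. That tuple layout, the speaker labels, and the helper name are hypothetical, since this tutorial does not show the exact structure of `diarization_results`.

```python
from collections import defaultdict


def speaking_time_per_speaker(
    segments: list[tuple[str, float, float]],
) -> dict[str, float]:
    """Sum segment durations in seconds for each speaker label."""
    totals = defaultdict(float)
    for speaker, start_s, end_s in segments:
        totals[speaker] += end_s - start_s
    return dict(totals)


# Hypothetical segments, not real output from diarize_audios.
example_segments = [
    ("SPEAKER_00", 0.0, 2.5),
    ("SPEAKER_01", 2.5, 4.0),
    ("SPEAKER_00", 4.0, 5.0),
]
print(speaking_time_per_speaker(example_segments))
# {'SPEAKER_00': 3.5, 'SPEAKER_01': 1.5}
```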

## Automatic Speech Recognition
Want to convert speech to text? **EASY!** Here’s how:


In [None]:
from senselab.audio.tasks.speech_to_text import transcribe_audios
from senselab.utils.data_structures import HFModel

hf_model = HFModel(path_or_uri="openai/whisper-tiny", revision="main")
transcripts = transcribe_audios(audios=[audio1, audio2], model=hf_model)

print("Transcripts: {}".format(transcripts))

## Speaker Embeddings
Need to get unique speaker signatures? **EASY!** Here’s how:

In [None]:
from senselab.audio.tasks.speaker_embeddings import extract_speaker_embeddings_from_audios
from senselab.utils.data_structures import SpeechBrainModel

ecapa_model = SpeechBrainModel(path_or_uri="speechbrain/spkrec-ecapa-voxceleb", revision="main")
embeddings = extract_speaker_embeddings_from_audios(audios=[audio1, audio1], model=ecapa_model)

print("Speaker embeddings: {}".format(embeddings))
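
Speaker embeddings are typically compared with cosine similarity: values near 1 suggest the same speaker, and lower values suggest different speakers. The sketch below shows the computation on plain Python lists. The short vectors are made up for illustration, and `cosine_similarity` is a hypothetical helper, not a `senselab` function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equally long, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors.")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors; real speaker embeddings have many more dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(round(same_direction, 6), orthogonal)
```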

## Speech Emotion Recognition
Want to know the emotions in the speech? **EASY!**

In [None]:
from senselab.audio.tasks.classification.speech_emotion_recognition import speech_emotion_recognition_with_hf_models

emotion_model = HFModel(path_or_uri="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition")
emotion_results = speech_emotion_recognition_with_hf_models([audio1, audio2], emotion_model)

print("Emotion results: {}".format(emotion_results))

## Audio Augmentation
Need to augment your audio data? **EASY!** Here’s how:

In [None]:
from torch_audiomentations import Compose, PolarityInversion

from senselab.audio.tasks.data_augmentation import augment_audios

apply_augmentation = Compose(transforms=[PolarityInversion(p=1, output_type="dict")], output_type="dict")
[augmented_audio1, augmented_audio2] = augment_audios([audio1, audio2], apply_augmentation)

print("Augmented audio: {}".format(augmented_audio1))
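
Polarity inversion, the augmentation used above, simply flips the sign of every sample. The minimal sketch below shows the effect on a plain Python list; `invert_polarity` is a hypothetical helper and does not go through `torch_audiomentations`.

```python
def invert_polarity(samples: list[float]) -> list[float]:
    """Flip the sign of every sample; applying it twice restores the input."""
    return [-s for s in samples]


original = [0.5, -0.25, 1.0]
inverted = invert_polarity(original)
print(inverted)  # [-0.5, 0.25, -1.0]
```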

## Feature Extraction
Want to extract openSMILE features from audio? **EASY!**


In [None]:
from senselab.audio.tasks.features_extraction.opensmile import extract_opensmile_features_from_audios

features = extract_opensmile_features_from_audios([audio1, audio2])

print("OpenSMILE features: {}".format(features))

## Speech Enhancement
Need to clean up your audio? **EASY!** Here’s how:

In [None]:
from senselab.audio.tasks.speech_enhancement import enhance_audios
from senselab.utils.data_structures import SpeechBrainModel

speechbrain_model = SpeechBrainModel(path_or_uri="speechbrain/sepformer-wham16k-enhancement", revision="main")
enhanced_audios = enhance_audios(audios=[audio1, audio1], model=speechbrain_model)

print("Enhanced audios: {}".format(enhanced_audios))