# Audio Batch Preprocessing and Transcription with senselab

This notebook demonstrates how to:
- Read all audio files in `tutorials/audio/tutorial_audio_files/`
- Convert them to mono and resample them to 16 kHz (if they are not already in that format)
- Transcribe them using the Whisper Tiny model from Hugging Face via senselab

We use senselab's audio preprocessing and speech-to-text modules for this workflow.

In [6]:
# Import the standard library first, then the senselab modules used below
import os

from senselab.audio.data_structures import Audio
from senselab.audio.tasks.preprocessing import downmix_audios_to_mono, resample_audios
from senselab.audio.tasks.speech_to_text import transcribe_audios
from senselab.utils.data_structures import DeviceType, HFModel

In [11]:
# List all audio files in the target directory
AUDIO_DIR = "tutorials/audio/tutorial_audio_files/"
# Sort the listing: os.listdir returns entries in arbitrary order
audio_files = [os.path.join(AUDIO_DIR, f) for f in sorted(os.listdir(AUDIO_DIR))
               if f.lower().endswith(('.wav', '.mp3', '.flac', '.ogg', '.m4a'))]
print(f"Found {len(audio_files)} audio files:")
for f in audio_files:
    print(f)

Found 2 audio files:
tutorials/audio/tutorial_audio_files/audio_48khz_mono_16bits.wav
tutorials/audio/tutorial_audio_files/audio_48khz_stereo_16bits.wav
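
The discovery step can also be written with `pathlib`, which makes it easy to skip directories and return the paths in a stable, sorted order. The following is a minimal sketch; the helper name `find_audio_files` is hypothetical and not part of senselab, and the extension set simply mirrors the one used in the cell above.

```python
from pathlib import Path

# Extensions treated as audio; mirrors the tuple used in the cell above.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def find_audio_files(directory, extensions=AUDIO_EXTENSIONS):
    """Return the sorted paths of audio files found directly in `directory`.

    Hypothetical helper for illustration; not part of senselab.
    """
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        # Skip sub-directories and compare suffixes case-insensitively.
        if path.is_file() and path.suffix.lower() in extensions
    )
```

Calling `find_audio_files(AUDIO_DIR)` would then stand in for the list comprehension above.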


In [12]:
# Load, convert to mono, and resample all audio files as needed
processed_audios = []
for file in audio_files:
    audio = Audio(filepath=file)
    # Convert to mono if needed
    if audio.waveform.shape[0] != 1:
        audio = downmix_audios_to_mono([audio])[0]
    # Resample to 16kHz if needed
    if audio.sampling_rate != 16000:
        audio = resample_audios([audio], 16000)[0]
    processed_audios.append(audio)
print(f"Processed {len(processed_audios)} audio files for transcription.")

Processed 2 audio files for transcription.
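
To make the two preprocessing operations concrete, here is a small pure-Python sketch of what they do conceptually. It assumes that downmixing averages the channels sample by sample and that resampling preserves duration; senselab's own implementation may differ in detail (for example in filtering and in how the output length is rounded). The function names are hypothetical.

```python
def downmix_to_mono(channels):
    """Average equally long channels sample by sample into a single channel."""
    if not channels:
        raise ValueError("expected at least one channel")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same length")
    return [sum(samples) / len(channels) for samples in zip(*channels)]


def resampled_length(num_samples, source_rate, target_rate):
    """Approximate number of samples after resampling (duration is preserved)."""
    return round(num_samples * target_rate / source_rate)


# Two channels with three samples each (illustrative values).
stereo = [[0.5, 1.0, -0.5], [0.0, 0.0, 0.25]]
mono = downmix_to_mono(stereo)

# One second of audio at 48 kHz corresponds to 16000 samples at 16 kHz.
one_second = resampled_length(48000, 48000, 16000)
```

Going from 48 kHz to 16 kHz therefore keeps roughly one sample in three, which is why both tutorial files shrink by that factor before transcription.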


In [13]:
# Transcribe all processed audio files using Whisper Tiny (HuggingFace)
model = HFModel(path_or_uri="openai/whisper-tiny", revision="main")
device = DeviceType.CPU
transcripts = transcribe_audios(audios=processed_audios, model=model, device=device)
print("Transcription complete.")

2025-08-17 21:58:20,352 - senselab - INFO - Time taken to initialize the hugging face ASR pipeline: 0.00 seconds
Using custom `forced_decoder_ids` from the (generation) config. This is deprecated in favor of the `task` and `language` flags/config options.
Transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`. See https://github.com/huggingface/transformers/pull/28687 for more details.
2025-08-17 21:58:21,587 - senselab - INFO - Time taken for transcribing the audios: 1.23 seconds


Transcription complete.


# Report: Method and Results

## Method
This notebook demonstrates a batch audio preprocessing and transcription workflow using the `senselab` library. The steps are as follows:

1. **Audio File Discovery:** All audio files in the specified directory are listed.
2. **Preprocessing:** Each audio file is loaded, downmixed to mono if it has more than one channel, and resampled to 16 kHz if its sampling rate differs.
3. **Transcription:** The preprocessed audio files are transcribed using the Whisper Tiny model from Hugging Face via `senselab`.
4. **Results Display:** The transcriptions for each audio file are printed.

## Results
- Both audio files in the directory (a mono and a stereo recording, each 48 kHz according to its file name) were preprocessed and transcribed without errors.
- The final cell prints the transcript for each file. The mono and stereo recordings yield identical transcripts after preprocessing.

This workflow can be adapted for larger datasets or different models as needed.
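
For larger datasets, loading every file into memory before transcribing may not scale. One option is to process the file list in fixed-size chunks. The sketch below shows only the chunking; `iter_batches` is a hypothetical helper, and the file names are placeholders.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder file names, for illustration only.
paths = [f"clip_{i}.wav" for i in range(5)]
batches = list(iter_batches(paths, 2))
```

Each batch could then be loaded, preprocessed, and passed to `transcribe_audios` in turn, following the same steps as the cells in this notebook.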

In [14]:
# Display the transcriptions, pairing each file path with its transcript
for audio_file, script_line in zip(audio_files, transcripts):
    print(f"File: {audio_file}")
    print(f"Transcript: {script_line.text}\n")

File: tutorials/audio/tutorial_audio_files/audio_48khz_mono_16bits.wav
Transcript: This is Peter. This is Johnny. Kenny. Thank you. We just wanted to take a minute to thank you.

File: tutorials/audio/tutorial_audio_files/audio_48khz_stereo_16bits.wav
Transcript: This is Peter. This is Johnny. Kenny. Thank you. We just wanted to take a minute to thank you.
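
To keep the results beyond the notebook session, the transcripts can be written to a JSON file keyed by file path. This is a minimal sketch under two assumptions: the transcripts are returned in the same order as the input files (as the display cell above also assumes), and the text is available as plain strings, for example `[t.text for t in transcripts]`. The helper names and the output path are hypothetical.

```python
import json


def pair_transcripts(paths, texts):
    """Map each file path to its transcript text, checking the lengths match."""
    if len(paths) != len(texts):
        raise ValueError("number of paths and transcripts must be equal")
    return dict(zip(paths, texts))


def save_transcripts(mapping, output_path):
    """Write the path-to-text mapping as human-readable JSON."""
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle, indent=2, ensure_ascii=False)
```

In this notebook, the call might look like `save_transcripts(pair_transcripts(audio_files, [t.text for t in transcripts]), "transcripts.json")`.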

