# 🎧 Batch Audio Transcription with `faster-whisper`

This notebook transcribes every `.mp3` and `.wav` file in a folder with [faster-whisper](https://github.com/guillaumekln/faster-whisper), a reimplementation of Whisper built on CTranslate2, and saves each transcript as a `.json` file.

## 🧰 Requirements

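Install the Python packages. The conversion step in this notebook also calls the `ffmpeg` command-line tool directly, so the `ffmpeg` executable must be installed separately and available on your `PATH`; the `ffmpeg-python` package is only a wrapper and does not ship the binary.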

In [1]:
%pip install faster-whisper ffmpeg-python


Note: you may need to restart the kernel to use updated packages.


In [2]:
# 📚 Imports
import os
import json
import time
from pathlib import Path
import subprocess
from faster_whisper import WhisperModel
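
Because the conversion helper in Step 2 shells out to the `ffmpeg` executable, it may be worth confirming up front that it can be found. The following is a minimal sketch using only the standard library; the name `ffmpeg_available` is an illustrative assumption, not part of faster-whisper.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; install it before running the conversion step.")
```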

## 📁 Step 1: Set Up Your Folder Paths

Prepare your folder with audio files and choose where to store your transcriptions.

In [3]:
# 🔧 Configuration

AUDIO_DIR = "audios"  # Folder with .mp3 or .wav files
OUTPUT_DIR = "transcripts"  # Folder to store .json outputs
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Model configuration
MODEL_SIZE = "large-v3"  # Options: tiny, base, small, medium, large-v3, etc.
LANGUAGE = "pt"  # e.g., 'en', 'pt', 'es', 'fr'
DEVICE = "cpu"  # 'cpu', 'cuda', or 'auto' (CTranslate2 has no 'mps' backend)
COMPUTE_TYPE = "auto"  # 'float16', 'int8', 'auto', etc.
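
Before loading a large model, it can help to confirm which files in the audio folder will actually be picked up. This is a minimal sketch, assuming the same two extensions as the rest of the notebook; `find_audio_files` and `AUDIO_EXTENSIONS` are hypothetical names introduced only for illustration.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav"}  # the formats this notebook handles


def find_audio_files(folder: str) -> list[Path]:
    """Return audio files in `folder`, sorted by path, matching extensions case-insensitively."""
    root = Path(folder)
    if not root.is_dir():
        return []  # a missing folder simply yields no files
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


# Preview the files that would be transcribed from the configured folder.
for path in find_audio_files("audios"):
    print(path.name)
```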

## 🔄 Step 2: Utility Functions

We’ll define a few helper functions for:

- Converting audio to WAV (mono, 16 kHz)
- Resolving hardware configuration
- Running the transcription

In [4]:
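def convert_to_wav(input_path: str) -> str:
    """Convert any audio file to mono 16 kHz WAV and return the new path."""
    src = Path(input_path)
    # Write into a "converted" subfolder: ffmpeg cannot overwrite the file it
    # is reading, so a .wav input must not map onto its own path.
    output_path = src.parent / "converted" / f"{src.stem}.wav"
    output_path.parent.mkdir(exist_ok=True)
    command = [
        "ffmpeg",
        "-y",
        "-i",
        str(src),
        "-acodec",
        "pcm_s16le",
        "-ac",
        "1",
        "-ar",
        "16000",
        str(output_path),
    ]
    # check=True raises CalledProcessError instead of silently ignoring failures.
    subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True
    )
    return str(output_path)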


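def resolve_device(device_choice):
    """Map 'auto' to a device that faster-whisper (CTranslate2) supports."""
    if device_choice == "auto":
        # CTranslate2 supports "cpu" and "cuda" but not Apple "mps",
        # so fall back to the CPU when no CUDA GPU is visible.
        import ctranslate2

        return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
    return device_choice


def resolve_compute_type(device, compute_choice):
    """Pick a default precision: float16 on GPU, int8 on CPU."""
    if compute_choice != "auto":
        return compute_choice
    if device == "cuda":
        return "float16"
    return "int8"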


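def transcribe_file(model, file_path, language="en"):
    """Transcribe one audio file and return a list of segment dicts."""
    segments, _ = model.transcribe(
        file_path,
        beam_size=5,
        word_timestamps=False,
        vad_filter=True,  # skip long stretches without speech
        language=language,
    )
    # `segments` is a lazy generator: the transcription actually runs
    # while this list comprehension iterates over it.
    return [
        {"start": seg.start, "end": seg.end, "text": seg.text.strip()}
        for seg in segments
    ]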
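
The segment dictionaries returned by `transcribe_file` hold raw second offsets, which are awkward to read in a quick review. The sketch below renders them as timestamped lines; `format_timestamp` and `segments_to_lines` are hypothetical helpers, not part of faster-whisper, and the `HH:MM:SS.mmm` layout is just one reasonable choice.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_lines(segments: list[dict]) -> list[str]:
    """Turn segment dicts with start/end/text keys into readable lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text']}"
        for s in segments
    ]


example = [{"start": 0.0, "end": 2.56, "text": "Hello, how can I help you?"}]
print("\n".join(segments_to_lines(example)))
# prints: [00:00:00.000 -> 00:00:02.560] Hello, how can I help you?
```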

## ⚙️ Step 3: Load the faster-whisper Model

In [5]:
device = resolve_device(DEVICE)
compute = resolve_compute_type(device, COMPUTE_TYPE)
print(f"Loading model on `{device}` with `{compute}` precision...")

model = WhisperModel(MODEL_SIZE, device=device, compute_type=compute)

Loading model on `cpu` with `int8` precision...


## 🔁 Step 4: Transcribe All Files in the Folder

This step processes each `.mp3` or `.wav` file in the folder and saves the result as a `.json` transcript.

In [6]:
for file_name in os.listdir(AUDIO_DIR):
    if not file_name.lower().endswith((".mp3", ".wav")):
        continue

    input_path = os.path.join(AUDIO_DIR, file_name)
    print(f"\n🔊 Processing: {file_name}")

    # Convert to 16kHz WAV
    wav_path = convert_to_wav(input_path)
    print("📥 Converted to WAV.")

    # Transcribe
    start = time.time()
    segments = transcribe_file(model, wav_path, language=LANGUAGE)
    print(f"📝 Transcribed in {time.time() - start:.2f}s.")

    # Save as JSON
    output_path = os.path.join(OUTPUT_DIR, Path(file_name).stem + ".json")
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(segments, f, ensure_ascii=False, indent=2)
    print(f"✅ Saved to: {output_path}")


🔊 Processing: 59414b70-ed2b-43a9-9d61-09968aa13674_audio2.MP3
📥 Converted to WAV.
📝 Transcribed in 78.26s.
✅ Saved to: transcripts/59414b70-ed2b-43a9-9d61-09968aa13674_audio2.json

🔊 Processing: 7aed1441-334f-4556-b9b6-b69927b1ff4b_audio2.wav
📥 Converted to WAV.
📝 Transcribed in 60.49s.
✅ Saved to: transcripts/7aed1441-334f-4556-b9b6-b69927b1ff4b_audio2.json

🔊 Processing: c4110d90-63b5-4afc-94e2-7089ac7ed712_audio2.MP3
📥 Converted to WAV.
📝 Transcribed in 141.25s.
✅ Saved to: transcripts/c4110d90-63b5-4afc-94e2-7089ac7ed712_audio2.json

🔊 Processing: 59414b70-ed2b-43a9-9d61-09968aa13674_audio1.MP3
📥 Converted to WAV.
📝 Transcribed in 34.75s.
✅ Saved to: transcripts/59414b70-ed2b-43a9-9d61-09968aa13674_audio1.json

🔊 Processing: 7aed1441-334f-4556-b9b6-b69927b1ff4b_audio1.wav
📥 Converted to WAV.
📝 Transcribed in 32.98s.
✅ Saved to: transcripts/7aed1441-334f-4556-b9b6-b69927b1ff4b_audio1.json

🔊 Processing: c4110d90-63b5-4afc-94e2-7089ac7ed712_audio1.MP3
📥 Converted to WAV.
📝 Transcrib
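
With `large-v3` on the CPU, each file in the run above took roughly 30 to 140 seconds, so restarting the whole batch after an interruption is costly. One option is to skip inputs whose transcript already exists. This is a minimal sketch, assuming the `<stem>.json` naming used in Step 4; `pending_files` is a hypothetical helper.

```python
from pathlib import Path


def pending_files(audio_dir: str, output_dir: str) -> list[str]:
    """List audio file names in `audio_dir` with no `<stem>.json` in `output_dir` yet."""
    out = Path(output_dir)
    names = []
    for path in sorted(Path(audio_dir).iterdir()):
        if path.suffix.lower() not in {".mp3", ".wav"}:
            continue  # not an audio file this notebook handles
        if (out / f"{path.stem}.json").exists():
            continue  # already transcribed in an earlier run
        names.append(path.name)
    return names
```

The loop in Step 4 could then iterate over `pending_files(AUDIO_DIR, OUTPUT_DIR)` instead of `os.listdir(AUDIO_DIR)`.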

## ✅ Done!

You can now find your transcriptions in the `transcripts/` folder.
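
To inspect a result programmatically, a transcript can be loaded back with the standard `json` module. This is a minimal sketch, assuming the list of `start`/`end`/`text` segments written in Step 4; `load_transcript`, `full_text`, and `speech_seconds` are illustrative names.

```python
import json


def load_transcript(path: str) -> list[dict]:
    """Read one transcript file written by the batch loop."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def full_text(segments: list[dict]) -> str:
    """Join all segment texts into a single string."""
    return " ".join(seg["text"] for seg in segments)


def speech_seconds(segments: list[dict]) -> float:
    """Sum the durations of all segments, in seconds."""
    return sum(seg["end"] - seg["start"] for seg in segments)
```

For example, `full_text(load_transcript(path))` returns the whole transcript as one string, where `path` points at one of the generated files.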

Each `.json` file will look like:

```json
[
  {
    "start": 0.0,
    "end": 2.56,
    "text": "Hello, how can I help you?"
  },
  ...
]
```