# IRIS Audio Pipeline (Colab GPU)

Notebook for exercising the IRIS audio pipeline on a Colab GPU runtime. It installs GPU-enabled dependencies, loads the IRIS code, and runs the pipeline over a sample WAV file processed as a single window.

**Prep**
- Runtime ➜ Change runtime type ➜ GPU
- Place this repo at `/content/IRIS` (clone or upload). The pipeline code lives in `poc/src` and sample audio in `poc/data/audio`.

In [None]:
# Check that the GPU is visible
!nvidia-smi
import torch
print('torch', torch.__version__)
print('cuda available', torch.cuda.is_available())

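Install dependencies. The CUDA wheel variant is picked automatically from the CUDA version reported by `nvidia-smi`. If pip replaces the preinstalled torch, restart the runtime before continuing: the previous cell already imported torch, and the old version stays loaded until the restart.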

In [None]:
import subprocess

# Pick the wheel variant from the CUDA version the driver reports in the nvidia-smi header.
CUDA_VARIANT = 'cu121'  # default if detection fails
try:
    smi = subprocess.check_output(['nvidia-smi']).decode()
    if 'CUDA Version: 11.' in smi:
        CUDA_VARIANT = 'cu118'
    elif 'CUDA Version: 12.' in smi:
        CUDA_VARIANT = 'cu121'
except Exception as exc:  # noqa: BLE001
    print('Could not auto-detect CUDA version, defaulting to cu121:', exc)

print(f'Using CUDA wheels: {CUDA_VARIANT}')
!pip install -q --upgrade pip
!pip install -q --extra-index-url https://download.pytorch.org/whl/$CUDA_VARIANT torch torchaudio
!pip install -q -f https://opennmt.net/CTranslate2/whl/$CUDA_VARIANT ctranslate2==4.3.1 faster-whisper==1.0.3
!pip install -q silero-vad speechbrain praat-parselmouth sentence-transformers soundfile librosa
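The string checks above can be made a little more robust by parsing the version number with a regular expression. A minimal sketch, assuming the usual `CUDA Version: X.Y` header line; `pick_cuda_variant` and the `_WHEEL_TAGS` mapping are hypothetical names, not part of IRIS. Note that this number is the highest CUDA version the installed driver supports, not the version of a locally installed toolkit, which is the constraint that matters for prebuilt wheels.

```python
import re
from typing import Optional

# Hypothetical mapping from the driver-reported CUDA major version to a wheel tag.
# Adjust the tags to match the wheel indexes you actually install from.
_WHEEL_TAGS = {11: "cu118", 12: "cu121"}


def pick_cuda_variant(smi_output: Optional[str], default: str = "cu121") -> str:
    """Return a wheel tag based on the 'CUDA Version: X.Y' text in nvidia-smi output."""
    if not smi_output:
        return default
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return default  # no version line found (e.g. no GPU or unexpected format)
    major = int(match.group(1))
    return _WHEEL_TAGS.get(major, default)


print(pick_cuda_variant("| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2 |"))
```

In the install cell, `CUDA_VARIANT = pick_cuda_variant(smi)` would replace the `if`/`elif` chain; unknown major versions fall back to the default instead of silently keeping a stale value.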

**Get the code**
- If the repo is public, set `REPO_URL` and run the next cell to clone it.
- Otherwise, upload or drag the `IRIS` folder into the Colab file pane, or mount Google Drive and copy (or symlink) the repo to `/content/IRIS`.

In [None]:
REPO_URL = ''  # e.g., 'https://github.com/.../IRIS.git'
if REPO_URL:
    !rm -rf /content/IRIS
    !git clone $REPO_URL /content/IRIS
else:
    print('Upload or mount the IRIS repo to /content/IRIS (with poc/src and poc/data/audio).')

In [None]:
from pathlib import Path
import sys

PROJECT_ROOT = Path('/content/IRIS')
CODE_ROOT = PROJECT_ROOT / 'poc' / 'src'
if not CODE_ROOT.exists():
    raise FileNotFoundError('IRIS code not found. Ensure /content/IRIS/poc/src exists (see previous cell).')

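# Prepend so the local packages (e.g. `audio`) take precedence over any installed package
# with the same name; the guard avoids duplicate entries when the cell is re-run.
if str(CODE_ROOT) not in sys.path:
    sys.path.insert(0, str(CODE_ROOT))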
print('Using code from', CODE_ROOT)
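Before going further it can help to check the whole expected layout at once rather than failing on the first missing directory. A small sketch; `missing_paths` and `EXPECTED_PATHS` are hypothetical helpers, and the two paths are simply the ones this notebook uses.

```python
from pathlib import Path
from typing import Iterable, List

# Paths this notebook relies on, relative to the repo root (taken from the cells in this notebook).
EXPECTED_PATHS = ("poc/src", "poc/data/audio")


def missing_paths(root: Path, expected: Iterable[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    return [rel for rel in expected if not (root / rel).exists()]


missing = missing_paths(Path("/content/IRIS"))
print("All expected paths present" if not missing else f"Missing: {missing}")
```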


Load a sample WAV, downmix it to mono, and resample it to 16 kHz. To use your own recording, point `AUDIO_FILE` at it.

In [None]:
from pathlib import Path
import soundfile as sf
import numpy as np

AUDIO_FILE = PROJECT_ROOT / 'poc' / 'data' / 'audio' / 'slice2_audio.wav'
if not AUDIO_FILE.exists():
    raise FileNotFoundError(f'Audio file not found: {AUDIO_FILE}. Point AUDIO_FILE to a WAV you uploaded.')

audio, sr = sf.read(str(AUDIO_FILE), dtype='float32')
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multichannel audio to mono

TARGET_SR = 16000  # resample to 16 kHz
if sr != TARGET_SR:
    import librosa
    audio = librosa.resample(audio, orig_sr=sr, target_sr=TARGET_SR).astype(np.float32)
    sr = TARGET_SR

print(f'Loaded {AUDIO_FILE.name}: sr={sr}, shape={audio.shape}, duration={len(audio)/sr:.2f}s')
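The pipeline cell further down passes the whole file to `process_window` as one window. For longer recordings you may prefer to feed fixed-length slices. A minimal sketch using plain NumPy slicing; `iter_windows`, `window_s`, and `hop_s` are hypothetical, and the 30 s default is an arbitrary choice, not a window length the IRIS pipeline is known to require.

```python
from typing import Iterator, Tuple

import numpy as np


def iter_windows(
    audio: np.ndarray, sr: int, window_s: float = 30.0, hop_s: float = 30.0
) -> Iterator[Tuple[float, np.ndarray]]:
    """Yield (start_seconds, samples) for consecutive windows of a mono signal.

    hop_s == window_s gives back-to-back windows; hop_s < window_s gives overlap.
    The last window may be shorter than window_s; empty input yields nothing.
    """
    if window_s <= 0 or hop_s <= 0:
        raise ValueError("window_s and hop_s must be positive")
    win = int(round(window_s * sr))
    hop = int(round(hop_s * sr))
    for start in range(0, len(audio), hop):
        yield start / sr, audio[start:start + win]
        if start + win >= len(audio):
            break  # this window already reaches the end of the signal


# Demo on a synthetic signal: 70 s of silence at 16 kHz.
demo = np.zeros(16000 * 70, dtype=np.float32)
for start_s, chunk in iter_windows(demo, 16000):
    print(f"{start_s:6.1f}s  {len(chunk) / 16000:.1f}s")
```

With the loaded file, iterate over `iter_windows(audio, sr)` and call `pipeline.process_window(chunk, ...)` per chunk. What the second argument should be for each chunk (for example a start time offset by `start_s`) is an assumption to check against the `AudioPipeline` code.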


Run the pipeline over the loaded audio as a single window. For a faster smoke test, set `diarization_backend='simple'` to use the lightweight diarizer. Set `enable_prosody=True` to exercise the prosody extractor (adds compute).

In [None]:
from datetime import datetime
from audio.pipeline import AudioPipeline, PipelineConfig

config = PipelineConfig(
    debug=True,
    diarization_backend='speechbrain',  # use 'simple' for fastest smoke test
    asr_model='ivrit-v2-d4',            # Hebrew-optimized faster-whisper model
    enable_prosody=False,               # enable to exercise prosody extractor
    retain_transcripts=True,
)

pipeline = AudioPipeline(config)
result = pipeline.process_window(audio, datetime.now())

print(f'Processing time: {result.processing_time_ms} ms')
print(f'Utterances: {result.num_utterances}, Speakers detected: {result.speakers_detected}')
print('--- Sample events ---')
for evt in result.speech_events[:5]:
    print(evt)
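`result.processing_time_ms` is easier to judge relative to the length of the audio. One common way to express that is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. A small sketch; `real_time_factor` is a hypothetical helper, and it assumes `processing_time_ms` covers the whole window.

```python
def real_time_factor(processing_time_ms: float, num_samples: int, sample_rate: int) -> float:
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_ms = 1000.0 * num_samples / sample_rate
    return processing_time_ms / audio_ms


# Example: 1.5 s of processing for 3 s (48,000 samples) of 16 kHz audio.
print(f"RTF = {real_time_factor(1500, 48_000, 16_000):.2f}")  # prints "RTF = 0.50"
```

In the notebook this would be `real_time_factor(result.processing_time_ms, len(audio), sr)`. If models are loaded lazily on the first call, a second run is likely to give a more representative number.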


Optional: tabular view of the events. The `text` column is populated only because `retain_transcripts=True` was set above.

In [None]:
import pandas as pd

rows = []
for evt in result.speech_events:
    rows.append({
        'timestamp': evt.timestamp,
        'speaker': getattr(evt.speaker, 'id', None),
        'duration_ms': evt.duration_ms,
        'text': getattr(evt, 'transcription', None),
        'complexity': getattr(evt, 'complexity', None),
        'gap_before_ms': evt.gap_before_ms,
    })

df = pd.DataFrame(rows)
df.head()
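The same table can be aggregated per speaker to see how speaking time is distributed. A sketch using pandas named aggregation; `speaker_summary` is a hypothetical helper that relies only on the `speaker` and `duration_ms` columns built above. Events with no speaker id are kept as their own (NaN) group, and `count` skips missing durations.

```python
import pandas as pd


def speaker_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Utterance count, total speaking time, and share of speaking time per speaker."""
    summary = (
        df.groupby("speaker", dropna=False)
        .agg(utterances=("duration_ms", "count"), total_ms=("duration_ms", "sum"))
        .sort_values("total_ms", ascending=False)
    )
    summary["share"] = summary["total_ms"] / summary["total_ms"].sum()
    return summary


# Demo on a tiny synthetic table; in the notebook, call speaker_summary(df).
demo = pd.DataFrame({"speaker": ["A", "B", "A"], "duration_ms": [1000, 500, 1500]})
print(speaker_summary(demo))
```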
