<a href="https://colab.research.google.com/github/helitb/iris-poc/blob/main/notebooks/audio_pipeline_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# IRIS Audio Pipeline (Colab GPU)

Notebook to exercise the audio pipeline on a Colab GPU runtime. It installs GPU-ready dependencies, loads the IRIS code, and runs the pipeline over one window of a sample WAV file.

## Prep
- Runtime ➜ Change runtime type ➜ GPU
- Place this repo at `/content/iris-poc` (clone or upload). The pipeline code lives in `src/audio` and sample audio in `data/audio`.

In [1]:
# Check GPU is visible
!nvidia-smi
import torch
print('torch', torch.__version__)
print('cuda available', torch.cuda.is_available())

Sun Dec 28 17:07:42 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   55C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

Install dependencies. The CUDA wheel variant is picked automatically from the `nvidia-smi` output.

In [2]:
import subprocess
CUDA_VARIANT = 'cu121'  # fallback
try:
    smi = subprocess.check_output(['nvidia-smi']).decode()
    if 'CUDA Version: 11.' in smi:
        CUDA_VARIANT = 'cu118'
    elif 'CUDA Version: 12.' in smi:
        CUDA_VARIANT = 'cu121'
except Exception as exc:  # noqa: BLE001
    print('Could not auto-detect CUDA version, defaulting to cu121:', exc)

print(f'Using CUDA wheels: {CUDA_VARIANT}')


Using CUDA wheels: cu121
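
The detection above relies on substring checks against the `nvidia-smi` banner. As a minimal sketch, the same mapping can be written as a pure function that parses the version with a regular expression, which makes it easy to test without a GPU. The function name `cuda_wheel_variant` is hypothetical, and the mapping (11.x to `cu118`, 12.x to `cu121`, otherwise the fallback) simply mirrors the cell above.

```python
import re


def cuda_wheel_variant(smi_output, default='cu121'):
    """Map the 'CUDA Version: X.Y' banner from nvidia-smi to a wheel tag.

    Hypothetical helper: the mapping mirrors the notebook cell and
    falls back to `default` when no version can be parsed.
    """
    match = re.search(r'CUDA Version:\s*(\d+)\.(\d+)', smi_output)
    if match is None:
        return default
    major = int(match.group(1))
    if major == 11:
        return 'cu118'
    if major == 12:
        return 'cu121'
    return default


banner = '| NVIDIA-SMI 550.54.15   Driver Version: 550.54.15   CUDA Version: 12.4 |'
print(cuda_wheel_variant(banner))
```

In the notebook, the input would be the decoded output of `subprocess.check_output(['nvidia-smi'])`.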

In [7]:
!pip install -q --upgrade pip
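# Quote version specifiers: an unquoted > is treated by the shell as output redirection.
!pip install -q --extra-index-url https://download.pytorch.org/whl/$CUDA_VARIANT "torch>=2.0.0" "torchaudio>=2.0.0"
!pip install -q -f https://opennmt.net/CTranslate2/whl/$CUDA_VARIANT ctranslate2==4.3.1 faster-whisper==1.0.3
# librosa is needed by the resampling branch in the audio-loading cell.
!pip install -q silero-vad speechbrain praat-parselmouth sentence-transformers soundfile librosa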
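
Version specifiers such as `torch>=2.0.0` are fragile in shell commands because the shell reads an unquoted `>` as redirection. One way to sidestep shell parsing entirely is to build the pip command as an argument list and run it with `subprocess`. The sketch below assumes a hypothetical helper name, `pip_install_cmd`; only the pip flags already used in this notebook appear in it.

```python
import sys


def pip_install_cmd(packages, extra_index_url=None, find_links=None):
    """Build a pip command as an argument list (hypothetical helper).

    Each requirement is its own list element, so no shell ever sees
    the '>' or '=' characters in the version specifiers.
    """
    cmd = [sys.executable, '-m', 'pip', 'install', '-q']
    if extra_index_url:
        cmd += ['--extra-index-url', extra_index_url]
    if find_links:
        cmd += ['-f', find_links]
    return cmd + list(packages)


cmd = pip_install_cmd(
    ['torch>=2.0.0', 'torchaudio>=2.0.0'],
    extra_index_url='https://download.pytorch.org/whl/cu121',
)
print(cmd[1:])
# To actually install: import subprocess; subprocess.run(cmd, check=True)
```

Using `sys.executable -m pip` also ensures the packages land in the interpreter that the notebook kernel is running.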

**Get the code**
- If the repo is public, set `REPO_URL` and clone it over HTTPS.
- Otherwise upload the repo folder through the Colab file pane, or mount Drive, so that it ends up at `/content/iris-poc`.

In [19]:
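from getpass import getpass

# Prompt for a GitHub personal access token instead of hard-coding a credential.
# An SSH public key ('ssh-ed25519 ...') is not a token and cannot authenticate an HTTPS clone.
GITHUB_TOKEN = getpass('GitHub personal access token: ')
token_len = len(GITHUB_TOKEN)
print(f"git token size is {token_len}")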


In [25]:
from getpass import getpass

# HTTPS auth needs a GitHub personal access token; an SSH public key will not work here.
GITHUB_TOKEN = getpass('GitHub token (leave empty for a public repo): ')

REPO_URL = 'helitb/iris-poc.git'
if REPO_URL:
    !rm -rf /content/iris-poc
    # IPython interpolates {name} in ! commands. The shell-style ${name} form leaves a
    # stray '$' behind, which the shell then expands to an empty string and mangles the URL.
    auth = f'{GITHUB_TOKEN}@' if GITHUB_TOKEN else ''
    !git clone https://{auth}github.com/{REPO_URL} /content/iris-poc
else:
    print('Upload or mount the repo to /content/iris-poc (with src/audio and data/audio).')


In [8]:
REPO_URL = 'helitb/iris-poc.git'
if REPO_URL:
    !rm -rf /content/iris-poc
    # {REPO_URL} is interpolated by IPython, so the variable above is actually used.
    !git clone https://github.com/{REPO_URL}
else:
    print('Upload or mount the repo to /content/iris-poc (with src/audio and data/audio).')

Cloning into 'iris-poc'...
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 29 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (29/29), 6.41 MiB | 23.62 MiB/s, done.
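
For a private repo, the clone URL needs a token in the userinfo part, and characters such as spaces or slashes in that part would break the URL. The sketch below is one way to assemble it safely in Python. `github_clone_url` is a hypothetical helper, and the `https://<token>@github.com/<repo>` form is assumed to be the one the token-based clone cell is aiming for.

```python
from urllib.parse import quote


def github_clone_url(repo, token=None):
    """Return an HTTPS clone URL for `repo` ('owner/name.git').

    Hypothetical helper: when a token is given it is percent-encoded
    and placed before the host, otherwise the public URL is returned.
    """
    auth = f'{quote(token, safe="")}@' if token else ''
    return f'https://{auth}github.com/{repo}'


print(github_clone_url('helitb/iris-poc.git'))
```

The resulting string can be passed to `subprocess.run(['git', 'clone', url, '/content/iris-poc'], check=True)`. Avoid printing the URL when it contains a real token.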


In [9]:
from pathlib import Path
import sys

PROJECT_ROOT = Path('/content/iris-poc')
CODE_ROOT = PROJECT_ROOT / 'src' / 'audio'
if not CODE_ROOT.exists():
    raise FileNotFoundError(f'IRIS code not found at {CODE_ROOT}. Clone or upload the repo first (see previous cell).')

sys.path.append(str(CODE_ROOT))
print('Using code from', CODE_ROOT)

Using code from /content/iris-poc/src/audio
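
Re-running the cell above appends the same directory to `sys.path` each time. A small guard keeps the path list clean across reruns. This is a sketch with a hypothetical helper name, `ensure_on_path`, and the example directory is a placeholder rather than a path this notebook depends on.

```python
import sys
from pathlib import Path


def ensure_on_path(path):
    """Append `path` to sys.path only if it is not already present.

    Returns True when the entry was added, False when it already existed.
    """
    entry = str(Path(path))
    if entry in sys.path:
        return False
    sys.path.append(entry)
    return True


example_dir = '/tmp/iris-poc-example/src/audio'  # placeholder directory for illustration
first = ensure_on_path(example_dir)
second = ensure_on_path(example_dir)
print(first, second)
```

In the notebook this would be called as `ensure_on_path(CODE_ROOT)` in place of the bare `sys.path.append`.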


## Audio Pipeline

Load a sample WAV (replace with your own if you want).

In [10]:
from pathlib import Path
import soundfile as sf
import numpy as np

AUDIO_FILE = PROJECT_ROOT / 'data' / 'audio' / 'slice2_audio.wav'
if not AUDIO_FILE.exists():
    raise FileNotFoundError('Audio file not found. Point AUDIO_FILE to a WAV you uploaded.')

audio, sr = sf.read(AUDIO_FILE, dtype='float32')
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # mono

TARGET_SR = 16000
if sr != TARGET_SR:
    import librosa
    audio = librosa.resample(audio, orig_sr=sr, target_sr=TARGET_SR)
    sr = TARGET_SR

print(f'Loaded {AUDIO_FILE.name}: sr={sr}, shape={audio.shape}, duration={len(audio)/sr:.2f}s')


Loaded slice2_audio.wav: sr=16000, shape=(480011,), duration=30.00s
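
The loading cell performs two normalisation steps: averaging channels to mono and resampling to 16 kHz. As a rough, dependency-light sketch of those two steps, the function below uses only NumPy and linear interpolation. This is an assumption-laden stand-in, not what `librosa.resample` does (librosa uses band-limited resampling, which is preferable for real audio), and `to_mono_16k` is a hypothetical name.

```python
import numpy as np


def to_mono_16k(audio, sr, target_sr=16000):
    """Downmix to mono and linearly resample to `target_sr`.

    Crude illustration only: linear interpolation applies no
    anti-aliasing filter, unlike a proper resampler.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)  # average channels, shape (samples,)
    if sr == target_sr:
        return audio, sr
    n_out = int(round(len(audio) * target_sr / sr))
    t_in = np.arange(len(audio)) / sr       # input sample times in seconds
    t_out = np.arange(n_out) / target_sr    # output sample times in seconds
    resampled = np.interp(t_out, t_in, audio).astype(np.float32)
    return resampled, target_sr


stereo = np.ones((32000, 2), dtype=np.float32)  # 1 s of constant stereo at 32 kHz
mono, new_sr = to_mono_16k(stereo, 32000)
print(mono.shape, new_sr, f'{len(mono) / new_sr:.2f}s')
```

The duration printed by the notebook follows the same arithmetic: 480011 samples at 16000 Hz is about 30.00 s.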


Run the pipeline over the window. `diarization_backend` is set to `'simple'` below, a faster lightweight diarizer suited to a smoke test. Set `enable_prosody=True` to test prosody (adds compute).

In [11]:
from datetime import datetime
from pipeline import AudioPipeline, PipelineConfig

config = PipelineConfig(
    debug=True,
    diarization_backend='simple',  # use 'simple' for fastest smoke test
    asr_model='ivrit-v2-d4',            # Hebrew optimized faster-whisper
    enable_prosody=False,               # enable to exercise prosody extractor
    retain_transcripts=True,
)

pipeline = AudioPipeline(config)
result = pipeline.process_window(audio, datetime.now())

print(f'Processing time: {result.processing_time_ms} ms')
print(f'Utterances: {result.num_utterances}, Speakers detected: {result.speakers_detected}')
print('--- Sample events ---')
for evt in result.speech_events[:5]:
    print(evt)


AttributeError: module 'torchaudio' has no attribute 'list_audio_backends'
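
The `AttributeError` indicates that the installed `torchaudio` does not expose `list_audio_backends`, which something in the import chain calls. A plausible cause, stated here as an assumption rather than something this run confirms, is that the installed `torchaudio` is newer than one of the dependencies expects. A quick check such as the sketch below can confirm whether the attribute is present before deciding which package version to pin. `module_has` is a hypothetical helper.

```python
import importlib


def module_has(module_name, attr):
    """Report whether `module_name` exposes `attr`.

    Returns True or False when the module imports, and None when the
    module itself cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


# In the notebook: module_has('torchaudio', 'list_audio_backends')
print(module_has('os', 'getcwd'))
```

Printing `torchaudio.__version__` next to the result makes it easier to compare against the versions the other packages were built for.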

Optional: tabular view of events. The `text` column is populated only because `retain_transcripts=True` was set above.

In [None]:
import pandas as pd

rows = []
for evt in result.speech_events:
    rows.append({
        'timestamp': evt.timestamp,
        'speaker': getattr(evt.speaker, 'id', None),
        'duration_ms': evt.duration_ms,
        'text': getattr(evt, 'transcription', None),
        'complexity': getattr(evt, 'complexity', None),
        'gap_before_ms': evt.gap_before_ms,
    })

df = pd.DataFrame(rows)
df.head()
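
The row-building loop above uses `getattr` with a default so that missing speakers, transcripts, or complexity scores become `None` instead of raising. The sketch below shows that behaviour in isolation. `Speaker` and `SpeechEvent` here are hypothetical stand-ins with only the fields the loop reads; they are not the pipeline's real classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Speaker:  # hypothetical stand-in for the pipeline's speaker object
    id: str


@dataclass
class SpeechEvent:  # hypothetical stand-in with only the fields used below
    timestamp: float
    speaker: Optional[Speaker]
    duration_ms: int
    gap_before_ms: int
    transcription: Optional[str] = None


def events_to_rows(events):
    """Flatten events into dicts, tolerating missing optional attributes."""
    return [
        {
            'timestamp': evt.timestamp,
            'speaker': getattr(evt.speaker, 'id', None),
            'duration_ms': evt.duration_ms,
            'text': getattr(evt, 'transcription', None),
            'complexity': getattr(evt, 'complexity', None),
            'gap_before_ms': evt.gap_before_ms,
        }
        for evt in events
    ]


events = [
    SpeechEvent(0.0, Speaker('S1'), 1200, 0, 'shalom'),
    SpeechEvent(1.5, None, 800, 300),
]
rows = events_to_rows(events)
print(rows[1])
```

The resulting list can be passed directly to `pd.DataFrame(rows)`.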
