## How to Use the WhisperSeg Model - Step-by-Step Guide

This notebook runs a trained WhisperSeg model on your audio files and saves the detected calls as `.json` files. Optionally, results can be converted to Raven selection tables.

**Always run the cells in order from top to bottom!**

### Step 1: Install Dependencies

You only need to run this cell the first time you use the notebook. Installation may take a while.

In [None]:
!pip install numpy scipy torch transformers huggingface_hub ctranslate2 librosa pandas
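
If you are unsure whether the installation succeeded, a quick check like the one below can help. This is a minimal sketch, not part of the original notebook: the helper name `find_missing` is hypothetical, and it assumes the import name of each package matches the name used in the `pip install` command above.

```python
import importlib.util

# Import names for the packages installed in Step 1
# (assumed to match the pip package names).
REQUIRED = [
    "numpy", "scipy", "torch", "transformers",
    "huggingface_hub", "ctranslate2", "librosa", "pandas",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If any package is reported as missing, rerun the install cell and restart the kernel before continuing.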

### Step 2: Import Required Modules

In [None]:
import logging
import os

from utils import infer

### Step 3: Set Paths and Parameters

Before running: place your `.wav` files in the `audios/` folder next to this notebook.

You also need a **model checkpoint folder** (CT2 format). Two options:
- `final_checkpoint_20251116_163404_ct2` -- detects only **high quality** calls (default)
- `final_checkpoint_20251113_145510_ct2` -- detects **all quality** classes

In [None]:
logging.basicConfig(level=logging.INFO)
PATH = os.getcwd()

# --- Paths (adjust if needed) ---
DATA_DIR = os.path.join(PATH, "audios")
MODEL_PATH = os.path.join(PATH, "final_checkpoint_20251116_163404_ct2")
OUTPUT_DIR = os.path.join(PATH, "jsons")

# To detect ALL quality classes instead, uncomment the line below:
# MODEL_PATH = os.path.join(PATH, "final_checkpoint_20251113_145510_ct2")

# --- Inference parameters ---
MIN_FREQUENCY = 0         # minimum frequency for spectrogram (Hz)
SPEC_TIME_STEP = 0.0025   # time step for spectrogram (seconds)
MIN_SEGMENT_LENGTH = 0.0195  # minimum segment length (seconds)
EPS = 0.02                # DBSCAN epsilon for clustering
NUM_TRIALS = 3            # number of segmentation trials
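
To get a feel for how the two time-based parameters relate, you can express the minimum segment length in spectrogram frames. This is plain arithmetic on the default values above; it makes no claim about how `infer` uses these parameters internally.

```python
# Default values copied from the parameter cell above.
SPEC_TIME_STEP = 0.0025      # seconds per spectrogram frame
MIN_SEGMENT_LENGTH = 0.0195  # seconds

# How many spectrogram frames does the shortest allowed segment span?
frames = MIN_SEGMENT_LENGTH / SPEC_TIME_STEP
print(f"Minimum segment length spans about {frames:.1f} spectrogram frames")
```

With the defaults, the shortest allowed segment covers just under 8 frames.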

### Step 4: Verify Paths

Run this cell to check that all required paths exist.

In [None]:
all_ok = True

print("Model:", MODEL_PATH)
if os.path.isdir(MODEL_PATH):
    contents = os.listdir(MODEL_PATH)
    has_model = "model.bin" in contents  # CT2 checkpoints store their weights in model.bin
    print(f"  OK (directory with {len(contents)} files, model.bin: {'found' if has_model else 'MISSING'})")
    if not has_model:
        all_ok = False
else:
    print("  MISSING -- please provide the CT2 model checkpoint folder")
    all_ok = False

print("Audio folder:", DATA_DIR)
if os.path.isdir(DATA_DIR):
    wav_files = [f for f in os.listdir(DATA_DIR) if f.lower().endswith(".wav")]
    if wav_files:
        print(f"  OK ({len(wav_files)} WAV file(s) found)")
    else:
        # Report an empty folder as a problem instead of printing "OK" first.
        print("  WARNING: the folder exists but contains no .wav files")
        all_ok = False
else:
    print("  MISSING -- create an 'audios' folder and place your .wav files there")
    all_ok = False

print()
if all_ok:
    print("Everything looks good! Proceed to Step 5.")
else:
    print("Please fix the issues above before running inference.")

### Step 5: Run Inference

This cell processes every `.wav` file in `audios/` and saves the predictions as `.json` files in `jsons/`. The returned `results` maps each filename to its predictions and is used in Step 6.

In [None]:
results = infer(
    data_dir=DATA_DIR,
    model_path=MODEL_PATH,
    output_dir=OUTPUT_DIR,
    min_frequency=MIN_FREQUENCY,
    spec_time_step=SPEC_TIME_STEP,
    min_segment_length=MIN_SEGMENT_LENGTH,
    eps=EPS,
    num_trials=NUM_TRIALS,
)
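
To look at a single output file, something like the following sketch may be useful. It assumes each `.json` file holds parallel lists under the keys `onset`, `offset`, and `cluster`; `onset` and `cluster` appear in Step 6, while `offset` is an assumption. The helper `load_predictions` is hypothetical, and the demo values are made up so that the example runs on its own.

```python
import json
import os
import tempfile


def load_predictions(json_path):
    """Load one prediction file as a list of (onset, offset, cluster) tuples."""
    with open(json_path, "r", encoding="utf-8") as fh:
        preds = json.load(fh)
    onsets = preds.get("onset", [])
    offsets = preds.get("offset", [])    # assumed key, not confirmed by the notebook
    clusters = preds.get("cluster", [])
    return list(zip(onsets, offsets, clusters))


# Demo with made-up values; replace demo_path with a real file from jsons/.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "example.json")
    with open(demo_path, "w", encoding="utf-8") as fh:
        json.dump(
            {"onset": [0.10, 1.25], "offset": [0.35, 1.60], "cluster": ["call", "call"]},
            fh,
        )
    rows = load_predictions(demo_path)

for onset, offset, cluster in rows:
    print(f"{cluster}: {onset:.3f}s to {offset:.3f}s ({offset - onset:.3f}s long)")
```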

### Step 6: Results Summary

Overview of the detected calls per file:

In [None]:
total = 0
for filename, preds in results.items():
    n = len(preds.get("onset", []))
    total += n
    clusters = {}
    for c in preds.get("cluster", []):
        clusters[c] = clusters.get(c, 0) + 1
    cluster_str = ", ".join(f"{k}: {v}" for k, v in sorted(clusters.items()))
    print(f"  {filename}: {n} predictions ({cluster_str})")

print(f"\nTotal: {total} predictions across {len(results)} file(s)")
print(f"Results saved to: {OUTPUT_DIR}")

### Step 7: Convert to Raven Selection Tables (Optional)

To visualize the results in Raven, convert the `.json` files to `.txt` selection tables.

In [None]:
from json_to_raven import process_folder

JSON_DIR = OUTPUT_DIR  # reuse the inference output folder set in Step 3
RAVEN_DIR = os.path.join(PATH, "raven")

process_folder(JSON_DIR, RAVEN_DIR)

The `.txt` selection tables can now be found in the `raven/` folder. Open them in Raven Pro alongside the corresponding audio files to visualize the detected calls.
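
For orientation, a Raven selection table is a tab-delimited text file with one row per selection. The sketch below builds such a table from onset/offset lists. It is an illustration only, not the implementation of `process_folder`: the function `to_raven_table`, the `Annotation` column, and the placeholder frequency bounds are assumptions, and the demo values are made up.

```python
import csv
import io

# Standard Raven columns, plus a hypothetical "Annotation" column for the cluster label.
RAVEN_COLUMNS = [
    "Selection", "View", "Channel",
    "Begin Time (s)", "End Time (s)",
    "Low Freq (Hz)", "High Freq (Hz)",
    "Annotation",
]


def to_raven_table(onsets, offsets, clusters, low_freq=0.0, high_freq=8000.0):
    """Return a tab-delimited selection table as a string.

    low_freq and high_freq are placeholder bounds (assumed values),
    since the predictions sketched here carry only time information.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(RAVEN_COLUMNS)
    for i, (on, off, label) in enumerate(zip(onsets, offsets, clusters), start=1):
        writer.writerow([
            i, "Spectrogram 1", 1,
            f"{on:.4f}", f"{off:.4f}",
            f"{low_freq:.1f}", f"{high_freq:.1f}",
            label,
        ])
    return buf.getvalue()


# Demo with made-up values.
table = to_raven_table([0.10, 1.25], [0.35, 1.60], ["call", "call"])
print(table)
```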