## How to Use the WhisperFormer Model: Step-by-Step Guide

This notebook runs a trained WhisperFormer model on your audio files and saves the detected calls as `.json` files. Optionally, results can be converted to Raven selection tables.

**Always run the cells in order from top to bottom!**

### Step 1: Install Dependencies

You only need to run this cell the first time you use the notebook. Installation may take a few minutes.

In [None]:
# %pip installs into the environment of the running kernel
%pip install numpy scipy torch transformers librosa pandas

### Step 2: Import Required Modules

In [None]:
import logging
import json
import os
import torch

from utils import infer

### Step 3: Set Paths and Parameters

Before running this cell, place your `.wav` files in the `audios/` folder next to this notebook.

You also need:
- **checkpoint file** (`.pth`) -- the trained WhisperFormer model
- **whisper_config/** folder -- must contain `config.json` and `preprocessor_config.json` from the Whisper model used during training (copy from `whisper_models/whisper_base` or `whisper_models/whisper_large` in the main repository)

In [None]:
logging.basicConfig(level=logging.INFO)
PATH = os.getcwd()

# --- Paths (adjust if needed) ---
DATA_DIR = os.path.join(PATH, "audios")
CHECKPOINT_PATH = os.path.join(PATH, "checkpoint.pth")
WHISPER_CONFIG_PATH = os.path.join(PATH, "whisper_config")
OUTPUT_DIR = os.path.join(PATH, "jsons")

# --- Inference parameters ---
THRESHOLD = 0.35        # minimum confidence score to keep a prediction
IOU_THRESHOLD = 0.4     # IoU threshold for non-maximum suppression
NUM_RUNS = 3            # number of offset runs (1 = fast, 3 = more robust)
OVERLAP_TOLERANCE = 0.1 # IoU threshold for consolidating predictions across runs
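TOTAL_SPEC_COLUMNS = 3000  # spectrogram columns (time frames) per model input
BATCH_SIZE = 4             # inputs processed per batch; lower it if memory runs out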
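
To make `THRESHOLD` and `IOU_THRESHOLD` more concrete: for two detections on the time axis, IoU is the length of their overlap divided by the length of their union. The block below is a minimal sketch of temporal IoU and greedy non-maximum suppression. The names `interval_iou` and `nms_1d` are illustrative and are not taken from `utils`; the implementation inside `infer` may differ, for example in whether a pair exactly at the threshold is suppressed.

```python
def interval_iou(a, b):
    """IoU of two (onset, offset) intervals given in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(intervals, scores, iou_threshold=0.4):
    """Greedy NMS: visit intervals by descending score and keep one only
    if its IoU with every already-kept interval is <= iou_threshold."""
    order = sorted(range(len(intervals)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(interval_iou(intervals[i], intervals[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return sorted(kept)


# Toy detections (made-up values): the second overlaps the first heavily.
intervals = [(0.0, 1.0), (0.1, 1.1), (2.0, 3.0)]
scores = [0.9, 0.6, 0.5]
kept = nms_1d(intervals, scores, iou_threshold=0.4)
print(kept)
```

In this toy case the first two intervals have an IoU of about 0.82, so only the higher-scoring one survives. A lower `IOU_THRESHOLD` suppresses more aggressively; a higher one keeps more overlapping detections.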

### Step 4: Verify Paths

Run this cell to check that all required paths exist.

In [None]:
all_ok = True

print("Checkpoint:", CHECKPOINT_PATH)
if os.path.exists(CHECKPOINT_PATH):
    size_mb = os.path.getsize(CHECKPOINT_PATH) / 1e6
    print(f"  OK ({size_mb:.0f} MB)")
else:
    print("  MISSING -- please provide a .pth checkpoint file")
    all_ok = False

print("Whisper config:", WHISPER_CONFIG_PATH)
if os.path.isdir(WHISPER_CONFIG_PATH):
    contents = os.listdir(WHISPER_CONFIG_PATH)
    print(f"  OK (files: {contents})")
    for needed in ["config.json", "preprocessor_config.json"]:
        if needed not in contents:
            print(f"  WARNING: {needed} is missing in whisper_config/")
            all_ok = False
else:
    print("  MISSING -- copy from whisper_models/whisper_base or whisper_large")
    all_ok = False

print("Audio folder:", DATA_DIR)
if os.path.isdir(DATA_DIR):
    wav_files = [f for f in os.listdir(DATA_DIR) if f.lower().endswith(".wav")]
    print(f"  OK ({len(wav_files)} WAV file(s) found)")
    if not wav_files:
        print("  WARNING: no .wav files found in audios/")
        all_ok = False
else:
    print("  MISSING -- create an 'audios' folder and place your .wav files there")
    all_ok = False

print()
print("Device:", "cuda" if torch.cuda.is_available() else "cpu")
print()
if all_ok:
    print("Everything looks good! Proceed to Step 5.")
else:
    print("Please fix the issues above before running inference.")

### Step 5: Run Inference

This step processes all `.wav` files in `audios/` and saves the predictions as `.json` files in `jsons/`.

Each file is run `NUM_RUNS` times with different time offsets, and the results are consolidated for more robust predictions.

In [None]:
results = infer(
    data_dir=DATA_DIR,
    checkpoint_path=CHECKPOINT_PATH,
    whisper_config_path=WHISPER_CONFIG_PATH,
    output_dir=OUTPUT_DIR,
    threshold=THRESHOLD,
    iou_threshold=IOU_THRESHOLD,
    total_spec_columns=TOTAL_SPEC_COLUMNS,
    batch_size=BATCH_SIZE,
    num_runs=NUM_RUNS,
    overlap_tolerance=OVERLAP_TOLERANCE,
)
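
How `infer` chooses its time offsets is not documented in this notebook. As a rough illustration only, the sketch below shows one plausible scheme: each run tiles the recording with fixed-length windows, and run `r` shifts its window grid by `r * window / num_runs`. The function name `window_starts` and the 30-second window length are assumptions made for this example. The idea is that a call cut by a window boundary in one run may fall entirely inside a window in another.

```python
def window_starts(duration_s, window_s=30.0, num_runs=3):
    """Return one list of window start times (seconds) per run.

    Hypothetical scheme: run r starts its grid at r * window_s / num_runs.
    """
    runs = []
    for r in range(num_runs):
        t = r * window_s / num_runs  # offset of this run's first window
        starts = []
        while t < duration_s:
            starts.append(t)
            t += window_s
        runs.append(starts)
    return runs


# A 70-second recording, three runs.
runs = window_starts(70.0, window_s=30.0, num_runs=3)
for r, starts in enumerate(runs):
    print(f"run {r}: {starts}")
```

With `NUM_RUNS = 1` there is a single grid, which is faster but offers no second look at calls near window boundaries.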

### Step 6: Results Summary

Overview of the detected calls per file:

In [None]:
total = 0
for filename, preds in results.items():
    n = len(preds["onset"])
    total += n
    clusters = {}
    for c in preds["cluster"]:
        clusters[c] = clusters.get(c, 0) + 1
    cluster_str = ", ".join(f"{k}: {v}" for k, v in sorted(clusters.items()))
    print(f"  {filename}: {n} predictions ({cluster_str})")

print(f"\nTotal: {total} predictions across {len(results)} file(s)")
print(f"Results saved to: {OUTPUT_DIR}")
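
The same per-file counts can also be gathered with `collections.Counter`, which makes it easy to add totals per cluster across all files. This is a small sketch on made-up data shaped like the `results` dictionary used above (parallel lists under `"onset"` and `"cluster"`); `summarize` is an illustrative helper, not part of `utils`.

```python
from collections import Counter


def summarize(results):
    """Map each filename to a Counter of its cluster labels."""
    return {name: Counter(preds["cluster"]) for name, preds in results.items()}


# Toy data in the shape the summary loop expects (values are made up).
example = {
    "a.wav": {"onset": [0.5, 2.0, 7.1], "cluster": ["x", "y", "x"]},
    "b.wav": {"onset": [], "cluster": []},
}

summary = summarize(example)

# Add the per-file counters together to get totals per cluster.
overall = Counter()
for counts in summary.values():
    overall.update(counts)

for name, counts in summary.items():
    print(name, sum(counts.values()), dict(counts))
print("all files:", dict(overall))
```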

### Step 7: Convert to Raven Selection Tables (Optional)

To visualize the results in Raven, convert the `.json` files to `.txt` selection tables.

In [None]:
from json_to_raven import process_folder

JSON_DIR = OUTPUT_DIR  # folder written in Step 5
RAVEN_DIR = os.path.join(PATH, "raven")

process_folder(JSON_DIR, RAVEN_DIR)

The `.txt` selection tables can now be found in the `raven/` folder. Open them in Raven Pro alongside the corresponding audio files to visualize the detected calls.
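
For orientation, a Raven selection table is a tab-delimited text file with one row per selection and standard columns such as `Begin Time (s)` and `End Time (s)`. The sketch below writes such a table from a prediction dictionary. It is not the code in `json_to_raven`. It assumes an `"offset"` list alongside `"onset"` and `"cluster"`, uses placeholder frequency bounds, and adds an `Annotation` column for the cluster label; all of these are assumptions for illustration.

```python
import csv
import io

RAVEN_COLUMNS = [
    "Selection", "View", "Channel",
    "Begin Time (s)", "End Time (s)",
    "Low Freq (Hz)", "High Freq (Hz)",
    "Annotation",  # extra column carrying the cluster label
]


def write_raven_table(preds, fh, low_hz=0.0, high_hz=8000.0):
    """Write predictions as a tab-delimited selection table to file object fh.

    Assumes preds has parallel lists "onset", "offset" and "cluster";
    low_hz and high_hz are placeholder frequency bounds.
    """
    writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
    writer.writerow(RAVEN_COLUMNS)
    rows = zip(preds["onset"], preds["offset"], preds["cluster"])
    for i, (onset, offset, label) in enumerate(rows, start=1):
        writer.writerow([i, "Spectrogram 1", 1,
                         f"{onset:.3f}", f"{offset:.3f}",
                         low_hz, high_hz, label])


# Made-up predictions; written to an in-memory buffer instead of a file.
preds = {"onset": [0.5, 2.0], "offset": [0.9, 2.6], "cluster": ["x", "y"]}
buf = io.StringIO()
write_raven_table(preds, buf)
table_text = buf.getvalue()
print(table_text)
```

To write a real file, pass an open text file handle, for example `open(path, "w", newline="")`, in place of the buffer.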