## How to Use the Final WhisperFormer Model: A Step-by-Step Guide

Run the cells in order, from top to bottom. Later cells depend on the imports and variables defined in earlier ones.

### Step 1: Install Dependencies

You only need to run this cell the first time you use the notebook. The installation may take a while.

In [None]:
%pip install numpy scipy torch transformers librosa pandas matplotlib

### Step 2: Import Required Modules

In [None]:
import logging
import os
import torch

from utils import infer

### Step 3: Set Paths and Parameters

**Required folder structure:**
```
whisperformer/
├── whisperformer_application.ipynb   (this notebook)
├── utils.py
├── model.py
├── json_to_raven.py
├── audios/                           (place your .wav files here)
├── whisper_config/                   (Whisper config directory, contains config.json etc.)
└── checkpoint.pth                    (your trained WhisperFormer checkpoint)
```

**Important:** The `whisper_config` folder must contain the Whisper model config files (at minimum `config.json`). You can find these in the `whisper_models/whisper_base` or `whisper_models/whisper_large` directory of the main repository.

In [None]:
logging.basicConfig(level=logging.INFO)
PATH = os.getcwd()

DATA_DIR = os.path.join(PATH, "audios")

CHECKPOINT_PATH = os.path.join(PATH, "checkpoint.pth")

WHISPER_CONFIG_PATH = os.path.join(PATH, "whisper_config")

OUTPUT_DIR = os.path.join(PATH, "jsons")

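# Inference settings; all four are passed to infer() in Step 5.
THRESHOLD = 0.35           # detection threshold
IOU_THRESHOLD = 0.4        # IoU (intersection over union) threshold
TOTAL_SPEC_COLUMNS = 3000  # number of spectrogram columns
BATCH_SIZE = 4             # batch size; lower this if you run out of memory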
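
The notebook does not explain the numeric settings, so the following is a minimal sketch of how two of them could be read, not a description of what `infer` actually does. It assumes that `TOTAL_SPEC_COLUMNS` counts Whisper log-mel frames (Whisper's front end uses a 10 ms hop, that is, 160 samples at 16 kHz) and that `IOU_THRESHOLD` compares predicted time segments by intersection over union. The helper names `columns_to_seconds` and `temporal_iou` are hypothetical and do not come from `utils.py`.

```python
# Assumption: Whisper's log-mel front end uses a 10 ms hop (160 samples at 16 kHz).
WHISPER_SAMPLE_RATE = 16_000
WHISPER_HOP_LENGTH = 160


def columns_to_seconds(n_columns):
    """Convert a number of spectrogram columns to seconds of audio."""
    return n_columns * WHISPER_HOP_LENGTH / WHISPER_SAMPLE_RATE


def temporal_iou(seg_a, seg_b):
    """Intersection over union of two (start, end) time segments in seconds."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


print(columns_to_seconds(3000))              # 30.0 (seconds)
print(temporal_iou((0.0, 2.0), (1.0, 3.0)))  # 1 s overlap over a 3 s union
```

Under these assumptions, 3000 columns correspond to one 30-second Whisper input window, and two segments would count as overlapping once their IoU reaches `IOU_THRESHOLD`.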

### Step 4: Verify Paths

Run this cell to check that all required paths exist.

In [None]:
print("CHECKPOINT_PATH:", CHECKPOINT_PATH)
print("  Exists:", os.path.exists(CHECKPOINT_PATH))

print("WHISPER_CONFIG_PATH:", WHISPER_CONFIG_PATH)
print("  Exists:", os.path.exists(WHISPER_CONFIG_PATH))
if os.path.isdir(WHISPER_CONFIG_PATH):
    print("  Contents:", os.listdir(WHISPER_CONFIG_PATH))

print("DATA_DIR:", DATA_DIR)
print("  Exists:", os.path.exists(DATA_DIR))
if os.path.isdir(DATA_DIR):
    wav_files = [f for f in os.listdir(DATA_DIR) if f.lower().endswith(".wav")]
    print(f"  WAV files found: {len(wav_files)}")

print("Device:", "cuda" if torch.cuda.is_available() else "cpu")
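
The cell above only prints the results, so a missing file is easy to overlook. As an optional sketch, the same checks can be collected into a list of problems that you can test before running inference. The function name `find_missing_inputs` is hypothetical and not part of `utils.py`; it checks only the requirements stated in Step 3 (a checkpoint file, a `config.json` inside the Whisper config folder, and at least one `.wav` file).

```python
import os


def find_missing_inputs(checkpoint_path, whisper_config_path, data_dir):
    """Return a list of problems; an empty list means all inputs were found."""
    problems = []
    if not os.path.isfile(checkpoint_path):
        problems.append(f"Checkpoint not found: {checkpoint_path}")
    if not os.path.isfile(os.path.join(whisper_config_path, "config.json")):
        problems.append(f"config.json not found in: {whisper_config_path}")
    if not os.path.isdir(data_dir):
        problems.append(f"Audio folder not found: {data_dir}")
    elif not any(name.lower().endswith(".wav") for name in os.listdir(data_dir)):
        problems.append(f"No .wav files in: {data_dir}")
    return problems
```

In the notebook, you could call `find_missing_inputs(CHECKPOINT_PATH, WHISPER_CONFIG_PATH, DATA_DIR)` and stop before Step 5 if the returned list is not empty.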

### Step 5: Run Inference

This will process all `.wav` files in `audios/` and save predictions as `.json` files in `jsons/`.

In [None]:
infer(
    data_dir=DATA_DIR,
    checkpoint_path=CHECKPOINT_PATH,
    whisper_config_path=WHISPER_CONFIG_PATH,
    output_dir=OUTPUT_DIR,
    threshold=THRESHOLD,
    iou_threshold=IOU_THRESHOLD,
    total_spec_columns=TOTAL_SPEC_COLUMNS,
    batch_size=BATCH_SIZE,
)

When the cell finishes, the predictions are available as `.json` files in the `jsons` folder.
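
To take a quick look at the output, the sketch below summarizes every `.json` file in a folder. The prediction schema is not documented in this notebook, so the code deliberately assumes nothing about it and reports only the top-level structure of each file. The function name `summarize_json_files` is hypothetical.

```python
import json
import os


def summarize_json_files(json_dir):
    """Map each .json filename to its top-level type and keys (dict) or length (list)."""
    summary = {}
    for name in sorted(os.listdir(json_dir)):
        if not name.lower().endswith(".json"):
            continue  # skip anything that is not a prediction file
        with open(os.path.join(json_dir, name), "r", encoding="utf-8") as fh:
            data = json.load(fh)
        if isinstance(data, dict):
            summary[name] = ("dict", sorted(data.keys()))
        elif isinstance(data, list):
            summary[name] = ("list", len(data))
        else:
            summary[name] = (type(data).__name__, None)
    return summary
```

In the notebook, `summarize_json_files(OUTPUT_DIR)` would show which files were written and how each one is structured.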

### Step 6: Convert to Raven Selection Tables (Optional)

To visualize the results in Raven, convert the `.json` files to `.txt` selection tables:

In [None]:
from json_to_raven import process_folder

JSON_DIR = os.path.join(PATH, "jsons")
RAVEN_DIR = os.path.join(PATH, "raven")

process_folder(JSON_DIR, RAVEN_DIR)

The `.txt` files of your results can now be found in the `raven` folder. You can open them directly in Raven Pro alongside the corresponding audio files.
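
If you want to inspect a selection table without opening Raven, the sketch below reads one into a list of dictionaries. It assumes that `json_to_raven.py` writes the usual Raven format, a tab-delimited text file with a header row. The column names depend on that script and are not assumed here. The function name `read_selection_table` is hypothetical.

```python
import csv


def read_selection_table(path):
    """Read a tab-delimited selection table into a list of row dictionaries."""
    with open(path, "r", encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))
```

Each row maps the column headers to string values, so `len(read_selection_table(path))` gives the number of selections in a file.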