## How to Use the Final WhisperSeg Model: A Step-by-Step Guide


Always run the cells in the correct order!

In [None]:
# Run this cell only the first time you use this notebook (installation may take a while).
# %pip installs into the environment of the running kernel.
%pip install numpy scipy torch transformers huggingface_hub ctranslate2 matplotlib ipywidgets scikit-learn pillow librosa pandas

Collecting numpy
  Using cached numpy-2.4.1-cp314-cp314-win_amd64.whl.metadata (6.6 kB)
Collecting scipy
  Using cached scipy-1.17.0-cp314-cp314-win_amd64.whl.metadata (60 kB)
Collecting torch
  Using cached torch-2.10.0-cp314-cp314-win_amd64.whl.metadata (31 kB)
Collecting transformers
  Using cached transformers-5.0.0-py3-none-any.whl.metadata (37 kB)
Collecting huggingface_hub
  Using cached huggingface_hub-1.3.4-py3-none-any.whl.metadata (13 kB)
Collecting ctranslate2
  Using cached ctranslate2-4.6.3-cp314-cp314-win_amd64.whl.metadata (10 kB)
Collecting matplotlib
  Using cached matplotlib-3.10.8-cp314-cp314-win_amd64.whl.metadata (52 kB)
Collecting ipywidgets
  Downloading ipywidgets-8.1.8-py3-none-any.whl.metadata (2.4 kB)
Collecting scikit-learn
  Using cached scikit_learn-1.8.0-cp314-cp314-win_amd64.whl.metadata (11 kB)
Collecting pillow
  Using cached pillow-12.1.0-cp314-cp314-win_amd64.whl.metadata (9.0 kB)
Collecting filelock (from torch)
  Using cached filelock-3.20.3-py3-n

In [None]:
# Run this cell only if a later cell reports that pandas is missing.
%pip install pandas
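
Before moving on, it can help to confirm that the packages are visible to the running kernel. The check below is a minimal sketch: `REQUIRED_MODULES` and `find_missing` are names introduced here for illustration, and the list simply mirrors the install command above, using import names where they differ from the pip names (`sklearn` for scikit-learn, `PIL` for pillow).

```python
import importlib.util

# Import names for the packages installed above. Two of them differ from
# their pip names: scikit-learn -> sklearn, pillow -> PIL.
REQUIRED_MODULES = [
    "numpy", "scipy", "torch", "transformers", "huggingface_hub",
    "ctranslate2", "matplotlib", "ipywidgets", "sklearn", "PIL",
    "librosa", "pandas",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, rerun the install cell for the missing packages and restart the kernel before continuing.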

In [1]:
import argparse
import json
import logging
import librosa
import os

from os.path import join
from pathlib import Path
from scipy.io import wavfile

from utils import fetch_sr_rounded, infer

To detect only HIGH QUALITY calls (default), use final_checkpoint_20251116_163404_ct2.
To detect calls of ALL quality classes, use final_checkpoint_20251113_145510_ct2 instead, i.e. set MODEL_PATH = os.path.join(PATH, "final_checkpoint_20251113_145510_ct2").
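
One way to make this choice explicit is a small lookup keyed by detection mode. This is only a sketch: `CHECKPOINTS`, `select_model_path`, and the mode names `"high_quality"` and `"all_quality"` are hypothetical; the two folder names and their roles follow the configuration cell of this guide.

```python
import os

# Checkpoint folder names as used in the configuration cell of this guide.
# The mode keys are hypothetical labels chosen for this sketch.
CHECKPOINTS = {
    "high_quality": "final_checkpoint_20251116_163404_ct2",  # default
    "all_quality": "final_checkpoint_20251113_145510_ct2",
}


def select_model_path(base_dir, mode="high_quality"):
    """Return the checkpoint folder path for the requested detection mode."""
    if mode not in CHECKPOINTS:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(CHECKPOINTS)}")
    return os.path.join(base_dir, CHECKPOINTS[mode])


print(select_model_path(os.getcwd()))
```

Assigning `MODEL_PATH = select_model_path(PATH, "all_quality")` would then switch to the checkpoint that covers all quality classes.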


In [2]:
logging.basicConfig(level=logging.INFO)
PATH = os.getcwd()

DATA_DIR = os.path.join(PATH, "audios")

MODEL_PATH = os.path.join(PATH, "final_checkpoint_20251116_163404_ct2")

# To detect calls of ALL quality classes, use this checkpoint instead:
# MODEL_PATH = os.path.join(PATH, "final_checkpoint_20251113_145510_ct2")

OUTPUT_DIR = os.path.join(PATH, "jsons")

MIN_FREQUENCY = 0
SPEC_TIME_STEP = 0.0025
MIN_SEGMENT_LENGTH = 0.0195
EPS = 0.02
NUM_TRIALS = 3
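
Before running inference, it may be worth checking that the audio folder exists and contains recordings. The helper below is a sketch under assumptions: `list_wav_files` is a name introduced here, and how `infer` itself discovers files is not shown in this guide. The extension is matched case-insensitively because the recordings in this guide end in `.WAV`.

```python
from pathlib import Path


def list_wav_files(data_dir):
    """Return the .wav files in data_dir, ignoring the case of the extension."""
    folder = Path(data_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )


# Example: inspect the "audios" folder next to the notebook.
audio_dir = Path.cwd() / "audios"
wav_files = list_wav_files(audio_dir)
print(f"Found {len(wav_files)} wav file(s) in {audio_dir}")
```

An empty result usually means the folder name or location does not match `DATA_DIR`.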


In [3]:
import os

print("MODEL_PATH:", MODEL_PATH)
print("Exists:", os.path.exists(MODEL_PATH))
print("Is dir:", os.path.isdir(MODEL_PATH))
print("Contents:", os.listdir(MODEL_PATH))


MODEL_PATH: d:\JUPYTER\final_checkpoint_20251116_163404_ct2
Exists: True
Is dir: True
Contents: ['config.json', 'loss_curve.png', 'model.bin', 'U2024_09_03_10_03_14_799-U2024_09_03_10_04_42_175.UBN_v2.jsonr', 'U2024_09_03_10_04_42_203-U2024_09_03_10_06_09_580.UBN_v2.jsonr', 'U2024_09_03_11_37_56_745-U2024_09_03_11_39_24_123.UBN_v2.jsonr', 'U2024_09_10_12_16_34_369-U2024_09_10_12_18_01_445.UBN_v2.jsonr', 'U2024_09_10_12_18_01_482-U2024_09_10_12_19_28_556.UBN_v2.jsonr', 'U2024_09_24_11_05_26_490-U2024_09_24_11_06_53_867.UBN_v2.jsonr', 'U2024_09_24_12_06_37_702-U2024_09_24_12_08_05_079.UBN_v2.jsonr', 'U2024_09_24_12_24_06_562-U2024_09_24_12_25_33_938.UBN_v2.jsonr', 'U2024_09_24_12_40_08_036-U2024_09_24_12_41_35_412.UBN_v2.jsonr', 'U2024_09_24_13_18_00_613-U2024_09_24_13_19_27_989.UBN_v2.jsonr', 'U2025_09_18_08_34_48_865-U2025_09_18_08_36_16_243.UBN_v5.jsonr', 'U2025_09_18_08_42_06_242-U2025_09_18_08_43_33_618.UBN_v2.jsonr', 'U2025_09_18_09_31_37_579-U2025_09_18_09_33_04_957.UBN_v2.jsonr',

In [4]:
infer(
    data_dir=DATA_DIR,
    model_path=MODEL_PATH,
    output_dir=OUTPUT_DIR,
    min_frequency=MIN_FREQUENCY,
    spec_time_step=SPEC_TIME_STEP,
    min_segment_length=MIN_SEGMENT_LENGTH,
    eps=EPS,
    num_trials=NUM_TRIALS
)

[INFO] Using local model path: d:\JUPYTER\final_checkpoint_20251116_163404_ct2


INFO:root:Model loaded successfully.
INFO:root:Found 2 wav files in: d:\JUPYTER\audios
INFO:root:[1/2] Processing file: U2024_09_03_10_03_14_799-U2024_09_03_10_04_42_175.UBN_v2.WAV
INFO:root:✅ Finished file 1/2, saved to d:\JUPYTER\jsons\U2024_09_03_10_03_14_799-U2024_09_03_10_04_42_175.UBN_v2.json
INFO:root:[2/2] Processing file: U2024_09_03_10_04_42_203-U2024_09_03_10_06_09_580.UBN_v2.WAV
INFO:root:✅ Finished file 2/2, saved to d:\JUPYTER\jsons\U2024_09_03_10_04_42_203-U2024_09_03_10_06_09_580.UBN_v2.json
INFO:root:All files processed.


You should now see the predictions as .json files in the folder 'jsons'.
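
To get a first look at what a prediction file contains, you can summarize its top-level entries. The sketch below makes no assumption about the exact layout of the files; `summarize_prediction` is a helper name introduced here, and it only reports each key together with its type or list length.

```python
import json
from pathlib import Path


def summarize_prediction(json_path):
    """Describe the top-level entries of one prediction file."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    summary = {}
    for key, value in data.items():
        if isinstance(value, list):
            summary[key] = f"list of {len(value)} items"
        else:
            summary[key] = type(value).__name__
    return summary


# Example: summarize every .json file in the "jsons" folder, if it exists.
json_dir = Path.cwd() / "jsons"
if json_dir.is_dir():
    for json_file in sorted(json_dir.glob("*.json")):
        print(json_file.name, summarize_prediction(json_file))
```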

To visualize the results in Raven, convert these .json files to the .txt files that Raven reads. You can do this by running the next cell:

In [5]:
import json
import pandas as pd
import sys
import os
from json_to_raven import process_folder

JSON_DIR = os.path.join(PATH, "jsons")
RAVEN_DIR = os.path.join(PATH, "raven")

process_folder(JSON_DIR, RAVEN_DIR)


ModuleNotFoundError: No module named 'pandas'

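If this cell fails with `ModuleNotFoundError: No module named 'pandas'`, as in the output above, run the pandas install cell near the top of the notebook, restart the kernel if needed, and rerun this cell. Once it completes without errors, the .txt files of your results can be found in the folder 'raven'.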
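
For orientation, the sketch below shows roughly what such a conversion involves. It is not the implementation in `json_to_raven`. It assumes, without confirmation from this guide, that each prediction file holds parallel `onset`, `offset`, and optionally `cluster` lists with times in seconds. The function name, the default frequency bounds, and the `Annotation` column are hypothetical; the remaining columns follow the usual tab-separated Raven selection table layout.

```python
import csv
import json

RAVEN_COLUMNS = [
    "Selection", "View", "Channel",
    "Begin Time (s)", "End Time (s)",
    "Low Freq (Hz)", "High Freq (Hz)",
    "Annotation",  # hypothetical extra column carrying the predicted label
]


def json_to_selection_table(json_path, txt_path, low_freq=0.0, high_freq=8000.0):
    """Write one tab-separated selection table and return the number of rows.

    Assumes the JSON holds parallel "onset" and "offset" lists (seconds) and
    an optional "cluster" list of labels. The frequency bounds are placeholders.
    """
    with open(json_path, encoding="utf-8") as handle:
        prediction = json.load(handle)
    onsets = prediction["onset"]
    offsets = prediction["offset"]
    labels = prediction.get("cluster", [""] * len(onsets))

    rows_written = 0
    with open(txt_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(RAVEN_COLUMNS)
        for index, (onset, offset, label) in enumerate(
            zip(onsets, offsets, labels), start=1
        ):
            writer.writerow(
                [index, "Spectrogram 1", 1, onset, offset, low_freq, high_freq, label]
            )
            rows_written += 1
    return rows_written
```

Calling `json_to_selection_table("jsons/example.json", "raven/example.txt")` on a file with that layout would write one row per detected segment; the paths here are placeholders.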