This notebook demonstrates how to use pre-trained models (PANNs and YAMNet) for real-time inference. Both PANNs and YAMNet output probabilities over AudioSet categories.
The two models currently run independently; their predictions are not yet combined.

Session Lifecycle
   - `start_session()` opens the JSONL file, timestamps the start, and resets counters
   - `process_chunk(wave_chunk, chunk_sr)` is called for each arriving audio frame
   - `stop_session()` closes files and writes a compact session summary (start/stop time, number of frames, config)
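The lifecycle above can be illustrated with a minimal sketch of the bookkeeping alone. `SessionSketch`, its RMS placeholder, and the summary keys are hypothetical. The real `RealTimeSolo` runs PANNs and YAMNet inside `process_chunk` and may use different fields. Only the two file names come from this notebook.

```python
import json
import tempfile
import time
from pathlib import Path

import numpy as np


class SessionSketch:
    """Hypothetical stand-in for the session bookkeeping in RealTimeSolo (no models)."""

    def __init__(self, out_dir, config=None):
        self.out_dir = Path(out_dir)
        self.config = config or {}
        self.fh = None
        self.t_start = None
        self.n_frames = 0
        self.elapsed = 0.0  # seconds of audio processed so far

    def start_session(self):
        # Open the JSONL stream, timestamp the start, and reset counters
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.fh = open(self.out_dir / "stream_preds.jsonl", "w", encoding="utf-8")
        self.t_start = time.time()
        self.n_frames = 0
        self.elapsed = 0.0

    def process_chunk(self, wave_chunk, chunk_sr):
        # Write one JSON line per chunk; an RMS value stands in for model predictions
        duration = len(wave_chunk) / chunk_sr
        row = {
            "time_start": self.elapsed,
            "time_end": self.elapsed + duration,
            "rms": float(np.sqrt(np.mean(np.square(wave_chunk)))),
        }
        self.fh.write(json.dumps(row) + "\n")
        self.elapsed += duration
        self.n_frames += 1

    def stop_session(self):
        # Close the stream and write a compact summary next to it
        self.fh.close()
        summary = {
            "start_time": self.t_start,
            "stop_time": time.time(),
            "num_frames": self.n_frames,
            "config": self.config,
        }
        with open(self.out_dir / "stream_summary.json", "w", encoding="utf-8") as f:
            json.dump(summary, f, indent=2)
        return summary


# Usage: three 100 ms silent chunks at 16 kHz, written to a temporary directory
demo_dir = tempfile.mkdtemp()
session = SessionSketch(demo_dir, config={"chunk_ms": 100})
session.start_session()
for _ in range(3):
    session.process_chunk(np.zeros(1600, dtype=np.float32), chunk_sr=16000)
summary = session.stop_session()
print(summary["num_frames"])  # 3
```

Keeping the JSONL handle open for the whole session means each chunk costs one small append rather than reopening the file.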

Inference Pipeline

In [None]:
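from realtime_solo import RealTimeSolo
import librosa
import numpy as np

# Step 1: initialize the detector from its config and open a new session
solo = RealTimeSolo("config.yaml")
solo.start_session()

# Load the test clip as mono float32, resampled to 16 kHz
waveform, sr = librosa.load("resources_R9_ZSCveAHg_7s.wav", sr=16000)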

In [2]:

# Step 2: split the waveform into fixed-length frames and run detection on each
frame_size = int(sr * solo.chunk_ms / 1000.0)  # samples per chunk_ms frame at the loaded rate
for offset in range(0, len(waveform), frame_size):
    chunk = waveform[offset : offset + frame_size]
    if len(chunk) < frame_size:
        # Zero-pad the final, shorter frame to full length
        chunk = np.pad(chunk, (0, frame_size - len(chunk)))
    solo.process_chunk(chunk, chunk_sr=sr)

solo.stop_session()
print('results saved in ./runs')

Checkpoint path: C:\Users\asus/panns_data/Cnn14_DecisionLevelMax.pth
GPU number: 1
results saved in ./run
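The chunking loop in Step 2 can be factored into a generator so the frame arithmetic is easy to check on its own. A frame holds `int(sr * chunk_ms / 1000)` samples. A clip of N samples therefore yields ceil(N / frame_size) frames, with the last one zero-padded. This is a sketch: `iter_frames` is a hypothetical helper. The 2000 ms frame length in the usage lines is chosen only for illustration; the notebook reads the real value from `solo.chunk_ms`.

```python
import numpy as np


def iter_frames(waveform, sr, chunk_ms):
    """Yield fixed-length frames of chunk_ms milliseconds, zero-padding the last one."""
    frame_size = int(sr * chunk_ms / 1000.0)
    if frame_size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for offset in range(0, len(waveform), frame_size):
        chunk = waveform[offset : offset + frame_size]
        if len(chunk) < frame_size:
            chunk = np.pad(chunk, (0, frame_size - len(chunk)))
        yield chunk


# Usage: a 7 s clip at 16 kHz split into 2000 ms frames (illustrative chunk length)
wave = np.zeros(7 * 16000, dtype=np.float32)
frames = list(iter_frames(wave, sr=16000, chunk_ms=2000))
print(len(frames), frames[-1].shape)  # 4 (32000,)
```

Each frame then goes to `solo.process_chunk(chunk, chunk_sr=sr)` exactly as in the loop above.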


Output Files
   - `stream_preds.jsonl`: frame-wise detection results, one JSON line per chunk
   - `stream_summary.json`: session metadata such as start/stop timestamps and chunk count
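A short sketch can inspect `stream_preds.jsonl` after a run by counting how often each label is a model's top prediction. It assumes each line has the nested `PANN`/`YAMNet` `top_label` structure used in the core detection logic. `top_label_counts` is a hypothetical helper. The two sample rows are made-up dummy data, not real output.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def top_label_counts(jsonl_path, model="PANN"):
    """Count how often each label is the top prediction for one model."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            counts[row[model]["top_label"]] += 1
    return counts


# Usage with two dummy rows written to a temporary file
rows = [
    {"time_start": 0.0, "time_end": 2.0,
     "PANN": {"top_label": "Speech", "top_score": 0.8},
     "YAMNet": {"top_label": "Speech", "top_score": 0.7}},
    {"time_start": 2.0, "time_end": 4.0,
     "PANN": {"top_label": "Music", "top_score": 0.6},
     "YAMNet": {"top_label": "Speech", "top_score": 0.5}},
]
path = Path(tempfile.mkdtemp()) / "stream_preds.jsonl"
path.write_text("".join(json.dumps(r) + "\n" for r in rows), encoding="utf-8")
print(top_label_counts(path, "YAMNet"))  # Counter({'Speech': 2})
```

Because the file is line-delimited, it can be read incrementally while a session is still writing to it.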

The core detection logic, in simplified form:

In [None]:
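# Simplified per-chunk logic (`...` marks elided values)
local_window = append_chunk_to_window(chunk)             # append the new chunk to the analysis window
pann_output = pann_model.infer_clipwise(local_window)    # PANNs: clip-level AudioSet probabilities
yam_output  = yamnet_model.infer_clipwise(local_window)  # YAMNet: clip-level AudioSet probabilities
# One JSONL row per chunk: its time span plus each model's top label and score
row = {
    "time_start": ..., "time_end": ...,
    "PANN":   {"top_label": ..., "top_score": ...},
    "YAMNet": {"top_label": ..., "top_score": ...}
}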
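The notebook does not show `append_chunk_to_window`. One plausible reading is a rolling buffer that keeps the most recent few seconds of audio, so the clip-level models see more context than a single chunk. The sketch below assumes that behavior. `RollingWindow` and its 4 s default are hypothetical.

```python
import numpy as np


class RollingWindow:
    """Keep the most recent `window_s` seconds of audio (hypothetical helper)."""

    def __init__(self, sr=16000, window_s=4.0):
        self.size = int(sr * window_s)
        self.buf = np.zeros(0, dtype=np.float32)

    def append(self, chunk):
        # Concatenate the new chunk, then keep only the newest `size` samples
        self.buf = np.concatenate([self.buf, np.asarray(chunk, dtype=np.float32)])
        if len(self.buf) > self.size:
            self.buf = self.buf[-self.size:]
        return self.buf


# Usage: three 2 s chunks filled with 0, 1, 2; a 4 s window keeps only the last two
win = RollingWindow(sr=16000, window_s=4.0)
for i in range(3):
    local_window = win.append(np.full(32000, i, dtype=np.float32))
print(local_window.shape, local_window[0], local_window[-1])  # (64000,) 1.0 2.0
```

Re-concatenating copies the whole window on every chunk. That is cheap for a few seconds at 16 kHz, but a preallocated ring buffer would avoid the copy for longer windows.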


Real-Time Integration with Recorder
Inside `record_sound.py`:
   - A Tkinter GUI starts and stops recording
   - Audio chunks are streamed continuously to the detection system

In [None]:
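# handle_chunk must be defined before it is passed as the on_chunk callback
def handle_chunk(chunk):
    solo.process_chunk(chunk, chunk_sr=16000)  # real-time inference per chunk

rec = ChunkRecorder(sr=16000, chunk_seconds=2, on_chunk=handle_chunk)
rec.start()  # start streaming; each recorded chunk is passed to handle_chunk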
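A stand-in recorder can exercise the callback path without a microphone or the Tkinter GUI. It replays a waveform through the same `on_chunk` interface on a background thread. `ReplayRecorder` is hypothetical. Its constructor mirrors the `ChunkRecorder` call above, but the notebook does not show the real class's internals or any stop method.

```python
import threading

import numpy as np


class ReplayRecorder:
    """Hypothetical stand-in for ChunkRecorder: replays a waveform instead of a microphone."""

    def __init__(self, waveform, sr=16000, chunk_seconds=2, on_chunk=None):
        self.waveform = np.asarray(waveform, dtype=np.float32)
        self.chunk_len = int(sr * chunk_seconds)
        self.on_chunk = on_chunk
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Deliver consecutive chunks to the callback until the audio ends or stop() is called
        for offset in range(0, len(self.waveform), self.chunk_len):
            if self._stop_event.is_set():
                break
            chunk = self.waveform[offset : offset + self.chunk_len]
            if len(chunk) < self.chunk_len:
                chunk = np.pad(chunk, (0, self.chunk_len - len(chunk)))
            self.on_chunk(chunk)

    def start(self):
        self._thread.start()

    def join(self):
        # Wait until every chunk has been delivered
        self._thread.join()

    def stop(self):
        # Ask the worker to stop early, then wait for it
        self._stop_event.set()
        self._thread.join()


# Usage: 5 s of silence in 2 s chunks; the callback records chunk lengths
received = []

def handle_chunk(chunk):
    received.append(len(chunk))  # in the notebook this would call solo.process_chunk

rec = ReplayRecorder(np.zeros(5 * 16000), sr=16000, chunk_seconds=2, on_chunk=handle_chunk)
rec.start()
rec.join()
print(received)  # [32000, 32000, 32000]
```

Unlike a live recorder, this version replays chunks as fast as the callback returns. Adding `time.sleep(chunk_seconds)` in the loop would approximate real-time pacing. Because `on_chunk` runs on a worker thread, any GUI updates it triggers should be handed back to the Tkinter main loop, since Tkinter is generally not thread-safe.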