feat(audio): add advanced audio processing kernels (v0.2.12) #99

m96-chan · 2025-12-22T17:03:07Z

Summary

Add comprehensive GPU audio processing operations with custom Radix-2 FFT (no cuFFT dependency).

New Audio Operations

| Category | Operations |
|---|---|
| Time-Frequency | `istft`, `griffin_lim` |
| Spectral Features | `spectral_centroid`, `spectral_bandwidth`, `spectral_rolloff`, `spectral_flatness`, `spectral_contrast` |
| Pitch Detection | `detect_pitch_yin`, `detect_pitch_yin_frames`, `autocorrelation` |
| Music Analysis | `cqt`, `chroma_stft`, `chroma_cqt`, `zero_crossing_rate` |
| Source Separation | `hpss`, `harmonic`, `percussive` |
| Time/Pitch | `time_stretch`, `pitch_shift` |

Changes

- Add CUDA kernels for all new audio operations (audio_kernels.cuh)
- Add C++ dispatch functions (audio.cu, audio.hpp)
- Add pybind11 bindings (ops_bindings.cpp)
- Add Python wrappers (audio.py)
- Add demo script (demo_v0212.py)
- Update README with v0.2.12 features
- Bump version to 0.2.12

Key Features

- Driver-Only Mode: All kernels use custom Radix-2 FFT, no cuFFT dependency
- IFFT: Uses conjugate twiddle factors + 1/N scaling
- Griffin-Lim: Iterative phase reconstruction
- YIN Algorithm: Robust pitch detection
- HPSS: Median filtering for harmonic-percussive separation
- Phase Vocoder: Time stretch and pitch shift
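
For readers unfamiliar with the IFFT note above (conjugate twiddle factors plus 1/N scaling), here is a minimal CPU-side sketch in pure Python. It is not the PR's CUDA kernel: the function names `fft_radix2` and `ifft_radix2` are hypothetical, and the recursive Cooley-Tukey form is chosen only for readability.

```python
import cmath

def fft_radix2(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey transform (hypothetical helper).

    len(x) must be a power of two. With inverse=True the twiddle factors
    are conjugated, but no 1/N scaling is applied here.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    # Forward transform uses exp(-2*pi*i*k/n); the inverse uses the conjugate.
    sign = 1.0 if inverse else -1.0
    even = fft_radix2(x[0::2], inverse)
    odd = fft_radix2(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def ifft_radix2(spectrum):
    """Inverse FFT: conjugate twiddles, then scale every output by 1/N."""
    n = len(spectrum)
    return [v / n for v in fft_radix2(spectrum, inverse=True)]
```

Applying `ifft_radix2` to the output of `fft_radix2` should recover the input up to floating-point rounding, which is a convenient sanity check for any FFT/IFFT pair.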

Test plan

- All 229 existing tests pass
- Demo script runs successfully
- New audio functions verified manually

🤖 Generated with Claude Code

Add GPU-accelerated audio processing operations for ASR/Whisper preprocessing:

- from_pcm: Convert int16/float32 PCM to GPUArray
- stereo_to_mono: Convert stereo to mono
- normalize_peak/rms: Audio level normalization
- resample: 48kHz -> 16kHz polyphase resampling

Also adds Int16 DataType support and updates build.sh default to SM 120 / CUDA 12.9.

Tests: 11 new audio tests, all pass

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
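
As a rough illustration of the preprocessing steps named in this commit, the sketch below shows CPU reference versions in pure Python. The function names mirror the commit message, but the signatures, the interleaved left/right layout, the channel-averaging rule, and the default targets are assumptions made for illustration, not the library's API.

```python
import math

def pcm16_to_float(samples):
    """Map int16 PCM values to floats in [-1.0, 1.0) (assumed convention)."""
    return [s / 32768.0 for s in samples]

def stereo_to_mono(interleaved):
    """Average left/right pairs; assumes interleaved L, R, L, R, ... input."""
    return [(interleaved[i] + interleaved[i + 1]) / 2.0
            for i in range(0, len(interleaved) - 1, 2)]

def normalize_peak(samples, target=1.0):
    """Scale so the largest absolute sample equals `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    return [s * target / peak for s in samples]

def normalize_rms(samples, target=0.1):
    """Scale so the root-mean-square level equals `target`."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)
    return [s * target / rms for s in samples]
```

A reference of this kind is mainly useful for checking GPU output against known values in tests.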

- Add GPU ring buffer kernels (write/read with wrap-around)
- Add Hann windowing for overlap-add processing
- Add overlap_add kernel for stream output reconstruction
- Create AudioRingBuffer class for real-time audio buffering
- Create AudioStream class for chunked processing with windowing
- Support configurable chunk size and hop size (default 50% overlap)
- Add 9 new streaming tests (20 total audio tests pass)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
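
The ring buffer and overlap-add ideas in this commit can be sketched on the CPU as follows. This is an illustrative pure-Python model, not the `AudioRingBuffer` or `AudioStream` implementation: the class `RingBuffer`, the helpers `hann_periodic` and `overlap_add`, and the choice of a periodic Hann window are all assumptions.

```python
import math

class RingBuffer:
    """Fixed-capacity FIFO with wrap-around indexing (hypothetical class)."""

    def __init__(self, capacity):
        self.data = [0.0] * capacity
        self.capacity = capacity
        self.write_pos = 0
        self.read_pos = 0
        self.size = 0

    def write(self, samples):
        if len(samples) > self.capacity - self.size:
            raise BufferError("not enough free space")
        for s in samples:
            self.data[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity  # wrap
        self.size += len(samples)

    def read(self, count):
        if count > self.size:
            raise BufferError("not enough buffered samples")
        out = []
        for _ in range(count):
            out.append(self.data[self.read_pos])
            self.read_pos = (self.read_pos + 1) % self.capacity  # wrap
        self.size -= count
        return out

def hann_periodic(n):
    """Periodic Hann window; shifted copies sum to 1 at a hop of n/2."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def overlap_add(chunks, hop):
    """Sum equally sized chunks placed `hop` samples apart."""
    chunk_size = len(chunks[0])
    out = [0.0] * (hop * (len(chunks) - 1) + chunk_size)
    for i, chunk in enumerate(chunks):
        for j, v in enumerate(chunk):
            out[i * hop + j] += v
    return out
```

With a periodic Hann window and 50% overlap, the windowed chunks sum back to the original signal wherever two chunks overlap, which is why that hop size is a common default.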

- Add GPU-accelerated VAD with energy and zero-crossing rate features
- Implement frame-level feature computation kernels
- Add threshold-based decision with adaptive noise floor estimation
- Support hangover smoothing to extend speech regions
- Create VAD class with configurable parameters
- Return SpeechSegment objects with sample/time boundaries
- Add 7 new VAD tests (27 total audio tests pass)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
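
To make the VAD description concrete, here is a simplified pure-Python sketch of frame features, a threshold decision, and hangover smoothing. It deliberately uses a fixed energy threshold rather than the adaptive noise floor the commit mentions, computes the zero-crossing rate without using it in the decision, and the names `frame_features`, `detect_speech`, and `to_segments` are hypothetical.

```python
def frame_features(samples, frame_size, hop):
    """Return (mean energy, zero-crossing rate) per frame; frame_size >= 2."""
    feats = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_size - 1)))
    return feats

def detect_speech(feats, energy_threshold, hangover_frames):
    """Flag frames above the threshold, then hold the flag for a few frames."""
    flags = []
    hang = 0
    for energy, _zcr in feats:
        if energy > energy_threshold:
            flags.append(True)
            hang = hangover_frames  # re-arm the hangover counter
        elif hang > 0:
            flags.append(True)      # extend the speech region
            hang -= 1
        else:
            flags.append(False)
    return flags

def to_segments(flags):
    """Merge consecutive speech frames into (start, end) frame ranges,
    with `end` exclusive."""
    segments = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments
```

Frame indices convert to sample boundaries by multiplying by the hop size, and to seconds by further dividing by the sample rate.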

Add comprehensive audio DSP kernels for Whisper/ASR support:

Spectral processing (high priority):
- Custom Radix-2 FFT (Cooley-Tukey + Stockham) - no cuFFT dependency
- STFT with configurable window/hop size
- Power/magnitude spectrum computation
- Mel filterbank creation and application
- Log-mel spectrogram and dB conversion
- MFCC extraction with DCT-II
- Delta/delta-delta features

Audio preprocessing (medium priority):
- Pre-emphasis and de-emphasis filters
- DC offset removal
- High-pass filter
- Noise gate (amplitude-based)
- Spectral gate (frequency-domain)
- Short-term energy computation

High-level API:
- mel_spectrogram() - one-call mel spectrogram
- log_mel_spectrogram() - one-call log-mel extraction

All kernels maintain Driver-Only mode (CUDA::cuda_driver only).

24 new tests added, all 229 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
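
The "mel filterbank creation and application" step can be sketched as below. The commit does not say which mel formula or normalization the kernels use; this example assumes the common HTK-style conversion, unnormalized triangular filters, and hypothetical function names.

```python
import math

def hz_to_mel(hz):
    """HTK-style mel scale (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Build n_mels triangular filters over n_fft // 2 + 1 frequency bins."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(mel_lo + (mel_hi - mel_lo) * i / (n_mels + 1))
             for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (center - lo)
            falling = (hi - f) / (hi - center)
            row.append(max(0.0, min(rising, falling)))  # triangle, clipped at 0
        bank.append(row)
    return bank

def apply_filterbank(bank, power_spectrum):
    """Weighted sum of one power-spectrum frame per mel band."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
```

Taking the logarithm of the `apply_filterbank` output for each STFT frame gives a log-mel spectrogram in the usual sense.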

Add comprehensive audio processing operations (Driver-Only, no cuFFT):

- ISTFT: Inverse STFT with overlap-add and window normalization
- Griffin-Lim: Iterative phase reconstruction from magnitude
- Pitch detection: Autocorrelation and YIN algorithm
- Spectral features: centroid, bandwidth, rolloff, flatness, contrast
- Zero-crossing rate: Frame-based ZCR computation
- CQT: Constant-Q Transform via STFT bin interpolation
- Chromagram: Both STFT-based and CQT-based
- HPSS: Harmonic-Percussive Source Separation with median filtering
- Time stretch / Pitch shift: Phase vocoder implementation

All kernels use custom Radix-2 FFT (no CUDA Toolkit runtime dependency).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
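
For context on the YIN pitch detector listed in this commit, the following pure-Python sketch shows the core of the algorithm: the difference function, the cumulative mean normalized difference, and an absolute threshold search. It omits the parabolic interpolation step of the published algorithm, and the function name `yin_pitch`, its parameters, and its defaults are assumptions rather than the signature of `detect_pitch_yin`.

```python
def yin_pitch(samples, sample_rate, f_min=50.0, f_max=1000.0, threshold=0.1):
    """Estimate the fundamental frequency in Hz; return 0.0 if none is found."""
    tau_min = max(1, int(sample_rate / f_max))
    tau_max = int(sample_rate / f_min)
    window = len(samples) - tau_max
    if window <= 0:
        raise ValueError("signal too short for the requested f_min")

    # Step 1: squared difference between the signal and its lagged copy.
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((samples[j] - samples[j + tau]) ** 2
                        for j in range(window))

    # Step 2: cumulative mean normalized difference (equals 1 at small lags).
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0.0 else 1.0

    # Step 3: first lag below the threshold, refined to the local minimum.
    tau = tau_min
    while tau <= tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
        tau += 1
    return 0.0  # treated as unvoiced
```

Because the lag is an integer number of samples, the estimate is quantized to `sample_rate / tau`; interpolating around the minimum is the usual way to refine it.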

- Add demo_v0212.py showcasing advanced audio processing
- Update README with v0.2.12 audio features
- Bump version to 0.2.12

New audio operations:
- ISTFT, Griffin-Lim phase reconstruction
- Spectral features (centroid, bandwidth, rolloff, flatness, contrast)
- Pitch detection (YIN), autocorrelation
- CQT, chromagram
- HPSS (harmonic-percussive separation)
- Time stretch, pitch shift

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

m96-chan and others added 6 commits December 22, 2025 23:40

m96-chan merged commit 45aa16f into main Dec 22, 2025
13 checks passed

m96-chan deleted the feature/v0.2.12 branch December 26, 2025 09:39
