@m96-chan

Summary

Add comprehensive GPU audio processing operations with custom Radix-2 FFT (no cuFFT dependency).

New Audio Operations

| Category | Operations |
|---|---|
| Time-Frequency | istft, griffin_lim |
| Spectral Features | spectral_centroid, spectral_bandwidth, spectral_rolloff, spectral_flatness, spectral_contrast |
| Pitch Detection | detect_pitch_yin, detect_pitch_yin_frames, autocorrelation |
| Music Analysis | cqt, chroma_stft, chroma_cqt, zero_crossing_rate |
| Source Separation | hpss, harmonic, percussive |
| Time/Pitch | time_stretch, pitch_shift |

Changes

  • Add CUDA kernels for all new audio operations (audio_kernels.cuh)
  • Add C++ dispatch functions (audio.cu, audio.hpp)
  • Add pybind11 bindings (ops_bindings.cpp)
  • Add Python wrappers (audio.py)
  • Add demo script (demo_v0212.py)
  • Update README with v0.2.12 features
  • Bump version to 0.2.12

Key Features

  • Driver-Only Mode: FFT-based kernels use a custom Radix-2 FFT, so there is no cuFFT dependency
  • IFFT: Uses conjugate twiddle factors + 1/N scaling
  • Griffin-Lim: Iterative phase reconstruction
  • YIN Algorithm: Robust pitch detection
  • HPSS: Median filtering for harmonic-percussive separation
  • Phase Vocoder: Time stretch and pitch shift
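
The IFFT approach listed above (conjugate twiddle factors plus 1/N scaling) can be sketched on the host in plain Python. This is only an illustration of the technique, not the CUDA kernel in audio_kernels.cuh; the function names `fft_radix2` and `_fft_recursive` are hypothetical, and a recursive Cooley-Tukey form is assumed for readability.

```python
import cmath


def _fft_recursive(x, sign):
    """Radix-2 decimation-in-time step; sign=-1 is forward, sign=+1 is inverse."""
    n = len(x)
    if n == 1:
        return list(x)
    even = _fft_recursive(x[0::2], sign)
    odd = _fft_recursive(x[1::2], sign)
    half = n // 2
    out = [0j] * n
    for k in range(half):
        # The inverse transform uses the complex conjugate of the forward twiddle.
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n)
        t = twiddle * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def fft_radix2(x, inverse=False):
    """Hypothetical helper: FFT of a power-of-two-length sequence."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = _fft_recursive([complex(v) for v in x], 1.0 if inverse else -1.0)
    if inverse:
        # 1/N scaling makes inverse(forward(x)) return x.
        out = [v / n for v in out]
    return out
```

A forward transform followed by `fft_radix2(spectrum, inverse=True)` should reproduce the input up to floating-point rounding.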

Test plan

  • All 229 existing tests pass
  • Demo script runs successfully
  • New audio functions verified manually

🤖 Generated with Claude Code

m96-chan and others added 6 commits December 22, 2025 23:40
Add GPU-accelerated audio processing operations for ASR/Whisper preprocessing:
- from_pcm: Convert int16/float32 PCM to GPUArray
- stereo_to_mono: Convert stereo to mono
- normalize_peak/rms: Audio level normalization
- resample: 48kHz -> 16kHz polyphase resampling

Also adds Int16 DataType support and updates build.sh default to SM 120 / CUDA 12.9.
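
As a rough host-side sketch of what the preprocessing steps above compute, the following plain-Python functions mirror PCM conversion, stereo downmixing, and peak normalization. The function names, the 1/32768 scale factor, and the interleaved left/right layout are assumptions for illustration; the GPU implementations in this PR may differ.

```python
def pcm16_to_float(samples):
    """Map int16 PCM values to floats in [-1.0, 1.0) (assumed 1/32768 scale)."""
    return [s / 32768.0 for s in samples]


def stereo_to_mono_sketch(interleaved):
    """Average left/right pairs of an interleaved L,R,L,R,... sequence."""
    return [
        (interleaved[i] + interleaved[i + 1]) * 0.5
        for i in range(0, len(interleaved) - 1, 2)
    ]


def normalize_peak_sketch(x, target=1.0):
    """Scale so the largest absolute sample equals `target`."""
    peak = max((abs(s) for s in x), default=0.0)
    if peak == 0.0:
        return list(x)  # silence stays silence; avoid dividing by zero
    gain = target / peak
    return [s * gain for s in x]
```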

Tests: 11 new audio tests, all pass

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add GPU ring buffer kernels (write/read with wrap-around)
- Add Hann windowing for overlap-add processing
- Add overlap_add kernel for stream output reconstruction
- Create AudioRingBuffer class for real-time audio buffering
- Create AudioStream class for chunked processing with windowing
- Support configurable chunk size and hop size (default 50% overlap)
- Add 9 new streaming tests (20 total audio tests pass)
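
The wrap-around write/read behaviour described above can be sketched with modular indexing. The class `RingBufferSketch` below is hypothetical and runs on the host; it is not the `AudioRingBuffer` API from this PR, whose signatures are not shown here.

```python
class RingBufferSketch:
    """Fixed-capacity FIFO of samples with wrap-around indexing."""

    def __init__(self, capacity):
        self.buf = [0.0] * capacity
        self.capacity = capacity
        self.write_pos = 0
        self.read_pos = 0
        self.size = 0  # number of unread samples

    def write(self, samples):
        n = len(samples)
        if n > self.capacity - self.size:
            raise ValueError("not enough free space")
        for i, s in enumerate(samples):
            # Modulo wraps the index back to the start of the buffer.
            self.buf[(self.write_pos + i) % self.capacity] = s
        self.write_pos = (self.write_pos + n) % self.capacity
        self.size += n

    def read(self, n):
        if n > self.size:
            raise ValueError("not enough samples available")
        out = [self.buf[(self.read_pos + i) % self.capacity] for i in range(n)]
        self.read_pos = (self.read_pos + n) % self.capacity
        self.size -= n
        return out
```

Writing three samples into a four-slot buffer, reading two, and then writing three more forces the second write to wrap past the end of the storage.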

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add GPU-accelerated VAD with energy and zero-crossing rate features
- Implement frame-level feature computation kernels
- Add threshold-based decision with adaptive noise floor estimation
- Support hangover smoothing to extend speech regions
- Create VAD class with configurable parameters
- Return SpeechSegment objects with sample/time boundaries
- Add 7 new VAD tests (27 total audio tests pass)
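
A minimal sketch of the frame-level features and hangover smoothing listed above, assuming a fixed energy threshold instead of the adaptive noise floor. All names and default values here are hypothetical and do not reflect the `VAD` class parameters.

```python
def frame_features(x, frame_len, hop):
    """Return (energy, zero_crossing_rate) per frame."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def vad_decisions(feats, energy_thresh=0.01, zcr_max=0.5, hangover=1):
    """Threshold decision; hangover keeps `hangover` extra frames after speech."""
    decisions = []
    hold = 0
    for energy, zcr in feats:
        if energy > energy_thresh and zcr < zcr_max:
            hold = hangover          # re-arm the hangover counter
            decisions.append(True)
        elif hold > 0:
            hold -= 1                # extend the speech region
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

With a hangover of one frame, a speech burst that covers two frames is reported as three speech frames.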

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add comprehensive audio DSP kernels for Whisper/ASR support:

Spectral processing (high priority):
- Custom Radix-2 FFT (Cooley-Tukey + Stockham) - no cuFFT dependency
- STFT with configurable window/hop size
- Power/magnitude spectrum computation
- Mel filterbank creation and application
- Log-mel spectrogram and dB conversion
- MFCC extraction with DCT-II
- Delta/delta-delta features

Audio preprocessing (medium priority):
- Pre-emphasis and de-emphasis filters
- DC offset removal
- High-pass filter
- Noise gate (amplitude-based)
- Spectral gate (frequency-domain)
- Short-term energy computation
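
For reference, pre-emphasis is commonly written as y[n] = x[n] - alpha * x[n-1], and de-emphasis is its recursive inverse. The sketch below assumes that standard form with alpha = 0.97; the coefficient and function names are illustrative and are not taken from this PR.

```python
def pre_emphasis_sketch(x, alpha=0.97):
    """First-order high-frequency boost: y[n] = x[n] - alpha * x[n-1]."""
    y = []
    prev = 0.0
    for s in x:
        y.append(s - alpha * prev)
        prev = s
    return y


def de_emphasis_sketch(y, alpha=0.97):
    """Inverse filter: x[n] = y[n] + alpha * x[n-1]."""
    x = []
    prev = 0.0
    for s in y:
        prev = s + alpha * prev
        x.append(prev)
    return x
```

Applying `de_emphasis_sketch` to the output of `pre_emphasis_sketch` with the same alpha should recover the input up to rounding.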

High-level API:
- mel_spectrogram() - one-call mel spectrogram
- log_mel_spectrogram() - one-call log-mel extraction

All kernels maintain Driver-Only mode (CUDA::cuda_driver only).
24 new tests added, all 229 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add comprehensive audio processing operations (Driver-Only, no cuFFT):

- ISTFT: Inverse STFT with overlap-add and window normalization
- Griffin-Lim: Iterative phase reconstruction from magnitude
- Pitch detection: Autocorrelation and YIN algorithm
- Spectral features: centroid, bandwidth, rolloff, flatness, contrast
- Zero-crossing rate: Frame-based ZCR computation
- CQT: Constant-Q Transform via STFT bin interpolation
- Chromagram: Both STFT-based and CQT-based
- HPSS: Harmonic-Percussive Source Separation with median filtering
- Time stretch / Pitch shift: Phase vocoder implementation

All kernels use custom Radix-2 FFT (no CUDA Toolkit runtime dependency).
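
To illustrate the YIN pitch detection mentioned above, here is a simplified single-frame sketch: difference function, cumulative mean normalized difference, then an absolute threshold followed by a walk to the local minimum. It omits the parabolic interpolation step of the published algorithm, and the name `yin_pitch_sketch` and its defaults are hypothetical rather than the signature of `detect_pitch_yin`.

```python
def yin_pitch_sketch(x, sample_rate, fmin=50.0, fmax=500.0, threshold=0.1):
    """Return an f0 estimate in Hz, or None if no lag passes the threshold."""
    tau_min = max(1, int(sample_rate / fmax))
    tau_max = int(sample_rate / fmin)
    window = len(x) - tau_max
    if window <= 0:
        raise ValueError("signal too short for the requested fmin")

    # Step 1: squared difference between the frame and its lagged copy.
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((x[j] - x[j + tau]) ** 2 for j in range(window))

    # Step 2: cumulative mean normalized difference (CMND).
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0

    # Step 3: first lag under the threshold, refined to its local minimum.
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None
```

For a clean 100 Hz sine sampled at 8 kHz, the selected lag should be 80 samples; silence yields `None` because no lag falls under the threshold.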

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add demo_v0212.py showcasing advanced audio processing
- Update README with v0.2.12 audio features
- Bump version to 0.2.12

New audio operations:
- ISTFT, Griffin-Lim phase reconstruction
- Spectral features (centroid, bandwidth, rolloff, flatness, contrast)
- Pitch detection (YIN), autocorrelation
- CQT, chromagram
- HPSS (harmonic-percussive separation)
- Time stretch, pitch shift
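
The median-filtering idea behind HPSS can be sketched on a small magnitude spectrogram stored as `spec[freq][time]`: a median along time favours sustained (harmonic) energy, a median along frequency favours broadband (percussive) energy. The function names, the kernel size of 5, and the binary masking rule below are assumptions for illustration, not the behaviour of the `hpss` kernel in this PR.

```python
def median_filter_1d(values, kernel):
    """Sliding median with the window truncated at the edges."""
    half = kernel // 2
    out = []
    for i in range(len(values)):
        window = sorted(values[max(0, i - half): i + half + 1])
        out.append(window[len(window) // 2])
    return out


def hpss_sketch(spec, kernel=5):
    """Split spec[freq][time] into (harmonic, percussive) using binary masks."""
    n_freq, n_time = len(spec), len(spec[0])
    # Median along time (per frequency row) estimates the harmonic part.
    harm_est = [median_filter_1d(row, kernel) for row in spec]
    # Median along frequency (per time column) estimates the percussive part.
    cols = [
        median_filter_1d([spec[f][t] for f in range(n_freq)], kernel)
        for t in range(n_time)
    ]
    harmonic = [[0.0] * n_time for _ in range(n_freq)]
    percussive = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            if harm_est[f][t] >= cols[t][f]:
                harmonic[f][t] = spec[f][t]
            else:
                percussive[f][t] = spec[f][t]
    return harmonic, percussive
```

A horizontal line in the spectrogram (one bin active in every frame) should land in the harmonic output, and a vertical line (every bin active in one frame) in the percussive output.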

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@m96-chan m96-chan merged commit 45aa16f into main Dec 22, 2025
13 checks passed
@m96-chan m96-chan deleted the feature/v0.2.12 branch December 26, 2025 09:39