feat(processor): add voice-activated detection and adaptive speech RMS minimum #53
Merged
flexiondotorg merged 10 commits into main on Mar 14, 2026
…ce fraction Add VoiceActivated field to AudioMeasurements and detectVoiceActivated() helper that classifies recordings based on the fraction of digital-silence-rejected candidates. When >= 95% of silence candidates are digital silence, the recording is classified as voice-activated. Signed-off-by: Martin Wimpress <code@wimpress.io>
…ted recordings

Add voiceActivatedSpeechInterruptionToleranceIntervals constant (40 intervals, 10 seconds) to bridge platform-inserted gaps in voice-activated media files.

- Add voiceActivated bool parameter to findSpeechCandidatesFromIntervals
- Implement conditional selection: 8 intervals (2s) for standard recordings, 40 intervals (10s) for voice-activated
- Update analyzer.go call site to pass measurements.VoiceActivated
- Extend analyzer_test with two new cases: 7.5s gap bridges with widened tolerance, 12.5s gap splits even with widened tolerance
- Update all 6 existing test call sites to pass false

Signed-off-by: Martin Wimpress <code@wimpress.io>
- Add NoiseRemoveCompandEnabled field to FilterChainConfig (default true)
- Modify buildNoiseRemoveFilter to return anlmdn-only spec when compand disabled
- Replace blind -55 dBFS fallback with explicit compand disabling when NoiseProfile unavailable
- Update tuneNoiseRemove to disable compand for recordings lacking noise calibration
- Add sub-tests verifying compand enable/disable paths in TestBuildNoiseRemoveFilter

Signed-off-by: Martin Wimpress <code@wimpress.io>
…annotation

- Display "Voice-activated recording detected" in silence detection section when voice-activated recording is confirmed, covering both candidate selection paths (no elected candidate, no silence detected)
- Annotate filter adaptation section with "NR Compander: disabled" when noise profile unavailable
- Include "Voice-Activated: yes" annotation in detailed log file reports across candidates and none-found sections
- Add comprehensive test coverage with 8 test cases for display output, compander annotation, and diagnostic reporting

Signed-off-by: Martin Wimpress <code@wimpress.io>
Implement dynamic `speechRMSMin` calculation based on recording characteristics:

- Add `speechRMSMinimum`, `speechRMSMinimumOffset` (12.0), and `speechRMSMinimumNoiseMargin` (6.0) constants
- Implement `computeSpeechRMSMinimum()` helper using formula: max(RMSLevel - 12, NoiseFloor + 6)
- Update `speechScore()` to accept adaptive threshold parameter
- Update `findSpeechCandidatesFromIntervals()` to derive threshold from measurements at call site
- Pass RMS and noise floor measurements through analyser pipeline

Fixes speech detection in quiet recordings where fixed -40 dBFS threshold was above most speech intervals.

- Add 6 test cases for `computeSpeechRMSMinimum()` covering edge cases and fallback behaviour
- Update 9 call sites across 8 sub-tests with new parameters

Signed-off-by: Martin Wimpress <code@wimpress.io>
…nd NR compander

- Update Pass 2 filter table to clarify compand residual suppression requires noise profile
- Add voice-activated recording detection (Riverside, Zencastr) to analysis-only output description
- Explain specific consequence of missing silence section (NR compander disabled, spectral denoiser runs alone)

Signed-off-by: Martin Wimpress <code@wimpress.io>
…tection capabilities

- Correct Pass 2 NoiseRemove row: compand disabled when no noise profile exists; note that anlmdn filter always active
- Qualify compand dependency on noise profile in Section 4 capabilities
- Add two newly implemented Speech-Aware Processing capabilities:
  * Digital silence rejection in room tone candidate selection
  * Voice-activated recording detection from silence fraction

Signed-off-by: Martin Wimpress <code@wimpress.io>
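The classification and tolerance rules described in the commits above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the function signatures, the `voiceActivatedPercent` and `speechInterruptionToleranceIntervals` names, the `interruptionTolerance` and `gapBridged` helpers, the inclusive gap comparison, and the handling of zero candidates are all assumptions. The 95% threshold and the 8/40 interval values come from the commit messages, and 8 intervals = 2 s implies 250 ms per interval.

```go
package main

import "fmt"

// Values are taken from the commit messages; the first two constant names
// are hypothetical.
const (
	voiceActivatedPercent                              = 95 // ">= 95%" of candidates are digital silence
	speechInterruptionToleranceIntervals               = 8  // 2 seconds for standard recordings
	voiceActivatedSpeechInterruptionToleranceIntervals = 40 // 10 seconds for voice-activated recordings
)

// detectVoiceActivated reports whether the share of silence candidates
// rejected as digital silence reaches the threshold. Integer arithmetic
// avoids floating-point rounding at exactly 95%.
func detectVoiceActivated(totalCandidates, digitalSilenceRejected int) bool {
	if totalCandidates <= 0 {
		return false // assumption: no candidates means no evidence either way
	}
	return digitalSilenceRejected*100 >= totalCandidates*voiceActivatedPercent
}

// interruptionTolerance selects how many consecutive non-speech intervals
// may be bridged inside a single speech candidate.
func interruptionTolerance(voiceActivated bool) int {
	if voiceActivated {
		return voiceActivatedSpeechInterruptionToleranceIntervals
	}
	return speechInterruptionToleranceIntervals
}

// gapBridged reports whether a gap of gapIntervals is bridged.
// The inclusive comparison is an assumption.
func gapBridged(gapIntervals int, voiceActivated bool) bool {
	return gapIntervals <= interruptionTolerance(voiceActivated)
}

func main() {
	va := detectVoiceActivated(20, 19) // 19 of 20 candidates are digital silence
	fmt.Println("voice-activated:", va)
	fmt.Println("7.5s gap (30 intervals) bridged:", gapBridged(30, va))
	fmt.Println("12.5s gap (50 intervals) bridged:", gapBridged(50, va))
}
```

The two gaps in `main` mirror the test cases named in the commit message: a 7.5 s gap should bridge under the widened tolerance, while a 12.5 s gap should still split.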
5 issues found across 13 files
Confidence score: 3/5
- There is concrete regression risk in internal/processor/adaptive.go: if NoiseRemoveCompandEnabled is not re-enabled on the valid-profile path, one uncalibrated file can keep compand disabled for subsequent files in the same run.
- internal/processor/analyzer_candidates.go has user-facing detection risk: voice-activated mode is only inferred from silence candidates >=8s, so shorter digital-silence gaps may never activate widened speech tolerance.
- Logging/reporting issues in internal/logging/analysis_display.go and internal/logging/report.go can hide or misstate recording mode/disable reasons, which reduces operator visibility and can make troubleshooting harder even when processing succeeds.
- Pay close attention to internal/processor/adaptive.go, internal/processor/analyzer_candidates.go, internal/logging/analysis_display.go, and internal/logging/report.go: processing-state carryover and mode/disable-status reporting are the main risk areas.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="internal/processor/adaptive.go">
<violation number="1" location="internal/processor/adaptive.go:689">
P1: Re-enable `NoiseRemoveCompandEnabled` on the valid-profile path; otherwise one uncalibrated file disables compand for every later file in the same run.</violation>
</file>
<file name="internal/logging/analysis_display.go">
<violation number="1" location="internal/logging/analysis_display.go:130">
P2: The voice-activated indicator is omitted whenever an elected silence candidate exists, so analysis output hides the detected recording mode in the common success path.</violation>
<violation number="2" location="internal/logging/analysis_display.go:255">
P2: This message hard-codes the wrong disable reason; the compander is also disabled for invalid noise-floor calibration, not just a missing noise profile.</violation>
</file>
<file name="internal/logging/report.go">
<violation number="1" location="internal/logging/report.go:1613">
P2: The voice-activated indicator is only printed in some silence-report branches, so it disappears for reports that have a `NoiseProfile` or only `SilenceRegions`.</violation>
</file>
<file name="internal/processor/analyzer_candidates.go">
<violation number="1" location="internal/processor/analyzer_candidates.go:1030">
P1: This only detects voice-activated mode from >=8s silence candidates, so recordings with shorter digital-silence gaps never enable the widened speech tolerance.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
…alysis output

- Track derivation source (astats, RMS estimate, loudness estimate) in NoiseFloor Source field to distinguish between measured and fallback values
- Update analysis display to show the derived noise floor value with appropriate source label instead of displaying -Inf when astats returns infinity
- Add noiseFloorSourceLabel() helper to convert source identifier to user-facing label matching the measurement derivation tier

Fixes analysis output for recordings with no detectable noise floor (e.g. digital silence), ensuring the fallback-derived value is displayed with clarity about its source rather than the unmeaningful -Inf value.

Signed-off-by: Martin Wimpress <code@wimpress.io>
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="internal/processor/analyzer.go">
<violation number="1" location="internal/processor/analyzer.go:207">
P2: `NoiseFloorSource` is not updated when elected silence overrides `NoiseFloor`, so the exported source can disagree with the actual noise-floor value.</violation>
</file>
…utput

- Update threshold annotation to show room tone estimate explicitly with +1 dB offset
- Remove extraneous blank line between candidate count and voice-activated indicator
- Add blank line before SPEECH DETECTION section for consistent formatting

Signed-off-by: Martin Wimpress <code@wimpress.io>
…loor tracking

- Re-enable NoiseRemove compand when processing files with valid noise profiles after a file without a profile (prevents state leakage between files)
- Display voice-activated indicator in both elected and no-candidate branches for complete diagnostic information
- Change compander disabled message from condition-specific to generic "disabled" as it disables for both nil profile and non-negative noise floor
- Track NoiseFloorSource as "silence_profile" when elected silence region overrides the derived noise floor with its measured value

Added regression test for compand re-enabling on valid-profile path.

Signed-off-by: Martin Wimpress <code@wimpress.io>
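The state-leakage fix described in this commit could look something like the sketch below. It assumes a single FilterChainConfig is reused across files in one run, which is what the review comment implies. The `NoiseProfile` struct shape, its `NoiseFloor` field, and the `tuneCompand` helper are hypothetical stand-ins; only `NoiseRemoveCompandEnabled` and the two disable conditions (nil profile, non-negative noise floor) come from the PR text.

```go
package main

import "fmt"

// NoiseProfile is a hypothetical stand-in for the project's noise profile type.
type NoiseProfile struct {
	NoiseFloor float64 // dBFS; a calibrated value is expected to be negative
}

// FilterChainConfig shows only the field relevant to this sketch.
type FilterChainConfig struct {
	NoiseRemoveCompandEnabled bool
}

// tuneCompand sets the flag on every call instead of only clearing it,
// so a file without a usable profile cannot leave compand disabled for
// later files that do have one.
func tuneCompand(cfg *FilterChainConfig, profile *NoiseProfile) {
	if profile == nil || profile.NoiseFloor >= 0 {
		cfg.NoiseRemoveCompandEnabled = false
		return
	}
	cfg.NoiseRemoveCompandEnabled = true // explicit re-enable on the valid-profile path
}

func main() {
	cfg := &FilterChainConfig{NoiseRemoveCompandEnabled: true}

	tuneCompand(cfg, nil) // first file: no noise profile
	fmt.Println("after uncalibrated file:", cfg.NoiseRemoveCompandEnabled)

	tuneCompand(cfg, &NoiseProfile{NoiseFloor: -62.5}) // second file: valid profile
	fmt.Println("after calibrated file:", cfg.NoiseRemoveCompandEnabled)
}
```

Writing the flag in both branches is the key design choice here: the function's result depends only on the current file's profile, never on what an earlier file left behind.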
Summary
Add automatic detection of voice-activated recordings (Riverside/Zencastr) and an adaptive speech RMS minimum threshold. A recording is classified as voice-activated when at least 95% of its silence candidates are digital silence; for such recordings the speech interruption tolerance widens from 8 intervals (2 seconds) to 40 intervals (10 seconds) to bridge platform-inserted gaps. The speech RMS minimum changes from a fixed -40 dBFS to max(RMSLevel - 12, NoiseFloor + 6), which fixes speech detection in quiet recordings where the fixed threshold sat above most speech intervals.
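As a rough illustration of the adaptive threshold, the sketch below follows the formula given in the commit message, max(RMSLevel - 12, NoiseFloor + 6). The constant names and the 12.0 and 6.0 values come from that commit. The function signature, the -40.0 value assigned to `speechRMSMinimum` (taken from the "fixed -40 dBFS threshold" wording), and the fallback for non-finite inputs are assumptions; the merged code may handle these cases differently.

```go
package main

import (
	"fmt"
	"math"
)

const (
	speechRMSMinimum            = -40.0 // dBFS; assumed to be the previous fixed threshold, used here as fallback
	speechRMSMinimumOffset      = 12.0  // dB below the recording's overall RMS level
	speechRMSMinimumNoiseMargin = 6.0   // dB above the noise floor
)

// computeSpeechRMSMinimum returns the lowest RMS level (dBFS) an interval
// may have and still count as speech.
func computeSpeechRMSMinimum(rmsLevel, noiseFloor float64) float64 {
	// Assumption: fall back to the fixed threshold when either
	// measurement is NaN or infinite.
	for _, v := range []float64{rmsLevel, noiseFloor} {
		if math.IsNaN(v) || math.IsInf(v, 0) {
			return speechRMSMinimum
		}
	}
	return math.Max(rmsLevel-speechRMSMinimumOffset, noiseFloor+speechRMSMinimumNoiseMargin)
}

func main() {
	// Quiet recording: overall RMS of -50 dBFS with a -70 dBFS noise floor.
	// The candidates are -62 (RMS - 12) and -64 (noise floor + 6).
	fmt.Println(computeSpeechRMSMinimum(-50, -70))

	// Noisy recording: the noise-floor term dominates.
	fmt.Println(computeSpeechRMSMinimum(-30, -45))
}
```

In the quiet case the threshold lands well below the old fixed -40 dBFS, so intervals around -50 dBFS are no longer rejected outright. In the noisy case the noise-floor margin keeps the threshold from dropping into the noise.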
Changes
speechRMSMinimum = max(RMSLevel - 12, NoiseFloor + 6)
Testing