feat(processor): add voice-activated detection and adaptive speech RMS minimum#53

Merged
flexiondotorg merged 10 commits into main from detection
Mar 14, 2026

@flexiondotorg

Summary

Add automatic detection of voice-activated recordings (Riverside/Zencastr) and an adaptive speech RMS minimum threshold. A recording is classified as voice-activated when nearly all of its silence candidates are rejected as digital silence, which indicates that the platform only captured audio while someone was speaking. In that mode the speech interruption tolerance is widened to bridge the platform-inserted gaps. The speech RMS minimum changes from a fixed -40 dBFS to a value derived from the recording's RMS level and noise floor, which fixes speech detection in quiet recordings.

Changes

  • Detect voice-activated recordings from digital silence fraction in candidate selection
  • Widen speech interruption tolerance for voice-activated mode (40 intervals, 10 s, instead of the standard 8 intervals, 2 s)
  • Disable NoiseRemove compand expansion for uncalibrated recordings to avoid over-expansion
  • Implement adaptive speech RMS minimum (max(RMSLevel - 12, NoiseFloor + 6), replacing the fixed -40 dBFS threshold)
  • Add voice-activated indicator and compander annotation to analysis display
  • Update documentation for detection logic and silence candidate filtering
  • Update README with voice-activated recording capabilities
  • Improve test coverage for silence detection and analysis display output

Testing

  • Unit tests verify silence fraction detection for voice-activated recordings
  • Analysis display tests confirm correct output formatting with voice-activated and compander annotations
  • Adaptive RMS formula tested with diverse speech profile measurements
  • Anna's recording now detects 6 silence candidates instead of 1 after enabling voice-activated detection

…ce fraction

Add VoiceActivated field to AudioMeasurements and detectVoiceActivated()
helper that classifies recordings based on the fraction of
digital-silence-rejected candidates. When >= 95% of silence candidates
are digital silence, the recording is classified as voice-activated.
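
The classification rule above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the two-count signature, the constant name, and the handling of an empty candidate list are assumptions. Only the helper name and the 95% threshold come from the commit message.

```go
package main

import "fmt"

// voiceActivatedPercent is the threshold stated in the commit message:
// at least 95% of silence candidates rejected as digital silence.
// The constant name is hypothetical.
const voiceActivatedPercent = 95

// detectVoiceActivated reports whether a recording looks voice-activated.
// The signature is an assumption; the real helper may take other inputs.
func detectVoiceActivated(digitalSilenceRejected, totalCandidates int) bool {
	if totalCandidates <= 0 {
		// Assumption: no candidates means no evidence either way.
		return false
	}
	// Integer comparison avoids floating-point rounding at the boundary.
	return digitalSilenceRejected*100 >= totalCandidates*voiceActivatedPercent
}

func main() {
	fmt.Println(detectVoiceActivated(19, 20)) // 95% of candidates rejected
	fmt.Println(detectVoiceActivated(18, 20)) // 90% of candidates rejected
}
```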

Signed-off-by: Martin Wimpress <code@wimpress.io>
…ted recordings

Add voiceActivatedSpeechInterruptionToleranceIntervals constant (40
intervals, 10 seconds) to bridge platform-inserted gaps in
voice-activated media files.

- Add voiceActivated bool parameter to findSpeechCandidatesFromIntervals
- Implement conditional selection: 8 intervals (2s) for standard
  recordings, 40 intervals (10s) for voice-activated
- Update analyzer.go call site to pass measurements.VoiceActivated
- Extend analyzer_test with two new cases: 7.5s gap bridges with widened
  tolerance, 12.5s gap splits even with widened tolerance
- Update all 6 existing test call sites to pass false
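
A rough sketch of the conditional tolerance selection, for illustration only. The 250 ms interval length is inferred from "40 intervals, 10 seconds"; the name of the standard constant, the `bridgesGap` helper, and the inclusive comparison are assumptions rather than the code in this PR.

```go
package main

import "fmt"

const (
	// intervalSeconds is inferred from 40 intervals == 10 seconds.
	intervalSeconds = 0.25

	// Hypothetical name for the standard tolerance (8 intervals, 2 s).
	speechInterruptionToleranceIntervals = 8

	// Named in the commit message (40 intervals, 10 s).
	voiceActivatedSpeechInterruptionToleranceIntervals = 40
)

// interruptionTolerance selects the tolerance, in intervals, for the recording mode.
func interruptionTolerance(voiceActivated bool) int {
	if voiceActivated {
		return voiceActivatedSpeechInterruptionToleranceIntervals
	}
	return speechInterruptionToleranceIntervals
}

// bridgesGap reports whether a non-speech gap is short enough to be treated
// as part of one speech region. Hypothetical helper; the inclusive
// comparison is an assumption.
func bridgesGap(gapSeconds float64, voiceActivated bool) bool {
	gapIntervals := int(gapSeconds / intervalSeconds)
	return gapIntervals <= interruptionTolerance(voiceActivated)
}

func main() {
	fmt.Println(bridgesGap(7.5, true))  // 30 intervals against a tolerance of 40
	fmt.Println(bridgesGap(12.5, true)) // 50 intervals against a tolerance of 40
	fmt.Println(bridgesGap(7.5, false)) // 30 intervals against a tolerance of 8
}
```

The three calls mirror the new test cases described above: a 7.5 s gap bridges only with the widened tolerance, and a 12.5 s gap splits either way.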

Signed-off-by: Martin Wimpress <code@wimpress.io>
- Add NoiseRemoveCompandEnabled field to FilterChainConfig (default
  true)
- Modify buildNoiseRemoveFilter to return anlmdn-only spec when compand
  disabled
- Replace blind -55 dBFS fallback with explicit compand disabling when
  NoiseProfile unavailable
- Update tuneNoiseRemove to disable compand for recordings lacking noise
  calibration
- Add sub-tests verifying compand enable/disable paths in
  TestBuildNoiseRemoveFilter
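
As an illustration of the enable/disable paths, a stripped-down sketch might look like this. The struct is reduced to the one field named above, and the `anlmdn` and `compand` option values are placeholders chosen for the example, not the values this project uses.

```go
package main

import (
	"fmt"
	"strings"
)

// Placeholder FFmpeg filter specs; the option values are illustrative only.
const (
	anlmdnSpec  = "anlmdn=s=0.0001"
	compandSpec = "compand=attacks=0.005:decays=0.1:points=-80/-90|-55/-55|0/0"
)

// FilterChainConfig is reduced to the single field relevant here.
type FilterChainConfig struct {
	NoiseRemoveCompandEnabled bool
}

// buildNoiseRemoveFilter returns the spectral denoiser alone when compand is
// disabled, and denoiser followed by compand otherwise.
func buildNoiseRemoveFilter(cfg FilterChainConfig) string {
	stages := []string{anlmdnSpec}
	if cfg.NoiseRemoveCompandEnabled {
		stages = append(stages, compandSpec)
	}
	return strings.Join(stages, ",")
}

func main() {
	fmt.Println(buildNoiseRemoveFilter(FilterChainConfig{NoiseRemoveCompandEnabled: true}))
	fmt.Println(buildNoiseRemoveFilter(FilterChainConfig{NoiseRemoveCompandEnabled: false}))
}
```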

Signed-off-by: Martin Wimpress <code@wimpress.io>
…annotation

- Display "Voice-activated recording detected" in silence detection
  section when voice-activated recording is confirmed, covering both
  candidate selection paths (no elected candidate, no silence detected)
- Annotate filter adaptation section with "NR Compander: disabled"
  when noise profile unavailable
- Include "Voice-Activated: yes" annotation in detailed log file
  reports across candidates and none-found sections
- Add comprehensive test coverage with 8 test cases for display output,
  compander annotation, and diagnostic reporting

Signed-off-by: Martin Wimpress <code@wimpress.io>
Implement dynamic `speechRMSMin` calculation based on recording
characteristics:
- Add `speechRMSMinimum`, `speechRMSMinimumOffset` (12.0), and
  `speechRMSMinimumNoiseMargin` (6.0) constants
- Implement `computeSpeechRMSMinimum()` helper using formula:
  max(RMSLevel - 12, NoiseFloor + 6)
- Update `speechScore()` to accept adaptive threshold parameter
- Update `findSpeechCandidatesFromIntervals()` to derive threshold from
  measurements at call site
- Pass RMS and noise floor measurements through analyser pipeline

Fixes speech detection in quiet recordings where fixed -40 dBFS
threshold was above most speech intervals.
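
For illustration, the formula could be sketched as below. The constant names and the 12.0 and 6.0 values come from the commit message; treating `speechRMSMinimum` as the old -40 dBFS value and using it as the fallback for unusable measurements are assumptions, as is the two-argument signature.

```go
package main

import (
	"fmt"
	"math"
)

const (
	// Assumption: the previous fixed threshold doubles as the fallback.
	speechRMSMinimum = -40.0
	// Values stated in the commit message.
	speechRMSMinimumOffset      = 12.0
	speechRMSMinimumNoiseMargin = 6.0
)

// usable reports whether a dBFS measurement is a finite number.
func usable(db float64) bool {
	return !math.IsNaN(db) && !math.IsInf(db, 0)
}

// computeSpeechRMSMinimum returns max(RMSLevel - 12, NoiseFloor + 6),
// falling back to the fixed threshold when either input is unusable.
func computeSpeechRMSMinimum(rmsLevel, noiseFloor float64) float64 {
	if !usable(rmsLevel) || !usable(noiseFloor) {
		return speechRMSMinimum
	}
	return math.Max(rmsLevel-speechRMSMinimumOffset, noiseFloor+speechRMSMinimumNoiseMargin)
}

func main() {
	// Quiet recording: RMS -50 dBFS, noise floor -75 dBFS.
	// max(-50 - 12, -75 + 6) = max(-62, -69) = -62
	fmt.Println(computeSpeechRMSMinimum(-50, -75))
}
```

With these example numbers the threshold drops to -62 dBFS, so speech intervals around -50 dBFS are no longer rejected by the fixed -40 dBFS floor.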

- Add 6 test cases for `computeSpeechRMSMinimum()` covering edge cases
  and fallback behaviour
- Update 9 call sites across 8 sub-tests with new parameters

Signed-off-by: Martin Wimpress <code@wimpress.io>
…nd NR compander

- Update Pass 2 filter table to clarify compand residual suppression
  requires noise profile
- Add voice-activated recording detection (Riverside, Zencastr) to
  analysis-only output description
- Explain specific consequence of missing silence section (NR compander
  disabled, spectral denoiser runs alone)

Signed-off-by: Martin Wimpress <code@wimpress.io>
…tection capabilities

- Correct Pass 2 NoiseRemove row: compand disabled when no noise profile
  exists; note that anlmdn filter always active
- Qualify compand dependency on noise profile in Section 4 capabilities
- Add two newly implemented Speech-Aware Processing capabilities:
  * Digital silence rejection in room tone candidate selection
  * Voice-activated recording detection from silence fraction

Signed-off-by: Martin Wimpress <code@wimpress.io>

@cubic-dev-ai cubic-dev-ai bot left a comment

5 issues found across 13 files

Confidence score: 3/5

  • There is concrete regression risk in internal/processor/adaptive.go: if NoiseRemoveCompandEnabled is not re-enabled on the valid-profile path, one uncalibrated file can keep compand disabled for subsequent files in the same run.
  • internal/processor/analyzer_candidates.go has user-facing detection risk: voice-activated mode is only inferred from silence candidates >=8s, so shorter digital-silence gaps may never activate widened speech tolerance.
  • Logging/reporting issues in internal/logging/analysis_display.go and internal/logging/report.go can hide or misstate recording mode/disable reasons, which reduces operator visibility and can make troubleshooting harder even when processing succeeds.
  • Pay close attention to internal/processor/adaptive.go, internal/processor/analyzer_candidates.go, internal/logging/analysis_display.go, and internal/logging/report.go - processing-state carryover and mode/disable-status reporting are the main risk areas.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/processor/adaptive.go">

<violation number="1" location="internal/processor/adaptive.go:689">
P1: Re-enable `NoiseRemoveCompandEnabled` on the valid-profile path; otherwise one uncalibrated file disables compand for every later file in the same run.</violation>
</file>

<file name="internal/logging/analysis_display.go">

<violation number="1" location="internal/logging/analysis_display.go:130">
P2: The voice-activated indicator is omitted whenever an elected silence candidate exists, so analysis output hides the detected recording mode in the common success path.</violation>

<violation number="2" location="internal/logging/analysis_display.go:255">
P2: This message hard-codes the wrong disable reason; the compander is also disabled for invalid noise-floor calibration, not just a missing noise profile.</violation>
</file>

<file name="internal/logging/report.go">

<violation number="1" location="internal/logging/report.go:1613">
P2: The voice-activated indicator is only printed in some silence-report branches, so it disappears for reports that have a `NoiseProfile` or only `SilenceRegions`.</violation>
</file>

<file name="internal/processor/analyzer_candidates.go">

<violation number="1" location="internal/processor/analyzer_candidates.go:1030">
P1: This only detects voice-activated mode from >=8s silence candidates, so recordings with shorter digital-silence gaps never enable the widened speech tolerance.</violation>
</file>

…alysis output

- Track derivation source (astats, RMS estimate, loudness estimate) in
  NoiseFloor Source field to distinguish between measured and fallback
  values
- Update analysis display to show the derived noise floor value with
  appropriate source label instead of displaying -Inf when astats
  returns infinity
- Add noiseFloorSourceLabel() helper to convert source identifier to
  user-facing label matching the measurement derivation tier

Fixes analysis output for recordings with no detectable noise floor
(e.g. digital silence), so that the fallback-derived value is displayed
together with its source rather than a meaningless -Inf.
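
A small sketch of how such a label helper might work. The source identifiers (`astats`, `rms_estimate`, `loudness_estimate`), the label wording, and the `formatNoiseFloor` helper are hypothetical; only the `noiseFloorSourceLabel` name and the three derivation tiers come from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// noiseFloorSourceLabel maps a source identifier to a user-facing label.
// The identifiers and label text are assumptions for this example.
func noiseFloorSourceLabel(source string) string {
	switch source {
	case "astats":
		return "measured"
	case "rms_estimate":
		return "estimated from RMS"
	case "loudness_estimate":
		return "estimated from loudness"
	default:
		return "unknown source"
	}
}

// formatNoiseFloor renders the derived value with its source label and never
// prints a non-finite number. Hypothetical helper.
func formatNoiseFloor(value float64, source string) string {
	if math.IsInf(value, 0) || math.IsNaN(value) {
		return "unavailable"
	}
	return fmt.Sprintf("%.1f dBFS (%s)", value, noiseFloorSourceLabel(source))
}

func main() {
	fmt.Println(formatNoiseFloor(-72.5, "rms_estimate"))
	fmt.Println(formatNoiseFloor(math.Inf(-1), "astats"))
}
```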

Signed-off-by: Martin Wimpress <code@wimpress.io>

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/processor/analyzer.go">

<violation number="1" location="internal/processor/analyzer.go:207">
P2: `NoiseFloorSource` is not updated when elected silence overrides `NoiseFloor`, so the exported source can disagree with the actual noise-floor value.</violation>
</file>

…utput

- Update threshold annotation to show room tone estimate explicitly with
  +1 dB offset
- Remove extraneous blank line between candidate count and
  voice-activated indicator
- Add blank line before SPEECH DETECTION section for consistent
  formatting

Signed-off-by: Martin Wimpress <code@wimpress.io>
…loor tracking

- Re-enable NoiseRemove compand when processing files with valid noise
  profiles after a file without a profile (prevents state leakage
  between files)
- Display voice-activated indicator in both elected and no-candidate
  branches for complete diagnostic information
- Change compander disabled message from condition-specific to generic
  "disabled" as it disables for both nil profile and non-negative noise
  floor
- Track NoiseFloorSource as "silence_profile" when elected silence
  region overrides the derived noise floor with its measured value

Added regression test for compand re-enabling on valid-profile path.
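
The state-leakage fix can be illustrated with a reduced sketch. The types are cut down to the fields mentioned in this PR and the signature is an assumption; the point is that the flag is assigned on both paths, so a config reused across files cannot carry a stale value.

```go
package main

import "fmt"

// NoiseProfile and FilterChainConfig are reduced to the fields needed here.
type NoiseProfile struct {
	NoiseFloor float64 // dBFS; a calibrated value is expected to be negative
}

type FilterChainConfig struct {
	NoiseRemoveCompandEnabled bool
}

// tuneNoiseRemove sets the compand flag explicitly on every call.
// The signature is hypothetical.
func tuneNoiseRemove(cfg *FilterChainConfig, profile *NoiseProfile) {
	if profile == nil || profile.NoiseFloor >= 0 {
		// No usable calibration: run the spectral denoiser alone.
		cfg.NoiseRemoveCompandEnabled = false
		return
	}
	// Valid profile: re-enable compand even if a previous file disabled it.
	cfg.NoiseRemoveCompandEnabled = true
}

func main() {
	cfg := FilterChainConfig{NoiseRemoveCompandEnabled: true}

	tuneNoiseRemove(&cfg, nil) // first file: no noise profile
	fmt.Println(cfg.NoiseRemoveCompandEnabled)

	tuneNoiseRemove(&cfg, &NoiseProfile{NoiseFloor: -60}) // second file: valid profile
	fmt.Println(cfg.NoiseRemoveCompandEnabled)
}
```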

Signed-off-by: Martin Wimpress <code@wimpress.io>

@cubic-dev-ai cubic-dev-ai bot left a comment

0 issues found across 5 files (changes from recent commits).

Requires human review: Modifies core audio processing logic, including adaptive thresholding and filter chain configuration, which requires human domain expertise to validate.

@flexiondotorg flexiondotorg merged commit 057c206 into main Mar 14, 2026
7 checks passed
@flexiondotorg flexiondotorg deleted the detection branch March 14, 2026 18:33