Improve numerical stability of variance calculation in envelope #107

Merged

networmix merged 1 commit into main from claude/fix-major-bug-4sCRZ on Feb 7, 2026

@networmix (Owner) commented Feb 7, 2026

Summary

Refactored the variance calculation in Envelope.from_values() to use a numerically stable two-pass algorithm instead of the computational formula E[X²] - (E[X])², improving accuracy when values are large in magnitude or nearly identical.

Changes

  • Replaced the computational variance formula with the numerically stable sum-of-squared-deviations approach: sum((x - mean)²) / n
  • Removed the single-pass sum_squares accumulation, which suffers from catastrophic cancellation when values are large relative to their spread
  • Implemented a two-pass algorithm: the first pass computes the mean and the frequency map; the second pass computes the variance from deviations from the mean
  • Optimized for duplicate values: the second pass iterates over the frequency map rather than the raw values, which is cheaper when Monte Carlo results contain many duplicates
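The two-pass scheme in the list above can be sketched as a standalone Python function. This is a minimal sketch under stated assumptions: `mean_and_stdev` is a hypothetical helper, not the actual `from_values()` code, and it assumes a non-empty iterable of floats and the population (divide-by-n) variance used in the PR.

```python
from collections import Counter
from typing import Iterable


def mean_and_stdev(values: Iterable[float]) -> tuple[float, float]:
    """Population mean and standard deviation via a two-pass computation.

    Hypothetical helper illustrating the approach; not the project's code.
    """
    # First pass: build the frequency map {value: count} and the running total.
    frequencies: Counter = Counter()
    total = 0.0
    n = 0
    for value in values:
        frequencies[value] += 1
        total += value
        n += 1
    if n == 0:
        raise ValueError("need at least one value")
    mean = total / n

    # Second pass over unique values only: accumulate count * (value - mean)^2.
    variance_sum = 0.0
    for value, count in frequencies.items():
        diff = value - mean
        variance_sum += count * diff * diff

    # Population variance (divide by n); a sum of squares, so never negative.
    return mean, (variance_sum / n) ** 0.5
```

For example, `mean_and_stdev([10.0, 10.0, 10.0, 20.0])` builds the frequency map `{10.0: 3, 20.0: 1}`, so the second pass touches two entries instead of four, and it returns a mean of 12.5 and a standard deviation of sqrt(18.75), roughly 4.33.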

Implementation Details

The new approach trades a second iteration (over unique values only) for significantly improved numerical stability. This is particularly beneficial when:

  • Values have large magnitudes relative to their spread (E[X²] and (E[X])² are then large and nearly equal, so their difference loses most of its significant digits)
  • There are many duplicate values (iterating over the frequency map is cheaper than iterating over all raw values)

The change keeps the overall complexity at O(n), since the second pass touches at most n unique values, while providing better accuracy for these edge cases.
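To make the cancellation concrete, the following sketch compares the two formulas on three values near 1e9 whose true population variance is 2/3. Both functions are hypothetical illustrations written for this note, not code from the PR.

```python
def naive_variance(values):
    """Computational formula E[X^2] - (E[X])^2 (single pass, cancellation-prone)."""
    n = len(values)
    total = 0.0
    sum_squares = 0.0
    for x in values:
        total += x
        sum_squares += x * x
    mean = total / n
    return sum_squares / n - mean * mean


def two_pass_variance(values):
    """sum((x - mean)^2) / n: subtract the mean before squaring."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


values = [1e9 + 1, 1e9 + 2, 1e9 + 3]  # true population variance is 2/3
print(naive_variance(values))     # 0.0: squares near 1e18 round to multiples of 128
print(two_pass_variance(values))  # 0.6666666666666666
```

Both results come from the same three inputs; only the order of operations differs. Subtracting the mean first keeps the small deviations (-1, 0, 1) exact, whereas the naive version squares first and loses them to rounding.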

https://claude.ai/code/session_01BH7FXdY35eRtf98jo8kQiG


Note

Low Risk
Small, localized change to a statistical calculation; main risk is minor numeric/behavioral drift in reported stdev_capacity for some datasets.

Overview
Updates CapacityEnvelope.from_values() variance/stddev computation to a numerically stable two-pass approach (sum((x-mean)^2)/n) instead of E[X^2]-(E[X])^2, removing the sum_squares accumulator.

The second pass iterates over the computed frequencies map (unique values) to keep performance reasonable when Monte Carlo outputs contain many duplicates, while leaving the envelope output fields unchanged.
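Written over the frequency map, where each unique value v occurs c_v times, the old and new formulas compare as follows. They are identical in exact arithmetic; the difference lies purely in floating-point behavior.

```latex
\begin{aligned}
n &= \sum_{v} c_v, \qquad \mu = \frac{1}{n}\sum_{v} c_v\, v \\
\sigma^2_{\text{old}} &= \frac{1}{n}\sum_{v} c_v\, v^2 \;-\; \mu^2 \\
\sigma^2_{\text{new}} &= \frac{1}{n}\sum_{v} c_v\,(v - \mu)^2
\end{aligned}
```

In the old form both terms are of order μ², so when |μ| is much larger than σ their difference keeps few correct digits and can even come out negative. Every term of the new form is non-negative, so the result cannot be negative.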

Written by Cursor Bugbot for commit c6ade7e.

…m_values()

The computational variance formula E[X²] - E[X]² suffers from catastrophic
floating-point cancellation when capacity values are large or nearly identical.
This produced silently wrong stdev values (e.g., 41 million instead of 0 for
identical values) or complex numbers when the computed variance went negative.

Replace with the numerically stable two-pass formula sum((x - mean)²) / n,
iterating over the frequency map for efficiency with duplicate values.

https://claude.ai/code/session_01BH7FXdY35eRtf98jo8kQiG
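The complex-number symptom mentioned in the commit message comes from how Python's `**` operator treats a negative base: raising a negative float to the power 0.5 returns a `complex` instead of raising an error, whereas `math.sqrt` raises `ValueError`. A minimal sketch, where the negative variance value is a hypothetical stand-in for a cancellation result:

```python
import math

# Hypothetical stand-in for a variance that cancellation pushed slightly below zero.
negative_variance = -1e-7

# ** with a fractional exponent returns a complex number for a negative float base,
# so the bad value propagates silently into stdev.
stdev_pow = negative_variance ** 0.5
print(type(stdev_pow).__name__)  # complex

# math.sqrt rejects the same input with ValueError instead.
try:
    math.sqrt(negative_variance)
    sqrt_raised = False
except ValueError:
    sqrt_raised = True
print(sqrt_raised)  # True
```

Because the two-pass formula sums non-negative terms, the variance it produces is never negative, so the existing `** 0.5` is safe after this change.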
Copilot AI review requested due to automatic review settings February 7, 2026 10:22

Copilot AI left a comment


Pull request overview

Refactors the capacity variance/standard deviation computation in CapacityEnvelope.from_values() to reduce catastrophic cancellation risk for large-magnitude inputs, while leveraging the existing frequency map representation used for Monte Carlo outputs.

Changes:

  • Replaced the computational variance formula (E[X²] - (E[X])²) with a deviation-based computation over the frequency map.
  • Removed the sum_squares single-pass accumulation and added a second pass over unique values to compute variance.


Comment on lines +77 to +78
# First pass: build frequency map and compute mean
frequencies = {}

Copilot AI Feb 7, 2026


PR description mentions refactoring Envelope.from_values(), but the code change is in CapacityEnvelope.from_values(). Please update the description (or code) to match the actual API being modified to avoid confusion for reviewers and future readers.

Owner Author


@claude need an update?

Comment on lines +96 to +104
# Second pass over unique values: compute variance using the
# numerically stable formula sum((x - mean)^2) / n.
# Iterating over the frequency map is efficient when there are
# many duplicate values (common in Monte Carlo results).
variance_sum = 0.0
for value, count in frequencies.items():
    diff = value - mean_capacity
    variance_sum += count * diff * diff
stdev_capacity = (variance_sum / n) ** 0.5

Copilot AI Feb 7, 2026


The variance/stdev computation was changed to a new two-pass algorithm, but there are no unit tests asserting stdev_capacity correctness or demonstrating improved numerical stability (e.g., large-magnitude values that cause catastrophic cancellation in the old formula, and duplicate-heavy inputs). Please add tests that validate mean_capacity/stdev_capacity for representative cases and edge cases.
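A sketch of the kind of tests this comment asks for, assuming a hypothetical helper `stdev_from_values` that reproduces the PR's two-pass logic. Real tests would call `CapacityEnvelope.from_values()` and read `stdev_capacity`, whose exact signature is not shown here. The standard library's `statistics.pstdev` serves as the reference population standard deviation.

```python
import math
import statistics
from collections import Counter


def stdev_from_values(values):
    """Hypothetical stand-in for the two-pass stdev logic in from_values()."""
    # First pass: mean and frequency map.
    n = len(values)
    mean = sum(values) / n
    frequencies = Counter(values)
    # Second pass over unique values: population variance.
    variance_sum = 0.0
    for value, count in frequencies.items():
        diff = value - mean
        variance_sum += count * diff * diff
    return (variance_sum / n) ** 0.5


def test_identical_large_values_have_zero_stdev():
    # The commit message reports the old formula giving a nonzero stdev for identical values.
    assert stdev_from_values([1e12] * 1000) == 0.0


def test_large_magnitude_small_spread_matches_reference():
    values = [1e9 + 1, 1e9 + 2, 1e9 + 3]
    assert math.isclose(stdev_from_values(values), statistics.pstdev(values), rel_tol=1e-12)


def test_duplicate_heavy_input_matches_reference():
    values = [100.0] * 900 + [250.0] * 90 + [1000.0] * 10
    assert math.isclose(stdev_from_values(values), statistics.pstdev(values), rel_tol=1e-12)
```

Under pytest the three `test_*` functions are collected automatically; `statistics.pstdev` works from exact fractions internally, which makes it a convenient ground truth for edge cases.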

@networmix networmix merged commit 04e88bb into main Feb 7, 2026
13 checks passed
@networmix networmix deleted the claude/fix-major-bug-4sCRZ branch February 7, 2026 13:34