#908)

NaN values in bin weights caused two classes of failures:
- CBS segmentation: R's CNA() dropped NaN rows internally, creating a length mismatch with the weights vector (#436, fixed in R script by #914)
- segment_mean / np.average: NaN weights produced ZeroDivisionError or silent NaN propagation through sex inference and segmetrics (#908)

Root cause: apply_weights() used np.clip() to bound weights, but np.clip(NaN, min, max) returns NaN rather than clamping to min.

Fix at three layers:
- apply_weights(): replace NaN with epsilon before clipping (prevents NaN weights from ever being written to .cnr files)
- segment_mean(): filter NaN weights before np.average (handles .cnr files from older CNVkit versions)
- calc_intervals(), group_by_genes(): guard np.average calls against NaN weights in bootstrap CI and gene-level depth calculations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
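The root cause named in the commit message can be reproduced in isolation. The following is a minimal sketch, not the actual CNVkit code: `bound_weights` and the `EPSILON` value are hypothetical names chosen for illustration, and only the NumPy calls are real.

```python
import numpy as np

EPSILON = 1e-4  # hypothetical lower bound; the value CNVkit uses may differ


def bound_weights(weights, epsilon=EPSILON):
    """Clamp weights to [epsilon, 1.0], mapping NaN to epsilon first.

    Illustrative helper only; it is not part of CNVkit's API.
    """
    weights = np.asarray(weights, dtype=float)
    # np.clip propagates NaN, so NaN has to be replaced before clipping.
    cleaned = np.where(np.isnan(weights), epsilon, weights)
    return np.clip(cleaned, epsilon, 1.0)


raw = np.array([0.5, np.nan, 2.0, -1.0])
clipped_only = np.clip(raw, EPSILON, 1.0)  # the NaN survives the clip
bounded = bound_weights(raw)               # the NaN becomes EPSILON
```

Replacing NaN before the clip keeps the array length unchanged, which matters when the weights have to stay aligned with the bins they describe.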
@claude Please review.
Claude finished @etal's task

Code Review for NaN Weight Handling Fix

I'll review this PR that fixes NaN weight handling in segmentation and metrics.

Todo List:

Summary

This is a well-implemented fix for a critical issue. The root cause analysis is correct:

Core Fix Analysis
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## master #1036 +/- ##
=======================================
Coverage 65.67% 65.68%
=======================================
Files 72 72
Lines 7362 7375 +13
Branches 1290 1294 +4
=======================================
+ Hits 4835 4844 +9
- Misses 2093 2094 +1
- Partials 434 437 +3
Combine the redundant valid.all()/valid.any() branches into a single valid.any() check: when all weights are valid, indexing with the mask is equivalent to using the original arrays.

Also log when NaN weights are found in apply_weights() so users can see when degenerate bins are being handled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
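The single-branch guard described in this commit might look roughly like the sketch below. The function name `weighted_mean_ignoring_nan` and the fallback to an unweighted mean are assumptions made for illustration; the fallback that segment_mean() actually uses is not shown in this conversation.

```python
import numpy as np


def weighted_mean_ignoring_nan(values, weights):
    """Weighted mean that skips entries whose weight is NaN.

    Hypothetical helper for illustration; not CNVkit's segment_mean().
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    valid = ~np.isnan(weights)
    if valid.any():
        # One branch covers both cases: with no NaN weights the mask
        # selects every element, so indexing changes nothing.
        return np.average(values[valid], weights=weights[valid])
    # Assumed fallback when every weight is NaN: plain unweighted mean.
    return np.mean(values)
```

This sketch filters NaN weights only. Valid weights that sum to zero would still make np.average raise, so a real guard may need an extra check for that case.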
Summary
- Fix apply_weights() to replace NaN with epsilon before np.clip(), which does not clamp NaN values; this is the root cause of NaN weights in .cnr files
- Add defensive guards to segment_mean(), calc_intervals(), and group_by_genes() to handle .cnr files produced by older CNVkit versions
- Add tests for segment_mean and apply_weights

Addresses #436, #908.
Details
Users reported crashes during CBS segmentation and segment_mean when .cnr files contained NaN weight values. The root cause: apply_weights() computes weights via formulas that can produce NaN (e.g. from degenerate coverage or NaN spread), then calls weights.clip(epsilon, 1.0), but np.clip(NaN, min, max) returns NaN per IEEE 754.

These NaN weights then caused:
- CBS segmentation (#436): R's CNA() dropped NaN rows internally, creating a length mismatch with the weights vector (already fixed in the R script by #914, "Prevent CBS segmentation failures due to nulls in input .cnr")
- segment_mean (#908, "targeted DNA seq"): np.average() with NaN or zero-sum weights led to ZeroDivisionError or silent NaN propagation through sex inference

The fix prevents NaN weights at the source and adds defensive guards at downstream np.average() call sites.

Test plan
- test_segment_mean_nan_weights: partial NaN, all-NaN, and clean weights
- test_apply_weights_no_nan: NaN spread in reference produces valid weights

🤖 Generated with Claude Code
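For context on what these tests guard against, the two np.average() failure modes can be reproduced with NumPy alone. This is an illustrative snippet with made-up sample values, not code from the PR or its test suite.

```python
import numpy as np

values = np.array([0.1, -0.2, 0.3])  # made-up log2 ratios

# Failure mode 1: a single NaN weight silently turns the result into NaN.
nan_result = np.average(values, weights=np.array([1.0, np.nan, 1.0]))

# Failure mode 2: weights that sum to zero raise ZeroDivisionError.
try:
    np.average(values, weights=np.zeros(3))
    zero_sum_raised = False
except ZeroDivisionError:
    zero_sum_raised = True
```

The first case is the harder one to notice in practice, since nothing fails until a downstream step consumes the NaN.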