Conversation
|
@claude Please review. |
|
Claude finished @etal's task —— View job Code Review: Fix NaN-poisoning issue in segment weightsReviewing PR that fixes issues #436, #908, #901, #896 related to NaN-poisoning in segment weights. Review Progress:
SummaryThis PR effectively addresses NaN weight propagation issues throughout the CNVkit segmentation pipeline. The changes are well-implemented and comprehensively tested. Strengths✅ Comprehensive NaN handling: The fixes properly handle NaN weights at multiple critical points:
✅ Excellent test coverage: Added 8 comprehensive test cases covering all the major NaN scenarios:
✅ Follows project conventions:
✅ Clinical safety: The changes preserve numerical output stability - critical for downstream clinical pipelines Key Improvementssegmentation/init.py:406-407 - Fixed the core issue: # Use nansum so NaN weights don't propagate into .cns output
seg_wt = float(np.nansum(wt))
if seg_wt > 0:
valid = ~np.isnan(wt)
seg_dp = np.average(bin_depths[bin_idx][valid], weights=wt[valid])segfilters.py:113-119 - Robust weight filtering: def _wavg(col: str) -> float:
if region_weight > 0 and valid_wt.any():
return float(np.average(cnarr[col].to_numpy()[valid_wt], weights=wt[valid_wt]))
return float(np.mean(cnarr[col]))reports.py:402-408 - Prevents NaN poisoning in gene statistics: weights = bins["weight"].to_numpy() if "weight" in bins else np.ones(len(bins_log2))
valid = ~np.isnan(weights)
if not valid.all():
bins_log2 = bins_log2[valid]
weights = weights[valid]Minor Suggestions
ConclusionThis is a high-quality fix that addresses a real production issue. The implementation is robust, well-tested, and follows project best practices. The changes maintain backward compatibility while fixing the NaN propagation bugs. Recommendation: Approve ✅ |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1045 +/- ##
==========================================
+ Coverage 66.23% 66.41% +0.18%
==========================================
Files 72 72
Lines 7396 7409 +13
Branches 1301 1303 +2
==========================================
+ Hits 4899 4921 +22
+ Misses 2057 2049 -8
+ Partials 440 439 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
#908) Guard all downstream np.average/np.sum calls against NaN weights from .cnr files produced by older CNVkit versions or external tools: - transfer_fields: use np.nansum so NaN weights don't propagate into .cns - squash_region: NaN-masked np.average and weighted_median calls - bic: treat NaN weights same as zero for RSS fallback - compute_gene_stats: filter NaN weights before bootstrap CI/PI Review fixes: group_by_genes depth fallback, type annotations, test precision - group_by_genes: use np.nansum for weight aggregation; fall back to unweighted mean depth when all weights are NaN - shift_xx: widen is_xx parameter to accept bool | numpy.bool_ | None - do_genemetrics: fix is_sample_female annotation (None -> bool | None) - test_segment_mean_nan_weights: assert exact expected values, not just non-NaN Document NaN weight handling in docstrings and BIC fallback logic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes #436, #908, #901, #896.