docs(methods): consolidate 5 methods pages to 2 (preserve WASP2-unique contracts)#103

Merged
Jaureguy760 merged 2 commits into dev from feat/docs-methods-consolidation
Apr 19, 2026

@Jaureguy760
Collaborator

Summary

Follow-up to the user's "delete the stats stuff" request. After running a `/co-scientist` hypothesis-critic and a `/sophia-claudette` decision-auditor on the blanket-deletion proposal, both agents independently identified 4 WASP2-unique contracts in the methods pages that are not in any paper or textbook and would be silently lost by full deletion. The agreed reframe: consolidate instead of erase.

What changed

Before (5 methods pages, 883 lines):

  • counting_algorithm.rst (193)
  • mapping_filter.rst (329)
  • statistical_models.rst (173)
  • dispersion_estimation.rst (126)
  • fdr_correction.rst (66)
  • methods/index.rst (49; landing page plus 3 citations, 2 of them unused)

After (2 methods pages, ~500 lines):

  • mapping_filter.rst (~335 — added counting-side canonical-filter paragraph)
  • statistical_models.rst (~180 — absorbs dispersion + FDR + defaults)

WASP2-unique contracts preserved

Four load-bearing pieces flagged by the review agents, all preserved:

  1. Canonical filter contract at count step: only the `0x4` unmapped filter is applied; there is no secondary/supplementary/duplicate/QC-fail filtering at count time. Not in van de Geijn 2015. Previously at `counting_algorithm.rst:69-75`, now a paragraph in `mapping_filter.rst`.
  2. Profile-LRT ρ convention: ρ is held at its null-model MLE when maximizing L₁; WASP2 does NOT jointly re-estimate it. This was the scientific-correctness fix just shipped in PR #99 ("docs: reduce text volume (~47%) + fix scientific correctness issues"). Preserved as an `.. important::` admonition in `statistical_models.rst`.
  3. BH-NaN propagation warning: `np.minimum.accumulate` silently propagates NaN through cumulative minima, whereas `scipy.stats.false_discovery_control` raises. Matches our own MEMORY.md pitfall. Preserved as a `.. warning::` in `statistical_models.rst`.
  4. WASP2 defaults and bounds: pseudocount=1, min_count=10, ρ ∈ (1e-6, 1-1e-6), logit clip ±10. Preserved as a parameter table in `statistical_models.rst`.

What got cut

  • Generic biological-rationale prose ("ASE arises from...", FWER/FDR definitions, binomial-vs-beta-binomial motivation)
  • CIGAR-operations reference table (standard SAM spec — trivially findable)
  • Q-score formula `Q = -10 log10(P_error)` (textbook)
  • Unused Storey q-value code block (WASP2 uses BH, not Storey)
  • `methods/index.rst` landing page (just a captioned toctree + duplicate citations)

Cross-references updated

  • `faq.rst`: `methods/fdr_correction` → `methods/statistical_models` (the NaN warning lives there now)
  • `tutorials/bulk_workflow.rst`: removed duplicate `fdr_correction` ref from the FDR paragraph
  • Top-level `index.rst` toctree: 6 methods entries → 2

Side benefit: fixes pre-existing duplicate-citation warnings

Removing `methods/index.rst` and deleting `dispersion_estimation.rst` eliminates the 3 duplicate-citation warnings (Kumasaka2016, vandeGeijn2015) that shipped with Tier-B.

Test plan

  • `sphinx-build -q` completes with zero new warnings
  • Zero broken `:doc:` refs (grep-verified across all docs)
  • No Python changes → ruff/pytest/Rust tests are no-ops (but CI will run them)
  • Master CI on merge commit
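
The "zero broken `:doc:` refs" check was done with grep; a scripted equivalent could look like the sketch below. The helper names, the regex, and the `docs/source` path are assumptions for illustration, and the pattern only catches `:doc:` roles whose targets are written relative to the source root (not toctree entries or `../`-style relative targets).

```python
import re
from pathlib import Path

# Pages removed by this PR, written as :doc: targets (no .rst suffix).
REMOVED = {
    "methods/counting_algorithm",
    "methods/dispersion_estimation",
    "methods/fdr_correction",
    "methods/index",
}

# Matches :doc:`target` and :doc:`Title <target>`, with an optional leading "/".
DOC_ROLE = re.compile(r":doc:`(?:[^`<]*<)?/?([^`>]+)>?`")


def stale_doc_refs(text, removed=REMOVED):
    """Return the sorted set of :doc: targets in `text` that point at removed pages."""
    return sorted({t.strip() for t in DOC_ROLE.findall(text) if t.strip() in removed})


def scan(source_dir):
    """Map each .rst file under `source_dir` to the stale targets it still references."""
    hits = {}
    for path in Path(source_dir).rglob("*.rst"):
        found = stale_doc_refs(path.read_text(encoding="utf-8"))
        if found:
            hits[str(path)] = found
    return hits


if __name__ == "__main__":
    # "docs/source" is an assumed location; an empty dict means no stale refs.
    print(scan("docs/source"))
```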

Cumulative reduction (across Tier-B + Tier-D + Tier-E)

  • docs/source: ~6,300 → ~2,680 lines (−57%)
  • Methods section: 1,357 → ~500 lines (−63%)
  • Tutorial section: 3,495 → 520 lines (−85%)
  • Total Sphinx pages: 30 → 20

…rsion/FDR

Rewrite statistical_models.rst as single ~180-line page covering:
- Beta-binomial model (unchanged essentials)
- LRT with profile-likelihood convention (ρ held at null MLE, df=1)
- Dispersion: single vs linear, AIC/BIC choice, ρ bounds (1e-6, 1-1e-6),
  logit clip (±10) — all WASP2-specific implementation contracts not
  in any paper
- Phased vs unphased (uniform prior over 2^(n-1) configurations)
- BH via scipy.stats.false_discovery_control
- WASP2 defaults: pseudocount=1, min_count=10
- NaN-propagation warning for manual BH (matches MEMORY.md pitfall)
- Output columns
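
For readers unfamiliar with the profile-likelihood convention listed above, a minimal sketch follows. It is not WASP2's implementation: the mean/dispersion parameterisation of the beta-binomial, the bounded scalar optimiser, the single shared ρ, and the function names are all assumptions chosen for illustration. Only the ρ bounds, the "ρ held at null MLE" rule, and df=1 come from this PR.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import betabinom, chi2

RHO_MIN, RHO_MAX = 1e-6, 1 - 1e-6  # bounds quoted in this PR


def neg_loglik(mu, rho, ref, total):
    """Negative beta-binomial log-likelihood with mean mu and dispersion rho (assumed form)."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    return -betabinom.logpmf(ref, total, a, b).sum()


def profile_lrt(ref, total):
    """LRT for mu != 0.5 with rho fixed at its null-model MLE (hypothetical helper)."""
    ref = np.asarray(ref)
    total = np.asarray(total)
    # Null model: mu fixed at 0.5, rho estimated by maximum likelihood.
    null = minimize_scalar(
        lambda r: neg_loglik(0.5, r, ref, total),
        bounds=(RHO_MIN, RHO_MAX), method="bounded",
    )
    rho0 = null.x
    # Alternative model: maximise over mu only; rho is NOT re-estimated.
    alt = minimize_scalar(
        lambda m: neg_loglik(m, rho0, ref, total),
        bounds=(1e-6, 1 - 1e-6), method="bounded",
    )
    stat = max(0.0, 2.0 * (null.fun - alt.fun))  # guard against optimiser noise
    return stat, chi2.sf(stat, df=1), rho0, alt.x


stat, pval, rho0, mu_hat = profile_lrt([30, 28, 35, 31], [40, 40, 45, 40])
print(stat, pval, rho0, mu_hat)
```

Because only one parameter (the mean) is freed between the two fits, the statistic is compared against a chi-squared distribution with one degree of freedom.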

Extends mapping_filter.rst canonical-filter contract section with a
single paragraph stating the same contract applies at the counting
step (only 0x4 unmapped filter). This was previously buried in
counting_algorithm.rst:69-75.
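
As a concrete reading of that contract, the sketch below tests SAM FLAG values directly. The function name is hypothetical and this is not the WASP2 counting code; the bit values are the standard SAM ones.

```python
# Standard SAM FLAG bits.
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def kept_at_count_step(flag):
    """True if a read with this FLAG passes the count-step filter (only 0x4 is checked)."""
    return not (flag & UNMAPPED)


for name, flag in [
    ("mapped primary", 0x0),
    ("unmapped", UNMAPPED),
    ("secondary", SECONDARY),
    ("QC-fail", QC_FAIL),
    ("duplicate", DUPLICATE),
    ("supplementary", SUPPLEMENTARY),
]:
    print(f"{name:15s} kept={kept_at_count_step(flag)}")
```

Under this reading, secondary, supplementary, duplicate, and QC-fail reads all reach the counter unless an upstream step has already removed them.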

Preserves four WASP2-unique contracts flagged by both the sophia
decision-auditor and co-scientist hypothesis-critic:
1. Canonical filter contract at count step
2. Profile-LRT ρ convention (fix from PR #99)
3. BH-NaN trap warning
4. Default parameter values + ρ/logit bounds

Next commit will delete counting_algorithm.rst, dispersion_estimation.rst,
fdr_correction.rst and update toctree + cross-refs.

Removed: counting_algorithm.rst, dispersion_estimation.rst,
fdr_correction.rst, methods/index.rst.

All load-bearing WASP2-unique content (canonical filter contract at
count step, profile-LRT ρ convention, BH-NaN trap, default values,
ρ/logit bounds) was absorbed into statistical_models.rst and
mapping_filter.rst in the previous commit.

Updated 2 inbound cross-references:
- faq.rst: methods/fdr_correction → methods/statistical_models
- tutorials/bulk_workflow.rst: removed duplicate fdr_correction ref,
  kept single statistical_models ref

Top-level index.rst toctree reduced from 6 methods entries to 2.
The stale "Statistical Methods" landing caption was removed and replaced
by the main index.rst's own :caption: Statistical Methods toctree.

Side benefit: eliminates the 3 pre-existing Sphinx duplicate-citation
warnings (Kumasaka2016, vandeGeijn2015) by removing the duplicate
reference definitions in methods/index.rst and the deleted pages.

Methods section: 5 pages (883 lines) → 2 pages (~500 lines),
a further 43% reduction on top of Tier-B + Tier-D.

Agreed reframe from sophia decision-auditor + co-scientist
hypothesis-critic review of the "delete all 4" proposal: consolidate
instead of erase, to preserve WASP2-specific operational contracts
not present in van de Geijn 2015 or in cited textbooks.
@Jaureguy760 merged commit 0650b58 into dev Apr 19, 2026
6 checks passed
@Jaureguy760 deleted the feat/docs-methods-consolidation branch April 19, 2026 09:15