Reductions: partition-based updates + EHA reductions chapter#156

Merged: RaphaelS1 merged 51 commits into main from reductions-clean on May 17, 2026

Conversation

@adibender (Collaborator) commented Mar 8, 2026

Summary

  • Updated partition-based reductions chapter (P4C22) with expanded content
  • Added new EHA reductions chapter (P4C23) covering competing risks reduction framework
  • Minor edits to survival chapter (P1C4), quarto config, and bibliography

Replaces #153 with a clean history (single commit on main).

RaphaelS1 and others added 15 commits January 30, 2026 08:55
Signed-off-by: Raphael Sonabend <raphaelsonabend@gmail.com>
- Update partition-based reductions (P4C22) with expanded content
- Add EHA reductions chapter (P4C23) with competing risks framework,
  PAM and RSF examples on sir.adm data
- Add CIF comparison figures for PAM vs RSF
- Update survival chapter (P1C4) with minor edits
- Update quarto config and bibliography
@adibender mentioned this pull request on Mar 8, 2026
@github-actions (bot) commented Mar 8, 2026

Book 📖

P4C19 (Reductions intro):
- Hyphenation fixes: one-dimensional, event-history
- Grammar fixes: sentence fragment, data are, dataset, only a few, computational cost

P4C20 (IPCW Classification):
- Chapter title: IPC Weighted → IPCW Classification
- Hyphenation: time point, pointwise, well-calibrated, rank-based, gradient-based
- Capitalise Bayesian; \mathbb{I} → \II
- Loss notation: \mathcal{l} → L (aggregate loss convention); add missing negation
  to log loss (both pointwise and aggregate forms)
- Fix logical description of censored observations (line 36)

P4C21 (Pseudo-value Regression):
- New section: Choice of Response Function (§17.2), covering post-hoc vs.
  integrated application of h(·) and interpretation of covariate effects
- Table 17.1: common link/response functions with covariate interpretations;
  notation refined to h⁻¹(ψ) / h(f(x)) with ψ = S(τ|x) in caption
- Section moved before examples to avoid forward references
- Removed stale forward-reference note and inline RMST explanation (now refs §17.2)
- Fix \mathbb{I} → \II (4 occurrences); \xx_i^⊤ notation; h(f(x)) → h(f(\xx))
- Correct @eq-cox-ph → @eq-ph; Gamma/Poisson → log link with literature support
- data set → dataset; various grammar/hyphenation fixes
- Further reading expanded: add foundational papers (Andersen 2003, 2004),
  eventglm (Sachs 2022), random forests (Mogensen 2013), deep learning (Zhao 2020)

P5C24 (Conclusions):
- Update Langbein2024 → Langbein2025 (published journal version)

library.bib:
- Add Sachs2022 (eventglm, JSS), Hothorn2021, royston2011use, tian2014predicting
- Remove duplicate Andersen2004 entry (andersen2004regressionanalysis already present)
@adibender mentioned this pull request on Mar 11, 2026
- Title hyphenation: Partition-Based Reductions
- Abstract written (was TODO)
- Conclusion section added: Key Takeaways, Limitations, Further Reading callouts
- Numerous typo fixes: disjunct→disjoint, occured→occurred, classifcation→classification,
  inerpreted→interpreted, transfomration→transformation, akward→awkward, accross→across
- Consistent hyphenation throughout: discrete-time, partition-based, tree-based,
  risk-set-based, continuous-time
- data set → dataset throughout
- \mathbb{I} → \II; \mathbb{E} → E
- Citation style fixes: (@...) → [@...]; remove Figure/Table/Section prefix before @-refs
- Stray punctuation fixes (extra comma, stray parenthesis)
- less rows → fewer rows
- Sentence fragment fixed (garbled sentence in left-closed vs left-open section)
- Limitation added: reductions apply to right-censored data only; left- and
  interval-censored data require additional adaptations
@adibender (Collaborator, Author) commented Mar 11, 2026

Summary of changes (beyond #147)

P4C19 – Reductions intro

  • Minor grammar and hyphenation fixes (one-dimensional, event-history, dataset, data are)

P4C20 – IPCW Classification

  • Chapter title corrected: IPC Weighted → IPCW Classification
  • Hyphenation and capitalisation fixes (pointwise, well-calibrated, rank-based, gradient-based, Bayesian)
  • \mathbb{I} → \II throughout
  • Loss notation made consistent with rest of book: \mathcal{l} → L for aggregate loss; missing negation added to log loss (both pointwise and aggregate forms)
  • Logical fix: description of censored observations clarified (neither event nor confirmed non-event)
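As a reminder of the sign convention behind the log-loss fix, here is a minimal Python sketch (not the book's code) of the pointwise loss and its aggregate. The `weights` argument is a hypothetical hook for IPC weights; with all weights equal to 1 it reduces to the ordinary log loss. This sketch normalises by $n$, which may differ from the chapter's exact definition.

```python
import math

def pointwise_log_loss(y: int, p: float, eps: float = 1e-15) -> float:
    """Pointwise log loss: -(y log p + (1 - y) log(1 - p)); always non-negative."""
    p = min(max(p, eps), 1.0 - eps)  # clip away from 0 and 1 so log() stays finite
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def aggregate_log_loss(ys, ps, weights=None) -> float:
    """Aggregate loss L = (1/n) * sum_i w_i * pointwise_loss_i, with w_i = 1 by default."""
    n = len(ys)
    if weights is None:
        weights = [1.0] * n
    return sum(w * pointwise_log_loss(y, p) for y, p, w in zip(ys, ps, weights)) / n

# Without the leading minus sign these values would be negative.
print(pointwise_log_loss(1, 0.9), aggregate_log_loss([1, 0], [0.5, 0.5]))
```

A confident wrong prediction (`pointwise_log_loss(0, 0.9)`) is penalised much more heavily than a confident correct one, which is only true once the negation is in place.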

P4C21 – Pseudo-value Regression

  • New section added: Choice of Response Function (§17.1), covering:
    • Post-hoc vs. integrated application of $h(\cdot)$
    • Table of common link functions with covariate interpretations (identity, logit, cloglog, log)
    • cloglog link as Cox PH special case and PH diagnostic
    • Identity vs. log link for RMST targets (with literature support; unsupported Gamma deviance claim removed)
  • Section placed before the worked examples to avoid forward references
  • Table notation refined: $h^{-1}(\psi)$ / $h(f(\mathbf{x}))$ with $\psi = S(\tau|\mathbf{x})$ in caption
  • \mathbb{I} → \II; @eq-cox-ph → @eq-ph; notation fixes ($\mathbf{x}_i^\top$, $h(f(\mathbf{x}))$)
  • Further reading expanded: foundational papers (Andersen 2003, 2004), eventglm (Sachs 2022), random forests (Mogensen 2013)
  • New bib entries: Sachs2022, Hothorn2021, royston2011use, tian2014predicting, Andersen2004 (deduplicated to existing key)
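A small Python sketch of the link/inverse-link pairs listed in the new table, assuming the standard definitions; it makes explicit that the sigmoid is the inverse of the logit. The cloglog pair is written for an event probability $p$; applied to $p = 1 - S(\tau|\mathbf{x})$ it gives $\log(-\log S(\tau|\mathbf{x}))$, the scale on which a Cox PH model is linear at fixed $\tau$. The dictionary name `LINKS` is illustrative only.

```python
import math

# Each entry: (link h, inverse link h^{-1}).
LINKS = {
    "identity": (lambda p: p, lambda eta: eta),
    # logit and sigmoid are inverses of each other
    "logit": (lambda p: math.log(p / (1.0 - p)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    # complementary log-log for an event probability p
    "cloglog": (lambda p: math.log(-math.log(1.0 - p)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
    # log link, e.g. for a positive target such as the RMST
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
}

# Round trip: h^{-1}(h(p)) recovers p for every link.
for name, (link, inverse) in LINKS.items():
    assert abs(inverse(link(0.3)) - 0.3) < 1e-12, name
```

Whatever value the fitted model returns on the linear-predictor scale, the inverse logit maps it into $(0, 1)$, which is why it is applied to predictions rather than to the raw pseudo-values.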

P4C22 – Partition-Based Reductions

  • Abstract written (was TODO)
  • Conclusion section added: Key Takeaways, Limitations, Further Reading callouts
  • New limitation: reductions apply to right-censored data only; left- and interval-censored data require additional adaptations
  • Consistent hyphenation: discrete-time, partition-based, continuous-time, risk-set-based, tree-based
  • \mathbb{I} → \II; \mathbb{E} → E; citation style (@...) → [@...]
  • Removed "Figure/Table/Section" prefixes before @-refs (would double-render)
  • Typo fixes: disjunct→disjoint, occurred, classification, interpreted, transformation, awkward
  • data set → dataset throughout; less rows → fewer rows

P5C24 – Conclusions

  • Citation key updated: Langbein2024 → Langbein2025 (published journal version)

@adibender requested a review from @RaphaelS1 on March 11, 2026 13:11
@adibender (Collaborator, Author) commented:

@RaphaelS1 I integrated most of your previous suggestions from #147 and added a few things, particularly in the partition-based reductions chapter. I think 15-18 are in very good shape now; 19 needs more work, which I will do later.

RaphaelS1 and others added 2 commits March 13, 2026 08:57
Signed-off-by: Raphael Sonabend <raphaelsonabend@gmail.com>
Comment thread book/_quarto.yml Outdated
Comment thread book/P1C4_survival.qmd Outdated
Comment thread book/P1C4_survival.qmd
Comment thread book/P4C19_reductions.qmd Outdated
Comment thread book/P4C20_ipcw.qmd Outdated
Comment thread book/P4C21_pseudo.qmd Outdated
Comment thread book/P4C21_pseudo.qmd Outdated
Comment thread book/P4C21_pseudo.qmd Outdated
Comment thread book/P4C21_pseudo.qmd Outdated
First, consider a linear model without features, that is $\hat{S}(\tau|\xx_i) = \hat{\beta}_{\tau,0}$.
By construction of the pseudo-values, at time point $\tau = 1000$ days we have $\hat{\tilde{\theta}}(1000) = \hat{\beta}_{1000,0} = \frac{1}{n} \sum_{i=1}^n \tilde{\theta}_i(1000) = 0.6175 = \hat{S}_{KM}(1000)$.
Of course, it does not really make sense to estimate $n+1$ Kaplan-Meier curves just to obtain the overall Kaplan-Meier estimate at one time point, but this example illustrates that pseudo-value-based regression provides consistent estimators of the survival probability.
To guarantee that predictions lie within the required range, one can then apply the sigmoid function (see @tbl-pseudo-link-interpretation).
Review comment (Collaborator):

The table doesn't state that the sigmoid and the logit are inverses of each other, and it doesn't make clear which column contains the logit and which the sigmoid.
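The intercept-only argument in the quoted hunk can be checked on a toy dataset. The sketch below (plain Python, hypothetical helper names) computes jackknife pseudo-values $\tilde{\theta}_i(\tau) = n\hat{S}(\tau) - (n-1)\hat{S}_{-i}(\tau)$ from a Kaplan-Meier estimate. An intercept-only least-squares fit is simply the mean of the pseudo-values, which in this example reproduces $\hat{S}_{KM}(\tau)$; it is a check on one example, not a proof for arbitrary censoring patterns.

```python
def km_survival(times, events, tau):
    """Kaplan-Meier estimate of S(tau); events[i] is 1 for an event, 0 for censoring."""
    surv = 1.0
    for t in sorted({u for u, d in zip(times, events) if d == 1}):
        if t > tau:
            break
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        surv *= 1.0 - deaths / at_risk
    return surv

def pseudo_values(times, events, tau):
    """Jackknife pseudo-values n * S_hat(tau) - (n - 1) * S_hat_{-i}(tau)."""
    n = len(times)
    full = km_survival(times, events, tau)
    out = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]    # leave subject i out
        loo_events = events[:i] + events[i + 1:]
        out.append(n * full - (n - 1) * km_survival(loo_times, loo_events, tau))
    return out

times, events, tau = [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1], 2.5
pv = pseudo_values(times, events, tau)
intercept = sum(pv) / len(pv)  # least-squares intercept of an intercept-only model
print(pv, intercept, km_survival(times, events, tau))
```

Here the pseudo-values are 0, 1, 1, 1 (up to floating-point error) and both the intercept and the Kaplan-Meier estimate equal 0.75.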

Comment thread book/P4C21_pseudo.qmd Outdated
Comment thread book/P4C23_eha_reduction.qmd Outdated
Comment thread book/P4C23_eha_reduction.qmd Outdated
Comment thread book/P4C23_eha_reduction.qmd Outdated
Comment thread book/P4C23_eha_reduction.qmd Outdated
Comment thread book/P4C23_eha_reduction.qmd Outdated
Comment thread book/P4C23_eha_reduction.qmd Outdated
Comment thread book/P4C23_eha_reduction.qmd Outdated
Comment thread book/P4C23_eha_reduction.qmd Outdated
Comment thread book/P4C23_eha_reduction.qmd Outdated
Comment on lines +91 to +95
**Left-truncation.**
In the competing risks setting, all subjects usually start in the initial state at time $0$.
In a multi-state process, subjects enter the risk set for a transition $\ell \to e$ at the time they enter state $\ell$, which can differ across subjects.
As discussed in @sec-multi-state for the `prothr` data, a subject may only enter an intermediate state (and thus become at risk for transitions from that state) at some later time point.
This constitutes *internal* or *process-induced* left-truncation and must be handled appropriately by the hazard estimation method.
Review comment (Collaborator):

This needs to come before the pseudo-algorithm above; currently, left-truncation is introduced without explanation.
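To make the left-truncation point concrete: with delayed entry into a state, a subject is at risk for a transition out of that state only on $(\text{entry}, \text{exit}]$. The sketch below (hypothetical helper names; counting-process convention assumed, tie-handling conventions vary) contrasts this with a naive risk set that ignores entry times, using a Nelson-Aalen-type hazard increment $d(t)/Y(t)$.

```python
def risk_set(entry, exit_, t):
    """Subjects at risk at time t with delayed entry: entry < t <= exit."""
    return [i for i, (a, b) in enumerate(zip(entry, exit_)) if a < t <= b]

def hazard_increment(entry, exit_, event, t):
    """Nelson-Aalen increment d(t) / Y(t) at an observed transition time t."""
    at_risk = risk_set(entry, exit_, t)
    d = sum(1 for i in at_risk if exit_[i] == t and event[i] == 1)
    return d / len(at_risk)

# Subject 1 enters the intermediate state at time 3, subject 2 only at time 6.
entry = [0.0, 3.0, 6.0]
exit_ = [4.0, 8.0, 9.0]
event = [1, 0, 1]

correct = hazard_increment(entry, exit_, event, 4.0)    # risk set {0, 1}
naive = hazard_increment([0.0] * 3, exit_, event, 4.0)  # wrongly includes subject 2
print(correct, naive)
```

The correct increment at $t = 4$ is $1/2$, while ignoring entry times gives $1/3$, because subject 2 is counted as at risk before it has entered the state.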

Comment thread book/P4C23_eha_reduction.qmd Outdated
* @Friedman1982 introduced the piecewise exponential model and established its consistency properties.
* @Tutz2016 provide a comprehensive treatment of discrete-time survival analysis, covering model specification, estimation, and interpretation in depth.
* @Bender2018 and @Kopper2022 show how penalized additive models (PAMs) can be used as the base learner in the PEM framework, with the `pammtools` package [@pkgpammtools] providing a convenient implementation.
* @Bender2018PAM and @Kopper2022 show how penalized additive models (PAMs) can be used as the base learner in the PEM framework, with the `pammtools` package [@pkgpammtools] providing a convenient implementation.
Review comment (Collaborator):

@adibender, can you check that I updated this to the correct citation?

RaphaelS1 and others added 28 commits April 10, 2026 10:30
Co-authored-by: Raphael Sonabend-Friend <raphaelsonabend@gmail.com>
Signed-off-by: Raphael Sonabend <raphaelsonabend@gmail.com>
# Conflicts:
#	book/P5C24_conclusions.qmd
#	book/_book/Machine-Learning-in-Survival-Analysis.pdf
#	book/_not_used.qmd
Walkthrough fixes for the 8 review-comment TODOs Raphael left in
book/P4C22_partition-based-reduction.qmd:

* (Data Transformation) drop meta TODO comment, fix "effect"->
  "affect".
* (Discrete-time likelihood) rewrite the case-by-case justification of
  the last equality so it links to the cases on the previous line
  instead of using the ambiguous "first/second bracket" phrasing.
* (Discrete-time summary) fix the interpretation -- a classifier
  predicts the class probability \hat\pi_{ij}, not a label
  \hat\delta_{ij}; \hat\pi_{ij} estimates the conditional discrete-time
  hazard.
* (Discrete-time logistic example) drop the implementation-detail
  parenthetical about reference-coded interval index and fix a stray
  comma (resolved manually by user).
* (Discrete-time logistic example) add an explicit bridge showing
  \hat h_{\tilde Y}(j|x) = sigmoid(linear predictor) via
  @eq-discrete-hazard-probability before introducing \hat S.
* (PEM intro) rewrite to distinguish PEM from the discrete-time
  approach up front: continuous-time step-function hazard, exact
  within-interval event times preserved via offset, continuous-time
  hazard estimates.
* (PEM survival formula) show the integral S = exp(-int h du) first,
  then collapse to a sum because h is piecewise constant.
* (PEM) add new subsection "When to use PEM" with pros/cons vs
  discrete-time / survival stacking; remove the orphan TODO comment.
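The discrete-time bridge described above (classifier output as a conditional hazard, then survival as a product) can be sketched as follows. This is an illustration, not the chapter's code: it assumes a logistic model whose linear predictor $\eta_j$ for interval $j$ is already computed (for example an interval-specific intercept plus $\mathbf{x}^\top\beta$), and uses the standard identity $S(j|\mathbf{x}) = \prod_{k \le j} (1 - h(k|\mathbf{x}))$.

```python
import math

def sigmoid(eta: float) -> float:
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def discrete_time_survival(etas):
    """Given linear predictors eta_1..eta_J for one subject, return the discrete-time
    hazards h(j|x) = sigmoid(eta_j) and survival S(j|x) = prod_{k<=j} (1 - h(k|x))."""
    hazards = [sigmoid(eta) for eta in etas]
    survival, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h  # survive interval j, given survival up to the end of interval j-1
        survival.append(s)
    return hazards, survival

hazards, survival = discrete_time_survival([0.0, 0.0, 0.0])
print(hazards, survival)
```

With all linear predictors equal to 0 every hazard is 0.5, so survival halves in each interval: 0.5, 0.25, 0.125.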

Also tightens an ill-formed set-builder expression in the survival
stacking section: \mathcal{A} = {t_{(1)}, ..., t_{(m+1)}: ...} ->
chain of strict inequalities, matching the convention in
@eq-cut-points.

No prose semantics changed elsewhere; all cross-references resolve.
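The PEM survival formula mentioned above, $S(t) = \exp(-\int_0^t h(u)\,du)$ collapsing to a sum for a piecewise-constant hazard, can be sketched in a few lines. The function name and argument layout are assumptions for illustration; it also enforces the strictly increasing cut points $a_0 < a_1 < \dots < a_J$ and treats the hazard as undefined beyond the last cut point (time after $a_J$ is simply not counted).

```python
import math

def pem_survival(cut_points, hazards, t):
    """S(t) = exp(-sum_j h_j * |(a_j, a_{j+1}] intersected with (0, t]|) for a
    piecewise-constant hazard; hazards[j] applies on the interval (a_j, a_{j+1}]."""
    if any(b <= a for a, b in zip(cut_points, cut_points[1:])):
        raise ValueError("cut points must be strictly increasing")
    if len(hazards) != len(cut_points) - 1:
        raise ValueError("need exactly one hazard per interval")
    cum_hazard = 0.0
    for j, h in enumerate(hazards):
        lo, hi = cut_points[j], cut_points[j + 1]
        time_in_interval = max(0.0, min(t, hi) - lo)  # exposure within interval j
        cum_hazard += h * time_in_interval
    return math.exp(-cum_hazard)

# Hazard 0.2 on (0, 1] and 0.1 on (1, 3]: cumulative hazard at t = 2 is 0.2 + 0.1.
print(pem_survival([0.0, 1.0, 3.0], [0.2, 0.1], 2.0))
```

Because the hazard is constant within each interval, the integral reduces to hazard times time spent in the interval, which is exactly what the loop accumulates.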
* Restructure CR section into Separate datasets / Stacked dataset /
  Separate vs. stacked / Application to sir.adm.
* Add MS section: transition-specific pipeline + stacked dataset
  subsections, with new prothr application using a two-stage reduction
  (multi-state -> transition-specific LT single-event -> Poisson PED)
  fit via XGBoost and compared against the AJ baseline split by treatment.
* Replace older Python infographic generators with drawio HTML+SVG sources
  for cr-reduction-pipeline, cr-reduction-stacked, ms-reduction-pipeline,
  and ms-reduction-stacked figures.
* Regenerate cif-marg-sir.png; render tp-prothr-cmp.png.
* Add Limitations callout (separate-data bookkeeping overhead, specialised
  learner trade-offs e.g. native CR RSF).
* P1C5: prothr state-occupation discussion + companion R scripts and
  figures, currently parked in _not_used.qmd.
* library.bib: new citations (niessl2023, putter2018).
* code.R: prothr MS reduction comparison block (AJ + XGBoost-Poisson via
  pammtools as_ped + xgb.train with offset base_margin trick).
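The Poisson reduction referred to above (piecewise-exponential data rows with a log-exposure offset, which the commit passes to XGBoost via the base margin) can be sketched without pammtools. This plain-Python version rests on stated assumptions: status 0 means censored and any other value is a cause code; the last cut point is at least the largest observed time; and each row is replicated once per cause to give a stacked competing-risks dataset. The function name and column names are illustrative and not necessarily those produced by `pammtools::as_ped`.

```python
import math

def to_ped_rows(times, status, cut_points, causes=(1,)):
    """Split each subject into piecewise-exponential data (PED) rows:
    one row per subject x interval at risk x cause. 'ped_status' is 1 only in the
    interval containing the subject's event of that cause; 'offset' = log(exposure)."""
    rows = []
    for i, (t, s) in enumerate(zip(times, status)):
        for j in range(len(cut_points) - 1):
            lo, hi = cut_points[j], cut_points[j + 1]
            if t <= lo:
                break  # subject has left the risk set before this interval
            exposure = min(t, hi) - lo
            event_here = lo < t <= hi and s != 0
            for k in causes:
                rows.append({
                    "id": i, "interval": j, "cause": k,
                    "ped_status": int(event_here and s == k),
                    "offset": math.log(exposure),
                })
    return rows

# Subject 0: cause-1 event at 2.5; subject 1: censored at 1.0.
rows = to_ped_rows([2.5, 1.0], [1, 0], [0.0, 1.0, 2.0, 3.0], causes=(1, 2))
print(len(rows))
```

Subject 0 contributes three intervals and subject 1 one interval, each replicated for two causes, giving eight rows; a Poisson learner fitted to `ped_status` with `offset` as a fixed log-exposure term then estimates the piecewise-constant (cause-specific) hazards.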
Signed-off-by: Raphael Sonabend <raphaelsonabend@gmail.com>
@RaphaelS1 merged commit 2fc0ad1 into main on May 17, 2026
1 check passed
2 participants