Competing risk and alternative splitting methods #467

cmnelson1 · 2025-07-16T19:52:21Z

We are fitting a competing risk random survival forest with 4 causes of death and 52 features. We are interested in the risk factors for each cause of death as well as in predicting the probability of survival, mortality, or cumulative incidence of each event type. We have read your 2014 Biostatistics paper and the rfsrc competing risk vignette on the alternative splitting methods (logrank vs. logrankCR; event-specific vs. composite). We have run all 4 types of models so that we could understand the output and the differences (default settings for mtry, random split points, and node size, but number of trees = 10,000). Our sample size was 801, and the numbers of events for the 4 causes of death were 51, 61, 38, and 93. For the cause-specific models, we are sharing only the models for the first cause of death. Below are the performance metrics for the 4 different models.

Model 1: logrank splitting rule using c(1,0,0,0)
(OOB) Requested performance error: 0.29115942, 0.25861567, 0.43694966, 0.33555775
Model 2: logrank splitting rule using c(1,1,1,1)
(OOB) Requested performance error: 0.27111804, 0.24055408, 0.26897492, 0.2889797
Model 3: logrankCR splitting rule using c(1,0,0,0)
(OOB) Requested performance error: 0.29224457, 0.2641798, 0.46006413, 0.3565026
Model 4: logrankCR splitting rule using c(1,1,1,1)
(OOB) Requested performance error: 0.26521754, 0.22380342, 0.26136581, 0.29091971

Comments/Questions:

1. We were surprised to see that performance for the first cause of death was slightly better in the composite models than in the cause-specific models for that cause. Should we be surprised by that?
2. Should we expect Gray's composite ensemble CIF (model 4 above) to be the best approach for comparing the CIF across the 4 causes of death?
3. Should we expect the logrank model with event-specific weights to be the best approach for determining risk factors (via VIMP) for each competing risk (model 1 above for the first cause of death, and c(0,1,0,0), c(0,0,1,0), c(0,0,0,1) for the remaining 3 causes)?
4. What type of predicted value would you suggest for partial dependence plots of the variables that rank highest on VIMP for specific causes of death? We have extracted the cumulative incidence functions for specific variables using plot.variable (surv.type="cif") on the logrank model (model 1 above). Does it make more sense to plot surv.type="surv" or surv.type="mort"? Or does it make more sense to use the composite Gray's model to plot the CIF for individual predictor variables?

Thank you, in advance, for any suggestions!

Answered by DanteTrb

DanteTrb · 2025-07-17T11:33:17Z

Hi, and thank you for sharing such a well-structured experimental setup. This is an exemplary use case of rfsrc in a challenging multi-cause competing risks scenario. I'll address your questions point by point below.

1. Should we be surprised that composite models outperform cause-specific ones?

Not necessarily, and here’s why.

Composite models (e.g., using c(1,1,1,1) weights) leverage global structure across all event types. This broader signal can help stabilize splits, especially when individual causes have low event counts (as in your case: 51/61/38/93 events across 4 causes). By contrast, cause-specific models with c(1,0,0,0) may suffer from higher variance due to sparser events for a single cause.

So yes, composite models might outperform cause-specific ones in prediction, even for that specific cause. This is a reminder that better predictive performance ≠ better cause-specific interpretability.
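To make the sparsity argument concrete, here is a small arithmetic sketch using only the counts quoted in the question. It is written in Python purely for illustration (it is not rfsrc code), and it assumes that everyone who did not die from one of the four causes was censored.

```python
# Counts quoted in the question: n = 801, four causes of death.
n = 801
events = {"cause 1": 51, "cause 2": 61, "cause 3": 38, "cause 4": 93}

# A composite split statistic can draw on every event;
# a c(1,0,0,0) split only on the cause-1 events.
total_events = sum(events.values())
censored = n - total_events  # assumption: all remaining subjects are censored

for cause, d in events.items():
    print(f"{cause}: {d} events, {d / n:.1%} of the sample, "
          f"{d / total_events:.1%} of all events")
print(f"all causes: {total_events} events, censored: {censored}")
```

The composite rule sees 243 events in total, while the cause-1 rule sees 51 of them, roughly a fifth.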

2. Is the Gray’s model (logrankCR + c(1,1,1,1)) best for comparing CIFs?

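Yes, with a caveat. If your goal is accurate estimation and comparison of CIFs across event types, logrankCR with composite weights is the most natural choice. The rule is modeled on Gray's test, which compares cumulative incidence functions directly rather than cause-specific hazards, so splits are chosen for how well they separate the CIFs. As I read the rfsrc documentation, logrankCR is also the rule recommended when the goal is long-term prediction. The caveat is that your OOB errors for the composite logrank and composite logrankCR fits are close, so on this dataset the gain from logrankCR looks small.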
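For readers who want to see what a CIF estimate is, here is a minimal, language-agnostic sketch of the nonparametric (Aalen-Johansen) estimator in plain Python. It is not how rfsrc builds its ensemble CIF; the function name and the toy data are made up for illustration.

```python
def cumulative_incidence(times, status, n_causes):
    """Aalen-Johansen CIF estimates.

    times:  follow-up time per subject
    status: 0 = censored, j = event of cause j (1..n_causes)
    Returns (event_times, cif, surv), where cif[j-1][k] is the CIF of
    cause j and surv[k] the all-cause survival at event_times[k].
    """
    event_times = sorted({t for t, s in zip(times, status) if s > 0})
    surv_prev = 1.0                       # all-cause survival just before t
    running = [0.0] * n_causes            # running CIF per cause
    cif = [[] for _ in range(n_causes)]
    surv = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d_all = 0
        for j in range(1, n_causes + 1):
            d_j = sum(1 for u, s in zip(times, status) if u == t and s == j)
            running[j - 1] += surv_prev * d_j / at_risk
            cif[j - 1].append(running[j - 1])
            d_all += d_j
        surv_prev *= 1 - d_all / at_risk  # Kaplan-Meier step, any cause
        surv.append(surv_prev)
    return event_times, cif, surv


# Made-up toy data: 8 subjects, 2 causes, 0 = censored.
times = [2, 3, 3, 5, 6, 7, 8, 9]
status = [1, 0, 2, 1, 0, 2, 1, 0]
event_times, cif, surv = cumulative_incidence(times, status, n_causes=2)
for k, t in enumerate(event_times):
    print(t, [round(c[k], 3) for c in cif], round(surv[k], 3))
```

At every event time the cause-specific CIFs and the all-cause survival add up to 1, which is why CIFs from the same model can be compared across causes on a common probability scale.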

That said, for visualizing the effect of one variable on the CIF of a single cause, the Gray model may blur event-specific dynamics, which brings us to the next point.

3. Best approach to determine risk factors (VIMP) for each competing risk?

For variable importance (VIMP) and interpretability, the event-specific models are preferable. Here's the logic:

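c(1,0,0,0): focuses splitting exclusively on the first cause
VIMP from this model will prioritize features that separate Cause 1 events from everything else (censoring plus the other causes)
With the logrank rule, this targets the cause-specific hazard, because deaths from other causes are treated as censored. That is not the same quantity as the Fine-Gray subdistribution hazard (the one logrankCR is modeled on), but it is the natural target for attributing features to a single cause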
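The difference between the two hazards comes down to who counts as "at risk". The sketch below, in plain Python with made-up toy data and a hypothetical helper name, contrasts the two risk sets. It deliberately ignores the censoring adjustments that Gray's test and the Fine-Gray model apply, so treat it as a simplification rather than a description of what rfsrc computes internally.

```python
def risk_sets(times, status, cause, t):
    """Sizes of the two risk sets for `cause` at time t.

    Cause-specific: subjects still event-free and uncensored at t.
    Subdistribution: the same subjects plus those who already failed
    from a *different* cause before t (they stay in the risk set).
    Simplification: no censoring weights are applied.
    """
    cause_specific = sum(1 for u in times if u >= t)
    other_cause_failures = sum(
        1 for u, s in zip(times, status) if u < t and s not in (0, cause)
    )
    return cause_specific, cause_specific + other_cause_failures


# Made-up toy data: 0 = censored, 1 and 2 = causes of death.
times = [2, 3, 3, 5, 6, 7, 8, 9]
status = [1, 0, 2, 1, 0, 2, 1, 0]
print(risk_sets(times, status, cause=1, t=6))
```

At t = 6 four subjects are still under observation, and one earlier death from cause 2 remains in the subdistribution risk set for cause 1, giving sizes 4 and 5.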

Recommended strategy:

Use event-specific logrank or logrankCR models (e.g., c(0,1,0,0), etc.)
Extract VIMP from each
Build PDP or SHAP-style plots per cause (as you’re doing)

In short: use composite models for prediction / incidence, and event-specific models for cause-specific interpretation.

4. Which `surv.type` for partial dependence plots (PDP)?

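Let's clarify the types, as I read them in the plot.variable documentation:

surv.type="surv" → survival probability P(T > t) at a chosen time point; documented for single-event (right-censored) survival forests
surv.type="mort" → ensemble mortality, i.e. the expected number of events for an individual with those covariates (not 1 - survival); also a single-event option
surv.type="cif" → cumulative incidence for one cause, selected with the target argument; this is a competing-risk option, alongside "years.lost" and "chf"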

Given your goal is to visualize the impact of predictors on each cause of death, then:

Use surv.type="cif" when:

You’re plotting from an event-specific model trained on c(1,0,0,0) (or the equivalent for other causes)
You want to know how much a variable contributes to the probability of death by that specific cause

⚠️ Be careful if you use surv.type="cif" from a composite Gray model—this blends signals and may misattribute contributions if you're interested in individual causes.

In practice:

For interpretation: stick with surv.type="cif" + event-specific model
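For a global view across all causes: surv.type="mort" and "surv" belong to single-event survival forests, so with a competing-risk forest consider surv.type="years.lost" from the composite model, or fit a separate all-cause survival forest (any death vs. censored) if you want overall mortality risk conditioned on covariates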
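For reference, the idea behind a partial dependence curve can be sketched in a few lines. This is a generic illustration in plain Python, not the plot.variable implementation. The predictor below is a hypothetical stand-in for "cause-1 CIF at a fixed time horizon", and its coefficients and the two data rows are invented.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the feature for every subject, keep all else as observed.
        modified = [{**row, feature: value} for row in rows]
        preds = [predict(r) for r in modified]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_cif_cause1(row):
    """Hypothetical stand-in for a fitted model's cause-1 CIF at one horizon."""
    return 0.002 * row["age"] + 0.05 * row["smoker"]


rows = [{"age": 50, "smoker": 0}, {"age": 60, "smoker": 1}]  # invented data
grid = [40, 60, 80]
curve = partial_dependence(toy_cif_cause1, rows, "age", grid)
for age, value in zip(grid, curve):
    print(age, round(value, 3))
```

Whatever surv.type you choose determines what plays the role of `predict` here, which is why the choice of predicted value matters more than the plotting step itself.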

Suggested hybrid workflow

| Goal | Model Type | Notes |
| --- | --- | --- |
| Predict CIFs for all causes | Composite Gray (`logrankCR + c(1,1,1,1)`) | Stable and robust |
| Understand risk factors per cause | Event-specific (`logrank + c(1,0,0,0)`, etc.) | Better VIMP resolution |
| Visualize feature impact (PDP) | Event-specific + `surv.type="cif"` | Avoid composite CIF here |
| Rank top predictors overall | Optional: ensemble of VIMPs from all 4 models | For cross-cause summary |

Happy to dig deeper into any of these directions (e.g. marginal effect estimation, time-dependent PDPs, or integrating SHAP for RSF in rfsrc). Again, thanks for the thoughtful post—this kind of discussion pushes the field forward.

Best,
Dante

P.S. We're deeply interested in refining our approach, not just for this project, but for shaping reproducible and interpretable workflows in complex survival contexts.

If anyone in the community has encountered similar patterns (especially regarding the trade-off between composite prediction and cause-specific interpretability), we’d truly appreciate hearing your perspective.

This feels like an area where shared insight can meaningfully raise the bar for applied survival modeling. Thanks again for fostering such a rich exchange.

cmnelson1 · 2025-08-06T13:52:24Z

Thank you so much Dante for your very thoughtful and helpful responses to our questions! We really appreciate the time you took to answer our questions so thoroughly and to help us with an analytic approach!

Best,
Mindy

DanteTrb · 2025-08-06T14:08:53Z

I'm really glad my reply was helpful! 😊 If it answered your questions, it would be great if you could mark it as the accepted answer so that others can easily find the solution too. 🚀
