We are fitting a competing risk random survival forest with 4 causes of death and 52 features. We are interested in the risk factors for each cause of death, as well as in predicting the probability of survival, mortality, or the cumulative incidence of each event type. We have read your 2014 Biostatistics paper and the rfsrc competing risk vignette on the alternative splitting methods (logrank vs. logrankCR; event-specific vs. composite). We ran all 4 types of models so that we could understand the output and the differences (default settings for mtry, random split points, and node size, but with number of trees = 10,000). Our sample size was 801, and the numbers of events for the 4 causes of death were 51, 61, 38, and 93. For the cause-specific models, we are sharing only the models for the first cause of death. Below are the performance metrics for the 4 different models.
Comments/Questions:
Thank you, in advance, for any suggestions!
Replies: 3 comments
Hi, and thank you for sharing such a well-structured experimental setup. This is an exemplary use case of rfsrc in a challenging multi-cause competing risks scenario. I'll address your questions point by point below.

1. Should we be surprised that composite models outperform cause-specific ones?

Not necessarily, and here's why.

Composite models (e.g., using c(1,1,1,1) weights) leverage global structure across all event types. This broader signal can help stabilize splits, especially when individual causes have low event counts (as in your case: 51/61/38/93 events across 4 causes). By contrast, cause-specific models with c(1,0,0,0) may suffer from higher variance due to sparser events for …

So yes, composite models might outperform cause-specific ones in prediction, even for that specific cause. This is a reminder that better predictive performance ≠ better causal interpretability.

2. Is Gray's model (logrankCR + c(1,1,1,1)) best for comparing CIFs?

Yes, if your goal is accurate estimation and comparison of CIFs across event types, the Gray-inspired approach (…)

That said, for visualizing the CIF of one specific variable for one cause only, the Gray model may blur event-specific dynamics, which brings us to the next point.

3. Best approach to determine risk factors (VIMP) for each competing risk?

For variable importance (VIMP) and interpretability, the event-specific models are preferable. Here's the logic:
Recommended strategy:
In short: use composite models for prediction / incidence, and event-specific models for interpretation / causality.

4. Which …
| Goal | Model Type | Notes |
|---|---|---|
| Predict CIFs for all causes | Composite Gray (logrankCR + c(1,1,1,1)) | Stable and robust |
| Understand risk factors per cause | Event-specific (logrank + c(1,0,0,0), etc.) | Better VIMP resolution |
| Visualize feature impact (PDP) | Event-specific + surv.type="cif" | Avoid composite CIF here |
| Rank top predictors overall | Optional: ensemble of VIMPs from all 4 models | For cross-cause summary |
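
To make the table concrete, here is a minimal sketch of how these pieces could fit together in rfsrc. It is not your analysis: it assumes the wihs data set bundled with randomForestSRC as a stand-in (two competing events rather than four, so the weight vectors have length 2), uses a small ntree for speed, and treats the "ensemble of VIMPs" as a simple average of per-cause ranks, which is only one possible way to combine them. All object names (fit_composite, fit_cause1, mean_rank, and so on) are made up for illustration.

```r
# Sketch only: assumes the randomForestSRC package is installed.
library(randomForestSRC)

# Stand-in data with two competing events (status 1 or 2; 0 = censored).
data(wihs, package = "randomForestSRC")

ntree <- 500  # kept small for illustration; the original post used 10,000

# Composite model: logrankCR splitting with equal weight on every cause.
fit_composite <- rfsrc(Surv(time, status) ~ ., data = wihs,
                       ntree = ntree, splitrule = "logrankCR",
                       cause = c(1, 1), importance = TRUE)

# Event-specific models: logrank splitting with all weight on one cause.
fit_cause1 <- rfsrc(Surv(time, status) ~ ., data = wihs,
                    ntree = ntree, splitrule = "logrank",
                    cause = c(1, 0), importance = TRUE)
fit_cause2 <- rfsrc(Surv(time, status) ~ ., data = wihs,
                    ntree = ntree, splitrule = "logrank",
                    cause = c(0, 1), importance = TRUE)

# Row 1 of the table: out-of-bag CIF for cause 1 from the composite model,
# averaged over subjects (cif.oob is subjects x time points x causes).
mean_cif1 <- colMeans(fit_composite$cif.oob[, , 1])

# Row 2: VIMP for each cause, taken from the matching event-specific model
# (importance has one row per variable and one column per event type).
vimp_cause1 <- fit_cause1$importance[, 1]
vimp_cause2 <- fit_cause2$importance[, 2]

# Row 3: partial dependence of the cause-1 CIF on one variable.
# The first predictor is picked arbitrarily here.
pd <- plot.variable(fit_cause1, xvar.names = fit_cause1$xvar.names[1],
                    surv.type = "cif", target = 1, partial = TRUE,
                    npts = 10, show.plots = FALSE)

# Row 4: one possible cross-cause summary, the mean of per-cause VIMP ranks
# (rank 1 = most important). This averaging rule is an assumption.
ranks <- cbind(cause1 = rank(-vimp_cause1), cause2 = rank(-vimp_cause2))
mean_rank <- sort(rowMeans(ranks))
print(mean_rank)
```

With four causes, the same pattern would use weight vectors of length 4, such as c(1,1,1,1) for the composite fit and c(1,0,0,0), c(0,1,0,0), and so on for the event-specific fits. Averaging ranks rather than raw VIMP values keeps causes with larger VIMP magnitudes from dominating the summary, but other aggregation rules are equally defensible.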
Happy to dig deeper into any of these directions (e.g., marginal effect estimation, time-dependent PDPs, or integrating SHAP for RSF in rfsrc). Again, thanks for the thoughtful post; this kind of discussion pushes the field forward.
Best,
Dante
P.S. We're deeply interested in refining our approach, not just for this project, but for shaping reproducible and interpretable workflows in complex survival contexts.
If anyone in the community has encountered similar patterns (especially regarding the trade-off between composite prediction and cause-specific interpretability), we’d truly appreciate hearing your perspective.
This feels like an area where shared insight can meaningfully raise the bar for applied survival modeling. Thanks again for fostering such a rich exchange.
Thank you so much, Dante, for your very thoughtful and helpful responses! We really appreciate the time you took to answer our questions so thoroughly and to help us with an analytic approach!
Best,
I’m really glad my reply was helpful! 😊 If it answered your questions, it would be great if you could mark it as the accepted answer so that others can easily find the solution too. 🚀