-
-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Cox Proportional Hazards models #230
Comments
This issue is kind of duplicate with #175. Are you aware of any nice implementation of Cox Proportional Hazards models in Stan? This would really help in getting them implemented. |
Hi, sorry for the duplicate. There is an example model at github.com/stan-dev |
Thanks for the example! I have one follow up question: What is the "response" variable of such a model? I mean what should be returned by methods such as |
This is from the function |
Maybe looking at SurvivalStan is informative. |
After reading a bit about Bayesian survival regression, it seems we have two ways of doing it:
The cox regression stuff appears to fall in category 2, which is yet to be implemented in brms. Do I understand the situation correctly? |
Hi Paul, From what I know of Cox regression, your understanding is correct. If I understand correctly myself, implementing (1) in brms would be an accelerated failure time model. Being able to model proportional hazards as in (2) would be fantastic! Many thanks! Charles |
Hi Paul, Maybe adding the Cox Proportional Hazards model may be 'circumvented'. I often use the following approach: The results are very comparable or even identical to the coxph results. |
Sounds reasonable, thanks! Could you give a code example? Maybe i add this
to the documentation.
Am 17.10.2017 22:06 schrieb "HansTierens" <notifications@github.com>:
… Hi Paul,
Maybe adding the Cox Proportional Hazards model may be 'circumvented'.
I often use the following approach:
First, organize the survival data in the long-format (splitting the time
horizon by unique failure times or even unique observation times).
Then fit a Poisson regression with the failure status as a dependent
variable and 'interval length' as an offset variable.
The baseline hazard can be flexibly modelled using a smoothing spline of
time.
The results are very comparable or even identical to the coxph results.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#230 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AMVtAK1p4gm0goq2hUpWqiHVKx7v3U3gks5stQjPgaJpZM4OGN3Z>
.
|
Hi Hans, This sounds like fantastic approach! I would be very interested in seeing your code for this, if you are happy to share. Kind regards, |
Hi Charles, I quickly composed a R-markdown file explaining the approach. Kind regards, |
Thanks Hans, That is very helpful - thanks for taking the time to send it through! Kind regards, Charles |
Given that the proportional hazards models can be fitted using poisson regression as explained by @HansTierens above (see also http://discourse.mc-stan.org/t/using-stan-jm-for-parametric-proportional-hazards-regression-only/3931/3) and that brms now allows custom families if one wants to use a different representation of the model, I think this issue can be closed. |
Reopening this issue following some discussions with users. Maybe we can get proportional hazards models working in brms at some point. |
A possibly relevant contribution on Discourse... |
The Cox model is now implemented and can be accessed via
|
Nice ! Thank you very much !
I have to disagree with the results :
A sign error somewhere ? Or something more worrying ? Questions :
|
This is not a sign error but intentional for consistency with other
families in brms: higher values = longer survival times. Sorry I should
have said this.
I did not implement time dependent covariates on purpose as this costs so
much extra work I currently cannot effort. But rstanarm::stan_surv (https://github.com/stan-dev/rstanarm/tree/feature/survival) can do this.
|
Interesting rationale. It just goes against half a century of relative risk estimation, disturbing readers' (and reviewers') habits... At least in medicine, these habits are deeply ingrained. Could that be controlled by an option to the
Would you be open to a proposal (in the form of a
I'm more interested in
which essential in real-world medical problems. |
I will think about this. Since the family is not offically supported yet, we can make changes at any point if we like. In any case, this will be clearly documented and explained.
Of course! The only problem I currently see is that we have to use numerical integration of the (now predictor dependent) baseline hazard in this case, as we have no general analytic solution to integrate it was we have with M-Splines and I-Splines. But I may be mistaken. |
The whole point of Cox's proportional hazard model was to integrate out the (unknow) base hazard function.
In any way, I think that, in a PH modeling problem, the shape of h(t) can't be but a secondary goal, the main goal being the relative risks estimation(s) ; if this shape becomes the main goal, other, more parametric models should be used. |
The way I see the Cox model from a brms perspective is as inducing a probability distribution via The piecewise constant hazard looks like something that will likely have problems with (2) but I haven't tried it out myself thoroughly enough to be sure about this. |
I just reverted the cox parameterization back to the standard one, that is, higher values of eta imply higher hazard and thus lower survival times. The improved consistency with other families in brms is likely not worth the confusion caused when users compare the brms results to the results of other packages and find the signs to be inverted. |
Indeed. The inconsistency isn't yours : for half a century, the risk analysis that the Cox model (and AFT models, BTW) embody has been taught under the name of "survival analysis". Which might be the misnomer of the century... OTOH, Kaplan-Meier estimation is (part of survival analysis. Piecewise-constant vs. spline estimation of risk as a function of time : I think that this is but a marginal issue. I used piecewise-constant model because I understood it. I have yet to wrap my mind around your clever use of "auto-integrating" splines (which indeed fullfill the need of estimating h(t) and using its integral H = \int_t0}^{t1] h(t) dt. I'm reading about this kind of possibility. The fundamental issue is that in the case of your present implementation, the only information available about one observational unit (one subject) are
which fits well in the "general scheme" of regression analysis : (scalar observation) ~ (vector of covariates)(vector of coefficients) or, equivalently (vector of observations) ~ (covariates matrix)(design matrix). In this scheme, one observation <==> one line of inputs <==> one increment to the log-likelihood. This cannot stand with the "extensions to Cox models" such as time-varying covariates or multistage models : these models are, IMHO, examples of what I'll call "history analysis", where the observational unit is a subject, and the observation a (varying length) time-ordered collection of records. The likelihood of any of these records has no meaning in itself ; what is pertinent to the analysis is the likelihood of the collection of records pertaining to one subject. Using Terry Therneau's presentation of {Id (start end] status} records, it is relatively easy to come up with a Stan program estimating the various element of a complicated model. The log-likelihood is best computed in a "transformed parameters" section where a subject-wise array of LLs is updated by the likelihood of the individual records. What is difficult is to find a way to wrap the various elements indicating to the program:
without overcomplicating the presentation. Multistage models are but another complication, since one does not try to estimate a single (scalar) risk, but a matrix of state transition risks. Furthermore, in most cases, supplementary info should be passed to the program, if only to indicate possible and impossible transition (e. g. transitioning from "dead" to "pregnant" is impossible...). That is the kind of book-keeping that computers are supposed to excel at... So I think that the "regresssion presentation" of brms may or may not be the "best" solution of encompassing these extensions:
The possibility of passing a complicated program "in pieces", as brms does in specifying a whole DAG by specifying one regression per edge, should be kept, somehow... Your thoughts ? |
I agree with you. brms functionality is amendable to time-varying covariates for instance in the way What you call history analysis is out of scope of brms I believe and might actually be strongly related to what Both rstanarm and brms are part of the stan supported interfaces and I don't plan to duplicate much of what rstanarm can already do very well via All the advanced stuff is nice but likely goes into a very specific direction beyond the scope of brms. |
Hello, all of this Group |
You see an example for the cox model as implemented in brms in the above posts. Please keep in mind that the implementation is experimental and there still remains stuff to be implemented for instance surival curves etc. |
ok thanks paul-buerkner . |
brms package can do bayesian parametric survival model?if yes how? |
The Cox proportional hazards model is now offically supported via family |
Extremely nice. But, as of Is there an upcoming update ? |
You have to install brms from github for now to make use of the latested (exported) version of the feature. |
The results obtained in analyzing survival data via `brm` and the `cox` family seem extremely sensitive to the baseline hazard parametrization. This can be illustrated by the analysis of the extremely simple dataset `Leuk`, from `survival` library(survival)
data(leukemia, package="survival")
print(head(leukemia))
length(table(leukemia$time))
A landmark can be given by the frequentist analysis : CPH <- coxph(Surv(time,status)~x, data=leukemia)
summary(CPH)
Bayesian analysis via `brms` with the default parameters gives consonant results : invisible({
library(brms)
options(mc.cores = parallel::detectCores())
rstan::rstan_options(auto_write = TRUE) })
B1 <-brm(time|cens(Cens)~x,
data=within(leukemia, Cens <- factor(c("none","right")[2-status])),
family=cox, iter=10000, silent=TRUE, open_progress=FALSE, refresh=0)
summary(B1)
However, these results seem exremely sensitive to the parametrisation of the baseline hazard function, whose only control, as far as I can tell, is the number of nodes ("degrees of freedom") of the spline: B2 <- do.call(
rbind,
lapply(
3*(2:6),
function(d)
data.frame(DF=d,
as.data.frame(
summary(
brm(
time|cens(Cens)~x,
data=within(
leukemia,
Cens <- factor(c("none","right")[2-status])),
family=cox(bhaz=list(df=d)),
iter=10000, silent=TRUE,
open_progress=FALSE, refresh=0))$fixed)))) format(B2, digits=3)
|
Interesting. Could you try what happens if you exclude the spline intercept via |
About the same : B3 <- do.call(
rbind,
lapply(
3*(2:6),
function(d)
data.frame(DF=d,
as.data.frame(
summary(
brm(
time|cens(Cens)~x,
data=within(
leukemia,
Cens <- factor(c("none","right")[2-status])),
family=cox(bhaz=list(df=d, intercept=FALSE)),
iter=10000, silent=TRUE,
open_progress=FALSE, refresh=0))$fixed)))) format(B3, digits=3)
HTH, |
BTW : the way the bazeline hazard is modeled should be documented (along with a pointer to a basic introduction to M-splines) : I still do not understand it, even after reading the generated A vignette illustrating the way the time-to-event analysis is implemented in I still think that survival analysis "in full" (including time-dependent covariates and, possibly, accelerated failure time modeling) is a valid target for OTOH, further reflections on the possible extensions to multistage/multistate models (what I called "history" earlier in this thread) led me to thonk that this is a fundamentally different (more complex) problem, harder to fit in the regression straitjacket. I still have to think further about this. HTH, |
Isn't most of this already implemented in I am not sure if adding all of these features to brms is worthwile, because it requires a lot of special case code. Edit: I have some quick doc in the |
I (recently) published an article which fits time-dependent covariates (and mentions an approach to extend this towards time-dependent effects too). My approach would still be the Poisson approach. I think that the strength of using the Poisson approach lies in the easier use of an offset (when time intervals have unequal lengths) and its interpretation. The article is uses an organizational (labor-market) data example, but the whole code can be easily extended to much more applications in medicine, physics, biology, engineering, ... The main contribution of this article for organizational sciences is the multiple-membership setting, which I absolutely love!!! Since brms has the most straightforward implementation and easy-to-use approach in R so far! I hope this might be somewhat helpful. |
Hi, any chance of adding Cox Proportional Hazards models? They are widely used for survival analysis in medical research and would be a nice addition to the package. Thanks!
The text was updated successfully, but these errors were encountered: