support competing events in prep-data-long-surv #42
The assumption will be that the data for multiple events will be provided in a "tidy" form: This means that:
In other words, the data will look something like the following:
The first subject had two recurring events (type 1 above), then a censoring event at t=1.2 (censored because the value = 0). Subjects 2 and 3 had no new-lesion events and had terminating events at t=5.5. There are two assumptions which are often made in processing these data, which are helpful to make explicit:
In other words, in the example above, we are assuming that no "new_lesion" events occurred between the event at t=0.44 and the censoring event at t=1.2. It also means that we will infer, for subjects 2 and 3 above, that no new-lesion events occurred, since the censor status for the 'death' event is tied to that for the 'new-lesion' event. Depending on your data, this may or may not be accurate. (Knowing that there were no new-lesion events is very different from not knowing whether there was a new lesion.) In order to relax assumption 2, we would need to know the censoring time for new-lesion events, which would require the user to create an additional record at t=1.2 censoring the second event type. For now, we don't have a need to support this.
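As a rough illustration of the second assumption, the sketch below takes tidy records and uses each subject's 'death' record as the end of follow-up for the 'new_lesion' event as well. Everything here is hypothetical: the tuple layout, the function name `summarize_event`, the first new-lesion time (0.20), and the death status for subjects 2 and 3 are made up for the example and are not taken from the project.

```python
from collections import defaultdict

# Hypothetical tidy records: (subject_id, time, event_name, event_value).
# The layout, the 0.20 time point, and the death values for subjects 2 and 3
# are assumptions made for illustration only.
records = [
    (1, 0.20, "new_lesion", 1),
    (1, 0.44, "new_lesion", 1),
    (1, 1.20, "death", 0),  # value 0: censored at t=1.2
    (2, 5.50, "death", 1),
    (3, 5.50, "death", 1),
]


def summarize_event(records, event_name="new_lesion", terminal_event="death"):
    """Collect observed event times per subject and borrow the time of the
    terminal record as the censoring time for `event_name`."""
    event_times = defaultdict(list)
    followup_end = {}
    for subject, time, name, value in records:
        if name == terminal_event:
            # The terminal record (event or censor) closes follow-up.
            followup_end[subject] = time
        elif name == event_name and value == 1:
            event_times[subject].append(time)
    return {
        subject: {
            # An empty list is read as "no events occurred", not "unknown".
            "event_times": sorted(event_times[subject]),
            "censor_time": end,
        }
        for subject, end in followup_end.items()
    }


summary = summarize_event(records)
```

Under these assumptions, subjects 2 and 3 come back with an empty `event_times` list, which is exactly the inference described above. Relaxing it would mean supplying an explicit censoring record for 'new_lesion' rather than borrowing the one from 'death'.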
For competing & semi-competing risk models (related to #36), we often have multiple events (& separate covariates for each event). The first part of supporting competing & semi-competing risk models is to allow for multiple events in prep_data_long_surv.
Some design decisions:
input data: We will assume for now that the data for each event are stored in separate dummy variables; that is, we want to handle the scenario where the user provides a list of event_col names rather than a single event_col name. The case where the user has data stored with numerical indices is one we can support at a later date, by first transforming the two-outcome data to a factor.
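For the numerically indexed case, one possible pre-processing step is to expand the index column into one dummy column per event. This is a minimal sketch only: the helper `add_event_dummies`, the column name `event_code`, and the coding (0 = no event, 1 and 2 = event types) are assumptions for illustration, not part of the package.

```python
def add_event_dummies(rows, code_col, labels):
    """Return copies of `rows` with one 0/1 column per entry in `labels`.

    rows   : list of dicts, one per record
    labels : dict mapping an integer code to the dummy column name
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input is left untouched
        for code, name in labels.items():
            new_row[name] = int(row[code_col] == code)
        out.append(new_row)
    return out


# Hypothetical index-coded input: 0 = no event, 1 = new_lesion, 2 = death.
rows = [
    {"subject_id": 1, "time": 1, "event_code": 0},
    {"subject_id": 1, "time": 2, "event_code": 1},
    {"subject_id": 1, "time": 3, "event_code": 2},
]
labels = {1: "new_lesion", 2: "death"}

wide = add_event_dummies(rows, "event_code", labels)
event_cols = list(labels.values())  # the list of dummy column names
```

After this step each event has its own dummy variable, and `event_cols` holds the corresponding list of column names.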
event type: There are several ways to handle multiple events, depending on their type:
1. [0, 0, 0, 1, 0, 0, 0, ...]. In other words, it occurred at t=3, but did not occur before that time & did not occur at timepoints following.
2. [0, 0, 0, 1] for t = [0, 1, 2, 3]. After t=3, there should be no following records in the dataset. (this is the default type)
3. 0 values before the event, and 1 values at and following the event: [0, 0, 0, 1, 1, 1, ...]. The most natural use case for this type of event in a clinical setting is the entering of an intermediate state, like "recurrence" or "hospitalization".

The current plan is to support each of these, to varying degrees.
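The three indicator patterns described above can be sketched with a small helper. The function name `event_indicator` and the labels "recurring", "terminating" and "state" are illustrative choices made for this example, not names used by the package.

```python
def event_indicator(timepoints, event_time, kind="terminating"):
    """Build the 0/1 event column for one subject with one event time.

    kind="recurring"   : 1 only at the event time, 0 before and after
    kind="terminating" : 1 at the event time, no records afterwards
    kind="state"       : 0 before the event, 1 at and after it
    """
    if kind == "recurring":
        return [int(t == event_time) for t in timepoints]
    if kind == "terminating":
        # Records after the event time are dropped entirely.
        return [int(t == event_time) for t in timepoints if t <= event_time]
    if kind == "state":
        return [int(t >= event_time) for t in timepoints]
    raise ValueError(f"unknown event kind: {kind!r}")


timepoints = [0, 1, 2, 3, 4, 5, 6]
recurring = event_indicator(timepoints, 3, kind="recurring")
terminating = event_indicator(timepoints, 3, kind="terminating")
state = event_indicator(timepoints, 3, kind="state")
```

With an event at t=3, `recurring` is `[0, 0, 0, 1, 0, 0, 0]`, `terminating` is `[0, 0, 0, 1]`, and `state` is `[0, 0, 0, 1, 1, 1, 1]`, matching the patterns listed above.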