support competing events in prep-data-long-surv #42

Closed
jburos opened this issue Jan 7, 2017 · 1 comment

jburos commented Jan 7, 2017

For competing & semi-competing risk models (related to #36), we often have multiple events (& separate covariates for each event). The first part of supporting competing & semi-competing risk models is to allow for multiple events in prep_data_long_surv.

Some design decisions:

input data: We will assume for now that the data for each event are stored in separate dummy variables, so we want to handle the scenario where the user provides a list of event_col names rather than a single event_col name. The case where the user has event data stored as numerical indices is one we can support at a later date, by first transforming the two-outcome data to a factor.
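As a rough illustration of the two input layouts, the sketch below converts a hypothetical integer-coded event column into one dummy column per event type, which is the layout assumed above. The column names, the `labels` mapping, and the data are made up for illustration, and the indicator columns are built directly rather than through a factor.

```python
import pandas as pd

# Hypothetical input: one integer-coded event column (0 = no event observed).
df = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "time": [1.2, 5.5, 5.5],
    "event_code": [0, 2, 1],
})

# Hypothetical mapping from numeric index to event name.
labels = {1: "new_lesion", 2: "death"}

# Build one 0/1 dummy column per event type.
wide = df[["subject_id", "time"]].copy()
for code, name in labels.items():
    wide[name] = (df["event_code"] == code).astype(int)

# This is the list of names a user would pass in place of a single event_col.
event_cols = list(labels.values())
print(wide)
```

Building a column for every entry in `labels` means an event type still gets a column even when no subject experienced it.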

event type: There are several ways to handle multiple events, depending on their type:

  • binary event, recurring: let's consider an event that occurs at t=3, and let's assume we have integer timepoints numbered 0, 1, 2, and so on. The event data for this time series will look like: [0, 0, 0, 1, 0, 0, 0, ...]. In other words, the event occurred at t=3, but did not occur before that time and did not occur at the timepoints following.
  • binary event, terminating: let's consider the same setup with an event at t=3, as above. In this case, our event data should look like: [0, 0, 0, 1] for t = [0, 1, 2, 3]. After t=3, there should be no further records in the dataset. (This is the default type.)
  • binary event, one-time: considering a similar setup as in the previous examples, this event type with an event at t=3 will produce data with 0 values before the event, and 1 values at and following the event: [0, 0, 0, 1, 1, 1, ...]. The most natural use case for this type of event in a clinical setting is entering an intermediate state, like "recurrence" or "hospitalization".

The current plan is to support each of these, to varying degrees.
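
The three encodings can be sketched in a few lines of plain Python. The function name `encode_event` and its arguments are hypothetical, not part of the package, and the recurring case is shown with a single occurrence only, to match the example above.

```python
def encode_event(event_time, n_timepoints, kind="terminating"):
    """Return the 0/1 event series over integer timepoints 0..n_timepoints-1.

    Hypothetical helper for illustration only.
    """
    if kind == "recurring":
        # 1 only at the event time; records continue afterwards.
        return [1 if t == event_time else 0 for t in range(n_timepoints)]
    if kind == "terminating":
        # 1 at the event time; no records after the event.
        last = min(event_time + 1, n_timepoints)
        return [1 if t == event_time else 0 for t in range(last)]
    if kind == "one-time":
        # 0 before the event; 1 at and after the event.
        return [1 if t >= event_time else 0 for t in range(n_timepoints)]
    raise ValueError(f"unknown event kind: {kind!r}")


for kind in ("recurring", "terminating", "one-time"):
    print(kind, encode_event(3, 7, kind))
```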

jburos self-assigned this Jan 7, 2017

jburos commented Jan 7, 2017

The assumption will be that the data for multiple events are provided in a "tidy" form. This means that:

  1. All event data are provided in a single data frame
  2. Each "observation" of an event state is provided in a separate record.
  3. Each record contains:
    • subject identifier
    • time value
    • event label
    • event value (1: yes / 0: no)

In other words, the data will look something like the following:

subject_id   time   event_name   event_value
1            0.22   new_lesion   1
1            0.44   new_lesion   1
1            1.2    death        0
2            5.5    death        1
3            5.5    death        0

The first subject had two recurring events (type 1 above), then a censoring event at t=1.2 (censored because the value = 0). Subjects 2 and 3 had no new-lesion events: subject 2 had a terminating event (death) at t=5.5, and subject 3 was censored at t=5.5 (again, because the value = 0).
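
For concreteness, here is the same example as a pandas data frame, with a quick count of observed events per subject and event type. This is only a sketch of the input layout. The variable names are arbitrary.

```python
import pandas as pd

# The example data in "tidy" form: one record per observation of an event state.
events = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 3],
    "time": [0.22, 0.44, 1.2, 5.5, 5.5],
    "event_name": ["new_lesion", "new_lesion", "death", "death", "death"],
    "event_value": [1, 1, 0, 1, 0],
})

# Sum event_value per subject and event type; missing combinations become 0.
counts = events.pivot_table(
    index="subject_id",
    columns="event_name",
    values="event_value",
    aggfunc="sum",
    fill_value=0,
)
print(counts)
```

Note that the zeros for new_lesion in subjects 2 and 3 come from `fill_value=0`, not from any record in the data.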

There are two assumptions which are often made in processing these data, which are helpful to make explicit:

  1. Assume that the event state is "0" between observed timepoints for each subject, and that no records will be created following the event unless otherwise specified. This corresponds to the second case above (the binary, terminating event type).

  2. Also assume that each subject is censored at a single time, which applies uniformly for all event types. Here, the censor time is assumed to be the time of last observation for each subject_id.

In other words, in the example above, we are assuming that no "new_lesion" events occurred between the event at t=0.44 and the censoring event at t=1.2.

It also means that we will infer, for subjects 2 and 3 above, that no new-lesion events occurred, since the censor status for the 'death' event is tied to that for the 'new-lesion' event. Depending on your data, this may or may not be accurate. (In other words, knowing there are no new-lesion-events is very different from not knowing whether there is a new-lesion).
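
A minimal sketch of how these two assumptions play out on the example data, using pandas. This is an illustration of the inference, not the implementation in prep_data_long_surv, and the variable names are hypothetical.

```python
import pandas as pd

events = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 3],
    "time": [0.22, 0.44, 1.2, 5.5, 5.5],
    "event_name": ["new_lesion", "new_lesion", "death", "death", "death"],
    "event_value": [1, 1, 0, 1, 0],
})

# Assumption 2: a single censor time per subject, taken as the time of the
# last observation, shared by all event types.
censor_time = events.groupby("subject_id")["time"].max()

# Was each event type ever observed (value == 1) for each subject?
observed = events.groupby(["subject_id", "event_name"])["event_value"].max()

# Assumption 1: subject/event combinations with no records are treated as 0.
full_index = pd.MultiIndex.from_product(
    [censor_time.index, sorted(events["event_name"].unique())],
    names=["subject_id", "event_name"],
)
observed = observed.reindex(full_index, fill_value=0)

print(censor_time)
print(observed)
```

The `reindex(..., fill_value=0)` step is where "not recorded" silently becomes "did not occur", which is the inference the paragraph above warns about.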

In order to relax assumption 2, we would need to know the time of censor for new-lesion events, which would require that the user create an additional record censoring that event type (in the example above, a new-lesion record with value 0 at t=1.2 for subject 1). For now, we don't have a need to support this.
