Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Kaplan Meier, Survival models #548

Closed
wants to merge 36 commits into from
Closed

Conversation

josef-pkt
Copy link
Member

see #223
also issue #547 bug for older version in master

add survival models

low test coverage, mostly smoke tests

but could be merged because it's still in sandbox

ScottWPiraino and others added 30 commits April 9, 2012 13:48
…lass for Cox Proportional Hazard models, and create classes for inputs and results for both models
…lan-Meier models with more that 1 category of covariates
@jseabold
Copy link
Member

I'm going to rebase this again with the newest changes and start to go through it.

@jseabold
Copy link
Member

Actually, let me know before I do a force push if this is ok. I've refactored the ovarian cancer dataset to be like the others. Now I'm having some ideas on the Survival class. Maybe I should start a new branch?

@josef-pkt
Copy link
Member Author

push to a new branch, I'm not sure what the status of my local branch is (rebased, rebased2, rebased3, ...)

@jseabold
Copy link
Member

I'll probably just move it to my fork to work on. Looks like it the classes need some cleaning up to work with changes.

I'll continue discussion here and inline on the code as necessary.


##update usage with Survival for changes to Survival

def __init__(self, surv, exog=None, data=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the thinking behind having separate exog and data arguments? Can we stick to the endog, exog=None signature here and add a censoring argument? Then we could also have an time_type = "interval" or whatever and get rid of the Survival class. Then it could be similar to the TimeSeries models. I'm not against having a helper class to pass to these models but I'm seeing if we can simplify this code a bit first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://groups.google.com/d/topic/pystatsmodels/4O-C5-nY0Dc/discussion

I think it's incomplete refactoring that we have both in the __init__ or exog is for the strata, stratification (I need to check again)
surv is endog IIRC

I'm not sure where we go with this.
I like Survival as a class to hold the data descriptive information, instead of attaching everything to the model classes. Also a Survival data instance needs to be created only once, and then can be used with all the related functions and models.
I kind of got to like this idea, but it's a bit of a break with the current signatures.

(I worked a bit similar to this with repeated measures/panel data where I keep internally a Groups instance.)

Preliminary opinion: I would keep Survival as a class for internal use even if we move to a more "standard" signature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like the Survival abstraction but maybe not from a user perspective. I think it could make sense to have a SurvivalData class or something of the sort similar to what we have to TimeSeries. It's been pretty useful there. Will think some more. I need to play around and get back into this, but I think it looks like a pretty good start and not far from being polished.

@josef-pkt
Copy link
Member Author

Overall I think that this has a very good coverage of survival models, close to being able to do full UCLA-stats type examples.

One main innovation compared to existing models is the included stratification and associated tests and plots.
I think the tight integration of stratification into the model makes it more complex and less modular than necessary.
But I thought a redesign and possible separation of the stratification from the main model can wait until we have unit tests and some more experience in how stratification (in the style of SAS) could be included more generally and in a more modular way in various models.

censoring : int or array-like
index of the column containing an indicator
of whether an observation is an event, or a censored
observation, with 0 for censored, and 1 for an event
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a convention? I was thinking of changing the argument to event and then having a censored attribute that is the opposite of event, because this would make a lot more sense to me. But if censoring where 0 indicates censored is a convention then I won't change it.

@jseabold
Copy link
Member

Trying to finish refactoring the CoxPH code. Is there a reference (or intuition) for the "interval" likelihoods anywhere? I haven't come across anything, but I don't have a single good reference in front of me, and this code is a bit hard to follow.

Should end up with generic "stratified" model (I'm calling it GroupedModel for now) and a generic SurvivalModel. Both of which should hopefully be useful and reusable.

@josef-pkt
Copy link
Member Author

Where is the "interval" likelihood? I'll have to look since I don't remember

@jseabold
Copy link
Member

It's in sandbox/survival2.CoxPH (and survival2.Survival). It's what's used when you give a start and end time to Survival as opposed to just a time to failure (or censoring).

@josef-pkt
Copy link
Member Author

My first guess was that it's just a convenience interface for time to failure given by start to end time.

But I think it is absolute (calender) time, so we can date exog correctly.
unemployment spells with macro data ?

I didn't read it long enough to really understand this
https://github.com/jseabold/statsmodels/blob/1ccb9c4b87fad29fff679bc8536f7ff474ef7318/statsmodels/sandbox/survival2.py#L894

I don't have any specific references for survival

@josef-pkt
Copy link
Member Author

http://support.sas.com/documentation/cdl/en/statug/65328/HTML/default/viewer.htm#statug_phreg_details05.htm

I think I also read Stata and SPSS descriptions, but never checked the gory details.

@jseabold
Copy link
Member

jseabold commented Feb 7, 2014

Closing this in favor of #1312. Any further PRs will be linked there.

@jseabold jseabold closed this Feb 7, 2014
@jseabold jseabold deleted the survival-rebased branch February 7, 2014 19:00
@jseabold jseabold restored the survival-rebased branch February 7, 2014 19:00
@bashtage bashtage added the rejected Used for PRs with changes that are not acceptable label Jul 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rejected Used for PRs with changes that are not acceptable
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants