Gformula sequential #32

Merged
merged 7 commits into from
Dec 9, 2018

pzivich
Owner

@pzivich pzivich commented Nov 15, 2018

In reference to #30

This PR divides TimeVaryGFormula into two estimation methods: Monte Carlo (currently implemented) and Sequential Regression (new method). Monte Carlo works better for survival data, while Sequential Regression works best for longitudinal data.

Sequential regression uses the following process:

  1. At Q_t, fit a regression model to those who survived until T=t.
  2. Predict Y_t from that model under the intervention of interest.
  3. Those who followed the treatment plan AND had the outcome have a 1 carried forward.
  4. Everyone else who did NOT have the outcome is considered censored (np.nan).
  5. The process above is repeated. For those WITH predicted outcomes, the predicted outcome is used in the model fitting. Those who were observed at Q_{t-1} but censored at Q_t have their observed outcome used.
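
A minimal sketch of one reading of the steps above, in dependency-free Python. This is not the zEpid implementation: the wide-format column names (`A1`, `L1`, `Y1`, ...), the helper functions, and the use of stratified means in place of a fitted regression model are all assumptions made for illustration, and `math.nan` stands in for `np.nan`.

```python
import math
from collections import defaultdict

NAN = math.nan  # stands in for np.nan in this dependency-free sketch


def fit_stratified_means(rows, targets, predictors):
    """Hypothetical stand-in for a regression model: the mean target
    within each observed combination of predictor values."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row, y in zip(rows, targets):
        if not math.isnan(y):
            key = tuple(row[p] for p in predictors)
            totals[key] += y
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def had_prior_outcome_on_plan(row, t, treat):
    """True if the outcome occurred before time t while following the plan."""
    for k in range(1, t):
        if row[f"A{k}"] != treat:
            return False
        if row[f"Y{k}"] == 1:
            return True
    return False


def sequential_regression(rows, n_times, treat=1):
    """Estimated risk by the last time point if everyone received `treat`."""
    q = [NAN] * len(rows)  # predictions carried back from the later time
    for t in range(n_times, 0, -1):
        a, l, y = f"A{t}", f"L{t}", f"Y{t}"
        targets = []
        for row, q_next in zip(rows, q):
            if math.isnan(row[y]):
                targets.append(NAN)            # not observed at time t
            elif math.isnan(q_next):
                targets.append(float(row[y]))  # step 5: observed outcome
            else:
                targets.append(q_next)         # step 5: predicted outcome
        # Step 1: fit among those still observed at time t
        model = fit_stratified_means(rows, targets, [a, l])
        q = []
        for row, target in zip(rows, targets):
            if not math.isnan(target):
                # Step 2: predict under the intervention of interest
                q.append(model.get((treat, row[l]), NAN))
            elif had_prior_outcome_on_plan(row, t, treat):
                q.append(1.0)                  # step 3: carry the 1 forward
            else:
                q.append(NAN)                  # step 4: treated as censored
    kept = [v for v in q if not math.isnan(v)]
    return sum(kept) / len(kept)


# Toy data: two time points, one binary covariate held at 0
toy = [
    {"A1": 1, "L1": 0, "Y1": 1, "A2": NAN, "L2": NAN, "Y2": NAN},
    {"A1": 1, "L1": 0, "Y1": 0, "A2": 1, "L2": 0, "Y2": 1},
    {"A1": 1, "L1": 0, "Y1": 0, "A2": 1, "L2": 0, "Y2": 0},
    {"A1": 0, "L1": 0, "Y1": 0, "A2": 0, "L2": 0, "Y2": 0},
]
risk = sequential_regression(toy, n_times=2, treat=1)
```

On the toy rows, `risk` works out to 2/3: a risk of 1/3 at the first time point among the treated, plus 2/3 × 1/2 among those who survived to the second.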

…ary. Still needs custom treatment support and testing
…as a reference. Had some weird values show-up on the sample data when converted to time chunks
@pzivich pzivich mentioned this pull request Nov 15, 2018
@pzivich
Owner Author

pzivich commented Nov 16, 2018

Sometimes risks go down over time when using these longitudinal methods (personal communication regarding AIPW). Even in the LTMLE paper, some risks go down over time. It still seems weird and feels unnatural to me.

My best bet is to simulate some reasonable data and compare to R's ltmle estimated via gcomp=TRUE. If I can obtain consistent results, I will be more confident in my implementation

I might know the issue. In the current implementation, anyone who has the outcome at that time point always has it reset to 1. However, it should only be set to 1 for FUTURE outcomes.

@pzivich
Owner Author

pzivich commented Nov 16, 2018

Found the issue. It was a tricky little piece. In case I need to remember later: an individual's outcome is set to 1 if and only if they followed the treatment regime of interest, had the outcome, and had that outcome before the current iteration.

This is now caught by adding an additional condition asserting that the current outcome is NaN. This occurs in Step 2.3 of the estimation procedure.
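
For future reference, the condition can be sketched as a small standalone function. The function and argument names here are hypothetical and are not taken from the zEpid source; the sketch only illustrates the three-part check described above, with `math.nan` standing in for `np.nan`.

```python
import math


def carried_forward_outcome(followed_regime, had_prior_outcome, current_outcome):
    """Return 1.0 only when all three conditions hold; otherwise leave
    the current value (a prediction, an observation, or NaN) untouched."""
    current_is_nan = math.isnan(current_outcome)
    if followed_regime and had_prior_outcome and current_is_nan:
        return 1.0
    return current_outcome


# A prediction at the current time point is no longer overwritten
kept_prediction = carried_forward_outcome(True, True, 0.3)
# An outcome from an earlier time point is carried forward as 1
carried = carried_forward_outcome(True, True, math.nan)
# Off-regime individuals stay missing
off_regime = carried_forward_outcome(False, True, math.nan)
```

Without the NaN check, the first call would return 1 instead of the predicted value, which is the behavior described in the earlier comment.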

@pzivich
Owner Author

pzivich commented Nov 20, 2018

Next step is to simulate data. It looks like it will be the easiest way. Some publicly available longitudinal data requires registering, so I don't think I can include the data with zEpid...

R's ltmle has some recipes for simulated data that would be a good starting point
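
A rough sketch of what such a simulation might look like for two time points. This is not one of the ltmle recipes: the data-generating mechanism, the coefficients, and the column names are arbitrary assumptions chosen only to produce time-varying confounding, with later columns set to NaN after the outcome occurs.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_longitudinal(n, seed=0):
    """Simulate wide-format data with treatment A, confounder L, and
    outcome Y at two time points (all coefficients are arbitrary)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l1 = int(rng.random() < 0.5)
        a1 = int(rng.random() < expit(-0.5 + 1.0 * l1))
        y1 = int(rng.random() < expit(-2.0 + 1.0 * l1 - 0.5 * a1))
        if y1 == 1:
            # No further follow-up once the outcome has occurred
            l2 = a2 = y2 = math.nan
        else:
            # L2 depends on prior treatment; A2 depends on L2 and A1
            l2 = int(rng.random() < expit(-0.5 + 1.0 * l1 - 0.5 * a1))
            a2 = int(rng.random() < expit(-0.5 + 1.0 * l2 + 1.0 * a1))
            y2 = int(rng.random() < expit(-2.0 + 1.0 * l2 - 0.5 * a2))
        rows.append({"L1": l1, "A1": a1, "Y1": y1,
                     "L2": l2, "A2": a2, "Y2": y2})
    return rows


sim = simulate_longitudinal(500, seed=1)
```

Because the generator is seeded, the same data can be exported and fed to both implementations when comparing estimates.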

@pzivich pzivich changed the base branch from master to v0.4.0 November 20, 2018 18:30
@pzivich pzivich merged commit 1866c0b into v0.4.0 Dec 9, 2018