Add G-formula #10

Closed
pzivich opened this issue Jul 6, 2018 · 8 comments

@pzivich
Owner

pzivich commented Jul 6, 2018

One lofty goal is to implement the G-formula. Two versions would need to be coded: time-fixed and time-varying. The chapter by Robins & Hernan is a good reference. I have code that implements the g-formula using pandas, and it is reasonably fast.

TODO: generalize to a class; allow users to input models and then predict; determine how to let users input custom treatment regimes (all/none/natural course are easy to do); compare results (https://www.ncbi.nlm.nih.gov/pubmed/25140837)

The time-fixed version will be relatively easy to write up.
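The all/none/natural-course comparison for the time-fixed case can be sketched as follows. This is a minimal illustration, not the package's implementation. It assumes a continuous outcome, a linear outcome model fit by least squares with NumPy, and hypothetical column names (`art`, `age`, `female`, `y`).

```python
import numpy as np
import pandas as pd


def _design(df, covariates):
    # Intercept column followed by the listed covariates
    return np.column_stack([np.ones(len(df))] + [df[c].to_numpy(dtype=float) for c in covariates])


def time_fixed_gformula(df, treatment, outcome, covariates, regime):
    """Mean predicted outcome with treatment set by `regime`.

    regime: 'all' (everyone treated), 'none' (no one treated), or
    'natural' (treatment left as observed, i.e. the natural course).
    """
    model_vars = [treatment] + list(covariates)
    # Fit a linear outcome model on the observed data (assumption: continuous outcome)
    beta, *_ = np.linalg.lstsq(_design(df, model_vars), df[outcome].to_numpy(dtype=float), rcond=None)

    g = df.copy()
    if regime == 'all':
        g[treatment] = 1
    elif regime == 'none':
        g[treatment] = 0
    elif regime != 'natural':
        raise ValueError("regime must be 'all', 'none', or 'natural'")
    # Predict everyone's outcome under the regime and average
    return float((_design(g, model_vars) @ beta).mean())


df = pd.DataFrame({'art': [0, 1, 0, 1, 1, 0, 1, 0],
                   'age': [30, 22, 45, 27, 35, 50, 19, 40],
                   'female': [1, 0, 0, 1, 1, 0, 0, 1]})
df['y'] = 2 + 3 * df['art'] + 0.1 * df['age'] + df['female']

r_all = time_fixed_gformula(df, 'art', 'y', ['age', 'female'], 'all')
r_none = time_fixed_gformula(df, 'art', 'y', ['age', 'female'], 'none')
r_natural = time_fixed_gformula(df, 'art', 'y', ['age', 'female'], 'natural')
print(r_all - r_none)
```

Because the toy outcome is generated without noise, the all-minus-none difference recovers the treatment coefficient used to build `y`. With real data the outcome model would usually be logistic for a binary outcome.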

The time-varying version will need the ability to specify a large number of models and the order in which the models are fit.
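One way to respect a user-specified fitting order is to keep the models in an ordered list and draw each variable in turn within a time step. Later models then see values drawn earlier in the same step. The sketch assumes binary variables, models supplied as plain callables that return probabilities, and hypothetical column names (`lag_art`, `cd4_low`, `art`). It shows only a single step of the Monte Carlo simulation.

```python
import numpy as np
import pandas as pd


def simulate_step(g, ordered_models, rng):
    """Draw one time step of a Monte Carlo g-formula.

    ordered_models is a list of (column, predict_prob) pairs. Each predict_prob
    takes the current DataFrame and returns one probability per row. Columns are
    drawn in list order, so a model later in the list can use values drawn
    earlier in the same step.
    """
    g = g.copy()
    for column, predict_prob in ordered_models:
        p = np.clip(np.asarray(predict_prob(g), dtype=float), 0.0, 1.0)
        g[column] = rng.binomial(1, p)
    return g


# Hypothetical deterministic "models" so the ordering is easy to see:
# cd4_low depends on last period's treatment, and treatment depends on cd4_low.
models = [
    ('cd4_low', lambda d: 1 - d['lag_art']),
    ('art', lambda d: d['cd4_low']),
]
start = pd.DataFrame({'id': [1, 2, 3], 'lag_art': [0, 1, 0]})
step = simulate_step(start, models, np.random.default_rng(2018))
print(step)
```

If the two entries in `models` are reversed, the treatment model asks for `cd4_low` before it has been drawn and pandas raises a KeyError. This is why the fitting order has to be part of the user's specification.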

Note: I am also considering reorganizing in v0.2.0 so that IPW, the g-formula, and doubly robust estimators are all contained within a folder called causal, rather than adding to the current ipw folder.

@pzivich pzivich self-assigned this Jul 6, 2018
@pzivich
Owner Author

pzivich commented Jul 13, 2018

Thoughts on how to allow custom specification:

Allow the input to be a str object like "((gf['age']>=25) & (gf['female']==1))".
This would select females aged 25 or older and apply the treatment to that group only. In the background, the program would run the string as code via eval(treatment).
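The proposed mechanism can be sketched as follows. The condition string is evaluated with Python's built-in `eval()` in a namespace where the name `gf` is bound to the data. The resulting boolean mask then sets treatment to 1 in that group and 0 elsewhere. The function name, the `art` column, and the 0-elsewhere convention are assumptions for illustration.

```python
import pandas as pd


def apply_custom_treatment(gf, treatment, condition):
    """Set `treatment` to 1 for rows matching `condition` and 0 otherwise.

    `condition` is a string such as "((gf['age']>=25) & (gf['female']==1))".
    It is evaluated with eval() in a namespace where `gf` refers to the data.
    """
    gf = gf.copy()
    mask = eval(condition, {}, {'gf': gf})
    gf[treatment] = mask.astype(int)
    return gf


data = pd.DataFrame({'age': [30, 20, 40, 26], 'female': [1, 1, 0, 1]})
treated = apply_custom_treatment(data, 'art', "((gf['age']>=25) & (gf['female']==1))")
print(treated)
```

Because `eval()` executes arbitrary Python, this approach is only reasonable when the strings come from the analyst running the code.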

I don't know how I feel about this solution. The user would HAVE to refer to the data as gf for the function to work properly, and I would need some way to check this. It doesn't seem that elegant, but it should get the job done. However, it seems easy for the user to break...

@pzivich
Owner Author

pzivich commented Jul 14, 2018

This is the solution I am going to use. It works well in my testing. The user will need to specify "((g['age']>=25) & (g['female']==1))" for custom treatment strategies. I am happy with how it works, and it seems to be the easiest, most user-friendly solution I have come across.

I still might want to add checks that the user specified g correctly; otherwise it could cause some trouble.
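The checks mentioned above could look something like this hypothetical helper. It rejects strings that do not refer to the data as `g`, strings that fail to evaluate, and strings that do not return one boolean per row. Which checks are needed is an assumption about what "specified g correctly" should mean.

```python
import pandas as pd


def check_treatment_string(g, condition):
    """Hypothetical validation for a custom treatment string that refers to the data as `g`."""
    if "g[" not in condition:
        raise ValueError("custom treatment must reference the data as g[...]")
    try:
        mask = eval(condition, {}, {'g': g})
    except (NameError, KeyError) as err:
        raise ValueError(f"could not evaluate treatment string: {err!r}") from err
    if not isinstance(mask, pd.Series) or mask.dtype != bool:
        raise ValueError("treatment string must evaluate to a boolean Series")
    if not mask.index.equals(g.index):
        raise ValueError("treatment string must return one value per row of g")
    return mask


g = pd.DataFrame({'age': [30, 20], 'female': [1, 1]})
mask = check_treatment_string(g, "((g['age']>=25) & (g['female']==1))")
print(mask.tolist())
```

A string written with `gf` instead of `g` fails the first check, and a misspelled column name is reported as a ValueError rather than surfacing as a KeyError deep inside the g-formula.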

@pzivich
Owner Author

pzivich commented Jul 15, 2018

For TimeVaryGFormula, lagged variables are causing me a bit of a problem.

Basically, I think the user will need to specify which variables are lagged and what they correspond to. A dictionary where the keys are the variables and the values are the columns of the lagged variables.
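The lag dictionary described above can be sketched as follows. The sketch assumes long-format data with hypothetical `id` and `t` columns and a binary treatment `art` whose lag is stored in `lag_art`. `add_lags` builds the lagged columns from the observed data. It fills the first period with 0, which assumes no treatment before baseline. `update_lags` carries simulated values into the lag columns before the next simulated step.

```python
import pandas as pd


def add_lags(df, lags, id_col='id', time_col='t'):
    """Create lagged columns from observed long data.

    lags maps each variable to the column that holds its lagged value,
    e.g. {'art': 'lag_art'}. The first row per id is filled with 0 (assumption).
    """
    df = df.sort_values([id_col, time_col]).copy()
    for variable, lag_column in lags.items():
        df[lag_column] = df.groupby(id_col)[variable].shift(1, fill_value=0)
    return df


def update_lags(g, lags):
    """Copy current simulated values into their lag columns before the next step."""
    g = g.copy()
    for variable, lag_column in lags.items():
        g[lag_column] = g[variable]
    return g


lags = {'art': 'lag_art'}
observed = pd.DataFrame({'id': [1, 1, 1, 2, 2], 't': [0, 1, 2, 0, 1], 'art': [0, 1, 1, 1, 0]})
observed = add_lags(observed, lags)
print(observed)
```

Keeping the mapping in one dictionary means the same specification drives both model fitting on the observed data and the carry-forward step during simulation.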

With this addition, I should have some executable code for both g-formulas.

TODO: testing (compared to 722), see if there is any way to make it easier for the user, optimize run time (I am particularly concerned about the time-varying g-formula since it does a lot of operations).

@pzivich
Owner Author

pzivich commented Jul 15, 2018

Also, functional forms for continuous covariates that are predicted are another complex issue. Not sure how to handle that yet.

@pzivich
Owner Author

pzivich commented Jul 15, 2018

An ugly and probably confusing solution is to use exec() to do the variable recoding. Essentially, this requires the user to write a block of recoding code, like code_to_recode = "g['var_sq'] = g['var']**2; g['var_cu'] = g['var']**3"

I'm not happy with this solution, but I am going to proceed since I don't know of another one yet. Handling functional forms for time-varying confounders is not great...

While not the most elegant solution, it should give the user a lot of control over the specification. It will get cumbersome for very complicated recodings, though; in that case it might be better to write a quick function and call it through this.
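Both options from this comment are sketched below. The first runs a recoding string with Python's built-in `exec()`, where the string must refer to the data as `g`. The second passes a plain function instead. The helper names are hypothetical. `var`, `var_sq`, and `var_cu` follow the example string given earlier in the thread.

```python
import pandas as pd


def apply_recode_string(g, code):
    """Run a user-supplied recoding block with exec(); the block must refer to the data as `g`."""
    g = g.copy()
    # Item assignment inside the block mutates this copy in place
    exec(code, {}, {'g': g})
    return g


def recode_polynomials(g):
    """Equivalent recoding written as an ordinary function."""
    g['var_sq'] = g['var'] ** 2
    g['var_cu'] = g['var'] ** 3
    return g


def apply_recode_function(g, func):
    return func(g.copy())


base = pd.DataFrame({'var': [1, 2, 3]})
code_to_recode = "g['var_sq'] = g['var']**2; g['var_cu'] = g['var']**3"
from_string = apply_recode_string(base, code_to_recode)
from_function = apply_recode_function(base, recode_polynomials)
print(from_string)
```

The function form keeps the recoding testable and readable for complicated cases, while the string form keeps simple polynomial terms to a single line.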

@pzivich
Owner Author

pzivich commented Jul 18, 2018

As somewhat expected, the time-varying g-formula takes a while to run. A memory error occurs in the sort_values step.

I need to look into ways to reduce memory usage (and speed everything up if possible).

@pzivich
Owner Author

pzivich commented Jul 23, 2018

The time-varying G-formula works, but it is a little slow and uses too much memory. It currently calls pandas DataFrame.sort_values() after each step, which takes up too much memory. I need to find an alternative to sorting by two columns (ID, time), or another way to append the rows that preserves the order.
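One common way to avoid re-sorting after every step is to keep each step's simulated rows in a list, then concatenate and sort once after the loop. The sketch assumes hypothetical `id` and `t` columns and uses a placeholder in place of the real simulation step. Whether this alone resolves the memory error in TimeVaryGFormula has not been checked.

```python
import pandas as pd


def stack_time_steps(step_frames, id_col='id', time_col='t'):
    """Concatenate per-step DataFrames once, then sort by (id, time) a single time."""
    stacked = pd.concat(step_frames, ignore_index=True)
    return stacked.sort_values([id_col, time_col]).reset_index(drop=True)


# Placeholder loop: each step produces one row per person for time t
current = pd.DataFrame({'id': [1, 2], 'x': [0, 0]})
frames = []
for t in range(3):
    current = current.assign(t=t, x=current['x'] + 1)  # stand-in for the real simulation step
    frames.append(current)

long_data = stack_time_steps(frames)
print(long_data)
```

Each step only works with the current cross-section of people, so the growing long dataset is built and sorted once instead of being re-sorted at every time point.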

@pzivich
Owner Author

pzivich commented Jul 23, 2018

I have working versions of both G-formulas. Closing this issue since both have been added to the GitHub page. The remaining TODO list is in the TimeVary file itself.

At this stage, the implementation is still experimental and has not been verified.

@pzivich pzivich closed this as completed Jul 23, 2018