moving to dev version 0.1.1; adding examples to readme.md;

osofr · Sep 28, 2015 · 8ae2608 · 8ae2608
1 parent aeda43e
commit 8ae2608
Show file tree

Hide file tree

Showing 2 changed files with 152 additions and 3 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: tmlenet
 Title: Targeted Maximum Likelihood Estimation for Network Data
-Version: 0.1.0
+Version: 0.1.1
 Authors@R: c(
     person("Oleg", "Sofrygin", role=c("aut", "cre"), email="oleg.sofrygin@gmail.com"),
     person(c("Mark", "J."), "van der Laan", role="aut", email="laan@berkeley.edu"))

diff --git a/README.md b/README.md
@@ -1,7 +1,6 @@
 tmlenet
 ==========
 
-
 [![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tmlenet)](http://cran.r-project.org/package=tmlenet)
 [![](http://cranlogs.r-pkg.org/badges/tmlenet)](http://cran.rstudio.com/web/packages/tmlenet/index.html)
 [![Travis-CI Build Status](https://travis-ci.org/osofr/tmlenet.svg?branch=master)](https://travis-ci.org/osofr/tmlenet)
@@ -13,6 +12,12 @@ The `tmlenet` R package performs estimation of average causal effects for single
 
 ### Installation
 
+To install the CRAN release version of `simcausal`: 
+
+```R
+install.packages('tmlenet')
+```
+
 To install the development version of `tmlenet` (requires the `devtools` package):
 
 ```R
@@ -39,7 +44,151 @@ The summary measures (`sW.i`,`sA.i`) are defined simultaneously for all `i` with
 All estimation is performed by calling the `tmlenet` function. The vector of friends `F.i` can be specified either as a single column in the input data (where each `F.i` is a string of friend IDs or friend row numbers delimited by character `sep`) or as a separate input matrix of network IDs (where each row is a vector of friend IDs or friend row numbers). Specifying the network as a matrix generally results in significant improvements to run time. See `tmlenet` function help file for additional details on how to specify these and the rest of the input arguments.
 
 ### Example
-...
+
+We will use the sample dataset (`W`=(`W1`,`W2`,`W3`),`A`,`Y`) and the sample network matrix of friend IDs (`F`) that come along with the package:
+
+```R
+data(df_netKmax6)
+head(df_netKmax6)
+data(NetInd_mat_Kmax6)
+head(NetInd_mat_Kmax6)
+Kmax <- ncol(NetInd_mat_Kmax6) # Max number of friends in this network:
+```
+
+The estimation algorithm assumes that the outcomes in `Y.i` for units `i=1,...,N` are conditionally independent,
+given the summary measures defined in `def_sW` and the summary measures defined in `def_sA`.
+
+When no additional assumptions about the conditional independence of outcomes `Y.i` can be made 
+(beyond the dependence on the network structure),
+one can define the summary measures `sW` and `sA` non-parametrically, e.g., 
+for each observation `i`: include in `sW` all baseline covariates of unit `i` and 
+all baseline covariates of `i`'s friends; include in `sA` the exposure of unit `i` and
+all exposures of `i`'s friends. 
+
+The example below does just that, defining  `sW`:=(`netW1`,`netW2`,`netW3`) and `sA`:=`netA`, 
+where `netVar` is a summary measure of dimension `Kmax+1` and includes `Var` values of each 
+unit as well as `Var` values of all friends of each unit:
+
+```R
+def_sW <- def.sW(netW1 = W1[[0:Kmax]], netW2 = W2[[0:Kmax]], netW3 = W3[[0:Kmax]])
+def_sA <- def.sA(netA = A[[0:Kmax]])
+```
+
+Note that the summary measure `nF` (number of friends for each unit) is always added automatically to 
+`def.sW` function calls (only once), but not to `def.sA`.
+
+A helper function that can pre-evaluate the above summary measures based on the input data:
+
+```R
+eval_res <- eval.summaries(sW = def_sW, sA = def_sA,  Kmax = 6, data = df_netKmax6,
+                          NETIDmat = NetInd_mat_Kmax6)
+```
+
+Contents of the list returned by eval.summaries():
+
+```R
+head(eval_res$sW.matrix) # Matrix of sW summary measures:
+head(eval_res$sA.matrix) # Matrix of sA summary measures:
+head(eval_res$NETIDmat) # matrix of network IDs:
+# Observed data summary measures (sW,sA) and network stored in one object:
+# eval_res$DatNet.ObsP0
+# class(eval_res$DatNet.ObsP0)
+```
+
+In the example below, we estimate mean population outcome under deterministic intervention that assigns all `A` to 0
+(network specified via a matrix of friend IDs). Note that can also use previously evaluated
+summary measures object `DatNet.ObsP0` as input to `tmlenet`, avoiding the need to specify the argumentss
+(`data`,`NETIDmat`,`Kmax`,`sW`,`sA`) for the second time.
+
+```R
+res1 <- tmlenet(data = df_netKmax6, NETIDmat = NetInd_mat_Kmax6, Kmax = Kmax, 
+                sW = def_sW, sA = def_sA,
+                Anode = "A", Ynode = "Y",
+                f_gstar1 = 0L, optPars = list(n_MCsims = 1))
+res1$EY_gstar1$estimates
+res1$EY_gstar1$vars
+res1$EY_gstar1$CIs
+```
+
+By default, the conditional expectation `E[Y=1|...]` (`Qform` argument) is estimated by including all the
+summary measures defined in `sW` and `sA` as predictors in the logistic regression for the outcome `Y`.
+Similarly, by default, the observed exposure model `P(sA|sW)` (`hform.g0` argument) is estimated
+as the conditional probability of observing all summary measures defined in `sA`, given all summary measures
+defined in `sW`. Finally, the intervention exposure model `P(sA^*|sW)` (`hform.gstar` 
+argument) is estimated by first replacing all observed exposures in `A` with those generated from
+the intervention function specified in `f_gstar1` (new exposures denoted by `A^*`) and then building
+the same summary measures defined in `sW` and `sA` using exposures `A^*` instead of `A`
+(new summary measures denoted by `sA^*`). By default, the intervention exposure model `P(sA^*|sW)`
+will be estimated as the conditional probability of observing all summary measures defined in `sA^*` 
+(`sA^*` built with `A^*` using the same summary mappings as in `sA`), given the summary measures defined in `sW`.
+
+One can alter this default behavior and use the arguments `Qform`, `hform.g0` and `hform.gstar`
+to select a subset of the summary measures in `sW`,`sA` to be included in each of the three models described above.
+For example, below we are assuming that the outcomes in `Y` only depend on the summary measures `netA`,`netW2` 
+(`"Y~netA+netW2"`), hence the observed exposure model is `P(sA|netW2)` (`"netA~netW2"`) and we also know that 
+`f_gstar1` defines a static intervention and hence sA^* doesn't depend on any covariates (`P(sA^*|sW)' is degenerate),
+but is estimated with a simple model (`"netA ~ nF"`):
+
+```R
+res2 <- tmlenet(DatNet.ObsP0 = eval_res$DatNet.ObsP0,
+                    Anode = "A", Ynode = "Y", 
+                    Qform = "Y ~ netA + netW2",
+                    hform.g0 = "netA ~ netW2",
+                    hform.gstar = "netA ~ nF",
+                    f_gstar1 = 0L, optPars = list(n_MCsims = 1))
+res2$EY_gstar1$estimates
+res2$EY_gstar1$vars
+res2$EY_gstar1$CIs
+```
+
+One might be also willing to make dimension reducing assumptions about the dependence of each `Y.i` on its 
+network. For example, here we assume that each `Y.i` depends on its network's baseline covariates only
+through a sum of its friends' values of `W3` and `Y.i` depends on its network's exposures only through a sum 
+of `i`'s friends' interactions `(1-A)*(W2)` (while we assume `Y.i` still depends on `i`'s baseline covariates and
+`i`'s exposure):
+
+```R
+def_sW <- def.sW(W = c(W1,W2,W3)) +
+          def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0=TRUE)
+
+def_sA <- def.sA(A) +
+          def.sA(sum.netAW2 = sum((1-A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0=TRUE)
+
+eval_res <- eval.summaries(sW = def_sW, sA = def_sA, Kmax = 6, data = df_netKmax6,
+                            NETIDmat = NetInd_mat_Kmax6, verbose = TRUE)
+
+res3 <- tmlenet(DatNet.ObsP0 = eval_res$DatNet.ObsP0,
+                Anode = "A", Ynode = "Y",
+                Qform = "Y ~ A + sum.netAW2 + W + sum.netW3 + nF",
+                hform.g0 = "A + sum.netAW2 ~ sum.netW3",
+                hform.gstar = "A + sum.netAW2 ~ sum.netW3",
+                f_gstar1 = 0, optPars = list(n_MCsims = 1))
+res3$EY_gstar1$estimates
+```
+
+Note that the above model specified by `Qform` includes all summary measures in `sW`,`sA`, and hence is equivalent to the default regression model that would have been used if `Qform` was omitted.
+
+One can specify any intervention of interest, for example below we estimate the counterfactual mean outcome under intervention that randomly assigns 20% of the population to exposure `A=1`. Note that we are also increasing the number of Monte-Carlo simulations
+from 1 to 100.
+
+```R
+f.A_.2 <- function(data, ...) rbinom(n = nrow(data), size = 1, prob = 0.2)
+res4 <- tmlenet(data = df_netKmax6, NETIDmat = NetInd_mat_Kmax6, Kmax = Kmax,
+                sW = def_sW, sA = def_sA, 
+                Anode = "A", Ynode = "Y", 
+                f_gstar1 = f.A_.2, optPars = list(n_MCsims = 100))
+res4$EY_gstar1$estimates
+```
+
+To estimate the average treatment effect (ATE) for two interventions (static or stochastic), specify the second intervention function using the argument `optPars(f_gstar2 = ...)`. In the example below, the intervention `f_gstar1`
+statically sets everyone's exposure to `A=1` and the intervention `f_gstar2` statically sets everyone's exposure to `A=0`:
+
+```R
+res5 <- tmlenet(data = df_netKmax6, NETIDmat = NetInd_mat_Kmax6, Kmax = Kmax,
+                sW = def_sW, sA = def_sA, Anode = "A", Ynode = "Y",
+                f_gstar1 = 1, optPars = list(f_gstar2 = 0, n_MCsims = 1))
+res5$ATE$estimates
+```
 
 ### Citation
 To cite `tmlenet` in publications, please use: