A friendly syntax for creating DAGs and simulating data from them.

ianmoran11/raldag

raldag

Lifecycle: experimental

raldag helps you create DAGs and generate data from them.

Specify how nodes generate data and evaluate inputs

Declare the nodes of your DAG and how data is generated by those nodes with the following code:

c <- v("c", .f = d(~ rnorm(n = 10^4, mean = rsum(.x) + 10, sd =  1)))
a <- v("a", .f = d(~ rnorm(n = 10^4, mean = rsum(.x)     , sd =  1)))
x <- v("x", .f = d(~ rnorm(n = 10^4, mean = rsum(.x) +  5, sd =  1)))
y <- v("y", .f = d(~ rnorm(n = 10^4, mean = rsum(.x) +  2, sd =  1)))
m <- v("m", .f = d(~ rnorm(n = 10^4, mean = rsum(.x) +  8, sd =  1)))

In this case, the node c generates 10,000 observations with a mean equal to the sum of its input edges plus 10 and a standard deviation of 1.
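As a rough illustration of what such a node definition amounts to, here is a base R sketch for node c. It does not use raldag and makes no claim about its internals: it assumes that rsum(.x) adds up the values arriving on the node's incoming edges, and `input_sum` and `c_values` are hypothetical names introduced only for this example.

```r
# Base R sketch of the draw described for node `c` (not raldag code).
set.seed(123)
n <- 10^4

# Assumption: rsum(.x) is the per-observation sum of incoming edge values.
# Node `c` has no parents in the example graph, so that sum is zero here.
input_sum <- rep(0, n)

# Same call shape as in the node definition: mean = inputs + 10, sd = 1.
c_values <- rnorm(n = n, mean = input_sum + 10, sd = 1)

length(c_values)  # number of observations drawn
mean(c_values)    # should be close to 10
sd(c_values)      # should be close to 1
```

For a node with parents, `input_sum` would instead hold the weighted parent values, which is what shifts the mean from one observation to the next.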

Declare causal connections

Causal connections between nodes, along with their coefficients, are specified with the graph syntax from ralget:

g <-
 (a * b(10) + m * b(10)) * x +
 (c * b(10) + m * b(10)) * y +
 (a * b(10) + c * b(10)) * m
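One way to read this graph is as a set of structural equations: a and c are root nodes, m depends on a and c, x depends on a and m, and y depends on c and m, each edge carrying a coefficient of 10. The base R sketch below writes those equations out by hand. It assumes that b(10) multiplies the parent's value by 10 before it is summed into the child's mean; that reading is inferred from the simulated values shown in this README rather than taken from ralget's documentation. `simulate_by_hand` is a hypothetical helper, not part of raldag.

```r
# Hand-written version of the example DAG (not raldag code).
# Assumption: each edge contributes coefficient * parent value to the
# child's mean, and each node adds its own offset and unit-variance noise.
simulate_by_hand <- function(n = 10^4, seed = 123) {
  set.seed(seed)

  # Root nodes: no incoming edges, so only the node offset remains.
  a      <- rnorm(n, mean = 0,  sd = 1)
  c_node <- rnorm(n, mean = 10, sd = 1)

  # Child nodes: weighted parents plus the node's own offset.
  m <- rnorm(n, mean = 10 * a      + 10 * c_node + 8, sd = 1)
  x <- rnorm(n, mean = 10 * a      + 10 * m      + 5, sd = 1)
  y <- rnorm(n, mean = 10 * c_node + 10 * m      + 2, sd = 1)

  data.frame(sim_id = seq_len(n), a = a, c = c_node, m = m, x = x, y = y)
}

sim <- simulate_by_hand()
head(sim, 3)
colMeans(sim[, c("a", "c", "m", "x", "y")])
```

Under these assumptions the expected means are 0 for a, 10 for c, 108 for m, 1085 for x, and 1182 for y. The individual draws will not match raldag's output for the same seed, since the order in which raldag draws its random numbers is not known here.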

Generate data from DAG

Now that we have a DAG with an underlying data generating process, we can simulate data:

g %>% simulate(seed = 123)
#> # A tibble: 10,000 × 7
#>   sim_id      a     c     m     x     y label         
#>    <int>  <dbl> <dbl> <dbl> <dbl> <dbl> <chr>         
#> 1      1  2.37   9.81  130. 1329. 1401. simulation set
#> 2      2 -0.167 10.3   110. 1101. 1203. simulation set
#> 3      3  0.927  9.46  111. 1126. 1208. simulation set
#> # … with 9,997 more rows

Introduce interventions

We can also examine causal effects using interventions, inspired by Pearl’s do-calculus.

For example, we can set a to 0:

g %>% manipulate(a = 0) %>% simulate(seed = 123)
#> # A tibble: 10,000 × 7
#>   sim_id     a     c     m     x     y label         
#>    <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>         
#> 1      1     0  9.81  106. 1069. 1164. simulation set
#> 2      2     0 10.3   111. 1119. 1220. simulation set
#> 3      3     0  9.46  102. 1024. 1116. simulation set
#> # … with 9,997 more rows

And we could compare that with what happens when a is set to 1:

g %>% manipulate(a = 1) %>% simulate(seed = 123)
#> # A tibble: 10,000 × 7
#>   sim_id     a     c     m     x     y label         
#>    <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>         
#> 1      1     1  9.81  116. 1179. 1264. simulation set
#> 2      2     1 10.3   121. 1229. 1320. simulation set
#> 3      3     1  9.46  112. 1134. 1216. simulation set
#> # … with 9,997 more rows
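The size of the shift between the two interventions can be checked by hand. The base R sketch below fixes a at a constant instead of drawing it, then compares column means. It assumes the same structural reading as the outputs above suggest (each edge multiplies its parent by 10), and `simulate_do_a`, `by_hand_do0`, and `by_hand_do1` are hypothetical names used only for this illustration, not raldag functions or objects.

```r
# Hand-written intervention on `a` (not raldag code).
simulate_do_a <- function(a_value, n = 10^4, seed = 123) {
  set.seed(seed)

  # Intervention: `a` is set to a constant rather than drawn.
  a      <- rep(a_value, n)
  c_node <- rnorm(n, mean = 10, sd = 1)

  # Downstream nodes keep their usual equations.
  m <- rnorm(n, mean = 10 * a      + 10 * c_node + 8, sd = 1)
  x <- rnorm(n, mean = 10 * a      + 10 * m      + 5, sd = 1)
  y <- rnorm(n, mean = 10 * c_node + 10 * m      + 2, sd = 1)

  data.frame(a = a, c = c_node, m = m, x = x, y = y)
}

by_hand_do0 <- simulate_do_a(0)
by_hand_do1 <- simulate_do_a(1)

# Average effect of moving `a` from 0 to 1 on every variable.
effect <- colMeans(by_hand_do1) - colMeans(by_hand_do0)
round(effect, 6)
```

Because both runs share a seed, the noise terms are identical and the differences come out as 1 for a, 0 for c, 10 for m, 110 for x, and 100 for y, up to floating-point error. The effect on x combines the direct path (10) with the path through m (10 × 10), while y is reached only through m. The first rows of the two raldag outputs above differ by the same amounts.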

Compare intervention distributions

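Here’s how our intervention on a has affected the variables in our DAG/DGP. The code below assumes that obs, do0, and do1 hold the observational, a = 0, and a = 1 simulation results from the sections above.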

bind_rows(obs, do0, do1) %>% 
  gather(var,value, -sim_id,-label) %>%
  plot_distributions()

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("ianmoran11/raldag")
