New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subsetting an rvar with an rvar #282
Comments
Yeah - currently not, but something like this would be great! I've considered this and indexing with integer rvars before, but haven't gotten around to figuring it all out since I think it will take some care to get right. Since it can create situations where rvars of different numbers of chains/draws can be combined (which is currently not supported), we'd want to carefully sort out the semantics of it and make sure all the operations we support are valid (and raise warnings/errors for invalid cases). Eg it will probably have to require chains to be merged (or auto merge them), and I'm not sure what to require for the assignment version |
Here is kind of what I expect this would look like (feel free to use any to none of this):
I'm not sure assignment ( Here are some examples: "[.rvar" <- function(x, ..., drop = FALSE) {
if (is_rvar(..1)) {
stopifnot(length(..1) == 1L)
i <- posterior::draws_of(..1)
x_v <- posterior::draws_of(x)
# ugly code that works for subsetting any dim-array
num_dims <- length(dim(x))
indexing_string <- paste(rep(", ", num_dims), collapse = "")
x_v_subset <- eval(parse(text = paste("x_v[i", indexing_string, ", drop = FALSE]")))
x_mod <- posterior::rvar(x_v_subset)
return(x_mod)
}
# else use current [.rvar:
posterior:::`[.rvar`(x, ..., drop = FALSE)
} Example with vector library(posterior)
A <- rvar(rnorm(1000))
B <- rvar(rnorm(1000, mean = 0.4))
(idx <- A < B)
#> rvar<1000>[1] mean ± sd:
#> [1] 0.62 ± 0.48
Pr(idx)
#> [1] 0.623
A[idx]
#> rvar<623>[1] mean ± sd:
#> [1] -0.44 ± 0.85
z <- c(A,B)
z[idx] # subset both
#> rvar<623>[2] mean ± sd:
#> [1] -0.44 ± 0.85 0.87 ± 0.80
# regular subset still works:
z[2]
#> rvar<1000>[1] mean ± sd:
#> [1] 0.42 ± 0.99 Example with array: rows <- 4
cols <- 3
shelves <- 2
x <- rvar(array(rnorm(4000 * rows * cols * shelves, mean = 1, sd = 1),
dim = c(4000, rows, cols, shelves)))
x
#> rvar<4000>[4,3,2] mean ± sd:
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 1.01 ± 1.00 0.98 ± 1.00 1.01 ± 1.00
#> [2,] 1.03 ± 1.01 0.99 ± 1.00 1.00 ± 1.01
#> [3,] 0.98 ± 1.02 0.97 ± 1.01 1.02 ± 1.00
#> [4,] 0.98 ± 0.98 0.99 ± 1.03 0.97 ± 0.99
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 1.01 ± 0.99 0.97 ± 1.01 1.01 ± 0.97
#> [2,] 0.99 ± 1.00 0.99 ± 1.01 0.99 ± 1.00
#> [3,] 1.00 ± 1.00 1.00 ± 0.99 1.00 ± 1.01
#> [4,] 1.01 ± 1.01 0.99 ± 0.99 0.99 ± 1.00
idx <- x[1,1,2] < x[2,3, 1]
Pr(idx)
#> , , 1
#>
#> [,1]
#> [1,] 0.5085
x[idx]
#> rvar<2034>[4,3,2] mean ± sd:
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 1.01 ± 1.01 0.96 ± 1.01 1.00 ± 1.02
#> [2,] 1.02 ± 1.02 0.98 ± 0.99 1.57 ± 0.82
#> [3,] 0.98 ± 1.01 0.99 ± 1.00 1.00 ± 0.99
#> [4,] 0.99 ± 1.00 0.99 ± 1.00 0.94 ± 0.99
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 0.45 ± 0.83 1.00 ± 1.00 1.03 ± 0.99
#> [2,] 0.98 ± 1.00 1.00 ± 1.00 0.99 ± 0.99
#> [3,] 0.98 ± 0.98 1.00 ± 0.98 1.00 ± 1.03
#> [4,] 1.02 ± 1.01 1.00 ± 0.98 1.01 ± 1.01 Example with brms: library(brms)
library(ggplot2)
library(ggdist)
library(patchwork)
fit <- brm(hp ~ am, data = mtcars,
backend = "cmdstanr", refresh = 0)
#> Start sampling
#> Running MCMC with 4 sequential chains...
#>
#> Chain 1 finished in 0.1 seconds.
#> Chain 2 finished in 0.1 seconds.
#> Chain 3 finished in 0.1 seconds.
#> Chain 4 finished in 0.1 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.1 seconds.
#> Total execution time: 1.3 seconds.
grid <- data.frame(am = 0:1)
grid$hp <- posterior_predict(fit, newdata = grid) |> rvar()
grid
#> am hp
#> 1 0 159 ± 72
#> 2 1 126 ± 74
idx <- with(grid, hp[am==0] > hp[am==1])
Pr(idx)
#> [1] 0.6335
grid$hp2 <- grid$hp[idx]
grid
p1 <- ggplot(grid, aes(fill = factor(am))) +
stat_slabinterval(aes(ydist = hp)) +
coord_cartesian(ylim = c(-150, 500)) +
labs(title = "All draws")
p2 <- ggplot(grid, aes(fill = factor(am))) +
stat_slabinterval(aes(ydist = hp2)) + # SUBSET!
coord_cartesian(ylim = c(-150, 500)) +
labs(title = "Only draws where 0 > 1")
p1 + p2 + plot_layout(guides = "collect") |
Hmmm yeah, I think we could at least implement logical and numeric scalar rvar indexing like this --- it's basically draw indexing. Thinking about the other kinds of array indexing, I don't see how that would conflict with any potential future support for rvars in those types --- we're basically looking at a modification of indexing somewhere about here.
For consistency, I would want it if possible. Ideally |
Here's a prototype implementation on the Will try adding slice assignment there too (haven't yet). |
I've finished a very conservative implementation of this for subsetting ( Currently, the semantics are:
I would love if some folks can play around with this and let me know if the semantics make sense. |
This looks right to me! I'll find some time to play around with it to see if it behaves like my mental model. It would be great to find some archeologists and see if they find it useful. Other than that group, I know Richard Morey (and others?) had some work on order restrictions for estimating effects in psychology. (I've never actually needed to use this myself, but wanted to demonstrate how it can be done in a Bayes class for undergrad stats) (Also, I am aware that it is more efficient, MCMC-wise, to bake these restrictions into the Stan code.) |
I seem to be getting an error with some of the library(posterior)
#> This is posterior version 1.4.1.9000
#>
#> Attaching package: 'posterior'
#> The following objects are masked from 'package:stats':
#>
#> mad, sd, var
#> The following objects are masked from 'package:base':
#>
#> %in%, match
A <- rvar(rnorm(1000))
B <- rvar(rnorm(1000, mean = 0.4))
idx <- A > B
A[idx]
#> rvar<407>[1] mean ± sd:
#> [1] 0.65 ± 0.81
as_draws(A[idx])
#> # A draws_rvars: 407 iterations, 1 chains, and 1 variables
#> $x: rvar<407>[1] mean ± sd:
#> [1] 0.65 ± 0.81
as_draws_array(A[idx])
#> # A draws_array: 407 iterations, 1 chains, and 1 variables
#> , , variable = x
#>
#> chain
#> iteration 1
#> 1 0.039
#> 2 0.697
#> 3 -0.256
#> 4 0.362
#> 5 0.417
#>
#> # ... with 402 more iterations
as_draws_matrix(A[idx])
#> # A draws_matrix: 407 iterations, 1 chains, and 1 variables
#> variable
#> draw x
#> 1 0.039
#> 2 0.697
#> 3 -0.256
#> 4 0.362
#> 5 0.417
#> 6 -0.428
#> 7 0.737
#> 8 -1.130
#> 9 1.256
#> 10 -0.386
#> # ... with 397 more draws
as_draws_rvars(A[idx])
#> # A draws_rvars: 407 iterations, 1 chains, and 1 variables
#> $x: rvar<407>[1] mean ± sd:
#> [1] 0.65 ± 0.81
as_draws_df(A[idx])
#> Error in `[[<-.data.frame`(`*tmp*`, ".chain", value = c(1L, 1L, 1L, 1L, : replacement has 999 rows, data has 407
as_draws_list(A[idx])
#> Error in `[[<-.data.frame`(`*tmp*`, ".chain", value = c(1L, 1L, 1L, 1L, : replacement has 999 rows, data has 407 Created on 2023-04-19 with reprex v2.0.2 |
good catch, thanks! fixed. |
A few additional ideas in this vein that might be worth implementing occurred to me:
|
(I did not understand the other two ideas you had) I've been using the subsetting branch this week - really great, exactly what I had in mind! |
Good point, probably need some consistency here. I suppose a non-rvar version might allow
The third idea probably won't happen any time soon anyway (I am realizing its implementation would probably be slow in pure R anyway, unless we dropped into C/C++). The second idea is just an attempt to generalize the |
@mjskay I absolutely love the
rvar
class! So much better than dealing with pesky vectors / matrices / arrays!Is there a way to subset an
rvar
with anrvar
?If not, it would be great to be able to do something like this:
This would really be great for order restrictions (e.g., the type used in C14 carbon dating...)
The text was updated successfully, but these errors were encountered: