Subsetting an rvar with an rvar #282

mattansb · 2023-04-16T18:42:11Z

@mjskay I absolutely love the rvar class! So much better than dealing with pesky vectors / matrices / arrays!

Is there a way to subset an rvar with an rvar?

If not, it would be great to be able to do something like this:

library(posterior)

A <- rvar(rnorm(100))
B <- rvar(rnorm(100))

(idx <- A > B)
#> rvar<100>[1] mean ± sd:
#> [1] 0.44 ± 0.5

Pr(idx) # is logical
#> [1] 0.44

A[idx] # make a new rvar based only on the subset of samples for which idx is true
#> Error...

This would really be great for order restrictions (e.g., the type used in C14 carbon dating...)

mjskay · 2023-04-17T00:19:02Z

Yeah - currently not, but something like this would be great! I've considered this and indexing with integer rvars before, but haven't gotten around to figuring it all out since I think it will take some care to get right.

Since it can create situations where rvars with different numbers of chains/draws can be combined (which is currently not supported), we'd want to carefully sort out the semantics of it and make sure all the operations we support are valid (and raise warnings/errors for invalid cases). E.g., it will probably have to require chains to be merged (or auto-merge them), and I'm not sure what to require for the assignment version x[y] <- z in terms of how/whether the chains of x, y, and z should conform.
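One way to see why chain structure is a concern here is the following base-R illustration. It is only a sketch: plain matrices stand in for the draws behind an rvar, none of it is posterior's internal code, and the variable names are made up for the example.

```r
# Illustration only: plain matrices stand in for the draws behind an rvar.
set.seed(7)
draws <- matrix(rnorm(8), nrow = 4, ncol = 2)  # 4 iterations x 2 chains
keep <- draws > 0                              # a logical index over every draw

# Number of surviving draws in each chain; nothing forces these to match,
# so the subset generally cannot stay a rectangular iteration x chain array.
kept_per_chain <- colSums(keep)

# Merging the chains first (stacking them into one long chain) sidesteps this.
merged <- as.vector(draws)
subset_draws <- merged[as.vector(keep)]
length(subset_draws) == sum(kept_per_chain)
```

After merging, the subset is simply a shorter single chain, which is presumably why chain information would have to be dropped.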

mattansb · 2023-04-18T20:07:30Z

Here is kind of what I expect this would look like (feel free to use any or none of this):

The [.rvar method can accept regular indices or:
A single rvar index - it has to be single so that all rvars are subset to the same length.
The resulting rvar will lose all chain information.

I'm not sure assignment (rvar[rvar] <- ???) is a required feature for this type of subsetting.

Here are some examples:

"[.rvar" <- function(x, ..., drop = FALSE) {
  if (is_rvar(..1)) {
    # a single scalar rvar index selects draws rather than elements
    stopifnot(length(..1) == 1L)
    i <- as.vector(posterior::draws_of(..1))
    x_v <- posterior::draws_of(x)

    # subset the first (draws) dimension of x_v and keep every other
    # dimension whole; works for a draws array with any number of dimensions
    num_dims <- length(dim(x))
    index_args <- c(list(x_v, i), rep(list(TRUE), num_dims), list(drop = FALSE))
    x_v_subset <- do.call(`[`, index_args)
    x_mod <- posterior::rvar(x_v_subset)
    return(x_mod)
  }

  # else use current [.rvar:
  posterior:::`[.rvar`(x, ..., drop = drop)
}

Example with vector rvar:

library(posterior)
A <- rvar(rnorm(1000))
B <- rvar(rnorm(1000, mean = 0.4))

(idx <- A < B)
#> rvar<1000>[1] mean ± sd:
#> [1] 0.62 ± 0.48
Pr(idx)
#> [1] 0.623

A[idx]
#> rvar<623>[1] mean ± sd:
#> [1] -0.44 ± 0.85


z <- c(A,B)
z[idx] # subset both
#> rvar<623>[2] mean ± sd:
#> [1] -0.44 ± 0.85   0.87 ± 0.80

# regular subset still works:
z[2]
#> rvar<1000>[1] mean ± sd:
#> [1] 0.42 ± 0.99

Example with array:

rows <- 4
cols <- 3
shelves <- 2
x <- rvar(array(rnorm(4000 * rows * cols * shelves, mean = 1, sd = 1),
                dim = c(4000, rows, cols, shelves)))
x
#> rvar<4000>[4,3,2] mean ± sd:
#> , , 1
#> 
#>      [,1]         [,2]         [,3]        
#> [1,] 1.01 ± 1.00  0.98 ± 1.00  1.01 ± 1.00 
#> [2,] 1.03 ± 1.01  0.99 ± 1.00  1.00 ± 1.01 
#> [3,] 0.98 ± 1.02  0.97 ± 1.01  1.02 ± 1.00 
#> [4,] 0.98 ± 0.98  0.99 ± 1.03  0.97 ± 0.99 
#> 
#> , , 2
#> 
#>      [,1]         [,2]         [,3]        
#> [1,] 1.01 ± 0.99  0.97 ± 1.01  1.01 ± 0.97 
#> [2,] 0.99 ± 1.00  0.99 ± 1.01  0.99 ± 1.00 
#> [3,] 1.00 ± 1.00  1.00 ± 0.99  1.00 ± 1.01 
#> [4,] 1.01 ± 1.01  0.99 ± 0.99  0.99 ± 1.00

idx <- x[1,1,2] < x[2,3, 1]
Pr(idx)
#> , , 1
#> 
#>        [,1]
#> [1,] 0.5085
x[idx]
#> rvar<2034>[4,3,2] mean ± sd:
#> , , 1
#> 
#>      [,1]         [,2]         [,3]        
#> [1,] 1.01 ± 1.01  0.96 ± 1.01  1.00 ± 1.02 
#> [2,] 1.02 ± 1.02  0.98 ± 0.99  1.57 ± 0.82 
#> [3,] 0.98 ± 1.01  0.99 ± 1.00  1.00 ± 0.99 
#> [4,] 0.99 ± 1.00  0.99 ± 1.00  0.94 ± 0.99 
#> 
#> , , 2
#> 
#>      [,1]         [,2]         [,3]        
#> [1,] 0.45 ± 0.83  1.00 ± 1.00  1.03 ± 0.99 
#> [2,] 0.98 ± 1.00  1.00 ± 1.00  0.99 ± 0.99 
#> [3,] 0.98 ± 0.98  1.00 ± 0.98  1.00 ± 1.03 
#> [4,] 1.02 ± 1.01  1.00 ± 0.98  1.01 ± 1.01

Example with brms:

library(brms)
library(ggplot2)
library(ggdist)
library(patchwork)

fit <- brm(hp ~ am, data = mtcars,
           backend = "cmdstanr", refresh = 0)
#> Start sampling
#> Running MCMC with 4 sequential chains...
#> 
#> Chain 1 finished in 0.1 seconds.
#> Chain 2 finished in 0.1 seconds.
#> Chain 3 finished in 0.1 seconds.
#> Chain 4 finished in 0.1 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.1 seconds.
#> Total execution time: 1.3 seconds.

grid <- data.frame(am = 0:1)
grid$hp <- posterior_predict(fit, newdata = grid) |> rvar()
grid
#>   am       hp
#> 1  0 159 ± 72
#> 2  1 126 ± 74

idx <- with(grid, hp[am==0] > hp[am==1])
Pr(idx)
#> [1] 0.6335

grid$hp2 <- grid$hp[idx]
grid

p1 <- ggplot(grid, aes(fill = factor(am))) +
  stat_slabinterval(aes(ydist = hp)) +
  coord_cartesian(ylim = c(-150, 500)) +
  labs(title = "All draws")

p2 <- ggplot(grid, aes(fill = factor(am))) +
  stat_slabinterval(aes(ydist = hp2)) + # SUBSET!
  coord_cartesian(ylim = c(-150, 500)) +
  labs(title = "Only draws where 0 > 1")

p1 + p2 + plot_layout(guides = "collect")

mjskay · 2023-04-18T21:25:30Z

Hmmm yeah, I think we could at least implement logical and numeric scalar rvar indexing like this --- it's basically draw indexing. Thinking about the other kinds of array indexing, I don't see how that would conflict with any potential future support for rvars in those types --- we're basically looking at a modification of indexing somewhere about here.

> I'm not sure assignment (rvar[rvar] <- ???) is a required feature for this type of subsetting.

For consistency, I would want it if possible. Ideally x[i] <- x[i] should result in x being unchanged.

mjskay · 2023-04-18T22:27:02Z

Here's a prototype implementation on the rvar_slice branch if you want to kick the tires: https://github.com/stan-dev/posterior/tree/rvar_slice

Will try adding slice assignment there too (haven't yet).

mjskay · 2023-04-19T02:47:11Z

I've finished a very conservative implementation of this for subsetting ([) and subset-assignment ([<-): it supports only a single scalar logical rvar as an index, used to subset draws. After playing with it, I'm not even sure what the correct implementation for a numeric vector should be (selecting draws or selecting elements of the vector?), so I'm leaving that out for now. I'm honestly not 100% sure what I've come up with makes the most sense, but I tried to keep subsetting and subset-assignment consistent with each other.

Currently, the semantics are:

x[i] for rvar x and scalar logical rvar i returns an rvar with the same shape as x, but subset to the draws with a value TRUE in i. ndraws(i) must be either 1 or ndraws(x). The result has dim(x[i]) = dim(x) and ndraws(x[i]) = sum(i).
x[i] <- y for rvars x, y and scalar logical rvar i replaces each full (joint) draw in x where i is TRUE with the corresponding draw from y. y must have dim(y) = dim(x) and ndraws(y) = sum(i), or be broadcastable to this shape.
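The two rules above can be sketched in base R on a plain draws array whose first dimension indexes draws (the layout returned by draws_of()). This is a minimal sketch under that assumption, not the implementation on the branch: slice_draws() and assign_draws() are hypothetical helper names, and broadcasting of y is left out.

```r
# Hypothetical helpers operating on draws arrays (first dimension = draws).

# x[i]: keep only the draws where i is TRUE; all other dimensions stay whole.
slice_draws <- function(x_draws, i) {
  stopifnot(is.logical(i), length(i) %in% c(1L, nrow(x_draws)))
  i <- rep_len(i, nrow(x_draws))
  other_dims <- rep(list(TRUE), length(dim(x_draws)) - 1L)
  do.call(`[`, c(list(x_draws, i), other_dims, list(drop = FALSE)))
}

# x[i] <- y: replace each full draw where i is TRUE with the matching draw of y.
assign_draws <- function(x_draws, i, y_draws) {
  i <- rep_len(i, nrow(x_draws))
  # y must have sum(i) draws and the same remaining dimensions as x
  stopifnot(identical(dim(y_draws), c(sum(i), dim(x_draws)[-1])))
  other_dims <- rep(list(TRUE), length(dim(x_draws)) - 1L)
  do.call(`[<-`, c(list(x_draws, i), other_dims, list(value = y_draws)))
}

set.seed(1)
x <- matrix(rnorm(10), nrow = 5)   # 5 draws of a length-2 variable
i <- x[, 1] > x[, 2]               # one logical value per draw

x_sub <- slice_draws(x, i)         # has sum(i) draws and the same shape otherwise
x_roundtrip <- assign_draws(x, i, x_sub)
identical(x_roundtrip, x)          # the x[i] <- x[i] consistency check
```

The last line checks the property requested earlier in the thread: assigning a slice back into the draws it came from should leave x unchanged.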

I would love it if some folks could play around with this and let me know if the semantics make sense.

mattansb · 2023-04-19T05:25:12Z

This looks right to me! I'll find some time to play around with it to see if it behaves like my mental model.

It would be great to find some archeologists and see if they find it useful. Other than that group, I know Richard Morey (and others?) had some work on order restrictions for estimating effects in psychology.

(I've never actually needed to use this myself, but wanted to demonstrate how it can be done in a Bayes class for undergrad stats)

(Also, I am aware that it is more efficient, MCMC-wise, to bake these restrictions into the Stan code.)

mattansb · 2023-04-19T07:43:26Z

I seem to be getting an error with some of the as_draws_*() functions:

library(posterior)
#> This is posterior version 1.4.1.9000
#> 
#> Attaching package: 'posterior'
#> The following objects are masked from 'package:stats':
#> 
#>     mad, sd, var
#> The following objects are masked from 'package:base':
#> 
#>     %in%, match
A <- rvar(rnorm(1000))
B <- rvar(rnorm(1000, mean = 0.4))

idx <- A > B


A[idx]
#> rvar<407>[1] mean ± sd:
#> [1] 0.65 ± 0.81

as_draws(A[idx])
#> # A draws_rvars: 407 iterations, 1 chains, and 1 variables
#> $x: rvar<407>[1] mean ± sd:
#> [1] 0.65 ± 0.81
as_draws_array(A[idx])
#> # A draws_array: 407 iterations, 1 chains, and 1 variables
#> , , variable = x
#> 
#>          chain
#> iteration      1
#>         1  0.039
#>         2  0.697
#>         3 -0.256
#>         4  0.362
#>         5  0.417
#> 
#> # ... with 402 more iterations
as_draws_matrix(A[idx])
#> # A draws_matrix: 407 iterations, 1 chains, and 1 variables
#>     variable
#> draw      x
#>   1   0.039
#>   2   0.697
#>   3  -0.256
#>   4   0.362
#>   5   0.417
#>   6  -0.428
#>   7   0.737
#>   8  -1.130
#>   9   1.256
#>   10 -0.386
#> # ... with 397 more draws
as_draws_rvars(A[idx])
#> # A draws_rvars: 407 iterations, 1 chains, and 1 variables
#> $x: rvar<407>[1] mean ± sd:
#> [1] 0.65 ± 0.81

as_draws_df(A[idx])
#> Error in `[[<-.data.frame`(`*tmp*`, ".chain", value = c(1L, 1L, 1L, 1L, : replacement has 999 rows, data has 407
as_draws_list(A[idx])
#> Error in `[[<-.data.frame`(`*tmp*`, ".chain", value = c(1L, 1L, 1L, 1L, : replacement has 999 rows, data has 407

Created on 2023-04-19 with reprex v2.0.2

mjskay · 2023-04-20T04:29:04Z

good catch, thanks! fixed.

mjskay · 2023-04-27T20:17:32Z

A few additional ideas in this vein that might be worth implementing occurred to me:

ifelse.rvar(test, yes, no), which would broadcast test, yes, and no (all rvars or castable to rvars) to a common shape and then do a vectorized ifelse on them. Might either make ifelse() generic or call this rvar_ifelse().
y = x[[i]] where i is a scalar numeric rvar, which would return a scalar rvar. If $x_{(n)}$ is the $n$-th draw of x, $y_{(n)}$ would be $x_{(n)}[[i_{(n)}]]$
y = x[i,j,...] where i, j, ... are vector numeric rvars, which would return an rvar with dim(y) = lengths(list(i, j, ...)) and $y_{(n)} = x_{(n)}[i_{(n)}, j_{(n)}, ...]$
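The per-draw indexing in the last two ideas can be sketched in base R for the one-dimensional case, where each draw of x is a vector and each draw of i is a vector of positions. This is a sketch under the assumption that draws are stored as rows of a matrix; index_per_draw() is a hypothetical name, not a posterior function.

```r
# x_draws: n draws x K elements; i_draws: n draws x M integer indices.
# Returns an n x M matrix with y[n, m] = x_draws[n, i_draws[n, m]],
# i.e. y_(n) = x_(n)[i_(n)] evaluated separately for every draw n.
index_per_draw <- function(x_draws, i_draws) {
  stopifnot(nrow(x_draws) == nrow(i_draws))
  n <- nrow(x_draws)
  # pair every index with the row (draw) it belongs to
  rows <- rep(seq_len(n), times = ncol(i_draws))
  matrix(x_draws[cbind(rows, as.vector(i_draws))], nrow = n)
}

x <- matrix(1:12, nrow = 4)                             # 4 draws of a length-3 variable
i <- matrix(c(1, 2, 3, 1,  3, 3, 1, 2), nrow = 4)       # 4 draws of a length-2 index
y <- index_per_draw(x, i)
y[1, ]   # elements 1 and 3 of the first draw of x
```

With a single index column (M = 1) this reduces to the scalar x[[i]] case.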

mattansb · 2023-04-28T05:51:31Z

rvar_ifelse() implies a non-rvar ifelse() which I guess can be used to manipulate the actual vector or rvars?

(I did not understand the other two ideas you had)

I've been using the subsetting branch this week - really great, exactly what I had in mind!

mjskay · 2023-05-02T00:09:15Z

> rvar_ifelse() implies a non-rvar ifelse() which I guess can be used to manipulate the actual vector or rvars?

Good point, probably need some consistency here. I suppose a non-rvar version might allow test to be a logical vector instead of a logical rvar vector, and then yes and no would still be rvars (or castable-to-rvar). But then, the result would still be an rvar. So, given the naming scheme used elsewhere, this should perhaps still be something like rvar_ifelse... honestly, rvar_ifelse could be generic and have an implementation for test as an rvar and test as a base vector.
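A rough base-R sketch of the two kinds of test discussed here, again on draws matrices (rows are draws, columns are elements). ifelse_draws() is a hypothetical name, and it switches on whether test has dimensions rather than using S3 dispatch, so it only illustrates the idea.

```r
# test is either a logical draws matrix (n x K, the "rvar" case) or a plain
# logical vector of length K (the "base vector" case).
ifelse_draws <- function(test, yes, no) {
  stopifnot(identical(dim(yes), dim(no)))
  if (is.null(dim(test))) {
    # base-vector test: the same choice is made in every draw
    test <- matrix(test, nrow = nrow(yes), ncol = ncol(yes), byrow = TRUE)
  }
  stopifnot(identical(dim(test), dim(yes)))
  ifelse(test, yes, no)   # elementwise choice within each draw
}

yes <- matrix(1, nrow = 3, ncol = 2)
no  <- matrix(0, nrow = 3, ncol = 2)

# base-vector test: element 1 always comes from yes, element 2 from no
out_vec <- ifelse_draws(c(TRUE, FALSE), yes, no)

# draws test: the choice can differ from draw to draw
test_draws <- matrix(c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE), nrow = 3)
out_draws <- ifelse_draws(test_draws, yes, no)
```

Either way the result is a full draws matrix, which matches the point that the output would still be an rvar.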

> (I did not understand the other two ideas you had)

The third idea probably won't happen any time soon (I am realizing its implementation would likely be slow in pure R unless we dropped into C/C++).

The second idea is just an attempt to generalize the [[ operator... could use it to create mixtures, for example.
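As a rough illustration of the mixture use case, here is a base-R sketch that picks one component per draw using a random component label. It works on a plain draws matrix rather than on rvars, and the weights and component means are made-up values for the example.

```r
set.seed(123)
n <- 4000

# n draws of two mixture components (columns), with means -2 and 2
components <- cbind(rnorm(n, mean = -2), rnorm(n, mean = 2))

# n draws of a component label, with assumed weights 0.3 and 0.7
i <- sample(1:2, n, replace = TRUE, prob = c(0.3, 0.7))

# mixture_(n) = components_(n)[[ i_(n) ]]: one element chosen per draw
mixture <- components[cbind(seq_len(n), i)]

# Should be close to 0.3 * -2 + 0.7 * 2 = 0.8, up to Monte Carlo error
mean(mixture)
```

This is the same per-draw selection described for x[[i]] with a scalar numeric index, applied to a vector of two components.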

mjskay added a commit that referenced this issue Apr 18, 2023

attempt at slicing an rvar by an scalar rvar, for #282

e7332f2

mjskay added a commit that referenced this issue Apr 19, 2023

support slice assignment with rvar index, for #282

785468a

mjskay added a commit that referenced this issue Apr 20, 2023

ensure x[i] adjusts draw ids for rvars x, i; for #282

97ba3de

mjskay added a commit that referenced this issue Apr 30, 2023

add x[[i]] for rvar x and scalar numeric rvar i; also add rvar_ifelse…

0380343

…; for #282

mjskay added a commit that referenced this issue Jun 14, 2023

attempt at slicing an rvar by an scalar rvar, for #282

3822f9a

mjskay added a commit that referenced this issue Jun 14, 2023

support slice assignment with rvar index, for #282

412b697

mjskay added a commit that referenced this issue Jun 14, 2023

ensure x[i] adjusts draw ids for rvars x, i; for #282

f94baa9

mjskay added a commit that referenced this issue Jun 14, 2023

add x[[i]] for rvar x and scalar numeric rvar i; also add rvar_ifelse…

e2c04b8

…; for #282

mjskay mentioned this issue Jun 14, 2023

Slicing rvars with rvars (and rvar_ifelse) #293

Merged

paul-buerkner closed this as completed in #293 Jun 23, 2023
