Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't drop weights when subsetting variables? #298

Open
fweber144 opened this issue Aug 15, 2023 · 6 comments
Open

Don't drop weights when subsetting variables? #298

fweber144 opened this issue Aug 15, 2023 · 6 comments
Labels
feature New feature or request

Comments

@fweber144
Copy link

I noticed that subsetting the variables of a draws object will drop weights:

# Based on the "Examples" section in `?posterior::weight_draws`:
suppressPackageStartupMessages(library(posterior))
x <- as_draws_matrix(example_draws())
wts <- rexp(ndraws(x))
( x <- weight_draws(x, weights = wts) )
#> Loading required namespace: testthat
#> # A draws_matrix: 100 iterations, 4 chains, and 10 variables
#>     variable
#> draw   mu tau theta[1] theta[2] theta[3] theta[4] theta[5] theta[6]
#>   1  2.01 2.8     3.96    0.271    -0.74      2.1    0.923      1.7
#>   2  1.46 7.0     0.12   -0.069     0.95      7.3   -0.062     11.3
#>   3  5.81 9.7    21.25   14.931     1.83      1.4    0.531      7.2
#>   4  6.85 4.8    14.70    8.586     2.67      4.4    4.758      8.1
#>   5  1.81 2.8     5.96    1.156     3.11      2.0    0.769      4.7
#>   6  3.84 4.1     5.76    9.909    -1.00      5.3    5.889     -1.7
#>   7  5.47 4.0     4.03    4.151    10.15      6.6    3.741     -2.2
#>   8  1.20 1.5    -0.28    1.846     0.47      4.3    1.467      3.3
#>   9  0.15 3.9     1.81    0.661     0.86      4.5   -1.025      1.1
#>   10 7.17 1.8     6.08    8.102     7.68      5.6    7.106      8.5
#> # ... with 390 more draws, and 2 more variables
#> # ... hidden reserved variables {'.log_weight'}

( x_sub <- x[, 1:2] )
#> # A draws_matrix: 100 iterations, 4 chains, and 2 variables
#>     variable
#> draw   mu tau
#>   1  2.01 2.8
#>   2  1.46 7.0
#>   3  5.81 9.7
#>   4  6.85 4.8
#>   5  1.81 2.8
#>   6  3.84 4.1
#>   7  5.47 4.0
#>   8  1.20 1.5
#>   9  0.15 3.9
#>   10 7.17 1.8
#> # ... with 390 more draws

Created on 2023-08-15 with reprex v2.0.2

As you can see, x_sub lacks the hidden reserved variable .log_weight.

I'm not sure if this is intended behavior or not, but I think subsetting variables (variables only, not the draws or chains) should keep weights.

@fweber144
Copy link
Author

I forgot to mention that other hidden reserved variables (other than .log_weight) might be affected as well, so perhaps this needs to be resolved by a more general approach which ensures that subsetting columns keeps any hidden reserved variables.

@paul-buerkner
Copy link
Collaborator

I think, we need to make an explicit design decision on whether we want to retain hidden variables or not when subsetting via []. I can imagine it can become very annoying for users if they try to get rid of all variables except for a view via [] but the hidden variables refuse to leave. @mjskay and @jgabry what do you think?

That said, you can always use subset_draws(., variable = ...) to subset while retaining hidden variables.

@mjskay
Copy link
Collaborator

mjskay commented Aug 15, 2023

Hmmm, that's a tough one. It does seem like the default behavior of [ should be to keep reserved variables (or at least .log_weight --- which is currently equivalent since it's the only reserved variable for that format?) to avoid silent errors caused by unexpected dropping of weights. Maybe [ should gain an argument indicating whether reserved variables are subject to the subset (defaulting to not dropping them)? (Perhaps subset() should too?)

@paul-buerkner
Copy link
Collaborator

I agree. So both [] and subset(_draws) should keep reserved variables by default but both gain an argument to turn this off? Sounds like a good approach to me.

@paul-buerkner paul-buerkner added the feature New feature or request label Aug 17, 2023
@jgabry
Copy link
Member

jgabry commented Aug 17, 2023

I agree. So both [] and subset(_draws) should keep reserved variables by default but both gain an argument to turn this off? Sounds like a good approach to me.

Sounds good to me too.

@paul-buerkner
Copy link
Collaborator

One challenge towards this will be that we cannot just use NextMethod for [] in some cases. For example, in the corresponding .draws_matrix method I see:

`[.draws_matrix` <- function(x, i, j, ..., drop = FALSE) {
  # TODO: allow for argument 'reserved' as in '[.draws_df'
  #   right now this fails because NextMethod() cannot ignore arguments
  out <- NextMethod("[", drop = drop)
  if (length(dim(out)) == length(dim(x))) {
    class(out) <- class(x)
    .nchains <- nchains(x)
    if (missing(i)) {
      attr(out, "nchains") <- .nchains
    } else if (.nchains > 1L) {
      warn_merge_chains("index")
    }
  }
  out
}

Just leaving this here for furture reference because I will surely forget otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants