Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

variance(<draws_array>) method for changing default method in {distributional} #219

Closed
mitchelloharawild opened this issue Dec 1, 2021 · 5 comments · Fixed by #221
Closed

Comments

@mitchelloharawild
Copy link

Looks like the changes in mitchelloharawild/distributional#55 to the default method for variance() has caused problems with {posterior}.
Would you like to add a method for variance(<draws_array>) or swap out distributional::variance for var in examples and tests?

library(posterior)
#> This is posterior version 1.1.0.9000
#> 
#> Attaching package: 'posterior'
#> The following objects are masked from 'package:stats':
#> 
#>     mad, sd, var
x <- example_draws("eight_schools")
summarise_draws(x, var = distributional::variance)
#> # A tibble: 10 × 17
#>    variable var.1  var.2    var.3  var.4  var.5 var.6   var.7  var.8    var.9
#>    <chr>    <dbl>  <dbl>    <dbl>  <dbl>  <dbl> <dbl>   <dbl>  <dbl>    <dbl>
#>  1 mu        12.1  0.652 -0.288   -0.153  0.652  17.5 -0.157  -1.46  -0.288  
#>  2 tau       11.9 -2.40  -0.0611  -0.799 -2.40   13.1 -1.27   -0.304 -0.0611 
#>  3 theta[1]  46.4  0.805 -3.43     0.782  0.805  42.6 -9.72   -1.65  -3.43   
#>  4 theta[2]  27.0 -0.132  0.00865 -1.52  -0.132  26.6 -5.31   -2.10   0.00865
#>  5 theta[3]  44.9 -8.23  -3.24    -3.66  -8.23   36.2 -2.01    0.194 -3.24   
#>  6 theta[4]  22.8  0.266 -1.42    -1.74   0.266  27.6 -2.15    0.593 -1.42   
#>  7 theta[5]  30.9  0.879  0.840   -4.10   0.879  29.8  2.34   -0.505  0.840  
#>  8 theta[6]  34.8  0.754 -3.73    -3.03   0.754  24.9  0.0602  0.939 -3.73   
#>  9 theta[7]  30.5  2.41  -0.968   -1.16   2.41   34.8 -2.53   -3.50  -0.968  
#> 10 theta[8]  25.7  1.01  -3.63    -1.32   1.01   33.1  0.314  -4.44  -3.63   
#> # … with 7 more variables: var.10 <dbl>, var.11 <dbl>, var.12 <dbl>,
#> #   var.13 <dbl>, var.14 <dbl>, var.15 <dbl>, var.16 <dbl>

devtools::load_all("~/github/distributional/")
#> ℹ Loading distributional
summarise_draws(x, var = distributional::variance)
#> Error in variance.default(structure(c(2.00583113051241, 1.45831611462033, : The variance() method is not supported for objects of type c("draws_array", "draws", "array")

Created on 2021-12-01 by the reprex package (v2.0.0)

@mjskay
Copy link
Collaborator

mjskay commented Dec 1, 2021

Ah --- this was what we had hoped would be solved with mitchelloharawild/distributional#55 --- the old output you've given above is also incorrect because it uses var under the hood and inherits its misfeature of actually returning a covariance matrix.

I think the reason it didn't fix it is just because the classes for draws_array includes "array" but not "numeric". If you're willing I think a good solution would be to also add a definition like this to {distributional}:

#' @rdname variance
#' @export
variance.array <- variance.numeric

Which in a quick test I made now solves the problem for us:

x <- example_draws("eight_schools")
summarise_draws(x, var = distributional::variance)
# A tibble: 10 x 2
   variable   var
   <chr>    <dbl>
 1 mu        11.6
 2 tau       12.8
 3 theta[1]  39.7
 4 theta[2]  21.5
 5 theta[3]  46.2
 6 theta[4]  24.2
 7 theta[5]  25.9
 8 theta[6]  26.6
 9 theta[7]  27.7
10 theta[8]  27.6

@mitchelloharawild
Copy link
Author

I think I'd prefer to be conservative with the behaviour of variance() for now - I'm not sure what the most generic method variance of an arrays should be.
Would it be possible to add a method for variance(<draws_array>)?

@mjskay
Copy link
Collaborator

mjskay commented Dec 7, 2021

We can, though I would argue the interface you have has already made a choice re: arrays: the variance function on base arrays will already use the numeric generic --- due to peculiarities in S3 dispatch, base numeric arrays will still dispatch to the "numeric" generic despite not having the "numeric" class, even though objects that explicitly subclass from "array" don't. E.g. this dispatches to variance.numeric() in the current version of {distributional}:

x = array(1:10, dim = c(1,2,5))
class(x)
## [1] "array"
variance(x)
## [1] 9.166667
class(x) = c("foo","array")
variance(x)
## Error in variance.default(x) : 
##   The variance() method is not supported for objects of type c("foo", "array")

So adding the "array" generic as an alias in some sense would make the interface more consistent.

Anyway, if you still would rather not we can add the draws_array generic here. :)

@mitchelloharawild
Copy link
Author

Thanks - on reflection I'm starting to think that variance(<matrix>) should not use variance(<numeric>) to give var(as.vector(<matrix>)), and instead it is more consistent to give diag(var(<matrix>)). This is consistent with base, and the concept driven throughout {distributional} that matrix columns are variates (as used in dist_multivariate_normal() and soon, dist_sample() (mitchelloharawild/distributional#75)).

The variance(<numeric>) method is not yet on CRAN, so I wouldn't say that adding variance(<matrix>) is a breaking change.

I'm not sure how this extends to arrays, but I doubt giving a single variance is best. So at this stage I think a draws_array method is safest.

@mjskay
Copy link
Collaborator

mjskay commented Dec 28, 2021

Makes sense, I'll put together a PR for variance.draws_array here then. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants