Fix base-inherited behavior of variance() on matrices #55

mjskay · 2021-05-20T04:31:18Z

The variance() generic inherits a misfeature of base-R var() in that for matrices it returns a covariance matrix. This is a misfeature in my opinion as it means that (1) var() does not parallel sd(); (2) var() returns a different type of output depending on what it guesses the caller's intent is rather than just providing a consistent API; and (3) cov() returns the covariance matrix anyway so there is no need for var() to do so. Because variance() delegates to var() in the default case, it also has this behavior.

For example:

library(distributional)

set.seed(123456)
x = rnorm(16)
sd(x)
## [1] 1.060834
sd(matrix(x, nrow = 4))
## [1] 1.060834
var(x)
## [1] 1.125369
var(matrix(x, nrow = 4))
##            [,1]       [,2]       [,3]       [,4]
## [1,]  0.2946480  0.2999319  0.4823488 -0.3004184
## [2,]  0.2999319  0.6153391  0.1802705 -0.4886662
## [3,]  0.4823488  0.1802705  1.1038920 -0.3054725
## [4,] -0.3004184 -0.4886662 -0.3054725  0.4174298
variance(x)
## [1] 1.125369
variance(matrix(x, nrow = 4))
##            [,1]       [,2]       [,3]       [,4]
## [1,]  0.2946480  0.2999319  0.4823488 -0.3004184
## [2,]  0.2999319  0.6153391  0.1802705 -0.4886662
## [3,]  0.4823488  0.1802705  1.1038920 -0.3054725
## [4,] -0.3004184 -0.4886662 -0.3054725  0.4174298

I would expect variance() on a matrix to behave much like sd() does; i.e. return a single value the same as it does on a vector.

We ran into this problem in {posterior} as we would like people to be able to run summary functions over posterior samples that are stored in matrices (see stan-dev/posterior#121). Since obviously we can't fix base var(), and we already have a dependency on {distributional}, we were hoping to be able to use distributional's variance() for this purpose.

I think the fix should be straightforward, either by changing this function definition:

distributional/R/distribution.R

Lines 220 to 223 in ab2fd9e

    
           #' @export 
        
           variance.default <- function(x, ...){ 
        
             stats::var(x, ...) 
        
           }

to something like this:

#' @export
variance.default <- function(x, ...){
  stats::var(as.vector(x), ...)
}

Or by adding a function definition like this:

#' @export
variance.numeric <- function(x, ...){
  stats::var(as.vector(x), ...)
}

Are either of those changes something you'd be willing to have in distributional? If so I'd be happy to submit a PR for whichever solution you prefer.

The text was updated successfully, but these errors were encountered:

mitchelloharawild · 2021-05-20T07:12:29Z

I don't mind adding a variance(<numeric>) method, and possibly also removing the default (or better, raising a helpful error). This deviation from {stats} for matrices should be mentioned in the documentation.

I'm happy for you to work on a PR for this.

…itchelloharawild#55

make variance() return variance (not covariance) on matrices, closes #55

mitchelloharawild · 2021-10-04T05:36:11Z

I'm preparing a release for this now and wanted to get your opinion on the appropriateness of this change in relation to multivariate distributions. The current behaviour is:

library(distributional)

ux <- c(1, 2)
mx <- matrix(c(0,3,1,4), nrow = 2)
uv <- dist_normal(0,1)
mv <- dist_multivariate_normal(mu = list(c(1,2)), sigma = list(matrix(c(4,2,2,3), ncol=2)))
dimnames(mv) <- c("a", "b")

mean(ux)
#> [1] 1.5
mean(mx)
#> [1] 2
mean(uv)
#> [1] 0
mean(mv)
#>      a b
#> [1,] 1 2

variance(ux)
#> [1] 0.5
variance(mx)
#> [1] 3.333333
variance(uv)
#> [1] 1
variance(mv)
#> [[1]]
#>      [,1] [,2]
#> [1,]    4    2
#> [2,]    2    3

^{Created on 2021-10-04 by the reprex package (v2.0.0)}

I'm now considering changing the multivariate distribution's variance() output to give the diagonal instead of the variance-covariance matrix, so that:

variance(mv)
#>      a b
#> [1,] 4 3

A new generic, covariance() would be added to give the current variance(mv) behaviour.

Does this sound reasonable? The other question would be what covariance(<matrix>) should give? I think it is reasonable to expect a covariance matrix, but in that case shouldn't variance() give the diagonal of the matrix rather than the variance of all columns combined?

mjskay · 2021-10-11T04:42:41Z

Yeah, I agree --- I think for a multivariate normal I would expect variance() to give the variance of the marginal distributions, i.e. the diagonal of the covariance matrix.

Ref #55

mitchelloharawild · 2021-10-11T10:48:11Z

Thanks :)

Here's what I've got so far for variance() and covariance(), do you see any problems with this?

library(distributional)

ux <- c(1, 2)
mx <- matrix(c(0,3,1,4), nrow = 2)
uv <- dist_normal(0,1)
mv <- dist_multivariate_normal(mu = list(c(1,2)), sigma = list(matrix(c(4,2,2,3), ncol=2)))
dimnames(mv) <- c("a", "b")

mean(ux)
#> [1] 1.5
mean(mx)
#> [1] 2
mean(uv)
#> [1] 0
mean(mv)
#>      a b
#> [1,] 1 2

variance(ux)
#> [1] 0.5
variance(mx)
#> [1] 3.333333
variance(uv)
#> [1] 1
variance(mv)
#>      a b
#> [1,] 4 3

covariance(ux)
#> Error in stats::cov(x, ...): supply both 'x' and 'y' or a matrix-like 'x'
covariance(mx)
#>      [,1] [,2]
#> [1,]  4.5  4.5
#> [2,]  4.5  4.5
covariance(uv)
#> [1] 1
covariance(mv)
#> [[1]]
#>      [,1] [,2]
#> [1,]    4    2
#> [2,]    2    3

^{Created on 2021-10-11 by the reprex package (v2.0.0)}

mjskay · 2021-10-12T02:22:05Z

Looks good to me!

mjskay added a commit to mjskay/distributional that referenced this issue May 22, 2021

make variance() return variance (not covariance) on matrices, closes m…

13237e8

…itchelloharawild#55

mjskay mentioned this issue May 22, 2021

make variance() return variance (not covariance) on matrices, closes #55 #56

Merged

mitchelloharawild closed this as completed in #56 Jun 7, 2021

mitchelloharawild added a commit that referenced this issue Jun 7, 2021

Merge pull request #56 from mjskay/variance

1f7f49a

make variance() return variance (not covariance) on matrices, closes #55

mitchelloharawild added a commit that referenced this issue Oct 11, 2021

Added covariance()

2b847a6

Ref #55

mitchelloharawild mentioned this issue Dec 1, 2021

variance(<draws_array>) method for changing default method in {distributional} stan-dev/posterior#219

Closed

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix base-inherited behavior of variance() on matrices #55

Fix base-inherited behavior of variance() on matrices #55

mjskay commented May 20, 2021

mitchelloharawild commented May 20, 2021

mitchelloharawild commented Oct 4, 2021

mjskay commented Oct 11, 2021

mitchelloharawild commented Oct 11, 2021

mjskay commented Oct 12, 2021

Fix base-inherited behavior of variance() on matrices #55

Fix base-inherited behavior of variance() on matrices #55

Comments

mjskay commented May 20, 2021

mitchelloharawild commented May 20, 2021

mitchelloharawild commented Oct 4, 2021

mjskay commented Oct 11, 2021

mitchelloharawild commented Oct 11, 2021

mjskay commented Oct 12, 2021