mean and variance are incorrect for transformed distributions #102

venpopov · 2024-04-04T16:02:08Z

The mean and variance methods for transformed distributions give incorrect results. This is unrelated to recent changes, as it also happens on the cran version. Here are two examples:

library(distributional)
library(ggdist)
library(ggplot2)
library(magrittr)

# Define a lognormal distribution either by the default or by transforming a normal distribution
mu = 2
sd = 1.5
trans_lnorm <- exp(dist_wrap('norm', mean = mu, sd = sd))
def_lnorm <- dist_lognormal(mu = mu, sigma = sd)

# plot shows same density
data.frame(dists = c(def_lnorm, trans_lnorm),
           names = c('Builtin lognormal', 'Transformed normal')) %>%
  ggplot(aes(xdist = dists, y = names)) +
  stat_slab(p_limits = c(0, 0.9)) +
  theme_ggdist() +
  ylab("")

# means and variance are incorrect for the transformed distribution
data.frame(type = c('analytic', 'builtin', 'transformed'),
           mean = c(exp(mu + sd^2/2), 
                    mean(def_lnorm), 
                    mean(trans_lnorm)),
           var = c((exp(sd^2) - 1) * exp(2*mu + sd^2),
                   variance(def_lnorm),
                   variance(trans_lnorm)))
#>          type     mean       var
#> 1    analytic 22.75990 4396.7560
#> 2     builtin 22.75990 4396.7560
#> 3 transformed 15.70174  261.0474

# same with gumbel
def_gumb <- dist_gumbel(alpha = mu, scale = sd)
trans_gumb <- -log(-log(dist_uniform(0, 1))) * sd + mu

data.frame(dists = c(def_gumb, trans_gumb),
           names = c('Builtin gumbel', 'Transformed uniform')) %>%
  ggplot(aes(xdist = dists, y = names)) +
  stat_slab(p_limits = c(0.0001, 0.995)) +
  theme_ggdist() +
  ylab("")

data.frame(type = c('analytic', 'builtin', 'transformed'),
           mean = c(mu + 0.5772 * sd, 
                    mean(def_gumb),
                    mean(trans_gumb)),
           var = c((pi^2/6) * sd^2,
                   variance(def_gumb),
                   variance(trans_gumb)))
#>          type     mean      var
#> 1    analytic 2.865800 3.701102
#> 2     builtin 2.865823 3.701102
#> 3 transformed 2.709438 1.612015

^{Created on 2024-04-04 with reprex v2.1.0}

venpopov · 2024-04-04T17:33:20Z

Now that we don't get NA values after #97, there is an easy fix:

mean_new <- function(x, ...) {
  fun <- function(at, dist) {
    unlist(density(dist, at)) * at
  }
  integrate(fun, -Inf, Inf, dist = x, rel.tol = .Machine$double.eps^0.5)$value
}

covariance_new <- function(x, ...) {
  fun <- function(at, dist) {
    unlist(density(dist, at)) * at^2
  }
  mean <- mean_int(dist)
  integrate(fun, -Inf, Inf, dist = x, rel.tol = .Machine$double.eps^0.5)$value - mean^2
}

mean_new(trans_lnorm)
#> [1] 22.7599
var_int(trans_lnorm)
#> [1] 4396.756

mitchelloharawild · 2024-04-18T12:49:55Z

Currently these are calculated with taylor series approximations, which are faster than the numerical integration and I think work fine in most cases. Maybe numerical integration would be fast enough with a higher tolerance, while still giving comparable accuracy but with improved consistency.

mitchelloharawild · 2024-04-19T06:23:16Z

Closing as this is a duplicate of #73, you can add your examples or comments in that thread.

venpopov mentioned this issue Apr 4, 2024

add more extensive tests #103

Open

mitchelloharawild closed this as completed Apr 19, 2024

venpopov mentioned this issue Apr 19, 2024

Improve accuracy of summary statistic methods for transformed distributions #73

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mean and variance are incorrect for transformed distributions #102

mean and variance are incorrect for transformed distributions #102

venpopov commented Apr 4, 2024

venpopov commented Apr 4, 2024

mitchelloharawild commented Apr 18, 2024

mitchelloharawild commented Apr 19, 2024

mean and variance are incorrect for transformed distributions #102

mean and variance are incorrect for transformed distributions #102

Comments

venpopov commented Apr 4, 2024

venpopov commented Apr 4, 2024

mitchelloharawild commented Apr 18, 2024

mitchelloharawild commented Apr 19, 2024