Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mean and variance are incorrect for transformed distributions #102

Closed
venpopov opened this issue Apr 4, 2024 · 3 comments
Closed

mean and variance are incorrect for transformed distributions #102

venpopov opened this issue Apr 4, 2024 · 3 comments

Comments

@venpopov
Copy link
Contributor

venpopov commented Apr 4, 2024

The mean and variance methods for transformed distributions give incorrect results. This is unrelated to recent changes, as it also happens on the cran version. Here are two examples:

library(distributional)
library(ggdist)
library(ggplot2)
library(magrittr)

# Define a lognormal distribution either by the default or by transforming a normal distribution
mu = 2
sd = 1.5
trans_lnorm <- exp(dist_wrap('norm', mean = mu, sd = sd))
def_lnorm <- dist_lognormal(mu = mu, sigma = sd)

# plot shows same density
data.frame(dists = c(def_lnorm, trans_lnorm),
           names = c('Builtin lognormal', 'Transformed normal')) %>%
  ggplot(aes(xdist = dists, y = names)) +
  stat_slab(p_limits = c(0, 0.9)) +
  theme_ggdist() +
  ylab("")

# means and variance are incorrect for the transformed distribution
data.frame(type = c('analytic', 'builtin', 'transformed'),
           mean = c(exp(mu + sd^2/2), 
                    mean(def_lnorm), 
                    mean(trans_lnorm)),
           var = c((exp(sd^2) - 1) * exp(2*mu + sd^2),
                   variance(def_lnorm),
                   variance(trans_lnorm)))
#>          type     mean       var
#> 1    analytic 22.75990 4396.7560
#> 2     builtin 22.75990 4396.7560
#> 3 transformed 15.70174  261.0474
# same with gumbel
def_gumb <- dist_gumbel(alpha = mu, scale = sd)
trans_gumb <- -log(-log(dist_uniform(0, 1))) * sd + mu

data.frame(dists = c(def_gumb, trans_gumb),
           names = c('Builtin gumbel', 'Transformed uniform')) %>%
  ggplot(aes(xdist = dists, y = names)) +
  stat_slab(p_limits = c(0.0001, 0.995)) +
  theme_ggdist() +
  ylab("")

data.frame(type = c('analytic', 'builtin', 'transformed'),
           mean = c(mu + 0.5772 * sd, 
                    mean(def_gumb),
                    mean(trans_gumb)),
           var = c((pi^2/6) * sd^2,
                   variance(def_gumb),
                   variance(trans_gumb)))
#>          type     mean      var
#> 1    analytic 2.865800 3.701102
#> 2     builtin 2.865823 3.701102
#> 3 transformed 2.709438 1.612015

Created on 2024-04-04 with reprex v2.1.0

@venpopov
Copy link
Contributor Author

venpopov commented Apr 4, 2024

Now that we don't get NA values after #97, there is an easy fix:

mean_new <- function(x, ...) {
  fun <- function(at, dist) {
    unlist(density(dist, at)) * at
  }
  integrate(fun, -Inf, Inf, dist = x, rel.tol = .Machine$double.eps^0.5)$value
}

covariance_new <- function(x, ...) {
  fun <- function(at, dist) {
    unlist(density(dist, at)) * at^2
  }
  mean <- mean_int(dist)
  integrate(fun, -Inf, Inf, dist = x, rel.tol = .Machine$double.eps^0.5)$value - mean^2
}

mean_new(trans_lnorm)
#> [1] 22.7599
var_int(trans_lnorm)
#> [1] 4396.756

@mitchelloharawild
Copy link
Owner

Currently these are calculated with taylor series approximations, which are faster than the numerical integration and I think work fine in most cases. Maybe numerical integration would be fast enough with a higher tolerance, while still giving comparable accuracy but with improved consistency.

@mitchelloharawild
Copy link
Owner

Closing as this is a duplicate of #73, you can add your examples or comments in that thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants