Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quantile works incorrectly with negation of non-normal distributions #100

Closed
venpopov opened this issue Apr 1, 2024 · 7 comments
Closed

Comments

@venpopov
Copy link
Contributor

venpopov commented Apr 1, 2024

Just found out that if we have a transformed distribution by negation, the quantile function gives incorrect results:

library(distributional)
d <- -dist_gumbel(0, 1)
quantile(d, 0)
#> [1] Inf
quantile(d, 0.1)
#> [1] 0.8340324
quantile(d, 0.2)
#> [1] 0.475885
quantile(d, 1)
#> [1] -Inf

Created on 2024-04-01 with reprex v2.1.0

This is unrelated to #96 - it would have happened before with a transformation such as 1-dist_gumbel(0,1)

I picked up the gumbel distribution on purpose, because this is a common transformation - gumbel comes in two flavors, gumbel max and gumbel min, which are just with inversed input values. But this happens for any distribution on which the negative operation is applied.

This is due to how the quantile.dist_transformed function is implemented:

quantile.dist_transformed <- function(x, p, ...){
  x[["transform"]](quantile(x[["dist"]], p, ...))
}

It's also a problem for the cdf, which is also inverted.

@venpopov
Copy link
Contributor Author

venpopov commented Apr 1, 2024

For the quantile function this is one possible easy and quick solution:

#' @export
quantile.dist_transformed <- function(x, p, ...){
  q1 <- x[["transform"]](quantile(x[["dist"]], p, ...))
  q2 <- x[["transform"]](quantile(x[["dist"]], 1 - p, ...))
  ifelse((q1 > q2 & p > 0.5) | (q1 < q2 & p < 0.5), q1, q2)
}

@venpopov
Copy link
Contributor Author

venpopov commented Apr 1, 2024

Adding a similar check to the cdf function also fixes things:

cdf.dist_transformed <- function(x, q, ...){
  inv <- function(v) SW(x[["inverse"]](v))
  p1 <- cdf(x[["dist"]], inv(q), ...)
  p2 <- cdf(x[['dist']], inv(q+1), ...)
  p <- ifelse(p1 <= p2, p1, 1-p1)

  limits <- field(support(x), "lim")[[1]]
  if (!any(is.na(limits))) {
    p[q <= limits[1]] <- 0
    p[q >= limits[2]] <- 1
  }
  p
}

The logic is basically that the cdf should increase monotonically. If it does not, then we need to take the 1-cdf.

Let me know what you think about this approach

@mitchelloharawild
Copy link
Owner

Transformations are assumed (but not tested) to be monotonic (#7), and I suppose the current implementation also assumes that they are monotonically increasing over the distribution's domain.

Rather than calculating the quantile, cdf, etc. multiple times I think it is better (more efficient) to instead test if the transformation is increasing or decreasing over the support of the underlying distribution. I'll see if I can fix this one.

@mitchelloharawild
Copy link
Owner

Thanks, should be fixed now. Check out the changes and let me know what you think.
I realise that the implementation isn't vectorised, and will need further work after #25.

library(distributional)
d <- -dist_gumbel(0, 1)
quantile(d, 0)
#> [1] -Inf
quantile(d, 0.1)
#> [1] -2.250367
quantile(d, 0.2)
#> [1] -1.49994
quantile(d, 1)
#> [1] Inf

Created on 2024-04-05 with reprex v2.0.2

@venpopov
Copy link
Contributor Author

venpopov commented Apr 5, 2024

Yup, I can confirm it is now correct:

library(distributional)
d <- dist_gumbel(0,1)
x <- seq(-4,4, 0.001)
plot(x, cdf(d, x)[[1]], type = "l")
lines(x, cdf(-d, x)[[1]], col='red')

cdf_gumbel_max <- function(x) {
  exp(-exp(-x))
}

cdf_gumbel_min <- function(x) {
  1 - exp(-exp(x))
}

plot(x, cdf_gumbel_max(x), type = "l")
lines(x, cdf_gumbel_min(x), col='red')

Created on 2024-04-05 with reprex v2.1.0

Thanks!

@venpopov
Copy link
Contributor Author

venpopov commented Apr 5, 2024

Though it looks like it caused some checks to fail on master, you probably need to take a look at that

mitchelloharawild added a commit that referenced this issue Apr 5, 2024
@mitchelloharawild
Copy link
Owner

Thanks, the failing test was from an invalid distribution.
We might like to calculate and store the transformation's monotonic nature (increasing/decreasing) when the transformation is applied, and in doing so raise an error if the transformation is invalid (e.g. log on distributions with domain <0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants