Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fitting truncated gamma distribution to simulated data yields unexpected result #21

Closed
paul-sheridan opened this issue Jul 18, 2023 · 3 comments

Comments

@paul-sheridan
Copy link

I am trying to fit a truncated gamma distribution, $X|X>x_0$ for some threshold $x_0>0$, to simulated data using the fitdistrplus package.

To this end, I modified the code from the "Can I fit truncated distributions?" section of the package FAQ to fit a truncated gamma distribution to data generated from a gamma distribution with shape parameter 11, rate parameter 3, and threshold $x_0=5$:

library(fitdistrplus)

dtgamma <- function(x, shape, rate, low, upp)
{
  PU <- pgamma(upp, shape = shape, rate = rate)
  PL <- pgamma(low, shape = shape, rate = rate)
  dgamma(x, shape, rate) / (PU - PL) * (x >= low) * (x <= upp) 
}

ptgamma <- function(q, shape, rate, low, upp)
{
  PU <- pgamma(upp, shape = shape, rate = rate)
  PL <- pgamma(low, shape = shape, rate = rate)
  (pgamma(q, shape, rate) - PL) / (PU - PL) * (q >= low) * (q <= upp) + 1 * (q > upp)
}

set.seed(20230718)
n <- 200
shape <- 11
rate <- 3
x0 <- 5
x <- rgamma(n, shape = shape, rate = rate)
x <- x[x > x0]
fit <- fitdist(
  data = x,
  distr = "tgamma",
  method = "mle",
  start = list(shape = shape, rate = rate),
  fix.arg = list(low = x0, upp = Inf),
  lower = c(0, 0))
fit

However, the estimated parameters as shown here

Fitting of the distribution ' tgamma ' by maximum likelihood 
Parameters:
          estimate Std. Error
shape 1.261387e-06         NA
rate  9.366664e-01         NA
Fixed parameters:
    value
low     5
upp   Inf

differ from the true values to conspicuous degree.

Any ideas about what might be going wrong here?

@dutangc
Copy link
Collaborator

dutangc commented Aug 2, 2023

Dear Paul,
Indeed your truncated gamma example is puzzling. If you plot the log-likelihood surface, you will observe that the lower the shape value is, the higher the log-likelihood is. I try with Nelder Mead (default) and BFGS and obtain the same kind of iterates and fitted values, see attached file (obtained for n=2000). The true value is the red dot

tgamma-iterates

We consider the shape as a fixed parameter, the fitted log-likelihood is a decreasing function of the shape. Surprisingly the fit is correct when comparing the distribution function. see below

tgamma-fittedloglik

This issue also happens if we work on y <- x - x0. y is not distributed as a gamma distribution (shape=11, rate=3). One way to see it is to observe that a gamma distribution (shape=11, rate=3) has an unimodal density, whereas y has a strictly decreasing density. If we fit a gamma distribution on y, we get shape=0.9150596, rate=1.2617782.

The origin of the issue is to use the true value of the low parameter. Indeed if we use

fit.NM.3P <- fitdist(
  data = x,
  distr = "tgamma",
  method = "mle",
  start = list(shape = 10, rate = 10, low=1),
  fix.arg = list(upp = Inf),
  lower = c(0, 0, -Inf), upper=c(Inf, Inf, min(x)))

We obtain a far better estimate (of all parameters):

> coef(fit.NM.3P)
    shape      rate       low 
10.654516  2.947720  5.000502 

The fit of 3 parameters and 2 parameters only is almost indistinguable when checking the cdf, see below.

tgamma-3par-2par

It might be good to add this example to the FAQ? Any comment is welcome.

@paul-sheridan
Copy link
Author

Hi Christophe, this is very enlightening. Just to sum things up, I compare the old approach with your newly suggested one below. This time around I've upped the number of random draws from the gamma distribution from n=200 to n=10000 to sidestep any small sample size issues with x (i.e., the random draws from the gamma distribution exceeding the threshold x0 = 5).

Initial Setup

library(fitdistrplus)

dtgamma <- function(x, shape, rate, low, upp)
{
  PU <- pgamma(upp, shape = shape, rate = rate)
  PL <- pgamma(low, shape = shape, rate = rate)
  dgamma(x, shape, rate) / (PU - PL) * (x >= low) * (x <= upp) 
}

ptgamma <- function(q, shape, rate, low, upp)
{
  PU <- pgamma(upp, shape = shape, rate = rate)
  PL <- pgamma(low, shape = shape, rate = rate)
  (pgamma(q, shape, rate) - PL) / (PU - PL) * (q >= low) * (q <= upp) + 1 * (q > upp)
}

set.seed(20230718)
n <- 10000
shape <- 11
rate <- 3
x0 <- 5
x <- rgamma(n, shape = shape, rate = rate)
x <- x[x > x0]

Old Approach

fit <- fitdist(
  data = x,
  distr = "tgamma",
  method = "mle",
  start = list(shape = shape, rate = rate),
  fix.arg = list(low = x0, upp = Inf),
  lower = c(0, 0))
> fit$estimate
   shape     rate 
9.230792 2.732394 

Newly Suggested Approach

fit.NM.3P <- fitdist(
  data = x,
  distr = "tgamma",
  method = "mle",
  start = list(shape = shape, rate = rate, low = x0),
  fix.arg = list(upp = Inf),
  lower = c(0, 0, -Inf), upper=c(Inf, Inf, min(x)))
> coef(fit.NM.3P)
    shape      rate       low 
11.226122  3.059620  5.000174 

For what it's worth, having a concrete example of fitting a truncated distribution in the FAQ would have proved quite helpful to me a couple of weeks back when I started experimenting with the fitdistrplus package.

@dutangc
Copy link
Collaborator

dutangc commented Feb 19, 2024

Issue added to the FAQ.

@dutangc dutangc closed this as completed Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants