Fitting truncated gamma distribution to simulated data yields unexpected result #21

paul-sheridan · 2023-07-18T11:38:13Z

I am trying to fit a truncated gamma distribution, $X|X>x_0$ for some threshold $x_0>0$, to simulated data using the fitdistrplus package.
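For reference, the target distribution can be written out explicitly. The notation here is introduced only for illustration and is not from the package FAQ: $f$ and $F$ are the gamma density and distribution function, $\alpha$ is the shape, $\beta$ is the rate, and there is no upper truncation. The density of $X|X>x_0$ and the log-likelihood for observations $x_1,\dots,x_n>x_0$ are then

$$
f(x \mid X > x_0) = \frac{f(x;\alpha,\beta)}{1 - F(x_0;\alpha,\beta)}, \qquad x > x_0,
$$

$$
\ell(\alpha,\beta) = \sum_{i=1}^{n} \log f(x_i;\alpha,\beta) - n \log\bigl(1 - F(x_0;\alpha,\beta)\bigr).
$$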

To this end, I modified the code from the "Can I fit truncated distributions?" section of the package FAQ to fit a truncated gamma distribution to data generated from a gamma distribution with shape parameter 11, rate parameter 3, and threshold $x_0=5$:

library(fitdistrplus)

dtgamma <- function(x, shape, rate, low, upp)
{
  PU <- pgamma(upp, shape = shape, rate = rate)
  PL <- pgamma(low, shape = shape, rate = rate)
  dgamma(x, shape, rate) / (PU - PL) * (x >= low) * (x <= upp) 
}

ptgamma <- function(q, shape, rate, low, upp)
{
  PU <- pgamma(upp, shape = shape, rate = rate)
  PL <- pgamma(low, shape = shape, rate = rate)
  (pgamma(q, shape, rate) - PL) / (PU - PL) * (q >= low) * (q <= upp) + 1 * (q > upp)
}

set.seed(20230718)
n <- 200
shape <- 11
rate <- 3
x0 <- 5
x <- rgamma(n, shape = shape, rate = rate)
x <- x[x > x0]
fit <- fitdist(
  data = x,
  distr = "tgamma",
  method = "mle",
  start = list(shape = shape, rate = rate),
  fix.arg = list(low = x0, upp = Inf),
  lower = c(0, 0))
fit

However, the estimated parameters as shown here

Fitting of the distribution ' tgamma ' by maximum likelihood 
Parameters:
          estimate Std. Error
shape 1.261387e-06         NA
rate  9.366664e-01         NA
Fixed parameters:
    value
low     5
upp   Inf

differ conspicuously from the true values.

Any ideas about what might be going wrong here?


dutangc · 2023-08-02T05:42:47Z

Dear Paul,
Indeed, your truncated gamma example is puzzling. If you plot the log-likelihood surface, you will see that the lower the shape value, the higher the log-likelihood. I tried both Nelder-Mead (the default) and BFGS and obtained the same kind of iterates and fitted values; see the attached file (obtained for n=2000). The true value is the red dot.

If we treat the shape as a fixed parameter, the fitted log-likelihood is a decreasing function of the shape. Surprisingly, the fit is correct when comparing the distribution functions; see below.
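The profile described above can be sketched in base R, without fitdistrplus. This is only an illustrative sketch, not the code behind the attached plots: the helper `loglik_tgamma`, the shape grid, and the search interval for the rate are assumptions made here.

```r
# Simulate as in the original post, with n = 2000 draws before truncation
set.seed(20230718)
x0 <- 5
x <- rgamma(2000, shape = 11, rate = 3)
x <- x[x > x0]

# Log-likelihood of a gamma distribution truncated below at x0 (upp = Inf).
# Hypothetical helper: same quantity as sum(log(dtgamma(x, shape, rate, x0, Inf))).
loglik_tgamma <- function(shape, rate) {
  sum(dgamma(x, shape = shape, rate = rate, log = TRUE)) -
    length(x) * pgamma(x0, shape = shape, rate = rate,
                       lower.tail = FALSE, log.p = TRUE)
}

# For each fixed shape, maximize the log-likelihood over the rate only
shape_grid <- seq(1, 15, by = 1)
profile_ll <- sapply(shape_grid, function(s) {
  optimize(function(r) loglik_tgamma(s, r),
           interval = c(0.01, 20), maximum = TRUE)$objective
})

# Inspect how the maximized log-likelihood varies with the fixed shape
print(cbind(shape = shape_grid, loglik = profile_ll))
```

How flat or how monotone the printed profile looks depends on the simulated sample, so it is worth rerunning with other seeds and sample sizes.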

This issue also arises if we work with y <- x - x0. y does not follow a gamma distribution (shape=11, rate=3). One way to see this is that a gamma distribution (shape=11, rate=3) has a unimodal density, whereas y has a strictly decreasing density. If we fit a gamma distribution to y, we get shape=0.9150596, rate=1.2617782.
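The claim about the shape of the density of y can be checked numerically. In this minimal sketch the helper name `dshifted` and the evaluation grid are assumptions. It relies on the fact that a gamma density with shape > 1 has its mode at (shape - 1) / rate.

```r
shape <- 11
rate <- 3
x0 <- 5

# Mode of the untruncated gamma density, valid for shape > 1
gamma_mode <- (shape - 1) / rate

# Density of y = x - x0 given x > x0 (hypothetical helper name)
dshifted <- function(y) {
  dgamma(y + x0, shape = shape, rate = rate) /
    pgamma(x0, shape = shape, rate = rate, lower.tail = FALSE)
}

y_grid <- seq(0, 5, by = 0.01)

# Is the threshold to the right of the mode?
print(gamma_mode < x0)
# Does the density of y decrease everywhere on the grid?
print(all(diff(dshifted(y_grid)) < 0))
```

Because the threshold x0 = 5 lies beyond the mode, only the decreasing right tail of the gamma density survives the truncation.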

The issue originates from fixing the low parameter at its true value. Indeed, if we use

fit.NM.3P <- fitdist(
  data = x,
  distr = "tgamma",
  method = "mle",
  start = list(shape = 10, rate = 10, low=1),
  fix.arg = list(upp = Inf),
  lower = c(0, 0, -Inf), upper=c(Inf, Inf, min(x)))

we obtain far better estimates of all parameters:

> coef(fit.NM.3P)
    shape      rate       low 
10.654516  2.947720  5.000502
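One way to read the estimate of low: for fixed shape and rate, raising low only shrinks the normalizing constant 1 - F(low), so the truncated log-likelihood increases in low until it reaches the bound min(x). The base-R sketch below checks this on a simulated sample. The helper `loglik_low` and the grid are assumptions, and shape and rate are held at their true values purely for illustration.

```r
set.seed(20230718)
x <- rgamma(2000, shape = 11, rate = 3)
x <- x[x > 5]

# Truncated-gamma log-likelihood as a function of low only
# (shape = 11 and rate = 3 held fixed for illustration; upp = Inf)
loglik_low <- function(low) {
  sum(dgamma(x, shape = 11, rate = 3, log = TRUE)) -
    length(x) * pgamma(low, shape = 11, rate = 3,
                       lower.tail = FALSE, log.p = TRUE)
}

# Evaluate on a grid ending at the largest admissible value, min(x)
low_grid <- seq(1, min(x), length.out = 50)
ll <- sapply(low_grid, loglik_low)

# Does the log-likelihood rise at every step towards min(x)?
print(all(diff(ll) > 0))
```

This is why the upper bound min(x) in the call above matters: without it, low could move past the smallest observation, where the density of that observation is zero.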

The fits with three parameters and with only two parameters are almost indistinguishable when checking the cdf; see below.

It might be good to add this example to the FAQ? Any comment is welcome.

paul-sheridan · 2023-08-02T10:04:34Z

Hi Christophe, this is very enlightening. Just to sum things up, I compare the old approach with your newly suggested one below. This time around I've upped the number of random draws from the gamma distribution from n=200 to n=10000 to sidestep any small sample size issues with x (i.e., the random draws from the gamma distribution exceeding the threshold x0 = 5).

Initial Setup

library(fitdistrplus)

dtgamma <- function(x, shape, rate, low, upp)
{
  PU <- pgamma(upp, shape = shape, rate = rate)
  PL <- pgamma(low, shape = shape, rate = rate)
  dgamma(x, shape, rate) / (PU - PL) * (x >= low) * (x <= upp) 
}

ptgamma <- function(q, shape, rate, low, upp)
{
  PU <- pgamma(upp, shape = shape, rate = rate)
  PL <- pgamma(low, shape = shape, rate = rate)
  (pgamma(q, shape, rate) - PL) / (PU - PL) * (q >= low) * (q <= upp) + 1 * (q > upp)
}

set.seed(20230718)
n <- 10000
shape <- 11
rate <- 3
x0 <- 5
x <- rgamma(n, shape = shape, rate = rate)
x <- x[x > x0]

Old Approach

fit <- fitdist(
  data = x,
  distr = "tgamma",
  method = "mle",
  start = list(shape = shape, rate = rate),
  fix.arg = list(low = x0, upp = Inf),
  lower = c(0, 0))

> fit$estimate
   shape     rate 
9.230792 2.732394

Newly Suggested Approach

fit.NM.3P <- fitdist(
  data = x,
  distr = "tgamma",
  method = "mle",
  start = list(shape = shape, rate = rate, low = x0),
  fix.arg = list(upp = Inf),
  lower = c(0, 0, -Inf), upper=c(Inf, Inf, min(x)))

> coef(fit.NM.3P)
    shape      rate       low 
11.226122  3.059620  5.000174

For what it's worth, having a concrete example of fitting a truncated distribution in the FAQ would have proved quite helpful to me a couple of weeks back when I started experimenting with the fitdistrplus package.

dutangc · 2024-02-19T09:32:21Z

Issue added to the FAQ.

dutangc closed this as completed Feb 19, 2024
