Mahmood Ghaffarynia

mahmood.ghaffarynia@utdallas.edu

Computer Science Department

The goal of this work is to determine how large n must be for the large-sample and the (parametric) bootstrap percentile method confidence intervals for the mean of an exponential population to be accurate. Specifically, let X1, ..., Xn be a random sample from an exponential(λ) distribution. This distribution is skewed, and its mean is µ = 1/λ. We can construct two confidence intervals for µ: the large-sample z-interval (interval 1) and a (parametric) bootstrap percentile method interval (interval 2). We investigate their accuracy, i.e., how close their estimated coverage probabilities are to the nominal confidence level, for various combinations of (n, λ). The investigation focuses on 1 − α = 0.95, λ = 0.01, 0.1, 1, 10, and n = 5, 10, 30, 100, for a total of 4 × 4 = 16 combinations of (n, λ).
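
For reference, the two intervals can be written out as below. This is a sketch consistent with the code in this notebook; the symbols $\bar{X}$ (sample mean), $z_{\alpha/2}$ (upper $\alpha/2$ standard normal quantile), $B$ (number of bootstrap resamples), and $\bar{X}^{*}_{(k)}$ (k-th smallest bootstrap mean) are notation assumed here, not taken from the original text.

$$
\text{Interval 1:}\quad \bar{X} \pm z_{\alpha/2}\,\frac{\bar{X}}{\sqrt{n}},
\qquad
\text{Interval 2:}\quad \left[\bar{X}^{*}_{(k_L)},\ \bar{X}^{*}_{(k_U)}\right].
$$

Interval 1 uses the fact that an exponential population has standard deviation equal to its mean, so $\bar{X}/\sqrt{n}$ estimates the standard error of $\bar{X}$. For interval 2, each of the $B$ bootstrap means is computed from a sample of size n simulated from an exponential distribution with rate $1/\bar{X}$, and the usual choice of endpoints is $k_L = (B+1)\,\alpha/2$ and $k_U = (B+1)(1-\alpha/2)$, which gives 25 and 975 when $B = 999$ and $\alpha = 0.05$.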

In [1]:
%load_ext rpy2.ipython

Load the boot library, and set the number of simulation replications (nsim) and the significance level (alpha), so that the nominal confidence level is 1 − alpha = 0.95.

In [2]:
%%R
library(boot)
nsim = 5000
alpha = 0.05

In [3]:
%%R
# Generates confidence intervals by both methods
conf.int = function(lambda, n, alpha) {
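  # Generate data:
  x = rexp(n, rate = lambda)
  # Large-sample z-interval. For an exponential population sd = mean,
  # so the estimated standard error of the sample mean is mean(x) / sqrt(n).
  z.ci = mean(x) + c(-1, 1) * qnorm(1 - alpha / 2) * (mean(x) / sqrt(n))
  # Parametric bootstrap percentile method interval
  # Statistic computed on each simulated sample
  mean.par = function(x) {
    return(mean(x))
  }
  # Simulate a sample of the same size from the fitted model: exponential
  # with rate 1 / mle, where mle is the sample mean (the MLE of mu)
  exp.gen = function(x, mle) {
    return(rexp(length(x), rate = 1 / mle))
  }
  # R = 99 bootstrap resamples; with R = 999 the endpoint indices below
  # would be c(25, 975).
  mean.par.boot = boot(data = x,
                       statistic = mean.par,
                       R = 99,
                       sim = "parametric",
                       ran.gen = exp.gen,
                       mle = mean(x))
  # Endpoints are the 3rd smallest and 3rd largest of the 99 bootstrap means
  # (indices are hard-coded for alpha = 0.05 and R = 99)
  percentile.boot.ci = sort(mean.par.boot$t)[c(3, 97)]

  return(list(z.ci, percentile.boot.ci))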
}
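
The percentile endpoints above are tied to fixed indices. As a minimal sketch of one alternative, the helper below derives the order-statistic indices from alpha and R, so changing either does not require editing the indices by hand. The function name `boot.percentile.ci` and its defaults are hypothetical and not part of the original notebook; it rounds the lower index up, which reproduces the indices 3 and 97 for R = 99 and 25 and 975 for R = 999.

```r
library(boot)

# Hypothetical helper (not in the original notebook): parametric bootstrap
# percentile interval for an exponential mean, with endpoints derived from
# alpha and R rather than hard-coded.
boot.percentile.ci = function(x, alpha = 0.05, R = 999) {
  # Simulate a sample of the same size from the fitted exponential model
  exp.gen = function(data, mle) rexp(length(data), rate = 1 / mle)
  b = boot(data = x, statistic = function(d) mean(d), R = R,
           sim = "parametric", ran.gen = exp.gen, mle = mean(x))
  # Lower index is (R + 1) * alpha / 2, rounded up when it is not a whole
  # number; the upper index is its mirror image from the top.
  lo = ceiling(round((R + 1) * alpha / 2, 6))
  hi = R + 1 - lo
  sort(b$t)[c(lo, hi)]
}

set.seed(1)
x = rexp(30, rate = 0.1)   # true mean is 10
print(boot.percentile.ci(x))
```

Choosing R so that (R + 1) * alpha / 2 is a whole number (for example R = 999 with alpha = 0.05) avoids the rounding step altogether.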

In [4]:
%%R
lambda = c(0.01, 0.1, 1, 10)
n = c(5, 10, 30, 100)
# Initialize matrices (rows: lambda, columns: n) to store estimated coverage
z.accuracy.mat = matrix(NA_real_, nrow = length(lambda), ncol = length(n))
boot.accuracy.mat = matrix(NA_real_, nrow = length(lambda), ncol = length(n))

# Part B: (repetition over all lambda and n)
for (val in lambda) {
  for (num in n) {
    # Part A:
    # Generate both confidence intervals for all 5000 replications:
    ci.mat = replicate(nsim, array( unlist(conf.int(val, num, alpha)), dim = c(2,1, 2)))
    # For Z method, check if true value of mu (1/lambda) is captured by CI,
    #    compute accuracy over all 5000 intervals
    z.ci.accuracy = mean(((1/val) >= ci.mat[1,1,1,]) * ((1/val) <= ci.mat[2,1,1,]))
    # Store the results in matrix for display
    z.accuracy.mat[which(val == lambda), which(num == n)] = z.ci.accuracy
    # For bootstrap method, check if true value of mu (1/lambda) is captured by CI,
    #    compute accuracy over all 5000 intervals
    boot.ci.accuracy = mean(((1/val) >= ci.mat[1,1,2,]) * ((1/val) <= ci.mat[2,1,2,]))
    # Store the results in matrix for display
    boot.accuracy.mat[which(val == lambda), which(num == n)] = boot.ci.accuracy
  }
}

In [5]:
%%R
# Display results
colnames(z.accuracy.mat) <- paste("(n:", n, ")")
colnames(boot.accuracy.mat) <- paste("(n:", n, ")")
rownames(z.accuracy.mat) <- paste("(lambda:", lambda, ")")
rownames(boot.accuracy.mat) <- paste("(lambda:", lambda, ")")
print("Large-Sample Z-Intervals")
print(z.accuracy.mat)
print("Parametric Bootstrap Percentile Method Intervals")
print(boot.accuracy.mat)

[1] "Large-Sample Z-Intervals"
                (n: 5 ) (n: 10 ) (n: 30 ) (n: 100 )
(lambda: 0.01 )  0.8684   0.9036   0.9312    0.9444
(lambda: 0.1 )   0.8718   0.9090   0.9334    0.9462
(lambda: 1 )     0.8664   0.9056   0.9256    0.9420
(lambda: 10 )    0.8632   0.9030   0.9322    0.9490
[1] "Parametric Bootstrap Percentile Method Intervals"
                (n: 5 ) (n: 10 ) (n: 30 ) (n: 100 )
(lambda: 0.01 )  0.8884   0.9138   0.9248    0.9388
(lambda: 0.1 )   0.8908   0.9152   0.9300    0.9424
(lambda: 1 )     0.8956   0.9156   0.9224    0.9348
(lambda: 10 )    0.8902   0.9158   0.9328    0.9388
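
When reading these tables, it helps to know how much of the variation between cells could be simulation noise. The following is a minimal sketch, assuming each cell is a binomial proportion over nsim = 5000 independent replications; the names `nominal`, `mc.se`, and `band` are introduced here for illustration only.

```r
nsim = 5000
nominal = 0.95

# Monte Carlo standard error of an estimated coverage probability,
# if the true coverage were exactly the nominal level
mc.se = sqrt(nominal * (1 - nominal) / nsim)

# Rough two-standard-error band around the nominal level
band = nominal + c(-2, 2) * mc.se

print(round(mc.se, 4))  # 0.0031
print(round(band, 4))   # 0.9438 0.9562
```

By this yardstick, most of the z-interval estimates at n = 100 fall inside the band, whereas the bootstrap estimates at n = 100 fall slightly below it.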


Conclusion:

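For the large-sample z-interval, the estimated coverage reaches roughly 0.94 to 0.95 only at n = 100; it is about 0.93 at n = 30, about 0.905 at n = 10, and only about 0.87 at n = 5. The parametric bootstrap percentile interval is closer to the nominal level 1 − α = 0.95 for small samples (about 0.89 at n = 5 and 0.915 at n = 10), is comparable to the z-interval at n = 30 (about 0.93), and at n = 100 reaches about 0.935 to 0.942, slightly below the z-interval. These results appear to be independent of the value of λ, which is expected because λ only rescales the data.

The parametric bootstrap percentile interval is therefore slightly more accurate for small n. Note that the simulation uses only R = 99 bootstrap resamples, with the 3rd and 97th ordered bootstrap means as endpoints, which may limit the bootstrap interval's coverage at larger n. The bootstrap also takes considerably longer to run, so its accuracy gain comes at a significant computational cost. If the extra accuracy is needed for small samples, bootstrapping is worthwhile.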
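
The run-time difference can be checked directly. The following is a rough sketch, assuming n = 30, λ = 1, and 200 repetitions of each method (all values chosen here for illustration); the timings depend on the machine, so no expected output is shown.

```r
library(boot)

n = 30; lambda = 1; alpha = 0.05
set.seed(42)
x = rexp(n, rate = lambda)

# Time 200 closed-form z-intervals
z.time = as.numeric(system.time(for (i in 1:200) {
  mean(x) + c(-1, 1) * qnorm(1 - alpha / 2) * mean(x) / sqrt(n)
})["elapsed"])

# Time 200 parametric bootstrap runs with R = 99 resamples each
exp.gen = function(data, mle) rexp(length(data), rate = 1 / mle)
boot.time = as.numeric(system.time(for (i in 1:200) {
  boot(data = x, statistic = function(d) mean(d), R = 99,
       sim = "parametric", ran.gen = exp.gen, mle = mean(x))
})["elapsed"])

# Elapsed seconds for each method
print(c(z = z.time, boot = boot.time))
```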