# Lecture 10.3: Application to Monte Carlo Estimation


```r
library(tidyverse)
```

── [1mAttaching packages[22m ─────────────────────────────────────── tidyverse 1.3.1 ──

[32m✔[39m [34mggplot2[39m 3.3.5     [32m✔[39m [34mpurrr  [39m 0.3.4
[32m✔[39m [34mtibble [39m 3.1.3     [32m✔[39m [34mdplyr  [39m 1.0.7
[32m✔[39m [34mtidyr  [39m 1.1.3     [32m✔[39m [34mstringr[39m 1.4.0
[32m✔[39m [34mreadr  [39m 2.0.1     [32m✔[39m [34mforcats[39m 0.5.1

── [1mConflicts[22m ────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[39m [34mdplyr[39m::[32mfilter()[39m masks [34mstats[39m::filter()
[31m✖[39m [34mdplyr[39m::[32mlag()[39m    masks [34mstats[39m::lag()



## Monte Carlo Estimation

Vectorization can be particularly useful in Monte Carlo studies where we might otherwise be inclined to use explicit loops. We will look at some examples after an introduction to Monte Carlo estimates.

In statistics and data science we are often interested in computing expectations (means) of random outcomes of various types.
When analytic expectations are unavailable or cumbersome to compute, it can be useful to obtain Monte Carlo approximations by simulating a random process and then directly averaging the values of interest.

This works because, by the law of large numbers, the sample average of iid draws converges in probability to the corresponding expectation:

$$\tilde{\theta}_n = \sum_{i=1}^n X_i / n \rightarrow_p \theta = E(X)$$
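A short sketch of this convergence, assuming $X \sim U(0,1)$ so that $\theta = E(X) = 0.5$; the seed and the number of draws are arbitrary choices. Because `cumsum()` is vectorized, all of the running averages $\tilde{\theta}_1, \dots, \tilde{\theta}_n$ come from a single call rather than a loop:

```r
set.seed(1)
n <- 1e4
x <- runif(n)                            # iid draws from U(0,1), so E(X) = 0.5
running_mean <- cumsum(x) / seq_len(n)   # theta_tilde_k for k = 1, ..., n
# Absolute error of the running average at a few sample sizes
abs(running_mean[c(10, 100, 1000, n)] - 0.5)
```

The last running average is just `mean(x)`; the earlier ones show how the error typically shrinks as more draws are averaged.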

In fact, assuming our data are independent and identically distributed (iid) from a distribution with finite variance, we can characterize the rate of convergence of a sample average to its population counterpart using the central limit theorem (CLT),

$$\sqrt{n} (\tilde{\theta}_n - \theta) \rightarrow_d N(0,\sigma^2) $$

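where $\sigma^2 = E[X^2] - E[X]^2$ is the variance of the underlying distribution from which $X$ is drawn. This is useful for constructing approximate confidence intervals that quantify the Monte Carlo error, since for large $n$ the estimate $\tilde{\theta}_n$ is approximately $N(\theta, \sigma^2/n)$.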
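For example, a minimal sketch assuming $X \sim \text{Exp}(1)$, so that $\theta = 1$ and $\sigma^2 = 1$; the seed and `n` are arbitrary. The CLT justifies the approximate 95% interval $\tilde{\theta}_n \pm 1.96\,\hat{\sigma}/\sqrt{n}$, where $\hat{\sigma}$ is the sample standard deviation:

```r
set.seed(2)
n <- 1e5
x <- rexp(n, rate = 1)        # E(X) = 1, Var(X) = 1
theta_hat <- mean(x)          # Monte Carlo estimate of E(X)
se <- sd(x) / sqrt(n)         # estimated Monte Carlo standard error
ci <- theta_hat + c(-1, 1) * qnorm(0.975) * se  # approximate 95% CI
c(estimate = theta_hat, lower = ci[1], upper = ci[2])
```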




## Distribution functions

There are vectorized functions in `R` for simulating from many common distributions. Here are a few:

- `rnorm()` - Normal
- `runif()` - Uniform
- `rt()` - the t-distribution
- `rexp()` - Exponential
- `rpois()` - Poisson

Another useful function in R is `sample()` for sampling from a finite set of values, i.e. from the discrete uniform distribution or any finite probability mass function (specified through its `prob` argument).
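Each of these functions returns a whole vector of draws in one call. The sketch below is illustrative only; all parameter values are arbitrary choices:

```r
set.seed(3)
z  <- rnorm(5, mean = 0, sd = 1)    # 5 standard normal draws
u  <- runif(5, min = 0, max = 1)    # 5 draws from U(0,1)
t3 <- rt(5, df = 3)                 # 5 draws from a t-distribution with 3 df
e  <- rexp(5, rate = 2)             # 5 exponential draws with mean 1/2
k  <- rpois(5, lambda = 4)          # 5 Poisson draws with mean 4
# sample(): 10 rolls of a fair die, and 10 draws from a non-uniform pmf
die  <- sample(1:6, size = 10, replace = TRUE)
coin <- sample(c("H", "T"), size = 10, replace = TRUE, prob = c(0.7, 0.3))
```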

As an aside, you should be aware that each of the distribution families above has corresponding `d*`, `p*`, and `q*` functions for computing densities, cumulative probabilities (CDF), and quantiles (inverse CDF), respectively.
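For the standard normal, for instance, `dnorm()`, `pnorm()` and `qnorm()` give the density, the CDF and the inverse CDF; the same prefixes apply to the other families (e.g. `pt()`, `qexp()`):

```r
dnorm(0)           # density at 0: 1/sqrt(2*pi), about 0.3989
pnorm(1.96)        # P(Z <= 1.96), about 0.975
qnorm(0.975)       # 97.5% quantile, about 1.96
qnorm(pnorm(1.5))  # q* inverts p*: returns 1.5
pexp(1, rate = 1)  # P(X <= 1) for Exp(1), which equals 1 - exp(-1)
```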

When we call one of the `r*` functions to generate random draws from a distribution, R relies on a pseudo-random number generator to generate draws from $U(0,1)$ and produce the results. Thus the outcome of these calls depends on the current state of the generator. It is sometimes desirable to reproduce exactly the same pseudo-random sequence. You can do this by fixing the random seed using `set.seed()`, which takes an integer argument. The function `RNGkind()` can be used to display or set the random number generator.
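A small illustration of reproducibility (the seed value 506 is arbitrary): resetting the seed restores the generator's state, so the same draws come out again, while a further call without resetting continues the sequence:

```r
set.seed(506)
a <- runif(3)
set.seed(506)
b <- runif(3)            # same seed, same state: identical draws
identical(a, b)          # TRUE
c_draws <- runif(3)      # generator has advanced, so these draws differ
identical(b, c_draws)    # FALSE
RNGkind()                # names of the current uniform, normal and sample kinds
```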




## Basic Example 1

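As a quick example, let’s use these functions to compute cumulative probabilities (CDF values) for $t$-distributions with various degrees of freedom. Let $\theta_q$ be the parameter of interest,

$$\theta_q = F(q) = \int_{-\infty}^q f(x)\, dx = \int 1[x\le q]\, f(x)\, dx = E(1[X \le q]),$$

where $F(\cdot)$ is the CDF and $f(\cdot)$ the PDF of a given $t$-distribution. A Monte Carlo estimate of $\theta_q$ is therefore the proportion of simulated draws that are at most $q$.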
 

In this case, our Monte Carlo estimate of $(\theta_{-1.96}, \theta_{1.96})$ is $\tilde{\theta} = (0.0704, 0.9269)$. The actual values are $(\theta_{-1.96}, \theta_{1.96}) = (0.0724261, 0.9275739)$.
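A minimal vectorized sketch of this estimate. The degrees of freedom are not stated above; `df = 3` is an assumption, chosen because `pt(c(-1.96, 1.96), df = 3)` appears to reproduce the actual values quoted above. The seed and number of draws are also arbitrary, so the estimates will not match those above exactly:

```r
set.seed(4)
n  <- 1e4                     # number of Monte Carlo draws (assumed)
df <- 3                       # degrees of freedom (assumed)
q  <- c(-1.96, 1.96)          # points at which to estimate F(q)
x  <- rt(n, df = df)          # draws from the t-distribution
# theta_q = E(1[X <= q]); outer() builds an n x 2 logical matrix of
# indicators, and colMeans() averages each column
theta_hat  <- colMeans(outer(x, q, "<="))
theta_true <- pt(q, df = df)  # exact CDF values for comparison
rbind(estimate = theta_hat, actual = theta_true)
```

Using `outer()` avoids an explicit loop over the values of `q`; adding more points to `q` just adds columns.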

## Basic Example 2

Suppose we are interested in computing the following integral where  $\phi$ is the standard normal density function:

$$
\int_{-\infty}^{\infty} [\sin(x) - \cos(x)]^2 \phi (x) dx
$$

We can recast this as the expectation below,

$$
E[h(X)], \qquad h(x) = [\sin(x)-\cos(x)]^2, \qquad X\sim N(0,1).
$$

The following R code provides a Monte Carlo estimate,
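A minimal sketch of such an estimate; the seed and the number of draws `n` are assumptions. Because `sin()` and `cos()` are vectorized, `h()` can be applied to all draws at once:

```r
set.seed(5)
n <- 1e5                                # number of draws (assumed)
h <- function(x) (sin(x) - cos(x))^2    # function whose mean we want
x <- rnorm(n)                           # X ~ N(0, 1)
hx <- h(x)                              # h() is vectorized over x
est <- mean(hx)                         # Monte Carlo estimate of E[h(X)]
se <- sd(hx) / sqrt(n)                  # its Monte Carlo standard error
c(estimate = est, se = se)
```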




Compare this to an estimate using numerical integration,
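One way to do this in base R is `integrate()`, which performs adaptive quadrature and accepts infinite limits; this is a sketch, not necessarily the code originally used here:

```r
h <- function(x) (sin(x) - cos(x))^2
# Integrate h(x) * phi(x) over the whole real line
res <- integrate(function(x) h(x) * dnorm(x), lower = -Inf, upper = Inf)
res$value      # numerical estimate of the integral
res$abs.error  # integrate()'s own estimate of its absolute error
```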



These values are fairly close to the analytic solution, which follows from the identity $[\sin(x) - \cos(x)]^2 = 1 - \sin(2x)$ and symmetry about zero: $\sin(\cdot)$ is an odd function and $\phi(\cdot)$ is an even one, so $\sin(2x)\phi(x)$ is odd and integrates to zero. Suppose $X \sim N(0,1)$; then

\begin{equation}
\begin{split}
E[(\sin(X) - \cos(X))^2] & = E[1-\sin(2X)] \\
&= 1- E[\sin(2X)] \\
&= 1-0 \\
&= 1
\end{split}
\end{equation}
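The identity used in the first step can also be checked numerically on a grid of points (a quick sanity check, not a proof):

```r
x <- seq(-10, 10, length.out = 1001)
lhs <- (sin(x) - cos(x))^2
rhs <- 1 - sin(2 * x)
max(abs(lhs - rhs))   # differs from 0 only by floating-point rounding
```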

## Simulation Study for Nominal Confidence Intervals
We will investigate the coverage probability of nominal 95% confidence intervals when the data do not come from a Normal (Gaussian) distribution.

We will assume the data come from an exponential distribution with mean one. The strategy here is to generate many (`mcrep`) data sets of size $n$.

For each data vector, we then calculate a nominal 95% confidence interval for the mean and check whether this interval contains the true value of one.
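A minimal vectorized sketch of this simulation. The sample size `n = 30`, `mcrep = 10000`, the seed, and the use of a $t$-interval $\bar{x} \pm t_{0.975,\, n-1}\, s/\sqrt{n}$ are assumptions; the text does not specify them. Storing all data sets as rows of one matrix lets `rowMeans()` and `rowSums()` replace an explicit loop over replicates:

```r
set.seed(506)
mcrep <- 1e4  # number of simulated data sets (assumed)
n <- 30       # size of each data set (assumed)
# Each row is one data set from Exp(1), whose true mean is 1
x <- matrix(rexp(mcrep * n, rate = 1), nrow = mcrep, ncol = n)
xbar <- rowMeans(x)
# Row standard deviations; xbar is recycled down the columns, so
# element [i, j] of (x - xbar) is x[i, j] - xbar[i]
s <- sqrt(rowSums((x - xbar)^2) / (n - 1))
# Nominal 95% t-interval for the mean of each data set
half_width <- qt(0.975, df = n - 1) * s / sqrt(n)
covered <- (xbar - half_width <= 1) & (xbar + half_width >= 1)
mean(covered)  # estimated coverage probability
```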

Since coverage is binary with a fixed probability $p$, the number of intervals that cover one (“successes”) in our study is Binomial(`mcrep`, $p$). We can use this fact to estimate the Monte Carlo error, which represents the uncertainty in our estimate due to using a finite number of replications.
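Given the number of covering intervals, a normal approximation to the binomial gives the Monte Carlo standard error $\sqrt{\hat{p}(1-\hat{p})/\text{mcrep}}$. In the sketch below, `mc_coverage_ci()` is a hypothetical helper and the counts passed to it are illustrative, not results from the study:

```r
# Hypothetical helper: point estimate and normal-approximation interval
# for a coverage probability estimated from `successes` out of `mcrep`
mc_coverage_ci <- function(successes, mcrep, level = 0.95) {
  p_hat <- successes / mcrep
  se <- sqrt(p_hat * (1 - p_hat) / mcrep)  # binomial standard error
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = p_hat, lower = p_hat - z * se, upper = p_hat + z * se)
}
mc_coverage_ci(successes = 9300, mcrep = 10000)  # illustrative counts
```

As an alternative, `binom.test(9300, 10000)$conf.int` gives an exact (Clopper-Pearson) interval for the same counts.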

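In this case the estimated coverage is 0.93, with an approximate confidence interval of (0.926, 0.936) once Monte Carlo error is accounted for, so the actual coverage falls below the nominal 95%.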

