# Confidence Sets

In this notebook we describe an estimation method for confidence intervals for the lower and upper bounds in CHH(2020), following the estimation framework of CCT(2018).

## 1. Moments

To simplify notation we let $\lambda(z) = (\lambda_1(z), ... , \lambda_d(z), \nu(z))^\prime$ and $\lambda_{d+1}(z) = \nu(z)$. Denote $\alpha=(\mu, \lambda, v)$ and $\alpha^*=(\mu^*,\lambda^*, v^*)$. Let 

$$
\rho(X_1, \alpha, \theta) = \begin{bmatrix} f(X_1, \theta) N_1(\lambda_0, v_1) \\ N_1(\lambda_0, v_1)-1 \\ F(X_1, \lambda_0, v_1, \theta) -\mu\end{bmatrix}
$$

where 

$$
F(X_1, \lambda_0, v_1, \theta) = -\xi\exp\left(\lambda_0\cdot f(X_1,\theta) + \nu_0 - \frac{1}{\xi}\left(g(X_1,\theta)+v(Z_1)\right) - 1\right) - v(Z_0) + \xi\nu_0
$$

For the relative entropy divergence the problem simplifies slightly, because $\nu^*$ can be solved out in terms of the other parameters. Let $\lambda=(\lambda_1, \ldots, \lambda_d)^\prime$, $\alpha=(\mu, \lambda, v)$ and $\alpha^*=(\mu^*, \lambda^*, v^*)$, and let

$$
\rho(X_1, \alpha, \theta) = \begin{bmatrix}f(X_1, \theta)\exp\left[-\frac{g(X_1,\theta)+v(Z_1)}{\xi} + \lambda(Z_0)\cdot f(X_1,\theta)\right] \\ \exp\left[-\frac{g(X_1,\theta)+v(Z_1)-v(Z_0)}{\xi} + \lambda(Z_0)\cdot f(X_1,\theta)\right] - \exp\left[-\frac{\mu}{\xi}\right]\end{bmatrix}
$$

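or, equivalently (the first blocks of the two forms differ only by the positive, $Z_0$-measurable factor $\exp[v(Z_0)/\xi]$, which leaves a conditional moment restriction given $Z_0$ unchanged),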

$$
\rho(X_1, \alpha, \theta) = \begin{bmatrix}f(X_1, \theta)\exp\left[-\frac{g(X_1,\theta)+v(Z_1)-v(Z_0)}{\xi} + \lambda(Z_0)\cdot f(X_1,\theta)\right] \\ \exp\left[-\frac{g(X_1,\theta)+v(Z_1)-v(Z_0)}{\xi} + \lambda(Z_0)\cdot f(X_1,\theta)\right] - \exp\left[-\frac{\mu}{\xi}\right]\end{bmatrix}
$$
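As an illustration, the relative-entropy moment vector can be evaluated on a sample with NumPy. This is a minimal sketch, not the estimation code used for CHH(2020): the function name `rho_relative_entropy` and the array layout (one row per observation, with $f(X_t,\theta)$, $g(X_t,\theta)$, $v(Z_t)$, $v(Z_{t-1})$ and $\lambda(Z_{t-1})$ already computed) are hypothetical. The flag `centered` switches between the two forms above.

```python
import numpy as np


def rho_relative_entropy(f, g, v1, v0, lam0, mu, xi, centered=True):
    """Relative-entropy moment vector rho(X_1, alpha, theta), one row per t.

    Hypothetical layout:
      f    : (T, d) values of f(X_t, theta)
      g    : (T,)   values of g(X_t, theta)
      v1   : (T,)   values of v(Z_t)      (next-period state)
      v0   : (T,)   values of v(Z_{t-1})  (current state)
      lam0 : (T, d) values of lambda(Z_{t-1})
    centered=True  -> second form (v(Z_0) appears in the first block)
    centered=False -> first form
    Returns a (T, d + 1) array: the d-dimensional block, then the scalar row.
    """
    tilt = np.sum(lam0 * f, axis=1)               # lambda(Z_0) . f(X_1, theta)
    expo_full = -(g + v1 - v0) / xi + tilt        # exponent including v(Z_0)
    expo_first = expo_full if centered else -(g + v1) / xi + tilt
    block = f * np.exp(expo_first)[:, None]       # first block, shape (T, d)
    last = np.exp(expo_full) - np.exp(-mu / xi)   # scalar row, shape (T,)
    return np.column_stack([block, last])
```

Because the two forms differ only through $\exp[v(Z_0)/\xi]$ in the first block, the `centered=True` block equals the `centered=False` block multiplied row by row by `np.exp(v0 / xi)`.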

## 2. Profile Sieve Estimation With Fixed $\theta$

### 2.1 Optimally-weighted or continuously-updated sieve GMM estimation with fixed $\theta$

Following Hansen, Heaton and Yaron (1996), we define the continuously updated sieve GMM estimator as the solution to $\max_{\alpha\in \mathcal{A}_T} L_T(\alpha)$, with criterion function

$$
L_T(\alpha) = -0.5\left[\frac{1}{T}\sum_{t=1}^T P^{K_T}(Z_{t-1})\otimes \rho(X_t, \alpha, \theta)\right]^\prime \left[\hat{W}(\alpha,\theta)\right]^- \left[\frac{1}{T}\sum_{t=1}^T P^{K_T}(Z_{t-1})\otimes \rho (X_t, \alpha, \theta)\right]
$$

where $P^{K_T}(Z_{t-1})$ is a $K_T$-dimensional vector of known basis functions (such as splines) and, for each $\alpha \in \mathcal{A}_T$, $\hat{W}(\alpha,\theta)$ is a consistent estimator of $W(\alpha, \theta)$:

$$
W(\alpha, \theta) = \lim_T \text{Var} \left(\frac{1}{\sqrt{T}}\sum_{t=1}^T P^{K_T}(Z_{t-1})\otimes \rho(X_t, \alpha, \theta)\right)
$$

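In particular, if the terms $P^{K_T}(Z_{t-1})\otimes\rho(X_t,\alpha^*,\theta)$ are serially uncorrelated (for example, a martingale difference sequence), then
$$
W(\alpha^*, \theta) = E\left[\left(P^{K_T}(Z_{t-1})P^{K_T}(Z_{t-1})^\prime\right)\otimes \Omega(Z_{t-1}, \alpha^*, \theta)\right],
$$
where $\Omega(Z_{t-1},\alpha^*,\theta) = E\left[\rho(X_t,\alpha^*,\theta)\rho(X_t,\alpha^*,\theta)^\prime\mid Z_{t-1}\right]$.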
$$
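For fixed $\theta$, the criterion $L_T(\alpha)$ can be sketched as follows, given the basis matrix and the moment values at a candidate $\alpha$. Assumptions: the polynomial basis `poly_basis` stands in for the splines mentioned above, and $\hat W(\alpha,\theta)$ is taken to be the uncentered sample second moment of $P^{K_T}(Z_{t-1})\otimes\rho(X_t,\alpha,\theta)$, which ignores serial correlation. Both function names are hypothetical.

```python
import numpy as np


def poly_basis(z, K):
    """Hypothetical sieve basis: powers 1, z, ..., z^(K-1) of a scalar state."""
    return np.vander(np.asarray(z, dtype=float), N=K, increasing=True)  # (T, K)


def cue_criterion(P, rho):
    """Continuously updated sieve GMM criterion L_T(alpha) for fixed theta.

    P   : (T, K) rows P^{K_T}(Z_{t-1})
    rho : (T, m) rows rho(X_t, alpha, theta)
    """
    T = P.shape[0]
    # Row-wise Kronecker product P_t (x) rho_t, same ordering as np.kron.
    g = (P[:, :, None] * rho[:, None, :]).reshape(T, -1)   # (T, K*m)
    gbar = g.mean(axis=0)
    # Assumed weight estimator: (1/T) sum_t (P_t P_t') (x) (rho_t rho_t').
    W_hat = g.T @ g / T
    # Moore-Penrose pseudo-inverse plays the role of [W_hat]^-.
    return -0.5 * gbar @ np.linalg.pinv(W_hat) @ gbar
```

Because $\hat W$ is recomputed from the same $\rho$ at every call, maximizing `cue_criterion` over $\alpha$ (for instance by minimizing its negative with a generic numerical optimizer) yields the continuously updated estimator rather than a two-step one.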


## 3. Monte Carlo Confidence Sets

### 3.1 Confidence sets for the identified set $\Theta_I$

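Here we seek a $100\alpha\%$ confidence set (CS) $\hat{\Theta}_\alpha$ for $\Theta_I$, constructed from the criterion $L_n(\theta)$, that has asymptotically exact coverage (from here on, $\alpha\in(0,1)$ denotes the nominal coverage level, not the parameter vector of Section 2), i.e.: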

$$
\lim_{n\to \infty} \mathbb{P}(\Theta_I \subseteq \hat{\Theta}_\alpha) = \alpha
$$

1. Draw a sample $\{\theta^1,\ldots,\theta^B\}$ from the quasi-posterior distribution $\Pi_n(\theta\mid\mathbf{X}_n) := \frac{\exp(n L_n(\theta))\,\Pi(\theta)}{\int_{\Theta}\exp(n L_n(\vartheta))\,\Pi(\vartheta)\,d\vartheta}$, where $\Pi$ is a prior density on $\Theta$.

2. Calculate the $(1-\alpha)$ quantile of $\{L_n(\theta^1),...,L_n(\theta^B)\}$; call it $\zeta_{n,\alpha}^{mc}$.

3. Our $100\alpha\%$ confidence set for $\Theta_I$ is then:

$$
\hat{\Theta}_\alpha = \{\theta \in \Theta: L_n(\theta)\geq \zeta_{n,\alpha}^{mc}\}
$$
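Given criterion values at the quasi-posterior draws, steps 2 and 3 reduce to a quantile and a threshold. A minimal sketch, assuming the values $L_n(\theta^b)$ have already been computed; `mc_confidence_set` is a hypothetical helper.

```python
import numpy as np


def mc_confidence_set(L_draws, alpha):
    """Critical value and membership test for the 100*alpha% CS for Theta_I.

    L_draws : (B,) criterion values L_n(theta^b) at the quasi-posterior draws
    alpha   : nominal coverage level, e.g. 0.95
    """
    zeta = np.quantile(np.asarray(L_draws), 1.0 - alpha)  # (1 - alpha) quantile

    def contains(L_value):
        # theta is in the CS iff L_n(theta) >= zeta
        return L_value >= zeta

    return zeta, contains
```

For example, `zeta, contains = mc_confidence_set(L_draws, 0.95)` returns $\zeta_{n,\alpha}^{mc}$, and `contains` applied to $L_n(\theta)$ tests whether a point $\theta$ belongs to $\hat\Theta_\alpha$.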

### 3.2 Confidence sets for the identified set $M_I$ of subvectors

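Here we seek a $100\alpha \%$ CS $\hat{M}_\alpha$ for the identified set $M_I=\{\mu: (\mu, \eta) \in \Theta_I \text{ for some } \eta\}$ of the subvector $\mu$ of $\theta=(\mu,\eta)$, using $L_n(\theta)$, that has asymptotically exact coverage, i.e.: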

$$
\lim_{n\to \infty} \mathbb{P}(M_I \subseteq \hat{M}_\alpha) = \alpha
$$

#### 3.2.1 Projection
A well-known method to construct a CS for $M_I$ is based on projection, which maps a CS $\hat{\Theta}_\alpha$ for $\Theta_I$ into one for $M_I$. The projection CS:

$$
\hat{M}_\alpha^{proj} = \{\mu: (\mu, \eta) \in \hat{\Theta}_\alpha \text{ for some } \eta \}
$$

is a valid $100\alpha\%$ CS for $M_I$ whenever $\hat{\Theta}_\alpha$ is a valid $100\alpha\%$ CS for $\Theta_I$. As is well documented, $\hat{M}_\alpha^{proj}$ is typically conservative, and especially so when the dimension of $\mu$ is small relative to the dimension of $\theta$.
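When $\mu$ is scalar, the projection CS can be approximated from any finite collection of points in $\Theta$ (a grid or the quasi-posterior draws) by keeping the points in $\hat\Theta_\alpha$ and reading off their $\mu$ coordinates. The sketch reports the smallest and largest retained value of $\mu$, which approximates $\hat M_\alpha^{proj}$ only when the projection is an interval; that, the column layout of `theta_points`, and the helper name `projection_interval` are assumptions.

```python
import numpy as np


def projection_interval(theta_points, L_values, zeta, mu_index=0):
    """Range of scalar mu over points theta = (mu, eta) with L_n(theta) >= zeta.

    theta_points : (N, p) candidate points (grid or draws)
    L_values     : (N,) criterion values L_n at those points
    zeta         : critical value defining Theta_hat_alpha
    mu_index     : column holding mu (assumed layout)
    """
    keep = np.asarray(L_values) >= zeta
    if not keep.any():
        raise ValueError("no point satisfies L_n(theta) >= zeta")
    mu_kept = np.asarray(theta_points)[keep, mu_index]
    return mu_kept.min(), mu_kept.max()
```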

#### 3.2.2 Profile criterion
CCT(2018) propose CSs for $M_I$ based on a profile criterion. Let $M = \{\mu: (\mu, \eta) \in \Theta \text{ for some } \eta \}$ and $H_\mu = \{\eta: (\mu, \eta) \in \Theta\}$. The profile criterion for a point $\mu \in M$ is $\sup_{\eta \in H_\mu}L_n(\mu, \eta)$, and the profile criterion for $M_I$ is

$$
PL_n(M_I) = \inf_{\mu \in M_I} \sup_{\eta\in H_\mu}L_n(\mu, \eta)
$$

Let $\Delta(\theta^b)$ be an equivalence set for $\theta^b$. In moment-based models we define $\Delta(\theta^b)=\{\theta\in\Theta:E[\rho(X_i, \theta)]=E[\rho(X_i, \theta^b)]\}$. Let $M(\theta^b)=\{\mu:(\mu, \eta)\in \Delta(\theta^b) \text{ for some }\eta \}$; the profile criterion for $M(\theta^b)$ is

$$
PL_n(M(\theta^b)) = \inf_{\mu\in M(\theta^b)} \sup_{\eta\in H_\mu} L_n(\mu, \eta)
$$
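On a rectangular grid of $(\mu,\eta)$ values, both profile criteria above become a maximum over the $\eta$ axis followed by a minimum over the relevant $\mu$ values. A minimal sketch, assuming $L_n$ has been tabulated on the grid and the set ($M_I$ or $M(\theta^b)$) is supplied as a boolean mask over the $\mu$ grid. Computing $M(\theta^b)$ from the equivalence set $\Delta(\theta^b)$ is model-specific and is not shown; the function names are hypothetical.

```python
import numpy as np


def profile_on_grid(L_grid):
    """sup over eta of L_n(mu, eta) for each mu on the grid.

    L_grid[i, j] = L_n(mu_i, eta_j); set entries with (mu_i, eta_j) outside
    Theta to -inf so that the sup effectively runs over H_mu only.
    """
    return np.max(L_grid, axis=1)


def profile_criterion_of_set(L_grid, mu_mask):
    """PL_n(M) = inf over mu in M of sup over eta in H_mu of L_n(mu, eta).

    mu_mask : (n_mu,) boolean array marking the grid values of mu in M.
    """
    prof = profile_on_grid(L_grid)
    return np.min(prof[np.asarray(mu_mask, dtype=bool)])
```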

1. Draw a sample $\{\theta^1, ..., \theta^B\}$ from the quasi-posterior distribution $\Pi_n$.

2. Calculate the $(1-\alpha)$ quantile of $\{PL_n(M(\theta^b)): b=1,...,B\}$; call it $\zeta_{n,\alpha}^{mc,p}$.

3. Our $100\alpha\%$ confidence set for $M_I$ is then:
$$
\hat{M}_\alpha = \{\mu\in M: \sup_{\eta\in H_\mu}L_n(\mu, \eta)\geq \zeta_{n,\alpha}^{mc,p}\}
$$
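Steps 2 and 3 can then be sketched as below, assuming the values $PL_n(M(\theta^b))$ at the draws and the profile $\sup_{\eta\in H_\mu}L_n(\mu,\eta)$ on a grid of $\mu$ values are already available; `profile_confidence_set` is a hypothetical helper.

```python
import numpy as np


def profile_confidence_set(pl_draws, mu_grid, profile_values, alpha):
    """Profile-based 100*alpha% CS for M_I, evaluated on a grid of mu.

    pl_draws       : (B,) values PL_n(M(theta^b)) at the quasi-posterior draws
    mu_grid        : (n_mu,) candidate values of mu in M
    profile_values : (n_mu,) sup_{eta in H_mu} L_n(mu, eta) on that grid
    alpha          : nominal coverage level, e.g. 0.95
    """
    zeta_p = np.quantile(np.asarray(pl_draws), 1.0 - alpha)  # (1 - alpha) quantile
    keep = np.asarray(profile_values) >= zeta_p
    return zeta_p, np.asarray(mu_grid)[keep]
```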

### 3.3 Adaptive Sequential Monte Carlo Algorithm

Let $J$ and $K$ be positive integers and let $\phi_1,...,\phi_J$ be an increasing sequence with $\phi_1=0$ and $\phi_J=1$. Set $w_1^b=1$ for $b=1,...,B$ and draw $\theta_1^1, ..., \theta_1^B$ from the prior $\Pi(\theta)$. 

For $j=2,...,J$, do:

1. Correction: Let $v_j^b = \exp((\phi_j-\phi_{j-1})nL_n(\theta^b_{j-1}))$ and $w_j^b = (v_j^bw_{j-1}^b)/(\frac{1}{B}\sum_{b=1}^Bv_j^bw_{j-1}^b)$.

2. Selection: Compute the effective sample size $ESS_j = B/(\frac{1}{B}\sum_{b=1}^B(w_j^b)^2)$. Then:
    - If $ESS_j>\frac{B}{2}$: set $\ell_j^b = \theta_{j-1}^b$ for $b=1,...,B$; or
    - If $ESS_j\leq \frac{B}{2}$: draw an i.i.d. sample $\ell_j^1,...,\ell_j^B$ from the multinomial distribution with support $\theta_{j-1}^1,...,\theta_{j-1}^B$ and weights $w_j^1,...,w_j^B$, then set $w_j^b=1$ for $b=1,...,B$.


3. Mutation: Run $B$ separate and independent MCMC chains of length $K$ using the random-walk Metropolis-Hastings algorithm initialized at each $\ell_j^b$ for the tempered quasi-posterior $\Pi_j(\theta|\mathbf{X}_n)\propto \exp(\phi_j n L_n(\theta))\Pi(\theta)$ and let $\theta_j^b$ be the final draw of the $b$th chain.

The resulting sample is $\theta^b = \theta^b_J$ for $b=1,...,B$. Multinomial resampling (step 2) and the $B$ independent MCMC chains (step 3) can both be computed in parallel, so the additional computational time relative to conventional MCMC methods is modest.
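A compact NumPy sketch of the algorithm follows. It is an illustration under stated assumptions, not the implementation used in CCT(2018): the tempering schedule $\phi_j$ is linear, the mutation step uses a Gaussian random walk with covariance `step**2` times the identity, and the $B$ chains are advanced in lockstep with vectorized array operations rather than in separate parallel processes. All argument names are hypothetical.

```python
import numpy as np


def smc_sampler(log_crit, prior_sample, log_prior, n, B=200, J=30, K=10,
                step=0.5, seed=0):
    """SMC draws from Pi_n(theta) ~ exp(n L_n(theta)) Pi(theta).

    log_crit     : callable, theta (1-D array) -> L_n(theta)
    prior_sample : callable, (rng, B) -> (B, p) array of prior draws
    log_prior    : callable, theta -> log Pi(theta) (-inf outside the support)
    Assumed (not from the source): linear phi_j, Gaussian random-walk proposal.
    """
    rng = np.random.default_rng(seed)
    phi = np.linspace(0.0, 1.0, J)            # phi_1 = 0 < ... < phi_J = 1
    theta = prior_sample(rng, B)              # theta_1^b drawn from the prior
    p = theta.shape[1]
    w = np.ones(B)                            # w_1^b = 1
    L = np.array([log_crit(t) for t in theta])
    for j in range(1, J):
        # 1. Correction: v_j^b = exp((phi_j - phi_{j-1}) n L_n(theta_{j-1}^b)).
        logv = (phi[j] - phi[j - 1]) * n * L
        v = np.exp(logv - logv.max())         # common rescaling cancels below
        w = v * w / np.mean(v * w)
        # 2. Selection: multinomial resampling when ESS_j <= B / 2.
        ess = B / np.mean(w ** 2)
        if ess <= B / 2:
            idx = rng.choice(B, size=B, p=w / w.sum())
            theta, L = theta[idx], L[idx]
            w = np.ones(B)
        # 3. Mutation: K random-walk MH steps per chain, target at phi_j.
        for _ in range(K):
            prop = theta + step * rng.standard_normal((B, p))
            L_prop = np.array([log_crit(t) for t in prop])
            lp_prop = np.array([log_prior(t) for t in prop])
            lp_cur = np.array([log_prior(t) for t in theta])
            log_ratio = phi[j] * n * (L_prop - L) + lp_prop - lp_cur
            accept = np.log(rng.uniform(size=B)) < log_ratio
            theta[accept], L[accept] = prop[accept], L_prop[accept]
    return theta, w
```

The function also returns the final weights $w_J^b$, which are all equal to one whenever the last selection step resampled. Subtracting the largest log weight before exponentiating rescales every $v_j^b$ by the same constant, which cancels in the normalization of $w_j^b$ and avoids overflow when $nL_n(\theta)$ is large in magnitude.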

#### 3.3.1 Metropolis-Hastings

Suppose $q$ is the proposal distribution and $\pi$ is the target distribution, which only needs to be known up to a normalizing constant.

Initialize $x^{(0)}$. For iterations $i = 1, 2, \ldots$, do:
1. Propose: $x^{cand} \sim q\left(\cdot\mid x^{(i-1)}\right)$
2. Acceptance Probability:
    $$
    \alpha(x^{cand}\mid x^{(i-1)}) = \min \left\{1, \frac{q\left(x^{(i-1)}\mid x^{cand}\right)\pi(x^{cand})}{q\left(x^{cand}\mid x^{(i-1)}\right)\pi(x^{(i-1)})}\right\}
    $$
3. Draw $u$ from $\text{Uniform }(0,1)$;
4. If $u<\alpha$, then accept the proposal: $x^{(i)} \leftarrow x^{cand}$; Else, reject the proposal: $x^{(i)} \leftarrow x^{(i-1)}$

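Note 1: In the application we use a random-walk proposal, i.e. $x^{cand}=x^{(i-1)}+\epsilon$ with $\epsilon\sim N(0,\Sigma)$, a multivariate normal with mean $0$ and covariance matrix $\Sigma$. Since this proposal is symmetric, $q(x^{(i-1)}\mid x^{cand}) = q(x^{cand}\mid x^{(i-1)})$ and the acceptance probability reduces to $\min\{1, \pi(x^{cand})/\pi(x^{(i-1)})\}$.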

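Note 2: The initial value $x^{(0)}$ can be drawn from an arbitrary distribution, for example the prior; in the mutation step of the sequential Monte Carlo algorithm, each chain is instead initialized at $\ell_j^b$.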
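The random-walk version can be sketched as follows. `rw_metropolis` and its arguments are hypothetical names; the target is passed as a log density to avoid numerical underflow, and the starting point is assumed to have a finite log density.

```python
import numpy as np


def rw_metropolis(log_target, x0, Sigma, n_iter, seed=0):
    """Random-walk Metropolis-Hastings with proposal x_prev + N(0, Sigma).

    The Gaussian proposal is symmetric, so the q terms cancel and only
    pi(x_cand) / pi(x_prev) enters the acceptance probability.
    Returns the (n_iter + 1, p) chain and the acceptance rate.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    chol = np.linalg.cholesky(np.atleast_2d(Sigma))  # Sigma = chol @ chol.T
    chain = np.empty((n_iter + 1, x.size))
    chain[0] = x
    log_p = log_target(x)
    n_accept = 0
    for i in range(1, n_iter + 1):
        cand = x + chol @ rng.standard_normal(x.size)  # 1. propose
        log_p_cand = log_target(cand)
        log_alpha = min(0.0, log_p_cand - log_p)       # 2. log acceptance prob.
        if np.log(rng.uniform()) < log_alpha:          # 3-4. accept or reject
            x, log_p = cand, log_p_cand
            n_accept += 1
        chain[i] = x
    return chain, n_accept / n_iter
```

Inside the mutation step of the sequential Monte Carlo algorithm, the log target would be $\phi_j nL_n(x)+\log\Pi(x)$, started at $\ell_j^b$ and run for $K$ iterations.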