# Preliminaries

## Packing, Covering, and Bracketing

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/junlong-feng/econ5280/main?filepath=ULLN_and_FCLT.ipynb)

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/junlong-feng/econ5280/main?urlpath=rstudio)

- Let $Q$ be a probability distribution. The $L_{r}$ norm w.r.t. $Q$ is defined as

$$
\|h\|_{Q,r}\equiv \left(\mathbb{E}_{Q}[\|h(X)\|^{r}]\right)^{1/r}.\label{eq.lqr}
$$

- Function $G(x)$ is an **envelope** of $g(x,\theta)$ if $\|g(x,\theta)\|\leq G(x)<\infty$ for all $x,\theta$.

**Packing number $D_{r}(\varepsilon,Q)$**: largest number of points $\theta_{j}$ such that $\theta_{j}\in\Theta$ and $d_{Q,r}(\theta_{j},\theta_{j'})>\varepsilon$, $\forall j\neq j'$, where $d_{Q,r}(\theta,\theta')\equiv\|g(\cdot,\theta)-g(\cdot,\theta')\|_{Q,r}$.

- Uniform packing number: $D_{r}(\varepsilon)=\sup_{Q}D_{r}\left(\varepsilon\|G\|_{Q,r},Q\right)$.
- Normalized by the magnitude of $G$.

**Covering number $N_{r}(\varepsilon,Q)$**: minimal number of $L_{r}(Q)$-balls of radius $\varepsilon$ needed to cover $\Theta$.

- Entropy: the logarithm of $N_{r}(\varepsilon,Q)$.
- Uniform covering number: $N_{r}(\varepsilon)=\sup_{Q}N_{r}\left(\varepsilon\|G\|_{Q,r},Q\right)$.
- Normalized by the magnitude of $G$.
- Uniform covering integral (aka uniform entropy integral): $J_{r}(\delta)=\int_{0}^{\delta}\sqrt{\log N_{r}(\varepsilon)}d\varepsilon$.
- $J_{r}(\delta)<\infty$ if $N_{r}(\varepsilon)=O(\varepsilon^{-\rho})$ as $\varepsilon\to 0$ for some $0<\rho<\infty$.
- This rate is called **Euclidean** or **polynomial**.
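
A short calculation shows why a polynomial rate is enough. This is only a sketch: the constant $C\geq 1$ and the restriction $\delta\leq 1$ are assumptions made for illustration. If $N_{r}(\varepsilon)\leq C\varepsilon^{-\rho}$ for $\varepsilon\in(0,1]$, then using $\sqrt{a+b}\leq\sqrt{a}+\sqrt{b}$,

$$
J_{r}(\delta)\leq\int_{0}^{\delta}\sqrt{\log C+\rho\log(1/\varepsilon)}\,d\varepsilon\leq\delta\sqrt{\log C}+\sqrt{\rho}\int_{0}^{1}\sqrt{\log(1/\varepsilon)}\,d\varepsilon .
$$

The substitution $u=\log(1/\varepsilon)$ gives

$$
\int_{0}^{1}\sqrt{\log(1/\varepsilon)}\,d\varepsilon=\int_{0}^{\infty}u^{1/2}e^{-u}\,du=\Gamma(3/2)=\frac{\sqrt{\pi}}{2},
$$

so $J_{r}(\delta)\leq\delta\sqrt{\log C}+\sqrt{\rho\pi}/2<\infty$.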

**Bracketing**. Given functions $l\leq u$, the bracket $[l,u]$ is the set of functions lying between them. If $l(x)\leq g(x,\theta)\leq u(x)$ for all $x$, then $[l,u]$ is a bracket of $g(\cdot,\theta)$.

- It is an $\varepsilon$-$L_{r}(Q)$-bracket if $\|l(x)-u(x)\|_{Q,r}\leq \varepsilon$. 
- The bracketing functions $l$ and $u$ need not be in the function class of $g(x,\theta)$.
- A set of brackets **covers** $\Theta$ if for all $\theta\in\Theta$, $g(x,\theta)$ is covered by some bracket in the set.
- ==**IMPORTANT**==. Some authors (van der Vaart for instance) say that the set of brackets covers $\mathcal{G}$, the family of functions on $\mathcal{X}$ of which $g(\cdot,\theta)$ is a member. This is arguably more rigorous. Here $\mathcal{G}$ is indexed by $\theta$, which is presumably why Hansen speaks of covering $\Theta$. The same remark applies to covering numbers.

**Bracketing number $N_{[\ ]}(\varepsilon,L_{r}(Q))$**. The minimum number of $\varepsilon$-$L_{r}(Q)$-brackets needed to cover $\Theta$. 

- Entropy with bracketing: $\log N_{[\ ]}(\varepsilon,L_{r}(Q))$.
- Bracketing integral: $J_{[\ ]}(\delta,L_{r}(Q))=\int_{0}^{\delta}\sqrt{\log\left(N_{[\ ]}(\varepsilon,L_{r}(Q))\right)}d\varepsilon$.
- $J_{[\ ]}(\delta,L_{r}(Q))<\infty$ and $\lim_{\delta\to 0}J_{[\ ]}(\delta,L_{r}(Q))=0$ if $N_{[\ ]}(\varepsilon,L_{r}(Q))=O(\varepsilon^{-\rho})$ as $\varepsilon\to 0$ for some $0<\rho<\infty$.
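
As a rough illustration of a polynomial bracketing rate, consider the class $\{1(x<\theta):\theta\in[0,1]\}$ with $X\sim U[0,1]$ (the uniform distribution is an assumption made here, not part of the notes). For grid points $t_{j-1}<t_{j}$, the bracket $[1(x<t_{j-1}),1(x<t_{j})]$ covers every $\theta\in[t_{j-1},t_{j}]$ and has $L_{r}(Q)$ size $(t_{j}-t_{j-1})^{1/r}$, so a grid with spacing at most $\varepsilon^{r}$ suffices. The helper names below are hypothetical, and the count is an upper bound on the bracketing number rather than the exact minimum.

```python
import math

def indicator_brackets(eps, r=2):
    """Grid brackets for {1(x < theta): theta in [0, 1]} under Uniform[0, 1].

    Returns (lower_cut, upper_cut) pairs; each pair stands for the bracket
    [1(x < lower_cut), 1(x < upper_cut)].
    """
    n_brackets = math.ceil(1.0 / eps ** r)  # spacing 1/n_brackets <= eps**r
    cuts = [j / n_brackets for j in range(n_brackets + 1)]
    return list(zip(cuts[:-1], cuts[1:]))

def bracket_size(lower_cut, upper_cut, r=2):
    # ||u - l||_{Q,r} = P(lower_cut <= X < upper_cut) ** (1/r) for uniform X
    return (upper_cut - lower_cut) ** (1.0 / r)

eps = 0.5
brackets = indicator_brackets(eps, r=2)
sizes = [bracket_size(lo, hi, r=2) for lo, hi in brackets]
print(len(brackets), max(sizes))
```

The number of brackets grows like $\varepsilon^{-r}$, which is a polynomial rate with $\rho=r$ in this example.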

**Relationship**. 	

- $N_{r}(\varepsilon)\leq D_{r}(\varepsilon)\leq N_{r}(\varepsilon/2)$.
- $N_{r}(\varepsilon,Q)\leq N_{[\ ]}(2\varepsilon,L_{r}(Q))$.
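
The first inequality rests on the fact that a maximal $\varepsilon$-separated set is automatically an $\varepsilon$-cover. A minimal sketch, assuming the indicator class $1(x<\theta)$ with $X\sim U[0,1]$ and $r=1$, so that $d_{Q,1}(\theta,\theta')=|\theta-\theta'|$; the finite index set `thetas` and the function name are hypothetical choices for illustration.

```python
import numpy as np

def greedy_packing(points, eps):
    """Greedily select points that are pairwise more than eps apart.

    The result is a maximal eps-separated subset of `points`:
    no remaining point can be added without violating the separation.
    """
    centers = []
    for p in points:
        if all(abs(p - c) > eps for c in centers):
            centers.append(p)
    return np.array(centers)

rng = np.random.default_rng(0)
thetas = rng.uniform(0.0, 1.0, size=500)  # hypothetical finite index set
eps = 0.05
centers = greedy_packing(thetas, eps)

# Distance from each theta to its nearest selected center
nearest = np.min(np.abs(thetas[:, None] - centers[None, :]), axis=1)
covered = bool(np.all(nearest <= eps))
print(len(centers), covered)
```

Every point that was not selected lies within $\varepsilon$ of a selected one, so the selected points are centers of an $\varepsilon$-cover, and the covering number cannot exceed the packing number.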

**Remark**. All these numbers depend on the function class $\mathcal{F}$ (equivalently, on $\Theta$). The dependence is omitted for simplicity and is made explicit in [Changing Classes](#Changing Classes).

## Stochastic Process

**Gaussian process**. Process $\{X_{t}\}$ is Gaussian if and only if for every finite set of indices $(t_{1},\ldots,t_{k})$, $(X_{t_{1}},\ldots,X_{t_{k}})$ is jointly normal.

**$P$-Brownian bridge**. A Gaussian process $\mathbb{G}_{P}$ indexed by $f\in\mathcal{F}$ with zero mean and covariance function $\operatorname{Cov}(\mathbb{G}_{P}f,\mathbb{G}_{P}g)=Pfg-PfPg$, where $Pf\equiv\mathbb{E}_{P}[f(X)]$.
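
A worked special case may help (the uniform distribution is an assumption chosen for illustration). Take $X\sim U[0,1]$, $f(x)=1(x<s)$ and $g(x)=1(x<t)$ with $s,t\in[0,1]$. Then $Pfg=\min(s,t)$, $Pf=s$ and $Pg=t$, so

$$
Pfg-PfPg=\min(s,t)-st,
$$

which is the covariance function of the classical Brownian bridge on $[0,1]$.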

**Tightness**.

- A random vector $X$ is tight if for every $\varepsilon>0$, there exists $M>0$ such that $P(\|X\|>M)<\varepsilon$. **Any random vector is tight**.
- A set of random vectors $\{X_{\alpha}:\alpha\in A\}$ is uniformly tight if for every $\varepsilon>0$, there exists $M>0$ such that $\sup_{\alpha}P(\|X_{\alpha}\|>M)<\varepsilon$.
- A sequence $X_{n}$ is asymptotically tight if for every $\varepsilon>0$, there exists a compact set $K$ such that $\limsup_{n\to\infty}P^{*}(X_{n}\notin K^{\delta})<\varepsilon$ for every $\delta>0$, where $K^{\delta}\equiv\{y:d(y,K)<\delta\}$ is the $\delta$-enlargement of $K$ and $P^{*}$ is the outer probability (p.258 van der Vaart 1998).
  - If $X_{n}$ is Borel-measurable in $\mathbb{R}^{k}$, uniform tightness is equivalent to asymptotic tightness. The latter notion is for general metric spaces.

**Prohorov's Theorem**. 

- If $X_{n}\to_{d}X$ for a tight $X$, then $\{X_{n}\}$ is uniformly tight if $X_{n}$ is Borel-measurable in $\mathbb{R}^{k}$, and is asymptotically tight and asymptotically measurable if $X_{n}$ is in some metric space.
- If $X_{n}$ is uniformly tight (if $X_{n}$ is Borel-measurable in $\mathbb{R}^{k}$), or is asymptotically tight and asymptotically measurable (if $X_{n}$ is in some metric space), then there exists a subsequence $X_{n_{j}}\to_{d}X$ for some tight $X$.

## Asymptotic Equicontinuity 

A random function $S_{n}(\theta)$ is asymptotically equicontinuous with a metric $d(\theta_{1},\theta_{2})$ (default is Euclidean) if for all $\eta>0$ and $\varepsilon>0$ there exists some $\delta>0$ such that
$$
\limsup_{n\to\infty}P\left[\sup_{d(\theta_{1},\theta_{2})\leq \delta}\|S_{n}(\theta_{1})-S_{n}(\theta_{2})\|>\eta\right]\leq\varepsilon.
$$

- It's not necessary that $S_{n}$ is continuous. Consider the empirical distribution function. $1(X_{i}<\theta)$ is not continuous in $\theta$, but FCLT (see [Donsker](#Donsker)) only imposes (equi)continuity on i) the limit and ii) the normalized average.
- Some authors call asymptotic equicontinuity *stochastic equicontinuity*.
- An equivalent definition is that for any random sequences $\theta_{n}$ and $\theta_{n}'$ such that $d(\theta_{n},\theta_{n}')\to_{p}0$, we have $\|S_{n}(\theta_{n})-S_{n}(\theta_{n}')\|\to_{p}0$.
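
The definition can be explored numerically. The sketch below assumes $X_{i}\sim U[0,1]$ and takes $S_{n}(\theta)=\sqrt{n}(F_{n}(\theta)-\theta)$ with $F_{n}(\theta)=n^{-1}\sum_{i}1(X_{i}<\theta)$; the supremum is approximated on a finite grid, and all function names are hypothetical.

```python
import numpy as np

def empirical_process(x, grid):
    """nu_n(theta) = sqrt(n) * (F_n(theta) - theta) for Uniform[0, 1] data."""
    n = x.size
    # F_n(theta) = share of observations strictly below theta
    f_n = np.searchsorted(np.sort(x), grid, side="left") / n
    return np.sqrt(n) * (f_n - grid)

def modulus(nu, grid, delta):
    """sup |nu(t1) - nu(t2)| over grid pairs with |t1 - t2| <= delta."""
    gap = np.abs(grid[:, None] - grid[None, :])
    diff = np.abs(nu[:, None] - nu[None, :])
    return float(diff[gap <= delta + 1e-12].max())

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 201)
x = rng.uniform(size=5000)
nu = empirical_process(x, grid)
for delta in (0.5, 0.1, 0.02):
    print(delta, round(modulus(nu, grid, delta), 3))
```

Each path of $S_{n}$ has jumps, yet the grid modulus shrinks as $\delta$ decreases, which is the behavior asymptotic equicontinuity describes for large $n$.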

## Miscellaneous

Let $(\Omega,\mathcal{A},P)$ be an arbitrary probability space and consider an arbitrary map $T:\Omega\to \bar{\mathbb{R}}$. $T$ may not be a random variable because it need not be measurable.

- **Outer expectation**: $\mathbb{E}^{*}T\equiv\inf\{\mathbb{E}U:U\text{ measurable},U\geq T,\mathbb{E}U\text{ exists}\}$.
  - **Inner expectation**: $\mathbb{E}_{*}T\equiv -\mathbb{E}^{*}(-T)$.
- **Outer probability**: For an arbitrary $B\subseteq\Omega$, $P^{*}(B)\equiv\inf\{P(A):B\subseteq A,A\in\mathcal{A}\}$.
- **Asymptotic measurability**: $T_{n}$ is asymptotically measurable if $\mathbb{E}^{*}f(T_{n})-\mathbb{E}_{*}f(T_{n})\to 0$ for all bounded and continuous functions $f$.

**Totally bounded space**: A metric space $(M,d)$ is totally bounded if and only if for every real number $\varepsilon>0$, there exists a finite cover such that the radius of each element of the cover is at most $\varepsilon$.

- Equivalent to the existence of a finite $\varepsilon$-net. 
- Equivalent to Cauchy-precompact: every sequence admits a Cauchy subsequence.
- For subsets of a Euclidean space, equivalent to boundedness.
- Implies boundedness.
- Not necessarily compact.
- Compact if and only if complete and totally bounded.

# Glivenko-Cantelli

#tags/glivenko-cantelli

**Definition**. $\mathcal{F}$ is $P$-Glivenko-Cantelli if $\sup_{f\in\mathcal{F}}|P_{n}f-Pf|\to_{p}0$, where $P_{n}f$ is the expectation of $f$ under the empirical measure $P_{n}$ and $Pf$ is the expectation under $P$.

**Examples**. Let $f=g(X,\theta)$ and $g(\theta)\equiv\mathbb{E}[g(X,\theta)]$. Then we say $\bar{g}_{n}(\theta)\equiv \sum_{i}g(X_{i},\theta)/n$ is Glivenko-Cantelli if $\sup_{\theta\in\Theta}\|\bar{g}_{n}(\theta)-g(\theta)\|\to_{p}0$.

**Theorem (ULLN)**. $\bar{g}_{n}(\theta)$ is Glivenko-Cantelli if

- $X_{i}$ are i.i.d.
- $\mathbb{E}[G(X)]<\infty$.
- Any one of the following three holds:
  - For all $\varepsilon>0$, $N_{[\ ]}(\varepsilon,L_{1})<\infty$, or
  - For all $\varepsilon>0$, $N_{1}(\varepsilon)<\infty$, or
  - $g(X,\theta)$ is continuous in $\theta$ almost surely and $\Theta$ is compact.

The required continuity is mild; suppose $g(x,\theta)=1(x<\theta)$, then it is discontinuous at $\theta=x$ only, but this has probability 0 with a continuous $X$.
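
The indicator example can be checked by simulation. A minimal sketch, assuming $X_{i}\sim U[0,1]$ so that $g(\theta)=\mathbb{E}[1(X<\theta)]=\theta$; the function name is hypothetical, and the supremum is computed exactly from the order statistics.

```python
import numpy as np

def sup_deviation(x):
    """sup over theta of |F_n(theta) - theta| for data from Uniform[0, 1]."""
    xs = np.sort(x)
    n = xs.size
    i = np.arange(1, n + 1)
    # The supremum is attained at (or just next to) a jump of F_n
    return float(max(np.max(i / n - xs), np.max(xs - (i - 1) / n)))

rng = np.random.default_rng(2)
results = {n: sup_deviation(rng.uniform(size=n))
           for n in (100, 1_000, 10_000, 100_000)}
for n, dev in results.items():
    print(n, round(dev, 4))
```

The uniform deviation should fall roughly at the rate $n^{-1/2}$, consistent with the class being Glivenko-Cantelli.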

Another related and more familiar result for non-sample averages:

**Theorem (Generic Convergence)**. If $\Theta$ is compact, $g_{n}(\theta)\to_{p}g(\theta)$ for all $\theta$, and $g_{n}$ is asymptotically equicontinuous, then $\sup_{\theta\in\Theta}\|g_{n}(\theta)-g(\theta)\|\to_{p}0$.

- See [Donsker](#Donsker) for sufficient conditions for asymptotic equicontinuity.

# Donsker

#tags/donsker

**Definition**. $\mathcal{F}$ is $P$-Donsker if the sequence $\{\mathbb{G}_{n}f:f\in\mathcal{F}\}$ converges in distribution to a tight limit process in the space $\ell^{\infty}(\mathcal{F})$, where $\mathbb{G}_{n}f\equiv\sqrt{n}(P_{n}f-Pf)$ is the empirical process.

- The limit process is a $P$-Brownian bridge.
- Convergence in distribution is defined as $\mathbb{E}^{*}h(\mathbb{G}_{n})\to\mathbb{E}h(\mathbb{G}_{P})$ for every bounded function $h$ that is continuous in the sup-norm.
- $\mathbb{E}^{*}$ is the outer-expectation as some random element may not be Borel-measurable. But the limit is required to be Borel-measurable.

**Example**. Let $f=g(\cdot,\theta)$, so that $\mathbb{G}_{n}f=\sqrt{n}(\bar{g}_{n}(\theta)-g(\theta))\equiv\nu_{n}(\theta)$, where $g(\theta)\equiv\mathbb{E}[g(X,\theta)]$.

**Theorem (FCLT)**. $\nu_{n}\to_{d}\nu$ over $\Theta$ if and only if

- $(\nu_{n}(\theta_{1}),\ldots,\nu_{n}(\theta_{k}))\to_{d}(\nu(\theta_{1}),\ldots,\nu(\theta_{k}))$ for every finite set of $\theta_{1},\ldots,\theta_{k}\in\Theta$.
- There exists a finite partition $\Theta=\cup_{j=1}^{J}\Theta_{j}$ such that $\nu_{n}(\theta)$ is asymptotically equicontinuous over $\theta\in\Theta_{j}$ for $j=1,\ldots,J$.
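
The finite-dimensional condition can be illustrated by Monte Carlo. This is a sketch under assumptions not in the notes: $X_{i}\sim U[0,1]$, $g(x,\theta)=1(x<\theta)$, and two arbitrary indices $\theta\in\{0.3,0.7\}$. The sample covariance of $(\nu_{n}(0.3),\nu_{n}(0.7))$ across replications is compared with the Brownian-bridge covariance $\min(s,t)-st$; the function name is hypothetical.

```python
import numpy as np

def nu_n(x, thetas):
    """Rows: replications; columns: nu_n(theta) = sqrt(n) * (F_n(theta) - theta)."""
    n = x.shape[1]
    # F_n(theta) per replication: share of the n draws strictly below theta
    f_n = (x[:, :, None] < thetas[None, None, :]).mean(axis=1)
    return np.sqrt(n) * (f_n - thetas)

rng = np.random.default_rng(3)
reps, n = 4000, 400
thetas = np.array([0.3, 0.7])
draws = nu_n(rng.uniform(size=(reps, n)), thetas)

sample_cov = np.cov(draws, rowvar=False)
# Brownian-bridge covariance: min(s, t) - s * t
bridge_cov = np.minimum.outer(thetas, thetas) - np.outer(thetas, thetas)
print(np.round(sample_cov, 3))
print(np.round(bridge_cov, 3))
```

Matching covariances only speak to the first condition. The second condition, asymptotic equicontinuity, is what upgrades finite-dimensional convergence to convergence of the whole process.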

## Sufficient Conditions for Asymptotic Equicontinuity

Any one of the following implies asymptotic equicontinuity:

1. $J_{[\ ]}(\delta,L_{2})<\infty$.

2. $J_{2}(\delta)<\infty$ and $\mathbb{E}^{*}(G(X)^{2})<\infty$.

**Sufficient conditions for 1**. Any one of the following holds:

a. $N_{[\ ]}(\varepsilon,L_{2})$ has a polynomial rate.

b. *For all $\delta>0$* and $\theta_{1}\in\Theta$,
$$
\left(\mathbb{E}\left[\sup_{\|\theta-\theta_{1}\|<\delta}\left\|g(X,\theta)-g(X,\theta_{1})\right\|^{2}\right]\right)^{\frac{1}{2}}\leq C\delta^{\psi}
$$
*for some $C<\infty$ and $0<\psi<\infty$*. 

c. $g(x,\theta)$ is Lipschitz in $\theta$ with Lipschitz constant $B(x)$ such that $\mathbb{E}(B(X)^{2})<\infty$.

**Sufficient conditions for 2**. Any one of the following holds:

d. $N_{2}(\varepsilon)$ has a polynomial rate.

e. c holds.

f. $g(x,\theta)=h(\theta'\psi(x))$ where $h$ has finite total variation.

g. $g$ is a combination of functions in e and f by addition, multiplication, minimum, maximum, and composition.

h. $\{g(\cdot,\theta):\theta\in\Theta\}$ is a Vapnik-Chervonenkis (VC) class.

- A collection $\mathcal{C}$ of measurable subsets $C\subseteq\mathcal{X}$ is a VC class if its index $V(\mathcal{C})$ is finite.
  - $V(\mathcal{C})$ is the smallest $n$ for which no set $\{x_{1},\ldots,x_{n}\}$ of size $n$ is shattered by $\mathcal{C}$.
  - $\mathcal{C}$ shatters  $\{x_{1},\ldots,x_{n}\}$ if $\mathcal{C}$ picks out each of its $2^{n}$ subsets.
  - $\mathcal{C}$ picks out a finite subset $A$ of $\{x_{1},\ldots,x_{n}\}$ if $A=\{x_{1},\ldots,x_{n}\}\cap  C$ for some $C\in\mathcal{C}$.
- A collection of functions $\mathcal{F}$ is a VC class of functions if the collection of all subgraphs $\{(x,t):t<f(x)\}$ forms a VC class of sets in $\mathcal{X}\times \mathbb{R}$ when $f$ ranges over $\mathcal{F}$.
- A collection of sets $\mathcal{C}$ is a VC class of sets if and only if the collection of indicator functions $\{1_{C}:C\in\mathcal{C}\}$ is a VC class of functions.
- More on VC class can be found on p.275 in van der Vaart (1998).
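
The shattering definition can be checked by brute force on small examples. The sketch below is illustrative only: it works with a finite ground set and finite families of half-lines $(-\infty,c]$ and intervals $(a,b]$, so it computes the index relative to those hypothetical choices rather than over all of $\mathbb{R}$. All function names are made up for this example.

```python
from itertools import combinations

def picked_out_subsets(points, sets):
    """All subsets of `points` obtained by intersecting with some C in `sets`.

    Each C is represented by a membership predicate x -> bool.
    """
    return {frozenset(p for p in points if contains(p)) for contains in sets}

def shatters(points, sets):
    # Shattered means all 2^n subsets are picked out
    return len(picked_out_subsets(points, sets)) == 2 ** len(points)

def vc_index(ground, sets, max_n=5):
    """Smallest n such that no n-point subset of `ground` is shattered."""
    for n in range(1, max_n + 1):
        if not any(shatters(s, sets) for s in combinations(ground, n)):
            return n
    return None  # not found up to max_n

ground = [0.1, 0.4, 0.6, 0.9]
cuts = [0.0, 0.25, 0.5, 0.75, 1.0]
half_lines = [lambda x, c=c: x <= c for c in cuts]
intervals = [lambda x, a=a, b=b: a < x <= b
             for a in cuts for b in cuts if a < b]

print(vc_index(ground, half_lines), vc_index(ground, intervals))
```

Half-lines cannot pick out the larger of two points alone, and intervals cannot pick out the two extremes of three points without the middle one, which is why the search stops at sets of size two and three respectively.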

## Changing Classes

So far, $\mathcal{F}$ is assumed to be fixed in $n$. For instance, $\mathcal{F}=\{1(X<\theta),\theta\in\Theta\}$ is a class of ==functions of $X$== indexed by $\theta$. For a given $\theta$, $g(X_{i},\theta)=1(X_{i}<\theta)$ is the same function of $X_{i}$ for every $i$ and every $n$.

However, sometimes the collection of functions may change in $n$. For instance, in nonparametrics, $g(X_{i},\theta)=K([X_{i}-\theta]/h_{n})/h_{n}$. This function (as a function of $X_{i}$) changes with $n$ as $h_{n}$ is $n$-dependent.
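
A small numerical sketch of this dependence, assuming a Gaussian kernel $K$ and the bandwidth $h_{n}=n^{-1/5}$ (both are illustrative assumptions; the notes do not fix $K$ or $h_{n}$). The function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def g_n(x, theta, n, rate=0.2):
    """g(x, theta) = K((x - theta) / h_n) / h_n with assumed h_n = n ** (-rate)."""
    h_n = n ** (-rate)
    return gaussian_kernel((x - theta) / h_n) / h_n

# The same (x, theta) pair is mapped to different values as n grows,
# and the peak value K(0) / h_n at x = theta increases with n.
for n in (100, 10_000, 1_000_000):
    print(n, g_n(0.0, 0.0, n), g_n(0.5, 0.0, n))
```

Because the peak $K(0)/h_{n}$ grows with $n$, no single envelope with finite second moment works for all $n$ at once, which is one reason the envelope in the theorem below is allowed to depend on $n$.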

**Theorem (FCLT with Changing Classes, Theorem 19.28 in van der Vaart (1998))**. Let $\mathcal{F}_{n}=\{f_{n}(\theta):\theta\in \Theta\}$ be a class of measurable functions indexed by a totally bounded semimetric space $(\Theta,\rho)$ such that
$$
\sup_{\rho(\theta,\theta')<\delta_{n}}\mathbb{E}\left(f_{n}(\theta)-f_{n}(\theta')\right)^{2}\to 0,\forall\delta_{n}\to 0.
$$
Suppose the envelope function $F_{n}$ (a function of $X$) satisfies the Lindeberg condition:
$$
\begin{aligned}
&\mathbb{E}F_{n}^{2}=O(1),\\
&\mathbb{E}F_{n}^{2}1\left(F_{n}>\varepsilon\sqrt{n}\right)\to 0,\quad\forall\varepsilon>0.
\end{aligned}
$$
If $J_{[\ ]}(\delta_{n},\mathcal{F}_{n},L_{2}(P))\to 0$ for all $\delta_{n}\to 0$, or every $\mathcal{F}_{n}$ is suitably measurable and $J_{2}(\delta_{n},\mathcal{F}_{n})\to 0$ for all $\delta_{n}\to 0$, then $\{\mathbb{G}_{n}f_{n}(\theta):\theta\in\Theta\}$ converges in distribution to a tight Gaussian process, provided that the sequence of covariance functions $\mathbb{E}(f_{n}(\theta)f_{n}(\theta'))-\mathbb{E}(f_{n}(\theta))\mathbb{E}(f_{n}(\theta'))$ converges pointwise on $\Theta\times \Theta$.

a=pnorm(2,0,1)