In [2]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
from tabulate import tabulate

mpl.style.use("fivethirtyeight")

## 1.

(a) First, note that

$$
\left(n-1\right)S_{n}^{2}=\sum_{i}\left(X_{i}-\overline{X}\right)^{2}=\sum_{i}\left(X_{i}^{2}-2X_{i}\overline{X}+\overline{X}^{2}\right)=\left(\sum_{i}X_{i}^{2}\right)-n\overline{X}^{2}.
$$

Moreover,

$$
\overline{X}^{2}=\frac{1}{n^{2}}\left(\sum_{i}X_{i}^{2}+2\sum_{i<j}X_{i}X_{j}\right)
$$

and hence

$$
n\mathbb{E}\left[\overline{X}^{2}\right]=\mathbb{E}\left[X_{1}^{2}\right]+\left(n-1\right)\mathbb{E}X_{1}\mathbb{E}X_{2}=n\mu^{2}+\sigma^{2}.
$$

It follows that

$$
\left(n-1\right)\mathbb{E}\left[S_{n}^{2}\right]=n\mathbb{E}\left[X_{1}^{2}\right]-n\mathbb{E}\left[\overline{X}^{2}\right]=\left(n-1\right)\sigma^{2},
$$

as desired.

(b) By the above,

$$
S_{n}^{2}=\frac{n}{n-1}\left[\frac{1}{n}\left(\sum_{i}X_{i}^{2}\right)-\overline{X}^{2}\right].
$$

Then, using the weak law of large numbers (WLLN) and properties of sequences that converge in probability, we obtain the desired result in a few steps:

1. By the WLLN, $\frac{1}{n}\sum_{i}X_{i}^{2}\xrightarrow{P}\mathbb{E}[X_{1}^{2}]=\mu^{2}+\sigma^{2}$ and $\overline{X}\xrightarrow{P}\mathbb{E}X_{1}=\mu$.
2. By Theorem 5.5(g), $\overline{X}^{2}\xrightarrow{P}\mu^{2}$.
3. By Theorem 5.5(a), $\frac{1}{n}\sum_{i}X_{i}^{2}-\overline{X}^{2}\xrightarrow{P}\sigma^{2}$.
4. Finally, by Theorem 5.5(d), $S_{n}^{2}\xrightarrow{P}\sigma^{2}$.
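Both parts can be checked numerically. The following is only a sketch: the normal distribution, the values of `mu` and `sigma`, the sample sizes, and the seed are all assumptions made for illustration, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
mu, sigma = 1.0, 2.0            # assumed values, for illustration only

# (a) Unbiasedness: average S_n^2 over many small samples of size n = 5.
small = rng.normal(mu, sigma, size=(200_000, 5))
mean_s2 = small.var(axis=1, ddof=1).mean()  # ddof=1 divides by n - 1

# (b) Consistency: S_n^2 computed from a single large sample.
large = rng.normal(mu, sigma, size=1_000_000)
s2_large = large.var(ddof=1)

print(mean_s2, s2_large, sigma**2)
```

Both printed estimates should be close to $\sigma^{2}$, the last value printed.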

## 2.

Suppose $X_{n}$ converges to $b$ in quadratic mean. By Jensen's inequality,

$$
\left(\mathbb{E}\left[X_{n}-b\right]\right)^{2}\leq\mathbb{E}\left[\left(X_{n}-b\right)^{2}\right]\rightarrow0.
$$

Therefore, $\mathbb{E}[X_{n}-b]\rightarrow0$ and hence $\mathbb{E}X_{n}\rightarrow b$.
Next, note that

$$
\mathbb{E}\left[\left(X_{n}-b\right)^{2}\right]=\mathbb{E}\left[X_{n}^{2}\right]-2b\mathbb{E}X_{n}+b^{2}=\mathbb{V}(X_{n})+\left(\mathbb{E}X_{n}\right)^{2}-2b\mathbb{E}X_{n}+b^{2}.
$$

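Since $\mathbb{E}X_{n}\rightarrow b$, the last three terms on the right combine into $(\mathbb{E}X_{n}-b)^{2}\rightarrow0$, so taking limits of both sides reveals that $\mathbb{V}(X_{n})\rightarrow0$.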
As for the converse, we can apply the limits $\lim_{n}\mathbb{E}X_{n}=b$
and $\lim_{n}\mathbb{V}(X_{n})=0$ directly to the equation above.

## 3.

Since the expectation of $\overline{X}$ is $\mu$ and the variance of $\overline{X}$ is $\sigma^{2}/n\rightarrow0$, the desired result follows from our findings in Question 2.

## 4.

Let $\epsilon>0$. For $n$ sufficiently large,

$$
\mathbb{P}(\left|X_{n}-0\right|>\epsilon)=\mathbb{P}(X_{n}>\epsilon)=\mathbb{P}(X_{n}=n)=1/n^{2}\rightarrow0
$$

and hence $X_{n}$ converges in probability. However,

$$
\mathbb{E}\left[\left(X_{n}-0\right)^{2}\right]=\mathbb{E}\left[X_{n}^{2}\right]\geq\mathbb{E}\left[X_{n}^{2}I_{\{X_{n}=n\}}\right]=n^{2}\mathbb{P}(X_{n}=n)=1
$$

and hence $X_{n}$ does not converge in quadratic mean.
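The two computations above can be reproduced exactly. This sketch assumes the two-point distribution from the problem statement (not reproduced here), namely $\mathbb{P}(X_{n}=1/n)=1-1/n^{2}$ and $\mathbb{P}(X_{n}=n)=1/n^{2}$. The helper name and the choice `eps=0.5` are illustrative.

```python
def tail_and_second_moment(n, eps=0.5):
    """Return P(|X_n| > eps) and E[X_n^2] for the assumed two-point law."""
    values = [1.0 / n, float(n)]
    probs = [1.0 - 1.0 / n**2, 1.0 / n**2]
    # Tail probability: total mass on values farther than eps from zero.
    tail = sum(p for v, p in zip(values, probs) if abs(v) > eps)
    # Second moment: probability-weighted sum of squared values.
    second_moment = sum(p * v**2 for v, p in zip(values, probs))
    return tail, second_moment

for n in (10, 100, 1000):
    print(n, tail_and_second_moment(n))
```

The tail probability shrinks with $n$ while the second moment stays at or above one.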

## 5.

It is sufficient to prove the second claim since convergence in quadratic
mean implies convergence in probability. Because each $X_{i}$ takes values
in $\{0,1\}$, $X_{i}^{2}=X_{i}$ and hence $\frac{1}{n}\sum_{i}X_{i}^{2}=\overline{X}$.
Since the expectation of $\overline{X}$ is $p$ and the variance of $\overline{X}$
is $p(1-p)/n\rightarrow0$, the desired result is obtained by an application of
our findings in Question 2.

## 6.

By the CLT,

$$
\mathbb{P}\biggl(\frac{X_{1}+\cdots+X_{100}}{100}\geq68\biggr)=\mathbb{P}\biggl(\underbrace{\frac{\sqrt{100}}{2.6}\left(\frac{X_{1}+\cdots+X_{100}}{100}-68\right)}_{\text{approximately }N(0,1)}\geq0\biggr)\approx0.5.
$$

## 7.

Let $f>0$ be an arbitrary function and $\epsilon>0$ be a constant.
Then,

$$
\mathbb{P}(\left|f(n)X_{n}-0\right|>\epsilon)=\mathbb{P}(X_{n}>\epsilon/f(n))\leq\mathbb{P}(X_{n}>0)=1-e^{-1/n}\rightarrow0.
$$

Take $f = 1$ for part (a) and $f = n$ for part (b).
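The bound $\mathbb{P}(X_{n}>0)=1-e^{-1/n}$ does not depend on $f$, which is why the same argument covers both parts. A short sketch of how quickly it decays (the grid of $n$ values is arbitrary):

```python
import numpy as np

n = np.array([1, 10, 100, 1_000, 10_000], dtype=float)
# P(X_n > 0) = 1 - exp(-1/n); expm1 avoids cancellation for large n.
p_nonzero = -np.expm1(-1.0 / n)
print(p_nonzero)
```

Each value is bounded above by $1/n$, since $1-e^{-x}\leq x$.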

## 8.

By the CLT,

$$
\mathbb{P}(Y<90)=\mathbb{P}\biggl(\underbrace{\frac{\sqrt{100}}{1}\left(\frac{Y}{100}-1\right)}_{\text{approximately }N(0,1)}<\frac{\sqrt{100}}{1}\left(\frac{90}{100}-1\right)\biggr)\approx\Phi(-1).
$$

The cells below evaluate $\Phi(-1)$ and then estimate the same probability by Monte Carlo simulation.

In [13]:
scipy.stats.norm.cdf(-1.)

0.15865525393145707

In [12]:
(np.random.poisson(lam=1.0, size=(100_000, 100)).sum(axis=1) < 90).mean()

0.14785
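Since a sum of independent Poisson random variables is again Poisson, $Y\sim\operatorname{Poisson}(100)$ and the probability can also be evaluated exactly rather than simulated. A minimal sketch, assuming `scipy.stats.poisson` is acceptable here:

```python
import scipy.stats

# Y is a sum of 100 independent Poisson(1) variables, so Y ~ Poisson(100).
# Y is integer valued, hence P(Y < 90) = P(Y <= 89).
exact = scipy.stats.poisson.cdf(89, mu=100)
clt = scipy.stats.norm.cdf(-1.0)
print(exact, clt)
```

The CLT value somewhat overstates the exact probability, partly because no continuity correction is applied.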

## 9.

Let $\epsilon>0$. Then,

$$
\mathbb{P}(\left|X_{n}-X\right|>\epsilon)\leq\mathbb{P}(X_{n}\neq X)=1/n\rightarrow0.
$$

Therefore, $X_{n}$ converges in probability (and hence in distribution) to $X$.
On the other hand, $X_{n}$ does not converge in quadratic mean since

$$
\mathbb{E}\left[\left(X_{n}-X\right)^{2}\right]=\mathbb{E}\left[\left(e^{n}-X\right)^{2}I_{\{X_{n}\neq X\}}\right]=\frac{1+e^{2n}}{n}\rightarrow\infty.
$$
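The closed form $(1+e^{2n})/n$ can be checked by enumerating the outcomes. This sketch assumes, as the computation above implicitly does, that the event $\{X_{n}\neq X\}$ is independent of $X$. The function name is illustrative.

```python
import math

def second_moment_gap(n):
    """E[(X_n - X)^2], assuming the switch to e^n is independent of X."""
    total = 0.0
    for x in (-1.0, 1.0):  # X is +1 or -1 with probability 1/2 each
        # With probability 1/n, X_n = e^n; otherwise X_n = X and the gap is 0.
        total += 0.5 * (1.0 / n) * (math.exp(n) - x) ** 2
    return total

for n in (1, 5, 10):
    print(n, second_moment_gap(n), (1.0 + math.exp(2 * n)) / n)
```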

## 10.

See Chapter 4 Question 6.

## 11.

First, note that $X$ is almost surely zero.
Let $\epsilon>0$ and $Z$ be a standard normal random variable. Then, by Chebyshev's inequality,

$$
\mathbb{P}(\left|X_{n}-X\right|>\epsilon)=\mathbb{P}(\left|X_{n}\right|>\epsilon)=\mathbb{P}(\left|Z\right|>\epsilon\sqrt{n})\leq\frac{\mathbb{E}\left[Z^{2}\right]}{\epsilon^{2}n}=\frac{1}{\epsilon^{2}n}\rightarrow0.
$$

Therefore, $X_{n}$ converges in probability (and hence in distribution) to zero.
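Chebyshev's bound is loose here, because the exact tail $\mathbb{P}(|Z|>\epsilon\sqrt{n})=2(1-\Phi(\epsilon\sqrt{n}))$ is available in closed form. The sketch below compares the two for an arbitrarily chosen $\epsilon=0.1$:

```python
import numpy as np
import scipy.stats

eps = 0.1  # arbitrary tolerance, for illustration
n = np.array([10, 100, 1_000, 10_000], dtype=float)

# Exact two-sided normal tail versus the Chebyshev upper bound.
exact_tail = 2.0 * scipy.stats.norm.sf(eps * np.sqrt(n))
chebyshev = 1.0 / (eps**2 * n)
print(np.column_stack([n, exact_tail, chebyshev]))
```

Both columns tend to zero, with the exact tail doing so much faster.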

## 12.

Let $F$ be the CDF of an integer valued random variable. Let $k$ be an integer.
It follows that $F(k)=F(k+\epsilon)$ whenever $0\leq \epsilon<1$.
We use this observation below.

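Suppose $X_{n}\rightsquigarrow X$.
By definition, $F_{X_{n}}\rightarrow F_{X}$ at all points of continuity of $F_{X}$. Since $X$ is integer valued, $F_{X}$ is continuous at $k\pm\epsilon$ for any $0<\epsilon<1$. Therefore, for such $\epsilon$,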

$$
\mathbb{P}(X_{n}=k)=F_{X_{n}}(k+\epsilon)-F_{X_{n}}(k-\epsilon)\rightarrow F_{X}(k+\epsilon)-F_{X}(k-\epsilon)=\mathbb{P}(X=k).
$$

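Conversely, suppose $\mathbb{P}(X_{n}=k)\rightarrow\mathbb{P}(X=k)$ for all integers $k$. Because the random variables in this question are positive, each sum below has finitely many terms and the limit can be taken term by term:
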
$$
F_{X_{n}}(x)=\sum_{k\leq x}\mathbb{P}(X_{n}=k)\rightarrow\sum_{k\leq x}\mathbb{P}(X=k)=F_{X}(x)
$$

and hence $X_{n}\rightsquigarrow X$ as desired.

## 13.

First, note that

$$
F_{X_{n}}(x)=\mathbb{P}(n\min\left\{ Z_{1},\ldots,Z_{n}\right\} \leq x)=1-\mathbb{P}(Z_{1}\geq x/n)^{n}.
$$

If $x\leq0$, $F_{X_{n}}(x)=0$. Otherwise,

$$
\mathbb{P}(Z_{1}\geq x/n)^{n}=\left(1-\mathbb{P}(Z_{1}\leq x/n)\right)^{n}=\left(1-\int_{0}^{x/n}f(z)dz\right)^{n}.
$$

The mean value theorem for integrals (assuming $f$ is continuous near zero) yields a $c_{n}$ between zero and $x/n$, so that $f(c_{n})\rightarrow\lambda$, such that

$$
\mathbb{P}(Z_{1}\geq x/n)^{n}=\left(1-f(c_{n})\frac{x}{n}\right)^{n}=\left(e^{-f(c_{n})x/n}+O(n^{-2})\right)^{n}\rightarrow e^{-\lambda x}.
$$

Therefore, $F_{X_{n}}(x)\rightarrow(1-e^{-\lambda x})I_{(0,\infty)}(x)$ and hence $X_{n}$ converges in distribution to an $\operatorname{Exp}(1/\lambda)$ random variable (parameterized by its mean $1/\lambda$).
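As a concrete instance, take $Z_{i}\sim\operatorname{Uniform}(0,1)$, so that $f\equiv1$ on $(0,1)$ and $\lambda=1$. This choice is an assumption made only for illustration. Then $\mathbb{P}(X_{n}>x)=(1-x/n)^{n}$ exactly for $0\leq x\leq n$, which can be compared with $e^{-x}$:

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0])  # arbitrary evaluation points

def max_gap(n):
    """Largest gap between the exact survival function and exp(-x) on x."""
    survival = (1.0 - x / n) ** n  # P(n * min(Z_1, ..., Z_n) > x)
    return np.abs(survival - np.exp(-x)).max()

for n in (10, 1_000, 100_000):
    print(n, max_gap(n))
```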

## 14.

By the CLT,

$$
\sqrt{n}\frac{\overline{X}-\mu}{\sigma}=\frac{\sqrt{n}}{1/\sqrt{12}}\left(\overline{X}-\frac{1}{2}\right)\rightsquigarrow N(0,1).
$$

Let $g(x)=x^{2}$ so that $g^{\prime}(x)=2x$. By the delta method,

$$
\sqrt{n}\frac{g(\overline{X})-g(\mu)}{\left|g^{\prime}(\mu)\right|\sigma}=\frac{\sqrt{n}}{1/\sqrt{12}}\left(\overline{X}^{2}-\frac{1}{4}\right)\rightsquigarrow N(0,1).
$$
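A quick simulation can be used to sanity-check the limiting distribution. This is a sketch only: the sample size, the number of replications, and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
n, reps = 400, 20_000

# Sample means of n Uniform(0, 1) draws, one per replication.
xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)

# Standardize Y_n = xbar^2 as in the delta-method statement above.
z = np.sqrt(n) * np.sqrt(12.0) * (xbar**2 - 0.25)
print(z.mean(), z.std())
```

The empirical mean and standard deviation of `z` should be close to 0 and 1 respectively.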

## 15.

Define $g:\mathbb{R}^{2}\rightarrow\mathbb{R}$ by $g(x)=x_{1}/x_{2}$.
Then, $\nabla g(x)=(1/x_{2},-x_{1}/x_{2}^{2})^{\intercal}$. Define

$$
\nabla_{\mu}=\nabla g(\mu)=(1/\mu_{2},-\mu_{1}/\mu_{2}^{2})^{\intercal}
$$

for brevity. By the multivariate delta method,

$$
\sqrt{n}\left(\frac{\overline{X}_{1}}{\overline{X}_{2}}-\frac{\mu_{1}}{\mu_{2}}\right)\rightsquigarrow N(0,\nabla_{\mu}^{\intercal}\Sigma\nabla_{\mu})
$$

where

$$
\nabla_{\mu}^{\intercal}\Sigma\nabla_{\mu}=\frac{1}{\mu_{2}^{2}}\Sigma_{11}-2\frac{\mu_{1}}{\mu_{2}^{3}}\Sigma_{12}+\frac{\mu_{1}^{2}}{\mu_{2}^{4}}\Sigma_{22}.
$$
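The expansion of the quadratic form can be verified numerically. The values of `mu` and `Sigma` below are hypothetical and chosen only to exercise the formula.

```python
import numpy as np

mu = np.array([2.0, 4.0])        # hypothetical mean vector
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])   # hypothetical covariance matrix

# Gradient of g(x) = x_1 / x_2 evaluated at mu.
grad = np.array([1.0 / mu[1], -mu[0] / mu[1] ** 2])

# Asymptotic variance as a quadratic form and in expanded form.
quadratic_form = grad @ Sigma @ grad
expanded = (Sigma[0, 0] / mu[1] ** 2
            - 2.0 * mu[0] / mu[1] ** 3 * Sigma[0, 1]
            + mu[0] ** 2 / mu[1] ** 4 * Sigma[1, 1])
print(quadratic_form, expanded)
```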

## 16.

Let $X_{n},X,Y\sim N(0,1)$ be IID and set $Y_{n}=X_{n}$. Trivially,
$X_{n}\rightsquigarrow X$ and $Y_{n}\rightsquigarrow Y$. However,
$\mathbb{V}(X_{n}+Y_{n})=\mathbb{V}(2X_{n})=4$ while $\mathbb{V}(X+Y)=\mathbb{V}(X)+\mathbb{V}(Y)=2$
and hence $X_{n}+Y_{n}$ does not converge in distribution to $X+Y$.
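The variance gap in this counterexample shows up immediately in simulation. A minimal sketch, with an arbitrary seed and number of draws:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
reps = 200_000

x_n = rng.standard_normal(reps)
y_n = x_n                       # Y_n = X_n, as in the counterexample
x = rng.standard_normal(reps)   # independent draws standing in for X
y = rng.standard_normal(reps)   # independent draws standing in for Y

var_seq = np.var(x_n + y_n)     # variance of X_n + Y_n = 2 X_n
var_limit = np.var(x + y)       # variance of X + Y
print(var_seq, var_limit)
```

The first value should be near 4 and the second near 2.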