# Problem 1 [Done]

Let a sequence of random variables converge in distribution to a constant. Does it converge in probability? If yes, prove it. If not, give an example.

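Yes. The limit is a constant, so its distribution function $F_x(t) = \mathbb{1}[t \geqslant x]$ is continuous at every $t \neq x$, and the distribution functions $F_{x_n}$ converge to it at all such points. Hence for any $\varepsilon > 0$

$$x_n \leadsto x \Longrightarrow \mathbb{P}(|x_n - x| \geqslant \varepsilon) = 
\mathbb{P}(x_n \geqslant x + \varepsilon) + 
\mathbb{P}(x_n \leqslant x - \varepsilon) \leqslant$$
$$\leqslant 1 - F_{x_n}(x + \varepsilon/2) + F_{x_n}(x - \varepsilon) \to 
1 - F_x(x + \varepsilon/2) + F_x(x - \varepsilon) = 1 - 1 + 0 = 0 
\Longrightarrow x_n \to x \text{ in probability}$$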

# Problem 2 [Done]

Let $x_1, x_2, \dots, x_n$ be a sequence of random variables such that $\mathbb{P}(x_n = 1/n) = 1 - 1/n$ and $\mathbb{P}(x_n = n) = 1/n$. 
Does $x_n$ converge in probability?

Yes, $x_n$ converges to $0$ in probability. Fix $\varepsilon > 0$. For every $n > 1 / \varepsilon$ the value $1/n$ lies below $\varepsilon$, so only the atom at $n$ can contribute to the tail

$$\mathbb{P}(|x_n| \geqslant \varepsilon) \leqslant \mathbb{P}(x_n = n) = 1/n \to 0 \Longrightarrow x_n \to 0$$
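
The role of the threshold $n > 1/\varepsilon$ can be checked with exact arithmetic. The following is a minimal sketch; the helper name `tail_prob` and the value $\varepsilon = 1/10$ are illustrative and not part of the original solution.

```python
from fractions import Fraction


def tail_prob(n: int, eps: Fraction) -> Fraction:
    """Exact P(|x_n| >= eps) for P(x_n = 1/n) = 1 - 1/n, P(x_n = n) = 1/n."""
    prob = Fraction(0)
    if Fraction(1, n) >= eps:   # the small atom 1/n is still at or above eps
        prob += 1 - Fraction(1, n)
    if n >= eps:                # the large atom n is at or above eps
        prob += Fraction(1, n)
    return prob


eps = Fraction(1, 10)
for n in (5, 10, 11, 100, 1000):
    print(n, tail_prob(n, eps))
```

Up to $n = 10$ the probability equals $1$; from $n = 11$ on it equals $1/n$.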

# Problem 3 [Done]

Can a biased (for any sample size) estimator be consistent? If yes, give an example. If not, prove it.

Yes. Let $x_n$ be an estimator of $0$ such that $\mathbb{P}(x_n = 0) = 1 - 1/n$ and $\mathbb{P}(x_n = n) = 1/n$. It is consistent, since for every $\varepsilon > 0$ and $n \geqslant \varepsilon$

$$\mathbb{P}(|x_n| \geqslant \varepsilon) = \mathbb{P}(x_n \geqslant \varepsilon) = 1/n \to 0$$

but biased for any sample size

$$\forall n: \mathbb{E}x_n = 1 \neq 0$$
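
Both properties can be confirmed with exact fractions. This is only an illustrative sketch; the function name `mean_and_tail` is hypothetical.

```python
from fractions import Fraction


def mean_and_tail(n: int, eps: Fraction):
    """Exact E[x_n] and P(|x_n| >= eps) for P(x_n = 0) = 1 - 1/n, P(x_n = n) = 1/n."""
    mean = 0 * (1 - Fraction(1, n)) + n * Fraction(1, n)
    tail = Fraction(1, n) if n >= eps else Fraction(0)
    return mean, tail


for n in (2, 10, 1000):
    print(n, *mean_and_tail(n, Fraction(1, 2)))
```

The mean stays at $1$ for every $n$ while the tail probability shrinks as $1/n$.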

# Problem 4 [Done]

Consider the problem of estimating the mean, where the noise random variables are i.i.d., have zero mean and finite variance. 
* Which of the following estimators are asymptotically normal for any noise distribution?

Let us denote $\mathbb{E}x_n = \mu$. Then the sample mean is an asymptotically normal estimator by the central limit theorem
$$\mathbb{E}\overline{x}_n = \mu,\;
\mathbb{D}\overline{x}_n = \frac{\mathbb{D}x_n}{n} \Longrightarrow \sigma(\overline{x}_n) = \sqrt{\mathbb{D}x_n} < \infty$$
$$\frac{(\overline{x}_n - \mu)\sqrt{n}}{\sqrt{\mathbb{D}x_n}} \leadsto \text{N}_{0, 1}$$

The sample median is not an asymptotically normal estimator of the mean in the case of the shifted exponential distribution
$y = -1/\lambda + x$, where $x \sim \text{E}_{\lambda}$, since it is not even consistent (the median and the mean differ)

$$F_y^{-1}(1/2) = -1/\lambda + \frac{\log{2}}{\lambda} \neq \mathbb{E}y = 0$$
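
A quick simulation illustrates the gap between the two limits. It is a sketch under assumed values ($\lambda = 1$, $n = 100000$, an arbitrary seed), none of which come from the original.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed, chosen arbitrarily
lam = 1.0                # rate of the exponential part (illustrative value)
n = 100_000

# y = -1/lam + x with x ~ Exp(lam), so E y = 0
ys = [-1 / lam + rng.expovariate(lam) for _ in range(n)]

sample_mean = statistics.fmean(ys)
sample_median = statistics.median(ys)
theoretical_median = (math.log(2) - 1) / lam

print(sample_mean, sample_median, theoretical_median)
```

The sample mean should land near $0$, while the sample median should land near $(\log 2 - 1)/\lambda$.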

As for the mean of the first and last order statistics, I believe there is also an example where this estimator is not 
asymptotically normal, but I could not come up with one
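
One candidate worth probing numerically is uniform noise (an assumption of this sketch, not something the original establishes). For noise uniform on $[-1, 1]$ the variance of the midrange is $2 / ((n + 1)(n + 2))$, so its error shrinks at rate $1/n$ rather than $1/\sqrt{n}$ and the $\sqrt{n}$-scaled error degenerates to zero. The simulation below only illustrates this rate; the sample sizes, seed and helper name are arbitrary.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed, chosen arbitrarily


def midrange_std(n: int, reps: int = 2000) -> float:
    """Monte Carlo standard deviation of (min + max) / 2 for n draws from U(-1, 1)."""
    values = []
    for _ in range(reps):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        values.append((min(xs) + max(xs)) / 2)
    return statistics.pstdev(values)


for n in (50, 200):
    s = midrange_std(n)
    # n * s stays roughly constant, while sqrt(n) * s shrinks with n
    print(n, n * s, n ** 0.5 * s)
```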

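* Which of them will have smaller variance if noise is normally distributed? The sample mean, since it is the MLE of the mean of a normal distribution and attains the Cramér-Rao bound $\mathbb{D}x_n / n$, whereas the sample median has the larger asymptotic variance $\pi\,\mathbb{D}x_n / (2n)$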

# Problem 5 [Done]

Consider the asymptotically normal estimator obtained from a large i.i.d. sample of size $n$. 
Imagine that you need to improve the precision of an estimator by approximately $5$ times. 
How many additional samples should you add to the dataset (as a function of $n$)?

An estimator $\theta_n$ is asymptotically normal if

$$\frac{\theta_n - \theta}{\sigma(\theta_n) / \sqrt{n}} \leadsto \text{N}_{0, 1}$$

for some $\sigma(\theta_n)$. For large $n$ the variance of such an estimator is approximately

$$\sigma^2(\theta_n)/n$$

To make the error $5$ times smaller we need to reduce the variance by a factor of $25$, which takes $25n$ samples in total, i.e. $24n$ additional samples
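
The same arithmetic as a small helper, assuming the standard error scales as $\sigma / \sqrt{n}$; the function name `additional_samples` is hypothetical.

```python
def additional_samples(n: int, factor: float) -> float:
    """Extra observations needed to shrink the standard error by `factor`.

    Assumes the standard error scales as sigma / sqrt(n).
    """
    total = factor ** 2 * n   # the variance must drop by factor**2
    return total - n


print(additional_samples(1000, 5))   # 24 * 1000 extra observations
```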

# Problem 6 [Done]

Consider confidence intervals for the parameter of a Bernoulli distribution. Compare the one derived based on Hoeffding inequality and the one based on asymptotic normality (see Lecture 4). Which one will you choose for the sample size $n = 10$? Explain your choice.

The length of normal interval is

$$2\cdot z_{1 - \alpha/2}\sqrt{\frac{p_n (1 - p_n)}{n}}$$

while the length of Hoeffding interval is

$$2\cdot \sqrt{\frac{\log(2 / \alpha)}{2n}}$$

We need to compare these values. Squaring both lengths and cancelling the common factor $4/n$, we obtain

$$z_{1 - \alpha/2}^2 p_n (1 - p_n) \leqslant \frac{z_{1 - \alpha/2}^2}{4} \leqslant \frac{\log(2 / \alpha)}{2}$$

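The first inequality holds because $p_n (1 - p_n) \leqslant 1/4$; the second is checked numerically on a grid of $\alpha$ in the cell below. So the normal interval is never longer than the Hoeffding one, which is why I prefer it. Its $1 - \alpha$ coverage, however, is only asymptotic, while the Hoeffding interval is valid for every $n$, including $n = 10$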

In [1]:
from scipy import stats
import numpy as np

In [11]:
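# Check z_{1 - alpha/2}^2 / 4 <= log(2 / alpha) / 2 on a grid of alpha values
alphas = np.linspace(0.01, 0.99)
result = True

for alpha in alphas:
    # Logical AND keeps `result` a plain bool
    result &= bool(stats.norm.ppf(1 - alpha / 2) ** 2 / 4 <= np.log(2 / alpha) / 2)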

In [12]:
result

True
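
Length is only one side of the comparison; for $n = 10$ the exact coverage of each interval can be computed by summing binomial probabilities. The sketch below assumes $\alpha = 0.05$ and a few illustrative values of $p$; the helper names are hypothetical, and it uses only the standard library instead of `scipy`.

```python
import math
from statistics import NormalDist


def coverage(n: int, p: float, half_width) -> float:
    """Exact P(|k/n - p| <= half_width(k/n)) for k ~ Binomial(n, p)."""
    total = 0.0
    for k in range(n + 1):
        p_hat = k / n
        if abs(p_hat - p) <= half_width(p_hat):
            total += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total


n, alpha = 10, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)


def normal_hw(p_hat: float) -> float:
    """Half-width of the normal interval for n = 10."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def hoeffding_hw(p_hat: float) -> float:
    """Half-width of the Hoeffding interval (does not depend on p_hat)."""
    return math.sqrt(math.log(2 / alpha) / (2 * n))


for p in (0.1, 0.3, 0.5):
    print(p, coverage(n, p, normal_hw), coverage(n, p, hoeffding_hw))
```

For $p = 0.1$ the normal interval degenerates to a single point whenever the sample contains no successes, which pulls its coverage well below $0.95$.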

# Problem 7 [Done]

Consider a random sample of size $n$ from a very large population. The question is to find what proportion $p \in [0,1]$ of people in the population have a certain opinion (it was yes/no question). The proportion in the sample who have the opinion is $f = 1/3$. How large must $n$ be so that the width of the confidence interval is guaranteed to be no larger than $0.01$? You may use normal interval.

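With $p_n = f = 1/3$ and $\alpha = 0.05$ (the level is not given in the problem, so this value is an assumption), we need to find $n$ such that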

$$2\cdot z_{1 - \alpha/2}\sqrt{\frac{p_n (1 - p_n)}{n}} < 0.01$$

In [18]:
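alpha = 0.05   # assumed significance level (not given in the problem)
length = 0.01  # target width of the interval
p = 1 / 3      # observed proportion f
# Solve 2 * z * sqrt(p * (1 - p) / n) = length for n
n = p * (1 - p) / (length / 2 / stats.norm.ppf(1 - alpha / 2)) ** 2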

In [19]:
np.ceil(n)

34147.0
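
The word "guaranteed" can also be read as a worst case over $p$, where $p(1 - p) \leqslant 1/4$. The sketch below computes both readings with the standard library only; `required_n` is a hypothetical helper and $\alpha = 0.05$ is again an assumption.

```python
import math
from statistics import NormalDist


def required_n(p: float, length: float, alpha: float = 0.05) -> int:
    """Smallest n with 2 * z * sqrt(p * (1 - p) / n) <= length."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(p * (1 - p) * (2 * z / length) ** 2)


print(required_n(1 / 3, 0.01))   # plug-in value f = 1/3
print(required_n(1 / 2, 0.01))   # worst case over p
```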

# Problem 8 [Done]

Let $\theta_n$ be asymptotically normal MLE of $\theta$ for some parametrization $\Theta$. Suppose, $\tau$ is a function mapping $\Theta$ 
to some set $\Psi$.

* Is $\tau(\theta_n)$ an asymptotically normal estimator of $\tau(\theta)$ if $\tau$ is a continuously differentiable 
function? Yes, by the delta method, provided $\tau'(\theta) \neq 0$

* Is $\tau(\theta_n)$ an MLE of $\tau(\theta)$ if $\tau$ is a bijection? Yes, by the equivariance of the MLE
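
As an illustration of the delta method, take a Bernoulli sample with MLE $\theta_n = k/n$ and $\tau(\theta) = \theta^2$ (both choices are assumptions made for this sketch). The exact standard deviation of $\tau(\theta_n)$ can be compared with the delta-method value $|\tau'(\theta)| \sqrt{\theta(1 - \theta)/n}$.

```python
import math


def exact_std_of_square(n: int, p: float) -> float:
    """Exact standard deviation of (k/n)**2 for k ~ Binomial(n, p)."""
    m1 = m2 = 0.0
    for k in range(n + 1):
        w = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        t = (k / n) ** 2
        m1 += w * t
        m2 += w * t * t
    return math.sqrt(m2 - m1 * m1)


def delta_std_of_square(n: int, p: float) -> float:
    """Delta-method approximation |tau'(p)| * sqrt(p * (1 - p) / n), tau(p) = p**2."""
    return abs(2 * p) * math.sqrt(p * (1 - p) / n)


n, p = 400, 0.3
print(exact_std_of_square(n, p), delta_std_of_square(n, p))
```

The two numbers should agree to within about one percent at this sample size.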

# Problem 9 [Done]

Imagine that you use bootstrap to estimate the variance of some statistic. It appears that an estimate of variance by bootstrap has high variance itself. What should you do to improve the quality of the interval? Why?

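Increase the number of bootstrap resamples $B$. The bootstrap variance estimate is a Monte Carlo average over resamples, so its simulation error shrinks as $B$ grows. The part of the error that comes from the finite original sample is not affected by $B$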
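
The effect of the number of resamples can be seen on synthetic data. In this sketch the statistic (sample median), the data (50 standard normal draws), the seed and the two values of $B$ are all illustrative assumptions.

```python
import random
import statistics


def bootstrap_variance(data, n_resamples, rng):
    """Bootstrap estimate of the variance of the sample median."""
    medians = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))
        medians.append(statistics.median(resample))
    return statistics.pvariance(medians)


rng = random.Random(2)                            # fixed seed, chosen arbitrarily
data = [rng.gauss(0.0, 1.0) for _ in range(50)]   # one fixed synthetic sample


def spread(n_resamples, repeats=20):
    """Std of the bootstrap variance estimate over repeated bootstrap runs."""
    estimates = [bootstrap_variance(data, n_resamples, rng) for _ in range(repeats)]
    return statistics.pstdev(estimates)


spread_small, spread_large = spread(20), spread(400)
print(spread_small, spread_large)
```

With the data held fixed, the run-to-run spread of the estimate should be noticeably smaller for $B = 400$ than for $B = 20$.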

# Problem 10 [Done]

Let $x_1, x_2, \dots, x_n$ be a sample from standard Cauchy distribution with location parameter $\theta$. Consider a hypothesis testing 
problem $H_0: \theta = \theta_0$ against alternative $H_1: \theta \neq \theta_0$. Which statistic should you base your criterion on? 
Suggest a criterion for testing a hypothesis. You may assume large sample size.

Since the Cauchy distribution has no mean (the corresponding Lebesgue integral does not converge), we can use the sample median $m_n$, which estimates the true median $m = \theta$. In this case the 
following statistic has asymptotically standard normal distribution under $H_0$ (here $f$ is the density function of the Cauchy distribution, so $f(m) = 1/\pi$)

$$m_n = x_{([n/2])},\;\sigma^2(m_n) = \frac{1}{4nf(m)^2} = \frac{\pi^2}{4n} \Longrightarrow T(x_1, x_2, \dots, x_n) = 
\frac{m_n - \theta_0}{\sigma(m_n)} \leadsto \text{N}_{0, 1}$$

<!-- $$\mathbb{P}\bigg[\theta \in C_n = \Big(m_n - z_{1 - \alpha/2}\sigma(m_n), m_n + z_{1 - \alpha/2}\sigma(m_n)\Big)\bigg] \to 1 - \alpha$$ -->

Thus, we reject $H_0$ at significance level $\alpha$ if we observe $|T(x_1, x_2, \dots, x_n)| \geqslant z_{1 - \alpha/2}$
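
A minimal sketch of this criterion, assuming unit scale so that $f(\theta_0) = 1/\pi$. The function name, the seed and the simulated data are illustrative. `statistics.median` averages the two middle values for even $n$, which differs from $x_{([n/2])}$ only by an asymptotically negligible amount.

```python
import math
import random
import statistics
from statistics import NormalDist


def cauchy_median_test(xs, theta0, alpha=0.05):
    """Two-sided large-sample test of H0: theta = theta0 for Cauchy location data.

    Uses f(theta0) = 1 / pi for unit scale, so sigma(m_n) = pi / (2 * sqrt(n)).
    """
    n = len(xs)
    m_n = statistics.median(xs)
    t = 2 * math.sqrt(n) * (m_n - theta0) / math.pi
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return t, abs(t) >= z


rng = random.Random(3)   # fixed seed, chosen arbitrarily
# Cauchy(theta = 0) draws via the inverse CDF: theta + tan(pi * (u - 1/2))
xs = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(2001)]

print(cauchy_median_test(xs, theta0=0.0))
print(cauchy_median_test(xs, theta0=1.0))
```

The function returns the statistic together with the reject decision, so the same call can be reused for any $\theta_0$ and $\alpha$.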

# Problem 11 [Done]

Prove that the $p$-value is uniformly distributed under the null hypothesis. Assume that the null hypothesis consists only of one 
continuous distribution. Why can a large $p$-value not be a measure of confidence in $H_0$?

We can express the $p$-value as $1 - \gamma$, where $\gamma = F_T\Big[T(x_1, x_2, \dots, x_n)\Big]$ and $F_T$ is the distribution function of $T$ under the null hypothesis, and use 
the probability integral transform

$$F_{\gamma}(x) = \mathbb{P}(\gamma < x) = \mathbb{P}\Big[F_T\Big[T(x_1, x_2, \dots, x_n)\Big] < x\Big] = 
\mathbb{P}\Big[T(x_1, x_2, \dots, x_n) < F_T^{-1}(x)\Big] = F_T\Big[F_T^{-1}(x)\Big] = x$$

Since $\gamma \sim \text{U}_{0, 1}$, the same is true for the $p$-value, whose distribution function we denote by $F_p$

$$F_p(x) = \mathbb{P}(1 - \gamma < x) = 1 - \mathbb{P}(\gamma < 1 - x) = 1 - (1 - x) = x$$
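
The result can be illustrated by simulation. The sketch assumes a statistic that is standard normal under $H_0$ and a one-sided $p$-value $1 - F_T(T)$; the seed and the number of replications are arbitrary.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(4)   # fixed seed, chosen arbitrarily
std_normal = NormalDist()

# Under H0 the statistic T is N(0, 1) (an assumption of this sketch);
# the one-sided p-value is 1 - F_T(T).
p_values = [1 - std_normal.cdf(rng.gauss(0.0, 1.0)) for _ in range(20_000)]

mean_p = statistics.fmean(p_values)
frac_below_005 = sum(p < 0.05 for p in p_values) / len(p_values)
print(mean_p, frac_below_005)
```

For a uniform distribution the mean should be close to $0.5$ and about $5\%$ of the $p$-values should fall below $0.05$.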

A large $p$-value does not measure confidence in $H_0$. Under $H_0$ every value in $[0, 1]$ is equally likely, and when $H_0$ is false a large $p$-value can simply be caused by a lack of data (low power of the test)