###### Hypothesis Testing
## Exposition
In this section, I review the general gist of how to carry out hypothesis testing. The material draws largely on Wasserman's "All of Statistics" and Cox and Hinkley's "Theoretical Statistics". Read Wasserman Chapter 10, Keener Chapter 12, and Cox/Hinkley Chapters 3, 4, and 5 for a more thorough exposition.\
\
A hypothesis test is a comparison of two competing hypotheses in light of some data $X\in\mathbb{R}^{n\times p}$, where $n$ is the number of observations and $p$ is the number of variables we collect. We assume each observation is drawn from some distribution function $F:\mathbb{R}^p\rightarrow [0,1]$ where $F$ has parameters $\theta=(\theta_1,\dots,\theta_k)$. A **two-sided hypothesis test** compares a simple null hypothesis (a single point value for the parameter) to the complement of that point: $H_0: \theta = (\psi_1,\dots,\psi_k)$ versus $H_1:\theta\neq (\psi_1,\dots,\psi_k)$. A **one-sided hypothesis test** compares two composite hypotheses, where a composite hypothesis takes one of the forms $\theta\leq \psi$, $\theta_1\leq \psi_1,\dots,\theta_k\geq \psi_k$, or $\lVert \theta\rVert\leq \lVert \psi\rVert$ (comparing regions rather than single points). In both cases, we partition the parameter space so that the union of the null and alternative hypotheses gives back the entire space. We could also, for instance, compare two simple hypotheses.

Let $t$ be a function of the observations $X=(x_1,\dots,x_n)$ and let $T=t(X)$ be the corresponding random variable (note that a function of a random variable is itself a random variable). We call $T$ a **test statistic** for testing $H_0$ if 1) the distribution of $T$ when $H_0$ is true is known and 2) larger values of $t$ correspond to stronger evidence against $H_0$. Once we obtain a suitable test statistic, the next goal is to find a **critical region** $S$ such that we reject the null if $X\in S$ and retain the null otherwise. This is a challenging task: how do we assign numerical values to the boundary between what is usual and what isn't? It has also given rise to very subjective interpretations of what is important - see the flood of articles on why p-values should not be a benchmark for publishing.

Once we specify a critical region, we measure the performance of a hypothesis test ($H_0$ vs. $H_1$) by the power function $\beta$, which gives the chance of rejecting the null as a function of the parameter $\theta$. That is, the **power function** is defined by $\beta(\theta)=\mathbb{P}(X\in S|\theta)$. Ideally, we would like our power function to yield 0 if $\theta$ is under the null hypothesis (i.e. never rejecting the null when the parameter is actually contained in the null) and 1 if $\theta$ is contained in the alternative hypothesis. In most applications, we think of the null as the status quo, or what we would normally expect - innocent until proven otherwise. With this in mind, we are interested in tests with a small chance of rejecting the null when it is in fact true. This can formally be quantified by the significance level of the test, which is the supremum of the power function over all $\theta$ under the null. In symbols, 

$$\alpha=\sup\limits_{\theta\in \Theta_{H_0}}\beta(\theta),$$

where $\Theta_{H_0}$ is the set of parameter values allowed under the null hypothesis.
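
To make the power function and the significance level concrete, here is a minimal sketch under assumptions that are not part of the text above: the observations are i.i.d. $N(\theta,\sigma^2)$ with $\sigma$ known, the null is $H_0:\theta\leq 0$, and the critical region is "sample mean above a cutoff". The numbers (`sigma`, `n`, `alpha`) and the helper name `power` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power(theta, crit, sigma, n):
    """beta(theta) = P(sample mean > crit | theta) for X_i ~ N(theta, sigma^2)."""
    se = sigma / sqrt(n)  # standard deviation of the sample mean
    return 1.0 - NormalDist(mu=theta, sigma=se).cdf(crit)

# Hypothetical setup (not from the text): H0: theta <= 0 vs H1: theta > 0.
sigma, n, theta0, alpha = 1.0, 25, 0.0, 0.05

# Critical region S = {sample mean > crit}, with crit chosen so beta(theta0) = alpha.
crit = theta0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)

# Size: supremum of beta over a grid of null values theta <= theta0.
null_grid = [theta0 - 0.1 * k for k in range(50)]
size = max(power(t, crit, sigma, n) for t in null_grid)

print(f"critical value: {crit:.4f}")
print(f"size of the test: {size:.4f}")
print(f"power at theta = 0.5: {power(0.5, crit, sigma, n):.4f}")
```

Because $\beta$ is increasing in $\theta$ here, the supremum over the null is attained at the boundary $\theta=0$, so the size equals the chosen 0.05, while the power at $\theta=0.5$ comes out at roughly 0.80.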

In non-randomized tests, we reject the null if the data fall in the critical region $S$ and retain it otherwise. In randomized tests, we have a test (or critical) function $\phi$, mapping each possible data set to $[0,1]$, which represents the chance of rejecting the null given the data. A non-randomized test can be seen as a randomized test in which $\phi$ is the indicator function that takes value 1 if the data lie in the critical region and 0 otherwise. Observe that 

$$\beta(\theta)=\mathbb{P}(\text{Reject } H_0|\theta)=\mathbb{E}(\mathbb{P}(\text{Reject } H_0|\theta,X)|\theta)=\mathbb{E}(\phi(X)|\theta).$$
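
Randomization matters most for discrete data, where no non-randomized critical region may hit a desired level exactly. The sketch below assumes a setting that is not in the text: $X\sim\text{Binomial}(n,p)$ with $H_0:p=p_0$ against $p>p_0$. The function names are hypothetical. It builds $\phi$ so that $\mathbb{E}(\phi(X)|p_0)=\alpha$ exactly and evaluates the power as the expectation of $\phi$.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def randomized_test(n, p0, alpha):
    """Return (c, gamma): reject if x > c, reject with probability gamma if x == c."""
    tail = 0.0  # running value of P(X > c | p0)
    for c in range(n, -1, -1):
        pc = binom_pmf(c, n, p0)
        if tail + pc > alpha:
            return c, (alpha - tail) / pc
        tail += pc
    return -1, 0.0  # only reached when alpha >= 1: always reject

def phi(x, c, gamma):
    """Test function: probability of rejecting H0 after observing x."""
    if x > c:
        return 1.0
    if x == c:
        return gamma
    return 0.0

def power_binomial(p, n, c, gamma):
    """beta(p) = E[phi(X) | p], computed by summing over the sample space."""
    return sum(phi(x, c, gamma) * binom_pmf(x, n, p) for x in range(n + 1))

c, gamma = randomized_test(n=10, p0=0.5, alpha=0.05)
print(c, round(gamma, 4))
print(power_binomial(0.5, 10, c, gamma))  # size of the test
print(power_binomial(0.7, 10, c, gamma))  # power at one alternative
```

With $n=10$, $p_0=0.5$ and $\alpha=0.05$, this gives $c=8$ and $\gamma\approx 0.893$: always reject for 9 or 10 successes, and reject about 89% of the time for exactly 8.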
\
\
Testing two simple hypotheses: Let us consider two simple hypotheses. For example, consider the mean age of athletes on a baseball team: $H_0:\mu=25$ versus $H_1:\mu=30$. This is a poorly formulated hypothesis, but just stick with me, okay... As we just have two simple hypotheses, let $L(X;\mu=25)$ and $L(X;\mu=30)$ denote the likelihoods of the data under the two parameters. With only two distributions, the power function of the test has two values, $\alpha=\beta(25)$ and $\beta(30)$, where 

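$$\beta(25)=\mathbb{E}(\phi(X)|\mu=25)=\int \phi(x)L(x;25)d\mu(x),$$

which reduces to the sum $\sum_x \phi(x)L(x;25)$ when the data are discrete. (Here $d\mu(x)$ denotes integration against the dominating measure, not the mean $\mu$.)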

Our goal is to develop a $\phi$ that keeps the above at most $\alpha$ while maximizing $\beta(30)$. We can do so with the help of Lagrange multipliers. If we suppose that $\phi$ has level $\alpha$ (that is, the supremum of the power function under the null is bounded above by $\alpha$), then for any $\lambda\geq 0$, 

$$\mathbb{E}(\phi(X)|\mu=30)\leq \mathbb{E}(\phi(X)|\mu=30)-\lambda\mathbb{E}(\phi(X)|\mu=25)+\lambda\alpha.$$

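Our objective is to maximize the right-hand side by picking the right $\phi$; if the maximizer also satisfies $\mathbb{E}(\phi(X)|\mu=25)=\alpha$, the bound is attained and no level-$\alpha$ test has greater power. As $\lambda\alpha$ is a constant, it suffices to maximize the remaining terms, which can be written as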

$$\mathbb{E}(\phi(X)|\mu=30)-\lambda\mathbb{E}(\phi(X)|\mu=25)=\int L(x|\mu=30)\phi(x)d\mu(x)-\lambda\int L(x|\mu=25)\phi(x)d\mu(x)$$

$$=\int \left[L(x|\mu=30)-\lambda L(x|\mu=25)\right]\phi(x)d\mu(x).$$

From here, we can partition the integral into two cases - one where $L(x|\mu=30)>\lambda L(x|\mu=25)$ and another for the reverse direction. Denote by $S_1$ the region of $x$ values satisfying the first condition and by $S_2$ its complement. Then the above becomes 

$$\int_{S_1} \left[L(x|\mu=30)-\lambda L(x|\mu=25)\right]\phi(x)d\mu(x)-\int_{S_2}\left[\lambda L(x|\mu=25)-L(x|\mu=30)\right]\phi(x)d\mu(x).$$

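The choice of $\phi$ from here is clear: the factor $L(x|\mu=30)-\lambda L(x|\mu=25)$ multiplying $\phi(x)$ is positive on $S_1$ and non-positive on $S_2$, so we maximize by taking $\phi(x)=1$ for $x\in S_1$ and $0$ for $x\in S_2$. (Where the two sides are exactly equal the factor vanishes, so the value of $\phi$ there does not change the objective.) What does this result tell us? Well, if 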

$$\frac{L(X|\mu=30)}{L(X|\mu=25)}\geq \lambda,$$

then $\phi$ is one and so we reject the null. Any test of this form, where the decision to reject or retain the null is based on a ratio of likelihoods, is known as a **likelihood ratio test**. These tests are very powerful, and an important result stemming from the above discussion is the **Neyman-Pearson lemma**, which states that given a level $\alpha\in(0,1)$, there exists a likelihood ratio test $\phi_\alpha$ with level $\alpha$, and any likelihood ratio test with level $\alpha$ maximizes the power under the alternative ($\mathbb{E}(\phi(X)|\theta_{H_1})$) among all tests with level at most $\alpha$.\
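
As an illustration of the likelihood ratio test for the baseball example, the sketch below adds an assumption the text does not make: ages are i.i.d. $N(\mu,\sigma^2)$ with $\sigma$ known. Under that assumption the log likelihood ratio equals $n(\mu_1-\mu_0)(2\bar{x}-\mu_0-\mu_1)/(2\sigma^2)$, which is increasing in the sample mean, so the threshold $\lambda$ can be found from a cutoff on $\bar{x}$. The data, `sigma`, and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MU0, MU1 = 25.0, 30.0  # the two simple hypotheses from the text

def log_likelihood_ratio(xs, sigma):
    """log[L(x; mu=30) / L(x; mu=25)] for i.i.d. N(mu, sigma^2) data."""
    return sum((x - MU0) ** 2 - (x - MU1) ** 2 for x in xs) / (2 * sigma**2)

def np_test(xs, sigma, alpha):
    """Reject H0 when the likelihood ratio is at least lambda.

    The ratio is increasing in the sample mean, so log(lambda) is obtained
    from the cutoff on the mean that has level alpha under mu = 25.
    """
    n = len(xs)
    se = sigma / sqrt(n)
    mean_cut = MU0 + NormalDist().inv_cdf(1 - alpha) * se
    log_lam = n * (MU1 - MU0) * (2 * mean_cut - MU0 - MU1) / (2 * sigma**2)
    reject = log_likelihood_ratio(xs, sigma) >= log_lam
    power = 1 - NormalDist(MU1, se).cdf(mean_cut)  # beta(30)
    return reject, log_lam, power

# Hypothetical ages of nine players; sigma = 4 is assumed known.
ages = [27, 31, 24, 29, 33, 26, 30, 28, 32]
reject, log_lam, power = np_test(ages, sigma=4.0, alpha=0.05)
print(reject, round(log_lam, 3), round(power, 3))
```

For these made-up numbers the sample mean is about 28.9, above the cutoff of about 27.2, so the test rejects $H_0$, and its power at $\mu=30$ is about 0.98.
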
\
Generally, for hypothesis testing the procedure is as follows: 
1. Identify the parameter of interest. 
2. State the null hypothesis, $H_0$.
3. Specify an appropriate alternative hypothesis, $H_1$.
4. Choose a significance level, $\alpha$.
5. Determine an appropriate test statistic of the data, $T(X)$.
6. State the rejection region for the statistic.
7. Use sample quantities, substitute these into the test statistic, and compute the value.
8. Decide whether or not $H_0$ should be rejected and report that in the problem context.
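
The eight steps above can be traced in a few lines of code. This is only a sketch for one particular case, a two-sided test of a mean with known standard deviation (a z-test). The data, the null value, and `sigma` are made-up assumptions, and `run_z_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist, fmean

def run_z_test(xs, mu0, sigma, alpha):
    # Steps 1-3: the parameter is the mean mu; H0: mu = mu0, H1: mu != mu0.
    # Step 4: the significance level alpha is supplied by the caller.
    n = len(xs)
    # Step 5: T = |sample mean - mu0| / (sigma / sqrt(n)); under H0 the
    # signed version of T is standard normal (sigma assumed known).
    # Step 7: plug the sample quantities into the statistic.
    t = abs(fmean(xs) - mu0) / (sigma / sqrt(n))
    # Step 6: rejection region {T > z_{1 - alpha/2}}.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 8: decide.
    return t, crit, t > crit

# Hypothetical ages; is the mean age equal to 25?
ages = [27, 31, 24, 29, 33, 26, 30, 28, 32]
t, crit, reject = run_z_test(ages, mu0=25.0, sigma=4.0, alpha=0.05)
print(f"T = {t:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
```

Here $T\approx 2.92$ exceeds the critical value of about 1.96, so in the problem context we would report evidence at the 5% level that the mean age differs from 25.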

## Tests
### Wald Test
1.  


Univariate testing: 

Multivariable testing: As a simple example, we restrict attention to just two variables for now: sugar levels of college students in the US and sugar levels of college students in Europe. We are interested in whether there is any difference between the two. We have two candidate hypotheses to consider: the sugar levels are the same, $\theta_0=\theta_1$, versus they differ, $\theta_0\neq \theta_1$.
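
One way to test $\theta_0=\theta_1$ here is a Wald test on the difference of the two sample means: divide the estimated difference by its estimated standard error and compare with a standard normal. The sketch below assumes two independent samples and uses entirely made-up sugar-level values. The Wald test is a large-sample approximation, so samples this small are for illustration only.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance

def wald_two_sample(xs, ys, alpha=0.05):
    """Wald test of H0: mean(xs) == mean(ys) for two independent samples."""
    diff = fmean(xs) - fmean(ys)  # estimate of theta_0 - theta_1
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))  # estimated s.e.
    w = diff / se  # approximately N(0, 1) under H0 for large samples
    p_value = 2 * (1 - NormalDist().cdf(abs(w)))
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return w, p_value, abs(w) > crit

# Hypothetical sugar-level measurements (made up for this sketch).
us = [92, 88, 95, 101, 97, 90, 94, 99]
europe = [85, 91, 87, 83, 89, 86, 90, 84]
w, p_value, reject = wald_two_sample(us, europe)
print(f"W = {w:.3f}, p-value = {p_value:.5f}, reject H0: {reject}")
```

For these invented numbers $W\approx 4.07$, well beyond 1.96, so the test would reject equality of the two means at the 5% level.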

## Definition Examples
Test Statistic:
- asdf
- asdf
- asdf

Critical Region:
- asdf
- asdf
- asdf

Power Function:
- asdf
- asdf
- asdf

P-value:
- asdf
- asdf
- asdf

Likelihood Ratio Test:
- asdf
- asdf
- asdf




## Problems
1) Consider random variables $Y_1,\dots,Y_n$ representing directions round a circle, i.e. angles between $0$ and $2\pi$. The null hypothesis is that the random variables are independently uniformly distributed. Show that a suitable test statistic for detecting clustering around $0$ is $\sum\limits_{i=1}^n \cos Y_i$ and that it can be interpreted as the abscissa of a random walk in the plane with steps of unit length. Suggest test statistics when clustering may be around $0$ and $\pi$, and also when clustering may be around an unknown direction. For both cases, obtain a normal approximation to the distribution under the null hypothesis of a uniform distribution of angles around the circle.\
\
2)



## Answers
1)
