**Hypothesis Tests**

A very common problem that scientists face is the **assessment of significance** in scattered statistical data. Owing to the limited availability of observational data, scientists apply inferential statistical methods to decide if the observed data contains significant information or if the scattered data is nothing more than the manifestation of the inherently probabilistic nature of the data generation process.

Generally speaking, a scientist states such a problem as follows. The scientist builds a model, which is just a simplification of the data generation process, and considers a particular assumption - a so-called **hypothesis** - of this model. Given data, the scientist wants to evaluate this tentative hypothesis.

The framework of hypothesis testing is all about making **statistical inferences about populations based on samples taken from the population**. One way to estimate a population parameter is the construction of confidence intervals. Another way is to make a decision about a parameter in the form of a test. Any hypothesis test involves the collection of data (sampling). If the hypothesis is assumed to be correct, the scientist can calculate the expected results of an experiment. If the observed data differs significantly from the expected results, then one considers the assumption to be incorrect. Thus, based on an analysis of the observed data, the scientist decides whether there is sufficient evidence to reject the model - the hypothesis - or whether the evidence is not sufficient to reject the stated hypothesis.

**Introduction to Hypothesis Testing**

**Inferential statistics** is all about making decisions or judgments about the value of a particular observation or measurement. One of the most commonly used methods for making such decisions is to perform a **hypothesis test**. A **hypothesis** is a proposed explanation for a phenomenon. In the context of statistical hypothesis tests the term hypothesis is a statement about something that is supposed to be true.

A hypothesis test involves two hypotheses: the **null hypothesis** and the **alternative hypothesis**. The null hypothesis $(H_0)$ is a statement to be tested. The alternative hypothesis $(H_A)$ is a statement that is considered to be an alternative to the null hypothesis.

The hypothesis test aims to determine whether the null hypothesis should be rejected in favor of the alternative hypothesis. The basic logic of a hypothesis test is to compare two statistical data sets. One data set is obtained by sampling and the other data set originates from an idealized model. If the sample data is consistent with the idealized model, the null hypothesis is not rejected; if the sample data is inconsistent with the idealized model and thus supports an alternative hypothesis, the null hypothesis is rejected in favor of the alternative hypothesis.

The criterion for deciding whether to reject the null hypothesis involves a so-called **test statistic**. The test statistic is a number calculated from the data set, which is obtained by measurements and observations, or more generally by sampling.
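As a minimal sketch of what such a number can look like, the following Python snippet computes a one-sample $z$ statistic, $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, which compares a sample mean $\bar{x}$ with a hypothesized population mean $\mu_0$. The choice of the $z$ statistic, the assumption that the population standard deviation $\sigma$ is known, the function name `z_statistic`, and the sample values are illustrative assumptions and are not taken from the text.

```python
from math import sqrt
from statistics import mean


def z_statistic(sample, mu_0, sigma):
    """One-sample z statistic: (sample mean - mu_0) / (sigma / sqrt(n)).

    Assumes the population standard deviation sigma is known.
    """
    n = len(sample)
    return (mean(sample) - mu_0) / (sigma / sqrt(n))


# Hypothetical measurements, invented for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5, 5.3]

# Hypothesized mean mu_0 = 5.0 and known sigma = 0.3 are assumptions.
z = z_statistic(sample, mu_0=5.0, sigma=0.3)
print(f"z = {z:.2f}")
```

The further the statistic lies from zero, the less consistent the sample is with the hypothesized mean. When $\sigma$ is unknown, a $t$ statistic based on the sample standard deviation is the usual replacement.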



**Hypothesis Formulation**

Any hypothesis test starts with the formulation of the null hypothesis and the alternative hypothesis. This section focuses on hypothesis tests for one population mean, $\mu$; however, the general procedure applies to any hypothesis test.

The null hypothesis for a hypothesis test concerning a population mean, $\mu$, is expressed

$$H_0 : \mu = \mu_0,$$

where $\mu_0$ is some number.

The formulation of the alternative hypothesis depends on the purpose of the hypothesis test. There are three ways to formulate an alternative hypothesis.

If the hypothesis test is about deciding whether a population mean, $\mu$, is different from the specified value $\mu_0$, the alternative hypothesis is expressed as

$$H_A : \mu \neq \mu_0.$$

Such a hypothesis test is called a **two-sided test**.

If the hypothesis test is about deciding whether a population mean, $\mu$, is less than the specified value $\mu_0$, the alternative hypothesis is expressed as

$$H_A : \mu < \mu_0.$$

Such a hypothesis test is called a **left-tailed test**.

If the hypothesis test is about deciding whether a population mean, $\mu$, is greater than the specified value $\mu_0$, the alternative hypothesis is expressed as

$$H_A : \mu > \mu_0.$$

Such a hypothesis test is called a **right-tailed test**.

Note that a hypothesis test is called a **one-tailed** test if it is either left-tailed or right-tailed.

\begin{array}{|l|c|c|c|}
\hline
 & \text{Two-sided test} &   \text{Left-tailed test}  & \text{Right-tailed test}  \\
\hline
\text{Sign in } H_A & \ne & < & >  \\
\text{Rejection region } & \text{Both sides} & \text{Left side} & \text{Right side} \\
\hline 
\end{array}

**Type I Error, Type II Error and Significance Level**

Any decision made based on a hypothesis test may be incorrect. In the framework of hypothesis tests there are two types of errors: **Type I error** and **type II error**. A type I error occurs if a true null hypothesis is rejected (a “**false positive**”), while a type II error occurs if a false null hypothesis is not rejected (a “**false negative**”). In other words, a type I error is detecting an effect that is not present, while a type II error is failing to detect an effect that is present.

\begin{array}{l|ll}
 & H_0 \text{ is true} & H_0 \text{ is false}  \\
\hline
\text{Do not reject } H_0 & \text{Correct decision} & \text{Type II error}  \\
\text{Reject } H_0 & \text{Type I error} & \text{Correct decision} \\
\hline 
\end{array}


![type1_type2](type1_type2.jpg)

Conducting a hypothesis test always implies that there is a chance of making an incorrect decision. The probability of a type I error (a true null hypothesis is rejected) is commonly called the **significance level** of the hypothesis test and is denoted by $\alpha$. The probability of a type II error (a false null hypothesis is not rejected) is denoted by $\beta$. Keep in mind that for a fixed sample size, the smaller the specified significance level $\alpha$, the larger the probability $\beta$ of not rejecting a false null hypothesis.
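The trade-off between $\alpha$ and $\beta$ can be made concrete with a small calculation. The sketch below assumes a right-tailed one-sample $z$-test with known standard deviation, for which $\beta = \Phi\left(z_{1-\alpha} - (\mu_1 - \mu_0)\sqrt{n}/\sigma\right)$, where $\Phi$ is the standard normal distribution function and $\mu_1$ is the true mean. The test type, the function name `type_ii_error`, and all numerical values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu_0, mu_true, sigma, n):
    """Probability beta of not rejecting H0: mu = mu_0 in a right-tailed
    one-sample z-test when the true mean is mu_true (sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)        # critical value z_{1-alpha}
    shift = (mu_true - mu_0) * sqrt(n) / sigma    # standardized true effect
    return std_normal.cdf(z_crit - shift)


# Hypothetical setting: the true mean exceeds the hypothesized mean by 0.2.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu_0=5.0, mu_true=5.2, sigma=0.3, n=9)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.3f}")
```

With the sample size held fixed, each reduction of $\alpha$ in the loop produces a larger $\beta$.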

The outcome of a hypothesis test is a decision either to reject the null hypothesis in favor of the alternative hypothesis or not to reject it. If the null hypothesis is rejected, the data provides sufficient evidence to support the alternative hypothesis. If the null hypothesis is not rejected, the data does not provide sufficient evidence to support the alternative hypothesis; this does not prove that the null hypothesis is true. When the null hypothesis is rejected at the significance level $\alpha$, one may state that the test results are **statistically significant at the $\alpha$ level**. When the null hypothesis is not rejected at the significance level $\alpha$, one may state that the test results are **not statistically significant at the $\alpha$ level**.

**The Critical Value and the p-Value Approach to Hypothesis Testing**

In order to decide whether to reject the null hypothesis, a **test statistic** is calculated. The decision is made on the basis of the numerical value of the test statistic. There are two approaches to arrive at that decision: the **critical value approach** and the **p-value approach**.

**The critical value approach**

The **critical value approach** determines whether or not the observed test statistic is more extreme than a defined critical value. To this end, the observed test statistic (calculated on the basis of sample data) is compared to the critical value, which acts as a cutoff value. If the test statistic is more extreme than the critical value, the null hypothesis is rejected. If the test statistic is not as extreme as the critical value, the null hypothesis is not rejected. The critical value is computed based on the given significance level $\alpha$ and the type of probability distribution of the idealized model. The critical value divides the area under the probability distribution curve into **rejection region(s)** and a **non-rejection region**.

The following three figures show a two-sided test, a left-tailed test, and a right-tailed test. The idealized model in the figures, and thus $H_0$, is described by a bell-shaped normal probability curve.

In a **two-sided test** the null hypothesis is rejected if the test statistic is either too small or too large. Thus the rejection region for such a test consists of two parts: one on the left and one on the right.

![two_sided_critical](two_sided_critical.png)

For a **left-tailed test**, the null hypothesis is rejected if the test statistic is too small. Thus, the rejection region for such a test consists of one part, which lies to the left of the center.

![left_critical](left_critical.png)

For a **right-tailed test**, the null hypothesis is rejected if the test statistic is too large. Thus, the rejection region for such a test consists of one part, which lies to the right of the center.

![right_critical](right_critical.png)
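The three rejection regions can be sketched in a few lines of Python. The example assumes that the test statistic follows a standard normal distribution under $H_0$ (a $z$-test); the function name `reject_h0`, the labels `"two-sided"`, `"less"` and `"greater"`, and the observed value $z = 1.80$ are hypothetical choices made for this illustration.

```python
from statistics import NormalDist


def reject_h0(z, alpha, alternative):
    """Critical value decision for a z statistic.

    alternative: "two-sided", "less" (left-tailed) or "greater" (right-tailed).
    Returns True if z falls into the rejection region.
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        # alpha is split evenly between the two tails
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        return abs(z) > z_crit
    if alternative == "less":
        # rejection region lies in the left tail
        return z < std_normal.inv_cdf(alpha)
    if alternative == "greater":
        # rejection region lies in the right tail
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 1.80  # hypothetical observed test statistic
for alt in ("two-sided", "less", "greater"):
    print(f"{alt:>9}: reject H0 = {reject_h0(z, alpha=0.05, alternative=alt)}")
```

At $\alpha = 0.05$ the critical values are approximately $\pm 1.96$ for the two-sided test and $\mp 1.645$ for the left-tailed and right-tailed tests, so the same statistic can lead to different decisions depending on how $H_A$ is formulated.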

**The $p$-value approach**

For the $p$-value approach, the probability ($p$-value) associated with the observed value of the test statistic is compared to the specified significance level $(\alpha)$ of the hypothesis test.

The $p$-value corresponds to the probability of observing a test statistic at least as extreme as the one actually obtained, assuming that the null hypothesis is true. Small $p$-values provide evidence against the null hypothesis. The smaller (closer to $0$) the $p$-value, the stronger the evidence against the null hypothesis.

If the $p$-value is less than or equal to the specified significance level $\alpha$, the null hypothesis is rejected; otherwise, the null hypothesis is not rejected. In other words, if $p \leq \alpha$, reject $H_0$; otherwise, if $p > \alpha$ do not reject $H_0$.

In consequence, by knowing the $p$-value any desired level of significance may be assessed. For example, if the $p$-value of a hypothesis test is $0.01$, the null hypothesis can be rejected at any significance level larger than or equal to $0.01$. It is not rejected at any significance level smaller than $0.01$. Thus, the $p$-value is commonly used to evaluate the strength of the evidence against the null hypothesis without reference to a significance level.
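To illustrate how a $p$-value is obtained and compared with $\alpha$, the following sketch again assumes a test statistic that is standard normal under $H_0$. The function name `p_value`, the alternative labels, and the observed statistic $z = 1.80$ are hypothetical and serve only as an example.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """p-value of a z statistic under a standard normal null distribution.

    alternative: "two-sided", "less" (left-tailed) or "greater" (right-tailed).
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        # probability of a value at least as far from 0 in either direction
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 1.80      # hypothetical observed test statistic
alpha = 0.05  # chosen significance level
for alt in ("two-sided", "less", "greater"):
    p = p_value(z, alt)
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"{alt:>9}: p = {p:.4f} -> {decision}")
```

Because the $p$-value is reported as a number, a reader can compare it with any significance level of interest rather than only with the one chosen in the loop.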

The following table provides guidelines for using the $p$-value to assess the evidence against the null hypothesis.

\begin{array}{l|l}
\hline
 \text{p-value} & \text{Evidence against } H_0   \\
\hline
\ p > 0.10 &  \text{Weak or no evidence}   \\
\ 0.05 < p \le 0.10 & \text{Moderate evidence}    \\
\ 0.01 < p \le 0.05 & \text{Strong evidence}    \\
\  p \le 0.01 & \text{Very strong evidence}    \\
\hline 
\end{array}
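The guideline table translates directly into a small lookup function. The sketch below mirrors the thresholds of the table; the function name `evidence_against_h0` is a hypothetical choice, and the labels are guidelines rather than strict rules.

```python
def evidence_against_h0(p):
    """Map a p-value to the verbal guideline from the table above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > 0.10:
        return "Weak or no evidence"
    if p > 0.05:
        return "Moderate evidence"
    if p > 0.01:
        return "Strong evidence"
    return "Very strong evidence"


# Hypothetical p-values, one per row of the table.
for p in (0.25, 0.07, 0.03, 0.004):
    print(f"p = {p:<5} -> {evidence_against_h0(p)}")
```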