# Ex 4: Hypothesis Testing

## Background

Hypothesis testing, or significance testing, is a method for testing a claim (hypothesis) about a population parameter using data measured in a sample. The method evaluates how probable it would be to obtain a sample statistic at least as extreme as the one observed if the hypothesis about the population parameter were true. If this probability is small, the hypothesis is rejected.

**Reference material:** compendium from page 76.

**Important:** For all questions, include in your answer these steps of hypothesis testing:
1. State the null hypothesis $\left(H_0\right)$ and the alternative hypothesis $\left(H_1\right)$
2. State your assumptions and choice of test statistic
3. State the rejection criterion
4. Compute the test statistic
5. Make a decision
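Steps 3 to 5 can be carried out either by comparing the test statistic with a critical value or, equivalently, by comparing the p-value with $\alpha$. The minimal sketch below assumes a two-sided test whose statistic follows Student's $t$ distribution under $H_0$; the function name `two_sided_decision` is hypothetical and the input statistic is illustrative only.

```python
from scipy import stats as st


def two_sided_decision(t_stat, dof, alpha=0.05):
    """Return (critical value, p-value, reject H0?) for a two-sided t test (hypothetical helper)."""
    # Rejection criterion: |t| > t_{1-alpha/2, dof}
    t_crit = st.t.ppf(1 - alpha / 2, dof)
    # p-value: probability under H0 of a statistic at least as extreme as t_stat
    p_value = 2 * st.t.sf(abs(t_stat), dof)
    reject = abs(t_stat) > t_crit
    return t_crit, p_value, reject


# Illustrative statistic only, not computed from any data set
print(two_sided_decision(2.5, 20))
```

Both rules lead to the same decision: $|t| > t_{1-\alpha/2, d}$ holds exactly when the two-sided p-value is below $\alpha$.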

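```python
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats as st

# Read the first two columns of radcliffe.csv: the first (parsed as dates) becomes the index,
# the second holds the annual rainfall totals. Convert the index to periods and squeeze to a Series.
data = pd.read_csv('radcliffe.csv', usecols=[0, 1], index_col=0, parse_dates=[0]).to_period().squeeze()
```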

## Question 1

Consider the time series of annual rainfall totals (in mm) recorded since 1767 at the Radcliffe Meteorological Station in Oxford, England (*radcliffe.csv*). Assume that the data follow the normal distribution fitted to the sample. For this exercise, assume that the available sample begins in 1950 and ends in 2014. Test the hypothesis that the population mean is 646 mm at significance level $\alpha=5\,\%$.

- $H_0$: $\mu = 646$ mm
- $H_1$: $\mu \neq 646$ mm
- Population variance unknown: use Student's $t$ distribution with $n-1$ degrees of freedom
- $H_0$ is rejected at significance level $\alpha = 0.05$ (the probability of wrongly rejecting a true $H_0$) if
$$\left| \overline{x} - 646 \right| > t_{1-\frac{\alpha}{2}, n-1} \dfrac{s_x}{\sqrt{n}} $$
that is
$$\dfrac{\left| \overline{x} - 646 \right|}{s_x}\sqrt{n} > t_{1-\frac{\alpha}{2}, n-1}$$
where $\overline{x}$ and $s_x$ are the sample mean and standard deviation of the annual precipitation series $\{x_i, i=1950,\ldots,2014\}$, $n$ is the sample size, and $t_{q, d}$ is the $q$ quantile of Student's $t$ distribution with $d$ degrees of freedom.
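The criterion above can be sketched as a small function. It assumes `x` is a one-dimensional array of the annual totals for 1950 to 2014 (for example `data.loc['1950':'2014'].to_numpy()`, assuming `data` is the Series loaded earlier); the function name `t_test_mean` is hypothetical.

```python
import numpy as np
from scipy import stats as st


def t_test_mean(x, mu0, alpha=0.05):
    """Two-sided one-sample t test of H0: mu = mu0 with unknown population variance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)                            # sample standard deviation (n - 1 denominator)
    t_stat = (x.mean() - mu0) / s * np.sqrt(n)   # |xbar - mu0| / s * sqrt(n) before taking abs
    t_crit = st.t.ppf(1 - alpha / 2, n - 1)      # t_{1-alpha/2, n-1}
    return t_stat, t_crit, abs(t_stat) > t_crit
```

Calling `t_test_mean(x, 646)` returns the statistic, the critical value and the decision; `st.ttest_1samp(x, 646)` should report the same statistic and can serve as a cross-check.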

## Question 2

Solve Question 1 again, but now assume that the population variance $\sigma^2$ is known and equal to 13045.89 mm$^2$.
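With the population variance known, the standardized statistic follows the standard normal distribution under $H_0$, so the critical value becomes the normal quantile $z_{1-\alpha/2}$ instead of a $t$ quantile. A minimal sketch, with a hypothetical function name and `x` again assumed to hold the 1950 to 2014 sample:

```python
import numpy as np
from scipy import stats as st


def z_test_mean(x, mu0, sigma2, alpha=0.05):
    """Two-sided test of H0: mu = mu0 when the population variance sigma2 is known."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z_stat = (x.mean() - mu0) / np.sqrt(sigma2 / n)  # standard normal under H0
    z_crit = st.norm.ppf(1 - alpha / 2)              # z_{1-alpha/2}, about 1.96 for alpha = 0.05
    return z_stat, z_crit, abs(z_stat) > z_crit
```

For this question the call would be `z_test_mean(x, 646, 13045.89)`.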

## Question 3

Consider the Radcliffe rainfall record. Split the sample into two sub-samples: X, covering 1767 to 1899, and Y, covering 1900 to 2014. At $\alpha=5\,\%$, test the hypothesis that the population mean annual rainfall is the same for both sub-samples, i.e. that the difference between the two sample means is not significant.

- $H_0: \mu_X = \mu_Y$
- $H_1: \mu_X \neq \mu_Y$

**Hint:** check pages 83-84 in the compendium for a relevant example.
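One common way to carry out this test, assuming the two sub-samples are independent, normally distributed and share a common (unknown) variance, is the pooled two-sample $t$ test sketched below. The compendium's example may use a different variant (for instance Welch's test, which drops the equal-variance assumption), so treat this as an illustration; the function name `pooled_t_test` is hypothetical. For the Radcliffe record, `x` and `y` could be taken as `data.loc['1767':'1899']` and `data.loc['1900':'2014']`.

```python
import numpy as np
from scipy import stats as st


def pooled_t_test(x, y, alpha=0.05):
    """Two-sided test of H0: mu_X = mu_Y assuming equal population variances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    # Pooled variance estimate with nx + ny - 2 degrees of freedom
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / nx + 1 / ny))
    t_crit = st.t.ppf(1 - alpha / 2, nx + ny - 2)
    return t_stat, t_crit, abs(t_stat) > t_crit
```

`st.ttest_ind(x, y, equal_var=True)` runs the same pooled test and should report the same statistic.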

## Question 4

Consider again the Radcliffe rainfall record. Assume that the available sample begins in the year 1950 and ends in 2014. At $\alpha=5\,\%$, test the null hypothesis that the population variance is 13000 mm$^2$ against the alternative that it is larger than 13000 mm$^2$.

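- $H_0: \sigma^2_X = \sigma^2_0 = 13000$ mm$^2$
- $H_1: \sigma^2_X > \sigma^2_0$
- Assuming $X$ is normally distributed, the statistic $\left(n-1\right)s^2_X/\sigma^2_0$ follows a $\chi^2$ distribution with $n-1$ degrees of freedom under $H_0$
- $H_0$ rejected at significance level $\alpha = 0.05$ if
$$\left(n-1\right)\dfrac{s^2_X}{\sigma^2_0} > \chi^2_{1-\alpha, n-1}$$
where $s^2_X$ is the sample variance of the r.v. $X$, $n$ is the sample size, and $\chi^2_{q, d}$ is the $q$ quantile of the $\chi^2$ distribution with $d$ degrees of freedom.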
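Because $H_1$ is one-sided, the whole of $\alpha$ sits in the upper tail of the $\chi^2$ distribution. A minimal sketch, assuming a normal population and `x` holding the 1950 to 2014 sample; the function name is hypothetical:

```python
import numpy as np
from scipy import stats as st


def chi2_test_variance_upper(x, sigma2_0, alpha=0.05):
    """Upper-tailed test of H0: sigma^2 = sigma2_0 against H1: sigma^2 > sigma2_0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    chi2_stat = (n - 1) * x.var(ddof=1) / sigma2_0   # chi-square with n - 1 dof under H0
    chi2_crit = st.chi2.ppf(1 - alpha, n - 1)        # chi^2_{1-alpha, n-1}
    return chi2_stat, chi2_crit, chi2_stat > chi2_crit
```

For this question the call would be `chi2_test_variance_upper(x, 13000)`.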

## Question 5
At a river station, the annual mean discharge and variance calculated from a 20-year sample are 14.5 mm and 9.7 mm$^2$, respectively. Assume that the annual discharge is normally distributed. Test the hypothesis that the population mean annual discharge is 16.4 mm $\left(\alpha=5\,\%\right)$.
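Only summary statistics are given here, so the $t$ statistic has to be computed from them directly. The sketch below takes the stated variance of 9.7 at face value (the standard deviation is then $\sqrt{9.7}$) and assumes a two-sided alternative $H_1: \mu \neq 16.4$ mm, since the question does not name one.

```python
import math
from scipy import stats as st

n = 20          # sample size (years)
xbar = 14.5     # sample mean annual discharge (mm)
s2 = 9.7        # sample variance, as stated in the question
mu0 = 16.4      # hypothesized population mean (mm)
alpha = 0.05

# t statistic with n - 1 degrees of freedom (population variance unknown)
t_stat = (xbar - mu0) / math.sqrt(s2 / n)
t_crit = st.t.ppf(1 - alpha / 2, n - 1)   # t_{0.975, 19}
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, reject H0: {reject}")
# prints: t = -2.728, critical value = 2.093, reject H0: True
```

Under these assumptions $|t|$ exceeds the critical value, so $H_0$ would be rejected. If 9.7 were instead read as a standard deviation, the statistic would shrink to about $-0.88$ and the decision would change, which is why the reading of that figure matters.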

## Question 6
As above, but test the hypothesis that the population mean annual discharge is 16.4 mm against the alternative hypothesis that it is less than 16.4 mm.

## Question 7
As above, but test the hypothesis that the population mean annual discharge is 16.4 mm against the alternative hypothesis that it is larger than 16.4 mm.
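For the one-sided alternatives of Questions 6 and 7 the statistic is unchanged, but the whole of $\alpha$ goes into a single tail: $H_0$ is rejected if $t < -t_{1-\alpha, n-1}$ (Question 6) or if $t > t_{1-\alpha, n-1}$ (Question 7). A sketch using the same summary statistics, again reading 9.7 as a variance:

```python
import math
from scipy import stats as st

n, xbar, s2, mu0, alpha = 20, 14.5, 9.7, 16.4, 0.05

t_stat = (xbar - mu0) / math.sqrt(s2 / n)
t_crit = st.t.ppf(1 - alpha, n - 1)   # one-sided critical value t_{1-alpha, n-1}

reject_less = t_stat < -t_crit        # H1: mu < 16.4 mm (Question 6)
reject_greater = t_stat > t_crit      # H1: mu > 16.4 mm (Question 7)
print(f"t = {t_stat:.3f}, t_crit = {t_crit:.3f}")
print("Q6 reject H0:", reject_less, "| Q7 reject H0:", reject_greater)
```

Since the sample mean lies below 16.4 mm, the statistic is negative: it can only fall in the rejection region of the lower-tailed test, never in that of the upper-tailed one.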

## Question 8
In Nassau, in the Bahamas, precipitation was measured during two periods:
- 1872 – 1890 (19 years): $\overline{x} = 1346\,\textrm{mm}$, $s_x = 333\,\textrm{mm}$
- 1896 – 1919 (24 years): $\overline{y} = 1152\,\textrm{mm}$, $s_y = 262\,\textrm{mm}$

where $\overline{x}$ and $\overline{y}$ are sample means, and $s_x$ and $s_y$ are sample standard deviations.

*a) Is the difference between the mean values of the two periods significant? What are your assumptions?*
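Assuming both periods are independent samples from normal populations with a common variance, part a) can be checked with a pooled two-sample $t$ test computed from the summary statistics. `scipy.stats.ttest_ind_from_stats` performs the same computation and serves as a cross-check; with `equal_var=False` it would instead run Welch's test, which does not assume equal variances.

```python
import math
from scipy import stats as st

# Summary statistics from the question (mm)
n_x, xbar, s_x = 19, 1346.0, 333.0   # 1872-1890
n_y, ybar, s_y = 24, 1152.0, 262.0   # 1896-1919
alpha = 0.05

# Pooled variance with n_x + n_y - 2 degrees of freedom
sp2 = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / (n_x + n_y - 2)
t_stat = (xbar - ybar) / math.sqrt(sp2 * (1 / n_x + 1 / n_y))
t_crit = st.t.ppf(1 - alpha / 2, n_x + n_y - 2)
print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, reject H0: {abs(t_stat) > t_crit}")

# Cross-check with SciPy (equal_var=True selects the pooled test)
res = st.ttest_ind_from_stats(xbar, s_x, n_x, ybar, s_y, n_y, equal_var=True)
print(f"SciPy: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```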

*b) Is the difference between the variances of the two periods significant? What are your assumptions?*
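For part b), assuming independent normal samples, a standard choice is the $F$ test on the ratio of sample variances, which under $H_0: \sigma_x^2 = \sigma_y^2$ follows an $F$ distribution with $(n_x - 1, n_y - 1)$ degrees of freedom. The sketch builds the test from the $F$ distribution in `scipy.stats`; it is two-sided and places the larger variance in the numerator, so only the upper critical value $F_{1-\alpha/2}$ is needed.

```python
from scipy import stats as st

n_x, s_x = 19, 333.0   # 1872-1890
n_y, s_y = 24, 262.0   # 1896-1919
alpha = 0.05

# Put the larger sample variance in the numerator so that F >= 1
if s_x >= s_y:
    f_stat, dfn, dfd = s_x**2 / s_y**2, n_x - 1, n_y - 1
else:
    f_stat, dfn, dfd = s_y**2 / s_x**2, n_y - 1, n_x - 1

# Two-sided test: compare with the upper alpha/2 quantile
f_crit = st.f.ppf(1 - alpha / 2, dfn, dfd)
p_value = min(1.0, 2 * st.f.sf(f_stat, dfn, dfd))
print(f"F = {f_stat:.3f}, critical value = {f_crit:.3f}, p = {p_value:.3f}")
print("reject H0:", f_stat > f_crit)
```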