```{eval-rst}
.. jupyterlite:: hypothesis_chisquare.ipynb
   :new_tab: False
```

(hypothesis_chisquare)=
# Hypothesis testing: Chi-square test

The chi-square test tests the null hypothesis that categorical data have the given frequencies.
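
For {math}`k` categories with observed counts {math}`O_i` and expected counts {math}`E_i`, `scipy.stats.chisquare` computes Pearson's chi-square statistic, sketched below. The symbols {math}`O_i`, {math}`E_i`, {math}`p_i` and {math}`N` are notation introduced here for illustration. {math}`p_i` is the proportion specified by the null hypothesis and {math}`N` is the total number of observations. Under the null hypothesis, and provided the expected counts are not too small, the statistic approximately follows a chi-square distribution with {math}`k - 1` degrees of freedom (with the default `ddof=0`). The p-value is the upper-tail probability of that distribution.

```{math}
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = N p_i
```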

In [^4], bird foraging behavior was investigated in an old-growth forest of
Oregon. In the forest, 44% of the canopy volume was Douglas fir, 24% was
ponderosa pine, 29% was grand fir, and 3% was western larch. The authors
observed the behavior of several species of birds, one of which was the
red-breasted nuthatch. They made 189 observations of this species foraging,
recording 43 (23%) of the observations in Douglas fir, 52 (28%) in ponderosa
pine, 54 (29%) in grand fir, and 40 (21%) in western larch.
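
As a quick arithmetic check, each quoted percentage is a count divided by the 189 total observations and rounded to the nearest whole percent. The four canopy-volume percentages sum to 100, so they form a valid set of hypothesized proportions. The following is a minimal sketch assuming NumPy, and the variable names are illustrative only:

```python
import numpy as np

# Observed foraging counts: Douglas fir, ponderosa pine, grand fir, western larch
f_obs = np.array([43, 52, 54, 40])
n_total = f_obs.sum()  # total number of foraging observations

# Observed percentages, rounded to whole percent as quoted in the text
observed_pct = np.round(f_obs / n_total * 100).astype(int)

# Canopy-volume percentages in the same tree order
canopy_pct = np.array([44, 24, 29, 3])

print(n_total, observed_pct, canopy_pct.sum())
```

The rounded observed percentages come out to 23, 28, 29 and 21. They add up to 101 rather than 100 only because of rounding.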

Using a chi-square test, we can test the null hypothesis that the proportions of
foraging events are equal to the proportions of canopy volume. The authors of
the paper considered a p-value less than 1% to be significant.

Multiplying each proportion of canopy volume by the total number of observed
foraging events (189) gives the expected frequencies under the null hypothesis.

```python
import numpy as np

# Expected counts: proportion of canopy volume times the 189 observations
f_exp = np.array([44, 24, 29, 3]) / 100 * 189
```

The observed frequencies of foraging were:

```python
# Observed foraging counts: Douglas fir, ponderosa pine, grand fir, western larch
f_obs = np.array([43, 52, 54, 40])
```

We can now compare the observed frequencies with the expected frequencies.

```python
from scipy.stats import chisquare

# Pearson's chi-square test of the observed counts against the expected counts
chisquare(f_obs=f_obs, f_exp=f_exp)
```

The p-value, roughly 3e-49, is far below the chosen significance level of 1%.
Hence, the authors considered the difference to be significant and concluded
that the relative proportions of foraging events were not the same as the
relative proportions of tree canopy volume.
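
To show where the numbers reported by `chisquare` come from, the statistic and p-value can be reproduced by hand. Pearson's statistic is the sum of (observed − expected)² / expected over the categories. The p-value is the upper-tail probability of a chi-square distribution with k − 1 = 3 degrees of freedom, which matches the default `ddof=0` of `chisquare`. This is a minimal sketch assuming NumPy and SciPy are installed. It uses `scipy.stats.chi2.sf` (the survival function) for the tail probability:

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Expected counts from canopy-volume proportions, and observed foraging counts
f_exp = np.array([44, 24, 29, 3]) / 100 * 189
f_obs = np.array([43, 52, 54, 40])

# Pearson's statistic: squared deviations scaled by the expected counts
statistic = np.sum((f_obs - f_exp) ** 2 / f_exp)

# Degrees of freedom: number of categories minus one (no parameters estimated)
dof = len(f_obs) - 1

# Upper-tail probability of the chi-square distribution
pvalue = chi2.sf(statistic, dof)

# Compare with the result returned by scipy.stats.chisquare
res = chisquare(f_obs=f_obs, f_exp=f_exp)
print(statistic, pvalue)
print(np.isclose(statistic, res.statistic), np.isclose(pvalue, res.pvalue, atol=0))
```

The hand-computed statistic is roughly 228.2 and agrees with the value from `chisquare`. In practice, calling `chisquare` directly is simpler. The manual version only makes the formula explicit.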

## References

[^4]: Mannan, R. William, and E. Charles Meslow. "Bird populations and vegetation
 characteristics in managed and old-growth forests, northeastern Oregon."
 Journal of Wildlife Management 48, 1219-1238, :doi:`10.2307/3801783`, 1984.