# Hypothesis test: Mann-Whitney test

**date**
: 2021-04-17

**data**
: `osa.csv`

**ref**
: Computer book B, Activity 40

**desc**
: Performing a **Mann-Whitney test**.

In [1]:
from scripts.data import Data
from scipy.stats import mannwhitneyu

In [2]:
sample = Data.load_osa()

In [11]:
sample.head()

| Unnamed: 0 | Present | Absent |
|---|---|---|
| 0 | 6 | 6.0 |
| 1 | 8 | 7.0 |
| 2 | 8 | 7.0 |
| 3 | 10 | 7.0 |
| 4 | 10 | 7.0 |


Note that `Absent` has dtype `float` because the column contains `NaN` values.

In [12]:
# use local vars for ease of access
# dropna due to different lengths
present = sample["Present"].dropna()
absent = sample["Absent"].dropna()
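The `NaN` values arise because the two columns hold different numbers of observations. As a minimal sketch, using a small made-up frame (the name `toy` and its values are illustrative assumptions, not the `osa.csv` data), the effect of the padding and of `dropna` can be seen as follows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "Absent" has one fewer observation,
# so the shorter column is padded with NaN.
toy = pd.DataFrame({"Present": [6, 8, 10], "Absent": [6.0, 7.0, np.nan]})

print(toy.dtypes)  # Present keeps an integer dtype, Absent is float64

# Drop the padding so that only real observations are passed to the test
toy_present = toy["Present"].dropna()
toy_absent = toy["Absent"].dropna()
print(len(toy_present), len(toy_absent))
```

Dropping the padding column by column, rather than calling `dropna` on the whole frame, avoids discarding valid `Present` values that happen to share a row with a `NaN`.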

A study investigated the relationship between customs related to illness and child-rearing practices in 39 non-literate societies.
On the basis of ethnographical reports, each society was given a numerical rating for its degree of **oral socialisation anxiety (OSA)**, a concept derived from psychoanalytic theory relating to child-rearing practice.
(Socialisation refers to the acquiring of social skills.)
Each society was also placed into one of two groups, according to whether oral explanations of illness were present or absent.

The data therefore comprise one sample of **OSA** values for, as it turned out, 23 societies where oral explanations of illness were present, and a second sample of OSA values for 16 societies where oral explanations of illness were absent.

These data were collected to investigate the hypothesis that oral explanations of illness are more likely to be present in societies with high levels of OSA than in societies with low levels of OSA.

Let the hypotheses be

$$
H_{0} : \ell_{A} = \ell_{P}, \hspace{3mm} H_{1} : \ell_{A} < \ell_{P},
$$

where $\ell_{A}$ and $\ell_{P}$ represent the underlying locations of the populations from which the samples were drawn (**P** = Present, **A** = Absent).
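As a rough sketch of what the test statistic measures, the conventional Mann-Whitney $U$ for a sample is its rank sum in the pooled data minus the smallest value that rank sum could take. The helper `u_statistic` and the toy samples below are illustrative assumptions, not part of the activity or the OSA data:

```python
import numpy as np
from scipy.stats import rankdata


def u_statistic(x, y):
    """Mann-Whitney U for sample x: its pooled rank sum minus the minimum possible rank sum."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ranks = rankdata(np.concatenate([x, y]))  # tied values share their average rank
    rank_sum_x = ranks[: len(x)].sum()
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical toy samples (not the OSA data)
toy_absent = [6, 7, 7, 9]
toy_present = [8, 10, 10, 12, 13]

u_a = u_statistic(toy_absent, toy_present)
u_p = u_statistic(toy_present, toy_absent)
print(u_a, u_p)  # 1.0 19.0
```

The two values always sum to the product of the sample sizes (here $4 \times 5 = 20$). A value of $U$ for the Absent sample that is small relative to that product indicates that Absent values tend to rank below Present values, which is the direction of the alternative hypothesis.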

In [14]:
mannwhitneyu(
    x=absent,
    y=present,
    alternative="less",  # H1: Absent (x) tends to be lower than Present (y)
)

MannwhitneyuResult(statistic=64.0, pvalue=0.00029455886864989165)

Given that $p \approx 0.0003 < 0.01$, we conclude that there is strong evidence against the null hypothesis.

There is strong evidence that societies where oral explanations of illness are present tend to have higher oral socialisation anxiety scores than societies where oral explanations of illness are absent.
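To give a feel for the size of the difference, the reported statistic can be divided by the number of Absent/Present pairs. The sketch below assumes that the reported `statistic=64.0` is the $U$ value for the first argument (`x=absent`), and takes the sample sizes 16 and 23 from the study description; the variable names are illustrative:

```python
u_absent = 64.0               # statistic reported by mannwhitneyu
n_absent, n_present = 16, 23  # sample sizes from the study description

# Assumption: the statistic is U for x=absent, i.e. the number of
# (absent, present) pairs in which the absent value is larger,
# with tied pairs counted as one half.
proportion = u_absent / (n_absent * n_present)
print(round(proportion, 3))  # 0.174
```

Under that assumption, an Absent society has the higher OSA score in only about 17% of the 368 possible pairs, which is consistent with the conclusion above.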

In [6]:
help(mannwhitneyu)

Help on function mannwhitneyu in module scipy.stats.stats:

mannwhitneyu(x, y, use_continuity=True, alternative=None)
    Compute the Mann-Whitney rank test on samples x and y.
    
    Parameters
    ----------
    x, y : array_like
        Array of samples, should be one-dimensional.
    use_continuity : bool, optional
            Whether a continuity correction (1/2.) should be taken into
            account. Default is True.
    alternative : {None, 'two-sided', 'less', 'greater'}, optional
        Defines the alternative hypothesis.
        The following options are available (default is None):
    
          * None: computes p-value half the size of the 'two-sided' p-value and
            a different U statistic. The default behavior is not the same as
            using 'less' or 'greater'; it only exists for backward compatibility
            and is deprecated.
          * 'two-sided'
          * 'less': one-sided
          * 'greater': one-sided
    
        Use of the None opt