# Hypothesis test: Mann-Whitney test

**date**
: 2021-04-17

**data**
: `dopamine.csv`

**ref**
: Computer book B, Activity 38

**desc**
: Performing a **Mann-Whitney test**.

In [1]:
from scripts.data import Data
from scipy.stats import mannwhitneyu

In [2]:
sample = Data.load_dopamine()

In [3]:
sample.head()

Unnamed: 0,Psychotic,Non-psychotic
0,0.015,0.0104
1,0.0204,0.0105
2,0.0208,0.0112
3,0.0222,0.0116
4,0.0226,0.013


In [4]:
# declare local vars to hold the columns
# use dropna because the samples are not of equal size,
# so the shorter column is padded with NaN values
psy = sample["Psychotic"].dropna()
non_psy = sample["Non-psychotic"].dropna()
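As a minimal sketch of why `dropna` is needed here: when two groups of different sizes are stored as columns of one data frame, the shorter column is padded with `NaN`. The frame `toy` and its column names are hypothetical and are not taken from `dopamine.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with unequal group sizes (not the real data).
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 5.0, np.nan],  # the shorter group is padded with NaN
})

# Dropping NaN per column recovers each group at its true size.
a = toy["a"].dropna()
b = toy["b"].dropna()

print(len(a), len(b))
```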

In a study into the causes of schizophrenia, 25 hospitalised patients with schizophrenia were treated with antipsychotic medication, and after a period of time were classified as psychotic or non-psychotic by hospital staff.
A sample of cerebro-spinal fluid was taken from each patient and tested for dopamine $\beta$-hydroxylase enzyme activity.
The measurements are in units of **nmol/(ml)(h)/mg** of protein.

Let the hypotheses be

$$
H_{0} : \ell = 0, \hspace{3mm} H_{1} : \ell \neq 0,
$$

where $\ell$ is the underlying difference between the locations of the populations from which the samples were drawn.

Perform the Mann-Whitney test using `scipy.stats.mannwhitneyu`.

In [11]:
mannwhitneyu(
    x=non_psy,
    y=psy,
    alternative="two-sided"
)

MannwhitneyuResult(statistic=20.0, pvalue=0.0024970589395120965)
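To give a feel for what the `statistic` measures, here is a rough sketch that computes a Mann-Whitney $U$ by direct pair counting. The helper `u_statistic` and the toy samples are hypothetical and are not the dopamine data. The sketch assumes the usual definition of $U$ for the first sample: the number of pairs in which the first sample's value exceeds the second's, with ties counted as one half.

```python
def u_statistic(x, y):
    """Count pairs (xi, yj) with xi > yj, scoring each tie as 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )

# Hypothetical toy samples, chosen only for illustration.
x = [1.1, 2.3, 2.9]
y = [2.5, 3.4, 4.0, 4.8]

u_x = u_statistic(x, y)
u_y = u_statistic(y, x)

# The two U values always sum to the number of pairs, len(x) * len(y).
print(u_x, u_y)  # 1.0 11.0
```

A small $U$ for one sample means that its values rarely exceed those of the other sample, which is evidence of a difference in location.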

Note that `"two-sided"` must be passed explicitly as the argument for `alternative`, because in the SciPy version used here the default is `None`, which is deprecated.

We can return the $p$-value without the continuity correction by passing `False` as the argument for `use_continuity`.

In [12]:
mannwhitneyu(
    x=non_psy,
    y=psy,
    use_continuity=False,
    alternative="two-sided"
)

MannwhitneyuResult(statistic=20.0, pvalue=0.002277481157072463)
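The effect of `use_continuity` can be sketched with the textbook normal approximation to $U$. The function `normal_approx_p` and the values passed to it are hypothetical. The sketch ignores any correction for ties, so it is not expected to reproduce the SciPy output above exactly.

```python
import math

def normal_approx_p(u, n1, n2, use_continuity=True):
    """Two-sided p-value from the normal approximation to U (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Work with the larger of the two U values so that z is non-negative.
    big_u = max(u, n1 * n2 - u)
    # The continuity correction moves U half a unit towards its mean.
    correction = 0.5 if use_continuity else 0.0
    z = (big_u - mean_u - correction) / sd_u
    # erfc(z / sqrt(2)) equals 2 * P(Z > z) for a standard normal Z.
    return min(1.0, math.erfc(z / math.sqrt(2)))

# Hypothetical inputs, chosen only for illustration.
p_corrected = normal_approx_p(u=8, n1=6, n2=7, use_continuity=True)
p_uncorrected = normal_approx_p(u=8, n1=6, n2=7, use_continuity=False)

print(p_corrected, p_uncorrected)
```

Under these assumptions the corrected $p$-value is the larger of the two, which matches the direction of the difference between the two results above.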

In [6]:
help(mannwhitneyu)

Help on function mannwhitneyu in module scipy.stats.stats:

mannwhitneyu(x, y, use_continuity=True, alternative=None)
    Compute the Mann-Whitney rank test on samples x and y.
    
    Parameters
    ----------
    x, y : array_like
        Array of samples, should be one-dimensional.
    use_continuity : bool, optional
            Whether a continuity correction (1/2.) should be taken into
            account. Default is True.
    alternative : {None, 'two-sided', 'less', 'greater'}, optional
        Defines the alternative hypothesis.
        The following options are available (default is None):
    
          * None: computes p-value half the size of the 'two-sided' p-value and
            a different U statistic. The default behavior is not the same as
            using 'less' or 'greater'; it only exists for backward compatibility
            and is deprecated.
          * 'two-sided'
          * 'less': one-sided
          * 'greater': one-sided
    
        Use of the None opt