The **Wilcoxon signed rank sum test** is another example of a non-parametric or distribution-free test. Like the sign test, the Wilcoxon signed rank sum test is used to test the null hypothesis that the median of a distribution is equal to some value. It can be used

a) in place of a one-sample $t$-test 

b) in place of a paired $t$-test or 

c) for ordered categorical data where a numerical scale is inappropriate but where it is possible to rank the observations.


**Carrying out the Wilcoxon signed rank sum test**

**Case $1$: Paired data**

1. State the null hypothesis - in this case it is that the median difference, $M$, is equal to zero.


2. Calculate each paired difference, $d_i = x_i − y_i$, where $x_i, y_i$ are the pairs of observations.


3. Rank the $d_i$'s, ignoring the signs (i.e. assign rank $1$ to the smallest $\lvert d_i \rvert $, rank $2$ to the next etc.).


4. Label each rank with its sign, according to the sign of $d_i$.


5.  Calculate $W^{+}$, the sum of the ranks of the positive $d_i$'s, and $W^{-}$, the sum of the ranks of the negative $d_i$'s. (As a check, the total, $W^{+} + W^{-}$, should be equal to $n(n+1)/2$, where $n$ is the number of pairs of observations in the sample.)
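Steps 2 to 5 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the helper name `signed_rank_sums` and the sample values are made up for the example, and it assumes that no paired difference is exactly zero. `scipy.stats.rankdata` assigns average ranks to tied values by default.

```python
import numpy as np
from scipy.stats import rankdata

def signed_rank_sums(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Hypothetical helper: assumes no paired difference is exactly zero.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)  # step 2
    ranks = rankdata(np.abs(d))    # step 3: rank |d_i| (ties get average ranks)
    w_plus = ranks[d > 0].sum()    # steps 4-5: sum the ranks by sign of d_i
    w_minus = ranks[d < 0].sum()
    return w_plus, w_minus

# Made-up illustrative data (not taken from the text)
x = [10.0, 12.5, 9.0, 14.0, 11.0]
y = [8.0, 13.0, 6.0, 9.5, 12.5]
w_plus, w_minus = signed_rank_sums(x, y)
n = len(x)

# Check from step 5: the two sums should add up to n(n+1)/2
print(w_plus, w_minus, w_plus + w_minus == n * (n + 1) / 2)
```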

**Case 2: Single set of observations**

1. State the null hypothesis - the median value is equal to some value $M$.


2. Calculate the difference between each observation and the hypothesised median, $d_i = x_i − M$.


3.  Apply Steps 3 to 5 as above.

Under the null hypothesis, we would expect the distribution of the differences to be approximately symmetric around zero, and the positive and negative signs to be distributed at random among the ranks. Under this assumption, it is possible to work out the exact probability of every possible outcome for $W$. To carry out the test, we therefore proceed as follows:

6. Choose $W = \min(W^{−}, W^{+})$


7. Use tables of critical values for the Wilcoxon signed rank sum test to find the probability of observing a value of $W$ as extreme as, or more extreme than, the one obtained. Most tables give both one-sided and two-sided $p$-values. If not, double the one-sided $p$-value to obtain the two-sided $p$-value. This is an exact test.
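The exact probabilities behind such tables can be sketched by counting. Under the null hypothesis each rank $1, \dots, n$ is equally likely to carry a positive or a negative sign, so each of the $2^n$ sign patterns has the same probability. The sketch below counts how many patterns give each rank sum; the function names `signed_rank_cdf` and `exact_two_sided_p` are hypothetical, and the code assumes no ties and no zero differences (so that $W$ is an integer).

```python
def signed_rank_cdf(w, n):
    """P(W <= w) under the null hypothesis for n untied, non-zero differences."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks {1, ..., n} whose sum is s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Go downwards so that each rank is used at most once per subset
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    w = min(int(w), max_sum)
    return sum(counts[: w + 1]) / 2 ** n

def exact_two_sided_p(w, n):
    """Two-sided p-value for w = min(W_plus, W_minus): double the one-sided value."""
    return min(1.0, 2 * signed_rank_cdf(w, n))

# Illustrative values only: n = 5 differences with min(W_plus, W_minus) = 3
print(signed_rank_cdf(3, 5), exact_two_sided_p(3, 5))
```

Enumerating by rank sum rather than listing all $2^n$ sign patterns keeps the work proportional to $n \times n(n+1)/2$, which is why this approach stays practical for moderate $n$.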

**Normal approximation**

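If the number of observations/pairs is such that $\frac{n(n+1)}{2}$ is large enough (greater than $20$), a normal approximation can be used with 

$$\mu_W = \frac{n(n+1)}{4}, \ \sigma_W = \sqrt{ \frac{n(n+1)(2n+1)}{24}}$$

The standardised statistic $z = \frac{W - \mu_W}{\sigma_W}$ is then compared with the standard normal distribution.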
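As a rough sketch, the normal approximation can be computed as below. The function name `wilcoxon_normal_approx` and the input values are illustrative assumptions; no tie or continuity correction is applied, and the two-sided $p$-value is taken from the standard normal distribution via `scipy.stats.norm.sf`.

```python
from math import sqrt
from scipy.stats import norm

def wilcoxon_normal_approx(w, n):
    """Return (z, two-sided p) for a rank sum w based on n non-zero differences.

    Hypothetical helper: no correction for ties or continuity.
    """
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p_two_sided = 2 * norm.sf(abs(z))  # area in both tails beyond |z|
    return z, p_two_sided

# Illustrative values only
z, p = wilcoxon_normal_approx(w=24, n=15)
print(z, p)
```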

Dealing with ties:
There are two types of tied observations that may arise when using the Wilcoxon signed
rank test:

- Observations in the sample may be exactly equal to $M$ (i.e. $0$ in the case of paired differences). Ignore such observations and adjust $n$ accordingly.

- Two or more observations/differences may be equal. If so, average the ranks across the tied observations and, when using the normal approximation, reduce the variance $\sigma_W^2$ by $\frac{t^3 -t}{48}$ for each group of $t$ tied ranks.
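Both rules can be sketched together in a few lines. The helper name `tie_adjusted_variance` and the sample differences are hypothetical, and ties are detected by exact equality, so differences computed in floating point may need rounding first.

```python
import numpy as np
from scipy.stats import rankdata

def tie_adjusted_variance(d):
    """Return (n, ranks, variance) after applying the two tie rules.

    Hypothetical helper: ties are detected by exact equality of |d_i|.
    """
    d = np.asarray(d, dtype=float)
    d = d[d != 0]                          # rule 1: drop zero differences
    n = d.size                             # ... and adjust n accordingly
    ranks = rankdata(np.abs(d))            # rule 2: tied values share the average rank
    _, tie_sizes = np.unique(np.abs(d), return_counts=True)
    # Groups of size t = 1 contribute (1 - 1)/48 = 0, so they can be left in
    correction = np.sum(tie_sizes ** 3 - tie_sizes) / 48
    variance = n * (n + 1) * (2 * n + 1) / 24 - correction
    return n, ranks, variance

# Made-up differences: one zero and one pair tied in absolute value
n, ranks, variance = tie_adjusted_variance([1.0, -1.0, 2.0, 0.0, 3.0])
print(n, ranks, variance)
```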


Example:
The table below shows the hours of relief provided by two analgesic drugs in $12$ patients
suffering from arthritis. Is there any evidence that one drug provides longer relief than
the other?

In [5]:
import pandas as pd
drugs = {'Drug A':[2,3.6,2.6,2.6,7.3,3.4,14.9,6.6,2.3,2,6.8,8.5],
         'Drug B':[3.5,5.7,2.9,2.4,9.9,3.3,16.7,6,3.8,4,9.1,20.9] }
data = pd.DataFrame(drugs)
data

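|    | Drug A | Drug B |
|----|--------|--------|
| 0  | 2.0    | 3.5    |
| 1  | 3.6    | 5.7    |
| 2  | 2.6    | 2.9    |
| 3  | 2.6    | 2.4    |
| 4  | 7.3    | 9.9    |
| 5  | 3.4    | 3.3    |
| 6  | 14.9   | 16.7   |
| 7  | 6.6    | 6.0    |
| 8  | 2.3    | 3.8    |
| 9  | 2.0    | 4.0    |
| 10 | 6.8    | 9.1    |
| 11 | 8.5    | 20.9   |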


Solution:

1. In this case our null hypothesis is that the median difference is zero.

2. Our actual differences (Drug B - Drug A) are:

In [6]:
data['Drug B'] - data['Drug A']

0      1.5
1      2.1
2      0.3
3     -0.2
4      2.6
5     -0.1
6      1.8
7     -0.6
8      1.5
9      2.0
10     2.3
11    12.4
dtype: float64

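3. Ranking the differences by absolute value and affixing a sign to each rank (steps $3$ and $4$ above) gives:

| Patient | Difference | Signed rank |
|---------|------------|-------------|
| 0       | 1.5        | +5.5        |
| 1       | 2.1        | +9          |
| 2       | 0.3        | +3          |
| 3       | -0.2       | -2          |
| 4       | 2.6        | +11         |
| 5       | -0.1       | -1          |
| 6       | 1.8        | +7          |
| 7       | -0.6       | -4          |
| 8       | 1.5        | +5.5        |
| 9       | 2.0        | +8          |
| 10      | 2.3        | +10         |
| 11      | 12.4       | +12         |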

Calculating $W^{+}$ and $W^{-}$ gives:

$W^{-} = 1+2+4 = 7$,

$W^{+} = 3 + 5.5 + 5.5 + 7 + 8 + 9 + 10 +11 +12 = 71$.

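As a check, $W^{+} + W^{-} = 78 = \frac{12 \times 13}{2}$. Here we take $W = \max(W^{-}, W^{+}) = 71$; using the smaller sum, $W = 7$, would change only the sign of $z$ and give the same two-sided $p$-value.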

We can use a normal approximation in this case. We have one group of $2$ tied ranks, so we must reduce the variance by $(8-2)/48 = 0.125$. We get:

$$ z = \frac{71 - \frac{12 \times 13}{4} }{ \sqrt{ \frac{12 \times 13 \times 25}{24} - 0.125} } = 2.511 $$

This gives a two-sided $p$-value of $p=0.012$. There is strong evidence that Drug B provides more relief than Drug A.
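The hand calculation above can be checked with a short script. This is a sketch that follows the steps of this example rather than a general-purpose routine; the variable names are chosen for illustration, and the differences are rounded to one decimal place so that floating-point noise cannot hide the tie at $1.5$.

```python
import numpy as np
from scipy.stats import norm, rankdata

drug_a = np.array([2, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2, 6.8, 8.5])
drug_b = np.array([3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6, 3.8, 4, 9.1, 20.9])

# Differences (Drug B - Drug A), rounded to guard against floating-point noise
d = np.round(drug_b - drug_a, 1)

# Rank the absolute differences (ties share the average rank) and sum by sign
ranks = rankdata(np.abs(d))
w_plus = ranks[d > 0].sum()
w_minus = ranks[d < 0].sum()

# Normal approximation with the tie correction (t^3 - t)/48 per tied group
n = len(d)
_, tie_sizes = np.unique(np.abs(d), return_counts=True)
variance = n * (n + 1) * (2 * n + 1) / 24 - np.sum(tie_sizes ** 3 - tie_sizes) / 48
z = (w_plus - n * (n + 1) / 4) / np.sqrt(variance)
p = 2 * norm.sf(abs(z))

print(w_minus, w_plus, round(z, 3), round(p, 3))
```

The printed values should agree with $W^{-} = 7$, $W^{+} = 71$, $z = 2.511$ and $p = 0.012$ from the calculation above.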

In [14]:
# Examples in Python

# The differences in height between cross- and self-fertilized corn plants are given as follows:
d = [6, 8, 14, 16, 23, 24, 28, 29, 41, -48, 49, 56, 60, -67, 75]

# Cross-fertilized plants appear to be taller. To test the null hypothesis that there is no height difference, we can apply the two-sided test:

from scipy.stats import wilcoxon
w, p = wilcoxon(d)
print(w, p)

# Hence, we would reject the null hypothesis at a significance level of 5%, concluding that there is a difference in height between the groups. To confirm that the median of the differences can be assumed to be positive, we use:

w, p = wilcoxon(d, alternative='greater')
print(w, p)

# This shows that the null hypothesis that the median is zero or negative can be rejected at a significance level of 5% in favor of the alternative that the median is greater than zero. The p-values above are exact. Using the normal approximation gives very similar values:

# (In newer SciPy releases this argument is named `method`; check the documentation for your installed version.)
w, p = wilcoxon(d, mode='approx')

print(w, p)

# Note that the statistic changed to 96 in the one-sided case (the sum of ranks of positive differences) whereas it is 24 in the two-sided case (the minimum of the sums of ranks above and below zero).


24.0 0.041259765625
96.0 0.0206298828125
24.0 0.04088813291185591
