# The Wilcoxon Rank-Sum Test
The Wilcoxon rank-sum test is a nonparametric alternative to the two-sample
t-test that is based solely on the order in which the observations
from the two samples fall.

The Mann-Whitney test is essentially identical to the Wilcoxon test, even
though it uses a different test statistic.
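
The equivalence can be made explicit. Writing $W_A$ for the sum of the ranks of the first sample in the combined ordering, and $n_A$, $n_B$ for the two sample sizes (symbols introduced here for illustration), the Mann-Whitney statistic is a shifted version of the rank sum:

$$U_A = W_A - \frac{n_A (n_A + 1)}{2}, \qquad U_A + U_B = n_A n_B$$

Because the shift depends only on the sample size, both statistics order the possible outcomes in the same way and therefore lead to the same p-value.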

**Task**: Assume we have two independent samples of data, [x1, x2, ...] and [x1', x2', ...], each from a different population. Also assume that the sample sizes are small or that the populations are not normally distributed, but that the two population distributions have approximately the same shape. The _Wilcoxon Rank-Sum Test_ then tests whether the two medians differ significantly (or whether one is significantly greater or less than the other).

In [None]:
import numpy as np

In [None]:
# Given
xA = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
xB = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]

In [None]:
nA = len(xA); nB = len(xB)
print("nA = %d, nB = %d" % (nA, nB))

In [None]:
# Combine data
xC = xA + xB
lbl = ['A'] * nA + ['B'] * nB
print(xC)
print(lbl)

In [None]:
# Sort combined data in ascending order
i_sort = np.argsort(xC)
xC_sorted = [xC[i] for i in i_sort]
lbl_sorted = [lbl[i] for i in i_sort]
print(xC_sorted)
print(lbl_sorted)

In [None]:
# Assign ranks 1..N to the sorted values (valid only when there are no ties)
ranks = list(range(1, len(xC) + 1))
print(ranks)

In [None]:
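# Treatment of ties (TODO): tied values should each receive the average of the
# ranks they occupy. This data set has no ties, so the ranks above are used as-is.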
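
One common way to handle ties is to give every tied value the average of the ranks it spans (midranks). The sketch below uses `scipy.stats.rankdata` for this; the `values` list is a hypothetical example containing a tie and is not part of the data set above.

```python
from scipy.stats import rankdata

# Hypothetical values containing a tie (not the data set used in this notebook)
values = [7.1, 8.0, 8.0, 9.3]

# method="average" gives each tied value the mean of the ranks it spans,
# so the two 8.0 entries share the average of ranks 2 and 3
midranks = rankdata(values, method="average")
print(midranks)
```

Note that the tabulated probabilities used further down assume no ties, so with midranks a table lookup is only approximate.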

In [None]:
# Find sums of ranks
wA = 0; wB = 0
for i_r, r in enumerate(ranks):
    if lbl_sorted[i_r] == 'A':
        wA += r
    elif lbl_sorted[i_r] == 'B':
        wB += r
    else:
        raise ValueError("Labels in 'lbl' can only take the values 'A' or 'B'.")

print("wA = %d, wB = %d" % (wA, wB))
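
As a quick sanity check on the rank sums, the two sums must add up to N(N+1)/2, where N is the combined sample size. The sketch below recomputes the sums with NumPy and also converts them to the Mann-Whitney statistics; the names `uA` and `uB` are introduced here for illustration, and the double-`argsort` ranking assumes there are no ties.

```python
import numpy as np

xA = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
xB = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]
nA, nB = len(xA), len(xB)
N = nA + nB

# Rank of each value in the combined sample (1 = smallest); no ties assumed
ranks = np.argsort(np.argsort(xA + xB)) + 1
wA = int(ranks[:nA].sum())
wB = int(ranks[nA:].sum())

# The ranks 1..N always sum to N * (N + 1) / 2
assert wA + wB == N * (N + 1) // 2

# Mann-Whitney statistics: rank sum minus its smallest possible value
uA = wA - nA * (nA + 1) // 2
uB = wB - nB * (nB + 1) // 2
print("wA = %d, wB = %d, uA = %d, uB = %d" % (wA, wB, uA, uB))
```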

In [None]:
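# Compare wA with its expected value under H0 (wA > wB is only a valid check when nA == nB)
wA_expected = nA * (nA + nB + 1) / 2
if wA > wA_expected:
    print("Find the p-value from the upper tail of the distribution.")
else:
    print("Find the p-value from the lower tail of the distribution.")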

In [None]:
# From the Wilcoxon rank-sum table, for nA = 7, nB = 9, and wA = 75,
# the one-tailed probability is between 0.05 and 0.1
prob = 0.114 / 2  # One-tailed probability, taken from Ch10.wilcoxon.pdf
pval = 2 * prob  # Two-sided p-value
print("p-val = %.3f" % pval)
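
The table lookup can also be reproduced by brute force. Under the null hypothesis every choice of nA ranks out of 1..N is equally likely for sample A, so the upper-tail probability is the fraction of such choices whose rank sum is at least the observed wA. The following is a minimal sketch that assumes no ties and hard-codes the values found above; the names `count_total`, `count_upper`, `prob_upper`, and `pval_exact` are illustrative.

```python
from itertools import combinations

nA, nB, wA = 7, 9, 75  # Values computed earlier in this notebook
N = nA + nB

count_total = 0  # Number of possible rank assignments for sample A
count_upper = 0  # Assignments with a rank sum at least as large as wA
for combo in combinations(range(1, N + 1), nA):
    count_total += 1
    if sum(combo) >= wA:
        count_upper += 1

prob_upper = count_upper / count_total
pval_exact = min(1.0, 2 * prob_upper)  # Two-sided p-value by doubling the tail
print("upper-tail prob = %.3f, p-val = %.3f" % (prob_upper, pval_exact))
```

Enumeration is feasible here because there are only 11440 possible assignments; for larger samples a normal approximation or a library routine is the practical choice.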

## Using scipy to perform the same test

In [None]:
from scipy.stats import ranksums
res = ranksums(xA, xB)
print("p-val = %.3f" % res.pvalue)
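
`ranksums` relies on a normal approximation, so its p-value can differ somewhat from the tabulated value for samples this small. As a hedged alternative, `scipy.stats.mannwhitneyu` can compute an exact p-value for the equivalent Mann-Whitney test; the sketch below assumes SciPy 1.7 or later, where the `method` argument is available, and the name `res_exact` is illustrative.

```python
from scipy.stats import mannwhitneyu

xA = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
xB = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]

# method="exact" uses the exact null distribution (appropriate when there are no ties)
res_exact = mannwhitneyu(xA, xB, alternative="two-sided", method="exact")
print("U = %.1f, p-val = %.3f" % (res_exact.statistic, res_exact.pvalue))
```

The reported statistic is the Mann-Whitney U for the first sample, which equals the rank sum of `xA` minus nA(nA+1)/2.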