-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Description
Describe your issue.
This is not really a bug but an invitation for improvement and simplification. Imho the _MWU class in _mannwhitneyu.py may be abolished.
In pmf_recursive() or pmf_iterative() a 3-dimensional array is used, which stores fmnk s for all combinations of m, n, k i.e. stores k-distributions for all values of m and n.
This is not necessary as formula (5) of DiBucchianico:Combinatorics_computer_algebra_and_wilcoxon_mann_whitney_test
allows for recursive updating from all f(m)nk to f(m+1)nk by adding a k-shifted f(m)(n-1)k to f(m)nk.
An example: f2,0,k = (1,0,..) f2,1,k = (1,1,1,0,..) f2,2,k=(1,1,2,1,1,0,..),..
Now f3,1,k=(1,1,1,1,0,..) = f2,1,k + rightshift(f3,0,k|3) and f3,2,k=(1,1,2,2,2,1,1,0,..) = f2,2,k + rightshift( f3,1,k |3).
This is faster and requires only a 2-dimensional array. This way fmnk may be computed for higher m and n values.
Code for this is below, free for your adaptation.
Reproducing Code Example
import numpy as np
from scipy import special
# Computes f_mnk distribution needed for exact Mann-Whitney U statistics using formula (5) of
# DiBucchianico (1996): "Combinatorics, computer algebra and Wilcoxon-Mann-Whitney test".
#
# We use unsigned quads with max value of 2**64-1 = 18446744073709551615
# as long as a safety test to check if math_comb(m+n,n) > 2**64-1 is fulfilled
#
def fmnk(m, n, k=None): # if k is None, return full dist: (fmn0, fmn1,.., fmn(m*n)), else only (fmn0, fmn1,.., fmnk)
dtyp = np.longdouble if special.binom(n+m, n) > 2**64 else np.uint64 # overflow protection: sum of all densities fits -> every entry fits
n, m = sorted((m, n)) # make n the smaller of both (n requires memory), we iterate through m
if k is None: # v-- original: l = m*n+1 # return array with full fmn distribution for k = 0,..,m*n
l = (m*n+1)//2 if m*n % 2 else (m*n)//2 + 1 # we compute only half of all distributions (including center), for symmetry reasons
else:
l = k+1 # compute and return only k-densities for k = 0,..,k
fm = np.zeros((n+1, l), dtype=dtyp)
fm[:,0] = 1
for j in range(1, m+1): # update all fjik until fmik in place: j=1,..., m
for i in range(1, n+1): # fj0k remains (1,0,..0); fjik will be updated for i >= 1
fm[i,j:]+=fm[i-1,:-j] # add right shifted (by m value which is j) previous (n-1-th) k-distributrion
if k is not None: # this is formula (5) of DiBucchianico:Combinatorics_computer_algebra_and_wilcoxon_mann_whitney_test
return fm[n,:] # return f_mn which is k-density for k = 0,..,k
r = np.zeros((m*n+1), dtype=dtyp) # k was None and we had only computed half of the k distribution
r[:l] = fm[n,:l]
print(f'')
r[l:] = fm[n,l-1::-1] if m*n % 2 else fm[n,l-2::-1] # so we return a new array and fill it symmetrically
return r # return full f_mn which is k-density for k = 0,..,m*n
print(fmnk(3,3))
Error message
[1 1 2 3 3 3 3 2 1 1]
SciPy/NumPy/Python version information
1.9.3 1.23.4 sys.version_info(major=3, minor=10, micro=8, releaselevel='final', serial=0)