ENH: The exact p-value calculation in stats.cramervonmises_2samp can be improved #15685

Closed
fiveseven-lambda opened this issue Mar 2, 2022 · 31 comments · Fixed by #15709
Labels: Documentation (issues related to the SciPy documentation; also check https://github.com/scipy/scipy.org), enhancement (a new feature or improvement), scipy.stats

@fiveseven-lambda
Contributor

Is your feature request related to a problem? Please describe.

We have the stats.cramervonmises_2samp() function for the two-sample Cramér-von Mises test, but the exact p-value calculation takes a long time. The comment in the source code says:

The exact calculation will be very slow even for moderate sample sizes as the number of combinations increases rapidly with the size of the samples. If method=='auto', the exact approach is used if both samples contain less than 10 observations, otherwise the asymptotic distribution is used.

In fact, the following code runs in about 3 seconds on my computer. (The actual code uses the exact approach if both samples contain 10 or fewer observations, not fewer than 10 as stated above.)

import numpy as np
from scipy import stats
x = np.random.normal(0, 1, 10)
y = np.random.normal(0, 1, 10)
print(stats.cramervonmises_2samp(x, y))

Describe the solution you'd like.

The following paper seems to show a more efficient algorithm to calculate the exact p-value: Xiao, Y., Gordon, A., & Yakovlev, A. (2006). A C++ Program for the Cramér-Von Mises Two-Sample Test. Journal of Statistical Software, 17(8), 1–15. https://doi.org/10.18637/jss.v017.i08

Implementing this would significantly improve the calculation time of stats.cramervonmises_2samp().

Describe alternatives you've considered.

No response

Additional context (e.g. screenshots, GIFs)

No response

@fiveseven-lambda added the enhancement label on Mar 2, 2022
@tupui
Member

tupui commented Mar 2, 2022

Hi @fiveseven-lambda, thanks for reporting.

The exact calculation will be very slow even for moderate sample sizes as the number of combinations increases rapidly with the size of the samples. If method=='auto', the exact approach is used if both samples contain less than 10 observations, otherwise the asymptotic distribution is used.

Looking at the code itself, it should read less than or equal to 10. We should update the documentation here.

The following paper seems to show a more efficient algorithm to calculate the exact p-value: Xiao, Y., Gordon, A., & Yakovlev, A. (2006). A C++ Program for the Cramér-Von Mises Two-Sample Test. Journal of Statistical Software, 17(8), 1–15. doi.org/10.18637/jss.v017.i08

Implementing this would significantly improve the calculation time of stats.cramervonmises_2samp().

Are you proposing to do this? Before any work is done, my suggestion would be to time the current code. Maybe there are parts that can be moved to Cython/Pythran before thinking about another algorithm.

@tupui added the scipy.stats and Documentation labels on Mar 2, 2022
@fiveseven-lambda
Contributor Author

Thank you for your advice. I am now trying what you suggested; here is the current progress:

Timing the Whole Code

The function _pval_cvm_2samp_exact() is called inside cramervonmises_2samp() to calculate the exact p-value, and the complete definition of _pval_cvm_2samp_exact() is as follows:

import numpy as np
from itertools import combinations

# L673-L686 in stats/_hypotests.py
def _all_partitions(nx, ny):
    """
    Partition a set of indices into two fixed-length sets in all possible ways
    Partition a set of indices 0 ... nx + ny - 1 into two sets of length nx and
    ny in all possible ways (ignoring order of elements).
    """
    z = np.arange(nx+ny)
    for c in combinations(z, nx):
        x = np.array(c)
        mask = np.ones(nx+ny, bool)
        mask[x] = False
        y = z[mask]
        yield x, y

# L1265-L1288 in stats/_hypotests.py
def _pval_cvm_2samp_exact(s, nx, ny):
    """
    Compute the exact p-value of the Cramer-von Mises two-sample test
    for a given value s (float) of the test statistic by enumerating
    all possible combinations. nx and ny are the sizes of the samples.
    """
    rangex = np.arange(nx)
    rangey = np.arange(ny)

    us = []

    # x and y are all possible partitions of ranks from 0 to nx + ny - 1
    # into two sets of length nx and ny
    # Here, ranks are from 0 to nx + ny - 1 instead of 1 to nx + ny, but
    # this does not change the value of the statistic.
    for x, y in _all_partitions(nx, ny):
        # compute the statistic
        u = nx * np.sum((x - rangex)**2)
        u += ny * np.sum((y - rangey)**2)
        us.append(u)

    # compute the values of u and the frequencies
    u, cnt = np.unique(us, return_counts=True)
    return np.sum(cnt[u >= s]) / np.sum(cnt)

I measured the time it takes to evaluate _pval_cvm_2samp_exact(s, nx, ny) where nx = ny = 10 and s = 7000, 7500, 8000 with the following code:

import time

s = 7000 # edit here to specify the value of s

time_sum = 0
for _ in range(10):
    begin = time.perf_counter()
    _pval_cvm_2samp_exact(s, 10, 10)
    end = time.perf_counter()
    diff = end - begin
    print('%.3f' % diff)
    time_sum += diff
# the average of 10 trials
print('%.3f' % (time_sum / 10))

The result (times in seconds) was:

s 1 2 3 4 5 6 7 8 9 10 average
7000 1.949 1.930 1.904 1.882 1.877 1.886 1.885 1.923 1.956 1.947 1.914
7500 1.951 1.964 1.973 2.018 1.966 1.961 1.946 1.951 1.945 1.956 1.963
8000 2.009 1.987 2.008 1.983 1.955 1.974 1.983 1.983 1.969 1.976 1.983

It takes about 2 seconds for all cases, and this is what I want to improve.

Since the value of s seems to have little effect on the computation time, I only measured the case s = 8000 below.

Timing the Slow Part

Next, I measured the part

    for x, y in _all_partitions(nx, ny):
        u = nx * np.sum((x - rangex)**2)
        u += ny * np.sum((y - rangey)**2)
        us.append(u)

by rewriting the code as

# the definition of _all_partitions() omitted

# the sum of 10 trials
time_sum = 0

def _pval_cvm_2samp_exact(s, nx, ny):
    global time_sum

    rangex = np.arange(nx)
    rangey = np.arange(ny)
    us = []

    begin = time.perf_counter()

    for x, y in _all_partitions(nx, ny):
        u = nx * np.sum((x - rangex)**2)
        u += ny * np.sum((y - rangey)**2)
        us.append(u)

    end = time.perf_counter()
    diff = end - begin
    print('%.3f' % diff)
    time_sum += diff

    u, cnt = np.unique(us, return_counts=True)
    return np.sum(cnt[u >= s]) / np.sum(cnt)

for _ in range(10):
    _pval_cvm_2samp_exact(8000, 10, 10)

# the average of 10 trials
print('%.3f' % (time_sum / 10))

Then I got

s 1 2 3 4 5 6 7 8 9 10 average
8000 1.792 1.814 1.798 1.818 1.806 1.816 1.861 1.847 1.774 1.801 1.813

These four lines take 1.813 / 1.983 = 91.4% of the total computation time.

@tupui
Member

tupui commented Mar 2, 2022

Thank you for the details @fiveseven-lambda! Note that you would have an easier debugging session using tools like https://github.com/pyutils/line_profiler.
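
For reference, a minimal line_profiler workflow looks like this (a sketch only, assuming the package is installed; the file name is illustrative, and kernprof injects the profile decorator at run time):

# timing_demo.py -- minimal line_profiler sketch
# run with: kernprof -l -v timing_demo.py

@profile            # provided by kernprof; no import needed
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i   # per-line hit counts and timings are reported here
    return total

if __name__ == '__main__':
    slow_sum(10_000_000)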

@fiveseven-lambda
Contributor Author

@tupui Thank you. I didn't know about such tools. I will use it from the next comment onward, though in this comment I keep the manual timing so that the conditions match those above.

Trying Cython

Now I try using Cython in the following steps:

  1. Copy the definitions of _all_partitions() and _pval_cvm_2samp_exact() into a file named _pval_cvm_2samp_exact.pyx.
  2. Create setup.py and write:
    from setuptools import setup
    from Cython.Build import cythonize
    
    setup(
        ext_modules = cythonize("_pval_cvm_2samp_exact.pyx")
    )
  3. Run python setup.py build_ext --inplace.
  4. Run
    from _pval_cvm_2samp_exact import _pval_cvm_2samp_exact
    
    import time
    
    time_sum = 0
    for _ in range(10):
        begin = time.perf_counter()
        _pval_cvm_2samp_exact(8000, 10, 10)
        end = time.perf_counter()
        diff = end - begin
        print('%.3f' % diff)
        time_sum += diff
    # the average of 10 trials
    print('%.3f' % (time_sum / 10))

Then, the result was:

s 1 2 3 4 5 6 7 8 9 10 average
8000 1.996 1.984 1.988 1.986 2.001 1.990 1.996 1.991 2.036 2.069 2.004

Simply compiling with Cython did not seem to improve it much.

@tupui
Member

tupui commented Mar 2, 2022

If you just copied the functions without edits, it will not do much, I am afraid. Going to Cython means that you have to rewrite the code using C-like syntax, so no use of NumPy functions. If you want to do this, you can try Pythran, which has some support for NumPy. We have some examples in stats using Pythran. See https://pythran.readthedocs.io/en/latest/SUPPORT.html; it seems like everything needed is there.
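
To illustrate what "C-like" means here, a Cython rewrite of the inner statistic computation could look like the following sketch (my illustration only, with a hypothetical function name): typed C variables and a plain loop instead of NumPy calls.

# cython: language_level=3
def squared_rank_sum(long[:] x):
    # computes nx * sum((x[k] - k)**2) with C-typed variables
    cdef Py_ssize_t k, nx = x.shape[0]
    cdef long d, u = 0
    for k in range(nx):
        d = x[k] - k
        u += d * d
    return nx * u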

@fiveseven-lambda
Contributor Author

@tupui Thank you. I tried Pythran, but failed:

Trying Pythran (failed)

I added Pythran export annotations before the function definitions:

# pythran export _all_partitions(int, int)
def _all_partitions(nx, ny):
# pythran export _pval_cvm_2samp_exact(int, int, int)
def _pval_cvm_2samp_exact(s, nx, ny):

I ran pythran _pval_cvm_2samp_exact.py and got an error:

WARNING: Compilation error, trying hard to find its origin...
WARNING: Nop, I'm going to flood you with C++ errors!
CRITICAL: Cover me Jack. Jack? Jaaaaack!!!!
E: error: Command "gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -flto -ffat-lto-objects -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -flto -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -flto -fPIC -DENABLE_PYTHON_MODULE -D__PYTHRAN__=3 -DPYTHRAN_BLAS_BLAS -I/usr/lib/python3.10/site-packages/pythran -I/usr/lib/python3.10/site-packages/numpy/core/include -I/usr/local/include -I/usr/include/python3.10 -c /tmp/tmpe8_rnn4k.cpp -o /tmp/tmpbwpjnz5o/tmp/tmpe8_rnn4k.o -std=c++11 -fno-math-errno -fvisibility=hidden -fno-wrapv -Wno-unused-function -Wno-int-in-bool-context -Wno-unknown-warning-option" failed with exit status 1

@tupui
Member

tupui commented Mar 2, 2022

I suppose it's because of yield. I don't think Pythran supports this Python keyword, so you would need to avoid the generator and return a list or array instead.

@fiveseven-lambda
Contributor Author

Thank you @tupui. You were right:

Trying Pythran (successful)

I rewrote _pval_cvm_2samp_exact.py without using yield:

import numpy as np
from itertools import combinations

# pythran export _pval_cvm_2samp_exact(int, int, int)
def _pval_cvm_2samp_exact(s, nx, ny):
    rangex = np.arange(nx)
    rangey = np.arange(ny)

    us = []

    z = np.arange(nx+ny)
    for c in combinations(z, nx):
        x = np.array(c)
        mask = np.ones(nx+ny, bool)
        mask[x] = False
        y = z[mask]
        # compute the statistic
        u = nx * np.sum((x - rangex)**2)
        u += ny * np.sum((y - rangey)**2)
        us.append(u)

    u, cnt = np.unique(us, return_counts=True)
    return np.sum(cnt[u >= s]) / np.sum(cnt)

Then I ran pythran _pval_cvm_2samp_exact.py and got:

1 2 3 4 5 6 7 8 9 10 average
0.031 0.030 0.030 0.031 0.031 0.032 0.031 0.030 0.031 0.030 0.031

It works! Thank you.

Another Problem of the Current Code

As mentioned above, stats.cramervonmises_2samp(x, y) uses the asymptotic method if either len(x) or len(y) is greater than 10, which means that the returned p-value can differ from the true p-value when len(x) > 10 or len(y) > 10, especially when len(x) and len(y) are close to 10. In statistical testing, an asymptotic method should be used only when the sample size (i.e. the number of observations) is large enough that the error in the result is negligible.

But the following result shows that the current threshold of 10 is too small for stats.cramervonmises_2samp().

Running this code

from scipy import stats
x = [1, 2, 3, 6, 7, 8, 11, 12, 14, 15, 16]
y = [4, 5, 9, 10, 13, 17, 18, 19, 20, 21, 22]
# auto (asymptotic)
print(stats.cramervonmises_2samp(x, y))
# exact
print(stats.cramervonmises_2samp(x, y, method = 'exact'))

we have

CramerVonMisesResult(statistic=0.4690082644628095, pvalue=0.04778422596827481)
CramerVonMisesResult(statistic=0.4690082644628095, pvalue=0.05102405334603477)

The difference between the two results (about 0.003) is not negligible, because the p-value crosses 0.05, which is widely used as the significance level.

It seems the threshold of 10 has to be raised, but I have no idea what value suffices.

@tupui
Member

tupui commented Mar 2, 2022

Great, thanks for pushing this! It sounds like you have a case to make a PR (this function would fit in _hypotests_pythran.py).

As for the low threshold, maybe it was just chosen for performance reasons. We could do this in two steps: make the performance improvement and quick-fix the doc, then change the threshold. Or if you find a previous PR/issue on this code, maybe it explains why it was set to 10.

@fiveseven-lambda
Contributor Author

I found a conversation in this PR:

Also, there are more efficient methods for computing this, right? Prefer to save that for future work? This gives a short summary of some methods.

yes, the calculation is very similar to the KS test exact method and there are more efficient path counting methods comparable to the code implemented by @pvanmulbregt . I just wanted to add a simple way to compute the exact values for small samples.

OK.

It seems that a larger threshold or a more efficient method simply wasn't their goal at the time.

@tupui
Member

tupui commented Mar 2, 2022

OK, but then it seems like we can update the value as we want until performance becomes a concern again. During the review we will ping both Matt and Christoph.

@mdhaber
Contributor

mdhaber commented Mar 2, 2022

Yes, the current exact calculation is equivalent to permutation_test. It is not really worth moving that to Cython right now. At best, it might allow you to calculate the null distribution with one or two more observations, but the number of permutations grows too quickly as the sample size increases for a constant speedup to help much. The intent is to add efficient methods for calculating exact distributions for all test statistics where possible, but there are lots of other things to work on, and reviewer time is limited, so not all are done yet.

Slow speed is the only reason why the threshold was set so low. Check the literature for a recommendation about where to set the cutoff. If you can't find one, do some experiments to see when you think the error between the two is small enough.

I'd suggest starting with pure Python.
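
One way to run such an experiment (an illustrative sketch only; note that each exact call takes about 2 seconds at this sample size) is to compare the two methods over random draws:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10
errors = []
for _ in range(20):  # each exact call takes ~2 s at n = 10
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    p_exact = stats.cramervonmises_2samp(x, y, method='exact').pvalue
    p_asymp = stats.cramervonmises_2samp(x, y, method='asymptotic').pvalue
    errors.append(abs(p_exact - p_asymp))
print(max(errors))  # worst observed error at this sample size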

@tupui
Member

tupui commented Mar 2, 2022

@mdhaber I don't think I understand. Going to Pythran is making the code go from 2s to 30ms. Why do you propose to stick with pure Python here?

@mdhaber
Contributor

mdhaber commented Mar 2, 2022

I think increasing the number of observations per sample by 10 increases the number of computations by a factor of about a million. I don't think we can get very far with this brute force algorithm before Pythran gets slow.
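
A quick check of that growth rate (my own arithmetic, using math.comb):

import math

# number of partitions enumerated by the brute-force method
print(math.comb(20, 10))                       # nx = ny = 10:       184756
print(math.comb(40, 20))                       # nx = ny = 20: 137846528820
print(math.comb(40, 20) // math.comb(20, 10))  # ratio ~ 7.5e5, about a million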

@tupui
Member

tupui commented Mar 2, 2022

Ah yes, indeed... So you advise not taking this, in favour of another algorithm?

@tupui
Member

tupui commented Mar 2, 2022

@fiveseven-lambda, I am summarising a brief discussion @mdhaber and I had.

Matt makes a good point that the current algorithm is not scalable due to the combinatorial part (I skipped over this too quickly). Yes, migrating this part to Pythran would still add some speed, but at the cost of writing and reviewing it, and most importantly maintaining it, because it's no longer pure Python and opens the door to new potential issues.

What would be preferable instead would be to implement the new algorithm you proposed in pure Python (at least at first); I did not do a literature review, so I personally cannot say whether there would be better approaches. If you do that, first of all thank you! Second, please keep it as close to the paper as possible at first (no need to be creative with naming or to look for tricks). mannwhitneyu should be a good example of the desired end state.

@fiveseven-lambda
Contributor Author

I am reading the paper, and so far I have implemented algorithm 1 on page 6. Further improvements are described in the paper's later sections.

So the function _pval_cvm_2samp_exact_new() defined as

import math
import itertools

def _pval_cvm_2samp_exact_new(s, m, n):
    # [1] Y. Xiao, A. Gordon, and A. Yakovlev, “A C++ Program for the Cramér-Von Mises Two-Sample Test”, J. Stat. Soft., vol. 17, no. 8, pp. 1–15, Dec. 2006.
    # [2] T. W. Anderson "On the Distribution of the Two-Sample Cramer-von Mises Criterion," The Annals of Mathematical Statistics, Ann. Math. Statist. 33(3), 1148-1159, (September, 1962)

    # compute T (eq. 9 in [2])
    # same as the code in cramervonmises_2samp(), this can be omitted if cramervonmises_2samp() passes the value of t instead of u to _pval_cvm_2samp_exact()
    k, N = m * n, m + n
    t = s / (k * N) - (4 * k - 1)/(6 * N)

    # [1, p. 3]
    l = math.lcm(m, n)
    # [1, p. 4], below eq. 3
    a = l // m
    b = l // n
    # eq. 2 in [1]
    zeta = t * (m + n) ** 2 * l ** 2 / (m * n) - 1e-6
    # Each dictionary g[u][v] is the frequency table of $g_{u, v}^+$ defined in [1, p. 6]
    g = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    for u in range(n + 1):
        for v in range(m + 1):
            if u == 0:
                # eq. 13 in [1]
                g[u][v][a * a * v * (v + 1) * (2 * v + 1) // 6] = 1
                continue
            if v == 0:
                # eq. 12 in [1]
                g[u][v][b * b * u * (u + 1) * (2 * u + 1) // 6] = 1
                continue
            # eq. 11 in [1]
            d = a * v - b * u
            for (value, frequency) in itertools.chain(g[u - 1][v].items(), g[u][v - 1].items()):
                g[u][v].setdefault(value + d * d, 0)
                g[u][v][value + d * d] += frequency
    return sum(frequency for (value, frequency) in g[n][m].items() if value >= zeta) / math.comb(m + n, m)

is pretty fast:

1 2 3 4 5 6 7 8 9 10 average
0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001

But conversions between several statistics occur:

# from s (or u in cramervonmises_2samp()) to t
t = s / (k * N) - (4 * k - 1) / (6 * N)
# from t to zeta
zeta = t * (m + n) ** 2 * l ** 2 / (m * n) - 1e-6

and these cause floating-point error. The current code subtracts a small number (1e-6) from zeta to avoid a wrong result, but maybe the calculation can be done without floating-point numbers at all.

@tupui
Member

tupui commented Mar 3, 2022

Looks like a good start. Does this scale? The data structure g seems pretty huge from a quick test.

@fiveseven-lambda
Contributor Author

The computation time for larger m and n was as follows:

m n s time (seconds)
10 10 0 0.001
15 15 0 0.011
20 20 0 0.053
25 25 0 0.178
30 30 0 0.476

I also counted the total size of the dicts in g by adding a line:

print(sum(sum(len(table) for table in row) for row in g))

result:

m n total size of g
10 10 4428
15 15 34376
20 20 149200
25 25 464279
30 30 1170635

We only have to count the values greater than or equal to zeta, so I think the dicts can be replaced with plain lists.

@mdhaber
Contributor

mdhaber commented Mar 3, 2022

Looks like a good start @fiveseven-lambda. I haven't checked too carefully, but one thing to think about is whether you can vectorize at least one of those for loops.

@fiveseven-lambda
Contributor Author

Sorry I'm late. Here is the current progress:

Eliminating floating-point arithmetic

I took seriously the fact that a floating-point error occurs while converting from u to zeta, even though u and zeta are both integers. So I went back to the papers and derived an exact formula for the conversion: substituting t = u / (mn * N) - (4 * mn - 1) / (6 * N) (eq. 9 in [2]) into zeta = t * N ** 2 * l ** 2 / mn (eq. 2 in [1]) gives zeta = l ** 2 * N * (6 * u - mn * (4 * mn - 1)) // (6 * mn ** 2). The following program checks that it is correct:

import numpy as np
import math

for N in range(2, 15):
    for xy in range(1, (1 << N) - 1):
        # array of 1 (represents X) and 0 (represents Y) of length N
        xy = xy >> np.arange(N) & 1
        # number of X's
        m = np.count_nonzero(xy)
        # number of Y's
        n = N - m
        # lcm
        l = math.lcm(m, n)
        a = l // m
        b = l // n
        # empirical distribution functions
        f = np.insert(xy.cumsum(), 0, 0)
        g = np.arange(N + 1) - f
        # zeta, must be an integer
        zeta = sum((a * f - b * g) ** 2)
        # ranks of x in xy
        r = np.arange(N)[xy == 1]
        # ranks of y in xy
        s = np.arange(N)[xy == 0]
        # u, must be an integer
        u = m * sum((r - np.arange(m)) ** 2) + n * sum((s - np.arange(n)) ** 2)
        # conversion from u to zeta
        mn = m * n
        zeta2 = l ** 2 * N * (6 * u - mn * (4 * mn - 1)) / (6 * mn ** 2)
        if zeta != zeta2:
            print(m, n, xy, zeta, u, zeta2)

This shows that the value of zeta2, calculated as l ** 2 * N * (6 * u - mn * (4 * mn - 1)) / (6 * mn ** 2), equals zeta. Moreover, zeta must be an integer because it is defined as sum((a * f - b * g) ** 2), where a, b, f and g are all integers or integer arrays. Therefore, the division / (6 * mn ** 2) can be replaced by the exact integer division // (6 * mn ** 2).

Now we don't have to worry about floating-point errors.
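
For reference, the integer-only conversion can be packaged as a small standalone helper (a sketch of the formula above; the name u_to_zeta is mine):

import math

def u_to_zeta(u, m, n):
    # exact integer conversion from the statistic u to zeta;
    # the floor division is exact because zeta is an integer by construction
    l = math.lcm(m, n)
    mn = m * n
    return l ** 2 * (m + n) * (6 * u - mn * (4 * mn - 1)) // (6 * mn ** 2)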

Faster algorithm described in the paper

The paper (Xiao, Y., Gordon, A., & Yakovlev, A. 2006) seems to introduce a faster algorithm (algorithm 2) than the one I implemented two days ago (algorithm 1). But I cannot insist that we use algorithm 2, because it involves a convolution and doesn't seem much faster than algorithm 1 when m and n are not large (and the asymptotic method will do when they are large). So I think the best approach is to improve the current algorithm 1 code (for example, by vectorizing some loops, as @mdhaber suggested).

@fiveseven-lambda
Contributor Author

I have refactored the code. Just replacing _pval_cvm_2samp_exact() with this _pval_cvm_2samp_exact_new() allows us to raise the threshold to at least 20: _pval_cvm_2samp_exact_new(146600, 20, 20) was calculated in 0.005 seconds.

import math
import numpy as np

def _pval_cvm_2samp_exact_new(s, m, n):
    # [1] Y. Xiao, A. Gordon, and A. Yakovlev, “A C++ Program for the Cramér-Von Mises Two-Sample Test”, J. Stat. Soft., vol. 17, no. 8, pp. 1–15, Dec. 2006.
    # [2] T. W. Anderson "On the Distribution of the Two-Sample Cramer-von Mises Criterion," The Annals of Mathematical Statistics, Ann. Math. Statist. 33(3), 1148-1159, (September, 1962)

    # [1, p. 3]
    l = math.lcm(m, n)
    # [1, p. 4], below eq. 3
    a = l // m
    b = l // n
    # eq. 9 in [2] and eq. 2 in [1]
    mn = m * n
    zeta = l ** 2 * (m + n) * (6 * s - mn * (4 * mn - 1)) // (6 * mn ** 2)
    # each g[u][v] is the cumulative frequency table of $g_{u, v}^+$ defined in [1, p. 6]
    g = [[np.zeros(zeta, int) for _ in range(m + 1)] for _ in range(n + 1)]
    g[0][0] = np.ones(zeta, int)
    for u in range(n + 1):
        for v in range(m + 1):
            # calculate g[u][v] by g[u - 1][v] and g[u][v - 1] with eq. 11 in [1]
            d = a * v - b * u
            if d * d >= zeta:
                continue
            if u > 0:
                g[u][v][d * d:] += g[u - 1][v][:zeta - d * d]
            if v > 0:
                g[u][v][d * d:] += g[u][v - 1][:zeta - d * d]
    # the number of all combinations
    total = math.comb(m + n, m)
    # >= zeta is the complement of <= zeta - 1
    return (total - g[n][m][zeta - 1]) / total
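
As a quick sanity check (my own snippet, assuming both functions from this thread are defined), the two implementations can be printed side by side on inputs where the brute force is still feasible:

# compare the brute-force and the new exact p-values at nx = ny = 10
for s in (7000, 7500, 8000):
    print(s,
          _pval_cvm_2samp_exact(s, 10, 10),      # brute force, ~2 s per call
          _pval_cvm_2samp_exact_new(s, 10, 10))  # Xiao et al., milliseconds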

@fiveseven-lambda
Contributor Author

fiveseven-lambda commented Mar 7, 2022

There was a problem in the code I wrote two days ago. It uses a 3-dimensional array of size zeta * (m + 1) * (n + 1), but the value of zeta can be very large when lcm(m, n) is large (see Table 1). On the other hand, the previous version uses (m + 1) * (n + 1) dicts, and the size of each dict is much smaller than zeta, even when m and n are coprime (see Table 2).

Table 1: The maximum value of zeta for each m, n
m\n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 5 14 30 55 91 140 204 285 385 506 650 819 1015 1240 1496 1785 2109 2470 2870
2 5 6 65 34 245 100 609 220 1221 410 2145 686 3445 1064 5185 1560 7429 2190 10241 2970
3 14 65 19 350 620 111 1505 2156 330 3965 5159 730 8216 10115 1365 14744 17510 2289 24035 27830
4 30 34 350 44 1230 490 2926 260 5694 1890 9790 776 15470 4746 22990 1720 32606 9570 44574 3220
5 55 245 620 1230 85 3355 4970 7020 9555 505 16280 20570 25545 31255 1510 45080 53295 62445 72580 3350
6 91 100 111 490 3355 146 7735 2716 1635 4840 24871 870 38779 11830 6335 16984 80155 2604 108775 31330
7 140 609 1505 2926 4970 7735 231 15820 21336 27965 35805 44954 55510 1379 81235 96600 113764 132825 153881 177030
8 204 220 2156 260 7020 2716 15820 344 29580 9660 49324 3860 76076 23100 110860 2056 154700 45084 208620 14980
9 285 1221 330 5694 9555 1635 21336 29580 489 51585 65670 9114 100815 122199 16260 173400 203541 2925 273714 314070
10 385 410 3965 1890 505 4840 27965 9660 51585 670 85085 26510 130065 39340 7525 55640 260865 75810 349885 4010
11 506 2145 5159 9790 16280 24871 35805 49324 65670 85085 891 134090 164164 198275 236665 279576 327250 379929 437855 501270
12 650 686 730 776 20570 870 44954 3860 9114 26510 134090 1156 203450 61334 32490 21560 403274 12990 538346 38480
13 819 3445 8216 15470 25545 38779 55510 76076 100815 130065 164164 203450 1469 298935 355810 419224 489515 567021 652080 745030
14 1015 1064 10115 4746 31255 11830 1379 23100 122199 39340 198275 61334 298935 1834 427315 125720 586551 169680 779779 222530
15 1240 5185 1365 22990 1510 6335 81235 110860 16260 7525 236665 32490 355810 427315 2255 596440 694960 89265 922165 42070
16 1496 1560 14744 1720 45080 16984 96600 2056 173400 55640 279576 21560 419224 125720 596440 2736 815320 235416 1079960 76920
17 1785 7429 17510 32606 53295 80155 113764 154700 203541 260865 327250 403274 489515 586551 694960 815320 3281 1094205 1253886 1427830
18 2109 2190 2289 9570 62445 2604 132825 45084 2925 75810 379929 12990 567021 169680 89265 235416 1094205 3894 1444665 410970
19 2470 10241 24035 44574 72580 108775 153881 208620 273714 349885 437855 538346 652080 779779 922165 1079960 1253886 1444665 4579 1879670
20 2870 2970 27830 3220 3350 31330 177030 14980 314070 4010 501270 38480 745030 222530 42070 76920 1427830 410970 1879670 5340
Table 2: The maximum size of dicts
m\n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11
2 2 2 6 6 11 8 19 15 28 17 38 27 51 29 64 43 78 45 97 62
3 2 6 4 17 18 18 33 59 26 89 80 65 115 175 71 236 194 146 244 378
4 3 6 17 8 43 42 94 46 159 130 241 69 353 263 500 208 643 458 838 197
5 3 11 18 43 14 108 95 207 169 102 268 529 379 753 140 1021 703 1337 893 466
6 4 8 18 42 108 25 237 197 188 190 687 195 1010 397 584 936 1885 261 2432 1540
7 4 19 33 94 95 237 42 469 372 803 589 1247 845 332 1171 2397 1537 3141 1975 3979
8 5 15 59 46 207 197 469 63 867 640 1396 533 2063 1358 2869 521 3808 2368 4900 1611
9 5 28 26 159 169 188 372 867 92 1480 1043 885 1516 3235 773 4398 2738 773 3497 7249
10 6 17 89 130 102 190 803 640 1480 130 2378 1597 3479 1214 1140 3027 6442 2091 8250 1093
11 6 38 80 241 268 687 589 1396 1043 2378 177 3644 2362 5159 3263 6977 4275 9086 5451 11484
12 7 27 65 69 529 195 1247 533 885 1597 3644 233 5309 3362 2702 2374 9719 2086 12450 2001
13 7 51 115 353 379 1010 845 2063 1516 3479 2362 5309 305 7537 4698 10178 6147 13208 7831 16651
14 8 29 175 263 753 397 332 1358 3235 1214 5159 3362 7537 385 10407 6314 13722 4278 17537 10245
15 8 64 71 500 140 584 1171 2869 773 1140 3263 2702 4698 10407 478 14005 8401 6462 10678 4971
16 9 43 236 208 1021 936 2397 521 4398 3027 6977 2374 10178 6314 14005 591 18446 10897 23532 7052
17 9 78 194 643 703 1885 1537 3808 2738 6442 4275 9719 6147 13722 8401 18446 715 23895 13969 30000
18 10 45 146 458 1337 261 3141 2368 773 2091 9086 2086 13208 4278 6462 10897 23895 854 30407 17600
19 10 97 244 838 893 2432 1975 4900 3497 8250 5451 12450 7831 17537 10678 23532 13969 30407 1016 38185
20 11 62 378 197 466 1540 3979 1611 7249 1093 11484 2001 16651 10245 4971 7052 30000 17600 38185 1192

The previous one (using dicts) was better:

import math
from collections import Counter

def _pval_cvm_2samp_exact_new(s, m, n):
    # [1] Y. Xiao, A. Gordon, and A. Yakovlev, “A C++ Program for the Cramér-Von Mises Two-Sample Test”, J. Stat. Soft., vol. 17, no. 8, pp. 1–15, Dec. 2006.
    # [2] T. W. Anderson "On the Distribution of the Two-Sample Cramer-von Mises Criterion," The Annals of Mathematical Statistics, Ann. Math. Statist. 33(3), 1148-1159, (September, 1962)

    # [1, p. 3]
    l = math.lcm(m, n)
    # [1, p. 4], below eq. 3
    a = l // m
    b = l // n
    # eq. 9 in [2] and eq. 2 in [1]
    mn = m * n
    zeta = l ** 2 * (m + n) * (6 * s - mn * (4 * mn - 1)) // (6 * mn ** 2)
    # each g[u][v] is the cumulative frequency table of $g_{u, v}^+$ defined in [1, p. 6]
    g = [[Counter() for _ in range(m + 1)] for _ in range(n + 1)]
    g[0][0][0] = 1
    for u in range(n + 1):
        for v in range(m + 1):
            tmp = Counter()
            if u > 0:
                tmp += g[u - 1][v]
            if v > 0:
                tmp += g[u][v - 1]
            d = a * v - b * u
            for value, frequency in tmp.items():
                g[u][v][value + d * d] = frequency
    return sum(frequency for value, frequency in g[n][m].items() if value >= zeta) / math.comb(m + n, m)

But I'm sorry, I don't know how to vectorize the for loops in this code. Could you please suggest how to improve it?

@tupui
Member

tupui commented Mar 7, 2022

I am not an expert in vectorization. What you want to do is change the data structure: elements need to be in a NumPy array so you can use reductions along some axis.

@fiveseven-lambda
Contributor Author

fiveseven-lambda commented Mar 7, 2022

To use NumPy arrays, we need an equivalent of collections.Counter's + operation.
Given 2 arrays

[[10  5]
 [50  3]]

and

[[30  9]
 [50  4]
 [60  2]]

we want to merge these and sort:

[[10  5]
 [30  9]
 [50  7]
 [60  2]]

(Note that [50 3] and [50 4] are added into [50 7].)

Is there a fast way to do this?

@tupui
Copy link
Member

tupui commented Mar 7, 2022

I am not sure. Maybe a combination of np.unique, np.argwhere and summing for the reduction part.
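
A sketch of that idea (my illustration): concatenate the two tables, then use np.unique with return_inverse and np.add.at to accumulate the frequencies of duplicate values:

import numpy as np

a = np.array([[10, 5], [50, 3]])
b = np.array([[30, 9], [50, 4], [60, 2]])

merged = np.concatenate([a, b])
values, inverse = np.unique(merged[:, 0], return_inverse=True)
freqs = np.zeros(len(values), dtype=int)
np.add.at(freqs, inverse, merged[:, 1])  # sum frequencies of equal values
print(np.column_stack([values, freqs]))
# [[10  5]
#  [30  9]
#  [50  7]
#  [60  2]]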

@fiveseven-lambda
Contributor Author

I've come up with this one, using np.intersect1d:

import numpy as np
a = np.array([[10, 5], [50, 3]])
b = np.array([[30, 9], [50, 4], [60, 2]])
av, af = a.T
bv, bf = b.T
intersection_v, a_index, b_index = np.intersect1d(av, bv, return_indices = True)
intersection = np.array([intersection_v, af[a_index] + bf[b_index]]).T
result = np.concatenate([intersection, np.delete(a, a_index, axis = 0), np.delete(b, b_index, axis = 0)])
result = result[result[:, 0].argsort()]  # sort rows by value; result.sort(axis=0) would sort each column independently and scramble the (value, frequency) pairs
print(result)

Is there a better way? If not, I will use this.

@fiveseven-lambda
Contributor Author

So this is the code using NumPy arrays:

import math
import numpy as np

def _pval_cvm_2samp_exact_new(s, m, n):
    # [1] Y. Xiao, A. Gordon, and A. Yakovlev, “A C++ Program for the Cramér-Von Mises Two-Sample Test”, J. Stat. Soft., vol. 17, no. 8, pp. 1–15, Dec. 2006.
    # [2] T. W. Anderson "On the Distribution of the Two-Sample Cramer-von Mises Criterion," The Annals of Mathematical Statistics, Ann. Math. Statist. 33(3), 1148-1159, (September, 1962)

    # [1, p. 3]
    l = math.lcm(m, n)
    # [1, p. 4], below eq. 3
    a = l // m
    b = l // n
    # eq. 9 in [2] and eq. 2 in [1]
    mn = m * n
    zeta = l ** 2 * (m + n) * (6 * s - mn * (4 * mn - 1)) // (6 * mn ** 2)
    # each g[u][v] is the cumulative frequency table of $g_{u, v}^+$ defined in [1, p. 6]
    g = [[None for _ in range(m + 1)] for _ in range(n + 1)]
    for u in range(n + 1):
        for v in range(m + 1):
            # calculate g[u][v] by g[u - 1][v] and g[u][v - 1] with eq. 11 in [1]
            if u == 0:
                if v == 0:
                    g[u][v] = np.array([[0, 1]])
                else:
                    g[u][v] = g[u][v - 1].copy()
            else:
                if v == 0:
                    g[u][v] = g[u - 1][v].copy()
                else:
                    v0, f0 = g[u][v - 1].T
                    v1, f1 = g[u - 1][v].T
                    vi, i0, i1 = np.intersect1d(v0, v1, return_indices = True)
                    g[u][v] = np.concatenate([
                        np.array([vi, f0[i0] + f1[i1]]).T,
                        np.delete(g[u][v - 1], i0, axis = 0),
                        np.delete(g[u - 1][v], i1, axis = 0)
                    ])
            d = a * v - b * u
            g[u][v][:, 0] += d * d
    last_g = g[n][m]
    return np.sum(last_g[last_g[:, 0] >= zeta][:, 1]) / math.comb(m + n, m)

I think this is good enough to create a pull request. Is there anything to do other than the two items below?

  • Update the code of _pval_cvm_2samp_exact_new()
  • Update the code and the doc comment of cramervonmises_2samp() (about the threshold)

@tupui
Member

tupui commented Mar 7, 2022

Did the NumPy code help compared to the other version you had?

@fiveseven-lambda
Contributor Author

Yes. I compared the version with the cumulative frequency table, the version with collections.Counter, and the version with NumPy arrays.

Timer unit: 1e-06 s

Total time: 2.36472 s
File: a.py
Function: _pval_cvm_2samp_exact_cumulative_frequency_table at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           @profile
     6                                           def _pval_cvm_2samp_exact_cumulative_frequency_table(s, m, n):
     7                                               # [1] Y. Xiao, A. Gordon, and A. Yakovlev, “A C++ Program for the Cramér-Von Mises Two-Sample Test”, J. Stat. Soft., vol. 17, no. 8, pp. 1–15, Dec. 2006.
     8                                               # [2] T. W. Anderson "On the Distribution of the Two-Sample Cramer-von Mises Criterion," The Annals of Mathematical Statistics, Ann. Math. Statist. 33(3), 1148-1159, (September, 1962)
     9
    10                                               # [1, p. 3]
    11         1          2.0      2.0      0.0      l = math.lcm(m, n)
    12                                               # [1, p. 4], below eq. 3
    13         1          1.0      1.0      0.0      a = l // m
    14         1          1.0      1.0      0.0      b = l // n
    15                                               # eq. 9 in [2] and eq. 2 in [1]
    16         1          1.0      1.0      0.0      mn = m * n
    17         1          2.0      2.0      0.0      zeta = l ** 2 * (m + n) * (6 * s - mn * (4 * mn - 1)) // (6 * mn ** 2)
    18                                               # each g[u][v] is the cumulative frequency table of $g_{u, v}^+$ defined in [1, p. 6]
    19         1        826.0    826.0      0.0      g = [[np.zeros(zeta, int) for _ in range(m + 1)] for _ in range(n + 1)]
    20         1       1560.0   1560.0      0.1      g[0][0] = np.ones(zeta, int)
    21        22         14.0      0.6      0.0      for u in range(n + 1):
    22       441        366.0      0.8      0.0          for v in range(m + 1):
    23                                                       # calculate g[u][v] by g[u - 1][v] and g[u][v - 1] with eq. 11 in [1]
    24       420        319.0      0.8      0.0              d = a * v - b * u
    25       420        290.0      0.7      0.0              if d * d >= zeta:
    26                                                           continue
    27       420        170.0      0.4      0.0              if u > 0:
    28       400    1877266.0   4693.2     79.4                  g[u][v][d * d:] += g[u - 1][v][:zeta - d * d]
    29       420        354.0      0.8      0.0              if v > 0:
    30       399     483540.0   1211.9     20.4                  g[u][v][d * d:] += g[u][v - 1][:zeta - d * d]
    31                                               # the number of all combinations
    32         1          6.0      6.0      0.0      total = math.comb(m + n, m)
    33                                               # >= zeta is the complement of <= zeta - 1
    34         1          4.0      4.0      0.0      return (total - g[n][m][zeta - 1]) / total

Total time: 6.6444 s
File: a.py
Function: _pval_cvm_2samp_exact_counter at line 36

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    36                                           @profile
    37                                           def _pval_cvm_2samp_exact_counter(s, m, n):
    38                                               # [1] Y. Xiao, A. Gordon, and A. Yakovlev, “A C++ Program for the Cramér-Von Mises Two-Sample Test”, J. Stat. Soft., vol. 17, no. 8, pp. 1–15, Dec. 2006.
    39                                               # [2] T. W. Anderson "On the Distribution of the Two-Sample Cramer-von Mises Criterion," The Annals of Mathematical Statistics, Ann. Math. Statist. 33(3), 1148-1159, (September, 1962)
    40
    41                                               # [1, p. 3]
    42         1          4.0      4.0      0.0      l = math.lcm(m, n)
    43                                               # [1, p. 4], below eq. 3
    44         1          1.0      1.0      0.0      a = l // m
    45         1          0.0      0.0      0.0      b = l // n
    46                                               # eq. 9 in [2] and eq. 2 in [1]
    47         1          0.0      0.0      0.0      mn = m * n
    48         1          3.0      3.0      0.0      zeta = l ** 2 * (m + n) * (6 * s - mn * (4 * mn - 1)) // (6 * mn ** 2)
    49                                               # each g[u][v] is the frequency table of $g_{u, v}^+$ defined in [1, p. 6]
    50         1        474.0    474.0      0.0      g = [[Counter() for _ in range(m + 1)] for _ in range(n + 1)]
    51         1          3.0      3.0      0.0      g[0][0][0] = 1
    52        22         10.0      0.5      0.0      for u in range(n + 1):
    53       441        216.0      0.5      0.0          for v in range(m + 1):
    54       420       6938.0     16.5      0.1              tmp = Counter()
    55       420        223.0      0.5      0.0              if u > 0:
    56       400    1639626.0   4099.1     24.7                  tmp += g[u - 1][v]
    57       420        201.0      0.5      0.0              if v > 0:
    58       399    1334903.0   3345.6     20.1                  tmp += g[u][v - 1]
    59       420        286.0      0.7      0.0              d = a * v - b * u
    60   3708506    1492975.0      0.4     22.5              for value, frequency in tmp.items():
    61   3708086    2165124.0      0.6     32.6                  g[u][v][value + d * d] = frequency
    62         1       3415.0   3415.0      0.1      return sum(frequency for value, frequency in g[n][m].items() if value >= zeta) / math.comb(m + n, m)

Total time: 0.208749 s
File: a.py
Function: _pval_cvm_2samp_exact_numpy at line 64

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    64                                           @profile
    65                                           def _pval_cvm_2samp_exact_numpy(s, m, n):
    66                                               # [1] Y. Xiao, A. Gordon, and A. Yakovlev, “A C++ Program for the Cramér-Von Mises Two-Sample Test”, J. Stat. Soft., vol. 17, no. 8, pp. 1–15, Dec. 2006.
    67                                               # [2] T. W. Anderson "On the Distribution of the Two-Sample Cramer-von Mises Criterion," The Annals of Mathematical Statistics, Ann. Math. Statist. 33(3), 1148-1159, (September, 1962)
    68
    69                                               # [1, p. 3]
    70         1          3.0      3.0      0.0      l = math.lcm(m, n)
    71                                               # [1, p. 4], below eq. 3
    72         1          1.0      1.0      0.0      a = l // m
    73         1          1.0      1.0      0.0      b = l // n
    74                                               # eq. 9 in [2] and eq. 2 in [1]
    75         1          1.0      1.0      0.0      mn = m * n
    76         1          3.0      3.0      0.0      zeta = l ** 2 * (m + n) * (6 * s - mn * (4 * mn - 1)) // (6 * mn ** 2)
    77                                               # each g[u][v] is the frequency table of $g_{u, v}^+$ defined in [1, p. 6]
    78         1         46.0     46.0      0.0      g = [[None for _ in range(m + 1)] for _ in range(n + 1)]
    79        22          8.0      0.4      0.0      for u in range(n + 1):
    80       441        261.0      0.6      0.1          for v in range(m + 1):
    81                                                       # calculate g[u][v] by g[u - 1][v] and g[u][v - 1] with eq. 11 in [1]
    82       420        218.0      0.5      0.1              if u == 0:
    83        20          7.0      0.3      0.0                  if v == 0:
    84         1         11.0     11.0      0.0                      g[u][v] = np.array([[0, 1]])
    85                                                           else:
    86        19         23.0      1.2      0.0                      g[u][v] = g[u][v - 1].copy()
    87                                                       else:
    88       400        164.0      0.4      0.1                  if v == 0:
    89        20         36.0      1.8      0.0                      g[u][v] = g[u - 1][v].copy()
    90                                                           else:
    91       380        893.0      2.4      0.4                      v0, f0 = g[u][v - 1].T
    92       380        432.0      1.1      0.2                      v1, f1 = g[u - 1][v].T
    93       380     149054.0    392.2     71.4                      vi, i0, i1 = np.intersect1d(v0, v1, return_indices = True)
    94       760      11093.0     14.6      5.3                      g[u][v] = np.concatenate([
    95       380       8128.0     21.4      3.9                          np.array([vi, f0[i0] + f1[i1]]).T,
    96       380      17743.0     46.7      8.5                          np.delete(g[u][v - 1], i0, axis = 0),
    97       380      16579.0     43.6      7.9                          np.delete(g[u - 1][v], i1, axis = 0)
    98                                                               ])
    99       420        302.0      0.7      0.1              d = a * v - b * u
   100       420       3643.0      8.7      1.7              g[u][v][:, 0] += d * d
   101         1          1.0      1.0      0.0      last_g = g[n][m]
   102         1         98.0     98.0      0.0      return np.sum(last_g[last_g[:, 0] >= zeta][:, 1]) / math.comb(m + n, m)

@fiveseven-lambda
Contributor Author

I am very sorry for creating a pull request with a wrong file. My local environment setup had a problem, and I failed to detect the error. I am now trying to run the tests correctly on my computer, but it may take a while. Sorry for the trouble.

@mdhaber added this to the 1.10.0 milestone on Aug 2, 2022