<div class='alert alert-warning'>

SciPy's interactive examples with JupyterLite are experimental and may not always work as expected. Executing cells that contain imports may trigger large downloads (up to 60MB of content for the first import from SciPy), and importing from SciPy may take roughly 10-20 seconds. If you notice any problems, feel free to open an [issue](https://github.com/scipy/scipy/issues/new/choose).

</div>

In [None]:
import numpy as np
from scipy.special import smirnov
from scipy.stats import norm

Show the probability that the one-sided gap between the empirical CDF
and the theoretical CDF is at least as big as 0, 0.5 and 1.0 for a
sample of size 5.


In [None]:
smirnov(5, [0, 0.5, 1.0])

array([ 1.   ,  0.056,  0.   ])
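
As a cross-check on the middle value, the one-sided probability can also be evaluated from the finite sum of Birnbaum and Tingey. The sketch below is a plain-Python illustration, not SciPy's implementation; the helper name `smirnov_sf` is hypothetical, and no care is taken over the numerical issues that arise for large `n`.

```python
from math import comb, floor

def smirnov_sf(n, d):
    """P(Dn+ >= d) from the Birnbaum-Tingey finite sum (illustrative helper)."""
    if d <= 0:
        return 1.0
    if d >= 1:
        return 0.0
    total = 0.0
    # Sum over j = 0, ..., floor(n * (1 - d))
    for j in range(floor(n * (1 - d)) + 1):
        total += comb(n, j) * (d + j / n) ** (j - 1) * (1 - d - j / n) ** (n - j)
    return d * total

print([round(smirnov_sf(5, d), 3) for d in (0, 0.5, 1.0)])
# prints [1.0, 0.056, 0.0]
```

For `n = 5` and `d = 0.5` the sum has three terms, 0.0625 + 0.0405 + 0.009 = 0.112, and multiplying by `d` gives 0.056.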

Compare a sample of size 5 against N(0, 1), the standard normal
distribution with mean 0 and standard deviation 1.

`x` is the sample, listed in increasing order. The empirical CDF
construction below relies on this ordering.


In [None]:
x = np.array([-1.392, -0.135, 0.114, 0.190, 1.82])

In [None]:
target = norm(0, 1)
cdfs = target.cdf(x)
cdfs

array([0.0819612 , 0.44630594, 0.5453811 , 0.57534543, 0.9656205 ])

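Construct the empirical CDF and the one-sided K-S statistics Dn- and Dn+.
The columns printed below are the sample, the empirical CDF just after
each point, the N(0, 1) CDF, and the two gaps whose maxima give Dn- and Dn+.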


In [None]:
n = len(x)
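# ECDF levels 0, 1/n, ..., 1: ecdfs[i] is the value just before x[i],
# ecdfs[i+1] the value just after it.
ecdfs = np.arange(n+1, dtype=float)/n
# Columns: x, ECDF after x, target CDF, gap for Dn-, gap for Dn+
cols = np.column_stack([x, ecdfs[1:], cdfs, cdfs - ecdfs[:n],
                       ecdfs[1:] - cdfs])
with np.printoptions(precision=3):
    print(cols)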

[[-1.392  0.2    0.082  0.082  0.118]
 [-0.135  0.4    0.446  0.246 -0.046]
 [ 0.114  0.6    0.545  0.145  0.055]
 [ 0.19   0.8    0.575 -0.025  0.225]
 [ 1.82   1.     0.966  0.166  0.034]]

In [None]:
gaps = cols[:, -2:]
Dnpm = np.max(gaps, axis=0)
print(f'Dn-={Dnpm[0]:f}, Dn+={Dnpm[1]:f}')

Dn-=0.246306, Dn+=0.224655
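
The two-sided statistic Dn is the larger of Dn- and Dn+. The following is a minimal standard-library sketch that recomputes all three for the same sample, using `statistics.NormalDist` in place of `scipy.stats.norm`. The variable names are illustrative, and the sample is assumed to be sorted.

```python
from statistics import NormalDist

# Same sample as above, already in increasing order
x = [-1.392, -0.135, 0.114, 0.190, 1.82]
n = len(x)
cdf = NormalDist(0, 1).cdf

# Dn-: largest amount by which the target CDF exceeds the ECDF just before x[i]
d_minus = max(cdf(xi) - i / n for i, xi in enumerate(x))
# Dn+: largest amount by which the ECDF just after x[i] exceeds the target CDF
d_plus = max((i + 1) / n - cdf(xi) for i, xi in enumerate(x))
# Dn: two-sided statistic
d_n = max(d_minus, d_plus)

print(f'Dn-={d_minus:f}, Dn+={d_plus:f}, Dn={d_n:f}')
# prints Dn-=0.246306, Dn+=0.224655, Dn=0.246306
```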

In [None]:
probs = smirnov(n, Dnpm)
print(f'For a sample of size {n} drawn from N(0, 1):',
      f' Smirnov n={n}: Prob(Dn- >= {Dnpm[0]:f}) = {probs[0]:.4f}',
      f' Smirnov n={n}: Prob(Dn+ >= {Dnpm[1]:f}) = {probs[1]:.4f}',
      sep='\n')

For a sample of size 5 drawn from N(0, 1):
 Smirnov n=5: Prob(Dn- >= 0.246306) = 0.4711
 Smirnov n=5: Prob(Dn+ >= 0.224655) = 0.5245
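
The same one-sided statistics and p-values should also be available from `scipy.stats.ks_1samp`. The sketch below assumes that `alternative='less'` corresponds to Dn- and `alternative='greater'` to Dn+, and that the one-sided p-values agree with `smirnov`. It is a consistency check rather than part of the original example.

```python
import numpy as np
from scipy.special import smirnov
from scipy.stats import ks_1samp, norm

x = np.array([-1.392, -0.135, 0.114, 0.190, 1.82])
n = len(x)

# One-sided tests against the standard normal CDF
res_minus = ks_1samp(x, norm.cdf, alternative='less')     # statistic is Dn-
res_plus = ks_1samp(x, norm.cdf, alternative='greater')   # statistic is Dn+

for name, res in [('Dn-', res_minus), ('Dn+', res_plus)]:
    print(f'{name}={res.statistic:f}: ks_1samp p={res.pvalue:.4f}, '
          f'smirnov p={smirnov(n, res.statistic):.4f}')
```

If the two p-value columns match, `ks_1samp` is a convenient way to get the statistic and its probability in a single call.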

Plot the empirical CDF and the standard normal CDF.


In [None]:
import matplotlib.pyplot as plt
plt.step(np.concatenate(([-2.5], x, [2.5])),
         np.concatenate((ecdfs, [1])),
         where='post', label='Empirical CDF')
xx = np.linspace(-2.5, 2.5, 100)
plt.plot(xx, target.cdf(xx), '--', label='CDF for N(0, 1)')

Add vertical lines marking Dn+ and Dn-.


In [None]:
iminus, iplus = np.argmax(gaps, axis=0)
plt.vlines([x[iminus]], ecdfs[iminus], cdfs[iminus], color='r',
           alpha=0.5, lw=4)
plt.vlines([x[iplus]], cdfs[iplus], ecdfs[iplus+1], color='m',
           alpha=0.5, lw=4)

In [None]:
plt.grid(True)
plt.legend(framealpha=1, shadow=True)
plt.show()