# Stochastic Simulation

*Winter Semester 2024/25*

08.11.2024

Prof. Sebastian Krumscheid<br>
Assistants: Stjepan Salatovic, Louise Kluge

<h3 align="center">
Exercise sheet 01
</h3>

---

<h1 align="center">
Random Number Generation
</h1>

In [14]:
import matplotlib.pyplot as plt
import numpy as np

from scipy.stats import uniform
from ipywidgets import interact

## Exercise 1

Consider **SciPy's** default **uniform** random number generator (RNG) `uniform` in the statistics module `scipy.stats` and use it to generate a sequence of numbers $U_1, U_2, \dots, U_n$. Have a look at its [documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html) and the [full list](https://docs.scipy.org/doc/scipy/reference/stats.html) of available distributions and other statistical functionalities in `scipy.stats`. Consider different values of $n$, for example $n=25,100,10^3,10^5$, and address the following points:

1. Plot the cumulative distribution function (CDF) of the theoretical uniform distribution $\mathcal{U}(0,1)$ together with the empirical CDF of the data. Furthermore, produce a Q-Q plot of the data. Use both plots to assess the quality of the sequence with respect to the theoretical $\mathcal{U}(0,1)$ distribution. Describe your observations.

**Hint:** The function [`plt.step`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.step.html) might come in handy.

In [15]:
def cdf(seq: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Computes the empirical CDF of `seq` and evaluates it at `x`."""
    # TODO
    return
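
As a rough illustration of point 1, the following sketch evaluates an empirical CDF with `np.searchsorted` and draws both diagnostic plots. The helper names `ecdf_sketch` and `plot_diagnostics`, the evaluation grid, and the plotting positions $(i - 1/2)/n$ in the Q-Q plot are assumptions made for this example; it is one possible approach, not a reference solution.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import uniform


def ecdf_sketch(sample: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Fraction of sample points that are <= x (hypothetical helper)."""
    sorted_sample = np.sort(sample)
    # side="right" counts all entries <= x for every evaluation point.
    return np.searchsorted(sorted_sample, x, side="right") / sorted_sample.size


def plot_diagnostics(sample: np.ndarray):
    """Empirical vs. theoretical CDF (left) and Q-Q plot (right)."""
    n = sample.size
    grid = np.linspace(0.0, 1.0, 501)
    fig, (ax_cdf, ax_qq) = plt.subplots(1, 2, figsize=(10, 4))

    ax_cdf.step(grid, ecdf_sketch(sample, grid), where="post", label="empirical CDF")
    ax_cdf.plot(grid, uniform.cdf(grid), "--", label="CDF of U(0,1)")
    ax_cdf.legend()

    # Q-Q plot: theoretical quantiles at (i - 1/2)/n against the order statistics.
    probs = (np.arange(1, n + 1) - 0.5) / n
    ax_qq.plot(uniform.ppf(probs), np.sort(sample), ".", markersize=3)
    ax_qq.plot([0, 1], [0, 1], "k--")
    ax_qq.set_xlabel("theoretical quantiles")
    ax_qq.set_ylabel("sample quantiles")
    return fig


u = uniform.rvs(size=1000, random_state=0)
fig = plot_diagnostics(u)
```

In a notebook the figure is displayed automatically; in a script you would typically call `plt.show()` afterwards.
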

2. Implement the Kolmogorov-Smirnov test to assess, at level $\alpha=0.1$, whether the empirical CDF $\hat F$ of the sample $U_1, U_2, \dots, U_n$ is compatible with the theoretical CDF $F$ of the $\mathcal{U}(0,1)$ distribution. That is, we reject the null hypothesis $H_0\colon U_1,\dots,U_n\overset{\text{iid}}{\sim} \mathcal{U}(0,1)$ at level $\alpha>0$ if $\sqrt{n}D_n>K_{\alpha,n}$, where $D_n={\sup}_{x\in \mathbb{R}}| \hat F(x)- F(x)|$ and $K_{\alpha,n}$ is such that, under $H_0$, $\mathbb{P}(\sqrt{n}D_n>K_{\alpha,n})\le\alpha$.

    **Note:** It is known that the scaled test statistic $\sqrt{n}D_n$ converges in distribution to a Kolmogorov random variable $K$, independently of $F$, where $\mathbb{P}(K \le x) = 1+2\sum_{j=1}^\infty{(-1)}^je^{-2j^2x^2}$ for $x>0$. This asymptotic result can be used to approximate the required $1-\alpha$ quantile $K_{\alpha, n}$ of $\sqrt{n}D_n$ by the $1-\alpha$ quantile $K_{\alpha,\infty}$ of $K$, that is, $K_{\alpha,n}\simeq K_{\alpha,\infty}$ for $n \gg 1$. It is, however, also possible to characterize the distribution of $D_n$ directly, which is useful for small values of $n$. The following table lists some of these pre-asymptotic $1 - \alpha$ quantiles $K_{\alpha, n}$.
    
| $n$ | $\alpha = 0.20$ | $\alpha = 0.10$ | $\alpha = 0.05$ | $\alpha = 0.01$ |
|---|---|---|---|---|
| 1 | 0.90 | 0.95 | 0.98 | 0.99 |
| 2 | 0.96 | 1.10 | 1.19 | 1.32 |
| 3 | 0.97 | 1.11 | 1.23 | 1.44 |
| 4 | 0.98 | 1.12 | 1.24 | 1.46 |
| 5 | 1.01 | 1.14 | 1.25 | 1.50 |
| 6 | 1.00 | 1.15 | 1.27 | 1.52 |
| 7 | 1.01 | 1.16 | 1.30 | 1.53 |
| 8 | 1.02 | 1.16 | 1.30 | 1.53 |
| 9 | 1.02 | 1.17 | 1.29 | 1.53 |
| 10 | 1.01 | 1.17 | 1.30 | 1.55 |
| 11 | 1.03 | 1.16 | 1.29 | 1.56 |
| 12 | 1.04 | 1.18 | 1.32 | 1.56 |
| 15 | 1.05 | 1.16 | 1.32 | 1.55 |
| 20 | 1.03 | 1.16 | 1.30 | 1.57 |
| 30 | 1.04 | 1.20 | 1.31 | 1.59 |
| 35 | 1.06 | 1.24 | 1.36 | 1.60 |
| 40 | 1.08 | 1.20 | 1.33 | 1.58 |
| 45 | 1.07 | 1.21 | 1.34 | 1.61 |
| $n>45$ | 1.07 | 1.22 | 1.36 | 1.63 |
    
**Tip:** Instead of the table, you can also use [`scipy.stats.kstwo`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstwo.html) to get pre-asymptotic $1-\alpha$ quantiles $K_{\alpha,n}$. For instance, to get the $1 - \alpha$ quantile of the two-sided Kolmogorov-Smirnov test statistic with $n$ samples, use the command `np.sqrt(n) * kstwo(n).ppf(1 - alpha)`.

The function `ppf` is the percent point function (inverse of the CDF) of a random variable in **SciPy**. For a sample of size $n > 45$, you can instead use the asymptotic distribution of the scaled test statistic $\sqrt{n}D_n$, which is available as `scipy.stats.kstwobign` (no `n` needs to be specified here).
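
To connect the note and the tip, the sketch below evaluates the truncated series for the Kolmogorov CDF and compares it with `scipy.stats.kstwobign`, and it also computes one pre-asymptotic quantile via `kstwo`. The helper name `kolmogorov_cdf_series`, the truncation at 100 terms, and the choice $n = 10$ are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import kstwo, kstwobign


def kolmogorov_cdf_series(x: float, terms: int = 100) -> float:
    """Truncated series 1 + 2 * sum_{j>=1} (-1)^j exp(-2 j^2 x^2), for x > 0."""
    j = np.arange(1, terms + 1)
    return float(1.0 + 2.0 * np.sum((-1.0) ** j * np.exp(-2.0 * j**2 * x**2)))


alpha = 0.1
# 1 - alpha quantile of the limiting Kolmogorov distribution.
k_asymptotic = kstwobign.ppf(1 - alpha)
# Pre-asymptotic 1 - alpha quantile of sqrt(n) * D_n for n = 10.
k_exact_10 = np.sqrt(10) * kstwo(10).ppf(1 - alpha)

print(f"asymptotic quantile:          {k_asymptotic:.4f}")
print(f"pre-asymptotic quantile n=10: {k_exact_10:.4f}")
print(f"series CDF at the asymptotic quantile: {kolmogorov_cdf_series(k_asymptotic):.4f}")
```

The two quantiles can be compared with the rows $n = 10$ and $n > 45$ of the table, and the series evaluated at the asymptotic quantile should be close to $1 - \alpha$.
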

In [16]:
def kolmogorov_smirnov(seq: np.ndarray, alpha: float = 0.1) -> bool:
    """
    Kolmogorov-Smirnov test for data `seq` at significance level `alpha`.
    Returns `True` if H0 cannot be rejected and `False` if rejected at level `alpha`.
    """
    # TODO
    return
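
Since the empirical CDF is a step function and $F(x) = x$ on $[0,1]$, the supremum in $D_n$ is attained at a sample point, so it can be computed from the order statistics $U_{(1)} \le \dots \le U_{(n)}$ as $D_n = \max_i \max\{ i/n - U_{(i)},\; U_{(i)} - (i-1)/n \}$. A minimal sketch of this computation, with the hypothetical helper name `ks_statistic_uniform` and assuming all sample values lie in $[0,1]$:

```python
import numpy as np
from scipy.stats import kstwobign, uniform


def ks_statistic_uniform(sample: np.ndarray) -> float:
    """D_n = sup_x |F_hat(x) - x| for a sample with values in [0, 1]."""
    u = np.sort(sample)
    n = u.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)         # ECDF above the diagonal, just after a jump
    d_minus = np.max(u - (i - 1) / n)  # ECDF below the diagonal, just before a jump
    return float(max(d_plus, d_minus))


u = uniform.rvs(size=1000, random_state=1)
scaled_statistic = np.sqrt(u.size) * ks_statistic_uniform(u)
threshold = kstwobign.ppf(1 - 0.1)  # asymptotic quantile, reasonable for n = 1000
print(f"sqrt(n) * D_n = {scaled_statistic:.4f}, threshold = {threshold:.4f}")
```
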

3. Implement the $\chi^2$ goodness-of-fit test to ascertain whether the sequence $U_1, U_2, \dots, U_n$ is uniformly distributed. A description of this method can be found in Section 8.7.4 of _Handbook of Monte Carlo Methods_ (see also Section 1.2.1 of the lecture notes).
  
    **Hint:** Again, you can use the `ppf` function of the `scipy.stats.chi2` class to compute quantiles of a $\chi^2$ distribution.

In [17]:
def chi_squared(seq: np.ndarray, K: int = 10, alpha: float = 0.1) -> bool:
    """
    Chi-Squared test for data `seq` and significance `alpha`.
    Returns `True` if H0 cannot be rejected and `False` if rejected at level `alpha`.
    """
    # TODO
    return
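
One common form of the $\chi^2$ statistic partitions $[0,1]$ into $K$ equal bins and compares the observed counts $N_k$ with the expected count $n/K$, giving $Q = \sum_{k=1}^K (N_k - n/K)^2 / (n/K)$, which is compared with the $1-\alpha$ quantile of a $\chi^2$ distribution with $K-1$ degrees of freedom. The sketch below assumes this equal-width binning (the referenced text may use a different partition); `chi_squared_statistic` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import chi2, uniform


def chi_squared_statistic(sample: np.ndarray, K: int = 10) -> float:
    """Pearson statistic for K equal-width bins on [0, 1]."""
    n = sample.size
    observed, _ = np.histogram(sample, bins=K, range=(0.0, 1.0))
    expected = n / K
    return float(np.sum((observed - expected) ** 2 / expected))


K, alpha = 10, 0.1
u = uniform.rvs(size=1000, random_state=2)
statistic = chi_squared_statistic(u, K=K)
critical_value = chi2.ppf(1 - alpha, df=K - 1)
print(f"Q = {statistic:.3f}, critical value = {critical_value:.3f}")
```
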

4. Repeat the tests in points 2 and 3 for different values of $\alpha$. What do you observe? Explain your findings.

In [18]:
# TODO
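
One way to explore the role of $\alpha$ is to repeat a test on many independent samples and record how often $H_0$ is rejected. The sketch below does this with the library routine `scipy.stats.kstest` rather than a hand-written test; the sample size, the number of replications, and the list of levels are arbitrary choices for this example.

```python
import numpy as np
from scipy.stats import kstest, uniform

n, replications = 50, 1000
# Each row is one independent sample of size n from U(0, 1).
samples = uniform.rvs(size=(replications, n), random_state=0)
p_values = np.array([kstest(row, "uniform").pvalue for row in samples])

rejection_rates = {}
for alpha in (0.01, 0.05, 0.1, 0.2):
    # H0 is rejected at level alpha whenever the p-value falls below alpha.
    rejection_rates[alpha] = float(np.mean(p_values < alpha))
    print(f"alpha = {alpha:4.2f}: rejected in {rejection_rates[alpha]:.3f} of the replications")
```

When the data really come from $\mathcal{U}(0,1)$, the fraction of rejections should be close to $\alpha$ itself, which is the probability of a type I error.
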

## Exercise 2

Implement the linear congruential generator (LCG) 
\begin{equation*}
  X_k = (aX_{k-1} + b) \bmod m\;,\quad U_k := \frac{X_k}{m}\;,
\end{equation*}
with $a=3$, $b=0$, and $m=31$.

In [19]:
def LCG(x0: int = 1, n: int = 100, a: int = 3, b: int = 0, m: int = 31) -> np.ndarray:
    """
    Linear congruential generator: Generates `n` pseudo-random numbers in [0, 1).
    """
    # TODO
    return 
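
Because the state $X_k$ takes only finitely many integer values, the sequence must eventually repeat. The following sketch counts the number of steps until the seed recurs; `lcg_period` is a hypothetical helper, and it assumes the seed lies on a cycle (which holds when $a$ and $m$ are coprime).

```python
def lcg_period(a: int = 3, b: int = 0, m: int = 31, x0: int = 1) -> int:
    """Number of LCG steps until the integer state x0 recurs."""
    x = x0
    for steps in range(1, m + 1):
        x = (a * x + b) % m
        if x == x0:
            return steps
    # Only reachable if x0 is not on a cycle of the map x -> (a x + b) mod m.
    raise ValueError("the seed x0 does not recur within m steps")


period = lcg_period()
print(f"period for a=3, b=0, m=31, x0=1: {period}")  # prints 30
```

With these parameters the generator therefore produces at most 30 distinct values $U_k$, no matter how large $n$ is.
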

1. Use your LCG procedure to generate a sequence $U_1, U_2, \dots, U_n$ and repeat Exercise 1. Discuss your results.

In [20]:
# TODO

2. Explain why one would expect that the Serial test (with $d=2$, say) is an appropriate test to scrutinize the LCG. Support your explanation by applying the Serial test at level $\alpha=0.1$ to sequences (for various values of $n$) from both the LCG and from the default `scipy.stats` RNG `uniform`.

In [21]:
def serial_test(seq, d=2, alpha=.1):
    """
    Serial test for data `seq` and significance `alpha`.
    Returns `True` if H0 cannot be rejected and `False` if rejected at level `alpha`.
    """
    # TODO
    return
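
A possible sketch of the Serial test statistic: group the sequence into non-overlapping $d$-tuples, count how many fall into each of the $K^d$ cells of a regular grid on $[0,1]^d$, and compare with the expected count via a $\chi^2$ statistic with $K^d - 1$ degrees of freedom. The helper names `lcg_sketch` and `serial_statistic`, the use of non-overlapping tuples, and the choice $K = 4$ are assumptions made for this example.

```python
import numpy as np
from scipy.stats import chi2, uniform


def lcg_sketch(n: int, x0: int = 1, a: int = 3, b: int = 0, m: int = 31) -> np.ndarray:
    """Minimal LCG used only to produce test data for this example."""
    out = np.empty(n)
    x = x0
    for k in range(n):
        x = (a * x + b) % m
        out[k] = x / m
    return out


def serial_statistic(sample: np.ndarray, d: int = 2, K: int = 4):
    """Chi-squared statistic and degrees of freedom for non-overlapping d-tuples."""
    n_vectors = sample.size // d
    vectors = sample[: n_vectors * d].reshape(n_vectors, d)
    counts, _ = np.histogramdd(vectors, bins=K, range=[(0.0, 1.0)] * d)
    expected = n_vectors / K**d
    statistic = float(np.sum((counts - expected) ** 2 / expected))
    return statistic, K**d - 1


alpha, n = 0.1, 3000
stat_lcg, dof = serial_statistic(lcg_sketch(n))
stat_scipy, _ = serial_statistic(uniform.rvs(size=n, random_state=4))
critical_value = chi2.ppf(1 - alpha, df=dof)
print(f"LCG:   Q = {stat_lcg:.2f}")
print(f"SciPy: Q = {stat_scipy:.2f}")
print(f"critical value: {critical_value:.2f}")
```
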

3. Implement the Gap test. Apply the test to both a sequence obtained from the default `scipy.stats` RNG `uniform` and to a sequence generated by the LCG. What do you observe?

In [22]:
def gap_test(seq, alpha=.1, a=0., b=.5, r=5):
    """
    Gap test for data `seq` and significance `alpha`.
    Returns `True` if H0 cannot be rejected and `False` if rejected at level `alpha`.
    """
    # TODO
    return
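
The Gap test, in one common formulation, looks at the number of consecutive values outside $[a,b]$ between two visits to $[a,b]$. With $p = b - a$, a gap of length $j$ has probability $p(1-p)^j$ under $H_0$, and gaps of length at least $r$ are pooled with probability $(1-p)^r$; the resulting counts are compared via a $\chi^2$ statistic with $r$ degrees of freedom. The sketch below follows this formulation, which may differ in details from the one in the lecture notes; `gap_statistic` is a hypothetical helper, and counting the first gap from the start of the sequence is an assumption.

```python
import numpy as np
from scipy.stats import chi2, uniform


def gap_statistic(sample: np.ndarray, a: float = 0.0, b: float = 0.5, r: int = 5):
    """Chi-squared statistic and degrees of freedom for the gap lengths of `sample`."""
    p = b - a
    hits = np.flatnonzero((sample >= a) & (sample <= b))
    if hits.size == 0:
        raise ValueError("no value of the sequence falls into [a, b]")
    # Gap length = number of misses between consecutive hits (first gap from index 0).
    gaps = np.diff(np.concatenate(([-1], hits))) - 1
    # Pool all gaps of length >= r into the last category.
    counts = np.bincount(np.minimum(gaps, r), minlength=r + 1)
    probabilities = np.append(p * (1 - p) ** np.arange(r), (1 - p) ** r)
    expected = gaps.size * probabilities
    statistic = float(np.sum((counts - expected) ** 2 / expected))
    return statistic, r


alpha = 0.1
u = uniform.rvs(size=10_000, random_state=5)
statistic, dof = gap_statistic(u)
critical_value = chi2.ppf(1 - alpha, df=dof)
print(f"Q = {statistic:.3f}, critical value = {critical_value:.3f}")
```

Any sequence with values in $[0,1)$ can be passed to the helper, so the same call can be repeated with the output of the LCG.
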

**Comment on built-in functions**

Many of the tasks that need to be implemented for this exercise sheet are already available as functions in standard **Python** libraries.
For example, the empirical CDF can be conveniently computed and plotted using the `ECDF` function that is available in the [statsmodels](https://pypi.org/project/statsmodels/) package. The [Kolmogorov-Smirnov](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html) and [$\chi^2$ test](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html) are also available.
There is, of course, very little reason to reinvent the wheel, and we strongly encourage you to use these library functions in future exercise sheets, unless stated otherwise. However, before naively relying on library functions, it is important to understand the underlying mathematical procedure.
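
For reference, a brief sketch of how the two SciPy routines linked above might be called on a uniform sample. The sample size, the seed, and the choice of 10 equal-width bins are arbitrary; note that both routines report p-values, so $H_0$ is rejected at level $\alpha$ when the p-value is below $\alpha$.

```python
import numpy as np
from scipy.stats import chisquare, kstest, uniform

u = uniform.rvs(size=1000, random_state=3)

# Kolmogorov-Smirnov test against the U(0, 1) distribution.
ks_result = kstest(u, "uniform")

# Chi-squared test on bin counts; equal expected frequencies are the default.
observed, _ = np.histogram(u, bins=10, range=(0.0, 1.0))
chi2_result = chisquare(observed)

print(f"KS:   D_n = {ks_result.statistic:.4f}, p-value = {ks_result.pvalue:.4f}")
print(f"chi2: Q   = {chi2_result.statistic:.4f}, p-value = {chi2_result.pvalue:.4f}")
```
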