# Stochastic Simulation

*Winter Semester 2023/24*

10.11.2023

Prof. Sebastian Krumscheid<br>
Assistant: Stjepan Salatovic

<h3 align="center">
Exercise sheet 02
</h3>

---

<h1 align="center">
Random Variable Generation
</h1>

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import time

from ipywidgets import interact
from scipy.integrate import quad
from scipy.stats import linregress
from scipy.stats import uniform, norm, expon, bernoulli, burr12
from typing import Callable, Optional, Tuple
from tqdm.notebook import tqdm

The following `cdf` function is the one you implemented in Lab 1; you will need it again later in this lab.

In [2]:
def cdf(seq: np.array, x: np.array) -> np.array:
    """Computes the empirical CDF of `seq` and evaluates it at `x`."""
    n = len(seq)
    indices = np.searchsorted(np.sort(seq), x, side='right')
    y = np.concatenate(([0], np.arange(1, n + 1) / n))
    return y[indices]
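
As a quick sanity check, the empirical CDF can be evaluated by hand on a tiny sample. The sketch below repeats the `cdf` definition so that it runs on its own; the `sample` and `grid` values are made up for illustration.

```python
import numpy as np

def cdf(seq, x):
    """Empirical CDF of `seq`, evaluated at the points `x`."""
    n = len(seq)
    indices = np.searchsorted(np.sort(seq), x, side='right')
    y = np.concatenate(([0], np.arange(1, n + 1) / n))
    return y[indices]

# Hypothetical toy data: four observations, one of them repeated.
sample = np.array([0.5, 0.2, 0.9, 0.5])
grid = np.array([0.0, 0.2, 0.5, 1.0])

# Fraction of observations less than or equal to each grid point.
values = cdf(sample, grid)
```

With `side='right'`, ties are counted as "less than or equal", so the estimate is right-continuous like a true CDF.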

## Exercise 1

Consider the random variable $X$ with cumulative distribution function
(CDF) $F\colon[-1,3]\to [0,1]$ given by

\begin{equation*}
  F(x) = \begin{cases}
    0\;,& -1\le x<0\;,\\
    1-\frac{2}{3}e^{-x/2}\;,& 0\le x< 2\;,\\
    1\;,& 2\le x\le 3\;.
  \end{cases}
\end{equation*}

Implement the inverse-transform method to generate $n$ independent
copies of the random variable $X$. Assess the quality of the
realizations by comparing the empirical CDF with the theoretical CDF
$F$ for various values of $n$.

In [3]:
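# Define the CDF F and the density f of its absolutely continuous part
# (F jumps at x = 0 and x = 2, so X also has point masses there)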

F = lambda u: (u < 2) * (u >= 0) * (1 - 2 * np.exp(-u / 2.) / 3.) + (u >= 2) * (u <= 3)
f = lambda u: (u >= 0) * (u <= 2) * np.exp(- u / 2.) / 3.

In [4]:
def inverse_transform(inv_cdf: Callable, n: int) -> np.array:
    """
    Generates sequence of `n` random numbers using the inverse-transform method.
    """
    # TODO
    return
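
One possible way to fill in the stub is sketched below. It assumes the generalized inverse $F^{-1}(u) = \inf\{x : F(x) \ge u\}$, which maps $u \le 1/3$ to the atom at $0$ and $u \ge 1 - \frac{2}{3}e^{-1}$ to the atom at $2$. The names `inv_cdf_sketch` and `inverse_transform_sketch`, as well as the `rng` argument, are illustrative and not part of the lab template.

```python
import numpy as np

def inv_cdf_sketch(u):
    """Generalized inverse of the CDF F from Exercise 1 (a sketch)."""
    u = np.asarray(u, dtype=float)
    upper = 1.0 - 2.0 * np.exp(-1.0) / 3.0  # left limit of F at x = 2
    # Solve 1 - (2/3) exp(-x/2) = u for x; the clip only protects the
    # logarithm on the branch where this value is not used.
    middle = -2.0 * np.log(1.5 * np.clip(1.0 - u, 1e-300, None))
    return np.where(u <= 1.0 / 3.0, 0.0, np.where(u < upper, middle, 2.0))

def inverse_transform_sketch(inv_cdf, n, rng=None):
    """Draws `n` samples by applying `inv_cdf` to uniform random numbers."""
    rng = np.random.default_rng() if rng is None else rng
    return inv_cdf(rng.uniform(size=n))

samples = inverse_transform_sketch(inv_cdf_sketch, 1000, np.random.default_rng(0))
```

The samples can then be compared against `F` by plotting `cdf(samples, x)` and `F(x)` on a common grid for increasing `n`.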

## Exercise 2

Consider the random variable $X$ with probability density function
(PDF) $f$, which is known only up to a multiplicative
constant. Specifically, let $f(x) := k\tilde{f}(x)$ with

\begin{equation*}
  \tilde{f}(x) := \bigl(\sin^2(6x)+3\cos^2(x)\sin^2(4x)+1\bigr)e^{-x^2/2}\;,
\end{equation*}

where the normalization constant
$k = \left(\int_\mathbb{R}\tilde{f}(x)\,dx\right)^{-1}$ is unknown.

1. Argue that $\tilde{f}(x)$ can be bounded by $C\phi(x)$, where
  $C$ is an appropriately chosen constant and $\phi$ denotes the PDF
  of the standard normal distribution, that is
  $\phi(x) = e^{-x^2/2}/\sqrt{2\pi}$. Find an acceptable value for
  $C$.

2. Generate $n=10^4$ random variables according to the PDF $f$ using the Acceptance-Rejection Method. 

    **Hint:** Use `scipy.stats.norm.rvs()` to sample normally distributed random variables in **Python**.

In [5]:
def acceptance_rejection(n: int) -> Tuple[np.array, float]:
    """
    Generates sequence of `n` random numbers using the Acceptance-Rejection method.
    Returns the sequence as well as the acceptance rate.
    """
    # TODO
    return

3. Derive an estimate of the normalization constant $k$ using your procedure's acceptance probability. Compare it to the exact value $k=0.1696542774$. Furthermore, compare the empirical CDF to the theoretical, normalized CDF $F(x) = \int_{-\infty}^xf(u)\, du$.

   **Hint:** Take advantage of the fact that your AR method returns the acceptance rate in addition to the samples. The theoretical PDF and CDF are given below.

In [6]:
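def PDF(x: float) -> float:
    """Normalized PDF f = f_tilde / mass, with the mass known in closed form."""
    e = lambda x: np.exp(x)
    f = (np.sin(6 * x)**2 + 3 * np.cos(x)**2 * np.sin(4 * x)**2 + 1) * np.exp(-x**2 / 2)
    mass = (-4 - 3 * e(22) - 6 * e(40) - 3 * e(54) + 6 * e(70) + 18 * e(72)) * np.sqrt(np.pi * 0.5) / (4 * e(72))
    return f / mass

def CDF(x: float) -> float:
    """CDF by numerical quadrature; the mass below -10 is negligible."""
    f = quad(PDF, -10, x)[0]
    return f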
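
The following is a minimal sketch of the Acceptance-Rejection procedure and the resulting estimate of $k$, not a reference solution. It assumes the bound $\tilde{f}(x) \le 5e^{-x^2/2} = C\phi(x)$ with $C = 5\sqrt{2\pi}$, which follows from bounding each squared trigonometric factor by $1$. Under that assumption the acceptance probability equals $1/(Ck)$, so $k \approx 1/(C \cdot \text{rate})$. The names `f_tilde`, `acceptance_rejection_sketch`, and the `seed` argument are illustrative, and proposals are drawn with NumPy's `Generator.standard_normal` rather than `scipy.stats.norm.rvs`.

```python
import numpy as np
from scipy.stats import norm

def f_tilde(x):
    """Unnormalized target density from Exercise 2."""
    return (np.sin(6 * x)**2 + 3 * np.cos(x)**2 * np.sin(4 * x)**2 + 1) * np.exp(-x**2 / 2)

# Assumed envelope constant: f_tilde(x) <= 5 exp(-x^2 / 2) = C * phi(x).
C = 5 * np.sqrt(2 * np.pi)

def acceptance_rejection_sketch(n, seed=0):
    """Returns `n` samples from f and the observed acceptance rate."""
    rng = np.random.default_rng(seed)
    chunks, n_accepted, n_proposed = [], 0, 0
    while n_accepted < n:
        batch = 2 * (n - n_accepted) + 16
        y = rng.standard_normal(batch)            # proposals from phi
        u = rng.uniform(size=batch)
        keep = y[u * C * norm.pdf(y) <= f_tilde(y)]
        chunks.append(keep)
        n_accepted += keep.size
        n_proposed += batch
    # Keep the first n accepted proposals; the rate uses all proposals drawn.
    return np.concatenate(chunks)[:n], n_accepted / n_proposed

samples, rate = acceptance_rejection_sketch(10_000)
k_hat = 1.0 / (C * rate)  # estimate of the normalization constant k
```

Proposals are drawn in batches to avoid a slow Python loop over single samples; the acceptance rate is computed over every proposal, including those in the final, partially used batch.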

## Exercise 3

An element $(x,y) \in \mathbb{R}^2$ may be represented by its polar coordinates $(\rho,\Theta) \in [0,\infty) \times [0,2\pi)$ defined by

$$
\rho(x,y) = \sqrt{x^2 + y^2},
$$
and
$$
  \Theta(x,y) = \begin{cases}
    \tan^{-1} \left({\frac{y}{x}}\right) & \text{if } x>0 \text{ and } y \ge 0,\\
    \tan^{-1} \left({\frac{y}{x}}\right) +\pi & \text{if } x<0,\\
    \tan^{-1} \left({\frac{y}{x}}\right) +2 \pi & \text{if } x>0 \text{ and } y < 0,\\
    \pi/2 & \text{if } x=0 \text{ and } y>0,\\
    3\pi/2 & \text{if } x=0 \text{ and } y<0,\\
    0 & \text{if } x=y=0,
    \end{cases}
$$

where $\tan^{-1}:\mathbb{R} \to (-\pi/2, \pi/2)$.  

1. Show by calculation that if the random variables $X,Y \sim N(0,1)$ are independent, then the polar coordinate representation of $(X,Y)$ satisfies

    $$
    \rho^2 \sim \exp(1/2) \quad \text{and} \quad \Theta \sim U([0,2\pi)).
    $$

    Show further that $\rho$ and $\Theta$ are independent.

2. In the opposite direction, show that if $\rho^2 \sim \exp(1/2)$ and $\Theta \sim U([0,2\pi))$ and $\rho$ and $\Theta$ are independent, then the Cartesian representation of the polar coordinates $(\rho, \Theta)$,

    $$
    X = \rho \cos(\Theta) \quad \text{and} \quad Y =\rho \sin(\Theta),
    $$

    satisfies $X,Y \sim N(0,1)$ with $X$ and $Y$ being independent.

3. To construct an Acceptance-Rejection (AR) method for generating standard normal random variables, consider the auxiliary PDF $g(x) = \frac{1}{2} e^{-|x|}$. For this auxiliary PDF, determine a $C\ge1$ such that
  
    $$
    \frac{e^{-x^2/2}}{\sqrt{2\pi}} \le C g(x), \qquad \forall x \in \mathbb{R}.
    $$

    **Hint:** See lecture notes for how to sample from the PDF $g$.

4. Implement the above AR method and the Box-Muller method, and use the timer function `time()` from the `time` module to compare the performance of the two methods in terms of runtime per sample.

    **Hint:** To time your code, store the start and end times and take their difference:
   ```python
   start = time.time()
   # some code
   end = time.time()

   elapsed_time = end - start
   ```

In [7]:
def acceptance_rejection_normal(n: int) -> Tuple[np.array, float]:
    """
    Generates sequence of `n` standard normal random variables using the Acceptance-Rejection method.
    Returns the sequence as well as the acceptance rate.
    """
    # TODO
    return

In [8]:
def box_muller(n: int) -> np.array:
    """
    Generates sequence of `n` standard normal random variables using the Box-Muller method.
    """
    # TODO
    return
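
The two samplers and the timing comparison could be sketched as follows. This is one possible approach under stated assumptions: the AR sketch uses $C = \sqrt{2e/\pi}$, for which the acceptance test simplifies to $u \le e^{-(|y|-1)^2/2}$, and it samples from $g$ as a random sign times an $\mathrm{Exp}(1)$ variable, which may differ from the construction in the lecture notes. All function names and the `rng` argument are illustrative.

```python
import time
import numpy as np

def box_muller_sketch(n, rng):
    """Standard normals from pairs of uniforms via polar coordinates."""
    m = (n + 1) // 2
    u1 = 1.0 - rng.uniform(size=m)        # lies in (0, 1], avoids log(0)
    u2 = rng.uniform(size=m)
    rho = np.sqrt(-2.0 * np.log(u1))      # rho^2 is exponential with rate 1/2
    theta = 2.0 * np.pi * u2              # uniform angle on [0, 2*pi)
    return np.concatenate((rho * np.cos(theta), rho * np.sin(theta)))[:n]

def ar_normal_sketch(n, rng):
    """Standard normals by AR with proposal g(x) = exp(-|x|) / 2."""
    chunks, n_accepted, n_proposed = [], 0, 0
    while n_accepted < n:
        batch = 2 * (n - n_accepted) + 16
        # Assumed proposal sampler: random sign times an Exp(1) variable.
        y = rng.choice([-1.0, 1.0], size=batch) * rng.exponential(size=batch)
        u = rng.uniform(size=batch)
        # phi(y) / (C g(y)) = exp(-(|y| - 1)^2 / 2) for C = sqrt(2e / pi).
        keep = y[u <= np.exp(-0.5 * (np.abs(y) - 1.0)**2)]
        chunks.append(keep)
        n_accepted += keep.size
        n_proposed += batch
    return np.concatenate(chunks)[:n], n_accepted / n_proposed

def seconds_per_sample(sampler, n, rng):
    """Wall-clock runtime of `sampler(n, rng)` divided by `n`."""
    start = time.time()
    sampler(n, rng)
    return (time.time() - start) / n

rng = np.random.default_rng(0)
t_bm = seconds_per_sample(box_muller_sketch, 100_000, rng)
t_ar = seconds_per_sample(ar_normal_sketch, 100_000, rng)
```

Runtimes depend on the machine and on how the loops are vectorized, so the comparison is best repeated several times and for several values of `n`.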

## Exercise 4

The density of a random variable $X$ may be approximated by a mixture distribution generated by so-called kernel density estimation. In its simplest form, kernel density estimation consists of the following steps:

1. Choose a so-called kernel function $K \in \{f: \mathbb{R} \to \mathbb{R}_+ \mid \|f\|_{L^1(\mathbb{R})} =1 \}$ (so the kernel is itself a PDF).
2. For some $n \in \mathbb{N}$, generate a sequence of i.i.d. random variables $X_1,X_2,\ldots,X_n$ from the distribution of $X$.
3. Define the kernel density estimator by

    $$
    f(x)= \frac{1}{n} \sum_{i=1}^n K_\delta(x-X_i),
    $$

    where

    $$
    K_\delta(x-X_i) := \frac{1}{\delta} K\left(\frac{x-X_i}{\delta}\right), \quad \delta >0,
    $$

    and $\delta$ is an appropriately chosen scaling parameter relating to the width of the kernel.
   
The Burr type XII distribution has the CDF

$$
F(x; \alpha,c,k) =
\begin{cases}
  0, & x \le 0,\\
  1- \left(1 + \left(\frac{x}{\alpha}\right)^c \right)^{-k}, & x>0,
\end{cases}
$$

with parameters $\alpha,c,k>0$.

1. Consider the Gaussian density kernel function
    $$
    K(x) = \frac{1}{\sqrt{2 \pi}} e^{-x^2/2},
    $$
    and implement a kernel density estimator for

    $$
    X \sim \text{BurrXII}(\alpha=1, c=2, k=4).
    $$

    **Hint:** Samples of $X$ can be obtained using [`scipy.stats.burr12`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.burr12.html). 

In [9]:
def kde(data: np.array, kernel: Callable, delta: Optional[float]=None) -> Callable:
    """
    Kernel density estimator for `data` using a `kernel` and optional `delta`.
    Returns a Callable.
    """
    # TODO
    return
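
A minimal sketch of such an estimator is given below. It assumes the default bandwidth $\delta = n^{-1/5}$ when `delta` is not supplied and maps the document's parameters to SciPy's as `c=2`, `d=4` (SciPy's `d` plays the role of $k$, and $\alpha$ is the default `scale=1`). The name `kde_sketch` is illustrative.

```python
import numpy as np
from scipy.stats import norm, burr12

def kde_sketch(data, kernel, delta=None):
    """Returns a callable x -> (1/n) * sum_i K_delta(x - X_i)."""
    data = np.asarray(data, dtype=float)
    if delta is None:
        delta = len(data) ** (-1 / 5)   # assumed default bandwidth
    def estimator(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # Rows index evaluation points, columns index data points.
        z = (x[:, None] - data[None, :]) / delta
        return kernel(z).mean(axis=1) / delta
    return estimator

data = burr12.rvs(c=2, d=4, size=2000, random_state=0)
f_hat = kde_sketch(data, norm.pdf)
grid = np.linspace(-3.0, 6.0, 901)
values = f_hat(grid)
```

The broadcasted difference builds an array of shape `(len(x), len(data))`, so for large `n` it may be necessary to evaluate the estimator on the grid in smaller chunks.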

2. For varying $n \in [100,10^5]$ and $\delta(n) =n^{-1/5}$, compute the kernel density estimators
    $$
    f_{n}(x) = \frac{1}{n} \sum_{i=1}^n K_{\delta(n)}(x-X_i),
    $$
    and plot $f_n$ and the PDF of BurrXII(1,2,4). Furthermore, for each value of $n$, sample $N=200\,000$ i.i.d. random variables $Y^{n}_i \sim f_n$ by means of the composition method. 

    **Hint:** The `numpy.random.randint` built-in function might come in handy.

In [10]:
# TODO
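
Since $f_n$ is an equally weighted mixture of $n$ Gaussian densities centered at the data points, the composition method could be sketched as follows: pick a data point uniformly at random, then add Gaussian noise with standard deviation $\delta$. The function name `sample_from_kde` is illustrative, and the sketch uses `Generator.integers` as the counterpart of `numpy.random.randint`.

```python
import numpy as np
from scipy.stats import burr12

def sample_from_kde(data, delta, N, rng):
    """Draws N samples from the Gaussian KDE built on `data`."""
    idx = rng.integers(0, len(data), size=N)   # choose a mixture component
    return data[idx] + delta * rng.standard_normal(N)

rng = np.random.default_rng(0)
n = 1000
data = burr12.rvs(c=2, d=4, size=n, random_state=rng)
delta = n ** (-1 / 5)
y = sample_from_kde(data, delta, 200_000, rng)
```

Conditional on the data, the mean of these samples is the sample mean of `data`, and their variance is the variance of `data` plus $\delta^2$.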

3. Study how well the empirical CDF of $Y_{1}^{n},Y_{2}^{n}, \ldots,Y_{N}^{n}$, which we denote by $F_{n}^N(x)$, converges towards $F(x; 1,2,4)$. That is, investigate how fast

    $$
    D_n^N=\sup_{x \in [-2,5]} |F_{n}^N(x) - F(x;1,2,4)|
    $$

    decreases as $n$ increases.

In [11]:
# TODO
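
One way to study the decay of $D_n^N$ is sketched below. It approximates the supremum by a maximum over a fine grid on $[-2,5]$ and fits a line to $\log D_n^N$ versus $\log n$; the grid size, the chosen values of $n$, and the helper names `ecdf` and `sup_distance` are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import burr12, linregress

def ecdf(seq, x):
    """Empirical CDF of `seq`, evaluated at the points `x`."""
    return np.searchsorted(np.sort(seq), x, side='right') / len(seq)

def sup_distance(n, N, rng, grid):
    """Grid approximation of sup |F_n^N - F| for one data set of size n."""
    data = burr12.rvs(c=2, d=4, size=n, random_state=rng)
    delta = n ** (-1 / 5)
    # Composition method: random data point plus Gaussian kernel noise.
    y = data[rng.integers(0, n, size=N)] + delta * rng.standard_normal(N)
    return np.max(np.abs(ecdf(y, grid) - burr12.cdf(grid, c=2, d=4)))

rng = np.random.default_rng(0)
grid = np.linspace(-2.0, 5.0, 1401)
ns = np.array([100, 1_000, 10_000])
D = np.array([sup_distance(n, 200_000, rng, grid) for n in ns])

# Slope of the log-log fit, i.e. an empirical rate p in D ~ n^p.
slope = linregress(np.log(ns), np.log(D)).slope
```

A single run per `n` is noisy, so averaging $D_n^N$ over several independent data sets may give a more stable rate estimate.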

## Exercise 5

The PDF of the Cauchy distribution centered at $x_0 \in \mathbb{R}$ with scale parameter $\gamma > 0$ is given by

$$
f(x;x_0,\gamma) = \frac{1}{\pi\gamma\left(1 + \left(\frac{x-x_0}{\gamma}\right)^2\right)}.
$$

1. Show by integration of the PDF that the CDF of the Cauchy distribution with $x_0=0$ and $\gamma=1$ is given by

    $$
    F(x;0,1) = \tan^{-1}(x)\,/\,\pi +1/2,
    $$

    for $\tan^{-1}:\mathbb{R} \to (-\pi/2, \pi/2)$. 

2. Show that if $X_1,X_2 \sim N(0,1)$ are independent, then $X = X_1/X_2$ is Cauchy distributed with $x_0=0$ and $\gamma=1$.

3. Based on the results of Exercises 5.1 and 5.2, describe and implement two algorithms for sampling $X \sim F(\cdot;0,1)$. Compare the performance of the two algorithms in terms of runtime per sample.

In [12]:
def cauchy_by_inversion(n: int) -> np.array:
    """
    Returns `n` Cauchy distributed random variables by the inversion-method.
    """
    # TODO
    return

In [13]:
def cauchy_by_gauss(n: int) -> np.array:
    """
    Returns `n` Cauchy distributed random variables by the Gaussian ratio approach.
    """
    # TODO
    return

4. It is possible to extend the preceding methods to sample from $X \sim F(\cdot; x_0,\gamma)$ for any $x_0 \in \mathbb{R}$ and $\gamma>0$. How? 
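
The two samplers from part 3 could be sketched as follows. Inverting $F(x;0,1) = \tan^{-1}(x)/\pi + 1/2$ gives $F^{-1}(u) = \tan\bigl(\pi(u - 1/2)\bigr)$, and the second sampler takes the ratio of two independent standard normals. The optional `x0` and `gamma` arguments illustrate one possible answer to part 4, namely the location-scale map $x_0 + \gamma X$; they, the `rng` argument, and the function names are assumptions and not part of the lab template.

```python
import time
import numpy as np

def cauchy_by_inversion_sketch(n, rng, x0=0.0, gamma=1.0):
    """Cauchy samples via the inverse CDF tan(pi * (u - 1/2))."""
    u = rng.uniform(size=n)
    return x0 + gamma * np.tan(np.pi * (u - 0.5))

def cauchy_by_gauss_sketch(n, rng, x0=0.0, gamma=1.0):
    """Cauchy samples via the ratio of two independent N(0, 1) variables."""
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    return x0 + gamma * x1 / x2

def seconds_per_sample(sampler, n, rng):
    """Wall-clock runtime of `sampler(n, rng)` divided by `n`."""
    start = time.time()
    sampler(n, rng)
    return (time.time() - start) / n

rng = np.random.default_rng(0)
t_inv = seconds_per_sample(cauchy_by_inversion_sketch, 1_000_000, rng)
t_gauss = seconds_per_sample(cauchy_by_gauss_sketch, 1_000_000, rng)
```

Because the Cauchy distribution has no mean, sample quantiles (for example the median and the quartiles $x_0 \pm \gamma$) are a more meaningful check than sample averages.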

## Exercise 6 (optional, no solution)

Let $\boldsymbol{X} = (X_1,X_2,\dots,X_n)^T\sim \mathcal{U}\bigl({(0,1)}^n\bigr)$ and denote by $X_{(1)}\le X_{(2)}\le\dots\le X_{(n)}$ the ordered sample (i.e. the order statistic).

1. Implement a procedure to generate the order statistic $X_{(1)}\le X_{(2)}\le\dots\le X_{(n)}$, $n\in\mathbb{N}$, based on *sorting* a collection $\boldsymbol{X}$ of i.i.d. uniform random variables.

2. Prove the following properties:
    1. Show that
    $$
    \mathbb{P}\bigl(X_{(j)}\le  x\bigr) = \sum_{i=j}^n\binom{n}{i}x^i{(1-x)}^{n-i}\;,
    $$
        for any $x\in (0,1)$. Furthermore, use this fact to infer the distribution of the random variable $\max\{X_1,X_2,\dots,X_n\}$.
    2. Then show that
    $$
    \mathbb{P}\bigl(X_{(j)}\le  z \, \bigl\vert\bigr. \, X_{(k)}=x_k\;,\;\forall\,k>j\bigr) = {\left(\frac{z}{x_{j+1}}\right)}^{j}\;
    $$
        for all $z\le x_{j+1}$ and any $j<n$.

3. Use the facts above to implement a procedure that generates copies of the order statistic $X_{(1)}\le X_{(2)}\le\dots\le X_{(n)}$ *without sorting*. Compare the runtime of this procedure with that of the sorting-based procedure for various values of $n$. What do you observe?

4. Implement a procedure that generates uniform random vectors in the unit simplex
$$
\mathcal{S} = \Bigl\{(x_1,x_2,\dots,x_n)^T\in\mathbb{R}^n\colon x_i\ge 0\;\forall\,i\;,\;\sum_{i=1}^nx_i\le 1\Bigr\}\;.
$$
Assess your sampling procedure by visualizing $N=1000$ sampling points for $n=3$.

**Hints:**
- [3D scatter plots](https://matplotlib.org/stable/gallery/mplot3d/scatter3d.html) in Python.
- Notice that a vector, whose coordinates are distributed according to the order statistic of a collection of i.i.d. $\mathcal{U}(0,1)$, takes values in the "wedge"

    $$
    \mathcal{W}=\bigl\{(u_1,u_2,\dots,u_n)^T\in\mathbb{R}^n\colon 0\le u_i\le 1\;\forall\,i\;,\; u_1\le u_2\le\dots \le u_n\bigr\}. 
    $$
    
    The unit simplex $\mathcal{S}$ is then simply the image of the "wedge" $\mathcal{W}$ under the linear transformation $\boldsymbol{x} = A\boldsymbol{u}$ where
    
    $$
    A = \begin{pmatrix}1 & 0 &\dots & 0\\
    		-1 & 1 & \dots & 0\\
    		\vdots & \ddots & \ddots & \vdots\\
    		0 & \dots & -1 & 1\end{pmatrix}\;.
    $$
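
The procedures in this exercise could be sketched as follows. The sorting-free sampler relies on the properties from part 2: $X_{(n)} = U_n^{1/n}$ and, recursively, $X_{(j)} = X_{(j+1)}\,U_j^{1/j}$ for independent uniforms $U_j$. The simplex sampler applies the map $\boldsymbol{x} = A\boldsymbol{u}$ from the hint, which amounts to taking successive differences of the ordered sample. All function names and the `rng` argument are illustrative.

```python
import numpy as np

def order_statistics_sorted(n, rng):
    """Order statistic of n i.i.d. U(0, 1) variables obtained by sorting."""
    return np.sort(rng.uniform(size=n))

def order_statistics_sequential(n, rng):
    """Order statistic of n i.i.d. U(0, 1) variables without sorting."""
    # Factors U_j^(1/j) for j = n, n-1, ..., 1.
    factors = rng.uniform(size=n) ** (1.0 / np.arange(n, 0, -1))
    # The cumulative product yields X_(n), X_(n-1), ..., X_(1);
    # reversing returns them in ascending order.
    return np.cumprod(factors)[::-1]

def uniform_simplex(N, n, rng):
    """N points in the unit simplex of R^n as x = A u, u an ordered sample."""
    u = np.sort(rng.uniform(size=(N, n)), axis=1)
    # x_1 = u_1 and x_i = u_i - u_(i-1) for i > 1.
    return np.diff(u, axis=1, prepend=0.0)

rng = np.random.default_rng(0)
x_seq = order_statistics_sequential(1000, rng)
points = uniform_simplex(1000, 3, rng)
```

For $n=3$, the rows of `points` can be passed directly to a 3D scatter plot to check visually that the simplex is covered evenly.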