\section{Numerical mathematics}
All numerical and statistical computations are performed on computers.

Numbers represented on a computer do not share all the properties of real numbers!

\subsection{Floating point numbers}

A floating point number has the form
\[
x_{FP} = \pm c \cdot 2^q ,
\]
where both the coefficient $c$ and the exponent $q$ are integers.
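
As a minimal sketch of this representation, Python's built-in \texttt{float.as\_integer\_ratio()} exposes the integer coefficient and the power of two behind a double. The variable names are illustrative only.

\begin{minted}[breaklines]{Python}
from fractions import Fraction

# A double is stored exactly as c * 2**q with integers c and q.
c, denom = (0.1).as_integer_ratio()   # 0.1 == c / denom exactly
q = -(denom.bit_length() - 1)         # denom is a power of two: denom == 2**(-q)

print("c =", c, " q =", q)
# The stored value is close to, but not exactly, the decimal 1/10.
print(Fraction(c, denom) == Fraction(1, 10))
\end{minted}

The comparison prints False, because 1/10 has no finite binary expansion and the stored double is only the nearest representable value.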

Underflow: there are nonzero real numbers for which the closest floating point number is 0.

Overflow: a result whose magnitude is larger than the largest representable floating point number.

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) defines 64-bit double precision numbers and 32-bit single precision numbers.

\begin{minted}{Python}
import numpy as np
#imports the numpy package

#class numpy.finfo(dtype)
#Machine limits for floating point types.

print(np.finfo(np.float32))
#Prints machine parameters for float32
#E.g. precision = 6 and max = 3.4028235e+38
\end{minted}

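The value eps returned above is the gap between $1$ and the next larger representable number of the given type. Adding eps to $1$ gives a result different from $1$, whereas adding a sufficiently smaller number is lost in rounding.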
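
The role of eps can be checked directly. The following is a small sketch for double precision; \texttt{np.nextafter} is used here only to confirm the gap to the next representable number.

\begin{minted}[breaklines]{Python}
import numpy as np

eps = np.finfo(np.float64).eps      # gap between 1.0 and the next larger double

print(1.0 + eps != 1.0)             # adding a full gap changes the value
print(1.0 + eps / 2 == 1.0)         # half a gap rounds back to 1.0
print(np.nextafter(1.0, 2.0) - 1.0 == eps)
\end{minted}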

\subsubsection{Special values}
Infinity and not-a-number are special values.
\begin{minted}{Python}
import numpy as np
print('exp(1000) =', np.exp(1000))

## exp(1000) = inf
## 
## /usr/bin/python3:1: RuntimeWarning: overflow encountered in exp
\end{minted}

All arithmetic operations involving NaN result in NaN!
\begin{minted}{Python}
np.isnan()
#Testing if a given number is NaN
\end{minted}

\subsection{Some important properties of floating numbers}
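Floating point numbers are unevenly distributed: they are densest near zero.

Using $\log p$ instead of $p$ can help with probabilities.

Avoid computing $x-y$ for $x, y \gg 0$ or $x, y \ll 0$, because the difference of two large, nearly equal numbers loses most of its significant digits.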
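
Both effects can be observed directly. The sketch below uses \texttt{np.spacing} to print the gap to the next representable double at different magnitudes, and then subtracts two large numbers; the values are chosen here for illustration.

\begin{minted}[breaklines]{Python}
import numpy as np

# The spacing between adjacent doubles grows with the magnitude.
for x in [1e-3, 1.0, 1e3, 1e9, 1e15]:
    print(x, np.spacing(x))

# Subtracting two large, nearly equal numbers loses significant digits.
a = 1e15 + 0.3
b = 1e15
print(a - b)        # not 0.3: the spacing near 1e15 is 0.125
\end{minted}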

\subsection{Computing with probabilities}
When computing with probabilities, it is often numerically advantageous to store their logarithms $\log p$ instead of the raw values $p$.

\subsubsection{Logsumexp}
Consider a vector $\pi = (\pi_i) = (\log p_i)$, where $p_i > 0$. We want to compute the logarithm of the sum $S = \sum_{i \in I} p_i$:
\[
\log S  = \log \sum_{i \in I} \exp(\pi_i)
\]
Overflow can be avoided by finding
\[
M = \max_{i \in I} \pi_i
\]
and computing
\[
\log S = M + \log \sum_{i \in I} \exp(\pi_i - M)
\]
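The formula above translates into a few lines of NumPy. This is a minimal sketch assuming all $p_i > 0$; the function name is chosen here for illustration, and SciPy's \texttt{scipy.special.logsumexp} is used only as a reference for comparison.

\begin{minted}[breaklines]{Python}
import numpy as np
from scipy.special import logsumexp as scipy_logsumexp

def logsumexp(log_p):
    """Compute log(sum(exp(log_p))) by subtracting the maximum first."""
    log_p = np.asarray(log_p, dtype=float)
    M = np.max(log_p)
    return M + np.log(np.sum(np.exp(log_p - M)))

log_p = np.array([1000.0, 1001.0, 1002.0])
print(logsumexp(log_p))            # finite, although exp(1000) overflows
print(scipy_logsumexp(log_p))      # reference implementation from SciPy
\end{minted}

After the shift the largest exponent is $0$, so every term $\exp(\pi_i - M)$ lies in $(0, 1]$ and cannot overflow.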

\subsection{Best practices and hints}
You should not test floating point numbers for exact equality, because rounding errors can make mathematically equal results differ.
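
A common alternative is to compare within a tolerance. The sketch below shows the standard library function \texttt{math.isclose} and its NumPy counterpart \texttt{np.isclose}; suitable tolerances depend on the application.

\begin{minted}[breaklines]{Python}
import math
import numpy as np

x = 0.1 + 0.2
print(x == 0.3)                 # False: the two sides are rounded differently
print(math.isclose(x, 0.3))     # True: comparison with a relative tolerance
print(np.isclose(x, 0.3))       # True: NumPy version, also works elementwise
\end{minted}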

Equality tests for special cases: 
\begin{minted}{Python}
    np.isfinite()
    np.isinf()
    np.isnan()
\end{minted}

\subsection{Lecture 01 exercises}

\subsubsection{1. Computing with floating point numbers}
Write a program to increment x = 0.0 by 0.1 100 times. Compute x - 10. How do you interpret the result?

\begin{minted}[breaklines]{Python}
x = 0.0
#Defining start value

for i in range(100):
    x += 0.1
    #Adding, x = x + 0.1
print(x - 10)

##-1.9539925233402755e-14
\end{minted}

\subsubsection{2. Computing probabilities}
Compute $\log(p)$ instead of $p$.

1. Probability of randomly drawing the 8191-letter HIV-1 genome

2. Probability that you need at least 5000 throws of a regular 6-sided die to get the first 6

3. Probability that $x = 200$ when $x \sim \textrm{Poisson(1)}$ 

\begin{minted}[breaklines]{Python}
import numpy as np
#1
# The probability of drawing a given letter from a 4-letter alphabet is 1/4.
# Assuming the draws are independent we get Pr(genome) = 0.25^8191
logp_hiv = 8191*np.log10(0.25)
print(logp_hiv)

#2
# Needing at least 5000 throws means that none of the first 4999 throws is a 6,
# so the probability is (5/6)^4999
logp_dice = 4999*np.log10(5/6)
print(logp_dice)

#3
# The probability of x=200 when x ~ Poisson(1) is given by exp(-1)/200!.
# The logarithm of n! can be computed as sum_{i=1}^n log(i)
logp_poi = -np.log10(np.exp(1)) - sum([np.log10(i+1) for i in range(200)])
print(logp_poi)
\end{minted}

\subsubsection{3. Numerical algorithms}
Playing with sums and variances in Python

\begin{minted}[breaklines]{Python}
import numpy as np
import numpy.random as npr

#1
def two_pass(x):
    n = len(x)
    mean = 0
    for x_i in x:
        mean += x_i/n
    variance = 0
    for x_i in x:
        variance += 1/(n-1)*(x_i-mean)**2
    return variance

#2
def one_pass(x):
    n = len(x)
    square_sum_x = 0
    sum_x = 0
    for x_i in x:
        square_sum_x += x_i**2
        sum_x += x_i
    return (square_sum_x-sum_x**2/n)/(n-1)

#3
sample = npr.normal(1e9, 1, size=1000)
print(two_pass(sample)) 
# variance of sample computed using two-pass approach
print(one_pass(sample))  
# variance of sample computed using one-pass approach

#4
#enumerate allows us to loop over something and have an automatic counter.

def welfords(x):
    m = 0
    s = 0
    for k, x_i in enumerate(x):
        oldm = m
        m += (x_i-m)/(k+1)
        s+= (x_i-m)*(x_i-oldm)
    return s/k 
# note that indexing of array in python starts from 0 and ends to length(array)-1
# so at the end of for loop k=len(x)-1

print(welfords(sample))
\end{minted}

\subsubsection{4. Useful special functions}
\begin{minted}{Python}
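#2
import numpy as np
from scipy.special import gammaln
# gammaln(201) is the natural logarithm of 200!;
# multiplying by log10(e) converts the result to base 10
logp_poi = (-1 - gammaln(201))*np.log10(np.exp(1))
print(logp_poi)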
\end{minted}

\section{Simulating pseudorandom numbers}
Computers are deterministic devices. Most practical applications therefore use pseudorandom numbers: deterministic sequences of numbers that appear to be random.

Pseudorandom number generation proceeds in stages: a random number generator produces random integers, these are transformed into draws from the uniform distribution, and uniform draws are transformed into draws from other distributions.

\subsection{Uniform pseudorandom number generation}
Uniform pseudorandom number generators (PRNGs) form the foundation for any computation requiring randomness.

Properties of a good random number generator:
\begin{enumerate}
    \item good statistical properties
    \item long period before the numbers repeat
    \item efficient and compact implementation
    \item usability in parallel computation
\end{enumerate}

Mersenne Twister is the most widely used PRNG, but it is poorly suited to parallel computation.

\subsection{Transformations for non-uniform pseudorandom number generation}
The simplest way to generate random numbers following a non-uniform distribution is to transform uniformly distributed random numbers.

Affine transformations $A(x) = ax + b$ can adjust the bounds and scale of a distribution.

Changing the shape of a distribution requires non-linear transformations.

\begin{Theorem}
If $p$ is a probability distribution with an absolutely continuous cumulative distribution function $\Theta_p (x)$ and $x \sim p$, then
\[
\Theta_p (x) \sim \mathrm{Uniform}(0,1)
\]
\end{Theorem}
Applying the inverse transform $\Theta_p^{-1}(y)$ yields the following corollary.

\begin{Theorem}
Under the assumptions of the previous theorem, if $y \sim \mathrm{Uniform}(0,1)$, then
\[
\Theta_p^{-1}(y) \sim p
\]
\end{Theorem}

Taking an exponentially distributed variable $x \sim \mathrm{Exponential}(\lambda)$ as an example, we have the CDF $\Theta(x) = 1 - \exp(-\lambda x)$, whose inverse is
\[
\Theta^{-1}(y) = - \frac{\log(1 - y)}{\lambda}
\]
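The inverse is obtained by solving $y = \Theta(x)$ for $x$. Spelled out for $0 \le y < 1$:
\[
y = 1 - \exp(-\lambda x)
\quad\Longleftrightarrow\quad
\exp(-\lambda x) = 1 - y
\quad\Longleftrightarrow\quad
x = -\frac{\log(1 - y)}{\lambda}
\]
Since $1 - y$ is also uniformly distributed on $(0,1)$ when $y$ is, the simpler expression $-\log(y)/\lambda$ yields draws from the same distribution.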
\subsubsection{Generating normally distributed random draws}
Normally distributed random variables can be generated using the Box-Muller transformation based on the polar coordinate representation of the bivariate normal distribution. 
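
For reference, the transformation in its usual form: given independent $u_1, u_2 \sim \mathrm{Uniform}(0,1)$,
\[
z_1 = \sqrt{-2 \log u_1}\,\cos(2 \pi u_2), \qquad
z_2 = \sqrt{-2 \log u_1}\,\sin(2 \pi u_2)
\]
are two independent draws from $N(0,1)$. Here $\sqrt{-2 \log u_1}$ plays the role of the radius and $2 \pi u_2$ the angle in polar coordinates.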

\subsection{Rejection sampling}
Rejection sampling provides a generic alternative for sampling complicated targets for which a transformation is not available.

Intuitively, the density function is graphed onto a rectangular board and darts are thrown at it uniformly. The x-positions of the darts that land below the curve are distributed according to the random variable's density.

Rejection sampling can also be used in higher dimensions.

Let $f(x)$ be the target density and $g(x)$ a proposal density that can be sampled, such that the tractable envelope $M g(x)$ dominates the target. For each proposal $x_i \sim g$, sample an additional $u \sim \mathrm{Uniform}(0,1)$ and accept $x_i$ if $u M g(x_i) < f(x_i)$. Before applying rejection sampling, it is critical to check that $f(x) \leq M g(x)$ for all $x$.
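
The constant $M$ also determines the efficiency of the sampler. Assuming that $f$ and $g$ are both normalised densities, a proposal $x \sim g$ is accepted with probability $f(x) / (M g(x))$, so the overall acceptance probability is
\[
\Pr(\mathrm{accept}) = \int g(x) \, \frac{f(x)}{M g(x)} \, dx = \frac{1}{M} \int f(x) \, dx = \frac{1}{M}
\]
On average $M$ proposals are therefore needed per accepted sample, which is why $M$ should be chosen as small as the condition $f(x) \leq M g(x)$ allows.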

\begin{figure}
  \centering
    \includegraphics[width=1\textwidth]{Pictures/figure2_RejectionSampling.png}
\end{figure}

\begin{minted}[breaklines]{Python}
import numpy as np
import numpy.random as npr

#f_pdf is the target distribution
#g_pdf is the proposal

#The rejection sampler
def RejectionSampler(f_pdf, g_pdf, g_sample, M, N):
    # Returns N samples following f_pdf() using proposal g_pdf()
    # Requirement: f_pdf(x) <= M*g_pdf(x) for all x
    i = 0
    #Initialize index
    x = np.zeros(N)
    #Initialize vector: vector with N zeroes
    while i < N:
        x_prop = g_sample()
        #We sample a x from the proposal
        u = npr.uniform(0, 1)
        if (u * M * g_pdf(x_prop)) < f_pdf(x_prop):
            # Accept the sample and record it
            x[i] = x_prop
            i += 1
    return x
    
import matplotlib.pyplot as plt
 
# Set the random seed
npr.seed(42)
 
# Define normal pdf
def normpdf(x, mu, sigma):
    return 1/np.sqrt(2*np.pi*sigma**2) * np.exp(-(x-mu)**2 / (2*sigma**2))
 
# Define target pdf as a mixture of two normals
def target_pdf(x):
    return 0.6*normpdf(x, -2, 0.8) + 0.4*normpdf(x, 2, 1)
 
# Define the proposal pdf and a function to sample from it
def proposal_pdf(x):
    return normpdf(x, 0, 2)
 
def sample_proposal():
    #randn returns a sample (or samples) from the “standard normal” distribution.
    #For random samples from N(mu, sigma^2), use: sigma * np.random.randn(...) + mu
    return 2*npr.randn()
 
# Define M
M = 3
N = 3000
mysample = RejectionSampler(target_pdf, proposal_pdf, sample_proposal, M, N)

#Two subplots, 1 row and 2 columns
fig, ax = plt.subplots(1, 2)
t = np.linspace(-6, 6, 100)
t2 = np.linspace(-10, 10, 100)
 
# Plot f(x) / (M * g(x)) to verify M is valid (the line should be below 1)
#First figure
ax[0].plot(t2, target_pdf(t2) / (M*proposal_pdf(t2)))
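# Raw string so that the backslash in \cdot is passed on unchanged
ax[0].set_title(r'$f(x) / (M \cdot g(x))$')
#Second figure with histogram and plot
# density=True replaces the normed argument removed from matplotlib
ax[1].hist(mysample, 100, density=True)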
ax[1].plot(t, target_pdf(t), 'g')
ax[1].set_title('samples')
plt.show()
\end{minted}

\subsection{Inexact methods for non-uniform pseudorandom number generation}
Markov chain Monte Carlo can be used to approximately draw samples from a given distribution.

\subsection{Best practices}
\begin{enumerate}
    \item Use a good PRNG
    \item Be careful when using PRNGs in parallel
    \item Set and save your random seed
    \item Validate the output of your samplers carefully by testing on a known distribution
\end{enumerate}
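
Items 2 and 3 can be sketched as follows. Note the assumption: this example uses NumPy's newer \texttt{Generator} interface (\texttt{np.random.default\_rng} and \texttt{np.random.SeedSequence}) rather than the legacy \texttt{npr.seed} calls used elsewhere in these notes, and the seed value is arbitrary.

\begin{minted}[breaklines]{Python}
import numpy as np

# Record the seed so that the run can be reproduced later.
seed = 12345
rng = np.random.default_rng(seed)
first = rng.random(3)

# Re-creating the generator from the same seed reproduces the draws.
rng_again = np.random.default_rng(seed)
assert np.array_equal(first, rng_again.random(3))

# For parallel work, derive independent child streams
# instead of reusing one seed in every worker.
children = np.random.SeedSequence(seed).spawn(4)
streams = [np.random.default_rng(s) for s in children]
draws = [s.random(3) for s in streams]
print(draws)
\end{minted}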

\subsection{Lecture 2 exercises}
\subsubsection{1. Linear congruential random number generator}
Linear congruential RNGs are a simple class of algorithms that used to be very popular. As an example, consider the following generator, which is suggested in the POSIX.1-2001 standard as a possible implementation of the C language rand() function:
\begin{quote}
\texttt{seed = seed * 1103515245 + 12345}\\
\texttt{return (seed // 65536) \% 32768}
\end{quote}

1. Implement the above algorithm.

2. Guess how many random numbers you need to generate until you see a repeated number. Test your guess!

\begin{minted}[breaklines]{Python}
def crand(seed):
    # The output uses only bits 16-30 of the state, so reducing the state
    # modulo 2**32 keeps it bounded without changing the generated values
    seed = (seed * 1103515245 + 12345) % 2**32
    value = (seed // 65536) % 32768
    return seed, value

seed = 42
print(seed)
print(crand(seed))
# Store the values seen so far in a set for fast membership tests
seen = set()
for i in range(2000):
    seed, val = crand(seed)
    if val in seen:
        print("Repeat at step", i)
    seen.add(val)
\end{minted}

\subsubsection{2. Testing random number generation with the cdf transformation}
Using Python tools for cdf and random number generation of the normal distribution, let's test the properties of the cdf transformation and uniform distribution.

$x \sim p$, $\Theta_p (x) \sim Uniform(0,1)$

\begin{minted}[breaklines]{Python}
# Initialise plotting in the notebook. These commands only need to be run once in each session.

import matplotlib.pyplot as plt  
# import plotting commands with conventional alias 'plt'
import numpy as np              
# import numpy with conventional alias 'np'
import numpy.random as npr      
# import numpy random number generators with conventional alias 'npr'
from scipy.stats import norm     
# import normal distribution functions from scipy.stats

#1.
#Applying the normal cdf to normal draws should give Uniform(0,1) draws
x = npr.normal(size=10000)
z = norm.cdf(x)
plt.hist(z)

#2.
#First we generate a random vector a with 10 000 values sampled from the uniform distribution, Uniform(0,1)
a = npr.uniform(size = 10000)
#The inverse cumulative distribution function is the percent point function ppf
b = norm.ppf(a)
#We plot the transformed draws Theta_p^-1(y) in a new figure
plt.figure()
plt.hist(b)
#We see that the histogram of the transformed draws Theta_p^-1(y) has the shape of the target density,
#in this case the standard normal distribution, as stated in corollary 2.1.

\end{minted}

\subsubsection{3. Mersenne Twister}
The most popular random number generator is the Mersenne Twister, which is also used by Python (random.random() and numpy.random.randint()). It can be used to generate uniformly distributed integers in a given interval, which can be transformed to floating point numbers in a given interval, as is done by the numpy.random.random().

1. Find the documentation of the NumPy random number generators.

2. Try the numpy.random.randint() and numpy.random.random() generators while setting the seed to a known value.

3. Test that you can recreate the same sequence of numbers by setting the seed to the same value.

4. Guess how many different random numbers you need to sample until you start seeing the same number repeated. Test your guess. (Hint: you may run into problems when running out of memory etc. It is advisable to increase the number you test relatively slowly (say by a factor of 2 at a time) to avoid starting runs that completely kill your computer for a long time.)

\begin{minted}[breaklines]{Python}
#2
import numpy as np
import numpy.random as npr
npr.seed(123)
draws1 = npr.random(10)

#3
npr.seed(123)
draws2 = npr.random(10)
print(draws1==draws2)

#4
for i in range(20,30):
    print("Testing length", 2**i)
    draws3 = npr.random(2**i)
    if len(np.unique(draws3)) != 2**i:
        print("Clash found at length", 2**i)
        break

#The loop stops if an element is repeated
#numpy.unique
#np.unique([1,1,2,2,3,3])
##array([1,2,3])
\end{minted}

\subsubsection{Simulating discrete distributions}
Many statistical applications depend on generating random numbers with a specific distribution. We will return to the topic many times with increasingly complex methods, but before that we start simple.

1. Write a program to simulate a fair 6-sided die using a uniform(0,1) RNG such as numpy.random.random().

2. Write a program to simulate a biased coin with a specified bias using a uniform(0,1) RNG.

3. Simulate a coin flip competition between a person using a fair coin and a person using a biased coin 100 times. How large a bias do you need for it to lead to 95\% probability of the person using the biased coin to obtain more heads than the person using the fair coin? What if you want 99\% probability?

\begin{minted}[breaklines]{Python}
import numpy.random as npr

#1
npr.seed(123)
def dice():
    draw = npr.random()
    if 0<=draw<1/6:
        return 1
    elif 1/6<=draw<2/6:
        return 2
    elif 2/6<=draw<3/6:
        return 3
    elif 3/6<=draw<4/6:
        return 4
    elif 4/6<=draw<5/6:
        return 5
    else: return 6

throws = [dice() for i in range(100)]

#2
def biased_coin(bias, n_draws):
    return 1*(npr.random(n_draws)<bias)
    
#3
# Consider that aim at each flip is to get 1
bias = 0.51
while True:
    fair_wins = 0 # Initialize wins with fair coin
    biased_wins = 0 # Initialize wins with biased coin
    # Consider we redo the experiment 100 times
    for i in range(100):
        if sum(biased_coin(bias, 100))>sum(biased_coin(0.5, 100)): biased_wins+=1
        else: fair_wins+=1
    if biased_wins>=95 or bias>=0.99: break
    else: bias += 0.01
print(bias)
\end{minted}

\subsubsection{Simulating continuous distributions with transformations}
1. Derive and implement a method for sampling from the distribution Exponential($\lambda$) using the inverse cumulative distribution function transformation (see course notes!).

2. Test your method by drawing 1000 samples with the values $\lambda = 0.1, 1, 10$. Compute the mean and standard deviation of the samples for each case. Plot a histogram of the samples together with the density and check if they match.

3. Implement a method for sampling from the normal distribution using the Box-Muller transformation. Test your method and plot the histogram of the samples together with the density and check if they match.

4. What is the largest value of a normally distributed random number that can be generated like this?

\begin{minted}[breaklines]{Python}
import numpy as np
import numpy.random as npr
import matplotlib.pyplot as plt
#1
# The cumulative distribution function (CDF) of the exponential distribution is 1-exp(-lambda*x).
# The inverse CDF is given by x = -log(1-y)/lambda
def exp_rvs(l, n_draws): #lambda is a keyword in Python so we use l to denote the parameter of exp dist.
    y = npr.random(n_draws)
    draws = -np.log(1-y)/l
    return draws

#2
draw0 = exp_rvs(0.1, 1000)
draw1 = exp_rvs(1.0, 1000)
draw2 = exp_rvs(10.0, 1000)

def exp_pdf(x, l):
    return l*np.exp(-l*x)

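%matplotlib inline

# Sample mean and standard deviation; the theoretical values are both 1/lambda
for l, draw in [(0.1, draw0), (1.0, draw1), (10.0, draw2)]:
    print(l, np.mean(draw), np.std(draw))

# Plot each histogram in its own figure together with the density
plt.figure()
x0 = np.linspace(0.0, np.max(draw0), 100)
plt.plot(x0, exp_pdf(x0, 0.1))
plt.hist(draw0, bins=100, density=True)

plt.figure()
x1 = np.linspace(0.0, np.max(draw1), 100)
plt.plot(x1, exp_pdf(x1, 1.0))
plt.hist(draw1, bins=100, density=True)

plt.figure()
x2 = np.linspace(0.0, np.max(draw2), 100)
plt.plot(x2, exp_pdf(x2, 10.0))
plt.hist(draw2, bins=100, density=True)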

#3
def normal_rvs(mu, sigma, n_draws):
    # 1 - random() lies in (0, 1], which avoids taking log(0)
    u1 = 1 - npr.random(n_draws)
    u2 = npr.random(n_draws)
    return sigma*np.sqrt(-2*np.log(u1))*np.cos(2*np.pi*u2)+mu

# Test the sampler against the normal density
def normal_pdf(x, mu, sigma):
    return np.exp(-(x-mu)**2/(2*sigma**2))/(np.sqrt(2*np.pi)*sigma)

draws = normal_rvs(0.0, 1.0, 1000)
x = np.linspace(np.min(draws), np.max(draws), 1000)
plt.figure()
plt.plot(x, normal_pdf(x, 0.0, 1.0))
plt.hist(draws, bins=100, density=True)

#4
# npr.random() returns multiples of 2^-53, so the smallest nonzero value of u1 is 2^-53
# the largest value of Box-Muller is obtained when u1 is smallest, which is approximately
print("4)", np.sqrt(-2*np.log(2**-53)))
\end{minted}

\subsubsection{Basic rejection sampling}
$p(x) = 4x^3$, $x \in [0,1]$, uniform distribution on the interval [0,1] as the proposal.

\begin{minted}[breaklines]{Python}
%matplotlib inline
import numpy.random as npr
import numpy as np
import matplotlib.pyplot as plt

def RejectionSampler(f_pdf, g_pdf, g_sample, M, N):
    # Returns N samples following pdf f_pdf() using proposal g(x)
    # with pdf g_pdf() that can be sampled by g_sample()
    # Requirement: f_pdf(x) <= M*g_pdf(x) for all x
    i = 0
    rejects = 0
    x = np.zeros(N)
    while i < N:
        x_prop = g_sample()
        u = npr.uniform(0, 1)
        if (u * M * g_pdf(x_prop)) < f_pdf(x_prop):
            # Accept the sample and record it
            x[i] = x_prop
            i += 1
        else:
            rejects += 1
    print("Acceptance rate:", N/(N+rejects))
    return x

samples = RejectionSampler(lambda x: 4*x**3, lambda x: 1, npr.random, 4, 10000)
plt.hist(samples, 30, density=True)
t = np.linspace(0, 1, 30)
plt.plot(t, 4*t**3)
\end{minted}

\subsubsection{Rejection sampling in higher dimensions}
\begin{minted}[breaklines]{Python}
import numpy as np
import numpy.random as npr
import matplotlib.pyplot as plt

npr.seed(123)

#Two_dimensional sampler
def B2_rvs(n_draws):
    n_tot = 0
    #Initialise empty n_draws x 2 matrix (n_draws rows, 2 columns)
    samples = np.empty([n_draws,2])
    i = 0
    while True:
        n_tot += 1
        x, y = 2*npr.random()-1, 2*npr.random()-1
        # Because both the proposal and the target are uniform
        # distributions, by setting M = 1/Z of the target
        # we can accept all proposals that are inside
        # the designated region
        if x**2+y**2<=1:
            samples[i] = [x,y]
            i += 1
        if i==n_draws: break
    accept_ratio = n_draws/n_tot
    return samples, accept_ratio

#2
draws, ar = B2_rvs(1000)
plt.scatter(draws[:,0], draws[:,1])
#draws[:,0] are all the rows in column 0
#draws[:,1] are all the rows in column 1
\end{minted}

\subsubsection{Rejection sampling of Beta(2,3)}
$p(x) = 12x(1-x)^2$, $0 \leq x \leq 1$. 
\begin{minted}[breaklines]{Python}
#First we define the rejection sampler
import numpy as np
import numpy.random as npr

#Rejection sampler 1D
def RejectionSampler(f_pdf, g_pdf, g_sample, M, N):
    # Returns N samples following pdf f_pdf() using proposal g(x)
    # with pdf g_pdf() that can be sampled by g_sample()
    # Requirement: f_pdf(x) <= M*g_pdf(x) for all x
    i = 0
    x = np.zeros(N)
    while i < N:
        x_prop = g_sample()
        u = npr.uniform(0, 1)
        if (u * M * g_pdf(x_prop)) < f_pdf(x_prop):
            # Accept the sample and record it
            x[i] = x_prop
            i += 1
    return x

import matplotlib.pyplot as plt
# Set the random seed
npr.seed(42)
# Define target pdf
def target_pdf(x):
    return 12*x*((1-x)**2)
# Define the proposal pdf and a function to sample from it
def proposal_pdf(x):
    return 1
def sample_proposal():
    return npr.uniform()

#We plot p(x)/q(x) to find a value for M
x = np.linspace(0,1,100)
plt.plot(target_pdf(x)/proposal_pdf(x))

#We define m
M = 2
N = 100000
mysample = RejectionSampler(target_pdf, proposal_pdf, sample_proposal, M, N)

#We plot the results
#subplots one row, two columns
fig, ax = plt.subplots(1, 2)
t = np.linspace(0, 1, 100)
t2 = np.linspace(0, 1, 100)
#First subplot
ax[0].plot(t2, target_pdf(t2) / (M*proposal_pdf(t2)))
ax[0].set_title(r'$f(x) / (M \cdot g(x))$')
#Second subplot
ax[1].hist(mysample, 100, density=True)
ax[1].plot(t, target_pdf(t), 'g')
ax[1].set_title('samples')
plt.show()

#Compute the expectation
print(np.mean(mysample**5))
#The theoretical value
print((2/5)*(3/6)*(4/8)*(5/7)*(6/9))
\end{minted}

\subsubsection{Rejection sampling of a Gaussian with Laplace}
The proposal is the Laplace density $q(x) = \frac{1}{2} \exp(-|x|)$ and the target is the standard normal density $p(x) = \frac{1}{\sqrt{2 \pi}} \exp(-\frac{1}{2}x^2)$.

\begin{minted}[breaklines]{Python}
#1.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as scs

def laplacesampler(size):
    # Inverse cdf transformation of uniform draws, vectorised over all draws
    # (the original loop skipped the last draw)
    y = np.random.uniform(0, 1, size)
    return np.where(y <= 0.5, np.log(2*y), -np.log(2*(1-y)))

mysample = laplacesampler(10000)
plt.subplot(2,1,1)
plt.hist(mysample, 100, density=True)
t = np.linspace(-10,10, 1000)
plt.plot(t, scs.laplace.pdf(t), "g")

#2.
import numpy.random as npr
def RejectionSampler(f_pdf, g_pdf, g_sample, M, N):
    # Returns N samples following pdf f_pdf() using proposal g(x)
    # with pdf g_pdf() that can be sampled by g_sample()
    # Requirement: f_pdf(x) <= M*g_pdf(x) for all x
    i = 0
    x = np.zeros(N)
    while i < N:
        x_prop = g_sample()
        u = npr.uniform(0, 1)
        if (u * M * g_pdf(x_prop)) < f_pdf(x_prop):
            # Accept the sample and record it
            x[i] = x_prop
            i += 1
    return x

import matplotlib.pyplot as plt
# Set the random seed
npr.seed(42)
# Define target pdf
def target_pdf(x):
    return scs.norm.pdf(x)
# Define the proposal pdf and a function to sample from it
def proposal_pdf(x):
    return scs.laplace.pdf(x)
def sample_proposal():
    return npr.laplace()

#We plot p(x)/q(x) to find a value for M
x = np.linspace(-10,10,100)
plt.subplot(2,1,2)
plt.plot(target_pdf(x)/proposal_pdf(x))

#We define m
M = 2
N = 10000
mysample = RejectionSampler(target_pdf, proposal_pdf, sample_proposal, M, N)

#We plot the results
fig, ax = plt.subplots(1, 2)
t = np.linspace(-3, 3, 1000)
t2 = np.linspace(-3,3, 1000)
ax[0].plot(t2, target_pdf(t2) / (M*proposal_pdf(t2)))
ax[0].set_title(r'$f(x) / (M \cdot g(x))$')
ax[1].hist(mysample, 100, density=True)
ax[1].plot(t, target_pdf(t), 'g')
ax[1].set_title('samples')
plt.show()
\end{minted}

\section{Multivariate normal distributions and numerical linear algebra}
The multivariate normal distribution is perhaps the most common distribution in statistics.

A $d$-dimensional multivariate normal distribution $N(\mu, \Sigma)$ is defined by a $d$-dimensional mean vector $\mu$ and a symmetric positive-definite $d \times d$ covariance matrix $\Sigma$.

\subsection{Cholesky decomposition}
A symmetric positive definite matrix $\Sigma$ can be represented as
\[
\Sigma = LL^T
\]
where $L$ is a lower-triangular matrix.

\begin{minted}{Python}
    scipy.linalg.cholesky(.., lower=True)
    numpy.linalg.cholesky(..)
\end{minted}
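
The two routines differ in their defaults: SciPy returns the upper-triangular factor unless \texttt{lower=True} is given, while NumPy returns the lower-triangular factor. A short sketch with an illustrative $2 \times 2$ covariance matrix (the values are chosen here, not taken from the notes):

\begin{minted}[breaklines]{Python}
import numpy as np
import scipy.linalg as slg

# Example covariance matrix (illustrative values)
Sigma = np.array([[4.0, 1.6],
                  [1.6, 1.0]])

L_np = np.linalg.cholesky(Sigma)         # lower-triangular factor
L_sp = slg.cholesky(Sigma, lower=True)   # upper factor unless lower=True

print(L_np)
print(np.allclose(L_np @ L_np.T, Sigma))  # the factor reproduces Sigma
print(np.allclose(L_np, L_sp))            # both routines agree
\end{minted}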

\subsection{Evaluating the multivariate normal density}
\[
\log p(x) = -\frac{d}{2}\log (2\pi) - \frac{1}{2}\log |\det \Sigma| - \frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)
\]

\subsubsection{Determinant evaluation using the Cholesky decomposition}
\[
\log \det \Sigma = 2 \sum_{i=1}^d \log (l_{ii})
\]
where $l_{ii}$ are the diagonal elements of the Cholesky factor $L$.

\subsubsection{Quadratic form evaluation with Cholesky}
\[
(x-\mu)^T \Sigma^{-1} (x - \mu) = \sum_{i=1}^d z_i^2
\]
where $z = (z_1, \dots, z_d)^T=L^{-1}(x-\mu)$

You should never explicitly compute $\Sigma^{-1}$!

\begin{minted}{Python}
import scipy.linalg as slg
z = slg.solve_triangular(L, x-mu, lower=True)
\end{minted}

\subsection{Simulating multivariate normal random draws}

\begin{theorem}
Assuming $x \sim N(0, I_d)$, we can obtain $y \sim N(\mu, \Sigma)$ by using the transformation
\[
y = Lx + \mu
\]
where $L$ is the Cholesky factor of $\Sigma$, that is, $\Sigma = LL^T$.
\end{theorem}
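
The result can be verified from the first two moments, since an affine transformation of a normally distributed vector is again normally distributed:
\[
E[y] = L \, E[x] + \mu = \mu, \qquad
\mathrm{Cov}(y) = L \, \mathrm{Cov}(x) \, L^T = L I_d L^T = \Sigma
\]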

\subsection{Numerical integration}
Bayesian statistics provides a theoretically well-founded framework for answering many statistical questions. However, it often requires evaluating complicated integrals that cannot be solved analytically.

Numerical integration concerns the approximation of definite integrals through finite sums:
\[
\int_V f(x) dx \approx \sum_i w_i f(x_i)
\]
\begin{enumerate}
\item Rectangle rule: piece-wise constant interpolation 
\item Trapezoidal rule: piece-wise linear interpolation
\item Simpson's rule: piece-wise quadratic interpolation
\end{enumerate}
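
The three rules can be sketched in a few lines each. The implementation below is illustrative only: it assumes equally spaced nodes, uses the midpoint variant of the rectangle rule, and tests the rules on $\int_0^1 \sin(x)\,dx = 1 - \cos(1)$, an example chosen here.

\begin{minted}[breaklines]{Python}
import numpy as np

def rectangle(f, a, b, n):
    """Midpoint rectangle rule with n subintervals."""
    h = (b - a) / n
    mid = a + h * (np.arange(n) + 0.5)
    return h * np.sum(f(mid))

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals."""
    y = f(np.linspace(a, b, n + 1))
    return (b - a) / n * (0.5 * y[0] + np.sum(y[1:-1]) + 0.5 * y[-1])

def simpson(f, a, b, n):
    """Composite Simpson's rule; n must be even."""
    y = f(np.linspace(a, b, n + 1))
    h = (b - a) / n
    return h / 3 * (y[0] + 4 * np.sum(y[1:-1:2])
                    + 2 * np.sum(y[2:-1:2]) + y[-1])

exact = 1.0 - np.cos(1.0)            # integral of sin(x) over [0, 1]
for rule in (rectangle, trapezoid, simpson):
    approx = rule(np.sin, 0.0, 1.0, 10)
    print(rule.__name__, approx, abs(approx - exact))
\end{minted}

With the same number of function evaluations, the error of Simpson's rule is several orders of magnitude smaller than that of the other two rules for this smooth integrand.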

No efficient general methods for high-dimensional numerical integration are known. Monte Carlo methods are often the best choice.

\subsubsection{Software for numerical integration}
\begin{minted}{Python}
scipy.integrate.quad
scipy.integrate.trapezoid   # formerly trapz
scipy.integrate.simpson     # formerly simps
\end{minted}

\subsection{Some best practices}
Always use the Cholesky decomposition instead of an explicit inverse.

Always use the Cholesky decomposition instead of an explicit determinant.

\subsection{Exercises}
\subsubsection{Generating multivariate normal random vectors}
We will test generating and visualising multivariate random vectors in 2D.

\begin{minted}[breaklines]{Python}
%matplotlib inline
import numpy as np
import numpy.random as npr
import matplotlib.pyplot as plt

rho = 0.5
#Covariance matrix
S = np.array([[1.0, rho], [rho, 1.0]])
A = np.linalg.cholesky(S)
x = npr.randn(2, 10000)
y = A @ x
plt.plot(y[0,:], y[1,:], '.', alpha=0.1)
# Set equal axes to make the plot easier to interpret
plt.gca().set_aspect('equal', 'datalim')
l, V = np.linalg.eig(S)
plt.plot(np.array([0, V[0,0]]), np.array([0, V[1, 0]]), 'k')
plt.plot(np.array([0, V[0,1]]), np.array([0, V[1, 1]]), 'k')
\end{minted}

\subsubsection{Multivariate normal density}
1. Compute $\log|\det\Sigma|$ using the Cholesky decomposition of $\Sigma$ without using the determinant function.

2. Evaluate the quadratic form  

3. Repeat the example with changing the value 

\begin{minted}[breaklines]{Python}
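import numpy as np
import scipy.linalg

# Covariance matrix and its Cholesky factor from the previous exercise
rho = 0.5
S = np.array([[1.0, rho], [rho, 1.0]])
A = np.linalg.cholesky(S)

#1
log_det = 2*sum(np.log(np.abs(np.diag(A))))
print(log_det)
print(np.linalg.slogdet(S)[1])

#2
x = np.array([1.0, 2.0])
mu = np.zeros(2)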
res0 = scipy.linalg.solve_triangular(A, x-mu, lower=True)
res0 = res0 @ res0
print(res0)
print((x-mu) @ np.linalg.inv(S) @ (x-mu))

#3
sigma1 = sigma2 = 1
rhos = np.linspace(0.99, 1.0, 10, endpoint=False)


for rho in rhos:
    S = np.array([[sigma1**2, rho*sigma1*sigma2], [rho*sigma1*sigma2, sigma2**2]])
    A = np.linalg.cholesky(S)
    res0 = scipy.linalg.solve_triangular(A, x-mu, lower=True)
    res0 = res0 @ res0
    res1 = (x-mu) @ np.linalg.inv(S) @ (x-mu)
    print(np.abs(res0-res1))
    
\end{minted}

\subsubsection{High-dimensional normal random variables}
1. Distribution of the norms of the random vectors. Generate d-dimensional multivariate normal random variables for d=1, 3, 10, 30, 100, 300, 1000 and plot a histogram of their norms. What do you observe? How do these match the intuition of a low-dimensional bell-shaped curve with most of the probability near the origin?

2. Distribution of the angles between pairs of random vectors. Generate pairs of d-dimensional multivariate normal random variables for d=3, 10, 30, 100, 300, 1000 and plot a histogram of the angles between them. What do you observe?

\begin{minted}[breaklines]{Python}
%matplotlib inline
import numpy as np
import numpy.random as npr
import matplotlib.pyplot as plt

def gaussnorms(d, n=10000):
    x = npr.normal(size=(d, n))
    r = np.sqrt(np.sum(x**2, 0))
    return r

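plt.hist(gaussnorms(1000))

def gaussprods(d, n=10000):
    # Returns the cosines of the angles between n pairs of d-dimensional vectors
    x = npr.normal(size=(d, n))
    y = npr.normal(size=(d, n))
    r = np.sum(x*y, 0) / (np.sqrt(np.sum(x**2, 0)) * np.sqrt(np.sum(y**2, 0)))
    return r

# Convert the cosines to angles and plot them in a new figure
plt.figure()
plt.hist(np.arccos(gaussprods(100)))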
\end{minted}

\subsubsection{Numerical integration by quadrature in 1D}
Integrate $f(x) = \cos(\|x\|_2)\,p_x(x)$, where $p_x (x) = (2 \pi)^{-D/2}\exp(-\|x\|_2^2 / 2)$ is the density of a $D$-dimensional normal distribution with zero mean and identity covariance matrix $I_D$.

\begin{minted}[breaklines]{Python}
%matplotlib inline
import numpy as np
import numpy.random as npr
import scipy.integrate
import matplotlib.pyplot as plt

def unitgausspdf(x):
    # np.size works for scalars and arrays; np.alen has been removed from NumPy
    D = np.size(x)
    return (2*np.pi)**(-D/2) * np.exp(-0.5*np.linalg.norm(x)**2)

def f(x):
    return np.cos(np.linalg.norm(x)) * unitgausspdf(x)
    
t = np.linspace(-10, 10, 100)
y = np.zeros(t.shape)
for k in range(len(t)):
    y[k] = f(t[k])
plt.plot(t, y)

# scipy.integrate.quad returns the value of the definite integral and an error estimate
print(scipy.integrate.quad(f, 0, 1))
print(scipy.integrate.quad(f, 0, 3))
print(scipy.integrate.quad(f, 0, np.inf))
print(scipy.integrate.quad(f, 0, 10))
\end{minted}

\subsubsection{Monte Carlo numerical integration}
\begin{minted}[breaklines]{Python}
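# Uses f, np, npr and scipy.integrate from the previous exercise
def mc_integral(lower, upper, n):
    # Accept scalar bounds (1D) as well as vectors of bounds
    lower = np.atleast_1d(np.asarray(lower, dtype=float))
    upper = np.atleast_1d(np.asarray(upper, dtype=float))
    d = len(lower)
    # Volume of the integration region
    V = np.prod(upper - lower)
    # Uniform draws from the box [lower, upper]
    x = npr.random([n, d])*(upper - lower) + lower
    return np.mean([f(x_i) for x_i in x])*V

for i in range(1,5):
    print('I_mc={}, n={}'.format(mc_integral(0.0, 10.0, 10**i), 10**i))
print('I_scipy={}'.format(scipy.integrate.quad(f, 0, 10)))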
\end{minted}

\subsubsection{Numerical evaluation of multivariate normal probabilities}
\[
\ln p(x; \mu, \Sigma) = - \frac{d}{2} \ln(2 \pi) - \frac{1}{2}\ln |\det\Sigma| - \frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)
\]

\begin{minted}[breaklines]{Python}
import math
import numpy as np
import scipy.linalg as slg

def logpdf(x, mu,  Sigma):
    L = np.linalg.cholesky(Sigma)
    z = slg.solve_triangular(L, x-mu, lower=True)
    # log|det Sigma| from the Cholesky factor instead of an explicit determinant
    log_det = 2*np.sum(np.log(np.diag(L)))
    value = -(len(x)/2)*math.log(2*math.pi) - 1/2*log_det - 1/2 * np.dot(z,z)
    return value
#i.
print(logpdf(np.array([0,0]), np.array([0,0]), np.array( ((4,2*0.8), (2*0.8, 1)) )))
#ii.
print(logpdf(np.array([0,0]), np.array([0,0]), np.array( ((4,2*0.999), (2*0.999, 1)) )))
#iii.
print(logpdf(np.array([1,1]), np.array([0,0]), np.array( ((4,2*0.999), (2*0.999, 1)) )))
#iv.
print(logpdf(np.array([0,0]), np.array([0,0]), np.array( ((4,2*(-0.999)), (2*(-0.999), 1)) )))

\section{Week 1}

\subsubsection{Input ranges for overflow and underflow}
Find the largest integer for which exp() over double precision floating point numbers (float64) returns a finite value.

Write a program to determine the smallest integer $x$ for which $\phi(x) = 1$ when using double precision floating point arithmetic, where $\phi(x) = 1/(1+\exp(-x))$ is the logistic function.

\begin{minted}[breaklines]{Python}
#i.
import math
import numpy as np
import sys

#We can cheat and look for the maximum value directly 
print(np.finfo(np.float64).max)
print(math.log(np.finfo(np.float64).max))
#The logarithm is about 709.78, so the largest integer is 709

#ii.
i = 0
while True:
    a = 1/(1+math.exp(np.float64(-i)))
    if a == 1:
        break
    i = i + 1
print(i)
#37
\end{minted}

\subsubsection{Numerical computation of binomial probabilities}
\[
f(l, u, n, p) = \sum_{i=l}^u {n \choose i} p^i (1-p)^{n-i}
\]

\begin{minted}[breaklines]{Python}
import math
def f(l, u, n, p):
    summa = 0
    # Sum over i = l, ..., u; range() excludes its end point, hence u+1
    for i in range(l, u+1):
        # math.comb computes the binomial coefficient with exact integer arithmetic
        value = math.comb(n, i)*(p**i)*((1-p)**(n-i))
        summa = summa + value
    return summa
#i.
print(f(0,5,10,0.25))
#ii.
print(f(10,20,20,0.25))
#iii.
print(f(40,60,100,0.25))
#iv
print(f(75,100,100,0.25))
\end{minted}