# Integration and Sampling

In this notebook we first investigate sampling random numbers from non-uniform distributions, and then use random number sampling to calculate integrals.

## Requirements

We need a random number generator. We could use one of the RNGs implemented in [`rng.ipynb`](rng.ipynb), but instead we will use the default `numpy` RNG. We also need the `math` module and `matplotlib`.

In [None]:
# Import the `numpy` and `math` modules.
import numpy as np
import math

# Import the `matplotlib` module.
import matplotlib.pyplot as plt

# Create an RNG, with a seed of 10.
rng = np.random.default_rng(10)

## Introduction

Typical events produced in proton-proton collisions at the Large Hadron Collider (LHC) contain $\mathcal{O}(100)$ or more particles. When calculating a cross-section for a two-to-two process, we typically only need to integrate over two variables, $\theta$ and $\phi$. A two-to-$n$ process requires integrating over $3n - 4$ variables (each of the $n$ final-state particles has three momentum components, and four-momentum conservation removes four), so a typical LHC event would require integrating over $\mathcal{O}(300)$ variables. This is numerically challenging at best, and with current technology it is simply not possible. To calculate LHC events, we can instead factorise the problem into more manageable parts using probabilistic methods. Even so, calculating a perturbative cross-section for a $4$-body final state requires integrating over $8$ variables, which is a challenging numerical integration. The bottom line is that performing high-dimensional integrals quickly and efficiently is a core problem in particle physics, and is very numerically challenging.
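As a rough illustration of why Monte Carlo (MC) methods suit this problem, the sketch below estimates an $8$-dimensional integral over the unit hypercube by averaging the integrand at uniformly random points. The statistical error of such an estimate falls like $1/\sqrt{N}$ in the number of points $N$, with no explicit dependence on the number of dimensions. The helper `mc_integrate` and the test integrand `sum_of_squares` (whose exact integral is $d/3$ in $d$ dimensions) are hypothetical illustrations, not part of this notebook's later implementation.

```python
import numpy as np


def mc_integrate(f, dim, n, rng):
    """
    Plain MC estimate of the integral of f over the unit hypercube [0, 1]^dim.

    Returns the estimate and its statistical (one standard deviation) error.
    """
    # Draw n points uniformly inside the hypercube; shape (n, dim).
    points = rng.random((n, dim))
    values = f(points)
    # The hypercube has unit volume, so the integral is the mean of f.
    estimate = values.mean()
    # The error scales as 1/sqrt(n), with no explicit dependence on dim.
    error = values.std(ddof=1) / np.sqrt(n)
    return estimate, error


def sum_of_squares(points):
    """Hypothetical test integrand x_1^2 + ... + x_dim^2; exact integral is dim / 3."""
    return (points**2).sum(axis=1)


demo_rng = np.random.default_rng(1)
estimate, error = mc_integrate(sum_of_squares, 8, 100_000, demo_rng)
print(f"estimate = {estimate:.4f} +/- {error:.4f}, exact = {8 / 3:.4f}")
```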

However, before we tackle integration with Monte Carlo (MC) methods, we first need to discuss how to efficiently sample distributions. In the [`rng.ipynb`](rng.ipynb) notebook, we worked hard to make a good generator for uniformly distributed random variates. In practice, however, the probability distributions of interest are not uniform. Fortunately, uniform random variates can either be transformed into a different distribution or used as part of an accept/reject algorithm that converges to the desired probability distribution. Random variates, uniform or not, are also a primary part of the MC integration method, so it is worthwhile to know how to transform uniform distributions into more complicated ones.
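A minimal sketch of the accept/reject idea mentioned above, assuming the target $f(x)$ is bounded by a known constant $f_\max$ on $[x_\min, x_\max]$: propose $x$ uniformly, then keep it with probability $f(x)/f_\max$. The function name `accept_reject` and the test target $f(x) = 2x$ are illustrative assumptions, not definitions used elsewhere in this notebook.

```python
import numpy as np


def accept_reject(f, fmax, xmin, xmax, n, rng):
    """
    Draw n samples from f(x) on [xmin, xmax] by accept/reject.

    Assumes 0 <= f(x) <= fmax everywhere on [xmin, xmax].
    """
    samples = []
    while len(samples) < n:
        # Propose x uniformly over the sampling region.
        x = rng.uniform(xmin, xmax)
        # Accept x with probability f(x) / fmax.
        if rng.uniform(0.0, fmax) < f(x):
            samples.append(x)
    return np.array(samples)


demo_rng = np.random.default_rng(2)
# Hypothetical target f(x) = 2x on [0, 1], bounded by fmax = 2; its mean is 2/3.
accepted = accept_reject(lambda x: 2 * x, 2.0, 0.0, 1.0, 50_000, demo_rng)
print(accepted.mean())
```

The fraction of accepted proposals is the ratio of the area under $f(x)$ to the area of the bounding box, here $1/2$, which is why the efficiency of this method depends on how tightly $f_\max$ bounds $f(x)$.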

In this notebook, we only consider continuous distributions, but everything that we say can be applied, with some modification, to discrete distributions.

## Analytic Sampling

Analytic, or inverse cumulative distribution function (CDF), sampling allows us to transform a uniform distribution into our target distribution, $f(x)$. However, this is not possible for every $f(x)$. To sample $f(x)$ this way, the following conditions must generally be fulfilled.

1. The sampling region of $f(x)$ is bounded, and $f(x)$ is non-negative over this region.

$$
f(x) \geq 0 \text{ for } x_\min < x < x_\max
$$

2. The indefinite integral of $f(x)$ can be calculated analytically.

$$
F(x) = \int \text{d}x\, f(x)
$$

3. The integral $F(x)$ can be inverted analytically; we label the inverse $F^{-1}(x)$.

With these three conditions met, we can sample $f(x)$ as follows. First, consider integrating $f(x)$ from $x_\min$ to $x$, as shown in the figure below.

![Schematic of analytic sampling.](figures/sampleAnalytic)

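We then draw a uniform random number $R$ between $0$ and $1$ and require the area under $f(x)$ from $x_\min$ to $x$ to be the fraction $R$ of the total area, which gives the following relation.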

$$
\int_{x_{\min}}^x \text{d}x'\, f(x') = R \int_{x_{\min}}^{x_{\max}} \text{d}{x'}\, f(x')
$$

We then perform the integration, where $F(x)$ is the indefinite integral of $f(x)$.

$$
F(x) - F(x_{\min}) = R(F(x_\max) - F(x_\min))
$$

We can then write $F(x_\max) - F(x_\min)$ as $A$, the total area under $f(x)$ over the sampling region.

$$
F(x) - F(x_{\min}) = R A
$$

We then solve for $x$.

$$
x = F^{-1}(F(x_{\min}) + R A)
$$

So, we can uniformly sample $R$ and then use the final relation to transform it into a value of $x$ distributed according to $f(x)$.
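As a worked example of this recipe, consider a hypothetical target $f(x) = e^{-x}$ on $[x_\min, x_\max]$, for which $F(x) = -e^{-x}$ and $F^{-1}(y) = -\ln(-y)$. The sketch below applies $x = F^{-1}(F(x_\min) + R A)$ to a whole array of uniform numbers at once using `numpy`; the function names are illustrative only.

```python
import numpy as np


def F_exp(x):
    """Indefinite integral of the hypothetical target f(x) = exp(-x)."""
    return -np.exp(-x)


def F_exp_inv(y):
    """Inverse of F_exp, valid for y < 0."""
    return -np.log(-y)


def sample_exponential(r, xmin, xmax):
    """Map uniform numbers r in [0, 1) to x distributed as exp(-x) on [xmin, xmax]."""
    # A is the area under f(x) over the sampling region.
    area = F_exp(xmax) - F_exp(xmin)
    # x = F^-1(F(xmin) + R A), applied element-wise.
    return F_exp_inv(F_exp(xmin) + r * area)


demo_rng = np.random.default_rng(3)
exp_samples = sample_exponential(demo_rng.random(200_000), 1.0, 3.0)
print(exp_samples.min(), exp_samples.max(), exp_samples.mean())
```

Setting $R = 0$ returns $x_\min$, and $R \to 1$ approaches $x_\max$, which is a quick sanity check that a given $F$ and $F^{-1}$ pair is consistent.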

### Exercise: generic sampler

Before we try to generate any specific distributions using this method, let us first set up a generic sampler class which uses the steps above.

In [None]:
### START_EXERCISE
class SampleAnalytic:
    """
    Base class to analytically sample a target distribution by transforming
    a uniform random distribution.
    """

    def __init__(self, rng, xmin, xmax):
        """
        Initialize the sampler, given the limits on f(x).

        rng:  uniform random number generator, should have method `uniform()`.
        xmin: lower bound of the sampling region.
        xmax: upper bound of the sampling region.
        """
        self.rng = rng
        self.xmin = xmin
        self.xmax = xmax
        self.F_xmin = self.F(xmin)
        self.area = self.F(xmax) - self.F(xmin)

    def f(self, x):
        """
        Return the function being sampled, f(x). This method is not necessary,
        but very useful for importance sampling and checking the distribution.

        x: value to calculate f(x) for.
        """
        # Implement f(x) here.
        return 0.0

    def F(self, x):
        """
        Returns F(x), the indefinite integral for f(x).

        x: value to calculate the indefinite integral for f(x).
        """
        # Implement F(x) here.
        return 0.0

    def F_inv(self, f):
        """
        Returns F^-1(f), the inverse of F(x).

        f: the value of F(x) to invert.
        """
        # Implement F^-1(x) here.
        return 0.0

    def __call__(self):
        """
        Return the sampled value.
        """
        # Define the function from above that transforms a uniformly sampled
        # random number to the desired distribution.
        return 0.0


###STOP_EXERCISE

In [None]:
### START_SOLUTION
class SampleAnalytic:
    """
    Base class to analytically sample a target distribution by transforming
    a uniform random distribution.
    """

    def __init__(self, rng, xmin, xmax):
        """
        Initialize the sampler, given the limits on f(x).

        rng:  uniform random number generator, should have method `uniform()`.
        xmin: lower bound of the sampling region.
        xmax: upper bound of the sampling region.
        """
        self.rng = rng
        self.xmin = xmin
        self.xmax = xmax
        self.F_xmin = self.F(xmin)
        self.area = self.F(xmax) - self.F(xmin)

    def f(self, x):
        """
        Return the function being sampled, f(x). This method is not necessary,
        but very useful for importance sampling and checking the distribution.

        x: value to calculate f(x) for.
        """
        return 0.0

    def F(self, x):
        """
        Returns F(x), the indefinite integral for f(x).

        x: value to calculate the indefinite integral for f(x).
        """
        return 0.0

    def F_inv(self, f):
        """
        Returns F^-1(f), the inverse of F(x).

        f: the value of F(x) to invert.
        """
        return 0.0

    def __call__(self):
        """
        Return the sampled value.
        """
        # Sample the uniform random number.
        r = self.rng.uniform()
        return self.F_inv(self.F_xmin + r * self.area)


###STOP_SOLUTION

### Exercise: linear function

Sample from a linear distribution with the following form.

$$
f(x) = mx + b
$$

In [None]:
###START_EXERCISE
class SampleLinear(SampleAnalytic):
    """
    Class to analytically sample a linear function.
    """

    def __init__(self, rng, xmin, xmax, m, b):
        """
        Initialize the sampler, given the limits on f(x) and the linear
        parameters.

        f(x) = mx + b

        rng:  uniform random number generator, should have method `uniform()`.
        xmin: lower bound of the sampling region.
        xmax: upper bound of the sampling region.
        m:    slope of the linear distribution.
        b:    intercept of the linear distribution.
        """
        # Set the linear parameters. This must be done before the base class
        # is initialized.
        # Initialize the base class.

    def f(self, x):
        """
        Return the function being sampled, f(x).

        x: value to calculate f(x) for.
        """
        return 0.0

    def F(self, x):
        """
        Returns F(x), the indefinite integral for f(x).

        x: value to calculate the indefinite integral for f(x).
        """
        return 0.0

    def F_inv(self, f):
        """
        Returns F^-1(f), the inverse of F(x).

        f: the value of F(x) to invert.
        """
        # Handle the special case of no slope.
        return 0.0


###STOP_EXERCISE

In [None]:
###START_SOLUTION
class SampleLinear(SampleAnalytic):
    """
    Class to analytically sample a linear function.
    """

    def __init__(self, rng, xmin, xmax, m, b):
        """
        Initialize the sampler, given the limits on f(x) and the linear
        parameters.

        f(x) = mx + b

        rng:  uniform random number generator, should have method `uniform()`.
        xmin: lower bound of the sampling region.
        xmax: upper bound of the sampling region.
        m:    slope of the linear distribution.
        b:    intercept of the linear distribution.
        """
        # Set the linear parameters. This must be done before the base class
        # is initialized.
        self.m = m
        self.b = b
        # Initialize the base class.
        super().__init__(rng, xmin, xmax)

    def f(self, x):
        """
        Return the function being sampled, f(x).

        x: value to calculate f(x) for.
        """
        return self.m * x + self.b

    def F(self, x):
        """
        Returns F(x), the indefinite integral for f(x).

        x: value to calculate the indefinite integral for f(x).
        """
        return self.m * x**2 / 2 + self.b * x

    def F_inv(self, f):
        """
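        Returns F^-1(f), the inverse of F(x).

        f: the value of F(x) to invert.
        """
        # Handle the special case of no slope.
        if self.m == 0:
            return f / self.b
        # Keep the root of m x^2 / 2 + b x = f for which f(x) = m x + b >= 0,
        # clipping tiny negative discriminants caused by rounding.
        disc = max(self.b**2 + 2 * self.m * f, 0.0)
        return (math.sqrt(disc) - self.b) / self.m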


###STOP_SOLUTION
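For reference, here is one way to derive $F^{-1}$ for the linear case, including which root of the quadratic to keep. Writing $y = F(x)$ and solving for $x$:

$$
\frac{m}{2}x^2 + b x = y
\quad\Longrightarrow\quad
x = \frac{-b \pm \sqrt{b^2 + 2 m y}}{m}, \qquad m \neq 0.
$$

Substituting either root back into $f(x) = mx + b$ gives $f(x) = \pm\sqrt{b^2 + 2my}$. Since condition 1 requires $f(x) \geq 0$ over the sampling region, the $+$ root is the one to keep, for either sign of $m$ and for sampling regions at negative as well as positive $x$:

$$
F^{-1}(y) = \frac{\sqrt{b^2 + 2 m y} - b}{m}, \qquad m \neq 0,
$$

while for $m = 0$ the relation $b x = y$ gives $F^{-1}(y) = y / b$.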

Now, let us test whether this sampler works for $m = 3$ and $b = 2$ between $0$ and $1$. We will want to test a number of distributions, so let us first write a small helper function that does just that. The following function plots the normalized sampled distribution and compares it to the normalized target function $f(x)$.

In [None]:
def plot_sampler(sampler, n=100000, bins=50):
    """
    Plot the distribution generated by a sampler against its normalized target f(x).

    sampler: random number sampler.
    n:       number of points to sample.
    bins:    number of bins in the histogram.
    """
    # Sample the distribution.
    rns = [sampler() for _ in range(n)]

    # Calculate the target function.
    xs = np.linspace(sampler.xmin, sampler.xmax)
    fs = [sampler.f(x) / sampler.area for x in xs]

    # Create the plot.
    fig, ax = plt.subplots()

    # Draw the histogram.
    ax.hist(rns, bins=bins, density=True, label="generated")

    # Draw the target function (already normalized by the sampler area above).
    ax.plot(xs, fs, label="target")

    # Draw the legend.
    ax.legend()

    return fig, ax

With this function, check whether the distribution being generated matches the target.

In [None]:
###START_EXERCISE
# Create the sampler.

# Call the `plot_sampler` method.
###STOP_EXERCISE

In [None]:
###START_SOLUTION
# Create the sampler.
sampler = SampleLinear(rng, 0, 1, 3, 2)

# Plot the comparison.
plot_sampler(sampler);
###STOP_SOLUTION
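Besides the visual comparison, a numerical check can help. The sketch below, a hypothetical helper `max_density_deviation` not used elsewhere in this notebook, histograms a set of samples and returns the largest difference between the histogram density and $f(x)/A$ evaluated at the bin centres. It assumes all samples fall within $[x_\min, x_\max]$, which holds for the bounded analytic samplers but not for an unbounded one.

```python
import numpy as np


def max_density_deviation(samples, f, xmin, xmax, area, bins=20):
    """
    Largest absolute difference between the histogram density of `samples`
    and the normalized target f(x) / area, evaluated at the bin centres.
    """
    density, edges = np.histogram(
        samples, bins=bins, range=(xmin, xmax), density=True
    )
    centres = 0.5 * (edges[:-1] + edges[1:])
    target = np.array([f(x) for x in centres]) / area
    return np.max(np.abs(density - target))


demo_rng = np.random.default_rng(5)
uniform_samples = demo_rng.random(200_000)
# A correct target (f = 1 on [0, 1]) should give a small deviation...
dev_ok = max_density_deviation(uniform_samples, lambda x: 1.0, 0.0, 1.0, 1.0)
# ...while a wrong target (f = 2x, also of unit area) should give a large one.
dev_wrong = max_density_deviation(uniform_samples, lambda x: 2 * x, 0.0, 1.0, 1.0)
print(dev_ok, dev_wrong)
```

With the samplers in this notebook, one could call it as `max_density_deviation([sampler() for _ in range(100000)], sampler.f, sampler.xmin, sampler.xmax, sampler.area)`. Evaluating $f(x)$ at bin centres is exact for a linear $f(x)$ but introduces a small bias for curved targets, so an acceptable deviation should be judged relative to the bin width and sample size.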

### Exercise: Breit-Wigner

The relativistic Breit-Wigner, which has the form of a Cauchy distribution in the squared mass, is of particular importance in particle physics because it can describe the distribution of masses for a specific particle type, e.g., a $Z$ boson. If we want to efficiently sample a mass distribution, then we need to be able to sample a relativistic Breit-Wigner. The form of the function is as follows.

$$
f(x) = \frac{1}{\pi}\left(\frac{\gamma}{(x - x_0)^2 + \gamma^2}\right)
$$

Remember, the normalization of this function does not matter, since the sampler only uses ratios of areas under $f(x)$. In a particle physics context, $x$ is the squared mass $s = m^2$, $\gamma$ is $M\Gamma$ where $M$ is the mass of the particle and $\Gamma$ is its width, and $x_0$ is $M^2$.

Implement a sampler for the Breit-Wigner distribution.

In [None]:
###START_EXERCISE
class SampleCauchy(SampleAnalytic):
    """
    Class to analytically sample a Cauchy function.
    """

    def __init__(self, rng, xmin, xmax, x0, gamma):
        """
        Initialize the sampler, given the limits on f(x) and the Cauchy
        parameters.

        f(x) = 1/pi * gamma/((x - x0)^2 + gamma^2)

        rng:   uniform random number generator, should have method `uniform()`.
        xmin:  lower bound of the sampling region.
        xmax:  upper bound of the sampling region.
        x0:    location parameter.
        gamma: scale parameter.
        """
        # Set the parameters.
        # Initialize the base class.
        super().__init__(rng, xmin, xmax)

    def f(self, x):
        """
        Return the function being sampled, f(x).

        x: value to calculate f(x) for.
        """
        return 0.0

    def F(self, x):
        """
        Returns F(x), the indefinite integral for f(x).

        x: value to calculate the indefinite integral for f(x).
        """
        return 0.0

    def F_inv(self, f):
        """
        Returns F^-1(f), the inverse of F(x).

        f: the value of F(x) to invert.
        """
        return 0.0


###STOP_EXERCISE

In [None]:
###START_SOLUTION
class SampleCauchy(SampleAnalytic):
    """
    Class to analytically sample a Cauchy function.
    """

    def __init__(self, rng, xmin, xmax, x0, gamma):
        """
        Initialize the sampler, given the limits on f(x) and the Cauchy
        parameters.

        f(x) = 1/pi * gamma/((x - x0)^2 + gamma^2)

        rng:   uniform random number generator, should have method `uniform()`.
        xmin:  lower bound of the sampling region.
        xmax:  upper bound of the sampling region.
        x0:    location parameter.
        gamma: scale parameter.
        """
        # Set the parameters.
        self.x0 = x0
        self.gamma = gamma
        # Initialize the base class.
        super().__init__(rng, xmin, xmax)

    def f(self, x):
        """
        Return the function being sampled, f(x).

        x: value to calculate f(x) for.
        """
        return 1 / math.pi * self.gamma / ((x - self.x0) ** 2 + self.gamma**2)

    def F(self, x):
        """
        Returns F(x), the indefinite integral for f(x).

        x: value to calculate the indefinite integral for f(x).
        """
        return 1 / math.pi * math.atan((x - self.x0) / self.gamma) + 1 / 2

    def F_inv(self, f):
        """
        Returns F^-1(f), the inverse of F(x).

        f: the value of F(x) to invert.
        """
        return self.x0 + self.gamma * math.tan(math.pi * (f - 1 / 2))


###STOP_SOLUTION

Check this distribution for $\gamma = 5$, $x_0 = 20$, and a range of $0$ to $40$.

In [None]:
###START_EXERCISE
# Create the sampler.

# Plot the comparison.
###STOP_EXERCISE

In [None]:
###START_SOLUTION
# Create the sampler.
sampler = SampleCauchy(rng, 0, 40, 20, 5)

# Plot the comparison.
plot_sampler(sampler);
###STOP_SOLUTION
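To connect the Cauchy sampler back to particle masses, the sketch below samples the squared mass $s = m^2$ from the Cauchy form with $x_0 = M^2$ and $\gamma = M\Gamma$, which matches the squared propagator $1/((s - M^2)^2 + M^2\Gamma^2)$, and then returns $m = \sqrt{s}$. It uses approximate $Z$ boson values, $M \approx 91.19$ GeV and $\Gamma \approx 2.50$ GeV, purely for illustration, together with a sampling window of $\pm 10\gamma$ around $M^2$; the function name and the window are assumptions, not part of the exercise.

```python
import math

import numpy as np

# Approximate Z boson mass and width in GeV, for illustration only.
M_Z = 91.1876
GAMMA_Z = 2.4952


def sample_breit_wigner_mass(r, mass, width, smin, smax):
    """
    Map uniform numbers r in [0, 1) to masses m = sqrt(s), where s follows
    the Cauchy form with x0 = mass^2 and gamma = mass * width on [smin, smax].
    """
    x0 = mass**2
    gamma = mass * width

    def F(s):
        # Cauchy CDF in s, the same form as SampleCauchy.F.
        return np.arctan((s - x0) / gamma) / math.pi + 0.5

    area = F(smax) - F(smin)
    # Inverse CDF: s = x0 + gamma * tan(pi * (F - 1/2)).
    s = x0 + gamma * np.tan(math.pi * (F(smin) + r * area - 0.5))
    return np.sqrt(s)


# Hypothetical sampling window of +/- 10 gamma around M^2 (must keep s > 0).
s_min = M_Z**2 - 10 * M_Z * GAMMA_Z
s_max = M_Z**2 + 10 * M_Z * GAMMA_Z
demo_rng = np.random.default_rng(6)
masses = sample_breit_wigner_mass(demo_rng.random(100_000), M_Z, GAMMA_Z, s_min, s_max)
print(np.median(masses))
```

Because the window is symmetric in $s$ around $M^2$, the median of the sampled masses should sit close to $M$.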

### Exercise: Gaussian

Perhaps one of the most sampled distributions out there is the Gaussian.

$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}
$$

As a matter of fact, the Gaussian is critical for many machine learning techniques. However, neither $F(x)$ nor $F^{-1}(x)$ can be written in closed form using elementary functions ($F(x)$ is usually expressed through the error function), so our analytic method from above fails. What do we do instead? One option is to calculate $F(x)$ and $F^{-1}(x)$ numerically, for which relatively efficient methods exist.
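A minimal sketch of the numerical option, using only the standard library: $F(x)$ is written with `math.erf`, and $F^{-1}(y)$ is found by bisection, which works because $F(x)$ is monotonically increasing. The bracket of $\pm 40\sigma$ around $\mu$ and the fixed 100 bisection steps are arbitrary choices for illustration; for real use, `scipy.stats.norm.ppf` provides this inverse directly and much faster.

```python
import math


def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """F(x) for a normalized Gaussian, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gaussian_cdf_inv(y, mu=0.0, sigma=1.0):
    """
    Numerically invert gaussian_cdf by bisection, assuming 0 < y < 1 and that
    the answer lies within mu +/- 40 sigma.
    """
    a, b = mu - 40.0 * sigma, mu + 40.0 * sigma
    # Each step halves the bracket; 100 halvings exhaust double precision.
    for _ in range(100):
        mid = 0.5 * (a + b)
        # F(x) increases with x, so keep the half that still brackets y.
        if gaussian_cdf(mid, mu, sigma) < y:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


print(gaussian_cdf_inv(0.975))
```

For example, `gaussian_cdf_inv(0.975)` is close to $1.96$. Feeding it uniform random numbers yields Gaussian samples, at the cost of roughly 100 evaluations of `math.erf` per sample.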

However, there is a very clever transform, commonly called the Box-Muller transform. We won't go through the derivation here, but it relies on writing a pair of independent standard Gaussian variables in polar coordinates, where the angle is uniformly distributed and independent of the radius. The method is as follows.

1. Sample two uniform random numbers $R_1$ and $R_2$ between $0$ and $1$, with $R_1 \neq 0$ so that $\log(R_1)$ is finite.
2. Transform these into two independent Gaussian-distributed numbers $x_1$ and $x_2$ with the following.

$$
x_1 = \sqrt{-2\log(R_1)}\cos(2\pi R_2)
$$

$$
x_2 = \sqrt{-2\log(R_1)}\sin(2\pi R_2)
$$

3. These $x$ follow a Gaussian with $\mu = 0$ and $\sigma = 1$, so they need to be multiplied by $\sigma$ and then shifted by $\mu$, i.e., $x \to \sigma x + \mu$.

What is great about this method is that it is not only simple and fast, but it also samples the full, unbounded range of $x$, whereas the analytic method above requires a bounded sampling region. Using this transformation, define a Gaussian sampler.

In [None]:
###START_SOLUTION
class SampleGaussian:
    """
    Class to sample a Gaussian distribution.
    """

    def __init__(self, rng, xmin, xmax, mu, sigma):
        """
        Initialize the sampler. Note, the limits `xmin` and `xmax` here only
        define the limits when used for drawing with the `plot_sampler` method.
        Sampling is performed without any limits.

        rng:   uniform random number generator, should have method `uniform()`.
        xmin:  minimum x for plotting (not sampling).
        xmax:  maximum x for plotting (not sampling).
        mu:    mean of Gaussian.
        sigma: width of Gaussian.
        """
        # Set the parameters.
        self.rng = rng
        self.xmin = xmin
        self.xmax = xmax
        self.mu = mu
        self.sigma = sigma

        # Set the area being sampled. This distribution is normalized.
        self.area = 1

    def f(self, x):
        """
        Return the function being sampled, f(x).

        x: value to calculate f(x) for.
        """
        return (
            1
            / (2 * math.pi * self.sigma**2) ** 0.5
            * math.exp(-((x - self.mu) ** 2) / (2 * self.sigma**2))
        )

    def __call__(self):
        """
        Return the sampled value.
        """
        # Sample the two uniform random numbers. Use 1 - uniform(), which lies
        # in (0, 1], so that log(r1) never receives zero.
        r1 = 1 - self.rng.uniform()
        r2 = self.rng.uniform()
        # Return only one of the two transformed values.
        return (
            self.sigma * (-2 * math.log(r1)) ** 0.5 * math.cos(2 * math.pi * r2)
            + self.mu
        )


###STOP_SOLUTION

Check this distribution.

In [None]:
### START_EXERCISE
### STOP_EXERCISE

In [None]:
### START_SOLUTION
# Create the sampler.
sampler = SampleGaussian(rng, 0, 10, 5, 2)

# Plot the comparison.
plot_sampler(sampler);
### STOP_SOLUTION
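The `SampleGaussian` sampler above draws one value per call and discards $x_2$. A batched sketch using `numpy` can keep both outputs of the Box-Muller transform and build many samples at once; the function name `box_muller` and its signature are illustrative assumptions. Here $R_1$ is drawn as `1 - rng.random()`, which lies in $(0, 1]$, so that $\log(R_1)$ stays finite.

```python
import numpy as np


def box_muller(rng, n, mu=0.0, sigma=1.0):
    """Return n Gaussian samples, using both outputs of the Box-Muller transform."""
    # Each pair of uniforms gives two Gaussians, so draw ceil(n / 2) pairs.
    pairs = (n + 1) // 2
    # 1 - random() lies in (0, 1], which keeps log(r1) finite.
    r1 = 1.0 - rng.random(pairs)
    r2 = rng.random(pairs)
    radius = np.sqrt(-2.0 * np.log(r1))
    x1 = radius * np.cos(2.0 * np.pi * r2)
    x2 = radius * np.sin(2.0 * np.pi * r2)
    # Scale and shift to the requested Gaussian; drop the extra value if n is odd.
    return mu + sigma * np.concatenate([x1, x2])[:n]


demo_rng = np.random.default_rng(4)
gauss_samples = box_muller(demo_rng, 200_001, mu=5.0, sigma=2.0)
print(gauss_samples.mean(), gauss_samples.std())
```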

## Binned Sampling

### Example: create a histogram

### Example: implement binned sampling

## Accept/Reject Sampling

## Importance Sampling

## Multichannel Sampling

## Quadrature Integration

## MC Integration