# Sampling
In this chapter we explore the generation of random numbers and techniques for sampling from probability distribution functions.

## Random numbers
To obtain random samples from a probability distribution function, we first need a way to create random numbers. In principle, you could throw dice and type the numbers into the computer, but that would be quite tedious. Instead, a lot of effort has gone into developing algorithms for so-called pseudo-random number generators. We begin with a particular family of such generators, the linear congruential generators:
$$X_{n+1}=(a X_{n} + c) \mod m$$
Even though these generators are rarely used anymore, they nicely illustrate the basic principle of a pseudo-random number generator. For given $a$, $c$, and $m$, we can create a sequence of pseudo-random numbers once we pick a so-called "seed" $X_{1}$. The parameters $a$, $c$, and $m$ are chosen so that the sequence takes as long as possible to repeat (its period can be at most $m$) and so that consecutive numbers do not exhibit correlations (for example, Numerical Recipes suggests $m=2^{32}$, $a=1664525$, $c=1013904223$). Because the numbers come from a deterministic algorithm, true randomness is impossible to achieve, but modern random number generators are close enough to random for most practical purposes. It is therefore still a good idea to repeat a calculation with several independent streams of random numbers (that is, with different seeds) to make sure that the result does not depend on the choice of seed.
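A minimal sketch of the linear congruential generator in plain Python, using the Numerical Recipes parameters quoted above as defaults. The function names `lcg`, `lcg_uniform`, and `period` are our own illustrative choices, and dividing each state by $m$ to obtain floats in $[0,1)$ is one common convention rather than the only one. The two toy generators with $m=16$ show how much the parameters matter: $a=5$, $c=3$ satisfies the standard Hull-Dobell full-period conditions, while $a=3$, $c=1$ does not.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return the n integer states that follow `seed` under X_{n+1} = (a*X_n + c) mod m."""
    x = seed
    states = []
    for _ in range(n):
        x = (a * x + c) % m
        states.append(x)
    return states


def lcg_uniform(seed, n, a=1664525, c=1013904223, m=2**32):
    """Scale the integer states to floats in [0, 1) by dividing by m."""
    return [x / m for x in lcg(seed, n, a, c, m)]


def period(seed, a, c, m):
    """Length of the cycle the sequence eventually enters, starting from seed."""
    first_seen = {}  # state -> step at which it first appeared
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]


# The same seed always reproduces the same stream; a different seed gives a different one
print(lcg_uniform(seed=12345, n=3))
print(lcg_uniform(seed=12345, n=3) == lcg_uniform(seed=12346, n=3))  # False

# Toy generators with m = 16 make the role of the parameters visible
print(period(1, a=5, c=3, m=16))  # 16: all values 0..15 occur before a repeat
print(period(0, a=3, c=1, m=16))  # 8: a worse choice of a halves the period
```

Storing the first step at which each state appears lets `period` measure the cycle length even when the seed itself is not part of the cycle.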

## Sampling from distributions
In Python (through NumPy) we have access to many kinds of random numbers and distributions, but the most common random number generator produces a random float from a uniform distribution on the interval $[0,1)$. It can be accessed with the function `np.random.rand()`.
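A short sketch of the legacy NumPy interface named above: `np.random.rand` for uniform floats and `np.random.seed` to fix the stream. The helper `mc_mean` is a hypothetical name used only to illustrate the earlier advice of repeating a calculation with several seeds. Recent NumPy versions recommend the `np.random.default_rng()` Generator interface for new code, but the legacy functions shown here still work.

```python
import numpy as np

# A single float drawn uniformly from [0, 1)
x = np.random.rand()

# An array of 5 such floats; rand takes the desired shape as its arguments
xs = np.random.rand(5)

# Seeding makes the stream reproducible: the same seed gives the same numbers
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# Rescale to a uniform sample on the interval [low, high)
low, high = -2.0, 3.0
ys = low + (high - low) * np.random.rand(1000)

# Repeat a simple Monte Carlo estimate (the mean of uniform numbers) with several seeds
def mc_mean(seed, n=10_000):
    np.random.seed(seed)
    return np.random.rand(n).mean()

estimates = [mc_mean(s) for s in range(5)]
print(estimates)  # each value should be close to the exact mean 0.5
```

If the estimates for different seeds disagree by much more than their expected statistical scatter, the result is not yet trustworthy.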

In last week's homework we explored how to use such uniform random numbers to obtain a sample from a normal distribution.
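One standard way to do this (not necessarily the approach taken in the homework) is the Box-Muller transform, which turns two independent uniform numbers $u_1, u_2$ into a standard normal sample via $z=\sqrt{-2\ln u_1}\cos(2\pi u_2)$. The sketch below assumes NumPy; `box_muller` is a hypothetical helper name.

```python
import numpy as np

def box_muller(n):
    """Return n standard normal samples built from pairs of uniform random numbers."""
    # 1 - rand() lies in (0, 1], which avoids taking log(0)
    u1 = 1.0 - np.random.rand(n)
    u2 = np.random.rand(n)
    r = np.sqrt(-2.0 * np.log(u1))
    # r * sin(2*pi*u2) would be a second, independent normal sample; it is discarded here
    return r * np.cos(2.0 * np.pi * u2)

np.random.seed(1)
z = box_muller(100_000)
print(z.mean(), z.std())  # should be close to 0 and 1
```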

## Sampling from any pdf
Now we want to explore how to obtain a random sample from an arbitrary probability density function. The most general solution to this problem in one dimension is to use the inverse of the cumulative distribution function (cdf). The cdf is the integral of the probability density function $pdf(\xi)$ from $-\infty$ to $x$:
$$cdf(x)=\int_{-\infty}^{x}pdf(\xi)\,d\xi$$
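As a worked example of this integral, consider the exponential distribution with rate $\lambda>0$ (our choice for illustration), for which $pdf(\xi)=\lambda e^{-\lambda\xi}$ when $\xi\ge 0$ and $pdf(\xi)=0$ otherwise. For $x\ge 0$ only the range $[0,x]$ contributes:

$$cdf(x)=\int_{-\infty}^{x}pdf(\xi)\,d\xi=\int_{0}^{x}\lambda e^{-\lambda\xi}\,d\xi=\left[-e^{-\lambda\xi}\right]_{0}^{x}=1-e^{-\lambda x},$$

while $cdf(x)=0$ for $x<0$. This cdf rises from 0 at $x=0$ towards 1 as $x\to\infty$.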
Because $pdf(\xi)$ is non-negative and normalized, $cdf(x)$ is non-decreasing and takes values in $[0,1]$. This immediately suggests the following strategy: if we can find the inverse of the cdf, we can map the interval $[0,1]$ onto all values of $x$ that the pdf can produce. To create a random sample from the pdf, we therefore feed a stream of uniform random numbers $u$ from $[0,1)$ into the inverse cdf, $x=cdf^{-1}(u)$. This works because, for uniform $u$, $P(cdf^{-1}(u)\le x)=P(u\le cdf(x))=cdf(x)$, so the resulting values of $x$ are distributed according to the pdf. The only technical problems are to integrate the pdf accurately and to invert the resulting cdf. We will explore this technique in the Bioassay example.
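A minimal numerical sketch of this strategy, assuming NumPy and a one-dimensional pdf that is negligible outside a chosen interval `[x_min, x_max]` and positive on it. The cdf is tabulated with the cumulative trapezoid rule, and the inverse is obtained by linear interpolation with the roles of $x$ and cdf swapped (`np.interp`). The helper names `tabulate_cdf` and `sample_inverse_cdf`, the grid size, and the exponential test case are illustrative assumptions, not the method used in the Bioassay example.

```python
import numpy as np

def tabulate_cdf(pdf, x_min, x_max, n_grid=10_001):
    """Tabulate the cdf of `pdf` on a uniform grid over [x_min, x_max].

    Assumes the pdf is negligible outside [x_min, x_max]; the table is
    renormalized so that its last value is exactly 1.
    """
    x = np.linspace(x_min, x_max, n_grid)
    p = pdf(x)
    # Cumulative trapezoid rule: cdf[0] = 0, cdf[i] ~ integral of pdf from x_min to x[i]
    steps = 0.5 * (p[1:] + p[:-1]) * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    return x, cdf / cdf[-1]

def sample_inverse_cdf(pdf, x_min, x_max, n_samples, n_grid=10_001):
    """Draw n_samples from `pdf` by feeding uniform numbers into the inverse cdf.

    np.interp expects an increasing table, so the pdf should be positive on the grid.
    """
    x, cdf = tabulate_cdf(pdf, x_min, x_max, n_grid)
    u = np.random.rand(n_samples)  # uniform numbers in [0, 1)
    return np.interp(u, cdf, x)    # x = cdf^{-1}(u) by linear interpolation

# Usage: exponential distribution with rate lam (an illustrative choice)
lam = 2.0

def exp_pdf(x):
    return lam * np.exp(-lam * x)

np.random.seed(0)
samples = sample_inverse_cdf(exp_pdf, 0.0, 20.0, 100_000)
print(samples.mean())  # should be close to the exact mean 1/lam = 0.5

# For this pdf the cdf, 1 - exp(-lam*x), can be inverted by hand: x = -ln(1 - u)/lam
x_grid, cdf_grid = tabulate_cdf(exp_pdf, 0.0, 20.0)
u_test = np.array([0.1, 0.5, 0.9])
print(np.interp(u_test, cdf_grid, x_grid))  # numerical inverse
print(-np.log(1.0 - u_test) / lam)          # analytic inverse
```

Whenever the cdf can be inverted analytically, as for the exponential case, the closed form is cheaper and exact; the tabulated version is what remains when it cannot.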