# Uniform Distribution

### Theory

**Discrete uniform distribution**

Models a finite set of $n$ values that are equally likely to be observed, so each value has probability $1/n$.

Properties for $X \sim \text{unif}\{a,b\}$ and $n = b - a + 1$:
- Probability mass function: $P(X = k) = \begin{cases}1/n &{\text{for }}a \leq k \leq b,\\[8pt]0&{\text{otherwise}}.\end{cases}$
- Mean: $\frac12 (a + b)$
- Variance: $\frac{1}{12}(n^2 - 1)$
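
As a quick sanity check of the mean and variance formulas above, the following sketch computes both moments directly from the probability mass function using exact rational arithmetic. The helper name `discrete_uniform_moments` and the die example ($a = 1$, $b = 6$) are illustrative choices, not part of the original notebook.

```python
from fractions import Fraction


def discrete_uniform_moments(a, b):
    """Exact mean and variance of unif{a, b}, computed directly from the PMF."""
    n = b - a + 1
    p = Fraction(1, n)  # P(X = k) for every integer k in a..b
    mean = sum(p * k for k in range(a, b + 1))
    var = sum(p * (k - mean) ** 2 for k in range(a, b + 1))
    return n, mean, var


# Example: a fair six-sided die (hypothetical choice of a and b)
n, mean, var = discrete_uniform_moments(1, 6)
print(mean, var)  # 7/2 35/12

# Compare with the closed-form expressions from the list above
assert mean == Fraction(1 + 6, 2)
assert var == Fraction(n**2 - 1, 12)
```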

**Continuous uniform distribution**

Describes an experiment whose outcome can be any real value between the bounds $a$ and $b$, with no value favored over another.

Properties for $X \sim \text{U}(a,b)$:
- Probability density function: $p(x) = \displaystyle {\begin{cases}{1/(b-a)}&{\text{for }}a\leq x\leq b,\\[8pt]0&{\text{otherwise}}.\end{cases}}$
- Mean: $\frac12 (a + b)$
- Variance: $\frac{1}{12}(b - a)^2$
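
The continuous formulas can be checked empirically by drawing many samples and comparing the sample moments with the closed-form values. This is only a sketch: the bounds $a = 2$, $b = 5$, the sample size, and the seed are arbitrary assumptions, and the sample moments will only approximately match the theoretical ones.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility
a, b = 2.0, 5.0                 # example bounds (assumption)

# Draw samples from the half-open interval [a, b)
samples = rng.uniform(a, b, size=200_000)

print("mean:    ", samples.mean(), "vs", (a + b) / 2)
print("variance:", samples.var(), "vs", (b - a) ** 2 / 12)
```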


### Example (discrete uniform)

Given an array of $n$ items, select $k \leq n$ items such that every item is equally likely to be selected, i.e. sample items from a discrete uniform distribution.

Note that $n$ may be known in advance or unknown.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

plt.rc('axes', axisbelow=True)

### Case 1: $n$ known

We want to select $k$ out of $n$ numbers with equal probability. In the code, `N` plays the role of $n$.

In [None]:
N = 20
k = 5

In [None]:
def random_choice(xs, k):
    # simplistic version of np.random.choice(xs, k, replace=False)
    value = []

    items = xs.tolist()
    for j in range(k):
        ix = int(np.random.random() * len(items))
        value.append(items[ix])
        del items[ix]

    return value

xs = np.arange(N)
print("Numbers:  ", xs)

sel = random_choice(xs, k)
print("Selection:", sel)

In [None]:
# let's convince ourselves that this is really uniform
n_exp = 20000
acc = np.zeros(N)
for i in range(n_exp):
    sel = random_choice(xs, k)
    acc[sel] += 1

print("Expected:", n_exp * k / N)

plt.bar(np.arange(N), acc)
plt.grid(axis="y")
plt.xticks(np.arange(N))
plt.show()

### Case 2: $n$ unknown

Sample problem: A server needs to run random routine checks on a stream of messages:
- There will be capacity to check exactly $k$ messages by the end of the day
- Whenever a message arrives, we need to decide immediately whether to hold it back or
  to let it pass
- We don't know the total number of messages $n$ that will arrive

This is an instance of a *reservoir sampling* problem. Many algorithms exist (see e.g.
[Wikipedia](https://en.wikipedia.org/wiki/Reservoir_sampling)).

**Algorithm R**

Famous \& simple (Vitter, 1985)

Algorithm:
1. Initialize reservoir array $R$ of size $k$ and copy first $k$ messages $m_1 \dots
   m_k$ to that array
2. Then for $i > k$: With probability $k/i$, replace a randomly selected message in the
   reservoir with the incoming message $m_i$

Result: At all times, every message seen so far has the same chance of being in the
reservoir (and thus of being checked).


In [None]:
def run_experiment(secret_n, k):
    assert k <= secret_n
    input_stream = np.arange(secret_n)

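    # run algorithm; secret_n only drives the simulated stream, no decision depends on it
    R = input_stream[:k].copy()  # step 1; copy so we don't write into the stream itself
    for i in range(k, secret_n):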
        if np.random.random() < k / (i + 1):  # step 2, note that i is 0-based
            j = int(np.random.random() * k)
            R[j] = input_stream[i]

    return R


# let's convince ourselves that this is really uniform
n_exp = 20000
acc = np.zeros(N)
for i in range(n_exp):
    sel = run_experiment(N, k)
    acc[sel] += 1

print("Expected:", n_exp * k / N)

plt.bar(np.arange(N), acc)
plt.grid(axis="y")
plt.xticks(np.arange(N))
plt.show()
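
The experiment above slices a NumPy array, so the stream length is technically available to the code. As a complementary sketch, the same two steps can be written against an arbitrary iterable whose length is never queried. The function name `reservoir_sample` and the optional `rng` argument are illustrative assumptions, not part of the original notebook.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Algorithm R over any iterable; the length of `stream` is never queried."""
    reservoir = []
    for i, item in enumerate(stream, start=1):  # i is the 1-based message index
        if i <= k:
            reservoir.append(item)              # step 1: fill the reservoir
        elif rng.random() < k / i:              # step 2: keep m_i with probability k/i
            reservoir[rng.randrange(k)] = item  # evict a uniformly chosen slot
    return reservoir


# A generator has no len(), so the sampler cannot peek at n.
messages = (f"msg-{j}" for j in range(1000))
print(reservoir_sample(messages, 5))
```

If the stream ends before $k$ messages have arrived, this version simply returns everything it has seen.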

### Appendix: Derivation of *Algorithm R*

First observe that after step $i - 1$ is complete, each of the $i - 1$ items seen so far
has a $\displaystyle \frac{k}{i - 1}$ probability to be in $R$. Then, after step $i$,
each of the $i$ items must have a $\displaystyle \frac{k}{i}$ probability to be in $R$.

Define event $\mathcal E := $"item $r$ which is in $R$ at $i-1$ will remain in $R$ at
$i$".

Also let $p_i$ be the probability that we select the item at step $i$ and replace a
random item in $R$ with it.
- Need to choose $p_i$ s.t. the probability for each item present in $R$ drops from
  $\displaystyle \frac{k}{i - 1}$ to $\displaystyle \frac{k}{i}$ between step $i - 1$
  and $i$
- This means each item in $R$ must see a factor of $\displaystyle \frac{k/i}{k/(i - 1)}
  = \frac{i - 1}{i}$ applied to its probability to remain in $R$
  
So now we have $P(\mathcal E) = \displaystyle \frac{i - 1}{i}$.

We can also derive $P(\mathcal E)$ by observing that an item $r$ will remain in $R$ in
exactly two cases:
- We let item $i$ pass, so none of the items in $R$ gets replaced. Happens with
  probability $1 - p_i$
- We select item $i$ for checking but item $r$ is not being replaced with $i$. Happens
  with probability $\displaystyle p_i\,\frac{k - 1}{k}$

Since these two cases are mutually exclusive, the probabilities sum: $\displaystyle
P(\mathcal E) = (1 - p_i) + p_i\,\frac{k - 1}{k}$.

The two expressions for $P(\mathcal E)$ must be equal:
$\displaystyle \frac{i - 1}{i} \stackrel{!}{=} (1 - p_i) + p_i\,\frac{k - 1}{k}$.

Solving for $p_i$ yields $\displaystyle p_i = \frac{k}{i}$.
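
For completeness, the algebra behind this step can be spelled out as follows, starting from the equality of the two expressions for $P(\mathcal E)$:

$$
\begin{aligned}
\frac{i - 1}{i} &= (1 - p_i) + p_i\,\frac{k - 1}{k} \\
                &= 1 - p_i\left(1 - \frac{k - 1}{k}\right) \\
                &= 1 - \frac{p_i}{k} \\
\Longrightarrow\quad \frac{p_i}{k} &= 1 - \frac{i - 1}{i} = \frac{1}{i} \\
\Longrightarrow\quad p_i &= \frac{k}{i}
\end{aligned}
$$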

The general correctness of this rule for $p_i$ can be proven by induction.
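
The induction can also be checked numerically for concrete values. The sketch below multiplies the per-step survival probabilities from the derivation in exact rational arithmetic and confirms that every item ends up in $R$ with probability $k/n$. The helper name `inclusion_probability` and the values $n = 20$, $k = 5$ are illustrative assumptions, and a check for specific values is not a substitute for the proof.

```python
from fractions import Fraction


def inclusion_probability(j, n, k):
    """Exact probability that item j (1-based) is in R after n items, under Algorithm R."""
    if j <= k:
        prob, start = Fraction(1), k + 1     # copied into R in step 1
    else:
        prob, start = Fraction(k, j), j + 1  # selected on arrival with p_j = k/j
    for i in range(start, n + 1):
        p_i = Fraction(k, i)
        # survive step i: either m_i passes, or it evicts one of the other k - 1 slots
        prob *= (1 - p_i) + p_i * Fraction(k - 1, k)
    return prob


n, k = 20, 5
assert all(inclusion_probability(j, n, k) == Fraction(k, n) for j in range(1, n + 1))
print(inclusion_probability(7, n, k))  # 1/4
```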
