# The Poisson Process: Section 2 - Mathematical Analysis

## Probability Distributions

The main random variables of interest in the Poisson process are:
* $X_i$ is the **time interval between** the $(i-1)$st and $i$th arrivals.
* $S_n$ is the **arrival time** of the $n$th arrival.
* $N(t)$ is the **number of arrivals** by time $t$.

Assume that $X_i \sim \text{Exponential}(\lambda)$ and are independent.

It follows that $S_n \sim \text{Erlang}(n, \lambda)$ (you will have to show this for homework):
$$\mathsf{P}(S_n \leq x) = 1 - e^{-\lambda x} \sum_{k=0}^{n-1} \frac{(\lambda x)^k}{k!}.$$
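As a quick numerical sanity check (not a substitute for the homework proof), the closed form above can be compared with a Monte Carlo estimate that sums $n$ independent exponential interarrival times. This is a minimal sketch: the function names `erlang_cdf` and `simulate_erlang_cdf` and the values $n = 3$, $\lambda = 2$, $x = 1.5$ are illustrative assumptions, not part of the notes.

```python
import math
import random

def erlang_cdf(n, lam, x):
    """P(S_n <= x) for S_n ~ Erlang(n, lam), using the closed form above."""
    if x <= 0:
        return 0.0
    partial = sum((lam * x) ** k / math.factorial(k) for k in range(n))
    return 1.0 - math.exp(-lam * x) * partial

def simulate_erlang_cdf(n, lam, x, trials=100_000, seed=0):
    """Monte Carlo estimate of P(X_1 + ... + X_n <= x), X_i ~ Exponential(lam)."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.expovariate(lam) for _ in range(n)) <= x
        for _ in range(trials)
    )
    return hits / trials

# Illustrative values (assumed): n = 3, lam = 2, x = 1.5
exact = erlang_cdf(3, 2.0, 1.5)
estimate = simulate_erlang_cdf(3, 2.0, 1.5)
print(f"closed form: {exact:.4f}, simulated: {estimate:.4f}")
```

The two numbers should agree to roughly two decimal places; the remaining gap is sampling noise.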

To deduce the distribution of $N(t)$ from the distribution of $S_n$, note the identity
$$\{N(t) < n\} = \{S_n > t\}.$$
That is, “there are fewer than $n$ arrivals by time $t$” is equivalent to “the $n$th arrival occurs after time $t$”.
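The identity holds on every sample path, not just in probability, and this can be checked directly. The sketch below builds arrival times from exponential gaps and tests both events for each $n$. The helper names `sample_path` and `check_identity` and the default parameter values are assumptions chosen for illustration.

```python
import random

def sample_path(lam, t, n_max, rng):
    """Arrival times S_1, S_2, ... generated until there are at least n_max
    arrivals and the last one falls after time t."""
    times, s = [], 0.0
    while len(times) < n_max or s <= t:
        s += rng.expovariate(lam)
        times.append(s)
    return times

def check_identity(lam=2.0, t=3.0, n_max=12, paths=1000, seed=1):
    """Return True if {N(t) < n} and {S_n > t} coincide on every simulated path."""
    rng = random.Random(seed)
    for _ in range(paths):
        S = sample_path(lam, t, n_max, rng)
        N_t = sum(s <= t for s in S)  # number of arrivals by time t
        for n in range(1, n_max + 1):
            if (N_t < n) != (S[n - 1] > t):
                return False
    return True

print(check_identity())  # True
```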

We use this to find the probability mass function of $N(t)$:
$$\begin{aligned}
    \mathsf{P}(N(t) = n) &= \mathsf{P}(N(t) < n+1) - \mathsf{P}(N(t) < n) \\
    &= \mathsf{P}(S_{n+1} > t) - \mathsf{P}(S_n > t) \\
    &= e^{-\lambda t} \sum_{k=0}^n \frac{(\lambda t)^k}{k!} - e^{-\lambda t} \sum_{k=0}^{n-1} \frac{(\lambda t)^k}{k!} \\
    &= e^{-\lambda t} \frac{(\lambda t)^n}{n!}
  \end{aligned}$$
and recognize this as the probability mass function of the [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution).  Therefore, $N(t) \sim \text{Poisson}(\lambda t)$.
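The result can also be checked empirically: count the arrivals by time $t$ in many simulated paths built from exponential interarrival times, and compare the observed frequencies with the Poisson probabilities. This is a minimal sketch; the names `count_arrivals` and `poisson_pmf` and the values $\lambda = 2$, $t = 1.5$ are illustrative assumptions.

```python
import math
import random
from collections import Counter

def count_arrivals(lam, t, rng):
    """N(t): number of arrivals by time t, built from Exponential(lam) gaps."""
    n, s = 0, rng.expovariate(lam)
    while s <= t:
        n += 1
        s += rng.expovariate(lam)
    return n

def poisson_pmf(n, mean):
    """P(N = n) for N ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** n / math.factorial(n)

lam, t, trials = 2.0, 1.5, 100_000  # assumed illustrative values
rng = random.Random(0)
counts = Counter(count_arrivals(lam, t, rng) for _ in range(trials))

# Columns: n, empirical frequency of {N(t) = n}, Poisson(lam * t) probability
for n in range(7):
    print(n, round(counts[n] / trials, 4), round(poisson_pmf(n, lam * t), 4))
```

Note that the Poisson mean passed to `poisson_pmf` is `lam * t`, not `lam`.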

**Note:** The distribution of $N(t)$ depends on $t$, since longer time periods tend to contain more arrivals.

### Remarks:

You may want to review the [exponential](https://en.wikipedia.org/wiki/Exponential_distribution), [Erlang](https://en.wikipedia.org/wiki/Erlang_distribution), and [Poisson](https://en.wikipedia.org/wiki/Poisson_distribution) distributions, including their moment generating functions and expected values.  Remember that $N(t) \sim \text{Poisson}(\lambda t)$ (**don't forget $t$** even though the standard description of a Poisson distribution is only in terms of a rate $\lambda$).

Also, you should remember the power series definition of the [exponential function](https://en.wikipedia.org/wiki/Exponential_function#Formal_definition):
$$e^x = \sum_{k=0}^\infty \frac{x^k}{k!}$$
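Two standard consequences of this series, applied with $x = \lambda t$, are that the Poisson probabilities sum to one and that the mean of $N(t)$ is $\lambda t$:
$$\begin{aligned}
    \sum_{n=0}^\infty \mathsf{P}(N(t) = n) &= e^{-\lambda t} \sum_{n=0}^\infty \frac{(\lambda t)^n}{n!} = e^{-\lambda t} e^{\lambda t} = 1, \\
    \mathsf{E}[N(t)] &= \sum_{n=1}^\infty n \, e^{-\lambda t} \frac{(\lambda t)^n}{n!} = \lambda t \, e^{-\lambda t} \sum_{n=1}^\infty \frac{(\lambda t)^{n-1}}{(n-1)!} = \lambda t.
  \end{aligned}$$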

### Exercise 2.1

Provide a justification for each of the following:
* $\{N(t) \geq n\} = \{S_n \leq t\}$
* $\{N(t) > n\} \neq \{S_n < t\}$
* $\{N(t) = n\} \neq \{S_n = t\}$
* $\{N(t) > n\} = \{S_{n+1} \leq t\}$

### Summary:
* $X_i$ is the time interval between the $(i-1)$st and $i$th arrivals.  
    * $X_i \sim \text{Exponential}(\lambda)$
    * $\mathsf{P}(X_i \leq x) = 1 - e^{-\lambda x}$
    * $\displaystyle\mathsf{E}[X_i] = \frac{1}{\lambda}$
* $S_n$ is the arrival time of the $n$th arrival.  $\displaystyle S_n \overset{\mathsf{def}}{=} \sum_{i=1}^n X_i$
    * $S_n \sim \text{Erlang}(n, \lambda)$
    * $\displaystyle\mathsf{P}(S_n \leq x) = 1 - e^{-\lambda x} \sum_{k=0}^{n-1} \frac{(\lambda x)^k}{k!}$
    * $\displaystyle\mathsf{E}[S_n] = \frac{n}{\lambda}$
* $N(t)$ is the number of arrivals by time $t$.  $\displaystyle N(t)\overset{\mathsf{def}}{=} \max\{n\,:\, S_n \leq t\}$
    * $N(t) \sim \text{Poisson}(\lambda t)$
    * $\displaystyle\mathsf{P}(N(t) = n) = e^{-\lambda t} \frac{(\lambda t)^n}{n!}$
    * $\displaystyle\mathsf{E}[N(t)] = \lambda t$
    
    
* A useful identity is $\{N(t) < n\} = \{S_n > t\}$.

## Example: Bus Stop

Recall the bus stop example from Section 1, where passengers **arrive according to a Poisson process with rate $\lambda = 2$** passengers per minute.  Suppose that a bus departs every $d = 10$ minutes and takes all the waiting passengers.  (For simplicity, assume that the bus loads and departs instantly.)

The mathematical model underlying this problem is that the arrivals follow a Poisson process.  This means we can answer all sorts of probabilistic questions if we can frame them in terms of $X_i$, $S_n$, and $N(t)$ (since we know their distributions).  Suppose that a bus has just departed at time 0 and the next bus departs at time $d$.

For example:
* The number of passengers on a bus is $N(10) \sim \text{Poisson}(20)$ — the number of arrivals during the $d = 10$ minutes between bus departures.
* The $n$th passenger arrives at time $S_n \sim \text{Erlang}(n, 2)$
* The bus stop remains empty until the first passenger arrives at time $X_1 = S_1 \sim \text{Exponential}(2)$
* Some probabilities may be calculated multiple ways, since $\mathsf{P}(N(t) < n) = \mathsf{P}(S_n > t)$.  

  For example, the probability that the $3$rd passenger arrives within the first $5$ minutes can be calculated in two ways:  
  * the number of arrivals by time 5, $N(5)$, is at least 3:
  $$\mathsf{P}(N(5) \geq 3) = \sum_{n=3}^\infty \mathsf{P}(N(5) = n) = \sum_{n=3}^\infty e^{-5 \lambda} \frac{(5 \lambda)^n}{n!} = 1 - \sum_{k=0}^2 e^{-5 \lambda} \frac{(5 \lambda)^k}{k!}$$
  * the arrival time of the 3rd passenger, $S_3$, is at most $5$.
  $$\mathsf{P}(S_3 \leq 5) = 1 - e^{-5 \lambda} \sum_{k=0}^2 \frac{(5 \lambda)^k}{k!}$$
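With $\lambda = 2$, both expressions can be evaluated numerically to confirm that they agree. The sketch below truncates the infinite Poisson sum at 60 terms, which is an assumption made for illustration (the neglected tail is far below floating-point precision here). The variable names are likewise illustrative.

```python
import math

lam = 2.0  # passengers per minute (from the example)
t = 5.0    # minutes
n = 3      # the 3rd passenger

def poisson_pmf(k, mean):
    """P(N = k) for N ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Way 1: P(N(5) >= 3), summing the Poisson pmf from n upward (truncated at 60 terms)
p_count = sum(poisson_pmf(k, lam * t) for k in range(n, 60))

# Way 2: P(S_3 <= 5), from the Erlang cumulative distribution function
p_time = 1.0 - math.exp(-lam * t) * sum(
    (lam * t) ** k / math.factorial(k) for k in range(n)
)

print(f"P(N(5) >= 3) = {p_count:.6f}")
print(f"P(S_3 <= 5)  = {p_time:.6f}")
```

Both lines print the same value, $1 - 61 e^{-10} \approx 0.9972$.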

### Exercise 2.2

For the bus stop example above (with $\lambda = 2$ and $d = 10$), calculate:
* What is the expected number of passengers on the next bus?  
* What is the probability there are (exactly) $10$ passengers on the next bus?  
* What is the probability there are $5$ passengers waiting after $3$ minutes?
* What is the expected arrival time of the $3$rd passenger?
* What is the probability that the $5$th passenger arrives within the first $3$ minutes?
* What is the probability that the bus stop remains empty for the first $10$ minutes?
* What is the probability that at least $10$ passengers wait $5$ minutes or longer for the bus?
* Suppose you arrive and see $n = 20$ passengers waiting.  What is the probability that the bus departs in the next $s=1$ minute?

## Simulated Sample Path and Visualization

In [1]:
import numpy as np
import pandas as pd

from IPython.display import display, Math, Markdown

import bokeh.plotting as bplt
# All models are imported from bokeh.models, which re-exports them.
from bokeh.models import (Range1d, LabelSet, ColumnDataSource, Arrow, TeeHead, VeeHead,
                          NormalHead, OpenHead, Whisker, VBar, Line, Step, Circle)
from bokeh.io import output_notebook, push_notebook
output_notebook()

In [2]:
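lam = 2   # arrival rate (passengers per minute)
d = 10    # time between bus departures (minutes)

# Number of arrivals in [0, d]: N(d) ~ Poisson(lam*d)
num_arr = np.random.poisson(lam*d)
S_0 = 0

# Given N(d) = num_arr, the arrival times are distributed as the sorted
# values of num_arr independent Uniform(0, d) draws.
S = pd.DataFrame(np.sort(d*np.random.rand(num_arr)),
                 index=pd.Index(range(1,num_arr+1), name='n'),
                 columns=['S_n'])

# Interarrival times X_n = S_n - S_{n-1}, with S_0 = 0
X = np.diff(S['S_n'].values, prepend=S_0)
S['X_n'] = X

# Sample path of N(t): jumps by 1 at each S_n and stays at num_arr until t = d
N = pd.DataFrame(np.append(np.arange(0,num_arr+1), num_arr),
                 index=pd.Index(np.append(np.append(S_0, S['S_n'].values), d), name='t'),
                 columns=['N(t)'])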

plt = bplt.figure(title='Bus Stop Arrival Process')
plt.add_glyph(ColumnDataSource(N),
              Step(x="t", y="N(t)", line_color="blue", mode="after"))
plt.add_glyph(ColumnDataSource(S),
              Circle(x="S_n", y=0, size=10, line_color="red", fill_color="red"))
plt.xaxis.axis_label = "t"
plt.yaxis.axis_label = "N(t)"
plt.x_range = Range1d(0, d)

display(Markdown(rf'$\lambda = {lam}$ passengers per minute'))
display(Markdown(f'$d = {d}$ minutes between bus departures'))
display(Markdown(f'$N(d) = {num_arr}$ passengers arrive during the period $[0, d]$'))
bplt.show(plt)
display(S)

$\lambda = 2$ passengers per minute

$d = 10$ minutes between bus departures

$N(d) = 18$ passengers arrive during the period $[0, d]$

n,S_n,X_n
1,0.01355,0.01355
2,1.531595,1.518044
3,1.829762,0.298167
4,2.013056,0.183294
5,2.221277,0.208221
6,2.99437,0.773093
7,3.529132,0.534762
8,3.760155,0.231023
9,3.916384,0.156229
10,4.171011,0.254627
