# Geometric Distribution

## Overview

In this section we discuss the <a href="https://en.wikipedia.org/wiki/Geometric_distribution">geometric distribution</a>.
We use the geometric distribution to model the number of independent trials we need to perform until the event of interest 
occurs for the first time.

## Geometric distribution

Let's consider the following questions. Assume you draw from the standard normal distribution. What is the approximate expected 
number of draws you need in order to get a value greater than 5? Similarly, assume you roll a die. How many rolls do you need, on average, in order to get a 6? 
The geometric distribution can help us answer these kinds of questions.

A random variable $X$ has a geometric distribution with success probability $p\in (0,1)$ if [1]

\begin{equation}
P(X=k) = p(1-p)^{k-1}, \quad k = 1, 2, \dots
\end{equation}

We then say that $X \sim Geom(p)$. The expected value and variance of $X$ are given by

\begin{equation}
E[X] = \frac{1}{p}, ~~ Var[X] = \frac{1-p}{p^2}
\end{equation}
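As a quick sanity check, the formulas above can be compared against `scipy.stats.geom`, which uses the same convention $k \geq 1$. This is only a minimal sketch; the value `p = 0.25` is an arbitrary choice for illustration and does not come from the text.

```python
from scipy.stats import geom

p = 0.25  # illustrative success probability (an assumption, not from the text)

# PMF from the formula p * (1 - p)**(k - 1) versus scipy's implementation
for k in range(1, 5):
    by_hand = p * (1 - p) ** (k - 1)
    print(k, by_hand, geom.pmf(k, p))

# Mean and variance from the closed forms versus scipy
mean_formula = 1 / p
var_formula = (1 - p) / p**2
print(mean_formula, geom.mean(p))
print(var_formula, geom.var(p))
```

Both columns should agree up to floating-point rounding.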

----
**Remark**

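Some texts define the geometric distribution as the number of failures before the first success. In that case $k \geq 0$, i.e. $k$ can assume the value zero, 
the probability mass function becomes $P(X=k) = p(1-p)^{k}$ and the expected value is given by 

\begin{equation}
E[X] = \frac{1 - p}{p}
\end{equation}

The variance is not affected by this shift.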

----
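The two conventions mentioned in the remark differ only by a shift of one. A small sketch of this, assuming we model the "number of failures" variant by passing `loc=-1` to `scipy.stats.geom` (the value `p = 0.25` is again just an illustrative choice):

```python
from scipy.stats import geom

p = 0.25  # illustrative value, not from the text

# loc=-1 shifts the support from {1, 2, ...} to {0, 1, ...},
# so the variable counts failures before the first success.
failures = geom(p, loc=-1)

print(failures.pmf(0), p)               # P(Y = 0) should equal p
print(failures.mean(), (1 - p) / p)     # mean of the shifted variable
print(failures.var(), (1 - p) / p**2)   # variance is unchanged by the shift
```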

Let's now use the geometric distribution to answer the questions posed above. We start with the fair die.

### Example 1

Assume you have a fair die. How many rolls do we need, on average, until we see a 6? 

**_ANS_**

We can use the geometric distribution to answer this. We have $p=1/6$. So the expected number of rolls we need to do is

\begin{equation}
E[X] = \frac{1}{p} = 6 \text{ rolls}
\end{equation}

Notice that this will be the same for every face of the die.
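We can also check this result empirically. The sketch below simulates the experiment with NumPy; the helper `rolls_until_six`, the seed and the number of repetitions are choices made here for illustration, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility


def rolls_until_six(rng):
    """Roll a fair die until a 6 appears and return the number of rolls."""
    rolls = 0
    while True:
        rolls += 1
        if rng.integers(1, 7) == 6:  # integers(1, 7) draws uniformly from 1..6
            return rolls


n_experiments = 20_000
counts = np.array([rolls_until_six(rng) for _ in range(n_experiments)])
print(f"Average number of rolls: {counts.mean():.2f}")
```

The average should land close to 6, with small fluctuations due to sampling.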

### Example 2

Assume you draw from the standard normal distribution. What is the approximate expected 
number of draws we need in order to get a value greater than 5?

**_ANS_**


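Again we can use the geometric distribution for answering this. However, we need to feed it with $p$, i.e. the probability that a single 
draw $Z \sim N(0,1)$ is greater than 5. This is given by

\begin{equation}
P(Z > 5) = 1 -P(Z\leq 5) = 1- \Phi(5)
\end{equation}

where $\Phi$ is the cumulative distribution function of the standard normal distribution.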


I don't know $\Phi(5)$ so I will use Python for this.

In [4]:
from scipy.stats import norm
phi_2 = norm.cdf(2)
print(phi_2)
phi_5 = norm.cdf(5)
print(phi_5)

0.9772498680518208
0.9999997133484281


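Thus $p = 1 - \Phi(5) \approx 2.87 \times 10^{-7}$. Note that we must keep all the digits of $\Phi(5)$ here, since truncating it to $0.9999$ 
would inflate $p$ by a factor of roughly 350. Hence, the expected number of draws we need to perform until we see a value greater than 5 for the first time is

In [5]:
from scipy.stats import norm
# norm.sf(5) is the survival function 1 - Phi(5), evaluated without cancellation error
p = norm.sf(5)
print(f"Expected number of draws {1.0/p:.2e}")

Expected number of draws 3.49e+06

That is, roughly 3.5 million draws.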
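Simulating the threshold 5 directly is expensive because successes are so rare. As a rough illustration of the same reasoning, the sketch below uses the smaller threshold 2, for which $\Phi(2)$ was printed above, and measures how many draws separate consecutive successes. The sample size and the seed are arbitrary choices made here.

```python
import numpy as np
from scipy.stats import norm

threshold = 2.0                      # smaller than 5 so that successes are frequent
p = norm.sf(threshold)               # P(Z > 2) = 1 - Phi(2)

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
z = rng.standard_normal(2_000_000)

# Indices of the draws that exceed the threshold
hits = np.flatnonzero(z > threshold)

# Number of draws needed for each success, counted from the previous success
waits = np.diff(hits, prepend=-1)

print(f"Theoretical mean 1/p: {1 / p:.2f}")
print(f"Simulated mean:       {waits.mean():.2f}")
```

The two numbers should be close, which supports using $1/p$ for the rarer event as well.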


## Summary

This section discussed the geometric distribution. This distribution models the probability that the first occurrence of success requires 
$k$ independent trials, each with success probability $p$. If the probability of success on each trial is $p$, then the probability that the 
$k$-th trial is the first success is

\begin{equation}
P(X=k) = p(1-p)^{k-1}, \quad k = 1, 2, \dots
\end{equation}

It has the following mean and variance values

\begin{equation}
E[X] = \frac{1}{p}, ~~ Var[X] = \frac{1-p}{p^2}
\end{equation}

The continuous counterpart of the geometric distribution is the <a href="https://en.wikipedia.org/wiki/Exponential_distribution">exponential distribution</a> which we will see later on.
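One way to see this connection is through the tail probabilities. For $X \sim Geom(p)$ we have $P(X > k) = (1-p)^k$, which matches the tail $e^{-\lambda k}$ of an exponential variable when the rate is chosen as $\lambda = -\ln(1-p)$. The sketch below checks this numerically; the value `p = 0.1` is an illustrative assumption.

```python
import numpy as np
from scipy.stats import expon, geom

p = 0.1                    # illustrative success probability
lam = -np.log(1 - p)       # rate chosen so that exp(-lam) = 1 - p

k = np.arange(1, 6)
geom_tail = geom.sf(k, p)                 # P(X > k) = (1 - p)**k
expon_tail = expon.sf(k, scale=1 / lam)   # P(T > k) = exp(-lam * k)

print(geom_tail)
print(expon_tail)
```

At integer values of $k$ the two tails coincide up to rounding.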

## References

1. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer 2003.