# Lecture 10 
- Important Discrete Random Variables (Bernoulli, Binomial, Geometric, Poisson)
- Properties of CDFs
- Continuous Random Variables
- Probability Density Functions

___

# Exam 1 rescheduled to <font color=blue> **Tuesday, October 4th** </font> @ 8.20pm

### The location for Exam 1 is **LIT 0113** and **LIT 0121**. Students whose last name begins with **A-M** should go to **LIT 0113**, and those whose last name begins with **N-Z** should go to **LIT 0121**. 




The exam has 2 parts:

**<font color=blue>Part 1 - Analytical.</font>**
* You are allowed one letter-sized page (front and back) of formulas (handwritten or typed).
* You are allowed a scientific calculator.
* **<font color=orange>TOTAL TIME:</font>** 1 hour
 

**<font color=blue>Part 2 - Simulation.</font>**
* Bring the computer you have been using with Anaconda installed.
* This part is open-book. You are allowed access to the textbooks and lecture materials (including assignments).
* You are allowed to use the internet, if needed.
* *Recommended*: create a python "cheat sheet" where you will add useful functions, simulations and other Python implementations.
* **<font color=orange>TOTAL TIME:</font>** 1 hour

**<font color=red>Communication with other students or anyone else is considered cheating. Turn off all Slack notifications and other communication channels!</font>**

## Exam 1 Coverage

Exam 1 will cover all materials from Lectures 1-7. These include:

1. **Introduction to Data Science and Python**
    * Building functions with Python (positional vs keyword argument parameters)
    * Operate with ```numpy```, ```matplotlib``` and ```random``` libraries
    
2. **Introduction to Probability**
    * Fair experiments
    * Relative frequency
    * Simulations
    * Sample space
    * Set operations
    * Probabilistic models (axioms of probability and corollaries)
    * Conditional Probability
    * Statistical independence
    * Law of Total Probability
    * Bayes' Rule
    * Frequentist vs Bayesian probability
    * Combinatorics (sampling with and without replacement, with and without ordering)
    * Histogram plots, bar plots
    
3. **Exploratory Data Analysis**
    * Operate with the ```pandas``` library
    * Summary statistics
    * Boxplot (or whisker plot)
    * Population, sample, statistics
    
4. **Hypothesis Testing and Confidence Intervals**
    * Binary hypothesis testing (null hypothesis $H_0$ and alternative hypothesis $H_1$)
    * One-sided and two-sided
    * Resampling with Bootstrap
    * Resampling with Permutation tests or Monte Carlo simulations
    * p-value
    * Significance value, $\alpha$
    * Confidence interval
    * Bayesian Hypothesis Testing
    
5. **Introduction to Statistics**
    * Statistical inference
    * Decision rules
    * Maximum Likelihood Estimation (MLE) decision rule
    * Maximum A Posteriori (MAP) decision rule

___

# Cheating and Plagiarism
You are expected to submit your own work. If you are suspected of dishonest academic activity, I will invite you to discuss it further in private. Academic dishonesty will likely result in grade reduction, with severity depending on the nature of the dishonest activity. I am obligated to report on academic misconduct with a letter to the department, college and/or university leadership. Repeat offenses will be treated with significantly greater severity.

___

# Discrete Random Variables

In Lecture 08, we introduced *random variables* (RVs), in particular *discrete RVs*.

<div class="alert alert-info">
    <b>Random Variable (RV)</b>
    
Given an experiment and the corresponding set of possible outcomes (the sample space), a **random variable** associates a particular *number* with each outcome. We refer to this number as the **numerical value** or simply the **value** of the RV. Mathematically, a **random variable is a real-valued function of the experimental outcome**.
</div>

<div class="alert alert-info">
    <b>Probability Mass Function (PMF)</b>

If $(\Omega,\mathcal{F},P)$ is a probability space with $X$ a real discrete RV on $\Omega$, and $x$ is any real number, the **probability mass** of $x$, denoted $p_X(x)$, is the probability of the event $\{X=x\}$ consisting of all outcomes that give rise to a value of $X$ equal to $x$:

$$p_X(x) = P(X=x)$$
</div>

<div class="alert alert-info">
    <b>Cumulative Distribution Function (CDF)</b>
    
If $(\Omega,\mathcal{F},P)$ is a probability space with $X$ a real discrete RV on $\Omega$, the **Cumulative Distribution Function (CDF)** is denoted as $F_X(x)$ and provides the probability $P(X\leq x)$. In particular, for every $x$ we have

$$F_X(x) = P(X\leq x) = \sum_{k\leq x} p_X(k)$$
</div>

# Important Discrete RVs

In [None]:
import numpy as np
import numpy.random as npr

import scipy.stats as stats

import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('bmh')

# The Bernoulli Random Variable

An event $A\in\mathcal{F}$ is considered a "success".

* A **Bernoulli RV** $X$ is defined by

$$X(x) = \begin{cases}1, & x\in A \\ 0,& x\notin A \end{cases}$$

* The PMF for a Bernoulli RV $X$ is defined by

$$p_X(x) = P(X=x) = \begin{cases}p,&x=1\\1-p,&x=0\\0,& \text{o.w.}  \end{cases}$$

* We have seen this PMF before when we considered the *data likelihood* for a coin flip: a coin toss comes up heads with probability $p$ and tails with probability $1-p$.

* We say that the "R.V. $X$ follows a Bernoulli distribution with parameter $p$" and we write this as:

$$X \sim \text{Bernoulli}(p) $$

* **Engineering examples/applications:** whether a bit is 0 or 1, whether a bit is in error, whether a component has failed, whether something has been detected.

Let's now plot the histogram of this sample. Let's start by defining the bins of the histogram:

We can plot the relative frequency of all the values in each bin, by changing the parameter ```density``` in the histogram function:

We can overlay the histogram and stem plots:
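A minimal sketch of the three steps above in a single cell. The parameter $p=0.2$, the sample size of 1000, and the names `B`, `sample`, `bins` and `heights` are assumptions made for illustration, since the sample itself is not shown in these notes.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

p = 0.2                                    # assumed success probability
B = stats.bernoulli(p)
sample = B.rvs(size=1000, random_state=0)  # 1000 draws, each equal to 0 or 1

# Bin edges centered on the two possible values, 0 and 1
bins = np.array([-0.5, 0.5, 1.5])

# density=True rescales the counts so the bar areas sum to 1;
# with unit-width bins, the bar heights are the relative frequencies
heights, _, _ = plt.hist(sample, bins=bins, density=True,
                         alpha=0.5, label='relative frequency')

# Overlay the true PMF as a stem plot
vals = np.array([0, 1])
plt.stem(vals, B.pmf(vals), label='true PMF')
plt.xticks(vals); plt.xlabel('$x$'); plt.ylabel('$p_X(x)$'); plt.legend();
```

With unit-width bins centered on the integers, the histogram bars and the stems can be compared directly.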

# The Binomial Random Variable

* A Binomial RV represents the number of successes on $n$ independent Bernoulli trials.
    * Example: a coin is tossed $n$ times.

* Thus, a Binomial RV can also be defined as the sum of $n$ independent Bernoulli RVs.
    * Example: At each toss, the coin comes up heads with probability $p$ and a tail with probability $1-p$, independently of prior tosses.
    
* Let $X$ be the # of successes.
    * Example: X is the number of heads in the $n$-toss sequence.
    
* We refer to $X$ as the **Binomial** RV **with parameters $n$ and $p$**:

$$X \sim \text{Binomial}(n,p)$$

* The PMF of $X$ is given by

$$p_X(x) = P(X=x) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x}, & x=0,1,\dots,n \\ 0, & \text{o.w.} \end{cases} $$

* **Engineering examples/applications:** The number of bits in error in a packet, the number of defective items in a manufacturing run.

Let's compute $p_X(2)$ where $X \sim \text{Binomial}(12,0.2)$:
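One way to evaluate this, as a sketch: apply the PMF formula directly and compare against `scipy.stats.binom`. The variable names `by_hand` and `by_scipy` are illustrative.

```python
from math import comb
import scipy.stats as stats

n, p = 12, 0.2

# PMF formula: C(n, x) * p^x * (1-p)^(n-x) with x = 2
by_hand = comb(n, 2) * p**2 * (1 - p)**(n - 2)

# Same quantity from the frozen Binomial distribution object
by_scipy = stats.binom(n, p).pmf(2)

print(by_hand, by_scipy)   # both are approximately 0.2835
```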

Let's build a simulation, using NumPy arrays, to verify this result:
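A possible simulation, sketched under the assumption that 100,000 repetitions are enough for a stable estimate. Each row of the array below is one experiment of $n=12$ Bernoulli trials; the names `trials`, `successes` and `rel_freq` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 12, 0.2
num_sims = 100_000            # assumed number of repetitions

# A uniform draw below p counts as a success (True)
trials = rng.random((num_sims, n)) < p

# Number of successes in each experiment
successes = trials.sum(axis=1)

# Relative frequency of observing exactly 2 successes
rel_freq = np.mean(successes == 2)
print(rel_freq)
```

The relative frequency should land close to the analytical value of $p_X(2)$, with small fluctuations from run to run if the seed is changed.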

The complete PMF of this Binomial RV is:

What happens if we change the probability of 'success'?
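As a sketch, the PMF for $p=0.2$ can be placed next to the PMF for a second success probability. The value $p=0.5$ is an arbitrary choice made here for comparison, and `pmfs` is an illustrative name.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

n = 12
x = np.arange(0, n + 1)       # all values the Binomial RV can take

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
pmfs = {}
for ax, p in zip(axes, (0.2, 0.5)):   # 0.5 is an assumed second value
    pmfs[p] = stats.binom(n, p).pmf(x)
    ax.stem(x, pmfs[p])
    ax.set_title(f'Binomial({n}, {p})')
    ax.set_xlabel('$x$')
axes[0].set_ylabel('$p_X(x)$');
```

Increasing $p$ moves the bulk of the probability mass toward larger numbers of successes.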

Let's plot its CDF:

Let's generate some samples (random variates) from this distribution and plot their relative frequencies (the empirical PMF):

The plotting function ```hist``` can also plot the CDF of an RV:

The histogram "fills" the area under the (CDF) curve. We can overlay the CDF curve on top:

But this plot is misleading: a line plot makes the probability appear to increase between the discrete values, which is not valid. Instead, we use the ```step``` plotting function:
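A sketch of the cumulative histogram with a step-style CDF on top, assuming 10,000 samples from Binomial(12, 0.2). The names `samples`, `bins` and `ecdf` are illustrative.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

n, p = 12, 0.2
Bn = stats.binom(n, p)
samples = Bn.rvs(size=10_000, random_state=0)   # assumed sample size

# Unit-width bins centered on 0, 1, ..., n
bins = np.arange(-0.5, n + 1.5)

# cumulative=True accumulates the relative frequencies (empirical CDF)
ecdf, _, _ = plt.hist(samples, bins=bins, density=True, cumulative=True,
                      alpha=0.5, label='empirical CDF')

# where='post' holds each CDF value until the next integer,
# so the curve stays flat between the jumps
x = np.arange(0, n + 1)
plt.step(x, Bn.cdf(x), where='post', label='true CDF')
plt.xlabel('$x$'); plt.ylabel('$F_X(x)$'); plt.legend();
```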

## Binomial as the Sum of Bernoulli RVs

The Binomial RV can also be defined as the sum of $n$ independent Bernoulli RVs.

Let's overlay the true PMF function of a Binomial RV with parameters $n=12$ and $p=0.2$:
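A sketch of this comparison: sum $n=12$ independent Bernoulli(0.2) draws many times, then overlay the relative frequencies of the sums with the Binomial(12, 0.2) PMF. The number of repetitions (50,000) is an assumption, and the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

n, p = 12, 0.2
num_sims = 50_000             # assumed number of repetitions

# Each row holds n independent Bernoulli(p) draws
bern = stats.bernoulli(p).rvs(size=(num_sims, n), random_state=1)
sums = bern.sum(axis=1)       # one sum of n Bernoulli RVs per row

x = np.arange(0, n + 1)
rel_freq = np.bincount(sums, minlength=n + 1) / num_sims
true_pmf = stats.binom(n, p).pmf(x)

plt.bar(x, rel_freq, alpha=0.5, label='sum of Bernoulli RVs (simulated)')
plt.stem(x, true_pmf, label='Binomial(12, 0.2) PMF')
plt.xlabel('$x$'); plt.ylabel('probability'); plt.legend();
```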

* **Conclusion:** Adding together independent Bernoulli RVs (with the same probability $p$) produces a Binomial RV.

# The Geometric Random Variable

* A Geometric RV occurs when independent Bernoulli trials are conducted until the first success
    * Example: repeatedly and independently toss a coin with probability of a heads equal to $p$, where $0<p<1$. 
    
* $X$ is the number of trials required.
    * Example: The Geometric RV is the number $X$ of tosses needed for a head to come up for the first time.

$$X \sim \text{Geometric}(p)$$

* The PMF of $X$ is given by

$$p_X(x) = P(X=x) = \begin{cases}p(1-p)^{x-1}, & x=1,2,\dots \\ 0, & \text{o.w.}\end{cases}$$

* **Engineering examples/applications:** The number of retransmissions required for a packet, number of white dots between black dots in the scan of a black and white document.

* With $p=0.2$, what is the probability that the first success occurs on the 1st trial (coin flip)?

$$p_X(1) = p(1-p)^{1-1} = p = 0.2$$

* 6th trial?

$$p_X(6) = p(1-p)^{6-1} = 0.2 \times 0.8^5 \approx 0.0655$$

* Let's visualize the PMF for the Geometric with parameter $p=0.5$:

* Let's visualize the PMF for the Geometric with parameter $p=0.8$:
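A sketch of both plots side by side, using `scipy.stats.geom`, whose support starts at 1 like the PMF defined above. Showing only the first 10 values is an arbitrary cutoff, and `geom_pmfs` is an illustrative name.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

x = np.arange(1, 11)          # first 10 values of the support (assumed cutoff)

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
geom_pmfs = {}
for ax, p in zip(axes, (0.5, 0.8)):
    geom_pmfs[p] = stats.geom(p).pmf(x)   # p * (1-p)**(x-1)
    ax.stem(x, geom_pmfs[p])
    ax.set_title(f'Geometric({p})')
    ax.set_xlabel('$x$')
axes[0].set_ylabel('$p_X(x)$');
```

The larger the success probability, the faster the PMF decays.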

* If you flip a fair coin until you see heads, what is the probability that it takes more than 6 flips?

$$P(X>6) = 1- P(X \leq 6) = 1 - F_X(6)$$

<div class="alert alert-info">
    <b>Survival Function</b>
    
If $(\Omega,\mathcal{F},P)$ is a probability space with $X$ a real discrete RV on $\Omega$, the **Survival Function (SF)** is denoted as $S_X(x)$ and provides the probability $P(X > x)$. In particular, for every $x$ we have

$$S_X(x) = P(X > x) = 1 - P(X \leq x) = 1 - F_X(x)$$
</div>

* Let's write a simulation to verify this:
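One possible simulation for the fair-coin question, as a sketch. It relies on the observation that more than 6 flips are needed exactly when the first 6 flips are all tails. The number of repetitions and the variable names are assumptions.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
p = 0.5                        # fair coin
num_sims = 200_000             # assumed number of repetitions

# Each row: the first 6 flips of one experiment (True = heads)
flips = rng.random((num_sims, 6)) < p

# More than 6 flips are needed iff none of the first 6 is heads
more_than_6 = ~flips.any(axis=1)
estimate = more_than_6.mean()

# Survival function of the Geometric RV: P(X > 6) = (1-p)**6
exact = stats.geom(p).sf(6)
print(estimate, exact)
```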

# The Poisson Random Variable

* A Poisson RV models events that occur randomly in space or time
    * Example: number of produced items in a factory line

* Let $\lambda$ = the # of events/(unit of space or time)

* Consider observing some period of time or space of length $t$ and let $\alpha= \lambda t$ 

* Let the RV $X =$ the \# of events in time (or space) $t$

$$X \sim \text{Poisson}(\alpha)$$

* The PMF of the Poisson random variable is:

$$ p_X(x) = \begin{cases} \frac{\alpha^x}{x!} e^{-\alpha}, & x=0,1,\ldots \\ 0, & \text{o.w.} \end{cases}
$$

* For large $\alpha$, the Poisson PMF has a bell shape. For examples, see below.

* **Engineering examples/applications:**
    * calls coming in to a switching center
    * packets arriving at a queue in a network
    * processes being submitted to a scheduler
    
* Other examples:
    * \# of misprints on a group of pages in a book
    * \# of people in a community that live to be 100 years old
    * \# of wrong telephone numbers that are dialed in a day
    * \# of $\alpha$-particles discharged in a fixed period of time from some radioactive material
    * \# of earthquakes per year
    * \# of computer crashes in a lab in a week
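To illustrate how the shape of the Poisson PMF changes with $\alpha$, here is a sketch that plots it for a few values. The choices $\alpha \in \{1, 5, 20\}$ and the plotting range are arbitrary.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

x = np.arange(0, 41)           # assumed plotting range
alphas = (1, 5, 20)            # assumed parameter values

poisson_pmfs = {}
plt.figure(figsize=(8, 5))
for a in alphas:
    poisson_pmfs[a] = stats.poisson(a).pmf(x)
    plt.plot(x, poisson_pmfs[a], 'o-', ms=3, label=rf'$\alpha$={a}')
plt.xlabel('$x$'); plt.ylabel('$p_X(x)$'); plt.legend();
```

For $\alpha=1$ the PMF is strongly skewed, while for $\alpha=20$ it is already close to symmetric and bell-shaped.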

___

## Example

**An engineering professor makes an average of 60 mistakes during lectures over the course of a typical semester. A semester consists of 40 lectures, each of which is 50 minutes long.**

**<font color=blue>Question 1</font> In a new semester, what is the probability that the professor makes at least one mistake during some 20 minute period?**

We want to compute:

<!-- $$P(X>0) = 1 - P(X\leq 0) = 1 - F_X(0) = S_X(0)$$ -->

**<font color=blue>Question 2</font> What is the probability that the professor makes 4 mistakes in a lecture?**

We want to compute:

<!-- $$P(X=4)$$ -->

**<font color=blue>Question 3</font> What is the probability that the professor makes at most 4 mistakes in a lecture?**

**<font color=blue>Question 4</font> What is the probability that the professor has at least one lecture with 4 or more mistakes in a semester?**

<!-- Let $p$= prob. of 4 or more mistakes in a lecture.

Let $Z$= number of lectures with 4 or more mistakes in a 40-lecture semester.

Then $Z$ is Binomial(40,$p$).

Need to find $p$ first: -->
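A possible solution sketch for the four questions. It assumes that mistakes follow a Poisson model with a constant rate and that lectures are independent of each other; both are modeling assumptions rather than facts stated in the problem. The variable names are illustrative.

```python
import scipy.stats as stats

# Rate: 60 mistakes over 40 lectures of 50 minutes each
rate = 60 / (40 * 50)                 # 0.03 mistakes per minute

# Question 1: at least one mistake in 20 minutes (alpha = 0.6)
X20 = stats.poisson(rate * 20)
q1 = X20.sf(0)                        # P(X > 0) = 1 - F_X(0)

# Questions 2 and 3: one 50-minute lecture (alpha = 1.5)
X50 = stats.poisson(rate * 50)
q2 = X50.pmf(4)                       # P(X = 4)
q3 = X50.cdf(4)                       # P(X <= 4)

# Question 4: at least one of 40 lectures has 4 or more mistakes
p4 = X50.sf(3)                        # P(X >= 4) = P(X > 3)
Z = stats.binom(40, p4)               # number of such lectures
q4 = Z.sf(0)                          # P(Z >= 1)

print(round(q1, 3), round(q2, 3), round(q3, 3), round(q4, 3))
# approximately 0.451 0.047 0.981 0.934
```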

# Properties of Cumulative Distribution Functions (CDFs)

Recall:
<div class="alert alert-info">
    <b>Cumulative Distribution Function (CDF)</b>
    
If $(\Omega,\mathcal{F},P)$ is a probability space with $X$ a real discrete RV on $\Omega$, the **Cumulative Distribution Function (CDF)** is denoted as $F_X(x)$ and provides the probability $P(X\leq x)$. In particular, for every $x$ we have

$$F_X(x) = P(X\leq x) = \sum_{k\leq x} p_X(k)$$
</div>

For shorthand, we write 
$$
F_X(x) = P(X \le x)
$$

Let's create some random variables useful for us to study the properties of the CDF:

In [None]:
Bn = stats.binom(20,0.2)
G = stats.geom(0.3)

x = range(-10,20)
y = range(-10,20)

## Property 1

$$0 \le F_X(x) \le 1$$

**Proof:** $F_X(x) = P(X \le x)$ is the probability of an event, and every probability lies in $[0,1]$.

## Property 2

$$F_X(-\infty)=0 \text{ and }F_X(\infty)=1$$

**Proof:** The proof is rather technical.

Basically, $F_X(-\infty)$ and $F_X(\infty)$ are defined as limits, and the corresponding subsets of the sample space, $\{\omega \in \Omega: X(\omega) \le x\}$, either shrink to $\emptyset$ (as $x \rightarrow -\infty$) or grow to $\Omega$ (as $x \rightarrow \infty$).







## Property 3

**$F_X(x)$ is monotonically nondecreasing, i.e.,**

$$F_X(a) \le F_X(b)\text{ if } a \le b$$

**Proof:** For $a \le b$,
\begin{align*}
P\left(X \in (- \infty,b]\right) &= P(X \in (-\infty, a]) + P( X \in (a,b]) \\
\Rightarrow F_X(b) &= F_X(a) + P(a <  X \le b) \ge F_X(a),
\end{align*}
since $P(a < X \le b) \ge 0$.

## Property 4

$$P(a < X \le b) = F_X(b)  -F_X(a)$$

**Proof:** Rearrange the result from the proof of Property 3.
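Properties 3 and 4 can be checked numerically. The sketch below uses a Binomial(20, 0.2) RV and the arbitrary interval $(a,b] = (2,6]$; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

Bn = stats.binom(20, 0.2)

# Property 4: P(a < X <= b) = F_X(b) - F_X(a)
a, b = 2, 6                                   # assumed interval endpoints
lhs = Bn.pmf(np.arange(a + 1, b + 1)).sum()   # sum the PMF over a+1, ..., b
rhs = Bn.cdf(b) - Bn.cdf(a)
print(lhs, rhs)

# Property 3: the CDF never decreases along an increasing grid
# (a tiny tolerance absorbs floating-point rounding)
x = np.arange(-10, 20)
is_nondecreasing = bool(np.all(np.diff(Bn.cdf(x)) >= -1e-12))
print(is_nondecreasing)
```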

## Property 5

**$F_X(x)$ is continuous on the right, i.e.,** 

$$F_X(b^+) = \lim_{h \rightarrow 0^+} F_X(b+h) =F_X(b)$$

*(The value at a jump discontinuity is the value __after__ the jump.)*


**Proof:** Rather technical. Will be omitted. Let's instead build a simulation to observe this property:

In [None]:
h = 1e-10
x = range(-10,20)
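One way the cell above might be completed, as a sketch: evaluate the CDF of a Geometric(0.3) RV slightly to the right and slightly to the left of each integer. The names `at`, `right` and `left` are illustrative.

```python
import numpy as np
import scipy.stats as stats

G = stats.geom(0.3)
h = 1e-10
x = np.arange(-10, 20)

at = G.cdf(x)            # F_X(x)
right = G.cdf(x + h)     # F_X(x + h): approaching from the right
left = G.cdf(x - h)      # F_X(x - h): approaching from the left

# From the right, the CDF agrees with its value at x
print(np.allclose(right, at))

# From the left, it differs at every jump; the gap is the PMF value there
print(np.max(at - left))
```

The largest gap occurs at $x=1$, where the jump equals $p_X(1) = 0.3$.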



## Property 6

$$P(X>x) =1 - F_X(x)$$

**Proof:**

\begin{align*}
\{X>x\} &= \{X\le x\}^c \\
\Rightarrow P(X>x) &= 1 - P(X \le x)  = 1 - F_X(x)
\end{align*}

___

# Introduction to Continuous RVs

## Uniform Continuous RVs

Previously, we introduced a way to choose random values from the interval $[0,1)$:

We could also use scipy.stats for this:
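A sketch of both approaches. It assumes the earlier method was `numpy.random.rand`, which matches the `npr` import used in these notes; `U` is the name used for the frozen `scipy.stats` Uniform object.

```python
import numpy.random as npr
import scipy.stats as stats

# NumPy: 5 random values from the interval [0, 1)
u_numpy = npr.rand(5)
print(u_numpy)

# SciPy: a Uniform RV object; the defaults loc=0, scale=1 give Uniform(0, 1)
U = stats.uniform()
u_scipy = U.rvs(size=5)
print(u_scipy)
```

The `scipy.stats` object also provides `U.cdf` and `U.pdf`, which are convenient for the calculations that follow.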

* Let's find $F_U(u)=P(U \le u)$ for $u=0.2$:
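A sketch that estimates this probability by relative frequency and compares it with the CDF from `scipy.stats`. The sample size of 100,000 is an assumption.

```python
import numpy as np
import scipy.stats as stats

U = stats.uniform()                            # Uniform(0, 1)
samples = U.rvs(size=100_000, random_state=0)  # assumed sample size

estimate = np.mean(samples <= 0.2)   # relative frequency of {U <= 0.2}
exact = U.cdf(0.2)                   # F_U(0.2)
print(estimate, exact)
```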

So, what is the CDF for $U$?

Check:

* Let's find $P(0.25 < U \le 0.75)$:

* Let's find $P(0.45 < U \le 0.55)$:

Note that since $F_U(u)=u$, then $F_U(b)-F_U(a)=b-a$.

* So what is  $P(0.4995 < U \le 0.5005)$?

* Then what is $P(U=0.5)$?

In general, what is $P(U=u)$?
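The interval probabilities above can be computed with $F_U(b)-F_U(a)$. The sketch below shrinks the interval around 0.5; the last two half-widths are extra values added here to show the trend.

```python
import scipy.stats as stats

U = stats.uniform()   # Uniform(0, 1)

# Half-widths of intervals centered at 0.5 (the last two are assumed extras)
half_widths = [0.25, 0.05, 0.0005, 5e-7, 5e-10]
probs = []
for hw in half_widths:
    a, b = 0.5 - hw, 0.5 + hw
    probs.append(U.cdf(b) - U.cdf(a))   # P(a < U <= b) = b - a
    print(f'P({a} < U <= {b}) = {probs[-1]:.3g}')
```

Each probability equals the interval length, so it shrinks toward zero as the interval closes in on the single point 0.5.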

**This random variable has no probability at any of the values it takes on! So, the PMF is *meaningless* for continuous RVs. How can we deal with this?**

* This video from 3Blue1Brown provides another view of this: ["Why *probability of 0* does not mean *impossible*"](https://www.youtube.com/watch?v=ZA4JkHKZM50)

# Continuous RVs

<div class="alert alert-info">
    <b>Continuous Random Variable</b>
    
A random variable $X$ is called **continuous** if there is a nonnegative function $f_X$, called the **probability density function of $X$**, or PDF for short, such that, for every subset $B$ of the real line,
    
$$P(X\in B) = \int_B f_X(x) dx$$

</div>

<div class="alert alert-warning">
    
Continuous random variables **do not** have probability at any discrete points, i.e., 
    
$$P(X = x)=0~~ \forall x \in \mathbb{R}$$

The probability is distributed over ranges of real numbers.
</div>

<div class="alert alert-success">
    
It is possible to have a random variable for which some of the probability is concentrated at individual points and some of the probability is distributed over continuous ranges.

These are called **mixed** random variables and will not be covered in this class.

</div>

Continuous random variables do not have probability at any discrete values. However, they do have **density** of probability at values:

<div class="alert alert-info">
    <b>Probability Density Function</b>
    
The **probability density function (pdf)** of a random variable $X:\Omega \rightarrow \mathbb{R}$ is denoted by $f_X(x)$ and is the derivative (which may not exist at some places) of the CDF function $F_X(x)$:

$$f_X(x)= \frac{d F_X(x)}{dx}$$
</div>

<div class="alert-success">

Then, by the *Fundamental Theorem of Calculus*,
    
\begin{align}
F_X(x) &= \int_{-\infty}^{x} f_X(u)~du +F_X(-\infty)\\
&= \int_{-\infty}^{x} f_X(u)~du
\end{align}
</div>

Reminder of the Fundamental Theorem of Calculus in a gif:

![Fundamental Theorem of Calculus](https://upload.wikimedia.org/wikipedia/commons/3/31/Fundamental_theorem_of_calculus_%28animation_%29.gif)
                                   

Let's plot the CDF of a uniform random variable:

In [None]:
u = np.linspace(-0.1,1.1,100)
U = stats.uniform()  # Uniform(0,1); defined here so this cell runs on its own

plt.plot(u,U.cdf(u));

The CDF of the Uniform random variable is given by:

$$F_X(x) = \begin{cases} 0 & x<a \\ \frac{x-a}{b-a} & x\in [a,b]\\ 1 & x>b \end{cases}$$

In [None]:
plt.step(u, U.pdf(u));

* A Uniform RV models continuous-valued instances that are equally likely to occur in a given interval $[a,b]$.

$$X \sim \text{Uniform}(a,b)$$

or, for short,

$$X \sim U(a,b) $$

* The **probability density function (PDF)** of the Uniform random variable is:

$$f_X(x) = \begin{cases}\frac{1}{b-a} & x\in [a,b]\\ 0 & \text{o.w.}\end{cases}$$

* The **cumulative distribution function (CDF)** of this Uniform random variable is:

$$F_X(x) = \begin{cases} 0 & x<a \\ \frac{x-a}{b-a} & x\in [a,b]\\ 1 & x>b \end{cases}$$

## Exponential RV

**Used to model:** Lifetime of an electrical device, service time or time between arrivals in a queue, distance between mutations on a DNA strand, monthly and annual maximum values of daily rainfall.

* Obtainable as a limit of Geometric random variables.

* This RV has a single parameter. We typically use $\lambda$, but some books use $\mu=1/\lambda$.

* We say that $X$ is an exponential RV and write $X\sim \text{Exponential}(\lambda )$.

* The **probability density function (pdf)** is given as:

$$ f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0  \end{cases} $$

(or)

$$ f_X(x) = \begin{cases} \frac{1}{\mu} e^{-x/\mu}, & x \ge 0 \\ 0, & x < 0  \end{cases} $$

We will use the first form because it is more common and simpler.

* The CDF is given as:

\begin{align*}
F_X(x) &= \int_{-\infty}^{x} f_X(u)~du\\ 
&= \begin{cases}\int_0^{x} \lambda e^{-\lambda u}~du & x\geq 0 \\ 0 & x<0 \end{cases}\\
&= \begin{cases} 1-e^{-\lambda x} & x\geq 0 \\ 0 & x<0 \end{cases}
\end{align*}
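As a quick check of the closed form, the sketch below compares $1-e^{-\lambda x}$ with the CDF from `scipy.stats.expon`. Note that SciPy parameterizes the Exponential by `scale`, which equals $1/\lambda$; the value $\lambda=0.2$ is an arbitrary choice.

```python
import numpy as np
import scipy.stats as stats

lam = 0.2                          # assumed rate parameter
E = stats.expon(scale=1 / lam)     # scale = 1/lambda (the mu form)

x = np.linspace(0, 30, 7)
closed_form = 1 - np.exp(-lam * x)

print(np.allclose(E.cdf(x), closed_form))
print(E.cdf(-1.0))                 # the CDF is 0 for negative x
```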

# Limits of RVs

## Limit of Geometric Random Variables

Let $G\sim \text{Geometric}(p)$ and $E\sim\text{Exponential}(\lambda)$. Their CDFs look like:

In [None]:
p=0.2
lam = 0.2
G = stats.geom(p)
E = stats.expon(scale=1/lam)
x = np.linspace(-1,20,100)

plt.figure(figsize=(8,5))
plt.step(x, G.cdf(x), label='Geometric(p=0.2)')
plt.step(x, E.cdf(x), label=r'Exponential($\lambda$=0.2)')  # raw string keeps the backslash for LaTeX
plt.legend(fontsize=15);

We can write 

$$F_G(x) = F_E(x\delta)$$ 

for all $x=1,2,\dots$, where $\delta$ is chosen so that $e^{-\lambda\delta}=1-p$.

As $\delta$ approaches zero, the exponential RV can be interpreted as the "limit" of the geometric RV.

In [None]:
delta = -np.log(1-p)/lam
delta

In [None]:
plt.figure(figsize=(8,5))
plt.plot(x, G.cdf(x), label='$F_{Geom}(x)$')
plt.plot(x, E.cdf(x*delta),'--', label=r'$F_{Exp}(x\delta)$')  # raw string keeps the backslash for LaTeX
plt.legend(fontsize=15); plt.xlabel('$x$'); plt.ylabel('CDF');

Suppose now that you toss a biased coin very quickly (every $\delta$ seconds, $\delta \ll 1$), where the probability of Heads is very small (equal to $p = 1 - e^{-\lambda\delta}$). Then the time until the first Heads (a geometric RV with parameter $p$, scaled by $\delta$) closely approximates an exponential RV with parameter $\lambda$, in the sense that the corresponding CDFs are very close to each other, as illustrated above.

* This relationship between the Geometric and Exponential RVs will play an important role in Bernoulli and Poisson Point processes (we will not cover these in this course).

* A Poisson point process can model events that appear to happen at a certain rate, but completely at random (without a certain structure).
    * Application example: Poisson processes to model lattice cellular networks.

## Limit of Binomial Random Variables

If the limit of properly chosen Geometric RVs is an Exponential RV, what is the limit of Binomial RVs?

Let's consider a sequence of binomial random variables with 4, 20, 40, and 200 trials. Then divide the values by 1, 5, 10 and 50, so that the values stay centered around the same range:

In [None]:
from ipywidgets import interactive
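A static sketch of this comparison using `matplotlib` only (no interactive widget). The success probability $p=0.5$ is an assumption, since the notes do not fix it here, and the names `cases`, `centers` and `spreads` are illustrative.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

p = 0.5                                          # assumed success probability
cases = [(4, 1), (20, 5), (40, 10), (200, 50)]   # (number of trials, divisor)

fig, axes = plt.subplots(1, 4, figsize=(16, 3.5), sharex=True)
centers, spreads = [], []
for ax, (n, d) in zip(axes, cases):
    k = np.arange(n + 1)
    pmf = stats.binom(n, p).pmf(k)
    v = k / d                                    # rescaled values, all in [0, 4]
    ax.stem(v, pmf)
    ax.set_title(f'n={n}, values divided by {d}')
    ax.set_xlabel('value')
    # Probability-weighted center and spread of the rescaled values
    center = np.sum(v * pmf)
    centers.append(center)
    spreads.append(np.sqrt(np.sum((v - center) ** 2 * pmf)))
axes[0].set_ylabel('probability');
```

All four panels share the same horizontal axis, so the shapes can be compared directly.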

What do you observe?

<!-- 1. The curves become smooth and bell-shaped.

2. The width of the "bell" gets smaller as we increase the number of Bernoulli trials. -->

**Why?** To start, let's focus on the 1st observation.

Recall that the Binomial RV is the sum of independent Bernoulli RVs. Maybe we have a similar phenomenon with other RVs?

## Limit of the Sum of Uniform RVs
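A sketch of the same experiment with Uniform(0, 1) RVs: draw $n$ independent values, average them (a sum divided by $n$), repeat many times, and plot the histogram of the averages. The values of $n$ and the number of repetitions are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
num_sims = 20_000              # assumed number of repetitions
ns = (1, 2, 5, 30)             # assumed numbers of RVs being averaged

fig, axes = plt.subplots(1, len(ns), figsize=(16, 3.5), sharex=True)
uniform_avgs = {}
for ax, n in zip(axes, ns):
    # Each row: n independent Uniform(0, 1) draws; average across the row
    uniform_avgs[n] = rng.random((num_sims, n)).mean(axis=1)
    ax.hist(uniform_avgs[n], bins=50, density=True)
    ax.set_title(f'average of {n} Uniform(0,1) RVs')
axes[0].set_ylabel('density');
```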

## Limit of the Sum of Exponential RVs
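A sketch of the experiment with Exponential RVs, whose density is strongly skewed. The rate $\lambda=0.2$, the values of $n$, and the number of repetitions are assumptions.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
lam = 0.2                      # assumed rate parameter
num_sims = 20_000              # assumed number of repetitions
ns = (1, 2, 10, 50)            # assumed numbers of RVs being averaged

fig, axes = plt.subplots(1, len(ns), figsize=(16, 3.5))
exp_avgs = {}
for ax, n in zip(axes, ns):
    # NumPy's exponential sampler takes scale = 1/lambda
    draws = rng.exponential(scale=1 / lam, size=(num_sims, n))
    exp_avgs[n] = draws.mean(axis=1)
    ax.hist(exp_avgs[n], bins=50, density=True)
    ax.set_title(f'average of {n} Exponential RVs')
axes[0].set_ylabel('density');

# Sample skewness shrinks as more RVs are averaged
print(stats.skew(exp_avgs[1]), stats.skew(exp_avgs[50]))
```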

Fascinating! The average of RVs (themselves without a bell-shaped distribution) will approach a bell-shaped distribution as the number of RVs increases!

This bell-shaped distribution is known as the **Gaussian random variable** and it has very important properties such as the **Central Limit Theorem (CLT)**.

___

# Central Limit Theorem

<div class="alert alert-info">
    <b>Central Limit Theorem</b>
    
The **Central Limit Theorem (CLT)** says (very roughly) that the **average** of a large number of almost *any* type of random variables will have the same type of distribution, called the **Gaussian** distribution.
    
</div>

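More formally, if $X_i, i=1,2,\dots, n$, is a sequence of independent and identically distributed random variables with finite variance (to be defined later), then, as $n\rightarrow\infty$, the distribution function of the average

$$\overline{X}_n = \frac{1}{n} \sum_{i=1}^n X_i,$$

once centered and rescaled (precisely, $\sqrt{n}\,(\overline{X}_n - \mu)/\sigma$, where $\mu$ and $\sigma^2 > 0$ are the mean and variance of each $X_i$), converges to a fixed bell-shaped limit.

This limiting distribution is called **Gaussian**.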

The density function for a Gaussian random variable has a somewhat complicated form:

$$f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}}\exp \left\{ - \frac{(x-\mu)^2}{2\sigma^2} \right\},$$

with parameters $\mu$ and $\sigma^2 > 0$. 

* The parameter $\mu$ is called the **mean** of the Gaussian distribution.
* The parameter $\sigma^2$ is called the **variance** of the Gaussian distribution.
* The parameter $\sigma$ is called the **standard deviation** of the Gaussian distribution.

* Sometimes the term **Normal** distribution is used to refer to the Gaussian random variable that has parameters $\mu=0$ and $\sigma^2=1$, so we will use the term Gaussian to refer to any random variable with this density function. 

(Also, "Gaussian" is more common in ECE. "Normal" is more common in Statistics and Math.)