# Chapter 2. **Probability**


In the first chapter, we introduced some basic problems and ideas of probabilistic programming and probability. Now, let's review some of those concepts in greater detail. Recall Example 1.2 on A/B testing from the introduction, in which we used the binomial distribution in our probabilistic model to generate data. This chapter will explain why the binomial distribution was the best choice for that example.

A basic understanding of *probability* is critical for mastering probabilistic programming because we want to model our data while understanding the controlling factors. To answer our experimental questions, we need to represent our assumptions about our variables as *probability distributions* to generate new samples with a generative model.


## Probability Background


---


---
***Example 2.1.*** **Patient with a cold**

Recall the generative model for a patient with a cold:

![](media/chapter2/gen_model_cold.jpg)

In this example, the patient can have symptoms of fever, coughing or sneezing. 

The set of all possible outcomes is called a <a id='sample space'>*sample space*</a> $S$. 

Thus, $S=\{fever, coughing, sneezing\}$.  

Sometimes, we also want to consider an event, such as the patient coughing. 

Formally, an <a id='event'>*event*</a> is any subset of the sample space $S$. 

For example, the above event can be represented with a subset $A=\{coughing\}$, which includes patients that are only coughing. If we wanted to look at patients that only have a fever, we would describe them with another event, which can be expressed as a subset $B=\{fever\}$. Alternatively, if we want to consider patients that either are coughing or have a fever, the subset that describes such a subgroup would be $C=\{coughing, fever\}$.
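Because events are just subsets of the sample space, Python's built-in `set` type offers a quick way to experiment with them. The sketch below is only an illustration; the variable names mirror the symbols used in the text and are not part of any library.

```python
# Sample space and events from Example 2.1, represented as Python sets.
S = {"fever", "coughing", "sneezing"}

A = {"coughing"}  # the patient is coughing
B = {"fever"}     # the patient has a fever
C = A | B         # coughing or fever (set union)

# Every event must be a subset of the sample space.
for name, event in [("A", A), ("B", B), ("C", C)]:
    print(name, sorted(event), "is an event:", event <= S)
```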

Let's consider another example from the previous chapter to reinforce our understanding of the definitions of sample space and event.

***Example 2.2.*** **Testing Version A of the email sign-up design on 8 people**

The generative model for the testing of Version A of the email sign-up page on 8 people is described in the figure below.

![alt text](https://drive.google.com/uc?id=1HRkUWSl-lVuyRewwyPa0tCQnuB89ZiPX)

What outcomes can be expected from the experiment? Before answering this question, it is helpful to observe that the outcomes can be interpreted in two different ways:



1.   We could just be interested in the **number of people** who signed up. For example, 5 out of 8 users signed up, regardless of the order in which they signed up.

2.   We could want to see the response of each person **(Yes/No)**. This outcome assumes that our people are numbered from 1 to 8, and we are interested in whether each specific person signed up or not.


Suppose we are interested in the second interpretation and denote a person that signs up by $Y$ and a person that doesn't by $N$. Since we want to see each person's response, we enumerate them before the experiment. Then the outcome in which only the first person signed up can be written as $YNNNNNNN$; or the outcome in which only the second person signed up is $NYNNNNNN$. 

The sample space $S$ consists of possible combinations of $Y$ and $N$ for 8 users. We could write all possible outcomes down, but since there are so many of them, we will not do that here.

Suppose we are interested in event $A$ that represents '$7$ users signed up'. Then this event is a subset of the sample space $S$ that consists of all outcomes in which $7$ users signed up.
So $A=\{\\
NYYYYYYY,\\
YNYYYYYY,\\
YYNYYYYY,\\
YYYNYYYY,\\
YYYYNYYY,\\
YYYYYNYY,\\
YYYYYYNY,\\
YYYYYYYN\\
\}.$
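Writing the whole sample space out by hand is tedious, but a computer can enumerate it easily. The following sketch, which uses only the Python standard library, lists every outcome as a string of `Y` and `N` and then filters out the event "exactly 7 users signed up". The names `sample_space` and `A` are chosen here for illustration.

```python
from itertools import product

# Every outcome is a string of 8 letters, each either Y (signed up) or N.
sample_space = ["".join(letters) for letters in product("YN", repeat=8)]
print(len(sample_space))  # 2**8 = 256 outcomes

# Event A: exactly 7 of the 8 users signed up.
A = [outcome for outcome in sample_space if outcome.count("Y") == 7]
print(len(A))  # 8 outcomes, one for each user who could be the single N
print(A)
```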

If we want $B$ to be an event that exactly describes the above picture's configuration, then $B=\{YNNYYNYY\}$.

## Quiz 2.1


---

1.   In Example 2.1, which of the following is the subset $D$ that represents the event where a patient has a fever or is sneezing?

        a) $D = \{fever\}$, $D = \{sneezing\}$
        
        b) $D = \{fever, sneezing\}$  
        
        c) $D = \{fever \& sneezing\}$   
        
        d) $D = \{fever, coughing, sneezing\}$


2.   Using the problem posed in Example 2.2, which of the following is an event?
        
        a) $A = \{NYYNNN\}$   
        
        b) $A = \{N\}$  
        
        c) $A = \{YNYYNNYN\}$   
        
        d) $A = \{YN\}$

       




## Probability

Let's say you run an experiment a very large number of times. The probability of event $A$ occurring is $p$, where $p$ represents the proportion of times $A$ will occur, and the probability that $A$ will not occur is $(1-p)$. Intuitively, the probability of an event occurring is a number between $0$ and $1$. If the probability is close to $1$, the event is more likely to occur than not. Likewise, if the probability is close to $0$, the event is more likely not to occur. We have already encountered probability in previous examples (the parameter $p$ that describes the chance of a person signing up in the A/B testing example is a probability). Now we are going to define it properly.

<a id='probability'>*Probability*</a> is a function denoted by $P()$ that assigns a value between $0$ and $1$ to each event $A$ of the sample space $S$ and satisfies certain requirements called axioms:

1.   Probability of any event is non-negative. ($P(A)\geq 0$ for any event $A$)
2.   Probability that the outcome is from a sample space equals 1. ($P(S)=1$, where $S$ is a sample space).
3.   If two events $A$ and $B$ cannot happen simultaneously (are mutually exclusive), then $P(A+B)=P(A)+P(B)$, where $A+B$ means at least one of the events $A$ or $B$ happens.

Let's define a probability function that fits the depicted model of a patient with cold from Example 2.1.

<html>
<table style="width:50%">
  <tr>
    <th><b><center>Event</center></b></th>
    <th><b><center>Probability</center></b></th> 
  </tr>
  <tr>
    <td><center>$\{fever\}$</center></td>
    <td><center>$1/10$</center></td>
  </tr>
  <tr>
    <td><center>$\{coughing\}$</center></td>
    <td><center>$4/10$</center></td>
  </tr>
    <tr>
    <td><center>$\{sneezing\}$</center></td>
    <td><center>$5/10$</center></td>
  </tr>
    <tr>
    <td><center>$\{fever, coughing\}$</center></td>
    <td><center>$1/10+4/10 = 5/10$</center></td>
  </tr>
    <tr>
    <td><center>$\{fever,sneezing\}$</center></td>
    <td><center>$1/10+5/10=6/10$</center></td>
  </tr>
    <tr>
    <td><center>$\{coughing,sneezing\}$</center></td>
    <td><center>$4/10+5/10=9/10$</center></td>
  </tr>
    <tr>
    <td><center>$S=\{fever,coughing,sneezing\}$</center></td>
    <td><center>$1/10+4/10+5/10=1$</center></td>
  </tr>
</table>
  </html>
    
You can check yourself that the probability function above satisfies all of the axioms.
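The check can also be delegated to the computer. The sketch below assumes the three single-outcome probabilities from the table, builds every event (every subset of $S$), and tests the three axioms with exact fractions. The helper `P` and the variable names are illustrative choices, not part of any library.

```python
from fractions import Fraction
from itertools import combinations

# Probabilities of the individual outcomes, taken from the table above.
outcome_prob = {
    "fever": Fraction(1, 10),
    "coughing": Fraction(4, 10),
    "sneezing": Fraction(5, 10),
}
S = frozenset(outcome_prob)

def P(event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum((outcome_prob[o] for o in event), Fraction(0))

# All events: every subset of S, including the empty set.
events = [frozenset(c) for r in range(len(S) + 1)
          for c in combinations(sorted(S), r)]

axiom1 = all(P(e) >= 0 for e in events)
axiom2 = P(S) == 1
# Axiom 3 is checked for every pair of mutually exclusive (disjoint) events.
axiom3 = all(P(a | b) == P(a) + P(b)
             for a in events for b in events if not (a & b))
print(axiom1, axiom2, axiom3)
```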

***Example 2.3.*** Consider just one person from Example 2.2. 

![](media/chapter2/model_user.jpg)

There are only two possible outcomes, so the sample space is $S=\{Y, N\}$, where, as previously mentioned, $Y$ represents the case where the user signs up and $N$ represents the case where the user does not sign up for email updates. Let $A$ be the event "the person signs up", that is, $A=\{Y\}$. We know from the above model that the chance of signing up is $5/8$, so we set $P(A)=5/8=0.625$. Let $B$ be the event "the person doesn't sign up", or $B=\{N\}$, which has a chance of $3/8$, and thus we set $P(B)=3/8=0.375$. Another possible event is the sample space itself, and since events $A$ and $B$ cannot happen simultaneously, $P(S)=P(A+B)=5/8+3/8=1$, which satisfies axiom 2.


---


In summary, for any random experiment one has to assign probability $P()$ that expresses one's belief in how the experiment is conducted while satisfying the three axioms. The probability of an event is a sum of probabilities of outcomes belonging to this event.

We will not focus on computing probabilities by hand in this module; instead, we will let the computer do that for us.


In the context of Probabilistic Programming, we are interested not in a probability of a single event, but rather the 'big picture': what are the probabilities for each of the events, which one is more likely to happen, etc. Probability distributions can help answer these questions. But first, let's look at the notion of random variable.

## Quiz 2.2


---

1.   Which of the following could represent the probability of Event $X$ occurring?

        a) $P(X) = 0.5$
        
        b) $P = 0.5$
        
        c) $0.5$
        
        d) $P(X) = 1.2$
        


2.   Given that $P(A) = 0.10$, $P(B) = 0.75$, $P(C) = 0.80$ and $P(D) = 0.05$, which of the following events is most likely to occur?
        
        a) Event $A$
        
        b) Event $B$
        
        c) Event $C$
        
        d) Event $D$

       

## Random Variables

We have now defined the sample space $S$ and the event, $A$. Previously, we have used words and abbreviations to represent the outcomes, but we can also denote the outcomes of an experiment with numbers. Working with numbers allows one to use mathematics to extend the theory. 

Recall in the illness Example 2.1, our sample space $S$ consisted of $S=\{fever,coughing,sneezing\}$. We can rewrite our sample space in terms of numbers, where fever corresponds to $1$, coughing to $2$, and sneezing to $3$. This results in a change in how the sample space is represented ($S=\{1,2,3\}$), but the random experiment itself doesn't change. This transition from a textual description of outcomes to numerical encoding is captured by the <a id='random variable'>*random variable*</a> ($X$). The same way one can generate an outcome by performing a random experiment, one can also generate the quantified outcome (in other words, generate an outcome and transition to a value specified by a random variable). From now on, we are going to use the quantified outcomes (i.e. random variables) and instead of working with a sample space (that remains unchanged, $S=\{fever, coughing, sneezing\}$), we are going to use the <a id='range'>*range of a random variable*</a> ($R(X)=\{1,2,3\}$), which is defined as the set of all values a random variable can take on. 

Let's go back to Example 2.2 (A/B testing). In this case the sample space $S$ consists of all ordered sequences of 8 letters (octuples), each either $Y$ or $N$. Suppose now we want to keep track of only the number of sign-ups in a random experiment. For that, we define the random variable $Z$ to be equal to the number of successes in each outcome. For example, $Z(YNNNNNNN)=1$, $Z(NNYYNNNN)=2$, etc.
The set of all possible quantified outcomes is $R(Z)=\{0,1,2,3,4,5,6,7,8\}$.
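In code, a random variable is simply a function from outcomes to numbers. The sketch below, using only the standard library, writes $Z$ as a Python function and recovers its range by applying it to every outcome in the sample space. The function name `Z` and the list `R_Z` follow the notation in the text.

```python
from itertools import product

def Z(outcome):
    """Random variable Z: the number of sign-ups (Y's) in an outcome."""
    return outcome.count("Y")

print(Z("YNNNNNNN"))  # 1
print(Z("NNYYNNNN"))  # 2

# The range R(Z) is the set of values Z takes over the whole sample space.
sample_space = ["".join(t) for t in product("YN", repeat=8)]
R_Z = sorted({Z(outcome) for outcome in sample_space})
print(R_Z)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```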

One can define probabilities for random variables the same way as we defined probabilities for individual events. This is exactly what the probability distribution captures. Specifically, the <a id='distribution'>*probability distribution*</a> is an assignment of a probability to each value in the range of a random variable, where the probabilities sum to one.

We can easily sample from the binomial distribution that we introduced for A/B testing in the Introduction. The following Python code shows how to generate $1,000$ random samples from the binomial distribution, where the range of the random variable is $R(Z)=\{0,1,2,3,4,5,6,7,8\}$ and the probability of success for a single trial is $p=0.625$:






In [6]:
# generate random variables
from numpy.random import binomial

n = 8
p = 0.625
size = 1000

# generate 1000 random samples with n = 8 and p = 0.625
r = binomial(n, p, size = size)

#print(r)

## Quiz 2.3


---



1.   Let sample space $S$ consist of all sets of 8 (octuples) of Y or N, where Y represents sign-up and N represents not signing up (same as in A/B testing Example 2.2). Let random variable $Z$ be equal to the number of successes in each outcome.


   *   $Z(YNYYYNNN)=$
   
          a) $2$
          
          b) $4$
          
          c) $5$
          
          d) $0$
          
          
   *   $Z(NNNNNNNN)=$
   
          a) $2$
          
          b) $4$
          
          c) $5$
          
          d) $0$
          
          
   *   $Z(NNYYNNNN)=$
   
          a) $2$
          
          b) $4$
          
          c) $5$
          
          d) $0$
          
2.   In the setup of question 1, which answer has a numerical encoding of 3? 
        
       (i.e. $Z(?)=3$)
            
        a) $YYNNNNNY$
        
        b) $YYYYNNNN$
        
        c) $NYNYNYNN$
        
        d) $NNNYYYNN$



## Binomial Distribution

In the introduction, we briefly discussed the binomial distribution in the context of A/B testing. Recall, we had two versions of a visual, Versions A and B as shown below:

![](media/chapter2/versions.jpg)

In this problem, we used the binomial distribution as a shortcut to determine which version of the graphic led more people to sign up for emails. The number of trials $n$ was the number of people in a testing group ($n = 8$ and $n = 10$ for Versions A and B, respectively), and the estimate of the parameter $p$ of the true underlying distribution was the ratio of the number of people who signed up to the number of people in the testing group. In other words, the observation of our random variable $X$ is the number of successes (e.g. $X = 5$ for Version A and $X = 2$ for Version B), and the estimated probability of success $p$ is the number of successes divided by the number of people (e.g. $p = 5/8 = 0.625$ for Version A and $p = 2/10 = 0.2$ for Version B).

Let's now redefine the binomial distribution given what we have just learned about probability. The <a id='binomial'>*binomial distribution*</a> can be used to describe a dichotomous variable (such as a yes/no question, or in this case sign-ups) and is described by two parameters:

$n$ = number of trials.

$p$ = probability of success for each trial, represented as a number between $0$ and $1$.
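For reference, the standard probability mass function of the binomial distribution is $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$ for $k=0,1,\dots,n$. The formula is not derived in this chapter, so treat the sketch below as a supplementary illustration. It uses `math.comb` (available from Python 3.8), and the helper name `binomial_pmf` is ours.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 8, 0.625  # Version A: 8 people, estimated sign-up probability 5/8

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(pmf):
    print(k, round(prob, 4))

# The probabilities over the whole range {0, ..., 8} sum to 1.
print("total:", round(sum(pmf), 10))
```

Each term has a direct reading: $p^k(1-p)^{n-k}$ is the probability of one particular sequence with $k$ sign-ups, and $\binom{n}{k}$ counts how many such sequences exist.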

If we want to generate a random number for the given example, we can use the binomial distribution with $n = 8$ and $p = 0.625$ as in the following code:

In [10]:
import numpy as np

# parameters n and p
n = 8
p = 0.625

# randomly generated value for the given example
rand = np.random.binomial(n, p, size = 1)

print('Out of 8 people', rand[0], 'signed up')

Out of 8 people 6 signed up


The graph of the binomial distribution with $n = 8$ and $p = 0.625$ is the following:

![](media/chapter2/binomial.png)

To assign the binomial distribution to a random variable `X` with the PyMC3 package, use `X = pymc3.Binomial('X', n, p)` inside a `with pymc3.Model():` block.

The binomial probability mass function looks graphically different when the parameters $n$ and $p$ are changed.

In [31]:
# @hidden_cell

%matplotlib inline
import scipy as sp
from scipy.stats import binom
from ipywidgets import interactive
import matplotlib.pyplot as plt
import numpy as np

def f(n = 8, p = 0.625):
    fig, ax = plt.subplots(1, 1)

    x = np.arange(0, n + 1)
    
    ax.plot(x, binom.pmf(x, n, p), 'bo')
    plt.title('Binomial')
    ax.vlines(x, 0, sp.stats.binom.pmf(x, n, p), colors='k')
    plt.xlim(-0.9, n+1)
    #plt.ylim(0,1)
    plt.show()

interactive_plot = interactive(f, n=(1, 100), p=(0, 1, 0.01))
output = interactive_plot.children[-1]
output.layout.height = '350px'
interactive_plot



## Bernoulli Distribution

The <a id='bernoulli'>Bernoulli distribution</a> is a specific kind of binomial distribution. Like the binomial distribution, it is a discrete distribution that answers a dichotomous question. The Bernoulli distribution, however, has only one parameter, $p$, because it describes the case where $n = 1$ (i.e. only one trial is conducted instead of $n$ trials).

In other words, the Bernoulli distribution is just the binomial distribution when $n = 1$. Set $n$ to 1 in the slider above and observe how alterations of parameter $p$ affect the graph.
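The relationship also runs in the other direction: the number of successes in $n$ independent Bernoulli trials with the same $p$ follows a binomial distribution. The sketch below illustrates this with the standard `random` module rather than NumPy. The helper `bernoulli` and the fixed seed are our own choices for the illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
p = 0.625

def bernoulli(p):
    """One Bernoulli trial: 1 (success) with probability p, otherwise 0."""
    return 1 if random.random() < p else 0

# One binomial(n=8, p) draw is the number of successes in 8 Bernoulli trials.
trials = [bernoulli(p) for _ in range(8)]
print(trials, "->", sum(trials), "sign-ups")

# Over many trials, the fraction of successes should be close to p.
many = [bernoulli(p) for _ in range(100_000)]
print(sum(many) / len(many))
```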

## Quiz 2.4


---

1.   Are there any parameters $p$ for the Bernoulli distribution, so the probability of one of the outcomes is equal to 1?

        a) Yes
        
        b) No


2.   Determine possible parameters $p$ that satisfy the following: outcome $12$ is more likely to happen than any other outcome in the Binomial distribution with parameters $p$ and $n=16$. 
        
        a) $p=0.58$
        
        b) $p=0.73$
        
        c) $p=0.75$
        
        d) $p=0.77$


## Distributions
---
Recall that a probability distribution describes the probability that an event will occur for all values in the range of a random variable. Probability distributions are described by a random variable and a probability function. Probability distributions can be either discrete or continuous (see below) and are described by <a id='distpar'>distribution parameters</a>.


A random variable and its distribution are called <a id='discrete'>*discrete*</a> if the range of the random variable is countable (i.e. one can list its elements one by one, even if the list never ends). In the alternative case, when one cannot count all the elements in the range of a random variable (i.e. they exist on a continuum), the random variable and its distribution are called <a id='continuous'>*continuous*</a>.


---
In Example 2.2, $R(Z)=\{0,1,2,3,4,5,6,7,8\}$, one can count all elements of the set, so the distribution that describes the example, (the binomial distribution) is considered a discrete distribution.

***Example 2.4.*** Consider a random experiment that describes the number of text messages an individual receives during the day. Suppose we know that this person receives about $13$ messages daily on average. How many text messages can one receive? It is possible to receive no messages at all, but one could receive 10 or 100 messages during the day. Assuming there are no memory restrictions in the phone, the possible number of text messages during the day can be any non-negative integer number: $R(Z)=\{0,1,2,3,...\}$. There are infinitely many options, but still one can count all of the messages they received on any single day. Therefore, this experiment is described by a discrete distribution and a random variable $Z$.

***Example 2.5.*** Suppose you ordered a pizza and are waiting for the pizza delivery to arrive. According to the person you spoke with on the phone, you can expect your pizza to arrive anywhere from 2 to 15 minutes from now. When will your pizza arrive?

What is the range $R(X)$ of the random variable in this example? The pizza can arrive at any time between 2 minutes and 15 minutes from now. Therefore, $R(X)=\{2\leq x\leq 15\}=[2,15]$. One cannot count all the elements of such a set, so the distribution that describes this experiment is continuous.




## Quiz 2.5


---



1.   Choose all of the discrete random variables from the following:
      
        a) random variable $X$, given $R(X)=\{2,4,8,10\}$
        
        b) random variable $Y$, given $R(Y)=\{0\leq x\leq 10\}=[0,10]$ (all real numbers between $0$ and $10$)
        
        c) random variable $Z$, given $R(Z)=\{1,2,3,...\}$ (all positive integers)
        
        
          
2.   Which of the following is a range of a random variable X for an experiment that describes the number of full glasses of water one can drink during the day?

        a) $R(X)=\{1,2,3,4\}$
        
        b) $R(X)=\{0,1,2,3,...\}$
        
        c) $R(X)=\{1,2,3,...\}$



### Discrete distributions

---



For discrete distributions, the probability function that describes the distribution is called a <a id='pmf'>*probability mass function*</a> (PMF). For each value $k$ of a random variable $X$, it gives the probability $P(X=k)$ of the event $\{X=k\}$ in a way that satisfies the axioms above. A distribution can be represented as a graph where the $x$-axis corresponds to the values of the random variable and the $y$-axis corresponds to the value of the probability mass function at each of them.

As an example, if we go back to the graph of binomial distribution and look at the point whose $x$-coordinate is 6 (i.e. this point represents an event $\{X=6\}$), one can find probability $P(X=6)$ that is the $y$-coordinate of the considered point.


![](media/chapter2/binomial_pmf.jpg)



Let's consider a few of the most common discrete distributions.

### Discrete Uniform Distribution

---


***Example 2.6.*** The University of Hawaiʻi's Bookstore is giving away a free laptop. $50$ people entered to win the laptop. Each person can enter their name only once, and the winner will be chosen at random, so each person has an equal chance of winning. In this random experiment, the sample space consists of all the people who entered the contest. To consider this problem in terms of a random variable, we can enumerate the participants, such that the first person is given a label of $1$, the second a label of $2$, etc. The range of the random variable $X$ is then $R(X)=\{1,2,3,...,49,50\}$, and the winner of the laptop is chosen by picking a number between $1$ and $50$ at random, where each individual has an equal chance of winning. The distribution that describes such an experiment is called a *discrete uniform distribution*.

The <a id='dud'>*discrete uniform distribution*</a> describes a finite number of equally spaced and equally likely outcomes. For instance, in the example above, we have $50$ individuals (a finite number of outcomes) who entered the giveaway. The outcomes are equally spaced because the labels are consecutive integers (you cannot have half of a person), and each person has an equal chance of winning, so the requirements for a discrete uniform distribution are satisfied.


The distribution is described by two parameters:

*   $a =$ minimum value
*   $b =$ maximum value

The discrete uniform variable takes on one of the following values: $a,a+1,...,b-1,b$.

Returning to Example 2.6, we can see that the experiment is described by the discrete uniform distribution where the minimum value parameter is $a=1$ and the maximum value parameter is $b=50$. The following code in Python shows how to generate a winner using the distribution.




In [12]:
from numpy.random import randint

# parameters a and b
a = 1
b = 50

# randomly generated value for the given example
# (randint excludes the upper bound, so pass b + 1 to make Person #50 eligible)
rand = randint(a, b + 1, size = 1)

print('We can now pick the winner, Person #', rand[0], 'wins the laptop!')

We can now pick the winner, Person # 25 wins the laptop!


The graph of the discrete uniform distribution with  $a=1$  and  $b=50$  is the following:

![](media/chapter2/discreteuniform.png)

To assign the discrete uniform distribution to a random variable `X` with the PyMC3 package, use `X = pymc3.DiscreteUniform('X', a, b)`.
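Since all $b-a+1$ outcomes are equally likely, each one has probability $1/(b-a+1)$. The sketch below makes this explicit with exact fractions; the helper name `discrete_uniform_pmf` is ours and is not a library function.

```python
from fractions import Fraction

def discrete_uniform_pmf(k, a, b):
    """P(X = k) when X is equally likely to be any integer from a to b inclusive."""
    if a <= k <= b:
        return Fraction(1, b - a + 1)
    return Fraction(0)

a, b = 1, 50  # the laptop giveaway from Example 2.6

print(discrete_uniform_pmf(25, a, b))  # 1/50
print(discrete_uniform_pmf(51, a, b))  # 0, since nobody has label 51
print(sum(discrete_uniform_pmf(k, a, b) for k in range(a, b + 1)))  # 1
```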

Notice how the discrete uniform distribution changes as its parameters are altered.





In [33]:
# @hidden_cell

%matplotlib inline
import scipy as sp
from ipywidgets import interactive
import matplotlib.pyplot as plt
import numpy as np

def f(a = 1, b = 50):
    if (a>=b):
        print("error: a should be less than b")
    else:
        x = np.arange(a, b + 1)
        pmf = [1 / (b - a + 1)] * len(x)
        fig, ax = plt.subplots(1, 1)
        ax.plot(x, pmf, 'bo')
        ax.vlines(x, 0, pmf, colors = 'k' )
        plt.title('Discrete Uniform')
        #plt.ylim(0, 1)
        plt.show()
    

interactive_plot = interactive(f, a=(-10, 50), b=(-10, 50))
output = interactive_plot.children[-1]
output.layout.height = '350px'
interactive_plot


## Quiz 2.6


---

1.   Which pairs of parameters, $a$ and $b$, do not exist for the discrete uniform distribution?

        a) $a=10$, $b=15$
        
        b) $a=-1$, $b=0$
        
        c) $a=30$, $b=20$
        
        d) $a=10$, $b=12$
        


2.   How many possible outcomes does the uniform distribution with parameters $a=18$ and $b=28$ have?
        
        a) 9
        
        b) 10
        
        c) 11
        
        d) 12


### Poisson Distribution

Another very popular discrete probability distribution is the Poisson (pronounced [/ˈpwɑːsɒn/](https://www.youtube.com/watch?v=bTENbblyAKE)) distribution. For example, if we are interested in the number of text messages one receives during the day (Example 2.4), we can use the Poisson distribution to model this.


The <a id='poisson'>*Poisson distribution*</a> models the number of events occurring in a fixed period of time when the times at which events occur are independent. The distribution is described by one parameter:

*   $\lambda=$ expected number of events per interval ($\lambda>0$)

To choose the parameter for the experiment described in Example 2.4, we need to know what we are expecting. Recall that the person usually receives about $13$ messages daily. Based on this prior knowledge, we can expect this person to receive about $13$ messages on any day of interest. Thus, we set $\lambda=13$. The following code in Python shows how to generate an outcome for this random experiment.



In [14]:
from numpy.random import poisson

# parameter lam
lam = 13

# randomly generated value for the given example.
rand = poisson(lam, size = 1)

print(rand[0], 'messages were received.')

8 messages were received.


The graph of the Poisson distribution with  $\lambda=13$ is the following:

![](media/chapter2/poisson.png)

To assign the Poisson distribution to a random variable `X` with the PyMC3 package, use `X = pymc3.Poisson('X', lam)`.
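For reference, the standard probability mass function of the Poisson distribution is $P(X=k)=\dfrac{\lambda^k e^{-\lambda}}{k!}$ for $k=0,1,2,\dots$. The chapter does not derive this formula, so the sketch below is a supplementary illustration using only the `math` module; the helper name `poisson_pmf` is ours.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with expected count lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 13  # about 13 text messages per day, as in Example 2.4

for k in (0, 8, 13, 20):
    print(k, round(poisson_pmf(k, lam), 4))

# The range is infinite, but the first 60 values already carry
# almost all of the probability.
print(round(sum(poisson_pmf(k, lam) for k in range(60)), 6))
```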

Notice how the Poisson distribution changes as the parameter $\lambda$ changes.

In [8]:
# @hidden_cell

%matplotlib inline
import scipy as sp
from scipy.stats import poisson
from ipywidgets import interactive
import matplotlib.pyplot as plt
import numpy as np

def f(lam = 13):
    fig, ax = plt.subplots(1, 1)
    x = np.arange(0,80)
    ax.plot(x, poisson.pmf(x, lam), 'bo')
    ax.vlines(x, 0, poisson.pmf(x, lam), colors='k')
    plt.xlim(-5,80)
    plt.title('Poisson')
    plt.show()

interactive_plot = interactive(f, lam=(0.1, 50, 0.1))
output = interactive_plot.children[-1]
output.layout.height = '350px'
interactive_plot


## Quiz 2.7


---

1.   Compare $P(X=20)$ and $P(Y=50)$ if $X$ is generated by the Poisson distribution with parameter $\lambda=20$ and $Y$ is generated by the Poisson distribution with parameter $\lambda=50$. Which of the following is true?

        a) $P(X=20)=P(Y=50)$
        
        b) $P(X=20)<P(Y=50)$
        
        c) $P(X=20)>P(Y=50)$


2.   Let $Z$ be generated by the Poisson distribution with parameter $\lambda=7$. How many different outcomes $k$ satisfy $P(Z=k)>0.08$?
        
        a) $0$
  
        b) $5$
        
        c) $8$
        
        d) infinitely many

## Continuous Distributions
---
A random variable (or a distribution) is *continuous* if the range of the random variable is a continuum, so that it contains uncountably many values. The probability function for a continuous distribution is called a <a id='pdf'>*probability density function*</a> or PDF. Unlike the probability mass function, the probability density function doesn't give the probability of a specific outcome (the probability of the event $X=k$, where $X$ is a random variable and $k$ is a value from the set $R(X)$). Since there are uncountably many possible outcomes, the probability of any single outcome equals zero.

Imagine water drops falling on a rectangular table. The surface of the table consists of uncountably many points. What is the probability that a drop will fall exactly in the middle of the table? Many drops may fall very close to the middle of the table, but the probability that a drop will fall exactly in the middle, or at any other specific point on the table, is zero. Still, it is possible to identify areas where more drops fall than elsewhere, for instance when the drops come from a sprinkler located on one side of the table. When it is raining, the drops may instead cover the table uniformly. The probability density function describes how densely the outcomes are distributed.

<center><img src="media/chapter2/table.png" width="700"></center>

How can one measure density? Let's suppose one side of the table is $5$ ft long, and keep track of where the drops fall only with respect to this side of the table. We want to see how the drops are distributed after the rain has been falling for a while. Let's put $5$ containers on the table, each $1$ ft wide, so that each drop falls into one of them. After a while we count how many drops fell into each container and obtain a histogram with $5$ bars. Now, let's instead consider $10$ containers each $6$ inches wide, then $20$ containers each $3$ inches wide, and so on. As we increase the number of containers and decrease their width, the density in each container (the share of drops it collects, divided by its width) gets closer to the density at a specific point. In other words, the histogram gets closer and closer to a probability density function.
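The container experiment can be simulated. The sketch below assumes the rain falls evenly, so drop positions along the $5$ ft side are drawn uniformly at random; under that assumption the true density is $1/5 = 0.2$ per foot everywhere. The function `density_estimate` and the seed are our own illustrative choices.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable
length_ft = 5.0
# Assumption: rain falls evenly, so positions are uniform along the 5 ft side.
drops = [random.uniform(0, length_ft) for _ in range(100_000)]

def density_estimate(drops, n_bins, length):
    """Share of drops in each container divided by the container width."""
    width = length / n_bins
    counts = [0] * n_bins
    for x in drops:
        # min(...) keeps a drop at exactly x == length in the last container
        counts[min(int(x / width), n_bins - 1)] += 1
    return [c / (len(drops) * width) for c in counts]

for n_bins in (5, 10, 20):
    est = density_estimate(drops, n_bins, length_ft)
    print(n_bins, "containers:", [round(d, 3) for d in est])
```

Dividing by the container width is what keeps the bars comparable as the containers shrink: the bar heights stay near $0.2$ instead of dropping toward zero.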

Now let's look at two common continuous distributions: the continuous uniform and the Gaussian distributions.

### Continuous Uniform Distribution
---
***Example 2.7.*** Suppose you are waiting for a pizza to arrive. You were told to expect the pizza delivery anytime between 2 and 15 minutes from now. When will your pizza arrive? The distribution that describes such an experiment is called the *continuous uniform distribution*.

The <a id='cud'>*continuous uniform distribution*</a> (sometimes referred to as a rectangular distribution) describes a distribution where each point from the interval $[a,b]$ is equally likely to be chosen.

Similar to the discrete uniform distribution, the continuous uniform distribution has two parameters, $a$ and $b$, which can be any real numbers between $-\infty$ and $\infty$ as long as $a<b$:

*   $a$ = minimum value
*   $b$ = maximum value

In terms of our example, because your pizza will come any time between 2 and 15 minutes from now, the parameters should be  $a=2$ and $b=15$. The following code in Python shows how to generate an outcome for this random experiment.

In [16]:
from numpy.random import uniform

# parameters a and b
a = 2
b = 15

# randomly generated value for the given example
rand = uniform(a, b, size = 1)

print('The pizza will arrive in', rand[0], 'minutes.')


The pizza will arrive in 9.806036398609734 minutes.


The graph of the continuous uniform distribution with  $a=2$  and  $b=15$  is the following:

![](media/chapter2/continuousuniform.png)

To assign the continuous uniform distribution to a random variable `X` with the PyMC3 package, use `X = pymc3.Uniform('X', a, b)`.
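For the continuous uniform distribution, the density is the constant $1/(b-a)$ on $[a,b]$, and the probability of landing in a sub-interval is the area under that flat line, i.e. the sub-interval's length divided by $b-a$. The sketch below illustrates this for the pizza example; the helper names `uniform_pdf` and `uniform_prob` are ours.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d, a, b):
    """P(c <= X <= d): the area under the flat density between c and d."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0) / (b - a)

a, b = 2, 15  # the pizza arrives between 2 and 15 minutes from now

print(uniform_pdf(9, a, b))      # 1/13, the same at every point in [2, 15]
print(uniform_prob(2, 5, a, b))  # arrival within the first 5 minutes: 3/13
print(uniform_prob(9, 9, a, b))  # a single instant has probability 0
```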

Notice how the graph of the continuous uniform distribution changes as the parameters change.

In [1]:
# @hidden_cell

%matplotlib inline
from ipywidgets import interactive
import matplotlib.pyplot as plt
import numpy as np

def f(a = 2, b = 15):
    if (a>=b):
        print("error: a should be less than b")
    else:
        x = np.linspace(a, b, 100)  # evenly spaced points covering exactly [a, b]
        pdf = [1/(b-a)]*len(x)
        fig4, ax = plt.subplots()
        plt.plot(x,pdf)
        plt.title('Continuous Uniform Distribution')
        plt.show()

interactive_plot = interactive(f, a = (-10, 30, 0.1), b = (-10, 30, 0.1))
output = interactive_plot.children[-1]
output.layout.height = '350px'
interactive_plot


## Quiz 2.8


---

1.   Which pairs of parameters $a$ and $b$ do not exist for the continuous uniform distribution?

        a) $a=2$, $b=10$
        
        b) $a=3$, $b=4$
        
        c) $a=3.5$, $b=4$
        
        d) $a=5.2$, $b=4.6$
        


2.   What does the density function of the continuous uniform distribution with parameters $a=5$ and $b=5.5$ evaluated at $x=5.1$ equal?
        
        a) $0.5$
        
        b) $1$
        
        c) $2$
        
        d) $20$

### Gaussian (normal) Distribution
---



***Example 2.8.*** You go to a <a href = "https://en.wikipedia.org/wiki/Poke_(Native_Hawaiian_dish)"> poke</a> shop regularly and always ask for 8 oz. of poke. After you buy the poke, you measure its weight with a very precise experimental lab scale. Experimental lab scales are much more precise and accurate than commercial-grade scales. Therefore, it shouldn't be a surprise if the actual weight of your poke bowl is slightly above or below the 8 oz. mark. 

Out of curiosity, you decide to keep track of the difference between the actual weight and $8$ oz. For example, if the actual weight is 8.15 oz., you record the difference $8.15-8=0.15$; if the actual weight is 7.9 oz., you record $7.9-8=-0.1$. Since you asked for $8$ oz., the difference should be close to $0$ each time you buy poke. Also, we can assume that the weight will most likely be no more than $0.5$ oz. off from what you asked for. The distribution that describes such an experiment is called the *Gaussian (or normal) distribution*. 

The <a id='gaussian'>*Gaussian distribution*</a> describes a distribution that generates data that varies or fluctuates around an average value in an expected way. The Gaussian distribution has two distribution parameters:

*   $\mu$ = mean (or average), the value around which outcomes are centered. The mean is a location parameter.
*   $\sigma$ = standard deviation, describes how far from the mean outcomes tend to be; approximately 95% of all outcomes fall between $\mu-2\sigma$ and $\mu+2\sigma$. The standard deviation is a scale parameter.

For the normal distribution, approximately $68$% of the data will fall within one standard deviation ($\sigma$) from the mean ($\mu$), $95$% will fall within two standard deviations ($2\sigma$), and $99.7$% of the data will fall within three standard deviations ($3\sigma$) from the mean. 

![reference](media/chapter2/Empirical_Rule.PNG)
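The three percentages can be recovered from the cumulative distribution function. The sketch below uses `scipy.stats.norm` with its default standard normal parameters ($\mu=0$, $\sigma=1$); the same proportions hold for any other $\mu$ and $\sigma$. The dictionary name `coverage` is only an illustrative choice.

```python
from scipy.stats import norm

coverage = {}
for k in (1, 2, 3):
    # P(mu - k*sigma <= X <= mu + k*sigma) for the standard normal distribution
    coverage[k] = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} standard deviation(s): {coverage[k]:.4f}")
```

The printed values are approximately 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.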

In the poke example, we assume the recorded values will be centered somewhere around $0$, meaning we set $\mu=0$. We also decided that most likely the difference in weight will be no more than 0.5 oz. Let's also assume that in our case 'most likely' means in 95% of cases. That is, 95% of the differences in weights will be between $-0.5$ and $0.5$. Therefore, $2\sigma=0.5$, and $\sigma=0.25$. The following code in Python shows how to generate an outcome for this random experiment.


In [18]:
from numpy.random import normal

# define the standard deviation (sigma) and mean
sigma = 0.25
mean = 0

# randomly generated value for the given example
rand = normal(mean, sigma, size = 1)

print('The weight of poke differs from asked value by', rand[0], 'oz.')

The weight of poke differs from asked value by 0.4837818343958355 oz.
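A single draw says little about whether $\sigma=0.25$ matches the assumption that about 95% of differences lie between $-0.5$ and $0.5$. As a rough check, the sketch below draws many values and measures the share that falls in that range. The sample size and the seed are arbitrary choices, and `numpy.random.default_rng` is used here only to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

mean, sigma = 0, 0.25

# simulate the recorded weight difference for many hypothetical purchases
differences = rng.normal(mean, sigma, size=100_000)

# share of simulated differences within 0.5 oz. of the requested weight
share_within = np.mean(np.abs(differences) <= 0.5)

print('Share of differences within 0.5 oz.:', share_within)
```

The share should come out close to 0.95, with small variations from run to run if the seed is changed.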


The graph of the Gaussian distribution with  $\mu=0$  and  $\sigma=0.25$  is the following:

![](media/chapter2/gaussian.png)

To assign the Gaussian distribution to a random variable `X` with the PyMC3 package, use `X = pymc3.Normal('X', mu, sigma)`, where `mu` is the mean and `sigma` is the standard deviation.

The shape of the distribution depends a lot on the parameters. Try to change the mean and standard deviation and notice how it affects the graph.

In [10]:
# @hidden_cell

%matplotlib inline
import scipy as sp
from scipy.stats import norm
from ipywidgets import interactive
import matplotlib.pyplot as plt
import numpy as np

def f(mean = 0, sigma = 0.25):
    x = np.arange(-15,15,.001)
    pdf = norm.pdf(x, mean, sigma)
    fig5 = plt.figure()
    plt.plot(x, pdf)
    plt.title('Gaussian')
    plt.show()


interactive_plot = interactive(f, mean=(-10, 10, 0.1), sigma=(0.01, 10, 0.01))
output = interactive_plot.children[-1]
output.layout.height = '350px'
interactive_plot


## Quiz 2.9


---

1.   Let $X$ be generated by the Gaussian distribution with parameters $\mu=0$ and $\sigma=2.5$. If $\sigma$ is increased to $6$, how will $P(X=0)$ change?

        a) $P(X=0)$ will not change
        
        b) the value will increase
        
        c) the value will decrease
        


2.   Which of the following is an alternative name for the Gaussian distribution?
        
        a) uniform
        
        b) normal
        
        c) Poisson

%%%hidden part for those who want to learn more

## Data Types and Classification

In addition to distributions, data can also be classified as discrete or continuous. 

<u>Examples of discrete data types</u>

*   *ordinal data*: categorical data that has ordering (ex: letter grade on an exam)
*   *count data*: number of observations (ex: number of people in a class)
*   *nominal data*: categorical data that does not have ordering (ex: hair or eye color)


<u>Examples of continuous data types</u>

*   *ratio scale data*: zero point is fixed (ex: temperature in Kelvin)
*   *interval scale data*: zero point is not fixed (ex: temperature in Fahrenheit or Celsius)


%%% end of hidden part

## How does this all relate to probabilistic programming?

You are likely wondering right now why we've dedicated an entire chapter of this module to probability. It may not be obvious now, but these concepts are all used in probabilistic programming. 

For instance, how do the PMF and PDF relate back to probabilistic programming? The PMF (or PDF, depending on the situation) is used in probabilistic programming when sampling the data (the sampled data are proportional to the prior distribution; this will be explained in greater detail in the next chapter). Also, knowledge of common distributions such as the binomial or Gaussian distributions allows for a more informed decision when constructing a generative model.
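As a reminder of the difference between the two functions, the following sketch evaluates one of each with `scipy.stats`. The binomial uses $n=8$ to mirror the 8 people in Example 2.2, while $p=0.5$ is a hypothetical value chosen only for illustration; the Gaussian uses the poke parameters $\mu=0$ and $\sigma=0.25$.

```python
from scipy.stats import binom, norm

# Discrete case: PMF values are probabilities, so they sum to 1 over the range
n, p = 8, 0.5  # p is a hypothetical success probability
pmf_values = [binom.pmf(k, n, p) for k in range(n + 1)]
total_probability = sum(pmf_values)

# Continuous case: PDF values are densities, not probabilities,
# so a single value may be larger than 1
density_at_mean = norm.pdf(0, loc=0, scale=0.25)

print(total_probability, density_at_mean)
```

A density above 1 is not a contradiction: for a continuous distribution, probabilities come from areas under the PDF, not from its height at a point.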

**Example 2.9.** You are given a dataset that describes the number of Hawaiʻi Police Department (HPD) traffic incidents per day between July 15 and October 15 of 2018 (92 days total). Previous studies have shown that there is a sudden increase in the number of incidents during the given period of time. A research group has determined that the average number of incidents before the increase is 186 and the average number of incidents after the sudden jump is 224. Your job is to determine on what day the increase happened.

![alt text](media/chapter2/binomial_HPD.png)

What distribution describes the number of incidents that happen during a day? The Poisson distribution describes this situation best. Recall that it has a parameter $\lambda$, the expected number of incidents occurring during the day. If you expect traffic behavior to be the same during these 92 days, then the number of traffic incidents occurring each day is generated by the Poisson distribution with the same parameter $\lambda$. 

From the information received, however, we know about a sudden jump. This means the generative model has to be adjusted accordingly. 

We assume that there is a rapid switch in behavior on day $D$. The number of incidents on each day before day $D$ is generated according to the Poisson distribution with parameter $\lambda_1$, and from day $D$ onward according to the Poisson distribution with parameter $\lambda_2$. Based on the observed data, we want to determine when the switch in behavior happened.

Note that we also know the parameters $\lambda_1$ and $\lambda_2$ from the research group. Therefore, $\lambda_1=186$ and $\lambda_2=224$.
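Before turning to PyMC3, it can help to run this generative story forward with plain NumPy. The sketch below fixes the switch day at `D = 34`, a hypothetical value chosen only for illustration (the real switch day is what we want to infer), and simulates 92 days of incident counts. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility only

n_days = 92
lam_1, lam_2 = 186, 224
D = 34  # hypothetical switch day, not estimated from the data

# days 1 .. D-1 follow Poisson(lam_1); days D .. 92 follow Poisson(lam_2)
before = rng.poisson(lam_1, size=D - 1)
after = rng.poisson(lam_2, size=n_days - D + 1)
simulated = np.concatenate([before, after])

print('average before the switch:', before.mean())
print('average after the switch: ', after.mean())
```

The two averages should land near 186 and 224. Inference works in the opposite direction: given the counts, it asks which value of `D` makes them most plausible.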

Let's build a model that describes the above behavior in PyMC3.

First, we import the PyMC3 package, together with NumPy, which we need for array handling:


```
import pymc3 as pm
import numpy as np
```

Then we need to create the model within the context of a Model object. We start by creating the variable `switch` that represents the day on which the change in behavior happened. Since it should happen somewhere between Day 1 (July 15) and Day 92 (October 15), we use the discrete uniform distribution with parameters $a=1$ and $b=92$.

```
with pm.Model() as HPD_model:
  switch=pm.DiscreteUniform("switch", 1, 92)
```

We also need to add information about  $\lambda_1$ and $\lambda_2$ to our model.

```
with HPD_model:
  lam_1 = 186
  lam_2 = 224
```
Our model has to generate the data: a number of incidents for each of the 92 days, where the number of incidents on a given day is generated by the Poisson distribution with parameter `lam_1` or `lam_2`. Therefore, we want to generate an array of 92 elements, each with a Poisson distribution whose parameter equals either `lam_1` or `lam_2`, depending on the `switch` variable. So first of all, we need to create an array `lam` that keeps track of which parameter is used for each day.

```
with HPD_model:
  day = np.arange(1, 92 + 1)  # days 1 through 92 (the upper bound of arange is exclusive)
  lam = pm.math.switch(day < switch, lam_1, lam_2)

```
`pm.math.switch(CONDITION, A, B)` returns `A` where `CONDITION` is true and `B` otherwise. Since `day` is an array, `day < switch` is an array of 1's and 0's. For each element of `day < switch`, `pm.math.switch()` returns either `lam_1` or `lam_2`, which makes `lam` an array as well.
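The element-wise selection can be previewed with `np.where`, which behaves analogously on ordinary NumPy arrays. This is only an analogy: it is not PyMC3 code. In the model `switch` is a random variable, whereas here it is fixed at a hypothetical value of 4 and the period is shortened to 7 days.

```python
import numpy as np

lam_1, lam_2 = 186, 224

day = np.arange(1, 7 + 1)  # shortened 7-day period, for illustration only
switch = 4                 # hypothetical fixed value of the switch day

# True for days 1-3, False for days 4-7
condition = day < switch

# pick lam_1 where the condition holds and lam_2 elsewhere
lam = np.where(condition, lam_1, lam_2)

print(condition)
print(lam)
```

Days 1 to 3 receive the rate 186 and days 4 to 7 receive 224, so the result has one rate per day.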

The last step in creating the model is to generate the number of incidents for each day and to include the observed data stored in the variable `HPD_data`:

```
with HPD_model:
  observation = pm.Poisson("obs", lam, observed = HPD_data)
```

In the next chapter, we will explain specific concepts of probabilistic programming and demonstrate how to finish this example in PyMC3. 

In [None]:
import pymc3 as pm
import numpy as np

HPD_data = [183, 181, 155, 177, 193, 174, 187, 181, 178, 196, 187, 171,
 181, 194, 178, 201, 183, 192, 212, 159, 179, 195, 190, 206, 171, 195,
 177, 184, 202, 175, 173, 195, 207, 216, 225, 246, 197, 250, 217, 250,
 229, 230, 216, 239, 222, 224, 231, 225, 204, 227, 237, 225, 211, 209,
 235, 241, 209, 196, 246, 215, 241, 238, 233, 235, 202, 214, 225, 230, 
 187, 220, 234, 206, 228, 234, 242, 189, 250, 233, 225, 246, 224, 211,
 201, 217, 219, 224, 225, 251, 222, 220, 222, 202]

with pm.Model() as HPD_model:
    switch = pm.DiscreteUniform("switch", 1, 92)
    lam_1 = 186
    lam_2 = 224
  
    day = np.arange(1, 92+1)
  
    lam = pm.math.switch(day < switch, lam_1, lam_2)
  
    observation = pm.Poisson("obs", lam, observed = HPD_data)

## Quiz 2.10


---

1.  Determine $a$ and $b$ for which `pm.math.switch(a < b, 1, 0)` returns 1.

       a) $a=10$, $b=7$
       
       b) $a=0$, $b=1$
       
       c) $a=-10$, $b=-10$
       
       d) $a=101$, $b=100$
        
        

2.   `pm.math.switch([20, 10, 3] < 5, 100, 200) =`
        
        a) 200
        
        b) False
        
        c) [200, 200, 100]
        
        d) [False, False, True]
        
        e) There is no correct answer

### <u>Definitions</u>


<a href='#probability'>**probability (p)**</a>: a function that assigns a value between $0$ and $1$ denoted by  $P(A)$ to each event $A$ of the sample space $S$.

<a href='#sample space'>**sample space (S)**</a>: a set of all possible outcomes of an experiment.

<a href='#event'>**event**</a>: a subset of the sample space.

<a href='#random variable'>**random variable (X)**</a>: the transition from a textual description of an experiment to a numerical encoding.

<a href='#range'>**range of a random variable (R(X))**</a>: the set of all values a random variable can take on.

<a href='#distribution'>**probability distribution**</a>:  an assignment of a probability to each value in the range of a random variable, where the probabilities sum to one.

<a href='#binomial'>**binomial distribution**</a>: a distribution of the number of successes in a fixed number of trials, each with a dichotomous outcome; it is described by two parameters ($n$ = number of trials, $p$ = probability of success, expressed as a number between $0$ and $1$)

<a href='#distpar'>**distribution parameter**</a>: a statistical parameter that describes the shape and location of a distribution.

<a href='#discrete'>**discrete distribution**</a>: a distribution where the range of a random variable is countable.

<a href='#continuous'>**continuous distribution**</a>: a distribution where the range of a random variable is not countable (i.e. the values lie on a continuum and there are uncountably many possible values).

<a href='#pmf'>**probability mass function (PMF)**</a>: the probability function that describes a discrete distribution. For each value $k$ of a random variable $X$, the probability is described by $P({X=k})$ for an event $\{X=k\}$.

<a href='#dud'>**discrete uniform distribution**</a>:  a distribution with a finite number of equally spaced and equally likely outcomes and is described by two parameters ($a =$ minimum value, $b =$ maximum value).

<a href='#poisson'>**Poisson distribution**</a>: a distribution that describes the number of events occurring in a fixed period of time. The times at which events occur are independent, and the distribution is described by one parameter ($\lambda=$ expected number of occurrences).

<a href='#pdf'>**probability density function (PDF)**</a>: the function that describes a continuous distribution. Unlike with the PMF, the probability of any single outcome is equal to zero.

<a href='#cud'>**continuous uniform (rectangular) distribution**</a>: a distribution where each point from the interval $[a,b]$ is equally likely to be chosen. It is described by two parameters ($a =$ minimum value, $b =$ maximum value).

<a href='#gaussian'>**Gaussian (normal) distribution**</a>: a distribution that generates data that varies or fluctuates around an average value in an expected way (where approximately $68$% of the data will fall within one standard deviation ($\sigma$) from the mean ($\mu$), $95$% will fall within two standard deviations ($2\sigma$), and $99.7$% of the data will fall within three standard deviations ($3\sigma$) from the mean). The Gaussian distribution is described by two parameters: $\mu$ = mean (or average), a location parameter that refers to the value around which outcomes are centered, and $\sigma$ = standard deviation, a scale parameter that describes how far from the mean outcomes tend to be.






