# Probability Function {#sec-probability-function}

## Overview

In this section we review the probability function $P$ and state some of its properties.
This function assigns a real number $P(E)$ to each event $E$ and has
to satisfy three axioms [1]. There are many interpretations of $P(E)$; the two most common are frequencies and degrees of belief [1]. The frequency interpretation
takes $P(E)$ to be the long-run proportion of times the event $E$ occurs when the experiment is repeated. The degree-of-belief interpretation
takes $P(E)$ to be an observer's strength of belief that the event $E$ is true [1]. The difference between the two will become apparent when we look into
statistical inference.
 

## Probability function

Let's start this section with a definition.

----
**Definition 1:**

A function $P$ that assigns a real number $P(E)$ to an event $E$ is a probability measure if it satisfies the following axioms [1]:

$$P(E) \geq 0, \forall E$$
$$P(\Omega) = 1$$
If $E_i$ are disjoint then

$$P( \bigcup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty}P(E_i)$$

----


The last property is called sigma-additivity [3], or the addition law for probabilities [6]. It holds only for pairwise disjoint events, i.e.
$E_i \cap E_j = \emptyset$ for $i \neq j$. If the $E_i$ are not mutually disjoint, we need to account for their intersections. For example, for two events:

$$P(E_i \cup E_j) = P(E_i) + P(E_j) - P(E_i \cap E_j)$$
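
The two-event formula can be checked numerically. The sketch below is only an illustration: it assumes a fair six-sided die with equally likely outcomes, and the two overlapping events `even` and `greater_than_3` are hypothetical choices, not part of the original text.

```python
from fractions import Fraction

# Sample space for one roll of a die; equally likely outcomes are assumed.
omega = {1, 2, 3, 4, 5, 6}


def prob(event: set) -> Fraction:
    """Probability of an event under the equally-likely assumption."""
    return Fraction(len(event), len(omega))


even = {2, 4, 6}            # hypothetical event A
greater_than_3 = {4, 5, 6}  # hypothetical event B; overlaps A in {4, 6}

# Left- and right-hand side of P(A or B) = P(A) + P(B) - P(A and B)
lhs = prob(even | greater_than_3)
rhs = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

print(lhs, rhs)  # 2/3 2/3
```

Simply adding `prob(even)` and `prob(greater_than_3)` would give 1, because the shared outcomes 4 and 6 would be counted twice.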

As noted in the overview, the two most common interpretations of $P(E)$ are frequencies and degrees of belief [1].
The difference between them will become apparent when we look into statistical inference.
Note, however, that the axioms given above must hold under either interpretation.

For discrete random variables, see [chapter @sec-random-variables], the probabilities assigned by $P$ are described by the probability mass function or PMF. For continuous
random variables they are described by the probability density function or PDF.

Let's now look at how we can calculate the probability of events on finite sample spaces.

### Calculate probability on finite sample spaces

Let's assume that the sample space $\Omega$ we are working with is finite. A finite sample space has a finite
number of elements and is therefore always countable, i.e. its members can be labelled
by positive integers [5]. For example, when we toss a coin, the sample space $\Omega$
has only two elements, heads and tails:

\begin{equation}
\Omega=\{H, T\} 
\end{equation}

If we toss the coin twice, then $\Omega$ becomes:

\begin{equation}
\Omega=\{HH, TT, HT, TH\} 
\end{equation}

Similarly, when we toss a die the sample space is:

\begin{equation}
\Omega=\{1,2,3,4,5,6\} 
\end{equation}

For such sample spaces, if we assume that each outcome is equally likely, then the probability
of an event $E$ is given by [1]:

\begin{equation}
P(E)=\frac{|E|}{|\Omega|} 
\end{equation}

Thus for a fair coin, the probability of getting $H$ and the probability of getting $T$ are both 0.5:

\begin{equation}
P(H)=P(T) = \frac{1}{|\Omega|} = \frac{1}{2}
\end{equation}

Similarly, when tossing a fair die, the probability of getting any single outcome in the sample space is 1/6.
Let's see how we can use Python to compute the probabilities above.

**Example** Let's say we flip a fair coin 5 times. What is the probability that we see at least one head?
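
One way to answer this question is to enumerate the sample space and count. The sketch below assumes equally likely outcomes and uses `itertools.product` to build all sequences of five flips; the variable names are illustrative only.

```python
from itertools import product

# All 2**5 = 32 sequences of five flips; equally likely for a fair coin.
five_flips = set(product("HT", repeat=5))

# The event "at least one head" contains every sequence with an 'H' in it.
at_least_one_head = {outcome for outcome in five_flips if "H" in outcome}

p_at_least_one_head = len(at_least_one_head) / len(five_flips)
print(p_at_least_one_head)  # 0.96875

# Cross-check through the complement: only TTTTT contains no head.
p_no_head = 1 / len(five_flips)
print(1 - p_no_head)  # 0.96875
```

The complement route is often quicker by hand: the only outcome without a head is five tails in a row, so the answer is $1 - 1/32 = 31/32$.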

### Example 1: Coin toss  

In [4]:
coin_toss_omega = {'H', 'T'}

In [5]:
def h_or_t(outcome: str) -> bool:
    return outcome == 'H' or outcome == 'T' 

def h(outcome: str) -> bool:
    return outcome == 'H'

def t(outcome: str) -> bool:
    return outcome == 'T'

Let's write a function that collects the outcomes belonging to the event we are interested in:

In [6]:
def get_events(event_condition, sample_space: set):
    return set([outcome for outcome in sample_space
                if event_condition(outcome)])

Now we can easily compute the probabilities we are interested in:

In [7]:
number_of_heads = len(get_events(h, coin_toss_omega))

# this should have size 1
assert number_of_heads == 1

# the probability of getting heads is
print(f"Probability of heads={number_of_heads / len(coin_toss_omega)}")

# similarly for tails
number_of_tails = len(get_events(t, coin_toss_omega))

# this should have size 1
assert number_of_tails == 1

# the probability of getting tails is
print(f"Probability of tails={number_of_tails / len(coin_toss_omega)}")

Probability of heads=0.5
Probability of tails=0.5


### Example 2: Tossing a coin twice  

In [8]:
twice_coin_toss_omega = {'HH', 'TT', 'HT', 'TH'}

In [9]:
def at_least_one_h(outcome: str) -> bool:
    return outcome == 'HH' or outcome == 'HT' or outcome == 'TH'

number_of_heads_at_least_once = len(get_events(at_least_one_h, twice_coin_toss_omega))

# this should have size 3
assert number_of_heads_at_least_once == 3


# the probability of getting heads at least once is
print(f"Probability of heads={number_of_heads_at_least_once / len(twice_coin_toss_omega)}")

Probability of heads=0.75


### Extreme cases

For some extreme cases it is fairly easy to calculate the probability. Two such cases are the probability of the whole sample space,
$P(\Omega)$, and the probability of the empty set, $P(\emptyset)$. These are respectively:

$$P(\Omega) = 1, P(\emptyset) = 0$$
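
The first value is simply the second axiom. The second follows from the axioms as well; a short derivation, sketched here using only sigma-additivity and non-negativity, goes as follows. Take $E_1 = \Omega$ and $E_i = \emptyset$ for $i \geq 2$. These sets are pairwise disjoint and their union is $\Omega$, so

$$1 = P(\Omega) = P(\Omega) + \sum_{i=2}^{\infty} P(\emptyset) = 1 + \sum_{i=2}^{\infty} P(\emptyset)$$

Since $P(\emptyset) \geq 0$, the infinite sum on the right can equal zero only if $P(\emptyset) = 0$.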

## Examples

Let's see some simple examples of computing probabilities. These examples are taken from [1, 2, 3, 4].
We first introduce some utility functions:

In [14]:
from collections import defaultdict
from typing import Union
from itertools import product

In [15]:
def get_events(event_condition, sample_space: set):
    return set([outcome for outcome in sample_space
                if event_condition(outcome)])

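def compute_probability(event_condition, sample_space: Union[set, dict]):
    # collect the outcomes for which the event condition holds
    event = get_events(event_condition, sample_space)

    # unweighted sample space: every outcome is equally likely
    if isinstance(sample_space, set):
        return len(event) / len(sample_space)

    # weighted sample space: the dict maps each outcome to its weight
    event_size = sum(sample_space[outcome]
                     for outcome in event)

    return event_size / sum(sample_space.values())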



### Example 3

Let's assume that a family has 5 children. What is the probability that
the family has exactly two boys? Writing out the sample space for this problem by hand is tedious, but we can use Python
to enumerate it and compute the probability. We will assume that a child is equally likely
to be a boy or a girl.

In [16]:
children_type = ['B', 'G']
omega = set( product(children_type, repeat=5) )
    
print(f"Size of sample space {len(omega)}")

print("\n")
# print the sample space
for outcome in omega:
    print(outcome)

Size of sample space 32


('G', 'G', 'G', 'B', 'G')
('G', 'B', 'G', 'G', 'B')
('B', 'G', 'G', 'G', 'B')
('B', 'G', 'B', 'B', 'G')
('G', 'G', 'G', 'G', 'B')
('G', 'G', 'B', 'B', 'G')
('G', 'B', 'B', 'G', 'G')
('G', 'B', 'B', 'B', 'G')
('G', 'B', 'G', 'B', 'B')
('B', 'B', 'B', 'G', 'G')
('B', 'B', 'G', 'B', 'B')
('B', 'G', 'B', 'G', 'B')
('B', 'G', 'G', 'B', 'B')
('B', 'B', 'G', 'G', 'G')
('G', 'G', 'B', 'G', 'B')
('G', 'G', 'G', 'B', 'B')
('B', 'B', 'B', 'B', 'G')
('B', 'G', 'G', 'G', 'G')
('B', 'G', 'B', 'B', 'B')
('G', 'B', 'G', 'G', 'G')
('G', 'G', 'G', 'G', 'G')
('G', 'G', 'B', 'B', 'B')
('G', 'G', 'B', 'G', 'G')
('G', 'B', 'B', 'G', 'B')
('G', 'B', 'B', 'B', 'B')
('G', 'B', 'G', 'B', 'G')
('B', 'B', 'B', 'G', 'B')
('B', 'B', 'G', 'B', 'G')
('B', 'G', 'B', 'G', 'G')
('B', 'G', 'G', 'B', 'G')
('B', 'B', 'B', 'B', 'B')
('B', 'B', 'G', 'G', 'B')


Similar to what we did in the previous section, we define the
following boolean function:

In [17]:
def two_boys(outcome: tuple):
    return len([c for c in outcome if c == 'B']) == 2


We will feed this function to the ```compute_probability``` function in order to calculate the required probability:

In [18]:
probability = compute_probability(two_boys, omega)
print(f"Probability of a family with 5 children to have exactly two boys is={probability}")

Probability of a family with 5 children to have exactly two boys is=0.3125


What is the probability of having at least three boys? 

In [21]:
def at_least_three_boys(outcome: tuple):
    return len([c for c in outcome if c == 'B']) >= 3

In [22]:
probability_at_least_3_boys = compute_probability(at_least_three_boys, omega)
print(f"Probability of a family with 5 children to have at least three boys is={probability_at_least_3_boys}")

Probability of a family with 5 children to have at least three boys is=0.5


In [24]:
events = get_events(at_least_three_boys, omega)
for outcome in events:  # print the outcomes that make up the event
    print(outcome)
print(f"Number of events that have at least three boys={len(events)}")

('G', 'B', 'B', 'B', 'B')
('G', 'G', 'B', 'B', 'B')
('B', 'B', 'G', 'G', 'B')
('B', 'B', 'B', 'G', 'G')
('B', 'B', 'B', 'B', 'B')
('B', 'B', 'G', 'B', 'G')
('B', 'B', 'B', 'B', 'G')
('B', 'G', 'B', 'B', 'G')
('B', 'G', 'G', 'B', 'B')
('G', 'B', 'B', 'B', 'G')
('G', 'B', 'B', 'G', 'B')
('B', 'G', 'B', 'G', 'B')
('B', 'B', 'G', 'B', 'B')
('B', 'B', 'B', 'G', 'B')
('G', 'B', 'G', 'B', 'B')
('B', 'G', 'B', 'B', 'B')
Number of events that have at least three boys=16
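
The counts obtained by enumeration above can be cross-checked with binomial coefficients: under the same equally-likely assumption, the number of outcomes with exactly $k$ boys among 5 children is $\binom{5}{k}$. The sketch below uses `math.comb` (available from Python 3.8); the variable names are illustrative.

```python
from math import comb

n_children = 5
n_outcomes = 2 ** n_children  # 32 equally likely outcomes

# Exactly two boys: choose which 2 of the 5 children are boys.
p_exactly_two = comb(n_children, 2) / n_outcomes
print(p_exactly_two)  # 0.3125

# At least three boys: add up the counts for 3, 4 and 5 boys.
n_at_least_three = sum(comb(n_children, k) for k in range(3, n_children + 1))
p_at_least_three = n_at_least_three / n_outcomes
print(n_at_least_three, p_at_least_three)  # 16 0.5
```

Both values agree with the results of the enumeration.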


Let's now turn our attention to how to compute probabilities
when dealing with intervals. This is useful because we often want
to evaluate whether the data is too extreme. In such cases we want to
evaluate whether the observed data is rather unusual to have occurred by chance [1]. For example, if we toss a fair coin ten times, what is the probability that we get more than eight heads?
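
The coin question can be answered with the same enumerate-and-count approach used so far. This is a minimal sketch assuming equally likely sequences of tosses; "more than eight heads" is read as nine or ten heads.

```python
from itertools import product

# All 2**10 = 1024 sequences of ten tosses of a fair coin.
ten_tosses = list(product("HT", repeat=10))

# Keep the sequences whose number of heads lies in the interval (8, 10].
more_than_eight_heads = [outcome for outcome in ten_tosses
                         if outcome.count("H") > 8]

p_more_than_eight = len(more_than_eight_heads) / len(ten_tosses)
print(len(more_than_eight_heads), p_more_than_eight)  # 11 0.0107421875
```

Only 11 of the 1024 sequences qualify (ten with exactly nine heads and one with ten heads), so observing such a result with a fair coin would be rather unusual.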

### Example 4: Biased sample space

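Let's expand on the above by considering a biased sample space, i.e. one whose outcomes are not equally likely. In particular,
let's assume that the possible outcomes are 5, 10 and 20, and that getting 5 is three times as likely as getting 10 and three times as likely as getting 20.
Let's model this using a dictionary that maps each outcome to its weight.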

In [27]:
weighted_omega = {5:3, 10:1, 20:1}

In [28]:
def is_5_or_10_or_20(outcome: int) -> bool:
    return outcome in weighted_omega

def not_5_and_not_10_and_not_20(outcome: int) -> bool:
    return not is_5_or_10_or_20(outcome)

def is_5(outcome: int) -> bool:
    return outcome == 5

def is_10(outcome: int) -> bool:
    return outcome == 10

def is_20(outcome: int) -> bool:
    return outcome == 20

In [30]:
five_or_ten_or_twenty_event = get_events(is_5_or_10_or_20, weighted_omega)
event_size = sum(weighted_omega[outcome] for outcome in five_or_ten_or_twenty_event)
assert event_size == 5

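The ```compute_probability``` function introduced earlier already accounts
for a weighted sample space: for a dictionary it sums the weights of the outcomes in the event and divides by the total weight. We repeat it here for convenience.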

In [31]:
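def compute_probability(event_condition, sample_space: Union[set, dict]):
    # collect the outcomes for which the event condition holds
    event = get_events(event_condition, sample_space)

    # unweighted sample space: every outcome is equally likely
    if isinstance(sample_space, set):
        return len(event) / len(sample_space)

    # weighted sample space: the dict maps each outcome to its weight
    event_size = sum(sample_space[outcome]
                     for outcome in event)

    return event_size / sum(sample_space.values())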

In [34]:
events = [is_5, is_10, is_20]
for event_condition in events:
    prob = compute_probability(event_condition, weighted_omega)
    name = event_condition.__name__
    print(f"Probability of event arising from '{name}' is {prob}")


Probability of event arising from 'is_5' is 0.6
Probability of event arising from 'is_10' is 0.2
Probability of event arising from 'is_20' is 0.2
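
As a sanity check, the weights can be normalised into probabilities and tested against the axioms. The sketch below is an illustration only: it rebuilds the weighted sample space, uses `fractions.Fraction` to avoid rounding, and the name `pmf` is a hypothetical helper, not part of the code above.

```python
from fractions import Fraction

weighted_omega = {5: 3, 10: 1, 20: 1}
total_weight = sum(weighted_omega.values())

# Normalise each weight so that the probabilities add up to one.
pmf = {outcome: Fraction(weight, total_weight)
       for outcome, weight in weighted_omega.items()}

# Axioms: every probability is non-negative and P(Omega) = 1.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

# The outcomes are disjoint, so P(10 or 20) = P(10) + P(20).
p_10_or_20 = pmf[10] + pmf[20]
print(pmf[5], p_10_or_20)  # 3/5 2/5
```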


## Summary

In this section we reviewed the probability function $P$ and some of its properties. The function $P$ assigns a real number $P(E)$
to each event $E$ in a sample space $\Omega$. It has to satisfy the following requirements:

- $P(E) \geq 0, \forall E$
- $P(\Omega) = 1$
- If $E_i$ are disjoint then

$$P( \bigcup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty}P(E_i)$$


In addition, we saw how to use the ```product``` function from Python's ```itertools```. This function takes as input
an iterable such as a list or a set and, for a given number of repeats, returns every tuple of that length whose elements are drawn from the iterable, i.e. the Cartesian product of the iterable with itself.
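
A minimal illustration of this behaviour, using the two-toss coin example from earlier in the section:

```python
from itertools import product

# Every tuple of length 2 with elements drawn from "HT".
two_tosses = list(product("HT", repeat=2))
print(two_tosses)
# [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# For n symbols and r repeats there are n**r tuples.
print(len(two_tosses))  # 4
```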

## References

1. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer, 2003.
2. Jose Unpingco, _Python for probability, statistics and machine learning_, Springer, 2016.
3. Michael Baron, _Probability and Statistics for Computer Scientists_, 2nd Edition, CRC Press.
4. Leonard Apeltsin, _Data Science Bookcamp_, Manning Publications, 2021.
5. B. Daya Reddy, _Introductory Functional Analysis with Applications to Boundary Value Problems and Finite Elements_, Springer, 1998.
6. Y.A. Rozanov, _Probability Theory: A Concise Course_, Dover Publications, 1969.