# Probability and Bayesian Theory

## Additive and Multiplicative Rules of Probability

Let $A$ and $B$ be two events, and let $P$ denote the probability that an event occurs.

Let's define the events as:
* $A$ - Rolling a 4 on a fair die
* $B$ - Rolling a 3 on a fair die

With these definitions, the occurrence of $A$ on a given roll prevents the occurrence of $B$ (or of any other face) on that same roll. Such events are called mutually exclusive events.

### Sample Space of an Event

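Strictly speaking, the sample space is the set of all possible outcomes of an experiment (for one roll of a die, $\{1,2,3,4,5,6\}$), and an event is the subset of outcomes that are favorable to it. Here we loosely refer to that subset as the sample space of the event. In the above example, the events $A$ and $B$ each have one outcome in their sample space.
In general, $A_0,A_1,A_2...A_n$ could be the possible outcomes in the sample space of $A$ and, similarly, $B_0,B_1,B_2...B_n$ the possible outcomes in the sample space of $B$.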

If we redefine $A$ and $B$ from the above example as:
* $A$ - Rolling an even number on a fair die
* $B$ - Rolling an odd number on a fair die

then the sample spaces of $A$ and $B$ have 3 favorable outcomes each.

### Additive Rule

The additive rule of probability states that, for two events $A$ and $B$, the probability that either $A$ or $B$ occurs equals the probability of $A$ plus the probability of $B$, minus the probability that both $A$ and $B$ occur:

$P(A\cup B)=P(A)+P(B)-P(A\cap B)$

If $A$ and $B$ are mutually exclusive events, i.e., there is no overlap in sample spaces of event $A$ and event $B$, then $P(A\cap B)=0$. So,

$P(A\cup B)=P(A)+P(B)$

For a set of mutually exclusive events $A, B, C, D, \dots$, the additive rule becomes:

$P(A\cup B \cup C \cup D \cup \dots)=P(A)+P(B)+P(C)+P(D)+\dots$
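The additive rule can be checked by direct enumeration on a fair die. The following is a minimal sketch; the helper `prob` and the third event `greater_than_3` are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction

# All equally likely outcomes of one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_3 = {4, 5, 6}  # hypothetical extra event that overlaps with `even`

# Mutually exclusive events: the intersection is empty, so the probabilities simply add.
assert prob(even | odd) == prob(even) + prob(odd)

# Overlapping events: subtract the intersection to avoid counting it twice.
lhs = prob(even | greater_than_3)
rhs = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)
print(lhs, rhs)  # 2/3 2/3
```

Exact fractions are used here so that both sides of the rule can be compared with `==` without floating-point rounding.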

### Multiplicative Rule

Consider the example of drawing a ball from a bucket consisting of many different colored balls. For simplicity, consider that there are only two different colored balls - 6 Green balls and 4 Red balls, 10 balls in total. Now draws can be made in two different ways:

* with replacement - The ball drawn in each draw is put back into the bucket, keeping the total number of balls and the mixture of colors constant from draw to draw.

* without replacement - A drawn ball is retained and is not replaced or substituted in any way. This changes the total number of balls and the mixture of colors with each draw. Note that without replacement or substitution, the maximum number of single-ball draws equals the total number of balls in the bucket.

<img src="../images/bib.png", style="height:60vh;">

When balls are drawn without replacement, each draw affects the probabilities of subsequent draws; for example, the outcome of the second draw depends on the outcome of the first. If instead balls are drawn with replacement, that is, every drawn ball is put back into the bucket or replaced with an identical ball, then every draw is independent of the others.
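To make the difference concrete, the sketch below computes the probability that the second ball is green, given that the first ball drawn was green, for the bucket of 6 green and 4 red balls. The variable names are assumptions chosen for illustration.

```python
from fractions import Fraction

green, red = 6, 4
total = green + red

# Probability that the first ball drawn is green.
p_first_green = Fraction(green, total)

# With replacement: the bucket is restored, so the second draw is unaffected.
p_second_green_with = Fraction(green, total)

# Without replacement: one green ball is gone, so both counts shrink by one.
p_second_green_without = Fraction(green - 1, total - 1)

print(p_first_green, p_second_green_with, p_second_green_without)  # 3/5 3/5 5/9
```

With replacement the conditional probability stays at 3/5, while without replacement it drops to 5/9, which is what dependence between draws looks like numerically.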

So, when the probability of occurrence of event A does not, in any way, affect the probability of occurrence of event B and vice versa, then A and B are said to be independent events (events whose probability of occurrence is independent of each other).

Examples:

* Tossing two coins: the occurrence of heads on one coin is independent of the occurrence of heads or tails on the second coin.
* Rolling two dice: the number on one die does not in any way affect the number on the second die.


<img src="../images/Independent_events.png", style="height:60vh;">

According to the <b>Multiplication Rule</b>, if $A$ and $B$ are two independent events then,

$P(A \cap B) = P(A) * P(B)$

The conditional probability of occurrence of event $A$, given that event $B$ has occurred, is given by

$P(A|B) = \large \frac{P(A \cap B)}{P(B)}$

But since $A$ and $B$ are independent events,

$P(A|B) = \large \frac{P(A) * P(B)}{P(B)}$

$P(A|B) = P(A)$

We can now see that the conditional probability of $A$ given $B$ is equal to $P(A)$ itself, which shows that the occurrence or non-occurrence of $B$ does not affect the probability of occurrence of $A$ in any way.

This is called <b>"Independence"</b> of event $A$ from event $B$.
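The relation $P(A|B) = P(A)$ can be verified by enumerating all outcomes of two dice. This is only a sketch: the events `a` (first die shows 4) and `b` (second die shows 3) and the helper `prob` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of the outcomes for which `event` returns True."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def a(o):  # hypothetical event A: first die shows a 4
    return o[0] == 4

def b(o):  # hypothetical event B: second die shows a 3
    return o[1] == 3

p_a, p_b = prob(a), prob(b)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_given_b = p_a_and_b / p_b

# Multiplication rule and P(A|B) = P(A) both hold for these events.
print(p_a_and_b == p_a * p_b, p_a_given_b == p_a)  # True True
```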


## Binary Communication System


Consider a binary communication system where the input is a 1 with probability $p$ and a 0 with probability $1-p$. The receiver makes an error with probability $\epsilon$, in which case the received bit is flipped [Alberto-1]. This can be illustrated as shown in the figure below:
<img src="../images/bc_system1.png", style="width: 700px;">
[Alberto-1] Alberto Leon-Garcia, Probability, Statistics, and Random Processes for Electrical Engineering, Chapter 2.

Computations of this kind rely on the theorem of total probability. Let $B_1 , B_2 , ... , B_n$ be mutually exclusive events whose union equals the sample space $S$ [Alberto-1]. We refer to these sets as a partition of $S$. Any event $A$ can be represented as the union of mutually exclusive events in the following way:
$$A = A \cap S = A \cap (B_1 \cup B_2 \cup \dots \cup B_n)$$
$$p[A] = p[A \cap B_1] + p[A \cap B_2] + \dots + p[A \cap B_n]$$
$$p[A] = p[A|B_1]p[B_1] + p[A|B_2]p[B_2] + \dots + p[A|B_n]p[B_n]$$
This can be illustrated as shown in the figure below:
<img src="../images/pspace1.png", style="width: 400px;">

### Exercise

Given that $p = 0.8$ and $\epsilon = 0.3$,
* What is the sum total of the probabilities over all possible transmitted and received values? Assign this to the variable sum_p.

In [2]:
p = 0.8
e = 0.3

We can observe from the above that the total probability can be visualized in partitions as shown below:

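Let $I_0$ and $I_1$ denote the events that the input was a 0 and a 1, and let $R_0$ and $R_1$ denote the events that the received output was a 0 and a 1. The joint probabilities are: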

$$p[I_0 \cap R_0] = (1-p)(1-\epsilon)$$
$$p[I_0 \cap R_1] = (1-p)\epsilon$$
$$p[I_1 \cap R_0] = p\epsilon$$
$$p[I_1 \cap R_1] = p(1-\epsilon)$$
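As a quick check, the four joint probabilities can be evaluated for $p = 0.8$ and $\epsilon = 0.3$ and summed. This is a minimal sketch; the dictionary `joint` and its string keys are illustrative assumptions.

```python
p = 0.8  # probability that the input is 1
e = 0.3  # probability that the receiver flips the bit

# Joint probability of each (input, received) combination.
joint = {
    ("I0", "R0"): (1 - p) * (1 - e),
    ("I0", "R1"): (1 - p) * e,
    ("I1", "R0"): p * e,
    ("I1", "R1"): p * (1 - e),
}

# The four combinations partition the sample space, so they should sum to 1.
sum_p = sum(joint.values())
print(round(sum_p, 10))  # 1.0
```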

<img src="../images/pspace3.png", style="width: 600px;">

## Bayes' Rule

Suppose $B_1, B_2, ..., B_n$ is a partition of a sample space. Then the probability of event $B_j$, given that an event $A$ has occurred, is:

$$p[B_j|A] = \cfrac{p[A \cap B_j]}{p[A]} = \cfrac{p[A|B_j]p[B_j]}{p[A]}$$

$p[B_j]$ is known as a prior and $p[A|B_j]$ is known as the likelihood.

### Exercise

* What is the probability that a 1 was received? Assign the probability to the variable p_R_1
* What is the probability that a 0 was received? Assign the probability to the variable p_R_0

In [7]:
#p_R_1 = ?

The regions of 1s and 0s are illustrated in the partitions as shown in the figure:

<img src="../images/pspace4.png", style="width: 600px;">

Hence the two probabilities add up to 1, as required by the theorem of total probability.
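One possible way to compute the two received probabilities is to apply the theorem of total probability over the two inputs. The sketch below assumes $p = 0.8$ and $\epsilon = 0.3$ as in the earlier exercise.

```python
p, e = 0.8, 0.3  # P(input = 1) and P(bit flipped)

# A 1 is received if a 1 was sent without error or a 0 was sent with error.
p_R_1 = p * (1 - e) + (1 - p) * e

# A 0 is received if a 1 was sent with error or a 0 was sent without error.
p_R_0 = p * e + (1 - p) * (1 - e)

print(round(p_R_1, 4), round(p_R_0, 4))  # 0.62 0.38
```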

## Given receiver output was a 1, which input was more likely?

Assuming that a 1 was received, which input was more likely? Using Bayes' theorem, the question can be formulated as:

Is $p(I=1|R=1) > p(I=0|R=1)$?

$$p(I=1|R=1) = \cfrac{p(R=1|I=1)p(I=1)}{p(R=1)}$$

and

$$p(I=0|R=1) = \cfrac{p(R=1|I=0)p(I=0)}{p(R=1)}$$


### Exercise

* Compute probability that input was 1 given that the received output was a 1 and assign it to p_i1r1.
* Compute probability that input was a 0, given that the received output was a 1 and assign it to p_i0r1.

In [10]:
#p_i1r1 = 
#p_i0r1 = 

Use the value of p(R=1) from the previous exercise.

We see that, given that a 1 was received, the input is far more likely to have been a 1 than a 0. This is intuitive: the prior probability of the input being 1 is high at 0.8, and the error probability is comparatively low at 0.3.
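The comparison can be checked numerically. The following sketch applies Bayes' rule with $p = 0.8$ and $\epsilon = 0.3$; it is one possible solution, not necessarily the intended one.

```python
p, e = 0.8, 0.3  # P(input = 1) and P(bit flipped)

# Total probability of receiving a 1.
p_R_1 = p * (1 - e) + (1 - p) * e

# Posterior probabilities of each input, given that a 1 was received.
p_i1r1 = p * (1 - e) / p_R_1
p_i0r1 = (1 - p) * e / p_R_1

print(round(p_i1r1, 4), round(p_i0r1, 4))  # 0.9032 0.0968
```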

<img src="../images/pspace4.png", style="width: 600px;">

## Credit Approval Process

The credit approval process is applied to customers whose credit history falls into one of three levels: low-risk, mid-risk and high-risk. The task is to examine the process for mid-risk and high-risk accounts in particular. A transaction is approved if, after the account has been validated for t seconds, the customer has still not defaulted. It is known that mid-risk accounts have a credit default rate of $\alpha$, while high-risk accounts have a default rate of $100\alpha$, as shown in the figure below:

<img src="../images/exp-credit-risk.png", style="width: 600px;">

### Exercise

For this problem assume that we have pooled all mid-risk and high-risk accounts together and are randomly selecting an account from this pool.

Let A be the event “transactions are still valid after t seconds,” and let M be the event “transactions are mid-risk,” and H the event “transactions are high-risk”.

By the theorem on total probability we have:
$p(A) = p(A|M)p(M) + p(A|H)p(H)$

Assume that mid-risk and high-risk accounts are present in the pool in the ratio 0.75:0.25, that is, $p(M) = 0.75$ and $p(H) = 0.25$.

Given,

* p(A|M) = p_AM = $e^{-\alpha t}$
* p(A|H) = p_AH = $e^{-100\alpha t}$

What is the probability that a randomly selected account has not defaulted after t=200 seconds? Assign this probability score to the variable p_A.

In [1]:
import numpy as np

t = 200
alpha = 5e-5

p_AM = np.exp(-alpha*t) 

Use the theorem of total probability, $p(A) = p(A|M)p(M) + p(A|H)p(H)$
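A possible solution is sketched below, assuming the priors $p(M) = 0.75$ and $p(H) = 0.25$ from the stated ratio. It uses `math.exp` in place of `np.exp` so that it runs without NumPy; the names `p_M` and `p_H` are assumptions.

```python
import math

t = 200
alpha = 5e-5
p_M, p_H = 0.75, 0.25  # assumed priors from the 0.75:0.25 ratio

# Probability of not defaulting within t seconds for each risk class.
p_AM = math.exp(-alpha * t)
p_AH = math.exp(-100 * alpha * t)

# Theorem of total probability over the two risk classes.
p_A = p_AM * p_M + p_AH * p_H
print(round(p_A, 4))  # 0.8345
```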

## Probability that the Approved transactions are of Medium-risk

* Given that the transactions are approved after being validated for t=200 seconds, what is the probability that they are of medium risk?
* Assign your answer to the variable, p_MA.

In [16]:
# p_MA = 

Use Bayes' rule:

$$p(M|A) = \cfrac{p(A|M)\times p(M)}{p(A)}$$
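The posterior can be evaluated as follows. This is a self-contained sketch under the same assumptions as the exercise (priors 0.75 and 0.25, $\alpha$ = 5e-5, t = 200); the variable `p_HA` for the complementary posterior is added for illustration.

```python
import math

t, alpha = 200, 5e-5
p_M, p_H = 0.75, 0.25  # assumed priors from the 0.75:0.25 ratio

# Likelihoods of surviving t seconds, weighted by the priors.
w_M = math.exp(-alpha * t) * p_M
w_H = math.exp(-100 * alpha * t) * p_H

# Bayes' rule: normalize by the total probability p(A).
p_A = w_M + w_H
p_MA = w_M / p_A
p_HA = w_H / p_A

print(round(p_MA, 4), round(p_HA, 4))  # 0.8898 0.1102
```

Although mid-risk accounts make up 75% of the pool, they make up roughly 89% of the approved transactions, because high-risk accounts are more likely to default during validation.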

## Dependency of Testing Time on Probability of Medium Risk Approved Transactions

### Exercise

* Given that the transactions are approved, how long should you test a transaction so that 95% of the approved transactions are medium risk?


Let A be the event “transaction is still valid after t seconds,” let M be the event “transaction is mid-risk,” and let H be the event “transaction is high-risk.” The problem requires that we find the value of t for which


$$p[M|A] = 0.95$$

You can determine $p[M|A]$ using Bayes Theorem:

$$p[M|A] = \cfrac{p[A|M]p[M]}{p[A|M]p[M] + p[A|H]p[H]}$$

In [19]:
# With priors 0.75 and 0.25: e^(99*alpha*t) = (0.95 * 0.25) / (0.05 * 0.75) = 19/3
t = np.log(19.0/3)/(99*alpha)
print(round(t, 2))

372.89


### Solution

$$p[M|A] = \cfrac{p[A|M]p[M]}{p[A|M]p[M] + p[A|H]p[H]}$$

$$ 0.95 = \cfrac{0.75 \times e^{-\alpha t}}{0.75 \times e^{-\alpha t} + 0.25 \times e^{-100\alpha t}}$$

$$ 0.95 \times 0.75 e^{-\alpha t} + 0.95 \times 0.25 e^{-100\alpha t} = 0.75 e^{-\alpha t}$$

$$ 0.05 \times 0.75 e^{-\alpha t} = 0.95 \times 0.25 e^{-100\alpha t} $$

$$ e^{99\alpha t} = \cfrac{0.95 \times 0.25}{0.05 \times 0.75} = \cfrac{19}{3} \approx 6.33$$
$$ t = \cfrac{\ln(19/3)}{99\alpha}$$

You can now solve for t given alpha.

## Increase the Testing time

Increase the testing time for the transactions to t=2000 seconds to approve a transaction. How does the probability of medium risk accounts change? 
* Assign the probability values to p_MA and p_HA given $\alpha$ = 5e-5

In [15]:
# Write function to take in the parameters and return the probability

def prob_MA(t, alpha):
    '''
    Given the testing time t and the decay parameter alpha of the exponential model, return the probabilities p(M|A) and p(H|A).
    
    Args:
        t (float): time in seconds
        alpha (float): exponential distribution parameter (decay parameter)
    
    Returns:
        (p_MA, p_HA): A tuple of bayesian probabilities.
    '''

Call the function prob_MA(2000, 5e-5)
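One way the function body could be filled in is sketched below. The priors are passed as keyword arguments `p_M` and `p_H` with assumed defaults of 0.75 and 0.25 (these parameters are not in the original signature), and `math.exp` is used so the sketch runs without NumPy.

```python
import math

def prob_MA(t, alpha, p_M=0.75, p_H=0.25):
    """Return (p(M|A), p(H|A)) for testing time t and decay parameter alpha.

    p_M and p_H are assumed prior probabilities of mid-risk and high-risk accounts.
    """
    # Likelihood of surviving t seconds, weighted by the prior of each class.
    w_M = math.exp(-alpha * t) * p_M
    w_H = math.exp(-100 * alpha * t) * p_H
    # Normalize by the total probability p(A).
    p_A = w_M + w_H
    return w_M / p_A, w_H / p_A

p_MA, p_HA = prob_MA(2000, 5e-5)
print(p_MA > 0.9999, round(p_MA + p_HA, 10))  # True 1.0
```

At t = 2000 seconds almost every surviving account is mid-risk, since $e^{-100\alpha t} = e^{-10}$ makes it very unlikely for a high-risk account to remain valid that long.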