# Elementary Probability

## Fairness - Bias and Independence

Suppose we toss a coin 10 times and it comes up heads every time. What should we bet for the next toss?

The right answer depends on assumptions we make about **bias** and **independence** of the coin tosses.

Bias here implies a tendency to favor one outcome over another. For an **unbiased** coin, a toss should provide an equal probability of either heads or tails.

But suppose we have a coin that comes up HTHTHTHT..., with heads and tails strictly alternating. In the long run this coin is unbiased, since heads and tails occur equally often, but each toss determines the next one - thus the outcome of one trial is dependent on the previous one. What we want is some type of **randomness**, a **lack of regularity**. We want each trial to have no **memory** of the previous trials. Put another way, we want no gambling system to be able to beat the game - random sequences are so unpredictable that in the long run all gambling systems fail.

We refer to these properties as **independence** - trials are independent if outcomes of trials are not influenced in any way by outcomes of previous trials.

An experiment is said to be **fair** only if :

- it is unbiased
- outcomes are independent of each other

**Urn example**

Take an urn filled with colored balls. A trial consists of shaking the urn well, drawing a ball, and noting its color. If you put the ball back at the end of the trial, that is *sampling with replacement*; if you do not, it is *sampling without replacement*. In the second case, after the urn is empty we replace all the balls and start again.

*An unbiased, independent trial*: Urn filled with an equal number of red and green balls, and repeated trials with sampling with replacement.  

*An unbiased, non-independent trial*: Urn filled with an equal number of red and green balls, and repeated trials with sampling without replacement. Note that the probability will be different on each draw, but overall the setup is unbiased. However, the trials are not independent.  

*A biased, independent trial*: Urn filled with 90% red and 10% green balls, and repeated trials with sampling with replacement. Overall the setup is biased. However, the trials are independent.  

*A biased, non-independent trial*: Urn filled with 90% red and 10% green balls, and repeated trials with sampling without replacement. Overall the setup is biased, and the trials are not independent.
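The difference between the sampling schemes can be checked with exact arithmetic. The following is a minimal sketch, assuming hypothetical urns of 10 balls each; the helper `second_red_given_first_red` is an illustrative name, not something from the notes above.

```python
from fractions import Fraction

def second_red_given_first_red(red, green, replace):
    """Exact Pr(second draw is red / first draw is red) for an urn.

    `red` and `green` are ball counts; `replace` selects sampling
    with replacement (True) or without replacement (False).
    """
    if replace:
        # The ball goes back, so the urn is unchanged for the second draw.
        return Fraction(red, red + green)
    # One red ball is gone, so the red count and the total both shrink by one.
    return Fraction(red - 1, red + green - 1)

# Hypothetical urns with 10 balls each (the sizes are an assumption).
for red, green in [(5, 5), (9, 1)]:
    first_red = Fraction(red, red + green)
    with_repl = second_red_given_first_red(red, green, replace=True)
    without_repl = second_red_given_first_red(red, green, replace=False)
    print(red, green, first_red, with_repl, without_repl)
```

With replacement, the conditional probability equals the unconditional one (the trials are independent). Without replacement it changes after a red draw (the trials are not independent), whether or not the urn is biased.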




**Gambler's fallacy**

A gambler sees black turn up 12 times in a row at roulette. He argues :

> This is a fair roulette wheel.  
> Black has turned up 12 times in a row.  
> Since the wheel is fair, red and black turn up equally often.  
> Hence, a lot of reds are due to come soon, and I should bet on red, even if it is not the next spin.  

The gambler's fallacy, specifically, is to forget that a fair roulette wheel has no memory and the spins are independent. More generally, it is to assume a gambling system can be successful against a truly fair system.

However, imagine we drop the first premise - we don't know that the roulette wheel is fair. Imagine a *Skeptical Gambler* arguing :

> Black has come up 12 times in a row.  
> There is a good chance the roulette wheel is biased towards black.  
> I will start betting on black.  

The *Skeptical Gambler* may be right, but we now need empirical evidence to show the wheel is biased. This reasoning is risky.

And we can also argue :

> If the wheel was biased for black, people would quickly catch on  
> The casino could soon lose a lot of money  
> Hence, it is probable the wheel is unbiased

This reasoning is also risky, based on our prior knowledge of casinos and wheels.

**Inverse Gambler's Fallacy**

The gambler's fallacy can work in reverse.

> Y sees X roll 4 dice and get 4 sixes  
> Y says X must have tried this several times before

Y is committing the inverse gambler's fallacy. The trials are independent, so having tried n times does not increase the chance of success on the (n+1)th try.

On the other hand :

> Y hears X rolled 4 dice and got 4 sixes yesterday  
> Y says it is likely X must have rolled the dice many times  

This is not a fallacy. It is correct to think that your chance of getting 4 sixes at least once is higher if you try 100 times than if you try once. The difference is that here we are not talking about a specific throw of the dice, but about a time period in which potentially many throws could have occurred.
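The contrast between the two cases can be made concrete. The sketch below assumes fair dice and independent rolls; `at_least_once` is a hypothetical helper name.

```python
from fractions import Fraction

# Probability that a single roll of 4 fair dice shows 4 sixes.
P_FOUR_SIXES = Fraction(1, 6) ** 4

def at_least_once(n):
    """Exact probability of at least one 4-sixes roll in n independent rolls."""
    return 1 - (1 - P_FOUR_SIXES) ** n

# Any one specific roll has the same chance, however many rolls came before it.
print(P_FOUR_SIXES)
# Over many rolls, the chance that 4 sixes shows up at least once grows with n.
print(float(at_least_once(1)), float(at_least_once(100)))
```

The first number applies to the roll Y actually watched. The second pair applies to "sometime yesterday", where the number of attempts is unknown.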

## Probability - Propositions vs Events

We can talk of the probability of a proposition being true. Example :

> Let proposition P = "It will rain this Sunday"  

What is the probability P is true?  

Alternatively :

> Let event E = "Rain falling on Sunday"

What is the probability event E will occur?

Propositions are true or false.
Events either occur or do not occur.

Usually we can translate between these views. 


## Set Operations vs Logical Connectives

We can convert between events (modeled as subsets of the sample space) and propositions as follows:

$A \cup B \equiv A \lor B$ : Union vs disjunction

$A \cap B \equiv A \land B$ : Intersection vs conjunction

$A^c \equiv \neg A$ : Complement vs negation

If Pr(A) represents the probability that event A occurs, we can say :

$Pr(A) = 0 \equiv \text{ A is certainly false }$   
$Pr(A) = 1 \equiv \text{ A is certainly true }$  
$Pr(A) = x \equiv \text{ A is true with probability x }$   

A proposition that is certainly true, or an event that is sure to occur, is denoted $\Omega$: $Pr(\Omega) = 1$



## Adding and Multiplying Probabilities

Two events which cannot occur together, or two propositions which cannot be true simultaneously, are called **mutually exclusive** or **disjoint**.

For mutually exclusive events, probabilities add up.

$Pr(A \cup B) = Pr(A) + Pr(B)$ - A and B mutually exclusive

For **independent events** / propositions, probabilities can be multiplied.

$Pr(A \land B) = Pr(A) * Pr(B)$ - if A and B are independent

Often we deal with **compound events**. For example, throwing two dice can be seen as a single compound event formed of two elementary events. Compound events are often composed of elementary events which are independent, which means we can compute the probability of the compound event by multiplying the probabilities of the constituent elementary events.
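Both rules can be verified by enumerating the 36 outcomes of two fair dice. This is a minimal sketch; the event functions and the `pr` helper are illustrative names introduced here.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs.
OUTCOMES = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event given as a predicate over (first, second)."""
    favorable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favorable, len(OUTCOMES))

def sum_is_2(o):
    return o[0] + o[1] == 2

def sum_is_12(o):
    return o[0] + o[1] == 12

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

# Mutually exclusive events: probabilities add.
either = pr(lambda o: sum_is_2(o) or sum_is_12(o))
print(either == pr(sum_is_2) + pr(sum_is_12))

# Independent events: probabilities multiply.
both = pr(lambda o: first_is_six(o) and second_is_six(o))
print(both == pr(first_is_six) * pr(second_is_six))
```

Both comparisons print `True`: the two sums cannot occur on the same throw, and the two dice do not influence each other.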


## Conditional statements vs categorical statements

A **categorical statement** may be "The probability India will beat Australia is 40%".

But, suppose a key player is injured. We make the following **conditional statement** : "Given X is injured, the probability India will beat Australia is 30%".

The conditional probability of event A given B is written Pr(A/B).

e.g. 

> Pr(second card dealt is ace) = 4/52 * 3/51 + 48/52 * 4/51  
> Pr(second card dealt is ace/first card is not an ace) = 4/51  

$$Pr(A/B) = \frac{Pr(A \land B)}{Pr(B)} \text{  : given Pr(B) > 0}$$ 

A slightly different way to think of this is : What is the probability of A having happened, given B has happened? For example, what is the probability that the roll of a die is 6, given it is even? The answer is of course 1/3.

In the card example, A and B are **overlapping events**. In the dice example, A is a subset of B i.e. $Pr(A \land B) = Pr(A)$. But the formula is the same in either case.
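Both examples can be reproduced by counting outcomes. The sketch below assumes equally likely outcomes, so that the ratio of probabilities reduces to a ratio of counts; the `conditional` helper and the encoding of aces as cards 0-3 are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

def conditional(a, b, space):
    """Pr(A/B) = Pr(A and B) / Pr(B) when all outcomes are equally likely."""
    in_b = [x for x in space if b(x)]
    if not in_b:
        raise ValueError("Pr(B) must be greater than 0")
    in_a_and_b = [x for x in in_b if a(x)]
    # With equally likely outcomes the ratio of probabilities is a ratio of counts.
    return Fraction(len(in_a_and_b), len(in_b))

# Die example: Pr(roll is 6 / roll is even).
die = list(range(1, 7))
six_given_even = conditional(lambda r: r == 6, lambda r: r % 2 == 0, die)

# Card example: ordered deals of two distinct cards; cards 0-3 stand for the aces.
deals = list(permutations(range(52), 2))
ace_given_no_ace = conditional(lambda d: d[1] < 4, lambda d: d[0] >= 4, deals)
second_is_ace = conditional(lambda d: d[1] < 4, lambda d: True, deals)

print(six_given_even, ace_given_no_ace, second_is_ace)
```

This prints `1/3 4/51 1/13`, matching the values above (4/52 * 3/51 + 48/52 * 4/51 simplifies to 1/13).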

## Basic Rules of Probability for Finite Sample Spaces

Assumptions :  
* The rules are for finite groups of propositions (or events)
* If A and B are propositions (or events) so are $A \land B, A \lor B, \neg A$
* Elementary deductive logic is taken for granted
* If A and B are *logically equivalent*, then Pr(A) = Pr(B) 

**normality** : $$0 \leq Pr(A) \leq 1$$  

**certainty** : Given $\Omega$ is a certain proposition or a sure event, $$Pr(\Omega) = 1$$

**additivity** : Provided A and B are mutually exclusive, $$Pr(A \lor B) = Pr(A) + Pr(B)$$

**overlap** : Provided A and B overlap i.e. are not mutually exclusive, $$Pr(A \lor B) = Pr(A) + Pr(B) - Pr(A \land B)$$ 

Proof:

$A \lor B \equiv (A \land B) \lor (A \land \neg B) \lor (\neg A \land B)$  

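$Pr(A \lor B) = Pr(A \land B) + Pr(A \land \neg B) + Pr(\neg A \land B)$  - since the three disjuncts on the RHS are mutually exclusive  

$Pr(A \lor B) = Pr(A \land B) + Pr(A \land \neg B) + Pr(A \land B) + Pr(\neg A \land B) - Pr(A \land B)$  - adding and subtracting $Pr(A \land B)$  

$Pr(A \lor B) = Pr(A) + Pr(B) - Pr(A \land B)$  - since $A \equiv (A \land B) \lor (A \land \neg B)$ and $B \equiv (A \land B) \lor (\neg A \land B)$  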
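
As a quick check of the overlap rule, take one roll of a fair die with A = "the roll is even" and B = "the roll is greater than 3" (events chosen here purely for illustration). Then $A \lor B$ covers {2, 4, 5, 6} and $A \land B$ covers {4, 6}:

$$Pr(A \lor B) = \frac{4}{6}$$

$$Pr(A) + Pr(B) - Pr(A \land B) = \frac{3}{6} + \frac{3}{6} - \frac{2}{6} = \frac{4}{6}$$

Simply adding Pr(A) and Pr(B) would give 1, because the outcomes 4 and 6 would be counted twice.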


**conditional probability**: $$Pr(A/B) = Pr(A \land B)/Pr(B) \text{ if Pr(B) > 0}$$  

**multiplication** : $$Pr(A \land B) = Pr(A/B)Pr(B) \text{ if Pr(B) > 0}$$  

**total probability** : $$Pr(A) = Pr(B)Pr(A/B) + Pr(\neg B)Pr(A/ \neg B) \text{ if 0 < Pr(B) < 1}$$  

**logical consequence** : If B implies A i.e. $Pr(B) = Pr(A \land B)$ : $$Pr(A) = Pr(A \land B) + Pr(A \land \neg B) = Pr(B) + Pr(A \land \neg B)$$

**statistical independence** : If 0 < Pr(A) and 0 < Pr(B), then A and B are statistically independent if and only if $$Pr(A/B) = Pr(A)$$  



## Rules for Conditional Probability

The classic problem in logic is determining the truth of the conclusion given the truth of the premises. In the face of uncertainty, it amounts to asking the probability of the conclusion c being true, assuming the premises p are true.


**normality** : $$0 \leq Pr(A/E) \leq 1$$  

**certainty** : Given $\Omega$ is a certain proposition or a sure event, $$Pr(\Omega/E) = 1$$ 

> because $Pr(\Omega/E) = Pr(\Omega \land E)/Pr(E)$.   
> But $\Omega \land E \equiv E$, since $\Omega$ is always true.  
> Hence $Pr(\Omega/E) = Pr(E)/Pr(E) = 1$.  

**additivity** : Provided A and B are mutually exclusive, $$Pr(A \lor B/E) = Pr(A/E) + Pr(B/E)$$.

Proof:  
> $Pr(A \lor B/E) = Pr((A \lor B) \land E)/Pr(E)$  
> $(A \lor B) \land E = (A \land E) \lor (B \land E)$ - and these are mutually exclusive as well.  
> Hence, $Pr((A \lor B) \land E) = Pr(A \land E) + Pr(B \land E)$   
> Hence, $Pr(A \lor B/E) = (Pr(A \land E) + Pr(B \land E))/Pr(E) = Pr(A \land E)/Pr(E) + Pr(B \land E)/Pr(E)$  
> Hence, $Pr(A \lor B/E) = Pr (A/E) + Pr(B/E)$  


**overlap** : Provided A and B overlap i.e. are not mutually exclusive, $$Pr(A \lor B/E) = Pr(A/E) + Pr(B/E) - Pr(A \land B/E)$$ 

Proof:

> $A \lor B \equiv (A \land B) \lor (A \land \neg B) \lor (\neg A \land B)$  
> $Pr(A \lor B/E) = Pr(A \land B/E) + Pr(A \land \neg B/E) + Pr(\neg A \land B/E)$  - by conditional additivity, since RHS are mutually exclusive  
> $Pr(A \lor B/E) = Pr(A \land B/E) + Pr(A \land \neg B/E) + Pr(A \land B/E) + Pr(\neg A \land B/E) - Pr(A \land B/E)$   
> $Pr(A \lor B/E) = Pr(A/E) + Pr(B/E) - Pr(A \land B/E)$   

**conditional probability**: if Pr(E) > 0 and Pr(B/E) > 0 : $$Pr(A/(B \land E)) = Pr((A \land B)/E)/Pr(B/E)$$  

Proof:

> $Pr(A/(B \land E)) = Pr(A \land B \land E)/Pr(B \land E)$  
> But, $Pr(A \land B \land E) = Pr(A \land B/E)Pr(E)$  
> And, $Pr(B \land E) = Pr(B/E)Pr(E)$  
> So, the formula follows on dividing the first equation by the second.



## Independence in terms of Conditional Probability

We can now revisit the definition of independence - the lack of "memory" between trials - and make it precise. Given Pr(A) > 0 and Pr(B) > 0, A is independent of B if :

$$Pr(A/B) = Pr(A)$$

We will show that this implies :

$$Pr(B/A) = Pr(B)$$  

$$Pr(B \land A) = Pr(A)Pr(B)$$  

We say A, B, C are **statistically independent** if A, B, C are pair-wise independent and
$$Pr(A \land B \land C) = Pr(A)Pr(B)Pr(C)$$  

Note that pair-wise independence **does not** imply statistical independence of all three events - the additional condition above is required.

Proof:

> Given $Pr(A/B) = Pr(B \land A)/Pr(B) = Pr(A)$  
> It follows that, as required : $Pr(B \land A) = Pr(A)Pr(B)$  
> It also follows, since Pr(A) > 0 : $Pr(B) = Pr(B \land A)/Pr(A) = Pr(B/A)$ ... i.e. B is independent of A as well  


Example of Pairwise Independence not implying mutual independence:

> Toss a fair coin twice. Consider the events : A = both tosses are the same, B = first is heads, C = second is heads.  
> You can see $Pr(A) = Pr(B) = Pr(C) = 1/2$  
> $Pr(A \land B) = Pr(B \land C) = Pr(A \land C) = 1/4$  
> We can easily see that Pr(A/B) = Pr(A), and so on i.e. A,B,C are pairwise independent  
> But $Pr(A \land B \land C) = 1/4$ since two heads (HH) is the only outcome common to all three events.  
> While Pr(A)Pr(B)Pr(C) = 1/8. 
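
The example can be checked by enumerating the four outcomes of two fair tosses. A minimal sketch; the function names mirror the events A, B, C above, and `pr` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes: HH, HT, TH, TT.
TOSSES = list(product("HT", repeat=2))

def pr(*events):
    """Probability that all the given events hold on two fair tosses."""
    hits = [t for t in TOSSES if all(event(t) for event in events)]
    return Fraction(len(hits), len(TOSSES))

def a(t):
    return t[0] == t[1]      # both tosses are the same

def b(t):
    return t[0] == "H"       # first is heads

def c(t):
    return t[1] == "H"       # second is heads

# Every pair satisfies the product rule...
pairwise = all(pr(x, y) == pr(x) * pr(y) for x, y in [(a, b), (b, c), (a, c)])
# ...but the triple does not.
triple = pr(a, b, c) == pr(a) * pr(b) * pr(c)

print(pairwise, triple)
```

This prints `True False`: each pair is independent, yet knowing any two of the events settles the third.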





## Bayes' Rule

Consider two steps : We toss a fair coin and then, based on that we pick a ball from one of two urns. We find the ball is red (event E).

Let H1 = the hypothesis that urn 1 was chosen, and H2 = the hypothesis that urn 2 was chosen. Since the coin is fair, Pr(H1) = Pr(H2) = 0.5.

Assume Pr(E/H1) = 0.4 and Pr(E/H2) = 0.2

The question is : Which urn is more likely to have been selected by the coin toss, given we know the ball is red?

To answer it, we need to calculate Pr(H1/E) and Pr(H2/E).

We can recast this in terms of experiments, hypotheses and evidence.

The experiment is what we have carried out - tossing a coin, picking a ball.

The evidence is that we got a red ball (event E occurred).

The hypotheses are H1 and H2. In this case H1 and H2 are mutually exclusive and exhaustive. Then :

$$Pr(H1/E) = \frac{Pr(H1)Pr(E/H1)}{Pr(H1)Pr(E/H1) + Pr(H2)Pr(E/H2)}$$

More generally, given evidence E with Pr(E) > 0, and hypotheses $H_1, H_2, H_3,..H_k$ that are mutually exclusive and collectively exhaustive (MECE), with $Pr(H_i) > 0$ for each hypothesis :

$$Pr(H_j/E) = \frac{Pr(H_j)Pr(E/H_j)}{\sum_i Pr(H_i)Pr(E/H_i)}$$
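
Applying the rule to the urn example gives concrete numbers. The sketch below takes the priors to be 1/2 each because the coin is fair; the `posterior` function is an illustrative helper, not part of any library.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' rule for mutually exclusive, exhaustive hypotheses.

    priors[i] is Pr(H_i) and likelihoods[i] is Pr(E/H_i).
    Returns the list of Pr(H_i/E).
    """
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    evidence = sum(joint)  # Pr(E), by the rule of total probability
    if evidence == 0:
        raise ValueError("Pr(E) must be greater than 0")
    return [j / evidence for j in joint]

priors = [Fraction(1, 2), Fraction(1, 2)]        # fair coin picks the urn
likelihoods = [Fraction(2, 5), Fraction(1, 5)]   # Pr(E/H1) = 0.4, Pr(E/H2) = 0.2
pr_h1, pr_h2 = posterior(priors, likelihoods)
print(pr_h1, pr_h2)
```

This prints `2/3 1/3`, so urn 1 is twice as likely as urn 2 once we know the ball is red.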


## Utility - Expected Value

Logic analyzes reasons and arguments. The arguments may lead us to a decision to alter our **beliefs**, or to undertake some **action**.

Decisions depend on two things :

* What we believe
* What we want / value

Sometimes we can represent our beliefs with probabilities, and the **utility** of what we value with an objective measure.

We take **actions** based on reasons. Actions have **consequences**, which occur with some probability, and ultimately these consequences have some value, or **utility**, to us.

Consider an action A with two possible consequences $C_1, C_2$. We have a utility function U which assigns some value to a consequence - this can be a dollar value, a quantity of happiness etc. The expected value of A is :

$$Exp(A) = Pr(C_1/A)U(C_1) + Pr(C_2/A)U(C_2)$$

Generalizing :

$$Exp(A) = \sum Pr(C_i/A)U(C_i)$$

One way to identify the right decision / action is to pick the action which maximizes expected utility. Utilities could be a monetary value, or some other measure, expressed in general units called **utiles**.
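
As an illustration, the sketch below compares two actions by expected utility. The actions, probabilities, and utilities are entirely hypothetical numbers chosen for the example.

```python
from fractions import Fraction

def expected_utility(consequences):
    """Sum of Pr(C_i/A) * U(C_i) over (probability, utility) pairs."""
    return sum(p * u for p, u in consequences)

# Hypothetical actions: each maps to a list of (Pr(C_i/A), U(C_i)) pairs.
actions = {
    # Win 50 utiles with probability 1/100, otherwise lose the 1-utile stake.
    "buy_ticket": [(Fraction(1, 100), 50), (Fraction(99, 100), -1)],
    # Doing nothing has a single certain consequence worth 0 utiles.
    "do_nothing": [(Fraction(1), 0)],
}

for name, consequences in actions.items():
    print(name, expected_utility(consequences))

# The action with the highest expected utility.
best = max(actions, key=lambda name: expected_utility(actions[name]))
print(best)
```

Here buying the ticket has an expected utility of -49/100 utiles, so under these assumed numbers doing nothing is the better action.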




