# Probability Theory

Probability theory is a branch of mathematics that deals with the study of uncertainty, randomness, and the likelihood of events occurring. It provides a formal framework for reasoning about uncertain outcomes and is widely used in various fields, including statistics, physics, economics, finance, and more.

## Sample Space 
**Sample Space $(S)$** is the set of all possible outcomes of an experiment. For example, when rolling a six-sided die, the sample space is: 
```python
sample_space = {1, 2, 3, 4, 5, 6}
```

## Event
**Event $(E)$** is a subset of the sample space, representing a specific outcome or a collection of outcomes. There are several types of events:
### Simple Event:

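A simple event is an event that consists of a single outcome. For example, rolling a 4 on the die is a simple event. Rolling an even number is *not* a simple event, because it contains three outcomes.

```python
# Simple event: rolling a 4
rolling_four = {4}

# Event with several outcomes: rolling an even number (reused below)
even_numbers = {2, 4, 6}
```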

### Compound Event:

A compound event consists of more than one outcome. For example, rolling a number greater than 3 is a compound event.

```python
# Event: Rolling a number greater than 3
greater_than_3 = {4, 5, 6}  
```

### Complementary Event:

The complementary event of an event $E$ (denoted $E^c$, $\neg E$, or $\bar{E}$) consists of all outcomes in the sample space that are not in $E$. For example, the complement of rolling an even number is rolling an odd number, $\{1, 3, 5\}$.

```python
# Event: Not rolling an even number
not_even_numbers = sample_space - even_numbers  
```

### Impossible and Certain Events:

- An impossible event is an event that cannot occur, so its probability is $0$. For example, rolling a `7` on a standard six-sided die.
- A certain event is an event that is guaranteed to occur, so its probability is $1$. For example, if you roll a six-sided die, the event "rolling a number from `1` to `6`" is certain.

```python
impossible_event = set()  # Empty set
certain_event = sample_space
```

```python
# Sample space for rolling a six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

# Events
even_numbers = {2, 4, 6}
greater_than_3 = {4, 5, 6}
not_even_numbers = sample_space - even_numbers

# Impossible and Certain Events
impossible_event = set()  # Empty set
certain_event = sample_space

# Printing the results
print("Sample Space:", sample_space)
print("Even Numbers:", even_numbers)
print("Numbers Greater Than 3:", greater_than_3)
print("Not Even Numbers:", not_even_numbers)
print("Impossible Event:", impossible_event)
print("Certain Event:", certain_event)
```

```
Sample Space: {1, 2, 3, 4, 5, 6}
Even Numbers: {2, 4, 6}
Numbers Greater Than 3: {4, 5, 6}
Not Even Numbers: {1, 3, 5}
Impossible Event: set()
Certain Event: {1, 2, 3, 4, 5, 6}
```


## Random Experiment

A random experiment is a process whose outcome cannot be predicted with certainty before it is performed, even though the set of possible outcomes is known. Random experiments are fundamental in probability theory, as they provide the basis for studying and analyzing uncertain events.

### Characteristics of a Random Experiment

1. **Unpredictable Outcomes**: The outcomes of a random experiment cannot be predicted with certainty before the experiment is conducted.

2. **Repeatability**: The experiment can be repeated under identical conditions, and different trials may yield different outcomes.

3. **Well-Defined Sample Space**: The set of all possible outcomes of the experiment is known and is called the sample space.

4. **Randomness**: The outcome of the experiment is subject to inherent randomness, often influenced by factors that are difficult to control or predict.

```python
import random
```

### Example: Rolling a Six-Sided Die

Rolling a six-sided die is a classic random experiment. The possible outcomes are the numbers 1 to 6, and the outcome of each roll is uncertain.

```python
def roll_die():
    return random.randint(1, 6)

# Perform the experiment multiple times
num_trials = 10
outcomes = [roll_die() for _ in range(num_trials)]

print("Outcomes of rolling a die:", outcomes)
```

```
Outcomes of rolling a die: [5, 5, 6, 1, 2, 2, 2, 2, 6, 1]
```


### Example: Flipping a Coin
Flipping a coin is another random experiment. The possible outcomes are "Heads" and "Tails," and the result is unpredictable.

```python
def flip_coin():
    return random.choice(["Heads", "Tails"])

# Perform the experiment multiple times
num_flips = 10
results = [flip_coin() for _ in range(num_flips)]

print("Results of coin flips:", results)
```

```
Results of coin flips: ['Heads', 'Heads', 'Heads', 'Tails', 'Heads', 'Tails', 'Heads', 'Heads', 'Heads', 'Heads']
```


### Example: Drawing Cards from a Deck
Drawing cards from a standard deck of playing cards is a random experiment. Each draw's outcome (card) is uncertain and can vary.


```python
def draw_card(deck):
    # random.choice samples WITH replacement: the same card can be drawn more than once.
    # Use random.sample(deck, k) instead to draw k distinct cards without replacement.
    return random.choice(deck)

# Create a standard deck of cards
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
values = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King", "Ace"]
deck = [(suit, value) for suit in suits for value in values]

# Perform the experiment by drawing cards
num_draws = 5
drawn_cards = [draw_card(deck) for _ in range(num_draws)]

print("Drawn cards:", drawn_cards)
```

```
Drawn cards: [('Spades', 'Ace'), ('Spades', '4'), ('Hearts', '3'), ('Clubs', 'Queen'), ('Spades', '7')]
```


## Probability: Mathematical Definition (Classical Approach)
Probability deals with the likelihood of events occurring. 

The classical definition of probability applies when the sample space is finite and all outcomes are equally likely. It defines the probability of an event as the ratio of the number of favorable outcomes to the total number of possible outcomes. (This differs from the frequentist approach, which defines probability as the long-run relative frequency of an event over many repeated trials.)

If $S$ is a finite sample space of equally likely outcomes and $E$ is an event, then $P(E)$, the probability of $E$, is the ratio of the number of favorable outcomes $n(E)$ to the total number of outcomes $n(S)$:

$$P(E) = \frac{n(E)}{n(S)}$$
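As a further illustration of the formula, the sketch below enumerates the 36 outcomes of rolling two dice and counts those whose sum is 7. It assumes fair, distinguishable dice so that every ordered pair is equally likely; using `fractions.Fraction` is our own choice so that the result is exact rather than a rounded float.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first die, second die), assumed equally likely
sample_space = set(product(range(1, 7), repeat=2))

# Event E: the two dice sum to 7
event_sum_7 = {outcome for outcome in sample_space if sum(outcome) == 7}

# Classical definition: P(E) = n(E) / n(S)
p_sum_7 = Fraction(len(event_sum_7), len(sample_space))
print(len(event_sum_7), len(sample_space), p_sum_7)  # 6 36 1/6
```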

### Example: Rolling a Six-Sided Die
This example will demonstrate the classical approach to defining probability using the ratio of favorable outcomes to total outcomes in an equally likely sample space.

```python
def calculate_probability(favorable_outcomes, total_outcomes):
    return favorable_outcomes / total_outcomes

# Define the sample space
sample_space = {1, 2, 3, 4, 5, 6}

# Define events
event_even = {2, 4, 6}  # Event: Rolling an even number
event_greater_than_3 = {4, 5, 6}  # Event: Rolling a number greater than 3

# Calculate probabilities using the classical definition
probability_even = calculate_probability(len(event_even), len(sample_space))
probability_greater_than_3 = calculate_probability(len(event_greater_than_3), len(sample_space))

# Print the results
print("Probability of rolling an even number:", probability_even)
print("Probability of rolling a number greater than 3:", probability_greater_than_3)
```

```
Probability of rolling an even number: 0.5
Probability of rolling a number greater than 3: 0.5
```


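### Example: Flipping a Coin

Unlike the die example, this one estimates probabilities empirically from simulated flips (the relative-frequency view). For a fair coin the classical probability of Heads is $1/2$; the simulated frequencies vary from run to run but tend toward $1/2$ as the number of flips grows.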

```python
def flip_coin():
    return random.choice(['Heads', 'Tails'])

# Simulate multiple coin flips
num_flips = 100
heads_count = sum(1 for _ in range(num_flips) if flip_coin() == 'Heads')

print(f"Heads count: {heads_count}")
print(f"Tails count: {num_flips - heads_count}")
print(f"Heads probability: {heads_count / num_flips}")
print(f"Tails probability: {(num_flips - heads_count) / num_flips}")
```

```
Heads count: 51
Tails count: 49
Heads probability: 0.51
Tails probability: 0.49
```
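To see the relative frequency settle down, the sketch below repeats the simulation for increasing numbers of flips. It assumes a fair coin modeled as `random.random() < 0.5` and fixes the seed (an arbitrary choice) so reruns are reproducible; the helper `heads_frequency` is a name introduced here for illustration. By the law of large numbers, the printed frequencies should drift toward $0.5$ as `n` grows.

```python
import random

random.seed(0)  # arbitrary fixed seed so reruns give the same sequence

def heads_frequency(num_flips):
    """Fraction of simulated fair-coin flips that land Heads."""
    heads = sum(1 for _ in range(num_flips) if random.random() < 0.5)
    return heads / num_flips

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} flips: relative frequency of Heads = {heads_frequency(n):.4f}")
```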


## Probability: Apriori Definition (Subjective Approach)

The apriori or subjective definition of probability is based on an individual's beliefs or degree of confidence in the occurrence of an event. It does not rely on equally likely outcomes but instead reflects personal judgments.

If $S$ is the sample space and $E$ is an event, then $P(E)$ is a subjective measure of an individual's degree of belief that $E$ will occur. It ranges from `0` (**impossible**) to `1` (**certain**).

Subjective probabilities of this kind may be informed by qualitative factors, historical data, expert opinions, or other relevant information.

### Example:
Consider a scenario where you're estimating the probability of a basketball player making a free throw based on their historical performance and your subjective judgment.

```python
def subjective_probability(expert_opinion):
    return expert_opinion

# Estimate the probability of the basketball player making a free throw
expert_opinion = 0.75  # An expert believes there's a 75% chance of success
probability_making_free_throw = subjective_probability(expert_opinion)

# Print the result
print("Subjective probability of making a free throw:", probability_making_free_throw)
```

```
Subjective probability of making a free throw: 0.75
```


**The classical definition relies on equally likely outcomes, while the apriori definition is based on individual beliefs or subjective judgments.**

## Mutually Exclusive Events

Mutually exclusive events are events that cannot occur simultaneously. If one of these events happens, the other(s) cannot occur in the same trial or experiment. In other words, if $A$ occurs then $B$ does not occur, and vice versa.

Events $A$ and $B$ are mutually exclusive if and only if their intersection is the empty set, $A \cap B = \emptyset$. Consequently,

$$P(A \cap B) = P(\emptyset) = 0$$


For mutually exclusive events, the probability of their union is the sum of their individual probabilities:

$$P(A \cup B) = P(A) + P(B)$$

### Example: Rolling a Six-Sided Die

- $A$: Rolling an even number, $\{2, 4, 6\}$
- $B$: Rolling an odd number, $\{1, 3, 5\}$

Events $A$ and $B$ are mutually exclusive because an outcome cannot be both even and odd simultaneously. The intersection of $A$ and $B$ is empty:

$$A \cap B = \emptyset$$


```python
# Sample space for rolling a six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

# Define mutually exclusive events
event_A = {2, 4, 6}  # Rolling an even number
event_B = {1, 3, 5}  # Rolling an odd number

# Check if events are mutually exclusive
mutually_exclusive = event_A.isdisjoint(event_B)

# Print the result
if mutually_exclusive:
    print("Events A and B are mutually exclusive.")
else:
    print("Events A and B are not mutually exclusive.")
```

```
Events A and B are mutually exclusive.
```
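Because $A$ and $B$ are disjoint, the addition rule $P(A \cup B) = P(A) + P(B)$ applies. The sketch below checks it by counting outcomes, and contrasts it with an overlapping pair (our own choice: $A$ and $C = \{4, 5, 6\}$), where simply adding the probabilities double-counts the shared outcomes $\{4, 6\}$. The helper `probability` is introduced here for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    # Classical probability in an equally likely sample space
    return Fraction(len(event), len(sample_space))

event_A = {2, 4, 6}  # even
event_B = {1, 3, 5}  # odd (disjoint from A)
print(probability(event_A | event_B), probability(event_A) + probability(event_B))  # 1 1

# Overlapping events: adding P(A) and P(C) double-counts the outcomes {4, 6}
event_C = {4, 5, 6}
print(probability(event_A | event_C), probability(event_A) + probability(event_C))  # 2/3 1
```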


## Axiomatic Definition of Probability

The axiomatic definition of probability provides a formal framework for understanding probability based on a set of axioms. These axioms establish the properties that probabilities must satisfy. 

If $S$ is the sample space and $A \subseteq S$ is any event, then:

1. **Non-Negativity**: The probability of any event is a non-negative real number.

   $$P(A) \geq 0 \text{ for all } A \subseteq S$$



2. **Normalization**: The probability of the entire sample space is `1`.

   $$P(S) = 1$$



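3. **Additivity**: For any countable sequence of **mutually exclusive events** $A_1, A_2, \ldots$ (no two of which have an outcome in common), the probability of their union is the sum of their individual probabilities.

   $$P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)$$

   In particular, the same holds for any finite collection $A_1, \ldots, A_n$ of mutually exclusive events.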
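For a finite sample space, a probability assignment can be written as a mapping from outcomes to weights. The axioms then reduce to checking that every weight is non-negative and that the weights sum to 1, and additivity gives the probability of any event as the sum of the weights of its outcomes. The loaded-die weights and the helper names `satisfies_axioms` and `probability` below are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical loaded die: 6 comes up twice as often as each other face
weights = {1: Fraction(1, 7), 2: Fraction(1, 7), 3: Fraction(1, 7),
           4: Fraction(1, 7), 5: Fraction(1, 7), 6: Fraction(2, 7)}

def satisfies_axioms(pmf):
    """Check non-negativity and normalization for a finite probability assignment."""
    non_negative = all(p >= 0 for p in pmf.values())
    normalized = sum(pmf.values()) == 1
    return non_negative and normalized

def probability(event, pmf):
    # Additivity: P(E) is the sum of the probabilities of its (disjoint) outcomes
    return sum(pmf[outcome] for outcome in event)

print(satisfies_axioms(weights))        # True
print(probability({2, 4, 6}, weights))  # 4/7
```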


### Theorems Based on Axioms:

1. **Complement Rule**: The probability of the complement of an event $A$ is $1$ minus the probability of $A$.
   
   $$P(\bar{A}) = 1 - P(A)$$



2. **Inclusion-Exclusion Principle**: For any two events $A$ and $B$, the probability of their union is the sum of their individual probabilities minus the probability of their intersection.

   $$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
   
   

3. **Subadditivity**: For any events $A$ and $B$, the probability of their union is less than or equal to the sum of their probabilities.

   $$P(A \cup B) \leq P(A) + P(B)$$



4. **Boole's Inequality**: For any finite sequence of events $A_1, A_2, \ldots, A_n$,

   $$P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i)$$



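5. **Bonferroni Inequalities**: Generalizations of Boole's inequality obtained by truncating the inclusion-exclusion formula. With $S_1 = \sum_i P(A_i)$ and $S_2 = \sum_{i<j} P(A_i \cap A_j)$, the first two give

   $$S_1 - S_2 \leq P\left(\bigcup_{i=1}^n A_i\right) \leq S_1$$

   In general, stopping the inclusion-exclusion sum after an odd number of terms gives an upper bound, and stopping after an even number of terms gives a lower bound.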
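The first two Bonferroni bounds can be checked numerically. The three events below, all on one fair die, are an arbitrary choice for illustration; `s1` is the sum of single-event probabilities and `s2` the sum over pairwise intersections, matching $S_1$ and $S_2$ above.

```python
from fractions import Fraction
from itertools import combinations

sample_space = {1, 2, 3, 4, 5, 6}
events = [{2, 4, 6}, {4, 5, 6}, {1, 2, 3, 4}]  # arbitrary example events

def probability(event):
    # Classical probability in an equally likely sample space
    return Fraction(len(event), len(sample_space))

p_union = probability(set().union(*events))
s1 = sum(probability(e) for e in events)
s2 = sum(probability(a & b) for a, b in combinations(events, 2))

# Bonferroni: S1 - S2 <= P(union) <= S1
print(s1 - s2, p_union, s1)  # 5/6 1 5/3
assert s1 - s2 <= p_union <= s1
```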

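```python
# Sample space
sample_space = {1, 2, 3, 4, 5, 6}

# Events
event_A = {2, 4, 6}  # Rolling an even number
event_B = {4, 5, 6}  # Rolling a number greater than 3

def probability(event):
    # Classical definition: favorable outcomes / total outcomes
    return len(event) / len(sample_space)

# Complement rule: P(A^c) = 1 - P(A)
complement_A = sample_space - event_A
print("Complement of event A:", complement_A)
print("P(A^c) =", probability(complement_A), "| 1 - P(A) =", 1 - probability(event_A))

# Direct computation: count the outcomes in A ∪ B
probability_union = probability(event_A.union(event_B))
print("Probability of A ∪ B (direct count):", probability_union)

# Inclusion-Exclusion Principle: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
probability_sum = probability(event_A) + probability(event_B)
probability_intersection = probability(event_A.intersection(event_B))
probability_inclusion_exclusion = probability_sum - probability_intersection
print("Probability of A ∪ B (Inclusion-Exclusion):", probability_inclusion_exclusion)

# Subadditivity: P(A ∪ B) <= P(A) + P(B)
print("Subadditivity holds:", probability_union <= probability_sum)
```

```
Complement of event A: {1, 3, 5}
P(A^c) = 0.5 | 1 - P(A) = 0.5
Probability of A ∪ B (direct count): 0.6666666666666666
Probability of A ∪ B (Inclusion-Exclusion): 0.6666666666666667
Subadditivity holds: True
```

The two values of $P(A \cup B)$ differ only in the last digit because of floating-point rounding; both equal $\frac{2}{3}$.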


## Conditional Probability

Conditional probability is a fundamental concept in probability theory that deals with the probability of an event occurring given that another event has already occurred. It quantifies the likelihood of one event happening under the condition that another event is known to have occurred.

The conditional probability of event $A$ given event $B$ is denoted $P(A|B)$ and, provided $P(B) > 0$, is calculated as follows:

$$ P(A|B) = \frac{P(A \cap B)}{P(B)}$$

### Properties and Interpretation:

- Conditional probability allows us to update our beliefs about an event based on new information.


- If events $A$ and $B$ are independent (and $P(B) > 0$), then knowing that $B$ occurred does not change the probability of $A$:

$$P(A|B) = P(A)$$


- **Bayes' theorem** is an important result that "reverses" a conditional probability, expressing $P(A|B)$ in terms of $P(B|A)$. It is widely used in statistics and machine learning.

### Real-Life Application:

Conditional probability is a powerful tool for making informed decisions in various fields. It has numerous applications, including:
- **Medical diagnosis**: Assessing the probability of a disease given certain symptoms.
- **Finance**: Estimating the probability of default for a loan given a credit score.
- **Weather forecasting**: Predicting the probability of rain based on current weather conditions.

### Example: Conditional Probability of Drawing Cards
Consider a deck of playing cards. Let's define the following events:

- $A$: Drawing a red card $\heartsuit$ or $\diamondsuit$.
- $B$: Drawing a heart card $\heartsuit$.

We want to find the probability of drawing a red card given that the drawn card is a heart. Because each suit contains 13 of the 52 cards, the four suits are equally likely, so the code below models a draw by its suit alone.

```python
# Sample space (deck of cards)
sample_space = {"\u2665", "\u2666", "\u2663", "\u2660"}  # Hearts, Diamonds, Clubs, Spades
print(f"Sample Space: {sample_space}")

# Define events
event_A = {"\u2665", "\u2666"}
print(f"\nEvent A: {event_A} (Red cards)")

event_B = {"\u2665"}
print(f"Event B: {event_B} (Heart cards)")

# Calculate conditional probability P(A|B)
probability_A_given_B = len(event_A.intersection(event_B)) / len(event_B)

# Print the result
print("\nConditional probability of drawing a red card given a heart card:", probability_A_given_B)
```

```
Sample Space: {'♣', '♥', '♦', '♠'}

Event A: {'♥', '♦'} (Red cards)
Event B: {'♥'} (Heart cards)

Conditional probability of drawing a red card given a heart card: 1.0
```


### Example: Conditional Probability and Dice Rolls
Consider rolling two six-sided dice. Let's find the probability that the sum of the rolls is 7, given that the first die shows a 3.

```python
# Sample space for two dice rolls
sample_space = {(i, j) for i in range(1, 7) for j in range(1, 7)}
print(f"Sample Space: {sample_space}")

# Event A: First die shows a 3
event_A = {(3, j) for j in range(1, 7)}
print(f"\nEvent A: First die shows a 3 - {event_A}")

# Event B: Sum of rolls is 7
event_B = {(i, j) for i, j in sample_space if i + j == 7}
print(f"\nEvent B: Sum of rolls is 7 - {event_B}")

# Calculate conditional probability P(B|A)
probability_B_given_A = len(event_B.intersection(event_A)) / len(event_A)

# Print the result
print("\nConditional probability of sum being 7, given the first die shows a 3:", probability_B_given_A)
```

```
Sample Space: {(3, 4), (4, 3), (3, 1), (5, 4), (4, 6), (5, 1), (2, 2), (1, 6), (2, 5), (1, 3), (6, 2), (6, 5), (4, 2), (4, 5), (3, 3), (5, 6), (3, 6), (5, 3), (2, 4), (1, 2), (2, 1), (1, 5), (6, 1), (6, 4), (3, 2), (4, 1), (3, 5), (5, 2), (4, 4), (5, 5), (1, 1), (1, 4), (2, 3), (2, 6), (6, 6), (6, 3)}

Event A: First die shows a 3 - {(3, 4), (3, 1), (3, 3), (3, 6), (3, 2), (3, 5)}

Event B: Sum of rolls is 7 - {(3, 4), (4, 3), (6, 1), (1, 6), (2, 5), (5, 2)}

Conditional probability of sum being 7, given the first die shows a 3: 0.16666666666666666
```
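The same conditional probability can also be estimated by simulation: roll two dice many times, keep only the trials in which the conditioning event occurred (first die shows 3), and measure how often the sum is 7 among those. This is a sketch with an arbitrary seed and trial count, assuming fair dice; the estimate should land near $1/6 \approx 0.1667$.

```python
import random

random.seed(42)  # arbitrary seed for reproducibility
num_trials = 100_000

conditioned_trials = 0  # trials where the first die shows a 3
sum_7_and_first_3 = 0   # among those, trials where the sum is 7

for _ in range(num_trials):
    first, second = random.randint(1, 6), random.randint(1, 6)
    if first == 3:
        conditioned_trials += 1
        if first + second == 7:
            sum_7_and_first_3 += 1

# Relative frequency within the conditioned trials estimates P(sum is 7 | first die is 3)
estimate = sum_7_and_first_3 / conditioned_trials
print("Estimated P(sum is 7 | first die is 3):", estimate)
```

Discarding the trials where the condition fails mirrors the definition $P(B|A) = P(A \cap B) / P(A)$: the denominator counts only outcomes in $A$.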


## Bayes' Theorem

Bayes' Theorem is a fundamental concept in probability theory that provides a way to update our beliefs about the probability of an event based on new evidence or information. It relates the conditional probability of an event $A$ given event $B$ to the conditional probability of event $B$ given event $A$.

Bayes' Theorem can be stated as follows:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

Here:
- $P(A|B)$ is the conditional probability of event $A$ given event $B$.
- $P(B|A)$ is the conditional probability of event $B$ given event $A$.
- $P(A)$ is the prior probability of event $A$.
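- $P(B)$ is the marginal probability of event $B$ (the evidence), which must be nonzero.


Rearranging the definition of conditional probability, $P(A|B) = \frac{P(A \cap B)}{P(B)}$, gives the **Multiplication Rule**: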

$$P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)$$


### Example:

Consider a medical scenario where we want to find the probability that a person has a disease given that a diagnostic test is positive. The disease is rare, affecting 0.1% of the population, and the test is 99% accurate for both positive and negative results.

- $A$: Person has the disease.
- $B$: Diagnostic test is positive.

We know the following probabilities:
- $P(A) = 0.001$ (prior probability of having the disease).
- $P(B|A) = 0.99$ (probability of testing positive given having the disease).
- $P(\neg A) = 1 - P(A) = 0.999$ (probability of not having the disease).
- $P(B|\neg A) = 0.01$ (probability of testing positive given not having the disease).

Using Bayes' Theorem, we can calculate $P(A|B)$, the probability of having the disease given a positive test result.

```python
# Given probabilities
p_disease = 0.001  # Prior probability of having the disease
p_test_positive_given_disease = 0.99  # Probability of testing positive given having the disease
p_test_positive_given_no_disease = 0.01  # Probability of testing positive given not having the disease
```

```python
# Calculate conditional probability P(Disease|Test Positive) using Bayes' theorem
p_test_positive = (p_disease * p_test_positive_given_disease) + ((1 - p_disease) * p_test_positive_given_no_disease)
p_disease_given_test_positive = (p_disease * p_test_positive_given_disease) / p_test_positive

# Print the result
print("Conditional probability of having the disease given a positive test result:", p_disease_given_test_positive)
```

```
Conditional probability of having the disease given a positive test result: 0.09016393442622951
```


```python
# Defining Bayes' Theorem function
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    p_not_a = 1 - p_a
    p_b = (p_a * p_b_given_a) + (p_not_a * p_b_given_not_a)
    p_a_given_b = (p_b_given_a * p_a) / p_b
    return p_a_given_b

# Calculating probability using Bayes' Theorem function
probability_disease_given_positive = bayes_theorem(p_disease, p_test_positive_given_disease, p_test_positive_given_no_disease)

# Print the result
print("Probability of having the disease given a positive test result (Bayes' Theorem):", probability_disease_given_positive)
```

```
Probability of having the disease given a positive test result (Bayes' Theorem): 0.09016393442622951
```
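A result of about 9% can look surprising for a 99% accurate test. Counting people instead of probabilities makes it concrete. The sketch assumes a hypothetical population of 1,000,000 people (the population size is our choice; the rates are the ones above): because the disease is rare, the healthy people who test falsely positive far outnumber the sick people who test truly positive.

```python
from fractions import Fraction

population = 1_000_000              # hypothetical population size
sick = population // 1000           # prior of 0.001 -> 1,000 people
healthy = population - sick         # 999,000 people

true_positives = sick * 99 // 100   # 99% of sick people test positive -> 990
false_positives = healthy // 100    # 1% of healthy people test positive -> 9,990

# P(disease | positive) = true positives / all positives
p_disease_given_positive = Fraction(true_positives, true_positives + false_positives)
print(true_positives, false_positives, p_disease_given_positive, float(p_disease_given_positive))
```

The exact answer is $\frac{990}{10980} = \frac{11}{122}$, the same value the Bayes' Theorem computation gives.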


## Independent Events

Independent events are events that do not influence each other. The occurrence or non-occurrence of one event does not affect the probability of the other event happening. In other words, the outcome of one event provides no information about the outcome of the other event.

The multiplication rule for independent events states that the probability of two or more independent events occurring together is the product of their individual probabilities. Events $A$ and $B$ are considered independent if and only if the occurrence of one event does not change the probability of the other event:

$$P(A \cap B) = P(A) \cdot P(B)$$

In this case, knowing that event $B$ has occurred does not affect the probability of event $A$, and vice versa.

### Example:
- Flipping a fair coin twice: The outcome of the first flip does not affect the outcome of the second flip. These flips are independent events.


- Rolling a fair six-sided die and drawing a card from a well-shuffled deck: The result of the die roll does not influence the card drawn. These actions are independent events.
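Independence can be checked exactly for the coin-and-die example by enumerating the joint sample space of (coin, die) pairs and comparing $P(A \cap B)$ with $P(A) \cdot P(B)$. The sketch assumes a fair coin and a fair die, so all 12 pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

# Joint sample space: 2 coin faces x 6 die faces, all assumed equally likely
sample_space = set(product(["Heads", "Tails"], range(1, 7)))

def probability(event):
    return Fraction(len(event), len(sample_space))

event_heads = {o for o in sample_space if o[0] == "Heads"}  # A: coin shows Heads
event_six = {o for o in sample_space if o[1] == 6}          # B: die shows 6

p_joint = probability(event_heads & event_six)
p_product = probability(event_heads) * probability(event_six)
print(p_joint, p_product, p_joint == p_product)  # 1/12 1/12 True
```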

```python
# In this example, we simulate coin flips and die rolls to calculate the probability of independent events 
# occurring together (coin flip: Heads, die roll: 6)

def flip_coin():
    return random.choice(["Heads", "Tails"])

def roll_die():
    return random.randint(1, 6)


# Perform coin flips and die rolls
num_trials = 10000
num_independent_events = 0  # trials in which both events (Heads and 6) occur

for _ in range(num_trials):
    coin_flip_result = flip_coin()
    die_roll_result = roll_die()
    
    if coin_flip_result == "Heads" and die_roll_result == 6:
        num_independent_events += 1

# Calculate the probability of independent events occurring together
probability_independent_events = num_independent_events / num_trials

# Print the result
print("Probability of independent events (coin flip: Heads, die roll: 6):", probability_independent_events)
```

```
Probability of independent events (coin flip: Heads, die roll: 6): 0.0836
```

The estimate is close to the theoretical value $P(\text{Heads}) \cdot P(6) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12} \approx 0.0833$.


### If events $A$ and $B$ are independent, the complements of these events, $\neg A$ and $\neg B$, are also independent.

In other words, knowing that $A$ did not occur does not change the probability that $B$ did not occur.
To prove this, we need to show that:

$$P(\neg A \cap \neg B) = P(\neg A) \cdot P(\neg B)$$

#### Proof:

Given that $A$ and $B$ are independent, we know that:

$$ P(A \cap B) = P(A) \cdot P(B) $$

Since $A$ and $\neg A$ are complementary events, we have:

$$ P(A) + P(\neg A) = 1 $$

$$ P(\neg A) = 1 - P(A) $$


Similarly, for events $B$ and $\neg B$:

$$ P(B) + P(\neg B) = 1 $$

$$ P(\neg B) = 1 - P(B) $$


Now, we want to calculate $P(\neg A \cap \neg B)$, the probability of both $\neg A$ and $\neg B$ occurring:

By De Morgan's law, $\neg A \cap \neg B = \neg(A \cup B)$, so

$$ P(\neg A \cap \neg B) = P(\neg{(A \cup B)}) = 1 - P(A \cup B) $$


Using the inclusion-exclusion principle:

$$ P(A \cup B) = P(A) + P(B) - P(A \cap B) $$


Substitute the value of $P(A \cap B)$ from the independence of $A$ and $B$:

$$ P(A \cup B) = P(A) + P(B) - P(A) \cdot P(B) $$


Therefore,

$$ P(\neg A \cap \neg B) = 1 - (P(A) + P(B) - P(A) \cdot P(B)) $$


Expanding the bracket and factoring:

$$ P(\neg A \cap \neg B) = 1 - P(A) - P(B) + P(A) \cdot P(B) $$

$$ P(\neg A \cap \neg B) = (1 - P(A)) \cdot (1 - P(B)) $$

Finally, substitute $P(\neg A) = 1 - P(A)$ and $P(\neg B) = 1 - P(B)$:

$$ P(\neg A \cap \neg B) = P(\neg A) \cdot P(\neg B) $$


This concludes the proof that if events $A$ and $B$ are independent, then their complements $\neg A$ and $\neg B$ are also independent. This property holds because of the algebraic relationship between probabilities of complements and the definition of independent events.
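A quick numerical check of this result, using two independent events on a pair of fair dice chosen only for illustration ($A$: the first die is even; $B$: the second die shows 5 or 6). Both the original events and their complements satisfy the product rule exactly.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def probability(event):
    return Fraction(len(event), len(sample_space))

event_A = {(i, j) for i, j in sample_space if i % 2 == 0}  # first die even
event_B = {(i, j) for i, j in sample_space if j >= 5}      # second die is 5 or 6
not_A = sample_space - event_A
not_B = sample_space - event_B

print(probability(event_A & event_B) == probability(event_A) * probability(event_B))  # True
print(probability(not_A & not_B), probability(not_A) * probability(not_B))            # 1/3 1/3
```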

## Total Independence & Mutual Independence

Total independence and mutual independence are related concepts in probability theory, but they have slightly different meanings and implications.


**Total independence**, also known as **pairwise independence**, refers to a situation where each pair of events in a collection of events is independent of each other. In other words, if we have a set of events $(A_1, A_2, \ldots, A_n)$, they are considered totally independent if the occurrence or non-occurrence of any event $A_i$ does not provide any information about the occurrence of any other event $A_j$ where $i \neq j$.

Mathematically, for total independence, the following condition must hold for every pair of events $A_i$ and $A_j$ where $i \neq j$:

$$ P(A_i \cap A_j) = P(A_i) \cdot P(A_j) $$


**Total independence** implies that events are independent in a pairwise manner, but it does not necessarily imply that the events are mutually independent when considering larger combinations of events.


**Mutual independence**, also known as **joint independence**, is a stronger concept. Events are considered mutually independent if the occurrence or non-occurrence of any subset of events in a collection does not provide any information about the occurrence of any other event in the collection.

Mathematically, events $A_1, A_2, \ldots, A_n$ are mutually independent if and only if the following condition holds for every subset of the events:

$$ P\left(\bigcap_{i \in I} A_i\right) = \prod_{i \in I} P(A_i)$$

where $I$ ranges over all subsets of $\{1, 2, \ldots, n\}$ with at least two elements.

In other words, **mutual independence** extends the concept of independence to any combination of events, not just pairwise combinations.


The key difference between total independence and mutual independence lies in the scope of their independence. Total independence only considers the pairwise independence of events, while mutual independence considers the independence of any combination of events, including subsets larger than two.

For example, consider three events $A$, $B$, and $C$. If these events are totally independent, then $A$ is independent of $B$, $A$ is independent of $C$, and $B$ is independent of $C$. Mutual independence requires these three pairwise conditions **and** the triple condition $P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$.

In summary, total independence considers independence between pairs of events, while mutual independence considers independence among any combination of events, including larger subsets.
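A standard example shows that pairwise (total) independence does not imply mutual independence. Flip two fair coins and let $A$ be "first coin is Heads", $B$ be "second coin is Heads", and $C$ be "both coins show the same face". The sketch below enumerates the four equally likely outcomes and tests both conditions.

```python
from fractions import Fraction
from itertools import combinations, product

sample_space = set(product("HT", repeat=2))  # the four outcomes HH, HT, TH, TT

def probability(event):
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if o[0] == "H"}   # first coin Heads
B = {o for o in sample_space if o[1] == "H"}   # second coin Heads
C = {o for o in sample_space if o[0] == o[1]}  # both coins match

# Every pair satisfies P(X ∩ Y) = P(X) P(Y)
pairwise = all(probability(x & y) == probability(x) * probability(y)
               for x, y in combinations([A, B, C], 2))

# The triple condition fails: P(A ∩ B ∩ C) = 1/4, but P(A) P(B) P(C) = 1/8
mutual = probability(A & B & C) == probability(A) * probability(B) * probability(C)
print(pairwise, mutual)  # True False
```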

## Theorem of Total Probability

The **Theorem of Total Probability** is a fundamental concept in probability theory that provides a way to calculate the probability of an event $B$ by considering all possible ways in which $B$ can occur, based on a partition of the sample space. It is particularly useful when dealing with complex scenarios that can be broken down into simpler cases.


Let $B_1, B_2, \ldots, B_n$ be a partition of the sample space $S$ (i.e., the events $B_i$ are mutually exclusive and their union covers the entire sample space). 
<img src='imgs/total_probability_theorem.jpg' alt='total_probability_theorem.jpg' width=150>

Then, for any event $A$, the theorem of total probability states:

$$ P(A) = \sum_{i=1}^n P(A \cap B_i) = \sum_{i=1}^n P(B_i) \cdot P(A|B_i) $$

In other words, the probability of event $A$ is the sum of the probabilities of $A$ occurring together with each block $B_i$ of the partition.

The theorem of total probability can be thought of as computing $P(A)$ as a weighted average of the conditional probabilities $P(A|B_i)$, with weights $P(B_i)$. It breaks down the problem of finding $P(A)$ into simpler cases, one for each block of the partition.

### Example:

Consider an example where we have two boxes, Box 1 and Box 2, containing colored balls. Box 1 contains 4 red balls and 6 blue balls, while Box 2 contains 3 red balls and 7 blue balls. You choose one of the two boxes at random, each with probability $\frac{1}{2}$, and then randomly draw a ball from that box. Let $A$ be the event that you draw a red ball, and let $B_1$ and $B_2$ be the events that you choose Box 1 and Box 2, respectively.

We can partition the sample space as follows:
- $B_1$: Choose Box 1.
- $B_2$: Choose Box 2.

Using the theorem of total probability, we can calculate $P(A)$ as:

$$ P(A) = P(A \cap B_1) + P(A \cap B_2) $$


Calculate each term:

$$P(A \cap B_1) = P(A|B_1) \cdot P(B_1) = \frac{4}{10} \cdot \frac{1}{2} = \frac{2}{10}$$


$$P(A \cap B_2) = P(A|B_2) \cdot P(B_2) = \frac{3}{10} \cdot \frac{1}{2} = \frac{3}{20}$$


Summing the terms:

$$ P(A) = \frac{2}{10} + \frac{3}{20} = \frac{7}{20} $$


In this example, we calculate the probability of drawing a red ball using the theorem of total probability by considering the two blocks of the partition: choosing Box 1 and choosing Box 2.

```python
# Probabilities
p_box1 = 0.5
p_box2 = 0.5
p_red_given_box1 = 4 / 10
p_red_given_box2 = 3 / 10

# Calculate using the theorem of total probability
p_red = (p_red_given_box1 * p_box1) + (p_red_given_box2 * p_box2)

# Print the result
print("Probability of drawing a red ball:", p_red)
```

```
Probability of drawing a red ball: 0.35
```
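The value $0.35$ can also be checked by simulating the two-stage experiment directly: pick a box uniformly at random, then draw one ball from it. The seed and the trial count below are arbitrary choices, so the printed estimate only approximates $0.35$.

```python
import random

random.seed(1)  # arbitrary seed for reproducibility
boxes = {
    "Box 1": ["red"] * 4 + ["blue"] * 6,
    "Box 2": ["red"] * 3 + ["blue"] * 7,
}

num_trials = 100_000
red_draws = 0
for _ in range(num_trials):
    box = random.choice(list(boxes))  # stage 1: choose a box, each with probability 1/2
    ball = random.choice(boxes[box])  # stage 2: draw a ball uniformly from that box
    if ball == "red":
        red_draws += 1

estimate = red_draws / num_trials
print("Estimated P(red):", estimate)
```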


### Expressing Bayes' Theorem using the Theorem of Total Probability:

Bayes' Theorem can be expressed in terms of the Theorem of Total Probability when dealing with events and their partitions. Recall the Theorem of Total Probability, which states that for any event $A$ and a partition of the sample space $B_1, B_2, \ldots, B_n$:

$$ P(A) = \sum_{i=1}^n P(A \cap B_i) = \sum_{i=1}^n P(B_i) \cdot P(A|B_i) $$

Now let $A$ be an event with $P(A) > 0$, and let $B_1, B_2, \ldots, B_n$ form a partition of the sample space. Bayes' Theorem relates the conditional probability $P(B_i|A)$ to the conditional probability $P(A|B_i)$:


$$ P(B_i|A) = \frac{P(B_i) \cdot P(A|B_i)}{P(A)}\ ,i=1, 2, \ldots, n$$


The expression of Bayes' Theorem in terms of the Theorem of Total Probability can be derived from the above two equations:


$$ P(B_i|A) = \frac{P(B_i) \cdot P(A|B_i)}{\sum_{j=1}^n P(B_j) \cdot P(A|B_j)}\ ,i=1, 2, \ldots, n $$


This is also known as the **Theorem of Probability of Causes**, since each $B_i$ can be viewed as a possible cause of the observed event $A$.

Bayes' Theorem, which relates conditional probabilities of events, can be derived and expressed using the Theorem of Total Probability, which deals with the probability of events within partitions of the sample space. The two theorems are closely related and provide a powerful framework for updating probabilities based on new evidence or information.
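The theorem of probability of causes translates directly into a function over a partition. In the sketch below, the function name `posterior_over_partition` is our own; it takes the priors $P(B_i)$ and likelihoods $P(A|B_i)$ as lists, computes $P(A)$ by total probability, and returns every $P(B_i|A)$. Applied to the two-box example, it gives the probability that a red ball came from each box.

```python
import math
from fractions import Fraction

def posterior_over_partition(priors, likelihoods):
    """Return [P(B_i | A)] given priors P(B_i) and likelihoods P(A | B_i)."""
    if not math.isclose(sum(priors), 1):
        raise ValueError("priors must sum to 1 (the B_i must form a partition)")
    joint = [prior * lik for prior, lik in zip(priors, likelihoods)]  # P(B_i) * P(A | B_i)
    p_a = sum(joint)  # theorem of total probability
    if p_a == 0:
        raise ValueError("P(A) is zero, so P(B_i | A) is undefined")
    return [j / p_a for j in joint]

# Two-box example: each box chosen with probability 1/2; A = "draw a red ball"
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(4, 10), Fraction(3, 10)]
print(posterior_over_partition(priors, likelihoods))  # [Fraction(4, 7), Fraction(3, 7)]
```

With $n = 2$ and the partition $\{S, \neg S\}$, the same function reproduces the two-event form of Bayes' Theorem used in the spam example below.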

### Example
Consider a scenario involving email spam filtering. Suppose we have the following information:

- $P(S)$: The prior probability that an email is spam (e.g., $0.2$, indicating $20\%$ of incoming emails are spam). Here $S$ denotes the event "the email is spam", not the sample space.
- $P(F|S)$: The probability that the filter flags an email as spam given that it is spam (e.g., $0.95$, a $95\%$ true positive rate).
- $P(F|\neg S)$: The probability that the filter flags an email as spam given that it is not spam (e.g., $0.02$, a $2\%$ false positive rate).

We want to calculate $P(S|F)$, the probability that an email is spam given that it has been classified as spam by the filter.

```python
def probability_of_cause_given_effect(prior_prob, prob_effect_given_cause, prob_effect_given_not_cause):
    total_prob_effect = (prior_prob * prob_effect_given_cause) + ((1 - prior_prob) * prob_effect_given_not_cause)
    prob_cause_given_effect = (prior_prob * prob_effect_given_cause) / total_prob_effect
    return prob_cause_given_effect

# Given probabilities
p_spam = 0.2
p_filter_classifies_spam_given_spam = 0.95
p_filter_classifies_spam_given_non_spam = 0.02

# Calculate using the theorem of probability of causes (Bayes' Theorem)
probability_spam_given_classified_as_spam = probability_of_cause_given_effect(
    p_spam, p_filter_classifies_spam_given_spam, p_filter_classifies_spam_given_non_spam
)

# Print the result
print("Probability that an email is spam given it's classified as spam:", probability_spam_given_classified_as_spam)
```

```
Probability that an email is spam given it's classified as spam: 0.9223300970873786
```



## Probability Distribution

A probability distribution assigns probabilities to each outcome in the sample space. There are two types of probability distributions:

- **Discrete Probability Distribution**: Assigns probabilities to individual outcomes. It's typically used for situations with a countable number of outcomes, like rolling a die.

- **Continuous Probability Distribution**: Assigns probabilities to ranges of outcomes. It's used for situations where outcomes can take any value within a range, like measuring height; the probability of any single exact value is $0$.
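For a continuous distribution, probabilities are areas under the density, usually computed through the cumulative distribution function (CDF) $F$, as $P(a \leq X \leq b) = F(b) - F(a)$. A minimal sketch using the standard library's `statistics.NormalDist` (Python 3.8+), with a standard normal chosen only for illustration:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # chosen for illustration

# P(-1 <= X <= 1) = F(1) - F(-1), where F is the CDF
p_within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(round(p_within_one_sd, 4))  # 0.6827

# A single point has zero width, so P(X = 0) = F(0) - F(0) = 0
print(standard_normal.cdf(0) - standard_normal.cdf(0))  # 0.0
```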


Some common probability distributions include:

- **Binomial Distribution**: Describes the number of successes in a fixed number of independent Bernoulli trials.

- **Poisson Distribution**: Models the number of events occurring in a fixed interval of time or space.

- **Normal (Gaussian) Distribution**: Characterized by its bell-shaped curve and is commonly used to model natural phenomena.

- **Exponential Distribution**: Describes the time between events in a Poisson process.
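The probability mass functions of the two discrete distributions in this list are short enough to write out with the standard library: binomial $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ and Poisson $P(X = k) = e^{-\lambda} \lambda^k / k!$. The helper names and parameter values below are chosen only for illustration; `math.comb` requires Python 3.8+.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) events when the average count per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Probability of exactly 5 Heads in 10 fair coin flips: 252 / 1024
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# Each PMF sums to (approximately) 1 over its support
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
print(sum(poisson_pmf(k, 4.0) for k in range(50)))  # tail beyond k = 49 is negligible
```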



## Random Variables

A random variable is a function that assigns a numerical value to each outcome in the sample space. It can be discrete (taking on distinct values) or continuous (taking on any value within a range). The probability distribution of a random variable is described by a probability mass function (for discrete) or a probability density function (for continuous).
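To make "a random variable is a function on the sample space" concrete, the sketch below defines $X$ as the sum of two fair dice (an illustrative choice, with the hypothetical function name `sum_of_dice`), then builds its probability mass function by counting how many of the 36 equally likely outcomes map to each value.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def sum_of_dice(outcome):
    """Random variable X: maps an outcome (die1, die2) to the sum of the dice."""
    return sum(outcome)

# PMF: P(X = x) = (number of outcomes mapped to x) / 36
counts = Counter(sum_of_dice(outcome) for outcome in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

print(pmf[7])                                      # 1/6
print(sum(pmf.values()))                           # 1
print(sum(p for x, p in pmf.items() if x >= 10))   # 1/6
```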


