```{contents}

```


## Rules of Probability 

- **Combining and Relating Events**

These rules let us compute the probability of a compound event from the probabilities of simpler events.

### Addition Rule

- **"OR" events - Union of Events**

This rule helps calculate the probability that at least one of two (or more) events occurs.

* **Mutually Exclusive Events (Disjoint Events):** Events that cannot occur at the same time. Their intersection is empty.
    * *Formal Definition:* $A \cap B = \emptyset$
    * *Formula:* $P(A \cup B) = P(A) + P(B)$
    * *Deeper Example:* In a deck of cards, drawing a King (Event A) and drawing a Queen (Event B) are mutually exclusive. You cannot draw a card that is both a King and a Queen.
        $P(\text{King or Queen}) = P(\text{King}) + P(\text{Queen}) = \frac{4}{52} + \frac{4}{52} = \frac{8}{52} = \frac{2}{13}$

* **Non-Mutually Exclusive Events (Overlapping Events):** Events that can occur at the same time. Their intersection is not empty.
    * *Formal Definition:* $A \cap B \neq \emptyset$
    * *Formula:* $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
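        * Subtracting $P(A \cap B)$ avoids double-counting the outcomes common to both A and B. The formula holds for any two events; when A and B are mutually exclusive, $P(A \cap B) = 0$ and it reduces to the simpler sum above.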
    * *Deeper Example:* What is the probability of drawing a face card (Jack, Queen, King) OR a heart from a standard deck?
        * Event A: Drawing a Face Card. There are 12 face cards (3 per suit). $P(A) = \frac{12}{52}$
        * Event B: Drawing a Heart. There are 13 hearts. $P(B) = \frac{13}{52}$
        * Event $A \cap B$: Drawing a face card *and* a heart. These are the King of Hearts, Queen of Hearts, Jack of Hearts. There are 3 such cards. $P(A \cap B) = \frac{3}{52}$
        * $P(\text{Face Card or Heart}) = P(A) + P(B) - P(A \cap B) = \frac{12}{52} + \frac{13}{52} - \frac{3}{52} = \frac{25 - 3}{52} = \frac{22}{52} = \frac{11}{26}$
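
Both card examples can be checked by brute-force enumeration. The sketch below is illustrative only: the deck encoding and the helper `prob` are assumptions introduced here, and every card is assumed equally likely on a single draw.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs (hypothetical encoding).
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))


def prob(event):
    """Probability of an event (a set of cards) for one uniform random draw."""
    return Fraction(len(event), len(DECK))


kings = {card for card in DECK if card[0] == "K"}
queens = {card for card in DECK if card[0] == "Q"}
faces = {card for card in DECK if card[0] in {"J", "Q", "K"}}
hearts = {card for card in DECK if card[1] == "hearts"}

# Mutually exclusive: the intersection is empty, so the probabilities add.
assert not (kings & queens)
p_king_or_queen = prob(kings) + prob(queens)

# Overlapping: subtract the intersection to avoid double-counting.
p_face_or_heart = prob(faces) + prob(hearts) - prob(faces & hearts)

print(p_king_or_queen)  # 2/13
print(p_face_or_heart)  # 11/26
```

Counting the union directly, `prob(faces | hearts)`, gives the same value, which is a quick way to sanity-check the subtraction.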

### Multiplication Rule 

- **"AND" events - Intersection of Events**

This rule calculates the probability that two or more events all occur.

* **Independent Events:** The occurrence of one event does not influence the probability of the other event occurring.
    * *Formal Definition:* $P(B|A) = P(B)$, for $P(A) > 0$ (knowing that A occurred leaves the probability of B unchanged).
    * *Formula:* $P(A \cap B) = P(A) \times P(B)$
    * *Deeper Example:* Rolling a die and flipping a coin. What is the probability of rolling a 6 AND getting a Head?
        * $P(\text{6}) = \frac{1}{6}$
        * $P(\text{Head}) = \frac{1}{2}$
        * These events are independent.
        * $P(\text{6 and Head}) = P(\text{6}) \times P(\text{Head}) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}$

* **Dependent Events:** The occurrence of one event *does* affect the probability of the other event occurring.
    * *Formula:* $P(A \cap B) = P(A) \times P(B|A)$
        * This can be extended for more than two events: $P(A \cap B \cap C) = P(A) \times P(B|A) \times P(C|A \cap B)$
    * *Deeper Example:* Drawing two cards *without replacement* from a deck. What is the probability of drawing two Queens?
        * Event A: Drawing a Queen on the first draw. $P(A) = \frac{4}{52}$
        * Event B: Drawing a Queen on the second draw.
        * Given that the first card was a Queen and was not replaced, 3 Queens remain among 51 cards.
        * $P(B|A) = \frac{3}{51}$
        * $P(\text{Queen and Queen}) = P(A) \times P(B|A) = \frac{4}{52} \times \frac{3}{51} = \frac{12}{2652} = \frac{1}{221}$
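
The two multiplication-rule examples can also be verified by listing every equally likely outcome. This is a minimal sketch under stated assumptions: the rank encoding (index 11 standing for Queen) and the variable names are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

# Independent events: one die roll and one coin flip, 12 equally likely outcomes.
outcomes = list(product(range(1, 7), ["H", "T"]))
six_and_head = [o for o in outcomes if o == (6, "H")]
p_six_and_head = Fraction(len(six_and_head), len(outcomes))

# Dependent events: two cards drawn without replacement.
# Only ranks matter here, so each card is (rank index, copy index).
QUEEN = 11  # hypothetical encoding: rank index 11 stands for Queen
deck = [(rank, copy) for rank in range(13) for copy in range(4)]
ordered_draws = list(permutations(deck, 2))  # 52 * 51 ordered pairs
two_queens = [
    (first, second)
    for first, second in ordered_draws
    if first[0] == QUEEN and second[0] == QUEEN
]
p_two_queens = Fraction(len(two_queens), len(ordered_draws))

print(p_six_and_head)  # 1/12
print(p_two_queens)    # 1/221
```

Using `permutations` rather than `product` is what models drawing without replacement: the same card can never appear in both positions.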

### Conditional Probability

- **Probability Under New Information**

Conditional probability quantifies how the probability of an event changes once we know that another event has occurred. Conditioning narrows the sample space to the outcomes consistent with that knowledge.

* **Formula:** $P(B|A) = \frac{P(A \cap B)}{P(A)}$, defined when $P(A) > 0$
    * The "condition" is that event A has already happened. We are now considering only the outcomes within A.
    * $P(A \cap B)$ represents the outcomes where *both* A and B occur.
    * $P(A)$ normalizes this by the size of the new (reduced) sample space (event A).
* **Intuition:** Imagine you're looking at a subset of your original sample space. $P(B|A)$ tells you the proportion of that subset where B also occurs.
* **Deeper Example:** A group of 100 students: 60 study Math, 40 study Science, and 20 study both.
    * $P(\text{Math}) = 60/100 = 0.6$
    * $P(\text{Science}) = 40/100 = 0.4$
    * $P(\text{Math and Science}) = 20/100 = 0.2$
    * What is the probability that a student studies Science GIVEN that they study Math? ($P(\text{Science}|\text{Math})$)
        * We are now only considering the 60 students who study Math. Of those 60, 20 also study Science.
        * Using the formula: $P(\text{Science}|\text{Math}) = \frac{P(\text{Math and Science})}{P(\text{Math})} = \frac{0.2}{0.6} = \frac{1}{3} \approx 0.333$
        * This makes sense: among the 60 math students, 20 also do science. $20/60 = 1/3$.
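
The student example translates directly into set operations. In the sketch below, the student IDs and the helper names `prob` and `cond_prob` are assumptions made for illustration; only the group sizes (100, 60, 40, 20) come from the example.

```python
from fractions import Fraction

# Hypothetical student IDs chosen so the group sizes match the example.
students = set(range(100))
studies_math = set(range(0, 60))      # 60 students
studies_science = set(range(40, 80))  # 40 students; IDs 40..59 study both


def prob(event):
    """Probability that a uniformly chosen student belongs to the event."""
    return Fraction(len(event), len(students))


def cond_prob(b, a):
    """P(B | A) = P(A and B) / P(A); undefined when P(A) = 0."""
    if not a:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(a)


p_science_given_math = cond_prob(studies_science, studies_math)
p_math_given_science = cond_prob(studies_math, studies_science)

print(p_science_given_math)  # 1/3
print(p_math_given_science)  # 1/2
```

The two results differ because the conditioning event changes the denominator: 20 of 60 Math students study Science, while 20 of 40 Science students study Math.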

### Bayes’ Theorem 

- **Updating Beliefs with Evidence**

Bayes' Theorem is a cornerstone of inferential statistics and machine learning. It updates the probability of a hypothesis (event A) when new evidence (event B) becomes available. In effect, it reverses a conditional probability: it expresses $P(A|B)$ in terms of $P(B|A)$.
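
The reversal follows from applying the conditional probability definition in both directions. A short derivation, assuming $P(A) > 0$ and $P(B) > 0$:

$$
P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B|A) = \frac{P(A \cap B)}{P(A)} \;\Longrightarrow\; P(A \cap B) = P(B|A) \times P(A)
$$

Substituting the last identity into the first expression gives the formula that follows.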

* **Formula:** $P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$
    * $P(A|B)$: **Posterior Probability** - The probability of our hypothesis (A) being true, given the new evidence (B). This is what we want to find.
    * $P(B|A)$: **Likelihood** - The probability of observing the evidence (B) if our hypothesis (A) were true. This comes from our model or knowledge.
    * $P(A)$: **Prior Probability** - Our initial belief about the probability of the hypothesis (A) being true, *before* observing the new evidence.
    * $P(B)$: **Marginal Probability of Evidence** - The total probability of observing the evidence (B), regardless of whether A is true or not. This acts as a normalizing constant. It can be expanded as:
        $P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)$ (where $A^c$ is the complement of A, i.e., A is false).

* **Power of Bayes' Theorem:** It's how we learn from data. We start with a prior belief, observe some data, and then use Bayes' Theorem to get a more informed posterior belief.

* **Deeper Example: Drug Testing**
    * A drug test has a 99% true positive rate ($P(\text{Positive}|\text{User}) = 0.99$) and a 1% false positive rate ($P(\text{Positive}|\text{Non-User}) = 0.01$).
    * 1% of the population uses the drug ($P(\text{User}) = 0.01$).
    * If someone tests positive, what is the probability they actually use the drug? ($P(\text{User}|\text{Positive})$)

    * Let $A = \text{User}$ (the person uses the drug)
    * Let $B = \text{Positive}$ (tests positive)

    * We know:
        * $P(A) = 0.01$ (Prior probability of being a user)
        * $P(A^c) = 1 - P(A) = 0.99$ (Prior probability of being a non-user)
        * $P(B|A) = 0.99$ (Likelihood: probability of positive test given user)
        * $P(B|A^c) = 0.01$ (Likelihood: probability of positive test given non-user - false positive)

    * First, calculate $P(B)$ (Total probability of testing positive):
        $P(B) = P(B|A)P(A) + P(B|A^c)P(A^c) = (0.99)(0.01) + (0.01)(0.99) = 0.0099 + 0.0099 = 0.0198$

    * Now, apply Bayes' Theorem:
        $P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} = \frac{0.99 \times 0.01}{0.0198} = \frac{0.0099}{0.0198} = 0.5$

    * **Interpretation:** Even with a positive test, there's only a 50% chance the person actually uses the drug! This counter-intuitive result highlights the importance of the prior probability ($P(A)$) and the false positive rate. Because drug use is rare (1%), the large non-user population produces as many false positives as the small user population produces true positives: among 10,000 people, about 99 of the 100 users and about 99 of the 9,900 non-users test positive, so half of all positive results are false.
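
The drug-testing calculation can be wrapped in a small function to see how the result depends on the prior. This is a sketch rather than a definitive implementation: the function name `posterior` and its parameter names are introduced here, and the 10% prior in the second call is a hypothetical value that does not appear in the example.

```python
from fractions import Fraction


def posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(A | B) for a binary hypothesis A and evidence B via Bayes' Theorem."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    if evidence == 0:
        raise ValueError("evidence has probability zero")
    return true_positive_rate * prior / evidence


tpr = Fraction(99, 100)  # P(Positive | User)
fpr = Fraction(1, 100)   # P(Positive | Non-User)

# Values from the example: 1% of the population uses the drug.
p_rare = posterior(Fraction(1, 100), tpr, fpr)

# Hypothetical comparison (assumption, not from the example): a 10% prior.
p_common = posterior(Fraction(1, 10), tpr, fpr)

print(p_rare)    # 1/2
print(p_common)  # 11/12
```

With the same test, raising the prior from 1% to 10% lifts the posterior from 1/2 to 11/12, which shows how strongly the base rate drives the answer.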
