# Probability Cheatsheet for Machine Learning

# Combinatorics

## Goal
- Count how many outcomes there are (the quantity)
- Enumerate all possible cases of a sample space or an event

## Multiplicative Principle
- Counts all possible cases of a sample space or an event when the outcome is built in K steps
- Step i has Ni possibilities (N1 for the first step, N2 for the second, and so on)
- Total = N1 * N2 * N3 * ... * NK

**Example 1:** 3 shirts, 2 pants. How many outfits? Each shirt can be paired with each of the pants, so there are 3 * 2 = 6 combinations.

![image.png](attachment:image.png)
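The shirts-and-pants count can be checked by brute-force enumeration. This is a minimal sketch; the item names are made up for illustration and are not part of the original example.

```python
from itertools import product

# Hypothetical wardrobe, used only for illustration.
shirts = ["red", "green", "blue"]
pants = ["jeans", "shorts"]

# Every (shirt, pants) pair is one possible outfit.
outfits = list(product(shirts, pants))

print(len(outfits))               # 6
print(len(shirts) * len(pants))   # 6, the multiplicative principle
```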

## How to count all the possibilities of a Sample Space or an Event?

### Two or more sets
- Total = N1 * N2 * N3 * ... * NK

### One Set
#### With repetition
- **Example:** a password of k characters chosen from n symbols, where characters may repeat, has $n^k$ possibilities.

![image.png](attachment:image.png)
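As a rough check of counting with repetition, the sketch below enumerates every 4-digit PIN. The alphabet (10 digits) and the length (4) are assumptions chosen to keep the enumeration small.

```python
from itertools import product

digits = "0123456789"   # n = 10 symbols (assumed alphabet)
length = 4              # k = 4 positions (assumed length)

# repeat=length allows the same symbol in several positions.
pins = list(product(digits, repeat=length))

print(len(pins))               # 10000
print(len(digits) ** length)   # 10000, i.e. n ** k
```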

#### Without repetition

![image.png](attachment:image.png)

##### Without the diagonal (repeated elements) = "Permutation"
- "Order matters": mirrored cases such as (1,2) and (2,1) are counted separately.
- No element may repeat in any position, so tuples like (1,1,3) are excluded.

$$
\frac{n!}{(n-k)!}
$$

  
**Example:** the number of ways to seat n people on a bench with k = 3 positions.
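The bench example can be sketched in Python by listing every ordered seating and comparing the count with the formula. The five names (so n = 5) are hypothetical; `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations
from math import factorial, perm

people = ["Ana", "Bo", "Cy", "Di", "Ed"]  # n = 5, hypothetical names
n, k = len(people), 3

# Ordered seatings: ("Ana", "Bo", "Cy") and ("Cy", "Bo", "Ana") are different.
seatings = list(permutations(people, k))

print(len(seatings))                      # 60
print(factorial(n) // factorial(n - k))   # 60, n! / (n-k)!
print(perm(n, k))                         # 60, same formula from the stdlib
```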

##### Without the diagonal and the mirrored part = "Combination" = "Binomial Coefficient"
- "Order doesn't matter": mirrored cases such as (1,2) and (2,1) count as a single case.

$$
\frac{n!}{(n-k)!k!}
$$


**Example:** choosing which 3 of n people sit on a bench, ignoring the order in which they sit (mirrored cases removed).
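A minimal sketch of the same bench example when order is ignored. The five names are hypothetical; dividing the permutation count by k! removes the reorderings of each group.

```python
from itertools import combinations
from math import comb, factorial, perm

people = ["Ana", "Bo", "Cy", "Di", "Ed"]  # n = 5, hypothetical names
n, k = len(people), 3

# Unordered groups: each set of 3 people appears exactly once.
groups = list(combinations(people, k))

print(len(groups))                    # 10
print(comb(n, k))                     # 10, n! / ((n-k)! k!)
print(perm(n, k) // factorial(k))     # 10, permutations with reorderings removed
```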




![image.png](attachment:image.png)

# Probability Theory

**What is it?**
- The size of an event relative to the size of the sample space. When all outcomes are equally likely, $P(A) = |A| / |S|$.
    
**How to calculate the size / quantity?**
- Using Combinatorics

# Basic Probability

## Definitions

* **Experiment:** A process that leads to an uncertain outcome.
* **Sample Space (S):** The set of all possible outcomes of an experiment.
* **Event:** A subset of the sample space.
* **Probability of an Event (P(A)):** A measure of the likelihood of an event A occurring.  $0 \le P(A) \le 1$
* **Impossible Event:** An event with a probability of 0.
* **Certain Event:** An event with a probability of 1.
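These definitions can be illustrated with one roll of a fair die. The sketch assumes equally likely outcomes, so probability reduces to counting; the helper `prob` is a name introduced here for illustration only.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # S: all outcomes of one die roll
even = {2, 4, 6}                    # an event is a subset of S

def prob(event, space):
    """P(event) assuming equally likely outcomes: |event ∩ space| / |space|."""
    return Fraction(len(event & space), len(space))

print(prob(even, sample_space))          # 1/2
print(prob(set(), sample_space))         # 0, an impossible event
print(prob(sample_space, sample_space))  # 1, the certain event
```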

## Probabilistic Independence vs. Mutually Exclusive Events

### Probabilistic Independence

**Definition**

Two events A and B are independent if the occurrence of one does not affect the probability of the occurrence of the other.  Mathematically, this means:

$P(B|A) = P(B)$ or, equivalently, $P(A|B) = P(A)$ (whenever the conditioning event has non-zero probability). Both are equivalent to $P(A \cap B) = P(A) P(B)$.

**Example**

Consider flipping a fair coin twice. Let A be the event of getting heads on the first flip and B be the event of getting heads on the second flip. Since the outcomes of the two flips don't affect each other, A and B are independent events.
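The coin example can be verified by enumerating the four equally likely outcomes and testing the product rule. This is a sketch; `prob` is a helper defined here for illustration.

```python
from fractions import Fraction
from itertools import product

space = set(product("HT", repeat=2))     # ('H','H'), ('H','T'), ('T','H'), ('T','T')
A = {o for o in space if o[0] == "H"}    # heads on the first flip
B = {o for o in space if o[1] == "H"}    # heads on the second flip

def prob(event):
    return Fraction(len(event), len(space))

print(prob(A & B))                         # 1/4
print(prob(A) * prob(B))                   # 1/4
print(prob(A & B) == prob(A) * prob(B))    # True, so A and B are independent
```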


### Mutually exclusive events

Mutually exclusive events cannot occur simultaneously.

> **Caution**: Don't confuse __mutually exclusive__ events with __independent__ events.  
> **Mutually exclusive** events with _non-zero probabilities_ are **dependent**.  
> If A and B are _mutually exclusive_, knowing that A has occurred means B *cannot* occur, so $P(B|A) = 0 \ne P(B)$.  
> If one of the events has probability 0, they are independent, because $P(A \cap B) = 0 = P(A) P(B)$.
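
To make the caution concrete, the sketch below uses one roll of a fair die with two events that cannot happen together. The events are an assumed example, not taken from the original notes.

```python
from fractions import Fraction

space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed example)
A = {1, 2}                   # roll is 1 or 2
B = {5, 6}                   # roll is 5 or 6

def prob(event):
    return Fraction(len(event), len(space))

print(A & B)                               # set(): the events never occur together
print(prob(A & B))                         # 0
print(prob(A) * prob(B))                   # 1/9
print(prob(A & B) == prob(A) * prob(B))    # False, so A and B are dependent
```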


## Probability Rules

**Sum Rule:**  
For any events A and B: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

> For **mutually exclusive** events A and B (i.e., they cannot occur simultaneously), $P(A \cup B) = P(A) + P(B)$.

**Complement Rule:**  
The probability of an event A not occurring (denoted A') is  $P(A') = 1 - P(A)$.
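The sum and complement rules can be checked on a standard 52-card deck. This sketch assumes a uniformly random draw; the events "hearts" and "face card" are chosen here because they overlap.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))                   # 52 (rank, suit) pairs

hearts = {c for c in deck if c[1] == "hearts"}      # 13 cards
faces = {c for c in deck if c[0] in {"J", "Q", "K"}}  # 12 cards, 3 of them hearts

def prob(event):
    return Fraction(len(event), len(deck))

# Sum rule: subtract the overlap so it is not counted twice.
union = prob(hearts | faces)
by_rule = prob(hearts) + prob(faces) - prob(hearts & faces)
print(union, by_rule)                               # 11/26 11/26

# Complement rule: P(not hearts) = 1 - P(hearts).
print(prob(deck - hearts), 1 - prob(hearts))        # 3/4 3/4
```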

**Conditional Probability:**  
The probability of event B occurring given that event A has already occurred is $P(B|A) = \frac{P(A \cap B)}{P(A)}$, where $P(A) > 0$.


**Example question**

![image.png](attachment:image.png)

> Take care with OR. Unless the problem explicitly states that the groups are mutually exclusive, they may overlap, and their intersection may contain elements.

## Conditional Probability

**Definition**

* **Conditional Probability:** The probability of an event A occurring given that another event B has already occurred.  Denoted as P(A|B).
* **Formula:** $P(A|B) = \frac{P(A \cap B)}{P(B)}$, where $P(B) > 0$.  This reads as "the probability of A given B".
* **Interpretation:**  Conditional probability __restricts__ the sample space to the event B. We are only interested in the outcomes where B has occurred, and then consider the proportion of those outcomes where A also occurs.

**P(A|B)** = Probability that A occurs given that B has occurred.  
**P(B)** = Probability of B.  
**P(A∩B)** = _Probability of the elements of A that also belong to B._  

**P(A∩B)** = P(A)*P(B|A) = P(B)*P(A|B)
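
The "restricted sample space" reading and the product rule can both be checked with two fair dice. This is a sketch under assumed events (A: the sum is 8, B: the first die is even); the `within` parameter of `prob` is a helper introduced here to switch the sample space.

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))   # 36 outcomes of two dice
A = {o for o in space if sum(o) == 8}         # sum is 8 (5 outcomes)
B = {o for o in space if o[0] % 2 == 0}       # first die is even (18 outcomes)

def prob(event, within=space):
    """Probability of `event` when only outcomes in `within` are possible."""
    return Fraction(len(event & within), len(within))

by_formula = prob(A & B) / prob(B)    # P(A ∩ B) / P(B)
by_restriction = prob(A, within=B)    # count A inside the reduced space B
print(by_formula, by_restriction)     # 1/6 1/6

# Product rule: P(A ∩ B) = P(A) * P(B|A) = P(B) * P(A|B)
print(prob(A & B))                    # 1/12
print(prob(A) * prob(B, within=A))    # 1/12
print(prob(B) * prob(A, within=B))    # 1/12
```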

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

# Generative and Discriminative Models
- **Generative Models**: Learn the joint probability $P(X, Y)$ (e.g., Naive Bayes, Gaussian Mixture Models).
- **Discriminative Models**: Learn the conditional probability $P(Y | X)$ (e.g., Logistic Regression, SVM).
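
The two quantities are linked by the conditional probability formula: $P(Y | X) = P(X, Y) / P(X)$. The sketch below uses a toy joint table with made-up numbers (not from any real model) to show how the conditional a discriminative model targets can be recovered from the joint a generative model targets.

```python
from fractions import Fraction

# Made-up joint distribution P(X, Y) over a binary feature X and a binary label Y.
joint = {
    ("x0", "y0"): Fraction(3, 10),
    ("x0", "y1"): Fraction(1, 10),
    ("x1", "y0"): Fraction(2, 10),
    ("x1", "y1"): Fraction(4, 10),
}

def marginal_x(x):
    """P(X = x), obtained by summing the joint over all labels."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(y, x):
    """P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    return joint[(x, y)] / marginal_x(x)

print(sum(joint.values()))                 # 1, the table is a valid distribution
print(conditional_y_given_x("y1", "x1"))   # 2/3
print(conditional_y_given_x("y1", "x0"))   # 1/4
```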