# 📘 Probability Distribution Functions (PDF, PMF, CDF)

Probability distribution functions describe **how probabilities are distributed** over the values of a **random variable**.

A random variable may be:
- **Discrete** → takes countable values (e.g., 0, 1, 2, ...)  
- **Continuous** → takes any value in an interval (uncountably many values)  


# 1️⃣ Types of Probability Distribution Functions

There are **three main probability distribution functions**:

1. **PMF – Probability Mass Function**  
2. **PDF – Probability Density Function**  
3. **CDF – Cumulative Distribution Function**

The PMF applies to discrete variables, the PDF to continuous variables, and the CDF to both.

# 2️⃣ PMF – Probability Mass Function  
*(Used for Discrete Random Variables)*

A **Probability Mass Function (PMF)** gives the probability for each **individual value** of a **discrete random variable**.

Discrete variables include:  
- Dice outcomes  
- Number of students  
- Number of goals scored  


## 🎲 Example: Rolling a Fair Die

Random variable: X = outcome of the roll, taking values in {1, 2, 3, 4, 5, 6}

Since the die is fair, every outcome is equally likely:  
P(X = 1) = P(X = 2) = ... = P(X = 6) = 1/6
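
As a minimal sketch, the fair-die PMF can be stored as a mapping from each face to its probability. The name `die_pmf` is chosen here for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF has non-negative probabilities that sum to exactly 1.
all_non_negative = all(p >= 0 for p in die_pmf.values())
total = sum(die_pmf.values())

print(die_pmf[3])  # 1/6
print(total)       # 1
```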


## 📊 PMF Graph Interpretation

- Each bar shows the probability of one value  
- The heights of all bars sum to 1, so together they represent the **complete distribution**  

## 🔍 Example: Find P(X ≤ 2)

P(X ≤ 2) = P(X = 1) + P(X = 2) = 1/6 + 1/6 = 1/3
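
The same sum can be checked in code. This is a small sketch in which `prob_at_most` is a hypothetical helper (not part of any library) that adds up the PMF over every value not exceeding x.

```python
from fractions import Fraction

# PMF of a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def prob_at_most(pmf, x):
    """Return P(X <= x) by summing the PMF over all values not exceeding x."""
    return sum(p for value, p in pmf.items() if value <= x)

p_le_2 = prob_at_most(die_pmf, 2)
print(p_le_2)  # 1/3
```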


# 3️⃣ CDF – Cumulative Distribution Function

The **CDF** gives the **cumulative probability** that random variable X is **less than or equal to** some value x.

CDF: F(x) = P(X ≤ x)


## 📈 CDF Characteristics
- Always **non-decreasing**  
- Approaches 0 as x → -∞ and approaches 1 as x → +∞  
- Looks like a **step graph** for discrete variables and a continuous curve for continuous variables  
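
To make the step shape concrete, here is a sketch of the CDF of a fair die. The function name `die_cdf` is illustrative. The value stays constant between faces and jumps by 1/6 at each face.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6
cdf_values = list(accumulate(pmf))  # running totals: 1/6, 1/3, 1/2, 2/3, 5/6, 1

def die_cdf(x):
    """F(x) = P(X <= x) for a fair die; flat between faces (a step function)."""
    k = bisect_right(faces, x)  # number of faces that are <= x
    return cdf_values[k - 1] if k > 0 else Fraction(0)

for x in (0, 1, 2.5, 6, 10):
    print(x, die_cdf(x))
```

Evaluating at 2 and at 2.5 gives the same value, because no probability mass lies strictly between 2 and 3.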


# 4️⃣ PDF – Probability Density Function

A **Probability Density Function (PDF)** describes the distribution of a **continuous random variable**.

Examples of continuous variables:
- Age  
- Salary  
- Height  


#### 📌 Key Concept:  
For continuous variables:

P(X = a) = 0

A continuous variable can take uncountably many values, so the probability of any single exact value is zero (this does not mean the value is impossible). Instead, we calculate probability **over an interval**.

Example:  
P(35 ≤ X ≤ 40)


## 📐 Probability from PDF = Area Under the Curve

P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx
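
The area can be approximated numerically. The sketch below assumes, purely for illustration, that age follows a normal distribution with mean 38 and standard deviation 5 (these numbers are not from the notes). It compares a trapezoidal-rule estimate with the difference of CDF values from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Hypothetical model: age ~ Normal(mean=38, sd=5). Illustrative numbers only.
age = NormalDist(mu=38, sigma=5)

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

area = integrate(age.pdf, 35, 40)      # area under the PDF between 35 and 40
exact = age.cdf(40) - age.cdf(35)      # same probability via the CDF

print(area, exact)
```

Both numbers estimate P(35 ≤ X ≤ 40) under the assumed model, and they should agree to many decimal places.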


## 📉 PDF → CDF Relationship

The **PDF is the slope (derivative)** of the CDF, and the CDF is the integral of the PDF:

- f(x) = dF(x) / dx
- F(x) = ∫ from -∞ to x of f(t) dt


## 📌 Probability Density = Gradient of CDF

At any point x where the CDF is differentiable:  
f(x) = slope of CDF at x
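
This relationship can be checked numerically. The sketch below uses a standard normal distribution (an arbitrary choice for illustration) and estimates the slope of the CDF with a central difference. `cdf_slope` is a hypothetical helper name.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)  # standard normal, chosen only for illustration

def cdf_slope(x, h=1e-5):
    """Central-difference estimate of dF/dx at x."""
    return (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)

for x in (-1.0, 0.0, 0.5, 2.0):
    # The estimated slope of the CDF should closely match the PDF value.
    print(x, cdf_slope(x), dist.pdf(x))
```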


# 5️⃣ Properties of a PDF

1. **Non-negativity**  
   f(x) ≥ 0 for all x

2. **Total area under the curve = 1**  
   ∫ from -∞ to +∞ of f(x) dx = 1

(These properties ensure a valid probability distribution.)
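
A rough numerical check of both properties might look like the sketch below. The triangular density and the helper `is_valid_pdf` are hypothetical examples. The check only samples a grid and assumes all of the probability mass lies inside the chosen range, so it is a sanity test rather than a proof.

```python
def triangular_pdf(x):
    """Hypothetical density: rises linearly on [0, 1], falls on [1, 2], 0 elsewhere."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def is_valid_pdf(f, lo, hi, n=4_000, tol=1e-4):
    """Check f >= 0 on a grid over [lo, hi] and that the trapezoidal area is ~1.

    Assumes essentially all of the probability mass lies inside [lo, hi].
    """
    h = (hi - lo) / n
    ys = [f(lo + i * h) for i in range(n + 1)]
    area = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return min(ys) >= 0 and abs(area - 1) < tol

print(is_valid_pdf(triangular_pdf, -1, 3))                   # True
print(is_valid_pdf(lambda x: 2 * triangular_pdf(x), -1, 3))  # False (area is 2)
```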


# 6️⃣ Different Distribution Types

Common distributions used in statistics and machine learning:


## 1. Bernoulli Distribution (PMF)
- Outcomes are **binary**: 0 or 1  
- Used for yes/no, success/failure events  
- **Discrete**  


## 2. Binomial Distribution (PMF)
- A fixed number n of independent Bernoulli trials, each with the same success probability p  
- Counts the number of successes  
- **Discrete**
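
As a sketch, the binomial PMF can be written with the standard formula C(n, k) · p^k · (1 − p)^(n − k). The formula is not stated in these notes, and the values n = 10 and p = 0.3 are arbitrary. Setting n = 1 recovers the Bernoulli case.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)  # should be 1 up to floating-point rounding

# Bernoulli is the special case n = 1: P(X = 1) = p and P(X = 0) = 1 - p.
bernoulli_success = binomial_pmf(1, 1, p)
bernoulli_failure = binomial_pmf(0, 1, p)
print(total, bernoulli_success, bernoulli_failure)
```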


## 3. Normal (Gaussian) Distribution (PDF)
(Figure: bell curve)

- Most common continuous distribution  
- Symmetric about its mean  
- Many real-world quantities approximately follow it (e.g., height, measurement errors)
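
The symmetry can be seen with Python's `statistics.NormalDist`. The parameters below (heights in cm with mean 170 and standard deviation 10) are assumptions made for this sketch, not values from the notes.

```python
from statistics import NormalDist

# Hypothetical parameters: heights in cm with mean 170 and standard deviation 10.
height = NormalDist(mu=170, sigma=10)

# Symmetry: the density is the same at equal distances on either side of the mean.
left, right = height.pdf(160), height.pdf(180)

# Probability of falling within one standard deviation of the mean (about 68%).
within_one_sd = height.cdf(180) - height.cdf(160)
print(round(within_one_sd, 4))  # 0.6827
```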


## 4. Poisson Distribution (PMF)
- Used for **counting events** in a fixed interval  
- Examples: number of calls, visitors, defects
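
A minimal sketch of the Poisson PMF, using the standard formula P(X = k) = e^(−λ) · λ^k / k! (not stated in these notes). The rate of 4 calls per hour is a made-up example value.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with average rate lam per interval."""
    return exp(-lam) * lam**k / factorial(k)

lam = 4  # hypothetical average: 4 calls per hour
probs = [poisson_pmf(k, lam) for k in range(60)]  # the tail beyond 59 is negligible

total = sum(probs)                               # close to 1
mean = sum(k * p for k, p in enumerate(probs))   # close to lam
print(total, mean)
```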

## 5. Log-Normal Distribution (PDF)
- Used for positive, right-skewed data (e.g., salaries, prices)
- A variable is log-normal when its logarithm is normally distributed
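
The skew can be illustrated by sampling. This sketch draws from `random.lognormvariate` with arbitrary parameters and a fixed seed. For right-skewed data the mean is pulled above the median, and the logarithms of the samples should be centred near `mu`.

```python
import math
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameters of the underlying normal distribution of log(X).
mu, sigma = 0.0, 0.5
samples = [random.lognormvariate(mu, sigma) for _ in range(20_000)]

sample_mean = mean(samples)
sample_median = median(samples)
log_mean = mean(math.log(x) for x in samples)  # should be close to mu

print(sample_mean, sample_median, log_mean)
```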


## 6. Uniform Distribution (PMF or PDF)
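- All values in the range are equally likely  
- Flat probability distribution  
- Can be **discrete** (e.g., a fair die) or **continuous** (any value in an interval [a, b])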
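
Both flavours of the uniform distribution can be sketched as follows. The function names are illustrative. In the continuous case the density is the constant 1 / (b − a), so the probability of a sub-interval is just its length divided by (b − a).

```python
from fractions import Fraction

def discrete_uniform_pmf(k, values):
    """P(X = k) when X is equally likely to be any element of `values`."""
    return Fraction(1, len(values)) if k in values else Fraction(0)

def continuous_uniform_pdf(x, a, b):
    """Density of a continuous uniform variable on [a, b]: constant 1 / (b - a)."""
    return 1 / (b - a) if a <= x <= b else 0.0

def continuous_uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b]: overlap length divided by (b - a)."""
    overlap = max(0.0, min(d, b) - max(c, a))
    return overlap / (b - a)

faces = range(1, 7)
print(discrete_uniform_pmf(3, faces))        # 1/6
print(continuous_uniform_pdf(2.5, 0, 10))    # 0.1
print(continuous_uniform_prob(2, 5, 0, 10))  # 0.3
```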


# 7️⃣ Dataset Variables → PMF/PDF

For a **House Price Prediction dataset**:

| Feature | Variable Type | Distribution Type |
|--------|----------------|-------------------|
| Size of house | Continuous | PDF |
| Number of rooms | Discrete | PMF |
| Location | Categorical | PMF |
| Floor number | Discrete | PMF |
| Sea-side? (Yes/No) | Binary | PMF |
| Price | Continuous | PDF |
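
For a discrete column such as the number of rooms, an empirical PMF can be estimated from relative frequencies. The data below is made up for this sketch and does not come from a real dataset.

```python
from collections import Counter

# Hypothetical "number of rooms" column from a house-price dataset.
rooms = [2, 3, 3, 4, 2, 3, 5, 3, 4, 2]

counts = Counter(rooms)
# Relative frequency of each value = estimated probability of that value.
empirical_pmf = {value: count / len(rooms) for value, count in sorted(counts.items())}

print(empirical_pmf)  # {2: 0.3, 3: 0.4, 4: 0.2, 5: 0.1}
```

A continuous column such as price would instead be summarised with a histogram or density estimate, since exact values rarely repeat.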

Knowing each variable's type and distribution helps in:
- **EDA (Exploratory Data Analysis)**  
- **Feature Engineering**  
- **ML modeling**


# SUMMARY
- **PMF** → For discrete variables; gives probability of exact values  
- **PDF** → For continuous variables; probability = area under curve  
- **CDF** → Cumulative probability P(X ≤ x)  
- PDF = derivative of CDF  
- CDF = integral of PDF  
- Total area under PDF = 1  
- Different distributions: Bernoulli, Binomial, Normal, Poisson, Log-normal, Uniform