<img src="./images/banner.png" width="800">

# Common Discrete Probability Distributions

Welcome to our lecture on Common Discrete Probability Distributions. In this session, we'll explore some of the most frequently encountered probability distributions for discrete random variables. These distributions serve as fundamental building blocks in probability theory, statistics, and machine learning.


Keep in mind that a probability distribution is a model: how well it matches the physics of the problem has to be investigated and validated.

**What are discrete probability distributions?**

Discrete probability distributions describe the probability of occurrence for each value of a discrete random variable. Remember, a discrete random variable can only take on a countable number of distinct values, such as integers or a finite set of outcomes.


**Why are they important?**

1. **Modeling Real-World Phenomena**: Many real-world events naturally follow discrete distributions. For example:
   - The number of defective items in a production batch
   - The count of customers arriving at a store in an hour
   - The number of successes in a series of trials

2. **Foundation for Statistical Inference**: These distributions form the basis for many statistical tests and estimation procedures.

3. **Machine Learning Applications**: Many machine learning algorithms, especially in classification and natural language processing, rely on discrete probability distributions.

4. **Data Science Tools**: Understanding these distributions helps in data analysis, hypothesis testing, and predictive modeling.


**Distributions we'll cover:**

1. Bernoulli Distribution
2. Binomial Distribution
3. Multinomial Distribution
4. Poisson Distribution
5. Geometric Distribution (optional)


For each distribution, we'll discuss:
- Its definition and key properties
- The probability mass function (PMF)
- Mean and variance
- Typical applications and examples


**Why learn multiple distributions?**

Different phenomena in nature and various data-generating processes follow different probability distributions. By understanding a range of distributions, you'll be better equipped to:

- Choose the appropriate model for your data
- Make more accurate predictions and inferences
- Understand the assumptions and limitations of statistical models


As we progress through this lecture, try to think about real-world scenarios where each distribution might apply. This will help you develop an intuition for when and how to use these powerful mathematical tools in your data science and machine learning projects.


Let's begin our journey through the world of discrete probability distributions!

**Table of contents**<a id='toc0_'></a>    
- [Bernoulli Distribution](#toc1_)    
  - [Probability Mass Function](#toc1_1_)    
  - [Mean and Variance](#toc1_2_)    
  - [Examples and Applications](#toc1_3_)    
- [Binomial Distribution](#toc2_)    
  - [Relationship to Bernoulli Distribution](#toc2_1_)    
  - [Probability Mass Function](#toc2_2_)    
  - [Mean and Variance](#toc2_3_)    
  - [Examples and Applications](#toc2_4_)    
- [Multinomial Distribution](#toc3_)    
  - [Relationship to Binomial Distribution](#toc3_1_)    
  - [Probability Mass Function](#toc3_2_)    
  - [Mean and Variance](#toc3_3_)    
  - [Examples and Applications](#toc3_4_)    
- [Poisson Distribution](#toc4_)    
  - [Examples of Poisson distributions](#toc4_1_)    
  - [Probability mass function](#toc4_2_)    
  - [Visual representation](#toc4_3_)    
  - [Mean and variance](#toc4_4_)    
  - [Applications in data science and machine learning](#toc4_5_)    
- [(Optional) Geometric Distribution](#toc5_)    
  - [When to use a Geometric distribution](#toc5_1_)    
  - [Probability mass function](#toc5_2_)    
  - [Mean and variance](#toc5_3_)    
  - [Visual representation](#toc5_4_)    
  - [Examples of Geometric distributions](#toc5_5_)    
  - [Applications in data science and machine learning](#toc5_6_)    
  - [Relationship to other distributions](#toc5_7_)    
- [Comparison of Distributions](#toc6_)    
  - [When to use each distribution](#toc6_1_)    
  - [Relationships between distributions](#toc6_2_)    
  - [Comparison table](#toc6_3_)    
  - [Key considerations for selection](#toc6_4_)    
  - [Practical example: Customer behavior modeling](#toc6_5_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_'></a>[Bernoulli Distribution](#toc0_)

The Bernoulli distribution is one of the simplest discrete probability distributions, yet it forms the foundation for many more complex distributions and statistical concepts.


The Bernoulli distribution models a random experiment with exactly two possible outcomes, typically labeled as "success" and "failure."


**Key Properties:**
1. It's defined by a single parameter p, which represents the probability of success.
2. The probability of failure is q = 1 - p.
3. Each trial is independent.
4. It's a special case of the Binomial distribution with n = 1 trial.


<img src="./images/tmp/bernoulli.webp" width="600">

### <a id='toc1_1_'></a>[Probability Mass Function](#toc0_)


For a Bernoulli random variable X:

$P(X = x) = \begin{cases} 
p & \text{if } x = 1 \text{ (success)} \\
1-p & \text{if } x = 0 \text{ (failure)}
\end{cases}$


This can be written more compactly as:

$P(X = x) = p^x(1-p)^{1-x}, \text{ for } x \in \{0, 1\}$
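
As a minimal sketch, the compact form of the PMF can be written as a small Python function. The function name `bernoulli_pmf` and the value p = 0.3 are illustrative choices, not part of the lecture material.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for X ~ Bernoulli(p), using the compact form p^x (1-p)^(1-x)."""
    if x not in (0, 1):
        return 0.0  # values outside {0, 1} have zero probability
    return p**x * (1 - p) ** (1 - x)


p = 0.3  # illustrative success probability (an assumption)
print("P(X = 1) =", bernoulli_pmf(1, p))
print("P(X = 0) =", bernoulli_pmf(0, p))
```

Plugging in x = 1 leaves only the factor p, and x = 0 leaves only 1 - p, which is why the compact form agrees with the piecewise definition.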


### <a id='toc1_2_'></a>[Mean and Variance](#toc0_)


For a Bernoulli(p) distribution:

**Mean (Expected Value):**
- $E[X] = p$


**Variance:**
- $Var(X) = p(1-p)$
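
Both formulas follow directly from the PMF. A short derivation, using only the definitions of expectation and variance:

$E[X] = 0 \cdot (1-p) + 1 \cdot p = p$

$E[X^2] = 0^2 \cdot (1-p) + 1^2 \cdot p = p$

$Var(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)$

The variance is largest at p = 0.5, where it equals 0.25, and shrinks to 0 as p approaches 0 or 1.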


### <a id='toc1_3_'></a>[Examples and Applications](#toc0_)


1. **Coin Flip**: 
   The classic example of a Bernoulli trial. For a fair coin, p = 0.5.

2. **Quality Control**: 
   Testing if a manufactured item is defective (success) or not (failure).

3. **Medical Tests**: 
   The outcome of a diagnostic test (positive or negative).

4. **Click-through Rate**: 
   In digital marketing, whether a user clicks on an ad (success) or not (failure).

5. **Machine Learning**:
   - Binary classification problems (e.g., spam detection)
   - Dropout regularization in neural networks

6. **Financial Modeling**:
   Modeling the occurrence of rare events, like defaults in credit risk models.

7. **A/B Testing**:
   Comparing the performance of two versions of a website or app.


Understanding the Bernoulli distribution is crucial as it forms the basis for more complex distributions and many statistical concepts. Its simplicity makes it a powerful tool for modeling binary outcomes, which are prevalent in various fields of study and real-world applications.

## <a id='toc2_'></a>[Binomial Distribution](#toc0_)

<img src="./images/tmp/binomial.jpg" width="600">

The Binomial distribution is a fundamental discrete probability distribution that models the number of successes in a fixed number of independent Bernoulli trials.


A random variable X follows a Binomial distribution if:

1. There is a fixed number of trials (n).
2. Each trial is independent.
3. Each trial has two possible outcomes (success or failure).
4. The probability of success (p) is constant for each trial.


We denote this as X ~ Bin(n, p), where:
- n is the number of trials
- p is the probability of success on each trial


<img src="./images/tmp/binomial-dist.jpg" width="600">

<img src="./images/tmp/De_moivre-laplace.gif" width="600">

> **Note:** When the number of trials n is large (and p is not too close to 0 or 1), the shape of the binomial distribution is very close to that of a normal distribution. This is known as the **De Moivre-Laplace theorem**. You can see this in the image above. You will learn more about the normal distribution in the next lecture.
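
The approximation in the note can be checked numerically. The sketch below compares the Binomial PMF with a normal density that has the same mean np and standard deviation sqrt(np(1-p)). The normal density formula and the values n = 100, p = 0.5 are assumptions brought in for illustration; the normal distribution itself is covered in the next lecture.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


n, p = 100, 0.5  # illustrative values (assumptions)
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Compare the two curves at a few points around the mean.
for k in (40, 45, 50, 55, 60):
    print(k, round(binomial_pmf(k, n, p), 5), round(normal_pdf(k, mu, sigma), 5))
```

For these values the two columns should agree closely; with small n or with p near 0 or 1 the agreement gets worse.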

### <a id='toc2_1_'></a>[Relationship to Bernoulli Distribution](#toc0_)


The Binomial distribution is essentially a sum of n independent Bernoulli trials:

If X₁, X₂, ..., Xn are independent Bernoulli(p) random variables, then:

- X = X₁ + X₂ + ... + Xn ~ Bin(n, p)

In particular, a Binomial distribution with n=1 is equivalent to a Bernoulli distribution.
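
The "sum of Bernoulli trials" view can be sketched as a small simulation. The helper names, the seed, and the values n = 10, p = 0.3 are illustrative assumptions.

```python
import random


def bernoulli_trial(p: float, rng: random.Random) -> int:
    """One Bernoulli(p) trial: 1 for success, 0 for failure."""
    return 1 if rng.random() < p else 0


def binomial_draw(n: int, p: float, rng: random.Random) -> int:
    """One Bin(n, p) draw, built as the sum of n independent Bernoulli trials."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))


rng = random.Random(0)  # fixed seed so the run is reproducible
n, p = 10, 0.3          # illustrative values (assumptions)
draws = [binomial_draw(n, p, rng) for _ in range(20_000)]
sample_mean = sum(draws) / len(draws)
print("sample mean:", sample_mean, "expected n*p:", n * p)
```

Each draw is a count between 0 and n, and the sample mean should land close to n·p.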


### <a id='toc2_2_'></a>[Probability Mass Function](#toc0_)


For a Binomial(n, p) distribution, the probability of exactly k successes is given by:

$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

Where:
- $\binom{n}{k}$ is the binomial coefficient ("n choose k")
- k ranges from 0 to n
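
A minimal sketch of the PMF in Python, using `math.comb` for the binomial coefficient. The function name and the values n = 10, p = 0.5 are illustrative assumptions.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    if k < 0 or k > n:
        return 0.0  # outside the support 0..n
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # e.g. 10 flips of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print("P(X = 5) =", pmf[5])            # 252 / 1024
print("sum over k = 0..n:", sum(pmf))  # the probabilities add up to 1
```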


### <a id='toc2_3_'></a>[Mean and Variance](#toc0_)


For X ~ Bin(n, p):

**Mean (Expected Value):**
- $E[X] = np$

**Variance:**
- $Var(X) = np(1-p)$
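
As a quick numerical check, the mean and variance can be computed directly from the PMF and compared with the closed forms. This is only a sketch; n = 20 and p = 0.3 are illustrative assumptions.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.3  # illustrative values (assumptions)
ks = range(n + 1)

# E[X] = sum of k * P(X = k); Var(X) = sum of (k - E[X])^2 * P(X = k)
mean = sum(k * binomial_pmf(k, n, p) for k in ks)
var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in ks)

print("mean:", mean, "vs n*p =", n * p)
print("variance:", var, "vs n*p*(1-p) =", n * p * (1 - p))
```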


### <a id='toc2_4_'></a>[Examples and Applications](#toc0_)


1. **Coin Flips**: 
   Number of heads in n coin flips.

2. **Quality Control**: 
   Number of defective items in a batch of n products.

3. **Epidemiology**: 
   Number of individuals contracting a disease in a population of size n.

4. **A/B Testing**: 
   Number of conversions in n trials of a new website design.

5. **Elections**: 
   Modeling the number of votes for a candidate among n voters.

6. **Machine Learning**:
   - Feature selection (e.g., number of relevant features out of n total)
   - Ensemble methods (e.g., number of classifiers agreeing on a prediction)

7. **Natural Language Processing**:
   Modeling word frequencies in text analysis.


Understanding the Binomial distribution is crucial in many areas of data science and machine learning, particularly in scenarios involving repeated trials or counts of successes. Its versatility makes it a powerful tool for modeling and analysis in various fields.

## <a id='toc3_'></a>[Multinomial Distribution](#toc0_)

The Multinomial distribution is a generalization of the Binomial distribution to scenarios where there are more than two possible outcomes for each trial.


A random vector X = (X₁, X₂, ..., Xₖ) of outcome counts follows a Multinomial distribution if:

1. There is a fixed number of trials (n).
2. Each trial results in exactly one of k possible outcomes.
3. The probability of each outcome remains constant from trial to trial.
4. The trials are independent.


We denote this as X ~ Multinomial(n, p₁, p₂, ..., pₖ), where:
- n is the number of trials
- pᵢ is the probability of outcome i, and Σpᵢ = 1


<img src="./images/tmp/trinomial-dist.png" width="600">

> A trinomial distribution is a special case of the multinomial distribution where there are three possible outcomes. In the image above, we see a trinomial distribution with three possible outcomes: x₁, x₂, and x₃. The probabilities of these outcomes are p₁, p₂, and p₃, respectively. The total number of trials is n. In this case, the multinomial distribution is Multinomial(n, p₁, p₂, p₃). x₁ and x₂ are shown on the x-axis, and the probability mass function (PMF) is represented by the height of the bars. x₃ is not shown separately but can be calculated as x₃ = n - (x₁ + x₂).

### <a id='toc3_1_'></a>[Relationship to Binomial Distribution](#toc0_)


The Multinomial distribution is a multivariate generalization of the Binomial distribution. When k = 2, the Multinomial distribution reduces to the Binomial distribution.


### <a id='toc3_2_'></a>[Probability Mass Function](#toc0_)


For a Multinomial(n, p₁, p₂, ..., pₖ) distribution, the probability of observing x₁ occurrences of outcome 1, x₂ occurrences of outcome 2, and so on, is given by:

$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$

Where:
- Σxᵢ = n
- x₁, x₂, ..., xₖ are non-negative integers
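
The formula can be sketched in Python as follows. The function name `multinomial_pmf` and the example values are illustrative assumptions; the number of trials n is taken to be the sum of the counts.

```python
import math


def multinomial_pmf(counts, probs) -> float:
    """P(X_1 = x_1, ..., X_k = x_k) for counts x_i and outcome probabilities p_i."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! x_2! ... x_k!), kept as an exact integer.
    coeff = math.factorial(n)
    for x in counts:
        coeff //= math.factorial(x)
    prob = float(coeff)
    for x, p in zip(counts, probs):
        prob *= p**x
    return prob


# Fair die rolled 6 times: probability that every face appears exactly once.
print(multinomial_pmf([1] * 6, [1 / 6] * 6))

# With k = 2 outcomes the formula reduces to the Binomial PMF.
print(multinomial_pmf([3, 7], [0.4, 0.6]))
```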


### <a id='toc3_3_'></a>[Mean and Variance](#toc0_)


For each component Xᵢ of a Multinomial(n, p₁, p₂, ..., pₖ) distribution:

**Mean (Expected Value):**
- $E[X_i] = np_i$

**Variance:**
- $Var(X_i) = np_i(1-p_i)$

**Covariance:**
- $Cov(X_i, X_j) = -np_ip_j$ for i ≠ j
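
These three formulas can be verified by brute force for a small trinomial: enumerate every possible count vector, weight it by its PMF, and compute the moments. This is a sketch under assumed values n = 5 and (p₁, p₂, p₃) = (0.2, 0.3, 0.5).

```python
import math


def multinomial_pmf(counts, probs) -> float:
    """Multinomial PMF for a vector of counts and matching probabilities."""
    coeff = math.factorial(sum(counts))
    for x in counts:
        coeff //= math.factorial(x)
    prob = float(coeff)
    for x, p in zip(counts, probs):
        prob *= p**x
    return prob


n = 5
probs = (0.2, 0.3, 0.5)  # illustrative values (assumptions)

# All (x1, x2, x3) with non-negative entries summing to n.
outcomes = [(a, b, n - a - b) for a in range(n + 1) for b in range(n + 1 - a)]
weights = [multinomial_pmf(x, probs) for x in outcomes]


def expect(f) -> float:
    """Expectation of f(X) under the enumerated distribution."""
    return sum(w * f(x) for x, w in zip(outcomes, weights))


mean_1 = expect(lambda x: x[0])
mean_2 = expect(lambda x: x[1])
var_1 = expect(lambda x: (x[0] - mean_1) ** 2)
cov_12 = expect(lambda x: (x[0] - mean_1) * (x[1] - mean_2))

print("E[X1]:", mean_1, "vs", n * probs[0])
print("Var(X1):", var_1, "vs", n * probs[0] * (1 - probs[0]))
print("Cov(X1, X2):", cov_12, "vs", -n * probs[0] * probs[1])
```

The covariance is negative because the counts compete for the same n trials: more occurrences of one outcome leave fewer trials for the others.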


### <a id='toc3_4_'></a>[Examples and Applications](#toc0_)


1. **Dice Rolls**: 
   Counting occurrences of each number when rolling a die n times.

2. **Market Research**: 
   Modeling consumer choices among multiple product options.

3. **Genetics**: 
   Distribution of genotypes in a population.

4. **Natural Language Processing**:
   - Modeling word frequencies in text analysis
   - Topic modeling in document classification

5. **Machine Learning**:
   - Naive Bayes classifiers for multi-class problems
   - Modeling categorical data in various algorithms

6. **Ecology**: 
   Species distribution in different habitats.

7. **Political Science**: 
   Modeling voting outcomes in multi-party systems.


Understanding the Multinomial distribution is crucial in many areas of data science and machine learning, especially when dealing with categorical data or multi-class classification problems. Its ability to model multiple outcomes makes it a versatile tool for various real-world applications.

## <a id='toc4_'></a>[Poisson Distribution](#toc0_)

A Poisson distribution is a discrete probability distribution that models the number of events occurring within a fixed interval of time or space. It's particularly useful for count data, where we're interested in the number of times an event happens.


<img src="./images/tmp/poisson-dist.webp" width="600">

Key characteristics of a Poisson distribution:

1. It deals with discrete, countable outcomes (represented by k).
2. Events occur randomly and independently.
3. The average rate of occurrence (λ) is known and constant.
4. λ is the only parameter needed to describe the distribution.


You can use a Poisson distribution if:

1. Individual events happen randomly and independently.
2. You know the mean number of events (λ) occurring within a given interval.
3. The occurrence of one event doesn't affect the probability of another event.


### <a id='toc4_1_'></a>[Examples of Poisson distributions](#toc0_)


1. **Classic example: Horse kick deaths**
   - Studied by Ladislaus Bortkiewicz in the late 1800s
   - Analyzed deaths by horse kicks in Prussian army corps
   - Found a mean of 0.61 deaths per corps per year (λ = 0.61)
   - Most years had zero deaths, but some had up to four

2. **Modern applications:**
   - Text messages received per hour
   - Website visitors per day
   - Machine malfunctions per month
   - Rare disease cases per year in a population


### <a id='toc4_2_'></a>[Probability mass function](#toc0_)


The probability mass function (PMF) of a Poisson distribution is given by:

$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$

Where:
- e is Euler's number (approximately 2.71828)
- λ is the average number of events per interval
- k is the number of events we're calculating the probability for
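
As a minimal sketch, the PMF can be evaluated in Python for the horse-kick rate λ = 0.61 quoted above. The function name `poisson_pmf` is an illustrative choice.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    if k < 0:
        return 0.0  # counts cannot be negative
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 0.61  # mean deaths per corps per year from the horse-kick example
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))
```

With a rate this small, most of the probability sits on k = 0 and the values fall off quickly as k grows, which matches the observation that most years had zero deaths.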


### <a id='toc4_3_'></a>[Visual representation](#toc0_)


Poisson distributions can be visualized as graphs of their probability mass function:

- The peak of the distribution represents the most probable number of events (the mode).
- For low λ values, the distribution is right-skewed.
- As λ increases (roughly λ ≥ 10), the distribution becomes more symmetric and is well approximated by a normal distribution.


[Placeholder for graph showing Poisson distributions with different λ values]


### <a id='toc4_4_'></a>[Mean and variance](#toc0_)


A notable property of the Poisson distribution is that its mean and variance are both equal to λ:

- Mean (expected value): E[X] = λ
- Variance: Var(X) = λ


This equality of mean and variance is a distinguishing feature of the Poisson distribution.
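
The equality can be checked numerically by summing over the PMF. Because the support is infinite, the sketch below truncates the sum at k = 59, which is an assumption that is harmless for the illustrative rate λ = 4 since the remaining tail is negligible.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 4.0       # illustrative rate (an assumption)
ks = range(60)  # truncated support; the tail beyond this is negligible here

mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print("mean:", mean)
print("variance:", var)
```

In practice, a sample variance much larger than the sample mean (overdispersion) is a common sign that a plain Poisson model does not fit the data.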


### <a id='toc4_5_'></a>[Applications in data science and machine learning](#toc0_)


1. **Anomaly detection:** Identifying unusual patterns in event occurrences
2. **Text analysis:** Modeling the occurrence of rare words in documents
3. **Network traffic:** Predicting packet arrivals in computer networks
4. **Customer behavior:** Modeling purchase frequencies or service requests


The Poisson distribution is closely related to the Binomial distribution. As the number of trials in a Binomial distribution approaches infinity and the probability of success approaches zero, while their product remains constant, the Binomial distribution approaches a Poisson distribution. This relationship is known as the Poisson limit theorem.
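
The limit can be illustrated numerically: hold λ = np fixed, let n grow, and watch the Binomial probability approach the Poisson one. The values λ = 3 and k = 2 are illustrative assumptions.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam, k = 3.0, 2  # illustrative values (assumptions)
target = poisson_pmf(k, lam)

gaps = []
for n in (10, 100, 1000, 10000):
    approx = binomial_pmf(k, n, lam / n)  # p = lam / n keeps n * p constant
    gaps.append(abs(approx - target))
    print(n, round(approx, 6), round(gaps[-1], 6))

print("Poisson value:", round(target, 6))
```

The gap should shrink as n grows, which is why the Poisson distribution is often used as a convenient approximation for a Binomial with many trials and a small success probability.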


Understanding the Poisson distribution is crucial for data scientists and analysts working with count data or rare event occurrences. Its simplicity and well-defined properties make it a powerful tool for modeling and predicting in various fields, from quality control to epidemiology.

## <a id='toc5_'></a>[(Optional) Geometric Distribution](#toc0_)

A Geometric distribution is a discrete probability distribution that models the number of trials needed to achieve the first success in a sequence of independent Bernoulli trials. It's often described as the "wait until success" distribution.


<img src="./images/tmp/geometric-dist.png" width="800">

Key characteristics of a Geometric distribution:

1. It deals with discrete, countable outcomes (represented by k).
2. Trials are independent and identically distributed.
3. Each trial has only two possible outcomes: success or failure.
4. The probability of success (p) remains constant for each trial.


### <a id='toc5_1_'></a>[When to use a Geometric distribution](#toc0_)


You can use a Geometric distribution if:

1. You're interested in the number of trials until the first success occurs.
2. Each trial is independent of the others.
3. The probability of success remains constant across all trials.
4. You're dealing with a sequence of yes/no questions or pass/fail scenarios.


### <a id='toc5_2_'></a>[Probability mass function](#toc0_)


The probability mass function (PMF) of a Geometric distribution is given by:

$P(X = k) = (1-p)^{k-1}p$

Where:
- p is the probability of success on each trial
- k is the number of trials until the first success (k = 1, 2, 3, ...)


This formula gives the probability of getting the first success on the kth trial.
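
A minimal Python sketch of this PMF, using the fair-coin case p = 0.5 as an illustrative value. The function name `geometric_pmf` is an assumption.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): k - 1 failures followed by the first success on trial k."""
    if k < 1:
        return 0.0  # the first success cannot occur before trial 1
    return (1 - p) ** (k - 1) * p


p = 0.5  # e.g. flipping a fair coin until heads
for k in range(1, 6):
    print(k, geometric_pmf(k, p))
```

Each additional trial multiplies the probability by (1 - p), so the values form a geometric sequence, which is where the distribution gets its name.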


### <a id='toc5_3_'></a>[Mean and variance](#toc0_)


For a Geometric distribution:

- Mean (expected value): E[X] = 1/p
- Variance: Var(X) = (1-p)/p²


These formulas provide insights into the average number of trials needed for success and the spread of possible outcomes.
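
As a numerical check, both formulas can be compared with sums over the PMF. The support is infinite, so the sketch truncates at k = 2000, an assumption that leaves a negligible tail for the illustrative value p = 0.2.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k) for the number of trials until the first success."""
    return (1 - p) ** (k - 1) * p


p = 0.2              # illustrative value (an assumption)
ks = range(1, 2001)  # truncated support; the remaining tail is negligible

mean = sum(k * geometric_pmf(k, p) for k in ks)
var = sum((k - mean) ** 2 * geometric_pmf(k, p) for k in ks)

print("mean:", mean, "vs 1/p =", 1 / p)
print("variance:", var, "vs (1-p)/p^2 =", (1 - p) / p**2)
```

With p = 0.2, about 5 trials are needed on average, but the large variance shows that much longer waits are not unusual.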


### <a id='toc5_4_'></a>[Visual representation](#toc0_)


Geometric distributions can be visualized as graphs of their probability mass function:

- The distribution is always right-skewed, and its most likely value (the mode) is always k = 1.
- As p increases, the probabilities decay faster with k, so more of the mass sits on small values of k and fewer trials are likely needed for success.


[Placeholder for graph showing Geometric distributions with different p values]


### <a id='toc5_5_'></a>[Examples of Geometric distributions](#toc0_)


1. **Classic example: Coin flips until heads**
   - Flipping a fair coin until you get heads
   - p = 0.5 for each flip
   - The number of flips needed follows a Geometric distribution

2. **Modern applications:**
   - Number of sales calls until making a sale
   - Number of job applications until getting an offer
   - Number of attempts until passing a test
   - Number of days until it rains in a dry climate


### <a id='toc5_6_'></a>[Applications in data science and machine learning](#toc0_)


1. **Survival analysis:** Modeling time until an event occurs
2. **Quality control:** Number of items inspected until finding a defect
3. **Customer behavior:** Modeling customer churn or time between purchases
4. **Network reliability:** Time until a system failure occurs


### <a id='toc5_7_'></a>[Relationship to other distributions](#toc0_)


The Geometric distribution is closely related to the Exponential distribution, its continuous counterpart: the Geometric counts discrete trials until the first success, while the Exponential measures a continuous waiting time. In a Poisson process, the time between consecutive events follows an Exponential distribution.


A defining characteristic of the Geometric distribution is its **"memoryless"** property: having already seen n failures does not change the distribution of the number of additional trials needed. The probability of success on the next trial is always p, regardless of how many failures have occurred previously. Mathematically:

$P(X > n + k \mid X > n) = P(X > k)$


This property makes the Geometric distribution particularly useful in modeling scenarios where past failures don't influence future probabilities of success.
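
The memoryless identity can be checked with the tail probability P(X > k) = (1 - p)^k, which holds because X > k means the first k trials were all failures. The function name and the values p = 0.3, n = 4, k = 3 are illustrative assumptions.

```python
def geometric_sf(k: int, p: float) -> float:
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k


p, n, k = 0.3, 4, 3  # illustrative values (assumptions)

# P(X > n + k | X > n) = P(X > n + k) / P(X > n), since {X > n + k} implies {X > n}.
conditional = geometric_sf(n + k, p) / geometric_sf(n, p)
unconditional = geometric_sf(k, p)

print("P(X > n + k | X > n):", conditional)
print("P(X > k):            ", unconditional)
```

The two values agree up to floating-point rounding: the factor (1 - p)^n cancels in the ratio, which is exactly the memoryless property.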


Understanding the Geometric distribution is valuable for data scientists and analysts working with scenarios involving repeated trials until a success occurs. Its simplicity and well-defined properties make it a powerful tool for modeling and predicting in various fields, from marketing to reliability engineering.

## <a id='toc6_'></a>[Comparison of Distributions](#toc0_)

Understanding when to use each discrete probability distribution and how they relate to one another is crucial for effective data analysis and modeling. This section provides a comprehensive comparison of the distributions we've covered.


### <a id='toc6_1_'></a>[When to use each distribution](#toc0_)


1. **Bernoulli Distribution**
   - Use when: Modeling a single trial with only two possible outcomes (success/failure).
   - Examples: Coin flip, yes/no survey response, pass/fail test.

2. **Binomial Distribution**
   - Use when: Counting the number of successes in a fixed number of independent Bernoulli trials.
   - Examples: Number of heads in 10 coin flips, number of defective items in a batch of 100.

3. **Multinomial Distribution**
   - Use when: Counting outcomes in a fixed number of trials with more than two possible outcomes.
   - Examples: Counts of each face over repeated die rolls, market share among several competitors.

4. **Poisson Distribution**
   - Use when: Counting rare events in a fixed interval of time or space, with a known average rate.
   - Examples: Number of customer arrivals per hour, number of typos per page.

5. **Geometric Distribution**
   - Use when: Counting the number of trials until the first success occurs.
   - Examples: Number of sales calls until a sale, number of coin flips until heads appears.


### <a id='toc6_2_'></a>[Relationships between distributions](#toc0_)


1. **Bernoulli and Binomial**
   - A Binomial distribution with n=1 is equivalent to a Bernoulli distribution.
   - A Binomial distribution is the sum of n independent Bernoulli trials.

2. **Binomial and Multinomial**
   - The Multinomial distribution is a generalization of the Binomial to more than two outcomes.
   - If you collapse a Multinomial into two categories, it becomes a Binomial.

3. **Binomial and Poisson**
   - As n increases and p decreases in a Binomial(n,p), keeping np constant, it approaches a Poisson distribution.
   - This relationship is known as the Poisson limit theorem.

4. **Geometric and Negative Binomial**
   - The Geometric distribution is a special case of the Negative Binomial (which counts trials until the r-th success), with r = 1, where we're only interested in the first success.

5. **Poisson and Exponential**
   - If events occur according to a Poisson process, the time between events follows an Exponential distribution.


### <a id='toc6_3_'></a>[Comparison table](#toc0_)


| Distribution | Parameters | Mean | Variance | Use Case |
|--------------|------------|------|----------|----------|
| Bernoulli    | p          | p    | p(1-p)   | Single binary trial |
| Binomial     | n, p       | np   | np(1-p)  | Fixed trials, binary outcomes |
| Multinomial  | n, p₁...pₖ | npᵢ  | npᵢ(1-pᵢ) | Fixed trials, multiple outcomes |
| Poisson      | λ          | λ    | λ        | Rate of rare events |
| Geometric    | p          | 1/p  | (1-p)/p² | Trials until first success |


### <a id='toc6_4_'></a>[Key considerations for selection](#toc0_)


1. **Nature of the event:** Is it a count, a rate, or a "time until" scenario?
2. **Number of trials:** Fixed or potentially infinite?
3. **Number of possible outcomes:** Two or more?
4. **Independence:** Are events independent of each other?
5. **Constancy:** Does the probability of success remain constant?


### <a id='toc6_5_'></a>[Practical example: Customer behavior modeling](#toc0_)


Consider modeling different aspects of customer behavior:

1. Whether a customer makes a purchase on a visit (Bernoulli)
2. Number of purchases in 10 visits (Binomial)
3. Choice among multiple product categories (Multinomial)
4. Number of customer support requests per day (Poisson)
5. Number of visits until a purchase is made (Geometric)
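
The five scenarios above can be sketched as a small simulation using only the Python standard library. Every number here (purchase probability, category probabilities, support request rate) and every name is a hypothetical value chosen for illustration, not real customer data. The Poisson sampler uses Knuth's multiplication method, which is one simple option among several.

```python
import math
import random

rng = random.Random(42)  # fixed seed for reproducibility

# Hypothetical parameters, chosen only for illustration.
P_PURCHASE = 0.2
CATEGORY_PROBS = {"books": 0.5, "music": 0.3, "games": 0.2}
SUPPORT_RATE = 3.0  # average support requests per day


def purchase_on_visit() -> int:
    """Bernoulli: 1 if the visit ends in a purchase, else 0."""
    return 1 if rng.random() < P_PURCHASE else 0


def purchases_in_visits(n: int = 10) -> int:
    """Binomial: number of purchases in n independent visits."""
    return sum(purchase_on_visit() for _ in range(n))


def category_counts(n: int) -> dict:
    """Multinomial: how n purchases split across the product categories."""
    names = list(CATEGORY_PROBS)
    picks = rng.choices(names, weights=list(CATEGORY_PROBS.values()), k=n)
    return {name: picks.count(name) for name in names}


def support_requests() -> int:
    """Poisson: requests in one day, sampled with Knuth's multiplication method."""
    threshold = math.exp(-SUPPORT_RATE)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def visits_until_purchase() -> int:
    """Geometric: number of visits up to and including the first purchase."""
    visits = 1
    while not purchase_on_visit():
        visits += 1
    return visits


print("purchase on one visit:", purchase_on_visit())
print("purchases in 10 visits:", purchases_in_visits(10))
print("category split of 20 purchases:", category_counts(20))
print("support requests today:", support_requests())
print("visits until first purchase:", visits_until_purchase())
```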


Understanding these relationships and selection criteria allows data scientists to choose the most appropriate distribution for their specific scenario, leading to more accurate models and insights.


Remember, while these theoretical distributions are powerful tools, real-world data often doesn't perfectly fit any single distribution. It's crucial to validate your assumptions and consider more complex models when necessary.