## Probability Mass Function (PMF) and Probability Density Function (PDF)

The **Probability Mass Function (PMF)** and **Probability Density Function (PDF)** are both functions that define probability distributions, but they apply to different types of variables. Simply put, the PMF applies to discrete random variables, and the PDF applies to continuous random variables. Below is a detailed explanation of the differences.

### 1. Probability Mass Function (PMF)

- **Defined for discrete random variables.**
- The PMF assigns a probability to each possible value of the variable, i.e. the probability that the variable takes exactly that value.
- Every PMF value lies between 0 and 1, and the values sum to 1 over all possible outcomes.

**Example: Coin Toss**

For a fair coin toss, the possible outcomes are heads ($1$) or tails ($0$). The PMF gives the probability of each outcome:

$$ P(X=1) = 0.5 $$  
$$ P(X=0) = 0.5 $$

Where $X$ represents the outcome of the coin toss.

### 2. Probability Density Function (PDF)

- **Defined for continuous random variables.**
- The PDF does not give the probability of a specific value. It gives the probability density, from which probabilities over a range of values are computed by integration.
- Since a continuous variable can take uncountably many values, any single value has probability zero, so probabilities are calculated over intervals rather than at specific points.

**Example: Height Distribution**

For human heights, which are continuous, the probability of any one exact height is zero. Instead, we calculate the probability of a height falling within a range, such as between 160 cm and 170 cm. To calculate this probability, we integrate the PDF over the range:

$$ P(160 \leq X \leq 170) = \int_{160}^{170} f(x) \, dx $$

Where $X$ represents the height and $f(x)$ is the PDF.
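As a rough illustration of the integral above, the sketch below assumes heights follow a normal distribution with a mean of 165 cm and a standard deviation of 7 cm. Both numbers are hypothetical and chosen only for the example. For a normal distribution, the integral of the PDF over an interval equals the difference of the cumulative distribution function (CDF) at the two endpoints, which can be written with `math.erf`.

```python
import math

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of a normal distribution, expressed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Hypothetical parameters, chosen only for illustration.
MEAN_CM = 165.0
SD_CM = 7.0

# P(160 <= X <= 170): integral of the PDF over [160, 170],
# computed as the difference of the CDF at the endpoints.
p_interval = normal_cdf(170.0, MEAN_CM, SD_CM) - normal_cdf(160.0, MEAN_CM, SD_CM)

# A single point is an interval of zero width, so its probability is zero.
p_point = normal_cdf(170.0, MEAN_CM, SD_CM) - normal_cdf(170.0, MEAN_CM, SD_CM)

print(f"P(160 <= X <= 170) = {p_interval:.4f}")
print(f"P(X = 170)         = {p_point:.4f}")
```

The interval probability depends entirely on the assumed mean and standard deviation; the point probability is zero whatever they are.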

### 3. Differences between PMF and PDF

- **Target Variables:**
  - PMF is used for discrete random variables.
  - PDF is used for continuous random variables.
  
- **Probability Interpretation:**
  - PMF gives the probability of a specific outcome.
  - PDF gives the probability density and is used to calculate the probability over a range of values.
  
- **Probability Calculation:**
  - In PMF, we calculate the probability for each individual value.
  - In PDF, we calculate probabilities over intervals by integrating the PDF.

### 4. Example of Differences

**PMF Example: Dice Roll**

In a dice roll, the possible outcomes are discrete values (1, 2, 3, 4, 5, 6), and the PMF can be used to find the probability of each outcome:

$$ P(X=1) = P(X=2) = P(X=3) = P(X=4) = P(X=5) = P(X=6) = \frac{1}{6} $$
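A PMF for a small discrete variable can be represented directly as a lookup table. The sketch below, assuming a fair die, stores the PMF as a dictionary of exact fractions and checks the two properties that matter: the values sum to 1, and the probability of an event is the sum of the PMF over the outcomes it contains.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face maps to its probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF sums to exactly 1 over all outcomes.
total = sum(die_pmf.values())

# The probability of an event is the sum of the PMF over its outcomes,
# e.g. the event "the roll is even".
p_even = sum(p for face, p in die_pmf.items() if face % 2 == 0)

print(total)   # 1
print(p_even)  # 1/2
```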

**PDF Example: Human Height**

For human height, the probability of getting a specific value, like exactly 170 cm, is zero. Instead, we calculate the probability of the height falling within a range, say between 160 cm and 170 cm, using the PDF:

$$ P(160 \leq X \leq 170) = \int_{160}^{170} f(x) \, dx $$

## Probability Mass Function (PMF) and Probability Density Function (PDF) - Y-Axis Explanation

The **Probability Density Function (PDF)** represents the distribution of continuous random variables, so the y-axis in a PDF plot represents the **probability density**. It is important to note that the y-axis values are not probabilities, but probability densities.

Because a continuous variable has uncountably many possible values, the probability of any single value is zero. The probability density instead describes how concentrated the probability is around each value: for a small width $\Delta x$, the probability of falling in $[x, x + \Delta x]$ is approximately $f(x) \, \Delta x$.

To obtain the probability over an interval in a PDF, you need to integrate over that interval. In other words, the probability for a specific value is zero, but the probability over an interval is calculated by integrating the PDF over that range.

**Example: Height Distribution**

For instance, if human heights follow a normal distribution, we calculate the probability that the height falls within a specific range, such as between 160 cm and 170 cm. In this case, the y-axis represents the probability density.

In a normal distribution, the y-axis values are probability densities, not probabilities. A density can even exceed 1 (for example, when the standard deviation is small), whereas the area under the curve over any interval, which is the probability, never does.
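To make the distinction between density and probability concrete, the sketch below measures height in metres and assumes a normal distribution with mean 1.65 m and standard deviation 0.07 m (hypothetical values). With these units the density at the mean is well above 1, yet the area over the interval from 1.60 m to 1.70 m, approximated here with a midpoint Riemann sum, is an ordinary probability between 0 and 1.

```python
import math

def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical parameters, in metres.
MEAN_M = 1.65
SD_M = 0.07

# Height of the curve at the mean: a density, not a probability.
density_at_mean = normal_pdf(MEAN_M, MEAN_M, SD_M)

# Probability of [1.60, 1.70]: area under the curve (midpoint rule).
lo, hi, steps = 1.60, 1.70, 1000
width = (hi - lo) / steps
prob = sum(normal_pdf(lo + (i + 0.5) * width, MEAN_M, SD_M) for i in range(steps)) * width

print(f"density at mean      = {density_at_mean:.3f}")
print(f"P(1.60 <= X <= 1.70) = {prob:.4f}")
```

Changing the unit of measurement rescales the density values but leaves the interval probability unchanged.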

### Summary:
- In PMF, the y-axis represents **probability** and shows the probability for each discrete value.
- In PDF, the y-axis represents **probability density** and shows the density for each continuous value. This value itself is not a probability, and the probability over an interval is computed by integrating the PDF.

Thus, the main difference is that in PMF, the y-axis represents probability, while in PDF, the y-axis represents probability density.


## Geometric Distribution

The **Geometric Distribution** is a discrete probability distribution defined on a sequence of independent Bernoulli trials with a constant success probability. Two conventions are in common use: it can count the number of failures before the first success (values $0, 1, 2, \dots$), or the number of trials up to and including the first success (values $1, 2, 3, \dots$). This section uses the first convention, so $X = k$ means that $k$ failures occur and the first success arrives on trial $k + 1$.

### Properties of Geometric Distribution:
- It is a discrete random variable.
- Each trial has two possible outcomes (success or failure) as in Bernoulli trials.
- Trials are independent, meaning the outcome of one trial does not affect the outcome of another.
- The probability of success is constant across trials.

### Probability Mass Function (PMF) of Geometric Distribution
In the geometric distribution, the probability mass function (PMF) gives the probability that the first success will occur on the $(k+1)$-th trial. In other words, it represents the probability of having $k$ failures before the first success.

The PMF of a geometric distribution is given by:

$$
P(X = k) = (1 - p)^k \cdot p
$$

Where:
- $X$ is the number of failures before the first success (so the first success occurs on trial $X + 1$),
- $p$ is the probability of success on each trial,
- $k$ is a particular value of $X$, i.e. a number of failures ($k = 0, 1, 2, \dots$).

#### Example:
For example, if the probability of getting heads in a coin toss is 0.5, we can use the geometric distribution to model the number of tails before the first head appears. In this case, $p = 0.5$, and the probability of $k$ tails followed by the first head on the $(k+1)$-th toss is:

$$
P(X = k) = (1 - 0.5)^k \cdot 0.5 = 0.5^{k+1}
$$

Thus, the probability that the first head appears on the 3rd toss is:

$$
P(X = 2) = 0.5^3 = 0.125
$$
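The PMF is simple enough to write as a small function. The sketch below uses the failures-before-first-success form of the formula, $P(X = k) = (1 - p)^k \cdot p$, reproduces the coin-toss value for the 3rd toss, and checks that the probabilities add up to (almost) 1 over the first 50 values of $k$.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): k failures, then the first success on trial k + 1."""
    if k < 0:
        return 0.0
    return (1.0 - p) ** k * p

# First head on the 3rd toss means k = 2 tails come first.
p_third_toss = geometric_pmf(2, 0.5)
print(p_third_toss)  # 0.125

# The PMF sums to 1 over k = 0, 1, 2, ...; 50 terms are already very close.
partial_sum = sum(geometric_pmf(k, 0.5) for k in range(50))
print(partial_sum)
```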

### Mean and Variance of Geometric Distribution
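The mean and variance of a geometric distribution, with $X$ counting the failures before the first success, are as follows:

- **Mean (Expected Value):** The average number of failures before the first success is:

$$
E(X) = \frac{1 - p}{p}
$$

The number of trials up to and including the first success is $X + 1$, so its mean is $\frac{1 - p}{p} + 1 = \frac{1}{p}$.

- **Variance:** Adding the constant 1 does not change the variance, so the number of failures and the number of trials share the same variance:

$$
Var(X) = \frac{1 - p}{p^2}
$$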
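The moments can be checked numerically from the PMF $P(X = k) = (1 - p)^k \cdot p$, where $X$ counts failures before the first success. The sketch below uses $p = 0.2$ (an arbitrary illustrative choice) and a truncated sum. Under this convention the mean number of failures is $(1 - p)/p$, the mean number of trials $X + 1$ is $1/p$, and both share the variance $(1 - p)/p^2$.

```python
p = 0.2  # arbitrary success probability for illustration

# Truncate the infinite sum; terms beyond k = 2000 are negligible.
ks = range(2000)
pmf = [(1.0 - p) ** k * p for k in ks]

mean_failures = sum(k * q for k, q in zip(ks, pmf))
var_failures = sum((k - mean_failures) ** 2 * q for k, q in zip(ks, pmf))

# Counting the successful trial as well shifts the mean by exactly 1.
mean_trials = mean_failures + 1.0

print(mean_failures, (1.0 - p) / p)      # failures before first success
print(mean_trials, 1.0 / p)              # trials up to and including it
print(var_failures, (1.0 - p) / p ** 2)  # same variance in both conventions
```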

### Use Cases of Geometric Distribution
The geometric distribution is useful in modeling scenarios where we are interested in the number of trials until the first success. Some examples include:

- **Customer Service:** The number of customer interactions until the first positive feedback is received.
- **Manufacturing:** The number of defective products produced before the first non-defective product is produced.
- **Gaming:** The number of attempts until the first victory in a game.

### Relation to Other Distributions
The geometric distribution is a special case of the **Negative Binomial Distribution**. The negative binomial distribution models the number of failures before achieving a specified number of successes $r$; setting $r = 1$ gives the geometric distribution. In other words, the geometric distribution focuses on the number of failures until the first success, whereas the negative binomial distribution focuses on the number of failures until a given number of successes.

### Summary:
- The geometric distribution models the number of failures before the first success in a sequence of independent Bernoulli trials.
- **Probability Mass Function (PMF):** 
$$
P(X = k) = (1 - p)^k \cdot p
$$
- **Mean:** (of the number of failures; the number of trials, $X + 1$, has mean $\frac{1}{p}$)
$$
E(X) = \frac{1 - p}{p}
$$
- **Variance:** 
$$
Var(X) = \frac{1 - p}{p^2}
$$
- The geometric distribution is useful for modeling real-world scenarios that involve waiting for the first success.


## Negative Hypergeometric Distribution

The **Negative Hypergeometric Distribution** is a discrete probability distribution that models the number of successes drawn from a finite population, sampling one item at a time without replacement, before a fixed number of failures has been drawn.

In simple terms, the number of failures is fixed in advance and the number of draws is random. This is the reverse of the hypergeometric distribution, where the number of draws is fixed and the number of successes is random.

### Properties of Negative Hypergeometric Distribution:
- It occurs when sampling is done from a finite population.
- Each item is one of two types (success or failure).
- The sampling is done without replacement, meaning once an item is selected, it is not replaced.
- Sampling stops when a specified number of failures has been drawn; the random variable counts the successes drawn up to that point.

### Probability Mass Function (PMF) of Negative Hypergeometric Distribution
The probability mass function (PMF) of the negative hypergeometric distribution is as follows:

$$
P(X = k) = \frac{\binom{k + m - 1}{k} \binom{N - m - k}{r - k}}{\binom{N}{r}}, \quad k = 0, 1, \dots, r
$$

Where:
- $X$ is the number of successes drawn before the $m$-th failure,
- $r$ is the total number of successes in the population,
- $N$ is the total population size (the sum of successful and failed items),
- $m$ is the number of failures at which sampling stops ($1 \leq m \leq N - r$),
- $k$ is the number of successes drawn.

The first binomial coefficient counts the ways to place $k$ successes among the first $k + m - 1$ draws (draw number $k + m$ must be the $m$-th failure). The second counts the ways to place the remaining $r - k$ successes among the $N - m - k$ items not yet drawn. The denominator counts all ways to place the $r$ successes among the $N$ positions.

#### Example: Drawing Joker Cards from a Deck
For example, imagine a deck of 52 cards where 10 of them are jokers. We draw cards one at a time, without replacement, until 5 non-joker cards have been drawn, and want the probability that a certain number of jokers appeared along the way. In this case, we have:
- $N = 52$ (the total number of cards),
- $r = 10$ (the number of jokers),
- $m = 5$ (the number of non-joker cards at which drawing stops).

We would use the negative hypergeometric distribution to find the probability of drawing a specific number of jokers before the 5th non-joker.

### Relationship to Other Distributions
The negative hypergeometric distribution is closely related to the **Hypergeometric Distribution**. Both describe sampling without replacement from a population with two types of items. The hypergeometric distribution fixes the sample size and counts the successes in it, while the negative hypergeometric distribution fixes the number of failures and counts the successes drawn before that number is reached.

This mirrors the relationship between the **Binomial Distribution** and the **Negative Binomial Distribution**. Those two assume independent trials with a constant success probability, which corresponds to sampling with replacement, whereas the negative hypergeometric distribution deals with **sampling without replacement**.

### Summary:
- The negative hypergeometric distribution models the number of successes drawn before a fixed number of failures when sampling without replacement.
- The **Probability Mass Function (PMF)** gives the probability of drawing exactly $k$ successes before the $m$-th failure:
$$
P(X = k) = \frac{\binom{k + m - 1}{k} \binom{N - m - k}{r - k}}{\binom{N}{r}}
$$
- It is closely related to the hypergeometric distribution and is suited for situations where sampling without replacement continues until a set number of failures is seen.


## Hypergeometric Distribution

The **Hypergeometric Distribution** is a discrete probability distribution that models the number of successes in a sample drawn without replacement from a finite population that contains both successes and failures. The key aspect is that the sampling is done **without replacement**, meaning once an item is selected, it is not returned to the population.

In simple terms, the hypergeometric distribution is used to find the probability of a certain number of successes in a fixed number of trials, where the population has two categories of items (e.g., successes and failures).

### Properties of the Hypergeometric Distribution:
- Sampling is done **without replacement**, meaning once an item is selected, it cannot be selected again.
- The population consists of two types of items, such as successful items (e.g., 'joker cards') and non-successful items (e.g., 'regular cards').
- The distribution models the probability of drawing a certain number of successes from the sample.

The **Hypergeometric Distribution** is similar to the **Binomial Distribution**, but the main difference is that the binomial distribution assumes **sampling with replacement**, while the hypergeometric distribution assumes **sampling without replacement**.

### Probability Mass Function (PMF) of Hypergeometric Distribution
The probability mass function (PMF) of the hypergeometric distribution is given by:

$$
P(X = k) = \frac{\binom{r}{k} \binom{N - r}{n - k}}{\binom{N}{n}}
$$

Where:
- $X$ is the number of successes in the sample,
- $N$ is the total population size (the sum of successful and non-successful items),
- $r$ is the total number of successes in the population,
- $n$ is the sample size (the number of items drawn),
- $k$ is the number of successes in the sample.

This function calculates the probability of drawing exactly $k$ successes from a sample of size $n$ out of a population of size $N$, where there are $r$ successes in the population. The formula uses combinations to calculate the number of ways to select successes and failures from the population.

#### Example: Drawing Joker Cards from a Deck
For example, consider a deck of 52 cards, where 12 of them are jokers. If we draw 5 cards from the deck, the probability of drawing a certain number of jokers can be computed. In this case:
- $N = 52$ (total number of cards),
- $r = 12$ (number of jokers),
- $n = 5$ (number of cards drawn).

The hypergeometric distribution allows us to calculate the probability of drawing exactly $k$ jokers from the 5 drawn cards.
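The PMF can be evaluated directly with `math.comb`. The sketch below uses the numbers from the example ($N = 52$, $r = 12$, $n = 5$) and lists the probability of each possible number of jokers; the function name `hypergeom_pmf` is just an illustrative choice.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, r: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement,
    from a population of N items containing r successes."""
    # k cannot exceed r or n, and n - k cannot exceed the N - r failures.
    if k < max(0, n - (N - r)) or k > min(r, n):
        return 0.0
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)

N, r, n = 52, 12, 5
pmf = [hypergeom_pmf(k, N, r, n) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X = {k}) = {prob:.4f}")

# The probabilities over all possible k add up to 1.
print(sum(pmf))
```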

### Relationship to Other Distributions
The hypergeometric distribution is closely related to the **Binomial Distribution**. However, there is a key difference between the two:
- The **Binomial Distribution** assumes that each trial is independent and the probability of success remains constant for each trial. This is the case when sampling **with replacement**, meaning that each selection does not change the total population or the probability of success.
- The **Hypergeometric Distribution** is used for **sampling without replacement**, meaning the population size and the success probability change after each draw.

Additionally, the **Hypergeometric Distribution** is the fixed-sample-size counterpart of the **Negative Hypergeometric Distribution**, which instead fixes the number of failures and lets the number of draws vary.

### Mean and Variance of the Hypergeometric Distribution
The mean and variance of the hypergeometric distribution are as follows:

#### Mean (Expected Value):
$$
E(X) = \frac{n \cdot r}{N}
$$

#### Variance:
$$
Var(X) = \frac{n \cdot r \cdot (N - r) \cdot (N - n)}{N^2 \cdot (N - 1)}
$$

Where:
- $E(X)$ is the expected value (mean) of successes in the sample,
- $Var(X)$ is the variance of the number of successes in the sample.
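These two formulas can be cross-checked against the PMF. The sketch below, again using the illustrative values $N = 52$, $r = 12$, $n = 5$, computes the mean and variance by summing over the PMF and compares them with the closed-form expressions.

```python
from math import comb

N, r, n = 52, 12, 5  # illustrative values from the joker example

# PMF over every possible number of successes k = 0..n.
pmf = [comb(r, k) * comb(N - r, n - k) / comb(N, n) for k in range(n + 1)]

# Moments computed directly from the PMF.
mean_from_pmf = sum(k * p for k, p in enumerate(pmf))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * p for k, p in enumerate(pmf))

# Closed-form expressions for comparison.
mean_formula = n * r / N
var_formula = n * r * (N - r) * (N - n) / (N ** 2 * (N - 1))

print(mean_from_pmf, mean_formula)
print(var_from_pmf, var_formula)
```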

### Summary:
- The hypergeometric distribution models the number of successes in a sample drawn without replacement from a population.
- It is computed using combinations to determine the likelihood of obtaining a specific number of successes.
- The key difference from the binomial distribution is that the hypergeometric distribution assumes **sampling without replacement**.


## Gamma Distribution

The **Gamma Distribution** is a continuous probability distribution defined for non-negative real values. It is often used to model non-negative continuous quantities such as waiting times. For example, it is useful for modeling the waiting time for a phone call to come in, the time until a machine breaks down, or the time until a task is completed in a queue.

### Definition of Gamma Distribution
The Gamma distribution is defined by the **shape parameter** $k$ and the **scale parameter** $\theta$. The probability density function (PDF) is given by:

$$
f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k) \theta^k}, \quad x \geq 0
$$

Where:
- $x$ is a non-negative real random variable.
- $k > 0$: shape parameter.
- $\theta > 0$: scale parameter.
- $\Gamma(k)$ is the Gamma function.
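The density can be evaluated with `math.gamma` from the standard library. The sketch below defines the PDF and uses a simple midpoint rule (a generic numerical technique, not anything specific to the Gamma distribution) to check that, for the illustrative choice $k = 2$, $\theta = 1$, the density integrates to about 1 and has a mean of about $k\theta = 2$.

```python
import math

def gamma_pdf(x: float, k: float, theta: float) -> float:
    """Gamma density with shape k and scale theta, evaluated for x > 0."""
    if x <= 0.0:
        # Simplification: the single point x = 0 never affects a probability.
        return 0.0
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)

def integrate(g, lo: float, hi: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width

k, theta = 2.0, 1.0  # illustrative parameters

# The upper limit 50 stands in for infinity; the tail beyond it is negligible.
total = integrate(lambda x: gamma_pdf(x, k, theta), 0.0, 50.0)
mean = integrate(lambda x: x * gamma_pdf(x, k, theta), 0.0, 50.0)

print(total)  # close to 1
print(mean)   # close to k * theta
```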

### Parameters of Gamma Distribution
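- **Shape parameter** $k$: Determines the "shape" of the distribution. For $k \leq 1$ the density decreases monotonically from $x = 0$; for $k > 1$ it has a single peak at $x = (k - 1)\theta$. As $k$ increases (with $\theta$ fixed), the distribution shifts to the right, spreads out, and becomes more symmetric and bell-shaped.
- **Scale parameter** $\theta$: Determines the "scale" or the width of the distribution. Multiplying $\theta$ by a constant stretches the distribution horizontally by that factor, so a larger $\theta$ gives a wider distribution.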

### Mean and Variance of Gamma Distribution
The mean and variance of the Gamma distribution are as follows:

- **Mean**:
  $$
  \mu = k \cdot \theta
  $$

- **Variance**:
  $$
  \sigma^2 = k \cdot \theta^2
  $$

Where $k$ and $\theta$ are the shape and scale parameters, respectively.

### Examples of Using Gamma Distribution
1. **Modeling Waiting Times**  
   For example, the time a customer waits at a call center before being connected can be modeled using a Gamma distribution. When events arrive at a constant average rate and $k$ is an integer, the shape parameter $k$ is the number of events being waited for (for instance, the number of callers who must be served first), and $\theta$ is the mean time between events.

2. **Lifespan of Machinery**  
   The time until a machine or equipment breaks down can be modeled using the Gamma distribution. One interpretation is that the machine fails after $k$ independent shocks or degradation stages, each taking an exponentially distributed time with mean $\theta$.

3. **Modeling Poisson Processes**  
   The Gamma distribution is connected to the Poisson distribution. While the Poisson distribution models the number of events occurring within a fixed interval of time, the Gamma distribution with integer shape $k$ models the waiting time until the $k$-th event. For example, with $k = 1$ it models the time until the next phone call arrives.

### Properties of Gamma Distribution
- When the **shape parameter** $k = 1$, the Gamma distribution becomes the **Exponential distribution**. That is, the Gamma distribution generalizes the Exponential distribution.
  
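- When the shape parameter $k$ is a positive integer, the Gamma distribution (also called the **Erlang distribution**) is the distribution of the sum of $k$ independent Exponential waiting times, each with mean $\theta$. Just as the Exponential distribution is the continuous analogue of the **Geometric distribution**, the Gamma distribution with integer $k$ is the continuous analogue of the **Negative Binomial distribution**.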

### Examples of Gamma Distribution

1. **Gamma Distribution with $k = 2, \theta = 1$**  
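   When the shape parameter $k = 2$ and the scale parameter $\theta = 1$, the Gamma distribution models the waiting time until the second event in a process where events occur at an average rate of one per unit time. Since $\Gamma(2) = 1$ and $\theta^2 = 1$, the probability density function (PDF) is: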

   $$
   f(x; 2, 1) = x e^{-x}, \quad x \geq 0
   $$

2. **Gamma Distribution with $k = 3, \theta = 2$**  
   When the shape parameter $k = 3$ and the scale parameter $\theta = 2$, the Gamma distribution models the waiting time until the third event when the time between events is exponentially distributed with mean 2. In this case, the mean is:

   $$
   \mu = 3 \cdot 2 = 6
   $$

   And the variance is:

   $$
   \sigma^2 = 3 \cdot 2^2 = 12
   $$
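The values $\mu = 6$ and $\sigma^2 = 12$ for $k = 3$, $\theta = 2$ can be checked by simulation. The sketch below draws samples in two ways: directly with `random.gammavariate` (whose second argument is the scale), and as the sum of three independent exponential waiting times with mean 2, which has the same distribution when $k$ is an integer. The seed and sample size are arbitrary choices.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable
k, theta = 3, 2.0
n_samples = 100_000

def mean_and_variance(xs):
    """Sample mean and (population-style) variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

# Direct Gamma samples: gammavariate(shape, scale).
direct = [random.gammavariate(k, theta) for _ in range(n_samples)]

# Sum of k exponential waiting times; expovariate takes the rate 1 / theta.
summed = [sum(random.expovariate(1.0 / theta) for _ in range(k))
          for _ in range(n_samples)]

mean_direct, var_direct = mean_and_variance(direct)
mean_summed, var_summed = mean_and_variance(summed)

print(mean_direct, var_direct)  # both should be near 6 and 12
print(mean_summed, var_summed)  # likewise
```

Simulated values fluctuate around the theoretical ones; they only agree approximately.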



## Gamma Function

### 1. What is the Gamma Function?
For a positive integer $n$, the Gamma function, $\Gamma(n)$, satisfies:

$$
\Gamma(n) = (n-1)!
$$

The factorial is only defined for non-negative integers, whereas the Gamma function extends it to real and complex arguments. It is defined for all real and complex numbers except zero and the negative integers.

### 2. Factorial and Gamma Function
While the factorial is defined only for non-negative integers, the Gamma function can also be evaluated at non-integer real and complex arguments.

For example:

$$
3! = 3 \times 2 \times 1 = 6
$$

The Gamma function reproduces this value at the argument shifted by one:

$$
\Gamma(4) = 3! = 6
$$

Thus, the Gamma function agrees with the factorial function for natural numbers (with the argument shifted by one), and unlike the factorial it is also defined for non-integer arguments.

### 3. Definition of the Gamma Function (Extension to Real and Complex Numbers)
The Gamma function is defined as:

$$
\Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \, dt
$$

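This integral converges for real $x > 0$ (and for complex $x$ with positive real part). The function is extended to other arguments, except zero and the negative integers, by analytic continuation. In this way the factorial is extended to non-integer values.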
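The integral can be approximated numerically to see that it reproduces known values. The sketch below applies a midpoint rule on a finite range (the upper limit 60 is an assumption standing in for infinity, which is safe here because $e^{-t}$ decays quickly) and compares the result with $3! = 6$ and with `math.gamma` at a non-integer argument.

```python
import math

def gamma_integral(x: float, upper: float = 60.0, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of t^(x-1) e^(-t) on [0, upper].

    Intended for x >= 1, where the integrand stays bounded near t = 0.
    """
    width = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        total += t ** (x - 1) * math.exp(-t)
    return total * width

approx_4 = gamma_integral(4.0)    # should be close to 3! = 6
approx_2_5 = gamma_integral(2.5)  # a non-integer argument

print(approx_4, math.factorial(3))
print(approx_2_5, math.gamma(2.5))
```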

### 4. Relationship Between Gamma Function and Factorial
For a positive integer $n$, the Gamma function is related to the factorial:

$$
\Gamma(n) = (n-1)!
$$

For example:
- $$\Gamma(1) = 0! = 1$$
- $$\Gamma(2) = 1! = 1$$
- $$\Gamma(3) = 2! = 2$$
- $$\Gamma(4) = 3! = 6$$
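Python's standard library exposes the Gamma function as `math.gamma`, which makes the relationship easy to check. The short sketch below tabulates $\Gamma(n)$ next to $(n-1)!$ for small $n$, and also evaluates one non-integer argument, using the known value $\Gamma(0.5) = \sqrt{\pi}$.

```python
import math

# Gamma(n) next to (n - 1)! for n = 1..5.
table = [(n, math.gamma(n), math.factorial(n - 1)) for n in range(1, 6)]
for n, gamma_value, factorial_value in table:
    print(f"Gamma({n}) = {gamma_value}, ({n} - 1)! = {factorial_value}")

# A non-integer argument, where the factorial itself is not defined.
gamma_half = math.gamma(0.5)
print(gamma_half, math.sqrt(math.pi))
```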

### 5. Characteristics of the Gamma Function
The Gamma function can be defined for real and complex values, making it a generalization of the factorial function to non-integer values. It can be useful for calculating values at specific real points.

#### Example: Using the Gamma Function

1. **Gamma Function for Non-Integer Values**  
   The Gamma function can also be computed for non-integer values. For example, what is:

   $$ 
   \Gamma(0.5)
   $$

   The value of this is:

   $$ 
   \Gamma(0.5) = \sqrt{\pi}
   $$

   Thus, the Gamma function can give meaningful results for real values as well.

2. **Gamma Function in Probability Distributions**  
   The Gamma function is used in several probability distributions, such as the Gamma distribution and Beta distribution, where it appears in the normalizing constant that makes the density integrate to 1: $\Gamma(k)$ in the Gamma distribution and $\Gamma(\alpha + \beta) / (\Gamma(\alpha)\Gamma(\beta))$ in the Beta distribution.

### Summary
- The factorial is only defined for non-negative integers, but the Gamma function is defined for real and complex numbers (except zero and the negative integers).
- The Gamma function can be thought of as an extension of the factorial, with the relationship: 

$$ 
\Gamma(n) = (n-1)!
$$

- The Gamma function plays an important role in fields like probability theory, statistics, and physics.


## Negative Binomial Distribution

The **Negative Binomial distribution** is a discrete probability distribution that models the number of failures that occur before a fixed number of successes in a sequence of independent success-failure trials, each with the same success probability. It generalizes the geometric distribution, which is the special case of a single required success.

### Definition of Negative Binomial Distribution

The Negative Binomial distribution models the number of failures occurring before a fixed number of successes. For example, it can represent the number of failures before obtaining 3 heads in a coin toss.

### Probability Mass Function (PMF) of the Negative Binomial Distribution

The probability mass function (PMF) of the Negative Binomial distribution is given by:

$$ P(X = k) = \binom{k + r - 1}{k} p^r (1 - p)^k, \quad k = 0, 1, 2, \dots $$

Where:
- $X$ is the number of failures before the $r$-th success occurs.
- $r$ is the fixed number of successes.
- $p$ is the probability of success in each trial.
- $k$ is the number of failures.

In this formula, the term $\binom{k + r - 1}{k}$ counts the number of ways to place the $k$ failures among the first $k + r - 1$ trials; the final trial, number $k + r$, must be the $r$-th success.

### Parameters of the Negative Binomial Distribution

- $r$: The number of successes required (fixed).
- $p$: The probability of success in each trial.

### Mean and Variance of the Negative Binomial Distribution

- **Mean (Expected value):**
  $$ E[X] = \frac{r(1 - p)}{p} $$

- **Variance:**
  $$ \text{Var}(X) = \frac{r(1 - p)}{p^2} $$

Therefore, the mean and variance of the Negative Binomial distribution depend on the number of successes $r$ and the probability of success $p$.
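As a numerical check of the PMF and the two moment formulas, the sketch below evaluates the distribution for the illustrative values $r = 3$, $p = 0.5$ (for instance, tails before the 3rd head of a fair coin) and compares the moments obtained by summing over the PMF with $\frac{r(1-p)}{p}$ and $\frac{r(1-p)}{p^2}$.

```python
from math import comb

def neg_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): k failures before the r-th success."""
    return comb(k + r - 1, k) * p ** r * (1.0 - p) ** k

r, p = 3, 0.5  # illustrative parameters

# Truncate the infinite support; terms beyond k = 200 are negligible here.
ks = range(200)
pmf = [neg_binomial_pmf(k, r, p) for k in ks]

total = sum(pmf)
mean = sum(k * q for k, q in zip(ks, pmf))
var = sum((k - mean) ** 2 * q for k, q in zip(ks, pmf))

print(total)                       # close to 1
print(mean, r * (1 - p) / p)       # mean vs formula
print(var, r * (1 - p) / p ** 2)   # variance vs formula
```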

### Examples of Negative Binomial Distribution

- **Failures before 3 calls are received:**
  If the probability of receiving a call is $p = 0.2$, the distribution can model the number of failures (calls missed) before 3 calls are received.

- **Failures before 5 heads are obtained in a coin toss:**
  If the probability of heads in a coin toss is $p = 0.5$, the distribution can model the number of failures (tails) before 5 heads are obtained.

- **Non-defective items before 3 product failures in a factory:**
  In a factory where the probability that an inspected product is defective is $p = 0.1$, a "success" in the sense of the distribution is finding a defective product. The distribution can then model the number of non-defective products inspected before the 3rd defective one is found.

### Difference Between Binomial and Negative Binomial Distributions

- The **Binomial distribution** models the number of successes in a fixed number of trials.
- The **Negative Binomial distribution** models the number of failures before a fixed number of successes.

### Relationship Between Negative Binomial and Poisson Distributions

The Negative Binomial distribution is often used as an overdispersed alternative to the Poisson distribution for count data. The Poisson distribution has variance equal to its mean, whereas the Negative Binomial has variance $\frac{r(1 - p)}{p^2}$, which exceeds its mean $\frac{r(1 - p)}{p}$ whenever $p < 1$. The Poisson distribution arises as the limiting case when $r \to \infty$ while the mean is held fixed.


## Exponential Distribution

The **Exponential distribution** is a continuous probability distribution commonly used to model the time between events or the time until the next event occurs in a process with a constant average rate. Examples include the time until a phone call is received, the time until a machine breaks down, or the time between patient arrivals at a hospital.

### Characteristics of Exponential Distribution

- **Continuous probability distribution**: The Exponential distribution is used to model continuous random variables and the probability of events occurring within a specific interval.
- **Memoryless property**: The Exponential distribution is memoryless, meaning the remaining waiting time does not depend on how long one has already waited. In other words, if the event has not occurred yet, the probability of it happening within the next interval of a given length is the same as it was at the start.

### Probability Density Function (PDF) of the Exponential Distribution

The probability density function (PDF) of the Exponential distribution is given by:

$$ f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \geq 0 $$

Where:
- $x$ is the time between events (the waiting time for the next event).
- $\lambda$ is the rate parameter (the average number of events per unit time), with $\lambda > 0$.
- $e$ is the base of the natural logarithm (approximately 2.718).

### Parameter of the Exponential Distribution

- **λ (lambda)**: The rate parameter, which represents the average number of events occurring per unit time. For example, if $\lambda = 2$, this means on average, 2 events happen per unit time, so the mean waiting time between events is $1/2$ of a time unit.

### Mean and Variance of the Exponential Distribution

- **Mean (Expected value):**
  $$ E[X] = \frac{1}{\lambda} $$

- **Variance:**
  $$ \text{Var}(X) = \frac{1}{\lambda^2} $$

Thus, the mean of the Exponential distribution is $1/\lambda$, and the variance is $1/\lambda^2$.

### Examples of Exponential Distribution

- **Time until a phone call is received**:
  If, on average, 3 calls are received per hour, the time between two consecutive calls can be modeled by an Exponential distribution with $\lambda = 3$ per hour, so the mean time between calls is $1/3$ hour (20 minutes).

- **Time until a machine breaks down**:
  The time between machine breakdowns can also be modeled by an Exponential distribution. If a machine breaks down on average 2 times per day, then $\lambda = 2$ per day.

- **Time between patient arrivals at a hospital**:
  If, on average, 5 patients arrive per hour, the time between patient arrivals can be modeled by an Exponential distribution with $\lambda = 5$ per hour.

### Memoryless Property

One of the key characteristics of the Exponential distribution is its memorylessness. This means that, given that no event has occurred up to time $t$, the probability of waiting at least a further time $s$ does not depend on $t$. In mathematical terms:

$$ P(X > t + s \mid X > t) = P(X > s) $$

This property implies that if no event has occurred in time $t$, the probability of an event occurring in the next interval is the same as it would have been at the beginning.
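The memoryless identity follows from the survival function $P(X > x) = e^{-\lambda x}$, which is obtained by integrating the PDF from $x$ to infinity. The sketch below checks the identity numerically for arbitrary illustrative values of $\lambda$, $t$, and $s$.

```python
import math

def survival(x: float, lam: float) -> float:
    """P(X > x) for an Exponential distribution with rate lam."""
    return math.exp(-lam * x)

lam, t, s = 3.0, 0.5, 0.2  # arbitrary illustrative values

# P(X > t + s | X > t) = P(X > t + s) / P(X > t)
conditional = survival(t + s, lam) / survival(t, lam)

# P(X > s), with no waiting so far.
unconditional = survival(s, lam)

print(conditional, unconditional)  # the two values agree
```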

### Relationship Between Exponential and Poisson Distributions

The Exponential distribution and the Poisson distribution are closely related. The Poisson distribution models the number of events occurring in a fixed period of time, whereas the Exponential distribution models the time between these events.

For example, if events occur on average 3 times per unit time, the time between events follows an Exponential distribution, and the number of events occurring in that time follows a Poisson distribution.
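This relationship can be illustrated with a small simulation. The sketch below generates event times by adding up exponential gaps with rate $\lambda = 3$, then counts how many events fall in each unit-length interval. If the counts are Poisson distributed, their mean and variance should both be close to $\lambda$. The seed and the number of intervals are arbitrary choices.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable
lam = 3.0        # average number of events per unit time
horizon = 10_000 # number of unit-time intervals to simulate

# counts[i] holds the number of events in the interval [i, i + 1).
counts = [0] * horizon

# Walk along the time axis, jumping by exponential inter-event gaps.
t = random.expovariate(lam)
while t < horizon:
    counts[int(t)] += 1
    t += random.expovariate(lam)

mean_count = sum(counts) / horizon
var_count = sum((c - mean_count) ** 2 for c in counts) / horizon

print(mean_count)  # near lam
print(var_count)   # also near lam, as expected for Poisson counts
```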


## Poisson Distribution

The **Poisson distribution** is an important discrete probability distribution in probability theory. It is used to model the number of rare events occurring within a fixed period of time or space. In other words, it models the number of occurrences of events within a given unit of time, area, or volume.

### Characteristics of Poisson Distribution

- **Discrete probability distribution**: The Poisson distribution models the number of discrete events that occur, such as the number of phone calls, breakdowns, or accidents.
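- **Rare events**: The distribution is suitable when events occur independently of one another at a constant average rate, and each event is rare in the sense that the chance of one occurring in any very short sub-interval is small. Examples are "2 phone calls in an hour" or "3 accidents in a day."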
  
### Probability Mass Function (PMF) of the Poisson Distribution

The probability mass function (PMF) of the Poisson distribution is given by:

$$ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, 3, \dots $$

Where:
- $X$ is the number of events occurring (e.g., the number of occurrences of a specific event).
- $\lambda$ (lambda) is the average number of events occurring in a unit of time or space (the average rate of occurrence).
- $k$ is the number of events that occurred (an integer value 0 or greater).
- $e$ is the base of the natural logarithm (approximately 2.718).

### Parameter of the Poisson Distribution

- **λ (lambda)**: The mean and variance of the Poisson distribution, representing the average number of events occurring in a fixed interval of time or space. For example, if an average of 3 phone calls are received per hour, $\lambda = 3$.

### Mean and Variance of the Poisson Distribution

- **Mean (Expected value):**
  $$ E[X] = \lambda $$

- **Variance:**
  $$ \text{Var}(X) = \lambda $$

Thus, the mean and variance of the Poisson distribution are both equal to $\lambda$.
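The equality of mean and variance can be verified from the PMF. The sketch below uses $\lambda = 3$ (for example, 3 phone calls per hour on average), evaluates the PMF for the first 60 values of $k$, and computes the total probability, mean, and variance by direct summation.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # illustrative average rate

# Truncate the infinite support; terms beyond k = 60 are negligible here.
ks = range(60)
pmf = [poisson_pmf(k, lam) for k in ks]

total = sum(pmf)
mean = sum(k * q for k, q in zip(ks, pmf))
var = sum((k - mean) ** 2 * q for k, q in zip(ks, pmf))

print(total)  # close to 1
print(mean)   # close to lam
print(var)    # close to lam as well
```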

### Examples of Poisson Distribution

- **Number of phone calls received**:
  In a telephone exchange, the number of calls received in a fixed time period can be modeled using the Poisson distribution. For example, if an average of 5 calls are received per hour, $\lambda = 5$.

- **Number of website visitors**:
  The number of visitors to a website in a fixed time period can also be modeled using the Poisson distribution. For example, if 100 visitors arrive per hour on average, $\lambda = 100$.

- **Number of machine breakdowns**:
  The number of machine breakdowns in a factory can be modeled using the Poisson distribution. For example, if an average of 3 breakdowns occur per day, $\lambda = 3$.

- **Number of accidents at an intersection**:
  The number of accidents occurring at a specific intersection in a given time period can be modeled using the Poisson distribution. For example, if an average of 2 accidents occur per day, $\lambda = 2$.

### Relationship Between Poisson and Exponential Distributions

The Poisson distribution and the Exponential distribution are closely related. The Exponential distribution models the time between events, while the Poisson distribution models the number of events in a fixed period of time.

In other words, the Poisson distribution models the number of occurrences of an event, while the Exponential distribution models the time intervals between these events. For example, the time between phone calls follows an Exponential distribution, and the number of phone calls received within a fixed time period follows a Poisson distribution.


## Beta Distribution
The Beta distribution is a continuous probability distribution that represents the distribution of a random variable that lies between 0 and 1. It is commonly used to model probabilities or proportions.

### Definition of the Beta Distribution
The Beta distribution is defined by two parameters, $\alpha$ (a) and $\beta$ (b), and its probability density function (PDF) is as follows:

$$
f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \, x^{\alpha-1} (1 - x)^{\beta-1}
$$

Where:
- $x$: The value of the random variable, $0 \leq x \leq 1$.
- $\alpha, \beta > 0$: Shape parameters.
- $\Gamma$: The Gamma function, which can be understood as a generalized factorial ($\Gamma(n) = (n-1)!$ for a positive integer $n$).
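
As a quick check of the formula, the sketch below evaluates the density with `math.gamma` and confirms that it integrates to approximately 1. The parameter values $\alpha = 2$, $\beta = 5$ and the midpoint-rule helper `integrate` are assumptions made for illustration.

```python
import math

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def integrate(f, lo: float, hi: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, 5.0
total = integrate(lambda x: beta_pdf(x, a, b), 0.0, 1.0)
print(f"density at 0.2      = {beta_pdf(0.2, a, b):.4f}")
print(f"integral over [0,1] = {total:.6f}")
```

Note that the density at a point can exceed 1; only the area under the curve is constrained to equal 1.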

### The Role of Parameters $\alpha$ and $\beta$
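- **$\alpha$ (a)**: Can be interpreted as a count of successes (in Bayesian use, prior pseudo-successes plus observed successes).
- **$\beta$ (b)**: Can be interpreted as a count of failures, in the same sense.
- The shape of the distribution changes depending on the values of $\alpha$ and $\beta$.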

### Intuitive Understanding
- **$\alpha > 1$, $\beta > 1$**: The distribution has a single peak at $\frac{\alpha - 1}{\alpha + \beta - 2}$, which is near 0.5 only when $\alpha \approx \beta$.
- **$\alpha < 1$, $\beta < 1$**: The distribution is U-shaped, with density concentrated near the endpoints 0 and 1.
- **$\alpha = \beta$**: The distribution is symmetric about 0.5.

### Mean and Variance of the Beta Distribution
- **Mean**:

$$
\text{Mean} = \frac{\alpha}{\alpha + \beta}
$$

When $\alpha$ and $\beta$ are read as counts of successes and failures, the mean is the proportion of successes.

- **Variance**:

$$
\text{Variance} = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}
$$

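For a fixed mean, the variance decreases as $\alpha + \beta$ increases. Under the success/failure interpretation, $\alpha + \beta$ plays the role of the sample size, so more data gives a narrower distribution.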
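
The two formulas translate directly into code. The following sketch uses parameter pairs chosen for illustration so that the mean stays at 0.3 while $\alpha + \beta$ grows by a factor of 10 each time; the function names are not from any library.

```python
def beta_mean(a: float, b: float) -> float:
    """Mean of Beta(a, b)."""
    return a / (a + b)

def beta_variance(a: float, b: float) -> float:
    """Variance of Beta(a, b)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

# Same mean (0.3) with a growing alpha + beta.
for a, b in [(3, 7), (30, 70), (300, 700)]:
    print(f"Beta({a}, {b}): mean = {beta_mean(a, b):.3f}, "
          f"variance = {beta_variance(a, b):.6f}")
```

The mean is identical across the three lines, while the variance shrinks roughly in proportion to $1 / (\alpha + \beta)$.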

### Applications of the Beta Distribution
1. **Probability Modeling (Especially Bayesian Analysis)**
   - The Beta distribution is often used as a prior distribution for probabilities (e.g., click-through rates or success rates).
   - Example: Encoding a belief such as "the click-through rate is most likely around 0.3."

2. **Bayesian Update**
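   - As new data is observed, the Beta prior is updated to a Beta posterior. With a binomial likelihood, a $\text{Beta}(\alpha, \beta)$ prior combined with $s$ observed successes and $f$ observed failures gives a $\text{Beta}(\alpha + s, \beta + f)$ posterior.
   - In this sense, $\alpha$ accumulates successes and $\beta$ accumulates failures.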

3. **A/B Testing**
   - The Beta distribution is used to compare the click-through rates or success rates between two groups.
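
The Bayesian update and the A/B comparison described above can be sketched together. This is a minimal illustration, not a complete testing procedure: the uniform $\text{Beta}(1, 1)$ prior, the click counts for variants A and B, and the helper names `update` and `prob_b_beats_a` are all hypothetical. The comparison uses Monte Carlo draws from `random.betavariate`.

```python
import random

def update(alpha: float, beta: float, successes: int, failures: int) -> tuple[float, float]:
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures

def prob_b_beats_a(post_a, post_b, draws: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under independent Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a) for _ in range(draws)
    )
    return wins / draws

prior = (1.0, 1.0)  # uniform prior, an assumption for this sketch
post_a = update(*prior, successes=30, failures=970)  # hypothetical variant A
post_b = update(*prior, successes=45, failures=955)  # hypothetical variant B

p_b_better = prob_b_beats_a(post_a, post_b)
print("posterior A:", post_a, "mean =", round(post_a[0] / sum(post_a), 4))
print("posterior B:", post_b, "mean =", round(post_b[0] / sum(post_b), 4))
print("estimated P(B > A):", p_b_better)
```

The estimated probability is a summary of the two posteriors, not a frequentist p-value; how large it must be before acting is a decision left to the analyst.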


## Deviance and Pearson Residual

Deviance and Pearson residual are important metrics used to evaluate the goodness of fit and diagnose the performance of a model. They are primarily used in regression analysis or **Generalized Linear Models (GLM)** to assess how well the model explains the data.

### 1. Deviance

Deviance is a measure of model fit: it quantifies how far the fitted model is from the best fit that the data could possibly allow.

#### **Definition of Deviance**

Deviance is calculated from the **log-likelihood** and compares the **saturated model** (a model that fits the data perfectly) with the **fitted model** (the model we actually applied).

- **Saturated model**: A model with one parameter per observation, so that its fitted values equal the observed values. It attains the highest log-likelihood possible for the assumed distribution.
- **Fitted model**: The model that we actually apply to the data.

Deviance is calculated as the difference in log-likelihoods between the two models:

$$
D = -2 \times (\text{Log-Likelihood of the Fitted Model} - \text{Log-Likelihood of the Saturated Model})
$$

Where:
- **Log-Likelihood of the Fitted Model**: The log-likelihood of the model we fitted.
- **Log-Likelihood of the Saturated Model**: The log-likelihood of the saturated model.

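**A smaller Deviance value indicates a closer fit to the data.** Deviance is never negative, and it equals 0 only when the fitted model attains the same log-likelihood as the saturated model.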
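
To make the definition concrete, the sketch below computes the deviance for a Poisson model directly from the two log-likelihoods. The Poisson choice, the observed counts `y`, and the fitted means `mu_hat` are assumptions made for illustration; in the saturated Poisson model each mean equals its own observation.

```python
import math

def poisson_loglik(y: list[int], mu: list[float]) -> float:
    """Poisson log-likelihood of counts y given means mu (all mu > 0)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def poisson_deviance(y: list[int], mu_hat: list[float]) -> float:
    """D = -2 * (log-likelihood of fitted model - log-likelihood of saturated model)."""
    # The saturated model sets each mean equal to its own observation.
    # Observations are assumed positive here so that log(y_i) is defined.
    ll_saturated = poisson_loglik(y, [float(yi) for yi in y])
    ll_fitted = poisson_loglik(y, mu_hat)
    return -2.0 * (ll_fitted - ll_saturated)

y = [2, 3, 6, 7, 8]                 # hypothetical observed counts
mu_hat = [2.5, 3.5, 5.0, 6.5, 8.5]  # hypothetical fitted means
deviance = poisson_deviance(y, mu_hat)
print(f"deviance = {deviance:.4f}")
```

For the Poisson case this agrees with the closed form $D = 2 \sum_i \left[ y_i \log(y_i / \hat{\mu}_i) - (y_i - \hat{\mu}_i) \right]$, where $\hat{\mu}_i$ denotes the fitted mean.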

---

### 2. Pearson Residual

**Pearson Residual** is a standardized value representing the difference between the observed value and the predicted value, showing how well the model predicts each data point.

#### **Definition of Pearson Residual**

The Pearson residual is calculated as:

$$
r_i = \frac{y_i - \hat{y}_i}{\sqrt{\hat{V}(\hat{y}_i)}}
$$

Where:
- $y_i$: The observed value
- $\hat{y}_i$: The predicted value from the model
- $\hat{V}(\hat{y}_i)$: The estimated variance of $y_i$ under the model, evaluated at the predicted value (for example, $\hat{V}(\hat{y}_i) = \hat{y}_i$ for a Poisson model)

#### **Interpretation of Pearson Residual**

- **Pearson residuals close to 0** indicate that the model predicts the data well.
- **Positive residuals** mean that the observed value is greater than the predicted value.
- **Negative residuals** mean that the observed value is smaller than the predicted value.
- **Larger absolute residuals** indicate that the model does not explain the data point well.

#### **Use of Pearson Residual**

Pearson residuals are useful for checking how well the model explains each observation. If most Pearson residuals are close to 0, the model is a good fit. However, data points with large Pearson residuals may be outliers that the model does not explain well.
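
The calculation can be sketched for a Poisson model, where the variance equals the mean. The data, the fitted means, and the cutoff of 2 for flagging a residual as large are assumptions for this example; the cutoff is a common rule of thumb rather than a fixed rule.

```python
import math

def pearson_residuals(y, mu_hat, variance=lambda mu: mu):
    """(y_i - mu_i) / sqrt(V(mu_i)); the default V(mu) = mu is the Poisson case."""
    return [(yi - mi) / math.sqrt(variance(mi)) for yi, mi in zip(y, mu_hat)]

y = [2, 3, 6, 7, 20]                # hypothetical counts; the last one is unusual
mu_hat = [2.5, 3.5, 5.0, 6.5, 8.5]  # hypothetical fitted means
residuals = pearson_residuals(y, mu_hat)

for yi, mi, ri in zip(y, mu_hat, residuals):
    flag = "  <- large" if abs(ri) > 2 else ""
    print(f"y = {yi:2d}, fitted = {mi:4.1f}, residual = {ri:+.3f}{flag}")

# The sum of squared Pearson residuals is the Pearson chi-square statistic.
pearson_chi2 = sum(r * r for r in residuals)
print(f"Pearson chi-square = {pearson_chi2:.3f}")
```

Passing a different `variance` function adapts the same helper to other models, for example `lambda mu: mu * (1 - mu)` for a Bernoulli response.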

---

### 3. Difference Between Deviance and Pearson Residual

- **Deviance** is used to evaluate the overall goodness of fit of the model. It is a metric that assesses the quality of the entire model.
- **Pearson residual** is used to evaluate how well each individual data point is explained by the model. It is a diagnostic for individual predictions.

---

### Example

- If the **Deviance** value is small, the model fits the data well overall, which is useful for model fit tests.
- If a **Pearson residual** is large for a data point, it suggests that the data point may be an outlier that the model does not explain well. Examining such points can help improve the model.

---

### Conclusion

- **Deviance** provides an overall evaluation of the model fit and gives general information about how well the model explains the data.
- **Pearson residual** provides information about how well each observation fits the model and is useful for identifying outliers.

Therefore, **Deviance** and **Pearson residual** should be used together to evaluate the overall model fit and analyze individual observations when necessary.