## Bayes' Theorem

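Bayes' theorem updates the probability of an event $A$ after observing evidence $B$. It combines the prior probability $P(A)$ with the likelihood $P(B | A)$ and normalizes by the overall probability of the evidence $P(B)$, which must be nonzero: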

$$ P(A | B) = \frac{P(B | A) P(A)}{P(B)} $$

**Example: Diagnosing a Disease**

* $P(D) = 0.01$ (Probability of having the disease)
* $P(\neg D) = 0.99$ (Probability of not having the disease)
* $P(T | D) = 0.95$ (Probability of testing positive given disease)
* $P(T | \neg D) = 0.05$ (False positive rate)

Using Bayes' theorem, and expanding the denominator $P(T)$ with the law of total probability, the probability that a person who tested positive actually has the disease is:

$$ P(D | T) = \frac{P(T | D) P(D)}{P(T | D) P(D) + P(T | \neg D) P(\neg D)} $$

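$$ = \frac{0.95 \times 0.01}{(0.95 \times 0.01) + (0.05 \times 0.99)} = \frac{0.0095}{0.059} \approx 0.161 $$

So even after a positive test, the probability of having the disease is only about 16%: the disease is rare, so most positive results come from the much larger healthy group.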

```python
# Given probabilities
P_D = 0.01
P_not_D = 0.99
P_T_given_D = 0.95
P_T_given_not_D = 0.05

# Bayes' Theorem calculation
P_D_given_T = (P_T_given_D * P_D) / (P_T_given_D * P_D + P_T_given_not_D * P_not_D)
print(f"P(D | T) = {P_D_given_T:.3f}")
```
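As a sanity check, the same answer can be estimated by simulation: sample a large population, give each person the disease with probability $P(D)$, sample a test result with the appropriate conditional probability, and measure the fraction of positive testers who are actually sick. This is a minimal Monte Carlo sketch; the population size and random seed are arbitrary choices, not part of the original example.

```python
import numpy as np

# Probabilities from the diagnosis example above.
P_D = 0.01              # prior P(D)
P_T_given_D = 0.95      # P(T | D)
P_T_given_not_D = 0.05  # P(T | not D), the false positive rate

rng = np.random.default_rng(seed=0)  # fixed seed (arbitrary) for reproducibility
n = 1_000_000                        # simulated population size (arbitrary)

# Step 1: sample who has the disease.
has_disease = rng.random(n) < P_D

# Step 2: sample test results; the chance of a positive test depends on disease status.
p_positive = np.where(has_disease, P_T_given_D, P_T_given_not_D)
tests_positive = rng.random(n) < p_positive

# Step 3: among positive testers, the fraction who are sick estimates P(D | T).
estimate = has_disease[tests_positive].mean()
print(f"Simulated P(D | T) = {estimate:.3f}")
```

The estimate should land close to the exact value $0.0095 / 0.059 \approx 0.161$, with the gap shrinking as `n` grows.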


## Different Types of Probability Distributions in Deep Learning

Probability distributions play a fundamental role in deep learning: they model uncertainty, define priors in Bayesian neural networks, and describe how data is generated. Below are common distributions, with their mathematical definitions and typical deep learning applications.

**1. Bernoulli Distribution (Binary Outcomes)**

* Used for modeling binary events (e.g., success/failure, 0/1).
* Probability mass function (PMF):

    $$ P(X = x) = p^x (1 - p)^{(1-x)} $$

    where $x \in \{0, 1\}$, and $p$ is the probability of success.

* **Deep Learning Application:**
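    * Used in binary classification (a sigmoid output is interpreted as the parameter $p$ of a Bernoulli distribution) and in dropout regularization (each unit is kept or dropped according to an independent Bernoulli draw).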

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import bernoulli

p = 0.5  # Probability of success
x = [0, 1]
y = bernoulli.pmf(x, p)

plt.bar(x, y)
plt.title("Bernoulli Distribution (p=0.5)")
plt.show()
```
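To connect the Bernoulli distribution to dropout, here is a minimal sketch of "inverted dropout" at training time: every activation is multiplied by an independent Bernoulli(`keep_prob`) mask and the survivors are rescaled by `1 / keep_prob` so the expected activation is unchanged. The function name `dropout`, the keep probability of 0.8, and the all-ones activation matrix are illustrative assumptions, not a specific framework's API.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Hypothetical inverted-dropout helper (training mode only)."""
    # Each mask entry is an independent Bernoulli(keep_prob) draw: 1 = keep, 0 = drop.
    mask = rng.binomial(n=1, p=keep_prob, size=activations.shape)
    # Rescale kept units so that E[output] equals the original activation.
    return activations * mask / keep_prob, mask

rng = np.random.default_rng(seed=0)
h = np.ones((1000, 100))  # hypothetical layer activations
out, mask = dropout(h, keep_prob=0.8, rng=rng)

print(f"Fraction of units kept: {mask.mean():.3f}")
print(f"Mean activation after dropout: {out.mean():.3f}")
```

The kept fraction should be near `keep_prob`, and the mean activation near its pre-dropout value of 1.0, which is exactly what the rescaling is designed to preserve.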


**2. Binomial Distribution**

* Describes the number of successes $k \in \{0, 1, \dots, n\}$ in $n$ independent Bernoulli trials, each with success probability $p$.
* Probability mass function (PMF):

    $$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} $$

* **Deep Learning Application:**
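    * Models counts of successes over repeated independent trials, e.g., the number of correct predictions on $n$ test examples (assuming each is correct independently with the same probability $p$), or the number of units kept in a layer of $n$ units under dropout.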

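```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 10, 0.5  # number of trials, success probability per trial
x = np.arange(0, n + 1)  # possible success counts 0..n
y = binom.pmf(x, n, p)

plt.bar(x, y)
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.show()
```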
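The definition above says a binomial count is a sum of independent Bernoulli trials. This sketch checks that directly: it simulates many experiments of $n$ Bernoulli($p$) trials, counts successes, and compares the empirical frequencies with `binom.pmf`, along with the known mean $np$ and variance $np(1-p)$. The number of simulated experiments and the seed are arbitrary choices.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5
rng = np.random.default_rng(seed=0)
num_experiments = 100_000  # arbitrary simulation size

# Each row is one experiment consisting of n independent Bernoulli(p) trials.
trials = rng.random((num_experiments, n)) < p
successes = trials.sum(axis=1)  # number of successes per experiment

# Compare empirical frequencies of each count k with the binomial PMF.
k = np.arange(0, n + 1)
empirical = np.bincount(successes, minlength=n + 1) / num_experiments
theoretical = binom.pmf(k, n, p)

for k_i, e, t in zip(k, empirical, theoretical):
    print(f"k={k_i:2d}  empirical={e:.4f}  binomial PMF={t:.4f}")

print(f"Sample mean {successes.mean():.3f} vs n*p = {n * p}")
print(f"Sample variance {successes.var():.3f} vs n*p*(1-p) = {n * p * (1 - p)}")
```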


**3. Normal (Gaussian) Distribution**

* Defined by mean $\mu$ and variance $\sigma^2$.
* Probability density function (PDF):

    $$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$

* **Deep Learning Application:**
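    * Used for weight initialization (drawing initial weights from $\mathcal{N}(0, \sigma^2)$ with $\sigma$ scaled to the layer size), as the prior and approximate posterior over weights in Bayesian deep learning, and as the latent distribution in variational autoencoders.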

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu, sigma = 0, 1  # mean and standard deviation
x = np.linspace(-4, 4, 100)
y = norm.pdf(x, mu, sigma)

plt.plot(x, y)
plt.title("Normal Distribution (mu=0, sigma=1)")
plt.show()
```
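For weight initialization, a common choice with ReLU networks is He (Kaiming) normal initialization, which draws weights from $\mathcal{N}(0, 2 / \text{fan\_in})$, where fan_in is the number of inputs to the layer. The sketch below assumes that convention; the helper name `he_normal` and the 512 to 256 layer size are hypothetical, and deep learning frameworks ship their own initializers with additional options.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """Hypothetical helper: draw a (fan_in, fan_out) weight matrix from N(0, 2 / fan_in)."""
    std = np.sqrt(2.0 / fan_in)  # He et al. scaling, assumed here for ReLU layers
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

rng = np.random.default_rng(seed=0)
W = he_normal(fan_in=512, fan_out=256, rng=rng)  # hypothetical layer sizes

print(f"Empirical mean: {W.mean():.4f} (target 0)")
print(f"Empirical std:  {W.std():.4f} (target {np.sqrt(2 / 512):.4f})")
```

Scaling the variance by `1 / fan_in` keeps the variance of each layer's pre-activations roughly constant with depth, instead of growing or shrinking with the number of inputs.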


**4. Poisson Distribution**

* Models the number of events occurring in a fixed interval of time or space, when events happen independently at a constant average rate $\lambda$.
* Probability mass function:

    $$ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} $$

* **Deep Learning Application:**
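    * Used as the likelihood for count data (Poisson regression, trained with a Poisson negative log-likelihood loss) and for simulating event arrivals, such as customer arrivals in reinforcement learning environments.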

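```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lambda_ = 5  # average rate (events per interval)
x = np.arange(0, 15)  # counts 0..14; the PMF is truncated here for plotting
y = poisson.pmf(x, lambda_)

plt.bar(x, y)
plt.title("Poisson Distribution (lambda=5)")
plt.show()
```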
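A defining property of the Poisson distribution is that its mean and variance are both $\lambda$. This sketch checks that by sampling, and then computes the average negative log-likelihood of a few counts under a rate $\lambda$, which is the quantity a Poisson loss minimizes when a model predicts $\lambda$. The observed counts are made-up illustrative values, not data.

```python
import numpy as np
from scipy.stats import poisson

lambda_ = 5
rng = np.random.default_rng(seed=0)
samples = rng.poisson(lam=lambda_, size=100_000)

# For a Poisson distribution, mean and variance both equal lambda.
print(f"Sample mean:     {samples.mean():.3f} (theory {lambda_})")
print(f"Sample variance: {samples.var():.3f} (theory {lambda_})")

# Average negative log-likelihood of observed counts under rate lambda_.
counts = np.array([3, 5, 7, 4])  # hypothetical observed counts
nll = -poisson.logpmf(counts, lambda_).mean()
print(f"Mean Poisson NLL at lambda={lambda_}: {nll:.3f}")
```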


**5. Exponential Distribution**

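* Describes the waiting time until the next event when events occur independently at a constant rate $\lambda$ (the time between events in a Poisson process).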
* Probability density function:

    $$ f(x) = \lambda e^{-\lambda x}, \quad x \geq 0 $$

* **Deep Learning Application:**
    * Used in time-to-event modeling, such as queueing simulations and failure-rate (survival) analysis.

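```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import expon

lambda_ = 1  # event rate
x = np.linspace(0, 5, 100)
y = expon.pdf(x, scale=1/lambda_)  # SciPy parameterizes the exponential by scale = 1/lambda

plt.plot(x, y)
plt.title("Exponential Distribution (lambda=1)")
plt.show()
```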
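The exponential and Poisson distributions describe the same process from two angles: if the gaps between events are Exponential($\lambda$), the number of events in a unit interval is Poisson($\lambda$). The sketch below simulates that link by drawing exponential gaps, accumulating them into arrival times, and counting arrivals in $[0, 1]$. It assumes 50 gaps per interval are always enough; for $\lambda = 5$ the chance of 50 or more events in one interval is negligible.

```python
import numpy as np

lambda_ = 5.0           # event rate per unit time
num_intervals = 20_000  # number of simulated unit intervals (arbitrary)
max_events = 50         # assumed upper bound on events per interval
rng = np.random.default_rng(seed=0)

# Inter-arrival gaps ~ Exponential(lambda_); NumPy uses scale = 1/lambda.
gaps = rng.exponential(scale=1 / lambda_, size=(num_intervals, max_events))
arrival_times = np.cumsum(gaps, axis=1)  # increasing arrival times per interval

# Count arrivals that fall inside the unit interval [0, 1].
counts = (arrival_times <= 1.0).sum(axis=1)

print(f"Mean gap:   {gaps.mean():.4f} (theory 1/lambda = {1 / lambda_})")
print(f"Mean count: {counts.mean():.3f} (theory lambda = {lambda_})")
print(f"Count var:  {counts.var():.3f} (Poisson theory = {lambda_})")
```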


**6. Softmax Distribution**

* Converts a vector of real-valued scores (logits) into a categorical distribution: positive probabilities that sum to 1.
* Formula:

    $$ \text{softmax}(\mathbf{x})_i = \frac{e^{x_i}}{\sum_j e^{x_j}} $$

* **Deep Learning Application:**
    * Used in the output layer of neural networks for multi-class classification.

```python
import numpy as np

def softmax(x):
    # Subtracting max(x) avoids overflow in np.exp; the shift cancels in the ratio
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

x = np.array([2.0, 1.0, 0.1])
y = softmax(x)

print(f"Softmax Probabilities: {y}")
```
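Subtracting $\max_j x_j$ is safe because softmax is invariant to adding a constant $c$ to every input: the factor $e^{c}$ appears in both numerator and denominator and cancels. This sketch compares a naive implementation with the shifted one on logits that are large enough for `np.exp` to overflow. The names `softmax_naive` and `softmax_stable` are illustrative.

```python
import numpy as np

def softmax_naive(x):
    """Direct translation of the formula, with no overflow protection."""
    exp_x = np.exp(x)
    return exp_x / exp_x.sum()

def softmax_stable(x):
    """Shift by max(x) first; the common factor cancels in the ratio."""
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

small = np.array([2.0, 1.0, 0.1])
large = small + 1000.0  # same pairwise differences, so the softmax should be identical

stable_small = softmax_stable(small)
stable_large = softmax_stable(large)

# np.exp(1000.0) overflows to inf, so the naive version computes inf / inf = nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive_small = softmax_naive(small)
    naive_large = softmax_naive(large)

print(f"Naive,  small logits: {naive_small}")
print(f"Stable, small logits: {stable_small}")
print(f"Naive,  large logits: {naive_large}")
print(f"Stable, large logits: {stable_large}")
```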


## Expectation and Variance

**Expectation:**

$$ E[X] = \sum x P(X = x) $$

**Variance:**

$$ Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2 $$

**Example: Discrete Random Variable**

Consider a random variable $X$ representing the outcome of rolling a fair 6-sided die:

$$ X \in \{1, 2, 3, 4, 5, 6\} $$

Each outcome occurs with probability:

$$ P(X = x) = \frac{1}{6}, \quad \text{for } x \in \{1, 2, 3, 4, 5, 6\} $$

**Step 1: Compute Expectation $E[X]$**

Expectation is given by:

$$ E[X] = \sum x P(X = x) $$

Substituting values:

$$ E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} $$

$$ = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5 $$

**Step 2: Compute $E[X^2]$**

$$ E[X^2] = \sum x^2 P(X = x) $$

$$ E[X^2] = 1^2 \cdot \frac{1}{6} + 2^2 \cdot \frac{1}{6} + 3^2 \cdot \frac{1}{6} + 4^2 \cdot \frac{1}{6} + 5^2 \cdot \frac{1}{6} + 6^2 \cdot \frac{1}{6} $$

$$ = \frac{1+4+9+16+25+36}{6} = \frac{91}{6} \approx 15.167 $$

**Step 3: Compute Variance $Var(X)$**

$$ Var(X) = E[X^2] - (E[X])^2 $$

$$ = \frac{91}{6} - (3.5)^2 = \frac{91}{6} - \frac{49}{4} $$

$$ = \frac{182 - 147}{12} = \frac{35}{12} \approx 2.917 $$

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6])
P = np.ones_like(X) / len(X)  # Uniform probability

E_X = np.sum(X * P)  # Expectation
E_X2 = np.sum(X**2 * P)  # Expectation of X^2
Var_X = E_X2 - E_X**2  # Variance

print(f"Expectation (E[X]): {E_X}")
print(f"Expectation of X^2 (E[X^2]): {E_X2}")
print(f"Variance (Var(X)): {Var_X}")
```
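Floating-point output hides the exact fractions from the hand calculation. Using Python's standard `fractions` module, the same computation reproduces $7/2$, $91/6$, and $35/12$ exactly, and also confirms that the shortcut $E[X^2] - (E[X])^2$ agrees with the definition $E[(X - E[X])^2]$.

```python
from fractions import Fraction

outcomes = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

E_X = sum(x * p for x in outcomes)        # E[X]
E_X2 = sum(x**2 * p for x in outcomes)    # E[X^2]
Var_shortcut = E_X2 - E_X**2              # E[X^2] - (E[X])^2
Var_definition = sum((x - E_X) ** 2 * p for x in outcomes)  # E[(X - E[X])^2]

print(E_X, E_X2, Var_shortcut, Var_definition)  # 7/2 91/6 35/12 35/12
```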


## Markov Chains and Transition Matrix Explanation

A Markov Chain is a stochastic process where the probability of transitioning to a future state depends only on the current state and not on past states. This property is known as the Markov Property (memoryless property).

**Understanding the Transition Matrix P**

Given the transition matrix:

$$ P = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix} $$

This matrix gives the probabilities of moving between two states, State 1 and State 2.

* Each row represents the current state, and each column represents the next state.
* Each row must sum to 1 because it lists the probabilities of every possible next state (including staying in the current state).
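The row conditions above can be checked mechanically before a matrix is used as a transition matrix. This is a minimal sketch; the helper name `is_row_stochastic` and the tolerance are illustrative choices, and it also checks that the matrix is square with non-negative entries, which any valid transition matrix must be.

```python
import numpy as np

def is_row_stochastic(P, tol=1e-9):
    """Hypothetical check: square, non-negative, and every row sums to 1."""
    P = np.asarray(P, dtype=float)
    return (
        P.ndim == 2
        and P.shape[0] == P.shape[1]
        and bool(np.all(P >= 0))
        and bool(np.allclose(P.sum(axis=1), 1.0, atol=tol))
    )

print(is_row_stochastic([[0.7, 0.3], [0.4, 0.6]]))  # True
print(is_row_stochastic([[0.7, 0.2], [0.4, 0.6]]))  # False: first row sums to 0.9
```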

**Interpreting the Transition Probabilities**

$$ P = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix} $$

* **Row 1 (Current State = 1):**
    * 0.7 → Probability of staying in State 1.
    * 0.3 → Probability of transitioning from State 1 to State 2.
* **Row 2 (Current State = 2):**
    * 0.4 → Probability of transitioning from State 2 to State 1.
    * 0.6 → Probability of staying in State 2.

**Example Calculation: One-Step Transition**

Suppose the initial state distribution is the row vector:

$$ s_0 = \begin{bmatrix} 1 & 0 \end{bmatrix} $$

This means we start in State 1 with 100% certainty.

After one step, the new state probabilities are:

$$ s_1 = s_0 P = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix} $$

$$ = \begin{bmatrix} (1 \times 0.7 + 0 \times 0.4) & (1 \times 0.3 + 0 \times 0.6) \end{bmatrix} $$

$$ = \begin{bmatrix} 0.7 & 0.3 \end{bmatrix} $$

Thus, after one step:

* 70% probability of being in State 1
* 30% probability of being in State 2

```python
import numpy as np

# Transition matrix
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Initial state (100% probability in State 1)
s0 = np.array([1, 0])

# Compute next state probabilities
s1 = np.dot(s0, P)

print(f"State probabilities after one step: {s1}")
```
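Repeating the one-step update gives $s_n = s_0 P^n$. For this chain the distribution converges to a stationary distribution $\pi$ satisfying $\pi P = \pi$: from the first column, $\pi_1 = 0.7\pi_1 + 0.4\pi_2$, so $\pi_1 / \pi_2 = 4/3$ and $\pi = [4/7, 3/7]$. The sketch below computes $s_n$ for a few values of $n$ and recovers $\pi$ as the eigenvector of $P^\top$ with eigenvalue 1, normalized to sum to 1.

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
s0 = np.array([1.0, 0.0])

# n-step distribution: s_n = s_0 P^n
for n in [1, 2, 5, 20]:
    s_n = s0 @ np.linalg.matrix_power(P, n)
    print(f"n={n:2d}: {s_n}")

# Stationary distribution: a left eigenvector of P (eigenvector of P.T) with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigvals - 1.0))  # pick the eigenvalue closest to 1
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()  # normalize (this also fixes the eigenvector's sign)

print(f"Stationary distribution: {pi}")  # [4/7, 3/7], about [0.571, 0.429]
```

The other eigenvalue of $P$ is $0.3$, so the gap between $s_n$ and $\pi$ shrinks by a factor of $0.3$ per step, which is why $s_{20}$ is already indistinguishable from $\pi$ at printed precision.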
