# 📘 Probability and Information Theory: Key Concepts and Exercises

Use this chapter to review core probability tools that appear throughout modern deep learning. Concepts are paired with exercises so you can implement the ideas in code.


## 1. Probability Basics

- Random variables: discrete vs. continuous
- Probability distributions:
  - Probability Mass Function (PMF)
  - Probability Density Function (PDF)
- Key rules:
  - Joint probability $P(X, Y)$
  - Marginal probability $P(X)$
  - Conditional probability $P(X\mid Y)$
  - Independence and conditional independence
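The rules above can be checked on a small table. The following is a minimal sketch, assuming a made-up joint distribution over two binary variables; the `joint` values and the helper names are illustrative and not part of the chapter.

```python
# Hypothetical joint distribution P(X, Y) over two binary variables.
# Keys are (x, y) pairs; the four probabilities sum to 1.
joint = {
    (0, 0): 0.3,
    (0, 1): 0.2,
    (1, 0): 0.1,
    (1, 1): 0.4,
}

def marginal_x(joint, x):
    """P(X = x): sum the joint over every value of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(joint, y):
    """P(Y = y): sum the joint over every value of X."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def conditional_x_given_y(joint, x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    return joint[(x, y)] / marginal_y(joint, y)

def is_independent(joint, tol=1e-9):
    """True if P(x, y) = P(x) P(y) holds for every cell, up to tol."""
    return all(
        abs(p - marginal_x(joint, x) * marginal_y(joint, y)) <= tol
        for (x, y), p in joint.items()
    )

print("P(X=0):", marginal_x(joint, 0))
print("P(X=1 | Y=1):", conditional_x_given_y(joint, 1, 1))
print("X and Y independent?", is_independent(joint))
```

In this table the joint does not factor into the product of its marginals, so the independence check reports `False`.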


## 2. Common Distributions

- **Bernoulli**: binary outcomes
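- **Binomial**: number of successes in a fixed number of independent trials
- **Multinomial**: counts per category over a fixed number of trials (multi-class outcomes)
- **Gaussian (Normal)**: continuous, bell-shaped
- **Poisson**: number of events in a fixed interval, often used for rare events
- **Exponential**: waiting time between such events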
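As a rough illustration, most of these distributions have closed-form PMFs or PDFs that fit in a few lines of standard-library Python. The function names below are hypothetical helpers written for this sketch (the multinomial is omitted for brevity); in practice a library such as `scipy.stats` would usually be used instead.

```python
import math

def bernoulli_pmf(k, p):
    """P(X = k) for k in {0, 1} with success probability p."""
    return p if k == 1 else 1.0 - p

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k events in an interval) when events occur at average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def exponential_pdf(x, lam):
    """Density of the waiting time x between events with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Sanity check: a PMF sums to 1 over its support.
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
```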


## 3. Expectation and Variance

- **Expectation** (mean): average value of a random variable  
  $\displaystyle \mathbb{E}[X] = \sum_x x P(x)$ (discrete),  
  $\displaystyle \mathbb{E}[X] = \int x p(x)\,dx$ (continuous)
- **Variance**: measure of spread around the mean  
  $\displaystyle \text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]$
- **Covariance and correlation**: measure linear relationships between two variables
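The definitions above translate almost directly into code. This is a minimal sketch for the discrete case, assuming a fair six-sided die as the example; the helper names are illustrative, and `covariance` uses the population form (dividing by $n$) on paired samples.

```python
import math

def expectation(values, probs):
    """E[X] = sum over x of x * P(x) for a discrete distribution."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(X) = E[(X - E[X])^2]."""
    mu = expectation(values, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

def covariance(xs, ys):
    """Population covariance of paired samples (divides by n, not n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Example: a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
print("E[X]   =", expectation(faces, probs))
print("Var(X) =", variance(faces, probs))
```

For the die, the expectation is $3.5$ and the variance is $35/12$, up to floating-point rounding.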

## 4. Bayes' Theorem

$\displaystyle P(A\mid B) = \frac{P(B\mid A)P(A)}{P(B)}$

- Updates belief about event $A$ given evidence $B$
- Foundation of Bayesian inference
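For a binary event, the denominator $P(B)$ is usually expanded with the law of total probability, $P(B) = P(B\mid A)P(A) + P(B\mid \neg A)P(\neg A)$. The sketch below assumes that setting; the `posterior` function and the example numbers are illustrative, not taken from the exercises.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A).

    The evidence P(B) is computed with the law of total probability.
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Made-up numbers: 2% prior, 90% true positive rate, 10% false positive rate.
print(posterior(prior=0.02, likelihood=0.90, false_positive_rate=0.10))
```

Even with a fairly accurate test, a low prior keeps the posterior well below the likelihood, which is the effect Exercise 3 asks you to interpret.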


## 5. Information Theory

- Information content: $I(x) = -\log P(x)$
- Entropy: $H(X) = -\sum_x P(x)\log P(x)$
- Conditional entropy: $H(X\mid Y) = -\sum_{x,y} P(x, y)\log P(x\mid y)$
- Cross-entropy: $H(P, Q) = -\sum_x P(x)\log Q(x)$
- KL divergence: $D_{KL}(P\Vert Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$
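The three sums above are closely related: $H(P, Q) = H(P) + D_{KL}(P\Vert Q)$. The following is a small sketch that computes each quantity and checks the identity numerically. It assumes distributions given as lists of probabilities and uses base-2 logarithms (bits) by default; the function names and the example distributions are illustrative.

```python
import math

def entropy(p, base=2.0):
    """H(P) = -sum P(x) log P(x); terms with P(x) = 0 contribute 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

def cross_entropy(p, q, base=2.0):
    """H(P, Q) = -sum P(x) log Q(x).

    Raises ValueError if Q(x) = 0 where P(x) > 0 (the value is infinite).
    """
    return -sum(pi * math.log(qi, base) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q, base=2.0):
    """D_KL(P || Q) = sum P(x) log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

# Made-up example: a uniform "true" distribution and a skewed prediction.
p = [0.5, 0.5]
q = [0.9, 0.1]
print("H(P)       =", entropy(p))
print("H(P, Q)    =", cross_entropy(p, q))
print("KL(P || Q) =", kl_divergence(p, q))
print("H(P) + KL  =", entropy(p) + kl_divergence(p, q))
```

Note that KL divergence is not symmetric: swapping `p` and `q` generally gives a different value.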


## 6. Applications in Machine Learning

- Maximum Likelihood Estimation (MLE)
- Bayesian inference
- Loss functions based on information theory:
  - Cross-entropy loss
  - KL divergence (e.g., in variational inference)
- Probabilistic modeling of uncertainty
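To make the link between MLE and information-theoretic losses concrete, here is a minimal sketch for a Bernoulli model. It assumes made-up 0/1 observations and compares the closed-form estimate (the sample mean) with a coarse grid search over the negative log-likelihood; `bernoulli_nll` and the grid are illustrative choices, not a recommended optimizer.

```python
import math

def bernoulli_nll(theta, data):
    """Negative log-likelihood of i.i.d. 0/1 observations under Bernoulli(theta)."""
    return -sum(
        math.log(theta) if x == 1 else math.log(1.0 - theta) for x in data
    )

# Hypothetical observations: 6 successes out of 8 trials.
data = [1, 0, 1, 1, 0, 1, 1, 1]

# Closed-form MLE for a Bernoulli parameter: the fraction of successes.
theta_closed_form = sum(data) / len(data)

# Grid search over theta in (0, 1); the endpoints are excluded to avoid log(0).
grid = [i / 100 for i in range(1, 100)]
theta_grid = min(grid, key=lambda t: bernoulli_nll(t, data))

print("closed-form MLE:", theta_closed_form)
print("grid-search MLE:", theta_grid)
```

Dividing the negative log-likelihood by the number of observations gives the cross-entropy between the empirical distribution and the model, which is why minimizing cross-entropy loss corresponds to maximum likelihood.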


---

# Exercises


## Exercise 1: Probability Basics
A fair coin is flipped twice.
1. List the sample space.
2. Compute $P(\text{at least one head})$.
3. Compute $P(\text{first flip is tail} \mid \text{total heads} = 1)$.


In [None]:
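# Exercise 1: enumerate the sample space instead of hard-coding the answers
from itertools import product

space = ["".join(flips) for flips in product("HT", repeat=2)]  # HH, HT, TH, TT
one_head = [s for s in space if s.count("H") == 1]  # conditioning event: exactly one head
print("The sample space is:", " ".join(space))
print("P(at least one head):", sum("H" in s for s in space) / len(space))
print("P(first flip is tail | total heads = 1):", sum(s[0] == "T" for s in one_head) / len(one_head))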


## Exercise 2: Gaussian Distribution
Let $X \sim \mathcal{N}(0, 1)$.
1. What is $P(X \leq 0)$?
2. Compute the expectation $\mathbb{E}[X]$.
3. Compute the variance $\text{Var}(X)$.


In [None]:
# TODO: implement Exercise 2 calculations


## Exercise 3: Bayes' Theorem
A disease affects 1% of a population.
- Test sensitivity: 99%
- False positive rate: 5%

If a patient tests positive:
1. Compute $P(\text{disease} \mid \text{positive})$.
2. Interpret the result.


In [None]:
# TODO: implement Exercise 3 calculations


## Exercise 4: Entropy
A random variable $X$ has distribution: $P(X=0)=0.25$, $P(X=1)=0.25$, $P(X=2)=0.5$.
1. Compute entropy $H(X)$.
2. Which outcome contributes most to entropy?


In [None]:
# TODO: implement Exercise 4 calculations


## Exercise 5: Cross-Entropy and KL Divergence
True distribution $P = (0.7, 0.3)$.
Predicted distribution $Q = (0.6, 0.4)$.
1. Compute cross-entropy $H(P, Q)$.
2. Compute KL divergence $D_{KL}(P\Vert Q)$.


In [None]:
# TODO: implement Exercise 5 calculations
