### Probability Theory

**Probability theory** is a branch of mathematics concerned with the analysis of random phenomena and uncertainty. It provides the foundation for many data science applications, including machine learning, statistical inference, and decision-making under uncertainty.

In this note, we will cover two key areas of probability theory:
1. Basic probability concepts.
2. Probability distributions.

---

### 1. **Basic Probability Concepts**

#### a) **Definition of Probability**
Probability is a measure of the likelihood that a particular event will occur. It is expressed as a number between 0 and 1, where:
- 0 means the event is impossible.
- 1 means the event is certain.

When all outcomes of an experiment are equally likely, the probability of an event \( A \), denoted \( P(A) \), is calculated as:

$$P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}$$


**Example:**
Consider tossing a fair coin. There are two possible outcomes: heads (H) or tails (T). The probability of getting heads is:

$$P(\text{Heads}) = \frac{1}{2}$$


#### b) **Sample Space and Events**
- **Sample Space (S)**: The set of all possible outcomes of an experiment. For example, the sample space of tossing a coin is \( S = \{H, T\} \).
- **Event (A)**: A subset of the sample space, representing a specific outcome or group of outcomes. For example, getting heads is an event \( A = \{H\} \).
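The definitions above can be sketched with Python sets: the sample space is a set, an event is a subset of it, and the counting formula applies when every outcome is equally likely. The helper name `classical_probability` and the die example are illustrative assumptions, not part of the original note.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Return |A| / |S|; valid only when all outcomes are equally likely."""
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


# Coin toss: S = {H, T}, A = {H}
coin_space = {"H", "T"}
heads = {"H"}

# Die roll (illustrative extra example): S = {1, ..., 6}, A = "roll is even"
die_space = {1, 2, 3, 4, 5, 6}
even_roll = {2, 4, 6}

p_heads = classical_probability(heads, coin_space)
p_even = classical_probability(even_roll, die_space)
print(p_heads)  # 1/2
print(p_even)   # 1/2
```

Using `Fraction` keeps the results exact, which makes it easier to compare them with the hand-computed values.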

#### c) **Types of Events**
- **Independent Events**: Two events are independent if the occurrence of one does not change the probability of the other. For example, the results of two coin tosses are independent, since the result of one toss doesn’t affect the other.
- **Dependent Events**: Two events are dependent if the occurrence of one event affects the occurrence of the other. For example, drawing two cards from a deck without replacement is dependent because the outcome of the first draw affects the second.
  
#### d) **Complementary Events**
The complement of an event \( A \), denoted \( A^c \), consists of all outcomes in the sample space that are not in \( A \). The probability of the complement of an event is:

$$P(A^c) = 1 - P(A)$$

**Example:**
If the probability of rain today is \( P(\text{Rain}) = 0.3 \), the probability that it won’t rain is:

$$P(\text{No Rain}) = 1 - 0.3 = 0.7$$


#### e) **Addition Rule**
For two mutually exclusive events \( A \) and \( B \), the probability that either \( A \) or \( B \) will occur is:

$$P(A \cup B) = P(A) + P(B)$$


If events \( A \) and \( B \) are not mutually exclusive (i.e., they can happen together), we subtract the probability of both events occurring:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
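As a quick check of both forms of the addition rule, the sketch below enumerates a fair six-sided die. The events chosen ("even" and "greater than 3") and the helper `prob` are assumptions made for illustration.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event on a fair die (equally likely outcomes)."""
    return Fraction(len(event & die), len(die))


# Overlapping events: the general rule subtracts the intersection.
even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
direct = prob(even | greater_than_3)
by_rule = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)
print(direct, by_rule)  # 2/3 2/3

# Mutually exclusive events: the intersection is empty, so the sum suffices.
one, two = {1}, {2}
exclusive_direct = prob(one | two)
exclusive_sum = prob(one) + prob(two)
print(exclusive_direct, exclusive_sum)  # 1/3 1/3
```

Without the subtraction, the outcomes 4 and 6 would be counted twice in the first case.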


#### f) **Multiplication Rule**
For two independent events \( A \) and \( B \), the probability that both events occur is:

$$P(A \cap B) = P(A) \times P(B)$$


For dependent events, the conditional probability is used:

$$P(A \cap B) = P(A) \times P(B|A)$$

Where \( P(B|A) \) is the probability of \( B \) occurring given that \( A \) has occurred.
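A minimal sketch of the multiplication rule for dependent events, using the example of drawing two Aces without replacement. The rule-based result is compared against a brute-force enumeration of all ordered pairs of cards. The deck encoding (rank 1 standing for the Ace) is an assumption made for this illustration.

```python
from fractions import Fraction
from itertools import permutations

# Deck as (rank, suit) pairs; rank 1 is treated as the Ace (illustrative encoding).
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]

# Rule: P(A and B) = P(A) * P(B | A)
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one Ace is already gone
p_two_aces_rule = p_first_ace * p_second_ace_given_first

# Brute force: count ordered pairs of distinct cards in which both are Aces.
ordered_pairs = list(permutations(deck, 2))
both_aces = [pair for pair in ordered_pairs if pair[0][0] == 1 and pair[1][0] == 1]
p_two_aces_enum = Fraction(len(both_aces), len(ordered_pairs))

# With replacement the draws are independent, so P(A and B) = P(A) * P(B).
p_two_aces_with_replacement = Fraction(4, 52) ** 2

print(p_two_aces_rule)              # 1/221
print(p_two_aces_enum)              # 1/221
print(p_two_aces_with_replacement)  # 1/169
```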

#### g) **Conditional Probability**
Conditional probability is the probability of an event occurring given that another event has already occurred. The conditional probability of \( A \) given \( B \) is written as:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

This is defined whenever \( P(B) > 0 \), and it captures how knowing that \( B \) occurred changes the probability of \( A \).

**Example:**
Consider a deck of 52 cards. If you draw a card, the probability of it being an Ace is:

$$P(\text{Ace}) = \frac{4}{52}$$

If you know that the card drawn is a Spade (13 Spades in the deck), the conditional probability that it’s the Ace of Spades is:

$$P(\text{Ace} | \text{Spade}) = \frac{1}{13}$$
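The card example can be reproduced directly from the definition \( P(A|B) = P(A \cap B) / P(B) \). This is a sketch that builds the deck as a set of (rank, suit) pairs; the names `prob` and `conditional` are hypothetical helpers.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
deck = {(rank, suit) for rank in ranks for suit in suits}


def prob(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(len(event), len(deck))


def conditional(a, b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    if not b:
        raise ValueError("cannot condition on an event with probability 0")
    return prob(a & b) / prob(b)


aces = {card for card in deck if card[0] == "A"}
spades = {card for card in deck if card[1] == "Spades"}

p_ace = prob(aces)
p_ace_given_spade = conditional(aces, spades)
print(p_ace)              # 1/13
print(p_ace_given_spade)  # 1/13
```

Here \( P(\text{Ace} | \text{Spade}) \) equals \( P(\text{Ace}) = 4/52 = 1/13 \), so in a full deck the events "Ace" and "Spade" happen to be independent.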


---

### 2. **Probability Distributions**

A **probability distribution** describes how the probabilities are distributed over the values of a random variable. There are two main types of probability distributions:
- **Discrete Probability Distributions**
- **Continuous Probability Distributions**

#### a) **Discrete Probability Distributions**
A discrete probability distribution deals with discrete random variables — variables that take on a countable number of distinct values. The probability of each value is non-negative and the sum of the probabilities is 1.

##### i) **Binomial Distribution**
A binomial distribution represents the number of successes in a fixed number of independent Bernoulli trials that share the same success probability (e.g., flipping a coin \( n \) times). The probability mass function (PMF) is given by:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Where:
- \( n \) is the number of trials.
- \( k \) is the number of successes.
- \( p \) is the probability of success in each trial.

**Example:**
If you flip a fair coin 5 times, the probability of getting exactly 3 heads is:

$$P(X = 3) = \binom{5}{3} \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^2 = 10 \times 0.125 \times 0.25 = 0.3125$$
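The PMF can be written in a few lines using only the standard library. The sketch below (the function name `binomial_pmf` is an assumption) reproduces the coin example and checks that the probabilities over all possible values of \( k \) sum to 1.

```python
from math import comb, isclose


def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_three_heads = binomial_pmf(3, 5, 0.5)
print(p_three_heads)  # 0.3125

# The PMF over k = 0, ..., n should sum to 1.
distribution = [binomial_pmf(k, 5, 0.5) for k in range(6)]
print(isclose(sum(distribution), 1.0))  # True
```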


##### ii) **Poisson Distribution**
A Poisson distribution models the probability of a given number of events happening in a fixed interval of time or space, given a constant mean rate of occurrence.

The PMF of the Poisson distribution is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Where:
- \( \lambda \) is the average number of events in the interval.
- \( k \) is the number of events observed in that interval.

**Example:**
If a customer service center receives an average of 4 calls per hour, the probability of receiving exactly 2 calls in the next hour is:

$$P(X = 2) = \frac{4^2 e^{-4}}{2!} = \frac{16 \times 0.0183}{2} = 0.1465$$
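The call-center calculation can be checked with a direct implementation of the PMF. This is a minimal sketch using the standard library; `poisson_pmf` is a hypothetical helper name, and the "at most 2 calls" line is an extra illustration not in the original example.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam**k * exp(-lam) / factorial(k)


p_two_calls = poisson_pmf(2, 4)
print(f"{p_two_calls:.4f}")  # 0.1465

# Probability of at most 2 calls: sum the PMF over k = 0, 1, 2.
p_at_most_two = sum(poisson_pmf(k, 4) for k in range(3))
print(f"{p_at_most_two:.4f}")  # 0.2381
```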


#### b) **Continuous Probability Distributions**
A continuous probability distribution deals with continuous random variables, which can take any value in an interval of real numbers. The probability of any single value is 0, so we consider the probability of a range of values instead, obtained by integrating the probability density function over that range.

##### i) **Normal Distribution**
The normal distribution is the most widely known continuous probability distribution. It is symmetric and bell-shaped, with the mean, median, and mode all equal. The probability density function (PDF) of a normal distribution is given by:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

Where:
- \( \mu \) is the mean.
- \( \sigma \) is the standard deviation.

The normal distribution is important because many real-world quantities are approximately normally distributed (e.g., heights, test scores).
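As a sketch of how the density turns into probabilities, the code below implements the PDF from the formula above and the cumulative distribution function via the error function `math.erf`. It then computes the probability of falling within 1, 2, and 3 standard deviations of the mean. The function names are assumptions made for this illustration.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x), expressed through the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


print(f"{normal_pdf(0):.4f}")  # 0.3989 (peak of the standard normal density)

# Probability of landing within k standard deviations of the mean.
within = {k: normal_cdf(k) - normal_cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"P(|X - mu| <= {k} sigma) = {p:.4f}")
# P(|X - mu| <= 1 sigma) = 0.6827
# P(|X - mu| <= 2 sigma) = 0.9545
# P(|X - mu| <= 3 sigma) = 0.9973
```

Because the CDF depends on \( x \) only through \( (x - \mu)/\sigma \), these three probabilities are the same for every normal distribution, not just the standard one.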

##### ii) **Exponential Distribution**
The exponential distribution models the time between events in a Poisson process. The PDF of the exponential distribution is:

$$f(x) = \lambda e^{-\lambda x} \text{ for } x \geq 0$$

Where \( \lambda \) is the rate parameter.

**Example:**
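If customers arrive at a store at a rate of 5 per hour, the probability that the next customer arrives within 10 minutes (1/6 hour) is given by the cumulative distribution function \( F(x) = 1 - e^{-\lambda x} \), not by the density itself:

$$P(X \leq 1/6) = 1 - e^{-5 \times 1/6} = 1 - 0.4346 = 0.5654$$

The density at that point, \( f(1/6) = 5 \times 0.4346 = 2.173 \), is greater than 1 and therefore cannot be read as a probability.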
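For a continuous variable, a probability comes from integrating the density over a range; for the exponential distribution that integral has the closed form \( F(x) = 1 - e^{-\lambda x} \). The sketch below computes both the density and the cumulative probability for the store example; the function names are hypothetical.

```python
from math import exp


def exponential_pdf(x, lam):
    """Density f(x) = lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0


def exponential_cdf(x, lam):
    """P(X <= x) = 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


rate = 5          # customers per hour
ten_minutes = 1 / 6  # in hours

density = exponential_pdf(ten_minutes, rate)
p_within_ten_minutes = exponential_cdf(ten_minutes, rate)

print(f"{density:.3f}")               # 2.173 (a density, can exceed 1)
print(f"{p_within_ten_minutes:.4f}")  # 0.5654 (a probability)
```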


---

### Summary

Probability theory provides a structured framework for reasoning about uncertainty. Understanding basic probability concepts, such as independence and conditional probability, and applying probability distributions like the binomial and normal distributions, is essential for many applications in data science, especially in machine learning, statistical modeling, and risk analysis.

- **Basic Probability Concepts**: Include sample space, events, and rules for calculating probabilities.
- **Probability Distributions**: Include discrete distributions (e.g., binomial and Poisson) and continuous distributions (e.g., normal and exponential), each with specific applications in real-world scenarios.

```python
import numpy as np
from scipy.stats import binom, norm

# 1. Basic Probability
# Example: Probability of drawing an ace from a deck of 52 cards
total_cards = 52
aces = 4
p_ace = aces / total_cards
# Four decimal places, so the printed value matches 4/52 = 0.0769
print(f"Probability of drawing an Ace from a deck of 52 cards: {p_ace:.4f}")
```

```
Probability of drawing an Ace from a deck of 52 cards: 0.0769
```


```python
# 2. Conditional Probability
# Example: Probability of drawing a King given that the card is a face card.
# (Aces are not face cards, so P(Ace | face card) would be 0.)
face_cards = 12  # Kings, Queens, and Jacks (4 each)
kings = 4
p_king_given_face = kings / face_cards
print(f"Conditional probability of drawing a King given it's a face card: {p_king_given_face:.2f}")
```

```
Conditional probability of drawing a King given it's a face card: 0.33
```


```python
# 3. Expected Value
# Example: Expected value of rolling a fair 6-sided die
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [1/6] * 6  # Uniform distribution for a fair die
expected_value = sum(outcome * prob for outcome, prob in zip(outcomes, probabilities))
print(f"Expected value of rolling a fair 6-sided die: {expected_value}")
```

```python
# 4. Variance of a Random Variable
# Example: Variance of rolling a fair 6-sided die
# (reuses outcomes, probabilities, and expected_value from the previous cell)
mean_outcome = expected_value
variance = sum((outcome - mean_outcome)**2 * prob for outcome, prob in zip(outcomes, probabilities))
print(f"Variance of rolling a fair 6-sided die: {variance}")
```

```python
# 5. Binomial Probability Distribution
# Example: Probability of getting exactly 3 heads in 5 coin flips, where p(heads) = 0.5
n = 5       # Number of trials (flips)
k = 3       # Desired number of successes (heads)
p = 0.5     # Probability of success (heads) in each trial
p_3_heads = binom.pmf(k, n, p)
print(f"Probability of getting exactly 3 heads in 5 flips: {p_3_heads:.2f}")
```

```python
# 6. Normal Distribution (Cumulative Distribution Function)
# Example: Probability of a value being less than or equal to 1.5 for a standard normal distribution
mean = 0
std_dev = 1
x_value = 1.5
p_less_than_x = norm.cdf(x_value, mean, std_dev)
print(f"Probability of a value being <= 1.5 in standard normal distribution: {p_less_than_x:.2f}")
```

```python
# 7. Simulation of Random Events
# Example: Simulate flipping a fair coin 1000 times and calculate the proportion of heads
np.random.seed(0)  # For reproducibility
n_flips = 1000
flips = np.random.choice(["Heads", "Tails"], size=n_flips, p=[0.5, 0.5])
p_heads_simulated = np.sum(flips == "Heads") / n_flips
print(f"Simulated probability of heads over 1000 flips: {p_heads_simulated:.2f}")
```


```python
# 8. Law of Large Numbers
# Demonstrating the Law of Large Numbers by averaging dice rolls
n_rolls = 10000
rolls = np.random.randint(1, 7, n_rolls)  # Upper bound is exclusive, so values are 1..6
average_roll = np.mean(rolls)
print(f"Average outcome after 10,000 dice rolls: {average_roll:.2f}")
```

```
Average outcome after 10,000 dice rolls: 3.49
```
