# Probability and Statistics for Machine Learning: Expectation, Variance, and Covariance

## 4. Expectation, Variance, and Covariance


### What is Expectation?

The expectation (or expected value) of a random variable is the long-run average of its outcomes over many independent repetitions of the experiment. For a discrete random variable \( X \) with possible values \( x_i \), each value is weighted by its probability:

\[
E(X) = \sum_{i} P(x_i) \cdot x_i
\]

For a continuous random variable with probability density function \( f(x) \), the sum becomes an integral:

\[
E(X) = \int_{-\infty}^{\infty} x \cdot f(x) \, dx
\]
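
As a rough numerical check of the integral, the sketch below approximates \( E(X) \) with the trapezoidal rule. The exponential density with `rate = 2.0`, the helper name `expectation_continuous`, and the truncation of the upper limit at 40 are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def expectation_continuous(pdf, lower, upper, num_points=200_001):
    """Approximate the integral of x * pdf(x) over [lower, upper] (trapezoidal rule)."""
    x = np.linspace(lower, upper, num_points)
    integrand = x * pdf(x)
    dx = x[1] - x[0]
    return float(np.sum((integrand[:-1] + integrand[1:]) / 2.0) * dx)

rate = 2.0  # assumed rate parameter of an exponential distribution

def exponential_pdf(x):
    return rate * np.exp(-rate * x)

# The density is negligible beyond x = 40, so the infinite upper limit is truncated there.
approx_mean = expectation_continuous(exponential_pdf, 0.0, 40.0)
print(approx_mean)  # should be close to the exact mean 1 / rate = 0.5
```

For densities with heavier tails, the truncation point and the number of grid points would need to be chosen more carefully.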

### Example: Expectation of a Dice Roll

The expected value of a fair six-sided die roll is:

\[
E(X) = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5
\]
    

In [None]:

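# Example: calculating the expected value of a fair six-sided die roll
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])  # possible outcomes
probabilities = np.full(6, 1/6)        # each face is equally likely

# E(X) = sum of each outcome weighted by its probability
expected_value = np.sum(values * probabilities)
expected_value  # approximately 3.5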
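
The long-run-average interpretation of expectation can also be checked by simulation. The following is a minimal sketch, assuming 100,000 simulated rolls and a fixed seed (both arbitrary choices), that tracks the running average of the rolls.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

# Simulate 100,000 rolls of a fair die; the upper bound 7 is exclusive
rolls = rng.integers(1, 7, size=100_000)

# Running average after each roll
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

print(running_mean[9])   # average of the first 10 rolls, can be far from 3.5
print(running_mean[-1])  # average of all rolls, close to 3.5
```

The running average fluctuates early on and settles near 3.5 as the number of rolls grows.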
    


### What is Variance?

Variance measures the spread or dispersion of a random variable around its mean. It is the expected value of the squared difference between the random variable and its mean:

\[
Var(X) = E[(X - E(X))^2]
\]

For a discrete random variable, it is:

\[
Var(X) = \sum_{i} P(x_i) \cdot (x_i - E(X))^2
\]

### Example: Variance of a Dice Roll

The variance of a fair six-sided die roll can be calculated using the formula above.
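
Written out with \( E(X) = 3.5 \) and \( P(x_i) = \frac{1}{6} \) for each face, the calculation is:

\[
Var(X) = \frac{1}{6}\left[(1 - 3.5)^2 + (2 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2 + (5 - 3.5)^2 + (6 - 3.5)^2\right] = \frac{17.5}{6} = \frac{35}{12} \approx 2.92
\]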
    

In [None]:

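# Example: calculating the variance of a fair six-sided die roll
# (reuses values, probabilities, and expected_value from the cell above)
mean_value = expected_value
# Var(X) = probability-weighted average of squared deviations from the mean
variance = np.sum(probabilities * (values - mean_value)**2)
variance  # approximately 2.9167 (exactly 35/12)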
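
An equivalent and often more convenient form is \( Var(X) = E(X^2) - (E(X))^2 \), which follows from expanding the square in the definition. The sketch below checks that both forms agree for the die; it redefines the arrays so that it runs on its own, and the variable names are illustrative.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])
probabilities = np.full(6, 1/6)

mean = np.sum(probabilities * values)              # E(X)
second_moment = np.sum(probabilities * values**2)  # E(X^2)

var_definition = np.sum(probabilities * (values - mean)**2)
var_shortcut = second_moment - mean**2

print(var_definition, var_shortcut)  # both approximately 35/12
```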
    


### What is Covariance?

Covariance measures how two random variables change together. It indicates whether increases in one variable are associated with increases (or decreases) in another. For two random variables \( X \) and \( Y \), the covariance is given by:

\[
Cov(X, Y) = E[(X - E(X))(Y - E(Y))]
\]

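If \( Cov(X, Y) > 0 \), the variables are positively correlated, meaning they tend to increase together. If \( Cov(X, Y) < 0 \), they are negatively correlated: one tends to decrease as the other increases. A covariance of zero indicates no linear relationship, although the variables may still be dependent.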
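
To make the definition concrete, the sketch below evaluates \( E[(X - E(X))(Y - E(Y))] \) directly from a joint probability table. The helper `covariance_from_joint` and the two joint distributions are hypothetical, chosen only to show one positive and one negative case.

```python
import numpy as np

def covariance_from_joint(x_values, y_values, joint_pmf):
    """Cov(X, Y) where joint_pmf[i, j] = P(X = x_values[i], Y = y_values[j])."""
    x_values = np.asarray(x_values, dtype=float)
    y_values = np.asarray(y_values, dtype=float)
    joint_pmf = np.asarray(joint_pmf, dtype=float)

    mean_x = np.sum(joint_pmf.sum(axis=1) * x_values)  # E(X) from the marginal of X
    mean_y = np.sum(joint_pmf.sum(axis=0) * y_values)  # E(Y) from the marginal of Y

    # Outer product gives (x_i - E(X)) * (y_j - E(Y)) for every pair (i, j)
    deviations = np.outer(x_values - mean_x, y_values - mean_y)
    return float(np.sum(joint_pmf * deviations))

# Hypothetical joint distributions over X, Y in {0, 1}
move_together = [[0.4, 0.1],
                 [0.1, 0.4]]
move_apart = [[0.1, 0.4],
              [0.4, 0.1]]

cov_together = covariance_from_joint([0, 1], [0, 1], move_together)
cov_apart = covariance_from_joint([0, 1], [0, 1], move_apart)
print(cov_together, cov_apart)  # positive, then negative
```

In the first table most of the probability sits where both variables take the same value, so the covariance is positive; in the second it sits where they differ, so the covariance is negative.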

### Example: Covariance Between Two Variables

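Let's estimate the covariance from three paired observations of two variables. Note that `np.cov` returns the sample covariance matrix, which by default divides by \( n - 1 \) rather than \( n \).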
    

In [None]:

# Example: calculating covariance between two variables
X = np.array([1, 2, 3])
Y = np.array([4, 5, 6])

# np.cov returns a 2x2 matrix: variances on the diagonal, covariance off it
cov_matrix = np.cov(X, Y)
covariance_X_Y = cov_matrix[0, 1]  # Covariance between X and Y
covariance_X_Y  # 1.0
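
For comparison, the same quantity can be computed by hand. This is a minimal sketch, with illustrative variable names, that contrasts the sample estimate (dividing by \( n - 1 \), the `np.cov` default) with the population-style estimate (dividing by \( n \), which `np.cov` gives with `bias=True`).

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([4, 5, 6])
n = X.size

# Sum of products of deviations from the sample means
cross_deviation_sum = np.sum((X - X.mean()) * (Y - Y.mean()))

sample_cov = cross_deviation_sum / (n - 1)  # matches np.cov's default
population_cov = cross_deviation_sum / n    # matches np.cov(..., bias=True)

print(sample_cov, np.cov(X, Y)[0, 1])
print(population_cov, np.cov(X, Y, bias=True)[0, 1])
```

With only three observations the two estimates differ noticeably; the gap shrinks as \( n \) grows.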
    


### Applications in Machine Learning

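- **Expectation** is used to calculate averages and expected outcomes in probabilistic models; for example, a training objective is often an expected loss estimated by averaging over samples.
- **Variance** is used to understand the spread of data. In linear regression, for example, \( R^2 \) reports the fraction of the target's variance that the model explains.
- **Covariance** is used in Principal Component Analysis (PCA), which works from the covariance matrix of the features to find the directions of greatest variance.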
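
To illustrate the PCA point, the sketch below computes the covariance matrix of a small synthetic dataset and takes its eigenvectors as principal directions. The data, the seed, and the variable names are assumptions made for illustration; this shows the core idea, not a full PCA implementation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data (assumed for illustration): the second feature is about twice the first
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.1, size=500)
data = np.column_stack([x, y])          # shape (500, 2): rows are samples

# Covariance matrix of the features (rowvar=False: columns are variables)
cov_matrix = np.cov(data, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
order = np.argsort(eigenvalues)[::-1]   # largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

explained_ratio = eigenvalues / eigenvalues.sum()
first_component = eigenvectors[:, 0]    # direction of greatest variance

print(explained_ratio)
print(first_component)  # roughly parallel to [1, 2] / sqrt(5), up to sign
```

Because the two features are strongly related, nearly all of the variance lies along a single direction, which is what lets PCA reduce the data to one dimension with little loss.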

    