<a href="https://colab.research.google.com/github/swopnimghimire-123123/Maths_For_ML/blob/main/07_Covariance_%26_Correlation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Colab 7: Covariance & Correlation
**Learning Goals**

- Understand the essence of covariance and correlation.

- Learn how to measure relationships between variables.

- See how these concepts apply in data analysis and ML.

## 1. Covariance

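**Essence:** Measures how two variables move together, that is, whether above-average values of one tend to occur with above-average values (positive) or below-average values (negative) of the other.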

**Use Case:** Foundation of PCA (dimensionality reduction) and of portfolio risk analysis in finance.
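To give a feel for the PCA connection, here is a rough sketch: the principal directions of a dataset are the eigenvectors of its covariance matrix, and each eigenvalue is the variance along that direction. The data are synthetic and purely illustrative (a second feature built as roughly twice the first plus noise), and the seed `0` is an arbitrary choice.

```python
import numpy as np

# Synthetic, illustrative data: the second feature is roughly twice the first plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + 0.5 * rng.normal(size=200)
X = np.column_stack([x, y])

# Sample covariance matrix (columns are variables, hence rowvar=False)
C = np.cov(X, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix (eigenvalues in ascending order)
eigvals, eigvecs = np.linalg.eigh(C)

# The first principal component is the eigenvector with the largest eigenvalue
pc1 = eigvecs[:, -1]
explained = eigvals[-1] / eigvals.sum()

print("Covariance matrix:\n", C)
print("First principal direction:", pc1)
print(f"Share of variance along PC1: {explained:.3f}")
```

Because the two features are strongly related, almost all of the variance lies along a single direction, which is what lets PCA drop the remaining one.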

**Example:**

If height ↑ and weight ↑ → covariance positive.

If study hours ↑ and test errors ↓ → covariance negative.

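But covariance is expressed in the product of the two variables' units (e.g. cm·kg), so its magnitude depends on the scale of the data and is hard to interpret on its own.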
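A small sketch of that scale problem, using made-up height and weight values (illustrative only, not real measurements): converting height from centimetres to metres shrinks the covariance by a factor of 100, while the correlation does not change. It relies only on NumPy's `np.cov` and `np.corrcoef`.

```python
import numpy as np

# Made-up heights (cm) and weights (kg) for five people
height_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight_kg = np.array([52.0, 58.0, 66.0, 75.0, 82.0])

# The same heights expressed in metres
height_m = height_cm / 100

# Sample covariance (np.cov divides by n - 1 by default) in each unit system
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
cov_m = np.cov(height_m, weight_kg)[0, 1]

# Pearson correlation in each unit system
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]

print(f"Covariance  (cm, kg): {cov_cm:.2f}")
print(f"Covariance  (m, kg):  {cov_m:.4f}")   # 100 times smaller
print(f"Correlation (cm, kg): {corr_cm:.4f}")
print(f"Correlation (m, kg):  {corr_m:.4f}")  # unchanged
```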

## 2. Correlation

**Essence:** Standardized covariance (covariance divided by the product of the two standard deviations) → unitless and always between -1 and 1.

**Use Case:** Quick check of the strength of a linear relationship between two variables.

**Example:**

Correlation ≈ +1 → strong positive (study time vs marks).

Correlation ≈ -1 → strong negative (speed vs time to finish a race).

Correlation ≈ 0 → no clear *linear* relationship (a non-linear one may still exist).
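A correlation near 0 only rules out a *linear* relationship. In this small illustrative example, `y` is completely determined by `x`, yet the Pearson correlation is 0 because the relationship is symmetric (a parabola) rather than a straight line.

```python
import numpy as np

# x is symmetric around 0; y depends on x exactly, but not linearly
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 2

# Pearson correlation only measures linear association
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation between x and x**2: {r:.4f}")
```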

**Big Picture:**
Covariance tells us direction, correlation tells us strength and direction.

## 3. Theory
**Covariance formula** (sample covariance, dividing by $n-1$):

$$
\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{n-1}
$$
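The covariance formula can be translated almost term by term into NumPy. `sample_covariance` is a hypothetical helper written only for illustration; the result is compared with NumPy's built-in `np.cov`, using the study-hours data from the code demonstration.

```python
import numpy as np

def sample_covariance(x, y):
    """Illustrative helper: sum of products of deviations from the means, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Study hours vs exam scores (same values as the code demonstration)
hours = [1, 2, 3, 4, 5]
scores = [60, 70, 75, 80, 85]

manual = sample_covariance(hours, scores)
builtin = np.cov(hours, scores)[0, 1]  # np.cov also divides by n - 1 by default
print(manual, builtin)  # 15.0 15.0
```

By hand: the deviations are $(-2,-1,0,1,2)$ and $(-14,-4,1,6,11)$, their products sum to $60$, and $60/4 = 15$.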

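**Correlation formula:**

$$
\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}
$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, computed with the same $n-1$ denominator as the covariance.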
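A minimal sketch of the correlation formula on the same study-hours data: divide the sample covariance by the product of the sample standard deviations (`ddof=1` selects the $n-1$ denominator) and compare with `np.corrcoef`.

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5], dtype=float)
scores = np.array([60, 70, 75, 80, 85], dtype=float)

# Sample covariance and sample standard deviations, all using the n - 1 denominator
cov_xy = np.cov(hours, scores)[0, 1]
std_x = hours.std(ddof=1)
std_y = scores.std(ddof=1)

corr_manual = cov_xy / (std_x * std_y)
corr_builtin = np.corrcoef(hours, scores)[0, 1]
print(f"{corr_manual:.6f} {corr_builtin:.6f}")  # 0.986394 0.986394
```

Using $n$ instead of $n-1$ in both the covariance and the standard deviations gives the same correlation, because the factor cancels between numerator and denominator.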

### 4. Code Demonstration

import numpy as np
import pandas as pd

# Example data: study hours vs exam scores
data = pd.DataFrame({
    'StudyHours': [1, 2, 3, 4, 5],
    'ExamScore': [60, 70, 75, 80, 85]
})

# Covariance matrix (pandas uses the sample formula, dividing by n - 1)
cov_matrix = data.cov()
print("Covariance Matrix:")
print(cov_matrix)

# Correlation matrix (Pearson correlation by default)
corr_matrix = data.corr()
print("\nCorrelation Matrix:")
print(corr_matrix)


Covariance Matrix:
            StudyHours  ExamScore
StudyHours         2.5       15.0
ExamScore         15.0       92.5

Correlation Matrix:
            StudyHours  ExamScore
StudyHours    1.000000   0.986394
ExamScore     0.986394   1.000000
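The two matrices above are directly linked: dividing each covariance entry $C_{ij}$ by $\sigma_i \sigma_j$, where the standard deviations are the square roots of the diagonal (the variances), gives the correlation matrix. A minimal sketch that rebuilds the same DataFrame so it runs on its own:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'StudyHours': [1, 2, 3, 4, 5],
    'ExamScore': [60, 70, 75, 80, 85]
})

cov = data.cov().to_numpy()

# Standard deviations are the square roots of the variances on the diagonal
std = np.sqrt(np.diag(cov))

# Divide each entry C[i, j] by std[i] * std[j]
corr_from_cov = cov / np.outer(std, std)

print(corr_from_cov)
print(np.allclose(corr_from_cov, data.corr().to_numpy()))  # True
```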


## 5. Practice Problems

Let's create a dataset for "Exercise Hours" and "Weight" and calculate the correlation.

# Create a dataset for Exercise Hours and Weight
exercise_data = pd.DataFrame({
    'ExerciseHours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Weight': [180, 175, 170, 165, 160, 155, 150, 145, 140, 135]
})

# Calculate and print the correlation matrix
exercise_corr_matrix = exercise_data.corr()
print("Correlation Matrix for Exercise Hours and Weight:")
display(exercise_corr_matrix)

Correlation Matrix for Exercise Hours and Weight:


| | ExerciseHours | Weight |
|---|---|---|
| ExerciseHours | 1.0 | -1.0 |
| Weight | -1.0 | 1.0 |


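The correlation is exactly -1 because this made-up dataset lies perfectly on a downward-sloping line (Weight = 185 - 5 × ExerciseHours); real measurements would be noisier, pulling the correlation away from -1.

Now, let's simulate random numbers and check their correlation.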

# Simulate correlated data: y is 2*x plus standard-normal noise
# (no seed is set, so the exact numbers change on every run)
x = np.random.randn(100)
y = 2*x + np.random.randn(100)

# Create a DataFrame and calculate correlation
random_data = pd.DataFrame({'x': x, 'y': y})
random_corr_matrix = random_data.corr()

print("\nCorrelation Matrix for simulated random numbers:")
display(random_corr_matrix)

# What correlation do you expect?
print("\nExpected correlation for simulated data: ~0.8 to 0.9 (due to the 2*x relationship plus random noise)")


Correlation Matrix for simulated random numbers:


| | x | y |
|---|---|---|
| x | 1.0 | 0.891017 |
| y | 0.891017 | 1.0 |



Expected correlation for simulated data: ~0.8 to 0.9 (due to the 2*x relationship plus random noise)
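The expected value can be worked out exactly. Assuming, as in the cell above, that $x$ and the noise are independent standard normals, $\mathrm{Cov}(x, y) = 2\,\mathrm{Var}(x) = 2$ and $\mathrm{Var}(y) = 4 + 1 = 5$, so $\mathrm{Corr}(x, y) = 2/\sqrt{5} \approx 0.894$. The sketch below checks this with a much larger, seeded sample; the seed `42` and the sample size are arbitrary choices.

```python
import numpy as np

# Seeded generator so the run is reproducible (the cell above is unseeded)
rng = np.random.default_rng(42)

n = 100_000
x = rng.standard_normal(n)
noise = rng.standard_normal(n)
y = 2 * x + noise

theoretical = 2 / np.sqrt(5)
empirical = np.corrcoef(x, y)[0, 1]
print(f"theoretical: {theoretical:.4f}, empirical: {empirical:.4f}")
```

With only 100 points, as in the cell above, the sample correlation typically varies by a few hundredths around 0.894, so a value like 0.891017 is in line with the theory.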
