### Correlation & Covariance

### 🔹 Covariance

👉 Covariance tells us how two variables change together.

Positive covariance → both variables tend to move in the same direction.

Negative covariance → when one goes up, the other tends to go down.

Covariance close to zero → no linear relation (the variables may still be related in a non-linear way).
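
For reference, the three cases above follow from the usual definition of covariance, shown here in its population form (dividing by $n$; the sample form divides by $n - 1$ instead). The symbols $x_i$, $y_i$, $\bar{x}$, $\bar{y}$ and $n$ are notation introduced here for illustration:

$$
\operatorname{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
$$

Each term is positive when $x_i$ and $y_i$ lie on the same side of their means and negative when they lie on opposite sides, so the sign of the sum reflects the direction in which the two variables move together.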

📌 Example:

Height & Weight → taller people generally weigh more → positive covariance.

Price & Demand → when price goes up, demand usually goes down → negative covariance.

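But covariance values are not standardized: their size depends on the units of the two variables.
Example: Covariance can be 5000 or -20000, and the number alone does not say how strong the relation is → difficult to interpret.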
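
A minimal sketch of why the raw number is hard to interpret: the same data gives a covariance 100 times larger when height is recorded in centimetres instead of metres. The heights and weights below are made-up values used only for illustration.

```python
import numpy as np

# Hypothetical heights and weights (invented for this illustration)
height_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight_kg = np.array([52.0, 58.0, 69.0, 77.0, 90.0])

# The same heights expressed in metres
height_m = height_cm / 100.0

# np.cov returns a 2x2 matrix; element [0, 1] is Cov(height, weight)
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
cov_m = np.cov(height_m, weight_kg)[0, 1]

print("Cov(height in cm, weight):", cov_cm)
print("Cov(height in m,  weight):", cov_m)
print("Ratio:", cov_cm / cov_m)
```

The relation between the variables has not changed, only the unit, yet the covariance changes by the same factor as the unit conversion.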

### 🔹 Correlation

👉 Correlation is the standardized form of covariance.

Always between -1 and +1.

+1 → perfect positive linear relation

-1 → perfect negative linear relation

0 → no linear relation

📌 Example (illustrative values):

Height & Weight → correlation ≈ +0.95 (strong positive).

Price & Demand → correlation ≈ -0.85 (strong negative).

Shoe size & Salary → correlation ≈ 0 (no relation).
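
To make "standardized form of covariance" concrete, here is a small sketch that divides the covariance by the product of the two standard deviations and compares the result with `np.corrcoef`. The arrays are the same small `X` and `Y` used in this notebook; the names `corr_manual` and `corr_scaled` are introduced here for illustration.

```python
import numpy as np

X = np.array([2, 4, 6])
Y = np.array([4, 2, 6])

# Correlation = covariance / (std of X * std of Y),
# using the same ddof for the covariance and both standard deviations
cov_xy = np.cov(X, Y, ddof=0)[0, 1]
corr_manual = cov_xy / (X.std(ddof=0) * Y.std(ddof=0))

# Library result for comparison
corr_numpy = np.corrcoef(X, Y)[0, 1]

# Rescaling X (e.g. a change of units) leaves the correlation unchanged
corr_scaled = np.corrcoef(100 * X, Y)[0, 1]

print("Manual correlation:", corr_manual)
print("np.corrcoef:       ", corr_numpy)
print("After rescaling X: ", corr_scaled)
```

Because the units cancel in the division, the result stays between -1 and +1 regardless of how the variables are measured.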

### In short:

Covariance tells the direction of the relation.

Correlation tells the direction + strength of the (linear) relation.

In [14]:
import numpy as np
import pandas as pd

X = np.array([2, 4, 6])
Y = np.array([4, 2, 6])

# Covariance
cov_matrix = np.cov(X, Y, ddof=0)   # ddof=0 → population covariance (divide by n)
print("Covariance matrix:\n", cov_matrix)
print("Cov(X,Y) =", cov_matrix[0, 1])
print()

# Correlation
corr_matrix = np.corrcoef(X, Y)
print("Correlation matrix:\n", corr_matrix)
print("Corr(X,Y) =", corr_matrix[0, 1])
print()

# Pandas shortcut (pandas is already imported above)

X = [2, 4, 6]
Y = [4, 2, 6]

df = pd.DataFrame({"X": X, "Y": Y})

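# Only covariance between X and Y
# (DataFrame.cov() defaults to ddof=1, i.e. sample covariance,
#  so this value differs from the ddof=0 NumPy result above)
cov_xy = df.cov().loc["X", "Y"]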

# Only correlation between X and Y
corr_xy = df.corr().loc["X", "Y"]

print("Cov(X,Y):", cov_xy)
print("Corr(X,Y):", corr_xy)




Covariance matrix:
 [[2.66666667 1.33333333]
 [1.33333333 2.66666667]]
Cov(X,Y) = 1.3333333333333333

Correlation matrix:
 [[1.  0.5]
 [0.5 1. ]]
Corr(X,Y) = 0.5

Cov(X,Y): 2.0
Corr(X,Y): 0.5
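
The two covariance values printed above differ (1.33 vs 2.0) even though the data is identical. The sketch below shows the likely reason: `np.cov` was called with `ddof=0` (divide by n), while `DataFrame.cov()` divides by n - 1 by default. The variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

X = [2, 4, 6]
Y = [4, 2, 6]
df = pd.DataFrame({"X": X, "Y": Y})

cov_population = np.cov(X, Y, ddof=0)[0, 1]   # divides by n
cov_sample = np.cov(X, Y, ddof=1)[0, 1]       # divides by n - 1
cov_pandas = df.cov().loc["X", "Y"]           # pandas default is also n - 1

print("NumPy, ddof=0:", cov_population)
print("NumPy, ddof=1:", cov_sample)
print("pandas cov(): ", cov_pandas)
```

The correlation is 0.5 in both libraries because the divisor cancels out when the covariance is divided by the standard deviations.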


### Pearson Correlation

What it does: Measures the strength and direction of a linear relationship between two continuous variables.

Range: -1 to +1

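Key idea: It works on the actual values (not ranks), so it is sensitive to outliers. The coefficient itself can be computed for any numeric data; it is the usual significance tests for it that assume roughly normally distributed data.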

When to use:

When the relationship looks like a straight line (linear).

Example: Height vs Weight.

### Spearman Correlation

What it does: Measures the strength and direction of a monotonic relationship (always increasing or always decreasing), but not necessarily linear.

Key idea: Works on ranks, not raw values.
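
As a rough illustration of "works on ranks": replacing each value by its rank and then computing Pearson on those ranks gives the same number as asking pandas for Spearman directly. The study hours and scores below are invented for this sketch.

```python
import pandas as pd

# Hypothetical study hours and exam scores (made up for illustration)
df = pd.DataFrame({
    "hours": [1, 2, 3, 5, 8, 13],
    "score": [35, 50, 48, 70, 82, 99],
})

# Replace every value by its rank within its column (1 = smallest)
ranks = df.rank()

# Pearson on the ranks versus Spearman on the raw values
pearson_on_ranks = ranks["hours"].corr(ranks["score"], method="pearson")
spearman = df["hours"].corr(df["score"], method="spearman")

print("Pearson on ranks:", pearson_on_ranks)
print("Spearman:        ", spearman)
```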

When to use:

Data is not normally distributed.

Outliers are present (since ranks reduce their effect).

Relationship is non-linear but monotonic.

✅ Example:

Students ranked by study hours and exam scores.

So in practice:

If the relationship is linear and there are no major outliers → use Pearson.

If the relationship is monotonic but non-linear, or there are outliers → use Spearman.
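
The point about outliers can be sketched with a small synthetic example. The data is hypothetical: a perfectly linear trend in which one value is replaced by an extreme one. The outlier is chosen so that it keeps its rank (it stays the largest value); an outlier that changes the ordering would affect Spearman as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a perfectly linear relationship
x = np.arange(1, 11, dtype=float)
y = 2.0 * x + 1.0

# Same data, but the last y value becomes an extreme outlier
y_outlier = y.copy()
y_outlier[-1] = 500.0

data = pd.DataFrame({"x": x, "y": y_outlier})

pearson_outlier = data["x"].corr(data["y"], method="pearson")
spearman_outlier = data["x"].corr(data["y"], method="spearman")

print("Pearson with outlier: ", pearson_outlier)
print("Spearman with outlier:", spearman_outlier)
```

Pearson drops well below 1 because the single extreme value dominates the calculation, while Spearman stays at 1 because the ordering of the points is unchanged.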

In [24]:
import seaborn as sns
import pandas as pd

# Load dataset
tips = sns.load_dataset("tips")

# Calculate correlations
pearson_corr = tips['total_bill'].corr(tips['tip'], method='pearson')
spearman_corr = tips['total_bill'].corr(tips['tip'], method='spearman')

print("Pearson Correlation:", pearson_corr)
print("Spearman Correlation:", spearman_corr)
# Same Pearson value again; as the last expression in the cell it is echoed in the output
answer = tips['total_bill'].corr(tips['tip'], method='pearson')
answer



Pearson Correlation: 0.6757341092113645
Spearman Correlation: 0.6789681219001009
0.6757341092113645


In [27]:
import numpy as np
import pandas as pd

# Non-linear but monotonic relationship
x = np.arange(1, 11)
y = x**2  # quadratic relation

df = pd.DataFrame({"x": x, "y": y})

pearson_corr = df['x'].corr(df['y'], method='pearson')
spearman_corr = df['x'].corr(df['y'], method='spearman')

print("Pearson Correlation:", pearson_corr)
print("Spearman Correlation:", spearman_corr)


Pearson Correlation: 0.9745586289152092
Spearman Correlation: 0.9999999999999999
