### Correlation Analysis in Python - Pearson 


#### **What is Correlation Analysis?**
- **Correlation analysis** is a statistical method to evaluate the **strength and direction** of the relationship between two quantitative variables.
- Measures **how changes in one variable are associated** with changes in another.

---

#### **Key Concept: Correlation Coefficient ("r")**
- **Range:** -1 to +1
  - **Positive Correlation** (r → +1): The variables tend to increase or decrease together.
  - **Negative Correlation** (r → -1): As one variable increases, the other tends to decrease.
  - **No Correlation** (r ≈ 0): Weak or no linear relationship.
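
To make the three cases concrete, here is a minimal sketch using pandas. The column values are made-up illustration data (not from any real dataset), chosen so that the results land near +1, -1, and 0.

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the sign of r
x = pd.Series([1, 2, 3, 4, 5])
y_positive = pd.Series([2, 4, 6, 8, 10])   # rises with x
y_negative = pd.Series([10, 8, 6, 4, 2])   # falls as x rises
y_unrelated = pd.Series([2, 1, 3, 1, 2])   # no linear trend against x

r_positive = x.corr(y_positive)     # close to +1
r_negative = x.corr(y_negative)     # close to -1
r_unrelated = x.corr(y_unrelated)   # close to 0

print(r_positive, r_negative, r_unrelated)
```

Because of floating-point rounding, the printed values may differ from exactly 1, -1, and 0 in the last decimal places.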

---

#### **Pearson Correlation Coefficient**
- Most common method for **linear relationships**.
- Best for **continuous data** without significant outliers.
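
For reference, the Pearson coefficient is the sum of the products of the deviations from each mean, divided by the square root of the product of the summed squared deviations. The sketch below computes it by hand with the standard library only; the function name `pearson_r` is a hypothetical helper introduced for illustration, not part of pandas.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of summed squared deviations
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(ss_x * ss_y)

r = pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(r)
```

Note that the coefficient is undefined when either variable is constant, which is why the sketch raises an error in that case.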
  
---

#### **Other Correlation Coefficients**
- **Spearman’s Rank Correlation:** For ordinal data or monotonic relationships that are not linear.
- **Kendall’s Tau:** Used for smaller datasets or to handle ties in rankings.
- **Point-Biserial Correlation:** For one continuous and one binary variable.
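
Two of these alternatives can be sketched with pandas alone, relying on two standard identities: Spearman's coefficient is the Pearson coefficient of the ranks, and the point-biserial coefficient equals Pearson's r when the binary variable is coded as 0/1. The data below is hypothetical and only meant to show the mechanics.

```python
import pandas as pd

# Hypothetical data: y grows monotonically with x, but not linearly
x = pd.Series([1, 2, 3, 4, 5])
y = x ** 3

pearson = x.corr(y)                  # below 1, since the trend is curved
spearman = x.rank().corr(y.rank())   # Pearson on ranks = Spearman

# Point-biserial: one binary (0/1) variable and one continuous variable
group = pd.Series([0, 0, 0, 1, 1])
score = pd.Series([2.0, 3.0, 2.5, 4.0, 4.5])
point_biserial = group.corr(score)   # ordinary Pearson on the 0/1 coding

print(pearson, spearman, point_biserial)
```

Here Spearman's coefficient comes out at (or extremely close to) 1 because the ordering of `y` matches the ordering of `x` perfectly, while Pearson's r is lower because the relationship is not a straight line.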

---

#### **Applications Across Fields**
- Used in **economics**, **social sciences**, **finance**, and more.
- **Helps identify relationships** but does not imply causation.

---

#### **Important Reminder**
- **Correlation ≠ Causation**: A correlation does not mean one variable causes changes in the other.
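
One way to see this is a small simulation, sketched here under stated assumptions: a hidden variable `z` drives both `x` and `y`, which have no direct influence on each other. All names, the noise scale, and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 1000

z = rng.normal(size=n)               # hidden common cause
x = z + 0.5 * rng.normal(size=n)     # x depends on z plus its own noise
y = z + 0.5 * rng.normal(size=n)     # y depends on z plus its own noise

# x and y are strongly correlated even though neither causes the other
r_xy = np.corrcoef(x, y)[0, 1]

# After removing the shared driver z, only independent noise remains
r_residual = np.corrcoef(x - z, y - z)[0, 1]

print(r_xy, r_residual)
```

The first value should be clearly positive and the second close to zero, which shows that the association between `x` and `y` comes entirely from the shared driver.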

---

#### **Conclusion**
- Correlation analysis is a powerful tool for **understanding relationships** but should be interpreted carefully.



![Title](Images/correlation.png)


## Example 

In [2]:
import pandas as pd

# Sample data
data = {
    'Variable_1': [1, 2, 3, 4, 5],
    'Variable_2': [5, 4, 3, 2, 1]
}

# Creating a DataFrame from the sample data
df = pd.DataFrame(data)

# Calculating the mean of Variable_1
mean_of_variable1 = df['Variable_1'].mean()
print(f'Mean of variable1: {mean_of_variable1}')

# Calculating Pearson correlation coefficient between Variable_1 and Variable_2
pearson_corr = df['Variable_1'].corr(df['Variable_2'])

print(f"Pearson correlation coefficient: {pearson_corr}")


Mean of variable1: 3.0
Pearson correlation coefficient: -0.9999999999999999


**Guidelines for interpreting the result (ranges refer to the absolute value of r; the sign gives the direction)**

- **0.00 - 0.19**: **Very Weak Correlation**  
  - Little to no association between variables.
- **0.20 - 0.39**: **Weak Correlation**  
  - Low association; one variable only slightly follows the trend of the other.
- **0.40 - 0.59**: **Moderate Correlation**  
  - Some association; as one variable changes, the other has a moderate tendency to follow.
- **0.60 - 0.79**: **Strong Correlation**  
  - Strong association; as one variable changes, the other tends to follow significantly.
- **0.80 - 1.00**: **Very Strong Correlation**  
  - Very high association; changes in one variable closely relate to changes in the other.

These ranges are a useful guideline, but the practical meaning of a correlation also depends on **context, sample size, and domain**, so always interpret the coefficient alongside these factors.
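
The guideline above can be turned into a small lookup, sketched below. The function name `describe_strength` is hypothetical, and placing the cut-offs at 0.20, 0.40, 0.60, and 0.80 is an assumption about how values that fall between the listed ranges (for example 0.195) should be treated.

```python
def describe_strength(r: float) -> str:
    """Map a correlation coefficient to the guideline label, using |r|."""
    magnitude = abs(r)  # the sign only indicates direction
    if magnitude < 0.20:
        return "Very Weak Correlation"
    if magnitude < 0.40:
        return "Weak Correlation"
    if magnitude < 0.60:
        return "Moderate Correlation"
    if magnitude < 0.80:
        return "Strong Correlation"
    return "Very Strong Correlation"

print(describe_strength(-0.9999999999999999))  # Very Strong Correlation
print(describe_strength(0.45))                 # Moderate Correlation
```

Applied to the example result, the coefficient of about -1 is a very strong negative correlation.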