# 🔗 **Numerical Variable Correlations Analysis**

## **🎯 Notebook Purpose**

This notebook conducts comprehensive correlation analysis between numerical customer variables (Age, Annual Income, Spending Score), providing insights into linear and monotonic relationships that form the foundation for customer behavior understanding and segmentation strategies.

---

## **🔍 Comprehensive Analysis Coverage**

### **1. Pearson Product-Moment Correlation**
- **Age vs Annual Income Correlation**
  - **Importance:** Reveals relationship between life stage and earning capacity
  - **Interpretation:** A positive correlation suggests income tends to rise with age; a negative one suggests younger customers tend to earn more
- **Age vs Spending Score Correlation**
  - **Importance:** Shows how spending behavior relates to customer life stage
  - **Interpretation:** A positive correlation indicates older customers tend to have higher spending scores; a negative one indicates younger customers do
- **Annual Income vs Spending Score Correlation**
  - **Importance:** Critical relationship for understanding spending capacity vs actual spending
  - **Interpretation:** A strong positive correlation is consistent with income driving spending, though correlation alone does not establish causation; a weak correlation suggests other factors influence spending
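
The three pairwise Pearson correlations above could be computed along these lines. This is a minimal sketch: the data is synthetic and purely hypothetical (generated so that income loosely follows age), and the column names `Age`, `Annual Income`, and `Spending Score` are assumed to match the real customer table.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical; swap in the real customer table).
rng = np.random.default_rng(42)
n = 200
age = rng.integers(18, 70, size=n).astype(float)
income = 20 + 0.5 * age + rng.normal(0, 15, size=n)  # loosely tied to age
spending = rng.uniform(1, 100, size=n)               # unrelated by construction

customers = pd.DataFrame(
    {"Age": age, "Annual Income": income, "Spending Score": spending}
)

# Pearson product-moment correlations for every pair of columns.
pearson_matrix = customers.corr(method="pearson")

pairs = [
    ("Age", "Annual Income"),
    ("Age", "Spending Score"),
    ("Annual Income", "Spending Score"),
]
for x, y in pairs:
    print(f"{x} vs {y}: r = {pearson_matrix.loc[x, y]:+.3f}")
```

With real data, only the construction of `customers` would change; the rest reads the coefficients straight from the matrix.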

### **2. Correlation Strength Assessment**
- **Effect Size Interpretation (Cohen's Guidelines)**
  - **Importance:** Provides standardized interpretation of correlation magnitude
  - **Interpretation:** 0.1 ≤ |r| < 0.3 (small), 0.3 ≤ |r| < 0.5 (medium), |r| ≥ 0.5 (large effect); |r| < 0.1 is negligible; guides business significance assessment
- **Statistical Significance Testing**
  - **Importance:** Tests whether an observed correlation is distinguishable from zero given sampling variation
  - **Interpretation:** p < 0.05 rejects the null hypothesis of zero correlation at the 5% level; p ≥ 0.05 means the data are compatible with no correlation, not that none exists
- **Confidence Intervals for Correlations**
  - **Importance:** Quantifies uncertainty in correlation estimates
  - **Interpretation:** Wide intervals indicate uncertain estimates; narrow intervals suggest precise relationship measurement
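
One way to combine the three assessments above is a single helper that returns the coefficient, its p-value, a confidence interval, and a Cohen label. The sketch below assumes a Fisher z-transform interval (one common choice, not necessarily the one this notebook uses); the function name `describe_correlation` and the demo data are hypothetical.

```python
import numpy as np
from scipy import stats

def describe_correlation(x, y, alpha=0.05):
    """Return Pearson r, its p-value, a Fisher-z CI, and a Cohen label."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    r, p = stats.pearsonr(x, y)

    # Fisher z-transform: atanh(r) is roughly normal with SE 1/sqrt(n - 3).
    z = np.arctanh(r)
    half_width = stats.norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)
    ci = (float(np.tanh(z - half_width)), float(np.tanh(z + half_width)))

    # Cohen's conventional benchmarks for |r|.
    size = abs(r)
    if size < 0.1:
        label = "negligible"
    elif size < 0.3:
        label = "small"
    elif size < 0.5:
        label = "medium"
    else:
        label = "large"
    return {"r": float(r), "p": float(p), "ci": ci, "effect": label}

# Hypothetical demo data with a strong linear relationship.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = x + rng.normal(scale=0.5, size=100)
result = describe_correlation(x, y)
print(result)
```

The Fisher interval relies on approximate bivariate normality; when that is doubtful, a bootstrap interval is a reasonable alternative.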

### **3. Assumption Validation**
- **Linearity Assessment**
  - **Importance:** Pearson correlation assumes linear relationships
  - **Interpretation:** Non-linear patterns require alternative correlation measures or transformation
- **Normality Testing for Variables**
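  - **Importance:** Pearson's r can be computed for any distribution, but its standard significance tests and confidence intervals assume approximate bivariate normality
  - **Interpretation:** Markedly non-normal distributions call for rank-based or robust correlation methods, or resampling-based inference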
- **Outlier Impact Analysis**
  - **Importance:** Extreme values can artificially inflate or deflate correlations
  - **Interpretation:** High outlier influence suggests need for robust correlation methods
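
The three checks above might look like the following sketch. The specific diagnostics are assumptions chosen for brevity: a quadratic-versus-linear fit comparison for linearity, Shapiro-Wilk for normality, and 1.5 × IQR fences for outliers. The data is synthetic, with one deliberately extreme customer appended.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (hypothetical), plus one artificial extreme customer.
rng = np.random.default_rng(0)
income = rng.normal(60, 15, size=150)
spending = 0.4 * income + rng.normal(0, 10, size=150)
income_out = np.append(income, 400.0)
spending_out = np.append(spending, 1.0)

# Linearity: fraction of the straight-line residual variance that a
# quadratic term removes. Values near 0 give no sign of curvature.
lin_resid = spending - np.polyval(np.polyfit(income, spending, 1), income)
quad_resid = spending - np.polyval(np.polyfit(income, spending, 2), income)
curvature_gain = 1 - quad_resid.var() / lin_resid.var()

# Normality: Shapiro-Wilk test; a small p-value is evidence against normality.
_, p_normal = stats.shapiro(income_out)

# Outlier impact: Pearson r with and without points outside the IQR fences.
def inside_fences(v, k=1.5):
    q1, q3 = np.percentile(v, [25, 75])
    spread = q3 - q1
    return (v >= q1 - k * spread) & (v <= q3 + k * spread)

keep = inside_fences(income_out) & inside_fences(spending_out)
r_all = np.corrcoef(income_out, spending_out)[0, 1]
r_trimmed = np.corrcoef(income_out[keep], spending_out[keep])[0, 1]

print(f"curvature gain = {curvature_gain:.3f}, Shapiro p = {p_normal:.3g}")
print(f"r (all) = {r_all:+.3f}, r (trimmed) = {r_trimmed:+.3f}")
```

A large gap between `r_all` and `r_trimmed` is the signal that a robust method is worth running; trimming itself is only a diagnostic here, not a recommended cleaning step.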

### **4. Robust Correlation Methods**
- **Spearman Rank Correlation**
  - **Importance:** Captures monotonic relationships without linearity assumption
  - **Interpretation:** A Spearman coefficient well above Pearson suggests a non-linear monotonic relationship; one well below Pearson suggests extreme values are inflating the Pearson estimate
- **Kendall's Tau Correlation**
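  - **Importance:** Alternative rank-based measure built on concordant and discordant pairs; less sensitive to outliers than Pearson
  - **Interpretation:** Usually smaller in magnitude than Spearman for the same data, so the two are not directly comparable; better suited to small samples or many tied values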
- **Biweight Midcorrelation**
  - **Importance:** Robust to outliers while maintaining efficiency
  - **Interpretation:** Differs significantly from Pearson when outliers are influential
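
The sketch below compares the three robust measures with Pearson on contaminated data. Spearman and Kendall come from `scipy.stats`; SciPy has no biweight midcorrelation, so `biweight_midcorrelation` here is a hand-written helper following the usual median/MAD definition with tuning constant 9 (an assumption, and it presumes a non-zero MAD). The data is synthetic, with three artificial outliers opposing the main trend.

```python
import numpy as np
from scipy import stats

def biweight_midcorrelation(x, y, c=9.0):
    """Biweight midcorrelation using median/MAD-based Tukey biweights."""
    def weighted_deviation(v):
        v = np.asarray(v, float)
        dev = v - np.median(v)
        mad = np.median(np.abs(dev))          # assumed non-zero
        u = dev / (c * mad)
        # Points with |u| >= 1 get zero weight, so far outliers drop out.
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
        a = dev * w
        return a / np.sqrt(np.sum(a**2))
    return float(np.sum(weighted_deviation(x) * weighted_deviation(y)))

# Hypothetical data: a clear positive trend plus three opposing outliers.
rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.6, size=200)
x = np.concatenate([x, [8.0, 9.0, 10.0]])
y = np.concatenate([y, [-8.0, -9.0, -10.0]])

r_pearson, _ = stats.pearsonr(x, y)
rho, _ = stats.spearmanr(x, y)
tau, _ = stats.kendalltau(x, y)
r_bicor = biweight_midcorrelation(x, y)

print(f"Pearson  r   = {r_pearson:+.3f}")
print(f"Spearman rho = {rho:+.3f}")
print(f"Kendall  tau = {tau:+.3f}")
print(f"Biweight     = {r_bicor:+.3f}")
```

On data like this, Pearson is pulled toward zero or below by the three extreme points, while the rank-based and biweight estimates stay close to the trend in the bulk of the data.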

### **5. Correlation Matrix Analysis**
- **Complete Correlation Matrix Construction**
  - **Importance:** Provides comprehensive view of all variable relationships
  - **Interpretation:** Patterns in matrix reveal variable clusters and potential multicollinearity
- **Hierarchical Clustering of Variables**
  - **Importance:** Groups variables by similarity of correlation patterns
  - **Interpretation:** Tight clusters suggest redundant variables; loose clustering indicates diverse information
- **Eigenvalue Analysis of Correlation Matrix**
  - **Importance:** Assesses dimensionality and information content
  - **Interpretation:** Few large eigenvalues suggest low-dimensional structure; many similar eigenvalues indicate high dimensionality
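
A compact sketch of the matrix-level steps above: build the correlation matrix, inspect its eigenvalues, and cluster the variables. The distance `1 - |r|` and average linkage are assumptions (other choices are equally defensible), and the data is synthetic with income loosely tied to age.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(7)
n = 200
age = rng.uniform(18, 70, size=n)
income = 20 + 0.5 * age + rng.normal(0, 15, size=n)
spending = rng.uniform(1, 100, size=n)
names = ["Age", "Annual Income", "Spending Score"]
data = np.column_stack([age, income, spending])

# Complete correlation matrix (columns are variables).
corr = np.corrcoef(data, rowvar=False)

# Eigenvalues in descending order; they sum to the number of variables.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
explained = eigenvalues / eigenvalues.sum()

# Hierarchical clustering of variables on the distance 1 - |r|.
distance = 1 - np.abs(corr)
np.fill_diagonal(distance, 0.0)
merge_tree = linkage(squareform(distance, checks=False), method="average")

print(np.round(corr, 3))
print("explained variance share:", np.round(explained, 3))
first = [names[int(i)] for i in merge_tree[0, :2]]
print("first variables merged:", first)
```

With only three variables the dendrogram is trivial, but the same code scales unchanged if more numerical features are added later.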

### **6. Partial Correlation Analysis**
- **Age-Income Correlation Controlling for Spending**
  - **Importance:** Isolates the age-income relationship after removing the linear influence of spending
  - **Interpretation:** A partial correlation much smaller than the raw correlation suggests spending accounts for part of the age-income association (through mediation or confounding)
- **Age-Spending Correlation Controlling for Income**
  - **Importance:** Examines age effect on spending independent of income
  - **Interpretation:** Strong partial correlation indicates age is associated with spending beyond what income explains
- **Income-Spending Correlation Controlling for Age**
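  - **Importance:** Isolates the income-spending association with the linear effect of age removed
  - **Interpretation:** Strong partial correlation indicates the income-spending relationship holds after adjusting for age; it does not by itself show that income causes spending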
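
With three variables, each partial correlation controls for the single remaining variable, and all three can be read off the inverse of the correlation matrix at once. The sketch below uses that precision-matrix identity; the helper name `partial_correlation_matrix` and the synthetic data are hypothetical.

```python
import numpy as np

def partial_correlation_matrix(data):
    """Partial correlation of each pair, controlling for all other columns."""
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(precision))
    # Identity: partial r_ij = -P_ij / sqrt(P_ii * P_jj).
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial

# Synthetic stand-in data (hypothetical): spending depends on age and income.
rng = np.random.default_rng(11)
n = 300
age = rng.uniform(18, 70, size=n)
income = 20 + 0.5 * age + rng.normal(0, 15, size=n)
spending = 60 - 0.4 * age + 0.3 * income + rng.normal(0, 10, size=n)
data = np.column_stack([age, income, spending])
names = ["Age", "Annual Income", "Spending Score"]

raw = np.corrcoef(data, rowvar=False)
partial = partial_correlation_matrix(data)

for i in range(3):
    for j in range(i + 1, 3):
        print(f"{names[i]} vs {names[j]}: "
              f"raw r = {raw[i, j]:+.3f}, partial r = {partial[i, j]:+.3f}")
```

Comparing the raw and partial columns side by side shows how much of each association is shared with the third variable.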

### **7. Correlation Stability Analysis**
- **Bootstrap Correlation Confidence Intervals**
  - **Importance:** Assesses stability of correlation estimates across samples
  - **Interpretation:** Narrow intervals indicate reliable estimates; wide intervals suggest the correlation is sensitive to which customers were sampled
- **Jackknife Correlation Analysis**
  - **Importance:** Evaluates influence of individual observations on correlations
  - **Interpretation:** High variability across leave-one-out estimates indicates the correlation is sensitive to specific customers
- **Subsample Correlation Consistency**
  - **Importance:** Tests if correlations hold across different customer subgroups
  - **Interpretation:** Consistent correlations suggest generalizable relationships; inconsistent correlations indicate subgroup differences
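
The three stability checks above can be sketched with plain NumPy resampling. The percentile bootstrap, the 2,000 resamples, and the median age split used for the subgroup comparison are all assumptions made for illustration; the data is synthetic.

```python
import numpy as np

def pearson_r(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Pearson r (customers resampled in pairs)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    idx = rng.integers(0, n, size=(n_boot, n))
    rs = np.array([pearson_r(x[i], y[i]) for i in idx])
    lo, hi = np.quantile(rs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

def jackknife_r(x, y):
    """Leave-one-out Pearson r, one value per omitted customer."""
    keep = ~np.eye(len(x), dtype=bool)
    return np.array([pearson_r(x[k], y[k]) for k in keep])

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(5)
n = 200
age = rng.uniform(18, 70, size=n)
income = rng.normal(60, 15, size=n)
spending = 10 + 0.5 * income + rng.normal(0, 10, size=n)

r_full = pearson_r(income, spending)
ci_low, ci_high = bootstrap_ci(income, spending)
loo = jackknife_r(income, spending)
max_shift = float(np.max(np.abs(loo - r_full)))

# Subsample consistency: same correlation within two age groups.
younger = age <= np.median(age)
r_younger = pearson_r(income[younger], spending[younger])
r_older = pearson_r(income[~younger], spending[~younger])

print(f"r = {r_full:+.3f}, bootstrap 95% CI = ({ci_low:+.3f}, {ci_high:+.3f})")
print(f"largest leave-one-out shift = {max_shift:.4f}")
print(f"r younger = {r_younger:+.3f}, r older = {r_older:+.3f}")
```

A single customer whose removal shifts r noticeably would show up as a large `max_shift`, which ties this check back to the outlier impact analysis.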

### **8. Business Interpretation Framework**
- **Customer Segmentation Implications**
  - **Importance:** Translates correlations into segmentation strategy insights
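  - **Interpretation:** Strong correlations indicate variables that move together and may define shared segment dimensions; weak correlations indicate each variable contributes distinct information, although clear customer groupings can still exist without linear correlation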
- **Marketing Strategy Insights**
  - **Importance:** Guides targeting and positioning strategies based on customer relationships
  - **Interpretation:** Correlation patterns inform which customer characteristics to target together
- **Revenue Optimization Applications**
  - **Importance:** Uses correlation insights to identify high-value customer characteristics
  - **Interpretation:** Strong income-spending correlation guides pricing and promotion strategies

---

## **📊 Expected Outcomes**

- **Correlation Matrix:** Complete relationship map between numerical customer variables
- **Relationship Strength Assessment:** Quantified correlation magnitudes with confidence intervals
- **Robust Correlation Estimates:** Outlier-resistant relationship measures
- **Business Insights:** Actionable implications for customer segmentation and marketing
- **Statistical Validation:** Significance testing and assumption verification for all correlations

This analysis provides the quantitative foundation for understanding customer behavior patterns and designing effective segmentation strategies.
