# 👥 **Group Comparisons: Categorical vs Numerical Analysis**

## **🎯 Notebook Purpose**

This notebook conducts comprehensive statistical comparisons between categorical customer groups (Gender) and numerical variables (Age, Annual Income, Spending Score), implementing both parametric and non-parametric methods to identify significant differences in customer behavior patterns across demographic segments.

---

## **🔍 Comprehensive Analysis Coverage**

### **1. Independent Samples T-Tests**
- **Male vs Female Age Comparison**
  - **Importance:** Determines if customer age distribution differs significantly by gender
  - **Interpretation:** A significant difference indicates that the customer base skews older or younger for one gender, which may reflect customer acquisition channels or market appeal
- **Male vs Female Income Comparison**
  - **Importance:** Reveals gender-based income disparities in customer base
  - **Interpretation:** Income differences may reflect broader socioeconomic patterns or targeted marketing effectiveness
- **Male vs Female Spending Score Comparison**
  - **Importance:** Most critical comparison for understanding gender-based spending behavior
  - **Interpretation:** Significant differences guide gender-specific marketing strategies and product positioning
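
The three comparisons above share one pattern, which can be sketched as follows. This is a minimal illustration, assuming SciPy is available; the arrays are synthetic placeholders rather than the real customer data, and `compare_means` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder values, NOT the real customer data.
rng = np.random.default_rng(0)
male_scores = rng.normal(loc=48.0, scale=25.0, size=90)
female_scores = rng.normal(loc=52.0, scale=25.0, size=110)


def compare_means(a, b, equal_var=True):
    """Two-sided independent-samples t-test.

    equal_var=True  -> Student's t-test (pooled variance)
    equal_var=False -> Welch's t-test (no equal-variance assumption)
    """
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=equal_var)
    return float(t_stat), float(p_value)


t_student, p_student = compare_means(male_scores, female_scores)
t_welch, p_welch = compare_means(male_scores, female_scores, equal_var=False)
print(f"Student: t={t_student:.3f}, p={p_student:.3f}")
print(f"Welch:   t={t_welch:.3f}, p={p_welch:.3f}")
```

The same helper would be called once per numerical variable (Age, Annual Income, Spending Score), with the male and female values passed as the two arrays.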

### **2. Assumption Testing for T-Tests**
- **Normality Testing by Group (Shapiro-Wilk, Anderson-Darling)**
  - **Importance:** Validates parametric test assumptions for each gender group
  - **Interpretation:** Violated normality may call for non-parametric alternatives or a data transformation, although the t-test is fairly robust to moderate non-normality when both groups are large
- **Homogeneity of Variance Testing (Levene's Test, Bartlett's Test)**
  - **Importance:** Tests the equal-variance assumption of Student's t-test; Levene's test is less sensitive to non-normality than Bartlett's test
  - **Interpretation:** Unequal variances call for Welch's t-test, which does not assume equal variances
- **Independence Assumption Verification**
  - **Importance:** Ensures observations within and between groups are independent
  - **Interpretation:** Violated independence requires specialized methods for clustered or paired data
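
A rough sketch of the normality and variance checks follows, assuming SciPy. The data are synthetic, the 0.05 threshold is only a conventional default, and `check_assumptions` is a hypothetical helper. Independence cannot be tested this way; it has to be argued from how the data were collected.

```python
import numpy as np
from scipy import stats


def check_assumptions(a, b, alpha=0.05):
    """Return p-values and simple pass/fail flags for two groups."""
    _, p_norm_a = stats.shapiro(a)            # normality, group a
    _, p_norm_b = stats.shapiro(b)            # normality, group b
    _, p_levene = stats.levene(a, b, center="median")  # robust variance test
    _, p_bartlett = stats.bartlett(a, b)      # assumes normal data
    return {
        "shapiro_p": (float(p_norm_a), float(p_norm_b)),
        "levene_p": float(p_levene),
        "bartlett_p": float(p_bartlett),
        "normality_ok": bool(p_norm_a > alpha and p_norm_b > alpha),
        "equal_var_ok": bool(p_levene > alpha),
    }


# Synthetic groups with equal means but clearly different spreads.
rng = np.random.default_rng(1)
group_a = rng.normal(60.0, 10.0, size=100)
group_b = rng.normal(60.0, 30.0, size=100)
report = check_assumptions(group_a, group_b)
print(report)
```

`scipy.stats.anderson` offers the Anderson-Darling test and is omitted here only for brevity.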

### **3. Non-Parametric Group Comparisons**
- **Mann-Whitney U Test (Wilcoxon Rank-Sum)**
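  - **Importance:** Distribution-free alternative to the t-test; compares medians only under the additional assumption that both groups share the same distribution shape
  - **Interpretation:** Tests whether values in one gender group tend to be larger than in the other; robust to outliers and non-normality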
- **Kolmogorov-Smirnov Two-Sample Test**
  - **Importance:** Tests if entire distributions differ between gender groups
  - **Interpretation:** Detects differences in location, spread, or shape; broader in scope than median tests, though most sensitive to differences near the center of the distributions
- **Mood's Median Test**
  - **Importance:** Tests equality of medians across gender groups
  - **Interpretation:** Simple and robust to outliers and heavy skew, but low in power compared with the Mann-Whitney U test, especially for small samples
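
The three tests in this section can be run side by side, as in the sketch below. It assumes SciPy and uses synthetic skewed data; `nonparametric_comparison` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def nonparametric_comparison(a, b):
    """Run three distribution-free two-sample tests and collect p-values."""
    _, p_mwu = stats.mannwhitneyu(a, b, alternative="two-sided")
    _, p_ks = stats.ks_2samp(a, b)
    _, p_mood, grand_median, _ = stats.median_test(a, b)
    return {
        "mann_whitney_p": float(p_mwu),
        "ks_p": float(p_ks),
        "mood_median_p": float(p_mood),
        "grand_median": float(grand_median),
    }


# Synthetic right-skewed values standing in for income-like data.
rng = np.random.default_rng(2)
low_group = rng.lognormal(mean=4.0, sigma=0.4, size=80)
high_group = rng.lognormal(mean=4.5, sigma=0.4, size=80)
np_results = nonparametric_comparison(low_group, high_group)
print(np_results)
```

Disagreement between the three p-values is informative in itself: a significant Kolmogorov-Smirnov result alongside a non-significant Mann-Whitney result, for instance, would point to a difference in spread or shape rather than in location.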

### **4. Effect Size Calculation for Group Differences**
- **Cohen's d for Mean Differences**
  - **Importance:** Quantifies practical significance of gender differences beyond statistical significance
  - **Interpretation:** d = 0.2 (small), 0.5 (medium), 0.8 (large); guides business impact assessment
- **Glass's Delta for Unequal Variances**
  - **Importance:** Effect size when gender groups have different variabilities
  - **Interpretation:** Scales the mean difference by the standard deviation of one reference group only; appropriate when one group serves as a baseline or control
- **Cliff's Delta for Non-Parametric Effect Size**
  - **Importance:** Effect size for Mann-Whitney U test and other rank-based comparisons
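  - **Interpretation:** Probability that a random observation from one group exceeds one from the other, minus the probability of the reverse; ranges from -1 to 1, with 0 indicating no tendency for either group to be larger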
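
The three effect sizes above are short enough to write out directly. The following is a sketch using NumPy only; the function names are illustrative, and the pooled standard deviation in `cohens_d` uses the usual sample-size-weighted formula.

```python
import numpy as np


def cohens_d(a, b):
    """Mean difference divided by the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (
        (a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)
    ) / (a.size + b.size - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


def glass_delta(a, reference):
    """Mean difference divided by the reference group's standard deviation."""
    a = np.asarray(a, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float((a.mean() - reference.mean()) / reference.std(ddof=1))


def cliffs_delta(a, b):
    """P(a > b) - P(a < b), estimated over all pairs of observations."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diffs = a[:, None] - b[None, :]  # every pairwise difference
    return float((np.sum(diffs > 0) - np.sum(diffs < 0)) / diffs.size)


print(cohens_d([1, 2, 3], [2, 3, 4]))      # -1.0
print(cliffs_delta([1, 2, 3], [4, 5, 6]))  # -1.0
```

The pairwise matrix in `cliffs_delta` needs memory proportional to the product of the two group sizes, which is fine for a few hundred customers per group but would need a rank-based formulation for much larger samples.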

### **5. Robust Group Comparison Methods**
- **Trimmed Mean Comparisons**
  - **Importance:** Compares group central tendencies while reducing outlier influence
  - **Interpretation:** More representative of typical customers when extreme values are present
- **Bootstrap Group Comparisons**
  - **Importance:** Distribution-free method for comparing groups without parametric assumptions
  - **Interpretation:** Provides confidence intervals and p-values through resampling; robust to distributional violations
- **Permutation Tests for Group Differences**
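  - **Importance:** Tests that rely on exchangeability of group labels under the null hypothesis rather than on a distributional form; exact when all permutations are enumerated, approximate when permutations are sampled at random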
  - **Interpretation:** Particularly useful for small samples or unusual customer distributions
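
As an illustration of the robust methods above, the sketch below compares 20% trimmed means and runs a Monte Carlo permutation test on the difference in means. It assumes NumPy and SciPy; the trimming proportion, the number of resamples, and both function names are illustrative choices, not fixed parts of the analysis.

```python
import numpy as np
from scipy import stats


def trimmed_mean_diff(a, b, proportion=0.2):
    """Difference in trimmed means (cuts `proportion` from each tail)."""
    return float(stats.trim_mean(a, proportion) - stats.trim_mean(b, proportion))


def permutation_test_mean_diff(a, b, n_resamples=5000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)  # reassign group labels at random
        diff = shuffled[: a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    p_value = (extreme + 1) / (n_resamples + 1)
    return float(observed), float(p_value)


# One extreme value moves the plain mean but not the trimmed mean.
print(trimmed_mean_diff([1, 2, 3, 4, 100], [1, 2, 3, 4, 5]))  # 0.0
```

Because the permutations are sampled rather than fully enumerated, the p-value is an approximation whose resolution is limited by `n_resamples`.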

### **6. Multiple Group Comparisons (if applicable)**
- **One-Way ANOVA (if additional categorical variables)**
  - **Importance:** Compares means across multiple customer groups simultaneously
  - **Interpretation:** Tests overall group differences; requires post-hoc tests to identify specific group pairs
- **Kruskal-Wallis Test (Non-Parametric ANOVA)**
  - **Importance:** Distribution-free alternative for comparing multiple groups
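  - **Interpretation:** Tests whether at least one group tends to have larger values; robust to non-normality, but unequal spreads across groups can make a significant result hard to attribute to location alone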
- **Post-Hoc Multiple Comparison Procedures**
  - **Importance:** Controls the error rate when making multiple pairwise comparisons
  - **Interpretation:** Bonferroni and Tukey HSD control the family-wise error rate; FDR methods such as Benjamini-Hochberg control the expected proportion of false discoveries instead
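
If a variable with more than two levels becomes available, the workflow above might look like the following sketch. The three age bands are purely hypothetical, the values are synthetic, and pairwise Welch t-tests with a Bonferroni correction are used here as one simple post-hoc option among several.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical three-level grouping; synthetic values only.
rng = np.random.default_rng(3)
groups = {
    "young": rng.normal(60.0, 15.0, size=60),
    "middle": rng.normal(50.0, 15.0, size=60),
    "senior": rng.normal(35.0, 15.0, size=60),
}

# Omnibus tests: is there any difference at all among the groups?
_, p_anova = stats.f_oneway(*groups.values())
_, p_kruskal = stats.kruskal(*groups.values())


def pairwise_welch_bonferroni(named_groups):
    """Welch t-test for every pair, with Bonferroni-adjusted p-values."""
    pairs = list(combinations(named_groups, 2))
    adjusted = {}
    for first, second in pairs:
        _, p = stats.ttest_ind(
            named_groups[first], named_groups[second], equal_var=False
        )
        # Multiply by the number of comparisons and cap at 1.
        adjusted[(first, second)] = min(1.0, float(p) * len(pairs))
    return adjusted


posthoc = pairwise_welch_bonferroni(groups)
print(f"ANOVA p={p_anova:.4g}, Kruskal-Wallis p={p_kruskal:.4g}")
print(posthoc)
```

Bonferroni is conservative; with many groups, a less strict procedure may be preferable.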

### **7. Descriptive Statistics by Group**
- **Central Tendency Measures by Gender**
  - **Importance:** Provides detailed characterization of each customer group
  - **Interpretation:** Reveals typical customer profiles for each demographic segment
- **Variability Measures by Group**
  - **Importance:** Shows diversity within each gender group
  - **Interpretation:** High within-group variability suggests heterogeneous segments; low variability indicates homogeneous groups
- **Distribution Shape Analysis by Group**
  - **Importance:** Compares skewness and kurtosis across gender groups
  - **Interpretation:** Different distribution shapes may require different marketing approaches or product offerings
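
The per-group summaries above can be produced in a single grouped aggregation. The sketch assumes pandas and a data frame with columns named `Gender` and `Spending Score`; those column names, the synthetic values, and the helper functions are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the customer table; column names are assumed.
rng = np.random.default_rng(4)
df = pd.DataFrame(
    {
        "Gender": ["Male"] * 90 + ["Female"] * 110,
        "Spending Score": rng.integers(1, 101, size=200),
    }
)


def iqr(series):
    """Interquartile range: a spread measure that ignores the tails."""
    return series.quantile(0.75) - series.quantile(0.25)


def excess_kurtosis(series):
    """Sample excess kurtosis (0 for a normal distribution)."""
    return series.kurt()


summary = df.groupby("Gender")["Spending Score"].agg(
    n="count",
    mean="mean",
    median="median",
    std="std",
    iqr=iqr,
    skew="skew",
    kurtosis=excess_kurtosis,
)
print(summary.round(2))
```

Repeating the aggregation for Age and Annual Income gives one comparable table per variable.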

### **8. Confidence Intervals for Group Differences**
- **Confidence Intervals for Mean Differences**
  - **Importance:** Provides uncertainty bounds around observed group differences
  - **Interpretation:** A 95% interval that excludes zero corresponds to a significant difference at the 5% level; the interval width shows the precision of the estimate
- **Bootstrap Confidence Intervals for Differences**
  - **Importance:** Distribution-free confidence intervals for group comparisons
  - **Interpretation:** More robust than parametric intervals when assumptions are violated
- **Confidence Intervals for Effect Sizes**
  - **Importance:** Quantifies uncertainty in practical significance measures
  - **Interpretation:** Wide intervals indicate uncertain effect sizes; narrow intervals suggest precise estimates
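
Both interval types for a mean difference can be sketched as below, assuming NumPy and SciPy. The parametric version uses the Welch standard error with Welch-Satterthwaite degrees of freedom, and the bootstrap version uses the simple percentile method; other bootstrap variants exist, and the function names are illustrative.

```python
import numpy as np
from scipy import stats


def welch_ci(a, b, confidence=0.95):
    """Confidence interval for mean(a) - mean(b) without pooling variances."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    var_a = a.var(ddof=1) / a.size
    var_b = b.var(ddof=1) / b.size
    se = np.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (var_a + var_b) ** 2 / (
        var_a**2 / (a.size - 1) + var_b**2 / (b.size - 1)
    )
    t_crit = stats.t.ppf(0.5 + confidence / 2, dof)
    diff = a.mean() - b.mean()
    return float(diff - t_crit * se), float(diff + t_crit * se)


def bootstrap_ci(a, b, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample each group separately, with replacement.
        resampled_a = rng.choice(a, size=a.size, replace=True)
        resampled_b = rng.choice(b, size=b.size, replace=True)
        diffs[i] = resampled_a.mean() - resampled_b.mean()
    tail = 100 * (1 - confidence) / 2
    low, high = np.percentile(diffs, [tail, 100 - tail])
    return float(low), float(high)
```

Applying the same resampling loop to an effect-size function instead of the mean difference would give a bootstrap interval for that effect size.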

### **9. Visualization of Group Comparisons**
- **Box Plots and Violin Plots by Group**
  - **Importance:** Visual comparison of distributions across gender groups
  - **Interpretation:** Shows medians, quartiles, outliers, and distribution shapes; reveals group differences at a glance
- **Density Plots and Histograms by Group**
  - **Importance:** Detailed visualization of distribution shapes for each group
  - **Interpretation:** Overlapping distributions suggest similar groups; separated distributions indicate distinct segments
- **Error Bar Plots with Confidence Intervals**
  - **Importance:** Shows group means with uncertainty bounds
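  - **Interpretation:** When the bars are 95% confidence intervals, non-overlapping bars imply a significant difference, but overlapping bars do not rule one out; a formal test or an interval for the difference itself is the reliable check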
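
A side-by-side box and violin plot can be sketched as follows, assuming Matplotlib. The data are synthetic, and `plot_group_distributions` is a hypothetical helper; in an interactive notebook the backend line should be dropped so the figure displays inline.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_group_distributions(groups, value_label):
    """Draw a box plot and a violin plot of each group on a shared y-axis."""
    labels = list(groups)
    data = [np.asarray(groups[name], dtype=float) for name in labels]
    fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(9, 4), sharey=True)
    ax_box.boxplot(data)
    ax_box.set_title("Box plot")
    ax_violin.violinplot(data, showmedians=True)
    ax_violin.set_title("Violin plot")
    for ax in (ax_box, ax_violin):
        # Both plot types place groups at x = 1, 2, ... by default.
        ax.set_xticks(range(1, len(labels) + 1))
        ax.set_xticklabels(labels)
    ax_box.set_ylabel(value_label)
    fig.tight_layout()
    return fig


# Synthetic placeholder values, NOT the real customer data.
rng = np.random.default_rng(7)
fig = plot_group_distributions(
    {"Male": rng.normal(48, 25, 90), "Female": rng.normal(52, 25, 110)},
    "Spending Score",
)
```

The returned figure can be saved with `fig.savefig(...)` or displayed directly in the notebook.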

### **10. Business Applications and Interpretation**
- **Customer Segmentation Implications**
  - **Importance:** Translates statistical group differences into segmentation strategies
  - **Interpretation:** Significant differences with meaningful effect sizes justify gender-based segmentation; non-significant results offer no evidence for separate treatment, though they do not prove the groups are identical
- **Marketing Strategy Recommendations**
  - **Importance:** Guides gender-specific marketing tactics and messaging
  - **Interpretation:** Large effect sizes justify differentiated strategies; small effects may not warrant separate approaches
- **Product Development Insights**
  - **Importance:** Informs product features and positioning based on gender preferences
  - **Interpretation:** Group differences in spending behavior guide product portfolio and pricing strategies

---

## **📊 Expected Outcomes**

- **Statistical Group Comparisons:** Rigorous testing of gender differences across all numerical variables
- **Effect Size Quantification:** Practical significance assessment of observed group differences
- **Robust Analysis:** Reliable conclusions using appropriate methods for data characteristics
- **Business Insights:** Actionable recommendations for gender-based customer strategies
- **Visualization Suite:** Clear graphical representation of group differences and distributions
- **Statistical Validation:** Proper handling of assumptions and multiple testing issues

This analysis provides the foundation for evidence-based decisions about gender-specific customer segmentation and marketing strategies.
