# **A/B Hypothesis Testing for Insurance Risk Drivers**

This notebook uses A/B hypothesis testing to statistically validate or reject key hypotheses about risk drivers and profit margins. The insights gained here will form the basis of a new segmentation strategy for AlphaCare Insurance Solutions (ACIS).

## **Table of Contents**

1. [Setup and Data Loading](#1-setup-and-data-loading)
2. [Define Key Metrics](#2-define-key-metrics)
3. [Hypothesis Testing Framework](#3-hypothesis-testing-framework)
4. [Hypothesis 1: Risk Differences Across Provinces](#4-hypothesis-1-risk-differences-across-provinces)
    - H₀: There are no risk differences across provinces.
5. [Hypothesis 2: Risk Differences Between Zip Codes](#5-hypothesis-2-risk-differences-between-zip-codes)
    - H₀: There are no risk differences between zip codes.
6. [Hypothesis 3: Margin Differences Between Zip Codes](#6-hypothesis-3-margin-differences-between-zip-codes)
    - H₀: There are no significant margin (profit) differences between zip codes.
7. [Hypothesis 4: Risk Differences Between Women and Men](#7-hypothesis-4-risk-differences-between-women-and-men)
    - H₀: There are no significant risk differences between Women and Men.
8. [Summary of Findings and Business Recommendations](#8-summary-of-findings-and-business-recommendations)

## **1. Setup and Data Loading**

We start by importing all necessary libraries and our custom modular functions for data loading, preprocessing, and hypothesis testing. We'll load the processed data generated from Task 1, which should be available in `data/processed/`.

### Import Necessary Libraries and Set Plotting Style

```python
# Import necessary libraries
import pandas as pd
import numpy as np
from pathlib import Path
import os
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 12
plt.rcParams['font.family'] = 'Inter'  # Matplotlib warns and falls back to its default font if Inter is not installed
```

### Import Utilities

```python
# Add project root to sys.path to enable importing modular scripts from 'src/utils/'.
# Path.cwd() is the project root only if the kernel was started from there.
import sys
project_root = Path.cwd()
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Import data handling utilities
from src.utils.data_loader import load_data
from src.utils.data_preprocessing import perform_initial_data_preprocessing, save_processed_data

# Import hypothesis testing utilities
from src.utils.hypothesis_testing.hypothesis_tester import HypothesisTester
from src.utils.hypothesis_testing.t_test import TTestStrategy
from src.utils.hypothesis_testing.chi_squared_test import ChiSquaredTestStrategy
from src.utils.hypothesis_testing.metrics_calculator import calculate_claim_frequency, calculate_margin, calculate_claim_severity
```

```text
Adding project root to sys.path: /home/micha/Downloads/course/10-accademy/week-3/Insurance-Risk-Analytics-and-Predictive-Modeling
```


### Load Processed Data

```python
from IPython.display import display  # implicit inside Jupyter; imported explicitly so the cell also runs elsewhere

# Define path to the processed data file
processed_data_path = project_root / "data" / "processed" / "processed_insurance_data.csv"

# Load processed data
print(f"Attempting to load processed data from: {processed_data_path}")
df = load_data(processed_data_path)

if df.empty:
    raise ValueError(f"DataFrame is empty. Please ensure '{processed_data_path}' exists and is correctly formatted. "
                     "This notebook expects processed data from previous tasks.")

print(f"\nDataFrame shape ready for Hypothesis Testing: {df.shape}")
print("\nInitial DataFrame Info:")
df.info()

# Ensure 'TotalPremium' and 'TotalClaims' are numeric, handling NaNs
for col in ['TotalPremium', 'TotalClaims']:
    if col in df.columns:
        df[col] = pd.to_numeric(df[col], errors='coerce')
        # Treat missing premium/claim amounts as 0 so row-level metrics can be computed
        df[col] = df[col].fillna(0)
    else:
        print(f"Warning: Column '{col}' not found. Some metrics might not be calculable.")

# Ensure categorical columns are correctly typed for easier segmentation
categorical_cols_for_ab = ['Gender', 'Province', 'PostalCode']
for col in categorical_cols_for_ab:
    if col in df.columns:
        df[col] = df[col].astype('category')
    else:
        print(f"Warning: Categorical column '{col}' not found in DataFrame.")

print("\nDataFrame head after initial setup:")
display(df.head())
```


```text
Attempting to load processed data from: /home/micha/Downloads/course/10-accademy/week-3/Insurance-Risk-Analytics-and-Predictive-Modeling/data/processed/processed_insurance_data.csv

Successfully loaded CSV data from 'processed_insurance_data.csv' (inferred comma delimiter). Shape: (1000098, 52)

DataFrame shape ready for Hypothesis Testing: (1000098, 52)

Initial DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000098 entries, 0 to 1000097
Data columns (total 52 columns):
 #   Column                    Non-Null Count    Dtype  
---  ------                    --------------    -----  
 0   UnderwrittenCoverID       1000098 non-null  int64  
 1   PolicyID                  1000098 non-null  int64  
 2   TransactionMonth          1000098 non-null  object 
 3   IsVATRegistered           1000098 non-null  bool   
 4   Citizenship               1000098 non-null  object 
 5   LegalType                 1000098 non-null  object 
 6   Title                     1000098 non-null  object 
 7   Language                  1000098 non-null  object 
 8   Bank                      854137 non-null   object 
 9   AccountType               959866 non-null   object 
 10
```

| | UnderwrittenCoverID | PolicyID | TransactionMonth | IsVATRegistered | Citizenship | LegalType | Title | Language | Bank | AccountType | ... | ExcessSelected | CoverCategory | CoverType | CoverGroup | Section | Product | StatutoryClass | StatutoryRiskType | TotalPremium | TotalClaims |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 145249 | 12827 | 2015-03-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | | Windscreen | Windscreen | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 21.929825 | 0.0 |
| 1 | 145249 | 12827 | 2015-05-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | | Windscreen | Windscreen | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 21.929825 | 0.0 |
| 2 | 145249 | 12827 | 2015-07-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | | Windscreen | Windscreen | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 0.0 | 0.0 |
| 3 | 145255 | 12827 | 2015-05-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | | Own damage | Own Damage | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 512.84807 | 0.0 |
| 4 | 145255 | 12827 | 2015-07-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | | Own damage | Own Damage | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 0.0 | 0.0 |


## **2. Define Key Metrics**

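For this analysis, "risk" will be quantified by two metrics: **Claim Frequency** (the proportion of records with at least one claim, i.e. `TotalClaims > 0`) and **Claim Severity** (the average claim amount, given that a claim occurred). **"Margin"** is defined as `(TotalPremium - TotalClaims)`. Each row in this dataset appears to be a monthly transaction for one underwritten cover rather than a whole policy (a single `PolicyID` spans several rows), so Claim Frequency here is measured per record, not per policy.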

We will add these derived metrics as new columns to our DataFrame using functions from `src/utils/hypothesis_testing/metrics_calculator.py`.
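The helpers in `metrics_calculator.py` are not shown in this notebook. As a minimal sketch of what the definitions above amount to, assuming only the column names `TotalPremium` and `TotalClaims`, the hypothetical functions below (`add_claim_metrics`, `claim_frequency`, `claim_severity` are illustrative names, not the project's API) compute the same quantities on a four-row toy DataFrame:

```python
import pandas as pd


def add_claim_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for calculate_claim_frequency + calculate_margin."""
    out = df.copy()
    # 1 if the row recorded any claim amount, else 0
    out["HasClaim"] = (out["TotalClaims"] > 0).astype(int)
    # Premium collected minus claims paid, per row
    out["Margin"] = out["TotalPremium"] - out["TotalClaims"]
    return out


def claim_frequency(df: pd.DataFrame) -> float:
    # Share of rows with at least one claim
    return float(df["HasClaim"].mean())


def claim_severity(df: pd.DataFrame) -> float:
    # Mean claim amount over rows that actually have a claim
    return float(df.loc[df["HasClaim"] == 1, "TotalClaims"].mean())


toy = pd.DataFrame({
    "TotalPremium": [100.0, 100.0, 50.0, 50.0],
    "TotalClaims": [0.0, 300.0, 0.0, 100.0],
})
toy = add_claim_metrics(toy)
print(toy["Margin"].tolist())   # [100.0, -200.0, 50.0, -50.0]
print(claim_frequency(toy))     # 0.5
print(claim_severity(toy))      # 200.0
```

Severity is an aggregate over claim rows only, which is why the notebook filters on `HasClaim == 1` before passing `TotalClaims` to the t-test.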

```python
# Calculate Claim Frequency: Adds 'HasClaim' column (1 if TotalClaims > 0, else 0)
df = calculate_claim_frequency(df.copy())

# Calculate Margin: Adds 'Margin' column (TotalPremium - TotalClaims)
df = calculate_margin(df.copy())

print("\nDataFrame head with new 'HasClaim' and 'Margin' columns:")
display(df.head())

# Note on Claim Severity:
# Claim Severity is calculated as an aggregate (mean TotalClaims for claims > 0)
# It's not a per-row metric like Claim Frequency or Margin.
# When testing Claim Severity, we will filter the data for policies with claims (where HasClaim == 1)
# and then pass their 'TotalClaims' series to the T-test.
```

```text
Calculated 'HasClaim' (Claim Frequency indicator) column.
Calculated 'Margin' column.

DataFrame head with new 'HasClaim' and 'Margin' columns:
```


| | UnderwrittenCoverID | PolicyID | TransactionMonth | IsVATRegistered | Citizenship | LegalType | Title | Language | Bank | AccountType | ... | CoverType | CoverGroup | Section | Product | StatutoryClass | StatutoryRiskType | TotalPremium | TotalClaims | HasClaim | Margin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 145249 | 12827 | 2015-03-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | Windscreen | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 21.929825 | 0.0 | 0 | 21.929825 |
| 1 | 145249 | 12827 | 2015-05-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | Windscreen | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 21.929825 | 0.0 | 0 | 21.929825 |
| 2 | 145249 | 12827 | 2015-07-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | Windscreen | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 0.0 | 0.0 | 0 | 0.0 |
| 3 | 145255 | 12827 | 2015-05-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | Own Damage | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 512.84807 | 0.0 | 0 | 512.84807 |
| 4 | 145255 | 12827 | 2015-07-01 | True | | Close Corporation | Mr | English | First National Bank | Current account | ... | Own Damage | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | Commercial | IFRS Constant | 0.0 | 0.0 | 0 | 0.0 |


## **3. Hypothesis Testing Framework**

We will use the `HypothesisTester` context class to conduct our statistical tests. This class allows us to easily switch between different testing strategies (e.g., `TTestStrategy` for numerical comparisons, `ChiSquaredTestStrategy` for categorical comparisons).

A common significance level (`alpha`) of 0.05 will be used for all tests.

- **If p-value < 0.05**: Reject the Null Hypothesis (H₀). A difference at least as large as the one observed would occur less than 5% of the time if H₀ were true, so we call the difference statistically significant.
- **If p-value ≥ 0.05**: Fail to Reject the Null Hypothesis (H₀). The data do not provide enough statistical evidence of a difference; this is not proof that the groups are identical.

```python
# Initialize our HypothesisTester with a default strategy (e.g., T-test)
# We will set the specific strategy for each test as needed.
tester = HypothesisTester(TTestStrategy())  # Initializing with TTestStrategy

alpha = 0.05  # Standard significance level
```

## **4. Hypothesis 1: Risk Differences Across Provinces**

Null Hypothesis (H₀): There are no risk differences across provinces.

This means Claim Frequency and Claim Severity are the same across provinces.

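We will compare two provinces from the dataset. Rather than hard-coding names, we dynamically select the two most frequent provinces, which gives the largest sample sizes for comparison. Because only two provinces are compared, a rejection shows that *these two* differ (which is enough to reject "no differences across provinces"), whereas a failure to reject would not rule out differences among the remaining provinces.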

```python
print("\n--- Hypothesis 1: Risk Differences Across Provinces ---")

# Ensure 'Province' column exists and has at least two unique values for comparison
if 'Province' not in df.columns or df['Province'].nunique() < 2:
    print("Skipping Hypothesis 1: 'Province' column missing or insufficient unique provinces for comparison.")
else:
    # Dynamically select the two most frequent provinces
    top_provinces = df['Province'].value_counts().nlargest(2).index.tolist()

    if len(top_provinces) < 2:
        print(f"Skipping comparison: Less than two unique or sufficiently frequent provinces found in 'Province' column: {top_provinces}. Cannot perform A/B test.")
    else:
        province_A = top_provinces[0]
        province_B = top_provinces[1]

        print(f"Comparing Province A: '{province_A}' vs Province B: '{province_B}'")

        df_province_A = df[df['Province'] == province_A].copy()
        df_province_B = df[df['Province'] == province_B].copy()

        # --- Test 1.1: Claim Frequency difference between Province A and Province B ---
        # Metric: HasClaim (binary: 0 or 1) -> Use Chi-squared test
        tester.set_strategy(ChiSquaredTestStrategy())
        results_freq_prov = tester.execute_test(
            df_province_A['HasClaim'],
            df_province_B['HasClaim'],
            alpha=alpha,
            test_name=f"Claim Frequency: {province_A} vs {province_B}"
        )
        # Business Interpretation:
        if results_freq_prov['conclusion'] == "Reject H0":
            freq_A = df_province_A['HasClaim'].mean()
            freq_B = df_province_B['HasClaim'].mean()
            print(f"  Business Interpretation: We reject the null hypothesis. There is a statistically significant difference in Claim Frequency between {province_A} ({freq_A:.2%}) and {province_B} ({freq_B:.2%}). {province_A if freq_A > freq_B else province_B} has a higher claim frequency, suggesting higher inherent risk in that region.")
        else:
            print(f"  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Frequency between {province_A} and {province_B}.")


        # --- Test 1.2: Claim Severity difference between Province A and Province B ---
        # Metric: TotalClaims (numerical, for policies with claims) -> Use T-test
        # Filter for policies with claims in each province
        claims_A = df_province_A[df_province_A['HasClaim'] == 1]['TotalClaims']
        claims_B = df_province_B[df_province_B['HasClaim'] == 1]['TotalClaims']

        tester.set_strategy(TTestStrategy())
        results_sev_prov = tester.execute_test(
            claims_A,
            claims_B,
            alpha=alpha,
            test_name=f"Claim Severity: {province_A} vs {province_B}"
        )
        # Business Interpretation:
        if results_sev_prov['conclusion'] == "Reject H0":
            print(f"  Business Interpretation: We reject the null hypothesis. There is a statistically significant difference in Claim Severity between {province_A} and {province_B}.")
            print(f"  Mean Claim Severity: {province_A}={results_sev_prov['group_a_mean']:.2f}, {province_B}={results_sev_prov['group_b_mean']:.2f}.")
            print(f"  This suggests that claims in {province_A if results_sev_prov['group_a_mean'] > results_sev_prov['group_b_mean'] else province_B} are, on average, more expensive when they occur.")
        else:
            print(f"  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Severity between {province_A} and {province_B}.")
```




```text
--- Hypothesis 1: Risk Differences Across Provinces ---
Comparing Province A: 'Gauteng' vs Province B: 'Western Cape'

--- Conducting Claim Frequency: Gauteng vs Western Cape ---
Chi-squared test (α=0.05): Statistic=56.0874, P-value=0.0000. Conclusion: Reject H0 (p-value < 0.05).
  Business Interpretation: We reject the null hypothesis. There is a statistically significant difference in Claim Frequency between Gauteng (0.34%) and Western Cape (0.22%). Gauteng has a higher claim frequency, suggesting higher inherent risk in that region.

--- Conducting Claim Severity: Gauteng vs Western Cape ---
Independent Samples T-test (Equal Variances Assumed) (α=0.05): Statistic=-2.5501, P-value=0.0109. Conclusion: Reject H0 (p-value < 0.05).
  Group A (Mean=22243.88, N=1322).
  Group B (Mean=28095.85, N=370).
  Business Interpretation: We reject the null hypothesis. There is a statistically significant difference in Claim Severity between Gauteng and Western Cape.
  Mean Claim Severity: Gauteng=22243.88, Western Cape=28095.85.
  This suggests that claims in Western Cape are, on average, more expensive when they occur.
```

**Interpretation & Business Recommendation for H1:**

- **Claim Frequency (Gauteng vs Western Cape):**
    - **Result:** `Reject H0 (p-value = 0.0000 < 0.05)`.
    - **Interpretation:** There is a highly significant difference in **Claim Frequency** between Gauteng (0.34%) and Western Cape (0.22%). Gauteng policies are statistically more likely to incur a claim.
    - **Business Recommendation:** ACIS should consider **differentiating premiums based on province**, specifically implementing a higher premium or stricter underwriting for policies in Gauteng due to their higher claim frequency. Conversely, policies in Western Cape might warrant more competitive pricing to attract lower-risk customers.
- **Claim Severity (Gauteng vs Western Cape):**
    - **Result:** `Reject H0 (p-value = 0.0109 < 0.05)`.
    - **Interpretation:** There is a statistically significant difference in **Claim Severity**. Claims in Western Cape (Mean = 28095.85) are, on average, more expensive than those in Gauteng (Mean = 22243.88), given a claim occurs.
    - **Business Recommendation:** While Gauteng has higher claim frequency, Western Cape experiences more costly claims. ACIS should analyze the drivers of higher severity in Western Cape (e.g., higher vehicle values, more severe accident types, higher repair costs in the region) and adjust premiums or underwriting for Western Cape policies accordingly, particularly for high-value vehicles.
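The two H1 results pull in opposite directions. A standard way to combine them is the pure-premium identity, expected claim cost per record = claim frequency × claim severity. The sketch below plugs in the rounded frequencies (0.34%, 0.22%) and mean severities printed above; because the frequencies are rounded, the outputs are indicative only, and no significance test is attached to them.

```python
# Approximate expected claim cost per record = claim frequency x claim severity.
# Inputs are the rounded figures printed by the H1 cell, so results are approximate.
segments = {
    "Gauteng": {"frequency": 0.0034, "severity": 22243.88},
    "Western Cape": {"frequency": 0.0022, "severity": 28095.85},
}

expected_cost = {
    name: s["frequency"] * s["severity"] for name, s in segments.items()
}

for name, cost in expected_cost.items():
    print(f"{name}: expected claims per record ~ {cost:.2f}")
# Gauteng: expected claims per record ~ 75.63
# Western Cape: expected claims per record ~ 61.81
```

On these rough numbers, Gauteng's higher claim frequency more than offsets Western Cape's higher severity. A firmer comparison would test mean `TotalClaims` per record across all rows directly rather than multiplying rounded summaries.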

## **5. Hypothesis 2: Risk Differences Between Zip Codes**

Null Hypothesis (H₀): There are no risk differences between zip codes.

This means Claim Frequency and Claim Severity are the same across zip codes.

**Challenge with Zip Codes:** `PostalCode` often has very high cardinality, so comparing all pairs is impractical. Instead, we dynamically select the two most frequent zip codes, which gives the largest sample sizes. Note that this tests only these two zip codes; a failure to reject here does not show that zip codes in general carry the same risk.
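One alternative to a single pairwise comparison (not what this notebook does) is an omnibus chi-squared test of independence over many zip codes at once: a k×2 contingency table of zip code against `HasClaim`. Restricting to the `top_k` most frequent codes helps keep expected cell counts above the usual rule-of-thumb minimum of 5. The helper name `omnibus_frequency_test` and the synthetic demo data are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def omnibus_frequency_test(df, group_col, top_k=10, flag_col="HasClaim"):
    """Chi-squared test of independence between group and a 0/1 claim flag,
    over the top_k most frequent groups (hypothetical helper)."""
    top = df[group_col].value_counts().nlargest(top_k).index
    subset = df[df[group_col].isin(top)]
    # Cast to str so unused categories of a categorical column do not add all-zero rows
    groups = subset[group_col].astype(str)
    # Rows: groups; columns: flag values (0, 1)
    table = pd.crosstab(groups, subset[flag_col])
    stat, p, dof, _ = chi2_contingency(table.to_numpy())
    return table, stat, p, dof


# Synthetic demo: groups A and C claim 50% of the time, group B only 5%
demo = pd.DataFrame({
    "PostalCode": ["A"] * 100 + ["B"] * 100 + ["C"] * 100,
    "HasClaim": [1] * 50 + [0] * 50 + [1] * 5 + [0] * 95 + [1] * 50 + [0] * 50,
})
table, stat, p, dof = omnibus_frequency_test(demo, "PostalCode", top_k=3)
print(table)
print(f"chi2={stat:.2f}, dof={dof}, p={p:.2e}")
```

A significant omnibus result says that at least one of the `top_k` zip codes differs; pairwise follow-ups would then need a multiple-comparison correction.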

```python
print("\n--- Hypothesis 2: Risk Differences Between Zip Codes ---")

if 'PostalCode' not in df.columns or df['PostalCode'].nunique() < 2:
    print("Skipping Hypothesis 2: 'PostalCode' column missing or insufficient unique zip codes for comparison.")
else:
    # Dynamically select the two most frequent zip codes
    top_zip_codes = df['PostalCode'].value_counts().nlargest(2).index.tolist()

    if len(top_zip_codes) < 2:
        print(f"Skipping comparison: Less than two unique or sufficiently frequent zip codes found in 'PostalCode' column: {top_zip_codes}. Cannot perform A/B test.")
    else:
        zip_code_A = top_zip_codes[0]
        zip_code_B = top_zip_codes[1]

        print(f"Comparing Zip Code A: '{zip_code_A}' vs Zip Code B: '{zip_code_B}'")

        df_zip_A = df[df['PostalCode'] == zip_code_A].copy()
        df_zip_B = df[df['PostalCode'] == zip_code_B].copy()

        # --- Test 2.1: Claim Frequency difference between Zip Code A and Zip Code B ---
        tester.set_strategy(ChiSquaredTestStrategy())
        results_freq_zip = tester.execute_test(
            df_zip_A['HasClaim'],
            df_zip_B['HasClaim'],
            alpha=alpha,
            test_name=f"Claim Frequency: Zip Code {zip_code_A} vs {zip_code_B}"
        )
        # Business Interpretation:
        if results_freq_zip['conclusion'] == "Reject H0":
            freq_A = df_zip_A['HasClaim'].mean()
            freq_B = df_zip_B['HasClaim'].mean()
            print(f"  Business Interpretation: We reject the null hypothesis. There is a statistically significant difference in Claim Frequency between Zip Code {zip_code_A} ({freq_A:.2%}) and {zip_code_B} ({freq_B:.2%}). This indicates that policies in one of these zip codes are significantly more or less likely to incur a claim.")
        else:
            print(f"  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Frequency between Zip Code {zip_code_A} and {zip_code_B}.")


        # --- Test 2.2: Claim Severity difference between Zip Code A and Zip Code B ---
        claims_A = df_zip_A[df_zip_A['HasClaim'] == 1]['TotalClaims']
        claims_B = df_zip_B[df_zip_B['HasClaim'] == 1]['TotalClaims']

        tester.set_strategy(TTestStrategy())
        results_sev_zip = tester.execute_test(
            claims_A,
            claims_B,
            alpha=alpha,
            test_name=f"Claim Severity: Zip Code {zip_code_A} vs {zip_code_B}"
        )
        # Business Interpretation:
        if results_sev_zip['conclusion'] == "Reject H0":
            print(f"  Business Interpretation: We reject the null hypothesis. There is a statistically significant difference in Claim Severity between Zip Code {zip_code_A} and {zip_code_B}.")
            print(f"  Mean Claim Severity: {zip_code_A}={results_sev_zip['group_a_mean']:.2f}, {zip_code_B}={results_sev_zip['group_b_mean']:.2f}.")
            print(f"  Claims are, on average, more expensive in {zip_code_A if results_sev_zip['group_a_mean'] > results_sev_zip['group_b_mean'] else zip_code_B}.")
        else:
            print(f"  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Severity between Zip Code {zip_code_A} and {zip_code_B}.")
```





```text
--- Hypothesis 2: Risk Differences Between Zip Codes ---
Comparing Zip Code A: '2000' vs Zip Code B: '122'

--- Conducting Claim Frequency: Zip Code 2000 vs 122 ---
Chi-squared test (α=0.05): Statistic=3.5971, P-value=0.0579. Conclusion: Fail to Reject H0 (p-value >= 0.05).
  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Frequency between Zip Code 2000 and 122.

--- Conducting Claim Severity: Zip Code 2000 vs 122 ---
Independent Samples T-test (Equal Variances Assumed) (α=0.05): Statistic=0.4214, P-value=0.6736. Conclusion: Fail to Reject H0 (p-value >= 0.05).
  Group A (Mean=19196.41, N=486).
  Group B (Mean=18162.03, N=210).
  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Severity between Zip Code 2000 and 122.
```


**Interpretation & Business Recommendation for H2:**

- **Claim Frequency (Zip Code 2000 vs 122):**
    - **Result:** `Fail to Reject H0 (p-value = 0.0579 >= 0.05)`.
    - **Interpretation:** There is **no statistically significant difference** in Claim Frequency between Zip Code 2000 and Zip Code 122 at α = 0.05; the observed difference is consistent with random variation. With p = 0.0579, however, the result sits close to the threshold.
    - **Business Recommendation:** Based on this test, there's no strong statistical evidence to suggest that claim frequency varies significantly enough between these two specific zip codes to warrant differential pricing or risk adjustments *solely based on claim frequency*.
- **Claim Severity (Zip Code 2000 vs 122):**
    - **Result:** `Fail to Reject H0 (p-value = 0.6736 >= 0.05)`.
    - **Interpretation:** There is **no statistically significant difference** in Claim Severity between Zip Code 2000 (Mean = 19196.41) and Zip Code 122 (Mean = 18162.03). The average cost of claims in these two zip codes is statistically similar.
    - **Business Recommendation:** Similar to claim frequency, there's no strong statistical evidence from this test to justify distinct pricing or underwriting for these two specific zip codes based on claim severity.

## **6. Hypothesis 3: Margin Differences Between Zip Codes**

**Null Hypothesis (H₀): There are no significant margin (profit) differences between zip codes.**

This tests whether ACIS is achieving similar profitability from policies in different zip codes.
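Per-record margins combine many small premiums with rare, very large claims, so their distribution is heavily skewed. A percentile bootstrap interval for the difference in mean margin is one assumption-light way to complement the t-test p-value. The sketch below is not part of the project's utilities; `bootstrap_mean_diff_ci` is a hypothetical name and the demo data are synthetic.

```python
import numpy as np


def bootstrap_mean_diff_ci(a, b, n_boot=2000, ci=0.95, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping its original size
        diffs[i] = rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
    lo, hi = np.quantile(diffs, [(1 - ci) / 2, 1 - (1 - ci) / 2])
    return a.mean() - b.mean(), lo, hi


# Synthetic demo with a clear gap between group means
demo_rng = np.random.default_rng(42)
group_a = demo_rng.normal(10, 5, 300)
group_b = demo_rng.normal(-10, 5, 300)
diff, lo, hi = bootstrap_mean_diff_ci(group_a, group_b)
print(f"diff={diff:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Applied to the two zip codes' `Margin` series, an interval that straddles 0 would agree with a non-significant t-test. With groups of this notebook's size, each resample touches every row, so reducing `n_boot` trades precision for speed.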

```python
print("\n--- Hypothesis 3: Margin Differences Between Zip Codes ---")

if 'PostalCode' not in df.columns or df['PostalCode'].nunique() < 2 or 'Margin' not in df.columns:
    print("Skipping Hypothesis 3: Required columns ('PostalCode', 'Margin') missing or insufficient unique zip codes.")
else:
    # Re-using dynamically selected top zip codes for consistency
    top_zip_codes = df['PostalCode'].value_counts().nlargest(2).index.tolist()

    if len(top_zip_codes) < 2:
        print("Skipping comparison: Less than two unique or sufficiently frequent zip codes found for margin test.")
    else:
        zip_code_A = top_zip_codes[0]
        zip_code_B = top_zip_codes[1]

        print(f"Comparing Margin: Zip Code A: '{zip_code_A}' vs Zip Code B: '{zip_code_B}'")

        margin_A = df[df['PostalCode'] == zip_code_A]['Margin']
        margin_B = df[df['PostalCode'] == zip_code_B]['Margin']

        tester.set_strategy(TTestStrategy())
        results_margin_zip = tester.execute_test(
            margin_A,
            margin_B,
            alpha=alpha,
            test_name=f"Margin Difference: Zip Code {zip_code_A} vs {zip_code_B}"
        )
        # Business Interpretation:
        if results_margin_zip['conclusion'] == "Reject H0":
            print(f"  Business Interpretation: We reject the null hypothesis. There is a statistically significant difference in average Margin between Zip Code {zip_code_A} and {zip_code_B}.")
            print(f"  Average Margin: {zip_code_A}={results_margin_zip['group_a_mean']:.2f}, {zip_code_B}={results_margin_zip['group_b_mean']:.2f}.")
            print(f"  ACIS is, on average, more profitable from policies in {zip_code_A if results_margin_zip['group_a_mean'] > results_margin_zip['group_b_mean'] else zip_code_B}.")
            print("  This suggests pricing or risk assessment is not optimally balanced across these areas, indicating opportunities for premium adjustments.")
        else:
            print(f"  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in average Margin between Zip Code {zip_code_A} and {zip_code_B}.")
            print("  This suggests that profitability is statistically similar across these zip codes.")
```





```text
--- Hypothesis 3: Margin Differences Between Zip Codes ---
Comparing Margin: Zip Code A: '2000' vs Zip Code B: '122'

--- Conducting Margin Difference: Zip Code 2000 vs 122 ---
Independent Samples T-test (Equal Variances Assumed) (α=0.05): Statistic=1.2933, P-value=0.1959. Conclusion: Fail to Reject H0 (p-value >= 0.05).
  Group A (Mean=-8.11, N=133498).
  Group B (Mean=-22.86, N=49171).
  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in average Margin between Zip Code 2000 and 122.
  This suggests that profitability is statistically similar across these zip codes.
```


**Interpretation & Business Recommendation for H3:**

- **Margin (Zip Code 2000 vs 122):**
    - **Result:** `Fail to Reject H0 (p-value = 0.1959 >= 0.05)`.
    - **Interpretation:** There is **no statistically significant difference** in average Margin between Zip Code 2000 (Mean = -8.11) and Zip Code 122 (Mean = -22.86). Note that both means are negative: on average, claims exceeded premiums per record in both zip codes. The test therefore indicates that the two areas are *similarly unprofitable*, not that they are profitable.
    - **Business Recommendation:** Based on this test, there is no statistical evidence to warrant distinct pricing or strategy adjustments between these two specific zip codes based on their average profit margin.
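The claim frequencies behind the borderline H2 result were not printed, but they can be recovered from outputs already shown: the claim rows per zip code are the `N` values of the H2 severity test (486 and 210), and the total rows are the `N` values of the H3 margin test (133,498 and 49,171). The sketch below rebuilds the 2×2 table and reruns SciPy's `chi2_contingency`, whose default Yates continuity correction for 2×2 tables reproduces the printed statistic. That match suggests, but does not prove, that the project's `ChiSquaredTestStrategy` uses the same default.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts taken from the printed outputs:
# claim rows = N of the H2 severity test, total rows = N of the H3 margin test.
claims = np.array([486, 210])            # zip 2000, zip 122
rows = np.array([133498, 49171])
table = np.column_stack([claims, rows - claims])   # columns: HasClaim=1, HasClaim=0

freq = claims / rows
stat, p, dof, expected = chi2_contingency(table)   # Yates correction applied by default for dof=1
print(f"frequency: 2000={freq[0]:.3%}, 122={freq[1]:.3%}")   # 2000=0.364%, 122=0.427%
print(f"chi2={stat:.4f}, p={p:.4f}, dof={dof}")
```

So Zip Code 122 shows a somewhat higher claim frequency than 2000, but the gap falls just short of significance at α = 0.05.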

## **7. Hypothesis 4: Risk Differences Between Women and Men**

Null Hypothesis (H₀): There are no significant risk differences between Women and Men.

This means Claim Frequency and Claim Severity are the same for women and men.

```python
print("\n--- Hypothesis 4: Risk Differences Between Women and Men ---")

if 'Gender' not in df.columns or df['Gender'].nunique() < 2:
    print("Skipping Hypothesis 4: 'Gender' column missing or insufficient unique genders.")
else:
    # Ensure 'Gender' column values are consistent for grouping
    gender_A = 'Female'
    gender_B = 'Male'

    if gender_A not in df['Gender'].unique() or gender_B not in df['Gender'].unique():
        print(f"Skipping comparison: '{gender_A}' or '{gender_B}' not found in 'Gender' column. Adjust genders for test.")
    else:
        df_women = df[df['Gender'] == gender_A].copy()
        df_men = df[df['Gender'] == gender_B].copy()

        # --- Test 4.1: Claim Frequency difference between Women and Men ---
        tester.set_strategy(ChiSquaredTestStrategy())
        results_freq_gender = tester.execute_test(
            df_women['HasClaim'],
            df_men['HasClaim'],
            alpha=alpha,
            test_name=f"Claim Frequency: {gender_A} vs {gender_B}"
        )
        # Business Interpretation:
        if results_freq_gender['conclusion'] == "Reject H0":
            freq_women = df_women['HasClaim'].mean()
            freq_men = df_men['HasClaim'].mean()
            print(f"  Business Interpretation: We reject the null hypothesis. There is a statistically significant difference in Claim Frequency between {gender_A} ({freq_women:.2%}) and {gender_B} ({freq_men:.2%}). One group is more prone to having claims.")
        else:
            print(f"  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Frequency between {gender_A} and {gender_B}.")

        # --- Test 4.2: Claim Severity difference between Women and Men ---
        claims_women = df_women[df_women['HasClaim'] == 1]['TotalClaims']
        claims_men = df_men[df_men['HasClaim'] == 1]['TotalClaims']

        tester.set_strategy(TTestStrategy())
        results_sev_gender = tester.execute_test(
            claims_women,
            claims_men,
            alpha=alpha,
            test_name=f"Claim Severity: {gender_A} vs {gender_B}"
        )
        # Business Interpretation:
        if results_sev_gender['conclusion'] == "Reject H0":
            print(f"  Business Interpretation: We reject the null hypothesis. There is a statistically significant difference in Claim Severity between {gender_A} and {gender_B}.")
            print(f"  Mean Claim Severity: {gender_A}={results_sev_gender['group_a_mean']:.2f}, {gender_B}={results_sev_gender['group_b_mean']:.2f}.")
            print(f"  Claims from {gender_A if results_sev_gender['group_a_mean'] > results_sev_gender['group_b_mean'] else gender_B} are, on average, more expensive.")
        else:
            print(f"  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Severity between {gender_A} and {gender_B}.")
```




```text
--- Hypothesis 4: Risk Differences Between Women and Men ---

--- Conducting Claim Frequency: Female vs Male ---
Chi-squared test (α=0.05): Statistic=0.0037, P-value=0.9515. Conclusion: Fail to Reject H0 (p-value >= 0.05).
  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Frequency between Female and Male.

--- Conducting Claim Severity: Female vs Male ---
Independent Samples T-test (Equal Variances Assumed) (α=0.05): Statistic=0.4191, P-value=0.6760. Conclusion: Fail to Reject H0 (p-value >= 0.05).
  Group A (Mean=17874.72, N=14).
  Group B (Mean=14858.55, N=94).
  Business Interpretation: We fail to reject the null hypothesis. There is no statistically significant difference in Claim Severity between Female and Male.
```


**Interpretation & Business Recommendation for H4:**

- **Claim Frequency (Female vs Male):**
    - **Result:** `Fail to Reject H0 (p-value = 0.9515 >= 0.05)`.
    - **Interpretation:** There is **no statistically significant difference** in Claim Frequency between Female and Male policyholders. The likelihood of a claim occurring is statistically similar for both genders.
    - **Business Recommendation:** Based on this data, gender does not appear to be a statistically significant differentiator for the likelihood of a claim. ACIS should not factor gender into claim frequency predictions or related pricing adjustments for this specific risk aspect.
- **Claim Severity (Female vs Male):**
    - **Result:** `Fail to Reject H0 (p-value = 0.6760 >= 0.05)`.
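    - **Interpretation:** There is **no statistically significant difference** in Claim Severity between Female (Mean = 17874.72, N = 14) and Male (Mean = 14858.55, N = 94) policyholders who made a claim. The gap of roughly 3,016 in mean claim cost is within what sampling variation could produce. However, with only 14 claims from female policyholders the test has limited power, so a genuine difference of this size cannot be ruled out.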
    - **Business Recommendation:** Similar to claim frequency, gender does not appear to be a statistically significant factor for claim severity. ACIS should avoid using gender as a differentiator for claim cost predictions or related premium adjustments based on this analysis.
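The severity test reported above assumed equal variances while comparing 14 female claims with 94 male claims. Welch's t-test does not assume equal variances and is a common alternative when group sizes are this unbalanced. Claim amounts are typically right-skewed, so a rank-based Mann-Whitney U test is also a useful cross-check. The sketch below only suggests how such a check could be run. It has not been run on the ACIS data, and the helper name `compare_claim_severity` is hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind


def compare_claim_severity(severity_a, severity_b, alpha=0.05):
    """Compare claim amounts for two groups, using only policies that had a claim."""
    a = np.asarray(severity_a, dtype=float)
    b = np.asarray(severity_b, dtype=float)
    # Welch's t-test: equal_var=False drops the equal-variance assumption
    t_stat, t_p = ttest_ind(a, b, equal_var=False)
    # Rank-based cross-check, less sensitive to skew and extreme claims
    _u_stat, u_p = mannwhitneyu(a, b, alternative="two-sided")
    return {
        "n_a": a.size,
        "n_b": b.size,
        "group_a_mean": float(a.mean()),
        "group_b_mean": float(b.mean()),
        "welch_t": float(t_stat),
        "welch_p": float(t_p),
        "mann_whitney_p": float(u_p),
        "conclusion": "Reject H0" if t_p < alpha else "Fail to Reject H0",
    }


# Synthetic, right-skewed example data (not ACIS claims)
rng = np.random.default_rng(42)
female = rng.lognormal(mean=9.5, sigma=1.0, size=14)
male = rng.lognormal(mean=9.5, sigma=1.0, size=94)
print(compare_claim_severity(female, male))
```

If the Welch and Mann-Whitney results both agree with the original equal-variance test, the "fail to reject" conclusion for gender does not depend on the equal-variance assumption.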

## **8. Summary of Findings and Business Recommendations**

This section consolidates the conclusions from all hypothesis tests and provides concise, actionable business recommendations for AlphaCare Insurance Solutions (ACIS) based on the statistical evidence.

**Overall Summary of Hypothesis Test Outcomes:**

- **H₀: There are no risk differences across provinces.**
    - **Claim Frequency:** **REJECT H₀ (p < 0.0001)**. Statistically significant difference observed. Gauteng has a higher claim frequency (0.34%) than Western Cape (0.22%).
    - **Claim Severity:** **REJECT H₀ (p = 0.0109)**. Statistically significant difference observed. Western Cape (Mean Claim Severity = 28095.85) has higher average claim costs than Gauteng (Mean Claim Severity = 22243.88).
- **H₀: There are no risk differences between zip codes (2000 vs 122).**
    - **Claim Frequency:** **FAIL TO REJECT H₀ (p = 0.0579)**. No statistically significant difference at α = 0.05, although the p-value lies close to the threshold.
    - **Claim Severity:** **FAIL TO REJECT H₀ (p = 0.6736)**. No statistically significant difference.
- **H₀: There are no significant margin (profit) differences between zip codes (2000 vs 122).**
    - **Margin:** **FAIL TO REJECT H₀ (p = 0.1959)**. No statistically significant difference.
- **H₀: There are no significant risk differences between Women and Men.**
    - **Claim Frequency:** **FAIL TO REJECT H₀ (p = 0.9515)**. No statistically significant difference.
    - **Claim Severity:** **FAIL TO REJECT H₀ (p = 0.6760)**. No statistically significant difference.

**Key Actionable Insights from Rejected Hypotheses:**

1. **Provincial Risk Differentiation is Crucial**: Provinces, specifically Gauteng and Western Cape, exhibit statistically significant differences in both claim frequency and claim severity.
    - Gauteng policies are more likely to have a claim (higher frequency).
    - Western Cape policies, when a claim occurs, result in significantly higher average costs (higher severity).
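The provincial findings rest on two per-segment metrics, claim frequency and claim severity, alongside margin. Below is a minimal pandas sketch of computing them per segment. It assumes a policy-level DataFrame with columns named `TotalPremium` and `TotalClaims` and a segment column such as `Province`. These column names and the helper `segment_risk_metrics` are assumptions, so adjust them to the processed data's actual schema and to the metric definitions in Section 2. Frequency is computed as the share of policies with a claim, severity as the mean claim amount among policies that claimed, and margin as premium minus claims.

```python
import pandas as pd


def segment_risk_metrics(df: pd.DataFrame, segment_col: str) -> pd.DataFrame:
    """Claim frequency, claim severity and mean margin per segment (assumed column names)."""
    has_claim = df["TotalClaims"] > 0
    enriched = df.assign(
        HasClaim=has_claim,
        Margin=df["TotalPremium"] - df["TotalClaims"],
    )
    grouped = enriched.groupby(segment_col)
    summary = pd.DataFrame({
        "policies": grouped.size(),
        # Share of policies with at least one claim
        "claim_frequency": grouped["HasClaim"].mean(),
        # Average profit per policy
        "mean_margin": grouped["Margin"].mean(),
    })
    # Average claim amount given a claim; NaN where a segment has no claims
    summary["claim_severity"] = enriched[has_claim].groupby(segment_col)["TotalClaims"].mean()
    return summary


# Made-up toy data for illustration only
toy = pd.DataFrame({
    "Province": ["Gauteng"] * 4 + ["Western Cape"] * 4,
    "TotalPremium": [100.0] * 8,
    "TotalClaims": [0.0, 0.0, 0.0, 500.0, 0.0, 0.0, 0.0, 0.0],
})
print(segment_risk_metrics(toy, "Province"))
```

Computing severity only over policies that claimed keeps it separate from frequency. A segment can therefore score low on one and high on the other, which is the Gauteng versus Western Cape pattern described above.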

**Business Recommendations for ACIS:**

Based on the statistical validation, ACIS should prioritize the following strategic adjustments:

1. **Implement Granular Geographical Pricing and Underwriting**:
    - **Gauteng**: Given its significantly higher claim frequency, ACIS should review and potentially **increase premiums or tighten underwriting criteria** for policies in Gauteng to align pricing better with the higher likelihood of claims.
    - **Western Cape**: Despite lower claim frequency, the higher average claim severity in Western Cape suggests that claims, when they do occur, are more costly. ACIS should investigate the factors driving this higher severity (e.g., prevalence of higher value vehicles, specific accident types, repair costs) and consider **premium adjustments or specialized risk assessments** for policies in this region, especially for high-value exposures.
2. **Avoid Differentiating on Zip Code or Gender Without Further Evidence**: For Zip Codes (specifically 2000 vs 122) and Gender, no statistically significant differences were found for risk (Claim Frequency, Claim Severity) or Margin. ACIS should therefore **avoid implementing differential pricing or major strategic changes based solely on these attributes** from this dataset. Failing to reject H₀ does not prove that the segments are equivalent. If business intuition suggests differences exist, further analysis with more granular data, different segmentation approaches, or larger sample sizes is needed before drawing firmer conclusions.
3. **Continuous Monitoring and Iteration**:
    - Establish ongoing monitoring of Claim Frequency, Claim Severity, and Margin across all key segments (including provinces, vehicle types, and other demographic factors).
    - Regularly re-evaluate these hypotheses as new data becomes available to adapt to changing risk landscapes and market dynamics. This iterative approach ensures ACIS's segmentation and pricing strategies remain competitive and profitable.
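Recommendation 2 notes that larger samples may be needed before acting on zip-code or gender differences. A rough sample-size calculation for comparing two claim frequencies makes that concrete. The sketch below uses the standard normal-approximation formula for a two-sided, two-proportion test with unpooled variances, $n = (z_{1-\alpha/2} + z_{1-\beta})^2 \, [p_1(1-p_1) + p_2(1-p_2)] / (p_1 - p_2)^2$ policies per group. The helper name `policies_per_group` and the default 80% power are assumptions. The example plugs in the provincial claim frequencies reported above (0.34% vs 0.22%) purely for illustration.

```python
import math

from scipy.stats import norm


def policies_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate policies needed per group to detect claim frequencies p1 vs p2.

    Normal approximation, two-sided test, unpooled variances (a rough planning aid).
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.ppf(power)           # quantile matching the desired power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)


# Illustration with the provincial claim frequencies reported above
print(policies_per_group(0.0034, 0.0022))
```

With these inputs the result is roughly 30,000 policies per group. Because the required size grows roughly with the inverse square of the frequency difference, detecting smaller gaps between zip codes or genders would need considerably more data.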

These recommendations provide clear, data-backed guidance for ACIS to refine its risk assessment and pricing strategies, ultimately aiming for improved profitability and customer segmentation.