# A/B Testing Simulation for Insurance Analytics

This notebook simulates insurance claim data and tests whether claim frequencies differ between groups: across provinces (chi-squared test) and between genders (t-test).


In [1]:
# notebooks/example_notebook.ipynb

import sys
import os
sys.path.append(os.path.abspath('../scripts'))

In [2]:
from ab_testing import generate_insurance_data, save_data_to_csv, ab_test, gender_analysis

In [3]:

# Example usage
insurance_data = generate_insurance_data()



In [4]:
insurance_data

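| Unnamed: 0 | Province | Gender | Claimed |
|---|---|---|---|
| 0 | Province_C | Male | 0 |
| 1 | Province_A | Male | 1 |
| 2 | Province_C | Male | 0 |
| 3 | Province_C | Male | 1 |
| 4 | Province_A | Female | 0 |
| ... | ... | ... | ... |
| 995 | Province_B | Male | 1 |
| 996 | Province_B | Female | 0 |
| 997 | Province_C | Male | 1 |
| 998 | Province_C | Male | 0 |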


In [5]:
save_data_to_csv(insurance_data)


## Hypotheses for Chi-squared Test (Provinces)

- **Null Hypothesis (\( H_0 \))**: There are no risk differences across provinces; the proportion of claims is the same for all provinces.
- **Alternative Hypothesis (\( H_a \))**: There are risk differences across provinces; the proportion of claims is not the same for all provinces.

## Hypotheses for T-test (Gender)

- **Null Hypothesis (\( H_0 \))**: There are no risk differences between genders; the mean claim rate is the same for males and females.
- **Alternative Hypothesis (\( H_a \))**: There are risk differences between genders; the mean claim rate is not the same for males and females.
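
Because `Claimed` is a 0/1 indicator, the mean of each group is its claim rate, so a two-sample t-test on the indicator compares claim rates. The source of `gender_analysis` is not shown here, so the following is only a minimal sketch of one common variant (Welch's t-test) using the standard library. The function name `welch_t` and the two small samples are hypothetical and are not taken from the notebook's data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t-statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n)
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, dof


# Hypothetical claim indicators (1 = claimed, 0 = no claim); illustration only
male = [1, 0, 1, 1, 0, 1, 0, 1]
female = [0, 0, 1, 0, 0, 1, 0, 0]

t_stat, dof = welch_t(male, female)
print(f"t = {t_stat:.3f}, dof = {dof:.1f}")
```

Turning the statistic into a p-value requires the t-distribution, which the standard library does not provide; a statistics package would normally handle that step.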

## Setting the P-value Threshold

Typically, a significance level (\( \alpha \)) of 0.05 is used:

- If the p-value is less than \( \alpha \) (0.05), we **reject the null hypothesis**.
- If the p-value is greater than or equal to \( \alpha \) (0.05), we **fail to reject the null hypothesis**.
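
The decision rule above can be written as a small helper. This is a sketch only: the function name `decide` is hypothetical and is not part of the `ab_testing` module. The two p-values passed in are the ones this notebook reports for the province and gender tests.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Strict inequality: a p-value exactly equal to alpha does not reject H0
    return "reject H0" if p_value < alpha else "fail to reject H0"


print("Provinces:", decide(0.8055889409655793))
print("Gender:", decide(1.936383802773897e-35))
```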


In [6]:
chi2_stat, p_value, contingency_table = ab_test(insurance_data)
print("Chi-squared Statistic:", chi2_stat)
print("P-value (Provinces):", p_value)



Chi-squared Statistic: 0.43236333074228434
P-value (Provinces): 0.8055889409655793


In [7]:
print("Contingency Table (Provinces):\n", contingency_table)

Contingency Table (Provinces):
 Claimed       0    1
Province            
Province_A  211  144
Province_B  201  125
Province_C  196  123


## A/B Test Results for Provinces

### Chi-squared Statistic
- **Value**: 0.43

### P-value
- **Value**: 0.81

### Interpretation
The Chi-squared statistic of 0.43 indicates a very low level of discrepancy between the observed and expected frequencies of claims across the provinces. The p-value of 0.81 suggests that this difference is not statistically significant, as it is well above the conventional alpha level of 0.05.

The data therefore provide no evidence that claim rates differ among the provinces, and we fail to reject the null hypothesis of equal claim rates across provinces in this dataset.
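
As a cross-check, the statistic can be recomputed by hand from the contingency table printed earlier. The sketch below uses only the standard library and assumes that `ab_test` runs a standard Pearson chi-squared test of independence on that table, which its source (not shown here) would need to confirm. The helper name `chi_squared_statistic` is hypothetical.

```python
import math

# Observed counts copied from the contingency table in this notebook:
# rows are Province_A, Province_B, Province_C; columns are Claimed = 0, 1.
observed = [
    [211, 144],
    [201, 125],
    [196, 123],
]


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of province and claim status
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


chi2 = chi_squared_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1) * (2 - 1) = 2

# With exactly 2 degrees of freedom the chi-squared survival function has the
# closed form exp(-x / 2); this shortcut does not apply to other values of dof.
p_value = math.exp(-chi2 / 2)

# Observed claim rate per province (claims / policies in that province)
claim_rates = [row[1] / sum(row) for row in observed]

print(f"chi2 = {chi2:.4f}, dof = {dof}, p = {p_value:.4f}")
print("claim rates:", [f"{rate:.3f}" for rate in claim_rates])
```

Run on these counts, the sketch should reproduce the reported values of about 0.43 for the statistic and 0.81 for the p-value. The per-province claim rates all fall between roughly 38% and 41%, which is why the statistic is so small.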


In [8]:
t_stat, gender_p_value = gender_analysis(insurance_data)

In [9]:
print("T-statistic (Gender):", t_stat)
print("P-value (Gender):", gender_p_value)

T-statistic (Gender): 12.92963215401046
P-value (Gender): 1.936383802773897e-35


## Gender Analysis Results

### T-statistic
- **Value**: 12.93

### P-value
- **Value**: \(1.94 \times 10^{-35}\)

### Interpretation
The T-statistic of 12.93 indicates a significant difference in the claim rates between genders. The extremely low p-value (approximately \(1.94 \times 10^{-35}\)) is far below the conventional alpha level of 0.05, so this difference is statistically significant.

We therefore reject the null hypothesis, which states that there is no difference in claim rates between genders. The data provide overwhelming evidence that claim rates differ by gender in this simulated dataset. Note that the p-value measures the strength of evidence against the null hypothesis, not the size of the difference; judging how large the gender effect is requires comparing the claim rates of the two groups directly.