## Hypothesis Statement
* Hypothesis Test: Two-Sample t-test

* Feature: MonthlyCharges
* Target: Churn

### Null Hypothesis (H₀)

The mean Monthly Charges of customers who churned is equal to the mean Monthly Charges of customers who did not churn.

### Alternative Hypothesis (H₁)

The mean Monthly Charges of customers who churned differs from the mean Monthly Charges of customers who did not churn.
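Written formally, the two hypotheses compare population means. The symbols below are notation introduced here for illustration, where $\mu_{\text{churn}}$ and $\mu_{\text{no churn}}$ stand for the mean MonthlyCharges of each group:

$$
H_0: \mu_{\text{churn}} = \mu_{\text{no churn}}
\qquad
H_1: \mu_{\text{churn}} \neq \mu_{\text{no churn}}
$$

Because H₁ does not specify a direction, this is a two-sided test.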

## Test Description and Assumptions
* Test Used:
We use an Independent Two-Sample t-test because we are comparing the means of a numeric variable (MonthlyCharges) between two independent groups (Churn = Yes and Churn = No).
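To make the test concrete, here is a minimal sketch of how the classic equal-variance two-sample t statistic is computed. The helper name `pooled_t` is hypothetical, and the two lists hold only the first five values of each group shown further down in this notebook, so the printed number illustrates the mechanics and is not the result of the analysis.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic (equal-variance form) and its degrees of freedom."""
    na, nb = len(a), len(b)
    # Pooled sample variance: weighted average of the two group variances
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    # Difference in means divided by its estimated standard error
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


# First five MonthlyCharges values per group, copied from the head() outputs
churned = [53.85, 70.70, 99.65, 104.80, 103.70]
not_churned = [29.85, 56.95, 42.30, 89.10, 29.75]

t_stat, dof = pooled_t(churned, not_churned)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A large absolute value of t relative to the t distribution with the given degrees of freedom is evidence against H₀.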

### Assumptions

* Type of variable:
The variable we are comparing (MonthlyCharges) should be numeric and continuous, because we are comparing averages.

* Independence:
The two groups (churned and non-churned customers) should be independent, meaning one customer’s value does not affect another’s.

* Distribution / Sample Size:
The data in each group should be approximately normally distributed, or each group should be large enough for the central limit theorem to make the sample means approximately normal.

* Variance condition:
The spread (variance) of the two groups should be roughly similar; otherwise we should use Welch's t-test, which does not assume equal variances.
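The assumptions above can be checked in code before the test is run. The sketch below is one possible approach, assuming SciPy is available: it reports the group sizes, uses Levene's test to compare variances, and then runs the unequal-variance (Welch) version of the t-test. The arrays contain only the first five values of each group shown later in this notebook, so the outputs are illustrative only.

```python
import numpy as np
from scipy import stats

# First five MonthlyCharges values per group (illustrative subset, not the full data)
churned = np.array([53.85, 70.70, 99.65, 104.80, 103.70])
not_churned = np.array([29.85, 56.95, 42.30, 89.10, 29.75])

# Sample size: larger groups make the normality assumption less critical
n_churned, n_not_churned = len(churned), len(not_churned)
print("Group sizes:", n_churned, n_not_churned)

# Levene's test: its null hypothesis is that both groups have equal variance
levene_stat, levene_p = stats.levene(churned, not_churned)
print("Levene p-value:", levene_p)

# Welch's t-test: equal_var=False drops the equal-variance assumption
t_stat, p_value = stats.ttest_ind(churned, not_churned, equal_var=False)
print("Welch t statistic:", t_stat, "p-value:", p_value)
```

Passing `equal_var=False` is a safe default when the variance check is inconclusive, since Welch's test remains valid when the variances happen to be equal.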

In [1]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv("data/clean_data.csv")
df.head(5)

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [4]:
required_col = ['MonthlyCharges','Churn']

df2 = df[required_col].copy()
df2.head(5)

| | MonthlyCharges | Churn |
|---|---|---|
| 0 | 29.85 | No |
| 1 | 56.95 | No |
| 2 | 53.85 | Yes |
| 3 | 42.3 | No |
| 4 | 70.7 | Yes |


In [7]:
df_churn = df2.loc[df2['Churn'] == 'Yes', 'MonthlyCharges']

df_no_churn = df2.loc[df2['Churn'] == 'No', 'MonthlyCharges']

print("Separation of MonthlyCharges into two groups is successful")


Separation of MonthlyCharges into two groups is successful


In [9]:
df_churn.head(5)

2      53.85
4      70.70
5      99.65
8     104.80
13    103.70
Name: MonthlyCharges, dtype: float64

In [10]:
df_no_churn.head(5)

0    29.85
1    56.95
3    42.30
6    89.10
7    29.75
Name: MonthlyCharges, dtype: float64

In [None]:
# Mean MonthlyCharges for the churned group
mean_churn = df_churn.mean()
print("Mean of monthly charges for customers who churned:", mean_churn)
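The cell above computes the mean for the churned group only. As a rough sketch, the count, mean, and standard deviation of both groups can be obtained in a single step with `groupby`. The small `sample` frame below is a stand-in built from the five rows shown in the `df2.head(5)` output, so the same call would be applied to `df2` in the notebook itself.

```python
import pandas as pd

# Stand-in for df2: the five rows displayed by df2.head(5)
sample = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})

# Per-group size, mean, and sample standard deviation of MonthlyCharges
summary = sample.groupby("Churn")["MonthlyCharges"].agg(["count", "mean", "std"])
print(summary)
```

Seeing both means and standard deviations side by side also gives a quick informal read on the variance condition before the formal test is run.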
