## Load the merged dataset

In [1]:
from pathlib import Path
import pandas as pd

# Where this notebook lives
NOTEBOOK_DIR = Path.cwd()

# Path to processed data
DATA_PROCESSED = (NOTEBOOK_DIR / "../data/processed").resolve()

# Load the cleaned/merged dataset
df = pd.read_csv(DATA_PROCESSED / "telco_churn_master.csv", keep_default_na=False)

print("Dataset loaded. Shape:", df.shape)
df.head()

Dataset loaded. Shape: (7043, 48)


,Customer ID,Gender,Age,Under 30,Senior Citizen,Married,Dependents,Number of Dependents,City,Zip Code,...,Total Revenue,Satisfaction Score,Customer Status,Churn Label,Churn Value,Churn Score,CLTV,Churn Category,Churn Reason,Population
0,8779-QRDMV,Male,78,No,Yes,No,No,0,Los Angeles,90022,...,59.65,3,Churned,Yes,1,91,5433,Competitor,Competitor offered more data,68701
1,7495-OOKFY,Female,74,No,Yes,Yes,Yes,1,Los Angeles,90063,...,1024.1,3,Churned,Yes,1,69,5302,Competitor,Competitor made better offer,55668
2,1658-BYGOY,Male,71,No,Yes,No,Yes,3,Los Angeles,90065,...,1910.88,2,Churned,Yes,1,81,3179,Competitor,Competitor made better offer,47534
3,4598-XLKNJ,Female,78,No,Yes,Yes,Yes,1,Inglewood,90303,...,2995.07,2,Churned,Yes,1,88,5337,Dissatisfaction,Limited range of services,27778
4,4846-WHAFZ,Female,80,No,Yes,Yes,Yes,1,Whittier,90602,...,3102.36,2,Churned,Yes,1,67,2793,Price,Extra data charges,26265


## Now we take a look at basic info about the dataset

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 48 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Customer ID                        7043 non-null   object 
 1   Gender                             7043 non-null   object 
 2   Age                                7043 non-null   int64  
 3   Under 30                           7043 non-null   object 
 4   Senior Citizen                     7043 non-null   object 
 5   Married                            7043 non-null   object 
 6   Dependents                         7043 non-null   object 
 7   Number of Dependents               7043 non-null   int64  
 8   City                               7043 non-null   object 
 9   Zip Code                           7043 non-null   int64  
 10  Latitude                           7043 non-null   float64
 11  Longitude                          7043 non-null   float

The data loaded as expected, and `df.info()` reports no null values in the columns shown.
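One caveat: the CSV was read with `keep_default_na=False`, so empty fields arrive as empty strings rather than `NaN`, and the non-null counts from `df.info()` cannot reveal them. The sketch below shows one way to count blank cells explicitly. It runs on a small made-up frame (`toy`, with invented values) rather than the real dataset; the same two lines could be applied to `df`.

```python
import pandas as pd

# Toy stand-in for the real dataset; all values are invented for illustration.
toy = pd.DataFrame({
    "Customer ID": ["A1", "B2", "C3"],
    "Churn Reason": ["Price", "", "  "],
    "Age": [34, 51, 29],
})

# Only text columns can hold empty strings, so restrict the check to them.
text_cols = [c for c in toy.columns if pd.api.types.is_string_dtype(toy[c])]

# Count cells that are empty or whitespace-only in each text column.
blank_counts = toy[text_cols].apply(lambda s: s.str.strip().eq("").sum())
print(blank_counts[blank_counts > 0])
```

Columns that show up here have missing values stored as blank strings, even though `df.info()` lists them as fully non-null.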

### Now we compute the overall churn rate to understand how many people abandon the services.

In [3]:
df["Churn Value"].value_counts(normalize=True)

Churn Value
0    0.73463
1    0.26537
Name: proportion, dtype: float64

Around 27% of customers in the dataset have churned.

### Now we create a helper function for churn rate by any feature

To understand which customer groups are most likely to churn, we calculate the churn rate for each category of a selected feature. By grouping customers and taking the average churn value (0 = stayed, 1 = left), we can quickly see which segments have the highest churn risk. This helps highlight key drivers of churn and guides where to focus further analysis.

In [4]:
def churn_rate_by(column):
    """
    Return the churn rate for each category in a given column, highest first.
    Example: churn_rate_by("Contract")
    """
    return (
        df.groupby(column)["Churn Value"]
        .mean()
        .sort_values(ascending=False)
        .to_frame("Churn Rate")
    )
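A churn rate is easier to judge when the group size is visible next to it, since a rate computed from a handful of customers is much noisier than one computed from thousands. The variant below is a hypothetical extension of the helper, not part of the original notebook: `churn_rate_with_counts` and its `data` and `target` parameters are names introduced here for illustration, and the frame `toy` contains invented rows.

```python
import pandas as pd

def churn_rate_with_counts(data, column, target="Churn Value"):
    """Return churn rate and number of customers per category, highest rate first."""
    return (
        data.groupby(column)[target]
        .agg(["mean", "count"])
        .rename(columns={"mean": "Churn Rate", "count": "Customers"})
        .sort_values("Churn Rate", ascending=False)
    )

# Invented rows, only to show the shape of the result.
toy = pd.DataFrame({
    "Contract": ["Month-to-Month", "Month-to-Month", "One Year", "Two Year"],
    "Churn Value": [1, 0, 0, 0],
})

summary = churn_rate_with_counts(toy, "Contract")
print(summary)
```

Passing the frame as an argument instead of reading the global `df` also makes the helper usable on filtered subsets.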

### 1. Gender

In [9]:
churn_rate_by("Gender")

Gender,Churn Rate
Female,0.269209
Male,0.261603


- Female: 26.9% churn
- Male: 26.2% churn

-> Gender has almost no impact on churn.

The gap is under one percentage point, too small to be meaningful.

**Conclusion**: Gender is not a useful predictor, and retention strategy should not segment customers by gender.
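Whether a gap this small could be sampling noise can be checked with a two-proportion z-test. The sketch below uses only the standard library. The group sizes and churn counts in the usage lines are placeholders chosen to roughly match the rates above, not the actual counts from the dataset, and `two_proportion_z` is a name introduced here.

```python
import math

def two_proportion_z(churned_a, n_a, churned_b, n_b):
    """Two-sided z-test for the difference between two churn proportions."""
    p_a = churned_a / n_a
    p_b = churned_b / n_b
    # Pooled proportion under the null hypothesis of equal churn rates.
    pooled = (churned_a + churned_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Placeholder counts (assumed, not taken from the data): about 26.9% vs. 26.2%.
z, p_value = two_proportion_z(942, 3500, 917, 3500)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

A large p-value would be consistent with the reading that the gender gap is not meaningful; the real counts from `df` should be substituted before drawing any conclusion.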

In [10]:
churn_rate_by("Contract")

Contract,Churn Rate
Month-to-Month,0.458449
One Year,0.107097
Two Year,0.025491


In [11]:
churn_rate_by("Internet Service")

Internet Service,Churn Rate
Yes,0.318289
No,0.07405


In [12]:
churn_rate_by("Online Security")

Online Security,Churn Rate
No,0.313296
Yes,0.146112


In [13]:
churn_rate_by("Payment Method")

Payment Method,Churn Rate
Mailed Check,0.368831
Bank Withdrawal,0.339985
Credit Card,0.14478


In [14]:
churn_rate_by("Offer")

Offer,Churn Rate
Offer E,0.529193
No Offer,0.271086
Offer D,0.267442
Offer C,0.228916
Offer B,0.122573
Offer A,0.067308


### 2. Contract Type

- Month-to-Month: 45.8% churn
- One Year: 10.7% churn
- Two Year: 2.5% churn

-> This is one of the strongest churn drivers among the features examined here.
Customers on month-to-month contracts churn about 18 times as often as those on a two-year contract (45.8% vs. 2.5%).
Why? Short-term contracts make it easy for customers to cancel at any time.

Business implication:
Retention strategy should strongly encourage customers to move from month-to-month to longer contracts (discounts, loyalty programs).
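The roughly 18-fold gap follows directly from the churn rates in the Contract output. A short worked calculation, using the rates exactly as printed there and the two-year contract as the baseline:

```python
# Churn rates copied from the churn_rate_by("Contract") output.
rates = {
    "Month-to-Month": 0.458449,
    "One Year": 0.107097,
    "Two Year": 0.025491,
}

# Express each rate as a multiple of the two-year churn rate.
baseline = rates["Two Year"]
ratios = {contract: rate / baseline for contract, rate in rates.items()}

for contract, ratio in ratios.items():
    print(f"{contract}: {ratio:.1f}x the two-year churn rate")
```

The same comparison shows that one-year customers churn about four times as often as two-year customers, so most of the gap sits between month-to-month and any fixed-term contract.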



### 3. Internet Service

- Has Internet (Yes): 31.8% churn
- No Internet: 7.4% churn

-> Customers who use internet services churn far more than those who only use phone services.

Explanation:
Internet services bring more potential issues:

- higher monthly charges
- speed/quality dissatisfaction
- competition from other ISPs

Implication:
Internet users are the primary population for churn mitigation efforts.



### 4. Online Security

- No Online Security: 31.3% churn
- With Online Security: 14.6% churn

-> Customers with online security churn at less than half the rate of those without it (14.6% vs. 31.3%).

Interpretation:
Online security behaves like a “sticky” add-on.
Customers with more bundled services are less likely to leave.

Business implication:
Bundling more services (security, backup, tech support) can significantly reduce churn.
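The bundling hypothesis can be tested by counting how many add-ons each customer has and computing churn per count. The sketch below is illustrative only: apart from `Online Security`, the add-on column names (`Online Backup`, `Premium Tech Support`) are assumptions about the dataset's schema, and the rows in `toy` are invented.

```python
import pandas as pd

# Assumed add-on columns; adjust to the names actually present in df.
ADDON_COLS = ["Online Security", "Online Backup", "Premium Tech Support"]

# Invented rows, only to demonstrate the computation.
toy = pd.DataFrame({
    "Online Security": ["No", "No", "Yes", "Yes", "No", "Yes"],
    "Online Backup": ["No", "Yes", "Yes", "Yes", "No", "No"],
    "Premium Tech Support": ["No", "No", "No", "Yes", "No", "Yes"],
    "Churn Value": [1, 1, 0, 0, 1, 0],
})

# Number of add-ons each customer subscribes to.
toy["Addon Count"] = toy[ADDON_COLS].eq("Yes").sum(axis=1)

# Churn rate and group size for each add-on count.
by_count = toy.groupby("Addon Count")["Churn Value"].agg(["mean", "count"])
print(by_count)
```

If churn falls steadily as the add-on count rises in the real data, that would support the "sticky bundle" reading; a flat pattern would suggest online security matters for some other reason.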



### 5. Payment Method

- Mailed Check: 36.9% churn
- Bank Withdrawal: 34.0% churn
- Credit Card: 14.5% churn

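-> Customers who pay by credit card churn far less than those who pay by mailed check or bank withdrawal.

Why?
One possible explanation is that recurring card billing reduces cancellation friction.
Bank withdrawal is often automatic as well, though, and it churns nearly as much as mailed check (34.0% vs. 36.9%), so automatic billing alone may not explain the gap.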

Business implication:
Promote automatic payments (discounts, incentives).
These customers are more stable.



### 6. Offer

- Offer E: 52.9% churn
- No Offer: 27.1% churn
- Offer D: 26.7% churn
- Offer C: 22.9% churn
- Offer B: 12.3% churn
- Offer A: 6.7% churn

-> Offer E customers have extremely high churn.
This is a major red flag.

Possible reasons:

- Offer E may attract "deal-seekers" who churn as soon as the promotion ends
- Offer E might be targeted at customers with existing issues
- Offer E might expire earlier or be less valuable

Business implication:
Investigate Offer E in detail.
It may be poorly designed or incentivizing the wrong behavior.
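A reasonable first step in that investigation is to check whether Offer E's churn is confounded with contract type, since month-to-month customers churn heavily regardless of offer. The sketch below breaks churn down by offer and contract together. The rows in `toy` are invented; on the real data the same `groupby` would be applied to `df`.

```python
import pandas as pd

# Invented rows, only to demonstrate the breakdown.
toy = pd.DataFrame({
    "Offer": ["Offer E", "Offer E", "Offer E", "Offer A", "Offer A", "No Offer"],
    "Contract": ["Month-to-Month", "Month-to-Month", "One Year",
                 "Two Year", "Two Year", "Month-to-Month"],
    "Churn Value": [1, 1, 0, 0, 0, 1],
})

# Churn rate and group size for every (offer, contract) combination.
breakdown = (
    toy.groupby(["Offer", "Contract"])["Churn Value"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "Churn Rate", "count": "Customers"})
)
print(breakdown)
```

If Offer E customers are mostly on month-to-month contracts, part of the 52.9% churn may reflect contract type rather than the offer itself.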