# **🧠 Scenario:**

You're hired as a Data Analyst by a subscription-based e-commerce platform. Your task is to analyze customer behavior and predict which customers are likely to churn (i.e., not renew their subscription next month).

# **📊 PART 1 – Data Understanding & Exploration**

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
df = pd.read_csv('/kaggle/input/customer-churn-dataset/customer_churn_dataset-training-master.csv')
```

```python
df.head()
```

| | CustomerID | Age | Gender | Tenure | Usage Frequency | Support Calls | Payment Delay | Subscription Type | Contract Length | Total Spend | Last Interaction | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.0 | 30.0 | Female | 39.0 | 14.0 | 5.0 | 18.0 | Standard | Annual | 932.0 | 17.0 | 1.0 |
| 1 | 3.0 | 65.0 | Female | 49.0 | 1.0 | 10.0 | 8.0 | Basic | Monthly | 557.0 | 6.0 | 1.0 |
| 2 | 4.0 | 55.0 | Female | 14.0 | 4.0 | 6.0 | 18.0 | Basic | Quarterly | 185.0 | 3.0 | 1.0 |
| 3 | 5.0 | 58.0 | Male | 38.0 | 21.0 | 7.0 | 7.0 | Standard | Monthly | 396.0 | 29.0 | 1.0 |
| 4 | 6.0 | 23.0 | Male | 32.0 | 20.0 | 5.0 | 8.0 | Basic | Monthly | 617.0 | 20.0 | 1.0 |


List all columns with their data types and count missing values.

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440833 entries, 0 to 440832
Data columns (total 12 columns):
 #   Column             Non-Null Count   Dtype  
---  ------             --------------   -----  
 0   CustomerID         440832 non-null  float64
 1   Age                440832 non-null  float64
 2   Gender             440832 non-null  object 
 3   Tenure             440832 non-null  float64
 4   Usage Frequency    440832 non-null  float64
 5   Support Calls      440832 non-null  float64
 6   Payment Delay      440832 non-null  float64
 7   Subscription Type  440832 non-null  object 
 8   Contract Length    440832 non-null  object 
 9   Total Spend        440832 non-null  float64
 10  Last Interaction   440832 non-null  float64
 11  Churn              440832 non-null  float64
dtypes: float64(9), object(3)
memory usage: 40.4+ MB
```
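`df.info()` shows dtypes and non-null counts, but the missing count has to be inferred by subtraction. A minimal sketch that puts dtype, missing count and missing percentage in one table; the helper name `column_summary` and the tiny demo frame are hypothetical, used only to illustrate the calls.

```python
import pandas as pd

def column_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: its dtype, missing-value count and missing percentage."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isnull().sum(),
        "missing_pct": df.isnull().mean() * 100,
    })

# Hypothetical demo frame, not the Kaggle data
demo = pd.DataFrame({
    "Age": [30.0, None, 55.0],
    "Gender": ["Female", "Male", None],
})
summary = column_summary(demo)
print(summary)
```

On the real data this would be called as `column_summary(df)`.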


Check for null values.

```python
df.isnull().sum()
```

```
CustomerID           1
Age                  1
Gender               1
Tenure               1
Usage Frequency      1
Support Calls        1
Payment Delay        1
Subscription Type    1
Contract Length      1
Total Spend          1
Last Interaction     1
Churn                1
dtype: int64
```

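Drop the rows with null values. Every column is missing exactly one value (440,833 entries vs. 440,832 non-null), and 440,832 customers remain afterwards, so the nulls all come from a single empty row.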

```python
df = df.dropna()
```
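Before dropping, it can be worth confirming that the missing values really sit in one fully empty row rather than being spread across several rows. A minimal sketch; `rows_with_nulls` and the demo frame are hypothetical.

```python
import numpy as np
import pandas as pd

def rows_with_nulls(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows that contain at least one missing value."""
    return df[df.isnull().any(axis=1)]

# Hypothetical frame whose last row is entirely empty
demo = pd.DataFrame({
    "CustomerID": [2.0, 3.0, np.nan],
    "Churn": [1.0, 0.0, np.nan],
})
bad = rows_with_nulls(demo)
# True only if every flagged row is missing all of its values
all_empty = bool(bad.isnull().all(axis=1).all())
print(len(bad), all_empty)
cleaned = demo.dropna()
```

If `len(bad)` is 1 and `all_empty` is True, `dropna()` discards nothing but the blank record.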

Check for duplicate rows.

```python
df.duplicated().sum()
```

```
0
```

How many customers are in the dataset?

```python
df['CustomerID'].nunique()
```

```
440832
```

```python
df['CustomerID'].count()
```

```
440832
```
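Because `nunique()` and `count()` agree, each CustomerID appears exactly once, so the column can serve as a row key. The same check can be written as one boolean; a minimal sketch with a hypothetical helper name.

```python
import pandas as pd

def is_unique_key(df: pd.DataFrame, column: str) -> bool:
    """True when the column has no missing values and no repeated values."""
    col = df[column]
    return bool(col.notna().all() and col.is_unique)

demo = pd.DataFrame({"CustomerID": [2.0, 3.0, 4.0]})
dup = pd.DataFrame({"CustomerID": [2.0, 2.0, 4.0]})
print(is_unique_key(demo, "CustomerID"))  # True
print(is_unique_key(dup, "CustomerID"))   # False
```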

Find the average age, tenure, and total spend of customers.

In [32]:
a=df['Age'].mean()
b=df['Tenure'].mean()
c=df['Total Spend'].mean()
print('Age Average',a)
print('Tenure Average',b)
print('Total Spend',c)

Age Average 39.37315349157956
Tenure Average 31.25633574695122
Total Spend 631.6162227787457
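The three averages can also be computed in a single call by selecting the columns and calling `.mean()` once. A minimal sketch; the helper name is hypothetical and the demo uses the first three rows of the `df.head()` output above purely as sample input.

```python
import pandas as pd

def average_profile(df: pd.DataFrame) -> pd.Series:
    """Mean of Age, Tenure and Total Spend in one call, rounded for display."""
    return df[["Age", "Tenure", "Total Spend"]].mean().round(2)

# First three rows of df.head(), used only as a small demo
demo = pd.DataFrame({
    "Age": [30.0, 65.0, 55.0],
    "Tenure": [39.0, 49.0, 14.0],
    "Total Spend": [932.0, 557.0, 185.0],
})
profile = average_profile(demo)
print(profile)
# Age 50.0, Tenure 34.0, Total Spend 558.0
```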



What’s the distribution of Gender and Subscription Type?

```python
df['Gender'].value_counts()
```

```
Gender
Male      250252
Female    190580
Name: count, dtype: int64
```

```python
df['Gender'].value_counts(normalize=True) * 100
```

```
Gender
Male      56.768111
Female    43.231889
Name: proportion, dtype: float64
```

```python
df['Subscription Type'].value_counts()
```

```
Subscription Type
Standard    149128
Premium     148678
Basic       143026
Name: count, dtype: int64
```

```python
df['Subscription Type'].value_counts(normalize=True) * 100
```

```
Subscription Type
Standard    33.828760
Premium     33.726680
Basic       32.444559
Name: proportion, dtype: float64
```
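Counts and percentages are computed in separate cells above; a small helper can show them side by side for any categorical column. A minimal sketch; `distribution` and the demo series are hypothetical.

```python
import pandas as pd

def distribution(series: pd.Series) -> pd.DataFrame:
    """Counts and percentages of each category, side by side."""
    counts = series.value_counts()
    return pd.DataFrame({
        "count": counts,
        "percent": (counts / counts.sum() * 100).round(2),
    })

demo = pd.Series(["Male", "Female", "Male", "Male"], name="Gender")
table = distribution(demo)
print(table)
```

On the real data: `distribution(df['Gender'])` and `distribution(df['Subscription Type'])`.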

Count how many customers have churned vs not churned.

```python
df['Churn'].value_counts(normalize=True) * 100
```

```
Churn
1.0    56.71072
0.0    43.28928
Name: proportion, dtype: float64
```
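The cell above reports percentages; since the question asks for counts, `value_counts()` without `normalize=True` gives the raw numbers. A minimal sketch that also maps the 0/1 codes to readable labels (the label text is an assumption for display only).

```python
import pandas as pd

def churn_counts(df: pd.DataFrame) -> pd.Series:
    """Absolute number of churned (1) and retained (0) customers."""
    labels = {0.0: "Not churned", 1.0: "Churned"}  # display labels, assumed
    return df["Churn"].map(labels).value_counts()

demo = pd.DataFrame({"Churn": [1.0, 1.0, 0.0, 1.0, 0.0]})
counts = churn_counts(demo)
print(counts)
```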

Show the top 5 customers with the highest Total Spend.

```python
df.sort_values(by='Total Spend', ascending=False).head(5)
```

| | CustomerID | Age | Gender | Tenure | Usage Frequency | Support Calls | Payment Delay | Subscription Type | Contract Length | Total Spend | Last Interaction | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 104922 | 108103.0 | 56.0 | Female | 23.0 | 10.0 | 6.0 | 10.0 | Basic | Monthly | 1000.0 | 10.0 | 1.0 |
| 40795 | 41188.0 | 48.0 | Female | 49.0 | 3.0 | 5.0 | 4.0 | Basic | Monthly | 1000.0 | 21.0 | 1.0 |
| 115293 | 118933.0 | 49.0 | Female | 56.0 | 3.0 | 6.0 | 15.0 | Standard | Monthly | 1000.0 | 3.0 | 1.0 |
| 39805 | 40160.0 | 59.0 | Male | 35.0 | 18.0 | 5.0 | 19.0 | Standard | Annual | 1000.0 | 22.0 | 1.0 |
| 134907 | 139388.0 | 18.0 | Male | 7.0 | 16.0 | 8.0 | 12.0 | Basic | Annual | 1000.0 | 1.0 | 1.0 |
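All five rows above have a Total Spend of 1000.0, so which five customers appear depends on how the sort orders tied rows. `DataFrame.nlargest` with `keep="all"` returns every row tied with the n-th value instead of an arbitrary five. A minimal sketch; the helper name and demo values are hypothetical.

```python
import pandas as pd

def top_spenders(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Top-n rows by Total Spend, keeping every row tied with the n-th value."""
    return df.nlargest(n, "Total Spend", keep="all")

demo = pd.DataFrame({
    "CustomerID": [1.0, 2.0, 3.0, 4.0],
    "Total Spend": [1000.0, 1000.0, 1000.0, 400.0],
})
print(top_spenders(demo, n=2))  # all three rows at 1000.0 are returned
```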


# **🟡 PART 2 – Exploratory Data Analysis (EDA)**

What is the correlation between Tenure, Usage Frequency, and Total Spend?

```python
df[['Tenure', 'Usage Frequency', 'Total Spend']].corr()
```

| | Tenure | Usage Frequency | Total Spend |
|---|---|---|---|
| Tenure | 1.0 | -0.0268 | 0.019006 |
| Usage Frequency | -0.0268 | 1.0 | 0.018631 |
| Total Spend | 0.019006 | 0.018631 | 1.0 |
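The Pearson coefficients above are all within about ±0.03, i.e. essentially no linear relationship. A rank-based Spearman correlation is a quick check for relationships that are monotonic but not linear. A minimal sketch; the helper name and the demo data (y = x³) are hypothetical.

```python
import pandas as pd

def correlations(df: pd.DataFrame, cols: list) -> dict:
    """Pearson (linear) and Spearman (rank-based) correlation matrices."""
    subset = df[cols]
    return {
        "pearson": subset.corr(method="pearson"),
        "spearman": subset.corr(method="spearman"),
    }

# y rises monotonically but not linearly with x
demo = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1, 8, 27, 64, 125]})
result = correlations(demo, ["x", "y"])
print(result["pearson"].round(3))
print(result["spearman"].round(3))
```

Here Spearman is 1.0 while Pearson is below 1, because ranks ignore the curvature. On the real data: `correlations(df, ['Tenure', 'Usage Frequency', 'Total Spend'])`.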


Compare average Payment Delay for churned vs non-churned customers.

```python
df.groupby('Churn')['Payment Delay'].mean()
```

```
Churn
0.0    10.015500
1.0    15.217729
Name: Payment Delay, dtype: float64
```
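A mean can be pulled up by a few very long delays, so showing the median next to it is a cheap robustness check. A minimal sketch using `groupby(...).agg`; the helper name and demo values are hypothetical.

```python
import pandas as pd

def delay_by_churn(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and median Payment Delay for each Churn group."""
    return df.groupby("Churn")["Payment Delay"].agg(["mean", "median"])

demo = pd.DataFrame({
    "Churn": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    "Payment Delay": [2.0, 4.0, 30.0, 10.0, 18.0, 20.0],
})
stats = delay_by_churn(demo)
print(stats)
```

In the demo, one 30-day delay lifts the non-churned mean to 12.0 while its median stays at 4.0.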

Which Subscription Type has the highest churn rate?

```python
df.groupby('Subscription Type')['Churn'].mean().sort_values(ascending=False)
```

```
Subscription Type
Basic       0.581782
Standard    0.560700
Premium     0.559417
Name: Churn, dtype: float64
```
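Because Churn is coded 0/1, its group mean is the churn rate. Basic is highest, but the three rates differ by only about 2 percentage points (58.2% vs. 55.9%), so it helps to show the rate as a percentage alongside the group size. A minimal sketch using named aggregation; the helper name and demo values are hypothetical.

```python
import pandas as pd

def churn_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Churn rate (%) and customer count per category, highest rate first."""
    out = df.groupby(column)["Churn"].agg(rate="mean", customers="count")
    out["rate"] = (out["rate"] * 100).round(1)
    return out.sort_values("rate", ascending=False)

demo = pd.DataFrame({
    "Subscription Type": ["Basic", "Basic", "Basic", "Premium", "Premium"],
    "Churn": [1.0, 1.0, 0.0, 1.0, 0.0],
})
out = churn_rate_by(demo, "Subscription Type")
print(out)
```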

Do customers with more Support Calls tend to churn more?

```python
df.groupby('Churn')['Support Calls'].mean()
```

```
Churn
0.0    1.586418
1.0    5.144861
Name: Support Calls, dtype: float64
```
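Comparing average calls between churned and retained customers answers the question from one direction. The other direction is the churn rate at each number of support calls, which shows directly whether churn rises as calls increase. A minimal sketch; the helper name and demo values are hypothetical.

```python
import pandas as pd

def churn_rate_by_calls(df: pd.DataFrame) -> pd.Series:
    """Share of customers who churned at each Support Calls value."""
    return df.groupby("Support Calls")["Churn"].mean()

demo = pd.DataFrame({
    "Support Calls": [0.0, 0.0, 5.0, 5.0, 10.0],
    "Churn": [0.0, 1.0, 1.0, 1.0, 1.0],
})
rates = churn_rate_by_calls(demo)
print(rates)
```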

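Find the average Contract Length by Subscription Type. Contract Length is stored as text (Monthly, Quarterly, Annual), so calling `.mean()` on it directly fails; convert each label to a number of months first.

```python
# Contract Length holds text labels; map to months (assumed: Monthly=1, Quarterly=3, Annual=12) before averaging
months = df['Contract Length'].map({'Monthly': 1, 'Quarterly': 3, 'Annual': 12})
months.groupby(df['Subscription Type']).mean()
```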

What are the top 3 age groups most likely to churn?
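The notebook does not define age groups, so this sketch assumes five bands (≤25, 26-35, 36-45, 46-55, 56+) built with `pd.cut`, then ranks them by churn rate. The band edges, labels and helper name are assumptions; with different bins the ranking can change, and each rate should be read together with the band's size.

```python
import pandas as pd

# Assumed age bands (right-inclusive); not defined in the original notebook
AGE_BINS = [0, 25, 35, 45, 55, 120]
AGE_LABELS = ["<=25", "26-35", "36-45", "46-55", "56+"]

def top_churn_age_groups(df: pd.DataFrame, k: int = 3) -> pd.Series:
    """Churn rate per age band, returning the k highest."""
    bands = pd.cut(df["Age"], bins=AGE_BINS, labels=AGE_LABELS)
    rates = df["Churn"].groupby(bands, observed=True).mean()
    return rates.sort_values(ascending=False).head(k)

# Hypothetical demo data
demo = pd.DataFrame({
    "Age":   [20, 22, 30, 32, 40, 50, 52, 54, 60, 61, 62, 63],
    "Churn": [1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0],
})
top = top_churn_age_groups(demo)
print(top)
```

On the real data: `top_churn_age_groups(df)`.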

# **🔧 PART 3 – Feature Engineering**
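One common first step before modeling is turning the three text columns (Gender, Subscription Type, Contract Length) into numbers. A minimal sketch using one-hot encoding with `pd.get_dummies`; dropping CustomerID as a non-predictive ID and keeping Churn aside as the target are assumptions, not the author's method. The demo uses a few columns of the first four `df.head()` rows as sample input.

```python
import pandas as pd

def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Numeric feature matrix: drop the ID and target, one-hot encode text columns."""
    features = df.drop(columns=["CustomerID", "Churn"])
    return pd.get_dummies(
        features,
        columns=["Gender", "Subscription Type", "Contract Length"],
        drop_first=True,  # drop the first (alphabetical) level of each column
        dtype=int,
    )

demo = pd.DataFrame({
    "CustomerID": [2.0, 3.0, 4.0, 5.0],
    "Age": [30.0, 65.0, 55.0, 58.0],
    "Gender": ["Female", "Female", "Female", "Male"],
    "Subscription Type": ["Standard", "Basic", "Basic", "Standard"],
    "Contract Length": ["Annual", "Monthly", "Quarterly", "Monthly"],
    "Churn": [1.0, 1.0, 1.0, 1.0],
})
X = encode_features(demo)
y = demo["Churn"]
print(X)
```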

# **📈 PART 4 – Modeling: Predicting Churn**