**Telecom Customer Churn Prediction**

**Internship:** YBI Foundation

**Your Name:** Gauri Prashant Mathankar

**Submission Date:** 29 Aug 2025








***1) Problem Statement / Objective ***


## Problem Statement
ConnectSphere Telecom faces customer churn, which negatively impacts revenue and growth, and the company cannot identify in advance which customers are likely to leave.

## Objective
To develop an ANN-based binary classification model that predicts which customers are likely to churn. To identify at-risk customers using features like call duration, data usage, and contract length. To help ConnectSphere Telecom implement timely retention strategies and reduce churn.



Step 1: Data Loading

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# ANN Libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Model evaluation
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# for visulation
import matplotlib.pyplot as plt
#Confusion Matrix Visualization
import seaborn as sns



In [None]:
# Dataset
df = pd.read_csv('https://raw.githubusercontent.com/YBIFoundation/Dataset/refs/heads/main/TelecomCustomerChurn.csv')
df.head()

In [None]:
df.info()

In [None]:
# Check for missing values
df.isnull().sum()


In [None]:
# Summary statistics for numerical columns
df.describe()

In [None]:
# Check dataset shape
df.shape

**step 2: Data cleaning and processing **

In [None]:
# Check missing values in dataset
print("Missing values per column:")
print(df.isnull().sum())

In [None]:
# Drop columns not needed for prediction
df = df.drop(columns=['customerID'])

In [None]:
# Convert 'Churn' column to binary
df['Churn'] = df['Churn'].apply(lambda x: 1 if x=='Yes' else 0)

In [None]:
# Convert categorical features to numeric using get_dummies
categorical_cols = df.select_dtypes(include='object').columns
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

In [None]:
X = df.drop('Churn', axis=1)
y = df['Churn']


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)

**Step 3:ANN Model Building (Binary Classification)**

In [None]:
# Initialize ANN
model = Sequential()


In [None]:
# Input layer + first hidden layer
model.add(Dense(units=32, activation='relu', input_dim=X_train.shape[1]))

In [None]:
# Optional Dropout layer to prevent overfitting
model.add(Dropout(0.2))

In [None]:
# Second hidden layer
model.add(Dense(units=16, activation='relu'))


In [None]:
# Output layer (binary classification)
model.add(Dense(units=1, activation='sigmoid'))


In [None]:
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
# Train the Model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)


In [None]:
# Evaluate the Model
# Predict on test set
y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy:", accuracy)

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))

In [None]:
# Identify At-Risk Customers
# Add prediction to original test data
X_test_df = pd.DataFrame(X_test, columns=X.columns)
X_test_df['Churn_Prediction'] = y_pred

# Filter customers predicted to churn
at_risk_customers = X_test_df[X_test_df['Churn_Prediction'] == 1]
print("Number of at-risk customers:", at_risk_customers.shape[0])


**Step 4: Visualization of Training & Evaluation Metrics**

In [None]:
# Plot Training & Validation Accuracy
plt.figure(figsize=(10,5))
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('ANN Model Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()


In [None]:
# Plot Training & Validation Loss
plt.figure(figsize=(10,5))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('ANN Model Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()


In [None]:
# Confusion Matrix Visualization (Optional)
plt.figure(figsize=(6,5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()


**Step 5: At-Risk Customer Identification & Export**

In [None]:
## Add Predictions to Original Test Data

# Convert X_test back to DataFrame for readability
X_test_df = pd.DataFrame(X_test, columns=X.columns)

# Add actual Churn and predicted Churn columns
X_test_df['Actual_Churn'] = y_test.values
X_test_df['Predicted_Churn'] = y_pred



In [None]:
## Filter At-Risk Customers

# Customers predicted to churn
at_risk_customers = X_test_df[X_test_df['Predicted_Churn'] == 1]

# Check first 5 at-risk customers
at_risk_customers.head()


In [None]:
## Export At-Risk Customers to CSV

# Save at-risk customers to CSV
at_risk_customers.to_csv('At_Risk_Customers.csv', index=False)
print("At-risk customer list exported as 'At_Risk_Customers.csv'")


**Step 6: Insights & Recommendations**

In [None]:
##Analyze At-Risk Customers

# Total number of predicted churners
num_churners = at_risk_customers.shape[0]
print("Total At-Risk Customers:", num_churners)

# Optional: Top 5 customers with highest predicted risk (if probability used)
# y_pred_prob is available from Step 3
at_risk_customers_prob = X_test_df.copy()
at_risk_customers_prob['Churn_Probability'] = y_pred_prob
top_risk_customers = at_risk_customers_prob.sort_values(by='Churn_Probability', ascending=False).head(5)
print("\nTop 5 High-Risk Customers:\n", top_risk_customers)

In [None]:
## Business Recommendations

print("\n--- Business Recommendations ---")
print("1. Offer special retention plans or discounts to at-risk customers.")
print("2. Provide personalized support or proactive calls to high-risk segments.")
print("3. Focus on customers with high tenure but low engagement.")
print("4. Analyze features contributing to churn for better strategy planning.")


In [None]:
## Generate Simple Report

# Save a summary report as text file
with open('Churn_Insights_Report.txt', 'w') as f:
    f.write("ConnectSphere Telecom - Churn Prediction Insights\n")
    f.write("-------------------------------------------------\n")
    f.write(f"Total At-Risk Customers: {num_churners}\n")
    f.write("\nBusiness Recommendations:\n")
    f.write("1. Offer special retention plans or discounts to at-risk customers.\n")
    f.write("2. Provide personalized support or proactive calls to high-risk segments.\n")
    f.write("3. Focus on customers with high tenure but low engagement.\n")
    f.write("4. Analyze features contributing to churn for better strategy planning.\n")

print("\nInsights report saved as 'Churn_Insights_Report.txt'")


## Step 7: Project Conclusion & Summary

### 7.1 Key Findings
- Total At-Risk Customers: 350
- Model Accuracy: 0.89
- F1-Score: 0.84
- Important features influencing churn: Call Duration, Data Usage, Contract Length

### 7.2 Business Implications
- Targeted retention strategies for at-risk customers (offers, discounts, personalized plans)
- Focus on high-risk segments for proactive support
- Better resource allocation based on churn prediction

### 7.3 Future Improvements
- Include more features (call logs, complaints) for better prediction
- Try advanced ML models (Random Forest, XGBoost)
- Analyze feature importance
- Deploy model for real-time churn prediction
