# Customer Churn Prediction using Artificial Neural Networks

This notebook demonstrates how to predict customer churn with an artificial neural network (ANN). We'll use a synthetic dataset of customer attributes to build a predictive model that identifies customers who are likely to leave a service.

In [None]:
# Import necessary libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O
import matplotlib.pyplot as plt # for visualization
import seaborn as sns # for statistical data visualization

# For model building and evaluation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report, roc_curve, auc

# For deep learning
import tensorflow as tf
from tensorflow import keras
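# Import from tensorflow.keras so every Keras object comes from the same installation
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping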

# For handling imbalanced data
from imblearn.over_sampling import SMOTE

# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

## Data Loading and Exploration

For this project, we'll use a synthetic customer churn dataset that mimics real-world banking customer data.

In [None]:
# Create a synthetic customer churn dataset
# In a real scenario, you would load your data from a file
# For example: df = pd.read_csv('/path/to/churn_data.csv')

# Generate synthetic data
n_samples = 10000

# Customer ID
customer_id = np.arange(1, n_samples + 1)

# Demographics
age = np.random.normal(40, 10, n_samples).round().astype(int)
age = np.clip(age, 18, 95)  # Clip to reasonable age range

gender = np.random.choice(['Male', 'Female'], n_samples)

# Geographic information
country = np.random.choice(['France', 'Spain', 'Germany'], n_samples, p=[0.5, 0.3, 0.2])

# Account information
credit_score = np.random.normal(650, 100, n_samples).round().astype(int)
credit_score = np.clip(credit_score, 300, 850)  # Clip to reasonable credit score range

tenure = np.random.poisson(5, n_samples)  # Years with the bank
balance = np.random.exponential(50000, n_samples).round(2)  # Account balance
num_products = np.random.choice([1, 2, 3, 4], n_samples, p=[0.5, 0.3, 0.15, 0.05])  # Number of bank products
has_credit_card = np.random.choice([0, 1], n_samples, p=[0.3, 0.7])  # Has a credit card
is_active_member = np.random.choice([0, 1], n_samples, p=[0.2, 0.8])  # Active member
estimated_salary = np.clip(np.random.normal(70000, 30000, n_samples), 0, None).round(2)  # Estimated salary, floored at 0

# Compute each customer's churn probability from tenure, product count, activity, and balance
churn_prob = (
    0.2
    - 0.01 * tenure
    + 0.1 * (num_products > 2).astype(int)
    - 0.05 * is_active_member
    + 0.1 * (balance < 10000).astype(int)
)
churn_prob = np.clip(churn_prob, 0.05, 0.95)  # Ensure probabilities are between 0.05 and 0.95

# Generate churn based on calculated probabilities
churn = np.random.binomial(1, churn_prob)

# Create DataFrame
data = {
    'CustomerId': customer_id,
    'CreditScore': credit_score,
    'Gender': gender,
    'Age': age,
    'Tenure': tenure,
    'Balance': balance,
    'NumOfProducts': num_products,
    'HasCrCard': has_credit_card,
    'IsActiveMember': is_active_member,
    'EstimatedSalary': estimated_salary,
    'Geography': country,
    'Exited': churn
}

df = pd.DataFrame(data)

# Display the first few rows
df.head()
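
To make the churn formula above concrete, the sketch below recomputes the probability for two hand-picked customers. The helper name `churn_probability` and the two example customers are illustrative assumptions; the coefficients and the clipping bounds are copied from the data-generation cell.

```python
import numpy as np

def churn_probability(tenure, num_products, is_active_member, balance):
    """Recompute the synthetic churn probability for a single customer."""
    prob = (
        0.2
        - 0.01 * tenure
        + 0.1 * int(num_products > 2)
        - 0.05 * is_active_member
        + 0.1 * int(balance < 10000)
    )
    # Same clipping as in the data-generation cell
    return float(np.clip(prob, 0.05, 0.95))

# A short-tenure, inactive customer holding three products (hypothetical)
risky = churn_probability(tenure=2, num_products=3, is_active_member=0, balance=25000)
# A long-tenure, active customer with one product (hypothetical)
loyal = churn_probability(tenure=12, num_products=1, is_active_member=1, balance=80000)

print(f"risky: {risky:.2f}, loyal: {loyal:.2f}")  # risky: 0.28, loyal: 0.05
```

The second customer's raw value is 0.03, so the clip raises it to the 0.05 floor. Age, gender, geography, credit score, and salary do not enter the formula, so the model should find little signal in them.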

In [None]:
# Check the shape of the dataset
df.shape

In [None]:
# Get information about the dataset
df.info()

In [None]:
# Statistical summary of the dataset
df.describe()

In [None]:
# Check for missing values
df.isnull().sum()

In [None]:
# Check for duplicates
df.duplicated().sum()

In [None]:
# Check the distribution of the target variable (churn)
plt.figure(figsize=(8, 6))
sns.countplot(x='Exited', data=df)
plt.title('Distribution of Customer Churn')
plt.xlabel('Exited (0 = No, 1 = Yes)')
plt.ylabel('Count')
plt.show()

# Print the percentage of each class
churn_percentage = df['Exited'].value_counts(normalize=True) * 100
print(f"Percentage of customers who stayed: {churn_percentage[0]:.2f}%")
print(f"Percentage of customers who churned: {churn_percentage[1]:.2f}%")
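
The class percentages also set a baseline for accuracy: a model that always predicts the majority class is already right for every majority-class customer. The sketch below computes that baseline on a toy label vector; the 80/20 split is an illustrative assumption, not this dataset's actual distribution.

```python
import numpy as np

def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent class."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    return counts.max() / labels.size

# Toy labels: 80 customers stayed, 20 churned (illustrative)
toy_labels = np.array([0] * 80 + [1] * 20)
baseline = majority_baseline_accuracy(toy_labels)
print(f"Majority-class baseline accuracy: {baseline:.2f}")  # 0.80
```

Running `majority_baseline_accuracy(df['Exited'])` on the notebook's data gives the figure that the trained model's accuracy should be compared against.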

## Exploratory Data Analysis

In [None]:
# Correlation heatmap for numerical features
# Select by 'number' rather than by exact dtype: integer columns can be int32 on some platforms
numerical_features = df.select_dtypes(include='number').drop('CustomerId', axis=1)

plt.figure(figsize=(12, 10))
sns.heatmap(numerical_features.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Numerical Features')
plt.show()

In [None]:
# Age distribution by churn status
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='Age', hue='Exited', multiple='stack', bins=20)
plt.title('Age Distribution by Churn Status')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()

In [None]:
# Balance distribution by churn status
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='Balance', hue='Exited', multiple='stack', bins=20)
plt.title('Balance Distribution by Churn Status')
plt.xlabel('Balance')
plt.ylabel('Count')
plt.show()

In [None]:
# Churn rate by geography
plt.figure(figsize=(10, 6))
sns.barplot(x='Geography', y='Exited', data=df, estimator=np.mean)
plt.title('Churn Rate by Geography')
plt.xlabel('Country')
plt.ylabel('Churn Rate')
plt.show()

In [None]:
# Churn rate by number of products
plt.figure(figsize=(10, 6))
sns.barplot(x='NumOfProducts', y='Exited', data=df, estimator=np.mean)
plt.title('Churn Rate by Number of Products')
plt.xlabel('Number of Products')
plt.ylabel('Churn Rate')
plt.show()

In [None]:
# Churn rate by active membership status
plt.figure(figsize=(10, 6))
sns.barplot(x='IsActiveMember', y='Exited', data=df, estimator=np.mean)
plt.title('Churn Rate by Active Membership Status')
plt.xlabel('Is Active Member (0 = No, 1 = Yes)')
plt.ylabel('Churn Rate')
plt.show()

In [None]:
# Churn rate by gender
plt.figure(figsize=(10, 6))
sns.barplot(x='Gender', y='Exited', data=df, estimator=np.mean)
plt.title('Churn Rate by Gender')
plt.xlabel('Gender')
plt.ylabel('Churn Rate')
plt.show()

In [None]:
# Tenure vs Churn
plt.figure(figsize=(10, 6))
sns.boxplot(x='Exited', y='Tenure', data=df)
plt.title('Tenure by Churn Status')
plt.xlabel('Exited (0 = No, 1 = Yes)')
plt.ylabel('Tenure (Years)')
plt.show()

## Data Preprocessing

In [None]:
# Drop customer ID as it's not relevant for prediction
df_model = df.drop('CustomerId', axis=1)

In [None]:
# Separate features and target variable
X = df_model.drop('Exited', axis=1)
y = df_model['Exited']

In [None]:
# Identify categorical and numerical columns
# (selecting by 'number' avoids missing int32 columns or non-object string dtypes)
categorical_cols = X.select_dtypes(exclude='number').columns.tolist()
numerical_cols = X.select_dtypes(include='number').columns.tolist()

print(f"Categorical columns: {categorical_cols}")
print(f"Numerical columns: {numerical_cols}")

In [None]:
# Create preprocessing pipelines for both numerical and categorical data
numerical_transformer = StandardScaler()
categorical_transformer = OneHotEncoder(drop='first')

# Combine preprocessing steps
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_cols),
        ('cat', categorical_transformer, categorical_cols)
    ])

# Split the raw data first so that the preprocessing statistics come from the training set only
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

In [None]:
# Fit the preprocessor on the training set, then apply the same fitted transformation to the test set
X_train = preprocessor.fit_transform(X_train_raw)
X_test = preprocessor.transform(X_test_raw)

# Get feature names after one-hot encoding
cat_feature_names = preprocessor.named_transformers_['cat'].get_feature_names_out(categorical_cols)
feature_names = numerical_cols + list(cat_feature_names)
print(f"Features after preprocessing: {feature_names}")

In [None]:
# Check for class imbalance
print(f"Training set class distribution:\n{pd.Series(y_train).value_counts(normalize=True)}")
print(f"\nTesting set class distribution:\n{pd.Series(y_test).value_counts(normalize=True)}")

In [None]:
# Apply SMOTE to handle class imbalance
smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

# Check the new class distribution
print(f"After SMOTE, training set class distribution:\n{pd.Series(y_train_resampled).value_counts(normalize=True)}")
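
SMOTE balances the classes by creating synthetic minority samples rather than duplicating existing ones. The sketch below shows only the core interpolation step in NumPy, as a simplified illustration and not imbalanced-learn's implementation; the two points are made-up values, and the nearest-neighbour search that SMOTE uses to pick `x_neighbor` is skipped.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two minority-class points assumed to be neighbours in feature space (illustrative)
x_i = np.array([1.0, 2.0])
x_neighbor = np.array([3.0, 6.0])

# Draw a random position on the line segment between them
lam = rng.uniform(0.0, 1.0)
x_synthetic = x_i + lam * (x_neighbor - x_i)

print(f"lambda = {lam:.3f}, synthetic sample = {x_synthetic.round(3)}")
```

Because every synthetic point lies between two real minority samples, SMOTE should be applied to the training set only, as in the cell above, so that the test set contains real customers exclusively.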

## Building the Neural Network Model

In [None]:
# Define the model architecture
model = Sequential()

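# Input layer and first hidden layer
# (an explicit Input object replaces the input_dim argument, which newer Keras versions warn about)
model.add(keras.Input(shape=(X_train_resampled.shape[1],)))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.3))  # Add dropout for regularization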

# Second hidden layer
model.add(Dense(8, activation='relu'))
model.add(Dropout(0.2))

# Output layer with sigmoid activation for binary classification
model.add(Dense(1, activation='sigmoid'))

In [None]:
# Model summary
model.summary()
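
The parameter counts reported by `model.summary()` can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout layers add none. The sketch below assumes 11 input features (8 numerical columns plus 3 one-hot columns after `drop='first'`); adjust `n_features` if your preprocessing differs.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 11  # assumption: 8 numerical + 3 one-hot encoded columns
layer_units = [16, 8, 1]

total_params = 0
n_inputs = n_features
for units in layer_units:
    params = dense_params(n_inputs, units)
    print(f"Dense({units}): {params} parameters")
    total_params += params
    n_inputs = units  # this layer's outputs feed the next layer

print(f"Total trainable parameters: {total_params}")  # 192 + 136 + 9 = 337
```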

In [None]:
# Compile the model
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
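
Binary cross-entropy penalizes confident wrong predictions far more than it rewards confident right ones. The NumPy sketch below computes the loss by hand for two toy cases; the function and the example probabilities are illustrative, and the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(f"confident and right: {confident_right:.3f}")  # 0.105
print(f"confident and wrong: {confident_wrong:.3f}")  # 2.303
```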

In [None]:
# Define early stopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True
)
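
The patience logic can be traced on a hand-written list of validation losses. The sketch below is a simplified re-implementation for illustration, not Keras's own code; the loss values are made up and `patience=3` is used to keep the trace short.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based, for a list of validation losses."""
    best_loss = float('inf')
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Training ran to completion without triggering early stopping
    return len(val_losses) - 1, best_epoch

val_losses = [0.60, 0.55, 0.52, 0.53, 0.54, 0.55, 0.51, 0.52]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stopped at epoch {stop_epoch}, best weights from epoch {best_epoch}")
# stopped at epoch 5, best weights from epoch 2
```

Training stops before the later improvement to 0.51 is ever seen, which is the trade-off a small patience makes. With `restore_best_weights=True`, the model is rolled back to the weights from the best epoch.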

In [None]:
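# Shuffle the resampled data first: SMOTE appends its synthetic minority samples at the end,
# and validation_split takes the last 20% of the rows before any shuffling
shuffle_idx = np.random.permutation(len(y_train_resampled))
X_train_shuffled = np.asarray(X_train_resampled)[shuffle_idx]
y_train_shuffled = np.asarray(y_train_resampled)[shuffle_idx]

# Train the model
history = model.fit(
    X_train_shuffled, y_train_shuffled,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stopping],
    verbose=1
)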

## Model Evaluation

In [None]:
# Plot training history
plt.figure(figsize=(12, 5))

# Plot training & validation loss values
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper right')

# Plot training & validation accuracy values
plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='lower right')

plt.tight_layout()
plt.show()

In [None]:
# Make predictions on the test set (ravel flattens the (n, 1) output to a 1-D array)
y_pred_proba = model.predict(X_test).ravel()
y_pred = (y_pred_proba > 0.5).astype(int)
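
The 0.5 cut-off is a choice, not a requirement: lowering it catches more churners at the cost of more false alarms. The sketch below shows the effect on a small hand-made example; the labels, probabilities, and the helper `precision_recall_at` are illustrative assumptions, not output from this model.

```python
import numpy as np

def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when predicting churn for probabilities above the threshold."""
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(y_prob) > threshold).astype(int)
    tp = int(np.sum((y_hat == 1) & (y_true == 1)))
    fp = int(np.sum((y_hat == 1) & (y_true == 0)))
    fn = int(np.sum((y_hat == 0) & (y_true == 1)))
    return tp / (tp + fp), tp / (tp + fn)

# Six customers who stayed and four who churned (illustrative)
toy_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_prob = [0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.40, 0.55, 0.70, 0.90]

for threshold in (0.5, 0.3):
    p, r = precision_recall_at(toy_true, toy_prob, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.75, recall=0.75
# threshold=0.3: precision=0.57, recall=1.00
```

The helper assumes at least one positive prediction and at least one true churner; otherwise the divisions are undefined.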

In [None]:
# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f'Accuracy: {accuracy:.4f}')
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F1 Score: {f1:.4f}')

In [None]:
# Generate confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.title('Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()

# Calculate and display metrics from confusion matrix
tn, fp, fn, tp = cm.ravel()
total = tn + fp + fn + tp

print(f"True Negatives: {tn} ({tn/total:.2%})")
print(f"False Positives: {fp} ({fp/total:.2%})")
print(f"False Negatives: {fn} ({fn/total:.2%})")
print(f"True Positives: {tp} ({tp/total:.2%})")
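
All four headline metrics follow directly from the confusion-matrix counts. The sketch below recomputes them from made-up counts; the numbers are illustrative and are not results from this notebook.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Accuracy, precision, recall, and F1 from the four confusion-matrix cells."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts for a 2,000-customer test set
acc_toy, prec_toy, rec_toy, f1_toy = metrics_from_counts(tn=1200, fp=400, fn=150, tp=250)

print(f"Accuracy:  {acc_toy:.4f}")   # 0.7250
print(f"Precision: {prec_toy:.4f}")  # 0.3846
print(f"Recall:    {rec_toy:.4f}")   # 0.6250
print(f"F1 Score:  {f1_toy:.4f}")    # 0.4762
```

In this toy case accuracy looks respectable while precision is low, which is why the confusion matrix is worth reading alongside the single-number metrics.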

In [None]:
# Print classification report
print(classification_report(y_test, y_pred))

In [None]:
# Plot ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(10, 8))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()
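
The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it by comparing every positive-negative pair on a four-sample toy example; the function name `pairwise_auc` and the data are illustrative.

```python
import numpy as np

def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

toy_true = [0, 0, 1, 1]
toy_score = [0.1, 0.4, 0.35, 0.8]
toy_auc = pairwise_auc(toy_true, toy_score)
print(f"AUC = {toy_auc:.2f}")  # 0.75
```

Three of the four pairs are ordered correctly; only the churner scored 0.35 ranks below the non-churner scored 0.4.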

## Feature Importance Analysis

In [None]:
# Create a simple function to estimate feature importance using permutation importance
def get_feature_importance(model, X, y, feature_names):
    # Initialize importance array
    importances = []
    baseline_accuracy = accuracy_score(y, (model.predict(X) > 0.5).astype(int))
    
    # For each feature
    for i in range(X.shape[1]):
        # Create a copy of the data
        X_permuted = X.copy()
        
        # Shuffle the values of the current feature
        np.random.shuffle(X_permuted[:, i])
        
        # Predict with the permuted feature
        y_pred_permuted = (model.predict(X_permuted) > 0.5).astype(int)
        
        # Calculate the decrease in accuracy
        permuted_accuracy = accuracy_score(y, y_pred_permuted)
        importance = baseline_accuracy - permuted_accuracy
        importances.append(importance)
    
    # Create a DataFrame with feature names and importance scores
    feature_importance = pd.DataFrame({
        'Feature': feature_names,
        'Importance': importances
    })
    
    # Sort by importance
    feature_importance = feature_importance.sort_values('Importance', ascending=False)
    
    return feature_importance

# Get feature importance
feature_importance = get_feature_importance(model, X_test, y_test, feature_names)

In [None]:
# Plot feature importance
plt.figure(figsize=(12, 8))
sns.barplot(x='Importance', y='Feature', data=feature_importance.head(10))
plt.title('Top 10 Features by Permutation Importance')
plt.tight_layout()
plt.show()
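
A single shuffle per feature gives a noisy estimate, so one common refinement is to repeat the shuffle several times and average the accuracy drops. The sketch below is a model-agnostic variant that takes any prediction function; `permutation_importance_mean`, `n_repeats`, and the toy data are assumptions for illustration, not part of the notebook's pipeline.

```python
import numpy as np

def permutation_importance_mean(predict_fn, X, y, n_repeats=5, seed=0):
    """Average drop in accuracy over several shuffles of each column."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict_fn(X) == y)
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            drops.append(baseline - np.mean(predict_fn(X_perm) == y))
        importances[col] = np.mean(drops)
    return importances

# Toy check: the label depends only on column 0, so column 1 should score exactly 0
toy_rng = np.random.default_rng(1)
X_toy = toy_rng.normal(size=(500, 2))
y_toy = (X_toy[:, 0] > 0).astype(int)

def toy_predict(X):
    """A stand-in 'model' that looks at column 0 only."""
    return (X[:, 0] > 0).astype(int)

toy_importances = permutation_importance_mean(toy_predict, X_toy, y_toy)
print(toy_importances.round(3))
```

To use it with the trained network, one option is to pass a wrapper such as `lambda X: (model.predict(X, verbose=0) > 0.5).astype(int).ravel()` together with `np.asarray(y_test)`.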

## Customer Churn Prediction Example

In [None]:
# Create a function to predict churn for a new customer
def predict_churn(customer_data):
    # Preprocess the customer data
    customer_processed = preprocessor.transform(pd.DataFrame([customer_data]))
    
    # Make prediction
    churn_probability = model.predict(customer_processed)[0][0]
    churn_prediction = 1 if churn_probability > 0.5 else 0
    
    return churn_prediction, churn_probability

# Example customer
new_customer = {
    'CreditScore': 650,
    'Gender': 'Female',
    'Age': 35,
    'Tenure': 2,
    'Balance': 25000,
    'NumOfProducts': 3,
    'HasCrCard': 1,
    'IsActiveMember': 0,
    'EstimatedSalary': 65000,
    'Geography': 'France'
}

# Predict churn for the new customer
churn_prediction, churn_probability = predict_churn(new_customer)

print(f"Churn Prediction: {'Yes' if churn_prediction == 1 else 'No'}")
print(f"Churn Probability: {churn_probability:.2%}")

## Conclusion

In this notebook, we built an artificial neural network to predict customer churn from customer attributes. On the held-out test set, the model achieved an accuracy of [value] and an F1 score of [value]. Because churners are the minority class, precision, recall, and F1 are more informative here than accuracy alone.

The most important factors affecting customer churn were found to be:
1. [Top factor based on feature importance]
2. [Second factor based on feature importance]
3. [Third factor based on feature importance]

This model could be used by businesses to identify customers at risk of churning and take proactive measures to retain them.

## Future Work

1. Try different model architectures and hyperparameters
2. Implement more sophisticated feature engineering
3. Explore other techniques for handling class imbalance
4. Develop a customer retention strategy based on the model's predictions
5. Deploy the model as a real-time prediction service