# PROJECT SCOPE

This project builds a customer lifetime value (LTV) model that helps businesses understand their customers and how much revenue each customer brings in over their lifetime with the business. The model focuses on companies in the financial services sector and uses bank churn data. This model and the remaining models use the BankChurners.csv file saved in the Datasets folder. After the model is run, it saves the processed data, including the LTV predictions generated by the model, to its own folder inside the Datasets folder.

Lifetime value (LTV) is an estimate of the average revenue that a customer will generate throughout their lifespan as a customer.
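
As a concrete illustration, this notebook later approximates LTV per customer as the product of `Total_Trans_Amt`, `Avg_Utilization_Ratio`, and `Months_on_book`. The sketch below applies that proxy to one made-up customer; the function name `ltv_proxy` and the input values are hypothetical and are not taken from the dataset.

```python
def ltv_proxy(total_trans_amt: float, avg_utilization_ratio: float, months_on_book: int) -> float:
    """LTV proxy used in this notebook: transaction amount x utilization x tenure."""
    return total_trans_amt * avg_utilization_ratio * months_on_book


# Hypothetical customer: 1,000 in transactions, 50% utilization, 36 months on book
example_ltv = ltv_proxy(1000.0, 0.5, 36)
print(example_ltv)  # 18000.0
```

Because the proxy multiplies by the utilization ratio, a customer with zero utilization gets an LTV of 0 regardless of spend or tenure.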

# IMPORTING NECESSARY PACKAGES

In [1]:
import os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error, r2_score
import joblib

# LOADING THE DATA

In [2]:
# Data Ingestion
file_path = r'/Users/abduljalaalabubakar/Desktop/Projects/Symply Finance/Customer Insight Model/Fintech Customer Insight Model/Datasets/Bank Churn Dataset/BankChurners.csv'
data = pd.read_csv(file_path)
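
The absolute path above only resolves on the author's machine. A minimal sketch of a more portable loader is shown here, assuming the notebook is run from a project root that contains the `Datasets/Bank Churn Dataset` folder; the helper name `load_bank_churners` and the `project_root` parameter are hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_bank_churners(project_root=".") -> pd.DataFrame:
    """Load BankChurners.csv relative to a project root (assumed folder layout)."""
    csv_path = Path(project_root) / "Datasets" / "Bank Churn Dataset" / "BankChurners.csv"
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected the dataset at: {csv_path}")
    return pd.read_csv(csv_path)
```

Calling `load_bank_churners()` from the project root could then replace the hard-coded `file_path`.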

# DATA INVESTIGATION

In [3]:
# Data Investigation
print("Missing Values:\n", data.isnull().sum())
print("\nClass Distribution (%):\n", data['Attrition_Flag'].value_counts(normalize=True) * 100)

# Dropping rows with missing values and the unneeded CLIENTNUM identifier column
data.dropna(inplace=True)
data = data.drop(['CLIENTNUM'], axis=1)

Missing Values:
 CLIENTNUM                                                                                                                             0
Attrition_Flag                                                                                                                        0
Customer_Age                                                                                                                          0
Gender                                                                                                                                0
Dependent_count                                                                                                                       0
Education_Level                                                                                                                       0
Marital_Status                                                                                                                        0
Income_Category                

# DATA PREPARATION

In [4]:
# Encoding the categorical variables
label_encoders = {}
for column in data.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le

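# Defining the LTV target variable as a proxy built from
# Total_Trans_Amt, Avg_Utilization_Ratio, and retention time (Months_on_book)
data['LTV'] = data['Total_Trans_Amt'] * data['Avg_Utilization_Ratio'] * data['Months_on_book']

# Data Preprocessing
# Note: X still contains the three columns used to compute LTV, so the models
# can recover the target directly from their inputs (target leakage).
X = data.drop(['LTV'], axis=1)  # Features
y = data['LTV']  # Target

# Train-test split (done before scaling so the test set does not influence the scaler)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardizing the features using statistics from the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_scaled = scaler.transform(X)  # Full dataset, used later to generate predictions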

# Candidate models (fixed random_state where supported, for reproducible results)
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(random_state=42),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=42),
    "Neural Network Regressor": MLPRegressor(max_iter=300, random_state=42)
}
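
The cell above includes the three columns that define `LTV` among the features, so the target can be recovered exactly from the inputs. One possible way to check how a model behaves without that information is sketched below: it drops the three source columns and wraps scaling and the regressor in a scikit-learn `Pipeline`, so the scaler is fitted on the training split only. The synthetic DataFrame, the choice of `Customer_Age` and `Dependent_count` as remaining features, and the helper name `fit_without_leakage` are assumptions for illustration, not the author's method.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Columns that are multiplied together to form the LTV target in this notebook
LTV_SOURCE_COLUMNS = ["Total_Trans_Amt", "Avg_Utilization_Ratio", "Months_on_book"]


def fit_without_leakage(df: pd.DataFrame, random_state: int = 42):
    """Fit a scaler + regressor pipeline on features that exclude the LTV inputs."""
    X = df.drop(columns=["LTV"] + LTV_SOURCE_COLUMNS)
    y = df["LTV"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )
    pipeline = Pipeline([
        ("scaler", StandardScaler()),  # fitted on X_train only
        ("model", RandomForestRegressor(n_estimators=50, random_state=random_state)),
    ])
    pipeline.fit(X_train, y_train)
    return pipeline, X_train, X_test, y_test


# Synthetic stand-in for the encoded BankChurners data (random values, not real customers)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Customer_Age": rng.integers(26, 70, size=n),
    "Dependent_count": rng.integers(0, 6, size=n),
    "Total_Trans_Amt": rng.uniform(500, 15000, size=n),
    "Avg_Utilization_Ratio": rng.uniform(0, 1, size=n),
    "Months_on_book": rng.integers(13, 57, size=n),
})
demo["LTV"] = demo["Total_Trans_Amt"] * demo["Avg_Utilization_Ratio"] * demo["Months_on_book"]

pipeline, X_train, X_test, y_test = fit_without_leakage(demo)
print("Held-out R-squared:", pipeline.score(X_test, y_test))
```

On the real data, a much lower R-squared from this variant than from the main run would suggest that the original score mostly reflects the models reconstructing the formula.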

# DATA TRAINING AND SAVING THE BEST MODEL

In [None]:
# Defining the folders to store results
model_folder_path = r'/Users/abduljalaalabubakar/Desktop/Projects/Symply Finance/Customer Insight Model/Fintech Customer Insight Model/Customer_LTV_Best_Models'
results_folder_path = r'/Users/abduljalaalabubakar/Desktop/Projects/Symply Finance/Customer Insight Model/Fintech Customer Insight Model/Datasets/Customer_LTV_Results'
os.makedirs(model_folder_path, exist_ok=True)
os.makedirs(results_folder_path, exist_ok=True)

# Training and Evaluating Models
results = {}
best_model = None
best_score = float('-inf')

for name, model in models.items():
    print(f"Training {name}...")
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    # Storing the evaluation metrics for this model
    results[name] = {
        "Mean Absolute Error": mae,
        "R-squared": r2
    }

    # Keeping track of the best model so far (highest R-squared on the test set)
    if r2 > best_score:
        best_score = r2
        best_model = model

# Saving the best model
best_model_path = os.path.join(model_folder_path, "best_ltv_model.pkl")
joblib.dump(best_model, best_model_path)
print(f"\nBest model saved to: {best_model_path}")

Training Linear Regression...
Training Random Forest Regressor...
Training Gradient Boosting Regressor...
Training Neural Network Regressor...
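
The saved `best_ltv_model.pkl` contains only the estimator, so anyone loading it also needs the fitted `StandardScaler` (and the label encoders) to prepare new data. A minimal sketch of one way around this, assuming it is acceptable to store the scaler and model together, is to dump a single dictionary with joblib. The helper names `save_bundle` and `load_and_predict`, the bundle keys, the file name `ltv_bundle.pkl`, and the toy data are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def save_bundle(model, scaler, path):
    """Persist the fitted scaler and model together in one joblib file."""
    joblib.dump({"scaler": scaler, "model": model}, path)


def load_and_predict(path, X_raw):
    """Reload the bundle, scale the raw features, and return predictions."""
    bundle = joblib.load(path)
    return bundle["model"].predict(bundle["scaler"].transform(X_raw))


# Toy data standing in for the encoded feature matrix (not real customers)
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(50, 3))
y = X_raw @ np.array([1.0, 2.0, 3.0])

scaler = StandardScaler().fit(X_raw)
model = LinearRegression().fit(scaler.transform(X_raw), y)

with tempfile.TemporaryDirectory() as tmp_dir:
    bundle_path = os.path.join(tmp_dir, "ltv_bundle.pkl")
    save_bundle(model, scaler, bundle_path)
    reloaded_predictions = load_and_predict(bundle_path, X_raw)

# Check that the reloaded bundle reproduces the in-memory predictions
print(np.allclose(reloaded_predictions, model.predict(scaler.transform(X_raw))))
```

In this notebook the same idea would apply to `best_model`, `scaler`, and `label_encoders`.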


# DISPLAYING THE RESULT

In [None]:
# Generating the predictions using the best model on the entire dataset
ltv_predictions = best_model.predict(X_scaled)

# Adding the predictions as a new column (the categorical columns in data are label-encoded at this point)
data['Predicted_LTV'] = ltv_predictions

# Saving the dataset with LTV predictions to a new file
output_file_path = os.path.join(results_folder_path, "BankChurners_with_LTV.csv")
data.to_csv(output_file_path, index=False)
print(f"\nLTV predictions saved to: {output_file_path}")

# Displaying the results
print("\n### Model Evaluation Results ###")
for name, metrics in results.items():
    print(f"\nModel: {name}")
    print(f"Mean Absolute Error: {metrics['Mean Absolute Error']}")
    print(f"R-squared (R²): {metrics['R-squared']}")
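
With several models, the printed metrics can be easier to compare as a table. The sketch below assumes a dictionary shaped like the `results` dictionary built in the training loop; the numbers in `example_results` are placeholders chosen for illustration and are not outputs of this notebook, and the helper name `results_to_table` is hypothetical.

```python
import pandas as pd

# Placeholder metrics with the same structure as `results` (made-up values)
example_results = {
    "Model A": {"Mean Absolute Error": 120.0, "R-squared": 0.90},
    "Model B": {"Mean Absolute Error": 80.0, "R-squared": 0.95},
    "Model C": {"Mean Absolute Error": 150.0, "R-squared": 0.85},
}


def results_to_table(results: dict) -> pd.DataFrame:
    """Turn the nested metrics dict into a table sorted by R-squared, best first."""
    table = pd.DataFrame.from_dict(results, orient="index")
    return table.sort_values("R-squared", ascending=False)


summary = results_to_table(example_results)
print(summary)
```

Passing the notebook's own `results` dictionary to `results_to_table` would put the best-scoring model in the first row.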

# NEXT STEPS FOR CUSTOMER LTV MODEL IMPLEMENTATION
1. Model Deployment
   - Package the Random Forest Regressor:
     - Export the model using joblib.
     - Prepare an API using frameworks like Flask or FastAPI to allow real-time or batch predictions.
   - Set up a Scalable Pipeline:
     - Build a data pipeline to handle preprocessing (e.g., encoding, scaling) for incoming customer data.
     - Include functionality to handle missing or erroneous data.
2. Integrate with Business Systems
   - CRM Integration:
     - Connect the LTV prediction model to CRM systems to provide actionable insights for marketing and customer retention teams.
   - Dashboard Development:
     - Create dashboards for business users to view LTV predictions and associated feature insights.
3. Model Monitoring and Maintenance
   - Performance Monitoring:
     - Track metrics like MAE, R², and drift detection in real-time.
   - Retraining Strategy:
     - Set up automated or periodic retraining pipelines to update the model with the latest data.
   - Error Logging:
     - Implement logging to capture and address prediction errors.
4. Data Enrichment and Feature Engineering
   - Incorporate External Data:
     - Explore adding external variables (e.g., macroeconomic factors, market trends).
   - Analyze Feature Contributions:
     - Use tools like permutation importance to validate the impact of features on LTV predictions.
   - Dynamic Features:
     - Create time-based features to capture trends (e.g., recent transaction frequency).
5. Customer Segmentation and Targeting
   - Segment Customers:
     - Use LTV predictions to group customers into tiers (e.g., high, medium, low).
   - Retention Campaigns:
     - Develop targeted retention strategies for customers with declining LTV.
   - Upselling Opportunities:
     - Focus premium products or services on high-LTV customers.
6. Scale for Production
   - Distributed Processing:
     - Implement distributed frameworks (e.g., Apache Spark) to handle larger datasets.
   - Cloud Integration:
     - Deploy the pipeline and model on cloud platforms (e.g., AWS, Azure) for scalability and reliability.
7. Validate Model Performance in a Live Environment
   - Pilot Testing:
     - Roll out the LTV model to a small segment of customers.
     - Measure impact on business KPIs like retention rate, upsell success, and campaign ROI.
   - A/B Testing:
     - Compare model-driven strategies against traditional approaches to validate effectiveness.
8. Reporting and Documentation
   - Create Reports:
     - Summarize model predictions, accuracy metrics, and customer insights for stakeholders.
   - Maintain Documentation:
     - Document preprocessing steps, model assumptions, and API endpoints for developers and business users.
9. Iterate and Improve
   - Feedback Loops:
     - Gather feedback from users to refine the model and dashboard.
   - Hyperparameter Tuning:
     - Explore optimization of model parameters for better performance.
   - Incorporate Additional Use Cases:
     - Extend the pipeline to include related use cases like churn prediction or cross-sell propensity.
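
For the customer segmentation step, one simple option is to split customers into equal-sized tiers by predicted LTV. The sketch below uses `pandas.qcut` for this; the three-tier split, the tier labels, and the helper name `assign_ltv_tiers` are assumptions, and the sample values are made up.

```python
import pandas as pd


def assign_ltv_tiers(predicted_ltv: pd.Series) -> pd.Series:
    """Label each customer low/medium/high using tertiles of predicted LTV."""
    return pd.qcut(predicted_ltv, q=3, labels=["low", "medium", "high"])


# Made-up predictions for six customers
sample = pd.DataFrame({"Predicted_LTV": [100.0, 2500.0, 400.0, 9000.0, 1200.0, 6000.0]})
sample["LTV_Tier"] = assign_ltv_tiers(sample["Predicted_LTV"])
print(sample)
```

Quantile-based tiers always contain roughly the same number of customers, so a fixed monetary threshold may be a better fit if the business defines "high value" in absolute terms.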