# Assignment 11: Predicting Customer Lifetime Value with Multiple Regression

This assignment applies Multiple Linear Regression (MLR) to a common business problem: predicting Customer Lifetime Value (CLV). I will use Python with pandas, NumPy, and scikit-learn to build, interpret, and evaluate an MLR model on synthetic customer data.

In [2]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Optional: Set a consistent plot style
sns.set_style("whitegrid")
print("Libraries imported successfully.")

Libraries imported successfully.


## Task 1: Execute the starter code to generate data

In [3]:
# Starter Code: Generate Synthetic CLV Data
np.random.seed(110)
n_customers = 500

# Generate predictor variables
avg_purchase_value = np.random.normal(75, 25, n_customers) # Avg $75, std $25
purchase_frequency = np.random.normal(5, 2, n_customers)   # Avg 5 purchases/yr, std 2
customer_tenure = np.random.uniform(0.5, 10, n_customers)  # Tenure between 0.5 and 10 years

# Ensure non-negative values where logical
avg_purchase_value = np.maximum(avg_purchase_value, 10)
purchase_frequency = np.maximum(purchase_frequency, 0.5)

# Generate CLV based on a linear relationship + noise
# A multiplicative form (AvgPurchase * Freq * Tenure) would arguably be more realistic,
# but an additive form keeps the MLR coefficients easy to interpret for this assignment:
# CLV = Base + Effect(AvgPurchase) + Effect(Freq) + Effect(Tenure) + Noise
clv = 150 + 10 * avg_purchase_value + 80 * purchase_frequency + 50 * customer_tenure + np.random.normal(0, 250, n_customers)
clv = np.maximum(clv, 50) # Ensure minimum CLV

# Create DataFrame
df_clv = pd.DataFrame({
    'AvgPurchaseValue': avg_purchase_value,
    'PurchaseFrequency': purchase_frequency,
    'CustomerTenure': customer_tenure,
    'CLV': clv
})

print("Synthetic Customer Data Head:")
print(df_clv.head())
print("\nData Description:")
print(df_clv.describe())


Synthetic Customer Data Head:
   AvgPurchaseValue  PurchaseFrequency  CustomerTenure          CLV
0         83.214928           2.649825        2.762106  1503.484718
1         55.095036           3.438132        7.472734  1492.435762
2        110.078096           3.054568        9.820457  1367.956073
3         36.305181           3.581917        6.321292  1010.988008
4        104.168257           3.359105        8.689112  1997.209326

Data Description:
       AvgPurchaseValue  PurchaseFrequency  CustomerTenure          CLV
count        500.000000         500.000000      500.000000   500.000000
mean          75.546159           5.005602        5.229268  1561.155591
std           24.889128           1.954198        2.792919   393.090770
min           10.000000           0.500000        0.510736   401.844341
25%           58.085492           3.722779        2.783082  1281.604601
50%           76.024690           5.031070        5.120830  1571.990603
75%           93.576606           6.313
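
The minimums of exactly 10.0 and 0.5 in the description above suggest that the `np.maximum` floors were applied to at least one draw each. A minimal sketch, assuming the same seed and draw order as the starter code, that counts how many raw values fell below each floor (the names `raw_value`, `raw_freq`, and the `n_*_floored` counters are introduced here for illustration):

```python
import numpy as np

np.random.seed(110)  # same seed as the starter code
n_customers = 500

# Draw the two normally distributed predictors in the same order as the starter code
raw_value = np.random.normal(75, 25, n_customers)
raw_freq = np.random.normal(5, 2, n_customers)

# Count the draws that fall below the floors of 10 and 0.5
n_value_floored = int((raw_value < 10).sum())
n_freq_floored = int((raw_freq < 0.5).sum())

# Apply the floors exactly as the starter code does
floored_value = np.maximum(raw_value, 10)
floored_freq = np.maximum(raw_freq, 0.5)

print(f"AvgPurchaseValue draws raised to 10: {n_value_floored}")
print(f"PurchaseFrequency draws raised to 0.5: {n_freq_floored}")
```

If the counts are small relative to 500, the floors should have little effect on the fitted coefficients.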

## Task 2: Preparing Data for MLR

In [4]:
# Define features (X) and target (y)
x_clv = df_clv[['AvgPurchaseValue', 'PurchaseFrequency', 'CustomerTenure']]
y_clv = df_clv['CLV']

print("\nShape of x_clv:", x_clv.shape)
print("Shape of y_clv:", y_clv.shape)


Shape of x_clv: (500, 3)
Shape of y_clv: (500,)


## Task 3: Splitting the data

In [5]:
X_clv_train, X_clv_test, y_clv_train, y_clv_test = train_test_split(
    x_clv, y_clv, test_size=0.20, random_state=110 # Using 80/20 split here
)

print("\n--- MLR Train-Test Split ---")
print("Training set size:", X_clv_train.shape[0])
print("Test set size:", X_clv_test.shape[0])


--- MLR Train-Test Split ---
Training set size: 400
Test set size: 100


## Task 4: Create an instance of the Linear Regression model

In [6]:
# Create the Multiple Linear Regression model
mlr_model = LinearRegression()

print('MLR Model created.')

MLR Model created.


## Task 5: Train the MLR model using the training data

In [7]:
# Train the MLR model on the training data
mlr_model.fit(X_clv_train, y_clv_train)

print("\nMLR Model trained successfully.")



MLR Model trained successfully.


## Task 6: Print the intercept and coefficients

In [8]:
# Get intercept and coefficients
mlr_intercept = mlr_model.intercept_
mlr_coefficients = mlr_model.coef_

print(f"\n--- MLR Model Coefficients ---")
print(f"Intercept (β₀): {mlr_intercept:.3f}")

# Match coefficients to feature names
feature_names = X_clv_train.columns
coeffs_df = pd.DataFrame({'Feature': feature_names, 'Coefficient (β)': mlr_coefficients})
print(coeffs_df)

# Interpretation
print("\nInterpretation:")
print(f"- The model predicts a baseline customer lifetime value of ${mlr_intercept:.3f} when all predictors (purchase value, purchase frequency, customer tenure) are zero (use caution interpreting intercepts).")
print(f"- For each additional dollar of Average Purchase Value, customer lifetime value is predicted to increase by ${coeffs_df.loc[0, 'Coefficient (β)']:.3f}, holding Purchase Frequency and Customer Tenure constant.")
print(f"- For each additional purchase per year, customer lifetime value is predicted to increase by ${coeffs_df.loc[1, 'Coefficient (β)']:.3f}, holding Average Purchase Value and Customer Tenure constant.")
print(f"- For each one-year increase in Customer Tenure, customer lifetime value is predicted to increase by ${coeffs_df.loc[2, 'Coefficient (β)']:.3f}, holding Average Purchase Value and Purchase Frequency constant.")


--- MLR Model Coefficients ---
Intercept (β₀): 184.239
             Feature  Coefficient (β)
0   AvgPurchaseValue         9.720471
1  PurchaseFrequency        75.771990
2     CustomerTenure        49.723306

Interpretation:
- The model predicts a baseline customer lifetime value of $184.239 when all predictors (purchase value, purchase frequency, customer tenure) are zero (use caution interpreting intercepts).
- For each additional dollar of Average Purchase Value, customer lifetime value is predicted to increase by $9.720, holding Purchase Frequency and Customer Tenure constant.
- For each additional purchase per year, customer lifetime value is predicted to increase by $75.772, holding Average Purchase Value and Customer Tenure constant.
- For each one-year increase in Customer Tenure, customer lifetime value is predicted to increase by $49.723, holding Average Purchase Value and Purchase Frequency constant.
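
Because the data is synthetic, the fitted coefficients can be checked against the values used in the starter code's generating equation (base 150, then 10, 80, and 50 for the three predictors). The sketch below regenerates the data with the same seed, refits the model, and tabulates the difference; the `true_values` dictionary and the `comparison` table are names introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Regenerate the synthetic data in the same draw order as the starter code
np.random.seed(110)
n_customers = 500
avg_purchase_value = np.maximum(np.random.normal(75, 25, n_customers), 10)
purchase_frequency = np.maximum(np.random.normal(5, 2, n_customers), 0.5)
customer_tenure = np.random.uniform(0.5, 10, n_customers)
noise = np.random.normal(0, 250, n_customers)
clv = np.maximum(
    150 + 10 * avg_purchase_value + 80 * purchase_frequency
    + 50 * customer_tenure + noise,
    50,
)

X = pd.DataFrame({
    'AvgPurchaseValue': avg_purchase_value,
    'PurchaseFrequency': purchase_frequency,
    'CustomerTenure': customer_tenure,
})
X_train, X_test, y_train, y_test = train_test_split(
    X, clv, test_size=0.20, random_state=110
)
model = LinearRegression().fit(X_train, y_train)

# Coefficients used in the generating equation (taken from the starter code)
true_values = {'AvgPurchaseValue': 10, 'PurchaseFrequency': 80, 'CustomerTenure': 50}

comparison = pd.DataFrame({
    'Feature': X.columns,
    'True': [true_values[name] for name in X.columns],
    'Estimated': model.coef_,
})
comparison['Difference'] = comparison['Estimated'] - comparison['True']
print(comparison)
print(f"Intercept: true 150, estimated {model.intercept_:.3f}")
```

All three estimates should land close to the generating values, with the remaining gap attributable to the noise term (standard deviation 250) and the finite training sample.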


## Task 7: Evaluate the model’s performance on the test set

In [9]:
# Make predictions on the TEST set
y_clv_pred_test = mlr_model.predict(X_clv_test)

# Calculate evaluation metrics
r2_mlr_test = r2_score(y_clv_test, y_clv_pred_test)
mse_mlr_test = mean_squared_error(y_clv_test, y_clv_pred_test)
rmse_mlr_test = np.sqrt(mse_mlr_test)

print(f"\n--- MLR Model Evaluation (Test Set) ---")
print(f"R-squared (R²): {r2_mlr_test:.3f}")
print(f"Root Mean Squared Error (RMSE): {rmse_mlr_test:.3f}") # Units are dollars, the same as CLV
print(f"Mean Squared Error (MSE): {mse_mlr_test:.3f}") # Units are squared dollars


--- MLR Model Evaluation (Test Set) ---
R-squared (R²): 0.564
Root Mean Squared Error (RMSE): 249.398
Mean Squared Error (MSE): 62199.421
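
MAE, MSE, and RMSE are three distinct metrics, and RMSE is simply the square root of MSE (here, the square root of 62199.421 is about 249.398). A minimal sketch on hypothetical toy values, not the assignment data, showing how each is computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical toy values chosen only to make the arithmetic easy to follow
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|: (10 + 10 + 30) / 3
mse = mean_squared_error(y_true, y_pred)   # mean of error²: (100 + 100 + 900) / 3
rmse = np.sqrt(mse)                        # back in the original units

print(f"MAE:  {mae:.3f}")
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
```

In the notebook, the same call would be `mean_absolute_error(y_clv_test, y_clv_pred_test)`. RMSE is never smaller than MAE, because squaring gives large errors more weight.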


## Task 8: Predict CLV for a new customer

In [12]:
# Predict CLV for a new customer with the following profile
new_customer_profile = pd.DataFrame({
    'AvgPurchaseValue': [85],   # $85 average purchase
    'PurchaseFrequency': [6],   # 6 purchases per year
    'CustomerTenure': [4]       # 4 years
})

predicted_clv = mlr_model.predict(new_customer_profile)

print("\n--- MLR Prediction ---")
print(f"Predicted Customer Lifetime Value for profile: ${predicted_clv[0]:.2f}")


--- MLR Prediction ---
Predicted Customer Lifetime Value for profile: $1664.00
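
The prediction can be reproduced by hand from the intercept and coefficients printed in Task 6, since an MLR prediction is the intercept plus the sum of each coefficient times its feature value. A small sketch using the rounded values from that output (the variable names are introduced here for illustration):

```python
import numpy as np

# Values copied from the Task 6 output (intercept rounded to three decimals)
intercept = 184.239
coefficients = np.array([9.720471, 75.771990, 49.723306])

# Profile: $85 average purchase, 6 purchases per year, 4 years of tenure
profile = np.array([85, 6, 4])

# y_hat = b0 + b1*x1 + b2*x2 + b3*x3
manual_prediction = intercept + coefficients @ profile
print(f"Manual prediction: ${manual_prediction:.2f}")  # prints "Manual prediction: $1664.00"
```

The result matches the output of `predict` to the cent, which confirms that the model is applying the fitted linear equation.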
