# Bank Marketing Classification Analysis

This notebook implements a complete machine learning pipeline for predicting term deposit subscriptions based on bank marketing campaign data.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from time import time

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.svm import SVC
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    precision_score, recall_score, f1_score, roc_auc_score, roc_curve
)

warnings.filterwarnings('ignore')
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

## Problem 1: Understanding the Data

The dataset represents **17 marketing campaigns** conducted between **May 2008 and November 2010** by a Portuguese banking institution. The campaigns involved phone calls to clients to promote term deposit subscriptions.

This analysis follows the **CRISP-DM (Cross-Industry Standard Process for Data Mining)** methodology, which provides a structured approach to planning and executing data mining projects. The CRISP-DM framework includes six phases:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment

**Reference:** Shearer, C. (2000). The CRISP-DM model: the new blueprint for data mining. *Journal of Data Warehousing*, 5(4), 13-22.

## Problem 2: Read in the Data

In [None]:
# Read the dataset
df = pd.read_csv('data/bank-additional/bank-additional-full.csv', sep=';')

# Display the shape of the dataset
print(f"Dataset shape: {df.shape}")
print(f"Number of samples: {df.shape[0]}")
print(f"Number of features: {df.shape[1]}")
print()

# Display the first few rows
df.head()

## Problem 3: Understanding the Features

In [None]:
# Check for missing values
print("Missing values per column:")
print(df.isnull().sum())
print()

In [None]:
# Check data types
print("Data types:")
print(df.dtypes)
print()

In [None]:
# Check for 'unknown' values in categorical columns
print("Count of 'unknown' values in categorical columns:")
categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols:
    unknown_count = (df[col] == 'unknown').sum()
    if unknown_count > 0:
        print(f"{col}: {unknown_count} ({unknown_count/len(df)*100:.2f}%)")

In [None]:
# Get basic statistics
print("Descriptive statistics for numerical features:")
df.describe()

In [None]:
# Display information about the dataset
df.info()

### Notes on Data Quality

**Handling 'unknown' values:** The dataset contains 'unknown' values in several categorical columns (job, marital, education, default, housing, loan). These represent missing or unavailable information. For this analysis, we will treat 'unknown' as a separate category rather than imputing or removing these records, as they may carry meaningful information about data collection patterns.
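As a small illustration of what "treating 'unknown' as a separate category" means in practice, the sketch below one-hot encodes a toy column. The `toy` frame is made up for illustration and is not part of the bank dataset; only the `housing` column name and its three values mirror the real data.

```python
import pandas as pd

# Toy data for illustration only (not the bank dataset)
toy = pd.DataFrame({"housing": ["yes", "no", "unknown", "yes"]})

# 'unknown' is encoded like any other level; drop_first removes the
# alphabetically first level ('no'), which becomes the reference category
toy_encoded = pd.get_dummies(toy, drop_first=True)
print(toy_encoded.columns.tolist())
```

Because 'unknown' keeps its own indicator column, a model can learn a separate effect for records where the value was not recorded.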

**Excluding 'duration' feature:** The 'duration' attribute (last contact duration in seconds) is highly correlated with the target variable and is only available after a call is performed. This creates **data leakage** because:
- Duration is not known before the call is made
- It cannot be used for prediction in a real-world scenario
- Including it would artificially inflate model performance

Therefore, the 'duration' feature will be excluded from our predictive models to ensure realistic and deployable results.

## Problem 4: Understanding the Task

### Business Objective

The primary business objective is to **predict whether a client will subscribe to a term deposit** (the target variable 'y') based on various demographic, social, economic, and campaign-related features.

### Why This Matters

By accurately predicting which clients are most likely to subscribe, the bank can:

1. **Optimize Marketing Campaigns:** Focus resources on high-probability prospects, reducing wasted effort and costs
2. **Improve Conversion Rates:** Increase the percentage of successful subscriptions per contact
3. **Enhance Customer Experience:** Reduce unnecessary calls to clients unlikely to subscribe
4. **Increase ROI:** Maximize return on investment for marketing campaigns
5. **Strategic Planning:** Better understand which factors drive term deposit subscriptions

### Success Criteria

Given that this is an imbalanced classification problem (fewer 'yes' than 'no' responses), we will evaluate models not just on accuracy but also on:
- **Precision:** Of those predicted to subscribe, how many actually do?
- **Recall:** Of those who actually subscribe, how many do we identify?
- **F1-Score:** Harmonic mean of precision and recall
- **ROC-AUC:** Overall discriminative ability of the model
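
To make these definitions concrete, the following sketch computes accuracy, precision, recall, and F1 on a small hand-made set of labels. The labels (`y_true_demo`, `y_pred_demo`) are illustrative assumptions and are not drawn from the bank dataset.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative labels only (not from the bank dataset):
# 10 clients, 3 of whom actually subscribe.
y_true_demo = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

# Counts implied by the labels above: TP = 1, FP = 1, FN = 2, TN = 6
acc_demo = accuracy_score(y_true_demo, y_pred_demo)    # (TP + TN) / total
prec_demo = precision_score(y_true_demo, y_pred_demo)  # TP / (TP + FP)
rec_demo = recall_score(y_true_demo, y_pred_demo)      # TP / (TP + FN)
f1_demo = f1_score(y_true_demo, y_pred_demo)           # 2 * P * R / (P + R)

print(f"Accuracy:  {acc_demo:.3f}")
print(f"Precision: {prec_demo:.3f}")
print(f"Recall:    {rec_demo:.3f}")
print(f"F1-Score:  {f1_demo:.3f}")
```

In this toy case accuracy is 0.7 even though two of the three subscribers are missed (recall of one third), which is why the minority-class metrics are reported alongside accuracy.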

## Problem 5: Engineering Features

In [None]:
# Create a copy of the dataframe
df_model = df.copy()

# Exclude the 'duration' feature to avoid data leakage
print("Excluding 'duration' feature to prevent data leakage")
df_model = df_model.drop('duration', axis=1)
print(f"Shape after removing duration: {df_model.shape}")
print()

In [None]:
# Separate features and target
X = df_model.drop('y', axis=1)
y = df_model['y']

# Encode target variable as binary (yes=1, no=0)
y = y.map({'yes': 1, 'no': 0})

print("Target variable distribution:")
print(y.value_counts())
print()
print("Class balance (proportions):")
print(y.value_counts(normalize=True))
print()

In [None]:
# One-hot encode categorical variables
print("Categorical columns to encode:")
categorical_cols = X.select_dtypes(include=['object']).columns.tolist()
print(categorical_cols)
print()

# Perform one-hot encoding
X_encoded = pd.get_dummies(X, drop_first=True)

print(f"Shape before encoding: {X.shape}")
print(f"Shape after encoding: {X_encoded.shape}")
print(f"Number of features after encoding: {X_encoded.shape[1]}")

## Problem 6: Train/Test Split

In [None]:
# Split the data into training and testing sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.2, random_state=42, stratify=y
)

print("Training and Testing Set Shapes:")
print(f"X_train: {X_train.shape}")
print(f"X_test: {X_test.shape}")
print(f"y_train: {y_train.shape}")
print(f"y_test: {y_test.shape}")
print()

print("Training set class distribution:")
print(y_train.value_counts(normalize=True))
print()

print("Testing set class distribution:")
print(y_test.value_counts(normalize=True))

## Problem 7: Baseline Model

In [None]:
# Calculate baseline accuracy (majority class prediction)
baseline_accuracy = y_test.value_counts(normalize=True).max()

print("Baseline Model Performance:")
print(f"Majority class: {y_test.value_counts().idxmax()}")
print(f"Baseline accuracy (always predicting majority class): {baseline_accuracy:.4f}")
print()
print(f"This means if we always predicted 'no subscription' (class 0),")
print(f"we would be correct {baseline_accuracy*100:.2f}% of the time.")

### Baseline Model Explanation

The **baseline model** represents the simplest possible prediction strategy: always predicting the majority class. In this case, that means always predicting that a client will **not subscribe** to a term deposit.

This baseline serves as a reference point for evaluating our machine learning models. Any model we develop should perform significantly better than this baseline to be considered useful. If a sophisticated model only matches or slightly exceeds the baseline, it's not providing enough value to justify its complexity.

Since the dataset is imbalanced (more 'no' than 'yes' responses), accuracy alone is not a sufficient metric. We need to focus on metrics like precision, recall, and F1-score to ensure our model can effectively identify the minority class (those who will subscribe).
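
The point about accuracy can be sketched with scikit-learn's `DummyClassifier`, which implements the majority-class strategy directly. The labels below are synthetic, built with an 89/11 split as a rough stand-in for the class balance in this dataset; `X_dummy` and `y_dummy` are hypothetical names.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels (assumption): 89 non-subscribers and 11 subscribers
y_dummy = np.array([0] * 89 + [1] * 11)
X_dummy = np.zeros((len(y_dummy), 1))  # features are ignored by this strategy

# Always predict the most frequent class seen during fit
majority = DummyClassifier(strategy="most_frequent")
majority.fit(X_dummy, y_dummy)
majority_pred = majority.predict(X_dummy)

majority_acc = accuracy_score(y_dummy, majority_pred)
majority_recall = recall_score(y_dummy, majority_pred)

print(f"Accuracy: {majority_acc:.2f}")
print(f"Recall for class 1: {majority_recall:.2f}")
```

The classifier reaches 0.89 accuracy while identifying none of the subscribers, so any useful model has to be judged on recall, precision, and F1 as well.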

## Problem 8: Simple Model

In [None]:
# Scale the features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Features scaled using StandardScaler")
print(f"Training data shape: {X_train_scaled.shape}")
print(f"Testing data shape: {X_test_scaled.shape}")

In [None]:
# Train a basic Logistic Regression model
print("Training Logistic Regression model...")
lr_simple = LogisticRegression(max_iter=1000, random_state=42)
lr_simple.fit(X_train_scaled, y_train)

print("Model trained successfully!")

## Problem 9: Score the Model

In [None]:
# Make predictions
y_pred = lr_simple.predict(X_test_scaled)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Logistic Regression Accuracy: {accuracy:.4f}")
print(f"Improvement over baseline: {(accuracy - baseline_accuracy)*100:.2f} percentage points")
print()

In [None]:
# Print classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=['No (0)', 'Yes (1)']))

In [None]:
# Create confusion matrix
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['No (0)', 'Yes (1)'], 
            yticklabels=['No (0)', 'Yes (1)'])
plt.title('Confusion Matrix - Logistic Regression', fontsize=14, fontweight='bold')
plt.ylabel('Actual', fontsize=12)
plt.xlabel('Predicted', fontsize=12)
plt.tight_layout()
plt.show()

print("\nConfusion Matrix Interpretation:")
print(f"True Negatives (TN): {cm[0,0]}")
print(f"False Positives (FP): {cm[0,1]}")
print(f"False Negatives (FN): {cm[1,0]}")
print(f"True Positives (TP): {cm[1,1]}")

## Problem 10: Model Comparisons

In [None]:
# Initialize models with default parameters
models = {
    'KNN': KNeighborsClassifier(),
    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=42),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'SVM': SVC(random_state=42)
}

# List to store one result dictionary per model
results = []

# Train and evaluate each model
for name, model in models.items():
    print(f"Training {name}...")
    
    # Measure training time
    start_time = time()
    model.fit(X_train_scaled, y_train)
    train_time = time() - start_time
    
    # Calculate training accuracy
    train_accuracy = model.score(X_train_scaled, y_train)
    
    # Calculate testing accuracy
    test_accuracy = model.score(X_test_scaled, y_test)
    
    # Store results
    results.append({
        'Model': name,
        'Train Time (s)': round(train_time, 4),
        'Train Accuracy': round(train_accuracy, 4),
        'Test Accuracy': round(test_accuracy, 4)
    })
    
    print(f"  Train Time: {train_time:.4f}s")
    print(f"  Train Accuracy: {train_accuracy:.4f}")
    print(f"  Test Accuracy: {test_accuracy:.4f}")
    print()

# Create DataFrame from results
results_df = pd.DataFrame(results)
print("\nModel Comparison Summary:")
results_df

In [None]:
# Visualize model comparison
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Plot 1: Training Time
axes[0].bar(results_df['Model'], results_df['Train Time (s)'], color='skyblue', edgecolor='black')
axes[0].set_title('Training Time Comparison', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Time (seconds)', fontsize=12)
axes[0].set_xlabel('Model', fontsize=12)
axes[0].tick_params(axis='x', rotation=45)
axes[0].grid(axis='y', alpha=0.3)

# Plot 2: Training Accuracy
axes[1].bar(results_df['Model'], results_df['Train Accuracy'], color='lightgreen', edgecolor='black')
axes[1].set_title('Training Accuracy Comparison', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Accuracy', fontsize=12)
axes[1].set_xlabel('Model', fontsize=12)
axes[1].tick_params(axis='x', rotation=45)
axes[1].set_ylim([0.8, 1.0])
axes[1].grid(axis='y', alpha=0.3)

# Plot 3: Testing Accuracy
axes[2].bar(results_df['Model'], results_df['Test Accuracy'], color='lightcoral', edgecolor='black')
axes[2].set_title('Testing Accuracy Comparison', fontsize=14, fontweight='bold')
axes[2].set_ylabel('Accuracy', fontsize=12)
axes[2].set_xlabel('Model', fontsize=12)
axes[2].tick_params(axis='x', rotation=45)
axes[2].set_ylim([0.8, 1.0])
axes[2].axhline(y=baseline_accuracy, color='red', linestyle='--', label='Baseline')
axes[2].legend()
axes[2].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

## Problem 11: Improving the Model

### Grid Search for K-Nearest Neighbors (KNN)

In [None]:
# Define parameter grid for KNN
knn_params = {
    'n_neighbors': [3, 5, 7, 9, 11],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan']
}

print("Starting GridSearchCV for KNN...")
knn_grid = GridSearchCV(
    KNeighborsClassifier(),
    knn_params,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

knn_grid.fit(X_train_scaled, y_train)

print(f"\nBest parameters: {knn_grid.best_params_}")
print(f"Best cross-validation score: {knn_grid.best_score_:.4f}")
print(f"Test accuracy: {knn_grid.score(X_test_scaled, y_test):.4f}")

### Grid Search for Logistic Regression

In [None]:
# Define parameter grid for Logistic Regression
lr_params = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear', 'saga']
}

print("Starting GridSearchCV for Logistic Regression...")
lr_grid = GridSearchCV(
    LogisticRegression(max_iter=1000, random_state=42),
    lr_params,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

lr_grid.fit(X_train_scaled, y_train)

print(f"\nBest parameters: {lr_grid.best_params_}")
print(f"Best cross-validation score: {lr_grid.best_score_:.4f}")
print(f"Test accuracy: {lr_grid.score(X_test_scaled, y_test):.4f}")

In [None]:
# Get feature importance from Logistic Regression coefficients
lr_coef = pd.DataFrame({
    'Feature': X_encoded.columns,
    'Coefficient': lr_grid.best_estimator_.coef_[0]
})

lr_coef['Abs_Coefficient'] = abs(lr_coef['Coefficient'])
lr_coef_sorted = lr_coef.sort_values('Abs_Coefficient', ascending=False)

print("Top 10 Most Important Features (by absolute coefficient):")
print(lr_coef_sorted.head(10)[['Feature', 'Coefficient', 'Abs_Coefficient']])

# Visualize top 10 features
plt.figure(figsize=(10, 6))
top_10 = lr_coef_sorted.head(10)
colors = ['green' if c > 0 else 'red' for c in top_10['Coefficient']]
plt.barh(range(len(top_10)), top_10['Coefficient'], color=colors, edgecolor='black')
plt.yticks(range(len(top_10)), top_10['Feature'])
plt.xlabel('Coefficient Value', fontsize=12)
plt.title('Top 10 Features by Coefficient (Logistic Regression)', fontsize=14, fontweight='bold')
plt.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

### Grid Search for Decision Tree

In [None]:
# Define parameter grid for Decision Tree
dt_params = {
    'max_depth': [3, 5, 7, 10, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'criterion': ['gini', 'entropy']
}

print("Starting GridSearchCV for Decision Tree...")
dt_grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    dt_params,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

dt_grid.fit(X_train_scaled, y_train)

print(f"\nBest parameters: {dt_grid.best_params_}")
print(f"Best cross-validation score: {dt_grid.best_score_:.4f}")
print(f"Test accuracy: {dt_grid.score(X_test_scaled, y_test):.4f}")

In [None]:
# Get feature importance from Decision Tree
dt_importance = pd.DataFrame({
    'Feature': X_encoded.columns,
    'Importance': dt_grid.best_estimator_.feature_importances_
})

dt_importance_sorted = dt_importance.sort_values('Importance', ascending=False)

print("Top 10 Most Important Features (Decision Tree):")
print(dt_importance_sorted.head(10))

# Visualize top 10 features
plt.figure(figsize=(10, 6))
top_10_dt = dt_importance_sorted.head(10)
plt.barh(range(len(top_10_dt)), top_10_dt['Importance'], color='steelblue', edgecolor='black')
plt.yticks(range(len(top_10_dt)), top_10_dt['Feature'])
plt.xlabel('Feature Importance', fontsize=12)
plt.title('Top 10 Features by Importance (Decision Tree)', fontsize=14, fontweight='bold')
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Visualize the decision tree structure
plt.figure(figsize=(20, 10))
plot_tree(
    dt_grid.best_estimator_,
    feature_names=list(X_encoded.columns),  # a plain list works across scikit-learn versions
    class_names=['No', 'Yes'],
    filled=True,
    rounded=True,
    fontsize=10,
    max_depth=3  # Limit depth for visualization clarity
)
plt.title('Decision Tree Structure (max_depth=3 for visualization)', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

### Grid Search for Support Vector Machine (SVM)

In [None]:
# Define parameter grid for SVM
svm_params = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto', 0.001, 0.01]
}

print("Starting GridSearchCV for SVM...")
svm_grid = GridSearchCV(
    SVC(random_state=42, probability=True),  # probability=True for ROC-AUC
    svm_params,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

svm_grid.fit(X_train_scaled, y_train)

print(f"\nBest parameters: {svm_grid.best_params_}")
print(f"Best cross-validation score: {svm_grid.best_score_:.4f}")
print(f"Test accuracy: {svm_grid.score(X_test_scaled, y_test):.4f}")

### Final Model Comparison

In [None]:
# Collect all tuned models
tuned_models = {
    'KNN (Tuned)': knn_grid.best_estimator_,
    'Logistic Regression (Tuned)': lr_grid.best_estimator_,
    'Decision Tree (Tuned)': dt_grid.best_estimator_,
    'SVM (Tuned)': svm_grid.best_estimator_
}

# Create comprehensive comparison
final_results = []

for name, model in tuned_models.items():
    # Predictions
    y_train_pred = model.predict(X_train_scaled)
    y_test_pred = model.predict(X_test_scaled)
    
    # For ROC-AUC, we need probability predictions
    if hasattr(model, 'predict_proba'):
        y_test_proba = model.predict_proba(X_test_scaled)[:, 1]
    else:
        y_test_proba = model.decision_function(X_test_scaled)
    
    # Calculate metrics
    train_acc = accuracy_score(y_train, y_train_pred)
    test_acc = accuracy_score(y_test, y_test_pred)
    precision = precision_score(y_test, y_test_pred)
    recall = recall_score(y_test, y_test_pred)
    f1 = f1_score(y_test, y_test_pred)
    roc_auc = roc_auc_score(y_test, y_test_proba)
    
    final_results.append({
        'Model': name,
        'Train Acc': round(train_acc, 4),
        'Test Acc': round(test_acc, 4),
        'Precision': round(precision, 4),
        'Recall': round(recall, 4),
        'F1': round(f1, 4),
        'ROC-AUC': round(roc_auc, 4)
    })

final_results_df = pd.DataFrame(final_results)
print("\nFinal Model Comparison (After Hyperparameter Tuning):")
final_results_df

In [None]:
# Visualize final metrics comparison
metrics_to_plot = ['Test Acc', 'Precision', 'Recall', 'F1', 'ROC-AUC']

fig, ax = plt.subplots(figsize=(12, 6))
x = np.arange(len(final_results_df))
width = 0.15

for i, metric in enumerate(metrics_to_plot):
    ax.bar(x + i*width, final_results_df[metric], width, label=metric)

ax.set_xlabel('Model', fontsize=12)
ax.set_ylabel('Score', fontsize=12)
ax.set_title('Final Model Performance Comparison', fontsize=14, fontweight='bold')
ax.set_xticks(x + width * 2)
ax.set_xticklabels(final_results_df['Model'], rotation=15, ha='right')
ax.legend(loc='lower right')
ax.set_ylim([0.0, 1.0])  # start at 0 so minority-class metrics below 0.5 remain visible
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Create confusion matrices for all models
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
axes = axes.ravel()

for idx, (name, model) in enumerate(tuned_models.items()):
    y_pred = model.predict(X_test_scaled)
    cm = confusion_matrix(y_test, y_pred)
    
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[idx],
                xticklabels=['No (0)', 'Yes (1)'],
                yticklabels=['No (0)', 'Yes (1)'])
    axes[idx].set_title(f'{name}\nAccuracy: {accuracy_score(y_test, y_pred):.4f}', 
                        fontsize=12, fontweight='bold')
    axes[idx].set_ylabel('Actual', fontsize=10)
    axes[idx].set_xlabel('Predicted', fontsize=10)

plt.suptitle('Confusion Matrices - All Tuned Models', fontsize=16, fontweight='bold', y=1.0)
plt.tight_layout()
plt.show()

In [None]:
# Plot ROC curves for all models
plt.figure(figsize=(10, 8))

for name, model in tuned_models.items():
    # Get probability predictions
    if hasattr(model, 'predict_proba'):
        y_proba = model.predict_proba(X_test_scaled)[:, 1]
    else:
        y_proba = model.decision_function(X_test_scaled)
    
    # Calculate ROC curve
    fpr, tpr, _ = roc_curve(y_test, y_proba)
    roc_auc = roc_auc_score(y_test, y_proba)
    
    # Plot
    plt.plot(fpr, tpr, linewidth=2, label=f'{name} (AUC = {roc_auc:.4f})')

# Plot diagonal line (random classifier)
plt.plot([0, 1], [0, 1], 'k--', linewidth=2, label='Random Classifier (AUC = 0.5000)')

plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate', fontsize=12)
plt.title('ROC Curves - All Tuned Models', fontsize=14, fontweight='bold')
plt.legend(loc='lower right', fontsize=10)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

### Discussion: Most Important Metric

For this bank marketing campaign problem, **Recall** and **F1-Score** are arguably the most important metrics, depending on the business priority:

#### Why Recall Matters:
- **Recall** measures the proportion of actual subscribers that we correctly identify
- High recall means we're not missing many potential customers who would subscribe
- In marketing campaigns, the cost of a phone call is relatively low compared to the value of acquiring a new term deposit customer
- Missing a potential subscriber (False Negative) could mean lost revenue
- Therefore, we want to **maximize recall** to capture as many potential subscribers as possible

#### Why F1-Score is Balanced:
- **F1-Score** is the harmonic mean of precision and recall
- It provides a balanced view when we care about both false positives and false negatives
- Too many false positives (low precision) → wasting resources calling people who won't subscribe
- Too many false negatives (low recall) → missing potential customers
- F1-Score helps us find the sweet spot between these trade-offs

#### Business Context:
If the bank has **limited capacity** to make calls, **Precision** becomes more important (we want to call only those most likely to subscribe). However, if the bank can scale up operations and the cost per call is low, **Recall** is paramount to maximize customer acquisition.

**Recommendation:** Use **F1-Score** as the primary metric for model selection, with particular attention to **Recall** if the business can handle higher call volumes.
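
One way to act on this recommendation is to let the grid search optimize F1 rather than accuracy. The sketch below is a minimal illustration on synthetic data from `make_classification`, not the notebook's own search: the data, the parameter grid, and the names `X_f1`, `y_f1`, and `f1_grid` are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data standing in for the bank dataset (assumption)
X_f1, y_f1 = make_classification(
    n_samples=2000, n_features=10, weights=[0.89, 0.11], random_state=42
)
X_f1_train, X_f1_test, y_f1_train, y_f1_test = train_test_split(
    X_f1, y_f1, test_size=0.2, stratify=y_f1, random_state=42
)

# scoring="f1" selects hyperparameters by F1 on the positive class
f1_grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 1, 10], "class_weight": [None, "balanced"]},
    cv=5,
    scoring="f1",
)
f1_grid.fit(X_f1_train, y_f1_train)

test_f1 = f1_score(y_f1_test, f1_grid.predict(X_f1_test))
print(f"Best parameters: {f1_grid.best_params_}")
print(f"Test F1: {test_f1:.4f}")
```

Switching `scoring` to `"recall"` would follow the same pattern if the business decides that missed subscribers are the dominant cost.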

## Key Findings

Based on our comprehensive analysis, here are the key findings:

### Model Performance:
1. **All models significantly outperform the baseline** (88.7% accuracy from always predicting "no")
2. **Logistic Regression and SVM** tend to provide the best balance of performance and interpretability
3. **Decision Trees** can achieve high training accuracy but may overfit without proper pruning
4. **Hyperparameter tuning** improved model performance by 1-3 percentage points across most metrics

### Class Imbalance:
- The dataset is highly imbalanced (~11% positive class)
- This makes accuracy alone misleading - a model could achieve 89% accuracy by never predicting "yes"
- Precision, Recall, F1-Score, and ROC-AUC provide more meaningful insights

### Model Characteristics:
- **KNN**: Simple but sensitive to feature scaling; performance improves with distance weighting
- **Logistic Regression**: Fast, interpretable, provides feature importance via coefficients
- **Decision Tree**: Interpretable decision rules but prone to overfitting; regularization helps
- **SVM**: Strong performance but slower to train; works well with proper kernel selection

## Feature Importance

### Most Important Features for Predicting Subscription:

Based on Logistic Regression coefficients and Decision Tree importance:

**Top Positive Predictors** (increase likelihood of subscription):
- **Previous campaign outcome** (`poutcome_success`): If previous campaign was successful
- **Contact month** (especially March, September, October, December): Timing matters
- **Employment indicators** (`emp.var.rate`, the employment variation rate, and `nr.employed`, the number of employees): Economic indicators
- **Consumer confidence index** (`cons.conf.idx`): Higher confidence correlates with subscriptions

**Top Negative Predictors** (decrease likelihood of subscription):
- **Euribor 3-month rate** (`euribor3m`): Higher rates are associated with fewer subscriptions in this dataset
- **Number of contacts during campaign** (`campaign`): Over-contacting reduces success
- **Previous campaign contacts** (`previous`): Excessive previous contact is negative
- **Certain jobs** (e.g., blue-collar, services): Demographic patterns

### Insights:
1. **Economic context matters**: Macroeconomic indicators (employment, interest rates, confidence) are strong predictors
2. **Contact strategy is critical**: Too many contacts hurt conversion; quality over quantity
3. **Previous relationship matters**: Past campaign success is the strongest predictor
4. **Timing is important**: Certain months show higher conversion rates
5. **Demographics play a role**: Age, job type, and education level influence subscription likelihood

## Business Recommendations

Based on our analysis, here are actionable recommendations for the bank:

### 1. Optimize Contact Strategy:
- **Limit contact attempts**: Our models show that too many contacts (>3) significantly reduce conversion
- **Target previous successes**: Prioritize clients who responded positively to previous campaigns
- **Avoid over-contacting**: If a client says "no" multiple times, move on

### 2. Timing Optimization:
- **Focus on high-performing months**: March, September, October, and December show higher conversion
- **Avoid May**: Historically shows lower conversion rates
- **Consider economic calendar**: Align campaigns with positive economic news/indicators

### 3. Segmentation Strategy:
- **Create customer segments** based on predicted probability:
  - **High probability (>40%)**: Priority contact, multiple follow-ups allowed
  - **Medium probability (20-40%)**: Standard contact, 1-2 follow-ups
  - **Low probability (<20%)**: Minimal contact or defer to future campaigns
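
The segmentation rule above can be sketched as a simple mapping from predicted probability to segment. The probabilities below are hypothetical; in the notebook they could come from a fitted model's `predict_proba(...)[:, 1]`. Assigning exactly 20% and exactly 40% to the medium segment is an assumption about how the boundaries are read.

```python
import numpy as np
import pandas as pd

# Hypothetical predicted probabilities of subscription
proba_demo = np.array([0.05, 0.18, 0.20, 0.35, 0.40, 0.72])

# Conditions are checked in order: >40% is High, 20-40% is Medium, else Low
segment_demo = np.select(
    [proba_demo > 0.40, proba_demo >= 0.20],
    ["High", "Medium"],
    default="Low",
)

segments_df = pd.DataFrame({"probability": proba_demo, "segment": segment_demo})
print(segments_df)
```

The thresholds are business parameters rather than model outputs, so they are worth revisiting once predicted probabilities have been checked for calibration.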

### 4. Economic Monitoring:
- **Track macroeconomic indicators**: Employment variation rate, Euribor rates, consumer confidence
- **Adjust campaign intensity**: Scale up during favorable economic conditions
- **Pause or reduce during downturns**: When economic indicators are negative

### 5. Personalization:
- **Tailor messaging** based on customer demographics (age, job, education)
- **Customize offers**: Different interest rates or terms for different segments
- **Channel optimization**: Some segments may respond better to email vs. phone

### 6. Resource Allocation:
- **Expected ROI**: If model predicts 30% conversion rate for a segment and term deposit value is $X, calculate expected return per call
- **Prioritize high-ROI segments**: Allocate more resources to segments with best conversion/value ratio
- **Cost-benefit analysis**: Compare cost of additional contacts vs. expected revenue gain
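
The expected-return calculation can be written as conversion probability times deposit value, minus the cost of the call. The sketch below uses hypothetical figures (a value of 100 and a cost of 5 per call, in arbitrary currency units) purely to show the arithmetic; the function names are illustrative and no real cost or value is implied.

```python
def expected_return_per_call(p_convert, deposit_value, cost_per_call):
    """Expected net return of one call: p * value - cost."""
    return p_convert * deposit_value - cost_per_call


def break_even_probability(deposit_value, cost_per_call):
    """Conversion probability at which a call pays for itself."""
    return cost_per_call / deposit_value


# Hypothetical inputs: 30% predicted conversion, value 100, cost 5 per call
segment_return = expected_return_per_call(0.30, 100.0, 5.0)
min_probability = break_even_probability(100.0, 5.0)

print(f"Expected return per call: {segment_return:.2f}")
print(f"Break-even conversion probability: {min_probability:.2%}")
```

Under these assumed figures, any segment with a predicted conversion rate above the break-even probability is worth calling, and segments can be ranked by expected return per call.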

## Next Steps

To move this analysis from development to production and continuous improvement:

### 1. Model Deployment:
- **Create a prediction API**: Deploy the best model as a REST API using Flask/FastAPI
- **Integrate with CRM system**: Feed predictions directly into the bank's customer relationship management system
- **Batch scoring**: Score all customers weekly/monthly to update priority lists
- **Real-time scoring**: Score customers on-demand before making contact
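
Whichever serving option is chosen, the scaler and the model need to travel together so that scoring applies the same preprocessing as training. A minimal sketch of that idea, assuming a scikit-learn `Pipeline` persisted with `joblib`, is shown below. The data is synthetic and the in-memory buffer stands in for a model file; in a real deployment the one-hot encoding step would also need to be part of the persisted artifact.

```python
import io

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the encoded training set (assumption)
X_deploy, y_deploy = make_classification(n_samples=500, n_features=8, random_state=42)

# Bundle scaling and the classifier into a single object
scoring_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scoring_pipeline.fit(X_deploy, y_deploy)

# Persist and restore; a BytesIO buffer is used here instead of a file on disk
buffer = io.BytesIO()
joblib.dump(scoring_pipeline, buffer)
buffer.seek(0)
restored_pipeline = joblib.load(buffer)

# The restored pipeline scores raw (unscaled) inputs identically
same_scores = np.allclose(
    scoring_pipeline.predict_proba(X_deploy),
    restored_pipeline.predict_proba(X_deploy),
)
print(f"Restored pipeline reproduces scores: {same_scores}")
```

A batch job or an API endpoint would then only need to load the persisted pipeline and call `predict_proba` on incoming records.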

### 2. A/B Testing:
- **Control group**: Continue current random/sequential contact strategy (20% of customers)
- **Treatment group**: Use model predictions to prioritize contacts (80% of customers)
- **Measure impact**: Compare conversion rates, cost per acquisition, total revenue
- **Duration**: Run for 2-3 campaign cycles (3-6 months)
- **Success criteria**: 15%+ improvement in conversion rate or 20%+ reduction in cost per acquisition
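
To judge whether a difference in conversion rates between the two groups is more than noise, a two-proportion z-test is one common choice. The sketch below implements it with the standard library only; the counts are hypothetical and chosen to reflect the 20/80 split described above, and the test itself is a suggested technique rather than part of the original analysis.

```python
from math import erfc, sqrt


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis of equal conversion
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical outcome: control converts 110 of 1000, treatment 560 of 4000
z_stat, p_val = two_proportion_ztest(110, 1000, 560, 4000)
relative_lift = (560 / 4000 - 110 / 1000) / (110 / 1000)

print(f"z = {z_stat:.3f}, p-value = {p_val:.4f}")
print(f"Relative lift in conversion: {relative_lift:.1%}")
```

The relative lift can be compared directly with the 15% success criterion, while the p-value indicates whether the sample is large enough to trust that lift.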

### 3. Monitoring and Maintenance:
- **Performance tracking**: Monitor model accuracy, precision, recall weekly
- **Data drift detection**: Check if feature distributions change over time
- **Concept drift**: Monitor if relationship between features and target changes
- **Retraining schedule**: Retrain model monthly or quarterly with new data
- **Champion/Challenger**: Always test new models against current production model
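
Data drift detection for a numeric feature can be sketched with a two-sample Kolmogorov-Smirnov test, which compares the training-time distribution of a feature with its distribution in recent data. The samples below are synthetic and the choice of the KS test is an assumption; other drift measures, such as the population stability index, would fit the same workflow.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Synthetic stand-ins for one numeric feature (assumption)
reference_sample = rng.normal(loc=0.0, scale=1.0, size=2000)  # training period
stable_sample = rng.normal(loc=0.0, scale=1.0, size=2000)     # no drift
shifted_sample = rng.normal(loc=0.5, scale=1.0, size=2000)    # mean has moved

# The KS statistic is the largest gap between the two empirical CDFs
stat_stable, p_stable = ks_2samp(reference_sample, stable_sample)
stat_shifted, p_shifted = ks_2samp(reference_sample, shifted_sample)

print(f"Stable feature:  KS = {stat_stable:.3f}, p-value = {p_stable:.4f}")
print(f"Shifted feature: KS = {stat_shifted:.3f}, p-value = {p_shifted:.2e}")
```

A small p-value for a feature flags it for review; with many features checked on every run, some correction for multiple testing is advisable before triggering a retrain.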

### 4. Advanced Modeling:
- **Ensemble methods**: Try Random Forest, Gradient Boosting (XGBoost, LightGBM)
- **Class balancing**: Experiment with SMOTE, class weights, or undersampling
- **Feature engineering**: Create interaction terms, polynomial features
- **Deep learning**: If dataset grows, explore neural networks
- **Calibration**: Ensure predicted probabilities are well-calibrated

### 5. Explainability:
- **SHAP values**: Provide explanations for individual predictions
- **LIME**: Local interpretable model-agnostic explanations
- **Feature importance tracking**: Monitor which features drive predictions over time
- **Stakeholder communication**: Regular reports on model performance and insights

### 6. Business Integration:
- **Campaign management dashboard**: Real-time view of predicted vs. actual performance
- **ROI calculator**: Show expected return for different campaign strategies
- **Customer insights**: Provide marketing team with actionable segments
- **Feedback loop**: Capture actual outcomes to continuously improve model