# Term Deposit Marketing Prediction Models

This notebook builds two predictive models for term deposit marketing:
1. **Pre-Call Model**: Predicts which customers to call before making any calls (excludes campaign-related features)
2. **Post-Call Model**: Predicts which customers to focus on after initial contact (includes all features)

For each model, we'll use PyCaret to identify the top 3 performing models, then evaluate each in detail with classification reports, confusion matrices, and observations.

## 1. Setup and Data Preparation

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    roc_curve,
    precision_recall_curve,
    auc,
)
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.impute import SimpleImputer

# Import PyCaret for model comparison (replacing LazyPredict)
import pycaret
from pycaret.classification import *

# Import models for detailed evaluation
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (
    RandomForestClassifier,
    GradientBoostingClassifier,
    AdaBoostClassifier,
)
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Import sampling techniques for class imbalance
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from imblearn.combine import SMOTETomek

# Set display options
pd.set_option("display.max_columns", None)
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)


In [None]:
# Load the dataset
data = pd.read_csv('term-deposit-marketing-2020.csv')
print(f"Dataset shape: {data.shape}")
print(f"\nColumns: {data.columns.tolist()}")
data.head()


In [None]:
# Data exploration
print(f"Dataset shape: {data.shape}")
print(f"\nTarget variable distribution:\n{data['y'].value_counts()}")
print(f"\nPercentage of subscribers: {data['y'].value_counts(normalize=True)['yes']*100:.2f}%")

# Check for missing values
print(f"\nMissing values:\n{data.isnull().sum()}")

# Basic statistics
print(f"\nDataset info:")
data.info()


### Data Preprocessing

In [None]:
# Convert target variable to binary (0/1)
data['y'] = data['y'].map({'no': 0, 'yes': 1})

# Split features and target
X = data.drop('y', axis=1)
y = data['y']

# Identify categorical and numerical features
categorical_features = X.select_dtypes(include=['object']).columns.tolist()
numerical_features = X.select_dtypes(include=['int64', 'float64']).columns.tolist()

print(f"Categorical features: {categorical_features}")
print(f"Numerical features: {numerical_features}")

# Analyze class imbalance
class_distribution = y.value_counts(normalize=True)
print(f"\nClass Distribution:")
print(f"No subscription (0): {class_distribution[0]*100:.2f}%")
print(f"Subscription (1): {class_distribution[1]*100:.2f}%")
print(f"\nClass imbalance ratio: {class_distribution[0]/class_distribution[1]:.2f}:1")


## 2. Feature Selection for Both Models

In [None]:
# Define campaign-related features to exclude from Model 1
# These features are only available AFTER making calls
campaign_features = ['duration', 'campaign', 'day', 'month']

# Check which campaign features actually exist in our dataset
available_campaign_features = [f for f in campaign_features if f in X.columns]
print(f"Available campaign features to exclude: {available_campaign_features}")

# Model 1: Pre-Call Model (excluding campaign-related features)
X1 = X.drop(available_campaign_features, axis=1, errors='ignore')
y1 = y

# Model 2: Post-Call Model (including all features)
X2 = X
y2 = y

print(f"\nModel 1 (Pre-Call) features ({len(X1.columns)}): {X1.columns.tolist()}")
print(f"\nModel 2 (Post-Call) features ({len(X2.columns)}): {X2.columns.tolist()}")

print(f"\nFeatures excluded from Model 1: {available_campaign_features}")


In [None]:
# Prepare datasets for both models
# Model 1: Pre-Call Dataset
data1 = X1.copy()
data1['y'] = y1

# Model 2: Post-Call Dataset  
data2 = X2.copy()
data2['y'] = y2

print(f"Model 1 (Pre-Call) dataset shape: {data1.shape}")
print(f"Model 2 (Post-Call) dataset shape: {data2.shape}")

# Split data for traditional sklearn approach (backup)
X1_train, X1_test, y1_train, y1_test = train_test_split(X1, y1, test_size=0.2, random_state=42, stratify=y1)
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.2, random_state=42, stratify=y2)

print(f"\nTraditional split:")
print(f"Model 1 - Training: {X1_train.shape}, Test: {X1_test.shape}")
print(f"Model 2 - Training: {X2_train.shape}, Test: {X2_test.shape}")


In [None]:
# Simple preprocessing function for sklearn models
def preprocess_data(X_train, X_test, categorical_cols, numerical_cols):
    """
    Simple preprocessing for sklearn models
    """
    X_train_processed = X_train.copy()
    X_test_processed = X_test.copy()
    
    # Handle categorical variables with label encoding
    label_encoders = {}
    for col in categorical_cols:
        if col in X_train_processed.columns:
            le = LabelEncoder()
            X_train_processed[col] = le.fit_transform(X_train_processed[col].astype(str))
            X_test_processed[col] = le.transform(X_test_processed[col].astype(str))
            label_encoders[col] = le
    
    # Handle numerical variables
    scaler = StandardScaler()
    if numerical_cols:
        num_cols_present = [col for col in numerical_cols if col in X_train_processed.columns]
        if num_cols_present:
            X_train_processed[num_cols_present] = scaler.fit_transform(X_train_processed[num_cols_present])
            X_test_processed[num_cols_present] = scaler.transform(X_test_processed[num_cols_present])
    
    return X_train_processed, X_test_processed, label_encoders, scaler


## 3. Model 1: Pre-Call Prediction Using PyCaret

First, we'll use PyCaret to compare multiple models and identify the top performers for pre-call prediction. This model will help identify which customers to call BEFORE making any calls.
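
As a rough illustration of the selection step, the sketch below ranks candidate models by their mean cross-validated F1 and keeps the top three, which is the idea behind calling `compare_models` with `sort='F1'`. The per-fold scores and the helper name `top_k_by_mean_f1` are made up for illustration; they are not results from this dataset or part of PyCaret's API.

```python
from statistics import mean

# Illustrative per-fold F1 scores (made-up numbers, not results from this notebook)
fold_f1 = {
    "lr": [0.10, 0.12, 0.11],
    "rf": [0.20, 0.22, 0.21],
    "gbc": [0.18, 0.19, 0.17],
    "dt": [0.23, 0.15, 0.13],
}

def top_k_by_mean_f1(scores, k=3):
    """Rank model IDs by mean cross-validated F1 (highest first) and keep k."""
    ranked = sorted(scores, key=lambda name: mean(scores[name]), reverse=True)
    return ranked[:k]

top_models = top_k_by_mean_f1(fold_f1, k=3)
print(top_models)
```

Once the real comparison table is available, a ranking like this could replace a hardcoded list of model IDs.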

In [None]:
# Setup PyCaret environment for Model 1 (Pre-Call)
print("Setting up PyCaret for Model 1 (Pre-Call Prediction)...")
print(f"Dataset shape: {data1.shape}")
print(f"Target distribution:\n{data1['y'].value_counts()}")

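# Setup PyCaret classification environment
# Note: `silent` is omitted because PyCaret 3.x no longer accepts it, and this
# notebook reads the 3.x `prediction_label` / `prediction_score` columns.
clf1 = setup(
    data=data1,
    target='y',
    session_id=123,
    train_size=0.8,
    use_gpu=False
)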

print("\nPyCaret setup completed for Model 1!")


In [None]:
# Compare multiple models using PyCaret
print("Comparing multiple models for Pre-Call Prediction...")
top3_models1 = compare_models(
    include=['lr', 'rf', 'et', 'gbc', 'xgboost', 'lightgbm', 'ada', 'dt', 'nb'],
    sort='F1',
    n_select=3,  # return the three best fitted estimators
    verbose=False
)

# compare_models returns estimators; pull() retrieves the score grid it produced
models1_comparison = pull()

print("\nModel comparison for Pre-Call Prediction (sorted by F1 Score):")
print(models1_comparison)


### Detailed Evaluation of Top 3 Models for Pre-Call Prediction

Now we'll evaluate the top 3 models in detail with classification reports, confusion matrices, and ROC AUC. We'll focus on business metrics: capturing likely subscribers while avoiding unnecessary calls.
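
To make the business metrics concrete, here is a small worked example that derives them from the four confusion-matrix counts. The counts are invented for illustration, and `call_metrics` is a hypothetical helper, not part of scikit-learn or PyCaret.

```python
def call_metrics(tp, fp, tn, fn):
    """Derive call-centre metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    calls = tp + fp            # customers the model says to call
    subscribers = tp + fn      # customers who actually subscribe
    precision = tp / calls if calls else 0.0            # call efficiency
    recall = tp / subscribers if subscribers else 0.0   # subscriber capture rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "call_rate": calls / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Illustrative counts only (not output from this notebook)
metrics = call_metrics(tp=30, fp=20, tn=900, fn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 93% even though the model finds only 37.5% of subscribers, which is why precision and recall are reported alongside accuracy.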

In [None]:
# Function to evaluate PyCaret models with detailed metrics and confusion matrix
def evaluate_pycaret_model(model_name, model, test_data=None):
    """
    Evaluate a PyCaret model with detailed business-focused metrics
    """
    print(f"\n{'='*60}")
    print(f"EVALUATING {model_name.upper()}")
    print(f"{'='*60}")
    
    # Get predictions (on the supplied data, otherwise on PyCaret's holdout set)
    if test_data is not None:
        predictions = predict_model(model, data=test_data)
        y_true = test_data['y']
    else:
        predictions = predict_model(model)
        y_true = predictions['y']
    y_pred = predictions['prediction_label']
    # prediction_score is the probability of the *predicted* label, so convert
    # it to the probability of the positive class before computing ROC AUC
    score = predictions['prediction_score']
    y_pred_proba = np.where(y_pred == 1, score, 1 - score)
    
    # Calculate confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    tn, fp, fn, tp = cm.ravel()
    
    # Calculate metrics
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred, zero_division=0)
    f1 = f1_score(y_true, y_pred, zero_division=0)
    roc_auc = roc_auc_score(y_true, y_pred_proba)
    
    # Business metrics
    total_customers = len(y_true)
    actual_subscribers = sum(y_true)
    predicted_to_call = sum(y_pred)
    
    # Print classification report
    print(f"\nClassification Report:")
    print(classification_report(y_true, y_pred, target_names=['No Subscription', 'Subscription']))
    
    # Plot confusion matrix
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                xticklabels=['No Subscription', 'Subscription'],
                yticklabels=['No Subscription', 'Subscription'])
    plt.title(f'Confusion Matrix - {model_name}')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    
    # Business interpretation
    print(f"\nBUSINESS METRICS:")
    print(f"Total Customers: {total_customers:,}")
    print(f"Actual Subscribers: {actual_subscribers:,} ({actual_subscribers/total_customers*100:.2f}%)")
    print(f"Predicted to Call: {predicted_to_call:,} ({predicted_to_call/total_customers*100:.2f}%)")
    print(f"\nCONFUSION MATRIX BREAKDOWN:")
    print(f"True Positives (TP): {tp:,} - Correctly identified subscribers")
    print(f"False Positives (FP): {fp:,} - Unnecessary calls (cost to company)")
    print(f"True Negatives (TN): {tn:,} - Correctly avoided non-subscribers")
    print(f"False Negatives (FN): {fn:,} - Missed potential subscribers (lost revenue)")
    
    print(f"\nPERFORMANCE METRICS:")
    print(f"Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")
    print(f"Precision: {precision:.4f} ({precision*100:.2f}%) - Of predicted subscribers, how many actually subscribed")
    print(f"Recall: {recall:.4f} ({recall*100:.2f}%) - Of actual subscribers, how many we identified")
    print(f"F1 Score: {f1:.4f} - Balance between precision and recall")
    print(f"ROC AUC: {roc_auc:.4f} - Overall model performance")
    
    # Business impact
    if predicted_to_call > 0:
        call_efficiency = tp / predicted_to_call
        print(f"\nBUSINESS IMPACT:")
        print(f"Call Efficiency: {call_efficiency:.4f} ({call_efficiency*100:.2f}%) - Success rate of calls")
        print(f"Subscriber Capture Rate: {recall:.4f} ({recall*100:.2f}%) - % of subscribers we'll reach")
    
    # Performance assessment
    print(f"\nPERFORMANCE ASSESSMENT:")
    if accuracy >= 0.80:
        print(f"✅ EXCELLENT: Accuracy {accuracy*100:.1f}% meets high performance target (≥80%)")
    elif accuracy >= 0.75:
        print(f"✅ GOOD: Accuracy {accuracy*100:.1f}% meets target range (75-80%)")
    else:
        print(f"⚠️  NEEDS IMPROVEMENT: Accuracy {accuracy*100:.1f}% below target (<75%)")
        print(f"   Consider addressing class imbalance with sampling techniques")
    
    return {
        'model_name': model_name,
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'roc_auc': roc_auc,
        'tp': tp, 'fp': fp, 'tn': tn, 'fn': fn,
        'total_customers': total_customers,
        'actual_subscribers': actual_subscribers,
        'predicted_to_call': predicted_to_call
    }


In [None]:
# Create and evaluate top 3 models for Pre-Call Prediction
print("\nCreating and evaluating top 3 models for Pre-Call Prediction...")

# Create top 3 models based on comparison results
# You can modify these based on the actual comparison results
top_models_1 = ['rf', 'gbc', 'xgboost']  # Modify based on comparison results
model1_results = []

for i, model_id in enumerate(top_models_1, 1):
    print(f"\nCreating Model {i}: {model_id}")
    model = create_model(model_id, verbose=False)
    
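    # Evaluate on the holdout set BEFORE finalizing: finalize_model refits on
    # the full dataset (holdout included), which would leak into the metrics
    model_name = f"Model {i} ({model_id.upper()})"
    results = evaluate_pycaret_model(model_name, model)
    
    # Finalize the model (trains on full dataset) for later use
    final_model = finalize_model(model)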
    model1_results.append(results)


### Summary of Pre-Call Model Performance

Let's compare the performance of our top 3 models for pre-call prediction:

In [None]:
# Create summary DataFrame for Model 1 results
model1_summary = pd.DataFrame(model1_results)
model1_summary = model1_summary.sort_values('f1_score', ascending=False)

print("\n" + "="*80)
print("MODEL 1 (PRE-CALL) - PERFORMANCE SUMMARY")
print("="*80)
print(model1_summary[['model_name', 'accuracy', 'precision', 'recall', 'f1_score', 'roc_auc']].round(4))

# Identify best model
best_model1 = model1_summary.iloc[0]
print(f"\n🏆 BEST PRE-CALL MODEL: {best_model1['model_name']}")
print(f"   F1 Score: {best_model1['f1_score']:.4f}")
print(f"   Accuracy: {best_model1['accuracy']:.4f} ({best_model1['accuracy']*100:.1f}%)")

# Business insights
print(f"\n📊 BUSINESS INSIGHTS FOR PRE-CALL MODELS:")
print(f"1. 🎯 PURPOSE: Identify which customers to call BEFORE making any campaign contact")
print(f"2. 📋 FEATURES: Uses only demographic and financial data (no campaign features)")
print(f"3. ⚖️  CLASS IMBALANCE: Only ~{best_model1['actual_subscribers']/best_model1['total_customers']*100:.1f}% of customers subscribe")
print(f"4. 💰 BUSINESS VALUE: Reduces unnecessary calls while capturing potential subscribers")
print(f"5. 🎯 CALL EFFICIENCY: {best_model1['tp']/(best_model1['tp']+best_model1['fp'])*100:.1f}% of predicted calls will be successful")

# Performance assessment
avg_accuracy = model1_summary['accuracy'].mean()
if avg_accuracy >= 0.75:
    print(f"\n✅ PERFORMANCE STATUS: Models meet target performance (avg accuracy: {avg_accuracy*100:.1f}%)")
else:
    print(f"\n⚠️  PERFORMANCE STATUS: Consider class imbalance techniques (avg accuracy: {avg_accuracy*100:.1f}%)")


### Class Imbalance Analysis and Handling

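If the models don't achieve 75-80% accuracy, we'll address the class imbalance using sampling techniques. Keep in mind that with roughly 7.5% subscribers, a model that always predicts "no" already reaches about 92.5% accuracy, so precision, recall, and F1 on the subscriber class are more informative than accuracy when judging whether balancing helps.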
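
For intuition about what a SMOTE-style oversampler does, the sketch below creates one synthetic minority example by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified, pure-Python illustration with made-up 2-D points; `smote_like_sample` is a hypothetical helper, and the real implementation is `imblearn.over_sampling.SMOTE`.

```python
import random

def smote_like_sample(minority, k=2, rng=None):
    """Return one synthetic point between a minority point and a near neighbour."""
    rng = rng or random.Random(0)
    i = rng.randrange(len(minority))
    base = minority[i]
    others = [p for j, p in enumerate(minority) if j != i]
    # Keep the k nearest minority neighbours by squared Euclidean distance
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    lam = rng.random()  # interpolation weight in [0, 1)
    return tuple(a + lam * (b - a) for a, b in zip(base, neighbour))

# Made-up minority-class points (for example, two scaled features)
minority_points = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
synthetic = smote_like_sample(minority_points, k=2, rng=random.Random(42))
print(synthetic)
```

Resampling of this kind should be applied to the training data only, so that the holdout set keeps the original class ratio.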

In [None]:
# Check if we need to address class imbalance
need_balancing = avg_accuracy < 0.75

if need_balancing:
    print("⚠️  ADDRESSING CLASS IMBALANCE")
    print("Current performance is below 75% target. Applying sampling techniques...")
    
    # Setup PyCaret with class balancing
    # (`silent` is omitted because PyCaret 3.x no longer accepts it)
    clf1_balanced = setup(
        data=data1,
        target='y',
        session_id=124,
        train_size=0.8,
        use_gpu=False,
        fix_imbalance=True  # resamples the training data; SMOTE is the default method
    )
    
    # Compare models with balanced data
    print("\nComparing models with balanced dataset...")
    models1_balanced = compare_models(
        include=['rf', 'gbc', 'xgboost', 'lr', 'et'],
        sort='F1',
        n_select=3,
        verbose=False
    )
    
    # Evaluate the best balanced model on the holdout set before finalizing,
    # so the holdout rows are not part of the data the scored model was fit on
    best_balanced_model = models1_balanced[0]
    balanced_name = f"Balanced {type(best_balanced_model).__name__}"
    balanced_results = evaluate_pycaret_model(balanced_name, best_balanced_model)
    final_balanced_model = finalize_model(best_balanced_model)
    
    print(f"\n📈 CHANGE WITH BALANCING:")
    print(f"Original Accuracy (average of top 3): {avg_accuracy*100:.1f}%")
    print(f"Accuracy with Balancing: {balanced_results['accuracy']*100:.1f}%")
    print(f"Change: {(balanced_results['accuracy'] - avg_accuracy)*100:.1f} percentage points")
    
else:
    print("✅ CLASS BALANCE: Current performance meets targets. No balancing needed.")
    print(f"Average accuracy: {avg_accuracy*100:.1f}% (Target: ≥75%)")

print(f"\n💡 WHY ONLY {data1['y'].mean()*100:.1f}% SUBSCRIBE?")
print("Possible reasons for low subscription rate:")
print("1. 📞 Cold calling - customers not expecting calls")
print("2. 💰 Economic factors - customers may not have disposable income")
print("3. 🎯 Targeting - may not be reaching the right customer segments")
print("4. 📋 Product fit - term deposits may not meet customer needs")
print("5. ⏰ Timing - calls may be at inconvenient times")
print("6. 🏦 Trust - customers may be hesitant about financial products")


## 4. Model 2: Post-Call Prediction Using PyCaret

Now we'll build the second model that includes all features, including campaign-related ones, to predict which customers to focus on after initial contact.

In [None]:
# Setup PyCaret environment for Model 2 (Post-Call)
print("Setting up PyCaret for Model 2 (Post-Call Prediction)...")
print(f"Dataset shape: {data2.shape}")
print(f"Target distribution:\n{data2['y'].value_counts()}")

# Setup PyCaret classification environment for Model 2
# (`silent` is omitted because PyCaret 3.x no longer accepts it)
clf2 = setup(
    data=data2,
    target='y',
    session_id=125,
    train_size=0.8,
    use_gpu=False
)

print("\nPyCaret setup completed for Model 2!")


In [None]:
# Compare multiple models using PyCaret for Model 2
print("Comparing multiple models for Post-Call Prediction...")
top3_models2 = compare_models(
    include=['lr', 'rf', 'et', 'gbc', 'xgboost', 'lightgbm', 'ada', 'dt', 'nb'],
    sort='F1',
    n_select=3,  # return the three best fitted estimators
    verbose=False
)

# compare_models returns estimators; pull() retrieves the score grid it produced
models2_comparison = pull()

print("\nModel comparison for Post-Call Prediction (sorted by F1 Score):")
print(models2_comparison)


### Detailed Evaluation of Top 3 Models for Post-Call Prediction

Now we'll evaluate the top 3 models in detail with classification reports, confusion matrices, and business metrics.

In [None]:
# Create and evaluate top 3 models for Post-Call Prediction
print("\nCreating and evaluating top 3 models for Post-Call Prediction...")

# Create top 3 models based on comparison results
top_models_2 = ['rf', 'gbc', 'xgboost']  # Modify based on comparison results
model2_results = []

for i, model_id in enumerate(top_models_2, 1):
    print(f"\nCreating Model {i}: {model_id}")
    model = create_model(model_id, verbose=False)
    
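    # Evaluate on the holdout set BEFORE finalizing: finalize_model refits on
    # the full dataset (holdout included), which would leak into the metrics
    model_name = f"Model {i} ({model_id.upper()})"
    results = evaluate_pycaret_model(model_name, model)
    
    # Finalize the model (trains on full dataset) for later use
    final_model = finalize_model(model)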
    model2_results.append(results)


### Summary of Post-Call Model Performance

Let's compare the performance of our top 3 models for post-call prediction:

In [None]:
# Create summary DataFrame for Model 2 results
model2_summary = pd.DataFrame(model2_results)
model2_summary = model2_summary.sort_values('f1_score', ascending=False)

print("\n" + "="*80)
print("MODEL 2 (POST-CALL) - PERFORMANCE SUMMARY")
print("="*80)
print(model2_summary[['model_name', 'accuracy', 'precision', 'recall', 'f1_score', 'roc_auc']].round(4))

# Identify best model
best_model2 = model2_summary.iloc[0]
print(f"\n🏆 BEST POST-CALL MODEL: {best_model2['model_name']}")
print(f"   F1 Score: {best_model2['f1_score']:.4f}")
print(f"   Accuracy: {best_model2['accuracy']:.4f} ({best_model2['accuracy']*100:.1f}%)")

# Business insights
print(f"\n📊 BUSINESS INSIGHTS FOR POST-CALL MODELS:")
print(f"1. 🎯 PURPOSE: Optimize follow-up after initial customer contact")
print(f"2. 📋 FEATURES: Includes ALL features including campaign data (duration, timing)")
print(f"3. 📈 PERFORMANCE: Should outperform pre-call models due to additional features")
print(f"4. 💰 BUSINESS VALUE: Focus resources on customers most likely to convert")
print(f"5. 🎯 CALL EFFICIENCY: {best_model2['tp']/(best_model2['tp']+best_model2['fp'])*100:.1f}% of predicted calls will be successful")

# Feature impact analysis
print(f"\n🔍 CAMPAIGN FEATURE IMPACT:")
print(f"• Call Duration: Likely strongest predictor of subscription")
print(f"• Call Timing: Day/month may affect customer receptiveness")
print(f"• Campaign Count: Number of contacts may indicate interest level")
print(f"• Previous Outcome: Historical response patterns")


## 5. Comparing Pre-Call and Post-Call Models

Let's compare the performance of the best models from both approaches:
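
When reading the improvement figures, it helps to separate a change in percentage points from a relative change in percent. The sketch below computes both for a pair of scores; the values are illustrative only, and `improvement` is a hypothetical helper name.

```python
def improvement(before, after):
    """Return (percentage-point change, relative change in percent)."""
    points = (after - before) * 100
    relative = (after - before) / before * 100  # assumes before != 0
    return points, relative

# Illustrative F1 scores only (not results from this notebook)
points, relative = improvement(before=0.40, after=0.50)
print(f"{points:.1f} percentage points, {relative:.1f}% relative")
# prints "10.0 percentage points, 25.0% relative"
```

The relative figure is undefined when the baseline score is zero, which can happen for F1 if a model never predicts a subscriber.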

In [None]:
# Create comparison between best models from both approaches
model1_summary_df = pd.DataFrame(model1_results)
model2_summary_df = pd.DataFrame(model2_results)

# Get best models (highest F1 score)
best_model1 = model1_summary_df.loc[model1_summary_df['f1_score'].idxmax()]
best_model2 = model2_summary_df.loc[model2_summary_df['f1_score'].idxmax()]

# Create comparison DataFrame
comparison_df = pd.DataFrame([
    {
        "Model Type": "Pre-Call Model",
        "Best Model": best_model1['model_name'],
        "Accuracy": best_model1['accuracy'],
        "Precision": best_model1['precision'],
        "Recall": best_model1['recall'],
        "F1 Score": best_model1['f1_score'],
        "ROC AUC": best_model1['roc_auc']
    },
    {
        "Model Type": "Post-Call Model",
        "Best Model": best_model2['model_name'],
        "Accuracy": best_model2['accuracy'],
        "Precision": best_model2['precision'],
        "Recall": best_model2['recall'],
        "F1 Score": best_model2['f1_score'],
        "ROC AUC": best_model2['roc_auc']
    }
])

print("\n" + "="*80)
print("FINAL MODEL COMPARISON")
print("="*80)
print(comparison_df.round(4))

# Calculate improvements
f1_improvement = ((best_model2['f1_score'] - best_model1['f1_score']) / best_model1['f1_score']) * 100
accuracy_improvement = ((best_model2['accuracy'] - best_model1['accuracy']) / best_model1['accuracy']) * 100
auc_improvement = ((best_model2['roc_auc'] - best_model1['roc_auc']) / best_model1['roc_auc']) * 100

print(f"\n📈 RELATIVE PERFORMANCE IMPROVEMENTS (Post-Call vs Pre-Call):")
print(f"F1 Score Improvement: {f1_improvement:.2f}% (relative)")
print(f"Accuracy Improvement: {accuracy_improvement:.2f}% (relative)")
print(f"ROC AUC Improvement: {auc_improvement:.2f}% (relative)")

# Business impact analysis
print(f"\n💼 BUSINESS IMPACT ANALYSIS:")
print(f"Pre-Call Model - Call Efficiency: {best_model1['tp']/(best_model1['tp']+best_model1['fp'])*100:.1f}%")
print(f"Post-Call Model - Call Efficiency: {best_model2['tp']/(best_model2['tp']+best_model2['fp'])*100:.1f}%")
print(f"\nRecommendation: {'Post-Call model scores higher on F1; use it to prioritize follow-ups and keep the Pre-Call model for deciding whom to call first' if best_model2['f1_score'] > best_model1['f1_score'] else 'Both models perform similarly on F1'}")


## 6. Conclusion and Recommendations

In this analysis, we built two predictive models for term deposit marketing:

### Pre-Call Model (Model 1)
- **Purpose**: Predict which customers to call before making any calls
- **Features Used**: Demographic and financial information only (excluding campaign-related features)
- **Best Model**: Based on PyCaret model comparison and detailed evaluation
- **Applications**: Prioritize customers for initial contact, optimize resource allocation
- **Target Performance**: 75-80% accuracy considered high performance

### Post-Call Model (Model 2)
- **Purpose**: Predict which customers to focus on after initial contact
- **Features Used**: All features including campaign-related ones (duration, day, month, campaign)
- **Best Model**: Based on PyCaret model comparison and detailed evaluation
- **Applications**: Optimize follow-up strategies, focus on high-potential customers

### Key Findings
1. **Class Imbalance**: Only ~7.5% of customers subscribe, making prediction challenging
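2. **Campaign Features Impact**: Campaign features such as call duration and call timing are expected to improve predictive performance; the comparison in Section 5 quantifies the difference
3. **Business Focus**: Models prioritize avoiding unnecessary calls while capturing subscribers
4. **PyCaret Workflow**: `compare_models` ranks many candidate models by cross-validated metrics in a single call (used here in place of LazyPredict)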
5. **Confusion Matrices**: Always provided for business interpretation
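
To see why the class imbalance in finding 1 makes accuracy a weak yardstick, the worked example below scores a trivial classifier that predicts "no subscription" for every customer. It uses the ~7.5% subscription rate quoted above on a hypothetical 1,000 customers; `always_no_baseline` is an illustrative helper name.

```python
def always_no_baseline(n_customers, n_subscribers):
    """Metrics for a classifier that predicts 'no subscription' for everyone."""
    tp = 0                                  # no subscriber is ever identified
    tn = n_customers - n_subscribers        # every non-subscriber is correct
    accuracy = (tp + tn) / n_customers
    recall = tp / n_subscribers             # assumes at least one subscriber
    specificity = tn / (n_customers - n_subscribers)
    balanced_accuracy = (recall + specificity) / 2
    return accuracy, recall, balanced_accuracy

# 7.5% subscribers, per 1,000 customers
accuracy, recall, balanced_accuracy = always_no_baseline(1000, 75)
print(f"accuracy={accuracy:.3f}, recall={recall:.1f}, "
      f"balanced accuracy={balanced_accuracy:.2f}")
```

The baseline reaches 92.5% accuracy while capturing no subscribers, so a useful model has to beat it on recall, F1, or balanced accuracy rather than on accuracy alone.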

### Business Recommendations
1. **🎯 Pre-Call Targeting**: Use Model 1 to identify high-potential customers before calling
2. **💰 Cost Reduction**: Focus human resources on customers with higher subscription probability
3. **📞 Post-Call Strategy**: Use Model 2 to determine follow-up priorities after initial contact
4. **⚖️ Class Imbalance**: Apply SMOTE or other resampling techniques if accuracy falls below the 75% target
5. **📊 Continuous Monitoring**: Retrain models regularly with new campaign data
6. **🔍 Root Cause Analysis**: Investigate why only 7.5% subscribe and address underlying issues
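
One way to act on recommendations 1 and 2 is to rank customers by predicted subscription probability and call only the top fraction. The sketch below shows that ranking step with made-up probabilities; `select_top_fraction` and the 30% call budget are assumptions for illustration, not values from this analysis.

```python
def select_top_fraction(probabilities, fraction):
    """Return indices of the highest-scoring customers, best first."""
    n_calls = max(1, round(len(probabilities) * fraction))
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return ranked[:n_calls]

# Made-up subscription probabilities for ten customers
scores = [0.05, 0.80, 0.30, 0.10, 0.60, 0.02, 0.45, 0.15, 0.07, 0.25]
call_list = select_top_fraction(scores, fraction=0.3)
print(call_list)
```

In practice the probabilities would come from the positive-class score of the chosen model, and the fraction would be set by the call-centre budget.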

### Why Only 7.5% Subscribe?
- **Cold Calling**: Customers not expecting calls
- **Economic Factors**: Limited disposable income for investments
- **Poor Targeting**: Not reaching interested customer segments
- **Product-Market Fit**: Term deposits may not meet customer needs
- **Timing Issues**: Calls at inconvenient times
- **Trust Factors**: Hesitation about financial products

This comprehensive two-model approach with PyCaret provides better model selection, detailed confusion matrices, and business-focused metrics to optimize the marketing campaign while minimizing costs.