# Credit Card Fraud Detection - Standardized Grid Search

This notebook implements fraud detection using:
- **Algorithms**: Logistic Regression, Random Forest, XGBoost
- **Sampling**: SMOTE oversampling and Random undersampling
- **Tuning**: **Standardized Grid Search** (one shared `RandomizedSearchCV` configuration for every model) with 3-fold cross-validation
- **Metrics**: Precision, Recall, F1-Score
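
The two sampling strategies listed above can be illustrated with a minimal standard-library sketch. It is not the `imblearn` implementation: real SMOTE picks the partner point among the k nearest minority neighbours, whereas this sketch is simply handed one neighbour. The helper names and the toy data are hypothetical.

```python
import random

def random_undersample(majority, minority, seed=42):
    """Keep every minority row and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)

def smote_like_point(x, neighbor, rng):
    """Create one synthetic point on the segment between x and a minority neighbour."""
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]

# Toy data (hypothetical values, not taken from the dataset)
majority = [[float(i), 0.0] for i in range(10)]
minority = [[1.0, 5.0], [2.0, 6.0]]

kept_majority, kept_minority = random_undersample(majority, minority)
synthetic = smote_like_point(minority[0], minority[1], random.Random(42))

print(len(kept_majority), len(kept_minority))  # both classes now have the same size
print(synthetic)                               # lies between the two minority points
```

Undersampling discards majority rows, while SMOTE-style oversampling adds interpolated minority rows; either way the classifier sees a balanced training set.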

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, precision_score, recall_score, f1_score
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline as ImbPipeline
import xgboost as xgb
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")

Libraries imported successfully!


## 1. Load and Explore Data

In [3]:
# Dataset link - https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud

In [4]:
# Obtaining credit card fraud detection dataset
!curl -o card_transdata.csv https://raw.githubusercontent.com/marhcouto/fraud-detection/master/data/card_transdata.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 72.7M  100 72.7M    0     0  5982k      0  0:00:12  0:00:12 --:--:-- 6740k


In [5]:
# Read the data
data = pd.read_csv('card_transdata.csv')

print(f"Dataset shape: {data.shape}")
print(f"\nFeatures: {list(data.columns[:-1])}")
print(f"\nFraud distribution:")
print(data['fraud'].value_counts())
print(f"\nFraud percentage: {data['fraud'].mean()*100:.2f}%")

Dataset shape: (1000000, 8)

Features: ['distance_from_home', 'distance_from_last_transaction', 'ratio_to_median_purchase_price', 'repeat_retailer', 'used_chip', 'used_pin_number', 'online_order']

Fraud distribution:
fraud
0.0    912597
1.0     87403
Name: count, dtype: int64

Fraud percentage: 8.74%


In [6]:
data

| | distance_from_home | distance_from_last_transaction | ratio_to_median_purchase_price | repeat_retailer | used_chip | used_pin_number | online_order | fraud |
|---|---|---|---|---|---|---|---|---|
| 0 | 57.877857 | 0.311140 | 1.945940 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 10.829943 | 0.175592 | 1.294219 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 5.091079 | 0.805153 | 0.427715 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 2.247564 | 5.600044 | 0.362663 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 4 | 44.190936 | 0.566486 | 2.222767 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 999995 | 2.207101 | 0.112651 | 1.626798 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 999996 | 19.872726 | 2.683904 | 2.778303 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 999997 | 2.914857 | 1.472687 | 0.218075 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 999998 | 4.258729 | 0.242023 | 0.475822 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |


In [7]:
data.describe()

| | distance_from_home | distance_from_last_transaction | ratio_to_median_purchase_price | repeat_retailer | used_chip | used_pin_number | online_order | fraud |
|---|---|---|---|---|---|---|---|---|
| count | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 | 1000000.0 |
| mean | 26.628792 | 5.036519 | 1.824182 | 0.881536 | 0.350399 | 0.100608 | 0.650552 | 0.087403 |
| std | 65.390784 | 25.843093 | 2.799589 | 0.323157 | 0.477095 | 0.300809 | 0.476796 | 0.282425 |
| min | 0.004874 | 0.000118 | 0.004399 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 3.878008 | 0.296671 | 0.475673 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 9.96776 | 0.99865 | 0.997717 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 75% | 25.743985 | 3.355748 | 2.09637 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| max | 10632.723672 | 11851.104565 | 267.802942 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


## 2. Data Preparation

In [8]:
# Prepare features and target
X = data.drop('fraud', axis=1)
y = data['fraud']

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
print(f"Training fraud cases: {y_train.sum()} ({y_train.mean()*100:.2f}%)")
print(f"Test fraud cases: {y_test.sum()} ({y_test.mean()*100:.2f}%)")

Training set: 700000 samples
Test set: 300000 samples
Training fraud cases: 61182.0 (8.74%)
Test fraud cases: 26221.0 (8.74%)
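
As a quick sanity check on the stratified split, the fraud counts above can be reproduced from the class totals alone. The sketch below also computes the accuracy of a trivial "never fraud" baseline, which is one reason accuracy is a weak metric here. The variable names are illustrative; the numbers come from the outputs above.

```python
total_rows = 1_000_000
fraud_rows = 87_403
test_size = 0.3

# A stratified split keeps the fraud rate (8.74%) the same in both partitions.
train_fraud = round(fraud_rows * (1 - test_size))
test_fraud = fraud_rows - train_fraud

# A model that always predicts "not fraud" is right on every legitimate row
# but catches no fraud at all (recall = 0, so F1 = 0).
baseline_accuracy = 1 - fraud_rows / total_rows

print(train_fraud, test_fraud)
print(f"{baseline_accuracy:.4f}")
```

The counts agree with the printed 61182 and 26221 fraud cases, and the baseline already reaches roughly 91% accuracy without detecting a single fraudulent transaction.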


## 3. Standardized Grid Search Configuration

In [9]:

# Define consistent grid search parameters for all models
GRID_SEARCH_CONFIG = {
    'cv': 3,                    # 3-fold cross-validation
    'scoring': 'f1',            # F1-score as optimization metric
    'n_jobs': -1,               # Use all available CPU cores
    'verbose': 1,               # Show progress
    'random_state': 42,         # Reproducible results
    'n_iter': 10               # Max candidates to sample; smaller grids are searched in full
}

print("Standardized Grid Search Configuration:")
for key, value in GRID_SEARCH_CONFIG.items():
    print(f"  {key}: {value}")

# Standardized parameter grids for each algorithm
PARAM_GRIDS = {
    'logistic_regression': {
        'classifier__C': [0.1, 1, 10],
        'classifier__penalty': ['l2'],
        'classifier__solver': ['lbfgs'],
        'classifier__max_iter': [1000]
    },
    'random_forest': {
        'classifier__n_estimators': [50, 100],
        'classifier__max_depth': [5],
        'classifier__min_samples_split': [10],
        'classifier__min_samples_leaf': [4]
    },
    'xgboost': {
        'classifier__n_estimators': [50, 100],
        'classifier__max_depth': [3, 5],
        'classifier__learning_rate': [0.1],
        'classifier__subsample': [0.8],
        'classifier__colsample_bytree': [0.9]
    }
}

print(f"\nParameter grids defined for {len(PARAM_GRIDS)} algorithms")
for algo, params in PARAM_GRIDS.items():
    print(f"  {algo}: {len(params)} hyperparameters")

Standardized Grid Search Configuration:
  cv: 3
  scoring: f1
  n_jobs: -1
  verbose: 1
  random_state: 42
  n_iter: 10

Parameter grids defined for 3 algorithms
  logistic_regression: 4 hyperparameters
  random_forest: 4 hyperparameters
  xgboost: 5 hyperparameters
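
Because every grid is built from short lists, it is worth counting how many distinct combinations each one holds and comparing that with `n_iter`. The sketch below does this with the standard library only; `grid_size` is a hypothetical helper, and the grids are copied from the cell above. As far as I know, `RandomizedSearchCV` evaluates every combination when a list-only grid is smaller than `n_iter`, so the search would be exhaustive in that case.

```python
from math import prod

N_ITER = 10  # same value as GRID_SEARCH_CONFIG['n_iter']

PARAM_GRIDS = {
    'logistic_regression': {
        'classifier__C': [0.1, 1, 10],
        'classifier__penalty': ['l2'],
        'classifier__solver': ['lbfgs'],
        'classifier__max_iter': [1000],
    },
    'random_forest': {
        'classifier__n_estimators': [50, 100],
        'classifier__max_depth': [5],
        'classifier__min_samples_split': [10],
        'classifier__min_samples_leaf': [4],
    },
    'xgboost': {
        'classifier__n_estimators': [50, 100],
        'classifier__max_depth': [3, 5],
        'classifier__learning_rate': [0.1],
        'classifier__subsample': [0.8],
        'classifier__colsample_bytree': [0.9],
    },
}

def grid_size(param_grid):
    """Number of distinct combinations in a grid made of value lists."""
    return prod(len(values) for values in param_grid.values())

sizes = {name: grid_size(grid) for name, grid in PARAM_GRIDS.items()}
for name, size in sizes.items():
    print(f"{name}: {size} combinations, searched in full: {size <= N_ITER}")
```

With these grids the counts are 3, 2, and 4, all below `n_iter`, so the number of candidates actually tried differs per algorithm even though the configuration is shared.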


In [10]:
def evaluate_model(model, X_test, y_test, model_name):
    """Evaluate model and return metrics"""
    y_pred = model.predict(X_test)

    precision = precision_score(y_test, y_pred, zero_division=0)
    recall = recall_score(y_test, y_pred, zero_division=0)
    f1 = f1_score(y_test, y_pred, zero_division=0)

    print(f"\n{model_name} Results:")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1-Score: {f1:.4f}")

    # Print classification report for more details
    print(f"\nDetailed Classification Report:")
    print(classification_report(y_test, y_pred))

    return {'Model': model_name, 'Precision': precision, 'Recall': recall, 'F1-Score': f1}

def run_standardized_experiment(pipeline, param_grid, model_name, X_train, y_train, X_test, y_test):
    """Run standardized grid search experiment"""
    print(f"\n{'='*60}")
    print(f"Training {model_name}...")
    print(f"{'='*60}")
    
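    # Apply the shared search configuration. RandomizedSearchCV samples at most
    # n_iter candidates, so a grid with fewer combinations is evaluated in full.
    grid_search = RandomizedSearchCV(
        pipeline,
        param_grid,
        **GRID_SEARCH_CONFIG
    )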
    
    # Fit the model
    grid_search.fit(X_train, y_train)
    
    # Print best parameters and CV score
    print(f"Best parameters: {grid_search.best_params_}")
    print(f"Best CV F1-score: {grid_search.best_score_:.4f}")
    
    # Evaluate on test set
    results = evaluate_model(grid_search.best_estimator_, X_test, y_test, model_name)
    
    return grid_search, results

## 4. Model Training and Evaluation - Standardized Experiments

### 4.1 Logistic Regression with SMOTE

In [11]:
# Create pipeline with SMOTE and Logistic Regression
lr_smote_pipeline = ImbPipeline([
    ('scaler', StandardScaler()),
    ('smote', SMOTE(random_state=42)),
    ('classifier', LogisticRegression(random_state=42))
])

# Run standardized experiment
lr_smote_grid, lr_smote_results = run_standardized_experiment(
    lr_smote_pipeline, 
    PARAM_GRIDS['logistic_regression'],
    "Logistic Regression (SMOTE)",
    X_train, y_train, X_test, y_test
)


Training Logistic Regression (SMOTE)...
Fitting 3 folds for each of 3 candidates, totalling 9 fits
Best parameters: {'classifier__solver': 'lbfgs', 'classifier__penalty': 'l2', 'classifier__max_iter': 1000, 'classifier__C': 10}
Best CV F1-score: 0.7148

Logistic Regression (SMOTE) Results:
Precision: 0.5760
Recall: 0.9485
F1-Score: 0.7167

Detailed Classification Report:
              precision    recall  f1-score   support

         0.0       0.99      0.93      0.96    273779
         1.0       0.58      0.95      0.72     26221

    accuracy                           0.93    300000
   macro avg       0.79      0.94      0.84    300000
weighted avg       0.96      0.93      0.94    300000
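
To make the precision/recall trade-off in this report concrete, the sketch below recomputes F1 from the printed precision and recall and estimates the implied number of false alarms. The counts are approximations because the inputs are rounded to four decimals; the variable names are illustrative.

```python
precision, recall = 0.5760, 0.9485
fraud_support, legit_support = 26_221, 273_779  # test-set class sizes

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Approximate confusion-matrix entries implied by the rounded metrics.
true_positives = recall * fraud_support
predicted_positives = true_positives / precision
false_positives = predicted_positives - true_positives
false_positive_rate = false_positives / legit_support

print(f"F1: {f1:.4f}")
print(f"Approx. false positives: {false_positives:.0f}")
print(f"Approx. false positive rate: {false_positive_rate:.3f}")
```

The high recall comes at the cost of roughly 18,000 legitimate transactions flagged as fraud, which is consistent with the 0.93 recall reported for class 0.0.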



### 4.2 Logistic Regression with Undersampling

In [12]:
# Create pipeline with undersampling and Logistic Regression
lr_under_pipeline = ImbPipeline([
    ('scaler', StandardScaler()),
    ('undersampler', RandomUnderSampler(random_state=42)),
    ('classifier', LogisticRegression(random_state=42))
])

# Run standardized experiment
lr_under_grid, lr_under_results = run_standardized_experiment(
    lr_under_pipeline,
    PARAM_GRIDS['logistic_regression'],
    "Logistic Regression (Undersampling)",
    X_train, y_train, X_test, y_test
)


Training Logistic Regression (Undersampling)...
Fitting 3 folds for each of 3 candidates, totalling 9 fits
Best parameters: {'classifier__solver': 'lbfgs', 'classifier__penalty': 'l2', 'classifier__max_iter': 1000, 'classifier__C': 10}
Best CV F1-score: 0.7148

Logistic Regression (Undersampling) Results:
Precision: 0.5771
Recall: 0.9494
F1-Score: 0.7179

Detailed Classification Report:
              precision    recall  f1-score   support

         0.0       0.99      0.93      0.96    273779
         1.0       0.58      0.95      0.72     26221

    accuracy                           0.93    300000
   macro avg       0.79      0.94      0.84    300000
weighted avg       0.96      0.93      0.94    300000



### 4.3 Random Forest with SMOTE

In [13]:
# Create pipeline with SMOTE and Random Forest
rf_smote_pipeline = ImbPipeline([
    ('smote', SMOTE(random_state=42)),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Run standardized experiment
rf_smote_grid, rf_smote_results = run_standardized_experiment(
    rf_smote_pipeline,
    PARAM_GRIDS['random_forest'],
    "Random Forest (SMOTE)",
    X_train, y_train, X_test, y_test
)


Training Random Forest (SMOTE)...
Fitting 3 folds for each of 2 candidates, totalling 6 fits
Best parameters: {'classifier__n_estimators': 100, 'classifier__min_samples_split': 10, 'classifier__min_samples_leaf': 4, 'classifier__max_depth': 5}
Best CV F1-score: 0.9967

Random Forest (SMOTE) Results:
Precision: 1.0000
Recall: 0.9937
F1-Score: 0.9969

Detailed Classification Report:
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00    273779
         1.0       1.00      0.99      1.00     26221

    accuracy                           1.00    300000
   macro avg       1.00      1.00      1.00    300000
weighted avg       1.00      1.00      1.00    300000



### 4.4 Random Forest with Undersampling

In [14]:
# Create pipeline with undersampling and Random Forest
rf_under_pipeline = ImbPipeline([
    ('undersampler', RandomUnderSampler(random_state=42)),
    ('classifier', RandomForestClassifier(random_state=42))
])

# Run standardized experiment
rf_under_grid, rf_under_results = run_standardized_experiment(
    rf_under_pipeline,
    PARAM_GRIDS['random_forest'],
    "Random Forest (Undersampling)",
    X_train, y_train, X_test, y_test
)


Training Random Forest (Undersampling)...
Fitting 3 folds for each of 2 candidates, totalling 6 fits
Best parameters: {'classifier__n_estimators': 50, 'classifier__min_samples_split': 10, 'classifier__min_samples_leaf': 4, 'classifier__max_depth': 5}
Best CV F1-score: 0.9971

Random Forest (Undersampling) Results:
Precision: 0.9727
Recall: 0.9999
F1-Score: 0.9861

Detailed Classification Report:
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00    273779
         1.0       0.97      1.00      0.99     26221

    accuracy                           1.00    300000
   macro avg       0.99      1.00      0.99    300000
weighted avg       1.00      1.00      1.00    300000



### 4.5 XGBoost with SMOTE

In [15]:
# Create pipeline with SMOTE and XGBoost
xgb_smote_pipeline = ImbPipeline([
    ('smote', SMOTE(random_state=42)),
    ('classifier', xgb.XGBClassifier(random_state=42, eval_metric='logloss'))
])

# Run standardized experiment
xgb_smote_grid, xgb_smote_results = run_standardized_experiment(
    xgb_smote_pipeline,
    PARAM_GRIDS['xgboost'],
    "XGBoost (SMOTE)",
    X_train, y_train, X_test, y_test
)


Training XGBoost (SMOTE)...
Fitting 3 folds for each of 4 candidates, totalling 12 fits
Best parameters: {'classifier__subsample': 0.8, 'classifier__n_estimators': 100, 'classifier__max_depth': 5, 'classifier__learning_rate': 0.1, 'classifier__colsample_bytree': 0.9}
Best CV F1-score: 0.9887

XGBoost (SMOTE) Results:
Precision: 0.9722
Recall: 0.9988
F1-Score: 0.9853

Detailed Classification Report:
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00    273779
         1.0       0.97      1.00      0.99     26221

    accuracy                           1.00    300000
   macro avg       0.99      1.00      0.99    300000
weighted avg       1.00      1.00      1.00    300000



### 4.6 XGBoost with Undersampling

In [16]:
# Create pipeline with undersampling and XGBoost
xgb_under_pipeline = ImbPipeline([
    ('undersampler', RandomUnderSampler(random_state=42)),
    ('classifier', xgb.XGBClassifier(random_state=42, eval_metric='logloss'))
])

# Run standardized experiment
xgb_under_grid, xgb_under_results = run_standardized_experiment(
    xgb_under_pipeline,
    PARAM_GRIDS['xgboost'],
    "XGBoost (Undersampling)",
    X_train, y_train, X_test, y_test
)


Training XGBoost (Undersampling)...
Fitting 3 folds for each of 4 candidates, totalling 12 fits
Best parameters: {'classifier__subsample': 0.8, 'classifier__n_estimators': 50, 'classifier__max_depth': 5, 'classifier__learning_rate': 0.1, 'classifier__colsample_bytree': 0.9}
Best CV F1-score: 0.9872

XGBoost (Undersampling) Results:
Precision: 0.9784
Recall: 0.9983
F1-Score: 0.9883

Detailed Classification Report:
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00    273779
         1.0       0.98      1.00      0.99     26221

    accuracy                           1.00    300000
   macro avg       0.99      1.00      0.99    300000
weighted avg       1.00      1.00      1.00    300000



## 5. Results Summary and Comparison - Standardized Results

In [17]:
# Combine all results from standardized experiments
all_results = [
    lr_smote_results,
    lr_under_results,
    rf_smote_results,
    rf_under_results,
    xgb_smote_results,
    xgb_under_results
]

# Create results DataFrame
results_df = pd.DataFrame(all_results)
results_df = results_df.round(4)

print("\n" + "="*80)
print("STANDARDIZED GRID SEARCH RESULTS COMPARISON")
print("="*80)
print(f"Grid Search Configuration Used:")
for key, value in GRID_SEARCH_CONFIG.items():
    print(f"  {key}: {value}")

print(f"\nResults Table:")
print(results_df.to_string(index=False))

# Sort by F1-Score for better comparison
results_sorted = results_df.sort_values('F1-Score', ascending=False)
print("\n" + "="*80)
print("RESULTS RANKED BY F1-SCORE (STANDARDIZED)")
print("="*80)
print(results_sorted.to_string(index=False))

# Calculate performance statistics
print(f"\n" + "="*80)
print("PERFORMANCE STATISTICS")
print("="*80)
print(f"Best F1-Score: {results_df['F1-Score'].max():.4f}")
print(f"Worst F1-Score: {results_df['F1-Score'].min():.4f}")
print(f"Average F1-Score: {results_df['F1-Score'].mean():.4f}")
print(f"F1-Score Standard Deviation: {results_df['F1-Score'].std():.4f}")

print(f"\nBest Precision: {results_df['Precision'].max():.4f}")
print(f"Best Recall: {results_df['Recall'].max():.4f}")


STANDARDIZED GRID SEARCH RESULTS COMPARISON
Grid Search Configuration Used:
  cv: 3
  scoring: f1
  n_jobs: -1
  verbose: 1
  random_state: 42
  n_iter: 10

Results Table:
                              Model  Precision  Recall  F1-Score
        Logistic Regression (SMOTE)     0.5760  0.9485    0.7167
Logistic Regression (Undersampling)     0.5771  0.9494    0.7179
              Random Forest (SMOTE)     1.0000  0.9937    0.9969
      Random Forest (Undersampling)     0.9727  0.9999    0.9861
                    XGBoost (SMOTE)     0.9722  0.9988    0.9853
            XGBoost (Undersampling)     0.9784  0.9983    0.9883

RESULTS RANKED BY F1-SCORE (STANDARDIZED)
                              Model  Precision  Recall  F1-Score
              Random Forest (SMOTE)     1.0000  0.9937    0.9969
            XGBoost (Undersampling)     0.9784  0.9983    0.9883
      Random Forest (Undersampling)     0.9727  0.9999    0.9861
                    XGBoost (SMOTE)     0.9722  0.9988    0.9853
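Logistic Regression (Undersampling)     0.5771  0.9494    0.7179
        Logistic Regression (SMOTE)     0.5760  0.9485    0.7167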

In [18]:
# Advanced analysis with standardized results
print("\n" + "="*80)
print("DETAILED ANALYSIS - STANDARDIZED EXPERIMENTS")
print("="*80)

# Find best model for each metric
best_precision = results_df.loc[results_df['Precision'].idxmax()]
best_recall = results_df.loc[results_df['Recall'].idxmax()]
best_f1 = results_df.loc[results_df['F1-Score'].idxmax()]

print(f"Best Precision: {best_precision['Model']} ({best_precision['Precision']:.4f})")
print(f"Best Recall: {best_recall['Model']} ({best_recall['Recall']:.4f})")
print(f"Best F1-Score: {best_f1['Model']} ({best_f1['F1-Score']:.4f})")

# Algorithm vs Sampling Technique Analysis
print(f"\n" + "="*80)
print("ALGORITHM VS SAMPLING TECHNIQUE ANALYSIS")
print("="*80)

algorithms = ['Logistic Regression', 'Random Forest', 'XGBoost']
sampling_techniques = ['SMOTE', 'Undersampling']

comparison_data = []
for algo in algorithms:
    smote_row = results_df[results_df['Model'].str.contains(algo) & results_df['Model'].str.contains('SMOTE')]
    under_row = results_df[results_df['Model'].str.contains(algo) & results_df['Model'].str.contains('Undersampling')]

    if not smote_row.empty and not under_row.empty:
        smote_f1 = smote_row['F1-Score'].iloc[0]
        under_f1 = under_row['F1-Score'].iloc[0]
        better = "SMOTE" if smote_f1 > under_f1 else "Undersampling"
        diff = abs(smote_f1 - under_f1)
        
        comparison_data.append({
            'Algorithm': algo,
            'SMOTE_F1': smote_f1,
            'Undersampling_F1': under_f1,
            'Better_Technique': better,
            'F1_Difference': diff
        })
        
        print(f"{algo:20}: {better:13} wins (F1 diff: {diff:.4f})")
        print(f"{'':20}  SMOTE: {smote_f1:.4f}, Undersampling: {under_f1:.4f}")

# Overall sampling technique performance
smote_avg = results_df[results_df['Model'].str.contains('SMOTE')]['F1-Score'].mean()
under_avg = results_df[results_df['Model'].str.contains('Undersampling')]['F1-Score'].mean()
print(f"\nOverall Average Performance:")
print(f"  SMOTE Average F1-Score: {smote_avg:.4f}")
print(f"  Undersampling Average F1-Score: {under_avg:.4f}")
print(f"  Better Overall: {'SMOTE' if smote_avg > under_avg else 'Undersampling'}")

# Best parameter analysis
print(f"\n" + "="*80)
print("BEST HYPERPARAMETERS ANALYSIS")
print("="*80)

models_and_grids = [
    ("Logistic Regression (SMOTE)", lr_smote_grid),
    ("Logistic Regression (Undersampling)", lr_under_grid),
    ("Random Forest (SMOTE)", rf_smote_grid),
    ("Random Forest (Undersampling)", rf_under_grid),
    ("XGBoost (SMOTE)", xgb_smote_grid),
    ("XGBoost (Undersampling)", xgb_under_grid)
]

for model_name, grid in models_and_grids:
    print(f"\n{model_name}:")
    for param, value in grid.best_params_.items():
        print(f"  {param}: {value}")


DETAILED ANALYSIS - STANDARDIZED EXPERIMENTS
Best Precision: Random Forest (SMOTE) (1.0000)
Best Recall: Random Forest (Undersampling) (0.9999)
Best F1-Score: Random Forest (SMOTE) (0.9969)

ALGORITHM VS SAMPLING TECHNIQUE ANALYSIS
Logistic Regression : Undersampling wins (F1 diff: 0.0012)
                      SMOTE: 0.7167, Undersampling: 0.7179
Random Forest       : SMOTE         wins (F1 diff: 0.0108)
                      SMOTE: 0.9969, Undersampling: 0.9861
XGBoost             : Undersampling wins (F1 diff: 0.0030)
                      SMOTE: 0.9853, Undersampling: 0.9883

Overall Average Performance:
  SMOTE Average F1-Score: 0.8996
  Undersampling Average F1-Score: 0.8974
  Better Overall: SMOTE

BEST HYPERPARAMETERS ANALYSIS

Logistic Regression (SMOTE):
  classifier__solver: lbfgs
  classifier__penalty: l2
  classifier__max_iter: 1000
  classifier__C: 10

Logistic Regression (Undersampling):
  classifier__solver: lbfgs
  classifier__penalty: l2
  classifier__max_iter: 1000
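
The "Better Overall" verdict above rests on an average over three algorithms, so it helps to set it beside the per-algorithm wins. The sketch below recomputes both from the reported test F1-scores; the dictionary layout is an illustrative choice, not part of the notebook.

```python
from statistics import mean

# Test-set F1-scores copied from the results table
f1_scores = {
    "Logistic Regression": {"SMOTE": 0.7167, "Undersampling": 0.7179},
    "Random Forest": {"SMOTE": 0.9969, "Undersampling": 0.9861},
    "XGBoost": {"SMOTE": 0.9853, "Undersampling": 0.9883},
}

smote_avg = mean(scores["SMOTE"] for scores in f1_scores.values())
under_avg = mean(scores["Undersampling"] for scores in f1_scores.values())

# Count how many algorithms each sampling technique wins outright.
smote_wins = sum(s["SMOTE"] > s["Undersampling"] for s in f1_scores.values())
under_wins = len(f1_scores) - smote_wins

print(f"SMOTE avg: {smote_avg:.4f}, Undersampling avg: {under_avg:.4f}")
print(f"Wins: SMOTE {smote_wins}, Undersampling {under_wins}")
```

SMOTE leads on the average by about 0.002, yet undersampling wins for two of the three algorithms; the average is carried by the single Random Forest result, so neither technique is clearly better on this evidence.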


## 6. Feature Importance Analysis - Standardized Results

In [19]:
# Feature importance analysis from standardized experiments
print(f"Feature Importance Analysis from Standardized Grid Search")
print("="*60)

# Get the best performing model
best_model_name = best_f1['Model']
print(f"Analyzing feature importance for: {best_model_name}")
print(f"Best F1-Score: {best_f1['F1-Score']:.4f}")

# Get the corresponding best model
best_model = None
if 'XGBoost' in best_model_name:
    if 'SMOTE' in best_model_name:
        best_model = xgb_smote_grid.best_estimator_
    else:
        best_model = xgb_under_grid.best_estimator_
elif 'Random Forest' in best_model_name:
    if 'SMOTE' in best_model_name:
        best_model = rf_smote_grid.best_estimator_
    else:
        best_model = rf_under_grid.best_estimator_
else:  # Logistic Regression
    if 'SMOTE' in best_model_name:
        best_model = lr_smote_grid.best_estimator_
    else:
        best_model = lr_under_grid.best_estimator_

# Extract feature importance
try:
    if hasattr(best_model.named_steps['classifier'], 'feature_importances_'):
        importance = best_model.named_steps['classifier'].feature_importances_
        feature_importance = pd.DataFrame({
            'Feature': X.columns,
            'Importance': importance
        }).sort_values('Importance', ascending=False)

        print(f"\nTop Features by Importance:")
        print(feature_importance.to_string(index=False))
        
        # Additional analysis
        print(f"\nFeature Importance Insights:")
        print(f"  Most Important Feature: {feature_importance.iloc[0]['Feature']} ({feature_importance.iloc[0]['Importance']:.4f})")
        print(f"  Least Important Feature: {feature_importance.iloc[-1]['Feature']} ({feature_importance.iloc[-1]['Importance']:.4f})")
        print(f"  Top 3 features account for {feature_importance.head(3)['Importance'].sum():.2%} of total importance")
        
    elif hasattr(best_model.named_steps['classifier'], 'coef_'):
        # For logistic regression, use coefficients
        coef = best_model.named_steps['classifier'].coef_[0]
        feature_importance = pd.DataFrame({
            'Feature': X.columns,
            'Coefficient': coef,
            'Abs_Coefficient': np.abs(coef)
        }).sort_values('Abs_Coefficient', ascending=False)

        print(f"\nLogistic Regression Coefficients (by absolute value):")
        print(feature_importance[['Feature', 'Coefficient', 'Abs_Coefficient']].to_string(index=False))
        
        print(f"\nCoefficient Insights:")
        print(f"  Strongest positive coefficient: {feature_importance[feature_importance['Coefficient'] > 0].iloc[0]['Feature']}")
        print(f"  Strongest negative coefficient: {feature_importance[feature_importance['Coefficient'] < 0].iloc[0]['Feature']}")
    else:
        print("Feature importance not available for this model type.")
        
except Exception as e:
    print(f"Could not extract feature importance: {e}")

# Compare feature importance across tree-based models
print(f"\n" + "="*80)
print("FEATURE IMPORTANCE COMPARISON ACROSS TREE-BASED MODELS")
print("="*80)

tree_models = [
    ("Random Forest (SMOTE)", rf_smote_grid.best_estimator_),
    ("Random Forest (Undersampling)", rf_under_grid.best_estimator_),
    ("XGBoost (SMOTE)", xgb_smote_grid.best_estimator_),
    ("XGBoost (Undersampling)", xgb_under_grid.best_estimator_)
]

feature_importance_comparison = pd.DataFrame({'Feature': X.columns})

for model_name, model in tree_models:
    try:
        if hasattr(model.named_steps['classifier'], 'feature_importances_'):
            importance = model.named_steps['classifier'].feature_importances_
            feature_importance_comparison[model_name] = importance
    except Exception as e:
        print(f"Could not extract importance for {model_name}: {e}")

if len(feature_importance_comparison.columns) > 1:
    # Calculate average importance across models
    importance_cols = [col for col in feature_importance_comparison.columns if col != 'Feature']
    feature_importance_comparison['Average_Importance'] = feature_importance_comparison[importance_cols].mean(axis=1)
    feature_importance_comparison = feature_importance_comparison.sort_values('Average_Importance', ascending=False)
    
    print("Feature Importance Comparison:")
    print(feature_importance_comparison.round(4).to_string(index=False))

Feature Importance Analysis from Standardized Grid Search
Analyzing feature importance for: Random Forest (SMOTE)
Best F1-Score: 0.9969

Top Features by Importance:
                       Feature  Importance
ratio_to_median_purchase_price    0.542648
                  online_order    0.165713
            distance_from_home    0.152853
distance_from_last_transaction    0.055761
               used_pin_number    0.040170
                     used_chip    0.034343
               repeat_retailer    0.008512

Feature Importance Insights:
  Most Important Feature: ratio_to_median_purchase_price (0.5426)
  Least Important Feature: repeat_retailer (0.0085)
  Top 3 features account for 86.12% of total importance

FEATURE IMPORTANCE COMPARISON ACROSS TREE-BASED MODELS
Feature Importance Comparison:
                       Feature  Random Forest (SMOTE)  Random Forest (Undersampling)  XGBoost (SMOTE)  XGBoost (Undersampling)  Average_Importance
ratio_to_median_purchase_price                 0.5426
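
The "top 3 features" figure can be checked directly from the printed importances for Random Forest (SMOTE). The sketch below ranks them and accumulates the shares; it uses only the values shown above.

```python
from itertools import accumulate

# Importances as printed for Random Forest (SMOTE)
importances = {
    "ratio_to_median_purchase_price": 0.542648,
    "online_order": 0.165713,
    "distance_from_home": 0.152853,
    "distance_from_last_transaction": 0.055761,
    "used_pin_number": 0.040170,
    "used_chip": 0.034343,
    "repeat_retailer": 0.008512,
}

ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
cumulative = list(accumulate(value for _, value in ranked))

for (feature, value), running_total in zip(ranked, cumulative):
    print(f"{feature:32} {value:.4f}  cumulative: {running_total:.2%}")

top3_share = cumulative[2]
```

Impurity-based importances from scikit-learn's random forest sum to 1, so the cumulative column can be read as a share of the total; the first three features reach about 86%.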

## 7. Key Insights and Recommendations - Standardized Experiments

In [23]:
print("\n" + "="*80)
print("KEY INSIGHTS AND RECOMMENDATIONS - STANDARDIZED GRID SEARCH")
print("="*80)

print(f"\n1. STANDARDIZED EXPERIMENTAL SETUP:")
print(f"   • Used consistent grid search parameters across all experiments")
print(f"   • Up to {GRID_SEARCH_CONFIG['n_iter']} parameter combinations sampled per model (each grid here has fewer, so it is searched in full)")
print(f"   • {GRID_SEARCH_CONFIG['cv']}-fold cross-validation for robust evaluation")
print(f"   • F1-score as primary optimization metric")

print(f"\n2. BEST OVERALL MODEL:")
print(f"   • {best_f1['Model']} achieved the highest F1-Score of {best_f1['F1-Score']:.4f}")
print(f"   • Precision: {best_f1['Precision']:.4f}, Recall: {best_f1['Recall']:.4f}")
print(f"   • This model provides the best balance between Precision and Recall")

print(f"\n3. SAMPLING TECHNIQUE ANALYSIS:")
smote_avg = results_df[results_df['Model'].str.contains('SMOTE')]['F1-Score'].mean()
under_avg = results_df[results_df['Model'].str.contains('Undersampling')]['F1-Score'].mean()
better_sampling = "SMOTE" if smote_avg > under_avg else "Undersampling"
diff = abs(smote_avg - under_avg)
print(f"   • {better_sampling} performs better on average")
print(f"   • Average F1-Score: SMOTE ({smote_avg:.4f}) vs Undersampling ({under_avg:.4f})")
print(f"   • Difference: {diff:.4f}")

print(f"\n4. ALGORITHM RANKING BY BEST PERFORMANCE:")
algo_performance = {}
algorithms = ['Logistic Regression', 'Random Forest', 'XGBoost']
for algo in algorithms:
    algo_scores = results_df[results_df['Model'].str.contains(algo)]['F1-Score']
    algo_performance[algo] = algo_scores.max()

sorted_algos = sorted(algo_performance.items(), key=lambda x: x[1], reverse=True)
for i, (algo, score) in enumerate(sorted_algos, 1):
    print(f"   {i}. {algo}: {score:.4f} (best F1-score)")

print(f"\n5. PERFORMANCE CONSISTENCY:")
f1_std = results_df['F1-Score'].std()
f1_range = results_df['F1-Score'].max() - results_df['F1-Score'].min()
print(f"   • F1-Score standard deviation: {f1_std:.4f}")
print(f"   • F1-Score range: {f1_range:.4f}")
if f1_std < 0.1:
    print(f"   • Models show consistent performance across algorithms")
else:
    print(f"   • Significant performance variation between models")

print(f"\n6. HYPERPARAMETER INSIGHTS:")
print(f"   • Best performing configurations identified through systematic search")
print(f"   • Parameter optimization significantly impacts model performance")
print(f"   • Standardized approach ensures fair comparison between models")

# print(f"\n7. RECOMMENDATIONS:")
# print(f"   • Deploy {best_f1['Model']} for production fraud detection")
# print(f"   • Monitor model performance and retrain periodically")
# print(f"   • Consider ensemble methods combining top-performing models")
# print(f"   • Implement proper validation pipeline for new data")

# if results_df['Precision'].min() < 0.8:
#     print(f"   • WARNING: Some models have precision < 0.8, consider cost of false positives")
    
# if results_df['Recall'].min() < 0.8:
#     print(f"   • WARNING: Some models have recall < 0.8, consider cost of missed fraud")
print(f"\n7. EXPERIMENT REPRODUCIBILITY:")
print(f"   • All experiments use random_state=42")


KEY INSIGHTS AND RECOMMENDATIONS - STANDARDIZED GRID SEARCH

1. STANDARDIZED EXPERIMENTAL SETUP:
   • Used consistent grid search parameters across all experiments
   • Up to 10 parameter combinations sampled per model (each grid here has fewer, so it is searched in full)
   • 3-fold cross-validation for robust evaluation
   • F1-score as primary optimization metric

2. BEST OVERALL MODEL:
   • Random Forest (SMOTE) achieved the highest F1-Score of 0.9969
   • Precision: 1.0000, Recall: 0.9937
   • This model provides the best balance between Precision and Recall

3. SAMPLING TECHNIQUE ANALYSIS:
   • SMOTE performs better on average
   • Average F1-Score: SMOTE (0.8996) vs Undersampling (0.8974)
   • Difference: 0.0022

4. ALGORITHM RANKING BY BEST PERFORMANCE:
   1. Random Forest: 0.9969 (best F1-score)
   2. XGBoost: 0.9883 (best F1-score)
   3. Logistic Regression: 0.7179 (best F1-score)

5. PERFORMANCE CONSISTENCY:
   • F1-Score standard deviation: 0.1404
   • F1-Score range: 0.2802
   • Significant performance variation between models

6. 
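
The consistency figures in item 5 mix a linear model with two tree ensembles, so a breakdown by model family is a useful follow-up. The sketch below recomputes the overall standard deviation and range from the reported F1-scores, then repeats the calculation for the tree-based models only. The grouping is an illustrative choice, not part of the notebook.

```python
from statistics import stdev

# Test-set F1-scores copied from the results table
f1_scores = {
    "Logistic Regression (SMOTE)": 0.7167,
    "Logistic Regression (Undersampling)": 0.7179,
    "Random Forest (SMOTE)": 0.9969,
    "Random Forest (Undersampling)": 0.9861,
    "XGBoost (SMOTE)": 0.9853,
    "XGBoost (Undersampling)": 0.9883,
}

all_std = stdev(f1_scores.values())  # sample standard deviation, like pandas .std()
f1_range = max(f1_scores.values()) - min(f1_scores.values())

tree_scores = [v for k, v in f1_scores.items() if not k.startswith("Logistic")]
tree_std = stdev(tree_scores)

print(f"All models:  std={all_std:.4f}, range={f1_range:.4f}")
print(f"Tree models: std={tree_std:.4f}")
```

Almost all of the variation comes from the gap between Logistic Regression and the tree-based models; among the four tree-based runs the standard deviation is well below 0.01.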