# Fraud Detection Pipeline - Machine Learning

**Objective:** Build an end-to-end machine learning pipeline that predicts the probability that an online transaction is fraudulent.

**Dataset:** Transaction data (train_transaction.csv, test_transaction.csv)

## 1. Import Libraries

In [118]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, classification_report, confusion_matrix
from imblearn.over_sampling import SMOTE
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
import warnings
warnings.filterwarnings('ignore')

## 2. Load Data

In [119]:
# Load data
train = pd.read_csv('dataset/train_transaction.csv')
test = pd.read_csv('dataset/test_transaction.csv')
print(f"Train shape: {train.shape}, Test shape: {test.shape}")

Train shape: (590540, 394), Test shape: (506691, 393)


## 3. Exploratory Data Analysis (EDA)

In [120]:
# Dataset info
print("=== DATASET INFORMATION ===")
print(f"Shape: {train.shape}")
print("\nData types:")
print(train.dtypes.value_counts())
print(f"\nMemory usage: {train.memory_usage(deep=True).sum() / 1024**2:.2f} MB")

=== DATASET INFORMATION ===
Shape: (590540, 394)

Data types:
float64    376
object      14
int64        4
Name: count, dtype: int64

Memory usage: 2062.07 MB


In [121]:
# Target distribution
print("=== TARGET DISTRIBUTION (isFraud) ===")
print(train['isFraud'].value_counts())
print(f"\nFraud percentage: {train['isFraud'].mean()*100:.2f}%")
print(f"Class imbalance ratio: 1:{(1-train['isFraud'].mean())/train['isFraud'].mean():.0f}")

=== TARGET DISTRIBUTION (isFraud) ===
isFraud
0    569877
1     20663
Name: count, dtype: int64

Fraud percentage: 3.50%
Class imbalance ratio: 1:28
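
With fraud at about 3.5% of rows, plain accuracy says little: a model that never flags fraud already scores very high. A minimal sketch of that arithmetic, using the class counts from the `value_counts()` output (the variable names are illustrative and not part of the notebook):

```python
# Class counts copied from the value_counts() output
n_legit = 569877
n_fraud = 20663
n_total = n_legit + n_fraud

# Accuracy of a trivial classifier that always predicts "not fraud"
majority_accuracy = n_legit / n_total

# Number of legitimate transactions per fraudulent one
imbalance_ratio = n_legit / n_fraud

print(f"Always-negative accuracy: {majority_accuracy:.4f}")
print(f"Imbalance ratio: 1:{imbalance_ratio:.1f}")
```

This is why the later sections report ROC-AUC, precision, and recall rather than relying on accuracy.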


In [122]:
# Missing values
print("=== MISSING VALUES ===")
missing = train.isnull().sum()
missing_pct = (missing / len(train) * 100).round(2)
missing_df = pd.DataFrame({'Missing': missing, 'Pct': missing_pct})
missing_df = missing_df[missing_df['Missing'] > 0].sort_values('Missing', ascending=False)
print(f"Columns with missing values: {len(missing_df)}/{train.shape[1]}")
print("\nTop 10 columns by missing count:")
print(missing_df.head(10))

=== MISSING VALUES ===
Columns with missing values: 374/394

Top 10 columns by missing count:
       Missing    Pct
dist2   552913  93.63
D7      551623  93.41
D13     528588  89.51
D14     528353  89.47
D12     525823  89.04
D6      517353  87.61
D8      515614  87.31
D9      515614  87.31
V156    508595  86.12
V139    508595  86.12


In [123]:
# Descriptive statistics for numeric features
print("=== DESCRIPTIVE STATISTICS FOR NUMERIC FEATURES ===")
numeric_features = train.select_dtypes(include=[np.number]).columns.tolist()
if 'TransactionID' in numeric_features:
    numeric_features.remove('TransactionID')
if 'isFraud' in numeric_features:
    numeric_features.remove('isFraud')

print(f"Total numeric features: {len(numeric_features)}")
print("\nSample statistics for the first 5 features:")
print(train[numeric_features[:5]].describe())

=== DESCRIPTIVE STATISTICS FOR NUMERIC FEATURES ===
Total numeric features: 378

Sample statistics for the first 5 features:
       TransactionDT  TransactionAmt          card1          card2  \
count   5.905400e+05   590540.000000  590540.000000  581607.000000   
mean    7.372311e+06      135.027176    9898.734658     362.555488   
std     4.617224e+06      239.162522    4901.170153     157.793246   
min     8.640000e+04        0.251000    1000.000000     100.000000   
25%     3.027058e+06       43.321000    6019.000000     214.000000   
50%     7.306528e+06       68.769000    9678.000000     361.000000   
75%     1.124662e+07      125.000000   14184.000000     512.000000   
max     1.581113e+07    31937.391000   18396.000000     600.000000   

               card3  
count  588975.000000  
mean      153.194925  
std        11.336444  
min       100.000000  
25%       150.000000  
50%       150.000000  
75%       150.000000  
max       231.000000  

In [124]:
# Categorical features
print("=== CATEGORICAL FEATURES ===")
cat_cols = train.select_dtypes(include=['object']).columns.tolist()
print(f"Total categorical features: {len(cat_cols)}")
if len(cat_cols) > 0:
    print("\nUnique value counts for the first 5 features:")
    for col in cat_cols[:5]:
        print(f"{col}: {train[col].nunique()} unique values")

=== CATEGORICAL FEATURES ===
Total categorical features: 14

Unique value counts for the first 5 features:
ProductCD: 5 unique values
card4: 4 unique values
card6: 4 unique values
P_emaildomain: 59 unique values
R_emaildomain: 60 unique values


## 4. Data Preprocessing

In [125]:
# Use all of the training data
print(f"Total training data: {train.shape[0]:,} rows")
print(f"Fraud rate: {train['isFraud'].mean():.4f}")

Total training data: 590,540 rows
Fraud rate: 0.0350


In [126]:
# Select numeric features only, to keep the model simple
numeric_cols = train.select_dtypes(include=[np.number]).columns.tolist()
numeric_cols.remove('TransactionID')
if 'isFraud' in numeric_cols:
    numeric_cols.remove('isFraud')

X = train[numeric_cols].copy()
y = train['isFraud'].copy()

# Fill missing values with each column's median (computed on the full training table, before the split)
X.fillna(X.median(), inplace=True)

print(f"Total features: {len(numeric_cols)}")
print(f"Data shape: {X.shape}")

Total features: 378
Data shape: (590540, 378)


## 5. Train-Validation Split & Scaling

In [127]:
# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

print(f"Train: {X_train_scaled.shape}, Val: {X_val_scaled.shape}")

Train: (472432, 378), Val: (118108, 378)
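
The median imputation in section 4 runs on the full training table before the split, so validation rows contribute to the fill values. One possible alternative, sketched here on a toy array with NumPy only, computes the medians from the training rows alone and reuses them for the validation rows. The helper names `fit_medians` and `apply_medians` are hypothetical and not part of the notebook.

```python
import numpy as np

def fit_medians(X_train):
    """Per-column medians of the training rows, ignoring NaN."""
    return np.nanmedian(X_train, axis=0)

def apply_medians(X, medians):
    """Return a copy of X with NaN replaced by the given per-column medians."""
    X_filled = X.copy()
    rows, cols = np.where(np.isnan(X_filled))
    X_filled[rows, cols] = medians[cols]
    return X_filled

# Toy data: two features, NaN marks a missing value
X_train_toy = np.array([[1.0, 10.0], [3.0, np.nan], [np.nan, 30.0], [5.0, 50.0]])
X_val_toy = np.array([[np.nan, 20.0], [100.0, np.nan]])

medians = fit_medians(X_train_toy)                 # learned from training rows only
X_train_filled = apply_medians(X_train_toy, medians)
X_val_filled = apply_medians(X_val_toy, medians)   # validation reuses the same medians

print(medians)
print(X_val_filled)
```

scikit-learn's `SimpleImputer(strategy='median')` provides the same fit/transform separation and could be fitted on `X_train` just before `StandardScaler`.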


## 5.1. Handle Class Imbalance with SMOTE

In [128]:
# SMOTE to oversample the minority class
print("Before SMOTE:")
print(f"Class 0: {(y_train == 0).sum()}, Class 1: {(y_train == 1).sum()}")

smote = SMOTE(random_state=42)
X_train_smote, y_train_smote = smote.fit_resample(X_train_scaled, y_train)

print("\nAfter SMOTE:")
print(f"Class 0: {(y_train_smote == 0).sum()}, Class 1: {(y_train_smote == 1).sum()}")
print(f"Shape: {X_train_smote.shape}")

Before SMOTE:
Class 0: 455902, Class 1: 16530

After SMOTE:
Class 0: 455902, Class 1: 455902
Shape: (911804, 378)
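
SMOTE does not duplicate fraud rows; it creates synthetic ones by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step on toy data with NumPy. It is a simplified illustration, not imblearn's implementation, and `synthesize_point` is a hypothetical helper.

```python
import numpy as np

def synthesize_point(minority, index, k, rng):
    """Create one synthetic sample from minority[index] and one of its k nearest neighbours."""
    base = minority[index]
    # Euclidean distance from the base point to every minority sample
    distances = np.linalg.norm(minority - base, axis=1)
    # Position 0 of the argsort is the base point itself (assuming no exact duplicates)
    neighbour_ids = np.argsort(distances)[1:k + 1]
    neighbour = minority[rng.choice(neighbour_ids)]
    # Random position on the segment between the base point and the neighbour
    gap = rng.random()
    return base + gap * (neighbour - base)

rng = np.random.default_rng(42)
minority_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
new_point = synthesize_point(minority_toy, index=0, k=2, rng=rng)
print(new_point)
```

Because every synthetic row lies between two existing fraud rows, SMOTE fills in the minority region rather than adding new information, which may explain why it does not outperform class weighting in the comparison in section 11.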


## 6. Model Training with SMOTE

In [129]:
# Random Forest trained on the SMOTE-resampled data
model_smote = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42,
    n_jobs=-1,
    verbose=1
)
model_smote.fit(X_train_smote, y_train_smote)

[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   42.9s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:  2.2min finished


RandomForestClassifier(max_depth=10, n_jobs=-1, random_state=42, verbose=1)


## 6.1. SMOTE Model Evaluation

In [130]:
# Evaluate Random Forest + SMOTE on the validation set
y_pred_proba_smote = model_smote.predict_proba(X_val_scaled)[:, 1]
y_pred_smote = model_smote.predict(X_val_scaled)
roc_auc_smote = roc_auc_score(y_val, y_pred_proba_smote)

print(f"Random Forest + SMOTE ROC-AUC: {roc_auc_smote:.4f}")
print("\nConfusion Matrix:")
print(confusion_matrix(y_val, y_pred_smote))
print("\nClassification Report:")
print(classification_report(y_val, y_pred_smote))

[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    0.1s
[Parallel(n_jobs=8)]: Done 100 out of 100 | elapsed:    0.4s finished


Random Forest + SMOTE ROC-AUC: 0.8789

Confusion Matrix:
[[104891   9084]
 [  1411   2722]]

Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.92      0.95    113975
           1       0.23      0.66      0.34      4133

    accuracy                           0.91    118108
   macro avg       0.61      0.79      0.65    118108
weighted avg       0.96      0.91      0.93    118108
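
The class-1 row of the report follows directly from the confusion matrix. As a worked check, with the four counts copied from the matrix for Random Forest + SMOTE:

```python
# scikit-learn layout: rows are true labels, columns are predicted labels
tn, fp = 104891, 9084
fn, tp = 1411, 2722

precision = tp / (tp + fp)   # share of flagged transactions that are really fraud
recall = tp / (tp + fn)      # share of fraud cases that were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

At the default 0.5 cutoff, roughly 77% of flagged transactions are false alarms, which is the cost of catching about two thirds of the fraud cases.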





## 7. Model Training (Baseline without SMOTE)

In [131]:
# Model: Random Forest with class weights to handle the imbalance
model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    class_weight='balanced',
    random_state=42,
    n_jobs=-1,
    verbose=1
)
model.fit(X_train_scaled, y_train)

[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   13.0s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:   39.6s finished


RandomForestClassifier(class_weight='balanced', max_depth=10, n_jobs=-1,
                       random_state=42, verbose=1)
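
`class_weight='balanced'` reweights classes inversely to their frequency, using the heuristic `n_samples / (n_classes * np.bincount(y))` given in the scikit-learn documentation. A small sketch of that formula, applied to the training-split class counts (455,902 legitimate, 16,530 fraud); the variable names are illustrative:

```python
import numpy as np

# Training-split class counts: index 0 = legitimate, index 1 = fraud
class_counts = np.array([455902, 16530])
n_samples = class_counts.sum()
n_classes = len(class_counts)

# 'balanced' heuristic: n_samples / (n_classes * count_per_class)
class_weights = n_samples / (n_classes * class_counts)
weight_ratio = class_weights[1] / class_weights[0]

print(dict(zip([0, 1], class_weights.round(4))))
print(f"A fraud row weighs {weight_ratio:.2f}x a legitimate row")
```

The resulting ratio equals the class imbalance itself, so both classes carry the same total weight during training without creating any synthetic rows.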


## 8. Model Evaluation

In [132]:
# Evaluation
y_pred_proba = model.predict_proba(X_val_scaled)[:, 1]
y_pred = model.predict(X_val_scaled)

roc_auc = roc_auc_score(y_val, y_pred_proba)

print(f"ROC-AUC: {roc_auc:.4f}")
print("\nConfusion Matrix:")
print(confusion_matrix(y_val, y_pred))
print("\nClassification Report:")
print(classification_report(y_val, y_pred))

[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    0.1s
[Parallel(n_jobs=8)]: Done 100 out of 100 | elapsed:    0.4s finished


ROC-AUC: 0.8827

Confusion Matrix:
[[99036 14939]
 [ 1048  3085]]

Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.87      0.93    113975
           1       0.17      0.75      0.28      4133

    accuracy                           0.86    118108
   macro avg       0.58      0.81      0.60    118108
weighted avg       0.96      0.86      0.90    118108





## 9. Test Set Prediction & Submission

In [133]:
# Predict on the test set
test_ids = test['TransactionID'].copy()
X_test_real = test[numeric_cols].copy()
X_test_real.fillna(X.median(), inplace=True)
X_test_real_scaled = scaler.transform(X_test_real)

test_proba = model.predict_proba(X_test_real_scaled)[:, 1]

# # Create the submission file
# submission = pd.DataFrame({
#     'TransactionID': test_ids,
#     'isFraud': test_proba
# })
# submission.to_csv('submission.csv', index=False)
# print("Submission saved!")

[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    1.0s
[Parallel(n_jobs=8)]: Done 100 out of 100 | elapsed:    2.4s finished


## 10. Deep Learning Pipeline (PyTorch)

In [134]:
# Simple feed-forward neural network definition
class FraudNet(nn.Module):
    def __init__(self, input_size):
        super(FraudNet, self).__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.1)
        # self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.relu(self.fc3(x))
        x = self.dropout(x)
        x = self.fc4(x)
        # x = self.sigmoid(self.fc4(x))
        return x

In [135]:
# Prepare data for PyTorch
X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.FloatTensor(y_train.values).reshape(-1, 1)
X_val_tensor = torch.FloatTensor(X_val_scaled)
y_val_tensor = torch.FloatTensor(y_val.values).reshape(-1, 1)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=1024, shuffle=True)

print(f"Input size: {X_train_scaled.shape[1]}")

Input size: 378


In [136]:
# Initialize the model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dl_model = FraudNet(X_train_scaled.shape[1]).to(device)

# Loss function with a positive-class weight (negatives / positives) to handle imbalance
pos_weight = torch.tensor([len(y_train) / y_train.sum() - 1]).to(device)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
optimizer = optim.Adam(dl_model.parameters(), lr=0.005)

print(f"Device: {device}")
print(f"Pos weight: {pos_weight.item():.2f}")

Device: cpu
Pos weight: 27.58
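
`pos_weight` multiplies the loss term of positive (fraud) examples, so each fraud row contributes 27.58 times the loss it would contribute unweighted. A minimal pure-Python sketch of the per-sample loss, following the formula in the PyTorch `BCEWithLogitsLoss` documentation (the function name `weighted_bce_with_logits` is illustrative):

```python
import math

def weighted_bce_with_logits(logit, target, pos_weight):
    """Per-sample loss: -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    return -(pos_weight * target * math.log(prob)
             + (1.0 - target) * math.log(1.0 - prob))

pos_weight = 455902 / 16530  # negatives / positives in the training split

# Same logit of 0.0 (probability 0.5) for a fraud row and a legitimate row
loss_fraud = weighted_bce_with_logits(0.0, 1.0, pos_weight)
loss_legit = weighted_bce_with_logits(0.0, 0.0, pos_weight)

print(f"pos_weight={pos_weight:.2f}, fraud loss={loss_fraud:.3f}, legit loss={loss_legit:.3f}")
```

This weighting also explains why the training losses in the next cell start near 0.9 rather than near the 0.69 of an unweighted, uninformed classifier.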


In [137]:
# Training loop
epochs = 30
for epoch in range(epochs):
    dl_model.train()
    total_loss = 0
    for X_batch, y_batch in train_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        
        optimizer.zero_grad()
        outputs = dl_model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
    
    print(f"Epoch {epoch+1}/{epochs}, Loss: {total_loss/len(train_loader):.4f}")

Epoch 1/30, Loss: 0.9244
Epoch 2/30, Loss: 0.8597
Epoch 3/30, Loss: 0.8272
Epoch 4/30, Loss: 0.8032
Epoch 5/30, Loss: 0.7797
Epoch 6/30, Loss: 0.7637
Epoch 7/30, Loss: 0.7488
Epoch 8/30, Loss: 0.7339
Epoch 9/30, Loss: 0.7215
Epoch 10/30, Loss: 0.7080
Epoch 11/30, Loss: 0.7024
Epoch 12/30, Loss: 0.6900
Epoch 13/30, Loss: 0.6821
Epoch 15/30, Loss: 0.6652
Epoch 16/30, Loss: 0.6652
Epoch 17/30, Loss: 0.6613
Epoch 18/30, Loss: 0.6485
Epoch 19/30, Loss: 0.6465
Epoch 20/30, Loss: 0.6430
Epoch 21/30, Loss: 0.6354
Epoch 22/30, Loss: 0.6294
Epoch 23/30, Loss: 0.6256
Epoch 24/30, Loss: 0.6238
Epoch 25/30, Loss: 0.6258
Epoch 26/30, Loss: 0.6110
Epoch 27/30, Loss: 0.6111
Epoch 28/30, Loss: 0.6103
Epoch 29/30, Loss: 0.6011
Epoch 30/30, Loss: 0.6037


In [138]:
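# Evaluate the deep learning model on the validation set
dl_model.eval()
with torch.no_grad():
    X_val_device = X_val_tensor.to(device)
    # FraudNet returns raw logits; apply a sigmoid to obtain probabilities
    dl_logits = dl_model(X_val_device).cpu()
    dl_pred_proba = torch.sigmoid(dl_logits).numpy()
    # Cutoff as in the recorded run: logit > 0.5, i.e. probability > ~0.62
    dl_pred = (dl_logits.numpy() > 0.5).astype(int)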

dl_roc_auc = roc_auc_score(y_val, dl_pred_proba)
print(f"Deep Learning ROC-AUC: {dl_roc_auc:.4f}")
print("\nConfusion Matrix:")
print(confusion_matrix(y_val, dl_pred))
print("\nClassification Report:")
print(classification_report(y_val, dl_pred))

Deep Learning ROC-AUC: 0.9145

Confusion Matrix:
[[108235   5740]
 [  1257   2876]]

Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.95      0.97    113975
           1       0.33      0.70      0.45      4133

    accuracy                           0.94    118108
   macro avg       0.66      0.82      0.71    118108
weighted avg       0.97      0.94      0.95    118108
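
Because `FraudNet` ends in a linear layer without a sigmoid, its raw outputs are logits, and a cutoff on the logit is not the same cutoff on the probability. A short sketch of the conversion in both directions (the helper names are illustrative):

```python
import math

def sigmoid(logit):
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

def logit(prob):
    """Inverse of sigmoid: map a probability to a logit."""
    return math.log(prob / (1.0 - prob))

# A cutoff of 0.5 applied to raw logits corresponds to this probability cutoff
print(f"logit 0.5 -> probability {sigmoid(0.5):.4f}")
# A probability cutoff of 0.5 corresponds to a logit cutoff of 0
print(f"probability 0.5 -> logit {logit(0.5):.1f}")
```

ROC-AUC is unaffected by this distinction, since the sigmoid preserves the ordering of scores; only threshold-based metrics (precision, recall, F1) depend on it.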



In [139]:
# Predict the test set with the deep learning model
X_test_real_tensor = torch.FloatTensor(X_test_real_scaled).to(device)
dl_model.eval()
with torch.no_grad():
    # Apply a sigmoid so the outputs are probabilities rather than raw logits
    dl_test_proba = torch.sigmoid(dl_model(X_test_real_tensor)).cpu().numpy()

# dl_submission = pd.DataFrame({
#     'TransactionID': test_ids,
#     'isFraud': dl_test_proba.flatten()
# })
# dl_submission.to_csv('submission_dl.csv', index=False)
# print("Deep Learning submission saved!")

## 11. Model Comparison

In [140]:
# Compare the performance of all models
comparison = pd.DataFrame({
    'Model': ['Random Forest', 'Random Forest + SMOTE', 'Deep Learning (PyTorch)'],
    'ROC-AUC': [roc_auc, roc_auc_smote, dl_roc_auc]
})

comparison = comparison.sort_values('ROC-AUC', ascending=False).reset_index(drop=True)
comparison['Rank'] = range(1, len(comparison) + 1)

print("=== MODEL COMPARISON ===")
print(comparison.to_string(index=False))
print(f"\nBest model: {comparison.iloc[0]['Model']} (ROC-AUC: {comparison.iloc[0]['ROC-AUC']:.4f})")

=== MODEL COMPARISON ===
                  Model  ROC-AUC  Rank
Deep Learning (PyTorch) 0.914507     1
          Random Forest 0.882749     2
  Random Forest + SMOTE 0.878852     3

Best model: Deep Learning (PyTorch) (ROC-AUC: 0.9145)
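
ROC-AUC can be read as the probability that a randomly chosen fraud row receives a higher score than a randomly chosen legitimate row, so it depends only on the ranking of scores and not on any threshold. A brute-force sketch on toy data (quadratic in the number of rows, so for illustration only; `pairwise_auc` is a hypothetical helper, not a scikit-learn function):

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

y_toy = [0, 0, 1, 0, 1]
scores_toy = [0.1, 0.4, 0.35, 0.2, 0.8]
auc_toy = pairwise_auc(y_toy, scores_toy)

# Any order-preserving rescaling of the scores leaves the value unchanged
auc_rescaled = pairwise_auc(y_toy, [10 * s - 3 for s in scores_toy])
print(auc_toy, auc_rescaled)
```

A ranking metric like this says nothing about where to place the decision threshold, which is why the next cell also compares precision, recall, and F1.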


In [141]:
# Detailed per-metric comparison
from sklearn.metrics import precision_score, recall_score, f1_score

rf_pred = model.predict(X_val_scaled)

detailed_comparison = pd.DataFrame({
    'Model': ['Random Forest', 'Random Forest + SMOTE', 'Deep Learning'],
    'ROC-AUC': [
        roc_auc_score(y_val, y_pred_proba),
        roc_auc_score(y_val, y_pred_proba_smote),
        roc_auc_score(y_val, dl_pred_proba)
    ],
    'Precision': [
        precision_score(y_val, rf_pred),
        precision_score(y_val, y_pred_smote),
        precision_score(y_val, dl_pred)
    ],
    'Recall': [
        recall_score(y_val, rf_pred),
        recall_score(y_val, y_pred_smote),
        recall_score(y_val, dl_pred)
    ],
    'F1-Score': [
        f1_score(y_val, rf_pred),
        f1_score(y_val, y_pred_smote),
        f1_score(y_val, dl_pred)
    ]
})

print("=== DETAILED METRIC COMPARISON ===")
print(detailed_comparison.to_string(index=False))

[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    0.1s
[Parallel(n_jobs=8)]: Done 100 out of 100 | elapsed:    0.5s finished


=== DETAILED METRIC COMPARISON ===
                Model  ROC-AUC  Precision   Recall  F1-Score
        Random Forest 0.882749   0.171161 0.746431  0.278467
Random Forest + SMOTE 0.878852   0.230561 0.658602  0.341552
        Deep Learning 0.914507   0.333798 0.695863  0.451173
