# 🔐 Fraud Detection with Neural Networks

## 🧩 Problem Statement

### What Problem Are We Solving?

We're building and comparing **neural networks** to detect fraudulent transactions in a highly imbalanced dataset where roughly 98% of transactions are legitimate and only about 2% are fraud.

### Real-Life Analogy üè¶

Imagine a security guard at a bank who sees 1000 customers per day. Only 1-2 are thieves. If the guard just says "everyone is fine" they'll be right 99.8% of the time, but miss ALL the thieves! We need smarter detection.
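
To make the analogy concrete with this notebook's 98/2 split, here is a small NumPy sketch on 1,000 hypothetical transactions that compares accuracy and recall for a "guard" who waves everyone through:

```python
import numpy as np

# 1,000 hypothetical transactions: 980 legitimate (0), 20 fraud (1)
y_true = np.array([0] * 980 + [1] * 20)

# The "lazy guard": predict legitimate for everyone
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
# Recall = fraud caught / total fraud
recall = ((y_pred == 1) & (y_true == 1)).sum() / (y_true == 1).sum()

print(f"Accuracy: {accuracy:.1%}")  # Accuracy: 98.0%
print(f"Recall:   {recall:.1%}")    # Recall:   0.0%
```

High accuracy and zero recall at the same time is exactly why the rest of the notebook reports recall, precision, and AUC.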

---

## 🪜 Steps to Solve

```mermaid
flowchart TD
    A[üì• Generate Data] --> B[‚öñÔ∏è Handle Imbalance with SMOTE]
    B --> C[üìä Scale Features]
    C --> D[üèóÔ∏è Build 4 Model Architectures]
    D --> E[üìö Train with Early Stopping]
    E --> F[üìà Evaluate: Precision, Recall, AUC]
    F --> G[üìä Compare Results]
```

---

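## 🎯 Expected Output

Approximate targets on the test set; exact values will vary with random seeds, hardware, and library versions.

| Model | Expected AUC | Expected Recall (fraud) |
|-------|-------------|------------------|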
| ShallowWide | ~0.95 | ~85% |
| DeepNarrow | ~0.94 | ~83% |
| Hybrid | ~0.95 | ~86% |
| Custom (with Dropout) | ~0.96 | ~88% |

---

## 📚 Section 1: Import Libraries

### 🔹 PyTorch Imports

#### 2.1 What: Import PyTorch core modules
#### 2.2 Why: PyTorch is the deep learning framework we're using. It provides tensors, neural network layers, and automatic differentiation.
- **Alternative**: TensorFlow/Keras
- **Why PyTorch**: More pythonic, easier debugging, widely used in research

#### 2.3 When: At the start of any deep learning project
#### 2.4 Where: Meta (Facebook), Tesla, OpenAI use PyTorch
#### 2.5 How: `import torch`
#### 2.6 Internally: PyTorch records a dynamic computational graph as operations execute; autograd traverses it backward to compute gradients
#### 2.7 Output: Makes torch.* functions available
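
A tiny example of that define-by-run behavior, assuming only a standard PyTorch install: every operation on a tensor created with `requires_grad=True` is recorded, and `backward()` applies the chain rule through the recorded graph.

```python
import torch

# Leaf tensor that autograd should track
x = torch.tensor(3.0, requires_grad=True)

# Operations are recorded into a graph as they execute
y = x ** 2 + 2 * x

# Walk the recorded graph backward: dy/dx = 2x + 2
y.backward()

print(y.item())       # 15.0
print(x.grad.item())  # 8.0
```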

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")

### 🔹 Data Science Imports

#### 2.1 What: Import numpy, matplotlib, sklearn, imblearn
#### 2.2 Why:
- **numpy**: Fast array operations
- **matplotlib**: Plotting ROC curves
- **sklearn**: Synthetic data generation, splitting, scaling, metrics
- **imblearn**: SMOTE for handling imbalanced data

#### 2.5 How to use:
```python
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_balanced, y_balanced = smote.fit_resample(X, y)
```

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, roc_curve, auc
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
import os

# Set seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("Libraries imported successfully!")

---

## 📥 Section 2: Generate and Preprocess Data

### Understanding the Dataset

We create synthetic fraud data with:
- **50,000 transactions**
- **30 features** (like PCA-transformed credit card features)
- **~98% legitimate, ~2% fraud** (highly imbalanced!)

```mermaid
pie title Class Distribution (Before SMOTE)
    "Legitimate (98%)" : 98
    "Fraud (2%)" : 2
```

In [None]:
# Generate synthetic fraud dataset
X, y = make_classification(
    n_samples=50000,        # Total transactions
    n_features=30,          # Number of features
    n_informative=20,       # Useful features
    n_redundant=10,         # Correlated features
    n_classes=2,            # Fraud vs Normal
    weights=[0.98, 0.02],   # 98% normal, 2% fraud
    flip_y=0.01,            # Add some noise
    random_state=42
)

print(f"Dataset shape: {X.shape}")
print(f"Class distribution: {np.bincount(y)}")
print(f"Fraud percentage: {y.sum() / len(y) * 100:.2f}%")
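
The printed fraud percentage will usually sit a little above 2%. The cause is `flip_y`: scikit-learn assigns a random class to that fraction of samples, so about half of the flipped (mostly legitimate) rows become fraud. A back-of-the-envelope estimate, assuming flipped labels are drawn uniformly over the two classes:

```python
# Expected fraud rate after label noise (a sketch; the realized count varies with the seed)
base_fraud = 0.02   # from weights=[0.98, 0.02]
flip_y = 0.01       # fraction of labels reassigned at random
n_classes = 2

# Unflipped rows keep their class; flipped rows land in each class with prob 1/n_classes
expected_fraud = (1 - flip_y) * base_fraud + flip_y / n_classes
print(f"Expected fraud rate: {expected_fraud:.2%}")  # Expected fraud rate: 2.48%
```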

### 🔹 Train/Validation/Test Split

#### 2.1 What: Split data into three sets
#### 2.2 Why: 
- **Train (60%)**: Model learns from this
- **Validation (20%)**: Tune hyperparameters, early stopping
- **Test (20%)**: Final evaluation only

#### 2.5 How to use:
```python
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, stratify=y)
```

**Important**: Use `stratify=y` to maintain class proportions!
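
Conceptually, `stratify=y` splits each class separately, so every subset inherits the same fraud ratio. A minimal NumPy sketch of that idea; `stratified_split` is a hypothetical helper for illustration, not scikit-learn's implementation:

```python
import numpy as np

def stratified_split(y, test_size, seed=0):
    """Return (train_idx, test_idx), splitting each class with the same fraction."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)   # indices belonging to this class
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Hypothetical labels: 980 legitimate, 20 fraud
y = np.array([0] * 980 + [1] * 20)
train_idx, test_idx = stratified_split(y, test_size=0.4)

print(len(train_idx), len(test_idx))            # 600 400
print(y[train_idx].mean(), y[test_idx].mean())  # 0.02 0.02
```

Without stratification, a random 40% slice of a 2% minority can easily end up with noticeably more or fewer fraud cases than its share, which distorts validation metrics.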

In [None]:
# Split: 60% train, 20% val, 20% test
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

print(f"Train set: {len(y_train)} samples, {y_train.sum()} fraud")
print(f"Val set:   {len(y_val)} samples, {y_val.sum()} fraud")
print(f"Test set:  {len(y_test)} samples, {y_test.sum()} fraud")

### 🔹 SMOTE - Handle Imbalance

#### 2.1 What: Synthetic Minority Over-sampling Technique
#### 2.2 Why: Balance classes so model learns fraud patterns, not just "predict all legitimate"
#### 2.3 When: Only on training data, NEVER on test/validation

```mermaid
flowchart LR
    A[Fraud Point A] --> C[NEW Synthetic Point]
    B[Fraud Point B] --> C
    style C fill:#90EE90
```

#### How SMOTE Works:
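1. Pick a fraud (minority) point
2. Find its k nearest fraud neighbors (imblearn's default is `k_neighbors=5`)
3. Randomly choose one of those neighbors
4. Create a new point at a random position on the line segment between the two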
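
The four steps map onto a few lines of NumPy. This is a from-scratch sketch for intuition, not imbalanced-learn's implementation; `smote_sample` is a hypothetical helper that uses brute-force distances, which is fine for small minority sets:

```python
import numpy as np

def smote_sample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority points
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)               # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # k nearest minority neighbors

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))              # 1. pick a minority point
        b = neighbors[a, rng.integers(k)]         # 2-3. pick one of its k neighbors
        gap = rng.random()                        # 4. random position in [0, 1)
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic

# Hypothetical 2-D fraud points
X_fraud = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                    [1.0, 1.0], [0.5, 0.5], [2.0, 2.0]])
new_points = smote_sample(X_fraud, n_new=10, k=3)
print(new_points.shape)  # (10, 2)
```

Because every synthetic point is built from real fraud cases, resampling before the split would let information from validation/test fraud leak into training; that is why the notebook resamples only `X_train`.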

In [None]:
# Apply SMOTE only to training data
print(f"Before SMOTE: {np.bincount(y_train)}")

smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)

print(f"After SMOTE:  {np.bincount(y_train_res)}")
print(f"\nTraining set grew from {len(y_train)} to {len(y_train_res)} samples")

### 🔹 Feature Scaling with StandardScaler

#### 2.1 What: Transform each feature to have mean=0, std=1
#### 2.2 Why: Gradient-based training converges faster and more stably when features share a common scale
#### 2.3 When: Almost always for neural networks on tabular data

**Critical**: Fit scaler on training data only, then transform all sets!
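
What "fit on train, transform everything" means in practice, sketched with NumPy on hypothetical data: the mean and standard deviation come from the training rows only and are reused unchanged, so the transformed test features are close to, but not exactly, mean 0 and std 1. This mirrors the arithmetic `StandardScaler` performs (it also uses the population standard deviation).

```python
import numpy as np

rng = np.random.default_rng(42)
X_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))  # hypothetical features
X_test = rng.normal(loc=5.0, scale=2.0, size=(200, 3))

# "fit": statistics from the training set only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# "transform": apply the same statistics to every split
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma

print(X_train_scaled.mean(axis=0).round(6))  # ~0 up to float error on the training set
print(X_test_scaled.mean(axis=0).round(3))   # near 0, but not exactly
```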

In [None]:
# Scale features
scaler = StandardScaler()
X_train_res = scaler.fit_transform(X_train_res)  # Fit on train
X_val = scaler.transform(X_val)                   # Transform only
X_test = scaler.transform(X_test)                 # Transform only

print("Feature statistics after scaling:")
print(f"  Mean: {X_train_res.mean():.4f}")
print(f"  Std:  {X_train_res.std():.4f}")

---

## 🏗️ Section 3: Create PyTorch Dataset and DataLoaders

### 🔹 Custom Dataset Class

#### 2.1 What: Wrapper around our data for PyTorch
#### 2.2 Why: PyTorch DataLoader needs Dataset objects to batch and shuffle data

In [None]:
class FraudDataset(Dataset):
    """Custom PyTorch Dataset for Fraud Data."""
    
    def __init__(self, features, labels):
        # Convert to tensors
        self.features = torch.tensor(features, dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.float32).unsqueeze(1)
        
    def __len__(self):
        return len(self.features)
    
    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

# Create datasets
train_dataset = FraudDataset(X_train_res, y_train_res)
val_dataset = FraudDataset(X_val, y_val)
test_dataset = FraudDataset(X_test, y_test)

print(f"Train dataset size: {len(train_dataset)}")

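### 🔹 DataLoaders

#### 2.1 What: Wraps a dataset for batching and shuffling
#### 2.2 Why: Mini-batches keep memory use bounded and give many gradient updates per epoch instead of one

#### 3.1 batch_size=64
- **What**: Number of samples per batch
- **Why**: Trade-off between throughput and gradient noise: larger batches mean fewer, faster steps with smoother gradients; smaller batches give noisier updates that often generalize better and need less memory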
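
To see what `batch_size` controls, here is a plain-Python sketch of the shuffle-and-chunk step a `DataLoader` performs each epoch. It is a simplification (the real class also handles collation, workers, and samplers), and `iterate_batches` is a hypothetical helper:

```python
import math
import random

def iterate_batches(n_samples, batch_size, shuffle=True, seed=0):
    """Yield lists of indices, one per mini-batch; the last batch may be smaller."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

n_samples, batch_size = 1000, 64
batches = list(iterate_batches(n_samples, batch_size))

print(len(batches), math.ceil(n_samples / batch_size))  # 16 16
print(len(batches[-1]))                                  # 40
```

With the default `drop_last=False`, the number of batches per epoch printed in the next cell follows the same ceiling division: `ceil(len(dataset) / BATCH_SIZE)`.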

In [None]:
BATCH_SIZE = 64

dataloaders = {
    'train': DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True),
    'val': DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False),
    'test': DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)
}

print(f"Batches per epoch: {len(dataloaders['train'])}")

---

## 🧠 Section 4: Define Neural Network Architectures

We'll compare 4 different architectures:

| Model | Architecture | Key Feature |
|-------|--------------|-------------|
| ShallowWide | 30→64→32→1 | Few layers, many neurons |
| DeepNarrow | 30→32→32→32→32→1 | Many layers, fewer neurons |
| Hybrid | 30→64→32→16→1 | Mixed activations (ReLU + Tanh) |
| Custom | 30→128→64→1 | BatchNorm + Dropout |
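
A quick way to sanity-check the table is to count parameters by hand: a `Linear(in, out)` layer has `in * out` weights plus `out` biases, and a `BatchNorm1d(n)` layer adds `2 * n` learnable parameters (scale and shift; its running statistics are buffers, not parameters). Activations and dropout add none. The sketch below applies that arithmetic; the totals should agree with `sum(p.numel() for p in model.parameters())` for the model classes in this section.

```python
def linear_params(dims):
    """Parameters in a chain of Linear layers, e.g. dims=[30, 64, 32, 1]."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))

def batchnorm_params(widths):
    """Learnable BatchNorm1d parameters (gamma and beta) for each normalized width."""
    return sum(2 * w for w in widths)

counts = {
    "ShallowWide": linear_params([30, 64, 32, 1]),
    "DeepNarrow": linear_params([30, 32, 32, 32, 32, 1]),
    "Hybrid": linear_params([30, 64, 32, 16, 1]),
    "Custom": linear_params([30, 128, 64, 1]) + batchnorm_params([128, 64]),
}
for name, n in counts.items():
    print(name, n)
# ShallowWide 4097
# DeepNarrow 4193
# Hybrid 4609
# Custom 12673
```

Note that "deep and narrow" here does not mean fewer parameters overall: DeepNarrow has slightly more than ShallowWide, while Custom is roughly three times larger than either.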

In [None]:
class ShallowWideNet(nn.Module):
    """Model 1: Shallow but Wide."""
    def __init__(self, input_dim=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        return self.net(x)

# Test the model
test_model = ShallowWideNet()
print(test_model)
print(f"\nTotal parameters: {sum(p.numel() for p in test_model.parameters())}")

### 🔹 DeepNarrow, Hybrid, and Custom Models

In [None]:
class DeepNarrowNet(nn.Module):
    """Model 2: Deep and Narrow - 4 hidden layers."""
    def __init__(self, input_dim=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid()
        )
    
    def forward(self, x):
        return self.net(x)

class HybridNet(nn.Module):
    """Model 3: Hybrid activations - ReLU + Tanh."""
    def __init__(self, input_dim=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.Tanh(),  # Tanh for variety
            nn.Linear(16, 1), nn.Sigmoid()
        )
    
    def forward(self, x):
        return self.net(x)

class CustomNet(nn.Module):
    """Model 4: Custom with BatchNorm + Dropout."""
    def __init__(self, input_dim=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.BatchNorm1d(128),
            nn.LeakyReLU(0.1),
            nn.Dropout(0.3),
            
            nn.Linear(128, 64),
            nn.BatchNorm1d(64),
            nn.LeakyReLU(0.1),
            nn.Dropout(0.2),
            
            nn.Linear(64, 1),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        return self.net(x)

print("All models defined!")
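
`CustomNet` relies on `nn.Dropout`, which behaves differently under `model.train()` and `model.eval()`. PyTorch uses *inverted* dropout: during training each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, so the expected activation is unchanged; in eval mode the layer is the identity. A NumPy sketch of that rule (the `dropout` function is illustrative, not PyTorch's code):

```python
import numpy as np

def dropout(x, p, training, rng):
    """Inverted dropout: zero each unit with prob p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x                          # eval mode: identity
    keep = rng.random(x.shape) >= p       # keep each unit with prob 1 - p
    return x * keep / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(100_000)

train_out = dropout(x, p=0.3, training=True, rng=rng)
eval_out = dropout(x, p=0.3, training=False, rng=rng)

print((train_out == 0).mean())       # close to 0.3: fraction of units dropped
print(train_out.mean())              # close to 1.0: expectation preserved
print(np.array_equal(eval_out, x))   # True
```

This is why the training and evaluation functions switch modes explicitly: forgetting `model.eval()` would keep dropping units (and keep BatchNorm using batch statistics) at test time.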

---

## 📚 Section 5: Training with Early Stopping

### 🔹 Training Function

Key components:
- **BCELoss**: Binary cross-entropy on the sigmoid probabilities each model outputs
- **Adam optimizer**: Per-parameter adaptive learning rates (base `lr=0.001`)
- **Early stopping**: Stop once validation loss has not improved for `patience` consecutive epochs, then restore the best weights
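
The patience rule can be traced on its own, away from any model. A pure-Python sketch using a made-up validation-loss curve (the values are illustrative, not results from this notebook); like the training function, it only counts a strict decrease as an improvement:

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-based, under the patience rule."""
    best_loss, best_epoch, counter = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:            # improvement: remember it, reset the counter
            best_loss, best_epoch, counter = loss, epoch, 0
        else:
            counter += 1                # no improvement this epoch
            if counter >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)  # never triggered: ran every epoch

# Hypothetical validation losses: improve, then plateau and drift upward
losses = [0.50, 0.40, 0.33, 0.30, 0.31, 0.32, 0.30, 0.33, 0.35, 0.36]
print(early_stopping_epoch(losses, patience=5))  # (4, 9)
```

Epoch 7 matches the best loss of 0.30 but does not beat it, so the counter keeps running and training stops at epoch 9 with the epoch-4 weights restored.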

In [None]:
def train_model(model, dataloaders, device, name="Model", epochs=50, patience=5):
    """
    Train model with early stopping.
    
    Parameters:
    -----------
    model : nn.Module
        Neural network to train
    dataloaders : dict
        Contains 'train' and 'val' DataLoaders
    device : torch.device
        CPU or CUDA
    name : str
        Model name used in progress messages
    epochs : int
        Maximum training epochs
    patience : int
        Early stopping patience
    """
    print(f"\nTraining {name}...")
    
    model = model.to(device)
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    best_val_loss = float('inf')
    counter = 0
    history = {'train_loss': [], 'val_loss': []}
    
    for epoch in range(epochs):
        # Training phase
        model.train()
        running_loss = 0.0
        for inputs, labels in dataloaders['train']:
            inputs, labels = inputs.to(device), labels.to(device)
            
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item() * inputs.size(0)
        
        train_loss = running_loss / len(dataloaders['train'].dataset)
        history['train_loss'].append(train_loss)
        
        # Validation phase
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in dataloaders['val']:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.item() * inputs.size(0)
        
        val_loss = val_loss / len(dataloaders['val'].dataset)
        history['val_loss'].append(val_loss)
        
        if (epoch + 1) % 5 == 0:
            print(f"  Epoch {epoch+1}/{epochs} - Train: {train_loss:.4f}, Val: {val_loss:.4f}")
        
        # Early stopping
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            counter = 0
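            # Clone each tensor: dict.copy() would keep references to the live weights
            best_state = {k: v.clone() for k, v in model.state_dict().items()}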
        else:
            counter += 1
            if counter >= patience:
                print(f"  Early stopping at epoch {epoch+1}")
                break
    
    model.load_state_dict(best_state)
    return history

print("Training function defined!")
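
Saving a "best" snapshot needs care: `state_dict()` returns tensors that share memory with the live parameters, and `dict.copy()` copies only the dictionary, not the tensors. Since optimizers update parameters in place, a shallow copy silently tracks the latest weights. The same aliasing can be reproduced with NumPy arrays standing in for tensors (an analogy, not PyTorch code), where `-=` simulates an in-place optimizer step:

```python
import copy
import numpy as np

# Stand-in for model.state_dict(): a dict whose values are the live weight arrays
weights = {"linear.weight": np.array([1.0, 2.0])}

shallow = weights.copy()          # new dict, same array objects
deep = copy.deepcopy(weights)     # new dict, independent arrays

weights["linear.weight"] -= 0.5   # simulate an in-place optimizer step

print(shallow["linear.weight"])   # [0.5 1.5]  the "best" snapshot changed too
print(deep["linear.weight"])      # [1. 2.]    snapshot preserved
```

In PyTorch the equivalent safe options are `copy.deepcopy(model.state_dict())` or cloning each tensor in a dict comprehension.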

---

## 🚀 Section 6: Train All Models

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Define models
models = [
    (ShallowWideNet(), "ShallowWide"),
    (DeepNarrowNet(), "DeepNarrow"),
    (HybridNet(), "Hybrid"),
    (CustomNet(), "Custom")
]

# Train all models
trained_models = []
all_histories = []

for model, name in models:
    history = train_model(model, dataloaders, device, name)
    trained_models.append((model, name))
    all_histories.append((history, name))

---

## 📊 Section 7: Evaluate and Compare Models

In [None]:
def evaluate_model(model, dataloader, device):
    """Collect labels, scores, predictions, and ROC data for a model on one DataLoader."""
    model.eval()
    y_true, y_scores = [], []
    
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            outputs = model(inputs)
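            # Flatten (batch, 1) tensors to 1-D so sklearn receives plain vectors
            y_true.extend(labels.flatten().numpy())
            y_scores.extend(outputs.flatten().cpu().numpy())
    
    y_true = np.array(y_true).astype(int)  # integer labels so the report shows classes 0/1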
    y_scores = np.array(y_scores)
    y_pred = (y_scores > 0.5).astype(int)
    
    # ROC curve
    fpr, tpr, _ = roc_curve(y_true, y_scores)
    roc_auc = auc(fpr, tpr)
    
    return {
        'y_true': y_true,
        'y_pred': y_pred,
        'y_scores': y_scores,
        'fpr': fpr,
        'tpr': tpr,
        'auc': roc_auc
    }

# Evaluate all models
results = []
for model, name in trained_models:
    res = evaluate_model(model, dataloaders['test'], device)
    res['name'] = name
    results.append(res)
    print(f"\n{name} - AUC: {res['auc']:.4f}")
    print(classification_report(res['y_true'], res['y_pred']))
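
`classification_report` summarizes precision, recall, and F1 per class. For the fraud class they reduce to simple ratios of confusion-matrix counts. A NumPy sketch on hypothetical labels (`fraud_metrics` is an illustrative helper):

```python
import numpy as np

def fraud_metrics(y_true, y_pred):
    """Precision, recall, and F1 for the positive (fraud = 1) class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))  # fraud caught
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false alarms
    fn = np.sum((y_pred == 0) & (y_true == 1))  # fraud missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical: 10 transactions, 4 fraud; the model flags 5 and catches 3
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

p, r, f = fraud_metrics(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.60 recall=0.75 f1=0.67
```

Lowering the 0.5 threshold used in `evaluate_model` would typically flag more transactions, raising recall at the cost of precision.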

### 🔹 Plot ROC Curves

In [None]:
plt.figure(figsize=(10, 6))
colors = ['#e74c3c', '#3498db', '#2ecc71', '#9b59b6']

for i, res in enumerate(results):
    plt.plot(res['fpr'], res['tpr'], color=colors[i], linewidth=2,
             label=f"{res['name']} (AUC = {res['auc']:.3f})")

plt.plot([0, 1], [0, 1], 'k--', label='Random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves - Model Comparison')
plt.legend(loc='lower right')
plt.grid(True, alpha=0.3)
plt.show()
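
AUC has a handy interpretation for fraud screening: it is the probability that a randomly chosen fraud transaction receives a higher score than a randomly chosen legitimate one, with ties counted as half. A brute-force NumPy sketch of that definition on hypothetical scores (`auc_by_ranking` is illustrative; it scales as O(n_fraud × n_legit), so it suits small checks only):

```python
import numpy as np

def auc_by_ranking(y_true, y_score):
    """AUC as P(score_fraud > score_legit), counting ties as 0.5."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # every fraud/legitimate pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

y_true = np.array([0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.4, 0.8, 0.35, 0.9])

print(auc_by_ranking(y_true, y_score))  # 0.75
```

Here 6 of the 8 fraud/legitimate pairs are ranked correctly, which is the same value the area under the ROC curve gives for these scores.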

---

## 💼 Interview Perspective

### Key Takeaways

1. **Always check class distribution first** - accuracy is misleading
2. **SMOTE only on training data** - never test/validation
3. **Use Precision, Recall, F1, AUC** - not just accuracy
4. **Early stopping prevents overfitting** - save best model
5. **Dropout and BatchNorm help generalization**

### Common Interview Questions

**Q: Why not use accuracy for fraud detection?**
> A: With 98% legitimate transactions, a model that predicts "legitimate" for everything gets 98% accuracy but catches zero fraud.

**Q: When should you apply SMOTE?**
> A: Only to training data, never test/validation, to avoid data leakage.

---

## 🎓 Conclusion

### Summary

| Model | Expected AUC | Key Feature |
|-------|-----|-------------|
| ShallowWide | ~0.95 | Simple, fast |
| DeepNarrow | ~0.94 | More depth |
| Hybrid | ~0.95 | Mixed activations |
| Custom | ~0.96 | Regularization (expected best) |

### Key Learnings
- SMOTE effectively balances training data
- Dropout and BatchNorm typically improve generalization
- Early stopping prevents overfitting
- The regularized Custom model is expected to perform best; confirm against the AUCs from your own run