
# 02 – F1 and Cost-Sensitive Metrics

**Module:** Anomaly & Fraud Detection  
**Topic:** Rare-Event Evaluation Metrics

This notebook demonstrates **F1 score and cost-sensitive metrics** for evaluating
fraud and anomaly detection models. It complements precision-recall curve analysis
and supports business-aligned decision-making.


## Objective

Build a leakage-free workflow that:
- Computes F1 score and class-specific metrics
- Incorporates business costs for false positives and false negatives
- Evaluates models on the original rare-event distribution
- Supports threshold selection for deployment


## Design Principles

✔ Metrics reflect business risk  
✔ Evaluation on the original rare-event distribution, with no resampling of held-out data  
✔ Decision threshold tuned explicitly rather than fixed at 0.5  
✔ Leakage-free train/validation split


## High-Level Workflow


Train Weighted Model  
&nbsp;&nbsp;&nbsp;&nbsp;↓  
Predict Probabilities on Validation/Test  
&nbsp;&nbsp;&nbsp;&nbsp;↓  
Compute F1 and Cost-Sensitive Metrics  
&nbsp;&nbsp;&nbsp;&nbsp;↓  
Threshold Adjustment  
&nbsp;&nbsp;&nbsp;&nbsp;↓  
Deployment & Monitoring


## Imports and Setup



In [9]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, classification_report

np.random.seed(2010)

## Simulated Imbalanced Fraud Dataset


In [11]:
X, y = make_classification(
    n_samples=10000,
    n_features=10,
    n_informative=5,
    n_redundant=2,
    weights=[0.985, 0.015],  # roughly 1.5% fraud before label noise
    flip_y=0.001,            # 0.1% of labels assigned at random
    random_state=42
)

df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
df["fraud"] = y

## Leakage-Free Train / Test Split


In [15]:
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="fraud"), df["fraud"],
    test_size=0.3, stratify=df["fraud"], random_state=42
)

## Train Weighted Model

In [18]:
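# class_weight='balanced' reweights the loss inversely to class frequency;
# the training rows themselves are not resampled.
model = LogisticRegression(max_iter=1000, class_weight='balanced')
model.fit(X_train, y_train)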
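
What `class_weight='balanced'` does can be illustrated with scikit-learn's documented heuristic, `n_samples / (n_classes * np.bincount(y))`. The sketch below applies it to a hypothetical label vector with a 1.5% positive rate; the counts are illustrative and are not the notebook's actual training split.

```python
import numpy as np

# Hypothetical labels: 985 legitimate, 15 fraud (illustrative counts only)
y_demo = np.array([0] * 985 + [1] * 15)

counts = np.bincount(y_demo)                    # samples per class
weights = len(y_demo) / (len(counts) * counts)  # the 'balanced' heuristic

for cls, w in enumerate(weights):
    print(f"class {cls}: weight {w:.3f}")
```

With these counts each fraud example carries roughly 66 times the weight of a legitimate one (985 / 15), which is why the resulting scores are better read as rankings than as calibrated probabilities.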

## Predict Probabilities

In [21]:
y_probs = model.predict_proba(X_test)[:, 1]  # score for the fraud class (column 1)

## Threshold Selection via F1

In [24]:
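precision, recall, thresholds = precision_recall_curve(y_test, y_probs)
# precision/recall have one more entry than thresholds; the final point
# (precision=1, recall=0) has no threshold, so drop it before the argmax.
p, r = precision[:-1], recall[:-1]
f1_scores = 2 * p * r / (p + r + 1e-9)
best_idx = np.argmax(f1_scores)
best_threshold = thresholds[best_idx]
y_pred = (y_probs >= best_threshold).astype(int)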

print(f"Optimal threshold (max F1): {best_threshold:.3f}")
print(classification_report(y_test, y_pred))

Optimal threshold (max F1): 0.735
              precision    recall  f1-score   support

           0       0.99      0.93      0.96      2954
           1       0.11      0.57      0.18        46

    accuracy                           0.92      3000
   macro avg       0.55      0.75      0.57      3000
weighted avg       0.98      0.92      0.95      3000
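
As a check on the fraud-class row, the metrics can be recomputed from raw confusion counts. The counts below are inferred rather than printed by the notebook: `tp = 26` and `fn = 20` follow from recall 0.57 on 46 positives, and `fp = 214` follows from the total cost of 614 reported in the cost-sensitive section (614 = fp·1 + 20·20).

```python
# Confusion counts inferred from this notebook's printed outputs (see lead-in)
tp, fp, fn = 26, 214, 20
n = 3000
tn = n - tp - fp - fn

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / n  # the only metric here that uses true negatives

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.11 recall=0.57 f1=0.18 accuracy=0.92
```

True negatives enter accuracy but not F1, which is why accuracy sits at 0.92 while the fraud-class F1 is only 0.18.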



## Cost-Sensitive Evaluation
### Example: assign costs to false positives and false negatives

In [27]:
cost_fp = 1  # cost of false positive
cost_fn = 20  # cost of false negative

tp = np.sum((y_pred == 1) & (y_test == 1))
fp = np.sum((y_pred == 1) & (y_test == 0))
tn = np.sum((y_pred == 0) & (y_test == 0))
fn = np.sum((y_pred == 0) & (y_test == 1))

total_cost = fp*cost_fp + fn*cost_fn
n_events = len(y_test)

print(f"Total cost: {total_cost}")
print(f"Average cost per event: {total_cost/n_events:.4f}")

Total cost: 614
Average cost per event: 0.2047
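
The cost above is evaluated at the max-F1 threshold only. A minimal sketch of choosing the threshold by cost instead: treat every distinct score as a candidate cut and keep the cheapest. The helper names `cost_at_threshold` and `min_cost_threshold` and the toy arrays are hypothetical and not part of the notebook; the costs reuse the 1:20 ratio from the cell above.

```python
import numpy as np

def cost_at_threshold(y_true, scores, threshold, cost_fp=1.0, cost_fn=20.0):
    """Total misclassification cost when flagging scores >= threshold."""
    y_pred = scores >= threshold
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    return fp * cost_fp + fn * cost_fn

def min_cost_threshold(y_true, scores, cost_fp=1.0, cost_fn=20.0):
    """Brute-force sweep over distinct scores; returns (threshold, cost)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # np.inf is included so that "flag nothing" is also a candidate
    candidates = np.append(np.unique(scores), np.inf)
    costs = np.array([
        cost_at_threshold(y_true, scores, t, cost_fp, cost_fn)
        for t in candidates
    ])
    best = int(np.argmin(costs))
    return candidates[best], costs[best]

# Toy data (hypothetical): three fraud cases among eight events
y_toy = np.array([0, 0, 0, 0, 1, 0, 1, 1])
s_toy = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90])

best_t, best_cost = min_cost_threshold(y_toy, s_toy)
print(f"min-cost threshold: {best_t:.2f}, total cost: {best_cost:.0f}")
# prints: min-cost threshold: 0.35, total cost: 1
```

On the notebook's data the same call would take `y_test` and `y_probs`. Running the sweep on a validation split that is separate from the final test set keeps the reported cost from being optimistically biased. The loop is quadratic in the number of distinct scores, which is acceptable for a few thousand rows.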



## Interpretation

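- F1 is the harmonic mean of precision and recall, so it is high only when both are. Here even the max-F1 threshold yields a fraud-class precision of 0.11 at recall 0.57.
- Cost-sensitive metrics translate misclassifications into monetary or business impact. With a false negative costed at 20 times a false positive, the 20 missed frauds account for 400 of the 614 total cost.
- The threshold that maximizes F1 and the threshold that minimizes cost generally differ, so the deployed threshold should be chosen against the objective the business actually bears.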
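
The trade-off in the last bullet can be made concrete under one assumption that this notebook does not itself guarantee: that the model outputs calibrated probabilities $p = P(\text{fraud} \mid x)$. A sketch of the resulting decision rule, writing the notebook's `cost_fp` and `cost_fn` as $c_{FP}$ and $c_{FN}$:

$$
\begin{aligned}
\mathbb{E}[\text{cost} \mid \text{flag}] &= (1 - p)\, c_{FP}, \qquad
\mathbb{E}[\text{cost} \mid \text{pass}] = p\, c_{FN} \\
\text{flag when}\quad p\, c_{FN} &> (1 - p)\, c_{FP}
\;\Longleftrightarrow\;
p > \frac{c_{FP}}{c_{FP} + c_{FN}} = \frac{1}{1 + 20} \approx 0.048
\end{aligned}
$$

Because `class_weight='balanced'` tends to inflate predicted probabilities for the minority class, this closed-form threshold would not transfer directly to `y_probs` here. It applies to a calibrated model; otherwise an empirical sweep over thresholds is the safer route.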


## Production Checklist

✔ Evaluate on original distribution  
✔ Threshold tuned via F1 or business metric  
✔ Cost-sensitive monitoring in place  
✔ Periodic recalibration of threshold and cost assumptions
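
Cost-sensitive monitoring can be as simple as tracking average cost per event over consecutive windows of scored traffic once labels arrive. The sketch below is one hypothetical way to do that: `windowed_avg_cost`, the window size, and the 2x alert factor are illustrative assumptions, while the baseline of 0.2047 is the average cost per event reported in the cost-sensitive evaluation above.

```python
import numpy as np

def windowed_avg_cost(y_true, y_pred, window, cost_fp=1.0, cost_fn=20.0):
    """Average cost per event over consecutive, non-overlapping windows."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_event = (
        np.where((y_pred == 1) & (y_true == 0), cost_fp, 0.0)
        + np.where((y_pred == 0) & (y_true == 1), cost_fn, 0.0)
    )
    n_full = len(per_event) // window  # a trailing partial window is dropped
    return per_event[: n_full * window].reshape(n_full, window).mean(axis=1)

# Toy stream (hypothetical): two windows of four events each
y_true_stream = [0, 0, 1, 0, 0, 1, 0, 0]
y_pred_stream = [0, 1, 1, 0, 0, 0, 0, 1]

baseline = 0.2047  # average cost per event from the offline evaluation
costs = windowed_avg_cost(y_true_stream, y_pred_stream, window=4)
alerts = costs > 2 * baseline  # assumed alert rule: cost doubles vs. baseline

print(costs.tolist(), alerts.tolist())
# prints: [0.25, 5.25] [False, True]
```

The second window contains one missed fraud, which dominates its cost. A drift in this number can come from the model, the traffic mix, or stale cost assumptions, so it is a trigger for investigation rather than a diagnosis.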


## Key Takeaways

- F1 is more informative than accuracy for rare events because it ignores true negatives, which dominate the 0.92 accuracy reported above
- Cost-sensitive metrics align model output with business risk
- Threshold tuning and monitoring are critical for production deployment


## Next Steps

- Integrate with PR-curve analysis for visual threshold selection  
- Compare thresholds across class weighting, resampling, and focal loss methods  
- Monitor F1 and cost-sensitive metrics over time in live systems