# 06_Cost Evaluation

This notebook evaluates the expected **financial impact** of the trained fraud detection models.

Using predefined business cost assumptions, models are compared based on
their **total expected loss**, rather than purely statistical performance.

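- Cost values are illustrative: a missed fraud case (false negative) is assigned a cost of 500
  and a false alert (false positive) a cost of 5. This 100:1 ratio reflects a common fraud detection scenario,
  where missed fraud incurs a significantly higher loss than a false alert.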

The analysis focuses on the selected production candidate (XGBoost),
with cost-based threshold optimization applied to validate
and refine the final operating decision.

In [None]:
# Define cost constants + load artifacts
import joblib
import pandas as pd
...
from sklearn.metrics import confusion_matrix


# Business cost assumptions
COST_FN = 500   # missed fraud
COST_FP = 5     # false alert

# Load model outputs
baseline_outputs = joblib.load("../artifacts/model_outputs_baseline.pkl")
rf_outputs = joblib.load("../artifacts/model_outputs_random_forest.pkl")
xgb_outputs = joblib.load("../artifacts/model_outputs_xgboost.pkl")

models_outputs = {
    "Logistic Regression": baseline_outputs,
    "Random Forest": rf_outputs,
    "XGBoost": xgb_outputs
}

y_test = baseline_outputs["y_test"]  # shared ground truth

## Cost Calculation Logic

The following function computes the expected business cost
based on false positives and false negatives.

In [None]:
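def compute_total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Return the total business cost and the FN/FP counts behind it."""
    # labels=[0, 1] keeps the matrix 2x2 even if only one class appears in the data
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "total_cost": (fn * cost_fn) + (fp * cost_fp),
        "false_negatives": fn,
        "false_positives": fp
    }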
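
As a quick sanity check of the cost formula, the same arithmetic can be reproduced on a small hand-made example. The labels and predictions below are invented for illustration and are not taken from the notebook's test set. `total_cost_from_labels` is a hypothetical, dependency-free stand-in that counts errors directly; it is expected to agree with `compute_total_cost` for binary 0/1 labels.

```python
# Hypothetical stand-in for compute_total_cost (illustration only)
def total_cost_from_labels(y_true, y_pred, cost_fn, cost_fp):
    # A false negative is a fraud case (1) predicted as legitimate (0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # A false positive is a legitimate case (0) flagged as fraud (1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "total_cost": fn * cost_fn + fp * cost_fp,
        "false_negatives": fn,
        "false_positives": fp,
    }

# Toy data (made up): 2 fraud cases among 8 transactions
toy_y_true = [0, 0, 0, 0, 0, 0, 1, 1]
toy_y_pred = [0, 1, 1, 0, 0, 0, 1, 0]  # 2 false alerts, 1 missed fraud

toy_cost = total_cost_from_labels(toy_y_true, toy_y_pred, cost_fn=500, cost_fp=5)
print(toy_cost["total_cost"])  # 1 * 500 + 2 * 5 = 510
```

A single missed fraud dominates the total here, which is the behaviour the 100:1 cost ratio is meant to capture.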

## Cost-Based Threshold Optimization — XGBoost

This section explores how different decision thresholds impact
the expected financial cost for the selected production candidate (XGBoost).

The goal is to verify whether the previously selected threshold
remains optimal when evaluated from a cost perspective.
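
For orientation, decision theory offers a reference point for such a sweep. If the model scores were perfectly calibrated probabilities and the two costs above were the only ones involved (both are assumptions that the notebook does not make), flagging a transaction would pay off whenever `p * COST_FN > (1 - p) * COST_FP`, that is, above `p = COST_FP / (COST_FP + COST_FN)`. Scores from tree ensembles, particularly when trained with class weighting or resampling, are often not calibrated, so this value is only a rough reference and the empirical sweep remains the deciding evidence. `break_even_threshold` is a hypothetical helper name.

```python
COST_FN = 500  # missed fraud (notebook assumption)
COST_FP = 5    # false alert (notebook assumption)

def break_even_threshold(cost_fn, cost_fp):
    # Probability at which the expected cost of flagging equals
    # the expected cost of not flagging
    return cost_fp / (cost_fp + cost_fn)

reference_threshold = break_even_threshold(COST_FN, COST_FP)
print(round(reference_threshold, 4))  # 0.0099
```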

### Why Cost-Based Threshold Optimization Is Applied to XGBoost Only

Cost-based threshold exploration is performed only for the selected
production candidate (XGBoost).

Other models were excluded based on inferior ranking performance and
operational trade-offs observed during model evaluation.

Optimizing thresholds for models that are unlikely to be deployed
does not provide additional business value and increases decision complexity.


In [None]:
import numpy as np
import pandas as pd

# Extract XGBoost outputs
xgb_proba = xgb_outputs["y_pred_proba"]

# Threshold range to evaluate
thresholds = np.round(np.arange(0.25, 0.81, 0.05), 2)

results = []

for t in thresholds:
    # Convert probabilities to binary predictions
    y_pred = (xgb_proba >= t).astype(int)
    
    # Compute business cost
    cost_result = compute_total_cost(
        y_true=y_test,
        y_pred=y_pred,
        cost_fn=COST_FN,
        cost_fp=COST_FP
    )
    
    results.append({
        "threshold": t,
        "false_negatives": cost_result["false_negatives"],
        "false_positives": cost_result["false_positives"],
        "total_cost": cost_result["total_cost"]
    })

xgb_threshold_df = pd.DataFrame(results)

# Sort by lowest total cost
xgb_threshold_df.sort_values("total_cost")


### Coarse Sweep Insights

The cost curve reaches its minimum around a threshold of 0.30.
However, thresholds in the range of 0.30–0.40 provide a stable trade-off
between financial loss and alert volume.

The final operating threshold should be selected within this range
based on operational capacity and customer experience constraints.

Thresholds above 0.80 were not explored because they are expected to cause a sharp increase
in missed fraud cases while removing few additional false alerts,
which makes them impractical for real-world fraud detection systems.
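
One way to make a "stable range" concrete is to list every threshold whose total cost lies within a chosen tolerance of the minimum. The sketch below uses invented sweep results (not the notebook's output); `TOLERANCE` and `near_optimal_thresholds` are hypothetical names, and the 5% tolerance is an arbitrary choice for illustration.

```python
# Made-up sweep results, NOT the notebook's output
toy_sweep = [
    {"threshold": 0.25, "total_cost": 10800},
    {"threshold": 0.30, "total_cost": 10000},
    {"threshold": 0.35, "total_cost": 10150},
    {"threshold": 0.40, "total_cost": 10300},
    {"threshold": 0.45, "total_cost": 11200},
    {"threshold": 0.50, "total_cost": 12500},
]

TOLERANCE = 0.05  # hypothetical: accept thresholds within 5% of the minimum cost

def near_optimal_thresholds(rows, tolerance):
    min_cost = min(row["total_cost"] for row in rows)
    limit = min_cost * (1 + tolerance)
    # Keep every threshold whose cost does not exceed the tolerance limit
    return [row["threshold"] for row in rows if row["total_cost"] <= limit]

stable_range = near_optimal_thresholds(toy_sweep, TOLERANCE)
print(stable_range)
```

The same filter could be applied to `xgb_threshold_df` by comparing its `total_cost` column against the column minimum.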

## Threshold Refinement (0.30 – 0.40)

After identifying the cost minimum region, a finer-grained threshold analysis
is performed to refine the final operating point within the stable cost range.

In [None]:
# Refined threshold range around the minimum
refined_thresholds = np.round(np.arange(0.30, 0.41, 0.01), 2)

refined_results = []

for t in refined_thresholds:
    y_pred = (xgb_proba >= t).astype(int)

    cost = compute_total_cost(
        y_true=y_test,
        y_pred=y_pred,
        cost_fn=COST_FN,
        cost_fp=COST_FP
    )

    refined_results.append({
        "threshold": t,
        "false_negatives": cost["false_negatives"],
        "false_positives": cost["false_positives"],
        "total_cost": cost["total_cost"]
    })

refined_df = pd.DataFrame(refined_results)
refined_df.sort_values("total_cost")
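
The coarse and refined sweeps share the same loop, so the logic could be collected into one helper. The following is a minimal, dependency-free sketch of that idea; `sweep_thresholds` and the toy labels and scores are hypothetical and are not part of the notebook.

```python
def sweep_thresholds(y_true, scores, thresholds, cost_fn, cost_fp):
    rows = []
    for t in thresholds:
        # Scores at or above the threshold are flagged as fraud
        y_pred = [1 if s >= t else 0 for s in scores]
        fn = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 0)
        fp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 0 and yp == 1)
        rows.append({
            "threshold": t,
            "false_negatives": fn,
            "false_positives": fp,
            "total_cost": fn * cost_fn + fp * cost_fp,
        })
    return rows

# Toy inputs (made up)
toy_y_true = [0, 0, 0, 0, 1, 1]
toy_scores = [0.05, 0.20, 0.33, 0.38, 0.36, 0.90]
toy_thresholds = [round(0.30 + 0.01 * i, 2) for i in range(11)]  # 0.30 ... 0.40

toy_rows = sweep_thresholds(toy_y_true, toy_scores, toy_thresholds, cost_fn=500, cost_fp=5)
best_row = min(toy_rows, key=lambda r: r["total_cost"])  # first row with the lowest cost
print(best_row)
```

The returned list of dictionaries can be passed directly to `pd.DataFrame`, matching the structure of `refined_df`.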


### Threshold Selection Rationale

The refined cost analysis shows that a threshold of **0.30** achieves the
lowest total financial cost under the current cost assumptions.

However, this threshold also produces the highest number of false positive alerts,
which may lead to increased customer friction and operational overhead
(e.g. card blocks, customer support interactions, or manual reviews).

Although a threshold of **0.35** results in one additional missed fraud case
compared to 0.30, the increase in total cost is relatively small.
In return, it significantly reduces the volume of false alerts,
leading to a more stable and practical operating point.

Given the relatively flat cost curve in the **0.30–0.40** range,
a threshold of **0.35** is selected as a balanced compromise between
minimizing financial loss and maintaining acceptable customer experience
and operational efficiency.
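
The trade-off described above can be expressed as the extra cost accepted per avoided false alert. The counts in this sketch are invented to mirror the pattern in the text (one more missed fraud, noticeably fewer alerts) and are not the notebook's results; `total_cost` is a hypothetical helper.

```python
COST_FN = 500  # missed fraud (notebook assumption)
COST_FP = 5    # false alert (notebook assumption)

# Made-up counts for two candidate thresholds (not the notebook's results)
candidate_low = {"threshold": 0.30, "false_negatives": 20, "false_positives": 400}
candidate_high = {"threshold": 0.35, "false_negatives": 21, "false_positives": 330}

def total_cost(counts):
    return counts["false_negatives"] * COST_FN + counts["false_positives"] * COST_FP

extra_cost = total_cost(candidate_high) - total_cost(candidate_low)
alerts_avoided = candidate_low["false_positives"] - candidate_high["false_positives"]

print(extra_cost)                              # 150
print(alerts_avoided)                          # 70
print(round(extra_cost / alerts_avoided, 2))   # 2.14 cost units per avoided alert
```

Whether such a price per avoided alert is acceptable depends on costs that the model does not capture, such as review capacity and customer friction.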

In [None]:
XGB_FINAL_THRESHOLD = 0.35

## Final Cost Comparison Across Models

After selecting the final operating threshold for XGBoost,
we compare the expected financial cost across all evaluated models
using fixed decision thresholds.

Random Forest and Logistic Regression thresholds are taken from the model evaluation phase
and kept fixed, so each model is compared at its previously chosen operating point.


In [None]:
model_thresholds = {
    "Logistic Regression": 0.70,   # from evaluation
    "Random Forest": 0.35,        # from evaluation
    "XGBoost": XGB_FINAL_THRESHOLD
}

In [None]:
comparison_results = []

for model_name, outputs in models_outputs.items():
    y_proba = outputs["y_pred_proba"]
    threshold = model_thresholds[model_name]

    y_pred = (y_proba >= threshold).astype(int)

    cost = compute_total_cost(
        y_true=y_test,
        y_pred=y_pred,
        cost_fn=COST_FN,
        cost_fp=COST_FP
    )

    comparison_results.append({
        "model": model_name,
        "threshold": threshold,
        "false_negatives": cost["false_negatives"],
        "false_positives": cost["false_positives"],
        "total_cost": cost["total_cost"]
    })

comparison_df = pd.DataFrame(comparison_results)
comparison_df.sort_values("total_cost")


### Cross-Model Cost Comparison Insights

The table above compares the expected financial cost of all evaluated models
using their fixed operating thresholds.

Although Logistic Regression misses fewer fraud cases at its selected threshold,
its significantly higher number of false positive alerts results in a higher overall cost.

Random Forest shows very low false positives, but its higher number of missed fraud cases
leads to the highest financial loss among the evaluated models.

XGBoost achieves the lowest total expected cost by balancing fraud detection
and alert volume, making it the most suitable model from a business perspective.
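
Because the cost values are illustrative, it can be worth checking how sensitive such a ranking is to them. The sketch below recomputes the cheapest model for several assumed costs of a missed fraud. The error counts belong to three anonymous, invented models and are not the notebook's results; `cheapest_model` is a hypothetical helper.

```python
# Made-up error counts for three anonymous models (not the notebook's results)
toy_counts = {
    "model_a": {"false_negatives": 18, "false_positives": 1500},  # few misses, many alerts
    "model_b": {"false_negatives": 35, "false_positives": 50},    # few alerts, many misses
    "model_c": {"false_negatives": 21, "false_positives": 300},   # in between
}

def cheapest_model(counts, cost_fn, cost_fp):
    costs = {
        name: c["false_negatives"] * cost_fn + c["false_positives"] * cost_fp
        for name, c in counts.items()
    }
    # Return the name with the lowest total cost together with all costs
    return min(costs, key=costs.get), costs

COST_FP = 5
for cost_fn in (50, 100, 500, 1000):
    winner, costs = cheapest_model(toy_counts, cost_fn, COST_FP)
    print(cost_fn, winner, costs)
```

In this toy setup the cheapest model changes once a missed fraud is assumed to cost 50 rather than 100 or more. Running the same check on the notebook's actual counts would show how robust the recommendation is to the cost assumptions.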

## Final Business Recommendation

Based on cost-based threshold optimization and cross-model comparison,
**XGBoost** is selected as the preferred production model for fraud detection.

This decision minimizes expected financial loss while maintaining
a manageable number of false alerts, aligning model performance
with real-world operational and customer experience constraints.