# Refund Decision Simulator
### Rule-Based vs Machine Learning Decision Systems

This experiment simulates refund decisions using synthetic order data
and compares a heuristic rule-based system with a Logistic Regression model.

The evaluation includes:
- Accuracy
- Confusion Matrix
- Classification Report
- Economic cost simulation


In [7]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

np.random.seed(42)


In [9]:
n = 1000

order_amount = np.random.uniform(100, 2000, n)
delay_minutes = np.random.randint(0, 90, n)
previous_refunds = np.random.randint(0, 5, n)
fraud_score = np.random.uniform(0, 1, n)
complaint_severity = np.random.randint(1, 6, n)

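# Linear score (logit) built from three binary flags:
# late delivery and a severe complaint raise it, a high fraud score lowers it.
refund_logit = (
    0.3 * (delay_minutes > 30) +
    0.4 * (complaint_severity > 3) -
    0.5 * (fraud_score > 0.7)
)

# The sigmoid maps the score to a probability; labels are then sampled from it.
refund_prob = 1 / (1 + np.exp(-refund_logit))
refunded = np.random.binomial(1, refund_prob)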

data = pd.DataFrame({
    "order_amount": order_amount,
    "delay_minutes": delay_minutes,
    "previous_refunds": previous_refunds,
    "fraud_score": fraud_score,
    "complaint_severity": complaint_severity,
    "refunded": refunded
})

data.head()


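|   | order_amount | delay_minutes | previous_refunds | fraud_score | complaint_severity | refunded |
|---|--------------|---------------|------------------|-------------|--------------------|----------|
| 0 | 811.626226   | 46            | 4                | 0.057741    | 2                  | 0        |
| 1 | 1906.357182  | 11            | 4                | 0.680756    | 5                  | 1        |
| 2 | 1490.788489  | 61            | 1                | 0.466782    | 5                  | 0        |
| 3 | 1237.45112   | 79            | 4                | 0.034461    | 2                  | 1        |
| 4 | 396.435417   | 87            | 1                | 0.969608    | 5                  | 0        |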
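
The generating process above depends on only three binary flags, so the refund probability it assigns can be listed exhaustively. The following is a minimal sketch using the weights from the data-generation cell; the helper names `sigmoid` and `refund_probabilities` are illustrative and not part of the notebook.

```python
import math
from itertools import product

def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Weights copied from the data-generation cell.
W_DELAY, W_SEVERITY, W_FRAUD = 0.3, 0.4, -0.5

def refund_probabilities():
    """Map each (late, severe, fraud) flag combination to its refund probability."""
    table = {}
    for late, severe, fraud in product([0, 1], repeat=3):
        score = W_DELAY * late + W_SEVERITY * severe + W_FRAUD * fraud
        table[(late, severe, fraud)] = sigmoid(score)
    return table

probs = refund_probabilities()
for flags, p in sorted(probs.items(), key=lambda kv: kv[1]):
    print(flags, round(p, 3))
```

Every combination lands between roughly 0.38 and 0.67, which suggests the labels are inherently noisy: no classifier trained on this data should be expected to reach an accuracy much above the mid 60s.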


In [10]:
def rule_based(row):
    if row["delay_minutes"] > 40 and row["fraud_score"] < 0.7:
        return 1
    if row["complaint_severity"] > 4:
        return 1
    return 0

data["rule_prediction"] = data.apply(rule_based, axis=1)

rule_accuracy = accuracy_score(data["refunded"], data["rule_prediction"])

print("Rule-Based Accuracy:", rule_accuracy)
print("Rule-Based Confusion Matrix:")
print(confusion_matrix(data["refunded"], data["rule_prediction"]))


Rule-Based Accuracy: 0.544
Rule-Based Confusion Matrix:
[[257 201]
 [255 287]]
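
To see how the two rules interact, they can be traced by hand on the five rows printed by `data.head()`. This sketch copies `rule_based` from the cell above and uses plain dictionaries in place of DataFrame rows; `sample_rows` is an illustrative name, and the values are the ones shown in the table output.

```python
def rule_based(row):
    # Rule 1: a long delay on an order that does not look fraudulent.
    if row["delay_minutes"] > 40 and row["fraud_score"] < 0.7:
        return 1
    # Rule 2: the most severe complaints are refunded regardless of fraud score.
    if row["complaint_severity"] > 4:
        return 1
    return 0

# The five rows shown by data.head(), restricted to the columns the rules read.
sample_rows = [
    {"delay_minutes": 46, "fraud_score": 0.057741, "complaint_severity": 2, "refunded": 0},
    {"delay_minutes": 11, "fraud_score": 0.680756, "complaint_severity": 5, "refunded": 1},
    {"delay_minutes": 61, "fraud_score": 0.466782, "complaint_severity": 5, "refunded": 0},
    {"delay_minutes": 79, "fraud_score": 0.034461, "complaint_severity": 2, "refunded": 1},
    {"delay_minutes": 87, "fraud_score": 0.969608, "complaint_severity": 5, "refunded": 0},
]

predictions = [rule_based(row) for row in sample_rows]
actual = [row["refunded"] for row in sample_rows]

print("predicted:", predictions)
print("actual:   ", actual)
```

Note that the last row fails the first rule because of its high fraud score but is still approved by the second rule, since complaint severity is checked without any fraud condition.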


In [11]:
X = data.drop(columns=["refunded", "rule_prediction"])
y = data["refunded"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

ml_predictions = model.predict(X_test)

ml_accuracy = accuracy_score(y_test, ml_predictions)

print("ML Accuracy:", ml_accuracy)
print("ML Confusion Matrix:")
print(confusion_matrix(y_test, ml_predictions))
print("\nML Classification Report:")
print(classification_report(y_test, ml_predictions))


ML Accuracy: 0.565
ML Confusion Matrix:
[[38 59]
 [28 75]]

ML Classification Report:
              precision    recall  f1-score   support

           0       0.58      0.39      0.47        97
           1       0.56      0.73      0.63       103

    accuracy                           0.56       200
   macro avg       0.57      0.56      0.55       200
weighted avg       0.57      0.56      0.55       200
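
The figures in the classification report follow directly from the four confusion-matrix counts. As a worked check, the sketch below recomputes them by hand; it assumes the scikit-learn convention that rows are actual classes and columns are predicted classes, and the variable names are illustrative.

```python
# Counts copied from the ML confusion matrix (rows: actual, columns: predicted).
tn, fp = 38, 59   # actual 0: correctly denied, wrongly refunded
fn, tp = 28, 75   # actual 1: wrongly denied, correctly refunded

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Class 1 (refund) metrics.
precision_refund = tp / (tp + fp)
recall_refund = tp / (tp + fn)

# Class 0 (no refund) metrics.
precision_deny = tn / (tn + fn)
recall_deny = tn / (tn + fp)

print("accuracy:", round(accuracy, 3))
print("class 1 precision/recall:", round(precision_refund, 2), round(recall_refund, 2))
print("class 0 precision/recall:", round(precision_deny, 2), round(recall_deny, 2))
```

The model predicts a refund for 134 of the 200 test orders (`fp + tp`), which is why recall is high for class 1 and low for class 0.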



In [13]:
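retention_value = 500   # cost charged when a warranted refund is denied
fraud_threshold = 0.7

def calculate_cost(df, predictions):
    """Sum the economic cost of a sequence of refund decisions over the rows of df."""
    total_cost = 0
    for i, pred in enumerate(predictions):
        row = df.iloc[i]
        if pred == 1:
            # Granting a refund costs the order amount,
            # and that amount is counted twice when the order looks fraudulent.
            total_cost += row["order_amount"]
            if row["fraud_score"] > fraud_threshold:
                total_cost += row["order_amount"]
        else:
            # Denying a refund that was warranted costs the retention value.
            if row["refunded"] == 1:
                total_cost += retention_value
    return total_cost

# Score both systems on the same held-out rows.
test_data = data.loc[X_test.index]

rule_cost = calculate_cost(test_data, test_data["rule_prediction"])
ml_cost = calculate_cost(test_data, ml_predictions)

print("Total Economic Cost (Rule-Based):", rule_cost)
print("Total Economic Cost (ML-Based):", ml_cost)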


Total Economic Cost (Rule-Based): 129186.43685169746
Total Economic Cost (ML-Based): 158277.1661673083


## Results & Economic Comparison

### Model Performance

- Rule-Based Accuracy: 0.544 (measured on all 1,000 rows)
- ML Accuracy: 0.565 (measured on the 200-row test split)

### Economic Cost Comparison

- Total Cost (Rule-Based): 129,186.44
- Total Cost (ML-Based): 158,277.17

### Insight

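In this run the ML model is slightly more accurate than the rule-based system (0.565 vs 0.544, although the two figures are measured on different row sets), yet it incurs a higher total economic cost on the test split (about 158,277 vs 129,186). The more accurate system is therefore not the cheaper one, which illustrates that cost-sensitive decision modeling matters more than raw accuracy alone.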
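
One way to make a decision cost-sensitive is to compare the expected cost of each action instead of thresholding a probability at 0.5. The sketch below applies that idea to the cost structure used in `calculate_cost`. The function names are hypothetical, and `p_warranted` is an assumed input: an estimated probability that the refund is warranted, for example one taken from a classifier's predicted probabilities. This is not part of the notebook's pipeline.

```python
RETENTION_VALUE = 500    # same value as retention_value in the cost cell
FRAUD_THRESHOLD = 0.7    # same value as fraud_threshold in the cost cell

def expected_costs(p_warranted, order_amount, fraud_score):
    """Return (cost of approving, expected cost of denying) for one order."""
    # Approving always costs the order amount, doubled for high fraud scores.
    approve_cost = order_amount * (2 if fraud_score > FRAUD_THRESHOLD else 1)
    # Denying costs the retention value only if the refund was warranted.
    deny_cost = RETENTION_VALUE * p_warranted
    return approve_cost, deny_cost

def cost_sensitive_decision(p_warranted, order_amount, fraud_score):
    """Approve (1) only when approving is cheaper in expectation than denying."""
    approve_cost, deny_cost = expected_costs(p_warranted, order_amount, fraud_score)
    return 1 if approve_cost < deny_cost else 0

# Same probability, different order amounts and fraud scores.
print(cost_sensitive_decision(0.6, order_amount=200, fraud_score=0.1))
print(cost_sensitive_decision(0.6, order_amount=800, fraud_score=0.1))
print(cost_sensitive_decision(0.6, order_amount=200, fraud_score=0.9))
```

Under these particular costs, denying a refund never costs more than 500 in expectation, so approving an order above that amount is never the cheaper choice. A system that approves many large orders will therefore accumulate cost quickly, whatever its accuracy.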


## Conclusion

This simulation demonstrates that refund systems can be framed as
cost-sensitive classification problems.

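While rule-based systems are simple and interpretable,
a machine learning model reduces economic loss only when its decisions account for cost:
here, a logistic regression trained for accuracy was slightly more accurate than the rules but more expensive.
Such systems should be compared using cost-based metrics rather than accuracy alone.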