# **Problem Statement**  
## **2. Detect fraud in transactions using rule-based + ML hybrid logic.**

### Problem Statement

Detect Fraudulent Transactions Using Hybrid Rule-Based and Machine Learning Logic

Given a transaction dataset, identify whether a transaction is fraudulent by:
1. Applying domain-driven rules (high-risk patterns)
2. Applying a machine learning classifier
3. Combining both signals into a final fraud decision

### Constraints & Example Inputs/Outputs

### Constraints
- Dataset size: small (demo-friendly)
- Features available:
    - amount
    - transaction_hour (column `hour` in the code)
    - country (not used in the demo code)
    - is_foreign
    - past_fraud_count (column `past_fraud` in the code)
- Output must be binary classification:
    - 1 → Fraud
    - 0 → Legit

### Example Input:
| amount | hour | is_foreign | past_fraud |
| ------ | ---- | ---------- | ---------- |
| 12000  | 2    | 1          | 3          |

### Expected Output:
Fraud = 1

### Solution Approach

### Step 1: Rule-Based Fraud Detection
We define expert rules, such as:
- High amount + night transaction
- Foreign transaction + past fraud history
- Extremely large amount alone

The rules produce a binary flag:
```text
rule_flag ∈ {0, 1}
```
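The three rules can also be written as vectorized boolean masks instead of a row-by-row function. The sketch below is one possible formulation, not the notebook's own implementation; the function name `rule_flags_vectorized` is hypothetical, and the thresholds (10000, 3000, hour < 5, more than one past fraud) are taken from the `rule_based_fraud` function in the Solution Code section.

```python
import pandas as pd

def rule_flags_vectorized(df: pd.DataFrame) -> pd.Series:
    """Return 1 where any expert rule fires, else 0 (hypothetical helper)."""
    # Rule: extremely large amount alone
    very_large = df["amount"] > 10000
    # Rule: high amount + night transaction
    night_and_high = (df["hour"] < 5) & (df["amount"] > 3000)
    # Rule: foreign transaction + past fraud history
    foreign_repeat = (df["is_foreign"] == 1) & (df["past_fraud"] > 1)
    return (very_large | night_and_high | foreign_repeat).astype(int)

# Same four demo transactions as in the Solution Code section
sample = pd.DataFrame({
    "amount": [200, 5000, 15000, 300],
    "hour": [14, 2, 1, 18],
    "is_foreign": [0, 1, 1, 0],
    "past_fraud": [0, 2, 3, 0],
})
sample_flags = rule_flags_vectorized(sample)
print(sample_flags.tolist())  # [0, 1, 1, 0]
```

Boolean masks avoid calling a Python function once per row, which tends to matter as the dataset grows.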

### Step 2: ML-Based Fraud Detection
Train a simple classifier:
- Logistic Regression (interpretable)
- Predict fraud probability:

```text
ml_probability ∈ [0, 1]
```
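To make the probability concrete: logistic regression passes a weighted sum of the features through the sigmoid function, which maps any real number into (0, 1). The sketch below uses made-up weights and a made-up feature encoding purely for illustration; they are assumptions, not values learned from any data in this notebook.

```python
import math

def fraud_probability(features, weights, bias):
    """Logistic regression score: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for (amount in thousands, is_night, is_foreign, past_fraud)
weights = [0.3, 1.0, 0.8, 0.9]
bias = -4.0

low_risk = fraud_probability([0.2, 0, 0, 0], weights, bias)    # small daytime domestic payment
high_risk = fraud_probability([12.0, 1, 1, 3], weights, bias)  # large foreign night payment

print(round(low_risk, 3), round(high_risk, 3))
```

Because each weight shows how strongly a feature pushes the score up or down, the model stays relatively easy to inspect, which is the sense in which logistic regression is called interpretable here.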

### Step 3: Hybrid Decision Logic
Final decision:
- Fraud if rule_flag == 1
- OR if ml_probability ≥ threshold (e.g., 0.6)

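This OR-combination mirrors a pattern common in banking fraud systems: deterministic rules catch known high-risk patterns, while the model score covers transactions that no rule matches.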
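The decision logic above fits in a small function. This is a minimal sketch; the name `hybrid_decision` is hypothetical, and the default threshold of 0.6 is the example value given above.

```python
def hybrid_decision(rule_flag: int, ml_probability: float, threshold: float = 0.6) -> int:
    """Return 1 (fraud) if a rule fired or the ML score reaches the threshold, else 0."""
    if not 0.0 <= ml_probability <= 1.0:
        raise ValueError("ml_probability must be in [0, 1]")
    return int(rule_flag == 1 or ml_probability >= threshold)

print(hybrid_decision(1, 0.10))  # 1: a rule fired, the ML score is irrelevant
print(hybrid_decision(0, 0.75))  # 1: no rule fired, but the ML score is above the threshold
print(hybrid_decision(0, 0.20))  # 0: neither signal indicates fraud
```

Because the two signals are combined with OR, lowering the threshold can only add fraud flags; it never removes one raised by a rule.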


### Solution Code

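```python
# Approach 1: Brute Force (Pure Rule-Based)

import pandas as pd

# Small demo dataset: one row per transaction
data = pd.DataFrame({
    "amount": [200, 5000, 15000, 300],
    "hour": [14, 2, 1, 18],
    "is_foreign": [0, 1, 1, 0],
    "past_fraud": [0, 2, 3, 0]
})

def rule_based_fraud(row):
    """Return 1 if any expert rule fires for this transaction, else 0."""
    # Rule 1: extremely large amount alone
    if row["amount"] > 10000:
        return 1
    # Rule 2: high amount + night transaction
    if row["hour"] < 5 and row["amount"] > 3000:
        return 1
    # Rule 3: foreign transaction + past fraud history
    if row["is_foreign"] == 1 and row["past_fraud"] > 1:
        return 1
    return 0

# Apply the rules row by row
data["rule_flag"] = data.apply(rule_based_fraud, axis=1)
data
```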


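|   | amount | hour | is_foreign | past_fraud | rule_flag |
| - | ------ | ---- | ---------- | ---------- | --------- |
| 0 | 200    | 14   | 0          | 0          | 0         |
| 1 | 5000   | 2    | 1          | 2          | 1         |
| 2 | 15000  | 1    | 1          | 3          | 1         |
| 3 | 300    | 18   | 0          | 0          | 0         |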


### Alternative Solution

```python
# Approach 2: Optimized (Rule + ML Hybrid)

from sklearn.linear_model import LogisticRegression

X = data[["amount", "hour", "is_foreign", "past_fraud"]]
# Pseudo-labels for the demo: the model can only learn to imitate the rules.
# With real data, train on confirmed fraud labels instead.
y = data["rule_flag"]

model = LogisticRegression()
model.fit(X, y)

# Probability of the positive (fraud) class
data["ml_prob"] = model.predict_proba(X)[:, 1]

# Flag as fraud if a rule fired OR the ML score reaches the threshold
THRESHOLD = 0.6
data["final_fraud"] = ((data["rule_flag"] == 1) |
                       (data["ml_prob"] >= THRESHOLD)).astype(int)

data
```


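|   | amount | hour | is_foreign | past_fraud | rule_flag | ml_prob  | final_fraud |
| - | ------ | ---- | ---------- | ---------- | --------- | -------- | ----------- |
| 0 | 200    | 14   | 0          | 0          | 0         | 1e-06    | 0           |
| 1 | 5000   | 2    | 1          | 2          | 1         | 0.999999 | 1           |
| 2 | 15000  | 1    | 1          | 3          | 1         | 1.0      | 1           |
| 3 | 300    | 18   | 0          | 0          | 0         | 2e-06    | 0           |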


### Alternative Approaches

### Alternative 1: Pure ML
- Train on labeled fraud data
- Less interpretable
- Risky for compliance-heavy industries

### Alternative 2: Rule Engine Only
- Transparent
- Poor generalization
- High false positives
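The transparency of a rule engine comes from being able to say which rule fired. One way to sketch this, assuming the same thresholds as `rule_based_fraud` in the Solution Code section, is to give every rule a name and return the names that match. The reason codes and the `explain_rules` helper below are hypothetical.

```python
# Each rule is a (reason code, predicate) pair; the reason codes are made up for illustration.
RULES = [
    ("VERY_LARGE_AMOUNT", lambda t: t["amount"] > 10000),
    ("NIGHT_HIGH_AMOUNT", lambda t: t["hour"] < 5 and t["amount"] > 3000),
    ("FOREIGN_WITH_FRAUD_HISTORY", lambda t: t["is_foreign"] == 1 and t["past_fraud"] > 1),
]

def explain_rules(txn):
    """Return the reason codes of every rule that fires for one transaction."""
    return [name for name, predicate in RULES if predicate(txn)]

risky = {"amount": 12000, "hour": 2, "is_foreign": 1, "past_fraud": 3}
normal = {"amount": 250, "hour": 15, "is_foreign": 0, "past_fraud": 0}

risky_reasons = explain_rules(risky)
normal_reasons = explain_rules(normal)
print(risky_reasons)   # all three rules fire
print(normal_reasons)  # []
```

A transaction is flagged when the returned list is non-empty, and the list itself serves as the explanation.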

### Alternative 3 (Best Practice)

✅ Hybrid system (Rules + ML)

Used by:
- Banks
- Payment gateways
- Insurance platforms

### Test Case

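```python
# Test Case 1: High-Risk Transaction
test_txn = pd.DataFrame({
    "amount": [20000],
    "hour": [3],
    "is_foreign": [1],
    "past_fraud": [4]
})

test_txn["rule_flag"] = test_txn.apply(rule_based_fraud, axis=1)
# Score only the four feature columns the model was trained on;
# passing the whole frame (which now includes rule_flag) raises an error.
features = ["amount", "hour", "is_foreign", "past_fraud"]
test_txn["ml_prob"] = model.predict_proba(test_txn[features])[:, 1]
test_txn["final_fraud"] = ((test_txn["rule_flag"] == 1) |
                           (test_txn["ml_prob"] >= THRESHOLD)).astype(int)

test_txn
```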


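```python
# Test Case 2: Normal Transaction
test_txn = pd.DataFrame({
    "amount": [250],
    "hour": [15],
    "is_foreign": [0],
    "past_fraud": [0]
})

# Score it the same way as Test Case 1
features = ["amount", "hour", "is_foreign", "past_fraud"]
test_txn["rule_flag"] = test_txn.apply(rule_based_fraud, axis=1)
test_txn["ml_prob"] = model.predict_proba(test_txn[features])[:, 1]
test_txn["final_fraud"] = ((test_txn["rule_flag"] == 1) |
                           (test_txn["ml_prob"] >= THRESHOLD)).astype(int)

test_txn
```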


### Expected Outputs
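- Test Case 1 (amount 20000, hour 3, foreign, 4 past frauds): the amount alone exceeds the 10000 rule, so `rule_flag = 1` and `final_fraud = 1` regardless of the ML score.
- Test Case 2 (amount 250, hour 15, domestic, no fraud history): no rule fires, so `rule_flag = 0`; the expected final decision is 0, provided the ML score stays below the threshold.
- Rules catch obvious fraud. ML is meant to catch subtler patterns, although in this demo the model is trained on the rule outputs as pseudo-labels and can therefore only imitate the rules.
- The final output is explainable: when a rule fires, the rule itself states why the transaction was flagged.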

## Complexity Analysis

### Rule-Based
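- Time: O(n) for n transactions (a fixed number of rule checks per row)
- Space: O(1) extra per transaction, plus O(n) to store the `rule_flag` column

### ML Inference
- Training: O(n × d) per optimizer iteration, where d is the number of features
- Prediction: O(n × d), which is O(n) for a fixed feature set

### Overall
- Both the rule pass and the ML scoring pass scale linearly with the number of transactions, so the hybrid check stays cheap at inference time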
