## Flag

#### A. Posting/transaction timing

`flag_post_lag_high`
<br> **What**: The ledger entry was posted much later than the actual trip.
<br> **Why suspicious**: Long lags can indicate holds, disputes, cross-agency reconciliation, or manual adjustments.
<br> **Rule**: post_lag_days > LAG_MAX.
<br> **Tune**: Start with LAG_MAX = 3 days; loosen or tighten per agency SLA and around weekends/holidays.
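To account for weekends and holidays when tuning LAG_MAX, one option is to measure the lag in business days instead of calendar days. The sketch below assumes NumPy's `busday_count` with a hypothetical `HOLIDAYS` calendar and toy dates. The `post_lag_bdays` and `flag_post_lag_high_bdays` columns are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

LAG_MAX = 3  # days, same threshold as the tuning cell

# Hypothetical sample rows; column names follow the notebook
trips = pd.DataFrame({
    "transaction_date": pd.to_datetime(["2024-07-03", "2024-07-05", "2024-07-01"]),
    "posting_date":     pd.to_datetime(["2024-07-08", "2024-07-12", "2024-07-02"]),
})

# Hypothetical holiday calendar (e.g., taken from the agency's SLA)
HOLIDAYS = np.array(["2024-07-04"], dtype="datetime64[D]")

# Calendar-day lag, as used by flag_post_lag_high
trips["post_lag_days"] = (trips["posting_date"] - trips["transaction_date"]).dt.days

# Business-day lag: counts weekdays in [transaction, posting), skipping listed holidays
trips["post_lag_bdays"] = np.busday_count(
    trips["transaction_date"].values.astype("datetime64[D]"),
    trips["posting_date"].values.astype("datetime64[D]"),
    holidays=HOLIDAYS,
)

trips["flag_post_lag_high"] = trips["post_lag_days"] > LAG_MAX
trips["flag_post_lag_high_bdays"] = trips["post_lag_bdays"] > LAG_MAX
print(trips)
```

Here the July 3 trip posted on July 8 exceeds a 3-day calendar lag but spans only 2 business days once the weekend and July 4 are skipped.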

`flag_post_lag_negative`
<br> **What**: Posting happened “before” the transaction date.
<br> **Why suspicious**: Back-dating, bad timestamps, timezone issues, or column-mapping errors.
<br> **Rule**: post_lag_days < 0.
<br> **Tune**: Keep strict (lag should always be ≥ 0). Investigate timezone handling and ETL date parsing.
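One common source of negative lags is comparing a UTC timestamp with a local posting date. The sketch below assumes trip times are stored in UTC and posting dates are local calendar dates in a hypothetical `AGENCY_TZ`; converting to local time before taking the date removes the spurious -1 day lag.

```python
import pandas as pd

AGENCY_TZ = "America/New_York"  # assumption: the agency's local timezone

df_tz = pd.DataFrame({
    # Trip timestamps stored in UTC (hypothetical)
    "transaction_ts_utc": pd.to_datetime(["2024-03-02 04:30", "2024-03-05 15:00"], utc=True),
    # Posting dates recorded as local calendar dates
    "posting_date": pd.to_datetime(["2024-03-01", "2024-03-06"]),
})

# Naive approach: calendar date taken in UTC -> first row gets a negative lag
utc_date = df_tz["transaction_ts_utc"].dt.tz_localize(None).dt.normalize()
df_tz["lag_naive"] = (df_tz["posting_date"] - utc_date).dt.days

# Convert to the agency's local time first, then drop the tz and take the date
local_date = (
    df_tz["transaction_ts_utc"].dt.tz_convert(AGENCY_TZ).dt.tz_localize(None).dt.normalize()
)
df_tz["lag_local"] = (df_tz["posting_date"] - local_date).dt.days
print(df_tz)
```

The first trip happened at 23:30 on March 1 in New York, which is already March 2 in UTC, so only the naive lag is negative.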

#### B. Amount/Billing
`flag_topup_large_no_text`
<br> **What**: Large positive amount whose description doesn’t look like a payment or top-up.
<br> **Why suspicious**: Credit applied incorrectly or misclassified transaction.
<br> **Rule**: amount ≥ TOPUP_MIN AND description not matching (replenish|top up|payment|credit|deposit|reload|fund).
<br> **Tune**: Set TOPUP_MIN to known reload sizes (e.g., 25/50/100). Extend regex with your program’s wording.
<br> **Check**: prepaid, plan_rate, and the customer’s reload history.
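The description check is easy to try on a few sample strings before running it on the full table. The sketch below uses hypothetical rows; the pattern is the rule's word list, written so that `top[\s-]*up` also matches "Top-Up" and "topup".

```python
import pandas as pd

# Rule's word list; top[\s-]*up covers "top up", "top-up" and "topup"
TOPUP_PATTERN = r"repleni|top[\s-]*up|payment|credit|deposit|reload|fund"
TOPUP_MIN = 50.0

# Hypothetical transactions
tx = pd.DataFrame({
    "amount": [50.0, 100.0, 75.0, 2.5],
    "description": ["AUTO REPLENISHMENT", "Top-Up via card", "TOLL ADJ", None],
})

desc = tx["description"].fillna("").str.lower()
is_topup_text = desc.str.contains(TOPUP_PATTERN, regex=True)
tx["flag_topup_large_no_text"] = (tx["amount"] >= TOPUP_MIN) & ~is_topup_text
print(tx)
```

Only the 75.00 "TOLL ADJ" credit is flagged: it is large enough to be a reload but its text matches none of the top-up words.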

`flag_zero_amount`
<br> **What**: Zero (or near-zero) toll where a charge is expected.
<br> **Why suspicious**: Toll bypass, discount misapplied, lane misread.
<br> **Rule**: |amount| < $0.01 AND fare_type indicates billable trip.
<br> **Tune**: Add whitelists for “free” lanes/events; relax to <$0.10 if rounding occurs.
<br> **Check**: Camera/plate-read outcomes, fare promotions.
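A whitelist for free lanes can be applied before the zero-amount test. The sketch below assumes a hypothetical `FREE_PLAZAS` set keyed on `exit_plaza` and a `ZERO_TOL` tolerance that can be raised to 0.10 if rounding occurs; the billable-fare wording matches the flagging cell.

```python
import pandas as pd

ZERO_TOL = 0.01                                      # raise to 0.10 if rounding occurs
FREE_PLAZAS = {"PLZ-EXEMPT-01"}                      # hypothetical free lanes/plazas
BILLABLE_PATTERN = r"toll|cash|ezpass|charge|debit"  # fare types that should bill

# Hypothetical transactions
tx = pd.DataFrame({
    "fare_type":  ["EZPASS", "EZPASS", "Promo", "Toll"],
    "exit_plaza": ["PLZ-12", "PLZ-EXEMPT-01", "PLZ-12", "PLZ-30"],
    "amount":     [0.0, 0.0, 0.0, 4.75],
})

should_bill = tx["fare_type"].astype(str).str.lower().str.contains(BILLABLE_PATTERN)
whitelisted = tx["exit_plaza"].isin(FREE_PLAZAS)
tx["flag_zero_amount"] = should_bill & ~whitelisted & (tx["amount"].abs() < ZERO_TOL)
print(tx)
```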

`flag_amt_outlier`
<br> **What**: Amount far from the typical amount for the same plaza, vehicle class, and fare type.
<br> **Why suspicious**: Wrong rate table, misclassification, tampering.
<br> **Rule**: |amount − group_median| > AMT_DEV_PCT × |group_median|, where group_median is the median amount per (agency, exit_plaza, vehicle_type_code, fare_type).
<br> **Tune**: Start at AMT_DEV_PCT = 0.5 (±50%); tighten if pricing is stable.
<br> **Check**: Seasonal pricing, HOV/discount flags, plaza mapping.
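The group median can also be computed with `groupby(...).transform("median")`, which returns one value per row aligned to the original index, so no merge is needed and the frame's index is preserved. This is a minimal sketch on hypothetical rows and an alternative to the merge in the flagging cell, not a change to the rule.

```python
import pandas as pd

AMT_DEV_PCT = 0.50
grp_cols = ["agency", "exit_plaza", "vehicle_type_code", "fare_type"]

# Hypothetical transactions with a non-default index
tx = pd.DataFrame({
    "agency": ["A"] * 4 + ["B"],
    "exit_plaza": ["P1"] * 4 + ["P9"],
    "vehicle_type_code": [2] * 5,
    "fare_type": ["EZPASS"] * 5,
    "amount": [4.0, 4.0, 4.5, 12.0, 3.0],
}, index=[10, 11, 12, 13, 14])

# One median per row, aligned to tx's own index
tx["group_median"] = tx.groupby(grp_cols, dropna=False)["amount"].transform("median")
tx["flag_amt_outlier"] = (
    (tx["amount"] - tx["group_median"]).abs() > AMT_DEV_PCT * tx["group_median"].abs()
)
print(tx)
```

In group A the median is 4.25, so the ±50% band is 4.25 ± 2.125 and only the 12.00 row is flagged. A single-row group like B can never be flagged, since its median equals its only amount.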

#### C. Usage Patterns
`flag_fast_repeat_exit`
<br> **What**: Same tag exits again too quickly.
<br> **Why suspicious**: Duplicate tag use, cloning, clock issues, mis-sequenced events.
<br> **Rule**: Time since previous exit for same tag < MIN_EXIT_GAP_MIN (e.g., 5 min).
<br> **Tune**: Base on minimal plausible loop time between nearby plazas.
<br> **Check**: Distances between exit_plaza, lane IDs, clock drift.
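The Tune note suggests basing the gap on the minimum plausible loop time between nearby plazas. The sketch below assumes a hypothetical `MIN_GAP_BY_PAIR` lookup of travel times (minutes) keyed by unordered plaza pairs, with `DEFAULT_GAP_MIN` as the fallback; all plaza IDs and times are made up.

```python
import pandas as pd

DEFAULT_GAP_MIN = 5  # fallback, same role as MIN_EXIT_GAP_MIN
# Hypothetical minimum plausible travel times (minutes) between plaza pairs
MIN_GAP_BY_PAIR = {
    frozenset({"P1", "P2"}): 12,
    frozenset({"P1"}): 20,  # re-exiting the same plaza requires a full loop
}

# Hypothetical exits
ex = pd.DataFrame({
    "tag_plate_number": ["T1", "T1", "T1", "T2", "T2"],
    "exit_plaza": ["P1", "P2", "P2", "P1", "P1"],
    "exit_time": pd.to_datetime([
        "2024-05-01 08:00", "2024-05-01 08:10", "2024-05-01 08:13",
        "2024-05-01 09:00", "2024-05-01 09:25",
    ]),
}).sort_values(["tag_plate_number", "exit_time"])

g = ex.groupby("tag_plate_number")
ex["mins_since_prev_exit"] = g["exit_time"].diff().dt.total_seconds() / 60.0
ex["prev_plaza"] = g["exit_plaza"].shift(1)

def min_gap(prev, cur):
    """Look up the pair in either order; fall back to the global default."""
    if pd.isna(prev):
        return float("nan")  # first exit for the tag: nothing to compare
    return MIN_GAP_BY_PAIR.get(frozenset({prev, cur}), DEFAULT_GAP_MIN)

ex["min_gap"] = [min_gap(p, c) for p, c in zip(ex["prev_plaza"], ex["exit_plaza"])]
# NaN < NaN is False, so first exits are never flagged
ex["flag_fast_repeat_exit"] = ex["mins_since_prev_exit"] < ex["min_gap"]
print(ex)
```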

`flag_agency_hop_fast`
<br> **What**: Tag jumps between different agencies too quickly.
<br> **Why suspicious**: Tag sharing/cloning, misrouting of events.
<br> **Rule**: Previous agency ≠ current agency AND gap < AGENCY_HOP_MIN (e.g., 30 min).
<br> **Tune**: Use real travel time between closest border plazas.
<br> **Check**: Border plazas list; cross-agency settlement records.
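The same idea applies to agency hops. In the sketch below, a hypothetical `HOP_MIN_BY_PAIR` table holds the shortest drive time between border plazas of two agencies, with a 30-minute fallback like AGENCY_HOP_MIN; agency codes and times are placeholders.

```python
import pandas as pd

DEFAULT_HOP_MIN = 30  # fallback, same role as AGENCY_HOP_MIN
# Hypothetical shortest drive times (minutes) between border plazas of two agencies
HOP_MIN_BY_PAIR = {frozenset({"AG1", "AG2"}): 10}

# Hypothetical exits for one tag
ex = pd.DataFrame({
    "tag_plate_number": ["T1", "T1", "T1"],
    "agency": ["AG1", "AG2", "AG3"],
    "exit_time": pd.to_datetime(["2024-05-01 08:00", "2024-05-01 08:15", "2024-05-01 08:35"]),
}).sort_values(["tag_plate_number", "exit_time"])

g = ex.groupby("tag_plate_number")
gap = g["exit_time"].diff().dt.total_seconds() / 60.0
prev_agency = g["agency"].shift(1)

# Per-row threshold from the pair table (order-independent), else the default
hop_min = pd.Series(
    [HOP_MIN_BY_PAIR.get(frozenset({p, c}), DEFAULT_HOP_MIN)
     for p, c in zip(prev_agency, ex["agency"])],
    index=ex.index,
)
ex["flag_agency_hop_fast"] = (
    prev_agency.notna() & (prev_agency != ex["agency"]) & (gap < hop_min)
)
print(ex)
```

The 15-minute AG1 to AG2 hop is not flagged because that pair is known to be close, while the 20-minute AG2 to AG3 hop falls under the default.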

#### D. Balance/Funding
`flag_negative_balance`
<br> **What**: Account balance below zero.
<br> **Why suspicious**: Overdrawn usage, failed replenishment, posting order issues.
<br> **Rule**: balance < 0.
<br> **Tune**: None (hard rule), but allow a small tolerance for penny rounding.
<br> **Check**: Recent top-ups, card declines.

`flag_low_bal_no_topup`
<br> **What**: Prepaid tag keeps running at very low balance without replenishment.
<br> **Why suspicious**: Blocked or declined card, or attempted misuse before the account is suspended.
<br> **Rule**: prepaid == True AND balance < $5 AND no top-up within a rolling window.
<br> **Tune**: Set the threshold to your low-balance limit and the window to your policy (e.g., last 10 events or 1–3 days).
<br> **Check**: Payment failures, auto-replenish settings.
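The flagging code looks back over each tag's last 10 events; the Tune note also allows a time window (e.g., 1–3 days). The sketch below shows the time-based variant under a hypothetical 3-day `TOPUP_WINDOW`. It forward-fills the time of each tag's latest top-up and checks whether that falls within the window.

```python
import pandas as pd

TOPUP_MIN = 50.0
LOW_BAL_MAX = 5.0                      # low-balance threshold
TOPUP_WINDOW = pd.Timedelta(days=3)    # hypothetical policy window

# Hypothetical events for one prepaid tag
ev = pd.DataFrame({
    "tag_plate_number": ["T1"] * 4,
    "exit_time": pd.to_datetime([
        "2024-05-01 08:00", "2024-05-01 12:00", "2024-05-03 12:00", "2024-05-05 08:00",
    ]),
    "amount":  [-6.0, 50.0, -1.0, -1.0],
    "balance": [-47.0, 3.0, 2.0, 1.0],
    "prepaid": [True] * 4,
}).sort_values(["tag_plate_number", "exit_time"])

# Time of the most recent top-up at or before each row, per tag
ev["last_topup_time"] = ev["exit_time"].where(ev["amount"] >= TOPUP_MIN)
ev["last_topup_time"] = ev.groupby("tag_plate_number")["last_topup_time"].ffill()

# NaT (no top-up yet) compares as False, so those rows count as "no recent top-up"
recent_topup = (ev["exit_time"] - ev["last_topup_time"]) <= TOPUP_WINDOW
ev["flag_low_bal_no_topup"] = (
    ev["prepaid"].eq(True) & (ev["balance"] < LOW_BAL_MAX) & ~recent_topup
)
print(ev)
```

The 3.00 balance right after the reload is not flagged, the 2.00 balance two days later is still covered, and by May 5 the reload has aged out of the window. Only the latest top-up needs checking: if it is outside the window, every earlier one is too.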

#### E. Temporal Behavior
`flag_quiet_hours`
<br> **What**: Travel during atypical hours (e.g., 1–4 AM).
<br> **Why suspicious**: Out-of-character use can indicate tag sharing/theft.
<br> **Rule**: exit_hour ∈ {1,2,3,4} (simple global rule).
<br> **Tune**: Later, replace with per-tag baselines (flag hours outside the tag’s 5th–95th percentile hour band).
<br> **Check**: Compare with user’s historical hour distribution.
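A sketch of the per-tag baseline from the Tune note follows, assuming a hypothetical `MIN_HISTORY` of 20 trips before a tag's own 5th–95th percentile band is trusted; tags with less history fall back to the global quiet-hours rule. Hour-of-day percentiles ignore the wrap at midnight, so a tag that regularly travels around 23:00–01:00 would need a circular treatment.

```python
import pandas as pd

BAND = (0.05, 0.95)          # per-tag 5th-95th percentile band
MIN_HISTORY = 20             # hypothetical: trips needed before trusting a tag's band
QUIET_HOURS = {1, 2, 3, 4}   # global fallback rule

# Hypothetical history: T1 is a regular commuter with one 3 AM trip; T2 is new
trips = pd.DataFrame({
    "tag_plate_number": ["T1"] * 21 + ["T2"] * 2,
    "exit_hour": [7, 8, 8, 9, 17] * 4 + [3] + [2, 14],
})

g = trips.groupby("tag_plate_number")["exit_hour"]
lo = g.transform(lambda h: h.quantile(BAND[0]))
hi = g.transform(lambda h: h.quantile(BAND[1]))
n = g.transform("count")

outside_band = (trips["exit_hour"] < lo) | (trips["exit_hour"] > hi)
# Keep the per-tag result where history is long enough, else use the global rule
trips["flag_quiet_hours"] = outside_band.where(
    n >= MIN_HISTORY, trips["exit_hour"].isin(QUIET_HOURS)
)
print(trips.tail(3))
```

T1's band works out to 7–17, so only its 3 AM trip is flagged; T2 has two trips and uses the global rule.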

#### High-severity combos
* (`flag_fast_repeat_exit` OR `flag_agency_hop_fast`) AND (`flag_zero_amount` OR `flag_amt_outlier`) on the same transaction.
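The high-severity combination can be computed directly from the boolean flag columns. This is a minimal sketch on hypothetical rows; `flag_high_severity` is an illustrative column name, not one the notebook defines.

```python
import pandas as pd

# Hypothetical flag values for four transactions
flags = pd.DataFrame({
    "flag_fast_repeat_exit": [True, False, True, False],
    "flag_agency_hop_fast":  [False, True, False, False],
    "flag_zero_amount":      [True, False, False, True],
    "flag_amt_outlier":      [False, True, False, False],
})

timing = flags["flag_fast_repeat_exit"] | flags["flag_agency_hop_fast"]   # usage anomaly
billing = flags["flag_zero_amount"] | flags["flag_amt_outlier"]           # amount anomaly
flags["flag_high_severity"] = timing & billing
print(flags)
```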

In [None]:
# --- tune thresholds here ---
LAG_MAX = 3                 # max posting lag (days)
TOPUP_MIN = 50.0            # $ threshold to consider a top-up
AMT_DEV_PCT = 0.50          # max deviation from group median (0.50 = ±50%)
MIN_EXIT_GAP_MIN = 5        # min plausible gap between exits for the same tag (minutes)
AGENCY_HOP_MIN = 30         # min plausible time to move between agencies (minutes)
QUIET_HOURS = {1, 2, 3, 4}  # "unusual" exit hours (1–4 AM)

In [None]:
import pandas as pd

df_flag = df.copy()  # `df` comes from earlier cells

# Ensure dtypes
for c in ["posting_date", "transaction_date", "exit_time"]:
    if c in df_flag.columns:
        df_flag[c] = pd.to_datetime(df_flag[c], errors="coerce")

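# --- A. Posting/transaction timing ---
# Derive the lag in days if earlier cells haven't already created it
if "post_lag_days" not in df_flag.columns:
    df_flag["post_lag_days"] = (df_flag["posting_date"] - df_flag["transaction_date"]).dt.days
df_flag["flag_post_lag_high"] = (df_flag["post_lag_days"] > LAG_MAX)
df_flag["flag_post_lag_negative"] = (df_flag["post_lag_days"] < 0)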

# --- B1. Large credits whose description doesn't look like a top-up ---
desc_str = df_flag["description"].astype(str).str.lower()
# top[\s-]*up matches "top up", "top-up" and "topup"
is_topup_like_text = desc_str.str.contains(r"repleni|top[\s-]*up|payment|credit|deposit|reload|fund", regex=True)
df_flag["flag_topup_large_no_text"] = (df_flag["amount"] >= TOPUP_MIN) & (~is_topup_like_text)

# --- B2. Zero/near-zero when fare should bill ---
fare_str = df_flag["fare_type"].astype(str).str.lower()
should_bill = fare_str.str.contains(r"toll|cash|ezpass|charge|debit")
df_flag["flag_zero_amount"] = should_bill & (df_flag["amount"].abs() < 0.01)

# --- B3. Outlier vs typical group median ---
grp_cols = ["agency", "exit_plaza", "vehicle_type_code", "fare_type"]
grp = df_flag.groupby(grp_cols, dropna=False)["amount"].median().rename("group_median")
df_flag = df_flag.merge(grp, on=grp_cols, how="left")
df_flag["amt_dev"] = (df_flag["amount"] - df_flag["group_median"]).abs()
df_flag["flag_amt_outlier"] = (df_flag["group_median"].notna()) & (df_flag["amt_dev"] > AMT_DEV_PCT * df_flag["group_median"].abs())

# --- C1. Back-to-back exits for same tag within too-short time ---
df_flag = df_flag.sort_values(["tag_plate_number", "exit_time"])
dt = df_flag.groupby("tag_plate_number")["exit_time"].diff().dt.total_seconds() / 60.0
df_flag["mins_since_prev_exit"] = dt
df_flag["flag_fast_repeat_exit"] = dt.notna() & (dt >= 0) & (dt < MIN_EXIT_GAP_MIN)

# --- C2. Agency hop too fast ---
prev_agency = df_flag.groupby("tag_plate_number")["agency"].shift(1)
df_flag["flag_agency_hop_fast"] = (
    prev_agency.notna()
    & (prev_agency != df_flag["agency"])
    & dt.notna() & (dt >= 0) & (dt < AGENCY_HOP_MIN)
)

# --- D. Balance sanity ---
df_flag["flag_negative_balance"] = df_flag["balance"] < 0

# --- D2. Prepaid tag at low balance with no recent top-up ---
# "Recent" = any top-up (amount >= TOPUP_MIN) within the tag's last TOPUP_WINDOW_EVENTS rows,
# including the current one; rows are already sorted by tag and exit_time above
TOPUP_WINDOW_EVENTS = 10
LOW_BAL_MAX = 5.0
is_topup = (df_flag["amount"] >= TOPUP_MIN).astype(float)  # 0/1 values for the rolling max
recent_topup_rolled = (
    is_topup.groupby(df_flag["tag_plate_number"])
    .rolling(window=TOPUP_WINDOW_EVENTS, min_periods=1).max()
    .reset_index(level=0, drop=True)
)
df_flag["flag_low_bal_no_topup"] = (
    (df_flag.get("prepaid", False) == True)
    & (df_flag["balance"] < LOW_BAL_MAX)
    & (recent_topup_rolled == 0)
)

# --- E. Quiet-hours oddity (simple global rule; can refine per-tag baseline) ---
if "exit_hour" not in df_flag.columns and "exit_time" in df_flag.columns:
    df_flag["exit_hour"] = df_flag["exit_time"].dt.hour
df_flag["flag_quiet_hours"] = df_flag["exit_hour"].isin(QUIET_HOURS)

# --- Collect reasons & final flag ---
flag_cols = [
    "flag_post_lag_high","flag_post_lag_negative",
    "flag_topup_large_no_text","flag_zero_amount","flag_amt_outlier",
    "flag_fast_repeat_exit","flag_agency_hop_fast",
    "flag_negative_balance","flag_low_bal_no_topup","flag_quiet_hours"
]
df_flag["any_rule_flag"] = df_flag[flag_cols].any(axis=1)

def reasons(row):
    return [c for c in flag_cols if bool(row[c])]

df_flag["flag_reasons"] = df_flag.apply(reasons, axis=1)

# Preview suspicious rows with key context
view_cols = ["posting_date","transaction_date","exit_time","tag_plate_number","agency",
             "exit_plaza","vehicle_type_code","fare_type","amount","balance",
             "post_lag_days","mins_since_prev_exit","any_rule_flag","flag_reasons"]
suspicious = df_flag[df_flag["any_rule_flag"]].sort_values(["tag_plate_number","exit_time"])
suspicious[view_cols].head(20)