# Stateful Fraud Detection with transformWithStateInPandas

This notebook demonstrates **advanced streaming fraud detection** using PySpark's `transformWithStateInPandas` API.

## What is transformWithStateInPandas?

`transformWithStateInPandas` is the **next-generation** stateful streaming operator in Apache Spark 4.0+, designed as the successor to the older `applyInPandasWithState` API. According to the [official Spark documentation](https://spark.apache.org/docs/latest/streaming/structured-streaming-transform-with-state.html), it provides:

- **Object-oriented design**: Define stateful logic using `StatefulProcessor` classes
- **State variable types**: `ValueState`, `ListState`, `MapState` for optimized operations
- **Automatic TTL eviction**: Built-in Time-To-Live for state cleanup
- **Timer management**: Register, list, and delete timers for time-based processing
- **State schema evolution**: Add/remove state variables across query runs
- **Checkpointed timers**: Fault-tolerant timer persistence

## Fraud Detection Features

This notebook calculates real-time fraud indicators based on:

1. **Transaction Velocity**: Count of transactions in time windows
2. **IP Address Changes**: Frequency of IP changes per user
3. **Location Anomalies**: Geographic distance from previous transaction
4. **Amount Patterns**: Statistical anomalies in transaction amounts
5. **Time-based Patterns**: Unusual transaction timing
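
As an illustration of the amount-pattern indicator (item 4), here is a minimal, Spark-free sketch of a z-score check. It assumes a bounded history of prior amounts and a threshold of 3, and it scores each amount against the mean and standard deviation of prior amounts only. The names `amount_zscore` and `min_history` are hypothetical and exist only for this example.

```python
from collections import deque
from statistics import fmean, pstdev


def amount_zscore(history, amount, min_history=3):
    """Return the z-score of `amount` against prior amounts, or None if undefined."""
    if len(history) < min_history:
        return None  # too little history to estimate a spread
    std = pstdev(history)  # population standard deviation (numpy's np.std default)
    if std == 0:
        return None  # all prior amounts identical: z-score is undefined
    return (amount - fmean(history)) / std


# Bounded history: only the 50 most recent amounts are retained
history = deque([20.0, 25.0, 22.0, 24.0, 21.0], maxlen=50)

z = amount_zscore(history, 500.0)
is_anomaly = z is not None and abs(z) > 3

history.append(500.0)  # record the amount only after scoring it
```

Scoring before appending keeps an outlier from inflating the very statistics it is compared against.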

## Architecture

```
Streaming Source (Rate/Kafka)
    ↓
Feature Generation (TransactionDataGenerator)
    ↓
Group by Key (user_id)
    ↓
transformWithStateInPandas (StatefulProcessor)
  • Maintain transaction history per user (single consolidated ValueState)
  • Calculate velocity features
  • Detect location anomalies
  • Track IP changes
  • Compute fraud scores
    ↓
Write to Lakebase PostgreSQL (foreachBatch)
    ↓
Real-time Feature Serving (<10ms latency)
```
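
The "Calculate velocity features" step in the diagram amounts to counting recent timestamps inside a look-back window. The sketch below shows that idea in plain Python; `count_in_window` is a hypothetical helper, not a function from this notebook or from Spark.

```python
from datetime import datetime, timedelta


def count_in_window(timestamps, now, window):
    """Count events whose timestamp falls within [now - window, now]."""
    cutoff = now - window
    return sum(1 for t in timestamps if cutoff <= t <= now)


now = datetime(2024, 1, 1, 12, 0, 0)
# Transactions 75, 45, 9, 5, 1 and 0 minutes before `now`
recent = [now - timedelta(minutes=m) for m in (75, 45, 9, 5, 1, 0)]

last_hour = count_in_window(recent, now, timedelta(hours=1))
last_10min = count_in_window(recent, now, timedelta(minutes=10))
```

Because the processor keeps only the 50 most recent timestamps per user, a windowed count derived from that history can never exceed 50.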

## Prerequisites

- Run `00_setup.ipynb` first to provision Lakebase PostgreSQL
- Databricks Runtime with Spark 4.0+ (for transformWithStateInPandas support)
- Lakebase PostgreSQL instance configured and accessible


In [None]:
# Import required libraries
from pyspark.sql import SparkSession
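# Namespaced import: a wildcard import of pyspark.sql.functions would shadow the
# Python builtins sum, max, min, and abs that the processor below relies on
from pyspark.sql import functions as F
from pyspark.sql.types import (
    StructType, StructField, StringType, TimestampType,
    DoubleType, IntegerType, ArrayType
)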
from pyspark.sql.streaming import StatefulProcessor, StatefulProcessorHandle
# State variable classes are defined in pyspark.sql.streaming.stateful_processor
from pyspark.sql.streaming.stateful_processor import ValueState, ListState
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from typing import Iterator
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

print("Imports successful")


In [None]:
# Import utility modules
from utils.data_generator import TransactionDataGenerator
from utils.lakebase_client import LakebaseClient

# Initialize data generator
data_gen = TransactionDataGenerator(spark)

print("Utility modules loaded")


In [None]:
# Configure Lakebase connection
LAKEBASE_CONFIG = {
    "instance_name": "neha-lakebase-demo",
    "database": "databricks_postgres"
}

# Initialize Lakebase client
lakebase = LakebaseClient(**LAKEBASE_CONFIG)

# Test connection
if lakebase.test_connection():
    print("Connected to Lakebase PostgreSQL")
else:
    raise ConnectionError("Failed to connect to Lakebase")


## Step 1: Generate Streaming Transaction Data

Generate synthetic transaction data with fraud indicators.


In [None]:
# Generate streaming transaction data
df_transactions = data_gen.generate_transaction_data(
    rows_per_second=10,
    num_users=100,
    fraud_ratio=0.1
)

print("Schema of streaming transactions:")
df_transactions.printSchema()


## Step 2: Define Output Schema

Define the output schema for fraud features. State is managed internally by the `StatefulProcessor` using a single consolidated `ValueState`.


In [None]:
# Define output schema - fraud features per transaction
output_schema = StructType([
    StructField("transaction_id", StringType(), False),
    StructField("user_id", StringType(), False),
    StructField("timestamp", TimestampType(), False),
    StructField("amount", DoubleType(), False),
    StructField("merchant_id", StringType(), False),
    StructField("ip_address", StringType(), False),
    StructField("latitude", DoubleType(), False),
    StructField("longitude", DoubleType(), False),
    
    # Fraud detection features
    StructField("user_transaction_count", IntegerType(), False),
    StructField("transactions_last_hour", IntegerType(), False),
    StructField("transactions_last_10min", IntegerType(), False),
    StructField("ip_changed", IntegerType(), False),
    StructField("ip_change_count_total", IntegerType(), False),
    StructField("distance_from_last_km", DoubleType(), True),
    StructField("velocity_kmh", DoubleType(), True),
    StructField("amount_vs_user_avg_ratio", DoubleType(), True),
    StructField("amount_vs_user_max_ratio", DoubleType(), True),
    StructField("amount_zscore", DoubleType(), True),
    StructField("seconds_since_last_transaction", DoubleType(), True),
    StructField("is_rapid_transaction", IntegerType(), False),
    StructField("is_impossible_travel", IntegerType(), False),
    StructField("is_amount_anomaly", IntegerType(), False),
    StructField("fraud_score", DoubleType(), False),
    StructField("is_fraud_prediction", IntegerType(), False)
])

print("Output schema defined")


## Step 3: Implement FraudDetectorProcessor (StatefulProcessor)

Create a `StatefulProcessor` class that implements the fraud detection logic. This is the object-oriented approach introduced in Spark 4.0+ for `transformWithStateInPandas`.


In [None]:
def calculate_haversine_distance(lat1, lon1, lat2, lon2):
    """
    Calculate distance between two geographic points in kilometers.
    """
    if pd.isna(lat1) or pd.isna(lon1) or pd.isna(lat2) or pd.isna(lon2):
        return None
    
    R = 6371.0  # Earth radius in kilometers
    
    # Convert to radians
    lat1_rad = np.radians(lat1)
    lon1_rad = np.radians(lon1)
    lat2_rad = np.radians(lat2)
    lon2_rad = np.radians(lon2)
    
    # Haversine formula
    dlat = lat2_rad - lat1_rad
    dlon = lon2_rad - lon1_rad
    a = np.sin(dlat/2)**2 + np.cos(lat1_rad) * np.cos(lat2_rad) * np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    
    return R * c

print("Distance calculation helper function defined")
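
To see how the distance feeds the impossible-travel check, the following standalone sketch recomputes a haversine distance with the standard `math` module and converts it to an implied speed. The helper names `haversine_km` and `speed_kmh` and the London/Paris coordinates are illustrative assumptions, not part of the notebook.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    a = min(1.0, max(0.0, a))  # guard against tiny floating-point overshoot
    return 2 * radius_km * math.asin(math.sqrt(a))


def speed_kmh(distance_km, seconds):
    """Implied travel speed, or None when the elapsed time is unusable."""
    if seconds is None or seconds <= 0:
        return None
    return distance_km / seconds * 3600


# Two transactions 10 minutes apart: central London, then central Paris
d = haversine_km(51.5074, -0.1278, 48.8566, 2.3522)
v = speed_kmh(d, 600)
is_impossible_travel = v is not None and v > 800
```

Returning `None` for a zero or negative time gap avoids a division by zero when two transactions share a timestamp.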


In [None]:
class FraudDetectorProcessor(StatefulProcessor):
    """
    StatefulProcessor for fraud detection using transformWithStateInPandas.
    
    This processor maintains a single consolidated state per user to detect fraud patterns including:
    - Transaction velocity (time windows)
    - IP address changes
    - Geographic anomalies (impossible travel)
    - Amount-based anomalies
    """
    
    def init(self, handle: StatefulProcessorHandle) -> None:
        """
        Initialize the stateful processor with a single consolidated state variable.
        
        Consolidated state includes:
        - Transaction count
        - Last transaction details (timestamp, IP, location)
        - IP change count
        - Amount statistics (total, avg, max)
        - Recent transaction times (up to 50)
        - Recent transaction amounts (up to 50)
        """
        self.handle = handle
        
        # Define comprehensive state schema - consolidates ALL state into one object
        state_schema = StructType([
            # Transaction count
            StructField("transaction_count", IntegerType(), False),
            
            # Last transaction details
            StructField("last_timestamp", TimestampType(), True),
            StructField("last_ip_address", StringType(), True),
            StructField("last_latitude", DoubleType(), True),
            StructField("last_longitude", DoubleType(), True),
            
            # IP change tracking
            StructField("ip_change_count", IntegerType(), False),
            
            # Amount statistics
            StructField("total_amount", DoubleType(), False),
            StructField("avg_amount", DoubleType(), False),
            StructField("max_amount", DoubleType(), False),
            
            # Recent transaction history (bounded to 50 each)
            StructField("recent_timestamps", ArrayType(TimestampType()), False),
            StructField("recent_amounts", ArrayType(DoubleType()), False)
        ])
        
        # Initialize SINGLE consolidated state variable with TTL (1 hour of inactivity).
        # PySpark's getValueState takes the TTL as an integer number of milliseconds.
        ttl_duration_ms = int(timedelta(hours=1).total_seconds() * 1000)
        
        self.user_state = handle.getValueState(
            "user_fraud_state",  # Single state variable name
            state_schema,
            ttl_duration_ms
        )
    
    def handleInputRows(self, key, rows, timer_values) -> Iterator[pd.DataFrame]:
        """
        Process input rows for a given user and emit fraud features.
        
        Args:
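            key: Grouping key as a tuple; key[0] is the user_id
            rows: Iterator of Pandas DataFrames containing transactions for this user
            timer_values: Timer values (not used in this implementation)
        
        Yields:
            pd.DataFrame: Enriched transactions with fraud features
        """
        # The grouping key arrives as a tuple, so unpack the single user_id value
        user_id = key[0]
        
        # Process each chunk of rows delivered for this user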
        for pdf in rows:
            if pdf.empty:
                continue
            
            # Sort by timestamp
            pdf = pdf.sort_values('timestamp')
            
            # Retrieve existing state (single consolidated object)
            if self.user_state.exists():
                state = self.user_state.get()
                prev_count = state[0]  # transaction_count
                prev_last_time = state[1]  # last_timestamp
                prev_ip = state[2]  # last_ip_address
                prev_lat = state[3]  # last_latitude
                prev_lon = state[4]  # last_longitude
                prev_ip_changes = state[5]  # ip_change_count
                prev_total_amount = state[6]  # total_amount
                prev_avg_amount = state[7]  # avg_amount
                prev_max_amount = state[8]  # max_amount
                prev_times = list(state[9]) if state[9] else []  # recent_timestamps
                prev_amounts = list(state[10]) if state[10] else []  # recent_amounts
            else:
                # Initialize state for new user
                prev_count = 0
                prev_last_time = None
                prev_ip = None
                prev_lat = None
                prev_lon = None
                prev_ip_changes = 0
                prev_total_amount = 0.0
                prev_avg_amount = 0.0
                prev_max_amount = 0.0
                prev_times = []
                prev_amounts = []
            
            # Process each transaction
            results = []
            
            for idx, row in pdf.iterrows():
                current_time = row['timestamp']
                current_ip = row['ip_address']
                current_lat = row['latitude']
                current_lon = row['longitude']
                current_amount = row['amount']
                
                # Update transaction count
                prev_count += 1
                
                # Calculate time-based features
                if prev_last_time is not None:
                    time_diff = (current_time - prev_last_time).total_seconds()
                else:
                    time_diff = None
                
                # IP change detection
                ip_changed = 0
                if prev_ip is not None and current_ip != prev_ip:
                    ip_changed = 1
                    prev_ip_changes += 1
                
                # Geographic distance calculation
                distance_km = None
                velocity_kmh = None
                if prev_lat is not None and prev_lon is not None:
                    distance_km = calculate_haversine_distance(
                        prev_lat, prev_lon, current_lat, current_lon
                    )
                    if distance_km is not None and time_diff is not None and time_diff > 0:
                        velocity_kmh = (distance_km / time_diff) * 3600
                
                # Amount-based features
                prev_total_amount += current_amount
                prev_avg_amount = prev_total_amount / prev_count
                prev_max_amount = max(prev_max_amount, current_amount)
                
                amount_vs_avg_ratio = current_amount / prev_avg_amount if prev_avg_amount > 0 else 1.0
                amount_vs_max_ratio = current_amount / prev_max_amount if prev_max_amount > 0 else 1.0
                
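                # Z-score calculation: mean and std both come from the recent history,
                # which does not yet include the current amount
                amount_zscore = None
                if len(prev_amounts) >= 3:
                    amounts_std = np.std(prev_amounts)
                    if amounts_std > 0:
                        amount_zscore = float(
                            (current_amount - np.mean(prev_amounts)) / amounts_std
                        )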
                
                # Update recent transactions (bounded to 50)
                prev_times.append(current_time)
                prev_amounts.append(current_amount)
                if len(prev_times) > 50:
                    prev_times = prev_times[-50:]
                    prev_amounts = prev_amounts[-50:]
                
                # Count transactions in time windows
                one_hour_ago = current_time - timedelta(hours=1)
                ten_min_ago = current_time - timedelta(minutes=10)
                
                trans_last_hour = sum(1 for t in prev_times if t >= one_hour_ago)
                trans_last_10min = sum(1 for t in prev_times if t >= ten_min_ago)
                
                # Fraud indicators
                is_rapid = 1 if trans_last_10min >= 5 else 0
                is_impossible_travel = 1 if velocity_kmh is not None and velocity_kmh > 800 else 0
                is_amount_anomaly = 1 if amount_zscore is not None and abs(amount_zscore) > 3 else 0
                
                # Calculate fraud score (0-100)
                fraud_score = 0.0
                if is_rapid:
                    fraud_score += 20
                if is_impossible_travel:
                    fraud_score += 30
                if is_amount_anomaly:
                    fraud_score += 25
                if prev_ip_changes >= 5:
                    fraud_score += 15
                if trans_last_hour >= 10:
                    fraud_score += 10
                fraud_score = min(fraud_score, 100.0)
                
                # Fraud prediction
                is_fraud_pred = 1 if fraud_score >= 50 else 0
                
                # Append result
                results.append({
                    'transaction_id': row['transaction_id'],
                    'user_id': user_id,
                    'timestamp': current_time,
                    'amount': current_amount,
                    'merchant_id': row['merchant_id'],
                    'ip_address': current_ip,
                    'latitude': current_lat,
                    'longitude': current_lon,
                    'user_transaction_count': prev_count,
                    'transactions_last_hour': trans_last_hour,
                    'transactions_last_10min': trans_last_10min,
                    'ip_changed': ip_changed,
                    'ip_change_count_total': prev_ip_changes,
                    'distance_from_last_km': distance_km,
                    'velocity_kmh': velocity_kmh,
                    'amount_vs_user_avg_ratio': amount_vs_avg_ratio,
                    'amount_vs_user_max_ratio': amount_vs_max_ratio,
                    'amount_zscore': amount_zscore,
                    'seconds_since_last_transaction': time_diff,
                    'is_rapid_transaction': is_rapid,
                    'is_impossible_travel': is_impossible_travel,
                    'is_amount_anomaly': is_amount_anomaly,
                    'fraud_score': fraud_score,
                    'is_fraud_prediction': is_fraud_pred
                })
                
                # Update state for next transaction
                prev_last_time = current_time
                prev_ip = current_ip
                prev_lat = current_lat
                prev_lon = current_lon
            
            # Update SINGLE consolidated state object (atomic update)
            self.user_state.update((
                prev_count,           # transaction_count
                prev_last_time,       # last_timestamp
                prev_ip,              # last_ip_address
                prev_lat,             # last_latitude
                prev_lon,             # last_longitude
                prev_ip_changes,      # ip_change_count
                prev_total_amount,    # total_amount
                prev_avg_amount,      # avg_amount
                prev_max_amount,      # max_amount
                prev_times,           # recent_timestamps
                prev_amounts          # recent_amounts
            ))
            
            # Yield results
            if results:
                yield pd.DataFrame(results)
    
    def handleExpiredTimer(self, key, timer_values, expired_timer_info) -> Iterator[pd.DataFrame]:
        """
        Handle expired timers (not used in this implementation).
        """
        # No timers are registered, so there is nothing to emit
        return iter([])
    
    def close(self) -> None:
        """
        Perform cleanup operations (none needed).
        """
        pass

print("FraudDetectorProcessor class defined with CONSOLIDATED state")


## Step 4: Apply transformWithStateInPandas

Apply the `FraudDetectorProcessor` to the streaming data using `transformWithStateInPandas`.


In [None]:
# Apply stateful fraud detection using transformWithStateInPandas
df_with_fraud_features = df_transactions \
    .withWatermark("timestamp", "10 minutes") \
    .groupBy("user_id") \
    .transformWithStateInPandas(
        statefulProcessor=FraudDetectorProcessor(),
        outputStructType=output_schema,
        outputMode="Append",
        timeMode="None"
    )

print("Stateful processing configured with transformWithStateInPandas")
print("\nOutput schema:")
df_with_fraud_features.printSchema()


## Step 5: Create Unified Feature Table in Lakebase

Create a comprehensive PostgreSQL table that stores **ALL features** in one place:

### Table Design: `fraud_features`

This unified table combines:

1. **Stateless Transaction Features** (from `FeatureEngineer` class):
   - Time-based features (cyclical encodings, business hours, etc.)
   - Amount-based features (log, sqrt, categories, etc.)
   - Merchant risk scores
   - Device and network features
   
2. **Stateful Fraud Detection Features** (from `FraudDetectorProcessor`):
   - Transaction velocity (counts in time windows)
   - IP change tracking
   - Geographic anomalies (impossible travel)
   - Amount anomalies (z-scores, ratios)
   - Composite fraud scores and predictions

### Why One Table?

✅ **Unified feature store** - All features for ML in one query  
✅ **Simplified architecture** - Single table vs joining multiple tables  
✅ **Better performance** - No joins needed for model inference  
✅ **Easier to maintain** - One schema to manage

**Note:** If you want to add stateless features, you can apply `FeatureEngineer` transformations before or after the `FraudDetectorProcessor`. For this demo, we'll create the table structure ready to accept both types of features.


In [None]:
# Create unified fraud features table using LakebaseClient method
print("Creating unified fraud_features table in Lakebase...")
lakebase.create_fraud_features_table("fraud_features")
print("Table created successfully!")
print("\nThis unified table includes:")
print("  • Stateless transaction features (~40 columns)")
print("  • Stateful fraud detection features (~25 columns)")
print("  • Processing metadata (5 columns)")
print("  • Total: ~70 columns ready for ML")


## Step 6: Write Fraud Features to Lakebase

Stream fraud features to Lakebase PostgreSQL for real-time serving.
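
`foreachBatch` offers at-least-once delivery by default, so a micro-batch can be written again after a failure and restart. One common way to make the sink tolerate replays is an upsert keyed on `transaction_id`. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL and a three-column toy table; how `LakebaseClient.write_streaming_batch` actually writes is not shown in this notebook, so treat `upsert_batch` as a hypothetical illustration only.

```python
import sqlite3


def upsert_batch(conn, rows):
    """Insert rows keyed by transaction_id; replays overwrite instead of duplicating."""
    conn.executemany(
        """
        INSERT INTO fraud_features (transaction_id, user_id, fraud_score)
        VALUES (?, ?, ?)
        ON CONFLICT(transaction_id) DO UPDATE SET
            user_id = excluded.user_id,
            fraud_score = excluded.fraud_score
        """,
        rows,
    )
    conn.commit()


# In-memory database standing in for the real feature store
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fraud_features ("
    "transaction_id TEXT PRIMARY KEY, user_id TEXT, fraud_score REAL)"
)

batch = [("t1", "user_001", 20.0), ("t2", "user_002", 55.0)]
upsert_batch(conn, batch)
upsert_batch(conn, batch)  # simulated replay of the same micro-batch

row_count = conn.execute("SELECT COUNT(*) FROM fraud_features").fetchone()[0]
```

PostgreSQL supports a similar `INSERT ... ON CONFLICT (transaction_id) DO UPDATE` form, which requires a unique constraint or primary key on the conflict column.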


In [None]:
# Define foreachBatch function to write to Lakebase
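def write_to_lakebase(batch_df, batch_id):
    """
    Write each micro-batch to Lakebase PostgreSQL.
    """
    # Cache the batch so counting and writing do not recompute the stateful plan
    batch_df.persist()
    try:
        row_count = batch_df.count()
        if row_count == 0:
            return
        
        logger.info(f"Processing batch {batch_id} with {row_count} rows")
        
        # Write to Lakebase using client
        lakebase.write_streaming_batch(batch_df, "fraud_features")
        
        logger.info(f"Batch {batch_id} written to Lakebase")
    finally:
        # Release the cached batch whether or not the write succeeded
        batch_df.unpersist()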


# Start streaming query to Lakebase
query_lakebase = df_with_fraud_features \
    .writeStream \
    .outputMode("append") \
    .foreachBatch(write_to_lakebase) \
    .option("checkpointLocation", "/tmp/fraud_detection_checkpoint") \
    .trigger(processingTime="10 seconds") \
    .start()

print("Streaming to Lakebase PostgreSQL...")
print(f"Query ID: {query_lakebase.id}")
print(f"Status: {query_lakebase.status}")


## Step 7: Monitor and Query Fraud Features

Query fraud features from Lakebase for real-time insights.


In [None]:
# Wait for data to process
import time
print("Waiting 30 seconds for data to process...")
time.sleep(30)

# Query top users by fraud score
query_results = """
SELECT 
    user_id,
    COUNT(*) as total_transactions,
    SUM(is_fraud_prediction) as predicted_frauds,
    AVG(fraud_score) as avg_fraud_score,
    MAX(fraud_score) as max_fraud_score,
    SUM(is_rapid_transaction) as rapid_transactions,
    SUM(is_impossible_travel) as impossible_travels,
    SUM(is_amount_anomaly) as amount_anomalies
FROM fraud_features
GROUP BY user_id
ORDER BY predicted_frauds DESC
LIMIT 10
"""

with lakebase.get_connection() as conn:
    result_df = pd.read_sql(query_results, conn)

print("\nTop 10 Users by Predicted Fraud Count:")
display(result_df)


In [None]:
# Query high-risk transactions
high_risk_query = """
SELECT 
    transaction_id,
    user_id,
    timestamp,
    amount,
    fraud_score,
    is_rapid_transaction,
    is_impossible_travel,
    is_amount_anomaly,
    transactions_last_10min,
    velocity_kmh
FROM fraud_features
WHERE fraud_score >= 50
ORDER BY fraud_score DESC, timestamp DESC
LIMIT 20
"""

with lakebase.get_connection() as conn:
    high_risk_df = pd.read_sql(high_risk_query, conn)

print("\nHigh-Risk Transactions (fraud_score >= 50):")
display(high_risk_df)


In [None]:
# Real-time feature serving example - Get features for specific user
def get_user_fraud_features(user_id: str):
    """
    Get real-time fraud features for a user from Lakebase PostgreSQL.
    Target query latency: <10ms (not measured in this notebook)
    """
    query = """
    SELECT 
        transaction_id,
        timestamp,
        amount,
        user_transaction_count,
        transactions_last_hour,
        transactions_last_10min,
        fraud_score,
        is_fraud_prediction
    FROM fraud_features
    WHERE user_id = %s
    ORDER BY timestamp DESC
    LIMIT 10
    """
    
    with lakebase.get_connection() as conn:
        df = pd.read_sql(query, conn, params=(user_id,))
    
    return df

# Example: Get features for a user
sample_user = "user_001"
user_features = get_user_fraud_features(sample_user)

print(f"\nRecent transactions for {sample_user}:")
display(user_features)


## Step 8: Stop Streaming Queries

When done, stop the streaming query.


In [None]:
# Stop streaming query
if query_lakebase.isActive:
    query_lakebase.stop()
    print("Streaming query stopped")

print("\nAll streaming queries stopped successfully")


## Summary

This notebook demonstrated advanced streaming fraud detection using `transformWithStateInPandas`, the next-generation stateful streaming API in Apache Spark 4.0+, with a **consolidated state object**.

### Key Capabilities

1. **Object-Oriented Stateful Processing**: Using `StatefulProcessor` class with `transformWithStateInPandas`
2. **Consolidated State Management**: Single `ValueState` object containing all user state (atomic updates)
3. **Automatic TTL Eviction**: Built-in 1-hour TTL for state cleanup
4. **Fraud Features**:
   - Transaction velocity (counts in time windows)
   - IP address change tracking
   - Geographic anomalies (impossible travel detection)
   - Amount-based anomalies (z-score, ratios)
   - Composite fraud scores (0-100)
5. **Real-time Serving**: Writing features to Lakebase PostgreSQL for <10ms query latency
6. **Production Patterns**: Proper watermarking and checkpointing

### Key Advantages of transformWithStateInPandas

According to the [Apache Spark documentation](https://spark.apache.org/docs/latest/streaming/structured-streaming-transform-with-state.html), `transformWithStateInPandas` provides:

- **Object-oriented design**: Define stateful logic using `StatefulProcessor` classes (vs function-based `applyInPandasWithState`)
- **State variable types**: `ValueState`, `ListState`, `MapState` optimized for different operations
- **Automatic TTL eviction**: Built-in Time-To-Live for state cleanup (1 hour in this example)
- **Timer management**: Register, list, and delete timers for time-based processing
- **State schema evolution**: Add/remove state variables across query runs
- **Checkpointed timers**: Fault-tolerant timer persistence
- **Next-generation API**: Successor to the older `applyInPandasWithState` in Spark 4.0+

### Consolidated State Architecture

Instead of managing 6 separate state variables, this implementation uses a **single consolidated `ValueState`** for:

```
user_fraud_state (ValueState) - SINGLE STATE OBJECT
  ├── transaction_count (int)
  ├── last_timestamp (timestamp)
  ├── last_ip_address (string)
  ├── last_latitude (double)
  ├── last_longitude (double)
  ├── ip_change_count (int)
  ├── total_amount (double)
  ├── avg_amount (double)
  ├── max_amount (double)
  ├── recent_timestamps (array<timestamp>)  # Bounded to 50
  └── recent_amounts (array<double>)        # Bounded to 50
```

**Benefits of Consolidated State:**
- ✅ **Atomic updates**: All state updated in a single operation
- ✅ **Simplified code**: Single state variable vs 6 separate ones
- ✅ **Fewer state operations**: Single read/write vs multiple operations
- ✅ **Easier to reason about**: State is cohesive and self-contained
- ✅ **Schema evolution**: Easier to add/modify fields in one place
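
One trade-off of a consolidated tuple is that positional access such as `state[9]` is easy to get wrong. A possible mitigation, sketched here as an assumption rather than as part of the notebook, is a `NamedTuple` whose field order mirrors the state schema. `UserFraudState` and `empty_state` are hypothetical names.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class UserFraudState(NamedTuple):
    """Field order mirrors the user_fraud_state schema shown above."""
    transaction_count: int
    last_timestamp: Optional[datetime]
    last_ip_address: Optional[str]
    last_latitude: Optional[float]
    last_longitude: Optional[float]
    ip_change_count: int
    total_amount: float
    avg_amount: float
    max_amount: float
    recent_timestamps: List[datetime]
    recent_amounts: List[float]


def empty_state() -> UserFraudState:
    """State for a user seen for the first time (fresh lists on every call)."""
    return UserFraudState(0, None, None, None, None, 0, 0.0, 0.0, 0.0, [], [])


# A raw tuple shaped like the value read back from the state store
raw = (3, datetime(2024, 1, 1, 12, 0), "10.0.0.1", 51.5, -0.12, 1, 90.0, 30.0, 50.0, [], [])

state = UserFraudState(*raw)  # named access instead of state[0], state[1], ...
updated = state._replace(transaction_count=state.transaction_count + 1)
as_tuple = tuple(updated)  # plain tuple in schema order, ready to write back
```

Converting back with `tuple(...)` keeps the value a plain tuple, which is the shape the notebook passes to `update`.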

### StatefulProcessor Implementation

```
FraudDetectorProcessor (StatefulProcessor)
  ├── init() - Initialize single consolidated state
  │   └── user_fraud_state (ValueState with 11 fields, including 2 arrays)
  │
  ├── handleInputRows() - Process transactions
  │   ├── state = self.user_state.get() (single read)
  │   ├── Calculate fraud features
  │   └── self.user_state.update((...)) (single atomic write)
  │
  ├── handleExpiredTimer() - Handle timers (not used here)
  └── close() - Cleanup operations (none needed)
```
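
The read, compute, write cycle outlined above can be exercised without a cluster by substituting an in-memory stand-in for the state variable. The class below is a hypothetical test double that mirrors only the `exists`, `get`, and `update` calls used in this notebook (plus `clear`); it is not a Spark class, and `process_amounts` is a deliberately tiny stand-in for the real feature logic.

```python
from typing import Optional, Tuple


class InMemoryValueState:
    """Hypothetical test double for a single-value state variable."""

    def __init__(self) -> None:
        self._value: Optional[Tuple] = None

    def exists(self) -> bool:
        return self._value is not None

    def get(self) -> Optional[Tuple]:
        return self._value

    def update(self, new_value: Tuple) -> None:
        self._value = new_value

    def clear(self) -> None:
        self._value = None


def process_amounts(state, amounts):
    """Read state once, fold in a batch of amounts, then write state once."""
    count, total = state.get() if state.exists() else (0, 0.0)
    for amount in amounts:
        count += 1
        total += amount
    state.update((count, total))  # single write at the end of the batch
    average = total / count if count else 0.0
    return count, average


state = InMemoryValueState()
first = process_amounts(state, [10.0, 30.0])   # first batch for this key
second = process_amounts(state, [20.0])        # state carries over
```

Keeping the per-row arithmetic in plain functions like this makes it possible to unit-test the fraud logic separately from the streaming runtime.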

### Fraud Detection Logic

**Fraud Score Calculation (0-100 points):**
- Rapid transactions (5+ in 10 min): +20 points
- Impossible travel (>800 km/h): +30 points  
- Amount anomaly (|z-score| > 3): +25 points
- Frequent IP changes (5+ total): +15 points
- High velocity (10+ in 1 hour): +10 points

**Fraud Prediction:** Score >= 50 triggers fraud flag
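
The scoring rules above can be restated as a small pure function, which is convenient for testing thresholds in isolation. This is a sketch that follows the listed weights; the function names `fraud_score` and `is_fraud` are chosen for illustration.

```python
def fraud_score(is_rapid, is_impossible_travel, is_amount_anomaly,
                ip_change_count, transactions_last_hour):
    """Sum the rule weights listed above and cap the result at 100."""
    score = 0.0
    if is_rapid:                        # 5+ transactions in 10 minutes
        score += 20
    if is_impossible_travel:            # implied speed above 800 km/h
        score += 30
    if is_amount_anomaly:               # |z-score| above 3
        score += 25
    if ip_change_count >= 5:            # frequent IP changes
        score += 15
    if transactions_last_hour >= 10:    # high hourly volume
        score += 10
    return min(score, 100.0)


def is_fraud(score, threshold=50.0):
    """Flag a transaction once its score reaches the threshold."""
    return score >= threshold


# Rapid transactions plus impossible travel reach the threshold exactly
score_a = fraud_score(True, True, False, 0, 0)
# Rapid transactions plus an amount anomaly fall just short
score_b = fraud_score(True, False, True, 2, 3)
# Every rule firing at once hits the cap
score_c = fraud_score(True, True, True, 5, 10)
```

With these weights, no single rule can trigger the flag on its own; at least two indicators must fire together.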

### Real-time Architecture

```
Streaming Transactions
    ↓
transformWithStateInPandas (grouped by user_id)
  • StatefulProcessor with SINGLE consolidated ValueState
  • Automatic TTL (1 hour)
  • Bounded arrays (last 50 transactions)
  • Atomic state updates
    ↓
foreachBatch
    ↓
Lakebase PostgreSQL (fraud_features table)
    ↓
Real-time Queries (<10ms latency)
```

### Comparison: Multiple vs Consolidated State

**OLD (6 separate state variables):**
```python
# Multiple reads
if self.transaction_count.exists():
    prev_count = self.transaction_count.get()[0]
if self.last_transaction.exists():
    last_txn = self.last_transaction.get()
    prev_last_time = last_txn[0]
# ... 4 more state variable reads

# Multiple writes
self.transaction_count.update((prev_count,))
self.last_transaction.update((prev_last_time, prev_ip, prev_lat, prev_lon))
self.ip_change_count.update((prev_ip_changes,))
# ... 3 more state variable writes
```

**NEW (1 consolidated state object):**
```python
# Single read
if self.user_state.exists():
    state = self.user_state.get()
    prev_count = state[0]
    prev_last_time = state[1]
    prev_ip = state[2]
    # ... all fields in one read

# Single atomic write
self.user_state.update((
    prev_count, prev_last_time, prev_ip, prev_lat, prev_lon,
    prev_ip_changes, prev_total_amount, prev_avg_amount, 
    prev_max_amount, prev_times, prev_amounts
))
```

### Next Steps

- Integrate with ML models for enhanced fraud scoring
- Add timer-based processing using `handleExpiredTimer()`
- Implement state schema evolution for adding new fields
- Connect to downstream systems (dashboards, notification services)
- Tune TTL duration and processing trigger intervals
- Add more sophisticated fraud detection rules (device fingerprinting, network analysis)
- Implement A/B testing for fraud detection thresholds
- Explore `MapState` for multi-level hierarchical state
