In [None]:
# Enterprise Fraud Detection System - Overview

This notebook provides a comprehensive overview of the Enterprise Fraud Detection System, demonstrating its key concepts, architecture, and capabilities.

## Table of Contents

1. [System Architecture Overview](#System-Architecture-Overview)
2. [Feature Store - 4 Pillar Architecture](#Feature-Store---4-Pillar-Architecture)
3. [Real-time Fraud Detection API](#Real-time-Fraud-Detection-API)
4. [Performance and Scalability](#Performance-and-Scalability)
5. [Next Steps](#Next-Steps)

---

## Introduction

The Enterprise Fraud Detection System implements a "Hub and Spoke" architecture that lets financial institutions detect fraud across multiple products in real time. A shared hub model assesses overall customer risk, while product-specific spoke models score individual transactions.

### Key Features:
- **Unified Customer Risk Assessment**: 360° customer view across all products
- **Product-Specific Detection**: Specialized models for PIX, Credit Cards, Loans, etc.
- **Real-time Inference**: Fraud scoring in under 100 ms (P95)
- **Advanced Feature Engineering**: 4-pillar feature architecture
- **Scalable Architecture**: Designed for 10,000+ transactions per second (TPS)

---


In [None]:
# Import necessary libraries
import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Add the src directory to Python path
sys.path.append('../src')

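# Set plotting style ('seaborn-v0_8' needs matplotlib >= 3.6; fall back if missing)
try:
    plt.style.use('seaborn-v0_8')
except OSError:
    plt.style.use('default')
sns.set_palette("husl")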

print("✅ Libraries imported successfully")
print(f"📅 Notebook run time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")


In [None]:
## System Architecture Overview

The system follows a **Hub and Spoke** architecture pattern:

```
                    ┌─────────────────┐
                    │   Client Apps   │
                    └─────────┬───────┘
                              │
                    ┌─────────▼───────┐
                    │  Fraud API      │
                    │  (FastAPI)      │
                    └─────────┬───────┘
                              │
           ┌──────────────────┼──────────────────┐
           │                  │                  │
    ┌──────▼──────┐          │        ┌─────────▼─────────┐
    │  Hub Model  │          │        │   Spoke Models    │
    │             │          │        │                   │
    │ • Profile   │          │        │ • PIX Model       │
    │ • Behavior  │          │        │ • Credit Card     │
    │ • Network   │          │        │ • Loan Model      │
    └─────────────┘          │        └───────────────────┘
                              │
                    ┌─────────▼───────┐
                    │  Feature Store  │
                    │                 │
                    │ 4-Pillar Design │
                    └─────────────────┘
```

### Key Components:

1. **Hub Model**: Unified customer risk assessment
2. **Spoke Models**: Product-specific fraud detection  
3. **Feature Store**: Advanced feature engineering with 4 pillars
4. **Unified Customer View**: 360° customer profile consolidation
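
The interaction between the hub and the spokes can be sketched in a few lines. This is a minimal illustration, not the system's implementation: the toy scoring rules, the feature names (`shared_device_ratio`, `new_beneficiary`, `foreign_merchant`), and the weighted-average blend with `hub_weight` are all assumptions made for the example.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def hub_model(features: Features) -> float:
    """Toy customer-level risk score in [0, 1] (placeholder for a trained hub model)."""
    return min(1.0, 0.1 + 0.5 * features.get("shared_device_ratio", 0.0))


def pix_model(features: Features) -> float:
    """Toy PIX-specific fraud score (placeholder for a trained spoke model)."""
    return 0.8 if features.get("new_beneficiary", 0.0) > 0 else 0.2


def credit_card_model(features: Features) -> float:
    """Toy credit-card fraud score (placeholder for a trained spoke model)."""
    return 0.7 if features.get("foreign_merchant", 0.0) > 0 else 0.1


# One spoke per product; the hub is shared by all of them.
SPOKES: Dict[str, Callable[[Features], float]] = {
    "pix": pix_model,
    "credit_card": credit_card_model,
}


def score_transaction(product_type: str, features: Features,
                      hub_weight: float = 0.5) -> Dict[str, float]:
    """Score with the shared hub, route to the product's spoke, then blend."""
    if product_type not in SPOKES:
        raise ValueError(f"No spoke model registered for {product_type!r}")
    hub_score = hub_model(features)
    spoke_score = SPOKES[product_type](features)
    # Assumed blend: a simple weighted average of the two scores.
    final_score = hub_weight * hub_score + (1.0 - hub_weight) * spoke_score
    return {
        "hub_risk_score": hub_score,
        "spoke_fraud_score": spoke_score,
        "final_score": final_score,
    }


result = score_transaction("pix", {"shared_device_ratio": 0.4, "new_beneficiary": 1.0})
print(result)
```

Keeping the spokes in a dictionary keyed by product type means a new product can be added by registering one more function, without touching the hub.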


In [None]:
# Demonstrate the configuration system
from utils.config_manager import ConfigManager

# Initialize configuration
config = ConfigManager()

print("🔧 System Configuration Overview:")
print("=" * 50)

# Display configuration summary
config_summary = config.get_config_summary()
for key, value in config_summary.items():
    print(f"{key:20s}: {value}")

print("\n📊 Feature Store Configuration:")
print("-" * 30)
feature_config = config.get_feature_store_config()
for key, value in feature_config.items():
    if isinstance(value, dict):
        print(f"{key}:")
        for sub_key, sub_value in value.items():
            print(f"  {sub_key}: {sub_value}")
    else:
        print(f"{key}: {value}")

print("\n🎯 Model Configuration:")
print("-" * 20)
hub_config = config.get_model_config("hub_model")
print(f"Hub Model Algorithm: {hub_config.get('algorithm', 'Not configured')}")
print(f"Hub Model Version: {hub_config.get('version', 'Not configured')}")


In [None]:
## Feature Store - 4 Pillar Architecture

The feature store groups features into four pillars, ranging from slow-changing customer attributes to the context of a single transaction:

### Pillar 1: Profile Features (Static/Slow-changing)
- **Customer Demographics**: Age, income, occupation
- **Account Characteristics**: Account age, product portfolio
- **Credit Information**: Internal and external credit scores
- **Relationship Indicators**: Customer tenure, product usage

### Pillar 2: Behavioral Features (Cross-Product Aggregations)
- **Transaction Patterns**: Volume, frequency, timing across all products
- **Channel Usage**: Mobile, web, ATM usage patterns
- **Digital Behavior**: Login patterns, session duration
- **Velocity Metrics**: Transaction frequency in different time windows
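
As an illustration of velocity metrics, the sketch below counts and sums each customer's transactions over a trailing time window with pandas. The column names and the 60-minute window are assumptions chosen for the example, not the feature store's actual schema.

```python
import pandas as pd

# Toy transaction log; column names are illustrative assumptions.
txns = pd.DataFrame({
    "customer_id": ["c1", "c1", "c1", "c2", "c1"],
    "timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:20", "2024-01-01 10:50",
        "2024-01-01 11:00", "2024-01-01 12:30",
    ]),
    "amount": [100.0, 250.0, 80.0, 40.0, 500.0],
})


def add_velocity_features(df: pd.DataFrame, window: str = "60min") -> pd.DataFrame:
    """Add per-customer transaction count and amount sum over a trailing window."""
    parts = []
    for _, group in df.groupby("customer_id"):
        # Time-based rolling windows need a sorted DatetimeIndex.
        group = group.sort_values("timestamp").set_index("timestamp")
        rolling = group["amount"].rolling(window)
        group[f"txn_count_{window}"] = rolling.count()
        group[f"txn_sum_{window}"] = rolling.sum()
        parts.append(group.reset_index())
    return pd.concat(parts, ignore_index=True)


velocity = add_velocity_features(txns)
print(velocity)
```

Each window includes the current transaction and everything in the preceding 60 minutes, so a burst of activity shows up as a rising count.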

### Pillar 3: Network Features (Graph-based)
- **Device Sharing**: Multiple customers using same device
- **Beneficiary Networks**: Transfer patterns and relationships
- **Graph Centrality**: Position in transaction network
- **Risk Propagation**: Exposure to known fraudulent entities
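
Device sharing and risk propagation can be approximated without a graph library. The sketch below is a simplified stand-in for graph-based features: it assumes a table of customer/device login events and a hypothetical set of confirmed-fraud customer IDs.

```python
import pandas as pd

# Toy login events; column names are illustrative assumptions.
logins = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3", "c3", "c4"],
    "device_id": ["d1", "d1", "d1", "d2", "d3"],
})
known_fraud = {"c2"}  # hypothetical set of confirmed-fraud customers


def device_network_features(events: pd.DataFrame, fraud_ids: set) -> pd.DataFrame:
    """Per customer: largest device-sharing group and exposure to flagged customers."""
    pairs = events.drop_duplicates(["customer_id", "device_id"])
    # Device sharing: how many distinct customers use each device.
    customers_per_device = pairs.groupby("device_id")["customer_id"].nunique()
    # Risk propagation (one hop): devices touched by a flagged customer.
    risky_devices = set(pairs.loc[pairs["customer_id"].isin(fraud_ids), "device_id"])
    pairs = pairs.assign(
        device_customers=pairs["device_id"].map(customers_per_device),
        device_is_risky=pairs["device_id"].isin(risky_devices),
    )
    # Note: flagged customers are themselves marked as exposed.
    return pairs.groupby("customer_id").agg(
        max_customers_on_device=("device_customers", "max"),
        shares_device_with_fraud=("device_is_risky", "any"),
    )


network = device_network_features(logins, known_fraud)
print(network)
```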

### Pillar 4: Contextual Features (Transaction-Specific)
- **Transaction Context**: Amount, time, location, channel
- **Behavioral Deviations**: Unusual patterns for the customer
- **Environmental Factors**: Location risk, time-of-day risk
- **Product-Specific**: Beneficiary risk, merchant category, etc.
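
One simple way to express a behavioral deviation is a z-score of the transaction amount against the customer's own history. The function below is a sketch under that assumption; the `min_history` cutoff and the choice to return a neutral 0.0 when history is too short are illustrative decisions, not documented system behavior.

```python
import numpy as np


def amount_zscore(history, amount, min_history=5):
    """Standard deviations between `amount` and the customer's past amounts."""
    history = np.asarray(history, dtype=float)
    if history.size < min_history:
        return 0.0  # too little history to judge; treat as neutral
    std = history.std(ddof=1)  # sample standard deviation
    if std == 0:
        return 0.0  # all past amounts identical; deviation is undefined
    return float((amount - history.mean()) / std)


past_amounts = [100, 110, 90, 105, 95]
print(amount_zscore(past_amounts, 100))   # typical amount for this customer
print(amount_zscore(past_amounts, 1500))  # far outside the usual range
```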


In [None]:
# Generate synthetic data for demonstration
np.random.seed(42)

# Create sample customer data
n_customers = 1000

customer_data = {
    'customer_id': [f'cust_{i:06d}' for i in range(n_customers)],
    'age': np.random.normal(35, 15, n_customers).astype(int),
    'account_age_days': np.random.exponential(500, n_customers).astype(int),
    'total_products': np.random.poisson(2.5, n_customers),
    'credit_score': np.random.normal(650, 100, n_customers).astype(int),
    'is_pep': np.random.choice([True, False], n_customers, p=[0.02, 0.98]),
    'kyc_completion': np.random.uniform(0.5, 1.0, n_customers)
}

customers_df = pd.DataFrame(customer_data)

# Clean up the data
customers_df['age'] = customers_df['age'].clip(18, 80)
customers_df['credit_score'] = customers_df['credit_score'].clip(300, 850)
customers_df['total_products'] = customers_df['total_products'].clip(1, 8)

print("📊 Sample Customer Profile Data:")
print("=" * 40)
print(customers_df.head())

print("\n📈 Dataset Statistics:")
print(f"Total Customers: {len(customers_df):,}")
print(f"Average Age: {customers_df['age'].mean():.1f}")
print(f"Average Account Age: {customers_df['account_age_days'].mean():.0f} days")
print(f"Average Credit Score: {customers_df['credit_score'].mean():.0f}")
print(f"PEP Rate: {customers_df['is_pep'].mean():.1%}")

# Create sample transaction data
n_transactions = 5000

transaction_data = {
    'transaction_id': [f'txn_{i:08d}' for i in range(n_transactions)],
    'customer_id': np.random.choice(customers_df['customer_id'], n_transactions),
    'amount': np.random.lognormal(5, 1.5, n_transactions),
    'product_type': np.random.choice(['pix', 'credit_card', 'ted', 'loan'], n_transactions, 
                                   p=[0.4, 0.35, 0.15, 0.1]),
    'channel': np.random.choice(['mobile_app', 'web_browser', 'atm'], n_transactions,
                               p=[0.6, 0.3, 0.1]),
    'hour_of_day': np.random.randint(0, 24, n_transactions),
    'is_weekend': np.random.choice([True, False], n_transactions, p=[0.3, 0.7]),
    'fraud_label': np.random.choice([True, False], n_transactions, p=[0.02, 0.98])
}

transactions_df = pd.DataFrame(transaction_data)
transactions_df['amount'] = transactions_df['amount'].round(2)

print("\n💳 Sample Transaction Data:")
print("=" * 30)
print(transactions_df.head())

print("\nTransaction Statistics:")
print(f"Total Transactions: {len(transactions_df):,}")
print(f"Fraud Rate: {transactions_df['fraud_label'].mean():.1%}")
print(f"Average Amount: R$ {transactions_df['amount'].mean():.2f}")


In [None]:
# Create visualizations
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Enterprise Fraud Detection System - Data Overview', fontsize=16, fontweight='bold')

# 1. Customer Age Distribution
axes[0, 0].hist(customers_df['age'], bins=30, alpha=0.7, color='skyblue', edgecolor='black')
axes[0, 0].set_title('Customer Age Distribution')
axes[0, 0].set_xlabel('Age')
axes[0, 0].set_ylabel('Frequency')

# 2. Credit Score Distribution
axes[0, 1].hist(customers_df['credit_score'], bins=30, alpha=0.7, color='lightgreen', edgecolor='black')
axes[0, 1].set_title('Credit Score Distribution')
axes[0, 1].set_xlabel('Credit Score')
axes[0, 1].set_ylabel('Frequency')

# 3. Product Portfolio
product_counts = customers_df['total_products'].value_counts().sort_index()
axes[0, 2].bar(product_counts.index, product_counts.values, alpha=0.7, color='orange')
axes[0, 2].set_title('Products per Customer')
axes[0, 2].set_xlabel('Number of Products')
axes[0, 2].set_ylabel('Number of Customers')

# 4. Transaction Amount Distribution (log scale)
axes[1, 0].hist(np.log10(transactions_df['amount']), bins=30, alpha=0.7, color='coral', edgecolor='black')
axes[1, 0].set_title('Transaction Amount Distribution (Log Scale)')
axes[1, 0].set_xlabel('Log10(Amount)')
axes[1, 0].set_ylabel('Frequency')

# 5. Product Type Distribution
product_type_counts = transactions_df['product_type'].value_counts()
axes[1, 1].pie(product_type_counts.values, labels=product_type_counts.index, autopct='%1.1f%%', startangle=90)
axes[1, 1].set_title('Transaction Distribution by Product Type')

# 6. Hourly Transaction Pattern
hourly_counts = transactions_df['hour_of_day'].value_counts().sort_index()
axes[1, 2].plot(hourly_counts.index, hourly_counts.values, marker='o', linewidth=2, color='purple')
axes[1, 2].set_title('Transactions by Hour of Day')
axes[1, 2].set_xlabel('Hour')
axes[1, 2].set_ylabel('Number of Transactions')
axes[1, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Fraud analysis
print("\n🚨 Fraud Analysis:")
print("=" * 20)


def fraud_rate_by(df, column):
    """Summarize transaction count, fraud count, and fraud rate (%) per group."""
    summary = df.groupby(column)['fraud_label'].agg(['count', 'sum', 'mean'])
    summary.columns = ['Total_Transactions', 'Fraud_Count', 'Fraud_Rate_Pct']
    summary['Fraud_Rate_Pct'] = (summary['Fraud_Rate_Pct'] * 100).round(2)
    return summary


# Fraud labels were assigned at random above, so differences between groups are noise
fraud_by_product = fraud_rate_by(transactions_df, 'product_type')
print("Fraud Rate by Product Type (%):")
print(fraud_by_product)

# Channel analysis
fraud_by_channel = fraud_rate_by(transactions_df, 'channel')
print("\nFraud Rate by Channel (%):")
print(fraud_by_channel)


In [None]:
## Real-time Fraud Detection API

The system provides a REST API for real-time fraud detection. Here's how to interact with it:
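
A client call might look roughly like the sketch below, which builds a JSON `POST` request with the standard library. The endpoint URL is a hypothetical placeholder (the notebook does not state the real route), and the request is only constructed, not sent, so the cell runs without a live service.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/score"  # hypothetical endpoint, adjust to your deployment


def build_score_request(transaction: dict, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one transaction."""
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_score_request({
    "transaction_id": "txn_12345678",
    "customer_id": "cust_000123",
    "product_type": "pix",
    "amount": 1500.00,
})
print(request.get_method(), request.full_url)

# Sending requires a running service, so it is left commented out:
# with urllib.request.urlopen(request, timeout=1.0) as response:
#     result = json.loads(response.read())
```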


In [None]:
# Example API usage (simulated, since the API may not be running)
import json
from datetime import datetime

# Sample API request payload
sample_transaction = {
    "transaction_id": "txn_12345678",
    "customer_id": "cust_000123", 
    "product_type": "pix",
    "amount": 1500.00,
    "currency": "BRL",
    "channel": "mobile_app",
    "timestamp": datetime.now().isoformat(),
    "device_id": "device_abc123",
    "ip_address": "192.168.1.100",
    "location_lat": -23.5505,
    "location_lon": -46.6333,
    "beneficiary_id": "beneficiary_999"
}

print("🔧 Sample API Request:")
print("=" * 25)
print(json.dumps(sample_transaction, indent=2))

# Simulate API response
sample_response = {
    "transaction_id": "txn_12345678",
    "customer_id": "cust_000123",
    "product_type": "pix",
    "hub_risk_score": 0.35,
    "spoke_fraud_score": 0.42,
    "final_score": 0.38,
    "risk_level": "medium",
    "predicted_class": "legitimate",
    "confidence": 0.62,
    "action": "challenge",
    "reason_codes": [
        "UNUSUAL_TIME_OF_DAY",
        "NEW_BENEFICIARY"
    ],
    "processing_time_ms": 87.5,
    "model_versions": {
        "hub_model": "v1.2.3",
        "spoke_model": "pix_v1.1.0"
    },
    "timestamp": datetime.now().isoformat()
}

print("\n✅ Sample API Response:")
print("=" * 26)
print(json.dumps(sample_response, indent=2))

print("\n🎯 Decision Logic:")
print("-" * 15)
print(f"Hub Risk Score: {sample_response['hub_risk_score']:.2f} (Customer overall risk)")
print(f"Spoke Score: {sample_response['spoke_fraud_score']:.2f} (PIX-specific risk)")
print(f"Final Score: {sample_response['final_score']:.2f} (Combined assessment)")
print(f"Action: {sample_response['action'].upper()} (Risk-based decision)")
print(f"Processing Time: {sample_response['processing_time_ms']}ms (Real-time)")


In [None]:
## Performance and Scalability

The system is designed for enterprise-scale fraud detection, with the following design targets:

### 🚀 Performance Metrics
- **Latency**: P95 below 100 ms for real-time scoring
- **Throughput**: 10,000+ transactions per second (TPS)
- **Availability**: 99.9% uptime SLA
- **Detection quality**: >95% precision, >90% recall
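
To make these metrics concrete, the sketch below shows one way to compute P95 latency and precision/recall from logged data. The input values are made up for illustration and are not measurements of the system.

```python
import numpy as np


def p95_latency_ms(latencies_ms):
    """95th percentile of observed latencies (linear interpolation)."""
    return float(np.percentile(latencies_ms, 95))


def precision_recall(y_true, y_pred):
    """Precision and recall for binary fraud labels (True means fraud)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))    # fraud correctly flagged
    fp = int(np.sum(~y_true & y_pred))   # legitimate flagged as fraud
    fn = int(np.sum(y_true & ~y_pred))   # fraud missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative inputs only
latencies = list(range(1, 101))  # 1 ms .. 100 ms
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(f"P95 latency: {p95_latency_ms(latencies):.2f} ms")
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

Precision answers "of the transactions flagged, how many were fraud?", while recall answers "of the fraud that occurred, how much was flagged?"; the two usually trade off against each other as the decision threshold moves.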

### 📈 Scalability Features
- **Horizontal Scaling**: Auto-scaling API instances
- **Caching**: Multi-level caching strategy (Redis + in-memory)
- **Database Optimization**: Read replicas and connection pooling
- **Feature Store**: Distributed feature computation and serving
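
The multi-level caching idea can be sketched as a small in-process cache sitting in front of a shared store. In this illustration a plain dictionary stands in for Redis, and the class name, TTL value, and loader function are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TwoLevelCache:
    """In-memory cache (level 1) in front of a shared store (level 2)."""

    def __init__(self, remote: Dict[str, Any], loader: Callable[[str], Any],
                 local_ttl_s: float = 5.0):
        self.remote = remote            # stand-in for a Redis client
        self.loader = loader            # slow source of truth, e.g. a database query
        self.local_ttl_s = local_ttl_s
        self._local: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = time.monotonic()
        hit = self._local.get(key)
        if hit is not None and now - hit[0] < self.local_ttl_s:
            return hit[1]               # level-1 hit: no network round trip
        if key in self.remote:
            value = self.remote[key]    # level-2 hit
        else:
            value = self.loader(key)    # miss on both levels: load and backfill
            self.remote[key] = value
        self._local[key] = (now, value)
        return value


load_calls = []


def load_features(customer_id: str) -> dict:
    """Pretend to compute features; records each call so cache hits are visible."""
    load_calls.append(customer_id)
    return {"customer_id": customer_id, "risk": 0.1}


shared_store: Dict[str, Any] = {}
cache = TwoLevelCache(shared_store, load_features)
first = cache.get("cust_000123")
second = cache.get("cust_000123")
print(f"Loader calls: {len(load_calls)}")
```

A short local TTL keeps hot keys off the network while bounding how stale a value can get; the shared level lets other API instances reuse what one instance has already computed.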

### 🔒 Security and Compliance
- **Data Encryption**: End-to-end encryption for sensitive data
- **Access Control**: Role-based authentication and authorization
- **Audit Logging**: Comprehensive audit trail for compliance
- **Model Explainability**: Transparent decision-making process

---

## Next Steps

This overview demonstrates the key concepts and architecture of the Enterprise Fraud Detection System. 

**Explore more in the following notebooks:**

1. **[02_model_training.ipynb](02_model_training.ipynb)** - Learn how to train Hub and Spoke models
2. **[03_api_usage.ipynb](03_api_usage.ipynb)** - Detailed API usage examples and integration patterns
3. **[04_feature_analysis.ipynb](04_feature_analysis.ipynb)** - Deep dive into feature engineering and analysis
4. **[05_monitoring.ipynb](05_monitoring.ipynb)** - Performance monitoring and model drift detection

---

**🎯 Key Takeaways:**
- The Hub and Spoke architecture provides both unified and specialized fraud detection
- The 4-pillar feature store enables comprehensive risk assessment
- Real-time inference in under 100 ms (P95) supports high-volume operations
- The design targets enterprise-grade scalability, security, and monitoring
