# ThreatSight 360

### This notebook creates synthetic data for a financial fraud detection system called "ThreatSight 360".

This system uses MongoDB's document model, geospatial indexing, and Atlas Vector Search to build a dynamic behavioral profile for each customer. Unlike traditional SQL/rules-based approaches with fixed thresholds, it compares each transaction with the customer's own profile and with embeddings of known fraud patterns, so detection can adapt as new fraud patterns emerge.

Key features demonstrated:
- Rich document model for complex user profiles and transaction data
- Nested documents for behavioral patterns
- Geospatial data for location-based fraud detection
- Vector search for similarity-based pattern matching

In [None]:
!pip install --upgrade docutils==0.20 -q

In [None]:
!pip install pymongo pandas faker numpy scikit-learn python-dotenv geojson boto3 awscli -q

import os
import json
import time
import uuid
import random
from datetime import datetime, timedelta  # the class shadows the module, so import only what is used

import boto3
import geojson
import numpy as np
import pandas as pd
import pymongo
from bson import ObjectId
from botocore.config import Config
from dotenv import load_dotenv
from faker import Faker
from sklearn.preprocessing import normalize

In [None]:
# Initialize Faker for generating synthetic data
fake = Faker()

### Setting Up MongoDB Atlas Connection (and AWS Bedrock)

In [None]:
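# Read connection settings from environment variables (or a .env file) when present;
# otherwise replace the placeholders below. Avoid committing real credentials.
load_dotenv()
MONGODB_URI = os.getenv("MONGODB_URI", "ENTER URI")
DB_NAME = "threatsight360"

# AWS credentials and region for Bedrock
aws_access_key = os.getenv("AWS_ACCESS_KEY_ID", "ENTER")
aws_secret_key = os.getenv("AWS_SECRET_ACCESS_KEY", "ENTER")
aws_region = os.getenv("AWS_REGION", "us-east-1")  # Change to your preferred region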

In [None]:
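# Connect to MongoDB Atlas
client = pymongo.MongoClient(MONGODB_URI)
db = client[DB_NAME]

# MongoClient connects lazily; a ping confirms the cluster is actually reachable
client.admin.command("ping")
print("Connected to MongoDB Atlas!")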

In [None]:
# Get collection handles (MongoDB creates each collection on its first insert)
customers_collection = db["customers"]
transactions_collection = db["transactions"]
fraud_patterns_collection = db["fraud_patterns"]

In [None]:
# Configure AWS Session
boto3_config = Config(
    region_name=aws_region,
    signature_version='v4',
    retries={
        'max_attempts': 3,
        'mode': 'standard'
    }
)

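# Initialize Bedrock Runtime client
# Note: creating the client does not validate credentials; bad credentials surface on the first invoke_model call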
try:
    bedrock_runtime = boto3.client(
        service_name='bedrock-runtime',
        aws_access_key_id=aws_access_key,
        aws_secret_access_key=aws_secret_key,
        config=boto3_config
    )
    print("AWS Bedrock client initialized successfully")
except Exception as e:
    print(f"Warning: AWS Bedrock client initialization failed: {e}")
    print("Using fallback random embeddings - for demo purposes only")
    bedrock_runtime = None

## Data Model Design
This section defines the data models for our three MongoDB collections.

Customer profiles use a rich document model that includes:
- Personal information
- Account details
- Behavioral profiles (nested documents)
- Risk metrics

### Transactions Collection
- Transaction details
- Location data as GeoJSON
- Device information
- Risk assessment

### Fraud Patterns Collection
- Pattern descriptions
- Vector embeddings
- Severity metrics
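GeoJSON stores coordinates in `[longitude, latitude]` order, the reverse of the familiar "lat, long" convention, and MongoDB's `2dsphere` index rejects out-of-range values. The generators below build these points inline; a small helper such as the hypothetical `make_geo_point` sketched here (not part of the notebook) is one way to keep the order and bounds consistent.

```python
def make_geo_point(longitude: float, latitude: float) -> dict:
    """Return a GeoJSON Point, validating the coordinate ranges MongoDB accepts."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    # GeoJSON order is [longitude, latitude]
    return {"type": "Point", "coordinates": [float(longitude), float(latitude)]}


# Example: a point in New York City (longitude -73.9857, latitude 40.7484)
point = make_geo_point(-73.9857, 40.7484)
print(point)
```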




### Customer Profiles Collection

In [None]:
# Customer Profiles Collection

def generate_customer_profiles(num_customers=50):
    """Generate synthetic customer profiles with behavioral patterns"""

    customers = []

    for _ in range(num_customers):
        # Generate a customer ID
        customer_id = str(ObjectId())

        # Common merchant categories
        merchant_categories = random.sample([
            "grocery", "restaurant", "retail", "travel", "entertainment",
            "utilities", "healthcare", "electronics", "gas", "online"
        ], k=random.randint(3, 6))

        # Generate 1-3 devices for this customer
        devices = []
        num_devices = random.randint(1, 3)

        for _ in range(num_devices):
            device = {
                "device_id": str(uuid.uuid4()),
                "type": random.choice(["mobile", "desktop", "tablet"]),
                "os": random.choice(["iOS", "Android", "Windows", "macOS", "Linux"]),
                "browser": random.choice(["Chrome", "Safari", "Firefox", "Edge"]),
                "ip_range": [fake.ipv4() for _ in range(random.randint(1, 3))],
                "usual_locations": [
                    {
                        "city": fake.city(),
                        "state": fake.state(),
                        "country": fake.country_code(),
                        "location": {
                            "type": "Point",
                            "coordinates": [
                                float(fake.longitude()),
                                float(fake.latitude())
                            ]
                        },
                        "frequency": random.uniform(0.1, 0.9)
                    } for _ in range(random.randint(1, 3))
                ]
            }
            devices.append(device)

        # Generate transaction pattern behavior
        transaction_behavior = {
            "avg_transaction_amount": round(random.uniform(20, 500), 2),
            "std_transaction_amount": round(random.uniform(10, 100), 2),
            "avg_transactions_per_day": round(random.uniform(0.5, 5), 1),
            "common_merchant_categories": merchant_categories,
            "usual_transaction_times": [
                {
                    "day_of_week": day,
                    "hour_range": [
                        random.randint(8, 12),
                        random.randint(13, 22)
                    ]
                } for day in random.sample(range(7), k=random.randint(3, 7))
            ],
            "usual_transaction_locations": [
                {
                    "city": fake.city(),
                    "state": fake.state(),
                    "country": fake.country_code(),
                    "location": {
                        "type": "Point",
                        "coordinates": [
                            float(fake.longitude()),
                            float(fake.latitude())
                        ]
                    },
                    "frequency": random.uniform(0.1, 0.9)
                } for _ in range(random.randint(1, 3))
            ]
        }

        # Generate risk metrics
        risk_profile = {
            "overall_risk_score": round(random.uniform(1, 100), 2),
            "last_risk_assessment": fake.date_time_between(start_date="-30d", end_date="now").isoformat(),
            "risk_factors": random.sample([
                "irregular_location", "unusual_amount", "new_merchant_category",
                "strange_time_pattern", "new_device", "velocity_alert"
            ], k=random.randint(0, 3)),
            # Faker relative dates: "M" = months, "m" = minutes
            "last_reported_fraud": fake.date_time_between(start_date="-2y", end_date="-1M").isoformat() if random.random() < 0.1 else None
        }

        # Create the customer document
        customer = {
            "_id": customer_id,
            "personal_info": {
                "name": fake.name(),
                "email": fake.email(),
                "phone": fake.phone_number(),
                "address": {
                    "street": fake.street_address(),
                    "city": fake.city(),
                    "state": fake.state(),
                    "country": fake.country(),
                    "zip": fake.zipcode()
                },
                "dob": fake.date_of_birth(minimum_age=18, maximum_age=85).isoformat()
            },
            "account_info": {
                "account_number": fake.bban(),
                "account_type": random.choice(["checking", "savings", "credit"]),
                "creation_date": fake.date_time_between(start_date="-10y", end_date="-1M").isoformat(),
                "status": random.choice(["active", "inactive", "suspended", "closed"]) if random.random() < 0.1 else "active",
                "credit_score": random.randint(300, 850)
            },
            "behavioral_profile": {
                "devices": devices,
                "transaction_patterns": transaction_behavior,
            },
            "risk_profile": risk_profile,
            "metadata": {
                "last_updated": datetime.now().isoformat(),
                "created_at": fake.date_time_between(start_date="-10y", end_date="-1M").isoformat()
            }
        }

        customers.append(customer)

    return customers

### Transactions Collection

In [None]:
def generate_transactions(customers, num_months=6):
    """Generate `num_months` months of synthetic transactions: a mix of normal, suspicious, and fraudulent"""

    transactions = []
    end_date = datetime.now()
    start_date = end_date - timedelta(days=30 * num_months)

    # For each customer
    for customer in customers:
        customer_id = customer["_id"]

        # Calculate expected number of transactions based on customer's profile
        avg_txns_per_day = customer["behavioral_profile"]["transaction_patterns"]["avg_transactions_per_day"]
        expected_txns = int(avg_txns_per_day * 30 * num_months)

        # Add some randomness to the number of transactions
        num_txns = int(expected_txns * random.uniform(0.8, 1.2))

        # Get customer's behavioral patterns
        avg_amount = customer["behavioral_profile"]["transaction_patterns"]["avg_transaction_amount"]
        std_amount = customer["behavioral_profile"]["transaction_patterns"]["std_transaction_amount"]
        usual_categories = customer["behavioral_profile"]["transaction_patterns"]["common_merchant_categories"]
        usual_locations = customer["behavioral_profile"]["transaction_patterns"]["usual_transaction_locations"]
        devices = customer["behavioral_profile"]["devices"]

        # Generate transactions for this customer
        for _ in range(num_txns):
            # Decide if this is a normal, suspicious, or fraudulent transaction
            transaction_type = random.choices(
                ["normal", "suspicious", "fraudulent"],
                weights=[0.60, 0.25, 0.15],
                k=1
            )[0]

            # Generate transaction details based on type
            if transaction_type == "normal":
                # Normal transaction - follows customer's patterns
                amount = round(random.normalvariate(avg_amount, std_amount), 2)
                if amount <= 0:  # Ensure amount is strictly positive
                    amount = round(avg_amount * random.uniform(0.5, 0.8), 2)

                category = random.choice(usual_categories)

                # Select a usual location
                location_data = random.choice(usual_locations)
                location = location_data["location"]

                # Select one of customer's devices
                device = random.choice(devices)
                device_info = {
                    "device_id": device["device_id"],
                    "type": device["type"],
                    "os": device["os"],
                    "browser": device["browser"],
                    "ip": random.choice(device["ip_range"])
                }

                risk_score = random.uniform(1, 30)

            elif transaction_type == "suspicious":
                # Suspicious transaction - deviates from patterns but may be legitimate
                amount = round(avg_amount * random.uniform(1.5, 3), 2)

                # Maybe an unusual category
                if random.random() < 0.7:
                    all_categories = [
                        "grocery", "restaurant", "retail", "travel", "entertainment",
                        "utilities", "healthcare", "electronics", "gas", "online"
                    ]
                    unusual_categories = [c for c in all_categories if c not in usual_categories]
                    if unusual_categories:
                        category = random.choice(unusual_categories)
                    else:
                        category = random.choice(usual_categories)
                else:
                    category = random.choice(usual_categories)

                # Maybe unusual location
                if random.random() < 0.7:
                    location = {
                        "type": "Point",
                        "coordinates": [
                            float(fake.longitude()),
                            float(fake.latitude())
                        ]
                    }
                    location_data = {
                        "city": fake.city(),
                        "state": fake.state(),
                        "country": fake.country_code(),
                        "location": location
                    }
                else:
                    location_data = random.choice(usual_locations)
                    location = location_data["location"]

                # Use regular device
                device = random.choice(devices)
                device_info = {
                    "device_id": device["device_id"],
                    "type": device["type"],
                    "os": device["os"],
                    "browser": device["browser"],
                    "ip": random.choice(device["ip_range"])
                }

                risk_score = random.uniform(30, 70)

            else:
                # Fraudulent transaction - significant deviations
                amount = round(avg_amount * random.uniform(3, 10), 2)

                # Unusual category
                all_categories = [
                    "grocery", "restaurant", "retail", "travel", "entertainment",
                    "utilities", "healthcare", "electronics", "gas", "online",
                    "gambling", "cryptocurrency", "jewelry", "money_transfer"
                ]
                unusual_categories = [c for c in all_categories if c not in usual_categories]
                category = random.choice(unusual_categories if unusual_categories else all_categories)

                # Unusual location
                location = {
                    "type": "Point",
                    "coordinates": [
                        float(fake.longitude()),
                        float(fake.latitude())
                    ]
                }
                location_data = {
                    "city": fake.city(),
                    "state": fake.state(),
                    "country": random.choice(["RU", "NG", "CN", "BR"]),  # High fraud countries for demo
                    "location": location
                }

                # New unknown device
                device_info = {
                    "device_id": str(uuid.uuid4()),  # New unknown device
                    "type": random.choice(["mobile", "desktop", "tablet"]),
                    "os": random.choice(["iOS", "Android", "Windows", "macOS", "Linux"]),
                    "browser": random.choice(["Chrome", "Safari", "Firefox", "Edge"]),
                    "ip": fake.ipv4()
                }

                risk_score = random.uniform(70, 100)

            # Generate timestamp within the specified date range
            txn_date = fake.date_time_between(start_date=start_date, end_date=end_date)

            # Create transaction document
            transaction = {
                "customer_id": customer_id,
                "transaction_id": str(ObjectId()),
                "timestamp": txn_date.isoformat(),
                "amount": amount,
                "currency": "USD",  # Simplified for demo
                "merchant": {
                    "name": fake.company(),
                    "category": category,
                    "id": str(uuid.uuid4())[:8]
                },
                "location": {
                    "city": location_data["city"],
                    "state": location_data["state"],
                    "country": location_data["country"],
                    "coordinates": location
                },
                "device_info": device_info,
                "transaction_type": random.choice(["purchase", "refund", "payment", "transfer", "withdrawal"]),
                "payment_method": random.choice(["credit_card", "debit_card", "bank_transfer", "digital_wallet"]),
                "status": "completed",
                "risk_assessment": {
                    "score": risk_score,
                    "level": "high" if risk_score > 70 else "medium" if risk_score > 30 else "low",
                    "flags": [],
                    "transaction_type": transaction_type
                }
            }

            # Add risk flags based on risk score
            if risk_score > 30:
                possible_flags = ["unusual_amount", "unexpected_location", "velocity_alert",
                                 "new_merchant_category", "rare_transaction_time"]
                num_flags = min(int(risk_score / 20), len(possible_flags))
                transaction["risk_assessment"]["flags"] = random.sample(possible_flags, num_flags)

            transactions.append(transaction)

    return transactions
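The generator assigns risk scores by transaction type, but the profile fields it reads (`avg_transaction_amount`, `std_transaction_amount`, `common_merchant_categories`, device IDs) are exactly what a detector would compare an incoming transaction against. The sketch below shows one simple way to do that: a z-score on the amount plus flags for an unfamiliar merchant category or device. The function name, thresholds, and weights are illustrative assumptions, not the notebook's scoring method.

```python
def score_against_profile(txn: dict, profile: dict) -> dict:
    """Score a transaction against a customer's behavioral profile (illustrative weights)."""
    patterns = profile["transaction_patterns"]
    known_devices = {d["device_id"] for d in profile["devices"]}

    # Standard deviations between this amount and the customer's mean (guard against std == 0)
    std = patterns["std_transaction_amount"] or 1.0
    z = (txn["amount"] - patterns["avg_transaction_amount"]) / std

    flags = []
    score = 0.0
    if z > 3:
        flags.append("unusual_amount")
        score += 40
    if txn["merchant"]["category"] not in patterns["common_merchant_categories"]:
        flags.append("new_merchant_category")
        score += 25
    if txn["device_info"]["device_id"] not in known_devices:
        flags.append("new_device")
        score += 35
    return {"z_score": round(z, 2), "score": min(score, 100.0), "flags": flags}


profile = {
    "devices": [{"device_id": "dev-1"}],
    "transaction_patterns": {
        "avg_transaction_amount": 100.0,
        "std_transaction_amount": 20.0,
        "common_merchant_categories": ["grocery", "gas"],
    },
}
txn = {"amount": 900.0, "merchant": {"category": "jewelry"}, "device_info": {"device_id": "dev-9"}}
print(score_against_profile(txn, profile))
```

Because the profile is embedded in the customer document, a single read returns everything this check needs.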

### Fraud Patterns Collection

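For this demo, each fraud pattern is described in text and embedded with the Amazon Titan Embeddings model via AWS Bedrock (falling back to random vectors if Bedrock is unavailable), so the vectors can be used for similarity matching. In a real-world system, patterns would be derived from actual fraud incidents and would be more detailed.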
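Similarity matching here means comparing a transaction's embedding with each pattern's `vector_embedding`; the vector index defined later uses cosine similarity. As a reference for what that metric computes, here is a minimal pure-Python sketch, with toy 3-dimensional vectors standing in for the 1,536-dimensional Titan embeddings (the pattern vectors are made up for illustration).

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy pattern vectors and a toy transaction vector
patterns = {"Account Takeover": [0.9, 0.1, 0.0], "Card Testing": [0.0, 0.2, 0.98]}
txn_vec = [0.8, 0.2, 0.1]
best = max(patterns, key=lambda name: cosine_similarity(txn_vec, patterns[name]))
print(best)
```

For unit-length vectors, such as the L2-normalized random fallback embeddings, cosine similarity reduces to the plain dot product.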



In [None]:
def generate_bedrock_embedding(text_input):
    """Generate embeddings using AWS Bedrock Titan Embeddings model"""

    if bedrock_runtime is None:
        # Fallback to random embeddings if Bedrock client is not available
        return normalize(np.random.rand(1, 1536)).tolist()[0]

    try:
        # Amazon Titan Embeddings model ID
        model_id = "amazon.titan-embed-text-v1"

        # Prepare request payload
        request_body = json.dumps({
            "inputText": text_input
        })

        # Invoke the model
        response = bedrock_runtime.invoke_model(
            modelId=model_id,
            contentType="application/json",
            accept="application/json",
            body=request_body
        )

        # Parse response
        response_body = json.loads(response.get('body').read())
        embedding = response_body.get('embedding')

        return embedding

    except Exception as e:
        print(f"Error generating embedding: {e}")
        # Fallback to random embeddings
        return normalize(np.random.rand(1, 1536)).tolist()[0]

def create_pattern_text_representation(pattern):
    """Create a text representation of a fraud pattern for embedding"""

    text = f"""
    Pattern Name: {pattern['pattern_name']}
    Description: {pattern['description']}
    Severity: {pattern['severity']}
    Indicators: {', '.join(pattern['indicators'])}
    """

    return text.strip()

In [None]:
def generate_fraud_patterns():
    """Generate sample fraud patterns with real embeddings via AWS Bedrock"""

    # Define fraud patterns
    patterns_data = [
        {
            "pattern_name": "Account Takeover",
            "description": "New device login followed by unusual transactions and settings changes",
            "severity": "high",
            "indicators": [
                "new_device",
                "unusual_location",
                "settings_change",
                "high_value_transaction"
            ],
            "detection_rate": round(random.uniform(0.7, 0.95), 2),
            "false_positive_rate": round(random.uniform(0.01, 0.1), 2)
        },
        {
            "pattern_name": "Card Testing",
            "description": "Multiple small transactions in quick succession",
            "severity": "medium",
            "indicators": [
                "multiple_small_transactions",
                "high_velocity",
                "unusual_merchant_pattern"
            ],
            "detection_rate": round(random.uniform(0.6, 0.85), 2),
            "false_positive_rate": round(random.uniform(0.05, 0.15), 2)
        },
        {
            "pattern_name": "Transaction Laundering",
            "description": "Series of transactions that move funds through multiple accounts",
            "severity": "high",
            "indicators": [
                "structured_amounts",
                "circular_transfers",
                "multiple_accounts"
            ],
            "detection_rate": round(random.uniform(0.5, 0.8), 2),
            "false_positive_rate": round(random.uniform(0.05, 0.2), 2)
        },
        {
            "pattern_name": "Geographic Anomaly",
            "description": "Transactions from unusual locations or rapid location changes",
            "severity": "medium",
            "indicators": [
                "unusual_location",
                "impossible_travel",
                "high_risk_country"
            ],
            "detection_rate": round(random.uniform(0.6, 0.9), 2),
            "false_positive_rate": round(random.uniform(0.02, 0.1), 2)
        },
        {
            "pattern_name": "Purchase Anomaly",
            "description": "Purchases that deviate significantly from customer's usual behavior",
            "severity": "low",
            "indicators": [
                "unusual_merchant_category",
                "unusual_amount",
                "unusual_time"
            ],
            "detection_rate": round(random.uniform(0.4, 0.7), 2),
            "false_positive_rate": round(random.uniform(0.1, 0.3), 2)
        }
    ]

    # Generate embeddings for each pattern
    complete_patterns = []
    for pattern in patterns_data:
        # Create text representation of pattern
        pattern_text = create_pattern_text_representation(pattern)

        # Generate embedding
        print(f"Generating embedding for pattern: {pattern['pattern_name']}")
        vector_embedding = generate_bedrock_embedding(pattern_text)

        # Add embedding to pattern
        pattern["vector_embedding"] = vector_embedding
        complete_patterns.append(pattern)

    return complete_patterns

### Generating and Storing Synthetic Data to MongoDB

In [None]:
# Generate and insert customer profiles (skip if already populated)
if customers_collection.count_documents({}) == 0:
    print("Generating customer profiles...")
    customers = generate_customer_profiles(50)
    customers_collection.insert_many(customers)
    print(f"Inserted {len(customers)} customer profiles")
else:
    # Reuse stored customers so any new transactions reference existing customer IDs
    customers = list(customers_collection.find({}))
    print("Customer collection already populated, skipping insert")

# Generate and insert transactions
if transactions_collection.count_documents({}) == 0:
    print("Generating transactions...")
    transactions = generate_transactions(customers, 6)
    transactions_collection.insert_many(transactions)
    print(f"Inserted {len(transactions)} transactions")
else:
    print("Transactions collection already populated, skipping insert")

# Generate and insert fraud patterns (only call Bedrock when needed)
if fraud_patterns_collection.count_documents({}) == 0:
    print("Generating fraud patterns...")
    fraud_patterns = generate_fraud_patterns()
    fraud_patterns_collection.insert_many(fraud_patterns)
    print(f"Inserted {len(fraud_patterns)} fraud patterns")
else:
    print("Fraud patterns collection already populated, skipping insert")

### Convert Transactions to Embeddings

In [None]:
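# Helper: stricter redefinition of generate_bedrock_embedding for transactions.
# Unlike the version above, it never falls back to random vectors; failures raise.
def generate_bedrock_embedding(text_input):
    """Generate embeddings using the AWS Bedrock Titan Embeddings model (no fallback)"""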

    # Check if bedrock_runtime is defined in the global scope
    global bedrock_runtime

    # Initialize bedrock_runtime if it's not defined
    if 'bedrock_runtime' not in globals() or bedrock_runtime is None:
        # Load environment variables
        load_dotenv()

        # Get AWS credentials
        aws_access_key = os.getenv("AWS_ACCESS_KEY_ID")
        aws_secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
        aws_region = os.getenv("AWS_REGION", "us-east-1")

        if not aws_access_key or not aws_secret_key:
            raise ValueError("AWS credentials not found. Please set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.")

        # Configure AWS
        boto3_config = Config(
            region_name=aws_region,
            signature_version='v4',
            retries={
                'max_attempts': 3,
                'mode': 'standard'
            }
        )

        # Initialize Bedrock client
        bedrock_runtime = boto3.client(
            service_name='bedrock-runtime',
            aws_access_key_id=aws_access_key,
            aws_secret_access_key=aws_secret_key,
            config=boto3_config
        )
        print("AWS Bedrock client initialized successfully")

    # Amazon Titan Embeddings model ID
    model_id = "amazon.titan-embed-text-v1"

    # Prepare request payload
    request_body = json.dumps({
        "inputText": text_input
    })

    # Invoke the model
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=request_body
    )

    # Parse response
    response_body = json.loads(response.get('body').read())
    embedding = response_body.get('embedding')

    if not embedding:
        raise ValueError("Failed to generate embedding: Empty response from Bedrock API")

    return embedding

In [None]:
#helper function
def create_transaction_text_representation(transaction):
    """Create a text representation of a transaction for embedding"""

    # Format transaction details as text
    text = f"""
    Transaction ID: {transaction.get('transaction_id', 'N/A')}
    Amount: {transaction.get('amount', 0)} {transaction.get('currency', 'USD')}
    Merchant: {transaction.get('merchant', {}).get('name', 'N/A')}
    Merchant Category: {transaction.get('merchant', {}).get('category', 'N/A')}
    Transaction Type: {transaction.get('transaction_type', 'N/A')}
    Payment Method: {transaction.get('payment_method', 'N/A')}
    Location: {transaction.get('location', {}).get('city', 'N/A')}, {transaction.get('location', {}).get('state', 'N/A')}, {transaction.get('location', {}).get('country', 'N/A')}
    Device: {transaction.get('device_info', {}).get('type', 'N/A')}, {transaction.get('device_info', {}).get('os', 'N/A')}, {transaction.get('device_info', {}).get('browser', 'N/A')}
    Risk Score: {transaction.get('risk_assessment', {}).get('score', 0)}
    Risk Level: {transaction.get('risk_assessment', {}).get('level', 'N/A')}
    Risk Flags: {', '.join(transaction.get('risk_assessment', {}).get('flags', [])) if transaction.get('risk_assessment', {}).get('flags', []) else 'None'}
    """

    return text.strip()

In [None]:
#Batch Generate Transaction Embeddings
def batch_generate_transaction_embeddings(batch_size=10, max_documents=None):
    """
    Generate embeddings for all transactions in the database and update the documents

    Args:
        batch_size: Number of transactions to process in each batch
        max_documents: Maximum number of documents to process (None for all)

    Returns:
        dict: Summary of the processing
    """
    # Ensure MongoDB collections are defined
    global client, db, transactions_collection

    # Check if MongoDB collections are defined, if not, reconnect
    if 'transactions_collection' not in globals():
        # Load environment variables from .env file (if using)
        load_dotenv()

        # Get MongoDB connection string from environment variables or set directly
        MONGODB_URI = os.getenv("MONGODB_URI", "your_connection_string_here")
        DB_NAME = "threatsight360"

        # Connect to MongoDB Atlas
        client = pymongo.MongoClient(MONGODB_URI)
        db = client[DB_NAME]

        # Create collections
        transactions_collection = db["transactions"]
        print("Reconnected to MongoDB and defined collections")

    start_time = time.time()
    total_processed = 0
    successful = 0
    failed = 0

    # Get total count for progress tracking
    total_docs = transactions_collection.count_documents({})
    if max_documents:
        total_docs = min(total_docs, max_documents)

    print(f"Starting embedding generation for {total_docs} transactions")

    # Process documents in batches
    cursor = transactions_collection.find({})

    batch = []
    batch_ids = []

    for transaction in cursor:
        if max_documents and total_processed >= max_documents:
            break

        # Skip documents that already have embeddings
        if "vector_embedding" in transaction:
            total_processed += 1
            successful += 1
            continue

        # Add transaction to current batch
        batch.append(transaction)
        batch_ids.append(transaction["_id"])

        # When batch is full, process it
        if len(batch) >= batch_size:
            process_batch_results = process_transaction_batch(batch, batch_ids)
            successful += process_batch_results["successful"]
            failed += process_batch_results["failed"]

            # Clear batch
            batch = []
            batch_ids = []

            # Update progress
            total_processed += batch_size
            elapsed_time = time.time() - start_time
            progress = (total_processed / total_docs) * 100 if total_docs > 0 else 0

            print(f"Progress: {progress:.1f}% ({total_processed}/{total_docs}) - "
                  f"Success: {successful}, Failed: {failed} - "
                  f"Elapsed: {elapsed_time:.1f}s")

    # Process any remaining transactions in the final batch
    if batch:
        process_batch_results = process_transaction_batch(batch, batch_ids)
        successful += process_batch_results["successful"]
        failed += process_batch_results["failed"]
        total_processed += len(batch)

    elapsed_time = time.time() - start_time

    # Create summary
    summary = {
        "total_processed": total_processed,
        "successful": successful,
        "failed": failed,
        "elapsed_time": elapsed_time,
        "average_time_per_doc": elapsed_time / total_processed if total_processed > 0 else 0
    }

    print("Embedding generation complete!")
    print(f"Processed {total_processed} transactions in {elapsed_time:.1f} seconds")
    print(f"Success: {successful}, Failed: {failed}")

    return summary

In [None]:
#helper function
def process_transaction_batch(batch, batch_ids):
    """Process a batch of transactions to generate embeddings"""
    successful = 0
    failed = 0

    for i, transaction in enumerate(batch):
        try:
            # Create text representation
            transaction_text = create_transaction_text_representation(transaction)

            # Generate embedding
            embedding = generate_bedrock_embedding(transaction_text)

            # Update transaction with embedding
            result = transactions_collection.update_one(
                {"_id": batch_ids[i]},
                {"$set": {"vector_embedding": embedding}}
            )

            if result.modified_count == 1:
                successful += 1
            else:
                failed += 1
                print(f"Warning: Failed to update transaction {transaction['transaction_id']}")

        except Exception as e:
            failed += 1
            print(f"Error processing transaction {transaction['transaction_id']}: {e}")

    return {
        "successful": successful,
        "failed": failed
    }

In [None]:
# Process all transactions in batches of 100
results = batch_generate_transaction_embeddings(batch_size=100)
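Each transaction costs one Bedrock `invoke_model` call, so it helps to estimate the volume before running this cell. A back-of-the-envelope sketch using the generator's parameters: `avg_transactions_per_day` is drawn uniformly from 0.5 to 5 (mean 2.75), each customer's count is scaled by a uniform factor with mean 1.0, and transaction types are weighted 60/25/15. The helper name `expected_volume` is hypothetical, and it ignores the `int()` truncation, so treat the result as approximate.

```python
def expected_volume(num_customers=50, num_months=6, rate_range=(0.5, 5.0), weights=None):
    """Approximate expected transaction counts produced by generate_transactions."""
    if weights is None:
        weights = {"normal": 0.60, "suspicious": 0.25, "fraudulent": 0.15}
    mean_rate = sum(rate_range) / 2  # mean of a uniform distribution
    days = 30 * num_months           # the generator uses 30-day months
    total = num_customers * mean_rate * days
    return {"total": total, **{kind: total * w for kind, w in weights.items()}}


print(expected_volume())
```

With the defaults this comes to roughly 24,750 transactions; assuming, purely for illustration, about 0.2 s per Bedrock call, embedding them all would take around 80 minutes.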

### Create MongoDB Indexes

In [None]:
# Create standard indexes for better query performance
customers_collection.create_index("personal_info.email")
customers_collection.create_index("risk_profile.overall_risk_score")
transactions_collection.create_index("customer_id")
transactions_collection.create_index("timestamp")
transactions_collection.create_index("risk_assessment.score")
transactions_collection.create_index([("location.coordinates", pymongo.GEOSPHERE)])
fraud_patterns_collection.create_index("severity")

print("Created standard indexes")
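The `2dsphere` index on `location.coordinates` lets MongoDB answer geospatial filters efficiently. For example, `$geoWithin` with `$centerSphere` takes a radius in radians, i.e. the distance divided by the Earth's radius (about 6,378.1 km). A small sketch that builds such a filter; the helper name `transactions_near` and the 50 km radius are illustrative.

```python
EARTH_RADIUS_KM = 6378.1  # value used in MongoDB's radians-conversion examples


def transactions_near(longitude: float, latitude: float, radius_km: float) -> dict:
    """Build a find() filter for transactions within radius_km of a point."""
    return {
        "location.coordinates": {
            "$geoWithin": {
                "$centerSphere": [[longitude, latitude], radius_km / EARTH_RADIUS_KM]
            }
        }
    }


query = transactions_near(-73.9857, 40.7484, 50)
print(query)
# With the notebook's collection handle:
# transactions_collection.count_documents(query)
```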

In [None]:
# Create a search index for the customers collection
search_index = {
    "name": "customer_search_index",
    "definition": {
        "mappings": {
            "dynamic": True,
            "fields": {
                "personal_info.name": {
                    "type": "string"
                },
                "personal_info.email": {
                    "type": "string"
                },
                "account_info.account_number": {
                    "type": "string"
                },
                "risk_profile.overall_risk_score": {
                    "type": "number"
                },
                "behavioral_profile.transaction_patterns.common_merchant_categories": {
                    "type": "string"
                }
            }
        }
    }
}

In [None]:
# Create the search index. If it already exists, the command raises an error,
# which is caught and printed below.
try:
    db.command({
        "createSearchIndexes": "customers",
        "indexes": [search_index]
    })
    print("Created search index for customers collection")
except Exception as e:
    print(f"Error creating search index: {e}")
    print("Note: Make sure Atlas Search is enabled on your MongoDB Atlas cluster")
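Nothing else in this section queries `customer_search_index`. Here is a sketch of one way to use it: the hypothetical helper below builds an Atlas Search `$search` stage that uses the `text` operator with `fuzzy` matching, so a slightly misspelled customer name still matches. The default path, result size, and projected fields are assumptions.

```python
def build_customer_search_pipeline(query, path="personal_info.name", limit=5, max_edits=1):
    """Fuzzy full-text lookup against customer_search_index (sketch)."""
    if max_edits not in (1, 2):
        raise ValueError("Atlas Search fuzzy maxEdits must be 1 or 2")
    return [
        {
            "$search": {
                "index": "customer_search_index",
                "text": {
                    "query": query,
                    "path": path,
                    "fuzzy": {"maxEdits": max_edits},  # tolerate small typos
                },
            }
        },
        {"$limit": limit},
        {
            "$project": {
                "personal_info.name": 1,
                "personal_info.email": 1,
                "risk_profile.overall_risk_score": 1,
                "score": {"$meta": "searchScore"},  # relevance score from Atlas Search
            }
        },
    ]
```

Usage: `for c in customers_collection.aggregate(build_customer_search_pipeline("Jon Smyth")): print(c)`.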

In [None]:
# Create a vector search index for the fraud patterns collection
vector_index = {
    "name": "fraud_pattern_vector_index",
    "definition": {
        "mappings": {
            "dynamic": True,
            "fields": {
                "vector_embedding": {
                    "type": "knnVector",
                    "dimensions": 1536,  # Amazon Titan Embeddings dimension
                    "similarity": "cosine"
                }
            }
        }
    }
}

In [None]:
# Create the vector search index. If it already exists, the command raises an error,
# which is caught and printed below.
try:
    db.command({
        "createSearchIndexes": "fraud_patterns",
        "indexes": [vector_index]
    })
    print("Created vector search index for fraud patterns collection")
except Exception as e:
    print(f"Error creating vector search index: {e}")
    print("Note: Vector search requires Atlas Search to be enabled on your MongoDB Atlas cluster")

In [None]:
# Create the vector search index for the transactions collection
# (if it already exists, the command raises an error, which is caught below)
try:
    # Define the index configuration
    transaction_index_config = {
        "name": "transaction_vector_index",
        "definition": {
            "mappings": {
                "dynamic": True,
                "fields": {
                    "vector_embedding": {
                        "type": "knnVector",
                        "dimensions": 1536,  # Amazon Titan Embeddings dimension
                        "similarity": "cosine"
                    }
                }
            }
        }
    }

    # Create the index
    db.command({
        "createSearchIndexes": "transactions",
        "indexes": [transaction_index_config]
    })
    print("Created vector search index for transactions collection")
except Exception as e:
    print(f"Error creating transaction vector search index: {e}")
    print("Note: Vector search requires MongoDB Atlas with Search enabled")
    print("Alternative approach: You can create this index from the Atlas UI manually")
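The `knnVector` field type and the `knnBeta` operator used in these cells are the older way to run vector queries in Atlas. `knnBeta` is deprecated in favor of a dedicated `vectorSearch` index type, which is queried with the `$vectorSearch` stage. The sketch below shows the newer index definition and assumes your cluster supports it. The helper name and the index name `transaction_vector_search` are hypothetical. Any field you want to pre-filter on must be declared with `"type": "filter"`.

```python
def build_vector_index_spec(name, path="vector_embedding", dimensions=1536,
                            similarity="cosine", filter_paths=()):
    """Return a createSearchIndexes spec for a 'vectorSearch'-type index (sketch)."""
    fields = [
        {
            "type": "vector",
            "path": path,
            "numDimensions": dimensions,  # must equal the embedding length (1536 here)
            "similarity": similarity,     # "cosine", "euclidean", or "dotProduct"
        }
    ]
    # Fields used in a $vectorSearch pre-filter must be indexed as "filter"
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"name": name, "type": "vectorSearch", "definition": {"fields": fields}}


spec = build_vector_index_spec(
    "transaction_vector_search",          # hypothetical index name
    filter_paths=["risk_assessment.level"],
)
```

The spec is submitted with the same command used above: `db.command({"createSearchIndexes": "transactions", "indexes": [spec]})`.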

## Example Queries


### Basic Query

In [None]:
# Find customers with high risk score
high_risk_customers = customers_collection.find(
    {"risk_profile.overall_risk_score": {"$gt": 70}},
    {"personal_info.name": 1, "risk_profile.overall_risk_score": 1, "risk_profile.risk_factors": 1}
)

print("High-Risk Customers:")
for customer in high_risk_customers:
    print(f"{customer['personal_info']['name']}: Score {customer['risk_profile']['overall_risk_score']}")
    if "risk_factors" in customer["risk_profile"] and customer["risk_profile"]["risk_factors"]:
        print(f"  Risk factors: {', '.join(customer['risk_profile']['risk_factors'])}")
    print()

### Transaction Summary by Risk Level

In [None]:
# Analyze transactions by risk level
risk_pipeline = [
    {
        "$group": {
            "_id": "$risk_assessment.level",
            "count": {"$sum": 1},
            "avg_amount": {"$avg": "$amount"},
            "total_amount": {"$sum": "$amount"}
        }
    },
    {
        "$sort": {"_id": 1}
    }
]

risk_results = transactions_collection.aggregate(risk_pipeline)

print("Transaction Analysis by Risk Level:")
for result in risk_results:
    print(f"Risk Level: {result['_id']}")
    print(f"  Count: {result['count']}")
    print(f"  Average Amount: ${result['avg_amount']:.2f}")
    print(f"  Total Amount: ${result['total_amount']:.2f}")
    print()

Transaction Analysis by Risk Level:
Risk Level: high
  Count: 4015
  Average Amount: $1824.49
  Total Amount: $7325328.92

Risk Level: low
  Count: 15579
  Average Amount: $278.96
  Total Amount: $4345984.82

Risk Level: medium
  Count: 6700
  Average Amount: $621.26
  Total Amount: $4162445.45
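The `$sort` on `_id` orders the levels alphabetically, so the output lists high, low, medium. To list them by severity instead, you can rank each group with `$indexOfArray` against an explicit order. `RISK_ORDER` and `sort_by_risk` are illustrative names. The client-side function produces the same ordering if you would rather sort in Python.

```python
# Desired display order for the risk levels (lowest to highest severity)
RISK_ORDER = ["low", "medium", "high"]

ordered_risk_pipeline = [
    {
        "$group": {
            "_id": "$risk_assessment.level",
            "count": {"$sum": 1},
            "avg_amount": {"$avg": "$amount"},
            "total_amount": {"$sum": "$amount"},
        }
    },
    # Position of the level in RISK_ORDER (-1 if unlisted, so unknown levels sort first)
    {"$addFields": {"risk_rank": {"$indexOfArray": [RISK_ORDER, "$_id"]}}},
    {"$sort": {"risk_rank": 1}},
]


def sort_by_risk(results, order=RISK_ORDER):
    """Client-side equivalent; unknown levels also get rank -1 and sort first."""
    rank = {level: i for i, level in enumerate(order)}
    return sorted(results, key=lambda r: rank.get(r["_id"], -1))
```

Usage: `for result in transactions_collection.aggregate(ordered_risk_pipeline): ...` with the same print loop as above.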



### Geospatial Query: Find Transactions Near a Location

In [None]:
# Find transactions near a specific location
# For demo purposes, we'll use the location of the first transaction as our reference point
sample_txn = transactions_collection.find_one()
reference_location = sample_txn["location"]["coordinates"]["coordinates"]

nearby_txns = transactions_collection.find({
    "location.coordinates": {
        "$nearSphere": {
            "$geometry": {
                "type": "Point",
                "coordinates": reference_location
            },
            "$maxDistance": 100000  # 100 km in meters
        }
    }
})

print(f"Transactions within 100km of {reference_location}:")
for i, txn in enumerate(nearby_txns):
    if i >= 5:  # Limit to 5 results for display
        print("... and more")
        break
    print(f"Transaction {txn['transaction_id']}: ${txn['amount']:.2f}")
    print(f"  Location: {txn['location']['city']}, {txn['location']['country']}")
    print(f"  Risk Score: {txn['risk_assessment']['score']}")
    print()

Transactions within 100km of [-94.144577, 24.8050725]:
Transaction 67d2a82b654c7f1b869c4b0c: $225.17
  Location: Bradleyburgh, CN
  Risk Score: 86.0279076621168
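`$nearSphere` returns matches sorted by distance, but it does not report the distance. The sketch below uses the `$geoNear` aggregation stage instead. That stage writes each distance (in meters when `near` is a GeoJSON point) into an output field. It must be the first stage of the pipeline, and it relies on the 2dsphere index on `location.coordinates` created earlier. The function and the `distance_m` field name are hypothetical.

```python
def build_geo_near_pipeline(lon, lat, max_distance_m=100_000, limit=5):
    """Transactions within max_distance_m of (lon, lat), nearest first, with distances."""
    return [
        {
            "$geoNear": {
                "near": {"type": "Point", "coordinates": [lon, lat]},  # GeoJSON order: [lon, lat]
                "key": "location.coordinates",   # the 2dsphere-indexed field
                "distanceField": "distance_m",   # meters, because `near` is GeoJSON
                "maxDistance": max_distance_m,   # meters
                "spherical": True,
            }
        },
        {"$limit": limit},
        {
            "$project": {
                "transaction_id": 1,
                "amount": 1,
                "location.city": 1,
                "location.country": 1,
                "risk_assessment.score": 1,
                "distance_m": 1,
            }
        },
    ]
```

Usage: `for txn in transactions_collection.aggregate(build_geo_near_pipeline(*reference_location)): print(txn["transaction_id"], round(txn["distance_m"]))`.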



### Vector Similarity Search: Find Similar Fraud Patterns

In [None]:
# Find similar fraud patterns using vector search
# For demo, use the first pattern's vector as the query vector; that pattern
# is therefore returned as its own top match
sample_pattern = fraud_patterns_collection.find_one()
query_vector = sample_pattern["vector_embedding"]

vector_search_pipeline = [
    {
        "$search": {
            "index": "fraud_pattern_vector_index",
            "knnBeta": {
                "vector": query_vector,
                "path": "vector_embedding",
                "k": 3,
                "filter": {
                    "range": {
                        "path": "detection_rate",
                        "gte": 0.6
                    }
                }
            }
        }
    }
]

try:
    similar_patterns = fraud_patterns_collection.aggregate(vector_search_pipeline)

    print(f"Patterns similar to '{sample_pattern['pattern_name']}':")
    for pattern in similar_patterns:
        print(f"Pattern: {pattern['pattern_name']}")
        print(f"  Description: {pattern['description']}")
        print(f"  Severity: {pattern['severity']}")
        print(f"  Detection Rate: {pattern['detection_rate']}")
        print()
except Exception as e:
    print(f"Error running vector search: {e}")
    print("Note: Vector search requires the Atlas Search feature to be enabled on your cluster")

Patterns similar to 'Account Takeover':
Pattern: Account Takeover
  Description: New device login followed by unusual transactions and settings changes
  Severity: high
  Detection Rate: 0.83

Pattern: Purchase Anomaly
  Description: Purchases that deviate significantly from customer's usual behavior
  Severity: low
  Detection Rate: 0.64

Pattern: Geographic Anomaly
  Description: Transactions from unusual locations or rapid location changes
  Severity: medium
  Detection Rate: 0.68
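For comparison, the same similar-pattern query written for a `vectorSearch`-type index would use the `$vectorSearch` stage. This sketch assumes a hypothetical index named `fraud_pattern_vector_search` that declares `detection_rate` as a filter field. `numCandidates` sets how many approximate neighbors are examined before `limit` results are returned. Here it defaults to 20 times the limit, which is a rule of thumb and not a requirement.

```python
def build_vector_search_pipeline(query_vector, index, path="vector_embedding",
                                 limit=3, num_candidates=None, filter_query=None):
    """Build a $vectorSearch pipeline for a 'vectorSearch'-type Atlas index (sketch)."""
    num_candidates = num_candidates or limit * 20
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    stage = {
        "index": index,
        "path": path,
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,  # approximate neighbors examined
        "limit": limit,                   # documents returned
    }
    if filter_query:
        # MQL-style pre-filter; each field must be declared "filter" in the index
        stage["filter"] = filter_query
    return [
        {"$vectorSearch": stage},
        {
            "$project": {
                "pattern_name": 1,
                "description": 1,
                "severity": 1,
                "detection_rate": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

Usage: `fraud_patterns_collection.aggregate(build_vector_search_pipeline(query_vector, "fraud_pattern_vector_search", filter_query={"detection_rate": {"$gte": 0.6}}))`.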



### Transaction Embedding Generation

In [None]:
def create_transaction_text_representation(transaction):
    """Create a text representation of a transaction for embedding.

    Uses .get() with defaults so that partial documents (for example a simulated
    transaction that has no risk_assessment yet) do not raise KeyError.
    """
    merchant = transaction.get('merchant', {})
    location = transaction.get('location', {})
    device = transaction.get('device_info', {})
    risk = transaction.get('risk_assessment', {})
    flags = risk.get('flags') or []

    # Format transaction details as text
    text = f"""
    Transaction ID: {transaction.get('transaction_id', 'unknown')}
    Amount: {transaction.get('amount', 'unknown')} {transaction.get('currency', 'unknown')}
    Merchant: {merchant.get('name', 'unknown')}
    Merchant Category: {merchant.get('category', 'unknown')}
    Transaction Type: {transaction.get('transaction_type', 'unknown')}
    Payment Method: {transaction.get('payment_method', 'unknown')}
    Location: {location.get('city', 'unknown')}, {location.get('state', 'unknown')}, {location.get('country', 'unknown')}
    Device: {device.get('type', 'unknown')}, {device.get('os', 'unknown')}, {device.get('browser', 'unknown')}
    Risk Score: {risk.get('score', 'unknown')}
    Risk Level: {risk.get('level', 'unknown')}
    Risk Flags: {', '.join(flags) if flags else 'None'}
    """

    return text.strip()

def generate_transaction_embedding(transaction):
    """Generate embedding for a transaction using AWS Bedrock"""

    # Create text representation
    transaction_text = create_transaction_text_representation(transaction)

    # Generate embedding
    embedding = generate_bedrock_embedding(transaction_text)

    return embedding

# Example: Generate embedding for a sample transaction
sample_transaction = transactions_collection.find_one({"risk_assessment.level": "high"})
if sample_transaction:
    print("Generating embedding for a high-risk transaction...")
    sample_embedding = generate_transaction_embedding(sample_transaction)
    print(f"Embedding generated with {len(sample_embedding)} dimensions")

    # Update transaction with embedding (optional)
    transactions_collection.update_one(
        {"_id": sample_transaction["_id"]},
        {"$set": {"vector_embedding": sample_embedding}}
    )
    print("Transaction updated with embedding")

Generating embedding for a high-risk transaction...
Embedding generated with 1536 dimensions
Transaction updated with embedding


### Finding Similar Suspicious Transactions

In [None]:
def find_similar_transactions(transaction_id, limit=5):
    """Find transactions similar to the given transaction using vector search"""

    # Get the transaction
    transaction = transactions_collection.find_one({"transaction_id": transaction_id})
    if not transaction:
        return {"error": "Transaction not found"}

    # Check if transaction has embedding, if not generate one
    if "vector_embedding" not in transaction:
        embedding = generate_transaction_embedding(transaction)
    else:
        embedding = transaction["vector_embedding"]

    # Search for similar transactions
    try:
        pipeline = [
            {
                "$search": {
                    "index": "transaction_vector_index",  # Created in the index setup cell above
                    "knnBeta": {
                        "vector": embedding,
                        "path": "vector_embedding",
                        "k": limit + 1  # +1 because the query transaction will be included
                    }
                }
            },
            {
                "$match": {
                    "transaction_id": {"$ne": transaction_id}  # Exclude the query transaction
                }
            },
            {
                "$limit": limit
            },
            {
                "$project": {
                    "transaction_id": 1,
                    "amount": 1,
                    "merchant": 1,
                    "timestamp": 1,
                    "risk_assessment.score": 1,
                    "risk_assessment.level": 1,
                    "location.city": 1,
                    "location.country": 1
                }
            }
        ]

        similar_transactions = list(transactions_collection.aggregate(pipeline))
        return similar_transactions

    except Exception as e:
        print(f"Error searching for similar transactions: {e}")
        return {"error": str(e)}
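If the Atlas vector index is still building or unavailable, `find_similar_transactions` only returns an error. For small samples, one fallback is to compute cosine similarity in Python. The sketch below is hypothetical: `cosine_similarity` and `local_top_k` are not part of the notebook. It scans every candidate, so its cost grows linearly with the number of documents.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treat as dissimilar
    return dot / (norm_a * norm_b)


def local_top_k(query_vector, candidates, k=5, exclude_id=None):
    """Return the k most similar candidates as (similarity, transaction_id) pairs."""
    scored = (
        (cosine_similarity(query_vector, doc["vector_embedding"]), doc.get("transaction_id"))
        for doc in candidates
        if doc.get("vector_embedding") and doc.get("transaction_id") != exclude_id
    )
    # nlargest keeps only k items in memory while scanning the candidates
    return heapq.nlargest(k, scored)
```

Usage: fetch a bounded sample such as `transactions_collection.find({"vector_embedding": {"$exists": True}}, {"transaction_id": 1, "vector_embedding": 1}).limit(2000)` and pass it as `candidates` with `exclude_id=transaction_id`. The similarity returned here is raw cosine in [-1, 1]. It is not Atlas's normalized search score.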

### Behavioral Anomaly Detection Example

In [None]:
def detect_anomalies(customer_id):
    """Detect transaction anomalies for a specific customer based on their profile"""

    # Get customer profile
    customer = customers_collection.find_one({"_id": customer_id})
    if not customer:
        return {"error": "Customer not found"}

    # Get customer's behavioral patterns
    avg_amount = customer["behavioral_profile"]["transaction_patterns"]["avg_transaction_amount"]
    std_amount = customer["behavioral_profile"]["transaction_patterns"]["std_transaction_amount"]
    usual_categories = customer["behavioral_profile"]["transaction_patterns"]["common_merchant_categories"]

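    # Get recent transactions (last 30 days). This compares ISO-8601 strings, which
    # assumes timestamps were stored with datetime.isoformat(); if they are BSON
    # dates, compare against a datetime object instead
    recent_txns = transactions_collection.find({
        "customer_id": customer_id,
        "timestamp": {"$gte": (datetime.now() - timedelta(days=30)).isoformat()}
    }).sort("timestamp", -1)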

    # Analyze transactions for anomalies
    anomalies = []
    for txn in recent_txns:
        flags = []

        # Check for amount anomaly
        if txn["amount"] > avg_amount + (3 * std_amount):
            flags.append("unusual_amount")

        # Check for category anomaly
        if txn["merchant"]["category"] not in usual_categories:
            flags.append("unusual_merchant_category")

        # Check for location anomaly
        unusual_location = True
        for loc in customer["behavioral_profile"]["transaction_patterns"]["usual_transaction_locations"]:
            # Simple check on city name only; a production system would compare coordinates by distance
            if loc["city"] == txn["location"]["city"]:
                unusual_location = False
                break

        if unusual_location:
            flags.append("unusual_location")

        # If any flags, add to anomalies
        if flags:
            anomalies.append({
                "transaction_id": txn["transaction_id"],
                "timestamp": txn["timestamp"],
                "amount": txn["amount"],
                "merchant": txn["merchant"]["name"],
                "flags": flags,
                "risk_score": txn["risk_assessment"]["score"]
            })

    return {
        "customer_name": customer["personal_info"]["name"],
        "baseline": {
            "avg_amount": avg_amount,
            "usual_categories": usual_categories
        },
        "anomalies": anomalies
    }

# Test the anomaly detection function with the first customer in the collection
sample_customer = customers_collection.find_one()
anomaly_results = detect_anomalies(sample_customer["_id"])

print("Behavioral Anomaly Detection Results:")
print(f"Customer: {anomaly_results['customer_name']}")
print(f"Baseline - Avg Amount: ${anomaly_results['baseline']['avg_amount']:.2f}")
print(f"Baseline - Usual Categories: {', '.join(anomaly_results['baseline']['usual_categories'])}")
print(f"Found {len(anomaly_results['anomalies'])} anomalies in the last 30 days")

for i, anomaly in enumerate(anomaly_results['anomalies']):
    if i >= 3:  # Limit display to 3 anomalies
        print("... and more")
        break
    print(f"\nAnomaly {i+1}:")
    print(f"  Transaction: {anomaly['transaction_id']}")
    print(f"  Date: {anomaly['timestamp']}")
    print(f"  Amount: ${anomaly['amount']:.2f}")
    print(f"  Merchant: {anomaly['merchant']}")
    print(f"  Flags: {', '.join(anomaly['flags'])}")
    print(f"  Risk Score: {anomaly['risk_score']:.1f}")
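The location check in `detect_anomalies` compares city names only. As a result, a purchase in a neighboring town is flagged, and a purchase in a different city with the same name is not. Below is a distance-based sketch using the haversine formula. `haversine_km`, `is_unusual_location`, and the 100 km default radius are assumptions. The functions expect plain [longitude, latitude] pairs (GeoJSON order), so you would need to extract those from wherever your usual-location entries store them.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_unusual_location(txn_coords, usual_coords_list, radius_km=100.0):
    """True if txn_coords ([lon, lat]) is farther than radius_km from every usual location.

    Like the original loop, an empty list of usual locations counts as unusual.
    """
    return all(haversine_km(*txn_coords, *usual) > radius_km for usual in usual_coords_list)
```

Inside `detect_anomalies`, this would replace the city-name loop. The radius is a tuning choice: a small radius flags more travel, and a large radius may miss card use in a nearby city.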

### Vector Search Anomaly Detection Example

In [None]:
# Transaction fraud evaluation using vector similarity

def evaluate_transaction_fraud(transaction, similarity_threshold=0.85, top_k=5):
    """
    Evaluate a transaction for potential fraud by:
    1. Converting it to an embedding
    2. Finding similar transactions using vector search
    3. Calculating a fraud score based on similarities

    Args:
        transaction: Transaction document or dict with transaction details
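        similarity_threshold: Minimum Atlas searchScore (0.0-1.0) for a neighbor to count; for cosine
            indexes the score is (1 + cosine) / 2, so 0.85 corresponds to a cosine similarity of 0.7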
        top_k: Number of similar transactions to return

    Returns:
        dict: Evaluation results including fraud score and similar transactions
    """
    try:
        # Generate embedding for the transaction
        if not transaction.get('vector_embedding'):
            print("Generating embedding for transaction...")
            transaction_text = create_transaction_text_representation(transaction)
            embedding = generate_bedrock_embedding(transaction_text)
        else:
            embedding = transaction['vector_embedding']

        # Find similar transactions using vector search
        try:
            pipeline = [
                {
                    "$search": {
                        "index": "transaction_vector_index",
                        "knnBeta": {
                            "vector": embedding,
                            "path": "vector_embedding",
                            "k": top_k + 1  # +1 to account for self-match
                        }
                    }
                },
                {
                    "$project": {
                        "transaction_id": 1,
                        "amount": 1,
                        "merchant": 1,
                        "risk_assessment.score": 1,
                        "risk_assessment.level": 1,
                        "risk_assessment.transaction_type": 1,
                        "timestamp": 1,
                        "location.city": 1,
                        "location.country": 1,
                        "score": {"$meta": "searchScore"}  # Get the similarity score
                    }
                }
            ]

            similar_transactions = list(transactions_collection.aggregate(pipeline))

            # Remove self-match if present
            if transaction.get('transaction_id'):
                similar_transactions = [t for t in similar_transactions
                                      if t.get('transaction_id') != transaction.get('transaction_id')]

            # Take only top_k transactions
            similar_transactions = similar_transactions[:top_k]

            # Calculate fraud score based on similar fraudulent transactions
            total_score = 0
            fraud_weight = 0

            for similar_tx in similar_transactions:
                # Atlas already normalizes cosine similarity into the searchScore
                # as (1 + cosine) / 2, so it lies in [0, 1]; min() is only a guard
                similarity = min(similar_tx.get('score', 0), 1.0)

                # Skip transactions with similarity below threshold
                if similarity < similarity_threshold:
                    continue

                # Get the label (normal/suspicious/fraudulent) and risk score of the similar transaction
                tx_type = similar_tx.get('risk_assessment', {}).get('transaction_type', 'normal')
                tx_risk_score = similar_tx.get('risk_assessment', {}).get('score', 0)

                # Weight by transaction type and similarity
                if tx_type == 'fraudulent':
                    weight = 1.0
                elif tx_type == 'suspicious':
                    weight = 0.5
                else:
                    weight = 0.1

                # Add to weighted average
                total_score += tx_risk_score * similarity * weight
                fraud_weight += similarity * weight

            # Calculate final fraud score (0-100)
            fraud_score = round(total_score / max(fraud_weight, 0.001), 2)

            # Determine risk level
            if fraud_score >= 70:
                risk_level = "high"
            elif fraud_score >= 30:
                risk_level = "medium"
            else:
                risk_level = "low"

            return {
                "fraud_score": fraud_score,
                "risk_level": risk_level,
                "similar_transactions": similar_transactions,
                "evaluation_method": "vector_similarity",
                "threshold_used": similarity_threshold
            }

        except Exception as e:
            print(f"Error performing vector search: {e}")
            # Fallback to basic evaluation if vector search fails
            return {
                "fraud_score": transaction.get('risk_assessment', {}).get('score', 50),
                "risk_level": transaction.get('risk_assessment', {}).get('level', 'medium'),
                "similar_transactions": [],
                "evaluation_method": "fallback_basic",
                "error": str(e)
            }

    except Exception as e:
        print(f"Error evaluating transaction: {e}")
        return {
            "error": str(e),
            "fraud_score": 50,  # Default middle score
            "risk_level": "medium",
            "evaluation_method": "error_fallback"
        }
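The scoring step is easier to follow with concrete numbers. The sketch below applies the same weighting as `evaluate_transaction_fraud` to plain tuples: weights of 1.0, 0.5, and 0.1 for fraudulent, suspicious, and other neighbors, with neighbors below the threshold skipped. `weighted_fraud_score`, `search_score_to_cosine`, and `risk_level` are illustrative names. The conversion helper assumes Atlas's cosine score normalization, searchScore = (1 + cosine) / 2.

```python
def weighted_fraud_score(neighbors, similarity_threshold=0.85):
    """Weighted average of neighbor risk scores, mirroring evaluate_transaction_fraud.

    neighbors: iterable of (search_score, transaction_type, risk_score) tuples.
    """
    type_weights = {"fraudulent": 1.0, "suspicious": 0.5}  # any other type: 0.1
    total, weight_sum = 0.0, 0.0
    for search_score, tx_type, risk_score in neighbors:
        similarity = min(search_score, 1.0)
        if similarity < similarity_threshold:
            continue  # not similar enough to count
        weight = type_weights.get(tx_type, 0.1)
        total += risk_score * similarity * weight
        weight_sum += similarity * weight
    # Same guard as the notebook: avoid division by zero when nothing qualifies
    return round(total / max(weight_sum, 0.001), 2)


def search_score_to_cosine(score):
    """Invert Atlas's cosine normalization: score = (1 + cosine) / 2."""
    return 2 * score - 1


def risk_level(fraud_score):
    """Same cut-offs as evaluate_transaction_fraud."""
    if fraud_score >= 70:
        return "high"
    if fraud_score >= 30:
        return "medium"
    return "low"


neighbors = [
    (0.95, "fraudulent", 90),
    (0.90, "normal", 10),
    (0.80, "suspicious", 60),  # below 0.85, ignored
]
score = weighted_fraud_score(neighbors)
print(score, risk_level(score))  # 83.08 high
```

Here the score is (90 · 0.95 · 1.0 + 10 · 0.90 · 0.1) / (0.95 · 1.0 + 0.90 · 0.1) = 86.4 / 1.04 ≈ 83.08. Because the result is a weighted average of neighbor risk scores, a single normal neighbor with risk 10 yields a score of 10. A transaction with no neighbor above the threshold scores 0 and is rated low, which may not be the intended default.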

In [None]:
def test_fraud_evaluation():
    """
    Test the fraud evaluation function on normal and fraudulent transactions
    """
    print("Testing fraud evaluation on normal and fraudulent transactions...")

    # Find a few normal transactions. Fetch full documents: evaluate_transaction_fraud
    # needs either the stored vector_embedding or every field that
    # create_transaction_text_representation reads, and a narrow projection drops them
    normal_transactions = list(transactions_collection.find(
        {"risk_assessment.transaction_type": "normal"}
    ).limit(3))

    # Find a few fraudulent transactions
    fraudulent_transactions = list(transactions_collection.find(
        {"risk_assessment.transaction_type": "fraudulent"}
    ).limit(3))

    print(f"Found {len(normal_transactions)} normal and {len(fraudulent_transactions)} fraudulent transactions for testing")

    # Test normal transactions
    print("\nEvaluating normal transactions:")
    for idx, tx in enumerate(normal_transactions):
        print(f"\nNormal Transaction {idx+1}: {tx.get('transaction_id')}")
        print(f"Original risk score: {tx.get('risk_assessment', {}).get('score', 'N/A')}")

        # Evaluate the transaction
        result = evaluate_transaction_fraud(tx)

        print(f"Calculated fraud score: {result.get('fraud_score')}")
        print(f"Risk level: {result.get('risk_level')}")
        print(f"Similar transactions found: {len(result.get('similar_transactions', []))}")

    # Test fraudulent transactions
    print("\nEvaluating fraudulent transactions:")
    for idx, tx in enumerate(fraudulent_transactions):
        print(f"\nFraudulent Transaction {idx+1}: {tx.get('transaction_id')}")
        print(f"Original risk score: {tx.get('risk_assessment', {}).get('score', 'N/A')}")

        # Evaluate the transaction
        result = evaluate_transaction_fraud(tx)

        print(f"Calculated fraud score: {result.get('fraud_score')}")
        print(f"Risk level: {result.get('risk_level')}")
        print(f"Similar transactions found: {len(result.get('similar_transactions', []))}")

        # Print top similar transaction
        if result.get('similar_transactions'):
            top_similar = result['similar_transactions'][0]
            print(f"Top similar transaction: {top_similar.get('transaction_id')}")
            print(f"  Type: {top_similar.get('risk_assessment', {}).get('transaction_type', 'N/A')}")
            print(f"  Score: {top_similar.get('risk_assessment', {}).get('score', 'N/A')}")

In [None]:
def simulate_transaction_evaluation(
    amount=None,
    merchant_category=None,
    location=None,
    customer_id=None
):
    """
    Simulate a transaction with the given parameters and evaluate it for fraud

    Args:
        amount: Transaction amount (float)
        merchant_category: Merchant category (string)
        location: Dictionary with city, state, country, coordinates
        customer_id: Customer ID (to use their profile for defaults)

    Returns:
        dict: Evaluation results
    """
    # Get default values from a random customer if customer_id is not provided
    if not customer_id:
        customer = customers_collection.find_one()
    else:
        customer = customers_collection.find_one({"_id": customer_id})

    if not customer:
        print("Customer not found. Using random values.")
        customer_id = None
        avg_amount = 100
        std_amount = 20
        common_categories = ["retail", "restaurant", "grocery"]
        common_location = {
            "city": "New York",
            "state": "NY",
            "country": "US",
            "coordinates": {
                "type": "Point",
                "coordinates": [-74.0060, 40.7128]
            }
        }
    else:
        customer_id = customer.get("_id")
        avg_amount = customer.get("behavioral_profile", {}).get("transaction_patterns", {}).get("avg_transaction_amount", 100)
        std_amount = customer.get("behavioral_profile", {}).get("transaction_patterns", {}).get("std_transaction_amount", 20)
        common_categories = customer.get("behavioral_profile", {}).get("transaction_patterns", {}).get("common_merchant_categories", ["retail"])

        # Get a common location
        location_data = customer.get("behavioral_profile", {}).get("transaction_patterns", {}).get("usual_transaction_locations", [])
        common_location = location_data[0] if location_data else {
            "city": "New York",
            "state": "NY",
            "country": "US",
            "coordinates": {
                "type": "Point",
                "coordinates": [-74.0060, 40.7128]
            }
        }

    # Use provided values or defaults
    if amount is None:
        amount = round(random.normalvariate(avg_amount, std_amount), 2)
        if amount < 0:
            amount = round(avg_amount * 0.5, 2)

    if merchant_category is None:
        merchant_category = random.choice(common_categories)

    if location is None:
        location = common_location

    # Create a synthetic transaction
    transaction = {
        "transaction_id": str(uuid.uuid4()),
        "customer_id": customer_id,
        "timestamp": datetime.now().isoformat(),
        "amount": amount,
        "currency": "USD",
        "merchant": {
            "name": fake.company(),
            "category": merchant_category,
            "id": str(uuid.uuid4())[:8]
        },
        "location": location,
        "device_info": {
            "device_id": str(uuid.uuid4()),
            "type": random.choice(["mobile", "desktop", "tablet"]),
            "os": random.choice(["iOS", "Android", "Windows", "macOS"]),
            "browser": random.choice(["Chrome", "Safari", "Firefox", "Edge"]),
            "ip": fake.ipv4()
        },
        "transaction_type": random.choice(["purchase", "payment"]),
        "payment_method": random.choice(["credit_card", "debit_card"]),
        "status": "pending"
    }

    print(f"Simulated transaction:")
    print(f"Amount: ${transaction['amount']:.2f}")
    print(f"Merchant Category: {transaction['merchant']['category']}")
    print(f"Location: {transaction['location']['city']}, {transaction['location']['country']}")

    # Generate embedding and evaluate
    print("\nEvaluating transaction...")
    result = evaluate_transaction_fraud(transaction)

    print(f"\nEvaluation Results:")
    print(f"Fraud Score: {result.get('fraud_score')}")
    print(f"Risk Level: {result.get('risk_level')}")
    print(f"Similar Transactions Found: {len(result.get('similar_transactions', []))}")

    # Display similar transactions
    if result.get('similar_transactions'):
        print("\nTop Similar Transactions:")
        for i, tx in enumerate(result['similar_transactions'][:3]):  # Show top 3
            print(f"{i+1}. Transaction ID: {tx.get('transaction_id')}")
            print(f"   Amount: ${tx.get('amount', 0):.2f}")
            print(f"   Risk Level: {tx.get('risk_assessment', {}).get('level', 'unknown')}")
            print(f"   Type: {tx.get('risk_assessment', {}).get('transaction_type', 'unknown')}")

    return {
        "transaction": transaction,
        "evaluation": result
    }

In [None]:
# Option 1: Run the test evaluation function
# This tests existing normal and fraudulent transactions
test_fraud_evaluation()

In [None]:
# Option 2: Simulate a normal transaction
print("\n\n======= SIMULATING NORMAL TRANSACTION =======")
normal_result = simulate_transaction_evaluation()

In [None]:
# Option 3: Simulate a high-amount transaction (potentially suspicious)
print("\n\n======= SIMULATING HIGH-AMOUNT TRANSACTION =======")
high_amount_result = simulate_transaction_evaluation(
    amount=3000  # Set a high amount for the transaction
)

In [None]:
# Option 4: Simulate a transaction from an unusual location
print("\n\n======= SIMULATING UNUSUAL LOCATION TRANSACTION =======")
unusual_location = {
    "city": "Lagos",
    "state": "Lagos",
    "country": "NG",
    "coordinates": {
        "type": "Point",
        "coordinates": [3.3792, 6.5244]  # Lagos, Nigeria coordinates
    }
}
unusual_location_result = simulate_transaction_evaluation(location=unusual_location)

## Conclusion

This notebook demonstrates how MongoDB's document model and advanced features enable sophisticated fraud detection capabilities through dynamic behavioral profiling. Key advantages demonstrated include:

1. **Flexible Schema**: The document model allows for rich, nested data structures that can capture complex customer behavior patterns.

2. **Geospatial Capabilities**: MongoDB's geospatial indexing enables location-based fraud detection.

3. **Vector Search**: Similarity-based pattern matching allows the system to identify emerging fraud patterns.

4. **Aggregation Framework**: Complex analysis can be performed directly in the database.

5. **Performance**: Indexes on the fields these queries filter and sort on (customer, timestamp, risk score, location) keep lookups fast as the collections grow.

For a production system, you would want to add:
- Real-time transaction scoring
- Machine learning models for pattern recognition
- Alert management system
- Case management for fraud analysts
- Integration with external data sources