# Module 4: Graph Analytics with Neo4j GDS

This notebook introduces Graph Data Science (GDS) algorithms and their applications in AI/ML workflows, focusing on practical analytics for business insights.

## Learning Objectives
- Apply centrality algorithms to identify important entities
- Use community detection to find clusters and groups
- Implement similarity algorithms for recommendations
- Perform fraud detection and risk assessment analytics
- Integrate graph features into ML pipelines

## Prerequisites
- Completion of Module 3: Unstructured Data
- Basic understanding of statistics and ML concepts

## Setup and Dependencies

In [None]:
# Install required packages
!pip install neo4j pandas numpy matplotlib seaborn scikit-learn networkx faker

In [None]:
# Import libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from neo4j import GraphDatabase
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
import networkx as nx
from faker import Faker
import random
from datetime import datetime, timedelta
from typing import List, Dict, Tuple

# Set style for plots
plt.style.use('default')
sns.set_palette("husl")

fake = Faker()
random.seed(42)
np.random.seed(42)

## Neo4j Connection and GDS Setup

In [None]:
# Neo4j connection settings
NEO4J_URI = os.getenv('NEO4J_URI', 'bolt://localhost:7687')
NEO4J_USERNAME = os.getenv('NEO4J_USERNAME', 'neo4j')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD', 'password')

# Create Neo4j driver
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

def run_query(query, parameters=None):
    """Execute a Cypher query and return results"""
    with driver.session() as session:
        result = session.run(query, parameters or {})
        return [record.data() for record in result]

# Test connection and GDS availability
print("Testing Neo4j connection...")
connection_test = run_query("RETURN 'Connected to Neo4j!' as message")
print(connection_test[0]['message'])

# Check GDS availability
try:
    gds_test = run_query("RETURN gds.version() as version")
    print(f"GDS Version: {gds_test[0]['version']}")
except Exception as e:
    print(f"GDS not available: {e}")
    print("Note: Some examples will use native Cypher instead of GDS")

## Lesson 1: GDS Overview and Data Preparation

In [None]:
# Clear existing data for a fresh start (this deletes every node and relationship in the database)
run_query("MATCH (n) DETACH DELETE n")

# Generate sample financial network data
def generate_financial_network(num_customers=100, num_accounts=150, num_transactions=500):
    """Generate sample financial network for fraud detection analysis"""
    
    # Create customers
    customers = []
    for i in range(num_customers):
        customer = {
            'id': f'C{i:03d}',
            'name': fake.name(),
            'email': fake.email(),
            'phone': fake.phone_number(),
            'address': fake.address().replace('\n', ', '),
            'age': random.randint(18, 80),
            'risk_score': random.uniform(0.1, 0.9),
            'is_fraudster': random.random() < 0.05  # 5% fraudsters
        }
        customers.append(customer)
    
    # Create accounts
    accounts = []
    for i in range(num_accounts):
        account = {
            'id': f'A{i:03d}',
            'account_type': random.choice(['checking', 'savings', 'credit']),
            'balance': random.uniform(100, 50000),
            'opened_date': fake.date_between(start_date='-5y', end_date='today'),
            'status': random.choice(['active', 'inactive', 'frozen']),
            'owner_id': random.choice(customers)['id']
        }
        accounts.append(account)
    
    # Create transactions
    transactions = []
    for i in range(num_transactions):
        from_account = random.choice(accounts)
        to_account = random.choice(accounts)
        
        if from_account['id'] != to_account['id']:
            transaction = {
                'id': f'T{i:04d}',
                'amount': random.uniform(10, 10000),
                'timestamp': fake.date_time_between(start_date='-1y', end_date='now'),
                'transaction_type': random.choice(['transfer', 'payment', 'withdrawal']),
                'from_account': from_account['id'],
                'to_account': to_account['id'],
                'is_suspicious': random.random() < 0.1  # 10% suspicious
            }
            transactions.append(transaction)
    
    return customers, accounts, transactions

# Generate data
customers, accounts, transactions = generate_financial_network()
print(f"Generated {len(customers)} customers, {len(accounts)} accounts, {len(transactions)} transactions")

In [None]:
# Load data into Neo4j
def load_financial_data(customers, accounts, transactions):
    """Load financial network data into Neo4j"""
    
    # Create customers
    for customer in customers:
        customer_query = """
        CREATE (c:Customer {
            id: $id,
            name: $name,
            email: $email,
            phone: $phone,
            address: $address,
            age: $age,
            risk_score: $risk_score,
            is_fraudster: $is_fraudster
        })
        """
        run_query(customer_query, customer)
    
    # Create accounts
    for account in accounts:
        account_query = """
        CREATE (a:Account {
            id: $id,
            account_type: $account_type,
            balance: $balance,
            opened_date: date($opened_date),
            status: $status
        })
        """
        account_data = account.copy()
        account_data['opened_date'] = account['opened_date'].strftime('%Y-%m-%d')
        del account_data['owner_id']
        run_query(account_query, account_data)
    
    # Create customer-account relationships
    for account in accounts:
        owner_query = """
        MATCH (c:Customer {id: $customer_id}),
              (a:Account {id: $account_id})
        CREATE (c)-[:OWNS]->(a)
        """
        run_query(owner_query, {
            'customer_id': account['owner_id'],
            'account_id': account['id']
        })
    
    # Create transactions
    for transaction in transactions:
        transaction_query = """
        MATCH (from_acc:Account {id: $from_account}),
              (to_acc:Account {id: $to_account})
        CREATE (from_acc)-[:TRANSFERS {
            id: $id,
            amount: $amount,
            timestamp: datetime($timestamp),
            transaction_type: $transaction_type,
            is_suspicious: $is_suspicious
        }]->(to_acc)
        """
        # Keep from_account and to_account: the MATCH clause reads them as parameters
        transaction_data = transaction.copy()
        transaction_data['timestamp'] = transaction['timestamp'].isoformat()
        run_query(transaction_query, transaction_data)

load_financial_data(customers, accounts, transactions)
print("Financial network data loaded successfully!")
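
The loader above sends one query per record, which is fine for a few hundred rows but slows down as the data grows. A common alternative is to send rows in batches and expand them server-side with `UNWIND`. The sketch below shows the idea under stated assumptions: the `chunked` helper, the `BATCH_CUSTOMER_QUERY` name, and the batch sizes are hypothetical and not part of the original notebook.

```python
from itertools import islice

def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# One query creates a whole batch of customers; $rows is a list of dicts.
BATCH_CUSTOMER_QUERY = """
UNWIND $rows AS row
CREATE (c:Customer)
SET c = row
"""

# Hypothetical rows standing in for the generated customers.
sample_rows = [{"id": f"C{i:03d}"} for i in range(7)]
batches = list(chunked(sample_rows, 3))
print([len(batch) for batch in batches])  # [3, 3, 1]

# With a live connection, each batch could be sent like this (batch size is an assumption):
# for batch in chunked(customers, 50):
#     run_query(BATCH_CUSTOMER_QUERY, {"rows": batch})
```

Batching trades many small round trips for a few larger ones; the right batch size depends on row width and server memory.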

In [None]:
# Verify data loading
network_stats = run_query("""
MATCH (c:Customer)
OPTIONAL MATCH (c)-[:OWNS]->(a:Account)
OPTIONAL MATCH (a)-[t:TRANSFERS]-()
RETURN 
    count(DISTINCT c) as customers,
    count(DISTINCT a) as accounts,
    count(DISTINCT t) as transactions
""")

print("Network Statistics:")
for key, value in network_stats[0].items():
    print(f"- {key.title()}: {value:,}")

## Lesson 2: Centrality Algorithms for Importance Analysis

In [None]:
# Create graph projection for GDS algorithms
def create_graph_projection():
    """Create a graph projection for GDS algorithms"""
    
    try:
        # Drop existing projection if it exists
        run_query("CALL gds.graph.drop('financial-network', false)")
    except Exception:
        # GDS may be missing or the projection may not exist yet; either way, continue
        pass
    
    # Create projection
    projection_query = """
    CALL gds.graph.project(
        'financial-network',
        ['Customer', 'Account'],
        {
            OWNS: { orientation: 'UNDIRECTED' },
            TRANSFERS: { 
                orientation: 'UNDIRECTED',
                properties: ['amount']
            }
        }
    )
    """
    
    try:
        result = run_query(projection_query)
        print(f"Graph projection created: {result[0]['nodeCount']} nodes, {result[0]['relationshipCount']} relationships")
        return True
    except Exception as e:
        print(f"Could not create GDS projection: {e}")
        print("Will use native Cypher for centrality calculations")
        return False

gds_available = create_graph_projection()

In [None]:
# Degree Centrality - Find most connected accounts
def calculate_degree_centrality():
    """Calculate degree centrality for accounts"""
    
    if gds_available:
        # Using GDS
        gds_query = """
        CALL gds.degree.stream('financial-network')
        YIELD nodeId, score
        RETURN gds.util.asNode(nodeId).id as node_id,
               labels(gds.util.asNode(nodeId))[0] as node_type,
               score as degree_centrality
        ORDER BY score DESC
        LIMIT 10
        """
        return run_query(gds_query)
    else:
        # Using native Cypher
        cypher_query = """
        MATCH (a:Account)-[r]-()
        WITH a, count(r) as degree
        RETURN a.id as node_id, 
               'Account' as node_type,
               degree as degree_centrality
        ORDER BY degree DESC
        LIMIT 10
        """
        return run_query(cypher_query)

degree_results = calculate_degree_centrality()
print("Top 10 Accounts by Degree Centrality:")
for result in degree_results:
    print(f"- {result['node_id']} ({result['node_type']}): {result['degree_centrality']}")
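
Degree centrality is simply the number of relationships touching a node. As a minimal sketch of what both the GDS call and the Cypher fallback compute, the following counts endpoints over a small, hypothetical edge list (the account ids and transfers are made up for illustration, not taken from the generated data).

```python
from collections import Counter

# Hypothetical transfers as (from_account, to_account) pairs.
transfers = [("A001", "A002"), ("A001", "A003"), ("A002", "A003"), ("A004", "A001")]

def undirected_degree(edges):
    """Count how many relationship endpoints touch each node, ignoring direction."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree

degree = undirected_degree(transfers)
for node, value in degree.most_common():
    print(f"{node}: {value}")
```

Here `A001` takes part in three transfers, so it ranks first; direction is ignored, matching the `UNDIRECTED` orientation used in the projection.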

In [None]:
# Betweenness Centrality - Find accounts that act as bridges
def calculate_betweenness_centrality():
    """Calculate betweenness centrality for key intermediaries"""
    
    if gds_available:
        # Using GDS
        gds_query = """
        CALL gds.betweenness.stream('financial-network')
        YIELD nodeId, score
        WHERE score > 0
        RETURN gds.util.asNode(nodeId).id as node_id,
               labels(gds.util.asNode(nodeId))[0] as node_type,
               score as betweenness_centrality
        ORDER BY score DESC
        LIMIT 10
        """
        return run_query(gds_query)
    else:
        # Approximation: count 2-3 hop transfer paths through each account (not true shortest-path betweenness)
        cypher_query = """
        MATCH path = (a1:Account)-[:TRANSFERS*2..3]-(a2:Account)
        WHERE a1 <> a2
        UNWIND nodes(path)[1..-1] as intermediate
        WITH intermediate, count(*) as paths_through
        WHERE intermediate:Account
        RETURN intermediate.id as node_id,
               'Account' as node_type,
               paths_through as betweenness_centrality
        ORDER BY paths_through DESC
        LIMIT 10
        """
        return run_query(cypher_query)

betweenness_results = calculate_betweenness_centrality()
print("\nTop 10 Accounts by Betweenness Centrality:")
for result in betweenness_results:
    print(f"- {result['node_id']} ({result['node_type']}): {result['betweenness_centrality']:.4f}")
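
Betweenness centrality counts how often a node lies on shortest paths between other nodes. The sketch below implements Brandes' algorithm for a small unweighted, undirected graph in plain Python. It is meant to illustrate the definition and is not a description of how GDS computes the score internally; the toy graph is hypothetical and the scores are unnormalized.

```python
from collections import deque

def betweenness(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            order.append(current)
            for neighbor in adjacency[current]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[current] + 1:
                    path_count[neighbor] += path_count[current]
                    predecessors[neighbor].append(current)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Every undirected pair was visited from both ends, so halve the totals.
    return {node: score / 2.0 for node, score in scores.items()}

# Hypothetical path graph: A - B - C - D
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = betweenness(graph)
print(scores)
```

In the path graph, `B` and `C` each sit on two shortest paths between other nodes, while the end nodes sit on none.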

In [None]:
# PageRank - Find most influential accounts in the network
def calculate_pagerank():
    """Calculate PageRank for account influence"""
    
    if gds_available:
        # Using GDS
        gds_query = """
        CALL gds.pageRank.stream('financial-network')
        YIELD nodeId, score
        RETURN gds.util.asNode(nodeId).id as node_id,
               labels(gds.util.asNode(nodeId))[0] as node_type,
               score as pagerank_score
        ORDER BY score DESC
        LIMIT 10
        """
        return run_query(gds_query)
    else:
        # Heuristic stand-in (not PageRank): total transfer volume multiplied by transfer count
        cypher_query = """
        MATCH (a:Account)-[t:TRANSFERS]-(other:Account)
        WITH a, sum(t.amount) as total_volume, count(t) as transaction_count
        WITH a, total_volume * transaction_count as influence_score
        RETURN a.id as node_id,
               'Account' as node_type,
               influence_score as pagerank_score
        ORDER BY influence_score DESC
        LIMIT 10
        """
        return run_query(cypher_query)

pagerank_results = calculate_pagerank()
print("\nTop 10 Accounts by PageRank/Influence:")
for result in pagerank_results:
    print(f"- {result['node_id']} ({result['node_type']}): {result['pagerank_score']:.6f}")
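
PageRank scores a node by the scores of the nodes that point to it, iterated until the values settle. The following is a minimal power-iteration sketch of the textbook, normalized variant (scores sum to 1), using a hypothetical four-account graph. The damping factor of 0.85 and the fixed iteration count are assumptions, and GDS may scale its scores differently, so only the ranking is comparable.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping node -> list of target nodes."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the teleport share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical transfer directions between four accounts.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for node, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {score:.4f}")
```

Account `C` receives transfers from three other accounts and ends up with the highest score, while `D`, which receives none, keeps only the teleport share.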

## Lesson 3: Community Detection for Clustering

In [None]:
# Louvain Community Detection
def detect_communities():
    """Detect communities using Louvain algorithm"""
    
    if gds_available:
        # Using GDS Louvain
        gds_query = """
        CALL gds.louvain.stream('financial-network')
        YIELD nodeId, communityId
        RETURN gds.util.asNode(nodeId).id as node_id,
               labels(gds.util.asNode(nodeId))[0] as node_type,
               communityId as community
        ORDER BY communityId, node_id
        """
        return run_query(gds_query)
    else:
        # Rough proxy (not true community detection): label each account with the size of its two-hop transfer neighborhood
        cypher_query = """
        MATCH (a:Account)
        OPTIONAL MATCH (a)-[:TRANSFERS*1..2]-(connected:Account)
        WITH a, collect(DISTINCT connected.id) as connected_accounts
        RETURN a.id as node_id,
               'Account' as node_type,
               size(connected_accounts) as community
        ORDER BY community DESC, node_id
        """
        return run_query(cypher_query)

community_results = detect_communities()

# Analyze community structure
community_df = pd.DataFrame(community_results)
community_stats = community_df.groupby('community').size().sort_values(ascending=False)

print("Community Detection Results:")
print(f"Total communities found: {len(community_stats)}")
print("\nTop 5 communities by size:")
for community, size in community_stats.head().items():
    print(f"- Community {community}: {size} nodes")
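
When GDS is unavailable, the Cypher fallback above groups accounts by the size of their two-hop neighborhood, which is only a rough proxy for community structure. A simpler grouping that does have a precise meaning is connected components: two accounts share a component if any chain of transfers links them. The sketch below computes components with a union-find structure over a hypothetical edge list. It is a swapped-in technique for illustration, not Louvain.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, shortening the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical accounts and transfers.
accounts_demo = ["A001", "A002", "A003", "A004", "A005"]
transfers_demo = [("A001", "A002"), ("A002", "A003"), ("A004", "A005")]
components = connected_components(accounts_demo, transfers_demo)
print(components)
```

Unlike Louvain, this never splits a densely connected region into smaller groups, so it tends to return one giant component on well-connected transaction graphs.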

In [None]:
# Analyze community characteristics
def analyze_community_characteristics():
    """Analyze characteristics of detected communities"""
    
    # Get community information with account details
    if gds_available:
        analysis_query = """
        CALL gds.louvain.stream('financial-network')
        YIELD nodeId, communityId
        WITH gds.util.asNode(nodeId) as node, communityId
        WHERE node:Account
        MATCH (c:Customer)-[:OWNS]->(node)
        OPTIONAL MATCH (node)-[t:TRANSFERS]-()
        WITH communityId, 
             collect(DISTINCT c.risk_score) as risk_scores,
             collect(DISTINCT t.amount) as transaction_amounts,
             count(DISTINCT c) as customer_count,
             count(DISTINCT node) as account_count,
             count(DISTINCT t) as transaction_count
        RETURN communityId,
               customer_count,
               account_count,
               transaction_count,
               reduce(sum = 0.0, x IN risk_scores | sum + x) / size(risk_scores) as avg_risk_score
        ORDER BY customer_count DESC
        LIMIT 10
        """
    else:
        # Simplified analysis
        analysis_query = """
        MATCH (c:Customer)-[:OWNS]->(a:Account)
        OPTIONAL MATCH (a)-[t:TRANSFERS]-()
        WITH a, c, count(t) as transaction_count
        WITH transaction_count / 10 as community_group, 
             collect(c.risk_score) as risk_scores,
             count(DISTINCT c) as customer_count,
             count(DISTINCT a) as account_count,
             sum(transaction_count) as transaction_count
        RETURN community_group as communityId,
               customer_count,
               account_count,
               transaction_count,
               reduce(sum = 0.0, x IN risk_scores | sum + x) / size(risk_scores) as avg_risk_score
        ORDER BY customer_count DESC
        LIMIT 10
        """
    
    return run_query(analysis_query)

community_analysis = analyze_community_characteristics()
print("\nCommunity Risk Analysis:")
for community in community_analysis:
    print(f"Community {community['communityId']}:")
    print(f"  - Customers: {community['customer_count']}")
    print(f"  - Accounts: {community['account_count']}")
    print(f"  - Transactions: {community['transaction_count']}")
    print(f"  - Avg Risk Score: {community['avg_risk_score']:.3f}")
    print()

## Lesson 4: Advanced Analytics for Fraud Detection

In [None]:
# Identify suspicious transaction patterns
def identify_suspicious_patterns():
    """Identify potentially fraudulent transaction patterns"""
    
    # Pattern 1: Rapid successive transactions
    rapid_transactions = run_query("""
    MATCH (a:Account)-[t1:TRANSFERS]->(b:Account)-[t2:TRANSFERS]->(c:Account)
    WHERE t1.timestamp <= t2.timestamp
      AND duration.inSeconds(t1.timestamp, t2.timestamp).seconds < 300
      AND t1.amount > 1000 AND t2.amount > 1000
    RETURN a.id as account1, b.id as account2, c.id as account3,
           t1.amount as amount1, t2.amount as amount2,
           t1.timestamp as time1, t2.timestamp as time2
    ORDER BY t1.timestamp DESC
    LIMIT 10
    """)
    
    # Pattern 2: Circular money flow
    circular_flows = run_query("""
    MATCH path = (a:Account)-[:TRANSFERS*3..4]->(a)
    WITH path, reduce(total = 0.0, r in relationships(path) | total + r.amount) as total_amount
    WHERE total_amount > 5000
    RETURN [n in nodes(path) | n.id] as accounts,
           total_amount,
           length(path) as path_length
    ORDER BY total_amount DESC
    LIMIT 5
    """)
    
    # Pattern 3: High-volume accounts with suspicious activity
    suspicious_accounts = run_query("""
    MATCH (a:Account)-[t:TRANSFERS]-()
    WITH a, count(t) as transaction_count, 
         sum(t.amount) as total_volume,
         sum(CASE WHEN t.is_suspicious THEN 1 ELSE 0 END) as suspicious_count
    WHERE transaction_count > 5 AND suspicious_count > 0
    RETURN a.id as account_id,
           transaction_count,
           total_volume,
           suspicious_count,
           toFloat(suspicious_count) / transaction_count as suspicion_rate
    ORDER BY suspicion_rate DESC, total_volume DESC
    LIMIT 10
    """)
    
    return {
        'rapid_transactions': rapid_transactions,
        'circular_flows': circular_flows,
        'suspicious_accounts': suspicious_accounts
    }

fraud_patterns = identify_suspicious_patterns()

print("Fraud Detection Results:")
print(f"\n1. Rapid Successive Transactions ({len(fraud_patterns['rapid_transactions'])} found):")
for pattern in fraud_patterns['rapid_transactions'][:3]:
    print(f"   {pattern['account1']} -> {pattern['account2']} -> {pattern['account3']}")
    print(f"   Amounts: ${pattern['amount1']:.2f} -> ${pattern['amount2']:.2f}")

print(f"\n2. Circular Money Flows ({len(fraud_patterns['circular_flows'])} found):")
for pattern in fraud_patterns['circular_flows'][:3]:
    accounts_str = ' -> '.join(pattern['accounts'])
    print(f"   {accounts_str} (${pattern['total_amount']:.2f})")

print(f"\n3. High-Risk Accounts ({len(fraud_patterns['suspicious_accounts'])} found):")
for account in fraud_patterns['suspicious_accounts'][:5]:
    print(f"   {account['account_id']}: {account['suspicious_count']}/{account['transaction_count']} suspicious ({account['suspicion_rate']:.2%})")
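
The circular-flow pattern looks for money that leaves an account and returns to it after three or four hops. As a sketch of the same idea outside the database, the following depth-first search finds simple directed cycles of bounded length in a hypothetical edge list. Unlike the Cypher query, it reports each cycle once rather than once per starting account, and it does not sum amounts.

```python
def find_cycles(edges, min_len=3, max_len=4):
    """Return simple directed cycles whose edge count is within [min_len, max_len]."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
    cycles = set()

    def walk(start, current, path):
        for nxt in adjacency.get(current, ()):
            if nxt == start and len(path) >= min_len:
                # Rotate so the smallest id comes first; this removes duplicates.
                pivot = path.index(min(path))
                cycles.add(tuple(path[pivot:] + path[:pivot]))
            elif nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for node in adjacency:
        walk(node, node, [node])
    return sorted(cycles)

# Hypothetical transfers: one 3-hop loop, one 4-hop loop, and a 2-hop back-and-forth.
flows = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "A"), ("B", "A")]
cycles = find_cycles(flows)
print(cycles)
```

The two-hop exchange between `A` and `B` is ignored because it is shorter than the minimum length, mirroring the `*3..4` bound in the query.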

In [None]:
# Create graph-based features for ML
def create_ml_features():
    """Create graph-based features for machine learning"""
    
    # Extract features for each account
    features_query = """
    MATCH (c:Customer)-[:OWNS]->(a:Account)
    // A single OPTIONAL MATCH avoids the row multiplication that stacked
    // OPTIONAL MATCH clauses cause, which would inflate counts and sums
    OPTIONAL MATCH (a)-[t:TRANSFERS]-(other:Account)
    
    WITH a, c,
         count(DISTINCT other) as connected_accounts,
         count(t) as total_transactions,
         sum(CASE WHEN startNode(t) = a THEN 1 ELSE 0 END) as outgoing_transactions,
         sum(CASE WHEN endNode(t) = a THEN 1 ELSE 0 END) as incoming_transactions,
         sum(t.amount) as total_volume,
         sum(CASE WHEN t.is_suspicious THEN 1 ELSE 0 END) as suspicious_count,
         avg(t.amount) as avg_transaction_amount,
         max(t.amount) as max_transaction_amount
    
    RETURN a.id as account_id,
           c.age as customer_age,
           c.risk_score as customer_risk_score,
           c.is_fraudster as is_fraudster,
           a.balance as account_balance,
           connected_accounts,
           total_transactions,
           outgoing_transactions,
           incoming_transactions,
           coalesce(total_volume, 0) as total_volume,
           coalesce(suspicious_count, 0) as suspicious_count,
           coalesce(avg_transaction_amount, 0) as avg_transaction_amount,
           coalesce(max_transaction_amount, 0) as max_transaction_amount
    ORDER BY account_id
    """
    
    return run_query(features_query)

# Create feature dataset
ml_features = create_ml_features()
features_df = pd.DataFrame(ml_features)

print("ML Feature Dataset Created:")
print(f"Shape: {features_df.shape}")
print("\nFeature columns:")
for col in features_df.columns:
    print(f"- {col}")

print("\nSample features:")
print(features_df.head())

In [None]:
# Train fraud detection model using graph features
def train_fraud_model(features_df):
    """Train a fraud detection model using graph-based features"""
    
    # Prepare features and target
    feature_columns = [
        'customer_age', 'customer_risk_score', 'account_balance',
        'connected_accounts', 'total_transactions', 'outgoing_transactions',
        'incoming_transactions', 'total_volume', 'suspicious_count',
        'avg_transaction_amount', 'max_transaction_amount'
    ]
    
    X = features_df[feature_columns].fillna(0)
    y = features_df['is_fraudster']
    
    # Scale features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    # Train model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_scaled, y)
    
    # Predictions on the training data (this demo has no hold-out set)
    y_pred = model.predict(X_scaled)
    
    # Feature importance
    feature_importance = pd.DataFrame({
        'feature': feature_columns,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)
    
    return model, scaler, feature_importance, y_pred

# Train the model
model, scaler, feature_importance, predictions = train_fraud_model(features_df)

print("Fraud Detection Model Training Results:")
print("\nTop 5 Most Important Features:")
for _, row in feature_importance.head().iterrows():
    print(f"- {row['feature']}: {row['importance']:.4f}")

# Model performance, measured on the same rows used for training, so it overstates real-world accuracy
print("\nModel Performance (training data):")
print(classification_report(features_df['is_fraudster'], predictions))
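
Because the generator marks only about 5% of customers as fraudsters, accuracy alone says little about a fraud model. The sketch below uses hypothetical labels at that rate to show that a classifier which never flags anyone still reaches 95% accuracy while catching no fraud, which is why the per-class precision and recall in the report matter more than the headline number.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 5 fraudsters among 100 customers.
y_true = [True] * 5 + [False] * 95
always_negative = [False] * 100  # a "model" that never predicts fraud

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
precision, recall = precision_recall(y_true, always_negative)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```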

## Hands-on Exercise: Complete Financial Network Analysis

In [None]:
# Comprehensive network analysis dashboard
def create_network_dashboard():
    """Create a comprehensive analysis dashboard"""
    
    # Overall network statistics
    network_overview = run_query("""
    // Aggregate each entity type separately so that joins do not inflate the counts
    MATCH (c:Customer)
    WITH count(c) as total_customers,
         sum(CASE WHEN c.is_fraudster THEN 1 ELSE 0 END) as fraudulent_customers
    MATCH (a:Account)
    WITH total_customers, fraudulent_customers, count(a) as total_accounts
    OPTIONAL MATCH ()-[t:TRANSFERS]->()
    RETURN 
        total_customers,
        total_accounts,
        count(t) as total_transactions,
        sum(t.amount) as total_volume,
        fraudulent_customers,
        sum(CASE WHEN t.is_suspicious THEN 1 ELSE 0 END) as suspicious_transactions
    """)
    
    # Risk distribution
    risk_distribution = run_query("""
    MATCH (c:Customer)
    WITH 
        CASE 
            WHEN c.risk_score < 0.3 THEN 'Low'
            WHEN c.risk_score < 0.7 THEN 'Medium'
            ELSE 'High'
        END as risk_category,
        c.is_fraudster as is_fraudster
    RETURN risk_category, 
           count(*) as total_customers,
           sum(CASE WHEN is_fraudster THEN 1 ELSE 0 END) as fraudulent_customers
    ORDER BY CASE risk_category WHEN 'Low' THEN 0 WHEN 'Medium' THEN 1 ELSE 2 END
    """)
    
    # Transaction patterns by time
    temporal_patterns = run_query("""
    MATCH ()-[t:TRANSFERS]->()
    WITH t.timestamp.hour as hour, 
         t.amount as amount,
         t.is_suspicious as is_suspicious
    RETURN hour,
           count(*) as transaction_count,
           avg(amount) as avg_amount,
           sum(CASE WHEN is_suspicious THEN 1 ELSE 0 END) as suspicious_count
    ORDER BY hour
    """)
    
    return {
        'overview': network_overview[0],
        'risk_distribution': risk_distribution,
        'temporal_patterns': temporal_patterns
    }

dashboard_data = create_network_dashboard()

print("=== FINANCIAL NETWORK ANALYSIS DASHBOARD ===")
print("\n📊 Network Overview:")
overview = dashboard_data['overview']
print(f"• Total Customers: {overview['total_customers']:,}")
print(f"• Total Accounts: {overview['total_accounts']:,}")
print(f"• Total Transactions: {overview['total_transactions']:,}")
print(f"• Total Volume: ${overview['total_volume']:,.2f}")
print(f"• Fraudulent Customers: {overview['fraudulent_customers']} ({overview['fraudulent_customers']/overview['total_customers']:.1%})")
print(f"• Suspicious Transactions: {overview['suspicious_transactions']} ({overview['suspicious_transactions']/overview['total_transactions']:.1%})")

print("\n⚠️ Risk Distribution:")
for risk in dashboard_data['risk_distribution']:
    fraud_rate = risk['fraudulent_customers'] / risk['total_customers'] if risk['total_customers'] > 0 else 0
    print(f"• {risk['risk_category']} Risk: {risk['total_customers']} customers, {risk['fraudulent_customers']} fraudulent ({fraud_rate:.1%})")

print("\n⏰ Peak Activity Hours:")
sorted_hours = sorted(dashboard_data['temporal_patterns'], key=lambda x: x['transaction_count'], reverse=True)
for hour_data in sorted_hours[:5]:
    hour = hour_data['hour']
    count = hour_data['transaction_count']
    suspicious = hour_data['suspicious_count']
    suspicious_rate = suspicious / count if count > 0 else 0
    print(f"• Hour {hour:02d}:00: {count} transactions, {suspicious} suspicious ({suspicious_rate:.1%})")

In [None]:
# Visualize network insights
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Risk distribution
risk_data = dashboard_data['risk_distribution']
risk_categories = [r['risk_category'] for r in risk_data]
risk_counts = [r['total_customers'] for r in risk_data]
fraud_counts = [r['fraudulent_customers'] for r in risk_data]

x = np.arange(len(risk_categories))
width = 0.35

axes[0, 0].bar(x - width/2, risk_counts, width, label='Total Customers', alpha=0.7)
axes[0, 0].bar(x + width/2, fraud_counts, width, label='Fraudulent Customers', alpha=0.7)
axes[0, 0].set_xlabel('Risk Category')
axes[0, 0].set_ylabel('Number of Customers')
axes[0, 0].set_title('Customer Risk Distribution')
axes[0, 0].set_xticks(x)
axes[0, 0].set_xticklabels(risk_categories)
axes[0, 0].legend()

# 2. Feature importance
top_features = feature_importance.head(8)
axes[0, 1].barh(range(len(top_features)), top_features['importance'])
axes[0, 1].set_yticks(range(len(top_features)))
axes[0, 1].set_yticklabels(top_features['feature'])
axes[0, 1].set_xlabel('Feature Importance')
axes[0, 1].set_title('Top Features for Fraud Detection')

# 3. Temporal patterns
temporal_data = dashboard_data['temporal_patterns']
hours = [t['hour'] for t in temporal_data]
tx_counts = [t['transaction_count'] for t in temporal_data]
susp_counts = [t['suspicious_count'] for t in temporal_data]

axes[1, 0].plot(hours, tx_counts, 'b-', label='Total Transactions', marker='o')
axes[1, 0].plot(hours, susp_counts, 'r-', label='Suspicious Transactions', marker='s')
axes[1, 0].set_xlabel('Hour of Day')
axes[1, 0].set_ylabel('Transaction Count')
axes[1, 0].set_title('Transaction Patterns by Time')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# 4. Community size distribution
if 'community_df' in locals():
    community_sizes = community_df.groupby('community').size().values
    axes[1, 1].hist(community_sizes, bins=20, alpha=0.7, edgecolor='black')
    axes[1, 1].set_xlabel('Community Size')
    axes[1, 1].set_ylabel('Number of Communities')
    axes[1, 1].set_title('Community Size Distribution')
    axes[1, 1].grid(True, alpha=0.3)
else:
    axes[1, 1].text(0.5, 0.5, 'Community data\nnot available', 
                   ha='center', va='center', transform=axes[1, 1].transAxes)
    axes[1, 1].set_title('Community Analysis')

plt.tight_layout()
plt.show()

print("\n📈 Visualization Complete - Network analysis insights displayed above")

## Module Summary and Next Steps

In this module, you learned to:
- Apply centrality algorithms to identify key nodes in financial networks
- Use community detection to segment customers and accounts
- Implement fraud detection using graph-based pattern analysis
- Create machine learning features from graph structures
- Build comprehensive network analysis dashboards

### Key Takeaways
- Graph algorithms reveal hidden patterns in connected data
- Centrality measures identify influential nodes and potential risks
- Community detection helps segment networks for targeted analysis
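- Graph features can improve ML model performance when the target correlates with network structure; the fraud labels in this notebook are randomly generated, so treat the model here as a pipeline demonstration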
- Combining multiple algorithms provides comprehensive insights

### Business Applications
- **Fraud Detection**: Identify suspicious transaction patterns and account networks
- **Risk Assessment**: Score customers based on their network position and behavior
- **Customer Segmentation**: Group similar customers for targeted marketing
- **Compliance**: Monitor for regulatory violations and suspicious activities

### Next Module
Module 5: Retrievers - Learn how to implement retrieval patterns for RAG applications using graph databases.

In [None]:
# Cleanup (optional)
# driver.close()
print("\n🎉 Module 4: Graph Analytics completed successfully!")
print("You're now ready to move on to Module 5: Retrievers")