# 🚀 Multi-GNN AML Detection - Colab Production Ready

This notebook implements **money-laundering detection with a Multi-GNN**, following the complete optimization guide.

## 📋 Pre-Run Checklist

- [ ] Google account (to access Colab)
- [ ] Kaggle API key (`kaggle.json`) - [How to get one](https://www.kaggle.com/docs/api)
- [ ] GPU enabled in Colab (Runtime > Change runtime type > GPU)

## 🎯 Execution Order (IMPORTANT!)

1. **Cell 1**: GPU check
2. **Cell 2**: PyTorch Geometric installation
3. **Cell 3**: Imports + Kaggle Setup
4. **Cell 4**: 🧪 Kaggle configuration test
5. **Cell 5**: Data download
6. **Cell 6**: Configuration
7. **Cell 7**: Load Data
8. **Cell 8**: Feature Engineering
9. **Cell 9**: Graph Construction
10. **Cell 10**: Model Definition
11. **Cell 11**: Training Setup
12. **Cell 12**: TRAINING
13. **Cell 13**: Evaluation
14. **Cell 14**: Export

**⚠️ Run the cells in this exact order!**

In [1]:
# 🔍 SYSTEM VERIFICATION
print("🔍 SYSTEM VERIFICATION")
print("=" * 60)

# Check the Python version
import sys
print(f"🐍 Python: {sys.version}")

# Check whether we are running on Colab
try:
    import google.colab
    print("✅ Running on Google Colab")
except ImportError:
    print("❌ Not running on Google Colab")

# Check for a GPU
import torch
if torch.cuda.is_available():
    print("✅ GPU Detected:")
    print(f"   Device: {torch.cuda.get_device_name(0)}")
    print(f"   CUDA Version: {torch.version.cuda}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
else:
    print("❌ No GPU detected - This will be very slow!")

print("=" * 60)

🔍 SYSTEM VERIFICATION
🐍 Python: 3.9.23 | packaged by conda-forge | (main, Jun  4 2025, 17:49:16) [MSC v.1929 64 bit (AMD64)]
❌ Not running on Google Colab
❌ No GPU detected - This will be very slow!


## 🔧 PyTorch Geometric Installation

This cell installs the PyTorch Geometric build that matches the detected PyTorch and CUDA versions.
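
PyG publishes its compiled extension wheels on index pages named after the PyTorch version and the CUDA tag. The sketch below shows one way such a URL could be derived. The helper name `pyg_wheel_index` is hypothetical, and the `+cpu` fallback for machines without CUDA is an assumption about the index layout rather than something this notebook relies on.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build a PyG wheel index URL for a PyTorch/CUDA pair (illustrative helper)."""
    # torch.__version__ may look like "2.1.0+cu118"; keep only the base version
    # so the CUDA tag is not appended twice.
    base = torch_version.split("+")[0]
    if cuda_version:
        # "11.8" -> "cu118", "12.1" -> "cu121"
        tag = "cu" + cuda_version.replace(".", "")
    else:
        # Assumed fallback for CPU-only environments
        tag = "cpu"
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"


print(pyg_wheel_index("2.1.0+cu118", "11.8"))
```

Stripping the local version tag first matters because `torch.__version__` on Colab usually already ends in something like `+cu118`.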

In [None]:
# 🚀 PYTORCH GEOMETRIC INSTALLATION
print("🔧 PYTORCH GEOMETRIC INSTALLATION")
print("=" * 60)

import subprocess
import sys
import torch  # imported here as well so the cell still works after a runtime restart

def install_pytorch_geometric():
    """Install the PyTorch Geometric build that matches the local CUDA version."""
    try:
        # Check CUDA
        cuda_version = torch.version.cuda
        if cuda_version:
            cuda_short = cuda_version.replace(".", "")[:3]  # e.g., "118" for 11.8
            # torch.__version__ may already carry a local tag such as "+cu118";
            # drop it so the wheel URL does not end up with the tag twice.
            torch_base = torch.__version__.split("+")[0]
            print(f"📦 PyTorch: {torch_base}+cu{cuda_short}")
            print(f"🎮 CUDA: {cuda_version}")

            # Wheel index URL
            wheel_url = f"https://data.pyg.org/whl/torch-{torch_base}+cu{cuda_short}.html"
            print(f"🌐 Wheel URL: {wheel_url}")

            # Install the PyG dependencies
            print("\n📥 Installing PyG dependencies...")

            packages = [
                "torch-scatter",
                "torch-sparse",
                "torch-cluster",
                "torch-spline-conv",
                "torch-geometric"
            ]

            for package in packages:
                print(f"   📦 Installing {package}...")
                # Call pip through the kernel's own interpreter so the packages
                # land in the environment this notebook is running in.
                cmd = [sys.executable, "-m", "pip", "install", package, "-f", wheel_url]
                result = subprocess.run(cmd, capture_output=True, text=True)

                if result.returncode == 0:
                    print(f"   ✅ {package} installed successfully")
                else:
                    print(f"   ❌ Failed to install {package}")
                    print(f"      Error: {result.stderr}")
                    return False

            print("\n✅ Installation complete!")
            return True

        else:
            print("❌ CUDA not available - installing CPU version")
            cmd = [sys.executable, "-m", "pip", "install", "torch-geometric"]
            result = subprocess.run(cmd, capture_output=True, text=True)
            return result.returncode == 0

    except Exception as e:
        print(f"❌ Installation failed: {e}")
        return False

# Run the installation
success = install_pytorch_geometric()

if success:
    print("\n🧪 Testing installation...")
    try:
        import torch_geometric
        print(f"✅ PyTorch Geometric: {torch_geometric.__version__}")
        print("✅ CUDA available:", torch.cuda.is_available())
        print("✅ GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "None")
    except ImportError as e:
        print(f"❌ Import failed: {e}")
        success = False

if not success:
    print("\n💡 Troubleshooting:")
    print("   1. Restart runtime (Runtime > Restart runtime)")
    print("   2. Run this cell again")
    print("   3. Check CUDA version compatibility")

print("=" * 60)

## 📚 Imports + Kaggle Setup

This cell imports all required libraries and configures the Kaggle API.
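
The Kaggle setup boils down to placing `kaggle.json` in a configuration directory and restricting it to the owner. The sketch below illustrates that step in isolation, using a temporary directory and placeholder credentials. The helper `install_kaggle_credentials` is hypothetical and is not part of the Kaggle package; the only fact it relies on is that a token file carries a `username` and a `key` field.

```python
import json
import stat
import tempfile
from pathlib import Path


def install_kaggle_credentials(source: Path, kaggle_dir: Path) -> Path:
    """Copy a kaggle.json token into kaggle_dir with owner-only permissions."""
    creds = json.loads(source.read_text())
    # Refuse files that do not look like a Kaggle API token
    missing = {"username", "key"} - creds.keys()
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    target.write_text(json.dumps(creds))
    target.chmod(0o600)  # owner read/write only
    return target


# Demo in a throwaway directory with placeholder credentials
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "upload.json"
    src.write_text(json.dumps({"username": "demo-user", "key": "demo-key"}))
    installed = install_kaggle_credentials(src, Path(tmp) / ".kaggle")
    mode = stat.S_IMODE(installed.stat().st_mode)
    print(installed.name, oct(mode))
```

Validating the fields before copying gives a clearer error than a failed API call later on.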

In [None]:
# 📚 IMPORTING LIBRARIES
print("📚 IMPORTING LIBRARIES")
print("=" * 60)

import os
import sys
import subprocess
import json
import shutil
import pandas as pd
import numpy as np
from pathlib import Path
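import torch
import torch.nn.functional as F
import torch_geometric  # needed for the version print below
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader  # DataLoader lives in torch_geometric.loader in PyG 2.x
from torch_geometric.nn import GINConv, global_mean_pool
import torch.nn as nn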
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, classification_report, confusion_matrix
from datetime import datetime
import requests
from io import StringIO
import zipfile
from urllib.request import urlopen
import warnings
import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns
warnings.filterwarnings('ignore')

print(f"✅ PyTorch: {torch.__version__}")
print(f"✅ PyTorch Geometric: {torch_geometric.__version__}")
print(f"✅ CUDA available: {torch.cuda.is_available()}")
print(f"✅ GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")
print(f"🎯 Using device: {'cuda' if torch.cuda.is_available() else 'cpu'}")

# Select the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print("=" * 60)

# 🔑 KAGGLE API SETUP
print("🔑 KAGGLE API SETUP")
print("=" * 60)

def setup_kaggle():
    """Configure the Kaggle API."""
    try:
        # Create the .kaggle directory
        kaggle_dir = Path("/root/.kaggle")
        kaggle_dir.mkdir(parents=True, exist_ok=True)

        # Check whether kaggle.json already exists
        kaggle_file = kaggle_dir / "kaggle.json"
        if kaggle_file.exists():
            print("✅ kaggle.json already exists")
        else:
            print("📤 Please upload your kaggle.json file")
            print("   (Get it from: https://www.kaggle.com/settings/account)")

            from google.colab import files
            uploaded = files.upload()

            if 'kaggle.json' in uploaded:
                # Move the file into place
                shutil.move('kaggle.json', str(kaggle_file))

                # Restrict permissions to the owner
                kaggle_file.chmod(0o600)
                print("✅ kaggle.json uploaded and configured")
            else:
                print("❌ kaggle.json not found in upload")
                return False

        # Test the API
        result = subprocess.run(["kaggle", "competitions", "list", "--csv"],
                              capture_output=True, text=True, timeout=30)

        if result.returncode == 0:
            print("✅ Kaggle API configured!")
            print("✅ Kaggle API test call successful")
            return True
        else:
            print("❌ Kaggle API test failed")
            print(f"   Error: {result.stderr}")
            return False

    except Exception as e:
        print(f"❌ Kaggle setup failed: {e}")
        return False

# Configure Kaggle
kaggle_success = setup_kaggle()

if kaggle_success:
    print("✅ Imports and Kaggle setup complete!")
else:
    print("⚠️  Kaggle setup failed - you can still run with synthetic data")

print("=" * 60)

In [None]:
# 🧪 TESTING KAGGLE CONFIGURATION
print("🧪 TESTING KAGGLE CONFIGURATION")
print("=" * 60)

import os
from pathlib import Path

# Check whether we are on Colab
try:
    import google.colab
    print("✅ Running on Google Colab")
except ImportError:
    print("❌ NOT running on Google Colab")
    print("   This notebook was designed for Google Colab")

# Check kaggle.json
kaggle_dir = Path("/root/.kaggle")
kaggle_file = kaggle_dir / "kaggle.json"

if kaggle_file.exists():
    print("✅ kaggle.json found!")
    print(f"   Location: {kaggle_file}")

    # Check permissions
    permissions = oct(kaggle_file.stat().st_mode)[-3:]
    print(f"   Permissions: {permissions}")

    if permissions == "600":
        print("✅ Correct permissions (600)")
    else:
        print(f"⚠️  Incorrect permissions: {permissions} (should be 600)")
else:
    print("❌ kaggle.json NOT found!")
    print("   Run the Imports + Kaggle Setup cell first!")

# Test the Kaggle API
print()
print("🔍 Testing Kaggle API...")
try:
    import subprocess
    result = subprocess.run(["kaggle", "competitions", "list", "--csv"],
                          capture_output=True, text=True, timeout=10)
    if result.returncode == 0:
        print("✅ Kaggle API is working!")
        print("✅ Setup complete - you can proceed!")
    else:
        print("❌ Kaggle API failed")
        print(f"   Error: {result.stderr.strip()}")
except Exception as e:
    print(f"❌ Error while testing the API: {e}")

print("=" * 60)
print()
print("🎯 If everything shows ✅, you can proceed to the next cell!")

## 📥 Data Download

This cell downloads the AML dataset from Kaggle, or generates synthetic data if the Kaggle download fails.
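
A synthetic fallback is only useful to a graph model if accounts recur across transactions; otherwise nearly every node has a single edge. The sketch below is a deliberately small, standard-library illustration of drawing both endpoints from a fixed account pool. The function name, pool size, and fraud rate are assumptions made for this example, and the labels are random, so the data carries no real laundering signal.

```python
import random


def synthetic_transactions(n_transactions=1000, n_accounts=50, fraud_rate=0.05, seed=42):
    """Generate toy transactions whose accounts come from a fixed pool (illustrative only)."""
    rng = random.Random(seed)
    accounts = [100000 + i for i in range(n_accounts)]
    rows = []
    for _ in range(n_transactions):
        # Two distinct accounts per transaction, so there are no self-loops
        src, dst = rng.sample(accounts, 2)
        rows.append({
            "From Account": src,
            "To Account": dst,
            # Exponential amounts with a mean of about 1000
            "Amount Paid": round(rng.expovariate(1 / 1000), 2),
            # Random label: useful for smoke tests, not for measuring model quality
            "Is Laundering": int(rng.random() < fraud_rate),
        })
    return rows


rows = synthetic_transactions()
unique_accounts = {r["From Account"] for r in rows} | {r["To Account"] for r in rows}
print(len(rows), len(unique_accounts))
```

Because the labels are independent of the graph, metrics obtained on such data only confirm that the pipeline runs end to end.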

In [None]:
# 📥 DOWNLOADING IBM AML DATASET
print("📥 DOWNLOADING IBM AML DATASET")
print("=" * 60)

# Set up paths
data_dir = Path("/content/aml_data")
raw_dir = data_dir / "raw"
processed_dir = data_dir / "processed"

for dir_path in [data_dir, raw_dir, processed_dir]:
    dir_path.mkdir(parents=True, exist_ok=True)

print(f"📁 Data directory: {data_dir}")
print(f"📁 Raw data: {raw_dir}")
print(f"📁 Processed data: {processed_dir}")

def download_kaggle_dataset():
    """Download the dataset through the Kaggle API."""
    try:
        print("📥 Downloading from Kaggle: ealtman2019/ibm-transactions-for-anti-money-laundering-aml")
        print("   This may take a few minutes...")

        # Kaggle CLI command (argument list, so no shell quoting is needed)
        cmd = ["kaggle", "datasets", "download",
               "-d", "ealtman2019/ibm-transactions-for-anti-money-laundering-aml",
               "-p", str(raw_dir), "--unzip"]
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)

        if result.returncode == 0:
            print("✅ Download complete!")

            # List the downloaded files
            downloaded_files = list(raw_dir.glob("*.csv"))
            print("📋 Downloaded files:")
            for file in downloaded_files:
                size_mb = file.stat().st_size / (1024 * 1024)
                print(f"   {file.name}: {size_mb:.1f} MB")

            return True
        else:
            print("❌ Download failed")
            print(f"   Error: {result.stderr}")
            return False

    except subprocess.TimeoutExpired:
        print("⏰ Download timeout")
        return False
    except Exception as e:
        print(f"❌ Download error: {e}")
        return False

def generate_synthetic_data(sample_size=None):
    """Generate synthetic AML-style transaction data."""
    print("⚠️  Using synthetic data (Kaggle download failed)")
    print("   Generating synthetic AML transaction data...")

    np.random.seed(42)

    n_transactions = sample_size or 50000
    n_accounts = 2000

    # Draw account IDs from a fixed pool of n_accounts so accounts recur across
    # transactions; fully random IDs would leave the graph almost without structure.
    account_pool = np.arange(100000, 100000 + n_accounts)

    # Generate the transaction data
    data = {
        'Timestamp': pd.date_range('2020-01-01', periods=n_transactions, freq='1min'),
        'From Bank': np.random.randint(1, 11, n_transactions),
        'From Account': np.random.choice(account_pool, n_transactions),
        'To Bank': np.random.randint(1, 11, n_transactions),
        'To Account': np.random.choice(account_pool, n_transactions),
        'Amount Received': np.random.exponential(1000, n_transactions),
        'Receiving Currency': np.random.choice(['USD', 'EUR', 'GBP', 'JPY'], n_transactions),
        'Amount Paid': np.random.exponential(1000, n_transactions),
        'Payment Currency': np.random.choice(['USD', 'EUR', 'GBP', 'JPY'], n_transactions),
        'Payment Format': np.random.choice(['ACH', 'Wire', 'Check', 'Cash'], n_transactions),
        'Is Laundering': np.random.choice([0, 1], n_transactions, p=[0.95, 0.05])
    }

    df = pd.DataFrame(data)

    # Save the data
    data_file = raw_dir / "HI-Small_Trans.csv"
    df.to_csv(data_file, index=False)

    print(f"✅ Synthetic data generated: {len(df)} transactions")
    print(f"   File saved: {data_file}")
    print(f"   Laundering transactions: {df['Is Laundering'].sum()}")

    return True

# Try the Kaggle download first
if kaggle_success:
    download_success = download_kaggle_dataset()
else:
    download_success = False

# Fall back to synthetic data
if not download_success:
    generate_synthetic_data(sample_size=50000)

print("=" * 60)

## ⚙️ Configuration

This cell defines all hyperparameters and settings for the experiment.
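
The configuration switches between a quick test and a full run by commenting lines in and out. As an alternative sketch, the two presets could be kept as separate immutable objects. `ExperimentConfig`, `QUICK`, and `FULL` are hypothetical names introduced for this example; the values mirror the quick-test and production settings listed in the cell.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ExperimentConfig:
    """Immutable experiment settings (illustrative alternative to a plain class)."""
    sample_size: Optional[int] = 10_000  # None means "use the full dataset"
    epochs: int = 20
    gnn_type: str = "GIN"
    hidden_channels: int = 128
    num_layers: int = 3
    dropout: float = 0.3
    learning_rate: float = 1e-3


# Quick-test preset: the defaults above
QUICK = ExperimentConfig()
# Production preset: same architecture, full dataset, longer training
FULL = replace(QUICK, sample_size=None, epochs=100)

print(QUICK)
print(FULL)
```

Freezing the dataclass means a later cell cannot silently change a hyperparameter mid-run; a new preset has to be created explicitly with `replace`.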

In [None]:
# ⚙️ CONFIGURATION
print("⚙️ CONFIGURATION")
print("=" * 60)

class Config:
    """Settings for the Multi-GNN AML experiment."""

    # QUICK TEST (5-10 min total):
    SAMPLE_SIZE = 10000  # Only 10k transactions
    EPOCHS = 20          # Fewer epochs

    # PRODUCTION (30-60 min total):
    # SAMPLE_SIZE = None   # Full dataset
    # EPOCHS = 100         # Full training

    # Architecture (pick one):
    GNN_TYPE = 'GIN'         # ⭐ RECOMMENDED - best performance
    # GNN_TYPE = 'GAT'       # Interpretability
    # GNN_TYPE = 'GraphSAGE' # Scalability
    # GNN_TYPE = 'GCN'       # Fast baseline

    # Network hyperparameters
    HIDDEN_CHANNELS = 128
    NUM_LAYERS = 3
    DROPOUT = 0.3

    # Training
    LEARNING_RATE = 0.001
    WEIGHT_DECAY = 1e-4
    BATCH_SIZE = 32

    # Early stopping
    PATIENCE = 15
    MIN_DELTA = 0.001

    # Data
    TEST_SIZE = 0.2
    VAL_SIZE = 0.1
    RANDOM_STATE = 42

# Instantiate the configuration
config = Config()

print("📋 Configuration:")
print("-" * 40)
print(f"   SAMPLE_SIZE: {config.SAMPLE_SIZE}")
print(f"   TEST_SIZE: {config.TEST_SIZE}")
print(f"   VAL_SIZE: {config.VAL_SIZE}")
print(f"   GNN_TYPE: {config.GNN_TYPE}")
print(f"   HIDDEN_CHANNELS: {config.HIDDEN_CHANNELS}")
print(f"   NUM_LAYERS: {config.NUM_LAYERS}")
print(f"   DROPOUT: {config.DROPOUT}")
print(f"   EPOCHS: {config.EPOCHS}")
print(f"   LEARNING_RATE: {config.LEARNING_RATE}")
print(f"   BATCH_SIZE: {config.BATCH_SIZE}")
print(f"   PATIENCE: {config.PATIENCE}")

print("=" * 60)

## 📊 Load Data

This cell loads the transaction data and applies basic cleaning.
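
Besides cleaning, this cell derives a `pos_weight` from the class balance: the ratio of negative to positive examples. A minimal standalone sketch of that calculation follows. The function name is hypothetical, and raising an error when there are no positives is a choice made for this example.

```python
def positive_class_weight(labels):
    """Return n_negative / n_positive for a sequence of 0/1 labels."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        # With no positive examples the ratio is undefined
        raise ValueError("no positive examples: pos_weight is undefined")
    return n_neg / n_pos


labels = [0] * 95 + [1] * 5  # 5% positives, like the synthetic fallback data
print(positive_class_weight(labels))  # 95 / 5 = 19.0
```

A weight like this is commonly passed to a weighted loss so that the rare laundering class is not drowned out by legitimate transactions.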

In [None]:
# 📊 DATA LOADING & BASIC CLEANING
print("📊 DATA LOADING & BASIC CLEANING")
print("=" * 60)

# Locate the data file
data_file = raw_dir / "HI-Small_Trans.csv"

if not data_file.exists():
    print(f"❌ Data file not found: {data_file}")
    print("   Please run the download cell first!")
else:
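    # Load the data
    df = pd.read_csv(data_file)

    # The Kaggle CSV may label both account columns "Account" (pandas then reads
    # them as "Account" and "Account.1"); map them to the names used in later
    # cells. This is a no-op when the columns are already named as expected.
    df = df.rename(columns={'Account': 'From Account', 'Account.1': 'To Account'})

    print(f"📄 Loading: {data_file.name}")
    print(f"✅ Loaded {len(df):,} transactions")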

    # Show the columns
    print("📋 Columns:")
    for i, col in enumerate(df.columns, 1):
        print(f"   {i:2d}. {col}")

    print(f"📊 Data shape: {df.shape}")

    # Subsample if requested
    if config.SAMPLE_SIZE and len(df) > config.SAMPLE_SIZE:
        print(f"📊 Sampling {config.SAMPLE_SIZE:,} transactions...")
        df = df.sample(n=config.SAMPLE_SIZE, random_state=config.RANDOM_STATE)
        print(f"✅ Sampled to {len(df):,} transactions")

    # Basic cleaning
    print("🧹 Basic data cleaning...")
    original_size = len(df)

    # Drop rows with null values
    df = df.dropna()
    print(f"   Removed {original_size - len(df)} rows with null values")

    # Drop duplicate rows
    original_size = len(df)
    df = df.drop_duplicates()
    print(f"   Removed {original_size - len(df)} duplicate rows")

    # Check the target column
    target_col = 'Is Laundering'
    if target_col in df.columns:
        print(f"🎯 Target column: '{target_col}'")

        # Class distribution
        class_counts = df[target_col].value_counts().sort_index()
        print("📊 Class distribution:")
        for class_val, count in class_counts.items():
            percentage = count / len(df) * 100
            print(f"   Class {class_val}: {count:,} ({percentage:.1f}%)")

        # Compute pos_weight (negatives / positives) for the loss function
        n_pos = int(df[target_col].sum())
        pos_weight = (len(df) - n_pos) / max(n_pos, 1)  # guard against a sample with no positives
        print(f"   pos_weight: {pos_weight:.2f}")

        # Store pos_weight for later use
        config.POS_WEIGHT = pos_weight

    else:
        print(f"❌ Target column '{target_col}' not found!")
        print(f"   Available columns: {list(df.columns)}")

    # Save the cleaned data
    clean_file = processed_dir / "transactions_clean.csv"
    df.to_csv(clean_file, index=False)

    print(f"💾 Saved cleaned data to: {clean_file}")

    # Show the first rows
    print("\n🔍 First few rows:")
    print(df.head())

    # Basic statistics
    print(f"\n✅ Final dataset: {len(df):,} transactions")
    print("=" * 60)

    # Expose the cleaned DataFrame to later cells
    # (top-level names in a notebook cell are already global)
    df_clean = df

## 🔧 Feature Engineering

This cell builds advanced features for AML detection, including network features computed with NetworkX.
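
Among the network features, degree centrality is the simplest to reason about: on a directed graph it is the sum of in-degree and out-degree divided by `n - 1`. The sketch below computes it from a plain edge list with only the standard library, as a way to sanity-check values on a toy graph. The function is written for illustration and returning `0.0` for graphs with fewer than two nodes is a convention chosen here.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality on a directed graph: (in-degree + out-degree) / (n - 1)."""
    unique_edges = set(edges)  # repeated transactions collapse into a single edge
    degree = defaultdict(int)
    nodes = set()
    for src, dst in unique_edges:
        nodes.update((src, dst))
        degree[src] += 1  # contributes to out-degree of src
        degree[dst] += 1  # contributes to in-degree of dst
    n = len(nodes)
    if n < 2:
        # Degenerate graph; convention chosen for this sketch
        return {node: 0.0 for node in nodes}
    return {node: degree[node] / (n - 1) for node in nodes}


# Account A pays B twice and C once
centrality = degree_centrality([("A", "B"), ("A", "C"), ("A", "B")])
print(sorted(centrality.items()))  # [('A', 1.0), ('B', 0.5), ('C', 0.5)]
```

Collapsing repeated transactions matches how a simple directed graph stores edges, so an account that pays the same counterparty many times does not look more central than it is.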

In [None]:
# 🔧 FEATURE ENGINEERING
print("🔧 FEATURE ENGINEERING")
print("=" * 60)

try:
    # Check that the data has been loaded
    if 'df_clean' not in globals():
        print("❌ Clean data not found! Run the Load Data cell first.")
    else:
        df = df_clean.copy()
        print(f"Starting with {len(df):,} transactions")

        # 1. Timestamp processing
        print("1️⃣ Processing timestamps...")
        df['Timestamp'] = pd.to_datetime(df['Timestamp'])
        df['timestamp_seconds'] = (df['Timestamp'] - df['Timestamp'].min()).dt.total_seconds()
        print("   ✅ Processed timestamp: Timestamp")

        # 2. Temporal features
        print("2️⃣ Creating temporal features...")
        df['hour'] = df['Timestamp'].dt.hour
        df['day_of_week'] = df['Timestamp'].dt.dayofweek
        df['month'] = df['Timestamp'].dt.month
        print("   ✅ Created temporal features: hour, day_of_week, month")

        # 3. Transaction features
        print("3️⃣ Creating transaction features...")
        df['amount_ratio'] = df['Amount Received'] / (df['Amount Paid'] + 1e-6)
        df['amount_diff'] = abs(df['Amount Received'] - df['Amount Paid'])
        df['amount_log'] = np.log1p(df['Amount Paid'])
        print("   ✅ Created transaction features: amount_ratio, amount_diff, amount_log")

        # 4. Categorical encoding
        print("4️⃣ Encoding categorical variables...")
        categorical_cols = ['Receiving Currency', 'Payment Currency', 'Payment Format']
        label_encoders = {}

        for col in categorical_cols:
            if col in df.columns:
                le = LabelEncoder()
                df[f'{col}_encoded'] = le.fit_transform(df[col])
                label_encoders[col] = le
                print(f"   ✅ Encoded {col} -> {col}_encoded")

        # 5. Per-account frequency features
        print("5️⃣ Creating account frequency features...")
        # Sort chronologically within each account first: cumcount(), diff() and the
        # rolling windows below all depend on row order.
        df = df.sort_values(['From Account', 'Timestamp'])

        # Hourly frequency per source account
        df['freq_hour'] = df.groupby(['From Account', 'hour']).cumcount()

        # Daily frequency per source account
        df['freq_day'] = df.groupby(['From Account', df['Timestamp'].dt.date]).cumcount()

        # Total number of transactions per account
        from_freq = df['From Account'].value_counts()
        to_freq = df['To Account'].value_counts()

        df['from_account_degree'] = df['From Account'].map(from_freq)
        df['to_account_degree'] = df['To Account'].map(to_freq)
        print("   ✅ Created account frequency features")

        # 6. Time-sequence features
        print("6️⃣ Creating temporal sequence features...")
        # Time difference between consecutive transactions of the same account
        # (df is already sorted by account and timestamp)
        df['time_diff'] = df.groupby('From Account')['timestamp_seconds'].diff().fillna(0)
        df['time_diff_log'] = np.log1p(df['time_diff'])
        print("   ✅ Created temporal sequence features")

        # 7. Behavioral features (rolling statistics)
        print("7️⃣ Creating behavioral features...")
        # Rolling mean of amounts per account
        df['rolling_mean_amount'] = df.groupby('From Account')['Amount Paid'].rolling(5, min_periods=1).mean().reset_index(0, drop=True)

        # Rolling standard deviation
        df['rolling_std_amount'] = df.groupby('From Account')['Amount Paid'].rolling(5, min_periods=1).std().reset_index(0, drop=True).fillna(0)

        # Rolling mean of frequency
        df['rolling_mean_freq'] = df.groupby('From Account')['freq_hour'].rolling(5, min_periods=1).mean().reset_index(0, drop=True)
        print("   ✅ Created behavioral features")

        # 8. Network features with NetworkX
        print("8️⃣ Creating network features with NetworkX...")
        try:
            # Build a directed graph
            G = nx.DiGraph()

            # Add nodes (unique accounts)
            all_accounts = set(df['From Account'].unique()) | set(df['To Account'].unique())
            G.add_nodes_from(all_accounts)

            # Add edges (transactions)
            edges = list(zip(df['From Account'], df['To Account']))
            G.add_edges_from(edges)

            print(f"   Network: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

            # PageRank
            print("   Computing PageRank...")
            pagerank = nx.pagerank(G, alpha=0.85)
            df['pagerank_from'] = df['From Account'].map(pagerank).fillna(0)
            df['pagerank_to'] = df['To Account'].map(pagerank).fillna(0)

            # Betweenness centrality (restricted to a subset for performance)
            print("   Computing Betweenness Centrality...")
            if G.number_of_nodes() > 1000:
                # For large graphs, use only the first 1000 nodes as sources and targets
                sample_nodes = list(G.nodes())[:1000]
                betweenness = nx.betweenness_centrality_subset(G, sources=sample_nodes, targets=sample_nodes)
            else:
                betweenness = nx.betweenness_centrality(G)

            df['betweenness_from'] = df['From Account'].map(betweenness).fillna(0)
            df['betweenness_to'] = df['To Account'].map(betweenness).fillna(0)

            # Clustering coefficient
            print("   Computing Clustering Coefficient...")
            clustering = nx.clustering(G.to_undirected())
            df['clustering_from'] = df['From Account'].map(clustering).fillna(0)
            df['clustering_to'] = df['To Account'].map(clustering).fillna(0)

            # Degree centrality
            print("   Computing Degree Centrality...")
            degree_centrality = nx.degree_centrality(G)
            df['degree_centrality_from'] = df['From Account'].map(degree_centrality).fillna(0)
            df['degree_centrality_to'] = df['To Account'].map(degree_centrality).fillna(0)

            print("   ✅ Network features computed successfully")

        except Exception as e:
            print(f"   ⚠️  NetworkX features failed: {e}")
            print("   Continuing without network features...")
            # Fall back to basic network features
            df['pagerank_from'] = 0.0
            df['pagerank_to'] = 0.0
            df['betweenness_from'] = 0.0
            df['betweenness_to'] = 0.0
            df['clustering_from'] = 0.0
            df['clustering_to'] = 0.0
            df['degree_centrality_from'] = df['from_account_degree'] / df['from_account_degree'].max()
            df['degree_centrality_to'] = df['to_account_degree'] / df['to_account_degree'].max()

        # 9. Normalization
        print("9️⃣ Normalizing numerical features...")
        numeric_cols = [
            'Amount Received', 'Amount Paid', 'timestamp_seconds', 'amount_ratio',
            'amount_diff', 'amount_log', 'time_diff', 'time_diff_log', 'freq_hour', 'freq_day',
            'from_account_degree', 'to_account_degree', 'rolling_mean_amount', 'rolling_std_amount',
            'rolling_mean_freq', 'pagerank_from', 'pagerank_to', 'betweenness_from', 'betweenness_to',
            'clustering_from', 'clustering_to', 'degree_centrality_from', 'degree_centrality_to'
        ]

        # Keep only the columns that exist
        numeric_cols = [col for col in numeric_cols if col in df.columns]

        scaler = StandardScaler()
        df[numeric_cols] = scaler.fit_transform(df[numeric_cols])
        print(f"   ✅ Normalized {len(numeric_cols)} numerical features")

        # Save the processed data
        processed_file = processed_dir / "transactions_processed.csv"
        df.to_csv(processed_file, index=False)

        print("✅ Feature engineering complete!")
        print(f"   Total features: {len(df.columns)}")
        print(f"   💾 Saved processed data to: {processed_file}")

        # Final statistics
        print(f"\n📊 Final dataset: {len(df):,} transactions, {len(df.columns)} features")

        # Expose objects to later cells
        # (top-level names in a notebook cell are already global)
        df_processed = df
        feature_scaler = scaler
        categorical_encoders = label_encoders

        print("=" * 60)

except Exception as e:
    print(f"❌ Feature engineering failed: {e}")
    print("=" * 60)

## 🏗️ Graph Construction

This cell builds the graph for the Multi-GNN using PyTorch Geometric.
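
The core of graph construction is mapping arbitrary account IDs to contiguous node indices and expressing each transaction as a `(source, target)` index pair. The sketch below shows that idea with plain Python lists. `build_edge_index` is a hypothetical helper, and its first-seen ordering of accounts may differ from the ordering the notebook cell produces.

```python
def build_edge_index(transactions):
    """Map account IDs to node indices and return (mapping, [sources, targets])."""
    account_to_node = {}
    sources, targets = [], []
    for src, dst in transactions:
        for account in (src, dst):
            # The first time an account is seen it receives the next free index
            account_to_node.setdefault(account, len(account_to_node))
        sources.append(account_to_node[src])
        targets.append(account_to_node[dst])
    return account_to_node, [sources, targets]


mapping, edge_index = build_edge_index([(900, 17), (17, 42), (900, 42)])
print(mapping)     # {900: 0, 17: 1, 42: 2}
print(edge_index)  # [[0, 1, 0], [1, 2, 2]]
```

The two parallel lists have the same layout as a PyG `edge_index` tensor of shape `[2, num_edges]`: row 0 holds source nodes and row 1 holds target nodes.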

In [None]:
# 🏗️ GRAPH CONSTRUCTION
print("🏗️ GRAPH CONSTRUCTION")
print("=" * 60)

try:
    if 'df_processed' not in globals():
        print("❌ Processed data not found! Run Feature Engineering cell first.")
    else:
        df = df_processed
        print(f"Building graph from {len(df):,} transactions")

        # 1. Build the node (account) mapping
        print("1️⃣ Creating account mapping...")
        all_accounts = pd.concat([df['From Account'], df['To Account']]).unique()
        account_to_node = {acc: i for i, acc in enumerate(all_accounts)}
        print(f"   ✅ Mapped {len(account_to_node):,} unique accounts to nodes")

        # 2. Build directed edges
        print("2️⃣ Building directed edges...")
        edges_from = df['From Account'].map(account_to_node).values
        edges_to = df['To Account'].map(account_to_node).values
        # Stack into one array first; building a tensor from a list of arrays is slow
        edge_index = torch.tensor(np.vstack([edges_from, edges_to]), dtype=torch.long)
        print(f"   ✅ Created {edge_index.shape[1]:,} directed edges")

        # 3. Node (account) features
        print("3️⃣ Creating node features...")
        # One groupby per direction replaces the per-account DataFrame filter.
        # 'Is Laundering' is intentionally left out: averaging the target into the
        # node features would leak the labels the model is trained to predict.
        def account_stats(key, side):
            return df.groupby(key).agg(
                amount_paid_mean=('Amount Paid', 'mean'),          # Mean paid volume
                amount_received_mean=('Amount Received', 'mean'),  # Mean received volume
                n_transactions=('Amount Paid', 'count'),           # Number of transactions
                time_diff_mean=('time_diff', 'mean'),              # Mean time between transactions
                pagerank=(f'pagerank_{side}', 'first'),
                betweenness=(f'betweenness_{side}', 'first'),
                clustering=(f'clustering_{side}', 'first'),
                degree_centrality=(f'degree_centrality_{side}', 'first'),
            )

        sent = account_stats('From Account', 'from')
        received = account_stats('To Account', 'to')
        # Prefer statistics from sent transactions; accounts that only receive fall
        # back to their received transactions (using their own *_to network scores).
        node_df = sent.combine_first(received)[sent.columns].reindex(all_accounts).fillna(0)

        x = torch.tensor(node_df.to_numpy(dtype='float32'), dtype=torch.float)
        print(f"   ✅ Node features shape: {x.shape}")

        # 4. Edge (transaction) labels
        y = torch.tensor(df['Is Laundering'].values, dtype=torch.long)
        print(f"   ✅ Edge labels shape: {y.shape}")
        print(f"   Positive edges: {y.sum().item():,} ({y.sum().item()/len(y)*100:.2f}%)")

        # 5. Edge (transaction) features
        print("5️⃣ Creating edge features...")
        # Column selection replaces the per-row iterrows() loop: same values and
        # same column order, but far faster on large frames.
        edge_cols = [
            'Amount Paid', 'Amount Received', 'amount_ratio', 'amount_diff', 'amount_log',
            'time_diff', 'time_diff_log', 'freq_hour', 'freq_day',
            'from_account_degree', 'to_account_degree',
            'rolling_mean_amount', 'rolling_std_amount', 'rolling_mean_freq',
            'pagerank_from', 'pagerank_to', 'betweenness_from', 'betweenness_to',
            'clustering_from', 'clustering_to', 'degree_centrality_from', 'degree_centrality_to',
        ]
        edge_df = df[edge_cols].copy()

        # Encoded categorical features (0 when a column is missing)
        for col in ['Receiving Currency_encoded', 'Payment Currency_encoded', 'Payment Format_encoded']:
            edge_df[col] = df[col] if col in df.columns else 0

        # Temporal features, scaled by their maximum value
        edge_df['hour'] = df['hour'] / 23.0
        edge_df['day_of_week'] = df['day_of_week'] / 6.0
        edge_df['month'] = df['month'] / 12.0

        edge_attr = torch.tensor(edge_df.to_numpy(dtype='float32'), dtype=torch.float)
        print(f"   ✅ Edge features shape: {edge_attr.shape}")

        # 6. Build the PyTorch Geometric Data object
        print("6️⃣ Building PyG Data object...")
        graph_data = Data(x=x, edge_index=edge_index, edge_attr=edge_attr, y=y)

        print("\n📊 Graph Statistics:")
        print(f"   Nodes: {graph_data.num_nodes:,}")
        print(f"   Edges: {graph_data.num_edges:,}")
        print(f"   Node features: {graph_data.x.shape[1]}")
        print(f"   Edge features: {graph_data.edge_attr.shape[1]}")
        print(f"   Positive edges: {y.sum().item():,} ({y.sum().item()/len(y)*100:.2f}%)")

        # Save the graph
        models_dir = Path("/content/models")
        models_dir.mkdir(parents=True, exist_ok=True)
        graph_file = models_dir / "graph_data.pt"
        torch.save(graph_data, graph_file)
        print(f"💾 Saved graph to: {graph_file}")

        # Save the account mapping
        account_mapping_file = models_dir / "account_mapping.json"
        with open(account_mapping_file, 'w') as f:
            # Convert keys to strings (account IDs may be large numbers)
            json.dump({str(k): v for k, v in account_to_node.items()}, f)
        print(f"💾 Saved account mapping to: {account_mapping_file}")

        print("✅ Graph construction complete!")
        print("=" * 60)

        # Expose objects to later cells
        # (top-level names in a notebook cell are already global)
        pyg_graph_data = graph_data
        node_mapping = account_to_node

except Exception as e:
    print(f"❌ Graph construction failed: {e}")
    print("=" * 60)

## 🤖 Model Definition

This cell defines the GNN architecture used for edge classification.
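
To make the edge-classification setup concrete, here is a minimal sketch of how the classifier input for each transaction can be assembled: the source-account embedding, the destination-account embedding and the encoded edge features are concatenated into one vector. It uses plain Python lists instead of tensors; the helper `build_edge_inputs` and the toy values are illustrative assumptions, not part of the notebook.

```python
def build_edge_inputs(node_emb, edge_index, edge_emb):
    """Concatenate [source node | destination node | edge] vectors for every edge.

    node_emb:   list of per-node embedding vectors
    edge_index: pair of lists (sources, destinations), one entry per edge
    edge_emb:   list of per-edge feature vectors
    """
    sources, destinations = edge_index
    return [
        node_emb[s] + node_emb[d] + e  # list concatenation, not element-wise addition
        for s, d, e in zip(sources, destinations, edge_emb)
    ]


# Toy graph (hypothetical values): 3 accounts with 2-dim embeddings,
# 2 transactions with 1-dim edge features
node_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
edge_index = ([0, 2], [1, 0])  # edges 0 -> 1 and 2 -> 0
edge_emb = [[9.0], [7.0]]

edge_inputs = build_edge_inputs(node_emb, edge_index, edge_emb)
for row in edge_inputs:
    print(row)
```

Each row has length `2 * node_dim + edge_dim`, which corresponds to the `hidden_channels * 2 + hidden_channels` input size of the edge classifier in the model cell.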

In [None]:
# 🤖 MODEL ARCHITECTURE
print("🤖 MODEL ARCHITECTURE")
print("=" * 60)

class EdgeGINModel(torch.nn.Module):
    """GIN model for edge (transaction) classification in AML."""

    def __init__(self, num_node_features, num_edge_features, hidden_channels=128, num_classes=2):
        super(EdgeGINModel, self).__init__()

        self.num_node_features = num_node_features
        self.num_edge_features = num_edge_features
        self.hidden_channels = hidden_channels

        # Edge encoder
        self.edge_encoder = torch.nn.Sequential(
            torch.nn.Linear(num_edge_features, hidden_channels),
            torch.nn.ReLU(),
            torch.nn.Dropout(config.DROPOUT),
            torch.nn.Linear(hidden_channels, hidden_channels),
            torch.nn.ReLU(),
            torch.nn.Dropout(config.DROPOUT)
        )

        # GIN layers for nodes
        self.conv1 = GINConv(
            torch.nn.Sequential(
                torch.nn.Linear(num_node_features, hidden_channels),
                torch.nn.ReLU(),
                torch.nn.Linear(hidden_channels, hidden_channels)
            )
        )
        self.bn1 = torch.nn.BatchNorm1d(hidden_channels)

        self.conv2 = GINConv(
            torch.nn.Sequential(
                torch.nn.Linear(hidden_channels, hidden_channels),
                torch.nn.ReLU(),
                torch.nn.Linear(hidden_channels, hidden_channels)
            )
        )
        self.bn2 = torch.nn.BatchNorm1d(hidden_channels)

        if config.NUM_LAYERS >= 3:
            self.conv3 = GINConv(
                torch.nn.Sequential(
                    torch.nn.Linear(hidden_channels, hidden_channels),
                    torch.nn.ReLU(),
                    torch.nn.Linear(hidden_channels, hidden_channels)
                )
            )
            self.bn3 = torch.nn.BatchNorm1d(hidden_channels)

        # Edge classifier: input is [source node | destination node | edge] embeddings
        self.edge_classifier = torch.nn.Sequential(
            torch.nn.Linear(hidden_channels * 2 + hidden_channels, hidden_channels),
            torch.nn.ReLU(),
            torch.nn.Dropout(config.DROPOUT),
            torch.nn.Linear(hidden_channels, hidden_channels // 2),
            torch.nn.ReLU(),
            torch.nn.Dropout(config.DROPOUT),
            torch.nn.Linear(hidden_channels // 2, num_classes)
        )

    def forward(self, x, edge_index, edge_attr):
        # Encode edge features
        edge_features = self.edge_encoder(edge_attr)

        # GIN layers
        x = self.conv1(x, edge_index)
        x = self.bn1(x)
        x = F.relu(x)
        x = F.dropout(x, p=config.DROPOUT, training=self.training)

        x = self.conv2(x, edge_index)
        x = self.bn2(x)
        x = F.relu(x)
        x = F.dropout(x, p=config.DROPOUT, training=self.training)

        if config.NUM_LAYERS >= 3:
            x = self.conv3(x, edge_index)
            x = self.bn3(x)
            x = F.relu(x)
            x = F.dropout(x, p=config.DROPOUT, training=self.training)

        # For each edge, concatenate source/destination node embeddings with the edge features
        row, col = edge_index
        edge_embeddings = torch.cat([x[row], x[col], edge_features], dim=1)

        # Classify edges
        out = self.edge_classifier(edge_embeddings)
        return out

# Instantiate the model
try:
    if 'pyg_graph_data' not in globals():
        print("❌ Graph data not found! Run Graph Construction cell first.")
    else:
        model = EdgeGINModel(
            num_node_features=pyg_graph_data.x.shape[1],
            num_edge_features=pyg_graph_data.edge_attr.shape[1],
            hidden_channels=config.HIDDEN_CHANNELS,
            num_classes=1  # one logit per edge, as BCEWithLogitsLoss expects
        ).to(device)

        # Count parameters
        total_params = sum(p.numel() for p in model.parameters())
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

        print("✅ Model created successfully!")
        print(f"   Architecture: {config.GNN_TYPE}")
        print(f"   Parameters: {total_params:,} (trainable: {trainable_params:,})")
        print(f"   Node features: {pyg_graph_data.x.shape[1]}")
        print(f"   Edge features: {pyg_graph_data.edge_attr.shape[1]}")
        print(f"   Hidden channels: {config.HIDDEN_CHANNELS}")
        print(f"   Layers: {config.NUM_LAYERS}")
        print(f"   Dropout: {config.DROPOUT}")

        print("\n📋 Model Architecture:")
        print(model)

        # Expose the model to later cells
        global gnn_model
        gnn_model = model

        print("=" * 60)

except Exception as e:
    print(f"❌ Model creation failed: {e}")
    print("=" * 60)

## ⚙️ Training Setup

This cell sets up the optimizer, loss function, learning-rate scheduler, and early stopping.
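
The early-stopping rule can be illustrated on its own. The sketch below is a standalone version of the same idea, assuming a score where higher is better (such as validation F1); the score sequence is made up for the example.

```python
class EarlyStopping:
    """Stop once the score has failed to improve by min_delta for `patience` checks in a row."""

    def __init__(self, patience=3, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best_score = None
        self.early_stop = False

    def __call__(self, score):
        if self.best_score is None or score >= self.best_score + self.min_delta:
            # Real improvement: remember it and reset the counter
            self.best_score = score
            self.counter = 0
        else:
            # No (or too small an) improvement
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True


# Hypothetical validation F1 per epoch
val_f1_per_epoch = [0.50, 0.60, 0.6005, 0.59, 0.60, 0.61]

stopper = EarlyStopping(patience=3, min_delta=0.001)
stopped_at = None
for epoch, score in enumerate(val_f1_per_epoch, start=1):
    stopper(score)
    if stopper.early_stop:
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best score {stopper.best_score}")
```

Note that the gain from 0.60 to 0.6005 is smaller than `min_delta`, so it counts as a non-improving epoch, and training stops before the later 0.61 is ever seen. A larger `patience` trades longer training for a lower chance of stopping too early.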

In [None]:
# ⚙️ TRAINING SETUP
print("⚙️ TRAINING SETUP")
print("=" * 60)

try:
    if 'gnn_model' not in globals():
        print("❌ Model not found! Run Model Definition cell first.")
    else:
        # Split the edges into train/val/test sets
        print("📊 Data split...")

        # A stratified split keeps the class ratio in every subset
        from sklearn.model_selection import train_test_split

        # Edge indices and labels (labels moved to the CPU for scikit-learn)
        edge_indices = np.arange(len(pyg_graph_data.y))
        edge_labels = pyg_graph_data.y.cpu().numpy()

        # First split off the test set
        train_val_idx, test_idx = train_test_split(
            edge_indices,
            test_size=config.TEST_SIZE,
            stratify=edge_labels,
            random_state=config.RANDOM_STATE
        )

        # Then split the remainder; VAL_SIZE is rescaled because it is a fraction of the full dataset
        train_idx, val_idx = train_test_split(
            train_val_idx,
            test_size=config.VAL_SIZE / (1 - config.TEST_SIZE),
            stratify=edge_labels[train_val_idx],
            random_state=config.RANDOM_STATE
        )

        print(f"   Train: {len(train_idx):,} edges ({len(train_idx)/len(edge_indices)*100:.1f}%)")
        print(f"   Val:   {len(val_idx):,} edges ({len(val_idx)/len(edge_indices)*100:.1f}%)")
        print(f"   Test:  {len(test_idx):,} edges ({len(test_idx)/len(edge_indices)*100:.1f}%)")

        # Optimizer
        optimizer = torch.optim.AdamW(
            gnn_model.parameters(),
            lr=config.LEARNING_RATE,
            weight_decay=config.WEIGHT_DECAY
        )
        print(f"✅ Optimizer: AdamW (lr={config.LEARNING_RATE}, wd={config.WEIGHT_DECAY})")

        # Loss function weighted for the imbalanced positive class
        pos_weight = torch.tensor([config.POS_WEIGHT], dtype=torch.float).to(device)
        criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
        print(f"✅ Loss: BCEWithLogitsLoss (pos_weight={config.POS_WEIGHT:.2f})")

        # Scheduler: halve the LR when the monitored validation score stops improving
        # (the current LR is logged every epoch, so the deprecated `verbose` flag is omitted)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
            optimizer, mode='max', factor=0.5, patience=10,
            min_lr=1e-6
        )
        print("✅ Scheduler: ReduceLROnPlateau")

        # Early stopping
        class EarlyStopping:
            def __init__(self, patience=15, min_delta=0.001):
                self.patience = patience
                self.min_delta = min_delta
                self.counter = 0
                self.best_score = None
                self.early_stop = False

            def __call__(self, val_score):
                if self.best_score is None:
                    self.best_score = val_score
                elif val_score < self.best_score + self.min_delta:
                    self.counter += 1
                    if self.counter >= self.patience:
                        self.early_stop = True
                else:
                    self.best_score = val_score
                    self.counter = 0

        early_stopping = EarlyStopping(patience=config.PATIENCE, min_delta=config.MIN_DELTA)
        print(f"✅ Early stopping: patience={config.PATIENCE}, min_delta={config.MIN_DELTA}")

        # Expose objects to later cells
        global train_indices, val_indices, test_indices, gnn_optimizer, gnn_criterion, gnn_scheduler, gnn_early_stopping
        train_indices = train_idx
        val_indices = val_idx
        test_indices = test_idx
        gnn_optimizer = optimizer
        gnn_criterion = criterion
        gnn_scheduler = scheduler
        gnn_early_stopping = early_stopping

        print("✅ Training setup complete!")
        print("=" * 60)

except Exception as e:
    print(f"❌ Training setup failed: {e}")
    print("=" * 60)

## 🚀 TRAINING

This cell runs the full Multi-GNN training loop and tracks metrics for each epoch.
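
Training minimizes a binary cross-entropy on logits in which positive (laundering) edges are scaled by `pos_weight`. As a rough pure-Python sketch of that formula (the function names `softplus` and `weighted_bce_with_logits` are assumptions made for this example; the notebook itself relies on `torch.nn.BCEWithLogitsLoss`):

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce_with_logits(logits, labels, pos_weight=1.0):
    """Mean binary cross-entropy on raw logits, with positive terms scaled by pos_weight."""
    total = 0.0
    for z, y in zip(logits, labels):
        log_p = -softplus(-z)      # log(sigmoid(z))
        log_not_p = -softplus(z)   # log(1 - sigmoid(z))
        total += -(pos_weight * y * log_p + (1 - y) * log_not_p)
    return total / len(logits)


# One positive and one negative edge, both scored with an uninformative logit of 0
logits = [0.0, 0.0]
labels = [1, 0]

plain = weighted_bce_with_logits(logits, labels, pos_weight=1.0)
weighted = weighted_bce_with_logits(logits, labels, pos_weight=10.0)

print(f"unweighted loss:      {plain:.4f}")
print(f"loss, pos_weight=10:  {weighted:.4f}")
```

With `pos_weight=10`, the loss term of every positive edge is multiplied by ten, so errors on the rare laundering class contribute far more to the gradient than errors on legitimate transactions.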

In [None]:
# 🚀 TRAINING
print("🚀 TRAINING")
print("=" * 60)

def train_epoch(model, optimizer, criterion, data, train_idx):
    """Run one full-graph training step and return the training loss."""
    model.train()
    optimizer.zero_grad()

    # Forward pass over the whole graph (message passing uses every edge)
    out = model(data.x, data.edge_index, data.edge_attr)

    # Loss is computed on the training edges only
    loss = criterion(out[train_idx], data.y[train_idx].float().unsqueeze(1))
    loss.backward()
    optimizer.step()

    return loss.item()

def evaluate(model, data, idx):
    """Evaluate the model on the edges in idx and return (accuracy, F1, ROC-AUC)."""
    model.eval()
    with torch.no_grad():
        out = model(data.x, data.edge_index, data.edge_attr)
        pred = torch.sigmoid(out[idx]).squeeze()
        pred_binary = (pred > 0.5).float()

        labels = data.y[idx].float()

        # Metrics
        accuracy = (pred_binary == labels).float().mean().item()

        # F1 score
        tp = ((pred_binary == 1) & (labels == 1)).sum().item()
        fp = ((pred_binary == 1) & (labels == 0)).sum().item()
        fn = ((pred_binary == 0) & (labels == 1)).sum().item()

        precision = tp / (tp + fp + 1e-6)
        recall = tp / (tp + fn + 1e-6)
        f1 = 2 * (precision * recall) / (precision + recall + 1e-6)

        # ROC-AUC (undefined when only one class is present, so fall back to 0.5)
        from sklearn.metrics import roc_auc_score
        try:
            auc = roc_auc_score(labels.cpu().numpy(), pred.cpu().numpy())
        except ValueError:
            auc = 0.5

    return accuracy, f1, auc

try:
    if 'gnn_model' not in globals():
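        print("❌ Model not found! Run Training Setup cell first.")
    else:
        print(f"🎯 Training {config.GNN_TYPE} for {config.EPOCHS} epochs...")

        # Make sure the graph is on the same device as the model (no-op if it already is)
        pyg_graph_data = pyg_graph_data.to(device)

        # Training history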
        history = {
            'epoch': [], 'train_loss': [], 'train_acc': [], 'train_f1': [], 'train_auc': [],
            'val_acc': [], 'val_f1': [], 'val_auc': [], 'lr': []
        }

        best_val_f1 = 0
        best_model_state = None

        print("Epoch | Train Loss | Train Acc | Train F1 | Val Acc | Val F1 | Val AUC | LR")
        print("-" * 80)

        for epoch in range(config.EPOCHS):
            # Train
            train_loss = train_epoch(gnn_model, gnn_optimizer, gnn_criterion, pyg_graph_data, train_indices)

            # Evaluate on the training set
            train_acc, train_f1, train_auc = evaluate(gnn_model, pyg_graph_data, train_indices)

            # Evaluate on the validation set
            val_acc, val_f1, val_auc = evaluate(gnn_model, pyg_graph_data, val_indices)

            # Current learning rate
            current_lr = gnn_optimizer.param_groups[0]['lr']

            # Keep the best weights. Clone the tensors: a shallow dict copy would still
            # point at the live parameters and be overwritten by later epochs.
            if val_f1 > best_val_f1:
                best_val_f1 = val_f1
                best_model_state = {k: v.detach().clone() for k, v in gnn_model.state_dict().items()}

            # Scheduler step
            gnn_scheduler.step(val_f1)

            # Update the early-stopping state
            gnn_early_stopping(val_f1)

            # Logging
            print(f"{epoch+1:5d} | {train_loss:10.4f} | {train_acc:9.3f} | {train_f1:8.3f} | {val_acc:7.3f} | {val_f1:6.3f} | {val_auc:7.3f} | {current_lr:.2e}")

            # Record history
            history['epoch'].append(epoch+1)
            history['train_loss'].append(train_loss)
            history['train_acc'].append(train_acc)
            history['train_f1'].append(train_f1)
            history['train_auc'].append(train_auc)
            history['val_acc'].append(val_acc)
            history['val_f1'].append(val_f1)
            history['val_auc'].append(val_auc)
            history['lr'].append(current_lr)

            # Stop if validation F1 has not improved for `patience` epochs
            if gnn_early_stopping.early_stop:
                print(f"\n⚠️  Early stopping triggered at epoch {epoch+1}")
                print(f"   Best epoch: {np.argmax(history['val_f1'])+1} (Val F1: {best_val_f1:.4f})")
                break

        # Restore the best weights
        if best_model_state is not None:
            gnn_model.load_state_dict(best_model_state)

        # Save the model
        models_dir = Path("/content/models")
        model_file = models_dir / f"{config.GNN_TYPE}_best_model.pth"
        torch.save({
            'model_state_dict': gnn_model.state_dict(),
            'config': config.__dict__,
            'history': history,
            'best_val_f1': best_val_f1
        }, model_file)

        print("\n✅ Training complete!")
        print(f"   Best validation F1: {best_val_f1:.4f}")
        print(f"   Model saved to: {model_file}")

        # Expose the training history to later cells
        global training_history
        training_history = history

        print("=" * 60)

except Exception as e:
    print(f"❌ Training failed: {e}")
    print("=" * 60)

## 📊 Evaluation

This cell evaluates the model on the test set and reports the final metrics.
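
Because laundering transactions are rare, accuracy alone can look excellent for a useless model, which is why F1 is reported alongside it. The sketch below uses made-up labels and a hypothetical helper `accuracy_and_f1` to show the effect; the small `eps` mirrors the division-by-zero guard used in the notebook's metric code.

```python
def accuracy_and_f1(predictions, labels, eps=1e-6):
    """Return (accuracy, F1) for binary predictions, with 1 as the positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    correct = sum(1 for p, y in pairs if p == y)

    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return correct / len(pairs), f1


# 100 hypothetical transactions, 2 of them laundering
labels = [1, 1] + [0] * 98

# A model that never flags anything
always_legit = [0] * 100
acc_naive, f1_naive = accuracy_and_f1(always_legit, labels)

# A model that catches one of the two cases and raises one false alarm
imperfect = [1, 0, 1] + [0] * 97
acc_model, f1_model = accuracy_and_f1(imperfect, labels)

print(f"never flags: accuracy={acc_naive:.2f}, F1={f1_naive:.2f}")
print(f"imperfect:   accuracy={acc_model:.2f}, F1={f1_model:.2f}")
```

Both models reach the same accuracy, but only F1 separates the one that finds a laundering case from the one that finds none.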

In [None]:
# 📊 EVALUATION
print("📊 EVALUATION")
print("=" * 60)

try:
    if 'gnn_model' not in globals():
        print("❌ Model not found! Run training first.")
    else:
        # Evaluate on the test set
        test_acc, test_f1, test_auc = evaluate(gnn_model, pyg_graph_data, test_indices)

        print("✅ Test Set Results:")
        print("   Loss: N/A (computed during training)")
        print(f"   Accuracy: {test_acc:.4f}")
        print(f"   F1 Score: {test_f1:.4f}")
        print(f"   ROC-AUC: {test_auc:.4f}")

        # Detailed classification report
        gnn_model.eval()
        with torch.no_grad():
            out = gnn_model(pyg_graph_data.x, pyg_graph_data.edge_index, pyg_graph_data.edge_attr)
            test_pred = torch.sigmoid(out[test_indices]).squeeze()
            test_pred_binary = (test_pred > 0.5).long().cpu().numpy()
            test_labels = pyg_graph_data.y[test_indices].cpu().numpy()

        print("📋 Classification Report:")
        print(classification_report(test_labels, test_pred_binary, target_names=['Legitimate', 'Laundering']))

        # Confusion matrix
        cm = confusion_matrix(test_labels, test_pred_binary)
        print("📊 Confusion Matrix:")
        print(f"   {cm}")

        # Save results
        results_dir = Path("/content/results")
        results_dir.mkdir(parents=True, exist_ok=True)

        # Save metrics
        metrics = {
            'model': config.GNN_TYPE,
            'dataset': 'IBM AML (Kaggle or Synthetic)',
            'num_transactions': len(pyg_graph_data.y),
            'num_nodes': pyg_graph_data.num_nodes,
            'num_edges': pyg_graph_data.num_edges,
            'test_accuracy': test_acc,
            'test_f1': test_f1,
            'test_auc': test_auc,
            'best_val_f1': best_val_f1 if 'best_val_f1' in globals() else 0,
            'training_epochs': len(training_history['epoch']) if 'training_history' in globals() else 0,
            'parameters': sum(p.numel() for p in gnn_model.parameters()),
            'timestamp': datetime.now().isoformat()
        }

        metrics_file = results_dir / f"evaluation_results_{config.GNN_TYPE}.json"
        with open(metrics_file, 'w') as f:
            json.dump(metrics, f, indent=2)

        print(f"\n💾 Saved evaluation results to: {metrics_file}")
        print("=" * 60)

except Exception as e:
    print(f"❌ Evaluation failed: {e}")
    print("=" * 60)

## 📤 Export

This cell exports the predictions for benchmarking against XGBoost.
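
The exported file has one row per transaction with the predicted probability, the true label and the thresholded prediction. A minimal sketch of that layout, assuming a 0.5 threshold and using only the standard library (the notebook itself uses pandas; `predictions_to_csv` is a hypothetical helper and the values are made up):

```python
import csv
import io


def predictions_to_csv(probs, labels, threshold=0.5):
    """Return CSV text with the columns prediction_prob, ground_truth, prediction."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["prediction_prob", "ground_truth", "prediction"])
    for prob, label in zip(probs, labels):
        # A transaction is flagged only when its probability is strictly above the threshold
        writer.writerow([prob, label, int(prob > threshold)])
    return buffer.getvalue()


csv_text = predictions_to_csv([0.91, 0.08, 0.5], [1, 0, 1])
print(csv_text)
```

A probability exactly at the threshold is not flagged, because the comparison is strict. A downstream benchmark can re-threshold `prediction_prob` itself if 0.5 turns out to be a poor operating point for such an imbalanced dataset.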

In [None]:
# 📤 EXPORTING PREDICTIONS
print("📤 EXPORTING PREDICTIONS")
print("=" * 60)

try:
    if 'gnn_model' not in globals():
        print("❌ Model not found! Run training first.")
    else:
        # Generate predictions for every transaction
        gnn_model.eval()
        with torch.no_grad():
            out = gnn_model(pyg_graph_data.x, pyg_graph_data.edge_index, pyg_graph_data.edge_attr)
            all_pred_probs = torch.sigmoid(out).squeeze().cpu().numpy()
            all_labels = pyg_graph_data.y.cpu().numpy()

        # Build the results DataFrame
        results_df = pd.DataFrame({
            'prediction_prob': all_pred_probs,
            'ground_truth': all_labels
        })

        # Add binary predictions at a 0.5 threshold
        results_df['prediction'] = (results_df['prediction_prob'] > 0.5).astype(int)

        # Save predictions
        output_file = "/content/multi_gnn_predictions.csv"
        results_df.to_csv(output_file, index=False)

        print(f"✅ Exported predictions to: {output_file}")
        print(f"   Total predictions: {len(results_df):,}")

        # Prediction statistics
        pred_dist = results_df['prediction'].value_counts().sort_index()
        print("📊 Prediction Distribution:")
        for class_val, count in pred_dist.items():
            class_name = "Laundering" if class_val == 1 else "Legitimate"
            percentage = count / len(results_df) * 100
            print(f"   {class_name}: {count:,} ({percentage:.1f}%)")

        # Statistics per split
        splits = {
            'train': train_indices,
            'val': val_indices,
            'test': test_indices
        }

        print("\n📈 Summary by Split:")
        print(f"{'Split':6} {'Count':>8} {'Avg Prob':>8} {'Predicted Pos':>13}")
        print("-" * 40)

        for split_name, indices in splits.items():
            split_preds = results_df.iloc[indices]
            count = len(split_preds)
            avg_prob = split_preds['prediction_prob'].mean()
            pos_cases = split_preds['prediction'].sum()
            print(f"{split_name:6} {count:8,} {avg_prob:8.3f} {pos_cases:13,}")

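        # Save the training history
        results_dir = Path("/content/results")
        results_dir.mkdir(parents=True, exist_ok=True)
        history_file = None  # stays None if no training history is available
        if 'training_history' in globals():
            history_file = results_dir / f"training_history_{config.GNN_TYPE}.json"
            with open(history_file, 'w') as f:
                json.dump(training_history, f)
            print(f"\n💾 Saved training history to: {history_file}")

        # Benchmark summary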
        benchmark_summary = {
            'model': config.GNN_TYPE,
            'dataset': 'IBM AML',
            'num_transactions': len(results_df),
            'num_nodes': pyg_graph_data.num_nodes,
            'num_edges': pyg_graph_data.num_edges,
            'test_accuracy': test_acc if 'test_acc' in globals() else 0,
            'test_f1': test_f1 if 'test_f1' in globals() else 0,
            'test_auc': test_auc if 'test_auc' in globals() else 0,
            'best_val_f1': best_val_f1 if 'best_val_f1' in globals() else 0,
            'training_epochs': len(training_history['epoch']) if 'training_history' in globals() else 0,
            'parameters': sum(p.numel() for p in gnn_model.parameters()),
            'timestamp': datetime.now().isoformat()
        }

        summary_file = results_dir / f"benchmark_summary_{config.GNN_TYPE}.json"
        with open(summary_file, 'w') as f:
            json.dump(benchmark_summary, f, indent=2)

        print(f"💾 Saved benchmark summary to: {summary_file}")

        print("\n🎉 PIPELINE COMPLETE!")
        print("=" * 60)
        print("\n📁 Output Files:")
        print(f"   1. Predictions: {output_file}")
        print(f"   2. Model: /content/models/{config.GNN_TYPE}_best_model.pth")
        print(f"   3. Summary: {summary_file}")
        print(f"   4. History: {history_file}")
        print("\n✅ Ready for benchmark comparison with XGBoost!")

except Exception as e:
    print(f"❌ Export failed: {e}")
    print("=" * 60)

## 📥 Download the Results

After a successful run, download the predictions file to use in the benchmark against XGBoost.

### **Option 1: Manual download (recommended)**

```python
# Run this cell to download the important files
from google.colab import files

# Download predictions (for the benchmark)
files.download('/content/multi_gnn_predictions.csv')

# Download metrics (file name assumes config.GNN_TYPE is 'GIN')
files.download('/content/results/evaluation_results_GIN.json')

# Download model (optional - large file)
# files.download('/content/models/GIN_best_model.pth')
```

### **Option 2: Save to Google Drive**

```python
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Copy results
output_dir = '/content/drive/MyDrive/AML_GNN_Results'
!mkdir -p "{output_dir}"

!cp /content/multi_gnn_predictions.csv "{output_dir}/"
!cp /content/results/evaluation_results_GIN.json "{output_dir}/"

print(f"✅ Results saved to: {output_dir}")
```