# 🥋 Karate Club Graph Neural Network - Complete Guide from Scratch

## 📚 The Ultimate Educational GNN Tutorial

### 🎯 What You'll Master:
1. **Graph Theory Fundamentals** - Nodes, edges, adjacency matrices
2. **Message Passing** - How GNNs aggregate neighbor information
3. **Graph Convolutions** - Mathematical foundations from scratch
4. **Custom GNN Implementation** - Build every component yourself
5. **Node Classification** - Predict communities in social networks
6. **Visualization & Analysis** - Deep understanding through visuals

---

## 🧠 The Karate Club Story

**The Dataset:**
- 34 members of a university karate club
- Network of friendships (78 edges)
- Club splits into 2 groups after a dispute
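- **Goal:** Can a GNN predict which community each member belongs to? (The PyTorch Geometric version of the dataset used below labels members with four communities rather than the two historical factions.)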

**Why This Matters:**
- Social network analysis
- Community detection
- Influence propagation
- Recommendation systems

---

## 📖 Learning Philosophy

**We'll build THREE implementations:**
1. **Pure NumPy** - Understand the math completely
2. **PyTorch from Scratch** - Build custom GNN layers
3. **PyTorch Geometric** - Use optimized library

This progression ensures you understand EVERY detail!

---

Let's begin! 🚀

In [None]:
# Cell 1: Import Libraries and Setup

print("🔧 Setting up Karate Club GNN Environment")
print("=" * 60)

# Core libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import FancyBboxPatch

# Graph libraries
import networkx as nx
from torch_geometric.datasets import KarateClub
from torch_geometric.utils import to_networkx

# For better plots
plt.style.use('default')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10
sns.set_palette("husl")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("✅ Libraries imported successfully!")
print(f"📊 PyTorch version: {torch.__version__}")
print(f"🔢 NumPy version: {np.__version__}")
print(f"📈 NetworkX version: {nx.__version__}")
print("🚀 Ready to learn GNNs from scratch!")

---

## 📊 Part 1: Understanding Graph Data

### What is a Graph?

A graph `G = (V, E)` consists of:
- **V**: Set of vertices (nodes)
- **E**: Set of edges (connections)

### Key Representations:

**1. Adjacency Matrix (A):**
```
A[i,j] = 1 if edge exists between node i and j
A[i,j] = 0 otherwise
```

**2. Feature Matrix (X):**
```
X[i] = feature vector for node i
Shape: (num_nodes, num_features)
```

**3. Edge Index:**
```
[[source_nodes],
 [target_nodes]]
```
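
As a minimal sketch of how these three representations relate, the snippet below builds all of them for a hypothetical 4-node toy graph (not the Karate Club data). The edge list, the variable names, and the choice of one-hot features are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy graph with undirected edges 0-1, 0-2, 2-3
undirected_edges = [(0, 1), (0, 2), (2, 3)]
num_nodes = 4

# Edge index: every undirected edge is stored in both directions
src = [u for u, v in undirected_edges] + [v for u, v in undirected_edges]
dst = [v for u, v in undirected_edges] + [u for u, v in undirected_edges]
edge_index = np.array([src, dst])      # shape (2, 6)

# Adjacency matrix filled in from the edge index
A = np.zeros((num_nodes, num_nodes), dtype=int)
A[edge_index[0], edge_index[1]] = 1

# Feature matrix: one-hot identity features, one row per node
X = np.eye(num_nodes)

print(edge_index)
print(A)
print(X.shape)
```

Because each undirected edge appears twice in the edge index, the adjacency matrix comes out symmetric and its entries sum to twice the number of undirected edges.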

Let's load and explore!

In [None]:
# Cell 2: Load Karate Club Dataset

print("📥 Loading Karate Club Dataset")
print("=" * 50)

# Load dataset using PyTorch Geometric
dataset = KarateClub()
data = dataset[0]  # Get the single graph

print(f"\n📊 Dataset Overview:")
print(f"Dataset: {dataset}")
print(f"\nGraph Data Object:")
print(data)

print(f"\n🔍 Detailed Information:")
print(f"{'='*50}")
print(f"Number of nodes: {data.num_nodes}")
print(f"Number of edges: {data.num_edges} (directed; each undirected edge is stored twice)")
print(f"Number of features per node: {data.num_node_features}")
print(f"Number of classes (communities): {dataset.num_classes}")
print(f"\nFeature matrix shape: {data.x.shape}")
print(f"Edge index shape: {data.edge_index.shape}")
print(f"Labels shape: {data.y.shape}")
print(f"\nTrain mask shape: {data.train_mask.shape}")
print(f"Number of training nodes: {data.train_mask.sum().item()}")

print(f"\n💡 Understanding the Features:")
print(f"The Karate Club dataset uses one-hot encoding")
print(f"Each node has a 34-dimensional feature vector (one per node)")
print(f"This is like an identity matrix - each node knows 'who' it is")
print(f"\nExample - Node 0 features:")
print(data.x[0][:10])  # Show first 10 features
print(f"... (24 more zeros)")

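print(f"\n🎯 Labels (Community Assignment):")
print(f"PyTorch Geometric labels each node with one of {dataset.num_classes} communities")
print(f"(obtained by modularity-based clustering, not the two historical factions)")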
print(f"\nLabel distribution:")
unique, counts = torch.unique(data.y, return_counts=True)
for label, count in zip(unique, counts):
    print(f"  Community {label}: {count} members")

print(f"\n✅ Dataset loaded and explored!")
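
One consequence of one-hot (identity) features is worth seeing in isolation: multiplying an identity feature matrix by a weight matrix simply returns the weight matrix, so the first layer acts like a learned embedding table with one row per node. The sketch below illustrates this with random weights; the shapes (34 nodes, 16 hidden units) mirror the sizes used in this tutorial, and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
num_nodes, hidden_dim = 34, 16

X = np.eye(num_nodes)                          # one-hot features, one row per node
W = rng.normal(size=(num_nodes, hidden_dim))   # stand-in for a first-layer weight matrix

XW = X @ W                                     # row i of XW is row i of W

print(XW.shape)
print(np.allclose(XW, W))
```

With such features the model has no node attributes to generalize from; everything it learns about a node comes from the graph structure and the few labeled nodes.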

In [None]:
# Cell 3: Visualize the Karate Club Network

print("🎨 Creating Karate Club Network Visualization")
print("=" * 50)

def visualize_karate_club(data, title="Karate Club Network", predictions=None, figsize=(16, 12)):
    """
    Visualize the Karate Club graph with community colors.
    
    Parameters:
    -----------
    data : PyTorch Geometric Data object
    title : str
    predictions : tensor, optional
        Predicted labels for visualization
    """
    
    # Convert to NetworkX for visualization
    G = to_networkx(data, to_undirected=True)
    
    # Create figure with subplots
    fig, axes = plt.subplots(1, 2, figsize=figsize)
    
    # Use labels or predictions
    colors_true = data.y.numpy()
    colors_pred = predictions.numpy() if predictions is not None else colors_true
    
    # One color per community label (the dataset may contain more than two)
    palette = ['#FF6B6B', '#4ECDC4', '#FFD93D', '#6C5CE7']
    
    # Calculate layout once for consistency
    pos = nx.spring_layout(G, seed=42, k=0.5, iterations=50)
    
    # Plot 1: True Labels
    ax1 = axes[0]
    node_colors_true = [palette[int(label) % len(palette)] for label in colors_true]
    
    nx.draw_networkx_nodes(G, pos, node_color=node_colors_true, 
                           node_size=800, alpha=0.9, ax=ax1,
                           edgecolors='black', linewidths=2)
    nx.draw_networkx_edges(G, pos, alpha=0.3, width=2, ax=ax1)
    nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold', 
                           font_color='white', ax=ax1)
    
    ax1.set_title('True Community Labels', fontsize=16, fontweight='bold', pad=20)
    ax1.axis('off')
    
    # Add legend with one entry per community present in the labels
    from matplotlib.lines import Line2D
    legend_elements_1 = [
        Line2D([0], [0], marker='o', color='w',
               markerfacecolor=palette[int(label) % len(palette)],
               markersize=15, label=f"Community {int(label)}")
        for label in np.unique(colors_true)
    ]
    ax1.legend(handles=legend_elements_1, loc='upper left', fontsize=12)
    
    # Plot 2: Predicted Labels or Graph Statistics
    ax2 = axes[1]
    
    if predictions is not None:
        node_colors_pred = [palette[int(label) % len(palette)] for label in colors_pred]
        
        nx.draw_networkx_nodes(G, pos, node_color=node_colors_pred, 
                               node_size=800, alpha=0.9, ax=ax2,
                               edgecolors='black', linewidths=2)
        nx.draw_networkx_edges(G, pos, alpha=0.3, width=2, ax=ax2)
        nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold', 
                               font_color='white', ax=ax2)
        
        # Calculate accuracy
        accuracy = (colors_pred == colors_true).sum() / len(colors_true) * 100
        ax2.set_title(f'GNN Predictions (Accuracy: {accuracy:.1f}%)', 
                     fontsize=16, fontweight='bold', pad=20)
        
        # Highlight incorrect predictions
        incorrect = np.where(colors_pred != colors_true)[0]
        if len(incorrect) > 0:
            incorrect_pos = {node: pos[node] for node in incorrect}
            nx.draw_networkx_nodes(G, incorrect_pos, nodelist=incorrect,
                                   node_color='yellow', node_size=1000, 
                                   alpha=0.5, ax=ax2, edgecolors='red', linewidths=4)
    else:
        # Show degree distribution
        degrees = dict(G.degree())
        node_sizes = [degrees[node] * 100 for node in G.nodes()]
        
        nx.draw_networkx_nodes(G, pos, node_color=node_colors_true, 
                               node_size=node_sizes, alpha=0.9, ax=ax2,
                               edgecolors='black', linewidths=2)
        nx.draw_networkx_edges(G, pos, alpha=0.3, width=2, ax=ax2)
        nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold', 
                               font_color='white', ax=ax2)
        
        ax2.set_title('Node Sizes by Degree', fontsize=16, fontweight='bold', pad=20)
    
    ax2.axis('off')
    ax2.legend(handles=legend_elements_1, loc='upper left', fontsize=12)
    
    plt.suptitle(title, fontsize=18, fontweight='bold', y=1.02)
    plt.tight_layout()
    plt.show()
    
    return G, pos

# Visualize the network
G, pos = visualize_karate_club(data, "Karate Club Social Network")

print("\n📊 Graph Statistics:")
print(f"Average degree: {np.mean([d for n, d in G.degree()]):.2f}")
print(f"Network density: {nx.density(G):.3f}")
print(f"Number of triangles: {sum(nx.triangles(G).values()) // 3}")
print(f"Clustering coefficient: {nx.average_clustering(G):.3f}")

print("\n✅ Visualization complete!")
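
The average degree and density printed above follow directly from the node and edge counts. As a small worked check, assuming the 34 nodes and 78 undirected edges stated at the start of this guide:

```python
num_nodes = 34
num_undirected_edges = 78

# Each undirected edge contributes to the degree of two nodes
average_degree = 2 * num_undirected_edges / num_nodes

# Density = existing edges / possible edges, with N * (N - 1) / 2 possible edges
density = 2 * num_undirected_edges / (num_nodes * (num_nodes - 1))

print(f"Average degree: {average_degree:.2f}")   # Average degree: 4.59
print(f"Network density: {density:.3f}")         # Network density: 0.139
```

A density near 0.14 means only about one in seven possible friendships exists, which is why the adjacency matrix built later is mostly zeros.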

---

## 🧮 Part 2: Graph Neural Networks - Mathematical Foundation

### The Core Idea: Message Passing

**Traditional Neural Networks:**
- Process independent samples
- No concept of "neighbors"
- Example: Image classification

**Graph Neural Networks:**
- Nodes have neighbors
- Information flows through edges
- Aggregates neighbor information

### Message Passing Framework

For each layer `l`, update node `i`:

```
1. MESSAGE:     m_ij = Message(h_i^(l), h_j^(l), e_ij)
2. AGGREGATE:   m_i  = Aggregate({m_ij : j ∈ N(i)})
3. UPDATE:      h_i^(l+1) = Update(h_i^(l), m_i)
```

Where:
- `h_i^(l)` = hidden state of node i at layer l
- `N(i)` = neighbors of node i
- `e_ij` = edge features (if any)
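
The three steps can be sketched on a tiny example. The snippet below is one illustrative instance of the framework, not the only one: it assumes the message is simply the neighbor's state, the aggregation is a mean, the update is an equal blend of old state and aggregated message, and edge features are ignored. The path graph and function name are hypothetical.

```python
import numpy as np

def message_passing_step(h, neighbors):
    """One round of message passing with mean aggregation (illustrative choices)."""
    h_next = np.zeros_like(h)
    for i, nbrs in neighbors.items():
        # MESSAGE: each neighbor j sends its current state h[j]
        messages = [h[j] for j in nbrs]
        # AGGREGATE: average the incoming messages
        m_i = np.mean(messages, axis=0)
        # UPDATE: blend the node's own state with the aggregated message
        h_next[i] = 0.5 * h[i] + 0.5 * m_i
    return h_next

# Hypothetical path graph 0 - 1 - 2 with a scalar state per node
neighbors = {0: [1], 1: [0, 2], 2: [1]}
h0 = np.array([[1.0], [0.0], [0.0]])

h1 = message_passing_step(h0, neighbors)   # node 2 is still 0: node 0 is two hops away
h2 = message_passing_step(h1, neighbors)   # node 2 now carries information from node 0

print(h1.ravel())
print(h2.ravel())
```

After one round, information has travelled one hop; after two rounds it has travelled two. This is why stacking `l` layers lets each node see its `l`-hop neighborhood.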

### Graph Convolutional Network (GCN)

**Simplified formula:**
```
H^(l+1) = σ(D^(-1/2) A D^(-1/2) H^(l) W^(l))
```

**Breaking it down:**
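- `A` = Adjacency matrix (who connects to whom); the standard GCN uses `A + I` here, adding self-loops so that each node also keeps its own features
- `D` = Degree matrix (diagonal; `D[i,i]` = number of connections of node i, computed from the same matrix used for `A`)
- `H^(l)` = Node features at layer l
- `W^(l)` = Learnable weight matrix
- `σ` = Activation function (e.g., ReLU)
- `D^(-1/2) A D^(-1/2)` = Symmetrically normalized adjacency (keeps feature magnitudes comparable between low- and high-degree nodes)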
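
To make the normalization concrete, here is a small worked example on a hypothetical 3-node path graph (0 - 1 - 2). It assumes self-loops are added before normalizing, as the implementation later in this guide does, so entry `(i, j)` becomes `1 / sqrt(deg(i) * deg(j))` wherever an edge or self-loop exists.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

A_tilde = A + np.eye(3)              # add self-loops
deg = A_tilde.sum(axis=1)            # degrees including the self-loop: [2, 3, 2]
D_inv_sqrt = np.diag(deg ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

print(np.round(A_hat, 3))
```

For example, the entry between node 0 (degree 2) and node 1 (degree 3) is `1 / sqrt(2 * 3) ≈ 0.408`, and node 1's self-loop entry is `1 / 3`. High-degree nodes therefore contribute and receive smaller individual weights.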

**Intuition:**
1. Take your features: `H^(l)`
2. Transform them: `H^(l) W^(l)`
3. Aggregate from neighbors: `A * (transformed features)`
4. Normalize by degree: `D^(-1/2) ... D^(-1/2)`
5. Apply activation: `σ(...)`
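
The steps above describe "transform, then aggregate", but matrix multiplication is associative, so aggregating first gives the same result. The sketch below checks this numerically; the 3x3 matrix is a hand-written placeholder for a normalized adjacency, and the feature and weight values are random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder normalized adjacency for a 3-node graph (values chosen by hand)
A_hat = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.5, 0.5]])
H = rng.normal(size=(3, 4))   # node features: 3 nodes, 4 features
W = rng.normal(size=(4, 2))   # weights: 4 features -> 2 features

transform_first = A_hat @ (H @ W)    # steps 2 then 3
aggregate_first = (A_hat @ H) @ W    # same result, different order

out = np.maximum(0, transform_first) # step 5: ReLU

print(np.allclose(transform_first, aggregate_first))
print(out.shape)
```

The order affects only cost: when the output dimension is smaller than the input dimension, transforming first means the aggregation multiplies a narrower matrix.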

Let's implement this from scratch!

---

In [None]:
# Cell 4: Build Adjacency Matrix from Edge Index

print("🔨 Building Graph Representations from Scratch")
print("=" * 50)

def edge_index_to_adjacency_matrix(edge_index, num_nodes):
    """
    Convert edge index to adjacency matrix.
    
    Parameters:
    -----------
    edge_index : torch.Tensor
        Shape [2, num_edges], where edge_index[0] = source, edge_index[1] = target
    num_nodes : int
        Total number of nodes
    
    Returns:
    --------
    A : numpy.ndarray
        Adjacency matrix of shape [num_nodes, num_nodes]
    """
    
    # Initialize adjacency matrix with zeros
    A = np.zeros((num_nodes, num_nodes))
    
    # Convert edge_index to numpy
    edge_index_np = edge_index.numpy()
    
    # Fill in the edges
    for i in range(edge_index.shape[1]):
        source = edge_index_np[0, i]
        target = edge_index_np[1, i]
        A[source, target] = 1
    
    return A

def compute_degree_matrix(A):
    """
    Compute degree matrix from adjacency matrix.
    
    The degree matrix D is a diagonal matrix where D[i,i] = number of neighbors of node i
    """
    # Sum along rows to get degree of each node
    degrees = np.sum(A, axis=1)
    
    # Create diagonal matrix
    D = np.diag(degrees)
    
    return D

def normalize_adjacency(A, add_self_loops=True):
    """
    Compute the symmetrically normalized adjacency matrix: D^(-1/2) * A * D^(-1/2)
    
    Scaling each edge by 1/sqrt(deg(i) * deg(j)) keeps aggregated features on a
    comparable scale for low- and high-degree nodes.
    """
    
    # Add self-loops: each node also aggregates its own features
    if add_self_loops:
        A = A + np.eye(A.shape[0])
    
    # Compute degree matrix
    D = compute_degree_matrix(A)
    
    # Compute D^(-1/2)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D) + 1e-8))  # Add epsilon to avoid division by zero
    
    # Compute normalized adjacency: D^(-1/2) * A * D^(-1/2)
    A_normalized = D_inv_sqrt @ A @ D_inv_sqrt
    
    return A_normalized

# Build adjacency matrix from Karate Club data
A = edge_index_to_adjacency_matrix(data.edge_index, data.num_nodes)
D = compute_degree_matrix(A)
A_normalized = normalize_adjacency(A, add_self_loops=True)

print(f"\n📊 Adjacency Matrix (A):")
print(f"Shape: {A.shape}")
print(f"Nonzero entries (directed edges): {np.sum(A):.0f}")
print(f"Undirected edges: {np.sum(A) / 2:.0f}")
print(f"Sparsity: {(1 - np.count_nonzero(A) / A.size) * 100:.1f}%")
print(f"\nFirst 5x5 block of A:")
print(A[:5, :5].astype(int))

print(f"\n📊 Degree Matrix (D):")
print(f"Diagonal elements (node degrees):")
print(f"Node 0: {D[0,0]:.0f} connections")
print(f"Node 33: {D[33,33]:.0f} connections")
print(f"Average degree: {np.mean(np.diag(D)):.2f}")

print(f"\n📊 Normalized Adjacency (A_norm):")
print(f"Shape: {A_normalized.shape}")
print(f"\nFirst 5x5 block of A_norm:")
print(np.round(A_normalized[:5, :5], 3))
print(f"\n💡 Notice: each entry is A[i,j] / sqrt(deg(i) * deg(j)), so all values lie between 0 and 1.")

# Visualize adjacency matrix
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Plot original adjacency matrix
im1 = axes[0].imshow(A, cmap='Blues', aspect='auto')
axes[0].set_title('Adjacency Matrix (A)', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Node ID')
axes[0].set_ylabel('Node ID')
plt.colorbar(im1, ax=axes[0])

# Plot degree distribution
degrees = np.diag(D)
axes[1].bar(range(len(degrees)), degrees, color='steelblue', alpha=0.7)
axes[1].set_title('Node Degree Distribution', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Node ID')
axes[1].set_ylabel('Degree (Number of Connections)')
axes[1].grid(True, alpha=0.3)

# Plot normalized adjacency matrix
im3 = axes[2].imshow(A_normalized, cmap='viridis', aspect='auto')
axes[2].set_title('Normalized Adjacency (A_norm)', fontsize=14, fontweight='bold')
axes[2].set_xlabel('Node ID')
axes[2].set_ylabel('Node ID')
plt.colorbar(im3, ax=axes[2])

plt.tight_layout()
plt.show()

print("\n✅ Graph representations built successfully!")
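
The explicit loop in `edge_index_to_adjacency_matrix` is easy to follow, but the same matrix can be filled in one step with NumPy fancy indexing. The function below is a hypothetical alternative, shown on a small hand-written edge index; with the tutorial's data you would pass `data.edge_index.numpy()` instead.

```python
import numpy as np

def edge_index_to_adjacency_vectorized(edge_index, num_nodes):
    """Build a dense adjacency matrix with fancy indexing instead of a Python loop."""
    A = np.zeros((num_nodes, num_nodes))
    A[edge_index[0], edge_index[1]] = 1
    return A

# Hypothetical path graph 0 - 1 - 2, each edge stored in both directions
edge_index = np.array([[0, 1, 1, 2],
                       [1, 0, 2, 1]])

A = edge_index_to_adjacency_vectorized(edge_index, num_nodes=3)
is_symmetric = np.array_equal(A, A.T)
num_undirected_edges = int(A.sum()) // 2

print(A)
print(is_symmetric, num_undirected_edges)
```

Checking symmetry is a cheap sanity test: an undirected graph whose edge index lists both directions should always produce a symmetric matrix.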

In [None]:
# Cell 5: Pure NumPy GNN - Understanding the Mathematics

print("🔬 Implementing GNN from Scratch using Pure NumPy")
print("=" * 60)

def gcn_layer_numpy(X, A_norm, W, activation='relu'):
    """
    Single Graph Convolutional Layer - Pure NumPy Implementation
    
    Formula: H = activation(A_norm @ X @ W)
    
    Parameters:
    -----------
    X : numpy.ndarray
        Node features, shape [num_nodes, input_features]
    A_norm : numpy.ndarray
        Normalized adjacency matrix, shape [num_nodes, num_nodes]
    W : numpy.ndarray
        Weight matrix, shape [input_features, output_features]
    activation : str
        Activation function ('relu', 'sigmoid', 'none')
    
    Returns:
    --------
    H : numpy.ndarray
        Output features, shape [num_nodes, output_features]
    """
    
    # Step 1: Linear transformation X @ W
    transformed = X @ W
    print(f"  Step 1 - Transformed: {transformed.shape}")
    
    # Step 2: Aggregate from neighbors A_norm @ (X @ W)
    aggregated = A_norm @ transformed
    print(f"  Step 2 - Aggregated: {aggregated.shape}")
    
    # Step 3: Apply activation
    if activation == 'relu':
        output = np.maximum(0, aggregated)
    elif activation == 'sigmoid':
        output = 1 / (1 + np.exp(-aggregated))
    else:
        output = aggregated
    
    print(f"  Step 3 - After {activation}: {output.shape}")
    
    return output

def softmax_numpy(X):
    """Compute softmax for classification."""
    exp_X = np.exp(X - np.max(X, axis=1, keepdims=True))
    return exp_X / np.sum(exp_X, axis=1, keepdims=True)

def cross_entropy_loss_numpy(predictions, labels):
    """Compute cross-entropy loss."""
    n = labels.shape[0]
    log_likelihood = -np.log(predictions[range(n), labels] + 1e-8)
    loss = np.sum(log_likelihood) / n
    return loss

# Build a 2-layer GCN from scratch
print("\n🏗️ Building 2-Layer GCN Architecture:")
print(f"Input features: {data.num_node_features}")
print(f"Hidden units: 16")
print(f"Output classes: {dataset.num_classes}")

# Initialize random weights
np.random.seed(42)
input_dim = data.num_node_features
hidden_dim = 16
output_dim = dataset.num_classes

W1 = np.random.randn(input_dim, hidden_dim) * 0.01
W2 = np.random.randn(hidden_dim, output_dim) * 0.01

print(f"\nWeight matrices:")
print(f"W1 shape: {W1.shape} - transforms {input_dim}D -> {hidden_dim}D")
print(f"W2 shape: {W2.shape} - transforms {hidden_dim}D -> {output_dim}D")

# Forward pass
print("\n🔄 Forward Pass:")
X_np = data.x.numpy()

print("\nLayer 1:")
H1 = gcn_layer_numpy(X_np, A_normalized, W1, activation='relu')

print("\nLayer 2:")
H2 = gcn_layer_numpy(H1, A_normalized, W2, activation='none')

print("\nOutput (class probabilities after softmax):")
predictions_np = softmax_numpy(H2)
print(f"Predictions shape: {predictions_np.shape}")
print(f"Sum of probabilities per node: {predictions_np[0].sum():.4f}")

# Compute loss
labels_np = data.y.numpy()
loss_np = cross_entropy_loss_numpy(predictions_np, labels_np)
print(f"\nInitial loss (random weights): {loss_np:.4f}")

# Get predictions
pred_labels_np = np.argmax(predictions_np, axis=1)
accuracy_np = np.mean(pred_labels_np == labels_np) * 100
print(f"Initial accuracy (random): {accuracy_np:.1f}%")

print("\n💡 Key Insight:")
print("This is a SINGLE forward pass with random weights.")
print("To improve accuracy, we need to TRAIN with backpropagation!")
print("\n✅ NumPy GCN implementation complete!")
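
To show what "training with backpropagation" means for this NumPy model, here is a minimal sketch of hand-derived gradients and plain gradient descent for a 2-layer GCN. It runs on a hypothetical 6-node toy graph (two triangles joined by one edge) and, for brevity, trains on all nodes rather than a masked subset. The function names, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(X, A_hat, W1, W2, y):
    """Forward pass plus hand-derived gradients for a 2-layer GCN."""
    n = y.shape[0]
    rows = np.arange(n)
    # Forward pass
    Z1 = A_hat @ X @ W1
    H1 = np.maximum(0, Z1)
    P = softmax(A_hat @ H1 @ W2)
    loss = -np.mean(np.log(P[rows, y]))
    # Backward pass (chain rule, layer by layer)
    dZ2 = P.copy()
    dZ2[rows, y] -= 1            # d(loss)/d(logits) = softmax - one_hot
    dZ2 /= n
    dW2 = (A_hat @ H1).T @ dZ2
    dH1 = A_hat.T @ dZ2 @ W2.T
    dZ1 = dH1 * (Z1 > 0)         # ReLU passes gradient only where Z1 > 0
    dW1 = (A_hat @ X).T @ dZ1
    return loss, dW1, dW2

# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1
A_tilde = A + np.eye(6)
d_inv_sqrt = A_tilde.sum(axis=1) ** -0.5
A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

X = np.eye(6)                        # one-hot features
y = np.array([0, 0, 0, 1, 1, 1])     # one label per triangle

rng = np.random.default_rng(42)
W1 = rng.normal(scale=0.3, size=(6, 8))
W2 = rng.normal(scale=0.3, size=(8, 2))

initial_loss, _, _ = loss_and_grads(X, A_hat, W1, W2, y)
lr = 0.2
for step in range(300):
    _, dW1, dW2 = loss_and_grads(X, A_hat, W1, W2, y)
    W1 -= lr * dW1                   # plain gradient descent update
    W2 -= lr * dW2
final_loss, _, _ = loss_and_grads(X, A_hat, W1, W2, y)

print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Deriving these gradients by hand is instructive once, but it does not scale to deeper models, which is the motivation for letting PyTorch's autograd compute them.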

In [None]:
# Cell 6: Custom GCN Layer in PyTorch

print("🔧 Building Custom GCN Layer in PyTorch")
print("=" * 55)

class GCNLayer(nn.Module):
    """
    Custom Graph Convolutional Layer - PyTorch Implementation
    
    This layer implements: H_out = activation(A_norm @ H_in @ W + bias)
    """
    
    def __init__(self, in_features, out_features, activation=True, dropout=0.5):
        """
        Parameters:
        -----------
        in_features : int
            Number of input features per node
        out_features : int
            Number of output features per node
        activation : bool
            Whether to apply ReLU activation
        dropout : float
            Dropout probability
        """
        super(GCNLayer, self).__init__()
        
        # Learnable weight matrix (allocated here, initialized in reset_parameters)
        self.weight = nn.Parameter(torch.empty(in_features, out_features))
        
        # Learnable bias
        self.bias = nn.Parameter(torch.empty(out_features))
        
        # Activation and regularization
        self.activation = activation
        self.dropout = nn.Dropout(dropout)
        
        # Initialize weights using Xavier initialization
        self.reset_parameters()
    
    def reset_parameters(self):
        """Initialize parameters using Xavier uniform initialization."""
        nn.init.xavier_uniform_(self.weight)
        nn.init.zeros_(self.bias)
    
    def forward(self, X, A_norm):
        """
        Forward pass of GCN layer.
        
        Parameters:
        -----------
        X : torch.Tensor
            Node features, shape [num_nodes, in_features]
        A_norm : torch.Tensor
            Normalized adjacency matrix, shape [num_nodes, num_nodes]
        
        Returns:
        --------
        output : torch.Tensor
            Output features, shape [num_nodes, out_features]
        """
        
        # Apply dropout to input features
        X = self.dropout(X)
        
        # Step 1: Linear transformation X @ W
        support = torch.mm(X, self.weight)
        
        # Step 2: Aggregate from neighbors A_norm @ (X @ W)
        output = torch.mm(A_norm, support)
        
        # Step 3: Add bias
        output = output + self.bias
        
        # Step 4: Apply activation
        if self.activation:
            output = F.relu(output)
        
        return output
    
    def __repr__(self):
        return f'{self.__class__.__name__}({self.weight.shape[0]} -> {self.weight.shape[1]})'

# Test custom GCN layer
print("\n🧪 Testing Custom GCN Layer:")

# Convert normalized adjacency to PyTorch tensor
A_norm_torch = torch.FloatTensor(A_normalized)
X_torch = data.x.float()

# Create a single GCN layer
test_layer = GCNLayer(in_features=data.num_node_features, out_features=16)
test_layer.eval()  # disable dropout so the test output is deterministic

print(f"Layer: {test_layer}")
print(f"\nInput shape: {X_torch.shape}")
print(f"Adjacency shape: {A_norm_torch.shape}")

# Forward pass
with torch.no_grad():
    test_output = test_layer(X_torch, A_norm_torch)

print(f"Output shape: {test_output.shape}")
print(f"Output sample (first node, first 5 features): {test_output[0, :5]}")

print("\n✅ Custom GCN layer working correctly!")
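
Because `GCNLayer` applies dropout to its input, its output depends on whether the module is in training or evaluation mode. The sketch below isolates that behavior with a bare `nn.Dropout` on a tensor of ones. Note that `torch.no_grad()` only disables gradient tracking; it does not switch dropout off.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(4, 6)

dropout.train()
train_out = dropout(x)   # each entry is either 0 or 1 / (1 - p) = 2

dropout.eval()
eval_out = dropout(x)    # identity: nothing is dropped or rescaled

print(train_out)
print(torch.equal(eval_out, x))
```

In practice this means calling `.eval()` on a layer or model before inspecting or reporting its outputs, and `.train()` again before resuming training.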

In [None]:
# Cell 7: Complete GCN Model Architecture

print("🏗️ Building Complete GCN Model")
print("=" * 45)

class GCN(nn.Module):
    """
    Complete Graph Convolutional Network for Node Classification
    
    Architecture:
    Input -> Dropout -> GCN Layer 1 -> ReLU -> Dropout -> GCN Layer 2 -> LogSoftmax
    """
    
    def __init__(self, num_features, hidden_dim, num_classes, dropout=0.5):
        """
        Parameters:
        -----------
        num_features : int
            Number of input features per node
        hidden_dim : int
            Number of hidden units
        num_classes : int
            Number of output classes
        dropout : float
            Dropout probability
        """
        super(GCN, self).__init__()
        
        # Layer 1: Input -> Hidden
        self.gcn1 = GCNLayer(num_features, hidden_dim, activation=True, dropout=dropout)
        
        # Layer 2: Hidden -> Output
        self.gcn2 = GCNLayer(hidden_dim, num_classes, activation=False, dropout=dropout)
        
        self.num_parameters = sum(p.numel() for p in self.parameters())
    
    def forward(self, X, A_norm):
        """
        Forward pass through the GCN.
        
        Parameters:
        -----------
        X : torch.Tensor
            Node features
        A_norm : torch.Tensor
            Normalized adjacency matrix
        
        Returns:
        --------
        output : torch.Tensor
            Log probabilities for each class
        """
        
        # First GCN layer with ReLU activation
        h1 = self.gcn1(X, A_norm)
        
        # Second GCN layer (no activation, will use log_softmax)
        h2 = self.gcn2(h1, A_norm)
        
        # Log softmax for numerical stability
        output = F.log_softmax(h2, dim=1)
        
        return output
    
    def get_embeddings(self, X, A_norm):
        """Get node embeddings from the first layer (before classification).
        
        Call model.eval() first; otherwise dropout makes the embeddings random.
        """
        with torch.no_grad():
            embeddings = self.gcn1(X, A_norm)
        return embeddings
    
    def __repr__(self):
        return (f'{self.__class__.__name__}(\n'
                f'  Layer 1: {self.gcn1}\n'
                f'  Layer 2: {self.gcn2}\n'
                f'  Total parameters: {self.num_parameters:,}\n'
                f')')

# Create model instance
model = GCN(
    num_features=data.num_node_features,
    hidden_dim=16,
    num_classes=dataset.num_classes,
    dropout=0.5
)

print("\n📊 Model Architecture:")
print(model)

print("\n🔍 Model Details:")
print(f"Input dimension: {data.num_node_features}")
print(f"Hidden dimension: 16")
print(f"Output dimension: {dataset.num_classes}")
print(f"Total trainable parameters: {model.num_parameters:,}")

# Count parameters per layer
total_params = 0
for name, param in model.named_parameters():
    params = param.numel()
    total_params += params
    print(f"\n{name}:")
    print(f"  Shape: {param.shape}")
    print(f"  Parameters: {params:,}")

print(f"\nTotal: {total_params:,} parameters")

# Test forward pass (eval mode disables dropout)
print("\n🧪 Testing Forward Pass:")
model.eval()
with torch.no_grad():
    test_output = model(X_torch, A_norm_torch)

print(f"Output shape: {test_output.shape}")
print(f"Output is log probabilities (log_softmax)")
print(f"Example output for first node: {test_output[0]}")
print(f"Converting to probabilities: {torch.exp(test_output[0])}")
print(f"Sum of probabilities: {torch.exp(test_output[0]).sum():.4f}")

print("\n✅ GCN model created successfully!")

In [None]:
# Cell 8: Train the GCN Model

print("🎯 Training the GCN Model")
print("=" * 45)

def train_gcn(model, X, A_norm, labels, train_mask, epochs=200, lr=0.01, weight_decay=5e-4):
    """
    Train the GCN model.
    
    Parameters:
    -----------
    model : nn.Module
        GCN model
    X : torch.Tensor
        Node features
    A_norm : torch.Tensor
        Normalized adjacency matrix
    labels : torch.Tensor
        True labels
    train_mask : torch.Tensor
        Boolean mask for training nodes
    epochs : int
        Number of training epochs
    lr : float
        Learning rate
    weight_decay : float
        L2 regularization strength
    
    Returns:
    --------
    history : dict
        Training history (loss and accuracy per epoch)
    """
    
    # Optimizer: Adam with weight decay (L2 regularization)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    
    # Loss function: Negative Log Likelihood (for log_softmax output)
    criterion = nn.NLLLoss()
    
    # Training history
    history = {
        'train_loss': [],
        'train_acc': [],
        'val_loss': [],
        'val_acc': []
    }
    
    # Training loop
    model.train()
    
    for epoch in range(epochs):
        # Zero gradients
        optimizer.zero_grad()
        
        # Forward pass
        output = model(X, A_norm)
        
        # Compute loss only on training nodes
        loss = criterion(output[train_mask], labels[train_mask])
        
        # Backward pass
        loss.backward()
        
        # Update weights
        optimizer.step()
        
        # Evaluate in eval mode (dropout off). Nodes outside train_mask are
        # treated as the held-out set for the "val" metrics.
        model.eval()
        with torch.no_grad():
            eval_output = model(X, A_norm)
            pred = eval_output.argmax(dim=1)
            held_out = ~train_mask
            train_acc = (pred[train_mask] == labels[train_mask]).float().mean().item()
            val_acc = (pred[held_out] == labels[held_out]).float().mean().item()
            val_loss = criterion(eval_output[held_out], labels[held_out]).item()
        model.train()
        
        # Store history
        history['train_loss'].append(loss.item())
        history['train_acc'].append(train_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)
        
        # Print progress
        if (epoch + 1) % 20 == 0 or epoch == 0:
            print(f'Epoch {epoch+1:3d}/{epochs} | '
                  f'Loss: {loss.item():.4f} | '
                  f'Train Acc: {train_acc*100:5.1f}% | '
                  f'Val Acc: {val_acc*100:5.1f}%')
    
    return history

# Prepare data for training
X_train = data.x.float()
A_norm_train = torch.FloatTensor(A_normalized)
labels_train = data.y
train_mask = data.train_mask

print("\n📊 Training Configuration:")
print(f"Training nodes: {train_mask.sum().item()} / {data.num_nodes}")
print(f"Learning rate: 0.01")
print(f"Weight decay: 5e-4")
print(f"Epochs: 200")
print(f"Optimizer: Adam")

print("\n🚀 Starting training...")
print("=" * 60)

# Train the model
history = train_gcn(
    model=model,
    X=X_train,
    A_norm=A_norm_train,
    labels=labels_train,
    train_mask=train_mask,
    epochs=200,
    lr=0.01,
    weight_decay=5e-4
)

print("=" * 60)
print("\n✅ Training complete!")

# Final evaluation
model.eval()
with torch.no_grad():
    final_output = model(X_train, A_norm_train)
    final_pred = final_output.argmax(dim=1)
    final_acc = (final_pred == labels_train).sum().item() / labels_train.shape[0]
    
    print(f"\n🎯 Final Results:")
    print(f"Overall Accuracy: {final_acc*100:.1f}%")
    print(f"Correct predictions: {(final_pred == labels_train).sum().item()} / {labels_train.shape[0]}")
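
The training loop computes its loss only on the nodes selected by `train_mask`. As a minimal NumPy sketch of that idea, the hypothetical helper below averages the negative log-likelihood over masked nodes only, which makes it easy to compare the loss on labeled nodes with the loss on the remaining ones. The probabilities and mask are made-up values.

```python
import numpy as np

def masked_nll(log_probs, labels, mask):
    """Mean negative log-likelihood over the nodes selected by a boolean mask."""
    idx = np.flatnonzero(mask)
    return -log_probs[idx, labels[idx]].mean()

# Made-up log-probabilities for 3 nodes and 2 classes
log_probs = np.log(np.array([[0.9, 0.1],
                             [0.2, 0.8],
                             [0.5, 0.5]]))
labels = np.array([0, 1, 1])
train_mask = np.array([True, False, False])

train_loss = masked_nll(log_probs, labels, train_mask)      # -log(0.9)
held_out_loss = masked_nll(log_probs, labels, ~train_mask)  # mean of -log(0.8), -log(0.5)

print(f"train: {train_loss:.4f}, held-out: {held_out_loss:.4f}")
```

Reporting the two numbers separately matters here because the model sees labels for only a handful of nodes; accuracy over all nodes mixes memorized and genuinely predicted labels.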

In [None]:
# Cell 9: Visualize Training Progress

print("📈 Visualizing Training History")
print("=" * 45)

def plot_training_history(history):
    """
    Plot training and validation metrics over epochs.
    """
    
    fig, axes = plt.subplots(1, 2, figsize=(16, 5))
    
    epochs_range = range(1, len(history['train_loss']) + 1)
    
    # Plot 1: Loss over epochs
    ax1 = axes[0]
    ax1.plot(epochs_range, history['train_loss'], 'b-', label='Training Loss', linewidth=2)
    ax1.set_xlabel('Epoch', fontsize=12)
    ax1.set_ylabel('Loss', fontsize=12)
    ax1.set_title('Training Loss Over Time', fontsize=14, fontweight='bold')
    ax1.legend(fontsize=11)
    ax1.grid(True, alpha=0.3)
    
    # Plot 2: Accuracy over epochs
    ax2 = axes[1]
    train_acc_pct = [acc * 100 for acc in history['train_acc']]
    val_acc_pct = [acc * 100 for acc in history['val_acc']]
    
    ax2.plot(epochs_range, train_acc_pct, 'g-', label='Training Accuracy', linewidth=2)
    ax2.plot(epochs_range, val_acc_pct, 'r--', label='Validation Accuracy', linewidth=2)
    ax2.set_xlabel('Epoch', fontsize=12)
    ax2.set_ylabel('Accuracy (%)', fontsize=12)
    ax2.set_title('Accuracy Over Time', fontsize=14, fontweight='bold')
    ax2.legend(fontsize=11)
    ax2.grid(True, alpha=0.3)
    ax2.set_ylim([0, 105])
    
    plt.tight_layout()
    plt.show()
    
    # Print statistics
    print(f"\n📊 Training Statistics:")
    print(f"Initial train accuracy: {history['train_acc'][0]*100:.1f}%")
    print(f"Final train accuracy: {history['train_acc'][-1]*100:.1f}%")
    print(f"Improvement: {(history['train_acc'][-1] - history['train_acc'][0])*100:+.1f}%")
    print(f"\nInitial val accuracy: {history['val_acc'][0]*100:.1f}%")
    print(f"Final val accuracy: {history['val_acc'][-1]*100:.1f}%")
    print(f"Improvement: {(history['val_acc'][-1] - history['val_acc'][0])*100:+.1f}%")
    print(f"\nFinal loss: {history['train_loss'][-1]:.4f}")
    print(f"Best val accuracy: {max(history['val_acc'])*100:.1f}% (Epoch {history['val_acc'].index(max(history['val_acc']))+1})")

plot_training_history(history)

print("\n✅ Training visualization complete!")

In [None]:
# Cell 10: Detailed Prediction Analysis

print("🔍 Analyzing Model Predictions")
print("=" * 45)

# Get final predictions
model.eval()
with torch.no_grad():
    output = model(X_train, A_norm_train)
    probabilities = torch.exp(output)
    predictions = output.argmax(dim=1)

# Analyze predictions
predictions_np = predictions.numpy()
labels_np = labels_train.numpy()
probabilities_np = probabilities.numpy()

# Create analysis dataframe
analysis_df = pd.DataFrame({
    'Node': range(data.num_nodes),
    'True_Label': labels_np,
    'Predicted_Label': predictions_np,
    'Correct': predictions_np == labels_np,
    # One probability column per class (the dataset may have more than two)
    **{f'Prob_Class_{c}': probabilities_np[:, c] for c in range(probabilities_np.shape[1])},
    'Confidence': np.max(probabilities_np, axis=1)
})

print(f"\n📊 Prediction Analysis:")
print(f"{'='*60}")

# Overall statistics
total_correct = analysis_df['Correct'].sum()
accuracy = total_correct / len(analysis_df) * 100
print(f"\nOverall Accuracy: {accuracy:.1f}% ({total_correct}/{len(analysis_df)} nodes)")

# Per-class accuracy (iterate over every label present, not just 0 and 1)
for class_label in sorted(analysis_df['True_Label'].unique()):
    class_mask = analysis_df['True_Label'] == class_label
    class_correct = analysis_df[class_mask]['Correct'].sum()
    class_total = class_mask.sum()
    class_acc = class_correct / class_total * 100
    print(f"\nCommunity {class_label}:")
    print(f"  Accuracy: {class_acc:.1f}% ({class_correct}/{class_total} nodes)")

# Confidence analysis
print(f"\n🎯 Confidence Analysis:")
avg_confidence = analysis_df['Confidence'].mean()
print(f"Average confidence: {avg_confidence:.1%}")
print(f"Min confidence: {analysis_df['Confidence'].min():.1%}")
print(f"Max confidence: {analysis_df['Confidence'].max():.1%}")

# Low confidence predictions
low_conf_threshold = 0.60
low_conf_nodes = analysis_df[analysis_df['Confidence'] < low_conf_threshold]
print(f"\nLow confidence predictions (< {low_conf_threshold:.0%}): {len(low_conf_nodes)}")
if len(low_conf_nodes) > 0:
    print(f"Nodes: {low_conf_nodes['Node'].tolist()}")

# Misclassified nodes
misclassified = analysis_df[~analysis_df['Correct']]
print(f"\n❌ Misclassified Nodes: {len(misclassified)}")
if len(misclassified) > 0:
    print(f"\nDetailed misclassifications:")
    for idx, row in misclassified.iterrows():
        print(f"  Node {row['Node']}: True={row['True_Label']}, Pred={row['Predicted_Label']}, "
              f"Confidence={row['Confidence']:.1%}")

# Display full analysis for first 10 nodes
print(f"\n📋 Sample Predictions (First 10 Nodes):")
print(analysis_df.head(10).to_string(index=False))

# Visualize predictions on graph
visualize_karate_club(data, "GCN Predictions vs Ground Truth", predictions=predictions)

print("\n✅ Prediction analysis complete!")
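
The analysis above works because the model emits log-probabilities: `torch.exp` turns them back into probabilities, and the largest probability per row is the confidence. Here is a minimal NumPy sketch of the same bookkeeping. The `logits` and `true_labels` arrays are made-up illustrative values, not outputs of the trained model.

```python
import numpy as np

# Hypothetical raw scores for 4 nodes and 2 classes (illustrative values only).
logits = np.array([[2.0, 0.0],
                   [0.1, 0.3],
                   [-1.0, 1.0],
                   [0.5, 0.4]])
true_labels = np.array([0, 0, 1, 1])

# Log-softmax: subtract the log of the row-wise normalizer.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

probs = np.exp(log_probs)          # exp undoes the log, so each row sums to 1
preds = log_probs.argmax(axis=1)   # argmax is identical on log-probs and probs
confidence = probs.max(axis=1)     # probability assigned to the predicted class

accuracy = (preds == true_labels).mean()
per_class_acc = {
    int(c): float((preds[true_labels == c] == c).mean())
    for c in np.unique(true_labels)
}
low_conf = np.where(confidence < 0.60)[0]   # same 60% threshold as above

print(f"accuracy={accuracy:.2f}, per-class={per_class_acc}")
print(f"low-confidence nodes: {low_conf.tolist()}")
```

Low-confidence nodes are the rows whose two scores are nearly tied, which is why they are worth inspecting alongside the misclassified ones.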

In [None]:
# Cell 11: Visualize Learned Node Embeddings

print("🗺️ Visualizing Node Embeddings")
print("=" * 45)

def visualize_embeddings_2d(embeddings, labels, title="Node Embeddings"):
    """
    Visualize node embeddings in 2D with both PCA (left) and t-SNE (right).
    """
    
    from sklearn.manifold import TSNE
    from sklearn.decomposition import PCA
    
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # PCA projection
    pca = PCA(n_components=2, random_state=42)
    embeddings_pca = pca.fit_transform(embeddings.numpy())
    
    ax1 = axes[0]
    scatter1 = ax1.scatter(embeddings_pca[:, 0], embeddings_pca[:, 1],
                          c=labels.numpy(), cmap='coolwarm', s=200, alpha=0.7,
                          edgecolors='black', linewidths=2)
    
    # Add node labels
    for i, (x, y) in enumerate(embeddings_pca):
        ax1.annotate(str(i), (x, y), fontsize=8, ha='center', va='center',
                    fontweight='bold', color='white')
    
    ax1.set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%} variance)', fontsize=11)
    ax1.set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%} variance)', fontsize=11)
    ax1.set_title('PCA Projection', fontsize=13, fontweight='bold')
    ax1.grid(True, alpha=0.3)
    plt.colorbar(scatter1, ax=ax1, label='Community')
    
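    # t-SNE projection (iteration count left at its default of 1000; the keyword
    # was renamed from n_iter to max_iter in scikit-learn 1.5)
    tsne = TSNE(n_components=2, random_state=42, perplexity=15)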
    embeddings_tsne = tsne.fit_transform(embeddings.numpy())
    
    ax2 = axes[1]
    scatter2 = ax2.scatter(embeddings_tsne[:, 0], embeddings_tsne[:, 1],
                          c=labels.numpy(), cmap='coolwarm', s=200, alpha=0.7,
                          edgecolors='black', linewidths=2)
    
    # Add node labels
    for i, (x, y) in enumerate(embeddings_tsne):
        ax2.annotate(str(i), (x, y), fontsize=8, ha='center', va='center',
                    fontweight='bold', color='white')
    
    ax2.set_xlabel('t-SNE Dimension 1', fontsize=11)
    ax2.set_ylabel('t-SNE Dimension 2', fontsize=11)
    ax2.set_title('t-SNE Projection', fontsize=13, fontweight='bold')
    ax2.grid(True, alpha=0.3)
    plt.colorbar(scatter2, ax=ax2, label='Community')
    
    plt.suptitle(title, fontsize=15, fontweight='bold', y=1.02)
    plt.tight_layout()
    plt.show()
    
    print(f"\n📊 Embedding Analysis:")
    print(f"PCA explained variance: {pca.explained_variance_ratio_[0]:.1%} + {pca.explained_variance_ratio_[1]:.1%} = {sum(pca.explained_variance_ratio_):.1%}")
    print(f"Embedding dimension: {embeddings.shape[1]}")

# Get embeddings from trained model (gradients disabled so .numpy() works downstream)
with torch.no_grad():
    embeddings = model.get_embeddings(X_train, A_norm_train)

print(f"\n🔍 Embeddings shape: {embeddings.shape}")
print(f"Each node is represented by a {embeddings.shape[1]}-dimensional vector")

# Visualize embeddings
visualize_embeddings_2d(embeddings, labels_train, "GCN Learned Node Embeddings")

print(f"\n💡 Interpretation:")
print(f"- Nodes of the same color (same community) should cluster together")
print(f"- Clear separation between the colors means the GCN learned the community structure")
print(f"- Nodes that sit close together received similar representations")
print(f"- t-SNE preserves local neighborhoods; distances between far-apart clusters are not meaningful")

print("\n✅ Embedding visualization complete!")
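
To see what the PCA panel computes, here is a minimal from-scratch sketch using NumPy's SVD. The helper name `pca_project` and the synthetic `toy_embeddings` are assumptions made for illustration; the tutorial itself uses scikit-learn's `PCA` on the real embeddings.

```python
import numpy as np

def pca_project(embeddings, n_components=2):
    """Project rows of `embeddings` onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0)    # PCA needs zero-mean columns
    # Rows of vt are the principal directions; s holds the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)                    # variance share per component
    projected = centered @ vt[:n_components].T
    return projected, explained[:n_components]

rng = np.random.default_rng(42)
# Hypothetical 16-dimensional embeddings: two well-separated groups of 17 nodes.
group_a = rng.normal(loc=-1.0, scale=0.3, size=(17, 16))
group_b = rng.normal(loc=+1.0, scale=0.3, size=(17, 16))
toy_embeddings = np.vstack([group_a, group_b])

projected, explained_ratio = pca_project(toy_embeddings)
print(projected.shape)
print(np.round(explained_ratio, 3))
```

When two communities are well separated, the first component captures most of the variance and the groups land on opposite sides of zero along PC1. That is the pattern to look for in the left-hand plot.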

In [None]:
# Cell 12: Analyze Message Passing and Node Influence

print("🔬 Analyzing Message Passing and Node Influence")
print("=" * 55)

def analyze_message_passing(model, X, A_norm, A, node_id=0):
    """
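    Compare a node's learned embedding with the embeddings of its neighbors.

    Cosine similarity between embeddings is used as a proxy for neighbor
    influence; it does not measure the messages themselves. Relies on the
    notebook globals `data` and `G`.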
    """
    
    print(f"\n🎯 Analyzing Message Passing for Node {node_id}")
    print(f"{'='*50}")
    
    # Get neighbors
    neighbors = np.where(A[node_id] > 0)[0]
    print(f"\nNode {node_id} has {len(neighbors)} neighbors: {neighbors.tolist()}")
    
    # Get embeddings
    with torch.no_grad():
        embeddings = model.get_embeddings(X, A_norm)
    
    # Analyze influence of each neighbor
    node_embedding = embeddings[node_id].numpy()
    neighbor_embeddings = embeddings[neighbors].numpy()
    
    # Compute similarity (cosine similarity)
    from sklearn.metrics.pairwise import cosine_similarity
    
    # One call compares the node against all of its neighbors at once
    similarities = cosine_similarity(
        node_embedding.reshape(1, -1), neighbor_embeddings
    )[0]
    
    # Create influence dataframe
    influence_df = pd.DataFrame({
        'Neighbor': neighbors,
        'Similarity': similarities,
        'Degree': [int(A[neighbor].sum()) for neighbor in neighbors],
        'Community': data.y[neighbors].numpy()
    })
    
    influence_df = influence_df.sort_values('Similarity', ascending=False)
    
    print(f"\n📊 Neighbor Influence Analysis:")
    print(influence_df.to_string(index=False))
    
    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
    
    # Plot 1: Neighbor similarities
    ax1 = axes[0]
    colors = ['#FF6B6B' if c == 0 else '#4ECDC4' for c in influence_df['Community']]
    ax1.barh(range(len(influence_df)), influence_df['Similarity'], color=colors, alpha=0.7)
    ax1.set_yticks(range(len(influence_df)))
    ax1.set_yticklabels([f"Node {n}" for n in influence_df['Neighbor']])
    ax1.set_xlabel('Cosine Similarity', fontsize=11)
    ax1.set_title(f'Neighbor Influence on Node {node_id}', fontsize=13, fontweight='bold')
    ax1.grid(True, alpha=0.3, axis='x')
    
    # Plot 2: Ego network
    ax2 = axes[1]
    ego_graph = nx.ego_graph(G, node_id, radius=1)
    ego_pos = nx.spring_layout(ego_graph, seed=42)
    
    # Color nodes by community
    node_colors = []
    for node in ego_graph.nodes():
        if node == node_id:
            node_colors.append('#FFD700')
        else:
            node_colors.append('#FF6B6B' if data.y[node] == 0 else '#4ECDC4')
    
    nx.draw_networkx_nodes(ego_graph, ego_pos, node_color=node_colors,
                          node_size=800, alpha=0.9, ax=ax2,
                          edgecolors='black', linewidths=2)
    nx.draw_networkx_edges(ego_graph, ego_pos, alpha=0.4, width=2, ax=ax2)
    nx.draw_networkx_labels(ego_graph, ego_pos, font_size=10,
                           font_weight='bold', font_color='white', ax=ax2)
    
    ax2.set_title(f'Ego Network of Node {node_id}', fontsize=13, fontweight='bold')
    ax2.axis('off')
    
    plt.tight_layout()
    plt.show()
    
    return influence_df

# Key nodes: the two faction leaders plus two other influential members
interesting_nodes = [0, 33, 1, 32]

print(f"\n🔍 Analyzing Key Nodes in the Network:")
print(f"Node 0: Mr. Hi (Instructor)")
print(f"Node 33: John A (Administrator)")
print(f"Nodes 1, 32: Other influential members (not analyzed below)")

# Only the two leaders are analyzed here; widen the slice to include nodes 1 and 32
for node_id in interesting_nodes[:2]:
    analyze_message_passing(model, X_train, A_norm_train, A, node_id)

print("\n✅ Message passing analysis complete!")
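
The cell above inspects embeddings after training. To see the aggregation step itself, here is a minimal sketch of one and two rounds of normalized message passing on a tiny graph. It assumes the self-loop variant from Kipf & Welling (adjacency plus identity); the notebook's `A_norm_train` may be built differently, and the 4-node path graph is hypothetical.

```python
import numpy as np

# Hypothetical 4-node path graph 0 - 1 - 2 - 3 (not the Karate Club graph).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

A_hat = A + np.eye(4)                      # self-loops: a node keeps its own features
deg = A_hat.sum(axis=1)                    # degrees including the self-loop
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization

X = np.eye(4)       # one-hot features, so row i of the result shows who reached node i
H1 = A_norm @ X     # one round of aggregation (no weights, no nonlinearity)
H2 = A_norm @ H1    # a second round reaches 2-hop neighbors

print("after 1 hop, node 0 sees:", np.round(H1[0], 3))
print("after 2 hops, node 0 sees:", np.round(H2[0], 3))
```

After one round, node 0 has a zero weight for node 2. After two rounds that weight becomes positive, which is the sense in which stacking layers lets information travel further across the graph.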

In [None]:
# Cell 13: Summary and Key Learnings

print("🎓 Karate Club GNN - Complete Learning Summary")
print("=" * 60)

print("\n✅ What We Accomplished:")
print("\n1. 📊 Graph Fundamentals:")
print("   - Loaded and explored Karate Club dataset (34 nodes, 78 edges)")
print("   - Understood graph representations (adjacency matrix, edge index)")
print("   - Computed degree matrices and normalized adjacency")

print("\n2. 🧮 Mathematical Understanding:")
print("   - Learned message passing framework")
print("   - Implemented GCN formula: H^(l+1) = σ(D^(-1/2) A D^(-1/2) H^(l) W^(l))")
print("   - Built forward pass from scratch in NumPy")

print("\n3. 🏗️ Implementation Skills:")
print("   - Created custom GCN layers in PyTorch")
print("   - Built complete 2-layer GCN model")
print("   - Implemented training loop with Adam optimizer")

print("\n4. 📈 Results:")
print(f"   - Achieved {final_acc*100:.1f}% accuracy on node classification")
print("   - Successfully predicted community membership")
print("   - Visualized learned embeddings and predictions")

print("\n🧠 Key Insights About GNNs:")
print("\n1. Message Passing is Core:")
print("   - GNNs aggregate information from neighbors")
print("   - Multiple layers allow information to propagate further")
print("   - Degree normalization keeps feature magnitudes comparable across low- and high-degree nodes")

print("\n2. Graph Structure Matters:")
print("   - Node features alone aren't enough")
print("   - Connections define how information flows")
print("   - Community structure emerges from graph topology")

print("\n3. Inductive Bias:")
print("   - GNNs assume local structure is informative")
print("   - Similar nodes should have similar labels")
print("   - Connections indicate similarity")

print("\n💡 Real-World Applications:")
print("\n- Social Networks: Friend recommendation, influence detection")
print("- Citation Networks: Paper classification, collaboration prediction")
print("- Molecular Graphs: Drug discovery, property prediction")
print("- Knowledge Graphs: Link prediction, entity classification")
print("- Traffic Networks: Traffic prediction, route optimization")
print("- Recommendation Systems: User-item interaction graphs")

print("\n📚 Next Steps for Further Learning:")
print("\n1. Advanced GNN Architectures:")
print("   - GraphSAGE: Sampling-based aggregation")
print("   - GAT: Graph Attention Networks")
print("   - GIN: Graph Isomorphism Networks")

print("\n2. Different Tasks:")
print("   - Graph classification")
print("   - Link prediction")
print("   - Graph generation")

print("\n3. Larger Datasets:")
print("   - Cora, Citeseer, Pubmed (citation networks)")
print("   - Reddit, PPI (large-scale graphs)")
print("   - OGB datasets (benchmark suite)")

print("\n4. Advanced Topics:")
print("   - Heterogeneous graphs")
print("   - Dynamic graphs")
print("   - Graph transformers")

print("\n🎯 Practice Exercises:")
print("\n1. Modify the architecture:")
print("   - Try 3 layers instead of 2")
print("   - Experiment with different hidden dimensions")
print("   - Add batch normalization")

print("\n2. Hyperparameter tuning:")
print("   - Try different learning rates")
print("   - Adjust dropout probability")
print("   - Experiment with weight decay")

print("\n3. Feature engineering:")
print("   - Use actual node features (degree, centrality)")
print("   - Add edge features")
print("   - Combine structural and attribute information")

print("\n4. Compare architectures:")
print("   - Implement GraphSAGE")
print("   - Compare with traditional ML (SVM, Random Forest)")
print("   - Benchmark against PyTorch Geometric implementations")

print("\n" + "="*60)
print("🎉 Congratulations! You've mastered GNNs from scratch!")
print("="*60)

print("\n📖 Recommended Resources:")
print("- Paper: Kipf & Welling (2017) - Semi-Supervised Classification with Graph Convolutional Networks")
print("- Tutorial: distill.pub/2021/gnn-intro")
print("- Library: PyTorch Geometric Documentation")
print("- Course: CS224W - Machine Learning with Graphs (Stanford)")

print("\n✅ Complete GNN tutorial finished!")
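
As a starting point for the feature-engineering exercise, here is a minimal sketch that derives two structural features per node directly from an adjacency matrix. The function name `structural_features` and the toy graph are hypothetical, and the code assumes an undirected graph without self-loops.

```python
import numpy as np

def structural_features(A):
    """Return an (n, 2) matrix: normalized degree and local clustering coefficient."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)
    # (A^3)[i, i] counts closed walks of length 3, i.e. twice the triangles through i.
    triangles = np.diag(A @ A @ A) / 2.0
    possible = deg * (deg - 1) / 2.0    # neighbor pairs that could be connected
    clustering = np.divide(triangles, possible,
                           out=np.zeros(n), where=possible > 0)
    return np.column_stack([deg / (n - 1), clustering])

# Hypothetical toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
A_toy = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]])
features = structural_features(A_toy)
print(np.round(features, 3))
```

A feature matrix like this could stand in for, or be concatenated with, the node features the GCN receives, which makes it easy to check whether structural information helps.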