## üß† Benchmarking GCN vs GBT on the Cora Dataset

In this notebook, we compare two very different machine learning approaches on the **Cora citation network**:

- **Gradient Boosted Trees (GBT)** ‚Äî a powerful tabular model that treats each node independently.
- **Graph Convolutional Network (GCN)** ‚Äî a deep learning model that leverages graph structure.

Our goal: **see how important the graph structure is** for classifying academic papers by topic.

In [16]:
import numpy as np

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.preprocessing import StandardScaler

import torch
from torch_geometric.datasets import CitationFull
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

## üì¶ Loading the Cora Dataset

The Cora dataset is a classic benchmark in graph machine learning. Each node represents a research paper, and each edge a citation.

- Nodes: 2,708
- Edges: 5,429
- Classes: 7 research topics
- Features: 1,433 (bag-of-words of paper content)

We‚Äôll use the full graph from PyTorch Geometric‚Äôs `CitationFull` dataset.


In [8]:
# Load the dataset
dataset = CitationFull(root='/tmp/Cora', name='Cora')
data = dataset[0]

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data = data.to(device)

Downloading https://github.com/abojchevski/graph2gauss/raw/master/data/cora.npz
Processing...
Done!


## üå≤ Gradient Boosted Trees as Baseline

We start by treating the problem like a traditional tabular classification task:
- Use only node features (ignore graph edges)
- Train a `GradientBoostingClassifier` from `sklearn`

This gives us a useful **baseline**, showing what we can achieve *without* using the graph structure.


In [9]:
# ----------------------------------
# üå≤ Gradient Boosting Classifier
# ----------------------------------

x = data.x.cpu().numpy()
y = data.y.cpu().numpy()

# Manual train/test split (70/30)
np.random.seed(42)
num_nodes = x.shape[0]
indices = np.random.permutation(num_nodes)
split = int(0.7 * num_nodes)
train_idx, test_idx = indices[:split], indices[split:]

scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

In [10]:
gb = GradientBoostingClassifier(n_estimators=100)
gb.fit(x_scaled[train_idx], y[train_idx])
y_pred = gb.predict(x_scaled[test_idx])
gb_acc = accuracy_score(y[test_idx], y_pred)
print(f"üå≤ Gradient Boosting Accuracy: {gb_acc:.4f}")

üå≤ Gradient Boosting Accuracy: 0.5428


## üìà GBT Performance

The GBT gives us a baseline performance. It's fast, interpretable, and does not require graph operations.

We'll record metrics like:
- Accuracy
- Precision / Recall / F1
- AUROC (One-vs-Rest)

Let‚Äôs now see if incorporating the graph edges with a GNN improves performance.


In [None]:
gb_f1 = f1_score(y[test_idx], y_pred, average='weighted')
gb_precision = precision_score(y[test_idx], y_pred, average='weighted')
gb_recall = recall_score(y[test_idx], y_pred, average='weighted')
gb_roc_auc = roc_auc_score(
    y[test_idx], gb.predict_proba(x_scaled[test_idx]), multi_class='ovr', average='weighted'
)

In [66]:
print(f"üå≤ Gradient Boosting F1: {gb_f1:.4f}")
print(f"üå≤ Gradient Boosting Precision: {gb_precision:.4f}")
print(f"üå≤ Gradient Boosting Recall: {gb_recall:.4f}")
print(f"üå≤ Gradient Boosting AUROC: {gb_roc_auc:.4f}")

üå≤ Gradient Boosting F1: 0.5419
üå≤ Gradient Boosting Precision: 0.5544
üå≤ Gradient Boosting Recall: 0.5428
üå≤ Gradient Boosting AUROC: 0.9038


## üîó Graph Convolutional Network

Now we implement a **GCN** using `torch_geometric`.

Unlike the GBT, the GCN:
- Uses the **edges** in the graph
- Learns representations by **aggregating features** from neighbors
- Can discover patterns in citation structure

We‚Äôll define a simple 2-layer GCN and train it on the same dataset.

In [11]:
# ----------------------------------
# üß† Graph Convolutional Network
# ----------------------------------

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training, p=0.5)
        x = self.conv2(x, edge_index)
        return x

In [12]:
model = GCN(data.num_node_features, 64, dataset.num_classes).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# Create masks
train_mask = torch.zeros(data.num_nodes, dtype=torch.bool)
train_mask[train_idx] = True
test_mask = torch.zeros(data.num_nodes, dtype=torch.bool)
test_mask[test_idx] = True

# Training loop
model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[train_mask], data.y[train_mask])
    loss.backward()
    optimizer.step()
    if epoch % 20 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

Epoch 0, Loss: 4.2471
Epoch 20, Loss: 1.2284
Epoch 40, Loss: 0.9122
Epoch 60, Loss: 0.7915
Epoch 80, Loss: 0.7258
Epoch 100, Loss: 0.6929
Epoch 120, Loss: 0.6552
Epoch 140, Loss: 0.6353
Epoch 160, Loss: 0.6218
Epoch 180, Loss: 0.6120


## üìä GCN Performance

With just two GCN layers, we‚Äôre able to learn powerful node embeddings from local graph structure.

We‚Äôll evaluate the GCN on:
- Accuracy
- Precision / Recall / F1
- AUROC

Compare these to the GBT to see the benefit of using graph structure.


In [None]:
# Evaluation
model.eval()
out = model(data.x, data.edge_index)
pred = out.argmax(dim=1)
correct = pred[test_mask] == data.y[test_mask]
gcn_acc = int(correct.sum()) / int(test_mask.sum())
print(f"üß† GCN Accuracy: {gcn_acc:.4f}")


üß† GCN Accuracy: 0.7216


In [67]:
l = data.y[test_mask]
scores = F.softmax(out, dim=1)[test_mask]

In [68]:
gcn_f1 = f1_score(l, pred[test_mask], average='weighted')
gcn_precision = precision_score(l, pred[test_mask], average='weighted')
gcn_recall = recall_score(l, pred[test_mask], average='weighted')
gcn_roc_auc = roc_auc_score(l, scores.detach().numpy(), average='weighted', multi_class='ovr')

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


In [69]:
print(f"üß† GCN F1: {gcn_acc:.4f}")
print(f"üß† GCN Precision: {gcn_precision:.4f}")
print(f"üß† GCN Recall: {gcn_recall:.4f}")
print(f"üß† GCN AUROC: {gcn_roc_auc:.4f}")

üß† GCN F1: 0.7216
üß† GCN Precision: 0.7234
üß† GCN Recall: 0.7216
üß† GCN AUROC: 0.9853


## üßæ Summary and Comparison

| Model | Uses Graph? | Accuracy | F1 Score | AUROC |
|-------|-------------|----------|----------|-------|
| GBT   | ‚ùå No       | 0.543    | 0.542    | 0.903 |
| GCN   | ‚úÖ Yes      | 0.722    |  0722    | 0.985 |

**Key Takeaways:**
- GBT performs reasonably well using features alone.
- GCN improves results by **incorporating graph structure**, capturing the context of each paper.

Graph Neural Networks are especially effective when **relationships matter**.
