<a href="https://colab.research.google.com/github/moosunny/Graph-Neural-Networks-Practice/blob/main/Chpater_5_Including_Node_Features_with_Vanila_Neural_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Chapter 5. Including Node Features with Vanilla Neural Networks

The previous chapter dealt only with topology, that is, the pattern of connections between nodes. In graph datasets, however, nodes and edges can also carry features that represent things such as scores, colors, or words.

Node and edge features have the same structure as tabular data, so they can be fed directly into a standard neural network. In this chapter we use the Cora and Facebook Page-Page datasets to learn how a vanilla neural network works in PyTorch.

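Unlike Zachary's Karate Club dataset used earlier, the Cora and Facebook Page-Page datasets contain a new kind of information: every node comes with a feature vector. In a social network, for example, such features could describe a user's age, gender, or interests.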

We therefore solve a node classification problem with a simple graph neural network model.

In [None]:
# Check the PyTorch version so the matching PyTorch Geometric CUDA wheels can be installed
import torch

torch.__version__

'2.5.1+cu124'

In [None]:
!pip install torch_geometric



In [None]:
# Install the PyTorch Geometric CUDA extensions that match the Colab environment
!pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.5.0+cu124.html

Looking in links: https://data.pyg.org/whl/torch-2.5.0+cu124.html


In [None]:
# Define the evaluation metric: fraction of predictions that match the labels
def accuracy(y_pred, y_true):
  return torch.sum(y_pred == y_true) / len(y_true)

In [None]:
from torch.nn import Linear
import torch.nn.functional as F

# Multilayer perceptron (MLP) model
class MLP(torch.nn.Module):
  def __init__(self, dim_in, dim_h, dim_out):
    super().__init__()
    self.linear1 = Linear(dim_in, dim_h)
    self.linear2 = Linear(dim_h, dim_out)

  def forward(self, x):
    x = F.relu(self.linear1(x))
    x = self.linear2(x)

    return F.log_softmax(x, dim=1)

  def fit(self, data, epochs):
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(self.parameters(), lr=0.01, weight_decay=5e-4)

    self.train()
    for epoch in range(epochs + 1):
      optimizer.zero_grad()
      out = self(data.x)
      loss = criterion(out[data.train_mask], data.y[data.train_mask])
      acc = accuracy(out[data.train_mask].argmax(dim=1), data.y[data.train_mask])
      loss.backward()
      optimizer.step()
      if epoch % 20 == 0:
        val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
        val_acc = accuracy(out[data.val_mask].argmax(dim=1), data.y[data.val_mask])
        print(f'Epoch {epoch:>3} | Train Loss: {loss:.3f} | Train Acc: {acc*100:>5.2f}% | Val Loss: {val_loss:.2f} | Val Acc: {val_acc*100:.2f}%')

  def test(self, data):
    self.eval()
    out = self(data.x)
    acc = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])

    return acc

In [None]:
import torch_geometric
from torch_geometric.datasets import Planetoid

# Load the Cora dataset
dataset = Planetoid(root=".", name="Cora")
data = dataset[0]

print(f'Dataset: {dataset}')
print('---------------')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of nodes: {data.x.shape[0]}')

Dataset: Cora()
---------------
Number of graphs: 1
Number of nodes: 2708


In [None]:
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

Number of features: 1433
Number of classes: 7


In [None]:
print(f'Graph:')
print('------')
print(f'Edges are directed: {data.is_directed()}')
print(f'Graph has isolated nodes: {data.has_isolated_nodes()}')
print(f'Graph has loops: {data.has_self_loops()}')

Graph:
------
Edges are directed: False
Graph has isolated nodes: False
Graph has loops: False


In [None]:
import pandas as pd

# Input data: one row of features per node (later multiplied by the adjacency matrix)
df_x = pd.DataFrame(data.x.numpy())
df_x['label'] = pd.DataFrame(data.y)

df_x

node,0,1,2,3,4,5,6,7,8,9,...,1424,1425,1426,1427,1428,1429,1430,1431,1432,label
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2703,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
2704,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
2705,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3
2706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3


In [None]:
mlp = MLP(dataset.num_features, 16, dataset.num_classes)
print(mlp)

MLP(
  (linear1): Linear(in_features=1433, out_features=16, bias=True)
  (linear2): Linear(in_features=16, out_features=7, bias=True)
)


In [None]:
mlp.fit(data, epochs=100)

Epoch   0 | Train Loss: 1.955 | Train Acc: 15.00% | Val Loss: 1.99 | Val Acc: 12.20%
Epoch  20 | Train Loss: 0.113 | Train Acc: 100.00% | Val Loss: 1.46 | Val Acc: 53.00%
Epoch  40 | Train Loss: 0.012 | Train Acc: 100.00% | Val Loss: 1.57 | Val Acc: 50.80%
Epoch  60 | Train Loss: 0.007 | Train Acc: 100.00% | Val Loss: 1.62 | Val Acc: 48.00%
Epoch  80 | Train Loss: 0.008 | Train Acc: 100.00% | Val Loss: 1.52 | Val Acc: 49.00%
Epoch 100 | Train Loss: 0.009 | Train Acc: 100.00% | Val Loss: 1.45 | Val Acc: 50.60%


In [None]:
acc = mlp.test(data)
print(f'MLP test accuracy: {acc*100:.2f}%')

MLP test accuracy: 51.70%


In [None]:
from torch_geometric.utils import to_dense_adj

adjacency = to_dense_adj(data.edge_index)[0] # edge_index -> dense adjacency matrix
adjacency += torch.eye(len(adjacency)) # add self-loops: adjacency matrix + identity matrix
adjacency

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]])

# Classifying nodes with vanilla graph neural networks

A basic neural network layer can be written as the linear transformation $h_A = x_A W^t$, where $x_A$ is the input vector of node $A$ and $W$ is the weight matrix.

In PyTorch, this linear transformation can be implemented with `torch.mm()` or `nn.Linear`.
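As a minimal sketch of that equivalence, the snippet below applies the same weights once through `nn.Linear` and once through `torch.mm`. The tensor sizes and the names `x` and `layer` are made up for illustration, and `bias=False` is an assumption chosen so that both paths compute exactly $x W^t$.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical sizes: 4 nodes, 3 input features, 2 output features.
x = torch.randn(4, 3)

# nn.Linear stores its weight with shape (out_features, in_features),
# so applying the layer computes x @ W^T (plus a bias, disabled here).
layer = nn.Linear(3, 2, bias=False)

h_linear = layer(x)
h_mm = torch.mm(x, layer.weight.t())

print(torch.allclose(h_linear, h_mm))
print(tuple(h_linear.shape))
```

Passing a matrix with one row per node, as done here, already applies the transformation to every node at once.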

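In a graph dataset, the input vectors hold the node features. On their own, however, these vectors say nothing about how the nodes are connected, so to understand the context of a node we also need to look at its neighbors.

Writing $N_A$ for the set of neighbors of node $A$, the graph linear layer becomes: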

$$h_A = \sum_{i \in N_A} {x_i W^t}$$

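Variants of this equation are possible. For example, one weight matrix $W_1$ could be dedicated to the central node and another weight matrix $W_2$ shared by all of its neighbors. A separate weight matrix for each neighbor is not possible, however, because the number of neighbors differs from node to node.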

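In practice we do not evaluate the equation above node by node. We use a matrix multiplication over all nodes instead, which for a plain linear layer reads $H = X W^t$, where $X$ is the matrix whose rows are the node feature vectors.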

In the datasets used here, the adjacency matrix $A$ encodes every connection between nodes. Multiplying the adjacency matrix by the node feature matrix therefore sums, for each node, the feature vectors of its neighbors.

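To include each node's own features in that sum, we add self-loops (a connection from every node to itself). In matrix form this means adding the identity matrix $I$ to $A$, which gives $\tilde{A} = A + I$.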
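To make the role of the self-loops concrete, here is a small sketch on a hypothetical 3-node path graph (0 - 1 - 2) with one made-up scalar feature per node. Neither the graph nor the feature values come from Cora or Facebook Page-Page.

```python
import torch

# Hypothetical undirected path graph 0 - 1 - 2.
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])

# One made-up scalar feature per node.
X = torch.tensor([[1.], [10.], [100.]])

# Without self-loops, each row sums only the neighbors' features.
neighbors_only = A @ X

# Adding the identity matrix gives A~ = A + I, so each node also counts itself.
A_tilde = A + torch.eye(A.shape[0])
with_self = A_tilde @ X

print(neighbors_only.squeeze(1).tolist())  # [10.0, 101.0, 10.0]
print(with_self.squeeze(1).tolist())       # [11.0, 111.0, 110.0]
```

Without the identity term, node 1 would be described only by nodes 0 and 2 and its own feature would be lost.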

As a result, the linear transformation on a graph can be written as follows.

$$H = \tilde{A} X W^t$$
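The sketch below checks, on a made-up 4-node graph, that this matrix form gives the same result as summing $x_i W^t$ over the neighbors of each node, with every node counted among its own neighbors through the self-loop. The graph, the random features, and the names `H_loop` and `A_tilde` are assumptions for illustration only.

```python
import torch

torch.manual_seed(0)

# Hypothetical 4-node undirected graph given as an adjacency matrix.
A = torch.tensor([[0., 1., 1., 0.],
                  [1., 0., 0., 0.],
                  [1., 0., 0., 1.],
                  [0., 0., 1., 0.]])
A_tilde = A + torch.eye(4)      # add self-loops

X = torch.randn(4, 3)           # made-up node features (3 per node)
W = torch.randn(2, 3)           # weight matrix with shape (out, in)

# Matrix form: all nodes at once.
H = A_tilde @ X @ W.t()

# Per-node form: sum x_i W^t over the neighbors of each node (self included).
H_loop = torch.zeros(4, 2)
for a in range(4):
    for i in torch.nonzero(A_tilde[a]).flatten().tolist():
        H_loop[a] += X[i] @ W.t()

print(torch.allclose(H, H_loop, atol=1e-5))
```

The explicit loop is only there for comparison. The matrix form is what the layer defined next implements.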



In [None]:
# Define the vanilla GNN layer
class VanillaGNNLayer(torch.nn.Module):
  def __init__(self, dim_in, dim_out):
    super().__init__()
    self.linear = Linear(dim_in, dim_out, bias=False)
  def forward(self, x, adjacency):
    x = self.linear(x)
    x = torch.sparse.mm(adjacency, x) # aggregate: sum the transformed features of each node's neighbors
    return x
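
The layer above is called with the dense adjacency matrix built earlier by `to_dense_adj`. As a hedged sketch of an alternative that is not part of the original notebook, the same aggregation can run on a sparse COO tensor created with `torch.sparse_coo_tensor`. The toy `edge_index` below is made up and lists each undirected edge in both directions.

```python
import torch

# Made-up edge list in COO format: undirected edges 0-1 and 1-2, both directions.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
num_nodes = 3

# Append self-loops (i, i) so the sparse matrix represents A + I.
loops = torch.arange(num_nodes).repeat(2, 1)
indices = torch.cat([edge_index, loops], dim=1)
values = torch.ones(indices.shape[1])
adj_sparse = torch.sparse_coo_tensor(indices, values, (num_nodes, num_nodes))

x = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])

out_sparse = torch.sparse.mm(adj_sparse, x)   # sparse @ dense
out_dense = adj_sparse.to_dense() @ x         # dense reference result

print(torch.allclose(out_sparse, out_dense))
```

Only the nonzero entries are stored in `adj_sparse`, which matters once the number of nodes grows.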

In [None]:
class VanillaGNN(torch.nn.Module):
  def __init__(self, dim_in, dim_h, dim_out):
    super().__init__()
    self.linear1 = VanillaGNNLayer(dim_in, dim_h)
    self.linear2 = VanillaGNNLayer(dim_h, dim_out)

  def forward(self, x, adjacency):
    # The second argument is the adjacency matrix with self-loops, not an edge_index tensor
    h = self.linear1(x, adjacency)
    h = F.relu(h)
    h = self.linear2(h, adjacency)

    return F.log_softmax(h, dim=1)

  def fit(self, data, epochs):
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(self.parameters(), lr=0.01, weight_decay=5e-4)

    self.train()
    for epoch in range(epochs):
      optimizer.zero_grad()
      out = self(data.x, adjacency)
      loss = criterion(out[data.train_mask], data.y[data.train_mask])
      loss.backward()
      optimizer.step()
      if epoch % 20 == 0:
        val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
        print(f'Epoch {epoch:>3} | Train Loss: {loss:.3f} | Val Loss: {val_loss:.2f}')
  def test(self, data):
    self.eval()
    out = self(data.x, adjacency)
    acc = accuracy(out.argmax(dim=1)[data.test_mask], data.y[data.test_mask])

    return acc

In [None]:
gnn = VanillaGNN(dataset.num_features, 16, dataset.num_classes)
print(gnn)
gnn.fit(data, epochs=100)
acc = gnn.test(data)
print(f'\nGNN test accuracy: {acc*100:.2f}%')

VanillaGNN(
  (linear1): VanillaGNNLayer(
    (linear): Linear(in_features=1433, out_features=16, bias=False)
  )
  (linear2): VanillaGNNLayer(
    (linear): Linear(in_features=16, out_features=7, bias=False)
  )
)
Epoch   0 | Train Loss: 2.231 | Val Loss: 2.21
Epoch  20 | Train Loss: 0.070 | Val Loss: 1.68
Epoch  40 | Train Loss: 0.007 | Val Loss: 2.13
Epoch  60 | Train Loss: 0.002 | Val Loss: 2.36
Epoch  80 | Train Loss: 0.002 | Val Loss: 2.42

GNN test accuracy: 73.80%


## Node classification on the Facebook Page-Page dataset

In [None]:
# Load the Facebook Page-Page dataset
from torch_geometric.datasets import FacebookPagePage

dataset = FacebookPagePage(root=".")
data = dataset[0]

print(f'Dataset: {dataset}')
print('-----------------------')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of nodes: {data.x.shape[0]}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

print(f'\nGraph:')
print('------')
print(f'Edges are directed: {data.is_directed()}')
print(f'Graph has isolated nodes: {data.has_isolated_nodes()}')
print(f'Graph has loops: {data.has_self_loops()}')

Dataset: FacebookPagePage()
-----------------------
Number of graphs: 1
Number of nodes: 22470
Number of features: 128
Number of classes: 4

Graph:
------
Edges are directed: False
Graph has isolated nodes: False
Graph has loops: True


In [None]:
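# The Facebook Page-Page dataset ships without train/validation/test masks, so define the splits by index range
# Note: with these ranges, nodes 18000 and 20000 belong to none of the three splits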
data.train_mask = range(18000)
data.val_mask = range(18001, 20000)
data.test_mask = range(20001, 22470)

In [None]:
mlp = MLP(dataset.num_features, 16, dataset.num_classes)
print(mlp)

mlp.fit(data, epochs=100)

acc = mlp.test(data)
print(f'MLP test accuracy: {acc*100:.2f}%')

MLP(
  (linear1): Linear(in_features=128, out_features=16, bias=True)
  (linear2): Linear(in_features=16, out_features=4, bias=True)
)
Epoch   0 | Train Loss: 1.380 | Train Acc: 31.90% | Val Loss: 1.38 | Val Acc: 30.82%
Epoch  20 | Train Loss: 0.660 | Train Acc: 73.83% | Val Loss: 0.67 | Val Acc: 73.09%
Epoch  40 | Train Loss: 0.576 | Train Acc: 76.81% | Val Loss: 0.62 | Val Acc: 75.14%
Epoch  60 | Train Loss: 0.546 | Train Acc: 78.36% | Val Loss: 0.60 | Val Acc: 75.59%
Epoch  80 | Train Loss: 0.530 | Train Acc: 78.97% | Val Loss: 0.60 | Val Acc: 75.64%
Epoch 100 | Train Loss: 0.517 | Train Acc: 79.55% | Val Loss: 0.60 | Val Acc: 75.89%
MLP test accuracy: 75.05%


In [None]:
adjacency = to_dense_adj(data.edge_index)[0]

adjacency += torch.eye(len(adjacency))

adjacency

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]])

In [None]:
gnn = VanillaGNN(dataset.num_features, 16, dataset.num_classes)
print(gnn)

gnn.fit(data, epochs=100)
acc = gnn.test(data)
print(f'\nGNN test accuracy: {acc*100:.2f}%')

VanillaGNN(
  (linear1): VanillaGNNLayer(
    (linear): Linear(in_features=128, out_features=16, bias=False)
  )
  (linear2): VanillaGNNLayer(
    (linear): Linear(in_features=16, out_features=4, bias=False)
  )
)
Epoch   0 | Train Loss: 88.253 | Val Loss: 79.58
Epoch  20 | Train Loss: 5.543 | Val Loss: 4.03
Epoch  40 | Train Loss: 1.951 | Val Loss: 1.55
Epoch  60 | Train Loss: 1.504 | Val Loss: 1.07
Epoch  80 | Train Loss: 0.856 | Val Loss: 0.74

GNN test accuracy: 82.26%


# Summary



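On both Cora and Facebook Page-Page, the vanilla GNN reached a higher test accuracy than the plain MLP: 73.80% versus 51.70% on Cora, and 82.26% versus 75.05% on Facebook Page-Page. This suggests that the graph neural network projects the node features into a more useful representation than the MLP does.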

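These results illustrate the advantage of a GNN, which takes neighbor relationships into account instead of treating the input as plain tabular data. Whether the same result holds on real-world data remains uncertain. The experiments also showed that the GNN trains more slowly than the MLP, so although it can be better in accuracy, it carries a considerably higher computational cost.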

I also came away with the impression that turning raw data into a GNN-ready structure takes a lot of preprocessing effort, and that I need more hands-on experience converting real-world data into graph form.
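
The computational-cost remark in this summary can be made a little more concrete with back-of-the-envelope arithmetic. Assuming the dense `float32` adjacency matrices that `to_dense_adj` produces in this notebook (4 bytes per entry), the sketch below estimates their size from the node counts printed earlier. The helper name `dense_adjacency_bytes` is hypothetical.

```python
def dense_adjacency_bytes(num_nodes: int, bytes_per_entry: int = 4) -> int:
    """Memory needed for a dense num_nodes x num_nodes matrix (float32 by default)."""
    return num_nodes * num_nodes * bytes_per_entry


# Node counts as printed by the notebook.
for name, n in [("Cora", 2708), ("FacebookPagePage", 22470)]:
    size = dense_adjacency_bytes(n)
    print(f"{name}: {size / 1e6:,.1f} MB")
```

This comes to roughly 29 MB for Cora and about 2 GB for Facebook Page-Page, before counting the temporary `torch.eye` matrix of the same size that is added to it.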