Tutorial: 7
Describe the bug
I think there's a math error in this claim: "Finally, to take the average instead of summing, we calculate the matrix $\hat{D}$ which is a diagonal matrix with $\hat{D}_{ii}$ denoting the number of neighbors node $i$ has."
I read this and couldn't get it to be an average. When I did the math out I got

$$\left(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}X\right)_i = \sum_j \frac{\hat{A}_{ij}}{\sqrt{\hat{D}_{ii}\hat{D}_{jj}}}\, x_j,$$

which scales each neighbor by a pairwise factor rather than the uniform $1/\hat{D}_{ii}$ a true average would use (only $\hat{D}^{-1}\hat{A}$ gives a row average). That got me to implement the math directly instead of just averaging the rows, to see:
```python
import torch
import torch.nn as nn


class GCNLayer(nn.Module):

    def __init__(self, c_in, c_out):
        super().__init__()
        self.projection = nn.Linear(c_in, c_out)

    def forward(self, node_feats, adj_matrix):
        """
        Inputs:
            node_feats - Tensor with node features of shape [batch_size, num_nodes, c_in]
            adj_matrix - Batch of adjacency matrices of the graph. If there is an edge
                         from i to j, adj_matrix[b,i,j]=1 else 0. Supports directed edges
                         by non-symmetric matrices. Assumes the identity connections have
                         already been added. Shape: [batch_size, num_nodes, num_nodes]
        """
        # Num neighbours = number of incoming edges (including the self-connection)
        num_neighbours = adj_matrix.sum(dim=-1, keepdim=True)
        # Avoid division by zero for isolated nodes
        num_neighbours = torch.clamp(num_neighbours, min=1)
        d_inv_sqrt = torch.pow(num_neighbours, -0.5)
        # Build the diagonal matrix D^{-1/2}: broadcasting the [N, N] identity against
        # d_inv_sqrt of shape [B, N, 1] places the per-node factors on the diagonal
        identity = torch.eye(num_neighbours.shape[1], device=node_feats.device)
        d_inv_sqrt_matrix = identity * d_inv_sqrt
        # Perform D^{-1/2} A D^{-1/2}
        adj_matrix = torch.matmul(d_inv_sqrt_matrix, adj_matrix)
        adj_matrix = torch.matmul(adj_matrix, d_inv_sqrt_matrix)
        node_feats = self.projection(node_feats)
        node_feats = torch.bmm(adj_matrix, node_feats)
        return node_feats
```

This yielded results that still line up with the claims presented afterwards, mainly that the last two nodes output the same values because they have the same set of neighbors.
Adjacency matrix

```
tensor([[[1., 1., 0., 0.],
         [1., 1., 1., 1.],
         [0., 1., 1., 1.],
         [0., 1., 1., 1.]]])
```

Input features

```
tensor([[[0., 1.],
         [2., 3.],
         [4., 5.],
         [6., 7.]]])
```

Output features

```
tensor([[[0.7071, 1.5607],
         [3.3868, 4.5677],
         [3.9107, 4.8660],   <- this node output is the same
         [3.9107, 4.8660]]]) <- as this node, which was the point of the example.
```
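For reference, a minimal driver sketch that reproduces these printouts. It assumes the projection is pinned to the identity with zero bias, which is consistent with the values shown above; with random weights the exact numbers would differ:

```python
layer = GCNLayer(c_in=2, c_out=2)
# Pin the projection to the identity so the output reflects only the normalization
layer.projection.weight.data = torch.Tensor([[1., 0.], [0., 1.]])
layer.projection.bias.data = torch.Tensor([0., 0.])

adj_matrix = torch.Tensor([[[1., 1., 0., 0.],
                            [1., 1., 1., 1.],
                            [0., 1., 1., 1.],
                            [0., 1., 1., 1.]]])
node_feats = torch.arange(8, dtype=torch.float32).view(1, 4, 2)

with torch.no_grad():
    out_feats = layer(node_feats, adj_matrix)

print("Adjacency matrix", adj_matrix)
print("Input features", node_feats)
print("Output features", out_feats)
```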
So I guess the main thing is that the 'averaging' claim is not fully on the nose: under the symmetric normalization each neighbor $j$ is scaled by the pairwise factor $1/\sqrt{\hat{D}_{ii}\hat{D}_{jj}}$ rather than the uniform $1/\hat{D}_{ii}$ an average would use. At least that's my intuition of the situation.
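To make that concrete, here is a quick numerical check (a sketch of my own, not from the tutorial): the rows of $\hat{D}^{-1}\hat{A}$ sum to 1, so it is a true average, while the rows of $\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}$ generally do not:

```python
import torch

adj = torch.Tensor([[1., 1., 0., 0.],
                    [1., 1., 1., 1.],
                    [0., 1., 1., 1.],
                    [0., 1., 1., 1.]])
deg = adj.sum(dim=-1)

mean_norm = torch.diag(1.0 / deg) @ adj                                  # D^{-1} A
sym_norm = torch.diag(deg.pow(-0.5)) @ adj @ torch.diag(deg.pow(-0.5))   # D^{-1/2} A D^{-1/2}

print(mean_norm.sum(dim=-1))  # tensor([1., 1., 1., 1.]) -> every row is an average
print(sym_norm.sum(dim=-1))   # tensor([0.8536, 1.1809, 0.9553, 0.9553]) -> not an average
```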
Runtime environment (please complete the following information):
Colab - GPU